feat: initial home-gateway setup - migrate everything from the Mac mini to the GPU server

With the OrbStack license expired, the Mac mini's Docker services were
consolidated onto the GPU server: nginx replaced with Caddy, automatic HTTPS
for 12 subdomains, and fail2ban wired to Caddy's JSON logs.

Key changes:
- home-caddy: Caddy reverse proxy (automatic HTTPS via Let's Encrypt)
- home-fail2ban: security monitoring driven by Caddy JSON logs
- home-ddns: Cloudflare DDNS (API key split out into .env)
- gpu-hub-api/web: AI backend router + web UI (moved over from gpu-services)
- AI runtime (Ollama) stays internal-network only; external traffic goes through the gpu-hub auth gateway

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Hyungi Ahn
2026-04-05 04:55:28 +00:00
commit 79c09cede4
52 changed files with 6847 additions and 0 deletions

17
.gitignore vendored Normal file

@@ -0,0 +1,17 @@
# Secrets
.env
ddns/.env
# Runtime data
caddy/logs/
fail2ban/data/
docker-compose.test.yml
# Node
hub-web/node_modules/
hub-web/dist/
# Python
hub-api/__pycache__/
hub-api/*.pyc
hub-api/.venv/

68
README.md Normal file

@@ -0,0 +1,68 @@
# home-gateway

Unified home-network gateway, running on the GPU server (192.168.1.186).

## Components

| Container | Role |
|----------|------|
| home-caddy | Caddy reverse proxy (80/443, automatic HTTPS) |
| home-fail2ban | Security monitoring based on Caddy JSON logs |
| home-ddns-vpn | Cloudflare DDNS (vpn.hyungi.net) |
| home-ddns-mail | Cloudflare DDNS (mail.hyungi.net) |
| gpu-hub-api | AI backend router (auth gateway) |
| gpu-hub-web | AI hub web UI |

## Routing targets

### GPU server (local)

- `komga.hyungi.net` → :25600
- `document.hyungi.net` → :8080 (Document Server's internal Caddy)
- `ai.hyungi.net` → gpu-hub-api (authenticated external AI access)

### NAS (192.168.1.227)

- `ds1525.hyungi.net` → :5000 (DSM)
- `webdav.hyungi.net` → :5006 (WebDAV)
- `git.hyungi.net` → :10300 (Gitea)
- `vault.hyungi.net` → :8443 (Vaultwarden)
- `link.hyungi.net` → :10002 (Synology Drive)
- `mailplus.hyungi.net` → :21680 (MailPlus)
- `contacts.hyungi.net` → :25555 (Contacts)
- `calendar.hyungi.net` → :20002 (Calendar)
- `note.hyungi.net` → :9350 (Note Station)

### Mac mini (192.168.1.122)

- `jellyfin.hyungi.net` → :8096

## AI access policy

- Ollama/AI runtimes: internal network only (127.0.0.1:11434)
- External AI access: only through the gpu-hub-api auth gateway
- `gpu.hyungi.net`: retired (internal network / Tailscale only)
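
A minimal client sketch of the external access flow above, assuming the `/auth/login` and `/v1/chat/completions` routes implemented in hub-api; the token and password values are placeholders:

```python
# Sketch: authenticate against the gateway, then call the chat endpoint.
# Only request construction is shown; the actual network calls are omitted
# since the gateway is not reachable from outside the home network.
import json
from urllib import request

BASE = "https://ai.hyungi.net"

def login_request(password: str) -> request.Request:
    # POST /auth/login with {"password": ...}; returns {"role": ..., "token": ...}
    return request.Request(
        f"{BASE}/auth/login",
        data=json.dumps({"password": password}).encode(),
        headers={"Content-Type": "application/json"},
    )

def chat_request(token: str, model: str, prompt: str) -> request.Request:
    # POST /v1/chat/completions with the Bearer token from /auth/login
    payload = {"model": model, "messages": [{"role": "user", "content": prompt}]}
    return request.Request(
        f"{BASE}/v1/chat/completions",
        data=json.dumps(payload).encode(),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )

# urllib.request.urlopen(...) on these Request objects performs the calls
req = chat_request("<jwt-from-login>", "qwen3.5:35b-a3b", "hello")
```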
## Directory layout

```
home-gateway/
├── docker-compose.yml
├── backends.json        # gpu-hub AI backend config
├── caddy/
│   ├── Caddyfile        # reverse proxy config (12 subdomains)
│   └── logs/            # Caddy JSON logs (consumed by fail2ban)
├── fail2ban/
│   ├── jail.local
│   └── data/filter.d/   # custom filters for Caddy
├── ddns/
│   └── .env             # Cloudflare API key
├── hub-api/             # GPU Hub FastAPI backend
└── hub-web/             # GPU Hub React frontend
```

## Related standalone services (separate compose files)

- `~/qdrant/` — Qdrant vector DB (127.0.0.1:6333)
- `~/ollama/` — Ollama GPU inference (127.0.0.1:11434)

## Migration history

- 2026-04-05: full migration from the Mac mini (OrbStack) to the GPU server
- nginx consolidated into Caddy
- Let's Encrypt certificates: manual management → Caddy automatic HTTPS
- Cloudflare DDNS API key moved into .env
- fail2ban: nginx filters → Caddy JSON filters

22
backends.json Normal file

@@ -0,0 +1,22 @@
[
  {
    "id": "ollama-gpu",
    "type": "ollama",
    "url": "http://host.docker.internal:11434",
    "models": [
      { "id": "bge-m3", "capabilities": ["embed"], "priority": 1 }
    ],
    "access": "all",
    "rate_limit": null
  },
  {
    "id": "mlx-mac",
    "type": "openai-compat",
    "url": "http://192.168.1.122:8800",
    "models": [
      { "id": "qwen3.5:35b-a3b", "backend_model_id": "mlx-community/Qwen3.5-35B-A3B-4bit", "capabilities": ["chat"], "priority": 1 }
    ],
    "access": "all",
    "rate_limit": null
  }
]

168
caddy/Caddyfile Normal file

@@ -0,0 +1,168 @@
{
    # Global options
    log default {
        output file /var/log/caddy/access.log {
            roll_size 100MiB
            roll_keep 5
        }
        format json
    }
    servers {
        trusted_proxies static 173.245.48.0/20 103.21.244.0/22 103.22.200.0/22 103.31.4.0/22 104.16.0.0/13 104.24.0.0/14 108.162.192.0/18 131.0.72.0/22 141.101.64.0/18 162.158.0.0/15 172.64.0.0/13 188.114.96.0/20 190.93.240.0/20 197.234.240.0/22 198.41.128.0/17 2400:cb00::/32 2606:4700::/32 2803:f800::/32 2405:b500::/32 2405:8100::/32 2a06:98c0::/29 2c0f:f248::/32
    }
}

# ============================================================
# GPU Hub — default route (direct IP access, no HTTPS)
# ============================================================
:80 {
    handle /v1/* {
        reverse_proxy gpu-hub-api:8000 {
            flush_interval -1
        }
    }
    handle /auth/* {
        reverse_proxy gpu-hub-api:8000
    }
    handle /health {
        reverse_proxy gpu-hub-api:8000
    }
    handle /health/* {
        reverse_proxy gpu-hub-api:8000
    }
    handle /gpu {
        reverse_proxy gpu-hub-api:8000
    }
    handle {
        reverse_proxy gpu-hub-web:80
    }
}

# ============================================================
# AI Gateway — authenticated external access
# ============================================================
ai.hyungi.net {
    reverse_proxy gpu-hub-api:8000 {
        flush_interval -1
    }
}

# ============================================================
# Jellyfin — Mac mini (192.168.1.122)
# ============================================================
jellyfin.hyungi.net {
    reverse_proxy 192.168.1.122:8096 {
        transport http {
            read_timeout 300s
            write_timeout 300s
        }
    }
}

# ============================================================
# Komga — GPU local
# ============================================================
komga.hyungi.net {
    reverse_proxy host.docker.internal:25600
}

# ============================================================
# Document Server — GPU local (via internal Caddy; switches to direct routing in Phase 6)
# ============================================================
document.hyungi.net {
    request_body {
        max_size 100MB
    }
    reverse_proxy host.docker.internal:8080
}

# ============================================================
# WebDAV — NAS (192.168.1.227)
# ============================================================
webdav.hyungi.net {
    request_body {
        max_size 2GB
    }
    reverse_proxy https://192.168.1.227:5006 {
        transport http {
            tls_insecure_skip_verify
            read_timeout 600s
            write_timeout 600s
        }
        header_up Host {host}
        header_up X-Real-IP {remote_host}
        header_up X-Forwarded-For {remote_host}
        header_up X-Forwarded-Proto {scheme}
    }
}

# ============================================================
# DSM — NAS
# ============================================================
ds1525.hyungi.net {
    request_body {
        max_size 0
    }
    reverse_proxy 192.168.1.227:5000
}

# ============================================================
# Gitea — NAS
# ============================================================
git.hyungi.net {
    request_body {
        max_size 512MB
    }
    reverse_proxy 192.168.1.227:10300
}

# ============================================================
# Vaultwarden — NAS (WebSocket)
# ============================================================
vault.hyungi.net {
    reverse_proxy 192.168.1.227:8443
}

# ============================================================
# Synology Drive — NAS (WebSocket, unlimited upload)
# ============================================================
link.hyungi.net {
    request_body {
        max_size 0
    }
    reverse_proxy 192.168.1.227:10002
}

# ============================================================
# MailPlus — NAS
# ============================================================
mailplus.hyungi.net {
    request_body {
        max_size 100MB
    }
    reverse_proxy 192.168.1.227:21680
}

# ============================================================
# Contacts — NAS
# ============================================================
contacts.hyungi.net {
    reverse_proxy 192.168.1.227:25555
}

# ============================================================
# Calendar — NAS
# ============================================================
calendar.hyungi.net {
    reverse_proxy 192.168.1.227:20002
}

# ============================================================
# Note Station — NAS (WebSocket, unlimited upload)
# ============================================================
note.hyungi.net {
    request_body {
        max_size 0
    }
    reverse_proxy 192.168.1.227:9350
}

105
docker-compose.yml Normal file

@@ -0,0 +1,105 @@
services:
  # ============================================================
  # Edge Layer — Reverse Proxy + Security + DDNS
  # ============================================================
  home-caddy:
    image: caddy:2-alpine
    container_name: home-caddy
    restart: unless-stopped
    ports:
      - "80:80"
      - "443:443"
      - "443:443/udp"
    volumes:
      - ./caddy/Caddyfile:/etc/caddy/Caddyfile:ro
      - ./caddy/logs:/var/log/caddy
      - caddy_data:/data
      - caddy_config:/config
    extra_hosts:
      - "host.docker.internal:host-gateway"
    depends_on:
      gpu-hub-api:
        condition: service_healthy
    networks:
      - gateway-net

  home-fail2ban:
    image: crazymax/fail2ban:latest
    container_name: home-fail2ban
    restart: unless-stopped
    network_mode: host
    cap_add:
      - NET_ADMIN
      - NET_RAW
    volumes:
      - ./fail2ban/data:/data
      - ./caddy/logs:/var/log/caddy:ro
      - ./fail2ban/jail.local:/etc/fail2ban/jail.local:ro
    environment:
      - TZ=Asia/Seoul
      - F2B_LOG_LEVEL=INFO

  home-ddns-vpn:
    image: oznu/cloudflare-ddns:latest
    container_name: home-ddns-vpn
    restart: unless-stopped
    env_file:
      - ./ddns/.env
    environment:
      - ZONE=hyungi.net
      - SUBDOMAIN=vpn
      - PROXIED=false

  home-ddns-mail:
    image: oznu/cloudflare-ddns:latest
    container_name: home-ddns-mail
    restart: unless-stopped
    env_file:
      - ./ddns/.env
    environment:
      - ZONE=hyungi.net
      - SUBDOMAIN=mail
      - PROXIED=false

  # ============================================================
  # GPU Hub — AI Backend Router + Web UI
  # ============================================================
  gpu-hub-api:
    build: ./hub-api
    container_name: gpu-hub-api
    restart: unless-stopped
    environment:
      - OWNER_PASSWORD=${OWNER_PASSWORD}
      - GUEST_PASSWORD=${GUEST_PASSWORD}
      - JWT_SECRET=${JWT_SECRET}
      - BACKENDS_CONFIG=/app/config/backends.json
      - CORS_ORIGINS=${CORS_ORIGINS:-http://localhost:5173}
      - DB_PATH=/app/data/gateway.db
    volumes:
      - hub_data:/app/data
      - ./backends.json:/app/config/backends.json:ro
    extra_hosts:
      - "host.docker.internal:host-gateway"
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:8000/health"]
      interval: 15s
      timeout: 5s
      retries: 3
    networks:
      - gateway-net

  gpu-hub-web:
    build: ./hub-web
    container_name: gpu-hub-web
    restart: unless-stopped
    networks:
      - gateway-net

volumes:
  caddy_data:
  caddy_config:
  hub_data:

networks:
  gateway-net:
    name: home-gateway-network

fail2ban/data/filter.d/caddy-auth.conf Normal file

@@ -0,0 +1,4 @@
[Definition]
failregex = ^.*"client_ip":"<HOST>".*"status":\s*401.*$
ignoreregex =
datepattern = "ts":{EPOCH}

fail2ban/data/filter.d/caddy-botsearch.conf Normal file

@@ -0,0 +1,4 @@
[Definition]
failregex = ^.*"client_ip":"<HOST>".*"status":\s*(403|404|444).*$
ignoreregex =
datepattern = "ts":{EPOCH}
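
A quick check that the failregex shape above matches Caddy's JSON access log format. fail2ban's `<HOST>` tag is replaced here with a plain named group, and the log line is illustrative, not captured from a real server:

```python
# Simulate the caddy-botsearch filter against a Caddy JSON log line.
import re

# <HOST> in the fail2ban filter becomes a capture group here
FAILREGEX_BOTSEARCH = r'^.*"client_ip":"(?P<host>[^"]+)".*"status":\s*(403|404|444).*$'

# Illustrative Caddy access-log entry: client_ip sits inside the "request"
# object and precedes the top-level "status" field, as the regex expects
sample = ('{"level":"info","ts":1743825000.123,'
          '"request":{"remote_ip":"203.0.113.7","client_ip":"203.0.113.7",'
          '"method":"GET","uri":"/wp-login.php"},"status":404}')

m = re.match(FAILREGEX_BOTSEARCH, sample)
```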

27
fail2ban/jail.local Normal file

@@ -0,0 +1,27 @@
[DEFAULT]
bantime = 3600
findtime = 600
maxretry = 5
backend = auto
enabled = false
[sshd]
enabled = false
# Block Caddy bots/scanners (repeated 404/403)
[caddy-botsearch]
enabled = true
port = 80,443
filter = caddy-botsearch
logpath = /var/log/caddy/access.log
maxretry = 2
bantime = 86400
# Block Caddy auth failures (repeated 401)
[caddy-auth]
enabled = true
port = 80,443
filter = caddy-auth
logpath = /var/log/caddy/access.log
maxretry = 3
bantime = 1800

16
hub-api/Dockerfile Normal file

@@ -0,0 +1,16 @@
FROM python:3.12-slim
RUN apt-get update && apt-get install -y --no-install-recommends curl && rm -rf /var/lib/apt/lists/*
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY . .
RUN mkdir -p /app/data
EXPOSE 8000
CMD ["uvicorn", "main:app", "--host", "0.0.0.0", "--port", "8000"]

21
hub-api/config.py Normal file

@@ -0,0 +1,21 @@
from pydantic_settings import BaseSettings


class Settings(BaseSettings):
    owner_password: str = "changeme"
    guest_password: str = "guest"
    jwt_secret: str = "dev-secret-change-in-production"
    jwt_algorithm: str = "HS256"
    jwt_expire_hours: int = 24
    backends_config: str = "/app/config/backends.json"
    cors_origins: str = "http://localhost:5173"
    nvidia_smi_path: str = "/usr/bin/nvidia-smi"
    db_path: str = "/app/data/gateway.db"

    model_config = {"env_file": ".env", "extra": "ignore"}


settings = Settings()

0
hub-api/db/__init__.py Normal file

50
hub-api/db/database.py Normal file

@@ -0,0 +1,50 @@
import aiosqlite

from config import settings

SCHEMA = """
CREATE TABLE IF NOT EXISTS chat_sessions (
    id TEXT PRIMARY KEY,
    title TEXT,
    model TEXT NOT NULL,
    role TEXT NOT NULL DEFAULT 'guest',
    created_at REAL NOT NULL
);
CREATE TABLE IF NOT EXISTS chat_messages (
    id TEXT PRIMARY KEY,
    session_id TEXT NOT NULL REFERENCES chat_sessions(id),
    role TEXT NOT NULL,
    content TEXT NOT NULL,
    created_at REAL NOT NULL
);
CREATE TABLE IF NOT EXISTS usage_logs (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    backend_id TEXT NOT NULL,
    model TEXT NOT NULL,
    prompt_tokens INTEGER DEFAULT 0,
    completion_tokens INTEGER DEFAULT 0,
    latency_ms REAL DEFAULT 0,
    user_role TEXT NOT NULL DEFAULT 'guest',
    created_at REAL NOT NULL
);
CREATE INDEX IF NOT EXISTS idx_messages_session ON chat_messages(session_id);
CREATE INDEX IF NOT EXISTS idx_usage_created ON usage_logs(created_at);
"""


async def init_db():
    """Initialize SQLite database with WAL mode and schema."""
    async with aiosqlite.connect(settings.db_path) as db:
        await db.execute("PRAGMA journal_mode=WAL")
        await db.executescript(SCHEMA)
        await db.commit()


async def get_db() -> aiosqlite.Connection:
    """Get a database connection."""
    db = await aiosqlite.connect(settings.db_path)
    await db.execute("PRAGMA journal_mode=WAL")
    return db
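
The schema can be smoke-tested synchronously with the stdlib `sqlite3` module (aiosqlite is a thin async wrapper around it), using an in-memory database instead of the container's /app/data/gateway.db:

```python
# Create the chat tables in-memory, insert a session + message, read it back.
import sqlite3
import time
import uuid

SCHEMA = """
CREATE TABLE IF NOT EXISTS chat_sessions (
    id TEXT PRIMARY KEY,
    title TEXT,
    model TEXT NOT NULL,
    role TEXT NOT NULL DEFAULT 'guest',
    created_at REAL NOT NULL
);
CREATE TABLE IF NOT EXISTS chat_messages (
    id TEXT PRIMARY KEY,
    session_id TEXT NOT NULL REFERENCES chat_sessions(id),
    role TEXT NOT NULL,
    content TEXT NOT NULL,
    created_at REAL NOT NULL
);
"""

db = sqlite3.connect(":memory:")
db.executescript(SCHEMA)
sid = str(uuid.uuid4())
db.execute(
    "INSERT INTO chat_sessions (id, title, model, role, created_at) VALUES (?, ?, ?, ?, ?)",
    (sid, "demo", "bge-m3", "guest", time.time()),
)
db.execute(
    "INSERT INTO chat_messages (id, session_id, role, content, created_at) VALUES (?, ?, ?, ?, ?)",
    (str(uuid.uuid4()), sid, "user", "hello", time.time()),
)
rows = db.execute(
    "SELECT content FROM chat_messages WHERE session_id = ?", (sid,)
).fetchall()
```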

2
hub-api/db/models.py Normal file

@@ -0,0 +1,2 @@
# DB model helpers — used in Phase 3 for logging
# Schema defined in database.py

46
hub-api/main.py Normal file

@@ -0,0 +1,46 @@
from contextlib import asynccontextmanager

from fastapi import FastAPI
from fastapi.middleware.cors import CORSMiddleware

from config import settings
from middleware.auth import AuthMiddleware
from routers import auth, chat, embeddings, gpu, health, models
from services.registry import registry


@asynccontextmanager
async def lifespan(app: FastAPI):
    await registry.load_backends(settings.backends_config)
    registry.start_health_loop()
    yield
    registry.stop_health_loop()


app = FastAPI(
    title="AI Gateway",
    version="0.1.0",
    lifespan=lifespan,
)
app.add_middleware(
    CORSMiddleware,
    allow_origins=settings.cors_origins.split(","),
    allow_credentials=True,
    allow_methods=["*"],
    allow_headers=["*"],
)
app.add_middleware(AuthMiddleware)

app.include_router(auth.router)
app.include_router(chat.router)
app.include_router(models.router)
app.include_router(embeddings.router)
app.include_router(health.router)
app.include_router(gpu.router)


@app.get("/")
async def root():
    return {"service": "AI Gateway", "version": "0.1.0"}

0
hub-api/middleware/__init__.py Normal file

hub-api/middleware/auth.py Normal file

@@ -0,0 +1,96 @@
from __future__ import annotations

import time

from jose import JWTError, jwt
from starlette.middleware.base import BaseHTTPMiddleware
from starlette.requests import Request

from config import settings

# Paths that don't require authentication
PUBLIC_PATHS = {"/", "/health", "/auth/login", "/docs", "/openapi.json"}
PUBLIC_PREFIXES = ("/health/",)


class AuthMiddleware(BaseHTTPMiddleware):
    async def dispatch(self, request: Request, call_next):
        path = request.url.path
        # Skip auth for public paths
        if path in PUBLIC_PATHS or any(path.startswith(p) for p in PUBLIC_PREFIXES):
            request.state.role = "anonymous"
            return await call_next(request)
        # Skip auth for OPTIONS (CORS preflight)
        if request.method == "OPTIONS":
            return await call_next(request)
        # Try Bearer token first, then cookie
        token = _extract_token(request)
        if not token:
            request.state.role = "anonymous"
            return await call_next(request)
        # Verify JWT
        payload = _verify_token(token)
        if payload:
            request.state.role = payload.get("role", "guest")
        else:
            request.state.role = "anonymous"
        return await call_next(request)


def create_token(role: str) -> str:
    payload = {
        "role": role,
        "exp": time.time() + settings.jwt_expire_hours * 3600,
        "iat": time.time(),
    }
    return jwt.encode(payload, settings.jwt_secret, algorithm=settings.jwt_algorithm)


def _extract_token(request: Request) -> str | None:
    # 1. Authorization: Bearer header
    auth_header = request.headers.get("authorization", "")
    if auth_header.startswith("Bearer "):
        return auth_header[7:]
    # 2. httpOnly cookie
    return request.cookies.get("token")


def _verify_token(token: str) -> dict | None:
    try:
        payload = jwt.decode(
            token, settings.jwt_secret, algorithms=[settings.jwt_algorithm]
        )
        if payload.get("exp", 0) < time.time():
            return None
        return payload
    except JWTError:
        return None


# Login rate limiting (IP-based)
_login_attempts: dict[str, list[float]] = {}
MAX_ATTEMPTS = 5
LOCKOUT_SECONDS = 60


def check_login_rate_limit(ip: str) -> bool:
    """Returns True if login is allowed for this IP."""
    now = time.time()
    attempts = _login_attempts.get(ip, [])
    # Clean old attempts
    attempts = [t for t in attempts if now - t < LOCKOUT_SECONDS]
    _login_attempts[ip] = attempts
    return len(attempts) < MAX_ATTEMPTS


def record_login_attempt(ip: str):
    now = time.time()
    if ip not in _login_attempts:
        _login_attempts[ip] = []
    _login_attempts[ip].append(now)
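
The sliding-window limiter above can be replayed standalone with the clock injected, so the 60-second window is exercised deterministically:

```python
# Deterministic replay of the login rate limiter: 5 attempts allowed per
# 60-second window, the 6th is blocked, and the window eventually resets.
MAX_ATTEMPTS = 5
LOCKOUT_SECONDS = 60
_attempts: dict[str, list[float]] = {}

def allowed(ip: str, now: float) -> bool:
    # Drop timestamps older than the window, then count what remains
    recent = [t for t in _attempts.get(ip, []) if now - t < LOCKOUT_SECONDS]
    _attempts[ip] = recent
    return len(recent) < MAX_ATTEMPTS

def record(ip: str, now: float) -> None:
    _attempts.setdefault(ip, []).append(now)

results = []
for i in range(6):
    results.append(allowed("198.51.100.9", now=float(i)))
    record("198.51.100.9", now=float(i))
# All six attempts fell inside one window, so the sixth is rejected;
# well past the window the IP is allowed again
later = allowed("198.51.100.9", now=100.0)
```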

hub-api/middleware/rate_limit.py Normal file

@@ -0,0 +1,18 @@
from fastapi import HTTPException

from services.registry import registry


def check_backend_rate_limit(backend_id: str):
    """Raise 429 if rate limit exceeded for this backend."""
    if not registry.check_rate_limit(backend_id):
        raise HTTPException(
            status_code=429,
            detail={
                "error": {
                    "message": f"Rate limit exceeded for backend '{backend_id}'",
                    "type": "rate_limit_error",
                    "code": "rate_limit_exceeded",
                }
            },
        )

7
hub-api/requirements.txt Normal file
View File

@@ -0,0 +1,7 @@
fastapi==0.115.0
uvicorn[standard]==0.30.0
httpx==0.27.0
pydantic-settings==2.5.0
python-jose[cryptography]==3.3.0
python-multipart==0.0.9
aiosqlite==0.20.0

0
hub-api/routers/__init__.py Normal file

79
hub-api/routers/auth.py Normal file

@@ -0,0 +1,79 @@
from fastapi import APIRouter, Request, Response
from fastapi.responses import JSONResponse
from pydantic import BaseModel

from config import settings
from middleware.auth import (
    check_login_rate_limit,
    create_token,
    record_login_attempt,
)

router = APIRouter(prefix="/auth", tags=["auth"])


class LoginRequest(BaseModel):
    password: str


class LoginResponse(BaseModel):
    role: str
    token: str


@router.post("/login")
async def login(body: LoginRequest, request: Request, response: Response):
    ip = request.client.host if request.client else "unknown"
    if not check_login_rate_limit(ip):
        return _error_response(429, "Too many login attempts. Try again in 1 minute.")
    record_login_attempt(ip)
    if body.password == settings.owner_password:
        role = "owner"
    elif body.password == settings.guest_password:
        role = "guest"
    else:
        return _error_response(401, "Invalid password")
    token = create_token(role)
    # Set httpOnly cookie for web UI
    response.set_cookie(
        key="token",
        value=token,
        httponly=True,
        samesite="lax",
        max_age=settings.jwt_expire_hours * 3600,
    )
    return LoginResponse(role=role, token=token)


@router.get("/me")
async def me(request: Request):
    role = getattr(request.state, "role", "anonymous")
    if role == "anonymous":
        return _error_response(401, "Not authenticated")
    return {"role": role}


@router.post("/logout")
async def logout(response: Response):
    response.delete_cookie("token")
    return {"ok": True}


def _error_response(status_code: int, message: str):
    return JSONResponse(
        status_code=status_code,
        content={
            "error": {
                "message": message,
                "type": "auth_error",
                "code": f"auth_{status_code}",
            }
        },
    )

112
hub-api/routers/chat.py Normal file

@@ -0,0 +1,112 @@
from typing import List, Optional

from fastapi import APIRouter, HTTPException, Request
from fastapi.responses import JSONResponse, StreamingResponse
from pydantic import BaseModel

from middleware.rate_limit import check_backend_rate_limit
from services import proxy_ollama, proxy_openai
from services.registry import registry

router = APIRouter(prefix="/v1", tags=["chat"])


class ChatMessage(BaseModel):
    role: str
    content: str


class ChatRequest(BaseModel):
    model: str
    messages: List[ChatMessage]
    stream: bool = False
    temperature: Optional[float] = None
    max_tokens: Optional[int] = None


@router.post("/chat/completions")
async def chat_completions(body: ChatRequest, request: Request):
    role = getattr(request.state, "role", "anonymous")
    if role == "anonymous":
        raise HTTPException(
            status_code=401,
            detail={"error": {"message": "Authentication required", "type": "auth_error", "code": "unauthorized"}},
        )
    # Resolve model to backend
    result = registry.resolve_model(body.model, role)
    if not result:
        raise HTTPException(
            status_code=404,
            detail={
                "error": {
                    "message": f"Model '{body.model}' not found or not available",
                    "type": "invalid_request_error",
                    "code": "model_not_found",
                }
            },
        )
    backend, model_info = result
    # Check rate limit
    check_backend_rate_limit(backend.id)
    # Record request for rate limiting
    registry.record_request(backend.id)
    messages = [{"role": m.role, "content": m.content} for m in body.messages]
    kwargs = {}
    if body.temperature is not None:
        kwargs["temperature"] = body.temperature
    # Use backend-specific model ID if configured, otherwise use the user-facing ID
    actual_model = model_info.backend_model_id or body.model
    # Route to appropriate proxy
    if backend.type == "ollama":
        if body.stream:
            return StreamingResponse(
                proxy_ollama.stream_chat(
                    backend.url, actual_model, messages, **kwargs
                ),
                media_type="text/event-stream",
                headers={
                    "Cache-Control": "no-cache",
                    "X-Accel-Buffering": "no",
                },
            )
        result = await proxy_ollama.complete_chat(
            backend.url, actual_model, messages, **kwargs
        )
        return JSONResponse(content=result)
    if backend.type == "openai-compat":
        if body.stream:
            return StreamingResponse(
                proxy_openai.stream_chat(
                    backend.url, actual_model, messages, **kwargs
                ),
                media_type="text/event-stream",
                headers={
                    "Cache-Control": "no-cache",
                    "X-Accel-Buffering": "no",
                },
            )
        result = await proxy_openai.complete_chat(
            backend.url, actual_model, messages, **kwargs
        )
        return JSONResponse(content=result)
    raise HTTPException(
        status_code=501,
        detail={
            "error": {
                "message": f"Backend type '{backend.type}' not yet implemented",
                "type": "api_error",
                "code": "not_implemented",
            }
        },
    )

hub-api/routers/embeddings.py Normal file

@@ -0,0 +1,67 @@
from typing import List, Union

from fastapi import APIRouter, HTTPException, Request
from pydantic import BaseModel

from services import proxy_ollama
from services.registry import registry

router = APIRouter(prefix="/v1", tags=["embeddings"])


class EmbeddingRequest(BaseModel):
    model: str
    input: Union[str, List[str]]


@router.post("/embeddings")
async def create_embedding(body: EmbeddingRequest, request: Request):
    role = getattr(request.state, "role", "anonymous")
    if role == "anonymous":
        raise HTTPException(
            status_code=401,
            detail={"error": {"message": "Authentication required", "type": "auth_error", "code": "unauthorized"}},
        )
    result = registry.resolve_model(body.model, role)
    if not result:
        raise HTTPException(
            status_code=404,
            detail={
                "error": {
                    "message": f"Model '{body.model}' not found or not available",
                    "type": "invalid_request_error",
                    "code": "model_not_found",
                }
            },
        )
    backend, model_info = result
    if "embed" not in model_info.capabilities:
        raise HTTPException(
            status_code=400,
            detail={
                "error": {
                    "message": f"Model '{body.model}' does not support embeddings",
                    "type": "invalid_request_error",
                    "code": "capability_mismatch",
                }
            },
        )
    if backend.type == "ollama":
        return await proxy_ollama.generate_embedding(
            backend.url, body.model, body.input
        )
    raise HTTPException(
        status_code=501,
        detail={
            "error": {
                "message": f"Embedding not supported for backend type '{backend.type}'",
                "type": "api_error",
                "code": "not_implemented",
            }
        },
    )

13
hub-api/routers/gpu.py Normal file

@@ -0,0 +1,13 @@
from fastapi import APIRouter

from services.gpu_monitor import get_gpu_info

router = APIRouter(tags=["gpu"])


@router.get("/gpu")
async def gpu_status():
    info = await get_gpu_info()
    if not info:
        return {"error": {"message": "GPU info unavailable", "type": "api_error", "code": "gpu_unavailable"}}
    return info

31
hub-api/routers/health.py Normal file

@@ -0,0 +1,31 @@
from fastapi import APIRouter

from services.gpu_monitor import get_gpu_info
from services.registry import registry

router = APIRouter(tags=["health"])


@router.get("/health")
async def health():
    gpu = await get_gpu_info()
    return {
        "status": "ok",
        "backends": registry.get_health_summary(),
        "gpu": gpu,
    }


@router.get("/health/{backend_id}")
async def backend_health(backend_id: str):
    backend = registry.backends.get(backend_id)
    if not backend:
        return {"error": {"message": f"Backend '{backend_id}' not found"}}
    return {
        "id": backend.id,
        "type": backend.type,
        "status": "healthy" if backend.healthy else "down",
        "models": [m.id for m in backend.models],
        "latency_ms": backend.latency_ms,
    }

12
hub-api/routers/models.py Normal file

@@ -0,0 +1,12 @@
from fastapi import APIRouter, Request

from services.registry import registry

router = APIRouter(prefix="/v1", tags=["models"])


@router.get("/models")
async def list_models(request: Request):
    role = getattr(request.state, "role", "anonymous")
    models = registry.list_models(role)
    return {"object": "list", "data": models}

0
hub-api/services/__init__.py Normal file

hub-api/services/gpu_monitor.py Normal file

@@ -0,0 +1,41 @@
from __future__ import annotations

import asyncio
import logging

from config import settings

logger = logging.getLogger(__name__)


async def get_gpu_info() -> dict | None:
    """Run nvidia-smi and parse GPU info."""
    try:
        proc = await asyncio.create_subprocess_exec(
            settings.nvidia_smi_path,
            "--query-gpu=utilization.gpu,temperature.gpu,memory.used,memory.total,power.draw,name",
            "--format=csv,noheader,nounits",
            stdout=asyncio.subprocess.PIPE,
            stderr=asyncio.subprocess.PIPE,
        )
        stdout, stderr = await asyncio.wait_for(proc.communicate(), timeout=5.0)
        if proc.returncode != 0:
            logger.debug("nvidia-smi failed: %s", stderr.decode())
            return None
        line = stdout.decode().strip().split("\n")[0]
        parts = [p.strip() for p in line.split(",")]
        if len(parts) < 6:
            return None
        return {
            "utilization": int(parts[0]),
            "temperature": int(parts[1]),
            "vram_used": int(parts[2]),
            "vram_total": int(parts[3]),
            "power_draw": float(parts[4]),
            "name": parts[5],
        }
    except FileNotFoundError:
        # nvidia-smi is missing at the configured path
        return None
    except asyncio.TimeoutError:
        proc.kill()  # don't leak a hung nvidia-smi process
        return None
    except ValueError:
        # e.g. "[N/A]" fields when the driver can't report a metric
        return None
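
The CSV parsing step of `get_gpu_info()` can be isolated and checked against a sample line; the GPU name and figures below are illustrative, not captured from a real machine:

```python
# Parse one line of `nvidia-smi --query-gpu=... --format=csv,noheader,nounits`
# output, in the same field order get_gpu_info() requests.
def parse_gpu_line(line: str) -> dict:
    parts = [p.strip() for p in line.split(",")]
    return {
        "utilization": int(parts[0]),   # %
        "temperature": int(parts[1]),   # °C
        "vram_used": int(parts[2]),     # MiB
        "vram_total": int(parts[3]),    # MiB
        "power_draw": float(parts[4]),  # W
        "name": parts[5],
    }

info = parse_gpu_line("37, 54, 8123, 24576, 121.35, NVIDIA GeForce RTX 4090")
```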

hub-api/services/proxy_ollama.py Normal file

@@ -0,0 +1,156 @@
from __future__ import annotations

import json
import logging
from collections.abc import AsyncGenerator

import httpx

logger = logging.getLogger(__name__)


async def stream_chat(
    base_url: str,
    model: str,
    messages: list[dict],
    **kwargs,
) -> AsyncGenerator[str, None]:
    """Proxy Ollama chat streaming, converting NDJSON to OpenAI SSE format."""
    payload = {
        "model": model,
        "messages": messages,
        "stream": True,
        **{k: v for k, v in kwargs.items() if v is not None},
    }
    async with httpx.AsyncClient(timeout=120.0) as client:
        async with client.stream(
            "POST",
            f"{base_url}/api/chat",
            json=payload,
        ) as resp:
            if resp.status_code != 200:
                body = await resp.aread()
                error_msg = body.decode("utf-8", errors="replace")
                yield _error_event(f"Ollama error: {error_msg}")
                return
            async for line in resp.aiter_lines():
                if not line.strip():
                    continue
                try:
                    chunk = json.loads(line)
                except json.JSONDecodeError:
                    continue
                if chunk.get("done"):
                    # Final chunk — send [DONE]
                    yield "data: [DONE]\n\n"
                    return
                content = chunk.get("message", {}).get("content", "")
                if content:
                    openai_chunk = {
                        "id": "chatcmpl-gateway",
                        "object": "chat.completion.chunk",
                        "model": model,
                        "choices": [
                            {
                                "index": 0,
                                "delta": {"content": content},
                                "finish_reason": None,
                            }
                        ],
                    }
                    yield f"data: {json.dumps(openai_chunk)}\n\n"


async def complete_chat(
    base_url: str,
    model: str,
    messages: list[dict],
    **kwargs,
) -> dict:
    """Non-streaming Ollama chat, returns OpenAI-compatible response."""
    payload = {
        "model": model,
        "messages": messages,
        "stream": False,
        **{k: v for k, v in kwargs.items() if v is not None},
    }
    async with httpx.AsyncClient(timeout=120.0) as client:
        resp = await client.post(f"{base_url}/api/chat", json=payload)
        resp.raise_for_status()
        data = resp.json()
    return {
        "id": "chatcmpl-gateway",
        "object": "chat.completion",
        "model": model,
        "choices": [
            {
                "index": 0,
                "message": {
                    "role": "assistant",
                    "content": data.get("message", {}).get("content", ""),
                },
                "finish_reason": "stop",
            }
        ],
        "usage": {
            "prompt_tokens": data.get("prompt_eval_count", 0),
            "completion_tokens": data.get("eval_count", 0),
            "total_tokens": data.get("prompt_eval_count", 0)
            + data.get("eval_count", 0),
        },
    }


async def generate_embedding(
    base_url: str,
    model: str,
    input_text: str | list[str],
) -> dict:
    """Ollama embedding, returns OpenAI-compatible response."""
    texts = [input_text] if isinstance(input_text, str) else input_text
    async with httpx.AsyncClient(timeout=60.0) as client:
        resp = await client.post(
            f"{base_url}/api/embed",
            json={"model": model, "input": texts},
        )
        resp.raise_for_status()
        data = resp.json()
    embeddings_data = []
    for i, emb in enumerate(data.get("embeddings", [])):
        embeddings_data.append({
            "object": "embedding",
            "embedding": emb,
            "index": i,
        })
    return {
        "object": "list",
        "data": embeddings_data,
        "model": model,
        "usage": {"prompt_tokens": 1, "total_tokens": 1},
    }


def _error_event(message: str) -> str:
    error = {
        "id": "chatcmpl-gateway",
        "object": "chat.completion.chunk",
        "model": "error",
        "choices": [
            {
                "index": 0,
                "delta": {"content": f"[Error] {message}"},
                "finish_reason": "stop",
            }
        ],
    }
    return f"data: {json.dumps(error)}\n\ndata: [DONE]\n\n"

hub-api/services/proxy_openai.py Normal file

@@ -0,0 +1,83 @@
"""OpenAI-compatible proxy (MLX server, vLLM, etc.) — SSE passthrough."""
from __future__ import annotations

import json
import logging
from collections.abc import AsyncGenerator

import httpx

logger = logging.getLogger(__name__)


async def stream_chat(
    base_url: str,
    model: str,
    messages: list[dict],
    **kwargs,
) -> AsyncGenerator[str, None]:
    """Proxy OpenAI-compatible chat streaming. SSE passthrough with model field override."""
    payload = {
        "model": model,
        "messages": messages,
        "stream": True,
        **{k: v for k, v in kwargs.items() if v is not None},
    }
    async with httpx.AsyncClient(timeout=120.0) as client:
        async with client.stream(
            "POST",
            f"{base_url}/v1/chat/completions",
            json=payload,
        ) as resp:
            if resp.status_code != 200:
                body = await resp.aread()
                error_msg = body.decode("utf-8", errors="replace")
                yield _error_event(f"Backend error ({resp.status_code}): {error_msg}")
                return
            async for line in resp.aiter_lines():
                if not line.strip():
                    continue
                # Pass through SSE data lines as-is (already in OpenAI format);
                # this branch also forwards the terminating "data: [DONE]" line
                if line.startswith("data: "):
                    yield f"{line}\n\n"


async def complete_chat(
    base_url: str,
    model: str,
    messages: list[dict],
    **kwargs,
) -> dict:
    """Non-streaming OpenAI-compatible chat."""
    payload = {
        "model": model,
        "messages": messages,
        "stream": False,
        **{k: v for k, v in kwargs.items() if v is not None},
    }
    async with httpx.AsyncClient(timeout=120.0) as client:
        resp = await client.post(f"{base_url}/v1/chat/completions", json=payload)
        resp.raise_for_status()
        return resp.json()


def _error_event(message: str) -> str:
    error = {
        "id": "chatcmpl-gateway",
        "object": "chat.completion.chunk",
        "model": "error",
        "choices": [
            {
                "index": 0,
                "delta": {"content": f"[Error] {message}"},
                "finish_reason": "stop",
            }
        ],
    }
    return f"data: {json.dumps(error)}\n\ndata: [DONE]\n\n"
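
The passthrough's line filtering can be sketched as a pure function: `data:` lines (including the final `data: [DONE]`) are re-emitted with a blank-line terminator, while empty lines and SSE comment/keepalive lines are dropped:

```python
# Filter upstream SSE lines the way the passthrough does.
def passthrough(lines: list[str]) -> list[str]:
    out = []
    for line in lines:
        if not line.strip():
            continue  # blank separator lines
        if line.startswith("data: "):
            out.append(f"{line}\n\n")  # forward data events, incl. [DONE]
        # everything else (e.g. ": keepalive" comments) is dropped
    return out

events = passthrough([
    'data: {"choices":[{"delta":{"content":"Hi"}}]}',
    "",
    ": keepalive",
    "data: [DONE]",
])
```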

hub-api/services/registry.py Normal file

@@ -0,0 +1,227 @@
from __future__ import annotations

import asyncio
import json
import logging
import time
from dataclasses import dataclass, field
from pathlib import Path

import httpx

logger = logging.getLogger(__name__)


@dataclass
class ModelInfo:
    id: str
    capabilities: list[str]
    priority: int = 1
    backend_model_id: str = ""  # actual model ID sent to backend (if different from id)


@dataclass
class RateLimitConfig:
    rpm: int = 0
    rph: int = 0
    scope: str = "global"


@dataclass
class BackendInfo:
    id: str
    type: str  # "ollama", "openai-compat", "anthropic"
    url: str
    models: list[ModelInfo]
    access: str = "all"  # "all" or "owner"
    rate_limit: RateLimitConfig | None = None
    # runtime state
    healthy: bool = False
    last_check: float = 0
    latency_ms: float = 0


@dataclass
class RateLimitState:
    minute_timestamps: list[float] = field(default_factory=list)
    hour_timestamps: list[float] = field(default_factory=list)


class Registry:
    def __init__(self):
        self.backends: dict[str, BackendInfo] = {}
        self._health_task: asyncio.Task | None = None
        self._rate_limits: dict[str, RateLimitState] = {}

    async def load_backends(self, config_path: str):
        path = Path(config_path)
        if not path.exists():
            logger.warning("Backends config not found: %s", config_path)
            return
        with open(path) as f:
            data = json.load(f)
        for entry in data:
            models = [
                ModelInfo(
                    id=m["id"],
                    capabilities=m.get("capabilities", ["chat"]),
                    priority=m.get("priority", 1),
                    backend_model_id=m.get("backend_model_id", ""),
                )
                for m in entry.get("models", [])
            ]
            rl_data = entry.get("rate_limit")
            rate_limit = (
                RateLimitConfig(
                    rpm=rl_data.get("rpm", 0),
                    rph=rl_data.get("rph", 0),
                    scope=rl_data.get("scope", "global"),
                )
                if rl_data
                else None
            )
            backend = BackendInfo(
                id=entry["id"],
                type=entry["type"],
                url=entry["url"].rstrip("/"),
                models=models,
                access=entry.get("access", "all"),
                rate_limit=rate_limit,
            )
            self.backends[backend.id] = backend
            if rate_limit:
                self._rate_limits[backend.id] = RateLimitState()
        logger.info("Loaded %d backends", len(self.backends))

    def start_health_loop(self, interval: float = 30.0):
        self._health_task = asyncio.create_task(self._health_loop(interval))

    def stop_health_loop(self):
        if self._health_task:
            self._health_task.cancel()

    async def _health_loop(self, interval: float):
        while True:
            await self._check_all_backends()
            await asyncio.sleep(interval)

    async def _check_all_backends(self):
        async with httpx.AsyncClient(timeout=5.0) as client:
            tasks = [
                self._check_backend(client, backend)
                for backend in self.backends.values()
            ]
            await asyncio.gather(*tasks, return_exceptions=True)

    async def _check_backend(self, client: httpx.AsyncClient, backend: BackendInfo):
        try:
            start = time.monotonic()
            if backend.type == "ollama":
                resp = await client.get(f"{backend.url}/api/tags")
            elif backend.type in ("openai-compat", "anthropic"):
                resp = await client.get(f"{backend.url}/v1/models")
            else:
                resp = await client.get(f"{backend.url}/health")
            elapsed = (time.monotonic() - start) * 1000
            backend.healthy = resp.status_code < 500
            backend.latency_ms = round(elapsed, 1)
            backend.last_check = time.time()
        except Exception:
            backend.healthy = False
            backend.latency_ms = 0
            backend.last_check = time.time()
            logger.debug("Health check failed for %s", backend.id)

    def resolve_model(self, model_id: str, role: str) -> tuple[BackendInfo, ModelInfo] | None:
"""Find the best backend for a given model ID. Returns (backend, model) or None."""
candidates: list[tuple[BackendInfo, ModelInfo, int]] = []
for backend in self.backends.values():
if not backend.healthy:
continue
if backend.access == "owner" and role != "owner":
continue
for model in backend.models:
if model.id == model_id:
candidates.append((backend, model, model.priority))
if not candidates:
return None
candidates.sort(key=lambda x: x[2])
return candidates[0][0], candidates[0][1]
def list_models(self, role: str) -> list[dict]:
"""List all available models for a given role."""
result = []
for backend in self.backends.values():
if not backend.healthy:
continue
if backend.access == "owner" and role != "owner":
continue
for model in backend.models:
result.append({
"id": model.id,
"object": "model",
"owned_by": backend.id,
"capabilities": model.capabilities,
"backend_id": backend.id,
"backend_status": "healthy" if backend.healthy else "down",
})
return result
def check_rate_limit(self, backend_id: str) -> bool:
"""Check if a request to this backend is within rate limits. Returns True if allowed."""
backend = self.backends.get(backend_id)
if not backend or not backend.rate_limit:
return True
state = self._rate_limits.get(backend_id)
if not state:
return True
now = time.time()
rl = backend.rate_limit
# Clean old timestamps
if rl.rpm > 0:
state.minute_timestamps = [t for t in state.minute_timestamps if now - t < 60]
if len(state.minute_timestamps) >= rl.rpm:
return False
if rl.rph > 0:
state.hour_timestamps = [t for t in state.hour_timestamps if now - t < 3600]
if len(state.hour_timestamps) >= rl.rph:
return False
return True
def record_request(self, backend_id: str):
"""Record a request timestamp for rate limiting (only for configured windows)."""
backend = self.backends.get(backend_id)
state = self._rate_limits.get(backend_id)
if not backend or not backend.rate_limit or not state:
return
now = time.time()
rl = backend.rate_limit
# Only append to windows that are actually enforced; check_rate_limit
# prunes only lists with a positive limit, so unconditional appends
# would grow the other list without bound.
if rl.rpm > 0:
state.minute_timestamps.append(now)
if rl.rph > 0:
state.hour_timestamps.append(now)
def get_health_summary(self) -> list[dict]:
return [
{
"id": b.id,
"type": b.type,
"status": "healthy" if b.healthy else "down",
"models": [m.id for m in b.models],
"latency_ms": b.latency_ms,
"last_check": b.last_check,
}
for b in self.backends.values()
]
registry = Registry()

24
hub-web/.gitignore vendored Normal file

@@ -0,0 +1,24 @@
# Logs
logs
*.log
npm-debug.log*
yarn-debug.log*
yarn-error.log*
pnpm-debug.log*
lerna-debug.log*
node_modules
dist
dist-ssr
*.local
# Editor directories and files
.vscode/*
!.vscode/extensions.json
.idea
.DS_Store
*.suo
*.ntvs*
*.njsproj
*.sln
*.sw?

12
hub-web/Dockerfile Normal file

@@ -0,0 +1,12 @@
FROM node:20-alpine AS build
WORKDIR /app
COPY package*.json ./
RUN npm ci
COPY . .
RUN npm run build
FROM nginx:alpine
COPY --from=build /app/dist /usr/share/nginx/html
COPY nginx.conf /etc/nginx/conf.d/default.conf
EXPOSE 80
CMD ["nginx", "-g", "daemon off;"]

73
hub-web/README.md Normal file

@@ -0,0 +1,73 @@
# React + TypeScript + Vite
This template provides a minimal setup to get React working in Vite with HMR and some ESLint rules.
Currently, two official plugins are available:
- [@vitejs/plugin-react](https://github.com/vitejs/vite-plugin-react/blob/main/packages/plugin-react) uses [Oxc](https://oxc.rs)
- [@vitejs/plugin-react-swc](https://github.com/vitejs/vite-plugin-react/blob/main/packages/plugin-react-swc) uses [SWC](https://swc.rs/)
## React Compiler
The React Compiler is not enabled on this template because of its impact on dev & build performance. To add it, see [this documentation](https://react.dev/learn/react-compiler/installation).
## Expanding the ESLint configuration
If you are developing a production application, we recommend updating the configuration to enable type-aware lint rules:
```js
export default defineConfig([
globalIgnores(['dist']),
{
files: ['**/*.{ts,tsx}'],
extends: [
// Other configs...
// Remove tseslint.configs.recommended and replace with this
tseslint.configs.recommendedTypeChecked,
// Alternatively, use this for stricter rules
tseslint.configs.strictTypeChecked,
// Optionally, add this for stylistic rules
tseslint.configs.stylisticTypeChecked,
// Other configs...
],
languageOptions: {
parserOptions: {
project: ['./tsconfig.node.json', './tsconfig.app.json'],
tsconfigRootDir: import.meta.dirname,
},
// other options...
},
},
])
```
You can also install [eslint-plugin-react-x](https://github.com/Rel1cx/eslint-react/tree/main/packages/plugins/eslint-plugin-react-x) and [eslint-plugin-react-dom](https://github.com/Rel1cx/eslint-react/tree/main/packages/plugins/eslint-plugin-react-dom) for React-specific lint rules:
```js
// eslint.config.js
import reactX from 'eslint-plugin-react-x'
import reactDom from 'eslint-plugin-react-dom'
export default defineConfig([
globalIgnores(['dist']),
{
files: ['**/*.{ts,tsx}'],
extends: [
// Other configs...
// Enable lint rules for React
reactX.configs['recommended-typescript'],
// Enable lint rules for React DOM
reactDom.configs.recommended,
],
languageOptions: {
parserOptions: {
project: ['./tsconfig.node.json', './tsconfig.app.json'],
tsconfigRootDir: import.meta.dirname,
},
// other options...
},
},
])
```

23
hub-web/eslint.config.js Normal file

@@ -0,0 +1,23 @@
import js from '@eslint/js'
import globals from 'globals'
import reactHooks from 'eslint-plugin-react-hooks'
import reactRefresh from 'eslint-plugin-react-refresh'
import tseslint from 'typescript-eslint'
import { defineConfig, globalIgnores } from 'eslint/config'
export default defineConfig([
globalIgnores(['dist']),
{
files: ['**/*.{ts,tsx}'],
extends: [
js.configs.recommended,
tseslint.configs.recommended,
reactHooks.configs.flat.recommended,
reactRefresh.configs.vite,
],
languageOptions: {
ecmaVersion: 2020,
globals: globals.browser,
},
},
])

13
hub-web/index.html Normal file

@@ -0,0 +1,13 @@
<!doctype html>
<html lang="en">
<head>
<meta charset="UTF-8" />
<link rel="icon" type="image/svg+xml" href="/favicon.svg" />
<meta name="viewport" content="width=device-width, initial-scale=1.0" />
<title>hub-web</title>
</head>
<body>
<div id="root"></div>
<script type="module" src="/src/main.tsx"></script>
</body>
</html>

9
hub-web/nginx.conf Normal file

@@ -0,0 +1,9 @@
server {
listen 80;
root /usr/share/nginx/html;
index index.html;
location / {
try_files $uri $uri/ /index.html;
}
}

4531
hub-web/package-lock.json generated Normal file

File diff suppressed because it is too large

34
hub-web/package.json Normal file

@@ -0,0 +1,34 @@
{
"name": "hub-web",
"private": true,
"version": "0.0.0",
"type": "module",
"scripts": {
"dev": "vite",
"build": "tsc -b && vite build",
"lint": "eslint .",
"preview": "vite preview"
},
"dependencies": {
"react": "^19.2.4",
"react-dom": "^19.2.4",
"react-markdown": "^10.1.0",
"react-router-dom": "^7.13.2"
},
"devDependencies": {
"@eslint/js": "^9.39.4",
"@tailwindcss/vite": "^4.2.2",
"@types/node": "^24.12.0",
"@types/react": "^19.2.14",
"@types/react-dom": "^19.2.3",
"@vitejs/plugin-react": "^6.0.1",
"eslint": "^9.39.4",
"eslint-plugin-react-hooks": "^7.0.1",
"eslint-plugin-react-refresh": "^0.5.2",
"globals": "^17.4.0",
"tailwindcss": "^4.2.2",
"typescript": "~5.9.3",
"typescript-eslint": "^8.57.0",
"vite": "^8.0.1"
}
}

File diff suppressed because one or more lines are too long


24
hub-web/public/icons.svg Normal file

@@ -0,0 +1,24 @@
<svg xmlns="http://www.w3.org/2000/svg">
<symbol id="bluesky-icon" viewBox="0 0 16 17">
<g clip-path="url(#bluesky-clip)"><path fill="#08060d" d="M7.75 7.735c-.693-1.348-2.58-3.86-4.334-5.097-1.68-1.187-2.32-.981-2.74-.79C.188 2.065.1 2.812.1 3.251s.241 3.602.398 4.13c.52 1.744 2.367 2.333 4.07 2.145-2.495.37-4.71 1.278-1.805 4.512 3.196 3.309 4.38-.71 4.987-2.746.608 2.036 1.307 5.91 4.93 2.746 2.72-2.746.747-4.143-1.747-4.512 1.702.189 3.55-.4 4.07-2.145.156-.528.397-3.691.397-4.13s-.088-1.186-.575-1.406c-.42-.19-1.06-.395-2.741.79-1.755 1.24-3.64 3.752-4.334 5.099"/></g>
<defs><clipPath id="bluesky-clip"><path fill="#fff" d="M.1.85h15.3v15.3H.1z"/></clipPath></defs>
</symbol>
<symbol id="discord-icon" viewBox="0 0 20 19">
<path fill="#08060d" d="M16.224 3.768a14.5 14.5 0 0 0-3.67-1.153c-.158.286-.343.67-.47.976a13.5 13.5 0 0 0-4.067 0c-.128-.306-.317-.69-.476-.976A14.4 14.4 0 0 0 3.868 3.77C1.546 7.28.916 10.703 1.231 14.077a14.7 14.7 0 0 0 4.5 2.306q.545-.748.965-1.587a9.5 9.5 0 0 1-1.518-.74q.191-.14.372-.293c2.927 1.369 6.107 1.369 8.999 0q.183.152.372.294-.723.437-1.52.74.418.838.963 1.588a14.6 14.6 0 0 0 4.504-2.308c.37-3.911-.63-7.302-2.644-10.309m-9.13 8.234c-.878 0-1.599-.82-1.599-1.82 0-.998.705-1.82 1.6-1.82.894 0 1.614.82 1.599 1.82.001 1-.705 1.82-1.6 1.82m5.91 0c-.878 0-1.599-.82-1.599-1.82 0-.998.705-1.82 1.6-1.82.893 0 1.614.82 1.599 1.82 0 1-.706 1.82-1.6 1.82"/>
</symbol>
<symbol id="documentation-icon" viewBox="0 0 21 20">
<path fill="none" stroke="#aa3bff" stroke-linecap="round" stroke-linejoin="round" stroke-width="1.35" d="m15.5 13.333 1.533 1.322c.645.555.967.833.967 1.178s-.322.623-.967 1.179L15.5 18.333m-3.333-5-1.534 1.322c-.644.555-.966.833-.966 1.178s.322.623.966 1.179l1.534 1.321"/>
<path fill="none" stroke="#aa3bff" stroke-linecap="round" stroke-linejoin="round" stroke-width="1.35" d="M17.167 10.836v-4.32c0-1.41 0-2.117-.224-2.68-.359-.906-1.118-1.621-2.08-1.96-.599-.21-1.349-.21-2.848-.21-2.623 0-3.935 0-4.983.369-1.684.591-3.013 1.842-3.641 3.428C3 6.449 3 7.684 3 10.154v2.122c0 2.558 0 3.838.706 4.726q.306.383.713.671c.76.536 1.79.64 3.581.66"/>
<path fill="none" stroke="#aa3bff" stroke-linecap="round" stroke-linejoin="round" stroke-width="1.35" d="M3 10a2.78 2.78 0 0 1 2.778-2.778c.555 0 1.209.097 1.748-.047.48-.129.854-.503.982-.982.145-.54.048-1.194.048-1.749a2.78 2.78 0 0 1 2.777-2.777"/>
</symbol>
<symbol id="github-icon" viewBox="0 0 19 19">
<path fill="#08060d" fill-rule="evenodd" d="M9.356 1.85C5.05 1.85 1.57 5.356 1.57 9.694a7.84 7.84 0 0 0 5.324 7.44c.387.079.528-.168.528-.376 0-.182-.013-.805-.013-1.454-2.165.467-2.616-.935-2.616-.935-.349-.91-.864-1.143-.864-1.143-.71-.48.051-.48.051-.48.787.051 1.2.805 1.2.805.695 1.194 1.817.857 2.268.649.064-.507.27-.857.49-1.052-1.728-.182-3.545-.857-3.545-3.87 0-.857.31-1.558.8-2.104-.078-.195-.349-1 .077-2.078 0 0 .657-.208 2.14.805a7.5 7.5 0 0 1 1.946-.26c.657 0 1.328.092 1.946.26 1.483-1.013 2.14-.805 2.14-.805.426 1.078.155 1.883.078 2.078.502.546.799 1.247.799 2.104 0 3.013-1.818 3.675-3.558 3.87.284.247.528.714.528 1.454 0 1.052-.012 1.896-.012 2.156 0 .208.142.455.528.377a7.84 7.84 0 0 0 5.324-7.441c.013-4.338-3.48-7.844-7.773-7.844" clip-rule="evenodd"/>
</symbol>
<symbol id="social-icon" viewBox="0 0 20 20">
<path fill="none" stroke="#aa3bff" stroke-linecap="round" stroke-linejoin="round" stroke-width="1.35" d="M12.5 6.667a4.167 4.167 0 1 0-8.334 0 4.167 4.167 0 0 0 8.334 0"/>
<path fill="none" stroke="#aa3bff" stroke-linecap="round" stroke-linejoin="round" stroke-width="1.35" d="M2.5 16.667a5.833 5.833 0 0 1 8.75-5.053m3.837.474.513 1.035c.07.144.257.282.414.309l.93.155c.596.1.736.536.307.965l-.723.73a.64.64 0 0 0-.152.531l.207.903c.164.715-.213.991-.84.618l-.872-.52a.63.63 0 0 0-.577 0l-.872.52c-.624.373-1.003.094-.84-.618l.207-.903a.64.64 0 0 0-.152-.532l-.723-.729c-.426-.43-.289-.864.306-.964l.93-.156a.64.64 0 0 0 .412-.31l.513-1.034c.28-.562.735-.562 1.012 0"/>
</symbol>
<symbol id="x-icon" viewBox="0 0 19 19">
<path fill="#08060d" fill-rule="evenodd" d="M1.893 1.98c.052.072 1.245 1.769 2.653 3.77l2.892 4.114c.183.261.333.48.333.486s-.068.089-.152.183l-.522.593-.765.867-3.597 4.087c-.375.426-.734.834-.798.905a1 1 0 0 0-.118.148c0 .01.236.017.664.017h.663l.729-.83c.4-.457.796-.906.879-.999a692 692 0 0 0 1.794-2.038c.034-.037.301-.34.594-.675l.551-.624.345-.392a7 7 0 0 1 .34-.374c.006 0 .93 1.306 2.052 2.903l2.084 2.965.045.063h2.275c1.87 0 2.273-.003 2.266-.021-.008-.02-1.098-1.572-3.894-5.547-2.013-2.862-2.28-3.246-2.273-3.266.008-.019.282-.332 2.085-2.38l2-2.274 1.567-1.782c.022-.028-.016-.03-.65-.03h-.674l-.3.342a871 871 0 0 1-1.782 2.025c-.067.075-.405.458-.75.852a100 100 0 0 1-.803.91c-.148.172-.299.344-.99 1.127-.304.343-.32.358-.345.327-.015-.019-.904-1.282-1.976-2.808L6.365 1.85H1.8zm1.782.91 8.078 11.294c.772 1.08 1.413 1.973 1.425 1.984.016.017.241.02 1.05.017l1.03-.004-2.694-3.766L7.796 5.75 5.722 2.852l-1.039-.004-1.039-.004z" clip-rule="evenodd"/>
</symbol>
</svg>


61
hub-web/src/App.tsx Normal file

@@ -0,0 +1,61 @@
import { useState, useEffect } from 'react';
import { BrowserRouter, Routes, Route, Navigate, NavLink } from 'react-router-dom';
import { AuthCtx } from './lib/auth';
import { getMe, logout } from './lib/api';
import Login from './pages/Login';
import Dashboard from './pages/Dashboard';
import Chat from './pages/Chat';
export default function App() {
const [role, setRole] = useState<string | null>(null);
const [loading, setLoading] = useState(true);
useEffect(() => {
getMe().then(me => {
setRole(me?.role ?? null);
setLoading(false);
});
}, []);
if (loading) {
return <div className="flex items-center justify-center h-screen text-[hsl(var(--muted-foreground))]">Loading...</div>;
}
if (!role) {
return (
<AuthCtx.Provider value={{ role, setRole }}>
<Login />
</AuthCtx.Provider>
);
}
return (
<AuthCtx.Provider value={{ role, setRole }}>
<BrowserRouter>
<div className="min-h-screen flex flex-col dark">
<nav className="border-b border-[hsl(var(--border))] px-6 py-3 flex items-center gap-6">
<span className="font-semibold text-lg">AI Gateway</span>
<NavLink to="/" className={({ isActive }) => isActive ? 'text-[hsl(var(--foreground))]' : 'text-[hsl(var(--muted-foreground))] hover:text-[hsl(var(--foreground))]'}>Dashboard</NavLink>
<NavLink to="/chat" className={({ isActive }) => isActive ? 'text-[hsl(var(--foreground))]' : 'text-[hsl(var(--muted-foreground))] hover:text-[hsl(var(--foreground))]'}>Chat</NavLink>
<div className="ml-auto flex items-center gap-3">
<span className="text-sm text-[hsl(var(--muted-foreground))]">{role}</span>
<button
onClick={async () => { await logout(); setRole(null); }}
className="text-sm text-[hsl(var(--muted-foreground))] hover:text-[hsl(var(--foreground))]"
>
Logout
</button>
</div>
</nav>
<main className="flex-1">
<Routes>
<Route path="/" element={<Dashboard />} />
<Route path="/chat" element={<Chat />} />
<Route path="*" element={<Navigate to="/" />} />
</Routes>
</main>
</div>
</BrowserRouter>
</AuthCtx.Provider>
);
}

41
hub-web/src/index.css Normal file

@@ -0,0 +1,41 @@
@import "tailwindcss";
:root {
--background: 0 0% 100%;
--foreground: 240 10% 3.9%;
--card: 0 0% 100%;
--card-foreground: 240 10% 3.9%;
--muted: 240 4.8% 95.9%;
--muted-foreground: 240 3.8% 46.1%;
--border: 240 5.9% 90%;
--primary: 240 5.9% 10%;
--primary-foreground: 0 0% 98%;
--destructive: 0 84.2% 60.2%;
--ring: 240 5.9% 10%;
--radius: 0.5rem;
}
.dark {
--background: 240 10% 3.9%;
--foreground: 0 0% 98%;
--card: 240 10% 3.9%;
--card-foreground: 0 0% 98%;
--muted: 240 3.7% 15.9%;
--muted-foreground: 240 5% 64.9%;
--border: 240 3.7% 15.9%;
--primary: 0 0% 98%;
--primary-foreground: 240 5.9% 10%;
--destructive: 0 62.8% 30.6%;
--ring: 240 4.9% 83.9%;
}
* {
border-color: hsl(var(--border));
}
body {
margin: 0;
background-color: hsl(var(--background));
color: hsl(var(--foreground));
font-family: system-ui, -apple-system, sans-serif;
}

127
hub-web/src/lib/api.ts Normal file

@@ -0,0 +1,127 @@
const BASE = '';
// Store token in memory for Bearer auth (more reliable than cookies through proxies)
let _token: string | null = null;
export function setToken(token: string | null) {
_token = token;
}
function authHeaders(): Record<string, string> {
const h: Record<string, string> = { 'Content-Type': 'application/json' };
if (_token) h['Authorization'] = `Bearer ${_token}`;
return h;
}
export async function login(password: string): Promise<{ role: string; token: string }> {
const res = await fetch(`${BASE}/auth/login`, {
method: 'POST',
headers: { 'Content-Type': 'application/json' },
body: JSON.stringify({ password }),
});
if (!res.ok) {
const err = await res.json().catch(() => null);
throw new Error(err?.error?.message || 'Login failed');
}
const data = await res.json();
_token = data.token;
return data;
}
export async function getMe(): Promise<{ role: string } | null> {
if (!_token) return null;
const res = await fetch(`${BASE}/auth/me`, { headers: authHeaders() });
if (!res.ok) { _token = null; return null; }
return res.json();
}
export async function logout(): Promise<void> {
await fetch(`${BASE}/auth/logout`, { method: 'POST', headers: authHeaders() });
_token = null;
}
export interface Model {
id: string;
owned_by: string;
capabilities: string[];
backend_id: string;
backend_status: string;
}
export async function getModels(): Promise<Model[]> {
const res = await fetch(`${BASE}/v1/models`, { headers: authHeaders() });
if (!res.ok) return [];
const data = await res.json();
return data.data || [];
}
export interface BackendHealth {
id: string;
type: string;
status: string;
models: string[];
latency_ms: number;
}
export interface GpuInfo {
utilization: number;
temperature: number;
vram_used: number;
vram_total: number;
power_draw: number;
name: string;
}
export async function getHealth(): Promise<{ backends: BackendHealth[]; gpu: GpuInfo | null }> {
const res = await fetch(`${BASE}/health`);
if (!res.ok) return { backends: [], gpu: null };
return res.json();
}
export interface ChatMessage {
role: 'user' | 'assistant' | 'system';
content: string;
}
export async function* streamChat(
model: string,
messages: ChatMessage[],
): AsyncGenerator<string, void> {
const res = await fetch(`${BASE}/v1/chat/completions`, {
method: 'POST',
headers: authHeaders(),
body: JSON.stringify({ model, messages, stream: true }),
});
if (!res.ok) {
const err = await res.json().catch(() => null);
throw new Error(err?.error?.message || `Chat failed: ${res.status}`);
}
const reader = res.body?.getReader();
if (!reader) throw new Error('No response body');
const decoder = new TextDecoder();
let buffer = '';
while (true) {
const { done, value } = await reader.read();
if (done) break;
buffer += decoder.decode(value, { stream: true });
const lines = buffer.split('\n');
buffer = lines.pop() || '';
for (const line of lines) {
if (!line.startsWith('data: ')) continue;
const data = line.slice(6).trim();
if (data === '[DONE]') return;
try {
const parsed = JSON.parse(data);
const content = parsed.choices?.[0]?.delta?.content;
if (content) yield content;
} catch {
// skip malformed chunks
}
}
}
}

12
hub-web/src/lib/auth.ts Normal file

@@ -0,0 +1,12 @@
import { createContext, useContext } from 'react';
export interface AuthContext {
role: string | null;
setRole: (role: string | null) => void;
}
export const AuthCtx = createContext<AuthContext>({ role: null, setRole: () => {} });
export function useAuth() {
return useContext(AuthCtx);
}

10
hub-web/src/main.tsx Normal file

@@ -0,0 +1,10 @@
import { StrictMode } from 'react'
import { createRoot } from 'react-dom/client'
import './index.css'
import App from './App.tsx'
createRoot(document.getElementById('root')!).render(
<StrictMode>
<App />
</StrictMode>,
)

130
hub-web/src/pages/Chat.tsx Normal file

@@ -0,0 +1,130 @@
import { useState, useEffect, useRef } from 'react';
import ReactMarkdown from 'react-markdown';
import type { Model, ChatMessage } from '../lib/api';
import { getModels, streamChat } from '../lib/api';
export default function Chat() {
const [models, setModels] = useState<Model[]>([]);
const [selectedModel, setSelectedModel] = useState('');
const [messages, setMessages] = useState<ChatMessage[]>([]);
const [input, setInput] = useState('');
const [streaming, setStreaming] = useState(false);
const bottomRef = useRef<HTMLDivElement>(null);
useEffect(() => {
getModels().then(mdls => {
const chatModels = mdls.filter(m => m.capabilities.includes('chat'));
setModels(chatModels);
if (chatModels.length > 0 && !selectedModel) {
setSelectedModel(chatModels[0].id);
}
});
}, []);
useEffect(() => {
bottomRef.current?.scrollIntoView({ behavior: 'smooth' });
}, [messages]);
const handleSend = async () => {
if (!input.trim() || !selectedModel || streaming) return;
const userMsg: ChatMessage = { role: 'user', content: input.trim() };
const newMessages = [...messages, userMsg];
setMessages(newMessages);
setInput('');
setStreaming(true);
const assistantMsg: ChatMessage = { role: 'assistant', content: '' };
setMessages([...newMessages, assistantMsg]);
try {
for await (const chunk of streamChat(selectedModel, newMessages)) {
assistantMsg.content += chunk;
setMessages(prev => [...prev.slice(0, -1), { ...assistantMsg }]);
}
} catch (err) {
assistantMsg.content += `\n\n[Error: ${err instanceof Error ? err.message : 'Unknown error'}]`;
setMessages(prev => [...prev.slice(0, -1), { ...assistantMsg }]);
} finally {
setStreaming(false);
}
};
const handleKeyDown = (e: React.KeyboardEvent) => {
if (e.key === 'Enter' && !e.shiftKey) {
e.preventDefault();
handleSend();
}
};
return (
<div className="flex flex-col h-[calc(100vh-57px)]">
{/* Header */}
<div className="border-b border-[hsl(var(--border))] px-6 py-3 flex items-center gap-4">
<select
value={selectedModel}
onChange={e => setSelectedModel(e.target.value)}
className="px-3 py-1.5 rounded-md border border-[hsl(var(--border))] bg-[hsl(var(--background))] text-sm"
>
{models.map(m => (
<option key={m.id} value={m.id}>{m.id} ({m.owned_by})</option>
))}
</select>
<button
onClick={() => setMessages([])}
className="text-sm text-[hsl(var(--muted-foreground))] hover:text-[hsl(var(--foreground))]"
>
Clear
</button>
</div>
{/* Messages */}
<div className="flex-1 overflow-y-auto px-6 py-4 space-y-4">
{messages.length === 0 && (
<div className="flex items-center justify-center h-full text-[hsl(var(--muted-foreground))]">
Send a message to start
</div>
)}
{messages.map((msg, i) => (
<div key={i} className={`flex ${msg.role === 'user' ? 'justify-end' : 'justify-start'}`}>
<div className={`max-w-[80%] rounded-lg px-4 py-2 ${
msg.role === 'user'
? 'bg-[hsl(var(--primary))] text-[hsl(var(--primary-foreground))]'
: 'bg-[hsl(var(--muted))]'
}`}>
{msg.role === 'assistant' ? (
<div className="prose prose-sm prose-invert max-w-none">
<ReactMarkdown>{msg.content || '...'}</ReactMarkdown>
</div>
) : (
<p className="text-sm whitespace-pre-wrap">{msg.content}</p>
)}
</div>
</div>
))}
<div ref={bottomRef} />
</div>
{/* Input */}
<div className="border-t border-[hsl(var(--border))] px-6 py-4">
<div className="flex gap-3">
<textarea
value={input}
onChange={e => setInput(e.target.value)}
onKeyDown={handleKeyDown}
placeholder="Type a message... (Enter to send, Shift+Enter for newline)"
rows={1}
className="flex-1 px-3 py-2 rounded-md border border-[hsl(var(--border))] bg-[hsl(var(--background))] text-[hsl(var(--foreground))] resize-none focus:outline-none focus:ring-2 focus:ring-[hsl(var(--ring))]"
/>
<button
onClick={handleSend}
disabled={streaming || !input.trim()}
className="px-4 py-2 rounded-md bg-[hsl(var(--primary))] text-[hsl(var(--primary-foreground))] hover:opacity-90 disabled:opacity-50"
>
{streaming ? '...' : 'Send'}
</button>
</div>
</div>
</div>
);
}


@@ -0,0 +1,96 @@
import { useState, useEffect } from 'react';
import type { BackendHealth, GpuInfo, Model } from '../lib/api';
import { getHealth, getModels } from '../lib/api';
export default function Dashboard() {
const [backends, setBackends] = useState<BackendHealth[]>([]);
const [gpu, setGpu] = useState<GpuInfo | null>(null);
const [models, setModels] = useState<Model[]>([]);
const refresh = async () => {
const [health, mdls] = await Promise.all([getHealth(), getModels()]);
setBackends(health.backends);
setGpu(health.gpu);
setModels(mdls);
};
useEffect(() => {
refresh();
const id = setInterval(refresh, 15000);
return () => clearInterval(id);
}, []);
return (
<div className="p-6 space-y-6">
<div className="flex items-center justify-between">
<h2 className="text-xl font-semibold">Backends</h2>
<button onClick={refresh} className="text-sm text-[hsl(var(--muted-foreground))] hover:text-[hsl(var(--foreground))]">Refresh</button>
</div>
<div className="grid grid-cols-1 md:grid-cols-2 lg:grid-cols-3 gap-4">
{backends.map(b => (
<div key={b.id} className="rounded-lg border border-[hsl(var(--border))] p-4 space-y-2">
<div className="flex items-center justify-between">
<span className="font-medium">{b.id}</span>
<span className={`text-xs px-2 py-0.5 rounded-full ${b.status === 'healthy' ? 'bg-green-500/20 text-green-400' : 'bg-red-500/20 text-red-400'}`}>
{b.status}
</span>
</div>
<div className="text-sm text-[hsl(var(--muted-foreground))]">{b.type}</div>
<div className="text-sm">{b.models.join(', ')}</div>
{b.latency_ms > 0 && <div className="text-xs text-[hsl(var(--muted-foreground))]">{b.latency_ms}ms</div>}
</div>
))}
</div>
{gpu && (
<>
<h2 className="text-xl font-semibold">GPU</h2>
<div className="rounded-lg border border-[hsl(var(--border))] p-4 grid grid-cols-2 md:grid-cols-4 gap-4">
<Stat label="Utilization" value={`${gpu.utilization}%`} />
<Stat label="Temperature" value={`${gpu.temperature}°C`} />
<Stat label="VRAM" value={`${gpu.vram_used}/${gpu.vram_total} MB`} />
<Stat label="Power" value={`${gpu.power_draw}W`} />
</div>
</>
)}
<h2 className="text-xl font-semibold">Models</h2>
<div className="rounded-lg border border-[hsl(var(--border))] overflow-hidden">
<table className="w-full text-sm">
<thead className="bg-[hsl(var(--muted))]">
<tr>
<th className="text-left px-4 py-2">Model</th>
<th className="text-left px-4 py-2">Backend</th>
<th className="text-left px-4 py-2">Capabilities</th>
<th className="text-left px-4 py-2">Status</th>
</tr>
</thead>
<tbody>
{models.map(m => (
<tr key={`${m.backend_id}-${m.id}`} className="border-t border-[hsl(var(--border))]">
<td className="px-4 py-2 font-mono">{m.id}</td>
<td className="px-4 py-2">{m.owned_by}</td>
<td className="px-4 py-2">{m.capabilities.join(', ')}</td>
<td className="px-4 py-2">
<span className={`text-xs px-2 py-0.5 rounded-full ${m.backend_status === 'healthy' ? 'bg-green-500/20 text-green-400' : 'bg-red-500/20 text-red-400'}`}>
{m.backend_status}
</span>
</td>
</tr>
))}
</tbody>
</table>
</div>
</div>
);
}
function Stat({ label, value }: { label: string; value: string }) {
return (
<div>
<div className="text-xs text-[hsl(var(--muted-foreground))]">{label}</div>
<div className="text-lg font-semibold">{value}</div>
</div>
);
}


@@ -0,0 +1,48 @@
import { useState } from 'react';
import { login } from '../lib/api';
import { useAuth } from '../lib/auth';
export default function Login() {
const { setRole } = useAuth();
const [password, setPassword] = useState('');
const [error, setError] = useState('');
const [loading, setLoading] = useState(false);
const handleSubmit = async (e: React.FormEvent) => {
e.preventDefault();
setError('');
setLoading(true);
try {
const { role } = await login(password);
setRole(role);
} catch (err) {
setError(err instanceof Error ? err.message : 'Login failed');
} finally {
setLoading(false);
}
};
return (
<div className="dark flex items-center justify-center min-h-screen bg-[hsl(var(--background))]">
<form onSubmit={handleSubmit} className="w-80 space-y-4">
<h1 className="text-2xl font-semibold text-center text-[hsl(var(--foreground))]">AI Gateway</h1>
<input
type="password"
value={password}
onChange={e => setPassword(e.target.value)}
placeholder="Password"
autoFocus
className="w-full px-3 py-2 rounded-md border border-[hsl(var(--border))] bg-[hsl(var(--background))] text-[hsl(var(--foreground))] focus:outline-none focus:ring-2 focus:ring-[hsl(var(--ring))]"
/>
<button
type="submit"
disabled={loading}
className="w-full px-3 py-2 rounded-md bg-[hsl(var(--primary))] text-[hsl(var(--primary-foreground))] hover:opacity-90 disabled:opacity-50"
>
{loading ? 'Logging in...' : 'Login'}
</button>
{error && <p className="text-sm text-[hsl(var(--destructive))] text-center">{error}</p>}
</form>
</div>
);
}

32
hub-web/tsconfig.app.json Normal file

@@ -0,0 +1,32 @@
{
"compilerOptions": {
"tsBuildInfoFile": "./node_modules/.tmp/tsconfig.app.tsbuildinfo",
"target": "ES2023",
"useDefineForClassFields": true,
"lib": ["ES2023", "DOM", "DOM.Iterable"],
"module": "ESNext",
"types": ["vite/client"],
"skipLibCheck": true,
/* Bundler mode */
"moduleResolution": "bundler",
"allowImportingTsExtensions": true,
"verbatimModuleSyntax": true,
"moduleDetection": "force",
"noEmit": true,
"jsx": "react-jsx",
/* Linting */
"strict": true,
"noUnusedLocals": true,
"noUnusedParameters": true,
"erasableSyntaxOnly": true,
"noFallthroughCasesInSwitch": true,
"noUncheckedSideEffectImports": true,
"baseUrl": ".",
"paths": {
"@/*": ["./src/*"]
}
},
"include": ["src"]
}

7
hub-web/tsconfig.json Normal file

@@ -0,0 +1,7 @@
{
"files": [],
"references": [
{ "path": "./tsconfig.app.json" },
{ "path": "./tsconfig.node.json" }
]
}


@@ -0,0 +1,26 @@
{
"compilerOptions": {
"tsBuildInfoFile": "./node_modules/.tmp/tsconfig.node.tsbuildinfo",
"target": "ES2023",
"lib": ["ES2023"],
"module": "ESNext",
"types": ["node"],
"skipLibCheck": true,
/* Bundler mode */
"moduleResolution": "bundler",
"allowImportingTsExtensions": true,
"verbatimModuleSyntax": true,
"moduleDetection": "force",
"noEmit": true,
/* Linting */
"strict": true,
"noUnusedLocals": true,
"noUnusedParameters": true,
"erasableSyntaxOnly": true,
"noFallthroughCasesInSwitch": true,
"noUncheckedSideEffectImports": true
},
"include": ["vite.config.ts"]
}

21
hub-web/vite.config.ts Normal file

@@ -0,0 +1,21 @@
import { defineConfig } from 'vite'
import react from '@vitejs/plugin-react'
import tailwindcss from '@tailwindcss/vite'
import path from 'path'
export default defineConfig({
plugins: [react(), tailwindcss()],
resolve: {
alias: {
'@': path.resolve(__dirname, './src'),
},
},
server: {
proxy: {
'/v1': 'http://localhost:8000',
'/auth': 'http://localhost:8000',
'/health': 'http://localhost:8000',
'/gpu': 'http://localhost:8000',
},
},
})