feat: initial home-gateway setup - migrate everything from the Mac mini to the GPU server

With the OrbStack license expired, the Mac mini's Docker services were
consolidated onto the GPU server: nginx replaced with Caddy, automatic HTTPS
for 12 subdomains, and fail2ban wired to Caddy's JSON logs.

Key changes:
- home-caddy: Caddy reverse proxy (automatic HTTPS via Let's Encrypt)
- home-fail2ban: security monitoring driven by Caddy JSON logs
- home-ddns: Cloudflare DDNS (API key split out into .env)
- gpu-hub-api/web: AI backend router + web UI (moved over from gpu-services)
- AI runtime (Ollama) stays internal-network only; external traffic goes through the gpu-hub auth gateway

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Hyungi Ahn
2026-04-05 04:55:28 +00:00
commit 79c09cede4
52 changed files with 6847 additions and 0 deletions

17
.gitignore vendored Normal file

@@ -0,0 +1,17 @@
# Secrets
.env
ddns/.env
# Runtime data
caddy/logs/
fail2ban/data/
docker-compose.test.yml
# Node
hub-web/node_modules/
hub-web/dist/
# Python
hub-api/__pycache__/
hub-api/*.pyc
hub-api/.venv/

68
README.md Normal file

@@ -0,0 +1,68 @@
# home-gateway

Unified home-network gateway, running on the GPU server (192.168.1.186).

## Components

| Container | Role |
|----------|------|
| home-caddy | Caddy reverse proxy (80/443, automatic HTTPS) |
| home-fail2ban | Security monitoring based on Caddy JSON logs |
| home-ddns-vpn | Cloudflare DDNS (vpn.hyungi.net) |
| home-ddns-mail | Cloudflare DDNS (mail.hyungi.net) |
| gpu-hub-api | AI backend router (auth gateway) |
| gpu-hub-web | AI hub web UI |

## Routing targets

### GPU server (local)

- `komga.hyungi.net` → :25600
- `document.hyungi.net` → :8080 (Document Server's internal Caddy)
- `ai.hyungi.net` → gpu-hub-api (authenticated external AI access)

### NAS (192.168.1.227)

- `ds1525.hyungi.net` → :5000 (DSM)
- `webdav.hyungi.net` → :5006 (WebDAV)
- `git.hyungi.net` → :10300 (Gitea)
- `vault.hyungi.net` → :8443 (Vaultwarden)
- `link.hyungi.net` → :10002 (Synology Drive)
- `mailplus.hyungi.net` → :21680 (MailPlus)
- `contacts.hyungi.net` → :25555 (Contacts)
- `calendar.hyungi.net` → :20002 (Calendar)
- `note.hyungi.net` → :9350 (Note Station)

### Mac mini (192.168.1.122)

- `jellyfin.hyungi.net` → :8096

## AI access policy

- Ollama/AI runtimes: internal network only (127.0.0.1:11434)
- External AI access: only through the gpu-hub-api auth gateway
- `gpu.hyungi.net`: retired (internal network / Tailscale only)
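
A minimal client sketch of the external access flow above, assuming the `/auth/login` and `/v1/chat/completions` routes implemented in hub-api; the token and password values are placeholders:

```python
# Sketch: authenticate against the gateway, then call the chat endpoint.
# Only request construction is shown; the actual network calls are omitted
# since the gateway is not reachable from outside the home network.
import json
from urllib import request

BASE = "https://ai.hyungi.net"

def login_request(password: str) -> request.Request:
    # POST /auth/login with {"password": ...}; returns {"role": ..., "token": ...}
    return request.Request(
        f"{BASE}/auth/login",
        data=json.dumps({"password": password}).encode(),
        headers={"Content-Type": "application/json"},
    )

def chat_request(token: str, model: str, prompt: str) -> request.Request:
    # POST /v1/chat/completions with the Bearer token from /auth/login
    payload = {"model": model, "messages": [{"role": "user", "content": prompt}]}
    return request.Request(
        f"{BASE}/v1/chat/completions",
        data=json.dumps(payload).encode(),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )

# urllib.request.urlopen(...) on these Request objects performs the calls
req = chat_request("<jwt-from-login>", "qwen3.5:35b-a3b", "hello")
```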
## Directory layout

```
home-gateway/
├── docker-compose.yml
├── backends.json        # gpu-hub AI backend config
├── caddy/
│   ├── Caddyfile        # reverse proxy config (12 subdomains)
│   └── logs/            # Caddy JSON logs (consumed by fail2ban)
├── fail2ban/
│   ├── jail.local
│   └── data/filter.d/   # custom filters for Caddy
├── ddns/
│   └── .env             # Cloudflare API key
├── hub-api/             # GPU Hub FastAPI backend
└── hub-web/             # GPU Hub React frontend
```

## Related standalone services (separate compose files)

- `~/qdrant/` — Qdrant vector DB (127.0.0.1:6333)
- `~/ollama/` — Ollama GPU inference (127.0.0.1:11434)

## Migration history

- 2026-04-05: full migration from the Mac mini (OrbStack) to the GPU server
- nginx consolidated into Caddy
- Let's Encrypt certificates: manual management → Caddy automatic HTTPS
- Cloudflare DDNS API key moved into .env
- fail2ban: nginx filters → Caddy JSON filters

22
backends.json Normal file

@@ -0,0 +1,22 @@
[
  {
    "id": "ollama-gpu",
    "type": "ollama",
    "url": "http://host.docker.internal:11434",
    "models": [
      { "id": "bge-m3", "capabilities": ["embed"], "priority": 1 }
    ],
    "access": "all",
    "rate_limit": null
  },
  {
    "id": "mlx-mac",
    "type": "openai-compat",
    "url": "http://192.168.1.122:8800",
    "models": [
      { "id": "qwen3.5:35b-a3b", "backend_model_id": "mlx-community/Qwen3.5-35B-A3B-4bit", "capabilities": ["chat"], "priority": 1 }
    ],
    "access": "all",
    "rate_limit": null
  }
]

168
caddy/Caddyfile Normal file

@@ -0,0 +1,168 @@
{
    # Global options
    log default {
        output file /var/log/caddy/access.log {
            roll_size 100MiB
            roll_keep 5
        }
        format json
    }
    servers {
        trusted_proxies static 173.245.48.0/20 103.21.244.0/22 103.22.200.0/22 103.31.4.0/22 104.16.0.0/13 104.24.0.0/14 108.162.192.0/18 131.0.72.0/22 141.101.64.0/18 162.158.0.0/15 172.64.0.0/13 188.114.96.0/20 190.93.240.0/20 197.234.240.0/22 198.41.128.0/17 2400:cb00::/32 2606:4700::/32 2803:f800::/32 2405:b500::/32 2405:8100::/32 2a06:98c0::/29 2c0f:f248::/32
    }
}

# ============================================================
# GPU Hub — default route (direct IP access, no HTTPS)
# ============================================================
:80 {
    handle /v1/* {
        reverse_proxy gpu-hub-api:8000 {
            flush_interval -1
        }
    }
    handle /auth/* {
        reverse_proxy gpu-hub-api:8000
    }
    handle /health {
        reverse_proxy gpu-hub-api:8000
    }
    handle /health/* {
        reverse_proxy gpu-hub-api:8000
    }
    handle /gpu {
        reverse_proxy gpu-hub-api:8000
    }
    handle {
        reverse_proxy gpu-hub-web:80
    }
}

# ============================================================
# AI Gateway — authenticated external access
# ============================================================
ai.hyungi.net {
    reverse_proxy gpu-hub-api:8000 {
        flush_interval -1
    }
}

# ============================================================
# Jellyfin — Mac mini (192.168.1.122)
# ============================================================
jellyfin.hyungi.net {
    reverse_proxy 192.168.1.122:8096 {
        transport http {
            read_timeout 300s
            write_timeout 300s
        }
    }
}

# ============================================================
# Komga — GPU local
# ============================================================
komga.hyungi.net {
    reverse_proxy host.docker.internal:25600
}

# ============================================================
# Document Server — GPU local (via internal Caddy; switches to direct routing in Phase 6)
# ============================================================
document.hyungi.net {
    request_body {
        max_size 100MB
    }
    reverse_proxy host.docker.internal:8080
}

# ============================================================
# WebDAV — NAS (192.168.1.227)
# ============================================================
webdav.hyungi.net {
    request_body {
        max_size 2GB
    }
    reverse_proxy https://192.168.1.227:5006 {
        transport http {
            tls_insecure_skip_verify
            read_timeout 600s
            write_timeout 600s
        }
        header_up Host {host}
        header_up X-Real-IP {remote_host}
        header_up X-Forwarded-For {remote_host}
        header_up X-Forwarded-Proto {scheme}
    }
}

# ============================================================
# DSM — NAS
# ============================================================
ds1525.hyungi.net {
    request_body {
        max_size 0
    }
    reverse_proxy 192.168.1.227:5000
}

# ============================================================
# Gitea — NAS
# ============================================================
git.hyungi.net {
    request_body {
        max_size 512MB
    }
    reverse_proxy 192.168.1.227:10300
}

# ============================================================
# Vaultwarden — NAS (WebSocket)
# ============================================================
vault.hyungi.net {
    reverse_proxy 192.168.1.227:8443
}

# ============================================================
# Synology Drive — NAS (WebSocket, unlimited upload)
# ============================================================
link.hyungi.net {
    request_body {
        max_size 0
    }
    reverse_proxy 192.168.1.227:10002
}

# ============================================================
# MailPlus — NAS
# ============================================================
mailplus.hyungi.net {
    request_body {
        max_size 100MB
    }
    reverse_proxy 192.168.1.227:21680
}

# ============================================================
# Contacts — NAS
# ============================================================
contacts.hyungi.net {
    reverse_proxy 192.168.1.227:25555
}

# ============================================================
# Calendar — NAS
# ============================================================
calendar.hyungi.net {
    reverse_proxy 192.168.1.227:20002
}

# ============================================================
# Note Station — NAS (WebSocket, unlimited upload)
# ============================================================
note.hyungi.net {
    request_body {
        max_size 0
    }
    reverse_proxy 192.168.1.227:9350
}

105
docker-compose.yml Normal file

@@ -0,0 +1,105 @@
services:
  # ============================================================
  # Edge Layer — Reverse Proxy + Security + DDNS
  # ============================================================
  home-caddy:
    image: caddy:2-alpine
    container_name: home-caddy
    restart: unless-stopped
    ports:
      - "80:80"
      - "443:443"
      - "443:443/udp"
    volumes:
      - ./caddy/Caddyfile:/etc/caddy/Caddyfile:ro
      - ./caddy/logs:/var/log/caddy
      - caddy_data:/data
      - caddy_config:/config
    extra_hosts:
      - "host.docker.internal:host-gateway"
    depends_on:
      gpu-hub-api:
        condition: service_healthy
    networks:
      - gateway-net

  home-fail2ban:
    image: crazymax/fail2ban:latest
    container_name: home-fail2ban
    restart: unless-stopped
    network_mode: host
    cap_add:
      - NET_ADMIN
      - NET_RAW
    volumes:
      - ./fail2ban/data:/data
      - ./caddy/logs:/var/log/caddy:ro
      - ./fail2ban/jail.local:/etc/fail2ban/jail.local:ro
    environment:
      - TZ=Asia/Seoul
      - F2B_LOG_LEVEL=INFO

  home-ddns-vpn:
    image: oznu/cloudflare-ddns:latest
    container_name: home-ddns-vpn
    restart: unless-stopped
    env_file:
      - ./ddns/.env
    environment:
      - ZONE=hyungi.net
      - SUBDOMAIN=vpn
      - PROXIED=false

  home-ddns-mail:
    image: oznu/cloudflare-ddns:latest
    container_name: home-ddns-mail
    restart: unless-stopped
    env_file:
      - ./ddns/.env
    environment:
      - ZONE=hyungi.net
      - SUBDOMAIN=mail
      - PROXIED=false

  # ============================================================
  # GPU Hub — AI Backend Router + Web UI
  # ============================================================
  gpu-hub-api:
    build: ./hub-api
    container_name: gpu-hub-api
    restart: unless-stopped
    environment:
      - OWNER_PASSWORD=${OWNER_PASSWORD}
      - GUEST_PASSWORD=${GUEST_PASSWORD}
      - JWT_SECRET=${JWT_SECRET}
      - BACKENDS_CONFIG=/app/config/backends.json
      - CORS_ORIGINS=${CORS_ORIGINS:-http://localhost:5173}
      - DB_PATH=/app/data/gateway.db
    volumes:
      - hub_data:/app/data
      - ./backends.json:/app/config/backends.json:ro
    extra_hosts:
      - "host.docker.internal:host-gateway"
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:8000/health"]
      interval: 15s
      timeout: 5s
      retries: 3
    networks:
      - gateway-net

  gpu-hub-web:
    build: ./hub-web
    container_name: gpu-hub-web
    restart: unless-stopped
    networks:
      - gateway-net

volumes:
  caddy_data:
  caddy_config:
  hub_data:

networks:
  gateway-net:
    name: home-gateway-network

fail2ban/data/filter.d/caddy-auth.conf Normal file

@@ -0,0 +1,4 @@
[Definition]
failregex = ^.*"client_ip":"<HOST>".*"status":\s*401.*$
ignoreregex =
datepattern = "ts":{EPOCH}

fail2ban/data/filter.d/caddy-botsearch.conf Normal file

@@ -0,0 +1,4 @@
[Definition]
failregex = ^.*"client_ip":"<HOST>".*"status":\s*(403|404|444).*$
ignoreregex =
datepattern = "ts":{EPOCH}
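
A quick check that the failregex shape above matches Caddy's JSON access log format. fail2ban's `<HOST>` tag is replaced here with a plain named group, and the log line is illustrative, not captured from a real server:

```python
# Simulate the caddy-botsearch filter against a Caddy JSON log line.
import re

# <HOST> in the fail2ban filter becomes a capture group here
FAILREGEX_BOTSEARCH = r'^.*"client_ip":"(?P<host>[^"]+)".*"status":\s*(403|404|444).*$'

# Illustrative Caddy access-log entry: client_ip sits inside the "request"
# object and precedes the top-level "status" field, as the regex expects
sample = ('{"level":"info","ts":1743825000.123,'
          '"request":{"remote_ip":"203.0.113.7","client_ip":"203.0.113.7",'
          '"method":"GET","uri":"/wp-login.php"},"status":404}')

m = re.match(FAILREGEX_BOTSEARCH, sample)
```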

27
fail2ban/jail.local Normal file

@@ -0,0 +1,27 @@
[DEFAULT]
bantime = 3600
findtime = 600
maxretry = 5
backend = auto
enabled = false
[sshd]
enabled = false
# Block Caddy bots/scanners (repeated 404/403)
[caddy-botsearch]
enabled = true
port = 80,443
filter = caddy-botsearch
logpath = /var/log/caddy/access.log
maxretry = 2
bantime = 86400
# Block Caddy auth failures (repeated 401)
[caddy-auth]
enabled = true
port = 80,443
filter = caddy-auth
logpath = /var/log/caddy/access.log
maxretry = 3
bantime = 1800

16
hub-api/Dockerfile Normal file

@@ -0,0 +1,16 @@
FROM python:3.12-slim
RUN apt-get update && apt-get install -y --no-install-recommends curl && rm -rf /var/lib/apt/lists/*
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY . .
RUN mkdir -p /app/data
EXPOSE 8000
CMD ["uvicorn", "main:app", "--host", "0.0.0.0", "--port", "8000"]

21
hub-api/config.py Normal file

@@ -0,0 +1,21 @@
from pydantic_settings import BaseSettings


class Settings(BaseSettings):
    owner_password: str = "changeme"
    guest_password: str = "guest"
    jwt_secret: str = "dev-secret-change-in-production"
    jwt_algorithm: str = "HS256"
    jwt_expire_hours: int = 24
    backends_config: str = "/app/config/backends.json"
    cors_origins: str = "http://localhost:5173"
    nvidia_smi_path: str = "/usr/bin/nvidia-smi"
    db_path: str = "/app/data/gateway.db"

    model_config = {"env_file": ".env", "extra": "ignore"}


settings = Settings()

0
hub-api/db/__init__.py Normal file

50
hub-api/db/database.py Normal file

@@ -0,0 +1,50 @@
import aiosqlite

from config import settings

SCHEMA = """
CREATE TABLE IF NOT EXISTS chat_sessions (
    id TEXT PRIMARY KEY,
    title TEXT,
    model TEXT NOT NULL,
    role TEXT NOT NULL DEFAULT 'guest',
    created_at REAL NOT NULL
);
CREATE TABLE IF NOT EXISTS chat_messages (
    id TEXT PRIMARY KEY,
    session_id TEXT NOT NULL REFERENCES chat_sessions(id),
    role TEXT NOT NULL,
    content TEXT NOT NULL,
    created_at REAL NOT NULL
);
CREATE TABLE IF NOT EXISTS usage_logs (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    backend_id TEXT NOT NULL,
    model TEXT NOT NULL,
    prompt_tokens INTEGER DEFAULT 0,
    completion_tokens INTEGER DEFAULT 0,
    latency_ms REAL DEFAULT 0,
    user_role TEXT NOT NULL DEFAULT 'guest',
    created_at REAL NOT NULL
);
CREATE INDEX IF NOT EXISTS idx_messages_session ON chat_messages(session_id);
CREATE INDEX IF NOT EXISTS idx_usage_created ON usage_logs(created_at);
"""


async def init_db():
    """Initialize SQLite database with WAL mode and schema."""
    async with aiosqlite.connect(settings.db_path) as db:
        await db.execute("PRAGMA journal_mode=WAL")
        await db.executescript(SCHEMA)
        await db.commit()


async def get_db() -> aiosqlite.Connection:
    """Get a database connection."""
    db = await aiosqlite.connect(settings.db_path)
    await db.execute("PRAGMA journal_mode=WAL")
    return db
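
The schema can be smoke-tested synchronously with the stdlib `sqlite3` module (aiosqlite is a thin async wrapper around it), using an in-memory database instead of the container's /app/data/gateway.db:

```python
# Create the chat tables in-memory, insert a session + message, read it back.
import sqlite3
import time
import uuid

SCHEMA = """
CREATE TABLE IF NOT EXISTS chat_sessions (
    id TEXT PRIMARY KEY,
    title TEXT,
    model TEXT NOT NULL,
    role TEXT NOT NULL DEFAULT 'guest',
    created_at REAL NOT NULL
);
CREATE TABLE IF NOT EXISTS chat_messages (
    id TEXT PRIMARY KEY,
    session_id TEXT NOT NULL REFERENCES chat_sessions(id),
    role TEXT NOT NULL,
    content TEXT NOT NULL,
    created_at REAL NOT NULL
);
"""

db = sqlite3.connect(":memory:")
db.executescript(SCHEMA)
sid = str(uuid.uuid4())
db.execute(
    "INSERT INTO chat_sessions (id, title, model, role, created_at) VALUES (?, ?, ?, ?, ?)",
    (sid, "demo", "bge-m3", "guest", time.time()),
)
db.execute(
    "INSERT INTO chat_messages (id, session_id, role, content, created_at) VALUES (?, ?, ?, ?, ?)",
    (str(uuid.uuid4()), sid, "user", "hello", time.time()),
)
rows = db.execute(
    "SELECT content FROM chat_messages WHERE session_id = ?", (sid,)
).fetchall()
```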

2
hub-api/db/models.py Normal file

@@ -0,0 +1,2 @@
# DB model helpers — used in Phase 3 for logging
# Schema defined in database.py

46
hub-api/main.py Normal file

@@ -0,0 +1,46 @@
from contextlib import asynccontextmanager

from fastapi import FastAPI
from fastapi.middleware.cors import CORSMiddleware

from config import settings
from middleware.auth import AuthMiddleware
from routers import auth, chat, embeddings, gpu, health, models
from services.registry import registry


@asynccontextmanager
async def lifespan(app: FastAPI):
    await registry.load_backends(settings.backends_config)
    registry.start_health_loop()
    yield
    registry.stop_health_loop()


app = FastAPI(
    title="AI Gateway",
    version="0.1.0",
    lifespan=lifespan,
)
app.add_middleware(
    CORSMiddleware,
    allow_origins=settings.cors_origins.split(","),
    allow_credentials=True,
    allow_methods=["*"],
    allow_headers=["*"],
)
app.add_middleware(AuthMiddleware)

app.include_router(auth.router)
app.include_router(chat.router)
app.include_router(models.router)
app.include_router(embeddings.router)
app.include_router(health.router)
app.include_router(gpu.router)


@app.get("/")
async def root():
    return {"service": "AI Gateway", "version": "0.1.0"}

0
hub-api/middleware/__init__.py Normal file

hub-api/middleware/auth.py Normal file

@@ -0,0 +1,96 @@
from __future__ import annotations

import time

from jose import JWTError, jwt
from starlette.middleware.base import BaseHTTPMiddleware
from starlette.requests import Request

from config import settings

# Paths that don't require authentication
PUBLIC_PATHS = {"/", "/health", "/auth/login", "/docs", "/openapi.json"}
PUBLIC_PREFIXES = ("/health/",)


class AuthMiddleware(BaseHTTPMiddleware):
    async def dispatch(self, request: Request, call_next):
        path = request.url.path
        # Skip auth for public paths
        if path in PUBLIC_PATHS or any(path.startswith(p) for p in PUBLIC_PREFIXES):
            request.state.role = "anonymous"
            return await call_next(request)
        # Skip auth for OPTIONS (CORS preflight)
        if request.method == "OPTIONS":
            return await call_next(request)
        # Try Bearer token first, then cookie
        token = _extract_token(request)
        if not token:
            request.state.role = "anonymous"
            return await call_next(request)
        # Verify JWT
        payload = _verify_token(token)
        if payload:
            request.state.role = payload.get("role", "guest")
        else:
            request.state.role = "anonymous"
        return await call_next(request)


def create_token(role: str) -> str:
    payload = {
        "role": role,
        "exp": time.time() + settings.jwt_expire_hours * 3600,
        "iat": time.time(),
    }
    return jwt.encode(payload, settings.jwt_secret, algorithm=settings.jwt_algorithm)


def _extract_token(request: Request) -> str | None:
    # 1. Authorization: Bearer header
    auth_header = request.headers.get("authorization", "")
    if auth_header.startswith("Bearer "):
        return auth_header[7:]
    # 2. httpOnly cookie
    return request.cookies.get("token")


def _verify_token(token: str) -> dict | None:
    try:
        payload = jwt.decode(
            token, settings.jwt_secret, algorithms=[settings.jwt_algorithm]
        )
        if payload.get("exp", 0) < time.time():
            return None
        return payload
    except JWTError:
        return None


# Login rate limiting (IP-based)
_login_attempts: dict[str, list[float]] = {}
MAX_ATTEMPTS = 5
LOCKOUT_SECONDS = 60


def check_login_rate_limit(ip: str) -> bool:
    """Returns True if login is allowed for this IP."""
    now = time.time()
    attempts = _login_attempts.get(ip, [])
    # Clean old attempts
    attempts = [t for t in attempts if now - t < LOCKOUT_SECONDS]
    _login_attempts[ip] = attempts
    return len(attempts) < MAX_ATTEMPTS


def record_login_attempt(ip: str):
    now = time.time()
    if ip not in _login_attempts:
        _login_attempts[ip] = []
    _login_attempts[ip].append(now)
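
The sliding-window limiter above can be replayed standalone with the clock injected, so the 60-second window is exercised deterministically:

```python
# Deterministic replay of the login rate limiter: 5 attempts allowed per
# 60-second window, the 6th is blocked, and the window eventually resets.
MAX_ATTEMPTS = 5
LOCKOUT_SECONDS = 60
_attempts: dict[str, list[float]] = {}

def allowed(ip: str, now: float) -> bool:
    # Drop timestamps older than the window, then count what remains
    recent = [t for t in _attempts.get(ip, []) if now - t < LOCKOUT_SECONDS]
    _attempts[ip] = recent
    return len(recent) < MAX_ATTEMPTS

def record(ip: str, now: float) -> None:
    _attempts.setdefault(ip, []).append(now)

results = []
for i in range(6):
    results.append(allowed("198.51.100.9", now=float(i)))
    record("198.51.100.9", now=float(i))
# All six attempts fell inside one window, so the sixth is rejected;
# well past the window the IP is allowed again
later = allowed("198.51.100.9", now=100.0)
```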

hub-api/middleware/rate_limit.py Normal file

@@ -0,0 +1,18 @@
from fastapi import HTTPException

from services.registry import registry


def check_backend_rate_limit(backend_id: str):
    """Raise 429 if rate limit exceeded for this backend."""
    if not registry.check_rate_limit(backend_id):
        raise HTTPException(
            status_code=429,
            detail={
                "error": {
                    "message": f"Rate limit exceeded for backend '{backend_id}'",
                    "type": "rate_limit_error",
                    "code": "rate_limit_exceeded",
                }
            },
        )

7
hub-api/requirements.txt Normal file
View File

@@ -0,0 +1,7 @@
fastapi==0.115.0
uvicorn[standard]==0.30.0
httpx==0.27.0
pydantic-settings==2.5.0
python-jose[cryptography]==3.3.0
python-multipart==0.0.9
aiosqlite==0.20.0

0
hub-api/routers/__init__.py Normal file

79
hub-api/routers/auth.py Normal file

@@ -0,0 +1,79 @@
from fastapi import APIRouter, Request, Response
from fastapi.responses import JSONResponse
from pydantic import BaseModel

from config import settings
from middleware.auth import (
    check_login_rate_limit,
    create_token,
    record_login_attempt,
)

router = APIRouter(prefix="/auth", tags=["auth"])


class LoginRequest(BaseModel):
    password: str


class LoginResponse(BaseModel):
    role: str
    token: str


@router.post("/login")
async def login(body: LoginRequest, request: Request, response: Response):
    ip = request.client.host if request.client else "unknown"
    if not check_login_rate_limit(ip):
        return _error_response(429, "Too many login attempts. Try again in 1 minute.")
    record_login_attempt(ip)
    if body.password == settings.owner_password:
        role = "owner"
    elif body.password == settings.guest_password:
        role = "guest"
    else:
        return _error_response(401, "Invalid password")
    token = create_token(role)
    # Set httpOnly cookie for web UI
    response.set_cookie(
        key="token",
        value=token,
        httponly=True,
        samesite="lax",
        max_age=settings.jwt_expire_hours * 3600,
    )
    return LoginResponse(role=role, token=token)


@router.get("/me")
async def me(request: Request):
    role = getattr(request.state, "role", "anonymous")
    if role == "anonymous":
        return _error_response(401, "Not authenticated")
    return {"role": role}


@router.post("/logout")
async def logout(response: Response):
    response.delete_cookie("token")
    return {"ok": True}


def _error_response(status_code: int, message: str):
    return JSONResponse(
        status_code=status_code,
        content={
            "error": {
                "message": message,
                "type": "auth_error",
                "code": f"auth_{status_code}",
            }
        },
    )

112
hub-api/routers/chat.py Normal file

@@ -0,0 +1,112 @@
from typing import List, Optional

from fastapi import APIRouter, HTTPException, Request
from fastapi.responses import JSONResponse, StreamingResponse
from pydantic import BaseModel

from middleware.rate_limit import check_backend_rate_limit
from services import proxy_ollama, proxy_openai
from services.registry import registry

router = APIRouter(prefix="/v1", tags=["chat"])


class ChatMessage(BaseModel):
    role: str
    content: str


class ChatRequest(BaseModel):
    model: str
    messages: List[ChatMessage]
    stream: bool = False
    temperature: Optional[float] = None
    max_tokens: Optional[int] = None


@router.post("/chat/completions")
async def chat_completions(body: ChatRequest, request: Request):
    role = getattr(request.state, "role", "anonymous")
    if role == "anonymous":
        raise HTTPException(
            status_code=401,
            detail={"error": {"message": "Authentication required", "type": "auth_error", "code": "unauthorized"}},
        )
    # Resolve model to backend
    result = registry.resolve_model(body.model, role)
    if not result:
        raise HTTPException(
            status_code=404,
            detail={
                "error": {
                    "message": f"Model '{body.model}' not found or not available",
                    "type": "invalid_request_error",
                    "code": "model_not_found",
                }
            },
        )
    backend, model_info = result
    # Check rate limit
    check_backend_rate_limit(backend.id)
    # Record request for rate limiting
    registry.record_request(backend.id)
    messages = [{"role": m.role, "content": m.content} for m in body.messages]
    kwargs = {}
    if body.temperature is not None:
        kwargs["temperature"] = body.temperature
    # Use backend-specific model ID if configured, otherwise use the user-facing ID
    actual_model = model_info.backend_model_id or body.model
    # Route to appropriate proxy
    if backend.type == "ollama":
        if body.stream:
            return StreamingResponse(
                proxy_ollama.stream_chat(
                    backend.url, actual_model, messages, **kwargs
                ),
                media_type="text/event-stream",
                headers={
                    "Cache-Control": "no-cache",
                    "X-Accel-Buffering": "no",
                },
            )
        result = await proxy_ollama.complete_chat(
            backend.url, actual_model, messages, **kwargs
        )
        return JSONResponse(content=result)
    if backend.type == "openai-compat":
        if body.stream:
            return StreamingResponse(
                proxy_openai.stream_chat(
                    backend.url, actual_model, messages, **kwargs
                ),
                media_type="text/event-stream",
                headers={
                    "Cache-Control": "no-cache",
                    "X-Accel-Buffering": "no",
                },
            )
        result = await proxy_openai.complete_chat(
            backend.url, actual_model, messages, **kwargs
        )
        return JSONResponse(content=result)
    raise HTTPException(
        status_code=501,
        detail={
            "error": {
                "message": f"Backend type '{backend.type}' not yet implemented",
                "type": "api_error",
                "code": "not_implemented",
            }
        },
    )

hub-api/routers/embeddings.py Normal file

@@ -0,0 +1,67 @@
from typing import List, Union

from fastapi import APIRouter, HTTPException, Request
from pydantic import BaseModel

from services import proxy_ollama
from services.registry import registry

router = APIRouter(prefix="/v1", tags=["embeddings"])


class EmbeddingRequest(BaseModel):
    model: str
    input: Union[str, List[str]]


@router.post("/embeddings")
async def create_embedding(body: EmbeddingRequest, request: Request):
    role = getattr(request.state, "role", "anonymous")
    if role == "anonymous":
        raise HTTPException(
            status_code=401,
            detail={"error": {"message": "Authentication required", "type": "auth_error", "code": "unauthorized"}},
        )
    result = registry.resolve_model(body.model, role)
    if not result:
        raise HTTPException(
            status_code=404,
            detail={
                "error": {
                    "message": f"Model '{body.model}' not found or not available",
                    "type": "invalid_request_error",
                    "code": "model_not_found",
                }
            },
        )
    backend, model_info = result
    if "embed" not in model_info.capabilities:
        raise HTTPException(
            status_code=400,
            detail={
                "error": {
                    "message": f"Model '{body.model}' does not support embeddings",
                    "type": "invalid_request_error",
                    "code": "capability_mismatch",
                }
            },
        )
    if backend.type == "ollama":
        return await proxy_ollama.generate_embedding(
            backend.url, body.model, body.input
        )
    raise HTTPException(
        status_code=501,
        detail={
            "error": {
                "message": f"Embedding not supported for backend type '{backend.type}'",
                "type": "api_error",
                "code": "not_implemented",
            }
        },
    )

13
hub-api/routers/gpu.py Normal file

@@ -0,0 +1,13 @@
from fastapi import APIRouter

from services.gpu_monitor import get_gpu_info

router = APIRouter(tags=["gpu"])


@router.get("/gpu")
async def gpu_status():
    info = await get_gpu_info()
    if not info:
        return {"error": {"message": "GPU info unavailable", "type": "api_error", "code": "gpu_unavailable"}}
    return info

31
hub-api/routers/health.py Normal file

@@ -0,0 +1,31 @@
from fastapi import APIRouter

from services.gpu_monitor import get_gpu_info
from services.registry import registry

router = APIRouter(tags=["health"])


@router.get("/health")
async def health():
    gpu = await get_gpu_info()
    return {
        "status": "ok",
        "backends": registry.get_health_summary(),
        "gpu": gpu,
    }


@router.get("/health/{backend_id}")
async def backend_health(backend_id: str):
    backend = registry.backends.get(backend_id)
    if not backend:
        return {"error": {"message": f"Backend '{backend_id}' not found"}}
    return {
        "id": backend.id,
        "type": backend.type,
        "status": "healthy" if backend.healthy else "down",
        "models": [m.id for m in backend.models],
        "latency_ms": backend.latency_ms,
    }

12
hub-api/routers/models.py Normal file

@@ -0,0 +1,12 @@
from fastapi import APIRouter, Request

from services.registry import registry

router = APIRouter(prefix="/v1", tags=["models"])


@router.get("/models")
async def list_models(request: Request):
    role = getattr(request.state, "role", "anonymous")
    models = registry.list_models(role)
    return {"object": "list", "data": models}

0
hub-api/services/__init__.py Normal file

hub-api/services/gpu_monitor.py Normal file

@@ -0,0 +1,41 @@
from __future__ import annotations

import asyncio
import logging

from config import settings

logger = logging.getLogger(__name__)


async def get_gpu_info() -> dict | None:
    """Run nvidia-smi and parse GPU info."""
    try:
        proc = await asyncio.create_subprocess_exec(
            settings.nvidia_smi_path,
            "--query-gpu=utilization.gpu,temperature.gpu,memory.used,memory.total,power.draw,name",
            "--format=csv,noheader,nounits",
            stdout=asyncio.subprocess.PIPE,
            stderr=asyncio.subprocess.PIPE,
        )
        stdout, stderr = await asyncio.wait_for(proc.communicate(), timeout=5.0)
        if proc.returncode != 0:
            logger.debug("nvidia-smi failed: %s", stderr.decode())
            return None
        line = stdout.decode().strip().split("\n")[0]
        parts = [p.strip() for p in line.split(",")]
        if len(parts) < 6:
            return None
        return {
            "utilization": int(parts[0]),
            "temperature": int(parts[1]),
            "vram_used": int(parts[2]),
            "vram_total": int(parts[3]),
            "power_draw": float(parts[4]),
            "name": parts[5],
        }
    except FileNotFoundError:
        # nvidia-smi is missing at the configured path
        return None
    except asyncio.TimeoutError:
        proc.kill()  # don't leak a hung nvidia-smi process
        return None
    except ValueError:
        # e.g. "[N/A]" fields when the driver can't report a metric
        return None
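
The CSV parsing step of `get_gpu_info()` can be isolated and checked against a sample line; the GPU name and figures below are illustrative, not captured from a real machine:

```python
# Parse one line of `nvidia-smi --query-gpu=... --format=csv,noheader,nounits`
# output, in the same field order get_gpu_info() requests.
def parse_gpu_line(line: str) -> dict:
    parts = [p.strip() for p in line.split(",")]
    return {
        "utilization": int(parts[0]),   # %
        "temperature": int(parts[1]),   # °C
        "vram_used": int(parts[2]),     # MiB
        "vram_total": int(parts[3]),    # MiB
        "power_draw": float(parts[4]),  # W
        "name": parts[5],
    }

info = parse_gpu_line("37, 54, 8123, 24576, 121.35, NVIDIA GeForce RTX 4090")
```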

hub-api/services/proxy_ollama.py Normal file

@@ -0,0 +1,156 @@
from __future__ import annotations

import json
import logging
from collections.abc import AsyncGenerator

import httpx

logger = logging.getLogger(__name__)


async def stream_chat(
    base_url: str,
    model: str,
    messages: list[dict],
    **kwargs,
) -> AsyncGenerator[str, None]:
    """Proxy Ollama chat streaming, converting NDJSON to OpenAI SSE format."""
    payload = {
        "model": model,
        "messages": messages,
        "stream": True,
        **{k: v for k, v in kwargs.items() if v is not None},
    }
    async with httpx.AsyncClient(timeout=120.0) as client:
        async with client.stream(
            "POST",
            f"{base_url}/api/chat",
            json=payload,
        ) as resp:
            if resp.status_code != 200:
                body = await resp.aread()
                error_msg = body.decode("utf-8", errors="replace")
                yield _error_event(f"Ollama error: {error_msg}")
                return
            async for line in resp.aiter_lines():
                if not line.strip():
                    continue
                try:
                    chunk = json.loads(line)
                except json.JSONDecodeError:
                    continue
                if chunk.get("done"):
                    # Final chunk — send [DONE]
                    yield "data: [DONE]\n\n"
                    return
                content = chunk.get("message", {}).get("content", "")
                if content:
                    openai_chunk = {
                        "id": "chatcmpl-gateway",
                        "object": "chat.completion.chunk",
                        "model": model,
                        "choices": [
                            {
                                "index": 0,
                                "delta": {"content": content},
                                "finish_reason": None,
                            }
                        ],
                    }
                    yield f"data: {json.dumps(openai_chunk)}\n\n"


async def complete_chat(
    base_url: str,
    model: str,
    messages: list[dict],
    **kwargs,
) -> dict:
    """Non-streaming Ollama chat, returns OpenAI-compatible response."""
    payload = {
        "model": model,
        "messages": messages,
        "stream": False,
        **{k: v for k, v in kwargs.items() if v is not None},
    }
    async with httpx.AsyncClient(timeout=120.0) as client:
        resp = await client.post(f"{base_url}/api/chat", json=payload)
        resp.raise_for_status()
        data = resp.json()
    return {
        "id": "chatcmpl-gateway",
        "object": "chat.completion",
        "model": model,
        "choices": [
            {
                "index": 0,
                "message": {
                    "role": "assistant",
                    "content": data.get("message", {}).get("content", ""),
                },
                "finish_reason": "stop",
            }
        ],
        "usage": {
            "prompt_tokens": data.get("prompt_eval_count", 0),
            "completion_tokens": data.get("eval_count", 0),
            "total_tokens": data.get("prompt_eval_count", 0)
            + data.get("eval_count", 0),
        },
    }


async def generate_embedding(
    base_url: str,
    model: str,
    input_text: str | list[str],
) -> dict:
    """Ollama embedding, returns OpenAI-compatible response."""
    texts = [input_text] if isinstance(input_text, str) else input_text
    async with httpx.AsyncClient(timeout=60.0) as client:
        resp = await client.post(
            f"{base_url}/api/embed",
            json={"model": model, "input": texts},
        )
        resp.raise_for_status()
        data = resp.json()
    embeddings_data = []
    for i, emb in enumerate(data.get("embeddings", [])):
        embeddings_data.append({
            "object": "embedding",
            "embedding": emb,
            "index": i,
        })
    return {
        "object": "list",
        "data": embeddings_data,
        "model": model,
        "usage": {"prompt_tokens": 1, "total_tokens": 1},
    }


def _error_event(message: str) -> str:
    error = {
        "id": "chatcmpl-gateway",
        "object": "chat.completion.chunk",
        "model": "error",
        "choices": [
            {
                "index": 0,
                "delta": {"content": f"[Error] {message}"},
                "finish_reason": "stop",
            }
        ],
    }
    return f"data: {json.dumps(error)}\n\ndata: [DONE]\n\n"

hub-api/services/proxy_openai.py Normal file

@@ -0,0 +1,83 @@
"""OpenAI-compatible proxy (MLX server, vLLM, etc.) — SSE passthrough."""
from __future__ import annotations

import json
import logging
from collections.abc import AsyncGenerator

import httpx

logger = logging.getLogger(__name__)


async def stream_chat(
    base_url: str,
    model: str,
    messages: list[dict],
    **kwargs,
) -> AsyncGenerator[str, None]:
    """Proxy OpenAI-compatible chat streaming. SSE passthrough with model field override."""
    payload = {
        "model": model,
        "messages": messages,
        "stream": True,
        **{k: v for k, v in kwargs.items() if v is not None},
    }
    async with httpx.AsyncClient(timeout=120.0) as client:
        async with client.stream(
            "POST",
            f"{base_url}/v1/chat/completions",
            json=payload,
        ) as resp:
            if resp.status_code != 200:
                body = await resp.aread()
                error_msg = body.decode("utf-8", errors="replace")
                yield _error_event(f"Backend error ({resp.status_code}): {error_msg}")
                return
            async for line in resp.aiter_lines():
                if not line.strip():
                    continue
                # Pass through SSE data lines as-is (already in OpenAI format);
                # this branch also forwards the terminating "data: [DONE]" line
                if line.startswith("data: "):
                    yield f"{line}\n\n"


async def complete_chat(
    base_url: str,
    model: str,
    messages: list[dict],
    **kwargs,
) -> dict:
    """Non-streaming OpenAI-compatible chat."""
    payload = {
        "model": model,
        "messages": messages,
        "stream": False,
        **{k: v for k, v in kwargs.items() if v is not None},
    }
    async with httpx.AsyncClient(timeout=120.0) as client:
        resp = await client.post(f"{base_url}/v1/chat/completions", json=payload)
        resp.raise_for_status()
        return resp.json()


def _error_event(message: str) -> str:
    error = {
        "id": "chatcmpl-gateway",
        "object": "chat.completion.chunk",
        "model": "error",
        "choices": [
            {
                "index": 0,
                "delta": {"content": f"[Error] {message}"},
                "finish_reason": "stop",
            }
        ],
    }
    return f"data: {json.dumps(error)}\n\ndata: [DONE]\n\n"
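
The passthrough's line filtering can be sketched as a pure function: `data:` lines (including the final `data: [DONE]`) are re-emitted with a blank-line terminator, while empty lines and SSE comment/keepalive lines are dropped:

```python
# Filter upstream SSE lines the way the passthrough does.
def passthrough(lines: list[str]) -> list[str]:
    out = []
    for line in lines:
        if not line.strip():
            continue  # blank separator lines
        if line.startswith("data: "):
            out.append(f"{line}\n\n")  # forward data events, incl. [DONE]
        # everything else (e.g. ": keepalive" comments) is dropped
    return out

events = passthrough([
    'data: {"choices":[{"delta":{"content":"Hi"}}]}',
    "",
    ": keepalive",
    "data: [DONE]",
])
```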

hub-api/services/registry.py Normal file

@@ -0,0 +1,227 @@
from __future__ import annotations

import asyncio
import json
import logging
import time
from dataclasses import dataclass, field
from pathlib import Path

import httpx

logger = logging.getLogger(__name__)


@dataclass
class ModelInfo:
    id: str
    capabilities: list[str]
    priority: int = 1
    backend_model_id: str = ""  # actual model ID sent to backend (if different from id)


@dataclass
class RateLimitConfig:
    rpm: int = 0
    rph: int = 0
    scope: str = "global"


@dataclass
class BackendInfo:
    id: str
    type: str  # "ollama", "openai-compat", "anthropic"
    url: str
    models: list[ModelInfo]
    access: str = "all"  # "all" or "owner"
    rate_limit: RateLimitConfig | None = None
    # runtime state
    healthy: bool = False
    last_check: float = 0
    latency_ms: float = 0


@dataclass
class RateLimitState:
    minute_timestamps: list[float] = field(default_factory=list)
    hour_timestamps: list[float] = field(default_factory=list)


class Registry:
    def __init__(self):
        self.backends: dict[str, BackendInfo] = {}
        self._health_task: asyncio.Task | None = None
        self._rate_limits: dict[str, RateLimitState] = {}

    async def load_backends(self, config_path: str):
        path = Path(config_path)
        if not path.exists():
            logger.warning("Backends config not found: %s", config_path)
            return
        with open(path) as f:
            data = json.load(f)
        for entry in data:
            models = [
                ModelInfo(
                    id=m["id"],
                    capabilities=m.get("capabilities", ["chat"]),
                    priority=m.get("priority", 1),
                    backend_model_id=m.get("backend_model_id", ""),
                )
                for m in entry.get("models", [])
            ]
            rl_data = entry.get("rate_limit")
            rate_limit = (
                RateLimitConfig(
                    rpm=rl_data.get("rpm", 0),
                    rph=rl_data.get("rph", 0),
                    scope=rl_data.get("scope", "global"),
                )
                if rl_data
                else None
            )
            backend = BackendInfo(
                id=entry["id"],
                type=entry["type"],
                url=entry["url"].rstrip("/"),
                models=models,
                access=entry.get("access", "all"),
                rate_limit=rate_limit,
            )
            self.backends[backend.id] = backend
            if rate_limit:
                self._rate_limits[backend.id] = RateLimitState()
        logger.info("Loaded %d backends", len(self.backends))

    def start_health_loop(self, interval: float = 30.0):
        self._health_task = asyncio.create_task(self._health_loop(interval))

    def stop_health_loop(self):
        if self._health_task:
            self._health_task.cancel()

    async def _health_loop(self, interval: float):
        while True:
            await self._check_all_backends()
            await asyncio.sleep(interval)

    async def _check_all_backends(self):
        async with httpx.AsyncClient(timeout=5.0) as client:
            tasks = [
                self._check_backend(client, backend)
                for backend in self.backends.values()
            ]
            await asyncio.gather(*tasks, return_exceptions=True)

    async def _check_backend(self, client: httpx.AsyncClient, backend: BackendInfo):
        try:
            start = time.monotonic()
            if backend.type == "ollama":
                resp = await client.get(f"{backend.url}/api/tags")
            elif backend.type in ("openai-compat", "anthropic"):
                resp = await client.get(f"{backend.url}/v1/models")
            else:
                resp = await client.get(f"{backend.url}/health")
            elapsed = (time.monotonic() - start) * 1000
            backend.healthy = resp.status_code < 500
            backend.latency_ms = round(elapsed, 1)
            backend.last_check = time.time()
        except Exception:
            backend.healthy = False
            backend.latency_ms = 0
            backend.last_check = time.time()
            logger.debug("Health check failed for %s", backend.id)

    def resolve_model(self, model_id: str, role: str) -> tuple[BackendInfo, ModelInfo] | None:
"""Find the best backend for a given model ID. Returns (backend, model) or None."""
candidates: list[tuple[BackendInfo, ModelInfo, int]] = []
for backend in self.backends.values():
if not backend.healthy:
continue
if backend.access == "owner" and role != "owner":
continue
for model in backend.models:
if model.id == model_id:
candidates.append((backend, model, model.priority))
if not candidates:
return None
candidates.sort(key=lambda x: x[2])
return candidates[0][0], candidates[0][1]
def list_models(self, role: str) -> list[dict]:
"""List all available models for a given role."""
result = []
for backend in self.backends.values():
if not backend.healthy:
continue
if backend.access == "owner" and role != "owner":
continue
for model in backend.models:
result.append({
"id": model.id,
"object": "model",
"owned_by": backend.id,
"capabilities": model.capabilities,
"backend_id": backend.id,
"backend_status": "healthy" if backend.healthy else "down",
})
return result
def check_rate_limit(self, backend_id: str) -> bool:
"""Check if a request to this backend is within rate limits. Returns True if allowed."""
backend = self.backends.get(backend_id)
if not backend or not backend.rate_limit:
return True
state = self._rate_limits.get(backend_id)
if not state:
return True
now = time.time()
rl = backend.rate_limit
# Clean old timestamps
if rl.rpm > 0:
state.minute_timestamps = [t for t in state.minute_timestamps if now - t < 60]
if len(state.minute_timestamps) >= rl.rpm:
return False
if rl.rph > 0:
state.hour_timestamps = [t for t in state.hour_timestamps if now - t < 3600]
if len(state.hour_timestamps) >= rl.rph:
return False
return True
def record_request(self, backend_id: str):
"""Record a request timestamp for rate limiting (only for configured windows)."""
backend = self.backends.get(backend_id)
state = self._rate_limits.get(backend_id)
if not backend or not backend.rate_limit or not state:
return
now = time.time()
rl = backend.rate_limit
# Only append to windows that are actually enforced; check_rate_limit
# prunes only lists with a positive limit, so unconditional appends
# would grow the other list without bound.
if rl.rpm > 0:
state.minute_timestamps.append(now)
if rl.rph > 0:
state.hour_timestamps.append(now)
def get_health_summary(self) -> list[dict]:
return [
{
"id": b.id,
"type": b.type,
"status": "healthy" if b.healthy else "down",
"models": [m.id for m in b.models],
"latency_ms": b.latency_ms,
"last_check": b.last_check,
}
for b in self.backends.values()
]
registry = Registry()

24
hub-web/.gitignore vendored Normal file

@@ -0,0 +1,24 @@
# Logs
logs
*.log
npm-debug.log*
yarn-debug.log*
yarn-error.log*
pnpm-debug.log*
lerna-debug.log*
node_modules
dist
dist-ssr
*.local
# Editor directories and files
.vscode/*
!.vscode/extensions.json
.idea
.DS_Store
*.suo
*.ntvs*
*.njsproj
*.sln
*.sw?

12
hub-web/Dockerfile Normal file

@@ -0,0 +1,12 @@
FROM node:20-alpine AS build
WORKDIR /app
COPY package*.json ./
RUN npm ci
COPY . .
RUN npm run build
FROM nginx:alpine
COPY --from=build /app/dist /usr/share/nginx/html
COPY nginx.conf /etc/nginx/conf.d/default.conf
EXPOSE 80
CMD ["nginx", "-g", "daemon off;"]

73
hub-web/README.md Normal file

@@ -0,0 +1,73 @@
# React + TypeScript + Vite
This template provides a minimal setup to get React working in Vite with HMR and some ESLint rules.
Currently, two official plugins are available:
- [@vitejs/plugin-react](https://github.com/vitejs/vite-plugin-react/blob/main/packages/plugin-react) uses [Oxc](https://oxc.rs)
- [@vitejs/plugin-react-swc](https://github.com/vitejs/vite-plugin-react/blob/main/packages/plugin-react-swc) uses [SWC](https://swc.rs/)
## React Compiler
The React Compiler is not enabled on this template because of its impact on dev & build performance. To add it, see [this documentation](https://react.dev/learn/react-compiler/installation).
## Expanding the ESLint configuration
If you are developing a production application, we recommend updating the configuration to enable type-aware lint rules:
```js
export default defineConfig([
globalIgnores(['dist']),
{
files: ['**/*.{ts,tsx}'],
extends: [
// Other configs...
// Remove tseslint.configs.recommended and replace with this
tseslint.configs.recommendedTypeChecked,
// Alternatively, use this for stricter rules
tseslint.configs.strictTypeChecked,
// Optionally, add this for stylistic rules
tseslint.configs.stylisticTypeChecked,
// Other configs...
],
languageOptions: {
parserOptions: {
project: ['./tsconfig.node.json', './tsconfig.app.json'],
tsconfigRootDir: import.meta.dirname,
},
// other options...
},
},
])
```
You can also install [eslint-plugin-react-x](https://github.com/Rel1cx/eslint-react/tree/main/packages/plugins/eslint-plugin-react-x) and [eslint-plugin-react-dom](https://github.com/Rel1cx/eslint-react/tree/main/packages/plugins/eslint-plugin-react-dom) for React-specific lint rules:
```js
// eslint.config.js
import reactX from 'eslint-plugin-react-x'
import reactDom from 'eslint-plugin-react-dom'
export default defineConfig([
globalIgnores(['dist']),
{
files: ['**/*.{ts,tsx}'],
extends: [
// Other configs...
// Enable lint rules for React
reactX.configs['recommended-typescript'],
// Enable lint rules for React DOM
reactDom.configs.recommended,
],
languageOptions: {
parserOptions: {
project: ['./tsconfig.node.json', './tsconfig.app.json'],
tsconfigRootDir: import.meta.dirname,
},
// other options...
},
},
])
```

23
hub-web/eslint.config.js Normal file

@@ -0,0 +1,23 @@
import js from '@eslint/js'
import globals from 'globals'
import reactHooks from 'eslint-plugin-react-hooks'
import reactRefresh from 'eslint-plugin-react-refresh'
import tseslint from 'typescript-eslint'
import { defineConfig, globalIgnores } from 'eslint/config'
export default defineConfig([
globalIgnores(['dist']),
{
files: ['**/*.{ts,tsx}'],
extends: [
js.configs.recommended,
tseslint.configs.recommended,
reactHooks.configs.flat.recommended,
reactRefresh.configs.vite,
],
languageOptions: {
ecmaVersion: 2020,
globals: globals.browser,
},
},
])

13
hub-web/index.html Normal file

@@ -0,0 +1,13 @@
<!doctype html>
<html lang="en">
<head>
<meta charset="UTF-8" />
<link rel="icon" type="image/svg+xml" href="/favicon.svg" />
<meta name="viewport" content="width=device-width, initial-scale=1.0" />
<title>hub-web</title>
</head>
<body>
<div id="root"></div>
<script type="module" src="/src/main.tsx"></script>
</body>
</html>

9
hub-web/nginx.conf Normal file

@@ -0,0 +1,9 @@
server {
listen 80;
root /usr/share/nginx/html;
index index.html;
location / {
try_files $uri $uri/ /index.html;
}
}

4531
hub-web/package-lock.json generated Normal file

File diff suppressed because it is too large

34
hub-web/package.json Normal file

@@ -0,0 +1,34 @@
{
"name": "hub-web",
"private": true,
"version": "0.0.0",
"type": "module",
"scripts": {
"dev": "vite",
"build": "tsc -b && vite build",
"lint": "eslint .",
"preview": "vite preview"
},
"dependencies": {
"react": "^19.2.4",
"react-dom": "^19.2.4",
"react-markdown": "^10.1.0",
"react-router-dom": "^7.13.2"
},
"devDependencies": {
"@eslint/js": "^9.39.4",
"@tailwindcss/vite": "^4.2.2",
"@types/node": "^24.12.0",
"@types/react": "^19.2.14",
"@types/react-dom": "^19.2.3",
"@vitejs/plugin-react": "^6.0.1",
"eslint": "^9.39.4",
"eslint-plugin-react-hooks": "^7.0.1",
"eslint-plugin-react-refresh": "^0.5.2",
"globals": "^17.4.0",
"tailwindcss": "^4.2.2",
"typescript": "~5.9.3",
"typescript-eslint": "^8.57.0",
"vite": "^8.0.1"
}
}

File diff suppressed because one or more lines are too long


24
hub-web/public/icons.svg Normal file

@@ -0,0 +1,24 @@
<svg xmlns="http://www.w3.org/2000/svg">
<symbol id="bluesky-icon" viewBox="0 0 16 17">
<g clip-path="url(#bluesky-clip)"><path fill="#08060d" d="M7.75 7.735c-.693-1.348-2.58-3.86-4.334-5.097-1.68-1.187-2.32-.981-2.74-.79C.188 2.065.1 2.812.1 3.251s.241 3.602.398 4.13c.52 1.744 2.367 2.333 4.07 2.145-2.495.37-4.71 1.278-1.805 4.512 3.196 3.309 4.38-.71 4.987-2.746.608 2.036 1.307 5.91 4.93 2.746 2.72-2.746.747-4.143-1.747-4.512 1.702.189 3.55-.4 4.07-2.145.156-.528.397-3.691.397-4.13s-.088-1.186-.575-1.406c-.42-.19-1.06-.395-2.741.79-1.755 1.24-3.64 3.752-4.334 5.099"/></g>
<defs><clipPath id="bluesky-clip"><path fill="#fff" d="M.1.85h15.3v15.3H.1z"/></clipPath></defs>
</symbol>
<symbol id="discord-icon" viewBox="0 0 20 19">
<path fill="#08060d" d="M16.224 3.768a14.5 14.5 0 0 0-3.67-1.153c-.158.286-.343.67-.47.976a13.5 13.5 0 0 0-4.067 0c-.128-.306-.317-.69-.476-.976A14.4 14.4 0 0 0 3.868 3.77C1.546 7.28.916 10.703 1.231 14.077a14.7 14.7 0 0 0 4.5 2.306q.545-.748.965-1.587a9.5 9.5 0 0 1-1.518-.74q.191-.14.372-.293c2.927 1.369 6.107 1.369 8.999 0q.183.152.372.294-.723.437-1.52.74.418.838.963 1.588a14.6 14.6 0 0 0 4.504-2.308c.37-3.911-.63-7.302-2.644-10.309m-9.13 8.234c-.878 0-1.599-.82-1.599-1.82 0-.998.705-1.82 1.6-1.82.894 0 1.614.82 1.599 1.82.001 1-.705 1.82-1.6 1.82m5.91 0c-.878 0-1.599-.82-1.599-1.82 0-.998.705-1.82 1.6-1.82.893 0 1.614.82 1.599 1.82 0 1-.706 1.82-1.6 1.82"/>
</symbol>
<symbol id="documentation-icon" viewBox="0 0 21 20">
<path fill="none" stroke="#aa3bff" stroke-linecap="round" stroke-linejoin="round" stroke-width="1.35" d="m15.5 13.333 1.533 1.322c.645.555.967.833.967 1.178s-.322.623-.967 1.179L15.5 18.333m-3.333-5-1.534 1.322c-.644.555-.966.833-.966 1.178s.322.623.966 1.179l1.534 1.321"/>
<path fill="none" stroke="#aa3bff" stroke-linecap="round" stroke-linejoin="round" stroke-width="1.35" d="M17.167 10.836v-4.32c0-1.41 0-2.117-.224-2.68-.359-.906-1.118-1.621-2.08-1.96-.599-.21-1.349-.21-2.848-.21-2.623 0-3.935 0-4.983.369-1.684.591-3.013 1.842-3.641 3.428C3 6.449 3 7.684 3 10.154v2.122c0 2.558 0 3.838.706 4.726q.306.383.713.671c.76.536 1.79.64 3.581.66"/>
<path fill="none" stroke="#aa3bff" stroke-linecap="round" stroke-linejoin="round" stroke-width="1.35" d="M3 10a2.78 2.78 0 0 1 2.778-2.778c.555 0 1.209.097 1.748-.047.48-.129.854-.503.982-.982.145-.54.048-1.194.048-1.749a2.78 2.78 0 0 1 2.777-2.777"/>
</symbol>
<symbol id="github-icon" viewBox="0 0 19 19">
<path fill="#08060d" fill-rule="evenodd" d="M9.356 1.85C5.05 1.85 1.57 5.356 1.57 9.694a7.84 7.84 0 0 0 5.324 7.44c.387.079.528-.168.528-.376 0-.182-.013-.805-.013-1.454-2.165.467-2.616-.935-2.616-.935-.349-.91-.864-1.143-.864-1.143-.71-.48.051-.48.051-.48.787.051 1.2.805 1.2.805.695 1.194 1.817.857 2.268.649.064-.507.27-.857.49-1.052-1.728-.182-3.545-.857-3.545-3.87 0-.857.31-1.558.8-2.104-.078-.195-.349-1 .077-2.078 0 0 .657-.208 2.14.805a7.5 7.5 0 0 1 1.946-.26c.657 0 1.328.092 1.946.26 1.483-1.013 2.14-.805 2.14-.805.426 1.078.155 1.883.078 2.078.502.546.799 1.247.799 2.104 0 3.013-1.818 3.675-3.558 3.87.284.247.528.714.528 1.454 0 1.052-.012 1.896-.012 2.156 0 .208.142.455.528.377a7.84 7.84 0 0 0 5.324-7.441c.013-4.338-3.48-7.844-7.773-7.844" clip-rule="evenodd"/>
</symbol>
<symbol id="social-icon" viewBox="0 0 20 20">
<path fill="none" stroke="#aa3bff" stroke-linecap="round" stroke-linejoin="round" stroke-width="1.35" d="M12.5 6.667a4.167 4.167 0 1 0-8.334 0 4.167 4.167 0 0 0 8.334 0"/>
<path fill="none" stroke="#aa3bff" stroke-linecap="round" stroke-linejoin="round" stroke-width="1.35" d="M2.5 16.667a5.833 5.833 0 0 1 8.75-5.053m3.837.474.513 1.035c.07.144.257.282.414.309l.93.155c.596.1.736.536.307.965l-.723.73a.64.64 0 0 0-.152.531l.207.903c.164.715-.213.991-.84.618l-.872-.52a.63.63 0 0 0-.577 0l-.872.52c-.624.373-1.003.094-.84-.618l.207-.903a.64.64 0 0 0-.152-.532l-.723-.729c-.426-.43-.289-.864.306-.964l.93-.156a.64.64 0 0 0 .412-.31l.513-1.034c.28-.562.735-.562 1.012 0"/>
</symbol>
<symbol id="x-icon" viewBox="0 0 19 19">
<path fill="#08060d" fill-rule="evenodd" d="M1.893 1.98c.052.072 1.245 1.769 2.653 3.77l2.892 4.114c.183.261.333.48.333.486s-.068.089-.152.183l-.522.593-.765.867-3.597 4.087c-.375.426-.734.834-.798.905a1 1 0 0 0-.118.148c0 .01.236.017.664.017h.663l.729-.83c.4-.457.796-.906.879-.999a692 692 0 0 0 1.794-2.038c.034-.037.301-.34.594-.675l.551-.624.345-.392a7 7 0 0 1 .34-.374c.006 0 .93 1.306 2.052 2.903l2.084 2.965.045.063h2.275c1.87 0 2.273-.003 2.266-.021-.008-.02-1.098-1.572-3.894-5.547-2.013-2.862-2.28-3.246-2.273-3.266.008-.019.282-.332 2.085-2.38l2-2.274 1.567-1.782c.022-.028-.016-.03-.65-.03h-.674l-.3.342a871 871 0 0 1-1.782 2.025c-.067.075-.405.458-.75.852a100 100 0 0 1-.803.91c-.148.172-.299.344-.99 1.127-.304.343-.32.358-.345.327-.015-.019-.904-1.282-1.976-2.808L6.365 1.85H1.8zm1.782.91 8.078 11.294c.772 1.08 1.413 1.973 1.425 1.984.016.017.241.02 1.05.017l1.03-.004-2.694-3.766L7.796 5.75 5.722 2.852l-1.039-.004-1.039-.004z" clip-rule="evenodd"/>
</symbol>
</svg>


61
hub-web/src/App.tsx Normal file

@@ -0,0 +1,61 @@
import { useState, useEffect } from 'react';
import { BrowserRouter, Routes, Route, Navigate, NavLink } from 'react-router-dom';
import { AuthCtx } from './lib/auth';
import { getMe, logout } from './lib/api';
import Login from './pages/Login';
import Dashboard from './pages/Dashboard';
import Chat from './pages/Chat';
export default function App() {
const [role, setRole] = useState<string | null>(null);
const [loading, setLoading] = useState(true);
useEffect(() => {
getMe().then(me => {
setRole(me?.role ?? null);
setLoading(false);
});
}, []);
if (loading) {
return <div className="flex items-center justify-center h-screen text-[hsl(var(--muted-foreground))]">Loading...</div>;
}
if (!role) {
return (
<AuthCtx.Provider value={{ role, setRole }}>
<Login />
</AuthCtx.Provider>
);
}
return (
<AuthCtx.Provider value={{ role, setRole }}>
<BrowserRouter>
<div className="min-h-screen flex flex-col dark">
<nav className="border-b border-[hsl(var(--border))] px-6 py-3 flex items-center gap-6">
<span className="font-semibold text-lg">AI Gateway</span>
<NavLink to="/" className={({ isActive }) => isActive ? 'text-[hsl(var(--foreground))]' : 'text-[hsl(var(--muted-foreground))] hover:text-[hsl(var(--foreground))]'}>Dashboard</NavLink>
<NavLink to="/chat" className={({ isActive }) => isActive ? 'text-[hsl(var(--foreground))]' : 'text-[hsl(var(--muted-foreground))] hover:text-[hsl(var(--foreground))]'}>Chat</NavLink>
<div className="ml-auto flex items-center gap-3">
<span className="text-sm text-[hsl(var(--muted-foreground))]">{role}</span>
<button
onClick={async () => { await logout(); setRole(null); }}
className="text-sm text-[hsl(var(--muted-foreground))] hover:text-[hsl(var(--foreground))]"
>
Logout
</button>
</div>
</nav>
<main className="flex-1">
<Routes>
<Route path="/" element={<Dashboard />} />
<Route path="/chat" element={<Chat />} />
<Route path="*" element={<Navigate to="/" />} />
</Routes>
</main>
</div>
</BrowserRouter>
</AuthCtx.Provider>
);
}

41
hub-web/src/index.css Normal file

@@ -0,0 +1,41 @@
@import "tailwindcss";
:root {
--background: 0 0% 100%;
--foreground: 240 10% 3.9%;
--card: 0 0% 100%;
--card-foreground: 240 10% 3.9%;
--muted: 240 4.8% 95.9%;
--muted-foreground: 240 3.8% 46.1%;
--border: 240 5.9% 90%;
--primary: 240 5.9% 10%;
--primary-foreground: 0 0% 98%;
--destructive: 0 84.2% 60.2%;
--ring: 240 5.9% 10%;
--radius: 0.5rem;
}
.dark {
--background: 240 10% 3.9%;
--foreground: 0 0% 98%;
--card: 240 10% 3.9%;
--card-foreground: 0 0% 98%;
--muted: 240 3.7% 15.9%;
--muted-foreground: 240 5% 64.9%;
--border: 240 3.7% 15.9%;
--primary: 0 0% 98%;
--primary-foreground: 240 5.9% 10%;
--destructive: 0 62.8% 30.6%;
--ring: 240 4.9% 83.9%;
}
* {
border-color: hsl(var(--border));
}
body {
margin: 0;
background-color: hsl(var(--background));
color: hsl(var(--foreground));
font-family: system-ui, -apple-system, sans-serif;
}

127
hub-web/src/lib/api.ts Normal file

@@ -0,0 +1,127 @@
const BASE = '';
// Store token in memory for Bearer auth (more reliable than cookies through proxies)
let _token: string | null = null;
export function setToken(token: string | null) {
_token = token;
}
function authHeaders(): Record<string, string> {
const h: Record<string, string> = { 'Content-Type': 'application/json' };
if (_token) h['Authorization'] = `Bearer ${_token}`;
return h;
}
export async function login(password: string): Promise<{ role: string; token: string }> {
const res = await fetch(`${BASE}/auth/login`, {
method: 'POST',
headers: { 'Content-Type': 'application/json' },
body: JSON.stringify({ password }),
});
if (!res.ok) {
const err = await res.json().catch(() => null);
throw new Error(err?.error?.message || 'Login failed');
}
const data = await res.json();
_token = data.token;
return data;
}
export async function getMe(): Promise<{ role: string } | null> {
if (!_token) return null;
const res = await fetch(`${BASE}/auth/me`, { headers: authHeaders() });
if (!res.ok) { _token = null; return null; }
return res.json();
}
export async function logout(): Promise<void> {
await fetch(`${BASE}/auth/logout`, { method: 'POST', headers: authHeaders() });
_token = null;
}
export interface Model {
id: string;
owned_by: string;
capabilities: string[];
backend_id: string;
backend_status: string;
}
export async function getModels(): Promise<Model[]> {
const res = await fetch(`${BASE}/v1/models`, { headers: authHeaders() });
if (!res.ok) return [];
const data = await res.json();
return data.data || [];
}
export interface BackendHealth {
id: string;
type: string;
status: string;
models: string[];
latency_ms: number;
}
export interface GpuInfo {
utilization: number;
temperature: number;
vram_used: number;
vram_total: number;
power_draw: number;
name: string;
}
export async function getHealth(): Promise<{ backends: BackendHealth[]; gpu: GpuInfo | null }> {
const res = await fetch(`${BASE}/health`);
if (!res.ok) return { backends: [], gpu: null };
return res.json();
}
export interface ChatMessage {
role: 'user' | 'assistant' | 'system';
content: string;
}
export async function* streamChat(
model: string,
messages: ChatMessage[],
): AsyncGenerator<string, void> {
const res = await fetch(`${BASE}/v1/chat/completions`, {
method: 'POST',
headers: authHeaders(),
body: JSON.stringify({ model, messages, stream: true }),
});
if (!res.ok) {
const err = await res.json().catch(() => null);
throw new Error(err?.error?.message || `Chat failed: ${res.status}`);
}
const reader = res.body?.getReader();
if (!reader) throw new Error('No response body');
const decoder = new TextDecoder();
let buffer = '';
while (true) {
const { done, value } = await reader.read();
if (done) break;
buffer += decoder.decode(value, { stream: true });
const lines = buffer.split('\n');
buffer = lines.pop() || '';
for (const line of lines) {
if (!line.startsWith('data: ')) continue;
const data = line.slice(6).trim();
if (data === '[DONE]') return;
try {
const parsed = JSON.parse(data);
const content = parsed.choices?.[0]?.delta?.content;
if (content) yield content;
} catch {
// skip malformed chunks
}
}
}
}

12
hub-web/src/lib/auth.ts Normal file

@@ -0,0 +1,12 @@
import { createContext, useContext } from 'react';
export interface AuthContext {
role: string | null;
setRole: (role: string | null) => void;
}
export const AuthCtx = createContext<AuthContext>({ role: null, setRole: () => {} });
export function useAuth() {
return useContext(AuthCtx);
}

10
hub-web/src/main.tsx Normal file

@@ -0,0 +1,10 @@
import { StrictMode } from 'react'
import { createRoot } from 'react-dom/client'
import './index.css'
import App from './App.tsx'
createRoot(document.getElementById('root')!).render(
<StrictMode>
<App />
</StrictMode>,
)

130
hub-web/src/pages/Chat.tsx Normal file

@@ -0,0 +1,130 @@
import { useState, useEffect, useRef } from 'react';
import ReactMarkdown from 'react-markdown';
import type { Model, ChatMessage } from '../lib/api';
import { getModels, streamChat } from '../lib/api';
export default function Chat() {
const [models, setModels] = useState<Model[]>([]);
const [selectedModel, setSelectedModel] = useState('');
const [messages, setMessages] = useState<ChatMessage[]>([]);
const [input, setInput] = useState('');
const [streaming, setStreaming] = useState(false);
const bottomRef = useRef<HTMLDivElement>(null);
useEffect(() => {
getModels().then(mdls => {
const chatModels = mdls.filter(m => m.capabilities.includes('chat'));
setModels(chatModels);
if (chatModels.length > 0 && !selectedModel) {
setSelectedModel(chatModels[0].id);
}
});
}, []);
useEffect(() => {
bottomRef.current?.scrollIntoView({ behavior: 'smooth' });
}, [messages]);
const handleSend = async () => {
if (!input.trim() || !selectedModel || streaming) return;
const userMsg: ChatMessage = { role: 'user', content: input.trim() };
const newMessages = [...messages, userMsg];
setMessages(newMessages);
setInput('');
setStreaming(true);
const assistantMsg: ChatMessage = { role: 'assistant', content: '' };
setMessages([...newMessages, assistantMsg]);
try {
for await (const chunk of streamChat(selectedModel, newMessages)) {
assistantMsg.content += chunk;
setMessages(prev => [...prev.slice(0, -1), { ...assistantMsg }]);
}
} catch (err) {
assistantMsg.content += `\n\n[Error: ${err instanceof Error ? err.message : 'Unknown error'}]`;
setMessages(prev => [...prev.slice(0, -1), { ...assistantMsg }]);
} finally {
setStreaming(false);
}
};
const handleKeyDown = (e: React.KeyboardEvent) => {
if (e.key === 'Enter' && !e.shiftKey) {
e.preventDefault();
handleSend();
}
};
return (
<div className="flex flex-col h-[calc(100vh-57px)]">
{/* Header */}
<div className="border-b border-[hsl(var(--border))] px-6 py-3 flex items-center gap-4">
<select
value={selectedModel}
onChange={e => setSelectedModel(e.target.value)}
className="px-3 py-1.5 rounded-md border border-[hsl(var(--border))] bg-[hsl(var(--background))] text-sm"
>
{models.map(m => (
<option key={m.id} value={m.id}>{m.id} ({m.owned_by})</option>
))}
</select>
<button
onClick={() => setMessages([])}
className="text-sm text-[hsl(var(--muted-foreground))] hover:text-[hsl(var(--foreground))]"
>
Clear
</button>
</div>
{/* Messages */}
<div className="flex-1 overflow-y-auto px-6 py-4 space-y-4">
{messages.length === 0 && (
<div className="flex items-center justify-center h-full text-[hsl(var(--muted-foreground))]">
Send a message to start
</div>
)}
{messages.map((msg, i) => (
<div key={i} className={`flex ${msg.role === 'user' ? 'justify-end' : 'justify-start'}`}>
<div className={`max-w-[80%] rounded-lg px-4 py-2 ${
msg.role === 'user'
? 'bg-[hsl(var(--primary))] text-[hsl(var(--primary-foreground))]'
: 'bg-[hsl(var(--muted))]'
}`}>
{msg.role === 'assistant' ? (
<div className="prose prose-sm prose-invert max-w-none">
<ReactMarkdown>{msg.content || '...'}</ReactMarkdown>
</div>
) : (
<p className="text-sm whitespace-pre-wrap">{msg.content}</p>
)}
</div>
</div>
))}
<div ref={bottomRef} />
</div>
{/* Input */}
<div className="border-t border-[hsl(var(--border))] px-6 py-4">
<div className="flex gap-3">
<textarea
value={input}
onChange={e => setInput(e.target.value)}
onKeyDown={handleKeyDown}
placeholder="Type a message... (Enter to send, Shift+Enter for newline)"
rows={1}
className="flex-1 px-3 py-2 rounded-md border border-[hsl(var(--border))] bg-[hsl(var(--background))] text-[hsl(var(--foreground))] resize-none focus:outline-none focus:ring-2 focus:ring-[hsl(var(--ring))]"
/>
<button
onClick={handleSend}
disabled={streaming || !input.trim()}
className="px-4 py-2 rounded-md bg-[hsl(var(--primary))] text-[hsl(var(--primary-foreground))] hover:opacity-90 disabled:opacity-50"
>
{streaming ? '...' : 'Send'}
</button>
</div>
</div>
</div>
);
}


@@ -0,0 +1,96 @@
import { useState, useEffect } from 'react';
import type { BackendHealth, GpuInfo, Model } from '../lib/api';
import { getHealth, getModels } from '../lib/api';
export default function Dashboard() {
const [backends, setBackends] = useState<BackendHealth[]>([]);
const [gpu, setGpu] = useState<GpuInfo | null>(null);
const [models, setModels] = useState<Model[]>([]);
const refresh = async () => {
const [health, mdls] = await Promise.all([getHealth(), getModels()]);
setBackends(health.backends);
setGpu(health.gpu);
setModels(mdls);
};
useEffect(() => {
refresh();
const id = setInterval(refresh, 15000);
return () => clearInterval(id);
}, []);
return (
<div className="p-6 space-y-6">
<div className="flex items-center justify-between">
<h2 className="text-xl font-semibold">Backends</h2>
<button onClick={refresh} className="text-sm text-[hsl(var(--muted-foreground))] hover:text-[hsl(var(--foreground))]">Refresh</button>
</div>
<div className="grid grid-cols-1 md:grid-cols-2 lg:grid-cols-3 gap-4">
{backends.map(b => (
<div key={b.id} className="rounded-lg border border-[hsl(var(--border))] p-4 space-y-2">
<div className="flex items-center justify-between">
<span className="font-medium">{b.id}</span>
<span className={`text-xs px-2 py-0.5 rounded-full ${b.status === 'healthy' ? 'bg-green-500/20 text-green-400' : 'bg-red-500/20 text-red-400'}`}>
{b.status}
</span>
</div>
<div className="text-sm text-[hsl(var(--muted-foreground))]">{b.type}</div>
<div className="text-sm">{b.models.join(', ')}</div>
{b.latency_ms > 0 && <div className="text-xs text-[hsl(var(--muted-foreground))]">{b.latency_ms}ms</div>}
</div>
))}
</div>
{gpu && (
<>
<h2 className="text-xl font-semibold">GPU</h2>
<div className="rounded-lg border border-[hsl(var(--border))] p-4 grid grid-cols-2 md:grid-cols-4 gap-4">
<Stat label="Utilization" value={`${gpu.utilization}%`} />
<Stat label="Temperature" value={`${gpu.temperature}°C`} />
<Stat label="VRAM" value={`${gpu.vram_used}/${gpu.vram_total} MB`} />
<Stat label="Power" value={`${gpu.power_draw}W`} />
</div>
</>
)}
<h2 className="text-xl font-semibold">Models</h2>
<div className="rounded-lg border border-[hsl(var(--border))] overflow-hidden">
<table className="w-full text-sm">
<thead className="bg-[hsl(var(--muted))]">
<tr>
<th className="text-left px-4 py-2">Model</th>
<th className="text-left px-4 py-2">Backend</th>
<th className="text-left px-4 py-2">Capabilities</th>
<th className="text-left px-4 py-2">Status</th>
</tr>
</thead>
<tbody>
{models.map(m => (
<tr key={`${m.backend_id}-${m.id}`} className="border-t border-[hsl(var(--border))]">
<td className="px-4 py-2 font-mono">{m.id}</td>
<td className="px-4 py-2">{m.owned_by}</td>
<td className="px-4 py-2">{m.capabilities.join(', ')}</td>
<td className="px-4 py-2">
<span className={`text-xs px-2 py-0.5 rounded-full ${m.backend_status === 'healthy' ? 'bg-green-500/20 text-green-400' : 'bg-red-500/20 text-red-400'}`}>
{m.backend_status}
</span>
</td>
</tr>
))}
</tbody>
</table>
</div>
</div>
);
}
function Stat({ label, value }: { label: string; value: string }) {
return (
<div>
<div className="text-xs text-[hsl(var(--muted-foreground))]">{label}</div>
<div className="text-lg font-semibold">{value}</div>
</div>
);
}


@@ -0,0 +1,48 @@
import { useState } from 'react';
import { login } from '../lib/api';
import { useAuth } from '../lib/auth';
export default function Login() {
const { setRole } = useAuth();
const [password, setPassword] = useState('');
const [error, setError] = useState('');
const [loading, setLoading] = useState(false);
const handleSubmit = async (e: React.FormEvent) => {
e.preventDefault();
setError('');
setLoading(true);
try {
const { role } = await login(password);
setRole(role);
} catch (err) {
setError(err instanceof Error ? err.message : 'Login failed');
} finally {
setLoading(false);
}
};
return (
<div className="dark flex items-center justify-center min-h-screen bg-[hsl(var(--background))]">
<form onSubmit={handleSubmit} className="w-80 space-y-4">
<h1 className="text-2xl font-semibold text-center text-[hsl(var(--foreground))]">AI Gateway</h1>
<input
type="password"
value={password}
onChange={e => setPassword(e.target.value)}
placeholder="Password"
autoFocus
className="w-full px-3 py-2 rounded-md border border-[hsl(var(--border))] bg-[hsl(var(--background))] text-[hsl(var(--foreground))] focus:outline-none focus:ring-2 focus:ring-[hsl(var(--ring))]"
/>
<button
type="submit"
disabled={loading}
className="w-full px-3 py-2 rounded-md bg-[hsl(var(--primary))] text-[hsl(var(--primary-foreground))] hover:opacity-90 disabled:opacity-50"
>
{loading ? 'Logging in...' : 'Login'}
</button>
{error && <p className="text-sm text-[hsl(var(--destructive))] text-center">{error}</p>}
</form>
</div>
);
}

32
hub-web/tsconfig.app.json Normal file

@@ -0,0 +1,32 @@
{
"compilerOptions": {
"tsBuildInfoFile": "./node_modules/.tmp/tsconfig.app.tsbuildinfo",
"target": "ES2023",
"useDefineForClassFields": true,
"lib": ["ES2023", "DOM", "DOM.Iterable"],
"module": "ESNext",
"types": ["vite/client"],
"skipLibCheck": true,
/* Bundler mode */
"moduleResolution": "bundler",
"allowImportingTsExtensions": true,
"verbatimModuleSyntax": true,
"moduleDetection": "force",
"noEmit": true,
"jsx": "react-jsx",
/* Linting */
"strict": true,
"noUnusedLocals": true,
"noUnusedParameters": true,
"erasableSyntaxOnly": true,
"noFallthroughCasesInSwitch": true,
"noUncheckedSideEffectImports": true,
"baseUrl": ".",
"paths": {
"@/*": ["./src/*"]
}
},
"include": ["src"]
}

7
hub-web/tsconfig.json Normal file

@@ -0,0 +1,7 @@
{
"files": [],
"references": [
{ "path": "./tsconfig.app.json" },
{ "path": "./tsconfig.node.json" }
]
}


@@ -0,0 +1,26 @@
{
"compilerOptions": {
"tsBuildInfoFile": "./node_modules/.tmp/tsconfig.node.tsbuildinfo",
"target": "ES2023",
"lib": ["ES2023"],
"module": "ESNext",
"types": ["node"],
"skipLibCheck": true,
/* Bundler mode */
"moduleResolution": "bundler",
"allowImportingTsExtensions": true,
"verbatimModuleSyntax": true,
"moduleDetection": "force",
"noEmit": true,
/* Linting */
"strict": true,
"noUnusedLocals": true,
"noUnusedParameters": true,
"erasableSyntaxOnly": true,
"noFallthroughCasesInSwitch": true,
"noUncheckedSideEffectImports": true
},
"include": ["vite.config.ts"]
}

21
hub-web/vite.config.ts Normal file

@@ -0,0 +1,21 @@
import { defineConfig } from 'vite'
import react from '@vitejs/plugin-react'
import tailwindcss from '@tailwindcss/vite'
import path from 'path'
export default defineConfig({
plugins: [react(), tailwindcss()],
resolve: {
alias: {
'@': path.resolve(__dirname, './src'),
},
},
server: {
proxy: {
'/v1': 'http://localhost:8000',
'/auth': 'http://localhost:8000',
'/health': 'http://localhost:8000',
'/gpu': 'http://localhost:8000',
},
},
})