feat: home-gateway initial setup (full migration from Mac mini to the GPU server)
Consolidate the Mac mini Docker services onto the GPU server after the OrbStack license expired. Migrated nginx to Caddy, enabled automatic HTTPS for 12 subdomains, and wired fail2ban to Caddy's JSON logs.

Key changes:
- home-caddy: Caddy reverse proxy (automatic HTTPS via Let's Encrypt)
- home-fail2ban: security monitoring driven by Caddy JSON logs
- home-ddns: Cloudflare DDNS (API key split out into .env)
- gpu-hub-api/web: AI backend router + web UI (moved over from gpu-services)
- AI runtimes (Ollama) are internal-network only; external traffic goes through the gpu-hub auth gateway

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
.gitignore (vendored, new file, 17 lines)
@@ -0,0 +1,17 @@
# Secrets
.env
ddns/.env

# Runtime data
caddy/logs/
fail2ban/data/
docker-compose.test.yml

# Node
hub-web/node_modules/
hub-web/dist/

# Python
hub-api/__pycache__/
hub-api/*.pyc
hub-api/.venv/
README.md (new file, 68 lines)
@@ -0,0 +1,68 @@
# home-gateway

Unified home network gateway, running on the GPU server (192.168.1.186).

## Components

| Container | Role |
|-----------|------|
| home-caddy | Caddy reverse proxy (80/443, automatic HTTPS) |
| home-fail2ban | Security monitoring based on Caddy JSON logs |
| home-ddns-vpn | Cloudflare DDNS (vpn.hyungi.net) |
| home-ddns-mail | Cloudflare DDNS (mail.hyungi.net) |
| gpu-hub-api | AI backend router (auth gateway) |
| gpu-hub-web | AI hub web UI |

## Routing targets

### GPU server (local)
- `komga.hyungi.net` → :25600
- `document.hyungi.net` → :8080 (Document Server's internal Caddy)
- `ai.hyungi.net` → gpu-hub-api (authenticated external AI access)

### NAS (192.168.1.227)
- `ds1525.hyungi.net` → :5000 (DSM)
- `webdav.hyungi.net` → :5006 (WebDAV)
- `git.hyungi.net` → :10300 (Gitea)
- `vault.hyungi.net` → :8443 (Vaultwarden)
- `link.hyungi.net` → :10002 (Synology Drive)
- `mailplus.hyungi.net` → :21680 (MailPlus)
- `contacts.hyungi.net` → :25555 (Contacts)
- `calendar.hyungi.net` → :20002 (Calendar)
- `note.hyungi.net` → :9350 (Note Station)

### Mac mini (192.168.1.122)
- `jellyfin.hyungi.net` → :8096

## AI access policy

- Ollama/AI runtimes: internal network only (127.0.0.1:11434)
- External AI access: only through the gpu-hub-api auth gateway
- `gpu.hyungi.net`: retired (internal network / Tailscale only)

## Directory layout

```
home-gateway/
├── docker-compose.yml
├── backends.json          # gpu-hub AI backend config
├── caddy/
│   ├── Caddyfile          # reverse proxy config (12 subdomains)
│   └── logs/              # Caddy JSON logs (fed to fail2ban)
├── fail2ban/
│   ├── jail.local
│   └── data/filter.d/     # custom filters for Caddy
├── ddns/
│   └── .env               # Cloudflare API key
├── hub-api/               # GPU Hub FastAPI backend
└── hub-web/               # GPU Hub React frontend
```

## Related standalone services (separate compose files)

- `~/qdrant/` — Qdrant vector DB (127.0.0.1:6333)
- `~/ollama/` — Ollama GPU inference (127.0.0.1:11434)

## Migration history

- 2026-04-05: full migration from Mac mini (OrbStack) to the GPU server
- nginx consolidated into Caddy
- manually managed Let's Encrypt → automatic HTTPS via Caddy
- Cloudflare DDNS API key split out into .env
- fail2ban switched from nginx filters to Caddy JSON filters
backends.json (new file, 22 lines)
@@ -0,0 +1,22 @@
[
  {
    "id": "ollama-gpu",
    "type": "ollama",
    "url": "http://host.docker.internal:11434",
    "models": [
      { "id": "bge-m3", "capabilities": ["embed"], "priority": 1 }
    ],
    "access": "all",
    "rate_limit": null
  },
  {
    "id": "mlx-mac",
    "type": "openai-compat",
    "url": "http://192.168.1.122:8800",
    "models": [
      { "id": "qwen3.5:35b-a3b", "backend_model_id": "mlx-community/Qwen3.5-35B-A3B-4bit", "capabilities": ["chat"], "priority": 1 }
    ],
    "access": "all",
    "rate_limit": null
  }
]
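Each entry in backends.json maps user-facing model IDs to a backend with a priority. A minimal sketch of how a registry might resolve a model ID against this file (a simplified assumption about what hub-api's `registry.resolve_model` does; the real implementation also filters by role and backend health):

```python
import json

# Inline copy of the backends.json shape above (assumption: trimmed to the
# fields that matter for resolution).
BACKENDS = json.loads("""
[
  {"id": "ollama-gpu", "type": "ollama",
   "models": [{"id": "bge-m3", "capabilities": ["embed"], "priority": 1}]},
  {"id": "mlx-mac", "type": "openai-compat",
   "models": [{"id": "qwen3.5:35b-a3b",
               "backend_model_id": "mlx-community/Qwen3.5-35B-A3B-4bit",
               "capabilities": ["chat"], "priority": 1}]}
]
""")

def resolve_model(model_id: str):
    """Return (backend, model) for the best backend serving model_id, or None."""
    candidates = [
        (b, m)
        for b in BACKENDS
        for m in b["models"]
        if m["id"] == model_id
    ]
    # Lower priority number wins, mirroring the "priority" field.
    candidates.sort(key=lambda bm: bm[1]["priority"])
    return candidates[0] if candidates else None

backend, model = resolve_model("bge-m3")
print(backend["id"])  # ollama-gpu
```

The `backend_model_id` field is how `qwen3.5:35b-a3b` gets rewritten to the MLX checkpoint name before the request is proxied.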
caddy/Caddyfile (new file, 168 lines)
@@ -0,0 +1,168 @@
{
    # Global options
    log default {
        output file /var/log/caddy/access.log {
            roll_size 100MiB
            roll_keep 5
        }
        format json
    }
    servers {
        trusted_proxies static 173.245.48.0/20 103.21.244.0/22 103.22.200.0/22 103.31.4.0/22 104.16.0.0/13 104.24.0.0/14 108.162.192.0/18 131.0.72.0/22 141.101.64.0/18 162.158.0.0/15 172.64.0.0/13 188.114.96.0/20 190.93.240.0/20 197.234.240.0/22 198.41.128.0/17 2400:cb00::/32 2606:4700::/32 2803:f800::/32 2405:b500::/32 2405:8100::/32 2a06:98c0::/29 2c0f:f248::/32
    }
}

# ============================================================
# GPU Hub — default route (direct IP access, no HTTPS)
# ============================================================
:80 {
    handle /v1/* {
        reverse_proxy gpu-hub-api:8000 {
            flush_interval -1
        }
    }
    handle /auth/* {
        reverse_proxy gpu-hub-api:8000
    }
    handle /health {
        reverse_proxy gpu-hub-api:8000
    }
    handle /health/* {
        reverse_proxy gpu-hub-api:8000
    }
    handle /gpu {
        reverse_proxy gpu-hub-api:8000
    }
    handle {
        reverse_proxy gpu-hub-web:80
    }
}

# ============================================================
# AI Gateway — authenticated external access
# ============================================================
ai.hyungi.net {
    reverse_proxy gpu-hub-api:8000 {
        flush_interval -1
    }
}

# ============================================================
# Jellyfin — Mac mini (192.168.1.122)
# ============================================================
jellyfin.hyungi.net {
    reverse_proxy 192.168.1.122:8096 {
        transport http {
            read_timeout 300s
            write_timeout 300s
        }
    }
}

# ============================================================
# Komga — GPU local
# ============================================================
komga.hyungi.net {
    reverse_proxy host.docker.internal:25600
}

# ============================================================
# Document Server — GPU local (via internal Caddy; switching to direct routing in Phase 6)
# ============================================================
document.hyungi.net {
    request_body {
        max_size 100MB
    }
    reverse_proxy host.docker.internal:8080
}

# ============================================================
# WebDAV — NAS (192.168.1.227)
# ============================================================
webdav.hyungi.net {
    request_body {
        max_size 2GB
    }
    reverse_proxy https://192.168.1.227:5006 {
        transport http {
            tls_insecure_skip_verify
            read_timeout 600s
            write_timeout 600s
        }
        header_up Host {host}
        header_up X-Real-IP {remote_host}
        header_up X-Forwarded-For {remote_host}
        header_up X-Forwarded-Proto {scheme}
    }
}

# ============================================================
# DSM — NAS
# ============================================================
ds1525.hyungi.net {
    request_body {
        max_size 0
    }
    reverse_proxy 192.168.1.227:5000
}

# ============================================================
# Gitea — NAS
# ============================================================
git.hyungi.net {
    request_body {
        max_size 512MB
    }
    reverse_proxy 192.168.1.227:10300
}

# ============================================================
# Vaultwarden — NAS (WebSocket)
# ============================================================
vault.hyungi.net {
    reverse_proxy 192.168.1.227:8443
}

# ============================================================
# Synology Drive — NAS (WebSocket, unlimited upload)
# ============================================================
link.hyungi.net {
    request_body {
        max_size 0
    }
    reverse_proxy 192.168.1.227:10002
}

# ============================================================
# MailPlus — NAS
# ============================================================
mailplus.hyungi.net {
    request_body {
        max_size 100MB
    }
    reverse_proxy 192.168.1.227:21680
}

# ============================================================
# Contacts — NAS
# ============================================================
contacts.hyungi.net {
    reverse_proxy 192.168.1.227:25555
}

# ============================================================
# Calendar — NAS
# ============================================================
calendar.hyungi.net {
    reverse_proxy 192.168.1.227:20002
}

# ============================================================
# Note Station — NAS (WebSocket, unlimited upload)
# ============================================================
note.hyungi.net {
    request_body {
        max_size 0
    }
    reverse_proxy 192.168.1.227:9350
}
docker-compose.yml (new file, 105 lines)
@@ -0,0 +1,105 @@
services:
  # ============================================================
  # Edge Layer — Reverse Proxy + Security + DDNS
  # ============================================================
  home-caddy:
    image: caddy:2-alpine
    container_name: home-caddy
    restart: unless-stopped
    ports:
      - "80:80"
      - "443:443"
      - "443:443/udp"
    volumes:
      - ./caddy/Caddyfile:/etc/caddy/Caddyfile:ro
      - ./caddy/logs:/var/log/caddy
      - caddy_data:/data
      - caddy_config:/config
    extra_hosts:
      - "host.docker.internal:host-gateway"
    depends_on:
      gpu-hub-api:
        condition: service_healthy
    networks:
      - gateway-net

  home-fail2ban:
    image: crazymax/fail2ban:latest
    container_name: home-fail2ban
    restart: unless-stopped
    network_mode: host
    cap_add:
      - NET_ADMIN
      - NET_RAW
    volumes:
      - ./fail2ban/data:/data
      - ./caddy/logs:/var/log/caddy:ro
      - ./fail2ban/jail.local:/etc/fail2ban/jail.local:ro
    environment:
      - TZ=Asia/Seoul
      - F2B_LOG_LEVEL=INFO

  home-ddns-vpn:
    image: oznu/cloudflare-ddns:latest
    container_name: home-ddns-vpn
    restart: unless-stopped
    env_file:
      - ./ddns/.env
    environment:
      - ZONE=hyungi.net
      - SUBDOMAIN=vpn
      - PROXIED=false

  home-ddns-mail:
    image: oznu/cloudflare-ddns:latest
    container_name: home-ddns-mail
    restart: unless-stopped
    env_file:
      - ./ddns/.env
    environment:
      - ZONE=hyungi.net
      - SUBDOMAIN=mail
      - PROXIED=false

  # ============================================================
  # GPU Hub — AI Backend Router + Web UI
  # ============================================================
  gpu-hub-api:
    build: ./hub-api
    container_name: gpu-hub-api
    restart: unless-stopped
    environment:
      - OWNER_PASSWORD=${OWNER_PASSWORD}
      - GUEST_PASSWORD=${GUEST_PASSWORD}
      - JWT_SECRET=${JWT_SECRET}
      - BACKENDS_CONFIG=/app/config/backends.json
      - CORS_ORIGINS=${CORS_ORIGINS:-http://localhost:5173}
      - DB_PATH=/app/data/gateway.db
    volumes:
      - hub_data:/app/data
      - ./backends.json:/app/config/backends.json:ro
    extra_hosts:
      - "host.docker.internal:host-gateway"
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:8000/health"]
      interval: 15s
      timeout: 5s
      retries: 3
    networks:
      - gateway-net

  gpu-hub-web:
    build: ./hub-web
    container_name: gpu-hub-web
    restart: unless-stopped
    networks:
      - gateway-net

volumes:
  caddy_data:
  caddy_config:
  hub_data:

networks:
  gateway-net:
    name: home-gateway-network
fail2ban/filter.d/caddy-auth.conf (new file, 4 lines)
@@ -0,0 +1,4 @@
[Definition]
failregex = ^.*"client_ip":"<HOST>".*"status":\s*401.*$
ignoreregex =
datepattern = "ts":{EPOCH}
fail2ban/filter.d/caddy-botsearch.conf (new file, 4 lines)
@@ -0,0 +1,4 @@
[Definition]
failregex = ^.*"client_ip":"<HOST>".*"status":\s*(403|404|444).*$
ignoreregex =
datepattern = "ts":{EPOCH}
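Both filters match Caddy's single-line JSON access log entries by status code. A quick way to sanity-check the regex idea outside fail2ban (fail2ban substitutes `<HOST>` with its own IP pattern; the capture group and the sample log line below are stand-ins for illustration):

```python
import re

# Stand-in for the caddy-botsearch failregex, with <HOST> replaced by a
# named capture group (assumption: fail2ban's <HOST> expands similarly).
FAILREGEX = re.compile(
    r'^.*"client_ip":"(?P<host>[^"]+)".*"status":\s*(403|404|444).*$'
)

# Illustrative Caddy JSON access-log line (field order matches real logs:
# "request" object before "status").
sample = '{"ts":1733389200.5,"request":{"client_ip":"203.0.113.7"},"status":404}'

m = FAILREGEX.match(sample)
print(m.group("host"))  # 203.0.113.7
```

The `datepattern = "ts":{EPOCH}` line tells fail2ban to read the epoch timestamp from Caddy's `ts` field instead of trying to parse a textual date.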
fail2ban/jail.local (new file, 27 lines)
@@ -0,0 +1,27 @@
[DEFAULT]
bantime = 3600
findtime = 600
maxretry = 5
backend = auto
enabled = false

[sshd]
enabled = false

# Block bots/scanners hitting Caddy (repeated 404/403)
[caddy-botsearch]
enabled = true
port = 80,443
filter = caddy-botsearch
logpath = /var/log/caddy/access.log
maxretry = 2
bantime = 86400

# Block auth failures against Caddy (repeated 401)
[caddy-auth]
enabled = true
port = 80,443
filter = caddy-auth
logpath = /var/log/caddy/access.log
maxretry = 3
bantime = 1800
hub-api/Dockerfile (new file, 16 lines)
@@ -0,0 +1,16 @@
FROM python:3.12-slim

RUN apt-get update && apt-get install -y --no-install-recommends curl && rm -rf /var/lib/apt/lists/*

WORKDIR /app

COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

COPY . .

RUN mkdir -p /app/data

EXPOSE 8000

CMD ["uvicorn", "main:app", "--host", "0.0.0.0", "--port", "8000"]
hub-api/config.py (new file, 21 lines)
@@ -0,0 +1,21 @@
from pydantic_settings import BaseSettings


class Settings(BaseSettings):
    owner_password: str = "changeme"
    guest_password: str = "guest"
    jwt_secret: str = "dev-secret-change-in-production"
    jwt_algorithm: str = "HS256"
    jwt_expire_hours: int = 24

    backends_config: str = "/app/config/backends.json"
    cors_origins: str = "http://localhost:5173"

    nvidia_smi_path: str = "/usr/bin/nvidia-smi"

    db_path: str = "/app/data/gateway.db"

    model_config = {"env_file": ".env", "extra": "ignore"}


settings = Settings()
hub-api/db/__init__.py (new file, empty)
hub-api/db/database.py (new file, 50 lines)
@@ -0,0 +1,50 @@
import aiosqlite

from config import settings

SCHEMA = """
CREATE TABLE IF NOT EXISTS chat_sessions (
    id TEXT PRIMARY KEY,
    title TEXT,
    model TEXT NOT NULL,
    role TEXT NOT NULL DEFAULT 'guest',
    created_at REAL NOT NULL
);

CREATE TABLE IF NOT EXISTS chat_messages (
    id TEXT PRIMARY KEY,
    session_id TEXT NOT NULL REFERENCES chat_sessions(id),
    role TEXT NOT NULL,
    content TEXT NOT NULL,
    created_at REAL NOT NULL
);

CREATE TABLE IF NOT EXISTS usage_logs (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    backend_id TEXT NOT NULL,
    model TEXT NOT NULL,
    prompt_tokens INTEGER DEFAULT 0,
    completion_tokens INTEGER DEFAULT 0,
    latency_ms REAL DEFAULT 0,
    user_role TEXT NOT NULL DEFAULT 'guest',
    created_at REAL NOT NULL
);

CREATE INDEX IF NOT EXISTS idx_messages_session ON chat_messages(session_id);
CREATE INDEX IF NOT EXISTS idx_usage_created ON usage_logs(created_at);
"""


async def init_db():
    """Initialize SQLite database with WAL mode and schema."""
    async with aiosqlite.connect(settings.db_path) as db:
        await db.execute("PRAGMA journal_mode=WAL")
        await db.executescript(SCHEMA)
        await db.commit()


async def get_db() -> aiosqlite.Connection:
    """Get a database connection."""
    db = await aiosqlite.connect(settings.db_path)
    await db.execute("PRAGMA journal_mode=WAL")
    return db
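The schema is plain SQLite, so it can be exercised with the stdlib `sqlite3` driver even though the service uses `aiosqlite` (both execute the same SQL). A small sketch inserting one session and one message against the sessions/messages tables (the demo values are illustrative):

```python
import sqlite3
import time
import uuid

# Trimmed copy of the chat tables from hub-api/db/database.py.
SCHEMA = """
CREATE TABLE IF NOT EXISTS chat_sessions (
    id TEXT PRIMARY KEY,
    title TEXT,
    model TEXT NOT NULL,
    role TEXT NOT NULL DEFAULT 'guest',
    created_at REAL NOT NULL
);
CREATE TABLE IF NOT EXISTS chat_messages (
    id TEXT PRIMARY KEY,
    session_id TEXT NOT NULL REFERENCES chat_sessions(id),
    role TEXT NOT NULL,
    content TEXT NOT NULL,
    created_at REAL NOT NULL
);
CREATE INDEX IF NOT EXISTS idx_messages_session ON chat_messages(session_id);
"""

db = sqlite3.connect(":memory:")
db.executescript(SCHEMA)

sid = str(uuid.uuid4())
db.execute(
    "INSERT INTO chat_sessions (id, title, model, created_at) VALUES (?, ?, ?, ?)",
    (sid, "demo", "qwen3.5:35b-a3b", time.time()),
)
db.execute(
    "INSERT INTO chat_messages (id, session_id, role, content, created_at) VALUES (?, ?, ?, ?, ?)",
    (str(uuid.uuid4()), sid, "user", "hello", time.time()),
)

count, = db.execute(
    "SELECT COUNT(*) FROM chat_messages WHERE session_id = ?", (sid,)
).fetchone()
print(count)  # 1
```

Note that `role` lives on both tables: on `chat_sessions` it records who owns the session (owner/guest), on `chat_messages` it is the chat turn role (user/assistant).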
hub-api/db/models.py (new file, 2 lines)
@@ -0,0 +1,2 @@
# DB model helpers — used in Phase 3 for logging
# Schema defined in database.py
hub-api/main.py (new file, 46 lines)
@@ -0,0 +1,46 @@
from contextlib import asynccontextmanager

from fastapi import FastAPI
from fastapi.middleware.cors import CORSMiddleware

from config import settings
from middleware.auth import AuthMiddleware
from routers import auth, chat, embeddings, gpu, health, models
from services.registry import registry


@asynccontextmanager
async def lifespan(app: FastAPI):
    await registry.load_backends(settings.backends_config)
    registry.start_health_loop()
    yield
    registry.stop_health_loop()


app = FastAPI(
    title="AI Gateway",
    version="0.1.0",
    lifespan=lifespan,
)

app.add_middleware(
    CORSMiddleware,
    allow_origins=settings.cors_origins.split(","),
    allow_credentials=True,
    allow_methods=["*"],
    allow_headers=["*"],
)

app.add_middleware(AuthMiddleware)

app.include_router(auth.router)
app.include_router(chat.router)
app.include_router(models.router)
app.include_router(embeddings.router)
app.include_router(health.router)
app.include_router(gpu.router)


@app.get("/")
async def root():
    return {"service": "AI Gateway", "version": "0.1.0"}
hub-api/middleware/__init__.py (new file, empty)
hub-api/middleware/auth.py (new file, 96 lines)
@@ -0,0 +1,96 @@
from __future__ import annotations

import time

from jose import JWTError, jwt
from starlette.middleware.base import BaseHTTPMiddleware
from starlette.requests import Request

from config import settings

# Paths that don't require authentication
PUBLIC_PATHS = {"/", "/health", "/auth/login", "/docs", "/openapi.json"}
PUBLIC_PREFIXES = ("/health/",)


class AuthMiddleware(BaseHTTPMiddleware):
    async def dispatch(self, request: Request, call_next):
        path = request.url.path

        # Skip auth for public paths
        if path in PUBLIC_PATHS or any(path.startswith(p) for p in PUBLIC_PREFIXES):
            request.state.role = "anonymous"
            return await call_next(request)

        # Skip auth for OPTIONS (CORS preflight)
        if request.method == "OPTIONS":
            return await call_next(request)

        # Try Bearer token first, then cookie
        token = _extract_token(request)
        if not token:
            request.state.role = "anonymous"
            return await call_next(request)

        # Verify JWT
        payload = _verify_token(token)
        if payload:
            request.state.role = payload.get("role", "guest")
        else:
            request.state.role = "anonymous"

        return await call_next(request)


def create_token(role: str) -> str:
    payload = {
        "role": role,
        "exp": time.time() + settings.jwt_expire_hours * 3600,
        "iat": time.time(),
    }
    return jwt.encode(payload, settings.jwt_secret, algorithm=settings.jwt_algorithm)


def _extract_token(request: Request) -> str | None:
    # 1. Authorization: Bearer header
    auth_header = request.headers.get("authorization", "")
    if auth_header.startswith("Bearer "):
        return auth_header[7:]

    # 2. httpOnly cookie
    return request.cookies.get("token")


def _verify_token(token: str) -> dict | None:
    try:
        payload = jwt.decode(
            token, settings.jwt_secret, algorithms=[settings.jwt_algorithm]
        )
        if payload.get("exp", 0) < time.time():
            return None
        return payload
    except JWTError:
        return None


# Login rate limiting (IP-based)
_login_attempts: dict[str, list[float]] = {}
MAX_ATTEMPTS = 5
LOCKOUT_SECONDS = 60


def check_login_rate_limit(ip: str) -> bool:
    """Returns True if login is allowed for this IP."""
    now = time.time()
    attempts = _login_attempts.get(ip, [])
    # Clean old attempts
    attempts = [t for t in attempts if now - t < LOCKOUT_SECONDS]
    _login_attempts[ip] = attempts
    return len(attempts) < MAX_ATTEMPTS


def record_login_attempt(ip: str):
    now = time.time()
    if ip not in _login_attempts:
        _login_attempts[ip] = []
    _login_attempts[ip].append(now)
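The login limiter is a plain in-memory sliding window: at most MAX_ATTEMPTS attempts per IP inside LOCKOUT_SECONDS, with stale timestamps pruned on every check. A self-contained copy of those two functions showing the intended call pattern (check, then record), with an illustrative IP:

```python
import time

# In-memory attempt log, as in middleware/auth.py: IP -> attempt timestamps.
_login_attempts: dict[str, list[float]] = {}
MAX_ATTEMPTS = 5
LOCKOUT_SECONDS = 60


def check_login_rate_limit(ip: str) -> bool:
    """True if a login attempt is currently allowed for this IP."""
    now = time.time()
    # Drop attempts older than the window, then compare against the cap.
    attempts = [t for t in _login_attempts.get(ip, []) if now - t < LOCKOUT_SECONDS]
    _login_attempts[ip] = attempts
    return len(attempts) < MAX_ATTEMPTS


def record_login_attempt(ip: str) -> None:
    _login_attempts.setdefault(ip, []).append(time.time())


# Five attempts are allowed; the sixth within the window is blocked.
for _ in range(5):
    assert check_login_rate_limit("198.51.100.9")
    record_login_attempt("198.51.100.9")

print(check_login_rate_limit("198.51.100.9"))  # False
```

Because the dict lives in process memory, limits reset on container restart and are not shared across workers; that is consistent with running a single uvicorn process, as the Dockerfile does.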
hub-api/middleware/rate_limit.py (new file, 18 lines)
@@ -0,0 +1,18 @@
from fastapi import HTTPException

from services.registry import registry


def check_backend_rate_limit(backend_id: str):
    """Raise 429 if rate limit exceeded for this backend."""
    if not registry.check_rate_limit(backend_id):
        raise HTTPException(
            status_code=429,
            detail={
                "error": {
                    "message": f"Rate limit exceeded for backend '{backend_id}'",
                    "type": "rate_limit_error",
                    "code": "rate_limit_exceeded",
                }
            },
        )
hub-api/requirements.txt (new file, 7 lines)
@@ -0,0 +1,7 @@
fastapi==0.115.0
uvicorn[standard]==0.30.0
httpx==0.27.0
pydantic-settings==2.5.0
python-jose[cryptography]==3.3.0
python-multipart==0.0.9
aiosqlite==0.20.0
hub-api/routers/__init__.py (new file, empty)
hub-api/routers/auth.py (new file, 79 lines)
@@ -0,0 +1,79 @@
from fastapi import APIRouter, Request, Response
from pydantic import BaseModel

from config import settings
from middleware.auth import (
    check_login_rate_limit,
    create_token,
    record_login_attempt,
)

router = APIRouter(prefix="/auth", tags=["auth"])


class LoginRequest(BaseModel):
    password: str


class LoginResponse(BaseModel):
    role: str
    token: str


@router.post("/login")
async def login(body: LoginRequest, request: Request, response: Response):
    ip = request.client.host if request.client else "unknown"

    if not check_login_rate_limit(ip):
        return _error_response(429, "Too many login attempts. Try again in 1 minute.")

    record_login_attempt(ip)

    if body.password == settings.owner_password:
        role = "owner"
    elif body.password == settings.guest_password:
        role = "guest"
    else:
        return _error_response(401, "Invalid password")

    token = create_token(role)

    # Set httpOnly cookie for web UI
    response.set_cookie(
        key="token",
        value=token,
        httponly=True,
        samesite="lax",
        max_age=settings.jwt_expire_hours * 3600,
    )

    return LoginResponse(role=role, token=token)


@router.get("/me")
async def me(request: Request):
    role = getattr(request.state, "role", "anonymous")
    if role == "anonymous":
        return _error_response(401, "Not authenticated")
    return {"role": role}


@router.post("/logout")
async def logout(response: Response):
    response.delete_cookie("token")
    return {"ok": True}


def _error_response(status_code: int, message: str):
    from fastapi.responses import JSONResponse

    return JSONResponse(
        status_code=status_code,
        content={
            "error": {
                "message": message,
                "type": "auth_error",
                "code": f"auth_{status_code}",
            }
        },
    )
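Error responses across the routers use one OpenAI-style envelope: an `error` object with `message`, `type`, and `code`. A minimal sketch of the shape `_error_response` emits (the helper below is illustrative, not the service's code):

```python
import json

def error_body(status_code: int, message: str) -> dict:
    """Build the auth error envelope used by routers/auth.py."""
    return {
        "error": {
            "message": message,
            "type": "auth_error",
            "code": f"auth_{status_code}",
        }
    }

print(json.dumps(error_body(401, "Invalid password")))
```

Keeping this shape across `/auth`, `/v1/chat/completions`, and `/v1/embeddings` lets OpenAI-compatible clients parse gateway errors the same way they parse upstream ones.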
hub-api/routers/chat.py (new file, 112 lines)
@@ -0,0 +1,112 @@
from typing import List, Optional

from fastapi import APIRouter, HTTPException, Request
from fastapi.responses import JSONResponse, StreamingResponse
from pydantic import BaseModel

from middleware.rate_limit import check_backend_rate_limit
from services import proxy_ollama, proxy_openai
from services.registry import registry

router = APIRouter(prefix="/v1", tags=["chat"])


class ChatMessage(BaseModel):
    role: str
    content: str


class ChatRequest(BaseModel):
    model: str
    messages: List[ChatMessage]
    stream: bool = False
    temperature: Optional[float] = None
    max_tokens: Optional[int] = None


@router.post("/chat/completions")
async def chat_completions(body: ChatRequest, request: Request):
    role = getattr(request.state, "role", "anonymous")
    if role == "anonymous":
        raise HTTPException(
            status_code=401,
            detail={"error": {"message": "Authentication required", "type": "auth_error", "code": "unauthorized"}},
        )

    # Resolve model to backend
    result = registry.resolve_model(body.model, role)
    if not result:
        raise HTTPException(
            status_code=404,
            detail={
                "error": {
                    "message": f"Model '{body.model}' not found or not available",
                    "type": "invalid_request_error",
                    "code": "model_not_found",
                }
            },
        )

    backend, model_info = result

    # Check rate limit
    check_backend_rate_limit(backend.id)

    # Record request for rate limiting
    registry.record_request(backend.id)

    messages = [{"role": m.role, "content": m.content} for m in body.messages]
    kwargs = {}
    if body.temperature is not None:
        kwargs["temperature"] = body.temperature

    # Use backend-specific model ID if configured, otherwise use the user-facing ID
    actual_model = model_info.backend_model_id or body.model

    # Route to appropriate proxy
    if backend.type == "ollama":
        if body.stream:
            return StreamingResponse(
                proxy_ollama.stream_chat(
                    backend.url, actual_model, messages, **kwargs
                ),
                media_type="text/event-stream",
                headers={
                    "Cache-Control": "no-cache",
                    "X-Accel-Buffering": "no",
                },
            )
        else:
            result = await proxy_ollama.complete_chat(
                backend.url, actual_model, messages, **kwargs
            )
            return JSONResponse(content=result)

    if backend.type == "openai-compat":
        if body.stream:
            return StreamingResponse(
                proxy_openai.stream_chat(
                    backend.url, actual_model, messages, **kwargs
                ),
                media_type="text/event-stream",
                headers={
                    "Cache-Control": "no-cache",
                    "X-Accel-Buffering": "no",
                },
            )
        else:
            result = await proxy_openai.complete_chat(
                backend.url, actual_model, messages, **kwargs
            )
            return JSONResponse(content=result)

    raise HTTPException(
        status_code=501,
        detail={
            "error": {
                "message": f"Backend type '{backend.type}' not yet implemented",
                "type": "api_error",
                "code": "not_implemented",
            }
        },
    )
67 hub-api/routers/embeddings.py Normal file
@@ -0,0 +1,67 @@

from typing import List, Union

from fastapi import APIRouter, HTTPException, Request
from pydantic import BaseModel

from services import proxy_ollama
from services.registry import registry

router = APIRouter(prefix="/v1", tags=["embeddings"])


class EmbeddingRequest(BaseModel):
    model: str
    input: Union[str, List[str]]


@router.post("/embeddings")
async def create_embedding(body: EmbeddingRequest, request: Request):
    role = getattr(request.state, "role", "anonymous")
    if role == "anonymous":
        raise HTTPException(
            status_code=401,
            detail={"error": {"message": "Authentication required", "type": "auth_error", "code": "unauthorized"}},
        )

    result = registry.resolve_model(body.model, role)
    if not result:
        raise HTTPException(
            status_code=404,
            detail={
                "error": {
                    "message": f"Model '{body.model}' not found or not available",
                    "type": "invalid_request_error",
                    "code": "model_not_found",
                }
            },
        )

    backend, model_info = result

    if "embed" not in model_info.capabilities:
        raise HTTPException(
            status_code=400,
            detail={
                "error": {
                    "message": f"Model '{body.model}' does not support embeddings",
                    "type": "invalid_request_error",
                    "code": "capability_mismatch",
                }
            },
        )

    if backend.type == "ollama":
        return await proxy_ollama.generate_embedding(
            backend.url, body.model, body.input
        )

    raise HTTPException(
        status_code=501,
        detail={
            "error": {
                "message": f"Embedding not supported for backend type '{backend.type}'",
                "type": "api_error",
                "code": "not_implemented",
            }
        },
    )
13 hub-api/routers/gpu.py Normal file
@@ -0,0 +1,13 @@

from fastapi import APIRouter

from services.gpu_monitor import get_gpu_info

router = APIRouter(tags=["gpu"])


@router.get("/gpu")
async def gpu_status():
    info = await get_gpu_info()
    if not info:
        return {"error": {"message": "GPU info unavailable", "type": "api_error", "code": "gpu_unavailable"}}
    return info
31 hub-api/routers/health.py Normal file
@@ -0,0 +1,31 @@

from fastapi import APIRouter

from services.gpu_monitor import get_gpu_info
from services.registry import registry

router = APIRouter(tags=["health"])


@router.get("/health")
async def health():
    gpu = await get_gpu_info()
    return {
        "status": "ok",
        "backends": registry.get_health_summary(),
        "gpu": gpu,
    }


@router.get("/health/{backend_id}")
async def backend_health(backend_id: str):
    backend = registry.backends.get(backend_id)
    if not backend:
        return {"error": {"message": f"Backend '{backend_id}' not found"}}

    return {
        "id": backend.id,
        "type": backend.type,
        "status": "healthy" if backend.healthy else "down",
        "models": [m.id for m in backend.models],
        "latency_ms": backend.latency_ms,
    }
12 hub-api/routers/models.py Normal file
@@ -0,0 +1,12 @@

from fastapi import APIRouter, Request

from services.registry import registry

router = APIRouter(prefix="/v1", tags=["models"])


@router.get("/models")
async def list_models(request: Request):
    role = getattr(request.state, "role", "anonymous")
    models = registry.list_models(role)
    return {"object": "list", "data": models}
0 hub-api/services/__init__.py Normal file
41 hub-api/services/gpu_monitor.py Normal file
@@ -0,0 +1,41 @@

from __future__ import annotations

import asyncio
import logging

from config import settings

logger = logging.getLogger(__name__)


async def get_gpu_info() -> dict | None:
    """Run nvidia-smi and parse GPU info."""
    try:
        proc = await asyncio.create_subprocess_exec(
            settings.nvidia_smi_path,
            "--query-gpu=utilization.gpu,temperature.gpu,memory.used,memory.total,power.draw,name",
            "--format=csv,noheader,nounits",
            stdout=asyncio.subprocess.PIPE,
            stderr=asyncio.subprocess.PIPE,
        )
        stdout, stderr = await asyncio.wait_for(proc.communicate(), timeout=5.0)

        if proc.returncode != 0:
            logger.debug("nvidia-smi failed: %s", stderr.decode())
            return None

        line = stdout.decode().strip().split("\n")[0]
        parts = [p.strip() for p in line.split(",")]
        if len(parts) < 6:
            return None

        return {
            "utilization": int(parts[0]),
            "temperature": int(parts[1]),
            "vram_used": int(parts[2]),
            "vram_total": int(parts[3]),
            "power_draw": float(parts[4]),
            "name": parts[5],
        }
    except (FileNotFoundError, asyncio.TimeoutError):
        return None
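The CSV parsing step of `get_gpu_info()` can be exercised without a GPU by feeding it a canned line. A minimal standalone sketch; the sample nvidia-smi output below is made up, not captured from the real server:

```python
from __future__ import annotations


def parse_smi_line(line: str) -> dict | None:
    # Same field order as the --query-gpu list in gpu_monitor.py:
    # utilization, temperature, memory.used, memory.total, power.draw, name
    parts = [p.strip() for p in line.split(",")]
    if len(parts) < 6:
        return None
    return {
        "utilization": int(parts[0]),
        "temperature": int(parts[1]),
        "vram_used": int(parts[2]),
        "vram_total": int(parts[3]),
        "power_draw": float(parts[4]),
        "name": parts[5],
    }


# Hypothetical sample line in csv,noheader,nounits format
sample = "42, 61, 8192, 24576, 180.5, NVIDIA GeForce RTX 4090"
print(parse_smi_line(sample))
```

A short or malformed line returns `None`, which the `/gpu` router surfaces as a `gpu_unavailable` error.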
156 hub-api/services/proxy_ollama.py Normal file
@@ -0,0 +1,156 @@

from __future__ import annotations

import json
import logging
from collections.abc import AsyncGenerator

import httpx

logger = logging.getLogger(__name__)


async def stream_chat(
    base_url: str,
    model: str,
    messages: list[dict],
    **kwargs,
) -> AsyncGenerator[str, None]:
    """Proxy Ollama chat streaming, converting NDJSON to OpenAI SSE format."""
    payload = {
        "model": model,
        "messages": messages,
        "stream": True,
        **{k: v for k, v in kwargs.items() if v is not None},
    }

    async with httpx.AsyncClient(timeout=120.0) as client:
        async with client.stream(
            "POST",
            f"{base_url}/api/chat",
            json=payload,
        ) as resp:
            if resp.status_code != 200:
                body = await resp.aread()
                error_msg = body.decode("utf-8", errors="replace")
                yield _error_event(f"Ollama error: {error_msg}")
                return

            async for line in resp.aiter_lines():
                if not line.strip():
                    continue
                try:
                    chunk = json.loads(line)
                except json.JSONDecodeError:
                    continue

                if chunk.get("done"):
                    # Final chunk — send [DONE]
                    yield "data: [DONE]\n\n"
                    return

                content = chunk.get("message", {}).get("content", "")
                if content:
                    openai_chunk = {
                        "id": "chatcmpl-gateway",
                        "object": "chat.completion.chunk",
                        "model": model,
                        "choices": [
                            {
                                "index": 0,
                                "delta": {"content": content},
                                "finish_reason": None,
                            }
                        ],
                    }
                    yield f"data: {json.dumps(openai_chunk)}\n\n"


async def complete_chat(
    base_url: str,
    model: str,
    messages: list[dict],
    **kwargs,
) -> dict:
    """Non-streaming Ollama chat, returns OpenAI-compatible response."""
    payload = {
        "model": model,
        "messages": messages,
        "stream": False,
        **{k: v for k, v in kwargs.items() if v is not None},
    }

    async with httpx.AsyncClient(timeout=120.0) as client:
        resp = await client.post(f"{base_url}/api/chat", json=payload)
        resp.raise_for_status()
        data = resp.json()

    return {
        "id": "chatcmpl-gateway",
        "object": "chat.completion",
        "model": model,
        "choices": [
            {
                "index": 0,
                "message": {
                    "role": "assistant",
                    "content": data.get("message", {}).get("content", ""),
                },
                "finish_reason": "stop",
            }
        ],
        "usage": {
            "prompt_tokens": data.get("prompt_eval_count", 0),
            "completion_tokens": data.get("eval_count", 0),
            "total_tokens": data.get("prompt_eval_count", 0)
            + data.get("eval_count", 0),
        },
    }


async def generate_embedding(
    base_url: str,
    model: str,
    input_text: str | list[str],
) -> dict:
    """Ollama embedding, returns OpenAI-compatible response."""
    texts = [input_text] if isinstance(input_text, str) else input_text

    async with httpx.AsyncClient(timeout=60.0) as client:
        resp = await client.post(
            f"{base_url}/api/embed",
            json={"model": model, "input": texts},
        )
        resp.raise_for_status()
        data = resp.json()

    embeddings_data = []
    raw_embeddings = data.get("embeddings", [])
    for i, emb in enumerate(raw_embeddings):
        embeddings_data.append({
            "object": "embedding",
            "embedding": emb,
            "index": i,
        })

    return {
        "object": "list",
        "data": embeddings_data,
        "model": model,
        "usage": {"prompt_tokens": 1, "total_tokens": 1},
    }


def _error_event(message: str) -> str:
    error = {
        "id": "chatcmpl-gateway",
        "object": "chat.completion.chunk",
        "model": "error",
        "choices": [
            {
                "index": 0,
                "delta": {"content": f"[Error] {message}"},
                "finish_reason": "stop",
            }
        ],
    }
    return f"data: {json.dumps(error)}\n\ndata: [DONE]\n\n"
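The per-chunk conversion inside `stream_chat()` can be isolated for a non-empty chunk: one Ollama NDJSON line becomes one OpenAI-style SSE event. A sketch; the input chunk below is a hypothetical example, not a recorded Ollama response:

```python
import json


def ndjson_to_sse(line: str, model: str) -> str:
    # Mirrors the conversion in proxy_ollama.stream_chat for one line
    chunk = json.loads(line)
    if chunk.get("done"):
        return "data: [DONE]\n\n"
    content = chunk.get("message", {}).get("content", "")
    openai_chunk = {
        "id": "chatcmpl-gateway",
        "object": "chat.completion.chunk",
        "model": model,
        "choices": [
            {"index": 0, "delta": {"content": content}, "finish_reason": None}
        ],
    }
    return f"data: {json.dumps(openai_chunk)}\n\n"


event = ndjson_to_sse('{"message": {"content": "Hello"}, "done": false}', "llama3")
print(event)
```

Unlike the generator above, this sketch emits a chunk even when `content` is empty; `stream_chat()` skips those.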
83 hub-api/services/proxy_openai.py Normal file
@@ -0,0 +1,83 @@

"""OpenAI-compatible proxy (MLX server, vLLM, etc.) — SSE passthrough."""

from __future__ import annotations

import json
import logging
from collections.abc import AsyncGenerator

import httpx

logger = logging.getLogger(__name__)


async def stream_chat(
    base_url: str,
    model: str,
    messages: list[dict],
    **kwargs,
) -> AsyncGenerator[str, None]:
    """Proxy OpenAI-compatible chat streaming. SSE passthrough with model field override."""
    payload = {
        "model": model,
        "messages": messages,
        "stream": True,
        **{k: v for k, v in kwargs.items() if v is not None},
    }

    async with httpx.AsyncClient(timeout=120.0) as client:
        async with client.stream(
            "POST",
            f"{base_url}/v1/chat/completions",
            json=payload,
        ) as resp:
            if resp.status_code != 200:
                body = await resp.aread()
                error_msg = body.decode("utf-8", errors="replace")
                yield _error_event(f"Backend error ({resp.status_code}): {error_msg}")
                return

            async for line in resp.aiter_lines():
                if not line.strip():
                    continue
                # Pass through SSE lines as-is (already in OpenAI format)
                if line.startswith("data: "):
                    yield f"{line}\n\n"
                elif line == "data: [DONE]":
                    yield "data: [DONE]\n\n"


async def complete_chat(
    base_url: str,
    model: str,
    messages: list[dict],
    **kwargs,
) -> dict:
    """Non-streaming OpenAI-compatible chat."""
    payload = {
        "model": model,
        "messages": messages,
        "stream": False,
        **{k: v for k, v in kwargs.items() if v is not None},
    }

    async with httpx.AsyncClient(timeout=120.0) as client:
        resp = await client.post(f"{base_url}/v1/chat/completions", json=payload)
        resp.raise_for_status()
        return resp.json()


def _error_event(message: str) -> str:
    error = {
        "id": "chatcmpl-gateway",
        "object": "chat.completion.chunk",
        "model": "error",
        "choices": [
            {
                "index": 0,
                "delta": {"content": f"[Error] {message}"},
                "finish_reason": "stop",
            }
        ],
    }
    return f"data: {json.dumps(error)}\n\ndata: [DONE]\n\n"
227 hub-api/services/registry.py Normal file
@@ -0,0 +1,227 @@

from __future__ import annotations

import asyncio
import json
import logging
import time
from dataclasses import dataclass, field
from pathlib import Path

import httpx

logger = logging.getLogger(__name__)


@dataclass
class ModelInfo:
    id: str
    capabilities: list[str]
    priority: int = 1
    backend_model_id: str = ""  # actual model ID sent to backend (if different from id)


@dataclass
class RateLimitConfig:
    rpm: int = 0
    rph: int = 0
    scope: str = "global"


@dataclass
class BackendInfo:
    id: str
    type: str  # "ollama", "openai-compat", "anthropic"
    url: str
    models: list[ModelInfo]
    access: str = "all"  # "all" or "owner"
    rate_limit: RateLimitConfig | None = None

    # runtime state
    healthy: bool = False
    last_check: float = 0
    latency_ms: float = 0


@dataclass
class RateLimitState:
    minute_timestamps: list[float] = field(default_factory=list)
    hour_timestamps: list[float] = field(default_factory=list)


class Registry:
    def __init__(self):
        self.backends: dict[str, BackendInfo] = {}
        self._health_task: asyncio.Task | None = None
        self._rate_limits: dict[str, RateLimitState] = {}

    async def load_backends(self, config_path: str):
        path = Path(config_path)
        if not path.exists():
            logger.warning("Backends config not found: %s", config_path)
            return

        with open(path) as f:
            data = json.load(f)

        for entry in data:
            models = [
                ModelInfo(
                    id=m["id"],
                    capabilities=m.get("capabilities", ["chat"]),
                    priority=m.get("priority", 1),
                    backend_model_id=m.get("backend_model_id", ""),
                )
                for m in entry.get("models", [])
            ]
            rl_data = entry.get("rate_limit")
            rate_limit = (
                RateLimitConfig(
                    rpm=rl_data.get("rpm", 0),
                    rph=rl_data.get("rph", 0),
                    scope=rl_data.get("scope", "global"),
                )
                if rl_data
                else None
            )
            backend = BackendInfo(
                id=entry["id"],
                type=entry["type"],
                url=entry["url"].rstrip("/"),
                models=models,
                access=entry.get("access", "all"),
                rate_limit=rate_limit,
            )
            self.backends[backend.id] = backend
            if rate_limit:
                self._rate_limits[backend.id] = RateLimitState()

        logger.info("Loaded %d backends", len(self.backends))

    def start_health_loop(self, interval: float = 30.0):
        self._health_task = asyncio.create_task(self._health_loop(interval))

    def stop_health_loop(self):
        if self._health_task:
            self._health_task.cancel()

    async def _health_loop(self, interval: float):
        while True:
            await self._check_all_backends()
            await asyncio.sleep(interval)

    async def _check_all_backends(self):
        async with httpx.AsyncClient(timeout=5.0) as client:
            tasks = [
                self._check_backend(client, backend)
                for backend in self.backends.values()
            ]
            await asyncio.gather(*tasks, return_exceptions=True)

    async def _check_backend(self, client: httpx.AsyncClient, backend: BackendInfo):
        try:
            start = time.monotonic()
            if backend.type == "ollama":
                resp = await client.get(f"{backend.url}/api/tags")
            elif backend.type in ("openai-compat", "anthropic"):
                resp = await client.get(f"{backend.url}/v1/models")
            else:
                resp = await client.get(f"{backend.url}/health")
            elapsed = (time.monotonic() - start) * 1000

            backend.healthy = resp.status_code < 500
            backend.latency_ms = round(elapsed, 1)
            backend.last_check = time.time()
        except Exception:
            backend.healthy = False
            backend.latency_ms = 0
            backend.last_check = time.time()
            logger.debug("Health check failed for %s", backend.id)

    def resolve_model(self, model_id: str, role: str) -> tuple[BackendInfo, ModelInfo] | None:
        """Find the best backend for a given model ID. Returns (backend, model) or None."""
        candidates: list[tuple[BackendInfo, ModelInfo, int]] = []

        for backend in self.backends.values():
            if not backend.healthy:
                continue
            if backend.access == "owner" and role != "owner":
                continue
            for model in backend.models:
                if model.id == model_id:
                    candidates.append((backend, model, model.priority))

        if not candidates:
            return None

        candidates.sort(key=lambda x: x[2])
        return candidates[0][0], candidates[0][1]

    def list_models(self, role: str) -> list[dict]:
        """List all available models for a given role."""
        result = []
        for backend in self.backends.values():
            if not backend.healthy:
                continue
            if backend.access == "owner" and role != "owner":
                continue
            for model in backend.models:
                result.append({
                    "id": model.id,
                    "object": "model",
                    "owned_by": backend.id,
                    "capabilities": model.capabilities,
                    "backend_id": backend.id,
                    "backend_status": "healthy" if backend.healthy else "down",
                })
        return result

    def check_rate_limit(self, backend_id: str) -> bool:
        """Check if a request to this backend is within rate limits. Returns True if allowed."""
        backend = self.backends.get(backend_id)
        if not backend or not backend.rate_limit:
            return True

        state = self._rate_limits.get(backend_id)
        if not state:
            return True

        now = time.time()
        rl = backend.rate_limit

        # Clean old timestamps
        if rl.rpm > 0:
            state.minute_timestamps = [t for t in state.minute_timestamps if now - t < 60]
            if len(state.minute_timestamps) >= rl.rpm:
                return False

        if rl.rph > 0:
            state.hour_timestamps = [t for t in state.hour_timestamps if now - t < 3600]
            if len(state.hour_timestamps) >= rl.rph:
                return False

        return True

    def record_request(self, backend_id: str):
        """Record a request timestamp for rate limiting."""
        state = self._rate_limits.get(backend_id)
        if not state:
            return
        now = time.time()
        state.minute_timestamps.append(now)
        state.hour_timestamps.append(now)

    def get_health_summary(self) -> list[dict]:
        return [
            {
                "id": b.id,
                "type": b.type,
                "status": "healthy" if b.healthy else "down",
                "models": [m.id for m in b.models],
                "latency_ms": b.latency_ms,
                "last_check": b.last_check,
            }
            for b in self.backends.values()
        ]


registry = Registry()
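The sliding-window check in `Registry.check_rate_limit()` is easy to demonstrate standalone. A sketch that, for brevity, merges the check and `record_request()` into one `allow()` call (the Registry keeps them separate, and also tracks an hourly window the same way); the `rpm` values here are made-up examples:

```python
from __future__ import annotations

import time


class MinuteWindow:
    def __init__(self, rpm: int):
        self.rpm = rpm
        self.timestamps: list[float] = []

    def allow(self, now: float | None = None) -> bool:
        # Keep only timestamps from the last 60 seconds, refuse once
        # `rpm` requests are already in the window, otherwise record.
        now = time.time() if now is None else now
        self.timestamps = [t for t in self.timestamps if now - t < 60]
        if len(self.timestamps) >= self.rpm:
            return False
        self.timestamps.append(now)
        return True


w = MinuteWindow(rpm=2)
print(w.allow(now=0.0), w.allow(now=1.0), w.allow(now=2.0), w.allow(now=61.0))
# prints: True True False True
```

The third call is refused because two requests already sit inside the 60-second window; by t=61 both have aged out.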
24 hub-web/.gitignore vendored Normal file
@@ -0,0 +1,24 @@

# Logs
logs
*.log
npm-debug.log*
yarn-debug.log*
yarn-error.log*
pnpm-debug.log*
lerna-debug.log*

node_modules
dist
dist-ssr
*.local

# Editor directories and files
.vscode/*
!.vscode/extensions.json
.idea
.DS_Store
*.suo
*.ntvs*
*.njsproj
*.sln
*.sw?
12 hub-web/Dockerfile Normal file
@@ -0,0 +1,12 @@

FROM node:20-alpine AS build
WORKDIR /app
COPY package*.json ./
RUN npm ci
COPY . .
RUN npm run build

FROM nginx:alpine
COPY --from=build /app/dist /usr/share/nginx/html
COPY nginx.conf /etc/nginx/conf.d/default.conf
EXPOSE 80
CMD ["nginx", "-g", "daemon off;"]
73 hub-web/README.md Normal file
@@ -0,0 +1,73 @@

# React + TypeScript + Vite

This template provides a minimal setup to get React working in Vite with HMR and some ESLint rules.

Currently, two official plugins are available:

- [@vitejs/plugin-react](https://github.com/vitejs/vite-plugin-react/blob/main/packages/plugin-react) uses [Oxc](https://oxc.rs)
- [@vitejs/plugin-react-swc](https://github.com/vitejs/vite-plugin-react/blob/main/packages/plugin-react-swc) uses [SWC](https://swc.rs/)

## React Compiler

The React Compiler is not enabled on this template because of its impact on dev & build performance. To add it, see [this documentation](https://react.dev/learn/react-compiler/installation).

## Expanding the ESLint configuration

If you are developing a production application, we recommend updating the configuration to enable type-aware lint rules:

```js
export default defineConfig([
  globalIgnores(['dist']),
  {
    files: ['**/*.{ts,tsx}'],
    extends: [
      // Other configs...

      // Remove tseslint.configs.recommended and replace with this
      tseslint.configs.recommendedTypeChecked,
      // Alternatively, use this for stricter rules
      tseslint.configs.strictTypeChecked,
      // Optionally, add this for stylistic rules
      tseslint.configs.stylisticTypeChecked,

      // Other configs...
    ],
    languageOptions: {
      parserOptions: {
        project: ['./tsconfig.node.json', './tsconfig.app.json'],
        tsconfigRootDir: import.meta.dirname,
      },
      // other options...
    },
  },
])
```

You can also install [eslint-plugin-react-x](https://github.com/Rel1cx/eslint-react/tree/main/packages/plugins/eslint-plugin-react-x) and [eslint-plugin-react-dom](https://github.com/Rel1cx/eslint-react/tree/main/packages/plugins/eslint-plugin-react-dom) for React-specific lint rules:

```js
// eslint.config.js
import reactX from 'eslint-plugin-react-x'
import reactDom from 'eslint-plugin-react-dom'

export default defineConfig([
  globalIgnores(['dist']),
  {
    files: ['**/*.{ts,tsx}'],
    extends: [
      // Other configs...
      // Enable lint rules for React
      reactX.configs['recommended-typescript'],
      // Enable lint rules for React DOM
      reactDom.configs.recommended,
    ],
    languageOptions: {
      parserOptions: {
        project: ['./tsconfig.node.json', './tsconfig.app.json'],
        tsconfigRootDir: import.meta.dirname,
      },
      // other options...
    },
  },
])
```
23 hub-web/eslint.config.js Normal file
@@ -0,0 +1,23 @@

import js from '@eslint/js'
import globals from 'globals'
import reactHooks from 'eslint-plugin-react-hooks'
import reactRefresh from 'eslint-plugin-react-refresh'
import tseslint from 'typescript-eslint'
import { defineConfig, globalIgnores } from 'eslint/config'

export default defineConfig([
  globalIgnores(['dist']),
  {
    files: ['**/*.{ts,tsx}'],
    extends: [
      js.configs.recommended,
      tseslint.configs.recommended,
      reactHooks.configs.flat.recommended,
      reactRefresh.configs.vite,
    ],
    languageOptions: {
      ecmaVersion: 2020,
      globals: globals.browser,
    },
  },
])
13 hub-web/index.html Normal file
@@ -0,0 +1,13 @@

<!doctype html>
<html lang="en">
  <head>
    <meta charset="UTF-8" />
    <link rel="icon" type="image/svg+xml" href="/favicon.svg" />
    <meta name="viewport" content="width=device-width, initial-scale=1.0" />
    <title>hub-web</title>
  </head>
  <body>
    <div id="root"></div>
    <script type="module" src="/src/main.tsx"></script>
  </body>
</html>
9 hub-web/nginx.conf Normal file
@@ -0,0 +1,9 @@

server {
    listen 80;
    root /usr/share/nginx/html;
    index index.html;

    location / {
        try_files $uri $uri/ /index.html;
    }
}
4531 hub-web/package-lock.json generated Normal file
File diff suppressed because it is too large.
34 hub-web/package.json Normal file
@@ -0,0 +1,34 @@

{
  "name": "hub-web",
  "private": true,
  "version": "0.0.0",
  "type": "module",
  "scripts": {
    "dev": "vite",
    "build": "tsc -b && vite build",
    "lint": "eslint .",
    "preview": "vite preview"
  },
  "dependencies": {
    "react": "^19.2.4",
    "react-dom": "^19.2.4",
    "react-markdown": "^10.1.0",
    "react-router-dom": "^7.13.2"
  },
  "devDependencies": {
    "@eslint/js": "^9.39.4",
    "@tailwindcss/vite": "^4.2.2",
    "@types/node": "^24.12.0",
    "@types/react": "^19.2.14",
    "@types/react-dom": "^19.2.3",
    "@vitejs/plugin-react": "^6.0.1",
    "eslint": "^9.39.4",
    "eslint-plugin-react-hooks": "^7.0.1",
    "eslint-plugin-react-refresh": "^0.5.2",
    "globals": "^17.4.0",
    "tailwindcss": "^4.2.2",
    "typescript": "~5.9.3",
    "typescript-eslint": "^8.57.0",
    "vite": "^8.0.1"
  }
}
1 hub-web/public/favicon.svg Normal file
File diff suppressed because one or more lines are too long.
After Width: | Height: | Size: 9.3 KiB
24
hub-web/public/icons.svg
Normal file
24
hub-web/public/icons.svg
Normal file
@@ -0,0 +1,24 @@
|
|||||||
|
<svg xmlns="http://www.w3.org/2000/svg">
|
||||||
|
<symbol id="bluesky-icon" viewBox="0 0 16 17">
|
||||||
|
<g clip-path="url(#bluesky-clip)"><path fill="#08060d" d="M7.75 7.735c-.693-1.348-2.58-3.86-4.334-5.097-1.68-1.187-2.32-.981-2.74-.79C.188 2.065.1 2.812.1 3.251s.241 3.602.398 4.13c.52 1.744 2.367 2.333 4.07 2.145-2.495.37-4.71 1.278-1.805 4.512 3.196 3.309 4.38-.71 4.987-2.746.608 2.036 1.307 5.91 4.93 2.746 2.72-2.746.747-4.143-1.747-4.512 1.702.189 3.55-.4 4.07-2.145.156-.528.397-3.691.397-4.13s-.088-1.186-.575-1.406c-.42-.19-1.06-.395-2.741.79-1.755 1.24-3.64 3.752-4.334 5.099"/></g>
|
||||||
|
<defs><clipPath id="bluesky-clip"><path fill="#fff" d="M.1.85h15.3v15.3H.1z"/></clipPath></defs>
|
||||||
|
</symbol>
|
||||||
|
<symbol id="discord-icon" viewBox="0 0 20 19">
|
||||||
|
<path fill="#08060d" d="M16.224 3.768a14.5 14.5 0 0 0-3.67-1.153c-.158.286-.343.67-.47.976a13.5 13.5 0 0 0-4.067 0c-.128-.306-.317-.69-.476-.976A14.4 14.4 0 0 0 3.868 3.77C1.546 7.28.916 10.703 1.231 14.077a14.7 14.7 0 0 0 4.5 2.306q.545-.748.965-1.587a9.5 9.5 0 0 1-1.518-.74q.191-.14.372-.293c2.927 1.369 6.107 1.369 8.999 0q.183.152.372.294-.723.437-1.52.74.418.838.963 1.588a14.6 14.6 0 0 0 4.504-2.308c.37-3.911-.63-7.302-2.644-10.309m-9.13 8.234c-.878 0-1.599-.82-1.599-1.82 0-.998.705-1.82 1.6-1.82.894 0 1.614.82 1.599 1.82.001 1-.705 1.82-1.6 1.82m5.91 0c-.878 0-1.599-.82-1.599-1.82 0-.998.705-1.82 1.6-1.82.893 0 1.614.82 1.599 1.82 0 1-.706 1.82-1.6 1.82"/>
|
||||||
|
</symbol>
|
||||||
|
<symbol id="documentation-icon" viewBox="0 0 21 20">
|
||||||
|
<path fill="none" stroke="#aa3bff" stroke-linecap="round" stroke-linejoin="round" stroke-width="1.35" d="m15.5 13.333 1.533 1.322c.645.555.967.833.967 1.178s-.322.623-.967 1.179L15.5 18.333m-3.333-5-1.534 1.322c-.644.555-.966.833-.966 1.178s.322.623.966 1.179l1.534 1.321"/>
|
||||||
|
<path fill="none" stroke="#aa3bff" stroke-linecap="round" stroke-linejoin="round" stroke-width="1.35" d="M17.167 10.836v-4.32c0-1.41 0-2.117-.224-2.68-.359-.906-1.118-1.621-2.08-1.96-.599-.21-1.349-.21-2.848-.21-2.623 0-3.935 0-4.983.369-1.684.591-3.013 1.842-3.641 3.428C3 6.449 3 7.684 3 10.154v2.122c0 2.558 0 3.838.706 4.726q.306.383.713.671c.76.536 1.79.64 3.581.66"/>
<path fill="none" stroke="#aa3bff" stroke-linecap="round" stroke-linejoin="round" stroke-width="1.35" d="M3 10a2.78 2.78 0 0 1 2.778-2.778c.555 0 1.209.097 1.748-.047.48-.129.854-.503.982-.982.145-.54.048-1.194.048-1.749a2.78 2.78 0 0 1 2.777-2.777"/>
</symbol>
<symbol id="github-icon" viewBox="0 0 19 19">
<path fill="#08060d" fill-rule="evenodd" d="M9.356 1.85C5.05 1.85 1.57 5.356 1.57 9.694a7.84 7.84 0 0 0 5.324 7.44c.387.079.528-.168.528-.376 0-.182-.013-.805-.013-1.454-2.165.467-2.616-.935-2.616-.935-.349-.91-.864-1.143-.864-1.143-.71-.48.051-.48.051-.48.787.051 1.2.805 1.2.805.695 1.194 1.817.857 2.268.649.064-.507.27-.857.49-1.052-1.728-.182-3.545-.857-3.545-3.87 0-.857.31-1.558.8-2.104-.078-.195-.349-1 .077-2.078 0 0 .657-.208 2.14.805a7.5 7.5 0 0 1 1.946-.26c.657 0 1.328.092 1.946.26 1.483-1.013 2.14-.805 2.14-.805.426 1.078.155 1.883.078 2.078.502.546.799 1.247.799 2.104 0 3.013-1.818 3.675-3.558 3.87.284.247.528.714.528 1.454 0 1.052-.012 1.896-.012 2.156 0 .208.142.455.528.377a7.84 7.84 0 0 0 5.324-7.441c.013-4.338-3.48-7.844-7.773-7.844" clip-rule="evenodd"/>
</symbol>
<symbol id="social-icon" viewBox="0 0 20 20">
<path fill="none" stroke="#aa3bff" stroke-linecap="round" stroke-linejoin="round" stroke-width="1.35" d="M12.5 6.667a4.167 4.167 0 1 0-8.334 0 4.167 4.167 0 0 0 8.334 0"/>
<path fill="none" stroke="#aa3bff" stroke-linecap="round" stroke-linejoin="round" stroke-width="1.35" d="M2.5 16.667a5.833 5.833 0 0 1 8.75-5.053m3.837.474.513 1.035c.07.144.257.282.414.309l.93.155c.596.1.736.536.307.965l-.723.73a.64.64 0 0 0-.152.531l.207.903c.164.715-.213.991-.84.618l-.872-.52a.63.63 0 0 0-.577 0l-.872.52c-.624.373-1.003.094-.84-.618l.207-.903a.64.64 0 0 0-.152-.532l-.723-.729c-.426-.43-.289-.864.306-.964l.93-.156a.64.64 0 0 0 .412-.31l.513-1.034c.28-.562.735-.562 1.012 0"/>
</symbol>
<symbol id="x-icon" viewBox="0 0 19 19">
<path fill="#08060d" fill-rule="evenodd" d="M1.893 1.98c.052.072 1.245 1.769 2.653 3.77l2.892 4.114c.183.261.333.48.333.486s-.068.089-.152.183l-.522.593-.765.867-3.597 4.087c-.375.426-.734.834-.798.905a1 1 0 0 0-.118.148c0 .01.236.017.664.017h.663l.729-.83c.4-.457.796-.906.879-.999a692 692 0 0 0 1.794-2.038c.034-.037.301-.34.594-.675l.551-.624.345-.392a7 7 0 0 1 .34-.374c.006 0 .93 1.306 2.052 2.903l2.084 2.965.045.063h2.275c1.87 0 2.273-.003 2.266-.021-.008-.02-1.098-1.572-3.894-5.547-2.013-2.862-2.28-3.246-2.273-3.266.008-.019.282-.332 2.085-2.38l2-2.274 1.567-1.782c.022-.028-.016-.03-.65-.03h-.674l-.3.342a871 871 0 0 1-1.782 2.025c-.067.075-.405.458-.75.852a100 100 0 0 1-.803.91c-.148.172-.299.344-.99 1.127-.304.343-.32.358-.345.327-.015-.019-.904-1.282-1.976-2.808L6.365 1.85H1.8zm1.782.91 8.078 11.294c.772 1.08 1.413 1.973 1.425 1.984.016.017.241.02 1.05.017l1.03-.004-2.694-3.766L7.796 5.75 5.722 2.852l-1.039-.004-1.039-.004z" clip-rule="evenodd"/>
</symbol>
</svg>
After Width: | Height: | Size: 4.9 KiB |
61
hub-web/src/App.tsx
Normal file
@@ -0,0 +1,61 @@
import { useState, useEffect } from 'react';
import { BrowserRouter, Routes, Route, Navigate, NavLink } from 'react-router-dom';
import { AuthCtx } from './lib/auth';
import { getMe, logout } from './lib/api';
import Login from './pages/Login';
import Dashboard from './pages/Dashboard';
import Chat from './pages/Chat';

export default function App() {
  const [role, setRole] = useState<string | null>(null);
  const [loading, setLoading] = useState(true);

  useEffect(() => {
    getMe().then(me => {
      setRole(me?.role ?? null);
      setLoading(false);
    });
  }, []);

  if (loading) {
    return <div className="flex items-center justify-center h-screen text-[hsl(var(--muted-foreground))]">Loading...</div>;
  }

  if (!role) {
    return (
      <AuthCtx.Provider value={{ role, setRole }}>
        <Login />
      </AuthCtx.Provider>
    );
  }

  return (
    <AuthCtx.Provider value={{ role, setRole }}>
      <BrowserRouter>
        <div className="min-h-screen flex flex-col dark">
          <nav className="border-b border-[hsl(var(--border))] px-6 py-3 flex items-center gap-6">
            <span className="font-semibold text-lg">AI Gateway</span>
            <NavLink to="/" className={({ isActive }) => isActive ? 'text-[hsl(var(--foreground))]' : 'text-[hsl(var(--muted-foreground))] hover:text-[hsl(var(--foreground))]'}>Dashboard</NavLink>
            <NavLink to="/chat" className={({ isActive }) => isActive ? 'text-[hsl(var(--foreground))]' : 'text-[hsl(var(--muted-foreground))] hover:text-[hsl(var(--foreground))]'}>Chat</NavLink>
            <div className="ml-auto flex items-center gap-3">
              <span className="text-sm text-[hsl(var(--muted-foreground))]">{role}</span>
              <button
                onClick={async () => { await logout(); setRole(null); }}
                className="text-sm text-[hsl(var(--muted-foreground))] hover:text-[hsl(var(--foreground))]"
              >
                Logout
              </button>
            </div>
          </nav>
          <main className="flex-1">
            <Routes>
              <Route path="/" element={<Dashboard />} />
              <Route path="/chat" element={<Chat />} />
              <Route path="*" element={<Navigate to="/" />} />
            </Routes>
          </main>
        </div>
      </BrowserRouter>
    </AuthCtx.Provider>
  );
}
41
hub-web/src/index.css
Normal file
@@ -0,0 +1,41 @@
@import "tailwindcss";

:root {
  --background: 0 0% 100%;
  --foreground: 240 10% 3.9%;
  --card: 0 0% 100%;
  --card-foreground: 240 10% 3.9%;
  --muted: 240 4.8% 95.9%;
  --muted-foreground: 240 3.8% 46.1%;
  --border: 240 5.9% 90%;
  --primary: 240 5.9% 10%;
  --primary-foreground: 0 0% 98%;
  --destructive: 0 84.2% 60.2%;
  --ring: 240 5.9% 10%;
  --radius: 0.5rem;
}

.dark {
  --background: 240 10% 3.9%;
  --foreground: 0 0% 98%;
  --card: 240 10% 3.9%;
  --card-foreground: 0 0% 98%;
  --muted: 240 3.7% 15.9%;
  --muted-foreground: 240 5% 64.9%;
  --border: 240 3.7% 15.9%;
  --primary: 0 0% 98%;
  --primary-foreground: 240 5.9% 10%;
  --destructive: 0 62.8% 30.6%;
  --ring: 240 4.9% 83.9%;
}

* {
  border-color: hsl(var(--border));
}

body {
  margin: 0;
  background-color: hsl(var(--background));
  color: hsl(var(--foreground));
  font-family: system-ui, -apple-system, sans-serif;
}
127
hub-web/src/lib/api.ts
Normal file
@@ -0,0 +1,127 @@
const BASE = '';

// Store token in memory for Bearer auth (more reliable than cookies through proxies)
let _token: string | null = null;

export function setToken(token: string | null) {
  _token = token;
}

function authHeaders(): Record<string, string> {
  const h: Record<string, string> = { 'Content-Type': 'application/json' };
  if (_token) h['Authorization'] = `Bearer ${_token}`;
  return h;
}

export async function login(password: string): Promise<{ role: string; token: string }> {
  const res = await fetch(`${BASE}/auth/login`, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ password }),
  });
  if (!res.ok) {
    const err = await res.json().catch(() => null);
    throw new Error(err?.error?.message || 'Login failed');
  }
  const data = await res.json();
  _token = data.token;
  return data;
}

export async function getMe(): Promise<{ role: string } | null> {
  if (!_token) return null;
  const res = await fetch(`${BASE}/auth/me`, { headers: authHeaders() });
  if (!res.ok) { _token = null; return null; }
  return res.json();
}

export async function logout(): Promise<void> {
  await fetch(`${BASE}/auth/logout`, { method: 'POST', headers: authHeaders() });
  _token = null;
}

export interface Model {
  id: string;
  owned_by: string;
  capabilities: string[];
  backend_id: string;
  backend_status: string;
}

export async function getModels(): Promise<Model[]> {
  const res = await fetch(`${BASE}/v1/models`, { headers: authHeaders() });
  if (!res.ok) return [];
  const data = await res.json();
  return data.data || [];
}

export interface BackendHealth {
  id: string;
  type: string;
  status: string;
  models: string[];
  latency_ms: number;
}

export interface GpuInfo {
  utilization: number;
  temperature: number;
  vram_used: number;
  vram_total: number;
  power_draw: number;
  name: string;
}

export async function getHealth(): Promise<{ backends: BackendHealth[]; gpu: GpuInfo | null }> {
  const res = await fetch(`${BASE}/health`);
  if (!res.ok) return { backends: [], gpu: null };
  return res.json();
}

export interface ChatMessage {
  role: 'user' | 'assistant' | 'system';
  content: string;
}

export async function* streamChat(
  model: string,
  messages: ChatMessage[],
): AsyncGenerator<string, void> {
  const res = await fetch(`${BASE}/v1/chat/completions`, {
    method: 'POST',
    headers: authHeaders(),
    body: JSON.stringify({ model, messages, stream: true }),
  });

  if (!res.ok) {
    const err = await res.json().catch(() => null);
    throw new Error(err?.error?.message || `Chat failed: ${res.status}`);
  }

  const reader = res.body?.getReader();
  if (!reader) throw new Error('No response body');
  const decoder = new TextDecoder();
  let buffer = '';

  while (true) {
    const { done, value } = await reader.read();
    if (done) break;

    buffer += decoder.decode(value, { stream: true });
    const lines = buffer.split('\n');
    buffer = lines.pop() || '';

    for (const line of lines) {
      if (!line.startsWith('data: ')) continue;
      const data = line.slice(6).trim();
      if (data === '[DONE]') return;
      try {
        const parsed = JSON.parse(data);
        const content = parsed.choices?.[0]?.delta?.content;
        if (content) yield content;
      } catch {
        // skip malformed chunks
      }
    }
  }
}
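The SSE handling in streamChat above can be exercised in isolation. The sketch below uses a hypothetical `extractDeltas` helper (not part of this commit) that applies the same line-splitting and delta extraction to an already-buffered chunk:

```typescript
// Hypothetical helper mirroring streamChat's parsing loop: split an SSE
// buffer into "data: ..." lines and collect the delta content tokens.
function extractDeltas(buffer: string): string[] {
  const out: string[] = [];
  for (const line of buffer.split('\n')) {
    if (!line.startsWith('data: ')) continue;
    const data = line.slice(6).trim();
    if (data === '[DONE]') break; // end-of-stream sentinel
    try {
      const parsed = JSON.parse(data);
      const content = parsed.choices?.[0]?.delta?.content;
      if (content) out.push(content); // ignore role-only / empty deltas
    } catch {
      // skip malformed chunks, as streamChat does
    }
  }
  return out;
}

const sample =
  'data: {"choices":[{"delta":{"content":"Hel"}}]}\n' +
  'data: {"choices":[{"delta":{"content":"lo"}}]}\n' +
  'data: [DONE]\n';
console.log(extractDeltas(sample).join('')); // "Hello"
```

Keeping the trailing partial line in `buffer` (via `lines.pop()`) is what makes the real generator safe when a `data:` payload is split across two network reads.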
12
hub-web/src/lib/auth.ts
Normal file
@@ -0,0 +1,12 @@
import { createContext, useContext } from 'react';

export interface AuthContext {
  role: string | null;
  setRole: (role: string | null) => void;
}

export const AuthCtx = createContext<AuthContext>({ role: null, setRole: () => {} });

export function useAuth() {
  return useContext(AuthCtx);
}
10
hub-web/src/main.tsx
Normal file
@@ -0,0 +1,10 @@
import { StrictMode } from 'react'
import { createRoot } from 'react-dom/client'
import './index.css'
import App from './App.tsx'

createRoot(document.getElementById('root')!).render(
  <StrictMode>
    <App />
  </StrictMode>,
)
130
hub-web/src/pages/Chat.tsx
Normal file
@@ -0,0 +1,130 @@
import { useState, useEffect, useRef } from 'react';
import ReactMarkdown from 'react-markdown';
import type { Model, ChatMessage } from '../lib/api';
import { getModels, streamChat } from '../lib/api';

export default function Chat() {
  const [models, setModels] = useState<Model[]>([]);
  const [selectedModel, setSelectedModel] = useState('');
  const [messages, setMessages] = useState<ChatMessage[]>([]);
  const [input, setInput] = useState('');
  const [streaming, setStreaming] = useState(false);
  const bottomRef = useRef<HTMLDivElement>(null);

  useEffect(() => {
    getModels().then(mdls => {
      const chatModels = mdls.filter(m => m.capabilities.includes('chat'));
      setModels(chatModels);
      if (chatModels.length > 0 && !selectedModel) {
        setSelectedModel(chatModels[0].id);
      }
    });
  }, []);

  useEffect(() => {
    bottomRef.current?.scrollIntoView({ behavior: 'smooth' });
  }, [messages]);

  const handleSend = async () => {
    if (!input.trim() || !selectedModel || streaming) return;

    const userMsg: ChatMessage = { role: 'user', content: input.trim() };
    const newMessages = [...messages, userMsg];
    setMessages(newMessages);
    setInput('');
    setStreaming(true);

    const assistantMsg: ChatMessage = { role: 'assistant', content: '' };
    setMessages([...newMessages, assistantMsg]);

    try {
      for await (const chunk of streamChat(selectedModel, newMessages)) {
        assistantMsg.content += chunk;
        setMessages(prev => [...prev.slice(0, -1), { ...assistantMsg }]);
      }
    } catch (err) {
      assistantMsg.content += `\n\n[Error: ${err instanceof Error ? err.message : 'Unknown error'}]`;
      setMessages(prev => [...prev.slice(0, -1), { ...assistantMsg }]);
    } finally {
      setStreaming(false);
    }
  };

  const handleKeyDown = (e: React.KeyboardEvent) => {
    if (e.key === 'Enter' && !e.shiftKey) {
      e.preventDefault();
      handleSend();
    }
  };

  return (
    <div className="flex flex-col h-[calc(100vh-57px)]">
      {/* Header */}
      <div className="border-b border-[hsl(var(--border))] px-6 py-3 flex items-center gap-4">
        <select
          value={selectedModel}
          onChange={e => setSelectedModel(e.target.value)}
          className="px-3 py-1.5 rounded-md border border-[hsl(var(--border))] bg-[hsl(var(--background))] text-sm"
        >
          {models.map(m => (
            <option key={m.id} value={m.id}>{m.id} ({m.owned_by})</option>
          ))}
        </select>
        <button
          onClick={() => setMessages([])}
          className="text-sm text-[hsl(var(--muted-foreground))] hover:text-[hsl(var(--foreground))]"
        >
          Clear
        </button>
      </div>

      {/* Messages */}
      <div className="flex-1 overflow-y-auto px-6 py-4 space-y-4">
        {messages.length === 0 && (
          <div className="flex items-center justify-center h-full text-[hsl(var(--muted-foreground))]">
            Send a message to start
          </div>
        )}
        {messages.map((msg, i) => (
          <div key={i} className={`flex ${msg.role === 'user' ? 'justify-end' : 'justify-start'}`}>
            <div className={`max-w-[80%] rounded-lg px-4 py-2 ${
              msg.role === 'user'
                ? 'bg-[hsl(var(--primary))] text-[hsl(var(--primary-foreground))]'
                : 'bg-[hsl(var(--muted))]'
            }`}>
              {msg.role === 'assistant' ? (
                <div className="prose prose-sm prose-invert max-w-none">
                  <ReactMarkdown>{msg.content || '...'}</ReactMarkdown>
                </div>
              ) : (
                <p className="text-sm whitespace-pre-wrap">{msg.content}</p>
              )}
            </div>
          </div>
        ))}
        <div ref={bottomRef} />
      </div>

      {/* Input */}
      <div className="border-t border-[hsl(var(--border))] px-6 py-4">
        <div className="flex gap-3">
          <textarea
            value={input}
            onChange={e => setInput(e.target.value)}
            onKeyDown={handleKeyDown}
            placeholder="Type a message... (Enter to send, Shift+Enter for newline)"
            rows={1}
            className="flex-1 px-3 py-2 rounded-md border border-[hsl(var(--border))] bg-[hsl(var(--background))] text-[hsl(var(--foreground))] resize-none focus:outline-none focus:ring-2 focus:ring-[hsl(var(--ring))]"
          />
          <button
            onClick={handleSend}
            disabled={streaming || !input.trim()}
            className="px-4 py-2 rounded-md bg-[hsl(var(--primary))] text-[hsl(var(--primary-foreground))] hover:opacity-90 disabled:opacity-50"
          >
            {streaming ? '...' : 'Send'}
          </button>
        </div>
      </div>
    </div>
  );
}
96
hub-web/src/pages/Dashboard.tsx
Normal file
@@ -0,0 +1,96 @@
import { useState, useEffect } from 'react';
import type { BackendHealth, GpuInfo, Model } from '../lib/api';
import { getHealth, getModels } from '../lib/api';

export default function Dashboard() {
  const [backends, setBackends] = useState<BackendHealth[]>([]);
  const [gpu, setGpu] = useState<GpuInfo | null>(null);
  const [models, setModels] = useState<Model[]>([]);

  const refresh = async () => {
    const [health, mdls] = await Promise.all([getHealth(), getModels()]);
    setBackends(health.backends);
    setGpu(health.gpu);
    setModels(mdls);
  };

  useEffect(() => {
    refresh();
    const id = setInterval(refresh, 15000);
    return () => clearInterval(id);
  }, []);

  return (
    <div className="p-6 space-y-6">
      <div className="flex items-center justify-between">
        <h2 className="text-xl font-semibold">Backends</h2>
        <button onClick={refresh} className="text-sm text-[hsl(var(--muted-foreground))] hover:text-[hsl(var(--foreground))]">Refresh</button>
      </div>

      <div className="grid grid-cols-1 md:grid-cols-2 lg:grid-cols-3 gap-4">
        {backends.map(b => (
          <div key={b.id} className="rounded-lg border border-[hsl(var(--border))] p-4 space-y-2">
            <div className="flex items-center justify-between">
              <span className="font-medium">{b.id}</span>
              <span className={`text-xs px-2 py-0.5 rounded-full ${b.status === 'healthy' ? 'bg-green-500/20 text-green-400' : 'bg-red-500/20 text-red-400'}`}>
                {b.status}
              </span>
            </div>
            <div className="text-sm text-[hsl(var(--muted-foreground))]">{b.type}</div>
            <div className="text-sm">{b.models.join(', ')}</div>
            {b.latency_ms > 0 && <div className="text-xs text-[hsl(var(--muted-foreground))]">{b.latency_ms}ms</div>}
          </div>
        ))}
      </div>

      {gpu && (
        <>
          <h2 className="text-xl font-semibold">GPU</h2>
          <div className="rounded-lg border border-[hsl(var(--border))] p-4 grid grid-cols-2 md:grid-cols-4 gap-4">
            <Stat label="Utilization" value={`${gpu.utilization}%`} />
            <Stat label="Temperature" value={`${gpu.temperature}C`} />
            <Stat label="VRAM" value={`${gpu.vram_used}/${gpu.vram_total} MB`} />
            <Stat label="Power" value={`${gpu.power_draw}W`} />
          </div>
        </>
      )}

      <h2 className="text-xl font-semibold">Models</h2>
      <div className="rounded-lg border border-[hsl(var(--border))] overflow-hidden">
        <table className="w-full text-sm">
          <thead className="bg-[hsl(var(--muted))]">
            <tr>
              <th className="text-left px-4 py-2">Model</th>
              <th className="text-left px-4 py-2">Backend</th>
              <th className="text-left px-4 py-2">Capabilities</th>
              <th className="text-left px-4 py-2">Status</th>
            </tr>
          </thead>
          <tbody>
            {models.map(m => (
              <tr key={`${m.backend_id}-${m.id}`} className="border-t border-[hsl(var(--border))]">
                <td className="px-4 py-2 font-mono">{m.id}</td>
                <td className="px-4 py-2">{m.owned_by}</td>
                <td className="px-4 py-2">{m.capabilities.join(', ')}</td>
                <td className="px-4 py-2">
                  <span className={`text-xs px-2 py-0.5 rounded-full ${m.backend_status === 'healthy' ? 'bg-green-500/20 text-green-400' : 'bg-red-500/20 text-red-400'}`}>
                    {m.backend_status}
                  </span>
                </td>
              </tr>
            ))}
          </tbody>
        </table>
      </div>
    </div>
  );
}

function Stat({ label, value }: { label: string; value: string }) {
  return (
    <div>
      <div className="text-xs text-[hsl(var(--muted-foreground))]">{label}</div>
      <div className="text-lg font-semibold">{value}</div>
    </div>
  );
}
48
hub-web/src/pages/Login.tsx
Normal file
@@ -0,0 +1,48 @@
import { useState } from 'react';
import { login } from '../lib/api';
import { useAuth } from '../lib/auth';

export default function Login() {
  const { setRole } = useAuth();
  const [password, setPassword] = useState('');
  const [error, setError] = useState('');
  const [loading, setLoading] = useState(false);

  const handleSubmit = async (e: React.FormEvent) => {
    e.preventDefault();
    setError('');
    setLoading(true);
    try {
      const { role } = await login(password);
      setRole(role);
    } catch (err) {
      setError(err instanceof Error ? err.message : 'Login failed');
    } finally {
      setLoading(false);
    }
  };

  return (
    <div className="dark flex items-center justify-center min-h-screen bg-[hsl(var(--background))]">
      <form onSubmit={handleSubmit} className="w-80 space-y-4">
        <h1 className="text-2xl font-semibold text-center text-[hsl(var(--foreground))]">AI Gateway</h1>
        <input
          type="password"
          value={password}
          onChange={e => setPassword(e.target.value)}
          placeholder="Password"
          autoFocus
          className="w-full px-3 py-2 rounded-md border border-[hsl(var(--border))] bg-[hsl(var(--background))] text-[hsl(var(--foreground))] focus:outline-none focus:ring-2 focus:ring-[hsl(var(--ring))]"
        />
        <button
          type="submit"
          disabled={loading}
          className="w-full px-3 py-2 rounded-md bg-[hsl(var(--primary))] text-[hsl(var(--primary-foreground))] hover:opacity-90 disabled:opacity-50"
        >
          {loading ? 'Logging in...' : 'Login'}
        </button>
        {error && <p className="text-sm text-[hsl(var(--destructive))] text-center">{error}</p>}
      </form>
    </div>
  );
}
32
hub-web/tsconfig.app.json
Normal file
@@ -0,0 +1,32 @@
{
  "compilerOptions": {
    "tsBuildInfoFile": "./node_modules/.tmp/tsconfig.app.tsbuildinfo",
    "target": "ES2023",
    "useDefineForClassFields": true,
    "lib": ["ES2023", "DOM", "DOM.Iterable"],
    "module": "ESNext",
    "types": ["vite/client"],
    "skipLibCheck": true,

    /* Bundler mode */
    "moduleResolution": "bundler",
    "allowImportingTsExtensions": true,
    "verbatimModuleSyntax": true,
    "moduleDetection": "force",
    "noEmit": true,
    "jsx": "react-jsx",

    /* Linting */
    "strict": true,
    "noUnusedLocals": true,
    "noUnusedParameters": true,
    "erasableSyntaxOnly": true,
    "noFallthroughCasesInSwitch": true,
    "noUncheckedSideEffectImports": true,
    "baseUrl": ".",
    "paths": {
      "@/*": ["./src/*"]
    }
  },
  "include": ["src"]
}
7
hub-web/tsconfig.json
Normal file
@@ -0,0 +1,7 @@
{
  "files": [],
  "references": [
    { "path": "./tsconfig.app.json" },
    { "path": "./tsconfig.node.json" }
  ]
}
26
hub-web/tsconfig.node.json
Normal file
@@ -0,0 +1,26 @@
{
  "compilerOptions": {
    "tsBuildInfoFile": "./node_modules/.tmp/tsconfig.node.tsbuildinfo",
    "target": "ES2023",
    "lib": ["ES2023"],
    "module": "ESNext",
    "types": ["node"],
    "skipLibCheck": true,

    /* Bundler mode */
    "moduleResolution": "bundler",
    "allowImportingTsExtensions": true,
    "verbatimModuleSyntax": true,
    "moduleDetection": "force",
    "noEmit": true,

    /* Linting */
    "strict": true,
    "noUnusedLocals": true,
    "noUnusedParameters": true,
    "erasableSyntaxOnly": true,
    "noFallthroughCasesInSwitch": true,
    "noUncheckedSideEffectImports": true
  },
  "include": ["vite.config.ts"]
}
21
hub-web/vite.config.ts
Normal file
@@ -0,0 +1,21 @@
import { defineConfig } from 'vite'
import react from '@vitejs/plugin-react'
import tailwindcss from '@tailwindcss/vite'
import path from 'path'

export default defineConfig({
  plugins: [react(), tailwindcss()],
  resolve: {
    alias: {
      '@': path.resolve(__dirname, './src'),
    },
  },
  server: {
    proxy: {
      '/v1': 'http://localhost:8000',
      '/auth': 'http://localhost:8000',
      '/health': 'http://localhost:8000',
      '/gpu': 'http://localhost:8000',
    },
  },
})