infra: migrate application from Mac mini to GPU server
- Integrate ollama + ai-gateway into root docker-compose.yml (NVIDIA GPU runtime, single compose for all services); see the compose sketch below
- Change NAS mount from SMB (NAS_SMB_PATH) to NFS (NAS_NFS_PATH), default /mnt/nas/Document_Server (registered in fstab on the GPU server)
- Update config.yaml AI endpoints:
  - primary → Mac mini MLX via Tailscale (100.76.254.116:8800)
  - fallback/embedding/vision/rerank → ollama (same Docker network)
  - gateway → ai-gateway (same Docker network)
- Update credentials.env.example (remove GPU_SERVER_IP, add NFS path)
- Mark gpu-server/docker-compose.yml as deprecated
- Update CLAUDE.md network diagram and AI model config
- Update architecture.md, deploy.md, devlog.md to reflect the GPU server as the main host
- Caddyfile: auto_https off, HTTP only (TLS terminated at the upstream proxy)
- Caddy port: 127.0.0.1:8080:80 (localhost only)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
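The compose integration described above might look roughly like the sketch below. It is illustrative only, not the actual compose file: the service names (ollama, ai-gateway, caddy), the NVIDIA GPU reservation, the NAS_NFS_PATH default, and the Caddy port binding come from the commit message, while image tags, the build context, volume names, container mount paths, and the app service name are assumptions.

# Sketch only; names and paths not in the commit message are assumptions.
services:
  ollama:
    image: ollama/ollama            # assumed tag
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia        # NVIDIA GPU runtime on the GPU server
              count: all
              capabilities: [gpu]
    volumes:
      - ollama-data:/root/.ollama   # assumed volume; reached as http://ollama:11434 on the compose network

  ai-gateway:
    build: ./ai-gateway             # assumed build context; reached as http://ai-gateway:8080
    depends_on:
      - ollama

  app:                              # assumed name for the application service
    volumes:
      # NAS is now an NFS mount registered in /etc/fstab on the GPU server
      # (NAS_NFS_PATH, default /mnt/nas/Document_Server), replacing the SMB mount.
      - ${NAS_NFS_PATH:-/mnt/nas/Document_Server}:/data/nas   # assumed container path

  caddy:
    image: caddy:2                  # assumed tag
    ports:
      - "127.0.0.1:8080:80"         # localhost only; auto_https off, TLS at the upstream proxy

volumes:
  ollama-data:

Keeping ollama and ai-gateway on the same compose network is what lets config.yaml address them by service name (http://ollama:11434, http://ai-gateway:8080) instead of a host IP, as the diff below shows.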
config.yaml (12 changed lines)
@@ -2,17 +2,17 @@

 ai:
   gateway:
-    endpoint: "http://gpu-server:8080"
+    endpoint: "http://ai-gateway:8080"

   models:
     primary:
-      endpoint: "http://host.docker.internal:8800/v1/chat/completions"
+      endpoint: "http://100.76.254.116:8800/v1/chat/completions"
       model: "mlx-community/Qwen3.5-35B-A3B-4bit"
       max_tokens: 4096
       timeout: 60

     fallback:
-      endpoint: "http://gpu-server:11434/v1/chat/completions"
+      endpoint: "http://ollama:11434/v1/chat/completions"
       model: "qwen3.5:35b-a3b"
       max_tokens: 4096
       timeout: 120
@@ -25,15 +25,15 @@ ai:
     require_explicit_trigger: true

   embedding:
-    endpoint: "http://gpu-server:11434/api/embeddings"
+    endpoint: "http://ollama:11434/api/embeddings"
     model: "nomic-embed-text"

   vision:
-    endpoint: "http://gpu-server:11434/api/generate"
+    endpoint: "http://ollama:11434/api/generate"
     model: "Qwen2.5-VL-7B"

   rerank:
-    endpoint: "http://gpu-server:11434/api/rerank"
+    endpoint: "http://ollama:11434/api/rerank"
     model: "bge-reranker-v2-m3"

 nas: