infra: migrate application from Mac mini to GPU server

- Integrate ollama + ai-gateway into root docker-compose.yml
  (NVIDIA GPU runtime, single compose for all services)
- Change NAS mount from SMB (NAS_SMB_PATH) to NFS (NAS_NFS_PATH)
  Default: /mnt/nas/Document_Server (fstab registered on GPU server)
- Update config.yaml AI endpoints:
  primary → Mac mini MLX via Tailscale (100.76.254.116:8800)
  fallback/embedding/vision/rerank → ollama (same Docker network)
  gateway → ai-gateway (same Docker network)
- Update credentials.env.example (remove GPU_SERVER_IP, add NFS path)
- Mark gpu-server/docker-compose.yml as deprecated
- Update CLAUDE.md network diagram and AI model config
- Update architecture.md, deploy.md, devlog.md for GPU server as main
- Caddyfile: auto_https off, HTTP only (TLS at upstream proxy)
- Caddy port: 127.0.0.1:8080:80 (localhost only)
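
The compose integration from the first bullet could look roughly like the fragment below — a minimal sketch, not the repo's actual file: the image tags, build context, and gateway port mapping are assumptions; only the service names, the NVIDIA GPU reservation, and the Caddy port binding come from this commit.

```yaml
services:
  ollama:
    image: ollama/ollama            # assumed tag
    ports:
      - "11434:11434"
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia        # NVIDIA GPU runtime for the RTX 4070 Ti Super
              count: all
              capabilities: [gpu]
  ai-gateway:
    build: ./ai-gateway             # assumed build context
    ports:
      - "8081:8081"
    depends_on:
      - ollama
  caddy:
    image: caddy:2
    ports:
      - "127.0.0.1:8080:80"         # localhost only, HTTP only
```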
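
An fstab entry matching the new NFS default might look like this; the export path and mount point are from this commit, while the NAS host and mount options are assumptions:

```
# /etc/fstab on the GPU server (host and options assumed)
ds1525.hyungi.net:/volume4/Document_Server  /mnt/nas/Document_Server  nfs  defaults,_netdev  0  0
```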
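
The config.yaml endpoint layout described above could be sketched as follows — the key names are assumptions; only the URLs and roles come from this commit:

```yaml
ai:
  primary:
    url: http://100.76.254.116:8800/v1/chat/completions   # Mac mini MLX via Tailscale
  fallback:
    url: http://ollama:11434/v1/chat/completions          # same Docker network
  embedding:
    url: http://ollama:11434/api/embeddings               # same Docker network
  gateway:
    url: http://ai-gateway:8081                           # port per the network diagram
```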
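
With TLS terminated at the upstream proxy, the Caddyfile change can be as small as the snippet below; the upstream service name and port are assumptions:

```
{
    auto_https off
}

:80 {
    reverse_proxy fastapi:8000    # assumed upstream service name
}
```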

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Author: Hyungi Ahn
Date: 2026-04-03 07:47:09 +09:00
Parent: 8afa3c401f
Commit: 0ca78640ee
11 changed files with 434 additions and 56 deletions

CLAUDE.md

@@ -27,25 +27,26 @@ Mac mini M4 Pro as the application server, Synology NAS as file storage,
 ## Network environment
 ```
-Mac mini M4 Pro (application server):
-- Docker Compose: FastAPI(:8000), PostgreSQL(:5432), kordoc(:3100), Caddy(:80,:443)
-- MLX Server: http://localhost:8800/v1/chat/completions (Qwen3.5-35B-A3B)
-- External access: pkm.hyungi.net (Caddy proxy)
+GPU server (RTX 4070 Ti Super, Ubuntu, main server):
+- Docker Compose: FastAPI(:8000), PostgreSQL(:5432), kordoc(:3100),
+  Caddy(:8080 HTTP only), Ollama(:11434), AI Gateway(:8081), frontend(:3000)
+- NFS mount: /mnt/nas/Document_Server → NAS /volume4/Document_Server
+- External access: document.hyungi.net (front proxy → Caddy)
+- Local IP: 192.168.1.186
+Mac mini M4 Pro (AI server):
+- MLX Server: http://100.76.254.116:8800/v1/chat/completions (Qwen3.5-35B-A3B)
+- Tailscale IP: 100.76.254.116
 Synology NAS (DS1525+):
 - Domain: ds1525.hyungi.net
 - Tailscale IP: 100.101.79.37
 - Port: 15001
 - File originals: /volume4/Document_Server/PKM/
+- NFS export → GPU server
 - Synology Office: document editing/preview
 - Synology Calendar: CalDAV task management (replaces OmniFocus)
 - MailPlus: IMAP(993) + SMTP(465)
-GPU server (RTX 4070 Ti Super):
-- AI Gateway: http://gpu-server:8080 (model routing, fallback, cost control)
-- nomic-embed-text: vector embeddings
-- Qwen2.5-VL-7B: image/drawing OCR
-- bge-reranker-v2-m3: RAG reranking
 ```
 ## Credentials
@@ -57,22 +58,22 @@ GPU server (RTX 4070 Ti Super):
 ## AI model configuration
 ```
-Primary (Mac mini MLX, always on, free):
+Primary (Mac mini MLX, via Tailscale, always on, free):
   mlx-community/Qwen3.5-35B-A3B-4bit — classification, tagging, summarization
-  → http://localhost:8800/v1/chat/completions
+  → http://100.76.254.116:8800/v1/chat/completions
-Fallback (GPU Ollama, on MLX failure):
+Fallback (GPU Ollama, same Docker network, on MLX failure):
   qwen3.5:35b-a3b
-  → http://gpu-server:11434/v1/chat/completions
+  → http://ollama:11434/v1/chat/completions
 Premium (Claude API, pay-per-use, manual trigger only):
   claude-sonnet — complex analysis, long-form processing
   → daily limit $5, require_explicit_trigger: true
-Embedding (GPU server only):
-  nomic-embed-text → vector embeddings
-  Qwen2.5-VL-7B → image/drawing OCR
-  bge-reranker-v2-m3 → RAG reranking
+Embedding (GPU Ollama, same Docker network):
+  nomic-embed-text → vector embeddings → http://ollama:11434/api/embeddings
+  Qwen2.5-VL-7B → image/drawing OCR → http://ollama:11434/api/generate
+  bge-reranker-v2-m3 → RAG reranking → http://ollama:11434/api/rerank
 ```
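
The Ollama endpoints added in the Embedding block above can be reached with a plain HTTP client. A hedged sketch in Python — the `embed` helper and its defaults are illustrative, not code from this repo, and the URL resolves only inside the Docker network:

```python
import json
import urllib.request

OLLAMA_URL = "http://ollama:11434"  # service name on the shared Docker network


def embed_payload(model: str, prompt: str) -> dict:
    """Build the JSON body Ollama's /api/embeddings endpoint expects."""
    return {"model": model, "prompt": prompt}


def embed(text: str, model: str = "nomic-embed-text") -> list[float]:
    """POST the text to /api/embeddings and return the embedding vector."""
    req = urllib.request.Request(
        f"{OLLAMA_URL}/api/embeddings",
        data=json.dumps(embed_payload(model, text)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["embedding"]
```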
## Project structure
@@ -120,21 +121,16 @@ hyungi_Document_Server/
 ## Development/deployment workflow
 ```
-MacBook Pro (development) → Gitea push → pull on the server
+MacBook Pro (development) → Gitea push → pull on the GPU server
 Development:
   cd ~/Documents/code/hyungi_Document_Server/
   # write code → git commit & push
-Mac mini deployment:
+GPU server deployment (main):
   cd ~/Documents/code/hyungi_Document_Server/
   git pull
   docker compose up -d
-GPU server deployment:
-  cd ~/Documents/code/hyungi_Document_Server/gpu-server/
-  git pull
-  docker compose up -d
 ```
 ## v1 code reference