refactor(ai): remove GPU Ollama LLM, unify on Mac mini 26B as the single generation host #20

Merged
hyungi merged 1 commits from feat/gpu-llm-remove into main 2026-05-14 08:34:00 +09:00
Owner

GPU server role = specialized backend for embedding/rerank/STT/OCR/marker.
It runs zero generative LLMs. Mac mini gemma-4-26B-A4B absorbs triage, primary,
and classifier. Fallback is the Claude Sonnet 4 API (triggered automatically,
sharing its budget with premium).

  • triage: GPU Ollama gemma4:e4b → Mac mini :8801 26B (same endpoint as primary)
  • fallback: GPU Ollama gemma4:e4b → Claude Sonnet 4 API (require_explicit_trigger=false)
  • classifier: GPU Ollama gemma4:e4b → Mac mini :8801 26B (max_tokens 512)
  • primary / premium / embedding / rerank: no change
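
The routing change listed above can be pictured as a role-to-backend table. What follows is a minimal sketch, not the repository's actual config: the `Backend` dataclass, the `ROUTES_BEFORE` / `ROUTES_AFTER` names, the host labels, and the `claude-sonnet-4` model string are hypothetical. Only the role names, port 8801, `max_tokens` 512, and `require_explicit_trigger=false` come from the description.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Backend:
    """Hypothetical description of one generation backend."""
    host: str                                   # illustrative label, not a real hostname
    model: str
    max_tokens: Optional[int] = None
    require_explicit_trigger: Optional[bool] = None


# Before this PR: all three roles pointed at the GPU server's Ollama model.
GPU_OLLAMA = Backend(host="gpu-ollama", model="gemma4:e4b")
ROUTES_BEFORE: Dict[str, Backend] = {
    "triage": GPU_OLLAMA,
    "fallback": GPU_OLLAMA,
    "classifier": GPU_OLLAMA,
}

# After this PR: no role routes to the GPU server for generation.
ROUTES_AFTER: Dict[str, Backend] = {
    "triage": Backend(host="mac-mini:8801", model="gemma-4-26B-A4B"),
    "fallback": Backend(
        host="claude-api",
        model="claude-sonnet-4",            # assumed identifier, for illustration only
        require_explicit_trigger=False,     # fallback fires automatically
    ),
    "classifier": Backend(
        host="mac-mini:8801", model="gemma-4-26B-A4B", max_tokens=512
    ),
}


def changed_roles(before: Dict[str, Backend], after: Dict[str, Backend]) -> List[str]:
    """Return the roles whose backend differs between the two tables."""
    return sorted(role for role in after if before.get(role) != after[role])
```

Calling `changed_roles(ROUTES_BEFORE, ROUTES_AFTER)` lists exactly the three roles this PR touches, which is a cheap way to check that primary, premium, embedding, and rerank were left alone.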

Follow-up (separate commit): ssh gpu "ollama rm gemma4:e4b-it-q8_0", which reclaims ~11GB of VRAM.

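Mitigations for the risk of consolidating on the Mac mini: (1) the Mac mini
has a verified 31 days of uninterrupted uptime, (2) Claude Sonnet 4 API usage
stays within the $5 daily_budget (call volume is low as long as the Mac mini is up),
(3) a Beszel siteMonitor health check on :8801 plus a Synology Chat alert.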
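
Mitigation (2) amounts to "use the Mac mini first, and only spend on the fallback while the daily budget allows". A minimal sketch of that policy follows, assuming a $5 daily limit as stated above. `DailyBudget`, `generate`, the per-call cost estimate, and the choice of which exceptions count as "primary down" are all hypothetical and not taken from the repository.

```python
from datetime import date
from typing import Callable, Optional


class DailyBudget:
    """Tracks spend per calendar day against a fixed USD limit."""

    def __init__(self, limit_usd: float = 5.0) -> None:
        self.limit_usd = limit_usd
        self._day: Optional[date] = None
        self._spent = 0.0

    def try_spend(self, cost_usd: float, today: Optional[date] = None) -> bool:
        """Reserve cost_usd if it fits in today's remaining budget."""
        today = today or date.today()
        if today != self._day:          # first call of a new day: reset the counter
            self._day, self._spent = today, 0.0
        if self._spent + cost_usd > self.limit_usd:
            return False
        self._spent += cost_usd
        return True


def generate(
    prompt: str,
    primary: Callable[[str], str],      # e.g. a client for the Mac mini endpoint
    fallback: Callable[[str], str],     # e.g. a client for the paid API
    budget: DailyBudget,
    est_cost_usd: float = 0.01,         # assumed per-call estimate, illustration only
) -> str:
    """Try the primary host; fall back automatically while budget remains."""
    try:
        return primary(prompt)
    except (ConnectionError, TimeoutError):
        if not budget.try_spend(est_cost_usd):
            raise RuntimeError("primary down and daily fallback budget exhausted")
        return fallback(prompt)
```

Because the fallback is only reached when the primary call fails, fallback spend stays near zero while the Mac mini is up, which is the reasoning behind the low expected call volume.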

plan: ~/.claude/plans/rosy-launching-otter.md §C/§D/§E (7-device LLM placement + operations strategy)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

hyungi added 1 commit 2026-05-14 08:33:57 +09:00
hyungi merged commit f1399459c5 into main 2026-05-14 08:34:00 +09:00

Reference: hyungi/hyungi_document_server#20