refactor(ai): GPU Ollama LLM 제거 — Mac mini 26B 단일 generation 호스트로 통일 #20

Merged

hyungi merged 1 commits from feat/gpu-llm-remove into main

2026-05-14 08:34:00 +09:00

Author	SHA1	Message	Date
Hyungi Ahn	4eed0bc4f8	refactor(ai): GPU Ollama LLM 제거 — Mac mini 26B 단일 generation 호스트로 통일 GPU 서버 정체성 = embedding/rerank/STT/OCR/marker 특화 백엔드. Generative LLM 0. Mac mini gemma-4-26B-A4B 가 triage + primary + classifier 모두 흡수. fallback 은 Claude Sonnet 4 API (자동 trigger, premium 과 budget 공유). - triage: GPU Ollama gemma4:e4b → Mac mini :8801 26B (primary 동일 endpoint) - fallback: GPU Ollama gemma4:e4b → Claude Sonnet 4 API (require_explicit_trigger=false) - classifier: GPU Ollama gemma4:e4b → Mac mini :8801 26B (max_tokens 512) - primary / premium / embedding / rerank: 변경 0 후속 (별 커밋): `ssh gpu "ollama rm gemma4:e4b-it-q8_0"` — VRAM ~11GB 회수. Mac mini 단일화 위험 mitigation = (1) Mac mini uptime 31d 무중단 검증, (2) Claude Sonnet 4 API daily_budget $5 안 (Mac mini up 가정 호출 빈도 낮음), (3) Beszel siteMonitor :8801 health check + Synology Chat alert. plan: ~/.claude/plans/rosy-launching-otter.md §C/§D/§E (7-device LLM 배치 + 운영 전략) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-14 08:16:40 +09:00

Author

SHA1

Message

Date

Hyungi Ahn

4eed0bc4f8

refactor(ai): GPU Ollama LLM 제거 — Mac mini 26B 단일 generation 호스트로 통일

GPU 서버 정체성 = embedding/rerank/STT/OCR/marker 특화 백엔드.
Generative LLM 0. Mac mini gemma-4-26B-A4B 가 triage + primary +
classifier 모두 흡수. fallback 은 Claude Sonnet 4 API (자동 trigger,
premium 과 budget 공유).

- triage: GPU Ollama gemma4:e4b → Mac mini :8801 26B (primary 동일 endpoint)
- fallback: GPU Ollama gemma4:e4b → Claude Sonnet 4 API (require_explicit_trigger=false)
- classifier: GPU Ollama gemma4:e4b → Mac mini :8801 26B (max_tokens 512)
- primary / premium / embedding / rerank: 변경 0

후속 (별 커밋): `ssh gpu "ollama rm gemma4:e4b-it-q8_0"` — VRAM ~11GB 회수.

Mac mini 단일화 위험 mitigation = (1) Mac mini uptime 31d 무중단 검증,
(2) Claude Sonnet 4 API daily_budget $5 안 (Mac mini up 가정 호출 빈도 낮음),
(3) Beszel siteMonitor :8801 health check + Synology Chat alert.

plan: ~/.claude/plans/rosy-launching-otter.md §C/§D/§E (7-device LLM 배치 + 운영 전략)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

2026-05-14 08:16:40 +09:00

refactor(ai): GPU Ollama LLM 제거 — Mac mini 26B 단일 generation 호스트로 통일 #20

1 Commits