Files
hyungi_document_server/config.yaml
T
hyungi a7b8f15870 feat(search): /ask backend dispatcher (qwen-macbook opt-in, no silent fallback)
PR-MacBook-RAG-Backend-1 — /api/search/ask 의 명시 backend 선택 진입점.

핵심 invariant (정정 4):
- backend 미지정 = Gemma Mac mini default, 응답 contract 변동 0
- backend="qwen-macbook" 명시 opt-in 만 MacBook M5 Max mlx-vlm.server 호출
- MacBook unavailable 시 HTTP 503 + error_reason=macbook_unavailable
- 자동 fallback 절대 금지 — 실패 path 에서 Gemma backend.generate() 호출 0

backend dispatcher (services/llm/):
- BackendBase / GemmaMacMiniBackend / QwenMacBookBackend / BackendUnavailable
- Qwen backend 는 Mac mini llm_gate 점유 X, 별 Semaphore(1) — llm_gate
  docstring 의 single-inference 영구 룰은 같은 endpoint 한정으로 scope 명시
- httpx Connect/Read/Pool/Timeout/5xx → BackendUnavailable, 4xx 전파

synthesis_service.py:
- backend 인자 추가, status="backend_unavailable" 신규
- cache key 에 backend_name 포함 (qwen ↔ gemma 캐시 충돌 차단)

config:
- search.ask.backend.{macmini_url, macbook_url, macbook_model,
  timeout_connect_s=1, timeout_read_s=30}
- MacBook endpoint = http://100.118.112.84:8810 (M5 Max Tailscale bind)

tests (14 신규):
- tests/services/test_backend_dispatcher.py (9): dispatcher 정합성 + Qwen
  generate path (mock 200 / dead port / 5xx / 4xx) + cache identity
- tests/api/test_search_ask_macbook_503.py (5): 정정 4 핵심 invariant.
  backend=qwen-macbook 비가용 시 gemma.generate.assert_not_called()

기존 ask 회귀 0 (test_ask_eval_auth 9건 등 85건 모두 PASS).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-22 13:10:44 +00:00

170 lines
6.3 KiB
YAML

# hyungi_Document_Server 설정
ai:
gateway:
endpoint: "http://ai-gateway:8080"
models:
# ─── 단일 generation 호스트 routing (2026-05-14 GPU LLM 제거) ───
# GPU Ollama gemma4:e4b-it-q8_0 제거. Mac mini 26B-A4B 가 triage + primary + classifier 모두 흡수.
# fallback 은 Claude Sonnet 4 API (Mac mini 다운 시 자동 trigger, premium 과 budget 공유).
# plan: ~/.claude/plans/rosy-launching-otter.md §C/§D/§E
# triage: 상시 분류·요약·근거 선별. Mac mini 26B (primary 와 동일 endpoint, 짧은 max_tokens).
triage:
endpoint: "http://100.76.254.116:8801/v1/chat/completions"
model: "mlx-community/gemma-4-26b-a4b-it-8bit"
max_tokens: 4096
timeout: 30
context_char_limit: 120000
# primary: 에스컬레이션 전용. 26B MLX (맥미니 Semaphore(1) 보호 대상).
primary:
endpoint: "http://100.76.254.116:8801/v1/chat/completions"
model: "mlx-community/gemma-4-26b-a4b-it-8bit"
max_tokens: 8192
timeout: 180
context_char_limit: 260000
# fallback: primary 장애 시 최후 방어선. Claude Sonnet 4 API (소액 한도, 자동 trigger).
# 호출 빈도 낮음 가정 (Mac mini 가 거의 항상 up) → premium 과 budget 공유 OK.
fallback:
endpoint: "https://api.anthropic.com/v1/messages"
model: "claude-sonnet-4-20250514"
max_tokens: 4096
daily_budget_usd: 5.00
require_explicit_trigger: false
timeout: 120
premium:
endpoint: "https://api.anthropic.com/v1/messages"
model: "claude-sonnet-4-20250514"
max_tokens: 8192
daily_budget_usd: 5.00
require_explicit_trigger: true
embedding:
endpoint: "http://ollama:11434/api/embeddings"
model: "bge-m3"
rerank:
endpoint: "http://reranker:80/rerank"
model: "bge-reranker-v2-m3"
# Phase 3.5a answerability classifier. 2026-05-14 GPU LLM 제거 후 Mac mini 26B 로 swap.
# classifier_service 가 hasattr 체크로 optional 이므로 이 섹션 제거 시 classifier gate 는 자동 skip (score-only).
classifier:
endpoint: "http://100.76.254.116:8801/v1/chat/completions"
model: "mlx-community/gemma-4-26b-a4b-it-8bit"
max_tokens: 512
timeout: 30 # 2026-05-17: 15s 도 동시 부하 시 elapsed 14.4s 직전이라 tight — 30s 로 2x 마진 (Mac mini 26B concurrent load). classifier_service.LLM_TIMEOUT_MS=30000 와 align
# 제거: vision (미사용)
# ─── deep_summary enqueue 폭발 억제 (B-1 R2) ───
# 초기 튜닝 전 deep_summary 큐에 soft escalate 가 과발생하면 MLX 26B 가 포화된다.
# 아래 임계치 중 하나라도 초과하면 soft escalate (recommend_deep_summary 만) 를
# suppress. hard escalate (long_context / triage_json_invalid / low_confidence)는
# 절대 suppress 되지 않는다.
deep_summary_backlog:
ratio_threshold: 0.3 # 지난 window 의 deep_n/classify_n
pending_threshold: 5 # deep_summary stage 의 pending+processing
window_minutes: 30
# ─── /api/search/ask backend dispatcher (PR-MacBook-RAG-Backend-1) ───
# backend 미지정 (default) → Gemma Mac mini (settings.ai.primary 경로 그대로, 변동 0).
# backend="qwen-macbook" 명시 opt-in → MacBook M5 Max mlx-vlm.server. unavailable 시 503.
# 자동 fallback 없음 ([[macbook-inference-endpoint-role]] Invariant 1).
search:
ask:
backend:
macmini_url: "http://100.76.254.116:8801" # Gemma 경로 = settings.ai.primary 가 권위, 본 키는 spec 일관성 + 변경 추적용
macbook_url: "http://100.118.112.84:8810" # MacBook M5 Max Tailscale interface bind
macbook_model: "mlx-community/Qwen3.6-27B-8bit"
timeout_connect_s: 1 # MacBook sleep/wake 빠른 감지 (자동 fallback 부재 → 빠른 503)
timeout_read_s: 30 # synthesis_service.LLM_TIMEOUT_MS=30000 와 align
nas:
mount_path: "/documents"
pkm_root: "/documents/PKM"
# ─── 업로드 한도 정책 (authoritative) ───
# 프록시(home-caddy 등) request_body 한도는 max_bytes * content_length_slack_ratio 이상 유지.
upload:
max_bytes: 100000000 # 100 MB (SI). 업로드 실제 제한의 단일 진실 공급원.
content_length_slack_ratio: 1.05 # multipart form 오버헤드(헤더/바운더리) 여유.
stream_chunk_bytes: 1048576 # 1 MiB 단위 스트리밍 read/write.
# ─── 문서 분류 체계 ───
taxonomy:
Philosophy:
Ethics: []
Metaphysics: []
Epistemology: []
Logic: []
Aesthetics: []
Eastern_Philosophy: []
Western_Philosophy: []
Language:
Korean: []
English: []
Japanese: []
Translation: []
Linguistics: []
Engineering:
Mechanical: [Piping, HVAC, Equipment]
Electrical: [Power, Instrumentation]
Chemical: [Process, Material]
Civil: []
Network: [Server, Security, Infrastructure]
Industrial_Safety:
Legislation: [Act, Decree, Foreign_Law, Korea_Law_Archive, Enforcement_Rule, Public_Notice, SAPA]
Theory: [Industrial_Safety_General, Safety_Health_Fundamentals]
Academic_Papers: [Safety_General, Risk_Assessment_Research]
Cases: [Domestic, International]
Practice: [Checklist, Contractor_Management, Safety_Education, Emergency_Plan, Patrol_Inspection, Permit_to_Work, PPE, Safety_Plan]
Risk_Assessment: [KRAS, JSA, Checklist_Method]
Safety_Manager: [Appointment, Duty_Record, Improvement, Inspection, Meeting]
Health_Manager: [Appointment, Duty_Record, Ergonomics, Health_Checkup, Mental_Health, MSDS, Work_Environment]
Programming:
Programming_Language: [Python, JavaScript, Go, Rust]
Framework: [FastAPI, SvelteKit, React]
DevOps: [Docker, CI_CD, Linux_Administration]
AI_ML: [Large_Language_Model, Computer_Vision, Data_Science]
Database: []
Software_Architecture: []
General:
Reading_Notes: []
Self_Development: []
Business: []
Science: []
History: []
document_types:
- Reference
- Standard
- Manual
- Drawing
- Template
- Note
- Academic_Paper
- Law_Document
- Report
- Memo
- Checklist
- Meeting_Minutes
- Specification
- 발주서
- 세금계산서
- 명세표
- 도면
- 증명서
- 계획서
- 시방서
schedule:
law_monitor: "07:00"
mailplus_archive: ["07:00", "18:00"]
daily_digest: "20:00"
file_watcher_interval_minutes: 5
queue_consumer_interval_minutes: 10