- llm_gate.py: global semaphore for MLX single inference (shared by analyzer/evidence/synthesis)
- search_pipeline.py: extract run_search(); single source of truth for /search and /ask
- evidence_service.py: Rule + LLM span select (EV-A), doc-group ordering, automatic expansion of too-short spans (<80 chars → 120 chars); the fallback forces a query-centered window
- synthesis_service.py: grounded answer + citation verification + LRU cache (1h/300), refused handling, span_text ONLY rule (full_snippet is forbidden in prompts)
- /api/search/ask: 15s timeout, 9 failure modes + Korean no_results_reason
- rerank_service: preserve the raw rerank_score (prevents display drift)
- query_analyzer: delegate _get_llm_semaphore to llm_gate.get_mlx_gate
- prompts: evidence_extract.txt, search_synthesis.txt (JSON-only, example included)

No changes to config.yaml / docker / ollama / infra_inventory.

plan: ~/.claude/plans/quiet-meandering-nova.md

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
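
Illustration of the evidence_service.py span rules listed above (short-span expansion and the query-centered fallback window). This is a minimal sketch, not the committed code: only the 80 and 120 character thresholds come from this message. The function names `expand_short_span` and `query_centered_window`, the symmetric growth, and the edge clamping are assumptions made for the example.

```python
MIN_SPAN_CHARS = 80      # spans shorter than this get expanded (threshold from the commit message)
TARGET_SPAN_CHARS = 120  # width to expand to (from the commit message)


def expand_short_span(text: str, start: int, end: int) -> tuple[int, int]:
    """Hypothetical helper: grow [start, end) to TARGET_SPAN_CHARS if it is too short."""
    if end - start >= MIN_SPAN_CHARS:
        return start, end
    target = min(TARGET_SPAN_CHARS, len(text))
    pad = target - (end - start)
    # Assumption: pad roughly equally on both sides of the original span.
    new_start = max(0, start - pad // 2)
    new_end = min(len(text), new_start + target)
    # If the window hit the right edge, shift it left to keep the target width.
    new_start = max(0, new_end - target)
    return new_start, new_end


def query_centered_window(text: str, query: str,
                          width: int = TARGET_SPAN_CHARS) -> tuple[int, int]:
    """Hypothetical fallback: a window of `width` chars centered on the first query hit."""
    hit = text.lower().find(query.lower())
    # Assumption: when the query does not occur, fall back to the start of the text.
    center = hit + len(query) // 2 if hit >= 0 else 0
    start = max(0, center - width // 2)
    end = min(len(text), start + width)
    start = max(0, end - width)
    return start, end
```

Both helpers return character offsets rather than substrings, so a caller can slice `text[start:end]` itself; whether the real service works on offsets or strings is not stated here.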