feat(search): Phase 3 Ask pipeline (evidence + synthesis + /api/search/ask)

- llm_gate.py: global semaphore serializing MLX single-inference (shared by analyzer/evidence/synthesis)
- search_pipeline.py: extracted run_search(); single source of truth for /search and /ask
- evidence_service.py: rule-based + LLM span selection (EV-A), doc-group ordering,
  auto-expansion of too-short spans (<80 chars -> 120 chars); fallback forces a query-centered window
- synthesis_service.py: grounded answer + citation validation + LRU cache (1h/300),
  refusal handling, span_text-ONLY rule (full_snippet banned from prompts)
- /api/search/ask: 15s timeout, 9 failure modes + Korean no_results_reason
- rerank_service: preserves the raw rerank_score (prevents display drift)
- query_analyzer: delegates _get_llm_semaphore to llm_gate.get_mlx_gate
- prompts: evidence_extract.txt, search_synthesis.txt (JSON-only, with examples)

No changes to config.yaml / docker / ollama / infra_inventory.
plan: ~/.claude/plans/quiet-meandering-nova.md
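The evidence_service.py span rules (auto-expand spans under 80 chars toward 120, and force the fallback window to center on the query) can be sketched like this. The thresholds come from the commit message; the helper names and symmetric-padding strategy are assumptions for illustration.

```python
MIN_SPAN_CHARS = 80    # threshold from the commit message
TARGET_SPAN_CHARS = 120


def expand_short_span(snippet: str, start: int, end: int) -> tuple[int, int]:
    """If the selected span is under 80 chars, widen it symmetrically
    toward ~120 chars, clamped to the snippet bounds."""
    if end - start >= MIN_SPAN_CHARS:
        return start, end
    pad = (TARGET_SPAN_CHARS - (end - start)) // 2
    return max(0, start - pad), min(len(snippet), end + pad)


def fallback_window(snippet: str, query: str, width: int = TARGET_SPAN_CHARS) -> str:
    """Fallback span: a window centered on the first query hit,
    rather than a blind snippet[:width] prefix."""
    pos = snippet.lower().find(query.lower())
    if pos < 0:
        return snippet[:width]
    center = pos + len(query) // 2
    start = max(0, center - width // 2)
    return snippet[start:start + width]
```

Centering the fallback on the query keeps the evidence readable even when the LLM span selector returns nothing usable.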

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
This commit is contained in:
Hyungi Ahn
2026-04-09 07:34:08 +09:00
parent 120db86d74
commit 64322e4f6f
9 changed files with 1698 additions and 258 deletions


@@ -134,7 +134,12 @@ async def rerank_chunks(
         if idx is None or sc is None or idx >= len(candidates):
             continue
         chunk = candidates[idx]
-        chunk.score = float(sc)
+        score = float(sc)
+        chunk.score = score
+        # Phase 3.1: preserve the reranker raw score in a separate field.
+        # Even if normalize_display_scores later overwrites .score with a
+        # rank-based value, keep the original signal for fast-path checks.
+        chunk.rerank_score = score
         chunk.match_reason = (chunk.match_reason or "") + "+rerank"
         reranked.append(chunk)
     return reranked[:limit]