diff --git a/app/api/search.py b/app/api/search.py
index d36fcc2..e04f776 100644
--- a/app/api/search.py
+++ b/app/api/search.py
@@ -179,11 +179,12 @@ async def search(
         None,
         pattern=r"^(baseline|cand_[a-z0-9_]+)$",
         description=(
-            "Phase 2Q Apply (entered 2026-05-24, opt-in, 1-week observation). slug-based, no silent fallback. "
-            "baseline|cand_multi_query_macmini (recommended: gemma-4)|cand_multi_query_macbook (qwen3.6). "
-            "Unset/baseline = single-query path (zero-regression invariant). "
-            "With the change: retrieval+fusion per variant (N variants) → unified RRF → one reranker pass (chunk_id dedup + cap 60). "
-            "docs: docs/phase_2q_apply_opt_in.md"
+            "⚠️ EXPERIMENTAL / DEPRECATED (Phase 2Q closed 2026-05-24 as evaluated experiment). "
+            "After result-level dedup correction, net gain is marginal (NDCG +0.019, Recall t≥2 +0.030) "
+            "vs. a large latency cost (cold +876%, warm +320%). Not recommended for default production rollout. "
+            "slug-based, no silent fallback. baseline|cand_multi_query_macmini|cand_multi_query_macbook. "
+            "Unset/baseline = single-query path (zero-regression invariant, recommended default). "
+            "Kept only as an opt-in experiment reference; see the closed status in docs/phase_2q_apply_opt_in.md."
         ),
     ),
 ):
diff --git a/docs/phase_2q_apply_opt_in.md b/docs/phase_2q_apply_opt_in.md
index 45118e0..1f1d4c4 100644
--- a/docs/phase_2q_apply_opt_in.md
+++ b/docs/phase_2q_apply_opt_in.md
@@ -1,13 +1,59 @@
-# Phase 2Q Apply — Multi-Query Rewrite (opt-in, entered 2026-05-24)
+# Phase 2Q Multi-Query Rewrite — ⚠️ DEPRECATED / EXPERIMENTAL (closed 2026-05-24)
 
-## Overview
+## 🛑 Status: closed as evaluated experiment
 
-Phase 2Q Diagnose result (decision md `tests/search_eval/baselines/v0_2_phase2q_decision_2026-05-24.md`):
-H1 (significant net improvement on both backends) confirmed; after Rerank-Payload-Fix (commit `b734fc5`) landed,
-the Apply rollout began.
+> Phase 2Q Query Rewrite is closed as an evaluated experiment.
+> After result-level dedup correction, the true net gain was marginal
+> (NDCG +0.019, Recall t≥2 +0.030) while the latency cost was high
+> (cold +876%, warm +320%). Therefore, multi-query rewrite is not
+> recommended for default production rollout. Keep the opt-in path as an
+> experimental/deprecated reference only; do not proceed to
+> Cache-Prewarm unless future real-query evidence shows a stronger gain.
 
-**Rollout policy = opt-in, 1-week observation** (2026-05-24 to 2026-05-31). If metrics are healthy after one week,
-decide whether to switch to default ON (separate PR `PR-2Q-Apply-Default-ON-1`).
+**The opt-in flag `?rewrite_backend=cand_multi_query_macmini` stays in the code (experiment reference)**.
+However, **default production rollout is not recommended**. PR-2Q-Cache-Prewarm / PR-2Q-Apply-Default-ON-1
+are dropped. Of the Extended track, only SynonymDict (deterministic, bypasses the LLM) is kept as a separate candidate.
+
+## Overview (historical record)
+
+Phase 2Q Diagnose confirmed H1 (significant net improvement on both backends); after Rerank-Payload-Fix
+landed, Apply entered opt-in (commit `fef5ddc`). **However, once multi-layer inflation was found in the
+measurement chain, the decision based on the corrected values = closed as experiment.**
+
+## Measurement correction history (all inflation corrected)
+
+| Layer | commit | NDCG | inflation cause |
+|---|---|---:|---|
+| Phase 3 | `a41adb6` | 0.927 | accumulated duplicate chunk_ids |
+| Rerank-Fix | `b734fc5` | 0.876 | residual duplicate doc_ids (chunk dedup only) |
+| Eval-Dedup | `3553573` | 0.641 | dedup in the eval layer only |
+| **Result-Dedup (final)** | **`5e480d6`** | **0.663** | ✅ dedup audit clean (0/51) |
+
+**True multi-query effect** (vs. baseline 0.644):
+- NDCG cold +0.019 / warm +0.015 ← sub-noise
+- Recall t≥2 cold +0.030 / warm +0.022 ← small improvement
+- Recall t≥3 0.000 (cold) / -0.022 (warm) ← on par to slight regression
+- **latency p50 cold +876% (3692ms) / warm +320% (1588ms)** ← clear cost
+- By category: english/standards/mixed slightly ahead / exam/korean slightly regressed
+
+→ **The marginal quality gain from multi-query does not justify the latency cost + system complexity + LLM dependency**.
+
+## Recommendation (user decision 2026-05-24)
+
+**Phase 2Q itself was not a failure but a good experiment**. Outcomes:
+- Found duplicate-chunk_id inflation (Phase 3 → Rerank-Fix)
+- Resolved doc_id / result dedup issues (Eval-Dedup → Result-Dedup)
+- Quantified the real effect of multi-query (NDCG +0.019)
+- Established the conclusion "LLM rewrite has low ROI as the current DS search default"
+- 3 new feedback memories (fixture-first call shape / apply prereq structural fix /
+  graded NDCG dedup invariant)
+
+**The feature itself is deprecated; the lessons and infrastructure are preserved**.
+
+## ~~Rollout policy~~ (historical record)
+
+Previous decision: opt-in 1-week observation until 2026-05-31 → consider default ON.
+**Corrected decision (2026-05-24)**: closed as evaluated experiment; default ON will not proceed.
 
 **Recommended LLM = `cand_multi_query_macmini` (gemma-4-26b-a4b-it-8bit, Mac mini)**. Rationale from the 4-factor weighting (decision md §4):