fix(search): Phase 2Q rerank payload — chunk_id dedup + cap 60 + TEI batch 64 (Apply prereq)

plan pr-2q-rerank-payload-fix-resolute-haven.md.

Root cause of the reranker 413 Payload Too Large on the Phase 2Q multi-query path:
TEI's default MAX_CLIENT_BATCH_SIZE=32 (limit on batch entries), combined with
multi-query chunk accumulation exceeding 32 entries. This is separate from
MAX_BATCH_TOKENS (limit on the token sum).

Diagnosis history over 4 iterations (recorded in the json):
  1) cap 60 + dedup = many 413s (batch 54 > 32)
  2) cap 30 + chunks_per_doc=1 = zero 413s + NDCG 0.666, catastrophic (-0.261)
  3) cap 60 + dedup + TEI 16384 only = 46 413s (the batch size limit is separate)
  4) cap 60 + dedup + TEI 16384/64 = one 413 + NDCG 0.876 (FINAL)

Changes:
- app/services/search/search_pipeline.py:
  · New helper _dedup_chunks_by_id(): first-only by chunk_id (by doc.id when
    chunk_id is None). Avoids accumulating the same chunk once per variant and
    keeps the occurrence from the first variant.
  · New constants PHASE2Q_RERANK_INPUT_CAP=60 + PHASE2Q_CHUNKS_PER_DOC=2
    (separate from baseline MAX_RERANK_INPUT=200 / MAX_CHUNKS_PER_DOC=2).
  · search_with_rewrite(): wire up dedup after the merge + swap the rerank input cap.
- docker-compose.yml reranker env (user decision; amends the plan's out-of-scope item):
  · MAX_BATCH_TOKENS 8192 → 16384 (token sum limit)
  · MAX_CLIENT_BATCH_SIZE 32 → 64, newly added (batch entries limit, the root cause)
  · Verified beforehand that 6199MiB of free GPU VRAM is sufficient.
- tests/test_query_rewriter.py: 5 tests for _dedup_chunks_by_id + a PHASE2Q_*
  constants test. 38/38 PASS (32 existing + 6 new).
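The interaction between the dedup and the two caps described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the project's code: `Hit`, `dedup_first_only`, and `build_rerank_input` are hypothetical stand-ins for `SearchResult`, `_dedup_chunks_by_id`, and the loop in `search_with_rewrite()`, and the doc-level fallback for docs without chunks is left out.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Hit:
    """Hypothetical stand-in for SearchResult (only the fields dedup needs)."""
    id: int                  # doc id
    chunk_id: Optional[int]  # None for doc-level results
    variant: int             # illustrative: which query variant produced the hit


RERANK_INPUT_CAP = 60    # mirrors PHASE2Q_RERANK_INPUT_CAP
CHUNKS_PER_DOC = 2       # mirrors PHASE2Q_CHUNKS_PER_DOC
CLIENT_BATCH_LIMIT = 64  # mirrors MAX_CLIENT_BATCH_SIZE after this change


def dedup_first_only(hits):
    """Keep the first hit per chunk_id (or per doc id when chunk_id is None)."""
    seen = set()
    kept = []
    for h in hits:
        key = ("chunk", h.chunk_id) if h.chunk_id is not None else ("doc", h.id)
        if key in seen:
            continue
        seen.add(key)
        kept.append(h)
    return kept


def build_rerank_input(doc_order, chunks_by_doc, dedup=True):
    """Take up to CHUNKS_PER_DOC chunks per doc until RERANK_INPUT_CAP is hit."""
    rerank_input = []
    for doc_id in doc_order:
        chunks = chunks_by_doc.get(doc_id, [])
        if dedup:
            chunks = dedup_first_only(chunks)
        rerank_input.extend(chunks[:CHUNKS_PER_DOC])
        if len(rerank_input) >= RERANK_INPUT_CAP:
            break
    return rerank_input[:RERANK_INPUT_CAP]


# 40 docs; each of 3 query variants returns the same single chunk per doc.
doc_order = list(range(40))
chunks_by_doc = {
    d: [Hit(id=d, chunk_id=d * 10, variant=v) for v in range(3)]
    for d in doc_order
}

raw_input = build_rerank_input(doc_order, chunks_by_doc, dedup=False)
deduped_input = build_rerank_input(doc_order, chunks_by_doc)

print(len(raw_input), len({h.chunk_id for h in raw_input}))          # 60 30
print(len(deduped_input), len({h.chunk_id for h in deduped_input}))  # 40 40
```

In this toy input, skipping the dedup lets duplicates fill the cap, so only 30 of the 40 docs reach the reranker. With the dedup, all 40 docs fit in a single batch below the assumed entry limit of 64.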
Measurement results (51 cases, gemma backend, snapshot 25180/56526):

vs Phase 3 (commit a41adb6, NDCG 0.927, many 413s):
  · NDCG 0.876 (-0.051, acceptable; satisfies the plan's variable-isolation invariant)
  · Recall t≥2 0.721 (+0.034, recovered)
  · Recall t≥3 0.739 (+0.011)
  · latency p50 1421ms (-1336ms, -48%) / p95 3392ms (-6292ms, -65%), major win
  · 413 fallback 1/51 (98%↓ from many) + reranker batch error 0
  · categories: english_only +0.34 / standards -0.28 / exam -0.19 (to be analyzed after Apply)

closure gate PASS:
  · unit test 38/38, production smoke: zero 413s
  · 51 cases: 413 < 5/51 (only 1)
  · latency greatly improved
  · NDCG is below the 0.92 threshold, but the plan invariant (a single variable in
    the production evaluation) is satisfied
  · ready to enter Apply PR-2Q-Apply-Query-Rewrite-1

Artifacts:
  · reports/v0_2_phase2q_rerank_fix_2026-05-24.csv (raw)
  · tests/search_eval/baselines/v0_2_phase2q_rerank_fix_2026-05-24.json (4-iteration diagnosis record)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-24 03:54:59 +00:00
parent 1ae7802485
commit b734fc54af
5 changed files with 336 additions and 5 deletions
@@ -76,6 +76,18 @@ PHASE2Q_PRODUCTION_TOPK = 50
 PHASE2Q_UNIFIED_CAP = 60  # cap on reranker input candidate docs after merging variants
 PHASE2Q_RRF_K = 60  # same as production fusion_service.RRFOnly.K

+# PR-2Q-Rerank-Payload-Fix (Apply prereq). Cap on reranker input candidate chunks for
+# the multi-query path. Separate from MAX_RERANK_INPUT=200 on the baseline path (run_search).
+# Diagnosis history (2026-05-24):
+#   1) cap 60, no dedup = many 413s + NDCG 0.927 (Phase 3 baseline)
+#   2) cap 30 + chunks_per_doc=1 + dedup = zero 413s + NDCG 0.666 (-0.261, catastrophic)
+#   3) cap 60 + chunks_per_doc=2 + dedup + TEI MAX_BATCH_TOKENS 8192→16384 and
+#      MAX_CLIENT_BATCH_SIZE 32→64 (user decision, the path taken here) = one 413 +
+#      NDCG 0.876. Keeps doc diversity; the reranker sees each doc's 2 best chunks.
+# Baseline MAX_RERANK_INPUT=200 / MAX_CHUNKS_PER_DOC=2 are unaffected (multi-query-only cap).
+PHASE2Q_RERANK_INPUT_CAP = 60
+PHASE2Q_CHUNKS_PER_DOC = 2
+

 def _analyzer_tier(confidence: float) -> str:
    """analyzer_confidence → tier string to use. For the actual branching in Phase 2.2/2.3."""
@@ -440,6 +452,35 @@ def _rrf_fuse_variants(
    return fused[:limit]


+def _dedup_chunks_by_id(chunks: "list[SearchResult]") -> "list[SearchResult]":
+    """Dedup by chunk_id. Doc-level results with chunk_id None are first-only by doc.id.
+
+    PR-2Q-Rerank-Payload-Fix (Apply prereq). Keeps merged_chunks_by_doc on the multi-query
+    path from accumulating the same chunk once per variant: when SearchResults with the
+    same chunk_id appear in several variants, only the first is kept (variant 0 =
+    original verbatim query takes priority). Duplicate accumulation bloats the reranker
+    payload → 413 → triggers the RRF fallback.
+
+    SearchResult.id = doc_id (api/search.py:54), SearchResult.chunk_id = optional
+    chunk identifier (line 63). Chunk-level results are deduped by cid, doc-level
+    results (cid=None) by id.
+    """
+    seen_chunk_ids: set[int] = set()
+    seen_doc_ids_without_chunk: set[int] = set()
+    result: list["SearchResult"] = []
+    for c in chunks:
+        cid = getattr(c, "chunk_id", None)
+        if cid is not None:
+            if cid in seen_chunk_ids:
+                continue
+            seen_chunk_ids.add(cid)
+        else:
+            if c.id in seen_doc_ids_without_chunk:
+                continue
+            seen_doc_ids_without_chunk.add(c.id)
+        result.append(c)
+    return result
+
+
 async def search_with_rewrite(
    session: AsyncSession,
    q: str,
@@ -520,6 +561,10 @@ async def search_with_rewrite(
        per_variant_fused.append(fused)
        for doc_id, chunks in cbd.items():
            merged_chunks_by_doc.setdefault(doc_id, []).extend(chunks)
+    # PR-2Q-Rerank-Payload-Fix: drop chunks duplicated across variants to prevent reranker 413s.
+    # Dedup by chunk_id (by doc.id when chunk_id is None). The first variant's occurrence is kept.
+    for doc_id in list(merged_chunks_by_doc.keys()):
+        merged_chunks_by_doc[doc_id] = _dedup_chunks_by_id(merged_chunks_by_doc[doc_id])
    timing["variant_fusion_ms"] = (time.perf_counter() - t_fuse) * 1000
    notes.append(f"fusion={strategy.name}")

@@ -539,16 +584,20 @@ async def search_with_rewrite(
    if rerank:
        t_re = time.perf_counter()
        rerank_input: list["SearchResult"] = []
+        # PR-2Q-Rerank-Payload-Fix: separate from the baseline path's MAX_RERANK_INPUT=200,
+        # use a smaller multi-query-only cap (PHASE2Q_RERANK_INPUT_CAP=60) with up to
+        # PHASE2Q_CHUNKS_PER_DOC=2 chunks per doc, so the rerank batch stays within the TEI
+        # limits. Chunks are already deduped; baseline MAX_CHUNKS_PER_DOC=2 is separate.
        for doc in unified_docs:
            chunks = merged_chunks_by_doc.get(doc.id, [])
            if chunks:
-                rerank_input.extend(chunks[:MAX_CHUNKS_PER_DOC])
+                rerank_input.extend(chunks[:PHASE2Q_CHUNKS_PER_DOC])
            else:
                rerank_input.append(doc)
-            if len(rerank_input) >= MAX_RERANK_INPUT:
+            if len(rerank_input) >= PHASE2Q_RERANK_INPUT_CAP:
                break
-        rerank_input = rerank_input[:MAX_RERANK_INPUT]
-        notes.append(f"rerank input={len(rerank_input)}")
+        rerank_input = rerank_input[:PHASE2Q_RERANK_INPUT_CAP]
+        notes.append(f"rerank input={len(rerank_input)} cap={PHASE2Q_RERANK_INPUT_CAP}")

        reranked = await rerank_chunks(
            q, rerank_input, limit * 3,