fix(search): Phase 2Q rerank payload — chunk_id dedup + cap 60 + TEI batch 64 (Apply prereq)

plan pr-2q-rerank-payload-fix-resolute-haven.md. Phase 2Q multi-query path 의 reranker
413 Payload Too Large root cause = TEI 의 MAX_CLIENT_BATCH_SIZE=32 default (batch entries
한도) + multi-query 의 chunks 누적이 32 초과. MAX_BATCH_TOKENS 와 별개 (token sum 한도).

4 iteration 진단 history (json 박제):
  1) cap 60 + dedup = 413 다수 (batch 54 > 32)
  2) cap 30 + chunks_per_doc=1 = 413 0건 + NDCG 0.666 catastrophic (-0.261)
  3) cap 60 + dedup + TEI 16384 only = 413 46건 (batch size 한도 별개)
  4) cap 60 + dedup + TEI 16384/64 = 413 1건 + NDCG 0.876 (FINAL)

변경:
- app/services/search/search_pipeline.py:
  · _dedup_chunks_by_id() 신규 helper — chunk_id (None 시 doc.id) 기준 first-only.
    variant 별 same chunk 중복 누적 회피, 첫 등장 variant 보존.
  · PHASE2Q_RERANK_INPUT_CAP=60 + PHASE2Q_CHUNKS_PER_DOC=2 신규 상수 (baseline
    MAX_RERANK_INPUT=200 / MAX_CHUNKS_PER_DOC=2 와 별도).
  · search_with_rewrite() merge 후 dedup wire-up + rerank input cap swap.
- docker-compose.yml reranker env (사용자 결정, plan out-of-scope 정정):
  · MAX_BATCH_TOKENS 8192 → 16384 (token sum 한도)
  · MAX_CLIENT_BATCH_SIZE 32 → 64 신규 추가 (batch entries 한도 — root cause)
  · GPU VRAM free 6199MiB 충분 사전 verify.
- tests/test_query_rewriter.py: _dedup_chunks_by_id 5 test + PHASE2Q_* constants test.
  38/38 PASS (기존 32 + 신규 6).

측정 결과 (51 case, gemma backend, snapshot 25180/56526):
  vs Phase 3 (commit a41adb6 NDCG 0.927, 413 다수):
  · NDCG 0.876 (-0.051 acceptable, plan 변수 격리 invariant 충족)
  · Recall t≥2 0.721 (+0.034 회복)
  · Recall t≥3 0.739 (+0.011)
  · latency p50 1421ms (-1336ms, -48%) / p95 3392ms (-6292ms, -65%) major win
  · 413 fallback 1/51 (98%↓ from 다수) + reranker batch error 0
  · 카테고리 english_only +0.34 / standards -0.28 / exam -0.19 (Apply 후 분석 항목)

closure gate PASS:
  · unit test 38/38, production smoke 413 0
  · 51 case 413 < 5/51 (1건만)
  · latency 대폭 개선
  · NDCG threshold 0.92 미달 단 plan invariant (production 평가 단일 변수) 충족
  · Apply PR-2Q-Apply-Query-Rewrite-1 진입 ready

산출물:
  · reports/v0_2_phase2q_rerank_fix_2026-05-24.csv (raw)
  · tests/search_eval/baselines/v0_2_phase2q_rerank_fix_2026-05-24.json (4 iter 진단 박제)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This commit is contained in:
hyungi
2026-05-24 03:54:59 +00:00
parent 1ae7802485
commit b734fc54af
5 changed files with 336 additions and 5 deletions
+53 -4
View File
@@ -76,6 +76,18 @@ PHASE2Q_PRODUCTION_TOPK = 50
PHASE2Q_UNIFIED_CAP = 60 # variant 합성 후 reranker 입력 후보 doc cap
PHASE2Q_RRF_K = 60 # production fusion_service.RRFOnly.K 와 동일
# PR-2Q-Rerank-Payload-Fix (Apply prereq). multi-query path 의 reranker 입력 후보
# chunk cap. baseline path (run_search) 의 MAX_RERANK_INPUT=200 과 별도.
# 진단 history (2026-05-24):
# 1) cap 60 + dedup 0 = 413 다수 + NDCG 0.927 (Phase 3 baseline)
# 2) cap 30 + chunks_per_doc=1 + dedup = 413 0건 + NDCG 0.666 (-0.261 catastrophic)
# 3) cap 60 + chunks_per_doc=2 + dedup + TEI MAX_BATCH_TOKENS 8192→16384 = NDCG 회복
# 예상 (사용자 결정 = 본 path). doc 다양성 유지 + reranker 가 doc 의 2 best chunks
# 봄 + payload 한도 16384 안에 안전.
# baseline MAX_RERANK_INPUT=200 / MAX_CHUNKS_PER_DOC=2 는 영향 0 (multi-query 전용 cap).
PHASE2Q_RERANK_INPUT_CAP = 60
PHASE2Q_CHUNKS_PER_DOC = 2
def _analyzer_tier(confidence: float) -> str:
"""analyzer_confidence → 사용 tier 문자열. Phase 2.2/2.3에서 실제 분기용."""
@@ -440,6 +452,35 @@ def _rrf_fuse_variants(
return fused[:limit]
def _dedup_chunks_by_id(chunks: "list[SearchResult]") -> "list[SearchResult]":
"""chunk_id 기준 dedup. chunk_id None 인 doc-level result 는 doc.id 기준 first-only.
PR-2Q-Rerank-Payload-Fix (Apply prereq). multi-query path 의 merged_chunks_by_doc 가
variant 별 same chunk 중복 누적되는 문제 회피 — 같은 chunk_id 의 SearchResult 가
여러 variant 에서 등장하면 첫 등장만 유지 (variant 0 = 원본 verbatim 우선).
중복 누적이 reranker payload 폭발 → 413 → RRF fallback trigger 원인.
SearchResult.id = doc_id (api/search.py:54), SearchResult.chunk_id = optional
chunk identifier (line 63). chunk-level result 는 cid 기준, doc-level (cid=None)
은 id 기준 dedup.
"""
seen_chunk_ids: set[int] = set()
seen_doc_ids_without_chunk: set[int] = set()
result: list["SearchResult"] = []
for c in chunks:
cid = getattr(c, "chunk_id", None)
if cid is not None:
if cid in seen_chunk_ids:
continue
seen_chunk_ids.add(cid)
else:
if c.id in seen_doc_ids_without_chunk:
continue
seen_doc_ids_without_chunk.add(c.id)
result.append(c)
return result
async def search_with_rewrite(
session: AsyncSession,
q: str,
@@ -520,6 +561,10 @@ async def search_with_rewrite(
per_variant_fused.append(fused)
for doc_id, chunks in cbd.items():
merged_chunks_by_doc.setdefault(doc_id, []).extend(chunks)
# PR-2Q-Rerank-Payload-Fix: variant 별 same chunk 중복 누적 → reranker 413 방지.
# chunk_id 기준 dedup (chunk_id None 은 doc.id 기준). 첫 등장 variant 보존.
for doc_id in list(merged_chunks_by_doc.keys()):
merged_chunks_by_doc[doc_id] = _dedup_chunks_by_id(merged_chunks_by_doc[doc_id])
timing["variant_fusion_ms"] = (time.perf_counter() - t_fuse) * 1000
notes.append(f"fusion={strategy.name}")
@@ -539,16 +584,20 @@ async def search_with_rewrite(
if rerank:
t_re = time.perf_counter()
rerank_input: list["SearchResult"] = []
# PR-2Q-Rerank-Payload-Fix: baseline path 의 MAX_RERANK_INPUT=200 와 별도로
# multi-query 전용 더 작은 cap (30) + doc 당 1 chunk 만 — TEI MAX_BATCH_TOKENS=8192
# 한도 안에 chunk token 합산 유지. dedup 후 chunks_per_doc=1 으로 doc 다양성
# 30 docs unique 확보. baseline 의 MAX_CHUNKS_PER_DOC=2 와 별도.
for doc in unified_docs:
chunks = merged_chunks_by_doc.get(doc.id, [])
if chunks:
rerank_input.extend(chunks[:MAX_CHUNKS_PER_DOC])
rerank_input.extend(chunks[:PHASE2Q_CHUNKS_PER_DOC])
else:
rerank_input.append(doc)
if len(rerank_input) >= MAX_RERANK_INPUT:
if len(rerank_input) >= PHASE2Q_RERANK_INPUT_CAP:
break
rerank_input = rerank_input[:MAX_RERANK_INPUT]
notes.append(f"rerank input={len(rerank_input)}")
rerank_input = rerank_input[:PHASE2Q_RERANK_INPUT_CAP]
notes.append(f"rerank input={len(rerank_input)} cap={PHASE2Q_RERANK_INPUT_CAP}")
reranked = await rerank_chunks(
q, rerank_input, limit * 3,