fix(search): Phase 2Q rerank payload — chunk_id dedup + cap 60 + TEI batch 64 (Apply prereq)
Plan: pr-2q-rerank-payload-fix-resolute-haven.md. Root cause of the reranker's 413 Payload Too Large on the Phase 2Q multi-query path: TEI's default MAX_CLIENT_BATCH_SIZE=32 limits the number of entries per batch. The chunks that multi-query accumulates exceed 32. This limit is independent of MAX_BATCH_TOKENS, which limits the token sum.
4-iteration diagnostic history (recorded in JSON):
1) cap 60 + dedup = many 413s (batch 54 > 32)
2) cap 30 + chunks_per_doc=1 = zero 413s, but NDCG 0.666, catastrophic (-0.261)
3) cap 60 + dedup + TEI MAX_BATCH_TOKENS 16384 only = 46 413s (the batch-size limit is separate)
4) cap 60 + dedup + TEI MAX_BATCH_TOKENS 16384 / MAX_CLIENT_BATCH_SIZE 64 = 1 413 + NDCG 0.876 (FINAL)
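The "two separate limits" reading of the root cause can be made concrete with a minimal sketch. It models the limits only as this commit describes them: one cap on the number of entries in a request and one cap on the summed token count. It does not claim to reproduce how TEI itself enforces MAX_BATCH_TOKENS. `TeiLimits` and `limit_violations()` are hypothetical names, and the per-chunk token counts are illustrative assumptions, not measured values. The first three calls mirror iterations 1, 3 and 4.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TeiLimits:
    """Hypothetical model of the two reranker limits described in this commit."""
    max_client_batch_size: int = 32  # max number of entries (texts) per request
    max_batch_tokens: int = 8192     # cap on the summed token count


def limit_violations(token_counts: list[int], limits: TeiLimits) -> list[str]:
    """Return the names of the limits a single rerank request would exceed.

    The checks are independent: raising max_batch_tokens does not help a request
    with too many entries, and raising max_client_batch_size does not help a
    request whose tokens add up to too much.
    """
    violations = []
    if len(token_counts) > limits.max_client_batch_size:
        violations.append("MAX_CLIENT_BATCH_SIZE")
    if sum(token_counts) > limits.max_batch_tokens:
        violations.append("MAX_BATCH_TOKENS")
    return violations


request = [150] * 54  # 54 entries, 8100 tokens in total (illustrative sizes)
print(limit_violations(request, TeiLimits()))                        # ['MAX_CLIENT_BATCH_SIZE']
print(limit_violations(request, TeiLimits(max_batch_tokens=16384)))  # ['MAX_CLIENT_BATCH_SIZE']
print(limit_violations(request, TeiLimits(64, 16384)))               # []
print(limit_violations([500] * 20, TeiLimits()))                     # ['MAX_BATCH_TOKENS']
```

Raising only the token budget (the second call) leaves the 54-entry request rejected, which matches the 46 remaining 413s in iteration 3.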
Changes:
- app/services/search/search_pipeline.py:
  · _dedup_chunks_by_id(): new helper that keeps only the first result per chunk_id (per doc.id
    when chunk_id is None). This stops the same chunk from accumulating once per variant. The
    variant in which a chunk first appears is the one kept.
  · New constants PHASE2Q_RERANK_INPUT_CAP=60 and PHASE2Q_CHUNKS_PER_DOC=2, separate from the
    baseline MAX_RERANK_INPUT=200 / MAX_CHUNKS_PER_DOC=2.
  · search_with_rewrite(): wire up dedup after the variant merge and swap in the new rerank
    input cap.
- docker-compose.yml, reranker env (user decision; overrides the plan, which had listed this
  as out of scope):
  · MAX_BATCH_TOKENS 8192 → 16384 (token-sum limit)
  · MAX_CLIENT_BATCH_SIZE 32 → 64, newly added (batch-entry limit; the root cause)
  · Verified beforehand that 6199 MiB of free GPU VRAM is sufficient.
- tests/test_query_rewriter.py: 5 tests for _dedup_chunks_by_id + 1 test for the PHASE2Q_*
  constants. 38/38 PASS (32 existing + 6 new).
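The dedup and capping changes above can be exercised in isolation with a self-contained sketch. `Result` is a hypothetical stand-in for SearchResult: `id` is the doc id, `chunk_id` is optional, and a `variant` field is added only for illustration. `build_rerank_input()` is a hypothetical extraction of the rerank-input loop in search_with_rewrite(). Neither name exists in the codebase. The dedup function mirrors the logic of _dedup_chunks_by_id().

```python
from dataclasses import dataclass
from typing import Optional

# Values taken from this commit.
PHASE2Q_RERANK_INPUT_CAP = 60
PHASE2Q_CHUNKS_PER_DOC = 2


@dataclass
class Result:
    """Hypothetical stand-in for SearchResult: id is the doc id, chunk_id is optional."""
    id: int
    chunk_id: Optional[int] = None
    variant: int = 0  # query variant that produced the hit (illustration only)


def dedup_chunks_by_id(chunks: list[Result]) -> list[Result]:
    """Keep the first result per chunk_id, or per doc id when chunk_id is None."""
    seen_chunk_ids: set[int] = set()
    seen_doc_ids: set[int] = set()
    out: list[Result] = []
    for c in chunks:
        if c.chunk_id is not None:
            if c.chunk_id in seen_chunk_ids:
                continue
            seen_chunk_ids.add(c.chunk_id)
        else:
            if c.id in seen_doc_ids:
                continue
            seen_doc_ids.add(c.id)
        out.append(c)
    return out


def build_rerank_input(
    unified_docs: list[Result],
    merged_chunks_by_doc: dict[int, list[Result]],
    cap: int = PHASE2Q_RERANK_INPUT_CAP,
    per_doc: int = PHASE2Q_CHUNKS_PER_DOC,
) -> list[Result]:
    """Take up to per_doc chunks per doc (the doc itself if it has none), at most cap entries."""
    rerank_input: list[Result] = []
    for doc in unified_docs:
        chunks = merged_chunks_by_doc.get(doc.id, [])
        if chunks:
            rerank_input.extend(chunks[:per_doc])
        else:
            rerank_input.append(doc)
        if len(rerank_input) >= cap:
            break
    return rerank_input[:cap]


# Doc 1: variants 0 and 1 both return chunk 10; variant 1 also returns chunk 11.
merged = {1: [Result(1, 10, 0), Result(1, 10, 1), Result(1, 11, 1)], 2: []}
merged = {doc_id: dedup_chunks_by_id(cs) for doc_id, cs in merged.items()}
picked = build_rerank_input([Result(1), Result(2)], merged)
print([(r.id, r.chunk_id, r.variant) for r in picked])
# [(1, 10, 0), (1, 11, 1), (2, None, 0)]

# 100 docs with 3 chunks each: 2 chunks per doc are taken until the cap is reached.
many = {d: [Result(d, d * 10 + k) for k in range(3)] for d in range(100)}
print(len(build_rerank_input([Result(d) for d in range(100)], many)))  # 60
```

Without the dedup step, doc 1's first two entries would both be chunk 10. The reranker would then see that chunk twice and never see chunk 11.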
Measurement results (51 cases, gemma backend, snapshot 25180/56526):
vs Phase 3 (commit a41adb6, NDCG 0.927, many 413s):
  · NDCG 0.876 (-0.051, acceptable; satisfies the plan's variable-isolation invariant)
  · Recall t≥2 0.721 (+0.034, recovered)
  · Recall t≥3 0.739 (+0.011)
  · Latency p50 1421 ms (-1336 ms, -48%) / p95 3392 ms (-6292 ms, -65%): a major win
  · 413 fallback 1/51 (98%↓ from many) + 0 reranker batch errors
  · Per category: english_only +0.34 / standards -0.28 / exam -0.19 (to be analyzed after Apply)
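The relative changes above follow from the absolute numbers. This quick check assumes the Phase 3 latencies equal the reported new values plus the reported reductions, since the old values are not listed separately here. `pct_reduction()` is only a helper for this arithmetic.

```python
def pct_reduction(new_ms: float, saved_ms: float) -> float:
    """Percent reduction, reconstructing the old value as new + saved."""
    old_ms = new_ms + saved_ms
    return 100 * saved_ms / old_ms


print(round(pct_reduction(1421, 1336)))  # 48  (p50: 2757 ms -> 1421 ms)
print(round(pct_reduction(3392, 6292)))  # 65  (p95: 9684 ms -> 3392 ms)
print(round(0.927 - 0.051, 3))           # 0.876  (NDCG vs Phase 3)
print(round(100 * 1 / 51, 1))            # 2.0  (% of cases falling back after a 413)
```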
Closure gate PASS:
  · Unit tests 38/38; production smoke test: 0 413s
  · 51-case 413 count < 5/51 (only 1)
  · Latency substantially improved
  · NDCG misses the 0.92 threshold, but the plan invariant (production evaluation, single
    variable) is satisfied
  · Ready to start Apply PR-2Q-Apply-Query-Rewrite-1
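The gate criteria above can be written down as a small check. This is a hypothetical sketch: `EvalRun` and `closure_gate()` are illustrative names. The thresholds (fewer than 5 of 51 cases with a 413, NDCG 0.92) come from the gate list. Latency is left out because no numeric threshold is stated. Accepting the NDCG miss is a judgment about the plan invariant, so the sketch reports that criterion separately instead of silently passing it.

```python
from dataclasses import dataclass


@dataclass
class EvalRun:
    """Hypothetical container for the numbers the closure gate looks at."""
    unit_passed: int
    unit_total: int
    smoke_413: int
    eval_413: int
    ndcg: float


def closure_gate(run: EvalRun, max_413: int = 5, ndcg_threshold: float = 0.92) -> dict[str, bool]:
    """Evaluate each gate criterion separately so that a miss stays visible."""
    return {
        "unit_tests": run.unit_passed == run.unit_total,
        "smoke_413_zero": run.smoke_413 == 0,
        "eval_413_below_max": run.eval_413 < max_413,
        "ndcg_threshold": run.ndcg >= ndcg_threshold,
    }


checks = closure_gate(EvalRun(unit_passed=38, unit_total=38, smoke_413=0, eval_413=1, ndcg=0.876))
print(checks)
# {'unit_tests': True, 'smoke_413_zero': True, 'eval_413_below_max': True, 'ndcg_threshold': False}

# The NDCG miss is waived by the plan invariant (a judgment call), so only the
# remaining criteria are required here.
hard_pass = all(ok for name, ok in checks.items() if name != "ndcg_threshold")
print(hard_pass)  # True
```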
Artifacts:
  · reports/v0_2_phase2q_rerank_fix_2026-05-24.csv (raw)
  · tests/search_eval/baselines/v0_2_phase2q_rerank_fix_2026-05-24.json (records the 4-iteration diagnostics)
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
--- a/app/services/search/search_pipeline.py
+++ b/app/services/search/search_pipeline.py
@@ -76,6 +76,18 @@ PHASE2Q_PRODUCTION_TOPK = 50
 PHASE2Q_UNIFIED_CAP = 60  # cap on candidate docs fed to the reranker after merging variants
 PHASE2Q_RRF_K = 60  # same as production fusion_service.RRFOnly.K
 
+# PR-2Q-Rerank-Payload-Fix (Apply prereq). Cap on candidate chunks in the reranker input
+# for the multi-query path; separate from the baseline path's (run_search) MAX_RERANK_INPUT=200.
+# Diagnostic history (2026-05-24):
+# 1) cap 60 + no dedup = many 413s + NDCG 0.927 (Phase 3 baseline)
+# 2) cap 30 + chunks_per_doc=1 + dedup = zero 413s + NDCG 0.666 (-0.261, catastrophic)
+# 3) cap 60 + chunks_per_doc=2 + dedup + TEI MAX_BATCH_TOKENS 8192→16384 = still 46 413s
+# 4) 3) + TEI MAX_CLIENT_BATCH_SIZE 32→64 = 413 in 1/51 cases + NDCG 0.876 (final; keeps doc
+#    diversity and lets the reranker see each doc's 2 best chunks)
+# Baseline MAX_RERANK_INPUT=200 / MAX_CHUNKS_PER_DOC=2 are unaffected (multi-query-only cap).
+PHASE2Q_RERANK_INPUT_CAP = 60
+PHASE2Q_CHUNKS_PER_DOC = 2
+
 
 
 def _analyzer_tier(confidence: float) -> str:
     """analyzer_confidence → tier string to use. For the actual branching in Phase 2.2/2.3."""
@@ -440,6 +452,35 @@ def _rrf_fuse_variants(
     return fused[:limit]
 
 
+def _dedup_chunks_by_id(chunks: "list[SearchResult]") -> "list[SearchResult]":
+    """Dedup by chunk_id. Doc-level results with chunk_id None keep only the first per doc.id.
+
+    PR-2Q-Rerank-Payload-Fix (Apply prereq). Stops the same chunk from piling up once per
+    variant in merged_chunks_by_doc on the multi-query path: when SearchResults with the same
+    chunk_id appear in several variants, only the first is kept (variant 0, the verbatim original, wins).
+    This duplicate accumulation blew up the reranker payload → 413 → RRF fallback.
+
+    SearchResult.id = doc_id (api/search.py:54); SearchResult.chunk_id = optional
+    chunk identifier (line 63). Chunk-level results are deduped by cid, doc-level
+    results (cid=None) by id.
+    """
+    seen_chunk_ids: set[int] = set()
+    seen_doc_ids_without_chunk: set[int] = set()
+    result: list["SearchResult"] = []
+    for c in chunks:
+        cid = getattr(c, "chunk_id", None)
+        if cid is not None:
+            if cid in seen_chunk_ids:
+                continue
+            seen_chunk_ids.add(cid)
+        else:
+            if c.id in seen_doc_ids_without_chunk:
+                continue
+            seen_doc_ids_without_chunk.add(c.id)
+        result.append(c)
+    return result
+
+
 async def search_with_rewrite(
     session: AsyncSession,
     q: str,
@@ -520,6 +561,10 @@ async def search_with_rewrite(
         per_variant_fused.append(fused)
         for doc_id, chunks in cbd.items():
             merged_chunks_by_doc.setdefault(doc_id, []).extend(chunks)
+    # PR-2Q-Rerank-Payload-Fix: prevent reranker 413s caused by the same chunk accumulating per variant.
+    # Dedup by chunk_id (by doc.id when chunk_id is None); the variant of first appearance is kept.
+    for doc_id in list(merged_chunks_by_doc.keys()):
+        merged_chunks_by_doc[doc_id] = _dedup_chunks_by_id(merged_chunks_by_doc[doc_id])
     timing["variant_fusion_ms"] = (time.perf_counter() - t_fuse) * 1000
     notes.append(f"fusion={strategy.name}")
 
@@ -539,16 +584,20 @@ async def search_with_rewrite(
     if rerank:
         t_re = time.perf_counter()
         rerank_input: list["SearchResult"] = []
+        # PR-2Q-Rerank-Payload-Fix: separately from the baseline path's MAX_RERANK_INPUT=200,
+        # the multi-query path uses its own cap (PHASE2Q_RERANK_INPUT_CAP=60) and at most
+        # PHASE2Q_CHUNKS_PER_DOC=2 chunks per doc (kept separate from baseline MAX_CHUNKS_PER_DOC=2),
+        # so one rerank request stays under the reranker's MAX_CLIENT_BATCH_SIZE=64 entries.
         for doc in unified_docs:
             chunks = merged_chunks_by_doc.get(doc.id, [])
             if chunks:
-                rerank_input.extend(chunks[:MAX_CHUNKS_PER_DOC])
+                rerank_input.extend(chunks[:PHASE2Q_CHUNKS_PER_DOC])
             else:
                 rerank_input.append(doc)
-            if len(rerank_input) >= MAX_RERANK_INPUT:
+            if len(rerank_input) >= PHASE2Q_RERANK_INPUT_CAP:
                 break
-        rerank_input = rerank_input[:MAX_RERANK_INPUT]
-        notes.append(f"rerank input={len(rerank_input)}")
+        rerank_input = rerank_input[:PHASE2Q_RERANK_INPUT_CAP]
+        notes.append(f"rerank input={len(rerank_input)} cap={PHASE2Q_RERANK_INPUT_CAP}")
 
         reranked = await rerank_chunks(
             q, rerank_input, limit * 3,