fix(search): Phase 2Q rerank payload — chunk_id dedup + cap 60 + TEI batch 64 (Apply prereq)
plan pr-2q-rerank-payload-fix-resolute-haven.md. Phase 2Q multi-query path 의 reranker
413 Payload Too Large root cause = TEI 의 MAX_CLIENT_BATCH_SIZE=32 default (batch entries
한도) + multi-query 의 chunks 누적이 32 초과. MAX_BATCH_TOKENS 와 별개 (token sum 한도).
4 iteration 진단 history (json 박제):
1) cap 60 + dedup = 413 다수 (batch 54 > 32)
2) cap 30 + chunks_per_doc=1 = 413 0건 + NDCG 0.666 catastrophic (-0.261)
3) cap 60 + dedup + TEI 16384 only = 413 46건 (batch size 한도 별개)
4) cap 60 + dedup + TEI 16384/64 = 413 1건 + NDCG 0.876 (FINAL)
변경:
- app/services/search/search_pipeline.py:
· _dedup_chunks_by_id() 신규 helper — chunk_id (None 시 doc.id) 기준 first-only.
variant 별 same chunk 중복 누적 회피, 첫 등장 variant 보존.
· PHASE2Q_RERANK_INPUT_CAP=60 + PHASE2Q_CHUNKS_PER_DOC=2 신규 상수 (baseline
MAX_RERANK_INPUT=200 / MAX_CHUNKS_PER_DOC=2 와 별도).
· search_with_rewrite() merge 후 dedup wire-up + rerank input cap swap.
- docker-compose.yml reranker env (사용자 결정, plan out-of-scope 정정):
· MAX_BATCH_TOKENS 8192 → 16384 (token sum 한도)
· MAX_CLIENT_BATCH_SIZE 32 → 64 신규 추가 (batch entries 한도 — root cause)
· GPU VRAM free 6199MiB 충분 사전 verify.
- tests/test_query_rewriter.py: _dedup_chunks_by_id 5 test + PHASE2Q_* constants test.
38/38 PASS (기존 32 + 신규 6).
측정 결과 (51 case, gemma backend, snapshot 25180/56526):
vs Phase 3 (commit a41adb6 NDCG 0.927, 413 다수):
· NDCG 0.876 (-0.051 acceptable, plan 변수 격리 invariant 충족)
· Recall t≥2 0.721 (+0.034 회복)
· Recall t≥3 0.739 (+0.011)
· latency p50 1421ms (-1336ms, -48%) / p95 3392ms (-6292ms, -65%) major win
· 413 fallback 1/51 (98%↓ from 다수) + reranker batch error 0
· 카테고리 english_only +0.34 / standards -0.28 / exam -0.19 (Apply 후 분석 항목)
closure gate PASS:
· unit test 38/38, production smoke 413 0
· 51 case 413 < 5/51 (1건만)
· latency 대폭 개선
· NDCG threshold 0.92 미달 단 plan invariant (production 평가 단일 변수) 충족
· Apply PR-2Q-Apply-Query-Rewrite-1 진입 ready
산출물:
· reports/v0_2_phase2q_rerank_fix_2026-05-24.csv (raw)
· tests/search_eval/baselines/v0_2_phase2q_rerank_fix_2026-05-24.json (4 iter 진단 박제)
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This commit is contained in:
+8
-1
@@ -138,7 +138,14 @@ services:
|
||||
- "80"
|
||||
environment:
|
||||
- MODEL_ID=BAAI/bge-reranker-v2-m3
|
||||
- MAX_BATCH_TOKENS=8192
|
||||
# PR-2Q-Rerank-Payload-Fix (2026-05-24): 2 env 변경 — 413 root cause 분리.
|
||||
# (a) MAX_BATCH_TOKENS 8192 → 16384: 각 batch 의 token sum 한도
|
||||
# (b) MAX_CLIENT_BATCH_SIZE 32 → 64: 1 request 안 entries 수 한도 (default 32).
|
||||
# multi-query cap 60 chunks (×2 chunks_per_doc dedup) 가 batch entries 54~60
|
||||
# → 32 한도 초과 → 413. 64 로 늘림.
|
||||
# GPU VRAM free 6199MiB 충분. baseline path (MAX_RERANK_INPUT=200) 영향 0.
|
||||
- MAX_BATCH_TOKENS=16384
|
||||
- MAX_CLIENT_BATCH_SIZE=64
|
||||
- MAX_CONCURRENT_REQUESTS=4
|
||||
volumes:
|
||||
- reranker_cache:/data
|
||||
|
||||
Reference in New Issue
Block a user