076c0e1802
round-2-review-mighty-starfish.md v2.1 (Phase 2B Reranker Diagnose) plan 실행.
Phase 2A 의 CANDIDATE_BACKEND_MAP 패턴 재사용 + RERANKER_BACKEND_MAP 신규.
코드 변경 (4 파일):
- app/services/search/rerank_service.py:
- RERANKER_BACKEND_MAP allowlist (baseline / cand_gte_ml_base, slug-based resolve)
- _resolve_reranker(slug) → endpoint URL or None
- _rerank_via_candidate_endpoint() — 후보 TEI POST /rerank
- rerank_chunks() 시그니처에 reranker_backend + snapshot_*_id_max 추가 + dispatch log
- app/services/search/search_pipeline.py: run_search() threading
- app/api/search.py: reranker_backend Query parameter + 400 unknown_reranker_backend 에러 매핑
- tests/search_eval/run_eval.py: --reranker-backend flag + call_search/evaluate threading
infra:
- docker-compose.override.rerank-cand.yml: 3 후보 service (gte_ml_base / mxbai_large / bge_v2_gemma_2b),
profile 'rerank-cand' 격리, restart=unless-stopped
측정 산출물 (51 case, scored=46, failure=5):
- reports/v0_2_phase2b_baseline_snapshot_2026-05-23.csv (NDCG 0.659, Phase 2A 와 일치 = 재현성 PASS)
- reports/v0_2_phase2b_gte_ml_base_2026-05-23.csv
- tests/search_eval/baselines/v0_2_phase2b_{baseline_snapshot,gte_ml_base}_2026-05-23.json
- reports/phase_2b_reranker_decision_2026-05-23.md
- tests/fixtures/tei_rerank_response.json (G0-1 한국어+영어 mixed sample sanity PASS)
후보 TEI 1.7 호환성 (Phase 1 smoke gate):
- cand_gte_ml_base : ✅ PASS (xlm-roberta-based, TEI 호환)
- cand_mxbai_large : ❌ deberta-v2 미지원 → Phase 2B-Extended (sentence-transformers wrapper)
- cand_bge_v2_gemma_2b : ❌ LLM-based reranker, 1_Pooling/config.json 부재 → Phase 2B-Extended (FlagEmbedding wrapper)
결과 (1 후보 측정 + baseline rebaseline):
| Candidate | NDCG | Δ baseline | mixed | korean | exam | p50 ms |
|------------------------------------|------:|-----------:|------:|-------:|------:|-------:|
| bge-reranker-v2-m3 (baseline) | 0.659 | — | 0.39 | 0.51 | 0.74 | 454 |
| cand_gte_ml_base | 0.604 | -0.055 | 0.38 | 0.41 | 0.62 | 345 |
Decision (H3): bge-reranker-v2-m3 유지. gte 의 reranker quality 가 production 보다 약함 (korean_only -0.10, exam -0.12, overall -0.055).
후속 PR 백로그 (6건):
- PR-Search-Query-Rewrite-1 (Phase 2Q, korean_only/mixed 보완 권고)
- PR-2B-Extended-Mxbai-Large (sentence-transformers wrapper)
- PR-2B-Extended-Bge-V2-Gemma (FlagEmbedding LayerwiseReranker wrapper)
- PR-2B-Extended-Jina-V2-ML (license 결정 후, 개인 비영리 가정)
- PR-2B-Cloud-Reranker-Scaffold-1 (Cohere scaffold-only, 선택)
- PR-2B-Rerank-Cand-Cleanup-1 (1주 후 cand 컨테이너 정리)
production 영향:
- production reranker (bge-reranker-v2-m3) 변경 0
- config.yaml ai.models.rerank.endpoint 변경 0
- embedding (bge-m3 ollama) 변경 0 (Phase 2A 결정 보존)
- documents / document_chunks 변경 0 (21365 docs / 30605 chunks 그대로)
- 4 smoke PASS (baseline / baseline+snapshot / cand_gte_ml_base / cand_invalid → 400)
- dispatch log 박제 verify (endpoint + snapshot id)
closure gate: 16 항목 PASS (flex closure 조항 적용 — 1 후보 측정, 2 후보 TEI 호환 탈락 사유 명시).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
47 lines
2.2 KiB
JSON
47 lines
2.2 KiB
JSON
{
|
|
"fixture_purpose": "Phase 2B G0-1 — TEI rerank endpoint 응답 spec 박제. mixed (한국어+영어) sanity check.",
|
|
"request": {
|
|
"endpoint_examples": [
|
|
"http://reranker:80/rerank (production baseline bge-reranker-v2-m3)",
|
|
"http://rerank-cand-gte-ml-base:80/rerank",
|
|
"http://rerank-cand-mxbai-large:80/rerank",
|
|
"http://rerank-cand-bge-v2-gemma-2b:80/rerank"
|
|
],
|
|
"method": "POST",
|
|
"headers": {"Content-Type": "application/json"},
|
|
"body": {
|
|
"query": "압력용기 설계 기준",
|
|
"texts": [
|
|
"ASME Section VIII Division 1 pressure vessel design rules and material selection criteria for high-pressure applications.",
|
|
"고압가스 안전관리법에 따른 압력용기 검사 기준 — 정기 검사 주기 및 안전 밸브 설정.",
|
|
"Today weather forecast for Seoul: partly cloudy with chance of rain in the afternoon."
|
|
]
|
|
}
|
|
},
|
|
"response_shape": "[{index: int, score: float}, ...] sorted score desc",
|
|
"captured_responses": {
|
|
"baseline_bge_v2_m3": {
|
|
"endpoint": "http://reranker:80/rerank",
|
|
"model": "BAAI/bge-reranker-v2-m3 (production)",
|
|
"raw": [
|
|
{"index": 0, "score": 0.9091032},
|
|
{"index": 1, "score": 0.7514658},
|
|
{"index": 2, "score": 0.0000165714}
|
|
],
|
|
"interpretation": "ASME(en)+고압가스(ko) 둘 다 무관(weather) 보다 명확 높음. 한국어/영어 score gap 작음 (0.91 vs 0.75 = 0.16) — 한국어 능력 강함."
|
|
},
|
|
"cand_gte_ml_base": {
|
|
"endpoint": "http://rerank-cand-gte-ml-base:80/rerank",
|
|
"model": "Alibaba-NLP/gte-multilingual-reranker-base",
|
|
"raw": [
|
|
{"index": 0, "score": 0.6365791},
|
|
{"index": 1, "score": 0.4685475},
|
|
{"index": 2, "score": 0.034488525}
|
|
],
|
|
"interpretation": "ASME(en)+고압가스(ko) 둘 다 weather 보다 명확 높음. 한국어/영어 score gap 0.17 — baseline 과 비슷. score 절대값 baseline 보다 낮음 (model 별 calibration 차이, rank 순서는 동일)."
|
|
}
|
|
},
|
|
"sanity_check": "ASME(en) > 고압가스(ko) > weather(noise) 순서 — 두 모델 모두 통과. 후보가 한국어 무관하지 않은지 검증.",
|
|
"captured_at": "2026-05-23"
|
|
}
|