Files
hyungi_document_server/docker-compose.override.rerank-cand.yml
T
hyungi 076c0e1802 feat(eval): Phase 2B Reranker Diagnose — dispatcher + gte 측정 + decision (H3 bge-reranker-v2-m3 유지)
round-2-review-mighty-starfish.md v2.1 (Phase 2B Reranker Diagnose) plan 실행.
Phase 2A 의 CANDIDATE_BACKEND_MAP 패턴 재사용 + RERANKER_BACKEND_MAP 신규.

코드 변경 (4 파일):
- app/services/search/rerank_service.py:
  - RERANKER_BACKEND_MAP allowlist (baseline / cand_gte_ml_base, slug-based resolve)
  - _resolve_reranker(slug) → endpoint URL or None
  - _rerank_via_candidate_endpoint() — 후보 TEI POST /rerank
  - rerank_chunks() 시그니처에 reranker_backend + snapshot_*_id_max 추가 + dispatch log
- app/services/search/search_pipeline.py: run_search() threading
- app/api/search.py: reranker_backend Query parameter + 400 unknown_reranker_backend 에러 매핑
- tests/search_eval/run_eval.py: --reranker-backend flag + call_search/evaluate threading

infra:
- docker-compose.override.rerank-cand.yml: 3 후보 service (gte_ml_base / mxbai_large / bge_v2_gemma_2b),
  profile 'rerank-cand' 격리, restart=unless-stopped

측정 산출물 (51 case, scored=46, failure=5):
- reports/v0_2_phase2b_baseline_snapshot_2026-05-23.csv (NDCG 0.659, Phase 2A 와 일치 = 재현성 PASS)
- reports/v0_2_phase2b_gte_ml_base_2026-05-23.csv
- tests/search_eval/baselines/v0_2_phase2b_{baseline_snapshot,gte_ml_base}_2026-05-23.json
- reports/phase_2b_reranker_decision_2026-05-23.md
- tests/fixtures/tei_rerank_response.json (G0-1 한국어+영어 mixed sample sanity PASS)

후보 TEI 1.7 호환성 (Phase 1 smoke gate):
- cand_gte_ml_base       :  PASS (xlm-roberta-based, TEI 호환)
- cand_mxbai_large       :  deberta-v2 미지원 → Phase 2B-Extended (sentence-transformers wrapper)
- cand_bge_v2_gemma_2b   :  LLM-based reranker, 1_Pooling/config.json 부재 → Phase 2B-Extended (FlagEmbedding wrapper)

결과 (1 후보 측정 + baseline rebaseline):
| Candidate                          | NDCG  | Δ baseline | mixed | korean | exam  | p50 ms |
|------------------------------------|------:|-----------:|------:|-------:|------:|-------:|
| bge-reranker-v2-m3 (baseline)      | 0.659 | —          | 0.39  | 0.51   | 0.74  | 454    |
| cand_gte_ml_base                   | 0.604 | -0.055     | 0.38  | 0.41   | 0.62  | 345    |

Decision (H3): bge-reranker-v2-m3 유지. gte 의 reranker quality 가 production 보다 약함 (korean_only -0.10, exam -0.12, overall -0.055).

후속 PR 백로그 (6건):
- PR-Search-Query-Rewrite-1 (Phase 2Q, korean_only/mixed 보완 권고)
- PR-2B-Extended-Mxbai-Large (sentence-transformers wrapper)
- PR-2B-Extended-Bge-V2-Gemma (FlagEmbedding LayerwiseReranker wrapper)
- PR-2B-Extended-Jina-V2-ML (license 결정 후, 개인 비영리 가정)
- PR-2B-Cloud-Reranker-Scaffold-1 (Cohere scaffold-only, 선택)
- PR-2B-Rerank-Cand-Cleanup-1 (1주 후 cand 컨테이너 정리)

production 영향:
- production reranker (bge-reranker-v2-m3) 변경 0
- config.yaml ai.models.rerank.endpoint 변경 0
- embedding (bge-m3 ollama) 변경 0 (Phase 2A 결정 보존)
- documents / document_chunks 변경 0 (21365 docs / 30605 chunks 그대로)
- 4 smoke PASS (baseline / baseline+snapshot / cand_gte_ml_base / cand_invalid → 400)
- dispatch log 박제 verify (endpoint + snapshot id)

closure gate: 16 항목 PASS (flex closure 조항 적용 — 1 후보 측정, 2 후보 TEI 호환 탈락 사유 명시).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-23 08:37:42 +00:00

102 lines
3.0 KiB
YAML

# Phase 2B — Reranker candidate compose override (Diagnose only)
#
# Profile-isolated: `--profile rerank-cand` 명시 opt-in. default up 시 미기동.
# production fastapi/postgres/reranker(bge-reranker-v2-m3) 에 영향 0.
# 본 PR 종료 후 별 chore (PR-2B-Rerank-Cand-Cleanup-1) 에서 제거.
#
# 후보 상태 (2026-05-23):
# - gte_ml_base : Apache 2.0, 305M, smoke 대기
# - mxbai_large : Apache 2.0, ~435M, safetensors 부재 — TEI smoke risk
# - bge_v2_gemma_2b : Gemma 라이센스, 2.5B FP16 ~5GB, smoke 대기
#
# 사용:
# docker compose -f docker-compose.yml -f docker-compose.override.rerank-cand.yml \
# --profile rerank-cand up -d rerank-cand-gte-ml-base
services:
rerank-cand-gte-ml-base:
image: ghcr.io/huggingface/text-embeddings-inference:1.7
restart: unless-stopped
container_name: hyungi_document_server-rerank-cand-gte-ml-base-1
expose:
- "80"
environment:
- MODEL_ID=Alibaba-NLP/gte-multilingual-reranker-base
- MAX_BATCH_TOKENS=8192
- MAX_CONCURRENT_REQUESTS=4
volumes:
- rerank_cand_gte_ml_base_cache:/data
deploy:
resources:
reservations:
devices:
- driver: nvidia
count: 1
capabilities: [gpu]
healthcheck:
test: ["CMD", "curl", "-fsS", "http://localhost/health"]
interval: 30s
timeout: 10s
retries: 5
start_period: 60s
profiles: ["rerank-cand"]
rerank-cand-mxbai-large:
image: ghcr.io/huggingface/text-embeddings-inference:1.7
restart: unless-stopped
container_name: hyungi_document_server-rerank-cand-mxbai-large-1
expose:
- "80"
environment:
- MODEL_ID=mixedbread-ai/mxbai-rerank-large-v1
- MAX_BATCH_TOKENS=8192
- MAX_CONCURRENT_REQUESTS=4
volumes:
- rerank_cand_mxbai_large_cache:/data
deploy:
resources:
reservations:
devices:
- driver: nvidia
count: 1
capabilities: [gpu]
healthcheck:
test: ["CMD", "curl", "-fsS", "http://localhost/health"]
interval: 30s
timeout: 10s
retries: 5
start_period: 60s
profiles: ["rerank-cand"]
rerank-cand-bge-v2-gemma-2b:
image: ghcr.io/huggingface/text-embeddings-inference:1.7
restart: unless-stopped
container_name: hyungi_document_server-rerank-cand-bge-v2-gemma-2b-1
expose:
- "80"
environment:
- MODEL_ID=BAAI/bge-reranker-v2-gemma
- MAX_BATCH_TOKENS=8192
- MAX_CONCURRENT_REQUESTS=2
volumes:
- rerank_cand_bge_v2_gemma_2b_cache:/data
deploy:
resources:
reservations:
devices:
- driver: nvidia
count: 1
capabilities: [gpu]
healthcheck:
test: ["CMD", "curl", "-fsS", "http://localhost/health"]
interval: 30s
timeout: 10s
retries: 5
start_period: 120s
profiles: ["rerank-cand"]
volumes:
rerank_cand_gte_ml_base_cache:
rerank_cand_mxbai_large_cache:
rerank_cand_bge_v2_gemma_2b_cache: