076c0e1802
Executes the round-2-review-mighty-starfish.md v2.1 plan (Phase 2B Reranker Diagnose).
Reuses the Phase 2A CANDIDATE_BACKEND_MAP pattern and adds a new RERANKER_BACKEND_MAP.
Code changes (4 files):
- app/services/search/rerank_service.py:
  - RERANKER_BACKEND_MAP allowlist (baseline / cand_gte_ml_base, resolved by slug)
  - _resolve_reranker(slug) → endpoint URL or None
  - _rerank_via_candidate_endpoint(): POST /rerank to the candidate TEI
  - rerank_chunks(): signature gains reranker_backend + snapshot_*_id_max, plus a dispatch log
- app/services/search/search_pipeline.py: threads the new parameters through run_search()
- app/api/search.py: reranker_backend Query parameter; unknown slugs map to a 400 unknown_reranker_backend error
- tests/search_eval/run_eval.py: --reranker-backend flag, threaded through call_search/evaluate
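The slug-based dispatch keeps raw URLs out of the query string. The API accepts only a known slug, and `_resolve_reranker` raises `ValueError` for anything else, which app/api/search.py maps to HTTP 400. The framework-free sketch below shows that translation. The map mirrors `RERANKER_BACKEND_MAP` in rerank_service.py. `handle_reranker_param` and its `(status, body)` return shape are hypothetical stand-ins, not the real route handler.

```python
from typing import Dict, Optional, Tuple

# Mirrors RERANKER_BACKEND_MAP in rerank_service.py (baseline + one candidate).
RERANKER_BACKEND_MAP: Dict[str, Optional[Dict[str, str]]] = {
    "baseline": None,  # production reranker; its endpoint comes from config.yaml
    "cand_gte_ml_base": {"endpoint": "http://rerank-cand-gte-ml-base:80/rerank"},
}


def resolve_reranker(slug: Optional[str]) -> Optional[str]:
    """Same rules as _resolve_reranker: None/"baseline" -> None, unknown -> ValueError."""
    if slug is None or slug == "baseline":
        return None
    if slug not in RERANKER_BACKEND_MAP:
        raise ValueError(f"unknown_reranker_backend: {slug!r}")
    cfg = RERANKER_BACKEND_MAP[slug]
    return cfg["endpoint"] if cfg else None


def handle_reranker_param(slug: Optional[str]) -> Tuple[int, dict]:
    """Hypothetical API-layer helper: turn a resolution error into an HTTP 400 payload."""
    try:
        endpoint = resolve_reranker(slug)
    except ValueError:
        return 400, {"error": "unknown_reranker_backend", "backend": slug}
    return 200, {"backend": slug or "baseline", "endpoint": endpoint or "production(config.yaml)"}


print(handle_reranker_param("cand_gte_ml_base"))
print(handle_reranker_param("cand_invalid"))
# (400, {'error': 'unknown_reranker_backend', 'backend': 'cand_invalid'})
```

In a FastAPI route, the same `except ValueError` branch would typically `raise HTTPException(status_code=400, detail=...)` instead of returning a tuple. Because the allowlist is checked server-side, a caller can only choose among containers that were already defined.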
Infra:
- docker-compose.override.rerank-cand.yml: 3 candidate services (gte_ml_base / mxbai_large / bge_v2_gemma_2b),
  isolated under the 'rerank-cand' profile, restart=unless-stopped
Measurement artifacts (51 cases, scored=46, failures=5):
- reports/v0_2_phase2b_baseline_snapshot_2026-05-23.csv (NDCG 0.659, identical to Phase 2A = reproducibility PASS)
- reports/v0_2_phase2b_gte_ml_base_2026-05-23.csv
- tests/search_eval/baselines/v0_2_phase2b_{baseline_snapshot,gte_ml_base}_2026-05-23.json
- reports/phase_2b_reranker_decision_2026-05-23.md
- tests/fixtures/tei_rerank_response.json (G0-1 mixed Korean+English sample, sanity check PASS)
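The fixture sanity check essentially asserts the response shape that `rerank_chunks` relies on: a JSON list of `{"index": int, "score": float}` objects, sorted by score in descending order, where every index points into the submitted texts. Below is a minimal validator sketch that assumes that shape. `check_rerank_response` is a hypothetical helper, not part of the test suite.

```python
import json
from typing import Any, List


def check_rerank_response(data: Any, n_texts: int) -> List[str]:
    """Return a list of problems; an empty list means the shape is usable."""
    if not isinstance(data, list):
        return [f"expected a list, got {type(data).__name__}"]
    problems: List[str] = []
    scores: List[float] = []
    for i, item in enumerate(data):
        if not isinstance(item, dict):
            problems.append(f"item {i}: expected an object")
            continue
        idx, score = item.get("index"), item.get("score")
        # bool is a subclass of int, so exclude it explicitly
        if not isinstance(idx, int) or isinstance(idx, bool) or not 0 <= idx < n_texts:
            problems.append(f"item {i}: index {idx!r} out of range 0..{n_texts - 1}")
        if not isinstance(score, (int, float)) or isinstance(score, bool):
            problems.append(f"item {i}: score {score!r} is not numeric")
        else:
            scores.append(float(score))
    if scores != sorted(scores, reverse=True):
        problems.append("scores are not sorted in descending order")
    return problems


sample = json.loads('[{"index": 1, "score": 0.91}, {"index": 0, "score": 0.12}]')
print(check_rerank_response(sample, n_texts=2))  # []
```

Returning a list of problems instead of raising on the first one makes a failing fixture report every shape violation in a single run.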
Candidate TEI 1.7 compatibility (Phase 1 smoke gate):
- cand_gte_ml_base : ✅ PASS (xlm-roberta-based, TEI-compatible)
- cand_mxbai_large : ❌ deberta-v2 not supported → deferred to Phase 2B-Extended (sentence-transformers wrapper)
- cand_bge_v2_gemma_2b : ❌ LLM-based reranker with no 1_Pooling/config.json → deferred to Phase 2B-Extended (FlagEmbedding wrapper)
Results (1 candidate measured + baseline re-measured):
| Candidate                     | NDCG (overall) | Δ vs baseline | mixed | korean_only | exam | p50 latency (ms) |
|-------------------------------|---------------:|--------------:|------:|------------:|-----:|-----------------:|
| bge-reranker-v2-m3 (baseline) |          0.659 |           n/a |  0.39 |        0.51 | 0.74 |              454 |
| cand_gte_ml_base              |          0.604 |        -0.055 |  0.38 |        0.41 | 0.62 |              345 |
Decision (H3): keep bge-reranker-v2-m3. cand_gte_ml_base reranks worse than production (korean_only -0.10, exam -0.12, overall -0.055), despite its lower p50 latency (345 ms vs 454 ms).
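The Δ figures in the decision are plain differences between the candidate and baseline rows. Recomputing them from the reported, already-rounded values is a quick consistency check. The sketch assumes the mixed / korean_only / exam columns are per-category NDCG scores, which is how the decision line reads them.

```python
# Rounded values copied from the results table.
baseline = {"overall": 0.659, "mixed": 0.39, "korean_only": 0.51, "exam": 0.74, "p50_ms": 454}
candidate = {"overall": 0.604, "mixed": 0.38, "korean_only": 0.41, "exam": 0.62, "p50_ms": 345}

# Δ = candidate - baseline, rounded to three decimals to absorb float noise
deltas = {k: round(candidate[k] - baseline[k], 3) for k in baseline}

# Relative p50 latency change of the candidate vs. the baseline
latency_change = (candidate["p50_ms"] - baseline["p50_ms"]) / baseline["p50_ms"]

print(deltas)
print(f"p50 latency change: {latency_change:.1%}")
```

The korean_only and exam drops are roughly twice the overall drop, while mixed barely moves (-0.01). A p50 improvement of about 24% does not offset that quality loss.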
Follow-up PR backlog (6 items):
- PR-Search-Query-Rewrite-1 (Phase 2Q, recommended to shore up korean_only/mixed)
- PR-2B-Extended-Mxbai-Large (sentence-transformers wrapper)
- PR-2B-Extended-Bge-V2-Gemma (FlagEmbedding LayerwiseReranker wrapper)
- PR-2B-Extended-Jina-V2-ML (after the license decision; assumes personal, non-commercial use)
- PR-2B-Cloud-Reranker-Scaffold-1 (Cohere, scaffold only, optional)
- PR-2B-Rerank-Cand-Cleanup-1 (remove the cand containers after 1 week)
Production impact:
- production reranker (bge-reranker-v2-m3): unchanged
- config.yaml ai.models.rerank.endpoint: unchanged
- embedding (bge-m3 via ollama): unchanged (Phase 2A decision preserved)
- documents / document_chunks: unchanged (still 21365 docs / 30605 chunks)
- 4 smoke tests PASS (baseline / baseline+snapshot / cand_gte_ml_base / cand_invalid → 400)
- dispatch log verified to record the endpoint + snapshot ids
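Verifying the dispatch log comes down to checking the `[reranker-dispatch]` line that `rerank_chunks` emits through `logger.info` with %-style formatting. Below is a small parser sketch for that format. `parse_dispatch_line` is a hypothetical helper. It assumes no field value contains whitespace, which holds for the slugs, URLs, and integer ids used here. The sample log prefix and id value are made up for illustration.

```python
import re
from typing import Dict, Optional

# Same field order as the logger.info() format string in rerank_chunks().
DISPATCH_RE = re.compile(
    r"\[reranker-dispatch\] backend=(?P<backend>\S+) endpoint=(?P<endpoint>\S+)"
    r" snapshot_doc_id_max=(?P<doc_max>\S+) snapshot_chunk_id_max=(?P<chunk_max>\S+)"
)


def parse_dispatch_line(line: str) -> Optional[Dict[str, Optional[str]]]:
    """Return the dispatch fields of a log line, or None if it is not a dispatch record."""
    m = DISPATCH_RE.search(line)
    if m is None:
        return None
    fields: Dict[str, Optional[str]] = dict(m.groupdict())
    # "%s" renders an omitted snapshot id as the literal string "None"
    for key in ("doc_max", "chunk_max"):
        if fields[key] == "None":
            fields[key] = None
    return fields


# Illustrative log line; the prefix and the id value are made up for the example.
sample = (
    "INFO rerank [reranker-dispatch] backend=cand_gte_ml_base "
    "endpoint=http://rerank-cand-gte-ml-base:80/rerank "
    "snapshot_doc_id_max=12345 snapshot_chunk_id_max=None"
)
print(parse_dispatch_line(sample))
```

For the baseline smoke case, the same check would expect `backend == "baseline"` and `endpoint == "production(config.yaml)"`, the defaults the log call substitutes when no candidate is selected.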
Closure gate: 16 items PASS (flex-closure clause applied: only 1 candidate was measured, and the reasons the other 2 candidates failed TEI compatibility are documented).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
272 lines
9.4 KiB
Python
"""Reranker service: bge-reranker-v2-m3 integration (Phase 1.3).

Calls the TEI container, with asyncio.Semaphore(2) and a soft-timeout fallback.

Data-flow principles:
- fusion is doc-level / the reranker is chunk-level: never mix the two
- raw chunks are preserved to the end; fusion uses only the compressed form
- the reranker recovers raw chunks from the chunks_by_doc dict and is called per chunk
- diversity is applied only at the very last step, right after the reranker

Snippet generation:
- target 200~400 tokens (800~1500 characters)
- a ±target_chars/2 window centered on the query keyword position
- if no keyword matches, fall back to the first target_chars characters (avoids a quality drop)
"""

from __future__ import annotations

import asyncio
import re
from typing import TYPE_CHECKING

import httpx

from ai.client import AIClient
from core.utils import setup_logger

if TYPE_CHECKING:
    from api.search import SearchResult

logger = setup_logger("rerank")

# Limit on concurrent rerank calls (prevents GPU saturation)
RERANK_SEMAPHORE = asyncio.Semaphore(2)

# Rerank input size limits (hard caps on latency / VRAM)
MAX_RERANK_INPUT = 200
MAX_CHUNKS_PER_DOC = 2

# Soft timeout (seconds)
RERANK_TIMEOUT = 5.0

# ─── Phase 2B Diagnose dispatcher (R2-B1 slug-based) ──────────────
# Server-side allowlist map. The query parameter never accepts a raw endpoint URL.
RERANKER_BACKEND_MAP: dict[str, dict[str, str] | None] = {
    "baseline": None,  # production reranker (config.yaml endpoint via AIClient.rerank)
    "cand_gte_ml_base": {
        "endpoint": "http://rerank-cand-gte-ml-base:80/rerank",
    },
    # mxbai_large candidate (deberta-v2 → unsupported by TEI 1.7): moved to Phase 2B-Extended
    # bge_v2_gemma_2b candidate (LLM-based reranker, no 1_Pooling/config.json): moved to Phase 2B-Extended
}


def _resolve_reranker(slug: str | None) -> str | None:
    """slug → endpoint URL, or None (baseline = config.yaml via AIClient).

    Raises ValueError on an unknown slug (the caller translates it to HTTP 400).
    """
    if slug is None or slug == "baseline":
        return None
    if slug not in RERANKER_BACKEND_MAP:
        raise ValueError(f"unknown_reranker_backend: {slug!r}")
    cfg = RERANKER_BACKEND_MAP[slug]
    return cfg["endpoint"] if cfg else None


async def _rerank_via_candidate_endpoint(
    endpoint: str, query: str, texts: list[str]
) -> list[dict]:
    """Call a candidate TEI reranker endpoint (no cache).

    Returns:
        [{"index": int, "score": float}, ...] sorted by score, descending.
    Raises:
        httpx errors: the caller routes them to the timeout/fallback path.
        ValueError: the response body is not a JSON list.
    """
    async with httpx.AsyncClient(timeout=RERANK_TIMEOUT) as c:
        r = await c.post(endpoint, json={"query": query, "texts": texts})
        r.raise_for_status()
        data = r.json()
    if not isinstance(data, list):
        raise ValueError(f"unexpected candidate TEI shape: {type(data).__name__}")
    return data


def _extract_window(text: str, query: str, target_chars: int = 800) -> str:
    """Extract a ±target_chars/2 window centered on the first query keyword found in text.

    Fallback: if no keyword matches, return the first target_chars characters as-is.
    Without this fallback the reranker would score only unrelated text and quality would drop sharply.
    """
    keywords = [k for k in re.split(r"\s+", query) if len(k) >= 2]
    lowered = text.lower()  # lowercase once instead of once per keyword
    best_pos = -1
    for kw in keywords:
        pos = lowered.find(kw.lower())
        if pos >= 0:
            best_pos = pos
            break

    if best_pos < 0:
        # Fallback: first target_chars characters
        return text[:target_chars]

    half = target_chars // 2
    start = max(0, best_pos - half)
    end = min(len(text), start + target_chars)
    return text[start:end]


def _make_snippet(c: "SearchResult", query: str, max_chars: int = 1500) -> str:
    """Reranker input snippet: title + a body window centered on the query.

    Enforces item 3 of feedback_search_phase1_implementation.md:
    snippets are 200~400 tokens (800~1500 characters), never the full document.
    """
    title = c.title or ""
    text = c.snippet or ""

    # snippet holds the first 200 characters of the chunk text or of the doc text.
    # If a longer chunk text is needed, the caller fills it in before passing it here.
    if len(text) > max_chars:
        text = _extract_window(text, query, target_chars=max_chars - 100)

    return f"{title}\n\n{text}"


def _wrap_doc_as_chunk(doc: "SearchResult") -> "SearchResult":
    """Treat a text-only matched doc (one not in chunks_by_doc) as a chunk-level result.

    Used when the Phase 1.3 reranker input must include the doc itself.
    Its snippet is already the first 200 characters of documents.extracted_text
    (filled into SearchResult.snippet), so the doc is returned unchanged;
    chunk_id and the other chunk fields stay None.
    """
    return doc


async def rerank_chunks(
    query: str,
    candidates: list["SearchResult"],
    limit: int,
    *,
    reranker_backend: str | None = None,
    snapshot_doc_id_max: int | None = None,
    snapshot_chunk_id_max: int | None = None,
) -> list["SearchResult"]:
    """Rerank the RRF candidates with bge-reranker (or the Phase 2B candidate chosen by reranker_backend).

    Args:
        query: user query
        candidates: chunk-level SearchResult list (already recovered from chunks_by_doc)
        limit: number of results to return
        reranker_backend: slug from RERANKER_BACKEND_MAP; None or "baseline" uses production
        snapshot_doc_id_max / snapshot_chunk_id_max: snapshot id upper bounds, only written
            to the dispatch log here

    Returns:
        reranked SearchResult list (the score field is updated with the rerank score)

    Raises:
        ValueError: unknown reranker_backend slug (raised before the fallback handling).

    Fallback (timeout/HTTPError/unexpected error): returns candidates[:limit] in RRF order.
    """
    if not candidates:
        return []

    # Input size cap (latency/VRAM hard cap)
    if len(candidates) > MAX_RERANK_INPUT:
        logger.warning(
            f"rerank input {len(candidates)} > MAX {MAX_RERANK_INPUT}, truncating"
        )
        candidates = candidates[:MAX_RERANK_INPUT]

    snippets = [_make_snippet(c, query) for c in candidates]

    # Phase 2B dispatcher (R2-B1 + R2-B2): resolve slug → endpoint, log snapshot ids on dispatch
    cand_endpoint = _resolve_reranker(reranker_backend)
    logger.info(
        "[reranker-dispatch] backend=%s endpoint=%s snapshot_doc_id_max=%s snapshot_chunk_id_max=%s",
        reranker_backend or "baseline",
        cand_endpoint or "production(config.yaml)",
        snapshot_doc_id_max,
        snapshot_chunk_id_max,
    )

    client: AIClient | None = AIClient() if cand_endpoint is None else None

    try:
        async with asyncio.timeout(RERANK_TIMEOUT):
            async with RERANK_SEMAPHORE:
                if cand_endpoint is None:
                    results = await client.rerank(query, snippets)
                else:
                    results = await _rerank_via_candidate_endpoint(
                        cand_endpoint, query, snippets
                    )
        # results: [{"index": int, "score": float}, ...] (already sorted)
        reranked: list["SearchResult"] = []
        for r in results:
            idx = r.get("index")
            sc = r.get("score")
            if idx is None or sc is None or not 0 <= idx < len(candidates):
                continue
            chunk = candidates[idx]
            score = float(sc)
            chunk.score = score
            # Phase 3.1: keep the raw reranker score in a separate field.
            # Even if normalize_display_scores later overwrites .score with a rank-based value,
            # the original signal stays available for the fast-path decision.
            chunk.rerank_score = score
            chunk.match_reason = (chunk.match_reason or "") + "+rerank"
            reranked.append(chunk)
        return reranked[:limit]
    except (asyncio.TimeoutError, httpx.HTTPError) as e:
        logger.warning(f"rerank failed → RRF fallback: {type(e).__name__}: {e}")
        return candidates[:limit]
    except Exception as e:
        logger.warning(f"rerank unexpected error → RRF fallback: {type(e).__name__}: {e}")
        return candidates[:limit]
    finally:
        if client is not None:
            try:
                await client.close()
            except Exception:
                pass


async def warmup_reranker() -> bool:
    """Wait for TEI to finish loading the model after boot (up to 10 attempts, 3 s apart).

    TEI returns health 200 quickly, but until the first model load completes (10~30 s)
    rerank requests fail or are very slow. Call at FastAPI startup or before the first request.
    """
    client = AIClient()
    try:
        for attempt in range(10):
            try:
                await client.rerank("warmup", ["dummy text for model load"])
                logger.info(f"reranker warmup OK (attempt {attempt + 1})")
                return True
            except Exception as e:
                logger.info(f"reranker warmup retry {attempt + 1}: {e}")
                await asyncio.sleep(3)
        logger.error("reranker warmup failed after 10 attempts")
        return False
    finally:
        await client.close()


def apply_diversity(
    results: list["SearchResult"],
    max_per_doc: int = MAX_CHUNKS_PER_DOC,
    top_score_threshold: float = 0.90,
) -> list["SearchResult"]:
    """Compress chunk-level results per doc (at most max_per_doc chunks per doc).

    Conditional relaxation: if the top result's score is at or above the threshold,
    the per-doc limit is lifted (high-confidence relevance > diversity).
    """
    if not results:
        return []

    # Lift the diversity constraint when the top score reaches the threshold
    top_score = results[0].score
    if top_score >= top_score_threshold:
        return results

    seen: dict[int, int] = {}
    out: list["SearchResult"] = []
    for r in results:
        doc_id = r.id
        if seen.get(doc_id, 0) >= max_per_doc:
            continue
        out.append(r)
        seen[doc_id] = seen.get(doc_id, 0) + 1
    return out
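The failure contract of `rerank_chunks` can be exercised without a TEI container: bounded concurrency, a soft timeout that also covers waiting for the semaphore, and a fallback to the incoming RRF order. The sketch below is an illustrative reduction, not the module's code. `rerank_with_fallback`, the stub backends, and the plain string candidates are hypothetical. It uses `asyncio.wait_for` instead of `asyncio.timeout`, so it also runs on Python versions before 3.11.

```python
import asyncio
from typing import Awaitable, Callable, Dict, List, Tuple

# A backend takes (query, texts) and returns [{"index": int, "score": float}, ...].
Backend = Callable[[str, List[str]], Awaitable[List[Dict]]]


async def rerank_with_fallback(
    query: str,
    candidates: List[str],
    limit: int,
    backend: Backend,
    sem: asyncio.Semaphore,
    timeout: float = 5.0,
) -> List[str]:
    """Rerank candidates; on timeout or any error keep the incoming (RRF) order."""

    async def call() -> List[Dict]:
        async with sem:  # bound concurrent calls to the GPU-backed service
            return await backend(query, candidates)

    try:
        # The timeout also covers waiting for the semaphore, as in rerank_chunks.
        results = await asyncio.wait_for(call(), timeout)
        ranked: List[str] = []
        for r in results:  # assumed already sorted by score, descending
            idx = r.get("index")
            if isinstance(idx, int) and 0 <= idx < len(candidates):
                ranked.append(candidates[idx])
        return ranked[:limit]
    except Exception:  # asyncio.TimeoutError, transport errors, malformed payloads
        return candidates[:limit]


async def fast_backend(query: str, texts: List[str]) -> List[Dict]:
    # Stub: pretends the last text is the most relevant one.
    return [{"index": i, "score": float(i)} for i in reversed(range(len(texts)))]


async def slow_backend(query: str, texts: List[str]) -> List[Dict]:
    await asyncio.sleep(1.0)  # much longer than the timeout used below
    return []


async def demo() -> Tuple[List[str], List[str]]:
    sem = asyncio.Semaphore(2)  # created inside the running event loop
    rrf_order = ["a", "b", "c"]
    fast = await rerank_with_fallback("q", rrf_order, 2, fast_backend, sem, timeout=0.5)
    slow = await rerank_with_fallback("q", rrf_order, 2, slow_backend, sem, timeout=0.05)
    return fast, slow


fast, slow = asyncio.run(demo())
print(fast)  # ['c', 'b']  (reranked)
print(slow)  # ['a', 'b']  (timed out, RRF order kept)
```

Because the fallback returns `candidates[:limit]` unchanged, a slow or broken candidate backend degrades search to RRF ranking instead of failing the request. The same property keeps the Phase 2B diagnose runs safe to point at experimental containers.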