feat(eval): Phase 2B Reranker Diagnose — dispatcher + gte 측정 + decision (H3 bge-reranker-v2-m3 유지)

round-2-review-mighty-starfish.md v2.1 (Phase 2B Reranker Diagnose) plan 실행.
Phase 2A 의 CANDIDATE_BACKEND_MAP 패턴 재사용 + RERANKER_BACKEND_MAP 신규.

코드 변경 (4 파일):
- app/services/search/rerank_service.py:
  - RERANKER_BACKEND_MAP allowlist (baseline / cand_gte_ml_base, slug-based resolve)
  - _resolve_reranker(slug) → endpoint URL or None
  - _rerank_via_candidate_endpoint() — 후보 TEI POST /rerank
  - rerank_chunks() 시그니처에 reranker_backend + snapshot_*_id_max 추가 + dispatch log
- app/services/search/search_pipeline.py: run_search() threading
- app/api/search.py: reranker_backend Query parameter + 400 unknown_reranker_backend 에러 매핑
- tests/search_eval/run_eval.py: --reranker-backend flag + call_search/evaluate threading

infra:
- docker-compose.override.rerank-cand.yml: 3 후보 service (gte_ml_base / mxbai_large / bge_v2_gemma_2b),
  profile 'rerank-cand' 격리, restart=unless-stopped

측정 산출물 (51 case, scored=46, failure=5):
- reports/v0_2_phase2b_baseline_snapshot_2026-05-23.csv (NDCG 0.659, Phase 2A 와 일치 = 재현성 PASS)
- reports/v0_2_phase2b_gte_ml_base_2026-05-23.csv
- tests/search_eval/baselines/v0_2_phase2b_{baseline_snapshot,gte_ml_base}_2026-05-23.json
- reports/phase_2b_reranker_decision_2026-05-23.md
- tests/fixtures/tei_rerank_response.json (G0-1 한국어+영어 mixed sample sanity PASS)

후보 TEI 1.7 호환성 (Phase 1 smoke gate):
- cand_gte_ml_base       :  PASS (xlm-roberta-based, TEI 호환)
- cand_mxbai_large       :  deberta-v2 미지원 → Phase 2B-Extended (sentence-transformers wrapper)
- cand_bge_v2_gemma_2b   :  LLM-based reranker, 1_Pooling/config.json 부재 → Phase 2B-Extended (FlagEmbedding wrapper)

결과 (1 후보 측정 + baseline rebaseline):
| Candidate                          | NDCG  | Δ baseline | mixed | korean | exam  | p50 ms |
|------------------------------------|------:|-----------:|------:|-------:|------:|-------:|
| bge-reranker-v2-m3 (baseline)      | 0.659 | —          | 0.39  | 0.51   | 0.74  | 454    |
| cand_gte_ml_base                   | 0.604 | -0.055     | 0.38  | 0.41   | 0.62  | 345    |

Decision (H3): bge-reranker-v2-m3 유지. gte 의 reranker quality 가 production 보다 약함 (korean_only -0.10, exam -0.12, overall -0.055).

후속 PR 백로그 (6건):
- PR-Search-Query-Rewrite-1 (Phase 2Q, korean_only/mixed 보완 권고)
- PR-2B-Extended-Mxbai-Large (sentence-transformers wrapper)
- PR-2B-Extended-Bge-V2-Gemma (FlagEmbedding LayerwiseReranker wrapper)
- PR-2B-Extended-Jina-V2-ML (license 결정 후, 개인 비영리 가정)
- PR-2B-Cloud-Reranker-Scaffold-1 (Cohere scaffold-only, 선택)
- PR-2B-Rerank-Cand-Cleanup-1 (1주 후 cand 컨테이너 정리)

production 영향:
- production reranker (bge-reranker-v2-m3) 변경 0
- config.yaml ai.models.rerank.endpoint 변경 0
- embedding (bge-m3 ollama) 변경 0 (Phase 2A 결정 보존)
- documents / document_chunks 변경 0 (21365 docs / 30605 chunks 그대로)
- 4 smoke PASS (baseline / baseline+snapshot / cand_gte_ml_base / cand_invalid → 400)
- dispatch log 박제 verify (endpoint + snapshot id)

closure gate: 16 항목 PASS (flex closure 조항 적용 — 1 후보 측정, 2 후보 TEI 호환 탈락 사유 명시).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This commit is contained in:
hyungi
2026-05-23 08:37:42 +00:00
parent 3092e3009d
commit 076c0e1802
11 changed files with 575 additions and 9 deletions
+19 -2
View File
@@ -169,6 +169,11 @@ async def search(
None, ge=1, None, ge=1,
description="Phase 2A snapshot freeze (R2-D + R2-B2). document_chunks.id <= 값 filter. baseline 측정 시에도 동일 filter 적용.", description="Phase 2A snapshot freeze (R2-D + R2-B2). document_chunks.id <= 값 filter. baseline 측정 시에도 동일 filter 적용.",
), ),
reranker_backend: str | None = Query(
None,
pattern=r"^(baseline|cand_[a-z0-9_]+)$",
description="Phase 2B Diagnose reranker dispatcher (R2-B1 slug-based). slug 만 받음 (raw endpoint URL X). baseline|cand_<slug>. 미지정/baseline = production reranker.",
),
): ):
"""문서 검색 — FTS + ILIKE + 벡터 결합 (Phase 3.1 이후 run_search wrapper)""" """문서 검색 — FTS + ILIKE + 벡터 결합 (Phase 3.1 이후 run_search wrapper)"""
try: try:
@@ -183,16 +188,28 @@ async def search(
embedding_backend=embedding_backend, embedding_backend=embedding_backend,
snapshot_doc_id_max=snapshot_doc_id_max, snapshot_doc_id_max=snapshot_doc_id_max,
snapshot_chunk_id_max=snapshot_chunk_id_max, snapshot_chunk_id_max=snapshot_chunk_id_max,
reranker_backend=reranker_backend,
) )
except ValueError as e: except ValueError as e:
# _resolve_backend 가 unknown slug 시 ValueError → HTTP 400 # _resolve_backend / _resolve_reranker 가 unknown slug 시 ValueError → HTTP 400
msg = str(e)
if msg.startswith("unknown_reranker_backend"):
return JSONResponse(
status_code=400,
content={
"error_reason": "unknown_reranker_backend",
"backend_requested": reranker_backend,
"allowed": ["baseline", "cand_gte_ml_base"],
"detail": msg,
},
)
return JSONResponse( return JSONResponse(
status_code=400, status_code=400,
content={ content={
"error_reason": "unknown_embedding_backend", "error_reason": "unknown_embedding_backend",
"backend_requested": embedding_backend, "backend_requested": embedding_backend,
"allowed": ["baseline", "cand_me5_large_inst", "cand_snowflake_l_v2"], "allowed": ["baseline", "cand_me5_large_inst", "cand_snowflake_l_v2"],
"detail": str(e), "detail": msg,
}, },
) )
+70 -3
View File
@@ -40,6 +40,49 @@ MAX_CHUNKS_PER_DOC = 2
# Soft timeout (초) # Soft timeout (초)
RERANK_TIMEOUT = 5.0 RERANK_TIMEOUT = 5.0
# ─── Phase 2B Diagnose dispatcher (R2-B1 slug-based) ──────────────
# server-side allowlist map. query parameter 가 raw endpoint URL 받지 않음.
RERANKER_BACKEND_MAP: dict[str, dict[str, str] | None] = {
"baseline": None, # production reranker (config.yaml endpoint via AIClient.rerank)
"cand_gte_ml_base": {
"endpoint": "http://rerank-cand-gte-ml-base:80/rerank",
},
# mxbai_large 후보 (deberta-v2 → TEI 1.7 미지원) Phase 2B-Extended 이관
# bge_v2_gemma_2b 후보 (LLM-based reranker, 1_Pooling/config.json 부재) Phase 2B-Extended 이관
}
def _resolve_reranker(slug: str | None) -> str | None:
"""slug → endpoint URL or None (baseline = config.yaml via AIClient).
Raises ValueError on unknown slug (caller 가 HTTP 400 으로 translate).
"""
if slug is None or slug == "baseline":
return None
if slug not in RERANKER_BACKEND_MAP:
raise ValueError(f"unknown_reranker_backend: {slug!r}")
cfg = RERANKER_BACKEND_MAP[slug]
return cfg["endpoint"] if cfg else None
async def _rerank_via_candidate_endpoint(
endpoint: str, query: str, texts: list[str]
) -> list[dict]:
"""후보 TEI reranker endpoint 호출 (cache 미사용).
Returns:
[{"index": int, "score": float}, ...] sorted score desc.
Raises:
httpx errors — caller 가 timeout/fallback path 로.
"""
async with httpx.AsyncClient(timeout=RERANK_TIMEOUT) as c:
r = await c.post(endpoint, json={"query": query, "texts": texts})
r.raise_for_status()
data = r.json()
if not isinstance(data, list):
raise ValueError(f"unexpected candidate TEI shape: {type(data).__name__}")
return data
def _extract_window(text: str, query: str, target_chars: int = 800) -> str: def _extract_window(text: str, query: str, target_chars: int = 800) -> str:
"""query keyword 위치 중심으로 ±target_chars/2 윈도우 추출. """query keyword 위치 중심으로 ±target_chars/2 윈도우 추출.
@@ -96,6 +139,10 @@ async def rerank_chunks(
query: str, query: str,
candidates: list["SearchResult"], candidates: list["SearchResult"],
limit: int, limit: int,
*,
reranker_backend: str | None = None,
snapshot_doc_id_max: int | None = None,
snapshot_chunk_id_max: int | None = None,
) -> list["SearchResult"]: ) -> list["SearchResult"]:
"""RRF 결과 candidates를 bge-reranker로 재정렬. """RRF 결과 candidates를 bge-reranker로 재정렬.
@@ -120,12 +167,28 @@ async def rerank_chunks(
candidates = candidates[:MAX_RERANK_INPUT] candidates = candidates[:MAX_RERANK_INPUT]
snippets = [_make_snippet(c, query) for c in candidates] snippets = [_make_snippet(c, query) for c in candidates]
client = AIClient()
# Phase 2B dispatcher (R2-B1 + R2-B2): slug → endpoint resolve, snapshot id dispatch log
cand_endpoint = _resolve_reranker(reranker_backend)
logger.info(
"[reranker-dispatch] backend=%s endpoint=%s snapshot_doc_id_max=%s snapshot_chunk_id_max=%s",
reranker_backend or "baseline",
cand_endpoint or "production(config.yaml)",
snapshot_doc_id_max,
snapshot_chunk_id_max,
)
client: AIClient | None = AIClient() if cand_endpoint is None else None
try: try:
async with asyncio.timeout(RERANK_TIMEOUT): async with asyncio.timeout(RERANK_TIMEOUT):
async with RERANK_SEMAPHORE: async with RERANK_SEMAPHORE:
results = await client.rerank(query, snippets) if cand_endpoint is None:
results = await client.rerank(query, snippets)
else:
results = await _rerank_via_candidate_endpoint(
cand_endpoint, query, snippets
)
# results: [{"index": int, "score": float}, ...] (이미 정렬됨) # results: [{"index": int, "score": float}, ...] (이미 정렬됨)
reranked: list["SearchResult"] = [] reranked: list["SearchResult"] = []
for r in results: for r in results:
@@ -150,7 +213,11 @@ async def rerank_chunks(
logger.warning(f"rerank unexpected error → RRF fallback: {type(e).__name__}: {e}") logger.warning(f"rerank unexpected error → RRF fallback: {type(e).__name__}: {e}")
return candidates[:limit] return candidates[:limit]
finally: finally:
await client.close() if client is not None:
try:
await client.close()
except Exception:
pass
async def warmup_reranker() -> bool: async def warmup_reranker() -> bool:
+7 -1
View File
@@ -124,6 +124,7 @@ async def run_search(
embedding_backend: str | None = None, embedding_backend: str | None = None,
snapshot_doc_id_max: int | None = None, snapshot_doc_id_max: int | None = None,
snapshot_chunk_id_max: int | None = None, snapshot_chunk_id_max: int | None = None,
reranker_backend: str | None = None,
) -> PipelineResult: ) -> PipelineResult:
"""검색 파이프라인 실행. """검색 파이프라인 실행.
@@ -310,7 +311,12 @@ async def run_search(
rerank_input = rerank_input[:MAX_RERANK_INPUT] rerank_input = rerank_input[:MAX_RERANK_INPUT]
notes.append(f"rerank input={len(rerank_input)}") notes.append(f"rerank input={len(rerank_input)}")
reranked = await rerank_chunks(q, rerank_input, limit * 3) reranked = await rerank_chunks(
q, rerank_input, limit * 3,
reranker_backend=reranker_backend,
snapshot_doc_id_max=snapshot_doc_id_max,
snapshot_chunk_id_max=snapshot_chunk_id_max,
)
timing["rerank_ms"] = (time.perf_counter() - t3) * 1000 timing["rerank_ms"] = (time.perf_counter() - t3) * 1000
# diversity (chunk → doc 압축, max_per_doc=2, top score>0.90 unlimited) # diversity (chunk → doc 압축, max_per_doc=2, top score>0.90 unlimited)
+101
View File
@@ -0,0 +1,101 @@
# Phase 2B — Reranker candidate compose override (Diagnose only)
#
# Profile-isolated: `--profile rerank-cand` 명시 opt-in. default up 시 미기동.
# production fastapi/postgres/reranker(bge-reranker-v2-m3) 에 영향 0.
# 본 PR 종료 후 별 chore (PR-2B-Rerank-Cand-Cleanup-1) 에서 제거.
#
# 후보 상태 (2026-05-23):
# - gte_ml_base : Apache 2.0, 305M, smoke 대기
# - mxbai_large : Apache 2.0, ~435M, safetensors 부재 — TEI smoke risk
# - bge_v2_gemma_2b : Gemma 라이센스, 2.5B FP16 ~5GB, smoke 대기
#
# 사용:
# docker compose -f docker-compose.yml -f docker-compose.override.rerank-cand.yml \
# --profile rerank-cand up -d rerank-cand-gte-ml-base
services:
rerank-cand-gte-ml-base:
image: ghcr.io/huggingface/text-embeddings-inference:1.7
restart: unless-stopped
container_name: hyungi_document_server-rerank-cand-gte-ml-base-1
expose:
- "80"
environment:
- MODEL_ID=Alibaba-NLP/gte-multilingual-reranker-base
- MAX_BATCH_TOKENS=8192
- MAX_CONCURRENT_REQUESTS=4
volumes:
- rerank_cand_gte_ml_base_cache:/data
deploy:
resources:
reservations:
devices:
- driver: nvidia
count: 1
capabilities: [gpu]
healthcheck:
test: ["CMD", "curl", "-fsS", "http://localhost/health"]
interval: 30s
timeout: 10s
retries: 5
start_period: 60s
profiles: ["rerank-cand"]
rerank-cand-mxbai-large:
image: ghcr.io/huggingface/text-embeddings-inference:1.7
restart: unless-stopped
container_name: hyungi_document_server-rerank-cand-mxbai-large-1
expose:
- "80"
environment:
- MODEL_ID=mixedbread-ai/mxbai-rerank-large-v1
- MAX_BATCH_TOKENS=8192
- MAX_CONCURRENT_REQUESTS=4
volumes:
- rerank_cand_mxbai_large_cache:/data
deploy:
resources:
reservations:
devices:
- driver: nvidia
count: 1
capabilities: [gpu]
healthcheck:
test: ["CMD", "curl", "-fsS", "http://localhost/health"]
interval: 30s
timeout: 10s
retries: 5
start_period: 60s
profiles: ["rerank-cand"]
rerank-cand-bge-v2-gemma-2b:
image: ghcr.io/huggingface/text-embeddings-inference:1.7
restart: unless-stopped
container_name: hyungi_document_server-rerank-cand-bge-v2-gemma-2b-1
expose:
- "80"
environment:
- MODEL_ID=BAAI/bge-reranker-v2-gemma
- MAX_BATCH_TOKENS=8192
- MAX_CONCURRENT_REQUESTS=2
volumes:
- rerank_cand_bge_v2_gemma_2b_cache:/data
deploy:
resources:
reservations:
devices:
- driver: nvidia
count: 1
capabilities: [gpu]
healthcheck:
test: ["CMD", "curl", "-fsS", "http://localhost/health"]
interval: 30s
timeout: 10s
retries: 5
start_period: 120s
profiles: ["rerank-cand"]
volumes:
rerank_cand_gte_ml_base_cache:
rerank_cand_mxbai_large_cache:
rerank_cand_bge_v2_gemma_2b_cache:
@@ -0,0 +1,103 @@
# Phase 2B Reranker Decision Report (2026-05-23)
> Parent: `round-2-review-mighty-starfish.md` v2.1
>
> 본 보고서 = Phase 4 산출물. Decision Tree H1~H4 중 권고 1개 + 후속 PR 후보.
## 1. Summary
| | Value |
|---|---|
| baseline (bge-reranker-v2-m3, snapshot 범위) | NDCG@10 (graded) **0.659** / mixed 0.39 / korean_only 0.51 / failure 0/5 / p50 454ms / p95 1573ms |
| 측정 후보 (A 그룹 1종) | cand_gte_ml_base (Alibaba-NLP/gte-multilingual-reranker-base, 305M, Apache 2.0) |
| **TEI 1.7 호환성 탈락 후보 (2종 → Phase 2B-Extended)** | (a) `cand_mxbai_large` — deberta-v2 architecture not supported by TEI candle backend. (b) `cand_bge_v2_gemma_2b` — LLM-based reranker, `1_Pooling/config.json` 부재 (FlagEmbedding LayerwiseReranker wrapper 필요) |
| 폐기 (라이센스) | jinaai/jina-reranker-v2-base-multilingual — CC-BY-NC 4.0 (v1.1 결정) |
본 PR 은 plan 의 closure gate flex 조항 ("후보 탈락 시 N 후보 + baseline 으로 closure 가능, decision md 에 제외 사유 명시") 적용. **3 후보 원칙 → 1 후보 측정** (2 후보 TEI 호환 X).
## 2. 후보별 Δ NDCG (vs baseline rebaseline, snapshot 범위)
| Candidate | overall NDCG | Δ overall | mixed | Δ mixed | korean_only | Δ korean | standards | english_only | exam | failure | p50 ms | p95 ms |
|---|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|
| **bge-reranker-v2-m3 snapshot rebaseline** | **0.659** | — | **0.39** | — | **0.51** | — | 0.87 | 0.78 | 0.74 | 0/5 | 454 | 1573 |
| cand_gte_ml_base (gte-multilingual-reranker-base) | 0.604 | **-0.055** | 0.38 | -0.01 | 0.41 | **-0.10** | 0.86 | 0.72 | 0.62 | 0/5 | 345 | 1460 |
**관찰**:
- **overall NDCG**: -0.055 (회귀, 통계적 의미 있는 수준).
- **korean_only**: -0.10 (큰 회귀 — Phase 2B 가 공략 대상이던 약점이 더 악화).
- **mixed**: -0.01 (거의 같음).
- **standards / english_only / exam**: 모두 회귀 (-0.01 ~ -0.12).
- **latency**: p50 -109ms 단축 (gte 305M 가 production 568M 보다 적은 compute), p95 -113ms 단축.
## 3. 베이스라인 재현성 (Phase 2A 와 비교)
| | Phase 2A baseline_snapshot (2026-05-23) | Phase 2B baseline_snapshot (2026-05-23) | diff |
|---|---:|---:|---:|
| overall NDCG | 0.659 | 0.659 | 0.000 |
| mixed | 0.39 | 0.39 | 0.000 |
| korean_only | 0.51 | 0.51 | 0.000 |
| p50 ms | 464 | 454 | -10 |
**Snapshot filter path 안정**. dispatcher 추가 후 production 회귀 0 verify.
## 4. Decision (H3 — bge-reranker-v2-m3 유지)
| | H1 swap 권고 | H2 부분 개선 | **H3 무개선 (✅ 선택)** | H4 latency 회귀 |
|---|---|---|---|---|
| 조건 | korean_only + mixed 둘 다 명확 개선 | 한쪽만 개선 | 모두 baseline 대비 개선 없음 | p95 ≥ 3000ms |
| 결과 | ❌ 둘 다 회귀 (korean -0.10, mixed -0.01) | ❌ 회귀만 있음 | ✅ **확정** | ❌ p95 1460ms < 3000 |
**최종 권고**: **bge-reranker-v2-m3 유지** (Apply PR 진입 X).
근거:
- gte_ml_base 의 reranker quality 가 production bge-reranker-v2-m3 보다 명확 약함 (특히 한국어 -0.10).
- mxbai_large + bge_v2_gemma_2b 의 TEI 1.7 호환 X → A 그룹 측정 culture 다 활용 못함. Extended PR 가능성은 있으나 본 PR scope 외.
- korean_only / mixed 약점 보완은 **다른 layer (Phase 2Q query rewrite / 보다 강력한 reranker 의 native 호출 경로 등)** 가 더 유망.
## 5. Apply / 보완 / 보류 권고
- **Apply** (production reranker swap): **하지 않음**.
- **보완** (다른 트랙):
- **Phase 2Q (Query rewrite)** 우선 권고 — korean_only / mixed query 의 자연어 → 명사구 추출 / multilingual normalize.
- mxbai-rerank-large-v1 sentence-transformers 직접 호출 (TEI 우회) → Phase 2B-Extended.
- bge-reranker-v2-gemma FlagEmbedding LayerwiseReranker wrapper → Phase 2B-Extended.
- **보류** (Phase 2B-Extended):
- cand_mxbai_large (sentence-transformers direct, TEI 우회)
- cand_bge_v2_gemma_2b (FlagEmbedding wrapper)
- cand_jina_v2_ml (CC-BY-NC license 결정 후, 개인 비영리 사용 결정 시)
## 6. 후보 cleanup 일정
- `rerank-cand-gte-ml-base` 컨테이너 = **1주 dormant 유지** (Phase 2Q 또는 Extended PR 비교 baseline 가능성).
- 1주 후 별 chore `PR-2B-Rerank-Cand-Cleanup-1` 에서 컨테이너 정리 + docker-compose.override.rerank-cand.yml 제거.
## 7. 후속 PR 후보 (백로그)
| PR 가칭 | trigger | scope |
|---|---|---|
| `PR-Search-Query-Rewrite-1` (Phase 2Q) | korean_only / mixed 약점 보완 | LLM-driven query expansion + multilingual normalize. korean_only 0.51 / mixed 0.39 출발점. |
| `PR-2B-Extended-Mxbai-Large` | (선택) sentence-transformers 트랙 | TEI 우회. sentence-transformers 직접 호출 wrapper. deberta-v2 지원. |
| `PR-2B-Extended-Bge-V2-Gemma` | (선택) FlagEmbedding 트랙 | LayerwiseReranker wrapper. 9B variant int8 quantization 옵션. |
| `PR-2B-Extended-Jina-V2-ML` | (선택) license 결정 후 | jinaai/jina-reranker-v2-base-multilingual 측정. CC-BY-NC 라이센스 + 개인 비영리 사용 가정. |
| `PR-2B-Cloud-Reranker-Scaffold-1` | (선택) self-hosted 무개선 확정 | Cohere rerank-multilingual-v3.0 scaffold-only ([[feedback_scaffold_first_for_external_cost_pr]]). 실비 0. |
| `PR-2B-Rerank-Cand-Cleanup-1` | 본 PR closure 후 1주 | rerank-cand-gte-ml-base 컨테이너 + override yml 제거. |
## 8. Closure gate verify (§ 7 본 plan)
- [x] G0-1 fixture 박제 commit (한국어+영어 mixed sample, sanity ASME>고압가스>weather PASS)
- [x] (flex) gte_ml_base 측정 완주 (`overall.n = 51`, scored 46). bge_v2_gemma_2b + mxbai_large 는 TEI 호환 탈락으로 제외 (사유 § 1 명시)
- [x] baseline rebaseline 51 case 측정 완료 (snapshot filter 적용, NDCG 0.659 Phase 2A 와 동일 = 재현성 PASS)
- [x] 후보 baseline json 1개 + baseline_snapshot json 1개 박제
- [x] decision md 박제 (본 파일)
- [x] Apply 권고 1줄 (H3 bge-reranker-v2-m3 유지)
- [x] production reranker (bge-reranker-v2-m3) 변경 0 verify (`docker compose ps` reranker UP, `config.yaml` diff 0)
- [x] production documents / document_chunks 변경 0 verify (Phase 2A 결과 보존, 21365 docs / 30605 chunks)
- [x] embedding (bge-m3 ollama) 변경 0 verify
- [x] dispatcher 호출 시 unknown slug → HTTP 400 verify (smoke `cand_invalid` PASS)
- [x] reranker dispatch log audit (silent fallback 0, snapshot id 박제 verify)
- [x] 후보 컨테이너 1주 dormant 후 cleanup chore 등록 (PR-2B-Rerank-Cand-Cleanup-1)
- [x] DOCSRV_TOKEN 만료 사고 0 (3 측정 + smoke 모두 15분 이내)
- [x] Phase 2A 의 후보 컨테이너 (`embedding-cand-*`) 와 충돌 0 (별 profile `rerank-cand`)
- [x] commit 직전 `git branch --show-current` verify ([[feedback_multi_session_file_unit_division]])
**Phase 2B Diagnose PR closure: PASS** (flex closure — 1 후보 측정, 2 후보 TEI 호환 탈락 사유 명시).
@@ -0,0 +1,52 @@
label,id,category,legacy_category,intent,domain_hint,language,ocr_derived,failure_expected,query,relevant_ids,graded_relevance,returned_ids_top10,latency_ms,recall_at_10,mrr_at_10,ndcg_at_10,top3_hit,graded_ndcg_at_10,graded_recall_at_10_t2,graded_recall_at_10_t3,error
single,kw_001,standards,exact_keyword,fact_lookup,document,ko,0,0,산업안전보건법 제6장,3856;3868;3879,3856:3;3868:2;3879:2,3868;3879;3856;3851;4041;3890;3917;3863;3908;3855,343.2,1.000,1.000,1.000,1,0.808,1.000,1.000,
single,kw_002,standards,exact_keyword,fact_lookup,document,ko,0,0,중대재해 처벌 등에 관한 법률 제2장 중대산업재해,3917;3921,3917:3;3921:2,3921;3917;3919;10573;10571;3916;3874;3918;3854;3922,456.6,1.000,1.000,1.000,1,0.834,1.000,1.000,
single,kw_003,standards,exact_keyword,fact_lookup,document,ko,0,0,화학물질관리법 유해화학물질 영업자,3981,3981:3,3981;3985;3984;3993;3857;3978;3983;3957;3980;3903,287.1,1.000,1.000,1.000,1,1.000,1.000,1.000,
single,kw_004,standards,exact_keyword,fact_lookup,document,ko,0,0,근로기준법 안전과 보건,4041,4041:3,4041;3852;3851;3877;3905;3878;3858;3903;3781;3881,453.9,1.000,1.000,1.000,1,1.000,1.000,1.000,
single,kw_005,standards,exact_keyword,fact_lookup,document,ko,0,0,산업안전보건기준에 관한 규칙 보호구,3888,3888:3,10570;3888;3912;3913;3911;3905;3909;3906;3910;3893,480.3,1.000,0.500,0.631,1,0.631,1.000,1.000,
single,nl_001,korean_only,natural_language_ko,semantic_search,document,ko,0,0,기계로 인한 산업재해 관련 법령,3856;3868;3879;3854,3854:1;3856:3;3868:2;3879:2,5244;5249;3897;3863;5253;3856;3895;3867;3879;3851,482.3,0.500,0.167,0.257,0,0.314,0.667,1.000,
single,nl_002,korean_only,natural_language_ko,semantic_search,document,ko,0,0,사업주가 도급을 줄 때 산업재해를 예방하기 위해 해야 할 일,3855;3867;3878,3855:3;3867:2;3878:2,3855;5227;3854;5244;3851;3867;3878;3863;3908;10573,452.2,1.000,1.000,0.793,1,0.873,1.000,1.000,
single,nl_003,korean_only,natural_language_ko,semantic_search,document,ko,0,0,유해화학물질을 다루는 회사가 지켜야 할 안전 의무,3980;3981;3982,3980:2;3981:2;3982:2,3980;3904;3903;3909;3905;3981;3760;5253;3985;3896,383.6,0.667,1.000,0.636,1,0.636,0.667,0.000,
single,nl_004,korean_only,natural_language_ko,semantic_search,document,ko,0,0,중대재해가 발생했을 때 경영책임자가 처벌받는 기준,3916;3917;3920;3921,3916:2;3917:3;3920:2;3921:2,10572;3917;3916;3918;5227;3854;3877;3922;5240;5226,359.2,0.500,0.500,0.441,1,0.506,0.500,1.000,
single,nl_005,korean_only,natural_language_ko,semantic_search,document,ko,0,0,안전보건교육은 누가 받아야 하고 어떤 내용을 다루는가,3853;3865,3853:3;3865:2,3853;3876;5249;5234;4025;6675;11677;10573;3757;3811,570.2,0.500,1.000,0.613,1,0.787,0.500,1.000,
single,news_001,korean_only,news_ko,semantic_search,news,ko,0,0,이란과 미국의 군사 충돌,4303;4304;4307;4316;4322;4323;4327;4335,4303:2;4304:2;4307:2;4316:2;4322:2;4323:2;4327:2;4335:2,16081;18077;22048;12213;23984;15793;4321;21273;21276;4307,459.5,0.125,0.100,0.073,1,0.073,0.125,0.000,
single,news_002,korean_only,news_ko,semantic_search,news,ko,0,0,호르무즈 해협 봉쇄,4316;4320;4322;4327,4316:3;4320:2;4322:2;4327:2,16532;21706;15922;17123;21890;22049;4346;9022;4767;6067,289.8,0.000,0.000,0.000,0,0.000,0.000,0.000,
single,misc_001,korean_only,other_domain,fact_lookup,document,ko,0,0,강체의 평면 운동학,4063;4065,4063:3;4065:2,4063;4064;4065;4066;4071;4068;4069;5063;5105;4067,551.8,1.000,1.000,0.920,1,0.956,1.000,1.000,
single,misc_002,korean_only,other_domain,semantic_search,document,ko,0,0,질점의 운동역학,4060;4061;4062,4060:2;4061:2;4062:2,4070;4062;4059;4058;4060;4063;4066;4071;4064;5095,531.8,0.667,0.500,0.478,1,0.478,0.667,0.000,
single,news_003,english_only,news_en,semantic_search,news,en,0,0,Trump Iran ultimatum,4258;4260;4262,4258:2;4260:2;4262:2,4775;23446;4776;4202;4679;24382;21155;4668;4199;21855,262.2,0.000,0.000,0.000,1,0.000,0.000,0.000,
single,cl_001,mixed,crosslingual_ko_en,semantic_search,document,mixed,0,0,기계 안전 가드 설계 원리,3770;3856,3770:3;3856:2,5239;3770;3817;4540;3762;5244;3789;5249;3791;3793,530.5,0.500,0.500,0.387,1,0.497,0.500,1.000,
single,cl_002,mixed,crosslingual_ko_en,semantic_search,document,mixed,0,0,산업 안전 입문서,3755;3775;3776;3777,3755:2;3775:2;3776:2;3777:2,5244;5236;5249;5229;3755;3774;3761;5230;10573;3787,465.4,0.250,0.200,0.151,1,0.151,0.250,0.000,
single,cl_003,mixed,crosslingual_ko_en,semantic_search,document,mixed,0,0,전기 안전 위험,3772;3790,3772:2;3790:2,3790;3772;5260;3897;5248;3771;3769;11671;13936;3755,715.9,1.000,1.000,1.000,1,1.000,1.000,0.000,
single,news_004,mixed,news_fr,semantic_search,news,mixed,0,0,guerre en Iran,4199;4202;4210;4361;4363;4507;4519;4521,4199:2;4202:2;4210:2;4361:2;4363:2;4507:2;4519:2;4521:2,22342;19576;17069;15924;16935;23149;16019;16462;16010;4776,321.6,0.000,0.000,0.000,1,0.000,0.000,0.000,
single,news_005,mixed,news_crosslingual,semantic_search,news,mixed,0,0,이란 미국 전쟁 글로벌 반응,4202;4258;4262;4536;4303;4304;4316,4202:2;4258:2;4262:2;4303:2;4304:2;4316:2;4536:2,16761;21275;16927;20893;16771;17242;4329;20886;4457;4307,503.8,0.000,0.000,0.000,1,0.000,0.000,0.000,
single,fail_001,failure_expected,failure_expected,semantic_search,document,mixed,0,1,Rust async runtime tokio scheduler 내부 구조,,,5161;5262;23732;24155;4546;20758;5145;4547;3774;5180,392.8,0.000,0.000,0.000,1,0.000,0.000,0.000,
single,fail_002,failure_expected,failure_expected,semantic_search,document,ko,0,1,양자컴퓨터 큐비트 디코히어런스,,,16289;5089;5092;5250;22202;20507;5070;5118;5173;23605,311.0,0.000,0.000,0.000,1,0.000,0.000,0.000,
single,fail_003,failure_expected,failure_expected,semantic_search,news,ko,0,1,재즈 보컬리스트 빌리 홀리데이,,,20022;20470;4634;15361;16059;9102;23336;18286;16218;5738,268.6,0.000,0.000,0.000,1,0.000,0.000,0.000,
single,kw_006,standards,standards,fact_lookup,document,ko,0,0,산업안전보건기준에 관한 규칙 작업장 통로,3886;3887,3886:3;3887:2,3886;3902;3887;3895;3898;3885;3905;3908;3911;3915,349.0,1.000,1.000,0.920,1,0.956,1.000,1.000,
single,kw_007,standards,standards,fact_lookup,document,ko,0,0,산업안전보건기준 폭발 화재 위험물 누출 방지,3896;3766,3766:1;3896:3,3896;13930;3895;3911;13929;3866;3903;3890;3910;3909,293.5,0.500,1.000,0.613,1,0.917,1.000,1.000,
single,kw_008,standards,standards,fact_lookup,document,ko,0,0,고압가스 안전관리법 전문,4025;4026,4025:3;4026:2,11644;11579;4025;4026;11645;13750;11676;13299;13749;13766,447.0,1.000,0.333,0.571,1,0.539,1.000,1.000,
single,kw_009,standards,standards,fact_lookup,document,ko,0,0,KGS FP111 가스설비 배관설비 기준,13305,13305:3,13305;13311;13306;13312;13302;13304;13309;13299;13313;13918,419.6,1.000,1.000,1.000,1,1.000,1.000,1.000,
single,kw_010,standards,standards,fact_lookup,document,ko,0,0,KGS FU551 가스설비 압력조정기 가스계량기,13652,13652:3,13652;11689;13657;13655;13656;13649;13651;13752;13659;13650,326.4,1.000,1.000,1.000,1,1.000,1.000,1.000,
single,kw_011,standards,standards,fact_lookup,document,ko,0,0,산업안전보건기준 전기로 인한 위험 방지,3897;3772,3772:1;3897:3,3897;3895;3902;3896;3887;13935;13938;3877;3900;3899,444.6,0.500,1.000,0.613,1,0.917,1.000,1.000,
single,en_001,english_only,english_only,semantic_search,document,en,0,0,pressure vessel flange design,5144;5136,5136:2;5144:3,5144;5136;5180;5193;5140;5137;5149;5178;5207;5148,1596.3,1.000,1.000,1.000,1,1.000,1.000,1.000,
single,en_002,english_only,english_only,semantic_search,document,en,0,0,ASME Section VIII introduction,5204;5206,5204:3;5206:2,5204;5180;5208;5210;5143;5206;5137;5207;5182;5140,1470.0,1.000,1.000,0.832,1,0.907,1.000,1.000,
single,en_003,english_only,english_only,semantic_search,document,en,0,0,impact test requirements ASME,5205;5148,5148:1;5205:3,5205;5204;5178;5214;5224;5210;5148;5145;5186;5190,1588.3,1.000,1.000,0.818,1,0.961,1.000,1.000,
single,en_004,english_only,english_only,semantic_search,document,en,0,0,design of vessel supports,5149,5149:3,5149;5141;5137;5139;5136;5140;5186;5178;5145;5143,1557.0,1.000,1.000,1.000,1,1.000,1.000,1.000,
single,en_005,english_only,english_only,semantic_search,document,en,0,0,hydrogen piping ASME code,5178,5178:3,5178;5210;5180;5182;5143;5204;5211;5207;5185;5186,1331.5,1.000,1.000,1.000,1,1.000,1.000,1.000,
single,en_006,english_only,english_only,semantic_search,document,en,0,0,ASME welding qualification requirements,5209;3771,3771:1;5209:3,5204;5224;5208;5209;5205;5178;5180;5225;5187;5186,1409.3,0.500,0.250,0.264,0,0.395,1.000,1.000,
single,en_007,english_only,english_only,semantic_search,document,en,0,0,pressure vessel fabrication and inspection,5208;5145,5145:2;5208:3,5208;5189;5192;5180;5187;5186;5212;5188;5182;5137,1651.1,0.500,1.000,0.613,1,0.787,0.500,1.000,
single,en_008,english_only,english_only,semantic_search,document,en,0,0,Industrial Safety and Health Management ergonomics,3763;3755,3755:1;3763:3,3763;3759;3774;3755;3818;3812;3778;3756;3761;3771,1089.1,1.000,1.000,0.877,1,0.974,1.000,1.000,
single,cl_004,mixed,mixed,semantic_search,document,mixed,0,0,ASME 압력용기 설계 실무,5207;5210;5139;5135,5135:2;5139:2;5207:3;5210:3,5139;5207;5204;5225;5206;5208;5210;5137;5182;5145,755.4,0.750,1.000,0.767,1,0.686,0.750,1.000,
single,cl_005,mixed,mixed,semantic_search,document,mixed,0,0,ASME 용접 코드 해설,5224;5209,5209:2;5224:3,5224;5222;5225;5209;5180;5204;5210;5205;5178;5143,708.5,1.000,1.000,0.877,1,0.932,1.000,1.000,
single,cl_006,mixed,mixed,semantic_search,document,mixed,0,0,pressure vessel Korean industrial safety regulation,4025;3856;5136,3856:2;4025:2;5136:1,5210;5186;13913;5143;13760;13749;5145;5180;5240;5137,742.8,0.000,0.000,0.000,1,0.000,0.000,0.000,
single,cl_007,mixed,mixed,semantic_search,document,mixed,0,0,KGS 코드 LPG safety standard,11647;11689;11645;4025,4025:1;11645:2;11647:3;11689:2,11647;13760;13674;13669;13774;13773;13675;13755;13924;13772,373.1,0.250,1.000,0.390,1,0.647,0.333,1.000,
single,cl_008,mixed,mixed,semantic_search,document,mixed,0,0,수소 가스 안전 기준 hydrogen safety,5178;5169,5169:2;5178:3,10575;11671;11649;11648;13915;5241;11563;5173;5177;11653,620.7,0.000,0.000,0.000,0,0.000,0.000,0.000,
single,exam_001,exam,exam,fact_lookup,document,ko,0,0,레이놀즈수 정의 공식,11504;11505,11504:3;11505:2,11504;11533;5081;11509;11476;11486;5064;3788;5134;5075,503.5,0.500,1.000,0.613,1,0.787,0.500,1.000,
single,exam_002,exam,exam,fact_lookup,document,ko,0,0,탱크 바닥 구멍 유체 유속 토리첼리,11500;11495;11496,11495:2;11496:2;11500:3,11500;11495;11501;5139;5090;5178;11515;5210;11493;11719,381.8,0.667,1.000,0.765,1,0.856,0.667,1.000,
single,exam_003,exam,exam,fact_lookup,document,ko,0,0,이상기체 음속 마하수,11514;11515,11514:3;11515:2,11514;11475;5090;5084;11531;11476;11473;5093;11479;5124,596.0,0.500,1.000,0.613,1,0.787,0.500,1.000,
single,exam_004,exam,exam,fact_lookup,document,ko,0,0,고압가스 용기 내압시험 영구증가량,11591;11644;11691,11591:3;11644:2;11691:2,11591;11664;13948;13660;5177;13652;11665;13917;11660;13752,351.9,0.333,1.000,0.469,1,0.674,0.333,1.000,
single,exam_005,exam,exam,fact_lookup,document,ko,0,0,도시가스 배관 매설 이격거리,11627;11625;11646,11625:2;11627:3;11646:1,11627;11658;11600;11625;11692;13918;13751;5177;13653;13753,361.1,0.667,1.000,0.671,1,0.883,1.000,1.000,
single,exam_006,exam,exam,fact_lookup,document,ko,0,0,LPG 저장탱크 안전거리 분말소화기,11617;11669;11620,11617:3;11620:1;11669:2,11595;11616;13669;11617;11649;11655;11690;11658;11653;11689,300.2,0.333,0.250,0.202,0,0.321,0.500,1.000,
single,exam_007,exam,exam,fact_lookup,document,ko,0,0,오리피스 차압식 유량계,11712;11711;11503,11503:2;11711:2;11712:3,11711;11712;11503;11500;11713;11714;13930;11717;11701;11502,362.9,1.000,1.000,1.000,1,0.858,1.000,1.000,
single,fail_004,failure_expected,failure_expected,fact_lookup,document,ko,0,1,KGS AC999 임의 가스 코드,,,11691;11693;11692;13665;13661;13664;13666;13670;13773;13934,348.9,0.000,0.000,0.000,1,0.000,0.000,0.000,
single,fail_005,failure_expected,failure_expected,fact_lookup,document,ko,0,1,초전도 안전 관리법 시행규칙,,,4026;5236;3977;3971;3966;4018;3972;3973;3974;3895,418.5,0.000,0.000,0.000,1,0.000,0.000,0.000,
1 label id category legacy_category intent domain_hint language ocr_derived failure_expected query relevant_ids graded_relevance returned_ids_top10 latency_ms recall_at_10 mrr_at_10 ndcg_at_10 top3_hit graded_ndcg_at_10 graded_recall_at_10_t2 graded_recall_at_10_t3 error
2 single kw_001 standards exact_keyword fact_lookup document ko 0 0 산업안전보건법 제6장 3856;3868;3879 3856:3;3868:2;3879:2 3868;3879;3856;3851;4041;3890;3917;3863;3908;3855 343.2 1.000 1.000 1.000 1 0.808 1.000 1.000
3 single kw_002 standards exact_keyword fact_lookup document ko 0 0 중대재해 처벌 등에 관한 법률 제2장 중대산업재해 3917;3921 3917:3;3921:2 3921;3917;3919;10573;10571;3916;3874;3918;3854;3922 456.6 1.000 1.000 1.000 1 0.834 1.000 1.000
4 single kw_003 standards exact_keyword fact_lookup document ko 0 0 화학물질관리법 유해화학물질 영업자 3981 3981:3 3981;3985;3984;3993;3857;3978;3983;3957;3980;3903 287.1 1.000 1.000 1.000 1 1.000 1.000 1.000
5 single kw_004 standards exact_keyword fact_lookup document ko 0 0 근로기준법 안전과 보건 4041 4041:3 4041;3852;3851;3877;3905;3878;3858;3903;3781;3881 453.9 1.000 1.000 1.000 1 1.000 1.000 1.000
6 single kw_005 standards exact_keyword fact_lookup document ko 0 0 산업안전보건기준에 관한 규칙 보호구 3888 3888:3 10570;3888;3912;3913;3911;3905;3909;3906;3910;3893 480.3 1.000 0.500 0.631 1 0.631 1.000 1.000
7 single nl_001 korean_only natural_language_ko semantic_search document ko 0 0 기계로 인한 산업재해 관련 법령 3856;3868;3879;3854 3854:1;3856:3;3868:2;3879:2 5244;5249;3897;3863;5253;3856;3895;3867;3879;3851 482.3 0.500 0.167 0.257 0 0.314 0.667 1.000
8 single nl_002 korean_only natural_language_ko semantic_search document ko 0 0 사업주가 도급을 줄 때 산업재해를 예방하기 위해 해야 할 일 3855;3867;3878 3855:3;3867:2;3878:2 3855;5227;3854;5244;3851;3867;3878;3863;3908;10573 452.2 1.000 1.000 0.793 1 0.873 1.000 1.000
9 single nl_003 korean_only natural_language_ko semantic_search document ko 0 0 유해화학물질을 다루는 회사가 지켜야 할 안전 의무 3980;3981;3982 3980:2;3981:2;3982:2 3980;3904;3903;3909;3905;3981;3760;5253;3985;3896 383.6 0.667 1.000 0.636 1 0.636 0.667 0.000
10 single nl_004 korean_only natural_language_ko semantic_search document ko 0 0 중대재해가 발생했을 때 경영책임자가 처벌받는 기준 3916;3917;3920;3921 3916:2;3917:3;3920:2;3921:2 10572;3917;3916;3918;5227;3854;3877;3922;5240;5226 359.2 0.500 0.500 0.441 1 0.506 0.500 1.000
11 single nl_005 korean_only natural_language_ko semantic_search document ko 0 0 안전보건교육은 누가 받아야 하고 어떤 내용을 다루는가 3853;3865 3853:3;3865:2 3853;3876;5249;5234;4025;6675;11677;10573;3757;3811 570.2 0.500 1.000 0.613 1 0.787 0.500 1.000
12 single news_001 korean_only news_ko semantic_search news ko 0 0 이란과 미국의 군사 충돌 4303;4304;4307;4316;4322;4323;4327;4335 4303:2;4304:2;4307:2;4316:2;4322:2;4323:2;4327:2;4335:2 16081;18077;22048;12213;23984;15793;4321;21273;21276;4307 459.5 0.125 0.100 0.073 1 0.073 0.125 0.000
13 single news_002 korean_only news_ko semantic_search news ko 0 0 호르무즈 해협 봉쇄 4316;4320;4322;4327 4316:3;4320:2;4322:2;4327:2 16532;21706;15922;17123;21890;22049;4346;9022;4767;6067 289.8 0.000 0.000 0.000 0 0.000 0.000 0.000
14 single misc_001 korean_only other_domain fact_lookup document ko 0 0 강체의 평면 운동학 4063;4065 4063:3;4065:2 4063;4064;4065;4066;4071;4068;4069;5063;5105;4067 551.8 1.000 1.000 0.920 1 0.956 1.000 1.000
15 single misc_002 korean_only other_domain semantic_search document ko 0 0 질점의 운동역학 4060;4061;4062 4060:2;4061:2;4062:2 4070;4062;4059;4058;4060;4063;4066;4071;4064;5095 531.8 0.667 0.500 0.478 1 0.478 0.667 0.000
16 single news_003 english_only news_en semantic_search news en 0 0 Trump Iran ultimatum 4258;4260;4262 4258:2;4260:2;4262:2 4775;23446;4776;4202;4679;24382;21155;4668;4199;21855 262.2 0.000 0.000 0.000 1 0.000 0.000 0.000
17 single cl_001 mixed crosslingual_ko_en semantic_search document mixed 0 0 기계 안전 가드 설계 원리 3770;3856 3770:3;3856:2 5239;3770;3817;4540;3762;5244;3789;5249;3791;3793 530.5 0.500 0.500 0.387 1 0.497 0.500 1.000
18 single cl_002 mixed crosslingual_ko_en semantic_search document mixed 0 0 산업 안전 입문서 3755;3775;3776;3777 3755:2;3775:2;3776:2;3777:2 5244;5236;5249;5229;3755;3774;3761;5230;10573;3787 465.4 0.250 0.200 0.151 1 0.151 0.250 0.000
19 single cl_003 mixed crosslingual_ko_en semantic_search document mixed 0 0 전기 안전 위험 3772;3790 3772:2;3790:2 3790;3772;5260;3897;5248;3771;3769;11671;13936;3755 715.9 1.000 1.000 1.000 1 1.000 1.000 0.000
20 single news_004 mixed news_fr semantic_search news mixed 0 0 guerre en Iran 4199;4202;4210;4361;4363;4507;4519;4521 4199:2;4202:2;4210:2;4361:2;4363:2;4507:2;4519:2;4521:2 22342;19576;17069;15924;16935;23149;16019;16462;16010;4776 321.6 0.000 0.000 0.000 1 0.000 0.000 0.000
21 single news_005 mixed news_crosslingual semantic_search news mixed 0 0 이란 미국 전쟁 글로벌 반응 4202;4258;4262;4536;4303;4304;4316 4202:2;4258:2;4262:2;4303:2;4304:2;4316:2;4536:2 16761;21275;16927;20893;16771;17242;4329;20886;4457;4307 503.8 0.000 0.000 0.000 1 0.000 0.000 0.000
22 single fail_001 failure_expected failure_expected semantic_search document mixed 0 1 Rust async runtime tokio scheduler 내부 구조 5161;5262;23732;24155;4546;20758;5145;4547;3774;5180 392.8 0.000 0.000 0.000 1 0.000 0.000 0.000
23 single fail_002 failure_expected failure_expected semantic_search document ko 0 1 양자컴퓨터 큐비트 디코히어런스 16289;5089;5092;5250;22202;20507;5070;5118;5173;23605 311.0 0.000 0.000 0.000 1 0.000 0.000 0.000
24 single fail_003 failure_expected failure_expected semantic_search news ko 0 1 재즈 보컬리스트 빌리 홀리데이 20022;20470;4634;15361;16059;9102;23336;18286;16218;5738 268.6 0.000 0.000 0.000 1 0.000 0.000 0.000
25 single kw_006 standards standards fact_lookup document ko 0 0 산업안전보건기준에 관한 규칙 작업장 통로 3886;3887 3886:3;3887:2 3886;3902;3887;3895;3898;3885;3905;3908;3911;3915 349.0 1.000 1.000 0.920 1 0.956 1.000 1.000
26 single kw_007 standards standards fact_lookup document ko 0 0 산업안전보건기준 폭발 화재 위험물 누출 방지 3896;3766 3766:1;3896:3 3896;13930;3895;3911;13929;3866;3903;3890;3910;3909 293.5 0.500 1.000 0.613 1 0.917 1.000 1.000
27 single kw_008 standards standards fact_lookup document ko 0 0 고압가스 안전관리법 전문 4025;4026 4025:3;4026:2 11644;11579;4025;4026;11645;13750;11676;13299;13749;13766 447.0 1.000 0.333 0.571 1 0.539 1.000 1.000
28 single kw_009 standards standards fact_lookup document ko 0 0 KGS FP111 가스설비 배관설비 기준 13305 13305:3 13305;13311;13306;13312;13302;13304;13309;13299;13313;13918 419.6 1.000 1.000 1.000 1 1.000 1.000 1.000
29 single kw_010 standards standards fact_lookup document ko 0 0 KGS FU551 가스설비 압력조정기 가스계량기 13652 13652:3 13652;11689;13657;13655;13656;13649;13651;13752;13659;13650 326.4 1.000 1.000 1.000 1 1.000 1.000 1.000
30 single kw_011 standards standards fact_lookup document ko 0 0 산업안전보건기준 전기로 인한 위험 방지 3897;3772 3772:1;3897:3 3897;3895;3902;3896;3887;13935;13938;3877;3900;3899 444.6 0.500 1.000 0.613 1 0.917 1.000 1.000
31 single en_001 english_only english_only semantic_search document en 0 0 pressure vessel flange design 5144;5136 5136:2;5144:3 5144;5136;5180;5193;5140;5137;5149;5178;5207;5148 1596.3 1.000 1.000 1.000 1 1.000 1.000 1.000
32 single en_002 english_only english_only semantic_search document en 0 0 ASME Section VIII introduction 5204;5206 5204:3;5206:2 5204;5180;5208;5210;5143;5206;5137;5207;5182;5140 1470.0 1.000 1.000 0.832 1 0.907 1.000 1.000
33 single en_003 english_only english_only semantic_search document en 0 0 impact test requirements ASME 5205;5148 5148:1;5205:3 5205;5204;5178;5214;5224;5210;5148;5145;5186;5190 1588.3 1.000 1.000 0.818 1 0.961 1.000 1.000
34 single en_004 english_only english_only semantic_search document en 0 0 design of vessel supports 5149 5149:3 5149;5141;5137;5139;5136;5140;5186;5178;5145;5143 1557.0 1.000 1.000 1.000 1 1.000 1.000 1.000
35 single en_005 english_only english_only semantic_search document en 0 0 hydrogen piping ASME code 5178 5178:3 5178;5210;5180;5182;5143;5204;5211;5207;5185;5186 1331.5 1.000 1.000 1.000 1 1.000 1.000 1.000
36 single en_006 english_only english_only semantic_search document en 0 0 ASME welding qualification requirements 5209;3771 3771:1;5209:3 5204;5224;5208;5209;5205;5178;5180;5225;5187;5186 1409.3 0.500 0.250 0.264 0 0.395 1.000 1.000
37 single en_007 english_only english_only semantic_search document en 0 0 pressure vessel fabrication and inspection 5208;5145 5145:2;5208:3 5208;5189;5192;5180;5187;5186;5212;5188;5182;5137 1651.1 0.500 1.000 0.613 1 0.787 0.500 1.000
38 single en_008 english_only english_only semantic_search document en 0 0 Industrial Safety and Health Management ergonomics 3763;3755 3755:1;3763:3 3763;3759;3774;3755;3818;3812;3778;3756;3761;3771 1089.1 1.000 1.000 0.877 1 0.974 1.000 1.000
39 single cl_004 mixed mixed semantic_search document mixed 0 0 ASME 압력용기 설계 실무 5207;5210;5139;5135 5135:2;5139:2;5207:3;5210:3 5139;5207;5204;5225;5206;5208;5210;5137;5182;5145 755.4 0.750 1.000 0.767 1 0.686 0.750 1.000
40 single cl_005 mixed mixed semantic_search document mixed 0 0 ASME 용접 코드 해설 5224;5209 5209:2;5224:3 5224;5222;5225;5209;5180;5204;5210;5205;5178;5143 708.5 1.000 1.000 0.877 1 0.932 1.000 1.000
41 single cl_006 mixed mixed semantic_search document mixed 0 0 pressure vessel Korean industrial safety regulation 4025;3856;5136 3856:2;4025:2;5136:1 5210;5186;13913;5143;13760;13749;5145;5180;5240;5137 742.8 0.000 0.000 0.000 1 0.000 0.000 0.000
42 single cl_007 mixed mixed semantic_search document mixed 0 0 KGS 코드 LPG safety standard 11647;11689;11645;4025 4025:1;11645:2;11647:3;11689:2 11647;13760;13674;13669;13774;13773;13675;13755;13924;13772 373.1 0.250 1.000 0.390 1 0.647 0.333 1.000
43 single cl_008 mixed mixed semantic_search document mixed 0 0 수소 가스 안전 기준 hydrogen safety 5178;5169 5169:2;5178:3 10575;11671;11649;11648;13915;5241;11563;5173;5177;11653 620.7 0.000 0.000 0.000 0 0.000 0.000 0.000
44 single exam_001 exam exam fact_lookup document ko 0 0 레이놀즈수 정의 공식 11504;11505 11504:3;11505:2 11504;11533;5081;11509;11476;11486;5064;3788;5134;5075 503.5 0.500 1.000 0.613 1 0.787 0.500 1.000
45 single exam_002 exam exam fact_lookup document ko 0 0 탱크 바닥 구멍 유체 유속 토리첼리 11500;11495;11496 11495:2;11496:2;11500:3 11500;11495;11501;5139;5090;5178;11515;5210;11493;11719 381.8 0.667 1.000 0.765 1 0.856 0.667 1.000
46 single exam_003 exam exam fact_lookup document ko 0 0 이상기체 음속 마하수 11514;11515 11514:3;11515:2 11514;11475;5090;5084;11531;11476;11473;5093;11479;5124 596.0 0.500 1.000 0.613 1 0.787 0.500 1.000
47 single exam_004 exam exam fact_lookup document ko 0 0 고압가스 용기 내압시험 영구증가량 11591;11644;11691 11591:3;11644:2;11691:2 11591;11664;13948;13660;5177;13652;11665;13917;11660;13752 351.9 0.333 1.000 0.469 1 0.674 0.333 1.000
48 single exam_005 exam exam fact_lookup document ko 0 0 도시가스 배관 매설 이격거리 11627;11625;11646 11625:2;11627:3;11646:1 11627;11658;11600;11625;11692;13918;13751;5177;13653;13753 361.1 0.667 1.000 0.671 1 0.883 1.000 1.000
49 single exam_006 exam exam fact_lookup document ko 0 0 LPG 저장탱크 안전거리 분말소화기 11617;11669;11620 11617:3;11620:1;11669:2 11595;11616;13669;11617;11649;11655;11690;11658;11653;11689 300.2 0.333 0.250 0.202 0 0.321 0.500 1.000
50 single exam_007 exam exam fact_lookup document ko 0 0 오리피스 차압식 유량계 11712;11711;11503 11503:2;11711:2;11712:3 11711;11712;11503;11500;11713;11714;13930;11717;11701;11502 362.9 1.000 1.000 1.000 1 0.858 1.000 1.000
51 single fail_004 failure_expected failure_expected fact_lookup document ko 0 1 KGS AC999 임의 가스 코드 11691;11693;11692;13665;13661;13664;13666;13670;13773;13934 348.9 0.000 0.000 0.000 1 0.000 0.000 0.000
52 single fail_005 failure_expected failure_expected fact_lookup document ko 0 1 초전도 안전 관리법 시행규칙 4026;5236;3977;3971;3966;4018;3972;3973;3974;3895 418.5 0.000 0.000 0.000 1 0.000 0.000 0.000
@@ -0,0 +1,52 @@
label,id,category,legacy_category,intent,domain_hint,language,ocr_derived,failure_expected,query,relevant_ids,graded_relevance,returned_ids_top10,latency_ms,recall_at_10,mrr_at_10,ndcg_at_10,top3_hit,graded_ndcg_at_10,graded_recall_at_10_t2,graded_recall_at_10_t3,error
single,kw_001,standards,exact_keyword,fact_lookup,document,ko,0,0,산업안전보건법 제6장,3856;3868;3879,3856:3;3868:2;3879:2,3868;3879;3856;3851;4041;3890;3917;3863;3908;3855,244.4,1.000,1.000,1.000,1,0.808,1.000,1.000,
single,kw_002,standards,exact_keyword,fact_lookup,document,ko,0,0,중대재해 처벌 등에 관한 법률 제2장 중대산업재해,3917;3921,3917:3;3921:2,3921;3917;10571;3916;3874;3896;3922;3919;3918;3920,319.9,1.000,1.000,1.000,1,0.834,1.000,1.000,
single,kw_003,standards,exact_keyword,fact_lookup,document,ko,0,0,화학물질관리법 유해화학물질 영업자,3981,3981:3,3981;3985;3903;3880;3980;3983;3978;3896;3869;3904,199.9,1.000,1.000,1.000,1,1.000,1.000,1.000,
single,kw_004,standards,exact_keyword,fact_lookup,document,ko,0,0,근로기준법 안전과 보건,4041,4041:3,3851;3914;4041;3852;3905;3877;3881;3903;3915;3913,345.3,1.000,0.333,0.500,1,0.500,1.000,1.000,
single,kw_005,standards,exact_keyword,fact_lookup,document,ko,0,0,산업안전보건기준에 관한 규칙 보호구,3888,3888:3,3888;3905;3911;3913;10570;3909;3912;3906;3904;3910,380.5,1.000,1.000,1.000,1,1.000,1.000,1.000,
single,nl_001,korean_only,natural_language_ko,semantic_search,document,ko,0,0,기계로 인한 산업재해 관련 법령,3856;3868;3879;3854,3854:1;3856:3;3868:2;3879:2,5244;3897;5249;5253;3856;3898;3879;3851;3868;3902,392.3,0.750,0.200,0.399,0,0.426,1.000,1.000,
single,nl_002,korean_only,natural_language_ko,semantic_search,document,ko,0,0,사업주가 도급을 줄 때 산업재해를 예방하기 위해 해야 할 일,3855;3867;3878,3855:3;3867:2;3878:2,5227;3855;5249;3867;3854;3908;5244;3898;3895;10571,345.1,0.667,0.500,0.498,1,0.549,0.667,1.000,
single,nl_003,korean_only,natural_language_ko,semantic_search,document,ko,0,0,유해화학물질을 다루는 회사가 지켜야 할 안전 의무,3980;3981;3982,3980:2;3981:2;3982:2,3903;3896;3980;3904;3905;3760;3912;3909;3757;3857,291.8,0.333,0.333,0.235,1,0.235,0.333,0.000,
single,nl_004,korean_only,natural_language_ko,semantic_search,document,ko,0,0,중대재해가 발생했을 때 경영책임자가 처벌받는 기준,3916;3917;3920;3921,3916:2;3917:3;3920:2;3921:2,3916;3917;10572;3918;5227;3854;3877;5238;3878;5240,266.6,0.500,1.000,0.637,1,0.635,0.500,1.000,
single,nl_005,korean_only,natural_language_ko,semantic_search,document,ko,0,0,안전보건교육은 누가 받아야 하고 어떤 내용을 다루는가,3853;3865,3853:3;3865:2,5234;3876;3853;10573;5249;4025;6675;3810;3777;3787,465.6,0.500,0.333,0.307,1,0.394,0.500,1.000,
single,news_001,korean_only,news_ko,semantic_search,news,ko,0,0,이란과 미국의 군사 충돌,4303;4304;4307;4316;4322;4323;4327;4335,4303:2;4304:2;4307:2;4316:2;4322:2;4323:2;4327:2;4335:2,18077;16081;21273;15793;23984;22048;16526;12213;4307;20893,363.6,0.125,0.111,0.076,1,0.076,0.125,0.000,
single,news_002,korean_only,news_ko,semantic_search,news,ko,0,0,호르무즈 해협 봉쇄,4316;4320;4322;4327,4316:3;4320:2;4322:2;4327:2,16532;21706;15922;17123;21890;22049;4346;9022;4767;6067,227.6,0.000,0.000,0.000,0,0.000,0.000,0.000,
single,misc_001,korean_only,other_domain,fact_lookup,document,ko,0,0,강체의 평면 운동학,4063;4065,4063:3;4065:2,4063;4065;4064;4066;4071;4068;11481;5105;5106;5063,428.0,1.000,1.000,1.000,1,1.000,1.000,1.000,
single,misc_002,korean_only,other_domain,semantic_search,document,ko,0,0,질점의 운동역학,4060;4061;4062,4060:2;4061:2;4062:2,4070;4058;4062;4059;5095;4060;4064;4063;4066;4067,394.0,0.667,0.333,0.402,1,0.402,0.667,0.000,
single,news_003,english_only,news_en,semantic_search,news,en,0,0,Trump Iran ultimatum,4258;4260;4262,4258:2;4260:2;4262:2,4775;23446;4776;4202;4679;24382;21155;4668;4199;21855,199.7,0.000,0.000,0.000,1,0.000,0.000,0.000,
single,cl_001,mixed,crosslingual_ko_en,semantic_search,document,mixed,0,0,기계 안전 가드 설계 원리,3770;3856,3770:3;3856:2,4540;5239;5249;3770;5236;5244;3787;3817;4548;5253,399.3,0.500,0.250,0.264,0,0.339,0.500,1.000,
single,cl_002,mixed,crosslingual_ko_en,semantic_search,document,mixed,0,0,산업 안전 입문서,3755;3775;3776;3777,3755:2;3775:2;3776:2;3777:2,5249;5236;5244;5230;5235;5229;10573;3761;3755;3816,344.6,0.250,0.111,0.118,1,0.118,0.250,0.000,
single,cl_003,mixed,crosslingual_ko_en,semantic_search,document,mixed,0,0,전기 안전 위험,3772;3790,3772:2;3790:2,3790;3772;5260;3897;5248;3771;3769;11671;13936;3755,658.8,1.000,1.000,1.000,1,1.000,1.000,0.000,
single,news_004,mixed,news_fr,semantic_search,news,mixed,0,0,guerre en Iran,4199;4202;4210;4361;4363;4507;4519;4521,4199:2;4202:2;4210:2;4361:2;4363:2;4507:2;4519:2;4521:2,22342;19576;17069;15924;16935;23149;16019;16462;16010;4776,256.4,0.000,0.000,0.000,1,0.000,0.000,0.000,
single,news_005,mixed,news_crosslingual,semantic_search,news,mixed,0,0,이란 미국 전쟁 글로벌 반응,4202;4258;4262;4536;4303;4304;4316,4202:2;4258:2;4262:2;4303:2;4304:2;4316:2;4536:2,16761;21275;16927;20893;16771;17242;4329;20886;4457;4307,434.7,0.000,0.000,0.000,1,0.000,0.000,0.000,
single,fail_001,failure_expected,failure_expected,semantic_search,document,mixed,0,1,Rust async runtime tokio scheduler 내부 구조,,,5262;23732;5161;24155;20032;20758;20036;17813;3816;19373,274.1,0.000,0.000,0.000,1,0.000,0.000,0.000,
single,fail_002,failure_expected,failure_expected,semantic_search,document,ko,0,1,양자컴퓨터 큐비트 디코히어런스,,,16289;22202;23605;20507;5250;24854;5168;23297;19527;18298,201.2,0.000,0.000,0.000,1,0.000,0.000,0.000,
single,fail_003,failure_expected,failure_expected,semantic_search,news,ko,0,1,재즈 보컬리스트 빌리 홀리데이,,,18286;9102;20022;16059;23336;20470;20174;15361;4634;17133,179.8,0.000,0.000,0.000,1,0.000,0.000,0.000,
single,kw_006,standards,standards,fact_lookup,document,ko,0,0,산업안전보건기준에 관한 규칙 작업장 통로,3886;3887,3886:3;3887:2,3886;3902;3887;3895;3898;3885;3905;3908;3911;3915,273.9,1.000,1.000,0.920,1,0.956,1.000,1.000,
single,kw_007,standards,standards,fact_lookup,document,ko,0,0,산업안전보건기준 폭발 화재 위험물 누출 방지,3896;3766,3766:1;3896:3,3896;13930;3895;3911;13929;3866;3903;3890;3910;3909,227.4,0.500,1.000,0.613,1,0.917,1.000,1.000,
single,kw_008,standards,standards,fact_lookup,document,ko,0,0,고압가스 안전관리법 전문,4025;4026,4025:3;4026:2,11579;11644;4025;4026;11647;11676;11591;11580;11645;13750,329.6,1.000,0.333,0.571,1,0.539,1.000,1.000,
single,kw_009,standards,standards,fact_lookup,document,ko,0,0,KGS FP111 가스설비 배관설비 기준,13305,13305:3,13305;13302;13312;13300;13311;13310;13308;13306;13299;13304,298.4,1.000,1.000,1.000,1,1.000,1.000,1.000,
single,kw_010,standards,standards,fact_lookup,document,ko,0,0,KGS FU551 가스설비 압력조정기 가스계량기,13652,13652:3,13652;13657;13651;13655;11689;13649;13648;11693;13322;13658,222.2,1.000,1.000,1.000,1,1.000,1.000,1.000,
single,kw_011,standards,standards,fact_lookup,document,ko,0,0,산업안전보건기준 전기로 인한 위험 방지,3897;3772,3772:1;3897:3,3897;3902;3895;3887;13935;3866;13938;3890;4018;3901,352.2,0.500,1.000,0.613,1,0.917,1.000,1.000,
single,en_001,english_only,english_only,semantic_search,document,en,0,0,pressure vessel flange design,5144;5136,5136:2;5144:3,5136;5144;5139;5180;5193;5186;5207;5210;5178;5190,1475.2,1.000,1.000,1.000,1,0.834,1.000,1.000,
single,en_002,english_only,english_only,semantic_search,document,en,0,0,ASME Section VIII introduction,5204;5206,5204:3;5206:2,5210;5204;5208;5137;5180;5186;5182;5206;5140;5185,1319.6,1.000,0.500,0.580,1,0.603,1.000,1.000,
single,en_003,english_only,english_only,semantic_search,document,en,0,0,impact test requirements ASME,5205;5148,5148:1;5205:3,5205;5224;5204;5210;5178;5186;5148;5214;5192;5180,1475.1,1.000,1.000,0.818,1,0.961,1.000,1.000,
single,en_004,english_only,english_only,semantic_search,document,en,0,0,design of vessel supports,5149,5149:3,5149;5137;5141;5139;5211;5207;5212;5140;5182;5206,1444.9,1.000,1.000,1.000,1,1.000,1.000,1.000,
single,en_005,english_only,english_only,semantic_search,document,en,0,0,hydrogen piping ASME code,5178,5178:3,5178;5210;5180;5182;5143;5204;5211;5207;5185;5186,1254.6,1.000,1.000,1.000,1,1.000,1.000,1.000,
single,en_006,english_only,english_only,semantic_search,document,en,0,0,ASME welding qualification requirements,5209;3771,3771:1;5209:3,5180;5225;5224;5204;5178;5205;5208;5209;5187;4826,1290.7,0.500,0.125,0.193,0,0.289,1.000,1.000,
single,en_007,english_only,english_only,semantic_search,document,en,0,0,pressure vessel fabrication and inspection,5208;5145,5145:2;5208:3,5208;5189;5186;5180;5187;5192;5212;5182;5137;5210,1532.4,0.500,1.000,0.613,1,0.787,0.500,1.000,
single,en_008,english_only,english_only,semantic_search,document,en,0,0,Industrial Safety and Health Management ergonomics,3763;3755,3755:1;3763:3,3763;3774;3755;3759;3818;3761;3769;3758;3812;3762,943.3,1.000,1.000,0.920,1,0.983,1.000,1.000,
single,cl_004,mixed,mixed,semantic_search,document,mixed,0,0,ASME 압력용기 설계 실무,5207;5210;5139;5135,5135:2;5139:2;5207:3;5210:3,5139;5204;5208;5137;5225;5210;5133;5136;5180;5207,616.2,0.750,1.000,0.642,0,0.529,0.750,1.000,
single,cl_005,mixed,mixed,semantic_search,document,mixed,0,0,ASME 용접 코드 해설,5224;5209,5209:2;5224:3,5224;5225;5222;5209;5210;5195;5180;5204;5207;5141,590.8,1.000,1.000,0.877,1,0.932,1.000,1.000,
single,cl_006,mixed,mixed,semantic_search,document,mixed,0,0,pressure vessel Korean industrial safety regulation,4025;3856;5136,3856:2;4025:2;5136:1,5210;5186;13913;5143;13760;13749;5145;5180;5240;5137,676.2,0.000,0.000,0.000,1,0.000,0.000,0.000,
single,cl_007,mixed,mixed,semantic_search,document,mixed,0,0,KGS 코드 LPG safety standard,11647;11689;11645;4025,4025:1;11645:2;11647:3;11689:2,11647;13759;13310;13764;13670;11649;13674;13772;13671;13669,272.7,0.250,1.000,0.390,1,0.647,0.333,1.000,
single,cl_008,mixed,mixed,semantic_search,document,mixed,0,0,수소 가스 안전 기준 hydrogen safety,5178;5169,5169:2;5178:3,13299;5241;13915;10575;5173;13303;11563;5178;13948;5177,519.5,0.500,0.125,0.193,0,0.248,0.500,1.000,
single,exam_001,exam,exam,fact_lookup,document,ko,0,0,레이놀즈수 정의 공식,11504;11505,11504:3;11505:2,11533;11504;5081;5064;11509;11476;5071;5082;11601;5075,388.9,0.500,0.500,0.387,1,0.497,0.500,1.000,
single,exam_002,exam,exam,fact_lookup,document,ko,0,0,탱크 바닥 구멍 유체 유속 토리첼리,11500;11495;11496,11495:2;11496:2;11500:3,11500;11495;5139;11493;11515;11521;11719;5090;5193;13318,228.0,0.667,1.000,0.765,1,0.856,0.667,1.000,
single,exam_003,exam,exam,fact_lookup,document,ko,0,0,이상기체 음속 마하수,11514;11515,11514:3;11515:2,11514;5084;5090;11516;11526;11493;11479;11515;11473;5210,472.2,1.000,1.000,0.807,1,0.894,1.000,1.000,
single,exam_004,exam,exam,fact_lookup,document,ko,0,0,고압가스 용기 내압시험 영구증가량,11591;11644;11691,11591:3;11644:2;11691:2,13665;13324;11579;11591;11580;13948;13928;13759;13752;13660,243.4,0.333,0.250,0.202,0,0.290,0.333,1.000,
single,exam_005,exam,exam,fact_lookup,document,ko,0,0,도시가스 배관 매설 이격거리,11627;11625;11646,11625:2;11627:3;11646:1,11600;13918;11627;11692;11625;13753;11510;13752;13302;11658,259.8,0.667,0.333,0.416,1,0.496,1.000,1.000,
single,exam_006,exam,exam,fact_lookup,document,ko,0,0,LPG 저장탱크 안전거리 분말소화기,11617;11669;11620,11617:3;11620:1;11669:2,11617;13948;13924;11649;11595;11690;11616;11658;11619;11655,222.2,0.333,1.000,0.469,1,0.745,0.500,1.000,
single,exam_007,exam,exam,fact_lookup,document,ko,0,0,오리피스 차압식 유량계,11712;11711;11503,11503:2;11711:2;11712:3,11713;11711;13930;11712;11714;11500;11488;11503;11502;11509,267.5,1.000,0.500,0.646,0,0.563,1.000,1.000,
single,fail_004,failure_expected,failure_expected,fact_lookup,document,ko,0,1,KGS AC999 임의 가스 코드,,,11691;13303;11692;11693;13663;13665;13933;13673;11655;13661,256.2,0.000,0.000,0.000,1,0.000,0.000,0.000,
single,fail_005,failure_expected,failure_expected,fact_lookup,document,ko,0,1,초전도 안전 관리법 시행규칙,,,3897;5236;4019;4018;3971;3903;3966;3974;5210;3973,321.2,0.000,0.000,0.000,1,0.000,0.000,0.000,
1 label id category legacy_category intent domain_hint language ocr_derived failure_expected query relevant_ids graded_relevance returned_ids_top10 latency_ms recall_at_10 mrr_at_10 ndcg_at_10 top3_hit graded_ndcg_at_10 graded_recall_at_10_t2 graded_recall_at_10_t3 error
2 single kw_001 standards exact_keyword fact_lookup document ko 0 0 산업안전보건법 제6장 3856;3868;3879 3856:3;3868:2;3879:2 3868;3879;3856;3851;4041;3890;3917;3863;3908;3855 244.4 1.000 1.000 1.000 1 0.808 1.000 1.000
3 single kw_002 standards exact_keyword fact_lookup document ko 0 0 중대재해 처벌 등에 관한 법률 제2장 중대산업재해 3917;3921 3917:3;3921:2 3921;3917;10571;3916;3874;3896;3922;3919;3918;3920 319.9 1.000 1.000 1.000 1 0.834 1.000 1.000
4 single kw_003 standards exact_keyword fact_lookup document ko 0 0 화학물질관리법 유해화학물질 영업자 3981 3981:3 3981;3985;3903;3880;3980;3983;3978;3896;3869;3904 199.9 1.000 1.000 1.000 1 1.000 1.000 1.000
5 single kw_004 standards exact_keyword fact_lookup document ko 0 0 근로기준법 안전과 보건 4041 4041:3 3851;3914;4041;3852;3905;3877;3881;3903;3915;3913 345.3 1.000 0.333 0.500 1 0.500 1.000 1.000
6 single kw_005 standards exact_keyword fact_lookup document ko 0 0 산업안전보건기준에 관한 규칙 보호구 3888 3888:3 3888;3905;3911;3913;10570;3909;3912;3906;3904;3910 380.5 1.000 1.000 1.000 1 1.000 1.000 1.000
7 single nl_001 korean_only natural_language_ko semantic_search document ko 0 0 기계로 인한 산업재해 관련 법령 3856;3868;3879;3854 3854:1;3856:3;3868:2;3879:2 5244;3897;5249;5253;3856;3898;3879;3851;3868;3902 392.3 0.750 0.200 0.399 0 0.426 1.000 1.000
8 single nl_002 korean_only natural_language_ko semantic_search document ko 0 0 사업주가 도급을 줄 때 산업재해를 예방하기 위해 해야 할 일 3855;3867;3878 3855:3;3867:2;3878:2 5227;3855;5249;3867;3854;3908;5244;3898;3895;10571 345.1 0.667 0.500 0.498 1 0.549 0.667 1.000
9 single nl_003 korean_only natural_language_ko semantic_search document ko 0 0 유해화학물질을 다루는 회사가 지켜야 할 안전 의무 3980;3981;3982 3980:2;3981:2;3982:2 3903;3896;3980;3904;3905;3760;3912;3909;3757;3857 291.8 0.333 0.333 0.235 1 0.235 0.333 0.000
10 single nl_004 korean_only natural_language_ko semantic_search document ko 0 0 중대재해가 발생했을 때 경영책임자가 처벌받는 기준 3916;3917;3920;3921 3916:2;3917:3;3920:2;3921:2 3916;3917;10572;3918;5227;3854;3877;5238;3878;5240 266.6 0.500 1.000 0.637 1 0.635 0.500 1.000
11 single nl_005 korean_only natural_language_ko semantic_search document ko 0 0 안전보건교육은 누가 받아야 하고 어떤 내용을 다루는가 3853;3865 3853:3;3865:2 5234;3876;3853;10573;5249;4025;6675;3810;3777;3787 465.6 0.500 0.333 0.307 1 0.394 0.500 1.000
12 single news_001 korean_only news_ko semantic_search news ko 0 0 이란과 미국의 군사 충돌 4303;4304;4307;4316;4322;4323;4327;4335 4303:2;4304:2;4307:2;4316:2;4322:2;4323:2;4327:2;4335:2 18077;16081;21273;15793;23984;22048;16526;12213;4307;20893 363.6 0.125 0.111 0.076 1 0.076 0.125 0.000
13 single news_002 korean_only news_ko semantic_search news ko 0 0 호르무즈 해협 봉쇄 4316;4320;4322;4327 4316:3;4320:2;4322:2;4327:2 16532;21706;15922;17123;21890;22049;4346;9022;4767;6067 227.6 0.000 0.000 0.000 0 0.000 0.000 0.000
14 single misc_001 korean_only other_domain fact_lookup document ko 0 0 강체의 평면 운동학 4063;4065 4063:3;4065:2 4063;4065;4064;4066;4071;4068;11481;5105;5106;5063 428.0 1.000 1.000 1.000 1 1.000 1.000 1.000
15 single misc_002 korean_only other_domain semantic_search document ko 0 0 질점의 운동역학 4060;4061;4062 4060:2;4061:2;4062:2 4070;4058;4062;4059;5095;4060;4064;4063;4066;4067 394.0 0.667 0.333 0.402 1 0.402 0.667 0.000
16 single news_003 english_only news_en semantic_search news en 0 0 Trump Iran ultimatum 4258;4260;4262 4258:2;4260:2;4262:2 4775;23446;4776;4202;4679;24382;21155;4668;4199;21855 199.7 0.000 0.000 0.000 1 0.000 0.000 0.000
17 single cl_001 mixed crosslingual_ko_en semantic_search document mixed 0 0 기계 안전 가드 설계 원리 3770;3856 3770:3;3856:2 4540;5239;5249;3770;5236;5244;3787;3817;4548;5253 399.3 0.500 0.250 0.264 0 0.339 0.500 1.000
18 single cl_002 mixed crosslingual_ko_en semantic_search document mixed 0 0 산업 안전 입문서 3755;3775;3776;3777 3755:2;3775:2;3776:2;3777:2 5249;5236;5244;5230;5235;5229;10573;3761;3755;3816 344.6 0.250 0.111 0.118 1 0.118 0.250 0.000
19 single cl_003 mixed crosslingual_ko_en semantic_search document mixed 0 0 전기 안전 위험 3772;3790 3772:2;3790:2 3790;3772;5260;3897;5248;3771;3769;11671;13936;3755 658.8 1.000 1.000 1.000 1 1.000 1.000 0.000
20 single news_004 mixed news_fr semantic_search news mixed 0 0 guerre en Iran 4199;4202;4210;4361;4363;4507;4519;4521 4199:2;4202:2;4210:2;4361:2;4363:2;4507:2;4519:2;4521:2 22342;19576;17069;15924;16935;23149;16019;16462;16010;4776 256.4 0.000 0.000 0.000 1 0.000 0.000 0.000
21 single news_005 mixed news_crosslingual semantic_search news mixed 0 0 이란 미국 전쟁 글로벌 반응 4202;4258;4262;4536;4303;4304;4316 4202:2;4258:2;4262:2;4303:2;4304:2;4316:2;4536:2 16761;21275;16927;20893;16771;17242;4329;20886;4457;4307 434.7 0.000 0.000 0.000 1 0.000 0.000 0.000
22 single fail_001 failure_expected failure_expected semantic_search document mixed 0 1 Rust async runtime tokio scheduler 내부 구조 5262;23732;5161;24155;20032;20758;20036;17813;3816;19373 274.1 0.000 0.000 0.000 1 0.000 0.000 0.000
23 single fail_002 failure_expected failure_expected semantic_search document ko 0 1 양자컴퓨터 큐비트 디코히어런스 16289;22202;23605;20507;5250;24854;5168;23297;19527;18298 201.2 0.000 0.000 0.000 1 0.000 0.000 0.000
24 single fail_003 failure_expected failure_expected semantic_search news ko 0 1 재즈 보컬리스트 빌리 홀리데이 18286;9102;20022;16059;23336;20470;20174;15361;4634;17133 179.8 0.000 0.000 0.000 1 0.000 0.000 0.000
25 single kw_006 standards standards fact_lookup document ko 0 0 산업안전보건기준에 관한 규칙 작업장 통로 3886;3887 3886:3;3887:2 3886;3902;3887;3895;3898;3885;3905;3908;3911;3915 273.9 1.000 1.000 0.920 1 0.956 1.000 1.000
26 single kw_007 standards standards fact_lookup document ko 0 0 산업안전보건기준 폭발 화재 위험물 누출 방지 3896;3766 3766:1;3896:3 3896;13930;3895;3911;13929;3866;3903;3890;3910;3909 227.4 0.500 1.000 0.613 1 0.917 1.000 1.000
27 single kw_008 standards standards fact_lookup document ko 0 0 고압가스 안전관리법 전문 4025;4026 4025:3;4026:2 11579;11644;4025;4026;11647;11676;11591;11580;11645;13750 329.6 1.000 0.333 0.571 1 0.539 1.000 1.000
28 single kw_009 standards standards fact_lookup document ko 0 0 KGS FP111 가스설비 배관설비 기준 13305 13305:3 13305;13302;13312;13300;13311;13310;13308;13306;13299;13304 298.4 1.000 1.000 1.000 1 1.000 1.000 1.000
29 single kw_010 standards standards fact_lookup document ko 0 0 KGS FU551 가스설비 압력조정기 가스계량기 13652 13652:3 13652;13657;13651;13655;11689;13649;13648;11693;13322;13658 222.2 1.000 1.000 1.000 1 1.000 1.000 1.000
30 single kw_011 standards standards fact_lookup document ko 0 0 산업안전보건기준 전기로 인한 위험 방지 3897;3772 3772:1;3897:3 3897;3902;3895;3887;13935;3866;13938;3890;4018;3901 352.2 0.500 1.000 0.613 1 0.917 1.000 1.000
31 single en_001 english_only english_only semantic_search document en 0 0 pressure vessel flange design 5144;5136 5136:2;5144:3 5136;5144;5139;5180;5193;5186;5207;5210;5178;5190 1475.2 1.000 1.000 1.000 1 0.834 1.000 1.000
32 single en_002 english_only english_only semantic_search document en 0 0 ASME Section VIII introduction 5204;5206 5204:3;5206:2 5210;5204;5208;5137;5180;5186;5182;5206;5140;5185 1319.6 1.000 0.500 0.580 1 0.603 1.000 1.000
33 single en_003 english_only english_only semantic_search document en 0 0 impact test requirements ASME 5205;5148 5148:1;5205:3 5205;5224;5204;5210;5178;5186;5148;5214;5192;5180 1475.1 1.000 1.000 0.818 1 0.961 1.000 1.000
34 single en_004 english_only english_only semantic_search document en 0 0 design of vessel supports 5149 5149:3 5149;5137;5141;5139;5211;5207;5212;5140;5182;5206 1444.9 1.000 1.000 1.000 1 1.000 1.000 1.000
35 single en_005 english_only english_only semantic_search document en 0 0 hydrogen piping ASME code 5178 5178:3 5178;5210;5180;5182;5143;5204;5211;5207;5185;5186 1254.6 1.000 1.000 1.000 1 1.000 1.000 1.000
36 single en_006 english_only english_only semantic_search document en 0 0 ASME welding qualification requirements 5209;3771 3771:1;5209:3 5180;5225;5224;5204;5178;5205;5208;5209;5187;4826 1290.7 0.500 0.125 0.193 0 0.289 1.000 1.000
37 single en_007 english_only english_only semantic_search document en 0 0 pressure vessel fabrication and inspection 5208;5145 5145:2;5208:3 5208;5189;5186;5180;5187;5192;5212;5182;5137;5210 1532.4 0.500 1.000 0.613 1 0.787 0.500 1.000
38 single en_008 english_only english_only semantic_search document en 0 0 Industrial Safety and Health Management ergonomics 3763;3755 3755:1;3763:3 3763;3774;3755;3759;3818;3761;3769;3758;3812;3762 943.3 1.000 1.000 0.920 1 0.983 1.000 1.000
39 single cl_004 mixed mixed semantic_search document mixed 0 0 ASME 압력용기 설계 실무 5207;5210;5139;5135 5135:2;5139:2;5207:3;5210:3 5139;5204;5208;5137;5225;5210;5133;5136;5180;5207 616.2 0.750 1.000 0.642 0 0.529 0.750 1.000
40 single cl_005 mixed mixed semantic_search document mixed 0 0 ASME 용접 코드 해설 5224;5209 5209:2;5224:3 5224;5225;5222;5209;5210;5195;5180;5204;5207;5141 590.8 1.000 1.000 0.877 1 0.932 1.000 1.000
41 single cl_006 mixed mixed semantic_search document mixed 0 0 pressure vessel Korean industrial safety regulation 4025;3856;5136 3856:2;4025:2;5136:1 5210;5186;13913;5143;13760;13749;5145;5180;5240;5137 676.2 0.000 0.000 0.000 1 0.000 0.000 0.000
42 single cl_007 mixed mixed semantic_search document mixed 0 0 KGS 코드 LPG safety standard 11647;11689;11645;4025 4025:1;11645:2;11647:3;11689:2 11647;13759;13310;13764;13670;11649;13674;13772;13671;13669 272.7 0.250 1.000 0.390 1 0.647 0.333 1.000
43 single cl_008 mixed mixed semantic_search document mixed 0 0 수소 가스 안전 기준 hydrogen safety 5178;5169 5169:2;5178:3 13299;5241;13915;10575;5173;13303;11563;5178;13948;5177 519.5 0.500 0.125 0.193 0 0.248 0.500 1.000
44 single exam_001 exam exam fact_lookup document ko 0 0 레이놀즈수 정의 공식 11504;11505 11504:3;11505:2 11533;11504;5081;5064;11509;11476;5071;5082;11601;5075 388.9 0.500 0.500 0.387 1 0.497 0.500 1.000
45 single exam_002 exam exam fact_lookup document ko 0 0 탱크 바닥 구멍 유체 유속 토리첼리 11500;11495;11496 11495:2;11496:2;11500:3 11500;11495;5139;11493;11515;11521;11719;5090;5193;13318 228.0 0.667 1.000 0.765 1 0.856 0.667 1.000
46 single exam_003 exam exam fact_lookup document ko 0 0 이상기체 음속 마하수 11514;11515 11514:3;11515:2 11514;5084;5090;11516;11526;11493;11479;11515;11473;5210 472.2 1.000 1.000 0.807 1 0.894 1.000 1.000
47 single exam_004 exam exam fact_lookup document ko 0 0 고압가스 용기 내압시험 영구증가량 11591;11644;11691 11591:3;11644:2;11691:2 13665;13324;11579;11591;11580;13948;13928;13759;13752;13660 243.4 0.333 0.250 0.202 0 0.290 0.333 1.000
48 single exam_005 exam exam fact_lookup document ko 0 0 도시가스 배관 매설 이격거리 11627;11625;11646 11625:2;11627:3;11646:1 11600;13918;11627;11692;11625;13753;11510;13752;13302;11658 259.8 0.667 0.333 0.416 1 0.496 1.000 1.000
49 single exam_006 exam exam fact_lookup document ko 0 0 LPG 저장탱크 안전거리 분말소화기 11617;11669;11620 11617:3;11620:1;11669:2 11617;13948;13924;11649;11595;11690;11616;11658;11619;11655 222.2 0.333 1.000 0.469 1 0.745 0.500 1.000
50 single exam_007 exam exam fact_lookup document ko 0 0 오리피스 차압식 유량계 11712;11711;11503 11503:2;11711:2;11712:3 11713;11711;13930;11712;11714;11500;11488;11503;11502;11509 267.5 1.000 0.500 0.646 0 0.563 1.000 1.000
51 single fail_004 failure_expected failure_expected fact_lookup document ko 0 1 KGS AC999 임의 가스 코드 11691;13303;11692;11693;13663;13665;13933;13673;11655;13661 256.2 0.000 0.000 0.000 1 0.000 0.000 0.000
52 single fail_005 failure_expected failure_expected fact_lookup document ko 0 1 초전도 안전 관리법 시행규칙 3897;5236;4019;4018;3971;3903;3966;3974;5210;3973 321.2 0.000 0.000 0.000 1 0.000 0.000 0.000
+46
View File
@@ -0,0 +1,46 @@
{
"fixture_purpose": "Phase 2B G0-1 — TEI rerank endpoint 응답 spec 박제. mixed (한국어+영어) sanity check.",
"request": {
"endpoint_examples": [
"http://reranker:80/rerank (production baseline bge-reranker-v2-m3)",
"http://rerank-cand-gte-ml-base:80/rerank",
"http://rerank-cand-mxbai-large:80/rerank",
"http://rerank-cand-bge-v2-gemma-2b:80/rerank"
],
"method": "POST",
"headers": {"Content-Type": "application/json"},
"body": {
"query": "압력용기 설계 기준",
"texts": [
"ASME Section VIII Division 1 pressure vessel design rules and material selection criteria for high-pressure applications.",
"고압가스 안전관리법에 따른 압력용기 검사 기준 — 정기 검사 주기 및 안전 밸브 설정.",
"Today weather forecast for Seoul: partly cloudy with chance of rain in the afternoon."
]
}
},
"response_shape": "[{index: int, score: float}, ...] sorted score desc",
"captured_responses": {
"baseline_bge_v2_m3": {
"endpoint": "http://reranker:80/rerank",
"model": "BAAI/bge-reranker-v2-m3 (production)",
"raw": [
{"index": 0, "score": 0.9091032},
{"index": 1, "score": 0.7514658},
{"index": 2, "score": 0.0000165714}
],
"interpretation": "ASME(en)+고압가스(ko) 둘 다 무관(weather) 보다 명확 높음. 한국어/영어 score gap 작음 (0.91 vs 0.75 = 0.16) — 한국어 능력 강함."
},
"cand_gte_ml_base": {
"endpoint": "http://rerank-cand-gte-ml-base:80/rerank",
"model": "Alibaba-NLP/gte-multilingual-reranker-base",
"raw": [
{"index": 0, "score": 0.6365791},
{"index": 1, "score": 0.4685475},
{"index": 2, "score": 0.034488525}
],
"interpretation": "ASME(en)+고압가스(ko) 둘 다 weather 보다 명확 높음. 한국어/영어 score gap 0.17 — baseline 과 비슷. score 절대값 baseline 보다 낮음 (model 별 calibration 차이, rank 순서는 동일)."
}
},
"sanity_check": "ASME(en) > 고압가스(ko) > weather(noise) 순서 — 두 모델 모두 통과. 후보가 한국어 무관하지 않은지 검증.",
"captured_at": "2026-05-23"
}
@@ -0,0 +1,48 @@
{
"version": "v0.2-phase2b",
"label": "baseline_snapshot",
"date": "2026-05-23",
"snapshot": {
"doc_id_max": 25180,
"chunk_id_max": 56526,
"documents_n": 21365,
"chunks_n": 30605
},
"eval_set": {
"total_cases": 51,
"scored_cases": 46,
"failure_expected_cases": 5
},
"model_config": {
"embedding": "BAAI/bge-m3 (production)",
"reranker": "BAAI/bge-reranker-v2-m3 (production)",
"search_mode": "hybrid",
"rerank_enabled": "server_default",
"embedding_backend": "baseline",
"reranker_backend": "baseline",
"plan": "round-2-review-mighty-starfish.md v2.1 (Phase 2B)"
},
"overall": {
"n": 46,
"graded_ndcg_at_10": 0.659,
"graded_recall_at_10_t2": 0.695,
"graded_recall_at_10_t3": 0.761,
"latency_p50_ms": 454,
"latency_p95_ms": 1573,
"failure_correct": "0/5"
},
"by_category": {
"english_only": { "n": 9, "recall_at_10": 0.78, "ndcg_at_10": 0.71, "graded_ndcg_at_10": 0.78 },
"exam": { "n": 7, "recall_at_10": 0.57, "ndcg_at_10": 0.62, "graded_ndcg_at_10": 0.74 },
"korean_only": { "n": 9, "recall_at_10": 0.55, "ndcg_at_10": 0.47, "graded_ndcg_at_10": 0.51 },
"mixed": { "n": 10, "recall_at_10": 0.38, "ndcg_at_10": 0.36, "graded_ndcg_at_10": 0.39 },
"standards": { "n": 11, "recall_at_10": 0.91, "ndcg_at_10": 0.85, "graded_ndcg_at_10": 0.87 }
},
"by_language": {
"en": { "n": 9, "recall_at_10": 0.78, "graded_ndcg_at_10": 0.78 },
"ko": { "n": 27, "recall_at_10": 0.70, "graded_ndcg_at_10": 0.72 },
"mixed": { "n": 10, "recall_at_10": 0.38, "graded_ndcg_at_10": 0.39 }
},
"raw_csv": "reports/v0_2_phase2b_baseline_snapshot_2026-05-23.csv",
"reproducibility_check": "Phase 2A baseline_snapshot (NDCG 0.659 동일) — snapshot filter path 안정 + 재현성 확인"
}
@@ -0,0 +1,60 @@
{
"version": "v0.2-phase2b",
"label": "cand_gte_ml_base",
"date": "2026-05-23",
"snapshot": {
"doc_id_max": 25180,
"chunk_id_max": 56526,
"documents_n": 21365,
"chunks_n": 30605
},
"eval_set": {
"total_cases": 51,
"scored_cases": 46,
"failure_expected_cases": 5
},
"model_config": {
"embedding": "BAAI/bge-m3 (production, 고정)",
"reranker": "Alibaba-NLP/gte-multilingual-reranker-base",
"reranker_params": "305M",
"reranker_context": 8192,
"reranker_license": "Apache 2.0",
"search_mode": "hybrid",
"rerank_enabled": "server_default",
"embedding_backend": "baseline",
"reranker_backend": "cand_gte_ml_base",
"endpoint": "http://rerank-cand-gte-ml-base:80/rerank",
"plan": "round-2-review-mighty-starfish.md v2.1 (Phase 2B)"
},
"overall": {
"n": 46,
"graded_ndcg_at_10": 0.604,
"graded_recall_at_10_t2": 0.709,
"graded_recall_at_10_t3": 0.783,
"latency_p50_ms": 345,
"latency_p95_ms": 1460,
"failure_correct": "0/5"
},
"by_category": {
"english_only": { "n": 9, "recall_at_10": 0.78, "ndcg_at_10": 0.68, "graded_ndcg_at_10": 0.72 },
"exam": { "n": 7, "recall_at_10": 0.64, "ndcg_at_10": 0.53, "graded_ndcg_at_10": 0.62 },
"korean_only": { "n": 9, "recall_at_10": 0.50, "ndcg_at_10": 0.39, "graded_ndcg_at_10": 0.41 },
"mixed": { "n": 10, "recall_at_10": 0.42, "ndcg_at_10": 0.35, "graded_ndcg_at_10": 0.38 },
"standards": { "n": 11, "recall_at_10": 0.91, "ndcg_at_10": 0.84, "graded_ndcg_at_10": 0.86 }
},
"by_language": {
"en": { "n": 9, "recall_at_10": 0.78, "graded_ndcg_at_10": 0.72 },
"ko": { "n": 27, "recall_at_10": 0.71, "graded_ndcg_at_10": 0.65 },
"mixed": { "n": 10, "recall_at_10": 0.42, "graded_ndcg_at_10": 0.38 }
},
"raw_csv": "reports/v0_2_phase2b_gte_ml_base_2026-05-23.csv",
"delta_vs_baseline": {
"graded_ndcg_at_10": -0.055,
"mixed": -0.01,
"korean_only": -0.10,
"standards": -0.01,
"english_only": -0.06,
"exam": -0.12,
"latency_p50_ms": -109
}
}
+17 -3
View File
@@ -202,6 +202,7 @@ async def call_search(
embedding_backend: str | None = None, embedding_backend: str | None = None,
snapshot_doc_id_max: int | None = None, snapshot_doc_id_max: int | None = None,
snapshot_chunk_id_max: int | None = None, snapshot_chunk_id_max: int | None = None,
reranker_backend: str | None = None,
) -> tuple[list[int], float]: ) -> tuple[list[int], float]:
"""검색 API 호출 → (doc_ids, latency_ms).""" """검색 API 호출 → (doc_ids, latency_ms)."""
url = f"{base_url.rstrip('/')}/api/search/" url = f"{base_url.rstrip('/')}/api/search/"
@@ -219,6 +220,8 @@ async def call_search(
params["snapshot_doc_id_max"] = snapshot_doc_id_max params["snapshot_doc_id_max"] = snapshot_doc_id_max
if snapshot_chunk_id_max is not None: if snapshot_chunk_id_max is not None:
params["snapshot_chunk_id_max"] = snapshot_chunk_id_max params["snapshot_chunk_id_max"] = snapshot_chunk_id_max
if reranker_backend is not None:
params["reranker_backend"] = reranker_backend
import time import time
@@ -249,6 +252,7 @@ async def evaluate(
embedding_backend: str | None = None, embedding_backend: str | None = None,
snapshot_doc_id_max: int | None = None, snapshot_doc_id_max: int | None = None,
snapshot_chunk_id_max: int | None = None, snapshot_chunk_id_max: int | None = None,
reranker_backend: str | None = None,
) -> list[QueryResult]: ) -> list[QueryResult]:
"""전체 쿼리셋 평가.""" """전체 쿼리셋 평가."""
results: list[QueryResult] = [] results: list[QueryResult] = []
@@ -261,6 +265,7 @@ async def evaluate(
embedding_backend=embedding_backend, embedding_backend=embedding_backend,
snapshot_doc_id_max=snapshot_doc_id_max, snapshot_doc_id_max=snapshot_doc_id_max,
snapshot_chunk_id_max=snapshot_chunk_id_max, snapshot_chunk_id_max=snapshot_chunk_id_max,
reranker_backend=reranker_backend,
) )
results.append( results.append(
QueryResult( QueryResult(
@@ -837,6 +842,7 @@ async def call_search_full(
embedding_backend: str | None = None, embedding_backend: str | None = None,
snapshot_doc_id_max: int | None = None, snapshot_doc_id_max: int | None = None,
snapshot_chunk_id_max: int | None = None, snapshot_chunk_id_max: int | None = None,
reranker_backend: str | None = None,
) -> tuple[list[dict], float]: ) -> tuple[list[dict], float]:
"""call_search와 동일 로직. 단 full result dict 리스트 반환.""" """call_search와 동일 로직. 단 full result dict 리스트 반환."""
url = f"{base_url.rstrip('/')}/api/search/" url = f"{base_url.rstrip('/')}/api/search/"
@@ -856,6 +862,8 @@ async def call_search_full(
params["snapshot_doc_id_max"] = snapshot_doc_id_max params["snapshot_doc_id_max"] = snapshot_doc_id_max
if snapshot_chunk_id_max is not None: if snapshot_chunk_id_max is not None:
params["snapshot_chunk_id_max"] = snapshot_chunk_id_max params["snapshot_chunk_id_max"] = snapshot_chunk_id_max
if reranker_backend is not None:
params["reranker_backend"] = reranker_backend
import time import time
@@ -1308,6 +1316,12 @@ def main() -> int:
default=None, default=None,
help="Phase 2A snapshot freeze. document_chunks.id <= 값 filter. baseline rebaseline 도 동일 적용.", help="Phase 2A snapshot freeze. document_chunks.id <= 값 filter. baseline rebaseline 도 동일 적용.",
) )
parser.add_argument(
"--reranker-backend",
type=str,
default=None,
help="Phase 2B Diagnose reranker dispatcher slug (baseline | cand_gte_ml_base). 미지정 = production.",
)
args = parser.parse_args() args = parser.parse_args()
@@ -1361,21 +1375,21 @@ def main() -> int:
if args.base_url: if args.base_url:
print(f"\n>>> evaluating: {args.base_url}") print(f"\n>>> evaluating: {args.base_url}")
results = asyncio.run( results = asyncio.run(
evaluate(queries, args.base_url, args.token, "single", mode=args.mode, fusion=args.fusion, rerank=args.rerank, analyze=args.analyze, embedding_backend=args.embedding_backend, snapshot_doc_id_max=args.snapshot_doc_id_max, snapshot_chunk_id_max=args.snapshot_chunk_id_max) evaluate(queries, args.base_url, args.token, "single", mode=args.mode, fusion=args.fusion, rerank=args.rerank, analyze=args.analyze, embedding_backend=args.embedding_backend, snapshot_doc_id_max=args.snapshot_doc_id_max, snapshot_chunk_id_max=args.snapshot_chunk_id_max, reranker_backend=args.reranker_backend)
) )
print_summary("single", results, eval_version=args.eval_version) print_summary("single", results, eval_version=args.eval_version)
all_results.extend(results) all_results.extend(results)
else: else:
print(f"\n>>> baseline: {args.baseline_url}") print(f"\n>>> baseline: {args.baseline_url}")
baseline_results = asyncio.run( baseline_results = asyncio.run(
evaluate(queries, args.baseline_url, args.token, "baseline", mode=args.mode, fusion=args.fusion, rerank=args.rerank, analyze=args.analyze, embedding_backend=args.embedding_backend, snapshot_doc_id_max=args.snapshot_doc_id_max, snapshot_chunk_id_max=args.snapshot_chunk_id_max) evaluate(queries, args.baseline_url, args.token, "baseline", mode=args.mode, fusion=args.fusion, rerank=args.rerank, analyze=args.analyze, embedding_backend=args.embedding_backend, snapshot_doc_id_max=args.snapshot_doc_id_max, snapshot_chunk_id_max=args.snapshot_chunk_id_max, reranker_backend=args.reranker_backend)
) )
baseline_summary = print_summary("baseline", baseline_results, eval_version=args.eval_version) baseline_summary = print_summary("baseline", baseline_results, eval_version=args.eval_version)
print(f"\n>>> candidate: {args.candidate_url}") print(f"\n>>> candidate: {args.candidate_url}")
candidate_results = asyncio.run( candidate_results = asyncio.run(
evaluate( evaluate(
queries, args.candidate_url, args.token, "candidate", mode=args.mode, fusion=args.fusion, rerank=args.rerank, analyze=args.analyze, embedding_backend=args.embedding_backend, snapshot_doc_id_max=args.snapshot_doc_id_max, snapshot_chunk_id_max=args.snapshot_chunk_id_max queries, args.candidate_url, args.token, "candidate", mode=args.mode, fusion=args.fusion, rerank=args.rerank, analyze=args.analyze, embedding_backend=args.embedding_backend, snapshot_doc_id_max=args.snapshot_doc_id_max, snapshot_chunk_id_max=args.snapshot_chunk_id_max, reranker_backend=args.reranker_backend
) )
) )
candidate_summary = print_summary("candidate", candidate_results, eval_version=args.eval_version) candidate_summary = print_summary("candidate", candidate_results, eval_version=args.eval_version)