feat(eval): v0.2 28 new cases + 2026-05-23 baseline + analysis #25
On top of the v0.2 schema + harness from PR-1 (725a4e1), this adds 28 new cases to complete the 51-case set, freezes a baseline with the current model, and adds an analysis md of weak categories.
Distribution of the 28 new cases (plan +28 = standards +6 / english_only +8 / mixed +5 / exam +7 / failure_expected +2 / ocr_derived 0):
Hydrogen ASME + Industrial Safety English textbook + Structural Analysis)
= RSS feed name. No OCR identification column exists → +4 cases redistributed, noted in the analysis)
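
As a quick sanity check on the planned distribution above, the per-category additions can be tallied. This is only a sketch: the variable names are hypothetical, and the prior case count is derived here as 51 minus 28 rather than stated anywhere in this PR.

```python
# Planned additions per category, copied from the distribution line above.
planned_additions = {
    "standards": 6,
    "english_only": 8,
    "mixed": 5,
    "exam": 7,
    "failure_expected": 2,
    "ocr_derived": 0,
}

new_cases = sum(planned_additions.values())
total_cases = 51  # total stated in the PR description
prior_cases = total_cases - new_cases  # derived for illustration, not from the source

print(new_cases, prior_cases)  # prints "28 23"
```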
Baseline measurement results (corpus 21,385, hybrid mode, bge-m3 + bge-reranker-v2-m3):
Top 3 weaknesses (analysis md):
presumed multilingual limitation
Phase 2 dispatch recommendations (analysis md):
Query rewrite is split out separately into Phase 2Q/Search-PR.
Affected files:
PR plan: ~/.claude/plans/pr-2-serialized-hummingbird.md
Phase 1 plan: ~/.claude/plans/phase-1-graded-eval-v0-2.md
Co-Authored-By: Claude Opus 4.7 (1M context) noreply@anthropic.com