hyungi_document_server/tests at 725a4e1f1d2c3508093db452b6d647b807bd684b - hyungi_document_server - Hyun-git

hyungi/hyungi_document_server

hyungi 725a4e1f1d feat(eval): v0.2 graded relevance schema + harness

queries.yaml: swap the 23 v0.1 cases to the v0.2 schema:
- 7 categories (standards / korean_only / english_only / mixed / exam /
  ocr_derived / failure_expected)
- add language / ocr_derived / failure_expected / graded_relevance columns
- preserve v0.1 compatibility (legacy_category + relevant_ids + top3_ids)
- the 28 new cases (target: 50+) follow in PR-Eval-V0_2-Baseline-Analysis

run_eval.py extensions:
- add graded_ndcg_at_k / graded_recall_at_k functions
- extend the Query / QueryResult dataclasses (v0.2 columns)
- load_queries v0.1 fallback (top3 → grade 3, the rest → grade 2)
- --eval-version v0.1/v0.2/both flag (default both)
- add by_language / by_ocr_derived aggregation to print_summary
- add graded columns to write_csv
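
For readers unfamiliar with graded relevance, the following is a minimal sketch of what the v0.1 fallback and the two graded metrics could look like. It is not the code in run_eval.py: the function signatures, the helper name `grades_from_v01`, the exponential gain `2**grade - 1`, the log2 rank discount, and the `min_grade` threshold are all assumptions chosen for illustration. Only the function names graded_ndcg_at_k / graded_recall_at_k and the "top3 → grade 3, the rest → grade 2" rule come from the commit message. Ranked ids are assumed to be unique.

```python
import math
from typing import Dict, Sequence


def grades_from_v01(relevant_ids: Sequence[str], top3_ids: Sequence[str]) -> Dict[str, int]:
    """Hypothetical v0.1 fallback: top3 ids get grade 3, other relevant ids grade 2."""
    grades = {doc_id: 2 for doc_id in relevant_ids}
    for doc_id in top3_ids:
        grades[doc_id] = 3
    return grades


def _dcg(gains: Sequence[float]) -> float:
    # Rank 0 is discounted by log2(2) = 1, rank 1 by log2(3), and so on.
    return sum(gain / math.log2(rank + 2) for rank, gain in enumerate(gains))


def graded_ndcg_at_k(ranked_ids: Sequence[str], grades: Dict[str, int], k: int) -> float:
    """nDCG@k with exponential gain (an assumed convention, not confirmed by the source)."""
    gains = [2 ** grades.get(doc_id, 0) - 1 for doc_id in ranked_ids[:k]]
    ideal = sorted((2 ** g - 1 for g in grades.values()), reverse=True)[:k]
    ideal_dcg = _dcg(ideal)
    return _dcg(gains) / ideal_dcg if ideal_dcg > 0 else 0.0


def graded_recall_at_k(
    ranked_ids: Sequence[str], grades: Dict[str, int], k: int, min_grade: int = 1
) -> float:
    """Fraction of documents graded >= min_grade that appear in the top k."""
    relevant = {doc_id for doc_id, g in grades.items() if g >= min_grade}
    if not relevant:
        return 0.0
    return len(relevant & set(ranked_ids[:k])) / len(relevant)


if __name__ == "__main__":
    grades = grades_from_v01(relevant_ids=["a", "b", "c"], top3_ids=["a"])
    print(grades)
    print(graded_ndcg_at_k(["b", "a", "x"], grades, k=3))
    print(graded_recall_at_k(["b", "a", "x"], grades, k=3))
```

Under these assumptions, a v0.1 case scored with the graded metrics rewards placing a top3 document first more than placing any other relevant document first, which is the practical effect of mapping the two v0.1 lists to grades 3 and 2.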

README.md (new):
- graded relevance level definitions (0-3) + category definitions (7)
- v0.2 schema columns + guide for writing new cases
- v0.1 compatibility + CLI usage examples + baseline freeze policy

Phase 1 plan: ~/.claude/plans/phase-1-graded-eval-v0-2.md
Parent: ~/.claude/plans/peppy-hugging-nest.md § Phase 1

This PR closes: schema + harness + README. The 28 new cases, the baseline
freeze, and the weakness analysis (identifying 4 categories of
embedding-sensitive failure patterns) follow in a later PR.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

2026-05-23 01:21:06 +00:00

..

feat(search): /ask/react endpoint with Qwen native tool calling ReAct loop

2026-05-22 13:43:47 +00:00

calibrate_fixtures

feat(scripts): Phase 3.5 — calibrate_ask.py CLI (Q0~Q8 + render + FP CSV)

2026-04-17 08:11:06 +09:00

feat(search): /ask/react endpoint with Qwen native tool calling ReAct loop

2026-05-22 13:43:47 +00:00

ops(gpu-health): standardize GPU service health/smoke checks + synthetic VRAM peak guard

2026-05-14 09:42:07 +09:00

feat(ask): Phase 3.5a guardrails (classifier + refusal gate + grounding + partial)

2026-04-10 08:49:11 +09:00

fix(policy): use container-compatible imports (drop app. prefix)

2026-04-24 09:42:24 +09:00

feat(chunk): add Phase 1.2-E reindex script

2026-04-08 12:31:29 +09:00

feat(eval): v0.2 graded relevance schema + harness

2026-05-23 01:21:06 +00:00

feat(search): /ask/react endpoint with Qwen native tool calling ReAct loop

2026-05-22 13:43:47 +00:00

__init__.py

feat: scaffold v2 project structure with Docker, FastAPI, and config

2026-04-02 10:20:15 +09:00

_worker_pool_helpers.py

refactor(worker-pool): Registry-1B test fixture — NullPool helper standalone

2026-05-19 12:43:53 +09:00

conftest.py

feat: scaffold v2 project structure with Docker, FastAPI, and config

2026-04-02 10:20:15 +09:00

test_ask_eval_auth.py

feat(ask): Phase 3.5 A0 — ask_events source/eval_case_id + eval auth boundary

2026-04-17 08:11:06 +09:00

test_briefing_historical.py

fix(briefing): backfill country_perspectives[].article_ids from cluster members

2026-05-12 13:15:26 +09:00

test_explanation_cap.py

fix(tests): explanation cap test setup - compensate for insufficient Korean chunk length

2026-05-02 08:35:34 +09:00

test_freshness_decay.py

feat(search): PR-RAG-Time-1 freshness decay (news/law_monitor)

2026-05-03 08:38:09 +09:00

test_grounding_fabricated_number.py

feat(grounding): Phase 3.5 B1 — unit-aware fabricated_number + bound semantics

2026-04-17 08:11:06 +09:00

test_internal_worker_authz.py

refactor(worker-pool): Registry-1B test fixture — NullPool helper standalone

2026-05-19 12:43:53 +09:00

test_internal_worker_endpoints.py

refactor(worker-pool): Registry-1B test fixture — NullPool helper standalone

2026-05-19 12:43:53 +09:00

test_laptop_worker_bot_auth.py

feat(worker-pool): enable Registry-1B Pull (auth + worker_jobs + 5 endpoints)

2026-05-19 08:54:07 +09:00

test_marker_image_persist.py

feat(markdown): persist extracted images with auth routes

2026-05-10 14:05:41 +09:00

test_priority_gate.py

feat(search): MLX priority gate (B-1, Priority.FOREGROUND vs BACKGROUND)

2026-05-17 08:42:58 +09:00

test_session_summary_guard_pattern.py

feat(study): Phase 4-D operational observation + confidence calibration

2026-05-02 07:33:57 +09:00

test_synthesis_failure_regate.py

fix(search): re-gate Tier 0 - handle synthesis self-refuse / timeout / empty answer consistently

2026-04-17 08:29:49 +09:00

test_verifier_numeric_promote.py

feat(verifier): Phase 3.5 B2 — numeric_conflict promote (env flag) + Tier 4

2026-04-17 08:11:06 +09:00

test_worker_jobs_ownership.py

refactor(worker-pool): Registry-1B test fixture — NullPool helper standalone

2026-05-19 12:43:53 +09:00

test_worker_jobs_retry.py

refactor(worker-pool): Registry-1B test fixture — NullPool helper standalone

2026-05-19 12:43:53 +09:00

test_worker_jobs_skip_locked.py

refactor(worker-pool): Registry-1B test fixture — NullPool helper standalone

2026-05-19 12:43:53 +09:00

test_worker_jobs_smoke.py

refactor(worker-pool): Registry-1B test fixture — NullPool helper standalone

2026-05-19 12:43:53 +09:00

test_worker_recap_compaction.py

feat(worker-pool): Registry-1C cap 1MB + deterministic compaction

2026-05-19 12:55:51 +09:00

test_worker_recap_endpoint.py

feat(worker-pool): Registry-1C cap 1MB + deterministic compaction

2026-05-19 12:55:51 +09:00