hyungi_document_server

Author	SHA1	Message	Date
hyungi	5bf9ff9dc2	docs(eval): Phase 2 canary result: HALT (failed 4/40 = 10%, but 0 failures attributable to Marker by classification). 35 success / 3 failed / 1 skipped / 1 stuck processing (corner case). Plan gate FAIL (success<36 + failed>2). Failure root-cause analysis, however: - 2/4 = GPU contention (5.93+5.35 GiB held by other processes, 50 MiB free) - 1/4 = genuinely corrupt PDF (Pdfium error, non-retryable) - 1/4 = scan-likely + tiny text + ReadTimeout (Phase 1B corner case). Failures in Marker quality itself = 0. p50 elapsed 33.2s (on par with 1D's 34s), text_length_ratio p50 1.00 (-13% vs. 1D's 1.15, within the normal range), no new warnings. User decision: A (accept) / B (add a code guard) / C (re-enqueue the 2 OOM docs immediately → passes GO) / D (keep HALT). Recommended: C or A. 5201 (stuck processing) needs manual DB cleanup under any option (after user approval).	2026-05-10 05:47:20 +00:00
hyungi	f61dce262e	docs(eval): Phase 2 path policy correction: 2-B /app/logs vs 2-C /app/scripts canonical. The Plan/README assumed /app/scripts as the single unified path, but in practice it is a read-only bind-mount, so docker cp does not work. The soft lock also forbids --build. Each stage must therefore use a different path: - 2-B canary (pre-merge): /app/logs/phase2_backfill.py + /app/logs/.csv (docker cp worktree → /app/logs rw bind-mount). This keeps unverified code out of main during canary validation. - 2-C nightly (post-merge canonical): /app/scripts/phase2_backfill.py + /app/evals/markdown/phase2_ (the bind-mount becomes active automatically once feat/phase2-backfill is merged into main and git pull is run in the parent). cron also uses the canonical path. Updated the enqueue example in evals/markdown/README.md and added a new #### path policy section.	2026-05-10 05:47:20 +00:00
hyungi	48f8bf6ca6	docs(eval): Phase 2 canary sample: 40 docs (seed 20260503). Bucket distribution (algorithm vs allocated): - large (>10MB): 6 / 6 - scan_likely (text_density<5): 2 / 2 - study_note born-digital: 10 / 10 - Academic_Paper born-digital: 2 / 8 (under-fill: only 20 born-digital docs total in pool) - Reference born-digital: 0 / 6 (under-fill: same reason) - tech_doc (Standard/Manual/Specification): 4 / 4 - minor_doc (Note/Report/Memo/NULL): 4 / 4 - filler (rest from candidates): 12 (picked up under-fill slack). Note: 1D's born-digital bias assumption does not match the actual Phase 2 population (the text_density distribution is mixed-dominant: 174/237). Even so, the 40 docs cover large / scan-likely / varied doctypes, which is enough for the canary's diagnostic purpose. Next: user approval gate to decide when to run the --no-dry-run enqueue.	2026-05-10 05:47:20 +00:00
hyungi	ac58c8262c	docs(eval): Phase 2 inventory dry-run: 237 pending PDFs, 227 convert candidates - forecast_skip_reason distribution: - none: 227 (convert candidates) - over_max_pages_estimated: 10 (file_size > 25MB proxy) - handwritten_hint: 0 (1D-A1 skip already in marker_worker) - doctype_skip: 0 - file_size_band: S=47 / M=160 / L=30 - text_density_band: mixed=174 / scan-likely=43 / born-digital=20 - doc_type top: study_note 79 / Academic_Paper 57 / Reference 35 / Standard 24 / Manual 19 - seed baseline for select-canary (next step)	2026-05-10 05:47:20 +00:00
hyungi	25ee10ac34	feat(scripts): Phase 2 markdown backfill — script + README - scripts/phase2_backfill.py: 5 subcommands - inventory: pending PDFs dry-run CSV with skip forecast - select-canary: stratified 40 sample (seed 20260503) - enqueue: one-shot from sample CSV (--no-dry-run gate) - nightly-enqueue: cron-friendly with disable flag / marker /ready / active-queue threshold (oldest_age stuck guard) / DB pool guards - post-report: final state CSV + 1D baseline comparison MD - evals/markdown/README.md: Phase 2 section appended - plan: ~/.claude/plans/iridescent-gathering-clover.md - depends on Phase 1B handwritten skip `7d0fca2` (marker_worker side guard)	2026-05-10 05:47:20 +00:00
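Hyungi Ahn	e4fe18b7a8	docs(eval): record 1D pilot informal evaluation results. User quality assessment: "For the documents handwritten with Apple Pencil, on top of my handwriting issue, it fails to extract good material; everything else seems to work fine." Classification: overall_pass=true, 24 docs: ordinary PDFs (born-digital, plus scan-like cases that convert normally, such as 5127). overall_pass=false, 4 docs: Apple Pencil handwriting (4798/4813/4815 controlled_backfill + 4809 anchor). overall_pass=empty, 2 docs: intentionally skipped because page_count > MAX_PAGES=200 (5178 ASME 272p, 5180 ASME Sec I 453p). Scores for the formal 5-axis rubric (text_accuracy/structure/noise_rate/multi_script/completeness) are left blank: the user's informal verdict already makes the decision-matrix branch clear (only handwriting fails → extend the SKIP rule), so formal scoring would be over-investment. In a later round (Marker tuning or adoption of an alternative OCR), filling in the formal rubric when re-evaluating the same 30 docs would be worthwhile. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-03 08:15:33 +09:00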
Hyungi Ahn	b09687d41d	feat(scripts): Phase 1D Round 2: controlled backfill stratification. Augments the existing phase1d_pilot.py (simple ai_domain × file_size 3-bucket) with the 4-axis stratification, sample_source split, and forced_include from plan ~/.claude/plans/stratified-mingling-otter.md. Limitation of Round 1 (ai_domain × file_size 3-bucket): it reflects only the natural distribution of pending PDFs, so known weak spots (handwriting / scans / mixed Korean-Chinese-Japanese OCR) never enter the sample. In the 1C visual check, doc 4809 (Note_240805_용접교육 필기, handwritten welding-training notes) actually showed that pattern; if selection is left to the natural distribution, the same case risks being missed again in the next round. Round 2 design: - 4-axis stratification: doc_type × file_size_band × text_density_band × handwritten_hint - sample_source ∈ {existing_success(5), controlled_backfill(25)} - forced_include doc 4809: known bad anchor. Allows a 1:1 comparison against the re-converted result of the same document after the next tuning pass or alternative tool. - text_density = LENGTH(extracted_text) / (file_size / 1024) chars/KB, the cleanest single proxy. Validated at both extremes: 0.17 (handwritten 4809) ↔ 94 (born-digital 3759). - script_mix proxy: Hangul/CJK/Hiragana/Katakana/Latin Unicode block ratio → korean_dominant / mixed_korean_cjk / mixed_korean_latin / cjk_dominant / latin_dominant / unknown. - page_count_estimate: existing_success uses md_extraction_quality.metrics.source_page_count. controlled_backfill is NULL (marker re-reads the file with PyMuPDF anyway). - Seed fixed at SAMPLE_SEED=20260502 for reproducibility. Sample distribution (measured 2026-05-02): bucket_label: born_digital=12, mixed=5, existing_calibration=4, handwritten=3, scan_likely=3, large=2, existing_anchor=1 doc_type: Academic_Paper=7, study_note=6, Standard=5, Note=4, Reference=3, Manual=3, Drawing=1, Report=1 file_size_band: M=14, S=12, L=4 text_density_band: born-digital=15, scan-likely=9, mixed=6 handwritten_hint: lo=26, hi=4 (13x over-sample relative to 1.1% in the population) forced anchor doc 4809 = density 0.17 (the document from the user's visual check). New subcommand: eval_template, which generates the pilot_1d_eval.csv skeleton (5 rubric axes scored 1~5 + overall_pass + notes). The user fills in scores while toggling between MarkdownDoc and the PDF. The existing cmd_enqueue (snapshot/backup/dedup) and cmd_report (quality metrics) are kept. Deliverables: scripts/phase1d_pilot.py: 4 axes + sample_source + forced_include + eval_template subcommand. CSV+JSON dual output. evals/markdown/README.md: rubric + decision matrix + workflow guide. evals/markdown/pilot_1d_sample.csv: 30 rows × 15 cols (seeded result, kept for reproducibility). evals/markdown/pilot_1d_eval.csv: empty skeleton (to be filled in after user evaluation). Execution boundary: Step 1~3 (selection / template / dry-run) = completed by this PR. Step 4 (--yes enqueue, which actually puts the 30 docs into the markdown queue) = run separately, after the user approves the timing, inside the one-off nightly sweep window (23:00~03:00 KST). marker-service BATCH_SIZE=1; 30 docs at an average of 5 min/doc ≈ 2.5h. Verify: ran select in the fastapi container on the GPU server → 30-doc sample CSV generated. Confirmed the eval_template subcommand works. Confirmed that the enqueue dry-run prints 30 doc_ids + snapshot and then takes the user-cancel branch. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-02 16:15:09 +09:00
Hyungi Ahn	c82d52e73f	feat(eval): restore E.6 runner + eval set to main (from feat/eval-infra). Selective checkout (not cherry-pick): - scripts/run_eval_ask.py (RESULT_FIELDS fixed at 21, X-Source:eval header) - evals/ask_analyze_v1.jsonl (300 cases = ask 220 + analyze 80). Entry point for E.3/E.6 measurements. The original on feat/eval-infra is kept. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-17 09:10:18 +09:00
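
Commit b09687d41d defines text_density as LENGTH(extracted_text) / (file_size / 1024) in chars/KB, and commit 48f8bf6ca6 labels text_density<5 as scan_likely. The sketch below illustrates that proxy in Python. It is not the code in scripts/phase1d_pilot.py: the function names are hypothetical, and the born-digital cutoff of 50 is an assumption chosen only so that the two documented extremes (0.17 and 94) land in different bands. The log does not state the real cutoff.

```python
def text_density(extracted_text: str, file_size_bytes: int) -> float:
    """Characters of extracted text per KiB of file size (formula from the log)."""
    if file_size_bytes <= 0:
        raise ValueError("file_size_bytes must be positive")
    return len(extracted_text) / (file_size_bytes / 1024)


def density_band(density: float,
                 scan_max: float = 5.0,            # "text_density<5" appears in the log
                 born_digital_min: float = 50.0    # ASSUMPTION: not stated in the log
                 ) -> str:
    """Map a density value to one of the three band labels used in the log."""
    if density < scan_max:
        return "scan-likely"
    if density >= born_digital_min:
        return "born-digital"
    return "mixed"


if __name__ == "__main__":
    # The two extremes quoted in the commit message.
    print(density_band(0.17))  # handwritten doc 4809
    print(density_band(94.0))  # born-digital doc 3759
```

Taking both cutoffs as parameters means the bands can be re-tuned without touching the density formula itself.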

8 Commits
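
Commit 5bf9ff9dc2 reports the plan gate as FAIL on "success<36 + failed>2". A minimal sketch of such a gate follows, assuming the gate passes only when both conditions hold (success >= 36 and failed <= 2). The log does not say whether the two thresholds combine with AND or OR, and the function name and status labels here are hypothetical.

```python
from collections import Counter
from typing import Iterable


def canary_gate(statuses: Iterable[str],
                min_success: int = 36,
                max_failed: int = 2) -> tuple[str, dict[str, int]]:
    """Return ("GO" | "HALT", per-status counts) for a canary run.

    ASSUMPTION: the gate passes only if success >= min_success
    and failed <= max_failed.
    """
    counts = Counter(statuses)
    passed = counts["success"] >= min_success and counts["failed"] <= max_failed
    return ("GO" if passed else "HALT"), dict(counts)


if __name__ == "__main__":
    # Counts reported in the commit: 35 success / 3 failed / 1 skipped / 1 stuck.
    run = ["success"] * 35 + ["failed"] * 3 + ["skipped"] + ["processing"]
    verdict, counts = canary_gate(run)
    print(verdict, counts)
```

Under the same assumption, option C in that commit (re-enqueueing the two OOM documents) would move the counts to 37 success and 1 failed if both retries succeeded, which clears both thresholds.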