1 Commits

Author SHA1 Message Date
hyungi 48f8bf6ca6 docs(eval): Phase 2 canary sample — 40 docs (seed 20260503)
Bucket distribution (algorithm vs allocated):
- large (>10MB): 6 / 6
- scan_likely (text_density<5): 2 / 2
- study_note born-digital: 10 / 10
- Academic_Paper born-digital: 2 / 8 (under-fill — only 20 born-digital docs total in pool)
- Reference born-digital: 0 / 6 (under-fill — 동상)
- tech_doc (Standard/Manual/Specification): 4 / 4
- minor_doc (Note/Report/Memo/NULL): 4 / 4
- filler (rest from candidates): 12 (picked up under-fill slack)

Note: 1D 의 born-digital bias 가정이 Phase 2 실 모집단과 안 맞음
(text_density 분포가 mixed-dominant: 174/237). 그래도 40 docs 가 large /
scan-likely / 다양 doctype 커버 — canary 진단 목적 충족.

Next: 사용자 승인 게이트 — --no-dry-run enqueue 시점 결정.
2026-05-10 05:47:20 +00:00