48f8bf6ca6
Bucket distribution (picked by algorithm / allocated quota):
- large (>10MB): 6 / 6
- scan_likely (text_density<5): 2 / 2
- study_note born-digital: 10 / 10
- Academic_Paper born-digital: 2 / 8 (under-fill: only 20 born-digital docs in the whole pool)
- Reference born-digital: 0 / 6 (under-fill, same reason)
- tech_doc (Standard/Manual/Specification): 4 / 4
- minor_doc (Note/Report/Memo/NULL): 4 / 4
- filler (rest from candidates): 12 (absorbed the under-fill slack: (8 - 2) + (6 - 0) = 12)

Note: the born-digital bias assumed in 1D does not match the actual Phase 2 population (the text_density distribution is mixed-dominant: 174/237). Even so, the 40 docs cover large, scan-likely, and varied doctypes, which meets the canary's diagnostic goal.

Next: user approval gate, to decide when to run the enqueue with --no-dry-run.
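The bucket scheme reads like a greedy quota fill. Buckets are filled in priority order, a document leaves the pool once it is taken, and any quota a bucket cannot fill is handed to filler. The attached rows fit that reading. For example, doc 5116 is band L but landed in scan_likely, presumably because large was already full.

The sketch below is a hypothetical reconstruction, not the pipeline's code. Several pieces are assumptions:
- `Doc`, `BUCKETS` and `allocate`.
- The first-come pick order within a bucket.
- The 50.0 born-digital cutoff.
- The 10 MiB size cutoff, which is inferred from the attached rows.

```python
from dataclasses import dataclass
from typing import Callable, Optional

# Hypothetical document record; field names mirror the CSV columns.
@dataclass
class Doc:
    doc_id: int
    file_size: int            # bytes
    text_density: float
    doc_type: Optional[str]   # None stands in for NULL

# Assumed cutoffs (not taken from the pipeline).
LARGE_BYTES = 10 * 1024 * 1024   # 10 MiB, inferred from the attached rows
SCAN_LIKELY_MAX = 5.0            # from "scan_likely (text_density<5)"
BORN_DIGITAL_MIN = 50.0          # hypothetical; the real cutoff is not stated

def born_digital(d: Doc) -> bool:
    return d.text_density >= BORN_DIGITAL_MIN

# Buckets in priority order: (label, quota, predicate). Quotas sum to 40.
BUCKETS: list[tuple[str, int, Callable[[Doc], bool]]] = [
    ("large", 6, lambda d: d.file_size > LARGE_BYTES),
    ("scan_likely", 2, lambda d: d.text_density < SCAN_LIKELY_MAX),
    ("study_note", 10, lambda d: born_digital(d) and d.doc_type == "study_note"),
    ("Academic_Paper", 8, lambda d: born_digital(d) and d.doc_type == "Academic_Paper"),
    ("Reference", 6, lambda d: born_digital(d) and d.doc_type == "Reference"),
    ("tech_doc", 4, lambda d: d.doc_type in {"Standard", "Manual", "Specification"}),
    ("minor_doc", 4, lambda d: d.doc_type in {"Note", "Report", "Memo", None}),
]

def allocate(candidates: list[Doc], total: int = 40) -> dict[int, str]:
    """Greedy quota fill in bucket order; unfilled quota is handed to 'filler'."""
    labels: dict[int, str] = {}
    remaining = list(candidates)
    for label, quota, pred in BUCKETS:
        # Take up to `quota` matching docs in candidate order (assumed pick order).
        picked = [d for d in remaining if pred(d)][:quota]
        for d in picked:
            labels[d.doc_id] = label
        picked_ids = {d.doc_id for d in picked}
        # A doc taken by an earlier bucket is not offered to later buckets.
        remaining = [d for d in remaining if d.doc_id not in picked_ids]
    # Filler tops the sample up to `total` from whatever is left.
    for d in remaining[: max(0, total - len(labels))]:
        labels[d.doc_id] = "filler"
    return labels
```

Because the quotas sum to the 40-doc target, filler's share in this sketch always equals the total under-fill across the other buckets.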
Attachment (CSV, 42 lines, 6.2 KiB):
doc_id,title,file_size,file_size_band,text_density,text_density_band,doc_type,bucket_label
3817,"Safety and Health for Engineers_05_37 Safety and Health Data, Analysis and Management Information",11313981,L,11.264,mixed,study_note,large
4059,공업역학 동역학(제13판)_Chapter 12 질점의 운동학,11140007,L,8.78,mixed,Academic_Paper,large
5264,황현필의 진보를 위한 역사_2장 식민지 수탈과 학살의 진실을 말하라,14154100,L,2.14,scan-likely,Academic_Paper,large
5271,황현필의 진보를 위한 역사_9장 박정희 신격화를 중단하라,16266788,L,1.878,scan-likely,Academic_Paper,large
5267,황현필의 진보를 위한 역사_5장 해방정국을 감추지 말라,12320415,L,1.617,scan-likely,Note,large
5090,일반기계기사_실전 모의고사,24976506,L,15.75,mixed,Academic_Paper,large
8857,"TKP-26-0132_신양철강_74,290원 - 복사본",450026,S,4.407,scan-likely,Specification,scan_likely
5116,표준기계설계(KS)_04_나사,21167224,L,2.759,scan-likely,Standard,scan_likely
5238,근로자 노출평가제도 내 위험성평가 방법론의 적용,393908,S,89.488,born-digital,study_note,study_note
5240,사업장 위험성평가에 관한 법제의 비교법적 고찰,437377,S,83.786,born-digital,study_note,study_note
3771,Industrial Safety and Health Management(7-ED)_16 Welding,1239353,M,62.251,born-digital,study_note,study_note
3762,Industrial Safety and Health Management(7-ED)_7 Buildings and Facilities,1123847,M,60.053,born-digital,study_note,study_note
3768,Industrial Safety and Health Management(7-ED)_13 Fire Protection,725779,S,62.01,born-digital,study_note,study_note
3764,Industrial Safety and Health Management(7-ED)_9 Health and Toxic Substances,1409470,M,66.395,born-digital,study_note,study_note
3766,Industrial Safety and Health Management(7-ED)_11 Flammable and Explosive Materials,1185190,M,54.637,born-digital,study_note,study_note
3756,Industrial Safety and Health Management(7-ED)_1 The Safety and Health Manager,550267,S,76.971,born-digital,study_note,study_note
3761,Industrial Safety and Health Management(7-ED)_6 Process safety and Disaster Preparedness,773058,S,65.656,born-digital,study_note,study_note
3767,Industrial Safety and Health Management(7-ED)_12 Personal Protection and First Aid,1553116,M,65.161,born-digital,study_note,study_note
5156,스마트팩토리 구축 제조기업 조직원이 지각한 상사의 변혁적리더십이 셀프리더,449251,S,94.531,born-digital,Academic_Paper,Academic_Paper
5257,Modifying Large Language Model Post-Training for Diverse Creative Writing,885908,S,106.149,born-digital,Academic_Paper,Academic_Paper
5142,표준기계설계(KS)_10_키 및 스플라인,17185683,L,2.06,scan-likely,Standard,tech_doc
4546,機械設計_7 不等速運動機構,1205358,M,16.38,mixed,Manual,tech_doc
5205,2_ASME VIII_Impact Test,364827,S,8.791,mixed,Standard,tech_doc
5112,표준기계설계(KS)_03_기하 공차의 도시방법,7742981,M,1.377,scan-likely,Standard,tech_doc
5085,일반기계기사_실전 모의고사 2회,2357404,M,12.318,mixed,Note,minor_doc
5201,A Endorsement Exam,705076,S,0.022,scan-likely,Memo,minor_doc
5086,일반기계기사_실전 모의고사 3회,1643798,M,17.111,mixed,Note,minor_doc
5199,2012_NB_A_March_homewrk_quiz,1937838,M,0.025,scan-likely,Memo,minor_doc
3780,"Safety and Health for Engineers_02_0 Legal, Aspects of Safety and Health",5089329,M,0.415,scan-likely,study_note,filler
5160,"스마트팩토리, 제조기업 분야에 새로운 경쟁우위 확보를 위한 견인차 역할을",1261747,M,29.268,mixed,Academic_Paper,filler
5174,#8. 검사 및 시험 계획_테크니컬 코리아 R.A_현대로템 통합,156781,S,22.043,mixed,Note,filler
3819,Safety and Health for Engineers_06_Alphabetical List of Organizations Cited in Text,9678536,M,13.504,mixed,study_note,filler
3784,Safety and Health for Engineers_02_7 Products Liability,6618025,M,10.892,mixed,study_note,filler
5056,기계진동 이론과 응용(제5판)_목차,835993,S,18.099,mixed,Reference,filler
3810,Safety and Health for Engineers_04_31 Human Behavior and Performance in safety and Health,8384987,M,10.362,mixed,study_note,filler
4547,機械設計_8 アクチュエータ,1242829,M,19.333,mixed,Reference,filler
5093,재료역학_부록,3178000,M,28.455,mixed,Academic_Paper,filler
5265,황현필의 진보를 위한 역사_3장 독립운동을 상처 내지 말라,10398784,M,2.302,scan-likely,Note,filler
5161,스마트팩토리를 위한 운영빅데이터 분석 플랫폼,920193,S,22.36,mixed,Academic_Paper,filler
5130,07_Mechanical Behavior of Materials_Yielding and Fracture under Combined Stresses,3953088,M,32.835,mixed,Academic_Paper,filler
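A quick way to re-check the attachment against the bucket distribution is to parse the CSV, recompute both bands, and tally bucket_label. This is a hypothetical helper, not part of the pipeline. `size_band`, `density_band`, `check_rows` and all four cutoffs are assumptions inferred from the rows themselves.

Two of the inferences deserve a note:
- Doc 5265 (10,398,784 bytes) is banded M. That suggests the L cutoff is 10 MiB (10,485,760 bytes) rather than a decimal 10,000,000 bytes.
- The born-digital cutoff is only known to lie between 32.835 and 54.637.

```python
import csv
import io
from collections import Counter

# Assumed cutoffs, inferred from the attached rows rather than taken from the pipeline.
L_MIN_BYTES = 10 * 1024 * 1024   # 10 MiB: doc 5265 (10,398,784 B) is still band M
S_MAX_BYTES = 1024 * 1024        # 1 MiB: S rows end at 920,193 B, M rows start at 1,123,847 B
SCAN_LIKELY_MAX = 5.0            # from "scan_likely (text_density<5)"
BORN_DIGITAL_MIN = 50.0          # mixed rows end at 32.835, born-digital rows start at 54.637

def size_band(file_size: int) -> str:
    """Map a byte count to the S / M / L file_size_band."""
    if file_size > L_MIN_BYTES:
        return "L"
    return "M" if file_size > S_MAX_BYTES else "S"

def density_band(text_density: float) -> str:
    """Map text_density to scan-likely / mixed / born-digital."""
    if text_density < SCAN_LIKELY_MAX:
        return "scan-likely"
    return "born-digital" if text_density >= BORN_DIGITAL_MIN else "mixed"

def check_rows(csv_text: str) -> tuple[Counter, list[str]]:
    """Tally bucket_label and collect doc_ids whose recorded bands disagree with the cutoffs."""
    buckets: Counter = Counter()
    mismatches: list[str] = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        buckets[row["bucket_label"]] += 1
        if (size_band(int(row["file_size"])) != row["file_size_band"]
                or density_band(float(row["text_density"])) != row["text_density_band"]):
            mismatches.append(row["doc_id"])
    return buckets, mismatches

# A few rows copied verbatim from the attachment (one title is quoted because it contains a comma).
SAMPLE = """doc_id,title,file_size,file_size_band,text_density,text_density_band,doc_type,bucket_label
5265,황현필의 진보를 위한 역사_3장 독립운동을 상처 내지 말라,10398784,M,2.302,scan-likely,Note,filler
5116,표준기계설계(KS)_04_나사,21167224,L,2.759,scan-likely,Standard,scan_likely
3780,"Safety and Health for Engineers_02_0 Legal, Aspects of Safety and Health",5089329,M,0.415,scan-likely,study_note,filler
5238,근로자 노출평가제도 내 위험성평가 방법론의 적용,393908,S,89.488,born-digital,study_note,study_note
5130,07_Mechanical Behavior of Materials_Yielding and Fracture under Combined Stresses,3953088,M,32.835,mixed,Academic_Paper,filler
"""

buckets, mismatches = check_rows(SAMPLE)
print(dict(buckets), mismatches)  # {'filler': 3, 'scan_likely': 1, 'study_note': 1} []
```

With these assumed cutoffs, every one of the 40 attached rows reproduces its recorded file_size_band and text_density_band. The bucket_label tally also comes out to the 6 / 2 / 10 / 2 / 4 / 4 / 12 split listed in the summary.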