Files
hyungi_document_server/app/workers
hyungi 7aaabe2c75 feat(search): split markdown processing for large PDFs (>threshold)
PR-DocSrv-LargeDoc-Split-Markdown-1 commit 3 (plan brisk-paging-quokka.md).

- page_count gauge 분기: 소형(<=120p)=_process_single 통째 1-shot / 대형(>120p)=_process_split
- MAX_PAGES=200 hard skip 제거 → 대형은 BATCH_PAGES=40 page-range 윈도우 순차 변환
- 각 batch /convert start_page/end_page(1-based) 호출 + slug 충돌 회피 batch별 ref rewrite + stitch
- _persist_images_to_nas seq_offset → batch 간 image_key(img_NNN) 연속
- md_status success/partial/failed (전부/일부/전무) + failed batch manifest JSON
- 대형 md_content = head+manifest (LARGE_DOC_MD_CONTENT_HEAD_CHARS=50000), canonical=document_chunks(commit 4)
- MARKER_MAX_SPLIT_PAGES=5000 초과 = skipped_too_large 안전상태

검증: G1 소형회귀 doc6675 동일(success,6292,14)/single경로 / G2 doc5180 453p→12batch success
manifest+207img(img_001~207 연속) / G4 stuck0 restart0 각batch<300s. 섹션 chunk적재(G3)=commit 4.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-24 07:39:49 +00:00
..