feat(canonical): Phase 1B marker-service + marker_worker for PDF→markdown (222)
신규 컨테이너 marker-service (port 3300, Marker 1.10.2 + surya 0.17.1 + HF cache volume). marker_worker 가 markdown stage 큐 소비: classify_worker → enqueue 'markdown' (leaf, embed/chunk 와 독립) → SKIP_DOC_TYPES (발주서/세금계산서/명세표) 스킵 → 확장자 != .pdf 스킵 (Phase 1B = PDF only) → page_count > 200 스킵 → marker-service POST /convert → 422/404 = doc-level failed, 5xx = queue retry 안정성 장치: - migration 222: ALTER TYPE process_stage ADD VALUE markdown (단일 statement) - md_extraction_quality JSONB dict 직접 저장 - skip 시 md_content/hash NULL 클리어 - /ready Response.status_code + warmup_error 가시화 - HF cache volume (build-time download 0) - file_path 는 NAS 상대경로 → /documents prefix prepend 성공 기준: 파이프라인 안정성. markdown 품질은 Phase 1D pilot. Pre-flight (2026-05-01): - marker-pdf 1.10.2 stable - file_path 9503건 NAS 상대경로 - DOCUMENT_TYPES 한국어 7종 → SKIP alias 보강 - queue retry max_attempts=3 + reset_stale_items 확인 - main 220/221 study_q_related 선점 → 222 rebump Plan: ~/.claude/plans/plan-idempotent-sundae.md (Round 5 approved) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This commit is contained in:
@@ -0,0 +1,8 @@
|
||||
-- 222_processing_queue_stage_markdown.sql
|
||||
-- Phase 1B: process_stage enum 에 'markdown' 추가.
|
||||
-- plan: ~/.claude/plans/plan-idempotent-sundae.md (Phase 1B Round 5)
|
||||
--
|
||||
-- marker_worker 가 소비할 stage. classify_worker 완료 시 enqueue.
|
||||
-- 단일 statement (asyncpg exec_driver_sql 제약, feedback_migration_runner_single_statement).
|
||||
|
||||
ALTER TYPE process_stage ADD VALUE IF NOT EXISTS 'markdown';
|
||||
Reference in New Issue
Block a user