e50869cbda
신규 컨테이너 marker-service (port 3300, Marker 1.10.2 + surya 0.17.1 + HF cache volume). marker_worker 가 markdown stage 큐 소비: classify_worker → enqueue 'markdown' (leaf, embed/chunk 와 독립) → SKIP_DOC_TYPES (발주서/세금계산서/명세표) 스킵 → 확장자 != .pdf 스킵 (Phase 1B = PDF only) → page_count > 200 스킵 → marker-service POST /convert → 422/404 = doc-level failed, 5xx = queue retry 안정성 장치: - migration 222: ALTER TYPE process_stage ADD VALUE markdown (단일 statement) - md_extraction_quality JSONB dict 직접 저장 - skip 시 md_content/hash NULL 클리어 - /ready Response.status_code + warmup_error 가시화 - HF cache volume (build-time download 0) - file_path 는 NAS 상대경로 → /documents prefix prepend 성공 기준: 파이프라인 안정성. markdown 품질은 Phase 1D pilot. Pre-flight (2026-05-01): - marker-pdf 1.10.2 stable - file_path 9503건 NAS 상대경로 - DOCUMENT_TYPES 한국어 7종 → SKIP alias 보강 - queue retry max_attempts=3 + reset_stale_items 확인 - main 220/221 study_q_related 선점 → 222 rebump Plan: ~/.claude/plans/plan-idempotent-sundae.md (Round 5 approved) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
23 lines
640 B
Docker
23 lines
640 B
Docker
FROM python:3.12-slim
|
|
|
|
WORKDIR /app
|
|
|
|
RUN apt-get update && apt-get install -y --no-install-recommends \
|
|
libgl1 libglib2.0-0 curl \
|
|
&& apt-get clean && rm -rf /var/lib/apt/lists/*
|
|
|
|
COPY requirements.txt .
|
|
RUN pip install --no-cache-dir \
|
|
--extra-index-url https://download.pytorch.org/whl/cu126 \
|
|
-r requirements.txt
|
|
|
|
# 모델 미다운로드 (HF cache volume → 첫 호출/warmup 시 적재).
|
|
|
|
COPY server.py .
|
|
|
|
EXPOSE 3300
|
|
HEALTHCHECK --start-period=300s --interval=30s --timeout=10s --retries=3 \
|
|
CMD curl -f http://localhost:3300/ready || exit 1
|
|
|
|
CMD ["uvicorn", "server:app", "--host", "0.0.0.0", "--port", "3300"]
|