feat(extraction): MinerU 2.5 VLM extraction service + env-configurable worker endpoint

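Candidate replacement for marker-service (Surya, ~10GB). Uses the MinerU2.5-Pro-2605-1.2B VLM
(vllm-async-engine, fixed at ~5.9GB). It replicates the marker /convert contract (file_path,
start/end, md + base64 images), so the worker switches over by flipping the MARKER_ENDPOINT env
alone. Coexists with the search stack on a single 16GB card; the 40-page window is unchanged.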
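
A minimal sketch of what the env flip could look like on the worker side. The function name, the default URL and the `marker-service` host name are assumptions made for illustration; only the `MARKER_ENDPOINT` variable, the `/convert` path and the ports 3300/3301 come from this commit.

```python
import os

# Assumed default: marker on port 3300 serving /convert (host name is hypothetical).
DEFAULT_MARKER_ENDPOINT = "http://marker-service:3300/convert"


def resolve_marker_endpoint(env=None):
    """Return the extraction endpoint, preferring MARKER_ENDPOINT when it is set."""
    env = os.environ if env is None else env
    value = (env.get("MARKER_ENDPOINT") or "").strip()
    # An unset or blank variable falls back to marker, so default behavior is unchanged.
    return value or DEFAULT_MARKER_ENDPOINT
```

Under these assumptions, setting `MARKER_ENDPOINT=http://mineru-service:3301/convert` would route the worker to MinerU with no code change, and unsetting it would route back to marker.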

- services/mineru: Dockerfile (vllm/vllm-openai:v0.21.0 + mineru[core]) + async server.py
  (NFC/NFD resolver for Korean paths, PyMuPDF page slicing, gpu_memory_utilization cap)
- docker-compose: mineru-service is profile-gated (not started by default, so marker is
  unaffected) + mineru_models volume
- marker_worker: hardcoded MARKER_ENDPOINT → env (defaults to marker, behavior unchanged)
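
The NFC/NFD path resolver mentioned above is not shown in this view. One way such a resolver might work is to try the path as given and then its NFC and NFD spellings. The function name and the `/documents` default root (taken from the compose volume mount) are assumptions, and this sketch normalizes the path as a whole rather than per component.

```python
import unicodedata
from pathlib import Path


def resolve_path(file_path, root="/documents"):
    """Return an existing path for a relative file_path, trying NFC and NFD spellings."""
    candidates = [Path(root) / file_path]
    for form in ("NFC", "NFD"):
        # Hangul file names may be stored composed (NFC) or decomposed (NFD) on disk.
        candidates.append(Path(root) / unicodedata.normalize(form, file_path))
    for candidate in candidates:
        if candidate.exists():
            return candidate
    return None  # no spelling matched
```

Returning `None` leaves it to the caller to decide how a missing file is reported.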

Isolated PoC A/B passed 8/8 gates (Korean / tables / LaTeX formulas / headings / figures / 40-page VRAM).
Cutover (env flip + marker removal) is a separate step, gated on zero reading-view regressions.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
hyungi
2026-06-18 15:58:55 +09:00
parent 5cabf728e6
commit bb929f88d0
4 changed files with 403 additions and 1 deletions
@@ -87,6 +87,43 @@ services:
      start_period: 300s
    restart: unless-stopped

  # MinerU 2.5 VLM PDF→markdown extraction: candidate replacement for marker-service (single-card markdown VRAM ~10→~5GB).
  # Profile-gated: not started by default, so marker is unaffected. To enable: `docker compose --profile mineru up -d mineru-service`.
  # Until cutover (A/B 8 gates PASS), do not add it to fastapi's depends_on (isolation). Port 3301 (marker=3300).
  mineru-service:
    build: ./services/mineru
    profiles: ["mineru"]
    ports:
      - "127.0.0.1:3301:3301"
    expose:
      - "3301"
    environment:
      # vlm-engine = pure VLM, single model. The default hybrid-engine loads multiple models and OOMs, so set this explicitly.
      - MINERU_BACKEND=vlm-engine
      - MINERU_LANG=${MINERU_LANG:-korean}
      # Coexistence on the shared 16GB card: absolute VRAM cap (GB, robust on a shared card) combined with the vLLM fraction cap.
      - MINERU_VIRTUAL_VRAM_SIZE=${MINERU_VIRTUAL_VRAM_SIZE:-6}
      - MINERU_GPU_MEMORY_UTILIZATION=${MINERU_GPU_MEMORY_UTILIZATION:-0.40}
      - MINERU_PRELOAD=${MINERU_PRELOAD:-1}
    volumes:
      - ${NAS_NFS_PATH:-/mnt/nas/Document_Server}:/documents:ro
      - mineru_models:/root/.cache
    ipc: host  # vLLM shared memory; mirrors --ipc=host in the official run command.
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1
              capabilities: [gpu]
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:3301/ready"]
      interval: 30s
      timeout: 10s
      retries: 3
      start_period: 900s  # Headroom for the lazy VLM model download (~2.4GB) + engine load.
    restart: unless-stopped

  stt-service:
    # 2026-05-08 (D9 Track B revised): GPU is canonical STT owner.
    # Policy: the Mac mini is reserved primarily for Gemma 26B, so STT/Whisper is owned by the GPU server regardless of call volume.
@@ -271,3 +308,4 @@ volumes:
  ocr_models:
  stt_models:
  marker_models:
  mineru_models:
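
The two VRAM caps set in the compose environment above could be read and validated at startup roughly as follows. This is a sketch under stated assumptions: how server.py actually consumes these variables is not shown in this diff, and the function name is hypothetical. The defaults mirror the compose file (6 GB absolute cap, 0.40 fraction cap).

```python
import os


def load_vram_caps(env=None):
    """Parse the absolute (GB) and fractional VRAM caps, falling back to the compose defaults."""
    env = os.environ if env is None else env
    vram_gb = float(env.get("MINERU_VIRTUAL_VRAM_SIZE", "6"))
    fraction = float(env.get("MINERU_GPU_MEMORY_UTILIZATION", "0.40"))
    if vram_gb <= 0:
        raise ValueError("MINERU_VIRTUAL_VRAM_SIZE must be positive")
    if not 0.0 < fraction <= 1.0:
        # A fraction outside (0, 1] cannot describe a share of one card.
        raise ValueError("MINERU_GPU_MEMORY_UTILIZATION must be in (0, 1]")
    return vram_gb, fraction
```

On a 16GB card the default fraction works out to 0.40 × 16 = 6.4GB, which is in line with the 6GB absolute cap.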