feat(ocr): add Surya OCR microservice

Adds GPU-accelerated OCR (Surya, Apache 2.0) as a separate container.
Enables text extraction from scanned PDFs and image files.

- services/ocr: Dockerfile + server.py + requirements.txt
- /health (liveness) + /ready (readiness: CUDA and model status)
- /ocr: page-by-page streaming processing (keeps peak memory down)
- docker-compose: ocr-service + GPU reservation + ocr_models volume

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
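The liveness/readiness split and page-by-page processing listed above can be sketched roughly as follows. This is a hypothetical illustration, not the commit's server.py (which is not shown): `stream_pages`, `readiness`, the `recognize` callback standing in for the Surya model call, and the JSON-lines output format are all assumptions.

```python
# Hypothetical sketch, not the commit's server.py: page-by-page OCR output
# and a /ready-style readiness decision, with Surya replaced by a callback.
import json
from typing import Callable, Dict, Iterable, Iterator, Tuple


def stream_pages(
    pages: Iterable[bytes],
    recognize: Callable[[bytes], str],
) -> Iterator[str]:
    """Yield one JSON line per page as soon as that page is recognized.

    If `pages` is lazy (one rasterized page at a time), only a page or so
    is referenced at once, so peak memory does not grow with document size.
    """
    for index, page in enumerate(pages):
        text = recognize(page)  # stand-in for the real OCR model call
        yield json.dumps({"page": index, "text": text}, ensure_ascii=False) + "\n"


def readiness(cuda_available: bool, model_loaded: bool) -> Tuple[int, Dict[str, bool]]:
    """Return (HTTP status, JSON body) for a readiness probe.

    Liveness only means "the process answers"; readiness also requires CUDA
    and loaded models, and reports 503 until both hold.
    """
    ready = cuda_available and model_loaded
    body = {"cuda": cuda_available, "model_loaded": model_loaded, "ready": ready}
    return (200 if ready else 503), body
```

Emitting one JSON line per page lets a caller start consuming results before the whole document is finished. In a web framework the generator would typically be handed to a streaming response type (for example FastAPI's `StreamingResponse`), though which framework server.py uses is not visible in this commit.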
Hyungi Ahn
2026-04-15 15:03:55 +09:00
parent 083aa3126a
commit 7883ac67b3
4 changed files with 156 additions and 0 deletions
@@ -32,6 +32,28 @@ services:
       retries: 3
     restart: unless-stopped
 
+  ocr-service:
+    build: ./services/ocr
+    expose:
+      - "3200"
+    volumes:
+      - ${NAS_NFS_PATH:-/mnt/nas/Document_Server}:/documents:ro
+      - ocr_models:/root/.cache
+    deploy:
+      resources:
+        reservations:
+          devices:
+            - driver: nvidia
+              count: 1
+              capabilities: [gpu]
+    healthcheck:
+      test: ["CMD", "python", "-c", "import urllib.request; urllib.request.urlopen('http://localhost:3200/health')"]
+      interval: 30s
+      timeout: 10s
+      retries: 3
+      start_period: 180s
+    restart: unless-stopped
+
   ollama:
     image: ollama/ollama
     volumes:
@@ -102,6 +124,7 @@ services:
     environment:
       - DATABASE_URL=postgresql+asyncpg://pkm:${POSTGRES_PASSWORD}@postgres:5432/pkm
       - KORDOC_ENDPOINT=http://kordoc-service:3100
+      - OCR_ENDPOINT=http://ocr-service:3200
     restart: unless-stopped
 
   frontend:
@@ -129,3 +152,4 @@ volumes:
   caddy_data:
   ollama_data:
   reranker_cache:
+  ocr_models:
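The compose healthcheck runs a one-line Python probe against /health. The same logic written out as functions is sketched below; `probe` and `ocr_base_url` are hypothetical helpers, not part of this commit. The probe relies on `urllib.request.urlopen` raising for non-2xx responses and for connection failures, which is why the one-liner works as a Docker healthcheck: the exception gives a non-zero exit, and Docker counts that attempt as failed.

```python
# Hypothetical helpers mirroring the compose healthcheck; not part of the commit.
import os
import urllib.request


def probe(url: str, timeout: float = 10.0) -> bool:
    """True if GET `url` returns 2xx within `timeout` seconds."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 300
    except OSError:
        # URLError (connection refused, DNS failure) and its subclass
        # HTTPError (4xx/5xx) both derive from OSError, as do socket timeouts.
        return False


def ocr_base_url() -> str:
    """Base URL of the OCR service, defaulting to the value set in compose."""
    return os.environ.get("OCR_ENDPOINT", "http://ocr-service:3200").rstrip("/")
```

Because the healthcheck targets /health rather than /ready, Docker presumably reports the container healthy as soon as the process answers, even if models are still loading; failures during the 180 s start_period are not counted against `retries`. Pointing the test at /ready would tie `healthy` to CUDA and model state, which matters if another service waits on it with `depends_on: condition: service_healthy`. From the service that receives OCR_ENDPOINT, a call such as `probe(ocr_base_url() + "/ready")` would check readiness before sending work.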