hyungi_document_server

Author	SHA1	Message	Date
Hyungi Ahn	f8f72ceae2	fix(ocr): Surya 0.17 API + NFC/NFD path normalize - services/ocr/server.py: rewritten on top of the surya 0.17.x predictors (the old `from surya.ocr import run_ocr` was removed → import error → empty text returned) - Correct the Hangul normalization mismatch between NFC (DB paths) and NFD (NFS filesystem) - Pin surya-ocr to version 0.17.1 (the 0.6~1.0 range exposes breaking changes) - Remove AIClient.ocr() NotImplementedError (0 call sites; extract_worker calls the ocr-service over HTTP directly) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-23 13:52:19 +09:00
Hyungi Ahn	7883ac67b3	feat(ocr): add Surya OCR microservice Add GPU-accelerated OCR (Surya, Apache 2.0) as a separate container. Supports text extraction from scanned PDFs and image files. - services/ocr: Dockerfile + server.py + requirements.txt - /health (liveness) + /ready (readiness, CUDA + model status) - /ocr: page-by-page streaming processing (limits peak memory) - docker-compose: ocr-service + GPU reservation + ocr_models volume Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-15 15:03:55 +09:00
Hyungi Ahn	2a240cb9e9	fix(kordoc): adaptive parse timeout + concurrent parsing limit Change kordoc's 30-second hard timeout to an adaptive one proportional to file size (60~300 seconds). Fixes large PDF/HWP files failing permanently on parse timeout. - getParseTimeoutMs(): 60 seconds per 10 MB, minimum 60 seconds, maximum 300 seconds - Limit concurrent parsing to 2 jobs via a parseJobs Map (prevents zombie jobs from piling up) - Detailed logs: START/DONE/ZOMBIE_DONE/REJECTED + ext/size/elapsed/active - clearTimeout on normal completion to clean up the unneeded timer callback Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-15 14:00:12 +09:00
Hyungi Ahn	2dfb05e653	fix: convert kordoc service to ESM (kordoc requires ESM import) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-02 14:38:34 +09:00
Hyungi Ahn	299fac3904	feat: implement Phase 1 data pipeline and migration - Implement kordoc /parse endpoint (HWP/HWPX/PDF via kordoc lib, text files direct read, images flagged for OCR) - Add queue consumer with APScheduler (1min interval, stage chaining extract→classify→embed, stale item recovery, retry logic) - Add extract worker (kordoc HTTP call + direct text read) - Add classify worker (Qwen3.5 AI classification with think-tag stripping and robust JSON extraction from AI responses) - Add embed worker (GPU server nomic-embed-text, graceful failure) - Add DEVONthink migration script with folder mapping for 16 DBs, dry-run mode, batch commits, and idempotent file_path UNIQUE - Enhance ai/client.py with strip_thinking() and parse_json_response() Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-02 14:35:36 +09:00
Hyungi Ahn	131dbd7b7c	feat: scaffold v2 project structure with Docker, FastAPI, and config v2 scaffolding at the level of minimal working code: - docker-compose.yml: postgres, fastapi, kordoc, frontend, caddy - app/: FastAPI backend (main, core, models, ai, prompts) - services/kordoc/: Node.js document parsing microservice - gpu-server/: AI Gateway + GPU docker-compose - frontend/: SvelteKit basic structure - migrations/: PostgreSQL initial schema (documents, tasks, processing_queue) - tests/: pytest conftest basic setup - Update config.yaml, Caddyfile, credentials.env.example Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-02 10:20:15 +09:00
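
The adaptive timeout described in commit 2a240cb9e9 (60 seconds per 10 MB, clamped to 60 to 300 seconds) can be illustrated with a short sketch. This is not the repository's code: the real getParseTimeoutMs() lives in the Node.js kordoc service, and this Python version only mirrors the stated rule. The rounding (each started 10 MB block counts in full) and the use of 1 MB = 1024 * 1024 bytes are assumptions.

```python
import math

MB = 1024 * 1024          # assumption: binary megabytes
SECONDS_PER_BLOCK = 60    # from the commit message: 60 s per 10 MB
BLOCK_BYTES = 10 * MB
MIN_SECONDS = 60          # from the commit message: lower bound
MAX_SECONDS = 300         # from the commit message: upper bound


def get_parse_timeout_ms(size_bytes: int) -> int:
    """Return a parse timeout in milliseconds, proportional to file size."""
    # Assumption: round up, so every started 10 MB block adds a full 60 s.
    blocks = math.ceil(size_bytes / BLOCK_BYTES)
    seconds = blocks * SECONDS_PER_BLOCK
    # Clamp into the [60 s, 300 s] window named in the commit.
    clamped = max(MIN_SECONDS, min(MAX_SECONDS, seconds))
    return clamped * 1000


if __name__ == "__main__":
    for size_mb in (1, 25, 100):
        print(size_mb, "MB ->", get_parse_timeout_ms(size_mb * MB), "ms")
```

Under these assumptions, files up to 10 MB get the 60-second floor, and anything above 40 MB hits the 300-second ceiling.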

6 Commits
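
The NFC/NFD mismatch mentioned in commit f8f72ceae2 arises because the same Hangul file name can be stored as precomposed syllables (NFC) or as decomposed jamo (NFD), and the two byte sequences do not compare equal. A minimal sketch of one way to bridge the gap, using only the standard-library unicodedata module, is shown below. The helper names candidate_paths and resolve_existing are hypothetical and are not taken from services/ocr/server.py; the actual fix may work differently.

```python
import unicodedata
from pathlib import Path
from typing import List, Optional


def candidate_paths(db_path: str) -> List[str]:
    """Return the NFC and NFD spellings of a path, without duplicates."""
    candidates: List[str] = []
    for form in ("NFC", "NFD"):
        normalized = unicodedata.normalize(form, db_path)
        if normalized not in candidates:
            candidates.append(normalized)
    return candidates


def resolve_existing(db_path: str) -> Optional[str]:
    """Return the first spelling that exists on disk, or None (hypothetical helper)."""
    for candidate in candidate_paths(db_path):
        if Path(candidate).exists():
            return candidate
    return None


if __name__ == "__main__":
    name = "\ud55c\uae00.pdf"  # two precomposed Hangul syllables (NFC)
    for spelling in candidate_paths(name):
        print(len(spelling), [hex(ord(ch)) for ch in spelling])
```

Pure-ASCII paths are unaffected by normalization, so candidate_paths returns a single entry for them and the lookup costs one extra comparison at most.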