hyungi_document_server/services/kordoc/package.json at 299fac390458df0fbe552514fa13e0fda4c62a79

Hyungi Ahn 299fac3904 feat: implement Phase 1 data pipeline and migration

- Implement kordoc /parse endpoint (HWP/HWPX/PDF via kordoc lib,
  text files direct read, images flagged for OCR)
- Add queue consumer with APScheduler (1min interval, stage chaining
  extract→classify→embed, stale item recovery, retry logic)
- Add extract worker (kordoc HTTP call + direct text read)
- Add classify worker (Qwen3.5 AI classification with think-tag
  stripping and robust JSON extraction from AI responses)
- Add embed worker (GPU server nomic-embed-text, graceful failure)
- Add DEVONthink migration script with folder mapping for 16 DBs,
  dry-run mode, batch commits, and idempotent file_path UNIQUE
- Enhance ai/client.py with strip_thinking() and parse_json_response()

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

2026-04-02 14:35:36 +09:00

15 lines

297 B

JSON

 {
   "name": "kordoc-service",
   "version": "1.0.0",
   "description": "HWP/HWPX/PDF document parsing microservice",
   "main": "server.js",
   "scripts": {
     "start": "node server.js"
   },
   "dependencies": {
     "express": "^4.18.0",
     "kordoc": "^1.7.0",
     "pdfjs-dist": "^4.0.0"
   }
 }
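
The commit message says the `/parse` endpoint sends HWP/HWPX/PDF files through the kordoc library, reads text files directly, and flags images for OCR. The sketch below shows one way that dispatch decision could look as a plain function. It is illustrative only: `routeForFile`, the handler labels, and the text and image extension lists are assumptions, not taken from the actual `server.js`, and no kordoc or Express API is called.

```javascript
// Hypothetical sketch of the /parse dispatch decision; not the real server.js.
const path = require("node:path");

// HWP/HWPX/PDF come from the commit message. The text and image
// extension lists below are assumptions chosen for illustration.
const KORDOC_EXTS = new Set([".hwp", ".hwpx", ".pdf"]);
const TEXT_EXTS = new Set([".txt", ".md", ".csv", ".json"]);
const IMAGE_EXTS = new Set([".png", ".jpg", ".jpeg", ".tiff"]);

/**
 * Decide how a file would be handled, based only on its extension.
 * Returns a handler label (hypothetical names) and an OCR flag.
 */
function routeForFile(filePath) {
  const ext = path.extname(filePath).toLowerCase();
  if (KORDOC_EXTS.has(ext)) return { handler: "kordoc", needsOcr: false };
  if (TEXT_EXTS.has(ext)) return { handler: "text", needsOcr: false };
  // Images are not parsed here; they are only flagged for a later OCR step.
  if (IMAGE_EXTS.has(ext)) return { handler: "none", needsOcr: true };
  return { handler: "unsupported", needsOcr: false };
}

module.exports = { routeForFile };
```

Keeping the decision in a pure function would let the Express route handler stay thin and make the routing rules testable without starting the server.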