- Implement kordoc /parse endpoint (HWP/HWPX/PDF via kordoc lib, text files direct read, images flagged for OCR) - Add queue consumer with APScheduler (1min interval, stage chaining extract→classify→embed, stale item recovery, retry logic) - Add extract worker (kordoc HTTP call + direct text read) - Add classify worker (Qwen3.5 AI classification with think-tag stripping and robust JSON extraction from AI responses) - Add embed worker (GPU server nomic-embed-text, graceful failure) - Add DEVONthink migration script with folder mapping for 16 DBs, dry-run mode, batch commits, and idempotent file_path UNIQUE - Enhance ai/client.py with strip_thinking() and parse_json_response() Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
15 lines
297 B
JSON
15 lines
297 B
JSON
{
|
|
"name": "kordoc-service",
|
|
"version": "1.0.0",
|
|
"description": "HWP/HWPX/PDF 문서 파싱 마이크로서비스",
|
|
"main": "server.js",
|
|
"scripts": {
|
|
"start": "node server.js"
|
|
},
|
|
"dependencies": {
|
|
"express": "^4.18.0",
|
|
"kordoc": "^1.7.0",
|
|
"pdfjs-dist": "^4.0.0"
|
|
}
|
|
}
|