Commit Graph

5 Commits

Author SHA1 Message Date
Hyungi Ahn
24142ea605 fix: Codex 리뷰 5건 수정 (critical 1 + high 4)
1. [critical] config.yaml → settings 객체에서 taxonomy 로드 (import crash 방지)
2. [high] ODF 변환: file_path 유지, derived_path 별도 필드 (무한 중복 방지)
3. [high] 법령 분할: 첫 장 이전 조문을 "서문"으로 보존
4. [high] Inbox: review_status 필드 분리 (pending/approved/rejected)
5. [high] 삭제: soft-delete (deleted_at) + worker 방어 + active_documents 뷰
   - 모든 조회에 deleted_at IS NULL 일관 적용
   - queue_consumer: row 없으면 gracefully skip

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-06 07:15:13 +09:00
Hyungi Ahn
6893ea132d refactor: preview 병렬 트리거 + 파일 이동 제거 + domain 색상 바
- queue_consumer: extract 완료 시 classify + preview 동시 등록
- classify_worker: _move_to_knowledge() 제거, 파일 원본 위치 유지
- DocumentCard: 좌측 domain별 색상 바 (4px) 추가

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-03 12:31:57 +09:00
Hyungi Ahn
4bea408bbd feat: Markdown 편집기 + PDF 변환 파이프라인 + 뷰어 포맷 분기
- Markdown split editor: textarea + marked preview, Ctrl+S 저장
- PUT /api/documents/{id}/content: 원본 파일 저장 + extracted_text 갱신
- GET /api/documents/{id}/preview: PDF 미리보기 캐시 서빙
- preview_worker: LibreOffice headless → PDF 변환 (timeout 60s, retry 1회)
- queue_consumer: preview stage 추가 (embed 후 자동 트리거)
- DocumentViewer: 포맷별 분기 (markdown/pdf/preview-pdf/image/text/cad)
- 오피스/CAD 문서: 새 탭 편집 버튼
- Dockerfile: LibreOffice headless 설치
- migration 005: preview_status, preview_hash, preview_at 컬럼

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-03 10:10:03 +09:00
Hyungi Ahn
62f5eccb96 fix: isolate each worker call in independent async session
Shared session between queue consumer and workers caused
MissingGreenlet errors in APScheduler context. Each worker
call now gets its own session with explicit commit/rollback.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-03 08:29:14 +09:00
Hyungi Ahn
299fac3904 feat: implement Phase 1 data pipeline and migration
- Implement kordoc /parse endpoint (HWP/HWPX/PDF via kordoc lib,
  text files direct read, images flagged for OCR)
- Add queue consumer with APScheduler (1min interval, stage chaining
  extract→classify→embed, stale item recovery, retry logic)
- Add extract worker (kordoc HTTP call + direct text read)
- Add classify worker (Qwen3.5 AI classification with think-tag
  stripping and robust JSON extraction from AI responses)
- Add embed worker (GPU server nomic-embed-text, graceful failure)
- Add DEVONthink migration script with folder mapping for 16 DBs,
  dry-run mode, batch commits, and idempotent file_path UNIQUE
- Enhance ai/client.py with strip_thinking() and parse_json_response()

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-02 14:35:36 +09:00