Commit Graph

7 Commits

Author SHA1 Message Date
Hyungi Ahn
62f5eccb96 fix: isolate each worker call in independent async session
Shared session between queue consumer and workers caused
MissingGreenlet errors in APScheduler context. Each worker
call now gets its own session with explicit commit/rollback.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-03 08:29:14 +09:00
Hyungi Ahn
46537ee11a fix: Codex 리뷰 P1/P2 버그 4건 수정
- [P1] migration runner 도입: schema_migrations 추적, advisory lock,
  단일 트랜잭션 실행, SQL 검증 (기존 DB 업그레이드 대응)
- [P1] eml extract 큐 조건 분기: extract_worker 미지원 포맷 큐 스킵
- [P2] iCalendar escape_ical_text() 추가: RFC 5545 준수
- [P2] 이메일 charset 감지: get_content_charset() 사용 + payload None 방어

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-02 15:55:38 +09:00
Hyungi Ahn
d93e50b55c security: fix 5 review findings (2 high, 3 medium)
HIGH:
- Lock setup TOTP/NAS endpoints behind _require_setup() guard
  (prevented unauthenticated admin 2FA takeover after setup)
- Sanitize upload filename with Path().name + resolve() validation
  (prevented path traversal writing outside Inbox)

MEDIUM:
- Add score > 0.01 filter to hybrid search via subquery
  (prevented returning irrelevant documents with zero score)
- Implement Inbox → Knowledge file move after classification
  (classify_worker now moves files based on ai_domain)
- Add Anthropic Messages API support in _request()
  (premium/Claude path now sends correct format and parses
  content[0].text instead of choices[0].message.content)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-02 15:33:31 +09:00
Hyungi Ahn
31d5498f8d feat: implement Phase 3 automation workers
- Add automation_state table for incremental sync (last UID, last check)
- Add law_monitor worker: 국가법령정보센터 API → NAS/DB/CalDAV VTODO
  (LAW_OC 승인 대기 중, 코드 완성)
- Add mailplus_archive worker: IMAP(993) → .eml NAS save + DB + SMTP
  notification (imaplib via asyncio.to_thread, timeout=30)
- Add daily_digest worker: PostgreSQL/pipeline stats → Markdown + SMTP
  (documents, law changes, email, queue errors, inbox backlog)
- Add CalDAV VTODO helper and SMTP email helper to core/utils.py
- Wire 3 cron jobs in APScheduler (law@07:00, mail@07:00+18:00,
  digest@20:00) with timezone=Asia/Seoul

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-02 15:24:50 +09:00
Hyungi Ahn
4b695332b9 feat: implement Phase 2 core features
- Add document CRUD API (list/get/upload/update/delete with auth)
  - Upload saves to Inbox + auto-enqueues processing pipeline
  - Delete defaults to DB-only, explicit flag for file deletion
- Add hybrid search API (FTS 0.4 + trigram 0.2 + vector 0.4 weighted)
  - Modes: fts, trgm, vector, hybrid (default)
  - Vector search gracefully degrades if GPU unavailable
- Add Inbox file watcher (5min interval, new file + hash change detection)
- Register documents/search routers and file_watcher scheduler in main.py
- Add IVFFLAT vector index migration (lists=50, with tuning guide)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-02 14:49:12 +09:00
Hyungi Ahn
299fac3904 feat: implement Phase 1 data pipeline and migration
- Implement kordoc /parse endpoint (HWP/HWPX/PDF via kordoc lib,
  text files direct read, images flagged for OCR)
- Add queue consumer with APScheduler (1min interval, stage chaining
  extract→classify→embed, stale item recovery, retry logic)
- Add extract worker (kordoc HTTP call + direct text read)
- Add classify worker (Qwen3.5 AI classification with think-tag
  stripping and robust JSON extraction from AI responses)
- Add embed worker (GPU server nomic-embed-text, graceful failure)
- Add DEVONthink migration script with folder mapping for 16 DBs,
  dry-run mode, batch commits, and idempotent file_path UNIQUE
- Enhance ai/client.py with strip_thinking() and parse_json_response()

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-02 14:35:36 +09:00
Hyungi Ahn
131dbd7b7c feat: scaffold v2 project structure with Docker, FastAPI, and config
동작하는 최소 코드 수준의 v2 스캐폴딩:

- docker-compose.yml: postgres, fastapi, kordoc, frontend, caddy
- app/: FastAPI 백엔드 (main, core, models, ai, prompts)
- services/kordoc/: Node.js 문서 파싱 마이크로서비스
- gpu-server/: AI Gateway + GPU docker-compose
- frontend/: SvelteKit 기본 구조
- migrations/: PostgreSQL 초기 스키마 (documents, tasks, processing_queue)
- tests/: pytest conftest 기본 설정
- config.yaml, Caddyfile, credentials.env.example 갱신

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-02 10:20:15 +09:00