hyungi_document_server

Author	SHA1	Message	Date
hyungi	acd29b963e	ops(triage): event_kind_hint diagnostic logging cleanup (PR-4B Apply 영구 보류) chore-memo-NULL-backfill 6/6 H1 (historical artifact) 확정 후 Apply PR 영구 보류. `406b810` 의 8-line logger.info 블록 제거 (behavior 변경 0, 진단 데이터 더 이상 불필요). backup: app/workers/classify_worker.py.pre-eventkind-cleanup (7일 안전망 ~2026-05-25) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-18 11:27:29 +00:00
hyungi	406b810e28	ops(triage): PR-4B-Diagnose-EventKindHint-Layer-A — diagnostic logging (no behavior change) Layer-A Diagnose only. classify_worker.py:691 직전에 event_kind_hint 의 raw/normalized/in_valid/confidence 값 capture (logger.info 5줄 insert, lazy formatting + %r repr). guard 통과 X 의 specific root cause (A1 field 부재 / A2 빈 string / A3 invalid enum) 확정용. specific fix (default note / enum mapping / prompt 강화) 는 별 PR-4B-Fix-EventKindHint-Apply. Apply PR closure gate 에 logging cleanup (info → DEBUG 또는 제거) 흡수. plan: ~/.claude/plans/c-1-pr-infra-drift-1-phase-1b-linear-frost.md backup: app/workers/classify_worker.py.pre-4b-eventkind-logging.20260517	2026-05-17 06:41:32 +00:00
hyungi	8998cbea8c	ops(triage): PR-4B-Diagnose — exception logging 강화 (type/repr/exc_info) Layer 1 root cause 진단을 위해 classify_worker.py:595 의 exception logging 을 lazy formatting + exc_info=True 로 강화. f-string 1줄 → 5줄 block. - type=%s: exception class name (TimeoutError/JSONDecodeError/ValueError/etc.) - repr=%r: full exception state - exc_info=True: traceback 까지 capture (wrapper 정확 지점 추적) 본 PR scope = Diagnose only. Layer 1 specific fix (H1/H2/H3/H4) + Layer 2 escalate path ai_event_kind fallback set 은 별 PR queue. plan: ~/.claude/plans/c-1-pr-infra-drift-1-phase-1b-linear-frost.md backup: app/workers/classify_worker.py.pre-4b-diagnose.20260517	2026-05-17 06:22:27 +00:00
Hyungi Ahn	0cbba0ceeb	feat(ingest): devonagent 트랙 Phase 1 ingest 활성화 DEVONagent/DEVONthink 가 발견한 웹페이지를 NAS Web/ drop → file_watcher ingest → extract 4-tier fallback (trafilatura/sibling-md/readability/bs4) → embed + chunk 까지. classify/preview/markdown SKIP. - source_channel='devonagent' (migration 001 dormant 활성화) - file_watcher: SCAN_TARGETS 통합 + Web/ rglob + canonical_url dedup + sidecar 누락 정책 (skip 안 함, web_meta.sidecar_missing=true flag) - extract_worker: HTML+devonagent 분기 + md_extraction_engine 4-tier 구분 (trafilatura → sibling .md ≥200char → readability+markdownify → bs4_text) - queue_consumer: enqueue_next_stage 의 extract stage 만 source_channel- aware override (devonagent → [embed, chunk]) - classify_worker: devonagent safety skip (law_monitor 패턴 mirror, ai_domain='Web', ai_tags=['Web/{host}']) - requirements: trafilatura/readability-lxml/markdownify 추가 - docs: devonthink-web-bridge.md 설치 가이드 + first-wins 정책 명시 Phase 1 closure 기준 = 재료 품질 (검색 가능 + 노이즈율 + dedup + 엔진 분포). 활용처(ai_tldr/digest/PKM 회고)는 1-2주 OR 30-50건 관찰 후 별 PR 에서 결정. Plan: ~/.claude/plans/db-snuggly-petal.md Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-15 21:23:16 +09:00
hyungi	118f32f9b1	refactor(ai): PR #20 reframe cleanup — Ollama LLM 잔재 주석 정정 PR #20 (2026-05-14, GPU LLM 제거 + Mac mini 26B MLX 흡수) 의 swap 이 backends.json + 코드 주석/docstring 까지 따라가지 못한 표현 잔재 정리. - app/ai/client.py: AIClient docstring 및 call_triage / call_fallback docstring 의 "4B Ollama" → "Mac mini 26B MLX" / "현재는 triage 와 동일 엔드포인트" → "Claude Sonnet 4 API (PR #20 swap 완료)" - app/core/config.py: triage/primary/fallback 주석 통합 + Phase 3.5 classifier/verifier 주석에 PR #20 endpoint 명시 (history 보존) - app/services/search/{llm_gate,classifier_service,verifier_service, evidence_service}.py: "fallback(Ollama)" / "Ollama concurrent OK" / "triage(4B Ollama)" 표현을 Mac mini 26B MLX endpoint 기준으로 정정 + concurrent 안전성 별 검토 마커 추가 - app/services/digest/summarizer.py: "MLX hang/Ollama stall 방어" → "MLX hang / fallback Claude API stall 방어" - app/services/prompt_versions.py: SUMMARY_TRIAGE_TASK + ASK_PROMPT_VERSION 주석의 "4B Ollama" / "4B gemma Ollama" → Mac mini 26B MLX - app/workers/classify_worker.py: B-1 tier triage docstring 정정 코드 동작 변경 0 (주석/docstring 만). embed_worker / study_question_embed_worker 의 "Ollama bge-m3" 표현은 사실 정확이라 유지. 검증: - ollama list → bge-m3:latest 잔존 (embedding owner) - /api/embeddings probe → 1024-dim 200 OK - fastapi embed/ollama error 0 (last 10min) - document.hyungi.net 200 plan: ~/.claude/plans/4-stateless-dongarra.md Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-15 12:09:15 +00:00
Hyungi Ahn	63990ac632	feat(memos): add AI event-kind triage fields PR-2B (Memo Inbox Triage) backend 1/2. plan: beszel-tingly-sloth.md 라운드 13. 사용자 비전 = 메모는 inbox, AI 는 triage assistant. AI worker 는 events row 직접 생성 X. Migrations 250–253 (실측 N=250): - 250 CREATE TYPE event_kind_hint AS ENUM (note\|task\|calendar_event\|activity_log\|reference) - 251 ALTER TABLE documents ADD ai_event_kind event_kind_hint - 252 ALTER TABLE documents ADD ai_event_confidence NUMERIC(3,2) + CHECK 0–1 - 253 CREATE INDEX idx_documents_ai_event_kind partial WHERE ai_event_kind IS NOT NULL ORM: - Document.ai_event_kind / ai_event_confidence 컬럼 추가 (Enum SQLAlchemy 동기) - source_channel enum 에 'voice' 추가 (PR-2C 와 호환) Worker: - classify_worker Phase 3 (Gemma 4B triage) 확장 · TriageOutput 에 event_kind_hint + event_kind_confidence 필드 추가 · 4B 응답에 hint 가 있을 때만 Document 에 저장 (enum 외 값은 무시) - prompt p3a_short_summary.txt 확장 — note/task/calendar_event/activity_log/reference 분류 기준 + confidence + default='note' 명시 원칙: AI worker 는 hint 만 제공. events 생성은 다음 commit 의 promote endpoint 에서만.	2026-05-11 12:04:21 +09:00
Hyungi Ahn	aca2f0d62c	feat(canonical): restore GPU STT owner and extend KGS watch paths D9 Track B revised (2026-05-08): 1) STT owner GPU 정식 복귀: - docker-compose.yml: stt-service profiles:[legacy] 제거 → 상시 활성 - fastapi STT_ENDPOINT = http://stt-service:3300 (compose 내부 DNS) - 정책: Mac mini = Gemma 26B 전용 우선이므로 STT/Whisper 는 호출량 무관 GPU 서버 소유. 이전 "Mac mini 이전본" 주석은 trace 오인 기반. 2) KGS Code 등 외부 학습 자료 추가 스캔 경로: - ADDITIONAL_WATCH_TARGETS env (쉼표 구분, PKM 상대경로) - app/core/config.py: additional_watch_targets list 설정 추가 - app/workers/file_watcher.py: 추가 watch path 처리 - app/workers/classify_worker.py: KGS Code 분류 분기 (가스기사 학습 자료) - 모두 expected_category=library 처리 (md/pdf/docx 만) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-10 05:47:20 +00:00
Hyungi Ahn	7dd77ec926	fix(classify): data_origin enum 검증 — knowledge 등 잘못된 값 cascade fail 방지 AI 응답에서 dataOrigin='knowledge' 같은 doc_purpose enum 값이 data_origin 컬럼에 잘못 매핑되면 asyncpg InvalidTextRepresentationError 발생. 같은 classify_worker session 의 후속 autoflush 호출이 PendingRollbackError 로 cascade 되어 batch 안 다른 문서까지 모두 실패. doc_purpose 처럼 enum 허용값(work/external) 검증 후 박도록 수정. 외 값은 skip (data_origin NULL 유지). 가스기사 토픽 결손 15건의 RAG 결손 root cause.	2026-04-28 10:01:45 +09:00
Hyungi Ahn	c6335c9a1e	fix(classify): law_monitor skip 분기 복원 + tier_backfill law 제외 PR-B refactor 과정에서 `e88640d` 의 process() 진입부 source_channel='law_monitor' skip 분기가 사라져 매일 07:00 신규 법령 분할마다 26B legacy classify(8s) + 26B legacy summarize(10s) + 4B triage(1.5s) 전부 호출되고 있었다. 법령 분리 PR (stateless-churning-raccoon) 의 명제: "법령은 외부 source-of-truth + immutable + 자동 재수집 → 다른 수명주기" 와 일치하도록 process() 진입부에 skip 분기 복원. 최소 필드 (ai_domain='법령', ai_tags=['법령'], importance='medium') 만 세팅 후 return. queue_consumer 의 NEXT_STAGES['classify']=['embed','chunk'] 가 자동 chain 하므로 검색 영향 0. 법령 도메인 AI 산출물 가치 분석: - ai_summary: 법령 해석 환각 위험 (ASME/안전 엔지니어 사고 책임 소지) - ai_tldr/bullets: 이미 title 이 같은 정보 노출 — redundant - ai_inconsistencies: 공식 정합 문서라 100% false positive → 비용 (월 ~14분 26B 점유) 대비 가치 음수, skip 합당. tier_backfill.py 도 함께 수정: - DOMAIN_PRIORITY 에서 ('law', source_channel='law_monitor') 항목 제거 - safety 필터에 source_channel != 'law_monitor' 추가 (기존 ai_domain LIKE 'Industrial_Safety%' 매칭 안에 backfill 기 처리한 법령 doc 들이 잡혀 들어가는 case 차단) - 사유: skip 처리될 doc 을 enqueue 하면 야간마다 enqueue→skip→NULL→ enqueue 무한 루프 Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-27 07:35:27 +09:00
Hyungi Ahn	e2b32fe9b7	fix(ai): B-1 R2 risk_flag_requires_26b 를 hard escalate 로 승격 실측 발견 (safety 8건 재분류): - 10574 KRAS (safety_operational) → escalate=true (guard 전 pass) - 10568 JSA (safety_operational) → escalate=false suppressed=True - 10570 PPE (safety_operational) → escalate=false suppressed=True - 동일 도메인인데 4건 중 1건만 26B 처리. 같은 질의 종류 문서가 누구는 깊이 있고 누구는 짧음 → 사용자 관점 일관성 붕괴. 원인: risk_flag_requires_26b 가 soft escalate 분류 → R2 backlog guard 의 ratio 임계치(0.3) 에 걸림. 방금 classify 8건 enqueue 중 앞선 건들이 deep_summary 큐 채우자 뒤 건들이 전부 suppress. 수정: HARD_ESCALATE_REASONS 에 risk_flag_requires_26b 추가. safety/ health/chemical 등 도메인 정책 기반 escalate 는 절대 억제하지 않음. soft 영역은 여전히 남아있음: self_declare (4B 자가선언), deep_requested (recommend_deep_summary). 이 둘만 backlog guard 가 억제 대상. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-24 12:33:12 +09:00
Hyungi Ahn	165b00f917	fix(ai): B-1 subject_domain 매칭 + RoutingDecision.escalate_to_26b 존중 실측 발견 (safety md 8건 tier triage 결과): 1. 분류 오분류: 본문에 "MSDS" 한 번 스쳐도 msds 도메인 매칭됨. 개인보호구/중대재해/밀폐공간/산업안전보건법 전부 msds 로 잘못 판정. 2. RoutingDecision 무시: PR-A domain_policy 의 high_impact=true 와 risk_flag_requires_26b 때문에 RoutingDecision.escalate_to_26b=True 이지만 내 _classify_escalation_reason 이 이걸 안 봐서 escalate=False 로 마감. safety/msds/hazard_specific 전부 4B 만 돌고 26B 정책 우회. 수정: - _match_subject_domain: (a) title 기반 매칭 우선 추가 — 파일명이 의도의 1차 시그널. (b) 본문 키워드는 2회 이상 등장해야 match (single-mention 오분류 방지). 우선순위도 재배열 (msds 맨 앞 → hazard/safety 뒤로). - _classify_escalation_reason: routing_decision 파라미터 추가. 4B 자체 판정 (long_context / low_confidence / self_declare / deep_requested) 이후 PR-A routing_decision.escalate_to_26b 가 True 이면 그 escalation_reasons 중 "high_impact" 외의 구체 사유(risk_flag_requires_26b 등) 를 채택. - _run_tier_triage: routing_decision 을 먼저 계산하여 _classify_escalation_reason 에 전달. _apply_triage_result 는 routing_decision 을 param 으로 받음 (중복 계산 제거). 이 변경 후 safety/msds/hazard_specific/incident_report 도메인 문서는 항상 26B escalate → deep_summary 큐. MLX 부하 증가하지만 plan 의도대로 정책 준수. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-24 11:18:59 +09:00
Hyungi Ahn	f872e4666f	fix(ai): B-1 envelope.from_stage PR-A enum 값으로 정정 doc 5260 (confidence 0.3 low_confidence 에스컬레이션) 실측에서 발견: EscalationEnvelope(from_stage='summary_triage') 가 PR-A ValidFromStage ({triage, summarize_short, advice_trigger, classify, night_sweep, ask_pre, unknown}) 에 없어 ValueError 발생 → 모든 deep_summary enqueue 가 envelope 생성 단계에서 터짐. tldr/bullets 기록은 envelope 실패 전에 완료되어 영향 없음 (try/except 가 classify 전체는 보호). P3a short summary 에서의 에스컬레이션 의미에 맞춰 'summarize_short' 로 변경. 내부 task 이름 (SUMMARY_TRIAGE_TASK = 'p3a_short_summary') 는 analyze_events. prompt_version 기록 전용이라 그대로 유지. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-24 11:04:47 +09:00
Hyungi Ahn	6fdc48e5b6	feat(ai): B-1 summary tier 분할 — triage(4B) + deep_summary(26B) PR-A policy 레이어를 재사용하여 classify_worker 에 tier triage 경로를 추가. Legacy ai_summary / ai_domain / ai_suggestion 은 유지 (회귀 0), tldr/bullets/ detail/inconsistencies 는 별도 필드로 분리. Migrations (156~160): - 156 documents: ai_tldr, ai_bullets, ai_detail_summary, ai_inconsistencies, ai_analysis_tier 5컬럼 - 157 process_stage 에 'deep_summary' ADD VALUE 단독 (Postgres 동일 트랜잭션 제약 회피) - 158 processing_queue.payload JSONB (envelope 전달) - 159 analyze_events 에 tier + suppressed_reason - 160 suppressed_reason partial index Models/ORM: - Document: 5컬럼 Mapped 추가 - ProcessingQueue: deep_summary enum 확장 + payload 필드, enqueue_stage 에 payload 옵션 - AnalyzeEvent: PR-A shadow 6컬럼 + PR-B tier/suppressed_reason Workers: - classify_worker: 기존 legacy 경로 뒤에 _run_tier_triage 추가. - _match_subject_domain(doc, text): source_channel + 본문 keywords + ai_domain prefix 로 PR-A policy 의 subject_domain 이름 결정 (category 매칭 금지). - R1 TriageOutput pydantic + JSON 깨짐 fallback (triage_json_invalid). - R2 _check_backlog_guard(): 30분 window ratio > threshold OR pending 초과면 soft escalate suppress. hard escalate 는 통과. - R3 _slice_text_ranges(): 260k 초과 시 head 120k + mid 20k + tail 120k 3조각. - escalate 시 EscalationEnvelope 구성 + {envelope, subject_domain} payload 로 deep_summary enqueue. - deep_summary_worker (신규): queue payload 에서 envelope + subject_domain 읽기 → render_26b("p3c_deep_summary", subject_domain) + MLX 호출 (llm_gate Semaphore(1) 경유) → ai_detail_summary + ai_inconsistencies 저장 + ai_analysis_tier='deep'. _filter_inconsistencies 로 허용 kind (version_drift / procedure_conflict / source_conflict / missing_basis) 만 통과 — 구매/계약 kind drop. - queue_consumer: workers dict 에 deep_summary 추가 + BATCH_SIZE=1. next_stages 는 건드리지 않음 — classify → embed/chunk 는 그대로, deep_summary 는 독립 체인. Telemetry: - record_analyze_event: subject_domain / risk_flags / escalation_reasons / confidence / policy_version / shadow_would_route_to / tier / escalated_to_26b / suppressed_reason 파라미터 확장. classify/deep worker 가 mode="summary_triage" 또는 "summary_deep" 로 기록. API: - DocumentResponse 에 ai_tldr / ai_bullets / ai_detail_summary / ai_inconsistencies / ai_analysis_tier 5필드 노출. Prompts: - classify.txt 에 DEPRECATED 주석만 추가 (파일 유지 — rollback 경로 보존). - PR-A 의 app/prompts/policy/p3a_short_summary.txt (4B) 와 p3c_deep_summary.txt (26B) 를 그대로 사용. 내 소유의 summary_triage.txt / summary_deep.txt 는 중복 이라 별도 커밋에서 제거하지 않고 바로 생성 전 삭제. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-24 10:22:40 +09:00
Hyungi Ahn	e88640d3d8	feat(category): law 카테고리 분리 — enum + backfill + classify skip - migrations/152: ALTER TYPE doc_category ADD VALUE 'law' (DDL only; PG16 단일-트랜잭션 제약상 backfill 은 별도) - models/document.py: Enum 에 'law' 추가 (7 활성 + 3 유보) - workers/law_monitor.py: Document(..., category='law') — 신규 유입부터 세팅 - workers/classify_worker.py: source_channel='law_monitor' early-return + 최소 필드 (ai_domain='법령', ai_tags=['법령'], importance='medium'). AI classify skip — 법령 구조 고정/외부 source of truth/자동 재수집 - scripts/backfill_category.py: law 분기 + WHERE re-target ((source_channel='law_monitor' AND category='document')) + VERIFY cat_law/law_source_count + fail 조건 - api/documents.py: default 목록 제외에 law_monitor 추가 (news 와 동일 패턴) - api/dashboard.py: documents count FILTER 에 law_monitor 제외 (category_counts.law 는 기존 GROUP BY category 로 자동 노출) - frontend/Sidebar.svelte: '법령 알림' 버튼 ?source=law_monitor → ?category=law (explicit category 경로가 default exclusion 을 skip) plan: ~/.claude/plans/stateless-churning-raccoon.md axis 원칙: category=UI 축, policy/telemetry=source_channel+ai_domain 축 (feedback_category_vs_ai_domain_axis.md) 배포 순서: push → GPU pull → compose up --build fastapi frontend → backfill --dry-run → --apply. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-24 09:14:56 +09:00
Hyungi Ahn	8fdea88676	feat(documents): §1 category enum + ai_suggestion 승인 파이프 plan: ~/.claude/plans/luminous-sprouting-hamster.md §1 - migrations/143_category.sql: doc_category enum (6 활성 + 3 유보) + documents.category + documents.ai_suggestion JSONB + 2 idx. - app/models/document.py: category (Enum, create_type=False), ai_suggestion (JSONB). - app/prompts/classify.txt: document_type enum 에 7 실무 doctype 추가 (발주서/세금계산서/명세표/도면/증명서/계획서/시방서) + facet_doctype 필드 directive. - config.yaml: document_types 에 7 항목 추가 (worker 검증 통과). - app/workers/classify_worker.py: FACET_DOCTYPES / LIBRARY_SUGGESTION_DOCTYPES 상수, facet_doctype 파싱(기존값 미덮어씀), 발주서/세금계산서/명세표 감지 시 ai_suggestion={proposed_category=library, proposed_path=@library/ 거래/{YYYY}/{doctype}, source_updated_at=doc.updated_at.isoformat(), ...}. category / user_tags 자동 전이 금지 (suggestion-only). - app/api/documents.py: · DocumentResponse 에 category / ai_suggestion 노출 · GET /documents ?category=<cat> / ?has_suggestion / ?proposed_category (category 지정 시 기본 news/memo 제외 해제 — §2 승인 UI 계약) · GET /documents/library 를 Document.category=='library' 기반으로 재구현 (path subquery 는 user_tags 유지 — 분류 내부 서가 경로) · POST /documents/{id}/accept-suggestion — FOR UPDATE + idempotent no-op + dual 409 stale (payload source_updated_at / documents.updated_at) + user_tags idempotent append · DELETE /documents/{id}/suggestion — idempotent, stale 검사 없음 - scripts/backfill_category.py: dry-run / apply. 매핑(news/memo/@library/else) + 3-way 상대 검증 (all_rows==categorized, uncategorized==0, cat_library==has_library_tag — 자동 전이 금지 정책 검증). 남은 DoD (원격 배포 후): docker compose up → migration 143 적용 → backfill apply → smoke (drive_sync 발주서 업로드 suggestion 생성 / category 유지, accept-suggestion idempotency + 409 stale 두 벡터, /documents?category=library == /documents/library 건수 일치). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-23 15:32:01 +09:00
Hyungi Ahn	5070ac45ff	fix(extract): LibreOffice 추출 절단 제거 및 요약 입력 확대 - extract_worker: LibreOffice 15000자 절단 제거 (full text 저장 원칙) - classify_worker/summarize_worker: 요약 입력 15000→50000자 확대 - client.py: 길이 기반 Claude 자동전환 제거 (require_explicit_trigger 정책 준수) _call_chat의 primary→fallback(exaone3.5) 체인은 유지 Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-15 14:00:23 +09:00
Hyungi Ahn	5c58778a41	feat(library): doc_purpose 필드 + 자료실 업로드 기능 지식/업무 문서 1차 구분을 위한 doc_purpose(business\|knowledge) 추가. - 마이그레이션: document_purpose enum + 컬럼 - AI 분류: docPurpose 자동 추론 (빈 값만 채움) - 업로드 API: doc_purpose + library_path Form 파라미터 - 자료실 업로드: business 기본값 + 선택 경로 자동 태깅 - FileInfoView: 용도 select (수동 변경, 실패 롤백) - DocumentCard: 업무/참조 배지 Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-14 15:26:59 +09:00
Hyungi Ahn	24142ea605	fix: Codex 리뷰 5건 수정 (critical 1 + high 4) 1. [critical] config.yaml → settings 객체에서 taxonomy 로드 (import crash 방지) 2. [high] ODF 변환: file_path 유지, derived_path 별도 필드 (무한 중복 방지) 3. [high] 법령 분할: 첫 장 이전 조문을 "서문"으로 보존 4. [high] Inbox: review_status 필드 분리 (pending/approved/rejected) 5. [high] 삭제: soft-delete (deleted_at) + worker 방어 + active_documents 뷰 - 모든 조회에 deleted_at IS NULL 일관 적용 - queue_consumer: row 없으면 gracefully skip Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-06 07:15:13 +09:00
Hyungi Ahn	6d73e7ee12	feat: 분류 체계 전면 개편 — taxonomy + document_type + confidence - config.yaml: 6개 domain × 3단계 taxonomy + 13개 document_types 정의 - classify.txt: 영문 프롬프트, taxonomy 경로 기반 분류 + 분류 규칙 주입 - classify_worker: taxonomy 검증, confidence 기반 분류, document_type 저장 - migration 008: document_type, importance, ai_confidence 컬럼 - API: DocumentResponse에 document_type, importance, ai_confidence 추가 Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-03 13:32:20 +09:00
Hyungi Ahn	733f730e16	fix: preview enum 누락 + AI summary thinking 제거 + CLAUDE.md 전면 갱신 - queue.py: process_stage enum에 'preview' 추가 - classify_worker: ai_summary에 strip_thinking() 적용 - CLAUDE.md: 현재 아키텍처 전면 반영 (파이프라인, UI, 인프라, 코딩규칙) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-03 12:38:59 +09:00
Hyungi Ahn	6893ea132d	refactor: preview 병렬 트리거 + 파일 이동 제거 + domain 색상 바 - queue_consumer: extract 완료 시 classify + preview 동시 등록 - classify_worker: _move_to_knowledge() 제거, 파일 원본 위치 유지 - DocumentCard: 좌측 domain별 색상 바 (4px) 추가 Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-03 12:31:57 +09:00
Hyungi Ahn	d93e50b55c	security: fix 5 review findings (2 high, 3 medium) HIGH: - Lock setup TOTP/NAS endpoints behind _require_setup() guard (prevented unauthenticated admin 2FA takeover after setup) - Sanitize upload filename with Path().name + resolve() validation (prevented path traversal writing outside Inbox) MEDIUM: - Add score > 0.01 filter to hybrid search via subquery (prevented returning irrelevant documents with zero score) - Implement Inbox → Knowledge file move after classification (classify_worker now moves files based on ai_domain) - Add Anthropic Messages API support in _request() (premium/Claude path now sends correct format and parses content[0].text instead of choices[0].message.content) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-02 15:33:31 +09:00
Hyungi Ahn	299fac3904	feat: implement Phase 1 data pipeline and migration - Implement kordoc /parse endpoint (HWP/HWPX/PDF via kordoc lib, text files direct read, images flagged for OCR) - Add queue consumer with APScheduler (1min interval, stage chaining extract→classify→embed, stale item recovery, retry logic) - Add extract worker (kordoc HTTP call + direct text read) - Add classify worker (Qwen3.5 AI classification with think-tag stripping and robust JSON extraction from AI responses) - Add embed worker (GPU server nomic-embed-text, graceful failure) - Add DEVONthink migration script with folder mapping for 16 DBs, dry-run mode, batch commits, and idempotent file_path UNIQUE - Enhance ai/client.py with strip_thinking() and parse_json_response() Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-02 14:35:36 +09:00

23 Commits