hyungi_document_server

Author	SHA1	Message	Date
Hyungi Ahn	95bcdb851b	fix(ops): exclude empty extracted_text from backfill queries to prevent infinite retries. Three days of operation showed that docs 4811 and 5181 have extracted_text='' (empty string) but were enqueued because the query only checked IS NOT NULL → the `not doc.extracted_text` truthiness check in classify_worker raised ValueError → max_attempts(3) reached → status=failed. They were re-enqueued on the next backfill cycle; this repeated 12 times and accumulated 24 failed rows. Fix: add LENGTH(extracted_text) > 0 to the SQL in both tier_backfill.py and backfill_tier.py, so empty-string documents are excluded at enqueue time. Cleanup SQL for the existing 24 failed rows (to be run manually by the user): DELETE FROM processing_queue WHERE stage='classify' AND status='failed' AND error_message LIKE '%extracted_text%'; Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-27 08:25:12 +09:00
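Hyungi Ahn	c6335c9a1e	fix(classify): restore the law_monitor skip branch + exclude law from tier_backfill. During the PR-B refactor, the source_channel='law_monitor' skip branch at the entry of process() from `e88640d` was lost, so every new statute split at 07:00 each day triggered all of 26B legacy classify (8s) + 26B legacy summarize (10s) + 4B triage (1.5s). Restore the skip branch at the entry of process() to match the premise of the law-separation PR (stateless-churning-raccoon): "statutes have an external source of truth, are immutable, and are re-collected automatically → a different lifecycle". Only the minimal fields (ai_domain='법령', ai_tags=['법령'], importance='medium') are set before returning. NEXT_STAGES['classify']=['embed','chunk'] in queue_consumer chains automatically, so search impact is zero. Value analysis of AI outputs for the law domain: - ai_summary: risk of hallucinated statutory interpretation (liability exposure for ASME/safety engineers in case of an accident) - ai_tldr/bullets: redundant, the title already exposes the same information - ai_inconsistencies: these are official, internally consistent documents, so 100% false positives → negative value against the cost (~14 min/month of 26B occupancy), so skipping is justified. tier_backfill.py is also updated: - remove the ('law', source_channel='law_monitor') entry from DOMAIN_PRIORITY - add source_channel != 'law_monitor' to the safety filter (blocks the case where law docs already processed by backfill are caught by the existing ai_domain LIKE 'Industrial_Safety%' match) - reason: enqueueing docs that will be skipped creates a nightly enqueue→skip→NULL→enqueue infinite loop Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-27 07:35:27 +09:00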
Hyungi Ahn	a95294ff42	feat(ops): nightly auto tier backfill scheduler (resolves the PR-B legacy backlog). Automatically runs tier triage + deep_summary on 6720 legacy documents at night. app/workers/tier_backfill.py (new): - APScheduler trigger every 30 minutes; actually enqueues only during 00:00~06:00 KST. - Re-enqueues 25 docs at a time into the classify queue in safety > law > manual priority. - Skips when 40 or more items are already in the classify queue, to protect MLX from overload. - drive_sync / memo / news are excluded (out of plan scope or low value). - off-switch: settings.ai.tier_backfill.enabled = false stops everything. app/main.py lifespan: - scheduler.add_job(tier_backfill_run, interval=30min, id='tier_backfill'). - AsyncIOScheduler is already configured with timezone='Asia/Seoul', matching zoneinfo('Asia/Seoul') inside tier_backfill. Expected numbers: 6 night hours × 2 runs/hour × 25 docs = 300 docs/night. 6720 / 300 ≈ 23 nights to work through the whole legacy backlog. MLX load control is the primary concern; this is a redundant safeguard alongside the R2 backlog guard. If overload is detected in operation: just add `ai.tier_backfill.enabled: false` to config.yaml to stop immediately (no restart needed; the scheduler checks on every run). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-24 15:28:28 +09:00
Hyungi Ahn	e2b32fe9b7	fix(ai): B-1 promote R2 risk_flag_requires_26b to a hard escalate. Measured finding (reclassifying 8 safety docs): - 10574 KRAS (safety_operational) → escalate=true (passed before the guard kicked in) - 10568 JSA (safety_operational) → escalate=false suppressed=True - 10570 PPE (safety_operational) → escalate=false suppressed=True - same domain, yet only 1 of 4 docs got 26B processing. Documents answering the same kind of query end up deep for some and short for others → consistency collapses from the user's point of view. Cause: risk_flag_requires_26b was classified as a soft escalate → it hit the ratio threshold (0.3) of the R2 backlog guard. While the 8 classify jobs were being enqueued, the earlier ones filled the deep_summary queue and all later ones were suppressed. Fix: add risk_flag_requires_26b to HARD_ESCALATE_REASONS. Domain-policy-based escalations such as safety/health/chemical are never suppressed. The soft category still exists: self_declare (4B self-declaration) and deep_requested (recommend_deep_summary). Only these two are subject to suppression by the backlog guard. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-24 12:33:12 +09:00
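Hyungi Ahn	93a687a51d	fix(ai): B-1 field-level regex fallback for truncated deep_summary responses. _parse_outermost_json also fails when the response is cut off in the middle of an open string. Real MLX responses tend to end inside a string within entities_confirmed, which caused detail/tldr/bullets/inconsistencies to all be lost. Add the _regex_extract_fields helper: it matches "key":"value" pairs individually to salvage at least the fields that completed before the cut. It succeeds when detail appears early in the response, before the truncation point. Order: 1. _parse_outermost_json (brace balance) 2. parse_json_response (existing regex) 3. _regex_extract_fields (field-level fallback). Prompt changes such as removing entities_confirmed belong to PR-A, so they are left untouched and the PR-B worker defends against the issue instead. The root fix (removing unneeded fields from p3c_deep_summary or tuning max_tokens) is for the policy owner to decide. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-24 11:27:04 +09:00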
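Hyungi Ahn	154cb1c8bd	fix(ai): B-1 harden the deep_summary JSON parser (extract the outermost JSON). Measured bug (deep processing of doc 10573, Occupational Safety and Health Act): - The 26B MLX response was cut off in the middle of the `entities_confirmed` section at a length of 1131 characters (short of the 8192-token limit). - The regex `{[^{}](?:{[^{}]}[^{}])}` in parse_json_response matches only one level of nesting and iterates in reverse, returning the "last valid JSON" first. - As a result, the inner object of entities_confirmed (`{"people":[],"orgs":[],...}`) was parsed and detail/tldr/bullets were all lost → ai_detail_summary was empty. Fix: add the `_parse_outermost_json` helper to deep_summary_worker. It uses brace balance plus string-literal awareness to extract from the first '{' to the outermost '}'. If the response is truncated with no closure, it appends as many '}' as the remaining depth and retries (recovering as much of a partial response as possible). parse_json_response remains as the fallback. A reprocessing smoke test of doc 10573 is needed after this fix. The entities_confirmed field is not used in the info-panel UI, so the prompt adjustment to remove it from the response is deferred to the next round. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-24 11:25:01 +09:00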
Hyungi Ahn	165b00f917	fix(ai): B-1 subject_domain matching + honor RoutingDecision.escalate_to_26b. Measured findings (tier triage results for 8 safety md docs): 1. Misclassification: a single passing mention of "MSDS" in the body matched the msds domain. Docs on personal protective equipment / serious accidents / confined spaces / the Occupational Safety and Health Act were all wrongly judged as msds. 2. RoutingDecision ignored: RoutingDecision.escalate_to_26b=True because of high_impact=true and risk_flag_requires_26b in the PR-A domain_policy, but my _classify_escalation_reason did not look at it and finished with escalate=False. safety/msds/hazard_specific all ran on 4B only, bypassing the 26B policy. Fix: - _match_subject_domain: (a) add title-based matching first, since the file name is the primary signal of intent. (b) body keywords must appear at least twice to match (prevents single-mention misclassification). Priority is also reordered (msds moved from the very front to behind hazard/safety). - _classify_escalation_reason: add a routing_decision parameter. After the 4B's own judgments (long_context / low_confidence / self_declare / deep_requested), if PR-A routing_decision.escalate_to_26b is True, adopt a concrete reason other than "high_impact" (e.g. risk_flag_requires_26b) from its escalation_reasons. - _run_tier_triage: compute routing_decision first and pass it to _classify_escalation_reason. _apply_triage_result receives routing_decision as a param (removes the duplicate computation). After this change, documents in the safety/msds/hazard_specific/incident_report domains always escalate to 26B → deep_summary queue. MLX load increases, but the policy is followed as the plan intended. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-24 11:18:59 +09:00
Hyungi Ahn	f872e4666f	fix(ai): B-1 correct envelope.from_stage to a PR-A enum value. Found while measuring doc 5260 (confidence 0.3, low_confidence escalation): EscalationEnvelope(from_stage='summary_triage') is not in PR-A's ValidFromStage ({triage, summarize_short, advice_trigger, classify, night_sweep, ask_pre, unknown}), raising ValueError → every deep_summary enqueue blew up at envelope creation. tldr/bullets are written before the envelope failure, so they are unaffected (the try/except protects classify as a whole). Changed to 'summarize_short' to match the meaning of an escalation from the P3a short summary. The internal task name (SUMMARY_TRIAGE_TASK = 'p3a_short_summary') is used only for recording analyze_events.prompt_version, so it is kept as is. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-24 11:04:47 +09:00
Hyungi Ahn	6fdc48e5b6	feat(ai): B-1 split the summary tier into triage (4B) + deep_summary (26B). Reuses the PR-A policy layer to add a tier triage path to classify_worker. Legacy ai_summary / ai_domain / ai_suggestion are kept (zero regressions); tldr/bullets/detail/inconsistencies are split into separate fields. Migrations (156~160): - 156 documents: 5 columns ai_tldr, ai_bullets, ai_detail_summary, ai_inconsistencies, ai_analysis_tier - 157 ADD VALUE 'deep_summary' to process_stage on its own (works around the Postgres same-transaction restriction) - 158 processing_queue.payload JSONB (carries the envelope) - 159 tier + suppressed_reason on analyze_events - 160 partial index on suppressed_reason Models/ORM: - Document: add 5 Mapped columns - ProcessingQueue: extend the enum with deep_summary + payload field, payload option on enqueue_stage - AnalyzeEvent: 6 PR-A shadow columns + PR-B tier/suppressed_reason Workers: - classify_worker: add _run_tier_triage after the existing legacy path. - _match_subject_domain(doc, text): determine the PR-A policy subject_domain name from source_channel + body keywords + ai_domain prefix (category matching forbidden). - R1 TriageOutput pydantic + fallback for broken JSON (triage_json_invalid). - R2 _check_backlog_guard(): suppress soft escalates if the 30-minute window ratio > threshold OR pending exceeds the limit. Hard escalates pass. - R3 _slice_text_ranges(): when over 260k, three slices of head 120k + mid 20k + tail 120k. - on escalate, build an EscalationEnvelope and enqueue deep_summary with a {envelope, subject_domain} payload. - deep_summary_worker (new): read envelope + subject_domain from the queue payload → render_26b("p3c_deep_summary", subject_domain) + MLX call (via llm_gate Semaphore(1)) → store ai_detail_summary + ai_inconsistencies + ai_analysis_tier='deep'. _filter_inconsistencies lets through only the allowed kinds (version_drift / procedure_conflict / source_conflict / missing_basis) and drops purchasing/contract kinds. - queue_consumer: add deep_summary to the workers dict + BATCH_SIZE=1. next_stages is untouched: classify → embed/chunk stays as is, and deep_summary is an independent chain. Telemetry: - record_analyze_event: parameters extended with subject_domain / risk_flags / escalation_reasons / confidence / policy_version / shadow_would_route_to / tier / escalated_to_26b / suppressed_reason. The classify/deep workers record with mode="summary_triage" or "summary_deep". API: - expose the 5 fields ai_tldr / ai_bullets / ai_detail_summary / ai_inconsistencies / ai_analysis_tier on DocumentResponse. Prompts: - only add a DEPRECATED comment to classify.txt (file kept to preserve the rollback path). - use PR-A's app/prompts/policy/p3a_short_summary.txt (4B) and p3c_deep_summary.txt (26B) as is. My own summary_triage.txt / summary_deep.txt are duplicates, so they were deleted immediately, before creation, rather than removed in a separate commit. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-24 10:22:40 +09:00
Hyungi Ahn	e88640d3d8	feat(category): split out the law category (enum + backfill + classify skip) - migrations/152: ALTER TYPE doc_category ADD VALUE 'law' (DDL only; backfill is separate because of the PG16 single-transaction restriction) - models/document.py: add 'law' to the Enum (7 active + 3 reserved) - workers/law_monitor.py: Document(..., category='law'), set from new arrivals onward - workers/classify_worker.py: early return for source_channel='law_monitor' + minimal fields (ai_domain='법령', ai_tags=['법령'], importance='medium'). AI classify is skipped: statute structure is fixed / external source of truth / automatic re-collection - scripts/backfill_category.py: law branch + WHERE re-target ((source_channel='law_monitor' AND category='document')) + VERIFY cat_law/law_source_count + fail condition - api/documents.py: add law_monitor to the default list exclusion (same pattern as news) - api/dashboard.py: exclude law_monitor in the documents count FILTER (category_counts.law is exposed automatically by the existing GROUP BY category) - frontend/Sidebar.svelte: '법령 알림' (law alerts) button ?source=law_monitor → ?category=law (the explicit category path skips the default exclusion) plan: ~/.claude/plans/stateless-churning-raccoon.md axis principle: category = UI axis, policy/telemetry = source_channel+ai_domain axis (feedback_category_vs_ai_domain_axis.md) Deployment order: push → GPU pull → compose up --build fastapi frontend → backfill --dry-run → --apply. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-24 09:14:56 +09:00
Hyungi Ahn	8f25d396df	feat(upload): §4-independent: error_code scheme + .uploading orphan cleanup + progress/abort UX plan: ~/.claude/plans/luminous-sprouting-hamster.md §4 (independent items other than 1GB/stt/dashboard) backend: - define the _upload_error(status, code, msg) helper (fixes the omission where §3 only added the calls). detail = {error_code, message}; the frontend branches on error_code. - convert every HTTPException in upload_document to _upload_error: body_too_large / invalid_input / empty_file / unsupported_codec / internal - ClientDisconnect → 499 network_abort + temp file cleanup. asyncio.TimeoutError → 408 upload_timeout. - write under a temporary .uploading name → atomic rename via staging.replace(target) on completion. → leftovers from process crashes are collected by cleanup_orphan_uploads. - add .uploading to file_watcher SKIP_EXTENSIONS (prevents mistaken pickup). cleanup scheduler: - new workers/upload_cleanup.py. Every 10 minutes, deletes *.uploading files under Inbox whose mtime is older than orphan_max_age_sec(3600). - logs a WARNING if the cumulative deletions over the last 3 runs (≈30 min) reach cleanup_warn_threshold(10). In-memory deque (reset on restart); the aim is to catch only persistent issues. - two threshold fields in core/config.py UploadConfig (defaults, regardless of config.yaml overrides). frontend: - api.ts: optional errorCode/errorMessage fields on ApiError (detail string kept for compatibility with the 5+ existing consumers). parseDetail() unpacks and normalizes {error_code, message} object responses. New uploadFile(path, formData, {signal, onProgress}) XHR helper (XHR because fetch() does not support upload progress). Same single-401-refresh policy. - UploadDropzone.svelte rewritten: progress bar, per-file/global abort buttons, beforeunload warning on page leave, toast messages branched by errorCode (7 codes: body_too_large / upload_timeout / network_abort / empty_file / invalid_input / unsupported_codec / internal). In-progress uploads are aborted on component unmount. Deferred: - raising max_bytes to 1GB + Caddyfile 1100MB (kept at 100MB by a separate decision) - /dashboard category cards (separate plan) - docs/categories.md (after the §1-3 definitions settle) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-24 06:57:02 +09:00
Hyungi Ahn	1e2c004dd4	feat(media): §3 audio STT + video playback infrastructure plan: ~/.claude/plans/luminous-sprouting-hamster.md §3 Schema: - migrations/147_audio_segments_table.sql: audio_segments (STT timestamp segments) - migrations/148_audio_segments_idx.sql: (document_id, start_s) idx - migrations/149_document_media_cols.sql: documents.thumbnail_path + needs_conversion - migrations/150_queue_stage_stt.sql: process_stage += 'stt' - migrations/151_queue_stage_thumbnail.sql: process_stage += 'thumbnail' - app/models/audio_segment.py, document.py (thumbnail_path/needs_conversion) Services: - services/stt/{Dockerfile, requirements.txt, server.py}: faster-whisper large-v3 GPU container. /transcribe (filePath/langs/beamSize) + /health + /ready (cuda device_count + model_loaded). NFC/NFD path resolver (lesson from OCR). - docker-compose.yml: add stt-service (1 GPU reserved, :3300, NAS ro mount, stt_models volume, start_period 300s), STT_ENDPOINT in the fastapi env. Pipeline (depends on §1 category): - new app/workers/stt_worker.py: pick up stage='stt' → call STT_ENDPOINT → store extracted_text + audio_segments. Timeout 30 min. - new app/workers/thumbnail_worker.py: one ffmpeg frame at the 50% mark → PKM/Videos/.thumbs/{id}.jpg + set thumbnail_path. needs_conversion=true is skipped. - app/workers/file_watcher.py extended: scans PKM/{Inbox, Recordings, Videos}. extension→category, audio→stage=stt, video .mp4/.webm→stage=thumbnail, video .mov/.mkv/.avi→needs_conversion=true + no stage. Skips the settings.roon_library_path prefix. - app/workers/queue_consumer.py extended: register the stt + thumbnail workers, BATCH_SIZE(stt=1, thumbnail=3), add stt→[classify] to next_stages (audio skips extract). - app/Dockerfile: add ffmpeg (for the thumbnail subprocess). API (depends on §1): - /api/audio/{id}/segments: AudioSegment ORDER BY start_s - /api/video/{id}/thumbnail: thumbnail_path FileResponse (query token) - /api/documents/{id}/file: media_types includes audio/video mime (already in the §2 commit). Starlette FileResponse handles Range automatically. - upload_document: rejects web uploads of .mov/.mkv/.avi (error_code unsupported_codec). For NAS drops, file_watcher accepts them into quarantine. Frontend: - AudioPlayer.svelte: HTML5 audio + sticky transcript-segment panel + click a line to seek. activeIdx highlight. - VideoPlayer.svelte: HTML5 video direct play + needs_conversion notice card. poster uses the thumbnail endpoint. - /audio (list grid) + /audio/[id] (player) - /video (thumbnail grid + "needs conversion" badge) + /video/[id] (player) - Sidebar.svelte: Mic/Film icons + audio/video nav enabled, count badges (reuses §2 /stats/category-counts). Config: - app/core/config.py: stt_endpoint + roon_library_path. DoD post-deploy smoke: /ready cuda:true, transcribe a meeting mp3, audio proceeds to classify without extract (queue regression), /audio playback, .mp4 playback, .mov web 400, .mov NAS quarantine, Sidebar nav + count. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-24 06:47:36 +09:00
Hyungi Ahn	8fdea88676	feat(documents): §1 category enum + ai_suggestion approval pipeline plan: ~/.claude/plans/luminous-sprouting-hamster.md §1 - migrations/143_category.sql: doc_category enum (6 active + 3 reserved) + documents.category + documents.ai_suggestion JSONB + 2 idx. - app/models/document.py: category (Enum, create_type=False), ai_suggestion (JSONB). - app/prompts/classify.txt: add 7 practical doctypes to the document_type enum (발주서/세금계산서/명세표/도면/증명서/계획서/시방서, i.e. purchase order / tax invoice / itemized statement / drawing / certificate / plan / specification) + facet_doctype field directive. - config.yaml: add 7 entries to document_types (passes worker validation). - app/workers/classify_worker.py: FACET_DOCTYPES / LIBRARY_SUGGESTION_DOCTYPES constants, facet_doctype parsing (does not overwrite an existing value), and on detecting 발주서/세금계산서/명세표 sets ai_suggestion={proposed_category=library, proposed_path=@library/ 거래/{YYYY}/{doctype}, source_updated_at=doc.updated_at.isoformat(), ...}. Automatic transition of category / user_tags is forbidden (suggestion-only). - app/api/documents.py: · expose category / ai_suggestion on DocumentResponse · GET /documents ?category=<cat> / ?has_suggestion / ?proposed_category (specifying category lifts the default news/memo exclusion, per the §2 approval UI contract) · GET /documents/library reimplemented on Document.category=='library' (the path subquery stays on user_tags, the shelf path inside the classification) · POST /documents/{id}/accept-suggestion: FOR UPDATE + idempotent no-op + dual 409 stale (payload source_updated_at / documents.updated_at) + idempotent append to user_tags · DELETE /documents/{id}/suggestion: idempotent, no stale check - scripts/backfill_category.py: dry-run / apply. Mapping (news/memo/@library/else) + 3-way relative verification (all_rows==categorized, uncategorized==0, cat_library==has_library_tag, which verifies the no-automatic-transition policy). Remaining DoD (after remote deployment): docker compose up → apply migration 143 → backfill apply → smoke (uploading a 발주서 via drive_sync creates a suggestion while category is kept, accept-suggestion idempotency + both 409 stale vectors, /documents?category=library count matches /documents/library). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-23 15:32:01 +09:00
Hyungi Ahn	088966bf78	feat(extract): OCR trigger rules + extract_meta JSONB. Automatic OCR triggering for scanned PDFs/images + result quality validation + one-attempt limit. - add the extract_meta JSONB column (migration 134): ocr_attempted, ocr_reason, ocr_skip_reason, ocr_terminal, ocr_chars - PDF OCR trigger: total_chars < 300, or avg < 80 && total < 3000 - automatic image OCR: jpg/png/tiff/webp etc. - tiered quality thresholds: 50 chars for images; 200 chars or 30 chars per page for PDFs - upper bounds: pages > 200 or file_size > 150MB → skip - one-attempt OCR limit: extract_meta.ocr_attempted prevents retries - extractor_version holds only the tool name (surya_ocr/pymupdf/kordoc) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-15 15:04:13 +09:00
Hyungi Ahn	083aa3126a	feat(search): improve retrieval+evidence quality - embed_worker: when ai_summary is missing, the text[:800] fallback → ToC detection + preference for narrative paragraphs (conservative heuristic, at least 2 strong signals + a skip cap) - retrieval_service: snippet 200 → 1200 chars (gives the reranker/evidence wider context) - evidence_service: CANDIDATE_SNIPPET_CHARS 800 → 1200 (widens the LLM evidence window) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-15 14:56:33 +09:00
Hyungi Ahn	4c442ac776	fix(watcher): add the missing sqlalchemy select import in file_watcher.py. file_watcher.watch_inbox() uses select(Document) but the sqlalchemy import was missing, causing a NameError. As a result the queue consumer hit max_instances and skipped runs, leaving embed (45 items) + chunk (8 items) stuck in pending. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-15 14:49:46 +09:00
Hyungi Ahn	72910db818	fix(extract): add a PyMuPDF fallback when kordoc fails. kordoc loads the whole PDF into memory to parse it → OOM on image PDFs. PyMuPDF streams page by page and handles 40MB+ PDFs within a few hundred MB. - try kordoc → on failure (OOM/timeout/422) → PyMuPDF fallback if the file is a PDF - if PyMuPDF also finds no text layer, log a warning (scan-only PDF) - HWP/HWPX are kordoc-only (no fallback) - extractor_version tracks which path performed the extraction Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-15 14:30:00 +09:00
Hyungi Ahn	32c79740f8	fix(kordoc): remove the file size limit, apply a 4GiB Docker memory cap. The 25MB file size limit also blocked text PDFs (an 18MB one parses successfully). The real cause is memory blow-up during in-memory parsing of image-scanned PDFs. - extract_worker: remove the 25MB file size limit - docker-compose: kordoc-service mem_limit 4g + memswap_limit 4g - text PDFs → parsed normally regardless of size - image PDFs → Docker OOM-kill above 4GiB → restart → failed after 3 failures Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-15 14:21:08 +09:00
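Hyungi Ahn	21931138e3	fix(extract): skip kordoc parsing for PDFs over 25MB (OOM prevention). Confirmed kordoc crashing with OOM after using 22.8GiB of memory on a 38.2MB PDF. Prevents the container restart from blocking the processing of other documents as well. - files over 25MB: skipped without calling kordoc (size recorded in extractor_version) - files of 25MB or less: processed normally with the existing adaptive timeout Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-15 14:10:18 +09:00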
Hyungi Ahn	5070ac45ff	fix(extract): remove LibreOffice extraction truncation and enlarge summary input - extract_worker: remove the 15000-char LibreOffice truncation (full-text storage principle) - classify_worker/summarize_worker: summary input enlarged from 15000 to 50000 chars - client.py: remove the length-based automatic switch to Claude (complies with the require_explicit_trigger policy); the primary→fallback(exaone3.5) chain in _call_chat is kept Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-15 14:00:23 +09:00
Hyungi Ahn	751cdc5be8	fix(queue): duplicate defense on enqueue paths: partial unique index + central enqueue_stage function. The existing UNIQUE(document_id, stage, status) allowed pending and processing rows to coexist, causing conflicts during stale recovery. Blocked at the root with a 2-layer defense: 1) DB: partial unique index uq_queue_active, allowing at most one active row (pending/processing) per (document_id, stage) 2) App: central enqueue_stage() function, using INSERT ON CONFLICT DO NOTHING to remove the check-then-insert TOCTOU race in all 9 paths. migration 117 includes a guard check: if active duplicates remain it aborts with RAISE EXCEPTION, prompting manual cleanup. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-15 08:37:32 +09:00
Hyungi Ahn	8ec1e53ca4	fix(queue): fix reset_stale_items UniqueViolationError halting all queue consumption. When bulk-UPDATEing stale processing rows to pending, an existing row with the same (document_id, stage, pending) violated the unique constraint and crashed the whole APScheduler consume_queue job. Changed to a 2-step approach: 1) DELETE stale processing rows that have a pending duplicate 2) UPDATE only the rest to pending + swallow exceptions so that a stale-reset failure cannot kill queue consumption as a whole Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-15 07:41:20 +09:00
Hyungi Ahn	5c58778a41	feat(library): doc_purpose field + library upload feature. Adds doc_purpose(business|knowledge) as a first-level distinction between knowledge and work documents. - migration: document_purpose enum + column - AI classification: docPurpose inferred automatically (fills empty values only) - upload API: doc_purpose + library_path Form parameters - library upload: business default + automatic tagging with the selected path - FileInfoView: purpose select (manual change, rollback on failure) - DocumentCard: work/reference badge Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-14 15:26:59 +09:00
Hyungi Ahn	ee74a9ba78	fix(extract): scale kordoc timeout by file size for large PDFs. Large PDFs (14~40MB) were failing on the kordoc parsing timeout (60 s). Adds 60 s per 10MB, with a minimum of 60 s and a maximum of 300 s. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-14 06:47:22 +09:00
Hyungi Ahn	b46a75758b	feat(memos): built-in memo feature: file-less documents (file_type='note'). Builds a replacement for the Memos app into Document Server. Memos are managed as file_type='note' records in the documents table, reusing the existing AI pipeline (classify/embed/chunk/search/ask). Backend: - migration 105: source_channel 'memo', file_path NULL allowed, user_tags/pinned/ask_includable columns, memo indexes - api/memos.py: 7 CRUD endpoints + #tag parsing + stale AI reset + pending-duplicate prevention in the queue - queue_consumer: skip extract/preview for notes - documents API: file_path NULL guard, memos excluded from the list - search /ask: documents with ask_includable=false excluded from evidence Frontend: - /memos timeline page (quick input + feed + inline editing + tag filter) - QuickMemoButton FAB (Ctrl+M, on every page) - Sidebar memo link Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-13 16:00:00 +09:00
Hyungi Ahn	141eb77938	fix(news): allow HTTP redirect for HTTP_EXCEPTION_DOMAINS sources. Fixes redirect targets being blocked for sources such as SCMP (www.scmp.com) whose HTTPS origin 301-redirects to HTTP. allow_http is now decided by whether the source domain is on the allowlist rather than by the origin scheme. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-13 15:05:05 +09:00
Hyungi Ahn	6b189f0d47	fix(digest): multi-word ai_sub_group matching + NYT_API_KEY example - loader.py: dual-key matching on first-token + all-but-last-token (handles Le Monde, Der Spiegel) - chunk_worker.py: strengthened startswith matching - credentials.env.example: add the NYT_API_KEY entry. Hotfix; string matching is due to be removed in step 3 by normalizing to a news_source_id FK Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-13 14:33:04 +09:00
Hyungi Ahn	5038007998	fix(news): SSRF validation + admin auth + API key masking + collect lock + XML safety - new url_validator.py: SSRF blocking (blocks private IP/loopback/link-local/reserved/multicast/CGNAT, HTTPS only) - add the require_admin dependency, applied to source CRUD, /collect, /digest/regenerate - User.is_admin column + migration 104 - NYT API key masked in logs (query string removed) - RSS fetch: manual redirect handling (3 hops, target re-validated), 5MB size limit, content-type allowlist, feed.bozo check - /collect re-entry blocked (asyncio.Lock, single-instance only) - HTTP feed allowlist (code-level constant, not exposed via the API) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-13 14:32:55 +09:00
Hyungi Ahn	010e25cb23	fix(queue): metadata-based doc-level embed + NUL byte strip + empty-exception fallback. embed_worker: - input changed from extracted_text[:6000] to title + ai_summary + tags(top 5) metadata - fixes the structural bug where the cover + table of contents of a 500k-char document got embedded - safe within Ollama's default context (~1500 chars or fewer), no num_ctx adjustment needed - falls back to 800 chars of body when ai_summary < 50 chars - ai_domain excluded initially (avoids taxonomy noise) extract_worker: - \x00 stripped on all 3 paths: kordoc / direct read / LibreOffice - prevents recurrence of asyncpg CharacterNotInRepertoireError queue_consumer: - str(e) or repr(e) or type(e).__name__ fallback - for exceptions with empty messages (24 occurrences), at least the class name is recorded from now on plan: ~/.claude/plans/quiet-meandering-nova.md Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-09 13:45:55 +09:00
Hyungi Ahn	75a1919342	feat(digest): Phase 4 Global News Digest (cluster-level batch summarization). Groups news from a 7-day rolling window by country × topic in 2 levels and generates a batch daily at 04:00 KST. Does not use the search pipeline. documents → clustering → cluster-level LLM summarization → digest. Key decisions: - adaptive threshold (0.75/0.78/0.80) + EMA centroid (α=0.7) + time-decay (λ=ln(2)/3) - min_articles=3, max_topics=10/country, top-5 MMR diversity, ai_summary[:300] truncate - cluster-level LLM only, no-drop fallback (topic_label="주요 뉴스 묶음" + top member ai_summary[:200]) - importance_score normalized to 0~1 per country + raw_weight_sum preserved separately, max(score, 0.01) floor - per-call timeout 25s + pipeline hard cap 600s - DELETE+INSERT idempotent (UNIQUE digest_date), AIClient._call_chat called directly (no client.py changes) New: - migrations/101_global_digests.sql (normalized into 2 tables) - app/models/digest.py (GlobalDigest + DigestTopic ORM) - app/services/digest/{loader,clustering,selection,summarizer,pipeline}.py - app/workers/digest_worker.py (PIPELINE_HARD_CAP + CLI entry point) - app/api/digest.py (/latest, ?date|country, /regenerate, inline Pydantic) - app/prompts/digest_topic.txt (JSON-only + "absolutely forbidden" block) 4 lines in main.py: 2 imports + 1 scheduler add_job + 1 include_router. plan: ~/.claude/plans/quiet-herding-tome.md	2026-04-09 07:45:11 +09:00
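Hyungi Ahn	25ef3996ec	feat(chunk): Phase 1.2-G strengthen embedding input (title + section + text). Phase 1.2-G hybrid retrieval measurements showed Recall stuck at 0.66. Diagnosis: when natural-language queries were tried directly, some correct docs (3854, 3981, 3982, 3920, 3921) did not even make the top-100. The docs' corpus, chunks and embeddings are all fine. Real cause: semantic distance between natural-language queries and statute articles + weak embedding signal from short bodies. - query: '유해화학물질을 다루는 회사가 지켜야 할 안전 의무' (safety obligations that companies handling hazardous chemicals must observe) - body: '화학물질관리법 제4장 유해화학물질 영업자' (Chemical Substances Control Act, Chapter 4, hazardous chemical business operators) - from bge-m3's point of view, the chunk text alone is not enough to recognize these as the same meaning. Solution: include doc.title + section_title in the chunk embedding input. - before: embed(c['text']) - after: embed('[제목] {title}\n[섹션] {section}\n[본문] {text}') Expected effect: - recover matching for short-article documents (3920/3921 etc., around 300 chars) - better natural-language query → statute article semantic matching - Recall 0.66 → 0.72~0.78 Impact: no change to the chunk embedding dimension/structure; only the input text prefix differs. One re-indexing pass is required to regenerate all chunks.	2026-04-08 13:08:23 +09:00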
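Hyungi Ahn	731d1396e8	fix(chunk): sliding-window fallback for English statutes in _chunk_legal. English/foreign statutes (ai_domain Foreign_Law etc.) have no '제N조' pattern, so the split yields a single element → only the preamble chunk (first 1500 chars) is generated and most of the body is lost. Found in: doc 3759 (Industrial Safety, 93KB English) → only 1 chunk generated. Fix: call the _chunk_sliding fallback when the split yields 1 part or fewer. Korean statutes (which have the 제N조 pattern) keep the existing split logic. Found in the Phase 1.2-D smoke test. Must be fixed before re-indexing.	2026-04-08 12:26:38 +09:00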
Hyungi Ahn	378fbc7845	feat(chunk): Phase 0.1 chunk indexing: ORM/worker/migration cleanup. Formally commits Phase 0.1 code that existed only as untracked files on the GPU server: - app/models/chunk.py: DocumentChunk ORM (includes country/source/domain metadata) - app/workers/chunk_worker.py: 6 chunking strategies (legal/news/markdown/email/long_pdf/default) - migrations/014_document_chunks.sql: pgvector + FTS + trigram indexes - app/models/queue.py: add the 'chunk' stage to the ProcessingQueue enum - app/workers/queue_consumer.py: register the chunk stage, automatic classify→[embed,chunk] chaining. Prerequisite for the Phase 1 reranker integration. Used for retrieval based on the document_chunks table.	2026-04-07 13:26:37 +09:00
Hyungi Ahn	49cc86db80	feat: dedicated summarize stage: AI summaries for news (without classify) - summarize_worker: generates summaries only (no classification) - queue_consumer: add the summarize stage (batch 3) - news_collector: enqueue summarize + embed - add 'summarize' to the process_stage enum Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-06 15:00:14 +09:00
Hyungi Ahn	7ca3abf17c	feat: dedicated news page + classification isolation + read status - dedicated /news page: newspaper filter, unread filter, chronological list, preview - news classification isolation: ai_domain='News', classify removed, only embed enqueued - is_read: automatically marked read on click, mark-all-read API - news excluded from the documents list (source_channel != 'news') - news link added to the nav - GET /api/news/articles, POST /api/news/mark-all-read Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-06 14:16:00 +09:00
Hyungi Ahn	a6c19ef76c	feat: automatic news collection system: newspaper RSS/API from 6 countries - news_sources table (source management, dynamic control from the UI) - news_collector worker: RSS (feedparser) + NYT API - duplicate check: hash(title+date+source) + URL normalize - category standardization, summary HTML cleanup, timezone UTC - embed only items within 30 days, per-source try/catch - News API: source CRUD + manual collection trigger - APScheduler: automatic collection every 6 hours - targets: Kyunghyang/Asahi/NYT/Le Monde/Xinhua/Der Spiegel Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-06 13:38:07 +09:00
Hyungi Ahn	bf8efd1cd3	feat: change the embedding model: nomic-embed-text → bge-m3 (1024 dimensions, multilingual) - config.yaml: embedding model → bge-m3 - document.py: Vector(768) → Vector(1024) - embed_worker.py: model version updated - migration 011: vector column recreated (existing embeddings reset) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-06 12:49:45 +09:00
Hyungi Ahn	24142ea605	fix: 5 fixes from the Codex review (1 critical + 4 high) 1. [critical] load the taxonomy from the settings object instead of config.yaml (prevents an import crash) 2. [high] ODF conversion: keep file_path, separate derived_path field (prevents infinite duplication) 3. [high] statute splitting: preserve articles before the first chapter as "서문" (preamble) 4. [high] Inbox: separate review_status field (pending/approved/rejected) 5. [high] deletion: soft-delete (deleted_at) + worker guards + active_documents view - deleted_at IS NULL applied consistently to all queries - queue_consumer: gracefully skip when the row is missing Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-06 07:15:13 +09:00
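Hyungi Ahn	06da098eab	fix: statute splitting: switch to chapter (章) level splitting based on 조문키 000. The national statute XML marks chapter boundaries not with <편>/<장> tags but inside <조문단위 조문키="xxxx000"> elements as "제X장 ..." text. This is parsed to perform the split. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-03 15:05:48 +09:00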
Hyungi Ahn	1668be0a75	fix: commit immediately after saving a statute, so that a notification failure does not roll back the DB Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-03 14:50:39 +09:00
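Hyungi Ahn	93c5805060	feat: statute API overhaul: 26 statutes, split storage, change-history tracking - monitored statutes 12 → 26 (industrial safety/construction/hazardous materials/fire/electricity/gas/labor/environment) - search via lawSearch.do, fetch the body via lawService.do - large statutes stored split by part/chapter (fallback: part→chapter→whole) - storage path: PKM/Inbox/ (ties into AI auto-classification) - on detecting a change, history is recorded automatically in user_note - CalDAV + SMTP notifications Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-03 14:47:08 +09:00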
Hyungi Ahn	6d73e7ee12	feat: classification scheme overhaul: taxonomy + document_type + confidence - config.yaml: define a 6-domain × 3-level taxonomy + 13 document_types - classify.txt: English prompt, taxonomy-path-based classification + injected classification rules - classify_worker: taxonomy validation, confidence-based classification, document_type stored - migration 008: document_type, importance, ai_confidence columns - API: add document_type, importance, ai_confidence to DocumentResponse Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-03 13:32:20 +09:00
Hyungi Ahn	1b5fa95a9f	feat: office → ODF conversion + original/edit-copy separation architecture - add original_path/format/hash + conversion_status fields (migration 007) - extract_worker: after text extraction, ODF conversion such as xlsx→ods, docx→odt - converted copies are stored at .derived/{doc_id}.ods - original metadata preserved (original_path/format/hash) - file_watcher: exclude the .derived/ and .preview/ directories - DocumentViewer: shows the edit button automatically for ODF formats - "편집" (Edit) if edit_url exists, otherwise "Synology Drive에서 열기" (Open in Synology Drive) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-03 13:11:43 +09:00
Hyungi Ahn	733f730e16	fix: missing preview enum + strip thinking from the AI summary + full CLAUDE.md refresh - queue.py: add 'preview' to the process_stage enum - classify_worker: apply strip_thinking() to ai_summary - CLAUDE.md: fully reflects the current architecture (pipeline, UI, infrastructure, coding rules) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-03 12:38:59 +09:00
Hyungi Ahn	6893ea132d	refactor: parallel preview trigger + remove file moves + domain color bar - queue_consumer: enqueue classify + preview together when extract completes - classify_worker: remove _move_to_knowledge(), files stay at their original location - DocumentCard: add a per-domain color bar (4px) on the left Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-03 12:31:57 +09:00
Hyungi Ahn	03b0612aa2	fix: add the missing return in the OFFICE_FORMATS block of extract_worker Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-03 11:28:09 +09:00
Hyungi Ahn	a5186bf4aa	fix: spreadsheet text extraction: use the csv filter (txt:Text is not supported by Calc) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-03 11:21:29 +09:00
Hyungi Ahn	b37043d651	fix: LibreOffice compatibility with Korean file names: copy to an English-named temp file before converting. Applied to both extract_worker and preview_worker. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-03 11:18:06 +09:00
Hyungi Ahn	45448b4036	feat: add LibreOffice text extraction to extract_worker (office formats) - supports xlsx, docx, pptx, odt, ods, odp, odoc, osheet - text extracted with LibreOffice --convert-to txt (60s timeout) - no additional dependencies (uses the LibreOffice already installed in Docker) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-03 11:12:19 +09:00
Hyungi Ahn	4bea408bbd	feat: Markdown editor + PDF conversion pipeline + format-specific viewer branching - Markdown split editor: textarea + marked preview, save with Ctrl+S - PUT /api/documents/{id}/content: saves the original file + refreshes extracted_text - GET /api/documents/{id}/preview: serves the cached PDF preview - preview_worker: LibreOffice headless → PDF conversion (timeout 60s, 1 retry) - queue_consumer: add the preview stage (triggered automatically after embed) - DocumentViewer: branches by format (markdown/pdf/preview-pdf/image/text/cad) - office/CAD documents: edit button that opens in a new tab - Dockerfile: install LibreOffice headless - migration 005: preview_status, preview_hash, preview_at columns Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-03 10:10:03 +09:00
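
Commit 154cb1c8bd describes extracting the outermost JSON object by tracking brace depth while ignoring braces inside string literals, and padding a truncated response with closing braces. The following is only a minimal sketch of that idea, not the repository's `_parse_outermost_json`; the function names `parse_outermost_json` and `_try_load` and the choice to return `None` on failure are assumptions made for illustration.

```python
import json
from typing import Any, Optional


def _try_load(candidate: str) -> Optional[dict]:
    """Return the parsed object if candidate is a valid JSON object, else None."""
    try:
        value: Any = json.loads(candidate)
    except json.JSONDecodeError:
        return None
    return value if isinstance(value, dict) else None


def parse_outermost_json(raw: str) -> Optional[dict]:
    """Hypothetical sketch: extract the outermost {...} from an LLM response."""
    start = raw.find("{")
    if start == -1:
        return None
    depth = 0
    in_string = False
    escaped = False
    for i in range(start, len(raw)):
        ch = raw[i]
        if in_string:
            # Inside a string literal, braces do not count toward depth.
            if escaped:
                escaped = False
            elif ch == "\\":
                escaped = True
            elif ch == '"':
                in_string = False
            continue
        if ch == '"':
            in_string = True
        elif ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth == 0:
                # Found the brace that closes the outermost object.
                return _try_load(raw[start : i + 1])
    if in_string:
        # Cut off inside an open string: padding braces cannot repair this.
        return None
    # Truncated between tokens: append the missing closing braces and retry.
    return _try_load(raw[start:] + "}" * depth)
```

A response cut off inside an open string returns `None` here, which is the case the later commit 93a687a51d handles with a field-level fallback.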


57 Commits
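
Commit ee74a9ba78 scales the kordoc timeout with file size: 60 seconds added per 10MB, clamped between 60 and 300 seconds. One possible reading of that rule is sketched below. The name `kordoc_timeout_seconds`, the 60-second base, and the use of whole 10MB steps are assumptions; the commit message does not say how partial steps are rounded.

```python
MB = 1024 * 1024

MIN_TIMEOUT_S = 60   # floor stated in the commit message
MAX_TIMEOUT_S = 300  # ceiling stated in the commit message
STEP_BYTES = 10 * MB
STEP_SECONDS = 60


def kordoc_timeout_seconds(file_size_bytes: int) -> int:
    """Hypothetical sketch: base 60 s plus 60 s per full 10MB, clamped."""
    extra = (file_size_bytes // STEP_BYTES) * STEP_SECONDS
    return max(MIN_TIMEOUT_S, min(MAX_TIMEOUT_S, MIN_TIMEOUT_S + extra))
```

Under these assumptions a 14MB PDF gets 120 seconds and a 40MB PDF reaches the 300-second cap.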