hyungi_document_server

Author	SHA1	Message	Date
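Hyungi Ahn	63ed4d81e5	feat(study): introduce study_topics learning-workspace container. A first-level container that groups handwriting sessions and materials (library documents) under one study topic (e.g. the Gas Engineer certification, 가스기사). The response structure (sections + stats) is dict-based so that future study assets such as vocabulary lists, audio, and problem sets can join the same group. Data model (migrations 179~185): - study_topics: partial unique on user_id × name (active rows only), soft delete - study_sessions.study_topic_id: 1:N nullable FK (ON DELETE SET NULL) - study_topic_documents: N:M mapping to materials (user_id denormalized for permission isolation) Design principles: - orthogonal to documents.category (the library UI axis) → library facets/categories untouched - StudySession.certification/subject/topic kept (still used as detailed metadata) - study_type is a loose classification (no strict enum; room to extend, e.g. jlpt_n3) - polymorphic study_topic_items permanently banned → add one join table per asset type instead API: /api/study-topics CRUD + /by-document/{id} + material/session mapping endpoints. Frontend: /study/topics list + /study/topics/[id] unified view (two tracks: handwriting and materials) + workspace dropdown in the write form + entry card on the study hub. Follow-ups: PR-2 language-learning UX, PR-3 audio assets, PR-4 AI retrieval scope. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-28 07:06:37 +09:00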
Hyungi Ahn	24bd363beb	feat(library): per-document handwritten notes (PR-D) - take notes alongside while studying on iPad. A "필기" (handwriting) button on the library document detail page opens a HandwriteCanvas below the body. One canvas per user per document (UNIQUE user×document), saved via upsert. Backend: - migrations 177~178: document_notes (user_id, document_id, strokes_json, canvas size) + UNIQUE(user_id, document_id) + index - app/models/document_note.py: DocumentNote ORM - app/api/document_notes.py: · GET /api/documents/{id}/note: fetch the single note (strokes_json=null if none) · PUT /api/documents/{id}/note: upsert (PostgreSQL ON CONFLICT) · DELETE /api/documents/{id}/note · ownership: WHERE user_id=current_user.id (assumes a single user) - app/main.py: register document_notes_router (/api/documents prefix) Frontend: - routes/documents/[id]/+page.svelte: · add a "필기" toggle to the affordance row of library documents (category='library') · on click, load strokes via GET /note → mount HandwriteCanvas below the body card · canvas onChange → auto-save via PUT /note (reusing HandwriteCanvas's internal 3-second idle debounce) · 60vh / min-h-[400px] split; on mobile it flows naturally as a scroll below the body. - HandwriteCanvas reused: documentId is passed in the sessionId prop. The localStorage key is used as-is (namespaced per document).	2026-04-27 12:38:03 +09:00
Hyungi Ahn	49d8f68986	feat(library): track read-through counts for library documents (PR-A backend). Each time the user explicitly clicks "1회독 완료" (finished one read-through) on a library document, the count increases by 1. Opening the detail page does NOT count automatically. Append-only log. Data: - migrations 174~176: document_reads table + 2 indexes (split into single statements) ORM: - app/models/document_read.py: DocumentRead (user_id, document_id, read_at) API (app/api/document_reads.py, /api/documents prefix): - POST /api/documents/{id}/read: read count +1 - GET /api/documents/{id}/read-stats: {read_count, last_read_at} - DELETE /api/documents/{id}/read/last: only the current user's most recent record for that document · ownership: WHERE user_id=current_user.id AND document_id=:doc_id · documents has no user_id (single user). Switching to multi-user requires an added ownership check; this is noted in a code comment. Response extensions: - DocumentResponse: read_count (default 0), last_read_at (default None) - /api/documents/library: read stats mapped with a LEFT JOIN limited to the N rows of the current page (avoids N+1) - /api/library/tree CategoryTreeNode: add unread_count · the existing path_docs already accumulates into ancestors, so it is reused as-is and subpaths are summed automatically Rules (specified by the user; do not change): · multiple clicks on the same day → each is a separate read-through · undoing an accidental click = DELETE /read/last · no read_count column on documents; only a log-based COUNT(*) plan: ~/.claude/plans/scalable-chasing-stonebraker.md Branch: feature/library-reads (kept separate from the handwriting track)	2026-04-27 12:08:36 +09:00
Hyungi Ahn	2df7b24ac9	fix(study): split migration 164 into 10 single-statement files (asyncpg) asyncpg prepared statements allow only a single command. The study_sessions schema, originally one file (CREATE TABLE x2 + CREATE INDEX x8), is split into 10 files following the 143~146 split pattern. 164: CREATE TABLE study_sessions 165~169: 5 study_sessions indexes (partial) 170: CREATE TABLE study_session_assets 171~173: 3 study_session_assets indexes Problem: cannot insert multiple commands into a prepared statement Cause: _run_migrations executes each file as a single prepared statement via conn.exec_driver_sql	2026-04-27 08:18:40 +09:00
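Several commits in this log trace back to the same constraint: `_run_migrations` runs each file as one prepared statement, and asyncpg rejects more than one command there. Below is a minimal sketch of a pre-flight check that would catch a multi-statement file before boot. `count_sql_statements` and `assert_single_statement` are hypothetical helpers, not functions from this repository. The scanner only approximates PostgreSQL's lexer: it skips `--` and `/* */` comments, single-quoted strings, and dollar-quoted bodies, but does not handle nested comments or `E'...'` backslash escapes.

```python
import re

# Dollar-quote opener/closer such as $$ or $body$ (simplified PostgreSQL tag rules).
_DOLLAR_TAG = re.compile(r"\$(?:[A-Za-z_][A-Za-z_0-9]*)?\$")


def count_sql_statements(sql: str) -> int:
    """Count top-level ';'-separated statements, ignoring ';' in comments/strings/bodies."""
    i, n = 0, len(sql)
    count = 0
    has_content = False  # saw non-whitespace since the last ';'
    while i < n:
        if sql.startswith("--", i):  # line comment
            j = sql.find("\n", i)
            i = n if j == -1 else j + 1
            continue
        if sql.startswith("/*", i):  # block comment (nesting not handled)
            j = sql.find("*/", i + 2)
            i = n if j == -1 else j + 2
            continue
        ch = sql[i]
        if ch == "'":  # string literal; '' is an escaped quote
            i += 1
            while i < n:
                if sql[i] == "'":
                    if i + 1 < n and sql[i + 1] == "'":
                        i += 2
                        continue
                    break
                i += 1
            i += 1  # step past the closing quote
            has_content = True
            continue
        m = _DOLLAR_TAG.match(sql, i)
        if m:  # dollar-quoted body: jump past the matching closing tag
            j = sql.find(m.group(0), m.end())
            i = n if j == -1 else j + len(m.group(0))
            has_content = True
            continue
        if ch == ";":
            if has_content:
                count += 1
            has_content = False
        elif not ch.isspace():
            has_content = True
        i += 1
    if has_content:  # trailing statement without a final ';'
        count += 1
    return count


def assert_single_statement(sql: str, filename: str) -> None:
    if count_sql_statements(sql) > 1:
        raise ValueError(f"{filename}: split into one statement per file (asyncpg)")
```

Run over every `migrations/*.sql` in CI, this would have flagged 102, 105, 120, 135, 143, 153, and 164 before they reached a container restart loop.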
Hyungi Ahn	7804f22dce	feat(study): study_sessions backend (Phase 1) - general study sessions for certifications/languages + linked assets. A general study-session backend whose data is shared by iPad handwritten transcription, mobile memorization notes, and mobile quizzes. study_type branches between certification and language. - migrations/164: study_sessions + study_session_assets DDL + 5 partial indexes - app/models/study_session.py: StudySession + StudySessionAsset ORM (cascade) - app/api/study_sessions.py: CRUD + snapshot (PNG) + assets + filter + groups - ownership: every endpoint verifies user_id; a mismatch also returns 404 (prevents information leakage) - 409 on duplicates: pre-check SELECT on UNIQUE(session, document, asset_type, role) + IntegrityError fallback - 422 on invalid enums: study_type / mode / asset_type / role / review_state / order - filter: 11 fields (study_type, certification, language_code, learning_level, subject, topic, review_state, document_id, asset_type, mode, due_before) - groups: certification tree + language tree + has_audio/has_video - snapshot: borrows the atomic rename + error_code pattern from documents.py - app/main.py: register the /api/study-sessions router plan: ~/.claude/plans/scalable-chasing-stonebraker.md Fields unused in Phase 1 (review_state/quiz/ocr/ai_summary/prompt) are nullable; the automated logic will be enabled in separate PRs for Phases 2~4.	2026-04-27 08:15:28 +09:00
Hyungi Ahn	320c02fe29	fix(memo): bump migration number 161 → 163 (collision with PR-B 161/162) At cherry-pick time, main already included PR-B B-2 (161_analyze_events_answerability + 162_analyze_events_answerability_idx), so the numbers collided. On fastapi boot, _parse_migration_files raised a RuntimeError: "migration 버전 중복: 161_..." (duplicate migration version). Renumbered to 163 (the existing 161/162 records in schema_migrations remain valid).	2026-04-24 12:59:39 +09:00
Hyungi Ahn	9d344c87ea	feat(memo): auto-hide completed tasks after 10s with toggle. Items checked more than 10 seconds ago are hidden automatically in dashboard pinned memos and on /memos; a "완료 N개 보기" (show N completed) button in the memo footer toggles them. - migration 161: documents.memo_task_state JSONB: {"<idx>":{"checked_at":"ISO"}} - dedicated endpoint PATCH /memos/{id}/tasks/{task_index}: · SELECT FOR UPDATE blocks races between concurrent toggles · on task_index drift, stale state is cleaned up automatically (200 instead of 400) · AI reprocessing / queue enqueue intentionally skipped + memo_task_toggle_skip_ai log - renderMemoHtml(taskStates, now) → memo-task-hidden class on expired items - Svelte 5 $effect cleanup prevents setInterval leaks	2026-04-24 12:56:55 +09:00
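The hide rule can be stated independently of the Svelte rendering. The sketch below is in Python and makes two assumptions: `checked_at` values are ISO 8601 strings with an explicit offset, and "after 10s" means an elapsed time of at least 10 seconds. `hidden_task_indexes` and `prune_stale_state` are illustrative names, not the repository's `renderMemoHtml` or endpoint code.

```python
from datetime import datetime, timedelta, timezone
from typing import Dict, Optional, Set

HIDE_AFTER = timedelta(seconds=10)


def hidden_task_indexes(
    task_state: Dict[str, dict], task_count: int, now: Optional[datetime] = None
) -> Set[int]:
    """Indexes of checked tasks whose checked_at is at least HIDE_AFTER in the past."""
    now = now or datetime.now(timezone.utc)
    hidden: Set[int] = set()
    for key, entry in task_state.items():
        idx = int(key)
        if idx >= task_count:  # stale entry left over from task_index drift
            continue
        checked_at = datetime.fromisoformat(entry["checked_at"])
        if now - checked_at >= HIDE_AFTER:
            hidden.add(idx)
    return hidden


def prune_stale_state(task_state: Dict[str, dict], task_count: int) -> Dict[str, dict]:
    """Drop entries whose task no longer exists instead of rejecting the request."""
    return {k: v for k, v in task_state.items() if int(k) < task_count}
```

Passing `now` explicitly mirrors the `renderMemoHtml(taskStates, now)` signature: the caller's interval timer decides when to re-evaluate, and the rule itself stays pure.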
Hyungi Ahn	34f79f84f2	feat(search): B-2 switch evidence LLM → 4B triage + answerability columns. The plan's original intent: evidence selection on 4B, synthesis on 26B. - evidence_service: LLM calls switched from primary (26B MLX) → triage (4B Ollama). Ollama handles concurrent requests, so get_mlx_gate() is removed. Synthesis still goes through the llm_gate Semaphore(1) to protect MLX. - prompt_version bumped to v3-evidence-triage (the synthesis prompt itself stays v2-600char; the evidence LLM path change is tracked separately). - migrations 161/162: answerability / partial_basis / suggested_query_count columns + partial index on analyze_events. /ask already records completeness (full/partial/insufficient) in ask_events in production, so the analyze_events columns are reserved for when document analysis adopts the answerability concept. - telemetry: record_analyze_event gains answerability / partial_basis / suggested_query_count parameters. The existing /ask 3-state completeness logic (classifier_service + 7-tier gate) is unchanged; it was already completed in Phase 3.5a. B-2 focuses on redistributing LLM load and extending observability. MLX load reduction: previously each query made 2 MLX calls, evidence (26B) + synthesis (26B). Now evidence (4B Ollama) + synthesis (26B MLX) halves the MLX calls. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-24 10:33:32 +09:00
Hyungi Ahn	6fdc48e5b6	feat(ai): B-1 split summaries into tiers - triage (4B) + deep_summary (26B). Reuses the PR-A policy layer to add a tier triage path to classify_worker. Legacy ai_summary / ai_domain / ai_suggestion are kept (zero regressions); tldr/bullets/detail/inconsistencies are split into separate fields. Migrations (156~160): - 156 documents: 5 columns ai_tldr, ai_bullets, ai_detail_summary, ai_inconsistencies, ai_analysis_tier - 157 ADD VALUE 'deep_summary' to process_stage on its own (works around the Postgres same-transaction restriction) - 158 processing_queue.payload JSONB (carries the envelope) - 159 tier + suppressed_reason on analyze_events - 160 suppressed_reason partial index Models/ORM: - Document: 5 Mapped columns added - ProcessingQueue: deep_summary enum value + payload field, payload option on enqueue_stage - AnalyzeEvent: PR-A's 6 shadow columns + PR-B tier/suppressed_reason Workers: - classify_worker: _run_tier_triage added after the existing legacy path. - _match_subject_domain(doc, text): picks the PR-A policy subject_domain name from source_channel + body keywords + ai_domain prefix (matching on category is forbidden). - R1 TriageOutput pydantic + fallback for broken JSON (triage_json_invalid). - R2 _check_backlog_guard(): suppresses soft escalation when the 30-minute window ratio > threshold OR pending exceeds the limit. Hard escalation passes through. - R3 _slice_text_ranges(): above 260k, 3 slices: head 120k + mid 20k + tail 120k. - on escalation, builds an EscalationEnvelope and enqueues deep_summary with a {envelope, subject_domain} payload. - deep_summary_worker (new): reads envelope + subject_domain from the queue payload → render_26b("p3c_deep_summary", subject_domain) + MLX call (through the llm_gate Semaphore(1)) → stores ai_detail_summary + ai_inconsistencies + ai_analysis_tier='deep'. _filter_inconsistencies lets only the allowed kinds through (version_drift / procedure_conflict / source_conflict / missing_basis); purchasing/contract kinds are dropped. - queue_consumer: deep_summary added to the workers dict + BATCH_SIZE=1. next_stages is untouched: classify → embed/chunk stays as is, and deep_summary is an independent chain.
Telemetry: - record_analyze_event: parameters extended with subject_domain / risk_flags / escalation_reasons / confidence / policy_version / shadow_would_route_to / tier / escalated_to_26b / suppressed_reason. The classify/deep workers record with mode="summary_triage" or "summary_deep". API: - DocumentResponse exposes 5 fields: ai_tldr / ai_bullets / ai_detail_summary / ai_inconsistencies / ai_analysis_tier. Prompts: - classify.txt only gets a DEPRECATED comment (file kept to preserve the rollback path). - PR-A's app/prompts/policy/p3a_short_summary.txt (4B) and p3c_deep_summary.txt (26B) are used as-is. My own summary_triage.txt / summary_deep.txt duplicated them, so they were deleted before being committed rather than removed in a separate commit. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-24 10:22:40 +09:00
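R3's slicing is simple enough to sketch. This version assumes the 260k/120k/20k figures are character counts and that the 20k middle slice is centered on the text's midpoint. The commit does not say where `mid` is taken from, and `slice_text_ranges` here is a stand-in for `_slice_text_ranges`, not a copy of it.

```python
from typing import List

LIMIT = 260_000  # assumed to be characters
HEAD, MID, TAIL = 120_000, 20_000, 120_000


def slice_text_ranges(text: str) -> List[str]:
    """Return the text whole, or head + centered middle + tail slices when it is too long."""
    if len(text) <= LIMIT:
        return [text]
    mid_start = len(text) // 2 - MID // 2
    return [text[:HEAD], text[mid_start:mid_start + MID], text[-TAIL:]]
```

The input is strictly longer than 260k whenever slicing happens, and head + mid + tail total exactly 260k, so the three slices never overlap.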
Hyungi Ahn	c9e8dd0ba1	fix(db): split migration 153 for asyncpg single-statement limit. Because asyncpg's exec_driver_sql does not allow multiple commands in a prepared statement, init_db() failed at deploy time with PostgresSyntaxError: cannot insert multiple commands into a prepared statement. 153 is reduced to a single ALTER TABLE (10 ADD COLUMN) and the 2 partial indexes are split into 154/155: - 153_analyze_events_shadow.sql: ALTER TABLE ADD COLUMN (single statement) - 154_analyze_events_shadow_idx_ts.sql: idx_analyze_events_shadow_ts - 155_analyze_events_policy_violation_idx.sql: idx_analyze_events_policy_violation Deploy test: after rebuilding the GPU fastapi container, init_db applies the three files 153/154/155 in order (asyncpg prepared statement: one statement per file). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-24 09:38:40 +09:00
Hyungi Ahn	301867d0ee	feat(db): migration 153 - analyze_events shadow columns. Adds 10 shadow columns via ALTER TABLE analyze_events ADD COLUMN IF NOT EXISTS: subject_domain, risk_flags[], high_impact_task, escalated_to_26b, escalation_reasons[], confidence, policy_violation, policy_violation_ids[], shadow_would_route_to, policy_version. + 2 partial indexes: - idx_analyze_events_shadow_ts (shadow_would_route_to IS NOT NULL) - idx_analyze_events_policy_violation (policy_violation=true) All nullable, default NULL. Nothing writes them yet; PR-B's DBShadowLogger will add the writer. Number 153: 152 is taken by `feat(category): law` (`e88640d`). No BEGIN/COMMIT (CLAUDE.md: _run_migrations uses a single outer transaction). answerability / new_facts_count are owned by PR-B's migration 154+. plan: ~/.claude/plans/wise-gliding-hippo.md Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-24 09:34:49 +09:00
Hyungi Ahn	e88640d3d8	feat(category): separate the law category - enum + backfill + classify skip - migrations/152: ALTER TYPE doc_category ADD VALUE 'law' (DDL only; because of PG16's single-transaction restriction, the backfill is separate) - models/document.py: add 'law' to the Enum (7 active + 3 reserved) - workers/law_monitor.py: Document(..., category='law'), set for new arrivals onward - workers/classify_worker.py: early return for source_channel='law_monitor' with minimal fields (ai_domain='법령', ai_tags=['법령'], importance='medium'). AI classify is skipped: statutes have a fixed structure, an external source of truth, and are re-collected automatically - scripts/backfill_category.py: law branch + re-targeted WHERE ((source_channel='law_monitor' AND category='document')) + VERIFY cat_law/law_source_count + failure conditions - api/documents.py: add law_monitor to the default list exclusion (same pattern as news) - api/dashboard.py: exclude law_monitor in the documents count FILTER (category_counts.law is exposed automatically by the existing GROUP BY category) - frontend/Sidebar.svelte: the '법령 알림' (law alerts) button changes from ?source=law_monitor → ?category=law (the explicit category path skips the default exclusion) plan: ~/.claude/plans/stateless-churning-raccoon.md Axis principle: category = UI axis, policy/telemetry = source_channel + ai_domain axis (feedback_category_vs_ai_domain_axis.md) Deploy order: push → GPU pull → compose up --build fastapi frontend → backfill --dry-run → --apply. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-24 09:14:56 +09:00
Hyungi Ahn	1e2c004dd4	feat(media): §3 audio STT + video playback infrastructure plan: ~/.claude/plans/luminous-sprouting-hamster.md §3 Schema: - migrations/147_audio_segments_table.sql: audio_segments (timestamped STT segments) - migrations/148_audio_segments_idx.sql: (document_id, start_s) idx - migrations/149_document_media_cols.sql: documents.thumbnail_path + needs_conversion - migrations/150_queue_stage_stt.sql: process_stage += 'stt' - migrations/151_queue_stage_thumbnail.sql: process_stage += 'thumbnail' - app/models/audio_segment.py, document.py (thumbnail_path/needs_conversion) Services: - services/stt/{Dockerfile, requirements.txt, server.py}: faster-whisper large-v3 GPU container. /transcribe (filePath/langs/beamSize) + /health + /ready (cuda device_count + model_loaded). NFC/NFD path resolver (lesson from OCR). - docker-compose.yml: stt-service added (1 GPU reserved, :3300, NAS ro mount, stt_models volume, start_period 300s), STT_ENDPOINT in the fastapi env. Pipeline (depends on §1 category): - new app/workers/stt_worker.py: picks up stage='stt' → calls STT_ENDPOINT → stores extracted_text + audio_segments. Timeout 30 minutes. - new app/workers/thumbnail_worker.py: one ffmpeg frame at the 50% mark → PKM/Videos/.thumbs/{id}.jpg + sets thumbnail_path. Skips needs_conversion=true. - app/workers/file_watcher.py extended: scans PKM/{Inbox, Recordings, Videos}. Extension → category, audio → stage=stt, video .mp4/.webm → stage=thumbnail, video .mov/.mkv/.avi → needs_conversion=true + no stage. Skips the settings.roon_library_path prefix. - app/workers/queue_consumer.py extended: stt + thumbnail workers registered, BATCH_SIZE (stt=1, thumbnail=3), stt→[classify] added to next_stages (audio skips extract). - app/Dockerfile: ffmpeg added (for the thumbnail subprocess). API (depends on §1): - /api/audio/{id}/segments: AudioSegment ORDER BY start_s - /api/video/{id}/thumbnail: thumbnail_path FileResponse (query token) - /api/documents/{id}/file: audio/video MIME types in media_types (already included in the §2 commit). Starlette FileResponse handles Range automatically. - upload_document: rejects web uploads of .mov/.mkv/.avi (error_code unsupported_codec). NAS drops are accepted into quarantine by file_watcher.
Frontend: - AudioPlayer.svelte: HTML5 audio + sticky transcript-segment panel + click a line to seek. activeIdx highlight. - VideoPlayer.svelte: HTML5 video direct play + needs_conversion notice card. poster comes from the thumbnail endpoint. - /audio (list grid) + /audio/[id] (player) - /video (thumbnail grid + needs-conversion badge) + /video/[id] (player) - Sidebar.svelte: Mic/Film icons + audio/video nav enabled, count badges (reusing §2 /stats/category-counts). Config: - app/core/config.py: stt_endpoint + roon_library_path. DoD post-deploy smoke: /ready cuda:true, transcribe a meeting mp3, classify proceeds without audio extract (queue regression), /audio playback, .mp4 playback, .mov web upload 400, .mov NAS quarantine, Sidebar nav + counts. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-24 06:47:36 +09:00
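The file_watcher routing rules can be sketched as a pure function. The video extension sets come from the commit. The audio extension set, the `audio`/`video` category names, and the `route_media` helper itself are assumptions. Non-media files fall through (`None`) to whatever path the watcher already uses.

```python
from pathlib import PurePosixPath
from typing import Optional

AUDIO_EXTS = {".mp3", ".m4a", ".wav", ".flac", ".ogg"}  # assumed list
VIDEO_DIRECT = {".mp4", ".webm"}                         # browser-playable -> thumbnail stage
VIDEO_NEEDS_CONVERSION = {".mov", ".mkv", ".avi"}        # no stage, flagged for conversion


def route_media(path: str, roon_prefix: Optional[str] = None) -> Optional[dict]:
    """Decide category/stage for a newly seen media file, or None if not handled here."""
    p = PurePosixPath(path)
    if roon_prefix and p.is_relative_to(roon_prefix):  # skip the music library
        return None
    ext = p.suffix.lower()
    if ext in AUDIO_EXTS:
        return {"category": "audio", "stage": "stt", "needs_conversion": False}
    if ext in VIDEO_DIRECT:
        return {"category": "video", "stage": "thumbnail", "needs_conversion": False}
    if ext in VIDEO_NEEDS_CONVERSION:
        return {"category": "video", "stage": None, "needs_conversion": True}
    return None
```

Keeping the decision pure makes the DoD cases (mp3 → stt, .mp4 → thumbnail, .mov → conversion flag) testable without a NAS mount.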
Hyungi Ahn	aceb54e586	fix(migrations): split 143 for the asyncpg multi-statement limit. asyncpg prepared statements support only a single command (the core/database.py exec_driver_sql path). §1's 143_category.sql had 4 statements (TYPE + ALTER + INDEX×2), so fastapi failed at boot with asyncpg.PostgresSyntaxError "cannot insert multiple commands into a prepared statement" → container restart loop. 143 is split into 4 files: 143: CREATE TYPE doc_category 144: ALTER TABLE documents ADD category / ai_suggestion 145: CREATE INDEX idx_documents_category 146: CREATE INDEX idx_documents_has_suggestion (partial) The DB state is clean (migration 143 was not partially applied; asyncpg rejects the whole batch). 143 is also not recorded in schema_migrations, so re-running is safe. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-23 15:46:00 +09:00
Hyungi Ahn	8fdea88676	feat(documents): §1 category enum + ai_suggestion approval pipeline plan: ~/.claude/plans/luminous-sprouting-hamster.md §1 - migrations/143_category.sql: doc_category enum (6 active + 3 reserved) + documents.category + documents.ai_suggestion JSONB + 2 idx. - app/models/document.py: category (Enum, create_type=False), ai_suggestion (JSONB). - app/prompts/classify.txt: add 7 business doctypes to the document_type enum (발주서 purchase order / 세금계산서 tax invoice / 명세표 statement / 도면 drawing / 증명서 certificate / 계획서 plan / 시방서 specification) + facet_doctype field directive. - config.yaml: add the 7 entries to document_types (passes worker validation). - app/workers/classify_worker.py: FACET_DOCTYPES / LIBRARY_SUGGESTION_DOCTYPES constants, facet_doctype parsing (never overwrites an existing value); when 발주서/세금계산서/명세표 is detected, ai_suggestion={proposed_category=library, proposed_path=@library/거래/{YYYY}/{doctype}, source_updated_at=doc.updated_at.isoformat(), ...}. No automatic transition of category / user_tags (suggestion only). - app/api/documents.py: · expose category / ai_suggestion in DocumentResponse · GET /documents ?category=<cat> / ?has_suggestion / ?proposed_category (when category is specified, the default news/memo exclusion is lifted; this is the §2 approval UI contract) · GET /documents/library reimplemented on Document.category=='library' (the path subquery still uses user_tags: the shelf path inside the category) · POST /documents/{id}/accept-suggestion: FOR UPDATE + idempotent no-op + dual 409 stale check (payload source_updated_at / documents.updated_at) + idempotent append to user_tags · DELETE /documents/{id}/suggestion: idempotent, no stale check - scripts/backfill_category.py: dry-run / apply. Mapping (news/memo/@library/else) + 3-way relative verification (all_rows==categorized, uncategorized==0, cat_library==has_library_tag; the last one verifies the no-automatic-transition policy). Remaining DoD (after remote deploy): docker compose up → apply migration 143 → backfill apply → smoke (uploading a 발주서 via drive_sync creates a suggestion / category unchanged, accept-suggestion idempotency + both 409 stale vectors, /documents?category=library and /documents/library return the same count). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-23 15:32:01 +09:00
Hyungi Ahn	893ecacc0e	ops(migrations): defer 142 NOT NULL until enabled after 1 week of production observation. Prevents migration 142 (ALTER COLUMN source SET NOT NULL) from being applied automatically. _run_migrations uses a non-recursive glob('*.sql') → _deferred/ is ignored. Activation procedure (see D7): - after applying 138~141 and 7 days in production, confirm SELECT COUNT(*) FROM ask_events WHERE source IS NULL AND created_at > <deploy> = 0 - git mv migrations/_deferred/142_*.sql migrations/142_*.sql - docker compose restart fastapi (init_db applies it automatically) Reason: lock it only after empirically verifying that the new code never omits source. Once NOT NULL is in place, an attempted NULL INSERT makes the ask_events write fail (data loss). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-17 08:11:06 +09:00
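The `_deferred/` trick works because a plain `Path.glob('*.sql')` does not descend into subdirectories. Below is a sketch of a migration lister that relies on that behavior and also raises on duplicate version numbers (the failure mode behind the 161 → 163 renumbering). `list_migrations` is hypothetical; it only mirrors the behavior the commits describe for `_run_migrations` / `_parse_migration_files`.

```python
import re
import tempfile
from pathlib import Path
from typing import Dict, List, Tuple

_MIGRATION_NAME = re.compile(r"^(\d+)_.+\.sql$")


def list_migrations(migrations_dir: Path) -> List[Tuple[int, Path]]:
    """Return (version, path) pairs sorted by version.

    glob("*.sql") is non-recursive, so files parked in migrations/_deferred/
    are not picked up until they are moved back up one level.
    """
    found: Dict[int, Path] = {}
    for path in migrations_dir.glob("*.sql"):
        m = _MIGRATION_NAME.match(path.name)
        if not m:
            continue
        version = int(m.group(1))
        if version in found:
            raise RuntimeError(
                f"duplicate migration version {version}: {found[version].name}, {path.name}"
            )
        found[version] = path
    return sorted(found.items())


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        root = Path(d)
        (root / "141_backfill.sql").write_text("SELECT 1")
        (root / "_deferred").mkdir()
        (root / "_deferred" / "142_not_null.sql").write_text("SELECT 1")
        print([v for v, _ in list_migrations(root)])  # [141]
```

Activation is then just a file move. No flag or table row is needed, which matches the `git mv` step in the procedure.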
Hyungi Ahn	09883d0358	feat(ask): Phase 3.5 A0 - ask_events source/eval_case_id + eval auth boundary - migrations 138~142: add source TEXT DEFAULT 'document_server' + eval_case_id TEXT, 2 indexes, backfill, NOT NULL after 1 week of observation (140 applied separately) - app/models/ask_event.py: source / eval_case_id ORM fields (nullable during the 138~141 stage) - app/services/search_telemetry.py: source / eval_case_id added to the record_ask_event signature - app/core/config.py: settings.eval_runner_token + load the EVAL_RUNNER_TOKEN env - app/api/search.py: - read the X-Source / X-Eval-Case-Id / X-Eval-Token headers - _resolve_eval_identity(): verify the token with hmac.compare_digest; on failure, downgrade source to 'document_server' + warning log + eval_case_id=None - pass the verified source/eval_case_id to both record_ask_event calls - credentials.env.example: EVAL_RUNNER_TOKEN= (empty default = every eval claim is rejected) - tests/test_ask_eval_auth.py: 9 cases: no token/wrong/matching, env unset, case_id only, non-eval source forces case_id None Trust boundary: ordinary clients sending X-Source=eval / X-Eval-Case-Id are ignored, so they cannot pollute calibration telemetry. Only the eval runner authenticates, using EVAL_RUNNER_TOKEN. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-17 08:11:06 +09:00
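Below is a sketch of the eval-identity check. It assumes the header values arrive as optional strings and that non-eval sources pass through unchanged; other sanitization may happen elsewhere in the real code. `resolve_eval_identity` is a stand-in for `_resolve_eval_identity`, not its actual body. Comparing bytes with `hmac.compare_digest` keeps the comparison constant-time and avoids the function's restriction that `str` arguments be ASCII-only.

```python
import hmac
import logging
from typing import Optional, Tuple

log = logging.getLogger("eval_auth_sketch")
DEFAULT_SOURCE = "document_server"


def resolve_eval_identity(
    x_source: Optional[str],
    x_eval_case_id: Optional[str],
    x_eval_token: Optional[str],
    configured_token: Optional[str],
) -> Tuple[str, Optional[str]]:
    """Return (source, eval_case_id) to record; only a valid token keeps source='eval'."""
    source = x_source or DEFAULT_SOURCE
    if source != "eval":
        return source, None  # case ids are only trusted for eval traffic
    token_ok = (
        bool(configured_token)  # empty/unset env rejects every eval claim
        and bool(x_eval_token)
        and hmac.compare_digest(x_eval_token.encode(), configured_token.encode())
    )
    if not token_ok:
        log.warning("eval claim rejected; downgrading source to %s", DEFAULT_SOURCE)
        return DEFAULT_SOURCE, None
    return "eval", x_eval_case_id
```

The downgrade-instead-of-reject design means a misconfigured client still gets its answer; only the calibration labels are withheld.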
Hyungi Ahn	8a8096a444	feat(api): Phase E.2 - analyze_events table + logging. Records POST /documents/{id}/analyze calls in the DB, with failure mode classification + source identification. - migrations/137: analyze_events table (doc_id FK, mode, truncated, layers_returned JSONB, cached, latency_ms, error_code, source TEXT NOT NULL DEFAULT 'document_server', prompt_version) - ORM: new models/analyze_event.py - services/document_telemetry.py: record_analyze_event() + sanitize_source() enforcing a server-side fallback (values outside the enum → unknown, None → document_server) - app/api/documents.py: · add X-Source header + BackgroundTasks dependency · try/finally pattern so every exit (success/cache/error) schedules a background insert · error_code: None (success) | not_found | no_text | timeout | llm | parse | missing_summary When nanoclaude calls with the X-Source: synology_chat header in Phase F, the source can be distinguished. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-16 13:58:58 +09:00
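The try/finally pattern guarantees exactly one telemetry record per call, whatever the exit path. This sketch rests on a few assumptions. The allowed source set is a guess: the commit names only `document_server`, `synology_chat`, and the `unknown` fallback. `record` stands in for scheduling the insert on `BackgroundTasks`. `AnalyzeError` is a hypothetical exception carrying one of the listed error codes.

```python
from typing import Callable, Optional, TypeVar

T = TypeVar("T")
ALLOWED_SOURCES = {"document_server", "synology_chat"}  # assumed; real enum not listed


def sanitize_source(raw: Optional[str]) -> str:
    if raw is None:
        return "document_server"
    return raw if raw in ALLOWED_SOURCES else "unknown"


class AnalyzeError(Exception):
    """Carries one of the commit's error codes (not_found, no_text, llm, parse, ...)."""

    def __init__(self, code: str) -> None:
        super().__init__(code)
        self.code = code


def analyze_with_telemetry(
    work: Callable[[], T], record: Callable[[dict], None], x_source: Optional[str]
) -> T:
    source = sanitize_source(x_source)
    error_code: Optional[str] = None  # None means success
    try:
        return work()
    except TimeoutError:
        error_code = "timeout"
        raise
    except AnalyzeError as exc:
        error_code = exc.code
        raise
    # other exceptions are assumed to be wrapped in AnalyzeError by the caller
    finally:
        # runs on every exit path; the real endpoint hands this to BackgroundTasks
        record({"source": source, "error_code": error_code})
```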
Hyungi Ahn	72b7e65fca	fix(migration): split asyncpg multi-statement migration (135/136). Same pattern as `a842c65`. asyncpg allows only a single SQL statement per prepared statement. - 135: ALTER TABLE only, semicolon removed - 136: CREATE INDEX in a separate file Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-16 13:54:15 +09:00
Hyungi Ahn	59e38d80b0	feat(api): Phase E.1 - extend ask_events measurement fields (answer_length/prompt_version). Adds baseline fields for comparing before/after the E.3 400→600-character tuning and for the step-5 failure mode analysis. - migrations/135: answer_length/covered_aspects/missing_aspects/model_name/prompt_version columns + prompt_version index - ORM: map the same 5 fields in ask_event.py - prompt_versions.py: ASK_PROMPT_VERSION="search_synthesis.v1-400char" constant + resolve_primary_model() helper - search_telemetry.record_ask_event: 5 keyword-only fields added to the signature (backward compatible) - search.py: pass the new fields at both call sites (refused + success). answer_length is len(sr.answer or ""); model_name/prompt_version come from the constants module. The existing call structure (already inserting into the DB via search_telemetry + background_tasks) is kept. A pure extension commit. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-16 13:52:14 +09:00
Hyungi Ahn	a842c650d8	fix(migration): split asyncpg multi-statement migration. asyncpg cannot run multiple SQL statements in one prepared statement. COMMENT removed; only ALTER TABLE kept. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-15 15:10:01 +09:00
Hyungi Ahn	088966bf78	feat(extract): OCR trigger rules + extract_meta JSONB. Automatic OCR trigger for scanned PDFs/images + result quality check + one-attempt limit. - extract_meta JSONB column added (migration 134): ocr_attempted, ocr_reason, ocr_skip_reason, ocr_terminal, ocr_chars - PDF OCR trigger: total_chars < 300, or avg < 80 && total < 3000 - automatic OCR for images: jpg/png/tiff/webp, etc. - quality thresholds by type: images 50 chars; PDFs 200 chars or 30 chars per page - caps: pages > 200 or file_size > 150MB → skip - one OCR attempt only: extract_meta.ocr_attempted prevents retries - extractor_version holds only the tool name (surya_ocr/pymupdf/kordoc) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-15 15:04:13 +09:00
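The trigger and quality thresholds above can be collected into two predicates. The image extension list, the MiB interpretation of 150MB, the order of the checks, and the reason strings are assumptions; only the numeric thresholds come from the commit.

```python
from typing import Dict, Tuple

IMAGE_EXTS = {".jpg", ".jpeg", ".png", ".tif", ".tiff", ".webp"}  # "etc." in the commit
MAX_PAGES = 200
MAX_BYTES = 150 * 1024 * 1024  # assumes 150MB means MiB


def should_ocr(
    ext: str, page_count: int, total_chars: int, file_size: int, extract_meta: Dict
) -> Tuple[bool, str]:
    """Return (run_ocr, reason). Reason strings are hypothetical labels."""
    if extract_meta.get("ocr_attempted"):
        return False, "already_attempted"  # one attempt per document
    if page_count > MAX_PAGES or file_size > MAX_BYTES:
        return False, "too_large"
    if ext in IMAGE_EXTS:
        return True, "image"
    if ext == ".pdf":
        avg = total_chars / page_count if page_count else 0.0
        if total_chars < 300 or (avg < 80 and total_chars < 3000):
            return True, "low_text"
    return False, "not_needed"


def ocr_output_ok(ext: str, ocr_chars: int, page_count: int) -> bool:
    """Quality gate: images need 50 chars; PDFs need 200 chars or 30 chars/page."""
    if ext in IMAGE_EXTS:
        return ocr_chars >= 50
    per_page = ocr_chars / page_count if page_count else 0.0
    return ocr_chars >= 200 or per_page >= 30
```

The `avg < 80 && total < 3000` clause catches mostly scanned PDFs that still carry a little text layer (headers, page numbers), which a plain `total < 300` check would miss.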
Hyungi Ahn	ba19c6fb79	feat(library): Phase 2A groundwork for facet browsing - columns + API + filters. Adds 4 axes to the documents table: facet_company/topic/year/doctype. facet_values dictionary table + CRUD API. facet-counts aggregation API (supports cross-filtering). Facet filter parameters added to the document list API. Facet fields included in the DocumentResponse/DocumentUpdate schemas. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-15 10:09:25 +09:00
Hyungi Ahn	32aab7784b	fix(library): split migrations into single statements for asyncpg. asyncpg cannot run multiple SQL statements in one prepared statement. Split into 120 (table) → 121 (unique idx) → 122 (parent idx) → 123 (seed). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-15 10:03:00 +09:00
Hyungi Ahn	964d4ffc67	feat(library): manage the library taxonomy independently (Phase 1). The new library_categories table makes it possible to create empty categories. CRUD API (create / leaf rename / leaf delete) + tree merge endpoint. Context menu on the sidebar tree (add / rename / delete). LibraryPathEditor switched to a category-based flat selector. "Uncategorized" is protected as a system category (cannot be deleted or renamed). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-15 10:01:53 +09:00
Hyungi Ahn	7c78c09046	fix(queue): split the migration into 3 single-statement files. asyncpg prepare cannot handle multiple statements. Run sequentially: 117 (stale cleanup) → 118 (drop constraint) → 119 (create partial unique index). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-15 08:40:19 +09:00
Hyungi Ahn	f0c7d4c2c2	fix(queue): remove DO $$ BEGIN from migration 117 (avoid BEGIN validation). _validate_sql_content falsely flagged PL/pgSQL's BEGIN as a transaction control statement. The guard check is removed; the migration now relies on CREATE UNIQUE INDEX itself failing on duplicates. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-15 08:38:49 +09:00
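The commit sidesteps the false positive by removing the `DO` block. An alternative, sketched here only for contrast and not what the repository does, is to blank out dollar-quoted bodies and line comments before looking for transaction control at the start of a statement. `_validate_sql_content` itself is not shown in the log; both functions below are hypothetical.

```python
import re

# A dollar-quoted body: $$...$$ or $tag$...$tag$. The empty alternative keeps
# group 1 participating, so the \1 backreference also matches plain $$.
_DOLLAR_BODY = re.compile(r"\$([A-Za-z_][A-Za-z_0-9]*|)\$.*?\$\1\$", re.DOTALL)
_LINE_COMMENT = re.compile(r"--[^\n]*")
# Transaction control only counts at the start of the file or right after a ';'.
_TX_CONTROL = re.compile(
    r"(?:^|;)\s*(?:BEGIN|START\s+TRANSACTION|COMMIT|ROLLBACK)\b", re.IGNORECASE
)


def naive_has_tx_control(sql: str) -> bool:
    """The false-positive-prone check: any BEGIN/COMMIT/ROLLBACK word anywhere."""
    return re.search(r"\b(?:BEGIN|COMMIT|ROLLBACK)\b", sql, re.IGNORECASE) is not None


def has_tx_control(sql: str) -> bool:
    """Ignore PL/pgSQL bodies and line comments before looking for transaction control."""
    stripped = _DOLLAR_BODY.sub("$$ $$", sql)
    stripped = _LINE_COMMENT.sub("", stripped)
    return _TX_CONTROL.search(stripped) is not None
```

Even with a smarter validator, the `DO` block would still be rejected by asyncpg if it sat next to a second statement, so the file split in the surrounding commits remains necessary.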
Hyungi Ahn	751cdc5be8	fix(queue): guard the enqueue paths against duplicates - partial unique index + central enqueue_stage function. The existing UNIQUE(document_id, stage, status) allowed a pending and a processing row to coexist, causing conflicts during stale recovery. A 2-layer defense blocks this at the root: 1) DB: partial unique index uq_queue_active allows at most one active row (pending/processing) per (document_id, stage) 2) App: the central enqueue_stage() function uses INSERT ON CONFLICT DO NOTHING, removing the check-then-insert TOCTOU race on all 9 paths. Migration 117 includes a guard check: if active duplicates remain, it aborts with RAISE EXCEPTION and prompts manual cleanup. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-15 08:37:32 +09:00
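The DB-plus-app defense can be demonstrated end to end with Python's built-in `sqlite3`, standing in for PostgreSQL here. SQLite supports both partial unique indexes and `INSERT ... ON CONFLICT DO NOTHING` (3.24+), which is enough to show the behavior. Table and function names follow the commit, but the schema is trimmed to the columns the constraint needs, and `enqueue_stage` is a sketch, not the repository's function.

```python
import sqlite3


def create_queue(conn: sqlite3.Connection) -> None:
    conn.execute(
        "CREATE TABLE processing_queue ("
        " id INTEGER PRIMARY KEY,"
        " document_id INTEGER NOT NULL,"
        " stage TEXT NOT NULL,"
        " status TEXT NOT NULL)"
    )
    # At most one active (pending/processing) row per (document_id, stage);
    # finished rows fall outside the index and can accumulate freely.
    conn.execute(
        "CREATE UNIQUE INDEX uq_queue_active ON processing_queue (document_id, stage)"
        " WHERE status IN ('pending', 'processing')"
    )


def enqueue_stage(conn: sqlite3.Connection, document_id: int, stage: str) -> bool:
    """Insert a pending row unless an active one exists; True if a row was added.

    One INSERT ... ON CONFLICT DO NOTHING replaces SELECT-then-INSERT, so two
    concurrent callers cannot both pass a separate existence check.
    """
    cur = conn.execute(
        "INSERT INTO processing_queue (document_id, stage, status)"
        " VALUES (?, ?, 'pending') ON CONFLICT DO NOTHING",
        (document_id, stage),
    )
    return cur.rowcount == 1
```

Because the index covers `pending` and `processing` together, a row stuck in `processing` also blocks a fresh `pending` row, which is exactly the pair the old `UNIQUE(document_id, stage, status)` let through.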
Hyungi Ahn	9363cdcc61	fix(library): split the migration in two (avoid BEGIN validation). The DO $$ BEGIN block was falsely flagged as a transaction BEGIN. CREATE TYPE / ALTER TABLE moved into separate migrations. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-14 15:31:16 +09:00
Hyungi Ahn	d01617e2bc	fix(library): fix asyncpg multiple statement error in migration. asyncpg cannot put several commands in one prepared statement. CREATE TYPE + ALTER TABLE merged into a single DO $$ block. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-14 15:30:06 +09:00
Hyungi Ahn	5c58778a41	feat(library): doc_purpose field + library upload. Adds doc_purpose (business|knowledge) as a first-level split between knowledge and business documents. - migration: document_purpose enum + column - AI classification: infers docPurpose automatically (fills empty values only) - upload API: doc_purpose + library_path Form parameters - library upload: business by default + automatic tagging with the selected path - FileInfoView: purpose select (manual change, rollback on failure) - DocumentCard: business/reference badge Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-14 15:26:59 +09:00
Hyungi Ahn	deb5c1b704	feat(library): library - tag-based tree document management. Adds a library page that classifies purpose-built documents (forms, templates, annual reports) with @library/ tags and lets you browse them as a tree. Backend: path normalization util, library-tree/library endpoints, improved download Content-Disposition (original/PDF separated, Korean filename*) Frontend: /library page, LibraryPathEditor, top nav/sidebar links Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-14 14:55:45 +09:00
Hyungi Ahn	e435332ea1	feat(memos): UX improvements - edit fixes, title removal, checkboxes, archive. Phase A: edit bug fix (PATCH content only, Ctrl+Enter/Esc), title UI removed (auto-generated, 80 characters, internal use), lighter cards. Phase B: GFM task list support, interactive toggle keyed by taskIndex, DOMPurify minimally allows checkbox, optimistic update + rollback. Phase C: archived column (memo UX only, not exposed on documents), idempotent set API (not a toggle), separate queries for active/archive views, pins apply to active memos (ignored when archived). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-14 08:36:16 +09:00
Hyungi Ahn	3c5844e287	fix(memos): use DROP CONSTRAINT (a UNIQUE constraint cannot be removed with DROP INDEX) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-13 16:03:35 +09:00
Hyungi Ahn	e3a065d15d	fix(memos): split the migration into separate files (asyncpg has no multi-statement support). Startup failed because asyncpg prepared statements do not support multiple commands. The single file 105 is split into individual statements 105-112. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-13 16:02:45 +09:00
Hyungi Ahn	b46a75758b	feat(memos): built-in memos - file-less documents (file_type='note'). Builds a replacement for the Memos app into the Document Server. Memos are managed as file_type='note' records in the documents table so the existing AI pipeline (classify/embed/chunk/search/ask) is reused. Backend: - migration 105: source_channel 'memo', nullable file_path, user_tags/pinned/ask_includable columns, memo indexes - api/memos.py: 7 CRUD endpoints + #tag parsing + reset of stale AI fields + prevention of duplicate pending queue entries - queue_consumer: skip extract/preview for notes - documents API: NULL guard on file_path, memos excluded from lists - search /ask: documents with ask_includable=false excluded from evidence Frontend: - /memos timeline page (quick input + feed + inline editing + tag filter) - QuickMemoButton FAB (Ctrl+M, on every page) - Sidebar memo link Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-13 16:00:00 +09:00
Hyungi Ahn	5038007998	fix(news): SSRF validation + admin auth + API key masking + collect lock + XML safety - new url_validator.py: SSRF blocking (blocks private IP/loopback/link-local/reserved/multicast/CGNAT, HTTPS only) - add require_admin dependency, applied to source CRUD, /collect, /digest/regenerate - User.is_admin column + migration 104 - NYT API key masked in logs (query string stripped) - RSS fetch: manual redirect handling (3 hops, target re-validated), 5MB size limit, content-type allowlist, feed.bozo check - /collect re-entry blocked (asyncio.Lock, single instance only) - HTTP feed allowlist (code-level constant, not exposed through the API) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-13 14:32:55 +09:00
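Below is a sketch of the address checks using the standard `ipaddress` module. The 100.64.0.0/10 (CGNAT) range is tested explicitly because, depending on the Python version, `is_private` does not cover it. The resolver is injectable so the check can be tested without DNS. Resolving once and connecting later still leaves a DNS-rebinding window, which is one reason to re-validate on every redirect hop as the commit does. `url_validator.py`'s real interface is not in the log, so these names are hypothetical, and the HTTP feed allowlist is omitted.

```python
import ipaddress
import socket
from typing import Callable, Iterable
from urllib.parse import urlsplit

CGNAT = ipaddress.ip_network("100.64.0.0/10")  # shared address space


def is_blocked_ip(addr: str) -> bool:
    ip = ipaddress.ip_address(addr)
    if ip.version == 6 and ip.ipv4_mapped is not None:
        ip = ip.ipv4_mapped  # judge ::ffff:a.b.c.d by its embedded IPv4 address
    return (
        ip.is_private or ip.is_loopback or ip.is_link_local or ip.is_reserved
        or ip.is_multicast or ip.is_unspecified
        or (ip.version == 4 and ip in CGNAT)
    )


def system_resolve(host: str) -> Iterable[str]:
    return {info[4][0] for info in socket.getaddrinfo(host, 443, proto=socket.IPPROTO_TCP)}


def validate_feed_url(url: str, resolve: Callable[[str], Iterable[str]] = system_resolve) -> None:
    """Raise ValueError unless the URL is HTTPS and every resolved address is public."""
    parts = urlsplit(url)
    if parts.scheme != "https":
        raise ValueError("only https:// feeds are allowed")
    if not parts.hostname:
        raise ValueError("missing host")
    for addr in resolve(parts.hostname):
        if is_blocked_ip(addr):
            raise ValueError(f"blocked address {addr} for host {parts.hostname}")
```

Checking every resolved address, rather than only the first, prevents a host with one public and one internal record from slipping through.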
Hyungi Ahn	b2306c3afd	feat(ask): Phase 3.5b guardrails - verifier + telemetry + stronger grounding. Adds 4 Items on top of Phase 3.5a (classifier + refusal gate + grounding): Item 0: ask_events telemetry wiring - AskEvent ORM model + record_ask_event(): completes the ask_events INSERT - store input_snapshot (query, chunks, answer) in defense_layers - telemetry called on both the refused and normal paths Item 3: numeric conflict detection across evidence - different numbers with the same unit → weak flag - threshold expressions ("이상/이하/초과/미만", i.e. at least / at most / more than / less than) → skip (prevents FPs) Item 4: better fabricated_number normalization - add the unit suffixes 건 (count) / 원 (won), extract both ends of range expressions (10~20%) - bare numbers only with 2+ digits (removes single-digit FPs) Item 1: exaone semantic verifier (wired with its decision authority locked) - verifier_service.py: 3s timeout, circuit breaker, 3 severity levels - only direct_negation is strong, numeric/intent → medium, everything else → weak - a verifier strong flag alone may not trigger a refusal; it must be corroborated by grounding - 6-tier re-gate (settled after 4 review rounds) - grounding strong 2+ OR max_score<0.2 → verifier skipped Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-10 09:49:56 +09:00
Hyungi Ahn	1beba3402b	fix(migration): split 102 ask_events into single-statement files asyncpg cannot insert multiple commands into a prepared statement. 102 = CREATE TABLE only, 103 = CREATE INDEX only. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-10 08:52:26 +09:00
Hyungi Ahn	06443947bf	feat(ask): Phase 3.5a guardrails (classifier + refusal gate + grounding + partial) New files: - classifier_service.py: exaone binary classifier (sufficient/insufficient) running in parallel with evidence, circuit breaker, timeout 5s - refusal_gate.py: multi-signal fusion (score + classifier) with an AND condition, conservative 3-tier fallback (when the classifier is unavailable) - grounding_check.py: strong/weak flags separated strong: fabricated_number + intent_misalignment (important keywords) weak: uncited_claim + low_overlap + intent_misalignment (generic) re-gate: 2+ strong → refuse, 1 strong → partial - sentence_splitter.py: regex-based (KSS upgrade in Phase 3.5b) - classifier.txt: exaone Y+ prompt (includes calibration examples) - search_synthesis_partial.txt: prompt dedicated to partial answers - 102_ask_events.sql: /ask observability table (completeness tracked as 3 separate metrics) - queries.yaml: 10-query evaluation set for the Phase 3.5 smoke test Modified files: - search.py /ask: parallel classifier + refusal gate + grounding re-gate + defense_layers logging + AskResponse completeness/aspects/confirmed_items - config.yaml: classifier model section (exaone3.5:7.8b on GPU Ollama) - config.py: optional classifier parsing - AskAnswer.svelte: 4-way rendering (full/partial/insufficient/loading) - ask.ts: Completeness + ConfirmedItem types P1 measurements: exaone's ternary output was unstable → reduced to a binary gate. Partial answers are handled by grounding. Settled after 9 rounds of discussion. plan: quiet-meandering-nova.md Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-10 08:49:11 +09:00
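The grounding re-gate rule from `grounding_check.py` reduces to counting strong flags. Below is a minimal sketch. The `GroundingFlag` type and the returned labels are assumptions, and the score/classifier refusal gate that runs before it is not modeled.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class GroundingFlag:
    kind: str     # e.g. "fabricated_number", "uncited_claim", "low_overlap"
    strong: bool  # strong vs weak severity as assigned by grounding_check


def regate(flags: Iterable[GroundingFlag]) -> str:
    """Map grounding flags to a decision: 2+ strong -> refuse, 1 strong -> partial.

    Weak flags are kept for telemetry but do not change the decision here.
    """
    strong = sum(1 for f in flags if f.strong)
    if strong >= 2:
        return "refuse"
    if strong == 1:
        return "partial"
    return "full"
```

Letting a single strong flag degrade to a partial answer, instead of a refusal, is what allows the binary classifier to stay simple: the middle state comes from grounding, as the P1 note above says.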
Hyungi Ahn	9bef049af6	fix(migration): avoid SQLAlchemy text() bind conflict - remove the [:200] notation. The SQL comment in migration 101 contained '[:200]', and SQLAlchemy text() interpreted :200 as a named bind parameter, so init_db() failed with 'A value is required for bind parameter 200'. This brought down fastapi startup itself. The comment was changed to '첫 200자' ("first 200 characters"), removing the colon + digit/letter pattern.	2026-04-09 07:56:50 +09:00
Hyungi Ahn	75a1919342	feat(digest): Phase 4 Global News Digest (cluster-level batch summarization). Groups news from a 7-day rolling window into a 2-level country × topic structure and generates it as a daily batch at 04:00 KST. Does not use the search pipeline. documents → clustering → cluster-level LLM summarization → digest. Key decisions: - adaptive threshold (0.75/0.78/0.80) + EMA centroid (α=0.7) + time-decay (λ=ln(2)/3) - min_articles=3, max_topics=10/country, top-5 MMR diversity, ai_summary[:300] truncation - cluster-level LLM only; the fallback never drops a cluster (topic_label="주요 뉴스 묶음" ("major news bundle") + the top member's ai_summary[:200]) - importance_score normalized to 0~1 per country + raw_weight_sum kept separately, max(score, 0.01) floor - per-call timeout 25s + pipeline hard cap 600s - DELETE+INSERT idempotent (UNIQUE digest_date), calls AIClient._call_chat directly (no changes to client.py) New: - migrations/101_global_digests.sql (2 normalized tables) - app/models/digest.py (GlobalDigest + DigestTopic ORM) - app/services/digest/{loader,clustering,selection,summarizer,pipeline}.py - app/workers/digest_worker.py (PIPELINE_HARD_CAP + CLI entry point) - app/api/digest.py (/latest, ?date|country, /regenerate, inline Pydantic) - app/prompts/digest_topic.txt (JSON-only + absolute-prohibition block) main.py, 4 lines: 2 imports + 1 scheduler add_job + 1 include_router. plan: ~/.claude/plans/quiet-herding-tome.md	2026-04-09 07:45:11 +09:00
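Below is a sketch of three of the numeric choices, under stated assumptions. Ages are measured in days, so λ = ln(2)/3 gives a 3-day half-life. The EMA keeps α of the old centroid. Per-country normalization divides by the country's maximum before the 0.01 floor. The commit confirms only the constants themselves, not these exact forms.

```python
import math
from typing import List, Sequence

HALF_LIFE_DAYS = 3.0
LAMBDA = math.log(2) / HALF_LIFE_DAYS  # λ = ln(2)/3: weight halves every 3 days (days assumed)
ALPHA = 0.7
SCORE_FLOOR = 0.01


def time_decay(age_days: float) -> float:
    return math.exp(-LAMBDA * age_days)


def ema_centroid(centroid: Sequence[float], vec: Sequence[float], alpha: float = ALPHA) -> List[float]:
    """Assumed form: keep alpha of the old centroid, blend in (1 - alpha) of the new article."""
    return [alpha * c + (1 - alpha) * v for c, v in zip(centroid, vec)]


def normalize_scores(raw: Sequence[float]) -> List[float]:
    """Scale one country's scores to 0..1 by its max (assumed), then apply the 0.01 floor."""
    top = max(raw, default=0.0)
    if top <= 0:
        return [SCORE_FLOOR for _ in raw]
    return [max(r / top, SCORE_FLOOR) for r in raw]
```

The floor keeps every surviving topic strictly positive, so a later product or ranking never collapses a cluster to zero, while `raw_weight_sum` preserves the un-normalized magnitude for cross-country comparison.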
Hyungi Ahn	22117a2a6d	feat(search): Phase 1.2-AB: migration 016 + trigram retrieval. migration 016: extended documents FTS + trigram indexes (1.5 s build) - idx_documents_fts_full: combined FTS over title+ai_tags+ai_summary+user_note+extracted_text - idx_documents_title_trgm: trigram on title alone - idx_documents_extracted_text_trgm: trigram on body text (NULLs excluded) - idx_documents_ai_summary_trgm: trigram on the AI summary - CONCURRENTLY not needed (765 docs / 6.5MB) retrieval_service.search_text: ILIKE removed entirely → trigram % + similarity() - WHERE: title %, ai_summary %, FTS @@ (all index-backed) - ORDER BY: weighted sum of 5 column similarities + ts_rank * 2.0 - weights unchanged (title 3.0 / tags 2.5 / note 2.0 / summary 1.5 / extracted 1.0) - threshold default 0.3 (adjust with set_limit if needed) Goal: text_ms 470ms → 100~200ms (from removing the ILIKE full scan)	2026-04-07 14:36:22 +09:00
Hyungi Ahn	378fbc7845	feat(chunk): Phase 0.1 chunk indexing: clean up ORM/worker/migration. Formally commits Phase 0.1 code that had existed only as untracked files on the GPU server: - app/models/chunk.py: DocumentChunk ORM (includes country/source/domain metadata) - app/workers/chunk_worker.py: 6 chunking strategies (legal/news/markdown/email/long_pdf/default) - migrations/014_document_chunks.sql: pgvector + FTS + trigram indexes - app/models/queue.py: added a 'chunk' stage to the ProcessingQueue enum - app/workers/queue_consumer.py: registered the chunk stage; classify→[embed,chunk] chained automatically. Prerequisite for the Phase 1 reranker integration. Used for retrieval based on the document_chunks table.	2026-04-07 13:26:37 +09:00
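The classify→[embed,chunk] chaining can be pictured as a stage map that is consulted when a job finishes. This is a minimal sketch. It assumes a hypothetical `QueueItem` type and uses an in-memory set of pending jobs in place of the real ProcessingQueue table. queue_consumer.py may be structured differently.

```python
from dataclasses import dataclass

# Stage chaining from the commit: a finished "classify" job fans out to
# "embed" and "chunk". Other transitions are omitted from this sketch.
NEXT_STAGES: dict[str, tuple[str, ...]] = {"classify": ("embed", "chunk")}


@dataclass(frozen=True)
class QueueItem:
    document_id: int
    stage: str


def on_stage_complete(done: QueueItem, pending: set[QueueItem]) -> list[QueueItem]:
    """Enqueue follow-up stages for a finished job, skipping ones already pending."""
    created = []
    for stage in NEXT_STAGES.get(done.stage, ()):
        item = QueueItem(done.document_id, stage)
        if item not in pending:
            pending.add(item)
            created.append(item)
    return created
```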
Hyungi Ahn	f005922483	feat(search): Phase 0.3 automatic search-failure logging. Automatically collects failed search cases to seed a gold dataset. Deliverable of Phase 0.3 of the wiggly-weaving-puppy plan. Automatic collection triggers (3): - result_count == 0 → no_result - confidence < 0.5 → low_confidence - same user re-queries within 60 seconds → user_reformulated (previous query recorded) confidence is a Phase 0.3 heuristic (top score + match_reason); it will be replaced with an LLM-based measure once the Phase 2 QueryAnalyzer is introduced. Implementation: - migrations/015_search_failure_logs.sql: table + 3 indexes - app/models/search_failure.py: ORM - app/services/search_telemetry.py: confidence calculation + recent-query tracker + INSERT - app/api/search.py: dispatched via BackgroundTasks (no impact on response latency) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-07 08:29:12 +09:00
Hyungi Ahn	7ca3abf17c	feat: dedicated news page + classification isolation + read status - dedicated /news page: newspaper filter, unread filter, chronological list, preview - news classification isolation: ai_domain='News', classify step removed, only embed is enqueued - is_read: marked read automatically on click, mark-all-as-read API - news excluded from the documents list (source_channel != 'news') - news link added to the nav - GET /api/news/articles, POST /api/news/mark-all-read Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-06 14:16:00 +09:00
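Hyungi Ahn	a6c19ef76c	feat: automatic news collection system: RSS/API feeds from newspapers in 6 countries - news_sources table (source management, controlled dynamically from the UI) - news_collector worker: RSS (feedparser) + NYT API - duplicate check: hash(title+date+source) + URL normalization - category standardization, HTML cleanup in summaries, timezone UTC - embed only articles from the last 30 days, per-source try/catch - News API: source CRUD + manual collection trigger - APScheduler: automatic collection every 6 hours - sources: Kyunghyang/Asahi/NYT/Le Monde/Xinhua/Der Spiegel Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-06 13:38:07 +09:00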
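A duplicate check of the form hash(title+date+source) plus URL normalization might look like the following. It is a hedged sketch, and these choices are all assumptions rather than the collector's actual behavior:

- SHA-256 as the hash algorithm.
- A `|` separator between fields.
- The date passed as an ISO date string.
- The normalization rules: lowercase scheme and host, drop the fragment and trailing slash, strip `utm_*` parameters, and sort the query string.

```python
import hashlib
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Assumed tracking-parameter prefixes to drop during normalization.
TRACKING_PREFIXES = ("utm_",)


def normalize_url(url: str) -> str:
    """Canonicalize an article URL so trivially different links compare equal."""
    parts = urlsplit(url.strip())
    query = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if not key.lower().startswith(TRACKING_PREFIXES)
    ]
    path = parts.path.rstrip("/") or "/"
    # Lowercase scheme and host, sort the remaining parameters, drop the fragment.
    return urlunsplit((parts.scheme.lower(), parts.netloc.lower(), path, urlencode(sorted(query)), ""))


def dedup_key(title: str, published_date: str, source: str) -> str:
    """Stable hash of title + date + source (separator and algorithm assumed)."""
    raw = "|".join((title.strip(), published_date, source))
    return hashlib.sha256(raw.encode("utf-8")).hexdigest()
```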
Hyungi Ahn	bf8efd1cd3	feat: switch embedding model: nomic-embed-text → bge-m3 (1024 dimensions, multilingual) - config.yaml: embedding model → bge-m3 - document.py: Vector(768) → Vector(1024) - embed_worker.py: model version updated - migration 011: vector column recreated (existing embeddings reset) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-06 12:49:45 +09:00
Hyungi Ahn	24142ea605	fix: address 5 Codex review findings (1 critical + 4 high) 1. [critical] config.yaml → load taxonomy from the settings object (prevents an import crash) 2. [high] ODF conversion: keep file_path, store derived_path in a separate field (prevents endless duplication) 3. [high] statute splitting: preserve articles that precede the first chapter as "서문" (preamble) 4. [high] Inbox: split out a review_status field (pending/approved/rejected) 5. [high] deletion: soft delete (deleted_at) + worker safeguards + active_documents view - deleted_at IS NULL applied consistently to every query - queue_consumer: skips gracefully when the row is missing Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-06 07:15:13 +09:00
Hyungi Ahn	6d73e7ee12	feat: complete overhaul of the classification scheme: taxonomy + document_type + confidence - config.yaml: defines a 6-domain × 3-level taxonomy + 13 document_types - classify.txt: English prompt, classification by taxonomy path + injected classification rules - classify_worker: taxonomy validation, confidence-based classification, document_type storage - migration 008: document_type, importance, ai_confidence columns - API: document_type, importance, ai_confidence added to DocumentResponse Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-03 13:32:20 +09:00

58 Commits