hyungi_document_server

Author	SHA1	Message	Date
Hyungi Ahn	55e39818ec	feat(briefing): register 05:10 KST APScheduler cron. Runs morning_briefing_run automatically every day at 05:10 KST. Because the scheduler timezone is Asia/Seoul, only hour=5 minute=10 are specified. Leaves a 70-minute buffer after the Phase 4 04:00 cron and avoids MLX semaphore contention.	2026-05-12 14:54:20 +09:00
Hyungi Ahn	431d4fe010	feat(briefing): add morning briefing schema + services + api (historical off). A one-page card comparing news collected overnight (KST 00:00-05:00) by topic × country. Code, logic, and tables are kept separate from Phase 4 Global Digest; only the algorithm is shared via services/clustering_common. New backend: - migrations/255_morning_briefings.sql: morning_briefings + briefing_topics (briefing_date UNIQUE, UNIQUE(briefing_id,topic_rank), FK CASCADE, 3 nullable historical_* columns, cluster_members JSONB, country_perspectives JSONB, 4-state status success|partial|failed|empty) - app/models/briefing.py: SQLAlchemy ORM - app/services/briefing/loader.py: KST 5h window + news_sources prefix fallback (mirrors the Phase 4 pattern) + historical candidate pool loader - app/services/briefing/clustering.py: cluster_global topic-first (LAMBDA=ln(2)/2h, MIN_COUNTRIES_PER_TOPIC=2, MAX_TOPICS=7) - app/services/briefing/comparator.py: call_primary 26B + JSON envelope sanitize (cap perspectives 10 / divergences 3 / convergences 2 / quotes 5) + fixed-shape fallback row + retrieve_historical cosine top-K - app/services/briefing/pipeline.py: load→cluster→select(K=7,λ=0.6)→historical→compare→4-state status→delete+insert transaction - app/workers/briefing_worker.py: shared entry point for APScheduler and manual invocation, 600s hard cap - app/prompts/briefing_comparative.txt: Korean comparative-analysis JSON prompt with two sections, {articles_block} + {historical_block}, and a do-not-cite label - app/api/briefing.py: GET /latest, GET ?date=, POST /regenerate?date= (admin, sync delete+insert tx, regenerated:true) Backend changes: - app/main.py: register briefing_router (/api/briefing prefix). Scheduler registration comes in PR-3. - app/services/digest/selection.py: parameterize select_for_llm (caller injects K, λ). Phase 4 behavior is preserved through the default values. Historical policy: - BRIEFING_HISTORICAL_ENABLED env flag, default off. - flag off → all historical_* columns NULL, the prompt's {historical_block} gets an empty label, retrieval is not called. - flag on (enabled in PR-1b) → cosine top-K 5 (sim≥0.70) between the cluster centroid and doc embeddings from the past 30 days, injected into the prompt. Country canonical (confirmed by measurement): - documents.country column confirmed absent - document_chunks.country match rate 0% (chunks are never created for news) - the only country signal = news_sources prefix mapping (same as Phase 4) Tests: - tests/test_briefing_historical.py: regression on 3 paths (flag off / on with fixture / on with zero matches) + sanitize cap + fallback row shape. Verification: GPU-container pytest + manual regenerate in PR-1.8.	2026-05-12 12:58:50 +09:00
Hyungi Ahn	9d9b3359b0	feat(events): PR-1 Events Core: schema + ORM + minimal API. Creates a first container domain for personal operations logs / schedules / to-dos / retrospectives. plan: ~/.claude/plans/beszel-tingly-sloth.md (round 12 v6). Schema: - 5 enums (event_kind / event_status / event_source / event_actor / history_change_kind) - events table: kind(task|calendar_event|activity_log) + 7-state lifecycle status - events_history: lifecycle operations recorded automatically, FK RESTRICT (history records point-in-time facts) - CHECK: calendar_event → start_at NOT NULL / activity_log → started_at|ended_at NOT NULL - partial unique (source, source_ref): dedup of external sources (used in PR-4) - partial indexes (active status / activity_log timeline) API: - POST /api/events (kind=activity_log shortcut: status=done + ended_at=now() default) - GET /api/events/{id} | /api/events?kind&status&from&to&project_tag&source - PATCH /api/events/{id} (extra=forbid + reschedule history when time fields change) - POST /api/events/{id}/{complete,cancel,defer,reactivate} (history recorded automatically) - GET /api/events/today (Asia/Seoul default; deferred events only when defer_until<=now()) - GET /api/events/inbox | /api/events/activity?from&to Excluded (PR-2 to PR-5 or backlog): - DELETE (retrospective data → standardize on /cancel) - log shortcut / upcoming endpoint (absorbed by POST + GET ?from&to) - /ingest (added with exact requirements when PR-4 MailPlus forwarding lands) - iCal export / ntfy notifications / recurrence / general edit history	2026-05-11 07:19:04 +09:00
Hyungi Ahn	6785d53d3d	feat(study): Phase 4-B v1 session-level overall analysis (free-form markdown). Phase 4-A caches explanations one wrong/unsure question at a time. 4-B bundles all 5-30 wrong/unsure items of a session and generates a single 200-400-character natural-language summary, shown as a header card on the results screen. The queue infrastructure is kept separate from 4-A's study_question_jobs: a single meaning per FK + clearer operational SQL + different guard/payload/retry policies between 4-A and 4-B. New study_quiz_session_jobs (queue) + study_quiz_session_analysis (result cache, PK=session_id, UPSERT) + dedicated consumer. Backend: - migrations/233: study_quiz_session_jobs (FK study_quiz_sessions NOT NULL, status pending/processing/completed/failed/skipped, max_attempts=2) - migrations/234: partial unique idx (session_id) WHERE pending/processing - migrations/235: study_quiz_session_analysis (session_id PK, summary_md, confidence, model_name, generated_at, is_stale) - models/study_quiz_session_job: ORM + enqueue_session_analysis_job() (idempotent) - models/study_quiz_session_analysis: ORM (PK = session_id) - services/study/session_summary_guard: GUARD_PATTERN (regex) + normalize_confidence() as the single source; the worker and tests both import it - services/study/session_summary_rag: gather_session_summary_context(), documents only (reuses PR-3 _gather_document_evidence). The call is allowed even without evidence (a different policy from 4-A: the session record itself is the evidence) - services/study/session_analysis_enqueue: auto (finalize/fallback) + request_session_analysis_regenerate (manual). Manual requests are blocked immediately when wrong/unsure < 5 or an active job exists; an existing analysis is marked is_stale=true - prompts/study_session_summary_envelope.txt: envelope JSON {summary_md, confidence}. Only quantitative integers may be cited; ratios/trends/ranges/dates are forbidden - workers/study_session_analysis_worker: terminal status branches: · wrong/unsure < 5 → status=skipped, error_code=insufficient_attempts · insufficient question_text/outcome → skipped, evidence_missing · GUARD_PATTERN match → failed, guard_fail · 800-character hard cap + confidence normalize · timeout/parse/unknown → retry candidates · UPSERT study_quiz_session_analysis ON CONFLICT DO UPDATE (PK session_id) - workers/study_session_queue_consumer: clones the 4-A consumer pattern. BATCH_SIZE=1 + STALE_MINUTES=10. MLX gate shared with 4-A (Semaphore(1)) - main.py: APScheduler add_job(consume_study_session_queue, ..., every 1 minute) - session_finalize: enqueue_session_analysis_auto at the end (best-effort) - api/study_topics: · QuizSessionAnalysisOut + ai_session_analysis response field (analysis row + latest job status/error_code) · GET fallback enqueue (only when there is no existing analysis or active job; non-blocking) · POST /quiz-sessions/{sid}/regenerate-summary: manual trigger Frontend (quiz-sessions/[sid]/+page.svelte): - session summary card in the results header (right after the AI explanation indicator, right before the "do next" section) - renders markdown when summary_md is set; otherwise branches on job_status / error_code: · pending/processing → "AI is analyzing the session" · insufficient_attempts → "Fewer than 5 wrong/unsure answers" · evidence_missing → "Not enough material" · guard_fail → "Blocked by hallucination check" + regenerate link - confidence='low' badge + is_stale "regenerating" badge - regenerate button + regenerateSummary(), with toasts that vary by reason ship gate: - tests/test_session_summary_guard_pattern.py: 5 allow + 7 block cases + normalize_confidence standard/non-standard checks. Passes when run directly with python3. Plan: ~/.claude/plans/nifty-sparking-spindle.md (Phase 4-B v1) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-02 07:20:29 +09:00
Hyungi Ahn	e8da53490c	feat(study): Phase 4-A wrong/unsure AI explanation prefetch batch. PR-3's real-time call behind the results screen's [View AI explanation] button made each click wait 8-30 seconds. Generating explanations in a background batch right after solving turns that into a cache hit. The hallucination guard is stronger than in PR-3: envelope JSON {answer_choice, explanation_md, confidence} + a check that answer_choice == correct_choice + evidence required. processing_queue has an FK to documents.id, so it cannot be reused directly for study_questions → separate study_question_jobs table + separate consumer. Backend: - migrations/231: study_question_jobs CREATE TABLE (13 columns; recommended kind value 'explanation', with 'session_summary' reserved; status pending/processing/completed/failed/skipped, max_attempts=2) - migrations/232: partial unique idx (qid, kind) WHERE status IN (pending, processing): blocks duplicate active rows while letting terminal history accumulate - models/study_question_job: ORM + enqueue_study_question_job() helper (idempotent via on_conflict_do_nothing) - prompts/study_explanation_envelope.txt: envelope-format prompt (answer_choice forced to 1-4, confidence high/medium/low) - workers/study_explanation_worker: terminal status branches: · both evidence lists empty → job and question both skipped (no LLM call) · answer_choice != correct_choice → guard_fail / failed (no retry) · timeout/parse → retry candidates (max_attempts=2) · catch-all except → explicit unknown + retryable branch · question.ai_explanation_status='ready' already set → completed immediately · confidence kept in job.payload (for operational analysis) - workers/study_queue_consumer: APScheduler every 1 minute, BATCH_SIZE=1, shares the MLX gate Semaphore(1). Self-recovers with STALE_MINUTES=10 - main.py: scheduler.add_job(consume_study_queue, ..., id='study_queue_consumer') - services/study/explanation_enqueue: helpers shared by finalize and the GET fallback: filter_needs_explanation (filters on study_questions status + latest job error_code; questions whose last job ended in guard_fail/evidence_missing are excluded from automatic re-enqueue) + enqueue_explanation_for_qids (max_count cap) - session_finalize: enqueues wrong/unsure qids for prefetch at the end (best-effort; a failure does not break finalize itself) - api/study_topics get_quiz_session: backfill enqueue for finished sessions (max=30, non-blocking, debug log) Target condition: ai_explanation_status IN ('none', 'failed') OR ai_explanation IS NULL. stale / skipped / pending / ready are not auto-enqueued. Stale explanations are regenerated through PR-3's explicit [Regenerate] button or in a later phase. Plan: ~/.claude/plans/nifty-sparking-spindle.md (Phase 4-A) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-01 11:42:08 +09:00
Hyungi Ahn	e5982ebde4	feat(study): Phase 1 learning-loop data layer: progress cache + finalize + review API. The data layer for the vision (solve → check → study → review → weighting of the next session). Data model (migrations 222-225): - study_question_progress table: current-state cache per user × topic × question - last attempt: last_outcome, last_attempted_at, last_attempt_id - review status: last_reviewed_at - review queue: due_at, review_stage - pattern classification (derived): pattern_state, pattern_updated_at, pattern_window_attempts - 3 partial idx (due / topic_pattern / pending_review) for fast per-tab lookups Pattern classification (services/study/learning_pattern.py): - 7 classes: unattempted/unsure/chronic_wrong/regressed/recovered/stable/unstable - window = last 3 attempts + whether earlier correct/wrong attempts exist - priority chronic_wrong > regressed > recovered (conservative learning) - guard: a single wrong answer does not make a question regressed (prior correct history is required) - stable starts at 3 consecutive correct answers Session-end aggregation (services/study/session_finalize.py): - attempts are kept as the append-only source; only progress is upserted - the finalize hook fires automatically right after the last attempt - finalize updates only last_* + pattern_state; questions that have not entered the due_at queue stay NULL - for questions that already have due_at, finalize updates the stage (correct → +1 / wrong → reset) API (api/study_question_progress.py): - POST /study-topics/{tid}/questions/{qid}/review-complete → sets last_reviewed_at + (only for wrong/unsure) the first due_at - GET /study-topics/{tid}/review-queue?tab=due_today|pending_review|chronic|regressed|mastered → 5-tab paginated lookup → pending_review also includes rows with last_reviewed_at < last_attempted_at (reviewed earlier, then answered wrong again) Phase 1-E (question selection algorithm) follows in a later commit. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-01 09:28:46 +09:00
Hyungi Ahn	219e233a48	feat(study): related-types DB cache: stop recomputing HNSW on every request - migrations 220/221: add related_repeat/similar JSONB + count/grade/computed_at/threshold_version + partial idx to study_questions - embedding worker: right after marking a row ready, computes and stores related items in the same transaction + invalidates related_computed_at=NULL for ready rows in the same topic - new cron study_q_related_refresh (1 min, batch=20): batch-recomputes stale caches - API list_related_types: on a cache hit (computed_at set + threshold version matches) responds with a single SELECT. On a miss, computes and stores immediately, then responds - update_question PATCH: related_computed_at=NULL when the body/exam_round changes - soft delete: invalidates ready rows in the same topic When the threshold changes: bump related_types.THRESHOLD_VERSION and run UPDATE ... SET computed_at=NULL WHERE version != '<new>' once; the cron then recomputes everything automatically. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-01 07:22:31 +09:00
Hyungi Ahn	9d4aa201a8	feat(study): automatic embedding for study_questions (PR-4). Question body + choices 1-4 → bge-m3, 1024 dimensions. The status column itself acts as the queue (no separate queue table; zero impact on the ProcessingQueue infrastructure). An APScheduler 1-minute cron processes rows with status in {none, failed, stale} in batches of 10. New questions default to 'none' and are backfilled automatically. Data model (migrations 193-194): - study_questions: embedding vector(1024), embedding_status VARCHAR(20) DEFAULT 'none' (none/pending/ready/failed/stale), embedding_updated_at, embedding_model - HNSW partial index (vector_cosine_ops) WHERE deleted_at IS NULL AND embedding IS NOT NULL: cosine-based for bge-m3, using the same ops as documents.embedding (ivfflat) Recompute trigger: changing question_text / choice_1 to choice_4 automatically moves ready→stale. Changes to correct_choice / explanation / subject / scope do not trigger recomputation (no effect on semantic search). Worker (workers/study_question_embed_worker.py): - race-safe pending marking (conditional UPDATE WHERE status IN none/failed/stale) - AIClient.embed(text) calls bge-m3 with a 15s timeout - on failure, status='failed', the previous embedding is kept, and the next cron tick retries - body = "문제: ...\n보기:\n1. ...\n2. ...\n3. ...\n4. ..." ("question" / "choices"; subject/scope intentionally excluded because category names add noise to semantic search) Planned follow-up PRs: similar-question search UI / duplicate-entry detection / better RAG accuracy / wrong-answer clustering. This PR covers only embedding storage, recomputation, and backfill. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-28 08:54:02 +09:00
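Hyungi Ahn	4b7156061e	feat(study): question bank + review mode (study_questions). Adds a 4-choice question-bank asset track to the study_topic workspace. Scenario: preparing for the written part of the national engineer certification exam (기사시험): fast repeated entry + per-subject balanced review sampling + accumulated right/wrong records. Data model (migrations 186-190): - study_questions: study_topic 1:N, soft delete, is_active toggle, correct_choice SMALLINT CHECK 1-4 - study_question_attempts: one row accumulated per submitted answer. The study_question_id FK is ON DELETE RESTRICT (history-preservation principle: an accidental hard delete cannot wipe out attempt records) Design principles: - the API only soft-deletes questions; the attempts FK RESTRICT also protects them at the DB level - changing correct_choice does not recompute existing attempts.is_correct (preserves point-in-time facts) - default review = random balanced sampling of target_per_subject(20) per subject. If a subject has too few questions, only what is available is used - wrong_only=true means the most recent attempt was wrong (latest-wrong, not ever-wrong) - answers and explanations are hidden in the question-serving response and revealed only on answer submission - no strict enum for subject/scope (free text; autocomplete comes later) API: /api/study-topics/{id}/questions, /review/questions, /api/study-questions/{id}, /attempt. The unified view (/study-topics/{id}) response gains sections.questions / stats.question_count. The existing question_set_count is kept for a later PR (round/mock-exam bundles). Frontend: a question section on /study/topics/[id] + "New question"/"Start review" entry points. /questions/new (keep entering after saving + persistent sessionStorage), /questions/[qid]/edit (banner noting that changing the answer does not recompute attempts), /review (start options → solving → final summary). Planned follow-up PRs: wrong-answer notes/weak-subject report, AI explanations/clustering, spaced repetition, image OCR input, CSV import, study_question_sets bundles. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-28 08:00:37 +09:00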
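Hyungi Ahn	63ed4d81e5	feat(study): introduce the study_topics learning-workspace container. A first-level container that groups handwriting sessions and materials (library documents) under one study topic (e.g., 가스기사, the Gas Engineer certification). The response structure (sections + stats) is dict-based so that future study assets such as vocabulary lists, audio, and question sets can join the same bundle. Data model (migrations 179-185): - study_topics: user_id × name partial unique (active rows only), soft delete - study_sessions.study_topic_id: 1:N nullable FK (ON DELETE SET NULL) - study_topic_documents: N:M material mapping (user_id denormalized for permission isolation) Design principles: - orthogonal to documents.category (the library UI axis) → library facets/categories untouched - StudySession.certification/subject/topic kept (still used as detailed metadata) - study_type is a loose classification (no strict enum; room for extensions such as jlpt_n3) - a polymorphic study_topic_items table is permanently ruled out → add one join table per asset type instead API: /api/study-topics CRUD + /by-document/{id} + material/session mapping endpoints. Frontend: /study/topics list + /study/topics/[id] unified view (two tracks: handwriting and materials) + workspace dropdown in the write form + entry card on the study hub. Follow-ups: PR-2 language-learning UX, PR-3 audio assets, PR-4 AI retrieval scope. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-28 07:06:37 +09:00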
Hyungi Ahn	24bd363beb	feat(library): per-document handwriting notes (PR-D) for writing alongside material while studying on iPad. A "Handwrite" button on library document detail opens HandwriteCanvas below the body. One canvas per user per document (UNIQUE user×document), saved by upsert. Backend: - migrations 177-178: document_notes (user_id, document_id, strokes_json, canvas size) + UNIQUE(user_id, document_id) + index - app/models/document_note.py: DocumentNote ORM - app/api/document_notes.py: · GET /api/documents/{id}/note: fetch the single note (strokes_json=null if none) · PUT /api/documents/{id}/note: upsert (PostgreSQL ON CONFLICT) · DELETE /api/documents/{id}/note · ownership: WHERE user_id=current_user.id (assumes a single user) - app/main.py: register document_notes_router (/api/documents prefix) Frontend: - routes/documents/[id]/+page.svelte: · adds a "Handwrite" toggle to the affordance row of library documents (category='library') · on click, loads strokes via GET /note → mounts HandwriteCanvas below the body card · canvas onChange → auto-save via PUT /note (uses HandwriteCanvas's built-in 3-second idle debounce) · split at 60vh / min-h-[400px]. On mobile, the canvas scrolls naturally below the body. - reuses HandwriteCanvas: documentId is passed as the sessionId prop. The localStorage key is reused as-is (namespaced per document).	2026-04-27 12:38:03 +09:00
Hyungi Ahn	49d8f68986	feat(library): library read-count tracking (PR-A backend). Each explicit "finished 1 read" click on a library document adds +1. Entering the detail page does not count automatically. Append-only log. Data: - migrations 174-176: document_reads table + 2 indexes (split so each migration is a single statement) ORM: - app/models/document_read.py: DocumentRead (user_id, document_id, read_at) API (app/api/document_reads.py, /api/documents prefix): - POST /api/documents/{id}/read: read count +1 - GET /api/documents/{id}/read-stats: {read_count, last_read_at} - DELETE /api/documents/{id}/read/last: removes only the current user's most recent read of that document · ownership: WHERE user_id=current_user.id AND document_id=:doc_id · documents has no user_id (single user). An ownership check must be added when moving to multi-user; this is noted in a code comment. Response extensions: - DocumentResponse: read_count (default 0), last_read_at (default None) - /api/documents/library: maps read stats with a LEFT JOIN limited to the page's N rows (avoids N+1) - /api/library/tree CategoryTreeNode: adds unread_count · the existing path_docs already accumulates over ancestors, so it is reused as-is and subpaths are summed automatically Rules (set by the user; do not change): · multiple clicks on the same day → each counts as a separate read · undoing an accidental click = DELETE /read/last · no read_count column on documents; counts come only from COUNT(*) over the log plan: ~/.claude/plans/scalable-chasing-stonebraker.md branch: feature/library-reads (separate from the handwriting track)	2026-04-27 12:08:36 +09:00
Hyungi Ahn	7804f22dce	feat(study): study_sessions backend (Phase 1): general certification/language study sessions + asset links. A general study-session backend whose data is shared by iPad handwritten transcription, mobile memorization notes, and mobile quizzes. study_type branches between certification and language. - migrations/164: study_sessions + study_session_assets DDL + 5 partial indexes - app/models/study_session.py: StudySession + StudySessionAsset ORM (cascade) - app/api/study_sessions.py: CRUD + snapshot (PNG) + assets + filter + groups - ownership: every endpoint verifies user_id; a mismatch also returns 404 (prevents information leakage) - 409 duplicate: UNIQUE(session, document, asset_type, role) checked with a preliminary SELECT + IntegrityError fallback - enum 422: study_type / mode / asset_type / role / review_state / order - filter: 11 fields (study_type, certification, language_code, learning_level, subject, topic, review_state, document_id, asset_type, mode, due_before) - groups: certification tree + language tree + has_audio/has_video - snapshot: borrows the atomic-rename + error_code pattern from documents.py - app/main.py: register the /api/study-sessions router plan: ~/.claude/plans/scalable-chasing-stonebraker.md Fields unused in Phase 1 (review_state/quiz/ocr/ai_summary/prompt) allow NULL; their automatic logic is activated in separate PRs in Phases 2-4.	2026-04-27 08:15:28 +09:00
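Hyungi Ahn	a95294ff42	feat(ops): nightly automatic tier backfill scheduler (resolves PR-B legacy). Automatically runs tier triage + deep_summary over 6720 legacy documents at night. app/workers/tier_backfill.py (new): - APScheduler trigger every 30 minutes; it actually enqueues only during KST 00:00-06:00. - Re-enqueues 25 items at a time into the classify queue, prioritized safety > law > manual. - Skips the run if 40 or more items are already waiting in the classify queue, to protect MLX from overload. - Excludes drive_sync / memo / news (outside the plan's scope or low value). - Off-switch: settings.ai.tier_backfill.enabled = false stops it entirely. app/main.py lifespan: - scheduler.add_job(tier_backfill_run, interval=30min, id='tier_backfill'). - AsyncIOScheduler is already configured with timezone='Asia/Seoul', matching the zoneinfo('Asia/Seoul') used inside tier_backfill. Expected numbers: 6 hours per night × 2 runs/hour × 25 items = at most 300 items per night. 6720 / 300 ≈ 22.4, so the whole legacy backlog is processed in roughly 23 days (longer if the backlog guard skips runs). MLX load control is the top concern; this is a redundant safeguard alongside the R2 backlog guard. If overload is detected in operation: setting `ai.tier_backfill.enabled: false` in config.yaml stops it immediately (the scheduler checks on every run, so no restart is needed). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-24 15:28:28 +09:00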
Hyungi Ahn	8f25d396df	feat(upload): §4 independent items: error_code scheme + .uploading orphan cleanup + progress/abort UX. plan: ~/.claude/plans/luminous-sprouting-hamster.md §4 (independent items other than 1GB/stt/dashboard) backend: - defines the _upload_error(status, code, msg) helper (fixes an omission where §3 only added the call sites). detail = {error_code, message}; the frontend branches on error_code. - converts every HTTPException in upload_document to _upload_error: body_too_large / invalid_input / empty_file / unsupported_codec / internal - ClientDisconnect → 499 network_abort + temp-file cleanup. asyncio.TimeoutError → 408 upload_timeout. - writes to a temporary .uploading name, then renames atomically with staging.replace(target) on completion. → leftovers from process crashes are collected by cleanup_orphan_uploads. - adds .uploading to file_watcher SKIP_EXTENSIONS (prevents accidental pickup). cleanup scheduler: - new workers/upload_cleanup.py. Every 10 minutes, deletes *.uploading files under Inbox whose age since mtime exceeds orphan_max_age_sec (3600). - logs a WARNING if the total deleted over the last 3 runs (≈30 minutes) reaches cleanup_warn_threshold (10) or more. In-memory deque (reset on restart); the goal is only to catch persistent issues. - two threshold fields in UploadConfig in core/config.py (defaults, independent of config.yaml overrides). frontend: - api.ts: optional errorCode/errorMessage fields on ApiError (the detail string is kept for compatibility with the 5+ existing consumers). parseDetail() unpacks and normalizes {error_code, message} object responses. New XHR helper uploadFile(path, formData, {signal, onProgress}) (XHR because fetch() does not report upload progress). Same single-retry 401 refresh policy. - UploadDropzone.svelte rewritten: progress bar, per-file and all-files abort buttons, beforeunload warning when leaving the page, toast messages that branch on errorCode (7 codes: body_too_large / upload_timeout / network_abort / empty_file / invalid_input / unsupported_codec / internal). Uploads in progress are aborted when the component unmounts. Deferred: - raising max_bytes to 1GB + Caddyfile 1100MB (separately decided to keep 100MB) - /dashboard category cards (separate plan) - docs/categories.md (after the §1-3 definitions settle) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-24 06:57:02 +09:00
Hyungi Ahn	1e2c004dd4	feat(media): §3 audio STT + video playback infrastructure. plan: ~/.claude/plans/luminous-sprouting-hamster.md §3 Schema: - migrations/147_audio_segments_table.sql: audio_segments (timestamped STT segments) - migrations/148_audio_segments_idx.sql: (document_id, start_s) idx - migrations/149_document_media_cols.sql: documents.thumbnail_path + needs_conversion - migrations/150_queue_stage_stt.sql: process_stage += 'stt' - migrations/151_queue_stage_thumbnail.sql: process_stage += 'thumbnail' - app/models/audio_segment.py, document.py (thumbnail_path/needs_conversion) Services: - services/stt/{Dockerfile, requirements.txt, server.py}: faster-whisper large-v3 GPU container. /transcribe (filePath/langs/beamSize) + /health + /ready (cuda device_count + model_loaded). NFC/NFD path resolver (a lesson learned from OCR). - docker-compose.yml: adds stt-service (1 GPU reserved, :3300, NAS read-only mount, stt_models volume, start_period 300s) and STT_ENDPOINT to the fastapi env. Pipeline (depends on §1 category): - new app/workers/stt_worker.py: picks up stage='stt' → calls STT_ENDPOINT → stores extracted_text + audio_segments. 30-minute timeout. - new app/workers/thumbnail_worker.py: one ffmpeg frame at the 50% point → PKM/Videos/.thumbs/{id}.jpg + sets thumbnail_path. Skips needs_conversion=true. - app/workers/file_watcher.py extended: scans PKM/{Inbox, Recordings, Videos}. extension→category, audio→stage=stt, video .mp4/.webm→stage=thumbnail, video .mov/.mkv/.avi→needs_conversion=true with no stage. Skips paths under the settings.roon_library_path prefix. - app/workers/queue_consumer.py extended: registers the stt + thumbnail workers, BATCH_SIZE (stt=1, thumbnail=3), adds stt→[classify] to next_stages (audio skips extract). - app/Dockerfile: adds ffmpeg (for the thumbnail subprocess). API (depends on §1): - /api/audio/{id}/segments: AudioSegment ORDER BY start_s - /api/video/{id}/thumbnail: thumbnail_path FileResponse (query token) - /api/documents/{id}/file: audio/video MIME types included in media_types (already in the §2 commit). Starlette FileResponse handles Range automatically. - upload_document: rejects .mov/.mkv/.avi web uploads (error_code unsupported_codec). NAS drops are accepted into quarantine by file_watcher. Frontend: - AudioPlayer.svelte: HTML5 audio + sticky transcript-segment panel + click a line to seek. activeIdx highlight. - VideoPlayer.svelte: HTML5 video direct play + needs_conversion notice card. The poster comes from the thumbnail endpoint. - /audio (list grid) + /audio/[id] (player) - /video (thumbnail grid + "needs conversion" badge) + /video/[id] (player) - Sidebar.svelte: Mic/Film icons + audio/video navigation enabled, count badges (reuses §2 /stats/category-counts). Settings: - app/core/config.py: stt_endpoint + roon_library_path. DoD post-deploy smoke tests: /ready cuda:true, transcribe a meeting mp3, audio proceeds to classify without extract (queue regression), /audio playback, .mp4 playback, .mov web upload returns 400, .mov NAS drop quarantined, Sidebar navigation + counts. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-24 06:47:36 +09:00
Hyungi Ahn	8622a97e7d	feat(upload): backend-owned upload size contract + public config endpoint. Moves the upload size limit from a frontend hardcode to a single source of truth in the server config. The frontend will read this value for its pre-check UX in a follow-up Phase B commit. - adds an `upload` block to config.yaml: * max_bytes (authoritative policy) * content_length_slack_ratio (headroom for multipart overhead) * stream_chunk_bytes (streaming IO unit) - UploadConfig pydantic model + Settings.upload field in app/core/config.py - new app/api/config.py: GET /api/config/public endpoint * exposes only the non-sensitive settings the frontend needs * must not grow into a general server-settings window (stated in the docstring) - adds /api/config to the setup redirect bypass (queryable even before the initial setup) This commit does not change existing upload behavior. Enforcement and the frontend subscription are wired up in follow-up commits. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-17 08:02:19 +09:00
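Hyungi Ahn	964d4ffc67	feat(library): independent management of the library classification scheme, Phase 1. The new library_categories table makes it possible to create empty categories. CRUD API (create / rename leaf / delete leaf) + tree-merge endpoint. Context menu (add/rename/delete) on the sidebar tree. LibraryPathEditor switched to a category-based flat selector. "Uncategorized" (미분류) is protected as a system category (cannot be deleted or renamed). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-15 10:01:53 +09:00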
Hyungi Ahn	b46a75758b	feat(memos): built-in memo feature: file-less documents (file_type='note'). Builds a replacement for the Memos app into Document Server. Memos are stored as documents rows with file_type='note', so the existing AI pipeline (classify/embed/chunk/search/ask) is reused. Backend: - migration 105: source_channel 'memo', file_path NULL allowed, user_tags/pinned/ask_includable columns, memo indexes - api/memos.py: 7 CRUD endpoints + #tag parsing + reset of stale AI fields + prevention of duplicate pending queue entries - queue_consumer: skips extract/preview for notes - documents API: file_path NULL guard, memos excluded from listings - search /ask: documents with ask_includable=false are excluded from evidence Frontend: - /memos timeline page (quick input + feed + inline editing + tag filter) - QuickMemoButton FAB (Ctrl+M, on every page) - Sidebar memo link Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-13 16:00:00 +09:00
Hyungi Ahn	75a1919342	feat(digest): Phase 4 Global News Digest (cluster-level batch summarization). Groups news from a 7-day rolling window into two levels, country × topic, generated as a daily batch at 04:00 KST. The search pipeline is not used: documents → clustering → cluster-level LLM summarization → digest. Key decisions: - adaptive threshold (0.75/0.78/0.80) + EMA centroid (α=0.7) + time-decay (λ=ln(2)/3) - min_articles=3, max_topics=10/country, top-5 MMR diversity, ai_summary[:300] truncation - cluster-level LLM calls only; clusters are never dropped, a fallback is used instead (topic_label="주요 뉴스 묶음" ("major news bundle") + the top member's ai_summary[:200]) - importance_score normalized to 0-1 per country + raw_weight_sum kept separately, with a max(score, 0.01) floor - per-call timeout 25s + pipeline hard cap 600s - idempotent DELETE+INSERT (UNIQUE digest_date), calling AIClient._call_chat directly (no changes to client.py) New: - migrations/101_global_digests.sql (2 normalized tables) - app/models/digest.py (GlobalDigest + DigestTopic ORM) - app/services/digest/{loader,clustering,selection,summarizer,pipeline}.py - app/workers/digest_worker.py (PIPELINE_HARD_CAP + CLI entry point) - app/api/digest.py (/latest, ?date|country, /regenerate, inline Pydantic) - app/prompts/digest_topic.txt (JSON-only + "strictly forbidden" block) main.py, 4 lines: 2 imports + 1 scheduler add_job + 1 include_router. plan: ~/.claude/plans/quiet-herding-tome.md	2026-04-09 07:45:11 +09:00
Hyungi Ahn	c81b728ddf	refactor(search): Phase 2.1 switch QueryAnalyzer to an async-only structure ## Philosophy revision (based on measurements) gemma-4-26b-a4b-it-8bit MLX measurements: - full query_analyze.txt (prompt_tok=2406) → 10.5 s - reducing max_tokens has no effect (the model already stops early at a natural EOS) - query length has almost no effect (the prompt itself dominates) → 13× over the assumed 800 ms timeout. Synchronous calls are completely infeasible. QueryAnalyzer is therefore repositioned from "a feature that runs immediately" to "a feature prepared in advance". Synchronous analyzer calls on the retrieval path are forbidden. ## Structure ``` query → retrieval (always immediate) ↘ trigger_background_analysis (fire-and-forget) → analyze() [5 s+] → cache store next call (same query) → get_cached() hit → Phase 2 pipeline activated ``` ## Changes ### app/prompts/query_analyze.txt - 5971 chars → 2403 chars (40%) - 4 examples → 1, rule descriptions condensed - target prompt_tok 2406 → ~600 (1/4) ### app/services/search/query_analyzer.py - LLM_TIMEOUT_MS 800 → 5000 (room to spare because it runs in the background) - PROMPT_VERSION v1 → v2 (automatic cache invalidation) - get_cached / set_cached kept: O(1) lookup on the retrieval path - new trigger_background_analysis(query): a synchronous function that creates a task and returns immediately - task references held in the _PENDING set (prevents premature GC) - the _INFLIGHT set prevents duplicate runs for the same query - new prewarm_analyzer(): pre-analyzes 15-20 queries at startup - DEFAULT_PREWARM_QUERIES: 7 fixed evaluation-set queries + 3 law + 2 news + 3 practical ### app/api/search.py - the old synchronous analyzer call is removed entirely - analyze=True → only an O(1) get_cached(q) lookup - hit: uses query_analysis (conditionally activates the Phase 2.2/2.3 pipelines) - miss: trigger_background_analysis(q) + the existing path unchanged - timing["analyze_ms"] removed (no LLM call on the path) - analyzer cache_hit/cache_miss status recorded in notes - debug.query_analysis is filled only on a cache hit ### app/main.py - adds a prewarm_analyzer() background task to lifespan startup - non-blocking: does not hold up app startup - delay_between=0.5 to ease MLX load ## Expected effects - cold request latency: unchanged from Phase 1.3 (zero regression) - warm request + prewarmed: cache hit → query_analysis used - expected cache hit rate: 70-80% initially (prewarm) + growing with use - the Phase 2.2/2.3 multilingual/filter features work only on a cache hit ## References - memory: feedback_analyzer_async_only.md (saved as a permanent rule) - plan: ~/.claude/plans/zesty-painting-kahan.md ("Philosophy revision" section) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-08 14:47:09 +09:00
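Hyungi Ahn	a6c19ef76c	feat: automatic news collection system: RSS/API from newspapers in 6 countries - news_sources table (source management, controlled dynamically from the UI) - news_collector worker: RSS (feedparser) + NYT API - duplicate check: hash(title+date+source) + URL normalization - category standardization, HTML cleanup of summaries, timezone UTC - only items from the last 30 days are embedded, per-source try/catch - News API: source CRUD + manual collection trigger - APScheduler: automatic collection every 6 hours - sources: Kyunghyang Shinmun / Asahi Shimbun / NYT / Le Monde / Xinhua / Der Spiegel Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-06 13:38:07 +09:00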
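Hyungi Ahn	87683ca000	security: NAS mount verification + AI service port restriction + deploy doc update - NAS fail-fast: checks at startup that /documents/PKM exists, so the app never runs without the NFS mount - ollama/ai-gateway ports bound to 127.0.0.1 (blocks unauthenticated external access) - deploy.md: updated from Caddy's automatic HTTPS issuance to a setup where an upstream proxy terminates HTTPS Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-03 08:25:07 +09:00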
Hyungi Ahn	cfa95ff031	feat: implement Phase 4 SvelteKit frontend + backend enhancements Backend: - Add dashboard API (today stats, inbox count, law alerts, pipeline status) - Add /api/documents/tree endpoint for sidebar domain/sub_group tree - Migrate auth to HttpOnly cookie for refresh token (XSS defense) - Add /api/auth/logout endpoint (cookie cleanup) - Register dashboard router in main.py Frontend (SvelteKit + Tailwind CSS v4): - api.ts: fetch wrapper with refresh queue pattern, 401 single retry, forced logout on refresh failure - Auth store: login/logout/refresh with memory-based access token - UI store: toast system, sidebar state - Login page with TOTP support - Dashboard with 4 stat widgets + recent documents - Document list with hybrid search (debounce, URL query state, mode select) - Document detail with format-aware viewer (markdown/PDF/HWP/Synology/fallback) - Metadata panel (AI summary, tags, processing history) - Inbox triage UI (batch select, confirm dialog, domain override) - Settings page (password change, TOTP status) Infrastructure: - Enable frontend service in docker-compose - Caddy path routing (/api/* → fastapi, / → frontend) + gzip Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-03 06:46:19 +09:00
Hyungi Ahn	31d5498f8d	feat: implement Phase 3 automation workers - Add automation_state table for incremental sync (last UID, last check) - Add law_monitor worker: National Law Information Center (국가법령정보센터) API → NAS/DB/CalDAV VTODO (code complete; awaiting LAW_OC approval) - Add mailplus_archive worker: IMAP(993) → .eml NAS save + DB + SMTP notification (imaplib via asyncio.to_thread, timeout=30) - Add daily_digest worker: PostgreSQL/pipeline stats → Markdown + SMTP (documents, law changes, email, queue errors, inbox backlog) - Add CalDAV VTODO helper and SMTP email helper to core/utils.py - Wire 3 cron jobs in APScheduler (law@07:00, mail@07:00+18:00, digest@20:00) with timezone=Asia/Seoul Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-02 15:24:50 +09:00
Hyungi Ahn	4b695332b9	feat: implement Phase 2 core features - Add document CRUD API (list/get/upload/update/delete with auth) - Upload saves to Inbox + auto-enqueues processing pipeline - Delete defaults to DB-only, explicit flag for file deletion - Add hybrid search API (FTS 0.4 + trigram 0.2 + vector 0.4 weighted) - Modes: fts, trgm, vector, hybrid (default) - Vector search gracefully degrades if GPU unavailable - Add Inbox file watcher (5min interval, new file + hash change detection) - Register documents/search routers and file_watcher scheduler in main.py - Add IVFFLAT vector index migration (lists=50, with tuning guide) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-02 14:49:12 +09:00
Hyungi Ahn	299fac3904	feat: implement Phase 1 data pipeline and migration - Implement kordoc /parse endpoint (HWP/HWPX/PDF via kordoc lib, text files direct read, images flagged for OCR) - Add queue consumer with APScheduler (1min interval, stage chaining extract→classify→embed, stale item recovery, retry logic) - Add extract worker (kordoc HTTP call + direct text read) - Add classify worker (Qwen3.5 AI classification with think-tag stripping and robust JSON extraction from AI responses) - Add embed worker (GPU server nomic-embed-text, graceful failure) - Add DEVONthink migration script with folder mapping for 16 DBs, dry-run mode, batch commits, and idempotent file_path UNIQUE - Enhance ai/client.py with strip_thinking() and parse_json_response() Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-02 14:35:36 +09:00
Hyungi Ahn	a601991f48	feat: implement Phase 0 auth system, setup wizard, and Docker config - Add users table to migration, User ORM model - Implement JWT+TOTP auth API (login, refresh, me, change-password) - Add first-run setup wizard with rate-limited admin creation, TOTP QR enrollment (secret saved only after verification), and NAS path verification — served as Jinja2 single-page HTML - Add setup redirect middleware (bypasses /health, /docs, /openapi.json) - Mount config.yaml, scripts, logs volumes in docker-compose - Route API vs frontend traffic in Caddyfile - Include admin seed script as CLI fallback Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-02 13:21:45 +09:00
Hyungi Ahn	131dbd7b7c	feat: scaffold v2 project structure with Docker, FastAPI, and config. v2 scaffolding at the level of minimal working code: - docker-compose.yml: postgres, fastapi, kordoc, frontend, caddy - app/: FastAPI backend (main, core, models, ai, prompts) - services/kordoc/: Node.js document-parsing microservice - gpu-server/: AI Gateway + GPU docker-compose - frontend/: SvelteKit base structure - migrations/: initial PostgreSQL schema (documents, tasks, processing_queue) - tests/: basic pytest conftest setup - config.yaml, Caddyfile, credentials.env.example updated Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-02 10:20:15 +09:00

29 Commits
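The morning-briefing commit (431d4fe010) names `LAMBDA=ln(2)/2h`, `MIN_COUNTRIES_PER_TOPIC=2`, and `MAX_TOPICS=7` without showing how they combine. Below is a minimal sketch, assuming the decay is exponential with the rate expressed per hour (a 2-hour half-life), that clusters are already formed, and that a topic's weight is the sum of its members' decayed weights. `Article`, `decay_weight`, and `select_topics` are hypothetical names, not the repository's code.

```python
import math
from dataclasses import dataclass
from typing import List

# ASSUMPTION: "LAMBDA=ln(2)/2h" read as a per-hour rate, i.e. a 2-hour half-life.
LAMBDA_PER_HOUR = math.log(2) / 2.0
MIN_COUNTRIES_PER_TOPIC = 2
MAX_TOPICS = 7


@dataclass
class Article:  # hypothetical stand-in for a loaded news document
    country: str
    age_hours: float  # hours between publication and the briefing cutoff


def decay_weight(age_hours: float) -> float:
    """Exponential time decay: the weight halves every 2 hours."""
    return math.exp(-LAMBDA_PER_HOUR * age_hours)


def select_topics(clusters: List[List[Article]]) -> List[List[Article]]:
    """Keep clusters spanning at least 2 countries, heaviest first, at most MAX_TOPICS."""
    eligible = [
        cluster
        for cluster in clusters
        if len({article.country for article in cluster}) >= MIN_COUNTRIES_PER_TOPIC
    ]
    # ASSUMPTION: a topic's weight is the sum of its members' decayed weights.
    eligible.sort(key=lambda c: sum(decay_weight(a.age_hours) for a in c), reverse=True)
    return eligible[:MAX_TOPICS]
```

The country filter is what makes the output a cross-country comparison: a cluster covered by a single country's press is dropped no matter how heavy it is.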
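The same commit parameterizes `select_for_llm` with `K` and `λ` (the briefing uses `K=7, λ=0.6`; the Phase 4 digest mentions top-5 MMR). Assuming this is standard Maximal Marginal Relevance, each step picks the candidate that maximizes λ·relevance - (1 - λ)·(highest cosine similarity to anything already picked). This sketch uses plain lists; where the relevance scores come from is an assumption, and `mmr_select` is a hypothetical stand-in, not the repository's function.

```python
import math
from typing import List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def mmr_select(
    embeddings: Sequence[Sequence[float]],
    relevance: Sequence[float],
    k: int = 7,
    lam: float = 0.6,
) -> List[int]:
    """Greedy MMR; returns candidate indices in pick order.

    Hypothetical stand-in for select_for_llm(K, λ); relevance is caller-supplied.
    """
    selected: List[int] = []
    candidates = list(range(len(embeddings)))
    while candidates and len(selected) < k:
        def mmr_score(i: int) -> float:
            # Redundancy = similarity to the closest already-selected item.
            redundancy = max(
                (cosine(embeddings[i], embeddings[j]) for j in selected), default=0.0
            )
            return lam * relevance[i] - (1 - lam) * redundancy

        best = max(candidates, key=mmr_score)
        selected.append(best)
        candidates.remove(best)
    return selected
```

With λ=1.0 this degenerates to plain top-K by relevance; lowering λ trades relevance for diversity, which is why a near-duplicate second article loses to a less relevant but different one.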
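The comparator in 431d4fe010 caps the model's JSON envelope at 10 perspectives, 3 divergences, 2 convergences, and 5 quotes, and otherwise falls back to a fixed-shape row. A sketch, assuming those four names are top-level list fields; the `fallback` flag and the exact fallback shape are hypothetical.

```python
import json
from typing import Any, Dict

# Caps from the commit message; ASSUMPTION: all four are top-level list fields.
CAPS = {"perspectives": 10, "divergences": 3, "convergences": 2, "quotes": 5}


def fallback_row() -> Dict[str, Any]:
    """Fixed-shape row used when the model output is unusable (hypothetical shape)."""
    row: Dict[str, Any] = {key: [] for key in CAPS}
    row["fallback"] = True
    return row


def sanitize_envelope(raw: str) -> Dict[str, Any]:
    """Parse the LLM's JSON, coerce each capped field to a list, and truncate it."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return fallback_row()
    if not isinstance(data, dict):
        return fallback_row()
    clean: Dict[str, Any] = {"fallback": False}
    for key, cap in CAPS.items():
        value = data.get(key)
        # Wrong types become empty lists instead of failing the whole briefing.
        clean[key] = value[:cap] if isinstance(value, list) else []
    return clean
```

Keeping the fallback row's shape fixed means downstream rendering never has to special-case a missing key.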
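When `BRIEFING_HISTORICAL_ENABLED` is on, 431d4fe010 retrieves the top 5 documents from the past 30 days whose embedding has cosine similarity ≥ 0.70 with a cluster centroid; when it is off, no retrieval happens and `historical_*` stays NULL. This is an in-memory sketch (production would presumably run this as a vector query in PostgreSQL). Which env values count as "on", the `(doc_id, embedding, published_at)` tuple shape, and the explicit `enabled` override are assumptions.

```python
import math
import os
from datetime import datetime, timedelta
from typing import Iterable, List, Optional, Sequence, Tuple

TOP_K = 5
MIN_SIM = 0.70
WINDOW_DAYS = 30

# Hypothetical document shape: (doc_id, embedding, published_at).
Doc = Tuple[str, Sequence[float], datetime]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def historical_enabled() -> bool:
    # ASSUMPTION: the commit does not say which strings count as "on".
    value = os.environ.get("BRIEFING_HISTORICAL_ENABLED", "")
    return value.strip().lower() in {"1", "true", "yes", "on"}


def retrieve_historical(
    centroid: Sequence[float],
    docs: Iterable[Doc],
    now: datetime,
    enabled: Optional[bool] = None,
) -> Optional[List[Tuple[str, float]]]:
    """Return up to TOP_K (doc_id, similarity) pairs, or None when the flag is off."""
    if enabled is None:
        enabled = historical_enabled()
    if not enabled:
        return None  # flag off: no retrieval, historical_* columns stay NULL
    cutoff = now - timedelta(days=WINDOW_DAYS)
    hits = []
    for doc_id, embedding, published_at in docs:
        if published_at < cutoff:
            continue  # outside the 30-day window
        sim = cosine(centroid, embedding)
        if sim >= MIN_SIM:
            hits.append((doc_id, sim))
    hits.sort(key=lambda hit: hit[1], reverse=True)
    return hits[:TOP_K]
```

Returning `None` rather than `[]` keeps "flag off" distinguishable from "flag on, zero matches", two of the three paths the commit's regression test covers.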
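The Phase 4-A commit (e8da53490c) lists the explanation worker's terminal branches. Here they are as two pure decision functions, one before and one after the LLM call. This assumes `attempts` counts attempts made so far including the current one, that the two evidence lists arrive separately, and that `timeout`/`parse`/`unknown` are the retryable error codes; `Outcome`, `precheck`, and `judge` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Any, Mapping, Optional, Sequence

MAX_ATTEMPTS = 2
# ASSUMPTION: error codes for transient failures (the commit names timeout/parse/unknown).
RETRYABLE = {"timeout", "parse", "unknown"}


@dataclass(frozen=True)
class Outcome:  # hypothetical
    status: str  # "completed" | "failed" | "skipped" | "pending" (= retry later)
    error_code: Optional[str] = None


def precheck(doc_evidence: Sequence[Any], other_evidence: Sequence[Any]) -> Optional[Outcome]:
    """Before the LLM call: with no evidence at all, skip without calling the model."""
    if not doc_evidence and not other_evidence:
        return Outcome("skipped", "evidence_missing")
    return None


def judge(
    envelope: Optional[Mapping[str, Any]],
    correct_choice: int,
    attempts: int,
    error_code: Optional[str] = None,
) -> Outcome:
    """After the LLM call. attempts includes the current one (ASSUMPTION)."""
    if error_code is not None:
        if error_code in RETRYABLE and attempts < MAX_ATTEMPTS:
            return Outcome("pending", error_code)  # picked up again on a later tick
        return Outcome("failed", error_code)
    if envelope is None or envelope.get("answer_choice") != correct_choice:
        # Hallucination guard: the model must arrive at the known answer; never retried.
        return Outcome("failed", "guard_fail")
    return Outcome("completed")
```

Making `guard_fail` non-retryable matches the commit's reasoning: a model that picks the wrong answer once is likely to do so again, so retrying only burns MLX time.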
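The question-bank commit (4b7156061e) defines the default review as random, balanced sampling of up to `target_per_subject` (20) questions per subject, and `wrong_only` as "the latest attempt was wrong" rather than "ever wrong". A sketch over in-memory tuples; the real code presumably does this in SQL, and `balanced_sample`, `latest_wrong_ids`, and the tuple shapes are hypothetical.

```python
import random
from collections import defaultdict
from typing import Dict, Iterable, List, Optional, Set, Tuple


def latest_wrong_ids(attempts: Iterable[Tuple[int, int, bool]]) -> Set[int]:
    """attempts: (question_id, attempted_at, is_correct). Only the latest attempt counts."""
    latest: Dict[int, Tuple[int, bool]] = {}
    for qid, attempted_at, is_correct in attempts:
        if qid not in latest or attempted_at > latest[qid][0]:
            latest[qid] = (attempted_at, is_correct)
    return {qid for qid, (_, is_correct) in latest.items() if not is_correct}


def balanced_sample(
    questions: Iterable[Tuple[int, str]],
    target_per_subject: int = 20,
    rng: Optional[random.Random] = None,
) -> List[int]:
    """questions: (question_id, subject). Draw up to target_per_subject per subject."""
    rng = rng or random.Random()
    by_subject: Dict[str, List[int]] = defaultdict(list)
    for qid, subject in questions:
        by_subject[subject].append(qid)
    picked: List[int] = []
    for subject in sorted(by_subject):
        pool = by_subject[subject]
        # A subject with too few questions contributes all it has.
        picked.extend(rng.sample(pool, min(target_per_subject, len(pool))))
    return picked
```

Under the latest-wrong definition, a question answered wrong once and then correctly drops out of `wrong_only` review, which an ever-wrong definition would never allow.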
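The tier backfill commit (a95294ff42) gates each 30-minute run on four conditions: the off-switch, the KST 00:00-06:00 window, a classify backlog under 40, and a batch of 25 ordered safety > law > manual. A sketch, assuming 06:00 is exclusive, so the trigger fires 12 times a night (the test sums those runs to 300, the corrected nightly figure). `enqueue_budget`, `order_candidates`, and the `(doc_id, domain)` candidate shape are hypothetical.

```python
from datetime import datetime, time
from typing import List, Sequence, Tuple
from zoneinfo import ZoneInfo

KST = ZoneInfo("Asia/Seoul")
WINDOW_START = time(0, 0)
WINDOW_END = time(6, 0)  # ASSUMPTION: exclusive, so 05:30 is the last run
BATCH = 25
BACKLOG_LIMIT = 40
PRIORITY = ("safety", "law", "manual")  # anything else (drive_sync, memo, news) is skipped


def enqueue_budget(now: datetime, classify_backlog: int, enabled: bool) -> int:
    """How many documents this run may enqueue (0 means skip the run)."""
    if not enabled:
        return 0  # off-switch checked on every run, so no restart is needed
    local = now.astimezone(KST).time()
    if not (WINDOW_START <= local < WINDOW_END):
        return 0
    if classify_backlog >= BACKLOG_LIMIT:
        return 0  # protect MLX when the classify queue is already busy
    return BATCH


def order_candidates(candidates: Sequence[Tuple[int, str]]) -> List[int]:
    """candidates: (doc_id, domain). Keep prioritized domains only, highest first."""
    rank = {domain: i for i, domain in enumerate(PRIORITY)}
    eligible = [c for c in candidates if c[1] in rank]
    eligible.sort(key=lambda c: rank[c[1]])  # stable: keeps input order within a domain
    return [doc_id for doc_id, _ in eligible][:BATCH]
```

The nightly total is a ceiling: every run the backlog guard skips lowers it, so the actual time to clear 6720 documents can only be longer.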
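The upload commit (8f25d396df) removes `*.uploading` files under Inbox that have not been modified for `orphan_max_age_sec` (3600) seconds, and warns when the last 3 runs together deleted `cleanup_warn_threshold` (10) or more. `cleanup_orphan_uploads` is the name the commit uses, but this body, its signature (including the injectable `now`), and the log message are assumptions.

```python
import logging
import tempfile
import time
from collections import deque
from pathlib import Path
from typing import Deque, Optional

log = logging.getLogger("upload_cleanup")

ORPHAN_MAX_AGE_SEC = 3600    # orphan_max_age_sec default from the commit
CLEANUP_WARN_THRESHOLD = 10  # cleanup_warn_threshold default from the commit
# In-memory history of the last 3 runs (~30 minutes at a 10-minute interval); lost on restart.
_recent_deletions: Deque[int] = deque(maxlen=3)


def cleanup_orphan_uploads(inbox: Path, now: Optional[float] = None) -> int:
    """Delete *.uploading files under `inbox` unmodified for ORPHAN_MAX_AGE_SEC seconds."""
    now = time.time() if now is None else now
    deleted = 0
    for path in list(inbox.rglob("*.uploading")):
        try:
            if now - path.stat().st_mtime > ORPHAN_MAX_AGE_SEC:
                path.unlink()
                deleted += 1
        except FileNotFoundError:
            continue  # the upload finished (renamed) or was removed during the scan
    _recent_deletions.append(deleted)
    if sum(_recent_deletions) >= CLEANUP_WARN_THRESHOLD:
        log.warning("removed %d orphan uploads in the last %d runs",
                    sum(_recent_deletions), len(_recent_deletions))
    return deleted


if __name__ == "__main__":
    # Usage demo: an orphan left by a "crashed" upload is removed once it is old enough.
    with tempfile.TemporaryDirectory() as tmp:
        (Path(tmp) / "report.pdf.uploading").write_bytes(b"partial")
        print(cleanup_orphan_uploads(Path(tmp), now=time.time() + 2 * ORPHAN_MAX_AGE_SEC))
```

Using mtime rather than creation time matters here: a large upload still being written keeps touching its file, so it is never mistaken for an orphan.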
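The media commit (1e2c004dd4) has `file_watcher` route files by extension: audio goes to `stage=stt`, `.mp4/.webm` to `stage=thumbnail`, and `.mov/.mkv/.avi` get `needs_conversion=true` with no stage; paths under `roon_library_path` are skipped. A sketch of that routing table. The audio extension list and the `"audio"`/`"video"` category labels are assumptions, and `Route`/`route_media` are hypothetical names.

```python
from pathlib import PurePosixPath
from typing import NamedTuple, Optional

AUDIO_EXTS = {".mp3", ".m4a", ".wav", ".flac"}  # ASSUMPTION: not listed in the commit
PLAYABLE_VIDEO_EXTS = {".mp4", ".webm"}
CONVERT_VIDEO_EXTS = {".mov", ".mkv", ".avi"}


class Route(NamedTuple):  # hypothetical
    category: str
    stage: Optional[str]  # first queue stage, or None when nothing is enqueued
    needs_conversion: bool


def route_media(path: str, roon_library_path: str = "") -> Optional[Route]:
    """Map a media file to its queue routing; None means skip or not a media file."""
    if roon_library_path and path.startswith(roon_library_path):
        return None  # music library is managed elsewhere
    ext = PurePosixPath(path).suffix.lower()
    if ext in AUDIO_EXTS:
        return Route("audio", "stt", False)  # transcribe, then classify (no extract)
    if ext in PLAYABLE_VIDEO_EXTS:
        return Route("video", "thumbnail", False)
    if ext in CONVERT_VIDEO_EXTS:
        return Route("video", None, True)  # browsers cannot play these directly
    return None
```

Routing `.mov`/`.mkv`/`.avi` to no stage at all is consistent with the thumbnail worker skipping `needs_conversion=true`: nothing downstream can use them until they are converted.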
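The news collector (a6c19ef76c) deduplicates with `hash(title+date+source)` plus URL normalization. A sketch using only the standard library. SHA-256, the `|` separator, and the specific normalization rules (lowercase scheme and host, drop the fragment and trailing slash, drop `utm_*`/`fbclid`/`gclid` parameters, sort the rest) are all assumptions; `normalize_url` and `dedup_key` are hypothetical names.

```python
import hashlib
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# ASSUMPTION: which query parameters count as tracking noise.
TRACKING_PREFIXES = ("utm_",)
TRACKING_KEYS = {"fbclid", "gclid"}


def normalize_url(url: str) -> str:
    """Canonicalize a URL so trivially different links to one article compare equal."""
    parts = urlsplit(url.strip())
    query = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if not key.startswith(TRACKING_PREFIXES) and key not in TRACKING_KEYS
    ]
    path = parts.path.rstrip("/") or "/"
    # The fragment is dropped; remaining parameters are sorted for a stable order.
    return urlunsplit(
        (parts.scheme.lower(), parts.netloc.lower(), path, urlencode(sorted(query)), "")
    )


def dedup_key(title: str, published_date: str, source: str) -> str:
    """Stable key from title + date + source (hash function and separator assumed)."""
    raw = f"{title.strip()}|{published_date}|{source}"
    return hashlib.sha256(raw.encode("utf-8")).hexdigest()
```

The two checks catch different duplicates: the URL catches the same article reached through different tracking links, while the title key catches one story re-published under a new URL by the same source on the same date.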
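The Phase 2 commit (4b695332b9) weights hybrid search as FTS 0.4 + trigram 0.2 + vector 0.4 and says vector search "gracefully degrades" without a GPU. A sketch of the weighted sum, assuming each component score is already normalized to [0, 1] and that, when the vector score is unavailable, the remaining weights are renormalized. That is one plausible reading of graceful degradation, not necessarily the repository's.

```python
from typing import Mapping, Optional

WEIGHTS = {"fts": 0.4, "trgm": 0.2, "vector": 0.4}


def hybrid_score(scores: Mapping[str, Optional[float]]) -> float:
    """Weighted sum over the components that produced a score (each assumed in [0, 1])."""
    available = {name: w for name, w in WEIGHTS.items() if scores.get(name) is not None}
    total = sum(available.values())
    if total == 0:
        return 0.0
    # ASSUMPTION: renormalize so a missing vector score does not cap results at 0.6.
    return sum(w * scores[name] for name, w in available.items()) / total
```

Without renormalization, every document would lose the same 0.4 when the GPU is down, which keeps the ranking intact but makes scores incomparable with those from normal operation.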