hyungi_document_server

Author	SHA1	Message	Date
hyungi	1f4bbb9413	Merge pull request 'docs(readme): refresh stack/features/infra to 2026-05 reality' (#13) from docs/readme-refresh into main. Reviewed-on: #13	2026-05-12 15:05:20 +09:00
Hyungi Ahn	6d8d207669	docs(readme): refresh stack/features/infra to 2026-05 reality - AI: Qwen3.5-35B → gemma-4 26B MLX / 4B triage / bge-m3 / TEI reranker / Surya OCR / MLX Whisper - infra: Mac mini Docker Compose → GPU server as main / Mac mini = MLX inference + STT - features: morning briefing, Phase 4 Digest, library, memos, events, study, audio/video, marker - note that the inventory is authoritative (prefer the inventory over possibly stale README claims) - mark gpu-server/ as deprecated	2026-05-12 15:03:52 +09:00
hyungi	49f44bba60	Merge pull request 'feat(briefing): register 05:10 KST APScheduler cron' (#12) from feat/morning-briefing-scheduler into main. Reviewed-on: #12	2026-05-12 14:54:52 +09:00
Hyungi Ahn	55e39818ec	feat(briefing): register 05:10 KST APScheduler cron. Runs morning_briefing_run automatically every day at 05:10 KST. The scheduler uses timezone=Asia/Seoul, so only hour=5 minute=10 are specified. Leaves a 70-minute buffer after the Phase 4 04:00 cron finishes and avoids MLX semaphore contention.	2026-05-12 14:54:20 +09:00
hyungi	ff351e5a0f	Merge pull request 'feat/morning-briefing-frontend' (#11) from feat/morning-briefing-frontend into main. Reviewed-on: #11	2026-05-12 14:53:18 +09:00
Hyungi Ahn	1696926b8c	refactor(briefing): nav label to "아침 브리핑" (Morning Briefing)	2026-05-12 14:35:16 +09:00
Hyungi Ahn	4d9beb37ef	feat(briefing): swap /news to morning briefing card UI - full rewrite of /news/+page.svelte: drop the article list, fetch /api/briefing/latest → list of topic cards - each card: topic_label + headline + country_perspectives (flag + Korean label + summary + article #id link) + divergences/convergences/key_quotes + historical_context - 4-state status UI branching (empty/partial/failed/success) - design system tokens only, reuse the shared Card component, Svelte 5 runes + TS - layout label "뉴스" (News) → "브리핑" (Briefing) (route /news kept) - backup: git history	2026-05-12 14:30:42 +09:00
hyungi	8b4f4e53f4	Merge pull request 'feat/morning-briefing-backend' (#10) from feat/morning-briefing-backend into main. Reviewed-on: #10	2026-05-12 14:26:13 +09:00
Hyungi Ahn	6966be9cf6	fix(briefing): backfill country_perspectives[].article_ids from cluster members. Server-side correction for cases where the LLM leaves article_ids empty on its own (in the first briefing on 2026-05-12, all 6 topics had empty lists). Post-processing policy (_resolve_article_ids): 1. intersect the ids given by the LLM with the cluster member ids (blocks unrelated ids, hallucination guard) 2. if the result is empty, automatically inject the top-N cluster members of the same country by weight 3. if the cluster has 0 members matching the country → []. Per-country cap = MAX_ARTICLE_IDS_PER_COUNTRY = 5, in descending weight order. Stronger API contract: a topic that has country_perspectives is guaranteed article_ids ≥ 1 (when a cluster member of the same country exists). The frontend, external channels, and archive UI can all rely on this. Added 3 test cases.	2026-05-12 13:15:26 +09:00
Hyungi Ahn	36fea2789a	fix(briefing): split migration into 4 single-statement files. asyncpg prepared statements do not allow multiple statements. The Phase 4 101 SQL passed when it was applied on 2026-04-08 but fails with the current asyncpg/sqlalchemy versions. 255_morning_briefings_table.sql CREATE TABLE morning_briefings 256_morning_briefings_idx.sql CREATE INDEX (briefing_date) 257_briefing_topics_table.sql CREATE TABLE briefing_topics + UNIQUE 258_briefing_topics_idx.sql CREATE INDEX (briefing_id, topic_rank)	2026-05-12 13:04:56 +09:00
Hyungi Ahn	4aed9c6173	fix(briefing): simplify migration SQL (remove unicode, ::jsonb cast). Works around asyncpg 'cannot insert multiple commands into a prepared statement'. Hypothesis: special characters (lambda/arrow) in the Korean comments plus the '::jsonb' cast are misread as multiple statements during asyncpg prepare. Matches the Phase 4 101 SQL pattern exactly; the column is JSONB, so the default literal is cast automatically.	2026-05-12 13:02:16 +09:00
Hyungi Ahn	431d4fe010	feat(briefing): add morning briefing schema + services + api (historical off). A one-page card with topic×country comparative analysis of news collected overnight (KST 00:00~05:00). Code, logic, and tables are separate from Phase 4 Global Digest; only the algorithm in services/clustering_common is shared. New backend: - migrations/255_morning_briefings.sql: morning_briefings + briefing_topics (briefing_date UNIQUE, UNIQUE(briefing_id,topic_rank), FK CASCADE, 3 nullable historical_* columns, cluster_members JSONB, country_perspectives JSONB, 4-state status success\|partial\|failed\|empty) - app/models/briefing.py: SQLAlchemy ORM - app/services/briefing/loader.py: KST 5h window + news_sources prefix fallback (mirrors the Phase 4 pattern) + historical candidate pool loader - app/services/briefing/clustering.py: cluster_global topic-first (LAMBDA=ln(2)/2h, MIN_COUNTRIES_PER_TOPIC=2, MAX_TOPICS=7) - app/services/briefing/comparator.py: call_primary 26B + JSON envelope sanitize (cap perspectives 10 / divergences 3 / convergences 2 / quotes 5) + fixed-shape fallback row + retrieve_historical cosine top-K - app/services/briefing/pipeline.py: load→cluster→select(K=7,λ=0.6) →historical→compare→4-state status→delete+insert transaction - app/workers/briefing_worker.py: shared entry point for APScheduler and manual calls, 600s hard cap - app/prompts/briefing_comparative.txt: Korean comparative-analysis JSON prompt, 2 sections {articles_block} + {historical_block}, no-citation label - app/api/briefing.py: GET /latest, GET ?date=, POST /regenerate?date= (admin, sync delete+insert tx, regenerated:true) Backend changes: - app/main.py: register briefing_router (/api/briefing prefix). Scheduler registration comes in PR-3. - app/services/digest/selection.py: parameterize select_for_llm (K, λ injected by the caller). Phase 4 behavior is preserved via default values. Historical policy: - BRIEFING_HISTORICAL_ENABLED env flag, default off. - flag off → all historical_* columns NULL, empty label for prompt {historical_block}, no retrieval call. - flag on (enabled in PR-1b) → cosine top-K 5 (sim≥0.70) between the cluster centroid and doc embeddings from the past 30 days, injected into the prompt. 
Country canonical (after verifying by measurement): - confirmed there is no documents.country column - document_chunks.country match rate 0% (chunks are not created for news at all) - the only country signal = news_sources prefix mapping (same as Phase 4) Tests: - tests/test_briefing_historical.py: 3-path regression (flag off / on with fixture / on with zero match) + sanitize cap + fallback row shape. Verification: GPU container pytest + manual regenerate in PR-1.8.	2026-05-12 12:58:50 +09:00
Hyungi Ahn	1ca6d8b522	refactor(digest): extract clustering helpers to clustering_common. Extracts the core clustering algorithm of Phase 4 Global Digest (time-decay weight, adaptive threshold, greedy cosine assign + EMA centroid, importance normalize) into `app/services/clustering_common.py`. The country axis is the caller's responsibility: Phase 4 cluster_country keeps calling it per country, and the new morning briefing module will call cluster_global without country. The duplicated _normalize in selection.py is also unified into the common util. Zero behavior change: - LAMBDA / threshold / EMA alpha / MIN_ARTICLES all keep the Phase 4 defaults - docs.sort (in-place) → sorted (copy) changed, but this is irrelevant because the caller does not reuse the sorted docs (weights assigned to dict elements are by reference, so they are still reflected). The next commit verifies Phase 4 regression (digest regenerate diff 0).	2026-05-12 12:38:32 +09:00
hyungi	de36a9abca	Merge pull request 'fix(memos): voice memo file_type → 'immutable' (doc_type enum compatible)' (#9) from fix/memos-voice-doc-type into main. Reviewed-on: #9	2026-05-11 12:29:44 +09:00
Hyungi Ahn	3dc78e4f94	fix(memos): voice memo file_type → 'immutable' (doc_type enum compatible). After pulling main on the GPU server, /api/memos/?archived=false returned 500: the doc_type enum has no 'audio' value (only immutable/editable/note), so list_memos WHERE file_type IN ('note', 'audio') raised invalid_text_representation. Fix: - voice upload Document.file_type = 'audio' → 'immutable' (same pattern as the existing audio container intake: file_type='immutable' + category='audio' + source_channel='voice') - remove the file_type condition from the list_memos filter (separate by source_channel IN ('memo','voice') alone; a file_type='immutable' filter would also pull in ordinary PDFs, which is risky) - update the module docstring + voice upload comments. The original plan's file_type='audio' decision came from not checking the doc_type enum. Reuses the existing pattern instead of extending the enum (ALTER TYPE ADD VALUE 'audio'): safe, no regression.	2026-05-11 12:28:58 +09:00
hyungi	f3693fa2ea	Merge pull request 'feat/memo-intake-upgrade' (#8) from feat/memo-intake-upgrade into main. Reviewed-on: #8	2026-05-11 12:10:50 +09:00
Hyungi Ahn	1424e79495	docs(memos): iOS Shortcuts guide for voice memo upload	2026-05-11 12:09:12 +09:00
Hyungi Ahn	e3adbb8961	feat(frontend): show memo triage and voice source UI. PR-2B/2C frontend (commit 4/4). plan v9 Memo Intake Upgrade. PR-2B classification display + 1-click promote: - AI classification badge at the top of the memo card (task/calendar/activity/reference + confidence%) - 4 buttons at the bottom of memos with ai_event_kind != 'note': · [할 일로] (to task) [일정으로] (to calendar) [활동으로] (to activity) (the AI-recommended kind is color-highlighted) · [그냥 메모] (just a memo) (dismiss → forces ai_event_kind='note') - after promote, a "→ events #N" link badge on the memo card (visual confirmation for the user) PR-2C voice memo display: - memos with source_channel='voice' get a 🎙️ "음성" (voice) badge - audio player (<audio src=/api/documents/{id}/file?token=>), reusing the existing file endpoint - voice memos waiting for STT show the placeholder "음성 → 텍스트 변환 대기 중…" (waiting for voice → text conversion) API helpers: - promoteMemo(memoId, kind) → POST /memos/{id}/promote-to-event - dismissEventSuggestion(memoId) → POST /memos/{id}/dismiss-event-suggestion - voiceAudioUrl(memoId) → /api/documents/{id}/file?token= (access token URL pattern) Sidebar impact 0 (the events entry point was already added in PR-2). Principle (restated): the AI worker does not create events rows directly; only this UI's promote button leads into events.	2026-05-11 12:08:34 +09:00
Hyungi Ahn	6490050b04	feat(memos): promote memo to event + voice memo upload endpoint. PR-2B/2C backend 2/2. Combines commits 2~3 of the plan v9 commit split (single-file change in memos.py). PR-2B promote-to-event: - POST /api/memos/{memo_id}/promote-to-event: 1-click promotion of a memo → events · kind resolution: body.kind > documents.ai_event_kind > 400 · for activity_log, status=done + ended_at=now() automatically (5-second activity logging UX) · calendar_event with start_at gets status=scheduled · Event row + events_history(create) created automatically · memo_document_id linked automatically + source='memo' + AI-recommended values preserved in raw_metadata · one memo → N events allowed (no dedup, per user intent) - POST /api/memos/{memo_id}/dismiss-event-suggestion: 'just a memo' (forces ai_event_kind='note') · MVP: the AI-recommended value and the user-confirmed value share one column (accuracy measurement may get blurred) · backlog: split out a separate user_event_kind column (plan Memo Intake Upgrade backlog) - MemoResponse extended: ai_event_kind / ai_event_confidence / source_channel / file_type / file_path - list_memos filter relaxed: file_type IN (note, audio) + source_channel IN (memo, voice) → voice memos also appear in the same inbox list (user intent: memos = inbox for all input) PR-2C voice upload: - migration 254: ALTER TYPE source_channel ADD VALUE 'voice' - POST /api/memos/voice (multipart audio + recorded_at + device_hint) · validation: Content-Type audio/* + size ≤ 50MB + extension whitelist · NAS storage: /documents/PKM/Recordings/{YYYY-MM}/{uuid}.{ext} · fsync + rename(atomic) pattern (safe on NAS soft mount) · Document row: file_type='audio' + source_channel='voice' + category='audio' · enqueue to the stt queue → existing stt_worker → classify (PR-2B triage) → embed → chunk · device_hint / recorded_at preserved in extract_meta - response: MemoResponse (includes file_path, for the frontend audio player) Principle: the AI worker does not create events rows directly. This endpoint is the channel for user intent.	2026-05-11 12:06:41 +09:00
Hyungi Ahn	63990ac632	feat(memos): add AI event-kind triage fields. PR-2B (Memo Inbox Triage) backend 1/2. plan: beszel-tingly-sloth.md round 13. User vision = memos are the inbox, AI is the triage assistant. The AI worker does not create events rows directly. Migrations 250–253 (measured N=250): - 250 CREATE TYPE event_kind_hint AS ENUM (note\|task\|calendar_event\|activity_log\|reference) - 251 ALTER TABLE documents ADD ai_event_kind event_kind_hint - 252 ALTER TABLE documents ADD ai_event_confidence NUMERIC(3,2) + CHECK 0–1 - 253 CREATE INDEX idx_documents_ai_event_kind partial WHERE ai_event_kind IS NOT NULL ORM: - add Document.ai_event_kind / ai_event_confidence columns (kept in sync with the SQLAlchemy Enum) - add 'voice' to the source_channel enum (compatible with PR-2C) Worker: - extend classify_worker Phase 3 (Gemma 4B triage) · add event_kind_hint + event_kind_confidence fields to TriageOutput · store on Document only when the 4B response contains a hint (values outside the enum are ignored) - extend prompt p3a_short_summary.txt: classification criteria for note/task/calendar_event/activity_log/reference + confidence + explicit default='note'. Principle: the AI worker only provides a hint. Events are created only by the promote endpoint in the next commit.	2026-05-11 12:04:21 +09:00
hyungi	a842dc682e	Merge pull request 'wip/gpu-main-snapshot-2026-05-11' (#7) from wip/gpu-main-snapshot-2026-05-11 into main. Reviewed-on: #7	2026-05-11 08:11:44 +09:00
hyungi	2f7b45d82c	Merge pull request 'feat/events-ui-mvp' (#6) from feat/events-ui-mvp into main. Reviewed-on: #6	2026-05-11 08:11:32 +09:00
Hyungi Ahn	6d71116553	feat(events): PR-2 UI MVP: 4 tabs + quick activity logging + detail/create/history. plan v6 PR-2 scope. The 5-second activity logging UX is the core hypothesis. Backend: - GET /api/events/{id}/history: query the events_history timeline (lifecycle ops recorded automatically) Frontend (SvelteKit 5 runes mode): - /events main: 4 tabs (Today/Inbox/Upcoming/Activity) + quick activity logging widget · single input + Enter → POST /api/events kind=activity_log · status=done + time defaults filled in (server side) → reflected in the Activity tab immediately · prepend the new item to the top of the list (no refetch needed) · keep focus on the input ref for consecutive entries · lifecycle buttons (complete/defer/cancel/reactivate); activity_log is not subject to lifecycle - /events/[id] detail: edit PATCH-allowed fields (title/desc/time/priority/project_tag) + history timeline · PATCH-forbidden fields are not exposed in the UI (status/completed_at/cancelled_at/defer_until use separate buttons) - /events/new: choose kind (task/calendar_event/activity_log), then a form that branches by field · task: due_at + start_at (optional; timed tasks such as "14:00 phone call" are allowed, round 10) · calendar_event: start_at required + end_at + all_day · activity_log: if started_at/ended_at are left empty, the server defaults to now() - events entry point next to memos in the Sidebar (CalendarCheck icon) API helpers: frontend/src/lib/utils/events.ts (createEvent / logActivity / list* / lifecycle ops / kind&status enum label/color). quickref doc: docs/events_api_quickref.md (previous commit, PR-2 frontend reference). PR-2 core hypothesis check = quick input → save → immediate Activity update → persists after refresh. The 5 deferred PR-1 HTTP behavior items are also closed through natural use of this UI.	2026-05-11 07:56:31 +09:00
Hyungi Ahn	477be3892a	docs(events): PR-1 → PR-2 quickref: API contract + 5-second activity logging UX guide. Reference doc before starting PR-2 (frontend UI MVP). plan: beszel-tingly-sloth.md v6. Contents: - JWT auth flow (curl examples) - table of 9 endpoints (Create/List/Detail + 4 Lifecycle + 3 View) - meaning of the kind / status enums + UI branching hints - 5-second quick activity logging UX (PR-2 core hypothesis) - 5 items to verify naturally via the PR-2 smoke test (deferred items from PR-1 closure) - no events_history query endpoint exists (to be added in PR-2 if needed). The authoritative API contract = /openapi.json. This doc is a frontend cheat sheet.	2026-05-11 07:50:33 +09:00
hyungi	bce18386f0	Merge pull request 'docs(storage): Storage PR-1: read-only inventory + policy docs' (#4) from chore/storage-inventory into main. Reviewed-on: #4	2026-05-11 07:26:46 +09:00
hyungi	dc96d2b298	Merge pull request 'feat(events): PR-1 Events Core: schema + ORM + minimal API' (#5) from feat/events-core into main. Reviewed-on: #5	2026-05-11 07:26:31 +09:00
Hyungi Ahn	768fc36746	docs(storage): Storage PR-1: read-only inventory + policy docs. First PR of the Storage Backbone NAS track. As specified in plan v6, this is a read-only inventory PR: zero operational changes, mount changes, file_path updates, or asset moves. Docs only. Deliverables: - docs/storage_layout.md permanent policy doc (policy / mount matrix / NFS option baseline) - reports/storage_inventory_2026-05-11.md snapshot of measurement results. Key insights: 1. The NAS binary layer is already well separated: PKM/extracted_images/ study_question_images are all already on the NAS. Almost no work is left for the additional migration in PR-3/4. 2. The current GPU NFS mount exactly matches the plan v6 recommended baseline (soft, vers=4.1, timeo=10, retrans=3), so PR-2 focuses on the application layer (normalization wrapper / failure handling / uid mapping) rather than on mount option changes. 3. Only fastapi has NAS rw and workers are ro, so the originals are safely isolated. 4. Postgres pgdata = 1.1GB (the DB itself is not migrated; plan decision = stays on GPU). 5. A new extracted_emails/ directory is planned when PR-4 lands (joins Storage PR-5). Measurement commands: SSH 100.111.160.84 → df/mount/du/docker volume ls/docker run -v ... alpine du. All read-only. Zero operational impact.	2026-05-11 07:23:28 +09:00
Hyungi Ahn	9d9b3359b0	feat(events): PR-1 Events Core: schema + ORM + minimal API. New primary container domain for personal operations logs / schedules / to-dos / retrospectives. plan: ~/.claude/plans/beszel-tingly-sloth.md (round 12 v6). Schema: - 5 enums (event_kind / event_status / event_source / event_actor / history_change_kind) - events table: kind(task\|calendar_event\|activity_log) + lifecycle 7-state status - events_history: lifecycle ops recorded automatically, FK RESTRICT (history is a point-in-time fact) - CHECK: calendar_event → start_at NOT NULL / activity_log → started_at\|ended_at NOT NULL - partial unique (source, source_ref): external source dedup (used by PR-4) - partial index (active status / activity_log timeline) API: - POST /api/events (kind=activity_log shortcut: status=done + ended_at=now() default) - GET /api/events/{id} \| /api/events?kind&status&from&to&project_tag&source - PATCH /api/events/{id} (extra=forbid + reschedule history when time fields change) - POST /api/events/{id}/{complete,cancel,defer,reactivate} (history automatic) - GET /api/events/today (Asia/Seoul default; deferred items only when defer_until<=now()) - GET /api/events/inbox \| /api/events/activity?from&to Excluded (PR-2~5 or backlog): - DELETE (retrospective data → unified under /cancel) - log shortcut / upcoming endpoint (absorbed by POST + GET ?from&to) - /ingest (to be added with exact requirements at PR-4 MailPlus forward) - iCal export / ntfy notifications / recurrence / general edit history	2026-05-11 07:19:04 +09:00
Hyungi Ahn	aca2f0d62c	feat(canonical): restore GPU STT owner and extend KGS watch paths. D9 Track B revised (2026-05-08): 1) Officially return STT ownership to the GPU: - docker-compose.yml: remove stt-service profiles:[legacy] → always active - fastapi STT_ENDPOINT = http://stt-service:3300 (compose internal DNS) - policy: Mac mini is dedicated to Gemma 26B first, so STT/Whisper is owned by the GPU server regardless of call volume. The earlier "moved to Mac mini" comment was based on a misread trace. 2) Additional scan paths for external study materials such as KGS Code: - ADDITIONAL_WATCH_TARGETS env (comma-separated, paths relative to PKM) - app/core/config.py: add additional_watch_targets list setting - app/workers/file_watcher.py: handle additional watch paths - app/workers/classify_worker.py: KGS Code classification branch (gas engineer exam study materials) - all handled as expected_category=library (md/pdf/docx only) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-10 05:47:20 +00:00
hyungi	c1b22d8833	docs(eval): Phase 2 path fix: log_tsv/post-report use /app/logs (bind-mount); /app/evals is not mounted. Found while verifying the cron dry-run: - /app/scripts/ is bind-mounted (visible in the container after Phase 2 main FF ✓) - /app/evals/ is neither in the fastapi image nor mounted by compose - the earlier README/plan's --log-tsv /app/evals/markdown/... was written to the container's writable layer and lost on restart. Fix: nightly --log-tsv and post-report --output-* use /app/logs/ (rw bind-mount → persisted on host ~/Documents/code/hyungi_Document_Server/logs/). At the weekly git commit, cp logs/ → evals/markdown/ and then add. Same pattern for post-report.	2026-05-10 05:47:20 +00:00
hyungi	df2b09b0fa	docs(eval): Phase 2 canary retry GO: success 37/40 (92.5%) failed 2 skipped 1. Option C executed (2026-05-03 02:36-02:39 UTC): - 5201 documents stuck processing → failed (conditional UPDATE 1 row) - 3817 re-enqueued → success 35.8s - 4059 re-enqueued → success 100.7s - GPU contention confirmed resolved (free 8820 MiB) Final tally: success 37 / failed 2 (3810 corrupt PDF + 5201 scan-likely timeout) / skipped 1 (5090 MAX_PAGES). All 3 plan gates PASS. Next = user approval gate (enter 2-C + choose nightly mode). Pending: the 4 steps of main merge + parent pull + cron addition. Follow-up backlog (Phase 1B+, separate PR): - B1 scan-likely auto-skip (5201 pattern) - B2 OOM 503 transient (automatic recovery from nightly contention) - B3 queue exhausted → doc.md_status sync (corner case cleanup)	2026-05-10 05:47:20 +00:00
hyungi	5bf9ff9dc2	docs(eval): Phase 2 canary result: HALT (failed 4/40 = 10%, but 0 Marker failures by classification). 35 success / 3 failed / 1 skipped / 1 stuck processing (corner case). Plan gate FAIL (success<36 + failed>2). However, failure root cause analysis: - 2/4 = GPU contention (5.93+5.35 GiB held by other processes, free 50 MiB) - 1/4 = genuinely corrupt PDF (Pdfium error, non-retryable) - 1/4 = scan-likely + tiny text + ReadTimeout (Phase 1B corner case) Failures of Marker quality itself = 0. p50 elapsed 33.2s (on par with 1D's 34s), text_length_ratio p50 1.00 (-13% vs 1D's 1.15, within normal range), no new warnings. User decision: A (accept) / B (add code guards) / C (immediately re-enqueue the 2 OOM cases → passes GO) / D (keep HALT). Recommendation: C or A. The 5201 stuck processing requires manual DB cleanup under any option (after user approval).	2026-05-10 05:47:20 +00:00
hyungi	f61dce262e	docs(eval): Phase 2 path policy correction: 2-B /app/logs vs 2-C /app/scripts canonical. The plan/README assumed /app/scripts as the single unified path, but measurement showed it is a read-only bind-mount, so docker cp is not possible. The soft lock also forbids --build. Different paths must be used per stage: - 2-B canary (pre-merge): /app/logs/phase2_backfill.py + /app/logs/.csv (docker cp worktree → /app/logs rw bind-mount). Keeps unverified code out of main during canary verification. - 2-C nightly (post-merge canonical): /app/scripts/phase2_backfill.py + /app/evals/markdown/phase2_ (bind-mount becomes active automatically after merging feat/phase2-backfill into main + parent git pull). cron also uses the canonical path. Reflected in the enqueue example in evals/markdown/README.md + a new #### path policy section.	2026-05-10 05:47:20 +00:00
hyungi	48f8bf6ca6	docs(eval): Phase 2 canary sample: 40 docs (seed 20260503). Bucket distribution (algorithm vs allocated): - large (>10MB): 6 / 6 - scan_likely (text_density<5): 2 / 2 - study_note born-digital: 10 / 10 - Academic_Paper born-digital: 2 / 8 (under-fill: only 20 born-digital docs total in pool) - Reference born-digital: 0 / 6 (under-fill: same as above) - tech_doc (Standard/Manual/Specification): 4 / 4 - minor_doc (Note/Report/Memo/NULL): 4 / 4 - filler (rest from candidates): 12 (picked up under-fill slack) Note: the born-digital bias assumed in 1D does not match the actual Phase 2 population (the text_density distribution is mixed-dominant: 174/237). Even so, the 40 docs cover large / scan-likely / diverse doctypes, which satisfies the canary's diagnostic purpose. Next: user approval gate, deciding when to run the --no-dry-run enqueue.	2026-05-10 05:47:20 +00:00
hyungi	ac58c8262c	docs(eval): Phase 2 inventory dry-run: 237 pending PDFs, 227 convert candidates - forecast_skip_reason distribution: - none: 227 (convert candidates) - over_max_pages_estimated: 10 (file_size > 25MB proxy) - handwritten_hint: 0 (1D-A1 skip already in marker_worker) - doctype_skip: 0 - file_size_band: S=47 / M=160 / L=30 - text_density_band: mixed=174 / scan-likely=43 / born-digital=20 - doc_type top: study_note 79 / Academic_Paper 57 / Reference 35 / Standard 24 / Manual 19 - seed baseline for select-canary (next step)	2026-05-10 05:47:20 +00:00
hyungi	25ee10ac34	feat(scripts): Phase 2 markdown backfill — script + README - scripts/phase2_backfill.py: 5 subcommands - inventory: pending PDFs dry-run CSV with skip forecast - select-canary: stratified 40 sample (seed 20260503) - enqueue: one-shot from sample CSV (--no-dry-run gate) - nightly-enqueue: cron-friendly with disable flag / marker /ready / active-queue threshold (oldest_age stuck guard) / DB pool guards - post-report: final state CSV + 1D baseline comparison MD - evals/markdown/README.md: Phase 2 section appended - plan: ~/.claude/plans/iridescent-gathering-clover.md - depends on Phase 1B handwritten skip `7d0fca2` (marker_worker side guard)	2026-05-10 05:47:20 +00:00
Hyungi Ahn	8ca27eb573	fix(markdown): img auth via ?token= query param (Authorization header not supported). `<img src=>` cannot send an Authorization header, so /api/documents/{id}/images/{key}/raw returned 401 → images did not display. Pass the access token as a query parameter, same as the existing /file?token= iframe pattern. backend: remove the get_current_user dependency and validate the token query parameter directly (same flow as the existing /file endpoint). frontend: the MarkdownDoc swap selector appends ?token={getAccessToken()} to img.src. When logged out, the placeholder is kept. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-10 14:47:09 +09:00
Hyungi Ahn	f2a5c729b7	fix(scripts): marker reprocess SQL: resolve the named-param conflict with CAST(:payload AS jsonb). The `::` postfix cast in `:payload::jsonb` conflicts with the `:` named-param prefix of SQLAlchemy text(), causing an asyncpg syntax error. Found during sample reprocess of doc 3757. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-10 14:27:17 +09:00
Hyungi Ahn	68fa86ea52	feat(markdown): persist extracted images with auth routes. Markdown Canonical Phase 1B.5: permanently store on the NAS the images that marker was extracting, and wire up DB metadata + authenticated route + frontend swap. Key changes: - marker-service /convert response includes a list of base64 images (stays stateless, no NAS write permission) - marker_worker persists to NAS `/documents/extracted_images/{doc_id}/` + UPSERT + DELETE of orphan rows + normalizes md_content refs to the stable `docimg:img_NNN` scheme - authenticated route /api/documents/{id}/images/{key}/raw (Cache-Control private + ETag = content_hash) - frontend MarkdownDoc swaps docimg refs inside placeholder cards for real <img> elements. Principles: - image binary = NAS, metadata = Postgres (same pattern as the study section) - image_key is sequence-based and deterministic → reconversion is idempotent - rollback possible via MARKDOWN_IMAGE_PERSIST=false env (the placeholder card fallback remains naturally). The existing 28 marker-success documents are not touched in this PR: after deploy + 1 new upload + 5 samples are verified, do a targeted reprocess with scripts/marker_reprocess_existing_success.py. plan: ~/.claude/plans/piped-humming-crystal.md Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-10 14:05:41 +09:00
Hyungi Ahn	5b62c59f8a	fix(canonical): classify marker_worker transport-layer errors as transient retry. Before: only (ConnectError, TimeoutException) were transient → raise → queue retry. Other transport errors such as ReadError / WriteError / RemoteProtocolError were caught by 'except Exception' and handled by _fail → final fail, ignoring max_attempts. In the Phase 1D pilot, two docs (5111/5115) final-failed without retry on 'Server disconnected without sending a response' (RemoteProtocolError). Fix: except (ConnectError, TimeoutException) → except TransportError. TransportError is the common parent of Connect/Read/Write/RemoteProtocol/Timeout, so all transport-layer errors become eligible for transient queue retry. The ReadTimeout on 5135 (queue exhausted) is unrelated to this fix: an 8.4MB PDF could not finish within MARKER_TIMEOUT=300s, and all 3 retries timed out. Increasing the timeout itself or splitting large PDFs needs a separate decision. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-03 15:29:47 +09:00
Hyungi Ahn	5185501bbd	feat(search): PR-RAG-Time-1 freshness decay (news/law_monitor). Applies a time-weight soft multiplier to news / law-notice retrieval results. Separated as an operational policy step in the final score composition stage after the reranker. - news (source_channel='news'): half-life 90 days - law_monitor (source_channel='law_monitor'): half-life 365 days - not applied: manual / drive_sync / inbox_route / memo / Manual / Reference / Academic_Paper / Checklist / KGS Code / Study / content_origin='ai_drafted' - formula: decay = exp(-ln(2) * age / HL); final = base * (0.7 + 0.3 * decay) - floor 0.7 (full demotion prohibited) - guards: missing date / future date / unknown source are all no-ops - temporary date source: documents.created_at (no published_date column; follow-up PR) debug meta (?debug=true response + logs/search.log): base_score / age_days / decay_factor / freshness_adjusted_score / freshness_policy / freshness_date_source New: app/services/search/freshness_decay.py hook: app/services/search/search_pipeline.py:303 (right after apply_diversity, right before normalize) schema: app/api/search.py SearchResult.freshness_debug (Optional[dict]) tests: tests/test_freshness_decay.py 24 cases (policy dispatcher 9 + age/decay/score 11 + apply integration 6; guards 1~6 all). The Episode/Fact layer and contradiction detection are out of scope for this PR. plan: ~/.claude/plans/pr-rag-time-1-freshness-decay.md	2026-05-03 08:38:09 +09:00
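Hyungi Ahn	e4fe18b7a8	docs(eval): record 1D pilot informal evaluation results. User quality assessment: "Notes handwritten with Apple Pencil fail to yield good material, on top of my handwriting issues; other than that it seems to work well." Classification: overall_pass=true 24: ordinary PDFs (born-digital + scan-like cases that convert normally, such as 5127) overall_pass=false 4: 4 Apple Pencil handwritten notes (4798/4813/4815 controlled_backfill + 4809 anchor) overall_pass=empty 2: intentionally skipped because page_count > MAX_PAGES=200 (5178 ASME 272p, 5180 ASME Sec I 453p). The scores for the formal 5-axis rubric (text_accuracy/structure/noise_rate/multi_script/completeness) are left blank: the user's informal judgment already makes the decision-matrix branch clear (only handwriting fails → extend the SKIP rule), so formal scoring would be over-investment. In a later round (Marker tuning / alternative OCR), filling in the formal rubric when re-evaluating the same 30 docs is worthwhile. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-03 08:15:33 +09:00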
Hyungi Ahn	7d0fca267d	feat(marker): auto-skip handwritten docs, reflecting Phase 1D pilot results. In the 1D pilot (2026-05-02 nightly sweep, 25 controlled_backfill results), 3 handwritten PDFs (4798 / 4813 / 4815) converted with status='success', but the user's quality assessment judged that no good material could be extracted. The root cause is not insufficient Marker configuration but the input itself (Apple Pencil handwriting + the user's handwriting style = beyond the limits of OCR/layout models). Since Marker tuning cannot solve this, skip automatically at the enqueue stage. Guard logic: in marker_worker.process(), right after the doc_type SKIP (step 1.5), call _set_skipped() when title/path matches any of 4 conservative keywords (필기, 손글씨, handwritten, handwriting). Clears md_content/md_content_hash to NULL, sets md_extraction_error='skipped: handwritten note (title/path heuristic)', content_origin='extracted'. Keyword selection (conservative): included: 필기 (handwritten notes) / 손글씨 (handwriting) / handwritten / handwriting. Excluded (false positive risk): - 노트 (note): would also catch normal non-handwritten documents such as 노트북 매뉴얼 (notebook manual) / release notes / Note_240528_워크숍 - scan / 스캔 (scan): some scanned PDFs convert normally; in the 1D results, doc 5127 표준기계설계(KS)_08_핀 had density 1.59 / scan_likely yet succeeded. logger: markdown_skip_handwritten_hint id=<id> keyword=<matched> title=<...> All 15 regex unit test cases (real production fastapi venv) pass: match: Note_240805_용접교육 필기 / Note_240827_필기 / 손글씨 모음 / Handwritten Notes 2024 / handwriting practice / path/필기/* / path/handwritten_collection/* (8 cases) no match: 다이아프람워크숍 / 노트북 매뉴얼 / Release notes v2 / PIPE FABRICATORS / 표준기계설계 / scan documentation / 스캔 문서 (7 cases). This guard applies at enqueue time. The md_content of the 4 docs that are already success is preserved (it can be shown if the user wants to view it). Cleanup, if needed, is a separate matter. Follow-ups (separate PRs): - A2 (formal doc_type='필기노트' label): label definition deferred because the 3 samples from 1D are too few. Review separately once more handwritten PDFs accumulate. - C (Phase 2 full backfill plan): separate round after this PR merges. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-03 08:11:42 +09:00
Hyungi Ahn	0362f52130	fix(scripts): filter so Phase 1D enqueue does not reprocess existing_success. The Round 2 sample included 5 existing_success docs (anchor doc 4809 + 4 calibration), but a bug in cmd_enqueue ignored sample_source and enqueued all 30. Result: - the 5 existing docs were reprocessed by marker (~25 min of marker time wasted) - md_content overwritten with same-quality output → baseline lost - the "before" state of the anchor (doc 4809) is gone, damaging the comparison anchor for later rounds. Fix: - default = enqueue only sample_source == "controlled_backfill" (25 docs) - add --include-existing flag (for when the anchor must be reprocessed in a later Marker tuning round) - print states the mode + lists excluded ids. Fixed before the scheduled nightly one-shot sweep (23:00 KST). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-02 16:27:31 +09:00
Hyungi Ahn	b09687d41d	feat(scripts): Phase 1D Round 2: controlled backfill stratification. Augments the existing phase1d_pilot.py (simple ai_domain × file_size 3-bucket) with the 4 axes + sample_source separation + forced_include from plan ~/.claude/plans/stratified-mingling-otter.md. Limits of Round 1 (ai_domain × file_size 3-bucket): it reflects only the natural distribution of pending PDFs → known weak spots (handwriting / scans / Korean-Chinese-Japanese mixed OCR) do not make it into the sample. In the 1C visual check, doc 4809 (Note_240805_용접교육 필기) actually showed that pattern, and leaving it to natural selection risks missing the same case in the next round too. Round 2 design: - 4-axis stratification: doc_type × file_size_band × text_density_band × handwritten_hint - sample_source ∈ {existing_success(5), controlled_backfill(25)} - forced_include doc 4809: known bad anchor. Enables 1:1 comparison with the reconversion result of the same document after the next tuning / alternative is introduced. - text_density = LENGTH(extracted_text) / (file_size / 1024) chars/KB: the cleanest single proxy. Validated at both ends: 0.17 (handwritten 4809) ↔ 94 (born-digital 3759). - script_mix proxy: Hangul/CJK/Hiragana/Katakana/Latin Unicode block ratio → korean_dominant / mixed_korean_cjk / mixed_korean_latin / cjk_dominant / latin_dominant / unknown. - page_count_estimate: existing_success uses md_extraction_quality.metrics.source_page_count. controlled_backfill is NULL (marker re-reads with PyMuPDF anyway). - seed SAMPLE_SEED=20260502 fixed, guaranteeing reproducibility. Sample distribution (measured 2026-05-02): bucket_label: born_digital=12, mixed=5, existing_calibration=4, handwritten=3, scan_likely=3, large=2, existing_anchor=1 doc_type: Academic_Paper=7, study_note=6, Standard=5, Note=4, Reference=3, Manual=3, Drawing=1, Report=1 file_size_band: M=14, S=12, L=4 text_density_band: born-digital=15, scan-likely=9, mixed=6 handwritten_hint: lo=26, hi=4 (13x over-sample vs 1.1% of the population) forced anchor doc 4809 = density 0.17 (the very document from the user's visual check). New subcommand: eval_template: pilot_1d_eval.csv skeleton (5-axis rubric 1~5 + overall_pass + notes). The user fills in scores while comparing MarkdownDoc and PDF via the toggle. The existing cmd_enqueue (snapshot/backup/dedup) + cmd_report (quality metrics) are kept. Deliverables: scripts/phase1d_pilot.py: 4 axes + sample_source + forced_include + eval_template subcommand. 
CSV+JSON dual output. evals/markdown/README.md: rubric + decision matrix + workflow guide. evals/markdown/pilot_1d_sample.csv: 30 rows × 15 cols (seed result, reproducibility preserved). evals/markdown/pilot_1d_eval.csv: empty skeleton (filled in after user evaluation). Execution boundary: Step 1~3 (selection / template / dry-run) = completed by this PR. Step 4 (--yes enqueue, actually putting 30 docs into the markdown queue) = run separately within the nightly one-shot sweep window (23:00~03:00 KST) with the user's timing approval. marker-service BATCH_SIZE=1, 30 docs at an average of 5 min/doc ≈ 2.5h. Verify: ran select in the fastapi container on the GPU server → 30-doc sample CSV generated. Confirmed the eval_template subcommand works. Confirmed via enqueue dry-run that 30 doc_ids + snapshot are printed and then the user-cancel branch is taken. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-02 16:15:09 +09:00
Hyungi Ahn	91e7a64713	ops(guardrails): activate migration 142 ask_events.source NOT NULL. Activated after 1 week of operational observation (confirmed 0 rows with source IS NULL since the 2026-04-17 deploy). Moved deferred → migrations/142_*.sql.	2026-05-02 16:12:38 +09:00
Hyungi Ahn	d6e0f5de04	feat(frontend): Phase 1C — markdown viewer 완성 (PDF 통합 + status badge + image placeholder) Phase 1B marker_worker 결과(현재 success 29건, 전부 PDF)를 사용자 흐름에 연결하고 1D pilot 품질 평가 데이터를 확보하기 위한 viewer 마무리 작업. 빠진 부분 3가지를 닫는다: 1) PDF viewerType 기본 view = Markdown - md_status='success' AND md_content 비어있지 않음일 때 MarkdownDoc 기본 표시. - 사용자가 "PDF 원본" 토글 시 iframe. - pdfViewMode 초기화는 doc.id 변경 시에만 (lastDocId tracker) — reactive cycle 이 사용자 토글을 덮어쓰지 않도록 보호. - markdown 사라지는 케이스(success → failed 재처리)는 자동으로 pdf 로 보호. 2) Image renderer → placeholder card (docMarkdown.ts) - md_content 의 69%(20/29)에 image syntax 포함. asset serving(1B.5) 미구현 상태에서 raw <img> 를 emit 하면 깨진 아이콘 → 1D pilot 평가가 markdown 품질이 아닌 viewer 미완성 문제로 오염됨. - href / alt / basename 모두 escape 후 figure.md-image-placeholder 로 렌더. - 원본 src 는 data-md-image-src 에 escape 보존 → 1B.5 ImgAuth selector 로 실제 <img> 로 교체할 entry point 마련. - DOMPurify ADD_ATTR 에 data-md-image-src 추가. 3) MarkdownStatusBadge (신규) — 4-state badge - pending 숨김(legacy 9792건 시각 노이즈 회피). - processing/success/skipped/failed 표시. - success tooltip: md_extraction_quality 의 metrics raw 일부 (markdown_heading_count / markdown_table_row_count / markdown_image_count / text_length_ratio / warnings) 만 노출. text_length_ratio / null / metrics nested / flat fallback 모두 방어. - skipped/failed tooltip: md_extraction_error 또는 정책 문구. - MarkdownDoc 내부 + PDF iframe fallback 양쪽에서 재사용 → failed 같이 MarkdownDoc 가 안 렌더되는 경로에서도 사용자가 상태를 알 수 있음. 기존 markdown/hwp-markdown/article 분기에도 mdExtractionQuality prop 전달. 
Out of scope (1B.5 or later): - Actual ImgAuth blob URL wiring (data-md-image-src selector + Bearer raw) - Storing to and serving from /data/assets/<doc_id>/ - Caddy /data/assets/* routing - Saving the user's view preference in localStorage - side-by-side viewer (after reviewing the 1D pilot results) - Separate quality chip UI (after 1D) Verify: - npm run build passes - npm run lint:tokens: 0 violations in new files - Related plan: ~/.claude/plans/iterative-nibbling-catmull.md - pre-flight: confirmed the actual shape of md_extraction_quality ({score, metrics:{...}, warnings:[]}) Risks: - The feature/design-system worktree holds a stale version of [id]/+page.svelte (212 commits behind main, no MarkdownDoc). A conflict is certain when that worktree is merged after 1C lands; it needs a rebase on that side (separate issue). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-02 15:38:45 +09:00
Hyungi Ahn	dfc5913c5e	fix(tests): explanation cap test setup: compensate for too-short Korean chunks The setup for cases 3/4 built text shorter than EXPLANATION_MAX_CHARS (1200), so the asserts failed. Increased the Korean chunk repetition count to guarantee at least 1200 chars. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-02 08:35:34 +09:00
Hyungi Ahn	6b52d57bac	feat(study): Phase 4-A explanation_md length cap + stronger prompt In production data, explanations marked ready were 793/838/866 chars, large relative to the recommended 200~400. After the first round of operation, strengthen the prompt + cap before saving, for result-screen readability and token usage control. Prompt (study_explanation_envelope.txt): - State explanation_md recommended 300~600 chars, max 900 chars - Only the core concept + the justification for the correct answer + the 1~2 most confusing wrong answers; do not explain every wrong answer - Minimize line breaks inside explanation_md (combined with the parse_json fix, reduces invalid escapes) - Avoid LaTeX math; where possible replace \\circ/\\text/\\, macros with plain text ('0°C', 'C') - Emphasize that the output is a single raw JSON object only: no code fences/thinking/meta Worker (study_explanation_worker.py): - New helper _cap_explanation_md(text, max_chars=1200) · passthrough at 1200 chars or fewer · if exceeded, search the last 200 chars for a \\n\\n / \\n / '. ' / '다.' / '요.' boundary · cut at the boundary + '…' (avoids cutting mid-word) · if no boundary is found, plain cut + '…' - Apply the cap before save. ai_explanation_status='ready' is kept (being capped does not mean failed) - Operational analysis metadata in payload: explanation_len_original / _saved / capped flag Verification: - tests/test_explanation_cap.py (6 cases) · short passthrough / exact at limit / paragraph boundary / sentence boundary · no boundary fallback / empty input - Added sections 8/9 to scripts/phase4_health.sql · ai_explanation length p50/p95/max (study_questions.ready) · how often the cap fires (explanation_capped/_original/_saved in job.payload) cap 1200 = more headroom than 800 (4-B summary_md): for engineer certification exam explanations, 800 is tight once formula + wrong answers + concept are bundled. Consider adjusting to 800~1000 after operation. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-02 08:33:18 +09:00
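The capping rule in commit 6b52d57bac can be sketched roughly as follows. This is a minimal illustration, not the code in study_explanation_worker.py: the function name `cap_explanation_md`, the `window` parameter, the order in which boundaries are tried, and the choice to append '…' on top of the cap (so the result may be one character longer than `max_chars`) are all assumptions.

```python
ELLIPSIS = "…"
# Assumed priority: paragraph break, line break, then sentence endings.
BOUNDARIES = ("\n\n", "\n", ". ", "다.", "요.")


def cap_explanation_md(text: str, max_chars: int = 1200, window: int = 200) -> str:
    """Return text unchanged if short enough, else cut near a natural boundary."""
    if len(text) <= max_chars:
        return text  # passthrough, including the empty string

    head = text[:max_chars]
    search_from = max(0, max_chars - window)
    for marker in BOUNDARIES:
        # Look for the last occurrence of the marker inside the final window.
        idx = head.rfind(marker, search_from)
        if idx != -1:
            cut = idx + len(marker)
            # Keep sentence punctuation, drop trailing whitespace/newlines.
            return head[:cut].rstrip() + ELLIPSIS

    # No boundary in the window: plain cut.
    return head + ELLIPSIS


if __name__ == "__main__":
    long_text = "a" * 1100 + "\n\n" + "b" * 300
    capped = cap_explanation_md(long_text)
    print(len(long_text), len(capped))
```

Trying the paragraph break first means a nearby blank line wins over a later sentence ending, which keeps the cut on the coarsest available unit; a different priority order would be equally consistent with the commit message.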
Hyungi Ahn	b3dbf1a11e	fix(ai): parse_json_response: stateful walker that fixes only inside string literals The previous fallback's indiscriminate newline replace also escaped raw newlines outside strings (in the object structure), so the JSON was rejected. In addition, invalid backslashes from LaTeX math (\circ, \text, \, etc) are separate from the newline issue and need their own fix. state machine: toggle in_string (on encountering `\"`). Only inside string literals: - convert raw LF/CR/TAB → \\n/\\r/\\t - if the backslash is followed by a valid escape char (\"\\/bfnrtu), leave it as is - if the backslash is followed by an invalid char (\\c, \\,), escape the backslash itself as \\\\ - raw newlines outside strings are JSON whitespace, so they are preserved The raw 940 chars of production record id=243 contained many LaTeX macros such as \\circ \\text \\, \\approx \\times plus markdown line breaks → the new walker fixes both cases. Other workers (classify/triage/study_explanation/evidence/study_session_analysis) benefit automatically. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-02 08:00:20 +09:00
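The state machine described in commit b3dbf1a11e could look something like the sketch below. It is an illustration under stated assumptions, not the repository's parse_json_response: the function name `repair_json_strings` is hypothetical, and it only applies the four rules listed in the commit message.

```python
import json

VALID_ESCAPES = set('"\\/bfnrtu')  # characters JSON allows after a backslash
CONTROL = {"\n": "\\n", "\r": "\\r", "\t": "\\t"}


def repair_json_strings(raw: str) -> str:
    """Escape raw control chars and invalid backslashes, inside strings only."""
    out = []
    in_string = False
    i, n = 0, len(raw)
    while i < n:
        ch = raw[i]
        if not in_string:
            # Outside strings everything is copied verbatim, so structural
            # newlines stay as ordinary JSON whitespace.
            if ch == '"':
                in_string = True
            out.append(ch)
            i += 1
        elif ch == '"':
            in_string = False
            out.append(ch)
            i += 1
        elif ch == "\\":
            nxt = raw[i + 1] if i + 1 < n else ""
            if nxt in VALID_ESCAPES:
                out.append(ch + nxt)  # valid escape pair: keep as is
                i += 2
            else:
                out.append("\\\\")  # invalid escape: double the backslash
                i += 1
        elif ch in CONTROL:
            out.append(CONTROL[ch])
            i += 1
        else:
            out.append(ch)
            i += 1
    return "".join(out)


if __name__ == "__main__":
    broken = '{"a": "x\ny \\circ z",\n "b": 1}'
    print(json.loads(repair_json_strings(broken)))
```

One limitation of this sketch: macros such as \text or \times begin with a letter that is also a valid JSON escape, so under the "leave valid escapes as is" rule they decode as a tab followed by the remaining letters rather than as literal backslash text.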


524 Commits