hyungi_document_server

Author	SHA1	Message	Date
Hyungi Ahn	aca2f0d62c	feat(canonical): restore GPU STT owner and extend KGS watch paths D9 Track B revised (2026-05-08): 1) STT ownership formally returns to the GPU server: - docker-compose.yml: removed profiles:[legacy] from stt-service → always active - fastapi STT_ENDPOINT = http://stt-service:3300 (compose-internal DNS) - Policy: the Mac mini is reserved primarily for Gemma 26B, so STT/Whisper belongs to the GPU server regardless of call volume. The earlier "migrated to Mac mini" comment was based on a misread trace. 2) Additional scan paths for external study material such as KGS Code: - ADDITIONAL_WATCH_TARGETS env (comma-separated, paths relative to PKM) - app/core/config.py: added the additional_watch_targets list setting - app/workers/file_watcher.py: handles the additional watch paths - app/workers/classify_worker.py: KGS Code classification branch (gas engineer study material) - all handled as expected_category=library (md/pdf/docx only) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-10 05:47:20 +00:00
Hyungi Ahn	8ca27eb573	fix(markdown): img auth via ?token= query param (Authorization header unsupported) `<img src=>` cannot send an Authorization header, so /api/documents/{id}/images/{key}/raw returned 401 → images did not show. The access token is now passed as a query parameter, the same as the existing /file?token= iframe pattern. backend: removed the get_current_user dependency and validates the token query parameter directly (same flow as the existing /file endpoint). frontend: MarkdownDoc's swap selector appends ?token={getAccessToken()} to img.src. When logged out, the placeholder is kept. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-10 14:47:09 +09:00
Hyungi Ahn	68fa86ea52	feat(markdown): persist extracted images with auth routes Markdown Canonical Phase 1B.5: images that marker extracts are persisted to the NAS and wired through DB metadata, an authenticated route, and the frontend swap. Key changes: - marker-service /convert response includes a list of base64 images (stays stateless, no NAS write permission) - marker_worker persists to NAS `/documents/extracted_images/{doc_id}/` + UPSERT + DELETE of orphan rows + normalizes md_content refs to the stable `docimg:img_NNN` scheme - authenticated route /api/documents/{id}/images/{key}/raw (Cache-Control private + ETag = content_hash) - frontend MarkdownDoc swaps docimg refs inside placeholder cards for real <img> elements Principles: - image binary = NAS, metadata = Postgres (same pattern as the study section) - image_key is sequence-based and deterministic → reconversion is idempotent - rollback via the MARKDOWN_IMAGE_PERSIST=false env (the placeholder card fallback remains naturally) The existing 28 marker-success documents are not touched in this PR: after deploy + 1 new upload + 5 samples are verified, run a targeted reprocess with scripts/marker_reprocess_existing_success.py. plan: ~/.claude/plans/piped-humming-crystal.md Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-10 14:05:41 +09:00
Hyungi Ahn	5b62c59f8a	fix(canonical): marker_worker classifies transport-layer errors as transient retries Before: only (ConnectError, TimeoutException) were transient → raise → queue retry. Other transport errors such as ReadError / WriteError / RemoteProtocolError were caught by 'except Exception' and handled by _fail → final fail, ignoring max_attempts. In the Phase 1D pilot, two documents (5111/5115) hit final fail without retry on 'Server disconnected without sending a response' (RemoteProtocolError). Fix: except (ConnectError, TimeoutException) → except TransportError. TransportError is the common parent of Connect/Read/Write/RemoteProtocol/Timeout, so every transport-layer error becomes eligible for transient queue retry. The ReadTimeout on 5135 (queue exhausted) is unrelated to this fix: an 8.4MB PDF cannot finish within MARKER_TIMEOUT=300s, so all 3 retries timed out. Raising the timeout or splitting large PDFs needs a separate decision. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-03 15:29:47 +09:00
Hyungi Ahn	5185501bbd	feat(search): PR-RAG-Time-1 freshness decay (news/law_monitor) Applies a time-weight soft multiplier to news and law-alert retrieval results. It runs after the reranker, in the final score composition step, split out as an operational-policy stage. - news (source_channel='news'): half-life 90 days - law_monitor (source_channel='law_monitor'): half-life 365 days - Not applied: manual / drive_sync / inbox_route / memo / Manual / Reference / Academic_Paper / Checklist / KGS Code / Study / content_origin='ai_drafted' - formula: decay = exp(-ln(2) * age / HL); final = base * (0.7 + 0.3 * decay) - floor 0.7 (full demotion prohibited) - Guards: missing date / future date / unknown source are all no-ops - Temporary date source: documents.created_at (no published_date column yet; follow-up PR) debug metadata (?debug=true response + logs/search.log): base_score / age_days / decay_factor / freshness_adjusted_score / freshness_policy / freshness_date_source New: app/services/search/freshness_decay.py hook: app/services/search/search_pipeline.py:303 (right after apply_diversity, right before normalize) schema: app/api/search.py SearchResult.freshness_debug (Optional[dict]) tests: tests/test_freshness_decay.py 24 cases (policy dispatcher 9 + age/decay/score 11 + apply integration 6, guards 1~6 all) The Episode/Fact layer and contradiction detection are out of scope for this PR. plan: ~/.claude/plans/pr-rag-time-1-freshness-decay.md	2026-05-03 08:38:09 +09:00
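The formula in commit 5185501bbd can be sketched as below. This is a minimal illustration only, not the contents of freshness_decay.py: the function name, the HALF_LIFE_DAYS table, and the argument shapes are assumptions; only the half-lives, the 0.7 floor, and the three no-op guards come from the commit message.

```python
import math
from datetime import datetime, timedelta, timezone

# Half-lives taken from the commit message; every other channel is a no-op.
HALF_LIFE_DAYS = {"news": 90.0, "law_monitor": 365.0}
FLOOR = 0.7  # a fully decayed document still keeps 70% of its base score


def freshness_adjusted_score(base_score, source_channel, doc_date, now=None):
    """Hypothetical helper: final = base * (0.7 + 0.3 * decay)."""
    half_life = HALF_LIFE_DAYS.get(source_channel)
    if half_life is None or doc_date is None:
        return base_score  # guards: unknown source / missing date
    now = now or datetime.now(timezone.utc)
    age_days = (now - doc_date).total_seconds() / 86400.0
    if age_days < 0:
        return base_score  # guard: future date
    decay = math.exp(-math.log(2) * age_days / half_life)
    return base_score * (FLOOR + (1.0 - FLOOR) * decay)


if __name__ == "__main__":
    now = datetime(2026, 5, 3, tzinfo=timezone.utc)
    # A news item exactly one half-life old: decay is 0.5.
    print(freshness_adjusted_score(1.0, "news", now - timedelta(days=90), now))
```

Because the multiplier is bounded to [0.7, 1.0], an old document can be outranked by a fresher one of similar relevance but is never dropped from the result set on age alone.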
Hyungi Ahn	7d0fca267d	feat(marker): auto-skip handwritten documents, reflecting Phase 1D pilot results In the 1D pilot (2026-05-02 overnight sweep, 25 controlled_backfill results), 3 handwritten PDFs (4798 / 4813 / 4815) were converted with status='success', but the user's quality review judged that no useful material could be extracted. The root cause is not insufficient Marker configuration but the input itself (Apple Pencil handwriting + the user's own handwriting = beyond the limits of OCR/layout models). Marker tuning cannot solve this, so these documents are skipped automatically at the enqueue stage. Guard logic: right after the doc_type SKIP in marker_worker.process() (step 1.5), when title/path matches one of 4 conservative keywords (필기, 손글씨, handwritten, handwriting), _set_skipped() is called. md_content/md_content_hash are cleared to NULL, md_extraction_error='skipped: handwritten note (title/path heuristic)', content_origin='extracted'. Keyword selection (conservative): Included: 필기 / 손글씨 / handwritten / handwriting Excluded (false-positive risk): - 노트 (would also catch normal non-handwritten documents such as 노트북 매뉴얼 / release notes / Note_240528_워크숍) - scan / 스캔 (some scanned PDFs convert fine; in the 1D results, doc 5127 표준기계설계(KS)_08_핀 had density 1.59 / scan_likely yet succeeded) logger: markdown_skip_handwritten_hint id=<id> keyword=<matched> title=<...> All 15 regex unit-test cases pass (real production fastapi venv): Match: Note_240805_용접교육 필기 / Note_240827_필기 / 손글씨 모음 / Handwritten Notes 2024 / handwriting practice / path/필기/* / path/handwritten_collection/* (8) Non-match: 다이아프람워크숍 / 노트북 매뉴얼 / Release notes v2 / PIPE FABRICATORS / 표준기계설계 / scan documentation / 스캔 문서 (7) This guard applies at enqueue time. The md_content of the 4 documents already in success is preserved (it can be shown if the user wants to see it). Cleanup, if needed, is a separate item. Follow-ups (separate PRs): - A2 (formal doc_type='필기노트' label): label definition deferred because the 3 samples from 1D are too few. Revisit once more handwritten PDFs accumulate. - C (Phase 2 full backfill plan): a separate round after this PR merges. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-03 08:11:42 +09:00
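Hyungi Ahn	6b52d57bac	feat(study): Phase 4-A explanation_md length cap + stronger prompt In production data, explanations marked ready ran 793/838/866 chars, large compared with the recommended 200~400. After the first production run, the prompt is tightened and a cap is applied before save, to keep the results screen readable and control token usage. Prompt (study_explanation_envelope.txt): - states explanation_md recommended 300~600 chars, maximum 900 - core concept + basis for the correct answer + only the 1~2 most confusing wrong options; do not explain every wrong option - minimize line breaks inside explanation_md (combined with the parse_json fix, this reduces invalid escapes) - avoid LaTeX math: prefer plain text ('0°C', 'C') over macros such as \\circ/\\text/\\, where possible - stresses that the output is a single raw JSON object only: no code fences/thinking/meta Worker (study_explanation_worker.py): - new helper _cap_explanation_md(text, max_chars=1200) · passthrough at or under 1200 chars · when over, searches the last 200 chars for a \\n\\n / \\n / '. ' / '다.' / '요.' boundary · cuts at the boundary + '…' (avoids cutting mid-word) · if no boundary is found, plain cut + '…' - the cap is applied before save. ai_explanation_status='ready' is kept (being capped does not mean failed) - payload carries operational-analysis metadata: explanation_len_original / _saved / capped flag Verification: - tests/test_explanation_cap.py (6 cases) · short passthrough / exact at limit / paragraph boundary / sentence boundary · no boundary fallback / empty input - scripts/phase4_health.sql sections 8/9 added · ai_explanation length p50/p95/max (study_questions.ready) · cap trigger frequency (explanation_capped/_original/_saved in job.payload) cap 1200 leaves more room than 800 (4-B summary_md): engineer-exam explanations that bundle formula + wrong options + concepts are tight at 800. Consider adjusting to 800~1000 after running in production. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-02 08:33:18 +09:00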
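The boundary-aware cap described in commit 6b52d57bac could look roughly like this. It is a sketch under assumptions: the separator priority order, the exact cut position relative to the separator, and the function name (without the leading underscore) are illustrative choices, not confirmed details of study_explanation_worker.py.

```python
# Boundary candidates named in the commit message; the priority order is an assumption.
BOUNDARIES = ("\n\n", "\n", ". ", "다.", "요.")


def cap_explanation_md(text, max_chars=1200, window=200):
    """Hypothetical cap: pass short text through, else cut at a boundary and add an ellipsis."""
    if len(text) <= max_chars:
        return text
    head = text[:max_chars]
    search_from = max(0, max_chars - window)  # only look inside the last `window` chars
    for sep in BOUNDARIES:
        idx = head.rfind(sep, search_from)
        if idx != -1:
            # Keep sentence-ending punctuation, drop trailing newlines/spaces.
            keep = idx if sep.startswith("\n") else idx + len(sep)
            return head[:keep].rstrip() + "…"
    return head + "…"  # no boundary found: plain cut


if __name__ == "__main__":
    long_text = "a" * 1100 + ". " + "b" * 300
    print(len(cap_explanation_md(long_text)))
```

Searching only the trailing window keeps the saved text close to the cap while still avoiding a cut in the middle of a sentence.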
Hyungi Ahn	b3dbf1a11e	fix(ai): parse_json_response: stateful walker that fixes only inside string literals The previous fallback's indiscriminate newline replace also escaped raw newlines outside strings (object structure), so the JSON was rejected. In addition, invalid backslashes from LaTeX math (\circ, \text, \, etc) are a separate issue from newlines and need their own fix. state machine: toggles in_string (on encountering `\"`). Only inside string literals: - raw LF/CR/TAB → converted to \\n/\\r/\\t - a backslash followed by a valid escape char (\"\\/bfnrtu) is left as-is - a backslash followed by an invalid char (\\c, \\,) has the backslash itself escaped as \\\\ - raw newlines outside strings are JSON whitespace and are preserved The 940-char raw text of production row id=243 contains many LaTeX macros (\\circ \\text \\, \\approx \\times etc.) plus markdown line breaks → the new walker fixes both cases. Other workers (classify/triage/ study_explanation/evidence/study_session_analysis) benefit automatically. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-02 08:00:20 +09:00
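A stateful walker of the kind commit b3dbf1a11e describes can be sketched as follows. The function name and structure are assumptions for illustration; the real parse_json_response in app/ai/client.py may differ. Note one limitation of this sketch: `\text` and `\times` start with `\t`, which is a valid JSON escape, so they are left untouched and decode as a TAB followed by the rest of the word.

```python
import json

VALID_ESCAPE_CHARS = set('"\\/bfnrtu')                    # chars allowed after a backslash in JSON
CONTROL_ESCAPES = {"\n": "\\n", "\r": "\\r", "\t": "\\t"}  # raw control char -> JSON escape


def repair_json_strings(raw):
    """Hypothetical repair pass: rewrite only what sits inside string literals."""
    out = []
    in_string = False
    i = 0
    while i < len(raw):
        ch = raw[i]
        if not in_string:
            if ch == '"':
                in_string = True
            out.append(ch)  # structure and whitespace outside strings stay as-is
            i += 1
        elif ch == '"':
            in_string = False
            out.append(ch)
            i += 1
        elif ch == "\\":
            nxt = raw[i + 1] if i + 1 < len(raw) else ""
            if nxt and nxt in VALID_ESCAPE_CHARS:
                out.append(ch + nxt)  # valid escape pair: copy both chars
                i += 2
            else:
                out.append("\\\\")    # invalid escape: escape the backslash itself
                i += 1
        elif ch in CONTROL_ESCAPES:
            out.append(CONTROL_ESCAPES[ch])
            i += 1
        else:
            out.append(ch)
            i += 1
    return "".join(out)


if __name__ == "__main__":
    broken = '{\n"md": "line1\nline2 \\circ"\n}'
    print(json.loads(repair_json_strings(broken)))
```

Tracking `in_string` is what separates this from a global replace: the newline after `{` is legal JSON whitespace and is preserved, while the newline inside the value is escaped.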
Hyungi Ahn	95b127fd8d	fix(ai): parse_json_response: raw newline escape fallback (stage 5) Analysis of the raw previews for the 33 study_question_jobs.parse_fail rows from Phase 4-A debugging: - the model puts raw newlines (LF) directly inside explanation_md ('### [풀이]\n\n**자료...') - the JSON standard forbids raw control chars inside string literals → json.loads rejects it - the stage-4 fallback (greedy slice) fails for the same reason Added a stage-5 fallback: replace \r\n/\n/\r in the candidate with ``\\n``/``\\r`` escapes and retry. An already-escaped ``\\n`` (Python str = two characters, backslash + n) is not a raw newline and is unaffected. The other workers (classify/triage/study_explanation/evidence/study_session_analysis) all share the same parser and benefit automatically. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-02 07:56:01 +09:00
Hyungi Ahn	ff41feb3e3	fix(study): Phase 4-A parse_fail debugging: parser fallback + raw storage In production data, 33/114 of the 4-A study_question_jobs ended with 'envelope JSON parse failed'. Many are presumably cases that the balanced regex in parse_json_response cannot catch. To classify the causes: 1. Parser reinforcement (app/ai/client.py) - keeps the existing 4-stage parsing (fenced / balanced finditer / full cleaned) - adds a stage-5 fallback: greedy slice from the first '{' to the last '}' → json.loads - defends against cases where balanced matching fails because of inner quotes/newlines/escapes in the envelope JSON. Extracts just the body even if the model mixes free text before or after the JSON. - additive only, low regression risk (returns immediately if an earlier stage succeeds) 2. Save a raw preview on parse_fail (study_explanation_worker) - all 3 inline parse_fail branches (not_dict / invalid_answer_choice / empty_explanation_md) call the _save_raw_preview() helper - job.payload.debug_raw_preview = raw_text[:1000] - job.payload.parse_fail_reason = classification key - later analysis of parse_fail row payloads can classify causes precisely Next step: after deploy, track recurrence + analyze raw previews → further prompt tightening or further parser reinforcement. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-02 07:48:10 +09:00
Hyungi Ahn	8074be6b6d	feat(study): Phase 4-D operational monitoring + confidence calibration The first Phase 4-B v1 verification found cases where the model set confidence='high' even for topics with insufficient material. That is more overconfident than the definition (high = the pattern is clear from material + other ai_explanation) and is a UX trust risk. Adds an automatic cap correction + operational monitoring SQL. confidence calibration (services/study/session_summary_guard): - new calibrate_confidence(c, ctx_docs_count, ready_explanation_count) · ctx_docs_count == 0 AND ready_explanation_count == 0 → cap at 'low' · ctx_docs_count == 0 (only ready explanations exist) → cap at 'medium' · ctx_docs_count >= 1 → model value as-is - when the model gives a more conservative value than the cap (model 'low' + cap 'medium'), it is preserved: a more conservative value is never raised worker integration (study_session_analysis_worker): - ctx_docs_count = len(ctx_docs) - ready_explanation_count = sum(1 for a in prompt_attempts if a.get('ai_explanation')) - calls calibrate_confidence → written to study_quiz_session_analysis.confidence - job.payload preserves operational-analysis metadata: · ctx_docs_count / ready_explanation_count · model_confidence_raw (model response) vs calibrated_confidence (after cap) · prompt_attempts / valid_attempts_total / summary_len → SQL query 4 measures how often the cap triggers scripts/phase4_health.sql (new operational check SQL, 7 sections): 1. 4-A study_question_jobs status × error_code distribution 2. 4-B study_quiz_session_jobs status × error_code distribution 3. 4-B confidence distribution (calibrated) 4. 4-B model_confidence_raw vs calibrated difference (cap trigger frequency) 5. 4-A/4-B processing latency over the last 7 days p50/p95/max/avg 6. 4-A/4-B skipped reason distribution 7. 4-B guard_fail / parse_fail / llm_timeout ratios ship gate (unit tests): - test_calibrate_confidence_no_evidence_caps_to_low (3 cases) - test_calibrate_confidence_only_explanations_caps_to_medium (3 cases) - test_calibrate_confidence_with_documents_passthrough (3 cases) - test_calibrate_confidence_normalizes_invalid_first (2 cases) Plan: ~/.claude/plans/nifty-sparking-spindle.md (Phase 4-B v1 follow-up) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-02 07:33:57 +09:00
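The capping rules in commit 8074be6b6d amount to a small pure function. The sketch below follows the three rules as stated; the ORDER table and the choice to normalize an unrecognized value to 'low' are assumptions, since the commit only says invalid values are normalized first.

```python
# Ordering used to compare confidence levels (assumed representation).
ORDER = {"low": 0, "medium": 1, "high": 2}


def calibrate_confidence(c, ctx_docs_count, ready_explanation_count):
    """Cap the model's confidence by available evidence; never raise it."""
    if c not in ORDER:
        c = "low"  # assumption: invalid values normalize to the most conservative level
    if ctx_docs_count >= 1:
        return c  # documents present: keep the model value
    cap = "medium" if ready_explanation_count > 0 else "low"
    return c if ORDER[c] <= ORDER[cap] else cap


if __name__ == "__main__":
    print(calibrate_confidence("high", 0, 0))
    print(calibrate_confidence("high", 0, 2))
    print(calibrate_confidence("low", 0, 2))
```

Because the function only ever returns the lower of the model value and the cap, a model that is already conservative is left alone.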
Hyungi Ahn	1186537ecf	fix(study): Phase 4-B v1 worker: explicitly clear error_code when setting completed When an earlier attempt recorded llm_timeout/parse_fail and a later attempt completed normally, error_code lingered and confused operational analysis. error_code = None / error_message = None are now explicitly reset at the point status='completed' is set. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-02 07:28:43 +09:00
Hyungi Ahn	ea6b2cf351	fix(study): Phase 4-B v1 prompt cap: guard against LLM timeout on large sessions In session 1 (84 wrong+unsure items), the prompt exceeded 23K chars and hit the 30-second timeout. Added a MAX_ATTEMPTS_IN_PROMPT=30 cap in line with the plan's assumption (5~30 items). The most recent attempts take priority (the tail of the answered_at asc ordering). The existing valid_attempts count check (skip under 5) is unchanged: the cap applies only to prompt input, while validation uses the full set. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-02 07:25:14 +09:00
Hyungi Ahn	6785d53d3d	feat(study): Phase 4-B v1 session-level overall analysis (free-form markdown) Phase 4-A caches explanations one wrong/unsure question at a time. 4-B bundles a session's 5~30 wrong/unsure items and generates one 200~400-char natural-language summary, shown as the header card on the results screen. The queue infrastructure is separate from 4-A's study_question_jobs: single-meaning FK + clearer operational SQL + differing guard/payload/retry policies between 4-A and 4-B. New study_quiz_session_jobs (queue) + study_quiz_session_analysis (result cache, PK=session_id, UPSERT) + dedicated consumer. Backend: - migrations/233: study_quiz_session_jobs (FK study_quiz_sessions NOT NULL, status pending/processing/completed/failed/skipped, max_attempts=2) - migrations/234: partial unique idx (session_id) WHERE pending/processing - migrations/235: study_quiz_session_analysis (session_id PK, summary_md, confidence, model_name, generated_at, is_stale) - models/study_quiz_session_job: ORM + enqueue_session_analysis_job() (idempotent) - models/study_quiz_session_analysis: ORM (PK = session_id) - services/study/session_summary_guard: GUARD_PATTERN (regex) + normalize_confidence() as a single source, imported and shared by worker + tests - services/study/session_summary_rag: gather_session_summary_context(), documents only (reuses PR-3 _gather_document_evidence). The call is allowed even without evidence (a different policy from 4-A: the session record itself is the evidence) - services/study/session_analysis_enqueue: auto (finalize/fallback) + request_session_analysis_regenerate (manual). manual blocks immediately when wrong/unsure < 5, blocks on an active job, and sets is_stale=true if an analysis already exists - prompts/study_session_summary_envelope.txt: envelope JSON {summary_md, confidence}. Only quantitative integers may be cited; ratios/trends/ranges/dates are forbidden - workers/study_session_analysis_worker: terminal status branches: · wrong/unsure < 5 → status=skipped, error_code=insufficient_attempts · insufficient question_text/outcome → skipped, evidence_missing · GUARD_PATTERN match → failed, guard_fail · 800-char hard cap + confidence normalize · timeout/parse/unknown → retry candidates · UPSERT study_quiz_session_analysis ON CONFLICT DO UPDATE (PK session_id) - workers/study_session_queue_consumer: copies the 4-A consumer pattern. BATCH_SIZE=1 + STALE_MINUTES=10. MLX gate shared with 4-A (Semaphore(1)) - main.py: APScheduler add_job(consume_study_session_queue, ..., 1-minute interval) - session_finalize: calls enqueue_session_analysis_auto at the end (best-effort) - api/study_topics: · QuizSessionAnalysisOut + ai_session_analysis response field (analysis row + latest job status/error_code) · GET fallback enqueue (only when there is no existing analysis or active job, non-blocking) · POST /quiz-sessions/{sid}/regenerate-summary: manual trigger Frontend (quiz-sessions/[sid]/+page.svelte): - session summary card in the results header (right after the AI-explanation indicator, right before "Do now") - when summary_md is set, render it as markdown; otherwise branch on job_status / error_code: · pending/processing → "AI is analyzing the session" · insufficient_attempts → "fewer than 5 wrong/unsure answers" · evidence_missing → "insufficient material" · guard_fail → "blocked by hallucination check" + regenerate link - confidence='low' badge + is_stale "regenerating" badge - regenerate button + regenerateSummary(): toast branches by reason ship gate: - tests/test_session_summary_guard_pattern.py: 5 allowed + 7 blocked cases + normalize_confidence standard/non-standard checks. Passes when run directly with python3. Plan: ~/.claude/plans/nifty-sparking-spindle.md (Phase 4-B v1) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-02 07:20:29 +09:00
Hyungi Ahn	c7630b9815	feat(study): Phase 4-A results screen inline indicator: expose AI explanation progress On the results screen, cache hit/miss is opaque when the user presses [View AI explanation]. Adds a one-line indicator in the header: of the N wrong/unsure targets, the count marked ready + the in-progress/failed/insufficient-material distribution. Backend (study_topics.py get_quiz_session): - adds questions[i].ai_explanation_status to the response (q.ai_explanation_status as-is) · the frontend combines it with attempts.outcome (wrong/unsure) to count Frontend (quiz-sessions/[sid]/+page.svelte): - $derived aiExplProgress: counts combining wrong/unsure attempts with question.ai_explanation_status (target / ready / pending / failed / skipped) - header shows a Sparkles icon + the line "AI explanations auto-generated: N/M (P%)" · pending > 0: "generating N" (warning color) · failed > 0: "failed N" (error color) · skipped > 0: "insufficient material N" (dim) · all three 0 but ready < target: "waiting in queue" (notes the worker's 1-minute cycle) This indicator works together with the GET fallback enqueue: on entering the results screen, wrong/unsure items missed by backfill are already enqueued and become ready on the 1-minute cycle. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-01 12:15:35 +09:00
Hyungi Ahn	3db5d331de	feat(study): Phase 4-A operational visibility: AI explanation card on the stats dashboard Phase 4-A caches wrong/unsure explanations in a background batch, but users and operators could not see (1) how much of the cache has been filled so far, or (2) the distribution of worker outcomes such as hallucination block / parse failure / no material. Adds a card to the stats dashboard. Backend (study_question_progress.py /stats): - new StatsAiExplanation response section · status_distribution: distribution of study_questions.ai_explanation_status across the topic (6 keys none/ready/failed/skipped/stale/pending, default 0) · target_total / target_ready: ready ratio among wrong/unsure progress rows (the key metric for estimating cache-hit likelihood) · recent_jobs: (status, error_code) distribution of study_question_jobs over the last 7 days (composite keys such as 'completed', 'failed:guard_fail', 'failed:parse_fail', 'skipped:evidence_missing') Frontend (/study/topics/[id]/stats): - new Card "AI explanation cache" with a Sparkles icon · big number + progress bar: ready / wrong+unsure · topic-wide status distribution inline (Korean labels) · grid of worker outcomes for the last 7 days (hallucination block / parse failure / no-material skip etc. shown separately) - statusLabel / jobLabel helpers: operator-friendly Korean Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-01 11:59:20 +09:00
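Hyungi Ahn	43097e6fd9	fix(study): Phase 4-A envelope prompt: force answer_choice to the user's correct answer Verification showed the model re-judging the answer number inside the envelope from the material, so it hit guard_fail (answer_choice != correct_choice) almost every time. The hallucination guard worked exactly as intended, but caching efficiency was 0%. PR-3's free-form explanation used "user's answer first, state conflicts explicitly" and so reached ready normally, but envelope.txt conflicted with it by saying "material evidence first". The essence of the hallucination guard (blocking cases where the model breaks the envelope format and arbitrarily puts in a different number) is kept, while the prompt now states that answer_choice must carry the user's correct answer (correct_choice) unchanged. If the material and the user's answer differ, the model only notes this briefly inside explanation_md and preserves answer_choice. The prompt stresses that changing the answer itself is exactly what the hallucination guard blocks. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-01 11:47:44 +09:00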
Hyungi Ahn	e8da53490c	feat(study): Phase 4-A wrong/unsure AI explanation prefetch batch PR-3's real-time [View AI explanation] call on the results screen made users wait 8~30 seconds on click. Explanations are now pre-generated in a background batch right after solving, so the click is a cache hit. The hallucination guard is stronger than PR-3's: envelope JSON {answer_choice, explanation_md, confidence} + answer_choice == correct_choice check + mandatory evidence. processing_queue has an FK to documents.id and cannot be reused directly for study_questions → separate study_question_jobs table + separate consumer. Backend: - migrations/231: study_question_jobs CREATE TABLE (13 columns, kind recommended value 'explanation' / 'session_summary' reserved, status pending/processing/completed/ failed/skipped, max_attempts=2) - migrations/232: partial unique idx (qid, kind) WHERE status IN (pending, processing): blocks duplicate active rows, allows terminal history to accumulate - models/study_question_job: ORM + enqueue_study_question_job() helper (idempotent via on_conflict_do_nothing) - prompts/study_explanation_envelope.txt: envelope-format prompt (answer_choice forced to 1~4, confidence high/medium/low) - workers/study_explanation_worker: terminal status branches: · both evidence lists empty → job and question both skipped (no LLM call) · answer_choice != correct_choice → guard_fail / failed (no retry) · timeout/parse → retry candidates (max_attempts=2) · catch-all except → explicit unknown + retryable branch · if question.ai_explanation_status='ready' is already set, completed immediately · confidence is preserved in job.payload (operational analysis) - workers/study_queue_consumer: APScheduler 1-minute interval, BATCH_SIZE=1, shares MLX gate Semaphore(1). STALE_MINUTES=10 self-recovery - main.py: scheduler.add_job(consume_study_queue, ..., id='study_queue_consumer') - services/study/explanation_enqueue: helper shared by finalize + GET fallback: filter_needs_explanation (filters on study_questions status + latest job error_code; questions whose last job is guard_fail/evidence_missing are excluded from automatic re-enqueue) + enqueue_explanation_for_qids (max_count cap) - session_finalize: prefetch-enqueues wrong/unsure qids at the end (best-effort; a failure does not break finalize itself) - api/study_topics get_quiz_session: backfill enqueue on done sessions (max=30, non-blocking, debug log) Target condition: ai_explanation_status IN ('none', 'failed') OR ai_explanation IS NULL. stale / skipped / pending / ready are not auto-enqueue targets. stale regeneration happens via PR-3's explicit [Regenerate] or in a later Phase. Plan: ~/.claude/plans/nifty-sparking-spindle.md (Phase 4-A) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-01 11:42:08 +09:00
Hyungi Ahn	711b81f8f0	feat(study): Phase 2-F clearing stuck due_at: overdue redistribute When the user stays away for several days, due_today piles up and pressures the study pace. Handles a risk item from the Phase 1 plan. Uses an explicit user action instead of an automatic batch, to keep the user in control. Backend: - POST /study-topics/{tid}/review-queue/redistribute: distributes overdue items round-robin. days_offset = i % spread_days + 1 (today + 1~7 days). Within the same day, times are also spread by i*7 minutes. review_stage is preserved (reschedule only, no stage reset). body { spread_days: 1~14, default 7 }. response { redistributed_count, spread_days }. - GET /review-queue?tab=due_today response gains an optional overdue_count: int field, which the UI uses to decide whether to show the warning + [Clear]. Counts due_at < today 00:00 (UTC) + stage<4. Frontend (review-queue): - on the due_today tab, when overdue_count>0, a yellow banner: "N stuck" + [Clear] button. - clicking Clear → confirm → POST → toast (N items spread over 7 days) → reload counts/list. - no banner on other tabs (backend returns overdue_count=0). Plan: ~/.claude/plans/crispy-petting-dijkstra.md (Phase 2-F) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-01 10:48:00 +09:00
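The round-robin rescheduling in commit 711b81f8f0 can be illustrated as below. This is a sketch only: the function name and the choice of "today 00:00" as the base are assumptions, and the i*7-minute offset is applied literally as written in the commit, which may differ from how the endpoint bounds it for large queues.

```python
from datetime import datetime, timedelta


def redistribute_overdue(overdue_qids, today_start, spread_days=7):
    """Hypothetical planner: map each overdue question id to a new due_at."""
    if not 1 <= spread_days <= 14:
        raise ValueError("spread_days must be between 1 and 14")
    plan = {}
    for i, qid in enumerate(overdue_qids):
        days_offset = i % spread_days + 1  # tomorrow .. today + spread_days
        # i * 7 minutes staggers items that land on the same day.
        plan[qid] = today_start + timedelta(days=days_offset, minutes=i * 7)
    return plan


if __name__ == "__main__":
    base = datetime(2026, 5, 1)
    for qid, due in redistribute_overdue(range(100, 109), base).items():
        print(qid, due.isoformat())
```

Only due_at changes in this plan; review_stage is left alone, matching the commit's "reschedule only, no stage reset" rule.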
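Hyungi Ahn	f42f6ff480	feat(study): Phase 2-E review box multi-select → review session Per-card checkboxes in the review box + a sticky bottom bar let the user pick N items for one quiz_session. Adds a question_ids parameter to the backend QuizSessionStartRequest; priority is stage > question_ids > existing subject path. When specified, selection is bypassed + validated (belongs to user × topic + not deleted + max 200 + deduplicated preserving order). Backend: - question_ids: list[int] | None, Field limit 200 - valid_set validation: qids from another user/topic or with deleted_at are silently dropped - subject_distribution computed automatically (for the results card) - empty wanted / invalid qid → 400 Frontend (review-queue page): - checkbox on the left of each card (separate area; clicking the body still opens the question page as before) - "select / deselect all on this page" toggle - sticky bottom bar when N>0 are selected, with a `Start solving {N}` button - selection resets on tab change (the next bundle likely has a different intent) - selection persists across page navigation (Set<question_id>) - if an in_progress session exists, confirm then abandon - toast warning on reaching the 200 limit Plan: ~/.claude/plans/crispy-petting-dijkstra.md (Phase 2-E) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-01 10:39:46 +09:00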
Hyungi Ahn	d39882c38e	feat(study): Phase 2-D study stats dashboard: 6 sections New route /study/topics/[id]/stats: a single backend endpoint call feeds 6 sections: progress / study-state distribution / review queue / session trend / daily solve volume / per-subject weaknesses. Charts are rendered directly as SVG (0 dependencies). Backend (app/api/study_question_progress.py): - GET /study-topics/{tid}/stats: bundle of 6~7 queries · question progress (study_questions count + progress count) · pattern_state distribution (NULL → unattempted, plus the topic's unattempted questions added in) · review_stage distribution (0/1/2/3/mastered≥4) · due classification (today / this_week / later / mastered): datetime comparison + filter · recent done-session trend (uses the 4 Phase 2-B columns, limit 20) · daily solve volume over 30 days (cast Date + group) · per-subject weaknesses (attempted/correct/pending_review/chronic per subject) Frontend (/study/topics/[id]/stats): - grid of 6 Cards. Progress bar + stacked horizontal bar + SVG sparkline + bar chart. - Pattern distribution: 7-color stacked bar + legend grid. - Review queue: 4 count boxes + stage distribution inline. - Session trend: SVG sparkline (50% baseline) + table of the last 5 sessions (recovered/regressed/newly correct inline). - Daily solve volume: SVG bar (dynamic max) + title tooltip + start/end date labels. - Per subject: accuracy progress bar + unreviewed/repeated-wrong inline. Entry point: [Stats] button in the topic page header. Plan: ~/.claude/plans/crispy-petting-dijkstra.md (Phase 2-D) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-01 10:04:03 +09:00
Hyungi Ahn	d3bf963a66	feat(study): Phase 2-B results-screen change counts + review-complete progress integration The SessionSummary computed by Phase 1 finalize was discarded without being included in the response; it is now persisted as 4 columns on the quiz_session row. The results screen header shows cumulative change counts (recovered / regressed / newly correct / repeated wrong) + a "Do now" callout (dynamic counts based on current progress: pending_review/chronic/regressed). Dynamic counts are computed only on the results GET call (avoiding the cost on the list endpoint). Review-complete integration: the results card's [Study complete] button used to set only attempts.reviewed_at; it now also sets progress.last_reviewed_at + (for wrong/unsure, the initial due_at). Toggling reviewed=false reverts only attempts (progress.last_reviewed_at is preserved because another attempt may have marked the review). - migrations/230: ADD 4 columns to quiz_sessions (single ALTER TABLE) - StudyQuizSession model + finalize_session persists the row - QuizSessionSummary response gets 4 snapshot + 3 dynamic fields (default 0) - _build_session_summary runs 3 SQL queries when include_progress_counts=True - review-mark syncs progress when reviewed=true - results screen: header change-count line + "Do now" callout (only when values exist) Plan: ~/.claude/plans/crispy-petting-dijkstra.md (Phase 2-B) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-01 09:49:01 +09:00
Hyungi Ahn	d038f11444	feat(canonical): Phase 1C MarkdownDoc renderer + heading anchor + KaTeX The document detail page renders canonical markdown (md_content) first and falls back to extracted_text when it is absent. If md_frontmatter exists, a meta box is shown above the body. h1~h6 get GFM heading ids + a # link on hover. Images with alt text become figure + figcaption. KaTeX math ($...$ / $$...$$) is supported. Backend: - new DocumentDetailResponse (DocumentResponse + extracted_text + md_*) - GET /documents/{doc_id} response model switched - list responses stay DocumentResponse (avoids payload bloat) Frontend: - lib/utils/docMarkdown.ts: separate Marked instance (zero impact on study mathMarkdown.ts). marked-katex-extension + marked-gfm-heading-id + custom image renderer (figure/figcaption + data-md-img marker). - lib/components/MarkdownDoc.svelte: md_content/extracted_text priority, frontmatter box, notice badge for mdStatus=failed, heading-anchor DOM post-processing. - /documents/[id]: wired into 3 places (markdown / hwp-markdown / article viewer). - app.css: .markdown-doc heading-anchor / md-figure / katex horizontal scroll. Wiring the image ImgAuth post-processing (blob URL replacement) is left to Phase 1B.5. For now only the data-md-img="1" marker is set and the marker output src is kept as-is. Plan: ~/.claude/plans/plan-idempotent-sundae.md (Phase 1C) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-01 09:37:33 +09:00
Hyungi Ahn	9094b2dbc5	feat(study): Phase 1-E question selection algorithm: bucket + stage ratios The vision's single solve entry point: with stage (intro/learning/pre_exam) + size options, the same endpoint serves questions with different distributions. services/study/quiz_selection.py: - bucket: unattempted / wrong_or_unsure / due_review / regressed / frequent / random - per-stage ratios: - intro: unattempted 55, wrong_or_unsure 30, frequent 15 - learning: due_review 20, wrong_or_unsure 40, unattempted 30, frequent 10 - pre_exam: due_review 20, wrong_or_unsure 30, regressed 10, frequent 20, random 20 - bucket priority (dict order): each later bucket excludes already-picked qids - random backfill when candidates run short, ValueError if still short api/study_topics.py: - adds stage / size options to QuizSessionStartRequest - uses select_questions_for_quiz when stage is specified - keeps compatibility with the existing PR-12-B path (subject bucket + spacing) when stage is not specified Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-01 09:30:11 +09:00
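The bucket-and-ratio selection in commit 9094b2dbc5 can be sketched as follows. Only the stage ratios, the dict-order priority, the random backfill, and the ValueError come from the commit message; the function signature, the pre-computed `buckets` mapping, and the rounding of percentages into quotas are assumptions for illustration.

```python
import random

# Percentages per stage, in priority (dict) order, as listed in the commit message.
STAGE_RATIOS = {
    "intro": {"unattempted": 55, "wrong_or_unsure": 30, "frequent": 15},
    "learning": {"due_review": 20, "wrong_or_unsure": 40, "unattempted": 30, "frequent": 10},
    "pre_exam": {"due_review": 20, "wrong_or_unsure": 30, "regressed": 10,
                 "frequent": 20, "random": 20},
}


def select_questions(stage, size, buckets, all_qids, rng=None):
    """Hypothetical selector: fill per-bucket quotas, then backfill at random."""
    rng = rng or random.Random()
    picked, seen = [], set()
    for name, pct in STAGE_RATIOS[stage].items():
        quota = round(size * pct / 100)  # assumption: simple rounding
        for qid in buckets.get(name, []):
            if quota == 0 or len(picked) >= size:
                break
            if qid in seen:
                continue  # later buckets skip already-picked qids
            picked.append(qid)
            seen.add(qid)
            quota -= 1
    if len(picked) < size:  # random backfill from the remaining pool
        rest = [q for q in all_qids if q not in seen]
        rng.shuffle(rest)
        picked.extend(rest[: size - len(picked)])
    if len(picked) < size:
        raise ValueError("not enough questions to build the quiz")
    return picked


if __name__ == "__main__":
    buckets = {"unattempted": list(range(30)), "wrong_or_unsure": list(range(30, 40)),
               "frequent": list(range(40, 50))}
    print(select_questions("intro", 20, buckets, list(range(50)), random.Random(0)))
```

Keeping the ratios in an ordered mapping means the priority order and the distribution are edited in one place per stage.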
Hyungi Ahn	e5982ebde4	feat(study): Phase 1 study-loop data layer: progress cache + finalize + review API The data layer for the vision (solve → check → study → review → weighting for the next solve). Data model (migrations 222~225): - study_question_progress table: current-state cache per user × topic × question - last attempt: last_outcome, last_attempted_at, last_attempt_id - review state: last_reviewed_at - review queue: due_at, review_stage - pattern classification (derived): pattern_state, pattern_updated_at, pattern_window_attempts - 3 partial idx (due / topic_pattern / pending_review) for fast per-tab lookups Pattern classification (services/study/learning_pattern.py): - 7 classes: unattempted/unsure/chronic_wrong/regressed/recovered/stable/unstable - window = last 3 attempts + whether a past correct/wrong exists - priority chronic_wrong > regressed > recovered (conservative learning) - guard: a single wrong does not make regressed (a prior correct is required) - stable starts at 3 consecutive correct Session-end aggregation (services/study/session_finalize.py): - attempts are append-only with originals preserved; only progress is upserted - the finalize hook fires automatically right after the last attempt - finalize updates only last_* + pattern_state; questions not yet in the due_at queue keep NULL - for questions that already have due_at, finalize updates the stage (correct → +1 / wrong → reset) API (api/study_question_progress.py): - POST /study-topics/{tid}/questions/{qid}/review-complete → last_reviewed_at + (only for wrong/unsure) initial due_at - GET /study-topics/{tid}/review-queue?tab=due_today|pending_review|chronic|regressed|mastered → paginated lookup across 5 tabs → pending_review also includes last_reviewed_at < last_attempted_at (wrong again after an earlier review-complete) Phase 1-E (question selection algorithm) follows in a later commit. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-01 09:28:46 +09:00
Hyungi Ahn	daaf18bdae	fix(canonical): add markdown to process_stage ORM enum (Phase 1B follow-up) Migration 222 added markdown to the DB enum, but it was missing from the SQLAlchemy ORM-side enum definition (app/models/queue.py), which caused a LookupError. On test enqueue → consumer run: LookupError: 'markdown' is not among the defined enum values. The DB enum migration is handled by migration 222. The ORM side acts as the Python-side enum mirror that SQLAlchemy uses for serialization/deserialization. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-01 00:16:03 +00:00
Hyungi Ahn	e50869cbda	feat(canonical): Phase 1B marker-service + marker_worker for PDF→markdown (222) New container marker-service (port 3300, Marker 1.10.2 + surya 0.17.1 + HF cache volume). marker_worker consumes the markdown stage queue: classify_worker → enqueue 'markdown' (leaf, independent of embed/chunk) → skip SKIP_DOC_TYPES (발주서/세금계산서/명세표, i.e. purchase order / tax invoice / statement) → skip when extension != .pdf (Phase 1B = PDF only) → skip when page_count > 200 → marker-service POST /convert → 422/404 = doc-level failed, 5xx = queue retry Stability measures: - migration 222: ALTER TYPE process_stage ADD VALUE markdown (single statement) - md_extraction_quality stored directly as a JSONB dict - md_content/hash cleared to NULL on skip - /ready exposes Response.status_code + warmup_error - HF cache volume (0 build-time downloads) - file_path is a NAS-relative path → prepend the /documents prefix Success criterion: pipeline stability. Markdown quality is for the Phase 1D pilot. Pre-flight (2026-05-01): - marker-pdf 1.10.2 stable - 9503 file_path rows are NAS-relative paths - DOCUMENT_TYPES has 7 Korean types → SKIP aliases reinforced - confirmed queue retry max_attempts=3 + reset_stale_items - 220/221 on main were taken by study_q_related → rebumped to 222 Plan: ~/.claude/plans/plan-idempotent-sundae.md (Round 5 approved) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-01 00:06:23 +00:00
Hyungi Ahn	219e233a48	feat(study): related-types DB cache: remove per-request HNSW recomputation - migrations 220/221: adds related_repeat/similar JSONB + count/grade/computed_at/threshold_version + partial idx to study_questions - embedding worker: right after marking ready, computes and stores related in the same transaction + invalidates ready rows in the same topic with related_computed_at=NULL - new cron study_q_related_refresh (1 min, batch=20): bulk-recomputes stale caches - API list_related_types: on cache hit (computed_at + threshold version match), responds with a single SELECT. On miss, computes + stores immediately, then responds - update_question PATCH: sets related_computed_at=NULL when body/exam_round changes - soft delete: invalidates ready rows in the same topic On threshold change: update related_types.THRESHOLD_VERSION + run UPDATE WHERE version != '<new>' SET computed_at=NULL once, and the cron recomputes everything automatically. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-01 07:22:31 +09:00
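Hyungi Ahn	fc8aea1649	feat(study): repeat-question label grades + cosine threshold adjusted to 0.85 - grade mapping by round_count (단골/잘 나오는 반복 출제/반복 출제/신출/빈출) - ≥7 단골 (staple), 5–6 잘 나오는 반복 출제 (often repeated), 3–4 반복 출제 (repeated), 2 + max(year)≥2024 신출 (newly appearing), 2 + all older 빈출 (frequent) - SIMILAR_THRESHOLD 0.88 → 0.85 (reflects the natural gap found by measuring the 5-source distribution) - labels unified across the API response + 3 frontend places (view/unified view/results card) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-30 08:50:39 +09:00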
Hyungi Ahn	b20c4f933b	feat(study): exam_round filter + bulk import script, preparing for 1,000+ questions (P0) Problem: in topics with 1,000+ questions, the view page's prev/next regressed: the page_size=200 cap returned only questions outside the round, so prev/next within the same round went missing. Fix: - added an exam_round Query parameter to /study-topics/{tid}/questions (exact match). - added an exam_question_number field to the StudyQuestionSummary response. - with the exam_round filter, ordering = exam_question_number asc NULLS LAST, created_at asc. - the view page's loadRoundSiblings fetches only one round via ?exam_round=. - toast text "topic exceeds 200 questions" → "this round exceeds 200 questions" (matches the meaning). Also: bulk import script for gas engineer past exams: - scripts/import_gas_questions.py: md parser + dry-run + apply. · verifies that exam_question_number agrees across 3 sources (filename/title/meta). · subject normalization (parenthesized sub-classification moved to scope, unified to 5 subjects). · 4-case image determination + generates import_reports/{회차}_image_required.md. · aborts on first failure by default, --skip-existing/--continue-on-error options. · token pre-check (GET /study-topics/{tid}). - import_reports/: reports for 2019 rounds 1~3 + 2020 rounds 1~2. - Operations: 4 rounds, 360 questions auto-imported (4 images auto-attached). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-29 14:39:02 +09:00
Hyungi Ahn	13404cd366	feat(study): prevent same-type crowding: spacing at question-selection time (PR-12-B) Learning intent: spread questions out so that one type does not pile up within a single quiz session. The goal is to spread same-type questions out, not to eliminate them; do not frame this as dedup/removal. - migration 210: study_quiz_sessions.quiz_mode VARCHAR(30) DEFAULT 'random' - ORM: StudyQuizSession.quiz_mode field - service.related_types: add apply_type_spacing helper - SPACING_THRESHOLD=0.88 (independent of exam round; does not reuse the PR-12-A round filter) - PER_TYPE_CAP=2 (local neighbor cap; no guarantee for transitive clusters) - SPACING_BUFFER_RATIO=2.0 - 3-stage fallback: ready spacing → fill from pending → fall back to held questions that violate the cap - debug log type_spacing_applied subject=... ready=N selected=M ... - _select_questions_for_topic: spacing per subject bucket (protects the even split across subjects) - QuizMode Enum (random); frequent_focus/wrong_variants reserved for later - start_quiz_session accepts quiz_mode + passes apply_spacing - frontend startNewQuiz body sets quiz_mode='random' explicitly Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-29 08:45:15 +09:00
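Hyungi Ahn	cbe852bb37	feat(study): show repeated questions and similar types separately (PR-12-A) The learning value lies in repetition across exam rounds: the framing is showing patterns, not blocking/removing. - new service `related_types.py`: shares the threshold / round filter / round_count computation - REPEAT >= 0.95 / SIMILAR 0.88~0.95 - round conditions enforced in the backend (excludes self / same round / candidates with null exam_round) - round_count: related_count == 0 → 0 (the current round alone is not counted as 1) - GET /study-questions/{qid}/related-types: single-question classification (repeat_questions / similar_questions) - POST /study-topics/{tid}/related-types-bulk: batch counts for card badges - comparison target = the topic's entire ready pool (input qids are not compared with each other) - response keys preserved: (0,0,0,0) even for no permission / embedding not ready, etc. - view page: remove the PR-11 similar-questions toggle + automatically show two sections, 🔥 repeated questions / 🧩 similar types - header = round_count "N rounds", above the body = related_count "N related questions" - notices branch on source_status / source_exam_round - results page (wrong / not-sure cards): badge only when round_count >= 2 after the bulk call - unified view makes a lazy bulk call when a round is expanded, cached per round - existing /similar endpoint kept (for raw debugging) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-29 08:09:14 +09:00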
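Hyungi Ahn	1cf64fd11e	feat(study): group questions by exam round + read-only view page (PR-11) - unified view question section: flat list → accordion per round (all collapsed by default) - round sorting: parse "YYYY년 N회" (year, round) → year desc / round desc (prevents the regression caused by relying on localeCompare alone) - round row label: "N attempts in total · last result: K correct / M wrong" (avoids confusing cumulative with latest) - the group with no round gets a yellow tone + notice; the display string is UI-only (kept separate from the original NULL) - body and [Edit] link are structurally separated to avoid event-bubbling conflicts - new /study/topics/{tid}/questions/{qid}: KaTeX markdown rendering + answer display + AI explanation with 5 states (idle/loading/success/stale/error) + similar questions + prev/next - prev/next reachable by direct URL: the page itself fetches the single question and the list for the same round - toast notice when page_size=200 is full and total>200 (no silent truncation) - sections are hidden when there is no user-entered explanation/image; prev/next disabled when exam_round is NULL - add exam_question_number to StudyTopicQuestionSummary (sort key within a round) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-29 07:01:27 +09:00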
Hyungi Ahn	6e25523600	fix(study): quiz_session results: StudyQuestionImage.position → reuse sort_order On the PR-10 results page, GET /quiz-sessions/{sid} returned 500. The image batch call used a nonexistent column, position → AttributeError. Now reuses the existing _images_for_questions_batch helper (ordered by sort_order + includes served_url). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-28 16:52:52 +09:00
Hyungi Ahn	7f4d64c6df	feat(study): quiz sessions + result cards + study-complete check (PR-10) - study_quiz_sessions table (partial unique: one in_progress session per topic) - quiz_session_id + reviewed_at columns on study_question_attempts - the server is the single source of truth for quiz progress (cursor), so a session can be resumed after leaving - unified view: in-progress card (resume) + recently completed result card (badge with N unreviewed) - new /quiz-sessions/[sid] results page (3 categories + AI explanation + field explanation + study-complete toggle) - the /review page is for solving only; it redirects to the results page after the last question - migrations 206~209 (single-statement, asyncpg compatible) - API: POST/GET/PATCH /study-topics/{tid}/quiz-sessions(/{sid}), PATCH /study-question-attempts/{aid}/review-mark - add AttemptCreate.quiz_session_id: submit_attempt increments the session cursor + count in the same transaction, and on the last question sets status='done' + finished_at Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-28 16:49:21 +09:00
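Hyungi Ahn	d968b2d901	feat(study): quiz mode rework + outcome categories + field explanation (PR-9) - label "Start review" → "Solve questions" - attempts.outcome column (correct/wrong/unsure) + selected_choice made nullable - while solving, the answer, explanation, AI, and similar questions are all hidden; clicking an answer advances automatically - add "Not sure" as a 5th option - results screen = 3 category tabs (correct / wrong / not sure), click a card to expand - wrong → PR-3 AI explanation (RAG) - not sure → AI explanation of the field (subject+scope), generated on demand + cached (new in PR-9) - field explanation RAG: chunks of mapped documents + other questions and explanations in the same field → bge-reranker - migrations 200~205 (single-statement, asyncpg compatible) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-28 15:58:35 +09:00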
Hyungi Ahn	8b15e6e019	feat(study): question image attachments (PR-8) Attach N images per question, to support exam questions that need circuit diagrams, graphs, and the like. Shown in entry, edit, and review. Data model (migration 198): - study_question_images: id, user_id FK CASCADE, study_question_id FK CASCADE, file_path, file_size, mime_type, sort_order, created_at - partial idx (study_question_id, sort_order, id) Storage: NAS /documents/study_question_images/{topic_id}/{qid}/{img_id}.{ext}, separate from the PKM path watched by file_watcher, so it is not auto-indexed. API: - POST /api/study-questions/{qid}/images (multipart, MIME PNG/JPEG/WEBP/GIF, 10MB per file limit, sort_order automatically max+1) - GET /api/study-questions/{qid}/images/{img_id}/raw (FileResponse, Bearer auth) - DELETE /api/study-questions/{qid}/images/{img_id} (cleans up the DB row + file system) - StudyQuestionResponse / ReviewQuestionItem responses include an images array - StudyQuestionSummary response adds has_images bool Frontend: - new lib/components/ImgAuth.svelte: fetches an image from a Bearer-authenticated endpoint and converts it to a blob URL for display in <img>. Calls URL.revokeObjectURL on unmount. - /questions/new: image dropzone in the entry form (held client-side) → automatic multipart upload using the qid returned by POST /questions. Reset on "Save and continue entering". - /questions/[qid]/edit: separate card with the existing image grid + add/delete. Uploads immediately. - /review: image grid below the question text (max-h-72 object-contain). - all display goes through the ImgAuth component, to handle accessToken expiry. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-28 13:41:50 +09:00
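Hyungi Ahn	7dd77ec926	fix(classify): validate the data_origin enum to prevent cascade failures from invalid values such as knowledge When a doc_purpose enum value such as dataOrigin='knowledge' in the AI response is wrongly mapped to the data_origin column, asyncpg raises InvalidTextRepresentationError. Subsequent autoflush calls in the same classify_worker session then cascade into PendingRollbackError, so every other document in the batch fails as well. Fixed to validate against the allowed enum values (work/external) before writing, as is already done for doc_purpose. Any other value is skipped (data_origin stays NULL). Root cause of the RAG gaps for the 15 missing items in the Gas Engineer topic.	2026-04-28 10:01:45 +09:00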
Hyungi Ahn	8803e6a0fd	feat(study): exam / round / question-number management (PR-6) Scenario: filling in 100 questions per round of an engineer certification exam. Upgrades the question entry page from a simple form into a tool for tracking and resuming round progress. Data model (migrations 195~197): - study_topics: exam_round_size INT CHECK 1~300 (questions per round, NULL=unset) + exam_subjects JSONB DEFAULT '[]' (subject list, dropdown options on the entry page) - study_questions: exam_question_number SMALLINT CHECK >0 (question number within the round) - partial idx (study_topic_id, exam_round, exam_question_number) WHERE deleted_at IS NULL AND exam_round IS NOT NULL, to speed up per-round max+count Backend: - POST /questions: when exam_round is given and exam_question_number is not, the server fills in max+1 for the same topic and round - new GET /api/study-topics/{id}/exam-rounds: per-round progress aggregate {exam_round_size, items: [{exam_round, question_count, max_question_number, next_question_number, is_complete}]} - StudyTopic Create/Update/Response/Meta gain exam_round_size and exam_subjects - StudyQuestion Create/Update/Response gain exam_question_number - changes to exam_question_number are excluded from the embedding stale trigger (no effect on meaning) Frontend: - topic create/edit modal: "Exam info" section (questions per round + subject list with add/remove chips) - new page /study/topics/[id]/exam-rounds: round cards + progress bar + [Continue from No. N] button + [Start new round] modal - [View rounds] entry point in the header of the unified view's question section - /questions/new page fully reworked: - exam name = topic.name auto prefill - subject dropdown (topic.exam_subjects + existing distinct values, "enter manually" toggle) - round dropdown (existing distinct values + "new round") - question number automatic (next_question_number when a round is selected, new round = 1) - progress bar (current/exam_round_size) - source/memo auto-composed as "<round> No. N" (editable) - "Save and continue entering" → resets text/choices/answer, keeps the round, question number +1 - question number resets to 1 when a round change is detected - when exam_round_size is reached, the round is highlighted + "Save and continue entering" is disabled - supports the query string ?exam_round=&start_qnum= (resume entry from the round list) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-28 09:31:06 +09:00
Hyungi Ahn	5b55274368	feat(study): similar question search (PR-5) Cosine-similarity top-K based on the automatic study_questions embeddings (PR-4, bge-m3, 1024 dims). No 26B calls, vector search only. Additive UI: no impact on the existing entry and review flows. Backend: GET /api/study-questions/{id}/similar?limit=5&topic_only=true - excludes self / soft-deleted / embedding_status!=ready - topic_only=true (default) restricts results to the same study_topic - response: items[{id, question_text(truncated to 80 chars), subject, scope, exam_round, similarity(1-cosine), attempt_count, last_correct}], source_status, source_id - when the current question's embedding is missing/failed/stale, returns an empty result + a source_status notice - attempt_count + last_correct fetched in batch (avoids N+1) Frontend: - edit screen (/questions/[qid]/edit): automatic GET /similar on page load → 5 cards. Truncated text + subject/scope/exam_round + similarity % + attempt badge (correct/wrong icon). Clicking a card goes to that question's edit page. - review screen (/review): after an answer is submitted, a "Show similar questions" toggle → expands 5 cards in the same form. Closes automatically when the cursor moves to the next question. - unified view: no change (the edit entry point already serves as the visual cue). Notices by source_status (pending/failed/stale/none): when the embedding is not ready yet, a "the cron will process this automatically within about 1 minute" message is shown. Follow-up PRs planned: subject/scope auto-suggestion (PR-6), wrong-answer notes/statistics (PR-7), AI solution idle batch (PR-8). PR-5 itself only goes as far as exposing vector search results. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-28 09:05:55 +09:00
Hyungi Ahn	5a8c7595d7	fix(study): worker mapper chain: add User/Document FK refs	2026-04-28 08:59:33 +09:00
Hyungi Ahn	b0a087ab6f	fix(study): worker mapper chain: defensive import of StudySession as well	2026-04-28 08:57:57 +09:00
Hyungi Ahn	de781ed622	fix(study): defensive import of StudyTopic so that the StudyQuestion mapper initializes when the worker is entered standalone	2026-04-28 08:55:55 +09:00
Hyungi Ahn	9d4aa201a8	feat(study): automatic embedding of study_questions (PR-4) Question text + choices 1~4 → bge-m3, 1024 dims. The status column itself acts as the queue (no separate queue table, zero impact on the ProcessingQueue infrastructure). An APScheduler 1-minute cron processes rows with status in {none, failed, stale} in batches of 10. New questions default to 'none' and are backfilled automatically. Data model (migrations 193~194): - study_questions: embedding vector(1024), embedding_status VARCHAR(20) DEFAULT 'none' (none/pending/ready/failed/stale), embedding_updated_at, embedding_model - HNSW partial index (vector_cosine_ops) WHERE deleted_at IS NULL AND embedding IS NOT NULL: bge-m3 cosine, ops consistent with documents.embedding (ivfflat) Recompute trigger: ready→stale automatically when question_text / choice_1~4 change. Changes to correct_choice / explanation / subject / scope do not trigger recomputation (no effect on semantic search). Worker (workers/study_question_embed_worker.py): - race-safe pending marking (conditional UPDATE WHERE status IN none/failed/stale) - calls bge-m3 via AIClient.embed(text), 15s timeout - on failure status='failed', the previous embedding is preserved, and the next cron tick retries - text = "문제: ...\n보기:\n1. ...\n2. ...\n3. ...\n4. ..." (문제 = question, 보기 = choices; subject/scope intentionally excluded because category names are noise for semantic search) Follow-up PRs planned: similar question search UI / duplicate entry detection / RAG accuracy improvements / wrong-answer clustering. This PR covers embedding storage, recomputation, and backfill only. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-28 08:54:02 +09:00
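Hyungi Ahn	e1a2cdc677	feat(study): AI solution generation, manual trigger + RAG (PR-3) The AI generates a solution for a 4-choice question only when the user explicitly clicks, either after submitting a review answer or on the edit screen. Automatic bulk generation is prohibited (MLX load when 100 questions are entered per day, and the risk of explaining incorrectly entered questions). Data model (migrations 191~192): - study_questions gains 4 columns: ai_explanation TEXT, ai_explanation_status VARCHAR(20) DEFAULT 'none' (none/pending/ready/failed/stale), ai_explanation_generated_at, ai_explanation_model - partial idx (study_topic_id, ai_explanation_status) WHERE status != 'none' Automatic stale transition on PATCH: when question_text/choice_*/correct_choice change, only status='ready' moves to 'stale'. The text is preserved, with a UI badge + a "Regenerate" path. New endpoint: POST /api/study-questions/{id}/ai-explanation - regenerate=false + ready/stale → cached result immediately (no MLX call, is_stale flag) - pending → 409 (a race-safe conditional UPDATE blocks concurrent calls) - otherwise → new generation RAG input pool: - priority 1: chunks + ai_summary of the documents mapped to the study_topic, bge-reranker top-5 - priority 2: other questions in the same topic (excluding self; ai_explanation is included only when ready, to prevent recursive hallucination), reranker top-3 - excluded: handwritten-note OCR / external web / Premium models Model: Mac mini MLX gemma-4-26b as the sole primary. Goes through the get_mlx_gate() Semaphore(1), 30s timeout. On failure status='failed' + the previous text is preserved. Prompt (app/prompts/study_question_explanation.txt): source priority, citation format, and absolute anti-hallucination rules (no asserting statute names, articles, figures, or standard numbers; state "not confirmed in the sources" explicitly). Frontend: - review screen: inline expand after an answer is submitted. Buttons branch by status (ready: cached / stale: "Previous solution" + "Regenerate" / failed: "Retry") - edit screen: separate card. Status badge, with "View previous solution" and "Regenerate" separated - reference-sources toggle (icon by source_type 📄/❓ + title + snippet) Deferred to follow-up PRs: wrong-answer notes/statistics, AI bulk background generation, handwritten-note OCR RAG, Premium/Claude regeneration, /api/search/ask retrieval scope integration. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-28 08:41:46 +09:00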
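Hyungi Ahn	0e2a430a6c	fix(study): unified view materials section: category-tree grouping + collapsing When 273 materials are attached to one workspace, as with Gas Engineer, they were laid out as one long flat list and the unified view broke down. The materials section of /study/topics/[id] is now grouped into a tree based on library category paths, with per-node collapse/expand. Everything is collapsed by default. Backend: add a library_paths field (prefix stripped from `@library/<path>` tags) to StudyTopicDocumentSummary. Grouping uses only the first path (for simplicity). Frontend: build a tree from documents by path segment → recursive snippet rendering. The header has "N materials · K categories · [Expand all/Collapse all]" controls. Materials without a category go in a separate "Uncategorized" group. Paths with 0 materials are omitted automatically. The notes/questions sections use different classification axes (certification/subject vs subject), so they cannot share the same tree and are out of scope for this PR. Pattern consistency will be reviewed in a follow-up.	2026-04-28 08:14:58 +09:00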
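Hyungi Ahn	4b7156061e	feat(study): question bank + review mode (study_questions) Adds a 4-choice question bank asset track to the study_topic workspace. Scenario: preparing for the written part of an engineer certification exam, with fast repeated entry + review sampled evenly across subjects + accumulated right/wrong history. Data model (migrations 186~190): - study_questions: study_topic 1:N, soft delete, is_active toggle, correct_choice SMALLINT CHECK 1~4 - study_question_attempts: one row appended per submitted answer. The study_question_id FK is ON DELETE RESTRICT (history-preservation principle: prevents an accidental hard delete from wiping attempt records) Design principles: - question deletion is soft delete only in the API. The attempts FK RESTRICT also protects at the DB level - changing correct_choice does not recompute existing attempts.is_correct (preserves the fact as of that time) - review default = random even sampling of target_per_subject(20) per subject. If a subject falls short, only what is available is used - wrong_only=true means questions whose most recent attempt was wrong (latest-wrong, not ever-wrong) - answers and explanations are hidden in the question-serving response and revealed only when an answer is submitted - no strong enum for subject/scope (free text; autocomplete is a follow-up) API: /api/study-topics/{id}/questions, /review/questions, /api/study-questions/{id}, /attempt. The unified view (/study-topics/{id}) response adds sections.questions / stats.question_count. The existing question_set_count is kept for a follow-up PR (round/mock-exam bundles). Frontend: question section + "New question"/"Start review" entry points on /study/topics/[id]. /questions/new (save and continue entering + sessionStorage persistence), /questions/[qid]/edit (banner noting that attempts are not recomputed when the answer changes), /review (start options → solving → final summary). Follow-up PRs planned: wrong-answer notes/weak-subject report, AI explanations/clustering, spaced repetition, image OCR input, CSV import, study_question_sets bundles. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-28 08:00:37 +09:00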
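Hyungi Ahn	62afc571c0	feat(study): bulk-add materials from the category tree The add-materials modal only supported searching and ticking checkboxes one item at a time, which was inefficient when a category holds many materials. The category structure from /api/library/tree is now shown on the left of the modal, and a single click on the icon next to a node maps every material under that path at once. Backend: add POST /api/study-topics/{id}/documents/by-path. Matches the user_tags @library/<path> prefix (the same EXISTS query as list_library_documents in documents.py), bypassing the 100-item limit. The response exposes only the linked_count / skipped_existing_count / total_in_path counts. Frontend: modal reworked into a max-w-4xl + grid (tree/materials) layout. Clicking a tree node = filters the material list on the right by path; the FolderPlus button next to a node = immediate bulk add. Search, checkboxes, and select-all are unchanged. On mobile the tree stacks into a max-h-40vh area at the top.	2026-04-28 07:29:59 +09:00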
Hyungi Ahn	63ed4d81e5	feat(study): introduce study_topics, a study workspace container A first-pass container that groups note sessions and materials (library documents) under one study topic (e.g. Gas Engineer). The response structure (sections + stats) is dict-based so that future study assets such as vocabulary lists, audio, and question sets can join the same bundle. Data model (migrations 179~185): - study_topics: user_id × name partial unique (active rows only), soft delete - study_sessions.study_topic_id: 1:N nullable FK (ON DELETE SET NULL) - study_topic_documents: N:M material mapping (user_id denormalized for permission isolation) Design principles: - orthogonal to documents.category (the library UI axis) → library facets/categories untouched - StudySession.certification/subject/topic preserved (still used as detailed metadata) - study_type is a loose classification (no strong enum; room to extend, e.g. jlpt_n3) - polymorphic study_topic_items permanently prohibited → add a join table per asset type instead API: /api/study-topics CRUD + /by-document/{id} + material/session mapping endpoints. Frontend: /study/topics list + /study/topics/[id] unified view (two tracks: notes and materials) + workspace dropdown in the write form + entry card on the study hub. Follow-ups: PR-2 language-learning UX, PR-3 audio assets, PR-4 AI retrieval scope. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-28 07:06:37 +09:00
Hyungi Ahn	6e3ce91de6	fix(library): library-neighbors JSONB cast error: replaced with an EXISTS subquery	2026-04-27 12:43:26 +09:00
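
The grade mapping listed in commit fc8aea1649 can be read as a small decision rule. The following is only a minimal sketch under stated assumptions, not the repository's code: the function name `repeat_grade`, its parameters, and the choice to return `None` below two rounds are hypothetical; only the cut-offs and the label strings are taken from the commit message.

```python
from typing import Optional, Sequence


def repeat_grade(round_count: int, years: Sequence[int]) -> Optional[str]:
    """Map the number of exam rounds a question type appeared in to a label.

    `years` holds the exam years of the matching rounds (hypothetical input
    shape). Returning None below two rounds is an assumption of this sketch.
    """
    if round_count >= 7:
        return "단골"                 # "staple"
    if round_count >= 5:
        return "잘 나오는 반복 출제"   # "often repeated" (5-6 rounds)
    if round_count >= 3:
        return "반복 출제"            # "repeated" (3-4 rounds)
    if round_count == 2:
        # Two rounds: recent (max year >= 2024) vs. older rounds only.
        if years and max(years) >= 2024:
            return "신출"             # "new"
        return "빈출"                 # "frequent"
    return None
```

For example, `repeat_grade(2, [2019, 2024])` falls into the two-round branch and is labeled as recent because the latest year reaches 2024.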

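Commit 1cf64fd11e sorts rounds by parsing "YYYY년 N회" into year desc / round desc instead of relying on string comparison alone. The change itself is in the frontend; the sketch below only illustrates the idea in Python. The names `round_sort_key` and `sort_rounds`, the regular expression, and the decision to place unparsable or missing labels last are assumptions made for illustration.

```python
import re
from typing import List, Optional, Tuple

# "YYYY년 N회": a four-digit year followed by a round number.
_ROUND_RE = re.compile(r"^\s*(\d{4})년\s*(\d+)회\s*$")


def round_sort_key(label: Optional[str]) -> Tuple[int, int, int, str]:
    """Sort key giving year desc, then round desc; unparsable labels go last."""
    match = _ROUND_RE.match(label or "")
    if match is None:
        # Assumption: labels that do not parse (or None) sort after parsed ones.
        return (1, 0, 0, label or "")
    year, round_no = int(match.group(1)), int(match.group(2))
    # Negate the numbers so that an ascending sort yields descending order.
    return (0, -year, -round_no, "")


def sort_rounds(labels: List[Optional[str]]) -> List[Optional[str]]:
    return sorted(labels, key=round_sort_key)
```

Numeric parsing matters once a round number reaches two digits: a plain descending string sort would place "2019년 2회" ahead of "2019년 10회", whereas `sort_rounds` puts round 10 first.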

226 Commits