hyungi_document_server

Author	SHA1	Message	Date
Hyungi Ahn	8074be6b6d	feat(study): Phase 4-D operational monitoring + confidence calibration. The first validation run of Phase 4-B v1 found cases where the model set confidence='high' even for topics with insufficient source material. That is more confident than the definition allows (high = the pattern is clear from source material plus other ai_explanation entries) and puts UX trust at risk. Adds an automatic cap plus operational monitoring SQL. confidence calibration (services/study/session_summary_guard): - new calibrate_confidence(c, ctx_docs_count, ready_explanation_count) · ctx_docs_count == 0 AND ready_explanation_count == 0 → capped at 'low' · ctx_docs_count == 0 (only ready explanations present) → capped at 'medium' · ctx_docs_count >= 1 → model value passed through unchanged - when the model returns a more conservative value than the cap (model 'low' + cap 'medium'), that value is preserved; a more conservative value is never raised. worker integration (study_session_analysis_worker): - ctx_docs_count = len(ctx_docs) - ready_explanation_count = sum(1 for a in prompt_attempts if a.get('ai_explanation')) - calls calibrate_confidence → result is written to study_quiz_session_analysis.confidence - operational metadata preserved in job.payload: · ctx_docs_count / ready_explanation_count · model_confidence_raw (model response) vs calibrated_confidence (after the cap) · prompt_attempts / valid_attempts_total / summary_len → SQL query 4 measures how often the cap fires. scripts/phase4_health.sql (new operational health-check SQL, 7 sections): 1. 4-A study_question_jobs status × error_code distribution 2. 4-B study_quiz_session_jobs status × error_code distribution 3. 4-B confidence distribution (calibrated) 4. 4-B model_confidence_raw vs calibrated difference (how often the cap fires) 5. 4-A/4-B processing latency over the last 7 days, p50/p95/max/avg 6. 4-A/4-B distribution of skip reasons 7. 4-B guard_fail / parse_fail / llm_timeout ratios. ship gate (unit tests): - test_calibrate_confidence_no_evidence_caps_to_low (3 cases) - test_calibrate_confidence_only_explanations_caps_to_medium (3 cases) - test_calibrate_confidence_with_documents_passthrough (3 cases) - test_calibrate_confidence_normalizes_invalid_first (2 cases) Plan: ~/.claude/plans/nifty-sparking-spindle.md (Phase 4-B v1 follow-up) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-02 07:33:57 +09:00
Hyungi Ahn	6785d53d3d	feat(study): Phase 4-B v1 session-level summary analysis (free-form markdown). Phase 4-A caches explanations one wrong/unsure question at a time. 4-B bundles the 5-30 wrong/unsure attempts of a whole session and generates a single 200-400 character natural-language summary, shown as a header card on the results screen. The queue infrastructure is kept separate from 4-A's study_question_jobs: single-meaning FK, clearer operational SQL, and different guard/payload/retry policies between 4-A and 4-B. New study_quiz_session_jobs (queue) + study_quiz_session_analysis (result cache, PK=session_id, UPSERT) + dedicated consumer. Backend: - migrations/233: study_quiz_session_jobs (FK study_quiz_sessions NOT NULL, status pending/processing/completed/failed/skipped, max_attempts=2) - migrations/234: partial unique idx (session_id) WHERE pending/processing - migrations/235: study_quiz_session_analysis (session_id PK, summary_md, confidence, model_name, generated_at, is_stale) - models/study_quiz_session_job: ORM + enqueue_session_analysis_job() (idempotent) - models/study_quiz_session_analysis: ORM (PK = session_id) - services/study/session_summary_guard: GUARD_PATTERN (regex) + normalize_confidence() as the single source, imported by both the worker and the tests - services/study/session_summary_rag: gather_session_summary_context(), documents only (reuses PR-3 _gather_document_evidence). The call is allowed even without evidence (a different policy from 4-A: the session record itself is the evidence) - services/study/session_analysis_enqueue: auto (finalize/fallback) + request_session_analysis_regenerate (manual). Manual requests are rejected immediately when wrong/unsure < 5 or when an active job exists, and set is_stale=true on any existing analysis - prompts/study_session_summary_envelope.txt: envelope JSON {summary_md, confidence}. Only quantitative integers may be cited; ratios, trends, ranges, and dates are forbidden - workers/study_session_analysis_worker: terminal status branches: · wrong/unsure < 5 → status=skipped, error_code=insufficient_attempts · missing question_text/outcome → skipped, evidence_missing · GUARD_PATTERN match → failed, guard_fail · 800-character hard cap + confidence normalize · timeout/parse/unknown → retry candidate · UPSERT study_quiz_session_analysis ON CONFLICT DO UPDATE (PK session_id) - workers/study_session_queue_consumer: copy of the 4-A consumer pattern. BATCH_SIZE=1 + STALE_MINUTES=10. MLX gate shared with 4-A (Semaphore(1)) - main.py: APScheduler add_job(consume_study_session_queue, ..., 1-minute interval) - session_finalize: calls enqueue_session_analysis_auto at the end (best-effort) - api/study_topics: · QuizSessionAnalysisOut + ai_session_analysis response field (analysis row + latest job status/error_code) · GET fallback enqueue (only when there is no existing analysis or active job, non-blocking) · POST /quiz-sessions/{sid}/regenerate-summary: manual trigger. Frontend (quiz-sessions/[sid]/+page.svelte): - session summary card in the results header (right after the AI explanation indicator, right before the "To do now" callout) - renders summary_md as markdown when present, otherwise branches on job_status / error_code: · pending/processing → "AI is analyzing the session" · insufficient_attempts → "Fewer than 5 wrong/unsure answers" · evidence_missing → "Insufficient source material" · guard_fail → "Blocked by hallucination check" + regenerate link - confidence='low' badge + is_stale "Regenerating" badge - regenerate button + regenerateSummary(), with a toast per reason. ship gate: - tests/test_session_summary_guard_pattern.py: 5 allowed + 7 blocked cases + normalize_confidence checks for standard/non-standard input. Passes when run directly with python3. Plan: ~/.claude/plans/nifty-sparking-spindle.md (Phase 4-B v1) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-02 07:20:29 +09:00
Hyungi Ahn	e8da53490c	feat(study): Phase 4-A prefetch batch for wrong/unsure AI explanations. The real-time [View AI explanation] call on the PR-3 results screen made users wait 8-30 seconds per click. Explanations are now pre-generated by a background batch right after the quiz, so the click becomes a cache hit. The hallucination guard is stricter than in PR-3: envelope JSON {answer_choice, explanation_md, confidence} + answer_choice == correct_choice check + mandatory evidence. processing_queue has an FK on documents.id and cannot be reused directly for study_questions → separate study_question_jobs table + separate consumer. Backend: - migrations/231: study_question_jobs CREATE TABLE (13 columns; recommended kind value 'explanation', with 'session_summary' reserved; status pending/processing/completed/failed/skipped; max_attempts=2) - migrations/232: partial unique idx (qid, kind) WHERE status IN (pending, processing); blocks duplicate active rows while letting terminal history accumulate - models/study_question_job: ORM + enqueue_study_question_job() helper (idempotent via on_conflict_do_nothing) - prompts/study_explanation_envelope.txt: envelope-format prompt (answer_choice forced to 1-4, confidence high/medium/low) - workers/study_explanation_worker: terminal status branches: · both evidence lists empty → job and question both skipped (no LLM call) · answer_choice != correct_choice → guard_fail / failed (no retry) · timeout/parse → retry candidate (max_attempts=2) · catch-all except → explicit unknown + retryable branch · question.ai_explanation_status already 'ready' → completed immediately · confidence preserved in job.payload (operational analysis) - workers/study_queue_consumer: APScheduler 1-minute interval, BATCH_SIZE=1, shares the MLX gate Semaphore(1). Self-recovery via STALE_MINUTES=10 - main.py: scheduler.add_job(consume_study_queue, ..., id='study_queue_consumer') - services/study/explanation_enqueue: helpers shared by finalize + GET fallback: filter_needs_explanation (filters on study_questions status + latest job error_code; questions whose last job ended in guard_fail/evidence_missing are excluded from automatic re-enqueue) + enqueue_explanation_for_qids (max_count cap) - session_finalize: prefetch-enqueues wrong/unsure qids at the end (best-effort; a failure does not break finalize itself) - api/study_topics get_quiz_session: backfill enqueue for done sessions (max=30, non-blocking, debug log). Eligibility: ai_explanation_status IN ('none', 'failed') OR ai_explanation IS NULL. stale / skipped / pending / ready are not auto-enqueued. Regenerating stale explanations is left to the explicit PR-3 [Regenerate] action or a later Phase. Plan: ~/.claude/plans/nifty-sparking-spindle.md (Phase 4-A) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-01 11:42:08 +09:00
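Hyungi Ahn	d3bf963a66	feat(study): Phase 2-B change counts on the results screen + review-complete progress integration. The SessionSummary computed by the Phase 1 finalize step was discarded without being included in the response; it is now persisted in 4 columns on the quiz_session row. The results screen header shows cumulative change counts (recovered / regressed / newly correct / repeated wrong) plus a "To do now" callout (dynamic counts based on current progress: pending_review/chronic/regressed). Dynamic counts are computed only when the results GET is called (avoids the cost on the list endpoint). Review-complete integration: the [Study complete] button on a result card used to set only attempts.reviewed_at; it now also sets progress.last_reviewed_at and, for wrong/unsure, assigns the initial due_at. Toggling reviewed=false reverts attempts only (progress.last_reviewed_at is preserved because another attempt may have marked the review). - migrations/230: ADD 4 columns to quiz_sessions (single ALTER TABLE) - StudyQuizSession model + finalize_session persists the row - QuizSessionSummary response gains 4 snapshot + 3 dynamic fields (default 0) - _build_session_summary runs 3 SQL queries when include_progress_counts=True - review-mark syncs progress when reviewed=true - results screen: header change-count line + "To do now" callout (only when values are present) Plan: ~/.claude/plans/crispy-petting-dijkstra.md (Phase 2-B) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-01 09:49:01 +09:00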
Hyungi Ahn	9094b2dbc5	feat(study): Phase 1-E quiz selection algorithm: bucket + stage ratios. The single quiz entry point from the vision: with the stage (intro/learning/pre_exam) + size options, the same endpoint serves questions with different distributions. services/study/quiz_selection.py: - bucket: unattempted / wrong_or_unsure / due_review / regressed / frequent / random - ratios per stage: - intro: unattempted 55, wrong_or_unsure 30, frequent 15 - learning: due_review 20, wrong_or_unsure 40, unattempted 30, frequent 10 - pre_exam: due_review 20, wrong_or_unsure 30, regressed 10, frequent 20, random 20 - bucket priority (dict order); each later bucket excludes qids already picked - random backfill when candidates run short, ValueError if still short. api/study_topics.py: - adds stage / size options to QuizSessionStartRequest - uses select_questions_for_quiz when stage is given - keeps the existing PR-12-B path (subject bucket + spacing) for compatibility when stage is omitted. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-01 09:30:11 +09:00
Hyungi Ahn	e5982ebde4	feat(study): Phase 1 learning-loop data layer: progress cache + finalize + review API. The data layer for the vision (solve → check → study → review → next-quiz weighting). Data model (migrations 222-225): - study_question_progress table: current-state cache per user × topic × question - last attempt: last_outcome, last_attempted_at, last_attempt_id - review state: last_reviewed_at - review queue: due_at, review_stage - pattern classification (derived): pattern_state, pattern_updated_at, pattern_window_attempts - 3 partial idx (due / topic_pattern / pending_review) for fast per-tab lookups. Pattern classification (services/study/learning_pattern.py): - 7 classes: unattempted/unsure/chronic_wrong/regressed/recovered/stable/unstable - window = last 3 attempts + whether any earlier correct/wrong exists - priority chronic_wrong > regressed > recovered (conservative learning) - guard: a single wrong does not make a question regressed (a prior correct is required) - stable requires 3 consecutive correct answers. Session-end aggregation (services/study/session_finalize.py): - attempts stay append-only with originals preserved; only progress is upserted - the finalize hook fires automatically right after the last attempt - finalize updates only last_* + pattern_state; questions that have not entered the due_at queue keep NULL - for questions that already have due_at, finalize updates the stage (correct → +1 / wrong → reset). API (api/study_question_progress.py): - POST /study-topics/{tid}/questions/{qid}/review-complete → sets last_reviewed_at + assigns the initial due_at (wrong/unsure only) - GET /study-topics/{tid}/review-queue?tab=due_today|pending_review|chronic|regressed|mastered → paginated lookup over 5 tabs → pending_review also includes rows where last_reviewed_at < last_attempted_at (answered wrong again after an earlier review-complete). Phase 1-E (quiz selection algorithm) follows in a later commit. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-01 09:28:46 +09:00
Hyungi Ahn	219e233a48	feat(study): related-types DB cache: removes the per-request HNSW recomputation - migrations 220/221: related_repeat/similar JSONB + count/grade/computed_at/threshold_version + partial idx on study_questions - embedding worker: computes and stores related right after ready processing, in the same transaction, and invalidates the topic's other ready rows with related_computed_at=NULL - new cron study_q_related_refresh (1 minute, batch=20): bulk recomputation of stale caches - API list_related_types: on a cache hit (computed_at set + threshold version matches) responds with a single SELECT; on a miss, computes and stores immediately, then responds - update_question PATCH: sets related_computed_at=NULL when the body or exam_round changes - soft delete: invalidates ready rows in the same topic. When the threshold changes: bump related_types.THRESHOLD_VERSION and run UPDATE WHERE version != '<new>' SET computed_at=NULL once; the cron then recomputes everything automatically. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-01 07:22:31 +09:00
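Hyungi Ahn	fc8aea1649	feat(study): repeated-question label grades + cosine threshold adjusted to 0.85 - grade mapping by round_count (staple (단골) / frequently repeated (잘 나오는 반복 출제) / repeated (반복 출제) / newly recurring (신출) / frequent (빈출)) - ≥7 staple, 5–6 frequently repeated, 3–4 repeated, 2 + max(year)≥2024 newly recurring, 2 + all old frequent - SIMILAR_THRESHOLD 0.88 → 0.85 (reflects the natural gap found when measuring the 5-source distribution) - labels unified across the API response + 3 frontend places (view / integrated view / result card) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-30 08:50:39 +09:00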
Hyungi Ahn	13404cd366	feat(study): prevent same-type crowding: spacing at question selection time (PR-12-B). Learning intent: spread questions out so that one quiz session is not overloaded with questions of the same type. The goal is to spread same-type questions, not to eliminate them; no dedup/removal framing. - migration 210: study_quiz_sessions.quiz_mode VARCHAR(30) DEFAULT 'random' - ORM: StudyQuizSession.quiz_mode field - service.related_types: adds the apply_type_spacing helper - SPACING_THRESHOLD=0.88 (independent of exam round; does not reuse the PR-12-A round filter) - PER_TYPE_CAP=2 (local neighbor cap, no transitive-cluster guarantee) - SPACING_BUFFER_RATIO=2.0 - 3-step fallback: ready spacing → top up from pending → hold (cap-violation) fallback - debug log type_spacing_applied subject=... ready=N selected=M ... - _select_questions_for_topic: spacing per subject bucket (protects the even split across subjects) - QuizMode Enum (random); frequent_focus/wrong_variants reserved for later - start_quiz_session accepts quiz_mode + passes apply_spacing - frontend startNewQuiz body sets quiz_mode='random' explicitly Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-29 08:45:15 +09:00
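Hyungi Ahn	cbe852bb37	feat(study): show repeated questions / similar types separately (PR-12-A). The learning value lies in repetition across exam rounds, so the framing is pattern display rather than blocking/removal. - new service `related_types.py`: shared threshold / round filter / round_count calculation - REPEAT >= 0.95 / SIMILAR 0.88-0.95 - round conditions enforced in the backend (excludes the question itself, same-round candidates, and candidates with null exam_round) - round_count: related_count == 0 → 0 (the current round alone is not counted as 1) - GET /study-questions/{qid}/related-types: single-question classification (repeat_questions / similar_questions) - POST /study-topics/{tid}/related-types-bulk: batch counts for card badges - comparison target = the topic's entire ready pool (input qids are not compared with each other) - response keys are preserved; no permission / embedding not ready etc. also return (0,0,0,0) - view page: removes the PR-11 similar-questions toggle + shows the 🔥 Repeated questions / 🧩 Similar types sections automatically - header = round_count "N rounds", above the body = related_count "N related questions" - guidance branches on source_status / source_exam_round - results page (wrong/unsure cards): badge shown only when round_count >= 2 after the bulk call - integrated view calls bulk lazily when a round is expanded; cached per round - existing /similar endpoint kept (for raw debugging) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-29 08:09:14 +09:00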
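Hyungi Ahn	d968b2d901	feat(study): quiz mode overhaul + result categories + subject-area explanation (PR-9) - label "Start review" → "Quiz" - attempts.outcome column + selected_choice nullable (correct/wrong/unsure) - during the quiz the answer, explanation, AI, and similar questions are all hidden; clicking an answer advances automatically - adds "Not sure" as a 5th option - results screen = 3 category tabs (correct / wrong / not sure), click a card to expand - wrong → PR-3 AI explanation (RAG) - not sure → subject-area (subject+scope) explanation generated by AI on demand + cached (new in PR-9) - subject-area explanation RAG: mapped documents chunks + other questions and explanations in the same area → bge-reranker - migrations 200-205 (single-statement, asyncpg compatible) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-28 15:58:35 +09:00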
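Hyungi Ahn	e1a2cdc677	feat(study): AI explanation generation: manual trigger + RAG (PR-3). The AI generates an explanation for a 4-choice question only when the user explicitly asks for it, either after submitting a review answer or from the edit screen. Automatic bulk generation is forbidden (MLX load when 100 questions are entered in a day; risk of explaining mis-entered questions). Data model (migrations 191-192): - 4 columns added to study_questions: ai_explanation TEXT, ai_explanation_status VARCHAR(20) DEFAULT 'none' (none/pending/ready/failed/stale), ai_explanation_generated_at, ai_explanation_model - partial idx (study_topic_id, ai_explanation_status) WHERE status != 'none'. Automatic stale transition on PATCH: when question_text/choice_*/correct_choice changes, only status='ready' moves to 'stale'. The text is preserved; the UI shows a badge + a "Regenerate" path. New endpoint: POST /api/study-questions/{id}/ai-explanation - regenerate=false + ready/stale → served from cache immediately (no MLX call, is_stale flag) - pending → 409 (a race-safe conditional UPDATE blocks concurrent calls) - otherwise → new generation. RAG input pool: - 1st priority: documents chunks mapped to the study_topic + ai_summary, bge-reranker top-5 - 2nd priority: other questions in the same topic (excluding the question itself; ai_explanation is included only in the ready state, to prevent recursive hallucination), reranker top-3 - excluded: handwriting OCR / external web / Premium models. Model: Mac mini MLX gemma-4-26b, primary only. Goes through get_mlx_gate() Semaphore(1), 30s timeout. On failure, status='failed' + the previous text is preserved. Prompt (app/prompts/study_question_explanation.txt): source priority, citation format, and absolute anti-hallucination rules (no asserting law names, articles, figures, or standard numbers; state "not confirmed in the source material" explicitly). Frontend: - review screen: inline expand after answer submission. Buttons branch on status (ready cached / stale "Previous explanation" + "Regenerate" / failed "Try again") - edit screen: separate card. Status badge + separate "View previous explanation" / "Regenerate" actions - reference-evidence toggle (icon per source_type 📄/❓ + title + snippet). Deferred to follow-up PRs: wrong-answer notes/statistics, AI bulk background generation, handwriting OCR RAG, Premium/Claude regeneration, integration with the /api/search/ask retrieval scope. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-28 08:41:46 +09:00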
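
The cap rules listed in commit 8074be6b6d can be read as a small pure function. The following is a minimal sketch of that reading, not the repository's code: the body of normalize_confidence and in particular its fallback to 'low' for unrecognized input are assumptions, since the log only says that invalid values are normalized first.

```python
# Sketch of the confidence cap described in commit 8074be6b6d.
# Assumption: unrecognized values normalize to 'low' (the log does not say).
_RANK = {"low": 0, "medium": 1, "high": 2}


def normalize_confidence(value) -> str:
    """Map arbitrary model output to low/medium/high (fallback is assumed)."""
    text = str(value).strip().lower()
    return text if text in _RANK else "low"


def calibrate_confidence(c, ctx_docs_count: int, ready_explanation_count: int) -> str:
    """Cap the model's confidence by how much evidence backed the summary."""
    value = normalize_confidence(c)
    if ctx_docs_count >= 1:
        # Documents were available: pass the model value through unchanged.
        return value
    # No documents: cap at 'medium' if ready explanations exist, else 'low'.
    cap = "medium" if ready_explanation_count > 0 else "low"
    # Only ever lower the value; a more conservative model value is kept.
    return value if _RANK[value] <= _RANK[cap] else cap


if __name__ == "__main__":
    print(calibrate_confidence("high", 0, 0))
    print(calibrate_confidence("high", 0, 2))
    print(calibrate_confidence("low", 0, 2))
    print(calibrate_confidence("high", 3, 0))
```

Comparing ranks rather than strings keeps the "never raise a more conservative value" rule in a single expression.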

12 Commits
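
The bucket and stage ratios from commit 9094b2dbc5 suggest a selection loop along the following lines. This is a hedged sketch only: the function name select_questions_sketch, the shape of the candidates argument (bucket name → ordered list of qids), and the use of round() to turn percentages into quotas are all assumptions; the log specifies only the ratios, the dict-order priority, the exclusion of already-picked qids, the random backfill, and the ValueError.

```python
import random

# Ratios copied from the commit message; dict order is the bucket priority.
STAGE_RATIOS = {
    "intro": {"unattempted": 55, "wrong_or_unsure": 30, "frequent": 15},
    "learning": {"due_review": 20, "wrong_or_unsure": 40,
                 "unattempted": 30, "frequent": 10},
    "pre_exam": {"due_review": 20, "wrong_or_unsure": 30, "regressed": 10,
                 "frequent": 20, "random": 20},
}


def select_questions_sketch(stage, size, candidates, all_qids, rng=None):
    """Pick `size` qids by stage ratio (hypothetical helper, not the repo's).

    candidates: dict of bucket name -> list of qids (assumed input shape).
    all_qids:   every eligible qid, used for the random backfill.
    """
    rng = rng or random.Random()
    picked, seen = [], set()
    for bucket, pct in STAGE_RATIOS[stage].items():
        quota = round(size * pct / 100)  # rounding rule is an assumption
        for qid in candidates.get(bucket, []):
            if quota <= 0 or len(picked) >= size:
                break
            if qid in seen:
                continue  # later buckets skip qids already picked
            picked.append(qid)
            seen.add(qid)
            quota -= 1
    if len(picked) < size:
        # Random backfill from whatever has not been picked yet.
        rest = [q for q in dict.fromkeys(all_qids) if q not in seen]
        rng.shuffle(rest)
        picked.extend(rest[: size - len(picked)])
    if len(picked) < size:
        raise ValueError(f"not enough questions: need {size}, got {len(picked)}")
    return picked
```

Passing a seeded random.Random as rng makes the backfill reproducible in tests.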