Files
Hyungi Ahn 6b52d57bac feat(study): Phase 4-A explanation_md 길이 cap + prompt 강화
운영 데이터에서 ready 박힌 풀이가 793/838/866자 — 권장 200~400 대비 큰 편.
1차 운영 후 결과 화면 가독성 + 토큰 사용량 통제 위해 prompt 강화 + 저장 전 cap.

Prompt (study_explanation_envelope.txt):
- explanation_md 권장 300~600자, 최대 900자 명시
- 핵심 개념 + 정답 근거 + 헷갈리는 1~2개 오답만 — 모든 오답 풀이 X
- explanation_md 안 줄바꿈 최소화 (parse_json fix 와 결합 — invalid escape 줄임)
- LaTeX 수식 자제 — \\circ/\\text/\\, 매크로 가능하면 평문 ('0°C', 'C')
- 출력은 raw JSON 한 객체만 — 코드 펜스/thinking/메타 X 강조

Worker (study_explanation_worker.py):
- _cap_explanation_md(text, max_chars=1200) 헬퍼 신규
  · 1200자 이하 passthrough
  · 초과 시 마지막 200자 안에서 \\n\\n / \\n / '. ' / '다.' / '요.' 경계 탐색
  · 경계에서 자르기 + '…' (단어 중간 자르기 회피)
  · 경계 못 찾으면 단순 자르기 + '…'
- save 전 cap 적용. ai_explanation_status='ready' 유지 (cap 됐다고 failed X)
- payload 에 운영 분석 metadata: explanation_len_original / _saved / capped 플래그

검증:
- tests/test_explanation_cap.py (6 케이스)
  · short passthrough / exact at limit / paragraph boundary / sentence boundary
  · no boundary fallback / empty input
- scripts/phase4_health.sql 섹션 8/9 추가
  · ai_explanation 길이 p50/p95/max (study_questions.ready)
  · cap 작동 빈도 (job.payload 의 explanation_capped/_original/_saved)

cap 1200 = 800 (4-B summary_md) 보다 여유 — 기사시험 풀이는 공식+오답+개념 묶이면
800 빡빡함. 운영 후 800~1000 으로 조정 검토.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-02 08:33:18 +09:00

148 lines
5.1 KiB
SQL
Raw Permalink Blame History

This file contains ambiguous Unicode characters
This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.
-- Phase 4 운영 점검 SQL — 4-A (study_question_jobs) + 4-B (study_quiz_session_jobs)
-- 사용:
-- ssh gpu 'docker exec -i hyungi_document_server-postgres-1 psql -U pkm pkm' < scripts/phase4_health.sql
-- 또는 개별 SECTION 만 골라 실행. 모든 섹션은 read-only.
\echo '── 1. 4-A study_question_jobs status × error_code 분포 ──'
SELECT
status,
COALESCE(error_code, '(none)') AS error_code,
COUNT(*) AS cnt
FROM study_question_jobs
GROUP BY status, error_code
ORDER BY status, error_code;
\echo ''
\echo '── 2. 4-B study_quiz_session_jobs status × error_code 분포 ──'
SELECT
status,
COALESCE(error_code, '(none)') AS error_code,
COUNT(*) AS cnt
FROM study_quiz_session_jobs
GROUP BY status, error_code
ORDER BY status, error_code;
\echo ''
\echo '── 3. 4-B study_quiz_session_analysis confidence 분포 (calibrated) ──'
SELECT
COALESCE(confidence, '(null)') AS confidence,
COUNT(*) AS cnt,
COUNT(*) FILTER (WHERE is_stale) AS stale_count
FROM study_quiz_session_analysis
GROUP BY confidence
ORDER BY
CASE COALESCE(confidence, '(null)')
WHEN 'high' THEN 0
WHEN 'medium' THEN 1
WHEN 'low' THEN 2
ELSE 3
END;
\echo ''
\echo '── 4. 4-B confidence calibration 차이 (job.payload 기반) ──'
\echo ' model_confidence_raw vs calibrated_confidence — 자료 부족 cap 작동 빈도 측정'
SELECT
payload->>'model_confidence_raw' AS model_raw,
payload->>'calibrated_confidence' AS calibrated,
(payload->>'ctx_docs_count')::int AS docs_n,
(payload->>'ready_explanation_count')::int AS ready_n,
COUNT(*) AS cnt
FROM study_quiz_session_jobs
WHERE status = 'completed'
AND payload IS NOT NULL
AND payload ? 'model_confidence_raw'
GROUP BY model_raw, calibrated, docs_n, ready_n
ORDER BY cnt DESC
LIMIT 20;
\echo ''
\echo '── 5. 4-A/4-B 최근 7일 처리 지연 (created_at → completed_at) ──'
\echo ' p50/p95/max 단순 ROUND(EXTRACT). 4-A 와 4-B 분리.'
SELECT
'study_question_jobs' AS source,
COUNT(*) AS terminal_n,
ROUND(AVG(EXTRACT(EPOCH FROM (completed_at - created_at)))::numeric, 1) AS avg_sec,
ROUND(MAX(EXTRACT(EPOCH FROM (completed_at - created_at)))::numeric, 1) AS max_sec,
ROUND((PERCENTILE_CONT(0.5) WITHIN GROUP (
ORDER BY EXTRACT(EPOCH FROM (completed_at - created_at))
))::numeric, 1) AS p50_sec,
ROUND((PERCENTILE_CONT(0.95) WITHIN GROUP (
ORDER BY EXTRACT(EPOCH FROM (completed_at - created_at))
))::numeric, 1) AS p95_sec
FROM study_question_jobs
WHERE created_at >= NOW() - INTERVAL '7 days'
AND completed_at IS NOT NULL
UNION ALL
SELECT
'study_quiz_session_jobs',
COUNT(*),
ROUND(AVG(EXTRACT(EPOCH FROM (completed_at - created_at)))::numeric, 1),
ROUND(MAX(EXTRACT(EPOCH FROM (completed_at - created_at)))::numeric, 1),
ROUND((PERCENTILE_CONT(0.5) WITHIN GROUP (
ORDER BY EXTRACT(EPOCH FROM (completed_at - created_at))
))::numeric, 1),
ROUND((PERCENTILE_CONT(0.95) WITHIN GROUP (
ORDER BY EXTRACT(EPOCH FROM (completed_at - created_at))
))::numeric, 1)
FROM study_quiz_session_jobs
WHERE created_at >= NOW() - INTERVAL '7 days'
AND completed_at IS NOT NULL;
\echo ''
\echo '── 6. 4-A/4-B skipped 사유 분포 (어떤 데이터 부족이 가장 많이 막는가) ──'
SELECT
'study_question_jobs' AS source,
error_code,
COUNT(*) AS cnt
FROM study_question_jobs
WHERE status = 'skipped'
GROUP BY error_code
UNION ALL
SELECT
'study_quiz_session_jobs',
error_code,
COUNT(*)
FROM study_quiz_session_jobs
WHERE status = 'skipped'
GROUP BY error_code
ORDER BY source, cnt DESC;
\echo ''
\echo '── 7. 4-B guard_fail / parse_fail / llm_timeout 비율 (전체 job 대비) ──'
SELECT
error_code,
COUNT(*) AS cnt,
ROUND(100.0 * COUNT(*) / NULLIF((SELECT COUNT(*) FROM study_quiz_session_jobs), 0), 1) AS pct
FROM study_quiz_session_jobs
WHERE error_code IN ('guard_fail', 'parse_fail', 'llm_timeout', 'unknown')
GROUP BY error_code
ORDER BY cnt DESC;
\echo ''
\echo '── 8. 4-A explanation_md 길이 분포 (cap 적용 효과) ──'
\echo ' _original = 모델 raw 응답 길이 / _saved = cap 후 저장 길이'
SELECT
'ai_explanation len p50/p95/max (study_questions.ready)' AS metric,
ROUND((PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY LENGTH(ai_explanation)))::numeric, 0) AS p50,
ROUND((PERCENTILE_CONT(0.95) WITHIN GROUP (ORDER BY LENGTH(ai_explanation)))::numeric, 0) AS p95,
MAX(LENGTH(ai_explanation)) AS max_len,
COUNT(*) AS n_ready
FROM study_questions
WHERE ai_explanation_status = 'ready'
AND ai_explanation IS NOT NULL;
\echo ''
\echo '── 9. 4-A cap 작동 빈도 (job.payload 의 explanation_capped) ──'
SELECT
payload->>'explanation_capped' AS capped,
COUNT(*) AS cnt,
ROUND(AVG((payload->>'explanation_len_original')::int)::numeric, 0) AS avg_original_len,
ROUND(AVG((payload->>'explanation_len_saved')::int)::numeric, 0) AS avg_saved_len,
MAX((payload->>'explanation_len_original')::int) AS max_original
FROM study_question_jobs
WHERE status = 'completed'
AND payload IS NOT NULL
AND payload ? 'explanation_len_original'
GROUP BY capped
ORDER BY capped;