6b52d57bac
운영 데이터에서 ready 박힌 풀이가 793/838/866자 — 권장 200~400 대비 큰 편.
1차 운영 후 결과 화면 가독성 + 토큰 사용량 통제 위해 prompt 강화 + 저장 전 cap.
Prompt (study_explanation_envelope.txt):
- explanation_md 권장 300~600자, 최대 900자 명시
- 핵심 개념 + 정답 근거 + 헷갈리는 1~2개 오답만 — 모든 오답 풀이 X
- explanation_md 안 줄바꿈 최소화 (parse_json fix 와 결합 — invalid escape 줄임)
- LaTeX 수식 자제 — \\circ/\\text/\\, 매크로 가능하면 평문 ('0°C', 'C')
- 출력은 raw JSON 한 객체만 — 코드 펜스/thinking/메타 X 강조
Worker (study_explanation_worker.py):
- _cap_explanation_md(text, max_chars=1200) 헬퍼 신규
· 1200자 이하 passthrough
· 초과 시 마지막 200자 안에서 \\n\\n / \\n / '. ' / '다.' / '요.' 경계 탐색
· 경계에서 자르기 + '…' (단어 중간 자르기 회피)
· 경계 못 찾으면 단순 자르기 + '…'
- save 전 cap 적용. ai_explanation_status='ready' 유지 (cap 됐다고 failed X)
- payload 에 운영 분석 metadata: explanation_len_original / _saved / capped 플래그
검증:
- tests/test_explanation_cap.py (6 케이스)
· short passthrough / exact at limit / paragraph boundary / sentence boundary
· no boundary fallback / empty input
- scripts/phase4_health.sql 섹션 8/9 추가
· ai_explanation 길이 p50/p95/max (study_questions.ready)
· cap 작동 빈도 (job.payload 의 explanation_capped/_original/_saved)
cap 1200 = 800 (4-B summary_md) 보다 여유 — 기사시험 풀이는 공식+오답+개념 묶이면
800 빡빡함. 운영 후 800~1000 으로 조정 검토.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
148 lines
5.1 KiB
SQL
148 lines
5.1 KiB
SQL
-- Phase 4 운영 점검 SQL — 4-A (study_question_jobs) + 4-B (study_quiz_session_jobs)
|
||
-- 사용:
|
||
-- ssh gpu 'docker exec -i hyungi_document_server-postgres-1 psql -U pkm pkm' < scripts/phase4_health.sql
|
||
-- 또는 개별 SECTION 만 골라 실행. 모든 섹션은 read-only.
|
||
|
||
\echo '── 1. 4-A study_question_jobs status × error_code 분포 ──'
|
||
SELECT
|
||
status,
|
||
COALESCE(error_code, '(none)') AS error_code,
|
||
COUNT(*) AS cnt
|
||
FROM study_question_jobs
|
||
GROUP BY status, error_code
|
||
ORDER BY status, error_code;
|
||
|
||
\echo ''
|
||
\echo '── 2. 4-B study_quiz_session_jobs status × error_code 분포 ──'
|
||
SELECT
|
||
status,
|
||
COALESCE(error_code, '(none)') AS error_code,
|
||
COUNT(*) AS cnt
|
||
FROM study_quiz_session_jobs
|
||
GROUP BY status, error_code
|
||
ORDER BY status, error_code;
|
||
|
||
\echo ''
|
||
\echo '── 3. 4-B study_quiz_session_analysis confidence 분포 (calibrated) ──'
|
||
SELECT
|
||
COALESCE(confidence, '(null)') AS confidence,
|
||
COUNT(*) AS cnt,
|
||
COUNT(*) FILTER (WHERE is_stale) AS stale_count
|
||
FROM study_quiz_session_analysis
|
||
GROUP BY confidence
|
||
ORDER BY
|
||
CASE COALESCE(confidence, '(null)')
|
||
WHEN 'high' THEN 0
|
||
WHEN 'medium' THEN 1
|
||
WHEN 'low' THEN 2
|
||
ELSE 3
|
||
END;
|
||
|
||
\echo ''
|
||
\echo '── 4. 4-B confidence calibration 차이 (job.payload 기반) ──'
|
||
\echo ' model_confidence_raw vs calibrated_confidence — 자료 부족 cap 작동 빈도 측정'
|
||
SELECT
|
||
payload->>'model_confidence_raw' AS model_raw,
|
||
payload->>'calibrated_confidence' AS calibrated,
|
||
(payload->>'ctx_docs_count')::int AS docs_n,
|
||
(payload->>'ready_explanation_count')::int AS ready_n,
|
||
COUNT(*) AS cnt
|
||
FROM study_quiz_session_jobs
|
||
WHERE status = 'completed'
|
||
AND payload IS NOT NULL
|
||
AND payload ? 'model_confidence_raw'
|
||
GROUP BY model_raw, calibrated, docs_n, ready_n
|
||
ORDER BY cnt DESC
|
||
LIMIT 20;
|
||
|
||
\echo ''
|
||
\echo '── 5. 4-A/4-B 최근 7일 처리 지연 (created_at → completed_at) ──'
|
||
\echo ' p50/p95/max 단순 ROUND(EXTRACT). 4-A 와 4-B 분리.'
|
||
SELECT
|
||
'study_question_jobs' AS source,
|
||
COUNT(*) AS terminal_n,
|
||
ROUND(AVG(EXTRACT(EPOCH FROM (completed_at - created_at)))::numeric, 1) AS avg_sec,
|
||
ROUND(MAX(EXTRACT(EPOCH FROM (completed_at - created_at)))::numeric, 1) AS max_sec,
|
||
ROUND((PERCENTILE_CONT(0.5) WITHIN GROUP (
|
||
ORDER BY EXTRACT(EPOCH FROM (completed_at - created_at))
|
||
))::numeric, 1) AS p50_sec,
|
||
ROUND((PERCENTILE_CONT(0.95) WITHIN GROUP (
|
||
ORDER BY EXTRACT(EPOCH FROM (completed_at - created_at))
|
||
))::numeric, 1) AS p95_sec
|
||
FROM study_question_jobs
|
||
WHERE created_at >= NOW() - INTERVAL '7 days'
|
||
AND completed_at IS NOT NULL
|
||
UNION ALL
|
||
SELECT
|
||
'study_quiz_session_jobs',
|
||
COUNT(*),
|
||
ROUND(AVG(EXTRACT(EPOCH FROM (completed_at - created_at)))::numeric, 1),
|
||
ROUND(MAX(EXTRACT(EPOCH FROM (completed_at - created_at)))::numeric, 1),
|
||
ROUND((PERCENTILE_CONT(0.5) WITHIN GROUP (
|
||
ORDER BY EXTRACT(EPOCH FROM (completed_at - created_at))
|
||
))::numeric, 1),
|
||
ROUND((PERCENTILE_CONT(0.95) WITHIN GROUP (
|
||
ORDER BY EXTRACT(EPOCH FROM (completed_at - created_at))
|
||
))::numeric, 1)
|
||
FROM study_quiz_session_jobs
|
||
WHERE created_at >= NOW() - INTERVAL '7 days'
|
||
AND completed_at IS NOT NULL;
|
||
|
||
\echo ''
|
||
\echo '── 6. 4-A/4-B skipped 사유 분포 (어떤 데이터 부족이 가장 많이 막는가) ──'
|
||
SELECT
|
||
'study_question_jobs' AS source,
|
||
error_code,
|
||
COUNT(*) AS cnt
|
||
FROM study_question_jobs
|
||
WHERE status = 'skipped'
|
||
GROUP BY error_code
|
||
UNION ALL
|
||
SELECT
|
||
'study_quiz_session_jobs',
|
||
error_code,
|
||
COUNT(*)
|
||
FROM study_quiz_session_jobs
|
||
WHERE status = 'skipped'
|
||
GROUP BY error_code
|
||
ORDER BY source, cnt DESC;
|
||
|
||
\echo ''
|
||
\echo '── 7. 4-B guard_fail / parse_fail / llm_timeout 비율 (전체 job 대비) ──'
|
||
SELECT
|
||
error_code,
|
||
COUNT(*) AS cnt,
|
||
ROUND(100.0 * COUNT(*) / NULLIF((SELECT COUNT(*) FROM study_quiz_session_jobs), 0), 1) AS pct
|
||
FROM study_quiz_session_jobs
|
||
WHERE error_code IN ('guard_fail', 'parse_fail', 'llm_timeout', 'unknown')
|
||
GROUP BY error_code
|
||
ORDER BY cnt DESC;
|
||
|
||
\echo ''
|
||
\echo '── 8. 4-A explanation_md 길이 분포 (cap 적용 효과) ──'
|
||
\echo ' _original = 모델 raw 응답 길이 / _saved = cap 후 저장 길이'
|
||
SELECT
|
||
'ai_explanation len p50/p95/max (study_questions.ready)' AS metric,
|
||
ROUND((PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY LENGTH(ai_explanation)))::numeric, 0) AS p50,
|
||
ROUND((PERCENTILE_CONT(0.95) WITHIN GROUP (ORDER BY LENGTH(ai_explanation)))::numeric, 0) AS p95,
|
||
MAX(LENGTH(ai_explanation)) AS max_len,
|
||
COUNT(*) AS n_ready
|
||
FROM study_questions
|
||
WHERE ai_explanation_status = 'ready'
|
||
AND ai_explanation IS NOT NULL;
|
||
|
||
\echo ''
|
||
\echo '── 9. 4-A cap 작동 빈도 (job.payload 의 explanation_capped) ──'
|
||
SELECT
|
||
payload->>'explanation_capped' AS capped,
|
||
COUNT(*) AS cnt,
|
||
ROUND(AVG((payload->>'explanation_len_original')::int)::numeric, 0) AS avg_original_len,
|
||
ROUND(AVG((payload->>'explanation_len_saved')::int)::numeric, 0) AS avg_saved_len,
|
||
MAX((payload->>'explanation_len_original')::int) AS max_original
|
||
FROM study_question_jobs
|
||
WHERE status = 'completed'
|
||
AND payload IS NOT NULL
|
||
AND payload ? 'explanation_len_original'
|
||
GROUP BY capped
|
||
ORDER BY capped;
|