feat(dashboard): Day 4 tuning, threshold re-tuning + deep_summary stability card

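Threshold re-evaluation based on 3 days of telemetry (599 triage / 555 deep):

1. Escalation rate: reframed what the threshold means
   - Before: >20% red (needs tuning) → always red (operating pattern is 97%)
   - New: <80% red (policy-match failures increasing)
   - Message: auxiliary text "95~100% is normal under the safety policy"
   - safety_reference 99.7%, generic 100% (fallback risk_flag), msds 46.2%
     → confirmed as the normal operating pattern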

2. Deep summary stability: new card added
   - Share of mode='summary_deep' rows with error_code IS NOT NULL
   - Currently 5.2% (call_failed 21 + parse:ValidationError 8)
   - >5% is the red threshold
   - Monitors MLX call timeouts / JSON parse failures

3. triage JSON health, Backlog Suppression: thresholds unchanged
   - Currently 0% and 1%, very stable. The conservative thresholds remain valid.
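
The threshold rules above can be sketched in a few lines of Python. This is only an illustration: the helper names `rate`, `escalation_tone`, and `deep_tone` are hypothetical and are not part of this commit; the CSS class strings mirror the ones the dashboard uses. The counts are the 3-day totals quoted in this message, whereas the real card uses a 24h window.

```python
def rate(part: int, total: int) -> float:
    """Return part / total, or 0.0 when there are no events."""
    return part / total if total > 0 else 0.0


def escalation_tone(esc_rate: float) -> str:
    # New rule: red only when the escalation rate drops below 80%.
    return "text-error" if esc_rate < 0.80 else "text-text"


def deep_tone(deep_err_rate: float) -> str:
    # Red when more than 5% of deep summary calls carry an error_code.
    return "text-error" if deep_err_rate > 0.05 else "text-text"


# Figures from this message: 21 call_failed + 8 parse errors out of 555 deep calls.
deep_err_rate = rate(21 + 8, 555)
print(f"{deep_err_rate * 100:.1f}%", deep_tone(deep_err_rate))

# A 97% escalation rate is treated as normal; 79% would be flagged.
print(escalation_tone(0.97), escalation_tone(0.79))
```

With these inputs the deep summary error rate already sits just above the 5% threshold, so the new card would start out red.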

Backend: add deep_total / deep_err_total to TierHealthStack
Frontend: card grid 3 columns → 4 columns (lg), new Day 4 card.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Hyungi Ahn
2026-04-27 08:29:53 +09:00
parent 95bcdb851b
commit e8c348ab21
3 changed files with 43 additions and 9 deletions
+15 -5
@@ -50,12 +50,15 @@ class QueueLag(BaseModel):
 class TierHealthStack(BaseModel):
-    """PR-B B-3: tier observability, source for the 3 cards (24h window).
+    """PR-B B-3: tier observability, source for the dashboard cards (24h window).

-    Dashboard cards:
-      - "Escalation rate": escalated_total / triage_total (>20% red, <1% gray)
+    Dashboard cards (Day 4 tuning: thresholds re-tuned 2026-04-27):
+      - "Escalation rate": escalated_total / triage_total
+          · <80% red (policy-match failures increasing; genuinely needs tuning)
+          · 80~99% normal (intended by the safety/health policy)
       - "triage JSON health": triage_json_invalid / triage_total (>5% red)
       - "Backlog Suppression": suppressed_total / triage_total (>10% orange)
+      - "Deep summary stability": deep_err_total / deep_total (>5% red)
     """
     triage_total: int = 0
     escalated_total: int = 0
@@ -63,6 +66,9 @@ class TierHealthStack(BaseModel):
     escalation_by_domain: dict[str, int] = {}  # safety_reference / news_item / ...
     triage_json_invalid: int = 0  # error_code='triage_json_invalid'
     suppressed_total: int = 0  # suppressed_reason IS NOT NULL
+    # New in Day 4 tuning: deep_summary call stability
+    deep_total: int = 0  # all rows with mode='summary_deep'
+    deep_err_total: int = 0  # error_code IS NOT NULL (call_failed / parse:*)


 class DashboardResponse(BaseModel):
@@ -216,13 +222,15 @@ async def get_dashboard(
         for row in lag_result.all()
     ]

-    # ─── PR-B B-3: tier observability (24h) ───
+    # ─── PR-B B-3: tier observability (24h) + Day 4 deep_err added ───
     tier_rows = (await session.execute(text("""
         SELECT
           COUNT(*) FILTER (WHERE mode = 'summary_triage') AS triage_total,
           COUNT(*) FILTER (WHERE mode = 'summary_triage' AND escalated_to_26b = true) AS escalated_total,
           COUNT(*) FILTER (WHERE mode = 'summary_triage' AND error_code = 'triage_json_invalid') AS json_invalid,
-          COUNT(*) FILTER (WHERE mode = 'summary_triage' AND suppressed_reason IS NOT NULL) AS suppressed_total
+          COUNT(*) FILTER (WHERE mode = 'summary_triage' AND suppressed_reason IS NOT NULL) AS suppressed_total,
+          COUNT(*) FILTER (WHERE mode = 'summary_deep') AS deep_total,
+          COUNT(*) FILTER (WHERE mode = 'summary_deep' AND error_code IS NOT NULL) AS deep_err_total
         FROM analyze_events
         WHERE created_at > NOW() - INTERVAL '24 hours'
     """))).one()
@@ -253,6 +261,8 @@ async def get_dashboard(
         escalated_total=int(tier_rows.escalated_total or 0),
         triage_json_invalid=int(tier_rows.json_invalid or 0),
         suppressed_total=int(tier_rows.suppressed_total or 0),
+        deep_total=int(tier_rows.deep_total or 0),
+        deep_err_total=int(tier_rows.deep_err_total or 0),
         escalation_by_reason=escalation_by_reason,
         escalation_by_domain=escalation_by_domain,
     )
+3
@@ -41,6 +41,9 @@ export interface TierHealthStack {
   escalation_by_domain: Record<string, number>;
   triage_json_invalid: number;
   suppressed_total: number;
+  // New in Day 4: deep_summary call stability
+  deep_total?: number;
+  deep_err_total?: number;
 }

 export interface DashboardSummary {
+25 -4
@@ -357,10 +357,15 @@
   {@const esc_rate = th.triage_total > 0 ? th.escalated_total / th.triage_total : 0}
   {@const json_rate = th.triage_total > 0 ? th.triage_json_invalid / th.triage_total : 0}
   {@const sup_rate = th.triage_total > 0 ? th.suppressed_total / th.triage_total : 0}
-  {@const esc_tone = esc_rate > 0.20 ? 'text-error' : (esc_rate < 0.01 ? 'text-dim' : 'text-text')}
+  {@const deep_total = th.deep_total ?? 0}
+  {@const deep_err_rate = deep_total > 0 ? (th.deep_err_total ?? 0) / deep_total : 0}
+  <!-- Day 4 tuning (2026-04-27): thresholds re-tuned after measuring real operating patterns.
+       Per 3 days of telemetry, 97% escalation is normal (intended by the safety policy), so <80% is the real signal. -->
+  {@const esc_tone = esc_rate < 0.80 ? 'text-error' : 'text-text'}
   {@const json_tone = json_rate > 0.05 ? 'text-error' : 'text-text'}
   {@const sup_tone = sup_rate > 0.10 ? 'text-warning' : 'text-text'}
-  <div class="grid grid-cols-1 md:grid-cols-3 gap-4 mb-5">
+  {@const deep_tone = deep_err_rate > 0.05 ? 'text-error' : 'text-text'}
+  <div class="grid grid-cols-1 md:grid-cols-2 lg:grid-cols-4 gap-4 mb-5">
     <!-- Escalation rate -->
     <Card class="h-full">
       <div class="flex items-start justify-between">
@@ -372,9 +377,9 @@
       </p>
       <p class="text-xs text-dim mt-1">
         {th.escalated_total} / {th.triage_total}
-        {#if esc_rate > 0.20}<span class="text-error ml-1">(needs tuning)</span>{/if}
-        {#if esc_rate < 0.01}<span class="text-dim ml-1">(false negative?)</span>{/if}
+        {#if esc_rate < 0.80}<span class="text-error ml-1">(match failures increasing)</span>{/if}
       </p>
+      <p class="text-[10px] text-faint mt-1">95~100% is normal under the safety policy</p>
       {#if Object.keys(th.escalation_by_reason).length > 0}
         <div class="mt-2 flex flex-wrap gap-1">
           {#each Object.entries(th.escalation_by_reason).slice(0, 4) as [reason, n]}
@@ -417,6 +422,22 @@
       </p>
       <p class="text-[10px] text-faint mt-1">Above 10%, adjust the ratio/pending thresholds</p>
     </Card>
+    <!-- Deep summary stability (new in Day 4) -->
+    <Card class="h-full">
+      <div class="flex items-start justify-between">
+        <p class="text-sm text-dim">Deep summary stability (24h)</p>
+        <Sparkles size={18} class="text-faint" />
+      </div>
+      <p class="text-2xl font-bold mt-2 {deep_tone}">
+        {(deep_err_rate * 100).toFixed(1)}%
+      </p>
+      <p class="text-xs text-dim mt-1">
+        failed {th.deep_err_total ?? 0} / {deep_total}
+        {#if deep_err_rate > 0.05}<span class="text-error ml-1">(check MLX stability)</span>{/if}
+      </p>
+      <p class="text-[10px] text-faint mt-1">Sum of call_failed / parse:*; investigate above 5%</p>
+    </Card>
   </div>
 {/if}