hyungi_document_server/migrations/102_ask_events.sql
Hyungi Ahn b2306c3afd feat(ask): Phase 3.5b guardrails - verifier + telemetry + stronger grounding
Adds four items on top of Phase 3.5a (classifier + refusal gate + grounding):

Item 0: ask_events telemetry wiring
- AskEvent ORM model + record_ask_event(): completes the ask_events INSERT path
- Store input_snapshot (query, chunks, answer) in defense_layers
- Call telemetry on both the refused and normal paths
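
A minimal sketch of how the telemetry row for Item 0 might be assembled, assuming the column names from the ask_events migration. The helper name build_ask_event_row and its parameters are hypothetical; the actual record_ask_event() signature is not shown in this commit.

```python
import json


def build_ask_event_row(query, chunks, answer, *, refused, layer_flags, total_ms, user_id=None):
    """Build a dict of ask_events column values (hypothetical helper, not the real API)."""
    # Copy the per-layer flags so the caller's dict is not mutated.
    defense_layers = dict(layer_flags)
    # Item 0 stores the inputs next to the flags for later debugging.
    defense_layers["input_snapshot"] = {"query": query, "chunks": chunks, "answer": answer}
    return {
        "query": query,
        "user_id": user_id,
        "refused": refused,
        # ensure_ascii=False keeps Korean text readable in the serialized JSON.
        "defense_layers": json.dumps(defense_layers, ensure_ascii=False),
        "total_ms": total_ms,
    }
```

Because both the refused and the normal path call telemetry, a helper like this would be invoked once per /ask request with `refused` set accordingly.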

Item 3: numeric conflict detection across evidence
- Same unit, different numbers → weak flag
- Threshold expressions ("이상/이하/초과/미만": at least / at most / more than / less than) → skip (avoids false positives)
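
The rule in Item 3 can be sketched roughly as follows. This is an illustration under assumptions, not the committed implementation: the regex, the unit list, and the function name detect_numeric_conflicts are hypothetical.

```python
import re
from collections import defaultdict

# Hypothetical pattern: a number followed by one of a few units.
NUM_UNIT = re.compile(r"(\d+(?:\.\d+)?)\s*(%|건|원)")
THRESHOLD_WORDS = ("이상", "이하", "초과", "미만")


def detect_numeric_conflicts(evidence_texts):
    """Return weak flags for units that appear with different numbers across evidence."""
    values_by_unit = defaultdict(set)
    for text in evidence_texts:
        for m in NUM_UNIT.finditer(text):
            # Skip threshold expressions such as "10% 이상" to avoid false positives.
            tail = text[m.end():m.end() + 4].strip()
            if tail.startswith(THRESHOLD_WORDS):
                continue
            values_by_unit[m.group(2)].add(float(m.group(1)))
    return [
        {"flag": "numeric_conflict", "severity": "weak", "unit": unit, "values": sorted(vals)}
        for unit, vals in values_by_unit.items()
        if len(vals) > 1
    ]
```

Grouping by unit alone is deliberately coarse, which is consistent with the flag being only weak: two evidence chunks may legitimately cite different percentages for different things.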

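Item 4: improved fabricated_number normalization
- Add unit suffixes 건/원 (count / won); extract both ends of range expressions (10~20%)
- Bare numbers only when they have 2+ digits (removes single-digit false positives)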
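
A rough sketch of the normalization in Item 4, assuming the check compares numbers in the answer against numbers in the evidence. The function names, the regexes, and the set-difference comparison are assumptions for illustration; bare decimals are ignored here for simplicity.

```python
import re

NUMBER = r"\d+(?:\.\d+)?"
UNIT = r"(%|건|원)"
# A range such as "10~20%" shares one unit across both endpoints.
RANGE = re.compile(rf"({NUMBER})\s*~\s*({NUMBER})\s*{UNIT}")
WITH_UNIT = re.compile(rf"({NUMBER})\s*{UNIT}")
# Bare integers with at least two digits, not part of a longer number or decimal.
BARE = re.compile(r"(?<![\d.])\d{2,}(?![\d.])")


def extract_numbers(text):
    """Return normalized number tokens such as '10%', '3건', or '45'."""
    found = set()

    def take_range(m):
        lo, hi, unit = m.groups()
        found.update({lo + unit, hi + unit})
        return " "  # blank out the match so later passes do not re-read it

    def take_unit(m):
        found.add(m.group(1) + m.group(2))
        return " "

    rest = RANGE.sub(take_range, text)
    rest = WITH_UNIT.sub(take_unit, rest)
    found.update(BARE.findall(rest))
    return found


def fabricated_numbers(answer, evidence_texts):
    """Numbers that appear in the answer but in none of the evidence texts."""
    supported = set()
    for text in evidence_texts:
        supported |= extract_numbers(text)
    return extract_numbers(answer) - supported
```

Extracting both endpoints of a range means an answer that quotes "20%" is treated as supported by evidence that says "10~20%".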

Item 1: exaone semantic verifier (wired in with its decision authority locked)
- verifier_service.py: 3s timeout, circuit breaker, three severity levels
- Only direct_negation is strong; numeric/intent → medium; everything else → weak
- A verifier strong alone must not trigger a refusal; it must be cross-checked against grounding
- 6-tier re-gate (finalized after four review rounds)
- grounding strong 2+ OR max_score<0.2 → skip the verifier
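
The gating rules in Item 1 might look roughly like the sketch below. Only the 3s timeout, the severity mapping, and the skip condition come from the commit message. The breaker thresholds, all function and class names, and the reading of "cross-checked against grounding" as "at least one grounding strong flag" are assumptions. The 6-tier re-gate is not reproduced because its tiers are not described here.

```python
import asyncio
import time

SEVERITY_BY_ISSUE = {"direct_negation": "strong", "numeric": "medium", "intent": "medium"}


def verifier_severity(issue_type):
    """Only direct_negation is strong; numeric/intent are medium; the rest are weak."""
    return SEVERITY_BY_ISSUE.get(issue_type, "weak")


def should_skip_verifier(grounding_strong_count, max_score):
    """Skip when grounding already has 2+ strong flags or the best score is below 0.2."""
    return grounding_strong_count >= 2 or max_score < 0.2


def verifier_may_refuse(issue_types, grounding_strong_count):
    """Assumed cross-check: a verifier strong needs at least one grounding strong flag."""
    has_strong = any(verifier_severity(t) == "strong" for t in issue_types)
    return has_strong and grounding_strong_count >= 1


class CircuitBreaker:
    """Hypothetical breaker: opens after max_failures, allows calls again after cooldown_s."""

    def __init__(self, max_failures=3, cooldown_s=30.0):
        self.max_failures = max_failures
        self.cooldown_s = cooldown_s
        self.failures = 0
        self.opened_at = 0.0

    def allow(self):
        if self.failures < self.max_failures:
            return True
        if time.monotonic() - self.opened_at >= self.cooldown_s:
            self.failures = 0  # cooldown elapsed: reset the counter and allow calls
            return True
        return False

    def record(self, ok):
        if ok:
            self.failures = 0
            return
        self.failures += 1
        if self.failures >= self.max_failures:
            self.opened_at = time.monotonic()


async def call_verifier(call, breaker, timeout_s=3.0):
    """Return the verifier result, or None when the breaker is open or the call fails."""
    if not breaker.allow():
        return None
    try:
        result = await asyncio.wait_for(call(), timeout=timeout_s)
    except Exception:  # includes the timeout raised by asyncio.wait_for
        breaker.record(False)
        return None
    breaker.record(True)
    return result
```

Returning None on timeout or an open breaker keeps the verifier advisory: the caller can fall back to the grounding result instead of blocking the /ask response.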

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-10 09:49:56 +09:00


-- Phase 3.5a: observability table for /ask calls
-- Used to measure refusal rate, split the metric three ways (full/partial/insufficient), and debug defense layers
CREATE TABLE IF NOT EXISTS ask_events (
    id BIGSERIAL PRIMARY KEY,
    query TEXT NOT NULL,
    user_id BIGINT REFERENCES users(id),
    completeness TEXT, -- full / partial / insufficient
    synthesis_status TEXT,
    confidence TEXT,
    refused BOOLEAN DEFAULT false,
    classifier_verdict TEXT, -- sufficient / insufficient / null (skipped)
    max_rerank_score REAL,
    aggregate_score REAL,
    hallucination_flags JSONB DEFAULT '[]',
    evidence_count INT,
    citation_count INT,
    defense_layers JSONB, -- per-layer flag snapshot (score_gate, classifier, grounding)
    total_ms INT,
    created_at TIMESTAMPTZ DEFAULT now()
);
CREATE INDEX IF NOT EXISTS idx_ask_events_created ON ask_events(created_at);
CREATE INDEX IF NOT EXISTS idx_ask_events_completeness ON ask_events(completeness);