Add 4 items on top of Phase 3.5a (classifier + refusal gate + grounding):

Item 0: wire up ask_events telemetry
- AskEvent ORM model + record_ask_event(): completes the ask_events INSERT
- store input_snapshot (query, chunks, answer) in defense_layers
- call telemetry on both paths, refused and normal

Item 3: numeric conflict detection across evidence
- same unit, different number → weak flag
- threshold expressions ("이상/이하/초과/미만", i.e. at least / at most / more than / less than) → skip (prevents false positives)

Item 4: improve fabricated_number normalization
- add the unit suffixes 건 (count of cases) and 원 (won); extract both ends of range expressions (10~20%)
- bare numbers: 2+ digits only (removes single-digit false positives)

Item 1: exaone semantic verifier (wired with its decision authority locked)
- verifier_service.py: 3 s timeout, circuit breaker, 3 severity levels
- only direct_negation → strong; numeric/intent → medium; everything else → weak
- a verifier strong flag alone may not trigger a refusal; it must be cross-checked against grounding
- 6-tier re-gate (finalized after 4 review rounds)
- grounding strong 2+ OR max_score < 0.2 → skip the verifier

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
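A minimal sketch of the Item 3 rule, assuming evidence arrives as plain-text chunks and that the units of interest are `%`, `건` and `원` (the latter two being the suffixes named in Item 4). `extract_quantities`, `detect_numeric_conflicts` and `NumericConflict` are hypothetical names, not the project's code. Quantities followed by a threshold word are dropped before comparison, and a unit is flagged (weak only) when different values for it come from different chunks.

```python
import re
from collections import defaultdict
from dataclasses import dataclass

# Assumed unit set: "%" plus the 건/원 suffixes named in Item 4.
# Group 3 captures a trailing threshold word (이상/이하/초과/미만), if present.
QUANTITY_RE = re.compile(r"(\d[\d,]*(?:\.\d+)?)\s*(%|건|원)\s*(이상|이하|초과|미만)?")


@dataclass(frozen=True)
class NumericConflict:
    unit: str
    values: tuple[float, ...]
    severity: str = "weak"  # a numeric conflict is only ever a weak flag


def extract_quantities(text: str) -> list[tuple[float, str]]:
    out = []
    for m in QUANTITY_RE.finditer(text):
        if m.group(3):  # "10% 이상" states a threshold, not a fact: skip to avoid false positives
            continue
        out.append((float(m.group(1).replace(",", "")), m.group(2)))
    return out


def detect_numeric_conflicts(chunks: list[str]) -> list[NumericConflict]:
    # unit -> value -> indices of the chunks that state that value
    seen: dict[str, dict[float, set[int]]] = defaultdict(lambda: defaultdict(set))
    for idx, chunk in enumerate(chunks):
        for value, unit in extract_quantities(chunk):
            seen[unit][value].add(idx)
    conflicts = []
    for unit, by_value in seen.items():
        sources = set().union(*by_value.values())
        # Flag only when different values for the same unit come from different chunks.
        if len(by_value) > 1 and len(sources) > 1:
            conflicts.append(NumericConflict(unit, tuple(sorted(by_value))))
    return conflicts
```

Grouping by unit alone is coarse: two chunks citing unrelated percentages are also flagged, which fits keeping a conflict at weak severity.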
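The Item 4 normalization rules could look roughly like this. It is a sketch under assumptions: numbers are compared as canonical strings (thousands separators and trailing decimal zeros removed), a unit suffix stays attached so that `300건` and `300원` remain distinct, and `extract_numbers` / `fabricated_numbers` are hypothetical names.

```python
import re

# Assumed suffix set: "%" plus the 건 (count) and 원 (won) suffixes added in Item 4.
UNIT = r"(?:%|건|원)"
NUM = r"\d[\d,]*(?:\.\d+)?"
RANGE_RE = re.compile(rf"({NUM})\s*{UNIT}?\s*~\s*({NUM})\s*({UNIT})")  # 10~20%, 10%~20%
UNIT_RE = re.compile(rf"({NUM})\s*({UNIT})")  # 300건, 1,000원
BARE_RE = re.compile(NUM)


def canon(num: str) -> str:
    """Drop thousands separators and trailing decimal zeros: '1,000.50' -> '1000.5'."""
    num = num.replace(",", "")
    if "." in num:
        num = num.rstrip("0").rstrip(".")
    return num


def extract_numbers(text: str) -> set[str]:
    found: set[str] = set()

    def take_range(m: re.Match) -> str:
        # Keep both ends of the range, each tagged with the trailing unit.
        found.add(canon(m.group(1)) + m.group(3))
        found.add(canon(m.group(2)) + m.group(3))
        return " "  # blank the span so the later passes skip it

    def take_unit(m: re.Match) -> str:
        found.add(canon(m.group(1)) + m.group(2))
        return " "

    text = RANGE_RE.sub(take_range, text)
    text = UNIT_RE.sub(take_unit, text)
    for m in BARE_RE.finditer(text):
        num = canon(m.group(0))
        if sum(ch.isdigit() for ch in num) >= 2:  # single-digit bare numbers are too noisy
            found.add(num)
    return found


def fabricated_numbers(answer: str, evidence: list[str]) -> set[str]:
    """Numbers in the answer that no evidence chunk contains."""
    known: set[str] = set().union(*(extract_numbers(chunk) for chunk in evidence))
    return extract_numbers(answer) - known
```

Matching is literal, so an answer value that merely falls inside an evidence range (15% against 10~20%) is still reported; the commit does not say how the real check treats that case.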
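The Item 1 rules that are spelled out can be written as three small predicates. The helper names and the reading of "cross-checked against grounding" (here: at least one grounding strong flag) are assumptions, and the 6-tier re-gate itself is not modeled.

```python
# Hypothetical helper names; the thresholds and mappings follow the Item 1 bullets.

def verifier_severity(category: str) -> str:
    """Map a verifier finding category to one of the three severity levels."""
    if category == "direct_negation":
        return "strong"
    if category in ("numeric", "intent"):
        return "medium"
    return "weak"


def should_skip_verifier(grounding_strong: int, max_score: float) -> bool:
    """Skip the verifier when grounding already has 2+ strong flags or max_score < 0.2."""
    return grounding_strong >= 2 or max_score < 0.2


def verifier_supports_refusal(verifier_categories: list[str], grounding_strong: int) -> bool:
    """A verifier 'strong' never refuses on its own; assumed to need a grounding strong flag too."""
    has_strong = any(verifier_severity(c) == "strong" for c in verifier_categories)
    return has_strong and grounding_strong >= 1
```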
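For the 3 s timeout and circuit breaker in verifier_service.py, a standard-library sketch follows. The failure threshold (3), cooldown (30 s), pool size and half-open behaviour are assumptions rather than values from the commit; returning `None` is meant to let the caller take the same path as a skipped verifier.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable, Optional


class CircuitBreaker:
    """Opens after `max_failures` consecutive failures; after `cooldown_s` it lets probe calls through."""

    def __init__(self, max_failures: int = 3, cooldown_s: float = 30.0) -> None:
        self.max_failures = max_failures
        self.cooldown_s = cooldown_s
        self.failures = 0
        self.opened_at: Optional[float] = None

    def allow(self) -> bool:
        if self.opened_at is None:
            return True  # closed
        # Half-open once the cooldown has elapsed; still open before that.
        return time.monotonic() - self.opened_at >= self.cooldown_s

    def record(self, ok: bool) -> None:
        if ok:
            self.failures, self.opened_at = 0, None  # close the breaker
            return
        self.failures += 1
        if self.failures >= self.max_failures:
            self.opened_at = time.monotonic()  # open, or re-open after a failed probe


_POOL = ThreadPoolExecutor(max_workers=4)


def call_verifier(fn: Callable[[], Any], breaker: CircuitBreaker, timeout_s: float = 3.0) -> Optional[Any]:
    """Run the verifier with a timeout; None means no verdict, i.e. treat the verifier as skipped."""
    if not breaker.allow():
        return None
    future = _POOL.submit(fn)
    try:
        result = future.result(timeout=timeout_s)
    except Exception:  # timeout (concurrent.futures.TimeoutError) or an error raised by fn
        breaker.record(ok=False)
        return None
    breaker.record(ok=True)
    return result
```

A timed-out call keeps running in its worker thread, since Python threads cannot be cancelled; the breaker only stops new calls from queuing behind a slow model.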
-- Phase 3.5a: observability table for /ask calls
-- Measures refusal rate, breaks metric 3 down by completeness (full/partial/insufficient), and supports defense-layer debugging

CREATE TABLE IF NOT EXISTS ask_events (
    id BIGSERIAL PRIMARY KEY,
    query TEXT NOT NULL,
    user_id BIGINT REFERENCES users(id),
    completeness TEXT, -- full / partial / insufficient
    synthesis_status TEXT,
    confidence TEXT,
    refused BOOLEAN DEFAULT false,
    classifier_verdict TEXT, -- sufficient / insufficient / null (skipped)
    max_rerank_score REAL,
    aggregate_score REAL,
    hallucination_flags JSONB DEFAULT '[]',
    evidence_count INT,
    citation_count INT,
    defense_layers JSONB, -- per-layer flag snapshot (score_gate, classifier, grounding)
    total_ms INT,
    created_at TIMESTAMPTZ DEFAULT now()
);

CREATE INDEX IF NOT EXISTS idx_ask_events_created ON ask_events(created_at);
CREATE INDEX IF NOT EXISTS idx_ask_events_completeness ON ask_events(completeness);
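A DB-API style sketch of `record_ask_event()` against the ask_events table, written without the ORM so it stays self-contained. `execute` stands for any `cursor.execute`-like callable using `%s` placeholders (as psycopg does); the keyword arguments are hypothetical, and swallowing insert errors so that telemetry never breaks `/ask` is an assumed policy. As in Item 0, `input_snapshot` is stored inside `defense_layers`.

```python
import json
import logging
from typing import Any, Callable, Optional, Sequence

log = logging.getLogger("ask_events")

# Column list mirrors the ask_events table; id and created_at are left to their defaults.
INSERT_SQL = (
    "INSERT INTO ask_events (query, user_id, completeness, synthesis_status, confidence, "
    "refused, classifier_verdict, max_rerank_score, aggregate_score, hallucination_flags, "
    "evidence_count, citation_count, defense_layers, total_ms) "
    "VALUES (%s, %s, %s, %s, %s, %s, %s, %s, %s, %s::jsonb, %s, %s, %s::jsonb, %s)"
)


def record_ask_event(
    execute: Callable[[str, Sequence[Any]], Any],
    *,
    query: str,
    refused: bool,
    defense_layers: dict[str, Any],
    input_snapshot: dict[str, Any],
    user_id: Optional[int] = None,
    completeness: Optional[str] = None,
    synthesis_status: Optional[str] = None,
    confidence: Optional[str] = None,
    classifier_verdict: Optional[str] = None,
    max_rerank_score: Optional[float] = None,
    aggregate_score: Optional[float] = None,
    hallucination_flags: Sequence[Any] = (),
    evidence_count: Optional[int] = None,
    citation_count: Optional[int] = None,
    total_ms: Optional[int] = None,
) -> bool:
    """Insert one ask_events row; meant to be called on both the refused and the normal path."""
    # input_snapshot (query, chunks, answer) is stored inside the defense_layers JSON.
    layers = {**defense_layers, "input_snapshot": input_snapshot}
    params = (
        query, user_id, completeness, synthesis_status, confidence, refused,
        classifier_verdict, max_rerank_score, aggregate_score,
        json.dumps(list(hallucination_flags), ensure_ascii=False),
        evidence_count, citation_count,
        json.dumps(layers, ensure_ascii=False), total_ms,
    )
    try:
        execute(INSERT_SQL, params)
        return True
    except Exception:
        # Assumed policy: telemetry failures are logged, never surfaced to the /ask caller.
        log.exception("ask_events insert failed")
        return False
```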
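To read refusal rate and the completeness split back out of `ask_events`, a grouped query is enough. The sketch runs it against an in-memory SQLite table reduced to the two columns it reads; the `CASE` form is used because it works on both SQLite and PostgreSQL. Labeling rows without a completeness value as `unknown` is an assumption.

```python
import sqlite3

# Per-completeness volume and refusal rate; CASE keeps it portable across SQLite and PostgreSQL.
METRICS_SQL = """
SELECT COALESCE(completeness, 'unknown') AS bucket,
       COUNT(*) AS n,
       AVG(CASE WHEN refused THEN 1.0 ELSE 0.0 END) AS refusal_rate
FROM ask_events
GROUP BY COALESCE(completeness, 'unknown')
ORDER BY bucket
"""


def refusal_metrics(conn: sqlite3.Connection) -> dict[str, tuple[int, float]]:
    """Map each completeness bucket to (row count, refusal rate)."""
    return {bucket: (n, rate) for bucket, n, rate in conn.execute(METRICS_SQL)}


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    # Reduced stand-in for the real table: only the two columns the query reads.
    conn.execute("CREATE TABLE ask_events (completeness TEXT, refused BOOLEAN)")
    conn.executemany(
        "INSERT INTO ask_events VALUES (?, ?)",
        [("full", False), ("full", False), ("partial", True), ("insufficient", True), (None, False)],
    )
    print(refusal_metrics(conn))
```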