feat(ask): Phase 3.5a guardrails (classifier + refusal gate + grounding + partial)
New files:
- classifier_service.py: exaone binary classifier (sufficient/insufficient), run in parallel with evidence; circuit breaker; 5 s timeout
- refusal_gate.py: multi-signal fusion (score + classifier) under an AND condition; conservative 3-tier fallback when the classifier is unavailable
- grounding_check.py: separates strong and weak flags
    strong: fabricated_number + intent_misalignment (important keywords)
    weak: uncited_claim + low_overlap + intent_misalignment (generic)
    re-gate: 2+ strong → refuse, 1 strong → partial
- sentence_splitter.py: regex-based (KSS upgrade planned for Phase 3.5b)
- classifier.txt: exaone Y+ prompt (includes calibration examples)
- search_synthesis_partial.txt: prompt dedicated to partial answers
- 102_ask_events.sql: /ask observability table (completeness metrics split three ways)
- queries.yaml: Phase 3.5 smoke-test evaluation set (10 queries)

Modified files:
- search.py /ask: classifier in parallel + refusal gate + grounding re-gate + defense_layers logging + AskResponse completeness/aspects/confirmed_items
- config.yaml: classifier model section (exaone3.5:7.8b GPU Ollama)
- config.py: optional parsing of the classifier section
- AskAnswer.svelte: 4-branch render (full/partial/insufficient/loading)
- ask.ts: Completeness + ConfirmedItem types

P1 measurement: exaone's ternary output was unstable, so the gate was reduced to binary. partial is handled by grounding. Finalized after 9 rounds of discussion.

plan: quiet-meandering-nova.md

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
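The grounding re-gate rule stated in the commit message (2+ strong flags → refuse, 1 strong flag → partial) can be sketched as below. This is a minimal illustration, not the code in grounding_check.py: the function name `regate`, the `_important`/`_generic` suffixes used to tell the two intent_misalignment variants apart, and the choice to leave the verdict unchanged when only weak flags fire are all assumptions.

```python
# Flag names come from the commit message. The suffixes that distinguish the
# two intent_misalignment variants are hypothetical labels for this sketch.
STRONG_FLAGS = {"fabricated_number", "intent_misalignment_important"}
WEAK_FLAGS = {"uncited_claim", "low_overlap", "intent_misalignment_generic"}


def regate(flags: list[str], current: str = "full") -> str:
    """Downgrade an answer based on how many strong grounding flags fired.

    Assumption: each occurrence of a strong flag counts once, and weak flags
    alone never change the verdict (they would only be logged).
    """
    strong = sum(1 for flag in flags if flag in STRONG_FLAGS)
    if strong >= 2:
        return "refuse"
    if strong == 1:
        return "partial"
    return current


if __name__ == "__main__":
    print(regate(["fabricated_number", "uncited_claim"]))
    print(regate(["uncited_claim", "low_overlap"]))
```

Keeping weak flags out of the decision mirrors the strong/weak separation described above: only signals that indicate fabricated or off-intent content can downgrade an answer.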
33
app/prompts/classifier.txt
Normal file
@@ -0,0 +1,33 @@
You are an answerability judge. Given a query and evidence chunks, determine if the evidence can answer the query. Respond ONLY in JSON.

## CALIBRATION (CRITICAL)
- verdict=full: evidence is SUFFICIENT to answer the CORE of the query. Missing minor details does NOT make it insufficient.
- verdict=partial: evidence covers SOME major aspects but CLEARLY MISSES others the user explicitly asked about.
- verdict=insufficient: evidence has NO relevant information for the query, or is completely off-topic.

Example: Query="제6장 주요 내용", Evidence covers 제6장 definition+scope → verdict=full (core is covered).
Example: Query="제6장 처벌 조항", Evidence covers 제6장 definition but NOT 처벌 → verdict=partial.
Example: Query="감귤 출하량", Evidence about 산업안전보건법 → verdict=insufficient.

## Rules
1. Your "verdict" must be based ONLY on whether the CONTENT semantically answers the query. Ignore retrieval scores for this field.
2. "covered_aspects": query aspects that evidence covers. Korean labels for Korean queries.
3. "missing_aspects": query aspects that evidence does NOT cover. Korean labels.
4. Keep aspects concise (2-5 words each), non-overlapping.

## Output Schema
{
  "verdict": "full" | "partial" | "insufficient",
  "covered_aspects": ["aspect1"],
  "missing_aspects": ["aspect2"],
  "confidence": "high" | "medium" | "low"
}

## Query
{query}

## Evidence chunks:
{chunks}

## Retrieval scores (for reference only, NOT for verdict):
[{scores}]
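One practical detail of this template is that it mixes `{query}`-style placeholders with literal JSON braces in the Output Schema, so `str.format` would fail on it. The sketch below fills only the three named placeholders in a single pass and then parses the model's reply defensively. The function names, the `[i]` chunk numbering, and the collapse of `partial` into "sufficient" for the binary gate are assumptions made for illustration, not the behavior of classifier_service.py.

```python
import json
import re
from typing import Optional

VALID_VERDICTS = {"full", "partial", "insufficient"}
PLACEHOLDER = re.compile(r"\{(query|chunks|scores)\}")


def render_prompt(template: str, query: str, chunks: list[str], scores: list[float]) -> str:
    """Fill only the known placeholders; literal JSON braces stay untouched."""
    values = {
        "query": query,
        # Assumption: chunks are shown as a numbered list starting at 1.
        "chunks": "\n".join(f"[{i}] {c}" for i, c in enumerate(chunks, start=1)),
        "scores": ", ".join(f"{s:.2f}" for s in scores),
    }
    # A function replacement avoids re-scanning inserted text and avoids
    # backslash escapes in the replacement being interpreted by re.sub.
    return PLACEHOLDER.sub(lambda m: values[m.group(1)], template)


def _as_list(value) -> list:
    return value if isinstance(value, list) else []


def parse_verdict(raw: str) -> Optional[dict]:
    """Return a normalized verdict dict, or None if the reply is unusable."""
    start, end = raw.find("{"), raw.rfind("}")
    if start == -1 or end <= start:
        return None
    try:
        data = json.loads(raw[start:end + 1])
    except json.JSONDecodeError:
        return None
    if not isinstance(data, dict):
        return None
    verdict = data.get("verdict")
    if not isinstance(verdict, str) or verdict not in VALID_VERDICTS:
        return None
    return {
        "verdict": verdict,
        # Assumption: the binary gate treats anything but "insufficient" as sufficient.
        "sufficient": verdict != "insufficient",
        "covered_aspects": _as_list(data.get("covered_aspects")),
        "missing_aspects": _as_list(data.get("missing_aspects")),
        "confidence": data.get("confidence", "low"),
    }
```

Returning `None` for malformed replies lets the caller treat the classifier as absent, which is the case the conservative fallback in the commit message is meant to cover.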
34
app/prompts/search_synthesis_partial.txt
Normal file
@@ -0,0 +1,34 @@
You are a grounded answer synthesizer handling a PARTIAL answer case. Some aspects of the query CAN be answered, others CANNOT. Respond ONLY in JSON.

## Task
Answer ONLY the covered aspects. Do NOT attempt to answer missing aspects.

## Output Schema
{
  "confirmed_items": [
    {"aspect": "aspect label", "text": "1~2 sentence answer", "citations": [1, 2]}
  ],
  "confidence": "medium" | "low",
  "refused": false
}

## Rules
- Each confirmed_item: aspect label + 1~2 sentences + inline [n] citations
- ONLY use facts present in evidence. No outside knowledge, no guessing.
- Do NOT mention or address missing_aspects in your text.
- Korean query → Korean answer / English → English
- confidence: medium (2+ strong evidence matches) / low (1 or weak)
- Max total text: 400 chars across all items
- 모든 주장 문장 끝에 [n] 필수

## Covered aspects (answer these):
{covered_aspects}

## Missing aspects (do NOT answer these):
{missing_aspects}

## Query
{query}

## Evidence
{numbered_evidence}
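The rules in this prompt (inline [n] citations, a 400-character budget across all items, confidence limited to medium/low, refused fixed to false) are easy to check mechanically once the model replies. The following is a hedged sketch of such a check. The function name `validate_partial`, the 1-based evidence numbering, and the policy of returning a list of problems rather than raising are assumptions; the commit does not show how search.py validates this output.

```python
import re

CITATION = re.compile(r"\[(\d+)\]")
MAX_TOTAL_CHARS = 400  # "Max total text: 400 chars across all items"


def validate_partial(output: dict, evidence_count: int) -> list[str]:
    """Return a list of rule violations; an empty list means the output passes."""
    items = output.get("confirmed_items")
    if not isinstance(items, list) or not items:
        return ["confirmed_items missing or empty"]

    problems = []
    if output.get("confidence") not in ("medium", "low"):
        problems.append("confidence must be 'medium' or 'low'")
    if output.get("refused") is not False:
        problems.append("refused must be false")

    total_chars = 0
    for i, item in enumerate(items):
        if not isinstance(item, dict):
            problems.append(f"item {i}: not an object")
            continue
        text = str(item.get("text", ""))
        total_chars += len(text)
        cited = {int(n) for n in CITATION.findall(text)}
        if not cited:
            problems.append(f"item {i}: no inline [n] citation")
        # Assumption: evidence is numbered 1..evidence_count.
        bad = sorted(n for n in cited if not 1 <= n <= evidence_count)
        if bad:
            problems.append(f"item {i}: citation out of range {bad}")

    if total_chars > MAX_TOTAL_CHARS:
        problems.append(f"total text {total_chars} chars exceeds {MAX_TOTAL_CHARS}")
    return problems
```

A caller could treat a non-empty result as one more input to the grounding stage, or simply log it; which of these the pipeline does is not stated in the commit.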