feat(search): E.3 — ask synthesis prompt v2-600char bump

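Raise the answer limit from 400 to 600 characters. Baseline observation
(partial avg 168 chars / full 10%) showed the length limit acting as a real
constraint on output. The goal is deeper answers in the procedure and
comparison categories.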

4 lines changed:
- search_synthesis.txt:17  answer 400→600 characters max
- prompt_versions.py:20    v1-400char → v2-600char (telemetry)
- synthesis_service.py:42  PROMPT_VERSION v1→v2 (keeps cache key semantics in sync)
- synthesis_service.py:46  MAX_ANSWER_CHARS 400→600 (keeps hard clip in sync)
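
Why the two synthesis_service.py constants move together with the prompt
text, as a minimal sketch. `make_cache_key` and `clip_answer` are hypothetical
names for illustration, not the functions in synthesis_service.py; the sketch
assumes the version tag is hashed into the cache key and that the clip is a
plain slice.

```python
import hashlib

PROMPT_VERSION = "v2"    # bumped from "v1" in this commit
MAX_ANSWER_CHARS = 600   # bumped from 400 in this commit


def make_cache_key(query: str, prompt_version: str = PROMPT_VERSION) -> str:
    """Hypothetical key builder: the version tag is part of the hashed input,
    so answers cached under the v1 prompt are never served for v2."""
    raw = f"{prompt_version}|{query}"
    return hashlib.sha256(raw.encode("utf-8")).hexdigest()


def clip_answer(answer: str, limit: int = MAX_ANSWER_CHARS) -> str:
    """Hypothetical hard clip: drop anything the model wrote past the limit."""
    return answer[:limit]
```

If the prompt allowed 600 characters while the clip stayed at 400, longer
answers would be cut after generation; if the version tag stayed at v1, cached
400-character answers could still be returned under the new prompt.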

v1 post-tier0 baseline: 225 rows, partial 51% / insufficient 49% / full 0%
(after the Tier 0 fix, 0 contradictory rows with full + refused=True). E.6
uses this clean baseline as its compare-against.

Future ticket: unify PROMPT_VERSION and ASK_PROMPT_VERSION into a single source.
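
One possible shape for that single source, as a sketch only: derive the short
version and the character limit from the telemetry tag instead of editing
three places. `parse_prompt_tag` and the tag pattern are assumptions for
illustration; the actual ticket may choose a different design.

```python
import re

ASK_PROMPT_VERSION = "search_synthesis.v2-600char"

# Assumed tag layout: <prompt name>.<version>-<limit>char
_TAG_PATTERN = re.compile(r"^(?P<name>\w+)\.(?P<version>v\d+)-(?P<chars>\d+)char$")


def parse_prompt_tag(tag: str) -> tuple[str, int]:
    """Hypothetical helper: split a tag into (short version, char limit)."""
    match = _TAG_PATTERN.match(tag)
    if match is None:
        raise ValueError(f"unrecognized prompt tag: {tag!r}")
    return match.group("version"), int(match.group("chars"))


# Both constants now follow from the one tag.
PROMPT_VERSION, MAX_ANSWER_CHARS = parse_prompt_tag(ASK_PROMPT_VERSION)
```

The limit written in search_synthesis.txt would still need its own sync step,
for example by templating the number into the prompt at load time.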

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Hyungi Ahn
2026-04-17 12:02:51 +09:00
parent c9f766512d
commit eb9dc94604
3 changed files with 4 additions and 4 deletions
+1 -1
@@ -14,7 +14,7 @@ Given a query and numbered evidence spans, write a short answer that cites speci
 }
 ## Rules
-- `answer`: **400 characters max**. Must contain inline `[n]` citations. Every claim sentence ends with at least one `[n]`. Multiple sources: `[1][3]`. **Only use facts present in evidence. No outside knowledge, no guessing, no paraphrasing what is not there.**
+- `answer`: **600 characters max**. Must contain inline `[n]` citations. Every claim sentence ends with at least one `[n]`. Multiple sources: `[1][3]`. **Only use facts present in evidence. No outside knowledge, no guessing, no paraphrasing what is not there.**
 - `used_citations`: integer list of `n` values that actually appear in `answer` (for cross-check). Must be sorted ascending, no duplicates.
 - `confidence`:
 - `high`: 3+ evidence items with strong relevance
+1 -1
@@ -17,7 +17,7 @@ from __future__ import annotations
 # ─── ask (/search/ask) prompt version ─────────────────────────
 # Based on app/prompts/search_synthesis.txt, which synthesis_service.py loads
-ASK_PROMPT_VERSION: str = "search_synthesis.v1-400char"
+ASK_PROMPT_VERSION: str = "search_synthesis.v2-600char"
 # ─── /analyze prompt version ──────────────────────────────────
 # Based on app/prompts/document_analyze.txt, which the analyze route in documents.py loads
+2 -2
@@ -39,11 +39,11 @@ if TYPE_CHECKING:
 logger = setup_logger("synthesis")
 # ─── Constants (permanent rules from the plan) ─────────────────────────────────
-PROMPT_VERSION = "v1"
+PROMPT_VERSION = "v2"
 LLM_TIMEOUT_MS = 15000
 CACHE_TTL = 3600 # 1h (answers are sensitive to source-text changes, so shorter than query_analyzer's 24h)
 CACHE_MAXSIZE = 300
-MAX_ANSWER_CHARS = 400
+MAX_ANSWER_CHARS = 600
 SynthesisStatus = Literal[
 "completed",