feat(search): E.3 — ask synthesis prompt v2-600char bump

Raise the answer limit from 400 to 600 characters. Baseline observations (partial avg 168 chars / full 10%) confirmed that the length limit was acting as an actual constraint on output. The goal is to allow deeper answers in the procedure and comparison categories.

Four lines changed:
- search_synthesis.txt:17  answer 400 → 600 characters max
- prompt_versions.py:20  v1-400char → v2-600char (telemetry)
- synthesis_service.py:42  PROMPT_VERSION v1 → v2 (keeps cache-key semantics in sync)
- synthesis_service.py:46  MAX_ANSWER_CHARS 400 → 600 (keeps the hard clip in sync)

v1 post-tier0 baseline: 225 rows, partial 51% / insufficient 49% / full 0% (thanks to the Tier 0 fix, there are 0 cases of the contradictory full + refused=True combination). E.6 uses this clean baseline as its compare-against reference.

Follow-up ticket: unify PROMPT_VERSION and ASK_PROMPT_VERSION into a single source.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-17 12:02:51 +09:00
parent c9f766512d
commit eb9dc94604
3 changed files with 4 additions and 4 deletions
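The commit message gives two reasons for touching synthesis_service.py: PROMPT_VERSION feeds the cache key, and MAX_ANSWER_CHARS is a hard clip applied after generation. The following is a minimal sketch of why both must move together with the prompt. It assumes, hypothetically, that the key hashes the query, the ordered evidence IDs, and the prompt version, and that the clip is a plain slice. `make_cache_key`, `hard_clip`, and the key layout are illustrative names, not the service's actual code.

```python
import hashlib
import json

PROMPT_VERSION = "v2"
MAX_ANSWER_CHARS = 600


def make_cache_key(query: str, evidence_ids: list[str], prompt_version: str = PROMPT_VERSION) -> str:
    """Hypothetical key builder: bumping prompt_version changes every key, so cached v1 answers are never served."""
    payload = json.dumps(
        # Evidence order is kept as-is because [n] citations refer to positions.
        {"q": query, "ev": list(evidence_ids), "pv": prompt_version},
        ensure_ascii=False,
        sort_keys=True,
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def hard_clip(answer: str, max_chars: int = MAX_ANSWER_CHARS) -> str:
    """Last-resort guard if the model ignores the prompt limit; len() counts code points."""
    return answer if len(answer) <= max_chars else answer[:max_chars]
```

If the clip had stayed at 400 while the prompt allowed 600, any answer between 401 and 600 characters would be cut mid-sentence, possibly dropping its trailing citations. Python's `len` counts code points, so a precomposed Hangul syllable counts as one character.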
@@ -14,7 +14,7 @@ Given a query and numbered evidence spans, write a short answer that cites speci
 }

 ## Rules
-- `answer`: **400 characters max**. Must contain inline `[n]` citations. Every claim sentence ends with at least one `[n]`. Multiple sources: `[1][3]`. **Only use facts present in evidence. No outside knowledge, no guessing, no paraphrasing what is not there.**
+- `answer`: **600 characters max**. Must contain inline `[n]` citations. Every claim sentence ends with at least one `[n]`. Multiple sources: `[1][3]`. **Only use facts present in evidence. No outside knowledge, no guessing, no paraphrasing what is not there.**
 - `used_citations`: integer list of `n` values that actually appear in `answer` (for cross-check). Must be sorted ascending, no duplicates.
 - `confidence`:
  - `high`: 3+ evidence items with strong relevance
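The `answer` and `used_citations` rules in this hunk are mechanical enough to check after generation. Below is a rough sketch of such a validator. It assumes evidence spans are numbered from 1 to `num_spans`. `check_answer` and `ANSWER_MAX_CHARS` are hypothetical names, and the per-sentence rule (every claim sentence ends with `[n]`) is deliberately not checked because sentence splitting is language-dependent.

```python
import re

# Hypothetical constant mirroring the prompt rule; the service itself uses MAX_ANSWER_CHARS.
ANSWER_MAX_CHARS = 600

# Matches one inline citation such as [1] or [12]; adjacent ones like [1][3] match separately.
CITATION_RE = re.compile(r"\[(\d+)\]")


def check_answer(answer: str, used_citations: list[int], num_spans: int) -> list[str]:
    """Return the rule violations found; an empty list means the answer passes these checks."""
    problems: list[str] = []
    if len(answer) > ANSWER_MAX_CHARS:
        problems.append(f"answer is {len(answer)} chars (max {ANSWER_MAX_CHARS})")

    cited = {int(n) for n in CITATION_RE.findall(answer)}
    if not cited:
        problems.append("answer has no inline [n] citations")
    out_of_range = sorted(n for n in cited if not 1 <= n <= num_spans)
    if out_of_range:
        problems.append(f"citations outside evidence range: {out_of_range}")

    # used_citations must be sorted ascending, duplicate-free, and match the answer's citations.
    if used_citations != sorted(set(used_citations)):
        problems.append("used_citations is not sorted/unique")
    if set(used_citations) != cited:
        problems.append("used_citations does not match citations in answer")
    return problems
```

For example, `check_answer("Step A first [1]. Then B [1][3].", [1, 3], num_spans=3)` returns an empty list.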
@@ -17,7 +17,7 @@ from __future__ import annotations

 # ─── ask (/search/ask) prompt version ─────────────────────────
 # based on app/prompts/search_synthesis.txt, which synthesis_service.py loads
-ASK_PROMPT_VERSION: str = "search_synthesis.v1-400char"
+ASK_PROMPT_VERSION: str = "search_synthesis.v2-600char"

 # ─── /analyze prompt version ──────────────────────────────────
 # based on app/prompts/document_analyze.txt, which the documents.py analyze route loads
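The commit lists a follow-up ticket to merge PROMPT_VERSION and ASK_PROMPT_VERSION into a single source. One possible approach, sketched here with a hypothetical `PromptSpec` dataclass (not part of this codebase), is to derive both strings from one version number and one limit so they cannot drift apart.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PromptSpec:
    """Hypothetical single source for the ask prompt's identity and answer limit."""

    name: str       # prompt file stem, e.g. "search_synthesis"
    version: int    # bumped whenever prompt semantics change
    max_chars: int  # answer length limit stated in the prompt

    @property
    def telemetry_version(self) -> str:
        # Same shape as ASK_PROMPT_VERSION: "<name>.v<version>-<max_chars>char"
        return f"{self.name}.v{self.version}-{self.max_chars}char"

    @property
    def cache_version(self) -> str:
        # Same shape as synthesis_service.PROMPT_VERSION: "v<version>"
        return f"v{self.version}"


ASK_PROMPT = PromptSpec(name="search_synthesis", version=2, max_chars=600)
```

Here `ASK_PROMPT.telemetry_version` is `"search_synthesis.v2-600char"` and `ASK_PROMPT.cache_version` is `"v2"`. MAX_ANSWER_CHARS could then read `ASK_PROMPT.max_chars`. However, the `600 characters max` text inside search_synthesis.txt would still need templating or a test to stay in sync.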
@@ -39,11 +39,11 @@ if TYPE_CHECKING:
 logger = setup_logger("synthesis")

 # ─── Constants (permanent plan rules) ─────────────────────────────────
-PROMPT_VERSION = "v1"
+PROMPT_VERSION = "v2"
 LLM_TIMEOUT_MS = 15000
 CACHE_TTL = 3600  # 1h (answers are sensitive to source-text changes → shorter than query_analyzer's 24h)
 CACHE_MAXSIZE = 300
-MAX_ANSWER_CHARS = 400
+MAX_ANSWER_CHARS = 600

 SynthesisStatus = Literal[
    "completed",