Hyungi Ahn 34f79f84f2 feat(search): B-2 switch evidence LLM → 4B triage + answerability columns
Original intent of the plan: evidence selection runs on 4B, synthesis on 26B.

- evidence_service: switch the LLM call from primary (26B MLX) to triage
  (4B Ollama). Ollama can serve concurrent requests, so get_mlx_gate() is
  removed. synthesis still goes through the llm_gate Semaphore(1) to protect MLX.
- prompt_version bumped to v3-evidence-triage (the synthesis prompt itself
  stays at v2-600char; the evidence LLM path change is tracked separately).
- migrations 161/162: add answerability / partial_basis /
  suggested_query_count columns plus a partial index to analyze_events.
  /ask already records completeness (full/partial/insufficient) in ask_events
  in production, so the analyze_events columns are reserved for when document
  analysis adopts the answerability concept.
- telemetry: record_analyze_event gains answerability / partial_basis /
  suggested_query_count parameters.

The existing /ask 3-state completeness logic (classifier_service + 7-tier gate)
is left unchanged; it was already completed in Phase 3.5a. B-2 focuses on
redistributing LLM load and extending observability.

MLX load reduction: previously each query made two MLX calls, evidence (26B)
+ synthesis (26B). Now it is evidence (4B Ollama) + synthesis (26B MLX), which
halves the number of MLX calls.
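
The gating change described above can be pictured with a small sketch. It is not the project's code: `fake_llm_call`, `handle_query`, `run_queries` and `CallStats` are hypothetical stand-ins, and the only thing taken from the commit message is the idea that triage calls run ungated while synthesis calls pass through a `Semaphore(1)`.

```python
import asyncio


class CallStats:
    """Tracks how many simulated calls are in flight at once (illustrative only)."""

    def __init__(self) -> None:
        self.active = 0
        self.peak = 0


async def fake_llm_call(stats: CallStats, delay: float = 0.01) -> None:
    # Hypothetical stand-in for a real LLM request; it only sleeps.
    stats.active += 1
    stats.peak = max(stats.peak, stats.active)
    try:
        await asyncio.sleep(delay)
    finally:
        stats.active -= 1


async def handle_query(mlx_gate: asyncio.Semaphore,
                       triage_stats: CallStats,
                       mlx_stats: CallStats) -> None:
    # Evidence step: 4B triage, no gate, so calls may overlap.
    await fake_llm_call(triage_stats)
    # Synthesis step: 26B MLX, serialized by the gate.
    async with mlx_gate:
        await fake_llm_call(mlx_stats)


async def run_queries(n: int = 4) -> tuple[int, int]:
    # The semaphore is created inside the running loop on purpose,
    # so repeated asyncio.run() calls each get a fresh gate.
    mlx_gate = asyncio.Semaphore(1)
    triage_stats, mlx_stats = CallStats(), CallStats()
    await asyncio.gather(
        *(handle_query(mlx_gate, triage_stats, mlx_stats) for _ in range(n))
    )
    return triage_stats.peak, mlx_stats.peak


if __name__ == "__main__":
    triage_peak, mlx_peak = asyncio.run(run_queries(4))
    print(f"peak triage concurrency={triage_peak}, peak MLX concurrency={mlx_peak}")
```

In this sketch the triage step overlaps freely across queries, while the gated step never has more than one call in flight.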

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-24 10:33:32 +09:00

111 lines
3.8 KiB
Python

"""Document-related telemetry: Phase E.2 (analyze_events).

Records /documents/{id}/analyze calls to the DB from a background task.
Follows the same pattern as search_telemetry.py (standalone session + errors absorbed).
"""
from __future__ import annotations

import logging
from typing import Any

from sqlalchemy.exc import SQLAlchemyError

from core.database import async_session
from models.analyze_event import AnalyzeEvent

logger = logging.getLogger("document_telemetry")

# source enum validation: the server enforces a fallback
VALID_SOURCES: set[str] = {
    "document_server",
    "synology_chat",
    "ui_search",
    "ui_detail",
    "eval",
    "unknown",
}
DEFAULT_SOURCE = "document_server"


def sanitize_source(raw: str | None) -> str:
    """Enforce the source value server-side.

    Values outside the enum become "unknown"; None becomes "document_server".
    """
    if raw is None:
        return DEFAULT_SOURCE
    lowered = raw.strip().lower()
    if lowered in VALID_SOURCES:
        return lowered
    return "unknown"


async def record_analyze_event(
    doc_id: int,
    user_id: int | None,
    mode: str,
    text_limit: int | None,
    truncated: bool,
    layers_returned: list[str],
    cached: bool,
    latency_ms: int,
    model_name: str | None,
    prompt_version: str | None,
    error_code: str | None,
    source: str,
    # PR-A shadow observability: the fields below are set only when routing
    # accompanies the call; otherwise they stay None.
    subject_domain: str | None = None,
    risk_flags: list[str] | None = None,
    high_impact_task: bool | None = None,
    escalation_reasons: list[str] | None = None,
    confidence: float | None = None,
    policy_version: str | None = None,
    shadow_would_route_to: str | None = None,
    # PR-B B-1: the tier actually called and the R2 backlog guard
    tier: str | None = None,
    escalated_to_26b: bool | None = None,
    suppressed_reason: str | None = None,
    # PR-B B-2: /ask 3-state answerability
    answerability: str | None = None,
    partial_basis: bool | None = None,
    suggested_query_count: int | None = None,
) -> None:
    """INSERT into analyze_events. Called from a background task; errors are swallowed.

    layers_returned: on success, a list of layer strings such as ["evidence","summary"]; on failure, [].
    error_code: None (success) | "timeout" | "llm" | "parse" | "missing_summary" | "no_text" | "not_found"
    tier: 'triage' | 'primary' | 'fallback', the tier actually called (PR-B B-0~B-2).
    suppressed_reason: reason string when the R2 backlog guard suppressed a soft escalate.
    """
    try:
        async with async_session() as session:
            row = AnalyzeEvent(
                doc_id=doc_id,
                user_id=user_id,
                mode=mode,
                text_limit=text_limit,
                truncated=truncated,
                layers_returned=layers_returned,
                cached=cached,
                latency_ms=latency_ms,
                model_name=model_name,
                prompt_version=prompt_version,
                error_code=error_code,
                source=source,
                subject_domain=subject_domain,
                risk_flags=risk_flags,
                high_impact_task=high_impact_task,
                escalated_to_26b=escalated_to_26b,
                escalation_reasons=escalation_reasons,
                confidence=confidence,
                policy_version=policy_version,
                shadow_would_route_to=shadow_would_route_to,
                tier=tier,
                suppressed_reason=suppressed_reason,
                answerability=answerability,
                partial_basis=partial_basis,
                suggested_query_count=suggested_query_count,
            )
            session.add(row)
            await session.commit()
    except SQLAlchemyError as exc:
        # Telemetry must never break the request path, so only log the failure.
        logger.warning("analyze_event insert failed: %s", exc)
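
A caller might normalize `source` and schedule the insert roughly as in the sketch below. This is an assumption about usage, not code from the repository: `fake_record_analyze_event` and `handle_analyze` are hypothetical, the recorder appends to a list instead of touching a database, and `sanitize_source` is repeated only so the example runs on its own.

```python
import asyncio

VALID_SOURCES = {
    "document_server", "synology_chat", "ui_search",
    "ui_detail", "eval", "unknown",
}


def sanitize_source(raw):
    # Same rule as the module's sanitize_source, copied for self-containment.
    if raw is None:
        return "document_server"
    lowered = raw.strip().lower()
    return lowered if lowered in VALID_SOURCES else "unknown"


recorded = []  # stands in for the analyze_events table


async def fake_record_analyze_event(**fields):
    # Hypothetical stand-in for record_analyze_event: store instead of INSERT.
    recorded.append(fields)


async def handle_analyze(doc_id, raw_source):
    # Hypothetical request handler: normalize first, then record in the background.
    source = sanitize_source(raw_source)
    task = asyncio.create_task(
        fake_record_analyze_event(
            doc_id=doc_id,
            source=source,
            tier="triage",
            answerability=None,  # left unset until document analysis uses it
        )
    )
    # Awaited here only so the sketch finishes deterministically;
    # a real handler would return without waiting on telemetry.
    await task
    return source


if __name__ == "__main__":
    print(asyncio.run(handle_analyze(1, " UI_Search ")))
```

Normalizing before the recorder is called keeps the stored `source` inside the enum even when a client sends an unexpected value.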