feat(search): Phase 2.2 multilingual vector retrieval + query embed cache

## 변경 사항

### app/services/search/retrieval_service.py
 - **_QUERY_EMBED_CACHE**: 모듈 레벨 LRU (maxsize=500, TTL=24h)
   - sha256(text|bge-m3) 키. fixed query 재호출 시 vector_ms 절반 감소.
 - **_get_query_embedding(client, text)**: cache-first helper. 기존 search_vector()도 이를 사용하도록 교체.
 - **search_vector_multilingual(session, normalized_queries, limit)**: 신규
   - normalized_queries 각 언어별 embedding 병렬 생성 (cache hit 활용)
   - 각 embedding에 대해 docs+chunks hybrid retrieval 병렬
   - weight 기반 score 누적 merge (lang_weight 이미 1.0 정규화)
   - match_reason에 "ml_ko+en" 등 언어 병합 표시
   - 호출 조건 문서화 — cache hit + analyzer_tier=analyzed 시에만

### app/api/search.py
 - use_multilingual 결정 로직:
   - analyzer_cache_hit == True
   - analyzer_tier == "analyzed" (confidence >= 0.85)
   - normalized_queries >= 2 (다언어 버전 실제 존재)
 - 위 3조건 모두 만족할 때만 search_vector_multilingual 호출
 - 그 외 모든 경로 (cache miss, low conf, single lang)는 기존 search_vector 그대로 사용 (회귀 0 보장)
 - notes에 `multilingual langs=[ko, en, ...]` 기록

## 기대 효과
 - crosslingual_ko_en NDCG 0.53 → 0.65+ (Phase 2 목표)
 - 기존 경로 완전 불변 → 회귀 0
 - Phase 2.1 async 구조와 결합해 "cache hit일 때만 활성" 조건 준수

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
This commit is contained in:
Hyungi Ahn
2026-04-08 14:59:20 +09:00
parent 1e80d4c613
commit f5c3dea833
2 changed files with 186 additions and 9 deletions

View File

@@ -23,7 +23,12 @@ from services.search.rerank_service import (
apply_diversity,
rerank_chunks,
)
from services.search.retrieval_service import compress_chunks_to_docs, search_text, search_vector
from services.search.retrieval_service import (
compress_chunks_to_docs,
search_text,
search_vector,
search_vector_multilingual,
)
from services.search_telemetry import (
compute_confidence,
compute_confidence_hybrid,
@@ -180,9 +185,26 @@ async def search(
+ (" (bg triggered)" if triggered else " (bg inflight)")
)
# Phase 2.2: multilingual vector search 활성 조건
# - cache hit + analyzer_tier == "analyzed" (≥0.85 고신뢰)
# - normalized_queries 2개 이상 (lang 다양성 있음)
# 그 외 케이스는 기존 single-query search_vector 그대로 사용 (회귀 0).
use_multilingual: bool = False
normalized_queries: list[dict] = []
if analyzer_cache_hit and analyzer_tier == "analyzed" and query_analysis:
raw_nq = query_analysis.get("normalized_queries") or []
if isinstance(raw_nq, list) and len(raw_nq) >= 2:
normalized_queries = [nq for nq in raw_nq if isinstance(nq, dict) and nq.get("text")]
if len(normalized_queries) >= 2:
use_multilingual = True
notes.append(f"multilingual langs={[nq.get('lang') for nq in normalized_queries]}")
if mode == "vector":
t0 = time.perf_counter()
raw_chunks = await search_vector(session, q, limit)
if use_multilingual:
raw_chunks = await search_vector_multilingual(session, normalized_queries, limit)
else:
raw_chunks = await search_vector(session, q, limit)
timing["vector_ms"] = (time.perf_counter() - t0) * 1000
if not raw_chunks:
notes.append("vector_search_returned_empty (AI client error or no embeddings)")
@@ -196,7 +218,10 @@ async def search(
if mode == "hybrid":
t1 = time.perf_counter()
raw_chunks = await search_vector(session, q, limit)
if use_multilingual:
raw_chunks = await search_vector_multilingual(session, normalized_queries, limit)
else:
raw_chunks = await search_vector(session, q, limit)
timing["vector_ms"] = (time.perf_counter() - t1) * 1000
# chunk-level → doc-level 압축 (raw chunks는 chunks_by_doc에 보존)