feat(search): Phase 2.2 multilingual vector retrieval + query embed cache

## 변경 사항 ### app/services/search/retrieval_service.py - **_QUERY_EMBED_CACHE**: 모듈 레벨 LRU (maxsize=500, TTL=24h) - sha256(text|bge-m3) 키. fixed query 재호출 시 vector_ms 절반 감소. - **_get_query_embedding(client, text)**: cache-first helper. 기존 search_vector()도 이를 사용하도록 교체. - **search_vector_multilingual(session, normalized_queries, limit)**: 신규 - normalized_queries 각 언어별 embedding 병렬 생성 (cache hit 활용) - 각 embedding에 대해 docs+chunks hybrid retrieval 병렬 - weight 기반 score 누적 merge (lang_weight 이미 1.0 정규화) - match_reason에 "ml_ko+en" 등 언어 병합 표시 - 호출 조건 문서화 — cache hit + analyzer_tier=analyzed 시에만 ### app/api/search.py - use_multilingual 결정 로직: - analyzer_cache_hit == True - analyzer_tier == "analyzed" (confidence >= 0.85) - normalized_queries >= 2 (다언어 버전 실제 존재) - 위 3조건 모두 만족할 때만 search_vector_multilingual 호출 - 그 외 모든 경로 (cache miss, low conf, single lang)는 기존 search_vector 그대로 사용 (회귀 0 보장) - notes에 `multilingual langs=[ko, en, ...]` 기록 ## 기대 효과 - crosslingual_ko_en NDCG 0.53 → 0.65+ (Phase 2 목표) - 기존 경로 완전 불변 → 회귀 0 - Phase 2.1 async 구조와 결합해 "cache hit일 때만 활성" 조건 준수 Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-08 14:59:20 +09:00
parent 1e80d4c613
commit f5c3dea833
2 changed files with 186 additions and 9 deletions
--- a/app/api/search.py
+++ b/app/api/search.py
@@ -23,7 +23,12 @@ from services.search.rerank_service import (
    apply_diversity,
    rerank_chunks,
 )
-from services.search.retrieval_service import compress_chunks_to_docs, search_text, search_vector
+from services.search.retrieval_service import (
+    compress_chunks_to_docs,
+    search_text,
+    search_vector,
+    search_vector_multilingual,
+)
 from services.search_telemetry import (
    compute_confidence,
    compute_confidence_hybrid,
@@ -180,9 +185,26 @@ async def search(
                + (" (bg triggered)" if triggered else " (bg inflight)")
            )

+    # Phase 2.2: multilingual vector search 활성 조건
+    #   - cache hit + analyzer_tier == "analyzed" (≥0.85 고신뢰)
+    #   - normalized_queries 2개 이상 (lang 다양성 있음)
+    # 그 외 케이스는 기존 single-query search_vector 그대로 사용 (회귀 0).
+    use_multilingual: bool = False
+    normalized_queries: list[dict] = []
+    if analyzer_cache_hit and analyzer_tier == "analyzed" and query_analysis:
+        raw_nq = query_analysis.get("normalized_queries") or []
+        if isinstance(raw_nq, list) and len(raw_nq) >= 2:
+            normalized_queries = [nq for nq in raw_nq if isinstance(nq, dict) and nq.get("text")]
+            if len(normalized_queries) >= 2:
+                use_multilingual = True
+                notes.append(f"multilingual langs={[nq.get('lang') for nq in normalized_queries]}")
+
    if mode == "vector":
        t0 = time.perf_counter()
-        raw_chunks = await search_vector(session, q, limit)
+        if use_multilingual:
+            raw_chunks = await search_vector_multilingual(session, normalized_queries, limit)
+        else:
+            raw_chunks = await search_vector(session, q, limit)
        timing["vector_ms"] = (time.perf_counter() - t0) * 1000
        if not raw_chunks:
            notes.append("vector_search_returned_empty (AI client error or no embeddings)")
@@ -196,7 +218,10 @@ async def search(

        if mode == "hybrid":
            t1 = time.perf_counter()
-            raw_chunks = await search_vector(session, q, limit)
+            if use_multilingual:
+                raw_chunks = await search_vector_multilingual(session, normalized_queries, limit)
+            else:
+                raw_chunks = await search_vector(session, q, limit)
            timing["vector_ms"] = (time.perf_counter() - t1) * 1000

            # chunk-level → doc-level 압축 (raw chunks는 chunks_by_doc에 보존)