refactor(search): /api/search/ask dispatcher route via llm-router
PR-2 of DS AI routing policy (2026-05-23, see plan ~/.claude/plans/document-server-ai-cheeky-reddy.md + memory project_document_server_ai_routing_policy).

All DS backend calls now go through llm-router :8890 as the single path.

Policy alignment:
- New RouterBackend (services/llm/backends.py): per-alias router POST plus a requires_gate branch (only mac-mini-default is protected by llm_gate FOREGROUND).
- The existing GemmaMacMiniBackend + QwenMacBookBackend are kept as legacy (DS_BACKENDS_VIA_ROUTER=false, rollback safety only). To be retired after one week in a separate cleanup PR (PR-DS-Backends-Legacy-Cleanup-1).
- get_backend factory is dual-path (env flag) and backward-compatible (the gemma-macmini alias maps to mac-mini-default).
- search.py:457 Query pattern extended: added mac-mini-default|claude-cloud|auto. In /ask/react, the isinstance(QwenMacBookBackend) check becomes hasattr duck-typing (RouterBackend and Legacy both implement generate_with_tools).
- SearchAskBackendConfig gains router_url (env LLM_ROUTER_URL or the hardcoded MVP default http://100.76.254.116:8890).
- docker-compose.yml: added LLM_ROUTER_URL + DS_BACKENDS_VIA_ROUTER to the fastapi env.

The path through AIClient (_call_chat, call_triage, call_primary, call_fallback) is left to a separate PR (PR-AIClient-Router-Migration-1): MVP scope C was adopted to minimize regression risk.

Closure (immediate fixture/matrix):
- factory smoke: 6 aliases (None/mac-mini-default/gemma-macmini/qwen-macbook/claude-cloud/auto) + 1 invalid (nonsense → ValueError).
- live, 3 cases: mac-mini-default 200 "pong! 🏓" + qwen-macbook cold 502 upstream_502_primary=ConnectError + claude-cloud 503 provider_not_configured.
- 0 silent fallbacks + 0 direct M5/Mac mini sockets (the only calls are RouterBackend → router).

Backup: ~/.local/share/ds-routing-pr2-backups/20260523/ (backends.py + config.py + search.py + docker-compose.yml).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
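The alias handling and env-flag switch described in the commit message can be sketched roughly as follows. This is an illustration only, not the code in services/llm/backends.py: the helper names `normalize_alias`, `via_router`, and `requires_gate` (as a free function) are hypothetical, and treating an unset DS_BACKENDS_VIA_ROUTER as "router enabled" is an assumption, since the message only states that `false` selects the legacy path.

```python
import os
from typing import Mapping, Optional

# Aliases taken from the commit message; the set/dict names are hypothetical.
ROUTER_ALIASES = frozenset({"mac-mini-default", "qwen-macbook", "claude-cloud", "auto"})
LEGACY_ALIAS_MAP = {"gemma-macmini": "mac-mini-default"}  # backward-compat mapping
DEFAULT_ALIAS = "mac-mini-default"  # used when no backend is specified


def normalize_alias(backend: Optional[str]) -> str:
    """Map a requested backend name to a router alias, or raise ValueError."""
    if backend is None:
        return DEFAULT_ALIAS
    alias = LEGACY_ALIAS_MAP.get(backend, backend)
    if alias not in ROUTER_ALIASES:
        raise ValueError(f"unknown backend alias: {backend!r}")
    return alias


def via_router(env: Optional[Mapping[str, str]] = None) -> bool:
    """Return False only when DS_BACKENDS_VIA_ROUTER is explicitly 'false'.

    Assumption: an unset flag means the router path is active.
    """
    env = os.environ if env is None else env
    return env.get("DS_BACKENDS_VIA_ROUTER", "true").strip().lower() != "false"


def requires_gate(alias: str) -> bool:
    """Per the commit message, only mac-mini-default is gated by llm_gate."""
    return alias == "mac-mini-default"
```

A factory built on these helpers would pick the router-backed class when `via_router()` is true and fall back to the legacy classes otherwise; an unknown name fails fast with ValueError instead of silently selecting a default.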
+15
-6
@@ -36,18 +36,27 @@ class DeepSummaryBacklogConfig(BaseModel):
 
 
 class SearchAskBackendConfig(BaseModel):
-    """PR-MacBook-RAG-Backend-1: /api/search/ask backend dispatcher.
+    """PR-2 of DS AI routing policy ([[document-server-ai-routing-policy]], 2026-05-23):
+    The /api/search/ask backend dispatcher routes exclusively through llm-router :8890.
 
-    backend unspecified = Gemma Mac mini (settings.ai.primary path unchanged).
-    backend="qwen-macbook" explicit opt-in = MacBook M5 Max mlx-vlm.server.
-    If the MacBook is unavailable: 503 + error_reason=macbook_unavailable (no automatic fallback).
+    - backend unspecified / "gemma-macmini" / "mac-mini-default" → router tier_b
+    - backend "qwen-macbook" → router named upstream (M5 Max)
+    - backend "claude-cloud" → router returns an explicit 503 (scaffold)
+    - backend "auto" → router rule + LLM triage
+
+    Unavailable → BackendUnavailable → explicit 503 (no silent fallback).
+    Rollback: DS_BACKENDS_VIA_ROUTER=false switches to the legacy direct-call path.
+    The legacy macmini_url / macbook_url / macbook_model are used only on that fallback.
     """
 
+    # New in PR-2: llm-router URL. If empty, use env LLM_ROUTER_URL or the hardcoded default.
+    router_url: str = ""
+    # Legacy fields (used only when DS_BACKENDS_VIA_ROUTER=false)
     macmini_url: str = "http://100.76.254.116:8801"
     macbook_url: str = "http://100.118.112.84:8810"
     macbook_model: str = "mlx-community/Qwen3.6-27B-8bit"
-    timeout_connect_s: int = 1
-    timeout_read_s: int = 30
+    timeout_connect_s: int = 5
+    timeout_read_s: int = 60
 
 
 class SearchAskReactConfig(BaseModel):
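The comment on `router_url` in the hunk above implies a three-step lookup: the configured value, then the LLM_ROUTER_URL environment variable, then a hardcoded default. A minimal sketch of that precedence is shown here; `resolve_router_url` is a hypothetical helper name, and the default value is the MVP default quoted in the commit message rather than something visible in this hunk.

```python
import os
from typing import Mapping, Optional

# MVP default quoted in the commit message; the constant name is hypothetical.
DEFAULT_ROUTER_URL = "http://100.76.254.116:8890"


def resolve_router_url(
    configured: str = "",
    env: Optional[Mapping[str, str]] = None,
) -> str:
    """Return the first non-empty value of: config field, env var, default."""
    env = os.environ if env is None else env
    return configured or env.get("LLM_ROUTER_URL", "") or DEFAULT_ROUTER_URL
```

Passing `env` explicitly keeps the helper easy to test without touching the process environment; callers in the service would normally omit it.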