Feat/ds ai routing policy #23

Merged
hyungi merged 2 commits from feat/ds-ai-routing-policy into main 2026-05-24 12:20:50 +09:00
Owner
No description provided.
hyungi added 4 commits 2026-05-23 13:02:57 +09:00
queries.yaml v0.1 23 case → v0.2 schema swap:
- 7 카테고리 (standards / korean_only / english_only / mixed / exam /
  ocr_derived / failure_expected)
- language / ocr_derived / failure_expected / graded_relevance 컬럼 추가
- v0.1 호환 보존 (legacy_category + relevant_ids + top3_ids)
- 신규 28 case (50+ 목표) 는 후속 PR-Eval-V0_2-Baseline-Analysis

run_eval.py 확장:
- graded_ndcg_at_k / graded_recall_at_k 함수 추가
- Query / QueryResult dataclass 확장 (v0.2 컬럼)
- load_queries v0.1 fallback (top3 → grade 3, 나머지 → grade 2)
- --eval-version v0.1/v0.2/both flag (default both)
- print_summary 의 by_language / by_ocr_derived 집계 추가
- write_csv 의 graded 컬럼 추가

README.md 신규:
- graded 등급 정의 (0~3) + 카테고리 정의 (7개)
- v0.2 schema 컬럼 + 신규 case 작성 가이드
- v0.1 호환성 + CLI 사용 예 + baseline 박제 정책

Phase 1 plan: ~/.claude/plans/phase-1-graded-eval-v0-2.md
Parent: ~/.claude/plans/peppy-hugging-nest.md § Phase 1

본 PR closure: schema + harness + README. 신규 28 case + baseline 박제 +
약점 분석 (embedding-sensitive failure pattern 4 카테고리 식별) 은 후속 PR.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
PR-1 (725a4e1) v0.2 schema + harness 위에 신규 28 case 추가 → 51 case
완성 + 현재 모델로 baseline 박제 + 약점 카테고리 analysis md.

신규 28 case 분포 (계획 +28 = standards +6 / english_only +8 / mixed +5
/ exam +7 / failure_expected +2 / ocr_derived 0):
- standards 5 → 11 (KGS FP111/FU551 + 산안기준 후반 편 + 고압가스법)
- english_only 1 → 9 (Pressure Vessel Design Manual + ASME VIII/IX +
  Hydrogen ASME + Industrial Safety 영문 교재 + Structural Analysis)
- mixed 5 → 10 (한↔영 ASME / KGS-영문 / 양언어 압력용기)
- exam 0 → 7 (가스기사 study_questions → library 개념 docs 매핑)
- failure_expected 3 → 5 (KGS AC999 / 초전도 안전 관리법)
- ocr_derived 0 (TBD-O FAILED: extract_meta NULL 21385, chunks.source
  = RSS feed 명. OCR 식별 컬럼 부재 → +4 case 재배분, analysis 명시)

baseline 측정 결과 (corpus 21,385, hybrid mode, bge-m3 + bge-reranker-v2-m3):
- v0.1 Recall@10 0.646, MRR 0.724, NDCG 0.606, Top-3 0.891
- v0.2 graded NDCG 0.659, Recall@10 g≥2 0.695, g≥3 0.761
- latency p50 528ms / p95 1,664ms
- failure precision 0/5 (DS confidence threshold 미적용)

약점 top 3 (analysis md):
- mixed crosslingual 0.39 graded NDCG — TOP weakness, bge-m3
  multilingual 한계 추정
- korean_only natural language 0.51 — query rewrite 부재 추정
- failure_expected 0/5 — confidence cutoff 부재

Phase 2 dispatch 권고 (analysis md):
- 2A Embedding bge-m3 — 즉시 진입 (mixed/korean 동시 타격)
- 2B Reranker — M (2A 이후)
- 2C OCR-Marker — 선행 chore (OCR 식별 컬럼 추가) 필요
- 2D STT — 본 평가셋 외 (별 평가셋 필요)

Query rewrite 는 Phase 2Q/Search-PR 로 별도 분리.

영향 받는 파일:
- tests/search_eval/queries.yaml: 23 → 51 case (기존 23 변경 0, append only)
- tests/search_eval/baselines/v0_2_baseline_2026-05-23.json: 신규
- tests/search_eval/baselines/v0_2_baseline_2026-05-23_analysis.md: 신규

PR plan: ~/.claude/plans/pr-2-serialized-hummingbird.md
Phase 1 plan: ~/.claude/plans/phase-1-graded-eval-v0-2.md

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
PR-2 of DS AI routing policy (2026-05-23, see plan
~/.claude/plans/document-server-ai-cheeky-reddy.md +
memory project_document_server_ai_routing_policy).

DS 의 모든 backend 호출이 llm-router :8890 단일 경유. 정칙 정합:
- 신규 RouterBackend (services/llm/backends.py) — alias 별 router POST
  + requires_gate 분기 (mac-mini-default 만 llm_gate FOREGROUND 보호).
- 기존 GemmaMacMiniBackend + QwenMacBookBackend = legacy 보존
  (DS_BACKENDS_VIA_ROUTER=false rollback safety only). 1주 후 별
  cleanup PR (PR-DS-Backends-Legacy-Cleanup-1) 로 폐기.
- get_backend factory dual-path (env flag) — backward-compat
  (gemma-macmini alias → mac-mini-default 매핑).
- search.py:457 Query pattern 확장: mac-mini-default|claude-cloud|auto
  추가. /ask/react 의 isinstance(QwenMacBookBackend) → hasattr
  duck-typing (RouterBackend + Legacy 모두 generate_with_tools 구현).
- SearchAskBackendConfig 에 router_url 신규 (env LLM_ROUTER_URL 또는
  hardcoded MVP default http://100.76.254.116:8890).
- docker-compose.yml fastapi env 에 LLM_ROUTER_URL +
  DS_BACKENDS_VIA_ROUTER 추가.

AIClient (_call_chat, call_triage, call_primary, call_fallback) 경유
path 는 별 PR (PR-AIClient-Router-Migration-1) — MVP scope C 채택,
회귀 risk 최소화.

Closure (즉시 fixture/matrix):
- factory smoke 6 alias (None/mac-mini-default/gemma-macmini/
  qwen-macbook/claude-cloud/auto) + 1 invalid (nonsense → ValueError).
- live 3 case: mac-mini-default 200 \"pong! 🏓\" + qwen-macbook cold
  502 upstream_502_primary=ConnectError + claude-cloud 503
  provider_not_configured.
- silent fallback 0 + direct M5/Mac mini socket 0
  (RouterBackend 만 router 호출).

Backup: ~/.local/share/ds-routing-pr2-backups/20260523/
(backends.py + config.py + search.py + docker-compose.yml).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
PR-3 of DS AI routing policy (2026-05-23, see plan
~/.claude/plans/document-server-ai-cheeky-reddy.md +
memory project_document_server_ai_routing_policy).

기존 BackendSelector (PR-DocSrv-Web-Ask-Selector-1, 2 옵션 default
qwen-macbook) 확장 — 4 옵션 + DeviceToggle inline.

UI 변경 (frontend/src/routes/ask/+page.svelte):
- BackendChoice = auto | mac-mini-default | qwen-macbook | claude-cloud
  (기존 default 는 legacy alias, auto 또는 mac-mini-default 로 자동 매핑).
- select 4 옵션 (Auto router / Mac mini default / This device /
  Claude Cloud) + tooltip.
- DeviceToggle (checkbox 'This is M5 Max') inline — localStorage
  ds_device_self_label = macbook-m5-max | null. mount 시 복원.
- This device 옵션 disabled state = !isMacBookM5Max (토글 off 시
  grey-out). 토글 off 시 qwen-macbook 선택돼 있었으면 auto 복귀.
- Claude Cloud 옵션 disabled state = !CLOUD_DEV_ENABLED (build-time
  flag VITE_ENABLE_CLOUD_BACKEND_DEV, default false). 운영 토글
  불가 — 후속 PR DS runtime feature flag API 로 migrate 예정.
- friendlyErrorMessage(reason) — 503 error_reason 매핑
  (macbook_unavailable / provider_not_configured / router_* / upstream_*).
- retryWithDefault → retryWithMacMiniDefault 명명 정정.
- parseBackend backward-compat: default / gemma-macmini →
  mac-mini-default.

source IP 의존 0 (PR-0 round 2 발견: caddy 2-hop + X-Forwarded-For
미설정 → DS 가 보는 source IP = LAN gateway, 신뢰 불가).
사용자 명시 토글 + localStorage 방식 채택 (Q3=C).

Closure (build + bundle string + lint):
- frontend build PASS (SvelteKit/TS syntax + svelte compile 모두 OK).
- 컴파일된 bundle 에 9 핵심 string 박혀있음 (mac-mini-default /
  qwen-macbook / claude-cloud / Auto router / This is M5 Max /
  ds_device_self_label / provider_not_configured / This device /
  Cloud backend not configured).
- lint:tokens 본 PR 변경 위반 0 (기존 62 stale debt 는 별 chore
  PR-DocSrv-Frontend-Token-Cleanup-1).

Backup: ~/.local/share/ds-routing-pr2-backups/20260523/
ask-page.svelte.pre-pr3.

선행: PR-1 (llm-router alias scaffold) + PR-2 (RouterBackend
dispatcher, refactor commit bcf644f) closed.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
hyungi merged commit 1ae7802485 into main 2026-05-24 12:20:50 +09:00
Sign in to join this conversation.
No Reviewers
No Label
1 Participants
Notifications
Due Date
No due date set.
Dependencies

No dependencies set.

Reference: hyungi/hyungi_document_server#23