hyungi
b734fc54af
fix(search): Phase 2Q rerank payload — chunk_id dedup + cap 60 + TEI batch 64 (Apply prereq)
...
plan pr-2q-rerank-payload-fix-resolute-haven.md. Phase 2Q multi-query path 의 reranker
413 Payload Too Large root cause = TEI 의 MAX_CLIENT_BATCH_SIZE=32 default (batch entries
한도) + multi-query 의 chunks 누적이 32 초과. MAX_BATCH_TOKENS 와 별개 (token sum 한도).
4 iteration 진단 history (json 박제):
1) cap 60 + dedup = 413 다수 (batch 54 > 32)
2) cap 30 + chunks_per_doc=1 = 413 0건 + NDCG 0.666 catastrophic (-0.261)
3) cap 60 + dedup + TEI 16384 only = 413 46건 (batch size 한도 별개)
4) cap 60 + dedup + TEI 16384/64 = 413 1건 + NDCG 0.876 (FINAL)
변경:
- app/services/search/search_pipeline.py:
· _dedup_chunks_by_id() 신규 helper — chunk_id (None 시 doc.id) 기준 first-only.
variant 별 same chunk 중복 누적 회피, 첫 등장 variant 보존.
· PHASE2Q_RERANK_INPUT_CAP=60 + PHASE2Q_CHUNKS_PER_DOC=2 신규 상수 (baseline
MAX_RERANK_INPUT=200 / MAX_CHUNKS_PER_DOC=2 와 별도).
· search_with_rewrite() merge 후 dedup wire-up + rerank input cap swap.
- docker-compose.yml reranker env (사용자 결정, plan out-of-scope 정정):
· MAX_BATCH_TOKENS 8192 → 16384 (token sum 한도)
· MAX_CLIENT_BATCH_SIZE 32 → 64 신규 추가 (batch entries 한도 — root cause)
· GPU VRAM free 6199MiB 충분 사전 verify.
- tests/test_query_rewriter.py: _dedup_chunks_by_id 5 test + PHASE2Q_* constants test.
38/38 PASS (기존 32 + 신규 6).
측정 결과 (51 case, gemma backend, snapshot 25180/56526):
vs Phase 3 (commit a41adb6 NDCG 0.927, 413 다수):
· NDCG 0.876 (-0.051 acceptable, plan 변수 격리 invariant 충족)
· Recall t≥2 0.721 (+0.034 회복)
· Recall t≥3 0.739 (+0.011)
· latency p50 1421ms (-1336ms, -48%) / p95 3392ms (-6292ms, -65%) major win
· 413 fallback 1/51 (98%↓ from 다수) + reranker batch error 0
· 카테고리 english_only +0.34 / standards -0.28 / exam -0.19 (Apply 후 분석 항목)
closure gate PASS:
· unit test 38/38, production smoke 413 0
· 51 case 413 < 5/51 (1건만)
· latency 대폭 개선
· NDCG threshold 0.92 미달 단 plan invariant (production 평가 단일 변수) 충족
· Apply PR-2Q-Apply-Query-Rewrite-1 진입 ready
산출물:
· reports/v0_2_phase2q_rerank_fix_2026-05-24.csv (raw)
· tests/search_eval/baselines/v0_2_phase2q_rerank_fix_2026-05-24.json (4 iter 진단 박제)
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com >
2026-05-24 03:54:59 +00:00
hyungi
1ae7802485
Merge pull request 'Feat/ds ai routing policy' ( #23 ) from feat/ds-ai-routing-policy into main
...
Reviewed-on: #23
2026-05-24 12:20:49 +09:00
hyungi
711d4952a2
merge(search): Phase 2Q Query Rewrite Diagnose closed — H1 multi-query gemma-4 추천
2026-05-24 02:57:59 +00:00
hyungi
c57e4c52dc
docs(eval): Phase 2Q Diagnose Phase 4 — decision tree md + Apply PR 백로그
...
phase-2q-query-rewrite-diagnose.md v6 plan §7 Phase 4 closure.
Phase 3 commit a41adb6 의 3 측정 결과 + 4 factor weighted decision.
decision = H1 (both backends NDCG net 개선 ≥ +0.26):
- 추천 Apply LLM = cand_multi_query_macmini (gemma-4)
- 사유: F3 ⭐ 24/7 가동 + F1 NDCG 0.927 dominant + F4 cold latency 우세
- 대안: qwen (mixed/english 강점 + MacBook always-on 의향 시)
산출물:
- tests/search_eval/baselines/v0_2_phase2q_decision_2026-05-24.md (180 lines)
· §1 결정 요약 / §2 측정 표 / §3 카테고리 회복 / §4 4-factor weighted
· §5 분석 노트 5건 (multi-query 효과 / variants 구성 / cache hit / Recall 회귀 / Phase 3 incident)
· §6 closure gate (branch close 사용자 결정 보류)
· §7 follow-up PR 백로그: Apply 1 + 별 chore 2 + Extended 4 + Cloud 1 + Cleanup 1
· §9 사용자 검토 항목 5건
Phase 2Q Diagnose closure 완료. Apply PR 진입 = 사용자 LLM 선택 + sequencing 결정 후.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com >
2026-05-24 00:57:48 +00:00
hyungi
a41adb63a0
fix(search): Phase 2Q variants bug fix + Phase 3 3 measurement 박제
...
Phase 3 cold 측정 1차에서 NDCG 0.033 catastrophic 발견 — 모든 query 에 동일 variants
반환. root cause = _call_llm 이 user 메시지 1개에 prompt template 전체 박음. LLM 이
actual query 인식 못 함. fixture request_body 형식 (system=prompt / user=query) 과
mismatch. fixture-first invariant 위반.
fix:
- app/services/search/query_rewriter.py _call_llm — system/user 메시지 분리.
fixture request_body 와 단일 source-of-truth. _render_prompt 는 [deprecated] 유지.
- tests/test_query_rewriter.py — Phase 3 regression test 2:
· _call_llm 가 system + user 분리 호출 verify (httpx.AsyncClient monkeypatch)
· qwen backend = response_format 미사용 verify
- 32/32 unit test PASS.
Phase 3 측정 (fix 후 재측정, 51 case × 3 candidate × cold/warm = 5 run):
- baseline_rebaseline (rewrite_backend=null): NDCG 0.659 = Phase 2A 0.659, diff 0.000 PASS
- cand_multi_query_macmini cold: NDCG 0.927 (Δ +0.268), p50 2757ms / p95 9684ms
- cand_multi_query_macmini warm: NDCG 0.927 동일, p50 998ms (cache hit -64%)
- cand_multi_query_macbook cold: NDCG 0.919 (Δ +0.260), p50 3647ms / p95 5202ms
- cand_multi_query_macbook warm: NDCG 0.919 동일, p50 873ms (cache hit -76%)
핵심 약점 회복 (gemma / qwen):
- mixed 0.39 → 0.57 / 0.65
- korean_only 0.51 → 0.71 / 0.67
- standards 0.87 → 1.44 / 1.31
- exam 0.74 → 1.11 / 1.04
decision = H1 (both backends 유의미 net 개선). LLM 선택 = Phase 4 decision md 별 step.
산출물:
- reports/v0_2_phase2q_*.csv (5 raw run_eval output)
- tests/search_eval/baselines/v0_2_phase2q_results_2026-05-24.json (요약 + incident 박제)
follow-up:
- rerank 413 Payload Too Large 다수 관찰 (RRF fallback 작동, NDCG 영향 없음). Apply PR 전 별 chore — chunk dedup 또는 reranker batch cap 검토.
- p95 cold 9684ms 매우 큼. production rollout 시 cache prewarm 정책 필수.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com >
2026-05-24 00:51:56 +00:00
hyungi
ecd2350c15
feat(search): Phase 2Q Diagnose Phase 2 — multi-query retrieval fusion
...
phase-2q-query-rewrite-diagnose.md v6 plan §5.5 + §7 Phase 2.
Phase 1B 3e6866b (scaffold + dispatcher) 위 retrieval 합성 wire-up.
신규:
- search_pipeline._rrf_fuse_variants() — N variant ranked list RRF 합성.
fusion_service.RRFOnly 알고리즘 동일 (k=60), 첫 등장 variant representative 보존.
- search_pipeline.search_with_rewrite() — variant N 별 retrieval+fusion 후
unified RRF (cap 60) → reranker 1회 (query=원본 q) → diversity+freshness+display.
· per-variant K = 50//3 = 16 (PHASE2Q_PRODUCTION_TOPK//N, A1 채택)
· variant 별 retrieval asyncio.gather 병렬
· chunks_by_doc merge (variant 무관 unified reranker input)
· production fusion_service.get_strategy() + rerank_chunks() 재사용
- 상수: PHASE2Q_PRODUCTION_TOPK=50, PHASE2Q_UNIFIED_CAP=60, PHASE2Q_RRF_K=60.
수정:
- search_pipeline.run_search() — rewrite_backend param 추가. hybrid + cand_<slug> 시
search_with_rewrite() 위임. baseline/None 시 기존 single-query path 그대로 (invariant).
- app/api/search.py — Phase 1B scaffold discard call 제거. run_search 에 rewrite_backend
전달. ValueError → 400 (unknown_rewrite_backend 우선 분기) / RuntimeError → 503
(rewrite_llm_unavailable).
- tests/test_query_rewriter.py — Phase 2 test 9개 추가:
· _rrf_fuse_variants 6 (single / overlap accumulation / representative / cap limit /
empty / rank position)
· search_pipeline import + run_search rewrite_backend default=None signature 1
· PHASE2Q_* constants 1
· DATABASE_URL dummy 주입 (api.search import → SQLAlchemy engine init 회피)
30/30 unit test PASS (Phase 1B 21 + Phase 2 9).
baseline 회귀 0 invariant:
- run_search(rewrite_backend=None) → 기존 path 100% 그대로 (분기 first line guard)
- run_search(rewrite_backend=baseline) → 동일
- mode != hybrid → multi-query path 비활성 (text-only/vector-only/trgm 영향 0)
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com >
2026-05-23 22:41:50 +00:00
hyungi
3e6866b4ae
feat(search): Phase 2Q Diagnose Phase 1B — scaffold + dispatcher
...
phase-2q-query-rewrite-diagnose.md v6 plan Phase 1 의 fixture 외 잔여.
Phase 1A 446ba82 위 dispatcher + cache + LLM call + API param + eval flag + 21 unit test.
retrieval 합성 (search_with_rewrite) 은 Phase 2 별 commit.
신규:
- app/services/search/query_rewriter.py — LLM_BACKEND_MAP + _resolve + cache + rewrite()
· slug-based allowlist (no silent fallback), httpx 직접, Priority.FOREGROUND semaphore
· sampling 박제 (gemma response_format json_object / qwen prompt rule only — Phase 0 inspect 9)
· manual TTL cache (query_analyzer 패턴 1:1, sha256[:32] NFKC key, LLM_REWRITE_TIMEOUT_MS=15000)
- tests/test_query_rewriter.py — 21 test PASS (resolve / cache key / parser / cache TTL / constants)
수정:
- app/api/search.py — ?rewrite_backend= query param + 400 unknown / 503 unavailable.
scaffold = call but discard variants (retrieval path 영향 0). Phase 2 에서 합성.
- tests/search_eval/run_eval.py — --rewrite-backend flag + 4 hot spot wire-up.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com >
2026-05-23 22:25:03 +00:00
hyungi
446ba82c91
feat(eval): Phase 2Q Diagnose Phase 1A — fixture (4 카테고리 × 2 LLM) + prompt v1
...
phase-2q-query-rewrite-diagnose.md v6 plan 의 Phase 1 fixture 박제 (G0-1 + G0-2).
산출물:
- app/prompts/query_rewrite.txt — multi-query rewrite prompt v1 (3 variants: 원본 + 한국어 rephrase + 영어 번역)
- tests/fixtures/macmini_gemma4_query_rewrite_response.json — 4 카테고리 (korean_only/mixed/english_only/exam)
- tests/fixtures/macbook_qwen_query_rewrite_response.json — 4 카테고리 동일
inspect 9 결과 (2026-05-24):
- Mac mini gemma-4-26B-A4B :8801 = response_format json_object 지원
- MacBook qwen3.6-27B-8bit :8810 = response_format json_object 미지원 (120s hang) — prompt rule only
- prompt rule \"no markdown, no code fence\" 강제 시 둘 다 strict JSON (gemma 도 fence wrap 없음)
- parser fallback (markdown fence regex) 유지 — 첫 호출 prompt 없을 때 wrap 관찰 사례
8 호출 측정:
- gemma 1.16~1.36s / qwen 1.93~2.24s (warm)
- variants 의미 일관 + 도메인 용어 (ASME/Section VIII/압력용기/가스기사) verbatim preserve
- 한국어→영어 cross-lingual translation 자연
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com >
2026-05-23 22:09:29 +00:00
hyungi
a0b11d66f3
fix(worker): summarize ai_model_version label 정정 — qwen3.5 hardcode → primary config 동적
...
C5 of family-adaptive-bengio. summarize_worker.py 의 doc.ai_model_version 이 실제 모델 (Gemma) 과 무관한 \"qwen3.5-35b-a3b\" hardcode 였음 — 추적/분석/로그 신뢰도 영향. client.ai.primary.model (config.yaml ai.models.primary.model = \"mlx-community/gemma-4-26b-a4b-it-8bit\") 으로 동적 swap — 향후 config model 변경 시 자동 정합.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com >
2026-05-23 21:28:05 +00:00
hyungi
076c0e1802
feat(eval): Phase 2B Reranker Diagnose — dispatcher + gte 측정 + decision (H3 bge-reranker-v2-m3 유지)
...
round-2-review-mighty-starfish.md v2.1 (Phase 2B Reranker Diagnose) plan 실행.
Phase 2A 의 CANDIDATE_BACKEND_MAP 패턴 재사용 + RERANKER_BACKEND_MAP 신규.
코드 변경 (4 파일):
- app/services/search/rerank_service.py:
- RERANKER_BACKEND_MAP allowlist (baseline / cand_gte_ml_base, slug-based resolve)
- _resolve_reranker(slug) → endpoint URL or None
- _rerank_via_candidate_endpoint() — 후보 TEI POST /rerank
- rerank_chunks() 시그니처에 reranker_backend + snapshot_*_id_max 추가 + dispatch log
- app/services/search/search_pipeline.py: run_search() threading
- app/api/search.py: reranker_backend Query parameter + 400 unknown_reranker_backend 에러 매핑
- tests/search_eval/run_eval.py: --reranker-backend flag + call_search/evaluate threading
infra:
- docker-compose.override.rerank-cand.yml: 3 후보 service (gte_ml_base / mxbai_large / bge_v2_gemma_2b),
profile 'rerank-cand' 격리, restart=unless-stopped
측정 산출물 (51 case, scored=46, failure=5):
- reports/v0_2_phase2b_baseline_snapshot_2026-05-23.csv (NDCG 0.659, Phase 2A 와 일치 = 재현성 PASS)
- reports/v0_2_phase2b_gte_ml_base_2026-05-23.csv
- tests/search_eval/baselines/v0_2_phase2b_{baseline_snapshot,gte_ml_base}_2026-05-23.json
- reports/phase_2b_reranker_decision_2026-05-23.md
- tests/fixtures/tei_rerank_response.json (G0-1 한국어+영어 mixed sample sanity PASS)
후보 TEI 1.7 호환성 (Phase 1 smoke gate):
- cand_gte_ml_base : ✅ PASS (xlm-roberta-based, TEI 호환)
- cand_mxbai_large : ❌ deberta-v2 미지원 → Phase 2B-Extended (sentence-transformers wrapper)
- cand_bge_v2_gemma_2b : ❌ LLM-based reranker, 1_Pooling/config.json 부재 → Phase 2B-Extended (FlagEmbedding wrapper)
결과 (1 후보 측정 + baseline rebaseline):
| Candidate | NDCG | Δ baseline | mixed | korean | exam | p50 ms |
|------------------------------------|------:|-----------:|------:|-------:|------:|-------:|
| bge-reranker-v2-m3 (baseline) | 0.659 | — | 0.39 | 0.51 | 0.74 | 454 |
| cand_gte_ml_base | 0.604 | -0.055 | 0.38 | 0.41 | 0.62 | 345 |
Decision (H3): bge-reranker-v2-m3 유지. gte 의 reranker quality 가 production 보다 약함 (korean_only -0.10, exam -0.12, overall -0.055).
후속 PR 백로그 (6건):
- PR-Search-Query-Rewrite-1 (Phase 2Q, korean_only/mixed 보완 권고)
- PR-2B-Extended-Mxbai-Large (sentence-transformers wrapper)
- PR-2B-Extended-Bge-V2-Gemma (FlagEmbedding LayerwiseReranker wrapper)
- PR-2B-Extended-Jina-V2-ML (license 결정 후, 개인 비영리 가정)
- PR-2B-Cloud-Reranker-Scaffold-1 (Cohere scaffold-only, 선택)
- PR-2B-Rerank-Cand-Cleanup-1 (1주 후 cand 컨테이너 정리)
production 영향:
- production reranker (bge-reranker-v2-m3) 변경 0
- config.yaml ai.models.rerank.endpoint 변경 0
- embedding (bge-m3 ollama) 변경 0 (Phase 2A 결정 보존)
- documents / document_chunks 변경 0 (21365 docs / 30605 chunks 그대로)
- 4 smoke PASS (baseline / baseline+snapshot / cand_gte_ml_base / cand_invalid → 400)
- dispatch log 박제 verify (endpoint + snapshot id)
closure gate: 16 항목 PASS (flex closure 조항 적용 — 1 후보 측정, 2 후보 TEI 호환 탈락 사유 명시).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com >
2026-05-23 08:37:42 +00:00
hyungi
0e8d5cccaf
feat(worker): summarize sliding window — 50k chunk + cumulative carry-over
...
P3 of family-adaptive-bengio (Mac mini 4-lever bundle).
50k 초과 input 은 CHUNK_SIZE=50000 단위로 N 분할 + cumulative carry-over (prev chunk summary 를 다음 chunk prompt 에 prefix). 50k 이하 input = 기존 동작 (변동 0). 첫 chunk = client.summarize() legacy / 후속 chunk = call_primary + SUMMARY_PROMPT_CONTINUATION. log trace: single vs sliding chunk N/M done.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com >
2026-05-23 07:08:23 +00:00
hyungi
3092e3009d
feat(eval): Phase 2A Diagnose Phase 3+4 — dispatcher + 3 측정 + decision (H3 bge-m3 유지)
...
phase-2a-embedding-diagnose.md v4 § 6 (dispatcher) + § 7 Phase 3 (51 case 측정) + § 7 Phase 4 (decision)
Round 2 review: round-2-review-mighty-starfish.md (R2-2 + R2-B1 페어 invariant + slug-based resolve)
코드 변경:
- app/services/search/retrieval_service.py:
- CANDIDATE_BACKEND_MAP allowlist (baseline / cand_me5_large_inst / cand_snowflake_l_v2)
- _resolve_backend(slug) → docs_table/chunks_table/embed_endpoint or None
- _embed_query_via_tei() — candidate TEI 엔드포인트 호출 (cache 미사용)
- _VALID_DOCS_TABLE + _VALID_CHUNKS_TABLE regex (R2-B1 2단계 gate)
- _search_vector_docs / _search_vector_chunks: docs_table/chunks_table + snapshot_*_id_max 파라미터
- search_vector + search_vector_multilingual: embedding_backend + snapshot_*_id_max 파라미터 + dispatch log
- app/services/search/search_pipeline.py: run_search() 시그니처 + 4 search_vector* 호출 threading
- app/api/search.py: 3 Query parameter + ValueError → HTTP 400 (allowed list 응답)
- tests/search_eval/run_eval.py: --embedding-backend + --snapshot-doc-id-max + --snapshot-chunk-id-max
+ call_search/call_search_full/evaluate threading + main 3 asyncio.run threading
측정 산출물 (51 case, scored=46, failure=5):
- reports/v0_2_phase2a_baseline_snapshot_2026-05-23.csv (snapshot filter 적용 production path)
- reports/v0_2_phase2a_me5_large_inst_2026-05-23.csv
- reports/v0_2_phase2a_snowflake_l_v2_2026-05-23.csv
- tests/search_eval/baselines/v0_2_phase2a_{baseline_snapshot,me5_large_inst,snowflake_l_v2}_2026-05-23.json (3개)
결과:
| Candidate | NDCG | Δ vs baseline | mixed | korean_only | p50 ms |
|------------------------------------|-----:|--------------:|------:|------------:|-------:|
| bge-m3 (baseline snapshot) | 0.659| — | 0.39 | 0.51 | 464 |
| cand_me5_large_inst | 0.477| -0.182 | 0.17 | 0.47 | 194 |
| cand_snowflake_l_v2 | 0.616| -0.043 | 0.35 | 0.52 | 254 |
Decision (H3): bge-m3 유지. 둘 다 net 회귀.
- mE5-large-instruct: 전 카테고리 회귀 (-0.182). prefix 미적용 변수 — 별 PR PR-2A-mE5-Prefix-Retry 후보.
- snowflake_l_v2: 가벼운 회귀 (-0.043). korean_only +0.01 미세 개선 신호.
- korean_only/mixed 약점 보완은 Phase 2B (Reranker) 또는 Phase 2Q (Query rewrite) 권고.
Decision report: reports/phase_2a_embedding_decision_2026-05-23.md (§ 1~8 포함, Closure gate 16 항목 모두 PASS).
후속 PR 백로그:
- PR-2A-mE5-Prefix-Retry (별 PR)
- PR-2A-Extended-Bge-Mgemma2 (별 PR, v3 결정)
- PR-2A-Cloud-Embedding-Scaffold-1 (Cohere/Voyage scaffold-only, 선택)
- PR-Search-Query-Rewrite-1 (Phase 2Q)
- PR-Search-Reranker-V2-Diagnose (Phase 2B)
- PR-2A-Chunks-Cand-Cleanup-1 (1주 후 cand 테이블 DROP)
production 영향:
- documents / document_chunks 컬럼/row 변경 0
- config.yaml 변경 0 (ollama bge-m3 unchanged)
- 추가된 endpoint = query parameter opt-in (미지정 시 production path 회귀 0)
- smoke 4건 PASS (baseline / baseline+snapshot / cand_me5 / cand_invalid → HTTP 400)
- dispatch log 박제 verify (snapshot_doc/chunk_id_max 박제)
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com >
2026-05-23 06:55:13 +00:00
hyungi
5cb8d04b50
feat(ai): config-driven sampling profile — triage T=0, primary T=0.3 top_p=0.9
...
P1 of family-adaptive-bengio (Mac mini 4-lever bundle).
AIModelConfig: temperature/top_p Optional fields (None = server default). _request OpenAI/MLX branch payload 조건부 sampling 인자 삽입. config.yaml ai.models.triage.temperature=0.0 (deterministic) / primary temperature=0.3 top_p=0.9 (summary creativity). fallback (Anthropic) branch 미적용 — 별 plan 범위. caller 코드 무변경.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com >
2026-05-23 06:37:46 +00:00
hyungi
a67df0a10b
feat(eval): Phase 2A Diagnose Phase 2 — candidate reindex (me5 + snowflake 페어)
...
phase-2a-embedding-diagnose.md v4 § 7 Phase 2 산출.
페어 invariant (R2-2): documents_cand + document_chunks_cand 동기 swap, 부분 swap 금지.
- snapshot 박제 (R2-D): v0_2_phase2a_snapshot_2026-05-23.json
- SNAPSHOT_DOC_ID_MAX=25180 / SNAPSHOT_CHUNK_ID_MAX=56526
- documents_n=21365 (embedded, active) / chunks_n=30605
- production ingest 정지 0, 모든 candidate reindex + baseline rebaseline 측정이 id<=snapshot 한정
- reindex_candidate.py 신규 (R2-5):
- reindex_documents(): production _build_embed_input() import 재사용
- reindex_chunks(): document_chunks.text 그대로 (재 chunking 0)
- TEI batch=8 (1.7 internal queue overflow 회피) + truncate=true (mE5 512 context)
- retry-8 exponential backoff (10/20/40/80/90s) — TEI SIGSEGV 자동 복구
- idempotent ON CONFLICT DO NOTHING (cancellation/resume 안전)
- docker-compose.override.cand.yml: restart=unless-stopped (TEI 1.7 panic 자동 복구)
DB 산출물 (4 테이블):
- documents_cand_me5_large_inst : 21365 rows (dim 1024) + ivfflat lists=100
- document_chunks_cand_me5_large_inst : 30605 rows (dim 1024) + ivfflat lists=100
- documents_cand_snowflake_l_v2 : 21365 rows (dim 1024) + ivfflat lists=100
- document_chunks_cand_snowflake_l_v2 : 30605 rows (dim 1024) + ivfflat lists=100
- ivfflat.probes=20 (production 동일) 보존
- smoke retrieval (nearest neighbor SQL) PASS 후보 2종
production 영향:
- documents / document_chunks 컬럼/row 변경 0
- config.yaml 변경 0 (ollama bge-m3 unchanged)
- production fastapi/postgres/reranker 변경 0 (profile embed-cand 격리)
다음 단계: Phase 3 (DS API + retrieval_service slug-based dispatcher 추가, baseline rebaseline + 2 후보 51 case 측정).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com >
2026-05-23 06:26:14 +00:00
hyungi
943ac5f59c
feat(eval): Phase 2A Diagnose Phase 1 — TEI candidate compose override + fixture G0
...
Phase 2A Embedding Diagnose 본 PR 의 Phase 1 산출물.
- docker-compose.override.cand.yml: 4 후보 service, profile 'embed-cand' 격리
- active: me5_large_inst (intfloat/multilingual-e5-large-instruct, smoke PASS)
- active: snowflake_l_v2 (Snowflake/snowflake-arctic-embed-l-v2.0, smoke PASS)
- 비활성 (extended profile): bge_mgemma2 (9B FP16 OOM risk → 별 PR 이관)
- 비활성 (disabled profile): me5_ko (HF 401 → 폐기)
- tests/fixtures/: G0 fixture 3건 박제
- ollama_bge_m3_embedding_response.json (G0-2: dim 1024, flat dict shape)
- tei_embedding_response.json (G0-1: me5_large_inst, dim 1024, nested array)
- tei_embedding_snowflake_l_v2_response.json (G0-1: snowflake, dim 1024, nested array)
운영 변경 0 (profile 격리, default up 시 미기동). production 9 컨테이너 영향 없음.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com >
2026-05-23 05:04:21 +00:00
hyungi
e4cfd81e15
Merge pull request 'feat(eval): v0.2 28 신규 case + 2026-05-23 baseline + analysis' ( #25 ) from feat/eval-v0-2-baseline-analysis into main
...
Reviewed-on: #25
2026-05-23 13:03:23 +09:00
hyungi
3f6314494e
Merge pull request 'feat(eval): v0.2 graded relevance schema + harness' ( #24 ) from feat/eval-v0-2-graded-relevance into main
...
Reviewed-on: #24
2026-05-23 13:03:12 +09:00
hyungi
00edd6bff8
feat(ask): backend selector 4 options with device toggle
...
PR-3 of DS AI routing policy (2026-05-23, see plan
~/.claude/plans/document-server-ai-cheeky-reddy.md +
memory project_document_server_ai_routing_policy).
기존 BackendSelector (PR-DocSrv-Web-Ask-Selector-1, 2 옵션 default
qwen-macbook) 확장 — 4 옵션 + DeviceToggle inline.
UI 변경 (frontend/src/routes/ask/+page.svelte):
- BackendChoice = auto | mac-mini-default | qwen-macbook | claude-cloud
(기존 default 는 legacy alias, auto 또는 mac-mini-default 로 자동 매핑).
- select 4 옵션 (Auto router / Mac mini default / This device /
Claude Cloud) + tooltip.
- DeviceToggle (checkbox 'This is M5 Max') inline — localStorage
ds_device_self_label = macbook-m5-max | null. mount 시 복원.
- This device 옵션 disabled state = !isMacBookM5Max (토글 off 시
grey-out). 토글 off 시 qwen-macbook 선택돼 있었으면 auto 복귀.
- Claude Cloud 옵션 disabled state = !CLOUD_DEV_ENABLED (build-time
flag VITE_ENABLE_CLOUD_BACKEND_DEV, default false). 운영 토글
불가 — 후속 PR DS runtime feature flag API 로 migrate 예정.
- friendlyErrorMessage(reason) — 503 error_reason 매핑
(macbook_unavailable / provider_not_configured / router_* / upstream_*).
- retryWithDefault → retryWithMacMiniDefault 명명 정정.
- parseBackend backward-compat: default / gemma-macmini →
mac-mini-default.
source IP 의존 0 (PR-0 round 2 발견: caddy 2-hop + X-Forwarded-For
미설정 → DS 가 보는 source IP = LAN gateway, 신뢰 불가).
사용자 명시 토글 + localStorage 방식 채택 (Q3=C).
Closure (build + bundle string + lint):
- frontend build PASS (SvelteKit/TS syntax + svelte compile 모두 OK).
- 컴파일된 bundle 에 9 핵심 string 박혀있음 (mac-mini-default /
qwen-macbook / claude-cloud / Auto router / This is M5 Max /
ds_device_self_label / provider_not_configured / This device /
Cloud backend not configured).
- lint:tokens 본 PR 변경 위반 0 (기존 62 stale debt 는 별 chore
PR-DocSrv-Frontend-Token-Cleanup-1).
Backup: ~/.local/share/ds-routing-pr2-backups/20260523/
ask-page.svelte.pre-pr3.
선행: PR-1 (llm-router alias scaffold) + PR-2 (RouterBackend
dispatcher, refactor commit bcf644f ) closed.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com >
2026-05-23 03:42:39 +00:00
hyungi
bcf644f893
refactor(search): /api/search/ask dispatcher route via llm-router
...
PR-2 of DS AI routing policy (2026-05-23, see plan
~/.claude/plans/document-server-ai-cheeky-reddy.md +
memory project_document_server_ai_routing_policy).
DS 의 모든 backend 호출이 llm-router :8890 단일 경유. 정칙 정합:
- 신규 RouterBackend (services/llm/backends.py) — alias 별 router POST
+ requires_gate 분기 (mac-mini-default 만 llm_gate FOREGROUND 보호).
- 기존 GemmaMacMiniBackend + QwenMacBookBackend = legacy 보존
(DS_BACKENDS_VIA_ROUTER=false rollback safety only). 1주 후 별
cleanup PR (PR-DS-Backends-Legacy-Cleanup-1) 로 폐기.
- get_backend factory dual-path (env flag) — backward-compat
(gemma-macmini alias → mac-mini-default 매핑).
- search.py:457 Query pattern 확장: mac-mini-default|claude-cloud|auto
추가. /ask/react 의 isinstance(QwenMacBookBackend) → hasattr
duck-typing (RouterBackend + Legacy 모두 generate_with_tools 구현).
- SearchAskBackendConfig 에 router_url 신규 (env LLM_ROUTER_URL 또는
hardcoded MVP default http://100.76.254.116:8890 ).
- docker-compose.yml fastapi env 에 LLM_ROUTER_URL +
DS_BACKENDS_VIA_ROUTER 추가.
AIClient (_call_chat, call_triage, call_primary, call_fallback) 경유
path 는 별 PR (PR-AIClient-Router-Migration-1) — MVP scope C 채택,
회귀 risk 최소화.
Closure (즉시 fixture/matrix):
- factory smoke 6 alias (None/mac-mini-default/gemma-macmini/
qwen-macbook/claude-cloud/auto) + 1 invalid (nonsense → ValueError).
- live 3 case: mac-mini-default 200 \"pong! 🏓 \" + qwen-macbook cold
502 upstream_502_primary=ConnectError + claude-cloud 503
provider_not_configured.
- silent fallback 0 + direct M5/Mac mini socket 0
(RouterBackend 만 router 호출).
Backup: ~/.local/share/ds-routing-pr2-backups/20260523/
(backends.py + config.py + search.py + docker-compose.yml).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com >
2026-05-23 03:41:29 +00:00
hyungi
4d14ab69d9
feat(eval): v0.2 28 신규 case + 2026-05-23 baseline + analysis
...
PR-1 (725a4e1 ) v0.2 schema + harness 위에 신규 28 case 추가 → 51 case
완성 + 현재 모델로 baseline 박제 + 약점 카테고리 analysis md.
신규 28 case 분포 (계획 +28 = standards +6 / english_only +8 / mixed +5
/ exam +7 / failure_expected +2 / ocr_derived 0):
- standards 5 → 11 (KGS FP111/FU551 + 산안기준 후반 편 + 고압가스법)
- english_only 1 → 9 (Pressure Vessel Design Manual + ASME VIII/IX +
Hydrogen ASME + Industrial Safety 영문 교재 + Structural Analysis)
- mixed 5 → 10 (한↔영 ASME / KGS-영문 / 양언어 압력용기)
- exam 0 → 7 (가스기사 study_questions → library 개념 docs 매핑)
- failure_expected 3 → 5 (KGS AC999 / 초전도 안전 관리법)
- ocr_derived 0 (TBD-O FAILED: extract_meta NULL 21385, chunks.source
= RSS feed 명. OCR 식별 컬럼 부재 → +4 case 재배분, analysis 명시)
baseline 측정 결과 (corpus 21,385, hybrid mode, bge-m3 + bge-reranker-v2-m3):
- v0.1 Recall@10 0.646, MRR 0.724, NDCG 0.606, Top-3 0.891
- v0.2 graded NDCG 0.659, Recall@10 g≥2 0.695, g≥3 0.761
- latency p50 528ms / p95 1,664ms
- failure precision 0/5 (DS confidence threshold 미적용)
약점 top 3 (analysis md):
- mixed crosslingual 0.39 graded NDCG — TOP weakness, bge-m3
multilingual 한계 추정
- korean_only natural language 0.51 — query rewrite 부재 추정
- failure_expected 0/5 — confidence cutoff 부재
Phase 2 dispatch 권고 (analysis md):
- 2A Embedding bge-m3 — 즉시 진입 (mixed/korean 동시 타격)
- 2B Reranker — M (2A 이후)
- 2C OCR-Marker — 선행 chore (OCR 식별 컬럼 추가) 필요
- 2D STT — 본 평가셋 외 (별 평가셋 필요)
Query rewrite 는 Phase 2Q/Search-PR 로 별도 분리.
영향 받는 파일:
- tests/search_eval/queries.yaml: 23 → 51 case (기존 23 변경 0, append only)
- tests/search_eval/baselines/v0_2_baseline_2026-05-23.json: 신규
- tests/search_eval/baselines/v0_2_baseline_2026-05-23_analysis.md: 신규
PR plan: ~/.claude/plans/pr-2-serialized-hummingbird.md
Phase 1 plan: ~/.claude/plans/phase-1-graded-eval-v0-2.md
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com >
2026-05-23 03:32:55 +00:00
hyungi
725a4e1f1d
feat(eval): v0.2 graded relevance schema + harness
...
queries.yaml v0.1 23 case → v0.2 schema swap:
- 7 카테고리 (standards / korean_only / english_only / mixed / exam /
ocr_derived / failure_expected)
- language / ocr_derived / failure_expected / graded_relevance 컬럼 추가
- v0.1 호환 보존 (legacy_category + relevant_ids + top3_ids)
- 신규 28 case (50+ 목표) 는 후속 PR-Eval-V0_2-Baseline-Analysis
run_eval.py 확장:
- graded_ndcg_at_k / graded_recall_at_k 함수 추가
- Query / QueryResult dataclass 확장 (v0.2 컬럼)
- load_queries v0.1 fallback (top3 → grade 3, 나머지 → grade 2)
- --eval-version v0.1/v0.2/both flag (default both)
- print_summary 의 by_language / by_ocr_derived 집계 추가
- write_csv 의 graded 컬럼 추가
README.md 신규:
- graded 등급 정의 (0~3) + 카테고리 정의 (7개)
- v0.2 schema 컬럼 + 신규 case 작성 가이드
- v0.1 호환성 + CLI 사용 예 + baseline 박제 정책
Phase 1 plan: ~/.claude/plans/phase-1-graded-eval-v0-2.md
Parent: ~/.claude/plans/peppy-hugging-nest.md § Phase 1
본 PR closure: schema + harness + README. 신규 28 case + baseline 박제 +
약점 분석 (embedding-sensitive failure pattern 4 카테고리 식별) 은 후속 PR.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com >
2026-05-23 01:21:06 +00:00
hyungi
c086c9f85d
feat(ask): /ask backend selector + 503 macbook_unavailable UI
...
선행 PR-MacBook-RAG-Backend-1 (main a7b8f15 ) backend dispatcher 의 frontend
소비. /ask 페이지에 backend selector (default | qwen-macbook) + URL
?backend=qwen-macbook 지원 + 503 friendly empty state + "Default 로 재요청"
버튼 (backend param 명시 제거 → 무한 루프 0).
정책 (선행 PR 그대로 유지):
- default / backend 미지정 = Gemma Mac mini (현 path 변동 0, 기존 호출자 호환)
- backend=qwen-macbook = MacBook 명시 opt-in. unavailable 시 HTTP 503 +
error_reason=macbook_unavailable. Gemma 자동 fallback 0.
변경 4 파일:
- types/ask.ts: AskResponse 에 backend_requested / backend_used 필드 +
SynthesisStatus 에 backend_unavailable literal 추가
- api.ts: ApiError 에 errorReason 추가, parseDetail 이 503 body 의
error_reason 흡수 (다른 endpoint 영향 0)
- AskAnswer.svelte: backend_requested 명시 시 muted chip 표시
(default 호출은 미표시, 시각 noise 회피)
- routes/ask/+page.svelte: selector dropdown + URL state + 503 분기
Non-Goals (별 PR):
- localStorage / Settings preference (PR-DocSrv-Ask-Default-Pref-1)
- SSE streaming, Tool-calling ReAct
- shared secret / MacBook auth (Tailscale ACL only)
검증: docker compose build frontend 통과 (svelte-check + vite build).
lint:tokens 본 PR 변경 위반 0 (기존 62 건은 baseline stale debt, settings/login).
Spec: ~/.claude/plans/document-buzzing-codd.md
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com >
2026-05-22 13:47:41 +00:00
hyungi
51c3f6df10
feat(search): /ask/react endpoint with Qwen native tool calling ReAct loop
...
PR-DocSrv-Ask-ToolCalling-ReAct-1 — Qwen3.6-27B-8bit 의 native tool calling
으로 ReAct loop 도입. 기존 /api/search/ask 무수정. 트랙 B (frontend /ask SSE)
와 파일 단위 충돌 0 (search.py 의 ask() 함수 line diff = 0, 순수 추가).
핵심 invariant:
- 별 endpoint /api/search/ask/react (qwen-macbook only, implicit opt-in)
- MacBook unavailable 시 HTTP 503 + error_reason=macbook_unavailable.
Gemma 자동 fallback X (정정 4 의 연장)
G0 (구현 전 hard gate, plan b-velvety-hare.md):
- G0-1 fixture (tests/fixtures/qwen_tool_call_response.json): 실제 mlx-vlm
응답 박제. shape = OpenAI 표준 호환 (choices[0].message.tool_calls +
function.arguments JSON string). generate_with_tools() 가 본 shape 기준 구현.
- G0-2 counter semantics: max_tool_rounds=2 + max_llm_calls=3 + search_exec_max=2.
마지막 LLM 호출은 tool_choice="none" + system instruction 으로 final 강제.
- G0-3 trace exposure: default response 의 debug_trace=null. debug=true 시만
채움. server log 에는 항상 round 기록.
backends.py (193 → 261줄):
- QwenMacBookBackend.generate_with_tools(messages, tools, tool_choice)
신규 method. 기존 generate() 무수정. BackendUnavailable 처리 동일.
react_loop.py 신규 (275줄):
- agentic_ask_loop(session, query, *, backend, max_tool_rounds, debug)
- tool round 안에서 run_search 호출, results dedup by id, final round 강제,
partial=True 조건 (final content 빈 경우)
search.py (+82줄):
- POST /api/search/ask/react + AskReactRequest/Response schema
- BackendUnavailable → JSONResponse(503, error_reason=macbook_unavailable)
config.yaml + config.py:
- search.ask.react: { enabled, max_tool_rounds=2, search_tool_limit=5,
search_tool_mode=hybrid }
tests (566줄, 18 신규 + 23 회귀 모두 PASS):
- test_react_loop.py 13건: G0-1 fixture shape / G0-2 counter cap / G0-3 trace
exposure / BackendUnavailable propagation / sources dedup
- test_search_ask_react_endpoint.py 5건: 503 + run_search 호출 0 / 정상 200 /
debug=true trace 노출 / max rounds partial
- 회귀 (test_ask_eval_auth 9 + test_search_ask_macbook_503 5 +
test_backend_dispatcher 9) 모두 PASS
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com >
2026-05-22 13:43:47 +00:00
hyungi
a7b8f15870
feat(search): /ask backend dispatcher (qwen-macbook opt-in, no silent fallback)
...
PR-MacBook-RAG-Backend-1 — /api/search/ask 의 명시 backend 선택 진입점.
핵심 invariant (정정 4):
- backend 미지정 = Gemma Mac mini default, 응답 contract 변동 0
- backend="qwen-macbook" 명시 opt-in 만 MacBook M5 Max mlx-vlm.server 호출
- MacBook unavailable 시 HTTP 503 + error_reason=macbook_unavailable
- 자동 fallback 절대 금지 — 실패 path 에서 Gemma backend.generate() 호출 0
backend dispatcher (services/llm/):
- BackendBase / GemmaMacMiniBackend / QwenMacBookBackend / BackendUnavailable
- Qwen backend 는 Mac mini llm_gate 점유 X, 별 Semaphore(1) — llm_gate
docstring 의 single-inference 영구 룰은 같은 endpoint 한정으로 scope 명시
- httpx Connect/Read/Pool/Timeout/5xx → BackendUnavailable, 4xx 전파
synthesis_service.py:
- backend 인자 추가, status="backend_unavailable" 신규
- cache key 에 backend_name 포함 (qwen ↔ gemma 캐시 충돌 차단)
config:
- search.ask.backend.{macmini_url, macbook_url, macbook_model,
timeout_connect_s=1, timeout_read_s=30}
- MacBook endpoint = http://100.118.112.84:8810 (M5 Max Tailscale bind)
tests (14 신규):
- tests/services/test_backend_dispatcher.py (9): dispatcher 정합성 + Qwen
generate path (mock 200 / dead port / 5xx / 4xx) + cache identity
- tests/api/test_search_ask_macbook_503.py (5): 정정 4 핵심 invariant.
backend=qwen-macbook 비가용 시 gemma.generate.assert_not_called()
기존 ask 회귀 0 (test_ask_eval_auth 9건 등 85건 모두 PASS).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com >
2026-05-22 13:10:44 +00:00
Hyungi Ahn
224843ba25
ops(reports): local research M1/M2/M3 baseline 등록 (2026-05-02)
...
- M1: ProcessingQueue throughput baseline (GPU DB pkm, read-only)
- M2: MLX gemma-4 26b-a4b 동시 처리 capacity (Mac mini :8801)
- M3: bge-m3 batch embedding throughput (GPU Ollama :11434)
3 보고서 모두 4.0 가드 준수 (compose/migration/queue/worker restart/source_channel insert/SearXNG 도입 0건). trade-in 직전 untracked sync.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com >
2026-05-21 07:25:27 +09:00
Hyungi Ahn
95bea0a88b
ops(worker-pool): docker-compose 에 LAPTOP_WORKER_BOT env 3개 wire-through
...
1B/1C 단계에서 host .env 변수가 fastapi 컨테이너에 주입되지 않은 누락.
voice-memo 동일 패턴으로 environment 블록에 명시 + default false.
PR-Notebook-Client-1 에서 username swap (laptop-worker-bot → notebook-client-bot)
시 env override 로 적용 가능.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com >
2026-05-20 08:12:12 +09:00
Hyungi Ahn
eae1f48d62
feat(worker-pool): Registry-1C cap 1MB + deterministic compaction
...
사용자 결정 2026-05-19: 100KB cap 이 운영 7d 데이터 1.36MB 대비 부족 →
cap 상향만으로 raw 비대화 위험. cap 1MB + payload compaction 병행.
fetch_recap_context() 변경:
- memo payload item field 축소 = id/title/ai_tldr/ai_event_kind/created_at (5 필드)
(ai_bullets/file_type/source_channel/category/extracted_text 등 제외)
- memo top-N = RECAP_MEMO_TOP_N env (default 200) — 초과분은 aggregate 로
- aggregate = memos_by_day + memos_by_kind + omitted_memos
- payload_compacted flag = aggregate fallback 발현 여부
- events 는 raw (운영 7d 데이터에서 통상 0~소량)
internal_worker.py:
- PAYLOAD_MAX_BYTES → _payload_max_bytes() env override
(WORKER_RECAP_PAYLOAD_MAX_BYTES default 1_000_000)
- JobsRecapResponse 에 payload_compacted / omitted_memos 노출
- 413 detail 에 "after compaction" 명시 + RECAP_MEMO_TOP_N 조정 안내
테스트 3 항목 신규 + 기존 endpoint 413 test 업데이트:
- 700 memo → 200 kept + 500 omitted + compacted=true + < 1MB
- 10 memo → compacted=false + omitted=0
- 비정상 큰 title (compaction 후에도 cap 초과) → 413 유지
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com >
2026-05-19 12:55:51 +09:00
Hyungi Ahn
0ea72c1aa6
feat(worker-pool): Registry-1C recap context + /jobs/recap + 100KB guard
...
- app/services/worker_recap_context.py — fetch_recap_context(user_id, days)
documents file_type='note' 7d (single-user invariant) + events 7d
(user_id 매칭 + cancelled 제외) JOIN. timezone Asia/Seoul.
- /internal/worker/jobs/recap POST — 일반 user JWT 인증 + context 조립
+ worker_jobs INSERT. job_type='recap' + payload JSONB.
- payload 100KB guard — JSON 직렬화 100_000 bytes 초과 시 413.
- 회귀 위험 0: memos/events API select 절 touch 0, read-only 쿼리만.
worker-pool-policy §B.2 invariant 보존: ProcessingQueue 무변경, 운영 자동
분기 변경 0, canonical promote 0 (worker_jobs.payload JSONB only).
Notebook-Pilot-1 entry condition 4항목 모두 충족 가능:
manual recap E2E / payload <100KB guard / residue 0 / 권한 분리 403.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com >
2026-05-19 12:44:07 +09:00
Hyungi Ahn
0cbd97fcba
refactor(worker-pool): Registry-1B test fixture — NullPool helper standalone
...
각 helper 가 자체 engine + NullPool 사용 (connection 격리). fixture chain 의
asyncpg "another operation in progress" race 회피. 호출 site 단순화.
같은 파일 sequential 실행 시 module-level app + global engine pool 충돌은
별 follow-up `PR-Worker-Pool-Test-Fixture-Isolation` (P3) 영역.
단독 PASS 검증: auth 5/5 + smoke 3/3 + ownership 1/1.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com >
2026-05-19 12:43:53 +09:00
Hyungi Ahn
f60d6e52fc
feat(worker-pool): Registry-1B Pull 활성화 (auth + worker_jobs + 5 endpoint)
...
worker-pool-policy §B 1B 영역 완료. 1A scaffold (mig 270~274 + 503 stub) 위에:
- mig 275/276: worker_jobs (status CHECK + user_id=owner) + pending partial index
- create_laptop_worker_bot_token + require_worker_user dependency (voice-memo 동형)
- /internal/worker/{register,heartbeat,claim,result,drain} 5 endpoint 실 구현
- /claim FOR UPDATE SKIP LOCKED + 204 body 0
- /result 소유권 검증 (worker_id 매칭, 404) + failed 재시도 (attempts/max)
- explicit failure 시 request.result 무시 (DB result NULL 유지)
- 테스트 22 항목 7 파일
policy §B.2 5 invariant 보존: voice-memo wrapper 변경 0, drain advisory,
result raw JSONB, ProcessingQueue 무변경, 운영 자동 분기 변경 0.
활용처 (recap context + /jobs/recap + payload 100KB guard) = Registry-1C 영역.
stale recovery / 노트북 client / canonical promote = Notebook-Pilot-1 영역.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com >
2026-05-19 08:54:07 +09:00
hyungi
acd29b963e
ops(triage): event_kind_hint diagnostic logging cleanup (PR-4B Apply 영구 보류)
...
chore-memo-NULL-backfill 6/6 H1 (historical artifact) 확정 후 Apply PR 영구 보류.
406b810 의 8-line logger.info 블록 제거 (behavior 변경 0, 진단 데이터 더 이상 불필요).
backup: app/workers/classify_worker.py.pre-eventkind-cleanup (7일 안전망 ~2026-05-25)
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com >
2026-05-18 11:27:29 +00:00
Hyungi Ahn
bbd92a840a
feat(worker-pool): Registry-1A scaffold — worker_capabilities/heartbeats + /internal/worker/* 5 endpoint 503 stub
...
PR-Worker-Pool-Registry-1A (scaffold only, no runtime activation).
신규:
- migrations/270~274 (1 statement/1 file 강제): worker_capabilities + 2 idx + worker_heartbeats + 1 idx
- app/models/worker_pool.py: WorkerCapability + WorkerHeartbeat ORM (queue.py 패턴)
- app/api/internal_worker.py: 5 endpoint 모두 _stub_503() — register/heartbeat/claim/result/drain
- tests/test_internal_worker_stub.py: 503 응답 smoke (inline ASGI client, DB 의존 0)
수정:
- app/main.py: import + include_router 각 1줄 (prefix=/internal/worker, internal_study 일관)
scaffold-first + phase-gate-material-first 강제 (worker-pool-policy §1, §12):
- 인증 dependency 0 (1B 에서 JWT + require_worker_user)
- ProcessingQueue 변경 0 (방향 b: worker_jobs 별 table = 1B)
- LLM 호출 0 / canonical DB 변경 0 / 운영 자동 분기 0
회귀 0 (1주 안전망 = app/main.py.pre-registry-1a.20260518).
plan: ~/.claude/plans/floofy-exploring-mitten.md
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com >
2026-05-18 20:24:59 +09:00
hyungi
406b810e28
ops(triage): PR-4B-Diagnose-EventKindHint-Layer-A — diagnostic logging (no behavior change)
...
Layer-A Diagnose only. classify_worker.py:691 직전에 event_kind_hint 의
raw/normalized/in_valid/confidence 값 capture (logger.info 5줄 insert,
lazy formatting + %r repr). guard 통과 X 의 specific root cause (A1 field
부재 / A2 빈 string / A3 invalid enum) 확정용.
specific fix (default note / enum mapping / prompt 강화) 는 별 PR-4B-Fix-EventKindHint-Apply.
Apply PR closure gate 에 logging cleanup (info → DEBUG 또는 제거) 흡수.
plan: ~/.claude/plans/c-1-pr-infra-drift-1-phase-1b-linear-frost.md
backup: app/workers/classify_worker.py.pre-4b-eventkind-logging.20260517
2026-05-17 06:41:32 +00:00
hyungi
8998cbea8c
ops(triage): PR-4B-Diagnose — exception logging 강화 (type/repr/exc_info)
...
Layer 1 root cause 진단을 위해 classify_worker.py:595 의 exception logging
을 lazy formatting + exc_info=True 로 강화. f-string 1줄 → 5줄 block.
- type=%s: exception class name (TimeoutError/JSONDecodeError/ValueError/etc.)
- repr=%r: full exception state
- exc_info=True: traceback 까지 capture (wrapper 정확 지점 추적)
본 PR scope = Diagnose only. Layer 1 specific fix (H1/H2/H3/H4) + Layer 2
escalate path ai_event_kind fallback set 은 별 PR queue.
plan: ~/.claude/plans/c-1-pr-infra-drift-1-phase-1b-linear-frost.md
backup: app/workers/classify_worker.py.pre-4b-diagnose.20260517
2026-05-17 06:22:27 +00:00
hyungi
74876b674c
feat(auth): JWT iat + users.password_changed_at invalidation (PR-Docsrv-JWT-Invalidation-1)
...
PR-Infra-Sec-1H Phase 0 audit 에서 DS jwt invalidation 정책 부재 확정.
password rotation 으로 구 365d JWT (voice-memo-bot 등) invalidate 안 되는
hard gate STOP 진입 → 선행 PR 분리.
- migration 269: users.password_changed_at timestamptz NULL (legacy 호환)
- create_access_token / create_refresh_token: payload 에 iat (int 초) 추가
- verify_password_changed_at helper: int(password_changed_at.timestamp()) > int(iat) 시 401
- get_current_user + refresh_token route: verify helper 호출
- change_password / setup signup / seed_admin INSERT+UPDATE: password_changed_at 갱신
NULL = 검증 skip (migration 직후 운영 영향 0). 첫 password 변경 후만 iat
검증 활성. Sec-1H 의 G-token-old hard gate 통과 path 확보.
2026-05-17 06:20:46 +00:00
Hyungi Ahn
b8575084b1
docs(search): DS-Mac-mini-26B-Priority-Gate-1 (B-1) closure 보고서
...
priority separation 완료. FIFO Semaphore → heap + inflight fair queueing.
10 site (FG 6 + BG 4) 교체. 동시성 1 유지, 모델 라우팅 변경 0.
검증 (V0~V4 all PASS):
- V0 사전 grep: query_analyzer = BACKGROUND 확정 (fire-and-forget only)
- V1 unit 6/6 PASS (FIFO / FG jump / preemption X / mixed / backward compat /
cancelled waiter skip)
- V2 PR-1 Layer 1 fixture 회귀 0 (10/10 HTTP 200, p50=11.1s 자연 회복)
- V3 synthetic FG jump: bg0 release → fg dispatch (bg1~4 jump). dispatch log
`mlx_gate dispatch priority=FOREGROUND seq=5 wait_ms=1502 queue_len=4`
- V4 legacy grep: user-facing 코드 잔재 0, Semaphore-like 패턴 0
후속 = Phase 2 (digest/briefing Semaphore 통합 + verifier/call_triage gate 안 +
starvation aging) + B-2 (throughput).
closure 4 필수 단락 포함: query_analyzer 판정 / study_explanation owner /
preemption 한계 / starvation WARN (post-deploy follow-up, closure gate 아님).
plan: ~/.claude/plans/hermes-polymorphic-rossum.md
2026-05-17 08:58:38 +09:00
Hyungi Ahn
a08b620894
refactor(search): swap 10 call sites to acquire_mlx_gate(Priority.*) (B-1)
...
DS-Mac-mini-26B-Priority-Gate-1 — 사용자-facing 7 + worker 3 = 10 site 의
`async with get_mlx_gate():` → `async with acquire_mlx_gate(Priority.*):` 교체.
Foreground 6 (user-facing path):
- app/services/search/evidence_service.py:315 (/ask evidence stage)
- app/services/search/classifier_service.py:103 (/ask classifier stage)
- app/services/search/synthesis_service.py:299 (/ask synthesis stage)
- app/api/documents.py:1306 (수동 analyze API)
- app/api/study_topics.py:1183 (subject note 동기 생성)
- app/api/study_questions.py:1560 (study explanation 동기 API)
Background 4 (worker queue / fire-and-forget):
- app/services/search/query_analyzer.py:240 (V0 grep 확인: fire-and-forget only,
search_pipeline.py:179 trigger_background_analysis 만, docstring rule
"analyze() 동기 호출 금지" 부합 → BACKGROUND 확정)
- app/workers/deep_summary_worker.py:110 (classify-escalate worker)
- app/workers/study_explanation_worker.py:149
- app/workers/study_session_analysis_worker.py:237
Cleanup:
- query_analyzer._get_llm_semaphore() 제거 — self-only, unused, signature 거짓말
(이제 get_mlx_gate 가 Semaphore 아닌 context manager 반환)
기존 get_mlx_gate() legacy wrapper 는 보존 (BACKGROUND 매핑). user-facing path
잔재 0 — closure gate grep 검증 통과 (별 commit 에서).
2026-05-17 08:51:57 +09:00
Hyungi Ahn
7c9aff393a
feat(search): MLX priority gate (B-1, Priority.FOREGROUND vs BACKGROUND)
...
DS-Mac-mini-26B-Priority-Gate-1 — Mac mini 26B single-inference gate 를
FIFO Semaphore → 우선순위 기반 heap dispatch 로 교체. concurrency 1 유지,
queue ordering 만 foreground 우선.
API:
- Priority(IntEnum): FOREGROUND=0, BACKGROUND=100
- acquire_mlx_gate(priority=DEFAULT_PRIORITY) async context manager
- DEFAULT_PRIORITY = BACKGROUND (안전 default, foreground 짓밟지 않음)
- get_mlx_gate() legacy wrapper — context-manager only 호환
구현:
- _inflight: bool + _waiters heap [(priority, seq, future, enqueue_ts)]
- fast-path: not inflight and not waiters → 즉시 inflight, Future 생성 X
- _dispatch_next_locked: cancelled/done Future skip (heap 잔재 risk 회피)
- release: lock 안에서 pop, set_result 는 loop.call_soon (lock 밖) reentry deadlock 회피
- dispatch / enqueue / release / WARN log (observability)
- BACKGROUND wait_ms > 300_000 (5분) 시 starvation WARN — aging 은 Phase 2 deferred
Tests (tests/test_priority_gate.py, 6 scenario):
1. FIFO within same priority
2. Foreground jumps queue (bg5 대기 중 fg 들어오면 즉시 다음 슬롯)
3. Long-running background blocks foreground (preemption X, intended)
4. Mixed concurrent enqueue (FG fifo 먼저, BG fifo 후)
5. Backward compat (legacy get_mlx_gate() = BACKGROUND 매핑)
6. Cancelled waiter skip (heap 의 죽은 Future 건너뜀, gate stuck X)
Site 교체는 별 commit (refactor(search): swap 10 call sites).
plan: ~/.claude/plans/hermes-polymorphic-rossum.md
2026-05-17 08:42:58 +09:00
Hyungi Ahn
7e346d2d3f
docs(search): DS-Synthesis-Timeout-Calibration-1 (B-3) closure 보고서
...
5곳 LLM_TIMEOUT_MS + 2곳 outer wait_for align (classifier 30s 와 동일 정책).
synthesis/evidence/verifier/query_analyzer 모두 동시 부하 시 30s 까지 필요.
Regression fixture 결과: 10/10 HTTP 200 + 5/5 search + 3/3 failure injection
모두 PASS (회귀 0). 응답 시간 +4~20s 증가 (안정성 ↑ 의도된 trade-off).
p95 12s gate 는 여전히 FAIL — B-1 Throughput-1 (priority queue / 모델 분리)
별 plan 으로 latency 단축 방향 진입.
2026-05-17 08:07:51 +09:00
Hyungi Ahn
73f328cb65
fix(search): DS RAG LLM_TIMEOUT_MS align 15s/3s → 30s/10s (B-3 Synthesis-Timeout-Calibration-1)
...
PR-Hermes-Docsrv-Search-1 closure 측정 (synthesis_ms=30~48s / ev_ms=15005 /
query_analyze 45s) 으로 15s LLM_TIMEOUT 빈발 timeout 확인. Mac mini 26B 동시
호출 (gate Semaphore 1 직렬화 후에도 evidence + synthesis + classifier +
query_analyzer + verifier 가 sequential 누적) 시 각 호출 30s 까지 필요.
5곳 변경:
- synthesis_service.LLM_TIMEOUT_MS 15000 → 30000
- evidence_service.LLM_TIMEOUT_MS 15000 → 30000
- verifier_service.LLM_TIMEOUT_MS 3000 → 10000
- query_analyzer.LLM_TIMEOUT_MS 15000 → 30000
- search.py:522 classifier wait_for 15.0 → 30.0 (classifier_service align)
- search.py:641 verifier wait_for 4.0 → 10.0 (verifier_service align)
classifier (이전 PR 에서 30s 로 align 완료) 와 동일 정책 — outer wait_for
가 inner LLM_TIMEOUT_MS 를 override 하지 않도록 align.
ask 응답 latency 상한 ↑ 의도된 trade-off — 안정성 (refusal_gate
conservative_refuse 회피 + grounding/verifier 정상 동작) 우선.
영향: PR-1 fixture 회귀 0 예상 (이전 timeout 이 새 한도 안). B-1 Throughput-1
(priority queue / 모델 분리) 별 PR 진입 시 latency 본격 단축 검토.
2026-05-17 08:01:22 +09:00
Hyungi Ahn
117597c8aa
docs(hermes): PR-Hermes-Skill-Curl-Refine-2 (SHIPPED) + MaxTokens-Followup (PARTIAL+REVERTED)
...
Curl-Refine-2 (SHIPPED): 3 SKILL.md 본문 "Tool 선택 (필독)" 단락 추가 — terminal
direct curl 강조 + execute_code Python wrap 금지. E2E: Gemma 1st turn
execute_code → terminal 전환 + DS API 도달 0→1 + real corpus citations
("test-voice-memo", "The Good List") 첫 성공. Hard-Enforcement-1 의 hook 와
시너지 (1 call cap + 1st 정상 path).
MaxTokens-Followup 1차 (PARTIAL+REVERTED): agent.disabled_toolsets 15 toolsets
비활성 → stream 102KB→80KB 22% 감소. BUT Gemma terminal tool_call 시
"invalid tool call" 회귀 발생 → revert. toolset dependency graph 조사 후
minimal safe disabled list 결정 = 별 트랙 PR-Hermes-MaxTokens-Investigation-1.
A 카테고리 6 PR + 부산 Curl-Refine-2 모두 SHIPPED. PR-1/2 user-facing E2E 완성.
2026-05-17 07:51:02 +09:00
Hyungi Ahn
9458bea595
docs(hermes): PR-Hermes-MultiTurn-Hard-Enforcement-1 closure 보고서
...
Polish-1 의 prompt-only enforcement (PARTIAL) escalate. Shell hook
(~/.hermes/agent-hooks/docsrv_repeat_block.py) + config.yaml hooks.pre_tool_call.
execute_code/terminal tool_input 의 DS endpoint URL regex 검출 후 session-별
카운트 ≥ 1 면 silent block.
검증:
- Unit smoke 4/4 PASS
- E2E hook 매칭 2건 정확: 1st execute_code (Python wrap) allow → 2nd terminal
(direct curl) block. state={"docsrv_ask": 1}.
부산 발견: Gemma 의 1st turn code generation quality (Python f-string + curl
wrap → SyntaxError) 으로 DS API 실 호출 0 — Hermes/Adapter A 무관, 별 트랙
PR-Hermes-Skill-Curl-Refine-2 (P3).
2026-05-17 07:35:07 +09:00
Hyungi Ahn
dffc8b24dd
docs(hermes): PR-Hermes-Skill-Polish-1 closure 보고서
...
3 SKILL.md (docsrv_memo/search/ask) frontmatter 표준화 — prerequisites.env →
required_environment_variables (agentskills.io 표준). skill_view 시 자동
register_env_passthrough 발화 + config-level terminal.env_passthrough 와
이중 안전망.
docsrv_ask 본문: Multi-Turn 차단 정책 + Response Format verbatim 강화.
검증:
- Layer 1 fixture 회귀 0 (5/5 raw_leak, 3/3 finish_reason 동일)
- E2E: pre-polish 4 turn → post-polish 3 turn (25% 감소, but 목표 1 turn 도달 X)
— prompt-only enforcement 한계 명확화
결정:
- Skill-Curl-Refine-1 (frontmatter) = SHIPPED
- Multi-Turn-Refinement-1 (prompt) = PARTIAL — plugin-level escalate
- 신규 트랙 PR-Hermes-MultiTurn-Hard-Enforcement-1 (P2) 박힘 (Answer-Policy-1
과 통합 검토)
2026-05-17 07:13:53 +09:00
Hyungi Ahn
bd89d07b70
docs(hermes): PR-Hermes-Sandbox-Env-Propagation-1 closure 보고서
...
PR-Hermes-Docsrv-Search-1 / PR-Hermes-WebSearch-1 의 user-facing E2E 마지막 조각.
Adapter A 후 잔존한 401: execute_code/terminal 샌드박스가 HERMES_DOCSRV_TOKEN
strip. 해결 = ~/.hermes/config.yaml terminal.env_passthrough 1줄 추가.
검증:
- Direct: is_env_passthrough("HERMES_DOCSRV_TOKEN")=True, CLAUDE_API_KEY=False
(GHSA-rhgp-j443-p4rf provider blocklist 유지)
- E2E: Hermes chat → DS API 200 → conf=medium completeness=full + real corpus
citations ("test-voice-memo", "The Good List: 6 Things to Add Joy to Your Day")
PR-1/2 user-facing E2E unlock 완료 — Discord smoke 검증 진입 가능
(가족 onboarding 전 hyungi 채널 한정).
2026-05-17 06:37:35 +09:00
Hyungi Ahn
d3bc378c21
docs(hermes): PR-Hermes-ToolCall-Adapter-1 closure 보고서
...
mlx-proxy _stream_mlx 에 SSE filter 추가 — Gemma 4 raw <|tool_call> 토큰 leak
suppression + 구조화 tool_calls 시 finish_reason 'stop'→'tool_calls' override.
Layer 1 fixture (5 case): 5/5 raw_leak suppressed + 3/3 finish_reason override.
Hermes chat multi-turn agent loop unlocked (이전 hallucinated 종결 → tool 실행).
후속 = PR-Hermes-Sandbox-Env-Propagation-1 (execute_code 가
HERMES_DOCSRV_TOKEN inherit 못 함 — PR-1/2 user-facing E2E 마지막 조각).
2026-05-16 20:42:34 +09:00
Hyungi Ahn
e5345d7832
docs(hermes): PR-Hermes-WebSearch-1 closure 보고서
...
ddgs (DuckDuckGo) provider 활성. Layer 1 fixture 4/4 results (p95 12.3s, ddgs raw
latency 한계).
SearXNG (LocalScout PR-A 잔존) 활성화는 PR-2B 로 분리 — LAN-only bind 로 Mac mini
Tailscale 접근 불가. ddgs 1주 사용 후 SearXNG swap ROI 판정 예정.
channel_prompts 9줄 통합 (PR-1 4줄 + PR-2 web 분기 5줄). LLM tool-call 실제
실행은 Adapter A blocker — Layer 2/3 user-facing E2E 는 Adapter A closure 후.
2026-05-16 20:22:43 +09:00
Hyungi Ahn
d14064b225
docs(hermes): PR-Hermes-Docsrv-Search-1 closure 보고서
...
Hermes 의 첫 read-only orchestrator (docsrv_search + docsrv_ask skill) 구현 + DS-side
Mac mini 26B concurrent load 5건 fix closure.
핵심:
- Layer 1 curl-direct fixture 10/10 HTTP 200 + failure 3/3 PASS
- DS-side 5 commit 으로 race condition 해소 (LLM_TIMEOUT, gate, wait_for, config)
- Layer 2 Hermes CLI invoke 는 Gemma 4 tool-call leak 으로 hallucinated — Adapter A blocker
- Layer 3 Discord smoke 도 동일 — 사용자 검증은 Adapter A closure 후 이월
후속 5 별 트랙 명시.
2026-05-16 20:07:18 +09:00
Hyungi Ahn
ad3d51e3e0
fix(search): classifier + evidence gate 안으로 이동 (Mac mini 26B race 종결)
...
llm_gate.py docstring 영구 룰: "MLX primary 호출 경로는 예외 없이 gate 획득 필수".
PR #20 이후 classifier (Mac mini 26B 신규) + evidence (triage→Mac mini 26B 통합)
모두 gate 외부 실행 — concurrent 안전성 별 검토 명시. 1주 관찰 결과: race 빈번.
본 PR-Hermes-Docsrv-Search-1 Layer 1 fixture 측정:
- 8/10 query "conservative_refuse(no_classifier)" — classifier 가 동시 부하 시
거의 모두 ReadTimeout 또는 wait_for(6s) timeout
- evidence ev_ms=15005 — synthesis 와 race 로 15s 누적
영향:
- ask total 시간 증가 (parallel race → serialized): query_analyzer 5s +
classifier 3-5s + evidence 5s + synthesis 30s ≈ 40-45s 상한 (현실 평균)
- 응답률 ↑: race timeout 으로 인한 conservative_refuse 해소
- 사용자 체감: 빠른 거절 → 의미있는 답변. 단 대기 시간 ↑
후속:
- skill `docsrv_ask` curl `--max-time 20` → 60s 상향 필요 (별 PR 또는 본 PR
안의 follow-up)
- 본 메모리 `2026-05-21 Mac mini 26B 1주 부하 측정` observation 의 결정
outcome: gate 복귀 (triage 별 작은 모델 재도입 옵션은 보류)
2026-05-16 19:54:55 +09:00
Hyungi Ahn
5846baedc7
fix(search): ask classifier wait_for 6s → 15s (outer wrapper override 해소)
...
A1 (LLM_TIMEOUT_MS 5→15→30) + config(10→15→30) 후속 진단: 8/10 fixture query 가
"classifier ok" 또는 "classifier error" 로그 없이 conservative_refuse(no_classifier)
경로. search.py:518 의 outer wrapper `asyncio.wait_for(classifier_task, timeout=6.0)`
가 classifier_service.LLM_TIMEOUT_MS 와 httpx timeout 모두 override.
6s 한계 → 동시 부하 시 거의 모든 classifier 호출 6s 안에 못 끝남 → AsyncIO TimeoutError
→ ClassifierResult("timeout") → refusal_gate 가 verdict=None 받아 conservative_refuse.
15s 로 상향 — classifier_service 내부 30s 와 align 하지 않은 이유 = ask 응답 시간 상한
유지 (evidence parallel 종료 후 추가 9s 대기 cap). Mac mini 26B 동시 부하 시 실측
elapsed 11-14s 까지 자주 발생 → 15s 가 합리 균형.
본 fix 가 진짜 closure 효과. PR-Hermes-Docsrv-Search-1 Layer 1 fixture 의 8/10
no_classifier 경로 해소 예상.
2026-05-16 19:46:49 +09:00
Hyungi Ahn
a332a8aabe
fix(search): classifier timeout 15s → 30s (concurrent load 2x margin)
...
A1+config(15s) 후속 진단: voice memo PoC plan 호출 elapsed_ms=14432 — 15s 한계 거의
밀착. Mac mini 26B 동시 부하 (classifier + evidence + synthesis 3-way) 시 빈번
ReadTimeout 잔존.
30s 로 2x 마진 확보 — config.yaml + classifier_service.py 양쪽 align. Phase 3.5
guardrail 동작 자체에는 영향 없음 (timeout 시 fallback 경로 동일).
향후 별 트랙 (DS-Mac-mini-26B-Concurrent-Load-1): asyncio.Semaphore 도입으로
Mac mini 26B 동시 호출 제한 vs triage 만 작은 모델 재도입. 본 PR 은 timeout
완화만.
2026-05-16 19:42:49 +09:00