From b8575084b1e7fa139ec67cd5a50009dd58b716a1 Mon Sep 17 00:00:00 2001
From: Hyungi Ahn
Date: Sun, 17 May 2026 08:58:38 +0900
Subject: [PATCH] docs(search): DS-Mac-mini-26B-Priority-Gate-1 (B-1) closure report
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Priority separation complete. FIFO Semaphore → heap + inflight fair queueing.
10 call sites (FG 6 + BG 4) swapped. Concurrency stays at 1, model routing changes: 0.

Verification (V0~V4 all PASS):
- V0 pre-check grep: query_analyzer = BACKGROUND confirmed (fire-and-forget only)
- V1 unit 6/6 PASS (FIFO / FG jump / no preemption / mixed / backward compat / cancelled waiter skip)
- V2 PR-1 Layer 1 fixture: 0 regressions (10/10 HTTP 200, p50=11.1s, recovered naturally)
- V3 synthetic FG jump: bg0 release → fg dispatch (jumps bg1~4). dispatch log `mlx_gate dispatch priority=FOREGROUND seq=5 wait_ms=1502 queue_len=4`
- V4 legacy grep: 0 leftovers in user-facing code, 0 Semaphore-like patterns

Follow-up = Phase 2 (digest/briefing Semaphore unification + moving verifier/call_triage inside the gate + starvation aging) + B-2 (throughput).

The closure report includes the 4 required sections: query_analyzer verdict / study_explanation owner / preemption limits / starvation WARN (post-deploy follow-up, not a closure gate).
plan: ~/.claude/plans/hermes-polymorphic-rossum.md
---
 reports/pr_ds_priority_gate_1_closure.md | 162 +++++++++++++++++++++++
 1 file changed, 162 insertions(+)
 create mode 100644 reports/pr_ds_priority_gate_1_closure.md

diff --git a/reports/pr_ds_priority_gate_1_closure.md b/reports/pr_ds_priority_gate_1_closure.md
new file mode 100644
index 0000000..db6a6c3
--- /dev/null
+++ b/reports/pr_ds_priority_gate_1_closure.md
@@ -0,0 +1,162 @@
+# DS-Mac-mini-26B-Priority-Gate-1 (B-1) Closure Report
+
+**Date**: 2026-05-17
+**Plan**: `~/.claude/plans/hermes-polymorphic-rossum.md`
+**Commits**:
+- `7c9aff3` feat(search): MLX priority gate (heap + inflight + 6 scenario tests)
+- `a08b620` refactor(search): swap 10 call sites to acquire_mlx_gate(Priority.*)
+
+## Summary
+
+Resolves the **priority inversion** that remained after the PR-Hermes-Docsrv-Search-1 and B-3 Synthesis-Timeout-Calibration-1 closures (inside the FIFO Semaphore, background work blocked user-facing asks). Concurrency stays at 1, model routing changes are 0, and only queue ordering changes to favor foreground. Latency/throughput are not changed here (separate B-2 track).
+
+## Changes
+
+### New API (`app/services/search/llm_gate.py`)
+- `Priority(IntEnum)`: `FOREGROUND=0`, `BACKGROUND=100`
+- `acquire_mlx_gate(priority=DEFAULT_PRIORITY)` async context manager: fair queueing based on heap + inflight
+- `DEFAULT_PRIORITY = BACKGROUND` (safe default: a call that does not specify a priority cannot trample foreground)
+- `get_mlx_gate()` legacy wrapper: **context-manager-only compatibility** (`async with` form only)
+- Internals: `_waiters` heap `(priority, seq, future, enqueue_ts)`, `_inflight: bool`, `_lock: asyncio.Lock`, `_seq`
+- fast-path: not inflight + empty queue → inflight immediately, no Future created
+- `_dispatch_next_locked()`: skips cancelled/done Futures (avoids the stuck-heap risk)
+- release: pop inside the lock, `set_result` via `loop.call_soon` outside the lock (avoids reentry deadlock)
+- **logging**: `mlx_gate enqueue/dispatch/release priority=... seq=... wait_ms=... queue_len=...`
+- BACKGROUND wait_ms > 300_000 (5 min) emits a **starvation WARN**; aging is Phase 2
+
+### 10 call sites swapped
+
+**Foreground 6** (user-facing path):
+| File | Line | Rationale |
+|---|---|---|
+| `app/services/search/evidence_service.py` | 315 | `/api/search/ask` evidence stage |
+| `app/services/search/classifier_service.py` | 103 | `/api/search/ask` classifier stage |
+| `app/services/search/synthesis_service.py` | 299 | `/api/search/ask` synthesis stage |
+| `app/api/documents.py` | 1306 | user-triggered manual analyze API |
+| `app/api/study_topics.py` | 1183 | synchronous subject note generation |
+| `app/api/study_questions.py` | 1560 | synchronous study explanation API |
+
+**Background 4** (worker queue / fire-and-forget):
+| File | Line | Rationale |
+|---|---|---|
+| `app/services/search/query_analyzer.py` | 240 | **confirmed by V0 grep**: fire-and-forget only (see below) |
+| `app/workers/deep_summary_worker.py` | 110 | classify-escalate worker |
+| `app/workers/study_explanation_worker.py` | 149 | queue-based worker |
+| `app/workers/study_session_analysis_worker.py` | 237 | queue-based worker |
+
+### Cleanup
+- Removed the `_get_llm_semaphore()` function (unused, referenced only by itself; its signature was misleading because `get_mlx_gate` now returns a context manager)
+
+## Required sections (4)
+
+### 1. query_analyzer priority verdict: **BACKGROUND confirmed**
+
+V0 grep:
+```bash
+grep -RnE 'trigger_background_analysis|query_analyzer|analyze_query' app/api app/services/search
+```
+Result:
+- `app/services/search/query_analyzer.py:25` docstring rule: **"Synchronous calls to analyze() are forbidden. The retrieval path uses only get_cached() + trigger_background_analysis()."**
+- `app/services/search/search_pipeline.py:164` `query_analyzer.get_cached(q)` (no LLM call)
+- `app/services/search/search_pipeline.py:179` `query_analyzer.trigger_background_analysis(q)` (fire-and-forget)
+
+→ The `/api/search/ask` critical path **does not await** the `query_analyzer.analyze()` LLM call. `Priority.BACKGROUND` confirmed. Foreground precedence is guaranteed by the 3 sites evidence + classifier + synthesis.
+
+### 2. study_explanation_worker active owner: **active**
+
+Result of `grep -Rn "study_question_jobs\|StudyQuestionJob" app/main.py app/workers/`: `study_q_embed (1m)` / `study_queue (1m)` / `study_session_queue (1m)` are registered with the APScheduler in `app/main.py` and are active. `study_explanation_worker` is active as the consumer of study_queue. Verified 0 regressions after the change.
+
+### 3. Preventing the "why doesn't foreground wait zero seconds?" misunderstanding
+
+This PR **does not implement preemption**. A background call that is already inflight (e.g. a 30s digest LLM call) is not interrupted. Even when a user ask arrives, it waits **for the remaining time of the current inflight call**.
+
+`What improved`: when background calls 2~5 are lined up in the queue, an arriving foreground call **jumps the entire line**. That is, `max wait = remaining time of the current inflight call` (before: `current inflight + every queued background call`).
+
+E2E measurement (synthetic):
+- bg0 (2s) inflight + bg1~4 waiting + fg enqueue → **fg dispatch** right after bg0 release (wait_ms=1502, bypassing queue_len=4) → bg1~4 in order
+- log evidence: `mlx_gate dispatch priority=FOREGROUND seq=5 wait_ms=1502 queue_len=4`
+
+### 4. Background starvation WARN: not a closure gate, post-deploy follow-up
+
+Closure of this PR is decided by the same-day hard gates (6/6 unit + synthetic foreground jump + PR-1 fixture regression).
+
+The frequency of `mlx_gate background waiter starved wait_ms=... priority=BACKGROUND seq=...` lines is **input for the follow-up one week later**:
+- 0 occurrences → normal, no aging mitigation needed
+- frequent → **signal to decide on entering Phase 2 (aging)** (Phase 2 = digest/briefing sem unification + moving verifier/call_triage inside the gate + aging)
+
+## Verification results
+
+### V0: Pre-check grep (query_analyzer priority decision) ✅
+See §1 above.
+
+### V1: Unit smoke 6/6 PASS ✅
+`tests/test_priority_gate.py` inside the container (inline runner, because pytest is not installed):
+
+```
+  PASS s1_fifo_within_same_priority
+  PASS s2_foreground_jumps_queue
+  PASS s3_long_running_background_blocks_foreground
+  PASS s4_mixed_concurrent_enqueue
+  PASS s5_backward_compat
+  PASS s6_cancelled_waiter_skip
+
+Summary: PASS=6 / 6, FAIL=0
+```
+
+### V2: PR-1 Layer 1 fixture regression ✅
+```
+10/10 HTTP 200 + 5/5 search + 3/3 failure injection all PASS
+
+p50 = 11.1s (about half of the earlier B-3 measurement of 23.2s; suspected effect of the priority gate avoiding the background race)
+p95 = 30.2s (ASME outlier, down from the B-3 measurement of 50.7s)
+min/max = 4.6s / 30.2s
+mean = 12.3s
+
+→ **0 functional regressions** + **latency recovered naturally** (latency reduction is not B-1's mandate, but
+   it is a by-product of foreground no longer racing with background)
+```
+
+### V3: Synthetic foreground jump ✅
+See §3 above. Behaves exactly as intended.
+
+### V4: Legacy call leftover check ✅
+```bash
+grep -Rn 'get_mlx_gate' app tests scripts
+grep -Rn '_get_llm_semaphore\|MLX_CONCURRENCY\|asyncio.Semaphore(1)' app tests scripts
+```
+Result:
+- `get_mlx_gate()` calls in user-facing code = **0** (only the docstring, the backup `.pre-llm-reframe-cleanup.20260515`, and the backward-compat case in tests/test_priority_gate.py remain)
+- `_get_llm_semaphore` leftovers = 0 (removed)
+- `asyncio.Semaphore(1)` leftovers = digest/summarizer.py:23 + briefing/comparator.py:42 (known Phase 2 targets)
+- **Semaphore-like direct acquire/release patterns = 0** ✅
+
+## 7-day safety net (2026-05-24)
+
+- git revert `a08b620` (site refactor): keeps the gate itself, only the call sites return to the legacy wrapper
+- git revert `7c9aff3` (the gate itself): full return to the Phase 3.1 Semaphore(1)
+- Partial revert is possible because wrapper compatibility was preserved
+
+## Follow-up tracks (separate PRs, registered in the MEMORY backlog)
+
+| Priority | Track | Entry condition |
+|---|---|---|
+| Phase 2 | Unify `_llm_sem` in digest/summarizer.py + briefing/comparator.py (shared gate + BACKGROUND priority) | Operational observation after Phase 1 closure |
+| Phase 2 | Move classify_worker `call_triage` inside the gate (currently runs outside the gate) | Same as above |
+| Phase 2 | Move verifier_service inside the gate (`Priority.FOREGROUND`, ask path); review of grounding parallel deadlock is mandatory | Same as above |
+| Phase 2 | Background starvation **aging** (waiting time → priority boost); enter if WARN is frequent | After 1 week of WARN measurement |
+| B-2 | DS-Mac-mini-26B-Throughput-1 (priority queue → concurrency >1 / separate triage model / comparison with re-introducing GPU Ollama) | Separate plan |
+
+## Inventory + memory update (separate commit)
+
+- Add a **"MLX Priority Gate"** subsection to the "AI 모델 라우팅" (AI model routing) section of `infra_inventory.md`
+- Mark B-1 closed in the follow-up track table of `project_hermes_docsrv_bridge.md` + register the Phase 2 tracks
+- Remove B-1 from the `MEMORY.md` backlog
+
+## Scope hard limit (re-checked: all respected)
+
+- ❌ MLX_CONCURRENCY ↑ → **unchanged (1)** ✅
+- ❌ Model routing changes → **0 changes** ✅
+- ❌ Re-introducing GPU LLM → **0** ✅
+- ❌ Additional Mac mini models → **single-model policy kept** ✅
+- ❌ digest/briefing/verifier/call_triage gate handling → **Phase 2 deferred** ✅
+- ❌ Background aging → **WARN only, aging in Phase 2** ✅
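
Reviewer note (not part of the patch): the gate behavior described in the report (heap ordered by `(priority, seq)`, an inflight flag, a fast path, skipping cancelled waiters, no preemption) can be illustrated with a minimal sketch. This is not the code in `app/services/search/llm_gate.py`. The class name `PriorityGate`, the `demo()` helper, and the logger name are hypothetical. The sketch also simplifies two details the report mentions: it omits `_lock`, since every state change here happens without an intervening `await`, and it calls `set_result` directly instead of through `loop.call_soon`.

```python
import asyncio
import heapq
import itertools
import logging
import time
from contextlib import asynccontextmanager
from enum import IntEnum

log = logging.getLogger("mlx_gate_sketch")  # hypothetical logger name
STARVATION_WARN_MS = 300_000  # 5 minutes, the threshold quoted in the report


class Priority(IntEnum):
    FOREGROUND = 0
    BACKGROUND = 100


class PriorityGate:
    """Concurrency-1 gate: the waiter with the lowest (priority, seq) runs next."""

    def __init__(self) -> None:
        self._waiters = []  # heap of (priority, seq, future, enqueue_ts)
        self._inflight = False
        self._seq = itertools.count()  # unique tie-breaker keeps FIFO per priority

    def _dispatch_next(self) -> None:
        """Hand the slot to the best live waiter, or mark the gate idle."""
        while self._waiters:
            priority, seq, fut, enqueue_ts = heapq.heappop(self._waiters)
            if fut.done():  # cancelled waiter: skip it so the heap cannot get stuck
                continue
            wait_ms = int((time.monotonic() - enqueue_ts) * 1000)
            if priority == Priority.BACKGROUND and wait_ms > STARVATION_WARN_MS:
                log.warning("background waiter starved wait_ms=%d seq=%d", wait_ms, seq)
            self._inflight = True
            fut.set_result(None)
            return
        self._inflight = False

    @asynccontextmanager
    async def acquire(self, priority: Priority = Priority.BACKGROUND):
        if not self._inflight and not self._waiters:
            self._inflight = True  # fast path: no Future is created
        else:
            fut = asyncio.get_running_loop().create_future()
            entry = (int(priority), next(self._seq), fut, time.monotonic())
            heapq.heappush(self._waiters, entry)
            try:
                await fut
            except asyncio.CancelledError:
                if fut.done() and not fut.cancelled():
                    self._dispatch_next()  # slot was already handed to us: pass it on
                raise
        try:
            yield  # the inflight call is never preempted
        finally:
            self._dispatch_next()


async def demo() -> list:
    """bg0 holds the gate, bg1..bg4 queue up, then fg arrives last."""
    gate = PriorityGate()
    order = []

    async def job(name: str, priority: Priority, hold: float) -> None:
        async with gate.acquire(priority):
            order.append(name)
            await asyncio.sleep(hold)

    tasks = [asyncio.create_task(job("bg0", Priority.BACKGROUND, 0.05))]
    await asyncio.sleep(0)  # let bg0 take the gate via the fast path
    tasks += [asyncio.create_task(job(f"bg{i}", Priority.BACKGROUND, 0)) for i in range(1, 5)]
    await asyncio.sleep(0)  # let bg1..bg4 enqueue
    tasks.append(asyncio.create_task(job("fg", Priority.FOREGROUND, 0)))
    await asyncio.gather(*tasks)
    return order


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

In this sketch `fg` still waits for `bg0` to finish, which matches the report's "no preemption" limit, and then runs ahead of `bg1`..`bg4`, which mirrors the V3 synthetic scenario.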