docs(search): DS-Mac-mini-26B-Priority-Gate-1 (B-1) closure report

Priority separation complete. FIFO Semaphore → heap + inflight fair queueing. 10 call sites swapped (FG 6 + BG 4). Concurrency stays at 1; model routing changes: 0.

Verification (V0~V4 all PASS):
- V0 pre-check grep: query_analyzer confirmed as BACKGROUND (fire-and-forget only)
- V1 unit 6/6 PASS (FIFO / FG jump / no preemption / mixed / backward compat / cancelled waiter skip)
- V2 PR-1 Layer 1 fixture: 0 regressions (10/10 HTTP 200, p50=11.1s, recovered as a side effect)
- V3 synthetic FG jump: bg0 release → fg dispatch (jumps bg1~4). Dispatch log `mlx_gate dispatch priority=FOREGROUND seq=5 wait_ms=1502 queue_len=4`
- V4 legacy grep: 0 leftovers in user-facing code, 0 Semaphore-like patterns

Follow-up = Phase 2 (digest/briefing Semaphore unification + moving verifier/call_triage inside the gate + starvation aging) + B-2 (throughput).

The closure report includes the 4 required sections: query_analyzer verdict / study_explanation owner / preemption limits / starvation WARN (post-deploy follow-up, not a closure gate).

plan: ~/.claude/plans/hermes-polymorphic-rossum.md
2026-05-17 08:58:38 +09:00
parent a08b620894
commit b8575084b1
1 changed files with 162 additions and 0 deletions
@@ -0,0 +1,162 @@
+# DS-Mac-mini-26B-Priority-Gate-1 (B-1) Closure Report
+
+**Date**: 2026-05-17
+**Plan**: `~/.claude/plans/hermes-polymorphic-rossum.md`
+**Commits**:
+- `7c9aff3` feat(search): MLX priority gate (heap + inflight + 6 scenario tests)
+- `a08b620` refactor(search): swap 10 call sites to acquire_mlx_gate(Priority.*)
+
+## Summary
+
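+Resolves the **priority inversion** that remained after the closure of PR-Hermes-Docsrv-Search-1 and B-3 Synthesis-Timeout-Calibration-1: inside the FIFO Semaphore, background work could block a user-facing ask. Concurrency stays at 1, model routing changes are 0, and only the queue ordering changes to favor foreground. Latency/throughput work is out of scope (separate B-2 track).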
+
+## Changes
+
+### New API (`app/services/search/llm_gate.py`)
+- `Priority(IntEnum)`: `FOREGROUND=0`, `BACKGROUND=100`
+- `acquire_mlx_gate(priority=DEFAULT_PRIORITY)` async context manager: fair queueing based on heap + inflight
+- `DEFAULT_PRIORITY = BACKGROUND` (safe default: a call that omits the priority cannot trample foreground)
+- `get_mlx_gate()` legacy wrapper: **context-manager-only compatibility** (only the `async with` form)
+- Internals: `_waiters` heap `(priority, seq, future, enqueue_ts)`, `_inflight: bool`, `_lock: asyncio.Lock`, `_seq`
+- Fast path: not inflight + empty queue → becomes inflight immediately, no Future created
+- `_dispatch_next_locked()`: skips cancelled/done Futures (avoids the risk of a stuck heap)
+- Release: pop inside the lock; `set_result` runs outside the lock via `loop.call_soon` (avoids a re-entry deadlock)
+- **Logging**: `mlx_gate enqueue/dispatch/release priority=... seq=... wait_ms=... queue_len=...`
+- BACKGROUND wait_ms > 300_000 (5 minutes) emits a **starvation WARN**; aging is deferred to Phase 2
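The bullets above describe the gate's moving parts. A minimal sketch of how they could fit together follows; it is not the code in `llm_gate.py`. The class name `PriorityGate`, its method names, and the `demo()` driver are hypothetical, logging and the starvation WARN are omitted, and for brevity the sketch calls `set_result` directly while holding the lock instead of deferring it through `loop.call_soon` as the report describes.

```python
import asyncio
import heapq
import itertools
import time
from contextlib import asynccontextmanager
from enum import IntEnum


class Priority(IntEnum):
    FOREGROUND = 0
    BACKGROUND = 100


class PriorityGate:
    """Hypothetical concurrency-1 gate: lowest (priority, seq) waiter runs next."""

    def __init__(self):
        self._waiters = []  # heap of (priority, seq, future, enqueue_ts)
        self._inflight = False
        self._lock = asyncio.Lock()
        self._seq = itertools.count()

    async def _acquire(self, priority):
        async with self._lock:
            if not self._inflight and not self._waiters:
                self._inflight = True  # fast path: no Future is created
                return
            fut = asyncio.get_running_loop().create_future()
            entry = (int(priority), next(self._seq), fut, time.monotonic())
            heapq.heappush(self._waiters, entry)
        try:
            await fut
        except asyncio.CancelledError:
            # Cancelled after the slot was already handed over: pass it on.
            if fut.done() and not fut.cancelled():
                await self._release()
            raise

    async def _release(self):
        async with self._lock:
            self._inflight = False
            while self._waiters:
                _prio, _seq, fut, _ts = heapq.heappop(self._waiters)
                if fut.done():  # cancelled waiter: skip so the heap cannot stall
                    continue
                self._inflight = True
                fut.set_result(None)  # simplification: the report defers this
                break

    @asynccontextmanager
    async def acquire(self, priority=Priority.BACKGROUND):
        await self._acquire(priority)
        try:
            yield
        finally:
            await self._release()


async def demo():
    gate = PriorityGate()
    order = []

    async def job(name, priority):
        async with gate.acquire(priority):
            order.append(name)
            await asyncio.sleep(0.01)

    tasks = [asyncio.create_task(job("bg0", Priority.BACKGROUND))]
    await asyncio.sleep(0)  # let bg0 take the fast path
    tasks += [asyncio.create_task(job(f"bg{i}", Priority.BACKGROUND)) for i in range(1, 5)]
    await asyncio.sleep(0)  # bg1~4 are now queued behind bg0
    tasks.append(asyncio.create_task(job("fg", Priority.FOREGROUND)))
    await asyncio.gather(*tasks)
    return order


order = asyncio.run(demo())
print(order)
```

In this sketch the foreground job is dispatched as soon as `bg0` releases, ahead of the four background jobs that were enqueued before it. Because `seq` is unique, tuple comparison in the heap never reaches the Future element.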
+
+### 10 call sites swapped
+
+**Foreground 6** (user-facing path):
+| File | Line | Rationale |
+|---|---|---|
+| `app/services/search/evidence_service.py` | 315 | `/api/search/ask` evidence stage |
+| `app/services/search/classifier_service.py` | 103 | `/api/search/ask` classifier stage |
+| `app/services/search/synthesis_service.py` | 299 | `/api/search/ask` synthesis stage |
+| `app/api/documents.py` | 1306 | user-triggered manual analyze API |
+| `app/api/study_topics.py` | 1183 | synchronous subject note generation |
+| `app/api/study_questions.py` | 1560 | synchronous study explanation API |
+
+**Background 4** (worker queue / fire-and-forget):
+| File | Line | Rationale |
+|---|---|---|
+| `app/services/search/query_analyzer.py` | 240 | **confirmed by V0 grep**: fire-and-forget only (see below) |
+| `app/workers/deep_summary_worker.py` | 110 | classify-escalate worker |
+| `app/workers/study_explanation_worker.py` | 149 | queue-based worker |
+| `app/workers/study_session_analysis_worker.py` | 237 | queue-based worker |
+
+### Cleanup
+- Removed the `_get_llm_semaphore()` function (referenced only by itself and otherwise unused; its signature was also misleading now that `get_mlx_gate` returns a context manager)
+
+## Required sections (4)
+
+### 1. query_analyzer priority verdict: **BACKGROUND confirmed**
+
+V0 grep:
+```bash
+grep -RnE 'trigger_background_analysis|query_analyzer|analyze_query' app/api app/services/search
+```
+Results:
+- `app/services/search/query_analyzer.py:25` docstring rule: **"Synchronous analyze() calls are forbidden. The retrieval path uses only get_cached() + trigger_background_analysis()."**
+- `app/services/search/search_pipeline.py:164` `query_analyzer.get_cached(q)` (no LLM call)
+- `app/services/search/search_pipeline.py:179` `query_analyzer.trigger_background_analysis(q)` (fire-and-forget)
+
+→ The `/api/search/ask` critical path **does not await** the `query_analyzer.analyze()` LLM call, so `Priority.BACKGROUND` is confirmed. Foreground precedence is guaranteed by the 3 sites evidence + classifier + synthesis.
+
+### 2. study_explanation_worker active owner — **active**
+
+`grep -Rn "study_question_jobs\|StudyQuestionJob" app/main.py app/workers/` shows that `study_q_embed (1m)` / `study_queue (1m)` / `study_session_queue (1m)` are registered with the APScheduler in `app/main.py` and are active. `study_explanation_worker` is active as the consumer of study_queue. Zero regressions were verified after the change.
+
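+### 3. Preventing the misconception "why doesn't foreground start with zero wait?"
+
+This PR **does not implement preemption**. A background call that is already inflight (e.g. a 30s digest LLM call) is not interrupted. When a user ask arrives, it still waits for **the remaining time of the current inflight call**.
+
+What improved: background calls 2~5 used to stand in line ahead of it, and an arriving foreground call now **jumps that entire line**. That is, `max wait = remaining time of the current inflight call` (before: `current inflight + all queued background calls`).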
+
+E2E measurement (synthetic):
+- bg0 (2s) inflight + bg1~4 waiting + fg enqueued → **fg dispatched** right after bg0 releases (wait_ms=1502, bypassing queue_len=4) → then bg1~4 in order
+- log evidence: `mlx_gate dispatch priority=FOREGROUND seq=5 wait_ms=1502 queue_len=4`
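The wait bound stated above can be checked with a small worked calculation. This is only an illustration: the function name is hypothetical, the 1.5s remaining time is rounded from the logged `wait_ms=1502`, and the 2s duration assumed for each of bg1~4 is not taken from the report.

```python
def foreground_wait(remaining_inflight_s, queued_background_s, priority_gate):
    """Seconds a newly arrived foreground call waits before it is dispatched."""
    if priority_gate:
        # No preemption: the inflight call still has to finish first.
        return remaining_inflight_s
    # FIFO: every background call queued earlier also runs first.
    return remaining_inflight_s + sum(queued_background_s)


remaining = 1.5                # assumed: bg0 has about 1.5s left when fg arrives
queued = [2.0, 2.0, 2.0, 2.0]  # assumed durations for bg1~4

fifo_wait = foreground_wait(remaining, queued, priority_gate=False)
gate_wait = foreground_wait(remaining, queued, priority_gate=True)
print(fifo_wait, gate_wait)
```

Under these assumptions the FIFO wait is 9.5s and the priority-gate wait is 1.5s. The gap grows with the number and length of queued background calls, while the gate's bound depends only on the call that is already inflight.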
+
+### 4. Background starvation WARN: not a closure gate, post-deploy follow-up
+
+Closure of this PR is decided by the same-day hard gates (6/6 unit + synthetic foreground jump + PR-1 fixture regression).
+
+The frequency of `mlx_gate background waiter starved wait_ms=... priority=BACKGROUND seq=...` lines is **a reference for the follow-up one week later**:
+- 0 occurrences → normal, no aging mitigation needed
+- Frequent → **signal to decide on entering Phase 2 (aging)** (Phase 2 = digest/briefing sem unification + moving verifier/call_triage inside the gate + aging)
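For the one-week follow-up, the WARN frequency could be tallied with a short script like the one below. It is a sketch under assumptions: the elided `wait_ms=...` and `seq=...` values are taken to be plain integers, the function name is hypothetical, and the sample lines (including their level prefixes) are invented for illustration rather than copied from real logs.

```python
import re

# Assumed shape of the starvation line; the report elides the actual values.
STARVED = re.compile(
    r"mlx_gate background waiter starved wait_ms=(\d+) priority=BACKGROUND seq=(\d+)"
)


def starvation_summary(lines):
    """Count starvation WARN lines and report the longest observed wait."""
    waits = [int(m.group(1)) for line in lines if (m := STARVED.search(line))]
    return {"count": len(waits), "max_wait_ms": max(waits, default=0)}


# Hypothetical sample log lines.
sample = [
    "WARNING mlx_gate background waiter starved wait_ms=312000 priority=BACKGROUND seq=41",
    "INFO mlx_gate dispatch priority=FOREGROUND seq=42 wait_ms=1502 queue_len=4",
    "WARNING mlx_gate background waiter starved wait_ms=415000 priority=BACKGROUND seq=43",
]

summary = starvation_summary(sample)
print(summary)
```

A count of 0 over the week would correspond to the "normal" branch above; what counts as "frequent" is left to the follow-up review.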
+
+## Verification results
+
+### V0: Pre-check grep (query_analyzer priority decision) ✅
+See §1 above.
+
+### V1: Unit smoke 6/6 PASS ✅
+`tests/test_priority_gate.py` inside the container (run with an inline runner because pytest is not available):
+
+```
+  PASS  s1_fifo_within_same_priority
+  PASS  s2_foreground_jumps_queue
+  PASS  s3_long_running_background_blocks_foreground
+  PASS  s4_mixed_concurrent_enqueue
+  PASS  s5_backward_compat
+  PASS  s6_cancelled_waiter_skip
+
+Summary: PASS=6 / 6, FAIL=0
+```
+
+### V2: PR-1 Layer 1 fixture regression ✅
+```
+10/10 HTTP 200 + 5/5 search + 3/3 failure injection, all PASS
+
+p50 = 11.1s (about half of the earlier B-3 measurement of 23.2s; the suspected cause is the priority gate avoiding races with background work)
+p95 = 30.2s (ASME outlier, lower than the B-3 measurement of 50.7s)
+min/max = 4.6s / 30.2s
+mean = 12.3s
+
+→ **0 functional regressions** + **latency recovered as a side effect** (latency reduction is not B-1's mandate,
+   but foreground got faster as a byproduct of no longer racing with background)
+```
+
+### V3: Synthetic foreground jump ✅
+See §3 above. Behaves exactly as intended.
+
+### V4: Legacy call leftover check ✅
+```bash
+grep -Rn 'get_mlx_gate' app tests scripts
+grep -Rn '_get_llm_semaphore\|MLX_CONCURRENCY\|asyncio.Semaphore(1)' app tests scripts
+```
+Results:
+- `get_mlx_gate()` calls in user-facing code = **0** (only the docstring, the backup `.pre-llm-reframe-cleanup.20260515`, and the backward-compat case in tests/test_priority_gate.py remain)
+- `_get_llm_semaphore` leftovers = 0 (removed)
+- `asyncio.Semaphore(1)` leftovers = digest/summarizer.py:23 + briefing/comparator.py:42 (known Phase 2 targets)
+- **Direct Semaphore-like acquire/release patterns = 0** ✅
+
+## 7-day safety net (2026-05-24)
+
+- git revert `a08b620` (site refactor): keeps the gate itself, only the call sites return to the legacy wrapper
+- git revert `7c9aff3` (the gate itself): full return to the Phase 3.1 Semaphore(1)
+- Partial revert is possible because wrapper compatibility was preserved
+
+## Follow-up tracks (separate PRs, registered in the MEMORY backlog)
+
+| Priority | Track | Entry condition |
+|---|---|---|
+| Phase 2 | Unify `_llm_sem` in digest/summarizer.py + briefing/comparator.py (shared gate + BACKGROUND priority) | Operational observation after Phase 1 closure |
+| Phase 2 | Move classify_worker `call_triage` inside the gate (currently runs outside the gate) | Same as above |
+| Phase 2 | Move verifier_service inside the gate (`Priority.FOREGROUND`, ask path); a review of deadlock risk with parallel grounding is required | Same as above |
+| Phase 2 | Background starvation **aging** (waiting time → priority boost); entered if WARN is frequent | After 1 week of WARN measurement |
+| B-2 | DS-Mac-mini-26B-Throughput-1 (priority queue → concurrency >1 / separate triage model / comparison with reintroducing GPU Ollama) | Separate plan |
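The "waiting time → priority boost" idea in the aging row could take a shape like the following. This is purely a hypothetical sketch of one possible policy, not a Phase 2 design: the function name, the per-minute boost of 10, and the floor at the FOREGROUND value are all assumptions.

```python
FOREGROUND, BACKGROUND = 0, 100


def effective_priority(base_priority, waited_s, boost_per_minute=10):
    """Lower value = dispatched earlier. Each full minute waited lowers the value."""
    boost = int(waited_s // 60) * boost_per_minute
    # Never rank ahead of a fresh foreground call.
    return max(FOREGROUND, base_priority - boost)


fresh = effective_priority(BACKGROUND, waited_s=0)
five_minutes = effective_priority(BACKGROUND, waited_s=300)
one_hour = effective_priority(BACKGROUND, waited_s=3600)
print(fresh, five_minutes, one_hour)
```

With these assumed numbers a background waiter reaches the foreground value after 10 minutes, at which point ordering would fall back to `seq`. A real design would also need to decide how aged entries are re-ordered inside the heap, which this sketch does not address.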
+
+## Inventory + memory update (separate commit)
+
+- Add a **"MLX Priority Gate"** subsection to the "AI 모델 라우팅" (AI model routing) section of `infra_inventory.md`
+- `project_hermes_docsrv_bridge.md`: mark B-1 as closed in the follow-up track table + register the Phase 2 tracks
+- Remove B-1 from the `MEMORY.md` backlog
+
+## Scope hard limits (re-confirmed, all respected)
+
+- ❌ MLX_CONCURRENCY ↑ → **kept (1)** ✅
+- ❌ Model routing changes → **0 changes** ✅
+- ❌ GPU LLM reintroduction → **0** ✅
+- ❌ Additional models on the Mac mini → **single-model policy kept** ✅
+- ❌ Gate handling for digest/briefing/verifier/call_triage → **deferred to Phase 2** ✅
+- ❌ Background aging → **WARN only, aging in Phase 2** ✅