hyungi_document_server

Hyungi Ahn 21a78fbbf0 fix(search): enforce LLM concurrency=1 with a semaphore + add analyze parameter to run_eval

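## Background
Found in the first Phase 2.2 eval: the 23 queries are issued sequentially, but each
request's background analyzer task fires its request at MLX at the same time → the MLX
single-inference server queue blows up → 22 of them hit the 15-second timeout. The cache is never populated.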

## Fix

### query_analyzer.py
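 - Add the LLM_CONCURRENCY = 1 constant
 - _LLM_SEMAPHORE: lazily initialized asyncio.Semaphore (bound to the event loop)
 - Inside analyze(): double wrapping, semaphore → timeout (around the actual LLM call only),
   so that time spent waiting on the semaphore does not count toward the timeout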
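
The wrapping order described above could look roughly like the following. This is a minimal sketch, not the actual contents of query_analyzer.py: `_get_semaphore`, `_call_llm`, and `LLM_TIMEOUT_S` are hypothetical names, and the MLX request is replaced by a short sleep.

```python
import asyncio
from typing import Optional

LLM_CONCURRENCY = 1      # constant named in the commit message
LLM_TIMEOUT_S = 15.0     # hypothetical name; 15 s is the timeout cited in the background

_LLM_SEMAPHORE: Optional[asyncio.Semaphore] = None


def _get_semaphore() -> asyncio.Semaphore:
    """Create the semaphore on first use, i.e. from inside the running event loop."""
    global _LLM_SEMAPHORE
    if _LLM_SEMAPHORE is None:
        _LLM_SEMAPHORE = asyncio.Semaphore(LLM_CONCURRENCY)
    return _LLM_SEMAPHORE


async def _call_llm(query: str) -> dict:
    """Hypothetical stand-in for the real MLX request."""
    await asyncio.sleep(0.05)
    return {"query": query, "intent": "stub"}


async def analyze(query: str, timeout: float = LLM_TIMEOUT_S) -> Optional[dict]:
    # Outer layer: wait for the single LLM slot. This wait is NOT timed.
    async with _get_semaphore():
        try:
            # Inner layer: the timeout clock starts only once the slot is held.
            return await asyncio.wait_for(_call_llm(query), timeout=timeout)
        except asyncio.TimeoutError:
            return None
```

Swapping the two layers (timeout outside, semaphore inside) would make queued callers time out while still waiting for the slot, which is the failure the commit is meant to avoid.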

### run_eval.py
 - Add a --analyze true|false parameter (for Phase 2.1+ measurements)
 - Pass analyze through the call_search / evaluate signatures
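
One way such a flag might be parsed and threaded through is sketched below. The real run_eval.py is not shown in the commit, so the default value, the `parse_bool` and `build_search_params` helpers, and the request parameter names are all assumptions made for illustration.

```python
import argparse


def parse_bool(value: str) -> bool:
    """Accept only the literal strings 'true' or 'false' (case-insensitive)."""
    lowered = value.strip().lower()
    if lowered == "true":
        return True
    if lowered == "false":
        return False
    raise argparse.ArgumentTypeError(f"expected true|false, got {value!r}")


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="search eval runner (sketch)")
    # Default of False is an assumption; the commit does not state the default.
    parser.add_argument("--analyze", type=parse_bool, default=False,
                        help="true|false: enable query analysis for this run")
    return parser


def build_search_params(query: str, analyze: bool) -> dict:
    """Hypothetical helper: the parameters a call_search-style function might send."""
    return {"q": query, "analyze": "true" if analyze else "false"}
```

With this shape, `build_parser().parse_args(["--analyze", "true"]).analyze` is a plain `bool` that can be passed down unchanged to whatever issues the search request.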

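## Expected effect
 - prewarm, background, and synchronous calls all reach MLX one at a time, in sequence
 - With 23 queued, the worst case takes 230 seconds, but every call succeeds and populates the cache
 - Stable load on the MLX server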
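
The 230-second figure is consistent with roughly 10 seconds per call, though the commit does not state the per-call latency; that value is inferred here from 230 / 23. A small sketch of the arithmetic, with `worst_case_drain_seconds` as a hypothetical helper:

```python
import math


def worst_case_drain_seconds(pending: int, per_call_seconds: float,
                             concurrency: int = 1) -> float:
    """Time to drain a queue when at most `concurrency` calls run at once."""
    batches = math.ceil(pending / concurrency)
    return batches * per_call_seconds


# Assumed per-call latency of 10 s (inferred, not stated in the commit).
total = worst_case_drain_seconds(pending=23, per_call_seconds=10.0)
```

With concurrency fixed at 1, the drain time grows linearly with queue length, trading latency for every call completing.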

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

2026-04-08 15:12:13 +09:00

scripts

feat(chunk): add Phase 1.2-E reindex script

2026-04-08 12:31:29 +09:00

search_eval

fix(search): enforce LLM concurrency=1 with a semaphore + add analyze parameter to run_eval

2026-04-08 15:12:13 +09:00

__init__.py

feat: scaffold v2 project structure with Docker, FastAPI, and config

2026-04-02 10:20:15 +09:00

conftest.py

feat: scaffold v2 project structure with Docker, FastAPI, and config

2026-04-02 10:20:15 +09:00