fix(search): reranker MAX_CLIENT_BATCH_SIZE 64→256

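rerank_service.py POSTs up to MAX_RERANK_INPUT=200 candidates to TEI in a single request, without chunking → when the count exceeds the TEI limit of 64 (observed: 85), HTTPError → RRF silent fallback (rerank skipped = degraded search quality; 4 occurrences in 48h). Because MAX_BATCH_TOKENS=16384 is the VRAM ceiling, only the client batch entry limit is raised to 256 (covers MAX_RERANK_INPUT 200), and only the reranker is recreated. Verification: 85-text rerank returns HTTP 200, 0 batch errors.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>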
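
To make the arithmetic behind the limit change concrete, here is a minimal sketch of how a candidate list relates to the server-side entry limit. It is not code from rerank_service.py: `split_into_batches` and `fits_single_request` are hypothetical helpers, and only the numbers (64, 256, 85, 200) come from the commit message. It performs no network calls.

```python
from typing import List, Sequence

# Values taken from the commit message.
MAX_RERANK_INPUT = 200   # upper bound on candidates sent per rerank call
OLD_LIMIT = 64           # previous MAX_CLIENT_BATCH_SIZE
NEW_LIMIT = 256          # MAX_CLIENT_BATCH_SIZE after this commit


def split_into_batches(texts: Sequence[str], limit: int) -> List[List[str]]:
    """Hypothetical helper: consecutive batches of at most `limit` entries."""
    if limit <= 0:
        raise ValueError("limit must be positive")
    return [list(texts[i:i + limit]) for i in range(0, len(texts), limit)]


def fits_single_request(n_candidates: int, limit: int) -> bool:
    """True when an unchunked POST of n_candidates stays within the limit."""
    return n_candidates <= limit


# The failing case from the commit message: 85 candidates in one POST.
candidates = [f"doc-{i}" for i in range(85)]

print(fits_single_request(len(candidates), OLD_LIMIT))   # False
print(fits_single_request(len(candidates), NEW_LIMIT))   # True
print(fits_single_request(MAX_RERANK_INPUT, NEW_LIMIT))  # True

# What client-side chunking would have produced under the old limit.
print([len(b) for b in split_into_batches(candidates, OLD_LIMIT)])  # [64, 21]
print([len(b) for b in split_into_batches(candidates, NEW_LIMIT)])  # [85]
```

The commit raises the server limit rather than chunking on the client, so the largest possible request (200 entries) still goes out as a single POST.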
refactor(ds): retire vestigial ai-gateway
2 changed files with 1 additions and 16 deletions
@@ -1,8 +1,6 @@
 # hyungi_Document_Server settings

 ai:
-  gateway:
-    endpoint: "http://ai-gateway:8080"

  models:
    # ─── single generation host routing (GPU LLM removed 2026-05-14) ───
@@ -149,7 +149,7 @@ services:
      #       → exceeds the limit of 32 → 413. Raised to 64.
      # GPU VRAM free 6199MiB is sufficient. No impact on the baseline path (MAX_RERANK_INPUT=200).
      - MAX_BATCH_TOKENS=16384
-      - MAX_CLIENT_BATCH_SIZE=64
+      - MAX_CLIENT_BATCH_SIZE=256  # 2026-06-18 fix: 64→256, covers MAX_RERANK_INPUT=200 (resolves batch>64 ERROR = RRF silent fallback; MAX_BATCH_TOKENS is the VRAM ceiling, so raising the entry count does not affect VRAM)
      - MAX_CONCURRENT_REQUESTS=4
    volumes:
      - reranker_cache:/data
@@ -168,19 +168,6 @@ services:
      start_period: 120s
    restart: unless-stopped

-  ai-gateway:
-    build: ./gpu-server/services/ai-gateway
-    ports:
-      - "127.0.0.1:8081:8080"
-    environment:
-      - PRIMARY_ENDPOINT=http://100.76.254.116:8801/v1/chat/completions
-      - FALLBACK_ENDPOINT=http://ollama:11434/v1/chat/completions
-      - CLAUDE_API_KEY=${CLAUDE_API_KEY:-}
-      - DAILY_BUDGET_USD=${DAILY_BUDGET_USD:-5.00}
-    # depends_on: ollama removed (2026-06-08); the ollama service was moved to standalone.
-    # The ollama:11434 in FALLBACK_ENDPOINT resolves via the standalone instance (same hostname, attached to the DS network).
-    restart: unless-stopped
-
  fastapi:
    build: ./app
    ports:
Author	SHA1	Message	Date
hyungi	5cabf728e6	fix(search): reranker MAX_CLIENT_BATCH_SIZE 64→256 rerank_service.py POSTs up to MAX_RERANK_INPUT=200 candidates to TEI in a single request, without chunking → when the count exceeds the TEI limit of 64 (observed: 85), HTTPError → RRF silent fallback (rerank skipped = degraded search quality; 4 occurrences in 48h). Because MAX_BATCH_TOKENS=16384 is the VRAM ceiling, only the client batch entry limit is raised to 256 (covers MAX_RERANK_INPUT 200), and only the reranker is recreated. Verification: 85-text rerank returns HTTP 200, 0 batch errors. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-17 23:35:43 +00:00
hyungi	cd694e7386	refactor(ds): retire vestigial ai-gateway Budget-capped LLM gateway (present since the first commit of the 2026-04-03 GPU migration). config.ai.gateway is parsed only, with 0 lines of consuming code, 0 established connections, and 0 request history = proven vestigial. Removes the ai-gateway service block from docker-compose.yml and the ai.gateway block from config.yaml. Container and image (256MB) removed; fastapi untouched (not recreated). Also removes the dangling CLAUDE_API_KEY env exposure (credentials.env is gitignored separately). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-17 23:29:19 +00:00
hyungi	7247d242a2	Merge pull request 'fix(docpage): remove the image "appears then disappears" flash while the section view loads' (#46) from fix/section-view-image-flash into main Reviewed-on: #46	2026-06-17 15:51:15 +09:00
hyungi	9434017114	Merge pull request 'fix(docpage): restore MarkdownDoc rendering of the section-view body, bringing back images and equations (recovers from the D8 deploy regression)' (#45) from fix/section-view-md-render-d8 into main Reviewed-on: #45	2026-06-17 14:54:56 +09:00
hyungi	66f3287564	Merge pull request 'Feat/asme item decomp d1' (#44) from feat/asme-item-decomp-d1 into main Reviewed-on: #44	2026-06-17 12:37:19 +09:00