refactor(search): tear down Phase 2A/2B candidate reranker remnants

- Remove the cand_gte_ml_base slug from RERANKER_BACKEND_MAP (its container, DB table (migration 360), and override are already retired)
- Delete docker-compose.override.cand.yml / override.rerank-cand.yml
- Align the search.py allowlist and the run_eval.py help text
- Keep the dispatcher scaffold (_resolve_reranker) in case a candidate is reintroduced

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
fix(search): raise reranker MAX_CLIENT_BATCH_SIZE 64→256
2026-06-18 00:00:42 +00:00 · 2026-06-17 23:35:43 +00:00 · 2026-06-17 23:29:19 +00:00 · 2026-06-17 15:51:15 +09:00 · 2026-06-17 14:54:56 +09:00 · 2026-06-17 12:37:19 +09:00
7 changed files with 7 additions and 259 deletions
@@ -282,7 +282,7 @@ async def search(
                content={
                    "error_reason": "unknown_reranker_backend",
                    "backend_requested": reranker_backend,
-                    "allowed": ["baseline", "cand_gte_ml_base"],
+                    "allowed": ["baseline"],
                    "detail": msg,
                },
            )
@@ -44,11 +44,10 @@ RERANK_TIMEOUT = 5.0
 # server-side allowlist map. The query parameter never accepts a raw endpoint URL.
 RERANKER_BACKEND_MAP: dict[str, dict[str, str] | None] = {
    "baseline": None,  # production reranker (config.yaml endpoint via AIClient.rerank)
-    "cand_gte_ml_base": {
-        "endpoint": "http://rerank-cand-gte-ml-base:80/rerank",
-    },
-    # mxbai_large candidate (deberta-v2 → unsupported by TEI 1.7): moved to Phase 2B-Extended
+    # All Phase 2B candidate rerankers closed as NO-GO (2026-06-18 teardown):
+    #   - cand_gte_ml_base : container, DB table (migration 360), and override.rerank-cand.yml removed
+    #   - mxbai_large (deberta-v2 → unsupported by TEI 1.7) / bge_v2_gemma_2b (1_Pooling missing): never entered
+    # The dispatcher scaffold (_resolve_reranker) is kept so candidates can be reintroduced later.
    # bge_v2_gemma_2b candidate (LLM-based reranker, 1_Pooling/config.json missing): moved to Phase 2B-Extended
 }
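
The allowlist pattern in this hunk (a slug is looked up server-side and a raw endpoint URL is never accepted from the query string) can be sketched as follows. This is a minimal illustration, not the project's actual `_resolve_reranker`: the function name `resolve_reranker`, the exception class, and the return convention (`None` meaning "use the production path") are assumptions made for the example.

```python
from typing import Optional

# Illustrative copy of the allowlist map; after the teardown only "baseline" remains.
RERANKER_BACKEND_MAP: dict[str, Optional[dict[str, str]]] = {
    "baseline": None,  # None = fall through to the production reranker path
}


class UnknownRerankerBackend(ValueError):
    """Raised when a requested slug is not in the server-side allowlist."""


def resolve_reranker(slug: Optional[str]) -> Optional[str]:
    """Map a slug to an override endpoint URL, or None for the production path.

    Hypothetical helper: callers pass only a slug, so an arbitrary URL can
    never reach the HTTP client.
    """
    key = slug or "baseline"  # an unset slug means production
    if key not in RERANKER_BACKEND_MAP:
        allowed = sorted(RERANKER_BACKEND_MAP)
        raise UnknownRerankerBackend(
            f"unknown reranker backend {key!r}; allowed: {allowed}"
        )
    entry = RERANKER_BACKEND_MAP[key]
    return None if entry is None else entry["endpoint"]
```

One design choice worth noting: building the `allowed` list from the map keys, as the sketch does, would keep an error payload in step with the map, so a removed slug would not need a second edit elsewhere.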
@@ -1,8 +1,6 @@
 # hyungi_Document_Server configuration
 ai:
  gateway:
    endpoint: "http://ai-gateway:8080"
  models:
    # ─── Single generation host routing (GPU LLM removed 2026-05-14) ───
@@ -1,135 +0,0 @@
 # Phase 2A — Embedding candidate compose override (Diagnose only)
 #
 # Profile-isolated: explicit opt-in via `--profile embed-cand`. Not started on a default `up`.
 # Zero impact on production fastapi/postgres/reranker.
 # To be removed in a separate chore (PR-2A-Chunks-Cand-Cleanup-1) when this PR closes.
 #
 # Candidate status (2026-05-23):
 #   - me5_large_inst : ✅ smoke PASS (dim 1024)
 #   - bge_mgemma2    : ❌ moved to a separate Phase 2A-Extended PR (9B FP16 → VRAM OOM risk + download cost)
 #   - me5_ko         : ❌ dropped (401 Unauthorized; gated or incorrect model name)
 #   - snowflake_l_v2 : newly added (Snowflake/snowflake-arctic-embed-l-v2.0, 2024-12, stronger multilingual support)
 #
 # Usage:
 #   docker compose -f docker-compose.yml -f docker-compose.override.cand.yml \
 #     --profile embed-cand up -d embedding-cand-me5-inst
 #
 # Endpoints (inside the DS network):
 #   http://embedding-cand-me5-inst:80/embed
 #   http://embedding-cand-snowflake-l-v2:80/embed
 services:
  embedding-cand-me5-inst:
    image: ghcr.io/huggingface/text-embeddings-inference:1.7
    restart: unless-stopped
    container_name: hyungi_document_server-embedding-cand-me5-inst-1
    expose:
      - "80"
    environment:
      - MODEL_ID=intfloat/multilingual-e5-large-instruct
      - MAX_BATCH_TOKENS=8192
      - MAX_CONCURRENT_REQUESTS=4
    volumes:
      - embedding_cand_me5_inst_cache:/data
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1
              capabilities: [gpu]
    healthcheck:
      test: ["CMD", "curl", "-fsS", "http://localhost/health"]
      interval: 30s
      timeout: 10s
      retries: 5
      start_period: 60s
    profiles: ["embed-cand"]
  embedding-cand-snowflake-l-v2:
    image: ghcr.io/huggingface/text-embeddings-inference:1.7
    restart: unless-stopped
    container_name: hyungi_document_server-embedding-cand-snowflake-l-v2-1
    expose:
      - "80"
    environment:
      - MODEL_ID=Snowflake/snowflake-arctic-embed-l-v2.0
      - MAX_BATCH_TOKENS=8192
      - MAX_CONCURRENT_REQUESTS=4
    volumes:
      - embedding_cand_snowflake_l_v2_cache:/data
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1
              capabilities: [gpu]
    healthcheck:
      test: ["CMD", "curl", "-fsS", "http://localhost/health"]
      interval: 30s
      timeout: 10s
      retries: 5
      start_period: 60s
    profiles: ["embed-cand"]
  # ===== Inactive candidates (moved to a separate Phase 2A-Extended PR, or dropped) =====
  # Kept only as a diagnostic record. Out of scope for this PR.
  embedding-cand-bge-mgemma2:
    image: ghcr.io/huggingface/text-embeddings-inference:1.7
    container_name: hyungi_document_server-embedding-cand-bge-mgemma2-1
    expose:
      - "80"
    environment:
      - MODEL_ID=BAAI/bge-multilingual-gemma2
      - MAX_BATCH_TOKENS=8192
      - MAX_CONCURRENT_REQUESTS=4
    volumes:
      - embedding_cand_bge_mgemma2_cache:/data
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1
              capabilities: [gpu]
    healthcheck:
      test: ["CMD", "curl", "-fsS", "http://localhost/health"]
      interval: 30s
      timeout: 10s
      retries: 5
      start_period: 300s
    profiles: ["embed-cand-extended"]   # Unused in this PR. Separate profile for the extended phase.
  embedding-cand-me5-ko:
    image: ghcr.io/huggingface/text-embeddings-inference:1.7
    container_name: hyungi_document_server-embedding-cand-me5-ko-1
    expose:
      - "80"
    environment:
      - MODEL_ID=dragonkue/multilingual-e5-large-ko
      - MAX_BATCH_TOKENS=8192
      - MAX_CONCURRENT_REQUESTS=4
    volumes:
      - embedding_cand_me5_ko_cache:/data
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1
              capabilities: [gpu]
    healthcheck:
      test: ["CMD", "curl", "-fsS", "http://localhost/health"]
      interval: 30s
      timeout: 10s
      retries: 5
      start_period: 60s
    profiles: ["embed-cand-disabled"]   # Fails with 401. Do not use.
 volumes:
  embedding_cand_me5_inst_cache:
  embedding_cand_snowflake_l_v2_cache:
  embedding_cand_bge_mgemma2_cache:
  embedding_cand_me5_ko_cache:
@@ -1,101 +0,0 @@
 # Phase 2B — Reranker candidate compose override (Diagnose only)
 #
 # Profile-isolated: explicit opt-in via `--profile rerank-cand`. Not started on a default `up`.
 # Zero impact on production fastapi/postgres/reranker (bge-reranker-v2-m3).
 # To be removed in a separate chore (PR-2B-Rerank-Cand-Cleanup-1) after this PR closes.
 #
 # Candidate status (2026-05-23):
 #   - gte_ml_base       : Apache 2.0, 305M, awaiting smoke test
 #   - mxbai_large       : Apache 2.0, ~435M, no safetensors: TEI smoke risk
 #   - bge_v2_gemma_2b   : Gemma license, 2.5B FP16 ~5GB, awaiting smoke test
 #
 # Usage:
 #   docker compose -f docker-compose.yml -f docker-compose.override.rerank-cand.yml \
 #     --profile rerank-cand up -d rerank-cand-gte-ml-base
 services:
  rerank-cand-gte-ml-base:
    image: ghcr.io/huggingface/text-embeddings-inference:1.7
    restart: unless-stopped
    container_name: hyungi_document_server-rerank-cand-gte-ml-base-1
    expose:
      - "80"
    environment:
      - MODEL_ID=Alibaba-NLP/gte-multilingual-reranker-base
      - MAX_BATCH_TOKENS=8192
      - MAX_CONCURRENT_REQUESTS=4
    volumes:
      - rerank_cand_gte_ml_base_cache:/data
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1
              capabilities: [gpu]
    healthcheck:
      test: ["CMD", "curl", "-fsS", "http://localhost/health"]
      interval: 30s
      timeout: 10s
      retries: 5
      start_period: 60s
    profiles: ["rerank-cand"]
  rerank-cand-mxbai-large:
    image: ghcr.io/huggingface/text-embeddings-inference:1.7
    restart: unless-stopped
    container_name: hyungi_document_server-rerank-cand-mxbai-large-1
    expose:
      - "80"
    environment:
      - MODEL_ID=mixedbread-ai/mxbai-rerank-large-v1
      - MAX_BATCH_TOKENS=8192
      - MAX_CONCURRENT_REQUESTS=4
    volumes:
      - rerank_cand_mxbai_large_cache:/data
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1
              capabilities: [gpu]
    healthcheck:
      test: ["CMD", "curl", "-fsS", "http://localhost/health"]
      interval: 30s
      timeout: 10s
      retries: 5
      start_period: 60s
    profiles: ["rerank-cand"]
  rerank-cand-bge-v2-gemma-2b:
    image: ghcr.io/huggingface/text-embeddings-inference:1.7
    restart: unless-stopped
    container_name: hyungi_document_server-rerank-cand-bge-v2-gemma-2b-1
    expose:
      - "80"
    environment:
      - MODEL_ID=BAAI/bge-reranker-v2-gemma
      - MAX_BATCH_TOKENS=8192
      - MAX_CONCURRENT_REQUESTS=2
    volumes:
      - rerank_cand_bge_v2_gemma_2b_cache:/data
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1
              capabilities: [gpu]
    healthcheck:
      test: ["CMD", "curl", "-fsS", "http://localhost/health"]
      interval: 30s
      timeout: 10s
      retries: 5
      start_period: 120s
    profiles: ["rerank-cand"]
 volumes:
  rerank_cand_gte_ml_base_cache:
  rerank_cand_mxbai_large_cache:
  rerank_cand_bge_v2_gemma_2b_cache:
@@ -149,7 +149,7 @@ services:
      #       → exceeded the limit of 32 → 413. Raised to 64.
      # 6199 MiB of free GPU VRAM is sufficient. Zero impact on the baseline path (MAX_RERANK_INPUT=200).
      - MAX_BATCH_TOKENS=16384
-      - MAX_CLIENT_BATCH_SIZE=64
+      - MAX_CLIENT_BATCH_SIZE=256  # 2026-06-18 fix: 64→256, covers MAX_RERANK_INPUT=200 (resolves the batch>64 ERROR that caused a silent RRF fallback; MAX_BATCH_TOKENS is the VRAM ceiling, so raising the entry count does not affect VRAM)
      - MAX_CONCURRENT_REQUESTS=4
    volumes:
      - reranker_cache:/data
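
The hunk above fixes the problem by raising the server-side limit. A complementary client-side guard would be to split an oversized candidate list into batches no larger than the limit and merge the results. The sketch below is an assumption-laden illustration, not the code in rerank_service.py: `rerank_in_chunks` and the injected `rerank_batch` callable are hypothetical names, and the result shape (`{"index": ..., "score": ...}` with `index` relative to the submitted batch) is modelled on TEI's `/rerank` response. It also assumes the reranker scores each (query, text) pair independently, so scores from different batches are comparable.

```python
from typing import Callable, Sequence

# Scores one batch: (query, texts) -> [{"index": i, "score": s}, ...],
# where "index" is relative to the batch. Stand-in for the real HTTP call.
RerankFn = Callable[[str, Sequence[str]], list]


def rerank_in_chunks(
    query: str,
    texts: Sequence[str],
    rerank_batch: RerankFn,
    max_client_batch_size: int = 64,
) -> list:
    """Rerank `texts` in batches that never exceed `max_client_batch_size`."""
    if max_client_batch_size < 1:
        raise ValueError("max_client_batch_size must be >= 1")
    merged = []
    for start in range(0, len(texts), max_client_batch_size):
        batch = texts[start:start + max_client_batch_size]
        for item in rerank_batch(query, batch):
            # Shift the batch-relative index back to a position in `texts`.
            merged.append({"index": start + item["index"], "score": item["score"]})
    # Highest score first, matching the usual rerank ordering.
    merged.sort(key=lambda r: r["score"], reverse=True)
    return merged
```

With a limit of 64, an 85-text request like the one described in the fix would go out as two calls (64 and 21 texts) instead of one rejected call. The trade-off is extra round trips, which is one reason raising `MAX_CLIENT_BATCH_SIZE` may be the simpler fix when VRAM is bounded by `MAX_BATCH_TOKENS` anyway.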
@@ -168,19 +168,6 @@ services:
      start_period: 120s
    restart: unless-stopped
  ai-gateway:
    build: ./gpu-server/services/ai-gateway
    ports:
      - "127.0.0.1:8081:8080"
    environment:
      - PRIMARY_ENDPOINT=http://100.76.254.116:8801/v1/chat/completions
      - FALLBACK_ENDPOINT=http://ollama:11434/v1/chat/completions
      - CLAUDE_API_KEY=${CLAUDE_API_KEY:-}
      - DAILY_BUDGET_USD=${DAILY_BUDGET_USD:-5.00}
    # depends_on: ollama removed (2026-06-08): the ollama service was moved to a standalone deployment.
    # The ollama:11434 host in FALLBACK_ENDPOINT resolves to the standalone instance (same hostname, attached to the DS network).
    restart: unless-stopped
  fastapi:
    build: ./app
    ports:
@@ -1394,7 +1394,7 @@ def main() -> int:
        "--reranker-backend",
        type=str,
        default=None,
-        help="Phase 2B Diagnose reranker dispatcher slug (baseline | cand_gte_ml_base). Unset = production.",
+        help="Phase 2B Diagnose reranker dispatcher slug (baseline). Candidate cand_gte_ml_base was closed as NO-GO and torn down (2026-06-18). Unset = production.",
    )
    parser.add_argument(
        "--rewrite-backend",
Author	SHA1	Message	Date
hyungi	624b9d523d	refactor(search): tear down Phase 2A/2B candidate reranker remnants - Remove the cand_gte_ml_base slug from RERANKER_BACKEND_MAP (its container, DB table (migration 360), and override are already retired) - Delete docker-compose.override.cand.yml / override.rerank-cand.yml - Align the search.py allowlist and the run_eval.py help text - Keep the dispatcher scaffold (_resolve_reranker) in case a candidate is reintroduced Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-18 00:00:42 +00:00
hyungi	5cabf728e6	fix(search): raise reranker MAX_CLIENT_BATCH_SIZE 64→256. rerank_service.py POSTs candidates to TEI in a single unchunked request, up to MAX_RERANK_INPUT=200 → when the batch exceeds the TEI limit of 64 (85), an HTTPError is raised → silent RRF fallback (rerank skipped = degraded search quality; 4 occurrences in 48h). Because MAX_BATCH_TOKENS=16384 is the VRAM ceiling, only the client batch entry limit is raised, to 256 (covering MAX_RERANK_INPUT 200), and only the reranker is recreated. Verification: 85-text rerank returns HTTP 200, 0 batch errors. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-17 23:35:43 +00:00
hyungi	cd694e7386	refactor(ds): retire vestigial ai-gateway. Budget-capped LLM gateway (present since the first commit of the 2026-04-03 GPU migration). config.ai.gateway was only parsed: 0 lines of consuming code, 0 established connections, 0 requests on record = proven vestigial. Remove the ai-gateway service block from docker-compose.yml and the ai.gateway block from config.yaml. Container and image (256MB) removed; fastapi untouched (not recreated). Also removes the dangling CLAUDE_API_KEY env exposure (credentials.env is gitignored and handled separately). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-17 23:29:19 +00:00
hyungi	7247d242a2	Merge pull request 'fix(docpage): remove the image "appears then disappears" flash while the section view loads' (#46) from fix/section-view-image-flash into main Reviewed-on: #46	2026-06-17 15:51:15 +09:00
hyungi	9434017114	Merge pull request 'fix(docpage): restore MarkdownDoc rendering for the section view body, bringing back images and equations (D8 deployment regression fix)' (#45) from fix/section-view-md-render-d8 into main Reviewed-on: #45	2026-06-17 14:54:56 +09:00
hyungi	66f3287564	Merge pull request 'Feat/asme item decomp d1' (#44) from feat/asme-item-decomp-d1 into main Reviewed-on: #44	2026-06-17 12:37:19 +09:00