feat(deploy): add Phase 1.3 reranker service (TEI bge-reranker-v2-m3)
Add a reranker service to docker-compose.yml:
- image: ghcr.io/huggingface/text-embeddings-inference:1.5
- MODEL_ID=BAAI/bge-reranker-v2-m3
- MAX_BATCH_TOKENS=8192, MAX_CONCURRENT_REQUESTS=4
- 1 GPU allocated (RTX 4070 Ti Super, CUDA 13.0)
- expose 80 only (not published on the host; internal network only)
- model weights persisted via the reranker_cache volume
- fastapi has no depends_on on it → fastapi starts standalone and keeps working without the reranker (rerank_service falls back to RRF)

Next steps:
- verify image compatibility on the GPU host via docker pull
- docker compose up -d reranker → warmup
- update rerank.endpoint in config.yaml to http://reranker:80/rerank (direct GPU)
- rebuild fastapi + measure on the eval set (rerank=true)
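The RRF fallback described above can be sketched as follows. This is a hypothetical illustration, not the actual rerank_service code: the names `rrf_fuse`, `rerank`, and `RERANK_ENDPOINT` are made up here; only the endpoint URL and the fallback behavior come from the commit message. TEI's rerank route is `POST /rerank` with `{"query": ..., "texts": [...]}`, returning a list of `{"index", "score"}` objects.

```python
# Sketch (assumed names): call TEI's /rerank; on any connection failure,
# fall back to Reciprocal Rank Fusion over the per-retriever rankings.
import json
import urllib.request

RERANK_ENDPOINT = "http://reranker:80/rerank"  # config.yaml: rerank.endpoint


def rrf_fuse(rankings, k=60):
    """RRF: score(d) = sum over result lists of 1 / (k + rank of d)."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)


def rerank(query, docs, rankings, timeout=5.0):
    """Rerank docs via TEI; fall back to RRF if the service is unreachable."""
    try:
        req = urllib.request.Request(
            RERANK_ENDPOINT,
            data=json.dumps({"query": query, "texts": docs}).encode(),
            headers={"Content-Type": "application/json"},
        )
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            results = json.load(resp)  # [{"index": i, "score": s}, ...]
        return [docs[r["index"]] for r in sorted(results, key=lambda r: -r["score"])]
    except OSError:  # covers URLError, timeouts, DNS failures
        return rrf_fuse(rankings)
```

Because fastapi has no depends_on on the reranker, this shape lets it boot and serve results even when the container is absent.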
@@ -45,6 +45,28 @@ services:
       - "127.0.0.1:11434:11434"
     restart: unless-stopped
 
+  # Phase 1.3: bge-reranker-v2-m3 (TEI) — internal only, called from fastapi at reranker:80
+  # fastapi has no depends_on → starts standalone; fastapi works without it (rerank=false fallback)
+  reranker:
+    image: ghcr.io/huggingface/text-embeddings-inference:1.5
+    container_name: hyungi_document_server-reranker-1
+    expose:
+      - "80"
+    environment:
+      - MODEL_ID=BAAI/bge-reranker-v2-m3
+      - MAX_BATCH_TOKENS=8192
+      - MAX_CONCURRENT_REQUESTS=4
+    volumes:
+      - reranker_cache:/data
+    deploy:
+      resources:
+        reservations:
+          devices:
+            - driver: nvidia
+              count: 1
+              capabilities: [gpu]
+    restart: unless-stopped
+
   ai-gateway:
     build: ./gpu-server/services/ai-gateway
     ports:
@@ -103,3 +125,4 @@ volumes:
   pgdata:
   caddy_data:
   ollama_data:
+  reranker_cache:
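The next step of pointing rerank.endpoint at the new service might produce a config.yaml section like the following. This is a minimal sketch: only the endpoint value comes from the commit message, and the surrounding keys are assumptions.

```yaml
# Hypothetical shape of the rerank section in config.yaml (key names assumed)
rerank:
  enabled: true
  endpoint: http://reranker:80/rerank   # from the commit's "next steps"
```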