Files
hyungi_document_server/docker-compose.yml
hyungi b734fc54af fix(search): Phase 2Q rerank payload — chunk_id dedup + cap 60 + TEI batch 64 (Apply prereq)
plan pr-2q-rerank-payload-fix-resolute-haven.md. Phase 2Q multi-query path 의 reranker
413 Payload Too Large root cause = TEI 의 MAX_CLIENT_BATCH_SIZE=32 default (batch entries
한도) + multi-query 의 chunks 누적이 32 초과. MAX_BATCH_TOKENS 와 별개 (token sum 한도).

4 iteration 진단 history (json 박제):
  1) cap 60 + dedup = 413 다수 (batch 54 > 32)
  2) cap 30 + chunks_per_doc=1 = 413 0건 + NDCG 0.666 catastrophic (-0.261)
  3) cap 60 + dedup + TEI 16384 only = 413 46건 (batch size 한도 별개)
  4) cap 60 + dedup + TEI 16384/64 = 413 1건 + NDCG 0.876 (FINAL)

변경:
- app/services/search/search_pipeline.py:
  · _dedup_chunks_by_id() 신규 helper — chunk_id (None 시 doc.id) 기준 first-only.
    variant 별 same chunk 중복 누적 회피, 첫 등장 variant 보존.
  · PHASE2Q_RERANK_INPUT_CAP=60 + PHASE2Q_CHUNKS_PER_DOC=2 신규 상수 (baseline
    MAX_RERANK_INPUT=200 / MAX_CHUNKS_PER_DOC=2 와 별도).
  · search_with_rewrite() merge 후 dedup wire-up + rerank input cap swap.
- docker-compose.yml reranker env (사용자 결정, plan out-of-scope 정정):
  · MAX_BATCH_TOKENS 8192 → 16384 (token sum 한도)
  · MAX_CLIENT_BATCH_SIZE 32 → 64 신규 추가 (batch entries 한도 — root cause)
  · GPU VRAM free 6199MiB 충분 사전 verify.
- tests/test_query_rewriter.py: _dedup_chunks_by_id 5 test + PHASE2Q_* constants test.
  38/38 PASS (기존 32 + 신규 6).

측정 결과 (51 case, gemma backend, snapshot 25180/56526):
  vs Phase 3 (commit a41adb6 NDCG 0.927, 413 다수):
  · NDCG 0.876 (-0.051 acceptable, plan 변수 격리 invariant 충족)
  · Recall t≥2 0.721 (+0.034 회복)
  · Recall t≥3 0.739 (+0.011)
  · latency p50 1421ms (-1336ms, -48%) / p95 3392ms (-6292ms, -65%) major win
  · 413 fallback 1/51 (98%↓ from 다수) + reranker batch error 0
  · 카테고리 english_only +0.34 / standards -0.28 / exam -0.19 (Apply 후 분석 항목)

closure gate PASS:
  · unit test 38/38, production smoke 413 0
  · 51 case 413 < 5/51 (1건만)
  · latency 대폭 개선
  · NDCG threshold 0.92 미달 단 plan invariant (production 평가 단일 변수) 충족
  · Apply PR-2Q-Apply-Query-Rewrite-1 진입 ready

산출물:
  · reports/v0_2_phase2q_rerank_fix_2026-05-24.csv (raw)
  · tests/search_eval/baselines/v0_2_phase2q_rerank_fix_2026-05-24.json (4 iter 진단 박제)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-24 03:54:59 +00:00

258 lines
8.6 KiB
YAML
Raw Permalink Blame History

This file contains ambiguous Unicode characters
This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.
services:
postgres:
image: pgvector/pgvector:pg16
volumes:
- pgdata:/var/lib/postgresql/data
- ./migrations:/docker-entrypoint-initdb.d
environment:
POSTGRES_DB: pkm
POSTGRES_USER: pkm
POSTGRES_PASSWORD: ${POSTGRES_PASSWORD}
ports:
- "100.110.63.63:15432:5432"
healthcheck:
test: ["CMD-SHELL", "pg_isready -U pkm"]
interval: 5s
timeout: 5s
retries: 5
restart: unless-stopped
kordoc-service:
build: ./services/kordoc
ports:
- "127.0.0.1:3100:3100"
volumes:
- ${NAS_NFS_PATH:-/mnt/nas/Document_Server}:/documents:ro
mem_limit: 4g
memswap_limit: 4g
healthcheck:
test: ["CMD", "node", "-e", "fetch('http://localhost:3100/health').then(r=>{process.exit(r.ok?0:1)}).catch(()=>process.exit(1))"]
interval: 10s
timeout: 5s
retries: 3
restart: unless-stopped
ocr-service:
build: ./services/ocr
expose:
- "3200"
volumes:
- ${NAS_NFS_PATH:-/mnt/nas/Document_Server}:/documents:ro
- ocr_models:/root/.cache
deploy:
resources:
reservations:
devices:
- driver: nvidia
count: 1
capabilities: [gpu]
healthcheck:
test: ["CMD", "python", "-c", "import urllib.request; urllib.request.urlopen('http://localhost:3200/health')"]
interval: 30s
timeout: 10s
retries: 3
start_period: 180s
restart: unless-stopped
# Phase 1B (2026-05-01): PDF → markdown 변환. ocr-service 와 별도 컨테이너 (deps 충돌 회피).
marker-service:
build: ./services/marker
ports:
- "127.0.0.1:3300:3300"
expose:
- "3300"
environment:
- HF_HOME=/models/huggingface
- TORCH_HOME=/models/torch
volumes:
- ${NAS_NFS_PATH:-/mnt/nas/Document_Server}:/documents:ro
- marker_models:/models
deploy:
resources:
reservations:
devices:
- driver: nvidia
count: 1
capabilities: [gpu]
healthcheck:
test: ["CMD", "curl", "-f", "http://localhost:3300/ready"]
interval: 30s
timeout: 10s
retries: 3
start_period: 300s
restart: unless-stopped
stt-service:
# 2026-05-08 (D9 Track B revised): GPU is canonical STT owner.
# 정책: Mac mini = Gemma 26B 전용 우선이므로 STT/Whisper 는 호출량 무관 GPU 서버 소유.
# 이전 "Mac mini 이전본" 주석은 trace 오인 기반이었고 본 revised 결정으로 폐기.
# fastapi 의 STT_ENDPOINT 는 `http://stt-service:3300` (compose 내부 DNS) 사용.
build: ./services/stt
expose:
- "3300"
volumes:
- ${NAS_NFS_PATH:-/mnt/nas/Document_Server}:/documents:ro
- stt_models:/root/.cache
environment:
- WHISPER_MODEL=${WHISPER_MODEL:-large-v3}
- WHISPER_DEVICE=${WHISPER_DEVICE:-cuda}
- WHISPER_COMPUTE_TYPE=${WHISPER_COMPUTE_TYPE:-float16}
deploy:
resources:
reservations:
devices:
- driver: nvidia
count: 1
capabilities: [gpu]
healthcheck:
# /ready: CUDA 디바이스 + 모델 적재 둘 다 확인. ready=true 만 healthy 처리.
# /health 는 단순 liveness 라 모델 미적재 상태도 healthy 로 잡혀 운영 신호로 부적합.
test: ["CMD", "python3", "-c", "import json,urllib.request,sys; r=urllib.request.urlopen('http://localhost:3300/ready'); sys.exit(0 if json.load(r).get('ready') else 1)"]
interval: 30s
timeout: 10s
retries: 3
start_period: 300s
restart: unless-stopped
ollama:
image: ollama/ollama
volumes:
- ollama_data:/root/.ollama
deploy:
resources:
reservations:
devices:
- driver: nvidia
count: 1
capabilities: [gpu]
ports:
- "127.0.0.1:11434:11434"
restart: unless-stopped
# Phase 1.3: bge-reranker-v2-m3 (TEI) — internal only, fastapi에서 reranker:80으로 호출
# fastapi가 depends_on 안 함 → 단독 시작 가능, 없어도 fastapi 동작 (rerank=false fallback)
reranker:
image: ghcr.io/huggingface/text-embeddings-inference:1.7
container_name: hyungi_document_server-reranker-1
expose:
- "80"
environment:
- MODEL_ID=BAAI/bge-reranker-v2-m3
# PR-2Q-Rerank-Payload-Fix (2026-05-24): 2 env 변경 — 413 root cause 분리.
# (a) MAX_BATCH_TOKENS 8192 → 16384: 각 batch 의 token sum 한도
# (b) MAX_CLIENT_BATCH_SIZE 32 → 64: 1 request 안 entries 수 한도 (default 32).
# multi-query cap 60 chunks (×2 chunks_per_doc dedup) 가 batch entries 54~60
# → 32 한도 초과 → 413. 64 로 늘림.
# GPU VRAM free 6199MiB 충분. baseline path (MAX_RERANK_INPUT=200) 영향 0.
- MAX_BATCH_TOKENS=16384
- MAX_CLIENT_BATCH_SIZE=64
- MAX_CONCURRENT_REQUESTS=4
volumes:
- reranker_cache:/data
deploy:
resources:
reservations:
devices:
- driver: nvidia
count: 1
capabilities: [gpu]
healthcheck:
test: ["CMD", "curl", "-fsS", "http://localhost/health"]
interval: 30s
timeout: 5s
retries: 3
start_period: 120s
restart: unless-stopped
ai-gateway:
build: ./gpu-server/services/ai-gateway
ports:
- "127.0.0.1:8081:8080"
environment:
- PRIMARY_ENDPOINT=http://100.76.254.116:8801/v1/chat/completions
- FALLBACK_ENDPOINT=http://ollama:11434/v1/chat/completions
- CLAUDE_API_KEY=${CLAUDE_API_KEY:-}
- DAILY_BUDGET_USD=${DAILY_BUDGET_USD:-5.00}
depends_on:
- ollama
restart: unless-stopped
fastapi:
build: ./app
ports:
- "100.110.63.63:8000:8000"
volumes:
- ${NAS_NFS_PATH:-/mnt/nas/Document_Server}:/documents
- ./config.yaml:/app/config.yaml:ro
- ./domain_policy.yaml:/app/domain_policy.yaml:ro
- ./scripts:/app/scripts:ro
- ./logs:/app/logs
- ./migrations:/app/migrations:ro
depends_on:
postgres:
condition: service_healthy
kordoc-service:
condition: service_healthy
marker-service:
condition: service_healthy
env_file:
- credentials.env
environment:
- DATABASE_URL=postgresql+asyncpg://pkm:${POSTGRES_PASSWORD}@postgres:5432/pkm
- KORDOC_ENDPOINT=http://kordoc-service:3100
- OCR_ENDPOINT=http://ocr-service:3200
- MARKER_ENDPOINT=http://marker-service:3300
- MARKER_CONTAINER_PATH_PREFIX=/documents
# 2026-05-08 (D9 Track B revised): GPU stt-service 정식 승격, 내부 DNS 사용.
- STT_ENDPOINT=http://stt-service:3300
# KGS Code 등 외부 학습 자료 추가 스캔 경로 (host .env 에서 주입). 빈 값이면 비활성.
- ADDITIONAL_WATCH_TARGETS=${ADDITIONAL_WATCH_TARGETS:-}
# PR-MacMini-Derived-Worker-1
- STUDY_EXPLANATION_ENABLED=${STUDY_EXPLANATION_ENABLED:-true}
- INTERNAL_WORKER_TOKEN=${INTERNAL_WORKER_TOKEN}
# Voice Memo PoC v1 — bot 계정 한정 long-expiry access token. default false → 일반 운영 영향 0.
# 활성화: host .env 에 VOICE_MEMO_BOT_TOKEN_ENABLED=true. plan: rosy-launching-otter.md
- VOICE_MEMO_BOT_TOKEN_ENABLED=${VOICE_MEMO_BOT_TOKEN_ENABLED:-false}
- VOICE_MEMO_BOT_USERNAME=${VOICE_MEMO_BOT_USERNAME:-voice-memo-bot}
- VOICE_MEMO_BOT_TOKEN_EXPIRE_DAYS=${VOICE_MEMO_BOT_TOKEN_EXPIRE_DAYS:-365}
# PR-Notebook-Client-1 — notebook-client-bot long-expiry token (laptop-worker-bot wrapper 재활용,
# 1B Pilot dormant 후 username rename). default false → 운영 영향 0.
- LAPTOP_WORKER_BOT_TOKEN_ENABLED=${LAPTOP_WORKER_BOT_TOKEN_ENABLED:-false}
- LAPTOP_WORKER_BOT_USERNAME=${LAPTOP_WORKER_BOT_USERNAME:-laptop-worker-bot}
- LAPTOP_WORKER_BOT_TOKEN_EXPIRE_DAYS=${LAPTOP_WORKER_BOT_TOKEN_EXPIRE_DAYS:-365}
# PR-2 of DS AI routing policy (2026-05-23) — backends dispatcher via llm-router.
# router_url default points at Mac mini Tailscale interface :8890 (PR-1).
# DS_BACKENDS_VIA_ROUTER=false 로 legacy 직접 호출 path 즉시 복귀.
- LLM_ROUTER_URL=${LLM_ROUTER_URL:-http://100.76.254.116:8890}
- DS_BACKENDS_VIA_ROUTER=${DS_BACKENDS_VIA_ROUTER:-true}
restart: unless-stopped
frontend:
build: ./frontend
ports:
- "127.0.0.1:3000:3000"
depends_on:
- fastapi
restart: unless-stopped
caddy:
image: caddy:2
ports:
- "8080:80"
volumes:
- ./Caddyfile:/etc/caddy/Caddyfile
- caddy_data:/data
depends_on:
- fastapi
- frontend
restart: unless-stopped
volumes:
pgdata:
caddy_data:
ollama_data:
reranker_cache:
ocr_models:
stt_models:
marker_models: