Files
gpu-services/hub-api/services/gpu_monitor.py
Hyungi Ahn 3794afff95 feat: AI Gateway Phase 1 - FastAPI core implementation
Initial implementation of the GPU server's central AI routing service:
- OpenAI-compatible API (/v1/chat/completions, /v1/models, /v1/embeddings)
- Model registry + backend health checks (30-second loop)
- Ollama SSE proxy (NDJSON → OpenAI SSE conversion)
- Dual-path JWT authentication (httpOnly cookie + Bearer token)
- owner/guest role separation, login rate limiting
- Per-backend rate limiting (in anticipation of NanoClaude)
- Predefined SQLite schema (aiosqlite + WAL)
- Docker Compose + Caddy reverse proxy
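The NDJSON → OpenAI SSE conversion mentioned above can be sketched as a per-line transform. This is a minimal illustration, not the gateway's actual converter: the helper name `ollama_ndjson_to_openai_sse` is hypothetical, and the field names assume Ollama's documented `/api/chat` streaming shape (`{"message": {"content": ...}, "done": bool}`).

```python
import json
import time
import uuid


def ollama_ndjson_to_openai_sse(ndjson_line: str, model: str) -> str:
    """Convert one Ollama NDJSON chunk into an OpenAI-style SSE event.

    Hypothetical sketch: assumes Ollama /api/chat chunks of the form
    {"message": {"content": ...}, "done": bool}.
    """
    chunk = json.loads(ndjson_line)
    if chunk.get("done"):
        # OpenAI streams terminate with a literal [DONE] sentinel event.
        return "data: [DONE]\n\n"
    payload = {
        "id": f"chatcmpl-{uuid.uuid4().hex[:12]}",
        "object": "chat.completion.chunk",
        "created": int(time.time()),
        "model": model,
        "choices": [{
            "index": 0,
            "delta": {"content": chunk.get("message", {}).get("content", "")},
            "finish_reason": None,
        }],
    }
    # SSE frames are "data: <json>" followed by a blank line.
    return f"data: {json.dumps(payload)}\n\n"
```

In the real proxy this transform would be applied line by line inside a streaming response, forwarding each converted event to the client as it arrives.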

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-31 13:41:46 +09:00


from __future__ import annotations

import asyncio
import logging

from config import settings

logger = logging.getLogger(__name__)


async def get_gpu_info() -> dict | None:
    """Run nvidia-smi and parse info for the first GPU."""
    try:
        proc = await asyncio.create_subprocess_exec(
            settings.nvidia_smi_path,
            "--query-gpu=utilization.gpu,temperature.gpu,memory.used,memory.total,power.draw,name",
            "--format=csv,noheader,nounits",
            stdout=asyncio.subprocess.PIPE,
            stderr=asyncio.subprocess.PIPE,
        )
        stdout, stderr = await asyncio.wait_for(proc.communicate(), timeout=5.0)
        if proc.returncode != 0:
            logger.debug("nvidia-smi failed: %s", stderr.decode())
            return None
        # Only the first row (first GPU) is parsed.
        line = stdout.decode().strip().split("\n")[0]
        parts = [p.strip() for p in line.split(",")]
        if len(parts) < 6:
            return None
        return {
            "utilization": int(parts[0]),
            "temperature": int(parts[1]),
            "vram_used": int(parts[2]),
            "vram_total": int(parts[3]),
            "power_draw": float(parts[4]),
            "name": parts[5],
        }
    except (FileNotFoundError, asyncio.TimeoutError, ValueError):
        # ValueError covers fields some drivers report as "[N/A]".
        return None