feat(ai): add mlx sampling fields (repetition_penalty/top_k) to AIModelConfig and inject them in _request

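PR-A of the AIClient cleanup from code review. Neither the config nor the code had a knob for suppressing code-switching (CJK/Latin leakage) and repetition loops in long-form Korean output from Qwen3; only temperature and top_p existed. Both new fields default to None, so behavior is unchanged; they take effect only once values are set in config.yaml (done separately). Applies to the OpenAI-compatible (mlx) branch only. PR-B (gate structure enforcement, shared httpx, public call_classifier/verifier) will follow.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>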
2026-06-26 19:17:55 +09:00
parent 5b5353c751
commit fb82a69c02
2 changed files with 9 additions and 0 deletions
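The opt-in behavior described in the commit message (a None field is not sent, so the server default applies) can be sketched in isolation as follows. This is a minimal illustration, not the project's code: `SamplingConfig` and `build_payload` are hypothetical names, a stdlib dataclass stands in for the pydantic `BaseModel`, and the model name is a placeholder.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Any


@dataclass
class SamplingConfig:
    """Hypothetical stand-in for the sampling fields on AIModelConfig."""

    temperature: float | None = None
    top_p: float | None = None
    repetition_penalty: float | None = None
    top_k: int | None = None


def build_payload(base: dict[str, Any], cfg: SamplingConfig) -> dict[str, Any]:
    """Copy `base` and add only the sampling fields that are not None."""
    payload = dict(base)
    for name in ("temperature", "top_p", "repetition_penalty", "top_k"):
        value = getattr(cfg, name)
        if value is not None:
            # An unset (None) field is skipped so the server default applies.
            payload[name] = value
    return payload


base = {"model": "example-model", "messages": []}  # placeholder request body

# All fields left at None: the payload is identical to the base request.
default_payload = build_payload(base, SamplingConfig())

# Values taken from the recommendation in the config comment (top_k=20, 1.05).
tuned_payload = build_payload(base, SamplingConfig(top_k=20, repetition_penalty=1.05))
```

With every field left at None, the request body is unchanged, which is why the commit can land without altering behavior.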
@@ -346,6 +346,10 @@ class AIClient:
                payload["temperature"] = model_config.temperature
            if model_config.top_p is not None:
                payload["top_p"] = model_config.top_p
+            if model_config.repetition_penalty is not None:
+                payload["repetition_penalty"] = model_config.repetition_penalty
+            if model_config.top_k is not None:
+                payload["top_k"] = model_config.top_k
            response = await self._http.post(
                model_config.endpoint,
                json=payload,
@@ -30,6 +30,11 @@ class AIModelConfig(BaseModel):
     # None = MLX/OpenAI server default. Not applied to the Anthropic branch (scope of a separate plan).
    temperature: float | None = None
    top_p: float | None = None
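+    # mlx-native sampling: suppresses code-switching (CJK/Latin leakage) and repetition loops in long-form Korean output.
+    # Recommended for Qwen3: top_k=20, repetition_penalty 1.05-1.1. None = server default (not injected).
+    # Applies to the OpenAI-compatible (mlx) branch only; not applied to the Anthropic branch (separate scope).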
+    repetition_penalty: float | None = None
+    top_k: int | None = None


 class DeepSummaryBacklogConfig(BaseModel):