feat(ui): DS web UI sage redesign, batch deployment of 5 pages (dashboard, documents, news, digest, memos)

Goals 1-4 + digest. Incorporates the defects confirmed by adversarial review (15-agent). frontend-only. Original commit = gitea feat/ui-sage-all a1a46f2. eid migrations 301-305 (isolated in the working tree as a mitigation) are not touched.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
feat(ui): redesign the home dashboard as a daily home cockpit (option 1 skeleton + option 2 widgets + option 3 distribution)
2026-06-07 11:12:17 +00:00 · 2026-06-07 07:57:59 +00:00 · 2026-06-07 07:22:36 +00:00 · 2026-06-07 07:18:13 +00:00 · 2026-06-07 07:07:02 +00:00 · 2026-06-07 06:43:10 +00:00
345 changed files with 2652 additions and 31351 deletions
@@ -47,6 +47,3 @@ caddy_data/
 *.bak_*
 *.pre-*
 .pre-*/
-
-# SQLite local artifacts (Django/tooling leftovers)
-*.sqlite3
@@ -9,30 +9,7 @@
 }

 http://document.hyungi.net {
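-    # Explicit Content-Type match: the default matcher's text/* also covers text/event-stream,
-    # so gzip buffers the first ~512B of SSE (/api/eid/chat). SSE is excluded; the existing compression targets are kept.
-    # (Response matchers take one header <field> <value> pair each; multiple lines = OR. Multiple values on one line are a parse error)
-    # 2026-06-20 security headers (M: clickjacking and MIME-sniffing defense). HSTS is the responsibility of the TLS-terminating edge (home-caddy).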
-    header {
-        X-Content-Type-Options nosniff
-        X-Frame-Options SAMEORIGIN
-        Referrer-Policy strict-origin-when-cross-origin
-        -Server
-    }
-    encode {
-        gzip
-        match {
-            header Content-Type text/html*
-            header Content-Type text/css*
-            header Content-Type text/plain*
-            header Content-Type text/xml*
-            header Content-Type text/javascript*
-            header Content-Type application/json*
-            header Content-Type application/javascript*
-            header Content-Type application/xml*
-            header Content-Type image/svg+xml*
-        }
-    }
+    encode gzip

    # API + documents → FastAPI
    handle /api/* {
@@ -1,6 +1,5 @@
 """AI abstraction layer: unified client. The default is always Qwen3.5."""

-import asyncio
 import json
 import re
 from pathlib import Path
@@ -135,49 +134,6 @@ def _fix_json_string_escapes(s: str) -> str:
        i += 1
    return "".join(out)

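-def is_deferrable_error(exc: Exception) -> bool:
-    """Classify whether a failed deep (MacBook M5 Max) call should be deferred (StageDeferred) (ds-macbook-offload-1).
-
-    Deferred = a signal that the MacBook is temporarily unavailable:
-      - HTTP 503 (router upstream_cold / editor_busy / warming; no-silent-fallback contract)
-      - HTTP 502/504 (the router converts upstream connection failures and mid-generation disconnects to 502;
-        4 sites measured in llm_router.py. In the router-mediated topology, a MacBook sleep disconnect surfaces this way)
-      - the whole httpx.TransportError family (ConnectError, ReadError, RemoteProtocolError, plus
-        ConnectTimeout, ReadTimeout, etc.): the router itself is unavailable, or the DS↔router link was cut.
-    Anything else (400/500, parsing/validation errors, etc.) is not deferred; it takes the caller's existing failure path.
-    """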
-    if isinstance(exc, httpx.HTTPStatusError):
-        return exc.response.status_code in (502, 503, 504)
-    return isinstance(exc, httpx.TransportError)
-
-
-async def call_deep_or_defer(
-    client: "AIClient",
-    prompt: str,
-    system: str | None = None,
-    cfg: "AIModelConfig | None" = None,
-) -> str:
-    """call_deep plus deferral conversion: MacBook unavailability (503/connection/disconnect) is raised as StageDeferred.
-
-    Shared by deep_summary_worker / summarize_worker(drain) / classify_worker(drain).
-    queue_consumer/queue_drain handle StageDeferred without consuming attempts, using a
-    deferred_until backoff (sleep-safe invariant).
-
-    cfg: when given, call with this config instead of the deep slot (classify drain: a variant that
-         uses the deep slot's endpoint but applies triage's temperature/max_tokens).
-    """
-    from models.queue import StageDeferred
-
-    try:
-        if cfg is not None:
-            return await client._request(cfg, prompt, system=system)
-        return await client.call_deep(prompt, system=system)
-    except Exception as exc:
-        if is_deferrable_error(exc):
-            raise StageDeferred(f"macbook_unavailable:{type(exc).__name__}") from exc
-        raise
-
-
 # Prompt loading
 PROMPTS_DIR = Path(__file__).parent.parent / "prompts"

@@ -189,25 +145,6 @@ def _load_prompt(name: str) -> str:
 CLASSIFY_PROMPT = _load_prompt("classify.txt") if (PROMPTS_DIR / "classify.txt").exists() else ""


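-# Shared httpx client: replaces creating a new AsyncClient per call (30+ sites, zero connection-pool reuse)
-# with a single client that reuses keep-alive connections. It is bound to the event loop, so it is recreated when the loop changes (pytest isolation, etc.).
-# close() is a no-op because the pool is shared; it is garbage-collected at process exit.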
-_shared_http: httpx.AsyncClient | None = None
-_shared_http_loop: object | None = None
-
-
-def _get_shared_http() -> httpx.AsyncClient:
-    global _shared_http, _shared_http_loop
-    try:
-        loop: object | None = asyncio.get_running_loop()
-    except RuntimeError:
-        loop = None
-    if _shared_http is None or _shared_http.is_closed or _shared_http_loop is not loop:
-        _shared_http = httpx.AsyncClient(timeout=120)
-        _shared_http_loop = loop
-    return _shared_http
-
-
 class AIClient:
    """Unified client for AI models.

@@ -222,7 +159,7 @@ class AIClient:

    def __init__(self):
        self.ai = settings.ai
-        self._http = _get_shared_http()
+        self._http = httpx.AsyncClient(timeout=120)

    # ─── 3-tier routing (B-0) ───────────────────────────────────────────────

@@ -248,60 +185,28 @@ class AIClient:
        """Last line of defense when triage/primary fails. Claude Sonnet 4 API (config.yaml ai.models.fallback); swap completed after PR #20."""
        return await self._request(self.ai.fallback, prompt)

-    async def call_deep(self, prompt: str, system: str | None = None) -> str:
-        """Deep-only: MacBook M5 Max Qwen3.6-27B (config.yaml ai.models.deep, ds-macbook-offload-1).
-
-        Goes through llm-router :8890 (model=qwen-macbook alias), reusing the router's wake preflight (~24s) and
-        editor_busy guard. It is unrelated to the Mac mini mlx gate (the gate exists to protect the Mac mini),
-        so it is called without the gate. There is no automatic cloud/Mac mini fallback: failures propagate as-is, and
-        the caller decides on deferral via is_deferrable_error(). If the slot is absent, primary is used (defensive;
-        callers usually branch on slot presence first).
-        """
-        cfg = self.ai.deep or self.ai.primary
-        return await self._request(cfg, prompt, system=system)
-
-    async def call_classifier(self, prompt: str) -> str:
-        """answerability classifier (config ai.classifier, Mac mini 26B MLX).
-
-        Public entry point that seals off direct calls to the private _request (classifier_service). The gate is
-        managed by the caller (classifier_service) via acquire_mlx_gate: the same caller-managed contract as
-        call_primary (self-gating here would deadlock on a double acquire with the caller).
-        """
-        return await self._request(self.ai.classifier, prompt)
-
-    async def call_verifier(self, prompt: str) -> str:
-        """semantic verifier (config ai.verifier, Mac mini 26B MLX).
-
-        Seals off direct calls to the private _request (verifier_service). The gate is managed by the
-        caller (verifier_service) (caller-managed; do not self-gate).
-        """
-        return await self._request(self.ai.verifier, prompt)
-
    # ─── Legacy API (to be removed when classify_worker is replaced) ───────────────────

-    async def classify(self, text: str, cfg=None) -> dict:
+    async def classify(self, text: str) -> dict:
        """[DEPRECATED] For the existing classify_worker only. Replaced by summary_triage in B-1.

        Kept until the call sites are cleaned up. New code should use call_triage + prompt_render.
-        cfg (2026-06-12 fair-share): when given, call with that config instead of primary;
-        used by drain classify to go through the deep slot (MacBook). Because cfg != ai.primary,
-        the automatic primary→fallback switch in _call_chat does not trigger (errors propagate raw).
        """
        prompt = CLASSIFY_PROMPT.replace("{document_text}", text)
-        response = await self._call_chat(cfg or self.ai.primary, prompt)
+        response = await self._call_chat(self.ai.primary, prompt)
        return response

-    async def summarize(self, text: str, force_premium: bool = False, cfg=None) -> str:
-        """[DEPRECATED] For existing call sites. In B-1, summary_triage replaces tldr. cfg = same as classify()."""
+    async def summarize(self, text: str, force_premium: bool = False) -> str:
+        """[DEPRECATED] For existing call sites. In B-1, summary_triage replaces tldr."""
        if force_premium:
            return await self._call_chat(self.ai.premium, f"다음 문서를 500자 이내로 요약해주세요:\n\n{text}")
-        return await self._call_chat(cfg or self.ai.primary, f"다음 문서를 500자 이내로 요약해주세요:\n\n{text}")
+        return await self._call_chat(self.ai.primary, f"다음 문서를 500자 이내로 요약해주세요:\n\n{text}")

    async def embed(self, text: str) -> list[float]:
        """Vector embedding: GPU server only"""
        response = await self._http.post(
            self.ai.embedding.endpoint,
-            json={"model": self.ai.embedding.model, "prompt": text, "keep_alive": -1},  # keep bge-m3 resident on the GPU (avoids the ~6s cold reload for homelab sparse search)
+            json={"model": self.ai.embedding.model, "prompt": text},
        )
        response.raise_for_status()
        return response.json()["embedding"]
@@ -326,16 +231,13 @@ class AIClient:
        return response.json()

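    async def _call_chat(self, model_config, prompt: str) -> str:
-        """OpenAI-compatible API call (R6: removes the cloud fallback that happened without consent).
-
-        Previously, a primary (Mac mini) TimeoutException/ConnectError switched automatically to
-        self.ai.fallback (Claude API) with no consent or billing control → personal documents, queries, and memos
-        silently egressed to Anthropic. That violates the on-prem inference privacy contract, so it is blocked. Failures propagate as-is:
-        batch workers retry or raise StageDeferred (R3, queue_consumer), and interactive callers surface a 5xx
-        (documents.analyze and others already convert to 502/504). The cloud is reachable only through the premium explicit trigger
-        (summarize force_premium) or an explicit call_fallback call; automatic entry is forbidden.
-        """
-        return await self._request(model_config, prompt)
+        """OpenAI-compatible API call + automatic fallback"""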
+        try:
+            return await self._request(model_config, prompt)
+        except (httpx.TimeoutException, httpx.ConnectError):
+            if model_config == self.ai.primary:
+                return await self._request(self.ai.fallback, prompt)
+            raise

    async def _request(self, model_config, prompt: str, system: str | None = None) -> str:
        """Single-model API call (OpenAI-compatible + Anthropic Messages API).
@@ -383,10 +285,6 @@ class AIClient:
                payload["temperature"] = model_config.temperature
            if model_config.top_p is not None:
                payload["top_p"] = model_config.top_p
-            if model_config.repetition_penalty is not None:
-                payload["repetition_penalty"] = model_config.repetition_penalty
-            if model_config.top_k is not None:
-                payload["top_k"] = model_config.top_k
            response = await self._http.post(
                model_config.endpoint,
                json=payload,
@@ -397,5 +295,4 @@ class AIClient:
            return data["choices"][0]["message"]["content"]

    async def close(self):
-        # Shared pool (_get_shared_http), so there is no per-use close; connections are reused. Garbage-collected at process exit.
-        return None
+        await self._http.aclose()
@@ -195,14 +195,8 @@ async def regenerate(
    If date is not given, today (KST) is used. If a row for the same day exists, it is deleted and recreated inside a transaction.
    Response status='success' | 'partial' | 'failed' | 'empty'.
    """
-    from core.config import settings
    from workers.briefing_worker import run

-    # Surface held (a normal, policy-driven hold) as 409 (R8), mirroring the canonical digest.py. Previously briefing_worker.run()
-    # returned None for all three of held/timeout/exception → the API misreported all three as 500 (silent-state-conflation).
-    if "briefing" in settings.pipeline_held_stages:
-        raise HTTPException(status_code=409, detail="briefing 단계가 일시 보류(held) 상태입니다")
-
    result = await run(target_date=date)
    if result is None:
        raise HTTPException(status_code=500, detail="briefing 워커 실행 실패 (로그 확인)")
@@ -244,15 +244,7 @@ async def regenerate(
    user: Annotated[User, Depends(require_admin)],
 ):
    """Manual trigger: runs the worker as a background task (admin required)."""
-    from core.config import settings
    from workers.digest_worker import run

-    # Prevent a silent no-op while held: state the same condition as the worker gate explicitly at the API surface.
-    if "digest" in settings.pipeline_held_stages:
-        raise HTTPException(
-            status_code=409,
-            detail="global_digest 보류 중 (config.yaml pipeline.held_stages) — 해제 후 재시도",
-        )
-
    asyncio.create_task(run())
    return {"status": "started", "message": "global_digest 워커 백그라운드 실행 시작"}
@@ -21,8 +21,8 @@ from fastapi import (
    UploadFile,
    status,
 )
-from fastapi.responses import FileResponse, StreamingResponse
-from pydantic import BaseModel, field_validator
+from fastapi.responses import FileResponse
+from pydantic import BaseModel
 from sqlalchemy import func, select
 from sqlalchemy.ext.asyncio import AsyncSession
 from starlette.requests import ClientDisconnect
@@ -30,19 +30,12 @@ from starlette.requests import ClientDisconnect
 from ai.client import AIClient, _load_prompt, parse_json_response
 from core.auth import get_current_user
 from core.config import settings
-from core.database import async_session, get_session
+from core.database import get_session
 from core.utils import file_hash
 from models.document import Document
 from models.document_image import DocumentImage
 from models.queue import ProcessingQueue, enqueue_stage
 from models.user import User
-from services.dedup import (
-    DUPLICATE_GROUPS_SQL,
-    DEDUP_OFF_CHANNELS,
-    find_canonical_for_hash,
-    find_near_duplicates,
-)
-from services.storage import StorageNotConfigured, get_storage_backend
 from services.document_telemetry import record_analyze_event, sanitize_source
 from services.prompt_versions import ANALYZE_PROMPT_VERSION, resolve_primary_model
 from services.search.llm_gate import Priority, acquire_mlx_gate
@@ -69,66 +62,6 @@ def _upload_error(status_code: int, error_code: str, message: str) -> HTTPExcept
    )


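-async def get_live_document(session: AsyncSession, doc_id: int) -> Document:
-    """Fetch a document with the soft-delete (deleted_at) guard: 404 if missing or deleted (R7).
-
-    The read/update paths guard deleted_at consistently, but the file/content serving endpoints did not →
-    an asymmetry where a deleted document's original/preview/full text was exposed with just a doc_id (+ valid token). Rather than relying on
-    "remembering deleted_at on every path", a helper enforces it structurally (serving paths added later are protected automatically).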
-    """
-    doc = await session.get(Document, doc_id)
-    if not doc or doc.deleted_at is not None:
-        raise HTTPException(status_code=404, detail="문서를 찾을 수 없습니다")
-    return doc
-
-
-async def _near_dup_scan_bg(doc_id: int) -> None:
-    """B-3: post-upload near_duplicate scan (BackgroundTask). Own session, best-effort.
-
-    Right after upload, doc.embedding may not exist yet (embed stage incomplete), so often only
-    trigram candidates are recorded; non-gating. No exception affects the upload result (201).
-    Persistence is deferred (on-the-fly); for now it only logs. Exposing near-dups in /duplicates is phase2.
-    """
-    try:
-        async with async_session() as bg_session:
-            findings = await find_near_duplicates(bg_session, doc_id)
-        if findings:
-            top = findings[0]
-            logger.info(
-                "[dedup] near_dup_scan doc=%s candidates=%d top=%s(cosine=%s)",
-                doc_id, len(findings), top["doc_id"], top.get("cosine"),
-            )
-    except Exception:
-        logger.warning("[dedup] near_dup_scan failed doc=%s", doc_id, exc_info=True)
-
-
-def _parse_byte_range(range_header: str | None, size: int) -> tuple[int | None, int | None]:
-    """Parse an HTTP Range header (`bytes=start-end`) → (start, end), inclusive. Returns (None, None) if absent or invalid.
-
-    For D-2 remote-backend Range pass-through (for local, FileResponse handles it automatically). The suffix form
-    (`bytes=-N`) is also supported. For multiple ranges, only the first is used.
-    """
-    if not range_header or not range_header.startswith("bytes=") or size <= 0:
-        return None, None
-    spec = range_header[len("bytes="):].split(",")[0].strip()
-    if "-" not in spec:
-        return None, None
-    lo, hi = spec.split("-", 1)
-    try:
-        if lo == "":                       # suffix range: the last N bytes
-            n = int(hi)
-            if n <= 0:
-                return None, None
-            return max(0, size - n), size - 1
-        start = int(lo)
-        end = int(hi) if hi else size - 1
-    except ValueError:
-        return None, None
-    if start > end or start >= size:
-        return None, None
-    return start, min(end, size - 1)
-
-
 # ─── Schemas ───


@@ -180,10 +113,6 @@ class DocumentResponse(BaseModel):
    # Read tracking (library, etc.), for the current user. 0/None in responses from other endpoints.
    read_count: int = 0
    last_read_at: datetime | None = None
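-    # S1-ADD (migration 287): original filename + duplicate check. The app decodes these as optional and falls back if absent.
-    original_filename: str | None = None   # For the download label. If absent, falls back to the file_path basename (app side).
-    duplicate_of: int | None = None        # canonical doc id (None if this doc is itself the canonical).
-    duplicate_count: int = 0               # Number of copies judged identical, excluding itself (on the canonical row).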

    class Config:
        from_attributes = True
@@ -211,26 +140,10 @@ class DocumentDetailResponse(DocumentResponse):
    md_extraction_engine_version: str | None = None
    md_generated_at: datetime | None = None

-    @field_validator("md_status", mode="before")
-    @classmethod
-    def _db_success_to_completed(cls, v: str | None) -> str | None:
-        """The DB CHECK enum uses 'success'; the contract/fixture and the app's MD-first render trigger use 'completed'.
-        This is a one-way, read-time (DB→API) mapping only; the write path (ORM) does not go through this model, so it is not applied there.
-        pending/processing/partial/failed/skipped are identical on both sides, so only 'success' is mapped.
-        (Invariant: md_status ∈ {success,partial} ⟹ md_content is non-blank = worker postcondition, C-5.)
-        """
-        return "completed" if v == "success" else v
-

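 class AcceptSuggestionRequest(BaseModel):
-    """§1 accept-suggestion request body: detects a stale payload / document modification.
-
-    jurisdiction: safety library A-2; the jurisdiction the user specifies when approving a material_type suggestion.
-    Required when approving law (no default: auto-assigning KR would let foreign material be
-    contaminated as KR law, and this blocks that path; plan A-2 contract).
-    """
+    """§1 accept-suggestion request body: detects a stale payload / document modification."""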
    expected_source_updated_at: datetime
-    jurisdiction: str | None = None


 class DocumentUpdate(BaseModel):
@@ -279,11 +192,6 @@ async def get_document_tree(
            FROM documents
            WHERE ai_domain IS NOT NULL AND ai_domain != '' AND ai_domain != 'News'
              AND deleted_at IS NULL
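-              -- Match the document box (list) default exclusions: news/law channels and memos do not appear in the document box, so
-              -- the tree counts must exclude them too, or you get a "tree says N, click shows 0" mismatch.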
-              AND source_channel != 'news'
-              AND source_channel != 'law_monitor'
-              AND file_type != 'note'
            GROUP BY ai_domain
            ORDER BY ai_domain
        """)
@@ -556,8 +464,6 @@ async def list_documents(
    category: str | None = Query(None, description="doc_category enum — 지정 시 기본 news/memo 제외 해제"),
    has_suggestion: bool | None = Query(None, description="true: ai_suggestion IS NOT NULL"),
    proposed_category: str | None = Query(None, description="ai_suggestion.proposed_category 필터"),
-    material_type: str | None = Query(None, description="안전 자료실 C-1: 자료유형. 지정 시 기본 exclude 해제"),
-    jurisdiction: str | None = Query(None, description="안전 자료실 C-1: 관할 (KR/US/...)"),
 ):
    """List documents (pagination + filters).

@@ -571,10 +477,6 @@ async def list_documents(
    if category:
        # Explicit category filter: lifts the default exclude
        query = query.where(Document.category == category)
-    elif material_type:
-        # Safety library C-1: specifying material_type lifts the default exclude (news, law_monitor, note).
-        # The bulk of the safety corpus (KOSHA cases, CSB, laws, etc.) is all on note/crawl channels, so excluding them gives an empty screen.
-        query = query.where(Document.material_type == material_type)
    else:
        # Default list: excludes news/memos/laws (for the document box)
        query = query.where(
@@ -583,9 +485,6 @@ async def list_documents(
            Document.file_type != "note",
        )

-    if jurisdiction:
-        query = query.where(Document.jurisdiction == jurisdiction)
-
    if has_suggestion is True:
        query = query.where(Document.ai_suggestion.isnot(None))
    elif has_suggestion is False:
@@ -625,118 +524,6 @@ async def list_documents(
    )


-# ─── Duplicate check (dedup): B-2 ───
-# ★ The fixed-path route (/duplicates) must be registered *above* the dynamic /{doc_id} route to avoid a matching conflict.
-class DuplicateGroup(BaseModel):
-    canonical_id: int
-    members: list[int]
-    reason: str
-    detail: str | None = None
-
-
-class DuplicatesResponse(BaseModel):
-    groups: list[DuplicateGroup]
-    total_groups: int
-    total_duplicate_docs: int
-
-
-@router.get("/duplicates", response_model=DuplicatesResponse)
-async def list_duplicates(
-    user: Annotated[User, Depends(get_current_user)],
-    session: Annotated[AsyncSession, Depends(get_session)],
-):
-    """List of content_hash (= exact file_hash) duplicate groups.
-
-    Excludes the OFF-whitelist (law_monitor) and deleted documents. Reuses idx_documents_hash (no new index/table needed).
-    Persistence of near_duplicate (similarity-based) groups is deferred → S1 exposes exact groups only (same contract shape;
-    only the detail wording refers to 'file_hash'). Response shape = ds-app contract `documents_duplicates.json`.
-    """
-    rows = (
-        await session.execute(DUPLICATE_GROUPS_SQL, {"off_channels": list(DEDUP_OFF_CHANNELS)})
-    ).all()
-
-    groups = [
-        DuplicateGroup(
-            canonical_id=r.canonical_id,
-            members=list(r.members),
-            reason="content_hash",
-            detail="동일 file_hash (원본 바이트 SHA-256 일치)",
-        )
-        for r in rows
-    ]
-    return DuplicatesResponse(
-        groups=groups,
-        total_groups=len(groups),
-        # Copy count = sum over groups of (member count - 1) (canonical excluded); same as the fixture's total_duplicate_docs definition.
-        total_duplicate_docs=sum(len(g.members) - 1 for g in groups),
-    )
-
-
-class ClauseHit(BaseModel):
-    doc_id: int
-    doc_title: str
-    section_title: str | None = None
-    char_start: int | None = None
-    chunk_id: int
-    node_type: str | None = None
-
-
-class ClauseLookupResponse(BaseModel):
-    label: str
-    hits: list[ClauseHit]
-
-
-# NOTE: this must be declared before the '/{doc_id}' (int path param) route so that '/clause-lookup' is not
-# mis-matched as a doc_id (FastAPI matches in declaration order). Do not move.
-@router.get("/clause-lookup", response_model=ClauseLookupResponse)
-async def clause_lookup(
-    label: str,
-    user: Annotated[User, Depends(get_current_user)],
-    session: Annotated[AsyncSession, Depends(get_session)],
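-):
-    """Look up a clause's location across docs by clause identifier (e.g. UG-79); the entry point for "show me UG-79" (U-1).
-
-    Clauses (node_type=clause/clause_split) have in_corpus=false (search disabled), so semantic search cannot find them;
-    an exact match on the label prefix resolves (doc, char_start) directly, which enables jumping in the reading view.
-    Usually 1 hit; only labels shared across docs, such as appendices (A-/E-/F-), return multiple (edition selection). Like /sections,
-    this queries document_chunks directly; bypassing corpus_chunks is an intentional exception because this is exact targeting, not retrieval.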
-    """
-    from sqlalchemy import text as sql_text
-
-    lab = (label or "").strip()
-    if not lab:
-        return ClauseLookupResponse(label=label, hits=[])
-    rows = (
-        await session.execute(
-            sql_text(
-                """
-                SELECT c.doc_id, d.title AS doc_title, c.section_title, c.char_start, c.node_type,
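-                       -- The jump target must be a chunk present in the outline (/sections: is_leaf or %_split) for the deep link to work.
-                       -- If the chunk itself qualifies, use it; otherwise (container clause: has child headings, is_leaf=false) resolve to
-                       -- the first deep-linkable chunk after it in document order (= where the clause's content starts). If there is still none, use the chunk itself (fallback).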
-                       COALESCE(
-                         CASE WHEN c.is_leaf = true OR c.node_type LIKE '%\\_split' ESCAPE '\\' THEN c.id END,
-                         (SELECT ch.id FROM document_chunks ch
-                          WHERE ch.doc_id = c.doc_id AND ch.source_type = 'hier_section'
-                            AND ch.chunk_index >= c.chunk_index
-                            AND (ch.is_leaf = true OR ch.node_type LIKE '%\\_split' ESCAPE '\\')
-                          ORDER BY ch.chunk_index LIMIT 1),
-                         c.id
-                       ) AS chunk_id
-                FROM document_chunks c
-                JOIN documents d ON d.id = c.doc_id
-                WHERE c.node_type IN ('clause', 'clause_split')
-                  AND (c.section_title ILIKE :lab_sp OR c.section_title ILIKE :lab_eq)
-                  AND d.deleted_at IS NULL
-                ORDER BY c.doc_id, c.char_start NULLS LAST
-                LIMIT 50
-                """
-            ).bindparams(lab_sp=lab + " %", lab_eq=lab)
-        )
-    ).mappings().all()
-    return ClauseLookupResponse(label=lab, hits=[ClauseHit(**dict(r)) for r in rows])
-
-
@router.get("/{doc_id}", response_model=DocumentDetailResponse)
 async def get_document(
    doc_id: int,
@@ -756,14 +543,8 @@ class SectionItem(BaseModel):
    section_title: str | None = None  # includes raw markdown; cleanup is done in the frontend (headingPath.ts)
    heading_path: str | None = None   # raw
    level: int | None = None
-    node_type: str | None = None      # window | chapter_split | clause_split | section_split | null
+    node_type: str | None = None      # window | section_split | null
    is_leaf: bool
-    parent_id: int | None = None      # chunk_id of the tree parent. A window child's parent_id = its split-parent.
-                                      # Used when the frontend's collapseWindows absorbs non-adjacent windows into the split-parent.
-    char_start: int | None = None     # heading offset within md_content (UTF-16). Set only for jump targets, otherwise None (Path B)
-    text: str | None = None           # clause body = the chunk's original text. For large split documents, md_content keeps only the first 50,000 characters
-                                      # (marker LARGE_DOC_MD_CONTENT_HEAD_CHARS) and char_start is also NULL, so
-                                      # slicing md_content yields an empty body; the chunk text is rendered directly instead.
    section_type: str | None = None
    summary: str | None = None        # only on analysis rows with status='summarized'; otherwise None
    confidence: float | None = None
@@ -802,12 +583,12 @@ async def get_document_sections(
        await session.execute(
            sql_text(
                """
-                SELECT chunk_id, section_title, heading_path, level, node_type, is_leaf, parent_id, char_start,
-                       text, section_type, summary, confidence
+                SELECT chunk_id, section_title, heading_path, level, node_type, is_leaf,
+                       section_type, summary, confidence
                FROM (
                  SELECT DISTINCT ON (c.id)
                         c.id AS chunk_id, c.chunk_index, c.section_title, c.heading_path,
-                         c.level, c.node_type, c.is_leaf, c.parent_id, c.char_start, c.text,
+                         c.level, c.node_type, c.is_leaf,
                         a.section_type,
                         CASE WHEN a.status = 'summarized' THEN a.summary ELSE NULL END AS summary,
                         a.confidence
@@ -816,7 +597,7 @@ async def get_document_sections(
                         ON a.chunk_id = c.id AND a.status = 'summarized'
                  WHERE c.doc_id = :doc_id
                    AND c.source_type = 'hier_section'
-                    AND (c.is_leaf = true OR c.node_type LIKE '%\\_split' ESCAPE '\\')
+                    AND c.is_leaf = true
                  ORDER BY c.id, a.created_at DESC, a.id DESC
                ) t
                ORDER BY t.chunk_index
@@ -901,7 +682,6 @@ async def get_document_file(
    session: Annotated[AsyncSession, Depends(get_session)],
    token: str | None = Query(None, description="Bearer token (iframe용)"),
    download: bool = Query(False, description="true면 attachment (브라우저 다운로드)"),
-    range_header: str | None = Header(None, alias="Range"),
    user: User | None = Depends(lambda: None),
 ):
    """Serve the document's original file (Bearer header or ?token= query parameter)"""
@@ -916,16 +696,17 @@ async def get_document_file(
        # Try regular Bearer header authentication
        raise HTTPException(status_code=401, detail="토큰이 필요합니다")

-    doc = await get_live_document(session, doc_id)
+    doc = await session.get(Document, doc_id)
+    if not doc:
+        raise HTTPException(status_code=404, detail="문서를 찾을 수 없습니다")

    # a note (memo) has no physical file
    if not doc.file_path:
        raise HTTPException(status_code=404, detail="파일이 없는 문서입니다 (메모)")

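-    # D-2: physical path resolution is unified in the storage backend. local=FileResponse (automatic Range) /
-    #      remote=ABC.stream(range). The /file URL and body shape are unchanged (non-breaking). The only active backend
-    #      is currently LocalBackend, so there is no behavior change.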
-    backend = get_storage_backend()
+    file_path = Path(settings.nas_mount_path) / doc.file_path
+    if not file_path.exists():
+        raise HTTPException(status_code=404, detail="파일을 찾을 수 없습니다")

    # Media type mapping
    # Includes audio/video mime types for direct HTML5 <audio>/<video> playback. Starlette
@@ -946,7 +727,7 @@ async def get_document_file(
        # Video: direct-play compatible (§3 minimal version)
        ".mp4": "video/mp4", ".webm": "video/webm",
    }
-    suffix = Path(doc.file_path).suffix.lower()
+    suffix = file_path.suffix.lower()
    media_type = media_types.get(suffix, "application/octet-stream")

    # Content-Disposition: attachment when download=true (filename* compatible with Korean names)
@@ -958,40 +739,10 @@ async def get_document_file(
    else:
        disposition = "inline"

-    # Local backend: FileResponse as before (Range handled automatically).
-    if backend.is_local:
-        local = backend.local_path(doc.file_path)
-        if local is None or not Path(local).exists():
-            raise HTTPException(status_code=404, detail="파일을 찾을 수 없습니다")
-        return FileResponse(
-            path=str(local),
-            media_type=media_type,
-            headers={"Content-Disposition": disposition},
-        )
-
-    # Remote backend: Range pass-through via the D-1 ABC. For an unprovisioned backend, stat() raises
-    # StorageNotConfigured → 503 (no silent fallback). Currently unreachable because only LocalBackend is active.
-    try:
-        st = await backend.stat(doc.file_path)
-    except StorageNotConfigured as exc:
-        raise HTTPException(status_code=503, detail=str(exc))
-    if not st.exists:
-        raise HTTPException(status_code=404, detail="파일을 찾을 수 없습니다")
-
-    start, end = _parse_byte_range(range_header, st.size)
-    headers = {"Content-Disposition": disposition, "Accept-Ranges": "bytes"}
-    if start is None:
-        headers["Content-Length"] = str(st.size)
-        status_code = 200
-    else:
-        headers["Content-Range"] = f"bytes {start}-{end}/{st.size}"
-        headers["Content-Length"] = str(end - start + 1)
-        status_code = 206
-    return StreamingResponse(
-        backend.stream(doc.file_path, start=start, end=end),
-        status_code=status_code,
+    return FileResponse(
+        path=str(file_path),
        media_type=media_type,
-        headers=headers,
+        headers={"Content-Disposition": disposition},
    )


@@ -1019,8 +770,10 @@ async def get_document_image_raw(
    if not payload or payload.get("type") != "access":
        raise HTTPException(status_code=401, detail="유효하지 않은 토큰")

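-    # Verify the document exists (blocks the case where only the image_key remains and the doc is gone, plus soft-delete guard)
-    doc = await get_live_document(session, doc_id)
+    # Verify the document exists (blocks the case where only the image_key remains and the doc is gone)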
+    doc = await session.get(Document, doc_id)
+    if doc is None:
+        raise HTTPException(status_code=404, detail="문서를 찾을 수 없습니다")

    img = await session.scalar(
        select(DocumentImage).where(
@@ -1050,7 +803,6 @@ async def get_document_image_raw(
 async def upload_document(
    request: Request,
    file: UploadFile,
-    background_tasks: BackgroundTasks,
    user: Annotated[User, Depends(get_current_user)],
    session: Annotated[AsyncSession, Depends(get_session)],
    doc_purpose: str | None = Form(None, description="business | knowledge"),
@@ -1202,9 +954,6 @@ async def upload_document(
            file_size=written,
            file_type="immutable",
            title=target.stem,
-            # B-1: the uploaded original filename (for the download label). file_path is renamed to _N on collision, so
-            # the original name is kept separately. safe_name = Path(file.filename).name (basename with path traversal removed).
-            original_filename=safe_name,
            source_channel="manual",
            doc_purpose=doc_purpose,
            user_tags=[library_tag] if library_tag else [],
@@ -1215,26 +964,8 @@ async def upload_document(
        )
        session.add(doc)
        await session.flush()
-
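-        # B-1: fill in exact file_hash duplicates (OFF-whitelist=law_monitor excluded). This is not a rejection (409):
-        # the upload is allowed, linked via duplicate_of, and canonical duplicate_count++ (policy of preserving intentional duplicates of laws).
-        # The homelab has low concurrency, so a TOCTOU between simultaneous same-hash uploads is cleaned up by idempotency/B-4 backfill (no lock needed).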
-        canonical = await find_canonical_for_hash(session, fhash, exclude_id=doc.id)
-        if canonical is not None:
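-            # If the original canonical was soft-deleted (deleted_at) and a former member gets promoted, clear that survivor's
-            # stale duplicate_of to avoid the "both member and counter" contradiction (keeps the B-4 invariant). Documents are
-            # soft-delete only, so FK ON DELETE SET NULL never fires and the residue remains (found in review).
-            # (Leftover pointers and overcounts on other sibling members that point at the deleted canonical are cleaned up by the nightly
-            #  dedup_reconcile job (B-4, 03:30 KST, idempotent absolute recomputation).)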
-            if canonical.duplicate_of is not None:
-                canonical.duplicate_of = None
-            doc.duplicate_of = canonical.id
-            canonical.duplicate_count = (canonical.duplicate_count or 0) + 1
-
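-        # document + processing_queue are committed in a single transaction so cleanup is atomic.
-        # G2: first stage=presegment (splits bundle PDFs before extract; candidate A validated 2026-06-18).
-        # Non-PDF/single files pass through presegment unchanged → extract. Only bundle PDFs are split into N children (worker-side gating).
-        await enqueue_stage(session, doc.id, "presegment")
+        # document + processing_queue are committed in a single transaction so cleanup is atomic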
+        await enqueue_stage(session, doc.id, "extract")
        await session.commit()
    except Exception:
        # On a DB exception the session is rolled back automatically when the get_session context exits.
@@ -1242,9 +973,6 @@ async def upload_document(
        target.unlink(missing_ok=True)
        raise

-    # B-3: the near_duplicate scan runs asynchronously after upload and does not block the 201 response (non-gating record).
-    background_tasks.add_task(_near_dup_scan_bg, doc.id)
-
    return DocumentResponse.model_validate(doc)


@@ -1277,14 +1005,6 @@ async def update_document(
        if val is not None and val not in ("business", "knowledge"):
            raise HTTPException(status_code=400, detail="doc_purpose는 business 또는 knowledge만 가능")

-    # edit_url SSRF guard (2026-06-20 M1): blocks follow-up fetches to internal/metadata addresses (same validation as news.py)
-    if update_data.get("edit_url"):
-        from core.url_validator import validate_feed_url
-        try:
-            await asyncio.to_thread(validate_feed_url, update_data["edit_url"])
-        except Exception as e:
-            raise HTTPException(status_code=400, detail=f"edit_url 검증 실패: {e}")
-
    for field, value in update_data.items():
        setattr(doc, field, value)
    doc.updated_at = datetime.now(timezone.utc)
@@ -1348,49 +1068,11 @@ async def accept_suggestion(
    # apply payload
    proposed_category = doc.ai_suggestion.get("proposed_category")
    proposed_path = doc.ai_suggestion.get("proposed_path")
-    # Safety library A-2: material_type suggestion (deterministic mapping from classify's document_type)
-    proposed_material = doc.ai_suggestion.get("proposed_material_type")

-    if not proposed_category and not proposed_material:
-        raise HTTPException(
-            status_code=422,
-            detail="proposed_category/proposed_material_type 둘 다 누락된 suggestion",
-        )
+    if not proposed_category:
+        raise HTTPException(status_code=422, detail="proposed_category 누락된 suggestion")

-    if proposed_category:
-        doc.category = proposed_category
-
-    if proposed_material:
-        _MATERIAL_TYPES = {"law", "paper", "book", "incident", "manual", "standard", "guide"}
-        _JURISDICTIONS = {"KR", "US", "EU", "JP", "GB", "INT"}
-        if proposed_material not in _MATERIAL_TYPES:
-            raise HTTPException(
-                status_code=422, detail=f"허용 밖 material_type: {proposed_material}"
-            )
-        jur = body.jurisdiction or doc.ai_suggestion.get("proposed_jurisdiction")
-        if jur is not None and jur not in _JURISDICTIONS:
-            raise HTTPException(status_code=422, detail=f"허용 밖 jurisdiction: {jur}")
-        # law = jurisdiction is required, no default (plan A-2: auto-assigning KR would mislabel foreign statutes.
-        # The DB CHECK (chk_documents_law_jurisdiction) also rejects it, but a 422 gives an explicit message).
-        if proposed_material == "law" and not jur:
-            raise HTTPException(
-                status_code=422,
-                detail="법령(law) 승인은 jurisdiction 필수 — body.jurisdiction 으로 국가를 지정하세요 (기본값 없음)",
-            )
-        doc.material_type = proposed_material
-        doc.jurisdiction = jur
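-        # Mirror sync in one statement: when jurisdiction is set or corrected, UPDATE the chunks' country too
-        # (prevents country mismatch between legs, per the plan A-2 contract. Single point = this approval path).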
-        if jur:
-            from sqlalchemy import update as sa_update
-
-            from models.chunk import DocumentChunk
-
-            await session.execute(
-                sa_update(DocumentChunk)
-                .where(DocumentChunk.doc_id == doc.id)
-                .values(country=jur)
-            )
+    doc.category = proposed_category

    # append to user_tags (avoids duplicates; goes through normalize + dedup)
    if proposed_path:
@@ -1441,8 +1123,9 @@ async def save_document_content(
    body: dict = None,
 ):
    """Save the original Markdown file and refresh extracted_text"""
-    # block writes to soft-deleted documents (R7: prevents resurrecting deleted documents / rewriting to the NAS)
-    doc = await get_live_document(session, doc_id)
+    doc = await session.get(Document, doc_id)
+    if not doc:
+        raise HTTPException(status_code=404, detail="문서를 찾을 수 없습니다")

    if doc.file_format not in ("md", "txt"):
        raise HTTPException(status_code=400, detail="편집 가능한 포맷이 아닙니다 (md, txt만 가능)")
@@ -1482,7 +1165,9 @@ async def get_document_preview(
    else:
        raise HTTPException(status_code=401, detail="토큰이 필요합니다")

-    doc = await get_live_document(session, doc_id)
+    doc = await session.get(Document, doc_id)
+    if not doc:
+        raise HTTPException(status_code=404, detail="문서를 찾을 수 없습니다")

    preview_path = Path(settings.nas_mount_path) / "PKM" / ".preview" / f"{doc_id}.pdf"
    if not preview_path.exists():
@@ -1508,24 +1193,18 @@ async def delete_document(
    doc_id: int,
    user: Annotated[User, Depends(get_current_user)],
    session: Annotated[AsyncSession, Depends(get_session)],
-    delete_file: bool = Query(False, description="NAS 원본도 삭제 (grace 후 retention sweep 이 물리삭제)"),
+    delete_file: bool = Query(False, description="NAS 파일도 함께 삭제"),
 ):
-    """Delete a document. Default: soft-delete (hidden, file kept). delete_file=true: schedule a purge (R7)."""
-    doc = await get_live_document(session, doc_id)
+    """Delete a document (default: delete from the DB only, keep the file)"""
+    doc = await session.get(Document, doc_id)
+    if not doc:
+        raise HTTPException(status_code=404, detail="문서를 찾을 수 없습니다")

-    # soft-delete (hide). With delete_file=true, additionally set the purge_requested_at marker:
-    # the retention sweep cron (document_purge_sweep) physically deletes the NAS original after the 30-day grace period
-    # and writes an audit log. ★A plain hide (delete_file=false) keeps the file, so undelete remains possible. The sweep keys on
-    # purge_requested_at, not deleted_at, so a plain hide is never permanently deleted.
-    now = datetime.now(timezone.utc)
-    doc.deleted_at = now
-    if delete_file:
-        doc.purge_requested_at = now
+    # soft-delete (the physical file is cleaned up later by a cleanup job)
+    doc.deleted_at = datetime.now(timezone.utc)
    await session.commit()

-    if delete_file:
-        return {"message": f"문서 {doc_id} 삭제 — NAS 원본은 30일 후 정리 예약"}
-    return {"message": f"문서 {doc_id} soft-delete 완료 (파일 보존)"}
+    return {"message": f"문서 {doc_id} soft-delete 완료"}


@router.get("/{doc_id}/content")
@@ -1535,7 +1214,9 @@ async def get_document_content(
    session: Annotated[AsyncSession, Depends(get_session)],
 ):
    """Return the document's full text (for service-to-service calls)."""
-    doc = await get_live_document(session, doc_id)
+    doc = await session.get(Document, doc_id)
+    if not doc:
+        raise HTTPException(status_code=404, detail="문서를 찾을 수 없습니다")

    raw_text = doc.extracted_text or ""
    content = raw_text[:15000]
@@ -1565,7 +1246,7 @@ ANALYZE_PROMPT = (
 )

 ANALYZE_TEXT_LIMIT = 12000  # chars (15000 → 12000; timeouts were frequent in practice)
-ANALYZE_TIMEOUT_S = settings.llm_call_timeout_s  # 2026-06-20 single source of truth in config (formerly 60s = fast Gemma)
+ANALYZE_TIMEOUT_S = 60  # 15,000-char input + 4-layer output. Measured 7-45 s; includes safety margin
 ANALYZE_CACHE_TTL_S = 1800  # 30 minutes
 ANALYZE_CACHE_MAXSIZE = 100
 ANALYZE_LAYER_MIN_CHARS = 50  # below this the layer is treated as forced filler and removed
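@@ -1,322 +0,0 @@
-"""Eid chat surface: POST /api/eid/chat (eid-chat track).
-
-Confirmed decisions:
-  - D-1 path = /api/eid/chat (main.py prefix=/api/eid + this router's POST /chat)
-  - D-2 mode is a closed vocabulary: daily / deep, both on mac-mini-default (MacBook plan scrapped 2026-06-11;
-    single host = Mac mini Qwen 27B. deep = the ReAct auto-search mode). The client sends only mode;
-    claude-cloud / auto are forbidden (rejected with 422 via Literal). Gate = applied automatically per alias (ungated path removed).
-  - D-3 standalone /chat route (frontend); this module is the backend API only.
-  - D-5 LLM calls go through EidAIClient.call_stream only (Eid egress containment invariant #5,
-    no direct RouterBackend calls).
-  - D-6 missing rules.md = 503 substrate_degraded, fail-closed. Unlike the degraded-banner
-    convention of other surfaces (compose._rules), chat refuses to proceed at all.
-
-Response = line-by-line relay of the router SSE (text/event-stream; call_stream replaces the model field with the mode
-vocabulary and strips usage, preserving framing. This module relays unmodified). A backend failure before the
-stream starts maps to a 503 + error_reason with the same shape as /api/search/ask (no automatic fallback).
-Logging is one metadata line (mode, turn count, status) only; conversation content is never logged.
-"""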
-
-from __future__ import annotations
-
-import asyncio
-import json
-from collections.abc import AsyncIterator
-from typing import Annotated, Literal
-
-import httpx
-from fastapi import APIRouter, Depends
-from fastapi.responses import JSONResponse, StreamingResponse
-from pydantic import BaseModel, Field, field_validator, model_validator
-from sqlalchemy.ext.asyncio import AsyncSession
-
-from core.auth import get_current_user
-from core.database import get_session
-from core.utils import setup_logger
-from eid import compose as eid_compose
-from eid.ai import EidAIClient
-from models.user import User
-from services.llm.backends import BackendUnavailable, _router_url, get_backend
-from services.search import llm_gate
-from services.search.react_loop import agentic_ask_loop
-
-logger = setup_logger("eid_chat")
-
-router = APIRouter()
-
-# ── ds-eid-ask-absorb P1: deep mode = ReAct auto-search (Mac mini Qwen 27B, since 2026-06-11) ──
-# Non-generating reachability probe: only checks that the router is reachable (coarse). If the 27B (MacBook) itself is unavailable,
-# the BackendUnavailable from the first generate_with_tools call is covered by the mid-stream error envelope
-# (plan: probe precision is unnecessary; the in-stream error handles TOCTOU). ~2s timeout, does not occupy a generation slot.
-_DEEP_PROBE_TIMEOUT = httpx.Timeout(connect=2.0, read=2.0, write=2.0, pool=2.0)
-# heartbeat: multiple ReAct tool calls can mean tens of seconds without output, so this keeps the proxy idle timeout from firing.
-# `{"phase":"ping"}` no-op event (the frontend envelope parser skips it naturally; a `: ping` comment is
-# not guaranteed to be handled by the POST SSE fetch parser).
-_HEARTBEAT_INTERVAL_S = 10.0
-
-
-async def _probe_router_reachable() -> bool:
-    """GET router(:8890) /v1/models: reachability check (non-generating). Failure / non-200 = unavailable."""
-    url = f"{_router_url().rstrip('/')}/v1/models"
-    try:
-        async with httpx.AsyncClient(timeout=_DEEP_PROBE_TIMEOUT) as client:
-            resp = await client.get(url)
-            return resp.status_code == 200
-    except Exception:
-        return False
-
-
-def _sse(obj: dict) -> bytes:
-    """One SSE event: data: <json>\\n\\n. final_answer goes out as OpenAI-compatible choices.delta.content,
-    sources/phase as separate envelope keys (the frontend branches on them). No model/usage machine metadata."""
-    return b"data: " + json.dumps(obj, ensure_ascii=False).encode("utf-8") + b"\n\n"
-
-
-class ChatMessage(BaseModel):
-    """One chat turn. role=system is outside the Literal → 422 (the combined system prompt is injected only by the server-side compose)."""
-
-    role: Literal["user", "assistant"]
-    content: str = Field(min_length=1, max_length=8000)
-
-
-# Total conversation cap (sum of all message content): an overall ceiling separate from the per-message 8000-char and 40-turn limits
-_TOTAL_CONTENT_CAP = 32000
-
-
-class ChatRequest(BaseModel):
-    """POST /api/eid/chat body. mode is a closed vocabulary (D-2); messages is 1-40 turns with a 32000-char total cap."""
-
-    mode: Literal["daily", "deep"]
-    messages: list[ChatMessage] = Field(min_length=1, max_length=40)
-
-    @field_validator("messages")
-    @classmethod
-    def _last_turn_is_user(cls, v: list[ChatMessage]) -> list[ChatMessage]:
-        if v and v[-1].role != "user":
-            raise ValueError("마지막 메시지는 role=user 여야 합니다")
-        return v
-
-    @model_validator(mode="after")
-    def _total_content_cap(self) -> "ChatRequest":
-        if sum(len(m.content) for m in self.messages) > _TOTAL_CONTENT_CAP:
-            raise ValueError(
-                "대화 총량 초과 — 새 대화로 시작하거나 입력을 줄여주세요 "
-                f"(전체 메시지 합 {_TOTAL_CONTENT_CAP}자 제한)"
-            )
-        return self
-
-
-@router.get("/status")
-async def eid_status(
-    user: Annotated[User, Depends(get_current_user)],
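-):
-    """Snapshot of Eid backend occupancy: GET /api/eid/status (lets the UI tell "waiting" from "broken").
-
-    Only sees llm_gate occupancy inside the DS process for daily (Mac mini MLX); endpoint occupancy by
-    external consumers (the Mac mini's own derived-worker, Hermes, etc.) is not captured.
-    So busy=true is certain (there is a queue right now) and false is approximate (external occupancy may remain).
-
-    Guaranteed lightweight: 0 DB / 0 LLM / 0 content logging, so it is safe to poll.
-    Not to be used as a basis for automatic fallback (policy: mode switches happen only via an explicit button).
-    """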
-    snap = llm_gate.gate_status()
-    inflight = bool(snap["inflight"])
-    waiters = int(snap["waiters"])
-    return {
-        "daily": {
-            "busy": inflight or waiters > 0,
-            "inflight": inflight,
-            "waiters": waiters,
-        }
-    }
-
-
-def _backend_unavailable_response(body: ChatRequest, reason: str, backend_name: str) -> JSONResponse:
-    """27B unavailable before the stream starts → 503 with the same shape as the ask convention (no automatic fallback)."""
-    logger.warning(
-        "eid_chat backend_unavailable mode=%s turns=%d status=503 reason=%s",
-        body.mode, len(body.messages), reason,
-    )
-    return JSONResponse(
-        status_code=503,
-        content={
-            "error": "backend_unavailable",
-            "error_reason": reason,
-            "backend_requested": backend_name,
-            "detail": (
-                "심층 엔진(검색)이 일시적으로 응답할 수 없습니다. "
-                "잠시 후 다시 시도하거나 일상 모드로 물어보세요."
-            ),
-        },
-    )
-
-
-async def _eid_chat_deep(body: ChatRequest, session: AsyncSession) -> StreamingResponse | JSONResponse:
-    """deep mode = ReAct auto-search. With ReAct (`tool_choice=auto`) the LLM decides on its own whether to search;
-    questions that need no search early-exit with a conversational answer. The substrate (persona+rules+react_ask task) is
-    injected by compose("react_ask") inside agentic_ask_loop (evidence-first is inherited automatically).
-
-    Multi-turn: stage 1 handles only the last user message (agentic_ask_loop takes query: str, with no history
-    support). Resolving pronouns in follow-up questions is stage 2 backlog.
-    """
-    # ① Non-generating probe before the first SSE byte (= HTTP 200 committed): 503 if the router is unreachable (still remappable here)
-    if not await _probe_router_reachable():
-        return _backend_unavailable_response(body, "router_unreachable", "mac-mini-default")
-
-    query = body.messages[-1].content  # handle one message only (the last user turn)
-    backend = get_backend("mac-mini-default")
-
-    async def _stream() -> AsyncIterator[bytes]:
-        # ② Emitting phase:searching commits HTTP 200. Unavailability after this cannot be a 503 → in-stream error.
-        yield _sse({"phase": "searching"})
-        task = asyncio.create_task(agentic_ask_loop(session, query, backend=backend))
-        try:
-            # heartbeat: while the task is pending, ping every ~10s (shield keeps the wait_for cancellation from killing the task)
-            while not task.done():
-                try:
-                    await asyncio.wait_for(asyncio.shield(task), timeout=_HEARTBEAT_INTERVAL_S)
-                except asyncio.TimeoutError:
-                    yield _sse({"phase": "ping"})
-            result = task.result()  # BackendUnavailable is raised here (mid-stream)
-            # final_answer = one OpenAI-compatible chunk (reuses the frontend's existing content accumulation path)
-            yield _sse({"choices": [{"delta": {"content": result.final_answer}}]})
-            # Evidence = separate envelope (no citation numbers; the frontend relies on order). partial = insufficient-evidence marker
-            yield _sse({"eid_sources": result.sources, "partial": result.partial})
-            yield b"data: [DONE]\n\n"
-            logger.info(
-                "eid_chat deep ok turns=%d sources=%d partial=%s iters=%d",
-                len(body.messages), len(result.sources), result.partial, result.iterations,
-            )
-        except BackendUnavailable as exc:
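-            # Mid-stream unavailability (AC unplugged or lid closed during a search): 200 already sent, so use an in-stream error envelope.
-            # [DONE] after the error = the frontend's sawDone avoids a false 'aborted' alarm (the explicit error notice stays).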
-            logger.warning(
-                "eid_chat deep mid-stream unavailable turns=%d reason=%s",
-                len(body.messages), exc.reason,
-            )
-            yield _sse({"phase": "error", "error_reason": exc.reason})
-            yield b"data: [DONE]\n\n"
-        except asyncio.CancelledError:
-            raise  # client disconnect; finally cleans up the task
-        except Exception:
-            logger.exception("eid_chat deep stream failed turns=%d", len(body.messages))
-            yield _sse({"phase": "error", "error_reason": "deep_failed"})
-            yield b"data: [DONE]\n\n"
-        finally:
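-            # Keep the ReAct task from being orphaned on client disconnect: cancel + await (ensures propagation completes).
-            # Otherwise the 27B stays busy for minutes on a closed connection and, given router concurrency, the next search waits.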
-            if not task.done():
-                task.cancel()
-            try:
-                await task
-            except (asyncio.CancelledError, Exception):
-                pass
-
-    return StreamingResponse(
-        _stream(),
-        media_type="text/event-stream",
-        headers={"Cache-Control": "no-store", "X-Accel-Buffering": "no"},
-    )
-
-
-@router.post("/chat")
-async def eid_chat(
-    body: ChatRequest,
-    user: Annotated[User, Depends(get_current_user)],
-    session: Annotated[AsyncSession, Depends(get_session)],
-):
-    """Eid chat: daily = router SSE pass-through (conversation) / deep = ReAct auto-search (evidence).
-
-    503 paths (none with automatic fallback):
-      - substrate_degraded: rules.md missing (D-6 fail-closed, chat refuses to proceed)
-      - backend_unavailable: backend failure before the stream starts (daily and deep alike, ask-convention shape)
-    """
-    # D-6: missing rules = fail-closed. Chat does not proceed without the safety/policy guard (no banner).
-    if not eid_compose.rules_present():
-        logger.error(
-            "eid_chat substrate_degraded mode=%s turns=%d status=503 — rules.md 부재, 채팅 거부",
-            body.mode, len(body.messages),
-        )
-        return JSONResponse(
-            status_code=503,
-            content={
-                "detail": (
-                    "이드 substrate 가 degraded 상태입니다 (운영 규칙 rules.md 부재). "
-                    "복구 전까지 채팅을 진행하지 않습니다."
-                ),
-                "error_reason": "substrate_degraded",
-            },
-        )
-
-    # deep = ReAct auto-search (separate flow: probe + synchronous ReAct → converted to SSE)
-    if body.mode == "deep":
-        return await _eid_chat_deep(body, session)
-
-    # daily = plain conversation SSE pass-through (existing behavior)
-    system = eid_compose.compose("eid_chat", task="")
-    client = EidAIClient()
-    stream = client.call_stream(
-        body.mode, [m.model_dump() for m in body.messages], system,
-    )
-
-    # An async generator sends the real request only on its first __anext__, so the first chunk is pulled here
-    # to map pre-stream failures (connection/4xx/5xx) to a 503.
-    try:
-        first = await anext(stream, None)
-    except BackendUnavailable as exc:
-        logger.warning(
-            "eid_chat backend_unavailable mode=%s turns=%d status=503 reason=%s",
-            body.mode, len(body.messages), exc.reason,
-        )
-        await client.close()
-        return JSONResponse(
-            status_code=503,
-            content={
-                "error": "backend_unavailable",
-                "error_reason": exc.reason,
-                "backend_requested": exc.backend_name,
-                "detail": (
-                    "선택한 모드의 backend 가 일시적으로 응답할 수 없습니다. "
-                    "잠시 후 다시 시도하거나 mode 를 바꿔 호출하세요."
-                ),
-            },
-        )
-    except BaseException:
-        await client.close()
-        raise
-
-    # One metadata log line, no content logging (conversation content is never recorded anywhere)
-    logger.info(
-        "eid_chat stream mode=%s turns=%d status=200", body.mode, len(body.messages)
-    )
-
-    async def _passthrough():
-        # Relay what call_stream emits unmodified (sanitizing happens in one place, per line, in call_stream). On any exit,
-        # including cancellation and disconnect, finally calls aclose on the generator → AsyncExitStack cleans up upstream.
-        try:
-            try:
-                if first is not None:
-                    yield first
-                async for chunk in stream:
-                    yield chunk
-            except (BackendUnavailable, httpx.HTTPError) as exc:
-                # Cut off after the stream started: status 200 was already sent and cannot be remapped. Log one
-                # metadata line and end quietly (no traceback propagation); the frontend handles it via the missing [DONE].
-                logger.warning(
-                    "eid_chat stream aborted mode=%s turns=%d reason=%s",
-                    body.mode, len(body.messages),
-                    getattr(exc, "reason", type(exc).__name__),
-                )
-                return
-        finally:
-            # client.close() is guaranteed even if stream.aclose() raises (nested finally)
-            try:
-                await stream.aclose()
-            finally:
-                await client.close()
-
-    return StreamingResponse(
-        _passthrough(),
-        media_type="text/event-stream",
-        headers={"Cache-Control": "no-store", "X-Accel-Buffering": "no"},
-    )
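Review note: the `_stream` generator removed above kept the SSE connection alive by emitting a `{"phase":"ping"}` event whenever the long-running task stayed silent past the heartbeat interval. Below is a minimal standalone sketch of that pattern using only the standard library. `slow_job`, `stream_with_heartbeat`, and `collect` are hypothetical names for illustration; `slow_job` stands in for `agentic_ask_loop`, and the error envelopes of the original are omitted.

```python
import asyncio
import json
from collections.abc import AsyncIterator


def sse(obj: dict) -> bytes:
    # Frame one SSE event: "data: <json>" followed by a blank line.
    return b"data: " + json.dumps(obj, ensure_ascii=False).encode("utf-8") + b"\n\n"


async def slow_job(delay: float) -> str:
    # Hypothetical stand-in for the long-running ReAct loop.
    await asyncio.sleep(delay)
    return "answer"


async def stream_with_heartbeat(delay: float, interval: float) -> AsyncIterator[bytes]:
    task = asyncio.create_task(slow_job(delay))
    try:
        while not task.done():
            try:
                # shield() keeps the wait_for timeout from cancelling the task itself.
                await asyncio.wait_for(asyncio.shield(task), timeout=interval)
            except asyncio.TimeoutError:
                yield sse({"phase": "ping"})
        yield sse({"choices": [{"delta": {"content": task.result()}}]})
        yield b"data: [DONE]\n\n"
    finally:
        # If the consumer goes away early, cancel the task so it is not orphaned.
        if not task.done():
            task.cancel()
        try:
            await task
        except (asyncio.CancelledError, Exception):
            pass


async def collect(delay: float, interval: float) -> list[bytes]:
    return [chunk async for chunk in stream_with_heartbeat(delay, interval)]
```

Running `asyncio.run(collect(0.25, 0.1))` should yield one or more ping events, then the answer chunk, then `[DONE]`; the exact number of pings depends on scheduler timing.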
@@ -21,7 +21,7 @@ from zoneinfo import ZoneInfo

 from fastapi import APIRouter, Body, Depends, HTTPException, Query
 from pydantic import BaseModel, Field
-from sqlalchemy import and_, func, or_, select
+from sqlalchemy import and_, or_, select
 from sqlalchemy.ext.asyncio import AsyncSession

 from core.auth import get_current_user
@@ -388,10 +388,10 @@ async def list_events(
        )

    base = select(Event).where(and_(*where))
-    # R10: push COUNT down to the DB instead of loading every ID and calling len() (removes memory/transfer cost linear in row count).
-    total = (
-        await session.execute(select(func.count(Event.id)).where(and_(*where)))
-    ).scalar() or 0
+    total_q = await session.execute(
+        select(Event.id).where(and_(*where))
+    )
+    total = len(total_q.scalars().all())

    rows = await session.execute(
        base.order_by(Event.created_at.desc())
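Review note: this hunk swaps a database-side `COUNT` for loading every matching id and calling `len()`. The difference can be sketched with the standard-library `sqlite3` module. The `events` table, its `kind` column, and both function names are assumptions made for illustration, not the project's schema or its SQLAlchemy code.

```python
import sqlite3

# In-memory stand-in table (hypothetical schema).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, kind TEXT)")
conn.executemany(
    "INSERT INTO events (kind) VALUES (?)",
    [("a",)] * 3 + [("b",)] * 2,
)


def total_by_len(kind: str) -> int:
    # Loads every matching id into Python, then counts them:
    # memory and transfer cost grow with the number of rows.
    rows = conn.execute("SELECT id FROM events WHERE kind = ?", (kind,)).fetchall()
    return len(rows)


def total_by_count(kind: str) -> int:
    # Lets the database count; exactly one row comes back.
    (n,) = conn.execute(
        "SELECT COUNT(id) FROM events WHERE kind = ?", (kind,)
    ).fetchone()
    return n
```

Both functions return the same total; only the amount of data moved out of the database differs.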
@@ -1,230 +0,0 @@
-"""Viewer write-back ingest (study-to-viewer P2): feeds the viewer's local quiz sessions into DS and replays finalize.
-
-Flow (plan study-to-viewer-slice1 P2, r2/r3 invariants):
-  viewer outbox → POST /ingest/study/attempts (Bearer VIEWER_SYNC_TOKEN, study_ingest_enabled gate)
-  → resolve pub_id→published.source_id→StudyQuestion (graceful skip when missing) → principal=question.user_id
-  → group by topic (a viewer subject quiz can span several DS topics) → per topic, create a DS quiz_session
-     (source='viewer', client_session_uuid) + attempts (derive_outcome = single source of grading) + mark the session done
-  → replay finalize_session **unmodified** (SR/pattern/progress + 4-A/4-B enqueue) → finalized_at marker
-  → everything in one (atomic) transaction, then commit.
-
-Idempotency (r2 P2-2): if a session already exists for client_session_uuid it was already ingested → return the cached summary (no re-run).
-  Because it is one atomic tx, 'uuid exists ⟺ finalize done' → at-least-once outbox resends never advance SR twice.
-No user_id literals (r2): principal = owner of the resolved questions (single owner; rejected if mixed).
-"""
-
-from __future__ import annotations
-
-import hmac
-import logging
-from collections import defaultdict
-from datetime import datetime, timezone
-
-from fastapi import APIRouter, Depends, Header, HTTPException
-from pydantic import BaseModel
-from sqlalchemy import select
-from sqlalchemy.ext.asyncio import AsyncSession
-
-from core.config import settings
-from core.database import async_session
-from models.published import Published
-from models.study_question import StudyQuestion, StudyQuestionAttempt
-from models.study_quiz_session import StudyQuizSession
-from services.study.outcome import derive_outcome
-from services.study.publish_projection import KIND_QUESTION
-from services.study.session_finalize import finalize_session
-
-logger = logging.getLogger(__name__)
-router = APIRouter()
-
-
-def _verify_token(authorization: str | None = Header(default=None)) -> None:
-    """Bearer for the viewer↔DS publish channel (same token as read; r3 accepts a single token). Default-deny (unset = 503)."""
-    if not settings.viewer_sync_token:
-        raise HTTPException(status_code=503, detail="viewer_sync_token not configured")
-    if not authorization or not authorization.lower().startswith("bearer "):
-        raise HTTPException(status_code=401, detail="missing Bearer token")
-    token = authorization[7:].strip()
-    if not hmac.compare_digest(token, settings.viewer_sync_token):
-        raise HTTPException(status_code=403, detail="invalid token")
-
-
-async def _session() -> AsyncSession:
-    async with async_session() as s:
-        yield s
-
-
-class IngestAttempt(BaseModel):
-    question_pub_id: str
-    selected_choice: int | None = None
-    is_unsure: bool = False
-    answered_at: str | None = None  # client (offline) ISO timestamp; future skew is clamped, id is the tiebreaker
-
-
-class IngestBody(BaseModel):
-    client_session_uuid: str
-    attempts: list[IngestAttempt]
-
-
-def _parse_answered_at(s: str | None, now: datetime) -> datetime:
-    if not s:
-        return now
-    try:
-        dt = datetime.fromisoformat(s.replace("Z", "+00:00"))
-        if dt.tzinfo is None:
-            dt = dt.replace(tzinfo=timezone.utc)
-        return min(dt, now)  # clamp future skew to now (guards against a bad client clock)
-    except Exception:
-        return now
-
-
-@router.post("/attempts")
-async def ingest_attempts(
-    body: IngestBody,
-    _auth: None = Depends(_verify_token),
-    session: AsyncSession = Depends(_session),
-):
-    if not settings.study_ingest_enabled:
-        raise HTTPException(status_code=503, detail="study_ingest not enabled")
-    if not body.client_session_uuid or not body.attempts:
-        raise HTTPException(status_code=400, detail="client_session_uuid 와 attempts 필요")
-
-    # Idempotency: was this uuid already ingested (one atomic tx, so existence = done)? If so, return the cached summary (no re-run).
-    existing = (
-        await session.execute(
-            select(StudyQuizSession).where(
-                StudyQuizSession.client_session_uuid == body.client_session_uuid
-            )
-        )
-    ).scalars().all()
-    if existing:
-        return {
-            "status": "already_ingested",
-            "sessions": [
-                {
-                    "topic_id": s.study_topic_id,
-                    "correct": s.correct_count,
-                    "wrong": s.wrong_count,
-                    "unsure": s.unsure_count,
-                }
-                for s in existing
-            ],
-        }
-
-    # Resolve pub_id → source_id (internal question id). Deleted tombstones are excluded.
-    pub_ids = list({a.question_pub_id for a in body.attempts})
-    pub_rows = (
-        await session.execute(
-            select(Published.pub_id, Published.source_id).where(
-                Published.kind == KIND_QUESTION,
-                Published.pub_id.in_(pub_ids),
-                Published.deleted.is_(False),
-            )
-        )
-    ).all()
-    src_by_pubid = {r.pub_id: r.source_id for r in pub_rows}
-
-    # Fetch questions (not deleted). principal = owner (single).
-    source_ids = list(set(src_by_pubid.values()))
-    q_rows = (
-        await session.execute(
-            select(StudyQuestion).where(
-                StudyQuestion.id.in_(source_ids), StudyQuestion.deleted_at.is_(None)
-            )
-        )
-    ).scalars().all()
-    q_by_id = {q.id: q for q in q_rows}
-    owners = {q.user_id for q in q_by_id.values()}
-    if len(owners) > 1:
-        raise HTTPException(status_code=400, detail="여러 사용자 소유 질문 혼재 — 단일 principal 위반")
-    if not owners:
-        raise HTTPException(status_code=404, detail="해소 가능한 질문 없음")
-    user_id = owners.pop()
-
-    now = datetime.now(timezone.utc)
-
-    # Group by topic (attempts that fail to resolve are skipped gracefully). One session per (uuid, topic).
-    by_topic: dict[int, list[tuple[IngestAttempt, StudyQuestion]]] = defaultdict(list)
-    skipped: list[str] = []
-    for a in body.attempts:
-        src = src_by_pubid.get(a.question_pub_id)
-        q = q_by_id.get(src) if src is not None else None
-        if q is None:
-            skipped.append(a.question_pub_id)
-            continue
-        by_topic[q.study_topic_id].append((a, q))
-    if not by_topic:
-        raise HTTPException(status_code=404, detail="해소된 attempt 없음")
-
-    summaries = []
-    for topic_id, items in by_topic.items():
-        qids = [q.id for (_, q) in items]
-        qs = StudyQuizSession(
-            user_id=user_id,
-            study_topic_id=topic_id,
-            question_ids=qids,
-            subject_distribution={},
-            status="done",
-            cursor=len(qids),
-            source="viewer",
-            client_session_uuid=body.client_session_uuid,
-            finished_at=now,
-            created_at=now,
-            updated_at=now,
-        )
-        session.add(qs)
-        await session.flush()  # qs.id
-
-        c = w = u = 0
-        for a, q in items:
-            try:
-                sel, is_corr, outcome = derive_outcome(a.selected_choice, a.is_unsure, q.correct_choice)
-            except ValueError:
-                skipped.append(a.question_pub_id)  # no selection and not unsure = invalid → skip
-                continue
-            if outcome == "correct":
-                c += 1
-            elif outcome == "wrong":
-                w += 1
-            elif outcome == "unsure":
-                u += 1
-            session.add(
-                StudyQuestionAttempt(
-                    user_id=user_id,
-                    study_question_id=q.id,
-                    study_topic_id=topic_id,
-                    selected_choice=sel,
-                    correct_choice=q.correct_choice,
-                    is_correct=is_corr,
-                    outcome=outcome,
-                    quiz_session_id=qs.id,
-                    answered_at=_parse_answered_at(a.answered_at, now),
-                )
-            )
-        qs.correct_count, qs.wrong_count, qs.unsure_count = c, w, u
-        await session.flush()
-
-        # Replay finalize unmodified (progress/SR/pattern + 4-A/4-B enqueue), then set the idempotency marker.
-        summary = await finalize_session(
-            session, user_id=user_id, study_topic_id=topic_id, quiz_session_id=qs.id
-        )
-        qs.finalized_at = now
-        summaries.append(
-            {
-                "topic_id": topic_id,
-                "quiz_session_id": qs.id,
-                "correct": summary.correct,
-                "wrong": summary.wrong,
-                "unsure": summary.unsure,
-                "newly_correct": summary.newly_correct,
-                "relapsed": summary.relapsed,
-                "recovered": summary.recovered,
-            }
-        )
-
-    await session.commit()
-    logger.info(
-        "study_ingest uuid=%s user=%s sessions=%s skipped=%s",
-        body.client_session_uuid, user_id, len(summaries), len(skipped),
-    )
-    return {"status": "ingested", "skipped": skipped, "sessions": summaries}
@@ -6,7 +6,6 @@ Protected by Bearer token (settings.internal_worker_token).
 """
 from __future__ import annotations

-import hmac
 import logging

 from fastapi import APIRouter, Depends, Header, HTTPException, Path, Response, status
@@ -29,10 +28,7 @@ def _verify_token(authorization: str | None = Header(default=None)) -> None:
    if not authorization or not authorization.lower().startswith("bearer "):
        raise HTTPException(status_code=401, detail="missing Bearer token")
    token = authorization[7:].strip()
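-    # Constant-time comparison (R7): a plain != short-circuits at the first mismatch, a timing side channel that lets
-    # an attacker infer bytes from the matching prefix length. This token protects endpoints that include RAG answers, so
-    # compare_digest is used throughout (matches the canonical search.py).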
-    if not hmac.compare_digest(token, settings.internal_worker_token):
+    if token != settings.internal_worker_token:
        raise HTTPException(status_code=403, detail="invalid token")


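Review note: the hunk above replaces `hmac.compare_digest` with a plain `!=`, which the removed comment flags as a timing side channel. A minimal sketch of the constant-time variant, assuming hypothetical helper names `extract_bearer` and `token_matches` (the project's `_verify_token` raises `HTTPException` instead of returning values):

```python
from __future__ import annotations

import hmac


def extract_bearer(authorization: str | None) -> str | None:
    # Return the token from an "Authorization: Bearer <token>" header value,
    # or None when the header is missing or uses another scheme.
    if not authorization or not authorization.lower().startswith("bearer "):
        return None
    return authorization[7:].strip()


def token_matches(presented: str, expected: str) -> bool:
    # compare_digest is designed not to short-circuit at the first mismatching byte.
    # Encoding to bytes avoids the TypeError raised for non-ASCII str inputs.
    return hmac.compare_digest(presented.encode("utf-8"), expected.encode("utf-8"))
```

A caller would reject the request when `extract_bearer` returns `None` (401 in the code above) or when `token_matches` is false (403).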
@@ -9,7 +9,7 @@ from sqlalchemy import func, select
 from sqlalchemy import text as sql_text
 from sqlalchemy.ext.asyncio import AsyncSession

-from core.auth import get_current_user, require_admin
+from core.auth import get_current_user
 from core.database import get_session
 from core.library import LIBRARY_PREFIX, MAX_DEPTH, normalize_library_path
 from models.category import LibraryCategory
@@ -78,7 +78,7 @@ async def list_categories(
@router.post("/categories", response_model=CategoryResponse, status_code=201)
 async def create_category(
    body: CategoryCreate,
-    user: Annotated[User, Depends(require_admin)],
+    user: Annotated[User, Depends(get_current_user)],
    session: Annotated[AsyncSession, Depends(get_session)],
 ):
    """Create a category (ancestors are created automatically)"""
@@ -133,7 +133,7 @@ async def create_category(
@router.patch("/categories", response_model=CategoryResponse)
 async def rename_category(
    body: CategoryRename,
-    user: Annotated[User, Depends(require_admin)],
+    user: Annotated[User, Depends(get_current_user)],
    session: Annotated[AsyncSession, Depends(get_session)],
 ):
    """Rename a category (leaf only, identified by path)"""
@@ -214,7 +214,7 @@ async def rename_category(
@router.delete("/categories", status_code=204)
 async def delete_category(
    path: str = Query(..., description="삭제할 카테고리 경로"),
-    user: Annotated[User, Depends(require_admin)] = None,
+    user: Annotated[User, Depends(get_current_user)] = None,
    session: Annotated[AsyncSession, Depends(get_session)] = None,
 ):
    """Delete a category (leaf only, and only when it has no documents)"""
@@ -410,7 +410,7 @@ async def get_facet_values(
@router.post("/facets", response_model=FacetValueResponse, status_code=201)
 async def add_facet_value(
    body: FacetValueResponse,
-    user: Annotated[User, Depends(require_admin)],
+    user: Annotated[User, Depends(get_current_user)],
    session: Annotated[AsyncSession, Depends(get_session)],
 ):
    """Add a new value to the facet dictionary."""
@@ -473,35 +473,72 @@ async def get_facet_counts(

    result = FacetCountsResponse(company=[], topic=[], year=[], doctype=[])

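-    # R10: deduplicate the four facet blocks. Collect the applied facet filters (only those with a value) and, when aggregating
-    # each axis, apply them through a helper that excludes only that axis itself. Query, self-exclusion, order_by, and value mapping are all identical.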
-    applied: dict = {}
-    if facet_company:
-        applied["company"] = Document.facet_company == facet_company
+    # company counts (apply the other facet filters, excluding its own)
+    q_company = base_query()
    if facet_topic:
-        applied["topic"] = Document.facet_topic == facet_topic
+        q_company = q_company.where(Document.facet_topic == facet_topic)
    if facet_year:
-        applied["year"] = Document.facet_year == facet_year
+        q_company = q_company.where(Document.facet_year == facet_year)
    if facet_doctype:
-        applied["doctype"] = Document.facet_doctype == facet_doctype
+        q_company = q_company.where(Document.facet_doctype == facet_doctype)
+    rows = await session.execute(
+        select(Document.facet_company, func.count())
+        .where(Document.facet_company != None)  # noqa: E711
+        .where(Document.id.in_(q_company.with_only_columns(Document.id).subquery().select()))
+        .group_by(Document.facet_company)
+        .order_by(func.count().desc())
+    )
+    result.company = [FacetCountItem(value=r[0], count=r[1]) for r in rows]

-    async def _facet_count(name, facet_col, order_by, value_fn):
-        q = base_query()
-        for k, cond in applied.items():
-            if k != name:  # skip this facet's own filter (apply only the other axes)
-                q = q.where(cond)
-        rows = await session.execute(
-            select(facet_col, func.count())
-            .where(facet_col != None)  # noqa: E711
-            .where(Document.id.in_(q.with_only_columns(Document.id).subquery().select()))
-            .group_by(facet_col)
-            .order_by(order_by)
-        )
-        return [FacetCountItem(value=value_fn(r[0]), count=r[1]) for r in rows]
+    # topic counts
+    q_topic = base_query()
+    if facet_company:
+        q_topic = q_topic.where(Document.facet_company == facet_company)
+    if facet_year:
+        q_topic = q_topic.where(Document.facet_year == facet_year)
+    if facet_doctype:
+        q_topic = q_topic.where(Document.facet_doctype == facet_doctype)
+    rows = await session.execute(
+        select(Document.facet_topic, func.count())
+        .where(Document.facet_topic != None)  # noqa: E711
+        .where(Document.id.in_(q_topic.with_only_columns(Document.id).subquery().select()))
+        .group_by(Document.facet_topic)
+        .order_by(func.count().desc())
+    )
+    result.topic = [FacetCountItem(value=r[0], count=r[1]) for r in rows]

-    result.company = await _facet_count("company", Document.facet_company, func.count().desc(), lambda v: v)
-    result.topic = await _facet_count("topic", Document.facet_topic, func.count().desc(), lambda v: v)
-    result.year = await _facet_count("year", Document.facet_year, Document.facet_year.desc(), lambda v: str(v))
-    result.doctype = await _facet_count("doctype", Document.facet_doctype, func.count().desc(), lambda v: v)
+    # year counts
+    q_year = base_query()
+    if facet_company:
+        q_year = q_year.where(Document.facet_company == facet_company)
+    if facet_topic:
+        q_year = q_year.where(Document.facet_topic == facet_topic)
+    if facet_doctype:
+        q_year = q_year.where(Document.facet_doctype == facet_doctype)
+    rows = await session.execute(
+        select(Document.facet_year, func.count())
+        .where(Document.facet_year != None)  # noqa: E711
+        .where(Document.id.in_(q_year.with_only_columns(Document.id).subquery().select()))
+        .group_by(Document.facet_year)
+        .order_by(Document.facet_year.desc())
+    )
+    result.year = [FacetCountItem(value=str(r[0]), count=r[1]) for r in rows]
+
+    # doctype counts
+    q_doctype = base_query()
+    if facet_company:
+        q_doctype = q_doctype.where(Document.facet_company == facet_company)
+    if facet_topic:
+        q_doctype = q_doctype.where(Document.facet_topic == facet_topic)
+    if facet_year:
+        q_doctype = q_doctype.where(Document.facet_year == facet_year)
+    rows = await session.execute(
+        select(Document.facet_doctype, func.count())
+        .where(Document.facet_doctype != None)  # noqa: E711
+        .where(Document.id.in_(q_doctype.with_only_columns(Document.id).subquery().select()))
+        .group_by(Document.facet_doctype)
+        .order_by(func.count().desc())
+    )
+    result.doctype = [FacetCountItem(value=r[0], count=r[1]) for r in rows]

    return result
@@ -300,13 +300,9 @@ async def list_memos(
        base = base.where(Document.pinned == pinned)

    if tag:
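-        # Parameter binding (R7): inserting the user tag straight into a JSON array literal with an f-string
-        # lets a " or ] inside the tag break the JSON, causing a 500 and changing the filter's meaning. Pass the tag
-        # as a bind parameter via jsonb_build_array (@> JSONB containment).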
-        tag_arr = func.jsonb_build_array(tag)
        base = base.where(
-            Document.user_tags.op("@>")(tag_arr)
-            | Document.ai_tags.op("@>")(tag_arr)
+            Document.user_tags.op("@>")(f'["{tag}"]')
+            | Document.ai_tags.op("@>")(f'["{tag}"]')
        )

    count_query = select(func.count()).select_from(base.subquery())
@@ -692,57 +688,6 @@ async def dismiss_event_suggestion(
    return _to_memo_response(doc)


-@router.post("/{memo_id}/promote-to-document", status_code=201)
-async def promote_memo_to_document(
-    memo_id: int,
-    user: Annotated[User, Depends(get_current_user)],
-    session: Annotated[AsyncSession, Depends(get_session)],
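-):
-    """Promote one memo to a formal Document in the document library ("Send to library", P1).
-
-    Behavior (in-place conversion: no separate row is created; extracted_text, tags, and history are preserved):
-      - source_channel memo/voice/hermes → 'manual'  (drops out of the memo list and enters the document library)
-      - file_type 'note' → 'editable'  (passes the document-list filter `file_type != 'note'`)
-      - category='library' (the reference library), content_origin='manual'
-      - re-queue classify/embed/chunk → domain reassignment + summary/deep analysis (26B escalate) + embedding/chunk refresh
-    The P2 'draft' worker (follow-up) is planned to turn rough memos into structured Markdown (md_content).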
-    """
-    doc = await session.get(Document, memo_id)
-    if (
-        not doc
-        or doc.deleted_at is not None
-        or doc.source_channel not in ("memo", "voice", "hermes")
-        or doc.file_type != "note"
-    ):
-        raise HTTPException(status_code=404, detail="승격할 메모를 찾을 수 없습니다")
-
-    now = datetime.now(timezone.utc)
-    doc.source_metadata = {
-        **(doc.source_metadata or {}),
-        "promoted_from_memo": True,
-        "promoted_at": now.isoformat(),
-        "original_source_channel": doc.source_channel,
-        # P2: memo_draft_worker picks this up and generates structured Markdown (md_content) with 26B.
-        "needs_draft": True,
-    }
-    doc.source_channel = "manual"
-    doc.file_type = "editable"
-    doc.category = "library"
-    doc.content_origin = "manual"
-    doc.updated_at = now
-
-    # Reprocess in document context: domain reassignment + summary/deep analysis + embedding/chunk refresh.
-    await _enqueue_ai_stages(session, doc.id)
-    await session.commit()
-    await session.refresh(doc)
-
-    return {
-        "document_id": doc.id,
-        "category": doc.category,
-        "message": "문서함으로 보냈습니다. AI 분류·요약·심층분석을 진행합니다.",
-    }
-
-
 # ─── Memo Intake Upgrade PR-2C: voice upload ───


@@ -65,8 +65,7 @@ async def create_source(
 ):
    from core.url_validator import validate_feed_url
    try:
-        # getaddrinfo (DNS) is blocking; run it off-thread so it does not hog the event loop (R5)
-        await asyncio.to_thread(validate_feed_url, body.feed_url)
+        validate_feed_url(body.feed_url)
    except ValueError as e:
        raise HTTPException(status_code=422, detail=f"feed_url 검증 실패: {e}")
    source = NewsSource(**body.model_dump())
@@ -195,17 +194,10 @@ async def trigger_collect(
    if _collect_lock.locked():
        raise HTTPException(status_code=429, detail="수집이 이미 진행 중입니다")

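-    # Remove the TOCTOU (R9): previously the actual acquire happened inside a separate task after the locked() check, so
-    # another request could slip in between and create a duplicate collection task. Acquire synchronously in the handler (an uncontended
-    # Lock.acquire completes immediately without yielding to the event loop) and release in the task's finally.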
-    await _collect_lock.acquire()
-
    async def _run_with_lock():
-        try:
+        async with _collect_lock:
            from workers.news_collector import run
            await run()
-        finally:
-            _collect_lock.release()

    asyncio.create_task(_run_with_lock())
    return {"message": "뉴스 수집 시작됨"}
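@@ -1,254 +0,0 @@
-"""Publish read API (docsrv-viewer-publish P0-2): the feed the viewer pulls via pull-sync.
-
-Returns the published table (the publish worker assigns rev gaplessly in commit order), paginated by a rev cursor.
-Viewer = Bearer(settings.viewer_sync_token) auth, default-deny. Read-only (SELECT only).
-  GET /published/feed?since={rev}&kind={kind}&limit={n}
-    Rows with rev > since, in rev ASC order, up to limit. Optional kind (study_question|study_explanation|... to follow).
-    Tombstones (deleted=true) are included as first-class events: the viewer deletes locally by pub_id (avoids stale data).
-
-rev cursor safety: the worker, as the single writer under pg_advisory_xact_lock, assigns and commits a batch's revs in one
-transaction → a reader never sees rev N without N-1 (zero partial visibility). The viewer iterates with next_since.
-
-Envelope schema_version = transport contract version (separate from the per-row payload schema_version).
-Visibly rejecting unsupported versions is the viewer's job (no-silent-fallback); here the per-row schema_version is passed through as-is.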
-"""
-from __future__ import annotations
-
-import hmac
-import logging
-import logging
-from datetime import datetime, timezone
-
-from fastapi import APIRouter, Depends, Header, HTTPException, Query
-from pydantic import BaseModel
-from sqlalchemy import select, text
-from sqlalchemy.ext.asyncio import AsyncSession
-
-from core.config import settings
-from core.database import async_session
-from models.published import Published
-from models.published import Published
-from services.queue_overview import build_overview
-
-logger = logging.getLogger(__name__)
-
-router = APIRouter()
-
-# Feed envelope (transport contract) version, independent of the payload schema_version.
-FEED_SCHEMA_VERSION = 1
-DEFAULT_LIMIT = 200
-MAX_LIMIT = 500
-
-
-def _verify_token(authorization: str | None = Header(default=None)) -> None:
-    """Bearer auth for the viewer↔DS publish channel. Default-deny (unset = 503). Constant-time comparison (internal_study is the canonical reference).
-
-    This token exposes study payloads that include answers, so hmac.compare_digest blocks the timing side channel.
-    """
-    if not settings.viewer_sync_token:
-        raise HTTPException(status_code=503, detail="viewer_sync_token not configured")
-    if not authorization or not authorization.lower().startswith("bearer "):
-        raise HTTPException(status_code=401, detail="missing Bearer token")
-    token = authorization[7:].strip()
-    if not hmac.compare_digest(token, settings.viewer_sync_token):
-        raise HTTPException(status_code=403, detail="invalid token")
-
-
-async def _session() -> AsyncSession:
-    async with async_session() as s:
-        yield s
-
-
-class FeedItem(BaseModel):
-    pub_id: str          # opaque+stable = viewer dedup key = progress key
-    kind: str
-    source_id: int       # DS-internal source row id (for reverse resolution on ingest write-back, P2)
-    rev: int
-    deleted: bool        # tombstone: triggers local deletion in the viewer
-    schema_version: int  # payload shape version (viewer accepts a range)
-    payload: dict        # render-ready projection ({} for a tombstone)
-
-
-class FeedResponse(BaseModel):
-    schema_version: int       # envelope (transport contract) version
-    items: list[FeedItem]
-    next_since: int           # since for the next call (max rev of this batch; an empty batch keeps the input since)
-    has_more: bool            # batch filled limit = more may exist (viewer repeats)
-
-
-@router.get("/feed", response_model=FeedResponse)
-async def published_feed(
-    since: int = Query(0, ge=0),
-    kind: str | None = Query(None, max_length=40),
-    limit: int = Query(DEFAULT_LIMIT, ge=1, le=MAX_LIMIT),
-    _auth: None = Depends(_verify_token),
-    session: AsyncSession = Depends(_session),
-):
-    """Return rows with rev > since in rev ASC order, up to limit. The viewer pulls incrementally via next_since."""
-    stmt = select(Published).where(Published.rev > since)
-    if kind:
-        stmt = stmt.where(Published.kind == kind)
-    stmt = stmt.order_by(Published.rev.asc()).limit(limit)
-    rows = (await session.execute(stmt)).scalars().all()
-
-    items = [
-        FeedItem(
-            pub_id=r.pub_id,
-            kind=r.kind,
-            source_id=r.source_id,
-            rev=r.rev,
-            deleted=r.deleted,
-            schema_version=r.schema_version,
-            payload=r.payload if r.payload is not None else {},
-        )
-        for r in rows
-    ]
-    next_since = items[-1].rev if items else since
-    has_more = len(rows) == limit
-    logger.info(
-        "published_feed since=%s kind=%s returned=%s next_since=%s has_more=%s",
-        since, kind, len(items), next_since, has_more,
-    )
-    return FeedResponse(
-        schema_version=FEED_SCHEMA_VERSION,
-        items=items,
-        next_since=next_since,
-        has_more=has_more,
-    )
-
-
-# ── P1-1: news/digest publish read API (docsrv-viewer-publish) ────────────────────
-# global_digests (daily container) + digest_topics (N topics, digest_id FK) -> render-ready
-# read-time projection. Parameterized by content type (plan r2): version cursor = global_digests.id
-# (single daily writer, so gapless is unnecessary and gaps are harmless) · pub_id = date-as-id (the feed is admin-gated, so opacity
-# is unnecessary) · no tombstones (digests are never deleted). The envelope matches /feed (FeedResponse), so the viewer reuses it.
-# scaffold-first: DIGEST_PUBLISH_ENABLED off (default) = 503 (explicitly not running, no-silent).
-DIGEST_PAYLOAD_SCHEMA_VERSION = 1
-
-
-@router.get("/digest", response_model=FeedResponse)
-async def published_digest(
-    since: int = Query(0, ge=0),
-    limit: int = Query(DEFAULT_LIMIT, ge=1, le=MAX_LIMIT),
-    _auth: None = Depends(_verify_token),
-    session: AsyncSession = Depends(_session),
-):
-    """Rows with global_digests.id > since in id ASC order, up to limit. Joins topics onto each digest and returns it render-ready."""
-    if not settings.digest_publish_enabled:
-        raise HTTPException(status_code=503, detail="digest publish not enabled (scaffold)")
-
-    drows = (await session.execute(
-        text(
-            "SELECT id, digest_date, status, total_articles, total_topics, total_countries, created_at "
-            "FROM global_digests WHERE id > :since ORDER BY id ASC LIMIT :limit"
-        ),
-        {"since": since, "limit": limit},
-    )).mappings().all()
-
-    if not drows:
-        return FeedResponse(schema_version=FEED_SCHEMA_VERSION, items=[], next_since=since, has_more=False)
-
-    ids = [r["id"] for r in drows]
-    trows = (await session.execute(
-        text(
-            "SELECT digest_id, topic_rank, topic_label, summary, country, article_count, importance_score "
-            "FROM digest_topics WHERE digest_id = ANY(:ids) ORDER BY digest_id ASC, topic_rank ASC"
-        ),
-        {"ids": ids},
-    )).mappings().all()
-
-    topics_by_digest: dict[int, list[dict]] = {}
-    for t in trows:
-        topics_by_digest.setdefault(t["digest_id"], []).append({
-            "rank": t["topic_rank"],
-            "label": t["topic_label"],
-            "summary": t["summary"],
-            "country": t["country"],
-            "article_count": t["article_count"],
-            "importance": t["importance_score"],
-        })
-
-    items = []
-    for r in drows:
-        d_date = r["digest_date"].isoformat() if r["digest_date"] else None
-        items.append(FeedItem(
-            pub_id=f"digest:{d_date}",
-            kind="digest",
-            source_id=r["id"],
-            rev=r["id"],
-            deleted=False,
-            schema_version=DIGEST_PAYLOAD_SCHEMA_VERSION,
-            payload={
-                "digest_date": d_date,
-                "status": r["status"],
-                "total_articles": r["total_articles"],
-                "total_topics": r["total_topics"],
-                "total_countries": r["total_countries"],
-                "generated_at": r["created_at"].isoformat() if r["created_at"] else None,
-                "topics": topics_by_digest.get(r["id"], []),
-            },
-        ))
-    next_since = items[-1].rev
-    has_more = len(drows) == limit
-    logger.info(
-        "published_digest since=%s returned=%s next_since=%s has_more=%s",
-        since, len(items), next_since, has_more,
-    )
-    return FeedResponse(
-        schema_version=FEED_SCHEMA_VERSION,
-        items=items,
-        next_since=next_since,
-        has_more=has_more,
-    )
-
-
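-# ── P1-2: live processing-status snapshot API (+ P1-4 maintenance flag) ──────────────────────────
-# For the 'document processing status' section of the viewer report. Reuses build_overview (existing service) + a source_health
-# join summary. Pull-through (nothing stored): the numbers are live, so there is no cache, and the consumer (viewer) owns the 2~3s timeout
-# (plan P1-2). P1-4: ships a maintenance flag, so when a soft lock or maintenance stops the workers and the numbers look stalled,
-# the viewer can tell it apart with an 'under maintenance/experiment' banner (surface != data). Read-only.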
-@router.get("/processing-status")
-async def published_processing_status(
-    _auth: None = Depends(_verify_token),
-    session: AsyncSession = Depends(_session),
-):
-    """Processing-status snapshot: queue overview + source_health summary + maintenance flag."""
-    overview = await build_overview(session)
-
-    sh_rows = (await session.execute(text(
-        "SELECT ns.name, ns.category, sh.circuit_state, sh.consecutive_failures, sh.empty_streak, "
-        "sh.last_success_at, sh.last_probe_ok "
-        "FROM source_health sh JOIN news_sources ns ON ns.id = sh.source_id "
-        "ORDER BY (sh.circuit_state <> 'closed') DESC, sh.consecutive_failures DESC"
-    ))).mappings().all()
-
-    by_state: dict[str, int] = {}
-    problems: list[dict] = []
-    for r in sh_rows:
-        st = r["circuit_state"]
-        by_state[st] = by_state.get(st, 0) + 1
-        if st != "closed":
-            problems.append({
-                "name": r["name"],
-                "category": r["category"],
-                "circuit_state": st,
-                "consecutive_failures": r["consecutive_failures"],
-                "empty_streak": r["empty_streak"],
-                "last_success_at": r["last_success_at"].isoformat() if r["last_success_at"] else None,
-                "last_probe_ok": r["last_probe_ok"],
-            })
-
-    return {
-        "schema_version": 1,
-        "generated_at": datetime.now(timezone.utc).isoformat(),
-        "overview": overview,
-        "sources": {
-            "total": len(sh_rows),
-            "by_circuit_state": by_state,
-            "problems": problems,
-        },
-        "maintenance": {
-            "active": settings.maintenance_mode,
-            "note": settings.maintenance_note,
-        },
-    }
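@@ -1,208 +0,0 @@
-"""Processing machine board API: /api/queue/* (plan ds-processing-ui-6an → ds-board-engines-1).
-
- GET /overview: turns the home page's flat stage table into a "machine-view board (who is working)". The aggregation
-  logic lives in services/queue_overview.py (pure decision logic split out). The response schema is a fixed
-  contract with the FE. Raw model names must not appear in the response, only machine labels (engine/model naming is
-  the responsibility of the FE's static map).
- GET /failed + POST /retry|/skip: failure handling (ds-board-engines-1), the only user action path for permanent
-  failures (three automatic retries exhausted). For bulk actions the FE collects the group's
-  id list and sends it (no server-side pattern matching; no raw identifiers/patterns are received).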
-"""
-
-from datetime import datetime
-from typing import Annotated, Literal
-
-from fastapi import APIRouter, Depends
-from pydantic import BaseModel, Field
-from sqlalchemy.ext.asyncio import AsyncSession
-
-from core.auth import get_current_user
-from core.database import get_session
-from models.user import User
-from services.queue_overview import (
-    build_overview,
-    fetch_failed_items,
-    retry_failed,
-    skip_failed,
-)
-
-router = APIRouter()
-
-
-class CurrentItem(BaseModel):
-    """Documents the machine is processing right now (at most 2)."""
-    document_id: int
-    title: str
-    stage: str
-
-
-class MachineCard(BaseModel):
-    """Machine card: sum over attributed stages + completion counts (summarize is split into a pool) + state."""
-    key: Literal["gpu", "macmini", "macbook"]
-    label: str
-    state: Literal["active", "deferred", "idle"]
-    stages: list[str]
-    pending: int
-    processing: int
-    failed: int
-    done_1h: int
-    done_today: int
-    deferred_pending: int
-    current: list[CurrentItem]
-
-
-class SummarizeEta(BaseModel):
-    """summarize pool ETA: eta_minutes is computed only when done > inflow."""
-    pending: int
-    done_rate_1h: int
-    inflow_rate_1h: int
-    eta_minutes: int | None
-
-
-class MachineDone(BaseModel):
-    """summarize completion counts for one machine (for showing the split)."""
-    done_1h: int
-    done_today: int
-
-
-class SummarizeByMachine(BaseModel):
-    """Per-machine split of summarize-pool completions, for visualizing the 'Mac mini vs MacBook'
-    offload in the board lane. Exposes values that rows_to_summarize_split already computed
-    (ds-board-merged A-1, zero new collection SQL)."""
-    macmini: MachineDone
-    macbook: MachineDone
-
-
-class TrendBucket(BaseModel):
-    """summarize 24h trend bucket: hour is a KST "HH:00" label."""
-    hour: str
-    inflow: int
-    done: int
-
-
-class Totals(BaseModel):
-    """Totals across all stages."""
-    pending: int
-    processing: int
-    failed: int
-
-
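-class StageRow(BaseModel):
-    """Per-stage status row, for flow nodes and the detail panel.
-
-    done_1h/created_1h = processing rate and inflow rate (FE inputs for the inflow-dominant check + ETA,
-    added in ds-board-engines-1; exposes values the collection SQL already had).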
-    """
-    stage: str
-    pending: int
-    processing: int
-    failed: int
-    done_1h: int
-    created_1h: int
-    done_today: int
-    oldest_pending_age_sec: int | None
-
-
-class BackgroundJobItem(BaseModel):
-    """Out-of-queue admin script jobs (backfills, etc.): exposes the blind spot processing_queue cannot see.
-    stale = running but the heartbeat has been silent for a long time (process presumed dead)."""
-    id: int
-    kind: str
-    machine: str
-    label: str | None
-    state: Literal["running", "done", "failed"]
-    processed: int
-    total: int | None
-    elapsed_sec: int
-    stale: bool
-    error: str | None
-
-
-class QueueOverviewResponse(BaseModel):
-    machines: list[MachineCard]
-    stages: list[StageRow]
-    summarize_eta: SummarizeEta
-    summarize_by_machine: SummarizeByMachine
-    trend_24h: list[TrendBucket]
-    totals: Totals
-    background_jobs: list[BackgroundJobItem] = []
-
-
-class FailedItem(BaseModel):
-    """Permanently failed row: the display unit of the failure drawer."""
-    id: int
-    stage: str
-    document_id: int
-    title: str
-    attempts: int
-    max_attempts: int
-    error_message: str | None
-    failed_at: datetime | None
-
-
-class FailedListResponse(BaseModel):
-    items: list[FailedItem]
-    total: int
-
-
-class QueueActionRequest(BaseModel):
-    """Retry/skip targets: list of failed row ids (the FE groups them before sending)."""
-    ids: list[int] = Field(min_length=1, max_length=300)
-
-
-class RetryResponse(BaseModel):
-    requested: int
-    retried: int
-    not_retried: int
-
-
-class SkipResponse(BaseModel):
-    requested: int
-    skipped: int
-    not_skipped: int
-
-
-@router.get("/overview", response_model=QueueOverviewResponse)
-async def get_queue_overview(
-    user: Annotated[User, Depends(get_current_user)],
-    session: Annotated[AsyncSession, Depends(get_session)],
-):
-    """Machine-view processing board + summarize ETA aggregation (computed live, zero new tables)"""
-    return QueueOverviewResponse.model_validate(await build_overview(session))
-
-
-@router.get("/failed", response_model=FailedListResponse)
-async def get_failed_items(
-    user: Annotated[User, Depends(get_current_user)],
-    session: Annotated[AsyncSession, Depends(get_session)],
-):
-    """List of permanently failed rows (with document titles, at most 300)"""
-    items = await fetch_failed_items(session)
-    return FailedListResponse(
-        items=[FailedItem.model_validate(i) for i in items],
-        total=len(items),
-    )
-
-
-@router.post("/retry", response_model=RetryResponse)
-async def retry_failed_items(
-    body: QueueActionRequest,
-    user: Annotated[User, Depends(get_current_user)],
-    session: Annotated[AsyncSession, Depends(get_session)],
-):
-    """Retry failed rows: reset attempts + return to pending.
-
-    not_retried = a conflicting active row for the same (document, stage) (uq_queue_active), or
-    a row that is no longer failed (duplicate click, etc.); left untouched, only counted.
-    """
-    return RetryResponse.model_validate(await retry_failed(session, body.ids))
-
-
-@router.post("/skip", response_model=SkipResponse)
-async def skip_failed_items(
-    body: QueueActionRequest,
-    user: Annotated[User, Depends(get_current_user)],
-    session: Annotated[AsyncSession, Depends(get_session)],
-):
-    """Skip failed rows: mark completed (payload.skipped_by_user) + no chaining"""
-    return SkipResponse.model_validate(await skip_failed(session, body.ids))
@@ -12,7 +12,6 @@
 import asyncio
 import hmac
 import time
-from datetime import date
 from typing import Annotated, Literal

 from fastapi import APIRouter, BackgroundTasks, Depends, Header, Query
@@ -32,8 +31,6 @@ from services.search.fusion_service import DEFAULT_FUSION
 from services.search.grounding_check import check as grounding_check
 from services.search.refusal_gate import RefusalDecision, decide as refusal_decide
 from services.search import query_rewriter
-from services.search.retrieval_service import AxisFilter
-from services.search.result_decorate import compute_facets, decorate_version_status
 from services.search.search_pipeline import PipelineResult, run_search
 from services.search.synthesis_service import SynthesisResult, synthesize
 from services.search.verifier_service import VerifierResult, verify
@@ -73,14 +70,6 @@ class SearchResult(BaseModel):
    # PR-RAG-Time-1: freshness decay debug metadata, filled by apply_freshness_decay.
    # Also filled for rows where decay is not applied (freshness_policy=None). base_score is always preserved.
    freshness_debug: dict | None = None
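-    # Safety library C-1: classification axis metadata (filled in the 3-leg SELECT; additive, unrelated to ranking).
-    # Prerequisite for D-1 UI per-type result card rendering + blocking unlabeled mixing of countries once foreign legislation (B-5) goes live.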
-    material_type: str | None = None
-    jurisdiction: str | None = None
-    published_date: date | None = None
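-    # Safety library C-1 follow-up: legal version status (legal_meta.version_status), decorated once in the wrapper.
-    # Filled only for law results (legal_meta satellite); everything else and unmapped laws = None. Prerequisite for the D-1 version badge.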
-    version_status: str | None = None


 # ─── Phase 0.4: debug response schema ─────────────────────────
@@ -112,9 +101,6 @@ class SearchResponse(BaseModel):
    query: str
    mode: str
    debug: SearchDebug | None = None
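-    # Safety library C-1 follow-up: filled only when facets=true (not requested = None, response bytes unchanged).
-    # Distribution labels of the classification axes within the top-K results {axis: {label: count}}.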
-    facets: dict[str, dict[str, int]] | None = None


 def _to_debug_candidates(rows: list[SearchResult], n: int = 20) -> list[DebugCandidate]:
@@ -219,23 +205,9 @@ async def search(
            "분리용. production 검색에는 사용 금지 (latency 큼)."
        ),
    ),
-    material_type: str | None = Query(
-        None, description="안전 자료실 C-1: 자료유형 필터 CSV (law,paper,incident,...). material_type = ANY"),
-    jurisdiction: str | None = Query(
-        None, description="안전 자료실 C-1: 관할 필터 (KR/US/EU/JP/GB/INT)"),
-    year_from: int | None = Query(None, ge=1900, le=2100, description="published_date 연도 하한 (NULL=created_at fallback)"),
-    year_to: int | None = Query(None, ge=1900, le=2100, description="published_date 연도 상한"),
-    facets: bool = Query(False, description="안전 자료실 C-1 후속: top-K 결과 분류 축 분포(material_type/jurisdiction/version_status)를 응답 facets 에 집계. 미지정=계산/노출 0"),
 ):
    """Document search: FTS + ILIKE + vector combined (a run_search wrapper since Phase 3.1)"""
    try:
-        axis = AxisFilter(
-            material_types=[m.strip() for m in material_type.split(",") if m.strip()]
-            if material_type else None,
-            jurisdiction=jurisdiction,
-            year_from=year_from,
-            year_to=year_to,
-        )
        pr = await run_search(
            session,
            q,
@@ -251,7 +223,6 @@ async def search(
            rewrite_backend=rewrite_backend,
            corpus_variant=corpus_variant,
            exact_knn=exact_knn,
-            axis=axis,
        )
    except ValueError as e:
        # _resolve_backend / _resolve_reranker / _resolve_rewrite_backend / _resolve_corpus_variant unknown slug → HTTP 400
@@ -291,7 +262,7 @@ async def search(
            content={
                "error_reason": "unknown_embedding_backend",
                "backend_requested": embedding_backend,
-                "allowed": ["baseline"],
+                "allowed": ["baseline", "cand_me5_large_inst", "cand_snowflake_l_v2"],
                "detail": msg,
            },
        )
@@ -342,17 +313,12 @@ async def search(

    debug_obj = _build_search_debug(pr) if debug else None

-    # Safety library C-1 follow-up: wrapper decoration (search core untouched, unrelated to ranking)
-    await decorate_version_status(session, pr.results)   # add version_status to law results
-    facets_obj = compute_facets(pr.results) if facets else None
-
    return SearchResponse(
        results=pr.results,
        total=len(pr.results),
        query=q,
        mode=pr.mode,
        debug=debug_obj,
-        facets=facets_obj,
    )


@@ -710,9 +676,7 @@ async def ask(
    # Aligned to 30s → stable classifier behavior. Raising the ask response latency ceiling is intentional.
    try:
        classifier_result = await asyncio.wait_for(classifier_task, timeout=30.0)
-    except asyncio.CancelledError:
-        raise  # propagate request cancellation; explicit so the broad except does not swallow it (R3)
-    except Exception:
+    except (asyncio.TimeoutError, Exception):
        classifier_result = ClassifierResult("timeout", None, [], [], 0.0)

    defense_log["classifier"] = {
@@ -874,9 +838,7 @@ async def ask(
        # → same pattern as the classifier (the case where search.py:522 swapped 6s→15s). Aligned to 10s.
        try:
            verifier_result = await asyncio.wait_for(verifier_task, timeout=10.0)
-        except asyncio.CancelledError:
-        raise  # propagate request cancellation; explicit so the broad except does not swallow it (R3)
-        except Exception:
+        except (asyncio.TimeoutError, Exception):
            verifier_result = VerifierResult("timeout", [], 0.0)

    # Merge verifier contradictions → grounding flags (distinguished by prefix, 3 severity levels)
@@ -21,14 +21,12 @@ from sqlalchemy.exc import IntegrityError
 from sqlalchemy.ext.asyncio import AsyncSession

 from core.auth import get_current_user
-from core.config import settings
 from core.database import get_session
 from models.study_memo_card import StudyMemoCard, StudyMemoCardEvidence, record_card_view
 from models.study_memo_card_progress import StudyMemoCardProgress, rate_card
 from models.study_question import StudyQuestion
 from models.user import User
 from services.study.card_normalize import compute_dedup_hash
-from services.study.publish_enqueue import enqueue_card_progress_publish, enqueue_card_publish

 router = APIRouter()

@@ -250,18 +248,9 @@ async def approve_batch(
            StudyMemoCard.needs_review,
        )
        .values(needs_review=False, flagged_by=None, flagged_at=None)
-        .returning(StudyMemoCard.id)
    )
-    approved_ids = list(result.scalars().all())
-    # Publish the cards that were just approved (same tx; no-op when the flag is off). S-2.
-    if settings.study_publish_enabled and approved_ids:
-        cards = (
-            await session.execute(select(StudyMemoCard).where(StudyMemoCard.id.in_(approved_ids)))
-        ).scalars().all()
-        for c in cards:
-            await enqueue_card_publish(session, c)
    await session.commit()
-    return {"approved": len(approved_ids)}
+    return {"approved": result.rowcount or 0}


 # ─── Review (SR) track ───
@@ -321,9 +310,6 @@ async def rate(
    if outcome is None:
        raise HTTPException(status_code=422, detail=f"invalid outcome: {body.outcome!r}")
    progress = await rate_card(session, card=card, outcome=outcome, now=datetime.now(timezone.utc))
-    # Publish the card's SR state (same tx, flag off = no-op) for ALL rows (including sentinel/terminal). S-4.
-    if settings.study_publish_enabled:
-        await enqueue_card_progress_publish(session, progress)
    await session.commit()
    return RateResult(
        card_id=card.id, outcome=outcome, review_stage=progress.review_stage, due_at=progress.due_at
@@ -406,9 +392,6 @@ async def update_card(
            card.flagged_by = None
            card.flagged_at = None

-    # Publish re-projection/tombstone (same tx): approved = publish, returned to pending review = tombstone (state-based). S-2.
-    if settings.study_publish_enabled:
-        await enqueue_card_publish(session, card)
    try:
        await session.commit()
    except IntegrityError:
@@ -431,7 +414,4 @@ async def delete_card(
    card = await session.get(StudyMemoCard, card_id)
    card = _verify_card(card, user)
    card.deleted_at = datetime.now(timezone.utc)
-    # Publish tombstone (same tx): deletion is a first-class feed event. S-2.
-    if settings.study_publish_enabled:
-        await enqueue_card_publish(session, card)
    await session.commit()
@@ -39,9 +39,6 @@ from services.study.explanation_rag import (
    gather_explanation_context,
    render_evidence_block,
 )
-from services.study.publish_enqueue import enqueue_publish, enqueue_question_publish
-from services.study.publish_projection import KIND_CARD, KIND_EXPLANATION, KIND_QUESTION
-from services.study.outcome import derive_outcome

 logger = logging.getLogger(__name__)
 router = APIRouter()
@@ -546,9 +543,6 @@ async def create_question_in_topic(
    )
    session.add(q)
    await session.flush()
-    # Enqueue to the publish outbox (same tx; no-op when the flag is off): publish the new question. P0-1b.
-    if settings.study_publish_enabled:
-        await enqueue_question_publish(session, q)
    await session.commit()

    stats = QuestionAttemptStats(attempt_count=0, correct_count=0, wrong_count=0)
@@ -911,16 +905,9 @@ async def update_question(
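    # The cards were extracted from the 'old' ai_explanation, so they may be stale after the correction; flag them so this is visible immediately.
    # Final stale cleanup is the responsibility of the card_extract worker's supersede (the old version is retired when a new version is extracted).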
    if AI_STALE_TRIGGER & fields_set:
-        flagged_card_ids = await flag_cards_for_source(session, source_question_id=q.id, reason="source_changed")
-        # Tombstone derived cards that lost publish eligibility (returned to pending review) (same tx). S-2.
-        if settings.study_publish_enabled:
-            for cid in flagged_card_ids:
-                await enqueue_publish(session, kind=KIND_CARD, source_id=cid, payload=None, deleted=True)
+        await flag_cards_for_source(session, source_question_id=q.id, reason="source_changed")

    q.updated_at = datetime.now(timezone.utc)
-    # Publish re-projection (same tx): reflects the question update. The explanation is bundled only when ready; stale→tombstone is handled in P1-3. P0-1b.
-    if settings.study_publish_enabled:
-        await enqueue_question_publish(session, q)
    await session.commit()

    stats = await _attempt_stats(session, user.id, question_id)
@@ -983,16 +970,7 @@ async def soft_delete_question(
    )
    # Study memo notes: when the source question is deleted, mark derived memo cards as pending review (source_deleted).
    # study_questions is soft-delete only, so the card FK CASCADE never fires; this hook is the actual path.
-    flagged_card_ids = await flag_cards_for_source(session, source_question_id=q.id, reason="source_deleted")
-    # Tombstone derived cards that lost publish eligibility (same tx). S-2.
-    if settings.study_publish_enabled:
-        for cid in flagged_card_ids:
-            await enqueue_publish(session, kind=KIND_CARD, source_id=cid, payload=None, deleted=True)
-    # Publish tombstone (same tx): deletion is a first-class feed event (no raw DELETE; goes through the worker). If an explanation body exists, tombstone that kind too. P0-1b.
-    if settings.study_publish_enabled:
-        await enqueue_publish(session, kind=KIND_QUESTION, source_id=q.id, payload=None, deleted=True)
-        if q.ai_explanation:
-            await enqueue_publish(session, kind=KIND_EXPLANATION, source_id=q.id, payload=None, deleted=True)
+    await flag_cards_for_source(session, source_question_id=q.id, reason="source_deleted")
    await session.commit()


@@ -1014,27 +992,24 @@ async def submit_attempt(
    q = await session.get(StudyQuestion, question_id)
    q = _verify_question_ownership(q, user)

-    # Single source of truth for grading: the same function as the viewer ingest (P2). 422 if there is no selection and the answer is not marked unsure.
-    try:
-        selected, is_correct, outcome = derive_outcome(
-            body.selected_choice, body.is_unsure, q.correct_choice
+    if body.is_unsure:
+        selected = None
+        is_correct = False
+        outcome = "unsure"
+    elif body.selected_choice is None:
+        raise HTTPException(
+            status_code=422,
+            detail="selected_choice (1~4) 또는 is_unsure=true 가 필요합니다",
        )
-    except ValueError as e:
-        raise HTTPException(status_code=422, detail=str(e))
+    else:
+        selected = body.selected_choice
+        is_correct = selected == q.correct_choice
+        outcome = "correct" if is_correct else "wrong"

    # PR-10: session integration. Defaults to None.
    quiz_session: StudyQuizSession | None = None
    if body.quiz_session_id is not None:
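-        # Row lock via FOR UPDATE (R9): if a mobile double-tap/retry submits to the same session concurrently,
-        # both requests read cursor=N and both add cursor+1 and the count, a race (double increment). The lock serializes them, so
-        # the second submission sees cursor=N+1 after the first commit and is rejected with a cursor-mismatch 409.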
-        quiz_session = (
-            await session.execute(
-                select(StudyQuizSession)
-                .where(StudyQuizSession.id == body.quiz_session_id)
-                .with_for_update()
-            )
-        ).scalar_one_or_none()
+        quiz_session = await session.get(StudyQuizSession, body.quiz_session_id)
        if quiz_session is None or quiz_session.user_id != user.id:
            raise HTTPException(status_code=404, detail="quiz_session 을 찾을 수 없습니다")
        if quiz_session.study_topic_id != q.study_topic_id:
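Review note: the inline grading branch that this hunk adds to `submit_attempt` can be read as a small pure function. The following is a minimal sketch, not the project's code: the name `grade_attempt` is hypothetical, and a plain `ValueError` stands in for the `HTTPException(status_code=422)` that the endpoint raises.

```python
def grade_attempt(selected_choice, is_unsure, correct_choice):
    """Return (selected, is_correct, outcome), mirroring the three branches in the hunk."""
    if is_unsure:
        # "unsure" discards any selection and never counts as correct.
        return None, False, "unsure"
    if selected_choice is None:
        # The endpoint answers 422 here; ValueError is a stand-in (assumption).
        raise ValueError("selected_choice or is_unsure=true is required")
    is_correct = selected_choice == correct_choice
    return selected_choice, is_correct, "correct" if is_correct else "wrong"
```

Keeping the branch order (unsure first, then the missing-selection check) matters: an unsure answer is accepted even when a choice was also sent.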
@@ -1559,8 +1534,8 @@ async def delete_question_image(

 # ─── PR-3: AI explanation generation endpoint ───

-# 2026-06-20: single source of truth in config (the old hardcoded 30s was tuned for the fast Gemma).
-LLM_TIMEOUT_S = settings.llm_call_timeout_s
+# MLX call timeout (seconds). MLX gate + 26B inference averages ~10s; the rest is safety margin.
+LLM_TIMEOUT_S = 30.0
 # Lazy-load the prompt template
 _PROMPT_PATH = "study_question_explanation.txt"
 _prompt_cache: str | None = None
@@ -1729,9 +1704,6 @@ async def generate_ai_explanation(
    primary_name = ai_client.ai.primary.model if hasattr(ai_client.ai.primary, "model") else "primary"
    q.ai_explanation_model = f"mlx:{primary_name}"
    q.updated_at = q.ai_explanation_generated_at
-    # Publish re-projection (same tx): real-time explanation ready → publish question + explanation. P0-1b.
-    if settings.study_publish_enabled:
-        await enqueue_question_publish(session, q)
    await session.commit()

    return AIExplanationResponse(
@@ -33,7 +33,6 @@ from ai.client import AIClient, strip_thinking
 from eid.ai import EidAIClient
 from eid.compose import compose
 from core.auth import get_current_user
-from core.config import settings
 from core.database import get_session
 from core.library import LIBRARY_PREFIX, normalize_library_path
 from models.document import Document
@@ -47,8 +46,6 @@ from models.eid_study_weakness import EidStudyWeakness
 from models.eid_review_set_draft import EidReviewSetDraft
 from models.user import User
 from services.search.llm_gate import Priority, acquire_mlx_gate
-from services.study.publish_enqueue import enqueue_publish, enqueue_topic_publish
-from services.study.publish_projection import KIND_TOPIC
 from services.study.subject_note_rag import (
    SubjectNoteContext,
    gather_subject_note_context,
@@ -469,9 +466,6 @@ async def create_study_topic(
    session.add(topic)
    try:
        await session.flush()
-        # Enqueue to the publish outbox (same tx; no-op when the flag is off): publish the new topic. S-1.
-        if settings.study_publish_enabled:
-            await enqueue_topic_publish(session, topic)
        await session.commit()
    except IntegrityError:
        await session.rollback()
@@ -701,10 +695,6 @@ async def update_study_topic(
        topic.focused_at = datetime.now(timezone.utc) if body.focused else None

    topic.updated_at = datetime.now(timezone.utc)
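-    # Publish re-projection (same tx): reflects topic metadata updates. Changes that leave the payload (name, exam_round_size) untouched (focused, etc.)
-    # are absorbed by the worker's (payload_hash, deleted) dedup without bumping rev, so there is no churn. S-1.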
-    if settings.study_publish_enabled:
-        await enqueue_topic_publish(session, topic)
    try:
        await session.commit()
    except IntegrityError:
@@ -780,9 +770,6 @@ async def delete_study_topic(
    )

    topic.deleted_at = datetime.now(timezone.utc)
-    # Publish tombstone (same tx): deletion is a first-class feed event (no raw DELETE; goes through the worker). S-1.
-    if settings.study_publish_enabled:
-        await enqueue_publish(session, kind=KIND_TOPIC, source_id=topic.id, payload=None, deleted=True)
    await session.commit()


@@ -1028,7 +1015,7 @@ async def detach_session_from_topic(

 # ─── PR-9: subject notes (study_topic_subject_notes) ───

-SUBJECT_NOTE_TIMEOUT_S = settings.llm_call_timeout_s
+SUBJECT_NOTE_TIMEOUT_S = 30.0
 _SUBJECT_NOTE_PROMPT_PATH = "study_subject_note.txt"
 _subject_note_prompt_cache: str | None = None

@@ -1255,7 +1242,7 @@ async def generate_subject_note(
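 # Translates the latest eid_study_weakness snapshot produced by the worker (study_weakness) into the 'study diagnosis coach'
 # (study overlay). The weakness/attitude 'verdict' is code-derived (snapshot); the LLM only quotes values from the snapshot block (blocks hallucinated weaknesses).
 # compose("study_diagnosis") = persona+rules+study overlay(+{placeholder}) → the surface substitutes the block.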
-DIAGNOSIS_TIMEOUT_S = settings.llm_call_timeout_s
+DIAGNOSIS_TIMEOUT_S = 40.0


 class StudyDiagnosisResponse(BaseModel):
@@ -30,11 +30,6 @@ class AIModelConfig(BaseModel):
    # None = MLX/OpenAI server default. Not applied to the Anthropic branch (covered by a separate plan).
    temperature: float | None = None
    top_p: float | None = None
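-    # mlx-native sampling: suppresses code-switching (CJK/Latin leakage) and repetition loops in long Korean text.
-    # Qwen3 recommendation: top_k=20, repetition_penalty 1.05~1.1. None = server default (not injected).
-    # Applied only to the OpenAI-compatible branch (mlx); not applied to the Anthropic branch (separate scope).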
-    repetition_penalty: float | None = None
-    top_k: int | None = None


 class DeepSummaryBacklogConfig(BaseModel):
@@ -103,10 +98,6 @@ class AIConfig(BaseModel):
    classifier: AIModelConfig | None = None
    # Phase 3.5b: semantic verifier (optional; grounding-only when absent). Since PR #20, the Mac mini 26B MLX endpoint (initial = exaone3.5).
    verifier: AIModelConfig | None = None
-    # ds-macbook-offload-1: slot dedicated to deep summaries (optional). MacBook M5 Max Qwen3.6-27B, routed via
-    # llm-router :8890 (model=qwen-macbook alias, reuses the wake preflight). When absent, deep_summary stays on the existing
-    # primary (Mac mini 26B) path, i.e. the feature is inactive. Explicit opt-in; no silent fallback.
-    deep: AIModelConfig | None = None
    # Legacy: vision slot (currently unused; Document Server uses separate OCR/STT services).
    # Removal is in progress, so loading stays lenient by keeping it optional.
    vision: AIModelConfig | None = None
@@ -163,47 +154,15 @@ class Settings(BaseModel):
    # Upload limits (authoritative policy)
    upload: UploadConfig = UploadConfig()

-    # Generation LLM hold (2026-06-11): consumers/workers whose names appear in config.yaml pipeline.held_stages
-    # do not claim at all (attempts are not consumed; the pending backlog is intentional).
-    # Valid keys = queue stage names (classify/summarize/deep_summary) + cron/consumer keys (digest,
-    # briefing, study_explanation, study_session_analysis, study_memo_card).
-    # Empty list = no effect (existing behavior unchanged).
-    pipeline_held_stages: list[str] = []
-
-    # Upper bound on concurrent mlx gate executions (2026-06-12, config.yaml pipeline.mlx_gate_concurrency).
-    # 1 = the old single-inference behavior. 2 = uses continuous batching (see the llm_gate docstring).
-    mlx_gate_concurrency: int = 1
-
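-    # LLM call parameters for digest/briefing generation (2026-06-15, timeouts unified into a single source after the model swap).
-    # The old hardcoded 25s (tuned for the fast Gemma) was missed in the sweep for the swap to Qwen3.6-27B-6bit (~90~300s per call),
-    # which caused digest to exceed its 600s hard cap and briefing to fall back 4/4 times → moved to config as the single source.
-    # Concurrency has no separate key; the global mlx_gate_concurrency (the gate's single budget) handles it.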
-    digest_llm_timeout_s: int = 200
-    digest_llm_attempts: int = 2
-    digest_pipeline_hard_cap_s: int = 1800
-    # 2026-06-20: single primary-call timeout for study/analyze (the old hardcoded 30~60s was tuned for the fast Gemma and
-    # was missed in the Qwen 27B swap sweep → user-facing 504s + permanently stuck workers). Single source, same shape as digest.
-    llm_call_timeout_s: int = 200
-
    # PR-MacMini-Derived-Worker-1: study explanation owner = Mac mini
    # The GPU side sets this to false (.env), which triggers the skip guard on the explanation branch.
    study_explanation_enabled: bool = True
    # Study memo notes Phase 1: gate for the card_extract poller/consumer. Set to false when the owner is split out.
    study_card_extract_enabled: bool = True
-    # Publish layer (docsrv-viewer-publish): gate for the publish_outbox worker. Set to true after the author/4-A enqueue wiring (P0-1b).
-    study_publish_enabled: bool = False
-    digest_publish_enabled: bool = False  # docsrv-viewer-publish P1-1 (news/digest publish feed gate)
-    maintenance_mode: bool = False  # P1-4: under maintenance/experiment = processing-status banner (surface != data)
-    maintenance_note: str = ""
-    # Gate for viewer write-back ingest (study-to-viewer P2). Enables /ingest/study/attempts. Default false = inert (503).
-    study_ingest_enabled: bool = False

    # internal endpoint Bearer token (for Mac mini derived-worker calls)
    internal_worker_token: str = ""

-    # Bearer token for the viewer↔DS publish channel (publish read API P0-2 + ingest P2). Kept separate from the Mac mini token (blast-radius isolation).
-    viewer_sync_token: str = ""
-

 def load_settings() -> Settings:
    """Load settings from config.yaml + environment variables."""
@@ -211,13 +170,7 @@ def load_settings() -> Settings:
    database_url = os.getenv("DATABASE_URL", "")
    study_explanation_enabled = os.getenv("STUDY_EXPLANATION_ENABLED", "true").lower() in ("1", "true", "yes")
    study_card_extract_enabled = os.getenv("STUDY_CARD_EXTRACT_ENABLED", "true").lower() in ("1", "true", "yes")
-    study_publish_enabled = os.getenv("STUDY_PUBLISH_ENABLED", "false").lower() in ("1", "true", "yes")
-    digest_publish_enabled = os.getenv("DIGEST_PUBLISH_ENABLED", "false").lower() in ("1", "true", "yes")
-    maintenance_mode = os.getenv("MAINTENANCE_MODE", "false").lower() in ("1", "true", "yes")
-    maintenance_note = os.getenv("MAINTENANCE_NOTE", "")
-    study_ingest_enabled = os.getenv("STUDY_INGEST_ENABLED", "false").lower() in ("1", "true", "yes")
    internal_worker_token = os.getenv("INTERNAL_WORKER_TOKEN", "")
-    viewer_sync_token = os.getenv("VIEWER_SYNC_TOKEN", "")
    jwt_secret = os.getenv("JWT_SECRET", "")
    totp_secret = os.getenv("TOTP_SECRET", "")
    eval_runner_token = os.getenv("EVAL_RUNNER_TOKEN", "")
@@ -265,7 +218,6 @@ def load_settings() -> Settings:
                verifier=(
                    AIModelConfig(**models["verifier"]) if "verifier" in models else None
                ),
-                deep=(AIModelConfig(**models["deep"]) if "deep" in models else None),
                deep_summary_backlog=DeepSummaryBacklogConfig(
                    **ai_raw.get("deep_summary_backlog", {})
                ),
@@ -287,42 +239,6 @@ def load_settings() -> Settings:
            )
        )

-    pipeline_held_stages: list[str] = []
-    mlx_gate_concurrency = 1
-    digest_llm_timeout_s = 200
-    digest_llm_attempts = 2
-    digest_pipeline_hard_cap_s = 1800
-    llm_call_timeout_s = 200
-    if config_path.exists() and raw and "pipeline" in raw:
-        held_raw = (raw.get("pipeline") or {}).get("held_stages") or []
-        # Guard against char-splitting when a scalar (string) is entered by mistake: accept it as a single-item list.
-        if not isinstance(held_raw, (list, tuple)):
-            held_raw = [held_raw]
-        pipeline_held_stages = [str(s) for s in held_raw]
-        try:
-            mlx_gate_concurrency = max(
-                1, int((raw.get("pipeline") or {}).get("mlx_gate_concurrency", 1))
-            )
-        except (TypeError, ValueError):
-            mlx_gate_concurrency = 1
-        _pl = raw.get("pipeline") or {}
-        try:
-            digest_llm_timeout_s = max(1, int(_pl.get("digest_llm_timeout_s", 200)))
-        except (TypeError, ValueError):
-            digest_llm_timeout_s = 200
-        try:
-            digest_llm_attempts = max(1, int(_pl.get("digest_llm_attempts", 2)))
-        except (TypeError, ValueError):
-            digest_llm_attempts = 2
-        try:
-            digest_pipeline_hard_cap_s = max(60, int(_pl.get("digest_pipeline_hard_cap_s", 1800)))
-        except (TypeError, ValueError):
-            digest_pipeline_hard_cap_s = 1800
-        try:
-            llm_call_timeout_s = max(1, int(_pl.get("llm_call_timeout_s", 200)))
-        except (TypeError, ValueError):
-            llm_call_timeout_s = 200
-
    taxonomy = raw.get("taxonomy", {}) if config_path.exists() and raw else {}
    document_types = raw.get("document_types", []) if config_path.exists() and raw else []
    upload_cfg = (
@@ -350,19 +266,7 @@ def load_settings() -> Settings:
        upload=upload_cfg,
        study_explanation_enabled=study_explanation_enabled,
        study_card_extract_enabled=study_card_extract_enabled,
-        study_publish_enabled=study_publish_enabled,
-        digest_publish_enabled=digest_publish_enabled,
-        maintenance_mode=maintenance_mode,
-        maintenance_note=maintenance_note,
-        study_ingest_enabled=study_ingest_enabled,
        internal_worker_token=internal_worker_token,
-        viewer_sync_token=viewer_sync_token,
-        pipeline_held_stages=pipeline_held_stages,
-        mlx_gate_concurrency=mlx_gate_concurrency,
-        digest_llm_timeout_s=digest_llm_timeout_s,
-        digest_llm_attempts=digest_llm_attempts,
-        digest_pipeline_hard_cap_s=digest_pipeline_hard_cap_s,
-        llm_call_timeout_s=llm_call_timeout_s,
    )
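Review note: the boolean environment flags that remain in `load_settings` all repeat the same `os.getenv(...).lower() in ("1", "true", "yes")` expression. A minimal sketch of that parsing rule as one helper follows; `env_flag` and its `environ` parameter are hypothetical names introduced here for illustration and are not part of the codebase.

```python
import os

# Values treated as "on", matching the tuple used in load_settings.
TRUTHY = ("1", "true", "yes")


def env_flag(name, default="false", environ=None):
    """Parse a boolean flag the way load_settings does: case-insensitive match against TRUTHY."""
    env = os.environ if environ is None else environ  # injectable mapping for tests (assumption)
    return env.get(name, default).lower() in TRUTHY
```

With such a helper, `STUDY_EXPLANATION_ENABLED` would be read as `env_flag("STUDY_EXPLANATION_ENABLED", "true")`. Any value outside the tuple, including an empty string, counts as off.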


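@@ -1,346 +0,0 @@
-"""Crawling politeness core (A-4, plan crawl-24x7-1)
-
-Shared fetch layer that hardcodes the recommended settings for personal archiving:
- per-domain concurrency of 1 (asyncio.Lock) + 5–15 s delay between consecutive requests to the same domain + jitter
- respects robots.txt (urllib.robotparser, 24h cache); applies only to anonymous public crawling.
-  Logged-in session fetches (B-3) count as user actions, so human pace is the standard instead of robots.
- honest identifying UA + contact (anonymous crawling track; logged-in sessions keep the browser UA, see B-3)
- 429 = respect Retry-After / 5xx = retryable / 403 = block signal (tied into the caller's circuit)
-
-Rate state such as the last request time per domain is in-process (the persistent watermark lives in the DB: news_sources).
-SSRF blocking reuses core.url_validator.validate_feed_url (including re-validation of redirect targets).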
-"""
-
-import asyncio
-import base64
-import random
-import time
-import urllib.robotparser
-from urllib.parse import urljoin, urlparse
-
-import httpx
-
-from core.url_validator import validate_feed_url
-from core.utils import setup_logger
-
-# A bare getLogger inherits from root (WARNING), so INFO wait/block logs get dropped; use the same setup as the other workers
-logger = setup_logger("crawl_politeness")
-
-# Honest identifying UA + contact: a channel to reach us before blocking (A-4)
-CRAWL_UA = "HyungiPKM-Archiver/1.0 (personal archive; +mailto:hyun49196@gmail.com)"
-
-# Interval between consecutive requests to the same domain (seconds): recommended 5–15s + jitter
-_DOMAIN_DELAY_MIN = 5.0
-_DOMAIN_DELAY_MAX = 15.0
-
-# Fetch interval for subscription (browser) sessions: human pace (B-3 ④: tens of seconds between articles)
-_AUTH_DELAY_MIN = 30.0
-_AUTH_DELAY_MAX = 60.0
-
-# B-3 isolated Playwright container (internal-only, compose DNS)
-_FETCHER_URL = "http://playwright-fetcher:3400"
-_FETCHER_TIMEOUT = 120.0  # includes browser startup + navigation + settle
-
-# Markers identifying anti-bot challenge pages (DataDome/Cloudflare, etc.); kept narrow to avoid false positives.
-# Observed: Le Monde articles = DataDome "Client Challenge" + "Entrez les caractères" CAPTCHA.
-_CHALLENGE_MARKERS = (
-    "Client Challenge",
-    "Entrez les caractères affichés",
-    "Checking your browser before",
-    "captcha-delivery.com",
-    "geo.captcha-delivery",
-    # Script domain of the CF JS challenge interstitial (observed on aiche.org, 2026-06-11);
-    # only final HTML that never got past the fetcher's challenge wait matches here.
-    "challenges.cloudflare.com",
-)
-
-_ROBOTS_CACHE_TTL = 24 * 3600  # 24h
-_MAX_PAGE_BYTES = 5 * 1024 * 1024  # same 5MB cap as the feed fetch
-_PAGE_TIMEOUT = 20.0
-_MAX_REDIRECTS = 3
-
-_HTML_CONTENT_TYPES = ("text/html", "application/xhtml+xml")
-
-
-class CrawlFetchError(Exception):
-    """Transient error (5xx / timeout / network): eligible for queue retry."""
-
-
-class CrawlBlocked(Exception):
-    """Block signal (403 / 429 / robots disallow): handled by backoff/circuit rather than retry."""
-
-
-class CrawlSkip(Exception):
-    """Permanently out of scope (non-HTML / oversized / SSRF-blocked / 4xx): handled by downgrade."""
-
-
-# Per-domain serialization state (in-process)
-_domain_locks: dict[str, asyncio.Lock] = {}
-_domain_last_request: dict[str, float] = {}
-# host → (cached_at, RobotFileParser | None).  None = no robots.txt/4xx (allow everything)
-_robots_cache: dict[str, tuple[float, urllib.robotparser.RobotFileParser | None]] = {}
-
-
-def _domain_of(url: str) -> str:
-    return (urlparse(url).hostname or "").lower()
-
-
-def _get_lock(domain: str) -> asyncio.Lock:
-    if domain not in _domain_locks:
-        _domain_locks[domain] = asyncio.Lock()
-    return _domain_locks[domain]
-
-
-async def _respect_domain_rate(
-    domain: str,
-    delay_min: float = _DOMAIN_DELAY_MIN,
-    delay_max: float = _DOMAIN_DELAY_MAX,
-) -> None:
-    """Wait until the (jittered) delay has elapsed since the previous request to the same domain."""
-    last = _domain_last_request.get(domain)
-    if last is not None:
-        delay = random.uniform(delay_min, delay_max)
-        wait = last + delay - time.monotonic()
-        if wait > 0:
-            # No silent sleeps: keeps politeness behavior verifiable and observable in operation
-            logger.info("[politeness] %s %.1fs 대기", domain, wait)
-            await asyncio.sleep(wait)
-
-
-async def _fetch_robots(client: httpx.AsyncClient, scheme: str, host: str):
-    """Fetch robots.txt. 4xx/absent = allow everything (None); 5xx/error = conservatively block for this cycle."""
-    robots_url = f"{scheme}://{host}/robots.txt"
-    try:
-        resp = await client.get(robots_url, headers={"User-Agent": CRAWL_UA})
-    except httpx.HTTPError as e:
-        raise CrawlFetchError(f"robots.txt 조회 실패: {host}: {e}") from e
-    if resp.status_code >= 500:
-        # 5xx means the intent is unknown; following standard practice, treat this cycle as blocked
-        raise CrawlFetchError(f"robots.txt 5xx: {host}: {resp.status_code}")
-    if resp.status_code >= 400:
-        return None  # no robots.txt = allow everything
-    rp = urllib.robotparser.RobotFileParser()
-    rp.parse(resp.text.splitlines())
-    return rp
-
-
-async def _robots_allows(client: httpx.AsyncClient, url: str) -> bool:
-    parsed = urlparse(url)
-    host = (parsed.hostname or "").lower()
-    cached = _robots_cache.get(host)
-    if cached is None or time.monotonic() - cached[0] > _ROBOTS_CACHE_TTL:
-        rp = await _fetch_robots(client, parsed.scheme or "https", host)
-        _robots_cache[host] = (time.monotonic(), rp)
-        cached = _robots_cache[host]
-    rp = cached[1]
-    if rp is None:
-        return True
-    return rp.can_fetch(CRAWL_UA, url)
-
-
-async def fetch_page(
-    url: str, *, check_robots: bool = True,
-    content_types: tuple[str, ...] = _HTML_CONTENT_TYPES,
-) -> tuple[str, str]:
-    """Politeness fetch of one public page. Returns (html_text, final_url).
-
-    - SSRF validation (including redirect targets; the same double validation as the news_collector feed fetch)
-    - per-domain concurrency of 1 + 5–15s jittered delay
-    - 429: log Retry-After, then CrawlBlocked / 403: CrawlBlocked / other 4xx: CrawlSkip
-    - 5xx/timeout: CrawlFetchError (queue retry)
-    - non-HTML content-type / over 5MB: CrawlSkip
-    """
-    try:
-        validate_feed_url(url)
-    except ValueError as e:
-        raise CrawlSkip(f"URL 검증 실패: {e}") from e
-
-    domain = _domain_of(url)
-    async with _get_lock(domain):
-        await _respect_domain_rate(domain)
-        try:
-            async with httpx.AsyncClient(
-                timeout=_PAGE_TIMEOUT, follow_redirects=False,
-                headers={"User-Agent": CRAWL_UA},
-            ) as client:
-                if check_robots and not await _robots_allows(client, url):
-                    raise CrawlBlocked(f"robots.txt disallow: {url}")
-
-                resp = await client.get(url)
-                redirects = 0
-                # has_redirect_location = only real redirects that carry a location header (httpx's
-                # is_redirect covers all 3xx, so it mistakes 304 etc. for a redirect; same pitfall as in news_collector)
-                while resp.has_redirect_location and redirects < _MAX_REDIRECTS:
-                    location = urljoin(str(resp.request.url), resp.headers["location"])
-                    try:
-                        validate_feed_url(location)
-                    except ValueError as e:
-                        raise CrawlSkip(f"redirect target 차단: {e}") from e
-                    # A redirect is also a consecutive request to the same domain; holding the lock is enough spacing (one immediate request)
-                    resp = await client.get(location)
-                    redirects += 1
-                if resp.has_redirect_location:
-                    raise CrawlSkip(f"redirect {_MAX_REDIRECTS}회 초과: {url}")
-        except httpx.TimeoutException as e:
-            raise CrawlFetchError(f"timeout: {url}") from e
-        except httpx.HTTPError as e:
-            raise CrawlFetchError(f"네트워크 오류: {url}: {e}") from e
-        finally:
-            _domain_last_request[domain] = time.monotonic()
-
-    if resp.status_code == 429:
-        retry_after = resp.headers.get("retry-after", "")
-        logger.warning("[politeness] 429 %s (Retry-After=%s)", domain, retry_after or "-")
-        raise CrawlBlocked(f"429 rate limited: {url} (Retry-After={retry_after or '-'})")
-    if resp.status_code == 403:
-        raise CrawlBlocked(f"403 forbidden: {url}")
-    if resp.status_code >= 500:
-        raise CrawlFetchError(f"{resp.status_code}: {url}")
-    if resp.status_code >= 400:
-        raise CrawlSkip(f"{resp.status_code}: {url}")
-
-    ct = resp.headers.get("content-type", "").lower()
-    if ct and not any(t in ct for t in content_types):
-        raise CrawlSkip(f"비허용 content-type: {ct}: {url}")
-    if len(resp.content) > _MAX_PAGE_BYTES:
-        raise CrawlSkip(f"크기 초과: {len(resp.content)} bytes: {url}")
-
-    return resp.text, str(resp.request.url)
-
-
-# ── B-3 subscription session fetch (via the isolated Playwright container) ──────────────────────
-
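-async def fetch_page_via_browser(url: str, profile: str | None) -> tuple[str, str]:
-    """Fetch one page via the browser: delegated to playwright-fetcher, with human-pace politeness (30~60s).
-
-    profile=None = anonymous context (cycle 3: used only to get past the WAF of public sites that
-    403 plain httpx regardless of UA; observed on CCPS aiche.org). A value = B-3 subscription session.
-    Returns (html_text, final_url). robots is not applied: subscription fetches count as user actions,
-    and for anonymous WAF bypass the safeguards are low frequency (1~2 times a month) plus human pace.
-    The exception vocabulary matches fetch_page (callers reuse the same branches).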
-    """
-    try:
-        validate_feed_url(url)
-    except ValueError as e:
-        raise CrawlSkip(f"URL 검증 실패: {e}") from e
-
-    payload = {"url": url}
-    if profile:
-        payload["profile"] = profile
-
-    domain = _domain_of(url)
-    async with _get_lock(domain):
-        await _respect_domain_rate(domain, _AUTH_DELAY_MIN, _AUTH_DELAY_MAX)
-        try:
-            async with httpx.AsyncClient(timeout=_FETCHER_TIMEOUT) as client:
-                resp = await client.post(f"{_FETCHER_URL}/fetch", json=payload)
-        except httpx.TimeoutException as e:
-            raise CrawlFetchError(f"browser fetch timeout: {url}") from e
-        except httpx.HTTPError as e:
-            raise CrawlFetchError(f"playwright-fetcher 연결 오류: {e}") from e
-        finally:
-            _domain_last_request[domain] = time.monotonic()
-
-    if resp.status_code == 503:
-        # storage_state is missing: waiting for a manual session capture (caller degrades; no retry loop)
-        raise CrawlBlocked(f"세션 프로필 부재: {profile}")
-    if resp.status_code != 200:
-        raise CrawlFetchError(f"playwright-fetcher {resp.status_code}: {url}")
-    data = resp.json()
-    html_text = data.get("html", "")
-    if len(html_text.encode("utf-8", errors="replace")) > _MAX_PAGE_BYTES:
-        raise CrawlSkip(f"크기 초과 (browser): {url}")
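-    # Detect anti-bot challenge pages (DataDome, etc.): blocks the silent corruption where short challenge HTML
-    # that passes the body-length gate (200 chars) gets promoted to article body. This is headless detection, so
-    # retrying is pointless → CrawlBlocked (= degrade, keep the RSS summary). Markers are kept conservatively narrow.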
-    if any(m in html_text for m in _CHALLENGE_MARKERS):
-        raise CrawlBlocked(f"안티봇 챌린지 페이지(headless 차단): {url}")
-    return html_text, data.get("final_url", url)
-
-
-_MAX_DOWNLOAD_BYTES = 60 * 1024 * 1024  # matches the fetcher's MAX_DOWNLOAD_BYTES
-
-
-async def download_via_browser(
-    url: str, *, referer: str | None = None, profile: str | None = None
-) -> tuple[bytes, str]:
-    """Download one binary (PDF): delegated to the fetcher's /download. Returns (content, content_type).
-
-    referer = the listing page from which the WAF challenge cookie is obtained first (CCPS Beacon pattern).
-    Inner status handling: 403/429 = CrawlBlocked, other 4xx = CrawlSkip, 5xx = CrawlFetchError
-    (same vocabulary as fetch_page, so callers reuse the same branches).
-    """
-    try:
-        validate_feed_url(url)
-    except ValueError as e:
-        raise CrawlSkip(f"URL 검증 실패: {e}") from e
-
-    payload: dict = {"url": url}
-    if referer:
-        payload["referer"] = referer
-    if profile:
-        payload["profile"] = profile
-
-    domain = _domain_of(url)
-    async with _get_lock(domain):
-        await _respect_domain_rate(domain, _AUTH_DELAY_MIN, _AUTH_DELAY_MAX)
-        try:
-            async with httpx.AsyncClient(timeout=_FETCHER_TIMEOUT) as client:
-                resp = await client.post(f"{_FETCHER_URL}/download", json=payload)
-        except httpx.TimeoutException as e:
-            raise CrawlFetchError(f"browser download timeout: {url}") from e
-        except httpx.HTTPError as e:
-            raise CrawlFetchError(f"playwright-fetcher 연결 오류: {e}") from e
-        finally:
-            _domain_last_request[domain] = time.monotonic()
-
-    if resp.status_code == 503:
-        raise CrawlBlocked(f"세션 프로필 부재: {profile}")
-    if resp.status_code != 200:
-        raise CrawlFetchError(f"playwright-fetcher {resp.status_code}: {url}")
-    data = resp.json()
-    inner = int(data.get("status", 0))
-    if inner in (403, 429):
-        raise CrawlBlocked(f"{inner} (browser download): {url}")
-    if 400 <= inner < 500:
-        raise CrawlSkip(f"{inner} (browser download): {url}")
-    if inner != 200:
-        raise CrawlFetchError(f"{inner} (browser download): {url}")
-    content = base64.b64decode(data.get("body_b64", ""))
-    if len(content) > _MAX_DOWNLOAD_BYTES:
-        raise CrawlSkip(f"크기 초과 (browser download): {url}")
-    return content, data.get("content_type", "")
-
-
-async def probe_session(
-    profile: str, probe_url: str, min_body_chars: int, paywall_markers: list[str]
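-) -> dict:
-    """Content-based session probe (B-3 ②). Returns {'ok': bool, 'reason': str|None, 'body_chars': int}.
-
-    Failures are returned as values, not exceptions: the caller records them in source_health and takes the degrade branch.
-    The probe is also a real publisher fetch, so the same domain lock + human pace apply.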
-    """
-    domain = _domain_of(probe_url)
-    async with _get_lock(domain):
-        await _respect_domain_rate(domain, _AUTH_DELAY_MIN, _AUTH_DELAY_MAX)
-        try:
-            async with httpx.AsyncClient(timeout=_FETCHER_TIMEOUT) as client:
-                resp = await client.post(
-                    f"{_FETCHER_URL}/probe",
-                    json={
-                        "profile": profile,
-                        "probe_url": probe_url,
-                        "min_body_chars": min_body_chars,
-                        "paywall_markers": paywall_markers,
-                    },
-                )
-        except httpx.HTTPError as e:
-            return {"ok": False, "reason": f"fetcher 연결 오류: {e}", "body_chars": 0}
-        finally:
-            _domain_last_request[domain] = time.monotonic()
-
-    if resp.status_code == 503:
-        return {"ok": False, "reason": f"세션 프로필 부재: {profile}", "body_chars": 0}
-    if resp.status_code != 200:
-        return {"ok": False, "reason": f"fetcher {resp.status_code}", "body_chars": 0}
-    return resp.json()
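Review note: the removed module maps HTTP status codes onto three exception types (`CrawlBlocked`, `CrawlFetchError`, `CrawlSkip`) so callers can branch on them. A minimal sketch of that mapping, following the order of checks in the removed `fetch_page`, is below. The function name `classify_status` and the string labels are hypothetical; the original raises exceptions rather than returning labels.

```python
def classify_status(status: int) -> str:
    """Map an HTTP status to the handling used by the removed fetch_page.

    Labels are stand-ins (assumption) for the original exception types:
    "blocked" ~ CrawlBlocked, "retry" ~ CrawlFetchError, "skip" ~ CrawlSkip.
    """
    if status in (403, 429):
        # Block signals: back off or trip the circuit instead of retrying.
        return "blocked"
    if status >= 500:
        # Transient server errors: eligible for queue retry.
        return "retry"
    if status >= 400:
        # Any other 4xx: permanently out of scope.
        return "skip"
    return "ok"
```

The 403/429 check has to come before the generic `>= 400` check, otherwise block signals would be downgraded to permanent skips.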
@@ -72,55 +72,6 @@ def _validate_sql_content(name: str, sql: str) -> None:
            )


-# R1: the last migration version represented by the baseline snapshot (versions up to and including it are part of the baseline).
-# Update this value when regenerating the baseline (migrations/_baseline/<cutoff>_schema_baseline.sql).
-_BASELINE_CUTOFF = 358
-
-
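-async def _load_baseline_if_fresh(conn, migrations_dir: Path) -> None:
-    """On a fresh DB (no documents table), load the baseline schema snapshot and stamp schema_migrations 1..cutoff.
-
-    An existing DB (documents table present) returns immediately: no baseline load, no effect. If the baseline file
-    is missing, the existing replay path is kept as well (backward compatible).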
-    """
-    from sqlalchemy import text
-
-    baseline_dir = migrations_dir / "_baseline"
-    baseline_files = (
-        sorted(baseline_dir.glob("*_schema_baseline.sql")) if baseline_dir.is_dir() else []
-    )
-    if not baseline_files:
-        return
-
-    docs_exists = (
-        await conn.execute(text("SELECT to_regclass('public.documents') IS NOT NULL"))
-    ).scalar()
-    if docs_exists:
-        return  # existing DB: skip the baseline
-
-    baseline_path = baseline_files[-1]
-    logger.info(f"[migration] fresh DB 감지 — baseline 적재: {baseline_path.name}")
-    # The baseline is a multi-statement dump, and exec_driver_sql (asyncpg prepared) does not allow multiple statements
-    # ("cannot insert multiple commands into a prepared statement"). Load it with raw asyncpg's simple-protocol
-    # execute() instead (same connection = inside the current transaction). A psql smoke test cannot catch this
-    # constraint, so it was confirmed through init_db runtime verification.
-    raw = await conn.get_raw_connection()
-    await raw.driver_connection.execute(baseline_path.read_text(encoding="utf-8"))
-    # baseline = schema up to the cutoff → stamp schema_migrations based on the actual file versions.
-    versions = [v for v, _, _ in _parse_migration_files(migrations_dir) if v <= _BASELINE_CUTOFF]
-    for v in versions:
-        await conn.execute(
-            text(
-                "INSERT INTO schema_migrations (version, name) "
-                "VALUES (:v, :n) ON CONFLICT DO NOTHING"
-            ),
-            {"v": v, "n": f"baseline:{v}"},
-        )
-    logger.info(
-        f"[migration] baseline 적재 + schema_migrations {len(versions)}건 스탬프 (cutoff {_BASELINE_CUTOFF})"
-    )
-
-
 async def _run_migrations(conn) -> None:
    """Run unapplied migrations (the caller manages the transaction)."""
    from sqlalchemy import text
@@ -139,6 +90,10 @@ async def _run_migrations(conn) -> None:
        f"SELECT pg_advisory_xact_lock({_MIGRATION_LOCK_KEY})"
    ))

+    # Look up the applied-migration history
+    result = await conn.execute(text("SELECT version FROM schema_migrations"))
+    applied = {row[0] for row in result}
+
    # Scan migration files
    # /app/core/database.py → parent.parent = /app → /app/migrations (volume mount location)
    migrations_dir = Path(__file__).resolve().parent.parent / "migrations"
@@ -146,15 +101,6 @@ async def _run_migrations(conn) -> None:
        logger.info("[migration] migrations/ 디렉토리 없음, 스킵")
        return

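-    # R1: on a fresh DB (no documents table), load the baseline snapshot first and stamp schema_migrations.
-    # A full replay of migrations/ breaks on accumulated non-replayable steps (011 view dependency,
-    # 326 enum-same-txn, etc.), so new/DR environments start from the prod schema snapshot. Existing DBs skip this (no effect).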
-    await _load_baseline_if_fresh(conn, migrations_dir)
-
-    # Load the applied-migration history (includes baseline stamps: on a fresh DB, 1..cutoff are already applied)
-    result = await conn.execute(text("SELECT version FROM schema_migrations"))
-    applied = {row[0] for row in result}
-
    files = _parse_migration_files(migrations_dir)
    pending = [(v, name, path) for v, name, path in files if v not in applied]

@@ -2,7 +2,6 @@

 import hashlib
 import logging
-from logging.handlers import RotatingFileHandler
 from pathlib import Path


@@ -14,9 +13,7 @@ def setup_logger(name: str, log_dir: str = "logs") -> logging.Logger:

    if not logger.handlers:
        # File handler
-        fh = RotatingFileHandler(
-            f"{log_dir}/{name}.log", maxBytes=10 * 1024 * 1024, backupCount=3, encoding="utf-8"
-        )
+        fh = logging.FileHandler(f"{log_dir}/{name}.log", encoding="utf-8")
        fh.setFormatter(logging.Formatter(
            "%(asctime)s [%(levelname)s] %(message)s",
            datefmt="%Y-%m-%d %H:%M:%S"
@@ -109,3 +106,33 @@ END:VCALENDAR"""
    except Exception as e:
        logging.getLogger("caldav").error(f"CalDAV VTODO 생성 실패: {e}")
        return None
+
+
+# ─── SMTP helper ───
+
+
+def send_smtp_email(
+    host: str,
+    port: int,
+    username: str,
+    password: str,
+    subject: str,
+    body: str,
+    to_addr: str | None = None,
+):
+    """Send an email via Synology MailPlus SMTP."""
+    import smtplib
+    from email.mime.text import MIMEText
+
+    to_addr = to_addr or username
+    msg = MIMEText(body, "plain", "utf-8")
+    msg["Subject"] = subject
+    msg["From"] = username
+    msg["To"] = to_addr
+
+    try:
+        with smtplib.SMTP_SSL(host, port, timeout=30) as server:
+            server.login(username, password)
+            server.send_message(msg)
+    except Exception as e:
+        logging.getLogger("smtp").error(f"SMTP 발송 실패: {e}")
@@ -11,119 +11,11 @@ endpoint 를 못 부른다(silent fallback 0, rules no-silent-fallback).
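  - _request()       → raises if the endpoint contains anthropic.com (guards against a miswired primary; double assurance)
 call_primary / call_triage / embed / rerank are unchanged (internal inference and embedding allowed).
 egress workers and system paths keep the existing AIClient: fallback stays for the system and is removed only for Eid (separation).
-
-eid-chat (D-5): Eid chat SSE streaming also goes only through this class's call_stream(). Calling RouterBackend
-directly is forbidden; the mode vocabulary is the closed _CHAT_ALIAS mapping (daily/deep) only; unknown mode = EidEgressBlocked.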
 """

 from __future__ import annotations

-import asyncio
-import json
-import re
-from collections.abc import AsyncIterator
-from contextlib import AsyncExitStack
-
-import httpx
-
 from ai.client import AIClient
-from services.llm.backends import (
-    MAC_MINI_DEFAULT,
-    BackendUnavailable,
-    _router_url,  # reuse the single source of the router URL (settings → env LLM_ROUTER_URL → MVP default)
-)
-from services.search.llm_gate import Priority, acquire_mlx_gate
-
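-# Closed mapping from Eid chat mode to router alias (D-2). The client sends only mode; claude-cloud/auto are forbidden.
-# 2026-06-11 MacBook retired: deep also uses mac-mini-default (single host, Mac mini Qwen 27B).
-# The mode distinction stays: deep = the ReAct auto-search path (the behavior differs, not the model).
-# The gate condition is alias==MAC_MINI_DEFAULT, so deep also gets the mlx gate automatically
-# (satisfies the llm_gate invariant "gate acquisition required, no exceptions"; the old ungated path was a MacBook exception).
-_CHAT_ALIAS: dict[str, str] = {
-    "daily": MAC_MINI_DEFAULT,  # router tier_b → Mac mini :8801
-    "deep": MAC_MINI_DEFAULT,   # same upstream since the MacBook was retired; only the ReAct search-mode distinction remains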
-}
-
-# read applies per chunk, so it covers MacBook wake (24s) plus the gap between tokens. connect is short because the router is internal.
-_STREAM_TIMEOUT = httpx.Timeout(connect=5.0, read=120.0, write=30.0, pool=5.0)
-
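-# Wall-clock cap on the whole stream relay (upstream entry to end). With only a per-chunk read timeout,
-# a stream can hold the slot indefinitely while tokens keep flowing → daily holds the mlx gate, so a deadline is required.
-# deep gets the same cap (simple, consistent). A normal stream (max_tokens 2048, ~90tps ≈ 23s) passes with room to spare.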
-_STREAM_DEADLINE_S = 300.0
-
-# error_reason allowlist: anything outside it (uppercase, whitespace, JSON serialization fragments) is generalized and not exposed
-_REASON_ALLOWED = re.compile(r"[a-z0-9_]{1,64}")
-
-# Transport-level failures before the stream starts, mapped to BackendUnavailable (same list as RouterBackend._post)
-_TRANSPORT_ERRORS = (
-    httpx.ConnectError,
-    httpx.ConnectTimeout,
-    httpx.ReadTimeout,
-    httpx.PoolTimeout,
-    httpx.WriteTimeout,
-    httpx.RemoteProtocolError,
-)
-
-
-def _stream_error_reason(status_code: int, body: bytes) -> str:
-    """Extract error_reason from a 4xx/5xx response body received before the stream starts.
-
-    The vocabulary matches /api/search/ask (RouterBackend._post): prefer the router's error.type /
-    error.error_reason (macbook_unavailable / warming / editor_busy / upstream_cold /
-    provider_not_configured, etc.); otherwise fall back by status to router_503 / upstream_502 /
-    router_http_<status>.
-
-    The final reason is checked against the [a-z0-9_]{1,64} allowlist. A mismatch (uppercase, whitespace,
-    dict serialization fragments) is generalized to upstream_502 (502 family) / router_error (otherwise) and not exposed.
-    """
-    try:
-        data = json.loads(body.decode("utf-8", errors="replace"))
-    except Exception:
-        data = {}
-    err = data.get("error", {}) if isinstance(data, dict) else {}
-    reason: str | None = None
-    if isinstance(err, dict):
-        raw = err.get("type") or err.get("error_reason")
-        if raw:
-            reason = str(raw)
-    if reason is None and isinstance(data, dict) and data.get("error_reason"):
-        reason = str(data["error_reason"])
-    if reason is None:
-        if status_code == 502:
-            reason = "upstream_502"
-        elif status_code == 503:
-            reason = "router_503"
-        else:
-            reason = f"router_http_{status_code}"
-    if _REASON_ALLOWED.fullmatch(reason):
-        return reason
-    return "upstream_502" if status_code == 502 else "router_error"
-
-
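-def _rewrite_sse_line(line: bytes, mode: str) -> bytes:
-    """Sanitize one SSE line: replace model in the data: JSON with the mode vocabulary and drop usage.
-
-    Measured on fixtures: the model field of 27B chunks exposes an absolute MacBook filesystem path
-    ("/Users/.../mlx-models/Qwen3.6-27B-8bit"), which conflicts with the surface grammar rule
-    "do not expose model or machine names", so each line is rewritten. usage (machine telemetry such as
-    tps/peak_memory) is removed too. [DONE], non-data lines (including blank lines), and lines that fail
-    to parse pass through unchanged (defensive), preserving SSE framing (data: line + blank line).
-    """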
-    if not line.startswith(b"data: "):
-        return line
-    payload = line[len(b"data: "):]
-    if payload.strip() == b"[DONE]":
-        return line
-    try:
-        obj = json.loads(payload)
-    except Exception:
-        return line
-    if not isinstance(obj, dict):
-        return line
-    obj["model"] = mode
-    obj.pop("usage", None)
-    return b"data: " + json.dumps(obj, ensure_ascii=False).encode("utf-8")


 class EidEgressBlocked(RuntimeError):
@@ -147,91 +39,3 @@ class EidAIClient(AIClient):
        if "anthropic.com" in endpoint:
            raise EidEgressBlocked(f"이드: 외부 endpoint 차단 ({endpoint}). 내부 inference 만.")
        return await super()._request(model_config, prompt, system=system)
-
-    async def call_stream(
-        self, mode: str, messages: list[dict], system: str
-    ) -> AsyncIterator[bytes]:
-        """Eid chat SSE stream: line-by-line relay of router /v1/chat/completions with stream=true (D-5).
-
-        mode     : "daily" | "deep", the closed _CHAT_ALIAS mapping. Unknown mode = EidEgressBlocked
-                   (Eid LLM calls are sealed in this one class, invariant #5).
-        messages : list of user/assistant turns (system role forbidden; injected only via the system argument).
-        system   : the combined compose("eid_chat", ...) output. Inserted at the front of messages as the system role.
-
-        Failure before the stream starts (connection failure, 5xx response) = BackendUnavailable (same reason
-        vocabulary as ask). router 400 = alias drift in the closed mapping, a code bug → ValueError fail-loud
-        (mirrors the RouterBackend._post convention). After the stream starts, bytes are line-buffered and
-        _rewrite_sse_line only replaces model (mode vocabulary) and removes usage; framing is preserved.
-        On cancel/disconnect, AsyncExitStack cleans up response and client (guarantees upstream close).
-
-        daily and deep both use mac-mini-default (2026-06-11 MacBook retired), so under the permanent
-        single-inference rule for Mac mini MLX (llm_gate docstring: "gate acquisition required, no exceptions")
-        streaming runs inside acquire_mlx_gate(FOREGROUND). The gate condition is alias-based, so
-        deep is covered automatically (the old ungated path was an exception from the separate MacBook endpoint era).
-
-        The whole relay (upstream entry to end) runs under the asyncio.timeout(_STREAM_DEADLINE_S) wall-clock
-        deadline, honoring the llm_gate contract "timeout goes inside the gate" (not applied to the gate wait).
-        On overrun it converges to BackendUnavailable(alias, "stream_deadline_exceeded").
-        """
-        alias = _CHAT_ALIAS.get(mode)
-        if alias is None:
-            raise EidEgressBlocked(
-                f"이드: 미지 chat mode {mode!r} — 닫힌 매핑(daily/deep) 외 호출 차단."
-            )
-        router_url = _router_url()
-        if "anthropic.com" in router_url:
-            # Mirrors the existing _request pattern: guards against external egress if the router URL is miswired (double assurance)
-            raise EidEgressBlocked(f"이드: 외부 endpoint 차단 ({router_url}). 내부 router 만.")
-        url = f"{router_url.rstrip('/')}/v1/chat/completions"
-        payload = {
-            "model": alias,
-            "messages": [{"role": "system", "content": system}] + messages,
-            "stream": True,
-            "max_tokens": 2048,
-            "temperature": 0.4,
-        }
-        async with AsyncExitStack() as stack:
-            if alias == MAC_MINI_DEFAULT:
-                await stack.enter_async_context(acquire_mlx_gate(Priority.FOREGROUND))
-            client = await stack.enter_async_context(httpx.AsyncClient(timeout=_STREAM_TIMEOUT))
-            try:
-                # wall-clock deadline: entered *after* acquiring the gate (llm_gate: "timeout goes inside the gate")
-                async with asyncio.timeout(_STREAM_DEADLINE_S):
-                    try:
-                        resp = await stack.enter_async_context(
-                            client.stream("POST", url, json=payload)
-                        )
-                    except _TRANSPORT_ERRORS as exc:
-                        # connection-level failure before the stream starts; reason vocabulary matches RouterBackend (router_*)
-                        raise BackendUnavailable(alias, f"router_{type(exc).__name__}") from exc
-                    if resp.status_code == 400:
-                        # 400 under the closed mapping = alias drift, a code bug (mirrors RouterBackend._post);
-                        # not BackendUnavailable (temporarily unavailable) → fail-loud
-                        body = await resp.aread()
-                        try:
-                            data = json.loads(body.decode("utf-8", errors="replace"))
-                        except Exception:
-                            data = {}
-                        raise ValueError(f"router rejected alias={alias!r} body={data!r}")
-                    if resp.status_code >= 400:
-                        body = await resp.aread()
-                        raise BackendUnavailable(
-                            alias, _stream_error_reason(resp.status_code, body)
-                        )
-                    buf = b""
-                    try:
-                        async for chunk in resp.aiter_bytes():
-                            buf += chunk
-                            # line buffering: split on b"\n" across chunk boundaries, keep the remainder in the buffer
-                            while (nl := buf.find(b"\n")) != -1:
-                                line, buf = buf[:nl], buf[nl + 1:]
-                                yield _rewrite_sse_line(line, mode) + b"\n"
-                    except _TRANSPORT_ERRORS as exc:
-                        # interrupted after start: chunks already sent stay sent. Converge to a typed exception (the caller disconnects and cleans up).
-                        raise BackendUnavailable(alias, f"router_{type(exc).__name__}") from exc
-                    if buf:
-                        # flush the remainder at end of stream (a last line without a newline; do not add a \n that was not in the original)
-                        yield _rewrite_sse_line(buf, mode)
-            except TimeoutError as exc:
-                # asyncio.timeout exceeded: prevents holding the gate forever; converge to a typed exception
-                raise BackendUnavailable(alias, "stream_deadline_exceeded") from exc
@@ -50,8 +50,6 @@ _ROUTE: dict[str, dict] = {
    "react_ask":                  {"overlay": None, "variant": "full"},
    "study_subject_note":         {"overlay": None, "variant": "full"},
    "study_question_explanation": {"overlay": None, "variant": "full"},
-    # Eid chat surface (D-1 /api/eid/chat): free prose (base), persona ON (invariant #3)
-    "eid_chat":                   {"overlay": None, "variant": "full"},
    # Future active eid surfaces: feature overlays (call wiring lands in W3+)
    "study_diagnosis":            {"overlay": "study",    "variant": "full"},
    "document_brief":             {"overlay": "document", "variant": "full"},
@@ -115,17 +113,6 @@ def is_composed_surface(surface: str) -> bool:
    return surface in _ROUTE


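-def rules_present() -> bool:
-    """Whether rules.md exists: input to the chat surface's (D-6) fail-closed decision.
-
-    The existing _rules() degraded-banner convention (other surfaces proceed, fail-loud) is left as is.
-    This only provides the "refuse to proceed" decision; enforcement is the caller's (/api/eid/chat) job.
-    It does not use the lru_cache'd _read and stats the file directly on every call: the D-6 gate must be a
-    live decision (with a frozen cache, a missing or restored rules.md would never be reflected).
-    """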
-    return (_SUBSTRATE_DIR / "rules.md").is_file()
-
-
 def compose(surface: str, task: str, *, variant: str | None = None,
            budget_chars: int | None = None) -> str:
    """Compose persona → rules → overlay → task into a single system string.
@@ -9,8 +9,6 @@ from sqlalchemy import func, select, text
 from api.audio import router as audio_router
 from api.internal_study import router as internal_study_router
 from api.internal_worker import router as internal_worker_router
-from api.published import router as published_router
-from api.ingest_study import router as ingest_study_router
 from api.auth import router as auth_router
 from api.briefing import router as briefing_router
 from api.config import router as config_router
@@ -19,12 +17,10 @@ from api.digest import router as digest_router
 from api.document_notes import router as document_notes_router
 from api.document_reads import router as document_reads_router
 from api.documents import router as documents_router
-from api.eid_chat import router as eid_chat_router
 from api.events import router as events_router
 from api.library import router as library_router
 from api.memos import router as memos_router
 from api.news import router as news_router
-from api.queue_overview import router as queue_overview_router
 from api.search import router as search_router
 from api.setup import router as setup_router
 from api.study_question_progress import router as study_question_progress_router
@@ -52,27 +48,16 @@ async def lifespan(app: FastAPI):
    from services.search.query_analyzer import prewarm_analyzer
    from workers.briefing_worker import run as morning_briefing_run
    from workers.daily_digest import run as daily_digest_run
-    from workers.dedup_reconcile import run as dedup_reconcile_run
-    from workers.document_purge_sweep import run as purge_sweep_run
    from workers.digest_worker import run as global_digest_run
    from workers.file_watcher import watch_inbox
+    from workers.law_monitor import run as law_monitor_run
    from workers.mailplus_archive import run as mailplus_run
-    from workers.statute_collector import run as statute_run
    from workers.news_collector import run as news_collector_run
-    from workers.arxiv_collector import run as arxiv_collector_run
-    from workers.openalex_collector import run as openalex_collector_run
-    from workers.paper_doi_reconcile import run as paper_doi_reconcile_run
-    from workers.fulltext_worker import reconcile_unresolved as fulltext_reconcile_run
-    from workers.kosha_collector import run as kosha_collector_run
-    from workers.csb_collector import run as csb_collector_run
-    from workers.api_standards_collector import run as api_standards_run
-    from workers.ccps_collector import run as ccps_collector_run
-    from workers.queue_consumer import consume_queue, consume_fast_queue, consume_markdown_queue, consume_deep_queue
+    from workers.queue_consumer import consume_queue, consume_markdown_queue
    from workers.study_queue_consumer import consume_study_queue
    from workers.study_session_queue_consumer import consume_study_session_queue
    from workers.study_memo_card_jobs_consumer import consume_study_memo_card_queue
    from workers.study_card_enqueue import run as study_card_enqueue_run
-    from workers.study_publish_worker import consume_publish_outbox
    from workers.study_reminder import run as study_reminder_run
    from workers.study_weakness import run as study_weakness_run
    from workers.study_question_embed_worker import (
@@ -81,19 +66,10 @@ async def lifespan(app: FastAPI):
    )
    from workers.tier_backfill import run as tier_backfill_run
    from workers.upload_cleanup import cleanup_orphan_uploads
-    from workers.memo_draft_worker import run as memo_draft_run
-    from workers.auto_review_worker import run as auto_review_run

    # Startup: verify the DB connection
    await init_db()

-    # 2026-06-20: fail loud on an empty JWT_SECRET. Blocks the silent auth collapse where, with credentials.env
-    # unloaded or missing, the app booted and signed every token with an empty key (totp_secret is per-user, so unguarded).
-    if not settings.jwt_secret:
-        raise RuntimeError(
-            "JWT_SECRET 미설정 — 빈 키 서명 방지. credentials.env / 환경변수 확인."
-        )
-
    # Verify the NAS mount (prevents writing to an empty local directory when NFS is not mounted)
    from pathlib import Path
    nas_check = Path(settings.nas_mount_path) / "PKM"
@@ -104,29 +80,15 @@ async def lifespan(app: FastAPI):
        )

    # APScheduler: background jobs
-    scheduler = AsyncIOScheduler(
-        timezone="Asia/Seoul",
-        # 2026-06-20 H4: the default misfire_grace_time=1s silently skips a 1-minute consumer tick as
-        # "run time missed" (no error, zero failed rows) if the single asyncio loop is congested for just 1s. Relaxed to 45s + coalesce.
-        job_defaults={"misfire_grace_time": 45, "coalesce": True, "max_instances": 1},
-    )
+    scheduler = AsyncIOScheduler(timezone="Asia/Seoul")
    # Always running
    scheduler.add_job(consume_queue, "interval", minutes=1, id="queue_consumer")
    # PR-DocSrv-Markdown-Consumer-Split-1: dedicated consumer for markdown (marker).
    # Removes the problem where large-PDF split conversion (tens of minutes) occupied the main
    # consume_queue and stalled the whole pipeline. max_instances=1 (default) prevents two concurrent marker conversions.
    scheduler.add_job(consume_markdown_queue, "interval", minutes=1, id="markdown_consumer")
-    # 2026-06-12 fast-consumer split: separates embed/chunk (<1s each) from the LLM cycle.
-    # Lifts the structural cap where classify (~190s×3) occupied the cycle and starved vector loading (markdown precedent).
-    scheduler.add_job(consume_fast_queue, "interval", minutes=1, id="fast_queue_consumer")
-    # 2026-06-15 deep-consumer split: separates deep_summary (70~300s) from the main loop (markdown/fast precedent).
-    scheduler.add_job(consume_deep_queue, "interval", minutes=1, id="deep_queue_consumer")
    scheduler.add_job(watch_inbox, "interval", minutes=5, id="file_watcher")
    scheduler.add_job(cleanup_orphan_uploads, "interval", minutes=10, id="upload_cleanup")
-    # P2: 26B drafting for memos promoted to documents (needs_draft marker → md_content). These are 26B calls, so small batches at a 2-minute interval.
-    scheduler.add_job(memo_draft_run, "interval", minutes=2, id="memo_draft", max_instances=1)
-    # Auto-review of the review queue: auto-approve high confidence (ai_confidence>=0.9), leave low confidence for manual review. Pure DB (no LLM).
-    scheduler.add_job(auto_review_run, "interval", minutes=3, id="auto_review", max_instances=1)
    # PR-4: automatic embedding of study_questions (processes rows with status='none/failed/stale' in batches of 10).
    # No separate queue table; the status itself is the queue. Backfill is handled naturally as the cron picks up 'none' rows.
    scheduler.add_job(study_q_embed_run, "interval", minutes=1, id="study_q_embed")
@@ -143,16 +105,11 @@ async def lifespan(app: FastAPI):
    # Isolated from the existing study queue by a separate table and consumer. Gated by settings.study_card_extract_enabled.
    scheduler.add_job(consume_study_memo_card_queue, "interval", minutes=1, id="study_memo_card_consumer")
    scheduler.add_job(study_card_enqueue_run, "interval", minutes=1, id="study_card_enqueue")
-    # Publishing layer (docsrv-viewer-publish): drain publish_outbox → assign a published rev.
-    # If study_publish_enabled=false (default), the worker is an internal no-op. Single writer (pg_advisory_xact_lock), max_instances=1.
-    scheduler.add_job(consume_publish_outbox, "interval", minutes=1, id="publish_outbox_consumer", max_instances=1)
    # PR-B legacy tier backfill: invoked every 30 minutes but only enqueues during KST 00:00~06:00.
    # 25 at a time in safety > law > manual priority. 6720 legacy docs → ~150 per night → about 45 days to drain.
    scheduler.add_job(tier_backfill_run, "interval", minutes=30, id="tier_backfill")
    # Daily schedule (KST)
-    # statute_collector = replaces the old law_monitor (safety-library-1 B-1 PR②): poll→ingest→
-    # lifecycle jobs (version-series promotion, supersede, legacy sweep, repeal) as a whole (R8-B1).
-    scheduler.add_job(statute_run, CronTrigger(hour=7, timezone=KST), id="statute_collector")
+    scheduler.add_job(law_monitor_run, CronTrigger(hour=7, timezone=KST), id="law_monitor")
    scheduler.add_job(mailplus_run, CronTrigger(hour=7, timezone=KST), id="mailplus_morning")
    scheduler.add_job(mailplus_run, CronTrigger(hour=18, timezone=KST), id="mailplus_evening")
    scheduler.add_job(daily_digest_run, CronTrigger(hour=20, timezone=KST), id="daily_digest")
@@ -162,33 +119,7 @@ async def lifespan(app: FastAPI):
    scheduler.add_job(study_reminder_run, CronTrigger(hour="9,13,19", timezone=KST), id="study_reminder")
    # Eid W3-2: derived weakness snapshot for topics under study (nightly 04:30 KST, no LLM). Source for the study_diagnosis surface.
    scheduler.add_job(study_weakness_run, CronTrigger(hour=4, minute=30, timezone=KST), id="study_weakness")
-    scheduler.add_job(news_collector_run, CronTrigger(hour="0,6,12,18", timezone=KST), id="news_collector")
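-    # crawl-24x7 A-2 safety net: for documents whose fulltext fetch failed permanently (3 attempts exhausted),
-    # enqueue follow-up work based on the RSS summary (prevents silent skips piling up). 03:40 = non-conflicting slot right after dedup_reconcile (03:30).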
-    scheduler.add_job(fulltext_reconcile_run, CronTrigger(hour=3, minute=40, timezone=KST), id="fulltext_reconcile")
-    # plan ds-s1-backend-1 B-4: nightly absolute recompute of the dedup columns (duplicate_of/duplicate_count).
-    # Cleans residual drift from soft deletes (idempotent; no-op without drift). cron 03:30 (no conflict with other jobs).
-    scheduler.add_job(dedup_reconcile_run, CronTrigger(hour=3, minute=30, timezone=KST), id="dedup_reconcile")
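-    # R7: physically delete NAS originals of documents with a delete_file=true purge request after the grace period (30 days), with audit.
-    # Keyed on the purge_requested_at marker (plain hides are preserved). 03:20 = slot that does not conflict with other early-morning jobs.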
-    scheduler.add_job(purge_sweep_run, CronTrigger(hour=3, minute=20, timezone=KST), id="purge_sweep")
-    # B-3 PR4: stamp arXiv DataCite DOIs on legacy paper rows (blocks re-ingestion). keyless, in-DB, zero enqueues.
-    # Separate worker and non-conflicting slot from dedup_reconcile (03:30) and fulltext_reconcile (03:40).
-    scheduler.add_job(paper_doi_reconcile_run, CronTrigger(hour=3, minute=50, timezone=KST), id="paper_doi_reconcile")
-    # crawl-24x7 C-2: KOSHA incident-case diff + incremental GUIDE backfill (daily, slot that does not conflict with the early-morning jobs).
-    scheduler.add_job(kosha_collector_run, CronTrigger(hour=6, minute=40, timezone=KST), id="kosha_collector")
-    # Cycle 3 C-2 remainder: CSB sitemap lastmod diff (weekly on Monday, cap 40 + incremental watermark backfill).
-    scheduler.add_job(csb_collector_run, CronTrigger(day_of_week="mon", hour=6, minute=50, timezone=KST), id="csb_collector")
-    # Cycle 3 C-4: API standards announcement-list diff (monthly; announcements arrive at a pace of 1~2 per month).
-    scheduler.add_job(api_standards_run, CronTrigger(day=5, hour=7, minute=5, timezone=KST), id="api_standards_collector")
-    # Cycle 3 C-2 remainder: CCPS Beacon monthly PDF (via anonymous playwright; a WAF block is surfaced through health).
-    scheduler.add_job(ccps_collector_run, CronTrigger(day=5, hour=7, minute=20, timezone=KST), id="ccps_collector")
-    # B-3 PR2: arXiv keyword-filter collector (daily 07:30 KST, the free slot right after statute at 07:00).
-    # signal-only abstract indexing; the per-run cap protects the embed queue. keyless.
-    scheduler.add_job(arxiv_collector_run, CronTrigger(hour=7, minute=30, timezone=KST), id="arxiv_collector")
-    # B-3 PR3: OpenAlex backbone collector (daily 07:45 KST). scaffold-first (explicit skip when the key is absent),
-    # signal-only abstract indexing, per-run cap + cursor watermark. key=OPENALEX_API_KEY (credentials.env).
-    scheduler.add_job(openalex_collector_run, CronTrigger(hour=7, minute=45, timezone=KST), id="openalex_collector")
+    scheduler.add_job(news_collector_run, "interval", hours=6, id="news_collector")
    scheduler.start()

    # Phase 2.1 (async architecture): QueryAnalyzer prewarm.
@@ -223,23 +154,17 @@ app.include_router(documents_router, prefix="/api/documents", tags=["documents"]
 app.include_router(document_reads_router, prefix="/api/documents", tags=["document-reads"])
 app.include_router(document_notes_router, prefix="/api/documents", tags=["document-notes"])
 app.include_router(search_router, prefix="/api/search", tags=["search"])
-# Eid chat surface (D-1): POST /api/eid/chat. SSE streaming, routed through the sealed EidAIClient.call_stream.
-app.include_router(eid_chat_router, prefix="/api/eid", tags=["eid-chat"])

 app.include_router(memos_router, prefix="/api/memos", tags=["memos"])
 app.include_router(events_router, prefix="/api/events", tags=["events"])
 app.include_router(dashboard_router, prefix="/api/dashboard", tags=["dashboard"])
 app.include_router(library_router, prefix="/api/library", tags=["library"])
 app.include_router(news_router, prefix="/api/news", tags=["news"])
-# Processing machine board (plan ds-processing-ui-6an): GET /api/queue/overview
-app.include_router(queue_overview_router, prefix="/api/queue", tags=["queue"])
 app.include_router(digest_router, prefix="/api/digest", tags=["digest"])
 app.include_router(briefing_router, prefix="/api/briefing", tags=["briefing"])
 app.include_router(audio_router, prefix="/api/audio", tags=["audio"])
 app.include_router(internal_study_router, prefix="/internal/study", tags=["internal-study"])
 app.include_router(internal_worker_router, prefix="/internal/worker", tags=["internal-worker"])
-app.include_router(published_router, prefix="/published", tags=["published"])
-app.include_router(ingest_study_router, prefix="/ingest/study", tags=["ingest-study"])
 app.include_router(video_router, prefix="/api/video", tags=["video"])
 app.include_router(study_sessions_router, prefix="/api/study-sessions", tags=["study-sessions"])
 app.include_router(study_topics_router, prefix="/api/study-topics", tags=["study-topics"])
@@ -260,27 +185,21 @@ SETUP_BYPASS_PREFIXES = (
    "/api/setup", "/api/config", "/setup", "/health", "/docs", "/openapi.json", "/redoc",
 )

-# R10: setup completion (a user exists) is monotonic: once confirmed, it holds forever. Switched from a per-request
-# COUNT query to a cached flag (removes the per-request cost of every post-setup request running a users COUNT).
-_setup_complete = False
-

@app.middleware("http")
 async def setup_redirect_middleware(request: Request, call_next):
-    global _setup_complete  # both the read and the assignment in this function refer to the module global (avoids UnboundLocalError)
    path = request.url.path
-    # Pass through immediately if setup is complete or the path is bypassed (no DB query)
-    if _setup_complete or any(path.startswith(p) for p in SETUP_BYPASS_PREFIXES):
+    # Bypass paths always pass through
+    if any(path.startswith(p) for p in SETUP_BYPASS_PREFIXES):
        return await call_next(request)

-    # Check whether a user exists (one-off before setup completes; once confirmed, set the flag and skip forever)
+    # Check whether a user exists
    try:
        async with async_session() as session:
            result = await session.execute(select(func.count(User.id)))
            user_count = result.scalar()
        if user_count == 0:
            return RedirectResponse(url="/setup")
-        _setup_complete = True
    except Exception:
        pass  # pass through on DB connection failure (visible via health)

@@ -14,11 +14,6 @@ from sqlalchemy.orm import Mapped, mapped_column

 from core.database import Base

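-# Resolving FK("users.id") needs the users table metadata. The fastapi app imports every model anyway,
-# but standalone CLI runs (queue_drain, etc.) pull in only this module and failed on INSERT with
-# "could not find table 'users'" (observed in the 2026-06-12 drain log). Hence the explicit import.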
-from models.user import User  # noqa: F401
-

 class AnalyzeEvent(Base):
    __tablename__ = "analyze_events"
@@ -1,9 +1,9 @@
 """ORM for the documents table"""

-from datetime import date, datetime
+from datetime import datetime

 from pgvector.sqlalchemy import Vector
-from sqlalchemy import BigInteger, Boolean, Date, DateTime, Enum, ForeignKey, Integer, String, Text
+from sqlalchemy import BigInteger, Boolean, DateTime, Enum, Integer, String, Text
 from sqlalchemy.dialects.postgresql import JSONB
 from sqlalchemy.orm import Mapped, mapped_column

@@ -28,27 +28,6 @@ class Document(Base):
    )
    import_source: Mapped[str | None] = mapped_column(Text)

-    # Layer 1: original name + duplicate check (S1-ADD, migration 287)
-    # original_filename = original upload filename (for the download label). file_path is renamed with _N on collision.
-    #   cf. distinct in meaning from original_format (for ODF conversion) / original_path, original_hash (007 legacy, dead).
-    # duplicate_of = canonical doc id (NULL if this row is itself canonical). FK ON DELETE SET NULL.
-    # duplicate_count = on the canonical row, the number of copies judged identical, excluding itself (group_size-1). Updated by upload/backfill.
-    original_filename: Mapped[str | None] = mapped_column(Text)
-    duplicate_of: Mapped[int | None] = mapped_column(
-        BigInteger, ForeignKey("documents.id", ondelete="SET NULL")
-    )
-    duplicate_count: Mapped[int] = mapped_column(
-        Integer, nullable=False, default=0, server_default="0"
-    )
-
-    # G2 pre-segmentation (migration 362): bundle PDF → split into N children.
-    # presegment_role: NULL=ordinary single document / 'parent'=bundle original (no extract/embed of its own) /
-    #   'child'=logical sub-document (shares the parent's file_path + bundle_page_start/end range, 1-based inclusive).
-    # The parent-child relation itself lives in document_lineage (relation_type='segmented_from').
-    bundle_page_start: Mapped[int | None] = mapped_column(Integer)
-    bundle_page_end: Mapped[int | None] = mapped_column(Integer)
-    presegment_role: Mapped[str | None] = mapped_column(Text)
-
    # Layer 2: text extraction
    extracted_text: Mapped[str | None] = mapped_column(Text)
    extracted_at: Mapped[datetime | None] = mapped_column(DateTime(timezone=True))
@@ -60,8 +39,7 @@ class Document(Base):

    # Layer 2: AI processing
    ai_summary: Mapped[str | None] = mapped_column(Text)
-    # R11a: annotation corrected dict→list (lists are what is actually stored); shared mutable default=[] → callable default=list.
-    ai_tags: Mapped[list | None] = mapped_column(JSONB, default=list)
+    ai_tags: Mapped[dict | None] = mapped_column(JSONB, default=[])
    ai_domain: Mapped[str | None] = mapped_column(String(100))
    ai_sub_group: Mapped[str | None] = mapped_column(String(100))
    ai_model_version: Mapped[str | None] = mapped_column(String(50))
@@ -88,7 +66,7 @@ class Document(Base):
    user_note: Mapped[str | None] = mapped_column(Text)

    # User tags (separate from ai_tags; parsed from #tags or entered manually)
-    user_tags: Mapped[list | None] = mapped_column(JSONB, default=list)  # R11a: removes the shared mutable default
+    user_tags: Mapped[list | None] = mapped_column(JSONB, default=[])

    # Pinned
    pinned: Mapped[bool] = mapped_column(Boolean, default=False)
@@ -114,9 +92,6 @@ class Document(Base):
    # Approval/deletion
    review_status: Mapped[str | None] = mapped_column(String(20), default="pending")
    deleted_at: Mapped[datetime | None] = mapped_column(DateTime(timezone=True))
-    # Marker for an explicit delete_file=true deletion request (R7). The retention sweep (document_purge_sweep)
-    # physically deletes the NAS original after the grace period. Separate from deleted_at (plain hide, file preserved).
-    purge_requested_at: Mapped[datetime | None] = mapped_column(DateTime(timezone=True))

    # External edit URL
    edit_url: Mapped[str | None] = mapped_column(Text)
@@ -130,7 +105,7 @@ class Document(Base):
    source_channel: Mapped[str | None] = mapped_column(
        Enum("law_monitor", "devonagent", "email", "web_clip",
             "tksafety", "inbox_route", "manual", "drive_sync", "news", "memo",
-             "voice", "hermes", "crawl",
+             "voice", "hermes",
             name="source_channel")
    )
    # channel/user/message_id/timestamp metadata from external channels (Hermes Discord, etc.).
@@ -158,16 +133,6 @@ class Document(Base):
    # category / user_tags are applied only on /accept-suggestion approval (no automatic transition)
    ai_suggestion: Mapped[dict | None] = mapped_column(JSONB)

-    # === Safety library classification axes (plan safety-library-1, migrations 340~345) ===
-    # Material type: law/paper/book/incident/manual/standard/guide (TEXT+CHECK, not an enum).
-    # Assigned deterministically at collector ingest time (many classify-skip paths; must not depend on classify_worker).
-    # Must not be used as an AI routing (subject_domain) match key (axis separation; same invariant as category).
-    material_type: Mapped[str | None] = mapped_column(Text)
-    # Jurisdiction: KR/US/EU/JP/GB/INT. For law, a CHECK structurally enforces jurisdiction NOT NULL (migration 344).
-    jurisdiction: Mapped[str | None] = mapped_column(Text)
-    # Representative date per type: law=COALESCE(effective date, promulgation date) / paper=publication date / incident=occurrence date
-    published_date: Mapped[date | None] = mapped_column(Date)
-
    # PR-B B-1: split outputs of summary_triage (4B, always on) / summary_deep (26B, escalation)
    ai_tldr: Mapped[str | None] = mapped_column(Text)                          # TL;DR, ≤60 characters
    ai_bullets: Mapped[list | None] = mapped_column(JSONB)                     # 3~5 key bullets
@@ -1,31 +0,0 @@
-"""ORM for the document_lineage table: history of document derivation relations (migration 217).
-
-G2 pre-segmentation uses it with relation_type='segmented_from' (bundle → child) (migration 363).
-History-table FKs are ON DELETE RESTRICT (parent hard delete blocked, only soft delete allowed).
-"""
-from datetime import datetime
-
-from sqlalchemy import BigInteger, ForeignKey, Text, func
-from sqlalchemy.dialects.postgresql import JSONB
-from sqlalchemy.orm import Mapped, mapped_column
-from sqlalchemy.types import TIMESTAMP
-
-from core.database import Base
-
-
-class DocumentLineage(Base):
-    __tablename__ = "document_lineage"
-
-    id: Mapped[int] = mapped_column(BigInteger, primary_key=True)
-    source_document_id: Mapped[int] = mapped_column(
-        BigInteger, ForeignKey("documents.id", ondelete="RESTRICT"), nullable=False
-    )
-    derived_document_id: Mapped[int] = mapped_column(
-        BigInteger, ForeignKey("documents.id", ondelete="RESTRICT"), nullable=False
-    )
-    relation_type: Mapped[str] = mapped_column(Text, nullable=False)
-    # 'metadata' is a reserved attribute in SQLAlchemy, so the Python attribute is meta and the DB column is metadata.
-    meta: Mapped[dict] = mapped_column(
-        "metadata", JSONB, nullable=False, default=dict, server_default="{}"
-    )
-    created_at: Mapped[datetime] = mapped_column(TIMESTAMP(timezone=True), server_default=func.now())
@@ -1,73 +0,0 @@
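-"""ORM for the legal_acts / legal_meta tables: statute registry (doubling as watchlist) + version satellite
-
-plan: safety-library-1 (migrations 346~347).
- legal_acts = the list of polling targets is itself the table (the statute counterpart of the news_sources pattern).
-  KOSHA GUIDE (non-statute) and KGS Code (watch-folder-only track) are out of scope.
- legal_meta = one row per version of a statute document (or per annex / interpretation), a 1:0..1 satellite of documents.
-  version_status transitions happen only in the daily job of statute_collector
-  (every version is loaded as pending → the job handles promotion, supersede, and repeal in one transaction).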
-"""
-
-from datetime import date, datetime
-
-from sqlalchemy import BigInteger, Boolean, Date, DateTime, ForeignKey, Text, UniqueConstraint
-from sqlalchemy.orm import Mapped, mapped_column
-
-from core.database import Base
-
-
-class LegalAct(Base):
-    __tablename__ = "legal_acts"
-
-    # Format: 'kr-law:{law ID}' / 'us-cfr:29-1910'. KGS is excluded from the seed (R3-M5).
-    family_id: Mapped[str] = mapped_column(Text, primary_key=True)
-    # Fixed adapter constant; never inferred from parse results (the core asserts right before load)
-    jurisdiction: Mapped[str] = mapped_column(Text, nullable=False)
-    # statute (act) / decree (enforcement decree) / rule (enforcement rule, ministerial ordinance) / admin_rule (notice, directive) / code (statutorily delegated detailed standard)
-    law_level: Mapped[str] = mapped_column(Text, nullable=False)
-    title: Mapped[str] = mapped_column(Text, nullable=False)
-    title_ko: Mapped[str | None] = mapped_column(Text)
-    # Hierarchy: act → enforcement decree → enforcement rule
-    parent_family_id: Mapped[str | None] = mapped_column(ForeignKey("legal_acts.family_id"))
-    # Source-native identifier: law ID / CFR part / CELEX / e-Gov law_id, etc.
-    native_id: Mapped[str] = mapped_column(Text, nullable=False)
-    # 'law.go.kr' / 'ecfr' / 'cellar' / 'egov_v2' / 'leg_gov_uk'
-    source_api: Mapped[str] = mapped_column(Text, nullable=False)
-    # All 26 seeds are true; 'priority' only affects ordering, not watch exclusion (R3-B1)
-    watch: Mapped[bool] = mapped_column(Boolean, nullable=False, default=True)
-    poll_cycle: Mapped[str] = mapped_column(Text, nullable=False, default="daily")
-    # Change-history polling watermark; persisted only after parse validation passes
-    watermark: Mapped[str | None] = mapped_column(Text)
-    # The adapter only marks repeal detection; the transition to repealed is done by the daily job (R3-M3)
-    repeal_detected_at: Mapped[datetime | None] = mapped_column(DateTime(timezone=True))
-    created_at: Mapped[datetime] = mapped_column(
-        DateTime(timezone=True), default=datetime.now
-    )
-    updated_at: Mapped[datetime] = mapped_column(
-        DateTime(timezone=True), default=datetime.now, onupdate=datetime.now
-    )
-
-
-class LegalMeta(Base):
-    __tablename__ = "legal_meta"
-    __table_args__ = (
-        # Structurally enforces version dedup; annex uses the composite version_key='MST|별표N' (R3-M4)
-        UniqueConstraint("family_id", "law_doc_kind", "version_key", name="uq_legal_meta_version"),
-    )
-
-    document_id: Mapped[int] = mapped_column(
-        BigInteger, ForeignKey("documents.id", ondelete="CASCADE"), primary_key=True
-    )
-    family_id: Mapped[str] = mapped_column(
-        ForeignKey("legal_acts.family_id"), nullable=False
-    )
-    # primary (main text) / annex (annexed tables, forms) / interpretation (interpretation cases)
-    law_doc_kind: Mapped[str] = mapped_column(Text, nullable=False, default="primary")
-    version_key: Mapped[str] = mapped_column(Text, nullable=False)
-    promulgation_date: Mapped[date | None] = mapped_column(Date)
-    effective_date: Mapped[date | None] = mapped_column(Date)
-    # pending → current → superseded / repealed. Transitions happen only in the daily job, based on KST.
-    version_status: Mapped[str] = mapped_column(Text, nullable=False, default="pending")
-    created_at: Mapped[datetime] = mapped_column(
-        DateTime(timezone=True), default=datetime.now
-    )
@@ -2,8 +2,7 @@

 from datetime import datetime

-from sqlalchemy import Boolean, DateTime, Enum, Integer, String, Text
-from sqlalchemy.dialects.postgresql import JSONB
+from sqlalchemy import Boolean, DateTime, String, Text
 from sqlalchemy.orm import Mapped, mapped_column

 from core.database import Base
@@ -24,41 +23,3 @@ class NewsSource(Base):
    created_at: Mapped[datetime] = mapped_column(
        DateTime(timezone=True), default=datetime.now
    )
-
-    # ── A-3 (plan crawl-24x7-1) registry expansion, migration 319 ──
-    # fetch_method: rss / rss+page / sitemap+page / page / api / signal-only
-    fetch_method: Mapped[str] = mapped_column(String(20), default="rss")
-    # fulltext_policy: none (current behavior) / page (fetch the article page, then 4-tier promotion) / feed-full (feed body is the full text)
-    fulltext_policy: Mapped[str] = mapped_column(String(20), default="none")
-    # NULL = public, value = subscription session key (B-3 Playwright adapter slot)
-    auth_profile: Mapped[str | None] = mapped_column(String(50))
-    # Per-source polling interval (NULL = global 6h cycle)
-    poll_interval_minutes: Mapped[int | None] = mapped_column(Integer)
-    # Conditional GET watermarks; stored and resent exactly as the server returned them (A-1)
-    etag: Mapped[str | None] = mapped_column(Text)
-    last_modified: Mapped[str | None] = mapped_column(Text)
-    # Content-hash change detection alongside, to guard against CDN ETag rotation (A-1)
-    feed_content_hash: Mapped[str | None] = mapped_column(String(64))
-    # Site-specific CSS selector for sources where extraction often fails (A-2)
-    selector_override: Mapped[dict | None] = mapped_column(JSONB)
-    # Parser special cases such as rdf / table-strip / gn-redirect / skip-video (B-5)
-    parser_quirk: Mapped[str | None] = mapped_column(String(30))
-    # Channel: 'news' (digest/briefing target) / 'crawl' (domain material, 0-5 (a)); migration 324.
-    # Propagated to documents.source_channel; the crawl channel is exempt from the 30-day embed/chunk gate.
-    # Reuses the same PG enum as documents (keep the value list in sync with the Document model).
-    source_channel: Mapped[str] = mapped_column(
-        Enum("law_monitor", "devonagent", "email", "web_clip",
-             "tksafety", "inbox_route", "manual", "drive_sync", "news", "memo",
-             "voice", "hermes", "crawl",
-             name="source_channel"),
-        default="news",
-    )
-
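-    # ── Safety library classification axes (plan safety-library-1 A-2, migrations 352~355) ──
-    # Default material type; propagated to documents.material_type at ingest time (NULL = not applicable).
-    # jurisdiction has no separate column and is propagated from country, except that paper is forced to NULL in code.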
-    material_type: Mapped[str | None] = mapped_column(Text)
-    # Injected into extract_meta.license: kogl/ogl/public_domain/proprietary/unknown.
-    # Undetermined = conservative (unknown + redistribute=false); relaxed once evidence is available.
-    license_scheme: Mapped[str | None] = mapped_column(Text)
-    license_redistribute: Mapped[bool | None] = mapped_column(Boolean)
@@ -1,60 +0,0 @@
-"""Publish-layer ORM (docsrv-viewer-publish): published projection + publish_outbox.
-
-No relationships = standalone tables, no effect on configure_mappers. Migrations 367~372.
-  published      = render-ready projection (kind-discriminated) that the viewer pulls via the read API (P0-2).
-  publish_outbox = INSERTed by the authoring/4-A transaction in the same tx; the publish worker drains it and assigns rev.
-
-Invariants (plan study-to-viewer-slice1):
-  pub_id opaque+stable = dedup key = progress key / rev = gapless in worker commit order (single writer via pg_advisory_lock)
-  / dedup on (payload_hash, deleted) / deletion = tombstone (deleted=true) / schema_version = envelope version.
-"""
-
-from __future__ import annotations
-
-from datetime import datetime
-
-from sqlalchemy import BigInteger, Boolean, DateTime, SmallInteger, String, Text
-from sqlalchemy.dialects.postgresql import JSONB
-from sqlalchemy.orm import Mapped, mapped_column
-
-from core.database import Base
-
-
-class Published(Base):
-    __tablename__ = "published"
-
-    id: Mapped[int] = mapped_column(BigInteger, primary_key=True)
-    kind: Mapped[str] = mapped_column(String(40), nullable=False)
-    source_id: Mapped[int] = mapped_column(BigInteger, nullable=False)
-    pub_id: Mapped[str] = mapped_column(Text, nullable=False)
-    payload: Mapped[dict] = mapped_column(JSONB, nullable=False)
-    payload_hash: Mapped[str] = mapped_column(Text, nullable=False)
-    schema_version: Mapped[int] = mapped_column(SmallInteger, nullable=False, default=1)
-    rev: Mapped[int] = mapped_column(BigInteger, nullable=False)
-    deleted: Mapped[bool] = mapped_column(Boolean, nullable=False, default=False)
-    created_at: Mapped[datetime] = mapped_column(
-        DateTime(timezone=True), default=datetime.now, nullable=False
-    )
-    updated_at: Mapped[datetime] = mapped_column(
-        DateTime(timezone=True), default=datetime.now, nullable=False
-    )
-
-    # UNIQUE(kind, pub_id)=mig368, UNIQUE(kind, source_id)=mig369, idx(rev)=mig370.
-
-
-class PublishOutbox(Base):
-    __tablename__ = "publish_outbox"
-
-    id: Mapped[int] = mapped_column(BigInteger, primary_key=True)
-    kind: Mapped[str] = mapped_column(String(40), nullable=False)
-    source_id: Mapped[int] = mapped_column(BigInteger, nullable=False)
-    payload: Mapped[dict] = mapped_column(JSONB, nullable=False)
-    payload_hash: Mapped[str] = mapped_column(Text, nullable=False)
-    schema_version: Mapped[int] = mapped_column(SmallInteger, nullable=False, default=1)
-    deleted: Mapped[bool] = mapped_column(Boolean, nullable=False, default=False)
-    created_at: Mapped[datetime] = mapped_column(
-        DateTime(timezone=True), default=datetime.now, nullable=False
-    )
-    processed_at: Mapped[datetime | None] = mapped_column(DateTime(timezone=True))
-
-    # Partial index for unprocessed rows: idx(id) WHERE processed_at IS NULL = mig372.
@@ -2,41 +2,14 @@

 from datetime import datetime

-from sqlalchemy import BigInteger, DateTime, Enum, ForeignKey, SmallInteger, Text, func, or_, text
+from sqlalchemy import BigInteger, DateTime, Enum, ForeignKey, SmallInteger, Text, text
 from sqlalchemy.dialects.postgresql import JSONB, insert as pg_insert
 from sqlalchemy.ext.asyncio import AsyncSession
 from sqlalchemy.orm import Mapped, mapped_column
-from sqlalchemy.types import TIMESTAMP

 from core.database import Base


-class StageDeferred(Exception):
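-    """Signal raised by a worker to declare 'cannot process right now; defer without data loss' (ds-macbook-offload-1).
-
-    Only for the MacBook (M5 Max) deep-slot path: raised on 503 (upstream_cold/editor_busy/warming), connection
-    failure, or a cut mid-generation (read-timeout, MacBook sleep). queue_consumer/queue_drain do not consume
-    attempts; they return the row to pending and record a payload.deferred_until backoff. Results are written only
-    after the call completes and parsing succeeds, so an interruption at any point leaves zero partial writes (sleep-safe invariant).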
-    """
-
-    def __init__(self, reason: str, retry_after_minutes: int = 30):
-        super().__init__(reason)
-        self.retry_after_minutes = retry_after_minutes
-
-
-def not_deferred_condition():
-    """Exclude from claim any row whose deferral backoff (payload.deferred_until, ISO string) is in the future.
-
-    No payload / no key = passes. Shared by the claim logic of queue_consumer and queue_drain.
-    """
-    deferred = ProcessingQueue.payload["deferred_until"].astext
-    return or_(
-        deferred.is_(None),
-        deferred.cast(TIMESTAMP(timezone=True)) <= func.now(),
-    )
-
-
 class ProcessingQueue(Base):
    __tablename__ = "processing_queue"

@@ -45,12 +18,10 @@ class ProcessingQueue(Base):
    stage: Mapped[str] = mapped_column(
        # 'stt' (audio): migration 150 / 'thumbnail' (video): enqueued by queue_consumer.
        # 'deep_summary' (PR-B B-1): enqueued by classify_worker on escalation.
-        # 'fulltext' (crawl-24x7 A-2): migration 321; promotes the body after fetching the article page.
-        # 'presegment' (G2): migration 364; splits a bundle PDF into N children before extract.
        # DB enum changes are handled by migrations, hence create_type=False.
        Enum(
-            "presegment", "extract", "classify", "summarize", "embed", "chunk", "preview",
-            "stt", "thumbnail", "deep_summary", "markdown", "fulltext",
+            "extract", "classify", "summarize", "embed", "chunk", "preview",
+            "stt", "thumbnail", "deep_summary", "markdown",
            name="process_stage",
            create_type=False,
        ),
@@ -1,44 +0,0 @@
-"""ORM for the source_health table (A-5, plan crawl-24x7-1)
-
-1:1 with news_sources. Per-source fetch success/failure records + circuit breaker state.
-Visibility foundation for preventing silent skips from piling up; read by the A-8 health panel.
-"""
-
-from datetime import datetime
-
-from sqlalchemy import BigInteger, Boolean, DateTime, ForeignKey, Integer, String, Text
-from sqlalchemy.orm import Mapped, mapped_column
-
-from core.database import Base
-
-
-class SourceHealth(Base):
-    __tablename__ = "source_health"
-
-    id: Mapped[int] = mapped_column(primary_key=True)
-    source_id: Mapped[int] = mapped_column(
-        Integer, ForeignKey("news_sources.id", ondelete="CASCADE"), nullable=False
-    )
-    consecutive_failures: Mapped[int] = mapped_column(Integer, default=0)
-    total_fetches: Mapped[int] = mapped_column(BigInteger, default=0)
-    total_failures: Mapped[int] = mapped_column(BigInteger, default=0)
-    last_success_at: Mapped[datetime | None] = mapped_column(DateTime(timezone=True))
-    last_error: Mapped[str | None] = mapped_column(Text)
-    last_error_at: Mapped[datetime | None] = mapped_column(DateTime(timezone=True))
-    last_fetch_items: Mapped[int | None] = mapped_column(Integer)
-    # Number of consecutive fetches returning 200 with 0 entries (304 / identical hash are not counted; feed-rot signal only)
-    empty_streak: Mapped[int] = mapped_column(Integer, default=0)
-    # closed (healthy) / open (consecutive failures → exponential backoff) / disabled (threshold exceeded, needs manual recovery)
-    circuit_state: Mapped[str] = mapped_column(String(10), default="closed")
-    circuit_opened_at: Mapped[datetime | None] = mapped_column(DateTime(timezone=True))
-    updated_at: Mapped[datetime] = mapped_column(
-        DateTime(timezone=True), default=datetime.now
-    )
-
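-    # ── B-3 subscription session state contract, migration 325 ──
-    # Flag with a single write path: the A-8 button only records it, and the adapter consumes it (manual half-open).
-    # Consumed before the open-skip branch (fixes the r5 pitfall; prevents a dead button).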
-    relogin_requested: Mapped[bool] = mapped_column(Boolean, default=False)
-    # Content-based probe result (no time-based expiry judgment; blocks silent corruption from paywall notices)
-    last_probe_at: Mapped[datetime | None] = mapped_column(DateTime(timezone=True))
-    last_probe_ok: Mapped[bool | None] = mapped_column(Boolean)
@@ -25,7 +25,6 @@ from sqlalchemy import (
    String,
    Text,
    func,
-    select,
    text,
    update,
 )
@@ -100,25 +99,13 @@ async def supersede_old_cards(
    *,
    source_question_id: int,
    keep_generated_at: datetime | None,
-) -> list[int]:
+) -> int:
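    """Mark cards of a 'different version' of the same question with deleted_at (retire).

    Call 'before' loading cards with a new source_generated_at, so that live old-version cards do not block the
    new extraction via the dedup PARTIAL UNIQUE (zero stale leftovers after a correction). Same-version cards are kept.
-    Returns: ids of cards that were 'being published' (needs_review=False) when retired; these are publish tombstone
-      targets (the caller enqueues them). Unreviewed (unpublished) retired cards need no tombstone and are excluded.
+    Returns: number of rows retired.
    """
-    # Capture published retire targets up front (before the update) to avoid spurious tombstones for unpublished cards.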
-    published_retired = (
-        await session.execute(
-            select(StudyMemoCard.id).where(
-                StudyMemoCard.source_question_id == source_question_id,
-                StudyMemoCard.deleted_at.is_(None),
-                StudyMemoCard.source_generated_at.is_distinct_from(keep_generated_at),
-                StudyMemoCard.needs_review.is_(False),
-            )
-        )
-    ).scalars().all()
    stmt = (
        update(StudyMemoCard)
        .where(
@@ -128,8 +115,8 @@ async def supersede_old_cards(
        )
        .values(deleted_at=func.now())
    )
-    await session.execute(stmt)
-    return list(published_retired)
+    result = await session.execute(stmt)
+    return result.rowcount or 0


 async def append_card(
@@ -229,24 +216,13 @@ async def flag_cards_for_source(
    *,
    source_question_id: int,
    reason: str,
-) -> list[int]:
+) -> int:
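    """When a source question is corrected or deleted, auto-mark its derived cards as needs_review (temporary flag).

    Final stale cleanup is the responsibility of the worker's supersede; this is an immediate flag for user visibility.
    reason: 'source_changed' | 'source_deleted'.
-    Returns: ids of cards that 'lost publish eligibility' through the flag (previously needs_review=False); these are
-      publish tombstone targets (the caller enqueues them). Cards already awaiting review (unpublished) are excluded.
+    Returns: number of rows marked.
    """
-    # Capture published cards up front (before the update); once flagged needs_review=True they lose publish eligibility.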
-    published_ids = (
-        await session.execute(
-            select(StudyMemoCard.id).where(
-                StudyMemoCard.source_question_id == source_question_id,
-                StudyMemoCard.deleted_at.is_(None),
-                StudyMemoCard.needs_review.is_(False),
-            )
-        )
-    ).scalars().all()
    stmt = (
        update(StudyMemoCard)
        .where(
@@ -255,5 +231,5 @@ async def flag_cards_for_source(
        )
        .values(needs_review=True, flagged_by=reason, flagged_at=func.now())
    )
-    await session.execute(stmt)
-    return list(published_ids)
+    result = await session.execute(stmt)
+    return result.rowcount or 0
@@ -7,7 +7,7 @@ PR-2 가드레일:
  - When correct_choice changes, existing attempt.is_correct is not recomputed (a record is the fact as of that moment).
 """

-from datetime import datetime, timezone
+from datetime import datetime

 from pgvector.sqlalchemy import Vector
 from sqlalchemy import BigInteger, Boolean, DateTime, ForeignKey, Integer, SmallInteger, String, Text
@@ -128,9 +128,7 @@ class StudyQuestionAttempt(Base):
    # PR-9: recommended outcome values (correct/wrong/unsure). No strict enum is used.
    outcome: Mapped[str] = mapped_column(String(20), nullable=False)
    answered_at: Mapped[datetime] = mapped_column(
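-        # Explicitly TZ-aware (R8): naive datetime.now() depends on the container TZ. The current container is UTC, so
-        # values are identical (no backfill needed), but this removes a latent dependency that would skew by 9 hours if the container TZ changed.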
-        DateTime(timezone=True), default=lambda: datetime.now(timezone.utc), nullable=False
+        DateTime(timezone=True), default=datetime.now, nullable=False
    )
    # PR-10: which quiz session the attempt belongs to (NULL = entered directly outside a session, or the session was deleted).
    quiz_session_id: Mapped[int | None] = mapped_column(
@@ -50,10 +50,6 @@ class StudyQuizSession(Base):
    chronic_remaining_count: Mapped[int] = mapped_column(Integer, nullable=False, default=0)

    finished_at: Mapped[datetime | None] = mapped_column(DateTime(timezone=True))
-    # study-to-viewer P2: idempotency/provenance for viewer ingest. Live session = finalized_at and client_session_uuid NULL, source='live'.
-    finalized_at: Mapped[datetime | None] = mapped_column(DateTime(timezone=True))  # idempotency marker (mig 373)
-    client_session_uuid: Mapped[str | None] = mapped_column(String(64))  # viewer session UUID (mig 374, uq mig376)
-    source: Mapped[str] = mapped_column(String(20), nullable=False, default="live")  # live|viewer(mig 375)
    created_at: Mapped[datetime] = mapped_column(
        DateTime(timezone=True), default=datetime.now, nullable=False
    )
@@ -1,41 +0,0 @@
-You are a document-boundary detector. Output ONLY JSON {is_bundle, segments:[{start_page,end_page,title}]}.
-
-You are given a single PDF that may be a "bundle" — several independent logical documents
-concatenated into one file (for example: multiple laws, multiple reports, or multiple papers
-scanned together). Your job is to decide whether it is a bundle and, if so, where each logical
-document starts and ends.
-
-You receive only a compact sample per page: the page number and the first line / heading of that
-page (text may be truncated). Use these heading/first-line signals to detect where a new logical
-document begins (a new title page, a new cover, a clearly new document title, a restart of
-numbering, etc.). You do NOT receive the full text.
-
-Output rules:
- Respond with STRICT JSON only. No prose, no markdown, no code fence.
- Schema:
-  {
-    "is_bundle": true | false,
-    "segments": [
-      {"start_page": <int>, "end_page": <int>, "title": "<string or null>"}
-    ]
-  }
- Page numbers are 1-based and INCLUSIVE. start_page=1 is the first page; end_page equals the last
-  page of that segment.
- Segments MUST fully cover every page with NO gaps and NO overlaps:
-  - the first segment MUST start at page 1,
-  - each next segment MUST start exactly one page after the previous segment's end_page,
-  - the last segment MUST end at the final page (page_count).
- Order segments by start_page ascending.
- title = a short title for that logical document if you can infer one from its first page,
-  otherwise null.
-
-If the file is NOT a bundle (it is a single logical document), respond:
-  {"is_bundle": false, "segments": []}
-
-Be conservative: only report is_bundle=true when the heading signals clearly indicate separate
-logical documents. When unsure, return is_bundle=false.
-
-page_count: {page_count}
-
-Per-page samples (one per line, "p{n}: {first line}"):
-{page_samples}
@@ -17,17 +17,7 @@ python-multipart>=0.0.9
 jinja2>=3.1.0
 feedparser>=6.0.0
 pymupdf>=1.24.0
-# Web/Blog ingest (devonagent track) + news fulltext promotion (crawl-24x7 A-2): 4-tier fallback.
-# trafilatura is pinned exactly because of single-maintainer risk (A-2 decision).
-trafilatura==2.1.0
+# Web/Blog ingest (devonagent track): 4-tier fallback for HTML body cleanup
+trafilatura>=1.12.0
 readability-lxml>=0.8.1
 markdownify>=0.13.1
-# tier-4 (bs4) imports it directly; removes the transitive-dependency assumption (crawl-24x7 A-2)
-beautifulsoup4>=4.12.0
-# office OOXML(docx/xlsx/pptx) → md (plan ds-s1-backend-1 C-1).
-# The exact pin is to be settled in the E-1 markitdown OOXML PoC (devsbx / version-pin context).
-markitdown[docx,xlsx,pptx]>=0.1.0
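-# .hwp (HWP5 binary) → md: pure-Python HWP5-only converter (CLI hwp5html). The libhwplo filter bundled with LibreOffice
-# could not read real Hancom HWP5 files and failed on every file, so it was replaced with pyhwp (2026-06-09). six = an undeclared runtime dependency of pyhwp.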
-pyhwp>=0.1b15
-six>=1.16.0
@@ -1,93 +0,0 @@
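-"""Progress visibility for off-queue admin scripts (backfills, etc.): background_jobs (migration 357).
-
-processing_queue is reserved for pipeline stages, so script jobs such as hier_overnight_backfill /
-section_summary_pilot do not show up on the dashboard board. When a script records its progress through
-this module, queue_overview exposes it in the "Background jobs" panel.
-
-Design invariants:
- **Autonomous transactions**: each record is committed immediately in a short engine.begin() transaction.
-  The script's main work runs in a separate session (a long transaction), so bundling them would hide records
-  until commit and break real-time visibility. Hence an independent commit on a dedicated connection.
- **best-effort**: a failed observability write must never break the main work. Every function uses try/except
-  and only logs a warning on failure. job_id=None is a silent no-op (work continues even if start failed).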
-"""
-
-import json
-import logging
-
-from sqlalchemy import text
-from sqlalchemy.ext.asyncio import AsyncEngine
-
-logger = logging.getLogger(__name__)
-
-
-async def start_job(
-    engine: AsyncEngine, kind: str, label: str | None = None, total: int | None = None
-) -> int | None:
-    """Record job start → background_jobs.id (None on failure; the caller proceeds regardless)."""
-    try:
-        async with engine.begin() as conn:
-            row = (
-                await conn.execute(
-                    text(
-                        "INSERT INTO background_jobs (kind, label, total) "
-                        "VALUES (:k, :l, :t) RETURNING id"
-                    ),
-                    {"k": kind, "l": label, "t": total},
-                )
-            ).first()
-            return int(row[0]) if row else None
-    except Exception as exc:  # noqa: BLE001 (observability is secondary; protect the main work)
-        logger.warning(f"[background_jobs] start 실패(무시): {type(exc).__name__}: {exc}")
-        return None
-
-
-async def heartbeat(
-    engine: AsyncEngine,
-    job_id: int | None,
-    *,
-    processed: int | None = None,
-    total: int | None = None,
-    detail: dict | None = None,
-) -> None:
-    """Update progress (processed/total/detail). No-op when job_id=None or on failure."""
-    if job_id is None:
-        return
-    try:
-        async with engine.begin() as conn:
-            await conn.execute(
-                text(
-                    "UPDATE background_jobs SET "
-                    "processed = COALESCE(:p, processed), "
-                    "total = COALESCE(:t, total), "
-                    "detail = COALESCE(CAST(:d AS jsonb), detail), "
-                    "updated_at = now() WHERE id = :id"
-                ),
-                {
-                    "id": job_id,
-                    "p": processed,
-                    "t": total,
-                    "d": json.dumps(detail, ensure_ascii=False) if detail is not None else None,
-                },
-            )
-    except Exception as exc:  # noqa: BLE001
-        logger.warning(f"[background_jobs] heartbeat 실패(무시): {type(exc).__name__}: {exc}")
-
-
-async def finish_job(
-    engine: AsyncEngine, job_id: int | None, *, state: str = "done", error: str | None = None
-) -> None:
-    """Record completion (done/failed). No-op when job_id=None or on failure."""
-    if job_id is None:
-        return
-    try:
-        async with engine.begin() as conn:
-            await conn.execute(
-                text(
-                    "UPDATE background_jobs SET state = :s, error = :e, "
-                    "finished_at = now(), updated_at = now() WHERE id = :id"
-                ),
-                {"id": job_id, "s": state, "e": (error or None)},
-            )
-    except Exception as exc:  # noqa: BLE001
-        logger.warning(f"[background_jobs] finish 실패(무시): {type(exc).__name__}: {exc}")
@@ -18,14 +18,12 @@ from typing import Any
 import numpy as np

 from ai.client import parse_json_response
-from core.config import settings
 from core.utils import setup_logger
 from services.clustering_common import normalize_vector
-from services.search.llm_gate import Priority, acquire_mlx_gate

 logger = setup_logger("briefing_comparator")

-LLM_CALL_TIMEOUT = settings.digest_llm_timeout_s   # 2026-06-15 single source of truth in config (same key as Phase 4)
+LLM_CALL_TIMEOUT = 25                 # seconds. Same as Phase 4
 HISTORICAL_TOP_K = 5
 HISTORICAL_SIMILARITY_MIN = 0.70
 HISTORICAL_WINDOW_DAYS = 30
@@ -41,6 +39,7 @@ MAX_ARTICLE_IDS_PER_COUNTRY = 5       # country_perspectives[].article_ids 후
 FALLBACK_HEADLINE = "LLM 분석 실패로 원문 기사 묶음만 표시합니다."
 FALLBACK_TOPIC_LABEL = "주요 뉴스 묶음"

+_llm_sem = asyncio.Semaphore(1)
 _PROMPT_PATH = Path(__file__).resolve().parent.parent.parent / "prompts" / "briefing_comparative.txt"
 _PROMPT_TEMPLATE: str | None = None

@@ -113,8 +112,7 @@ def retrieve_historical(


 async def _try_call_llm(client: Any, prompt: str) -> str:
-    # Goes through the global MLX gate (BACKGROUND). Standing rule (llm_gate): no new Semaphore, and the timeout sits inside the gate.
-    async with acquire_mlx_gate(Priority.BACKGROUND):
+    async with _llm_sem:
        return await asyncio.wait_for(
            client.call_primary(prompt),
            timeout=LLM_CALL_TIMEOUT,
@@ -284,7 +282,7 @@ async def compare_cluster_with_fallback(
    historical_docs = historical_docs or []
    prompt = build_prompt(selected, historical_docs)

-    for attempt in range(settings.digest_llm_attempts):  # 2026-06-15 single source of truth in config
+    for attempt in range(2):
        try:
            raw = await _try_call_llm(client, prompt)
        except asyncio.TimeoutError:
@@ -15,12 +15,11 @@ from sqlalchemy import text

 from core.database import async_session
 from core.utils import setup_logger
-from services.search.license_filter import restricted_exclude_sql

 logger = setup_logger("briefing_loader")


-_NEWS_WINDOW_SQL = text(f"""
+_NEWS_WINDOW_SQL = text("""
    SELECT
        d.id,
        d.title,
@@ -42,9 +41,6 @@ _NEWS_WINDOW_SQL = text(f"""
      AND d.created_at < :window_end
      AND d.embedding IS NOT NULL
      AND d.ai_summary IS NOT NULL
-      AND length(d.ai_summary) > 0
-      -- Safety library B-4: block publishing of licensed_restricted (same shared predicate as digest, for path consistency)
-      AND {restricted_exclude_sql("d")}
 """)


@@ -53,7 +49,7 @@ _SOURCE_COUNTRY_SQL = text("""
 """)


-_HISTORICAL_CANDIDATES_SQL = text(f"""
+_HISTORICAL_CANDIDATES_SQL = text("""
    SELECT
        d.id,
        d.title,
@@ -67,9 +63,6 @@ _HISTORICAL_CANDIDATES_SQL = text(f"""
      AND d.created_at < :hist_end
      AND d.embedding IS NOT NULL
      AND d.ai_summary IS NOT NULL
-      AND length(d.ai_summary) > 0
-      -- Safety library B-4: block publishing of licensed_restricted (shared predicate)
-      AND {restricted_exclude_sql("d")}
 """)


@@ -6,7 +6,6 @@
 regenerate policy: on a briefing_date UNIQUE conflict, DELETE+INSERT inside a transaction.
 """

-import asyncio
 import time
 from datetime import date, datetime, timedelta, timezone
 from typing import Any
@@ -16,9 +15,7 @@ from sqlalchemy import delete

 from ai.client import AIClient
 from core.database import async_session
-from core.database import engine as db_engine
 from core.utils import setup_logger
-from services import background_jobs as bgj
 from models.briefing import BriefingTopic, MorningBriefing
 from services.briefing.clustering import LAMBDA, cluster_global
 from services.briefing.comparator import (
@@ -36,6 +33,7 @@ KST = ZoneInfo("Asia/Seoul")
 NIGHT_WINDOW_HOURS = 5             # KST 00:00 ~ 05:00
 SELECT_K = 7                       # Plan §"Clustering 파라미터" briefing K_PER_CLUSTER=7
 SELECT_LAMBDA_MMR = 0.6            # Plan briefing MMR lambda 0.6
+PIPELINE_HARD_CAP = 600            # seconds. Same as Phase 4


 def _compute_window(target_date: date | None = None) -> tuple[datetime, datetime, date]:
@@ -145,7 +143,7 @@ async def _save_briefing(
        return new.id


-async def run_briefing_pipeline(target_date: date | None = None, job_id: int | None = None) -> dict[str, Any]:
+async def run_briefing_pipeline(target_date: date | None = None) -> dict[str, Any]:
    """Run the nightly news briefing once. Called from cron or the manual regenerate API.

    Returns:
@@ -208,36 +206,16 @@ async def run_briefing_pipeline(target_date: date | None = None, job_id: int | N
    usable_count = 0

    try:
-        # 2026-06-15: cluster calls run concurrently via gather. Effective concurrency = global MLX gate
-        # (config.mlx_gate_concurrency, BACKGROUND priority). rank/order preserved.
-        jobs = []
        for rank, cluster in enumerate(clusters, start=1):
            selected = select_for_llm(cluster, k=SELECT_K, lambda_mmr=SELECT_LAMBDA_MMR)
            historical_docs = (
                retrieve_historical(cluster, historical_candidates)
                if historical_enabled() else []
            )
-            jobs.append((rank, cluster, selected, historical_docs))
-
-        if job_id is not None:
-            await bgj.heartbeat(db_engine, job_id, total=len(jobs))
-        _prog = {"n": 0}
-
-        async def _run_one(cluster, selected, historical_docs):
-            r = await compare_cluster_with_fallback(
+            llm_calls += 1
+            envelope = await compare_cluster_with_fallback(
                client, cluster, selected, historical_docs=historical_docs
            )
-            if job_id is not None:
-                _prog["n"] += 1
-                await bgj.heartbeat(db_engine, job_id, processed=_prog["n"])
-            return r
-
-        results = await asyncio.gather(
-            *[_run_one(c, s, h) for (_, c, s, h) in jobs]
-        )
-
-        for (rank, cluster, selected, historical_docs), envelope in zip(jobs, results):
-            llm_calls += 1
            if envelope.get("llm_fallback_used"):
                llm_failures += 1
            if _is_usable_topic(envelope, envelope["topic_label"]):
@@ -1,239 +0,0 @@
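-"""Shared dedup (duplicate detection) logic: plan ds-s1-backend-1, group B.
-
-Shared by the following consumers:
-  - B-1 upload fill (api/documents.upload_document)   → find_canonical_for_hash
-  - B-2 GET /documents/duplicates                     → DEDUP_OFF_CHANNELS (the group SQL lives in the router)
-  - B-4 backfill (scripts/backfill_dedup.py)          → DEDUP_OFF_CHANNELS / canonical = min(id)
-  - B-3 near_duplicate                                → find_near_duplicates
-
-OFF-whitelist (DEDUP_OFF_CHANNELS):
-  law_monitor = amended versions of statutes are intentionally kept as separate rows (to track amendment dates).
-  Collapsing them on a matching file_hash would erase the amendment history, so the channel does not take part in dedup. (P0-2 measurement: of 18 dup groups / 36 rows,
-  17 law_monitor groups = intentional amendment preservation, 1 manual group = genuine content dedup.)
-  file_hash already encodes a per-channel key (note=body SHA / devonagent=URL / news=article_id), so there is
-  no per-channel key branching; only the single OFF-list is kept as data (P0-2 decision).
-
-near_duplicate (B-3):
-  title trigram candidates → cosine rerank on doc-level embeddings for the candidates only. Avoids scanning all 28.9k embeddings.
-  Stored embeddings are read-only (search-experiment Soft Lock: no regeneration). Thresholds and results are all non-gating recorded values
-  (trigram-first recall gap = near-dups with identical body but different title are missed → to be recovered by phase2 ivfflat).
-  Persistence is deferred (on-the-fly): S1 covers only the helper + call-site logging. duplicate_of is persisted only for exact (file_hash) matches.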
-"""
-
-from __future__ import annotations
-
-import logging
-
-from sqlalchemy import bindparam, or_, select, text
-from sqlalchemy.ext.asyncio import AsyncSession
-
-logger = logging.getLogger(__name__)
-
-# Channels excluded from file_hash dedup (single OFF-whitelist). Shared by B-1/B-2/B-4.
-DEDUP_OFF_CHANNELS: tuple[str, ...] = ("law_monitor",)
-
-# near_duplicate parameters: all recorded, non-gating values (phase2 ivfflat recovers the recall gap).
-NEAR_DUP_TRGM_THRESHOLD = 0.30      # pg_trgm title candidate cut (loose; for candidate generation)
-NEAR_DUP_COSINE_THRESHOLD = 0.95    # cosine cut for the near-dup decision on candidate embeddings (≈0.95~0.97)
-NEAR_DUP_MAX_CANDIDATES = 50        # cap on trigram candidates; avoids a full embedding scan
-
-
-async def find_canonical_for_hash(
-    session: AsyncSession, file_hash: str, *, exclude_id: int | None = None
-):
-    """Return the canonical document (oldest = min id) for the given file_hash, or None.
-
-    OFF-whitelist channels (law_monitor) are excluded from canonical candidates, so uploads
-    never link to statute revisions. exclude_id = excludes the freshly INSERTed row itself (B-1).
-    """
-    from models.document import Document  # deferred import (avoids a circular import)
-
-    stmt = (
-        select(Document)
-        .where(
-            Document.file_hash == file_hash,
-            Document.deleted_at.is_(None),
-            or_(
-                Document.source_channel.is_(None),
-                Document.source_channel.notin_(DEDUP_OFF_CHANNELS),
-            ),
-        )
-        .order_by(Document.id.asc())
-    )
-    if exclude_id is not None:
-        stmt = stmt.where(Document.id != exclude_id)
-    return (await session.execute(stmt)).scalars().first()
-
-
-# file_hash group SQL for B-2 /documents/duplicates. The router executes it directly (the Pydantic response lives in the router).
-#   reason='content_hash' = exact file_hash group (reuses idx_documents_hash; no new index/table needed).
-#   canonical_id = min(id), members = ids in ascending order, n = group size.
-DUPLICATE_GROUPS_SQL = text(
-    """
-    SELECT file_hash,
-           min(id)                       AS canonical_id,
-           array_agg(id ORDER BY id)     AS members,
-           count(*)                      AS n
-    FROM documents
-    WHERE deleted_at IS NULL
-      AND file_hash IS NOT NULL
-      AND (source_channel IS NULL OR source_channel NOT IN :off_channels)
-    GROUP BY file_hash
-    HAVING count(*) > 1
-    ORDER BY min(id)
-    """
-).bindparams(bindparam("off_channels", expanding=True))
-
-
-async def reconcile_dedup(
-    session: AsyncSession, *, apply: bool = True, chunk_size: int = 500, sample_size: int = 40
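-) -> dict:
-    """Recompute duplicate_of/duplicate_count for exact file_hash groups and reconcile them (B-4 core).
-
-    Idempotent: only rows that differ from the target values are UPDATEd. Shared by the nightly job
-    (workers.dedup_reconcile) and the backfill script. Documents are soft-delete only (FK ON DELETE SET NULL
-    never fires), so the denormalized dedup columns drift on deletion (stale member pointers, canonical
-    overcount); recomputing the absolute values is what guarantees consistency.
-
-    Returns {groups, docs, changes, applied, sample}. sample = preview of the changes to be applied / already applied (at most sample_size).
-    canonical = oldest in the group (min id): duplicate_of=NULL, duplicate_count=group_size-1. Members: duplicate_of=canonical, count=0.
-    """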
-    groups = (
-        await session.execute(
-            DUPLICATE_GROUPS_SQL, {"off_channels": list(DEDUP_OFF_CHANNELS)}
-        )
-    ).all()
-
-    desired: dict[int, tuple[int | None, int]] = {}
-    for g in groups:
-        members = list(g.members)
-        canonical = g.canonical_id
-        desired[canonical] = (None, len(members) - 1)
-        for m in members:
-            if m != canonical:
-                desired[m] = (canonical, 0)
-
-    if not desired:
-        return {"groups": 0, "docs": 0, "changes": 0, "applied": 0, "sample": []}
-
-    ids = list(desired.keys())
-    current: dict[int, tuple[int | None, int]] = {}
-    for i in range(0, len(ids), 1000):
-        batch = ids[i : i + 1000]
-        rows = (
-            await session.execute(
-                text(
-                    "SELECT id, duplicate_of, duplicate_count "
-                    "FROM documents WHERE id = ANY(:ids)"
-                ).bindparams(ids=batch)
-            )
-        ).all()
-        for r in rows:
-            current[r.id] = (r.duplicate_of, int(r.duplicate_count or 0))
-
-    changes = [
-        (i, dof, dcnt)
-        for i, (dof, dcnt) in desired.items()
-        if current.get(i) != (dof, dcnt)
-    ]
-    sample = [
-        {"id": i, "duplicate_of": dof, "duplicate_count": dcnt}
-        for (i, dof, dcnt) in changes[:sample_size]
-    ]
-
-    applied = 0
-    if apply and changes:
-        for i in range(0, len(changes), chunk_size):
-            for did, dof, dcnt in changes[i : i + chunk_size]:
-                await session.execute(
-                    text(
-                        "UPDATE documents SET duplicate_of = :dof, duplicate_count = :dcnt "
-                        "WHERE id = :id"
-                    ).bindparams(dof=dof, dcnt=dcnt, id=did)
-                )
-            await session.commit()
-            applied += len(changes[i : i + chunk_size])
-
-    return {
-        "groups": len(groups),
-        "docs": len(ids),
-        "changes": len(changes),
-        "applied": applied,
-        "sample": sample,
-    }
-
-
-async def find_near_duplicates(
-    session: AsyncSession,
-    doc_id: int,
-    *,
-    cosine_threshold: float = NEAR_DUP_COSINE_THRESHOLD,
-    trgm_threshold: float = NEAR_DUP_TRGM_THRESHOLD,
-    max_candidates: int = NEAR_DUP_MAX_CANDIDATES,
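-) -> list[dict]:
-    """Find near-duplicate candidates for the anchor doc in two stages, trigram → embedding (read-only).
-
-    Returns [{doc_id, title, title_sim?, cosine}] (cosine descending). If the embedding has not been
-    generated yet (common right after upload), only the trigram candidates are returned with
-    cosine=None (non-gating record). No row is modified or deleted; only stored embeddings are read (Soft Lock compliant).
-    """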
-    anchor = (
-        await session.execute(
-            text(
-                "SELECT id, title, (embedding IS NOT NULL) AS has_emb "
-                "FROM documents WHERE id = :id AND deleted_at IS NULL"
-            ).bindparams(id=doc_id)
-        )
-    ).first()
-    if anchor is None or not anchor.title:
-        return []
-
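-    # (1) Title trigram candidates. The similarity() cut narrows the candidates to max_candidates,
-    #     avoiding a full embedding scan. (The index-accelerated `%` operator path is a phase2 optimization
-    #     for when candidate generation becomes the bottleneck; a sequential pass over 28.9k short titles is cheap enough in the async post-upload path.)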
-    cand_rows = (
-        await session.execute(
-            text(
-                """
-                SELECT id, title, similarity(title, :t) AS title_sim
-                FROM documents
-                WHERE id <> :id
-                  AND deleted_at IS NULL
-                  AND title IS NOT NULL
-                  AND similarity(title, :t) >= :trgm
-                ORDER BY similarity(title, :t) DESC
-                LIMIT :lim
-                """
-            ).bindparams(id=doc_id, t=anchor.title, trgm=trgm_threshold, lim=max_candidates)
-        )
-    ).all()
-    if not cand_rows:
-        return []
-
-    if not anchor.has_emb:
-        # Embedding not generated yet: record candidates only (cosine rerank runs after the embed stage completes). non-gating.
-        return [
-            {"doc_id": r.id, "title": r.title, "title_sim": float(r.title_sim), "cosine": None}
-            for r in cand_rows
-        ]
-
-    # (2) Doc-level embedding cosine rerank on the candidates only. Stored values are read-only.
-    cand_ids = [r.id for r in cand_rows]
-    rer = (
-        await session.execute(
-            text(
-                """
-                SELECT c.id, c.title,
-                       (1 - (c.embedding <=> (SELECT embedding FROM documents WHERE id = :id))) AS cosine
-                FROM documents c
-                WHERE c.id = ANY(:ids) AND c.embedding IS NOT NULL
-                """
-            ).bindparams(id=doc_id, ids=cand_ids)
-        )
-    ).all()
-    out = [
-        {"doc_id": r.id, "title": r.title, "cosine": float(r.cosine)}
-        for r in rer
-        if r.cosine is not None and float(r.cosine) >= cosine_threshold
-    ]
-    out.sort(key=lambda x: x["cosine"], reverse=True)
-    return out
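
As a minimal sketch of the target-state rule that `reconcile_dedup` describes (canonical = smallest id with `duplicate_count = group_size - 1`; every other member points at the canonical with count 0), the pure part can be isolated from the database. The name `desired_dedup_state` and the plain list-of-lists input are assumptions made for illustration; they are not part of the removed module.

```python
from typing import Optional


def desired_dedup_state(
    groups: list[list[int]],
) -> dict[int, tuple[Optional[int], int]]:
    """Map each document id to its target (duplicate_of, duplicate_count).

    Hypothetical helper: each inner list holds the ids sharing one file_hash.
    """
    desired: dict[int, tuple[Optional[int], int]] = {}
    for members in groups:
        canonical = min(members)  # oldest row = smallest id
        desired[canonical] = (None, len(members) - 1)
        for member in members:
            if member != canonical:
                desired[member] = (canonical, 0)
    return desired


state = desired_dedup_state([[7, 3, 9], [12, 15]])
```

Comparing this mapping against the current column values and updating only the rows that differ is what makes the reconcile pass idempotent.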
@@ -15,12 +15,11 @@ from sqlalchemy import text

 from core.database import async_session
 from core.utils import setup_logger
-from services.search.license_filter import restricted_exclude_sql

 logger = setup_logger("digest_loader")


-_NEWS_WINDOW_SQL = text(f"""
+_NEWS_WINDOW_SQL = text("""
    SELECT
        d.id,
        d.title,
@@ -42,10 +41,6 @@ _NEWS_WINDOW_SQL = text(f"""
      AND d.created_at < :window_end
      AND d.embedding IS NOT NULL
      AND d.ai_summary IS NOT NULL
-      AND length(d.ai_summary) > 0
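-      -- Safety library B-4: block publication of licensed_restricted (predicate shared by every path = license_filter).
-      -- The news channel currently has no restricted items = defensive gate (for future paid news sources; prevents a path being missed).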
-      AND {restricted_exclude_sql("d")}
 """)


@@ -10,7 +10,6 @@ Step:
    7. start/end logs + generation_ms + fallback-ratio health metric
 """

-import asyncio
 import hashlib
 import time
 from datetime import datetime, timedelta, timezone
@@ -20,9 +19,7 @@ from sqlalchemy import delete

 from ai.client import AIClient
 from core.database import async_session
-from core.database import engine as db_engine
 from core.utils import setup_logger
-from services import background_jobs as bgj
 from models.digest import DigestTopic, GlobalDigest

 from .clustering import LAMBDA, cluster_country
@@ -76,7 +73,7 @@ def _build_topic_row(
    )


-async def run_digest_pipeline(job_id: int | None = None) -> dict:
+async def run_digest_pipeline() -> dict:
    """Run the full pipeline. Called from the worker entry.

    Returns:
@@ -110,37 +107,20 @@ async def run_digest_pipeline(job_id: int | None = None) -> dict:
    stats = {"llm_calls": 0, "fallback_used": 0}

    try:
-        # 2026-06-15: run the cluster calls concurrently via gather. Actual concurrency is capped by the global MLX gate
-        # (config.mlx_gate_concurrency, BACKGROUND priority). rank/order is preserved.
-        jobs = []
        for country, docs in docs_by_country.items():
            clusters = cluster_country(country, docs)
            if not clusters:
                continue  # sparse countries are excluded automatically
+
            for rank, cluster in enumerate(clusters, start=1):
                selected = select_for_llm(cluster)
-                jobs.append((country, rank, cluster, selected))
-
-        if job_id is not None:
-            await bgj.heartbeat(db_engine, job_id, total=len(jobs))
-        _prog = {"n": 0}
-
-        async def _run_one(cluster, selected):
-            r = await summarize_cluster_with_fallback(client, cluster, selected)
-            if job_id is not None:
-                _prog["n"] += 1
-                await bgj.heartbeat(db_engine, job_id, processed=_prog["n"])
-            return r
-
-        results = await asyncio.gather(*[_run_one(c, s) for (_, _, c, s) in jobs])
-
-        for (country, rank, cluster, selected), llm_result in zip(jobs, results):
-            stats["llm_calls"] += 1
-            if llm_result["llm_fallback_used"]:
-                stats["fallback_used"] += 1
-            all_topic_rows.append(
-                _build_topic_row(country, rank, cluster, selected, llm_result, primary_model)
-            )
+                stats["llm_calls"] += 1
+                llm_result = await summarize_cluster_with_fallback(client, cluster, selected)
+                if llm_result["llm_fallback_used"]:
+                    stats["fallback_used"] += 1
+                all_topic_rows.append(
+                    _build_topic_row(country, rank, cluster, selected, llm_result, primary_model)
+                )
    finally:
        await client.close()
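
The removed variant fanned the per-cluster calls out with `asyncio.gather` and relied on a shared gate to cap real concurrency, while the restored variant awaits each call in turn. A minimal sketch of the fan-out pattern, assuming a plain `asyncio.Semaphore` as a stand-in for the project's gate and a hypothetical `fake_llm` coroutine in place of the real summarizer call:

```python
import asyncio


async def fake_llm(i: int, gate: asyncio.Semaphore, state: dict) -> int:
    """Hypothetical stand-in for one LLM call; records peak concurrency."""
    async with gate:
        state["active"] += 1
        state["peak"] = max(state["peak"], state["active"])
        await asyncio.sleep(0.01)  # simulated latency
        state["active"] -= 1
        return i * 2


async def run_all(n: int, limit: int) -> tuple[list[int], int]:
    gate = asyncio.Semaphore(limit)  # assumption: stands in for the shared gate
    state = {"active": 0, "peak": 0}
    # gather returns results in argument order, so rank/order is preserved
    results = await asyncio.gather(*[fake_llm(i, gate, state) for i in range(n)])
    return list(results), state["peak"]


results, peak = asyncio.run(run_all(5, 2))
```

With `limit=1` the calls run one at a time, which matches the behavior of the sequential loop and of `Semaphore(1)`.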

@@ -2,8 +2,8 @@

 Key decisions:
 - Call AIClient._call_chat directly (avoids modifying client.py, reuses the fallback logic)
-- Concurrency control through the global MLX gate (BACKGROUND) (services.search.llm_gate single gate)
-- Per-call timeout = config.digest_llm_timeout_s (asyncio.wait_for, inside the gate)
+- Semaphore(1) to avoid overloading MLX
+- Per-call timeout of 25 seconds (asyncio.wait_for): guards against MLX hangs / fallback Claude API stalls
 - JSON parse failure → one retry → if it still fails, minimal fallback (never drop)
 - fallback: topic_label="주요 뉴스 묶음", summary = top member ai_summary[:200]
 """
@@ -13,16 +13,15 @@ from pathlib import Path
 from typing import Any

 from ai.client import parse_json_response
-from core.config import settings
 from core.utils import setup_logger
-from services.search.llm_gate import Priority, acquire_mlx_gate

 logger = setup_logger("digest_summarizer")

-# 2026-06-15: config as the single source (the old hard-coded 25s was tuned for the fast Gemma and was missed after the switch to Qwen 27B).
-LLM_CALL_TIMEOUT = settings.digest_llm_timeout_s
+LLM_CALL_TIMEOUT = 25       # seconds. MLX average of 5 s + tail-latency margin
 FALLBACK_SUMMARY_LIMIT = 200

+_llm_sem = asyncio.Semaphore(1)
+
 _PROMPT_PATH = Path(__file__).resolve().parent.parent.parent / "prompts" / "digest_topic.txt"
 _PROMPT_TEMPLATE: str | None = None

@@ -49,12 +48,8 @@ def build_prompt(selected: list[dict]) -> str:


 async def _try_call_llm(client: Any, prompt: str) -> str:
-    """Single call wrapped in the global MLX gate (BACKGROUND) + per-call timeout.
-
-    Standing rule (llm_gate): the Mac mini endpoint shares a single gate; no new Semaphores.
-    Concurrency lever = config.mlx_gate_concurrency. The timeout applies only inside the gate.
-    """
-    async with acquire_mlx_gate(Priority.BACKGROUND):
+    """Single call wrapped in a Semaphore + per-call timeout."""
+    async with _llm_sem:
        return await asyncio.wait_for(
            client._call_chat(client.ai.primary, prompt),
            timeout=LLM_CALL_TIMEOUT,
@@ -91,7 +86,7 @@ async def summarize_cluster_with_fallback(
    """
    prompt = build_prompt(selected)

-    for attempt in range(settings.digest_llm_attempts):  # config as the single source (default 2 = one retry)
+    for attempt in range(2):  # includes one retry
        try:
            raw = await _try_call_llm(client, prompt)
        except asyncio.TimeoutError:
@@ -13,7 +13,6 @@
 from __future__ import annotations
 import re
 import hashlib
-import unicodedata
 from dataclasses import dataclass, field

 STRUCTURE_SPLIT_THRESHOLD = 4000
@@ -26,42 +25,7 @@ _ATX = re.compile(r'^(#{1,6})\s+(?P<title>\S.*?)\s*#*\s*$')
 _KO_JANG = re.compile(r'^\s*(?P<title>제\s*\d+\s*장\b.*)$')
 _KO_JEOL = re.compile(r'^\s*(?P<title>제\s*\d+\s*절\b.*)$')
 _KO_JO = re.compile(r'^\s*(?P<title>제\s*\d+\s*조\b.*)$')
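-# _ENG: English structural headings (for documents that do not use ATX). ASME parts are usually caught as ATX (`# PART PG`), so reliance on _ENG is low.
-# D1: if the identifier is followed by a lowercase sentence continuation ("Part III to demonstrate to the satisfaction…"), the line is body text
-# and is not detected, which blocks fake clauses. An optional title must start with an uppercase letter, parenthesis, or digit to count as a heading (a lowercase start is treated as a sentence).
-# Identifiers are numbers/PG/3.31/UHX/A-1 etc. (decimal and hyphen extensions via . and - are allowed).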
-_ENG = re.compile(
-    r'^\s*(?P<title>(?:Chapter|Section|Article|Part|PART)\s+'
-    r'[\dIVXLA-Z]+(?:[.\-][\dA-Za-z]+)*'
-    r'(?:\s+[A-Z(\d][^\n]*)?'
-    r')\s*$'
-)
-
-# Code-fence boundary (same as FE outlineAnchors.ts:60 `/^\s{0,3}(```|~~~)/`). Lines inside a fence are
-# not heading-detected, so that '# foo' inside a code block does not create a fake clause (O3).
-_FENCE = re.compile(r'^\s{0,3}(```|~~~)')
-
-
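-# ASME clause identifiers (A-1): UG-79 · PG-27.4.1 · UW-11 · UCS-56 · A-69 · PFT-14
-# (1 to 4 uppercase letters + hyphen + digits[.digits]*). Used in the ATX branch of _detect_heading to decide node_type='clause'.
-# Korean statutes (제N조) are handled separately by _KO_JO and are unaffected by this pattern/cleanup (no regression).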
-_ASME_CLAUSE = re.compile(r'^[A-Z]{1,4}-\d+(?:\.\d+)*\b')
-
-
-def _clean_label(title: str) -> str:
-    r"""C-4: strip the LaTeX/markdown/page-number artifacts that marker embeds; doubles as preprocessing for clause-number pattern matching and cleanup of the display label.
-    Real-data examples: '$\textbf{PG-20.1 …} \hspace{0.2cm} \textbf{(25)}$' → 'PG-20.1 …' / '(25) **A-69**' → 'A-69'.
-    Inert on noise-free titles (Korean statutes, plain ATX, etc.), so no regression."""
-    t = re.sub(r'\\textbf|\\textit|\\mathbf|\\hspace\{[^}]*\}|[${}]|\*\*', '', title)
-    t = re.sub(r'^\s*\(\d+\)\s*', '', t)        # leading page number '(25) '
-    return re.sub(r'\s{2,}', ' ', t).strip()
-
-
-def _utf16_units(s: str) -> int:
-    """Same as a JS string's .length (= number of UTF-16 code units). Astral (outside the BMP) = surrogate pair = 2 units.
-    The FE's `raw.length` / `out.slice(off)` work in UTF-16 code units, so char_start must use the same unit.
-    len(s.encode('utf-16-le'))//2 = number of code units (utf-16-le adds no BOM)."""
-    return len(s.encode("utf-16-le")) // 2
+_ENG = re.compile(r'^\s*(?P<title>(?:Chapter|Section|Article|Part|PART)\s+[\dIVXLA-Z]+\b.*)$')


@dataclass
@@ -75,9 +39,6 @@ class HierNode:
    text: str
    is_leaf: bool = True
    chunk_content_hash: str = field(default="")
-    # Offset (UTF-16 code units) of the heading line's start within md_content. Only jump targets (non-window leaves / %_split parents)
-    # hold a value; window children / preamble (title None) = None (not a jump target, g0-t2/g2-t3).
-    char_start: int | None = None

    def finalize_hash(self):
        self.chunk_content_hash = hashlib.sha256(self.text.encode("utf-8")).hexdigest()
@@ -87,9 +48,7 @@ def _detect_heading(line: str) -> tuple[int, str, str] | None:
    """(level, title, node_type) or None. level is the relative depth."""
    m = _ATX.match(line)
    if m:
-        title = _clean_label(m.group("title").strip())  # C-4: LaTeX/md/page-number cleanup (preprocessing)
-        nt = "clause" if _ASME_CLAUSE.match(title) else None  # A-1: ASME clause identifier (UG-79 etc.) → clause
-        return (len(m.group(1)), title, nt)
+        return (len(m.group(1)), m.group("title").strip(), None)  # node_type is assigned in post-processing
    for pat, lvl, nt in ((_KO_JANG, 1, "chapter"), (_KO_JEOL, 2, "section"),
                          (_KO_JO, 3, "clause"), (_ENG, 1, "chapter")):
        m = pat.match(line)
@@ -98,64 +57,33 @@ def _detect_heading(line: str) -> tuple[int, str, str] | None:
    return None


-def _segment(text: str) -> list[tuple[int, str | None, str | None, str, int | None]]:
-    """Split on heading boundaries → [(level, title, node_type, segment_text, char_start), ...].
+def _segment(text: str) -> list[tuple[int, str | None, str | None, str]]:
+    """Split on heading boundaries → [(level, title, node_type, segment_text), ...].

-    Line model = same as FE outlineAnchors.ts:55-65: `text.split('\n')` + UTF-16 code-unit offsets +
-    code-fence tracking (splitlines(keepends=True) dropped; this removes the problem of splitting 7 line boundaries such as \v\f\x1c… differently from JS).
-    char_start = UTF-16 offset of the segment's first line (= the heading line). preamble = None (not a jump target).
-    node.text is preserved (hash-neutral under the line-model change): each group is '\n'.join-ed, and unless it is the last group
-      the separating '\n' is appended back to its end (= what splitlines(keepends) did by leaving \n on the last line), so it matches the source text.
-    CR is not stripped (with CRLF the '\r' remains → same as FE raw.length); no NFC conversion.
+    preamble (body before the first heading) = (0, None, None, text).
    """
-    raw_lines = text.split("\n")
-    n = len(raw_lines)
-    # Precompute (offset, heading) per line; lines inside or on a fence boundary are not heading-detected.
-    offs: list[int] = []
-    headings: list[tuple[int, str, str | None] | None] = []
-    off = 0
-    in_fence = False
-    for raw in raw_lines:
-        fence_toggle = bool(_FENCE.match(raw))
-        fenced_here = in_fence or fence_toggle
-        offs.append(off)
-        headings.append(None if fenced_here else _detect_heading(raw))
-        if fence_toggle:
-            in_fence = not in_fence
-        off += _utf16_units(raw) + 1  # '\n'
-
-    # Group boundaries = before the first heading (preamble) + each heading line. Lists of (start_idx, meta).
-    first_heading = next((i for i in range(n) if headings[i] is not None), None)
-    starts: list[int] = []
-    metas: list[tuple[int, str | None, str | None] | None] = []
-    if first_heading is None:
-        starts.append(0)
-        metas.append(None)  # entire text = preamble
-    else:
-        if first_heading > 0:
-            starts.append(0)
-            metas.append(None)
-        for i in range(first_heading, n):
-            h = headings[i]
-            if h is not None:
-                starts.append(i)
-                metas.append((h[0], h[1], h[2]))
-
-    segs: list[tuple[int, str | None, str | None, str, int | None]] = []
-    for gi, s_idx in enumerate(starts):
-        e_idx = starts[gi + 1] if gi + 1 < len(starts) else n
-        seg_text = "\n".join(raw_lines[s_idx:e_idx])
-        if e_idx < n:
-            seg_text += "\n"  # attach the separating '\n' to the preceding group (equivalent to splitlines keepends)
-        meta = metas[gi]
-        if meta is None:
-            if not seg_text.strip():  # discard an empty preamble (existing behavior)
-                continue
-            segs.append((0, None, None, seg_text, None))
+    lines = text.splitlines(keepends=True)
+    segs: list[tuple[int, str | None, str | None, list[str]]] = []
+    cur: tuple[int, str | None, str | None, list[str]] | None = None
+    preamble: list[str] = []
+    for ln in lines:
+        h = _detect_heading(ln.rstrip("\n"))
+        if h:
+            if cur is not None:
+                segs.append(cur)
+            elif preamble and "".join(preamble).strip():
+                segs.append((0, None, None, preamble))
+            cur = (h[0], h[1], h[2], [ln])
        else:
-            lvl, title, nt = meta
-            segs.append((lvl, title, nt, seg_text, offs[s_idx]))
-    return segs
+            if cur is None:
+                preamble.append(ln)
+            else:
+                cur[3].append(ln)
+    if cur is not None:
+        segs.append(cur)
+    elif preamble and "".join(preamble).strip():
+        segs.append((0, None, None, preamble))
+    return [(lvl, title, nt, "".join(body)) for (lvl, title, nt, body) in segs]


 def _window_split(body: str, target: int) -> list[str]:
@@ -211,7 +139,7 @@ def build_hier_tree(
            chain.append(title)
        return " > ".join(chain) if chain else None

-    for lvl, title, nt, body, cstart in segs:
+    for lvl, title, nt, body in segs:
        norm = 0 if lvl == 0 else min(level_map[lvl], max_depth)
        # parent = nearest node on the stack whose level is lower than norm
        while stack and stack[-1][0] >= norm:
@@ -219,11 +147,8 @@ def build_hier_tree(
        parent_idx = stack[-1][1] if stack else None
        idx = len(nodes)
        hp = _heading_path(parent_idx, title)
-        # char_start = assigned at creation time (captured before window-split truncates n.text to the heading line).
-        # Even if the node becomes a split-parent, this value (heading-line offset) is preserved as the windowed section's single jump target.
        node = HierNode(idx=idx, parent_idx=parent_idx, level=norm, node_type=nt,
-                        section_title=title, heading_path=hp, text=body, is_leaf=True,
-                        char_start=cstart)
+                        section_title=title, heading_path=hp, text=body, is_leaf=True)
        nodes.append(node)
        if norm > 0:
            stack.append((norm, idx))
@@ -253,17 +178,14 @@ def build_hier_tree(
                n.is_leaf = False
                heading_line = (n.text.splitlines() or [""])[0]
                n.text = heading_line  # avoids duplicate storage (the window children hold the full body)
-                n.node_type = (n.node_type or "section") + "_split"  # chapter_split/clause_split/section_split
-                # n.char_start is preserved = the windowed section's single jump target (heading offset at creation time).
+                n.node_type = (n.node_type or "section") + "_split"
                base_level = min(n.level + 1, max_depth)
                for wtext in wins:
                    ci = len(final)
-                    # window child = char_start None (_window_split drops whitespace buffers, so it is not
-                    # char-preserving and a summed offset would be wrong; it is not a jump target either, B1/#1).
                    final.append(HierNode(
                        idx=ci, parent_idx=n.idx, level=base_level, node_type="window",
                        section_title=n.section_title, heading_path=n.heading_path,
-                        text=wtext, is_leaf=True, char_start=None))
+                        text=wtext, is_leaf=True))
    for n in final:
        n.finalize_hash()
    return final
@@ -287,24 +209,6 @@ def coverage_stats(text: str, nodes: list[HierNode]) -> dict:
            # general navigation: only guarantees child level > parent level
            if n.level <= nodes[n.parent_idx].level and nodes[n.parent_idx].level > 0:
                bad_level += 1
-    # char_start O5 verification (UTF-16 slice == heading line) + NFC telemetry (g2-t4).
-    # Verification mirrors how the FE actually reads it: md.encode('utf-16-le')[2*cs:2*(cs+n)].decode == heading_line
-    # (not the Python code-point slice md[cs:cs+n], which drifts on astral characters).
-    md_u16 = text.encode("utf-16-le")
-    cs_total = cs_verified = 0
-    for n in nodes:
-        if n.char_start is None:
-            continue
-        cs_total += 1
-        first_line = n.text.split("\n", 1)[0]
-        nu = _utf16_units(first_line)
-        seg = md_u16[2 * n.char_start: 2 * (n.char_start + nu)]
-        try:
-            if seg.decode("utf-16-le") == first_line:
-                cs_verified += 1
-        except UnicodeDecodeError:
-            pass
-    non_nfc = 1 if unicodedata.normalize("NFC", text) != text else 0
    return {
        "nodes": len(nodes), "leaves": len(leaves),
        "coverage_ratio": round(leaf_chars / base, 4) if base else 0,
@@ -313,6 +217,4 @@ def coverage_stats(text: str, nodes: list[HierNode]) -> dict:
        "level_dist": {l: sum(1 for n in nodes if n.level == l) for l in sorted({n.level for n in nodes})},
        "leaf_len_min": min((len(n.text) for n in leaves), default=0),
        "leaf_len_max": max((len(n.text) for n in leaves), default=0),
-        "char_start_total": cs_total, "char_start_verified": cs_verified,
-        "non_nfc": non_nfc,
    }
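
The removed `char_start` check depended on the difference between Python code-point indexing and the UTF-16 code units that JavaScript's `.length` and `.slice` use. A small sketch of that difference, assuming an illustrative `slice_utf16` helper and sample text that are not taken from the repository:

```python
def utf16_units(s: str) -> int:
    """Number of UTF-16 code units, i.e. what JS `s.length` reports."""
    return len(s.encode("utf-16-le")) // 2


def slice_utf16(s: str, start: int, length: int) -> str:
    """Slice by UTF-16 code-unit offsets (hypothetical helper for this sketch)."""
    raw = s.encode("utf-16-le")  # 2 bytes per code unit, no BOM
    return raw[2 * start: 2 * (start + length)].decode("utf-16-le")


md = "😀 intro\n# Title\nbody"
heading = "# Title"

# Offset of the heading line in UTF-16 units: first line + 1 for '\n'.
offset = utf16_units("😀 intro") + 1
by_units = slice_utf16(md, offset, utf16_units(heading))
by_code_points = md[offset: offset + len(heading)]
```

Because the emoji occupies two code units but one code point, the code-point slice lands one character too late, which is the drift the removed verification guarded against.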
@@ -58,16 +58,16 @@ async def persist_hier_tree(
            INSERT INTO document_chunks
              (doc_id, chunk_index, chunk_type, section_title, heading_path, domain_category,
               text, embedding, source_type, chunker_version, chunk_content_hash,
-               parent_id, level, node_type, is_leaf, in_corpus, char_start)
+               parent_id, level, node_type, is_leaf, in_corpus)
            VALUES (:d, :ci, :ct, :stt, :hp, :dc, :tx,
               cast(cast(:emb AS text) AS vector),
-               :src, :cv, :hash, :pid, :lvl, :nt, :leaf, false, :cs)
+               :src, :cv, :hash, :pid, :lvl, :nt, :leaf, false)
            RETURNING id"""), {
                "d": doc_id, "ci": base + n.idx, "ct": chunk_type,
                "stt": n.section_title, "hp": n.heading_path, "dc": domain_category,
                "tx": n.text, "emb": emb_str, "src": SOURCE_TYPE, "cv": CHUNKER_VERSION,
                "hash": n.chunk_content_hash, "pid": parent_db, "lvl": n.level,
-                "nt": n.node_type, "leaf": n.is_leaf, "cs": n.char_start})
+                "nt": n.node_type, "leaf": n.is_leaf})
        idx_to_dbid[n.idx] = db_id
    await session.commit()

@@ -1,5 +0,0 @@
-"""Shared modules for the B-3 paper collection track (plan safety-library-b3-1).
-
-doi    : DOI normalization, dedup key, and the 2-Document (holder / parent_doi child) extract_meta contract (pure).
-holder : shared dedup lookup for bibliographic holders (DB).
-"""
@@ -1,141 +0,0 @@
-"""B-3 paper DOI core: normalization, dedup key, and the 2-Document (bibliographic holder / parent_doi child) contract.
-
-plan safety-library-b3-1 PR1 (keyless, 0 migrations).
-
-Core contract (shared by every paper collector, reconcile, and purchased-PDF stamping):
-- DOI normalization goes through this single function (normalize_doi): **store = lookup, same function**
-  (stated in the migration 351 comment; precedent: the store=lookup invariant of news_collector._normalize_url).
-  The same paper arriving under different notations (https://doi.org/ vs doi: vs uppercase) collapses to one holder.
-- dedup key = lower(extract_meta #>> '{paper,doi}'), enforced by the live partial-unique index
-  uq_documents_paper_doi (WHERE material_type='paper' AND ... IS NOT NULL).
-- 2-Document (R2-B1): paper.doi is **held only by the bibliographic Document**. OA/purchased full-text PDFs
-  carry no doi and link to the holder via paper.parent_doi (a NULL doi stays outside the index → multiple rows without conflict).
-  A holder and a child hold doi/parent_doi **mutually exclusively**.
-"""
-
-import hashlib
-import re
-
-# Compared after lowercasing, so every prefix is lowercase. Longest first (dx.doi.org before doi.org).
-_DOI_PREFIXES = (
-    "https://dx.doi.org/",
-    "http://dx.doi.org/",
-    "https://doi.org/",
-    "http://doi.org/",
-    "dx.doi.org/",
-    "doi.org/",
-    "doi:",
-)
-
-
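-def normalize_doi(raw: str | None) -> str | None:
-    """Normalize a DOI: lowercase + strip the URL/doi: prefix + trim surrounding whitespace and noise. Single function (store = lookup).
-
-    Returns None unless the result is a valid DOI (starts with 10.). The store side, the lookup side, and dedup-key
-    generation must all share this function for dedup to hold (storing raw and looking up normalized would miss forever).
-    """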
-    if not raw:
-        return None
-    s = raw.strip().lower()
-    for p in _DOI_PREFIXES:
-        if s.startswith(p):
-            s = s[len(p):]
-            break
-    s = s.strip()
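-    # Strip only citation-trailing noise (period/comma/semicolon). Parentheses '()' can be part of a DOI and are preserved
-    # (e.g. 10.1016/s0010-8650(00)80003-2). Over-stripping would merge different papers into one holder, which is
-    # data corruption and more dangerous than a near-dup (under-stripping). DOIs from API sources (OpenAlex/arXiv) are already clean.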
-    s = s.rstrip(".,;")
-    if not s.startswith("10."):
-        return None
-    return s
-
-
-# arXiv id: new style 'YYMM.NNNNN' (+vN) or old style 'archive(.SUBJ)/NNNNNNN'. Absorbs the 'arXiv:' prefix.
-_ARXIV_ID_RE = re.compile(
-    r"arxiv:\s*([a-z\-]+(?:\.[a-z]{2})?/\d{7}|\d{4}\.\d{4,5})(v\d+)?", re.IGNORECASE
-)
-
-
-def parse_arxiv_id(text: str | None) -> str | None:
-    """Extract the (versionless) arXiv id from body/title text. None if absent. Input to the legacy reconcile."""
-    if not text:
-        return None
-    m = _ARXIV_ID_RE.search(text)
-    return m.group(1) if m else None
-
-
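-def arxiv_doi(arxiv_id: str | None) -> str | None:
-    """arXiv DataCite DOI = 10.48550/arxiv.{id} (normalized). Canonical paper.doi unification key for preprints
-    without a journal DOI; OpenAlex assigns the same DOI to preprints (confirmed by measurement). Every collector and
-    reconcile must produce the same DOI through the same function for cross-source dedup to hold."""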
-    if not arxiv_id:
-        return None
-    return normalize_doi(f"10.48550/arXiv.{arxiv_id}")
-
-
-_DOI_IN_TEXT_RE = re.compile(r"10\.\d{4,9}/[^\s\"'<>]+", re.IGNORECASE)
-
-
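-def parse_doi_from_text(text: str | None) -> str | None:
-    """Extract the first DOI from body text (normalized). Used for a purchased PDF's paper.parent_doi link (independent of PDF structure; full scan).
-    Trailing punctuation on the DOI is cleaned up by normalize_doi. None if absent."""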
-    if not text:
-        return None
-    m = _DOI_IN_TEXT_RE.search(text)
-    return normalize_doi(m.group(0)) if m else None
-
-
-def paper_doi_hash(normalized_doi: str) -> str:
-    """Document.file_hash of the bibliographic holder: sha256('paper|{doi}')[:32].
-
-    Follows the precedent of the statute multi-part key 'statute|{jur}|{native_id}|{version_key}'.
-    The argument must be normalize_doi() output (already normalized); passing a raw value breaks dedup.
-    """
-    if not normalized_doi:
-        raise ValueError("paper_doi_hash 는 정규화된 DOI 필요 (normalize_doi 먼저)")
-    return hashlib.sha256(f"paper|{normalized_doi}".encode()).hexdigest()[:32]
-
-
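-def read_paper_doi(extract_meta: dict | None) -> str | None:
-    """Read the holder's normalized DOI: the lookup-side mirror of the index expression lower(extract_meta #>> '{paper,doi}').
-
-    Defensive re-normalization (values are stored normalized, but this guards against legacy data and external injection).
-    """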
-    if not extract_meta:
-        return None
-    paper = extract_meta.get("paper")
-    if not isinstance(paper, dict):
-        return None
-    return normalize_doi(paper.get("doi"))
-
-
-def with_paper_doi(extract_meta: dict | None, normalized_doi: str) -> dict:
-    """Inject paper.doi into the bibliographic holder's extract_meta (merge-safe, other keys preserved).
-
-    Holder only: parent_doi is removed (mutually exclusive). Returns a new dict (the input is not mutated).
-    """
-    if not normalized_doi:
-        raise ValueError("with_paper_doi 는 정규화된 DOI 필요")
-    meta = dict(extract_meta or {})
-    paper = dict(meta.get("paper") or {})
-    paper["doi"] = normalized_doi
-    paper.pop("parent_doi", None)
-    meta["paper"] = paper
-    return meta
-
-
-def with_parent_doi(extract_meta: dict | None, parent_normalized_doi: str) -> dict:
-    """Inject paper.parent_doi into a child's (OA/purchased full-text PDF) extract_meta (merge-safe, other keys preserved).
-
-    A child carries no paper.doi (NULL → outside the partial-unique index, so 2-Document rows do not conflict).
-    Returns a new dict (the input is not mutated).
-    """
-    if not parent_normalized_doi:
-        raise ValueError("with_parent_doi 는 정규화된 DOI 필요")
-    meta = dict(extract_meta or {})
-    paper = dict(meta.get("paper") or {})
-    paper["parent_doi"] = parent_normalized_doi
-    paper.pop("doi", None)
-    meta["paper"] = paper
-    return meta
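
To illustrate the store = lookup invariant described in the removed `doi.py`, here is a condensed, self-contained re-statement of the normalization steps (lowercase, strip one known prefix, trim trailing `.,;`, require a `10.` start). It is a sketch for illustration that follows the docstrings above, not a drop-in replacement; the sample inputs are made up apart from the parenthesized DOI quoted in the source comment.

```python
from typing import Optional

# Lowercase prefixes, longest first, mirroring the list in the removed module.
DOI_PREFIXES = (
    "https://dx.doi.org/", "http://dx.doi.org/",
    "https://doi.org/", "http://doi.org/",
    "dx.doi.org/", "doi.org/", "doi:",
)


def normalize_doi(raw: Optional[str]) -> Optional[str]:
    """Condensed sketch of the normalization contract."""
    if not raw:
        return None
    s = raw.strip().lower()
    for prefix in DOI_PREFIXES:
        if s.startswith(prefix):
            s = s[len(prefix):]
            break
    s = s.strip().rstrip(".,;")  # parentheses are kept: they may be part of the DOI
    return s if s.startswith("10.") else None


variants = [
    "https://doi.org/10.1000/ABC.123",
    "doi:10.1000/abc.123",
    "  10.1000/Abc.123. ",
]
keys = {normalize_doi(v) for v in variants}
```

All three notations collapse to one key, which is the property the partial-unique index relies on when the same function is used on both the store and the lookup side.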
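@@ -1,39 +0,0 @@
-"""Shared dedup lookup for B-3 paper bibliographic holders.
-
-Every paper collector (OpenAlex/arXiv/KoreaScience/J-STAGE), reconcile, and purchased-PDF stamping
-checks for an existing holder with this function before ingest (if present: skip or link as a child).
-
-- Lookup key = lower(extract_meta #>> '{paper,doi}') == normalize_doi(...), the same expression as the live partial-unique
-  index uq_documents_paper_doi (so the index is used).
-- .scalars().first(): prevents a MultipleResultsFound raise when cross-posting or multiple landing pages match two or more rows
-  (scalar_one_or_none is forbidden; precedent: the 2026-06 BBC collection outage / same discipline as news_collector).
-- **Creation** of the bibliographic holder Document is owned by each collector/stamping path (the shape differs between an
-  abstract signal document and a purchased minimal holder). This module shares only the dedup lookup.
-
-Being a DB lookup, this module is live-verified in PR2 (real arXiv collection); the PR1 unit-test target is doi.py (pure).
-"""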
-
-from sqlalchemy import func, select
-
-from models.document import Document
-from services.papers.doi import normalize_doi
-
-# Same as the index expression: lower(extract_meta #>> '{paper,doi}')
-_DOI_EXPR = func.lower(Document.extract_meta[("paper", "doi")].astext)
-
-
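-async def find_paper_holder(session, raw_or_normalized_doi):
-    """Look up the bibliographic holder Document by normalized DOI. Returns None if absent.
-
-    Accepts either a raw or a normalized DOI and passes it through normalize_doi, so storage and lookup always use the same function.
-    """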
-    doi = normalize_doi(raw_or_normalized_doi)
-    if not doi:
-        return None
-    result = await session.execute(
-        select(Document)
-        .where(Document.material_type == "paper", _DOI_EXPR == doi,
-               Document.deleted_at.is_(None))
-        .limit(1)
-    )
-    return result.scalars().first()
@@ -1,573 +0,0 @@
-"""Processing machine board + ETA aggregation (plan ds-processing-ui-6an, options 2 + 5/6).
-
-Aggregation logic for GET /api/queue/overview. Every figure is computed live from existing
-processing_queue / documents columns: zero new tables or migrations (HARD constraint).
-
-Structure: the SQL collection part (5 queries inside build_overview) is separated from the decision part (pure functions).
-The decision part (rows_to_* / build_machines / build_summarize_eta / build_trend /
-build_totals / compute_eta_minutes) is unit-testable without a DB.
-
-Attribution rules (single source of truth):
-- Static stage→machine map: gpu = extract/embed/chunk/markdown/preview/thumbnail/
-  fulltext/stt · macmini = classify/summarize · macbook = deep_summary
-  (but when settings.ai.deep is absent, deep_summary is also attributed to macmini).
-- summarize is a pool: pending/processing/failed are attributed to macmini, while completed
-  counts (done_*) are split via a join on documents.ai_model_version: 'qwen-macbook'
-  counts toward macbook, anything else toward macmini.
-- deferred_pending (payload.deferred_until in the future) is attributed to the macbook card
-  (deferred = signal that the MacBook is unavailable).
-"""
-
-from datetime import datetime, timedelta
-from posixpath import basename
-from zoneinfo import ZoneInfo
-
-from sqlalchemy import bindparam, text
-from sqlalchemy.ext.asyncio import AsyncSession
-
-from core.config import settings
-
-KST = ZoneInfo("Asia/Seoul")
-
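-# Alias for internal classification only: never expose the raw model name in responses, only the machine label.
-_MACBOOK_MODEL_ALIAS = "qwen-macbook"
-
-# Building blocks of the static stage→machine map (declaration order = display order of a card's stages)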
-_GPU_STAGES = (
-    "extract", "embed", "chunk", "markdown",
-    "preview", "thumbnail", "fulltext", "stt",
-)
-_MACMINI_STAGES = ("classify", "summarize")
-_MACBOOK_STAGES = ("deep_summary",)
-_STAGE_ORDER = _GPU_STAGES + _MACMINI_STAGES + _MACBOOK_STAGES
-
-_MACHINE_KEYS = ("gpu", "macmini", "macbook")
-_MACHINE_LABELS = {
-    "gpu": "GPU 서버",
-    "macmini": "맥미니",
-    "macbook": "맥북 M5 Max",
-}
-
-# Maximum number of current items shown per machine card
-_CURRENT_LIMIT = 2
-
-
-def stage_machine_map(deep_enabled: bool) -> dict[str, str]:
-    """stage → machine key map. Without a deep slot, deep_summary is attributed to macmini."""
-    mapping: dict[str, str] = {}
-    for s in _GPU_STAGES:
-        mapping[s] = "gpu"
-    for s in _MACMINI_STAGES:
-        mapping[s] = "macmini"
-    for s in _MACBOOK_STAGES:
-        mapping[s] = "macbook" if deep_enabled else "macmini"
-    return mapping
-
-
-def _zero_stage() -> dict:
-    return {
-        "pending": 0, "processing": 0, "failed": 0,
-        "done_1h": 0, "done_today": 0, "done_15m": 0,
-        "deferred_pending": 0, "created_1h": 0, "oldest_pending_at": None,
-    }
-
-
-def rows_to_stage_stats(rows) -> dict[str, dict]:
-    """Convert stage×status aggregate query rows into {stage: {pending, ..., created_1h}}."""
-    stats: dict[str, dict] = {}
-    for row in rows:
-        stats[row[0]] = {
-            "pending": int(row[1] or 0),
-            "processing": int(row[2] or 0),
-            "failed": int(row[3] or 0),
-            "done_1h": int(row[4] or 0),
-            "done_today": int(row[5] or 0),
-            "done_15m": int(row[6] or 0),
-            "deferred_pending": int(row[7] or 0),
-            "created_1h": int(row[8] or 0),
-            "oldest_pending_at": row[9] if len(row) > 9 else None,
-        }
-    return stats
-
-
-def rows_to_summarize_split(rows) -> dict[str, dict]:
-    """Rows of the summarize completed-count split query → {"macbook"|"macmini": {done_*}}.
-
-    is_macbook = whether documents.ai_model_version is 'qwen-macbook' (internal classification only).
-    """
-    split = {
-        "macbook": {"done_1h": 0, "done_today": 0, "done_15m": 0},
-        "macmini": {"done_1h": 0, "done_today": 0, "done_15m": 0},
-    }
-    for row in rows:
-        key = "macbook" if row[0] else "macmini"
-        split[key]["done_1h"] += int(row[1] or 0)
-        split[key]["done_today"] += int(row[2] or 0)
-        split[key]["done_15m"] += int(row[3] or 0)
-    return split
-
-
-def display_title(row: dict) -> str:
-    """Display title: title > original_filename > file_path basename > document id."""
-    if row.get("title"):
-        return row["title"]
-    if row.get("original_filename"):
-        return row["original_filename"]
-    if row.get("file_path"):
-        return basename(row["file_path"].rstrip("/"))
-    return f"문서 #{row['document_id']}"
-
-
-def build_machines(
-    stage_stats: dict[str, dict],
-    summarize_split: dict[str, dict],
-    current_rows: list[dict],
-    *,
-    deep_enabled: bool,
-) -> list[dict]:
-    """Build the three machine cards (gpu / macmini / macbook): the decision part of the attribution rules."""
-    smap = stage_machine_map(deep_enabled)
-
-    def g(stage: str, field: str) -> int:
-        return stage_stats.get(stage, {}).get(field, 0)
-
-    # current attribution: at most 2 processing rows per machine (summarize processing → macmini)
-    current_by_machine: dict[str, list[dict]] = {k: [] for k in _MACHINE_KEYS}
-    for row in current_rows:
-        machine = smap.get(row["stage"])
-        if machine and len(current_by_machine[machine]) < _CURRENT_LIMIT:
-            current_by_machine[machine].append({
-                "document_id": row["document_id"],
-                "title": display_title(row),
-                "stage": row["stage"],
-            })
-
-    machines = []
-    for key in _MACHINE_KEYS:
-        stages = [s for s in _STAGE_ORDER if smap[s] == key]
-
-        pending = sum(g(s, "pending") for s in stages)
-        processing = sum(g(s, "processing") for s in stages)
-        failed = sum(g(s, "failed") for s in stages)
-
-        # Completed counts: summarize is a pool, so exclude it from the per-stage sum and attribute it via split
-        done_1h = sum(g(s, "done_1h") for s in stages if s != "summarize")
-        done_today = sum(g(s, "done_today") for s in stages if s != "summarize")
-        done_15m = sum(g(s, "done_15m") for s in stages if s != "summarize")
-        if key in summarize_split:
-            done_1h += summarize_split[key]["done_1h"]
-            done_today += summarize_split[key]["done_today"]
-            done_15m += summarize_split[key]["done_15m"]
-
-        # Deferred backoff = signal that the MacBook is unavailable → attributed to the macbook card (regardless of the deep slot)
-        deferred_pending = (
-            g("summarize", "deferred_pending") + g("deep_summary", "deferred_pending")
-            if key == "macbook" else 0
-        )
-
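-        # State decision, in priority order: active > deferred > idle (user feedback 2026-06-11).
-        # A machine that is working (processing now, or completed something in the last 15 minutes) is "active"
-        # even with backoff remaining; the deferred count is shown separately on the card's deferred_pending line.
-        # The "deferred" chip appears only when work has actually stopped and only backoff is piling up (sleeping/unavailable).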
-        if processing > 0 or done_15m > 0:
-            state = "active"
-        elif key == "macbook" and deferred_pending > 0:
-            state = "deferred"
-        else:
-            state = "idle"
-
-        machines.append({
-            "key": key,
-            "label": _MACHINE_LABELS[key],
-            "state": state,
-            "stages": stages,
-            "pending": pending,
-            "processing": processing,
-            "failed": failed,
-            "done_1h": done_1h,
-            "done_today": done_today,
-            "deferred_pending": deferred_pending,
-            "current": current_by_machine[key],
-        })
-    return machines
-
-
-def compute_eta_minutes(pending: int, done_1h: int, inflow_1h: int) -> int | None:
-    """ETA (minutes) from the net drain rate. Computed only when done > inflow; otherwise None (the backlog cannot drain)."""
-    if done_1h > inflow_1h:
-        return round(pending / (done_1h - inflow_1h) * 60)
-    return None
-
-
-def build_summarize_eta(stage_stats: dict[str, dict]) -> dict:
-    """ETA for the summarize pool. pending is the total, including deferred items."""
-    s = stage_stats.get("summarize", _zero_stage())
-    pending = s["pending"]
-    done_rate = s["done_1h"]
-    inflow_rate = s["created_1h"]
-    return {
-        "pending": pending,
-        "done_rate_1h": done_rate,
-        "inflow_rate_1h": inflow_rate,
-        "eta_minutes": compute_eta_minutes(pending, done_rate, inflow_rate),
-    }
-
-
-def build_summarize_by_machine(summarize_split: dict[str, dict]) -> dict:
-    """Per-machine share of completed summarize work (macmini vs macbook), used to make
-    offloading visible in the board lane. Projects the values already built by rows_to_summarize_split
-    into the response shape (done_1h/done_today only; done_15m is internal to the state decision and excluded)."""
-    def m(key: str) -> dict:
-        s = summarize_split.get(key, {})
-        return {"done_1h": int(s.get("done_1h", 0)), "done_today": int(s.get("done_today", 0))}
-    return {"macmini": m("macmini"), "macbook": m("macbook")}
-
-
-def build_trend(
-    inflow_buckets: dict[str, int],
-    done_buckets: dict[str, int],
-    now_kst: datetime,
-) -> list[dict]:
-    """summarize 24h trend: 24 hourly KST buckets (oldest first, empty buckets are 0).
-
-    Bucket key = "YYYY-MM-DD HH:00" (KST), the same format the SQL to_char emits.
-    """
-    base = now_kst.replace(minute=0, second=0, microsecond=0)
-    trend = []
-    for i in range(23, -1, -1):
-        bucket = base - timedelta(hours=i)
-        key = bucket.strftime("%Y-%m-%d %H:00")
-        trend.append({
-            "hour": bucket.strftime("%H:00"),
-            "inflow": inflow_buckets.get(key, 0),
-            "done": done_buckets.get(key, 0),
-        })
-    return trend
-
-
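-def build_stages(stage_stats: dict[str, dict], now=None) -> list[dict]:
-    """Per-stage status rows for the 'stage detail' panel (user feedback 2026-06-11: completed counts must be visible).
-
-    Pipeline order is preserved, with unknown stages appended. Hiding/highlighting is the frontend's call; this reports facts only.
-    oldest_pending_age_sec = seconds elapsed since the oldest pending item (None when nothing is pending).
-    """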
-    from datetime import datetime, timezone
-    now = now or datetime.now(timezone.utc)
-    extra = [s for s in stage_stats if s not in _STAGE_ORDER]
-    rows = []
-    for stage in [*_STAGE_ORDER, *extra]:
-        st = stage_stats.get(stage) or _zero_stage()
-        oldest = st.get("oldest_pending_at")
-        age = None
-        if oldest is not None:
-            if oldest.tzinfo is None:
-                oldest = oldest.replace(tzinfo=timezone.utc)
-            age = max(0, int((now - oldest).total_seconds()))
-        rows.append({
-            "stage": stage,
-            "pending": st["pending"],
-            "processing": st["processing"],
-            "failed": st["failed"],
-            "done_1h": st["done_1h"],
-            "created_1h": st["created_1h"],
-            "done_today": st["done_today"],
-            "oldest_pending_age_sec": age,
-        })
-    return rows
-
-
-def build_totals(stage_stats: dict[str, dict]) -> dict:
-    """Totals across all stages."""
-    return {
-        "pending": sum(s["pending"] for s in stage_stats.values()),
-        "processing": sum(s["processing"] for s in stage_stats.values()),
-        "failed": sum(s["failed"] for s in stage_stats.values()),
-    }
-
-
-def compose_overview(
-    stage_stats: dict[str, dict],
-    summarize_split: dict[str, dict],
-    inflow_buckets: dict[str, int],
-    done_buckets: dict[str, int],
-    current_rows: list[dict],
-    *,
-    deep_enabled: bool,
-    now_kst: datetime,
-) -> dict:
-    """Collected statistics → response dict (contract shape). Pure function; no DB needed."""
-    return {
-        "machines": build_machines(
-            stage_stats, summarize_split, current_rows, deep_enabled=deep_enabled
-        ),
-        "stages": build_stages(stage_stats),
-        "summarize_eta": build_summarize_eta(stage_stats),
-        "summarize_by_machine": build_summarize_by_machine(summarize_split),
-        "trend_24h": build_trend(inflow_buckets, done_buckets, now_kst),
-        "totals": build_totals(stage_stats),
-    }
-
-
-# ─── SQL collection part (5 queries in total) ─────────────────────────────────
-
-# 1) stage×status aggregate + windowed completed/inflow counts + deferred (one query)
-_STAGE_STATS_SQL = """
-    SELECT
-        stage,
-        COUNT(*) FILTER (WHERE status = 'pending')                          AS pending,
-        COUNT(*) FILTER (WHERE status = 'processing')                       AS processing,
-        COUNT(*) FILTER (WHERE status = 'failed')                           AS failed,
-        COUNT(*) FILTER (WHERE status = 'completed'
-                           AND completed_at > NOW() - INTERVAL '1 hour')    AS done_1h,
-        COUNT(*) FILTER (WHERE status = 'completed'
-                           AND completed_at > :kst_midnight)                AS done_today,
-        COUNT(*) FILTER (WHERE status = 'completed'
-                           AND completed_at > NOW() - INTERVAL '15 minutes') AS done_15m,
-        COUNT(*) FILTER (WHERE status = 'pending'
-                           AND payload ->> 'deferred_until' IS NOT NULL
-                           AND (payload ->> 'deferred_until')::timestamptz > NOW())
-                                                                            AS deferred_pending,
-        COUNT(*) FILTER (WHERE created_at > NOW() - INTERVAL '1 hour')      AS created_1h,
-        MIN(created_at) FILTER (WHERE status = 'pending')                    AS oldest_pending_at
-    FROM processing_queue
-    GROUP BY stage
-"""
-
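-# 2) Split of completed counts for the summarize pool (join on documents.ai_model_version, one query)
-#    Scan lower bound = the earlier of today's 00:00 (KST) and 1h ago (keeps the 1h window intact right after midnight).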
-_SUMMARIZE_SPLIT_SQL = """
-    SELECT
-        COALESCE(d.ai_model_version = :macbook_alias, false)                 AS is_macbook,
-        COUNT(*) FILTER (WHERE q.completed_at > NOW() - INTERVAL '1 hour')   AS done_1h,
-        COUNT(*) FILTER (WHERE q.completed_at > :kst_midnight)               AS done_today,
-        COUNT(*) FILTER (WHERE q.completed_at > NOW() - INTERVAL '15 minutes') AS done_15m
-    FROM processing_queue q
-    JOIN documents d ON d.id = q.document_id
-    WHERE q.stage = 'summarize'
-      AND q.status = 'completed'
-      AND q.completed_at > LEAST(:kst_midnight, NOW() - INTERVAL '1 hour')
-    GROUP BY 1
-"""
-
-# 3/4) summarize 24h trend: hourly KST buckets (one query each for inflow and done)
-_TREND_INFLOW_SQL = """
-    SELECT to_char(date_trunc('hour', created_at AT TIME ZONE 'Asia/Seoul'),
-                   'YYYY-MM-DD HH24:00')                                     AS bucket,
-           COUNT(*)                                                          AS n
-    FROM processing_queue
-    WHERE stage = 'summarize'
-      AND created_at > NOW() - INTERVAL '24 hours'
-    GROUP BY 1
-"""
-
-_TREND_DONE_SQL = """
-    SELECT to_char(date_trunc('hour', completed_at AT TIME ZONE 'Asia/Seoul'),
-                   'YYYY-MM-DD HH24:00')                                     AS bucket,
-           COUNT(*)                                                          AS n
-    FROM processing_queue
-    WHERE stage = 'summarize'
-      AND status = 'completed'
-      AND completed_at > NOW() - INTERVAL '24 hours'
-    GROUP BY 1
-"""
-
-# 5) processing rows + material for the display title (one query; the 2-per-machine slice happens in the decision part)
-_CURRENT_SQL = """
-    SELECT q.stage, q.document_id, d.title, d.original_filename, d.file_path
-    FROM processing_queue q
-    JOIN documents d ON d.id = q.document_id
-    WHERE q.status = 'processing'
-    ORDER BY q.started_at DESC NULLS LAST
-    LIMIT 50
-"""
-
-
-async def build_overview(session: AsyncSession) -> dict:
-    """Collect with 5 queries → decide in compose_overview → response dict."""
-    now_kst = datetime.now(KST)
-    kst_midnight = now_kst.replace(hour=0, minute=0, second=0, microsecond=0)
-    deep_enabled = settings.ai is not None and settings.ai.deep is not None
-
-    stage_rows = (
-        await session.execute(text(_STAGE_STATS_SQL), {"kst_midnight": kst_midnight})
-    ).all()
-    split_rows = (
-        await session.execute(
-            text(_SUMMARIZE_SPLIT_SQL),
-            {"kst_midnight": kst_midnight, "macbook_alias": _MACBOOK_MODEL_ALIAS},
-        )
-    ).all()
-    inflow_rows = (await session.execute(text(_TREND_INFLOW_SQL))).all()
-    done_rows = (await session.execute(text(_TREND_DONE_SQL))).all()
-    current_result = (await session.execute(text(_CURRENT_SQL))).all()
-
-    current_rows = [
-        {
-            "stage": row[0],
-            "document_id": row[1],
-            "title": row[2],
-            "original_filename": row[3],
-            "file_path": row[4],
-        }
-        for row in current_result
-    ]
-
-    result = compose_overview(
-        rows_to_stage_stats(stage_rows),
-        rows_to_summarize_split(split_rows),
-        {row[0]: int(row[1]) for row in inflow_rows},
-        {row[0]: int(row[1]) for row in done_rows},
-        current_rows,
-        deep_enabled=deep_enabled,
-        now_kst=now_kst,
-    )
-    # Management scripts outside the queue (backfills etc.) = background_jobs (migration 357). Degrades gracefully to [] when the table is missing.
-    result["background_jobs"] = await _fetch_background_jobs(session)
-    return result
-
-
-# kind -> processing machine (for attribution to board machine cards). Unknown kind = gpu (the orchestration host).
-_BG_JOB_MACHINE = {
-    "global_digest": "macmini",
-    "morning_briefing": "macmini",
-    "section_summary": "macmini",
-    "hier_backfill": "gpu",
-    "hier_redecompose": "gpu",
-}
-
-
-_BACKGROUND_JOBS_SQL = """
-    SELECT id, kind, label, state, processed, total,
-           EXTRACT(EPOCH FROM (now() - started_at))::int AS elapsed_sec,
-           (state = 'running' AND updated_at < now() - interval '5 minutes') AS stale,
-           error
-    FROM background_jobs
-    WHERE state = 'running' OR finished_at > now() - interval '6 hours'
-    ORDER BY (state = 'running') DESC, started_at DESC
-    LIMIT 20
-"""
-
-
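-async def _fetch_background_jobs(session: AsyncSession) -> list[dict]:
-    """Running background_jobs plus those finished in the last 6h. Returns [] if the table is missing or on any error (the board is unaffected).
-
-    The query runs on a **separate connection** from the request session, so that when the table is missing
-    (e.g. migration 357 not applied) the failed SELECT cannot poison the request session's transaction
-    (on failure only that temporary connection is discarded). Observability is an add-on and must never break the board itself.
-    """
-    try:
-        async with session.bind.connect() as conn:  # independent connection from the pool
-            rows = (await conn.execute(text(_BACKGROUND_JOBS_SQL))).mappings().all()
-    except Exception:  # noqa: BLE001 - observability add-on; protect the board itself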
-        return []
-    return [
-        {
-            "id": r["id"], "kind": r["kind"], "label": r["label"], "state": r["state"],
-            "processed": int(r["processed"] or 0), "total": r["total"],
-            "elapsed_sec": int(r["elapsed_sec"] or 0), "stale": bool(r["stale"]),
-            "error": r["error"],
-            "machine": _BG_JOB_MACHINE.get(r["kind"], "gpu"),
-        }
-        for r in rows
-    ]
-
-
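-# ─── Failure handling (plan ds-board-engines-1) ──────────────────────────────
-# failed = permanently stopped after automatic retries (max_attempts=3) are exhausted. The functions here are
-# for explicit user action only; no automatic call path exists (the board's failure drawer is the sole caller).
-
-# A failed row may have an empty completed_at (the consumer's failure path does not record it),
-# so started_at is used as the timestamp fallback.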
-_FAILED_LIST_SQL = """
-    SELECT q.id, q.stage, q.document_id, q.attempts, q.max_attempts,
-           q.error_message,
-           COALESCE(q.completed_at, q.started_at)                 AS failed_at,
-           d.title, d.original_filename, d.file_path
-    FROM processing_queue q
-    JOIN documents d ON d.id = q.document_id
-    WHERE q.status = 'failed'
-    ORDER BY q.stage, COALESCE(q.completed_at, q.started_at) DESC NULLS LAST
-    LIMIT 300
-"""
-
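-# Retry: failed → pending (attempts reset = a fresh set of 3 automatic retries).
-# error_message is kept for auditing; it is harmless if it remains on a row that later completes.
-# Rows that would collide with uq_queue_active (partial unique on (doc, stage) for pending/processing),
-# i.e. the same document and stage were already re-enqueued, are left untouched and only counted.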
-_RETRY_SQL = """
-    UPDATE processing_queue q
-    SET status = 'pending', attempts = 0,
-        started_at = NULL, completed_at = NULL
-    WHERE q.id IN :ids
-      AND q.status = 'failed'
-      AND NOT EXISTS (
-        SELECT 1 FROM processing_queue p
-        WHERE p.document_id = q.document_id
-          AND p.stage = q.stage
-          AND p.status IN ('pending', 'processing')
-          AND p.id <> q.id
-      )
-    RETURNING q.id
-"""
-
-# Skip: failed → completed + payload marker (audit trail).
-# enqueue_next_stage is deliberately not called, so a failed document (empty text etc.) cannot
-# flow into downstream stages. If later stages are needed, retry is the proper path.
-_SKIP_SQL = """
-    UPDATE processing_queue
-    SET status = 'completed', completed_at = NOW(),
-        payload = COALESCE(payload, '{}'::jsonb)
-                  || jsonb_build_object('skipped_by_user', true,
-                                        'skipped_at', NOW()::text)
-    WHERE id IN :ids AND status = 'failed'
-    RETURNING id
-"""
-
-
-async def fetch_failed_items(session: AsyncSession) -> list[dict]:
-    """List of permanently failed rows (with document titles, up to 300)."""
-    rows = (await session.execute(text(_FAILED_LIST_SQL))).all()
-    return [
-        {
-            "id": r[0],
-            "stage": r[1],
-            "document_id": r[2],
-            "attempts": int(r[3] or 0),
-            "max_attempts": int(r[4] or 0),
-            "error_message": r[5],
-            "failed_at": r[6],
-            "title": display_title({
-                "document_id": r[2],
-                "title": r[7],
-                "original_filename": r[8],
-                "file_path": r[9],
-            }),
-        }
-        for r in rows
-    ]
-
-
-async def retry_failed(session: AsyncSession, ids: list[int]) -> dict:
-    """Return failed rows to pending. not_retried = rows with an active collision + rows that are no longer failed."""
-    unique_ids = list(set(ids))
-    stmt = text(_RETRY_SQL).bindparams(bindparam("ids", expanding=True))
-    retried = (await session.execute(stmt, {"ids": unique_ids})).all()
-    await session.commit()
-    return {
-        "requested": len(unique_ids),
-        "retried": len(retried),
-        "not_retried": len(unique_ids) - len(retried),
-    }
-
-
-async def skip_failed(session: AsyncSession, ids: list[int]) -> dict:
-    """failed → completed (marked as skipped). Does not chain into later stages."""
-    unique_ids = list(set(ids))
-    stmt = text(_SKIP_SQL).bindparams(bindparam("ids", expanding=True))
-    skipped = (await session.execute(stmt, {"ids": unique_ids})).all()
-    await session.commit()
-    return {
-        "requested": len(unique_ids),
-        "skipped": len(skipped),
-        "not_skipped": len(unique_ids) - len(skipped),
-    }
@@ -102,7 +102,7 @@ async def classify(
         # "Every MLX primary call path must acquire the gate, no exceptions".
        async with acquire_mlx_gate(Priority.FOREGROUND):
            async with asyncio.timeout(LLM_TIMEOUT_MS / 1000):
-                raw = await client.call_classifier(prompt)
+                raw = await client._request(settings.ai.classifier, prompt)
        _failure_count = 0
    except asyncio.TimeoutError:
        _failure_count += 1
@@ -1,6 +1,6 @@
 """Time-aware retrieval freshness decay (PR-RAG-Time-1).

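-The news (source_channel='news') and incident-case (material_type='incident', KOSHA) domains hold
+The news (source_channel='news') and law-alert (source_channel='law_monitor') domains hold
 time-sensitive documents. With the relevance score alone, old documents linger at the top and
 search quality drops. This module applies a time weight as a soft multiplier in the final-score
 composition step after the reranker. Nothing is deleted; documents are only demoted in ranking.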
@@ -9,10 +9,9 @@ soft multiplier 로 시간 가중치 적용. 삭제는 없음 — ranking 만 de
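 - reranker = semantic relevance, freshness decay = operational policy. Keep the two stages separate.
 - floor 0.7 (the multiplier never drops below 0.7): old documents are demoted but never buried.
 - General uploads / study materials / KGS Code originals / ai_drafted are exempt (no-op).
-- ★Law is excluded from freshness in the C-1 follow-up: currency is handled by version_status (B-1 version chain).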

 documents has no published_date column → created_at (collection time) serves as a temporary proxy.
-The news/KOSHA workers index immediately on collection, so created_at ≈ published_date.
+The news/law_monitor workers index immediately on collection, so created_at ≈ published_date.
 Improving accuracy is split into a follow-up PR (workers fill in published_date metadata).
 """

@@ -33,10 +32,10 @@ if TYPE_CHECKING:
 # ─── Policy ────────────────────────────────────────────────────────

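 # half-life (days). 90 days: one month ~0.79 / 6 months ~0.25.
-# C-1 follow-up (2026-06-13): law_365d retired. Law currency is handled by version_status (B-1 version chain);
-#   age decay unfairly demoted current laws (recording the change of intent). Incident cases are absorbed into news_90d.
+# 365 days: 1 year ~0.5 / 3 years ~0.13.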
 HALF_LIFE_DAYS: dict[str, int] = {
    "news_90d": 90,
+    "law_365d": 365,
 }

 # soft multiplier — final = base * (FLOOR + (1-FLOOR) * decay).
@@ -53,35 +52,32 @@ class _DocMeta:
    source_channel: str | None
    content_origin: str | None
    created_at: datetime | None
-    material_type: str | None = None


 def freshness_policy(meta: _DocMeta | None) -> str | None:
     """Document metadata → freshness policy name, or None (no-op).

     Applies to:
-      - material_type='incident' (KOSHA incident/fatality cases) → news_90d (absorbed in the C-1 follow-up; time-sensitive)
-      - source_channel='news'                            → news_90d
+      - source_channel='news'        → news_90d
+      - source_channel='law_monitor' → law_365d

     Does not apply (returns None):
       - meta itself is None
       - content_origin='ai_drafted' (creation time = value time, so time-based demotion does not fit)
-      - ★Law (source_channel='law_monitor'/material_type='law'): law_365d was retired in the C-1 follow-up.
-        Law currency is handled by version_status (B-1 version chain, current/superseded); age decay
-        unfairly demoted current laws (recording the change of intent). Law search ranking = version_status decoration.
-      - All other source_channel values (manual, drive_sync, inbox_route, memo, etc.: naturally exempt)
+      - All other source_channel values (manual, drive_sync, inbox_route, memo,
+        Study/Manual/Reference/Academic/Checklist and the like: naturally exempt)
    """
    if meta is None:
        return None
     # Guard 2: content_origin='ai_drafted' is exempt
     if meta.content_origin == "ai_drafted":
         return None
-    # Incident/fatality cases are time-sensitive → same 90d as news (regardless of source; uploaded incidents included)
-    if meta.material_type == "incident":
+    sc = meta.source_channel
+    if sc == "news":
         return "news_90d"
-    if meta.source_channel == "news":
-        return "news_90d"
-    # law_365d retired + unknown source_channel → no decay
+    if sc == "law_monitor":
+        return "law_365d"
+    # Guard 6: unknown source_channel → no decay
    return None


@@ -133,7 +129,7 @@ async def _fetch_meta(
        text(
            """
            SELECT id, source_channel::text AS source_channel,
-                   content_origin, material_type, created_at
+                   content_origin, created_at
            FROM documents
            WHERE id = ANY(:ids)
            """
@@ -145,7 +141,6 @@ async def _fetch_meta(
            source_channel=row.source_channel,
            content_origin=row.content_origin,
            created_at=row.created_at,
-            material_type=getattr(row, "material_type", None),
        )
        for row in rows
    }
@@ -72,10 +72,6 @@ class LegacyWeightedSum(FusionStrategy):
                    score=existing.score + r.score * 0.5,
                    snippet=existing.snippet,
                    match_reason=f"{existing.match_reason}+vector",
-                    # C-1: propagate classification-axis metadata (if dropped on rebuild, the D-1 type display becomes None)
-                    material_type=existing.material_type,
-                    jurisdiction=existing.jurisdiction,
-                    published_date=existing.published_date,
                )
            elif r.score > 0.3:
                merged[r.id] = r
@@ -132,10 +128,6 @@ class RRFOnly(FusionStrategy):
                    score=rrf_score,
                    snippet=base.snippet,
                    match_reason="+".join(reasons),
-                    # C-1: propagate classification-axis metadata (if dropped on rebuild, the D-1 type display becomes None)
-                    material_type=base.material_type,
-                    jurisdiction=base.jurisdiction,
-                    published_date=base.published_date,
                )
            )
        return merged[:limit]
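@@ -1,28 +0,0 @@
-"""Safety library B-4: the single licensed_restricted predicate (option a, U-2①; one definition shared by every path).
-
-Indexing is allowed, but every path through which a verbatim span of restricted=true material (purchased e-books,
-paid materials) could reach RAG evidence or published output (search/ask, digest, morning_briefing, study solutions)
-is blocked structurally. The predicate is not copy-pasted per path; sharing this one definition prevents missing or drifting guards
-([[feedback_structural_integrity_over_path_discipline]]).
-Personal file viewing (GET /documents/{id}?download) is allowed under option a, so the predicate is not applied there.
-
-The two forms (raw SQL / ORM) mean the same thing: a missing or false restricted value, or a NULL extract_meta, is not excluded
-thanks to COALESCE (even with redistribute=false, a missing restricted is not excluded; the key point is redistribute≠restricted).
-"""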
-
-
-def restricted_exclude_sql(alias: str = "") -> str:
-    """Bare predicate for raw text() queries (no leading 'AND'). alias='' = reference the column directly."""
-    p = (alias + ".") if alias else ""
-    return f"COALESCE({p}extract_meta -> 'license' ->> 'restricted', 'false') <> 'true'"
-
-
-def restricted_exclude_orm():
-    """SQLAlchemy ORM .where() clause with the same meaning as restricted_exclude_sql (JSONB extract_meta)."""
-    from sqlalchemy import func
-
-    from models.document import Document
-
-    return func.coalesce(
-        Document.extract_meta["license"]["restricted"].astext, "false"
-    ) != "true"
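Review note: the COALESCE predicate above has a few non-obvious cases (missing key, NULL `extract_meta`, `redistribute` without `restricted`). The sketch below emulates it in plain Python to spell out the truth table. Both function names are hypothetical, and the emulation assumes PostgreSQL's `->>` renders a JSON boolean as the text `true`/`false` and a JSON null or missing key as SQL NULL.

```python
def restricted_text(extract_meta):
    """Emulate extract_meta -> 'license' ->> 'restricted'; None stands in for SQL NULL."""
    if not isinstance(extract_meta, dict):
        return None
    license_obj = extract_meta.get("license")
    if not isinstance(license_obj, dict):
        return None
    value = license_obj.get("restricted")
    if value is None:               # absent key and JSON null both become NULL
        return None
    if isinstance(value, bool):     # JSON booleans render as lowercase text
        return "true" if value else "false"
    return str(value)


def passes_restricted_exclude(extract_meta) -> bool:
    """Emulate COALESCE(..., 'false') <> 'true': True means the row is kept."""
    text_value = restricted_text(extract_meta)
    if text_value is None:
        text_value = "false"
    return text_value != "true"


cases = {
    "null_meta": passes_restricted_exclude(None),
    "no_license": passes_restricted_exclude({}),
    "restricted_false": passes_restricted_exclude({"license": {"restricted": False}}),
    "redistribute_only": passes_restricted_exclude({"license": {"redistribute": False}}),
    "restricted_true": passes_restricted_exclude({"license": {"restricted": True}}),
}
```

Only the last case is excluded, which is the point the docstring makes about `redistribute` and `restricted` being independent flags.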
@@ -26,11 +26,8 @@ PR-MacBook-RAG-Backend-1 부터 `services.llm.QwenMacBookBackend` 는 별 endpoi
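 - **The fallback (Claude Sonnet 4 API) path is outside the gate**. Since PR #20, fallback = Claude API. However, in the
   current implementation the primary→fallback switch happens inside `AIClient._call_chat`, so the
   fallback also runs while holding the gate. This is acceptable (fallback is rare).
-- ~~**MLX concurrency is fixed at `MLX_CONCURRENCY = 1`**~~ → **revised 2026-06-12**:
-  the old rule's premise (server = single-inference) no longer holds; the current mlx_vlm server absorbs
-  concurrent streams through continuous batching (measured). The cap is config `pipeline.mlx_gate_concurrency`
-  (default 1, production 2). **The gate itself (cap + priority queue) stays permanently**: preventing a thundering herd
-  (the 23 concurrent → 22 timeout incident) remains this cap's job. Never unlimited.
+- **MLX concurrency is fixed at `MLX_CONCURRENCY = 1`**. Even if the model changes, do not raise this value
+  as long as the single-inference property still holds.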

 ## 우선순위 정책 (B-1, 2026-05-17)

@@ -83,22 +80,8 @@ from core.utils import setup_logger

 logger = setup_logger("llm_gate")

-
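-def _capacity() -> int:
-    """Gate concurrency cap: config.yaml `pipeline.mlx_gate_concurrency` (default 1).
-
-    Generalized 2026-06-12: the premise of the permanent "MLX_CONCURRENCY = 1 fixed" rule (the old server was
-    single-inference; 23 concurrent → 22 timeouts measured) no longer holds. The current mlx_vlm server absorbs
-    concurrent streams through continuous batching (6-8 concurrent ran normally on the night of 2026-06-11).
-    The gate itself (cap + priority) stays and only the cap moves to config; preventing a thundering-herd
-    recurrence remains this cap's job. The value is read at runtime on every acquire, so it takes effect
-    after a config change + process restart; tests monkeypatch settings.
-    """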
-    from core.config import settings
-    try:
-        return max(1, int(getattr(settings, "mlx_gate_concurrency", 1)))
-    except (TypeError, ValueError):
-        return 1
+# MLX primary is single-inference → 1
+MLX_CONCURRENCY = 1

 # WARN when a background waiter's wait_ms exceeds this value (starvation signal; aging mitigation is Phase 2)
 STARVATION_WARN_MS = 300_000  # 5 min
@@ -118,7 +101,7 @@ DEFAULT_PRIORITY: Priority = Priority.BACKGROUND
 # Tuple format: (priority: int, seq: int, future: asyncio.Future, enqueue_ts: float)
 _waiters: list[tuple[int, int, asyncio.Future, float]] = []
 _seq = itertools.count()
-_inflight_n: int = 0  # number of concurrent runs (formerly a bool; now a counter since capacity was generalized)
+_inflight: bool = False
 _lock: asyncio.Lock | None = None


@@ -160,7 +143,7 @@ async def acquire_mlx_gate(

     ⚠ `asyncio.timeout` must be placed inside the gate (after awaiting the Future).
    """
-    global _inflight_n, _waiters
+    global _inflight, _waiters

    lock = _get_lock()
    seq = next(_seq)
@@ -169,9 +152,9 @@ async def acquire_mlx_gate(
    fut: asyncio.Future | None = None

    async with lock:
-        if _inflight_n < _capacity() and not _waiters:
+        if not _inflight and not _waiters:
             # fast path: enter inflight immediately, no Future is created
-            _inflight_n += 1
+            _inflight = True
         else:
             # enter the wait queue
            fut = asyncio.get_event_loop().create_future()
@@ -211,8 +194,8 @@ async def acquire_mlx_gate(
        async with lock:
            next_fut = _dispatch_next_locked()
            if next_fut is None:
-                _inflight_n = max(0, _inflight_n - 1)
-            # if next_fut exists the slot is handed over: the count stays (the next waiter is about to enter)
+                _inflight = False
+            # _inflight stays True (the next waiter is about to enter)
        logger.debug(
            "mlx_gate release duration_ms=%.0f priority=%s seq=%d",
            duration_ms, priority.name, seq,
@@ -239,24 +222,13 @@ def get_mlx_gate():
    return acquire_mlx_gate(DEFAULT_PRIORITY)


-# ── Read-only status (for UI display) ────────────────────────────────────────
-
-
-def gate_status() -> dict:
-    """Snapshot of current gate occupancy (read-only, lock-free approximation; for UI display).
-
-    inflight = number of concurrent runs (int). The existing consumer (eid status) casts with bool(), so it stays compatible.
-    """
-    return {"inflight": _inflight_n, "waiters": len(_waiters)}
-
-
 # ── Test helpers (conftest reset) ────────────────────────────────────────────


 def _reset_for_test() -> None:
    """Called by the test fixture for every fresh loop. Not for use in production code."""
-    global _waiters, _inflight_n, _lock, _seq
+    global _waiters, _inflight, _lock, _seq
    _waiters = []
-    _inflight_n = 0
+    _inflight = False
    _lock = None
    _seq = itertools.count()
@@ -1,55 +0,0 @@
-"""Safety library C-1 follow-up: search result wrapper decoration (version_status + facets).
-
-Applied once to the run_search() result in the endpoint wrapper; the search core (run_search) is untouched (r3).
- version_status: attaches legal_meta.version_status to law results (material_type='law')
-  (current/superseded/pending/repealed). legal_meta.document_id is a 1:0..1 satellite →
-  law rows with no mapping (legacy, etc.) stay None. The query is skipped when there are no law results.
- facets: distribution labels for the classification axes (material_type/jurisdiction/version_status) within the top-K results (r2-M4).
-  Computed only when facets=true (None when not requested = byte-identical output, no effect on ranking).
-"""
-
-from __future__ import annotations
-
-from collections import Counter
-from typing import TYPE_CHECKING
-
-from sqlalchemy import text
-from sqlalchemy.ext.asyncio import AsyncSession
-
-if TYPE_CHECKING:
-    from api.search import SearchResult
-
-
-async def decorate_version_status(
-    session: AsyncSession, results: list["SearchResult"]
-) -> None:
-    """Attach legal_meta.version_status to law results (in-place). Skips the query when there are no law results."""
-    law_ids = [r.id for r in results if r.material_type == "law" and r.id is not None]
-    if not law_ids:
-        return
-    rows = await session.execute(
-        text(
-            "SELECT document_id, version_status FROM legal_meta "
-            "WHERE document_id = ANY(:ids)"
-        ),
-        {"ids": law_ids},
-    )
-    status_by_id = {row.document_id: row.version_status for row in rows}
-    for r in results:
-        if r.id in status_by_id:
-            r.version_status = status_by_id[r.id]
-
-
-def compute_facets(results: list["SearchResult"]) -> dict[str, dict[str, int]]:
-    """Distribution labels of the classification axes over the top-K results. None values are excluded (present labels only; empty axes are omitted)."""
-    axes = {
-        "material_type": [r.material_type for r in results],
-        "jurisdiction": [r.jurisdiction for r in results],
-        "version_status": [getattr(r, "version_status", None) for r in results],
-    }
-    facets: dict[str, dict[str, int]] = {}
-    for axis, vals in axes.items():
-        counter = Counter(v for v in vals if v is not None)
-        if counter:
-            facets[axis] = dict(counter.most_common())
-    return facets
@@ -24,7 +24,6 @@ import asyncio
 import hashlib
 import re
 import time
-from dataclasses import dataclass
 from typing import TYPE_CHECKING, Any

 from sqlalchemy import text
@@ -54,76 +53,18 @@ QUERY_EMBED_MAXSIZE = 500
 # server-side allowlist map. The query parameter never accepts a raw table name.
 CANDIDATE_BACKEND_MAP: dict[str, dict[str, str] | None] = {
    "baseline": None,
-    # All Phase 2A embedding candidates (me5_large_inst, snowflake_l_v2, qwen06, qwen4, qwen4m) closed as
-    # no-go (2026-06-12, every candidate at -0.03 to -0.04) → cand slugs and tables removed (R13, migration 360
-    # DROP). The read-path slugs must be removed first so that embedding_backend=cand_X /search does not read
-    # a dropped table and return 500. Only baseline (production) remains.
+    "cand_me5_large_inst": {
+        "docs_table": "documents_cand_me5_large_inst",
+        "chunks_table": "document_chunks_cand_me5_large_inst",
+        "embed_endpoint": "http://embedding-cand-me5-inst:80/embed",
+    },
+    "cand_snowflake_l_v2": {
+        "docs_table": "documents_cand_snowflake_l_v2",
+        "chunks_table": "document_chunks_cand_snowflake_l_v2",
+        "embed_endpoint": "http://embedding-cand-snowflake-l-v2:80/embed",
+    },
 }

-# G-1 pinned instruct string (must match the inventory 2026-06-12-c record;
-# changing the wording is on par with violating the store=query invariant. Query side only; document ingestion is plain).
-QWEN3_QUERY_INSTRUCT = (
-    "Instruct: Given a web search query, retrieve relevant passages that answer the query"
-    "\nQuery: "
-)
-
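-# ─── Safety library C-1: explicit classification-axis filter (same on all 3 legs, byte-identical) ───────────────
-# When unset (active=False), every SQL clause is an empty string → existing SQL is byte-identical (zero run_eval regressions).
-# year falls back to created_at when published_date is NULL (same COALESCE idea as freshness).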
-@dataclass
-class AxisFilter:
-    material_types: list[str] | None = None   # CSV → list, material_type = ANY
-    jurisdiction: str | None = None
-    year_from: int | None = None
-    year_to: int | None = None
-
-    def active(self) -> bool:
-        return bool(self.material_types or self.jurisdiction
-                    or self.year_from is not None or self.year_to is not None)
-
-
-def _axis_sql(alias: str, af: "AxisFilter | None", params: dict) -> str:
-    """Axis filter SQL relative to alias; '' when unset (byte-identical). Returned form: ' AND ...'.
-
-    With alias='' columns are referenced directly (single-table FROM documents path). Parameters use the
-    af_ prefix to avoid colliding with the caller's existing binds.
-    """
-    if af is None or not af.active():
-        return ""
-    p = (alias + ".") if alias else ""
-    cl: list[str] = []
-    if af.material_types:
-        cl.append(f"{p}material_type = ANY(:af_mt)")
-        params["af_mt"] = af.material_types
-    if af.jurisdiction:
-        cl.append(f"{p}jurisdiction = :af_jur")
-        params["af_jur"] = af.jurisdiction
-    if af.year_from is not None:
-        cl.append(f"COALESCE({p}published_date, {p}created_at::date) >= make_date(:af_yf, 1, 1)")
-        params["af_yf"] = af.year_from
-    if af.year_to is not None:
-        cl.append(f"COALESCE({p}published_date, {p}created_at::date) <= make_date(:af_yt, 12, 31)")
-        params["af_yt"] = af.year_to
-    return " AND " + " AND ".join(cl)
-
-
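-# ─── Safety library B-4: single licensed_restricted predicate (option a, U-2①; always applied) ──────
-def _license_sql(alias: str) -> str:
-    """Exclude licensed_restricted (extract_meta.license.restricted=true) documents from retrieval.
-
-    Option a: indexing is allowed, but the paths by which verbatim spans of purchased e-books or paid
-    material could enter RAG evidence or digest publication are blocked structurally. Every retrieval leg
-    and the digest loader share this single predicate, so no per-path guard can be missed ([[feedback_structural_integrity_over_path_discipline]]).
-    Personal file viewing (GET /documents/{id}?download) is allowed under option a, so it is not applied there.
-
-    Unlike the (conditional) axis filter, this is always applied. restricted missing/false is not excluded, via COALESCE →
-    results are unchanged on the existing corpus (0 rows with restricted=true). Returns ' AND ...' (alias='' = direct column).
-    Predicate definition = shared license_filter.restricted_exclude_sql (single source with digest/briefing/study explanations).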
-    """
-    from services.search.license_filter import restricted_exclude_sql
-    return " AND " + restricted_exclude_sql(alias)
-
-
 # Second-stage gate (R2-B1): final allowlist right before SQL string interpolation.
 _VALID_DOCS_TABLE = re.compile(r"^(documents|documents_cand_[a-z0-9_]+)$")
 # corpus_chunks = view over document_chunks WHERE in_corpus=true (Hier-Decomp-1 c2 choke point).
@@ -196,34 +137,6 @@ async def _embed_query_via_tei(endpoint: str, text_: str) -> list[float] | None:
        return None


-async def _embed_query_via_ollama(cfg: dict, text_: str) -> list[float] | None:
-    """Phase 2A candidate query embedding: Ollama /api/embed + asymmetric instruct prefix.
-
-    Query side only: prepends QWEN3_QUERY_INSTRUCT (document ingestion = plain).
-    When embed_dimensions is set (qwen4m), the Ollama dimensions option = MRL truncate + renormalize
-    (G-1 fixture: measured L2=1.0 on the 1024 output). No cache, since the distribution differs per slug.
-    """
-    if not text_:
-        return None
-    import httpx
-    body: dict = {"model": cfg["embed_model"], "input": [QWEN3_QUERY_INSTRUCT + text_]}
-    if cfg.get("embed_dimensions"):
-        body["dimensions"] = cfg["embed_dimensions"]
-    try:
-        async with httpx.AsyncClient(timeout=60.0) as c:
-            r = await c.post(cfg["embed_endpoint"], json=body)
-            r.raise_for_status()
-            embs = r.json().get("embeddings")
-        if not isinstance(embs, list) or not embs or not isinstance(embs[0], list):
-            raise ValueError("unexpected /api/embed shape")
-        return embs[0]
-    except Exception as exc:
-        logger.warning(
-            "candidate ollama embed failed model=%s err=%r", cfg.get("embed_model"), exc
-        )
-        return None
-
-
 def _query_embed_key(text_: str) -> str:
    return hashlib.sha256(f"{text_}|bge-m3".encode("utf-8")).hexdigest()

@@ -261,7 +174,7 @@ def query_embed_cache_stats() -> dict[str, int]:


 async def search_text(
-    session: AsyncSession, query: str, limit: int, *, axis: "AxisFilter | None" = None
+    session: AsyncSession, query: str, limit: int
 ) -> list["SearchResult"]:
    """FTS + trigram search with per-field weights (Phase 1.2-B UNION decomposition).

@@ -292,12 +205,8 @@ async def search_text(
    # Two executes within one SQLAlchemy async session use the same connection
    await session.execute(text("SELECT set_limit(0.15)"))

-    _params: dict[str, Any] = {"q": query, "limit": limit}
-    # license (always) + axis (conditional). license is always ' AND ...', so the WHERE clause always exists.
-    _where = _license_sql("d") + _axis_sql("d", axis, _params)
-
    result = await session.execute(
-        text(f"""
+        text("""
            WITH candidates AS (
                -- title trigram (idx_documents_title_trgm)
                SELECT id FROM documents
@@ -329,7 +238,7 @@ async def search_text(
                       + similarity(coalesce(d.ai_tags::text, ''), :q) * 2.5
                       + similarity(coalesce(d.user_note, ''), :q) * 2.0
                       + similarity(coalesce(d.ai_summary, ''), :q) * 1.5
-                       + similarity(left(coalesce(d.extracted_text, ''), 2000), :q) * 1.0
+                       + similarity(coalesce(d.extracted_text, ''), :q) * 1.0
                       -- FTS bonus (uses idx_documents_fts_full)
                       + coalesce(ts_rank(
                           to_tsvector('simple',
@@ -337,7 +246,7 @@ async def search_text(
                               coalesce(d.ai_tags::text, '') || ' ' ||
                               coalesce(d.ai_summary, '') || ' ' ||
                               coalesce(d.user_note, '') || ' ' ||
-                               left(coalesce(d.extracted_text, ''), 2000)
+                               coalesce(d.extracted_text, '')
                           ),
                           plainto_tsquery('simple', :q)
                       ), 0) * 2.0
@@ -348,17 +257,15 @@ async def search_text(
                       WHEN similarity(coalesce(d.ai_tags::text, ''), :q) >= 0.3 THEN 'tags'
                       WHEN similarity(coalesce(d.user_note, ''), :q) >= 0.3 THEN 'note'
                       WHEN similarity(coalesce(d.ai_summary, ''), :q) >= 0.3 THEN 'summary'
-                       WHEN similarity(left(coalesce(d.extracted_text, ''), 2000), :q) >= 0.3 THEN 'content'
+                       WHEN similarity(coalesce(d.extracted_text, ''), :q) >= 0.3 THEN 'content'
                       ELSE 'fts'
-                   END AS match_reason,
-                   d.material_type, d.jurisdiction, d.published_date
+                   END AS match_reason
            FROM documents d
            JOIN candidates c ON d.id = c.id
-            WHERE{_where[4:]}
            ORDER BY score DESC
            LIMIT :limit
        """),
-        _params,
+        {"q": query, "limit": limit},
    )
    return [SearchResult(**row._mapping) for row in result]

@@ -373,7 +280,6 @@ async def search_vector(
    snapshot_chunk_id_max: int | None = None,
    corpus_variant: str | None = None,
    exact_knn: bool = False,
-    axis: "AxisFilter | None" = None,
 ) -> list["SearchResult"]:
    """Hybrid vector search: simultaneous doc + chunks retrieval (Phase 1.2-G).

@@ -417,10 +323,7 @@ async def search_vector(
    else:
        docs_table = cfg["docs_table"]
        chunks_table = cfg["chunks_table"]
-        if cfg.get("embed_kind") == "ollama":
-            query_embedding = await _embed_query_via_ollama(cfg, query)
-        else:
-            query_embedding = await _embed_query_via_tei(cfg["embed_endpoint"], query)
+        query_embedding = await _embed_query_via_tei(cfg["embed_endpoint"], query)

    logger.info(
        "[embedding-dispatch] backend=%s docs_table=%s chunks_table=%s snapshot_doc_id_max=%s "
@@ -448,7 +351,6 @@ async def search_vector(
                docs_table=docs_table,
                snapshot_doc_id_max=snapshot_doc_id_max,
                exact_knn=exact_knn,
-                axis=axis,
            )

    async def _chunks_call() -> list["SearchResult"]:
@@ -458,7 +360,6 @@ async def search_vector(
                chunks_table=chunks_table,
                snapshot_chunk_id_max=snapshot_chunk_id_max,
                exact_knn=exact_knn,
-                axis=axis,
            )

    doc_results, chunk_results = await asyncio.gather(_docs_call(), _chunks_call())
@@ -474,7 +375,6 @@ async def _search_vector_docs(
    docs_table: str = "documents",
    snapshot_doc_id_max: int | None = None,
    exact_knn: bool = False,
-    axis: "AxisFilter | None" = None,
 ) -> list["SearchResult"]:
    """Direct search on documents (or documents_cand_<slug>).embedding.

@@ -499,34 +399,28 @@ async def _search_vector_docs(
        if snapshot_doc_id_max is not None:
            snapshot_clause = " AND id <= :snapshot_doc_id_max"
            params["snapshot_doc_id_max"] = snapshot_doc_id_max
-        axis_clause = _axis_sql("", axis, params)   # no alias (single FROM documents)
-        license_clause = _license_sql("")            # B-4: always exclude restricted
        sql = f"""
            SELECT id, title, ai_domain, ai_summary, file_format,
                   (1 - (embedding <=> cast(:embedding AS vector))) AS score,
                   left(extracted_text, 1200) AS snippet,
                   'vector_doc' AS match_reason,
-                   NULL::bigint AS chunk_id, NULL::integer AS chunk_index, NULL::text AS section_title,
-                   material_type, jurisdiction, published_date
+                   NULL::bigint AS chunk_id, NULL::integer AS chunk_index, NULL::text AS section_title
            FROM documents
-            WHERE embedding IS NOT NULL AND deleted_at IS NULL{snapshot_clause}{axis_clause}{license_clause}
+            WHERE embedding IS NOT NULL AND deleted_at IS NULL{snapshot_clause}
            ORDER BY embedding <=> cast(:embedding AS vector)
            LIMIT :limit
        """
    else:
        # candidate: docs_table holds only (doc_id, embed_input, embed_input_hash, embedding) → JOIN documents
-        axis_clause = _axis_sql("d", axis, params)
-        license_clause = _license_sql("d")           # B-4: always exclude restricted
        sql = f"""
            SELECT d.id, d.title, d.ai_domain, d.ai_summary, d.file_format,
                   (1 - (c.embedding <=> cast(:embedding AS vector))) AS score,
                   left(d.extracted_text, 1200) AS snippet,
                   'vector_doc' AS match_reason,
-                   NULL::bigint AS chunk_id, NULL::integer AS chunk_index, NULL::text AS section_title,
-                   d.material_type, d.jurisdiction, d.published_date
+                   NULL::bigint AS chunk_id, NULL::integer AS chunk_index, NULL::text AS section_title
            FROM {docs_table} c
            JOIN documents d ON d.id = c.doc_id
-            WHERE d.deleted_at IS NULL{axis_clause}{license_clause}
+            WHERE d.deleted_at IS NULL
            ORDER BY c.embedding <=> cast(:embedding AS vector)
            LIMIT :limit
        """
@@ -542,7 +436,6 @@ async def _search_vector_chunks(
    chunks_table: str = "document_chunks",
    snapshot_chunk_id_max: int | None = None,
    exact_knn: bool = False,
-    axis: "AxisFilter | None" = None,
 ) -> list["SearchResult"]:
    """document_chunks (or document_chunks_cand_<slug>).embedding window partition.

@@ -568,25 +461,12 @@ async def _search_vector_chunks(
        snapshot_clause = " AND c.id <= :snapshot_chunk_id_max"
        params["snapshot_chunk_id_max"] = snapshot_chunk_id_max

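-    # C-1: the axis filter is JOINed into the inner topk (R6 decision: an outer post-filter would filter after the ANN
-    # top-:inner_k candidates are picked, collapsing the candidate set under narrow filters such as GB law). When unset, no JOIN = byte-identical.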
-    if axis and axis.active():
-        chunk_join = " JOIN documents df ON df.id = c.doc_id"
-        chunk_axis = _axis_sql("df", axis, params)
-    else:
-        chunk_join = ""
-        chunk_axis = ""
-
-    # B-4: exclude restricted. The outer query always JOINs documents d, so this sits at the post-rank position.
-    # restricted rows are few (purchased material), so excluding them after the inner topk causes no candidate collapse (unlike axis).
-    license_clause = _license_sql("d")
-
    sql = f"""
        WITH topk AS (
            SELECT c.id AS chunk_id, c.doc_id, c.chunk_index, c.section_title, c.text,
                   c.embedding <=> cast(:embedding AS vector) AS dist
-            FROM {chunks_table} c{chunk_join}
-            WHERE c.embedding IS NOT NULL{snapshot_clause}{chunk_axis}
+            FROM {chunks_table} c
+            WHERE c.embedding IS NOT NULL{snapshot_clause}
            ORDER BY c.embedding <=> cast(:embedding AS vector)
            LIMIT :inner_k
        ),
@@ -599,12 +479,10 @@ async def _search_vector_chunks(
               d.ai_summary AS ai_summary, d.file_format AS file_format,
               (1 - r.dist) AS score, left(r.text, 1200) AS snippet,
               'vector_chunk' AS match_reason,
-               r.chunk_id AS chunk_id, r.chunk_index AS chunk_index, r.section_title AS section_title,
-               d.material_type AS material_type, d.jurisdiction AS jurisdiction,
-               d.published_date AS published_date
+               r.chunk_id AS chunk_id, r.chunk_index AS chunk_index, r.section_title AS section_title
        FROM ranked r
        JOIN documents d ON d.id = r.doc_id
-        WHERE r.rn <= 2 AND d.deleted_at IS NULL{license_clause}
+        WHERE r.rn <= 2 AND d.deleted_at IS NULL
        ORDER BY r.dist
        LIMIT :limit
    """
@@ -32,8 +32,6 @@ from typing import TYPE_CHECKING, Literal

 from sqlalchemy.ext.asyncio import AsyncSession

-from core.database import async_session
-
 from . import query_analyzer, query_rewriter
 from .fusion_service import (
    DEFAULT_FUSION,
@@ -49,7 +47,6 @@ from .rerank_service import (
    rerank_chunks,
 )
 from .retrieval_service import (
-    AxisFilter,
    compress_chunks_to_docs,
    search_text,
    search_vector,
@@ -151,7 +148,6 @@ async def run_search(
    rewrite_backend: str | None = None,
    corpus_variant: str | None = None,
    exact_knn: bool = False,
-    axis: AxisFilter | None = None,
 ) -> PipelineResult:
    """Run the search pipeline.

@@ -190,7 +186,6 @@ async def run_search(
            snapshot_chunk_id_max=snapshot_chunk_id_max,
            reranker_backend=reranker_backend,
            rewrite_backend=rewrite_backend,
-            axis=axis,
        )

    timing: dict[str, float] = {}
@@ -280,7 +275,6 @@ async def run_search(
                snapshot_chunk_id_max=snapshot_chunk_id_max,
                corpus_variant=corpus_variant,
                exact_knn=exact_knn,
-                axis=axis,
            )
        timing["vector_ms"] = (time.perf_counter() - t0) * 1000
        if not raw_chunks:
@@ -290,7 +284,7 @@ async def run_search(
        results = vector_results
    else:
        t0 = time.perf_counter()
-        text_results = await search_text(session, q, limit, axis=axis)
+        text_results = await search_text(session, q, limit)
        timing["text_ms"] = (time.perf_counter() - t0) * 1000

        if mode == "hybrid":
@@ -312,7 +306,6 @@ async def run_search(
                    snapshot_chunk_id_max=snapshot_chunk_id_max,
                    corpus_variant=corpus_variant,
                    exact_knn=exact_knn,
-                    axis=axis,
                )
            timing["vector_ms"] = (time.perf_counter() - t1) * 1000

@@ -465,10 +458,6 @@ def _rrf_fuse_variants(
            score=rrf_score,
            snippet=doc.snippet,
            match_reason=f"{doc.match_reason}+multi_query_rrf",
-            # C-1: propagate classification-axis metadata (SearchResult reconstruction point; kept in sync with the 2 fusion sites)
-            material_type=doc.material_type,
-            jurisdiction=doc.jurisdiction,
-            published_date=doc.published_date,
        ))
    return fused[:limit]

@@ -539,7 +528,6 @@ async def search_with_rewrite(
    snapshot_chunk_id_max: int | None,
    reranker_backend: str | None,
    rewrite_backend: str,
-    axis: "AxisFilter | None" = None,
 ) -> PipelineResult:
    """Phase 2Q multi-query retrieval synthesis path (plan v6 §5.5).

@@ -583,20 +571,13 @@ async def search_with_rewrite(
    async def _variant_retrieve(
        v: str,
    ) -> "tuple[list[SearchResult], list[SearchResult], dict[int, list[SearchResult]]]":
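-        # Independent AsyncSession per variant (fan-out). Handing a shared session to concurrent
-        # executes via asyncio.gather makes SQLAlchemy async crash with 'another operation in progress',
-        # nondeterministically and depending on load, so each variant gets its own connection.
-        # axis (material_type/jurisdiction/year) is also passed exactly as on the single-query path
-        # (fixes a defect where the rewrite path silently dropped the axis filter).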
-        async with async_session() as vsession:
-            text = await search_text(vsession, v, per_variant_k, axis=axis)
-            raw_chunks = await search_vector(
-                vsession, v, per_variant_k,
-                embedding_backend=embedding_backend,
-                snapshot_doc_id_max=snapshot_doc_id_max,
-                snapshot_chunk_id_max=snapshot_chunk_id_max,
-                axis=axis,
-            )
+        text = await search_text(session, v, per_variant_k)
+        raw_chunks = await search_vector(
+            session, v, per_variant_k,
+            embedding_backend=embedding_backend,
+            snapshot_doc_id_max=snapshot_doc_id_max,
+            snapshot_chunk_id_max=snapshot_chunk_id_max,
+        )
        vector, chunks_by_doc = compress_chunks_to_docs(raw_chunks, per_variant_k)
        return text, vector, chunks_by_doc

@@ -47,7 +47,7 @@ logger = setup_logger("synthesis")

 # ─── Constants (permanent plan rules) ─────────────────────────────────
 PROMPT_VERSION = "v2"
-LLM_TIMEOUT_MS = 120000  # 2026-06-11 switch to Qwen3.6-27B-6bit: measured prefill ~112 tok/s, decode ~11.7 tok/s; at 30s synthesis (the answer body) times out constantly. synthesis cannot be gracefully skipped (= answer failure), so only it is raised, aligned with config ask.backend.timeout_read_s=120
+LLM_TIMEOUT_MS = 30000  # 2026-05-17 B-3: at 15s, concurrent load (Mac mini 26B classifier+evidence+synthesis serialized) caused frequent timeouts; aligned with classifier (30s)
 CACHE_TTL = 3600  # 1h (answers are sensitive to source changes → shorter than query_analyzer's 24h)
 CACHE_MAXSIZE = 300
 MAX_ANSWER_CHARS = 600
@@ -95,10 +95,8 @@ except FileNotFoundError:
    )


-# ─── in-memory cache (FIFO eviction + TTL, copied from the query_analyzer pattern) ─
-# R10: store (ts, result). Fixes a defect where, with no TTL applied, a stale answer was returned
-# until CACHE_MAXSIZE filled up even after the source was edited. Copies query_rewriter's canonical expire_at TTL enforcement.
-_CACHE: dict[str, tuple[float, SynthesisResult]] = {}
+# ─── in-memory LRU (FIFO approximation, copied from the query_analyzer pattern) ─
+_CACHE: dict[str, SynthesisResult] = {}


 def _model_version() -> str:
@@ -124,11 +122,10 @@ def get_cached(query: str, chunk_ids: list[int], backend_name: str = "gemma-macm
    entry = _CACHE.get(key)
    if entry is None:
        return None
-    ts, result = entry
-    if time.time() - ts > CACHE_TTL:
-        _CACHE.pop(key, None)  # expired: delete, then report a miss
-        return None
-    return result
+    # A TTL check cannot piggyback on elapsed_ms, so it would need separately stored state
+    # Handled here with a simple policy: return the entry if present (eviction happens at FIFO time)
+    # An exact TTL would require storing a (ts, result) tuple.
+    return entry


 def _should_cache(result: SynthesisResult) -> bool:
@@ -146,9 +143,8 @@ def set_cached(query: str, chunk_ids: list[int], result: SynthesisResult, backen
    if not _should_cache(result):
        return
    key = _cache_key(query, chunk_ids, backend_name)
-    now = time.time()
    if key in _CACHE:
-        _CACHE[key] = (now, result)
+        _CACHE[key] = result
        return
    if len(_CACHE) >= CACHE_MAXSIZE:
        try:
@@ -156,7 +152,7 @@ def set_cached(query: str, chunk_ids: list[int], result: SynthesisResult, backen
            _CACHE.pop(oldest, None)
        except StopIteration:
            pass
-    _CACHE[key] = (now, result)
+    _CACHE[key] = result


 def cache_stats() -> dict[str, int]:
@@ -11,7 +11,7 @@
 ## Core principles
 - **No refuse on verifier strong alone**: refuse only when it coincides with grounding strong
 - **Timeout 3s**: if it is slow, better to go without it (fail open)
- MLX gate used (Mac mini 26B endpoint; shares the same gate as classifier/evidence to prevent concurrent races)
+- MLX gate not used (Mac mini 26B endpoint since PR #20; concurrent safety to be reviewed separately)
 """

 from __future__ import annotations
@@ -25,7 +25,6 @@ from typing import TYPE_CHECKING, Literal
 from ai.client import AIClient, _load_prompt, parse_json_response
 from core.config import settings
 from core.utils import setup_logger
-from .llm_gate import Priority, acquire_mlx_gate

 if TYPE_CHECKING:
    from .evidence_service import EvidenceItem
@@ -133,9 +132,8 @@ async def verify(
    prompt = _build_input(answer, evidence)
    client = AIClient()
    try:
-        async with acquire_mlx_gate(Priority.FOREGROUND):
-            async with asyncio.timeout(LLM_TIMEOUT_MS / 1000):
-                raw = await client.call_verifier(prompt)
+        async with asyncio.timeout(LLM_TIMEOUT_MS / 1000):
+            raw = await client._request(settings.ai.verifier, prompt)
        _failure_count = 0
    except asyncio.TimeoutError:
        _failure_count += 1
@@ -1,39 +0,0 @@
-"""Storage layer abstraction package (plan ds-s1-backend-1 group D, scaffold-first).
-
-Active backend selection = get_storage_backend():
-  - Decided by env DS_STORAGE_BACKEND (default 'local'), so it works without editing the config.yaml storage
-    section (config is off-limits during the search-experiment Soft Lock). Real activation (external backend) is D-3.
-  - 'local'        → LocalBackend(settings.nas_mount_path)  : current NAS NFS, /file behavior unchanged.
-  - 'nas_api'/'nas'→ NasApiBackend(env DS_NAS_API_BASE_URL) : 503 when not provisioned (no silent fallback).
-"""
-
-from __future__ import annotations
-
-import os
-from functools import lru_cache
-
-from core.config import settings
-
-from .base import StatResult, StorageBackend, StorageNotConfigured
-from .local import LocalBackend
-from .nas_api import NasApiBackend
-
-__all__ = [
-    "StorageBackend",
-    "StorageNotConfigured",
-    "StatResult",
-    "LocalBackend",
-    "NasApiBackend",
-    "get_storage_backend",
-]
-
-
-@lru_cache(maxsize=1)
-def get_storage_backend() -> StorageBackend:
-    """Return the single active storage backend (cached per process)."""
-    backend = os.getenv("DS_STORAGE_BACKEND", "local").lower()
-    if backend == "local":
-        return LocalBackend(settings.nas_mount_path)
-    if backend in ("nas_api", "nas"):
-        return NasApiBackend(os.getenv("DS_NAS_API_BASE_URL"))
-    raise StorageNotConfigured(f"unknown DS_STORAGE_BACKEND={backend!r}")
@@ -1,50 +0,0 @@
-"""Abstract storage backend interface (plan ds-s1-backend-1 D-1).
-
-The ABC includes the Range (offset/length) stream contract from day one, so that D-2's remote streaming
-Range pass-through is an interface obligation rather than an afterthought.
-
-is_local=True backends expose a local filesystem path → callers use Starlette FileResponse
-(automatic Range handling) as-is. Remote backends implement Range via stream()/stat().
-"""
-
-from __future__ import annotations
-
-import os
-from abc import ABC, abstractmethod
-from collections.abc import AsyncIterator
-from dataclasses import dataclass
-
-
-class StorageNotConfigured(RuntimeError):
-    """Call to a backend that is not activated (not provisioned); surfaced as 503. No silent fallback."""
-
-
-@dataclass
-class StatResult:
-    exists: bool
-    size: int
-
-
-class StorageBackend(ABC):
-    """Abstract interface for accessing original files."""
-
-    # Whether a local filesystem path is exposed (i.e. whether FileResponse can be wired directly).
-    is_local: bool = False
-
-    @abstractmethod
-    def local_path(self, rel_path: str) -> os.PathLike[str] | None:
-        """Return the physical path if is_local=True (for FileResponse). Remote backends return None."""
-
-    @abstractmethod
-    async def stat(self, rel_path: str) -> StatResult:
-        """Size and existence. Unconfigured backends raise StorageNotConfigured."""
-
-    @abstractmethod
-    def stream(
-        self, rel_path: str, *, start: int | None = None, end: int | None = None
-    ) -> AsyncIterator[bytes]:
-        """Yield the [start, end] byte range (inclusive) as async chunks (Range pass-through).
-
-        If start/end are None, the whole file. Unconfigured backends raise StorageNotConfigured.
-        """
-        raise NotImplementedError
@@ -1,51 +0,0 @@
-"""LocalBackend: the current NAS NFS (volume4) mount. /file behavior unchanged (plan D-1)."""
-
-from __future__ import annotations
-
-import asyncio
-import os
-from collections.abc import AsyncIterator
-from pathlib import Path
-
-from .base import StatResult, StorageBackend
-
-_STREAM_CHUNK = 256 * 1024
-
-
-class LocalBackend(StorageBackend):
-    """Resolve relative paths under the root (=settings.nas_mount_path) against the local filesystem."""
-
-    is_local = True
-
-    def __init__(self, root: str) -> None:
-        self._root = Path(root)
-
-    def local_path(self, rel_path: str) -> os.PathLike[str]:
-        return self._root / rel_path
-
-    async def stat(self, rel_path: str) -> StatResult:
-        p = self._root / rel_path
-        if not p.exists():
-            return StatResult(exists=False, size=0)
-        return StatResult(exists=True, size=p.stat().st_size)
-
-    async def stream(
-        self, rel_path: str, *, start: int | None = None, end: int | None = None
-    ) -> AsyncIterator[bytes]:
-        """Stream a local file in chunks (Range supported). On /file's local path FileResponse handles
-        Range automatically, so this method exists for interface symmetry and parity with remote backends."""
-        p = self._root / rel_path
-        with p.open("rb") as f:
-            if start:
-                f.seek(start)
-            remaining = None if end is None else (end - (start or 0) + 1)
-            while True:
-                to_read = _STREAM_CHUNK if remaining is None else min(_STREAM_CHUNK, remaining)
-                if to_read <= 0:
-                    break
-                data = await asyncio.to_thread(f.read, to_read)
-                if not data:
-                    break
-                yield data
-                if remaining is not None:
-                    remaining -= len(data)
@@ -1,33 +0,0 @@
-"""NasApiBackend: stub for external storage (Mac mini 4TB / NAS Docker API) (plan D-1).
-
-★ Not provisioned = 503. No silent fallback (no automatic detour to another backend). Activated in D-3
-after real provisioning. Updating infra_inventory.md (Update Rule) comes first.
-"""
-
-from __future__ import annotations
-
-import os
-from collections.abc import AsyncIterator
-
-from .base import StatResult, StorageBackend, StorageNotConfigured
-
-_MSG = "NasApiBackend 미구성 — 외부 스토리지 프로비전 후 활성(D-3). silent fallback 없음."
-
-
-class NasApiBackend(StorageBackend):
-    is_local = False
-
-    def __init__(self, base_url: str | None = None) -> None:
-        self._base_url = base_url
-
-    def local_path(self, rel_path: str) -> os.PathLike[str] | None:
-        return None
-
-    async def stat(self, rel_path: str) -> StatResult:
-        raise StorageNotConfigured(_MSG)
-
-    async def stream(
-        self, rel_path: str, *, start: int | None = None, end: int | None = None
-    ) -> AsyncIterator[bytes]:
-        raise StorageNotConfigured(_MSG)
-        yield b""  # unreachable: keeps this an async generator (matches the caller's `async for` contract).
@@ -24,7 +24,6 @@ from models.chunk import DocumentChunk
 from models.document import Document
 from models.study_question import StudyQuestion
 from models.study_topic import StudyTopicDocument
-from services.search.license_filter import restricted_exclude_orm

 logger = logging.getLogger(__name__)

@@ -125,14 +124,11 @@ async def _gather_document_evidence(
        return []

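    # metadata of the mapped documents (for showing title and summary)
-    # B-4: exclude licensed_restricted → dropped from valid_doc_ids, so the chunk query below (doc_id IN) is
-    # blocked automatically too. The study explanation RAG shares the same single predicate as retrieval/digest (option a, U-2①).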
    doc_meta_rows = (
        await session.execute(
            select(Document.id, Document.title, Document.ai_summary).where(
                Document.id.in_(doc_ids),
                Document.deleted_at.is_(None),
-                restricted_exclude_orm(),
            )
        )
    ).all()
@@ -252,15 +248,12 @@ async def gather_explanation_context(
    client = AIClient()
    query = _build_query(question)
    try:
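-        # Handing the same AsyncSession to concurrent executes via asyncio.gather makes SQLAlchemy async
-        # crash with 'another operation in progress', nondeterministically and depending on load (the earlier
-        # comment 'no lock conflict' was mistaken: it looked only at the rerank HTTP calls and overlooked DB execute
-        # concurrency). This is a background prefetch, so it is serialized; a different remedy from the user-facing rewrite path (independent-session fan-out).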
-        docs = await _gather_document_evidence(
-            session, user_id, question.study_topic_id, query, client
-        )
-        questions = await _gather_question_evidence(
-            session, user_id, question.study_topic_id, question.id, query, client
+        # parallelize the two lookups (the rerank calls are separate, so no lock conflict)
+        docs, questions = await asyncio.gather(
+            _gather_document_evidence(session, user_id, question.study_topic_id, query, client),
+            _gather_question_evidence(
+                session, user_id, question.study_topic_id, question.id, query, client
+            ),
        )
        return ExplanationContext(documents=docs, questions=questions)
    finally:
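The hunk above swaps two sequential awaits for `asyncio.gather` over a single shared session, which is the pattern the removed comment warns about. A minimal sketch of that failure mode, assuming a hypothetical `FakeSession` that permits one operation at a time (it is not SQLAlchemy's `AsyncSession`, only a stand-in with the same one-at-a-time constraint):

```python
import asyncio


class FakeSession:
    """Hypothetical stand-in for a session that allows one operation at a time."""

    def __init__(self) -> None:
        self._busy = False

    async def execute(self, label: str) -> str:
        if self._busy:
            raise RuntimeError("another operation is in progress")
        self._busy = True
        try:
            await asyncio.sleep(0)  # yield control, as a real DB round trip would
            return label
        finally:
            self._busy = False


async def gather_shared(session: FakeSession) -> list:
    # Both coroutines touch the same session concurrently.
    return await asyncio.gather(
        session.execute("docs"),
        session.execute("questions"),
        return_exceptions=True,
    )


async def sequential(session: FakeSession) -> list:
    # One execute at a time on the shared session.
    docs = await session.execute("docs")
    questions = await session.execute("questions")
    return [docs, questions]
```

In this sketch the second coroutine under `gather_shared` hits the busy flag and fails, while `sequential` completes both calls. Giving each concurrent task its own session would be the other way out, at the cost of extra connections.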
@@ -1,25 +0,0 @@
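-"""Single source of truth for grading (outcome) derivation (study-to-viewer P2).
-
-The live attempt endpoint (submit_attempt) and the viewer ingest grade with the **same function**, so
-the correct/wrong vocabulary is decided in one place, the server (plan r2: ingest carries only the raw
-signals selected+unsure and DS derives the outcome, which makes 'unmodified replay' actually hold). correct_choice is always the current DB value.
-
-Rules (same as live study_questions.py:1008-1020):
-  is_unsure=True  → (None, False, 'unsure')   # unsure overrides correctness; selected is discarded
-  selected None   → ValueError                # no selection and not unsure is invalid (the endpoint handles it)
-  otherwise       → selected==correct → (selected, is_correct, 'correct'|'wrong')
-"""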
-
-from __future__ import annotations
-
-
-def derive_outcome(
-    selected_choice: int | None, is_unsure: bool, correct_choice: int
-) -> tuple[int | None, bool, str]:
-    """Return (selected, is_correct, outcome). 'skipped' never comes from here (with no selection the caller rejects or skips)."""
-    if is_unsure:
-        return None, False, "unsure"
-    if selected_choice is None:
-        raise ValueError("selected_choice (1~4) 또는 is_unsure=true 가 필요합니다")
-    is_correct = selected_choice == correct_choice
-    return selected_choice, is_correct, ("correct" if is_correct else "wrong")
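The three rules in the removed docstring can be exercised directly. The sketch below restates `derive_outcome` from the removed lines so that it runs on its own; the English error message is an assumption made for readability, not the original string.

```python
from __future__ import annotations


def derive_outcome(
    selected_choice: int | None, is_unsure: bool, correct_choice: int
) -> tuple[int | None, bool, str]:
    """Return (selected, is_correct, outcome), following the rules listed above."""
    if is_unsure:
        # 'unsure' overrides correctness and discards the selection.
        return None, False, "unsure"
    if selected_choice is None:
        # Assumed English wording; the original message is in Korean.
        raise ValueError("selected_choice (1-4) or is_unsure=true is required")
    is_correct = selected_choice == correct_choice
    return selected_choice, is_correct, ("correct" if is_correct else "wrong")


print(derive_outcome(2, False, 2))  # selection matches the answer
print(derive_outcome(3, False, 2))  # selection differs from the answer
print(derive_outcome(3, True, 2))   # unsure wins even though a choice was sent
```

Because both the live endpoint and the viewer ingest would call this one function, a replayed attempt is graded against whatever `correct_choice` the database holds at replay time.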
@@ -1,174 +0,0 @@
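-"""Publish outbox enqueue + initial backfill (docsrv-viewer-publish).
-
-enqueue_publish: called by the authoring/4-A transaction on the same session (= same Postgres tx); the caller commits
-  (P0-1 rule: content change and outbox INSERT are atomic, avoiding dual writes). Snapshots payload/hash.
-enqueue_question_publish: enqueues the question plus its explanation (if ready). Shared by authoring writes, 4-A completion, and backfill.
-backfill_publish_questions: one-time bounded outbox load of existing active questions (initial backfill, P2-1 bounded page).
-  Idempotent: the publish worker's (payload_hash, deleted) dedup absorbs no-op re-projections (duplicate enqueues are harmless).
-
-★Note: wiring enqueue into the authoring endpoints (study_questions create/update) and the 4-A worker is P0-1b
-  (a separate increment, since it edits existing hot files). This module is the call library plus the manual/backfill entry point.
-"""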
-
-from __future__ import annotations
-
-from typing import Any
-
-from sqlalchemy import select
-from sqlalchemy.ext.asyncio import AsyncSession
-
-from models.published import PublishOutbox
-from models.study_memo_card import StudyMemoCard
-from models.study_memo_card_progress import StudyMemoCardProgress
-from models.study_question import StudyQuestion
-from models.study_topic import StudyTopic
-from services.study.publish_projection import (
-    KIND_CARD,
-    KIND_CARD_PROGRESS,
-    KIND_EXPLANATION,
-    KIND_QUESTION,
-    KIND_TOPIC,
-    SCHEMA_VERSION,
-    payload_hash,
-    project_card,
-    project_card_progress,
-    project_explanation,
-    project_question,
-    project_topic,
-)
-
-
-async def enqueue_publish(
-    session: AsyncSession,
-    *,
-    kind: str,
-    source_id: int,
-    payload: dict[str, Any] | None,
-    deleted: bool = False,
-) -> None:
-    """INSERT one outbox row. The caller commits (the row rides in the author's tx). deleted=True means a tombstone (payload={})."""
-    body: dict[str, Any] = payload if payload is not None else {}
-    session.add(
-        PublishOutbox(
-            kind=kind,
-            source_id=source_id,
-            payload=body,
-            payload_hash=payload_hash(body),
-            schema_version=SCHEMA_VERSION,
-            deleted=deleted,
-        )
-    )
-
-
-async def enqueue_question_publish(session: AsyncSession, q: Any) -> None:
-    """Enqueue the question plus its explanation (if ready) to the outbox. Caller commits."""
-    await enqueue_publish(session, kind=KIND_QUESTION, source_id=q.id, payload=project_question(q))
-    expl = project_explanation(q)
-    if expl is not None:
-        await enqueue_publish(session, kind=KIND_EXPLANATION, source_id=q.id, payload=expl)
-
-
-async def backfill_publish_questions(session: AsyncSession, *, after_id: int = 0, limit: int = 200) -> int:
-    """Enqueue active (not deleted) questions with id > after_id to the outbox, bounded by limit.
-
-    Returns the number of questions enqueued (0 means done). For large sets, repeat pages from the last id. Caller commits.
-    """
-    rows = (
-        await session.execute(
-            select(StudyQuestion)
-            .where(StudyQuestion.deleted_at.is_(None), StudyQuestion.id > after_id)
-            .order_by(StudyQuestion.id.asc())
-            .limit(limit)
-        )
-    ).scalars().all()
-    for q in rows:
-        await enqueue_question_publish(session, q)
-    return len(rows)
-
-
-async def enqueue_topic_publish(session: AsyncSession, topic: Any) -> None:
-    """Enqueue topic metadata to the outbox (S-1). Caller commits. Shared by authoring create/update wiring and backfill."""
-    await enqueue_publish(session, kind=KIND_TOPIC, source_id=topic.id, payload=project_topic(topic))
-
-
-async def backfill_publish_topics(session: AsyncSession, *, after_id: int = 0, limit: int = 200) -> int:
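-    """Enqueue active (not deleted) topics with id > after_id to the outbox, bounded by limit (S-1 initial backfill).
-
-    Returns the number of topics enqueued (0 means done). For large sets, repeat pages from the last id. Caller commits.
-    Idempotent: the publish worker's (payload_hash, deleted) dedup absorbs no-op re-projections (duplicate enqueues are harmless).
-    """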
-    rows = (
-        await session.execute(
-            select(StudyTopic)
-            .where(StudyTopic.deleted_at.is_(None), StudyTopic.id > after_id)
-            .order_by(StudyTopic.id.asc())
-            .limit(limit)
-        )
-    ).scalars().all()
-    for t in rows:
-        await enqueue_topic_publish(session, t)
-    return len(rows)
-
-
-async def enqueue_card_publish(session: AsyncSession, card: Any) -> None:
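-    """Publish or tombstone based on card state (S-2). Caller commits.
-
-    Only reviewed (needs_review=False) and not-deleted cards are published; everything else (back to
-    pending review, deleted, retired) becomes a tombstone (a first-class delete event in the feed). Eligibility is tied to card state, so callers only
-    need to know 'the card was touched'; the publish/tombstone branch lives here alone (no per-path guards to remember).
-    """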
-    if card.deleted_at is not None or card.needs_review:
-        await enqueue_publish(session, kind=KIND_CARD, source_id=card.id, payload=None, deleted=True)
-    else:
-        await enqueue_publish(session, kind=KIND_CARD, source_id=card.id, payload=project_card(card))
-
-
-async def backfill_publish_cards(session: AsyncSession, *, after_id: int = 0, limit: int = 200) -> int:
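-    """Enqueue reviewed (needs_review=False), not-deleted cards with id > after_id to the outbox, bounded by limit (S-2 initial backfill).
-
-    Returns the number of cards enqueued (0 means done). Idempotent via the worker's (payload_hash, deleted) dedup. Caller commits.
-    """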
-    rows = (
-        await session.execute(
-            select(StudyMemoCard)
-            .where(
-                StudyMemoCard.deleted_at.is_(None),
-                StudyMemoCard.needs_review.is_(False),
-                StudyMemoCard.id > after_id,
-            )
-            .order_by(StudyMemoCard.id.asc())
-            .limit(limit)
-        )
-    ).scalars().all()
-    for c in rows:
-        await enqueue_card_publish(session, c)
-    return len(rows)
-
-
-async def enqueue_card_progress_publish(session: AsyncSession, progress: Any) -> None:
-    """Publish a card SR progress row (S-4). Caller commits. Covers rate_card results (ALL rows, including sentinel/terminal)."""
-    await enqueue_publish(
-        session,
-        kind=KIND_CARD_PROGRESS,
-        source_id=progress.id,
-        payload=project_card_progress(progress),
-    )
-
-
-async def backfill_publish_card_progress(session: AsyncSession, *, after_id: int = 0, limit: int = 200) -> int:
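-    """Enqueue every card progress row with id > after_id to the outbox, bounded by limit (S-4 initial backfill).
-
-    ★No filter = ALL rows (including due_at NULL sentinel and terminal rows); a due-only backfill would miss sentinels.
-    Returns the number of rows enqueued (0 means done). Idempotent via worker dedup. Caller commits.
-    """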
-    rows = (
-        await session.execute(
-            select(StudyMemoCardProgress)
-            .where(StudyMemoCardProgress.id > after_id)
-            .order_by(StudyMemoCardProgress.id.asc())
-            .limit(limit)
-        )
-    ).scalars().all()
-    for p in rows:
-        await enqueue_card_progress_publish(session, p)
-    return len(rows)
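The backfill helpers above page by `id > after_id ORDER BY id LIMIT limit` and tell the caller to repeat from the last id. They return only a count, so the driver loop below is a sketch under an assumption: a hypothetical `fetch_page` that hands back the ids of the page, from which the next `after_id` is taken. The in-memory list stands in for the table.

```python
def fetch_page(ids: list[int], after_id: int, limit: int) -> list[int]:
    """Hypothetical stand-in for `WHERE id > after_id ORDER BY id LIMIT limit`."""
    return sorted(i for i in ids if i > after_id)[:limit]


def backfill_all(ids: list[int], limit: int = 200) -> list[list[int]]:
    """Walk the whole id space in bounded pages using keyset pagination."""
    pages: list[list[int]] = []
    after_id = 0
    while True:
        page = fetch_page(ids, after_id, limit)
        if not page:
            break  # an empty page means the backfill is finished
        pages.append(page)
        after_id = page[-1]  # resume strictly after the last id seen
    return pages


print(backfill_all([1, 2, 5, 9, 10], limit=2))
```

Resuming from the last id seen, rather than from an offset, keeps every page bounded and tolerates gaps in the id sequence, which matters for the filtered card backfill where ids are not contiguous.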
@@ -1,112 +0,0 @@
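-"""Publish projection: turns source rows into a render-ready payload plus a stable hash (pure functions).
-
-The viewer's 'single source of truth' extends only as far as this payload (DS-internal experimental schemas stay isolated behind the contract).
-One projector per kind. payload_hash = sha256 of the sorted JSON = the (payload_hash, deleted) dedup key.
-
-★Note (plan study-to-viewer-slice1 r2): subject/exam metadata is inlined into the per-question payload,
-  so a bulk subject rename churns N rows. Normalizing (subject as a separate kind with a subject ref) is a follow-up churn optimization (P0-1b)
-  and does not affect read consistency. Kept inline for now (simpler correlation); to be split out in a later PR.
-SCHEMA_VERSION = envelope version. Bump it when the payload shape evolves; the viewer accepts a range (P0-2).
-"""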
-
-from __future__ import annotations
-
-import hashlib
-import json
-from typing import Any
-
-SCHEMA_VERSION = 1
-
-KIND_QUESTION = "study_question"
-KIND_EXPLANATION = "study_explanation"
-KIND_TOPIC = "study_topic"
-KIND_CARD = "study_card"  # ★Must match KIND_CARD in the viewer's pubstudy.ts (S-3 forward contract).
-KIND_CARD_PROGRESS = "study_card_progress"  # Card SR state read model (S-4, consumed by viewer C-4).
-
-
-def payload_hash(payload: dict[str, Any]) -> str:
-    """sha256 of the sorted JSON: the (payload_hash, deleted) dedup key. Independent of key order and whitespace."""
-    canonical = json.dumps(payload, ensure_ascii=False, sort_keys=True, separators=(",", ":"))
-    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()
-
-
-def project_question(q: Any) -> dict[str, Any]:
-    """study_question → publish payload. Includes the answer (personal study tool, plan Q2). Images are refs only (P0-4, follow-up)."""
-    return {
-        "topic_id": q.study_topic_id,
-        "question_text": q.question_text,
-        "choices": [q.choice_1, q.choice_2, q.choice_3, q.choice_4],
-        "correct_choice": q.correct_choice,
-        "subject": q.subject,
-        "scope": q.scope,
-        "exam_name": q.exam_name,
-        "exam_round": q.exam_round,
-        "exam_question_number": q.exam_question_number,
-        "explanation": q.explanation,  # Manual explanation (if any). The AI explanation is a separate kind.
-    }
-
-
-def project_explanation(q: Any) -> dict[str, Any] | None:
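-    """AI explanation of a study_question → separate publish kind. Only when ready (otherwise None = not published).
-
-    Pre-published for display on re-encounter. For new wrong answers the 4-A worker reaches ready after ~90s and republishes (P2-3 wiring, P0-1b).
-    """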
-    if getattr(q, "ai_explanation_status", None) != "ready" or not getattr(q, "ai_explanation", None):
-        return None
-    gen = getattr(q, "ai_explanation_generated_at", None)
-    return {
-        "question_source_id": q.id,
-        "explanation_md": q.ai_explanation,
-        "model": getattr(q, "ai_explanation_model", None),
-        "generated_at": gen.isoformat() if gen else None,
-    }
-
-
-def project_card(c: Any) -> dict[str, Any]:
-    """study_memo_card → publish payload (S-2). Pure conversion: eligibility (needs_review=false and
-    not deleted) is decided by the caller (enqueue_card_publish) from card state. The payload contract mirrors the viewer's
-    pubstudy.ts getCards (format, cue, fact, cloze_text, source_question_id, source_generated_at).
-    """
-    gen = getattr(c, "source_generated_at", None)
-    return {
-        "format": c.format,
-        "cue": c.cue,
-        "fact": c.fact,
-        "cloze_text": c.cloze_text,
-        "source_question_id": c.source_question_id,
-        "source_generated_at": gen.isoformat() if gen else None,
-    }
-
-
-def project_card_progress(p: Any) -> dict[str, Any]:
-    """study_memo_card_progress → publish payload (S-4) = card SR state read model.
-
-    ★Publishes ALL rows (including due_at NULL sentinel = 암-on-new and terminal = graduated). Publishing due-only rows would
-    drop sentinels, and the viewer would misclassify them as 'unconfirmed'. SR math lives in DS (sr_schedule); this carries only the result.
-    card_id = the pub_card source_id (= DS card.id), the key viewer C-4 uses to LEFT JOIN pub_card.
-    """
-    due = getattr(p, "due_at", None)
-    rev = getattr(p, "last_reviewed_at", None)
-    return {
-        "card_id": p.card_id,
-        "topic_id": p.study_topic_id,
-        "last_outcome": p.last_outcome,
-        "last_reviewed_at": rev.isoformat() if rev else None,
-        "due_at": due.isoformat() if due else None,
-        "review_stage": p.review_stage,
-    }
-
-
-def project_topic(t: Any) -> dict[str, Any]:
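-    """study_topic → publish payload (S-1, plan study-viewer-port).
-
-    Only topic metadata is newly published: the minimum the viewer needs to build per-topic quizzes.
-    The round list is not published; the viewer derives it from exam_name/exam_round in pub_content(study_question)
-    (no extra publish needed, per the plan S-1 decision). topic_id is the same DS identifier as project_question's topic_id (= study_topic_id),
-    so the viewer uses it to correlate question → topic (pub_id is opaque and not a correlation key).
-    """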
-    return {
-        "topic_id": t.id,
-        "name": t.name,
-        "exam_round_size": t.exam_round_size,
-    }
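The dedup key described above depends on the hash being independent of key order. A small sketch, restating `payload_hash` from the removed lines so it runs on its own; the sample payloads are made up for illustration.

```python
import hashlib
import json
from typing import Any


def payload_hash(payload: dict[str, Any]) -> str:
    """sha256 over canonical JSON: sorted keys, no insignificant whitespace."""
    canonical = json.dumps(payload, ensure_ascii=False, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Illustrative payloads (not taken from real data).
a = payload_hash({"topic_id": 7, "name": "배관"})
b = payload_hash({"name": "배관", "topic_id": 7})   # same content, different key order
c = payload_hash({"topic_id": 7, "name": "배관 "})  # trailing space = different content

print(a == b)  # key order does not affect the hash
print(a == c)  # a content change does
```

Since an unchanged row re-projects to the same hash, a worker that compares `(payload_hash, deleted)` against the last published state can treat repeated enqueues as no-ops, which is what makes the backfills safe to rerun.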
@@ -238,13 +238,9 @@ async def gather_subject_note_context(
    client = AIClient()
    query = _build_query(subject, scope)
    try:
-        # Avoid concurrent execute() on one AsyncSession: run sequentially (background prefetch).
-        # Mirrors explanation_rag.gather_explanation_context (R2 shared-session concurrency fix).
-        docs = await _gather_document_evidence(
-            session, user_id, study_topic_id, query, client
-        )
-        questions = await _gather_question_evidence(
-            session, user_id, study_topic_id, subject, scope, query, client
+        docs, questions = await asyncio.gather(
+            _gather_document_evidence(session, user_id, study_topic_id, query, client),
+            _gather_question_evidence(session, user_id, study_topic_id, subject, scope, query, client),
        )
        return SubjectNoteContext(documents=docs, questions=questions)
    finally:
@@ -1,256 +0,0 @@
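-"""C-4 ① collector worker for API standards announcements (Important Standards Announcements) (cycle 3).
-
-No RSS. A live check (2026-06-11) showed this is not a 'page diff' case but a listing page with a detail URL
-per announcement (10 per page, ?page=N&pageSize=10 pagination, ~12+ pages). Parsing the listing links
-and ingesting only new detail pages is accurate and dedups naturally (the HTML version of the rss+page pattern).
-Revision notices for 510/570/653 bear directly on work; the standards themselves are paywalled, so only announcements are collected (card C-4).
-
-Schedule = monthly (main.py, 5th at 07:05 KST): diff the latest 2 pages (pace is 1-2 announcements per month).
-Initial bulk load: docker exec hyungi_document_server-fastapi-1 \
-    python -m workers.api_standards_collector --bulk   # all pages (~120 items, ~30 min with politeness delays)
-
-Idempotent: dedup on edit_url (normalized) + file_hash, so a rerun ingests only new items.
-"""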
-
-import argparse
-import asyncio
-import hashlib
-import re
-from datetime import datetime, timezone
-
-from sqlalchemy import select
-
-from core.crawl_politeness import (
-    CrawlBlocked,
-    CrawlFetchError,
-    CrawlSkip,
-    fetch_page,
-)
-from core.database import async_session
-from core.utils import setup_logger
-from models.document import Document
-from models.news_source import NewsSource
-from models.queue import enqueue_stage
-from workers.fulltext_worker import (
-    _WEB_MIN_BODY_LEN,
-    _extract_body,
-    _raw_html_path,
-    _save_raw_html,
-    _strip_article_footer,
-)
-from workers.news_collector import (
-    _get_or_create_health,
-    _normalize_url,
-    _record_failure,
-    _record_success,
-)
-from workers.static_corpus_ingest import _page_title
-
-logger = setup_logger("api_standards")
-
-_BASE = "https://www.api.org"
-_LISTING_PATH = "/products-and-services/standards/important-standards-announcements"
-_LISTING_URL = f"{_BASE}{_LISTING_PATH}"
-_SOURCE_NAME = "API 표준 공지"
-
-_SCHEDULED_PAGES = 2   # monthly diff window (20 items; ample overlap at 1-2 announcements per month)
-_BULK_MAX_PAGES = 15   # 12 pages observed, plus headroom. Stops early on an empty page.
-
-_DETAIL_RE = re.compile(
-    r'href="(' + re.escape(_LISTING_PATH) + r'/[^"?#]+)"'
-)
-_DATE_RE = re.compile(
-    r"(January|February|March|April|May|June|July|August|September|October"
-    r"|November|December)\s+(\d{1,2}),?\s+(\d{4})"
-)
-_MONTHS = {m: i for i, m in enumerate(
-    ["January", "February", "March", "April", "May", "June", "July",
-     "August", "September", "October", "November", "December"], start=1)}
-
-
-def _parse_listing(html_text: str) -> list[str]:
-    """Absolute detail-announcement URLs, deduped in order (pagination links are excluded because the pattern rejects '?')."""
-    seen: set[str] = set()
-    out: list[str] = []
-    for m in _DETAIL_RE.finditer(html_text):
-        url = f"{_BASE}{m.group(1)}"
-        if url not in seen:
-            seen.add(url)
-            out.append(url)
-    return out
-
-
-def _parse_pub_date(text: str) -> datetime | None:
-    """First 'Month DD, YYYY' in the body, the usual announcement date format. Failure = None (indexing still happens unconditionally via the channel gate)."""
-    m = _DATE_RE.search(text)
-    if not m:
-        return None
-    try:
-        return datetime(int(m.group(3)), _MONTHS[m.group(1)], int(m.group(2)),
-                        tzinfo=timezone.utc)
-    except ValueError:
-        return None
-
-
-async def _get_or_create_source(session) -> NewsSource:
-    result = await session.execute(
-        select(NewsSource).where(NewsSource.name == _SOURCE_NAME)
-    )
-    source = result.scalars().first()
-    if source is None:
-        source = NewsSource(
-            name=_SOURCE_NAME, feed_url=_LISTING_URL, feed_type="rss",
-            fetch_method="page", fulltext_policy="none",
-            source_channel="crawl", category="Engineering", language="en", country="US",
-            enabled=False,  # Not part of the 6h news cycle; this worker polls monthly
-        )
-        session.add(source)
-        await session.flush()
-    return source
-
-
-async def _ingest_detail(session, source: NewsSource, url: str) -> str:
-    """Ingest one announcement. Returns 'ok' / 'dup' / 'skip'."""
-    normalized_url = _normalize_url(url)
-    ann_hash = hashlib.sha256(f"api-ann|{normalized_url}".encode()).hexdigest()[:32]
-    existing = await session.execute(
-        select(Document).where(
-            (Document.file_hash == ann_hash)
-            | (Document.edit_url.in_([normalized_url, url]))
-        ).limit(1)
-    )
-    if existing.scalars().first():
-        return "dup"
-
-    try:
-        html_text, final_url = await fetch_page(url)
-    except (CrawlBlocked, CrawlSkip, CrawlFetchError) as e:
-        logger.warning(f"[api-std] fetch 실패 skip: {url} — {type(e).__name__}: {e}")
-        return "skip"
-
-    body, engine, engine_ver = _extract_body(html_text)
-    if not engine:
-        logger.warning(f"[api-std] 추출 실패 skip (< {_WEB_MIN_BODY_LEN}자): {url}")
-        return "skip"
-    clean_body = _strip_article_footer(body.replace("\x00", ""))
-    if len(clean_body) < _WEB_MIN_BODY_LEN:
-        return "skip"
-
-    now = datetime.now(timezone.utc)
-    raw_path = _raw_html_path(source.id, ann_hash, now)
-    raw_saved = True
-    try:
-        _save_raw_html(raw_path, html_text)
-    except OSError as e:
-        raw_saved = False
-        logger.error(f"[api-std] 원본 보존 실패 (ingest 는 진행): {e}")
-
-    pub_dt = _parse_pub_date(clean_body)
-    title = _page_title(html_text, fallback=url.rsplit("/", 1)[-1][:90])
-    title = re.sub(r"\s*\|\s*API\s*$", "", title).strip() or title
-
-    doc = Document(
-        file_path=f"crawl/{_SOURCE_NAME}/{ann_hash}",
-        file_hash=ann_hash,
-        file_format="article",
-        file_size=0,
-        file_type="note",
-        title=title,
-        extracted_text=f"{title}\n\n{clean_body}",
-        extracted_at=now,
-        extractor_version=f"listing+page@{engine}",
-        md_content=clean_body,
-        md_status="success",
-        md_extraction_engine=engine,
-        md_extraction_engine_version=engine_ver,
-        md_format_version="1.0",
-        md_generated_at=now,
-        md_source_hash=hashlib.sha256(html_text.encode("utf-8", errors="replace")).hexdigest(),
-        md_content_hash=hashlib.sha256(clean_body.encode("utf-8")).hexdigest(),
-        content_origin="extracted",
-        source_channel="crawl",
-        data_origin="external",
-        edit_url=normalized_url,
-        review_status="approved",
-        ai_domain="Engineering",
-        ai_sub_group=_SOURCE_NAME,
-        ai_tags=["Engineering/API 표준 공지"],
-        # Safety library A-2: a standards 'announcement' = standard (not the code text itself; ASME/API text is paywalled)
-        material_type="standard",
-        jurisdiction="US",
-        published_date=pub_dt.date() if pub_dt else None,
-        extract_meta={
-            "source_id": source.id,
-            "source_name": _SOURCE_NAME,
-            "published_at": pub_dt.isoformat() if pub_dt else None,
-            "license": {"scheme": "proprietary", "redistribute": False,
-                        "attribution": "American Petroleum Institute"},
-            "fulltext": {
-                "status": "api_announcement",
-                "engine": engine,
-                "final_url": final_url,
-                "raw_html_path": str(raw_path) if raw_saved else None,
-                "body_chars": len(clean_body),
-                "resolved_at": now.isoformat(),
-            },
-        },
-    )
-    doc.file_size = len(doc.extracted_text.encode())
-    session.add(doc)
-    await session.flush()
-    await enqueue_stage(session, doc.id, "summarize")
-    await enqueue_stage(session, doc.id, "embed")
-    await enqueue_stage(session, doc.id, "chunk")
-    logger.info(f"[api-std] ingest {len(clean_body)}자 ({engine}): {title[:60]}")
-    return "ok"
-
-
-async def run(bulk: bool = False) -> None:
-    """Monthly entry point (scheduler). bulk is CLI-only (all pages at once)."""
-    now = datetime.now(timezone.utc)
-    async with async_session() as session:
-        source = await _get_or_create_source(session)
-        await session.commit()
-        source_id = source.id
-
-    max_pages = _BULK_MAX_PAGES if bulk else _SCHEDULED_PAGES
-    counts = {"ok": 0, "dup": 0, "skip": 0}
-    try:
-        for page in range(1, max_pages + 1):
-            listing_url = (
-                _LISTING_URL if page == 1
-                else f"{_LISTING_URL}?page={page}&pageSize=10"
-            )
-            html_text, _ = await fetch_page(listing_url)
-            detail_urls = _parse_listing(html_text)
-            if not detail_urls:
-                break  # Empty page = end (early exit for bulk)
-            for url in detail_urls:
-                async with async_session() as session:
-                    src = await session.get(NewsSource, source_id)
-                    status = await _ingest_detail(session, src, url)
-                    await session.commit()
-                counts[status] += 1
-            logger.info(f"[api-std] 목록 p{page}: 누적 {counts}")
-    except (CrawlBlocked, CrawlSkip, CrawlFetchError) as e:
-        logger.error(f"[api-std] 목록 수집 실패: {e}")
-        async with async_session() as session:
-            health = await _get_or_create_health(session, source_id)
-            _record_failure(health, str(e) or repr(e), now)
-            await session.commit()
-        return
-
-    async with async_session() as session:
-        health = await _get_or_create_health(session, source_id)
-        _record_success(health, counts["ok"], False, now)
-        src = await session.get(NewsSource, source_id)
-        src.last_fetched_at = now
-        await session.commit()
-    logger.info(f"[api-std] 완료: {counts}")
-
-
-if __name__ == "__main__":
-    parser = argparse.ArgumentParser(description="API 표준 공지 수집")
-    parser.add_argument("--bulk", action="store_true", help="전 페이지 일괄 (초기 백필)")
-    args = parser.parse_args()
-    asyncio.run(run(bulk=args.bulk))
@@ -1,378 +0,0 @@
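-"""arXiv keyword-filtered collector, B-3 PR2 (plan safety-library-b3-1).
-
-Bespoke arXiv API (Atom) collector. Instead of whole category RSS feeds (firehose), it narrows collection with
-cat:{category} AND (abs:keyword ...) to items related to safety, reliability, and pressure vessels.
-
- signal-only: index abstracts only (embed+chunk) and never enqueue summarize, leaving the Mac mini Qwen queue untouched.
- With a DOI → paper.doi (bibliographic holder, enters the partial-unique index). Without one, dedup on the versionless
-  arXiv id (a future PR4 reconcile backfills the DOI).
- etiquette: ≥3s between requests + exponential backoff on HTTP 429. Incremental via a per-category submittedDate watermark.
- per-run insert cap (_RUN_CAP): keeps broad collection from flooding the GPU bge-m3 embed queue (adversarial review A, major).
-  No silent capping of the remainder (csb idiom): log the number of items left out.
- keyless. A news_sources row with enabled=False (outside the 6h news cycle) + main.py CronTrigger (self-polling).
- The arXiv API requires https (http = 301). UA = CRAWL_UA.
-"""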
-
-import asyncio
-import hashlib
-import re
-import xml.etree.ElementTree as ET
-from dataclasses import dataclass, field
-from datetime import datetime, timezone
-
-import httpx
-from sqlalchemy import select
-
-from core.crawl_politeness import CRAWL_UA
-from core.database import async_session
-from core.utils import setup_logger
-from models.document import Document
-from models.news_source import NewsSource
-from models.queue import enqueue_stage
-from services.papers.doi import arxiv_doi, normalize_doi
-from services.papers.holder import find_paper_holder
-from workers.news_collector import (
-    FeedError,
-    _get_or_create_health,
-    _record_failure,
-    _record_success,
-)
-
-logger = setup_logger("arxiv_collector")
-
-_ARXIV_API = "https://export.arxiv.org/api/query"
-_SOURCE_NAME = "arXiv 안전·공학 (keyword)"
-
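-# New categories only; no overlap with the existing RSS rows (id 62 physics.app-ph, id 64 cond-mat.mtrl-sci).
-_CATEGORIES = (
-    "eess.SY",        # systems & control
-    "physics.flu-dyn",  # fluids: pressure/flow
-    "physics.comp-ph",  # computational physics
-    "math.OC",        # optimization and control
-    "math.NA",        # numerical analysis (FEM etc.)
-    "stat.AP",        # applied statistics: reliability
-    "cs.CE",          # computational engineering
-)
-# Domain keywords for pressure vessels, process safety, and structural integrity (abs: OR gate). Kept narrow = higher relevance, lower volume (tunable).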
-_KEYWORDS = (
-    "pressure vessel",
-    "process safety",
-    "structural integrity",
-    "fracture mechanics",
-    "fatigue life",
-    "corrosion",
-)
-
-_RUN_CAP = 80           # Max new inserts per run (protects the embed queue). Lifted in bulk mode.
-_PAGE_SIZE = 50         # max_results per request
-_MAX_PAGES_PER_CAT = 4  # Max pages per category (incremental, so the watermark is usually reached on page 1)
-_REQ_SLEEP = 3.0        # arXiv etiquette ≥3s
-_MAX_RETRY = 4
-_BACKOFF_BASE = 5.0
-
-_NS = {
-    "a": "http://www.w3.org/2005/Atom",
-    "arxiv": "http://arxiv.org/schemas/atom",
-    "opensearch": "http://a9.com/-/spec/opensearch/1.1/",
-}
-_ABS_ID_RE = re.compile(r"arxiv\.org/abs/(.+?)(v\d+)?$")
-_WS_RE = re.compile(r"\s+")
-
-
-# ───────────────────────── Pure parsers (covered by fixture unit tests) ─────────────────────────
-
-@dataclass
-class ArxivEntry:
-    arxiv_id: str            # versionless, e.g. "1209.2405"
-    version: str | None      # "v1" or None
-    title: str
-    summary: str             # abstract
-    published: datetime | None
-    doi: str | None          # after normalize_doi
-    journal_ref: str | None
-    primary_category: str | None
-    categories: list = field(default_factory=list)
-    abs_url: str | None = None
-    pdf_url: str | None = None
-
-
-def _clean(text: str | None) -> str:
-    return _WS_RE.sub(" ", text).strip() if text else ""
-
-
-def _parse_id(raw_id: str | None) -> tuple[str | None, str | None]:
-    """'http://arxiv.org/abs/1209.2405v1' → ('1209.2405', 'v1'). The versionless id is the dedup key."""
-    m = _ABS_ID_RE.search((raw_id or "").strip())
-    if not m:
-        return None, None
-    return m.group(1), m.group(2)
-
-
-def _parse_dt(s: str | None) -> datetime | None:
-    if not s:
-        return None
-    try:
-        return datetime.fromisoformat(s.replace("Z", "+00:00"))
-    except ValueError:
-        return None
-
-
-def build_search_query(category: str, keywords=_KEYWORDS) -> str:
-    """cat:{category} AND (abs:kw1 OR abs:"kw with space" ...). Keywords containing spaces become quoted phrases."""
-    kw = " OR ".join(f'abs:"{k}"' if " " in k else f"abs:{k}" for k in keywords)
-    return f"cat:{category} AND ({kw})"
-
-
-def parse_arxiv_feed(xml_text: str) -> tuple[int, list[ArxivEntry]]:
-    """arXiv Atom response → (total_results, [ArxivEntry]). Pure function."""
-    root = ET.fromstring(xml_text)
-    raw_total = root.findtext("opensearch:totalResults", default="0", namespaces=_NS)
-    try:
-        total = int(raw_total)
-    except (TypeError, ValueError):
-        total = 0
-    entries: list[ArxivEntry] = []
-    for e in root.findall("a:entry", _NS):
-        aid, ver = _parse_id(e.findtext("a:id", namespaces=_NS))
-        if not aid:
-            continue
-        prim = e.find("arxiv:primary_category", _NS)
-        abs_url = pdf_url = None
-        for ln in e.findall("a:link", _NS):
-            if ln.get("rel") == "alternate" and (ln.get("type") or "").startswith("text/html"):
-                abs_url = ln.get("href")
-            elif ln.get("title") == "pdf":
-                pdf_url = ln.get("href")
-        entries.append(ArxivEntry(
-            arxiv_id=aid,
-            version=ver,
-            title=_clean(e.findtext("a:title", namespaces=_NS)),
-            summary=_clean(e.findtext("a:summary", namespaces=_NS)),
-            published=_parse_dt(e.findtext("a:published", namespaces=_NS)),
-            doi=normalize_doi(e.findtext("arxiv:doi", namespaces=_NS)),
-            journal_ref=_clean(e.findtext("arxiv:journal_ref", namespaces=_NS)) or None,
-            primary_category=prim.get("term") if prim is not None else None,
-            categories=[c.get("term") for c in e.findall("a:category", _NS)],
-            abs_url=abs_url,
-            pdf_url=pdf_url,
-        ))
-    return total, entries
-
-
-# ───────────────────────── Ingest (DB; verified live in PR2) ─────────────────────────
-
-def _build_paper_meta(source: NewsSource, entry: ArxivEntry, doi: str | None) -> dict:
-    """extract_meta: license + source + paper identification. The bibliographic holder carries paper.doi (if present)."""
-    paper: dict = {"arxiv_id": entry.arxiv_id}
-    if doi:
-        paper["doi"] = doi          # enters the partial-unique index (cross-source dedup)
-    if entry.journal_ref:
-        paper["journal_ref"] = entry.journal_ref
-    if entry.primary_category:
-        paper["primary_category"] = entry.primary_category
-    meta: dict = {
-        "source_id": source.id,
-        "source_name": source.name,
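-        "source_region": "INT",   # arXiv = international preprints. paper.jurisdiction stays NULL (A-2).
-        "paper": paper,
-        # arXiv default license = non-redistributable (conservative). Not marked restricted, so abstracts remain usable for RAG.
-        # (Detecting an explicit CC license needs the OAI interface, which the Atom API lacks; PR follow-up/watch item.)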
-        "license": {"scheme": "arxiv", "redistribute": False, "attribution": "arXiv"},
-    }
-    if entry.published:
-        meta["published_at"] = entry.published.isoformat()
-    return meta
-
-
-async def _ingest_entry(session, source: NewsSource, entry: ArxivEntry) -> bool:
-    """Ingest one entry. Returns whether it was new. signal-only (embed+chunk, no summarize)."""
-    arxiv_hash = hashlib.sha256(f"arxiv|{entry.arxiv_id}".encode()).hexdigest()[:32]
-    # Re-collection dedup (arXiv id); .first() guards against multiple rows
-    dup = await session.execute(
-        select(Document.id).where(Document.file_hash == arxiv_hash).limit(1)
-    )
-    if dup.scalars().first():
-        return False
-    # arXiv canonical DOI = journal DOI or the arXiv DataCite DOI (preprints also get paper.doi, enabling dedup with PR3)
-    doi = entry.doi or arxiv_doi(entry.arxiv_id)
-    # Cross-source dedup (a DOI holder already exists; avoids tripping the partial-unique index backstop)
-    if doi and await find_paper_holder(session, doi):
-        return False
-
-    body = entry.summary or entry.title
-    doc = Document(
-        file_path=f"crawl/arxiv/{entry.arxiv_id}",
-        file_hash=arxiv_hash,
-        file_format="article",
-        file_size=len(body.encode()),
-        file_type="note",
-        title=entry.title,
-        extracted_text=f"{entry.title}\n\n{body}",
-        extracted_at=datetime.now(timezone.utc),
-        extractor_version="arxiv-api-signal",
-        md_status="skipped",
-        md_extraction_error="arXiv abstract: signal-only, markdown 비대상",
-        source_channel="crawl",
-        data_origin="external",
-        edit_url=entry.abs_url,
-        review_status="approved",
-        material_type="paper",
-        jurisdiction=None,   # paper = always NULL (A-2). Region lives in extract_meta.paper.source_region.
-        published_date=entry.published.date() if entry.published else None,
-        extract_meta=_build_paper_meta(source, entry, doi),
-    )
-    session.add(doc)
-    await session.flush()
-    # signal-only: search indexing only. Never enqueue summarize/fulltext (Mac mini queue untouched).
-    await enqueue_stage(session, doc.id, "embed")
-    await enqueue_stage(session, doc.id, "chunk")
-    return True
-
-
-async def _get_or_create_source(session) -> NewsSource:
-    result = await session.execute(
-        select(NewsSource).where(NewsSource.name == _SOURCE_NAME)
-    )
-    source = result.scalars().first()
-    if source is None:
-        source = NewsSource(
-            name=_SOURCE_NAME, feed_url=_ARXIV_API, feed_type="atom",
-            fetch_method="signal-only", fulltext_policy="none",
-            source_channel="crawl", category="Engineering", language="en",
-            country=None,            # paper → jurisdiction NULL (country not propagated)
-            material_type="paper",
-            license_scheme="arxiv", license_redistribute=False,
-            enabled=False,           # Not part of the 6h news cycle; this worker polls on its own
-        )
-        session.add(source)
-        await session.flush()
-    return source
-
-
-def _watermark(source: NewsSource, category: str) -> datetime | None:
-    raw = (source.selector_override or {}).get("arxiv_watermark", {}).get(category)
-    if not raw:
-        return None
-    return _parse_dt(raw)
-
-
-def _set_watermark(source: NewsSource, category: str, value: datetime) -> None:
-    cfg = dict(source.selector_override or {})
-    wm = dict(cfg.get("arxiv_watermark") or {})
-    wm[category] = value.isoformat()
-    cfg["arxiv_watermark"] = wm
-    source.selector_override = cfg   # reassign so the JSONB change is detected
-
-
-async def _fetch(client: httpx.AsyncClient, query: str, start: int) -> str:
-    params = {
-        "search_query": query, "start": start, "max_results": _PAGE_SIZE,
-        "sortBy": "submittedDate", "sortOrder": "descending",
-    }
-    for attempt in range(_MAX_RETRY):
-        resp = await client.get(_ARXIV_API, params=params)
-        if resp.status_code == 429:
-            await asyncio.sleep(_BACKOFF_BASE * (2 ** attempt))
-            continue
-        resp.raise_for_status()
-        return resp.text
-    raise FeedError(f"arXiv 429 재시도 초과: {query[:48]}")
-
-
-async def run(bulk: bool = False, limit: int = 0) -> None:
-    """Daily entry point (scheduler). bulk/limit are CLI-only (bulk = cap lifted, deep paging)."""
-    now = datetime.now(timezone.utc)
-    async with async_session() as session:
-        source = await _get_or_create_source(session)
-        await session.commit()
-        source_id = source.id
-
-    run_cap = (limit or 10**9) if bulk else (min(limit, _RUN_CAP) if limit else _RUN_CAP)
-    inserted = 0
-    seen = 0
-    failures: list[str] = []
-
-    async with httpx.AsyncClient(
-        timeout=30.0, headers={"User-Agent": CRAWL_UA}, follow_redirects=True
-    ) as client:
-        for category in _CATEGORIES:
-            if inserted >= run_cap:
-                break
-            query = build_search_query(category)
-            async with async_session() as session:
-                src = await session.get(NewsSource, source_id)
-                watermark = _watermark(src, category)
-            newest_seen: datetime | None = None
-            capped = False  # whether the cap cut this category short in this run (R4)
-            max_pages = (10**6 if bulk else _MAX_PAGES_PER_CAT)
-            try:
-                for page in range(max_pages):
-                    if inserted >= run_cap:
-                        capped = True
-                        break
-                    xml_text = await _fetch(client, query, page * _PAGE_SIZE)
-                    total, entries = parse_arxiv_feed(xml_text)
-                    if not entries:
-                        break
-                    stop = False
-                    for entry in entries:
-                        seen += 1
-                        if entry.published:
-                            newest_seen = max(newest_seen or entry.published, entry.published)
-                            # Incremental: end this category once an entry at or below the watermark is reached (already-seen range)
-                            if watermark and not bulk and entry.published <= watermark:
-                                stop = True
-                                break
-                        async with async_session() as session:
-                            src = await session.get(NewsSource, source_id)
-                            if await _ingest_entry(session, src, entry):
-                                inserted += 1
-                                await session.commit()
-                            else:
-                                await session.rollback()
-                        if inserted >= run_cap:
-                            capped = True
-                            break
-                    await asyncio.sleep(_REQ_SLEEP)
-                    if stop or (page + 1) * _PAGE_SIZE >= total:
-                        break
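-                # Advance the category watermark, but not for a run cut short by the cap (R4).
-                # Advancing to newest_seen after a cut would leave the un-ingested entries between
-                # [oldest-ingested, old watermark] permanently excluded by the next run's watermark
-                # filter (entry.published <= watermark): silent data loss. Without advancing, the next run
-                # rescans from the newest entry, dedup-skips what is already ingested (_ingest_entry False,
-                # no cap consumed), descends to the gap and resumes, so the backlog drains one cap per run (no livelock). bulk ignores the cap.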
-                if newest_seen and not capped:
-                    async with async_session() as session:
-                        src = await session.get(NewsSource, source_id)
-                        _set_watermark(src, category, newest_seen)
-                        await session.commit()
-            except (httpx.HTTPError, FeedError, ET.ParseError) as e:
-                msg = f"[{category}] {e or repr(e)}"
-                logger.error(f"[arxiv] {msg}")
-                failures.append(msg)
-
-    async with async_session() as session:
-        health = await _get_or_create_health(session, source_id)
-        if failures and inserted == 0:
-            _record_failure(health, "; ".join(failures)[:500], now)
-        else:
-            _record_success(health, inserted, False, now)
-        await session.commit()
-
-    deferred = "" if inserted < run_cap else f" (cap {run_cap} 도달 — 잔여는 다음 run 이월)"
-    logger.info(
-        f"[arxiv] {len(_CATEGORIES)}개 카테고리 스캔 {seen}건 → 신규 {inserted}건{deferred}"
-        + (f" / 실패 {len(failures)}건" if failures else "")
-    )
-
-
-if __name__ == "__main__":
-    # CLI = manual/backfill only. --bulk = cap lifted + deep paging, --limit N = upper bound N (for live verification).
-    import argparse
-
-    parser = argparse.ArgumentParser(description="arXiv 안전·공학 키워드 수집기")
-    parser.add_argument("--bulk", action="store_true", help="cap 해제 + 깊은 페이징 백필")
-    parser.add_argument("--limit", type=int, default=0, help="신규 적재 상한(0=기본 cap)")
-    args = parser.parse_args()
-    asyncio.run(run(bulk=args.bulk, limit=args.limit))
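@@ -1,72 +0,0 @@
-"""Auto-review of the pending queue (review_status='pending'): high confidence is auto-approved, low confidence stays manual.
-
-Uses the ai_confidence already assigned by classify as the gate, with **no LLM re-call** (zero
-Mac mini/GPU load for a bulk of ~2,000 docs; the classification confidence is the AI's own
-self-confidence). Pending documents that have ai_domain and ai_confidence >= THRESHOLD are
-auto-approved to review_status='approved' with an audit record (source_metadata.auto_reviewed).
-Low-confidence/unclassified documents are left as-is and stay in the manual review queue (/inbox).
-
-Design rationale (measured at the gate):
-  - review_status is used only by the inbox count (dashboard) and collector ingest, and is **not
-    used** in search/RAG/digest/ask path filters → auto-approval only empties the review queue, with no change in exposure (search results).
-  - Of 2,161 pending docs, 0 carry ai_suggestion → this queue is not 'classification change
-    proposals' (accept_suggestion) but 'unreviewed auto-classification'. Approve = flip review_status.
-Drained gradually in batches on an interval (observable, stoppable). To revert: identify targets by
-the source_metadata.auto_reviewed marker, then restore review_status='pending'.
-"""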
-
-import logging
-from datetime import datetime, timezone
-
-from sqlalchemy import select
-
-from core.database import async_session
-from models.document import Document
-
-logger = logging.getLogger(__name__)
-
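-# High-confidence auto-approval bar (tunable). Measured distribution: >=0.9 → 1,981 docs auto-approved / ~180 low-confidence or unclassified docs stay manual.
-_CONFIDENCE_THRESHOLD = 0.9
-# Per-tick throughput: a pure DB UPDATE (no LLM) is cheap, but batch to avoid locking ~2,000 rows at once.
-_BATCH = 300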
-
-
-async def run() -> None:
-    """Auto-approve high-confidence pending documents in batches (interval job, no-arg)."""
-    async with async_session() as session:
-        rows = (
-            await session.execute(
-                select(Document)
-                .where(
-                    Document.review_status == "pending",
-                    Document.deleted_at.is_(None),
-                    Document.ai_domain.isnot(None),
-                    Document.ai_confidence.isnot(None),
-                    Document.ai_confidence >= _CONFIDENCE_THRESHOLD,
-                )
-                .order_by(Document.id)
-                .limit(_BATCH)
-            )
-        ).scalars().all()
-        if not rows:
-            return
-
-        now = datetime.now(timezone.utc)
-        for doc in rows:
-            doc.review_status = "approved"
-            doc.source_metadata = {
-                **(doc.source_metadata or {}),
-                "auto_reviewed": {
-                    "by": "confidence_gate",
-                    "confidence": float(doc.ai_confidence),
-                    "threshold": _CONFIDENCE_THRESHOLD,
-                    "at": now.isoformat(),
-                },
-            }
-            doc.updated_at = now
-        await session.commit()
-        logger.info(
-            "auto_review: approved %d pending docs (ai_confidence >= %.2f)",
-            len(rows),
-            _CONFIDENCE_THRESHOLD,
-        )
@@ -8,16 +8,12 @@
 import asyncio
 from datetime import date

-from core.config import settings
-from core.database import engine as db_engine
 from core.utils import setup_logger
-from services.background_jobs import finish_job, start_job
 from services.briefing.pipeline import run_briefing_pipeline

 logger = setup_logger("briefing_worker")

-# 2026-06-15: single source in config (key shared with digest). The old 600s assumed the fast Gemma.
-PIPELINE_HARD_CAP = settings.digest_pipeline_hard_cap_s
+PIPELINE_HARD_CAP = 600


 async def run(target_date: date | None = None) -> dict | None:
@@ -26,27 +22,19 @@ async def run(target_date: date | None = None) -> dict | None:
    Args:
        target_date: briefing_date in KST (None = today). API regenerate may set it explicitly.
    """
-    if "briefing" in settings.pipeline_held_stages:
-        logger.info("[briefing] 보류 (pipeline.held_stages) — 이번 실행 skip")
-        return None
-    # Board visibility: a cron-generated job outside the queue, so expose it via background_jobs (best-effort, attributed to the Mac mini)
-    job_id = await start_job(db_engine, "morning_briefing", label="조간 브리핑 생성")
    try:
        result = await asyncio.wait_for(
-            run_briefing_pipeline(target_date, job_id=job_id),
+            run_briefing_pipeline(target_date),
            timeout=PIPELINE_HARD_CAP,
        )
-        await finish_job(db_engine, job_id, state="done")
        logger.info(f"[briefing] 워커 완료: {result}")
        return result
    except asyncio.TimeoutError:
-        await finish_job(db_engine, job_id, state="failed", error=f"HARD CAP {PIPELINE_HARD_CAP}s 초과")
        logger.error(
            f"[briefing] HARD CAP {PIPELINE_HARD_CAP}s 초과 — 워커 강제 중단. "
            f"기존 briefing 은 commit 시점에만 갱신되므로 그대로 유지됨."
        )
    except Exception as e:
-        await finish_job(db_engine, job_id, state="failed", error=str(e)[:300])
        logger.exception(f"[briefing] 워커 실패: {e}")
    return None

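@@ -1,185 +0,0 @@
-"""C-2 remainder ②: CCPS Process Safety Beacon collection worker (cycle 3).
-
-Monthly one-page PDF + Korean translation: an ideal size for RAG chunks (card C-2).
-aiche.org returns 403 to plain httpx regardless of UA (measured 2026-06-11: both the Archiver UA and a browser UA)
-→ go through a playwright-fetcher anonymous context (reusing the B-3 infrastructure):
-  browser fetch of the listing page → parse beacon PDF links → download with referer and inherited cookies.
-
-Known risk: if the WAF blocks headless browsers outright, _CHALLENGE_MARKERS → CrawlBlocked
-→ record a health failure and exit (Le Monde B-3 PARK precedent; the fallback in that case is to
-combine with the email-subscription .eml track, [[feedback_antibot_headless_subscription_wall]]).
-
-Schedule = monthly (main.py, 5th of the month at 07:20 KST). At about one issue per month, file_path dedup is enough for diffing.
-Manual run: docker exec hyungi_document_server-fastapi-1 python -m workers.ccps_collector
-"""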
-
-import asyncio
-import hashlib
-import re
-from datetime import datetime, timezone
-from pathlib import Path
-from urllib.parse import urljoin, urlparse
-
-from sqlalchemy import select
-
-from core.config import settings
-from core.crawl_politeness import (
-    CrawlBlocked,
-    CrawlFetchError,
-    CrawlSkip,
-    download_via_browser,
-    fetch_page_via_browser,
-)
-from core.database import async_session
-from core.utils import setup_logger
-from models.document import Document
-from models.news_source import NewsSource
-from models.queue import enqueue_stage
-from workers.kosha_collector import _safe_filename
-from workers.news_collector import (
-    _get_or_create_health,
-    _record_failure,
-    _record_success,
-)
-
-logger = setup_logger("ccps_collector")
-
-_BEACON_URL = "https://www.aiche.org/ccps/resources/process-safety-beacon"
-_SOURCE_NAME = "CCPS Process Safety Beacon"
-_MAX_PDFS_PER_RUN = 10  # pace is 1-2 per month (EN/KO); guards against a runaway if the page structure is misread
-
-
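-def _beacon_pdf_links(html_text: str, base_url: str) -> list[str]:
-    """Beacon-related PDF links: only those with 'beacon' in the href or anchor text (conservative).
-
-    If PDFs slip past the filter, the caller surfaces them in the log (for tuning the pattern after the first live run).
-    """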
-    seen: set[str] = set()
-    out: list[str] = []
-    for m in re.finditer(
-        r'<a\s+[^>]*href="([^"]+\.pdf(?:\?[^"]*)?)"[^>]*>(.*?)</a>',
-        html_text, re.I | re.S,
-    ):
-        href, text = m.group(1), re.sub(r"<[^>]+>", " ", m.group(2))
-        if "beacon" not in href.lower() and "beacon" not in text.lower():
-            continue
-        absolute = urljoin(base_url, href)
-        path = urlparse(absolute).path
-        if path not in seen:
-            seen.add(path)
-            out.append(absolute)
-    return out
-
-
-def _all_pdf_hrefs(html_text: str) -> list[str]:
-    return sorted({m.group(1) for m in re.finditer(r'href="([^"]+\.pdf(?:\?[^"]*)?)"', html_text, re.I)})
-
-
-async def _get_or_create_source(session) -> NewsSource:
-    result = await session.execute(
-        select(NewsSource).where(NewsSource.name == _SOURCE_NAME)
-    )
-    source = result.scalars().first()
-    if source is None:
-        source = NewsSource(
-            name=_SOURCE_NAME, feed_url=_BEACON_URL, feed_type="rss",
-            fetch_method="page", fulltext_policy="none",
-            source_channel="crawl", category="Safety", language="en", country="US",
-            enabled=False,  # excluded from the 6h news cycle; this worker polls monthly
-        )
-        session.add(source)
-        await session.flush()
-    return source
-
-
-async def _ingest_pdf(session, pdf_url: str) -> bool:
-    """One Beacon PDF → save to NAS + Document + enqueue extract. Returns whether it was new."""
-    fname = _safe_filename(Path(urlparse(pdf_url).path).name)
-    rel_path = f"crawl_raw/ccps_beacon/{fname}"
-    existing = await session.execute(
-        select(Document).where(Document.file_path == rel_path).limit(1)
-    )
-    if existing.scalars().first():
-        return False
-
-    content, content_type = await download_via_browser(pdf_url, referer=_BEACON_URL)
-    if "pdf" not in content_type.lower() and not content.startswith(b"%PDF"):
-        raise CrawlSkip(f"PDF 아님 (content-type={content_type[:60]}): {pdf_url}")
-
-    dest = Path(settings.nas_mount_path) / rel_path
-    dest.parent.mkdir(parents=True, exist_ok=True)
-    dest.write_bytes(content)
-
-    doc = Document(
-        file_path=rel_path,
-        file_hash=hashlib.sha256(content).hexdigest(),
-        file_format="pdf",
-        file_size=len(content),
-        file_type="immutable",
-        title=fname.rsplit(".", 1)[0].replace("_", " ").replace("-", " "),
-        source_channel="crawl",
-        data_origin="external",
-        import_source="ccps_beacon",
-        edit_url=pdf_url,
-        ai_tags=["Safety/CCPS Beacon"],
-        extract_meta={"ccps": {"kind": "beacon_pdf"}},
-    )
-    session.add(doc)
-    await session.flush()
-    await enqueue_stage(session, doc.id, "extract")
-    logger.info(f"[ccps] Beacon ingest: {rel_path} ({len(content)} bytes)")
-    return True
-
-
-async def run() -> None:
-    """Monthly entry point. Failures are recorded in health (the circuit surfaces them in the A-8 panel)."""
-    now = datetime.now(timezone.utc)
-    async with async_session() as session:
-        source = await _get_or_create_source(session)
-        await session.commit()
-        source_id = source.id
-
-    try:
-        html_text, final_url = await fetch_page_via_browser(_BEACON_URL, profile=None)
-        links = _beacon_pdf_links(html_text, final_url)
-        if not links:
-            others = _all_pdf_hrefs(html_text)
-            # Zero filter hits = the page structure/naming may have changed; surface the PDFs found as a tuning hint
-            raise CrawlFetchError(
-                f"beacon PDF 0건 (전체 PDF {len(others)}건: {others[:5]})"
-            )
-
-        new_count = 0
-        for pdf_url in links[:_MAX_PDFS_PER_RUN]:
-            async with async_session() as session:
-                try:
-                    if await _ingest_pdf(session, pdf_url):
-                        new_count += 1
-                    await session.commit()
-                except (CrawlBlocked, CrawlSkip, CrawlFetchError) as e:
-                    await session.rollback()
-                    logger.warning(f"[ccps] PDF 실패 skip ({pdf_url}): {e}")
-        if len(links) > _MAX_PDFS_PER_RUN:
-            logger.warning(
-                f"[ccps] PDF {len(links)}건 중 {_MAX_PDFS_PER_RUN}건만 처리 "
-                f"(월간 1~2건 가정 초과 — 페이지 구조 확인 필요)"
-            )
-
-        async with async_session() as session:
-            health = await _get_or_create_health(session, source_id)
-            _record_success(health, new_count, False, now)
-            src = await session.get(NewsSource, source_id)
-            src.last_fetched_at = now
-            await session.commit()
-        logger.info(f"[ccps] 완료: 신규 {new_count}건 (링크 {len(links)}건)")
-    except (CrawlBlocked, CrawlSkip, CrawlFetchError) as e:
-        # CrawlBlocked = signal that the WAF blocks headless; repeated hits open the circuit (basis for a PARK decision)
-        logger.error(f"[ccps] 수집 실패: {type(e).__name__}: {e}")
-        async with async_session() as session:
-            health = await _get_or_create_health(session, source_id)
-            _record_failure(health, str(e) or repr(e), now)
-            await session.commit()
-
-
-if __name__ == "__main__":
-    asyncio.run(run())
@@ -272,20 +272,15 @@ async def _lookup_news_source(
    if not source_name:
        return None, None, None

-    # news_sources prefix matching. R10: push the filter down to the DB instead of full load + Python loop.
-    # (name == source_name) OR (name starts with "source_name ") = equivalent to the old split[0]==source_name
-    # (first token matches = exact match or 'source_name ' prefix). autoescape keeps %/_ safe.
-    result = await session.execute(
-        select(NewsSource)
-        .where(
-            (NewsSource.name == source_name)
-            | NewsSource.name.startswith(source_name + " ", autoescape=True)
-        )
-        .limit(1)
-    )
-    src = result.scalars().first()
-    if src is not None:
-        return src.country, src.name, src.language
+    # Find the news_sources record whose name matches (prefix match)
+    result = await session.execute(select(NewsSource))
+    sources = result.scalars().all()
+    for src in sources:
+        if source_name and (
+            src.name.split(" ")[0] == source_name
+            or src.name.startswith(source_name + " ")
+        ):
+            return src.country, src.name, src.language

    logger.warning(
        f"[chunk] news_source 매핑 실패: doc_id={doc.id} ai_sub_group={source_name!r} "
@@ -316,10 +311,6 @@ async def process(document_id: int, session: AsyncSession) -> None:
    country, source, src_lang = await _lookup_news_source(session, doc)
    if src_lang:
        language = src_lang
-    # Safety library A-2: for documents the news lookup does not cover (crawl/law/upload), mirror
-    # jurisdiction into chunk.country (country consistency across legs; EU/INT also first arrive via this path, which String(10) accommodates).
-    if country is None and doc.jurisdiction:
-        country = doc.jurisdiction
    domain_category = "news" if doc.source_channel == "news" else "document"

    # 기존 chunks 삭제 (재처리)
@@ -31,19 +31,12 @@ from pydantic import BaseModel, Field, ValidationError
 from sqlalchemy import text as sql_text
 from sqlalchemy.ext.asyncio import AsyncSession

-from ai.client import (
-    AIClient,
-    call_deep_or_defer,
-    is_deferrable_error,
-    parse_json_response,
-    strip_thinking,
-)
+from ai.client import AIClient, parse_json_response, strip_thinking
 from ai.envelope import EscalationEnvelope
 from core.config import settings
-from services.search.llm_gate import Priority, acquire_mlx_gate
 from core.utils import setup_logger
 from models.document import Document
-from models.queue import StageDeferred, enqueue_stage
+from models.queue import enqueue_stage
 from policy.prompt_render import render_4b, policy_version as compute_policy_version
 from policy.routing import decide_routing
 from services.document_telemetry import record_analyze_event
@@ -63,15 +56,6 @@ FACET_DOCTYPES = {"발주서", "세금계산서", "명세표", "도면", "증명
 # Targets for automatic library classification suggestions (under transactions)
 LIBRARY_SUGGESTION_DOCTYPES = {"발주서", "세금계산서", "명세표"}

-# Safety library A-2: deterministic document_type → material_type mapping (suggestion only, no automatic transition).
-# Ambiguous doctypes (Reference/Report, etc.) are not mapped; do not force a full classification (plan 0-1).
-_DOCTYPE_TO_MATERIAL = {
-    "Law_Document": "law",
-    "Academic_Paper": "paper",
-    "Manual": "manual",
-    "Standard": "standard",
-}
-
 # PR-B prompt_version task name
 SUMMARY_TRIAGE_TASK = "p3a_short_summary"

@@ -361,20 +345,13 @@ _FRONTMATTER_PRESERVED_KEYS = {
 # ───────────────────────── main process ────────────────────────────────


-async def process(
-    document_id: int, session: AsyncSession, *, use_deep: bool = False
-) -> None:
+async def process(document_id: int, session: AsyncSession) -> None:
    """Document classification + summary + tier triage.

    1) Legacy: classify() → ai_domain/document_type/ai_tags/ai_confidence/ai_suggestion
    2) Legacy: summarize() → ai_summary
    3) PR-B B-1: summary_triage (4B) → ai_tldr/ai_bullets/ai_analysis_tier='triage'

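-    use_deep (2026-06-12 fair-share, queue_drain only): send the triage LLM call to the deep slot
-    (MacBook, via the router). Sampling keeps triage's temperature/max_tokens (classification
-    determinism); only the endpoint is swapped. MacBook unavailable = StageDeferred propagates (drain
-    handles it as a deferral). False (default/consumer) = the existing call_triage (Mac mini direct), unchanged.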
-
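    Exception (source_channel='law_monitor'):
      Statutes have an external source of truth (law.go.kr), are immutable, and are re-collected automatically.
      AI classification adds no value and risks hallucinated readings of the text. Skip both 26B legacy and 4B triage.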
@@ -412,15 +389,6 @@ async def process(
        logger.info(f"doc {document_id}: devonagent → classify skip")
        return

-    # Papers (material_type='paper'): skip the summary/classification LLM (Mac mini queue untouched, B-3 signal-only preserved).
-    # embed/chunk/markdown are chained by queue_consumer (the next stage is enqueued even after the early return).
-    if doc.material_type == "paper":
-        if not doc.ai_domain:
-            doc.ai_domain = "논문"
-        await session.commit()
-        logger.info(f"doc {document_id}: paper → classify skip (no summarize)")
-        return
-
    if not doc.extracted_text:
        raise ValueError(f"문서 ID {document_id}: extracted_text가 비어있음")

@@ -478,20 +446,10 @@ async def process(
        logger.info(f"doc {document_id}: frontmatter 부분 인식 → LLM 으로 미설정 필드 보완")

    client = AIClient()
-    # fair-share (2026-06-12): with use_deep, legacy classify/summarize also go through the deep slot
-    # (MacBook); only then does the drain's "MacBook share" actually hold (sending triage alone leaves
-    # the 50K summary prefill on the Mac mini). Deep slot sampling = same as primary (0.3/0.9/8192).
-    legacy_cfg = settings.ai.deep if (use_deep and settings.ai.deep is not None) else None
    try:
-        # ─── 1. Legacy classify (primary or deep) ───
+        # ─── 1. Legacy classify (primary 26B) ───
        truncated = doc.extracted_text[:MAX_CLASSIFY_TEXT]
-        try:
-            raw_response = await client.classify(truncated, cfg=legacy_cfg)
-        except Exception as exc:
-            if legacy_cfg is not None and is_deferrable_error(exc):
-                # MacBook unavailable: switch to deferral at the first call (the cheapest point), with zero doc writes
-                raise StageDeferred(f"macbook_unavailable:{type(exc).__name__}") from exc
-            raise
+        raw_response = await client.classify(truncated)
        parsed = parse_json_response(raw_response)

        if not parsed:
@@ -511,24 +469,6 @@ async def process(
        if not doc.document_type:
            doc.document_type = doc_type if doc_type in DOCUMENT_TYPES else "Note"

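-        # ─── Safety library A-2: material_type suggestion (upload path; the LLM must not assign it directly) ───
-        # Only the deterministic document_type → material_type mapping is stored as a suggestion (zero prompt changes).
-        # The transition happens only on approval (accept-suggestion). For law, the country is a required input (blocks
-        # KR-default contamination; the no-automatic-transition principle matches category). The collectors' deterministic
-        # path is already populated (material_type IS NOT NULL), so it is out of scope here. On conflict with a transaction-document
-        # suggestion (ai_suggestion occupied), the existing one wins (in practice no document gets both: transaction vs safety material).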
-        _mt_prop = _DOCTYPE_TO_MATERIAL.get(doc.document_type or "")
-        if _mt_prop and doc.material_type is None and doc.ai_suggestion is None:
-            doc.ai_suggestion = {
-                "proposed_material_type": _mt_prop,
-                "proposed_jurisdiction": None,
-                "confidence": doc.ai_confidence,
-                "source_updated_at": (
-                    doc.updated_at.isoformat() if doc.updated_at else None
-                ),
-                "reason": "document_type→material_type 결정적 매핑",
-            }
-
        # confidence
        confidence = parsed.get("confidence", 0.5)
        doc.ai_confidence = max(0.0, min(1.0, float(confidence)))
@@ -564,9 +504,7 @@ async def process(
            doc.facet_doctype = ai_doctype

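        # ─── Store ai_suggestion (library approval-queue suggestion, §1) ───
-        # R9: an existing suggestion (e.g. a material_type suggestion) wins; add the doc.ai_suggestion is None guard
-        # (symmetric with the material suggestion block). Without it, the transaction-document suggestion clobbers the existing one (violating 'existing suggestion wins').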
-        if ai_doctype in LIBRARY_SUGGESTION_DOCTYPES and doc.ai_suggestion is None:
+        if ai_doctype in LIBRARY_SUGGESTION_DOCTYPES:
            year = doc.facet_year or datetime.now(timezone.utc).year
            doc.ai_suggestion = {
                "proposed_category": "library",
@@ -579,17 +517,12 @@ async def process(
                "reason": "classify pipeline",
            }

-        # ─── 2. Legacy summary (primary or deep) ───
-        try:
-            summary = await client.summarize(doc.extracted_text[:50000], cfg=legacy_cfg)
-        except Exception as exc:
-            if legacy_cfg is not None and is_deferrable_error(exc):
-                raise StageDeferred(f"macbook_unavailable:{type(exc).__name__}") from exc
-            raise
+        # ─── 2. Legacy summary (primary 26B) ───
+        summary = await client.summarize(doc.extracted_text[:50000])
        doc.ai_summary = strip_thinking(summary)

-        # ─── Metadata (legacy done): attributed to the machine that actually processed it (drain=qwen-macbook) ───
-        doc.ai_model_version = (legacy_cfg or settings.ai.primary).model
+        # ─── Metadata (legacy done) ───
+        doc.ai_model_version = settings.ai.primary.model
        doc.ai_processed_at = datetime.now(timezone.utc)

        logger.info(
@@ -600,9 +533,7 @@ async def process(

        # ─── 3. PR-B B-1: tier triage (4B; on failure the legacy result is kept) ───
        try:
-            await _run_tier_triage(client, doc, session, use_deep=use_deep)
-        except StageDeferred:
-            raise  # a deferral is not a failure; drain/consumer handle it without consuming attempts
+            await _run_tier_triage(client, doc, session)
        except Exception as exc:
            logger.exception(f"[triage] id={document_id} 전체 실패 — legacy 유지: {exc}")

@@ -610,10 +541,8 @@ async def process(
        await client.close()


-async def _run_tier_triage(
-    client: AIClient, doc: Document, session: AsyncSession, *, use_deep: bool = False
-) -> None:
-    """summary_triage (p3a_short_summary) path. use_deep is passed down from process() (drain only)."""
+async def _run_tier_triage(client: AIClient, doc: Document, session: AsyncSession) -> None:
+    """summary_triage (p3a_short_summary) path."""
    document_id = doc.id
    text = doc.extracted_text or ""
    input_chars = len(text)
@@ -621,14 +550,6 @@ async def _run_tier_triage(
    triage_start = time.perf_counter()
    parse_error: str | None = None
    triage_out = TriageOutput()
-    # Via drain, triage also uses the deep slot (MacBook); sampling stays triage's own (determinism).
-    deep_triage_cfg = None
-    if use_deep and settings.ai.deep is not None:
-        deep_triage_cfg = settings.ai.deep.model_copy(update={
-            "temperature": settings.ai.triage.temperature,
-            "top_p": settings.ai.triage.top_p,
-            "max_tokens": settings.ai.triage.max_tokens,
-        })

    # If the input exceeds the triage limit, skip the call and escalate to long_context
    if input_chars > TRIAGE_TEXT_LIMIT:
@@ -669,17 +590,7 @@ async def _run_tier_triage(
    prompt = rendered.replace("{extracted_text}", text[:TRIAGE_TEXT_LIMIT])

    try:
-        if deep_triage_cfg is not None:
-            # Drain only: deep slot endpoint + triage sampling. MacBook unavailable (StageDeferred)
-            # is propagated first so the generic except below does not swallow it.
-            raw_triage = await call_deep_or_defer(client, prompt, cfg=deep_triage_cfg)
-        else:
-            # The consumer-path call_triage has shared the primary's Mac mini endpoint since PR #20,
-            # so it is called inside the gate like evidence/classifier (standing rule: same endpoint, always gated).
-            async with acquire_mlx_gate(Priority.BACKGROUND):
-                raw_triage = await client.call_triage(prompt)
-    except StageDeferred:
-        raise  # drain handles this without consuming attempts, with backoff (sleep-safe)
+        raw_triage = await client.call_triage(prompt)
    except Exception as exc:
        logger.warning(
            "[triage] 4B 호출 실패 id=%s type=%s repr=%r",
@@ -745,7 +656,6 @@ async def _run_tier_triage(
        escalation_reason=escalation_reason,
        parse_error=parse_error,
        routing_decision=routing_decision,
-        model_name=(deep_triage_cfg.model if deep_triage_cfg is not None else None),
    )


@@ -760,7 +670,6 @@ async def _apply_triage_result(
    escalation_reason: str | None,
    parse_error: str | None,
    routing_decision=None,
-    model_name: str | None = None,  # fair-share: model of the path actually called (None = triage default)
 ) -> None:
    """TriageOutput → Document 필드 + R2 suppression + envelope enqueue + audit.

@@ -851,7 +760,7 @@ async def _apply_triage_result(
        layers_returned=["tldr", "bullets"] if not parse_error else [],
        cached=False,
        latency_ms=latency_ms,
-        model_name=(model_name or settings.ai.triage.model),
+        model_name=settings.ai.triage.model,
        prompt_version=(f"{SUMMARY_TRIAGE_TASK}@{pv}" if pv else SUMMARY_TRIAGE_TASK),
        error_code=parse_error,
        source="document_server",
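@@ -1,407 +0,0 @@
-"""C-2 remainder ①: US CSB sitemap diff collection worker (plan crawl-24x7-1, cycle 3).
-
-RSS was discontinued → polling the sitemap.xml lastmod diff is the standard approach (a government
-site, so lastmod is reliable; measured 2026-06-11: 1,307 URLs, investigation report pages are root slugs).
-Page body (4-tier, ≥200-character gate) + report PDFs (/assets/, excluding recommendation status summaries) →
-reuse the existing extract pipeline (marker/kordoc).
-
-Schedule = weekly (main.py, Mondays 06:50 KST):
-  Only lastmod values after the watermark (selector_override.sitemap_watermark, the same JSONB slot
-  as the B-3 probe settings), oldest first, up to the cap (40 pages/run). The watermark advances only as far
-  as what was processed = the remainder is backfilled gradually (KOSHA GUIDE cap pattern). The unprocessed remainder
-  is logged every run (no silent cap). diff count > sanity (300) = visible warning: suspected sitemap corruption or lastmod churn.
-
-Initial bulk load (cap lifted, several hours due to politeness; use docker exec -d and do not redeploy the same
-service while it runs, a sibling pitfall of [[feedback_docker_exec_orphan_kill]]):
-  docker exec hyungi_document_server-fastapi-1 \
-      python -m workers.csb_collector --limit 3        # for verification
-  docker exec -d hyungi_document_server-fastapi-1 \
-      python -m workers.csb_collector --bulk           # everything
-
-Idempotency: pages = edit_url (normalized) + file_hash dedup (first-wins: the body of a page whose lastmod
-changed is not re-ingested; the new PDFs that are the substance of the update are ingested via their own dedup).
-PDFs = file_path dedup. The watermark boundary is re-queried with >=: the boundary page is re-fetched once and
-then caught by dedup (measured lastmod values are distinct, so re-fetches do not accumulate).
-"""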
-
-import argparse
-import asyncio
-import hashlib
-import random
-import re
-from datetime import datetime, timezone
-from pathlib import Path
-from urllib.parse import urljoin, urlparse
-
-import httpx
-from sqlalchemy import select
-
-from core.config import settings
-from core.crawl_politeness import (
-    CRAWL_UA,
-    CrawlBlocked,
-    CrawlFetchError,
-    CrawlSkip,
-    fetch_page,
-)
-from core.database import async_session
-from core.utils import setup_logger
-from models.document import Document
-from models.news_source import NewsSource
-from models.queue import enqueue_stage
-from workers.fulltext_worker import (
-    _WEB_MIN_BODY_LEN,
-    _extract_body,
-    _raw_html_path,
-    _save_raw_html,
-    _strip_article_footer,
-)
-from workers.kosha_collector import _safe_filename
-from workers.news_collector import (
-    FeedError,
-    _get_or_create_health,
-    _normalize_url,
-    _record_failure,
-    _record_success,
-)
-from workers.static_corpus_ingest import _page_title
-
-logger = setup_logger("csb_collector")
-
-_SITEMAP_URL = "https://www.csb.gov/sitemap.xml"
-_SOURCE_NAME = "US CSB 사고조사보고서"
-
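-_RUN_PAGE_CAP = 40      # upper bound per weekly run; the remainder carries over automatically because the watermark does not advance
-_DIFF_SANITY = 300      # a weekly diff above this suggests lastmod churn or a corrupted sitemap (card C-2)
-_MAX_PDF_BYTES = 50 * 1024 * 1024
-_PDF_DELAY = (2.0, 5.0)  # interval between consecutive PDF downloads from the same domain (same as kosha _DOWNLOAD_DELAY)
-
-# Sections with no text-corpus value / administrative sections, keyed on the first path segment (investigation
-# reports and news releases are root slugs, so unaffected; /news/ and /investigations/ hold only listing pages, so they are excluded).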
-_SKIP_FIRST_SEGMENT = {
-    "videos", "photos", "events", "members", "disclaimers", "media-room",
-    "about-the-csb", "about-us", "foia", "news", "investigations",
-    "site-map", "subscribe", "unsubscribe", "optout", "test",
-    "privacy-policy", "vulnerability-disclosure-policy", "en-espanol",
-    "newsletter", "recom-stats", "500.aspx", "documents", "records-details",
-}
-
-
-def _parse_sitemap(xml_text: str) -> list[tuple[str, datetime]]:
-    """List of (url, lastmod); entries with a missing or unparseable lastmod are dropped (no axis to diff on)."""
-    out: list[tuple[str, datetime]] = []
-    for m in re.finditer(
-        r"<url>\s*<loc>([^<]+)</loc>\s*<lastmod>([^<]+)</lastmod>", xml_text
-    ):
-        try:
-            lastmod = datetime.fromisoformat(m.group(2).strip())
-        except ValueError:
-            continue
-        if lastmod.tzinfo is None:
-            lastmod = lastmod.replace(tzinfo=timezone.utc)
-        out.append((m.group(1).strip(), lastmod))
-    return out
-
-
-def _should_skip(url: str) -> bool:
-    path = urlparse(url).path.strip("/")
-    if not path:
-        return True  # home page
-    return path.split("/", 1)[0].lower() in _SKIP_FIRST_SEGMENT
-
-
-def _pdf_links(html_text: str, base_url: str) -> list[str]:
-    """Report PDFs linked from the page; /assets/recommendation/ (mostly status-change summaries) is excluded.
-
-    The cache-buster query (?17346) is kept in the download URL; dedup and file names use the path.
-    """
-    seen: set[str] = set()
-    out: list[str] = []
-    for m in re.finditer(r'href="([^"]+\.pdf(?:\?[^"]*)?)"', html_text, re.I):
-        absolute = urljoin(base_url, m.group(1))
-        path = urlparse(absolute).path
-        if "/assets/recommendation/" in path.lower():
-            continue
-        if (urlparse(absolute).hostname or "").lower() != "www.csb.gov":
-            continue
-        if path not in seen:
-            seen.add(path)
-            out.append(absolute)
-    return out
-
-
-async def _download_pdf(url: str, dest: Path) -> int:
-    """Download a PDF with a size cap and a delay between requests (politeness assumes sequential execution)."""
-    await asyncio.sleep(random.uniform(*_PDF_DELAY))
-    async with httpx.AsyncClient(timeout=60, follow_redirects=True) as client:
-        resp = await client.get(url, headers={"User-Agent": CRAWL_UA})
-    if resp.status_code != 200:
-        raise FeedError(f"PDF 다운로드 {resp.status_code}: {url}")
-    if len(resp.content) > _MAX_PDF_BYTES:
-        raise FeedError(f"PDF 크기 초과 ({len(resp.content)} bytes): {url}")
-    dest.parent.mkdir(parents=True, exist_ok=True)
-    dest.write_bytes(resp.content)
-    return len(resp.content)
-
-
-async def _get_or_create_source(session) -> NewsSource:
-    result = await session.execute(
-        select(NewsSource).where(NewsSource.name == _SOURCE_NAME)
-    )
-    source = result.scalars().first()
-    if source is None:
-        source = NewsSource(
-            name=_SOURCE_NAME, feed_url=_SITEMAP_URL, feed_type="rss",
-            fetch_method="sitemap+page", fulltext_policy="none",
-            source_channel="crawl", category="Safety", language="en", country="US",
-            enabled=False,  # not part of the 6h news cycle; this worker polls weekly
-        )
-        session.add(source)
-        await session.flush()
-    return source
-
-
-def _watermark(source: NewsSource) -> datetime | None:
-    raw = (source.selector_override or {}).get("sitemap_watermark")
-    if not raw:
-        return None
-    try:
-        return datetime.fromisoformat(raw)
-    except ValueError:
-        return None
-
-
-def _set_watermark(source: NewsSource, value: datetime) -> None:
-    # Reassign the dict so the JSONB change is detected (same convention as fulltext_worker._set_fulltext_meta)
-    cfg = dict(source.selector_override or {})
-    cfg["sitemap_watermark"] = value.isoformat()
-    source.selector_override = cfg
-
-
-async def _ingest_pdf(session, page_slug: str, pdf_url: str) -> bool:
-    """One PDF: save to NAS, create a Document, enqueue extract. Returns whether it was new."""
-    fname = _safe_filename(Path(urlparse(pdf_url).path).name)
-    rel_path = f"crawl_raw/csb/{page_slug}/{fname}"
-    existing = await session.execute(
-        select(Document).where(Document.file_path == rel_path).limit(1)
-    )
-    if existing.scalars().first():
-        return False
-
-    dest = Path(settings.nas_mount_path) / rel_path
-    size = await _download_pdf(pdf_url, dest)
-    doc = Document(
-        file_path=rel_path,
-        file_hash=hashlib.sha256(dest.read_bytes()).hexdigest(),
-        file_format="pdf",
-        file_size=size,
-        file_type="immutable",
-        title=fname.rsplit(".", 1)[0].replace("_", " "),
-        source_channel="crawl",
-        data_origin="external",
-        import_source="csb_sitemap",
-        edit_url=pdf_url,
-        ai_tags=["Safety/CSB/보고서"],
-        # Safety library A-2: set deterministically at ingest. CSB is a U.S. federal agency, so its material is public domain.
-        material_type="incident",
-        jurisdiction="US",
-        extract_meta={"csb": {"page_slug": page_slug, "kind": "report_pdf"},
-                      "license": {"scheme": "public_domain", "redistribute": True,
-                                  "attribution": "U.S. Chemical Safety Board"}},
-    )
-    session.add(doc)
-    await session.flush()
-    await enqueue_stage(session, doc.id, "extract")
-    logger.info(f"[csb] PDF ingest: {rel_path} ({size} bytes)")
-    return True
-
-
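-async def _ingest_url(session, source: NewsSource, url: str, lastmod: datetime) -> dict:
-    """One changed URL: fetch the page, scan every PDF (per-PDF dedup), and ingest the body if it is new.
-
-    The PDF scan always runs, even on a page revisit (lastmod update); the core value is catching
-    updates whose substance arrives as a PDF (a final report being added, for example).
-    """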
-    counts = {"page": 0, "pdf": 0, "skip": 0}
-    try:
-        html_text, final_url = await fetch_page(url)
-    except (CrawlBlocked, CrawlSkip, CrawlFetchError) as e:
-        logger.warning(f"[csb] fetch 실패 skip: {url} — {type(e).__name__}: {e}")
-        counts["skip"] = 1
-        return counts
-
-    page_slug = _safe_filename(urlparse(url).path.strip("/").split("/")[-1] or "root")
-
-    for pdf_url in _pdf_links(html_text, final_url):
-        try:
-            if await _ingest_pdf(session, page_slug, pdf_url):
-                counts["pdf"] += 1
-        except FeedError as e:
-            logger.warning(f"[csb] PDF 실패 skip ({pdf_url}): {e}")
-
-    # Page body: first-wins (if it already exists, the body is not re-ingested)
-    normalized_url = _normalize_url(url)
-    page_hash = hashlib.sha256(f"csb-page|{normalized_url}".encode()).hexdigest()[:32]
-    existing = await session.execute(
-        select(Document).where(
-            (Document.file_hash == page_hash)
-            | (Document.edit_url.in_([normalized_url, url]))
-        ).limit(1)
-    )
-    if existing.scalars().first():
-        return counts
-
-    body, engine, engine_ver = _extract_body(html_text)
-    if not engine:
-        logger.info(f"[csb] 본문 부족 — 페이지 비적재 (PDF 만): {url}")
-        return counts
-    clean_body = _strip_article_footer(body.replace("\x00", ""))
-    if len(clean_body) < _WEB_MIN_BODY_LEN:
-        return counts
-
-    now = datetime.now(timezone.utc)
-    raw_path = _raw_html_path(source.id, page_hash, now)
-    raw_saved = True
-    try:
-        _save_raw_html(raw_path, html_text)
-    except OSError as e:
-        raw_saved = False
-        logger.error(f"[csb] 원본 보존 실패 (ingest 는 진행): {e}")
-
-    title = _page_title(html_text, fallback=page_slug.replace("-", " ")[:90])
-    doc = Document(
-        file_path=f"crawl/{_SOURCE_NAME}/{page_hash}",
-        file_hash=page_hash,
-        file_format="article",
-        file_size=0,
-        file_type="note",
-        title=title,
-        extracted_text=f"{title}\n\n{clean_body}",
-        extracted_at=now,
-        extractor_version=f"sitemap+page@{engine}",
-        md_content=clean_body,
-        md_status="success",
-        md_extraction_engine=engine,
-        md_extraction_engine_version=engine_ver,
-        md_format_version="1.0",
-        md_generated_at=now,
-        md_source_hash=hashlib.sha256(html_text.encode("utf-8", errors="replace")).hexdigest(),
-        md_content_hash=hashlib.sha256(clean_body.encode("utf-8")).hexdigest(),
-        content_origin="extracted",
-        source_channel="crawl",
-        data_origin="external",
-        edit_url=normalized_url,
-        review_status="approved",
-        ai_domain="Safety",
-        ai_sub_group=_SOURCE_NAME,
-        ai_tags=["Safety/CSB"],
-        # Safety library A-2: set deterministically at ingest (classify-skip path)
-        material_type="incident",
-        jurisdiction="US",
-        published_date=lastmod.date() if lastmod else None,
-        extract_meta={
-            "source_id": source.id,
-            "source_name": _SOURCE_NAME,
-            "published_at": lastmod.isoformat(),
-            "license": {"scheme": "public_domain", "redistribute": True,
-                        "attribution": "U.S. Chemical Safety Board"},
-            "fulltext": {
-                "status": "csb_sitemap",
-                "engine": engine,
-                "final_url": final_url,
-                "raw_html_path": str(raw_path) if raw_saved else None,
-                "body_chars": len(clean_body),
-                "resolved_at": now.isoformat(),
-            },
-        },
-    )
-    doc.file_size = len(doc.extracted_text.encode())
-    session.add(doc)
-    await session.flush()
-    await enqueue_stage(session, doc.id, "summarize")
-    await enqueue_stage(session, doc.id, "embed")
-    await enqueue_stage(session, doc.id, "chunk")
-    counts["page"] = 1
-    logger.info(f"[csb] page ingest {len(clean_body)}자 ({engine}): {title[:60]}")
-    return counts
-
-
-async def run(bulk: bool = False, limit: int = 0) -> None:
-    """Weekly entry point (scheduler); bulk/limit are CLI-only."""
-    now = datetime.now(timezone.utc)
-    async with async_session() as session:
-        source = await _get_or_create_source(session)
-        await session.commit()
-        source_id = source.id
-        watermark = _watermark(source)
-
-    try:
-        xml_text, _ = await fetch_page(
-            _SITEMAP_URL, content_types=("text/xml", "application/xml", "text/html")
-        )
-        entries = _parse_sitemap(xml_text)
-        if not entries:
-            raise FeedError("sitemap 파싱 0건 — 포맷 변경/부패 의심")
-    except (CrawlBlocked, CrawlSkip, CrawlFetchError, FeedError) as e:
-        logger.error(f"[csb] sitemap 수집 실패: {e}")
-        async with async_session() as session:
-            health = await _get_or_create_health(session, source_id)
-            _record_failure(health, str(e) or repr(e), now)
-            await session.commit()
-        return
-
-    changed = sorted(
-        (
-            (url, lastmod) for url, lastmod in entries
-            if not _should_skip(url) and (watermark is None or lastmod >= watermark)
-        ),
-        key=lambda pair: pair[1],
-    )
-    if watermark is not None and len(changed) > _DIFF_SANITY:
-        logger.error(
-            f"[csb] diff {len(changed)}건 > sanity {_DIFF_SANITY} — "
-            f"sitemap lastmod 남발/부패 의심 (cap 처리는 계속, 관찰 필요)"
-        )
-
-    cap = len(changed) if bulk else _RUN_PAGE_CAP
-    if limit:
-        cap = min(cap, limit)
-    todo, deferred = changed[:cap], max(len(changed) - cap, 0)
-    logger.info(
-        f"[csb] sitemap {len(entries)}건 중 변경 {len(changed)}건, 처리 {len(todo)}건"
-        + (f" (잔여 {deferred}건 — 워터마크 미전진으로 자동 이월)" if deferred else "")
-    )
-
-    totals = {"page": 0, "pdf": 0, "skip": 0}
-    for i, (url, lastmod) in enumerate(todo, 1):
-        # 2026-06-20 C2: stop a single URL failure from aborting the whole weekly run (later URLs skipped, watermark stalled).
-        # Each iteration has its own session (async with), so a failure is isolated: skip it and continue.
-        try:
-            async with async_session() as session:
-                src = await session.get(NewsSource, source_id)
-                counts = await _ingest_url(session, src, url, lastmod)
-                _set_watermark(src, lastmod)
-                await session.commit()
-        except Exception as e:
-            logger.error(f"[csb] URL 처리 실패 (건너뜀): {url} — {str(e) or repr(e)}")
-            continue
-        for k in totals:
-            totals[k] += counts[k]
-        if i % 10 == 0:
-            logger.info(f"[csb] 진행 {i}/{len(todo)} {totals}")
-
-    async with async_session() as session:
-        health = await _get_or_create_health(session, source_id)
-        _record_success(health, totals["page"] + totals["pdf"], False, now)
-        src = await session.get(NewsSource, source_id)
-        src.last_fetched_at = now
-        await session.commit()
-    logger.info(f"[csb] 완료: {totals} (변경 {len(changed)}건 중 {len(todo)}건 처리)")
-
-
-if __name__ == "__main__":
-    parser = argparse.ArgumentParser(description="CSB sitemap diff 수집")
-    parser.add_argument("--bulk", action="store_true", help="cap 해제 — 초기 일괄")
-    parser.add_argument("--limit", type=int, default=0, help="처리 상한 (검증용)")
-    args = parser.parse_args()
-    asyncio.run(run(bulk=args.bulk, limit=args.limit))
@@ -1,12 +1,11 @@
-"""Daily digest worker: PostgreSQL/CalDAV queries → Markdown generation
+"""Daily digest worker: PostgreSQL/CalDAV queries → Markdown + SMTP

 Ported from v1 scripts/pkm_daily_digest.py.
 Switched from DEVONthink/OmniFocus to PostgreSQL/CalDAV queries.
-SMTP sending was removed on 2026-06-10 (the feature never delivered successfully even once, so it was retired).
 """

-import asyncio
-from datetime import datetime, time, timedelta, timezone
+import os
+from datetime import datetime, timezone
 from zoneinfo import ZoneInfo
 from pathlib import Path

@@ -14,43 +13,24 @@ from sqlalchemy import func, select, text

 from core.config import settings
 from core.database import async_session
-from core.utils import setup_logger
+from core.utils import send_smtp_email, setup_logger
 from models.document import Document
 from models.queue import ProcessingQueue

 logger = setup_logger("daily_digest")


-def _write_and_rotate(digest_dir: Path, today: str, markdown: str) -> Path:
-    """Save the digest file and move files older than 90 days to the archive (blocking; the caller uses to_thread, R8)."""
-    digest_dir.mkdir(parents=True, exist_ok=True)
-    digest_path = digest_dir / f"{today}_digest.md"
-    digest_path.write_text(markdown, encoding="utf-8")
-    archive_dir = digest_dir / "archive"
-    archive_dir.mkdir(exist_ok=True)
-    cutoff = datetime.now(timezone.utc).timestamp() - (90 * 86400)
-    for old in digest_dir.glob("*_digest.md"):
-        if old.stat().st_mtime < cutoff:
-            old.rename(archive_dir / old.name)
-    return digest_path
-
-
 async def run():
     """Generate, save, and send the daily digest."""
-    # Today in KST (cron fires at 20:00 KST since the KST timezone fix).
-    kst = ZoneInfo("Asia/Seoul")
-    today = datetime.now(kst).date()
-    # Convert the KST day to a UTC range (R8). func.date(created_at) yields the date under the pg TimeZone (UTC), which caused
-    # a boundary bug: documents created 00:00-09:00 KST (previous day in UTC) were missed. Compare created_at (stored in UTC) against the range instead.
-    start_utc = datetime.combine(today, time.min, tzinfo=kst).astimezone(timezone.utc)
-    end_utc = start_utc + timedelta(days=1)
+    # Today in KST (cron fires at 20:00 KST since the KST timezone fix). Compared as a date object, matched directly against Document.created_at::date.
+    today = datetime.now(ZoneInfo("Asia/Seoul")).date()
    sections = []

    async with async_session() as session:
         # ─── 1. Documents added today ───
        added = await session.execute(
            select(Document.ai_domain, func.count(Document.id))
-            .where(Document.created_at >= start_utc, Document.created_at < end_utc)
+            .where(func.date(Document.created_at) == today)
            .group_by(Document.ai_domain)
        )
        added_rows = added.all()
@@ -69,8 +49,7 @@ async def run():
            select(Document.title)
            .where(
                Document.source_channel == "law_monitor",
-                Document.created_at >= start_utc,
-                Document.created_at < end_utc,
+                func.date(Document.created_at) == today,
            )
        )
        law_rows = law_docs.scalars().all()
@@ -87,8 +66,7 @@ async def run():
            select(func.count(Document.id))
            .where(
                Document.source_channel == "email",
-                Document.created_at >= start_utc,
-                Document.created_at < end_utc,
+                func.date(Document.created_at) == today,
            )
        )
        email_total = email_count.scalar() or 0
@@ -123,7 +101,7 @@ async def run():
        )
        failed_count = failed.scalar() or 0
        if failed_count > 0:
-            section += f"\n**[주의] 실패 {failed_count}건** — 수동 확인 필요\n"
+            section += f"\n⚠️ **실패 {failed_count}건** — 수동 확인 필요\n"
        sections.append(section)

         # ─── 5. Unclassified Inbox items ───
@@ -141,8 +119,30 @@ async def run():
    markdown += "\n".join(sections)
    markdown += f"\n---\n*생성: {datetime.now(timezone.utc).isoformat()}*\n"

-    # ─── Save to NAS + 90-day archive (blocking file I/O moved off-thread, consistent with R8/R5) ───
+    # ─── Save to NAS ───
    digest_dir = Path(settings.nas_mount_path) / "PKM" / "Archive" / "digests"
-    digest_path = await asyncio.to_thread(_write_and_rotate, digest_dir, str(today), markdown)
+    digest_dir.mkdir(parents=True, exist_ok=True)
+    digest_path = digest_dir / f"{today}_digest.md"
+    digest_path.write_text(markdown, encoding="utf-8")
+
+    # ─── Archive files older than 90 days ───
+    archive_dir = digest_dir / "archive"
+    archive_dir.mkdir(exist_ok=True)
+    cutoff = datetime.now(timezone.utc).timestamp() - (90 * 86400)
+    for old in digest_dir.glob("*_digest.md"):
+        if old.stat().st_mtime < cutoff:
+            old.rename(archive_dir / old.name)
+
+    # ─── Send via SMTP ───
+    smtp_host = os.getenv("MAILPLUS_HOST", "")
+    smtp_port = int(os.getenv("MAILPLUS_SMTP_PORT", "465"))
+    smtp_user = os.getenv("MAILPLUS_USER", "")
+    smtp_pass = os.getenv("MAILPLUS_PASS", "")
+    if smtp_host and smtp_user:
+        send_smtp_email(
+            smtp_host, smtp_port, smtp_user, smtp_pass,
+            f"PKM 다이제스트 — {date_display}",
+            markdown,
+        )

    logger.info(f"다이제스트 생성 완료: {digest_path}")
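@@ -1,32 +0,0 @@
-"""Nightly dedup column recomputation job (plan ds-s1-backend-1 B-4 'nightly batch').
-
-duplicate_of / duplicate_count are a denormalized cache. Documents are soft-delete only (deleted_at), so
-FK ON DELETE SET NULL never fires, and soft-deleting a canonical or a member leaves residual drift
-(stale pointers on members, overcount on the canonical). The B-1 upload fill only handles new rows, so
-this nightly absolute recomputation guarantees overall consistency. Idempotent: a no-op (log only) when there is no drift.
-Consistency matters because the app (S3) reads the response contract (DocumentResponse.duplicate_count/duplicate_of).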
-"""
-
-import logging
-
-from core.database import async_session
-from services.dedup import reconcile_dedup
-
-logger = logging.getLogger("dedup_reconcile")
-
-
-async def run() -> None:
-    try:
-        async with async_session() as session:
-            r = await reconcile_dedup(session, apply=True)
-        if r["changes"]:
-            logger.info(
-                "[dedup_reconcile] groups=%s docs=%s changes=%s applied=%s",
-                r["groups"], r["docs"], r["changes"], r["applied"],
-            )
-        else:
-            logger.info(
-                "[dedup_reconcile] no drift (groups=%s docs=%s)", r["groups"], r["docs"]
-            )
-    except Exception:
-        logger.exception("[dedup_reconcile] failed")
@@ -20,12 +20,12 @@ from sqlalchemy.ext.asyncio import AsyncSession

 import json
 import re
-from ai.client import AIClient, call_deep_or_defer, parse_json_response, strip_thinking
+from ai.client import AIClient, parse_json_response, strip_thinking
 from ai.envelope import EscalationEnvelope
 from core.config import settings
 from core.utils import setup_logger
 from models.document import Document
-from models.queue import ProcessingQueue, StageDeferred
+from models.queue import ProcessingQueue
 from policy.prompt_render import render_26b, policy_version as compute_policy_version
 from services.document_telemetry import record_analyze_event
 from services.search.llm_gate import Priority, acquire_mlx_gate
@@ -54,18 +54,8 @@ class DeepSummaryOutput(BaseModel):
    confidence: float = 0.5


-async def process(
-    document_id: int, session: AsyncSession, *, defer_on_deep_unavailable: bool = False
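-) -> None:
-    """Pick up from the deep_summary queue → call the LLM → store the fields.
-
-    defer_on_deep_unavailable:
-      False (default, consumer path) = try the MacBook (deep slot) first; if unavailable, process
-        immediately on the Mac mini primary. 2026-06-12 fair-share: both machines run the same model
-        (Qwen3.6-27B-6bit), so the fallback is plain load distribution, not a quality downgrade.
-      True (queue_drain only) = raise StageDeferred when the MacBook is unavailable so that drain
-        defers and stops the run (keeps the semantics of drain as the MacBook-offload-only lever).
-    """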
+async def process(document_id: int, session: AsyncSession) -> None:
+    """Pick up from the deep_summary queue → call the 26B model → store the fields."""
    doc = await session.get(Document, document_id)
    if not doc:
        raise ValueError(f"deep_summary: document id={document_id} 없음")
@@ -111,46 +101,19 @@ async def process(
    )

    client = AIClient()
-    # ds-macbook-offload-1: when the deep slot is configured, route via the MacBook M5 Max (router). Otherwise the existing path is unchanged.
-    deep_cfg = client.ai.deep
-    used_cfg = deep_cfg or settings.ai.primary
    latency_ms = 0
    parse_error: str | None = None
    deep_out = DeepSummaryOutput()

    try:
        start = time.perf_counter()
-        if deep_cfg is not None:
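-            # MacBook first; the Mac mini mlx gate is not held (separate endpoint). Writes to doc happen only
-            # after the call completes and parses, so an interruption at any point leaves no partial write.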
-            try:
-                raw = await call_deep_or_defer(client, prompt)
-            except StageDeferred:
-                if defer_on_deep_unavailable:
-                    raise  # drain only: MacBook lever semantics (defer, then end the run)
-                # consumer path: same model, so not a downgrade; the Mac mini processes it immediately (2026-06-12)
-                logger.info(
-                    f"[deep] id={document_id} 맥북 불가 → 맥미니 primary 처리 (fair-share)"
-                )
-                used_cfg = settings.ai.primary
-                async with acquire_mlx_gate(Priority.BACKGROUND):
-                    raw = await client.call_primary(prompt)
-        else:
-            async with acquire_mlx_gate(Priority.BACKGROUND):  # 2026-05-17 B-1: classify-escalate worker
-                raw = await client.call_primary(prompt)
+        async with acquire_mlx_gate(Priority.BACKGROUND):  # 2026-05-17 B-1: classify-escalate worker
+            raw = await client.call_primary(prompt)
        latency_ms = int((time.perf_counter() - start) * 1000)
-    except StageDeferred:
-        # A deferral is not a failure: no analyze_event is recorded (avoids a fake completion); drain records the backoff.
-        logger.info(f"[deep] id={document_id} 맥북 일시 불가 — 보류 (deferred)")
-        raise
    except Exception as exc:
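-        # Call failures (network, API 5xx, etc.) are propagated, not swallowed (R3): queue_consumer
-        # retries until attempts run out, then surfaces the item as status=failed (dead-letter). If swallowed,
-        # worker_fn returns normally → the queue marks it completed → ai_detail_summary is missing forever and
-        # the tier stays stuck at triage (silent permanent loss). Matches the canonical extract/marker/fulltext/stt workers.
-        # No doc write (168~) happens before completion, so there is no partial write (sleep-safe).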
-        logger.warning(f"[deep] 호출 실패 id={document_id} model={used_cfg.model}: {exc}")
-        raise
+        logger.warning(f"[deep] 26B 호출 실패 id={document_id}: {exc}")
+        parse_error = "call_failed"
+        raw = ""
    finally:
        await client.close()

@@ -184,13 +147,12 @@ async def process(
        doc_id=document_id,
        user_id=None,
        mode="summary_deep",
-        text_limit=used_cfg.context_char_limit or 260000,
+        text_limit=settings.ai.primary.context_char_limit or 260000,
        truncated=False,
        layers_returned=["detail_summary", "inconsistencies"] if not parse_error else [],
        cached=False,
        latency_ms=latency_ms,
-        # When the deep slot is used, record the model that actually ran (qwen-macbook alias) to track which machine handled it
-        model_name=used_cfg.model,
+        model_name=settings.ai.primary.model,
        prompt_version=(f"{DEEP_SUMMARY_TASK}@{pv}" if pv else DEEP_SUMMARY_TASK),
        error_code=parse_error,
        source="document_server",
@@ -10,16 +10,12 @@ global_digests / digest_topics 테이블에 저장한다.

 import asyncio

-from core.config import settings
-from core.database import engine as db_engine
 from core.utils import setup_logger
-from services.background_jobs import finish_job, start_job
 from services.digest.pipeline import run_digest_pipeline

 logger = setup_logger("digest_worker")

-# 2026-06-15: config is the single source (the old 600s was tuned for the faster Gemma and was not updated after the switch to Qwen 27B, so runs exceeded it).
-PIPELINE_HARD_CAP = settings.digest_pipeline_hard_cap_s
+PIPELINE_HARD_CAP = 600  # 10-minute hard cap


 async def run() -> None:
@@ -28,27 +24,19 @@ async def run() -> None:
    The pipeline itself is not wrapped in a timeout (the summarizer handles per-call timeouts).
    Only the overall hard cap is enforced here.
    """
-    if "digest" in settings.pipeline_held_stages:
-        logger.info("[global_digest] 보류 (pipeline.held_stages) — 이번 실행 skip")
-        return
-    # Board visibility: this is a cron-created job outside the queue, so it is exposed via background_jobs (best-effort, attributed to the Mac mini)
-    job_id = await start_job(db_engine, "global_digest", label="글로벌 다이제스트 생성")
    try:
        result = await asyncio.wait_for(
-            run_digest_pipeline(job_id=job_id),
+            run_digest_pipeline(),
            timeout=PIPELINE_HARD_CAP,
        )
-        await finish_job(db_engine, job_id, state="done")
        logger.info(f"[global_digest] 워커 완료: {result}")
    except asyncio.TimeoutError:
-        await finish_job(db_engine, job_id, state="failed", error=f"HARD CAP {PIPELINE_HARD_CAP}s 초과")
        logger.error(
            f"[global_digest] HARD CAP {PIPELINE_HARD_CAP}s 초과 — 워커 강제 중단. "
            f"기존 digest 는 commit 시점에만 갱신되므로 그대로 유지됨. "
            f"다음 cron 실행에서 재시도."
        )
    except Exception as e:
-        await finish_job(db_engine, job_id, state="failed", error=str(e)[:300])
        logger.exception(f"[global_digest] 워커 실패: {e}")


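@@ -1,65 +0,0 @@
-"""Physically delete the NAS originals of documents requested with delete_file=true, after a grace period (R7 retention sweep).
-
-Keyed on the purge_requested_at marker (not deleted_at: an ordinary soft-delete/hide keeps the file and can be
-undeleted). Once the grace period (30 days) has passed and the file exists: unlink + AUDIT log. The existence check
-makes it idempotent (a rerun skips files already deleted). The request path (DELETE) performs no synchronous irreversible operation; all of it goes through this cron.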
-"""
-import asyncio
-import logging
-from datetime import datetime, timedelta, timezone
-from pathlib import Path
-
-from sqlalchemy import select
-
-from core.config import settings
-from core.database import async_session
-from models.document import Document
-
-logger = logging.getLogger("purge_sweep")
-
-PURGE_GRACE_DAYS = 30
-
-
-def _unlink_if_exists(p: Path) -> bool:
-    """Unlink the file if it exists (blocking; the caller uses to_thread). Returns whether it existed (idempotent)."""
-    if p.exists():
-        p.unlink()
-        return True
-    return False
-
-
-async def run() -> int:
-    """Physically delete NAS originals of documents with a purge request past the grace period. Returns the number deleted."""
-    cutoff = datetime.now(timezone.utc) - timedelta(days=PURGE_GRACE_DAYS)
-    async with async_session() as session:
-        rows = (
-            await session.execute(
-                select(Document.id, Document.file_path, Document.purge_requested_at).where(
-                    Document.purge_requested_at.is_not(None),
-                    Document.purge_requested_at < cutoff,
-                    Document.file_path.is_not(None),
-                )
-            )
-        ).all()
-
-    purged = 0
-    for doc_id, file_path, requested_at in rows:
-        nas_path = Path(settings.nas_mount_path) / file_path
-        try:
-            existed = await asyncio.to_thread(_unlink_if_exists, nas_path)
-            if existed:
-                purged += 1
-                # AUDIT: record the physical deletion (for visibility). doc_id / path / request date / grace.
-                logger.warning(
-                    "PURGE doc_id=%s file=%s requested_at=%s grace_days=%s",
-                    doc_id,
-                    file_path,
-                    requested_at.isoformat() if requested_at else None,
-                    PURGE_GRACE_DAYS,
-                )
-        except OSError as e:
-            logger.error("PURGE 실패 doc_id=%s file=%s: %s", doc_id, file_path, e)
-
-    if purged:
-        logger.info("[purge_sweep] NAS 원본 %d건 물리삭제 (grace %d일)", purged, PURGE_GRACE_DAYS)
-    return purged
@@ -1,9 +1,11 @@
 """Text extraction worker: kordoc / PyMuPDF / Surya OCR / LibreOffice / direct read / web HTML"""

+import email
 import hashlib
 import re
 import subprocess
 from datetime import datetime, timezone
+from email.header import decode_header
 from pathlib import Path

 import httpx
@@ -23,6 +25,8 @@ TEXT_FORMATS = {"md", "txt", "csv", "json", "xml", "html"}
 OFFICE_FORMATS = {"xlsx", "xls", "docx", "doc", "pptx", "ppt", "odt", "ods", "odp", "odoc", "osheet"}
 # Image formats eligible for OCR
 IMAGE_FORMATS = {"jpg", "jpeg", "png", "tiff", "tif", "bmp", "gif", "webp"}
+# Email (collected from selected PKM folders → header + body extraction)
+EML_FORMATS = {"eml"}

 EXTRACTOR_VERSION = "kordoc@1.7"
 PYMUPDF_VERSION = "pymupdf"
@@ -67,45 +71,21 @@ def _postprocess_ocr(text: str) -> str:
    return text.strip()


-def _extract_pdf_pymupdf(
-    file_path: Path, start_page: int | None = None, end_page: int | None = None
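-) -> str:
-    """PyMuPDF fallback: page-by-page streaming keeps memory low even for large PDFs.
-
-    G2 (PR-G2-2): when start_page/end_page (1-based, inclusive) are given, only that range is extracted
-    (a bundle child doc shares the parent file and has its own page range). Both None = whole file (same as before).
-    """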
+def _extract_pdf_pymupdf(file_path: Path) -> str:
+    """PyMuPDF fallback: page-by-page streaming keeps memory low even for large PDFs."""
    import fitz
    text_parts = []
    with fitz.open(str(file_path)) as doc:
-        if start_page is None and end_page is None:
-            for page in doc:
-                text_parts.append(page.get_text())
-        else:
-            # 1-based inclusive → 0-based range. The range is clamped to [0, page_count] (defensive).
-            total = doc.page_count
-            lo = max(1, start_page or 1) - 1
-            hi = min(total, end_page or total)        # inclusive end (last 0-based index = hi-1)
-            for i in range(lo, hi):
-                text_parts.append(doc.load_page(i).get_text())
+        for page in doc:
+            text_parts.append(page.get_text())
    return "\n".join(text_parts)


-def _get_pdf_page_count(
-    file_path: Path, start_page: int | None = None, end_page: int | None = None
-) -> int:
-    """Return the PDF page count. G2: when a range is given, the page count of that range (for child-doc density calculation).
-
-    Both None = total page count (same as before).
-    """
+def _get_pdf_page_count(file_path: Path) -> int:
+    """Return the PDF page count."""
    import fitz
    with fitz.open(str(file_path)) as doc:
-        total = len(doc)
-        if start_page is None and end_page is None:
-            return total
-        lo = max(1, start_page or 1)
-        hi = min(total, end_page or total)
-        return max(0, hi - lo + 1)
+        return len(doc)


 async def _call_ocr(file_path: Path, is_image: bool, max_pages: int = 200) -> str | None:
@@ -257,6 +237,90 @@ async def _extract_web_html(doc: Document, html_path: Path) -> None:
    )


+# ─── Email (.eml) extraction ───
+
+def _decode_eml_header(raw: str) -> str:
+    """Decode a MIME-encoded header."""
+    if not raw:
+        return ""
+    out = []
+    for data, charset in decode_header(raw):
+        if isinstance(data, bytes):
+            out.append(data.decode(charset or "utf-8", errors="replace"))
+        else:
+            out.append(data)
+    return "".join(out)
+
+
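+async def _extract_eml(doc: Document, eml_path: Path) -> None:
+    """Extract the body of an email (.eml): From/To/Date/Subject header block + body.
+
+    The body prefers text/plain; if absent, text/html → plain text via bs4 (reuses _extract_web_with_bs4).
+    The headers are prepended to the body so that search and summarization have sender/subject context.
+    Attachments are only inventoried in extract_meta['email_attachments'] (body extraction is a follow-up; scaffold).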
+    """
+    raw = eml_path.read_bytes()
+    msg = email.message_from_bytes(raw)
+
+    hdr_lines = []
+    for label in ("From", "To", "Date", "Subject"):
+        val = _decode_eml_header(msg.get(label, ""))
+        if val:
+            hdr_lines.append(f"{label}: {val}")
+
+    body = ""
+    html_body = ""
+    attachments = []
+    if msg.is_multipart():
+        for part in msg.walk():
+            ctype = part.get_content_type()
+            disp = (part.get("Content-Disposition") or "").lower()
+            if "attachment" in disp:
+                payload = part.get_payload(decode=True)
+                attachments.append({
+                    "filename": _decode_eml_header(part.get_filename() or ""),
+                    "content_type": ctype,
+                    "size": len(payload) if payload else 0,
+                })
+                continue
+            if ctype == "text/plain" and not body:
+                payload = part.get_payload(decode=True)
+                if payload is not None:
+                    body = payload.decode(part.get_content_charset() or "utf-8", errors="replace")
+            elif ctype == "text/html" and not html_body:
+                payload = part.get_payload(decode=True)
+                if payload is not None:
+                    html_body = payload.decode(part.get_content_charset() or "utf-8", errors="replace")
+    else:
+        payload = msg.get_payload(decode=True)
+        if payload is not None:
+            decoded = payload.decode(msg.get_content_charset() or "utf-8", errors="replace")
+            if msg.get_content_type() == "text/html":
+                html_body = decoded
+            else:
+                body = decoded
+
+    # No text/plain part: html → bs4 plain text (reuses the devonagent final fallback; zero new dependencies)
+    if not body and html_body:
+        body, _ = _extract_web_with_bs4(html_body)
+
+    if attachments:
+        names = ", ".join(a["filename"] for a in attachments if a["filename"])
+        hdr_lines.append(f"Attachments: {len(attachments)}개" + (f" ({names})" if names else ""))
+
+    header_block = "\n".join(hdr_lines)
+    full_text = (header_block + "\n\n" + (body or "")).replace("\x00", "").strip()
+
+    doc.extracted_text = full_text
+    doc.extracted_at = datetime.now(timezone.utc)
+    doc.extractor_version = "eml@stdlib"
+    if attachments:
+        meta = dict(doc.extract_meta or {})
+        meta["email_attachments"] = attachments
+        doc.extract_meta = meta
+    logger.info(f"[eml] {doc.file_path} ({len(full_text)}자, 첨부 {len(attachments)})")
+
+
 # ─── Main processing ───

 async def process(document_id: int, session: AsyncSession) -> None:
@@ -281,6 +345,13 @@ async def process(document_id: int, session: AsyncSession) -> None:
        await _extract_web_html(doc, full_path)
        return

+    # ─── Email (.eml): header + body extraction (collected from selected PKM folders) ───
+    if fmt in EML_FORMATS:
+        if not full_path.exists():
+            raise FileNotFoundError(f"파일 없음: {full_path}")
+        await _extract_eml(doc, full_path)
+        return
+
    # ─── Text files: read directly ───
    if fmt in TEXT_FORMATS:
        if not full_path.exists():
@@ -334,49 +405,6 @@ async def process(document_id: int, session: AsyncSession) -> None:
        doc.extracted_at = datetime.now(timezone.utc)
        return

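-    # ─── G2 (PR-G2-2): bundle child PDF; shares the parent file and extracts only its own page range ───
-    # The kordoc service has no page-range parameter and parses the whole file (unsuitable for a child), so
-    # bypass kordoc and extract only the [bundle_page_start, bundle_page_end] range with PyMuPDF. Range OCR is
-    # out of scope for this PR (a child has a ToC, which presumes a digital text layer, so OCR is usually
-    # unnecessary). Even if the PyMuPDF text is sparse, keep it as-is and record the reason.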
-    if fmt == "pdf" and doc.bundle_page_start is not None and doc.bundle_page_end is not None:
-        # Candidate A: the child file_path is a synthetic value (`{parent}#p{s}-{e}`), so the real file is recovered
-        # as the parent path via bundle_source_path, then NFC/NFD-resolved. (The child file_path does not exist on disk.)
-        from workers.presegment_worker import _resolve_path as _resolve_bundle_path
-        from workers.presegment_worker import bundle_source_path
-        real_rel = bundle_source_path(doc.file_path)
-        src = _resolve_bundle_path(str(Path(settings.nas_mount_path) / real_rel))
-        if src is None:
-            raise FileNotFoundError(f"번들 원본 파일 없음: {real_rel}")
-        start, end = doc.bundle_page_start, doc.bundle_page_end
-        try:
-            pymupdf_text = _extract_pdf_pymupdf(src, start, end)
-            page_count = _get_pdf_page_count(src, start, end)
-        except Exception as e:
-            logger.error(f"[pymupdf:child] {doc.file_path} pages={start}-{end} 실패: {e}")
-            raise
-
-        meta = doc.extract_meta or {}
-        meta["presegment_child_range"] = {"start_page": start, "end_page": end}
-        meta["pymupdf_chars"] = len(pymupdf_text.strip())
-        should, reason = _should_ocr(pymupdf_text, page_count)
-        if should:
-            # Range OCR is unsupported (follow-up PR): keep the PyMuPDF result and record the reason (not a silent skip).
-            meta["ocr_skip_reason"] = "presegment_child_range_ocr_unsupported"
-            meta["ocr_reason"] = reason
-            logger.warning(
-                f"[pymupdf:child] {doc.file_path} pages={start}-{end} "
-                f"OCR 필요({reason})하나 range OCR 미지원 → PyMuPDF 결과 유지"
-            )
-        doc.extracted_text = pymupdf_text.replace("\x00", "")
-        doc.extracted_at = datetime.now(timezone.utc)
-        doc.extractor_version = PYMUPDF_VERSION if pymupdf_text.strip() else None
-        doc.extract_meta = meta
-        logger.info(
-            f"[pymupdf:child] {doc.file_path} pages={start}-{end} ({len(pymupdf_text)}자)"
-        )
-        return
-
    # ─── kordoc parsing (HWP/HWPX/PDF) + PyMuPDF fallback + OCR ───
    if fmt in KORDOC_FORMATS:
        container_path = f"/documents/{doc.file_path}"
@@ -17,7 +17,6 @@ Web/Blog ingest (devonagent 트랙, plan db-snuggly-petal.md):
  - when the sidecar (.json) is missing: ingest without skipping, web_meta.sidecar_missing=true
 """

-import asyncio
 import hashlib
 import json
 from pathlib import Path
@@ -59,23 +58,6 @@ SCAN_TARGETS: list[tuple[str, str | None]] = [
    ("Videos", "video"),
 ]

-# Safety library A-2/B-4: deterministic (material_type, jurisdiction, license) axes per watch target.
-# Key = last component of the target path. license is injected verbatim into extract_meta.license (None = not injected).
-#   restricted=true → retrieval_service._license_sql excludes it from RAG evidence and digests (option a, U-2①:
-#   blocks verbatim spans of purchased material; indexing itself is allowed. Personal file viewing is not blocked).
-# User decision (2026-06-13): Books/Papers = proprietary + restricted / Manuals = proprietary, restricted=false
-#   (used for search and RAG) / KGS = legally delegated detailed standards, law/KR, KOGL public, not restricted.
-_TARGET_AXIS: dict[str, tuple[str, str | None, dict | None]] = {
-    "KGS_Code": ("law", "KR", {"scheme": "kogl", "redistribute": True,
-                               "restricted": False, "attribution": "한국가스안전공사(KGS)"}),
-    "Books": ("book", None, {"scheme": "proprietary", "redistribute": False,
-                             "restricted": True, "attribution": "구매 도서"}),
-    "Papers_Purchased": ("paper", None, {"scheme": "proprietary", "redistribute": False,
-                                         "restricted": True, "attribution": "구매 논문"}),
-    "Manuals": ("manual", None, {"scheme": "proprietary", "redistribute": False,
-                                 "restricted": False, "attribution": "기술 매뉴얼"}),
-}
-

 def should_skip(path: Path) -> bool:
    if path.name in SKIP_NAMES or path.name.startswith("._"):
@@ -118,18 +100,16 @@ def _route_media(path: Path, expected_category: str | None) -> tuple[str | None,
    if expected_category == "library":
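        # Externally authored study material (KGS Code, enforcement rules, etc.). Accept document extensions only.
        # Frontmatter interpretation is handled by classify_worker (option C); file_watcher only routes.
-        # G2: first stage=presegment (candidate A verified). Non-PDF/single files pass through; only bundle PDFs are split.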
        if ext in LIBRARY_DOC_EXTS:
-            return ("library", False, "presegment")
+            return ("library", False, "extract")
        if ext in AUDIO_EXTS or ext in VIDEO_DIRECT_EXTS or ext in VIDEO_QUARANTINE_EXTS:
            return (None, False, None)  # skip audio/video that landed here by mistake
        return (None, False, None)  # skip any other unknown extension

    # Inbox: document pipeline (existing). Skip audio/video extensions that land here by mistake.
-    # G2: first stage=presegment (candidate A verified). Non-PDF/single files pass through; only bundle PDFs are split.
    if ext in AUDIO_EXTS or ext in VIDEO_DIRECT_EXTS or ext in VIDEO_QUARANTINE_EXTS:
        return (None, False, None)
-    return (None, False, "presegment")
+    return (None, False, "extract")


 # ─── Web/Blog ingest (devonagent track) helpers ──────────────────────────────────
@@ -139,10 +119,6 @@ def _canonicalize_url(url: str) -> str:

    So that utm variants (`?utm_source=foo`) and fragment variants (`#section`) of the same post
    converge to a single row, the URL must always pass through this before file_hash is computed.
-
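-    ★R11c: intentionally different from news_collector._normalize_url (news channel). This one (web_clip)
-    normalizes aggressively (query sort, trailing slash, lowercasing), while the news side is conservative to
-    keep distinct articles on query-identified sites from collapsing. Do not merge the two functions (the dedup intent differs per channel).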
    """
    if not url:
        return ""
@@ -228,8 +204,7 @@ async def _ingest_web_file(session, file_path: Path, rel_path: str) -> tuple[int
    )
    session.add(doc)
    await session.flush()
-    # G2: first stage=presegment (candidate A verified). HTML (non-PDF) passes through presegment unchanged → extract.
-    await enqueue_stage(session, doc.id, "presegment")
+    await enqueue_stage(session, doc.id, "extract")
    return (1, 0)


@@ -254,8 +229,7 @@ async def watch_inbox():
    async with async_session() as session:
        # ─── Web/ track (devonagent): only .html files dropped by the DEVONthink Smart Rule enter ───
        if web_root.exists():
-            # Collect the rglob NFS directory walk (a burst of blocking stat calls) off-thread (R5).
-            for file_path in await asyncio.to_thread(lambda: list(web_root.rglob("*.html"))):
+            for file_path in web_root.rglob("*.html"):
                if not file_path.is_file() or should_skip(file_path):
                    continue
                rel_path = str(file_path.relative_to(nas_root))
@@ -268,13 +242,7 @@ async def watch_inbox():
            if not scan_root.exists():
                continue

-            # Safety library A-2/B-4: (material, jurisdiction, license) derived from the target folder
-            target_mt, target_jur, target_license = _TARGET_AXIS.get(
-                Path(sub).name, (None, None, None)
-            )
-
-            # Collect the NFS directory walk (blocking) off-thread (R5).
-            for file_path in await asyncio.to_thread(lambda: list(scan_root.rglob("*"))):
+            for file_path in scan_root.rglob("*"):
                if not file_path.is_file() or should_skip(file_path):
                    continue

@@ -288,11 +256,7 @@ async def watch_inbox():
                    continue

                rel_path = str(file_path.relative_to(nas_root))
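-                # SHA-256 over a GB-sized file occupies the event loop, so every 1-minute consumer on the same loop
-                # plus FastAPI requests stall together for tens of seconds to minutes. Offload with to_thread. The scan
-                # loop is already sequential, so file_hash runs one at a time (serialized); no parallel hashing, which
-                # avoids blowing up NFS 2.5GbE bandwidth and buffer memory (R5).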
-                fhash = await asyncio.to_thread(file_hash, file_path)
+                fhash = file_hash(file_path)

                result = await session.execute(
                    select(Document).where(Document.file_path == rel_path)
@@ -311,14 +275,7 @@ async def watch_inbox():
                        source_channel="drive_sync",
                        category=category,
                        needs_conversion=needs_conversion,
-                        # Safety library A-2/B-4: watch-target mapping (KGS=law/KR etc.; non-targets = NULL)
-                        material_type=target_mt,
-                        jurisdiction=target_jur,
                    )
-                    # B-4: inject the target folder's license (including restricted; non-targets get none). classify only
-                    # proposes when material_type IS NULL and does not write extract_meta, so the injected value is preserved.
-                    if target_license:
-                        doc.extract_meta = {"license": dict(target_license)}
                    session.add(doc)
                    await session.flush()

@@ -334,15 +291,6 @@ async def watch_inbox():
                        existing.category = category
                    if needs_conversion and not getattr(existing, "needs_conversion", False):
                        existing.needs_conversion = True
-                    # B-4: backfill axes/license (when a document loaded before B-4 changes again): inject material when unset,
-                    # and merge-inject the license only when absent (avoids clobbering; existing extract_meta keys are preserved).
-                    if existing.material_type is None and target_mt is not None:
-                        existing.material_type = target_mt
-                        existing.jurisdiction = target_jur
-                    if target_license and not (existing.extract_meta or {}).get("license"):
-                        meta = dict(existing.extract_meta or {})
-                        meta["license"] = dict(target_license)
-                        existing.extract_meta = meta

                    if next_stage:
                        await enqueue_stage(session, existing.id, next_stage)
@@ -1,320 +0,0 @@
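-"""Fulltext promotion worker (A-2 + A-7, plan crawl-24x7-1)
-
-Consumes the 'fulltext' stage that news_collector enqueues for articles from fulltext_policy='page' sources:
-  politeness fetch of the article page (A-4) → preserve the raw HTML on the NAS as gzip (A-7)
-  → reuse the extract_worker 4-tier chain (tier 2, sibling .md, does not apply: there is no original on disk)
-  → promote extracted_text/md_content → enqueue summarize + (30-day gate) embed/chunk.
-
-Failure handling (queue vocabulary = DB enum; only the branching lives in the worker):
-  - transient error (5xx/timeout): raise → queue retry (max_attempts 3)
-  - blocked/ineligible (403/429/robots/non-HTML/insufficient extraction): degrade to the RSS summary, then complete
-    → summarize/embed/chunk enqueue is guaranteed (zero article loss). The degrade reason is recorded in extract_meta.fulltext.
-  - permanent failure (3 attempts exhausted): the nightly reconcile_unresolved() enqueues summarize as a safety net
-    ([[feedback_silent_skip_accumulation]]: so conditional skips do not accumulate into permanent silence).
-
-Promotion gate: body >= 200 characters for every tier (unlike devonagent, the gate also applies to tier 4;
-degrading to the RSS summary is better than promoting nav debris from a paywall/error page as the body).
-"""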
-
-import gzip
-import hashlib
-import re
-from datetime import datetime, timezone
-from pathlib import Path
-
-from sqlalchemy import exists, select
-from sqlalchemy.ext.asyncio import AsyncSession
-from sqlalchemy.orm import aliased
-
-from core.config import settings
-from core.crawl_politeness import (
-    CrawlBlocked,
-    CrawlFetchError,
-    CrawlSkip,
-    fetch_page,
-    fetch_page_via_browser,
-    probe_session,
-)
-from core.database import async_session
-from core.utils import setup_logger
-from models.document import Document
-from models.news_source import NewsSource
-from models.queue import ProcessingQueue, enqueue_stage
-from workers.extract_worker import (
-    _WEB_MIN_BODY_LEN,
-    _extract_web_with_bs4,
-    _extract_web_with_readability,
-    _extract_web_with_trafilatura,
-)
-
-logger = setup_logger("fulltext_worker")
-
-# First-layer post-processing of Korean article footers (A-2): conservatively removes whole lines only
-_FOOTER_PATTERNS = [
-    re.compile(r"^.{0,120}(무단\s*전재|무단\s*복제|재배포\s*금지|저작권자\s*[ⓒ©(]).*$", re.M),
-    re.compile(r"^[\w.+-]+@[\w.-]+\.[A-Za-z]{2,}\s*$", re.M),  # standalone email line
-    re.compile(r"^\s*\S{2,4}\s*기자\s*$", re.M),               # standalone '◯◯◯ 기자' (reporter byline) line
-]
-
-
-def _strip_article_footer(body: str) -> str:
-    for pat in _FOOTER_PATTERNS:
-        body = pat.sub("", body)
-    return re.sub(r"\n{3,}", "\n\n", body).strip()
-
-
-def _extract_body(html_text: str) -> tuple[str, str | None, str | None]:
-    """Return (body, engine, engine_version). Every tier is gated at >= 200 characters; returns ("", None, None) if none passes."""
-    body, ver = _extract_web_with_trafilatura(html_text)
-    if body and len(body) >= _WEB_MIN_BODY_LEN:
-        return body, "trafilatura", ver
-    body, ver = _extract_web_with_readability(html_text)
-    if body and len(body) >= _WEB_MIN_BODY_LEN:
-        return body, "readability", ver
-    body, ver = _extract_web_with_bs4(html_text)
-    if body and len(body) >= _WEB_MIN_BODY_LEN:
-        return body, "bs4_text", ver
-    return "", None, None
-
-
-def _raw_html_path(source_id: int | None, file_hash: str, now: datetime) -> Path:
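-    """A-7 raw-preservation path on the main NAS. Uses source_id to avoid NFC/NFD asymmetry in Korean directory names.
-
-    The file_hash DB column is character(64), so a 32-character hash comes back space-padded; strip is required
-    (otherwise the NAS filename carries 32 spaces, a shell/rsync trap).
-    """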
-    src_dir = f"src_{source_id}" if source_id is not None else "src_unknown"
-    return (
-        Path(settings.nas_mount_path) / "crawl_raw" / src_dir
-        / now.strftime("%Y-%m") / f"{file_hash.strip()}.html.gz"
-    )
-
-
-def _save_raw_html(path: Path, html_text: str) -> None:
-    path.parent.mkdir(parents=True, exist_ok=True)
-    with gzip.open(path, "wb") as f:
-        f.write(html_text.encode("utf-8", errors="replace"))
-
-
-async def _enqueue_downstream(session: AsyncSession, doc: Document) -> None:
-    """Shared follow-up for promote/degrade: always summarize, plus embed/chunk when the 30-day gate passes."""
-    await enqueue_stage(session, doc.id, "summarize")
-    published_raw = (doc.extract_meta or {}).get("published_at")
-    if doc.source_channel == "crawl":
-        # Domain material corpus: index everything regardless of publication date (the 30-day gate is news-only)
-        await enqueue_stage(session, doc.id, "embed")
-        await enqueue_stage(session, doc.id, "chunk")
-        return
-    days_old = 0
-    if published_raw:
-        try:
-            pub_dt = datetime.fromisoformat(published_raw)
-            days_old = (datetime.now(timezone.utc) - pub_dt).days
-        except ValueError:
-            days_old = 0  # unparseable = treat as new (same as the collection-time default)
-    if days_old <= 30:
-        await enqueue_stage(session, doc.id, "embed")
-        await enqueue_stage(session, doc.id, "chunk")
-
-
-def _set_fulltext_meta(doc: Document, **fields) -> None:
-    """Update extract_meta.fulltext; reassigns the dict so the JSONB change is detected."""
-    meta = dict(doc.extract_meta or {})
-    meta["fulltext"] = {**meta.get("fulltext", {}), **fields}
-    doc.extract_meta = meta
-
-
-_PROBE_TTL_SECONDS = 6 * 3600  # probe validity window; re-verified at a batch boundary once expired
-
-
-async def _auth_session_ready(session: AsyncSession, source: NewsSource) -> tuple[bool, str]:
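-    """B-3 ②: content-based probe gate + consumption of relogin_requested (manual half-open).
-
-    The flag is consumed before the 'skip as unavailable' branch, so it is reached on every adapter tick (fixes the r5 dead-button trap).
-    While the probe is in a failed state, auth fetches are 0 (an automatic retry loop leads straight to account lockout; B-3 ③).
-    Recovery path = refresh storage_state, then set the relogin_requested flag (manual).
-    Probe settings live in the source.selector_override JSONB: probe_url / min_body_chars / paywall_markers.
-    """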
-    from workers.news_collector import _get_or_create_health
-
-    health = await _get_or_create_health(session, source.id)
-    now = datetime.now(timezone.utc)
-    cfg = source.selector_override or {}
-    probe_url = cfg.get("probe_url")
-
-    force = False
-    if health.relogin_requested:
-        health.relogin_requested = False  # consume = one half-open attempt
-        health.updated_at = now
-        force = True
-        logger.info(f"[fulltext/auth] {source.name} relogin_requested 소비 — half-open probe")
-
-    if not force:
-        if health.last_probe_ok is False:
-            return False, "probe 실패 상태 (storage_state 갱신 + relogin_requested 대기)"
-        if (
-            health.last_probe_ok
-            and health.last_probe_at
-            and (now - health.last_probe_at).total_seconds() < _PROBE_TTL_SECONDS
-        ):
-            return True, ""
-
-    if not probe_url:
-        return False, "selector_override.probe_url 미설정"
-
-    result = await probe_session(
-        source.auth_profile,
-        probe_url,
-        int(cfg.get("min_body_chars", 800)),
-        list(cfg.get("paywall_markers", [])),
-    )
-    health.last_probe_at = now
-    health.last_probe_ok = bool(result.get("ok"))
-    health.updated_at = now
-    if not health.last_probe_ok:
-        logger.warning(f"[fulltext/auth] {source.name} probe 실패: {result.get('reason')}")
-        return False, str(result.get("reason"))
-    logger.info(f"[fulltext/auth] {source.name} probe OK ({result.get('body_chars')}자)")
-    return True, ""
-
-
-async def _degrade(session: AsyncSession, doc: Document, reason: str) -> None:
-    """Body promotion failed: continue to downstream stages with the RSS summary as-is (zero article loss)."""
-    _set_fulltext_meta(
-        doc, status="degraded", reason=reason[:300],
-        resolved_at=datetime.now(timezone.utc).isoformat(),
-    )
-    await _enqueue_downstream(session, doc)
-    logger.warning(f"[fulltext] doc={doc.id} 격하(RSS 요약 유지): {reason}")
-
-
-async def process(document_id: int, session: AsyncSession) -> None:
-    """Promote one article to full text. Follows the queue_consumer signature convention (the consumer commits)."""
-    doc = await session.get(Document, document_id)
-    if not doc:
-        raise ValueError(f"문서 ID {document_id}를 찾을 수 없음")
-    if not doc.edit_url:
-        await _degrade(session, doc, "edit_url 없음")
-        return
-
-    meta = doc.extract_meta or {}
-    source_id = meta.get("source_id")
-
-    # B-3: subscription sources (auth_profile) are fetched via a Playwright session; the probe gate runs first
-    source = await session.get(NewsSource, source_id) if source_id else None
-    auth_profile = source.auth_profile if source is not None else None
-
-    if auth_profile:
-        ready, why = await _auth_session_ready(session, source)
-        if not ready:
-            await _degrade(session, doc, f"구독 세션 불가용: {why}")
-            return
-
-    try:
-        if auth_profile:
-            html_text, final_url = await fetch_page_via_browser(doc.edit_url, auth_profile)
-        else:
-            html_text, final_url = await fetch_page(doc.edit_url)
-    except (CrawlBlocked, CrawlSkip) as e:
-        await _degrade(session, doc, f"{type(e).__name__}: {e}")
-        return
-    except CrawlFetchError:
-        raise  # transient error: queue retry
-
-    now = datetime.now(timezone.utc)
-
-    # A-7: preserve the raw HTML (keeps full re-extraction possible if the extractor is replaced)
-    raw_path = _raw_html_path(source_id, doc.file_hash, now)
-    try:
-        _save_raw_html(raw_path, html_text)
-        raw_saved = True
-    except OSError as e:
-        # On a transient NAS failure, skip only the preservation and proceed with promotion; log the reason (avoids a silent omission)
-        raw_saved = False
-        logger.error(f"[fulltext] doc={doc.id} 원본 보존 실패 (승격은 진행): {e}")
-
-    body, engine, engine_ver = _extract_body(html_text)
-    if not engine:
-        await _degrade(session, doc, f"추출 실패 (전 tier < {_WEB_MIN_BODY_LEN}자)")
-        return
-
-    clean_body = _strip_article_footer(body.replace("\x00", ""))
-    if len(clean_body) < _WEB_MIN_BODY_LEN:
-        await _degrade(session, doc, "푸터 제거 후 본문 부족")
-        return
-
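-    # B-3: also gate the extracted result on paywall markers; blocks promoting a 'paywall notice' as the body
-    # (silent corruption) when the session expired after passing the probe, and immediately demotes the probe state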
-    if auth_profile:
-        from workers.news_collector import _get_or_create_health
-
-        markers = (source.selector_override or {}).get("paywall_markers", [])
-        hit = next((m for m in markers if m and m.lower() in clean_body.lower()), None)
-        if hit:
-            health = await _get_or_create_health(session, source.id)
-            health.last_probe_ok = False
-            health.updated_at = datetime.now(timezone.utc)
-            await _degrade(session, doc, f"본문 페이월 마커 검출({hit}) — 세션 손상 의심")
-            return
-
-    title = doc.title or ""
-    doc.extracted_text = f"{title}\n\n{clean_body}" if title else clean_body
-    doc.extracted_at = now
-    doc.extractor_version = f"rss+page@{engine}"
-    doc.md_content = clean_body
-    doc.md_status = "success"
-    doc.md_extraction_engine = engine
-    doc.md_extraction_engine_version = engine_ver
-    doc.md_format_version = "1.0"
-    doc.md_generated_at = now
-    doc.md_source_hash = hashlib.sha256(html_text.encode("utf-8", errors="replace")).hexdigest()
-    doc.md_content_hash = hashlib.sha256(clean_body.encode("utf-8")).hexdigest()
-    doc.md_extraction_error = None  # clear the 'not eligible for conversion' marker set at collection time
-    doc.content_origin = "extracted"
-    doc.file_size = len(doc.extracted_text.encode())
-    _set_fulltext_meta(
-        doc, status="promoted", engine=engine,
-        raw_html_path=str(raw_path) if raw_saved else None,
-        final_url=final_url, body_chars=len(clean_body),
-        resolved_at=now.isoformat(),
-    )
-
-    await _enqueue_downstream(session, doc)
-    logger.info(
-        f"[fulltext/{engine}] doc={doc.id} {len(clean_body)}자 승격 "
-        f"(raw={'saved' if raw_saved else 'MISSING'})"
-    )
-
-
-async def reconcile_unresolved() -> None:
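-    """Safety net (once nightly): for news documents whose summarize was never scheduled because fulltext failed
-    permanently (3 attempts exhausted), enqueue downstream stages based on the RSS summary. Idempotent: the condition no longer matches after enqueue."""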
-    async with async_session() as session:
-        # ProcessingQueue is already in the outer query's FROM, so without an alias auto-correlation strips
-        # the subquery's entire FROM → InvalidRequestError (same pattern as queue_consumer.reset_stale_items)
-        pq = aliased(ProcessingQueue)
-        summarize_q = (
-            select(pq.id)
-            .where(
-                pq.document_id == Document.id,
-                pq.stage == "summarize",
-            )
-        )
-        result = await session.execute(
-            select(Document)
-            .join(ProcessingQueue, ProcessingQueue.document_id == Document.id)
-            .where(
-                ProcessingQueue.stage == "fulltext",
-                ProcessingQueue.status == "failed",
-                Document.source_channel == "news",
-                ~exists(summarize_q),
-            )
-            .limit(200)
-        )
-        docs = result.scalars().unique().all()
-        for doc in docs:
-            _set_fulltext_meta(doc, status="failed_reconciled")
-            await _enqueue_downstream(session, doc)
-        if docs:
-            await session.commit()
-            logger.warning(f"[fulltext] reconcile: 영구 실패 {len(docs)}건 RSS 요약으로 후속 enqueue")
@@ -1,496 +0,0 @@
-"""C-2 KOSHA Open API collection worker (plan crawl-24x7-1).
-
-4 APIs (verified live with a real key on 2026-06-10/06-13, with fixtures captured in tests/fixtures/kosha_*_response.json):
-  Disaster case board:       GET /B552468/disaster_api02/getdisaster_api02            callApiId=1060
-  Disaster case attachments: GET /B552468/disaster_attach_api02/Disaster_attach_api02 callApiId=1070
-  KOSHA GUIDE:               GET /B552468/koshaguide/getKoshaGuide                    callApiId=1050
-  Fatal accident bulletins:  GET /B552468/news_api02/getNews_api02                    callApiId=1040
-
-Runs once on the daily schedule (main.py):
-  Disaster cases = diff recent pages only (boardno dedup): case body Document (text-native)
-            + download attached PDF/HWP → save under /documents/crawl_raw/kosha/{boardno}/
-            → file Document + extract enqueue (reuses the existing kordoc HWP/PDF pipeline).
-  Fatal accidents = diff recent pages only (arno dedup): bulletin body Document (HTML → _clean_html).
-            No attachment API and no business field. Registration date = first 8 digits of arno (YYYYMMDD).
-  GUIDE   = metadata diff over the full registry (1039 entries, 100/page = 11 calls) → new/revised only,
-            daily ingest cap (default 25) = automatic incremental backlog backfill (~6 weeks) + load smoothing.
-            The remainder left unprocessed by the cap is logged every run (no silent cap).
-
-Key: KOSHA_API_KEY (credentials.env) stores the Public Data Portal 'encoded' key as-is.
-    Passing it via httpx params= re-encodes the %, so it must be concatenated directly into the URL string.
-Revision detection: GUIDE dedup key = regulation number + publication date; a new publication date for the same number is loaded as a new document.
-"""
-
-import asyncio
-import hashlib
-import os
-import random
-import re
-from datetime import date, datetime, timezone
-from pathlib import Path
-
-import httpx
-from sqlalchemy import select
-
-from core.config import settings
-from core.crawl_politeness import CRAWL_UA
-from core.database import async_session
-from core.utils import setup_logger
-from models.document import Document
-from models.news_source import NewsSource
-from models.queue import enqueue_stage
-from workers.news_collector import (
-    FeedError,
-    _clean_html,
-    _get_or_create_health,
-    _record_failure,
-    _record_success,
-)
-
-logger = setup_logger("kosha_collector")
-
-_BASE = "https://apis.data.go.kr/B552468"
-_BOARD_EP = f"{_BASE}/disaster_api02/getdisaster_api02"
-_ATTACH_EP = f"{_BASE}/disaster_attach_api02/Disaster_attach_api02"
-_GUIDE_EP = f"{_BASE}/koshaguide/getKoshaGuide"
-_FATAL_EP = f"{_BASE}/news_api02/getNews_api02"
-
-_CASE_SOURCE = "KOSHA 재해사례"
-_GUIDE_SOURCE = "KOSHA GUIDE"
-_FATAL_SOURCE = "KOSHA 사망사고"
-
-_CASE_PAGES = 2          # daily diff range (30×2 = latest 60 items; the API returns newest registration date first)
-_CASE_ROWS = 30
-_FATAL_PAGES = 2         # daily diff of fatal-accident bulletins (30×2 = latest 60 items; newest registration date first)
-_FATAL_ROWS = 30
-_GUIDE_ROWS = 100
-_GUIDE_DAILY_CAP = int(os.getenv("KOSHA_GUIDE_DAILY_CAP", "25"))
-_MAX_FILE_BYTES = 50 * 1024 * 1024
-_DOWNLOAD_DELAY = (2.0, 5.0)  # portal.kosha.or.kr file server: delay between consecutive downloads
-
-# Safety library A-2: license for KOSHA outputs (KOGL type unconfirmed → conservative redistribute=False,
-# to be relaxed once there is supporting evidence. Deterministic injection of the 0-3 license metadata).
-_KOSHA_LICENSE = {"scheme": "kogl", "redistribute": False, "attribution": "한국산업안전보건공단(KOSHA)"}
-
-
-def _ymd_to_date(ymd: str | None) -> date | None:
-    """'YYYYMMDD'/'YYYY-MM-DD' → date. A format mismatch returns None (fail-quiet; the date is an auxiliary axis)."""
-    digits = re.sub(r"\D", "", ymd or "")
-    if len(digits) != 8:
-        return None
-    try:
-        return date(int(digits[:4]), int(digits[4:6]), int(digits[6:8]))
-    except ValueError:
-        return None
-
-
-def _api_key() -> str:
-    key = os.getenv("KOSHA_API_KEY", "")
-    if not key:
-        raise FeedError("KOSHA_API_KEY 미설정 — KOSHA 수집 불가")
-    return key
-
-
-async def _api_get(url: str) -> dict:
-    """Shared GET: checks both error schemes (gateway and provider)."""
-    async with httpx.AsyncClient(timeout=25) as client:
-        resp = await client.get(url, headers={"User-Agent": CRAWL_UA})
-    if resp.status_code != 200:
-        raise FeedError(f"KOSHA API {resp.status_code} @ {url.split('?')[0]}")
-    try:
-        payload = resp.json()
-    except ValueError as e:
-        # Gateway errors arrive as XML/plain text (SERVICE_KEY_IS_NOT_REGISTERED etc.)
-        raise FeedError(f"KOSHA API 비-JSON 응답: {resp.text[:120]}") from e
-    code = (payload.get("header") or {}).get("resultCode")
-    if code != "00":
-        raise FeedError(f"KOSHA API resultCode={code}: {(payload.get('header') or {}).get('resultMsg')}")
-    return payload
-
-
-def _items(payload: dict) -> list[dict]:
-    """body.items.item: guards against the data.go.kr convention of returning a dict for a single item."""
-    item = ((payload.get("body") or {}).get("items") or {}).get("item")
-    if item is None:
-        return []
-    return [item] if isinstance(item, dict) else list(item)
-
-
-def _fatal_fields(item: dict) -> dict | None:
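-    """Map a fatal-accident item (fixed 3 fields: arno/keyword/contents) to Document fields.
-
-    Pure function (no httpx/DB needed; a target for fixture unit tests). Required = arno + keyword;
-    returns None (skip) if either is missing. There is no dedicated date field, so the date is derived from the
-    prefix of the registration identifier arno: arno = 'YYYYMMDDHHMMSS' + 6 arbitrary characters (same shape verified
-    across all live data since 2019). First 8 digits = KST registration date → published_date; first 14 digits =
-    registration time → reg_dt (kept verbatim, no timezone interpretation asserted).
-    """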
-    arno = str(item.get("arno") or "").strip()
-    title = (item.get("keyword") or "").strip()
-    if not arno or not title:
-        return None
-    text = _clean_html(item.get("contents") or "", max_len=None)
-    reg_dt = arno[:14] if re.fullmatch(r"\d{14}", arno[:14]) else None
-    return {
-        "arno": arno,
-        "title": title,
-        "text": text,
-        "published_date": _ymd_to_date(arno[:8]),
-        "reg_dt": reg_dt,
-    }
-
-
-def _safe_filename(name: str) -> str:
-    """Sanitize a NAS filename: removes path separators, control characters, and whitespace runs (avoids shell traps)."""
-    name = re.sub(r"[/\\\x00-\x1f]", "_", name).strip()
-    name = re.sub(r"\s+", " ", name)
-    return name[:140] or "unnamed"
-
-
-async def _download(url: str, dest: Path) -> int:
-    """Download an attachment/regulation file: size cap + directory creation + delay between consecutive downloads."""
-    await asyncio.sleep(random.uniform(*_DOWNLOAD_DELAY))
-    async with httpx.AsyncClient(timeout=60, follow_redirects=True) as client:
-        resp = await client.get(url, headers={"User-Agent": CRAWL_UA})
-    if resp.status_code != 200:
-        raise FeedError(f"파일 다운로드 {resp.status_code}: {url}")
-    if len(resp.content) > _MAX_FILE_BYTES:
-        raise FeedError(f"파일 크기 초과 ({len(resp.content)} bytes): {url}")
-    dest.parent.mkdir(parents=True, exist_ok=True)
-    dest.write_bytes(resp.content)
-    return len(resp.content)
-
-
-async def _get_or_create_source(session, name: str, feed_url: str) -> NewsSource:
-    result = await session.execute(select(NewsSource).where(NewsSource.name == name))
-    source = result.scalars().first()
-    if source is None:
-        source = NewsSource(
-            name=name, feed_url=feed_url, feed_type="rss", fetch_method="api",
-            fulltext_policy="none", source_channel="crawl", category="Safety",
-            language="ko", country="KR",
-            enabled=False,  # excluded from the 6h news cycle; this worker polls daily
-        )
-        session.add(source)
-        await session.flush()
-    return source
-
-
-async def _ingest_attachment(session, boardno: str, filenm: str, filepath: str) -> bool:
-    """One attachment → save to NAS + file Document + extract enqueue. Returns whether it was new."""
-    safe = _safe_filename(filenm)
-    rel_path = f"crawl_raw/kosha/{boardno}/{safe}"
-    existing = await session.execute(
-        select(Document).where(Document.file_path == rel_path).limit(1)
-    )
-    if existing.scalars().first():
-        return False
-
-    dest = Path(settings.nas_mount_path) / rel_path
-    size = await _download(filepath, dest)
-    ext = (safe.rsplit(".", 1)[-1].lower() if "." in safe else "bin")[:10]
-
-    doc = Document(
-        file_path=rel_path,
-        file_hash=hashlib.sha256(dest.read_bytes()).hexdigest(),
-        file_format=ext,
-        file_size=size,
-        file_type="immutable",
-        title=safe.rsplit(".", 1)[0],
-        source_channel="crawl",
-        data_origin="external",
-        import_source="kosha_api",
-        edit_url=filepath,
-        ai_tags=["Safety/KOSHA재해사례/첨부"],
-        # Safety library A-2: deterministic at ingest time (no LLM dependence even when routed through classify)
-        material_type="incident",
-        jurisdiction="KR",
-        extract_meta={"kosha": {"boardno": boardno, "kind": "case_attachment"},
-                      "license": dict(_KOSHA_LICENSE)},
-    )
-    session.add(doc)
-    await session.flush()
-    # extract → (crawl override) classify → embed/chunk: reuses the existing file pipeline
-    await enqueue_stage(session, doc.id, "extract")
-    logger.info(f"[kosha] 첨부 ingest: {rel_path} ({size} bytes)")
-    return True
-
-
-async def collect_disaster_cases(session) -> int:
-    """Daily diff of disaster cases: latest _CASE_PAGES pages, deduplicated by boardno."""
-    key = _api_key()
-    source = await _get_or_create_source(session, _CASE_SOURCE, _BOARD_EP)
-    new_count = 0
-
-    for page in range(1, _CASE_PAGES + 1):
-        payload = await _api_get(
-            f"{_BOARD_EP}?serviceKey={key}&callApiId=1060&pageNo={page}&numOfRows={_CASE_ROWS}"
-        )
-        items = _items(payload)
-        if not items:
-            break
-        page_all_dup = True
-        for item in items:
-            boardno = str(item.get("boardno") or "").strip()
-            title = (item.get("keyword") or "").strip()
-            if not boardno or not title:
-                continue
-            fhash = hashlib.sha256(f"kosha-case|{boardno}".encode()).hexdigest()[:32]
-            existing = await session.execute(
-                select(Document).where(Document.file_hash == fhash).limit(1)
-            )
-            if existing.scalars().first():
-                continue
-            page_all_dup = False
-
-            contents = (item.get("contents") or "").strip()
-            business = (item.get("business") or "").strip()
-            now = datetime.now(timezone.utc)
-            doc = Document(
-                file_path=f"crawl/{_CASE_SOURCE}/{boardno}",
-                file_hash=fhash,
-                file_format="article",
-                file_size=len(contents.encode()),
-                file_type="note",
-                title=title,
-                extracted_text=f"{title}\n\n[{business}]\n{contents}",
-                extracted_at=now,
-                extractor_version="kosha_api",
-                md_status="skipped",
-                md_extraction_error="kosha case: 텍스트 네이티브, markdown 변환 비대상",
-                source_channel="crawl",
-                data_origin="external",
-                review_status="approved",
-                ai_domain="Safety",
-                ai_sub_group=_CASE_SOURCE,
-                ai_tags=[f"Safety/KOSHA재해사례/{business or '기타'}"],
-                # Safety library A-2: deterministic at ingest time (classify-skip path)
-                material_type="incident",
-                jurisdiction="KR",
-                extract_meta={
-                    "source_id": source.id,
-                    "source_name": _CASE_SOURCE,
-                    "published_at": None,
-                    "kosha": {"boardno": boardno, "business": business,
-                              "atcflcnt": item.get("atcflcnt")},
-                    "license": dict(_KOSHA_LICENSE),
-                },
-            )
-            session.add(doc)
-            await session.flush()
-            await enqueue_stage(session, doc.id, "summarize")
-            await enqueue_stage(session, doc.id, "embed")
-            await enqueue_stage(session, doc.id, "chunk")
-            new_count += 1
-
-            # Attachments (PDF/HWP): formal case reports that carry more information than the body text
-            if int(item.get("atcflcnt") or 0) > 0:
-                attach = await _api_get(
-                    f"{_ATTACH_EP}?serviceKey={key}&callApiId=1070"
-                    f"&pageNo=1&numOfRows=10&boardno={boardno}"
-                )
-                for att in _items(attach):
-                    filenm = (att.get("filenm") or "").strip()
-                    filepath = (att.get("filepath") or "").strip()
-                    if not filenm or not filepath.startswith("https://"):
-                        continue
-                    try:
-                        await _ingest_attachment(session, boardno, filenm, filepath)
-                    except FeedError as e:
-                        logger.warning(f"[kosha] 첨부 실패 skip ({boardno}/{filenm}): {e}")
-
-            # Per-case commit (R4): keeps partial loads so that a later _api_get failure on a
-            # subsequent page/case does not roll back cases already loaded (csb/api_standards idiom).
-            await session.commit()
-        if page_all_dup:
-            break  # newest-first order: if a whole page is already known, later pages are too
-
-    logger.info(f"[kosha] 재해사례 신규 {new_count}건")
-    return new_count
-
-
-async def collect_fatal_accidents(session) -> int:
-    """Daily diff of fatal-accident bulletins: latest _FATAL_PAGES pages, deduplicated by arno.
-
-    Separate channel (1040) from disaster cases (1060): no business field or attachment API, contents=HTML.
-    Body is text-native (_clean_html), so it is not subject to md conversion; queued for summarize/embed/chunk.
-    """
-    key = _api_key()
-    source = await _get_or_create_source(session, _FATAL_SOURCE, _FATAL_EP)
-    new_count = 0
-
-    for page in range(1, _FATAL_PAGES + 1):
-        payload = await _api_get(
-            f"{_FATAL_EP}?serviceKey={key}&callApiId=1040&pageNo={page}&numOfRows={_FATAL_ROWS}"
-        )
-        items = _items(payload)
-        if not items:
-            break
-        page_all_dup = True
-        for item in items:
-            fields = _fatal_fields(item)
-            if fields is None:
-                continue
-            arno = fields["arno"]
-            fhash = hashlib.sha256(f"kosha-fatal|{arno}".encode()).hexdigest()[:32]
-            existing = await session.execute(
-                select(Document).where(Document.file_hash == fhash).limit(1)
-            )
-            if existing.scalars().first():
-                continue
-            page_all_dup = False
-
-            text = fields["text"]
-            now = datetime.now(timezone.utc)
-            doc = Document(
-                file_path=f"crawl/{_FATAL_SOURCE}/{arno}",
-                file_hash=fhash,
-                file_format="article",
-                file_size=len(text.encode()),
-                file_type="note",
-                title=fields["title"],
-                extracted_text=f"{fields['title']}\n\n{text}",
-                extracted_at=now,
-                extractor_version="kosha_api",
-                md_status="skipped",
-                md_extraction_error="kosha fatal: 텍스트 네이티브, markdown 변환 비대상",
-                source_channel="crawl",
-                data_origin="external",
-                review_status="approved",
-                ai_domain="Safety",
-                ai_sub_group=_FATAL_SOURCE,
-                ai_tags=["Safety/KOSHA사망사고"],
-                # Safety library A-2: deterministic at ingest time (classify-skip path)
-                material_type="incident",
-                jurisdiction="KR",
-                published_date=fields["published_date"],
-                extract_meta={
-                    "source_id": source.id,
-                    "source_name": _FATAL_SOURCE,
-                    "published_at": None,
-                    "kosha": {"arno": arno, "kind": "fatal_accident",
-                              "reg_dt": fields["reg_dt"]},
-                    "license": dict(_KOSHA_LICENSE),
-                },
-            )
-            session.add(doc)
-            await session.flush()
-            await enqueue_stage(session, doc.id, "summarize")
-            await enqueue_stage(session, doc.id, "embed")
-            await enqueue_stage(session, doc.id, "chunk")
-            new_count += 1
-            # Per-case commit (R4): a later page failure must not roll back earlier cases.
-            await session.commit()
-        if page_all_dup:
-            break  # newest-first order: if a whole page is already known, later pages are too
-
-    logger.info(f"[kosha] 사망사고 신규 {new_count}건")
-    return new_count
-
-
-async def collect_kosha_guide(session, cap: int = _GUIDE_DAILY_CAP) -> int:
-    """Diff the full GUIDE registry metadata → download only new/revised items (incremental backfill under a daily cap)."""
-    key = _api_key()
-    await _get_or_create_source(session, _GUIDE_SOURCE, _GUIDE_EP)
-    new_specs: list[dict] = []
-    page, total = 1, None
-
-    while True:
-        payload = await _api_get(
-            f"{_GUIDE_EP}?serviceKey={key}&callApiId=1050&pageNo={page}&numOfRows={_GUIDE_ROWS}"
-        )
-        if total is None:
-            total = int((payload.get("body") or {}).get("totalCount") or 0)
-        items = _items(payload)
-        if not items:
-            break
-        for item in items:
-            no = (item.get("techGdlnNo") or "").strip()
-            ymd = (item.get("techGdlnOfancYmd") or "").strip()
-            url = (item.get("fileDownloadUrl") or "").strip()
-            if not no or not url.startswith("https://"):
-                continue
-            fhash = hashlib.sha256(f"kosha-guide|{no}|{ymd}".encode()).hexdigest()[:32]
-            existing = await session.execute(
-                select(Document).where(Document.file_hash == fhash).limit(1)
-            )
-            if not existing.scalars().first():
-                new_specs.append({"no": no, "ymd": ymd, "url": url,
-                                  "name": (item.get("techGdlnNm") or no).strip(),
-                                  "fhash": fhash})
-        if page * _GUIDE_ROWS >= total:
-            break
-        page += 1
-
-    todo, deferred = new_specs[:cap], len(new_specs) - min(len(new_specs), cap)
-    ingested = 0
-    for spec in todo:
-        safe_no = _safe_filename(spec["no"])
-        rel_path = f"crawl_raw/kosha_guide/{safe_no}-{spec['ymd'] or 'nodate'}.pdf"
-        dest = Path(settings.nas_mount_path) / rel_path
-        try:
-            size = await _download(spec["url"], dest)
-        except FeedError as e:
-            logger.warning(f"[kosha] GUIDE 다운로드 실패 skip ({spec['no']}): {e}")
-            continue
-        doc = Document(
-            file_path=rel_path,
-            file_hash=spec["fhash"],
-            file_format="pdf",
-            file_size=size,
-            file_type="immutable",
-            title=f"{spec['name']} ({spec['no']})",
-            source_channel="crawl",
-            data_origin="external",
-            import_source="kosha_api",
-            edit_url=spec["url"],
-            ai_tags=["Safety/KOSHA GUIDE"],
-            # Safety library A-2: GUIDE = non-binding advisory technical guideline (not law, plan 0-1)
-            material_type="guide",
-            jurisdiction="KR",
-            published_date=_ymd_to_date(spec["ymd"]),
-            extract_meta={"kosha": {"kind": "guide", "techGdlnNo": spec["no"],
-                                    "ofancYmd": spec["ymd"]},
-                          "license": dict(_KOSHA_LICENSE)},
-        )
-        session.add(doc)
-        await session.flush()
-        await enqueue_stage(session, doc.id, "extract")
-        ingested += 1
-        # Per-item commit (R4): a download failure must not roll back GUIDE items already loaded.
-        await session.commit()
-
-    # No silent cap: surface the remainder (automatic incremental backfill: tomorrow handles another cap's worth)
-    logger.info(f"[kosha] GUIDE 신규/개정 {len(new_specs)}건 중 {ingested}건 ingest"
-                + (f" (cap {cap}, 잔여 {deferred}건 — 일일 점진 백필)" if deferred > 0 else ""))
-    return ingested
-
-
-async def run() -> None:
-    """Runs once daily, isolating failures per source (a disaster-case failure must not block GUIDE)."""
-    now = datetime.now(timezone.utc)
-    for name, collector in ((_CASE_SOURCE, collect_disaster_cases),
-                            (_FATAL_SOURCE, collect_fatal_accidents),
-                            (_GUIDE_SOURCE, collect_kosha_guide)):
-        async with async_session() as session:
-            result = await session.execute(select(NewsSource).where(NewsSource.name == name))
-            source = result.scalars().first()
-            try:
-                count = await collector(session)
-                if source is None:  # created by the collector on first run
-                    result = await session.execute(
-                        select(NewsSource).where(NewsSource.name == name))
-                    source = result.scalars().first()
-                health = await _get_or_create_health(session, source.id)
-                _record_success(health, count, False, now)
-                await session.commit()
-            except Exception as e:
-                logger.error(f"[kosha] {name} 수집 실패: {e}")
-                await session.rollback()  # discard partial loads, then record health only
-                if source is not None:
-                    health = await _get_or_create_health(session, source.id)
-                    _record_failure(health, str(e) or repr(e), now)
-                    await session.commit()
-
-
-if __name__ == "__main__":
-    asyncio.run(run())
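The removed collectors share one pattern: derive a stable key per item, skip keys already stored, and stop paging once an entire page is already known. Below is a minimal sketch of that pattern, assuming pages arrive newest first. `dedup_key` and `collect_new` are hypothetical names used only for illustration, and an in-memory set stands in for the `Document.file_hash` lookup.

```python
import hashlib


def dedup_key(kind: str, ident: str) -> str:
    """Stable 32-hex-character key, in the style of sha256(f"kosha-case|{boardno}")[:32]."""
    return hashlib.sha256(f"{kind}|{ident}".encode()).hexdigest()[:32]


def collect_new(pages: list[list[str]], seen: set[str]) -> list[str]:
    """Return ids not yet in `seen`; stop at the first page that is entirely known.

    `pages` is a hypothetical stand-in for the paged API responses (newest first).
    """
    new_ids: list[str] = []
    for items in pages:
        if not items:
            break
        page_all_dup = True
        for ident in items:
            key = dedup_key("kosha-case", ident)
            if key in seen:
                continue
            page_all_dup = False
            seen.add(key)
            new_ids.append(ident)
        if page_all_dup:
            break  # everything on this page is known, so older pages are skipped
    return new_ids
```

The early stop is only safe when the upstream ordering really is newest first; with any other ordering a fully known page does not imply that later pages are known.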
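@@ -0,0 +1,363 @@
+"""Law monitor worker: integrates with the National Law Information Center (국가법령정보센터) API.
+
+Monitors 26 laws, stores them split by part/chapter, and tracks change history.
+Runs daily at 07:00 (APScheduler).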
+"""
+
+import os
+import re
+from datetime import datetime, timezone
+from pathlib import Path
+from xml.etree import ElementTree as ET
+
+import httpx
+from sqlalchemy import select
+
+from core.config import settings
+from core.database import async_session
+from core.utils import create_caldav_todo, escape_ical_text, file_hash, send_smtp_email, setup_logger
+from models.automation import AutomationState
+from models.document import Document
+from models.queue import enqueue_stage
+
+logger = setup_logger("law_monitor")
+
+LAW_SEARCH_URL = "https://www.law.go.kr/DRF/lawSearch.do"
+LAW_SERVICE_URL = "https://www.law.go.kr/DRF/lawService.do"
+
+# Monitored laws (26)
+MONITORED_LAWS = [
+    # Occupational safety and health (core)
+    "산업안전보건법",
+    "산업안전보건법 시행령",
+    "산업안전보건법 시행규칙",
+    "산업안전보건기준에 관한 규칙",
+    "유해위험작업의 취업 제한에 관한 규칙",
+    "중대재해 처벌 등에 관한 법률",
+    "중대재해 처벌 등에 관한 법률 시행령",
+    # Construction safety
+    "건설기술 진흥법",
+    "건설기술 진흥법 시행령",
+    "건설기술 진흥법 시행규칙",
+    "시설물의 안전 및 유지관리에 관한 특별법",
+    # Hazardous materials / chemicals
+    "위험물안전관리법",
+    "위험물안전관리법 시행령",
+    "위험물안전관리법 시행규칙",
+    "화학물질관리법",
+    "화학물질관리법 시행령",
+    "화학물질의 등록 및 평가 등에 관한 법률",
+    # Fire / electricity / gas
+    "소방시설 설치 및 관리에 관한 법률",
+    "소방시설 설치 및 관리에 관한 법률 시행령",
+    "전기사업법",
+    "전기안전관리법",
+    "고압가스 안전관리법",
+    "고압가스 안전관리법 시행령",
+    "액화석유가스의 안전관리 및 사업법",
+    # Labor / environment
+    "근로기준법",
+    "환경영향평가법",
+]
+
+
+async def run():
+    """Run the law-change monitor."""
+    law_oc = os.getenv("LAW_OC", "")
+    if not law_oc:
+        logger.warning("LAW_OC 미설정 — 법령 API 승인 대기 중")
+        return
+
+    async with async_session() as session:
+        state = await session.execute(
+            select(AutomationState).where(AutomationState.job_name == "law_monitor")
+        )
+        state_row = state.scalar_one_or_none()
+        last_check = state_row.last_check_value if state_row else None
+
+        today = datetime.now(timezone.utc).strftime("%Y%m%d")
+        if last_check == today:
+            logger.info("오늘 이미 체크 완료")
+            return
+
+        new_count = 0
+        async with httpx.AsyncClient(timeout=30) as client:
+            for law_name in MONITORED_LAWS:
+                try:
+                    count = await _check_law(client, law_oc, law_name, session)
+                    new_count += count
+                except Exception as e:
+                    logger.error(f"[{law_name}] 체크 실패: {e}")
+
+        # Update state
+        if state_row:
+            state_row.last_check_value = today
+            state_row.last_run_at = datetime.now(timezone.utc)
+        else:
+            session.add(AutomationState(
+                job_name="law_monitor",
+                last_check_value=today,
+                last_run_at=datetime.now(timezone.utc),
+            ))
+
+        await session.commit()
+        logger.info(f"법령 모니터 완료: {new_count}건 신규/변경 감지")
+
+
+async def _check_law(
+    client: httpx.AsyncClient,
+    law_oc: str,
+    law_name: str,
+    session,
+) -> int:
+    """Search for one law → detect changes → store it split into sections."""
+    # Search for the law (lawSearch.do)
+    resp = await client.get(
+        LAW_SEARCH_URL,
+        params={"OC": law_oc, "target": "law", "type": "XML", "query": law_name},
+    )
+    resp.raise_for_status()
+
+    root = ET.fromstring(resp.text)
+    total = root.findtext(".//totalCnt", "0")
+    if total == "0":
+        logger.debug(f"[{law_name}] 검색 결과 없음")
+        return 0
+
+    # Find the law whose name matches exactly
+    for law_elem in root.findall(".//law"):
+        found_name = law_elem.findtext("법령명한글", "").strip()
+        if found_name != law_name:
+            continue
+
+        mst = law_elem.findtext("법령일련번호", "")
+        proclamation_date = law_elem.findtext("공포일자", "")
+        revision_type = law_elem.findtext("제개정구분명", "")
+
+        if not mst:
+            continue
+
+        # Check whether this law is already registered (same law name + proclamation date)
+        existing = await session.execute(
+            select(Document).where(
+                Document.title.like(f"{law_name}%"),
+                Document.source_channel == "law_monitor",
+            )
+        )
+        existing_docs = existing.scalars().all()
+
+        # Skip if the same proclamation date already exists
+        for doc in existing_docs:
+            if proclamation_date in (doc.title or ""):
+                return 0
+
+        # Find the previous proclamation date (for change history)
+        prev_date = ""
+        if existing_docs:
+            prev_date = max(
+                (re.search(r'\d{8}', doc.title or "").group() for doc in existing_docs
+                 if re.search(r'\d{8}', doc.title or "")),
+                default=""
+            )
+
+        # Fetch the full text (lawService.do)
+        text_resp = await client.get(
+            LAW_SERVICE_URL,
+            params={"OC": law_oc, "target": "law", "MST": mst, "type": "XML"},
+        )
+        text_resp.raise_for_status()
+
+        # Store split into sections
+        count = await _save_law_split(
+            session, text_resp.text, law_name, proclamation_date,
+            revision_type, prev_date,
+        )
+
+        # Commit to the DB first (so a notification failure cannot block the save)
+        await session.commit()
+
+        # CalDAV + SMTP notifications (failures are ignored)
+        try:
+            _send_notifications(law_name, proclamation_date, revision_type)
+        except Exception as e:
+            logger.warning(f"[{law_name}] 알림 발송 실패 (무시): {e}")
+
+        return count
+
+    return 0
+
+
+async def _save_law_split(
+    session, xml_text: str, law_name: str, proclamation_date: str,
+    revision_type: str, prev_date: str,
+) -> int:
+    """Law XML → Markdown files split per chapter (장)."""
+    root = ET.fromstring(xml_text)
+
+    # Find chapter delimiters among the 조문단위 elements (articles whose 조문키 ends in 000)
+    units = root.findall(".//조문단위")
+    chapters = []  # [(chapter title, [articles])]
+    current_chapter = None
+    current_articles = []
+
+    for unit in units:
+        key = unit.attrib.get("조문키", "")
+        content = (unit.findtext("조문내용", "") or "").strip()
+
+        # Chapter delimiter: the key ends in 000 and the content contains "제X장"
+        if key.endswith("000") and re.search(r"제\d+장", content):
+            # Save the previous chapter/preamble
+            if current_articles:
+                chapter_name = current_chapter or "서문"
+                chapters.append((chapter_name, current_articles))
+            chapter_match = re.search(r"(제\d+장\s*.+)", content)
+            current_chapter = chapter_match.group(1).strip() if chapter_match else content.strip()
+            current_articles = []
+        else:
+            current_articles.append(unit)
+
+    # Save the last chapter
+    if current_articles:
+        chapter_name = current_chapter or "서문"
+        chapters.append((chapter_name, current_articles))
+
+    # Chapter split succeeded
+    sections = []
+    if chapters:
+        for chapter_title, articles in chapters:
+            md_lines = [f"# {law_name}\n", f"## {chapter_title}\n"]
+            for article in articles:
+                title = article.findtext("조문제목", "")
+                content = article.findtext("조문내용", "")
+                if title:
+                    md_lines.append(f"\n### {title}\n")
+                if content:
+                    md_lines.append(content.strip())
+            section_name = _safe_name(chapter_title)
+            sections.append((section_name, "\n".join(md_lines)))
+    else:
+        # Chapter split failed → whole law as a single file
+        full_md = _law_xml_to_markdown(xml_text, law_name)
+        sections.append(("전문", full_md))
+
+    # Save each section
+    inbox_dir = Path(settings.nas_mount_path) / "PKM" / "Inbox"
+    inbox_dir.mkdir(parents=True, exist_ok=True)
+    count = 0
+
+    for section_name, content in sections:
+        filename = f"{law_name}_{proclamation_date}_{section_name}.md"
+        file_path = inbox_dir / filename
+        file_path.write_text(content, encoding="utf-8")
+
+        rel_path = str(file_path.relative_to(Path(settings.nas_mount_path)))
+
+        # Change-history note
+        note = ""
+        if prev_date:
+            note = (
+                f"[자동] 법령 개정 감지\n"
+                f"이전 공포일: {prev_date}\n"
+                f"현재 공포일: {proclamation_date}\n"
+                f"개정구분: {revision_type}"
+            )
+
+        doc = Document(
+            file_path=rel_path,
+            file_hash=file_hash(file_path),
+            file_format="md",
+            file_size=len(content.encode()),
+            file_type="immutable",
+            title=f"{law_name} ({proclamation_date}) {section_name}",
+            source_channel="law_monitor",
+            data_origin="work",
+            category="law",
+            user_note=note or None,
+        )
+        session.add(doc)
+        await session.flush()
+
+        await enqueue_stage(session, doc.id, "extract")
+        count += 1
+
+    logger.info(f"[법령] {law_name} ({proclamation_date}) → {count}개 섹션 저장")
+    return count
+
+
+def _xml_section_to_markdown(elem) -> str:
+    """Convert an XML section (part/chapter) to Markdown."""
+    lines = []
+    for article in elem.iter():
+        tag = article.tag
+        text = (article.text or "").strip()
+        if not text:
+            continue
+        if "조" in tag:
+            lines.append(f"\n### {text}\n")
+        elif "항" in tag:
+            lines.append(f"\n{text}\n")
+        elif "호" in tag:
+            lines.append(f"- {text}")
+        elif "목" in tag:
+            lines.append(f"  - {text}")
+        else:
+            lines.append(text)
+    return "\n".join(lines)
+
+
+def _law_xml_to_markdown(xml_text: str, law_name: str) -> str:
+    """Convert the entire law XML to Markdown."""
+    root = ET.fromstring(xml_text)
+    lines = [f"# {law_name}\n"]
+
+    for elem in root.iter():
+        tag = elem.tag
+        text = (elem.text or "").strip()
+        if not text:
+            continue
+        if "편" in tag and "제목" not in tag:
+            lines.append(f"\n## {text}\n")
+        elif "장" in tag and "제목" not in tag:
+            lines.append(f"\n## {text}\n")
+        elif "조" in tag:
+            lines.append(f"\n### {text}\n")
+        elif "항" in tag:
+            lines.append(f"\n{text}\n")
+        elif "호" in tag:
+            lines.append(f"- {text}")
+        elif "목" in tag:
+            lines.append(f"  - {text}")
+
+    return "\n".join(lines)
+
+
+def _safe_name(name: str) -> str:
+    """Make a string safe for use in a filename."""
+    return re.sub(r'[^\w가-힣-]', '_', name).strip("_")
+
+
+def _send_notifications(law_name: str, proclamation_date: str, revision_type: str):
+    """CalDAV + SMTP notifications."""
+    # CalDAV
+    caldav_url = os.getenv("CALDAV_URL", "")
+    caldav_user = os.getenv("CALDAV_USER", "")
+    caldav_pass = os.getenv("CALDAV_PASS", "")
+    if caldav_url and caldav_user:
+        create_caldav_todo(
+            caldav_url, caldav_user, caldav_pass,
+            title=f"법령 검토: {law_name}",
+            description=f"공포일자: {proclamation_date}, 개정구분: {revision_type}",
+            due_days=7,
+        )
+
+    # SMTP
+    smtp_host = os.getenv("MAILPLUS_HOST", "")
+    smtp_port = int(os.getenv("MAILPLUS_SMTP_PORT", "465"))
+    smtp_user = os.getenv("MAILPLUS_USER", "")
+    smtp_pass = os.getenv("MAILPLUS_PASS", "")
+    if smtp_host and smtp_user:
+        send_smtp_email(
+            smtp_host, smtp_port, smtp_user, smtp_pass,
+            subject=f"[법령 변경] {law_name} ({revision_type})",
+            body=f"법령명: {law_name}\n공포일자: {proclamation_date}\n개정구분: {revision_type}",
+        )
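The chapter split in `_save_law_split` is easier to follow on a small input. Below is a minimal sketch of just the grouping step, assuming the same element and attribute names the worker reads (`조문단위`, `조문키`, `조문내용`, `조문제목`). The sample XML, its key values, and the helper name `group_by_chapter` are hypothetical and do not come from the real API response.

```python
import re
from xml.etree import ElementTree as ET

# Hypothetical sample; real lawService.do responses carry many more fields.
SAMPLE_XML = (
    "<법령>"
    '<조문단위 조문키="0001000"><조문내용>제1장 총칙</조문내용></조문단위>'
    '<조문단위 조문키="0001001"><조문제목>목적</조문제목><조문내용>제1조</조문내용></조문단위>'
    '<조문단위 조문키="0002001"><조문제목>정의</조문제목><조문내용>제2조</조문내용></조문단위>'
    '<조문단위 조문키="0003000"><조문내용>제2장 안전보건관리체제</조문내용></조문단위>'
    '<조문단위 조문키="0003001"><조문제목>안전보건관리책임자</조문제목><조문내용>제3조</조문내용></조문단위>'
    "</법령>"
)


def group_by_chapter(xml_text: str) -> list[tuple[str, list[str]]]:
    """Group article titles under the chapter delimiter that precedes them."""
    root = ET.fromstring(xml_text)
    chapters: list[tuple[str, list[str]]] = []
    current, titles = None, []
    for unit in root.findall(".//조문단위"):
        key = unit.attrib.get("조문키", "")
        content = (unit.findtext("조문내용", "") or "").strip()
        if key.endswith("000") and re.search(r"제\d+장", content):
            if titles:  # flush the previous chapter (or the preamble)
                chapters.append((current or "서문", titles))
            match = re.search(r"(제\d+장\s*.+)", content)
            current = match.group(1).strip() if match else content
            titles = []
        else:
            titles.append(unit.findtext("조문제목", ""))
    if titles:  # flush the final chapter
        chapters.append((current or "서문", titles))
    return chapters
```

As in the worker, a chapter delimiter with no articles after it produces no section, and articles that appear before the first delimiter are grouped under `서문`.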
@@ -17,7 +17,7 @@ from sqlalchemy import select

 from core.config import settings
 from core.database import async_session
-from core.utils import file_hash, setup_logger
+from core.utils import file_hash, send_smtp_email, setup_logger
 from models.automation import AutomationState
 from models.document import Document
 from models.queue import enqueue_stage
@@ -55,13 +55,20 @@ def _detect_origin(subject: str, body: str) -> str:
    return "external"


-def _fetch_emails_sync(host: str, port: int, user: str, password: str, last_uid: int | None):
-    """Fetch mail over IMAP synchronously (run via asyncio.to_thread)."""
+def _fetch_emails_sync(host: str, port: int, user: str, password: str, last_uid: int | None, folder: str):
+    """Fetch mail over IMAP synchronously (run via asyncio.to_thread).
+
+    Collects only the selected folder (MAILPLUS_FOLDER, default 'PKM'), not the whole INBOX.
+    No-op when the folder is missing (safely 0 messages until the user creates it with a MailPlus rule).
+    """
    results = []
    conn = imaplib.IMAP4_SSL(host, port, timeout=30)
    try:
        conn.login(user, password)
-        conn.select("INBOX")
+        typ, _ = conn.select(folder)
+        if typ != "OK":
+            logger.info(f"[메일] 폴더 '{folder}' 없음/접근불가 — 수집 건너뜀 (no-op)")
+            return results

        if last_uid:
            # Incremental sync: after last_uid
@@ -71,14 +78,13 @@ def _fetch_emails_sync(host: str, port: int, user: str, password: str, last_uid:
            since = (datetime.now() - timedelta(days=7)).strftime("%d-%b-%Y")
            _, data = conn.uid("search", None, f"SINCE {since}")

-        uids = data[0].split()
+        uids = (data[0] or b"").split()
        for uid_bytes in uids:
            uid = int(uid_bytes)
            _, msg_data = conn.uid("fetch", uid_bytes, "(RFC822)")
            if msg_data[0] is None:
                continue
-            raw = msg_data[0][1]
-            results.append((uid, raw))
+            results.append((uid, msg_data[0][1]))
    finally:
        conn.logout()

@@ -91,15 +97,18 @@ async def run():
    port = int(os.getenv("MAILPLUS_PORT", "993"))
    user = os.getenv("MAILPLUS_USER", "")
    password = os.getenv("MAILPLUS_PASS", "")
+    folder = os.getenv("MAILPLUS_FOLDER", "PKM")

    if not all([host, user, password]):
        logger.warning("MailPlus 인증 정보 미설정")
        return

+    job_name = f"mailplus:{folder}"
+
    async with async_session() as session:
-        # Look up the last UID
+        # Look up the last UID (UIDs are namespaced per folder → job_name includes the folder)
        state = await session.execute(
-            select(AutomationState).where(AutomationState.job_name == "mailplus")
+            select(AutomationState).where(AutomationState.job_name == job_name)
        )
        state_row = state.scalar_one_or_none()
        last_uid = int(state_row.last_check_value) if state_row and state_row.last_check_value else None
@@ -107,7 +116,7 @@ async def run():
        # Wrap the synchronous IMAP call for async use
        try:
            emails = await asyncio.to_thread(
-                _fetch_emails_sync, host, port, user, password, last_uid,
+                _fetch_emails_sync, host, port, user, password, last_uid, folder,
            )
        except Exception as e:
            logger.error(f"IMAP 연결 실패: {e}")
@@ -174,15 +183,10 @@ async def run():
                session.add(doc)
                await session.flush()

-                safe_subj = subject.replace("\n", " ").replace("\r", " ")[:200]
+                # Feed into search/indexing (extract → classify → embed/chunk). No to-do integration.
+                await enqueue_stage(session, doc.id, "extract")

-                # TODO: remove this condition once extract_worker supports parsing eml bodies/attachments
-                if doc.file_format != "eml":
-                    await enqueue_stage(session, doc.id, "extract")
-                else:
-                    logger.debug(f"[메일] {safe_subj} — eml extract 미지원, 큐 스킵")
-
-                archived.append(safe_subj)
+                archived.append(subject.replace("\n", " ").replace("\r", " ")[:200])
                max_uid = max(max_uid, uid)

            except Exception as e:
@@ -194,11 +198,18 @@ async def run():
            state_row.last_run_at = datetime.now(timezone.utc)
        else:
            session.add(AutomationState(
-                job_name="mailplus",
+                job_name=job_name,
                last_check_value=str(max_uid),
                last_run_at=datetime.now(timezone.utc),
            ))

        await session.commit()

+        # SMTP notification
+        smtp_host = os.getenv("MAILPLUS_HOST", "")
+        smtp_port = int(os.getenv("MAILPLUS_SMTP_PORT", "465"))
+        if archived and smtp_host:
+            body = f"이메일 {len(archived)}건 수집 완료:\n\n" + "\n".join(f"- {s}" for s in archived)
+            send_smtp_email(smtp_host, smtp_port, user, password, "PKM 이메일 수집 알림", body)
+
        logger.info(f"이메일 {len(archived)}건 수집 완료 (max_uid={max_uid})")
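Two small changes in this file are worth isolating: the UID list is now parsed through `(data[0] or b"").split()`, presumably to tolerate a search response whose first element is `None` or empty, and the sync state is keyed by folder because IMAP UIDs are only meaningful within one mailbox. The sketch below assumes both readings. `parse_uids` and `job_name_for` are hypothetical helper names; the worker inlines this logic.

```python
def parse_uids(data: list) -> list[int]:
    """Parse the data part of an imaplib uid('search', ...) result into integers.

    `data[0]` is expected to be bytes such as b"3 7 12"; None or b"" yields [].
    """
    return [int(uid) for uid in (data[0] or b"").split()]


def job_name_for(folder: str) -> str:
    """State key per folder, so one folder's last UID is never reused for another."""
    return f"mailplus:{folder}"
```

A consequence of the per-folder key is that state stored under the old `mailplus` job name is not picked up: the first run against a new folder has no `last_uid` and falls back to the seven-day `SINCE` search.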
@@ -17,7 +17,6 @@ md_content ref 형식: `![alt](docimg:img_001)` — image_key 가 sequence 기

 plan: ~/.claude/plans/piped-humming-crystal.md
 """
-import asyncio
 import base64
 import hashlib
 import json
@@ -39,11 +38,7 @@ from models.queue import ProcessingQueue

 logger = logging.getLogger(__name__)

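-# Markdown extraction endpoint, read from the compose env `MARKER_ENDPOINT` (base URL).
-# Default = marker (unchanged); cutover = switch to `http://mineru-service:3301` by flipping the env alone.
-# marker and mineru share the same /convert contract (file_path, start/end, md + base64 images), so the worker is unchanged.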
-_MARKDOWN_BASE = os.getenv("MARKER_ENDPOINT", "http://marker-service:3300").rstrip("/")
-MARKER_ENDPOINT = _MARKDOWN_BASE if _MARKDOWN_BASE.endswith("/convert") else _MARKDOWN_BASE + "/convert"
+MARKER_ENDPOINT = "http://marker-service:3300/convert"
 MARKER_TIMEOUT = 300                    # 5-minute limit for large PDFs
 MAX_PAGES = 200                         # safety guard: /convert max_pages for the small one-shot path

@@ -73,13 +68,9 @@ _FORMAT_TO_MIME = {
    "gif": "image/gif",
 }

-# Phase 1B = PDF only (marker-service). office/hwp branch to the office_md hybrid in C-2.
+# Phase 1B = PDF only. DOCX and others come in a later phase.
 SUPPORTED_EXTENSIONS = {".pdf"}

-# C-2: office/hwp → md (OOXML=markitdown / hwp=LibreOffice). The set of suffixes the converter supports.
-# Legacy binaries (.doc/.xls/.ppt) are not supported by markitdown → absent here (= skipped at the PDF-only gate).
-from workers.office_md import SUPPORTED as OFFICE_MD_SUPPORTED  # noqa: E402
-
 # Uses the Korean labels from config.yaml document_types directly (pre-flight result).
 # Round 0 user intent = table-centric ordering/calculation/specification domains.
 SKIP_DOC_TYPES = {
@@ -185,22 +176,10 @@ async def process(document_id: int, session: AsyncSession) -> None:
        await _fail(session, document_id, "no file_path")
        return

-    # Candidate A: children (bundle cols) have a synthetic file_path (`{parent}#p{s}-{e}`) → the real file is
-    # recovered as the parent path via bundle_source_path. Regular docs stay as-is (no suffix). marker/mineru convert the real file + page range.
-    from workers.presegment_worker import bundle_source_path
-    container_path = _to_marker_path(bundle_source_path(doc.file_path))
+    container_path = _to_marker_path(doc.file_path)
+
+    # ---- (3) PDF only ----
    suffix = Path(container_path).suffix.lower()
-
-    # ---- (3) office/hwp → md (C-2): supported non-PDF formats use the office_md hybrid conversion ----
-    if suffix in OFFICE_MD_SUPPORTED:
-        await session.execute(
-            update(Document).where(Document.id == document_id).values(md_status="processing")
-        )
-        await session.commit()
-        await _process_office(doc, document_id, container_path, session)
-        return
-
-    # ---- (3.5) PDF only (any other extension = skip) ----
    if suffix not in SUPPORTED_EXTENSIONS:
        logger.info(f"markdown_skip_unsupported_extension id={document_id} ext={suffix}")
        await _set_skipped(
@@ -210,21 +189,7 @@ async def process(document_id: int, session: AsyncSession) -> None:
        return

    # ---- (4) page_count gauge + branching (LargeDoc split) ----
-    # G2 (PR-G2-2): a bundle child doc shares the parent file and must convert only its own page range
-    # ([bundle_page_start, end], 1-based inclusive). page_offset = absolute start page (relative to the parent file),
-    # page_count = number of pages in the child range. If the cols are NULL (regular doc), page_offset=1 + full page_count = same as before.
-    file_page_count = _get_page_count(container_path)
-    is_child = doc.bundle_page_start is not None and doc.bundle_page_end is not None
-    if is_child:
-        page_offset = doc.bundle_page_start
-        if file_page_count is not None:
-            child_end = min(doc.bundle_page_end, file_page_count)
-            page_count = max(0, child_end - doc.bundle_page_start + 1)
-        else:
-            page_count = doc.bundle_page_end - doc.bundle_page_start + 1
-    else:
-        page_offset = 1
-        page_count = file_page_count
+    page_count = _get_page_count(container_path)

    # >MAX_SPLIT_PAGES = safe conversion state (manual_review), not a silent skip.
    if page_count is not None and page_count > MAX_SPLIT_PAGES:
@@ -243,35 +208,20 @@ async def process(document_id: int, session: AsyncSession) -> None:

    # ---- (6) conversion branch: small 1-shot / large (>SPLIT_THRESHOLD) page-range split ----
    if page_count is not None and page_count > SPLIT_THRESHOLD_PAGES:
-        await _process_split(doc, document_id, container_path, page_count, session, page_offset)
+        await _process_split(doc, document_id, container_path, page_count, session)
    else:
-        await _process_single(doc, document_id, container_path, session, page_count, page_offset)
+        await _process_single(doc, document_id, container_path, session)


 async def _process_single(
-    doc: Document, document_id: int, container_path: str, session: AsyncSession,
-    page_count: int | None = None, page_offset: int = 1,
+    doc: Document, document_id: int, container_path: str, session: AsyncSession
 ) -> None:
-    """1-shot conversion of a whole small PDF (≤ SPLIT_THRESHOLD_PAGES) (existing Phase 1B/1B.5 path).
-
-    G2 (PR-G2-2): for a bundle child (page_offset>1), start_page/end_page are passed to marker explicitly
-    so that only the range [page_offset, page_offset+page_count-1] is converted. A regular doc (page_offset=1)
-    sends only max_pages, as before (payload byte-identical).
-    """
-    # Regular doc = keep the existing payload. Only a child specifies an absolute page range (1-based inclusive, relative to the parent file).
-    if page_offset > 1 and page_count is not None:
-        req_json = {
-            "file_path": container_path,
-            "start_page": page_offset,
-            "end_page": page_offset + page_count - 1,
-        }
-    else:
-        req_json = {"file_path": container_path, "max_pages": MAX_PAGES}
+    """1-shot conversion of a whole small PDF (≤ SPLIT_THRESHOLD_PAGES) (existing Phase 1B/1B.5 path)."""
    try:
        async with httpx.AsyncClient(timeout=MARKER_TIMEOUT) as client:
            resp = await client.post(
                MARKER_ENDPOINT,
-                json=req_json,
+                json={"file_path": container_path, "max_pages": MAX_PAGES},
            )
            resp.raise_for_status()
            data = resp.json()
@@ -307,10 +257,6 @@ async def _process_single(

    # ---- (7) image persist + md_content rewrite (Phase 1B.5) ----
    md_content_raw = data["md_content"]
-    # 2026-06-20 H1: do not record an empty extraction (scanned/image PDF) as md_status=success + empty md
-    # (contract: md_status in {success,partial} => md non-blank). Raises like the office arm → failed after queue retries.
-    if not md_content_raw.strip():
-        raise ValueError("empty md_content (blank extraction) — success 박제 차단")
    images_resp = data.get("images") if MARKDOWN_IMAGE_PERSIST else None

    saved_images: list[dict[str, Any]] = []
@@ -422,134 +368,12 @@ async def _process_markdown_passthrough(
    )


-async def _process_office(
-    doc: Document, document_id: int, container_path: str, session: AsyncSession
-) -> None:
-    """office/hwp → md (C-2). The office arm of the C-5 state-machine postcondition.
-
-    office_md.convert_office_to_md has a binary contract: success = returns non-blank md / failure, empty output,
-    timeout, or missing dependency = raises OfficeMdError. Therefore:
-      - success → md_status='success' (+ non-blank md). Keeps the invariant md_status ∈ {success,partial} ⟹ md non-blank.
-      - failure/exception → _fail (md_status='failed', ¬success·¬skipped). Never a silent 'success + empty md'.
-    The partial arm is PDF-split only; office is binary, so it does not exist here. 'completed' is reserved for A-3 serialization (unused by workers).
-    quality is content-type-aware: office=scored (_compute_quality). The synchronous conversion runs via to_thread so the event loop is not blocked.
-    """
-    from workers.office_md import (
-        OfficeMdError,
-        convert_hwp_to_md_and_images,
-        convert_office_to_md,
-    )
-
-    suffix = Path(container_path).suffix.lower()
-    if suffix == ".hwp":
-        engine = "pyhwp"  # HWP5 binary: libhwplo cannot read it, so replaced with pyhwp (2026-06-09)
-    elif suffix == ".hwpx":
-        engine = "libreoffice_hwp"  # HWPX is not supported by pyhwp → LibreOffice fallback
-    else:
-        engine = "markitdown"
-
-    hwp_images: list[dict[str, Any]] = []
-    try:
-        # Synchronous subprocess/markitdown: moved to a thread so the event loop is not blocked.
-        if suffix == ".hwp":
-            md_content, hwp_images = await asyncio.to_thread(
-                convert_hwp_to_md_and_images, container_path
-            )
-        else:
-            md_content = await asyncio.to_thread(convert_office_to_md, container_path)
-    except OfficeMdError as exc:
-        logger.warning(f"[marker] office md 변환 실패 id={document_id} engine={engine}: {exc}")
-        await _fail(session, document_id, f"office_md: {str(exc)[:990]}", engine=engine)
-        return
-    except Exception as exc:  # unexpected exceptions also become failed (success + empty md is never allowed)
-        logger.exception(f"[marker] office md unexpected error id={document_id}: {exc}")
-        await _fail(session, document_id, f"office_md_unexpected: {str(exc)[:980]}", engine=engine)
-        return
-
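-    # ---- image NAS persist (.hwp only) ----
-    # hwp5html extracts bindata rasters, but the body xhtml has no <img> anchors (orphans; same with --css/--html),
-    # so inline positions cannot be restored → persist to NAS with marker (PDF)'s _persist_images_to_nas, then attach
-    # as a gallery at the end of the md (docimg: ref = resolved by the viewer). OLE equations/shapes are neither anchors nor rasters, so they are excluded.
-    # docx/xlsx/pptx/hwpx images are not processed (existing behavior kept).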
-    saved_images: list[dict[str, Any]] = []
-    orphan_paths: list[str] = []
-    if suffix == ".hwp" and MARKDOWN_IMAGE_PERSIST:
-        if hwp_images:
-            images_resp = [
-                {
-                    "bytes_b64": base64.b64encode(im["data"]).decode("ascii"),
-                    "format": im.get("format") or "png",
-                    "slug": "",
-                    "width": None,
-                    "height": None,
-                }
-                for im in hwp_images
-            ]
-            try:
-                saved_images = _persist_images_to_nas(document_id, images_resp)
-            except OSError as exc:
-                # e.g. a temporary NAS disconnect: transient. Recovered by queue retry.
-                logger.warning(
-                    f"[marker] hwp image persist NAS write failed id={document_id}: "
-                    f"{type(exc).__name__}: {exc}"
-                )
-                raise
-            if saved_images:
-                gallery = "\n\n## 첨부 이미지\n\n" + "\n\n".join(
-                    f"![](docimg:{img['image_key']})" for img in saved_images
-                )
-                md_content = md_content + gallery
-        # On reconversion, clean up past document_images rows/NAS files against the current saved_images.
-        orphan_paths = await _sync_document_images(
-            session, document_id, saved_images, {"engine": engine}
-        )
-
-    # Success: by contract md_content is non-blank (empty output raises). quality scored.
-    quality = _compute_quality(md_content, doc.extracted_text or "", {"page_count": None})
-    if saved_images:
-        quality.setdefault("warnings", []).append(f"hwp_images_appended:{len(saved_images)}")
-
-    await session.execute(
-        update(Document).where(Document.id == document_id).values(
-            md_content=md_content,
-            md_status="success",
-            md_extraction_engine=engine,
-            md_extraction_engine_version=None,
-            md_extraction_quality=quality,
-            md_content_hash=hashlib.sha256(md_content.encode("utf-8")).hexdigest(),
-            md_source_hash=doc.file_hash,
-            md_generated_at=_now(),
-            md_extraction_error=None,
-            md_frontmatter=doc.md_frontmatter or {},
-            md_format_version="1.0",
-            content_origin="extracted",
-        )
-    )
-    await session.commit()
-
-    # Unlink orphan NAS files after commit (best-effort; DB consistency holds even on failure).
-    for orphan_path in orphan_paths:
-        try:
-            Path(orphan_path).unlink(missing_ok=True)
-        except Exception as exc:
-            logger.warning(
-                f"[marker] orphan image unlink failed id={document_id} path={orphan_path}: "
-                f"{type(exc).__name__}: {exc}"
-            )
-
-    logger.info(
-        f"[marker] office success id={document_id} engine={engine} "
-        f"len={len(md_content)} images={len(saved_images)}"
-    )
-
-
 async def _process_split(
    doc: Document,
    document_id: int,
    container_path: str,
    page_count: int,
    session: AsyncSession,
-    page_offset: int = 1,
 ) -> None:
    """Page-range split conversion for large PDFs.

@@ -560,10 +384,6 @@ async def _process_split(

    invariant: page numbering = 1-based inclusive (batch1: 1..BATCH_PAGES, ...).
    marker slug (`_page_0_*`) restarts for each batch → rewrite per batch, then stitch (avoids collisions).
-
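-    G2 (PR-G2-2): page_offset = absolute start page relative to the parent file (bundle child). Pages sent to marker
-    are absolute (page_offset added); manifest/records keep child-relative values (1-based). For a regular doc
-    (page_offset=1), abs==rel, so behavior is the same as before.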
    """
    n_batches = (page_count + BATCH_PAGES - 1) // BATCH_PAGES
    succeeded: list[dict[str, Any]] = []   # {start_page, end_page, md}
@@ -575,17 +395,15 @@ async def _process_split(

    async with httpx.AsyncClient(timeout=MARKER_TIMEOUT) as client:
        for b in range(n_batches):
-            start_page = b * BATCH_PAGES + 1          # child-relative, 1-based (for manifest/records)
+            start_page = b * BATCH_PAGES + 1
            end_page = min((b + 1) * BATCH_PAGES, page_count)
-            abs_start = start_page + (page_offset - 1)  # absolute page in the parent file (for the marker request)
-            abs_end = end_page + (page_offset - 1)
            try:
                resp = await client.post(
                    MARKER_ENDPOINT,
                    json={
                        "file_path": container_path,
-                        "start_page": abs_start,
-                        "end_page": abs_end,
+                        "start_page": start_page,
+                        "end_page": end_page,
                    },
                )
                resp.raise_for_status()
@@ -657,8 +475,6 @@ async def _process_split(

    md_status = "success" if not failed else "partial"
    stitched = "\n\n".join(b["md"] for b in succeeded)
-    if not stitched.strip():
-        raise ValueError("empty stitched md_content (all batches blank) — success 박제 차단")
    md_content = _build_large_md_content(stitched[:LARGE_DOC_MD_CONTENT_HEAD_CHARS], manifest)

    quality = _compute_quality(stitched, doc.extracted_text or "", {"page_count": page_count})
@@ -963,17 +779,15 @@ async def _set_skipped(session: AsyncSession, document_id: int, reason: str) ->
    await session.commit()


-async def _fail(
-    session: AsyncSession, document_id: int, error: str, *, engine: str = "marker"
-) -> None:
-    """doc-level failed (retrying is pointless). engine = the conversion engine that failed (office=markitdown/libreoffice_hwp)."""
+async def _fail(session: AsyncSession, document_id: int, error: str) -> None:
+    """doc-level failed (retrying is pointless)."""
    await session.execute(
        update(Document).where(Document.id == document_id).values(
            md_status="failed",
            md_content=None,
            md_content_hash=None,
            md_extraction_error=error,
-            md_extraction_engine=engine,
+            md_extraction_engine="marker",
            md_generated_at=_now(),
            content_origin="extracted",
        )
@@ -1,110 +0,0 @@
-"""On memo → document promotion, tidy a rough memo into a structured markdown document (26B, P2).
-
-When `POST /memos/{id}/promote-to-document` sets the `source_metadata.needs_draft=true` marker,
-this scheduled worker picks it up and generates md_content via AIClient.call_primary (26B Mac mini = local, complies with the billing rule).
-Reuses the markdown canonical Phase 1A schema:
-  - content_origin='ai_drafted'  +  md_draft_status='draft'
-    (migration 212 constraint: md_draft_status NOT NULL → content_origin='ai_drafted' required)
-  - md_status='success', md_extraction_engine='ai_draft'
-The original memo is preserved in extracted_text (search/chunks use the original). "As needed" = for an already tidy memo
-the prompt only polishes the format; for a rough memo it instructs structuring (no added facts).
-"""
-
-import logging
-from datetime import datetime, timezone
-
-from sqlalchemy import select
-
-from ai.client import AIClient, strip_thinking
-from core.database import async_session
-from models.document import Document
-from services.search.llm_gate import Priority, acquire_mlx_gate
-
-logger = logging.getLogger(__name__)
-
-# Number of promoted documents processed at once (a 26B call is heavy → few, sequential). As an interval job, the next tick continues.
-_BATCH = 2
-# A memo that is too short is not worth documenting; only clear the marker and skip md generation.
-_MIN_CHARS = 20
-
-_DRAFT_SYSTEM = (
-    "당신은 사용자의 거친 메모를 사실 추가 없이 깔끔한 마크다운 문서로 정리하는 도우미입니다."
-)
-_DRAFT_PROMPT = """다음은 사용자가 빠르게 적은 메모입니다. 이를 정식 자료 문서로 정리하세요.
-
-규칙:
- 메모에 있는 정보만 사용하고, 내용·사실을 추가하거나 추측하지 마세요.
- 이미 잘 정돈돼 있으면 형식만 다듬고, 거친 메모면 제목·소제목·목록으로 구조화하세요.
- 원문 언어를 유지하세요(한국어는 한국어, 영어는 영어).
- 출력은 마크다운 본문만. 인사말·메타 설명 없이 문서 내용만 출력하세요.
-
--- 메모 ---
-{content}
--- 끝 ---"""
-
-
-async def _ids_needing_draft() -> list[int]:
-    async with async_session() as session:
-        rows = (
-            await session.execute(
-                select(Document.id)
-                .where(
-                    Document.deleted_at.is_(None),
-                    # JSONB marker (the ->> operator is common to json/jsonb). promote sets needs_draft=true.
-                    Document.source_metadata.op("->>")("needs_draft") == "true",
-                )
-                .order_by(Document.id)
-                .limit(_BATCH)
-            )
-        ).scalars().all()
-    return list(rows)
-
-
-async def run() -> None:
-    """Turn promoted documents carrying the needs_draft marker into documents with 26B (interval job, no-arg)."""
-    ids = await _ids_needing_draft()
-    if not ids:
-        return
-
-    client = AIClient()
-    for doc_id in ids:
-        # Independent session/transaction per document, so one failure does not block the rest.
-        async with async_session() as session:
-            try:
-                doc = await session.get(Document, doc_id)
-                if doc is None or not (doc.source_metadata or {}).get("needs_draft"):
-                    continue  # race / already processed
-
-                source = (doc.extracted_text or "").strip()
-                now = datetime.now(timezone.utc)
-                meta = dict(doc.source_metadata or {})
-
-                md = ""
-                if len(source) >= _MIN_CHARS:
-                    # The 26B call must run inside the mlx gate (Semaphore 1) to prevent pile-up of concurrent calls
-                    # ([[feedback_llm_verification_load_pileup]]). BACKGROUND = yields to user-facing work.
-                    async with acquire_mlx_gate(Priority.BACKGROUND):
-                        raw = await client.call_primary(
-                            _DRAFT_PROMPT.format(content=source), system=_DRAFT_SYSTEM
-                        )
-                    md = strip_thinking(raw or "").strip()
-
-                if md:
-                    doc.md_content = md
-                    # Constraint (212): if md_draft_status is NOT NULL, content_origin must be 'ai_drafted'.
-                    doc.content_origin = "ai_drafted"
-                    doc.md_draft_status = "draft"
-                    doc.md_status = "success"
-                    doc.md_extraction_engine = "ai_draft"
-                    doc.md_generated_at = now
-                    meta["drafted_at"] = now.isoformat()
-
-                # Clear the marker on both success and skip (prevents infinite retries). If the 26B call itself raises, the except branch keeps the marker.
-                meta["needs_draft"] = False
-                doc.source_metadata = meta
-                doc.updated_at = now
-                await session.commit()
-                logger.info("memo_draft doc=%s md_len=%d", doc_id, len(md))
-            except Exception:
-                logger.exception("memo_draft 실패 doc=%s (다음 틱 재시도)", doc_id)
-                await session.rollback()
@@ -1,30 +1,20 @@
-"""News collection worker: collects articles from RSS/API and stores them in documents
+"""News collection worker: collects articles from RSS/API and stores them in documents"""

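-plan crawl-24x7-1 group A (2026-06-10):
-  A-1 conditional GET (ETag/Last-Modified resent as-is) + content-hash change detection
-  A-2 sources with fulltext_policy='page' delegate body promotion to the 'fulltext' stage
-  A-5 source_health recording + circuit breaker (per-source failure isolation)
-  A-6 first-wins + second-pass dedup of portal reprints (title + last 3 days, only titles of 12+ characters)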
-"""
-
-import asyncio
 import hashlib
 import re
-from datetime import datetime, timedelta, timezone
+from datetime import datetime, timezone
 from html import unescape
-from urllib.parse import parse_qsl, urlencode, urlparse, urlunparse
+from urllib.parse import urlparse, urlunparse

 import feedparser
 import httpx
 from sqlalchemy import select

-from core.crawl_politeness import CRAWL_UA
 from core.database import async_session
 from core.utils import setup_logger
 from models.document import Document
 from models.news_source import NewsSource
 from models.queue import enqueue_stage
-from models.source_health import SourceHealth

 logger = setup_logger("news_collector")

@@ -36,7 +26,6 @@ CATEGORY_MAP = {
    "환경": "Environment", "기술": "Technology",
    # English
    "World": "International", "International": "International",
-    "World news": "International",  # Guardian sectionName (B-2)
    "Technology": "Technology", "Tech": "Technology", "Sci-Tech": "Technology",
    "Arts": "Culture", "Culture": "Culture",
    "Climate": "Environment", "Environment": "Environment",
@@ -46,54 +35,27 @@ CATEGORY_MAP = {
    "Kultur": "Culture", "Wissenschaft": "Technology",
    # French
    "Environnement": "Environment",
-    # Domain channel (source_channel='crawl', 0-5 (a)): map shared by both sides
-    "안전": "Safety", "Safety": "Safety",
-    "공학": "Engineering", "Engineering": "Engineering",
-    "철학": "Philosophy", "Philosophy": "Philosophy",
 }


-class FeedError(Exception):
-    """Source-level fetch/parse failure; run() records it as a source_health failure."""
-
-
 def _normalize_category(raw: str) -> str:
    """Normalize the category."""
    return CATEGORY_MAP.get(raw, CATEGORY_MAP.get(raw.strip(), "Other"))


-def _clean_html(text: str, max_len: int | None = 1000) -> str:
-    """Strip HTML tags and clean up. max_len=None means no truncation (for feed-full full text)."""
+def _clean_html(text: str) -> str:
+    """Strip HTML tags and clean up."""
    if not text:
        return ""
    text = re.sub(r"<[^>]+>", "", text)
    text = unescape(text)
-    text = text.strip()
-    return text if max_len is None else text[:max_len]
-
-
-# Tracking-parameter detection: prefixes (utm_/at_=BBC/ns_=BBC/mc_=mailchimp) + standalone keys
-_TRACKING_PREFIXES = ("utm_", "at_", "ns_", "mc_")
-_TRACKING_PARAMS = {"fbclid", "gclid", "igshid", "ref", "smid", "partner", "cmp", "ocid", "ftag"}
+    return text.strip()[:1000]


 def _normalize_url(url: str) -> str:
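-    """URL normalization: removes only tracking parameters and preserves content-identifying parameters.
-
-    Do not strip the whole query: on query-identified sites such as hada.io/topic?id= · aitimes articleView.html?idxno= ·
-    HN item?id=, distinct articles collapse into the same URL.
-    Both storage (edit_url) and lookup must share this function for dedup to hold.
-
-    ★R11c: intentionally different from file_watcher._canonicalize_url (web_clip channel). Here the point is to keep
-    the identifying query (so distinct articles do not collapse), so there is no query sort, trailing-slash handling, or lowercasing.
-    Do not merge the two functions (news dedup would break). Per-channel normalization is by design.
-    """
+    """URL normalization (drops the whole query string, tracking params included)."""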
    parsed = urlparse(url)
-    kept = [
-        (k, v) for k, v in parse_qsl(parsed.query, keep_blank_values=True)
-        if not (k.lower().startswith(_TRACKING_PREFIXES) or k.lower() in _TRACKING_PARAMS)
-    ]
-    return urlunparse((parsed.scheme, parsed.netloc, parsed.path, "", urlencode(kept), ""))
+    return urlunparse((parsed.scheme, parsed.netloc, parsed.path, "", "", ""))


 def _article_hash(title: str, published: str, source_name: str) -> str:
@@ -111,149 +73,34 @@ def _normalize_to_utc(dt) -> datetime:
    return datetime.now(timezone.utc)


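-# ── A-5: circuit breaker policy ──
-# consecutive failures >= OPEN threshold → open (retry interval grows exponentially, 6h × 2^n, cap 48h)
-# consecutive failures >  DISABLE threshold → disabled (excluded from collection + visible log, needs manual recovery)
-# news_sources.enabled is left untouched: user intent (enabled) and automatic state (circuit) stay separate.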
-_CIRCUIT_OPEN_AFTER = 3
-_CIRCUIT_DISABLE_AFTER = 10
-_BACKOFF_BASE_HOURS = 6
-_BACKOFF_CAP_HOURS = 48
-_EMPTY_STREAK_ALERT = 8  # 6h cycle × 8 = about 2 days of consecutive empty feeds → visible warning
-
-
-def _should_attempt(health: SourceHealth, now: datetime) -> bool:
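-    """Decide whether to fetch this cycle based on the circuit state.
-
-    Note (B-3 contract ②, r5): future consumption of the relogin_requested flag must be placed *before* this
-    open-skip branch. If open turns into schedule exclusion, the batch boundary
-    never arrives and the flag stays unconsumed forever (a half-open dead button).
-    """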
-    if health.circuit_state == "disabled":
-        return False
-    if health.circuit_state == "open" and health.last_error_at is not None:
-        over = max(health.consecutive_failures - _CIRCUIT_OPEN_AFTER, 0)
-        backoff_h = min(_BACKOFF_BASE_HOURS * (2 ** over), _BACKOFF_CAP_HOURS)
-        if now - health.last_error_at < timedelta(hours=backoff_h):
-            return False
-    return True
-
-
-def _record_success(health: SourceHealth, items: int, not_modified: bool, now: datetime) -> None:
-    health.consecutive_failures = 0
-    health.total_fetches += 1
-    health.last_success_at = now
-    health.last_fetch_items = items
-    if health.circuit_state != "closed":
-        logger.info(f"[health] source={health.source_id} circuit {health.circuit_state}→closed")
-        health.circuit_state = "closed"
-        health.circuit_opened_at = None
-    # Empty-feed streak: 304/same hash is a normal signal and not counted; only 200 with 0 entries is counted (feed-rot monitoring)
-    if not_modified:
-        pass
-    elif items == 0:
-        health.empty_streak += 1
-        if health.empty_streak >= _EMPTY_STREAK_ALERT:
-            logger.error(
-                f"[health] source={health.source_id} 빈 피드 {health.empty_streak}회 연속 "
-                f"— 피드 부패 의심 (RSSHub 류 라우트 깨짐 패턴)"
-            )
-    else:
-        health.empty_streak = 0
-    health.updated_at = now
-
-
-def _record_failure(health: SourceHealth, error: str, now: datetime) -> None:
-    health.consecutive_failures += 1
-    health.total_fetches += 1
-    health.total_failures += 1
-    health.last_error = error[:500]
-    health.last_error_at = now
-    health.updated_at = now
-    cf = health.consecutive_failures
-    if cf > _CIRCUIT_DISABLE_AFTER and health.circuit_state != "disabled":
-        health.circuit_state = "disabled"
-        logger.error(
-            f"[health] source={health.source_id} 연속 실패 {cf}회 — circuit DISABLED "
-            f"(수집 제외, A-8 패널에서 수동 복구 필요)"
-        )
-    elif cf >= _CIRCUIT_OPEN_AFTER and health.circuit_state == "closed":
-        health.circuit_state = "open"
-        health.circuit_opened_at = now
-        logger.warning(f"[health] source={health.source_id} 연속 실패 {cf}회 — circuit open")
-
-
-async def _get_or_create_health(session, source_id: int) -> SourceHealth:
-    result = await session.execute(
-        select(SourceHealth).where(SourceHealth.source_id == source_id)
-    )
-    health = result.scalars().first()
-    if health is None:
-        health = SourceHealth(source_id=source_id)
-        session.add(health)
-        await session.flush()
-    return health
-
-
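-# Blocks concurrent runs of the manual POST /api/news/collect and the 6h scheduled cycle (single process, single
-# event loop). On concurrent entry, _get_or_create_health INSERTs the same source_id from both sides
-# → closes off at the source the race where an IntegrityError on uq_source_health_source_id kills the whole cycle.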
-_run_lock = asyncio.Lock()
-
-
 async def run():
    """Run news collection."""
-    async with _run_lock:
-        await _run_locked()
-
-
-async def _run_locked():
-    now = datetime.now(timezone.utc)
    async with async_session() as session:
        result = await session.execute(
            select(NewsSource).where(NewsSource.enabled == True)
        )
-        source_ids = [s.id for s in result.scalars().all()]
+        sources = result.scalars().all()

-    if not source_ids:
-        logger.info("활성화된 뉴스 소스 없음")
-        return
+        if not sources:
+            logger.info("활성화된 뉴스 소스 없음")
+            return

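-    # 2026-06-20 H3: independent session per source. Prevents one source's DB error from breaking the single final commit
-    # and losing every source's inserts. On failure, roll back and record the failure from a clean state.
-    # (Same shape as csb_collector's per-iteration session pattern.)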
-    total = 0
-    for sid in source_ids:
-        async with async_session() as session:
-            source = await session.get(NewsSource, sid)
-            if source is None:
-                continue
-            sname = source.name
-            health = await _get_or_create_health(session, sid)
-            if not _should_attempt(health, now):
-                logger.info(f"[{sname}] circuit {health.circuit_state} — 이번 사이클 skip")
-                continue
+        total = 0
+        for source in sources:
            try:
                if source.feed_type == "api":
-                    count, status = await _fetch_api(session, source)
+                    count = await _fetch_api(session, source)
                else:
-                    count, status = await _fetch_rss(session, source)
+                    count = await _fetch_rss(session, source)

                source.last_fetched_at = datetime.now(timezone.utc)
-                _record_success(health, count, status == "not_modified", now)
                total += count
-                await session.commit()
            except Exception as e:
-                # Guards against exceptions with an empty str (httpx.ConnectError('')); same rule as the health record
-                await session.rollback()
-                logger.error(f"[{sname}] 수집 실패: {str(e) or repr(e)}")
-                health = await _get_or_create_health(session, sid)
-                src = await session.get(NewsSource, sid)
-                if src is not None:
-                    src.last_fetched_at = datetime.now(timezone.utc)
-                _record_failure(health, str(e) or repr(e), now)
-                await session.commit()
-    logger.info(f"뉴스 수집 완료: {total}건 신규")
+                logger.error(f"[{source.name}] 수집 실패: {e}")
+                source.last_fetched_at = datetime.now(timezone.utc)
+
+        await session.commit()
+        logger.info(f"뉴스 수집 완료: {total}건 신규")


 MAX_RESPONSE_SIZE = 5 * 1024 * 1024  # 5MB
@@ -261,213 +108,8 @@ ALLOWED_CONTENT_TYPES = ("application/rss+xml", "application/atom+xml",
                         "application/xml", "text/xml")


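-# Connection retry intervals. Additional MOEL measurement (2026-06-11): drops are random per connection, so
-# a retry after 1.5s can also be dropped consecutively (another connection right after succeeds immediately) → reinforced to 2 retries.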
-_CONNECT_RETRY_DELAYS = (2.0, 5.0)
-
-
-async def _get_with_connect_retry(client, url: str):
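-    """Retry only connection-layer (TCP/TLS) errors (up to 2 times); HTTP status errors are out of scope (caller branching preserved).
-
-    MOEL measurement (2026-06-11): a government site's security appliance intermittently drops TLS handshakes per connection
-    (curl rc=35; an immediate retry succeeds) → feed collection, which fetches once per cycle, accumulates ConnectError('')
-    failures and opens the circuit. Persistent outages remain the circuit's job.
-    """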
-    for delay in _CONNECT_RETRY_DELAYS:
-        try:
-            return await client.get(url)
-        except (httpx.ConnectError, httpx.ConnectTimeout) as e:
-            logger.info(f"연결 오류 {delay}s 후 재시도 ({url.split('?')[0]}): {repr(e)}")
-            await asyncio.sleep(delay)
-    return await client.get(url)
-
-
-async def _is_portal_duplicate(session, title: str) -> bool:
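-    """A-6 second-pass dedup: the case where a portal reprint and the original are stored twice under different URLs.
-
-    Secondary key = title + last 3 days (different source/different URL, so the primary key misses it).
-    To avoid false positives on generic titles, titles shorter than 12 characters are exempt. Every skip is logged (avoids silent omission).
-    """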
-    if len(title) < 12:
-        return False
-    cutoff = datetime.now(timezone.utc) - timedelta(days=3)
-    dup = await session.execute(
-        select(Document.id).where(
-            Document.title == title,
-            Document.source_channel == "news",
-            Document.file_format == "article",
-            Document.extracted_at >= cutoff,
-        ).limit(1)
-    )
-    return dup.scalars().first() is not None
-
-
-async def _enqueue_processing(session, doc: Document, source: NewsSource, pub_dt: datetime) -> None:
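-    """Enqueue follow-up stages.
-
-    Sources with fulltext_policy='page' get only the 'fulltext' stage; summarize/embed/chunk are
-    enqueued by fulltext_worker once promotion (or demotion) is settled (avoids the ordering trap where an RSS summary
-    is summarized first and, when the full text arrives, summarize_worker's 'summary already exists, skip' blocks it).
-    """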
-    if source.fetch_method == "signal-only":
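-        # B-4: signal = search indexing only (embed/chunk). Never enqueue fulltext/summarize;
-        # even if the registry is misconfigured with fulltext_policy='page', page fetches stay at 0 (defense first).
-        # Skipping the summary LLM = zero load on the Mac mini. Digests/briefings exclude documents with
-        # ai_summary IS NULL from the start (services/digest/loader.py), so signal documents are naturally excluded.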
-        if source.source_channel == "crawl" or (datetime.now(timezone.utc) - pub_dt).days <= 30:
-            await enqueue_stage(session, doc.id, "embed")
-            await enqueue_stage(session, doc.id, "chunk")
-        return
-    if source.fulltext_policy == "page" and doc.edit_url:
-        await enqueue_stage(session, doc.id, "fulltext")
-        return
-    await enqueue_stage(session, doc.id, "summarize")
-    if source.source_channel == "crawl":
-        # Domain material corpus: index everything regardless of publication date (the 30-day gate is news-only)
-        await enqueue_stage(session, doc.id, "embed")
-        await enqueue_stage(session, doc.id, "chunk")
-        return
-    days_old = (datetime.now(timezone.utc) - pub_dt).days
-    if days_old <= 30:
-        await enqueue_stage(session, doc.id, "embed")
-        await enqueue_stage(session, doc.id, "chunk")
-
-
-def _entry_body(source: NewsSource, entry, summary: str) -> tuple[str, str]:
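-    """(body, extractor_version): per-policy body selection, pure function (covered by shape tests).
-
-    signal-only: the feed summary is the body, with no truncation (preserves arXiv abstracts of 1.3~1.6K characters;
-      a 1000-character cap would lose the abstract's tail). No page fetch under any circumstances (B-4).
-    feed-full: trusted only for sources whose feed body is the full text (truncation and ad insertion are common, so
-      never store a regular source's summary/content:encoded as if it were full text; A-6).
-    """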
-    if source.fetch_method == "signal-only":
-        body = _clean_html(
-            entry.get("summary", "") or entry.get("description", ""), max_len=None
-        )
-        return (body or summary), "rss-signal"
-    if source.fulltext_policy == "feed-full":
-        content_list = entry.get("content") or []
-        raw_body = content_list[0].get("value", "") if content_list else ""
-        full_body = _clean_html(raw_body or entry.get("summary", ""), max_len=None)
-        if len(full_body) > len(summary):
-            return full_body, "rss-feed-full"
-    return summary, "rss"
-
-
-def _build_extract_meta(source: NewsSource, pub_dt: datetime) -> dict:
-    """Source metadata used by fulltext_worker / the panel (documents has no source FK, so it is recorded here)."""
-    meta = {
-        "source_id": source.id,
-        "source_name": source.name,
-        "published_at": pub_dt.isoformat(),
-    }
-    # Safety library A-2: deterministically inject the source registry's license (0-3 license metadata).
-    # Gate input that lets P3 digests/publications structurally exclude redistribute=false sources.
-    if source.license_scheme:
-        meta["license"] = {
-            "scheme": source.license_scheme,
-            "redistribute": bool(source.license_redistribute),
-            "attribution": source.name,
-        }
-    return meta
-
-
-def _material_axis(source: NewsSource) -> tuple[str | None, str | None]:
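-    """Safety-library classification axes (material_type, jurisdiction), deterministic from the registry.
-
-    - material_type = news_sources.material_type (NULL = not applicable: news, philosophy, etc.)
-    - jurisdiction = propagated from source.country, except that paper forces NULL
-      (jurisdiction does not fit international journals; plan 0-1 contract. Not propagated even if the registry has country=US).
-    """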
-    mt = source.material_type
-    if not mt:
-        return None, None
-    if mt == "paper":
-        return mt, None
-    return mt, source.country
-
-
-def _doc_identity(source: NewsSource, source_short: str, category: str) -> dict:
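-    """Per-channel document identity: the news channel keeps existing values (no regression); the crawl channel gets a domain identity.
-
-    The file_path prefix is the channel directory. ai_domain is the branching axis for digest/search filters, so
-    it is kept separate so the crawl channel does not pollute 'News' (0-5 channel-level separation principle).
-    """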
-    material_type, jurisdiction = _material_axis(source)
-    if source.source_channel == "crawl":
-        domain = category if category and category != "Other" else "Domain"
-        return {
-            "path_prefix": "crawl",
-            "ai_domain": domain,
-            "ai_tags": [f"{domain}/{source_short}"],
-            "material_type": material_type,
-            "jurisdiction": jurisdiction,
-        }
-    return {
-        "path_prefix": "news",
-        "ai_domain": "News",
-        "ai_tags": [f"News/{source_short}/{category}"],
-        "material_type": material_type,
-        "jurisdiction": jurisdiction,
-    }
-
-
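-async def _already_ingested(session, article_id: str, normalized_url: str, link: str) -> bool:
-    """Whether the article is already ingested: match on file_hash or on the normalized/raw edit_url (shared by all 3 fetchers, R11c).
-
-    Tolerates legacy raw URLs and multiple matches from cross-posting (first). Consolidates the identical
-    existence check that _fetch_rss/_fetch_api_guardian/_fetch_api_nyt each duplicated.
-    """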
-    existing = await session.execute(
-        select(Document).where(
-            (Document.file_hash == article_id)
-            | (Document.edit_url.in_([normalized_url, link]))
-        ).limit(1)
-    )
-    return existing.scalars().first() is not None
-
-
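-def _build_news_doc(source, ident, source_short, article_id, title, body,
-                    extractor_version, normalized_url, pub_dt) -> Document:
-    """News Document builder shared by the 3 fetchers (R11c). Per-channel differences come only through arguments:
-    body (NYT=summary), extractor_version, and ident (absorbs category computation differences) vary, while the 22-field structure is statically identical.
-    edit_url is stored with the same normalization used for lookup (storing the raw URL defeats URL dedup)."""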
-    return Document(
-        file_path=f"{ident['path_prefix']}/{source.name}/{article_id}",
-        file_hash=article_id,
-        file_format="article",
-        file_size=len(body.encode()),
-        file_type="note",
-        title=title,
-        extracted_text=f"{title}\n\n{body}",
-        extracted_at=datetime.now(timezone.utc),
-        extractor_version=extractor_version,
-        # article = text-native, so set the terminal 'skipped' explicitly at creation (not a markdown conversion target;
-        # if unset, 'pending' never converges and pollutes the backlog metric). Page-policy sources are promoted by fulltext_worker.
-        md_status="skipped",
-        md_extraction_error="news article: 텍스트 네이티브, markdown 변환 비대상",
-        source_channel=source.source_channel,
-        data_origin="external",
-        edit_url=normalized_url,
-        review_status="approved",
-        ai_domain=ident["ai_domain"],
-        ai_sub_group=source_short,
-        ai_tags=ident["ai_tags"],
-        # Safety library A-2: deterministic from the registry (classify-skip path, so required at ingest time)
-        material_type=ident["material_type"],
-        jurisdiction=ident["jurisdiction"],
-        published_date=pub_dt.date() if pub_dt else None,
-        extract_meta=_build_extract_meta(source, pub_dt),
-    )
-
-
-async def _fetch_rss(session, source: NewsSource) -> tuple[int, str]:
-    """Collect an RSS feed: redirect re-validation + size/content-type limits + conditional GET (A-1).
-
-    Returns (new item count, status). Status 'not_modified' = 304 or identical content hash.
-    Source-level failures raise FeedError; run() records them as health failures.
-    """
+async def _fetch_rss(session, source: NewsSource) -> int:
+    """Collect an RSS feed: redirect re-validation + size/content-type limits"""
    from urllib.parse import urljoin
    from core.url_validator import validate_feed_url, HTTP_EXCEPTION_DOMAINS

@@ -478,79 +120,51 @@ async def _fetch_rss(session, source: NewsSource) -> tuple[int, str]:

    # Block plain-HTTP sources that are not on the allowlist
    if source.feed_url.startswith("http://") and not http_allowed:
-        raise FeedError(f"HTTP 차단 (allowlist 미등록): {source_hostname}")
+        logger.error(f"[{source.name}] HTTP 차단 (allowlist 미등록): {source_hostname}")
+        return 0

    # Re-validate the URL before fetching (guards against DNS changes after registration)
    try:
        validate_feed_url(source.feed_url, allow_http=http_allowed)
    except ValueError as e:
-        raise FeedError(f"URL 검증 실패: {e}") from e
+        logger.error(f"[{source.name}] URL 검증 실패: {e}")
+        return 0

-    # A-1: honest UA + conditional GET; resend the server-provided watermarks exactly as received
-    headers = {"User-Agent": CRAWL_UA}
-    if source.etag:
-        headers["If-None-Match"] = source.etag
-    if source.last_modified:
-        headers["If-Modified-Since"] = source.last_modified
+    async with httpx.AsyncClient(timeout=10, follow_redirects=False) as client:
+        resp = await client.get(source.feed_url)

-    async with httpx.AsyncClient(
-        timeout=10, follow_redirects=False, headers=headers
-    ) as client:
-        resp = await _get_with_connect_retry(client, source.feed_url)
-
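-        # Handle 304 before redirect processing: httpx's is_redirect is True for every 3xx (304 included),
-        # so mistaking 304 for a redirect re-requests the same URL with no location header and
-        # ends in the "redirect 3회 초과" (more than 3 redirects) error (a bug that wiped out stable conditional-GET feeds).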
-        if resp.status_code == 304:
-            logger.info(f"[{source.name}] 304 Not Modified — 본문 미전송")
-            return 0, "not_modified"
-
-        # Handle redirects manually (max 3, re-validating each target); only real redirects that carry a location.
+        # Handle redirects manually (max 3, re-validating each target)
        # For allowlisted domains, HTTP is also allowed for the redirect target
        redirects = 0
-        while resp.has_redirect_location and redirects < 3:
-            location = urljoin(str(resp.request.url), resp.headers["location"])
+        while resp.is_redirect and redirects < 3:
+            location = resp.headers.get("location", "")
+            location = urljoin(str(resp.request.url), location)
            try:
                validate_feed_url(location, allow_http=http_allowed)
            except ValueError as e:
-                raise FeedError(f"redirect target 차단: {e}") from e
+                logger.error(f"[{source.name}] redirect target 차단: {e}")
+                return 0
            resp = await client.get(location)
-            if resp.status_code == 304:
-                logger.info(f"[{source.name}] 304 Not Modified (redirect 후) — 본문 미전송")
-                return 0, "not_modified"
            redirects += 1
-        if resp.has_redirect_location:
-            raise FeedError("redirect 3회 초과")
+        if resp.is_redirect:
+            logger.error(f"[{source.name}] redirect 3회 초과")
+            return 0

        resp.raise_for_status()

        if len(resp.content) > MAX_RESPONSE_SIZE:
-            raise FeedError(f"응답 크기 초과: {len(resp.content)} bytes")
+            logger.warning(f"[{source.name}] 응답 크기 초과: {len(resp.content)} bytes")
+            return 0

        ct = resp.headers.get("content-type", "").lower()
        if not any(t in ct for t in ALLOWED_CONTENT_TYPES):
-            raise FeedError(f"비정상 content-type: {ct}")
-
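-    # A-1: content-hash change detection (run alongside ETag to cover CDN ETag rotation). The stored hash always
-    # comes from a response that passed parse validation, so comparing for equality before parsing is safe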
-    new_etag = resp.headers.get("etag")
-    new_last_modified = resp.headers.get("last-modified")
-    content_hash = hashlib.sha256(resp.content).hexdigest()
-    if source.feed_content_hash == content_hash:
-        logger.info(f"[{source.name}] 콘텐츠 해시 동일 — 파싱 skip")
-        return 0, "not_modified"
+            logger.warning(f"[{source.name}] 비정상 content-type: {ct}")
+            return 0

    feed = feedparser.parse(resp.text)
    if feed.bozo and not feed.entries:
-        raise FeedError(f"RSS 파싱 실패: {feed.bozo_exception}")
-
-    # A-1: persist watermarks only after parse validation passes. Storing the ETag of a corrupt (bozo) response
-    # would cause permanent skipping via 304 afterwards; this blocks that silent corruption
-    if new_etag:
-        source.etag = new_etag
-    if new_last_modified:
-        source.last_modified = new_last_modified
-    source.feed_content_hash = content_hash
+        logger.warning(f"[{source.name}] RSS 파싱 실패: {feed.bozo_exception}")
+        return 0
    count = 0

    for entry in feed.entries:
@@ -562,166 +176,67 @@ async def _fetch_rss(session, source: NewsSource) -> tuple[int, str]:
        if not summary:
            summary = title

-        # Body selection by policy: signal-only (untruncated summary) / feed-full (full text from the feed) / default (summary)
-        body, extractor_version = _entry_body(source, entry, summary)
-
        link = entry.get("link", "")
-
-        # B-5 quirk: filter out video items (Aeon/Psyche; video pages have no value in a text corpus)
-        if source.parser_quirk == "skip-video" and re.search(r"/videos?/", link):
-            continue
-
        published = entry.get("published_parsed") or entry.get("updated_parsed")
        pub_dt = datetime(*published[:6], tzinfo=timezone.utc) if published else datetime.now(timezone.utc)

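-        # Duplicate check: legacy rows were stored with raw URLs, so match both normalized and raw.
-        # Cross-posting (the same article in two feeds) can match 2+ rows, so use first()
-        # (scalar_one_or_none raises MultipleResultsFound, the cause of the 2026-06 BBC collection outage).
+        # Duplicate check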
        article_id = _article_hash(title, pub_dt.strftime("%Y%m%d"), source.name)
        normalized_url = _normalize_url(link)

-        if await _already_ingested(session, article_id, normalized_url, link):
-            continue
-
-        # A-6 second pass: portal-syndication dedup (first-wins; whichever was ingested first is canonical)
-        if await _is_portal_duplicate(session, title):
-            logger.info(f"[{source.name}] portal-dup skip: {title[:60]}")
+        existing = await session.execute(
+            select(Document).where(
+                (Document.file_hash == article_id) |
+                (Document.edit_url == normalized_url)
+            )
+        )
+        if existing.scalar_one_or_none():
            continue

        category = _normalize_category(source.category or "")
        source_short = source.name.split(" ")[0]  # "경향신문 문화" → "경향신문"
-        ident = _doc_identity(source, source_short, category)

-        doc = _build_news_doc(
-            source, ident, source_short, article_id, title, body,
-            extractor_version, normalized_url, pub_dt,
+        doc = Document(
+            file_path=f"news/{source.name}/{article_id}",
+            file_hash=article_id,
+            file_format="article",
+            file_size=len(summary.encode()),
+            file_type="note",
+            title=title,
+            extracted_text=f"{title}\n\n{summary}",
+            extracted_at=datetime.now(timezone.utc),
+            extractor_version="rss",
+            source_channel="news",
+            data_origin="external",
+            edit_url=link,
+            review_status="approved",
+            ai_domain="News",
+            ai_sub_group=source_short,
+            ai_tags=[f"News/{source_short}/{category}"],
        )
        session.add(doc)
        await session.flush()

-        # Enqueue summarize + embed + chunk (classify not needed).
-        # Page-policy sources get fulltext only; fulltext_worker enqueues the follow-up stages once it has confirmed the text.
-        await _enqueue_processing(session, doc, source, pub_dt)
+        # Enqueue summarize + embed + chunk (classify not needed)
+        await enqueue_stage(session, doc.id, "summarize")
+        days_old = (datetime.now(timezone.utc) - pub_dt).days
+        if days_old <= 30:
+            await enqueue_stage(session, doc.id, "embed")
+            await enqueue_stage(session, doc.id, "chunk")

        count += 1

    logger.info(f"[{source.name}] RSS → {count}건 수집")
-    return count, "ok"
+    return count


-async def _fetch_api(session, source: NewsSource) -> tuple[int, str]:
-    """API source dispatch: identify the provider from the feed_url host (B-2).
-
-    Legacy NYT rows (feed_url=api.nytimes.com) take an unchanged path. New providers add a host branch.
-    Unknown hosts fail explicitly instead of falling through to the NYT path (no silent fallback).
-    """
-    host = (urlparse(source.feed_url).hostname or "").lower()
-    if host.endswith("guardianapis.com"):
-        return await _fetch_api_guardian(session, source)
-    if host.endswith("nytimes.com"):
-        return await _fetch_api_nyt(session, source)
-    raise FeedError(f"API 제공자 미등록 호스트: {host} — 디스패치 분기 추가 필요")
-
-
-def _guardian_request(feed_url: str, api_key: str) -> tuple[str, dict]:
-    """Single source of truth for the Guardian request shape; target of the fixture regression test
-    (must match the call used when tests/fixtures/guardian_open_platform_search_response.json was captured)."""
-    parsed = urlparse(feed_url)
-    params = {
-        **dict(parse_qsl(parsed.query)),
-        "show-fields": "bodyText,trailText",
-        "page-size": "20",
-        "order-by": "newest",
-        "api-key": api_key,
-    }
-    return f"{parsed.scheme}://{parsed.netloc}{parsed.path}", params
-
-
-async def _fetch_api_guardian(session, source: NewsSource) -> tuple[int, str]:
-    """Guardian Open Platform collection (B-2): official full-text JSON via show-fields=bodyText.
-
-    Registered with the section query embedded in feed_url (e.g. https://content.guardianapis.com/search?section=world).
-    Full text arrives via the API, so no fulltext stage is needed. Missing key = FeedError (recorded as a health failure,
-    no silent fallback; [[feedback_no_silent_fallback_explicit_opt_in]]).
-    """
-    import os
-    api_key = os.getenv("GUARDIAN_API_KEY", "")
-    if not api_key:
-        raise FeedError("GUARDIAN_API_KEY 미설정 — Guardian 수집 불가")
-
-    endpoint, params = _guardian_request(source.feed_url, api_key)
-
-    try:
-        async with httpx.AsyncClient(timeout=15) as client:
-            resp = await client.get(endpoint, params=params)
-            resp.raise_for_status()
-    except httpx.HTTPStatusError as e:
-        # Strip the query string (contains api-key); log only up to the path (same discipline as NYT)
-        safe_url = str(e.request.url).split("?")[0]
-        raise FeedError(f"Guardian API 실패: {e.response.status_code} @ {safe_url}") from e
-    except httpx.RequestError as e:
-        safe_url = str(e.request.url).split("?")[0] if e.request else "unknown"
-        raise FeedError(f"Guardian API 연결 실패: {safe_url}") from e
-
-    payload = resp.json().get("response", {})
-    if payload.get("status") != "ok":
-        raise FeedError(f"Guardian API status={payload.get('status')}")
-
-    count = 0
-    for item in payload.get("results", []):
-        title = (item.get("webTitle") or "").strip()
-        if not title:
-            continue
-
-        fields = item.get("fields") or {}
-        body_text = (fields.get("bodyText") or "").strip()
-        trail = _clean_html(fields.get("trailText") or "")
-        # bodyText = plain-text full article (no HTML sanitizing needed). If short (live-blog remnants, etc.), fall back to trail.
-        is_full = len(body_text) >= 200
-        body = body_text if is_full else (trail or title)
-
-        link = item.get("webUrl", "")
-        pub_str = item.get("webPublicationDate", "")
-        try:
-            pub_dt = datetime.fromisoformat(pub_str.replace("Z", "+00:00"))
-        except (ValueError, AttributeError):
-            pub_dt = datetime.now(timezone.utc)
-
-        article_id = _article_hash(title, pub_dt.strftime("%Y%m%d"), source.name)
-        normalized_url = _normalize_url(link)
-
-        # Same as the RSS collector: tolerate legacy raw URLs and multiple matches from cross-posting (first)
-        if await _already_ingested(session, article_id, normalized_url, link):
-            continue
-
-        if await _is_portal_duplicate(session, title):
-            logger.info(f"[{source.name}] portal-dup skip: {title[:60]}")
-            continue
-
-        category = _normalize_category(item.get("sectionName", source.category or ""))
-        source_short = source.name.split(" ")[0]
-        ident = _doc_identity(source, source_short, category)
-
-        doc = _build_news_doc(
-            source, ident, source_short, article_id, title, body,
-            "guardian_api_full" if is_full else "guardian_api", normalized_url, pub_dt,
-        )
-        session.add(doc)
-        await session.flush()
-
-        await _enqueue_processing(session, doc, source, pub_dt)
-        count += 1
-
-    logger.info(f"[{source.name}] API → {count}건 수집")
-    return count, "ok"
-
-
-async def _fetch_api_nyt(session, source: NewsSource) -> tuple[int, str]:
+async def _fetch_api(session, source: NewsSource) -> int:
    """NYT API collection: key masking + health degradation"""
    import os
    nyt_key = os.getenv("NYT_API_KEY", "")
    if not nyt_key:
-        raise FeedError("NYT_API_KEY 미설정 — US 뉴스 수집 불가")
+        logger.error("NYT_API_KEY 미설정 — US 뉴스 수집 불가")
+        return 0

    try:
        async with httpx.AsyncClient(timeout=10) as client:
@@ -733,10 +248,12 @@ async def _fetch_api_nyt(session, source: NewsSource) -> tuple[int, str]:
    except httpx.HTTPStatusError as e:
        # Strip the query string (contains api-key); log only up to the path
        safe_url = str(e.request.url).split("?")[0]
-        raise FeedError(f"NYT API 실패: {e.response.status_code} @ {safe_url}") from e
+        logger.error(f"NYT API 실패: {e.response.status_code} @ {safe_url}")
+        return 0
    except httpx.RequestError as e:
        safe_url = str(e.request.url).split("?")[0] if e.request else "unknown"
-        raise FeedError(f"NYT API 연결 실패: {safe_url}") from e
+        logger.error(f"NYT API 연결 실패: {safe_url}")
+        return 0

    data = resp.json()
    count = 0
@@ -760,28 +277,46 @@ async def _fetch_api_nyt(session, source: NewsSource) -> tuple[int, str]:
        article_id = _article_hash(title, pub_dt.strftime("%Y%m%d"), source.name)
        normalized_url = _normalize_url(link)

-        # Same as the RSS collector: tolerate legacy raw URLs and multiple matches from cross-posting (first)
-        if await _already_ingested(session, article_id, normalized_url, link):
-            continue
-
-        if await _is_portal_duplicate(session, title):
-            logger.info(f"[{source.name}] portal-dup skip: {title[:60]}")
+        existing = await session.execute(
+            select(Document).where(
+                (Document.file_hash == article_id) |
+                (Document.edit_url == normalized_url)
+            )
+        )
+        if existing.scalar_one_or_none():
            continue

        category = _normalize_category(article.get("section", source.category or ""))
        source_short = source.name.split(" ")[0]

-        ident = _doc_identity(source, source_short, category)
-        doc = _build_news_doc(
-            source, ident, source_short, article_id, title, summary,
-            "nyt_api", normalized_url, pub_dt,
+        doc = Document(
+            file_path=f"news/{source.name}/{article_id}",
+            file_hash=article_id,
+            file_format="article",
+            file_size=len(summary.encode()),
+            file_type="note",
+            title=title,
+            extracted_text=f"{title}\n\n{summary}",
+            extracted_at=datetime.now(timezone.utc),
+            extractor_version="nyt_api",
+            source_channel="news",
+            data_origin="external",
+            edit_url=link,
+            review_status="approved",
+            ai_domain="News",
+            ai_sub_group=source_short,
+            ai_tags=[f"News/{source_short}/{category}"],
        )
        session.add(doc)
        await session.flush()

-        await _enqueue_processing(session, doc, source, pub_dt)
+        await enqueue_stage(session, doc.id, "summarize")
+        days_old = (datetime.now(timezone.utc) - pub_dt).days
+        if days_old <= 30:
+            await enqueue_stage(session, doc.id, "embed")
+            await enqueue_stage(session, doc.id, "chunk")

        count += 1

    logger.info(f"[{source.name}] API → {count}건 수집")
-    return count, "ok"
+    return count
@@ -1,233 +0,0 @@
-"""office/hwp → Markdown hybrid converter (plan ds-s1-backend-1, C-1 PoC).
-
-★ PoC status: not yet wired into marker_worker (that is C-2). This module provides only the conversion *contract* and
-   the pure functions called by the PoC harness (scripts/poc_office_md.py).
-
-Strategy (hybrid):
-  - OOXML(.docx/.xlsx/.pptx) → markitdown        ← new dependency (pip install markitdown). lazy import.
-  - .hwp(HWP5 binary)        → pyhwp hwp5html → HTML → markdownify  ← pyhwp+six dependency.
-    (2026-06-09: LibreOffice's bundled libhwplo filter could not read real Hancom HWP5 files and failed every file with rc=0 + 'source file
-     could not be loaded', so it was replaced with pyhwp, a pure-Python HWP5-only converter.)
-  - .hwpx                    → LibreOffice(headless) → HTML → markdownify  ← markdownify is an existing dependency.
-    (HWPX (zip) is not supported by pyhwp, so the LibreOffice fallback stays. The current corpus is entirely HWP5 binary.)
-
-Failure contract (the backend half of the C-5 postcondition):
-  Conversion failure, empty output, timeout, or a missing dependency raises OfficeMdError.
-  **Never return success with empty md**: the caller (C-2 marker_worker) catches this and
-  routes to md_status='failed' (¬success·¬skipped). Invariant: md_status ∈ {success,partial} ⟹ md_content is non-blank.
-"""
-
-from __future__ import annotations
-
-import os
-import re
-import shutil
-import subprocess
-import tempfile
-from pathlib import Path
-
-OOXML_FORMATS = {".docx", ".xlsx", ".pptx"}
-HWP_FORMATS = {".hwp", ".hwpx"}
-SUPPORTED = OOXML_FORMATS | HWP_FORMATS
-
-# Empty-output threshold: after stripping whitespace, anything shorter than this counts as a failure (empty conversion).
-_MIN_BODY_CHARS = 16
-
-# extract_worker.py already extracts office text successfully with the `libreoffice` binary (a name verified in the
-# container), so the default matches it. Environments with only soffice can override via LIBREOFFICE_BIN.
-_SOFFICE_BIN = os.environ.get("LIBREOFFICE_BIN", "libreoffice")
-
-# pyhwp console script (registered on PATH by pip install pyhwp). HWP5 binary (.hwp) only.
-_HWP5HTML_BIN = os.environ.get("HWP5HTML_BIN", "hwp5html")
-
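-# Raster extensions, among the attachments hwp5html extracts into bindata/, that are persisted to the NAS.
-# (OLE equations/shapes have no anchor in index.xhtml, so their position cannot be restored; excluded from persistence.)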
-_RASTER_EXTS = {"jpg", "jpeg", "png", "gif", "bmp"}
-
-
-class OfficeMdError(Exception):
-    """Signals an office/hwp → md conversion failure. Callers route to md_status='failed'."""
-
-
-def convert_office_to_md(path: str | Path, *, timeout: int = 90) -> str:
-    """Convert an office/hwp file to a Markdown string. Raises OfficeMdError on failure or empty output."""
-    p = Path(path)
-    suffix = p.suffix.lower()
-    if suffix not in SUPPORTED:
-        raise OfficeMdError(f"unsupported suffix for office_md: {suffix!r}")
-    if not p.exists():
-        raise OfficeMdError(f"file not found: {p}")
-
-    if suffix in OOXML_FORMATS:
-        md = _via_markitdown(p)
-    elif suffix == ".hwp":
-        md = _via_pyhwp_html(p, timeout=timeout)
-    else:  # .hwpx (not supported by pyhwp, so fall back to LibreOffice)
-        md = _via_libreoffice_html(p, timeout=timeout)
-
-    md = (md or "").strip()
-    if len(md) < _MIN_BODY_CHARS:
-        raise OfficeMdError(f"empty/too-short conversion ({len(md)} chars) for {p.name}")
-    return md
-
-
-def _via_markitdown(path: Path) -> str:
-    try:
-        from markitdown import MarkItDown  # lazy; new dependency
-    except ImportError as e:  # noqa: BLE001
-        raise OfficeMdError(
-            "markitdown 미설치 (OOXML 변환에 필요) — `pip install markitdown`. "
-            "C-1 PoC 는 prod worker 이미지/버전핀 컨텍스트에서 실행(E-1)."
-        ) from e
-    try:
-        result = MarkItDown().convert(str(path))
-    except Exception as e:  # noqa: BLE001 - route any conversion exception to failed
-        raise OfficeMdError(f"markitdown 변환 실패: {path.name}: {e}") from e
-    return getattr(result, "text_content", "") or ""
-
-
-def _run_hwp5html(path: Path, *, timeout: int) -> tuple[str, list[dict]]:
-    """HWP5 binary(.hwp) → (markdown, raster_images). One hwp5html run extracts md and images together.
-
-    LibreOffice's bundled libhwplo filter could not read real Hancom HWP5 files (rc=0 + 'source file could
-    not be loaded') and failed every file, so it was replaced with pyhwp (CLI hwp5html), a pure-Python HWP5-only converter.
-    Same failure contract as `_via_libreoffice_html` (rc≠0 or missing output raises OfficeMdError).
-
-    raster_images = [{'data': bytes, 'format': 'jpeg'|'png'|...}], rasters from bindata/ only.
-    hwp5html does not anchor images as <img> in the body xhtml (bindata orphan; same with --css/--html),
-    so inline positions cannot be restored; the caller persists them to the NAS and attaches them as a trailing gallery.
-    """
-    try:
-        from markdownify import markdownify  # existing dependency
-    except ImportError as e:  # noqa: BLE001
-        raise OfficeMdError("markdownify 미설치(기존 의존성이어야 함)") from e
-
-    with tempfile.TemporaryDirectory(prefix="office_md_hwp_") as tmp:
-        outdir = Path(tmp)
-        # hwp5html --output <dir> <file.hwp> → <dir>/index.xhtml + styles.css + bindata/
-        cmd = [_HWP5HTML_BIN, "--output", str(outdir), str(path)]
-        try:
-            proc = subprocess.run(
-                cmd, capture_output=True, text=True, timeout=timeout, check=False
-            )
-        except FileNotFoundError as e:
-            raise OfficeMdError(
-                f"pyhwp(hwp5html) 바이너리 부재({_HWP5HTML_BIN}) — `pip install pyhwp six` 필요"
-            ) from e
-        except subprocess.TimeoutExpired as e:
-            raise OfficeMdError(f"pyhwp 변환 타임아웃({timeout}s): {path.name}") from e
-
-        index_path = outdir / "index.xhtml"
-        if proc.returncode != 0 or not index_path.exists():
-            raise OfficeMdError(
-                f"pyhwp html 변환 실패: {path.name} (rc={proc.returncode}): "
-                f"{(proc.stderr or proc.stdout or '').strip()[:300]}"
-            )
-        html = index_path.read_text(encoding="utf-8", errors="replace")
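-        # hwp5html's xhtml starts with an <?xml ...?> declaration (LibreOffice's .html path has none).
-        # markdownify's html.parser leaks it into the body as PI text ('xml version="1.0" encoding="utf-8"?'),
-        # which (1) adds noise at the top of the md and pollutes search/chunks, and (2) for an empty-body shell lets those ~34 chars
-        # defeat the _MIN_BODY_CHARS(16) empty-output gate (false success on an empty conversion), so strip it before markdownify.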
-        html = re.sub(r"^\s*<\?xml[^>]*\?>\s*", "", html)
-        # To preserve tables, markdownify emits tables as GFM; heading_style ATX (same as the libreoffice path).
-        md = markdownify(html, heading_style="ATX", strip=["span", "font"])
-
-        images: list[dict] = []
-        bindata = outdir / "bindata"
-        if bindata.is_dir():
-            for f in sorted(bindata.iterdir()):
-                ext = f.suffix.lower().lstrip(".")
-                if ext in _RASTER_EXTS:
-                    images.append({
-                        "data": f.read_bytes(),
-                        "format": "jpeg" if ext == "jpg" else ext,
-                    })
-        return md, images
-
-
-def _via_pyhwp_html(path: Path, *, timeout: int) -> str:
-    """HWP5 binary(.hwp) → markdown (images excluded). For the single-text path of convert_office_to_md."""
-    md, _images = _run_hwp5html(path, timeout=timeout)
-    return md
-
-
-def convert_hwp_to_md_and_images(
-    path: str | Path, *, timeout: int = 90
-) -> tuple[str, list[dict]]:
-    """HWP5(.hwp) → (markdown, raster_images). Dedicated to marker_worker's image persistence path.
-
-    Same failure/empty-output contract as convert_office_to_md (raise OfficeMdError; never return empty md).
-    raster_images elements = {'data': bytes, 'format': str}; may be empty (documents without images).
-    """
-    p = Path(path)
-    if p.suffix.lower() != ".hwp":
-        raise OfficeMdError(f"convert_hwp_to_md_and_images: .hwp 전용, got {p.suffix!r}")
-    if not p.exists():
-        raise OfficeMdError(f"file not found: {p}")
-    md, images = _run_hwp5html(p, timeout=timeout)
-    md = (md or "").strip()
-    if len(md) < _MIN_BODY_CHARS:
-        raise OfficeMdError(f"empty/too-short conversion ({len(md)} chars) for {p.name}")
-    return md, images
-
-
-def _via_libreoffice_html(path: Path, *, timeout: int) -> str:
-    """Convert to HTML with headless LibreOffice, then markdownify. For hwpx (.hwp uses pyhwp)."""
-    try:
-        from markdownify import markdownify  # existing dependency
-    except ImportError as e:  # noqa: BLE001
-        raise OfficeMdError("markdownify 미설치(기존 의존성이어야 함)") from e
-
-    with tempfile.TemporaryDirectory(prefix="office_md_") as tmp:
-        tmpdir = Path(tmp)
-        # Avoid user-profile lock conflicts when soffice runs concurrently: isolated profile per call.
-        profile = tmpdir / "lo_profile"
-        cmd = [
-            _SOFFICE_BIN,
-            "--headless",
-            "--nologo",
-            "--nofirststartwizard",
-            f"-env:UserInstallation=file://{profile}",
-            "--convert-to",
-            "html",
-            "--outdir",
-            str(tmpdir),
-            str(path),
-        ]
-        try:
-            proc = subprocess.run(
-                cmd, capture_output=True, text=True, timeout=timeout, check=False
-            )
-        except FileNotFoundError as e:
-            raise OfficeMdError(
-                f"LibreOffice 바이너리 부재({_SOFFICE_BIN}) — LIBREOFFICE_BIN 설정 또는 설치 필요"
-            ) from e
-        except subprocess.TimeoutExpired as e:
-            raise OfficeMdError(f"LibreOffice 변환 타임아웃({timeout}s): {path.name}") from e
-
-        html_path = tmpdir / f"{path.stem}.html"
-        if proc.returncode != 0 or not html_path.exists():
-            raise OfficeMdError(
-                f"LibreOffice html 변환 실패: {path.name} (rc={proc.returncode}): "
-                f"{(proc.stderr or proc.stdout or '').strip()[:300]}"
-            )
-        html = html_path.read_text(encoding="utf-8", errors="replace")
-        # To preserve tables, markdownify emits tables as GFM; heading_style ATX.
-        return markdownify(html, heading_style="ATX", strip=["span", "font"])
-
-
-def table_fidelity(md: str) -> dict:
-    """Crude indicator of E-1 table fidelity: counts GFM table rows/separator rows (a regression signal, not a precise evaluation)."""
-    lines = md.splitlines()
-    pipe_rows = sum(1 for ln in lines if ln.strip().startswith("|") and ln.strip().endswith("|"))
-    sep_rows = sum(
-        1 for ln in lines
-        if ln.strip().startswith("|") and set(ln.strip()) <= set("|-: ")
-    )
-    return {
-        "chars": len(md),
-        "lines": len(lines),
-        "table_pipe_rows": pipe_rows,
-        "table_separator_rows": sep_rows,  # approximates the table count
-        "has_heading": any(ln.lstrip().startswith("#") for ln in lines),
-    }
@@ -1,400 +0,0 @@
-"""OpenAlex backbone collector: B-3 PR3 (plan safety-library-b3-1).
-
-OpenAlex = global backbone for discovery + dedup (indexes JP/EU/US papers with canonical DOIs). It provides no full text (only an oa_url pointer).
- scaffold-first: without OPENALEX_API_KEY, raise FeedError (explicit skip, no silent fallback). The key is free.
- signal-only: index only the abstract (restored from the inverted index) via embed+chunk; never enqueue summarize (the Mac mini queue is untouched).
-  PDFs are never fetched through OpenAlex (oa_url is only a link/signal).
- Relevance pre-filter = title_and_abstract.search keyword (server-side) + per-run insert cap (blocks an embedding firehose,
-  adversarial review A major). Incremental via cursor paging + a from_publication_date watermark.
- Thin records without an abstract (mostly non-OA metadata) are skipped to keep Phase-1 material quality.
- DOI → paper.doi (holder, partial-unique index, cross-source dedup). Falls back to the openalex id when absent.
- license: explicit CC → redistribute=true / other OA or closed → false (no restricted flag = abstract usable for RAG).
- enabled=False news_sources row + main.py CronTrigger (own polling). list+filter cost is negligible ($1/day credit).
-"""
-
-import asyncio
-import hashlib
-import json
-import os
-from dataclasses import dataclass
-from datetime import date, datetime, timezone
-
-import httpx
-from sqlalchemy import select
-
-from core.crawl_politeness import CRAWL_UA
-from core.database import async_session
-from core.utils import setup_logger
-from models.document import Document
-from models.news_source import NewsSource
-from models.queue import enqueue_stage
-from services.papers.doi import normalize_doi
-from services.papers.holder import find_paper_holder
-from workers.news_collector import (
-    FeedError,
-    _get_or_create_health,
-    _record_failure,
-    _record_success,
-)
-
-logger = setup_logger("openalex_collector")
-
-_API = "https://api.openalex.org/works"
-_SOURCE_NAME = "OpenAlex 안전·공학 (keyword)"
-_ENV_KEY = "OPENALEX_API_KEY"
-
-# Domain keywords for pressure vessels, process safety, and structural integrity (1 query per keyword = relevance pre-filter).
-_KEYWORDS = (
-    "pressure vessel safety",
-    "process safety",
-    "structural integrity",
-    "fracture mechanics",
-    "fatigue life assessment",
-)
-
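-# ISSN seeds for journals directly in the domain (verified against OpenAlex sources); gives full coverage, including what keyword matching misses.
-# KR safety/gas/mechanical + JP high pressure. KR/JP interest is served directly by OpenAlex's clean API (no dedicated KoreaScience/J-STAGE
-# scrapers needed; Phase-1 metadata overlaps with OpenAlex, and the unique value of a dedicated collector = free full-text PDFs = Phase-2).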
-_JOURNAL_ISSNS = (
-    ("한국안전학회지", "1738-3803"),
-    ("한국가스학회지", "1226-8402"),
-    ("대한기계학회논문집 A", "1226-4873"),
-    ("대한기계학회논문집 B", "1226-4881"),
-    ("KSME International J.", "1226-4865"),
-    ("Review of High Pressure Sci&Tech (JP)", "0917-639X"),
-)
-
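-_RUN_CAP = 60           # cap on new inserts per run (protects the embed queue). Lifted for bulk runs.
-_PER_PAGE = 50
-_MAX_PAGES_PER_KW = 4   # max pages per keyword (incremental, so the watermark is usually reached on page 1)
-_REQ_SLEEP = 1.0        # polite interval between pages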
-_MAX_RETRY = 4
-_BACKOFF_BASE = 5.0
-
-
-# ───────────────────────── pure parsers (fixture unit-test targets) ─────────────────────────
-
-@dataclass
-class OpenAlexWork:
-    openalex_id: str           # "W2910511816"
-    doi: str | None            # normalize_doi applied
-    title: str
-    abstract: str              # restored from the inverted index ("" if absent)
-    publication_date: str | None
-    oa_status: str | None      # closed/green/bronze/hybrid/gold/diamond
-    oa_url: str | None
-    is_oa: bool
-    license: str | None        # cc-by / cc-by-nc-nd / None
-    source_name: str | None
-    primary_topic: str | None
-    work_type: str | None
-
-
-def _clean(text):
-    return " ".join(text.split()).strip() if text else ""
-
-
-def _reconstruct_abstract(inv: dict | None) -> str:
-    """abstract_inverted_index({word:[positions]}) → plain-text abstract. '' if absent."""
-    if not inv:
-        return ""
-    positions = [(pos, word) for word, idxs in inv.items() for pos in idxs]
-    positions.sort()
-    return " ".join(w for _, w in positions)
-
-
-def license_meta(license_str: str | None, is_oa: bool, source_name: str | None) -> dict:
-    """extract_meta.license: only explicit CC/public-domain gets redistribute=true. No restricted flag (abstracts are free to index).
-
-    Even with redistribute=false, RAG use is allowed when restricted is absent (abstracts). Blocking verbatim RAG of non-CC full text is
-    handled by the Phase-2 full-text promotion step via restricted=true (L-1); Phase-1 (abstracts) is harmless.
-    """
-    attribution = source_name or "OpenAlex"
-    if license_str and (license_str.startswith("cc") or license_str == "public-domain"):
-        return {"scheme": license_str, "redistribute": True, "attribution": attribution}
-    return {
-        "scheme": "open-unspecified" if is_oa else "proprietary",
-        "redistribute": False,
-        "attribution": attribution,
-    }
-
-
-def parse_openalex_works(json_text: str) -> tuple[int, str | None, list[OpenAlexWork]]:
-    """OpenAlex /works response → (count, next_cursor, [OpenAlexWork]). Pure function."""
-    d = json.loads(json_text)
-    meta = d.get("meta") or {}
-    count = meta.get("count") or 0
-    next_cursor = meta.get("next_cursor")
-    works: list[OpenAlexWork] = []
-    for w in d.get("results") or []:
-        oid = (w.get("id") or "").rstrip("/").rsplit("/", 1)[-1]
-        if not oid:
-            continue
-        oa = w.get("open_access") or {}
-        pl = w.get("primary_location") or {}
-        pt = w.get("primary_topic") or {}
-        works.append(OpenAlexWork(
-            openalex_id=oid,
-            doi=normalize_doi(w.get("doi")),
-            title=_clean(w.get("title")),
-            abstract=_reconstruct_abstract(w.get("abstract_inverted_index")),
-            publication_date=w.get("publication_date"),
-            oa_status=oa.get("oa_status"),
-            oa_url=oa.get("oa_url") or None,
-            is_oa=bool(oa.get("is_oa")),
-            license=pl.get("license"),
-            source_name=(pl.get("source") or {}).get("display_name"),
-            primary_topic=pt.get("display_name"),
-            work_type=w.get("type"),
-        ))
-    return count, next_cursor, works
-
-
-def build_filter(keyword: str, from_date: str | None = None) -> str:
-    f = f"title_and_abstract.search:{keyword}"
-    if from_date:
-        f += f",from_publication_date:{from_date}"
-    return f
-
-
-def build_issn_filter(issn: str, from_date: str | None = None) -> str:
-    f = f"primary_location.source.issn:{issn}"
-    if from_date:
-        f += f",from_publication_date:{from_date}"
-    return f
-
-
-def _seeds() -> list[tuple[str, str, str]]:
-    """Collection seeds = (label, watermark key, kind). Domain-journal ISSNs come first (priority under the cap), then keywords."""
-    s: list[tuple[str, str, str]] = [(label, issn, "issn") for label, issn in _JOURNAL_ISSNS]
-    s += [(kw, kw, "kw") for kw in _KEYWORDS]
-    return s
-
-
-# ───────────────────────── Ingestion (DB; live-verified in PR3) ─────────────────────────
-
-def _build_paper_meta(source: NewsSource, w: OpenAlexWork) -> dict:
-    paper: dict = {"openalex_id": w.openalex_id}
-    if w.doi:
-        paper["doi"] = w.doi          # enters the partial-unique index (cross-source dedup)
-    if w.oa_status:
-        paper["oa_status"] = w.oa_status
-    if w.oa_url:
-        paper["oa_url"] = w.oa_url     # link/signal only; never auto-fetched
-    if w.primary_topic:
-        paper["topic"] = w.primary_topic
-    meta: dict = {
-        "source_id": source.id,
-        "source_name": source.name,
-        "source_region": "INT",        # OpenAlex is global. paper.jurisdiction stays NULL (A-2).
-        "paper": paper,
-        "license": license_meta(w.license, w.is_oa, w.source_name),
-    }
-    if w.publication_date:
-        meta["published_at"] = w.publication_date
-    return meta
-
-
-async def _ingest_work(session, source: NewsSource, w: OpenAlexWork) -> bool:
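-    """Ingest one work; returns True if it is new. Signal-only. Works without an abstract are skipped (thin records excluded)."""
-    if not w.abstract:
-        return False  # thin record without an abstract (mostly non-OA metadata); keeps Phase-1 material quality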
-    oid_hash = hashlib.sha256(f"openalex|{w.openalex_id}".encode()).hexdigest()[:32]
-    dup = await session.execute(
-        select(Document.id).where(Document.file_hash == oid_hash).limit(1)
-    )
-    if dup.scalars().first():
-        return False
-    if w.doi and await find_paper_holder(session, w.doi):
-        return False  # cross-source dedup (arXiv or another source already holds this DOI)
-
-    pub_date = None
-    if w.publication_date:
-        try:
-            pub_date = date.fromisoformat(w.publication_date)
-        except ValueError:
-            pub_date = None
-    body = w.abstract
-    doc = Document(
-        file_path=f"crawl/openalex/{w.openalex_id}",
-        file_hash=oid_hash,
-        file_format="article",
-        file_size=len(body.encode()),
-        file_type="note",
-        title=w.title,
-        extracted_text=f"{w.title}\n\n{body}",
-        extracted_at=datetime.now(timezone.utc),
-        extractor_version="openalex-signal",
-        md_status="skipped",
-        md_extraction_error="OpenAlex abstract: signal-only, markdown 비대상",
-        source_channel="crawl",
-        data_origin="external",
-        edit_url=w.oa_url or f"https://openalex.org/{w.openalex_id}",
-        review_status="approved",
-        material_type="paper",
-        jurisdiction=None,
-        published_date=pub_date,
-        extract_meta=_build_paper_meta(source, w),
-    )
-    session.add(doc)
-    await session.flush()
-    await enqueue_stage(session, doc.id, "embed")
-    await enqueue_stage(session, doc.id, "chunk")
-    return True
-
-
-async def _get_or_create_source(session) -> NewsSource:
-    result = await session.execute(
-        select(NewsSource).where(NewsSource.name == _SOURCE_NAME)
-    )
-    source = result.scalars().first()
-    if source is None:
-        source = NewsSource(
-            name=_SOURCE_NAME, feed_url=_API, feed_type="json",
-            fetch_method="signal-only", fulltext_policy="none",
-            source_channel="crawl", category="Engineering", language="en",
-            country=None, material_type="paper",
-            license_scheme="openalex", license_redistribute=False,
-            enabled=False,
-        )
-        session.add(source)
-        await session.flush()
-    return source
-
-
-def _api_key() -> str:
-    key = os.getenv(_ENV_KEY, "").strip()
-    if not key:
-        raise FeedError(f"{_ENV_KEY} 미설정 — OpenAlex 수집 불가 (scaffold-first explicit-skip)")
-    return key
-
-
-def _watermark(source: NewsSource, keyword: str) -> str | None:
-    return (source.selector_override or {}).get("openalex_watermark", {}).get(keyword)
-
-
-def _set_watermark(source: NewsSource, keyword: str, value: str) -> None:
-    cfg = dict(source.selector_override or {})
-    wm = dict(cfg.get("openalex_watermark") or {})
-    wm[keyword] = value
-    cfg["openalex_watermark"] = wm
-    source.selector_override = cfg
-
-
-async def _fetch(client: httpx.AsyncClient, key: str, filter_str: str, cursor: str) -> str:
-    params = {
-        "filter": filter_str, "per-page": _PER_PAGE, "cursor": cursor,
-        "sort": "publication_date:desc", "api_key": key,
-    }
-    for attempt in range(_MAX_RETRY):
-        resp = await client.get(_API, params=params)
-        if resp.status_code == 429:
-            await asyncio.sleep(_BACKOFF_BASE * (2 ** attempt))
-            continue
-        resp.raise_for_status()
-        return resp.text
-    raise FeedError(f"OpenAlex 429 재시도 초과: {filter_str[:48]}")
-
-
-async def run(bulk: bool = False, limit: int = 0) -> None:
-    """Daily entry point (scheduler). A missing key is an explicit skip (recorded as a health failure)."""
-    now = datetime.now(timezone.utc)
-    async with async_session() as session:
-        source = await _get_or_create_source(session)
-        await session.commit()
-        source_id = source.id
-
-    try:
-        key = _api_key()
-    except FeedError as e:
-        logger.warning(f"[openalex] {e}")
-        async with async_session() as session:
-            health = await _get_or_create_health(session, source_id)
-            _record_failure(health, str(e), now)
-            await session.commit()
-        return
-
-    run_cap = (limit or 10**9) if bulk else (min(limit, _RUN_CAP) if limit else _RUN_CAP)
-    inserted = 0
-    seen = 0
-    failures: list[str] = []
-
-    async with httpx.AsyncClient(
-        timeout=30.0, headers={"User-Agent": CRAWL_UA}, follow_redirects=True
-    ) as client:
-        for label, wm_key, kind in _seeds():
-            if inserted >= run_cap:
-                break
-            async with async_session() as session:
-                src = await session.get(NewsSource, source_id)
-                watermark = None if bulk else _watermark(src, wm_key)
-            filter_str = (build_issn_filter(wm_key, watermark) if kind == "issn"
-                          else build_filter(wm_key, watermark))
-            newest: str | None = None
-            capped = False  # whether the cap cut this seed short during this run (R4)
-            cursor = "*"
-            max_pages = (10**6 if bulk else _MAX_PAGES_PER_KW)
-            try:
-                for _page in range(max_pages):
-                    if inserted >= run_cap:
-                        capped = True
-                        break
-                    text = await _fetch(client, key, filter_str, cursor)
-                    _count, next_cursor, works = parse_openalex_works(text)
-                    if not works:
-                        break
-                    for w in works:
-                        seen += 1
-                        if w.publication_date and (newest is None or w.publication_date > newest):
-                            newest = w.publication_date
-                        async with async_session() as session:
-                            src = await session.get(NewsSource, source_id)
-                            if await _ingest_work(session, src, w):
-                                inserted += 1
-                                await session.commit()
-                            else:
-                                await session.rollback()
-                        if inserted >= run_cap:
-                            capped = True
-                            break
-                    await asyncio.sleep(_REQ_SLEEP)
-                    if not next_cursor:
-                        break
-                    cursor = next_cursor
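-                # When the cap cuts a seed short, do not advance the watermark: this prevents silent loss, where
-                # unfetched works would be permanently excluded by the next run's watermark filter (publication_date > watermark).
-                # Without the advance, the next run refetches from the old watermark, dedup-skips what is already
-                # ingested (without consuming the cap), then continues, so the backlog drains one cap per run (R4). Bulk mode ignores the cap.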
-                if newest and not capped:
-                    async with async_session() as session:
-                        src = await session.get(NewsSource, source_id)
-                        _set_watermark(src, wm_key, newest)
-                        await session.commit()
-            except (httpx.HTTPError, FeedError, ValueError) as e:
-                msg = f"[{label}] {e or repr(e)}"
-                logger.error(f"[openalex] {msg}")
-                failures.append(msg)
-
-    async with async_session() as session:
-        health = await _get_or_create_health(session, source_id)
-        if failures and inserted == 0:
-            _record_failure(health, "; ".join(failures)[:500], now)
-        else:
-            _record_success(health, inserted, False, now)
-        await session.commit()
-
-    deferred = "" if inserted < run_cap else f" (cap {run_cap} 도달 — 잔여 다음 run 이월)"
-    logger.info(
-        f"[openalex] {len(_seeds())}개 시드(ISSN+키워드) 스캔 {seen}건 → 신규 {inserted}건{deferred}"
-        + (f" / 실패 {len(failures)}건" if failures else "")
-    )
-
-
-if __name__ == "__main__":
-    import argparse
-
-    parser = argparse.ArgumentParser(description="OpenAlex 안전·공학 키워드 백본 수집기")
-    parser.add_argument("--bulk", action="store_true", help="cap 해제 + 깊은 cursor 페이징 백필")
-    parser.add_argument("--limit", type=int, default=0, help="신규 적재 상한(0=기본 cap)")
-    args = parser.parse_args()
-    asyncio.run(run(bulk=args.bulk, limit=args.limit))
@@ -1,102 +0,0 @@
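-"""paper DOI reconcile: B-3 PR4 (legacy arXiv) + PR5 (purchased PDFs) (plan safety-library-b3-1).
-
-Reconciles paper rows that have neither paper.doi nor parent_doi, along two branches:
- Legacy arXiv abstracts (holder): arXiv id → stamp arxiv_doi (10.48550/arxiv.{id}) → enter the
-  partial-unique index → block re-ingestion ("only blocks re-ingestion of the same DOI").
- Purchased PDFs (child, license.restricted=true; dropped into Papers_Purchased): parse the DOI from the body → link via
-  paper.parent_doi (tied to the bibliographic holder by the shared DOI). A child holds no doi (outside the index), so no unique conflict.
-
- Keyless, deterministic (zero OpenAlex calls), in-DB, zero enqueues (content unchanged). Runs as a worker separate from
-  dedup_reconcile (file_hash cache) (adversarial review B/C major). If a DOI holder already exists, the arXiv row is also marked with parent_doi (avoids a unique violation).
-"""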
-
-import asyncio
-
-from sqlalchemy import select
-
-from core.database import async_session
-from core.utils import setup_logger
-from models.document import Document
-from services.papers.doi import (
-    arxiv_doi,
-    parse_arxiv_id,
-    parse_doi_from_text,
-    with_paper_doi,
-    with_parent_doi,
-)
-from services.papers.holder import find_paper_holder
-
-logger = setup_logger("paper_doi_reconcile")
-
-_DOI_TEXT = Document.extract_meta[("paper", "doi")].astext
-_PARENT_DOI_TEXT = Document.extract_meta[("paper", "parent_doi")].astext
-
-
-def _is_restricted(meta: dict) -> bool:
-    return (meta.get("license") or {}).get("restricted") in (True, "true")
-
-
-async def run(limit: int = 0) -> None:
-    """Reconcile paper rows lacking paper.doi/parent_doi (idempotent). limit=0 means all rows."""
-    stamped = marked_dup = skipped_no_arxiv = 0
-    linked_purchased = skipped_purchased_no_doi = 0
-    async with async_session() as session:
-        q = (
-            select(Document)
-            .where(
-                Document.material_type == "paper",
-                _DOI_TEXT.is_(None),
-                _PARENT_DOI_TEXT.is_(None),
-            )
-            .order_by(Document.id)
-        )
-        if limit:
-            q = q.limit(limit)
-        rows = (await session.execute(q)).scalars().all()
-
-        for row in rows:
-            meta = dict(row.extract_meta or {})
-            paper = dict(meta.get("paper") or {})
-
-            # PR5: purchased PDF (restricted) = child → parse the DOI from the body → link via parent_doi
-            if _is_restricted(meta):
-                doi = parse_doi_from_text(row.extracted_text)
-                if not doi:
-                    skipped_purchased_no_doi += 1
-                    continue
-                row.extract_meta = with_parent_doi(meta, doi)
-                linked_purchased += 1
-                continue
-
-            # PR4: legacy arXiv abstract (holder) = stamp the arXiv DataCite DOI
-            arxiv_id = paper.get("arxiv_id") or parse_arxiv_id(row.extracted_text)
-            doi = arxiv_doi(arxiv_id)
-            if not doi:
-                skipped_no_arxiv += 1
-                continue
-            paper["arxiv_id"] = arxiv_id
-            meta["paper"] = paper
-            holder = await find_paper_holder(session, doi)
-            if holder is not None and holder.id != row.id:
-                row.extract_meta = with_parent_doi(meta, doi)  # pre-existing duplicate → mark as child
-                marked_dup += 1
-            else:
-                row.extract_meta = with_paper_doi(meta, doi)   # stamp as holder; enters the index
-                stamped += 1
-            # content unchanged → nothing enqueued (no summarize/embed/chunk)
-        await session.commit()
-
-    logger.info(
-        f"[paper_doi_reconcile] {len(rows)}행 → arXiv 스탬프 {stamped} · 선재중복 {marked_dup} · "
-        f"arXiv id 없음 skip {skipped_no_arxiv} / 구매PDF parent_doi 링크 {linked_purchased} · "
-        f"구매PDF DOI 없음 skip {skipped_purchased_no_doi}"
-    )
-
-
-if __name__ == "__main__":
-    import argparse
-
-    parser = argparse.ArgumentParser(description="paper DOI reconcile (arXiv 레거시 + 구매 PDF, keyless)")
-    parser.add_argument("--limit", type=int, default=0, help="처리 상한(0=전건)")
-    args = parser.parse_args()
-    asyncio.run(run(limit=args.limit))
@@ -1,123 +0,0 @@
-"""In-place full-text promotion of arXiv papers: B-3 Phase-2 P2-PR1 (plan safety-library-b3-1).
-
-Promotes an arXiv preprint abstract row (file_format='article', signal-only) to a full-text PDF **in place**:
-download the PDF → update file_format/file_type/file_path/md_status → enqueue 'extract' → the existing pipeline
-(extract → classify[paper skips summarize] → embed/chunk/markdown) produces full-text search chunks + md_content (marker rendering)
-+ the hier section structure. One Document (avoids a two-row split and reuses the existing display stack).
-
- arXiv = public preprints (arxiv.org/pdf/{id}, friendly host) → fine for full-text search/RAG, no restricted flag needed.
-  (Paid purchased papers are handled separately by the Papers_Purchased path with restricted=true.)
- Per-run cap (protects the marker GPU at ~10GB plus embed load; a 16GB 4070 is tight → idle-unload, incremental). Keyless.
- No summaries (classify paper-skip guard). file_hash and extract_meta.paper are preserved (collector dedup unaffected).
- CLI only (Phase-2 promotion is deliberate; the user controls GPU load). No scheduled job is registered.
-"""
-
-import argparse
-import asyncio
-import random
-from pathlib import Path
-
-import httpx
-from sqlalchemy import or_, select
-
-from core.config import settings
-from core.crawl_politeness import CRAWL_UA
-from core.database import async_session
-from core.utils import setup_logger
-from models.document import Document
-from models.queue import enqueue_stage
-
-logger = setup_logger("paper_fulltext_promote")
-
-_ARXIV_PDF = "https://arxiv.org/pdf/{id}"
-_MAX_FILE_BYTES = 50 * 1024 * 1024
-_DOWNLOAD_DELAY = (2.0, 5.0)
-_RUN_CAP = 10  # Max promotions per run (protects the marker/embed GPU). Lifted in bulk mode.
-
-_ARXIV_ID_EXPR = Document.extract_meta[("paper", "arxiv_id")].astext
-_OA_URL_EXPR = Document.extract_meta[("paper", "oa_url")].astext
-_OA_STATUS_EXPR = Document.extract_meta[("paper", "oa_status")].astext
-_REAL_OA = ("gold", "hybrid", "green", "diamond")
-
-
-async def _download(url: str, dest: Path) -> int:
-    """Download an arXiv PDF: size cap + PDF header check + delay between consecutive requests (kosha pattern)."""
-    await asyncio.sleep(random.uniform(*_DOWNLOAD_DELAY))
-    async with httpx.AsyncClient(timeout=60, follow_redirects=True) as client:
-        resp = await client.get(url, headers={"User-Agent": CRAWL_UA})
-    if resp.status_code != 200:
-        raise RuntimeError(f"arXiv PDF {resp.status_code}: {url}")
-    if len(resp.content) > _MAX_FILE_BYTES:
-        raise RuntimeError(f"크기 초과 {len(resp.content)}b: {url}")
-    if resp.content[:5] != b"%PDF-":
-        raise RuntimeError(f"PDF 아님(헤더 {resp.content[:8]!r}): {url}")
-    dest.parent.mkdir(parents=True, exist_ok=True)
-    dest.write_bytes(resp.content)
-    return len(resp.content)
-
-
-async def run(bulk: bool = False, limit: int = 0) -> None:
-    """Promote not-yet-promoted arXiv papers (file_format='article') to full-text PDFs in place."""
-    cap = (limit or 10**9) if bulk else (min(limit, _RUN_CAP) if limit else _RUN_CAP)
-    async with async_session() as session:
-        q = (
-            select(Document.id)
-            .where(
-                Document.material_type == "paper",
-                Document.file_format == "article",
-                or_(
-                    _ARXIV_ID_EXPR.isnot(None),
-                    Document.extract_meta[("paper", "oa_url")].astext.isnot(None),
-                ),
-            )
-            .order_by(Document.id.desc())
-            .limit(cap)
-        )
-        ids = [r[0] for r in (await session.execute(q)).all()]
-
-    promoted = failed = 0
-    for doc_id in ids:
-        async with async_session() as session:
-            doc = await session.get(Document, doc_id)
-            if doc is None or doc.file_format != "article":
-                continue
-            paper = (doc.extract_meta or {}).get("paper") or {}
-            arxiv_id = paper.get("arxiv_id")
-            oa_status = (paper.get("oa_status") or "").lower()
-            if arxiv_id:
-                url = _ARXIV_PDF.format(id=arxiv_id)
-                key = arxiv_id.replace("/", "_")
-            elif paper.get("oa_url") and oa_status in _REAL_OA:
-                url = paper["oa_url"]  # doi.org/KISTI/PMC (friendly OA). Non-OA/paywalled URLs are skipped by the header check
-                key = (paper.get("openalex_id") or paper.get("doi") or "oa").replace("/", "_")
-            else:
-                continue
-            rel_path = f"crawl_raw/papers/{key}.pdf"
-            dest = Path(settings.nas_mount_path) / rel_path
-            try:
-                size = await _download(url, dest)
-            except Exception as e:  # noqa: BLE001  # isolate download failures
-                logger.error(f"[promote] {key} 다운로드 실패: {e}")
-                failed += 1
-                continue
-            # In-place promotion: abstract row → full-text PDF row (file_hash and extract_meta.paper preserved)
-            doc.file_path = rel_path
-            doc.file_format = "pdf"
-            doc.file_type = "immutable"
-            doc.file_size = size
-            doc.md_status = "pending"  # rerun marker (clears the earlier 'skipped')
-            doc.md_extraction_error = None
-            await enqueue_stage(session, doc.id, "extract")
-            await session.commit()
-            promoted += 1
-            logger.info(f"[promote] {key} → 전문 PDF in-place (doc {doc.id}, {size}b)")
-
-    logger.info(f"[paper_fulltext_promote] 승격 {promoted} · 실패 {failed} (cap {cap})")
-
-
-if __name__ == "__main__":
-    parser = argparse.ArgumentParser(description="논문 arXiv 전문 승격 (in-place, keyless)")
-    parser.add_argument("--bulk", action="store_true", help="cap 해제(전건 백필 — GPU 부하 주의)")
-    parser.add_argument("--limit", type=int, default=0, help="승격 상한(0=기본 cap 10)")
-    args = parser.parse_args()
-    asyncio.run(run(bulk=args.bulk, limit=args.limit))
@@ -1,562 +0,0 @@
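-"""presegment_worker: before extract, split a bundle PDF (several logical documents in one file) into N children (G2 / PR-G2-2).
-
-Every document enters the presegment stage (worker-side gating):
-  - Non-PDF (file_format != pdf and suffix != .pdf) = immediate fast-exit → enqueue_next_stage passes it on to extract.
-  - PDF = deterministic analysis of the PyMuPDF ToC (level-1). Only a 'clear bundle' is split into children; everything else is extracted as a single document.
-
-Deterministic path (PR-G2-2): when the verdict is ambiguous, stay conservative and keep a single document
-(bias to NOT splitting). Split only a 'definite bundle':
-  - page_count >= MIN_BUNDLE_PAGES AND level-1 ToC entries >= 2 AND every child >= MIN_CHILD_PAGES
-    AND monotonically increasing, non-overlapping AND covers the full range [1, page_count] AND 2 <= N <= MAX_CHILDREN.
-
-LLM boundary fallback (PR-G2-3, env PRESEGMENT_LLM_FALLBACK, default OFF; scaffold-first): applies only to large PDFs
-for which the deterministic path could not produce a 'clear bundle' (no ToC / no level-1 entries / gate not met). When OFF,
-behavior is the same as today (single document); when ON, the off-card Qwen (MacBook, router :8890, model=qwen-macbook)
-is asked to propose boundaries. Only compact per-page heading samples are sent (no body text). LLM output is split through the same
-_create_children path as the deterministic one only if it passes the **same validation gate (_is_clear_bundle)**;
-is_bundle=false or a parse/validation failure = single document (same as today) + presegment_llm_rejected logged.
-MacBook unavailable (503/connection/truncation) is retried through the queue as StageDeferred (backoff, no silent fallback).
-
-On split, ★Candidate A (no physical split; resolves uq_documents_file_path): child file_path = a unique synthetic value
-`{parent path}#p{start}-{end}` (passes the UNIQUE constraint); the real file is recovered as the parent path via `bundle_source_path()`.
-A child points only at its own page range of the parent file through bundle_page_start/end (1-based inclusive).
-The source of truth for the parent-child relation is document_lineage (relation_type='segmented_from'). The parent (presegment_role='parent')
-is only a file holder and does no extract/embed of its own: enqueue_next_stage suppresses the presegment→extract transition
-for a 'parent' (see queue_consumer). This worker enqueues each child's extract directly. extract_worker/marker_worker
-access the real file through bundle_source_path() when processing a child.
-
-Idempotency: on a rerun, if children already exist for the same parent (document_lineage segmented_from), they are not recreated;
-the worker converges (only ensuring each child is in an active/completed extract state).
-
-★Resolution history (2026-06-18): the initial Option A (children share the parent's file_path as-is) violated the uq_documents_file_path
-UNIQUE constraint (found while validating a real bundle) → resolved with a synthetic file_path (Candidate A). Ingest is re-enabled only after the synthetic-bundle revalidation passes.
-
-plan: G2 pre-segmentation (PR-G2-2 deterministic ToC segmentation)
-"""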
-
-import hashlib
-import os
-import re
-import unicodedata
-from pathlib import Path
-
-from pydantic import BaseModel, ValidationError
-from sqlalchemy import select
-from sqlalchemy.ext.asyncio import AsyncSession
-
-from ai.client import AIClient, call_deep_or_defer, parse_json_response
-from core.config import settings
-from core.utils import setup_logger
-from models.document import Document
-from models.document_lineage import DocumentLineage
-from models.queue import enqueue_stage
-
-logger = setup_logger("presegment_worker")
-
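-# ─── Thresholds (module constants, env-overridable, conservative = biased toward not splitting) ───
-# MIN_BUNDLE_PAGES: below this, the file is not treated as a bundle (single document). Guards short documents with an incidental level-1 ToC.
-MIN_BUNDLE_PAGES = int(os.getenv("PRESEGMENT_MIN_BUNDLE_PAGES", "60"))
-# MIN_CHILD_PAGES: if any child falls below this, the split is rejected (prevents over-splitting off just a cover or table of contents).
-MIN_CHILD_PAGES = int(os.getenv("PRESEGMENT_MIN_CHILD_PAGES", "5"))
-# MAX_CHILDREN: upper bound on child count. Exceeding it means the ToC is at chapter/subheading level, not logical-document boundaries → split rejected.
-MAX_CHILDREN = int(os.getenv("PRESEGMENT_MAX_CHILDREN", "50"))
-
-# Same as marker_worker._to_marker_path: prefix that maps a NAS-relative path to a container absolute path.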
-CONTAINER_PATH_PREFIX = os.getenv("MARKER_CONTAINER_PATH_PREFIX", "/documents")
-
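-# ─── PR-G2-3 LLM boundary fallback (scaffold-first, default OFF) ───
-# PRESEGMENT_LLM_FALLBACK: defaults to "false". When OFF, only the deterministic path runs (same as today: when in doubt,
-# single document). When ON, only for large PDFs (page_count >= MIN_BUNDLE_PAGES) where the deterministic path
-# could not produce a 'clear bundle', the off-card Qwen (MacBook, via router :8890) is asked to propose boundaries, and the file is
-# split through the same child-creation path as the deterministic one only if the proposal passes the **same validation gate (_is_clear_bundle)**.
-# Validation failure / parse failure / is_bundle=false = single document (same as today) + presegment_llm_rejected logged.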
-PRESEGMENT_LLM_FALLBACK = os.getenv("PRESEGMENT_LLM_FALLBACK", "false").lower() in (
-    "1", "true", "yes", "on",
-)
-# Per-page char cap for the per-page samples sent to the LLM (heading/first line only; no body text).
-PRESEGMENT_LLM_PAGE_CHARS = int(os.getenv("PRESEGMENT_LLM_PAGE_CHARS", "80"))
-# Char cap for the whole page-sample block (a few-KB guard: truncated when exceeded, preventing body leakage and payload blow-up).
-PRESEGMENT_LLM_SAMPLE_CHARS = int(os.getenv("PRESEGMENT_LLM_SAMPLE_CHARS", "12000"))
-
-# Boundary fallback prompt (app/prompts/presegment_boundaries.txt). System instructions + rules: 1-based inclusive,
-# full-range coverage, no overlap. {page_count}/{page_samples} are injected with str.replace.
-_PRESEGMENT_PROMPT_PATH = Path(__file__).parent.parent / "prompts" / "presegment_boundaries.txt"
-
-
-class Segment(BaseModel):
-    """One 1-based inclusive page range proposed by the LLM."""
-
-    start_page: int
-    end_page: int
-    title: str | None = None
-
-
-class SegmentationOutput(BaseModel):
-    """Response schema for presegment_boundaries. parse_json_response → model_validate."""
-
-    is_bundle: bool = False
-    segments: list[Segment] = []
-    confidence: float | None = None
-
-
-def _resolve_path(file_path: str) -> Path | None:
-    """Absorb NFC (DB) vs NFD (NFS) differences in Hangul paths. Same pattern as thumbnail_worker._resolve_path."""
-    candidates = [
-        file_path,
-        unicodedata.normalize("NFD", file_path),
-        unicodedata.normalize("NFC", file_path),
-    ]
-    for c in candidates:
-        p = Path(c)
-        if p.exists():
-            return p
-    parent = Path(file_path).parent
-    if parent.exists():
-        target = unicodedata.normalize("NFC", Path(file_path).name)
-        for child in parent.iterdir():
-            if unicodedata.normalize("NFC", child.name) == target:
-                return child
-    return None
-
-
-def _to_container_path(file_path: str) -> str:
-    """Convert file_path to an absolute path inside the container (same as marker_worker._to_marker_path)."""
-    if file_path.startswith("/"):
-        return file_path
-    return f"{CONTAINER_PATH_PREFIX}/{file_path}"
-
-
-# Candidate A: synthetic child file_path pattern `{parent path}#p{start}-{end}` (keeps uq_documents_file_path unique).
-_BUNDLE_SUFFIX_RE = re.compile(r"#p\d+-\d+$")
-
-
-def bundle_source_path(file_path: str | None) -> str | None:
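-    """Recover the parent's real file path from a child's synthetic file_path. An ordinary doc (no suffix) is returned unchanged.
-
-    Used by extract_worker/marker_worker to reach the actual file when processing a child (a child's file_path is
-    synthetic and does not exist on disk). Deterministic, no session needed. The lineage is the canonical record of the parent-child relation.
-    """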
-    if not file_path:
-        return file_path
-    return _BUNDLE_SUFFIX_RE.sub("", file_path)
-
-
-def _is_pdf(doc: Document) -> bool:
-    """PDF check: file_format=pdf or a .pdf extension."""
-    fmt = (doc.file_format or "").lower()
-    if fmt == "pdf":
-        return True
-    if doc.file_path:
-        return Path(doc.file_path).suffix.lower() == ".pdf"
-    return False
-
-
-def _level1_segments(toc: list, page_count: int) -> list[dict]:
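-    """Build the candidate child segment list from only the level-1 entries of a get_toc(simple=True) result.
-
-    A toc entry is [level, title, page] (page is 1-based). Only level==1 is used.
-    end_page = the next level-1 entry's page - 1; for the last entry, page_count.
-    If several level-1 entries start on the same page, boundaries are computed between adjacent entries after sorting,
-    which yields a 0-page segment that the later validation (MIN_CHILD_PAGES, monotonicity) rejects.
-    """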
-    starts = []
-    for entry in toc:
-        # simple=True yields [level, title, page]. Defensive length check.
-        if not entry or len(entry) < 3:
-            continue
-        level, title, page = entry[0], entry[1], entry[2]
-        if level != 1:
-            continue
-        # A ToC page outside the range (0/negative/beyond page_count) means a broken ToC → rejected by the later validation.
-        starts.append((int(page), (title or "").strip()))
-
-    # The ToC may not be sorted, so sort by page (stable sort, preserving the original order).
-    starts.sort(key=lambda x: x[0])
-
-    segments: list[dict] = []
-    for i, (start_page, title) in enumerate(starts):
-        if i + 1 < len(starts):
-            end_page = starts[i + 1][0] - 1
-        else:
-            end_page = page_count
-        segments.append({"start_page": start_page, "end_page": end_page, "title": title})
-    return segments
-
-
-def _is_clear_bundle(segments: list[dict], page_count: int) -> tuple[bool, str]:
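-    """Deterministic 'clear bundle' check. Returns (clear, reason).
-
-    If clear=True, reason=""; if clear=False, reason is the rejection cause (for logging).
-    Every condition is conservative: if any one fails, the file is handled as a single document (no split).
-    """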
-    n = len(segments)
-    if n < 2:
-        return False, f"too_few_level1_entries(n={n})"
-    if n > MAX_CHILDREN:
-        return False, f"too_many_children(n={n}>{MAX_CHILDREN})"
-
-    # First segment starts on page 1 and the last ends at page_count = full-range coverage.
-    if segments[0]["start_page"] != 1:
-        return False, f"first_start_not_1(start={segments[0]['start_page']})"
-    if segments[-1]["end_page"] != page_count:
-        return False, f"last_end_not_page_count(end={segments[-1]['end_page']},pc={page_count})"
-
-    prev_end = 0
-    for seg in segments:
-        start, end = seg["start_page"], seg["end_page"]
-        # Monotonic, non-overlapping: each start must equal the previous end + 1 so that [1, pc] is partitioned exactly, with no gaps or overlaps.
-        if start != prev_end + 1:
-            return False, f"non_contiguous(start={start},prev_end={prev_end})"
-        if end < start:
-            return False, f"non_monotonic(start={start},end={end})"
-        if (end - start + 1) < MIN_CHILD_PAGES:
-            return False, f"child_too_small(pages={end - start + 1}<{MIN_CHILD_PAGES})"
-        prev_end = end
-
-    if prev_end != page_count:
-        return False, f"coverage_gap(covered={prev_end},pc={page_count})"
-
-    return True, ""
-
-
-def _child_title(parent: Document, seg: dict) -> str:
-    """Child title = parent title + ' — ' + (segment title or page range)."""
-    base = (parent.title or "").strip() or (parent.original_filename or "") or "문서"
-    seg_title = (seg.get("title") or "").strip()
-    suffix = seg_title if seg_title else f"p.{seg['start_page']}-{seg['end_page']}"
-    return f"{base} — {suffix}"
-
-
-def _child_file_hash(parent_hash: str, start: int, end: int) -> str:
-    """Child file_hash = sha256(f"{parent.file_hash}:{start}-{end}"). Deterministic → idempotent across reruns.
-
-    The parent file_hash cannot be NULL (NOT NULL), but an empty string is handled defensively.
-    """
-    return hashlib.sha256(f"{parent_hash or ''}:{start}-{end}".encode("utf-8")).hexdigest()
-
-
-async def _ensure_child_extract(session: AsyncSession, child_id: int) -> None:
-    """Enqueue extract for a child that has not been extracted yet (idempotent convergence path).
-
-    If extracted_text is already filled or an active queue row exists, enqueue_stage is a no-op/skip.
-    """
-    child = await session.get(Document, child_id)
-    if child is None:
-        return
-    # Already extracted, so no re-enqueue is needed (enqueue_stage blocks queue duplicates, but skipping is also semantically right).
-    if child.extracted_at is not None and child.extracted_text is not None:
-        return
-    await enqueue_stage(session, child_id, "extract")
-
-
-async def _create_children(
-    doc: Document, segments: list[dict], session: AsyncSession
-) -> int:
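-    """Create N children from validated segments + lineage + extract enqueue + parent marking (idempotent).
-
-    The single child-creation path shared by the deterministic 'clear bundle' path and the LLM fallback path.
-    Before the call, segments must have passed _is_clear_bundle validation (not revalidated here).
-    Commits as well. Return value = number of children actually created (0 if they already existed and the call only converged).
-    """
-    # ─── Idempotency check: if children already exist, only converge (never recreate) ───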
-    existing_children = (
-        await session.execute(
-            select(DocumentLineage.derived_document_id).where(
-                DocumentLineage.source_document_id == doc.id,
-                DocumentLineage.relation_type == "segmented_from",
-            )
-        )
-    ).scalars().all()
-
-    if existing_children:
-        # Repair a missing parent marker (recovery from an earlier partial failure).
-        if doc.presegment_role != "parent":
-            doc.presegment_role = "parent"
-        for child_id in existing_children:
-            await _ensure_child_extract(session, child_id)
-        await session.commit()
-        logger.info(
-            f"[presegment] id={doc.id} children already exist "
-            f"(n={len(existing_children)}) → converge(ensure extract), no re-create"
-        )
-        return 0
-
-    # ─── Create N children + lineage + extract enqueue ───
-    created_ids: list[int] = []
-    for seg in segments:
-        start, end = seg["start_page"], seg["end_page"]
-        child = Document(
-            # Candidate A: child file_path = unique synthetic value `{parent_path}#p{s}-{e}` (avoids a
-            # uq_documents_file_path conflict). The real file is recovered via bundle_source_path() (parent path).
-            # No physical split: the child slices the parent file by bundle_page_start/end.
-            file_path=f"{doc.file_path}#p{start}-{end}",
-            file_hash=_child_file_hash(doc.file_hash, start, end),
-            file_format=doc.file_format,
-            file_size=doc.file_size,
-            file_type=doc.file_type,
-            import_source=doc.import_source,
-            original_filename=doc.original_filename,
-            source_channel=doc.source_channel,
-            category=doc.category,
-            data_origin=doc.data_origin,
-            doc_purpose=doc.doc_purpose,
-            # The safety-library axes are inherited from the parent (splitting does not change material type or jurisdiction).
-            material_type=doc.material_type,
-            jurisdiction=doc.jurisdiction,
-            title=_child_title(doc, seg),
-            bundle_page_start=start,
-            bundle_page_end=end,
-            presegment_role="child",
-        )
-        session.add(child)
-        await session.flush()  # obtain child.id
-        created_ids.append(child.id)
-
-        session.add(
-            DocumentLineage(
-                source_document_id=doc.id,
-                derived_document_id=child.id,
-                relation_type="segmented_from",
-                meta={"start_page": start, "end_page": end},
-            )
-        )
-        # The worker enqueues the child extract directly (the parent has role 'parent', so it never flows to extract).
-        await enqueue_stage(session, child.id, "extract")
-
-    # Parent = file holder. enqueue_next_stage suppresses the presegment→extract transition when the role is 'parent'.
-    doc.presegment_role = "parent"
-    await session.commit()
-
-    logger.info(
-        f"[presegment] id={doc.id} SPLIT into {len(created_ids)} children "
-        f"child_ids={created_ids}"
-    )
-    return len(created_ids)
-
-
-def _segments_from_output(out: "SegmentationOutput") -> list[dict]:
-    """SegmentationOutput.segments (Pydantic) → the dict shape used by _is_clear_bundle / _create_children."""
-    return [
-        {"start_page": s.start_page, "end_page": s.end_page, "title": (s.title or "")}
-        for s in out.segments
-    ]
-
-
-def _page_samples(pdf, page_count: int) -> str:
-    """Compact per-page samples for LLM input: heading/first line only per page (`p{n}: {firstline}`).
-
-    Streams per-page text via PyMuPDF page.get_text(), keeping only the first non-blank line of each page,
-    truncated to PRESEGMENT_LLM_PAGE_CHARS to prevent body-text leakage. The whole block is capped by the
-    PRESEGMENT_LLM_SAMPLE_CHARS guard (a few KB); on overflow it stops at that point (earlier pages are kept first).
-    """
-    lines: list[str] = []
-    total = 0
-    for i in range(page_count):
-        try:
-            text = pdf[i].get_text() or ""
-        except Exception:
-            text = ""
-        first = ""
-        for ln in text.splitlines():
-            ln = ln.strip()
-            if ln:
-                first = ln
-                break
-        first = first[:PRESEGMENT_LLM_PAGE_CHARS]
-        entry = f"p{i + 1}: {first}"
-        if total + len(entry) + 1 > PRESEGMENT_LLM_SAMPLE_CHARS:
-            break
-        lines.append(entry)
-        total += len(entry) + 1
-    return "\n".join(lines)
-
-
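-async def _llm_boundary_fallback(
-    doc: Document, source: Path, page_count: int, session: AsyncSession
-) -> bool:
-    """For ambiguous, large (e.g. ToC-less) PDFs: propose boundaries with off-card Qwen → validate → split.
-
-    Returns True = the LLM path performed the split (or converged idempotently), so the caller returns with no further work.
-    Returns False = is_bundle=false / parse failure / validation failure → the caller treats it as a single document (same as today).
-    If the MacBook is unavailable (503/connection/truncation), call_deep_or_defer raises StageDeferred → queue retry (backoff).
-    No silent fallback: no backend other than the deep slot is called automatically.
-    """
-    import fitz  # PyMuPDF, same dependency as the deterministic path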
-
-    # Per-page samples reopen the file and stream it (kept separate from the deterministic with-block so that path cannot regress).
-    try:
-        with fitz.open(str(source)) as pdf:
-            samples = _page_samples(pdf, page_count)
-    except Exception as exc:
-        logger.warning(
-            f"[presegment] id={doc.id} llm fallback sample 실패 "
-            f"({type(exc).__name__}: {exc}) → single doc(extract)"
-        )
-        return False
-
-    try:
-        template = _PRESEGMENT_PROMPT_PATH.read_text(encoding="utf-8")
-    except Exception as exc:
-        logger.warning(
-            f"[presegment] id={doc.id} prompt 로드 실패 ({type(exc).__name__}: {exc}) "
-            f"→ single doc(extract)"
-        )
-        return False
-
-    prompt = template.replace("{page_count}", str(page_count)).replace(
-        "{page_samples}", samples
-    )
-
-    # Off-card call: call_deep_or_defer routes to the deep slot (MacBook, router :8890, model=qwen-macbook).
-    # MacBook unavailability propagates as StageDeferred (not caught here → the queue holds the item and backs off).
-    # Instantiates AIClient() the same way classify_worker does.
-    client = AIClient()
-    try:
-        raw = await call_deep_or_defer(client, prompt)
-    finally:
-        await client.close()
-
-    parsed = parse_json_response(raw)
-    if not parsed:
-        logger.info(
-            f"[presegment] presegment_llm_rejected id={doc.id} "
-            f"reason=parse_failed raw={raw[:160]!r} → single doc(extract)"
-        )
-        return False
-
-    try:
-        out = SegmentationOutput.model_validate(parsed)
-    except (ValidationError, ValueError, TypeError) as exc:
-        logger.info(
-            f"[presegment] presegment_llm_rejected id={doc.id} "
-            f"reason=schema_invalid({type(exc).__name__}) → single doc(extract)"
-        )
-        return False
-
-    if not out.is_bundle:
-        logger.info(
-            f"[presegment] presegment_llm_rejected id={doc.id} "
-            f"reason=is_bundle_false → single doc(extract)"
-        )
-        return False
-
-    segments = _segments_from_output(out)
-    clear, reason = _is_clear_bundle(segments, page_count)
-    if not clear:
-        # Do not trust LLM output as-is: if it fails the same gate as the deterministic path, treat it as a single document.
-        logger.info(
-            f"[presegment] presegment_llm_rejected id={doc.id} "
-            f"reason={reason} n={len(segments)} pages={page_count} → single doc(extract)"
-        )
-        return False
-
-    n = await _create_children(doc, segments, session)
-    logger.info(
-        f"[presegment] id={doc.id} LLM-SPLIT accepted "
-        f"(pages={page_count} n={len(segments)} created={n} "
-        f"confidence={out.confidence})"
-    )
-    return True
-
-
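-async def process(document_id: int, session: AsyncSession) -> None:
-    """Entry point of the presegment stage worker. Called by queue_consumer.
-
-    Every document enters; non-PDF and single documents pass through unchanged (presegment_role stays NULL) and flow to extract.
-    Only 'clear bundle' PDFs are split into children, with the parent marked 'parent' (in that case the parent does not flow to extract).
-    """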
-    doc = await session.get(Document, document_id)
-    if doc is None:
-        logger.warning(f"[presegment] document {document_id} not found")
-        return
-
-    # ─── (0) Non-PDF: fast-exit. presegment_role stays NULL → enqueue_next_stage routes to extract ───
-    if not _is_pdf(doc):
-        logger.info(f"[presegment] id={document_id} non-pdf (fmt={doc.file_format}) → extract")
-        return
-
-    # ─── (0.5) No file_path (e.g. note): cannot split, pass through as a single document ───
-    if not doc.file_path:
-        logger.info(f"[presegment] id={document_id} no file_path → extract")
-        return
-
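-    # ─── (1) A document already marked 'child' or 'parent' re-entered presegment: never re-split ───
-    # (In the normal flow children are enqueued straight to extract, but they can re-enter via reprocessing scripts and the like.)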
-    if doc.presegment_role in ("child", "parent"):
-        logger.info(
-            f"[presegment] id={document_id} already presegment_role={doc.presegment_role} → skip"
-        )
-        return
-
-    # ─── (2) Open the file + read page_count ───
-    raw = str(Path(settings.nas_mount_path) / doc.file_path)
-    source = _resolve_path(raw)
-    if source is None:
-        # A missing file is something extract handles as FileNotFoundError in the same situation.
-        # presegment merely cannot split, so pass it through as a single document and let extract handle it consistently.
-        logger.warning(f"[presegment] id={document_id} file not found ({raw}) → extract")
-        return
-
-    import fitz  # PyMuPDF, same dependency as extract_worker/marker_worker
-
-    try:
-        with fitz.open(str(source)) as pdf:
-            page_count = pdf.page_count
-            toc = pdf.get_toc(simple=True) or []
-    except Exception as exc:
-        # Corrupted PDF etc.: cannot split. Pass through as a single document (extract retries with PyMuPDF/OCR and surfaces the failure).
-        logger.warning(
-            f"[presegment] id={document_id} fitz open/toc failed "
-            f"({type(exc).__name__}: {exc}) → extract"
-        )
-        return
-
-    # ─── (3) page_count below the threshold = single document (the majority path) ───
-    if page_count < MIN_BUNDLE_PAGES:
-        logger.info(
-            f"[presegment] id={document_id} single doc "
-            f"(pages={page_count}<{MIN_BUNDLE_PAGES}) → extract"
-        )
-        return
-
-    # ─── (4) level-1 ToC → candidate child segments ───
-    segments = _level1_segments(toc, page_count)
-
-    if not segments:
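-        # Large PDF with no ToC or no level-1 entries = ambiguous. With the flag ON, use the LLM boundary fallback (PR-G2-3);
-        # with it OFF (default), behave as today: treat it as a single document and log the reason.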
-        if PRESEGMENT_LLM_FALLBACK:
-            logger.info(
-                f"[presegment] presegment_ambiguous id={document_id} "
-                f"reason=no_level1_toc pages={page_count} → LLM fallback"
-            )
-            if await _llm_boundary_fallback(doc, source, page_count, session):
-                return
-            # The LLM did not split (is_bundle=false / validation or parse failure): single document.
-            return
-        logger.info(
-            f"[presegment] presegment_ambiguous id={document_id} "
-            f"reason=no_level1_toc pages={page_count} → single doc(extract)"
-        )
-        return
-
-    clear, reason = _is_clear_bundle(segments, page_count)
-    if not clear:
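-        # Large PDF with a ToC that falls short of the 'clear bundle' criteria = ambiguous. With the flag ON, use the LLM boundary fallback;
-        # with it OFF (default), behave as today: single document (no split).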
-        if PRESEGMENT_LLM_FALLBACK:
-            logger.info(
-                f"[presegment] presegment_ambiguous id={document_id} "
-                f"reason={reason} pages={page_count} level1={len(segments)} → LLM fallback"
-            )
-            if await _llm_boundary_fallback(doc, source, page_count, session):
-                return
-            return
-        logger.info(
-            f"[presegment] presegment_ambiguous id={document_id} "
-            f"reason={reason} pages={page_count} level1={len(segments)} → single doc(extract)"
-        )
-        return
-
-    # ─── (5) Clear bundle (deterministic): shared child-creation path (includes idempotent convergence) ───
-    await _create_children(doc, segments, session)
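The removed presegment worker relies on two deterministic values to make re-runs idempotent: a child `file_hash` derived from the parent hash plus the page range, and a synthetic `file_path` of the form `{parent_path}#p{start}-{end}`. A minimal standalone sketch of that derivation follows. The function names mirror the diff without the leading underscore, and the sample parent hash and path are hypothetical values chosen for illustration only.

```python
import hashlib
from typing import Optional


def child_file_hash(parent_hash: Optional[str], start: int, end: int) -> str:
    """sha256 over "<parent_hash>:<start>-<end>"; identical inputs always give the same hash."""
    # A missing parent hash is treated as an empty string, as in the diff.
    key = f"{parent_hash or ''}:{start}-{end}"
    return hashlib.sha256(key.encode("utf-8")).hexdigest()


def child_file_path(parent_path: str, start: int, end: int) -> str:
    """Synthetic, unique path of the form "<parent_path>#p<start>-<end>"."""
    return f"{parent_path}#p{start}-{end}"


if __name__ == "__main__":
    # Hypothetical parent values, for illustration only.
    first = child_file_hash("abc123", 1, 10)
    again = child_file_hash("abc123", 1, 10)
    other = child_file_hash("abc123", 11, 20)
    print(first == again)  # same range: same hash
    print(first != other)  # different range: different hash
    print(child_file_path("PKM/bundle.pdf", 1, 10))
```

Because both values depend only on the parent and the page range, running the split twice yields the same child identity, which is what lets a unique constraint or an existence check reject duplicates.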
@@ -13,27 +13,17 @@ from sqlalchemy import select, update, delete, exists
 from sqlalchemy.exc import IntegrityError, SQLAlchemyError
 from sqlalchemy.orm import aliased

-from core.config import settings
 from core.database import async_session
 from core.utils import setup_logger
-from models.queue import ProcessingQueue, StageDeferred, enqueue_stage, not_deferred_condition
+from models.queue import ProcessingQueue, enqueue_stage

 logger = setup_logger("queue_consumer")

-# Log the pipeline.held_stages notice only once, not on every 1-minute cycle.
-_hold_logged = False
-
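 # Batch size per stage
 # stt occupies the GPU exclusively and a meeting can run 30 minutes → batch 1. thumbnail is a light ffmpeg subprocess.
 # deep_summary (PR-B B-1) goes through a single MLX 26B Semaphore(1) → batch 1.
-# fulltext: the politeness delay (5–15s for the same domain) applies serially within a batch. With batch 3 the
-# same-domain worst case is ~45s/cycle, which the main queue's 1m interval (max_instances=1, coalesce) absorbs.
-# embed/chunk 1→10 (2026-06-12 fast-consumer): measured <1s per item. The conservative Phase 0.1 default was
-# held hostage by the LLM cycle, giving an effective ~580/day against demand of up to 2,700/day, which caused the backlog.
-# 10 = conservative increase that accounts for sharing the GPU with TEI/marker (cap ~14,400/day for a dedicated 1-minute job).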
-BATCH_SIZE = {"presegment": 3, "extract": 5, "classify": 3, "summarize": 3, "embed": 10,
-              "chunk": 10, "preview": 2, "stt": 1, "thumbnail": 3, "deep_summary": 1,
-              "markdown": 1, "fulltext": 3}
+BATCH_SIZE = {"extract": 5, "classify": 3, "summarize": 3, "embed": 1, "chunk": 1,
+              "preview": 2, "stt": 1, "thumbnail": 3, "deep_summary": 1, "markdown": 1}
 STALE_THRESHOLD_MINUTES = 10
 # A large markdown split conversion keeps one doc in the processing state for tens of minutes (5210 ≈ 40 min).
 # marker_worker does not write a heartbeat to the queue row (started_at is fixed), so main's 10-minute
@@ -41,26 +31,14 @@ STALE_THRESHOLD_MINUTES = 10
 # The markdown consumer therefore uses its own generous threshold.
 MARKDOWN_STALE_THRESHOLD_MINUTES = int(os.getenv("MARKDOWN_STALE_MINUTES", "120"))

-# Stages handled by consume_queue (main). markdown is split out to consume_markdown_queue,
-# embed/chunk to consume_fast_queue (2026-06-12). The three sets are disjoint
-# (reset_stale_items resets only its own set; overlap risks double recovery).
+# Stages handled by consume_queue (main). markdown is split out to consume_markdown_queue.
 # STT can also be long-running, but that is outside this PR's scope; it stays in main (follow-up).
 MAIN_QUEUE_STAGES = [
-    "presegment", "extract", "classify", "summarize",
-    "preview", "stt", "thumbnail", "fulltext",
+    "extract", "classify", "summarize", "embed", "chunk",
+    "preview", "stt", "thumbnail", "deep_summary",
 ]
 MARKDOWN_QUEUE_STAGES = ["markdown"]

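-# 2026-06-15: deep_summary (26B, 70~300s per call) split out of the main loop (following the markdown/fast precedent).
-# Removes the problem where a single deep call exceeded the 1-minute tick, the main consume_queue was permanently
-# coalesced, and even lightweight stages such as extract/classify starved. The sets are disjoint (each resets only its own stale items).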
-DEEP_QUEUE_STAGES = ["deep_summary"]
-
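-# Fast (non-LLM, light-GPU) stages: split out of the LLM cycle (minutes long) and consumed by a dedicated 1-minute job.
-# embed/chunk take <1s per item, so in the main loop they starve behind classify (~190s×3)
-# (measured 2026-06-12: backlog 3,570, 4070 utilization 0%). Same pattern as the markdown split (05-01).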
-FAST_QUEUE_STAGES = ["embed", "chunk"]
-

 async def reset_stale_items(stages, threshold_minutes=STALE_THRESHOLD_MINUTES):
    """Recover items left in the processing state for too long (given stages only)
@@ -159,16 +137,9 @@ async def enqueue_next_stage(document_id: int, current_stage: str):
    # source_channel-aware override (extract stage only). Falls back to _default when source_channel is missing.
    extract_override_by_channel = {
        "devonagent": ["embed", "chunk"],
-        # crawl-channel file types (KOSHA attachments, GUIDE PDFs, etc.): skip the preview pre-cache so that
-        # a bulk backfill of the material corpus does not take over the preview queue. classify → embed/chunk/markdown is kept.
-        "crawl": ["classify"],
    }

    next_stages = {
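-        # G2 (PR-G2-2): every document goes presegment → extract. However, a document that became 'parent' through a bundle split
-        # is a file holder and does not extract itself; the suppression below skips this transition (child extract is
-        # enqueued directly by presegment_worker). Single/non-PDF documents (role NULL) flow to extract normally.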
-        "presegment": ["extract"],
        "extract": ["classify", "preview"],
        "classify": ["embed", "chunk", "markdown"],
        "stt": ["classify"],
@@ -184,18 +155,6 @@ async def enqueue_next_stage(document_id: int, current_stage: str):
            stages = extract_override_by_channel[sc]
        else:
            stages = next_stages.get(current_stage, [])
-    elif current_stage == "presegment":
-        # Keep a bundle-split parent from flowing to extract (prevents the children from being shadowed by the parent's extract).
-        # role NULL (single/non-PDF) and 'child' transition normally. presegment_worker enqueues the child extract
-        # directly, so only 'parent' is a no-op here.
-        from models.document import Document
-        async with async_session() as lookup_session:
-            doc = await lookup_session.get(Document, document_id)
-            role = doc.presegment_role if doc else None
-        if role == "parent":
-            stages = []
-        else:
-            stages = next_stages.get(current_stage, [])
    else:
        stages = next_stages.get(current_stage, [])

@@ -215,17 +174,13 @@ def _load_workers():
    from workers.deep_summary_worker import process as deep_summary_process
    from workers.embed_worker import process as embed_process
    from workers.extract_worker import process as extract_process
-    from workers.presegment_worker import process as presegment_process
    from workers.preview_worker import process as preview_process
    from workers.stt_worker import process as stt_process
    from workers.summarize_worker import process as summarize_process
    from workers.thumbnail_worker import process as thumbnail_process
    from workers.marker_worker import process as marker_process
-    from workers.fulltext_worker import process as fulltext_process

    return {
-        # G2 (PR-G2-2): before extract, split a bundle PDF → N children (deterministic ToC). Non-PDF/single documents pass through.
-        "presegment": presegment_process,
        "extract": extract_process,
        "classify": classify_process,
        "summarize": summarize_process,
@@ -240,9 +195,6 @@ def _load_workers():
        # Phase 1B: enqueued after classify completes. PDF→markdown conversion (leaf, independent of embed/chunk).
        # Handled exclusively by consume_markdown_queue (so large split conversions do not block the main pipeline).
        "markdown": marker_process,
-        # crawl-24x7 A-2: fetch the article page → promote the body through 4 tiers. Follow-up stages (summarize/embed/chunk) are
-        # enqueued directly by the worker, so they are not registered in the next_stages dict (enqueue_next_stage is a no-op).
-        "fulltext": fulltext_process,
    }


@@ -254,14 +206,13 @@ async def _process_stage(stage, worker_fn):
    """
    batch_size = BATCH_SIZE.get(stage, 3)

-    # Fetch pending items (excluding items whose hold backoff deferred_until is still in the future; ds-macbook-offload-1)
+    # Fetch pending items
    async with async_session() as session:
        result = await session.execute(
            select(ProcessingQueue.id, ProcessingQueue.document_id)
            .where(
                ProcessingQueue.stage == stage,
                ProcessingQueue.status == "pending",
-                not_deferred_condition(),
            )
            .order_by(ProcessingQueue.created_at)
            .limit(batch_size)
@@ -294,15 +245,7 @@ async def _process_stage(stage, worker_fn):
                                item.status = "completed"
                                item.completed_at = datetime.now(timezone.utc)
                                await skip_session.commit()
-                        # Enqueue after the completion commit, isolated so that a failure does not propagate to the
-                        # outer except and reopen the completed item (R3, same handling as the normal completion path).
-                        try:
-                            await enqueue_next_stage(document_id, stage)
-                        except Exception as enq_err:
-                            logger.error(
-                                f"[{stage}] document_id={document_id} skip(note) 완료됐으나 "
-                                f"다음 단계 enqueue 실패: {enq_err}"
-                            )
+                        await enqueue_next_stage(document_id, stage)
                        logger.info(f"[{stage}] document_id={document_id} skip (note)")
                        continue

@@ -320,37 +263,9 @@ async def _process_stage(stage, worker_fn):
                item.completed_at = datetime.now(timezone.utc)
                await session.commit()

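-            # The completion is already committed. If an enqueue_next_stage failure propagated to the outer except,
-            # it would reopen the completed item (pending/failed) and re-run the same stage = duplicated expensive work
-            # + partial rewrites. Isolate it in its own try and surface it as ERROR (R3).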
-            try:
-                await enqueue_next_stage(document_id, stage)
-            except Exception as enq_err:
-                logger.error(
-                    f"[{stage}] document_id={document_id} 완료됐으나 다음 단계 enqueue 실패: {enq_err}"
-                )
+            await enqueue_next_stage(document_id, stage)
            logger.info(f"[{stage}] document_id={document_id} 완료")

-        except StageDeferred as defer:
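-            # Deferred (ds-macbook-offload-1): the MacBook is temporarily unavailable (sleep/cold/editor_busy); this is not a failure.
-            # attempts gives back the increment taken at claim time (not consumed), and the item resumes on its own after the deferred_until backoff.
-            # The worker writes nothing to the doc before it finishes, so data changes at this point = 0 (sleep-safe).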
-            async with async_session() as session:
-                item = await session.get(ProcessingQueue, queue_id)
-                if not item:
-                    logger.warning(f"[{stage}] queue_id={queue_id} 없음 (삭제됨?), skip")
-                    continue
-                item.status = "pending"
-                item.started_at = None
-                item.attempts = max(0, item.attempts - 1)
-                until = datetime.now(timezone.utc) + timedelta(minutes=defer.retry_after_minutes)
-                item.payload = {**(item.payload or {}), "deferred_until": until.isoformat()}
-                await session.commit()
-            logger.info(
-                f"[{stage}] document_id={document_id} 보류({defer}) — "
-                f"{defer.retry_after_minutes}분 후 재개"
-            )
-
        except Exception as e:
            # Failure handling
            async with async_session() as session:
@@ -389,43 +304,14 @@ async def _process_stage(stage, worker_fn):

 async def consume_queue():
    """Main queue consumer: processes every stage except markdown at a 1-minute interval."""
-    global _hold_logged
    workers = _load_workers()

-    held = [s for s in MAIN_QUEUE_STAGES if s in settings.pipeline_held_stages]
-    if held and not _hold_logged:
-        logger.info(f"pipeline.held_stages 보류 중: {held} — claim 하지 않음 (pending 적체 = 의도)")
-        _hold_logged = True
-
    try:
        await reset_stale_items(MAIN_QUEUE_STAGES, STALE_THRESHOLD_MINUTES)
    except Exception:
        logger.exception("stale reset failed, but continuing queue consumption")

    for stage in MAIN_QUEUE_STAGES:
-        if stage in settings.pipeline_held_stages:
-            continue
-        await _process_stage(stage, workers[stage])
-
-
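-async def consume_fast_queue():
-    """Dedicated fast consumer for embed/chunk, fully decoupled from the LLM cycle (2026-06-12).
-
-    In the main loop, classify/summarize/deep occupied each cycle for minutes, so embed/chunk items that take
-    <1s each got only one turn per cycle (effective ~60 items/hour = the cause of the backlog).
-    After the split = 1-minute job × batch 10 → cap ~600 items/hour. Because APScheduler max_instances=1,
-    the next fire is coalesced if a batch exceeds 1 minute (prevents runaway).
-    """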
-    workers = _load_workers()
-
-    try:
-        await reset_stale_items(FAST_QUEUE_STAGES, STALE_THRESHOLD_MINUTES)
-    except Exception:
-        logger.exception("fast stale reset failed, but continuing queue consumption")
-
-    for stage in FAST_QUEUE_STAGES:
-        if stage in settings.pipeline_held_stages:
-            continue
        await _process_stage(stage, workers[stage])


@@ -445,24 +331,3 @@ async def consume_markdown_queue():

    for stage in MARKDOWN_QUEUE_STAGES:
        await _process_stage(stage, workers[stage])
-
-
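-async def consume_deep_queue():
-    """Dedicated queue consumer for deep_summary (2026-06-15): separates the 26B deep summary from the main pipeline.
-
-    One deep_summary call takes 70~300s (Mac mini Qwen 27B fallback), so inside the main consume_queue (1-minute tick)
-    every tick exceeded the interval, was permanently coalesced with "maximum running instances", and
-    lightweight stages such as extract/classify starved along with it. After the split = only deep coalesces in its own
-    1-minute job, and the rest of the main loop finishes within the tick. max_instances=1 prevents two concurrent deep runs.
-    """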
-    workers = _load_workers()
-
-    try:
-        await reset_stale_items(DEEP_QUEUE_STAGES, STALE_THRESHOLD_MINUTES)
-    except Exception:
-        logger.exception("deep stale reset failed, but continuing queue consumption")
-
-    for stage in DEEP_QUEUE_STAGES:
-        if stage in settings.pipeline_held_stages:
-            continue
-        await _process_stage(stage, workers[stage])
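The stage routing that this diff removes from `enqueue_next_stage` can be summarized as a pure function: a per-channel override applies only to the `extract` stage, a `presegment` document whose role is `parent` enqueues nothing, and everything else falls back to the static table. The sketch below is a simplified reading of the diff, not the project's implementation. `resolve_next_stages` is a hypothetical name, the tables contain only the entries visible in the hunks above, and the real function looks up `source_channel` and `presegment_role` from the database.

```python
from typing import Optional

# Only the entries visible in the diff; the real table has more stages.
NEXT_STAGES = {
    "presegment": ["extract"],
    "extract": ["classify", "preview"],
    "classify": ["embed", "chunk", "markdown"],
    "stt": ["classify"],
}

# Per-channel override, consulted for the extract stage only.
EXTRACT_OVERRIDE_BY_CHANNEL = {
    "devonagent": ["embed", "chunk"],
    "crawl": ["classify"],
}


def resolve_next_stages(
    current_stage: str,
    source_channel: Optional[str] = None,
    presegment_role: Optional[str] = None,
) -> list[str]:
    """Return the stages to enqueue after current_stage completes."""
    if current_stage == "extract" and source_channel in EXTRACT_OVERRIDE_BY_CHANNEL:
        return list(EXTRACT_OVERRIDE_BY_CHANNEL[source_channel])
    if current_stage == "presegment" and presegment_role == "parent":
        # A bundle parent is only a file holder; its children were enqueued by the worker.
        return []
    return list(NEXT_STAGES.get(current_stage, []))


if __name__ == "__main__":
    print(resolve_next_stages("presegment", presegment_role="parent"))
    print(resolve_next_stages("extract", source_channel="crawl"))
    print(resolve_next_stages("extract"))
```

Keeping the decision free of I/O makes the suppression rule easy to unit-test; the database lookup for the role can stay in a thin async wrapper.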
Author	SHA1	Message	Date
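hyungi	0cbb188375	feat(ui): DS web UI sage redesign, 5 pages deployed together (dashboard, documents, news, digest, memo). Goals 1~4 + digest. Incorporates the defects confirmed by adversarial review (15-agent). frontend-only. Source commit = gitea feat/ui-sage-all `a1a46f2`. eid migrations 301~305 are isolated in the working tree (mitigation) and untouched. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-07 11:12:17 +00:00
hyungi	3a865978af	feat(ui): home dashboard redesigned as a daily-home cockpit (option 1 skeleton + option 2 widgets + option 3 distribution). Content restructured into the recommended composite of the confirmed draft dashboard-sage-3 (option 1 daily-home skeleton + option 2 review/pipeline widgets + option 3 one-line domain distribution). Layout overhaul on top of the F1 sage theme. - Greeting header + today's summary strip (pending review + digest top + stats strip) - 2 columns: left (quick capture, activity timeline) / right (study, domain distribution + pipeline chips, pinned) - digest/domain distribution wire to existing endpoints (0 backend changes); the study streak degrades to a link. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-07 07:57:59 +00:00
hyungi	4001949d78	fix(study): reset the selection when switching review-box tabs (independent selection per tab). Review finding: selected persisted across tab changes → reset selected={} in setTab so each tab has its own selection. (Selected review was already based on the current tab's shown list, so there was no data contamination; this is a UX consistency improvement.) Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-07 07:22:36 +00:00
hyungi	af07fb427b	feat(study): review box (B4 v1): two tabs (Today / Unseen) + multi-select selected review. /study/review-box: splits GET /study-cards/due(review_stage) into 2 tabs (Today = has review_stage / Unseen = new cards with review_stage null). Card multi-select → passes the selection to the cards-study review session via the pendingReviewCards store (no backend session = no fastapi rebuild during eid contention). Also 'review this whole tab'. The Done tab is disabled ('later') because it needs a graduated-cards endpoint. Review-box entry card added to the hub. - New store /stores/studySession.ts (pendingReviewCards), consumed by cards-study startReview. All frontend-only. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-07 07:18:13 +00:00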
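hyungi	50b070fcd3	polish(study): refine the flashcard study copy + emphasize the 'This card looks wrong' button. Applies the agreed draft copy: 'Tap to see the answer' / 'Seen it · Next' / 'You finished today's review' / 'Cards you were unsure of or did not know come back tomorrow' / 'Back to study' / 'Front: recall' / rating sublabels 'again tomorrow' and 'in N days'. Keyboard hints (Space, Enter) are sm:inline (desktop only). Flag button = faint text → bordered chip (warning color on hover). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-07 07:07:02 +00:00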
hyungi	56d1caf1c8	fix(study): render markdown + math on inspection-box cards (evidence/front/answer). The cards-review view mode printed cue/cloze/fact/evidence as plain text, exposing tables, bold, and math as raw markup. Applies renderMathMarkdown (evidence block) and renderMathMarkdownInline (front, answer), the same as cards-study. The edit-mode textarea stays raw. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-07 06:43:10 +00:00
hyungi	c0772bc21c	fix(study): inspection-box each_key_duplicate crash: resolve duplicate null keys for material (manual) groups. Manual card groups have source_question_id=null, so with 2+ materials the {#each ... (g.source_question_id)} keys collide → Svelte each_key_duplicate crash. The key is changed to the unique value (source_question_id ?? question_text). In addition, approve-batch requires source_question_id:int, so material groups got a 422 → the batch-approve button is shown only for question groups. Individual approve/edit/delete is cardId-based, so materials work normally. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-07 06:39:16 +00:00
hyungi	a812d93c64	Merge remote-tracking branch 'origin/feat/study-memo-card-p1' into feat/email-pkm-folder	2026-06-07 06:14:22 +00:00
hyungi	e0cd89aef7	Merge remote-tracking branch 'origin/feat/study-memo-card-p1' into feat/email-pkm-folder	2026-06-07 06:07:49 +00:00
hyungi	420ee0fefb	Merge remote-tracking branch 'origin/feat/study-memo-card-p1' into feat/email-pkm-folder	2026-06-07 05:47:08 +00:00
hyungi	56df3d3b4f	Merge remote-tracking branch 'origin/feat/study-memo-card-p1' into feat/email-pkm-folder	2026-06-07 05:37:29 +00:00
hyungi	b9802fcee5	Merge remote-tracking branch 'origin/feat/study-memo-card-p1' into feat/email-pkm-folder	2026-06-07 02:45:15 +00:00
hyungi	4eb01bcd11	Merge remote-tracking branch 'origin/feat/study-memo-card-p1' into feat/email-pkm-folder	2026-06-07 02:38:39 +00:00
hyungi	eda1da1507	merge(study): card grouping (material)	2026-06-07 00:41:26 +00:00
hyungi	b679f3ecb5	merge(study): flashcard inspection UI	2026-06-06 23:49:25 +00:00
hyungi	9334c50bd3	merge(study): memorization notes P1 HR/A (correction hook, needs_review queue, alerts)	2026-06-06 23:09:10 +00:00
hyungi	0bd82a7b0f	merge(study): memorization notes Phase 1 extraction pipeline (additive, disjoint)	2026-06-06 12:39:39 +00:00
hyungi	2e5baa8329	feat(email): selective PKM-folder ingestion + enable eml body extraction. mailplus_archive: whole INBOX → selective ingestion of MAILPLUS_FOLDER (default PKM), eml extract skip guard removed (now included in search and indexing), per-folder UID state (job_name=mailplus:<folder>), no-op when the folder is absent. extract_worker: new _extract_eml (prepends From/To/Date/Subject headers + body prefers text/plain with html→bs4 fallback, scaffold for an attachment inventory in extract_meta). preview/marker auto-skip eml, so routing is unchanged. Enabled after correcting DNS drift (D25, mailplus.hyungi.net→.227). No link to to-dos (events). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-05 15:41:17 +09:00