Compare commits

9 Commits

Author SHA1 Message Date
hyungi 250896cdfa feat(eid): deep mode = ReAct auto-search + evidence card (ds-eid-ask-absorb P1)
- deep branch _eid_chat_deep: non-generating probe → phase:searching → agentic_ask_loop
  (tool_choice=auto lets the model decide whether to search; questions that need no search early-exit into a conversational answer) → final_answer
  + eid_sources envelope → DONE. Heartbeat {phase:ping} (~10s, keeps proxy idle timeouts from firing)
  · mid-stream BackendUnavailable → in-stream error envelope · on disconnect, task.cancel()
  + await (prevents an orphaned task from occupying the 27B).
- daily = call_stream unchanged (Mac mini conversation). deep = MacBook 27B ReAct (tool calling is 27B-only;
  the Mac mini 26B token leak is unverified). Multi-turn = only the last message is processed (agentic_ask_loop takes query: str;
  history is a phase-2 backlog item).
- EidEvidenceCard.svelte collapsible evidence card (source ordinal, title, score) + frontend SSE parser
  extended (ping/searching/error/eid_sources) + "searching" indicator + history preserved.
- Tests: 4 deep cases (search/conversational/probe-503/mid-stream-error) + existing call_stream regressions moved
  to daily = 29 passed.
- Bundled (uncommitted work from the earlier eid-chat session): /api/eid/status endpoint + llm_gate.gate_status +
  test_eid_status (lets the chat waiting UI tell "waiting" from "broken", 5 passed).

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-11 14:51:00 +09:00
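
The heartbeat flow described in this commit can be sketched roughly as follows. This is a minimal standalone illustration, not the committed code: `stream_with_heartbeat`, `slow_search`, and the short intervals are hypothetical stand-ins (the commit uses `agentic_ask_loop` and a ~10s interval), and only the standard library is used.

```python
import asyncio
import json
from collections.abc import AsyncIterator, Awaitable, Callable


def sse(obj: dict) -> bytes:
    """Encode one SSE event: a `data: <json>` line followed by a blank line."""
    return b"data: " + json.dumps(obj, ensure_ascii=False).encode("utf-8") + b"\n\n"


async def stream_with_heartbeat(
    work: Callable[[], Awaitable[str]], interval_s: float
) -> AsyncIterator[bytes]:
    """Emit a ping every `interval_s` seconds until `work()` finishes, then the answer."""
    yield sse({"phase": "searching"})
    task = asyncio.create_task(work())
    try:
        while not task.done():
            try:
                # shield() keeps the wait_for timeout from cancelling the task itself
                await asyncio.wait_for(asyncio.shield(task), timeout=interval_s)
            except asyncio.TimeoutError:
                yield sse({"phase": "ping"})
        yield sse({"choices": [{"delta": {"content": task.result()}}]})
        yield b"data: [DONE]\n\n"
    finally:
        # If the consumer goes away early, cancel the task and wait for it to finish
        if not task.done():
            task.cancel()
            try:
                await task
            except (asyncio.CancelledError, Exception):
                pass


async def slow_search() -> str:
    """Hypothetical stand-in for the long-running ReAct loop."""
    await asyncio.sleep(0.25)
    return "answer"


async def collect() -> list[bytes]:
    return [event async for event in stream_with_heartbeat(slow_search, interval_s=0.1)]


events = asyncio.run(collect())
```

A JSON `{"phase": "ping"}` event is used here instead of an SSE comment line because, per the commit, the frontend's envelope parser skips unknown phases on its own.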
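hyungi 9fb3de6e0a fix(eid): Caddyfile encode response matcher syntax: one header field/value pair per line (multiple lines = OR)
Multiple values on one line fail to parse with 'malformed header matcher', so caddy fails to start
(caught by a live local E2E start-up; compose build cannot catch this defect).
Live-verified: HTML is gzipped and SSE (text/event-stream) streams incrementally, uncompressed.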

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-11 11:16:44 +09:00
hyungi cd06ef0403 feat(eid): Eid chat surface: /api/eid/chat SSE streaming + /chat page (P1)
- compose: register the eid_chat surface (persona+rules, free prose) + rules_present() live check (D-6 fail-closed)
- EidAIClient.call_stream: closed mode mapping (daily→mac-mini-default / deep→qwen-macbook), routed through the router,
  MLX gate (FOREGROUND) + 300s wall-clock deadline, SSE line relay (model→mode substitution, usage removed),
  router 400 fail-loud, error_reason allowlist sanitizing
- POST /api/eid/chat: JWT, role=system rejected with 422, caps of 8000 chars per message / 40 turns / 32000 chars total,
  503 error_reason (ask convention), no body logging
- frontend /chat: Eid surface vocabulary (daily/deep; model and machine names never shown), SSE parser (boundary buf, flush, [DONE]),
  error_reason UX, 8000-char pre-block + 422 contamination guard, localStorage history (cleared on logout), nav entry
- Caddyfile: explicit encode match excludes text/event-stream from gzip buffering
- tests: 32+ new (fixtures: 26B/27B SSE captured through the router), tests/eid 61 + ask regression 9 = 70 passed
- Adversarial review with 3 lenses, 18 findings: 13/13 applied. Deployment waits on the D26 gate (fix/hwp merge + Soft Lock)

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-11 11:16:44 +09:00
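
The "SSE line relay" item above (line buffering across chunk boundaries, model→mode substitution, usage removal) can be illustrated with a small synchronous sketch. The names `relay` and `rewrite_sse_line` and the sample chunks are assumptions made for this example; the real code streams asynchronously from the router.

```python
import json
from collections.abc import Iterable, Iterator


def rewrite_sse_line(line: bytes, mode: str) -> bytes:
    """Replace `model` with the mode word and drop `usage`; other lines pass through."""
    if not line.startswith(b"data: "):
        return line
    payload = line[len(b"data: "):]
    if payload.strip() == b"[DONE]":
        return line
    try:
        obj = json.loads(payload)
    except ValueError:  # malformed JSON or undecodable bytes: relay as-is
        return line
    if not isinstance(obj, dict):
        return line
    obj["model"] = mode
    obj.pop("usage", None)
    return b"data: " + json.dumps(obj, ensure_ascii=False).encode("utf-8")


def relay(chunks: Iterable[bytes], mode: str) -> Iterator[bytes]:
    """Re-assemble lines that were split across chunk boundaries, then rewrite each."""
    buf = b""
    for chunk in chunks:
        buf += chunk
        while (nl := buf.find(b"\n")) != -1:
            line, buf = buf[:nl], buf[nl + 1:]
            yield rewrite_sse_line(line, mode) + b"\n"
    if buf:
        # Trailing line without a newline: flush it without adding one
        yield rewrite_sse_line(buf, mode)


# A data line split in the middle of the JSON payload (hypothetical sample)
chunks = [
    b'data: {"model": "/path/to/mod',
    b'el", "usage": {"tps": 90}, "choices": []}\n\ndata: [DONE]\n\n',
]
relayed = b"".join(relay(chunks, "deep"))
```

Rewriting per line rather than per chunk matters because a chunk boundary can fall inside a JSON payload, as in the sample above.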
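hyungi d3aa640f65 feat(documents): hier analyze subcommand: clause-analysis self-heal independent of re-decomposition (g3-t3 gap)
The char_start completion marker of re-decompose means 'the jump target has char_start', so a doc whose analyze was cut short by a container recreate or deadline
(char_start present but some leaves unanalyzed) is never reselected → a gap where the rail summary never converges.
`analyze` selects independently on LEAF_SQL (docs holding unanalyzed leaves) and converges (idempotent, can be limited with --doc, independent of jump).
The sweep log now also points to the `analyze` command. (Makes permanent the case from the 2026-06-10 backfill, where 5 docs / 53 leaves cut off by a recreate were handled manually.)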

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-11 11:16:44 +09:00
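hyungi e10ccc9169 fix(documents): g-measure junk detection: remove all-caps false positives + state that verdict is a coarse screen
The all-caps heuristic flagged 130 normal technical-document headings (GENERAL REQUIREMENTS/WELDING) as junk →
windowed/clean docs were wrongly demoted to A_better. Now only company suffixes (INC./LLC…) count, gated to the cover area (first 4 nodes) + not-stored.
The docstring now records that verdict is a coarse screen (for audit) and that the actual execution decision = deterministic partition + adversarial workflow.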

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-11 11:16:44 +09:00
hyungi 321d997123 fix(news): raise connection retries to 2: drops are random per connection (a single retry was also observed to be hit back-to-back) + repr for empty error logs
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-11 07:54:13 +09:00
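hyungi b75307b89b fix(news): retry connection-layer (TCP/TLS) errors once: MOEL security appliance intermittently drops the first handshake (diagnosed by re-measurement)
On the GPU line, the first TLS connection to moel.go.kr is intermittently dropped (curl rc=35; immediate retries succeeded 5/5;
never seen from the MacBook; single A record) → feeds that fetch once per cycle accumulate ConnectError(''),
and the legislative/administrative notice circuit opens. Only ConnectError/ConnectTimeout are retried, once, after 1.5s;
HTTP status errors are not retried. 3 regression tests (42 passed).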

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-11 07:43:05 +09:00
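
The retry policy in this commit (retry only connection-layer failures, after a short pause, and never HTTP status errors) might look like the sketch below. It is an assumption-laden illustration: the built-in `ConnectionError` stands in for `httpx.ConnectError`/`httpx.ConnectTimeout`, and `fetch_with_connect_retry` and `flaky_fetch` are hypothetical names, not the worker's actual functions.

```python
import asyncio
from collections.abc import Awaitable, Callable
from typing import TypeVar

T = TypeVar("T")


async def fetch_with_connect_retry(
    fetch: Callable[[], Awaitable[T]],
    *,
    retries: int = 1,
    delay_s: float = 1.5,
) -> T:
    """Call `fetch`, retrying up to `retries` times on connection-layer errors only."""
    attempt = 0
    while True:
        try:
            return await fetch()
        except ConnectionError:
            # Stand-in for the connect-phase exceptions; other errors propagate at once
            if attempt >= retries:
                raise
            attempt += 1
            await asyncio.sleep(delay_s)


calls = {"n": 0}


async def flaky_fetch() -> str:
    """Hypothetical feed fetch whose first handshake is dropped."""
    calls["n"] += 1
    if calls["n"] == 1:
        raise ConnectionError("first handshake dropped")
    return "feed body"


result = asyncio.run(fetch_with_connect_retry(flaky_fetch, delay_s=0.01))
```

Keeping `retries` a parameter makes it cheap to raise the count later if one retry turns out to be insufficient.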
hyungi f3530e382d fix(services): playwright-fetcher waits for the CF JS challenge to pass: aiche.org interstitial snapshot trap (found by the verification gate)
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-11 07:23:58 +09:00
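hyungi 8583465c58 feat(news): crawl-24x7 cycle 3: B-4 signals · C-4 engineering continued · CSB sitemap · CCPS Beacon (migration 327)
- B-4 fetch_method='signal-only': zero page fetches + summarize skipped (search index only,
  zero Mac mini load) + body not truncated (_entry_body; keeps 1.6K arXiv abstracts). The digest
  excludes these naturally through its ai_summary NULL rule. Guard against registry misconfiguration (page).
- 9 seed sources (all URLs live-verified 2026-06-11): Bloomberg Markets/Technology (skip-video;
  mixed-in video measured) · Economist Latest · Nikkei Asia (RDF: native in feedparser, no branch needed,
  captured as a fixture) · ASME JPVT (site_1000037 mapping measured) · 2 arXiv feeds · 2 IEEE Spectrum feeds (feed-full;
  the feed description was measured to carry the full text, 7.9~14K chars).
- csb_collector: sitemap lastmod diff (weekly, Mon 06:50): watermark (selector_override)
  + incremental backfill capped at 40 per run + diff sanity 300 + report PDFs (/assets/, recommendations excluded)
  → extract pipeline. Initial bulk load = CLI --bulk.
- api_standards_collector: parses listing-page links (measured: not a page diff; 10 detail URLs
  per page) → ingests only new detail pages (monthly, 5th 07:05). Initial backfill = CLI --bulk.
- ccps_collector: aiche.org returns 403 to plain requests (measured, regardless of UA) → playwright-fetcher anonymous context
  + new /download (base64) that inherits referer cookies, for the monthly Beacon PDF (monthly, 5th 07:20).
  When headless is blocked: CrawlBlocked → surfaced in health (Le Monde PARK precedent).
- B-5 remainder: rdf/feed-reader-UA = measured to need no code branch, recorded (Economist returns 200
  to the Archiver UA). table-strip/gn-redirect: no matching source onboarded yet, stays in the backlog.
- 24 new tests (9 fixtures captured live; economist/ieee trimmed to items): 39 passed.
- Migration 327 is a single statement (watch for number contention with the PKM track; this track claims 327 first).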

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-11 07:13:17 +09:00
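
The csb_collector item above combines a sitemap `lastmod` diff, a watermark, and a per-run cap. A rough standalone sketch of that selection step follows; `select_new_urls`, the sample sitemap, and the choice to process oldest entries first are assumptions for illustration, and the "diff sanity 300" check is not modelled.

```python
import xml.etree.ElementTree as ET

_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def select_new_urls(
    sitemap_xml: str, watermark: str, cap: int = 40
) -> tuple[list[str], str]:
    """Return up to `cap` URLs whose lastmod is newer than `watermark`, plus the new watermark.

    Assumes lastmod values share one ISO-8601 format, so string comparison orders them.
    """
    root = ET.fromstring(sitemap_xml)
    entries: list[tuple[str, str]] = []
    for node in root.findall("sm:url", _NS):
        loc = node.findtext("sm:loc", default="", namespaces=_NS).strip()
        lastmod = node.findtext("sm:lastmod", default="", namespaces=_NS).strip()
        if loc and lastmod > watermark:
            entries.append((lastmod, loc))
    entries.sort()  # oldest first, so a capped run advances the watermark step by step
    batch = entries[:cap]
    new_watermark = batch[-1][0] if batch else watermark
    return [loc for _, loc in batch], new_watermark


SAMPLE = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.org/a</loc><lastmod>2026-05-30</lastmod></url>
  <url><loc>https://example.org/b</loc><lastmod>2026-06-09</lastmod></url>
  <url><loc>https://example.org/c</loc><lastmod>2026-06-03</lastmod></url>
</urlset>"""

first_batch, mark = select_new_urls(SAMPLE, "2026-06-01", cap=1)
second_batch, mark2 = select_new_urls(SAMPLE, mark)
```

One caveat of this simplified form: if several entries share the `lastmod` value at the cap boundary, a strict `>` comparison would skip the ones left out of the batch, so a real collector would also need URL-level dedup.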
39 changed files with 5219 additions and 48 deletions
+17 -1
@@ -9,7 +9,23 @@
}
http://document.hyungi.net {
encode gzip
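# Explicit Content-Type match: the default match's text/* also covers text/event-stream,
# so gzip buffers the first ~512B of SSE (/api/eid/chat). Exclude SSE, keep the existing compression targets.
# (A response matcher takes one header <field> <value> pair per line; multiple lines = OR. Multiple values on one line are a parse error.)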
encode {
gzip
match {
header Content-Type text/html*
header Content-Type text/css*
header Content-Type text/plain*
header Content-Type text/xml*
header Content-Type text/javascript*
header Content-Type application/json*
header Content-Type application/javascript*
header Content-Type application/xml*
header Content-Type image/svg+xml*
}
}
# API + 문서 → FastAPI
handle /api/* {
+321
@@ -0,0 +1,321 @@
"""Eid chat surface: POST /api/eid/chat (eid-chat track).
Settled decisions:
- D-1 path = /api/eid/chat (main.py prefix=/api/eid + this router's POST /chat)
- D-2 closed mode vocabulary: daily(mac-mini-default) / deep(qwen-macbook). The client sends only mode;
claude-cloud / auto are forbidden (blocked with 422 via Literal). deep mode is ungated.
- D-3 standalone /chat route (frontend); this module is the backend API only.
- D-5 LLM calls go through EidAIClient.call_stream only (Eid egress containment invariant #5,
no direct RouterBackend calls).
- D-6 missing rules.md = 503 substrate_degraded, fail-closed. Unlike the degraded-banner convention
of other surfaces (compose._rules), chat refuses to proceed at all.
Response = line-by-line relay of the router SSE (text/event-stream; call_stream replaces the model field with the mode
word and removes usage, preserving framing. This module relays unmodified). A backend failure before the stream starts
maps to a 503 + error_reason with the same shape as /api/search/ask (no automatic fallback).
Logs are one metadata line (mode, turn count, status) only; conversation bodies are never logged.
"""
from __future__ import annotations
import asyncio
import json
from collections.abc import AsyncIterator
from typing import Annotated, Literal
import httpx
from fastapi import APIRouter, Depends
from fastapi.responses import JSONResponse, StreamingResponse
from pydantic import BaseModel, Field, field_validator, model_validator
from sqlalchemy.ext.asyncio import AsyncSession
from core.auth import get_current_user
from core.database import get_session
from core.utils import setup_logger
from eid import compose as eid_compose
from eid.ai import EidAIClient
from models.user import User
from services.llm.backends import BackendUnavailable, _router_url, get_backend
from services.search import llm_gate
from services.search.react_loop import agentic_ask_loop
logger = setup_logger("eid_chat")
router = APIRouter()
# ── ds-eid-ask-absorb P1: deep mode = ReAct auto-search (qwen-macbook 27B) ──
# Non-generating reachability probe: only checks that the router is reachable (coarse). The 27B (MacBook) itself being
# unavailable is covered by BackendUnavailable from the first generate_with_tools call → mid-stream error envelope
# (plan: the probe needs no precision; the in-stream error handles TOCTOU). ~2s timeout, occupies no generation slot.
_DEEP_PROBE_TIMEOUT = httpx.Timeout(connect=2.0, read=2.0, write=2.0, pool=2.0)
# heartbeat: ReAct with several tool calls can stay silent for tens of seconds → keeps proxy idle timeouts from firing.
# `{"phase":"ping"}` is a no-op event (the frontend envelope parser skips it naturally; a `: ping` comment is
# not guaranteed to be handled by the POST SSE fetch parser).
_HEARTBEAT_INTERVAL_S = 10.0
async def _probe_router_reachable() -> bool:
"""GET /v1/models on the router (:8890): reachability check (non-generating). Failure or non-200 = unavailable."""
url = f"{_router_url().rstrip('/')}/v1/models"
try:
async with httpx.AsyncClient(timeout=_DEEP_PROBE_TIMEOUT) as client:
resp = await client.get(url)
return resp.status_code == 200
except Exception:
return False
def _sse(obj: dict) -> bytes:
"""One SSE event: data: <json>\\n\\n. final_answer goes out as OpenAI-compatible choices.delta.content;
sources/phase use separate envelope keys (the frontend branches on them). No model/usage machine metadata."""
return b"data: " + json.dumps(obj, ensure_ascii=False).encode("utf-8") + b"\n\n"
class ChatMessage(BaseModel):
"""One chat turn. role=system is outside the Literal → 422 (only the server-side compose injects the combined system prompt)."""
role: Literal["user", "assistant"]
content: str = Field(min_length=1, max_length=8000)
# Total conversation cap (sum of all message contents): an overall limit separate from the per-message 8000-char and 40-turn limits
_TOTAL_CONTENT_CAP = 32000
class ChatRequest(BaseModel):
"""POST /api/eid/chat body. mode is a closed vocabulary (D-2); messages is 1~40 turns with a 32000-char total cap."""
mode: Literal["daily", "deep"]
messages: list[ChatMessage] = Field(min_length=1, max_length=40)
@field_validator("messages")
@classmethod
def _last_turn_is_user(cls, v: list[ChatMessage]) -> list[ChatMessage]:
if v and v[-1].role != "user":
raise ValueError("마지막 메시지는 role=user 여야 합니다")
return v
@model_validator(mode="after")
def _total_content_cap(self) -> "ChatRequest":
if sum(len(m.content) for m in self.messages) > _TOTAL_CONTENT_CAP:
raise ValueError(
"대화 총량 초과 — 새 대화로 시작하거나 입력을 줄여주세요 "
f"(전체 메시지 합 {_TOTAL_CONTENT_CAP}자 제한)"
)
return self
@router.get("/status")
async def eid_status(
user: Annotated[User, Depends(get_current_user)],
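):
"""Snapshot of Eid backend occupancy: GET /api/eid/status (lets the UI tell "waiting" from "broken").
Only sees llm_gate occupancy inside the DS process for daily (Mac mini MLX); endpoint occupancy by external
consumers (the Mac mini's own derived-worker, Hermes, etc.) is not captured.
So busy=true is certain (there is a queue right now), while false is approximate (external occupancy may remain).
Guaranteed lightweight: 0 DB / 0 LLM / 0 body logging, so it is safe to poll.
Not used as a basis for automatic fallback (mode switches happen only through the explicit button, by policy).
"""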
snap = llm_gate.gate_status()
inflight = bool(snap["inflight"])
waiters = int(snap["waiters"])
return {
"daily": {
"busy": inflight or waiters > 0,
"inflight": inflight,
"waiters": waiters,
}
}
def _backend_unavailable_response(body: ChatRequest, reason: str, backend_name: str) -> JSONResponse:
"""27B unavailable before the stream starts → 503 with the same shape as the ask convention (no automatic fallback)."""
logger.warning(
"eid_chat backend_unavailable mode=%s turns=%d status=503 reason=%s",
body.mode, len(body.messages), reason,
)
return JSONResponse(
status_code=503,
content={
"error": "backend_unavailable",
"error_reason": reason,
"backend_requested": backend_name,
"detail": (
"심층 엔진(검색)이 일시적으로 응답할 수 없습니다. "
"잠시 후 다시 시도하거나 일상 모드로 물어보세요."
),
},
)
async def _eid_chat_deep(body: ChatRequest, session: AsyncSession) -> StreamingResponse | JSONResponse:
"""deep mode = ReAct auto-search. With ReAct (`tool_choice=auto`) the LLM decides on its own whether to search;
questions that need no search early-exit into a conversational answer. The substrate (persona+rules+react_ask task) is
injected by compose("react_ask") inside agentic_ask_loop (evidence-first is inherited automatically).
Multi-turn: phase 1 processes only the last user message (agentic_ask_loop takes query: str and has no
history support). Resolving pronouns in follow-up questions is a phase-2 backlog item.
"""
# ① Non-generating probe before the first SSE byte (= HTTP 200 committed): router unreachable → 503 (still remappable here)
if not await _probe_router_reachable():
return _backend_unavailable_response(body, "macbook_unavailable", "qwen-macbook")
query = body.messages[-1].content # only the last user turn is processed
backend = get_backend("qwen-macbook")
async def _stream() -> AsyncIterator[bytes]:
# ② Emitting phase:searching commits HTTP 200. Later unavailability cannot be a 503 → in-stream error.
yield _sse({"phase": "searching"})
task = asyncio.create_task(agentic_ask_loop(session, query, backend=backend))
try:
# heartbeat: ping every ~10s while the task is pending (shield keeps the wait_for cancellation from killing the task)
while not task.done():
try:
await asyncio.wait_for(asyncio.shield(task), timeout=_HEARTBEAT_INTERVAL_S)
except asyncio.TimeoutError:
yield _sse({"phase": "ping"})
result = task.result() # a failed task re-raises BackendUnavailable here or from the wait above (mid-stream)
# final_answer = one OpenAI-compatible chunk (reuses the frontend's existing content-accumulation path)
yield _sse({"choices": [{"delta": {"content": result.final_answer}}]})
# Evidence = separate envelope (no citation numbers; the frontend numbers by order). partial = insufficient-evidence flag
yield _sse({"eid_sources": result.sources, "partial": result.partial})
yield b"data: [DONE]\n\n"
logger.info(
"eid_chat deep ok turns=%d sources=%d partial=%s iters=%d",
len(body.messages), len(result.sources), result.partial, result.iterations,
)
except BackendUnavailable as exc:
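# Mid-stream unavailability (AC unplugged or lid closed during search): 200 already sent, so emit an in-stream error envelope.
# [DONE] after the error keeps the frontend's sawDone from raising a false 'interrupted' alarm (the explicit error notice stays).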
logger.warning(
"eid_chat deep mid-stream unavailable turns=%d reason=%s",
len(body.messages), exc.reason,
)
yield _sse({"phase": "error", "error_reason": exc.reason})
yield b"data: [DONE]\n\n"
except asyncio.CancelledError:
raise # client disconnect; finally cleans up the task
except Exception:
logger.exception("eid_chat deep stream failed turns=%d", len(body.messages))
yield _sse({"phase": "error", "error_reason": "deep_failed"})
yield b"data: [DONE]\n\n"
finally:
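# Prevent an orphaned ReAct task on client disconnect: cancel + await (guarantees the cancellation has propagated).
# Otherwise the 27B stays busy for minutes on behalf of a closed connection, and with the router's concurrency the next search waits.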
if not task.done():
task.cancel()
try:
await task
except (asyncio.CancelledError, Exception):
pass
return StreamingResponse(
_stream(),
media_type="text/event-stream",
headers={"Cache-Control": "no-store", "X-Accel-Buffering": "no"},
)
@router.post("/chat")
async def eid_chat(
body: ChatRequest,
user: Annotated[User, Depends(get_current_user)],
session: Annotated[AsyncSession, Depends(get_session)],
):
"""Eid chat: daily = router SSE pass-through (conversation) / deep = ReAct auto-search (evidence).
503 paths (none with automatic fallback):
- substrate_degraded: rules.md missing (D-6 fail-closed, chat refuses to proceed)
- backend_unavailable: backend failure before the stream starts (daily and deep alike, ask-convention shape)
"""
# D-6: missing rules = fail-closed. Chat does not proceed without the safety/policy guard (no banner).
if not eid_compose.rules_present():
logger.error(
"eid_chat substrate_degraded mode=%s turns=%d status=503 — rules.md 부재, 채팅 거부",
body.mode, len(body.messages),
)
return JSONResponse(
status_code=503,
content={
"detail": (
"이드 substrate 가 degraded 상태입니다 (운영 규칙 rules.md 부재). "
"복구 전까지 채팅을 진행하지 않습니다."
),
"error_reason": "substrate_degraded",
},
)
# deep = ReAct auto-search (separate flow: probe + non-streaming ReAct run → converted to SSE)
if body.mode == "deep":
return await _eid_chat_deep(body, session)
# daily = plain conversation SSE pass-through (existing)
system = eid_compose.compose("eid_chat", task="")
client = EidAIClient()
stream = client.call_stream(
body.mode, [m.model_dump() for m in body.messages], system,
)
# An async generator sends the actual request only on the first __anext__; pull the first chunk here
# so that failures before the stream starts (connection/4xx/5xx) can be mapped to 503.
try:
first = await anext(stream, None)
except BackendUnavailable as exc:
logger.warning(
"eid_chat backend_unavailable mode=%s turns=%d status=503 reason=%s",
body.mode, len(body.messages), exc.reason,
)
await client.close()
return JSONResponse(
status_code=503,
content={
"error": "backend_unavailable",
"error_reason": exc.reason,
"backend_requested": exc.backend_name,
"detail": (
"선택한 모드의 backend 가 일시적으로 응답할 수 없습니다. "
"잠시 후 다시 시도하거나 mode 를 바꿔 호출하세요."
),
},
)
except BaseException:
await client.close()
raise
# One metadata log line, no body logging (conversation content is never stored anywhere)
logger.info(
"eid_chat stream mode=%s turns=%d status=200", body.mode, len(body.messages)
)
async def _passthrough():
# Relay what call_stream emits unmodified (sanitizing happens in one place, per line, in call_stream). On cancellation or
# disconnect, finally closes the generator (aclose) → AsyncExitStack cleans up the upstream.
try:
try:
if first is not None:
yield first
async for chunk in stream:
yield chunk
except (BackendUnavailable, httpx.HTTPError) as exc:
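# Cut off after the stream started: status 200 is already sent and cannot be remapped. Log one metadata
# line and end quietly (no traceback propagation); the frontend handles it through the missing [DONE].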
logger.warning(
"eid_chat stream aborted mode=%s turns=%d reason=%s",
body.mode, len(body.messages),
getattr(exc, "reason", type(exc).__name__),
)
return
finally:
# client.close() is guaranteed even if stream.aclose() raises (nested finally)
try:
await stream.aclose()
finally:
await client.close()
return StreamingResponse(
_passthrough(),
media_type="text/event-stream",
headers={"Cache-Control": "no-store", "X-Accel-Buffering": "no"},
)
+69 -7
@@ -12,6 +12,7 @@ SSRF blocking reuses core.url_validator.validate_feed_url (redirect target
"""
import asyncio
import base64
import random
import time
import urllib.robotparser
@@ -48,6 +49,9 @@ _CHALLENGE_MARKERS = (
"Checking your browser before",
"captcha-delivery.com",
"geo.captcha-delivery",
# Script domain of the CF JS challenge interstitial (measured on aiche.org, 2026-06-11):
# only final HTML that never got past the fetcher's challenge wait matches here.
"challenges.cloudflare.com",
)
_ROBOTS_CACHE_TTL = 24 * 3600 # 24h
@@ -206,11 +210,13 @@ async def fetch_page(
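# ── B-3 subscription-session fetch (via the isolated Playwright container) ──────────────────────
async def fetch_page_via_browser(url: str, profile: str) -> tuple[str, str]:
"""One authenticated page: delegated to playwright-fetcher, politeness at human speed (30~60s).
async def fetch_page_via_browser(url: str, profile: str | None) -> tuple[str, str]:
"""One browser page: delegated to playwright-fetcher, politeness at human speed (30~60s).
Returns (html_text, final_url). robots not applied: a private-archive fetch under a subscription contract is
outside public crawler conventions (human speed + concurrency 1 + low nightly frequency protect the site instead).
profile=None = anonymous context (cycle 3: only for getting past the WAF of public sites that 403
plain httpx regardless of UA; measured on CCPS aiche.org). A value = B-3 subscription session.
Returns (html_text, final_url). robots not applied: subscription fetches have the character of user actions,
and the anonymous WAF path is protected by its low frequency (1~2 times a month) + human speed.
Exception vocabulary matches fetch_page (call-site branching is reused).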
"""
try:
@@ -218,14 +224,16 @@ async def fetch_page_via_browser(url: str, profile: str) -> tuple[str, str]:
except ValueError as e:
raise CrawlSkip(f"URL 검증 실패: {e}") from e
payload = {"url": url}
if profile:
payload["profile"] = profile
domain = _domain_of(url)
async with _get_lock(domain):
await _respect_domain_rate(domain, _AUTH_DELAY_MIN, _AUTH_DELAY_MAX)
try:
async with httpx.AsyncClient(timeout=_FETCHER_TIMEOUT) as client:
resp = await client.post(
f"{_FETCHER_URL}/fetch", json={"url": url, "profile": profile}
)
resp = await client.post(f"{_FETCHER_URL}/fetch", json=payload)
except httpx.TimeoutException as e:
raise CrawlFetchError(f"browser fetch timeout: {url}") from e
except httpx.HTTPError as e:
@@ -250,6 +258,60 @@ async def fetch_page_via_browser(url: str, profile: str) -> tuple[str, str]:
return html_text, data.get("final_url", url)
_MAX_DOWNLOAD_BYTES = 60 * 1024 * 1024 # same as the fetcher's MAX_DOWNLOAD_BYTES
async def download_via_browser(
url: str, *, referer: str | None = None, profile: str | None = None
) -> tuple[bytes, str]:
"""One binary (PDF): delegated to fetcher /download. Returns (content, content_type).
referer = listing page visited first to obtain the WAF challenge cookie (CCPS Beacon pattern).
Inner status handling: 403/429 = CrawlBlocked, other 4xx = CrawlSkip, any other non-200 (5xx etc.) = CrawlFetchError
(same vocabulary as fetch_page; call-site branching is reused).
"""
try:
validate_feed_url(url)
except ValueError as e:
raise CrawlSkip(f"URL 검증 실패: {e}") from e
payload: dict = {"url": url}
if referer:
payload["referer"] = referer
if profile:
payload["profile"] = profile
domain = _domain_of(url)
async with _get_lock(domain):
await _respect_domain_rate(domain, _AUTH_DELAY_MIN, _AUTH_DELAY_MAX)
try:
async with httpx.AsyncClient(timeout=_FETCHER_TIMEOUT) as client:
resp = await client.post(f"{_FETCHER_URL}/download", json=payload)
except httpx.TimeoutException as e:
raise CrawlFetchError(f"browser download timeout: {url}") from e
except httpx.HTTPError as e:
raise CrawlFetchError(f"playwright-fetcher 연결 오류: {e}") from e
finally:
_domain_last_request[domain] = time.monotonic()
if resp.status_code == 503:
raise CrawlBlocked(f"세션 프로필 부재: {profile}")
if resp.status_code != 200:
raise CrawlFetchError(f"playwright-fetcher {resp.status_code}: {url}")
data = resp.json()
inner = int(data.get("status", 0))
if inner in (403, 429):
raise CrawlBlocked(f"{inner} (browser download): {url}")
if 400 <= inner < 500:
raise CrawlSkip(f"{inner} (browser download): {url}")
if inner != 200:
raise CrawlFetchError(f"{inner} (browser download): {url}")
content = base64.b64decode(data.get("body_b64", ""))
if len(content) > _MAX_DOWNLOAD_BYTES:
raise CrawlSkip(f"크기 초과 (browser download): {url}")
return content, data.get("content_type", "")
async def probe_session(
profile: str, probe_url: str, min_body_chars: int, paywall_markers: list[str]
) -> dict:
+193
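@@ -11,11 +11,116 @@ cannot call the endpoint (silent fallback 0, rules no-silent-fallback).
- _request() → raises if the endpoint contains anthropic.com (defense against a miswired primary, double guarantee)
call_primary / call_triage / embed / rerank are unchanged (internal inference and embedding allowed).
Egress workers and system paths keep the existing AIClient: fallback is for the system only, revoked for Eid only (separation).
eid-chat (D-5): Eid chat SSE streaming also goes through this class's call_stream() only. No direct RouterBackend
calls, the mode vocabulary is the closed _CHAT_ALIAS mapping (daily/deep) only, unknown mode = EidEgressBlocked.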
"""
from __future__ import annotations
import asyncio
import json
import re
from collections.abc import AsyncIterator
from contextlib import AsyncExitStack
import httpx
from ai.client import AIClient
from services.llm.backends import (
MAC_MINI_DEFAULT,
QWEN_MACBOOK,
BackendUnavailable,
_router_url, # reuses the single source of the router URL (settings → env LLM_ROUTER_URL → MVP default)
)
from services.search.llm_gate import Priority, acquire_mlx_gate
# Closed mapping from Eid chat mode → router alias (D-2). The client sends only mode; claude-cloud/auto are forbidden.
_CHAT_ALIAS: dict[str, str] = {
"daily": MAC_MINI_DEFAULT, # router tier_b → Mac mini :8801 gemma-4-26b
"deep": QWEN_MACBOOK, # router named upstream → M5 Max Qwen3.6-27B (ungated, D-2)
}
# read applies per chunk, so it covers MacBook wake (24s) + the gaps between generated tokens. connect stays short because the router is internal.
_STREAM_TIMEOUT = httpx.Timeout(connect=5.0, read=120.0, write=30.0, pool=5.0)
# Wall-clock ceiling for the whole relay (upstream entry to end). With only a per-chunk read timeout, a stream
# can hold on indefinitely as long as tokens keep flowing → daily holds the mlx gate, so a deadline is mandatory.
# Applied to deep as well (simple, consistent). A normal stream (max_tokens 2048, ~90tps ≈ 23s) passes with room to spare.
_STREAM_DEADLINE_S = 300.0
# error_reason allowlist: anything outside it (uppercase/whitespace/JSON serialization fragments) is generalized and not exposed
_REASON_ALLOWED = re.compile(r"[a-z0-9_]{1,64}")
# Transport-class failures before the stream starts → mapped to BackendUnavailable (same list as RouterBackend._post)
_TRANSPORT_ERRORS = (
httpx.ConnectError,
httpx.ConnectTimeout,
httpx.ReadTimeout,
httpx.PoolTimeout,
httpx.WriteTimeout,
httpx.RemoteProtocolError,
)
def _stream_error_reason(status_code: int, body: bytes) -> str:
"""Extract error_reason from a 4xx/5xx response body received before the stream starts.
The vocabulary matches /api/search/ask (RouterBackend._post): the router's error.type /
error.error_reason (macbook_unavailable / warming / editor_busy / upstream_cold /
provider_not_configured, etc.) comes first; otherwise the status-based router_503 / upstream_502 /
router_http_<status>.
The final reason is checked against the [a-z0-9_]{1,64} allowlist; a mismatch (uppercase/whitespace/dict serialization
fragments) is generalized to upstream_502 (502 family) / router_error (everything else) and not exposed externally.
"""
try:
data = json.loads(body.decode("utf-8", errors="replace"))
except Exception:
data = {}
err = data.get("error", {}) if isinstance(data, dict) else {}
reason: str | None = None
if isinstance(err, dict):
raw = err.get("type") or err.get("error_reason")
if raw:
reason = str(raw)
if reason is None and isinstance(data, dict) and data.get("error_reason"):
reason = str(data["error_reason"])
if reason is None:
if status_code == 502:
reason = "upstream_502"
elif status_code == 503:
reason = "router_503"
else:
reason = f"router_http_{status_code}"
if _REASON_ALLOWED.fullmatch(reason):
return reason
return "upstream_502" if status_code == 502 else "router_error"
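def _rewrite_sse_line(line: bytes, mode: str) -> bytes:
"""Sanitize one SSE line: replace model in the data: JSON with the mode word + remove usage.
Measured on fixtures: the model field of 27B chunks exposes an absolute MacBook filesystem path
("/Users/.../mlx-models/Qwen3.6-27B-8bit"), which conflicts with the surface rule 'model and machine names
never shown', so lines are rewritten one by one. usage (machine telemetry such as tps/peak_memory)
is removed as well. [DONE], non-data lines (including blank lines), and lines that fail to parse stay
as-is (defensive); SSE framing (data: line + blank line) is preserved.
"""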
if not line.startswith(b"data: "):
return line
payload = line[len(b"data: "):]
if payload.strip() == b"[DONE]":
return line
try:
obj = json.loads(payload)
except Exception:
return line
if not isinstance(obj, dict):
return line
obj["model"] = mode
obj.pop("usage", None)
return b"data: " + json.dumps(obj, ensure_ascii=False).encode("utf-8")
class EidEgressBlocked(RuntimeError):
@@ -39,3 +144,91 @@ class EidAIClient(AIClient):
if "anthropic.com" in endpoint:
raise EidEgressBlocked(f"이드: 외부 endpoint 차단 ({endpoint}). 내부 inference 만.")
return await super()._request(model_config, prompt, system=system)
async def call_stream(
self, mode: str, messages: list[dict], system: str
) -> AsyncIterator[bytes]:
"""Eid chat SSE stream: line-by-line relay of router /v1/chat/completions stream=true (D-5).
mode : "daily" | "deep", the closed _CHAT_ALIAS mapping. Unknown mode = EidEgressBlocked
(Eid LLM call containment lives in this class only, invariant #5).
messages : list of user/assistant turns (system role forbidden; injected only through the system argument).
system : combined prompt from compose("eid_chat", ...). Prepended to messages as the system role.
Failure before the stream starts (connection failure, or a 4xx/5xx response other than 400) = BackendUnavailable
(reason vocabulary is the same as ask). router 400 = an alias-drift code bug in the closed mapping → ValueError, fail-loud
(mirrors the RouterBackend._post convention). After the stream starts, bytes are line-buffered and
_rewrite_sse_line only replaces model (mode word) and removes usage; framing is preserved.
On cancellation/disconnect the AsyncExitStack cleans up response and client (upstream closure guaranteed).
daily (mac-mini-default) streams inside acquire_mlx_gate(FOREGROUND), following the permanent Mac mini MLX
single-inference rule (llm_gate docstring: "gate acquisition is mandatory, no exceptions"), which gives the
same client-side mutex effect as RouterBackend's requires_gate=True.
deep (qwen-macbook) is a separate endpoint and therefore ungated (D-2, same as RouterBackend).
The whole relay (upstream entry to end) runs inside an asyncio.timeout(_STREAM_DEADLINE_S) wall-clock
deadline, honoring the llm_gate contract "timeout goes inside the gate" (not applied to the gate wait).
On expiry it converges to BackendUnavailable(alias, "stream_deadline_exceeded").
"""
alias = _CHAT_ALIAS.get(mode)
if alias is None:
raise EidEgressBlocked(
f"이드: 미지 chat mode {mode!r} — 닫힌 매핑(daily/deep) 외 호출 차단."
)
router_url = _router_url()
if "anthropic.com" in router_url:
# Mirrors the existing _request pattern: defends against external egress if the router URL is miswired (double guarantee)
raise EidEgressBlocked(f"이드: 외부 endpoint 차단 ({router_url}). 내부 router 만.")
url = f"{router_url.rstrip('/')}/v1/chat/completions"
payload = {
"model": alias,
"messages": [{"role": "system", "content": system}] + messages,
"stream": True,
"max_tokens": 2048,
"temperature": 0.4,
}
async with AsyncExitStack() as stack:
if alias == MAC_MINI_DEFAULT:
await stack.enter_async_context(acquire_mlx_gate(Priority.FOREGROUND))
client = await stack.enter_async_context(httpx.AsyncClient(timeout=_STREAM_TIMEOUT))
try:
# wall-clock deadline: entered *after* gate acquisition (llm_gate: "timeout goes inside the gate")
async with asyncio.timeout(_STREAM_DEADLINE_S):
try:
resp = await stack.enter_async_context(
client.stream("POST", url, json=payload)
)
except _TRANSPORT_ERRORS as exc:
# Connection-class failure before the stream starts; reason vocabulary matches RouterBackend (router_*)
raise BackendUnavailable(alias, f"router_{type(exc).__name__}") from exc
if resp.status_code == 400:
# A 400 under the closed mapping = alias-drift code bug. Mirrors RouterBackend._post;
# not BackendUnavailable (temporary unavailability) → fail-loud
body = await resp.aread()
try:
data = json.loads(body.decode("utf-8", errors="replace"))
except Exception:
data = {}
raise ValueError(f"router rejected alias={alias!r} body={data!r}")
if resp.status_code >= 400:
body = await resp.aread()
raise BackendUnavailable(
alias, _stream_error_reason(resp.status_code, body)
)
buf = b""
try:
async for chunk in resp.aiter_bytes():
buf += chunk
# Line buffering: split on b"\n" at chunk boundaries, keep the remainder buffered
while (nl := buf.find(b"\n")) != -1:
line, buf = buf[:nl], buf[nl + 1:]
yield _rewrite_sse_line(line, mode) + b"\n"
except _TRANSPORT_ERRORS as exc:
# Interrupted after start: chunks already sent are delivered. Converges to a typed exception (the caller disconnects and cleans up).
raise BackendUnavailable(alias, f"router_{type(exc).__name__}") from exc
if buf:
# Flush the remainder at end of stream (a last line without a newline; no \n is added that was not in the original)
yield _rewrite_sse_line(buf, mode)
except TimeoutError as exc:
# asyncio.timeout expired: stops unbounded gate occupancy, converges to a typed exception
raise BackendUnavailable(alias, "stream_deadline_exceeded") from exc
+13
@@ -50,6 +50,8 @@ _ROUTE: dict[str, dict] = {
"react_ask": {"overlay": None, "variant": "full"},
"study_subject_note": {"overlay": None, "variant": "full"},
"study_question_explanation": {"overlay": None, "variant": "full"},
# Eid chat surface (D-1 /api/eid/chat): free prose (base), persona ON (invariant #3)
"eid_chat": {"overlay": None, "variant": "full"},
# Future active eid surfaces: feature overlays (call wiring lands in W3+)
"study_diagnosis": {"overlay": "study", "variant": "full"},
"document_brief": {"overlay": "document", "variant": "full"},
@@ -113,6 +115,17 @@ def is_composed_surface(surface: str) -> bool:
return surface in _ROUTE
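def rules_present() -> bool:
"""Whether rules.md exists: input for the chat surface's (D-6) fail-closed decision.
The degraded-banner convention of the existing _rules() (other surfaces, fail-loud but proceeding) stays as it is;
this only provides the 'refuse to proceed' check, and enforcement is the caller's (/api/eid/chat) responsibility.
It does not use the lru_cache'd _read but stats directly on every call: the D-6 gate must be a live
check (with a frozen cache, rules.md going missing or being restored would never be reflected).
"""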
return (_SUBSTRATE_DIR / "rules.md").is_file()
def compose(surface: str, task: str, *, variant: str | None = None,
budget_chars: int | None = None) -> str:
"""persona → rules → overlay → task 단일 system 문자열 합성.
+12
@@ -17,6 +17,7 @@ from api.digest import router as digest_router
from api.document_notes import router as document_notes_router
from api.document_reads import router as document_reads_router
from api.documents import router as documents_router
from api.eid_chat import router as eid_chat_router
from api.events import router as events_router
from api.library import router as library_router
from api.memos import router as memos_router
@@ -56,6 +57,9 @@ async def lifespan(app: FastAPI):
from workers.news_collector import run as news_collector_run
from workers.fulltext_worker import reconcile_unresolved as fulltext_reconcile_run
from workers.kosha_collector import run as kosha_collector_run
from workers.csb_collector import run as csb_collector_run
from workers.api_standards_collector import run as api_standards_run
from workers.ccps_collector import run as ccps_collector_run
from workers.queue_consumer import consume_queue, consume_markdown_queue
from workers.study_queue_consumer import consume_study_queue
from workers.study_session_queue_consumer import consume_study_session_queue
@@ -131,6 +135,12 @@ async def lifespan(app: FastAPI):
scheduler.add_job(dedup_reconcile_run, CronTrigger(hour=3, minute=30, timezone=KST), id="dedup_reconcile")
# crawl-24x7 C-2: KOSHA accident-case diff + incremental GUIDE backfill (daily, a slot that does not collide with the early-morning jobs).
scheduler.add_job(kosha_collector_run, CronTrigger(hour=6, minute=40, timezone=KST), id="kosha_collector")
# Cycle 3 C-2 remainder: CSB sitemap lastmod diff (weekly Mon, cap 40 + watermark incremental backfill).
scheduler.add_job(csb_collector_run, CronTrigger(day_of_week="mon", hour=6, minute=50, timezone=KST), id="csb_collector")
# Cycle 3 C-4: API standards announcement listing diff (monthly; announcements arrive at 1~2 a month).
scheduler.add_job(api_standards_run, CronTrigger(day=5, hour=7, minute=5, timezone=KST), id="api_standards_collector")
# Cycle 3 C-2 remainder: CCPS Beacon monthly PDF (via anonymous playwright; a WAF block is surfaced in health).
scheduler.add_job(ccps_collector_run, CronTrigger(day=5, hour=7, minute=20, timezone=KST), id="ccps_collector")
scheduler.start()
# Phase 2.1 (async structure): QueryAnalyzer prewarm.
@@ -165,6 +175,8 @@ app.include_router(documents_router, prefix="/api/documents", tags=["documents"]
app.include_router(document_reads_router, prefix="/api/documents", tags=["document-reads"])
app.include_router(document_notes_router, prefix="/api/documents", tags=["document-notes"])
app.include_router(search_router, prefix="/api/search", tags=["search"])
# Eid chat surface (D-1): POST /api/eid/chat. SSE streaming, through the EidAIClient.call_stream containment.
app.include_router(eid_chat_router, prefix="/api/eid", tags=["eid-chat"])
app.include_router(memos_router, prefix="/api/memos", tags=["memos"])
app.include_router(events_router, prefix="/api/events", tags=["events"])
+8
@@ -222,6 +222,14 @@ def get_mlx_gate():
return acquire_mlx_gate(DEFAULT_PRIORITY)
# ── Read-only status (for UI display) ─────────────────────────────────────────────
def gate_status() -> dict:
"""Snapshot of current gate occupancy (read-only, lock-free approximation, for UI display)."""
return {"inflight": _inflight, "waiters": len(_waiters)}
# ── Test helpers (conftest reset) ────────────────────────────────────────────
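@@ -0,0 +1,250 @@
"""C-4 ① Collector worker for API standards announcements (Important Standards Announcements), cycle 3.
No RSS. Measurement on 2026-06-11 showed this is not a 'page diff' case: the site has a
listing page (10 items per page, ?page=N&pageSize=10 pagination, ~12+ pages) with a detail URL
per announcement. Parsing the listing links and ingesting only new detail pages is accurate
and makes dedup natural (the HTML counterpart of the rss+page pattern).
Revision announcements for 510/570/653 are directly relevant to the work. The standard texts
are paid, so only the announcements are collected (card C-4).
Schedule = monthly (main.py, day 5 at 07:05 KST): diff over the latest 2 pages (roughly 1-2 announcements per month).
Initial bulk run: docker exec hyungi_document_server-fastapi-1 \
python -m workers.api_standards_collector --bulk # all pages (~120 items, ~30 min with politeness)
Idempotent: dedup on edit_url (normalized) + file_hash, so a rerun ingests only new items.
"""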
import argparse
import asyncio
import hashlib
import re
from datetime import datetime, timezone
from sqlalchemy import select
from core.crawl_politeness import (
CrawlBlocked,
CrawlFetchError,
CrawlSkip,
fetch_page,
)
from core.database import async_session
from core.utils import setup_logger
from models.document import Document
from models.news_source import NewsSource
from models.queue import enqueue_stage
from workers.fulltext_worker import (
_WEB_MIN_BODY_LEN,
_extract_body,
_raw_html_path,
_save_raw_html,
_strip_article_footer,
)
from workers.news_collector import (
_get_or_create_health,
_normalize_url,
_record_failure,
_record_success,
)
from workers.static_corpus_ingest import _page_title
logger = setup_logger("api_standards")
_BASE = "https://www.api.org"
_LISTING_PATH = "/products-and-services/standards/important-standards-announcements"
_LISTING_URL = f"{_BASE}{_LISTING_PATH}"
_SOURCE_NAME = "API 표준 공지"
_SCHEDULED_PAGES = 2 # monthly diff range (20 items; ample overlap at 1-2 announcements per month)
_BULK_MAX_PAGES = 15 # measured 12 pages + headroom. Stops early on an empty page.
_DETAIL_RE = re.compile(
r'href="(' + re.escape(_LISTING_PATH) + r'/[^"?#]+)"'
)
_DATE_RE = re.compile(
r"(January|February|March|April|May|June|July|August|September|October"
r"|November|December)\s+(\d{1,2}),?\s+(\d{4})"
)
_MONTHS = {m: i for i, m in enumerate(
["January", "February", "March", "April", "May", "June", "July",
"August", "September", "October", "November", "December"], start=1)}
def _parse_listing(html_text: str) -> list[str]:
    """Absolute URLs of announcement detail pages, deduplicated in order (pagination links are excluded because the pattern rejects '?')."""
    seen: set[str] = set()
    out: list[str] = []
    for m in _DETAIL_RE.finditer(html_text):
        url = f"{_BASE}{m.group(1)}"
        if url not in seen:
            seen.add(url)
            out.append(url)
    return out
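The listing parse above can be tried in isolation. The following is a minimal sketch, assuming a made-up host and listing path (`BASE`, `LISTING_PATH`, and `SAMPLE_HTML` are illustrative and not taken from the worker); it shows that pagination links are rejected by the `?` exclusion and that repeated detail links collapse to one entry in first-seen order.

```python
import re

# Hypothetical values for illustration only; they mirror the shape of the worker's constants.
BASE = "https://example.org"
LISTING_PATH = "/standards/announcements"
DETAIL_RE = re.compile(r'href="(' + re.escape(LISTING_PATH) + r'/[^"?#]+)"')


def parse_listing(html_text: str) -> list:
    """Return absolute detail URLs in first-seen order, without duplicates."""
    seen = set()
    out = []
    for m in DETAIL_RE.finditer(html_text):
        url = f"{BASE}{m.group(1)}"
        if url not in seen:
            seen.add(url)
            out.append(url)
    return out


# A made-up listing fragment: two detail links, one pagination link, one repeat.
SAMPLE_HTML = (
    '<a href="/standards/announcements/rev-510">510</a>'
    '<a href="/standards/announcements?page=2&pageSize=10">next</a>'
    '<a href="/standards/announcements/rev-570">570</a>'
    '<a href="/standards/announcements/rev-510">510 again</a>'
)

if __name__ == "__main__":
    for url in parse_listing(SAMPLE_HTML):
        print(url)
```

Keeping a list next to the set preserves the page order, which matters when the caller processes announcements in listing order.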
def _parse_pub_date(text: str) -> datetime | None:
    """First 'Month DD, YYYY' in the body, which by convention is the announcement's publication date. Returns None on failure (indexing proceeds regardless, via the channel gate)."""
    m = _DATE_RE.search(text)
    if not m:
        return None
    try:
        return datetime(int(m.group(3)), _MONTHS[m.group(1)], int(m.group(2)),
                        tzinfo=timezone.utc)
    except ValueError:
        return None
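To make the date handling concrete, here is a small self-contained sketch of the same idea, under the assumption that the first "Month DD, YYYY" string in the text is the publication date. The names `MONTH_NAMES`, `DATE_RE`, and `parse_pub_date` are illustrative. An impossible date such as February 30 matches the pattern but is rejected by `datetime`, so it also yields `None`.

```python
import re
from datetime import datetime, timezone
from typing import Optional

MONTH_NAMES = [
    "January", "February", "March", "April", "May", "June", "July",
    "August", "September", "October", "November", "December",
]
MONTHS = {name: i for i, name in enumerate(MONTH_NAMES, start=1)}
DATE_RE = re.compile(r"(" + "|".join(MONTH_NAMES) + r")\s+(\d{1,2}),?\s+(\d{4})")


def parse_pub_date(text: str) -> Optional[datetime]:
    """Return the first 'Month DD, YYYY' date as a UTC datetime, or None."""
    m = DATE_RE.search(text)
    if not m:
        return None
    try:
        return datetime(int(m.group(3)), MONTHS[m.group(1)], int(m.group(2)),
                        tzinfo=timezone.utc)
    except ValueError:
        # The pattern matched, but the calendar date does not exist.
        return None


if __name__ == "__main__":
    print(parse_pub_date("Posted June 11, 2026 by staff"))
    print(parse_pub_date("February 30, 2026"))
```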
async def _get_or_create_source(session) -> NewsSource:
result = await session.execute(
select(NewsSource).where(NewsSource.name == _SOURCE_NAME)
)
source = result.scalars().first()
if source is None:
source = NewsSource(
name=_SOURCE_NAME, feed_url=_LISTING_URL, feed_type="rss",
fetch_method="page", fulltext_policy="none",
source_channel="crawl", category="Engineering", language="en", country="US",
enabled=False, # excluded from the 6h news cycle; this worker polls monthly
)
session.add(source)
await session.flush()
return source
async def _ingest_detail(session, source: NewsSource, url: str) -> str:
"""Ingest one announcement. Returns 'ok', 'dup', or 'skip'."""
normalized_url = _normalize_url(url)
ann_hash = hashlib.sha256(f"api-ann|{normalized_url}".encode()).hexdigest()[:32]
existing = await session.execute(
select(Document).where(
(Document.file_hash == ann_hash)
| (Document.edit_url.in_([normalized_url, url]))
).limit(1)
)
if existing.scalars().first():
return "dup"
try:
html_text, final_url = await fetch_page(url)
except (CrawlBlocked, CrawlSkip, CrawlFetchError) as e:
logger.warning(f"[api-std] fetch failed, skipping: {url} ({type(e).__name__}: {e})")
return "skip"
body, engine, engine_ver = _extract_body(html_text)
if not engine:
logger.warning(f"[api-std] 추출 실패 skip (< {_WEB_MIN_BODY_LEN}자): {url}")
return "skip"
clean_body = _strip_article_footer(body.replace("\x00", ""))
if len(clean_body) < _WEB_MIN_BODY_LEN:
return "skip"
now = datetime.now(timezone.utc)
raw_path = _raw_html_path(source.id, ann_hash, now)
raw_saved = True
try:
_save_raw_html(raw_path, html_text)
except OSError as e:
raw_saved = False
logger.error(f"[api-std] 원본 보존 실패 (ingest 는 진행): {e}")
pub_dt = _parse_pub_date(clean_body)
title = _page_title(html_text, fallback=url.rsplit("/", 1)[-1][:90])
title = re.sub(r"\s*\|\s*API\s*$", "", title).strip() or title
doc = Document(
file_path=f"crawl/{_SOURCE_NAME}/{ann_hash}",
file_hash=ann_hash,
file_format="article",
file_size=0,
file_type="note",
title=title,
extracted_text=f"{title}\n\n{clean_body}",
extracted_at=now,
extractor_version=f"listing+page@{engine}",
md_content=clean_body,
md_status="success",
md_extraction_engine=engine,
md_extraction_engine_version=engine_ver,
md_format_version="1.0",
md_generated_at=now,
md_source_hash=hashlib.sha256(html_text.encode("utf-8", errors="replace")).hexdigest(),
md_content_hash=hashlib.sha256(clean_body.encode("utf-8")).hexdigest(),
content_origin="extracted",
source_channel="crawl",
data_origin="external",
edit_url=normalized_url,
review_status="approved",
ai_domain="Engineering",
ai_sub_group=_SOURCE_NAME,
ai_tags=["Engineering/API 표준 공지"],
extract_meta={
"source_id": source.id,
"source_name": _SOURCE_NAME,
"published_at": pub_dt.isoformat() if pub_dt else None,
"fulltext": {
"status": "api_announcement",
"engine": engine,
"final_url": final_url,
"raw_html_path": str(raw_path) if raw_saved else None,
"body_chars": len(clean_body),
"resolved_at": now.isoformat(),
},
},
)
doc.file_size = len(doc.extracted_text.encode())
session.add(doc)
await session.flush()
await enqueue_stage(session, doc.id, "summarize")
await enqueue_stage(session, doc.id, "embed")
await enqueue_stage(session, doc.id, "chunk")
logger.info(f"[api-std] ingest {len(clean_body)}자 ({engine}): {title[:60]}")
return "ok"
async def run(bulk: bool = False) -> None:
"""Monthly entry point (scheduler). bulk is CLI-only (all pages in one run)."""
now = datetime.now(timezone.utc)
async with async_session() as session:
source = await _get_or_create_source(session)
await session.commit()
source_id = source.id
max_pages = _BULK_MAX_PAGES if bulk else _SCHEDULED_PAGES
counts = {"ok": 0, "dup": 0, "skip": 0}
try:
for page in range(1, max_pages + 1):
listing_url = (
_LISTING_URL if page == 1
else f"{_LISTING_URL}?page={page}&pageSize=10"
)
html_text, _ = await fetch_page(listing_url)
detail_urls = _parse_listing(html_text)
if not detail_urls:
break # empty page = end of listing (early exit for bulk)
for url in detail_urls:
async with async_session() as session:
src = await session.get(NewsSource, source_id)
status = await _ingest_detail(session, src, url)
await session.commit()
counts[status] += 1
logger.info(f"[api-std] 목록 p{page}: 누적 {counts}")
except (CrawlBlocked, CrawlSkip, CrawlFetchError) as e:
logger.error(f"[api-std] 목록 수집 실패: {e}")
async with async_session() as session:
health = await _get_or_create_health(session, source_id)
_record_failure(health, str(e) or repr(e), now)
await session.commit()
return
async with async_session() as session:
health = await _get_or_create_health(session, source_id)
_record_success(health, counts["ok"], False, now)
src = await session.get(NewsSource, source_id)
src.last_fetched_at = now
await session.commit()
logger.info(f"[api-std] 완료: {counts}")
if __name__ == "__main__":
parser = argparse.ArgumentParser(description="API 표준 공지 수집")
parser.add_argument("--bulk", action="store_true", help="전 페이지 일괄 (초기 백필)")
args = parser.parse_args()
asyncio.run(run(bulk=args.bulk))
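@@ -0,0 +1,185 @@
"""C-2 remainder ② Collector worker for the CCPS Process Safety Beacon (cycle 3).
A monthly one-page PDF plus a Korean translation: an ideal size for RAG chunks (card C-2).
aiche.org answers plain httpx with 403 regardless of UA (measured 2026-06-11: both the Archiver UA and a browser UA)
→ go through an anonymous playwright-fetcher context (reusing the B-3 infrastructure):
browser fetch of the listing page → parse beacon PDF links → download with the referer and inherited cookies.
Known risk: if the WAF blocks headless browsers outright, _CHALLENGE_MARKERS → CrawlBlocked
→ the failure is recorded in health and the run ends (Le Monde B-3 PARK precedent; the fallback in that case is to
combine with the email-subscription .eml track, [[feedback_antibot_headless_subscription_wall]]).
Schedule = monthly (main.py, day 5 at 07:20 KST). At one issue per month, file_path dedup is enough for the diff.
Manual run: docker exec hyungi_document_server-fastapi-1 python -m workers.ccps_collector
"""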
import asyncio
import hashlib
import re
from datetime import datetime, timezone
from pathlib import Path
from urllib.parse import urljoin, urlparse
from sqlalchemy import select
from core.config import settings
from core.crawl_politeness import (
CrawlBlocked,
CrawlFetchError,
CrawlSkip,
download_via_browser,
fetch_page_via_browser,
)
from core.database import async_session
from core.utils import setup_logger
from models.document import Document
from models.news_source import NewsSource
from models.queue import enqueue_stage
from workers.kosha_collector import _safe_filename
from workers.news_collector import (
_get_or_create_health,
_record_failure,
_record_success,
)
logger = setup_logger("ccps_collector")
_BEACON_URL = "https://www.aiche.org/ccps/resources/process-safety-beacon"
_SOURCE_NAME = "CCPS Process Safety Beacon"
_MAX_PDFS_PER_RUN = 10 # monthly pace is 1-2 PDFs (EN/KO); the cap prevents a runaway if the page structure is misread
def _beacon_pdf_links(html_text: str, base_url: str) -> list[str]:
    """Beacon-related PDF links: only anchors whose href or anchor text contains 'beacon' (conservative).
    If some PDFs miss the filter, the caller surfaces them in the log (to tune the pattern after the first live run).
    """
    seen: set[str] = set()
    out: list[str] = []
    for m in re.finditer(
        r'<a\s+[^>]*href="([^"]+\.pdf(?:\?[^"]*)?)"[^>]*>(.*?)</a>',
        html_text, re.I | re.S,
    ):
        href, text = m.group(1), re.sub(r"<[^>]+>", " ", m.group(2))
        if "beacon" not in href.lower() and "beacon" not in text.lower():
            continue
        absolute = urljoin(base_url, href)
        path = urlparse(absolute).path
        if path not in seen:
            seen.add(path)
            out.append(absolute)
    return out
def _all_pdf_hrefs(html_text: str) -> list[str]:
return sorted({m.group(1) for m in re.finditer(r'href="([^"]+\.pdf(?:\?[^"]*)?)"', html_text, re.I)})
async def _get_or_create_source(session) -> NewsSource:
result = await session.execute(
select(NewsSource).where(NewsSource.name == _SOURCE_NAME)
)
source = result.scalars().first()
if source is None:
source = NewsSource(
name=_SOURCE_NAME, feed_url=_BEACON_URL, feed_type="rss",
fetch_method="page", fulltext_policy="none",
source_channel="crawl", category="Safety", language="en", country="US",
enabled=False, # excluded from the 6h news cycle; this worker polls monthly
)
session.add(source)
await session.flush()
return source
async def _ingest_pdf(session, pdf_url: str) -> bool:
"""One Beacon PDF → save to NAS + create Document + enqueue extract. Returns whether it was new."""
fname = _safe_filename(Path(urlparse(pdf_url).path).name)
rel_path = f"crawl_raw/ccps_beacon/{fname}"
existing = await session.execute(
select(Document).where(Document.file_path == rel_path).limit(1)
)
if existing.scalars().first():
return False
content, content_type = await download_via_browser(pdf_url, referer=_BEACON_URL)
if "pdf" not in content_type.lower() and not content.startswith(b"%PDF"):
raise CrawlSkip(f"PDF 아님 (content-type={content_type[:60]}): {pdf_url}")
dest = Path(settings.nas_mount_path) / rel_path
dest.parent.mkdir(parents=True, exist_ok=True)
dest.write_bytes(content)
doc = Document(
file_path=rel_path,
file_hash=hashlib.sha256(content).hexdigest(),
file_format="pdf",
file_size=len(content),
file_type="immutable",
title=fname.rsplit(".", 1)[0].replace("_", " ").replace("-", " "),
source_channel="crawl",
data_origin="external",
import_source="ccps_beacon",
edit_url=pdf_url,
ai_tags=["Safety/CCPS Beacon"],
extract_meta={"ccps": {"kind": "beacon_pdf"}},
)
session.add(doc)
await session.flush()
await enqueue_stage(session, doc.id, "extract")
logger.info(f"[ccps] Beacon ingest: {rel_path} ({len(content)} bytes)")
return True
async def run() -> None:
"""Monthly entry point. Failures are recorded in health (the circuit makes them visible on the A-8 panel)."""
now = datetime.now(timezone.utc)
async with async_session() as session:
source = await _get_or_create_source(session)
await session.commit()
source_id = source.id
try:
html_text, final_url = await fetch_page_via_browser(_BEACON_URL, profile=None)
links = _beacon_pdf_links(html_text, final_url)
if not links:
others = _all_pdf_hrefs(html_text)
# Zero filter hits may mean the page structure or naming changed; list the PDFs found as a hint for fixing the pattern
raise CrawlFetchError(
f"beacon PDF 0건 (전체 PDF {len(others)}건: {others[:5]})"
)
new_count = 0
for pdf_url in links[:_MAX_PDFS_PER_RUN]:
async with async_session() as session:
try:
if await _ingest_pdf(session, pdf_url):
new_count += 1
await session.commit()
except (CrawlBlocked, CrawlSkip, CrawlFetchError) as e:
await session.rollback()
logger.warning(f"[ccps] PDF 실패 skip ({pdf_url}): {e}")
if len(links) > _MAX_PDFS_PER_RUN:
logger.warning(
f"[ccps] PDF {len(links)}건 중 {_MAX_PDFS_PER_RUN}건만 처리 "
f"(월간 1~2건 가정 초과 — 페이지 구조 확인 필요)"
)
async with async_session() as session:
health = await _get_or_create_health(session, source_id)
_record_success(health, new_count, False, now)
src = await session.get(NewsSource, source_id)
src.last_fetched_at = now
await session.commit()
logger.info(f"[ccps] 완료: 신규 {new_count}건 (링크 {len(links)}건)")
except (CrawlBlocked, CrawlSkip, CrawlFetchError) as e:
# CrawlBlocked signals that the WAF is blocking headless browsers; repeated failures open the circuit (the basis for a PARK decision)
logger.error(f"[ccps] 수집 실패: {type(e).__name__}: {e}")
async with async_session() as session:
health = await _get_or_create_health(session, source_id)
_record_failure(health, str(e) or repr(e), now)
await session.commit()
if __name__ == "__main__":
asyncio.run(run())
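@@ -0,0 +1,390 @@
"""C-2 remainder ① US CSB sitemap diff collector worker (plan crawl-24x7-1, cycle 3).
RSS was discontinued → polling the sitemap.xml lastmod diff is the standard approach (lastmod is reliable because
it is a government site; measured 2026-06-11: 1,307 URLs, investigation report pages are root slugs).
Page body (4-tier, ≥200-character gate) + report PDFs (/assets/, excluding recommendation status summaries) →
reuse the existing extract pipeline (marker/kordoc).
Schedule = weekly (main.py, Monday 06:50 KST):
Only entries whose lastmod is at or after the watermark (selector_override.sitemap_watermark, the same JSONB slot
as the B-3 probe settings), oldest first, up to the cap (40 pages per run). The watermark advances only as far as
what was processed, so the remainder is backfilled incrementally and automatically (KOSHA GUIDE cap pattern).
The unprocessed remainder beyond the cap is logged every run (no silent cap). A diff count > sanity (300) raises
a visible warning: suspected sitemap corruption or excessive lastmod updates.
Initial bulk run (cap lifted, several hours with politeness; use docker exec -d, and do not redeploy the same
service while it runs, a sibling pitfall of [[feedback_docker_exec_orphan_kill]]):
docker exec hyungi_document_server-fastapi-1 \
python -m workers.csb_collector --limit 3 # for verification
docker exec -d hyungi_document_server-fastapi-1 \
python -m workers.csb_collector --bulk # everything
Idempotent: pages = dedup on edit_url (normalized) + file_hash (first-wins: the body of a page whose lastmod was
updated is not re-ingested; the new PDFs that are the substance of the update are ingested via per-PDF dedup).
PDFs = file_path dedup. The watermark boundary is re-queried with >=, so the boundary page is re-fetched once and
then caught by dedup (measured lastmod values are distinct, so re-fetches do not accumulate).
"""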
import argparse
import asyncio
import hashlib
import random
import re
from datetime import datetime, timezone
from pathlib import Path
from urllib.parse import urljoin, urlparse
import httpx
from sqlalchemy import select
from core.config import settings
from core.crawl_politeness import (
CRAWL_UA,
CrawlBlocked,
CrawlFetchError,
CrawlSkip,
fetch_page,
)
from core.database import async_session
from core.utils import setup_logger
from models.document import Document
from models.news_source import NewsSource
from models.queue import enqueue_stage
from workers.fulltext_worker import (
_WEB_MIN_BODY_LEN,
_extract_body,
_raw_html_path,
_save_raw_html,
_strip_article_footer,
)
from workers.kosha_collector import _safe_filename
from workers.news_collector import (
FeedError,
_get_or_create_health,
_normalize_url,
_record_failure,
_record_success,
)
from workers.static_corpus_ingest import _page_title
logger = setup_logger("csb_collector")
_SITEMAP_URL = "https://www.csb.gov/sitemap.xml"
_SOURCE_NAME = "US CSB 사고조사보고서"
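_RUN_PAGE_CAP = 40 # per-run cap for the weekly job; the remainder carries over automatically because the watermark does not advance past it
_DIFF_SANITY = 300 # a weekly diff above this suggests sitemap corruption or excessive lastmod updates (card C-2)
_MAX_PDF_BYTES = 50 * 1024 * 1024
_PDF_DELAY = (2.0, 5.0) # delay between consecutive PDF downloads from the same domain (same as kosha _DOWNLOAD_DELAY)
# Sections with no value for the text corpus, or administrative ones, keyed by the first path segment
# (investigation reports and news releases are root slugs, so they are unaffected; /news/ and /investigations/ are excluded because they hold only listing pages).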
_SKIP_FIRST_SEGMENT = {
"videos", "photos", "events", "members", "disclaimers", "media-room",
"about-the-csb", "about-us", "foia", "news", "investigations",
"site-map", "subscribe", "unsubscribe", "optout", "test",
"privacy-policy", "vulnerability-disclosure-policy", "en-espanol",
"newsletter", "recom-stats", "500.aspx", "documents", "records-details",
}
def _parse_sitemap(xml_text: str) -> list[tuple[str, datetime]]:
    """List of (url, lastmod) pairs. Entries with a missing or unparseable lastmod are dropped (they have no diff axis)."""
    out: list[tuple[str, datetime]] = []
    for m in re.finditer(
        r"<url>\s*<loc>([^<]+)</loc>\s*<lastmod>([^<]+)</lastmod>", xml_text
    ):
        try:
            lastmod = datetime.fromisoformat(m.group(2).strip())
        except ValueError:
            continue
        if lastmod.tzinfo is None:
            lastmod = lastmod.replace(tzinfo=timezone.utc)
        out.append((m.group(1).strip(), lastmod))
    return out
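The sitemap parse can be exercised against a tiny document. This is a sketch under stated assumptions: `SAMPLE_XML` is made up, and the function repeats the regex approach shown above rather than using a full XML parser. It illustrates the three outcomes: a timezone-aware lastmod is kept as-is, a date-only lastmod is treated as UTC midnight, and entries without a usable lastmod are dropped.

```python
import re
from datetime import datetime, timezone


def parse_sitemap(xml_text: str) -> list:
    """Return (url, lastmod) pairs; skip entries whose lastmod is missing or unparseable."""
    out = []
    for m in re.finditer(
        r"<url>\s*<loc>([^<]+)</loc>\s*<lastmod>([^<]+)</lastmod>", xml_text
    ):
        try:
            lastmod = datetime.fromisoformat(m.group(2).strip())
        except ValueError:
            continue
        if lastmod.tzinfo is None:
            # Naive timestamps are assumed to be UTC.
            lastmod = lastmod.replace(tzinfo=timezone.utc)
        out.append((m.group(1).strip(), lastmod))
    return out


# A made-up sitemap: two usable entries, one bad lastmod, one without lastmod.
SAMPLE_XML = """
<urlset>
  <url><loc>https://example.org/a</loc><lastmod>2026-06-01T10:00:00+00:00</lastmod></url>
  <url><loc>https://example.org/b</loc><lastmod>2026-06-02</lastmod></url>
  <url><loc>https://example.org/c</loc><lastmod>not-a-date</lastmod></url>
  <url><loc>https://example.org/d</loc></url>
</urlset>
"""

if __name__ == "__main__":
    for url, lastmod in parse_sitemap(SAMPLE_XML):
        print(url, lastmod.isoformat())
```

One caveat worth noting: `datetime.fromisoformat` only accepts a trailing `Z` from Python 3.11 onward, so on older interpreters such entries would fall into the skipped branch.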
def _should_skip(url: str) -> bool:
    path = urlparse(url).path.strip("/")
    if not path:
        return True  # home page
    return path.split("/", 1)[0].lower() in _SKIP_FIRST_SEGMENT
def _pdf_links(html_text: str, base_url: str) -> list[str]:
    """Report PDFs on the page, excluding /assets/recommendation/ (mostly status-change summaries).
    The cache-buster query (?17346) stays in the download URL; dedup and file names use the path.
    """
    seen: set[str] = set()
    out: list[str] = []
    for m in re.finditer(r'href="([^"]+\.pdf(?:\?[^"]*)?)"', html_text, re.I):
        absolute = urljoin(base_url, m.group(1))
        path = urlparse(absolute).path
        if "/assets/recommendation/" in path.lower():
            continue
        if (urlparse(absolute).hostname or "").lower() != "www.csb.gov":
            continue
        if path not in seen:
            seen.add(path)
            out.append(absolute)
    return out
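The path-based dedup is easiest to see with two links that differ only in their cache-buster query. The sketch below assumes a hypothetical host (`ALLOWED_HOST`) and made-up markup; it keeps the first full URL for downloading while using only the path as the dedup key, and it drops both the excluded section and off-site links.

```python
import re
from urllib.parse import urljoin, urlparse

ALLOWED_HOST = "www.example.org"  # hypothetical host, for illustration only


def pdf_links(html_text: str, base_url: str) -> list:
    """Collect same-host PDF URLs, deduplicated by path (query strings ignored for dedup)."""
    seen = set()
    out = []
    for m in re.finditer(r'href="([^"]+\.pdf(?:\?[^"]*)?)"', html_text, re.I):
        absolute = urljoin(base_url, m.group(1))
        parsed = urlparse(absolute)
        if "/assets/recommendation/" in parsed.path.lower():
            continue  # excluded section
        if (parsed.hostname or "").lower() != ALLOWED_HOST:
            continue  # off-site link
        if parsed.path not in seen:
            seen.add(parsed.path)
            out.append(absolute)  # keep the query for the actual download
    return out


SAMPLE = (
    '<a href="/assets/1/6/report.pdf?17346">report</a>'
    '<a href="/assets/1/6/report.pdf?99999">same report, new cache-buster</a>'
    '<a href="/assets/recommendation/status.pdf">status summary</a>'
    '<a href="https://other.example.com/x.pdf">off-site</a>'
)

if __name__ == "__main__":
    print(pdf_links(SAMPLE, "https://www.example.org/some-investigation"))
```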
async def _download_pdf(url: str, dest: Path) -> int:
"""Download a PDF with a size cap and a delay between consecutive downloads (politeness assumes sequential execution)."""
await asyncio.sleep(random.uniform(*_PDF_DELAY))
async with httpx.AsyncClient(timeout=60, follow_redirects=True) as client:
resp = await client.get(url, headers={"User-Agent": CRAWL_UA})
if resp.status_code != 200:
raise FeedError(f"PDF 다운로드 {resp.status_code}: {url}")
if len(resp.content) > _MAX_PDF_BYTES:
raise FeedError(f"PDF 크기 초과 ({len(resp.content)} bytes): {url}")
dest.parent.mkdir(parents=True, exist_ok=True)
dest.write_bytes(resp.content)
return len(resp.content)
async def _get_or_create_source(session) -> NewsSource:
result = await session.execute(
select(NewsSource).where(NewsSource.name == _SOURCE_NAME)
)
source = result.scalars().first()
if source is None:
source = NewsSource(
name=_SOURCE_NAME, feed_url=_SITEMAP_URL, feed_type="rss",
fetch_method="sitemap+page", fulltext_policy="none",
source_channel="crawl", category="Safety", language="en", country="US",
enabled=False, # excluded from the 6h news cycle; this worker polls weekly
)
session.add(source)
await session.flush()
return source
def _watermark(source: NewsSource) -> datetime | None:
    raw = (source.selector_override or {}).get("sitemap_watermark")
    if not raw:
        return None
    try:
        return datetime.fromisoformat(raw)
    except ValueError:
        return None
def _set_watermark(source: NewsSource, value: datetime) -> None:
    # Reassign a new dict so the JSONB change is detected (same convention as fulltext_worker._set_fulltext_meta)
    cfg = dict(source.selector_override or {})
    cfg["sitemap_watermark"] = value.isoformat()
    source.selector_override = cfg
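The copy-then-reassign convention in the watermark helpers can be illustrated without a database. In this sketch `FakeSource` is a hypothetical stand-in for the ORM row, and the helper names are illustrative. The point is that the setter builds a new dict and assigns it to the attribute instead of mutating the stored dict in place, which is the pattern that attribute-level change tracking can observe.

```python
from datetime import datetime, timezone
from typing import Optional


class FakeSource:
    """Hypothetical stand-in for the ORM row; only selector_override matters here."""

    def __init__(self) -> None:
        self.selector_override = None


def get_watermark(source) -> Optional[datetime]:
    """Read the stored watermark; missing or malformed values yield None."""
    raw = (source.selector_override or {}).get("sitemap_watermark")
    if not raw:
        return None
    try:
        return datetime.fromisoformat(raw)
    except ValueError:
        return None


def set_watermark(source, value: datetime) -> None:
    """Store the watermark by assigning a fresh dict (never mutate in place)."""
    cfg = dict(source.selector_override or {})
    cfg["sitemap_watermark"] = value.isoformat()
    source.selector_override = cfg


if __name__ == "__main__":
    src = FakeSource()
    src.selector_override = {"probe": True}
    set_watermark(src, datetime(2026, 6, 11, tzinfo=timezone.utc))
    print(src.selector_override, get_watermark(src))
```

Other keys in the same slot (here the made-up `probe` flag) survive the update because the new dict starts as a copy of the old one.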
async def _ingest_pdf(session, page_slug: str, pdf_url: str) -> bool:
"""One PDF → save to NAS + create Document + enqueue extract. Returns whether it was new."""
fname = _safe_filename(Path(urlparse(pdf_url).path).name)
rel_path = f"crawl_raw/csb/{page_slug}/{fname}"
existing = await session.execute(
select(Document).where(Document.file_path == rel_path).limit(1)
)
if existing.scalars().first():
return False
dest = Path(settings.nas_mount_path) / rel_path
size = await _download_pdf(pdf_url, dest)
doc = Document(
file_path=rel_path,
file_hash=hashlib.sha256(dest.read_bytes()).hexdigest(),
file_format="pdf",
file_size=size,
file_type="immutable",
title=fname.rsplit(".", 1)[0].replace("_", " "),
source_channel="crawl",
data_origin="external",
import_source="csb_sitemap",
edit_url=pdf_url,
ai_tags=["Safety/CSB/보고서"],
extract_meta={"csb": {"page_slug": page_slug, "kind": "report_pdf"}},
)
session.add(doc)
await session.flush()
await enqueue_stage(session, doc.id, "extract")
logger.info(f"[csb] PDF ingest: {rel_path} ({size} bytes)")
return True
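async def _ingest_url(session, source: NewsSource, url: str, lastmod: datetime) -> dict:
"""One changed URL: fetch the page → scan all PDFs (deduplicated individually) + ingest the body if it is new.
The PDF scan always runs, even on a revisit (lastmod update), because the substance of an update
(for example a newly added final report) often arrives as a PDF, and that is the core value.
"""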
counts = {"page": 0, "pdf": 0, "skip": 0}
try:
html_text, final_url = await fetch_page(url)
except (CrawlBlocked, CrawlSkip, CrawlFetchError) as e:
logger.warning(f"[csb] fetch failed, skipping: {url} ({type(e).__name__}: {e})")
counts["skip"] = 1
return counts
page_slug = _safe_filename(urlparse(url).path.strip("/").split("/")[-1] or "root")
for pdf_url in _pdf_links(html_text, final_url):
try:
if await _ingest_pdf(session, page_slug, pdf_url):
counts["pdf"] += 1
except FeedError as e:
logger.warning(f"[csb] PDF 실패 skip ({pdf_url}): {e}")
# Page body: first-wins (if it already exists, the body is not re-ingested)
normalized_url = _normalize_url(url)
page_hash = hashlib.sha256(f"csb-page|{normalized_url}".encode()).hexdigest()[:32]
existing = await session.execute(
select(Document).where(
(Document.file_hash == page_hash)
| (Document.edit_url.in_([normalized_url, url]))
).limit(1)
)
if existing.scalars().first():
return counts
body, engine, engine_ver = _extract_body(html_text)
if not engine:
logger.info(f"[csb] 본문 부족 — 페이지 비적재 (PDF 만): {url}")
return counts
clean_body = _strip_article_footer(body.replace("\x00", ""))
if len(clean_body) < _WEB_MIN_BODY_LEN:
return counts
now = datetime.now(timezone.utc)
raw_path = _raw_html_path(source.id, page_hash, now)
raw_saved = True
try:
_save_raw_html(raw_path, html_text)
except OSError as e:
raw_saved = False
logger.error(f"[csb] 원본 보존 실패 (ingest 는 진행): {e}")
title = _page_title(html_text, fallback=page_slug.replace("-", " ")[:90])
doc = Document(
file_path=f"crawl/{_SOURCE_NAME}/{page_hash}",
file_hash=page_hash,
file_format="article",
file_size=0,
file_type="note",
title=title,
extracted_text=f"{title}\n\n{clean_body}",
extracted_at=now,
extractor_version=f"sitemap+page@{engine}",
md_content=clean_body,
md_status="success",
md_extraction_engine=engine,
md_extraction_engine_version=engine_ver,
md_format_version="1.0",
md_generated_at=now,
md_source_hash=hashlib.sha256(html_text.encode("utf-8", errors="replace")).hexdigest(),
md_content_hash=hashlib.sha256(clean_body.encode("utf-8")).hexdigest(),
content_origin="extracted",
source_channel="crawl",
data_origin="external",
edit_url=normalized_url,
review_status="approved",
ai_domain="Safety",
ai_sub_group=_SOURCE_NAME,
ai_tags=["Safety/CSB"],
extract_meta={
"source_id": source.id,
"source_name": _SOURCE_NAME,
"published_at": lastmod.isoformat(),
"fulltext": {
"status": "csb_sitemap",
"engine": engine,
"final_url": final_url,
"raw_html_path": str(raw_path) if raw_saved else None,
"body_chars": len(clean_body),
"resolved_at": now.isoformat(),
},
},
)
doc.file_size = len(doc.extracted_text.encode())
session.add(doc)
await session.flush()
await enqueue_stage(session, doc.id, "summarize")
await enqueue_stage(session, doc.id, "embed")
await enqueue_stage(session, doc.id, "chunk")
counts["page"] = 1
logger.info(f"[csb] page ingest {len(clean_body)}자 ({engine}): {title[:60]}")
return counts
async def run(bulk: bool = False, limit: int = 0) -> None:
"""Weekly entry point (scheduler). bulk and limit are CLI-only."""
now = datetime.now(timezone.utc)
async with async_session() as session:
source = await _get_or_create_source(session)
await session.commit()
source_id = source.id
watermark = _watermark(source)
try:
xml_text, _ = await fetch_page(
_SITEMAP_URL, content_types=("text/xml", "application/xml", "text/html")
)
entries = _parse_sitemap(xml_text)
if not entries:
raise FeedError("sitemap 파싱 0건 — 포맷 변경/부패 의심")
except (CrawlBlocked, CrawlSkip, CrawlFetchError, FeedError) as e:
logger.error(f"[csb] sitemap 수집 실패: {e}")
async with async_session() as session:
health = await _get_or_create_health(session, source_id)
_record_failure(health, str(e) or repr(e), now)
await session.commit()
return
changed = sorted(
(
(url, lastmod) for url, lastmod in entries
if not _should_skip(url) and (watermark is None or lastmod >= watermark)
),
key=lambda pair: pair[1],
)
if watermark is not None and len(changed) > _DIFF_SANITY:
logger.error(
f"[csb] diff {len(changed)} items > sanity {_DIFF_SANITY}: "
f"suspected sitemap corruption or excessive lastmod updates (cap processing continues; needs monitoring)"
)
cap = len(changed) if bulk else _RUN_PAGE_CAP
if limit:
cap = min(cap, limit)
todo, deferred = changed[:cap], max(len(changed) - cap, 0)
logger.info(
f"[csb] sitemap: {len(changed)} of {len(entries)} entries changed, processing {len(todo)}"
+ (f" ({deferred} deferred; carried over automatically because the watermark does not advance past them)" if deferred else "")
)
totals = {"page": 0, "pdf": 0, "skip": 0}
for i, (url, lastmod) in enumerate(todo, 1):
async with async_session() as session:
src = await session.get(NewsSource, source_id)
counts = await _ingest_url(session, src, url, lastmod)
_set_watermark(src, lastmod)
await session.commit()
for k in totals:
totals[k] += counts[k]
if i % 10 == 0:
logger.info(f"[csb] 진행 {i}/{len(todo)} {totals}")
async with async_session() as session:
health = await _get_or_create_health(session, source_id)
_record_success(health, totals["page"] + totals["pdf"], False, now)
src = await session.get(NewsSource, source_id)
src.last_fetched_at = now
await session.commit()
logger.info(f"[csb] 완료: {totals} (변경 {len(changed)}건 중 {len(todo)}건 처리)")
if __name__ == "__main__":
parser = argparse.ArgumentParser(description="CSB sitemap diff 수집")
parser.add_argument("--bulk", action="store_true", help="cap 해제 — 초기 일괄")
parser.add_argument("--limit", type=int, default=0, help="처리 상한 (검증용)")
args = parser.parse_args()
asyncio.run(run(bulk=args.bulk, limit=args.limit))
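The watermark-plus-cap selection in `run()` can be sketched as a pure function. This is an illustration under assumptions, not the worker's code: `select_batch` is a hypothetical helper that folds the per-item watermark updates into a single return value. It shows why the remainder carries over on its own: the watermark only moves to the newest processed lastmod, and the next run re-queries with `>=`, so the boundary entry is seen once more and left to dedup.

```python
from datetime import datetime, timezone


def select_batch(entries, watermark, cap):
    """Pick the oldest `cap` entries at or after the watermark.

    Returns (todo, deferred_count, new_watermark). The watermark only advances
    to the newest processed lastmod, so deferred entries are picked up later.
    """
    changed = sorted(
        (
            (url, lastmod) for url, lastmod in entries
            if watermark is None or lastmod >= watermark
        ),
        key=lambda pair: pair[1],
    )
    todo = changed[:cap]
    deferred = max(len(changed) - cap, 0)
    new_watermark = todo[-1][1] if todo else watermark
    return todo, deferred, new_watermark


def day(d: int) -> datetime:
    """Helper for the demo: a UTC date in June 2026."""
    return datetime(2026, 6, d, tzinfo=timezone.utc)


if __name__ == "__main__":
    entries = [("u5", day(5)), ("u1", day(1)), ("u3", day(3)), ("u2", day(2)), ("u4", day(4))]
    watermark = day(2)
    for run_no in (1, 2, 3):
        todo, deferred, watermark = select_batch(entries, watermark, cap=2)
        print(run_no, [u for u, _ in todo], "deferred:", deferred)
```

With a cap of 2, each run overlaps the previous one by exactly the boundary entry, which matches the docstring's note that distinct lastmod values keep re-fetches from accumulating.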
@@ -231,7 +231,8 @@ async def _run_locked():
_record_success(health, count, status == "not_modified", now)
total += count
except Exception as e:
logger.error(f"[{source.name}] 수집 실패: {e}")
# Guard against exceptions whose str is empty (httpx.ConnectError('')); same rule as the health record
logger.error(f"[{source.name}] 수집 실패: {str(e) or repr(e)}")
source.last_fetched_at = datetime.now(timezone.utc)
_record_failure(health, str(e) or repr(e), now)
@@ -244,6 +245,27 @@ ALLOWED_CONTENT_TYPES = ("application/rss+xml", "application/atom+xml",
"application/xml", "text/xml")
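# Connect retry delays. Additional MOEL measurement (2026-06-11): drops are random per connection, so a retry
# after 1.5s can fail again in a row (while another connection right after succeeds immediately) → raised to 2 retries.
_CONNECT_RETRY_DELAYS = (2.0, 5.0)
async def _get_with_connect_retry(client, url: str):
    """Retry only connection-layer (TCP/TLS) errors, at most twice. HTTP status errors are not retried (the caller's branching is preserved).
    MOEL measurement (2026-06-11): a security appliance on the government site intermittently drops TLS handshakes
    per connection (curl rc=35; an immediate retry succeeds), so feed collection, which fetches once per cycle,
    accumulated ConnectError('') failures and opened the circuit. Persistent outages are still left to the circuit.
    """
    for delay in _CONNECT_RETRY_DELAYS:
        try:
            return await client.get(url)
        except (httpx.ConnectError, httpx.ConnectTimeout) as e:
            logger.info(f"Connect error, retrying in {delay}s ({url.split('?')[0]}): {e!r}")
            await asyncio.sleep(delay)
    return await client.get(url)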
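The retry shape above (a fixed tuple of delays, then one final attempt whose error propagates) can be checked without any network. The sketch below is an assumption-laden stand-in: `ConnectFailure` replaces the real connection exceptions, `FlakyFetcher` is a fake that fails a set number of times, and the delays are shortened so the demo runs quickly.

```python
import asyncio


class ConnectFailure(Exception):
    """Hypothetical stand-in for a connection-layer error (TCP/TLS)."""


RETRY_DELAYS = (0.01, 0.02)  # shortened for the demo; two delays = two retries


async def get_with_connect_retry(fetch, url: str, delays=RETRY_DELAYS):
    """Call fetch(url); retry only on ConnectFailure, once per delay, then try a last time."""
    for delay in delays:
        try:
            return await fetch(url)
        except ConnectFailure:
            await asyncio.sleep(delay)
    # Final attempt: any error here propagates to the caller.
    return await fetch(url)


class FlakyFetcher:
    """Fake fetcher that raises ConnectFailure for the first `failures` calls."""

    def __init__(self, failures: int) -> None:
        self.failures = failures
        self.calls = 0

    async def __call__(self, url: str) -> str:
        self.calls += 1
        if self.calls <= self.failures:
            raise ConnectFailure(url)
        return f"ok:{url}"


if __name__ == "__main__":
    fetcher = FlakyFetcher(failures=2)
    print(asyncio.run(get_with_connect_retry(fetcher, "feed")), fetcher.calls)
```

Two delays give three attempts in total. Other exception types are not caught, which mirrors the intent that only connection-level failures are retried and everything else reaches the caller unchanged.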
async def _is_portal_duplicate(session, title: str) -> bool:
"""A-6 second-stage dedup: the case where a portal repost and the original are ingested twice under different URLs.
@@ -271,6 +293,15 @@ async def _enqueue_processing(session, doc: Document, source: NewsSource, pub_dt
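fulltext_worker enqueues after the promotion (or demotion) is settled (this avoids the ordering trap where the RSS
summary is summarized first and, when the full text arrives, summarize_worker's 'summary already exists, skip' blocks it).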
"""
if source.fetch_method == "signal-only":
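# B-4: signals are indexed for search only (embed/chunk). fulltext/summarize are never enqueued:
# even if the registry is misconfigured with fulltext_policy='page', no page is fetched (defense first).
# Skipping the summary LLM means zero load on the Mac mini. Digests and briefings exclude documents
# with ai_summary IS NULL from the start (services/digest/loader.py), so signal documents drop out naturally.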
if source.source_channel == "crawl" or (datetime.now(timezone.utc) - pub_dt).days <= 30:
await enqueue_stage(session, doc.id, "embed")
await enqueue_stage(session, doc.id, "chunk")
return
if source.fulltext_policy == "page" and doc.edit_url:
await enqueue_stage(session, doc.id, "fulltext")
return
@@ -286,6 +317,28 @@ async def _enqueue_processing(session, doc: Document, source: NewsSource, pub_dt
await enqueue_stage(session, doc.id, "chunk")
def _entry_body(source: NewsSource, entry, summary: str) -> tuple[str, str]:
    """(body, extractor_version): selects the body by policy. Pure function (covered by shape tests).
    signal-only: the feed summary is the body, with no truncation (preserves arXiv abstracts of 1.3-1.6K
    characters; a 1000-character cap would cut off the tail of the abstract). No page fetch under any circumstances (B-4).
    feed-full: trust the feed body as full text only for sources where it really is the full text (truncation and
    ad insertion are common, so never store an ordinary source's summary/content:encoded as if it were full text; A-6).
    """
    if source.fetch_method == "signal-only":
        body = _clean_html(
            entry.get("summary", "") or entry.get("description", ""), max_len=None
        )
        return (body or summary), "rss-signal"
    if source.fulltext_policy == "feed-full":
        content_list = entry.get("content") or []
        raw_body = content_list[0].get("value", "") if content_list else ""
        full_body = _clean_html(raw_body or entry.get("summary", ""), max_len=None)
        if len(full_body) > len(summary):
            return full_body, "rss-feed-full"
    return summary, "rss"
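Because the body selection is described as a pure function, its three outcomes are easy to pin down with plain objects. The following sketch makes two simplifying assumptions: the source is a `SimpleNamespace` rather than the ORM model, and HTML cleaning is replaced by `str.strip()` (the real helper is not reproduced here). It checks that `signal-only` wins even when the fulltext policy is misconfigured, and that `feed-full` falls back to the summary when the feed body is not longer.

```python
from types import SimpleNamespace


def entry_body(source, entry: dict, summary: str) -> tuple:
    """Return (body, extractor_version) according to the source's policy."""
    if source.fetch_method == "signal-only":
        # Assumption: str.strip() stands in for the real HTML cleaning step.
        body = (entry.get("summary", "") or entry.get("description", "")).strip()
        return (body or summary), "rss-signal"
    if source.fulltext_policy == "feed-full":
        content_list = entry.get("content") or []
        raw_body = content_list[0].get("value", "") if content_list else ""
        full_body = (raw_body or entry.get("summary", "")).strip()
        if len(full_body) > len(summary):
            return full_body, "rss-feed-full"
    return summary, "rss"


if __name__ == "__main__":
    signal = SimpleNamespace(fetch_method="signal-only", fulltext_policy="page")
    full = SimpleNamespace(fetch_method="rss", fulltext_policy="feed-full")
    plain = SimpleNamespace(fetch_method="rss", fulltext_policy="none")
    print(entry_body(signal, {"summary": "a long abstract"}, "short"))
    print(entry_body(full, {"content": [{"value": "the full article body"}]}, "short"))
    print(entry_body(plain, {"summary": "ignored"}, "short"))
```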
def _build_extract_meta(source: NewsSource, pub_dt: datetime) -> dict:
"""fulltext_worker / 패널이 쓰는 출처 메타 (documents 에 source FK 가 없어 여기 기록)."""
return {
@@ -349,7 +402,7 @@ async def _fetch_rss(session, source: NewsSource) -> tuple[int, str]:
async with httpx.AsyncClient(
timeout=10, follow_redirects=False, headers=headers
) as client:
resp = await client.get(source.feed_url)
resp = await _get_with_connect_retry(client, source.feed_url)
# Handle 304 before the redirect handling: httpx's is_redirect is True for all 3xx (including 304),
# so mistaking a 304 for a redirect would re-request the same URL, which has no location, and
@@ -415,17 +468,8 @@ async def _fetch_rss(session, source: NewsSource) -> tuple[int, str]:
if not summary:
summary = title
# A-6: trust the feed body as full text only for feed-full sources (truncation and ad insertion are common,
# so never store an ordinary source's summary/content:encoded as if it were the full text)
body = summary
is_feed_full = False
if source.fulltext_policy == "feed-full":
content_list = entry.get("content") or []
raw_body = content_list[0].get("value", "") if content_list else ""
full_body = _clean_html(raw_body or entry.get("summary", ""), max_len=None)
if len(full_body) > len(summary):
body = full_body
is_feed_full = True
# Select the body by policy: signal-only (untruncated summary) / feed-full (full feed text) / default (summary)
body, extractor_version = _entry_body(source, entry, summary)
link = entry.get("link", "")
@@ -469,7 +513,7 @@ async def _fetch_rss(session, source: NewsSource) -> tuple[int, str]:
title=title,
extracted_text=f"{title}\n\n{body}",
extracted_at=datetime.now(timezone.utc),
extractor_version="rss-feed-full" if is_feed_full else "rss",
extractor_version=extractor_version,
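# article = text-native (body = extracted_text). The markdown stage is never enqueued, so
# the default 'pending' would never converge → polluted backlog metrics + a bloated md_status_pending partial
# index. Set the terminal 'skipped' explicitly at creation time (not a conversion target).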
@@ -172,6 +172,61 @@ export async function api<T = unknown>(
return res.json();
}
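/**
 * Raw fetch helper: for endpoints such as SSE/streaming where parsing the whole body as JSON does not fit.
 *
 * Shares the same policy as api<T>():
 * - attaches the access token automatically
 * - 401 → one retry after refresh (on failure, handleTokenRefresh forces a logout)
 * - sets Content-Type automatically for a JSON body
 *
 * Difference: returns the Response as-is (checking the status and consuming the body are the caller's responsibility).
 * PR-Eid-Chat: the `/api/eid/chat` SSE stream is the first consumer. Additive export only;
 * the behavior of the existing api()/uploadFile() is unchanged.
 */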
export async function apiFetchRaw(
path: string,
options: RequestInit = {},
): Promise<Response> {
const headers: Record<string, string> = {
...(options.headers as Record<string, string> || {}),
};
if (accessToken) {
headers['Authorization'] = `Bearer ${accessToken}`;
}
if (options.body && !(options.body instanceof FormData)) {
headers['Content-Type'] = 'application/json';
}
const res = await fetch(`${API_BASE}${path}`, {
...options,
headers,
credentials: 'include',
});
// 401 → try refresh once (same policy as api(), excluding auth endpoints)
const isAuthEndpoint = path.startsWith('/auth/login') || path.startsWith('/auth/refresh');
if (res.status === 401 && accessToken && !isAuthEndpoint) {
try {
await handleTokenRefresh();
} catch {
// refresh failed: handleTokenRefresh handles the forced logout (redirect).
// Consistent with api(), return the original 401 Response as-is so the caller
// does not mistake it for a network error (the body is unconsumed, so it can be reused).
return res;
}
headers['Authorization'] = `Bearer ${accessToken}`;
return fetch(`${API_BASE}${path}`, {
...options,
headers,
credentials: 'include',
});
}
return res;
}
/**
* Upload-only helper, XMLHttpRequest-based.
*
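The retry rule of `apiFetchRaw` can be isolated as a pure predicate, which makes the policy easier to reason about. This is a minimal sketch, not code from `$lib/api`: `shouldRetryAfterRefresh` is a hypothetical name, and it only mirrors the condition `res.status === 401 && accessToken && !isAuthEndpoint` shown in the diff.

```typescript
// Hypothetical helper (not part of $lib/api): mirrors the 401-retry condition of apiFetchRaw.
// A request is retried once after a token refresh only when:
//   - the response status is 401,
//   - an access token was attached (an anonymous 401 cannot be fixed by refreshing),
//   - the path is not itself an auth endpoint (avoids refresh loops).
function shouldRetryAfterRefresh(
  status: number,
  accessToken: string | null,
  path: string,
): boolean {
  const isAuthEndpoint =
    path.startsWith('/auth/login') || path.startsWith('/auth/refresh');
  return status === 401 && Boolean(accessToken) && !isAuthEndpoint;
}
```

A caller such as the chat page would see the original 401 `Response` whenever this predicate is false or the refresh itself fails, so the status check stays with the caller.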
+11 -1
@@ -2,7 +2,7 @@
import { page } from '$app/stores';
import { goto } from '$app/navigation';
import { api } from '$lib/api';
import { ChevronRight, ChevronDown, FolderOpen, FolderTree, Inbox, Clock, Mail, Scale, StickyNote, GraduationCap, CalendarCheck } from 'lucide-svelte';
import { ChevronRight, ChevronDown, FolderOpen, FolderTree, Inbox, Clock, Mail, Scale, StickyNote, GraduationCap, CalendarCheck, MessageCircle } from 'lucide-svelte';
let tree = $state([]);
let loading = $state(true);
@@ -229,6 +229,16 @@
공부
</span>
</a>
<a
href="/chat"
class="flex items-center justify-between px-3 py-2 rounded-md text-sm transition-colors
{$page.url.pathname.startsWith('/chat') ? 'bg-accent/15 text-accent' : 'text-text hover:bg-surface'}"
>
<span class="flex items-center gap-2">
<MessageCircle size={16} />
이드
</span>
</a>
<a
href="/inbox"
class="flex items-center justify-between px-3 py-2 rounded-md text-sm text-text hover:bg-surface transition-colors"
@@ -0,0 +1,31 @@
<!--
EidEvidenceCard: evidence card for Eid chat deep (search) answers (ds-eid-ask-absorb P1).
ReactResult.sources = {id, doc_id, title, score} (no citation number n, unlike the Citation in /ask),
so numbering is order-based ([1],[2]...). Stage-1 card = title, source, score (snippets are not in react_loop
_result_payload items_src; a stage-2 candidate). A collapsible <details> preserves the chat flow.
Design tokens only (CLAUDE.md lint:tokens).
-->
<script lang="ts">
type EidSource = { id?: number; doc_id?: number; title?: string; score?: number };
let { sources, partial = false }: { sources: EidSource[]; partial?: boolean } = $props();
</script>
{#if sources.length}
<details class="mt-2 rounded-lg border border-default bg-surface text-xs max-w-[85%] sm:max-w-[75%]">
<summary class="cursor-pointer px-3 py-2 text-dim hover:text-text select-none font-semibold">
근거 {sources.length}{partial ? ' · 부분 답변 (확정 근거 부족)' : ''}
</summary>
<ul class="px-3 pb-2.5 flex flex-col gap-1.5">
{#each sources as src, i (src.id ?? i)}
<li class="flex items-start gap-2">
<span class="text-accent font-bold shrink-0">[{i + 1}]</span>
<span class="flex-1 min-w-0 text-text break-words">{src.title || `문서 ${src.doc_id ?? '?'}`}</span>
{#if typeof src.score === 'number'}
<span class="text-faint shrink-0 tabular-nums">{src.score.toFixed(2)}</span>
{/if}
</li>
{/each}
</ul>
</details>
{/if}
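The row rules of the card (order-based numbering, title fallback, two-decimal score) can be sketched as a plain function. This is an illustrative sketch only: `formatEvidenceRows` is a hypothetical helper that does not exist in the component, and the text layout of each row is an assumption made for the example.

```typescript
type EidSource = { id?: number; doc_id?: number; title?: string; score?: number };

// Hypothetical text rendering of the card rows (the real component renders markup):
//   - numbering is order-based ([1], [2], ...), since sources carry no citation number,
//   - the title falls back to `문서 <doc_id>`, or `문서 ?` when doc_id is missing too,
//   - the score is appended with two decimals only when it is a number.
function formatEvidenceRows(sources: EidSource[]): string[] {
  return sources.map((src, i) => {
    const title = src.title || `문서 ${src.doc_id ?? '?'}`;
    const score = typeof src.score === 'number' ? ` ${src.score.toFixed(2)}` : '';
    return `[${i + 1}] ${title}${score}`;
  });
}
```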
+8
@@ -0,0 +1,8 @@
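/**
* Shared constants for the Eid chat page (/chat) (PR-Eid-Chat).
*
* The localStorage history key is shared by the chat page (routes/chat/+page.svelte) for save/restore and by
* logout (stores/auth.ts) for history removal
* (consistent with the no-body-logging posture: history is removed on logout).
*/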
export const EID_CHAT_STORAGE_KEY = 'eid_chat:v1';
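Sharing one key constant is what lets logout delete exactly what the chat page saved. The sketch below illustrates that contract under stated assumptions: `KeyValueStorage`, `saveHistory`, `clearHistoryOnLogout`, and `createMemoryStorage` are hypothetical names, and the in-memory storage stands in for `window.localStorage` so the example runs outside a browser.

```typescript
const EID_CHAT_STORAGE_KEY = 'eid_chat:v1';

// Minimal subset of the Web Storage API used here (an assumption for testability).
interface KeyValueStorage {
  getItem(key: string): string | null;
  setItem(key: string, value: string): void;
  removeItem(key: string): void;
}

// Chat page side: save history under the shared key.
function saveHistory(storage: KeyValueStorage, messages: string[]): void {
  storage.setItem(EID_CHAT_STORAGE_KEY, JSON.stringify({ messages }));
}

// Logout side: remove the history. A storage failure is swallowed and reported
// as false, because it must not block the logout itself.
function clearHistoryOnLogout(storage: KeyValueStorage): boolean {
  try {
    storage.removeItem(EID_CHAT_STORAGE_KEY);
    return true;
  } catch {
    return false;
  }
}

// In-memory stand-in for window.localStorage.
function createMemoryStorage(): KeyValueStorage {
  const data = new Map<string, string>();
  return {
    getItem: (key) => data.get(key) ?? null,
    setItem: (key, value) => {
      data.set(key, value);
    },
    removeItem: (key) => {
      data.delete(key);
    },
  };
}
```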
+9
@@ -1,5 +1,6 @@
import { writable } from 'svelte/store';
import { api, setAccessToken } from '$lib/api';
import { EID_CHAT_STORAGE_KEY } from '$lib/eidChat';
interface User {
id: number;
@@ -39,6 +40,14 @@ export async function logout() {
setAccessToken(null);
user.set(null);
isAuthenticated.set(false);
// Consistent with the no-body-logging posture: on logout, also remove the Eid chat history from the browser
if (typeof window !== 'undefined') {
try {
window.localStorage.removeItem(EID_CHAT_STORAGE_KEY);
} catch {
// A failure to remove the history does not block logout itself
}
}
}
export async function tryRefresh() {
+3 -1
@@ -3,7 +3,7 @@
import { browser } from '$app/environment';
import { page } from '$app/stores';
import { goto } from '$app/navigation';
import { Menu, EllipsisVertical, ChevronDown, FileText, Newspaper, HelpCircle, StickyNote, Inbox, PanelLeft } from 'lucide-svelte';
import { Menu, EllipsisVertical, ChevronDown, FileText, Newspaper, HelpCircle, StickyNote, Inbox, PanelLeft, MessageCircle } from 'lucide-svelte';
import { isAuthenticated, user, tryRefresh, logout } from '$lib/stores/auth';
import { toasts, removeToast } from '$lib/stores/toast';
import { refresh as refreshPublicConfig } from '$lib/stores/config';
@@ -140,6 +140,7 @@
</div>
<a href="/ask" class="px-3 py-1.5 rounded-md text-sm font-semibold transition-colors {isActive('/ask') ? 'text-accent bg-accent/12' : 'text-dim hover:text-text hover:bg-surface'}">질문</a>
<a href="/chat" class="px-3 py-1.5 rounded-md text-sm font-semibold transition-colors {isActive('/chat') ? 'text-accent bg-accent/12' : 'text-dim hover:text-text hover:bg-surface'}">이드</a>
<SystemStatusDot />
</div>
@@ -178,6 +179,7 @@
<a href="/documents" aria-current={docsActive ? 'page' : undefined} class="flex-1 flex flex-col items-center justify-center gap-1 py-2 text-[10px] font-semibold transition-colors {docsActive ? 'text-accent' : 'text-dim'}"><FileText size={18} strokeWidth={1.9} /> 문서</a>
<a href="/news" aria-current={newsActive ? 'page' : undefined} class="flex-1 flex flex-col items-center justify-center gap-1 py-2 text-[10px] font-semibold transition-colors {newsActive ? 'text-accent' : 'text-dim'}"><Newspaper size={18} strokeWidth={1.9} /> 뉴스</a>
<a href="/ask" aria-current={isActive('/ask') ? 'page' : undefined} class="flex-1 flex flex-col items-center justify-center gap-1 py-2 text-[10px] font-semibold transition-colors {isActive('/ask') ? 'text-accent' : 'text-dim'}"><HelpCircle size={18} strokeWidth={1.9} /> 질문</a>
<a href="/chat" aria-current={isActive('/chat') ? 'page' : undefined} class="flex-1 flex flex-col items-center justify-center gap-1 py-2 text-[10px] font-semibold transition-colors {isActive('/chat') ? 'text-accent' : 'text-dim'}"><MessageCircle size={18} strokeWidth={1.9} /> 이드</a>
<a href="/memos" aria-current={isActive('/memos') ? 'page' : undefined} class="flex-1 flex flex-col items-center justify-center gap-1 py-2 text-[10px] font-semibold transition-colors {isActive('/memos') ? 'text-accent' : 'text-dim'}"><StickyNote size={18} strokeWidth={1.9} /> 메모</a>
<button onclick={() => ui.openDrawer('sidebar')} class="flex-1 flex flex-col items-center justify-center gap-1 py-2 text-[10px] font-semibold text-dim"><Menu size={18} strokeWidth={1.9} /> 더보기</button>
</nav>
+751
@@ -0,0 +1,751 @@
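<!--
/chat: Eid chat surface (PR-Eid-Chat).
Surface grammar: the page identity is "이드" (Eid). Model names, machine names, and aliases are not exposed
(persona model-agnostic principle; the protocol layer follows it too: the model field of the SSE payload
is replaced with the mode value on the server, and usage is removed).
The client sends only mode ('daily'|'deep'); alias mapping is the server's (/api/eid/chat) responsibility.
- Modes: segmented toggle between 일상 (daily) and 심층 (deep). Deep = long-form, heavy questions;
if the engine is asleep it starts automatically (up to ~1 minute the first time). Machine-neutral wording only.
- Streaming: POST /api/eid/chat → SSE. api<T>() is JSON-only, so raw fetch
(apiFetchRaw: token attachment + shared one-time 401 refresh) is used. A line buffer splits on chunk
boundaries; only "data:" lines are read, [DONE] terminates, and choices[0].delta.content is accumulated
(both fixtures, 26B with a tool_calls array and 27B with reasoning/logprobs null, are handled
identically by reading content only).
- Errors: error_reason mapping (warming / editor_busy / upstream_cold /
macbook_unavailable / substrate_degraded / otherwise detail). Automatic fallback is
forbidden: never switch to another mode automatically. If the stream is interrupted midway, keep the
received part and show a notice.
- Wait display (before the first byte): an elapsed timer updated every second, plus GET /api/eid/status
once after 3 seconds and then every 8 seconds (failures are silently ignored and never block the feature), to
distinguish "waiting" from "broken" honestly. If daily.busy=true, show a queued notice. After 15 seconds in
daily mode, show the [심층으로 전환] (switch to deep) / [취소] (cancel) buttons; switching happens only on an
explicit click (so it does not violate the no-automatic-fallback policy). On first byte or stream end, the
timer and polling are cleaned up immediately.
- History: localStorage `eid_chat:v1` (the key constant lives in $lib/eidChat and is shared with removal on logout).
The sent payload is capped at the last 20 turns (40 messages).
- Input limit: 8,000 characters per message, blocked client-side first (same limit as the server's 422 validation).
On a 422, the detail is normalized to one line and the just-pushed user turn is popped to
break the payload-contamination loop.
-->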
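The wait-display timing described in the header comment (first status poll at 3 seconds, then every 8 seconds, action buttons at 15 seconds, daily mode only) can be checked with two small predicates. This is a sketch under assumptions: the function names `shouldPollStatus`, `shouldShowWaitActions`, and `pollTicks` are hypothetical, and the constants simply repeat the values stated in the comment.

```typescript
type ChatMode = 'daily' | 'deep';

const STATUS_POLL_START_SEC = 3;
const STATUS_POLL_INTERVAL_SEC = 8;
const WAIT_ACTIONS_SEC = 15;

// True on the one-second ticks where a status poll fires: 3, 11, 19, ... (daily mode only).
function shouldPollStatus(mode: ChatMode, waitSeconds: number): boolean {
  return (
    mode === 'daily' &&
    waitSeconds >= STATUS_POLL_START_SEC &&
    (waitSeconds - STATUS_POLL_START_SEC) % STATUS_POLL_INTERVAL_SEC === 0
  );
}

// True once the switch/cancel buttons should appear: daily mode, no first byte yet, 15 s elapsed.
function shouldShowWaitActions(
  mode: ChatMode,
  waitSeconds: number,
  gotFirstByte: boolean,
): boolean {
  return !gotFirstByte && mode === 'daily' && waitSeconds >= WAIT_ACTIONS_SEC;
}

// Seconds in [1, limit] on which a poll would fire; handy for eyeballing the schedule.
function pollTicks(mode: ChatMode, limit: number): number[] {
  const ticks: number[] = [];
  for (let s = 1; s <= limit; s += 1) {
    if (shouldPollStatus(mode, s)) ticks.push(s);
  }
  return ticks;
}
```

Keeping the schedule in a modulo check means no second timer is needed: the single one-second interval drives both the elapsed counter and the polling.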
<script lang="ts">
import { onMount, onDestroy } from 'svelte';
import { api, apiFetchRaw } from '$lib/api';
import { EID_CHAT_STORAGE_KEY } from '$lib/eidChat';
import Button from '$lib/components/ui/Button.svelte';
import EmptyState from '$lib/components/ui/EmptyState.svelte';
import EidEvidenceCard from '$lib/components/eid/EidEvidenceCard.svelte';
import { MessageCircle, SendHorizontal, RotateCcw, AlertCircle } from 'lucide-svelte';
type ChatMode = 'daily' | 'deep';
// deep (search) answers carry sources (evidence) and partial; daily answers carry neither.
type EidSource = { id?: number; doc_id?: number; title?: string; score?: number };
type ChatMessage = {
role: 'user' | 'assistant';
content: string;
sources?: EidSource[];
partial?: boolean;
};
type Notice = { kind: 'warn' | 'error'; message: string; retryable: boolean };
// GET /api/eid/status response: only the fields needed to read the busy signal while waiting
type EidStatus = { daily?: { busy?: boolean } };
// History key: a single constant shared with history removal in logout (stores/auth.ts)
const STORAGE_KEY = EID_CHAT_STORAGE_KEY;
// Sent payload cap: last 20 turns (40 messages)
const MAX_PAYLOAD_MESSAGES = 40;
// localStorage retention cap (separate from the payload cap; history for on-screen display)
const MAX_STORED_MESSAGES = 200;
// Per-message input limit: same as the server's (eid_chat.py) 422 validation, blocked client-side first
const MAX_MESSAGE_CHARS = 8000;
// Point at which the near-limit counter starts showing
const COUNTER_THRESHOLD = 7500;
// Wait display (before the first byte): status polling start (s) / re-poll interval (s) / action buttons shown at (s)
const STATUS_POLL_START_SEC = 3;
const STATUS_POLL_INTERVAL_SEC = 8;
const WAIT_ACTIONS_SEC = 15;
const DEEP_CAPTION =
'장문·무거운 질문에 적합 — 잠들어 있으면 자동 기동 (처음 응답까지 최대 ~1분)';
// Preset chips: fill the input with a prefix
const PRESETS: Array<{ label: string; prefix: string }> = [
{ label: '번역 한→영', prefix: '다음을 영어로 번역해줘.\n\n' },
{ label: '번역 영→한', prefix: '다음을 한국어로 번역해줘.\n\n' },
{ label: '요약', prefix: '다음 내용을 핵심만 간결히 요약해줘.\n\n' },
{ label: '글 다듬기', prefix: '다음 글을 뜻은 유지하면서 자연스럽게 다듬어줘.\n\n' },
];
// ── state ───────────────────────────────────────────
let mode = $state<ChatMode>('daily');
let messages = $state<ChatMessage[]>([]);
let input = $state('');
let streaming = $state(false);
let streamingText = $state('');
let notice = $state<Notice | null>(null);
// Deep (search) mode phase before the first byte: when 'searching', the wait display reads "searching for evidence"
let deepPhase = $state<'searching' | null>(null);
let scrollEl: HTMLDivElement | undefined = $state();
let textareaEl: HTMLTextAreaElement | undefined = $state();
let abortCtrl: AbortController | null = null;
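// ── wait tracking (before the first byte) ───────────
// Elapsed seconds + whether the daily engine is busy (null = unknown). A token (generation counter)
// distinguishes per-stream ownership, so that on the immediate resend right after an abort (switch to deep)
// the previous stream's late cleanup does not kill the new stream's timer.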
let waitSeconds = $state(0);
let dailyBusy = $state<boolean | null>(null);
let waitIntervalId: ReturnType<typeof setInterval> | null = null;
let waitTokenSeq = 0;
let waitToken = 0; // currently active tracking token (0 = no tracking)
function startWaitTracking(streamMode: ChatMode): number {
// Clean up leftovers of a previous tracking run (guards the path where start arrives before stop, e.g. the switch resend)
if (waitIntervalId !== null) {
clearInterval(waitIntervalId);
waitIntervalId = null;
}
const token = ++waitTokenSeq;
waitToken = token;
waitSeconds = 0;
dailyBusy = null;
waitIntervalId = setInterval(() => {
if (waitToken !== token) return; // guard against a missed cleanup; harmless no-op
waitSeconds += 1;
// Busy-signal polling: once at 3 seconds, then every 8 seconds (3, 11, 19, ...).
// Daily mode only; a deep wait shows just the existing wake notice and the elapsed timer.
if (
streamMode === 'daily' &&
waitSeconds >= STATUS_POLL_START_SEC &&
(waitSeconds - STATUS_POLL_START_SEC) % STATUS_POLL_INTERVAL_SEC === 0
) {
void pollEidStatus(token);
}
}, 1000);
return token;
}
// Token guard: clean up only the tracking this stream owns; no-op if another stream has taken over
function stopWaitTracking(token: number) {
if (token !== waitToken) return;
waitToken = 0;
if (waitIntervalId !== null) {
clearInterval(waitIntervalId);
waitIntervalId = null;
}
waitSeconds = 0;
dailyBusy = null;
}
// Status query: failures are silently ignored (the wait display carries on with the timer alone; never blocks the feature)
async function pollEidStatus(token: number) {
try {
const status = await api<EidStatus>('/eid/status');
if (token !== waitToken) return; // discard a late response that arrives after the stream ended or was replaced
dailyBusy = status?.daily?.busy === true;
} catch {
// Ignored: the busy signal is supplementary information and must not block chat
}
}
// ── localStorage history ────────────────────────────
function persist() {
if (typeof window === 'undefined') return;
try {
const trimmed = messages.slice(-MAX_STORED_MESSAGES);
window.localStorage.setItem(STORAGE_KEY, JSON.stringify({ mode, messages: trimmed }));
} catch {
// Quota exceeded, etc.: failing to save history is not fatal
}
}
function restore() {
if (typeof window === 'undefined') return;
try {
const raw = window.localStorage.getItem(STORAGE_KEY);
if (!raw) return;
const parsed = JSON.parse(raw) as { mode?: unknown; messages?: unknown };
if (parsed.mode === 'daily' || parsed.mode === 'deep') mode = parsed.mode;
if (Array.isArray(parsed.messages)) {
messages = parsed.messages
.filter(
(m): m is ChatMessage =>
!!m &&
typeof m === 'object' &&
((m as ChatMessage).role === 'user' || (m as ChatMessage).role === 'assistant') &&
typeof (m as ChatMessage).content === 'string'
)
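// Array-size guard + content clamp at 8,000 characters: normalize at restore time so that
// history corrupted or bloated from outside does not contaminate the sent payload.
// sources/partial (evidence of deep answers) are preserved; they are not sent (the runStream map keeps only role/content).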
.slice(-MAX_STORED_MESSAGES)
.map((m) => ({
role: m.role,
content: m.content.slice(0, MAX_MESSAGE_CHARS),
sources: Array.isArray((m as ChatMessage).sources) ? (m as ChatMessage).sources : undefined,
partial: (m as ChatMessage).partial === true || undefined,
}));
}
} catch {
// Ignore corrupted history (start a new conversation)
}
}
onMount(() => restore());
onDestroy(() => {
abortCtrl?.abort();
// Clean up the wait timer/polling when leaving the page (harmless even if it doubles with abort's finally)
if (waitIntervalId !== null) clearInterval(waitIntervalId);
});
// ── auto-scroll (pin to bottom on each new message / stream chunk) ──
$effect(() => {
void messages.length;
void streamingText;
if (scrollEl) scrollEl.scrollTop = scrollEl.scrollHeight;
});
// ── input textarea auto-grow ────────────────────────
$effect(() => {
void input;
if (!textareaEl) return;
textareaEl.style.height = 'auto';
textareaEl.style.height = Math.min(textareaEl.scrollHeight, 160) + 'px';
});
function applyPreset(prefix: string) {
if (!input.startsWith(prefix)) input = prefix + input;
textareaEl?.focus();
}
function newConversation() {
abortCtrl?.abort();
messages = [];
notice = null;
streamingText = '';
streaming = false;
persist();
textareaEl?.focus();
}
// ── error_reason → notice message mapping ───────────
// Automatic fallback is forbidden ([[feedback_no_silent_fallback_explicit_opt_in]]):
// whatever the reason, never switch to another mode automatically; only show it explicitly.
function mapErrorReason(reason: string | undefined, detail: string): Notice {
switch (reason) {
case 'warming':
return {
kind: 'warn',
message: '심층 엔진 기동 중입니다 — 잠시 후 다시 시도하세요.',
retryable: true,
};
case 'editor_busy':
return {
kind: 'warn',
message: '편집 작업 보호로 잠시 사용할 수 없습니다.',
retryable: false,
};
case 'upstream_cold':
case 'macbook_unavailable':
return {
kind: 'warn',
message: '심층 엔진이 잠들어 있습니다 — 다시 시도하면 기동을 시작합니다.',
retryable: true,
};
case 'substrate_degraded':
return {
kind: 'error',
message: '운영 규칙이 적재되지 않았습니다 — 관리자 확인이 필요합니다.',
retryable: false,
};
default:
return { kind: 'error', message: detail || '응답 생성에 실패했습니다.', retryable: true };
}
}
// Parse a non-200 response body: {detail, error_reason}. detail may be a string or a
// {message} object (a reduced version of the normalization rule in api.ts parseDetail).
async function parseErrorBody(res: Response): Promise<Notice> {
const body = (await res.json().catch(() => null)) as
| { detail?: unknown; error_reason?: unknown }
| null;
const reason = typeof body?.error_reason === 'string' ? body.error_reason : undefined;
let detail = '';
if (typeof body?.detail === 'string') detail = body.detail;
else if (body?.detail && typeof body.detail === 'object') {
const obj = body.detail as { message?: string; error_reason?: string };
detail = obj.message || '';
// Also accept error_reason nested inside the detail object
return mapErrorReason(reason ?? obj.error_reason, detail || res.statusText);
}
return mapErrorReason(reason, detail || res.statusText);
}
// 422: normalize the FastAPI validation detail (array shape: [{loc, msg, type}, ...]) into
// one human-readable line. Length-limit violations (8,000 characters per message / total cap)
// are replaced with a friendly message. The pydantic v2 "Value error, " prefix is stripped.
function normalizeValidationDetail(detail: unknown): string {
const first = (Array.isArray(detail) ? detail[0] : undefined) as
| { msg?: unknown }
| undefined;
const msg =
typeof first?.msg === 'string' ? first.msg.replace(/^Value error,\s*/i, '') : '';
if (/at most|too.?long|초과|깁니다/i.test(msg)) {
return '입력이 너무 깁니다 — 메시지는 8,000자 이내로 줄이거나, 대화가 길면 새 대화로 시작하세요.';
}
if (msg) return `요청 형식 오류: ${msg}`;
return '요청 형식이 올바르지 않습니다 — 입력을 줄이거나 새 대화로 시작하세요.';
}
// ── send / retry ────────────────────────────────────
function sendMessage() {
const text = input.trim();
if (!text || streaming) return;
// Block messages over 8,000 characters client-side, before the payload can hit a 422
// (the counter under the input bar shows the same guidance inline)
if (text.length > MAX_MESSAGE_CHARS) {
notice = {
kind: 'error',
message: '입력이 너무 깁니다 — 8,000자 이내로 줄여주세요.',
retryable: false,
};
return;
}
messages.push({ role: 'user', content: text });
input = '';
persist();
void runStream();
}
// Retry: resend the user message at the end of the history as-is (no duplicate user turn is added)
function retry() {
if (streaming) return;
if (messages.length === 0 || messages[messages.length - 1].role !== 'user') return;
void runStream();
}
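// ── action buttons while waiting (daily + 15 seconds elapsed) ──
// [심층으로 전환] (switch to deep): a mode switch by explicit click (does not violate the
// no-automatic-fallback policy). Abort the current fetch → resend the same user turn immediately with mode=deep.
// The finally of the aborted previous stream does not touch the new stream's state, thanks to the
// abortCtrl comparison + wait-token guard (same structure as the new-conversation abort race guard).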
function switchToDeep() {
if (!streaming || mode !== 'daily') return;
mode = 'deep'; // also update the mode toggle state to deep
abortCtrl?.abort();
void runStream();
}
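// [취소] (cancel): after abort, pop the just-pushed user turn and restore its text to the input
// (same pattern as the 422 handling: prevents history contamination + re-saves localStorage).
// The placeholder is removed by the aborted stream's finally (streaming=false).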
function cancelWait() {
if (!streaming) return;
abortCtrl?.abort();
if (messages.length > 0 && messages[messages.length - 1].role === 'user') {
const popped = messages.pop();
if (popped && !input) input = popped.content;
persist();
}
}
async function runStream() {
notice = null;
streaming = true;
streamingText = '';
const ctrl = new AbortController();
abortCtrl = ctrl;
// Start wait tracking before the first byte; this stream's own token controls when it is cleaned up
const waitTok = startWaitTracking(mode);
const payload = {
mode,
messages: messages
.slice(-MAX_PAYLOAD_MESSAGES)
.map((m) => ({ role: m.role, content: m.content })),
};
let acc = '';
let sawDone = false;
// Data accompanying deep (search) answers; never sent for daily
let accSources: EidSource[] = [];
let accPartial = false;
try {
const res = await apiFetchRaw('/eid/chat', {
method: 'POST',
body: JSON.stringify(payload),
signal: ctrl.signal,
});
if (!res.ok) {
if (res.status === 422) {
// Validation rejection: normalize detail + pop the just-pushed user turn.
// If the over-limit turn stayed in the history, every later payload would keep hitting
// 422 (a contamination loop), so cut it here (and re-save localStorage).
const body = (await res.json().catch(() => null)) as { detail?: unknown } | null;
notice = {
kind: 'error',
message: normalizeValidationDetail(body?.detail),
retryable: false,
};
if (messages.length > 0 && messages[messages.length - 1].role === 'user') {
const popped = messages.pop();
// If the input is empty, put the text back so it can be shortened and resent
if (popped && !input) input = popped.content;
persist();
}
return;
}
notice = await parseErrorBody(res);
return;
}
if (!res.body) {
notice = { kind: 'error', message: '스트림을 열 수 없습니다.', retryable: true };
return;
}
// SSE line-buffer parsing: a line can be cut at a chunk boundary, so the
// last incomplete line stays in buf and is joined with the next chunk.
const reader = res.body.getReader();
const decoder = new TextDecoder();
let buf = '';
// Process one data: line; returns true on [DONE] (acc/streamingText accumulate via closure)
const processLine = (rawLine: string): boolean => {
const line = rawLine.trim();
if (!line.startsWith('data:')) return false;
const data = line.slice(5).trim();
if (data === '[DONE]') return true;
try {
const obj = JSON.parse(data) as {
choices?: Array<{ delta?: { content?: unknown } }>;
phase?: string;
error_reason?: string;
eid_sources?: EidSource[];
partial?: boolean;
};
// Deep (search) envelope branches; absent from daily responses
if (obj?.phase === 'ping') return false; // heartbeat: ignore
if (obj?.phase === 'searching') {
deepPhase = 'searching'; // switch the wait display to "searching for evidence"
return false;
}
if (obj?.phase === 'error') {
// In-stream unavailability/failure: keep the received part + show it explicitly (no automatic fallback).
// The [DONE] that follows sets sawDone, so there is no false 'interrupted' alarm.
notice = mapErrorReason(obj.error_reason, '');
return false;
}
if (Array.isArray(obj?.eid_sources)) {
accSources = obj.eid_sources;
accPartial = obj.partial === true;
return false;
}
const piece = obj?.choices?.[0]?.delta?.content;
if (typeof piece === 'string' && piece) {
// First byte arrived: stop the wait timer/polling and switch to the regular streaming display
if (!acc) {
stopWaitTracking(waitTok);
deepPhase = null;
}
acc += piece;
streamingText = acc;
}
} catch {
// Ignore incomplete / non-JSON data lines
}
return false;
};
while (true) {
const { value, done } = await reader.read();
if (done) {
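// Final flush: process once the decoder's leftover bytes plus a last line that ended
// without a newline. If the final data:/[DONE] line ends without \n it stays in buf;
// this is where the resulting false 'response interrupted' alarm is resolved.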
buf += decoder.decode();
for (const rawLine of buf.split('\n')) {
if (processLine(rawLine)) {
sawDone = true;
break;
}
}
break;
}
buf += decoder.decode(value, { stream: true });
const lines = buf.split('\n');
buf = lines.pop() ?? '';
for (const rawLine of lines) {
if (processLine(rawLine)) {
sawDone = true;
break;
}
}
if (sawDone) {
// [DONE] received: release the lock on the remaining stream (a failure here does not affect termination)
void reader.cancel().catch(() => {});
break;
}
}
// Connection dropped without [DONE]: keep the received part + show a notice
if (!sawDone) {
notice = acc
? {
kind: 'warn',
message: '응답이 중단되었습니다 — 받은 부분까지 표시합니다.',
retryable: false,
}
: { kind: 'error', message: '응답을 받지 못했습니다 — 다시 시도하세요.', retryable: true };
}
} catch (err) {
if ((err as Error)?.name === 'AbortError') {
// Intentional user abort (new conversation / wait cancel / switch to deep): no notice needed
return;
}
// Network error mid-stream: keep the received part + show a notice
notice = acc
? {
kind: 'warn',
message: '연결이 끊겼습니다 — 받은 부분까지 표시합니다.',
retryable: false,
}
: { kind: 'error', message: '요청에 실패했습니다 — 네트워크를 확인하세요.', retryable: true };
} finally {
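// Stream ended: clean up the wait timer/polling. If it was already cleaned up at the first byte, or a
// new stream took over tracking via the switch resend, the token guard makes this a no-op.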
stopWaitTracking(waitTok);
// On abort (new conversation / leaving the page) do not push: prevents the race where leftovers
// of the previous stream leak into messages that a new conversation just cleared.
if (acc && !ctrl.signal.aborted) {
messages.push({
role: 'assistant',
content: acc,
sources: accSources.length ? accSources : undefined,
partial: accPartial || undefined,
});
}
if (abortCtrl === ctrl) {
streaming = false;
streamingText = '';
deepPhase = null;
abortCtrl = null;
}
persist();
}
}
function handleKeydown(e: KeyboardEvent) {
// Enter sends / Shift+Enter inserts a newline (no send during Hangul composition)
if (e.key === 'Enter' && !e.shiftKey && !e.isComposing) {
e.preventDefault();
sendMessage();
}
}
// Retry is possible when the last message is a user turn and nothing is streaming
let canRetry = $derived(
!streaming && messages.length > 0 && messages[messages.length - 1].role === 'user'
);
// Input length (send basis = after trim): counter shown from 7,500 characters, blocked above 8,000
let inputLength = $derived(input.trim().length);
let overLimit = $derived(inputLength > MAX_MESSAGE_CHARS);
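// Placeholder text before the first byte, an honest split between "waiting" and "broken":
// busy confirmed = queued / not-busy confirmed = preparing to generate / unknown = awaiting response.
// Deep mode does not poll, so it is always unknown (timer only); the wake notice is the header caption.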
let waitPlaceholder = $derived(
deepPhase === 'searching'
? `이드가 문서·뉴스에서 근거를 찾는 중 · ${waitSeconds}초`
: dailyBusy === true
? `엔진이 다른 작업을 처리하고 있어요 — 차례가 오면 바로 시작됩니다 (대기 ${waitSeconds}초)`
: dailyBusy === false
? `응답 생성 준비 중 · ${waitSeconds}초`
: `응답 대기 중 · ${waitSeconds}초`
);
// Action buttons shown: daily mode + before the first byte + 15 seconds elapsed
let showWaitActions = $derived(
streaming && !streamingText && mode === 'daily' && waitSeconds >= WAIT_ACTIONS_SEC
);
</script>
<svelte:head>
<title>이드 - PKM</title>
</svelte:head>
<div class="h-full flex flex-col">
<!-- Header: identity + mode toggle + new conversation -->
<div class="shrink-0 border-b border-default bg-sidebar px-4 py-2.5">
<div class="max-w-3xl mx-auto flex items-center gap-2 flex-wrap">
<h1 class="flex items-center gap-2 text-sm font-extrabold tracking-tight shrink-0">
<MessageCircle size={16} class="text-accent" />
이드
</h1>
<!-- Mode segmented toggle: 일상 (daily) / 심층 (deep) -->
<div class="flex rounded-md border border-default overflow-hidden" role="group" aria-label="응답 모드">
<button
type="button"
aria-pressed={mode === 'daily'}
onclick={() => (mode = 'daily')}
disabled={streaming}
title="짧은 질문·일상 대화에 적합"
class="px-3 py-1.5 text-xs font-semibold transition-colors disabled:opacity-50
{mode === 'daily' ? 'bg-accent text-white' : 'bg-surface text-dim hover:text-text hover:bg-surface-hover'}"
>
일상
</button>
<button
type="button"
aria-pressed={mode === 'deep'}
onclick={() => (mode = 'deep')}
disabled={streaming}
title={DEEP_CAPTION}
class="px-3 py-1.5 text-xs font-semibold border-l border-default transition-colors disabled:opacity-50
{mode === 'deep' ? 'bg-accent text-white' : 'bg-surface text-dim hover:text-text hover:bg-surface-hover'}"
>
심층
</button>
</div>
<div class="flex-1"></div>
<Button variant="ghost" size="sm" icon={RotateCcw} onclick={newConversation}>
새 대화
</Button>
</div>
{#if mode === 'deep'}
<div class="max-w-3xl mx-auto mt-1.5">
<p class="text-[11px] text-dim">{DEEP_CAPTION}</p>
</div>
{/if}
</div>
<!-- Message list -->
<div bind:this={scrollEl} class="flex-1 min-h-0 overflow-y-auto px-4 py-4">
<div class="max-w-3xl mx-auto flex flex-col gap-3" role="log" aria-live="polite">
{#if messages.length === 0 && !streaming}
<div class="py-10">
<EmptyState
icon={MessageCircle}
title="이드와 대화를 시작하세요"
description="일상 질문은 바로, 장문·무거운 질문은 심층 모드로 물어보세요. 아래 프리셋 칩으로 번역·요약·글 다듬기를 빠르게 시작할 수 있습니다."
/>
</div>
{/if}
{#each messages as msg, i (i)}
{#if msg.role === 'user'}
<div class="flex justify-end">
<div class="max-w-[85%] sm:max-w-[75%] px-3.5 py-2.5 rounded-lg rounded-br-sm bg-accent text-white text-sm whitespace-pre-wrap break-words">
{msg.content}
</div>
</div>
{:else}
<div class="flex flex-col items-start">
<div class="max-w-[85%] sm:max-w-[75%] px-3.5 py-2.5 rounded-lg rounded-bl-sm bg-surface border border-default text-text text-sm whitespace-pre-wrap break-words">
{msg.content}
</div>
{#if msg.sources?.length}
<EidEvidenceCard sources={msg.sources} partial={msg.partial ?? false} />
{/if}
</div>
{/if}
{/each}
<!-- Partial assistant response while streaming / wait display before the first byte -->
{#if streaming}
<div class="flex justify-start">
<div class="max-w-[85%] sm:max-w-[75%] px-3.5 py-2.5 rounded-lg rounded-bl-sm bg-surface border border-default text-text text-sm whitespace-pre-wrap break-words">
{#if streamingText}
{streamingText}<span class="inline-block w-1.5 h-3.5 ml-0.5 align-middle bg-accent animate-pulse rounded-sm"></span>
{:else}
<span class="text-dim animate-pulse">{waitPlaceholder}</span>
{/if}
</div>
</div>
<!-- Wait action buttons: daily + 15 seconds elapsed; switching only on explicit click (no automatic fallback) -->
{#if showWaitActions}
<div class="flex justify-start gap-2">
<Button variant="secondary" size="sm" onclick={switchToDeep}>심층으로 전환</Button>
<Button variant="ghost" size="sm" onclick={cancelWait}>취소</Button>
</div>
{/if}
{/if}
<!-- Error/notice card: explicit display only, no automatic fallback -->
{#if notice}
<div
class="flex items-start gap-2 px-3.5 py-3 rounded-lg border text-sm
{notice.kind === 'warn'
? 'border-warning/30 bg-warning/10 text-warning'
: 'border-error/30 bg-error/10 text-error'}"
>
<AlertCircle size={15} class="mt-0.5 shrink-0" />
<div class="flex-1 min-w-0">
<p>{notice.message}</p>
{#if notice.retryable && canRetry}
<Button variant="secondary" size="sm" class="mt-2" onclick={retry}>
다시 시도
</Button>
{/if}
</div>
</div>
{/if}
</div>
</div>
<!-- Input bar (pinned to the bottom; sticks to the bottom of the flex column on mobile too) -->
<div class="shrink-0 border-t border-default bg-sidebar px-4 pt-2 pb-3">
<div class="max-w-3xl mx-auto">
<!-- Preset chips -->
<div class="flex gap-1.5 overflow-x-auto pb-2">
{#each PRESETS as preset (preset.label)}
<button
type="button"
onclick={() => applyPreset(preset.prefix)}
class="shrink-0 px-2.5 py-1 rounded-full border border-default bg-surface text-xs text-dim hover:text-text hover:border-accent transition-colors"
>
{preset.label}
</button>
{/each}
</div>
<div class="flex items-end gap-2">
<textarea
bind:this={textareaEl}
bind:value={input}
onkeydown={handleKeydown}
rows="1"
placeholder="이드에게 메시지 보내기 (Enter 전송, Shift+Enter 줄바꿈)"
class="flex-1 min-w-0 px-3 py-2 rounded-lg text-sm bg-bg text-text placeholder:text-faint border border-default focus:border-accent focus:ring-2 focus:ring-accent-ring outline-none resize-none overflow-y-auto transition-colors"
></textarea>
<Button
variant="primary"
size="md"
icon={SendHorizontal}
loading={streaming}
disabled={!input.trim() || overLimit}
onclick={sendMessage}
aria-label="전송"
>
<span class="hidden sm:inline">전송</span>
</Button>
</div>
<!-- Character counter: shown only near the limit (8,000 characters); inline guidance when exceeded -->
{#if inputLength >= COUNTER_THRESHOLD}
<p class="mt-1 text-right text-[11px] {overLimit ? 'text-error' : 'text-dim'}" aria-live="polite">
{inputLength.toLocaleString()} / {MAX_MESSAGE_CHARS.toLocaleString()}{overLimit
? ' — 입력이 너무 깁니다 (8,000자 이내)'
: ''}
</p>
{/if}
</div>
</div>
</div>
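The line-buffer logic of `runStream` (carry the last incomplete line into the next chunk, read only `data:` lines, stop on `[DONE]`, flush once at the end) can be exercised without a network stream. The sketch below is a simplified illustration, not the page's code: `parseSseChunks` is a hypothetical function that takes already-decoded string chunks, and it omits the `TextDecoder` step and the deep-mode envelopes (`phase`, `eid_sources`).

```typescript
type SseResult = { text: string; sawDone: boolean };

// Hypothetical, simplified version of the parsing loop in runStream.
function parseSseChunks(chunks: string[]): SseResult {
  let buf = '';
  let text = '';
  let sawDone = false;

  // Handle one line; returns true only for the [DONE] sentinel.
  const processLine = (rawLine: string): boolean => {
    const line = rawLine.trim();
    if (!line.startsWith('data:')) return false;
    const data = line.slice(5).trim();
    if (data === '[DONE]') return true;
    try {
      const obj = JSON.parse(data) as {
        choices?: Array<{ delta?: { content?: unknown } }>;
      };
      const piece = obj?.choices?.[0]?.delta?.content;
      if (typeof piece === 'string') text += piece;
    } catch {
      // Incomplete or non-JSON data line: ignore it.
    }
    return false;
  };

  for (const chunk of chunks) {
    buf += chunk;
    const lines = buf.split('\n');
    buf = lines.pop() ?? ''; // last element may be an incomplete line
    for (const rawLine of lines) {
      if (processLine(rawLine)) {
        sawDone = true;
        break;
      }
    }
    if (sawDone) break;
  }

  if (!sawDone) {
    // Final flush: a last line that ended without a newline is still in buf.
    for (const rawLine of buf.split('\n')) {
      if (processLine(rawLine)) {
        sawDone = true;
        break;
      }
    }
  }
  return { text, sawDone };
}
```

Feeding it a JSON payload split mid-string across two chunks, followed by a `[DONE]` line without a trailing newline, covers both edge cases the comments in `runStream` call out.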
@@ -0,0 +1,32 @@
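-- crawl-24x7 cycle 3 source seed (B-4 signals + C-4 continuous engineering collection); all URLs live-verified on 2026-06-11.
-- Following precedent 326: WHERE NOT EXISTS idempotent, existing rows preserved, only new rows inserted (single statement).
-- fetch_method='signal-only' (B-4): ingest headline + summary only, zero page fetches,
-- summarize skipped (search index only: embed/chunk). The digest excludes ai_summary NULL, so these drop out naturally.
-- Bloomberg = strongest anti-bot, so body collection is not recommended → signal only. Videos observed mixed into the feed → skip-video.
-- Economist = 200 observed (the Archiver UA is treated as a feed reader; only browser UAs get 403). No subscription = signal.
-- Nikkei Asia = RSS 1.0 (RDF) observed; feedparser normalizes it natively (title/link only, no summary or date
-- = title signal). No code branch needed (pinned by the tests/test_crawl_cycle3_shapes.py fixture regression).
-- arXiv/ASME = the abstract is the body (C-4 two-stage: index abstracts first, selected full text in Phase 3) → reuse signal-only.
-- IEEE Spectrum = the feed description is the full text (7.9~14K characters observed) → feed-full. Category filter = topic feeds.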
INSERT INTO news_sources
(name, country, language, feed_type, feed_url, category, enabled,
fetch_method, fulltext_policy, source_channel, parser_quirk)
SELECT v.name, v.country, v.language, v.feed_type, v.feed_url, v.category, v.enabled,
v.fetch_method, v.fulltext_policy, v.source_channel::source_channel, v.parser_quirk
FROM (VALUES
-- B-4: signal only (news channel: headline signals)
('Bloomberg Markets', 'US', 'en', 'rss', 'https://feeds.bloomberg.com/markets/news.rss', 'Economy', true, 'signal-only', 'none', 'news', 'skip-video'),
('Bloomberg Technology', 'US', 'en', 'rss', 'https://feeds.bloomberg.com/technology/news.rss', 'Technology', true, 'signal-only', 'none', 'news', 'skip-video'),
('Economist Latest', 'GB', 'en', 'rss', 'https://www.economist.com/latest/rss.xml', 'International', true, 'signal-only', 'none', 'news', NULL),
('Nikkei Asia', 'JP', 'en', 'rss', 'https://asia.nikkei.com/rss/feed/nar', 'International', true, 'signal-only', 'none', 'news', NULL),
-- C-4: continuous engineering collection (crawl channel: domain material. API notices/CSB/CCPS are registered at runtime by dedicated workers)
('ASME J. Pressure Vessel Technology', 'US', 'en', 'rss', 'https://asmedigitalcollection.asme.org/rss/site_1000037/LatestOpenIssueArticles_1000020.xml', 'Engineering', true, 'signal-only', 'none', 'crawl', NULL),
('arXiv cond-mat.mtrl-sci', 'US', 'en', 'rss', 'https://rss.arxiv.org/rss/cond-mat.mtrl-sci', 'Engineering', true, 'signal-only', 'none', 'crawl', NULL),
('arXiv physics.app-ph', 'US', 'en', 'rss', 'https://rss.arxiv.org/rss/physics.app-ph', 'Engineering', true, 'signal-only', 'none', 'crawl', NULL),
('IEEE Spectrum Energy', 'US', 'en', 'rss', 'https://spectrum.ieee.org/feeds/topic/energy.rss', 'Engineering', true, 'rss', 'feed-full', 'crawl', NULL),
('IEEE Spectrum Robotics', 'US', 'en', 'rss', 'https://spectrum.ieee.org/feeds/topic/robotics.rss', 'Engineering', true, 'rss', 'feed-full', 'crawl', NULL)
) AS v(name, country, language, feed_type, feed_url, category, enabled,
fetch_method, fulltext_policy, source_channel, parser_quirk)
WHERE NOT EXISTS (
SELECT 1 FROM news_sources ns WHERE ns.name = v.name
);
+11 -11
@@ -51,14 +51,10 @@ _FENCE_ANY = re.compile(r"(?m)^\s{0,3}(```|~~~)")
def _looks_junk(title: str | None) -> bool:
if not title:
return False
if _JUNK_ORG.search(title):
return True
letters = [c for c in title if c.isalpha()]
if len(letters) >= 6 and sum(1 for c in letters if c.isupper()) / len(letters) >= 0.85:
return True
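return False
"""cover/TOC org-name junk. ★The all-caps heuristic was retired (2026-06-09): it over-flagged 130 cases of normal
all-caps headings in technical documents ('GENERAL REQUIREMENTS'/'WELDING') and falsely demoted windowed/clean docs
to A_better. Only explicit company suffixes count as junk (and even then the caller gates on cover position + not stored)."""
return bool(title and _JUNK_ORG.search(title))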
def _make_engine():
@@ -102,10 +98,14 @@ async def _measure_doc(session, doc_id):
res["hash_stable_99"] = False
stored_titles = {s["section_title"] for s in stored if s["section_title"]}
res["junk_b"] = any(_looks_junk(n.section_title) and n.section_title not in stored_titles for n in nodes)
# junk = only new org-name headings in the cover area (leading nodes), i.e. positional. Normal headings across the body are unaffected.
res["junk_b"] = any(_looks_junk(n.section_title) and n.section_title not in stored_titles for n in nodes[:4])
# Verdict heuristic (high-recall junk protection + absent-structure → A_better).
# MEASURE2 has already pinned the canonical distribution; this verdict is for reproduction/audit only. Ambiguous cases (notes:ambiguous) do not block PASS.
# Verdict heuristic = coarse screen (for reproduction/audit). ★ The authoritative decision in the 2026-06-09 production run was not this heuristic but:
# (a) Deterministic partition: pure_benefit(n_a<=2)/comparable(0.85<=ratio<=2) = automatic INCLUDE,
# overseg(ratio>2)/absent(ratio<0.85) = subject to adversarial verification.
# (b) The adversarial workflow (judge+refute) settles INCLUDE/EXCLUDE per doc for risky candidates by comparing stored vs build titles.
# Do not build a destructive re-decompose list from this heuristic alone (history of junk over-detection and threshold overfitting).
# ★ apples-to-apples: compare both sides by JUMP-TARGET count (not every stored leaf; window children inflate n_a,
# a bias that drops windowed docs into a false A_better). stored jump-target = (non-window leaf OR %_split) + a title.
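The deterministic partition described in (a) can be sketched as a small classifier. This is an illustration only: the diff does not show how `ratio` is computed or in which order the rules are checked, so the function takes `ratio` as an input and checks `pure_benefit` first as an assumption; `classify_doc` is a hypothetical name.

```python
def classify_doc(n_a: int, ratio: float) -> tuple[str, str]:
    """Return (partition, action) using the thresholds quoted in the comment.

    Assumption: pure_benefit is tested before the ratio bands.
    """
    if n_a <= 2:
        return "pure_benefit", "include"
    if 0.85 <= ratio <= 2:
        return "comparable", "include"
    if ratio > 2:
        return "overseg", "adversarial_review"
    return "absent", "adversarial_review"  # ratio < 0.85
```

Only the two `adversarial_review` partitions would go on to the judge+refute workflow in (b); the other two are included automatically.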
def _stored_is_jt(s):
+72 -2
@@ -317,7 +317,8 @@ async def cmd_run(args):
{"ids": doc_ids, "pv": PROMPT_VERSION})).mappings().all()
if pending:
tot = sum(r["unanalyzed"] for r in pending)
_log(f" [sweep] 미분석 leaf 잔여: {tot} (doc {len(pending)}) — 다음 실행이 이어서 분석(멱등). "
_log(f" [sweep] 미분석 leaf 잔여: {tot} (doc {len(pending)}) — char_start 마커는 이들을 재선별 안 함; "
f"`analyze` 커맨드로 수렴(`analyze --deadline HH:MM`, 멱등). "
f"상위: {[(r['doc_id'], r['unanalyzed']) for r in pending[:5]]}")
else:
_log(" [sweep] 미분석 leaf 잔여 0 — 분석 수렴.")
@@ -397,6 +398,70 @@ async def cmd_update_char_start(args):
print("DEMOTE_DOC_IDS=" + ",".join(str(x) for x in demoted), flush=True)
# Select docs that hold unanalyzed hier leaves (independent of the re-decompose marker; analyze tracking is a separate axis, g3-t3).
def _analyze_candidate_sql(doc_ids=None):
scope = "AND dc.doc_id = ANY(:ids)" if doc_ids else ""
return text(f"""
SELECT DISTINCT dc.doc_id AS doc_id, d.ai_domain AS ai_domain
FROM document_chunks dc JOIN documents d ON d.id = dc.doc_id
WHERE dc.source_type = 'hier_section' AND dc.is_leaf = true {scope}
AND NOT EXISTS (SELECT 1 FROM chunk_section_analysis a
WHERE a.chunk_id = dc.id AND a.prompt_version = :pv
AND a.source_content_hash = dc.chunk_content_hash)
ORDER BY dc.doc_id
""")
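async def cmd_analyze(args):
"""[g3-t3 self-heal] Analyze only unanalyzed hier leaves (independent of the re-decompose/char_start marker, idempotent).
The re-decompose char_start completion marker means 'jump-target has char_start', so it cannot re-select docs whose
analyze pass was cut short by a container recreate or the deadline (char_start present, some leaves unanalyzed). This command selects independently by the LEAF_SQL criterion
(docs with unanalyzed leaves) and converges the rail summary in an eventually-consistent way. Idempotent (LEAF_SQL NOT EXISTS).
--doc can restrict the scope (unset = all). Unrelated to jump (char_start); dedicated to rail summary convergence."""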
doc_ids = _parse_doc_ids(args)
deadline = _compute_deadline(args.deadline)
stop_at = (deadline - timedelta(minutes=BUFFER_MIN)).timestamp()
_log(f"[analyze] deadline={deadline:%m-%d %H:%M} (stop_at={datetime.fromtimestamp(stop_at):%H:%M}) "
f"{'doc-list='+str(len(doc_ids)) if doc_ids else 'all'} 미분석 leaf 보유 doc 선별")
engine = _make_engine()
sm = async_sessionmaker(engine, expire_on_commit=False)
client = AIClient()
model_name = settings.ai.triage.model
params = {"pv": PROMPT_VERSION}
if doc_ids:
params["ids"] = doc_ids
tot_docs = tot_ok = tot_fail = tot_skip = 0
try:
async with sm() as session:
cands = (await session.execute(_analyze_candidate_sql(doc_ids), params)).mappings().all()
_log(f"[analyze] 후보 doc {len(cands)} (미분석 leaf 보유). 시작.")
for c in cands:
if time.time() >= stop_at:
_log(f"⏰ deadline 버퍼 도달 — 중단 (처리 {tot_docs} doc)")
break
doc_id, doc_domain = c["doc_id"], c["ai_domain"] or "general"
try:
async with sm() as session:
st = await _analyze_doc_leaves(session, client, doc_id, doc_domain, model_name, stop_at)
except Exception as exc:
_log(f" ✗ doc={doc_id} 분석 실패(건너뜀): {type(exc).__name__}: {repr(exc)[:160]}")
continue
tot_docs += 1
tot_ok += st["ok"]; tot_fail += st["fail"]; tot_skip += st["skip"]
_log(f" ✓ doc={doc_id} ok={st['ok']} fail={st['fail']} skip={st['skip']} leaves={st['leaves']}"
f"{' [ABORT]' if st['aborted'] else ''} | 누적 {tot_docs}doc {tot_ok}ok")
if st["aborted"]:
_log("⏰ leaf 분석 중 deadline 도달 — 중단")
break
finally:
await client.close()
await engine.dispose()
_log(f"=== [analyze] 종료: {tot_docs} doc, ok={tot_ok} fail={tot_fail} skip={tot_skip} ===")
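The loop above stops a fixed buffer before the deadline so an in-flight doc can finish. A minimal sketch of that arithmetic, assuming `--deadline HH:MM` means the next occurrence of that wall-clock time and a 10-minute buffer (neither `_compute_deadline` nor the value of `BUFFER_MIN` appears in this diff, so both are stand-ins):

```python
from datetime import datetime, timedelta

BUFFER_MIN = 10  # assumed value; the real constant is not shown in the diff


def compute_deadline(hhmm: str, now: datetime) -> datetime:
    """Next occurrence of HH:MM at or after `now` (hypothetical stand-in for _compute_deadline)."""
    hour, minute = (int(part) for part in hhmm.split(":"))
    deadline = now.replace(hour=hour, minute=minute, second=0, microsecond=0)
    if deadline <= now:
        deadline += timedelta(days=1)  # already past today, so roll to tomorrow
    return deadline


def stop_time(hhmm: str, now: datetime, buffer_min: int = BUFFER_MIN) -> datetime:
    """Moment at which the loop should stop picking up new docs."""
    return compute_deadline(hhmm, now) - timedelta(minutes=buffer_min)
```

Passing `now` explicitly keeps the calculation testable; the command itself compares `time.time()` against the resulting timestamp.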
def main():
ap = argparse.ArgumentParser(description="오버나이트 hier 분해+절 분석 backfill (additive)")
sub = ap.add_subparsers(dest="cmd", required=True)
@@ -415,8 +480,13 @@ def main():
p_upd = sub.add_parser("update-char-start",
help="[g3-tU] hash_stable doc 비파괴 char_start UPDATE (100% VERIFY, --doc 필수)")
p_upd.add_argument("--doc", default=None, help="comma-sep doc id (gm-t1 hash_stable 32)")
p_an = sub.add_parser("analyze",
help="[g3-t3] 미분석 hier leaf 만 분석(재분해 무관, 멱등) — recreate/deadline 으로 잘린 절분석 수렴")
p_an.add_argument("--deadline", default="07:00", help="HH:MM (컨테이너 UTC, 07:00 KST=22:00 UTC)")
p_an.add_argument("--doc", default=None, help="comma-sep doc id (미지정=미분석 leaf 보유 전체)")
args = ap.parse_args()
fn = {"dry-run": cmd_dry_run, "run": cmd_run, "update-char-start": cmd_update_char_start}[args.cmd]
fn = {"dry-run": cmd_dry_run, "run": cmd_run, "update-char-start": cmd_update_char_start,
"analyze": cmd_analyze}[args.cmd]
asyncio.run(fn(args))
+84 -11
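@@ -1,4 +1,4 @@
"""B-3 subscription-session Playwright fetcher (plan crawl-24x7-1).
"""B-3 subscription-session Playwright fetcher (plan crawl-24x7-1) + anonymous browser fetch/download (cycle 3).
Authenticated page fetch based on storage_state JSON (cookie + localStorage snapshot), plus a content-based probe.
- One concurrent instance (global semaphore) to protect the account; human-like pacing is the caller's politeness responsibility.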
@@ -7,9 +7,15 @@ storage_state JSON(쿠키+localStorage 스냅샷) 기반 인증 페이지 fetch
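Absent = 503 profile_missing (no silent fallback; the caller degrades).
- No time-based expiry check: the probe verifies body length + absence of paywall markers on a known paid article
(blocks the silent corruption where, after expiry, a 200 'paywall notice' is stored as the article body).
Cycle 3 extension (C-2 CCPS Beacon: aiche.org returns 403 to plain httpx regardless of UA):
- /fetch with profile omitted = anonymous context (no storage_state; only for getting public pages past the WAF).
- /download = visit the referer page first (obtain the WAF cookie), then download the binary (PDF) with
request.get in the same context and return it as base64, with a 60MB cap.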
"""
import asyncio
import base64
import logging
from pathlib import Path
@@ -23,6 +29,12 @@ logger = logging.getLogger("playwright-fetcher")
AUTH_DIR = Path("/auth")
NAV_TIMEOUT_MS = 45_000
SETTLE_MS = 1_500 # wait for lazy-loaded body content to settle after domcontentloaded
MAX_DOWNLOAD_BYTES = 60 * 1024 * 1024
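# The Cloudflare JS challenge (title='Just a moment...') takes several seconds plus an automatic
# re-navigation to pass. Measured on aiche.org (2026-06-11): the snapshot at the 1.5s settle point was the interstitial.
# If it does not pass, the caller's _CHALLENGE_MARKERS flags the final HTML as blocked.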
CHALLENGE_POLL_TRIES = 8
CHALLENGE_POLL_MS = 2_500
app = FastAPI(title="playwright-fetcher")
_browser_slot = asyncio.Semaphore(1) # one concurrent instance (same discipline as the B-3 ① persistent constraint)
@@ -30,7 +42,8 @@ _browser_slot = asyncio.Semaphore(1) # 동시 1 인스턴스 (B-3 ① persisten
class FetchReq(BaseModel):
url: str
profile: str = Field(pattern=r"^[a-z0-9_-]{1,50}$")
# None = anonymous context (public page WAF bypass, CCPS). A value = B-3 subscription session.
profile: str | None = Field(default=None, pattern=r"^[a-z0-9_-]{1,50}$")
class ProbeReq(BaseModel):
@@ -40,6 +53,13 @@ class ProbeReq(BaseModel):
paywall_markers: list[str] = []
class DownloadReq(BaseModel):
url: str
# Visit the referer page first to load the WAF challenge cookie into the context, then download
referer: str | None = None
profile: str | None = Field(default=None, pattern=r"^[a-z0-9_-]{1,50}$")
def _state_path(profile: str) -> Path:
p = AUTH_DIR / f"{profile}.json"
if not p.is_file():
@@ -47,19 +67,36 @@ def _state_path(profile: str) -> Path:
return p
async def _browse(url: str, state: Path) -> tuple[str, str, str]:
def _context_kwargs(state: Path | None) -> dict:
kwargs = {"viewport": {"width": 1366, "height": 900}}
if state is not None:
# Guard against regressing the B-3 Le Monde session: keep the locale of the existing authenticated fetch
kwargs["storage_state"] = str(state)
kwargs["locale"] = "fr-FR"
else:
kwargs["locale"] = "en-US"
return kwargs
async def _settle(page) -> None:
"""Base settle + wait for the CF JS challenge to pass (on failure the interstitial is returned as-is)."""
await page.wait_for_timeout(SETTLE_MS)
for _ in range(CHALLENGE_POLL_TRIES):
title = (await page.title()).lower()
if "just a moment" not in title:
return
await page.wait_for_timeout(CHALLENGE_POLL_MS)
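The bounded polling in `_settle` can be exercised without a browser. The sketch below mirrors the loop with injected callables instead of a Playwright page; unlike `_settle`, it returns a boolean so the outcome is observable, and `wait_past_challenge` and `demo_poll` are hypothetical names.

```python
import asyncio


async def wait_past_challenge(get_title, sleep, tries: int = 8,
                              marker: str = "just a moment") -> bool:
    """Poll the title up to `tries` times; True once the interstitial marker is gone."""
    for _ in range(tries):
        title = (await get_title()).lower()
        if marker not in title:
            return True
        await sleep()
    return False  # still on the interstitial; the caller decides what to do


def demo_poll(titles, tries: int = 8) -> bool:
    """Drive the poller with a scripted sequence of page titles."""
    remaining = iter(titles)

    async def get_title():
        return next(remaining)

    async def sleep():
        await asyncio.sleep(0)  # stand-in for page.wait_for_timeout(...)

    return asyncio.run(wait_past_challenge(get_title, sleep, tries=tries))
```

With the constants in the diff (8 tries at 2.5 s after a 1.5 s settle), the worst case adds roughly 20 s per navigation before the interstitial is handed back to the caller.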
async def _browse(url: str, state: Path | None) -> tuple[str, str, str]:
"""(html, final_url, visible_text). One browser per request; shutdown is guaranteed by finally."""
async with async_playwright() as pw:
browser = await pw.chromium.launch(headless=True)
try:
context = await browser.new_context(
storage_state=str(state),
viewport={"width": 1366, "height": 900},
locale="fr-FR",
)
context = await browser.new_context(**_context_kwargs(state))
page = await context.new_page()
await page.goto(url, wait_until="domcontentloaded", timeout=NAV_TIMEOUT_MS)
await page.wait_for_timeout(SETTLE_MS)
await _settle(page)
html = await page.content()
final_url = page.url
text = await page.evaluate("document.body ? document.body.innerText : ''")
@@ -76,17 +113,53 @@ def health():
@app.post("/fetch")
async def fetch(req: FetchReq):
state = _state_path(req.profile)
state = _state_path(req.profile) if req.profile else None
async with _browser_slot:
try:
html, final_url, _ = await _browse(req.url, state)
except PlaywrightError as e:
logger.warning("fetch 실패 %s: %s", req.url, e)
raise HTTPException(502, detail={"error_reason": "browse_failed", "message": str(e)[:300]})
logger.info("fetch ok profile=%s %s (%d bytes)", req.profile, req.url, len(html))
logger.info("fetch ok profile=%s %s (%d bytes)", req.profile or "-", req.url, len(html))
return {"html": html, "final_url": final_url}
@app.post("/download")
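async def download(req: DownloadReq):
"""Binary (PDF etc.) download: visit the referer to obtain the WAF cookie, then GET in the same context.
Judging the response status/content_type is the caller's (crawl_politeness) responsibility; here
only transport-layer errors are reported as 502 (no silent fallback).
"""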
state = _state_path(req.profile) if req.profile else None
async with _browser_slot:
try:
async with async_playwright() as pw:
browser = await pw.chromium.launch(headless=True)
try:
context = await browser.new_context(**_context_kwargs(state))
if req.referer:
page = await context.new_page()
await page.goto(req.referer, wait_until="domcontentloaded",
timeout=NAV_TIMEOUT_MS)
await _settle(page) # load the CF challenge-pass cookie into the context
resp = await context.request.get(req.url, timeout=NAV_TIMEOUT_MS)
body = await resp.body()
finally:
await browser.close()
except PlaywrightError as e:
logger.warning("download 실패 %s: %s", req.url, e)
raise HTTPException(502, detail={"error_reason": "download_failed", "message": str(e)[:300]})
if len(body) > MAX_DOWNLOAD_BYTES:
raise HTTPException(502, detail={"error_reason": "too_large", "bytes": len(body)})
logger.info("download status=%d %s (%d bytes)", resp.status, req.url, len(body))
return {
"status": resp.status,
"content_type": resp.headers.get("content-type", ""),
"body_b64": base64.b64encode(body).decode(),
}
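Because `/download` returns whatever the upstream sent, the caller has to judge the result. A sketch of what that caller-side check might look like; the real logic lives in `crawl_politeness` and is not part of this diff, so `decode_download` and its rules (status 200, a PDF content type, `%PDF-` magic bytes) are assumptions:

```python
import base64


def decode_download(payload: dict) -> bytes:
    """Validate a /download response body and return the decoded PDF bytes."""
    if payload["status"] != 200:
        raise ValueError(f"upstream status {payload['status']}")
    if "pdf" not in payload["content_type"].lower():
        raise ValueError(f"unexpected content type {payload['content_type']!r}")
    body = base64.b64decode(payload["body_b64"])
    if not body.startswith(b"%PDF-"):
        # A challenge page served with a PDF content type would be caught here.
        raise ValueError("body is not a PDF")
    return body
```

Checking the magic bytes as well as the header is a defensive choice: a WAF interstitial can arrive with status 200.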
@app.post("/probe")
async def probe(req: ProbeReq):
"""Content-based session probe: when ok=False the reason is returned explicitly (the caller records it in health)."""
+46
@@ -17,6 +17,7 @@ from eid.compose import ( # noqa: E402
_persona,
compose,
is_composed_surface,
rules_present,
)
_TASK = "<<<TASK_SENTINEL>>>"
@@ -92,6 +93,51 @@ def test_study_diagnosis_overlay_placeholders_survive_compose():
assert "{weakness_snapshot_block}" not in filled and "WB" in filled and "HB" in filled
def test_eid_chat_surface_registered():
# eid-chat D-1: chat surface = free prose (base), persona ON, no feature overlay (invariant #3)
assert is_composed_surface("eid_chat"), "eid_chat ROUTE_MAP 미등록"
out = compose("eid_chat", "")
assert "이드" in out, "persona 미주입"
assert "보수적" in out, "rules 미주입"
assert out.index("이드") < out.index("보수적"), "persona→rules 순서 위반"
assert "학습 진단 코치" not in out, "채팅 base 표면에 기능 overlay 누출"
def test_rules_present_true_then_false():
# Input for the D-6 fail-closed decision: True when the vendored rules.md exists, False when it is absent.
# The degraded-banner behavior of _rules() (other surfaces) is kept regardless of this helper.
import eid.compose as c
assert rules_present() is True, "vendored rules.md 가 있는데 False"
orig = c._SUBSTRATE_DIR
try:
c._SUBSTRATE_DIR = Path("/nonexistent-substrate-dir-for-test")
assert c.rules_present() is False, "rules.md 부재인데 True — fail-closed 판정 불가"
finally:
c._SUBSTRATE_DIR = orig
def test_rules_present_live_judgment():
# D-6 gate = a live decision; guards against a regression where lru_cache(_read) freezes it.
# On the same path, create → True and delete → False must take effect immediately.
import tempfile
import eid.compose as c
orig = c._SUBSTRATE_DIR
try:
with tempfile.TemporaryDirectory() as td:
c._SUBSTRATE_DIR = Path(td)
rules = Path(td) / "rules.md"
assert c.rules_present() is False
rules.write_text("rule", encoding="utf-8")
assert c.rules_present() is True, "생성이 반영 안 됨 — 캐시 동결"
rules.unlink()
assert c.rules_present() is False, "삭제가 반영 안 됨 — 캐시 동결"
finally:
c._SUBSTRATE_DIR = orig
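The regression this test guards against is easy to reproduce in isolation. A minimal sketch, assuming a cached and an uncached existence check side by side (`cached_exists`, `live_exists`, and `demo_cache_freeze` are illustrative names, not the helpers in `eid.compose`):

```python
import tempfile
from functools import lru_cache
from pathlib import Path


@lru_cache(maxsize=None)
def cached_exists(path: str) -> bool:
    """Frozen after the first call for a given path."""
    return Path(path).is_file()


def live_exists(path: str) -> bool:
    """Re-checks the filesystem on every call."""
    return Path(path).is_file()


def demo_cache_freeze():
    """Return the (cached, live) answers before and after the file is created."""
    with tempfile.TemporaryDirectory() as td:
        rules = str(Path(td) / "rules.md")
        before = (cached_exists(rules), live_exists(rules))
        Path(rules).write_text("rule", encoding="utf-8")
        after = (cached_exists(rules), live_exists(rules))
    return before, after
```

The cached variant keeps answering False after the file appears, which is why a fail-closed gate needs the live check.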
def _run():
fns = [v for k, v in sorted(globals().items()) if k.startswith("test_")]
fails = 0
+152
@@ -0,0 +1,152 @@
"""POST /api/eid/chat mode=deep: ReAct auto-search to SSE conversion (ds-eid-ask-absorb P1).
DB/LLM 0: get_session/get_current_user dependency overrides, with probe, agentic_ask_loop and
get_backend monkeypatched. No real search or 27B call.
Verifies: search-type → phase:searching + content + eid_sources + DONE / probe failure → 503 /
mid-stream BackendUnavailable → in-stream error envelope / conversational → empty sources.
"""
from __future__ import annotations
import json
import sys
import types
from pathlib import Path
import pytest
import pytest_asyncio
from fastapi import FastAPI
from httpx import ASGITransport, AsyncClient
sys.path.insert(0, str(Path(__file__).resolve().parents[2] / "app"))
import api.eid_chat as eid_chat # noqa: E402
from api.eid_chat import router as eid_chat_router # noqa: E402
from core.auth import get_current_user # noqa: E402
from core.database import get_session # noqa: E402
from services.llm.backends import BackendUnavailable # noqa: E402
from services.search.react_loop import ReactResult # noqa: E402
_DEEP = {"mode": "deep", "messages": [{"role": "user", "content": "콜드박스 위험성평가 찾아줘"}]}
async def _async_true() -> bool:
return True
async def _async_false() -> bool:
return False
def _build_app() -> FastAPI:
app = FastAPI()
app.include_router(eid_chat_router, prefix="/api/eid")
app.dependency_overrides[get_current_user] = lambda: types.SimpleNamespace(
id=1, username="test-user"
)
async def _fake_session():
yield None # the deep path only passes session on to agentic_ask_loop (monkeypatched here)
app.dependency_overrides[get_session] = _fake_session
return app
def _data_objs(raw: bytes) -> list[dict]:
out: list[dict] = []
for line in raw.split(b"\n"):
if line.startswith(b"data: ") and line[len(b"data: "):].strip() != b"[DONE]":
try:
out.append(json.loads(line[len(b"data: "):]))
except Exception:
pass
return out
@pytest_asyncio.fixture
async def client():
async with AsyncClient(
transport=ASGITransport(app=_build_app()), base_url="http://test"
) as ac:
yield ac
@pytest.fixture(autouse=True)
def _rules_present(monkeypatch):
# Pass the D-6 fail-closed guard (substrate not degraded)
monkeypatch.setattr(eid_chat.eid_compose, "rules_present", lambda: True)
@pytest.mark.asyncio
async def test_deep_search_sse_shape(client, monkeypatch):
"""Search-type question → phase:searching + final content + eid_sources + DONE, in that order."""
monkeypatch.setattr(eid_chat, "_probe_router_reachable", _async_true)
monkeypatch.setattr(eid_chat, "get_backend", lambda name: object())
async def _fake_loop(session, query, *, backend, **kw):
return ReactResult(
final_answer="콜드박스 위험성평가는 TK-RA-2026-OT1-01 입니다.",
iterations=1,
partial=False,
sources=[{"id": 1, "doc_id": 10, "title": "OT1 콜드박스 위험성평가", "score": 0.91}],
)
monkeypatch.setattr(eid_chat, "agentic_ask_loop", _fake_loop)
r = await client.post("/api/eid/chat", json=_DEEP)
assert r.status_code == 200
objs = _data_objs(r.content)
assert "searching" in [o.get("phase") for o in objs if "phase" in o]
content = "".join(
o["choices"][0]["delta"]["content"] for o in objs if "choices" in o
)
assert "OT1-01" in content
srcs = [o["eid_sources"] for o in objs if "eid_sources" in o]
assert srcs and srcs[0][0]["title"] == "OT1 콜드박스 위험성평가"
assert b"data: [DONE]" in r.content
@pytest.mark.asyncio
async def test_deep_conversational_no_sources(client, monkeypatch):
"""Conversational (no search needed) → ReAct early-exit, sources is an empty array."""
monkeypatch.setattr(eid_chat, "_probe_router_reachable", _async_true)
monkeypatch.setattr(eid_chat, "get_backend", lambda name: object())
async def _chat_loop(session, query, *, backend, **kw):
return ReactResult(final_answer="안녕하세요, 이드입니다.", iterations=1, partial=False, sources=[])
monkeypatch.setattr(eid_chat, "agentic_ask_loop", _chat_loop)
r = await client.post("/api/eid/chat", json=_DEEP)
assert r.status_code == 200
objs = _data_objs(r.content)
srcs = [o["eid_sources"] for o in objs if "eid_sources" in o]
assert srcs and srcs[0] == [] # no search = no evidence card shown
@pytest.mark.asyncio
async def test_deep_probe_fail_503(client, monkeypatch):
"""Probe failure (router unreachable) → 503 macbook_unavailable before the first byte."""
monkeypatch.setattr(eid_chat, "_probe_router_reachable", _async_false)
r = await client.post("/api/eid/chat", json=_DEEP)
assert r.status_code == 503
assert r.json()["error_reason"] == "macbook_unavailable"
@pytest.mark.asyncio
async def test_deep_midstream_error_envelope(client, monkeypatch):
"""BackendUnavailable during search (e.g. AC power disconnected) → in-stream error envelope + DONE."""
monkeypatch.setattr(eid_chat, "_probe_router_reachable", _async_true)
monkeypatch.setattr(eid_chat, "get_backend", lambda name: object())
async def _fail_loop(session, query, *, backend, **kw):
raise BackendUnavailable("qwen-macbook", "macbook_unavailable")
monkeypatch.setattr(eid_chat, "agentic_ask_loop", _fail_loop)
r = await client.post("/api/eid/chat", json=_DEEP)
assert r.status_code == 200 # stream already started (probe passed) → 200 + in-stream error
objs = _data_objs(r.content)
errs = [o for o in objs if o.get("phase") == "error"]
assert errs and errs[0]["error_reason"] == "macbook_unavailable"
assert b"data: [DONE]" in r.content
+205
@@ -0,0 +1,205 @@
"""POST /api/eid/chat endpoint tests: inline ASGI app (zero DB dependency).
Runtime: needs fastapi + httpx, so Docker/staging pytest (same idiom as test_eid_ai_client.py).
DB 0: get_current_user is replaced via dependency_overrides. The unauthenticated/forged-token cases use the real
auth path but are rejected at the decode stage, returning before any DB access.
LLM 0: the happy path monkeypatches EidAIClient.call_stream to yield fixture bytes.
"""
from __future__ import annotations
import sys
import types
from pathlib import Path
import pytest
import pytest_asyncio
from fastapi import FastAPI
from httpx import ASGITransport, AsyncClient
sys.path.insert(0, str(Path(__file__).resolve().parents[2] / "app"))
import eid.compose as eid_compose # noqa: E402
from api.eid_chat import router as eid_chat_router # noqa: E402
from core.auth import get_current_user # noqa: E402
from eid.ai import EidAIClient # noqa: E402
from services.llm.backends import BackendUnavailable # noqa: E402
_FIXTURES = Path(__file__).resolve().parents[1] / "fixtures"
_SSE = (_FIXTURES / "router_sse_chat_macmini_26b.txt").read_bytes()
_OK_BODY = {"mode": "daily", "messages": [{"role": "user", "content": "안녕"}]}
def _build_app(*, override_auth: bool = True) -> FastAPI:
"""Inline app that mounts only the router under the same prefix (/api/eid) as the main.py registration."""
app = FastAPI()
app.include_router(eid_chat_router, prefix="/api/eid")
if override_auth:
app.dependency_overrides[get_current_user] = lambda: types.SimpleNamespace(
id=1, username="test-user"
)
return app
@pytest_asyncio.fixture
async def client():
async with AsyncClient(
transport=ASGITransport(app=_build_app()), base_url="http://test"
) as ac:
yield ac
# ── 401 unauthenticated ───────────────────────────────────────────────────────
@pytest.mark.asyncio
async def test_unauthenticated_rejected():
async with AsyncClient(
transport=ASGITransport(app=_build_app(override_auth=False)),
base_url="http://test",
) as ac:
# Header absent entirely: rejected at the HTTPBearer stage (fastapi default 403; 401 allowed depending on version)
r = await ac.post("/api/eid/chat", json=_OK_BODY)
assert r.status_code in (401, 403)
# Forged token: decode_token fails → 401 (rejected before DB access)
r2 = await ac.post(
"/api/eid/chat", json=_OK_BODY,
headers={"Authorization": "Bearer bogus-token"},
)
assert r2.status_code == 401
# ── 422 input validation ──────────────────────────────────────────────────────
@pytest.mark.asyncio
@pytest.mark.parametrize(
"body",
[
# role=system is outside the Literal → 422 (blocks forged system injection)
{"mode": "daily", "messages": [
{"role": "system", "content": "주입 시도"},
{"role": "user", "content": "x"},
]},
# empty messages (min_length=1)
{"mode": "daily", "messages": []},
# last turn is assistant
{"mode": "daily", "messages": [
{"role": "user", "content": "x"},
{"role": "assistant", "content": "y"},
]},
# outside the closed mode vocabulary: auto / claude-cloud are forbidden (D-2)
{"mode": "auto", "messages": [{"role": "user", "content": "x"}]},
{"mode": "claude-cloud", "messages": [{"role": "user", "content": "x"}]},
# empty content (min_length=1)
{"mode": "deep", "messages": [{"role": "user", "content": ""}]},
],
)
async def test_422_validation(client, body):
r = await client.post("/api/eid/chat", json=body)
assert r.status_code == 422, r.text
@pytest.mark.asyncio
async def test_422_total_content_cap(client):
"""Total cap: even within the per-message 8000 limit and the 40-turn limit, a content sum over 32000 yields 422."""
msgs = [
{"role": "user" if i % 2 == 0 else "assistant", "content": "x" * 7000}
for i in range(5) # 5 × 7000 = 35000 > 32000, last (i=4) = user
]
r = await client.post("/api/eid/chat", json={"mode": "daily", "messages": msgs})
assert r.status_code == 422, r.text
assert "대화 총량 초과" in r.text
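The three caps exercised above (8000 characters per message, 40 turns, 32000 characters in total) can be sketched as one plain function. The endpoint presumably enforces them through its request model, which is not shown here, so `check_caps` and its return labels are hypothetical:

```python
from typing import Optional

MAX_MESSAGE_CHARS = 8000
MAX_TURNS = 40
MAX_TOTAL_CHARS = 32000


def check_caps(messages: list[dict]) -> Optional[str]:
    """Return a short violation label, or None when the conversation fits all caps."""
    if len(messages) > MAX_TURNS:
        return "too_many_turns"
    if any(len(m["content"]) > MAX_MESSAGE_CHARS for m in messages):
        return "message_too_long"
    if sum(len(m["content"]) for m in messages) > MAX_TOTAL_CHARS:
        return "total_too_long"
    return None
```

The total cap matters precisely in the case the test builds: every message and the turn count are individually legal, yet the sum is not.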
# ── 503 substrate_degraded (D-6 fail-closed) ─────────────────────────────────
@pytest.mark.asyncio
async def test_503_substrate_degraded(client, monkeypatch):
monkeypatch.setattr(eid_compose, "rules_present", lambda: False)
r = await client.post("/api/eid/chat", json=_OK_BODY)
assert r.status_code == 503
js = r.json()
assert js["error_reason"] == "substrate_degraded"
assert "detail" in js
# ── 503 backend_unavailable (before stream start, ask-convention shape) ───────
@pytest.mark.asyncio
async def test_503_backend_unavailable_prestream(client, monkeypatch):
# The call_stream regression (prestream 503) is verified with daily; deep now takes the separate ReAct path
# (probe + agentic_ask_loop), and deep's 503/midstream cases are covered by test_eid_chat_deep.py.
async def fake_call_stream(self, mode, messages, system):
raise BackendUnavailable("qwen-macbook", "macbook_unavailable")
yield b"" # pragma: no cover; keeps the async generator shape
monkeypatch.setattr(EidAIClient, "call_stream", fake_call_stream)
r = await client.post(
"/api/eid/chat",
json={"mode": "daily", "messages": [{"role": "user", "content": "x"}]},
)
assert r.status_code == 503
js = r.json()
assert js["error"] == "backend_unavailable"
assert js["error_reason"] == "macbook_unavailable"
assert js["backend_requested"] == "qwen-macbook"
# ── happy path: SSE raw pass-through ──────────────────────────────────────────
@pytest.mark.asyncio
async def test_200_stream_passthrough(client, monkeypatch):
captured: dict = {}
async def fake_call_stream(self, mode, messages, system):
captured["mode"] = mode
captured["messages"] = messages
captured["system"] = system
# yield in chunks; checks that the endpoint streams them through unmodified
for i in range(0, len(_SSE), 256):
yield _SSE[i : i + 256]
monkeypatch.setattr(EidAIClient, "call_stream", fake_call_stream)
r = await client.post("/api/eid/chat", json=_OK_BODY)
assert r.status_code == 200, r.text
assert r.headers["content-type"].startswith("text/event-stream")
assert r.headers["cache-control"] == "no-store"
assert r.headers["x-accel-buffering"] == "no"
# the fixture's data: lines pass through unchanged (raw pass-through)
assert r.content == _SSE
assert b'data: {"id"' in r.content
assert b"data: [DONE]" in r.content
# call_stream input: mode as-is + the user turn + the composed system prompt (persona included)
assert captured["mode"] == "daily"
assert captured["messages"] == [{"role": "user", "content": "안녕"}]
assert "이드" in captured["system"], "system 에 compose 합본(persona) 미주입"
assert "보수적" in captured["system"], "system 에 rules 미주입"
# ── truncation after stream start: no traceback propagation, quiet termination ──
@pytest.mark.asyncio
async def test_200_midstream_abort_quiet(client, monkeypatch):
"""BackendUnavailable mid-stream: only the partial body is sent, no exception propagates
(the frontend treats a missing data: [DONE] as truncation)."""
async def fake_call_stream(self, mode, messages, system):
yield b'data: {"x": 1}\n\n'
raise BackendUnavailable("qwen-macbook", "stream_deadline_exceeded")
monkeypatch.setattr(EidAIClient, "call_stream", fake_call_stream)
# The call_stream midstream regression uses daily; deep midstream moved to the in-stream error envelope
# path (test_eid_chat_deep.test_deep_midstream_error_envelope).
r = await client.post(
"/api/eid/chat",
json={"mode": "daily", "messages": [{"role": "user", "content": "x"}]},
)
assert r.status_code == 200
assert r.content == b'data: {"x": 1}\n\n'
assert b"data: [DONE]" not in r.content
+318
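@@ -0,0 +1,318 @@
"""EidAIClient.call_stream unit tests: closed mode mapping, egress blocking, line-by-line SSE relay.
Runtime: needs httpx + config(settings), so Docker/staging pytest
(same idiom as tests/eid/test_eid_ai_client.py; hard-fails without the deps on a local MacBook).
httpx calls are replaced with MockTransport, so there is zero real network traffic (zero DB dependency).
The blocked host string is assembled at runtime from parts; this code *tests* the block, it does not call the host
(avoids a meter-guard false positive, same as test_eid_ai_client.py).
Stream verification is not byte-equal: call_stream replaces model in each data: JSON with the mode vocabulary
and removes usage (machine paths/telemetry not exposed), so the tests check content accumulation and framing.
"""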
from __future__ import annotations
import asyncio
import json
import sys
from pathlib import Path
import httpx
import pytest
sys.path.insert(0, str(Path(__file__).resolve().parents[2] / "app"))
import eid.ai as eid_ai # noqa: E402
from eid.ai import EidAIClient, EidEgressBlocked # noqa: E402
from services.llm.backends import BackendUnavailable # noqa: E402
from services.search.llm_gate import _reset_for_test # noqa: E402
_FIXTURES = Path(__file__).resolve().parents[1] / "fixtures"
_SSE_MACMINI = (_FIXTURES / "router_sse_chat_macmini_26b.txt").read_bytes()
_SSE_QWEN = (_FIXTURES / "router_sse_chat_qwen_27b.txt").read_bytes()
_BLOCKED_HOST = "anthropic" + ".com"
_MSG = [{"role": "user", "content": "안녕"}]
@pytest.fixture(autouse=True)
def _reset_gate():
"""The daily (mac-mini-default) path acquires the mlx gate, so reset for every fresh event loop."""
_reset_for_test()
yield
_reset_for_test()
def _patch_transport(monkeypatch, handler):
"""Inject MockTransport into the httpx.AsyncClient created inside eid.ai (constructor arguments are preserved)."""
real = httpx.AsyncClient
def _factory(*args, **kwargs):
kwargs["transport"] = httpx.MockTransport(handler)
return real(*args, **kwargs)
monkeypatch.setattr(eid_ai.httpx, "AsyncClient", _factory)
def _data_objs(raw: bytes) -> list[dict]:
"""SSE bytes → list of data: JSON objects ([DONE] excluded)."""
objs = []
for line in raw.split(b"\n"):
if line.startswith(b"data: ") and line[len(b"data: "):].strip() != b"[DONE]":
objs.append(json.loads(line[len(b"data: "):]))
return objs
def _content_concat(raw: bytes) -> str:
"""Accumulate delta.content, to verify the body is lossless."""
return "".join(
(o["choices"][0]["delta"].get("content") or "") for o in _data_objs(raw)
)
# ── closed mode mapping / egress blocking ─────────────────────────────────────
@pytest.mark.asyncio
@pytest.mark.parametrize("bad_mode", ["auto", "claude-cloud", "mac-mini-default", "bogus"])
async def test_unknown_mode_blocked(bad_mode):
"""Unknown mode = EidEgressBlocked: everything outside the closed mapping (daily/deep) is blocked, including direct alias names."""
c = EidAIClient()
try:
stream = c.call_stream(bad_mode, _MSG, "sys")
with pytest.raises(EidEgressBlocked):
await anext(stream)
finally:
await c.close()
@pytest.mark.asyncio
async def test_anthropic_router_url_blocked(monkeypatch):
"""Even if the router URL is miswired to an external host, call_stream blocks it (mirrors the existing _request pattern)."""
monkeypatch.setattr(eid_ai, "_router_url", lambda: "https://api." + _BLOCKED_HOST)
c = EidAIClient()
try:
stream = c.call_stream("deep", _MSG, "sys")
with pytest.raises(EidEgressBlocked):
await anext(stream)
finally:
await c.close()
# ── alias mapping + payload shape + line-by-line relay (model substitution, usage removal) ──
@pytest.mark.asyncio
async def test_deep_mode_alias_and_sse_line_rewrite(monkeypatch):
"""deep → qwen-macbook alias, system injected once as messages[0], sanitized line-by-line relay."""
seen: dict = {}
def handler(request: httpx.Request) -> httpx.Response:
seen["url"] = str(request.url)
seen["json"] = json.loads(request.content)
return httpx.Response(
200, content=_SSE_QWEN, headers={"content-type": "text/event-stream"}
)
_patch_transport(monkeypatch, handler)
c = EidAIClient()
try:
chunks = [b async for b in c.call_stream("deep", _MSG, "SYS_SENTINEL")]
finally:
await c.close()
joined = b"".join(chunks)
# (a) accumulated content = same as the fixture (delta bodies lossless)
assert _content_concat(joined) == _content_concat(_SSE_QWEN) != ""
assert len(_data_objs(joined)) == len(_data_objs(_SSE_QWEN))
# (b) model field = replaced with the mode vocabulary; no MacBook absolute filesystem path or real model name exposed
assert all(o["model"] == "deep" for o in _data_objs(joined))
assert b"mlx-models" not in joined and b"Qwen" not in joined
# (c) usage (machine telemetry) absent
assert all("usage" not in o for o in _data_objs(joined))
assert b"peak_memory" not in joined
# (d) data: [DONE] preserved
assert b"data: [DONE]" in joined
# (e) blank-line framing preserved: line count and blank-line positions match the fixture
assert [bool(l) for l in joined.split(b"\n")] == [
bool(l) for l in _SSE_QWEN.split(b"\n")
]
assert seen["url"].endswith("/v1/chat/completions")
body = seen["json"]
assert body["model"] == "qwen-macbook"
assert body["stream"] is True
assert body["max_tokens"] == 2048
assert body["temperature"] == 0.4
assert body["messages"][0] == {"role": "system", "content": "SYS_SENTINEL"}
assert body["messages"][1:] == _MSG
@pytest.mark.asyncio
async def test_daily_mode_alias_macmini(monkeypatch):
"""daily → mac-mini-default alias (via the mlx gate) + sanitized line-by-line relay."""
class _TinyChunks(httpx.AsyncByteStream):
"""Emit 7 bytes at a time so chunk boundaries fall mid-line/mid-JSON; verifies line buffering."""
async def __aiter__(self):
for i in range(0, len(_SSE_MACMINI), 7):
yield _SSE_MACMINI[i : i + 7]
async def aclose(self):
return None
def handler(request: httpx.Request) -> httpx.Response:
assert json.loads(request.content)["model"] == "mac-mini-default"
return httpx.Response(
200, stream=_TinyChunks(), headers={"content-type": "text/event-stream"}
)
_patch_transport(monkeypatch, handler)
c = EidAIClient()
try:
chunks = [b async for b in c.call_stream("daily", _MSG, "sys")]
finally:
await c.close()
joined = b"".join(chunks)
# (a) same accumulated content / (b) model substitution / (c) no usage / (d) [DONE] / (e) framing
assert _content_concat(joined) == _content_concat(_SSE_MACMINI) != ""
assert all(o["model"] == "daily" for o in _data_objs(joined))
assert b"gemma" not in joined
assert all("usage" not in o for o in _data_objs(joined))
assert b"data: [DONE]" in joined
assert [bool(l) for l in joined.split(b"\n")] == [
bool(l) for l in _SSE_MACMINI.split(b"\n")
]
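The properties checked in (a) through (e) suggest a relay that buffers bytes until a full line is available and rewrites only `data:` JSON lines. A minimal synchronous sketch of that technique; the real `call_stream` is async and not shown in this diff, so `rewrite_line` and `relay` are illustrative names:

```python
import json

_PREFIX = b"data: "


def rewrite_line(line: bytes, mode: str) -> bytes:
    """Replace `model` with the mode word and drop `usage` on data: JSON lines."""
    if not line.startswith(_PREFIX):
        return line  # blank separator lines and anything else pass through
    payload = line[len(_PREFIX):]
    if payload.strip() == b"[DONE]":
        return line
    try:
        obj = json.loads(payload)
    except ValueError:
        return line  # not JSON: relay untouched rather than guess
    if not isinstance(obj, dict):
        return line
    obj["model"] = mode
    obj.pop("usage", None)
    return _PREFIX + json.dumps(obj, ensure_ascii=False).encode("utf-8")


def relay(chunks, mode: str):
    """Yield rewritten lines; chunk boundaries may fall anywhere, even mid-JSON."""
    buf = b""
    for chunk in chunks:
        buf += chunk
        while b"\n" in buf:
            line, buf = buf.split(b"\n", 1)
            yield rewrite_line(line, mode) + b"\n"
    if buf:  # flush a trailing line that had no newline
        yield rewrite_line(buf, mode)
```

Emitting exactly one output line per input line is what keeps the blank-line framing identical to the upstream stream.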
# ── error before stream start → BackendUnavailable (matches the ask vocabulary) ──
@pytest.mark.asyncio
async def test_prestream_503_maps_reason(monkeypatch):
"""Extract error.type from the router 503 body as error_reason (same vocabulary as ask)."""
def handler(request: httpx.Request) -> httpx.Response:
return httpx.Response(503, json={"error": {"type": "macbook_unavailable"}})
_patch_transport(monkeypatch, handler)
c = EidAIClient()
try:
stream = c.call_stream("deep", _MSG, "sys")
with pytest.raises(BackendUnavailable) as ei:
await anext(stream)
assert ei.value.reason == "macbook_unavailable"
assert ei.value.backend_name == "qwen-macbook"
finally:
await c.close()
@pytest.mark.asyncio
async def test_prestream_503_no_body_falls_back_router_503(monkeypatch):
def handler(request: httpx.Request) -> httpx.Response:
return httpx.Response(503, content=b"oops not json")
_patch_transport(monkeypatch, handler)
c = EidAIClient()
try:
stream = c.call_stream("deep", _MSG, "sys")
with pytest.raises(BackendUnavailable) as ei:
await anext(stream)
assert ei.value.reason == "router_503"
finally:
await c.close()
@pytest.mark.asyncio
async def test_prestream_connect_error_maps_router_prefix(monkeypatch):
"""Connection failure → router_<exception name> (matches the RouterBackend._post vocabulary)."""
def handler(request: httpx.Request) -> httpx.Response:
raise httpx.ConnectError("connection refused")
_patch_transport(monkeypatch, handler)
c = EidAIClient()
try:
stream = c.call_stream("deep", _MSG, "sys")
with pytest.raises(BackendUnavailable) as ei:
await anext(stream)
assert ei.value.reason == "router_ConnectError"
finally:
await c.close()
@pytest.mark.asyncio
async def test_prestream_400_raises_valueerror_failloud(monkeypatch):
"""router 400 = alias drift in the closed mapping, i.e. a code bug: fail loud with ValueError
rather than BackendUnavailable (mirrors the RouterBackend._post convention)."""
def handler(request: httpx.Request) -> httpx.Response:
return httpx.Response(400, json={"error": "unknown_alias"})
_patch_transport(monkeypatch, handler)
c = EidAIClient()
try:
stream = c.call_stream("deep", _MSG, "sys")
with pytest.raises(ValueError, match="router rejected alias='qwen-macbook'"):
await anext(stream)
finally:
await c.close()
# ── wall-clock deadline (prevents unbounded gate occupancy) ──────────────────
@pytest.mark.asyncio
async def test_stream_deadline_exceeded(monkeypatch):
"""Deadline exceeded between upstream entry and completion → BackendUnavailable(stream_deadline_exceeded)."""
class _StallStream(httpx.AsyncByteStream):
"""Stall after the first chunk: models a pattern the per-chunk read timeout does not catch."""
async def __aiter__(self):
yield b'data: {"choices": []}\n\n'
await asyncio.sleep(30)
async def aclose(self):
return None
def handler(request: httpx.Request) -> httpx.Response:
return httpx.Response(
200, stream=_StallStream(), headers={"content-type": "text/event-stream"}
)
_patch_transport(monkeypatch, handler)
monkeypatch.setattr(eid_ai, "_STREAM_DEADLINE_S", 0.05)
c = EidAIClient()
try:
stream = c.call_stream("deep", _MSG, "sys")
with pytest.raises(BackendUnavailable) as ei:
async for _ in stream:
pass
assert ei.value.reason == "stream_deadline_exceeded"
assert ei.value.backend_name == "qwen-macbook"
finally:
await c.close()
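The deadline test above depends on a wall-clock budget for the whole stream rather than a per-chunk read timeout. The following is a minimal sketch of that idea and assumes nothing about the real `EidAIClient.call_stream` internals. `relay_with_deadline` and `DeadlineExceeded` are hypothetical names; `DeadlineExceeded` stands in for `BackendUnavailable(reason="stream_deadline_exceeded")`.

```python
import asyncio
from typing import AsyncIterator


class DeadlineExceeded(Exception):
    """Hypothetical stand-in for BackendUnavailable(stream_deadline_exceeded)."""


async def relay_with_deadline(
    source: AsyncIterator[bytes], deadline_s: float
) -> AsyncIterator[bytes]:
    """Relay chunks from `source`, bounding the total elapsed time (not per chunk)."""
    loop = asyncio.get_running_loop()
    end = loop.time() + deadline_s  # one fixed deadline for the whole stream
    it = source.__aiter__()
    while True:
        remaining = end - loop.time()
        if remaining <= 0:
            raise DeadlineExceeded("stream_deadline_exceeded")
        try:
            # Each read may only use whatever is left of the overall budget.
            chunk = await asyncio.wait_for(it.__anext__(), timeout=remaining)
        except StopAsyncIteration:
            return
        except asyncio.TimeoutError:
            raise DeadlineExceeded("stream_deadline_exceeded") from None
        yield chunk
```

Because the remaining budget shrinks with every chunk, a stream that keeps trickling data cannot hold the gate indefinitely, which is the failure mode the section header names.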
# ── error_reason allowlist sanitize ──────────────────────────────────────────
def test_stream_error_reason_sanitized():
    """The final reason must match the [a-z0-9_]{1,64} allowlist; mismatches (uppercase,
    spaces, dict fragments) are generalized to upstream_502 (for 502) or router_error
    (other statuses), so dict serialization fragments are never exposed."""
    from eid.ai import _stream_error_reason
    # Valid vocabulary passes through unchanged (same as ask)
    assert (
        _stream_error_reason(503, b'{"error": {"type": "macbook_unavailable"}}')
        == "macbook_unavailable"
    )
    assert _stream_error_reason(503, b"oops not json") == "router_503"
    assert _stream_error_reason(418, b"{}") == "router_http_418"
    # 502 + extraction failure → upstream_502 (removes the former upstream_502_{dict...} fragment)
    assert _stream_error_reason(502, b'{"error": {"detail": "x"}}') == "upstream_502"
    # Outside the allowlist (uppercase/spaces/special characters) → generalized
    assert _stream_error_reason(502, b'{"error": {"type": "Bad Gateway!"}}') == "upstream_502"
    assert _stream_error_reason(503, b'{"error": {"type": "Weird Reason"}}') == "router_error"
    assert _stream_error_reason(503, b'{"error": {"type": "' + b"a" * 80 + b'"}}') == "router_error"
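The assertions above pin down the observable behaviour of the sanitizer. One way to satisfy them is sketched below. It is reconstructed only from these test cases, so `sketch_error_reason` is a hypothetical name and the branch order is an assumption, not the actual `eid.ai._stream_error_reason` implementation.

```python
import json
import re

# Allowlist taken from the test docstring: lowercase, digits, underscore, 1..64 chars.
_ALLOWED = re.compile(r"[a-z0-9_]{1,64}")


def sketch_error_reason(status: int, body: bytes) -> str:
    """Map an upstream error response to a short reason that is safe to expose."""
    extracted = None
    try:
        payload = json.loads(body)
        err = payload.get("error") if isinstance(payload, dict) else None
        if isinstance(err, dict) and isinstance(err.get("type"), str):
            extracted = err["type"]
    except ValueError:  # JSONDecodeError and UnicodeDecodeError both subclass ValueError
        pass

    if extracted is None:
        # Nothing usable in the body: fall back to a status-derived reason.
        if status == 503:
            return "router_503"
        if status == 502:
            return "upstream_502"
        return f"router_http_{status}"

    if _ALLOWED.fullmatch(extracted):
        return extracted
    # Extracted but outside the allowlist: generalize instead of echoing it.
    return "upstream_502" if status == 502 else "router_error"
```

Using `fullmatch` rather than `match` matters here: it rejects an over-long value such as 80 repeated characters instead of accepting its first 64.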
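@@ -0,0 +1,112 @@
"""Tests for the GET /api/eid/status endpoint, using an inline ASGI app (no DB dependency).
Runtime environment: requires fastapi + httpx; same idiom as test_eid_chat_endpoint.py.
No DB / no LLM: get_current_user is replaced via dependency_overrides, and gate occupancy
is faked by monkeypatching llm_gate.gate_status (effective because eid_chat calls it as a module attribute).
The unauthenticated cases take the real auth path but are rejected at the decode step, returning before any DB access.
"""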
from __future__ import annotations
import sys
import types
from pathlib import Path
import pytest
import pytest_asyncio
from fastapi import FastAPI
from httpx import ASGITransport, AsyncClient
sys.path.insert(0, str(Path(__file__).resolve().parents[2] / "app"))
from api.eid_chat import router as eid_chat_router # noqa: E402
from core.auth import get_current_user # noqa: E402
from services.search import llm_gate # noqa: E402
def _build_app(*, override_auth: bool = True) -> FastAPI:
    """Inline app that mounts only the router, under the same prefix (/api/eid) that main.py registers."""
    app = FastAPI()
    app.include_router(eid_chat_router, prefix="/api/eid")
    if override_auth:
        app.dependency_overrides[get_current_user] = lambda: types.SimpleNamespace(
            id=1, username="test-user"
        )
    return app
@pytest_asyncio.fixture
async def client():
    async with AsyncClient(
        transport=ASGITransport(app=_build_app()), base_url="http://test"
    ) as ac:
        yield ac
# ── 401 unauthenticated ──────────────────────────────────────────────────────
@pytest.mark.asyncio
async def test_unauthenticated_rejected():
    async with AsyncClient(
        transport=ASGITransport(app=_build_app(override_auth=False)),
        base_url="http://test",
    ) as ac:
        # No header at all: rejected at the HTTPBearer step (fastapi default 403; 401 allowed depending on version)
        r = await ac.get("/api/eid/status")
        assert r.status_code in (401, 403)
        # Forged token: decode_token fails → 401 (rejected before DB access)
        r2 = await ac.get(
            "/api/eid/status", headers={"Authorization": "Bearer bogus-token"}
        )
        assert r2.status_code == 401
# ── 200 shape ────────────────────────────────────────────────────────────────
@pytest.mark.asyncio
async def test_200_shape(client, monkeypatch):
    """Response shape: three fields (busy/inflight/waiters) under the daily key, with fixed types."""
    monkeypatch.setattr(
        llm_gate, "gate_status", lambda: {"inflight": False, "waiters": 0}
    )
    r = await client.get("/api/eid/status")
    assert r.status_code == 200, r.text
    js = r.json()
    assert set(js.keys()) == {"daily"}
    assert set(js["daily"].keys()) == {"busy", "inflight", "waiters"}
    assert isinstance(js["daily"]["busy"], bool)
    assert isinstance(js["daily"]["inflight"], bool)
    assert isinstance(js["daily"]["waiters"], int)
# ── busy determination: gate_status monkeypatch ──────────────────────────────
@pytest.mark.asyncio
@pytest.mark.parametrize(
    "snap, expected",
    [
        # Idle: busy=false (approximation: occupancy by external consumers is not captured)
        (
            {"inflight": False, "waiters": 0},
            {"busy": False, "inflight": False, "waiters": 0},
        ),
        # inflight only: busy=true (certain)
        (
            {"inflight": True, "waiters": 0},
            {"busy": True, "inflight": True, "waiters": 0},
        ),
        # waiters only: busy=true (the `or` branch of inflight or waiters>0)
        (
            {"inflight": False, "waiters": 3},
            {"busy": True, "inflight": False, "waiters": 3},
        ),
    ],
)
async def test_busy_from_gate_status(client, monkeypatch, snap, expected):
    monkeypatch.setattr(llm_gate, "gate_status", lambda: dict(snap))
    r = await client.get("/api/eid/status")
    assert r.status_code == 200, r.text
    assert r.json() == {"daily": expected}
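The parametrized cases fix `busy` as `inflight or waiters > 0`, reported under a single `daily` key. A minimal sketch of that derivation follows, assuming the gate snapshot is a plain dict with `inflight` and `waiters` keys as in the monkeypatched `gate_status`. `sketch_status_payload` is a hypothetical helper, not the endpoint's actual code.

```python
from typing import Any, Dict


def sketch_status_payload(snapshot: Dict[str, Any]) -> Dict[str, Dict[str, Any]]:
    """Build the /api/eid/status response body from a gate snapshot."""
    inflight = bool(snapshot["inflight"])
    waiters = int(snapshot["waiters"])
    return {
        "daily": {
            # Busy if a request holds the gate or anything is queued behind it.
            "busy": inflight or waiters > 0,
            "inflight": inflight,
            "waiters": waiters,
        }
    }
```

Coercing with `bool` and `int` keeps the response types fixed, which is what `test_200_shape` checks.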
@@ -0,0 +1,848 @@
<!DOCTYPE html>
<html lang="en" class="no-js">
<head>
<title>American Petroleum Institute | API | Standards News Highlights</title>
<link rel="apple-touch-icon" sizes="180x180" href="/library/APIWeb/favicon/apple-touch-icon.png">
<link rel="icon" type="image/png" sizes="32x32" href="/library/APIWeb/favicon/favicon-32x32.png">
<link rel="icon" type="image/png" sizes="16x16" href="/library/APIWeb/favicon/favicon-16x16.png">
<link rel="manifest" href="/library/APIWeb/favicon/site.webmanifest">
<meta charset="UTF-8">
<meta http-equiv="X-UA-Compatible" content="IE=edge">
<meta name="viewport" content="width=device-width, initial-scale=1">
<meta name="keywords" content="api, american petroleum institute, what is energy, monogram, ICP, api standards, fossil fuels facts, api 510, api.org, api certification, api 653, api 570, api 610, api 650, api icp, keystone xl, LNG exports, pipeline, refinery, hydraulic fracturing, api 6a, petroleum, shale gas, api q1, api oil, api training, gasoline, natural gas, oil sands, groundwater, ethanol, E15," />
<meta name="description" content="Stay informed with the latest API standards updates, addenda, and global adoption trends. Explore how these standards support industry safety and innovation.
" />
<link rel="canonical" href="https://www.api.org/products-and-services/standards/important-standards-announcements" />
<meta property="fb:admins" content="rachidmrad" />
<meta property="fb:app_id" content="880866755363034" />
<meta property="og:url" content="https://www.api.org/products-and-services/standards/important-standards-announcements" />
<meta property="og:type" content="website" />
<meta property="og:title" content="Standards News Highlights" />
<meta property="og:description" content="Stay informed with the latest API standards updates, addenda, and global adoption trends. Explore how these standards support industry safety and innovation.
" />
<meta property="og:image" content="https://www.api.org/-/media/APIWebsite/news-policies-and-issues/liveblog/APILogo-liveblog-primary-debate-06262019.jpg" />
<meta property="og:site_name" content="American Petroleum Institute" />
<meta name="twitter:card" content="summary_large_image" />
<meta name="twitter:site" content="@APIenergy" />
<meta name="twitter:title" content="Standards News Highlights" />
<meta name="twitter:description" content="Stay informed with the latest API standards updates, addenda, and global adoption trends. Explore how these standards support industry safety and innovation.
" />
<meta name="twitter:image" content="https://www.api.org/-/media/APIWebsite/news-policies-and-issues/liveblog/APILogo-liveblog-primary-debate-06262019.jpg" />
<style>
.carousel {
position: relative;
z-index: 2;
}
.carousel-header {
position: relative;
width: 100%;
height: 100%;
background-size: cover;
background-repeat: no-repeat;
background-position: top;
overflow:hidden;
display: flex;
align-items: center;
justify-content: flex-start;
}
.carousel-mobile {
display: block;
width: 120%;
height: auto;
}
.carousel-title {
position: absolute;
inset: 0;
display: flex;
flex-direction: column;
justify-content: center;
line-height: 1.25;
margin:0 auto 0 4rem;
width: 70%;
}
.carousel-heading,
.carousel-subtext {
margin: 0;
color: #fff;
text-transform: uppercase;
width: 80%;
}
@media screen and (min-width: 992px) {
.carousel-mobile {
display: none;
}
.carousel-title h1, .carousel-title h2 {
font-size: 3.3rem;
}
.carousel-top {
height: 600px;
}
}
@media screen and (max-width:991px) {
.carousel-top > div:first-child {
background-image: none !important;
}
.carousel-top {
height: 700px;
}
}
</style>
<!-- Redesign CSS/Fonts -->
<link rel="stylesheet" href="/library/APIWeb/css/style.css" />
<link rel="stylesheet" href="https://use.typekit.net/wvd1mgg.css" />
<script src="https://kit.fontawesome.com/a190867a63.js" crossorigin="anonymous"></script>
<!--BEGIN COVEO NOINDEX-->
<!-- Google Tag Manager -->
<script>
(function (w, d, s, l, i) {
w[l] = w[l] || []; w[l].push({
'gtm.start': new Date().getTime(), event: 'gtm.js'
});
var f = d.getElementsByTagName(s)[0],
j = d.createElement(s), dl = l != 'dataLayer' ? '&l=' + l : '';
j.async = true; j.src = 'https://www.googletagmanager.com/gtm.js?id=' + i + dl;
f.parentNode.insertBefore(j, f);
})(window, document, 'script', 'dataLayer', 'GTM-PR8RWJW');
</script>
<!-- End Google Tag Manager -->
<!--END COVEO NOINDEX-->
<script src="https://code.jquery.com/jquery-1.11.0.min.js"></script>
<!-- reCAPTCHA -->
<script src="https://www.google.com/recaptcha/api.js?render=explicit" async defer></script>
<script type="text/javascript">
var your_site_key = '6LdAV_ciAAAAAGP_PxfR-AzZcVbyQQXyJc9OV461';
var recaptchaRenderAttempted = false;
var recaptchaCheckCount = 0;
var maxChecks = 20; // ~10 seconds max
function renderRecaptcha() {
// Stop retrying if container doesn't exist
var container = document.getElementById('ReCaptchContainer');
if (!container) {
if (recaptchaCheckCount === 0) {
console.log('No reCAPTCHA container found. Skipping checks for this page.');
}
return;
}
// Already rendered?
if (recaptchaRenderAttempted) return;
// Max retries
if (recaptchaCheckCount >= maxChecks) {
console.warn('reCAPTCHA load timeout after ' + (maxChecks * 500 / 1000) + 's');
return;
}
recaptchaCheckCount++;
if (typeof grecaptcha !== 'undefined' && typeof grecaptcha.render === 'function') {
grecaptcha.render('ReCaptchContainer', {
'sitekey': your_site_key,
'theme': 'light',
'type': 'image',
'size': 'normal'
});
recaptchaRenderAttempted = true;
console.log('reCAPTCHA rendered successfully');
} else {
// Only log first few waits to avoid console spam
if (recaptchaCheckCount <= 5) {
console.log('Waiting for grecaptcha (' + recaptchaCheckCount + ')...');
}
setTimeout(renderRecaptcha, 500);
}
}
// Start after DOM ready
document.addEventListener('DOMContentLoaded', function () {
renderRecaptcha();
});
// Fallback for late load
window.addEventListener('load', function () {
setTimeout(renderRecaptcha, 1000);
});
</script>
<!-- reCAPTCHA End -->
</head>
<body class="">
<!-- Google Tag Manager (noscript) -->
<noscript>
<iframe src="https://www.googletagmanager.com/ns.html?id=GTM-PR8RWJW"
height="0" width="0" style="display:none;visibility:hidden"></iframe>
</noscript>
<!-- End Google Tag Manager (noscript) -->
<header>
<div class="header">
<div class="second-nav">
<div class="logo-container">
<a href="/">
<img id="apiLogo" src="/library/APIWeb/img/apiLogoPrimary.svg" alt="API Logo" />
</a>
<a href="/">
<img id="apiMobileLogo" src="/library/APIWeb/img/apiMobileLogo.svg" alt="API Mobile Logo" />
</a>
</div>
<ul class="utilities">
<li>
<a href="/about">About</a>
</li>
<li>
<a href="/membership">Membership</a>
</li>
<li>
<a href="https://events.api.org/" target="_blank" rel="noopener noreferrer" class="external-url">
Events
</a>
</li>
<li>
<a href="https://www.apiwebstore.org/?utm_campaign=apitowebstore&amp;utm_source=navigation&amp;utm_medium=web" target="_blank" rel="noopener noreferrer" class="external-url">
Webstore
</a>
</li>
<li>
<a href="https://myportal.api.org/" target="_blank" rel="noopener noreferrer" class="external-url">
API Portal
</a>
</li>
<li><a href="/contact" class="btn-red">Contact</a></li>
</ul>
<div class="mobile-nav">
<div class="nav-trigger menu-toggle">
<img class="menu-icon"
src="/library/APIWeb/img/icons/bars-solid-full.svg"
data-cross-src="/library/APIWeb/img/icons/xmark-solid-full.svg"
alt="Mobile menu trigger">
</div>
</div>
</div>
<div class="main-nav">
<div class="parent-nav">
<div class="main-item"
id="parent-606c005fb35d407da123d1b7f735cfc6"
data-haschildren="true"
tabindex="0"
role="button"
aria-haspopup="true"
aria-expanded="false"
aria-label="Issues menu toggle">
Issues
</div>
<div class="main-item"
id="parent-5f629feb57834e2691684a3f5d4b24e6"
data-haschildren="true"
tabindex="0"
role="button"
aria-haspopup="true"
aria-expanded="false"
aria-label="Energy Insights menu toggle">
Energy Insights
</div>
<div class="main-item" id="parent-95ad0674d8244aeb92db777cc4a9c2cf">
<a href="/news-policy-and-issues/news" aria-label="News link">News</a>
</div>
<div class="main-item"
id="parent-c374e7ac945b4221917ed410add09145"
data-haschildren="true"
tabindex="0"
role="button"
aria-haspopup="true"
aria-expanded="false"
aria-label="Products + Services menu toggle">
Products + Services
</div>
<div class="main-item search-container">
<form>
<div>
<div id="_051E6CD8-A1A3-4CF3-97EA-E3C6386C2F6B"
data-search-interface-id='coveo728075ff'
>
<div class='coveo-search-section'>
<script>
document.addEventListener("CoveoSearchEndpointInitialized", function() {
var searchboxElement = document.getElementById("_185F67BF-1D73-4AEE-9564-FDE5B0EB24D9");
searchboxElement.addEventListener("CoveoComponentInitialized", function() {
CoveoForSitecore.initSearchboxIfStandalone(searchboxElement, "/searchresults");
});
})
</script> <div id="_185F67BF-1D73-4AEE-9564-FDE5B0EB24D9_container" class="coveo-for-sitecore-search-box-container"
data-prebind-maximum-age='currentMaximumAge'
data-pipeline='API Site Search'
>
<div id="_185F67BF-1D73-4AEE-9564-FDE5B0EB24D9"
class="CoveoSearchbox"
data-enable-lowercase-operators='true'
data-enable-omnibox='true'
data-enable-partial-match='true'
data-enable-query-suggest-addon='true'
data-partial-match-keywords='5'
data-partial-match-threshold='75%'
data-prebind-maximum-age='currentMaximumAge'
data-pipeline='API Site Search'
data-placeholder='Find Something'
data-query-suggest-character-threshold='0'
data-clear-filters-on-new-query='true'
>
<script type="text/javascript">
document.addEventListener("CoveoSearchEndpointInitialized", function() {
var componentId = "_185F67BF-1D73-4AEE-9564-FDE5B0EB24D9";
var componentElement = document.getElementById(componentId);
function showError(error) {
console.error(error);
}
function areCoveoResourcesIncluded() {
return typeof (Coveo) !== "undefined";
}
if (areCoveoResourcesIncluded()) {
var event = document.createEvent("CustomEvent");
event.initEvent("CoveoComponentInitialized", false, true);
setTimeout(function() {
componentElement.dispatchEvent(event);
}, 0);
} else {
componentElement.classList.add("invalid");
showError("The Coveo Resources component must be included in this page.");
}
});
</script>
<div class="CoveoForSitecoreBindWithUserContext"></div>
<div class="CoveoForSitecoreExpressions"></div>
<div class="CoveoForSitecoreConfigureSearchHub" data-sc-search-hub="searchresults"></div>
</div>
</div>
</div>
</div>
</div>
<script type="text/javascript">
document.addEventListener("CoveoSearchEndpointInitialized", function() {
var externalComponentsSection = document.getElementById("_051E6CD8-A1A3-4CF3-97EA-E3C6386C2F6B");
CoveoForSitecore.initExternalComponentsSection(externalComponentsSection);
});
</script>
</form>
</div>
</div>
<div class="dropdown">
<div class="dropdown-panel" for="parent-606c005fb35d407da123d1b7f735cfc6">
<div class="dropdown-desc">
<p class="h5">Issues</p>
<p>Learn how we engage with policy makers to ensure safe, reliable, and affordable energy for the future as demand continues to grow.</p>
</div>
<div class="dropdown-items">
<a href="/news-policy-and-issues/access" >Access &amp; Production</a>
<a href="/news-policy-and-issues/trade-global-markets" >Trade &amp; Global Markets</a>
<a href="/news-policy-and-issues/fuels-refining" >Fuels &amp; Refining</a>
<a href="/news-policy-and-issues/infrastructure-permitting" >Infrastructure &amp; Permitting</a>
<a href="/news-policy-and-issues/tax" >Tax Policy</a>
<a href="/news-policy-and-issues/climate" >Climate Policy</a>
<a href="/news-policy-and-issues/safety" >Industry Safety</a>
</div>
</div>
<div class="dropdown-panel" for="parent-5f629feb57834e2691684a3f5d4b24e6">
<div class="dropdown-desc">
<p class="h5">Energy Insights</p>
<p>API&#39;s Energy Insights Hub provides updated statistics, data visualizations, timely analysis, and in-depth reports on all aspects of the oil and natural gas industry.</p>
</div>
<div class="dropdown-items">
<a href="/energy-insights/charts-analysis" >Charts &amp; Analysis</a>
<a href="/energy-insights/industry-explained" >Industry Explained </a>
<a href="/energy-insights/studies" >Studies</a>
<a href="/energy-insights/statistics" >Statistics</a>
</div>
</div>
<div class="dropdown-panel" for="parent-c374e7ac945b4221917ed410add09145">
<div class="dropdown-desc">
<p class="h5">Global Industry Services</p>
<p>APIs Global Industry Services drives safety and efficiency within the oil and gas industry through standards, certifications, assessments, training and more.</p>
<p class="add-on">
<a href="/products-and-services/get-a-quote" >Request a Quote</a>
</p>
<p class="add-on">
<a href="/products-and-services/certifications-directories" >Certifications Directories</a>
</p>
<p class="add-on">
<a href="/products-and-services/worldwide-representatives" >Worldwide Representatives</a>
</p>
</div>
<div class="dropdown-items">
<a href="/products-and-services/standards" >Standards</a>
<a href="/products-and-services/site-safety" >Process Safety Site Assessment Program</a>
<a href="/products-and-services/pipeline-sms-assessment-program" >Pipeline SMS Assessment Program</a>
<a href="/products-and-services/contractor-safety" >Pipeline SMS Contractor Assessment Program</a>
<a href="/products-and-services/offshore-safety" >Center for Offshore Safety (COS)</a>
<a href="/products-and-services/individual-certification-programs" >Individual Certification Programs (ICP)</a>
<a href="/products-and-services/training" >Training</a>
<a href="/products-and-services/api-monogram-and-apiqr" >API Monogram and APIQR</a>
<a href="/products-and-services/api-monogram-and-apiqr#tab-repair-and-remanufacture">API Repair and Remanufacture Program</a>
<a href="/products-and-services/witnessing-programs" >19B Perforator Program</a>
<a href="/products-and-services/engine-oil" >Engine Oil (EOLCS)</a>
<a href="/products-and-services/diesel-exhaust-fluid" >Diesel Exhaust Fluid (DEF)</a>
</div>
</div>
</div>
</div>
</div>
</header>
<!-- ===== Page content wrapper from redesign ===== -->
<div class="page-wrapper">
<div class="large-banner" style="background-image:url('/-/media/APIWebsite/Banners/Large/gis-lg-v2.jpg');">
<div class="title">
<h1>Standards News Highlights</h1>
</div>
</div>
<div class="page-wrapper-inner">
<div class="breadcrumbs">
<span class="breadcrumbs-parent"><a href="/products-and-services">Products + Services</a></span>
/
<span class="breadcrumbs-child">
<a href="/products-and-services/standards">Standards</a>
</span>
/
<span class="breadcrumbs-child">Standards News Highlights</span>
</div>
<div class="two-columns">
<div class="left">
<div class="pagination-wrapper">
<div class="row">
<div class="col-lg-8">
<div class="pagination-container"><ul class="pagination"><li class="page-item active"><a>1</a></li><li class="page-item"><a href="/products-and-services/standards/important-standards-announcements?page=2&amp;pageSize=10">2</a></li><li class="page-item"><a href="/products-and-services/standards/important-standards-announcements?page=3&amp;pageSize=10">3</a></li><li class="page-item"><a href="/products-and-services/standards/important-standards-announcements?page=4&amp;pageSize=10">4</a></li><li class="page-item"><a href="/products-and-services/standards/important-standards-announcements?page=5&amp;pageSize=10">5</a></li><li class="page-item"><a href="/products-and-services/standards/important-standards-announcements?page=6&amp;pageSize=10">6</a></li><li class="page-item"><a href="/products-and-services/standards/important-standards-announcements?page=7&amp;pageSize=10">7</a></li><li class="page-item"><a href="/products-and-services/standards/important-standards-announcements?page=8&amp;pageSize=10">8</a></li><li class="page-item"><a href="/products-and-services/standards/important-standards-announcements?page=9&amp;pageSize=10">9</a></li><li class="page-item"><a href="/products-and-services/standards/important-standards-announcements?page=10&amp;pageSize=10">10</a></li><li class="page-item disabled PagedList-ellipses"><a>&#8230;</a></li><li class="page-item PagedList-skipToNext"><a href="/products-and-services/standards/important-standards-announcements?page=2&amp;pageSize=10" rel="next">»</a></li><li class="page-item PagedList-skipToLast"><a href="/products-and-services/standards/important-standards-announcements?page=12&amp;pageSize=10">»»</a></li></ul></div>
</div>
<div class="col-lg-4">
<form action="/products-and-services/standards/important-standards-announcements" method="post"> <div class="form-group">
<select class="form-control" id="pageSize" name="pageSize" onchange="this.form.submit();"><option value="0">Show All</option>
<option value="50">Show 50</option>
<option value="20">Show 20</option>
<option selected="selected" value="10">Show 10</option>
</select>
</div>
</form> </div>
</div>
</div>
<div class="media-item">
<h3><a href="/products-and-services/standards/important-standards-announcements/api-announces-47th-edition-of-foundational-line-pipe-standard-5l">API Announces 47th Edition of Foundational Line Pipe Standard</a></h3>
<p style="margin-bottom: 0in; line-height: normal;">WASHINGTON, June 2, 2026 &mdash; The American Petroleum Institute (API) today announced the publication of the <a href="https://nam04.safelinks.protection.outlook.com/?url=https%3A%2F%2Fwww.apiwebstore.org%2Fstandards%2F5L%3Futm_campaign%3D43765133-Standards-5L%26utm_source%3Dpr&amp;data=05%7C02%7CMaxwellC%40api.org%7Cb4ba54a657b049ff58c808dec0a2bad7%7C2df2418fe75f46f0898d65f4eeecb14b%7C0%7C0%7C639160002307544852%7CUnknown%7CTWFpbGZsb3d8eyJFbXB0eU1hcGkiOnRydWUsIlYiOiIwLjAuMDAwMCIsIlAiOiJXaW4zMiIsIkFOIjoiTWFpbCIsIldUIjoyfQ%3D%3D%7C0%7C%7C%7C&amp;sdata=qvuVR2PsJQx5M9oShqTN%2BBc%2FXQQiE9NdPwNm6MZ8VoE%3D&amp;reserved=0" data-auth="NotApplicable" originalsrc="https://www.apiwebstore.org/standards/5L?utm_campaign=43765133-Standards-5L&amp;utm_source=pr" data-outlook-id="e5cfad5e-c8c0-439b-9809-73c2341274b0" data-linkindex="0" title="Original URL: https://www.apiwebstore.org/standards/5L?utm_campaign=43765133-Standards-5L&amp;utm_source=pr. Click or tap if you trust this link." style="color: blue; margin: 0px; padding: 0px; border: 0px; line-height: inherit;"><span style="text-decoration: underline;">47th edition of API Specification 5L (API Spec 5L), </span><em><span style="text-decoration: underline;">Line Pipe</span></em></a>. Originally published in 1924 as API&rsquo;s first standard, API Spec 5L has supported the safe and reliable manufacture of steel line pipe used to transport oil and gas for more than a century. The 47th edition includes important new requirements across more than 15 topic areas, including high-frequency weld (HFW) pipe quality and pipe used in CO2 transport.</p>
<p><a class="more" href="/products-and-services/standards/important-standards-announcements/api-announces-47th-edition-of-foundational-line-pipe-standard-5l">More »</a></p>
</div>
<div class="media-item">
<h3><a href="/products-and-services/standards/important-standards-announcements/api-spec-5l-47th">API Specification 5L, Line Pipe</a></h3>
<p>API has published the 47th edition of American Petroleum Institute <a href="https://www.apiwebstore.org/standards/5L">Specification 5L (API Spec 5L), Line Pipe</a>. Originally introduced as API&rsquo;s first standard in 1924, the updated edition includes new requirements across more than 15 topic areas to support the safe and reliable manufacture of steel line pipe used in energy transportation, including CO2 transport.&nbsp;&nbsp;</p>
<p><a class="more" href="/products-and-services/standards/important-standards-announcements/api-spec-5l-47th">More »</a></p>
</div>
<div class="media-item">
<h3><a href="/products-and-services/standards/important-standards-announcements/api-rp-1192">API Recommended Practice 1192 (RP 1192), Transportation of Carbon Dioxide by Pipeline</a></h3>
<p>API has published<span style="line-height: 18.4px;"><a href="https://www.apiwebstore.org/standards/1192?utm_campaign=32314451-rp-1192&amp;utm_source=pub-announcement">&nbsp;Recommended Practice 1192 (RP 1192),&nbsp;<em>Transportation of Carbon Dioxide by Pipeline<strong></strong></em></a><strong><em></em></strong></span><span style="line-height: 18.4px;">.This first edition standard provides performance requirements for the safe and reliable transport of carbon dioxide (CO<sub>2</sub>) by pipeline. It also addresses the design, construction, operation, and management of CO<sub>2</sub>&nbsp;</span><span style="line-height: 18.4px;">pipelines</span></p>
<p><a class="more" href="/products-and-services/standards/important-standards-announcements/api-rp-1192">More »</a></p>
</div>
<div class="media-item">
<h3><a href="/products-and-services/standards/important-standards-announcements/api-strengthens-requirements-for-steel-casing-and-tubing">API Strengthens Requirements for Steel Casing and Tubing</a></h3>
<p style="margin-bottom: 0in; line-height: normal;"><span>WASHINGTON, May 5, 2025 &mdash; The American Petroleum Institute (API) is pleased to announce the publication of an&nbsp;</span><a href="https://www.apiwebstore.org/standards/5CT?utm_campaign=Spec%205ct&amp;utm_source=standardshighlights&amp;utm_medium=PubNotice&amp;__hstc=23321061.e37da81b94fb192a0eca1fd2b60ae651.1745351530524.1745592885893.1745605800922.3&amp;__hssc=23321061.3.1745605800922&amp;__hsfp=509228229">Addendum to the 11th edition of the API 5CT, Casing and Tubing</a><span>. The update strengthens the requirements for the manufacture of steel casing and tubing used in oil and gas drilling and production operations, enhancing safety, environmental protection and operational integrity.</span></p>
<p><a class="more" href="/products-and-services/standards/important-standards-announcements/api-strengthens-requirements-for-steel-casing-and-tubing">More »</a></p>
</div>
<div class="media-item">
<h3><a href="/products-and-services/standards/important-standards-announcements/addendum-1-rp-1183">Addendum to API RP 1183 for Improved Dent Screening</a></h3>
<p><span style="color: black;">In December 2020, American Petroleum Institute (API) published Recommended Practice 1183, First<sup> </sup>Edition (RP 1183), <em>Assessment and Management of Dents in Pipelines</em>. Since being issued, RP 1183 has been applied by pipeline operating companies and engineering consultants providing services to the energy pipeline industry to evaluate dents and deformations on pipeline systems and to support mitigation and repair decisions. RP 1183 includes various screening tools to estimate the remaining fatigue life of a dent in a pipeline.&nbsp; </span></p>
<p><a class="more" href="/products-and-services/standards/important-standards-announcements/addendum-1-rp-1183">More »</a></p>
</div>
<div class="media-item">
<h3><a href="/products-and-services/standards/important-standards-announcements/2025-international-standards-report-announcement">New API Report Highlights Broader Global Adoption of API Standards </a></h3>
<p>February 4, 2025 &ndash; The American Petroleum Institute (API)&nbsp;today&nbsp;released a new report,&nbsp;<em><a href="/-/media/APIWebsite/products-and-services/2025_Intnl-Usage_Report_web-final.pdf"><em>202</em><em>5&nbsp;API Standards</em><em></em><em>International Usage Report</em></a></em>,&nbsp;detailing the&nbsp;growing&nbsp;international&nbsp;influence&nbsp;of API&nbsp;standards.&nbsp;The&nbsp;report&nbsp;identifies&nbsp;where&nbsp;governments and standards&nbsp;bodies reference&nbsp;API standards in policies, national and international standards, and technical&nbsp;regulations,&nbsp;highlighting&nbsp;the paramount role of API standards in&nbsp;advancing safety, sustainability, and efficiency&nbsp;across&nbsp;the&nbsp;global&nbsp;natural gas and oil industry.</p>
<p><a class="more" href="/products-and-services/standards/important-standards-announcements/2025-international-standards-report-announcement">More »</a></p>
</div>
<div class="media-item">
<h3><a href="/products-and-services/standards/important-standards-announcements/20s-3d-printing-update">API Enhances 3D Printing Guidelines with Updated Additive Manufacturing Standard</a></h3>
<p style="margin-bottom: 0in; line-height: normal;"><span style="color: black;">The American Petroleum Institute (API) is pleased to announce the release of the second edition of API Standard 20S, <em>Qualification of Metal Additive Manufacturing Processes and Components Production Control for Use in the Petroleum and Natural Gas Industries</em>. This update strengthens the industry&rsquo;s ability to effectively deploy additive manufacturing (AM), or 3D printing, improving efficiency, supply chain resilience and sustainability across oil and natural gas operations.</span></p>
<p><a class="more" href="/products-and-services/standards/important-standards-announcements/20s-3d-printing-update">More »</a></p>
</div>
<div class="media-item">
<h3><a href="/products-and-services/standards/important-standards-announcements/addendum-1-to-api-rp-2001-10th-ed">Addendum 1 to API RP 2001, 10th Edition: Fire Protection in Refineries</a></h3>
<p>API has published Addendum 1 to API Recommended Practice 2001, 10th Edition - "Fire Protection in Refineries."<br />
<br />
This addendum strengthens existing fire safety measures by introducing new protocols for pre-planning and incident response in refineries.</p>
<p><a class="more" href="/products-and-services/standards/important-standards-announcements/addendum-1-to-api-rp-2001-10th-ed">More »</a></p>
</div>
<div class="media-item">
<h3><a href="/products-and-services/standards/important-standards-announcements/api-5ct-casing-and-tubing-addendum-1-11th">Addendum to the 11th edition of the API 5CT, Casing and Tubing</a></h3>
<p><span style="color: black;">The American Petroleum Institute (API) is pleased to announce the publication of an Addendum to the 11th edition of the API 5CT, <em>Casing and Tubing</em>. The update strengthens the requirements for the manufacture of steel casing and tubing used in oil and gas drilling and production operations, enhancing safety, environmental protection and operational integrity.</span></p>
<p><a class="more" href="/products-and-services/standards/important-standards-announcements/api-5ct-casing-and-tubing-addendum-1-11th">More »</a></p>
</div>
<div class="media-item">
<h3><a href="/products-and-services/standards/important-standards-announcements/api-updates-fire-protection-standard-for-refineries">API Updates Fire Protection Standard for Refineries</a></h3>
<p>October 24, 2024 &ndash; The American Petroleum Institute (API) today announced the publication of Addendum 1 to API RP 2001, 10th Edition, &ldquo;Fire Protection in Refineries.&rdquo; This addendum strengthens existing fire safety measures by introducing new protocols for pre-planning and incident response in refineries.</p>
<br class="t-last-br" />
<p><a class="more" href="/products-and-services/standards/important-standards-announcements/api-updates-fire-protection-standard-for-refineries">More »</a></p>
</div>
<div class="pagination-wrapper">
<div class="row">
<div class="col-lg-8">
<div class="pagination-container"><ul class="pagination"><li class="page-item active"><a>1</a></li><li class="page-item"><a href="/products-and-services/standards/important-standards-announcements?page=2&amp;pageSize=10">2</a></li><li class="page-item"><a href="/products-and-services/standards/important-standards-announcements?page=3&amp;pageSize=10">3</a></li><li class="page-item"><a href="/products-and-services/standards/important-standards-announcements?page=4&amp;pageSize=10">4</a></li><li class="page-item"><a href="/products-and-services/standards/important-standards-announcements?page=5&amp;pageSize=10">5</a></li><li class="page-item"><a href="/products-and-services/standards/important-standards-announcements?page=6&amp;pageSize=10">6</a></li><li class="page-item"><a href="/products-and-services/standards/important-standards-announcements?page=7&amp;pageSize=10">7</a></li><li class="page-item"><a href="/products-and-services/standards/important-standards-announcements?page=8&amp;pageSize=10">8</a></li><li class="page-item"><a href="/products-and-services/standards/important-standards-announcements?page=9&amp;pageSize=10">9</a></li><li class="page-item"><a href="/products-and-services/standards/important-standards-announcements?page=10&amp;pageSize=10">10</a></li><li class="page-item disabled PagedList-ellipses"><a>&#8230;</a></li><li class="page-item PagedList-skipToNext"><a href="/products-and-services/standards/important-standards-announcements?page=2&amp;pageSize=10" rel="next">»</a></li><li class="page-item PagedList-skipToLast"><a href="/products-and-services/standards/important-standards-announcements?page=12&amp;pageSize=10">»»</a></li></ul></div>
</div>
<div class="col-lg-4">
<form action="/products-and-services/standards/important-standards-announcements" method="post"> <div class="form-group">
<select class="form-control" id="pageSize" name="pageSize" onchange="this.form.submit();"><option value="0">Show All</option>
<option value="50">Show 50</option>
<option value="20">Show 20</option>
<option selected="selected" value="10">Show 10</option>
</select>
</div>
</form> </div>
</div>
</div>
</div>
<div class="right">
<div class="sidebar">
<ul>
<li>
<a href="/products-and-services/standards">Overview</a>
</li>
<li>
<a class="active" href="/products-and-services/standards/important-standards-announcements">Standards News Highlights</a>
</li>
<li>
<a href="/products-and-services/standards/purchase">Purchase</a>
</li>
<li>
<a href="/products-and-services/standards/committees">Committees</a>
</li>
<li>
<a href="/products-and-services/standards/global-standards">Global Standards</a>
</li>
<li>
<a href="/products-and-services/standards/get-involved">Get Involved</a>
</li>
<li>
<a href="/products-and-services/standards/standards-plan">Standards Plan</a>
</li>
<li>
<a href="/products-and-services/standards/standards-inquiries">Requests for Interpretation</a>
</li>
<li>
<a href="/products-and-services/standards/rights-and-usage-policy">Copyright Information</a>
</li>
</ul>
</div>
</div>
</div>
</div>
</div>
<link rel="stylesheet" href="/library/APIWeb/css/subpage.css" />
<!-- ===== Footer placeholder (Footer.cshtml renders here) ===== -->
<!--BEGIN COVEO NOINDEX-->
<footer>
<div class="related-products">
<div class="wrap">
<div class="left">
<div class="hubspot-form">
<span>Sign-Up for Updates</span>
<script charset="utf-8" type="text/javascript" src="//js.hsforms.net/forms/embed/v2.js"></script>
<script>
hbspt.forms.create({
portalId: "20801443",
formId: "3fbd74a7-aee1-41a5-9832-d92fee519bad",
region: "na1"
});
</script>
</div>
</div>
<div class="right">
<h4>Certification Directories</h4>
<p>Free directories listing all of the participants in our certification-related programs.</p>
<a href="https://www.api.org/products-and-services/certifications-directories" class="btn-blue">View All</a>
<hr />
<h4>Request a Quote</h4>
<p>Request a quotation for programs like API Monogram, APIQR, API Standards Subscription, PSSAP®, and more.</p>
<a href="https://www.api.org/products-and-services/get-a-quote" class="btn-blue">Get a Quote</a>
</div>
</div>
</div>
<div class="footer-wrap">
<div class="top-footer">
<div class="left-footer">
<div class="logo-container">
<img src="/-/media/APIWebsite/Thumbs/api-logo-stacked.png" alt="API" />
</div>
<div class="footer-nav">
<a href="/about">About</a>
<a href="/about/careers">Careers</a>
<a href="/contact">Contact</a>
</div>
</div>
<div class="right-footer">
<h4>Stay Connected</h4>
<h5>API Energy</h5>
<ul class="social-nav">
<li>
<a class="btn-social" href="https://www.facebook.com/TheAmericanPetroleumInstitute/">
<i class="fab fa-facebook-f" aria-hidden="true"></i>
<span class="visually-hidden">Follow us on Facebook</span>
</a>
</li>
<li>
<a class="btn-social" href="https://www.instagram.com/americanpetroleum/">
<i class="fab fa-instagram" aria-hidden="true"></i>
<span class="visually-hidden">Follow us on Instagram</span>
</a>
</li>
<li>
<a class="btn-social" href="https://twitter.com/APIenergy">
<i class="fab fa-x-twitter" aria-hidden="true"></i>
<span class="visually-hidden">Follow us on X</span>
</a>
</li>
<li>
<a class="btn-social" href="https://www.linkedin.com/company/americanpetroleuminstitute/">
<i class="fab fa-linkedin-in" aria-hidden="true"></i>
<span class="visually-hidden">
Follow us on LinkedIn
</span>
</a>
</li>
<li>
<a class="btn-social" href="https://www.youtube.com/energy">
<i class="fab fa-youtube" aria-hidden="true"></i>
<span class="visually-hidden">Follow us on YouTube</span>
</a>
</li>
</ul>
<h5>API Global</h5>
<ul class="social-nav">
<li><a class="btn-social" href="https://www.facebook.com/OfficialAPIGlobal"><i class="fab fa-facebook-f" aria-hidden="true"></i></a></li>
<li><a class="btn-social" href="https://twitter.com/apiglobal"><i class="fab fa-x-twitter" aria-hidden="true"></i></a></li>
<li><a class="btn-social" href="https://www.linkedin.com/company/api-global-official-/"><i class="fab fa-linkedin-in" aria-hidden="true"></i></a></li>
<li><a class="btn-social" href="https://www.youtube.com/channel/UCitegkCxi2r-GGJRabGpRKg"><i class="fab fa-youtube" aria-hidden="true"></i></a></li>
</ul>
</div>
</div>
<div class="bottom-footer">
<p class="small">
&copy; Copyright 2026 - API. All Rights Reserved. | <a href="/terms-and-conditions">Terms &amp; Conditions</a> | <a href="/privacy">Privacy</a></p>
<div class="group">
<p><a href="/privacy">Privacy &amp; Cookies Notice</a></p>
<p><a href="/terms-and-conditions">Terms &amp; Conditions</a></p>
</div>
</div>
</div>
<script src="/library/APIWeb/js/menu.js"></script>
<script src="/library/APIWeb/js/mobile-menu.js"></script>
<script src="https://kit.fontawesome.com/a2be981ec3.js" crossorigin="anonymous"></script>
</footer>
<!--END COVEO NOINDEX-->
<!--BEGIN COVEO NOINDEX-->
<!-- Back to Top Button -->
<a href="#" id="backToTop" class="back-to-top" aria-label="Back to Top"></a>
<script>
// Picture element HTML5 shiv for older browsers
document.createElement("picture");
</script>
<!-- Coveo Search Resources -->
<link rel="stylesheet" href="https://static.cloud.coveo.com/searchui/v2.10116/css/CoveoFullSearch.css" />
<link rel="stylesheet" href="https://static.cloud.coveo.com/coveoforsitecore/ui/v0.64.7/css/CoveoForSitecore.css" />
<script class="coveo-script" type="text/javascript" src='https://static.cloud.coveo.com/searchui/v2.10116/js/CoveoJsSearch.Lazy.min.js'></script>
<script class="coveo-for-sitecore-script" type="text/javascript" src='https://static.cloud.coveo.com/coveoforsitecore/ui/v0.64.7/js/CoveoForSitecore.Lazy.min.js'></script>
<script type="text/javascript" src="https://static.cloud.coveo.com/searchui/v2.10116/js/cultures/en.js"></script>
<div>
<!-- Resources -->
<div id="SearchResourcesProperties_a3a158ca24074042a057852064d1ead7"
class="CoveoForSitecoreContext"
data-sc-should-have-analytics-component='true'
data-sc-analytics-enabled='true'
data-sc-current-language='en'
data-prebind-sc-language-field-name='fieldTranslator'
data-sc-language-field-name='_language'
data-sc-labels='{&quot;Created&quot; : &quot;Created&quot; , &quot;Created By&quot; : &quot;Created by&quot; , &quot;Creation Time&quot; : &quot;Creation time.&quot; , &quot;Language&quot; : &quot;Language&quot; , &quot;Last Time Modified&quot; : &quot;Last time modified.&quot; , &quot;Template&quot; : &quot;Template&quot; , &quot;Uniform resource identifier&quot; : &quot;URI&quot; , &quot;Updated By&quot; : &quot;Updated by&quot; , &quot;If the problem persists contact the administrator.&quot; : &quot;If the problem persists contact the administrator.&quot; , &quot;Search is currently unavailable&quot; : &quot;Oops! Something went wrong on the server.&quot; , &quot;Ascending&quot; : &quot;Ascending&quot; , &quot;Descending&quot; : &quot;Descending&quot;}'
data-sc-maximum-age='900000'
data-sc-page-name='important-standards-announcements'
data-sc-page-name-full-path='/sitecore/content/Sites/API2/Home/products-and-services/standards/important-standards-announcements'
data-sc-index-source-name='Coveo_web_index - Prod104'
data-sc-is-in-experience-editor='false'
data-sc-is-user-anonymous='true'
data-sc-item-uri='sitecore://web/{1BA7D892-F03B-45B8-90A2-9F074C53FA6A}?lang=en&amp;ver=1'
data-sc-item-id='1ba7d892-f03b-45b8-90a2-9f074c53fa6a'
data-prebind-sc-latest-version-field-name='fieldTranslator'
data-sc-latest-version-field-name='_latestversion'
data-sc-rest-endpoint-uri='/coveo/rest'
data-sc-analytics-endpoint-uri='/coveo/rest/ua'
data-sc-site-name='api2'
data-sc-field-prefix='f'
data-sc-field-suffix='7509'
data-sc-prefer-source-specific-fields='false'
data-sc-external-fields='[{&quot;fieldName&quot;:&quot;permanentid&quot;,&quot;shouldEscape&quot;:false}]'
data-sc-source-specific-fields='[{&quot;fieldName&quot;:&quot;attachmentparentid&quot;},{&quot;fieldName&quot;:&quot;author&quot;},{&quot;fieldName&quot;:&quot;clickableuri&quot;},{&quot;fieldName&quot;:&quot;collection&quot;},{&quot;fieldName&quot;:&quot;concepts&quot;},{&quot;fieldName&quot;:&quot;date&quot;},{&quot;fieldName&quot;:&quot;filetype&quot;},{&quot;fieldName&quot;:&quot;indexeddate&quot;},{&quot;fieldName&quot;:&quot;isattachment&quot;},{&quot;fieldName&quot;:&quot;language&quot;},{&quot;fieldName&quot;:&quot;printableuri&quot;},{&quot;fieldName&quot;:&quot;rowid&quot;},{&quot;fieldName&quot;:&quot;size&quot;},{&quot;fieldName&quot;:&quot;source&quot;},{&quot;fieldName&quot;:&quot;title&quot;},{&quot;fieldName&quot;:&quot;topparent&quot;},{&quot;fieldName&quot;:&quot;topparentid&quot;},{&quot;fieldName&quot;:&quot;transactionid&quot;},{&quot;fieldName&quot;:&quot;uri&quot;},{&quot;fieldName&quot;:&quot;urihash&quot;}]'
>
</div>
<script type="text/javascript">
var endpointConfiguration = {
itemUri: "sitecore://web/{1BA7D892-F03B-45B8-90A2-9F074C53FA6A}?lang=en&amp;ver=1",
siteName: "api2",
restEndpointUri: "/coveo/rest"
};
if (typeof (CoveoForSitecore) !== "undefined") {
CoveoForSitecore.SearchEndpoint.configureSitecoreEndpoint(endpointConfiguration);
CoveoForSitecore.version = "5.0.1368.1";
var context = document.getElementById("SearchResourcesProperties_a3a158ca24074042a057852064d1ead7");
if (!!context) {
CoveoForSitecore.Context.configureContext(context);
}
}
</script>
</div>
<!--END COVEO NOINDEX-->
<!-- news-policy-and-issues/blog -->
<script>
function formatDateElementsWithCoveo() {
Coveo.$$(document).on('newResultDisplayed', function (event, args) {
var dateElements = args.item.getElementsByClassName('posted-date');
Array.prototype.forEach.call(dateElements, function (elem) {
var timestamp = parseInt(elem.textContent.replace('Posted: ', '').trim(), 10);
var date = new Date(timestamp);
var options = { year: 'numeric', month: 'long', day: 'numeric' };
var formattedDate = date.toLocaleDateString("en-US", options);
if (isNaN(date.getTime())) {
elem.textContent = '';
} else {
elem.textContent = 'Posted: ' + formattedDate;
}
});
});
}
function waitForCoveoReady() {
if (typeof Coveo !== 'undefined' && Coveo.$) {
formatDateElementsWithCoveo();
} else {
setTimeout(waitForCoveoReady, 100);
}
}
document.addEventListener('DOMContentLoaded', waitForCoveoReady);
</script>
<!--END news-policy-and-issues/blog-->
</body>
</html>
+196
@@ -0,0 +1,196 @@
<?xml version='1.0' encoding='UTF-8'?>
<rss xmlns:arxiv="http://arxiv.org/schemas/atom" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:content="http://purl.org/rss/1.0/modules/content/" version="2.0">
<channel>
<title>physics.app-ph updates on arXiv.org</title>
<link>http://rss.arxiv.org/rss/physics.app-ph</link>
<description>physics.app-ph updates on the arXiv.org e-print archive.</description>
<atom:link href="http://rss.arxiv.org/rss/physics.app-ph" rel="self" type="application/rss+xml"/>
<docs>http://www.rssboard.org/rss-specification</docs>
<language>en-us</language>
<lastBuildDate>Wed, 10 Jun 2026 04:00:28 +0000</lastBuildDate>
<managingEditor>rss-help@arxiv.org</managingEditor>
<pubDate>Wed, 10 Jun 2026 00:00:00 -0400</pubDate>
<skipDays>
<day>Sunday</day>
<day>Saturday</day>
</skipDays>
<item>
<title>Limits of Trap-assisted Photomultiplication Gain</title>
<link>https://arxiv.org/abs/2606.10236</link>
<description>arXiv:2606.10236v1 Announce Type: new
Abstract: Photodiodes based on trap-assisted current injection can exhibit internal photomultiplication with apparent quantum efficiencies far exceeding unity, raising the question of whether such gain fundamentally enhances detector sensitivity. We employ a minimal analytical framework based on a single gain-active trapped state coupling photogenerated carriers to contact injection. The gain is intrinsically self-limiting: the injection process that amplifies the current simultaneously accelerates relaxation of the gain-enabling state, producing an inherently nonlinear, operating-point-dependent response. The form of this nonlinearity is not universal -- once the trap level is generalized to an energetic distribution and recombination is allowed to be bimolecular, the same mechanism yields superlinear, linear, or strongly sublinear responses. A single chord gain is therefore not a meaningful device descriptor, and chord-gain comparisons across the literature conflate devices in different regimes. Treating trap occupancy and injection as coupled stochastic processes, we show that internal gain introduces a strictly non-negative fluctuation penalty from the dissipative dynamics that sustain the gain state. A local, small-signal detectivity exhibits a finite optimum yet cannot exceed the intrinsic thermodynamic limit of the underlying unity-gain photodiode. Gain is thus equivalent to driven stochastic amplification: it can suppress downstream readout noise, but cannot reduce the fundamental noise floor set by the primary photodetection process.</description>
<guid isPermaLink="false">oai:arXiv.org:2606.10236v1</guid>
<category>physics.app-ph</category>
<pubDate>Wed, 10 Jun 2026 00:00:00 -0400</pubDate>
<arxiv:announce_type>new</arxiv:announce_type>
<dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
<dc:creator>Ardalan Armin</dc:creator>
</item>
<item>
<title>Filamentary Transport and Thermoelectric Effects in Mushroom Phase Change Memory Cells</title>
<link>https://arxiv.org/abs/2606.10262</link>
<description>arXiv:2606.10262v1 Announce Type: new
Abstract: We performed a 2D finite-element electrothermal computational study of thermoelectric effects and filamentary electronic transport in Ge$_2$Sb$_2$Te$_5$ mushroom phase change memory cells during Reset and Set operations, accounting for spatial activation energy variations in amorphous Ge$_2$Sb$_2$Te$_5$ and phase-change dynamics. Reset operations with current going from the top electrode to the narrow 4 nm bottom electrode require $\sim$3x less energy and power, and $\sim$2x lower current to achieve the same Reset resistance, compared to the opposite polarity, due to thermoelectric effects. Filamentary conduction, electrical breakdown, thermal runaway, and local crystallization of amorphous Ge$_2$Sb$_2$Te$_5$ depend on current polarity and thermal boundary conditions, and determine the location, shape, and volume of the programming region, which may be significantly smaller than the semi-cylindrical mushroom region. The programming volume does not scale with contact dimensions larger than 10 nm. Larger contact areas introduce increased device-to-device and cycle-to-cycle variability due to filamentary conduction but are expected to lead to higher reliability and endurance.</description>
<guid isPermaLink="false">oai:arXiv.org:2606.10262v1</guid>
<category>physics.app-ph</category>
<category>cond-mat.mtrl-sci</category>
<pubDate>Wed, 10 Jun 2026 00:00:00 -0400</pubDate>
<arxiv:announce_type>new</arxiv:announce_type>
<dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
<dc:creator>Md Samzid Bin Hafiz, Helena Silva, Ali Gokirmak</dc:creator>
</item>
<item>
<title>Fast-Neutron Irradiation Effect in Heteroepitaxial $\beta$-Ga$_2$O$_3$ Schottky Diodes Fabricated on Low-Cost Sapphire Substrates</title>
<link>https://arxiv.org/abs/2606.10269</link>
<description>arXiv:2606.10269v1 Announce Type: new
Abstract: In this work, we investigate the response of Ni/$\beta$-Ga$_2$O$_3$ Schottky barrier diodes fabricated on c-plane sapphire to fast-neutron irradiation up to a fluence of $1\times10^{15}$ n$\cdot$cm$^{-2}$. The LPCVD-grown heteroepitaxial structure consists of an unintentionally doped buffer, an n$^{+}$ contact layer, and an n-type drift layer, with mesa isolation realized by plasma-free Ga-assisted LPCVD etching. Prior to irradiation, the devices exhibit a turn-on voltage of 1.20 V, specific on-resistance of 8.43 m$\Omega\cdot$cm$^2$, ideality factor of 1.32, and Schottky barrier height of 1.29 eV. Following irradiation, the devices remain operational, although the forward current decreases, the turn-on voltage increases to 2.40 V, and the barrier height increases to 1.34 eV. Capacitance-voltage measurements reveal a $\sim$50% reduction in net donor concentration, corresponding to a carrier-removal rate of $\sim$105 cm$^{-1}$. Temperature-dependent measurements from 25 to 250 $^\circ$C confirm that thermionic emission remains the dominant transport mechanism and show significant suppression of reverse leakage current after irradiation. The breakdown voltage increases from 101 to 135 V, consistent with neutron-induced donor compensation. TCAD simulations show a more uniform electric-field distribution and reduced field crowding at the Schottky edge after irradiation. These results provide insight into neutron-induced donor compensation in heteroepitaxial $\beta$-Ga$_2$O$_3$ and demonstrate the ability of LPCVD-grown $\beta$-Ga$_2$O$_3$ Schottky diodes on sapphire to maintain stable operation under high-fluence neutron environments relevant to space and nuclear electronics.</description>
<guid isPermaLink="false">oai:arXiv.org:2606.10269v1</guid>
<category>physics.app-ph</category>
<pubDate>Wed, 10 Jun 2026 00:00:00 -0400</pubDate>
<arxiv:announce_type>new</arxiv:announce_type>
<dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
<dc:creator>Saleh Ahmed Khan, Ahmed Ibreljic, Sourav Sarker, Stephen Margiotta, Anhar Bhuiyan</dc:creator>
</item>
<item>
<title>Virtual-Array Operational Modal Analysis of Rolling Tires Using a Single Tire Cavity Accelerometer</title>
<link>https://arxiv.org/abs/2606.10437</link>
<description>arXiv:2606.10437v1 Announce Type: new
Abstract: The dynamics of rolling tires significantly influence the low-frequency (0-500 Hz) structure-borne noise within vehicles. Accurately characterizing these dynamics under realistic operating conditions remains challenging. Current state-of-the-art methods, primarily relying on Laser Doppler Vibrometers (LDV), are complex to implement, time-intensive, and generally limited to smooth tires in laboratory environments due to issues with speckle formation on treaded surfaces. This study introduces an innovative strategy for Operational Modal Analysis (OMA) of a rolling tire using a single wireless Tire Cavity Accelerometer (TCA) together with two optical sensors. The methodology leverages the non-integer ratio between the tire and drum diameters in a test rig to create a virtual sensor array. By utilizing optical sensors to time-stamp the cleat impact (on the drum) precisely and the TCA position (on the tire), the vibration responses from multiple revolutions are clustered according to the TCA's circumferential position at the moment of impact. This effectively synthesizes responses from an array of virtual sensors distributed around the tire circumference using data from a single test run. The clustered signals are conditioned using order tracking to remove periodic components arising from contact patch deformation. Both Frequency Domain Decomposition (FDD) and Covariance-based Stochastic Subspace Identification (SSI-Cov) were employed for modal identification. The SSI-Cov method proved more robust, successfully identifying 11 circumferential modes up to 240 Hz. The proposed approach offers a significantly more efficient, cost-effective method for characterizing rolling tire dynamics, which is readily applicable to treaded tires and adaptable for on-road testing.</description>
<guid isPermaLink="false">oai:arXiv.org:2606.10437v1</guid>
<category>physics.app-ph</category>
<category>physics.data-an</category>
<pubDate>Wed, 10 Jun 2026 00:00:00 -0400</pubDate>
<arxiv:announce_type>new</arxiv:announce_type>
<dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
<dc:creator>Pradosh Pritam Dash, Ricardo Burdisso, Pablo A Tarazaga</dc:creator>
</item>
<item>
<title>Finite-temperature Fe K-edge X-ray absorption simulations reveal local structural dynamics of an iron(II) photosensitizer in solution and the crystalline phase</title>
<link>https://arxiv.org/abs/2606.10221</link>
<description>arXiv:2606.10221v1 Announce Type: cross
Abstract: Interpreting metal K-edge spectra of flexible photosensitizers requires a structural model that separates electronic signatures from thermal motion, solvent disorder, and crystal-packing effects. We combine Fe K-edge X-ray absorption measurements with second-generation Car--Parrinello ab initio molecular dynamics and all-electron Gaussian and augmented-plane-wave simulations for an iron(II) N-heterocyclic carbene photosensitizer in acetonitrile solution and in the crystalline phase. Ensemble-averaged spectra reproduce the main near-edge features in both environments and preserve the experimentally observed similarity of the first Fe coordination shell upon dissolution. Comparison with radial distributions extracted from extended fine-structure measurements validates the Fe--N and Fe--C coordination shells sampled by the trajectories, while element-resolved pair distributions explain why higher-shell experimental contrast is rapidly lost. The same dynamical ensembles reveal a broad out-of-plane distribution of the terpyridine nitrogen atom and a nearly octahedral distribution of the Fe-centered coordination planes. The results show that finite-temperature X-ray absorption simulations can provide a compact structural-dynamics picture of molecular transition metal photosensitizers by linking local spectra, solvent-phase ligand motion, and medium-range structural disorder within one trajectory-based description.</description>
<guid isPermaLink="false">oai:arXiv.org:2606.10221v1</guid>
<category>cond-mat.mtrl-sci</category>
<category>physics.app-ph</category>
<category>physics.chem-ph</category>
<category>physics.comp-ph</category>
<pubDate>Wed, 10 Jun 2026 00:00:00 -0400</pubDate>
<arxiv:announce_type>cross</arxiv:announce_type>
<dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
<dc:creator>Patrick M\"uller, Lorena Fritsch, Matthias Bauer, Thomas D. K\"uhne</dc:creator>
</item>
<item>
<title>Multi-channel Optical Vision Model</title>
<link>https://arxiv.org/abs/2606.10253</link>
<description>arXiv:2606.10253v1 Announce Type: cross
Abstract: Spatial multiplexing is one of the natural strengths of optics, yet in optical neural networks, it is often used mainly as parallel throughput. Here, we show that spatial multiplexing in an optical neural network can be used not only to process multiple inputs in parallel, but also to define a trainable representational coordinate of the model. In three implemented scenarios, parallel-input processing, class-code readout and channel-mixed feature interaction, spatial channels act as independent learners, structured code dimensions, and interacting feature groups. The programmable free-space optical processor is trained through an online physical-forward/surrogate-backward scheme, where measured optical outputs define the forward pass while a differentiable surrogate estimates gradients and is continually fine-tuned during training from newly acquired optical data. We demonstrate these channel roles in image classification and regression tasks using multi-layer architectures with more than one million trainable optical phase parameters. We further implement a hybrid optical-electronic vision-language model, in which the optical neural network provides visual tokens to a digital transformer decoder for controlled image-captioning tasks. These results establish spatially multiplexed optical channels as a programmable feature and readout space for hybrid optical vision models.</description>
<guid isPermaLink="false">oai:arXiv.org:2606.10253v1</guid>
<category>physics.optics</category>
<category>physics.app-ph</category>
<pubDate>Wed, 10 Jun 2026 00:00:00 -0400</pubDate>
<arxiv:announce_type>cross</arxiv:announce_type>
<dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
<dc:creator>Ali Momeni, Guillaume Noetinger, Tim Tuuva, Romain Fleury</dc:creator>
</item>
<item>
<title>Spontaneous translation of charged droplets during evaporation on dry surfaces</title>
<link>https://arxiv.org/abs/2606.10755</link>
<description>arXiv:2606.10755v1 Announce Type: cross
Abstract: Evaporating sessile droplets are usually treated as capillary objects, but droplets generated by routine handling can carry tens to hundreds of picocoulombs of electric charge. Here we combine Faraday-cup charge measurements with optical imaging to determine how such charge evolves as water droplets evaporate on dry polymer substrates. A zero-time protocol shows that a reproducible initial charge is preserved on poly(methylpentene) (PMP), whereas PDMS, SOCAL-coated surfaces, and polystyrene either exchange, dissipate, or inject charge on contact. On PMP, ensemble-resolved measurements reveal two regimes: the charge remains nearly constant during early evaporation and then decreases abruptly once the droplet reaches a small-volume state. This charge collapse coincides with spontaneous lateral translation rather than jetting or breakup. A Rayleigh-normalized analysis, including a spherical-cap stress correction and measured contact-angle retention scale, shows that motion occurs only after evaporation drives the droplet into a high electro-pinning state. High-speed imaging and kinematic analysis support a picture in which the subsequent motion is governed by repeated contact-line depinning and re-pinning: the total distance traveled is strongly affected by dry-surface pinning, whereas the peak translational velocity serves as a more robust indicator of the discharge strength. These results identify a dry-substrate mode of evaporation-driven electrostatic relaxation, distinct from Coulomb fission on lubricated surfaces, in which substrate electrostatic passivity enables charge retention, droplet geometry selects the instability onset, and whole-droplet translation provides the charge-release pathway.</description>
<guid isPermaLink="false">oai:arXiv.org:2606.10755v1</guid>
<category>cond-mat.soft</category>
<category>physics.app-ph</category>
<pubDate>Wed, 10 Jun 2026 00:00:00 -0400</pubDate>
<arxiv:announce_type>cross</arxiv:announce_type>
<dc:rights>http://creativecommons.org/licenses/by-nc-nd/4.0/</dc:rights>
<dc:creator>Riming Xu, Yanbo Li, Jiawen Zhang, Jin Wang, Yikai Li</dc:creator>
</item>
<item>
<title>Programmable Integrated Magnonic Meshes</title>
<link>https://arxiv.org/abs/2605.00290</link>
<description>arXiv:2605.00290v2 Announce Type: replace
Abstract: Integrated circuits are a cornerstone of modern information technology, and analog wave-based architectures could enable fast and efficient processing beyond conventional charge electronics. In magnonics, spin waves provide a highly tunable, compact and energy-efficient medium for on-chip microwave signal transport and processing. However, progress has been limited to isolated elements or short devices, severely limiting the overall functional complexity and scalability. Here we realize the key elements of universal magnonic circuitry, using a single-step direct laser writing process in yttrium iron garnet, and monolithically cascade them in multi-stage programmable devices and networks. Using magneto-optical Kerr effect microscopy, we show efficient spin-wave propagation and preserved phase coherence in waveguide structures for hundreds of wavelengths. In coupled waveguides, we observe complete and periodic power transfer over several coupling lengths, and in phase shifters we achieve arbitrary, tunable phase delays. By cascading these elements, we realize programmable splitters, frequency demultiplexers, and phase-controlled 2x2 routers, where output power and relative phase can be programmed on demand via external fields. Finally, we realize programmable magnonic interferometric meshes for on-chip radio-frequency signal routing, with up to six magnonic inputs and outputs and seven cascaded stages, without the need for intermediate amplification. These direct-write cascaded networks bridge a long-standing gap in magnonic scalability, offering a viable pathway toward integrated, large-scale architectures for both classical and quantum processing.</description>
<guid isPermaLink="false">oai:arXiv.org:2605.00290v2</guid>
<category>physics.app-ph</category>
<category>cond-mat.mtrl-sci</category>
<pubDate>Wed, 10 Jun 2026 00:00:00 -0400</pubDate>
<arxiv:announce_type>replace</arxiv:announce_type>
<dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
<dc:creator>Piero Florio, Matteo Vitali, Valerio Levati, Rasheed M. Ishola, Luca Ciaccarini Mavilla, Nora Lecis, Carsten Dubs, Riccardo Bertacco, Marco Madami, Silvia Tacchi, Daniela Petti, Edoardo Albisetti</dc:creator>
</item>
<item>
<title>Interpretable deep convolutional model for nonlinear multivariate time series in complex systems</title>
<link>https://arxiv.org/abs/2501.04339</link>
<description>arXiv:2501.04339v2 Announce Type: replace-cross
Abstract: We introduce the Deep Convolutional Interpreter for Time Series (DCIts), a deep-learning architecture for nonlinear multivariate time series that provides sample-specific, locally interpretable descriptions of the underlying interaction structure. Unlike standard black-box forecasters, DCIts learns a time- and lag-dependent transition tensor explicitly factorized into two components: a Focuser, which selects relevant source series and time lags via a sparse masking mechanism, and a Modeler, which assigns signed coefficients to these selected interactions. This decomposition yields a local lag-adjacency structure and signed source-lag contributions for every forecast instance, enabling direct inspection of effective connectivity; when higher-order branches are activated, the same framework yields order-resolved elementwise polynomial contributions. Architecturally, DCIts uses a diverse bank of convolutional filters to capture temporal and cross-variable dependencies, which are mapped through a bottleneck network to the transition tensor. On controlled benchmark datasets with a known interaction structure, we demonstrate that DCIts achieves competitive forecasting error relative to a strong interpretable baseline while recovering stable, signed, lag-resolved interaction patterns. The framework thus prioritizes intrinsic interpretability, using forecasting accuracy as a faithfulness constraint rather than the sole objective.</description>
<guid isPermaLink="false">oai:arXiv.org:2501.04339v2</guid>
<category>stat.ML</category>
<category>cs.LG</category>
<category>physics.app-ph</category>
<pubDate>Wed, 10 Jun 2026 00:00:00 -0400</pubDate>
<arxiv:announce_type>replace-cross</arxiv:announce_type>
<dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
<arxiv:DOI>10.1063/5.0325209</arxiv:DOI>
<arxiv:journal_reference>Chaos 36, 063116 (2026)</arxiv:journal_reference>
<dc:creator>Domjan Baric, Davor Horvatic</dc:creator>
</item>
<item>
<title>Probing laser-driven surface and subsurface dynamics via grazing-incidence XFEL scattering and diffraction</title>
<link>https://arxiv.org/abs/2509.12015</link>
<description>arXiv:2509.12015v2 Announce Type: replace-cross
Abstract: We demonstrate a grazing-incidence x-ray platform that simultaneously records time-resolved grazing-incidence small-angle x-ray scattering (GISAXS) and grazing-incidence x-ray diffraction (GID) from a femtosecond laser-irradiated gold film above the melting threshold, with picosecond resolution at an x-ray free-electron laser (XFEL). By tuning the x-ray incidence angle, the probe depth is set to tens of nanometers, enabling depth-selective sensitivity to near-surface dynamics. GISAXS resolves ultrafast changes in surface nanomorphology (correlation length, roughness), while GID quantifies subsurface lattice compression, grain orientation, melting, and recrystallization. The approach overcomes photon-flux limitations of synchrotron grazing-incidence geometries and provides stringent, time-resolved benchmarks for complex theoretical models of ultrafast laser-matter interaction and warm dense matter. Looking ahead, the same depth-selective methodology is well suited to inertial confinement fusion (ICF): it can visualize buried-interface perturbations and interfacial thermal resistance on micron to sub-micron scales that affect instability seeding and burn propagation.</description>
<guid isPermaLink="false">oai:arXiv.org:2509.12015v2</guid>
<category>physics.optics</category>
<category>physics.app-ph</category>
<category>physics.ins-det</category>
<category>physics.plasm-ph</category>
<pubDate>Wed, 10 Jun 2026 00:00:00 -0400</pubDate>
<arxiv:announce_type>replace-cross</arxiv:announce_type>
<dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
<arxiv:DOI>10.1107/S2052252526001727</arxiv:DOI>
<arxiv:journal_reference>IUCrJ Vol.13, Pages 249-259 (2026)</arxiv:journal_reference>
<dc:creator>Lisa Randolph, Özgül Öztürk, Dmitriy Ksenzov, Lingen Huang, Thomas Kluge, S. V. Rahul, Victorien Bouffetier, Carsten Baehtz, Mohammadreza Banjafar, Erik Brambrink, Fabien Brieuc, Byoung Ick Cho, Sebastian Göde, Tobias Held, Hauke Höppner, Gerhard Jakob, Mathias Kläui, Zuzana Konôpková, Changhoo Lee, Gyusang Lee, Mikako Makita, Mikhail Mishchenko, Mianzhen Mo, Pascal D. Ndione, Michael Paulus, Alexander Pelka, Franziska Paschke-Bruehl, Thomas R. Preston, Baerbel Rethfeld, Christian Rödel, Michal Šmíd, Ling Wang, Sebastian T. Weber, Lennart Wollenweber, Jan-Patrick Schwinkendorf, Christian Gutt, Motoaki Nakatsutsumi</dc:creator>
</item>
<item>
<title>Real-space imaging reveals symmetry-selected nonlinear energy routing in a mechanical resonator</title>
<link>https://arxiv.org/abs/2605.01469</link>
<description>arXiv:2605.01469v2 Announce Type: replace-cross
Abstract: Nonlinear energy exchange between vibrational modes underlies phenomena ranging from internal resonance and wave mixing to frequency-comb generation, yet modal interactions are typically inferred from spectra rather than directly observed in space. Here, we image nonlinear modal energy routing in a nearly mirror-symmetric microelectromechanical resonator using phase-locked multi-harmonic stroboscopic interferometry. By reconstructing the spatial eigenmode content of individual harmonics, we show that harmonics generated by a driven mode can be carried by distinct spatial eigenmodes, directly resolving spatial pathways of nonlinear energy transfer. Our measurements further reveal that this modal routing persists away from integer frequency matching: in the off-resonant regime, generated harmonic components are dominated by eigenmodes sharing the driven mode's mirror parity, whereas spectrally closer opposite-parity modes remain strongly suppressed. A nonlinear modal framework based on geometric nonlinearity shows that the relevant cubic coupling coefficients factorize into symmetry-dependent modal-overlap integrals, identifying mirror parity as the selection rule for nonlinear modal interaction. This work identifies spatial symmetry as a design parameter for nonlinear energy routing and provides a route to symmetry-engineered control of energy flow in multimode nonlinear wave systems.</description>
<guid isPermaLink="false">oai:arXiv.org:2605.01469v2</guid>
<category>physics.optics</category>
<category>physics.app-ph</category>
<pubDate>Wed, 10 Jun 2026 00:00:00 -0400</pubDate>
<arxiv:announce_type>replace-cross</arxiv:announce_type>
<dc:rights>http://creativecommons.org/licenses/by-nc-nd/4.0/</dc:rights>
<dc:creator>Ya Zhang, Yuko Terasawa, Qian Liu, Shumpei Takenaka, Hua Li, Yutao Xu, Xueyong Wei, Kazuhiko Hirakawa</dc:creator>
</item>
<item>
<title>Designing single-layer PDMS devices for micron to millimeter-scale deformations</title>
<link>https://arxiv.org/abs/2605.17402</link>
<description>arXiv:2605.17402v2 Announce Type: replace-cross
Abstract: The elasticity of PDMS has played a central role in advancing important microfluidic technologies, ranging from early valves to sophisticated organ-on-a-chip systems. However, most deformable microfluidic devices are based on geometries that require complex multi-layer PDMS architectures and include thin membranes, leading to difficult microfabrication and poor stability. Recently, Jain, Belkadi et al. (Biofabrication 16.3 (2024): 035010) introduced a single-layer PDMS device in which a wide and long microfluidic channel was deformed by pressurizing two adjacent air chambers. While they demonstrated how the channel ceiling deformation can be leveraged to compress biological materials, it remains unknown how the device geometry influences this deformation. Here, a systematic numerical study is performed on 14,336 variants of this device, through which the height of the PDMS layer is identified as the main feature that determines the ceiling deformation. Three modes of channel deformation are identified as the geometry is varied: a U shape with a central minimum, a W shape with two minima and a central maximum, or an inverse U shape with an upward-bulging single maximum. The numerical results are validated in experiments that reproduce the three modes for the predicted geometries and demonstrate vertical ceiling deformations ranging from a few microns to the millimeter scale. The generality of this approach is demonstrated for two example applications: a fully closing single-layer microfluidic valve and an optical lens of controllable anisotropic magnification. This work leverages the rapid prototyping enabled by 3D printing or micro-milling to open new perspectives in microfluidic actuation.</description>
<guid isPermaLink="false">oai:arXiv.org:2605.17402v2</guid>
<category>physics.flu-dyn</category>
<category>physics.app-ph</category>
<pubDate>Wed, 10 Jun 2026 00:00:00 -0400</pubDate>
<arxiv:announce_type>replace-cross</arxiv:announce_type>
<dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
<dc:creator>Leon V. Gebhard, Alexandre S. Avaro, Gabriel Amselem, Charles N. Baroud</dc:creator>
</item>
<item>
<title>Metasurfaces for neutral-atom trapping</title>
<link>https://arxiv.org/abs/2605.30498</link>
<description>arXiv:2605.30498v2 Announce Type: replace-cross
Abstract: Trapped neutral atoms are one of the leading platforms for quantum information technologies, in particular for quantum computing, but scaling them to array sizes needed for utility-scale quantum computing is a major engineering challenge. Here we review optical metasurfaces as an enabling technology that provides fine control over the phase, amplitude, and polarization of light, with pixel counts far exceeding what is available with spatial light modulators (SLMs) and other active devices. The large pixel counts have recently led to demonstrations of arrays of optical tweezers with hundreds of thousands of sites and arrays of optical bottle-beams with complex three-dimensional trapping profiles. The flexibility and scalability of optical metasurfaces provides a route towards miniaturized, integrated, and highly scalable atomic experiments and instruments.</description>
<guid isPermaLink="false">oai:arXiv.org:2605.30498v2</guid>
<category>physics.optics</category>
<category>physics.app-ph</category>
<category>physics.atom-ph</category>
<category>quant-ph</category>
<pubDate>Wed, 10 Jun 2026 00:00:00 -0400</pubDate>
<arxiv:announce_type>replace-cross</arxiv:announce_type>
<dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
<dc:creator>Chengyu Fang, Minjeong Kim, Mark Saffman, Jennifer T. Choy, Mikhail Kats</dc:creator>
</item>
</channel>
</rss>
+37
View File
@@ -0,0 +1,37 @@
<?xml version="1.0"?>
<rss version="2.0" xmlns:prism="http://purl.org/rss/1.0/modules/prism/">
<channel>
<title>Journal of Pressure Vessel Technology Open Issues</title>
<link>https://asmedigitalcollection.asme.org/pressurevesseltech</link>
<description>
</description>
<language>en-us</language>
<pubDate>Mon, 11 May 2026 00:00:00 GMT</pubDate>
<lastBuildDate>Tue, 12 May 2026 00:00:37 GMT</lastBuildDate>
<generator>Silverchair</generator>
<managingEditor>ASMEDigitalCollection@asme.org</managingEditor>
<webMaster>ASMEDigitalCollection@asme.org</webMaster>
<item>
<title>Research on Low-Temperature Mechanical Properties and Fracture Behavior of 09MnNiDR Steel Based on Small Punch Test</title>
<link>https://asmedigitalcollection.asme.org/pressurevesseltech/article/148/5/051504/1232699/Research-on-Low-Temperature-Mechanical-Properties</link>
<pubDate>Mon, 11 May 2026 00:00:00 GMT</pubDate>
<description>&lt;span class="paragraphSection"&gt;&lt;div class="boxTitle"&gt;Abstract&lt;/div&gt;To develop a microdamage evaluation method applicable to in-service equipment under low-temperature conditions, this study systematically investigates the mechanical properties and fracture behavior of 09MnNiDR cryogenic steel over a broad temperature range from room temperature to 196 °C. The small punch test (SPT) technique is employed, supplemented by electron backscatter diffraction (EBSD) and scanning electron microscopy (SEM) for micromechanism analysis. Results indicate that under cryogenic conditions, dislocation slip is suppressed, leading to a more uniform distribution of plastic strain. Concurrently, the deformation process at low temperatures refines the grains within the plastic zone through mechanisms such as mechanical subdivision. As temperature decreases, the material strength increases linearly, exhibiting a significant cryogenic strengthening effect. The fracture mode transitions from ductile to brittle, with a ductile-to-brittle transition zone identified near 150 °C. An empirical formula based on SPT deformation energy is proposed to predict yield and true tensile strength, with prediction errors below 6%. By introducing a normalized energy parameter, an empirical correlation model is established between the SPT ductile-to-brittle transition temperature (DBTT) and the standard Charpy impact transition temperature. This study presents a viable methodology for safety assessment of in-service cryogenic pressure vessels through minimally invasive testing and performance prediction.&lt;/span&gt;</description>
<prism:volume xmlns:prism="prism">148</prism:volume>
<prism:number xmlns:prism="prism">5</prism:number>
<prism:startingPage xmlns:prism="prism">051504</prism:startingPage>
<prism:doi xmlns:prism="prism">10.1115/1.4071740</prism:doi>
<guid>https://asmedigitalcollection.asme.org/pressurevesseltech/article/148/5/051504/1232699/Research-on-Low-Temperature-Mechanical-Properties</guid>
</item>
<item>
<title>Improved Oxidation, Carburization Resistance and Creep Strength of Ethylene Pyrolysis Furnace Tubes at 1100 °C Through Aluminum and Tungsten Alloying</title>
<link>https://asmedigitalcollection.asme.org/pressurevesseltech/article/148/4/041701/1232556/Improved-Oxidation-Carburization-Resistance-and</link>
<pubDate>Mon, 11 May 2026 00:00:00 GMT</pubDate>
<description>&lt;span class="paragraphSection"&gt;&lt;div class="boxTitle"&gt;Abstract&lt;/div&gt;The oxidation resistance, carburization resistance, and mechanical properties of ethylene pyrolysis furnace tube alloys modified by Al/Al-W alloying were comparatively investigated with conventional alloys using various microstructural characterization techniques and mechanical property testing methods. The Al-alloyed 29Cr44Ni4AlNb+microalloy (MA) exhibits superior oxidation and carburization resistance compared to conventional 25Cr35NiNb+MA and 35Cr45NiNb+MA alloys; however, its creep rupture life was significantly reduced. Further addition of W enhanced the solid solution strengthening effect, thereby improving high-temperature tensile properties and mitigating the detrimental impact of Al on creep performance. The creep rupture life of the Al/W-modified 27Cr44Ni5W3Al+MA alloy reached levels comparable to those of conventional alloys while retaining the beneficial effects of Al in improving oxidation and carburization resistance. Through alloying strategies, this study successfully achieved a balance between corrosion resistance and mechanical properties in ethylene pyrolysis furnace tube alloys, enabling them to withstand their harsh service conditions effectively.&lt;/span&gt;</description>
<prism:volume xmlns:prism="prism">148</prism:volume>
<prism:number xmlns:prism="prism">4</prism:number>
<prism:startingPage xmlns:prism="prism">041701</prism:startingPage>
<prism:doi xmlns:prism="prism">10.1115/1.4071682</prism:doi>
<guid>https://asmedigitalcollection.asme.org/pressurevesseltech/article/148/4/041701/1232556/Improved-Oxidation-Carburization-Resistance-and</guid>
</item>
</channel>
</rss>
File diff suppressed because one or more lines are too long
+11
View File
@@ -0,0 +1,11 @@
<html><head><title>Givaudan Sense Colour Explosion</title></head><body>
<!-- Excerpt captured from the live page (2026-06-11, csb.gov givaudan-sense-colour-explosion-). PDF anchors preserved in their original form:
page mixes report/appendix/update links with recommendation status summaries -->
<a href="/assets/1/20/Appendix_C_Reactivity_Testing_Results_Publication.pdf?17347" id="CT_InvestigationDetails_8_rptDocumentsCategory_ctl06_rptDocuments_ctl01_lnkDocument" class="bold" target="_blank">Appendix C Reactivity Testing Results and Analysis</a>
<a href="/assets/1/20/Givaudan_Investigation_Report_Publication.pdf?17346" id="CT_InvestigationDetails_8_rptDocumentsCategory_ctl29_rptDocuments_ctl01_lnkDocument" class="bold" target="_blank">Fatal Runaway Reaction and Explosion at Givaudan Sense Colour / D.D. Williamson</a>
<a href="/assets/1/6/Givaudan_Investigation_Update_-_final.pdf?17132" id="CT_InvestigationDetails_8_rptDocumentsCategory_ctl35_rptDocuments_ctl01_lnkDocument" class="bold" target="_blank">Givaudan Explosion Investigation Update</a>
<a target="_blank" href="/assets/recommendation/Status_Change_Summary_CRA_(Givaudan_R12).pdf" > Recommendation Status Change Summary</a>
<a target="_blank" href="/assets/recommendation/Status_Change_Summary_Givaudan_Corp_(Givaudan_R8).pdf" > Recommendation Status Change Summary</a>
<a target="_blank" href="/assets/recommendation/Status_Change_Summary_Givaudan_(Givaudan_R1).pdf" > Recommendation Status Change Summary</a>
<a target="_blank" href="/assets/recommendation/Status_Change_Summary_Givaudan_(Givaudan_R2).pdf" > Recommendation Status Change Summary</a>
</body></html>
+1
View File
@@ -0,0 +1 @@
<?xml version="1.0" encoding="utf-8"?><urlset xmlns:xsi="https://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="https://www.sitemaps.org/schemas/sitemap/0.9 https://www.sitemaps.org/schemas/sitemap/0.9/sitemap.xsd" xmlns="https://www.sitemaps.org/schemas/sitemap/0.9"><url><loc>https://www.csb.gov/recommendations/preventive-maintenance/</loc><lastmod>2022-06-02T17:17:27-06:00</lastmod><changefreq>weekly</changefreq><priority>0.5</priority></url><url><loc>https://www.csb.gov/site-map/</loc><lastmod>2017-05-05T23:59:28-06:00</lastmod><changefreq>weekly</changefreq><priority>0.5</priority></url><url><loc>https://www.csb.gov/recommendations/preventive-maintenance-investigations/</loc><lastmod>2018-04-27T14:32:25-06:00</lastmod><changefreq>weekly</changefreq><priority>0.5</priority></url><url><loc>https://www.csb.gov/investigations/data-quality-/</loc><lastmod>2025-07-28T13:37:44-06:00</lastmod><changefreq>weekly</changefreq><priority>0.5</priority></url><url><loc>https://www.csb.gov/recommendations/preventive-maintenances/</loc><lastmod>2022-06-02T17:19:06-06:00</lastmod><changefreq>weekly</changefreq><priority>0.5</priority></url><url><loc>https://www.csb.gov/videos/video-feedback-form/</loc><lastmod>2017-05-04T18:17:43-06:00</lastmod><changefreq>weekly</changefreq><priority>0.5</priority></url><url><loc>https://www.csb.gov/investigations/</loc><lastmod>2017-05-08T16:06:42-06:00</lastmod><changefreq>weekly</changefreq><priority>0.5</priority></url><url><loc>https://www.csb.gov/investigations/completed-investigations/</loc><lastmod>2017-05-30T19:02:58-06:00</lastmod><changefreq>weekly</changefreq><priority>0.5</priority></url><url><loc>https://www.csb.gov/investigations/current-investigations/</loc><lastmod>2020-10-19T15:06:55-06:00</lastmod><changefreq>weekly</changefreq><priority>0.5</priority></url><url><loc>https://www.csb.gov/videos/</loc><lastmod>2017-03-09T13:38:53-06:00</lastmod><changefreq>weekly</changefreq><priority>0.5</priority></url><url><loc>https
://www.csb.gov/videos/take-more-action-to-prevent-dust-explosions/</loc><lastmod>2013-05-17T16:46:08-06:00</lastmod><changefreq>weekly</changefreq><priority>0.5</priority></url><url><loc>https://www.csb.gov/videos/protect-public-employees-from-workplace-accidents/</loc><lastmod>2013-05-17T16:46:34-06:00</lastmod><changefreq>weekly</changefreq><priority>0.5</priority></url></urlset>
+71
View File
@@ -0,0 +1,71 @@
<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom">
<channel>
<title>
<![CDATA[Latest Updates]]>
</title>
<description>
<![CDATA[The most recent blogs and online articles from The Economist]]>
</description>
<link>https://www.economist.com/latest</link>
<pubDate>Wed, 10 Jun 2026 21:11:56 +0000</pubDate>
<lastBuildDate>Wed, 10 Jun 2026 21:11:56 +0000</lastBuildDate>
<atom:link href="https://www.economist.com/latest/rss.xml" rel="self" type="application/rss+xml"/>
<item>
<title>
<![CDATA[Syria is an unexpected beneficiary of the Gulf war]]>
</title>
<description>
<![CDATA[The revival of an old oil-export route from Iraq to the Mediterranean helps Syria's new regime]]>
</description>
<link>https://www.economist.com/middle-east-and-africa/2026/06/10/syria-is-an-unexpected-beneficiary-of-the-gulf-war</link>
<guid isPermaLink="false">5737613e-c6cd-4cf0-b7da-fbfb52872f63</guid>
<pubDate>Wed, 10 Jun 2026 19:26:42 +0000</pubDate>
</item>
<item>
<title>
<![CDATA[How to win the World Cup]]>
</title>
<description>
<![CDATA[Being rich helps, but being open to immigration works best of all]]>
</description>
<link>https://www.economist.com/international/2026/06/10/how-to-win-the-world-cup</link>
<guid isPermaLink="false">1019df1e-5c1e-4784-ae0c-31741c176e41</guid>
<pubDate>Wed, 10 Jun 2026 19:07:01 +0000</pubDate>
</item>
<item>
<title>
<![CDATA[American capitalism is run by millionaires, not billionaires]]>
</title>
<description>
<![CDATA[They hide in plain sight—and wield enormous power]]>
</description>
<link>https://www.economist.com/business/2026/06/10/american-capitalism-is-run-by-millionaires-not-billionaires</link>
<guid isPermaLink="false">dbbcb101-a7de-472b-a62c-d969ab033b90</guid>
<pubDate>Wed, 10 Jun 2026 19:01:31 +0000</pubDate>
</item>
<item>
<title>
<![CDATA[New techniques can predict and prevent lung cancer ]]>
</title>
<description>
<![CDATA[A molecular signature can identify those most at risk]]>
</description>
<link>https://www.economist.com/science-and-technology/2026/06/10/new-techniques-can-predict-and-prevent-lung-cancer</link>
<guid isPermaLink="false">dbc7231c-6c7c-42fb-8930-bb099e1d3015</guid>
<pubDate>Wed, 10 Jun 2026 18:48:35 +0000</pubDate>
</item>
<item>
<title>
<![CDATA[The World Cup has always been beset by scandal and strife]]>
</title>
<description>
<![CDATA[So has FIFA, the outfit that administers it]]>
</description>
<link>https://www.economist.com/international/2026/06/10/the-world-cup-has-always-been-beset-by-scandal-and-strife</link>
<guid isPermaLink="false">f2213e72-3531-4894-a33f-47bce2fea4e9</guid>
<pubDate>Wed, 10 Jun 2026 18:25:19 +0000</pubDate>
</item>
</channel>
</rss>
File diff suppressed because one or more lines are too long
+262
View File
@@ -0,0 +1,262 @@
<?xml version="1.0" encoding="UTF-8"?>
<rdf:RDF xmlns="http://purl.org/rss/1.0/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
<channel rdf:about="https://asia.nikkei.com/rss/feed/nar" xml:lang="en-GB">
<title>Nikkei Asia</title>
<link>https://asia.nikkei.com/</link>
<description/>
<items>
<rdf:Seq>
<rdf:li resource="https://asia.nikkei.com/business/media-entertainment/tokyo-disneyland-magic-in-doubt-as-operator-s-stock-falls"/>
<rdf:li resource="https://asia.nikkei.com/business/tech/semiconductors/sk-hynix-to-triple-wafer-capacity-by-2034-chairman-chey"/>
<rdf:li resource="https://asia.nikkei.com/editor-s-picks/china-up-close/analysis-kim-jong-un-emerges-as-winner-in-summit-with-xi-jinping"/>
<rdf:li resource="https://asia.nikkei.com/business/technology/ntt-sets-sights-on-nvidia-ai-race-with-500m-optical-network-fund"/>
<rdf:li resource="https://asia.nikkei.com/politics/japan-to-help-content-industry-sue-over-copyright-infringement-abroad"/>
<rdf:li resource="https://asia.nikkei.com/business/technology/artificial-intelligence/anthropic-plugs-claude-ai-in-japan-for-automated-software-development"/>
<rdf:li resource="https://asia.nikkei.com/spotlight/supply-chain/us-tungsten-scrap-exports-to-japan-soar-on-chinese-curbs"/>
<rdf:li resource="https://asia.nikkei.com/business/business-deals/tdk-to-buy-us-maker-of-ai-data-center-cooling-components-for-up-to-400m"/>
<rdf:li resource="https://asia.nikkei.com/business/insurance/nippon-life-s-private-credit-assets-reach-4.6bn"/>
<rdf:li resource="https://asia.nikkei.com/business/business-trends/us-firms-see-china-as-essential-despite-rising-economic-and-political-risks"/>
<rdf:li resource="https://asia.nikkei.com/economy/fires/manslaughter-other-charges-filed-over-hong-kong-s-wang-fuk-court-fire"/>
<rdf:li resource="https://asia.nikkei.com/spotlight/the-future-of-asia/future-of-asia-2026/adb-and-peers-need-to-anchor-international-order-president-kanda-says"/>
<rdf:li resource="https://asia.nikkei.com/spotlight/the-future-of-asia/future-of-asia-2026/asia-faces-risks-of-economic-spillover-from-iran-and-ai-disinformation"/>
<rdf:li resource="https://asia.nikkei.com/business/companies/swire-dangles-cathay-shares-in-600m-convertible-bond-issuance"/>
<rdf:li resource="https://asia.nikkei.com/business/materials/shin-etsu-to-set-up-rare-earth-smelter-in-japan-to-ease-reliance-on-china"/>
<rdf:li resource="https://asia.nikkei.com/editor-s-picks/interview/setting-sea-border-with-japan-vital-philippine-foreign-secretary"/>
<rdf:li resource="https://asia.nikkei.com/business/markets/strong-dollar-rally-weighs-heavier-on-struggling-asian-countries"/>
<rdf:li resource="https://asia.nikkei.com/business/fisheries/thailand-s-shrimp-industry-hit-by-malaysia-s-import-ban"/>
<rdf:li resource="https://asia.nikkei.com/business/technology/applied-materials-opens-500m-manufacturing-campus-in-singapore"/>
<rdf:li resource="https://asia.nikkei.com/spotlight/the-future-of-asia/future-of-asia-2026/malaysia-s-anwar-warns-against-global-powers-weaponizing-trade"/>
<rdf:li resource="https://asia.nikkei.com/business/energy/apple-and-nvidia-supplier-foxconn-invests-in-vietnam-solar-wind-power"/>
<rdf:li resource="https://asia.nikkei.com/spotlight/sports/world-cup-tests-asia-s-appetite-for-costly-broadcast-rights"/>
<rdf:li resource="https://asia.nikkei.com/business/finance/brookfield-bets-on-its-japan-business-to-top-hong-kong-and-singapore"/>
<rdf:li resource="https://asia.nikkei.com/business/technology/tencent-raises-4.6bn-in-dual-dollar-yuan-bond-issuances"/>
<rdf:li resource="https://asia.nikkei.com/business/automobiles/electric-vehicles/ferrari-luce-ev-highlights-european-struggle-to-lure-back-china-s-superrich"/>
<rdf:li resource="https://asia.nikkei.com/business/markets/commodities/indian-families-scale-back-on-gold-for-weddings-as-prices-hover-near-highs"/>
<rdf:li resource="https://asia.nikkei.com/politics/defense/japan-s-new-defense-document-to-name-china-the-biggest-concern"/>
<rdf:li resource="https://asia.nikkei.com/business/food-beverage/japan-to-bolster-ip-protections-for-prized-new-fruit-vegetable-varieties"/>
<rdf:li resource="https://asia.nikkei.com/spotlight/environment/climate-change/japan-s-jgc-bets-on-carbon-feeding-bacteria-to-create-bioplastics"/>
<rdf:li resource="https://asia.nikkei.com/business/travel-leisure/rural-japan-hopes-to-charm-domestic-travelers-priced-out-of-overseas-trips"/>
<rdf:li resource="https://asia.nikkei.com/business/energy/gm-partners-with-peak-energy-for-sodium-ion-battery-storage"/>
<rdf:li resource="https://asia.nikkei.com/business/technology/tech-asia/japan-seeks-bigger-role-in-asia-s-subsea-cables-as-ai-rewires-demand"/>
<rdf:li resource="https://asia.nikkei.com/spotlight/comment/why-japan-s-takaichi-has-stepped-back-from-boj-rate-hike-debate"/>
<rdf:li resource="https://asia.nikkei.com/business/transportation/japan-flying-car-startup-skydrive-aims-for-the-skies-in-2028"/>
<rdf:li resource="https://asia.nikkei.com/business/technology/hanwha-qcells-kicks-off-first-fully-onshore-us-solar-supply-chain"/>
<rdf:li resource="https://asia.nikkei.com/business/energy/japan-s-mitsubishi-hc-canada-s-brookfield-to-buy-european-wind-solar-farms-in-ai-play2"/>
<rdf:li resource="https://asia.nikkei.com/business/technology/g7-plans-first-joint-statement-for-protecting-minors-on-social-media"/>
<rdf:li resource="https://asia.nikkei.com/business/automobiles/toyota-backs-japan-self-driving-startup-tier-iv-in-development-push"/>
<rdf:li resource="https://asia.nikkei.com/business/softbank/jpmorgan-chase-emerges-as-softbank-group-s-top-lender-surpassing-mizuho"/>
<rdf:li resource="https://asia.nikkei.com/business/energy/malaysia-to-promise-japan-maximum-possible-lng-naphtha"/>
<rdf:li resource="https://asia.nikkei.com/photos/in-focus-mindanao-reels-from-another-deadly-earthquake"/>
<rdf:li resource="https://asia.nikkei.com/politics/international-relations/us-china-tensions/pentagon-blacklists-alibaba-byd-and-baidu-over-alleged-military-ties"/>
<rdf:li resource="https://asia.nikkei.com/economy/bank-of-japan/bank-of-japan-set-to-hike-key-interest-rate-to-1"/>
<rdf:li resource="https://asia.nikkei.com/spotlight/the-future-of-asia/future-of-asia-2026/semiconductor-advances-a-must-for-data-centers-says-tokyo-electron-boss"/>
<rdf:li resource="https://asia.nikkei.com/politics/international-relations/xi-shores-up-china-s-sway-in-pyongyang-wary-of-north-korea-russia-ties"/>
<rdf:li resource="https://asia.nikkei.com/opinion/south-korea-election-yoon-s-legacy-partially-survives-progressive-victory"/>
<rdf:li resource="https://asia.nikkei.com/business/automobiles/electric-vehicles/chinese-entrepreneur-s-e-truck-startup-windrose-faces-unpaid-wage-claims"/>
<rdf:li resource="https://asia.nikkei.com/economy/bank-indonesia-raises-rates-0.25-at-emergency-meeting-to-defend-rupiah"/>
<rdf:li resource="https://asia.nikkei.com/spotlight/the-future-of-asia/future-of-asia-2026/japan-ex-pm-kishida-calls-for-deeper-energy-ties-with-south-korea"/>
<rdf:li resource="https://asia.nikkei.com/business/markets/equities/binance-eyes-asian-stock-trading-as-bitcoin-slumps"/>
</rdf:Seq>
</items>
</channel>
<item rdf:about="https://asia.nikkei.com/business/media-entertainment/tokyo-disneyland-magic-in-doubt-as-operator-s-stock-falls">
<title><![CDATA[Tokyo Disneyland 'magic' in doubt as operator's stock falls]]></title>
<link>https://asia.nikkei.com/business/media-entertainment/tokyo-disneyland-magic-in-doubt-as-operator-s-stock-falls</link>
</item>
<item rdf:about="https://asia.nikkei.com/business/tech/semiconductors/sk-hynix-to-triple-wafer-capacity-by-2034-chairman-chey">
<title><![CDATA[SK Hynix to triple wafer capacity by 2034: Chairman Chey]]></title>
<link>https://asia.nikkei.com/business/tech/semiconductors/sk-hynix-to-triple-wafer-capacity-by-2034-chairman-chey</link>
</item>
<item rdf:about="https://asia.nikkei.com/editor-s-picks/china-up-close/analysis-kim-jong-un-emerges-as-winner-in-summit-with-xi-jinping">
<title><![CDATA[Analysis: Kim Jong Un emerges as winner in summit with Xi Jinping]]></title>
<link>https://asia.nikkei.com/editor-s-picks/china-up-close/analysis-kim-jong-un-emerges-as-winner-in-summit-with-xi-jinping</link>
</item>
<item rdf:about="https://asia.nikkei.com/business/technology/ntt-sets-sights-on-nvidia-ai-race-with-500m-optical-network-fund">
<title><![CDATA[NTT sets sights on Nvidia, AI race with $500m optical network fund]]></title>
<link>https://asia.nikkei.com/business/technology/ntt-sets-sights-on-nvidia-ai-race-with-500m-optical-network-fund</link>
</item>
<item rdf:about="https://asia.nikkei.com/politics/japan-to-help-content-industry-sue-over-copyright-infringement-abroad">
<title><![CDATA[Japan to help content industry sue over copyright infringement abroad]]></title>
<link>https://asia.nikkei.com/politics/japan-to-help-content-industry-sue-over-copyright-infringement-abroad</link>
</item>
<item rdf:about="https://asia.nikkei.com/business/technology/artificial-intelligence/anthropic-plugs-claude-ai-in-japan-for-automated-software-development">
<title><![CDATA[Anthropic plugs Claude AI in Japan for automated software development]]></title>
<link>https://asia.nikkei.com/business/technology/artificial-intelligence/anthropic-plugs-claude-ai-in-japan-for-automated-software-development</link>
</item>
<item rdf:about="https://asia.nikkei.com/spotlight/supply-chain/us-tungsten-scrap-exports-to-japan-soar-on-chinese-curbs">
<title><![CDATA[US tungsten scrap exports to Japan soar on Chinese curbs]]></title>
<link>https://asia.nikkei.com/spotlight/supply-chain/us-tungsten-scrap-exports-to-japan-soar-on-chinese-curbs</link>
</item>
<item rdf:about="https://asia.nikkei.com/business/business-deals/tdk-to-buy-us-maker-of-ai-data-center-cooling-components-for-up-to-400m">
<title><![CDATA[TDK to buy US maker of AI data center cooling components for up to $400m]]></title>
<link>https://asia.nikkei.com/business/business-deals/tdk-to-buy-us-maker-of-ai-data-center-cooling-components-for-up-to-400m</link>
</item>
<item rdf:about="https://asia.nikkei.com/business/insurance/nippon-life-s-private-credit-assets-reach-4.6bn">
<title><![CDATA[Nippon Life's private credit assets reach $4.6bn]]></title>
<link>https://asia.nikkei.com/business/insurance/nippon-life-s-private-credit-assets-reach-4.6bn</link>
</item>
<item rdf:about="https://asia.nikkei.com/business/business-trends/us-firms-see-china-as-essential-despite-rising-economic-and-political-risks">
<title><![CDATA[US firms see China as essential despite rising economic and political risks]]></title>
<link>https://asia.nikkei.com/business/business-trends/us-firms-see-china-as-essential-despite-rising-economic-and-political-risks</link>
</item>
<item rdf:about="https://asia.nikkei.com/economy/fires/manslaughter-other-charges-filed-over-hong-kong-s-wang-fuk-court-fire">
<title><![CDATA[Manslaughter, other charges filed over Hong Kong's Wang Fuk Court fire]]></title>
<link>https://asia.nikkei.com/economy/fires/manslaughter-other-charges-filed-over-hong-kong-s-wang-fuk-court-fire</link>
</item>
<item rdf:about="https://asia.nikkei.com/spotlight/the-future-of-asia/future-of-asia-2026/adb-and-peers-need-to-anchor-international-order-president-kanda-says">
<title><![CDATA[ADB and peers need to 'anchor' international order: President Kanda says]]></title>
<link>https://asia.nikkei.com/spotlight/the-future-of-asia/future-of-asia-2026/adb-and-peers-need-to-anchor-international-order-president-kanda-says</link>
</item>
<item rdf:about="https://asia.nikkei.com/spotlight/the-future-of-asia/future-of-asia-2026/asia-faces-risks-of-economic-spillover-from-iran-and-ai-disinformation">
<title><![CDATA[Asia faces risks of economic spillover from Iran and AI disinformation]]></title>
<link>https://asia.nikkei.com/spotlight/the-future-of-asia/future-of-asia-2026/asia-faces-risks-of-economic-spillover-from-iran-and-ai-disinformation</link>
</item>
<item rdf:about="https://asia.nikkei.com/business/companies/swire-dangles-cathay-shares-in-600m-convertible-bond-issuance">
<title><![CDATA[Swire dangles Cathay shares in $600m convertible bond issuance]]></title>
<link>https://asia.nikkei.com/business/companies/swire-dangles-cathay-shares-in-600m-convertible-bond-issuance</link>
</item>
<item rdf:about="https://asia.nikkei.com/business/materials/shin-etsu-to-set-up-rare-earth-smelter-in-japan-to-ease-reliance-on-china">
<title><![CDATA[Shin-Etsu to set up rare-earth smelter in Japan to ease reliance on China]]></title>
<link>https://asia.nikkei.com/business/materials/shin-etsu-to-set-up-rare-earth-smelter-in-japan-to-ease-reliance-on-china</link>
</item>
<item rdf:about="https://asia.nikkei.com/editor-s-picks/interview/setting-sea-border-with-japan-vital-philippine-foreign-secretary">
<title><![CDATA[Setting sea border with Japan vital: Philippine foreign secretary]]></title>
<link>https://asia.nikkei.com/editor-s-picks/interview/setting-sea-border-with-japan-vital-philippine-foreign-secretary</link>
</item>
<item rdf:about="https://asia.nikkei.com/business/markets/strong-dollar-rally-weighs-heavier-on-struggling-asian-countries">
<title><![CDATA[Strong dollar rally weighs heavier on struggling Asian countries]]></title>
<link>https://asia.nikkei.com/business/markets/strong-dollar-rally-weighs-heavier-on-struggling-asian-countries</link>
</item>
<item rdf:about="https://asia.nikkei.com/business/fisheries/thailand-s-shrimp-industry-hit-by-malaysia-s-import-ban">
<title><![CDATA[Thailand's shrimp industry hit by Malaysia's import ban]]></title>
<link>https://asia.nikkei.com/business/fisheries/thailand-s-shrimp-industry-hit-by-malaysia-s-import-ban</link>
</item>
<item rdf:about="https://asia.nikkei.com/business/technology/applied-materials-opens-500m-manufacturing-campus-in-singapore">
<title><![CDATA[Applied Materials opens $500m manufacturing campus in Singapore]]></title>
<link>https://asia.nikkei.com/business/technology/applied-materials-opens-500m-manufacturing-campus-in-singapore</link>
</item>
<item rdf:about="https://asia.nikkei.com/spotlight/the-future-of-asia/future-of-asia-2026/malaysia-s-anwar-warns-against-global-powers-weaponizing-trade">
<title><![CDATA[Malaysia's Anwar warns against global powers weaponizing trade]]></title>
<link>https://asia.nikkei.com/spotlight/the-future-of-asia/future-of-asia-2026/malaysia-s-anwar-warns-against-global-powers-weaponizing-trade</link>
</item>
<item rdf:about="https://asia.nikkei.com/business/energy/apple-and-nvidia-supplier-foxconn-invests-in-vietnam-solar-wind-power">
<title><![CDATA[Apple and Nvidia supplier Foxconn invests in Vietnam solar, wind power]]></title>
<link>https://asia.nikkei.com/business/energy/apple-and-nvidia-supplier-foxconn-invests-in-vietnam-solar-wind-power</link>
</item>
<item rdf:about="https://asia.nikkei.com/spotlight/sports/world-cup-tests-asia-s-appetite-for-costly-broadcast-rights">
<title><![CDATA[World Cup tests Asia's appetite for costly broadcast rights]]></title>
<link>https://asia.nikkei.com/spotlight/sports/world-cup-tests-asia-s-appetite-for-costly-broadcast-rights</link>
</item>
<item rdf:about="https://asia.nikkei.com/business/finance/brookfield-bets-on-its-japan-business-to-top-hong-kong-and-singapore">
<title><![CDATA[Brookfield bets on its Japan business to top Hong Kong and Singapore]]></title>
<link>https://asia.nikkei.com/business/finance/brookfield-bets-on-its-japan-business-to-top-hong-kong-and-singapore</link>
</item>
<item rdf:about="https://asia.nikkei.com/business/technology/tencent-raises-4.6bn-in-dual-dollar-yuan-bond-issuances">
<title><![CDATA[Tencent raises $4.6bn in dual dollar, yuan bond issuances]]></title>
<link>https://asia.nikkei.com/business/technology/tencent-raises-4.6bn-in-dual-dollar-yuan-bond-issuances</link>
</item>
<item rdf:about="https://asia.nikkei.com/business/automobiles/electric-vehicles/ferrari-luce-ev-highlights-european-struggle-to-lure-back-china-s-superrich">
<title><![CDATA[Ferrari Luce EV highlights European struggle to lure back China's superrich]]></title>
<link>https://asia.nikkei.com/business/automobiles/electric-vehicles/ferrari-luce-ev-highlights-european-struggle-to-lure-back-china-s-superrich</link>
</item>
<item rdf:about="https://asia.nikkei.com/business/markets/commodities/indian-families-scale-back-on-gold-for-weddings-as-prices-hover-near-highs">
<title><![CDATA[Indian families scale back on gold for weddings as prices hover near highs]]></title>
<link>https://asia.nikkei.com/business/markets/commodities/indian-families-scale-back-on-gold-for-weddings-as-prices-hover-near-highs</link>
</item>
<item rdf:about="https://asia.nikkei.com/politics/defense/japan-s-new-defense-document-to-name-china-the-biggest-concern">
<title><![CDATA[Japan's new defense document to name China the biggest concern]]></title>
<link>https://asia.nikkei.com/politics/defense/japan-s-new-defense-document-to-name-china-the-biggest-concern</link>
</item>
<item rdf:about="https://asia.nikkei.com/business/food-beverage/japan-to-bolster-ip-protections-for-prized-new-fruit-vegetable-varieties">
<title><![CDATA[Japan to bolster IP protections for prized new fruit, vegetable varieties]]></title>
<link>https://asia.nikkei.com/business/food-beverage/japan-to-bolster-ip-protections-for-prized-new-fruit-vegetable-varieties</link>
</item>
<item rdf:about="https://asia.nikkei.com/spotlight/environment/climate-change/japan-s-jgc-bets-on-carbon-feeding-bacteria-to-create-bioplastics">
<title><![CDATA[Japan's JGC bets on carbon-feeding bacteria to create bioplastics]]></title>
<link>https://asia.nikkei.com/spotlight/environment/climate-change/japan-s-jgc-bets-on-carbon-feeding-bacteria-to-create-bioplastics</link>
</item>
<item rdf:about="https://asia.nikkei.com/business/travel-leisure/rural-japan-hopes-to-charm-domestic-travelers-priced-out-of-overseas-trips">
<title><![CDATA[Rural Japan hopes to charm domestic travelers priced out of overseas trips]]></title>
<link>https://asia.nikkei.com/business/travel-leisure/rural-japan-hopes-to-charm-domestic-travelers-priced-out-of-overseas-trips</link>
</item>
<item rdf:about="https://asia.nikkei.com/business/energy/gm-partners-with-peak-energy-for-sodium-ion-battery-storage">
<title><![CDATA[GM partners with Peak Energy for sodium-ion battery storage]]></title>
<link>https://asia.nikkei.com/business/energy/gm-partners-with-peak-energy-for-sodium-ion-battery-storage</link>
</item>
<item rdf:about="https://asia.nikkei.com/business/technology/tech-asia/japan-seeks-bigger-role-in-asia-s-subsea-cables-as-ai-rewires-demand">
<title><![CDATA[Japan seeks bigger role in Asia's subsea cables as AI rewires demand]]></title>
<link>https://asia.nikkei.com/business/technology/tech-asia/japan-seeks-bigger-role-in-asia-s-subsea-cables-as-ai-rewires-demand</link>
</item>
<item rdf:about="https://asia.nikkei.com/spotlight/comment/why-japan-s-takaichi-has-stepped-back-from-boj-rate-hike-debate">
<title><![CDATA[Why Japan's Takaichi has stepped back from BOJ rate hike debate]]></title>
<link>https://asia.nikkei.com/spotlight/comment/why-japan-s-takaichi-has-stepped-back-from-boj-rate-hike-debate</link>
</item>
<item rdf:about="https://asia.nikkei.com/business/transportation/japan-flying-car-startup-skydrive-aims-for-the-skies-in-2028">
<title><![CDATA[Japan flying car startup SkyDrive aims for the skies in 2028]]></title>
<link>https://asia.nikkei.com/business/transportation/japan-flying-car-startup-skydrive-aims-for-the-skies-in-2028</link>
</item>
<item rdf:about="https://asia.nikkei.com/business/technology/hanwha-qcells-kicks-off-first-fully-onshore-us-solar-supply-chain">
<title><![CDATA[Hanwha Qcells kicks off first fully onshore US solar supply chain]]></title>
<link>https://asia.nikkei.com/business/technology/hanwha-qcells-kicks-off-first-fully-onshore-us-solar-supply-chain</link>
</item>
<item rdf:about="https://asia.nikkei.com/business/energy/japan-s-mitsubishi-hc-canada-s-brookfield-to-buy-european-wind-solar-farms-in-ai-play2">
<title><![CDATA[Japan's Mitsubishi HC, Canada's Brookfield to buy European wind, solar farms in AI play]]></title>
<link>https://asia.nikkei.com/business/energy/japan-s-mitsubishi-hc-canada-s-brookfield-to-buy-european-wind-solar-farms-in-ai-play2</link>
</item>
<item rdf:about="https://asia.nikkei.com/business/technology/g7-plans-first-joint-statement-for-protecting-minors-on-social-media">
<title><![CDATA[G7 plans first joint statement for protecting minors on social media]]></title>
<link>https://asia.nikkei.com/business/technology/g7-plans-first-joint-statement-for-protecting-minors-on-social-media</link>
</item>
<item rdf:about="https://asia.nikkei.com/business/automobiles/toyota-backs-japan-self-driving-startup-tier-iv-in-development-push">
<title><![CDATA[Toyota backs Japan self-driving startup Tier IV in development push]]></title>
<link>https://asia.nikkei.com/business/automobiles/toyota-backs-japan-self-driving-startup-tier-iv-in-development-push</link>
</item>
<item rdf:about="https://asia.nikkei.com/business/softbank/jpmorgan-chase-emerges-as-softbank-group-s-top-lender-surpassing-mizuho">
<title><![CDATA[JPMorgan Chase emerges as SoftBank Group's top lender, surpassing Mizuho]]></title>
<link>https://asia.nikkei.com/business/softbank/jpmorgan-chase-emerges-as-softbank-group-s-top-lender-surpassing-mizuho</link>
</item>
<item rdf:about="https://asia.nikkei.com/business/energy/malaysia-to-promise-japan-maximum-possible-lng-naphtha">
<title><![CDATA[Malaysia to promise Japan maximum possible LNG, naphtha]]></title>
<link>https://asia.nikkei.com/business/energy/malaysia-to-promise-japan-maximum-possible-lng-naphtha</link>
</item>
<item rdf:about="https://asia.nikkei.com/photos/in-focus-mindanao-reels-from-another-deadly-earthquake">
<title><![CDATA[In Focus: Mindanao reels from another deadly earthquake]]></title>
<link>https://asia.nikkei.com/photos/in-focus-mindanao-reels-from-another-deadly-earthquake</link>
</item>
<item rdf:about="https://asia.nikkei.com/politics/international-relations/us-china-tensions/pentagon-blacklists-alibaba-byd-and-baidu-over-alleged-military-ties">
<title><![CDATA[Pentagon blacklists Alibaba, BYD and Baidu over alleged military ties]]></title>
<link>https://asia.nikkei.com/politics/international-relations/us-china-tensions/pentagon-blacklists-alibaba-byd-and-baidu-over-alleged-military-ties</link>
</item>
<item rdf:about="https://asia.nikkei.com/economy/bank-of-japan/bank-of-japan-set-to-hike-key-interest-rate-to-1">
<title><![CDATA[Bank of Japan set to hike key interest rate to 1%]]></title>
<link>https://asia.nikkei.com/economy/bank-of-japan/bank-of-japan-set-to-hike-key-interest-rate-to-1</link>
</item>
<item rdf:about="https://asia.nikkei.com/spotlight/the-future-of-asia/future-of-asia-2026/semiconductor-advances-a-must-for-data-centers-says-tokyo-electron-boss">
<title><![CDATA[Semiconductor advances a 'must' for data centers, says Tokyo Electron boss]]></title>
<link>https://asia.nikkei.com/spotlight/the-future-of-asia/future-of-asia-2026/semiconductor-advances-a-must-for-data-centers-says-tokyo-electron-boss</link>
</item>
<item rdf:about="https://asia.nikkei.com/politics/international-relations/xi-shores-up-china-s-sway-in-pyongyang-wary-of-north-korea-russia-ties">
<title><![CDATA[Xi shores up China's sway in Pyongyang, wary of North Korea-Russia ties]]></title>
<link>https://asia.nikkei.com/politics/international-relations/xi-shores-up-china-s-sway-in-pyongyang-wary-of-north-korea-russia-ties</link>
</item>
<item rdf:about="https://asia.nikkei.com/opinion/south-korea-election-yoon-s-legacy-partially-survives-progressive-victory">
<title><![CDATA[South Korea election: Yoon's legacy partially survives progressive victory]]></title>
<link>https://asia.nikkei.com/opinion/south-korea-election-yoon-s-legacy-partially-survives-progressive-victory</link>
</item>
<item rdf:about="https://asia.nikkei.com/business/automobiles/electric-vehicles/chinese-entrepreneur-s-e-truck-startup-windrose-faces-unpaid-wage-claims">
<title><![CDATA[Chinese entrepreneur's e-truck startup Windrose faces unpaid wage claims]]></title>
<link>https://asia.nikkei.com/business/automobiles/electric-vehicles/chinese-entrepreneur-s-e-truck-startup-windrose-faces-unpaid-wage-claims</link>
</item>
<item rdf:about="https://asia.nikkei.com/economy/bank-indonesia-raises-rates-0.25-at-emergency-meeting-to-defend-rupiah">
<title><![CDATA[Bank Indonesia raises rates 0.25% at emergency meeting to defend rupiah]]></title>
<link>https://asia.nikkei.com/economy/bank-indonesia-raises-rates-0.25-at-emergency-meeting-to-defend-rupiah</link>
</item>
<item rdf:about="https://asia.nikkei.com/spotlight/the-future-of-asia/future-of-asia-2026/japan-ex-pm-kishida-calls-for-deeper-energy-ties-with-south-korea">
<title><![CDATA[Japan ex-PM Kishida calls for deeper energy ties with South Korea]]></title>
<link>https://asia.nikkei.com/spotlight/the-future-of-asia/future-of-asia-2026/japan-ex-pm-kishida-calls-for-deeper-energy-ties-with-south-korea</link>
</item>
<item rdf:about="https://asia.nikkei.com/business/markets/equities/binance-eyes-asian-stock-trading-as-bitcoin-slumps">
<title><![CDATA[Binance eyes Asian stock trading as Bitcoin slumps]]></title>
<link>https://asia.nikkei.com/business/markets/equities/binance-eyes-asian-stock-trading-as-bitcoin-slumps</link>
</item>
</rdf:RDF>
+26
@@ -0,0 +1,26 @@
data: {"id": "chatcmpl-4e188b8b-8617-4054-be82-25fece7b56f1", "object": "chat.completion.chunk", "created": 1781139860, "model": "mlx-community/gemma-4-26b-a4b-it-8bit", "choices": [{"index": 0, "finish_reason": null, "delta": {"role": "assistant", "content": "", "tool_calls": []}}], "usage": {"input_tokens": 28, "output_tokens": 1, "total_tokens": 29, "prompt_tps": 183.51595345126498, "generation_tps": 140349.23521338476, "peak_memory": 34.66827434}}
data: {"id": "chatcmpl-4e188b8b-8617-4054-be82-25fece7b56f1", "object": "chat.completion.chunk", "created": 1781139861, "model": "mlx-community/gemma-4-26b-a4b-it-8bit", "choices": [{"index": 0, "finish_reason": null, "delta": {"role": "assistant", "content": "", "tool_calls": []}}], "usage": {"input_tokens": 28, "output_tokens": 2, "total_tokens": 30, "prompt_tps": 183.51595345126498, "generation_tps": 93.60885515563795, "peak_memory": 34.66827434}}
data: {"id": "chatcmpl-4e188b8b-8617-4054-be82-25fece7b56f1", "object": "chat.completion.chunk", "created": 1781139861, "model": "mlx-community/gemma-4-26b-a4b-it-8bit", "choices": [{"index": 0, "finish_reason": null, "delta": {"role": "assistant", "content": "안녕하세요,", "tool_calls": []}}], "usage": {"input_tokens": 28, "output_tokens": 3, "total_tokens": 31, "prompt_tps": 183.51595345126498, "generation_tps": 70.37263329290622, "peak_memory": 34.66827434}}
data: {"id": "chatcmpl-4e188b8b-8617-4054-be82-25fece7b56f1", "object": "chat.completion.chunk", "created": 1781139861, "model": "mlx-community/gemma-4-26b-a4b-it-8bit", "choices": [{"index": 0, "finish_reason": null, "delta": {"role": "assistant", "content": "", "tool_calls": []}}], "usage": {"input_tokens": 28, "output_tokens": 4, "total_tokens": 32, "prompt_tps": 183.51595345126498, "generation_tps": 62.61454940315543, "peak_memory": 34.66827434}}
data: {"id": "chatcmpl-4e188b8b-8617-4054-be82-25fece7b56f1", "object": "chat.completion.chunk", "created": 1781139861, "model": "mlx-community/gemma-4-26b-a4b-it-8bit", "choices": [{"index": 0, "finish_reason": null, "delta": {"role": "assistant", "content": " 만나서", "tool_calls": []}}], "usage": {"input_tokens": 28, "output_tokens": 5, "total_tokens": 33, "prompt_tps": 183.51595345126498, "generation_tps": 58.7098801868211, "peak_memory": 34.66827434}}
data: {"id": "chatcmpl-4e188b8b-8617-4054-be82-25fece7b56f1", "object": "chat.completion.chunk", "created": 1781139861, "model": "mlx-community/gemma-4-26b-a4b-it-8bit", "choices": [{"index": 0, "finish_reason": null, "delta": {"role": "assistant", "content": "", "tool_calls": []}}], "usage": {"input_tokens": 28, "output_tokens": 6, "total_tokens": 34, "prompt_tps": 183.51595345126498, "generation_tps": 56.35974757228211, "peak_memory": 34.66827434}}
data: {"id": "chatcmpl-4e188b8b-8617-4054-be82-25fece7b56f1", "object": "chat.completion.chunk", "created": 1781139861, "model": "mlx-community/gemma-4-26b-a4b-it-8bit", "choices": [{"index": 0, "finish_reason": null, "delta": {"role": "assistant", "content": " 반갑습니다!", "tool_calls": []}}], "usage": {"input_tokens": 28, "output_tokens": 7, "total_tokens": 35, "prompt_tps": 183.51595345126498, "generation_tps": 54.81880127112613, "peak_memory": 34.66827434}}
data: {"id": "chatcmpl-4e188b8b-8617-4054-be82-25fece7b56f1", "object": "chat.completion.chunk", "created": 1781139861, "model": "mlx-community/gemma-4-26b-a4b-it-8bit", "choices": [{"index": 0, "finish_reason": "stop", "delta": {"role": "assistant", "content": "", "tool_calls": []}}], "usage": {"input_tokens": 28, "output_tokens": 7, "total_tokens": 35, "prompt_tps": 183.51595345126498, "generation_tps": 54.81880127112613, "peak_memory": 34.66827434}}
data: [DONE]
+12
@@ -0,0 +1,12 @@
data: {"id":"chatcmpl-96ee9a0d-2f66-4357-876f-951c80c23bb2","object":"chat.completion.chunk","created":1781139880,"model":"/Users/hyungi/mlx-models/Qwen3.6-27B-8bit","choices":[{"index":0,"finish_reason":null,"delta":{"role":"assistant","content":"","reasoning":null,"tool_calls":null,"tool_call_id":null,"name":null},"logprobs":null}],"usage":{"prompt_tokens":25,"completion_tokens":1,"total_tokens":26,"prompt_tokens_details":{"cached_tokens":0},"prompt_tps":0.0,"generation_tps":0.0,"peak_memory":0.0}}
data: {"id":"chatcmpl-96ee9a0d-2f66-4357-876f-951c80c23bb2","object":"chat.completion.chunk","created":1781139880,"model":"/Users/hyungi/mlx-models/Qwen3.6-27B-8bit","choices":[{"index":0,"finish_reason":null,"delta":{"role":"assistant","content":"","reasoning":null,"tool_calls":null,"tool_call_id":null,"name":null},"logprobs":null}],"usage":{"prompt_tokens":25,"completion_tokens":2,"total_tokens":27,"prompt_tokens_details":{"cached_tokens":0},"prompt_tps":0.0,"generation_tps":0.0,"peak_memory":0.0}}
data: {"id":"chatcmpl-96ee9a0d-2f66-4357-876f-951c80c23bb2","object":"chat.completion.chunk","created":1781139880,"model":"/Users/hyungi/mlx-models/Qwen3.6-27B-8bit","choices":[{"index":0,"finish_reason":null,"delta":{"role":"assistant","content":"","reasoning":null,"tool_calls":null,"tool_call_id":null,"name":null},"logprobs":null}],"usage":{"prompt_tokens":25,"completion_tokens":3,"total_tokens":28,"prompt_tokens_details":{"cached_tokens":0},"prompt_tps":0.0,"generation_tps":0.0,"peak_memory":0.0}}
data: {"id":"chatcmpl-96ee9a0d-2f66-4357-876f-951c80c23bb2","object":"chat.completion.chunk","created":1781139881,"model":"/Users/hyungi/mlx-models/Qwen3.6-27B-8bit","choices":[{"index":0,"finish_reason":null,"delta":{"role":"assistant","content":"","reasoning":null,"tool_calls":null,"tool_call_id":null,"name":null},"logprobs":null}],"usage":{"prompt_tokens":25,"completion_tokens":4,"total_tokens":29,"prompt_tokens_details":{"cached_tokens":0},"prompt_tps":0.0,"generation_tps":0.0,"peak_memory":0.0}}
data: {"id":"chatcmpl-96ee9a0d-2f66-4357-876f-951c80c23bb2","object":"chat.completion.chunk","created":1781139881,"model":"/Users/hyungi/mlx-models/Qwen3.6-27B-8bit","choices":[{"index":0,"finish_reason":"stop","delta":{"role":"assistant","content":"안녕하세요!","reasoning":null,"tool_calls":null,"tool_call_id":null,"name":null},"logprobs":null}],"usage":{"prompt_tokens":25,"completion_tokens":5,"total_tokens":30,"prompt_tokens_details":{"cached_tokens":0},"prompt_tps":0.0,"generation_tps":0.0,"peak_memory":0.0}}
data: [DONE]
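The two streaming fixtures above differ in small ways: the Gemma server reports `input_tokens`/`output_tokens`, the Qwen server reports `prompt_tokens`/`completion_tokens`, and the Qwen stream carries its only text in the same chunk that sets `finish_reason: "stop"`. A minimal sketch of a consumer that tolerates both shapes follows. The function name `collect_sse_text` and the normalized key names are assumptions for illustration; this is not the project's `call_stream` relay, which is not shown here.

```python
import json
from typing import Iterable


def collect_sse_text(lines: Iterable[str]) -> tuple[str, dict]:
    """Concatenate delta.content from chat.completion.chunk SSE lines.

    Returns (text, usage), where usage is taken from the last chunk that
    carried a usage object, with token counts mapped to prompt/completion keys.
    """
    parts: list[str] = []
    usage: dict = {}
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separator lines and non-data fields
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break  # end-of-stream sentinel used by both fixtures
        chunk = json.loads(payload)
        for choice in chunk.get("choices", []):
            # Read content even on the finish_reason="stop" chunk:
            # the Qwen fixture delivers its text there.
            content = (choice.get("delta") or {}).get("content")
            if content:
                parts.append(content)
        if chunk.get("usage"):
            usage = chunk["usage"]
    normalized = {
        "prompt_tokens": usage.get("prompt_tokens", usage.get("input_tokens")),
        "completion_tokens": usage.get("completion_tokens", usage.get("output_tokens")),
    }
    return "".join(parts), normalized
```

Reading the text before checking `finish_reason` is the design point: a consumer that stops at the first chunk with `finish_reason == "stop"` would return an empty string for the Qwen fixture.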
+327
@@ -0,0 +1,327 @@
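"""crawl-24x7 cycle 3: pure-function/shape regression tests (no DB required).

Covers B-4 signal-only (no body truncation + enqueue guard), C-4 feed shapes,
the CSB sitemap diff parser, the API announcement listing parser, the CCPS
Beacon link parser, and B-5 (Nikkei RDF is handled natively by feedparser;
pinned here to show that no code branch is needed).

Fixtures are live captures from 2026-06-11 (tests/fixtures/,
[[feedback_external_api_fixture_first]]). The economist/ieee fixtures had only
their item count trimmed for repo size (header/footer/item structure is
byte-faithful).
"""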
import re
from datetime import datetime, timedelta, timezone
from pathlib import Path
from types import SimpleNamespace
import feedparser
import pytest
from workers import news_collector
from workers.api_standards_collector import _parse_listing, _parse_pub_date
from workers.ccps_collector import _beacon_pdf_links
from workers.csb_collector import _parse_sitemap, _pdf_links, _should_skip
from workers.news_collector import _clean_html, _entry_body
FIXTURES = Path(__file__).parent / "fixtures"


def _feed(name: str):
    return feedparser.parse((FIXTURES / name).read_text(encoding="utf-8"))


def _source(**kw):
    return SimpleNamespace(
        fetch_method=kw.get("fetch_method", "rss"),
        fulltext_policy=kw.get("fulltext_policy", "none"),
        source_channel=kw.get("source_channel", "news"),
    )


# ── B-4: body selection policy ───────────────────────────────────────────────
class TestEntryBodyPolicy:
    def test_signal_only_preserves_full_abstract(self):
        """arXiv abstract of ~1.6K chars: the default 1000-char cap would drop the tail."""
        entry = _feed("arxiv_appph_rss.xml").entries[0]
        summary = _clean_html(entry.get("summary", ""))  # default path = truncated to 1000 chars
        body, ver = _entry_body(_source(fetch_method="signal-only"), entry, summary)
        assert ver == "rss-signal"
        assert len(body) > 1000 >= len(summary)
        assert "Abstract" in body

    def test_feed_full_promotes_ieee_description(self):
        entry = _feed("ieee_spectrum_energy_rss.xml").entries[0]
        summary = _clean_html(entry.get("summary", ""))
        body, ver = _entry_body(_source(fulltext_policy="feed-full"), entry, summary)
        assert ver == "rss-feed-full"
        assert len(body) > 1000

    def test_default_source_keeps_capped_summary(self):
        entry = _feed("arxiv_appph_rss.xml").entries[0]
        summary = _clean_html(entry.get("summary", ""))
        body, ver = _entry_body(_source(), entry, summary)
        assert ver == "rss"
        assert body == summary

    def test_signal_only_title_fallback_when_feed_has_no_summary(self):
        """Nikkei RDF has no description, so the body falls back to the summary argument (= title)."""
        entry = _feed("nikkei_asia_nar_rdf.xml").entries[0]
        body, ver = _entry_body(
            _source(fetch_method="signal-only"), entry, entry.get("title", "")
        )
        assert ver == "rss-signal"
        assert body == entry.get("title", "") != ""


# ── B-4: enqueue guard (signal-only = fulltext/summarize strictly forbidden) ──
class TestSignalOnlyEnqueueGuard:
    @staticmethod
    def _patch(monkeypatch):
        calls = []

        async def fake_enqueue(session, doc_id, stage):
            calls.append(stage)

        monkeypatch.setattr(news_collector, "enqueue_stage", fake_enqueue)
        return calls

    @pytest.mark.asyncio
    async def test_signal_only_overrides_misconfigured_page_policy(self, monkeypatch):
        """Even if the registry misconfigures fulltext_policy='page', no page fetch happens (defensive)."""
        calls = self._patch(monkeypatch)
        doc = SimpleNamespace(id=1, edit_url="https://x/a")
        src = _source(fetch_method="signal-only", fulltext_policy="page")
        await news_collector._enqueue_processing(
            None, doc, src, datetime.now(timezone.utc)
        )
        assert calls == ["embed", "chunk"]  # no fulltext/summarize stage

    @pytest.mark.asyncio
    async def test_signal_only_news_respects_30day_gate(self, monkeypatch):
        calls = self._patch(monkeypatch)
        doc = SimpleNamespace(id=1, edit_url="https://x/a")
        old = datetime.now(timezone.utc) - timedelta(days=40)
        await news_collector._enqueue_processing(
            None, doc, _source(fetch_method="signal-only"), old
        )
        assert calls == []

    @pytest.mark.asyncio
    async def test_signal_only_crawl_channel_indexes_regardless_of_age(self, monkeypatch):
        calls = self._patch(monkeypatch)
        doc = SimpleNamespace(id=1, edit_url="https://x/a")
        old = datetime.now(timezone.utc) - timedelta(days=400)
        src = _source(fetch_method="signal-only", source_channel="crawl")
        await news_collector._enqueue_processing(None, doc, src, old)
        assert calls == ["embed", "chunk"]


# ── Connection-layer retry (MOEL's first TLS handshake drops intermittently, measured) ──
class TestConnectRetry:
    class _Client:
        def __init__(self, errors: list):
            self.errors = errors
            self.calls = 0

        async def get(self, url):
            self.calls += 1
            if self.errors:
                raise self.errors.pop(0)
            return "OK"

    @pytest.mark.asyncio
    async def test_single_connect_error_retried_once(self):
        import httpx

        client = self._Client([httpx.ConnectError("")])
        resp = await news_collector._get_with_connect_retry(client, "https://x/feed")
        assert resp == "OK" and client.calls == 2

    @pytest.mark.asyncio
    async def test_second_retry_absorbs_consecutive_drop(self):
        """Drops are random per connection, so the first retry can be dropped too (measured on MOEL lawinfo)."""
        import httpx

        client = self._Client([httpx.ConnectError(""), httpx.ConnectError("")])
        resp = await news_collector._get_with_connect_retry(client, "https://x/feed")
        assert resp == "OK" and client.calls == 3

    @pytest.mark.asyncio
    async def test_persistent_connect_error_propagates(self):
        import httpx

        client = self._Client([httpx.ConnectError("")] * 3)
        with pytest.raises(httpx.ConnectError):
            await news_collector._get_with_connect_retry(client, "https://x/feed")
        assert client.calls == 3  # at most 2 retries; persistent failures are left to the circuit

    @pytest.mark.asyncio
    async def test_non_connect_errors_not_retried(self):
        import httpx

        client = self._Client([httpx.ReadTimeout("")])
        with pytest.raises(httpx.ReadTimeout):
            await news_collector._get_with_connect_retry(client, "https://x/feed")
        assert client.calls == 1


# ── C-4 / B-4 feed shapes (live captures pinned before seeding) ──────────────
class TestNikkeiRdfNativeParsing:
    """B-5 'rdf' quirk: measured to need no code branch, since feedparser normalizes RSS 1.0."""

    def test_rss10_entries_have_title_and_link(self):
        f = _feed("nikkei_asia_nar_rdf.xml")
        assert f.version == "rss10"
        assert not f.bozo and len(f.entries) >= 10
        for e in f.entries:
            assert e.get("title", "").strip()
            assert e.get("link", "").startswith("https://asia.nikkei.com/")

    def test_no_summary_no_dates_means_title_signal(self):
        e = _feed("nikkei_asia_nar_rdf.xml").entries[0]
        assert not e.get("summary", "")
        assert not e.get("published_parsed") and not e.get("updated_parsed")


class TestBloombergFixture:
    def test_video_items_mixed_in_feed(self):
        """Video items are observed mixed into the feed, which justifies seed parser_quirk='skip-video'."""
        links = [e.get("link", "") for e in _feed("bloomberg_markets_rss.xml").entries]
        video_pat = re.compile(r"/videos?/")  # same pattern as news_collector's skip-video
        assert any(video_pat.search(u) for u in links)
        assert any("/news/articles/" in u and not video_pat.search(u) for u in links)

    def test_articles_have_signal_grade_summary(self):
        f = _feed("bloomberg_markets_rss.xml")
        assert any(len(e.get("summary", "")) >= 100 for e in f.entries)


class TestAsmeJpvtFixture:
    def test_journal_identity_and_abstract(self):
        f = _feed("asme_jpvt_openissues_rss.xml")
        assert "Pressure Vessel Technology" in f.feed.get("title", "")
        assert f.entries
        for e in f.entries:
            assert len(e.get("summary", "")) >= 200  # the abstract is the body


class TestArxivFixture:
    def test_abs_links_are_stable_dedup_keys(self):
        """replace/cross re-announcements reuse the same /abs/ URL, so edit_url dedup blocks them naturally."""
        f = _feed("arxiv_appph_rss.xml")
        assert f.entries
        for e in f.entries:
            assert re.match(r"https://arxiv\.org/abs/\d", e.get("link", ""))

    def test_announce_type_in_summary(self):
        e = _feed("arxiv_appph_rss.xml").entries[0]
        assert "Announce Type:" in e.get("summary", "")


class TestEconomistFixture:
    def test_oneline_signal_summaries(self):
        f = _feed("economist_latest_rss.xml")
        assert f.entries
        for e in f.entries:
            assert e.get("title", "").strip()
            assert e.get("link", "").startswith("https://www.economist.com/")


# ── CSB sitemap diff parser ──────────────────────────────────────────────────
class TestCsbSitemapParsing:
    def test_parse_pairs_with_tz_aware_lastmod(self):
        xml = (FIXTURES / "csb_sitemap_sample.xml").read_text(encoding="utf-8")
        pairs = _parse_sitemap(xml)
        assert pairs
        for url, lastmod in pairs:
            assert url.startswith("https://www.csb.gov/")
            assert lastmod.tzinfo is not None

    def test_skip_sections_vs_root_slugs(self):
        assert _should_skip("https://www.csb.gov/videos/some-video/")
        assert _should_skip("https://www.csb.gov/investigations/completed-investigations/")
        assert _should_skip("https://www.csb.gov/site-map/")
        assert _should_skip("https://www.csb.gov/")  # home page
        # Investigation reports and news releases are root slugs: collected.
        assert not _should_skip("https://www.csb.gov/givaudan-sense-colour-explosion-/")
        assert not _should_skip("https://www.csb.gov/recommendations/preventive-maintenance/")

    def test_watermark_diff_orders_oldest_first(self):
        xml = (FIXTURES / "csb_sitemap_sample.xml").read_text(encoding="utf-8")
        pairs = [p for p in _parse_sitemap(xml) if not _should_skip(p[0])]
        watermark = min(lm for _, lm in pairs)
        changed = sorted(
            ((u, lm) for u, lm in pairs if lm >= watermark), key=lambda p: p[1]
        )
        assert changed == sorted(changed, key=lambda p: p[1])
        assert len(changed) == len(pairs)  # >= includes the boundary


class TestCsbPdfLinks:
    HTML = (FIXTURES / "csb_investigation_page_excerpt.html").read_text(encoding="utf-8")
    BASE = "https://www.csb.gov/givaudan-sense-colour-explosion-/"

    def test_report_pdfs_kept_with_cachebuster_query(self):
        links = _pdf_links(self.HTML, self.BASE)
        assert any("Givaudan_Investigation_Report_Publication.pdf" in u for u in links)
        # The cache-buster query stays on the download URL
        # (normalization applies only on the filename/dedup axis).
        assert any("?" in u for u in links)
        for u in links:
            assert u.startswith("https://www.csb.gov/")

    def test_recommendation_status_summaries_excluded(self):
        links = _pdf_links(self.HTML, self.BASE)
        assert links
        assert not any("/assets/recommendation/" in u for u in links)

    def test_dedup_by_path(self):
        html = (
            '<a href="/assets/1/6/r.pdf?100">a</a>'
            '<a href="/assets/1/6/r.pdf?200">b</a>'
            '<a href="https://evil.example.com/x.pdf">c</a>'
        )
        links = _pdf_links(html, "https://www.csb.gov/page/")
        assert len(links) == 1  # same path counted once + external host excluded
        assert links[0].startswith("https://www.csb.gov/assets/1/6/r.pdf")


# ── API standards announcement listing parser ────────────────────────────────
class TestApiListingParsing:
    HTML = (FIXTURES / "api_standards_announcements_listing.html").read_text(
        encoding="utf-8", errors="replace"
    )

    def test_ten_unique_detail_links_per_page(self):
        urls = _parse_listing(self.HTML)
        assert len(urls) == 10
        assert len(set(urls)) == 10
        for u in urls:
            assert u.startswith(
                "https://www.api.org/products-and-services/standards/"
                "important-standards-announcements/"
            )
            assert "?" not in u  # no pagination links (?page=) mixed in

    def test_pub_date_parse(self):
        dt = _parse_pub_date("Published June 4, 2026 — API announces ...")
        assert dt == datetime(2026, 6, 4, tzinfo=timezone.utc)
        assert _parse_pub_date("no date here") is None
        assert _parse_pub_date("February 31, 2026") is None  # impossible calendar date -> None


# ── CCPS Beacon link parser ──────────────────────────────────────────────────
class TestCcpsBeaconLinks:
    def test_beacon_filter_and_relative_resolve(self):
        html = (
            '<a href="/sites/default/files/2026-06/Beacon-June-2026.pdf">June</a>'
            '<a href="/sites/default/files/beacon_korean_2026_06.pdf"><b>Korean</b></a>'
            '<a href="/sites/default/files/other-brochure.pdf">brochure</a>'
            '<a href="/sites/default/files/monthly.pdf">Process Safety Beacon June</a>'
        )
        links = _beacon_pdf_links(html, "https://www.aiche.org/ccps/resources/process-safety-beacon")
        assert "https://www.aiche.org/sites/default/files/2026-06/Beacon-June-2026.pdf" in links
        assert any("beacon_korean" in u for u in links)
        assert any(u.endswith("/monthly.pdf") for u in links)  # matched via anchor text
        assert not any("other-brochure" in u for u in links)
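
The `TestConnectRetry` cases in this file pin down a contract for `news_collector._get_with_connect_retry` without showing its body: connection-establishment errors are retried up to twice (three calls in total), and any other error propagates after the first call. A minimal sketch of a helper that would satisfy that contract follows. It is an illustration under stated assumptions, not the project's implementation: `ConnectError` is a local stand-in for `httpx.ConnectError` so the snippet runs without httpx, and `max_retries`, `retry_on`, and `FlakyClient` are hypothetical names.

```python
import asyncio


class ConnectError(Exception):
    """Local stand-in for httpx.ConnectError (assumption, keeps the sketch stdlib-only)."""


async def get_with_connect_retry(client, url, *, max_retries=2, retry_on=(ConnectError,)):
    """Call client.get(url), retrying only connection-establishment failures."""
    attempt = 0
    while True:
        try:
            return await client.get(url)
        except retry_on:
            if attempt >= max_retries:
                raise  # persistent failure: propagate to the caller
            attempt += 1
            # Hypothetical simplification: no backoff between attempts.


class FlakyClient:
    """Raises the queued errors in order, then returns "OK" (mirrors the test double)."""

    def __init__(self, errors):
        self.errors = list(errors)
        self.calls = 0

    async def get(self, url):
        self.calls += 1
        if self.errors:
            raise self.errors.pop(0)
        return "OK"


if __name__ == "__main__":
    client = FlakyClient([ConnectError(), ConnectError()])
    result = asyncio.run(get_with_connect_retry(client, "https://x/feed"))
    print(result, client.calls)
```

Catching only the `retry_on` types means a read timeout or HTTP-level error is never retried, which matches `test_non_connect_errors_not_retried`.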