Compare commits

..

6 Commits

Author SHA1 Message Date
hyungi b10b89fb8c docs(eval): Phase 2 canary retry GO — success 37/40 (92.5%) failed 2 skipped 1
옵션 C 실행 (2026-05-03 02:36-02:39 UTC):
- 5201 documents stuck processing → failed (conditional UPDATE 1 row)
- 3817 재 enqueue → success 35.8s
- 4059 재 enqueue → success 100.7s
- GPU contention 해소 확인 (free 8820 MiB)

최종 tally: success 37 / failed 2 (3810 corrupt PDF + 5201 scan-likely
timeout) / skipped 1 (5090 MAX_PAGES). Plan 3 게이트 모두 PASS.

다음 = 사용자 승인 게이트 (2-C 진입 + nightly 모드 선택). main 머지 +
parent pull + cron 추가 4단계 대기.

후속 백로그 (Phase 1B+, 별도 PR):
- B1 scan-likely auto-skip (5201 패턴)
- B2 OOM 503 transient (야간 contention 자동 복구)
- B3 queue exhausted → doc.md_status 동기화 (corner case 정리)
2026-05-03 02:41:51 +00:00
hyungi 79dc31bd26 docs(eval): Phase 2 canary 결과 — HALT (failed 4/40 = 10%, but 분류상 Marker 0 fail)
35 success / 3 failed / 1 skipped / 1 stuck processing (corner case).
Plan 게이트 FAIL (success<36 + failed>2). 다만 failure root cause 분석:
- 2/4 = GPU contention (5.93+5.35 GiB 다른 process 점유, free 50 MiB)
- 1/4 = 진짜 corrupt PDF (Pdfium error, non-retryable)
- 1/4 = scan-likely + tiny text + ReadTimeout (Phase 1B corner case)

Marker quality 자체 fail = 0. p50 elapsed 33.2s (1D 34s 와 동등),
text_length_ratio p50 1.00 (1D 1.15 대비 -13%, 정상 범위), 신규 warning 없음.

사용자 결정: A(수용) / B(코드 가드 추가) / C(OOM 2건 즉시 재 enqueue → GO 통과)
/ D(HALT 유지). 추천 C 또는 A.

5201 stuck processing 은 어느 옵션이든 수동 DB 정리 필요 (사용자 승인 후).
2026-05-03 02:33:46 +00:00
hyungi 452d88d561 docs(eval): Phase 2 경로 정책 정정 — 2-B /app/logs vs 2-C /app/scripts canonical
Plan/README 가 /app/scripts 를 통일 경로로 가정했으나 실측 결과 read-only
bind-mount 라 docker cp 불가. soft lock 으로 --build 도 금지. 단계별로
다른 경로 사용해야 함:

- 2-B canary (pre-merge): /app/logs/phase2_backfill.py + /app/logs/*.csv
  (docker cp worktree → /app/logs rw bind-mount). canary 검증 동안
  미검증 코드 main 진입 회피.
- 2-C nightly (post-merge canonical): /app/scripts/phase2_backfill.py +
  /app/evals/markdown/phase2_* (feat/phase2-backfill main 머지 +
  parent git pull 후 bind-mount 자동 활성). cron 도 canonical path.

evals/markdown/README.md 의 enqueue 예제 + 신규 #### 경로 정책 섹션 반영.
2026-05-02 23:53:28 +00:00
hyungi efa31caffb docs(eval): Phase 2 canary sample — 40 docs (seed 20260503)
Bucket distribution (algorithm vs allocated):
- large (>10MB): 6 / 6
- scan_likely (text_density<5): 2 / 2
- study_note born-digital: 10 / 10
- Academic_Paper born-digital: 2 / 8 (under-fill — only 20 born-digital docs total in pool)
- Reference born-digital: 0 / 6 (under-fill — 동상)
- tech_doc (Standard/Manual/Specification): 4 / 4
- minor_doc (Note/Report/Memo/NULL): 4 / 4
- filler (rest from candidates): 12 (picked up under-fill slack)

Note: 1D 의 born-digital bias 가정이 Phase 2 실 모집단과 안 맞음
(text_density 분포가 mixed-dominant: 174/237). 그래도 40 docs 가 large /
scan-likely / 다양 doctype 커버 — canary 진단 목적 충족.

Next: 사용자 승인 게이트 — --no-dry-run enqueue 시점 결정.
2026-05-02 23:46:54 +00:00
hyungi 6db394bcd6 docs(eval): Phase 2 inventory dry-run — 237 pending PDFs, 227 convert candidates
- forecast_skip_reason distribution:
  - none: 227 (convert candidates)
  - over_max_pages_estimated: 10 (file_size > 25MB proxy)
  - handwritten_hint: 0 (1D-A1 skip already in marker_worker)
  - doctype_skip: 0
- file_size_band: S=47 / M=160 / L=30
- text_density_band: mixed=174 / scan-likely=43 / born-digital=20
- doc_type top: study_note 79 / Academic_Paper 57 / Reference 35 / Standard 24 / Manual 19
- 시드 baseline for select-canary (next step)
2026-05-02 23:46:21 +00:00
hyungi c81ce8d366 feat(scripts): Phase 2 markdown backfill — script + README
- scripts/phase2_backfill.py: 5 subcommands
  - inventory: pending PDFs dry-run CSV with skip forecast
  - select-canary: stratified 40 sample (seed 20260503)
  - enqueue: one-shot from sample CSV (--no-dry-run gate)
  - nightly-enqueue: cron-friendly with disable flag / marker /ready /
    active-queue threshold (oldest_age stuck guard) / DB pool guards
  - post-report: final state CSV + 1D baseline comparison MD
- evals/markdown/README.md: Phase 2 section appended
- plan: ~/.claude/plans/iridescent-gathering-clover.md
- depends on Phase 1B handwritten skip 7d0fca2 (marker_worker side guard)
2026-05-02 23:44:59 +00:00
75 changed files with 488 additions and 6305 deletions
-203
View File
@@ -1,203 +0,0 @@
"""Morning Briefing API — read-only + 수동 regenerate.
엔드포인트:
- GET /api/briefing/latest : 가장 최근 briefing
- GET /api/briefing?date=YYYY-MM-DD : 특정 날짜 briefing
- POST /api/briefing/regenerate?date=... : 동기 워커 트리거 (admin), DELETE+INSERT tx
응답은 topic 평면 list (axis 반대 — Phase 4 와 달리 country 그룹 X).
각 topic 안에 country_perspectives JSONB 가 들어있어 cross-country 비교 분석을 표현.
"""
from datetime import date as date_type
from datetime import datetime
from typing import Annotated
from fastapi import APIRouter, Depends, HTTPException, Query
from pydantic import BaseModel
from sqlalchemy import select
from sqlalchemy.ext.asyncio import AsyncSession
from sqlalchemy.orm import selectinload
from core.auth import get_current_user, require_admin
from core.database import get_session
from models.briefing import BriefingTopic, MorningBriefing
from models.user import User
router = APIRouter()
# ─── Pydantic 응답 모델 ───
class CountryPerspective(BaseModel):
country: str
summary: str
article_ids: list[int] = []
class KeyQuote(BaseModel):
country: str = ""
source: str = ""
quote: str
class TopicResponse(BaseModel):
topic_rank: int
topic_label: str
headline: str
country_perspectives: list[CountryPerspective]
divergences: list[str]
convergences: list[str]
key_quotes: list[KeyQuote]
historical_context: str | None = None
cluster_members: list[int] = []
article_count: int
country_count: int
importance_score: float
llm_fallback_used: bool
class BriefingResponse(BaseModel):
briefing_date: date_type
window_start: datetime
window_end: datetime
decay_lambda: float
total_articles: int
total_countries: int
total_topics: int
generation_ms: int | None
llm_calls: int
llm_failures: int
status: str
headline_oneliner: str | None = None
topics: list[TopicResponse]
class RegenerateResponse(BaseModel):
status: str
briefing_id: int | None
briefing_date: date_type
total_topics: int
total_articles: int
llm_calls: int
llm_failures: int
generation_ms: int
regenerated: bool
# ─── helpers ───
def _build_response(b: MorningBriefing) -> BriefingResponse:
topics = []
for t in sorted(b.topics, key=lambda x: x.topic_rank):
topics.append(
TopicResponse(
topic_rank=t.topic_rank,
topic_label=t.topic_label,
headline=t.headline,
country_perspectives=[
CountryPerspective(**cp) for cp in (t.country_perspectives or [])
],
divergences=list(t.divergences or []),
convergences=list(t.convergences or []),
key_quotes=[KeyQuote(**q) for q in (t.key_quotes or [])],
historical_context=t.historical_context,
cluster_members=list(t.cluster_members or []),
article_count=t.article_count,
country_count=t.country_count,
importance_score=t.importance_score,
llm_fallback_used=t.llm_fallback_used,
)
)
return BriefingResponse(
briefing_date=b.briefing_date,
window_start=b.window_start,
window_end=b.window_end,
decay_lambda=b.decay_lambda,
total_articles=b.total_articles,
total_countries=b.total_countries,
total_topics=b.total_topics,
generation_ms=b.generation_ms,
llm_calls=b.llm_calls,
llm_failures=b.llm_failures,
status=b.status,
headline_oneliner=b.headline_oneliner,
topics=topics,
)
async def _load_briefing(
session: AsyncSession,
target_date: date_type | None,
) -> MorningBriefing | None:
query = select(MorningBriefing).options(selectinload(MorningBriefing.topics))
if target_date is not None:
query = query.where(MorningBriefing.briefing_date == target_date)
else:
query = query.order_by(MorningBriefing.briefing_date.desc())
query = query.limit(1)
result = await session.execute(query)
return result.scalar_one_or_none()
# ─── Routes ───
@router.get("/latest", response_model=BriefingResponse)
async def get_latest(
user: Annotated[User, Depends(get_current_user)],
session: Annotated[AsyncSession, Depends(get_session)],
):
"""가장 최근 morning briefing."""
b = await _load_briefing(session, target_date=None)
if b is None:
raise HTTPException(status_code=404, detail="아직 생성된 briefing 없음")
return _build_response(b)
@router.get("", response_model=BriefingResponse)
async def get_briefing(
user: Annotated[User, Depends(get_current_user)],
session: Annotated[AsyncSession, Depends(get_session)],
date: date_type | None = Query(default=None, description="YYYY-MM-DD (KST briefing_date)"),
):
"""특정 날짜 briefing (date 미지정 시 최신)."""
b = await _load_briefing(session, target_date=date)
if b is None:
raise HTTPException(
status_code=404,
detail=f"briefing 없음 (date={date})" if date else "아직 생성된 briefing 없음",
)
return _build_response(b)
@router.post("/regenerate", response_model=RegenerateResponse)
async def regenerate(
user: Annotated[User, Depends(require_admin)],
date: date_type | None = Query(default=None, description="YYYY-MM-DD KST 기준 briefing_date"),
):
"""수동 트리거 (admin). 동기 실행 — delete+insert transaction.
date 미지정 시 오늘 KST. 같은 날 row 존재 시 transaction 안에서 삭제 후 신규 생성.
응답 status='success' | 'partial' | 'failed' | 'empty'.
"""
from workers.briefing_worker import run
result = await run(target_date=date)
if result is None:
raise HTTPException(status_code=500, detail="briefing 워커 실행 실패 (로그 확인)")
return RegenerateResponse(
status=result["status"],
briefing_id=result.get("briefing_id"),
briefing_date=date or datetime.now().date(),
total_topics=result["total_topics"],
total_articles=result["total_articles"],
llm_calls=result["llm_calls"],
llm_failures=result["llm_failures"],
generation_ms=result["generation_ms"],
regenerated=result.get("regenerated", True),
)
-54
View File
@@ -33,7 +33,6 @@ from core.config import settings
from core.database import get_session
from core.utils import file_hash
from models.document import Document
from models.document_image import DocumentImage
from models.queue import ProcessingQueue, enqueue_stage
from models.user import User
from services.document_telemetry import record_analyze_event, sanitize_source
@@ -671,59 +670,6 @@ async def get_document_file(
)
@router.get("/{doc_id}/images/{image_key}/raw")
async def get_document_image_raw(
doc_id: int,
image_key: str,
session: Annotated[AsyncSession, Depends(get_session)],
token: str | None = Query(None, description="Bearer token (img 태그용)"),
):
"""marker 추출 이미지 raw bytes (Phase 1B.5).
md_content 안의 `![alt](docimg:img_NNN)` ref 를 frontend selector 가 이 라우트로 변환.
인증된 사용자만 응답 (단일 사용자 환경, ownership 컬럼 없음).
인증: `<img src=>` 는 Authorization header 를 못 보내므로 `?token=` 쿼리 파라미터
로 access token 을 전달 — 기존 `/{doc_id}/file?token=` 엔드포인트 (iframe 용) 와
동일 패턴.
"""
from core.auth import decode_token
if not token:
raise HTTPException(status_code=401, detail="토큰이 필요합니다")
payload = decode_token(token)
if not payload or payload.get("type") != "access":
raise HTTPException(status_code=401, detail="유효하지 않은 토큰")
# 문서 존재 확인 (image_key 만 있고 doc 가 사라진 케이스 차단)
doc = await session.get(Document, doc_id)
if doc is None:
raise HTTPException(status_code=404, detail="문서를 찾을 수 없습니다")
img = await session.scalar(
select(DocumentImage).where(
DocumentImage.document_id == doc_id,
DocumentImage.image_key == image_key,
)
)
if img is None:
raise HTTPException(status_code=404, detail="이미지를 찾을 수 없습니다")
file_path = Path(img.file_path)
if not file_path.is_file():
raise HTTPException(status_code=410, detail="파일이 사라졌습니다")
return FileResponse(
str(file_path),
media_type=img.mime_type,
headers={
# 인증 라우트라 CDN/공용 cache 금지. 단일 사용자라 private + 1h 충분.
"Cache-Control": "private, max-age=3600",
"ETag": f'"{img.content_hash}"',
},
)
@router.post("/", response_model=DocumentResponse, status_code=201)
async def upload_document(
request: Request,
-680
View File
@@ -1,680 +0,0 @@
"""events API — 개인 운영 로그 / 일정 / 할 일 / 회고 (PR-1).
PR-1 scope (plan beszel-tingly-sloth.md v6):
- POST /api/events (kind=task/calendar_event/activity_log)
- GET /api/events/{id}
- GET /api/events?kind&status&from&to&project_tag&source
- PATCH /api/events/{id} (허용 필드만, 시간 필드 변경 시 reschedule history)
- POST /api/events/{id}/complete | /cancel | /defer | /reactivate
- GET /api/events/today (timezone 정책 적용)
- GET /api/events/inbox
- GET /api/events/activity?from&to
PR-1 제외: DELETE / log shortcut / upcoming / ingest / iCal / ntfy.
"""
import json
import logging
from datetime import date, datetime, timedelta, timezone
from typing import Annotated, Any
from zoneinfo import ZoneInfo
from fastapi import APIRouter, Body, Depends, HTTPException, Query
from pydantic import BaseModel, Field
from sqlalchemy import and_, or_, select
from sqlalchemy.ext.asyncio import AsyncSession
from core.auth import get_current_user
from core.database import get_session
from models.event import Event
from models.event_history import EventHistory
from models.user import User
logger = logging.getLogger(__name__)
router = APIRouter()
DEFAULT_TIMEZONE = "Asia/Seoul"
# PATCH 허용 필드 — status/completed_at/cancelled_at/defer_until/source/source_ref/
# raw_metadata/user_id/created_by 는 lifecycle endpoint 또는 시스템 결정.
PATCH_ALLOWED_FIELDS = {
"title",
"description",
"due_at",
"start_at",
"end_at",
"started_at",
"ended_at",
"all_day",
"timezone",
"priority",
"project_tag",
"tags",
"memo_document_id",
}
# 시간 필드 변경 시 reschedule history 1건 자동 기록 (defer_until 은 /defer 전용).
RESCHEDULE_TIME_FIELDS = {
"due_at",
"start_at",
"end_at",
"started_at",
"ended_at",
"all_day",
"timezone",
}
# ─── 스키마 ───
class EventCreate(BaseModel):
title: str
description: str | None = None
kind: str # task | calendar_event | activity_log
status: str | None = None # 미지정 시 kind 별 default
due_at: datetime | None = None
start_at: datetime | None = None
end_at: datetime | None = None
started_at: datetime | None = None
ended_at: datetime | None = None
all_day: bool = False
timezone: str | None = None
priority: int | None = None
project_tag: str | None = None
tags: list[Any] = Field(default_factory=list)
memo_document_id: int | None = None
source: str = "manual"
source_ref: str | None = None
raw_metadata: dict[str, Any] = Field(default_factory=dict)
class EventPatch(BaseModel):
"""PATCH 허용 필드만. status/completed_at 등 lifecycle 필드는 명시 거부."""
title: str | None = None
description: str | None = None
due_at: datetime | None = None
start_at: datetime | None = None
end_at: datetime | None = None
started_at: datetime | None = None
ended_at: datetime | None = None
all_day: bool | None = None
timezone: str | None = None
priority: int | None = None
project_tag: str | None = None
tags: list[Any] | None = None
memo_document_id: int | None = None
model_config = {"extra": "forbid"} # 허용 외 필드 → 422
class DeferRequest(BaseModel):
defer_until: datetime
class EventResponse(BaseModel):
id: int
title: str
description: str | None
kind: str
status: str
due_at: datetime | None
start_at: datetime | None
end_at: datetime | None
started_at: datetime | None
ended_at: datetime | None
all_day: bool
timezone: str | None
defer_until: datetime | None
completed_at: datetime | None
cancelled_at: datetime | None
priority: int | None
project_tag: str | None
tags: list[Any]
source: str
source_ref: str | None
raw_metadata: dict[str, Any]
memo_document_id: int | None
user_id: int
created_by: str
created_at: datetime
updated_at: datetime
class EventListResponse(BaseModel):
items: list[EventResponse]
total: int
class EventHistoryResponse(BaseModel):
id: int
event_id: int
changed_at: datetime
changed_by: str
change_kind: str
before: dict[str, Any] | None
after: dict[str, Any]
class EventHistoryListResponse(BaseModel):
items: list[EventHistoryResponse]
# ─── 헬퍼 ───
def _to_response(ev: Event) -> EventResponse:
return EventResponse.model_validate(ev, from_attributes=True)
def _serialize_for_history(ev: Event) -> dict[str, Any]:
"""events_history.before/after 용 dict snapshot (JSON 친화)."""
payload: dict[str, Any] = {}
for col in (
"id",
"title",
"description",
"kind",
"status",
"due_at",
"start_at",
"end_at",
"started_at",
"ended_at",
"all_day",
"timezone",
"defer_until",
"completed_at",
"cancelled_at",
"priority",
"project_tag",
"tags",
"source",
"source_ref",
"raw_metadata",
"memo_document_id",
"user_id",
"created_by",
):
v = getattr(ev, col, None)
if isinstance(v, datetime):
payload[col] = v.isoformat()
else:
payload[col] = v
return payload
def _actor_for_user(user: User) -> str:
"""사용자 직접 호출 = manual. 향후 이드/email_ingest 는 service token 분기 (PR-3)."""
return "manual"
async def _record_history(
session: AsyncSession,
*,
event: Event,
change_kind: str,
changed_by: str,
before: dict[str, Any] | None,
after: dict[str, Any],
) -> None:
history = EventHistory(
event_id=event.id,
changed_by=changed_by,
change_kind=change_kind,
before=before,
after=after,
)
session.add(history)
async def _load_owned(
session: AsyncSession, event_id: int, user: User
) -> Event:
ev = await session.get(Event, event_id)
if ev is None or ev.user_id != user.id:
raise HTTPException(status_code=404, detail="event not found")
return ev
def _resolve_timezone(tz_name: str | None) -> ZoneInfo:
try:
return ZoneInfo(tz_name or DEFAULT_TIMEZONE)
except Exception:
raise HTTPException(status_code=400, detail=f"invalid timezone: {tz_name}")
def _local_day_bounds(tz_name: str | None) -> tuple[datetime, datetime, datetime]:
"""today 의 [start_utc, end_utc) + now_utc 반환."""
tz = _resolve_timezone(tz_name)
now_local = datetime.now(tz)
today_local = now_local.replace(hour=0, minute=0, second=0, microsecond=0)
tomorrow_local = today_local + timedelta(days=1)
return (
today_local.astimezone(timezone.utc),
tomorrow_local.astimezone(timezone.utc),
now_local.astimezone(timezone.utc),
)
def _apply_activity_log_defaults(payload: dict[str, Any]) -> None:
"""빠른 행동 기록 5초 UX — kind=activity_log 시 status/시간 default."""
if payload.get("kind") != "activity_log":
return
now = datetime.now(timezone.utc)
if not payload.get("status"):
payload["status"] = "done"
if payload.get("ended_at") is None:
payload["ended_at"] = now
if payload.get("started_at") is None:
payload["started_at"] = payload["ended_at"]
if payload.get("status") == "done":
payload.setdefault("completed_at", now)
def _apply_kind_default_status(payload: dict[str, Any]) -> None:
"""kind 별 status default 보정."""
if payload.get("status"):
return
kind = payload.get("kind")
if kind == "calendar_event":
payload["status"] = "scheduled"
elif kind == "task":
payload["status"] = "inbox"
# ─── Create ───
@router.post("/", response_model=EventResponse, status_code=201)
async def create_event(
body: EventCreate,
user: Annotated[User, Depends(get_current_user)],
session: Annotated[AsyncSession, Depends(get_session)],
):
"""events 생성. kind=activity_log 면 status=done/ended_at=now() default."""
payload = body.model_dump(exclude_none=False)
_apply_activity_log_defaults(payload)
_apply_kind_default_status(payload)
if payload["kind"] not in ("task", "calendar_event", "activity_log"):
raise HTTPException(status_code=400, detail="invalid kind")
actor = _actor_for_user(user)
ev = Event(
title=payload["title"],
description=payload.get("description"),
kind=payload["kind"],
status=payload.get("status") or "inbox",
due_at=payload.get("due_at"),
start_at=payload.get("start_at"),
end_at=payload.get("end_at"),
started_at=payload.get("started_at"),
ended_at=payload.get("ended_at"),
all_day=payload.get("all_day") or False,
timezone=payload.get("timezone"),
completed_at=payload.get("completed_at"),
priority=payload.get("priority"),
project_tag=payload.get("project_tag"),
tags=payload.get("tags") or [],
source=payload.get("source") or "manual",
source_ref=payload.get("source_ref"),
raw_metadata=payload.get("raw_metadata") or {},
memo_document_id=payload.get("memo_document_id"),
user_id=user.id,
created_by=actor,
)
session.add(ev)
await session.flush()
await _record_history(
session,
event=ev,
change_kind="create",
changed_by=actor,
before=None,
after=_serialize_for_history(ev),
)
await session.commit()
await session.refresh(ev)
return _to_response(ev)
# ─── List / Get ───
@router.get("/", response_model=EventListResponse)
async def list_events(
user: Annotated[User, Depends(get_current_user)],
session: Annotated[AsyncSession, Depends(get_session)],
kind: str | None = Query(None),
status: str | None = Query(None, description="comma-separated list"),
from_: datetime | None = Query(None, alias="from"),
to: datetime | None = Query(None),
project_tag: str | None = Query(None),
source: str | None = Query(None),
page: int = Query(1, ge=1),
page_size: int = Query(50, ge=1, le=200),
):
"""events 목록 — current_user.id 자동 필터. upcoming 은 ?from=now&to=now+7d 로."""
where = [Event.user_id == user.id]
if kind:
where.append(Event.kind == kind)
if status:
statuses = [s.strip() for s in status.split(",") if s.strip()]
if statuses:
where.append(Event.status.in_(statuses))
if project_tag:
where.append(Event.project_tag == project_tag)
if source:
where.append(Event.source == source)
if from_ is not None:
# task: due_at, calendar_event: start_at, activity_log: started_at
where.append(
or_(
Event.due_at >= from_,
Event.start_at >= from_,
Event.started_at >= from_,
)
)
if to is not None:
where.append(
or_(
Event.due_at < to,
Event.start_at < to,
Event.started_at < to,
)
)
base = select(Event).where(and_(*where))
total_q = await session.execute(
select(Event.id).where(and_(*where))
)
total = len(total_q.scalars().all())
rows = await session.execute(
base.order_by(Event.created_at.desc())
.offset((page - 1) * page_size)
.limit(page_size)
)
items = [_to_response(e) for e in rows.scalars().all()]
return EventListResponse(items=items, total=total)
@router.get("/today", response_model=EventListResponse)
async def list_today(
user: Annotated[User, Depends(get_current_user)],
session: Annotated[AsyncSession, Depends(get_session)],
timezone: str | None = Query(None, description="기본 Asia/Seoul"),
):
"""오늘 해야 할 것 / 예정된 것. timezone 적용.
포함: task(due_at today) / calendar_event(start_at today) / activity_log(started_at today)
status: inbox/next/scheduled/in_progress 또는 deferred (defer_until <= now() 일 때만).
"""
start_utc, end_utc, now_utc = _local_day_bounds(timezone)
today_clause = or_(
and_(Event.kind == "task", Event.due_at >= start_utc, Event.due_at < end_utc),
and_(
Event.kind == "calendar_event",
Event.start_at >= start_utc,
Event.start_at < end_utc,
),
and_(
Event.kind == "activity_log",
Event.started_at >= start_utc,
Event.started_at < end_utc,
),
)
active_clause = or_(
Event.status.in_(("inbox", "next", "scheduled", "in_progress")),
and_(Event.status == "deferred", Event.defer_until <= now_utc),
)
rows = await session.execute(
select(Event)
.where(Event.user_id == user.id, today_clause, active_clause)
.order_by(Event.start_at.asc(), Event.due_at.asc(), Event.started_at.asc())
)
items = [_to_response(e) for e in rows.scalars().all()]
return EventListResponse(items=items, total=len(items))
@router.get("/inbox", response_model=EventListResponse)
async def list_inbox(
user: Annotated[User, Depends(get_current_user)],
session: Annotated[AsyncSession, Depends(get_session)],
):
"""Inbox — 아직 정리 안 된 것."""
rows = await session.execute(
select(Event)
.where(Event.user_id == user.id, Event.status == "inbox")
.order_by(Event.created_at.desc())
)
items = [_to_response(e) for e in rows.scalars().all()]
return EventListResponse(items=items, total=len(items))
@router.get("/activity", response_model=EventListResponse)
async def list_activity(
user: Annotated[User, Depends(get_current_user)],
session: Annotated[AsyncSession, Depends(get_session)],
from_: datetime | None = Query(None, alias="from"),
to: datetime | None = Query(None),
):
"""Activity timeline — 한 일 (kind=activity_log + status=done). Today 와 분리."""
where = [
Event.user_id == user.id,
Event.kind == "activity_log",
Event.status == "done",
]
if from_ is not None:
where.append(Event.started_at >= from_)
if to is not None:
where.append(Event.started_at < to)
rows = await session.execute(
select(Event).where(and_(*where)).order_by(Event.started_at.desc())
)
items = [_to_response(e) for e in rows.scalars().all()]
return EventListResponse(items=items, total=len(items))
@router.get("/{event_id}", response_model=EventResponse)
async def get_event(
event_id: int,
user: Annotated[User, Depends(get_current_user)],
session: Annotated[AsyncSession, Depends(get_session)],
):
ev = await _load_owned(session, event_id, user)
return _to_response(ev)
@router.get("/{event_id}/history", response_model=EventHistoryListResponse)
async def get_event_history(
event_id: int,
user: Annotated[User, Depends(get_current_user)],
session: Annotated[AsyncSession, Depends(get_session)],
):
"""events_history 조회 — 상세 페이지 timeline. lifecycle op 자동 기록만 (v1)."""
await _load_owned(session, event_id, user) # owner 검증
rows = await session.execute(
select(EventHistory)
.where(EventHistory.event_id == event_id)
.order_by(EventHistory.changed_at.desc())
)
items = [
EventHistoryResponse.model_validate(h, from_attributes=True)
for h in rows.scalars().all()
]
return EventHistoryListResponse(items=items)
# ─── PATCH ───
@router.patch("/{event_id}", response_model=EventResponse)
async def patch_event(
event_id: int,
body: EventPatch,
user: Annotated[User, Depends(get_current_user)],
session: Annotated[AsyncSession, Depends(get_session)],
):
"""PATCH — 허용 필드만. 시간 필드 변경 시 reschedule history 자동 기록.
status/completed_at/cancelled_at/defer_until 등 lifecycle 필드는 별 endpoint 강제.
"""
ev = await _load_owned(session, event_id, user)
patch = body.model_dump(exclude_unset=True)
if not patch:
return _to_response(ev)
# 안전 검사 — extra=forbid 로 막혀 있지만 한 번 더.
for k in patch:
if k not in PATCH_ALLOWED_FIELDS:
raise HTTPException(status_code=400, detail=f"field not patchable: {k}")
time_changed = any(k in RESCHEDULE_TIME_FIELDS for k in patch)
before_snapshot = _serialize_for_history(ev) if time_changed else None
for k, v in patch.items():
setattr(ev, k, v)
await session.flush()
if time_changed:
actor = _actor_for_user(user)
await _record_history(
session,
event=ev,
change_kind="reschedule",
changed_by=actor,
before=before_snapshot,
after=_serialize_for_history(ev),
)
await session.commit()
await session.refresh(ev)
return _to_response(ev)
# ─── Lifecycle ───
async def _transition(
session: AsyncSession,
*,
event: Event,
change_kind: str,
new_status: str,
user: User,
extra_apply: dict[str, Any] | None = None,
) -> Event:
actor = _actor_for_user(user)
before = _serialize_for_history(event)
event.status = new_status
if extra_apply:
for k, v in extra_apply.items():
setattr(event, k, v)
await session.flush()
await _record_history(
session,
event=event,
change_kind=change_kind,
changed_by=actor,
before=before,
after=_serialize_for_history(event),
)
return event
@router.post("/{event_id}/complete", response_model=EventResponse)
async def complete_event(
event_id: int,
user: Annotated[User, Depends(get_current_user)],
session: Annotated[AsyncSession, Depends(get_session)],
):
ev = await _load_owned(session, event_id, user)
now = datetime.now(timezone.utc)
await _transition(
session,
event=ev,
change_kind="complete",
new_status="done",
user=user,
extra_apply={"completed_at": now},
)
await session.commit()
await session.refresh(ev)
return _to_response(ev)
@router.post("/{event_id}/cancel", response_model=EventResponse)
async def cancel_event(
event_id: int,
user: Annotated[User, Depends(get_current_user)],
session: Annotated[AsyncSession, Depends(get_session)],
):
ev = await _load_owned(session, event_id, user)
now = datetime.now(timezone.utc)
await _transition(
session,
event=ev,
change_kind="cancel",
new_status="cancelled",
user=user,
extra_apply={"cancelled_at": now},
)
await session.commit()
await session.refresh(ev)
return _to_response(ev)
@router.post("/{event_id}/defer", response_model=EventResponse)
async def defer_event(
event_id: int,
body: DeferRequest,
user: Annotated[User, Depends(get_current_user)],
session: Annotated[AsyncSession, Depends(get_session)],
):
ev = await _load_owned(session, event_id, user)
await _transition(
session,
event=ev,
change_kind="defer",
new_status="deferred",
user=user,
extra_apply={"defer_until": body.defer_until},
)
await session.commit()
await session.refresh(ev)
return _to_response(ev)
@router.post("/{event_id}/reactivate", response_model=EventResponse)
async def reactivate_event(
event_id: int,
user: Annotated[User, Depends(get_current_user)],
session: Annotated[AsyncSession, Depends(get_session)],
):
"""완료/취소/연기 해제 — kind 따라 기본 status 복귀.
task: inbox, calendar_event: scheduled, activity_log: done 유지 안 함 (activity_log 는 done 이 자연 상태이므로 reactivate 적용 X → 400).
"""
ev = await _load_owned(session, event_id, user)
if ev.kind == "activity_log":
raise HTTPException(
status_code=400, detail="activity_log 는 reactivate 대상 아님"
)
new_status = "scheduled" if ev.kind == "calendar_event" else "inbox"
await _transition(
session,
event=ev,
change_kind="reactivate",
new_status=new_status,
user=user,
extra_apply={"completed_at": None, "cancelled_at": None, "defer_until": None},
)
await session.commit()
await session.refresh(ev)
return _to_response(ev)
+8 -314
View File
@@ -1,38 +1,22 @@
"""메모 CRUD API — text 메모(file_type='note') + voice 메모 (file_type='immutable', category='audio', source_channel='voice')
doc_type enum = (immutable, editable, note). 기존 audio 파일이 file_type='immutable' + category='audio'
패턴을 사용하므로 voice 메모도 같은 패턴 따름 (enum 확장 회피).
"""
"""메모 CRUD API — 파일 없는 문서(file_type='note')"""
import hashlib
import logging
import os
import re
import uuid
from datetime import datetime, timezone
from pathlib import Path
from typing import Annotated, Any
from typing import Annotated
from fastapi import APIRouter, Depends, File, Form, HTTPException, Query, UploadFile
from pydantic import BaseModel, Field
from sqlalchemy import delete, func, select, or_, and_
from fastapi import APIRouter, Depends, HTTPException, Query
from pydantic import BaseModel
from sqlalchemy import delete, func, select
from sqlalchemy.ext.asyncio import AsyncSession
from core.auth import get_current_user
from core.config import settings
from core.database import get_session
from models.document import Document
from models.event import Event
from models.event_history import EventHistory
from models.queue import ProcessingQueue, enqueue_stage
from models.user import User
# Voice upload 제한 (plan v9 결정 — 10분 / 50MB)
VOICE_MAX_BYTES = 50 * 1024 * 1024
VOICE_ALLOWED_EXTS = {".m4a", ".mp3", ".wav", ".webm", ".ogg", ".opus", ".aac"}
VOICE_ALLOWED_CONTENT_PREFIXES = ("audio/",)
VOICE_NAS_SUBDIR = "PKM/Recordings" # /mnt/nas/Document_Server/PKM/Recordings/{YYYY-MM}/{uuid}.{ext}
logger = logging.getLogger(__name__)
router = APIRouter()
@@ -172,15 +156,6 @@ class MemoResponse(BaseModel):
archived: bool
ask_includable: bool
memo_task_state: dict # {"<task_index>": {"checked_at": "<ISO8601>"}}
# Memo Intake Upgrade PR-2B — AI 추천 분류 (사용자 1-click promote 의 hint)
ai_event_kind: str | None = None
ai_event_confidence: float | None = None
source_channel: str | None = None # voice/memo 등 진입점 식별 (UI 배지)
file_type: str | None = None # audio (voice 메모) vs note (text 메모)
file_path: str | None = None # voice 메모의 NAS audio 경로 (audio player 용)
# PR-4 Email Ingest — 이메일 source 메모 식별 + UI 표시용
source_external_id: str | None = None # email 의 Message-ID 또는 imap UID fallback
email_subject: str | None = None # email_metadata.subject — 메모 카드 부제 / 툴팁
created_at: datetime
updated_at: datetime
@@ -210,13 +185,6 @@ def _to_memo_response(doc: Document) -> MemoResponse:
archived=doc.archived,
ask_includable=doc.ask_includable,
memo_task_state=dict(doc.memo_task_state or {}),
ai_event_kind=doc.ai_event_kind,
ai_event_confidence=doc.ai_event_confidence,
source_channel=doc.source_channel,
file_type=doc.file_type,
file_path=doc.file_path,
source_external_id=doc.source_external_id,
email_subject=(doc.email_metadata or {}).get('subject') if doc.email_metadata else None,
created_at=doc.created_at,
updated_at=doc.updated_at,
)
@@ -273,18 +241,10 @@ async def list_memos(
archived: bool = Query(False, description="true면 아카이브 목록"),
pinned: bool | None = Query(None, description="true면 핀 고정된 메모만"),
):
"""메모 목록 — 활성: 핀 우선 + 최신순 / 아카이브: 최신순 (핀 무시)
PR-2C: source_channel='voice' (음성 메모) 도 포함. 사용자 의도 = 메모는 모든 입력의 inbox.
voice 메모는 file_type='immutable' + category='audio' + source_channel='voice' 패턴.
source_channel 만으로 분리 (file_type 필터는 immutable 다른 binary 까지 끌어옴 — 회피).
"""
# PR-4: inbox_ingest 가 만든 email memo 도 포함 (source_external_id != NULL 로 mailplus_archive 의 archive row 제외)
"""메모 목록 — 활성: 핀 우선 + 최신순 / 아카이브: 최신순 (핀 무시)"""
base = select(Document).where(
or_(
Document.source_channel.in_(("memo", "voice")),
and_(Document.source_channel == "email", Document.source_external_id.isnot(None)),
),
Document.file_type == "note",
Document.source_channel == "memo",
Document.deleted_at == None, # noqa: E711
Document.archived == archived,
)
@@ -523,269 +483,3 @@ async def toggle_ask_includable(
await session.refresh(doc)
return _to_memo_response(doc)
# ─── Memo Intake Upgrade PR-2B: promote to event ───
class PromotePayload(BaseModel):
"""메모 → events 승급. kind 미지정 시 documents.ai_event_kind 사용.
AI worker 는 events row 직접 생성 X — 본 endpoint 만이 사용자 의도 channel.
"""
kind: str | None = None # 'task' | 'calendar_event' | 'activity_log'
due_at: datetime | None = None
start_at: datetime | None = None
end_at: datetime | None = None
started_at: datetime | None = None
ended_at: datetime | None = None
priority: int | None = None
project_tag: str | None = None
_PROMOTE_KIND_MAP = {
# AI 추천 (event_kind_hint) → events.kind
"task": "task",
"calendar_event": "calendar_event",
"activity_log": "activity_log",
# 'note' / 'reference' 는 promote 대상 아님 (사용자가 명시 kind 지정 필요)
}
@router.post("/{memo_id}/promote-to-event", status_code=201)
async def promote_memo_to_event(
memo_id: int,
body: PromotePayload,
user: Annotated[User, Depends(get_current_user)],
session: Annotated[AsyncSession, Depends(get_session)],
):
"""메모 1건 → events row 1건 생성. memo_document_id 자동 link.
kind 결정 순서: body.kind > documents.ai_event_kind > 400 거부.
한 메모 → N events 가능 (정책: dedup 없음, 사용자 의도 따라).
"""
doc = await session.get(Document, memo_id)
if (
not doc
or doc.deleted_at is not None
or doc.source_channel not in ("memo", "voice")
):
raise HTTPException(status_code=404, detail="메모를 찾을 수 없습니다")
# kind 결정
requested = (body.kind or "").strip().lower() or None
ai_hint = (doc.ai_event_kind or "").strip().lower() or None
chosen = requested or ai_hint
event_kind = _PROMOTE_KIND_MAP.get(chosen or "")
if not event_kind:
raise HTTPException(
status_code=400,
detail="promote 할 kind 가 명확하지 않습니다 (task/calendar_event/activity_log 중 1개 지정 또는 ai_event_kind 필요)",
)
# 시간 필드 default — activity_log 는 빠른 행동 기록 UX 그대로
now = datetime.now(timezone.utc)
started_at = body.started_at
ended_at = body.ended_at
completed_at: datetime | None = None
status_val = "inbox"
if event_kind == "activity_log":
ended_at = ended_at or now
started_at = started_at or ended_at
completed_at = now
status_val = "done"
elif event_kind == "calendar_event":
status_val = "scheduled" if body.start_at else "inbox"
title = (doc.title or "").strip() or "메모"
description = doc.extracted_text
ev = Event(
title=title,
description=description,
kind=event_kind,
status=status_val,
due_at=body.due_at,
start_at=body.start_at,
end_at=body.end_at,
started_at=started_at,
ended_at=ended_at,
completed_at=completed_at,
priority=body.priority,
project_tag=body.project_tag,
source="memo",
source_ref=str(doc.id), # 같은 메모 N promote 시 별 row → dedup 의도 X
raw_metadata={
"memo_id": doc.id,
"ai_event_kind": doc.ai_event_kind,
"ai_event_confidence": doc.ai_event_confidence,
"promoted_at": now.isoformat(),
},
memo_document_id=doc.id,
user_id=user.id,
created_by="manual",
)
session.add(ev)
await session.flush()
# events_history.create row (events 도메인 패턴 — events/api/events.py 의 _record_history 와 동일 형태)
history = EventHistory(
event_id=ev.id,
changed_by="manual",
change_kind="create",
before=None,
after={
"id": ev.id,
"title": ev.title,
"kind": ev.kind,
"status": ev.status,
"source": ev.source,
"memo_document_id": ev.memo_document_id,
},
)
session.add(history)
await session.commit()
await session.refresh(ev)
return {
"event_id": ev.id,
"kind": ev.kind,
"status": ev.status,
"memo_document_id": ev.memo_document_id,
}
@router.post("/{memo_id}/dismiss-event-suggestion", response_model=MemoResponse)
async def dismiss_event_suggestion(
memo_id: int,
user: Annotated[User, Depends(get_current_user)],
session: Annotated[AsyncSession, Depends(get_session)],
):
"""'그냥 메모' — AI 추천 무시 + ai_event_kind='note' 강제. 4 버튼 숨김 신호.
MVP: AI 추천값과 사용자 확정값을 같은 컬럼에 저장 (정확도 측정 흐려짐 가능).
백로그: user_event_kind 별 컬럼 분리 (plan Memo Intake Upgrade 백로그).
"""
doc = await session.get(Document, memo_id)
if (
not doc
or doc.deleted_at is not None
or doc.source_channel not in ("memo", "voice")
):
raise HTTPException(status_code=404, detail="메모를 찾을 수 없습니다")
doc.ai_event_kind = "note"
doc.updated_at = datetime.now(timezone.utc)
await session.commit()
await session.refresh(doc)
return _to_memo_response(doc)
# ─── Memo Intake Upgrade PR-2C: voice upload ───
@router.post("/voice", response_model=MemoResponse, status_code=201)
async def upload_voice_memo(
user: Annotated[User, Depends(get_current_user)],
session: Annotated[AsyncSession, Depends(get_session)],
audio: UploadFile = File(...),
recorded_at: str | None = Form(None),
device_hint: str | None = Form(None),
):
"""애플워치 / 모바일 / 기타 음성 메모 업로드 → STT 큐 → 자동 분류.
PR-2C: source_channel='voice' + file_type='audio'. 기존 stt_worker → classify
파이프라인 자동 통과. plan 원칙: AI worker 는 events 직접 생성 X.
"""
# Content-Type 검증
if audio.content_type and not audio.content_type.startswith(VOICE_ALLOWED_CONTENT_PREFIXES):
raise HTTPException(status_code=415, detail=f"지원되지 않는 Content-Type: {audio.content_type}")
# 확장자 결정
orig_name = audio.filename or ""
ext = (Path(orig_name).suffix or "").lower()
if ext and ext not in VOICE_ALLOWED_EXTS:
raise HTTPException(status_code=415, detail=f"지원되지 않는 확장자: {ext}")
if not ext:
# content_type 으로 추정 (audio/m4a 등)
ext = ".m4a"
# 본문 읽기 + size 검증
payload: bytes = await audio.read()
if len(payload) > VOICE_MAX_BYTES:
raise HTTPException(status_code=413, detail=f"50MB 초과 ({len(payload)//1024//1024}MB)")
if len(payload) == 0:
raise HTTPException(status_code=400, detail="빈 audio")
# 저장 경로 (NAS) — fastapi 컨테이너 안 /documents = NAS mount
nas_root = Path(settings.nas_mount_path)
yyyy_mm = datetime.now(timezone.utc).astimezone().strftime("%Y-%m")
target_dir = nas_root / VOICE_NAS_SUBDIR / yyyy_mm
target_dir.mkdir(parents=True, exist_ok=True)
file_uuid = uuid.uuid4().hex
target_path = target_dir / f"{file_uuid}{ext}"
# fsync + rename(atomic) 패턴 — NAS soft mount 안전 (feedback_nfs_korean_path_normalize 결)
tmp_path = target_path.with_suffix(target_path.suffix + ".tmp")
try:
with open(tmp_path, "wb") as fh:
fh.write(payload)
fh.flush()
os.fsync(fh.fileno())
os.replace(tmp_path, target_path)
except OSError as e:
# NAS 쓰기 실패 graceful — DB row 미생성
if tmp_path.exists():
try:
tmp_path.unlink()
except OSError:
pass
logger.error("voice upload NAS write 실패: %s", e)
raise HTTPException(status_code=503, detail="NAS 저장 실패 (재시도 권장)")
# recorded_at 파싱
rec_at: datetime | None = None
if recorded_at:
try:
rec_at = datetime.fromisoformat(recorded_at.replace("Z", "+00:00"))
except ValueError:
rec_at = None
raw_metadata: dict[str, Any] = {}
if device_hint:
raw_metadata["device_hint"] = device_hint
if rec_at:
raw_metadata["recorded_at"] = rec_at.isoformat()
# file_path 는 NAS root 기준 상대 경로 (다른 documents 컨벤션, /api/documents/{id}/file endpoint 호환)
relative_path = target_path.relative_to(nas_root)
# Document row — file_type='immutable' (binary, doc_type enum 제약) + category='audio' + source_channel='voice'
# 기존 audio 컨테이너 인입과 같은 패턴. source_channel='voice' 로 일반 audio 와 구분.
title_seed = (orig_name or "음성 메모").rsplit(".", 1)[0]
doc = Document(
file_path=str(relative_path),
file_hash=hashlib.sha256(payload).hexdigest(),
file_format=ext.lstrip(".") or "m4a",
file_size=len(payload),
file_type="immutable",
title=title_seed[:80] or "음성 메모",
extracted_text=None, # STT 후 채움
review_status="approved",
source_channel="voice",
category="audio",
ask_includable=True,
pinned=False,
archived=False,
memo_task_state={},
extract_meta=raw_metadata or None,
)
session.add(doc)
await session.flush()
# STT 큐 등록 — 기존 stt_worker → classify → embed → chunk 파이프라인 자동
await enqueue_stage(session, doc.id, "stt")
await session.commit()
await session.refresh(doc)
return _to_memo_response(doc)
-11
View File
@@ -88,12 +88,6 @@ class Settings(BaseModel):
# NFS 경유 별도 마운트된 Roon 라이브러리.
roon_library_path: str = ""
# KGS Code 등 외부 작성 마크다운 자료 추가 스캔 경로 (PKM 상대 경로, 쉼표 구분).
# env: ADDITIONAL_WATCH_TARGETS=Knowledge/Industrial_Safety/가스기사/KGS_Code,...
# 모두 expected_category="library" 로 처리 (md/pdf/docx 등 문서 확장자만 수락).
# Inbox/Recordings/Videos 기본 스캔 외에 추가만 허용.
additional_watch_targets: list[str] = []
# 분류 체계
taxonomy: dict = {}
document_types: list[str] = []
@@ -114,10 +108,6 @@ def load_settings() -> Settings:
stt_endpoint = os.getenv("STT_ENDPOINT", "http://stt-service:3300")
roon_library_path = os.getenv("ROON_LIBRARY_PATH", "")
# ADDITIONAL_WATCH_TARGETS — 쉼표 구분 (공백 제거)
awt_raw = os.getenv("ADDITIONAL_WATCH_TARGETS", "")
additional_watch_targets = [p.strip() for p in awt_raw.split(",") if p.strip()]
# config.yaml — Docker 컨테이너 내부(/app/config.yaml) 또는 프로젝트 루트
config_path = Path("/app/config.yaml")
if not config_path.exists():
@@ -182,7 +172,6 @@ def load_settings() -> Settings:
ocr_endpoint=ocr_endpoint,
stt_endpoint=stt_endpoint,
roon_library_path=roon_library_path,
additional_watch_targets=additional_watch_targets,
taxonomy=taxonomy,
document_types=document_types,
upload=upload_cfg,
-10
View File
@@ -8,14 +8,12 @@ from sqlalchemy import func, select, text
from api.audio import router as audio_router
from api.auth import router as auth_router
from api.briefing import router as briefing_router
from api.config import router as config_router
from api.dashboard import router as dashboard_router
from api.digest import router as digest_router
from api.document_notes import router as document_notes_router
from api.document_reads import router as document_reads_router
from api.documents import router as documents_router
from api.events import router as events_router
from api.library import router as library_router
from api.memos import router as memos_router
from api.news import router as news_router
@@ -39,13 +37,11 @@ async def lifespan(app: FastAPI):
from apscheduler.schedulers.asyncio import AsyncIOScheduler
from apscheduler.triggers.cron import CronTrigger
from services.search.query_analyzer import prewarm_analyzer
from workers.briefing_worker import run as morning_briefing_run
from workers.daily_digest import run as daily_digest_run
from workers.digest_worker import run as global_digest_run
from workers.file_watcher import watch_inbox
from workers.law_monitor import run as law_monitor_run
from workers.mailplus_archive import run as mailplus_run
from workers.inbox_ingest import run as inbox_ingest_run
from workers.news_collector import run as news_collector_run
from workers.queue_consumer import consume_queue
from workers.study_queue_consumer import consume_study_queue
@@ -94,12 +90,8 @@ async def lifespan(app: FastAPI):
scheduler.add_job(law_monitor_run, CronTrigger(hour=7), id="law_monitor")
scheduler.add_job(mailplus_run, CronTrigger(hour=7), id="mailplus_morning")
scheduler.add_job(mailplus_run, CronTrigger(hour=18), id="mailplus_evening")
# PR-4: inbox@hyungi.net IMAP ingest (DocumentServer/Ingest 폴더, 5분 cron).
# plan: ~/.claude/plans/document-enchanted-candy.md
scheduler.add_job(inbox_ingest_run, "interval", minutes=5, id="inbox_ingest")
scheduler.add_job(daily_digest_run, CronTrigger(hour=20), id="daily_digest")
scheduler.add_job(global_digest_run, CronTrigger(hour=4, minute=0), id="global_digest")
scheduler.add_job(morning_briefing_run, CronTrigger(hour=5, minute=10), id="morning_briefing")
scheduler.add_job(news_collector_run, "interval", hours=6, id="news_collector")
scheduler.start()
@@ -137,12 +129,10 @@ app.include_router(document_notes_router, prefix="/api/documents", tags=["docume
app.include_router(search_router, prefix="/api/search", tags=["search"])
app.include_router(memos_router, prefix="/api/memos", tags=["memos"])
app.include_router(events_router, prefix="/api/events", tags=["events"])
app.include_router(dashboard_router, prefix="/api/dashboard", tags=["dashboard"])
app.include_router(library_router, prefix="/api/library", tags=["library"])
app.include_router(news_router, prefix="/api/news", tags=["news"])
app.include_router(digest_router, prefix="/api/digest", tags=["digest"])
app.include_router(briefing_router, prefix="/api/briefing", tags=["briefing"])
app.include_router(audio_router, prefix="/api/audio", tags=["audio"])
app.include_router(video_router, prefix="/api/video", tags=["video"])
app.include_router(study_sessions_router, prefix="/api/study-sessions", tags=["study-sessions"])
-97
View File
@@ -1,97 +0,0 @@
"""morning_briefings + briefing_topics 테이블 ORM (야간 수집 뉴스 브리핑).
axis 반대: Phase 4 = country×topic / Briefing = topic×country.
country_perspectives JSONB 안에 topic 여러 국가 관점 array.
"""
from datetime import date, datetime
from sqlalchemy import (
BigInteger,
Boolean,
Date,
DateTime,
Float,
ForeignKey,
Integer,
String,
Text,
)
from sqlalchemy.dialects.postgresql import JSONB
from sqlalchemy.orm import Mapped, mapped_column, relationship
from core.database import Base
class MorningBriefing(Base):
"""하루 단위 브리핑 메타데이터 (KST 자정~05:00 윈도우)"""
__tablename__ = "morning_briefings"
id: Mapped[int] = mapped_column(BigInteger, primary_key=True)
briefing_date: Mapped[date] = mapped_column(Date, nullable=False, unique=True)
window_start: Mapped[datetime] = mapped_column(DateTime(timezone=True), nullable=False)
window_end: Mapped[datetime] = mapped_column(DateTime(timezone=True), nullable=False)
decay_lambda: Mapped[float] = mapped_column(Float, nullable=False)
total_articles: Mapped[int] = mapped_column(Integer, nullable=False, default=0)
total_countries: Mapped[int] = mapped_column(Integer, nullable=False, default=0)
total_topics: Mapped[int] = mapped_column(Integer, nullable=False, default=0)
generation_ms: Mapped[int | None] = mapped_column(Integer)
llm_calls: Mapped[int] = mapped_column(Integer, nullable=False, default=0)
llm_failures: Mapped[int] = mapped_column(Integer, nullable=False, default=0)
status: Mapped[str] = mapped_column(String(20), nullable=False, default="success")
headline_oneliner: Mapped[str | None] = mapped_column(Text)
created_at: Mapped[datetime] = mapped_column(
DateTime(timezone=True), nullable=False, default=datetime.now
)
topics: Mapped[list["BriefingTopic"]] = relationship(
back_populates="briefing",
cascade="all, delete-orphan",
order_by="BriefingTopic.topic_rank",
)
class BriefingTopic(Base):
"""1 briefing 안 topic_rank 순 cross-country 비교 분석 결과"""
__tablename__ = "briefing_topics"
id: Mapped[int] = mapped_column(BigInteger, primary_key=True)
briefing_id: Mapped[int] = mapped_column(
BigInteger,
ForeignKey("morning_briefings.id", ondelete="CASCADE"),
nullable=False,
)
topic_rank: Mapped[int] = mapped_column(Integer, nullable=False)
topic_label: Mapped[str] = mapped_column(String(120), nullable=False)
headline: Mapped[str] = mapped_column(Text, nullable=False)
country_perspectives: Mapped[list] = mapped_column(JSONB, nullable=False, default=list)
divergences: Mapped[list] = mapped_column(JSONB, nullable=False, default=list)
convergences: Mapped[list] = mapped_column(JSONB, nullable=False, default=list)
key_quotes: Mapped[list] = mapped_column(JSONB, nullable=False, default=list)
historical_article_ids: Mapped[list | None] = mapped_column(JSONB)
historical_context: Mapped[str | None] = mapped_column(Text)
historical_window_days: Mapped[int | None] = mapped_column(Integer)
cluster_members: Mapped[list] = mapped_column(JSONB, nullable=False, default=list)
article_count: Mapped[int] = mapped_column(Integer, nullable=False)
country_count: Mapped[int] = mapped_column(Integer, nullable=False)
importance_score: Mapped[float] = mapped_column(Float, nullable=False)
raw_weight_sum: Mapped[float] = mapped_column(Float, nullable=False)
llm_model: Mapped[str | None] = mapped_column(String(100))
llm_fallback_used: Mapped[bool] = mapped_column(Boolean, nullable=False, default=False)
created_at: Mapped[datetime] = mapped_column(
DateTime(timezone=True), nullable=False, default=datetime.now
)
briefing: Mapped["MorningBriefing"] = relationship(back_populates="topics")
-16
View File
@@ -47,15 +47,6 @@ class Document(Base):
importance: Mapped[str | None] = mapped_column(String(20), default="medium")
ai_confidence: Mapped[float | None] = mapped_column()
# Memo Intake Upgrade PR-2B — Gemma 4B triage 가 추론한 메모 의도 분류 hint
# ('note' | 'task' | 'calendar_event' | 'activity_log' | 'reference')
# AI 자동 events 생성 X — 사용자 1-click promote 시점에만 events row 생성 (안전 boundary).
ai_event_kind: Mapped[str | None] = mapped_column(
Enum("note", "task", "calendar_event", "activity_log", "reference",
name="event_kind_hint")
)
ai_event_confidence: Mapped[float | None] = mapped_column()
# 3계층: 벡터 임베딩
embedding = mapped_column(Vector(1024), nullable=True)
embed_model_version: Mapped[str | None] = mapped_column(String(50))
@@ -100,17 +91,10 @@ class Document(Base):
preview_hash: Mapped[str | None] = mapped_column(String(64))
preview_at: Mapped[datetime | None] = mapped_column(DateTime(timezone=True))
# PR-4 Email Ingest — 외부 source dedup key + 메일 metadata
# source_external_id: email 에선 always non-null (Message-ID 또는 imap UID fallback). 다른 source 는 NULL 가능.
# email_metadata: from/to/cc/subject/folder/uidvalidity/uid/received_at/mailplus_link/attachments[].
source_external_id: Mapped[str | None] = mapped_column(Text)
email_metadata: Mapped[dict | None] = mapped_column(JSONB)
# 메타데이터
source_channel: Mapped[str | None] = mapped_column(
Enum("law_monitor", "devonagent", "email", "web_clip",
"tksafety", "inbox_route", "manual", "drive_sync", "news", "memo",
"voice",
name="source_channel")
)
data_origin: Mapped[str | None] = mapped_column(
-42
View File
@@ -1,42 +0,0 @@
"""document_images ORM (Phase 1B.5) — marker 추출 이미지 메타.
저장: NAS `/documents/extracted_images/{document_id}/{image_key}.{ext}`
표시: GET /api/documents/{doc_id}/images/{image_key}/raw (인증 필요)
md_content ref `![alt](docimg:img_001)` 형식 image_key sequence 기반 결정적이라
재변환 idempotent.
"""
from datetime import datetime
from sqlalchemy import BigInteger, DateTime, ForeignKey, Integer, String, Text
from sqlalchemy.orm import Mapped, mapped_column
from core.database import Base
class DocumentImage(Base):
__tablename__ = "document_images"
id: Mapped[int] = mapped_column(BigInteger, primary_key=True)
document_id: Mapped[int] = mapped_column(
BigInteger, ForeignKey("documents.id", ondelete="CASCADE"), nullable=False
)
image_key: Mapped[str] = mapped_column(String(32), nullable=False)
relative_path: Mapped[str] = mapped_column(Text, nullable=False)
file_path: Mapped[str] = mapped_column(Text, nullable=False)
mime_type: Mapped[str] = mapped_column(Text, nullable=False)
file_size: Mapped[int] = mapped_column(BigInteger, nullable=False)
content_hash: Mapped[str] = mapped_column(String(64), nullable=False)
width: Mapped[int | None] = mapped_column(Integer)
height: Mapped[int | None] = mapped_column(Integer)
page_index: Mapped[int | None] = mapped_column(Integer)
alt_text: Mapped[str | None] = mapped_column(Text)
source_slug: Mapped[str | None] = mapped_column(Text)
extraction_engine: Mapped[str] = mapped_column(
String(32), nullable=False, default="marker"
)
extraction_engine_version: Mapped[str | None] = mapped_column(String(32))
created_at: Mapped[datetime] = mapped_column(
DateTime(timezone=True), default=datetime.now, nullable=False
)
-113
View File
@@ -1,113 +0,0 @@
"""events 1차 컨테이너 ORM (개인 운영 로그 / 일정 / 할 일 / 회고)
PR-1 (migrations 239~247) 본체. kind enum 으로 task/calendar_event/activity_log
변형을 통합 관리. memo_document_id 메모 link (optional).
"""
from datetime import datetime
from typing import Any
from sqlalchemy import (
BigInteger,
Boolean,
DateTime,
ForeignKey,
SmallInteger,
String,
Text,
)
from sqlalchemy.dialects.postgresql import ENUM as PgEnum
from sqlalchemy.dialects.postgresql import JSONB
from sqlalchemy.orm import Mapped, mapped_column
from core.database import Base
# Postgres enum 재선언 X (create_type=False) — migration 239~243 이 권위.
EventKindEnum = PgEnum(
"task",
"calendar_event",
"activity_log",
name="event_kind",
create_type=False,
)
EventStatusEnum = PgEnum(
"inbox",
"next",
"scheduled",
"in_progress",
"done",
"cancelled",
"deferred",
name="event_status",
create_type=False,
)
EventSourceEnum = PgEnum(
"manual",
"memo",
"email",
"chat",
"webhook",
"git_commit",
"claude_code",
name="event_source",
create_type=False,
)
EventActorEnum = PgEnum(
"manual",
"eid",
"email_ingest",
"system",
name="event_actor",
create_type=False,
)
class Event(Base):
__tablename__ = "events"
id: Mapped[int] = mapped_column(BigInteger, primary_key=True)
title: Mapped[str] = mapped_column(Text, nullable=False)
description: Mapped[str | None] = mapped_column(Text)
kind: Mapped[str] = mapped_column(EventKindEnum, nullable=False)
status: Mapped[str] = mapped_column(EventStatusEnum, nullable=False, default="inbox")
# 시간 필드 — kind 별 의미가 다름 (CHECK 제약은 migration 244)
due_at: Mapped[datetime | None] = mapped_column(DateTime(timezone=True))
start_at: Mapped[datetime | None] = mapped_column(DateTime(timezone=True))
end_at: Mapped[datetime | None] = mapped_column(DateTime(timezone=True))
started_at: Mapped[datetime | None] = mapped_column(DateTime(timezone=True))
ended_at: Mapped[datetime | None] = mapped_column(DateTime(timezone=True))
all_day: Mapped[bool] = mapped_column(Boolean, nullable=False, default=False)
timezone: Mapped[str | None] = mapped_column(Text)
# lifecycle
defer_until: Mapped[datetime | None] = mapped_column(DateTime(timezone=True))
completed_at: Mapped[datetime | None] = mapped_column(DateTime(timezone=True))
cancelled_at: Mapped[datetime | None] = mapped_column(DateTime(timezone=True))
priority: Mapped[int | None] = mapped_column(SmallInteger)
project_tag: Mapped[str | None] = mapped_column(String(64))
tags: Mapped[list[Any]] = mapped_column(JSONB, nullable=False, default=list)
# 출처 / 외부 식별자
source: Mapped[str] = mapped_column(EventSourceEnum, nullable=False, default="manual")
source_ref: Mapped[str | None] = mapped_column(Text)
raw_metadata: Mapped[dict[str, Any]] = mapped_column(JSONB, nullable=False, default=dict)
# 메모 link (optional, ON DELETE SET NULL)
memo_document_id: Mapped[int | None] = mapped_column(
BigInteger, ForeignKey("documents.id", ondelete="SET NULL")
)
# 인증 / actor
user_id: Mapped[int] = mapped_column(
BigInteger, ForeignKey("users.id"), nullable=False
)
created_by: Mapped[str] = mapped_column(EventActorEnum, nullable=False, default="manual")
created_at: Mapped[datetime] = mapped_column(
DateTime(timezone=True), default=datetime.now, nullable=False
)
updated_at: Mapped[datetime] = mapped_column(
DateTime(timezone=True), default=datetime.now, onupdate=datetime.now, nullable=False
)
-43
View File
@@ -1,43 +0,0 @@
"""events_history ORM — events 의 lifecycle 변경 이력 (append-only).
PR-1 (migrations 248~249). FK ON DELETE RESTRICT 부모 events row 직접 삭제 차단
(feedback_history_table_fk_restrict.md 이력은 시점 사실).
"""
from datetime import datetime
from typing import Any
from sqlalchemy import BigInteger, DateTime, ForeignKey
from sqlalchemy.dialects.postgresql import ENUM as PgEnum
from sqlalchemy.dialects.postgresql import JSONB
from sqlalchemy.orm import Mapped, mapped_column
from core.database import Base
from models.event import EventActorEnum
HistoryChangeKindEnum = PgEnum(
"create",
"reschedule",
"defer",
"reactivate",
"complete",
"cancel",
name="history_change_kind",
create_type=False,
)
class EventHistory(Base):
__tablename__ = "events_history"
id: Mapped[int] = mapped_column(BigInteger, primary_key=True)
event_id: Mapped[int] = mapped_column(
BigInteger, ForeignKey("events.id", ondelete="RESTRICT"), nullable=False
)
changed_at: Mapped[datetime] = mapped_column(
DateTime(timezone=True), default=datetime.now, nullable=False
)
changed_by: Mapped[str] = mapped_column(EventActorEnum, nullable=False)
change_kind: Mapped[str] = mapped_column(HistoryChangeKindEnum, nullable=False)
before: Mapped[dict[str, Any] | None] = mapped_column(JSONB)
after: Mapped[dict[str, Any]] = mapped_column(JSONB, nullable=False)
-46
View File
@@ -1,46 +0,0 @@
너는 다국적 뉴스 비교 분석가다.
아래는 같은 주제로 군집된 야간 수집 뉴스들 — 각 줄 앞 (국가코드 · 소스) 표시로 출처가 표시되어 있다.
이 정보만으로 cross-country 비교 분석을 JSON 으로만 출력하라.
목표:
- 같은 사건을 각 나라가 어떻게 다르게 다루는지 / 무엇이 공통인지를 1페이지 카드 형태로 정리.
- 사용자는 한국어 독자. 한국어로 출력.
절대 금지:
- 제공된 summary 에 없는 사실 추가
- 추측 표현 ("보인다", "~할 것이다", "~할 전망" 등)
- JSON 외의 모든 텍스트 (설명, 마크다운, 코드블록 금지)
- 인용부호 안 원문에 없던 단어 생성 (key_quotes 는 원문 그대로만)
분량 cap (반드시 지킬 것):
- country_perspectives: 최대 10개, 각 summary 는 1~2문장 (한국어 120자 이내)
- divergences: 최대 3개, 각 200자 이내
- convergences: 최대 2개, 각 200자 이내
- key_quotes: 최대 5개, 각 quote 240자 이내
- historical_context: 1~2문장 (한국어 120자 이내), 의미 있을 때만 채우고 아니면 null
출력 형식 (JSON 객체 하나만 출력, 위 cap 초과 금지):
{
"topic_label": "5~10 단어의 한국어 토픽 제목",
"headline": "전체를 한 줄로 압축한 한국어 headline (≤80자)",
"country_perspectives": [
{"country": "KR", "summary": "...", "article_ids": []},
{"country": "US", "summary": "...", "article_ids": []}
],
"divergences": ["A국=X 강조 / B국=Y 비판 / C국=Z 부각"],
"convergences": ["모든 매체가 Z 사실은 일치"],
"key_quotes": [{"country": "US", "source": "NYT", "quote": "..."}],
"historical_context": null
}
규칙:
- country_perspectives 의 country 는 입력 기사의 국가코드 그대로 (대문자).
- article_ids 는 비워둬도 됨 (서버가 채움).
- 단일 국가만 다룬 경우 divergences 는 빈 배열.
- historical_context 는 아래 "이전 흐름 참고" 섹션이 비어있으면 반드시 null.
오늘 새벽 기사 묶음:
{articles_block}
이전 흐름 참고 (직접 인용 금지, 맥락 파악 용도):
{historical_block}
+1 -12
View File
@@ -31,20 +31,9 @@ subject_description: {subject_description}
"recommend_deep_summary": bool,
"recommend_entity_pass": bool,
"escalate_to_26b": bool,
"risk_flags": ["..."],
"event_kind_hint": "note|task|calendar_event|activity_log|reference|null",
"event_kind_confidence": 0.0~1.0
"risk_flags": ["..."]
}}
event_kind_hint 분류 (사용자 메모 inbox triage 용 — AI 가 events row 직접 생성하지 않고 사용자 1-click promote 의 추천만 제공):
- "task": 사용자가 미래에 해야 할 일 (예: "내일 견적 요청", "세무사 전화하기"). due 시각 있어도 task 가능.
- "calendar_event": 시간/날짜가 고정된 일정 (예: "5/15 14:00 회의", "내일 2시 세무사 전화"). 본문에 명시적 시간 단서.
- "activity_log": 이미 한 행동 기록 (예: "방금 PR 머지 완료", "오늘 GPU 서버 점검함"). 과거형 또는 "방금/오늘/지금" 표지.
- "reference": 나중에 참조할 자료/링크/요약 (예: 웹 클립, 외부 자료, "이거 나중에 봐야 함").
- "note": 위 4개 어디에도 명확하지 않은 일반 메모/생각 (default).
- event_kind_confidence: 0.01.0. 명확하지 않으면 낮게 (< 0.5). 사용자가 결정.
- 본문이 짧거나 의도 불명이면 "note" + confidence 낮게.
recommend_deep_summary=true 조건:
- 본문 > 40,000 chars
- 다수 당사자 또는 시계열 전개가 있는 법령/절차/보고서
View File
-80
View File
@@ -1,80 +0,0 @@
"""야간 뉴스 topic-first 클러스터링.
Phase 4 axis 반대: country cluster 아닌 **전체 doc 합쳐서 topic cluster**.
cluster 안에 country 분포가 자동으로 들어감 (doc dict country field).
파라미터 (5h 윈도우용):
- LAMBDA = ln(2)/2h 0.347 (2시간 반감기, 야간 5h 윈도우라 빠른 감쇠)
- threshold = 0.78 고정 (Phase 4 0.75~0.80 중간값)
- MIN_ARTICLES_PER_TOPIC = 2 (야간 sparse 대비 완화)
- MIN_COUNTRIES_PER_TOPIC = 2 (cross-country 가치 핵심)
- MAX_TOPICS = 7 (1페이지 분량)
"""
import math
from core.utils import setup_logger
from services.clustering_common import (
greedy_assign_cluster,
normalize_importance_scores,
)
logger = setup_logger("briefing_clustering")
LAMBDA = math.log(2) / (2.0 / 24.0) # 2시간 반감기 (단위: 일)
THRESHOLD = 0.78
CENTROID_ALPHA = 0.7
MIN_ARTICLES_PER_TOPIC = 2
MIN_COUNTRIES_PER_TOPIC = 2
MAX_TOPICS = 7
def _count_distinct_countries(cluster: dict) -> int:
return len({m.get("country") for m in cluster["members"] if m.get("country")})
def cluster_global(docs: list[dict]) -> list[dict]:
"""모든 country docs 를 합쳐 topic cluster 생성.
Args:
docs: loader.load_night_window 출력 ( dict country field 포함).
Returns:
[{centroid, members, weight_sum, raw_weight_sum, importance_score, country_count}, ...]
- MIN_ARTICLES + MIN_COUNTRIES 충족 cluster
- importance_score 내림차순, MAX_TOPICS cap
"""
if not docs:
logger.info("[briefing] docs=0 → skip")
return []
clusters, raw_count = greedy_assign_cluster(
docs,
threshold=THRESHOLD,
centroid_alpha=CENTROID_ALPHA,
min_articles=MIN_ARTICLES_PER_TOPIC,
max_topics=MAX_TOPICS * 4, # MIN_COUNTRIES 필터 전 buffer
lambda_val=LAMBDA,
)
# MIN_COUNTRIES_PER_TOPIC 필터 — single-country cluster drop
pre_country_filter = len(clusters)
filtered = []
for c in clusters:
cc = _count_distinct_countries(c)
if cc >= MIN_COUNTRIES_PER_TOPIC:
c["country_count"] = cc
filtered.append(c)
clusters = filtered[:MAX_TOPICS]
dropped_country = pre_country_filter - len(clusters)
dropped_min_articles = raw_count - pre_country_filter
# MIN_COUNTRIES + MAX_TOPICS 필터 후 importance 재정규화 (briefing 내 0~1)
normalize_importance_scores(clusters)
logger.info(
f"[briefing] docs={len(docs)} threshold={THRESHOLD} "
f"raw_clusters={raw_count} dropped_min_articles={dropped_min_articles} "
f"dropped_single_country={dropped_country} kept={len(clusters)}"
)
return clusters
-307
View File
@@ -1,307 +0,0 @@
"""Cluster → 26B MLX 비교 분석 호출 + JSON envelope + historical context + fallback row.
Plan §"LLM Parse 실패 시 Fallback Topic Row (고정 형태)":
LLM JSON parse 2 재시도 실패 고정 형태 fallback 저장 (drop 금지).
Plan §"Historical Context":
BRIEFING_HISTORICAL_ENABLED=true cluster centroid historical candidate
cosine top-K 5 (similarity 0.70) 추출 프롬프트 {historical_block} 주입.
LLM 응답 envelope historical_context 옵션 필드.
"""
import asyncio
import json
import os
from pathlib import Path
from typing import Any
import numpy as np
from ai.client import parse_json_response
from core.utils import setup_logger
from services.clustering_common import normalize_vector
logger = setup_logger("briefing_comparator")
LLM_CALL_TIMEOUT = 25 # 초. Phase 4 와 동일
HISTORICAL_TOP_K = 5
HISTORICAL_SIMILARITY_MIN = 0.70
HISTORICAL_WINDOW_DAYS = 30
# JSON envelope cap (프롬프트 + 후처리 양쪽 강제)
MAX_PERSPECTIVES = 10
MAX_DIVERGENCES = 3
MAX_CONVERGENCES = 2
MAX_KEY_QUOTES = 5
MAX_PERSPECTIVE_SUMMARY_LEN = 240 # 한국어 1~2문장 ≤120자 × 2
MAX_HISTORICAL_CONTEXT_LEN = 240
MAX_ARTICLE_IDS_PER_COUNTRY = 5 # country_perspectives[].article_ids 후처리 cap
FALLBACK_HEADLINE = "LLM 분석 실패로 원문 기사 묶음만 표시합니다."
FALLBACK_TOPIC_LABEL = "주요 뉴스 묶음"
_llm_sem = asyncio.Semaphore(1)
_PROMPT_PATH = Path(__file__).resolve().parent.parent.parent / "prompts" / "briefing_comparative.txt"
_PROMPT_TEMPLATE: str | None = None
def historical_enabled() -> bool:
return os.environ.get("BRIEFING_HISTORICAL_ENABLED", "false").lower() in {"1", "true", "yes"}
def _load_prompt() -> str:
global _PROMPT_TEMPLATE
if _PROMPT_TEMPLATE is None:
_PROMPT_TEMPLATE = _PROMPT_PATH.read_text(encoding="utf-8")
return _PROMPT_TEMPLATE
def _build_articles_block(selected: list[dict]) -> str:
lines = []
for i, m in enumerate(selected, start=1):
country = m.get("country") or "??"
source = m.get("ai_sub_group") or ""
text = (m.get("ai_summary_truncated") or m.get("ai_summary") or m.get("title") or "").strip()
lines.append(f"[{i}] ({country} · {source}) {text}")
return "\n".join(lines)
def _build_historical_block(historical_docs: list[dict]) -> str:
if not historical_docs:
return "(과거 참고 자료 없음)"
lines = ["※ 이전 30일 흐름 참고용 — 본 분석에서 직접 인용 금지, 맥락 파악 용도."]
for i, d in enumerate(historical_docs, start=1):
text = (d.get("ai_summary") or d.get("title") or "").strip()
# historical 은 ai_summary 가 길 수 있어 200자 cap
if len(text) > 200:
text = text[:200] + ""
lines.append(f"[H{i}] {text}")
return "\n".join(lines)
def build_prompt(selected: list[dict], historical_docs: list[dict]) -> str:
template = _load_prompt()
articles_block = _build_articles_block(selected)
historical_block = _build_historical_block(historical_docs)
return template.replace("{articles_block}", articles_block).replace(
"{historical_block}", historical_block
)
def retrieve_historical(
cluster: dict,
candidates: list[dict],
*,
top_k: int = HISTORICAL_TOP_K,
sim_min: float = HISTORICAL_SIMILARITY_MIN,
) -> list[dict]:
"""cluster centroid 와 candidate pool 의 cosine top-K (sim ≥ sim_min).
candidates 비어있거나 sim 미달 list.
"""
if not candidates:
return []
centroid = cluster["centroid"]
scored = []
for d in candidates:
v = normalize_vector(d["embedding"])
sim = float(np.dot(centroid, v))
if sim >= sim_min:
scored.append((sim, d))
scored.sort(key=lambda x: -x[0])
return [d for _, d in scored[:top_k]]
async def _try_call_llm(client: Any, prompt: str) -> str:
async with _llm_sem:
return await asyncio.wait_for(
client.call_primary(prompt),
timeout=LLM_CALL_TIMEOUT,
)
def _truncate_str(s: Any, limit: int) -> str:
if not isinstance(s, str):
return ""
s = s.strip()
if len(s) > limit:
s = s[:limit].rstrip() + ""
return s
def _country_article_id_map(cluster: dict) -> dict[str, list[int]]:
"""cluster.members 를 country 별 article_id list 로 그룹 (weight 내림차순).
Phase 4 selection 단계에서 m['weight'] 채워져 있음. 누락 0.0 으로 fallback.
"""
grouped: dict[str, list[tuple[float, int]]] = {}
for m in cluster.get("members", []):
country = (m.get("country") or "").upper()
if not country:
continue
weight = float(m.get("weight", 0.0))
grouped.setdefault(country, []).append((weight, int(m["id"])))
out: dict[str, list[int]] = {}
for country, pairs in grouped.items():
pairs.sort(key=lambda x: -x[0])
out[country] = [doc_id for _, doc_id in pairs]
return out
def _resolve_article_ids(
raw_ids: list,
country: str,
cluster_country_ids: dict[str, list[int]],
) -> list[int]:
"""country_perspectives[].article_ids 후처리.
1) LLM id cluster member 교집합인 것만 유지 (엉뚱한 id 차단).
2) 비어있으면 같은 country cluster member top weight N 자동 주입.
3) 그래도 없으면 [] (country 매핑된 member 부재).
"""
cluster_ids = cluster_country_ids.get(country, [])
cluster_id_set = set(cluster_ids)
# 1) LLM id ∩ cluster
cleaned = []
if isinstance(raw_ids, list):
for x in raw_ids:
try:
doc_id = int(x)
except (TypeError, ValueError):
continue
if doc_id in cluster_id_set and doc_id not in cleaned:
cleaned.append(doc_id)
if cleaned:
return cleaned[:MAX_ARTICLE_IDS_PER_COUNTRY]
# 2) Country fallback top-N
return cluster_ids[:MAX_ARTICLE_IDS_PER_COUNTRY]
def _sanitize_envelope(parsed: dict, cluster: dict) -> dict | None:
"""LLM 응답 envelope 검증 + cap 강제 + article_ids 후처리. None → fallback."""
if not isinstance(parsed, dict):
return None
topic_label = _truncate_str(parsed.get("topic_label"), 120)
headline = _truncate_str(parsed.get("headline"), 200)
if not topic_label or not headline:
return None
# cluster.members 의 country → [id] 매핑을 미리 만들어 후처리 input 으로 사용
country_ids_map = _country_article_id_map(cluster)
# country_perspectives
raw_persp = parsed.get("country_perspectives")
perspectives = []
if isinstance(raw_persp, list):
for p in raw_persp[:MAX_PERSPECTIVES]:
if not isinstance(p, dict):
continue
country = _truncate_str(p.get("country"), 10).upper()
summary = _truncate_str(p.get("summary"), MAX_PERSPECTIVE_SUMMARY_LEN)
raw_ids = p.get("article_ids") or []
article_ids = _resolve_article_ids(raw_ids, country, country_ids_map)
if country and summary:
perspectives.append({
"country": country,
"summary": summary,
"article_ids": article_ids,
})
if not perspectives:
return None
def _str_array(key: str, cap: int, item_limit: int) -> list[str]:
raw = parsed.get(key)
if not isinstance(raw, list):
return []
out = []
for it in raw[:cap]:
t = _truncate_str(it, item_limit)
if t:
out.append(t)
return out
divergences = _str_array("divergences", MAX_DIVERGENCES, 200)
convergences = _str_array("convergences", MAX_CONVERGENCES, 200)
# key_quotes: [{country, source, quote}]
raw_quotes = parsed.get("key_quotes")
quotes = []
if isinstance(raw_quotes, list):
for q in raw_quotes[:MAX_KEY_QUOTES]:
if not isinstance(q, dict):
continue
entry = {
"country": _truncate_str(q.get("country"), 10).upper(),
"source": _truncate_str(q.get("source"), 60),
"quote": _truncate_str(q.get("quote"), 240),
}
if entry["quote"]:
quotes.append(entry)
historical_context = _truncate_str(parsed.get("historical_context"), MAX_HISTORICAL_CONTEXT_LEN) or None
return {
"topic_label": topic_label,
"headline": headline,
"country_perspectives": perspectives,
"divergences": divergences,
"convergences": convergences,
"key_quotes": quotes,
"historical_context": historical_context,
"llm_fallback_used": False,
}
def _make_fallback(cluster: dict) -> dict:
"""Plan §"Fallback Topic Row (고정 형태)". drop 금지, country_perspectives 빈 list."""
return {
"topic_label": FALLBACK_TOPIC_LABEL,
"headline": FALLBACK_HEADLINE,
"country_perspectives": [],
"divergences": [],
"convergences": [],
"key_quotes": [],
"historical_context": None,
"llm_fallback_used": True,
}
async def compare_cluster_with_fallback(
client: Any,
cluster: dict,
selected: list[dict],
historical_docs: list[dict] | None = None,
) -> dict:
"""1 cluster 비교 분석. LLM 2회 재시도 → 실패 시 fallback row.
Returns:
sanitized envelope dict (Plan §"LLM 프롬프트 출력 envelope") + llm_fallback_used.
"""
historical_docs = historical_docs or []
prompt = build_prompt(selected, historical_docs)
for attempt in range(2):
try:
raw = await _try_call_llm(client, prompt)
except asyncio.TimeoutError:
logger.warning(
f"LLM timeout {LLM_CALL_TIMEOUT}s "
f"(attempt={attempt + 1}, cluster size={len(cluster['members'])})"
)
continue
except Exception as e:
logger.warning(f"LLM 호출 실패 attempt={attempt + 1}: {e}")
continue
parsed = parse_json_response(raw)
sanitized = _sanitize_envelope(parsed, cluster) if parsed else None
if sanitized:
return sanitized
logger.warning(
f"envelope 검증 실패 attempt={attempt + 1} "
f"(raw_len={len(raw) if raw else 0}, parsed_keys={list(parsed.keys()) if isinstance(parsed, dict) else None})"
)
return _make_fallback(cluster)
-199
View File
@@ -1,199 +0,0 @@
"""야간 5h 수집 뉴스 윈도우 로드 + country 정규화 + (옵션) 과거 N일 후보 로드.
- KST 자정~05:00 사이 수집된 documents (source_channel='news' OR ai_domain='News').
- country canonical = document_chunks.country first non-null news_sources prefix fallback (Phase 4 동일).
- ai_summary/embedding NULL 제외 (재요약/재임베딩 0 원칙).
- 반환: doc dict list (topic-first cluster 입력. country dict field).
- 과거 retrieval historical doc 후보는 별도 함수 (BRIEFING_HISTORICAL_ENABLED on ).
"""
from datetime import datetime
from typing import Any
import numpy as np
from sqlalchemy import text
from core.database import async_session
from core.utils import setup_logger
logger = setup_logger("briefing_loader")
_NEWS_WINDOW_SQL = text("""
SELECT
d.id,
d.title,
d.ai_summary,
d.embedding,
d.created_at,
d.edit_url,
d.ai_sub_group,
(
SELECT c.country
FROM document_chunks c
WHERE c.doc_id = d.id AND c.country IS NOT NULL
LIMIT 1
) AS chunk_country
FROM documents d
WHERE (d.source_channel = 'news' OR d.ai_domain = 'News')
AND d.deleted_at IS NULL
AND d.created_at >= :window_start
AND d.created_at < :window_end
AND d.embedding IS NOT NULL
AND d.ai_summary IS NOT NULL
""")
_SOURCE_COUNTRY_SQL = text("""
SELECT name, country FROM news_sources WHERE country IS NOT NULL
""")
_HISTORICAL_CANDIDATES_SQL = text("""
SELECT
d.id,
d.title,
d.ai_summary,
d.embedding,
d.created_at
FROM documents d
WHERE (d.source_channel = 'news' OR d.ai_domain = 'News')
AND d.deleted_at IS NULL
AND d.created_at >= :hist_start
AND d.created_at < :hist_end
AND d.embedding IS NOT NULL
AND d.ai_summary IS NOT NULL
""")
def _to_numpy_embedding(raw: Any) -> np.ndarray | None:
if raw is None:
return None
if isinstance(raw, str):
import json
try:
raw = json.loads(raw)
except json.JSONDecodeError:
return None
try:
arr = np.asarray(raw, dtype=np.float32)
except (TypeError, ValueError):
return None
if arr.size == 0:
return None
return arr
async def _load_source_country_map(session) -> dict[str, str]:
"""news_sources name → country prefix 매핑 (Phase 4 패턴 미러)."""
rows = await session.execute(_SOURCE_COUNTRY_SQL)
mapping: dict[str, str] = {}
for name, country in rows:
if not name or not country:
continue
prefix = name.split(" ")[0].strip()
if prefix and prefix not in mapping:
mapping[prefix] = country
tokens = name.split(" ")
if len(tokens) >= 3:
source_prefix = " ".join(tokens[:-1]).strip()
if source_prefix and source_prefix not in mapping:
mapping[source_prefix] = country
return mapping
async def load_night_window(
window_start: datetime,
window_end: datetime,
) -> list[dict]:
"""야간 윈도우 뉴스 docs 를 country 채워진 list 로 반환.
Returns:
[{id, title, ai_summary, embedding, created_at, edit_url, ai_sub_group, country}, ...]
country 매핑 실패한 doc drop (cross-country 비교가 핵심이므로).
"""
docs: list[dict] = []
null_country = 0
async with async_session() as session:
source_country = await _load_source_country_map(session)
result = await session.execute(
_NEWS_WINDOW_SQL,
{"window_start": window_start, "window_end": window_end},
)
for row in result.mappings():
embedding = _to_numpy_embedding(row["embedding"])
if embedding is None:
continue
country = row["chunk_country"]
if not country:
ai_sub_group = (row["ai_sub_group"] or "").strip()
if ai_sub_group:
country = source_country.get(ai_sub_group)
if not country:
null_country += 1
continue
docs.append({
"id": int(row["id"]),
"title": row["title"] or "",
"ai_summary": row["ai_summary"] or "",
"embedding": embedding,
"created_at": row["created_at"],
"edit_url": row["edit_url"] or "",
"ai_sub_group": row["ai_sub_group"] or "",
"country": country.upper(),
})
if null_country:
logger.warning(
f"[loader] country 매핑 실패 drop {null_country}"
f"(chunk_country + news_sources prefix 둘 다 fail)"
)
logger.info(
f"[loader] night window {window_start} ~ {window_end}"
f"{len(docs)}건 ({len({d['country'] for d in docs})}개 국가)"
)
return docs
async def load_historical_candidates(
hist_start: datetime,
hist_end: datetime,
exclude_ids: set[int],
) -> list[dict]:
"""과거 N일 doc 후보 (BRIEFING_HISTORICAL_ENABLED=true 시만 호출).
cluster centroid cosine 비교용 raw candidate pool. country 매핑
(LLM 분석 input 으로만 사용하고 표시 ).
Args:
exclude_ids: 오늘 윈도우 article id (중복 retrieval 회피).
Returns:
[{id, title, ai_summary, embedding, created_at}, ...]
"""
out: list[dict] = []
async with async_session() as session:
result = await session.execute(
_HISTORICAL_CANDIDATES_SQL,
{"hist_start": hist_start, "hist_end": hist_end},
)
for row in result.mappings():
doc_id = int(row["id"])
if doc_id in exclude_ids:
continue
embedding = _to_numpy_embedding(row["embedding"])
if embedding is None:
continue
out.append({
"id": doc_id,
"title": row["title"] or "",
"ai_summary": row["ai_summary"] or "",
"embedding": embedding,
"created_at": row["created_at"],
})
logger.info(f"[loader] historical candidates: {len(out)} docs (window {hist_start.date()} ~ {hist_end.date()})")
return out
-261
View File
@@ -1,261 +0,0 @@
"""야간 수집 뉴스 브리핑 파이프라인 (Plan §"PR-MorningBriefing-1 Backend").
흐름: load_night_window cluster_global select_for_llm (k=7)
(옵션) historical retrieval compare_cluster_with_fallback DB save.
regenerate 정책: briefing_date UNIQUE 충돌 transaction 안에서 DELETE+INSERT.
"""
import time
from datetime import date, datetime, timedelta, timezone
from typing import Any
from zoneinfo import ZoneInfo
from sqlalchemy import delete
from ai.client import AIClient
from core.database import async_session
from core.utils import setup_logger
from models.briefing import BriefingTopic, MorningBriefing
from services.briefing.clustering import LAMBDA, cluster_global
from services.briefing.comparator import (
HISTORICAL_WINDOW_DAYS,
compare_cluster_with_fallback,
historical_enabled,
retrieve_historical,
)
from services.briefing.loader import load_historical_candidates, load_night_window
from services.digest.selection import select_for_llm
logger = setup_logger("briefing_pipeline")
KST = ZoneInfo("Asia/Seoul")
NIGHT_WINDOW_HOURS = 5 # KST 00:00 ~ 05:00
SELECT_K = 7 # Plan §"Clustering 파라미터" briefing K_PER_CLUSTER=7
SELECT_LAMBDA_MMR = 0.6 # Plan briefing MMR lambda 0.6
PIPELINE_HARD_CAP = 600 # 초. Phase 4 와 동일
def _compute_window(target_date: date | None = None) -> tuple[datetime, datetime, date]:
"""target_date (KST 자정 시작일) → (window_start_utc, window_end_utc, kst_date).
target_date=None 오늘 KST.
"""
if target_date is None:
target_date = datetime.now(KST).date()
start_kst = datetime.combine(target_date, datetime.min.time(), tzinfo=KST)
end_kst = start_kst + timedelta(hours=NIGHT_WINDOW_HOURS)
return start_kst.astimezone(timezone.utc), end_kst.astimezone(timezone.utc), target_date
def _is_usable_topic(envelope: dict, topic_label: str) -> bool:
"""fallback row 가 아닌 진짜 LLM 결과인지 판정."""
if envelope.get("llm_fallback_used"):
return False
if not envelope.get("country_perspectives"):
return False
if topic_label == "주요 뉴스 묶음":
return False
return True
def _compute_status(llm_calls: int, fallback_count: int, usable_count: int, has_topics: bool) -> str:
"""Plan §"Status 4-state 판정표"."""
if not has_topics or llm_calls == 0:
return "empty"
if usable_count == 0:
return "failed"
fallback_pct = (fallback_count / llm_calls) if llm_calls else 0.0
if fallback_pct >= 0.5:
return "failed"
if fallback_count > 0 or usable_count < llm_calls:
return "partial"
return "success"
def _build_topic_row(
rank: int,
cluster: dict,
envelope: dict,
historical_docs: list[dict] | None,
primary_model: str,
) -> BriefingTopic:
historical_ids = None
historical_window = None
if historical_enabled():
historical_ids = [d["id"] for d in (historical_docs or [])]
historical_window = HISTORICAL_WINDOW_DAYS
return BriefingTopic(
topic_rank=rank,
topic_label=envelope["topic_label"],
headline=envelope["headline"],
country_perspectives=envelope["country_perspectives"],
divergences=envelope["divergences"],
convergences=envelope["convergences"],
key_quotes=envelope["key_quotes"],
historical_article_ids=historical_ids,
historical_context=envelope.get("historical_context"),
historical_window_days=historical_window,
cluster_members=[m["id"] for m in cluster["members"]],
article_count=len(cluster["members"]),
country_count=cluster.get("country_count", 0),
importance_score=cluster.get("importance_score", 0.0),
raw_weight_sum=cluster.get("raw_weight_sum", 0.0),
llm_model=primary_model,
llm_fallback_used=envelope.get("llm_fallback_used", False),
)
async def _save_briefing(
briefing_date: date,
window_start: datetime,
window_end: datetime,
total_articles: int,
total_countries: int,
topic_rows: list[BriefingTopic],
llm_calls: int,
llm_failures: int,
generation_ms: int,
status: str,
) -> int:
"""briefing_date UNIQUE 충돌은 DELETE+INSERT transaction 으로 처리."""
async with async_session() as session:
await session.execute(
delete(MorningBriefing).where(MorningBriefing.briefing_date == briefing_date)
)
new = MorningBriefing(
briefing_date=briefing_date,
window_start=window_start,
window_end=window_end,
decay_lambda=LAMBDA,
total_articles=total_articles,
total_countries=total_countries,
total_topics=len(topic_rows),
generation_ms=generation_ms,
llm_calls=llm_calls,
llm_failures=llm_failures,
status=status,
)
new.topics = topic_rows
session.add(new)
await session.commit()
return new.id
async def run_briefing_pipeline(target_date: date | None = None) -> dict[str, Any]:
"""야간 뉴스 브리핑 1회 실행. cron 또는 수동 regenerate API 에서 호출.
Returns:
{briefing_id, status, total_topics, total_articles, llm_calls, llm_failures, generation_ms, regenerated}
"""
start = time.time()
window_start, window_end, briefing_date = _compute_window(target_date)
logger.info(
f"[briefing] start date={briefing_date} window {window_start} ~ {window_end} "
f"decay_lambda={LAMBDA:.4f} historical={'on' if historical_enabled() else 'off'}"
)
# 1. Load night window
docs = await load_night_window(window_start, window_end)
total_articles = len(docs)
total_countries_in_window = len({d["country"] for d in docs})
# 2. Cluster (topic-first)
clusters = cluster_global(docs)
if not clusters:
briefing_id = await _save_briefing(
briefing_date=briefing_date,
window_start=window_start,
window_end=window_end,
total_articles=total_articles,
total_countries=total_countries_in_window,
topic_rows=[],
llm_calls=0,
llm_failures=0,
generation_ms=int((time.time() - start) * 1000),
status="empty",
)
logger.info(f"[briefing] empty (no usable clusters) → briefing_id={briefing_id}")
return {
"briefing_id": briefing_id,
"status": "empty",
"total_topics": 0,
"total_articles": total_articles,
"llm_calls": 0,
"llm_failures": 0,
"generation_ms": int((time.time() - start) * 1000),
"regenerated": True,
}
# 3. (옵션) Historical candidate pool 1회 로드
historical_candidates: list[dict] = []
if historical_enabled():
hist_end = window_start # 오늘 윈도우 직전까지
hist_start = hist_end - timedelta(days=HISTORICAL_WINDOW_DAYS)
exclude = {d["id"] for d in docs}
historical_candidates = await load_historical_candidates(hist_start, hist_end, exclude)
# 4. cluster 별 LLM 호출
client = AIClient()
primary_model = client.ai.primary.model
topic_rows: list[BriefingTopic] = []
llm_calls = 0
llm_failures = 0
usable_count = 0
try:
for rank, cluster in enumerate(clusters, start=1):
selected = select_for_llm(cluster, k=SELECT_K, lambda_mmr=SELECT_LAMBDA_MMR)
historical_docs = (
retrieve_historical(cluster, historical_candidates)
if historical_enabled() else []
)
llm_calls += 1
envelope = await compare_cluster_with_fallback(
client, cluster, selected, historical_docs=historical_docs
)
if envelope.get("llm_fallback_used"):
llm_failures += 1
if _is_usable_topic(envelope, envelope["topic_label"]):
usable_count += 1
topic_rows.append(
_build_topic_row(rank, cluster, envelope, historical_docs, primary_model)
)
finally:
await client.close()
generation_ms = int((time.time() - start) * 1000)
status = _compute_status(llm_calls, llm_failures, usable_count, has_topics=bool(topic_rows))
briefing_id = await _save_briefing(
briefing_date=briefing_date,
window_start=window_start,
window_end=window_end,
total_articles=total_articles,
total_countries=total_countries_in_window,
topic_rows=topic_rows,
llm_calls=llm_calls,
llm_failures=llm_failures,
generation_ms=generation_ms,
status=status,
)
fallback_pct = (llm_failures / llm_calls * 100.0) if llm_calls else 0.0
logger.info(
f"[briefing] done id={briefing_id} status={status} topics={len(topic_rows)} "
f"usable={usable_count}/{llm_calls} fallback={llm_failures}/{llm_calls} ({fallback_pct:.1f}%) "
f"elapsed={generation_ms / 1000:.1f}s"
)
return {
"briefing_id": briefing_id,
"status": status,
"total_topics": len(topic_rows),
"total_articles": total_articles,
"llm_calls": llm_calls,
"llm_failures": llm_failures,
"generation_ms": generation_ms,
"regenerated": True,
}
-124
View File
@@ -1,124 +0,0 @@
"""Cluster 알고리즘 공통 util — digest(country×topic) / briefing(topic×country) 양쪽이 import.
추출 원칙:
- digest.clustering.cluster_country / briefing.clustering.cluster_global country 축은 caller 책임.
- 모듈은 docs list (이미 분류된 슬라이스 또는 전체) 대한 순수 greedy assign + normalize.
- LAMBDA / threshold / EMA alpha / MIN_ARTICLES caller 주입 (Phase 4 = 3 / Briefing = 2시간 ).
"""
import math
from datetime import datetime, timezone
import numpy as np
SCORE_FLOOR = 0.01
def normalize_vector(v: np.ndarray) -> np.ndarray:
norm = float(np.linalg.norm(v))
if norm == 0.0:
return v
return v / norm
def time_decay_weight(now: datetime, created_at: datetime, lambda_val: float) -> float:
"""exp(-λ · days_ago). created_at naive → UTC 가정."""
if created_at.tzinfo is None:
created_at = created_at.replace(tzinfo=timezone.utc)
days = (now - created_at).total_seconds() / 86400.0
if days < 0:
days = 0.0
return math.exp(-lambda_val * days)
def adaptive_threshold_by_density(
n_docs: int,
*,
low_n: int = 50,
high_n: int = 200,
t_low: float = 0.75,
t_mid: float = 0.78,
t_high: float = 0.80,
) -> float:
"""문서 밀도 기반 동적 threshold — fragmentation / blob 동시 방어."""
if n_docs > high_n:
return t_high
if n_docs < low_n:
return t_low
return t_mid
def greedy_assign_cluster(
docs: list[dict],
*,
threshold: float,
centroid_alpha: float = 0.7,
min_articles: int = 3,
max_topics: int = 10,
now: datetime | None = None,
lambda_val: float,
) -> tuple[list[dict], int]:
"""time-decay weight 적용 + greedy cosine assign + EMA centroid + MIN drop.
Args:
docs: [{embedding: np.ndarray, created_at: datetime, ...}]. 함수가 in-place `weight` 추가.
threshold: cosine 유사도 cluster 병합 임계.
centroid_alpha: EMA 계수 (0.7 = 기존 70% 유지).
min_articles: cluster 최소 article (미만 drop).
max_topics: 상위 cluster 보존 개수.
now: 기준 시각 (default = datetime.now(UTC)).
lambda_val: time-decay λ (caller 윈도우 폭에 맞게 주입).
Returns:
(clusters, raw_cluster_count_before_drop)
clusters = [{centroid, members, weight_sum, raw_weight_sum, importance_score}, ...]
"""
if not docs:
return [], 0
now = now or datetime.now(timezone.utc)
for d in docs:
d["weight"] = time_decay_weight(now, d["created_at"], lambda_val)
docs_sorted = sorted(docs, key=lambda d: -d["weight"])
clusters: list[dict] = []
for d in docs_sorted:
v = normalize_vector(d["embedding"])
best_idx, best_sim = -1, 0.0
for i, c in enumerate(clusters):
sim = float(np.dot(c["centroid"], v))
if sim > best_sim and sim >= threshold:
best_sim, best_idx = sim, i
if best_idx >= 0:
c = clusters[best_idx]
c["centroid"] = centroid_alpha * c["centroid"] + (1.0 - centroid_alpha) * v
c["centroid"] = normalize_vector(c["centroid"])
c["members"].append(d)
c["weight_sum"] += d["weight"]
else:
clusters.append({
"centroid": v,
"members": [d],
"weight_sum": d["weight"],
})
raw_count = len(clusters)
clusters = [c for c in clusters if len(c["members"]) >= min_articles]
clusters.sort(key=lambda c: -c["weight_sum"])
clusters = clusters[:max_topics]
normalize_importance_scores(clusters)
return clusters, raw_count
def normalize_importance_scores(clusters: list[dict], *, floor: float = SCORE_FLOOR) -> None:
"""cluster.weight_sum 을 0~1 로 정규화 + floor. in-place. raw_weight_sum 보존."""
if not clusters:
return
max_w = max(c["weight_sum"] for c in clusters)
for c in clusters:
normalized = (c["weight_sum"] / max_w) if max_w > 0 else 0.0
c["raw_weight_sum"] = c["weight_sum"]
c["importance_score"] = max(normalized, floor)
+84 -18
View File
@@ -1,16 +1,20 @@
"""Phase 4 Global Digest — country 내 topic cluster (time-decay + EMA + adaptive threshold).
"""Time-decay weight + adaptive threshold + EMA centroid greedy clustering.
알고리즘 코어는 `app/services/clustering_common.py` 추출되어 briefing 모듈과 공유.
파일은 Phase 4 고유 파라미터 (LAMBDA = ln(2)/3 , MIN 3, MAX 10) country 호출만 담당.
플랜의 핵심 결정:
- λ = ln(2)/3 (3 반감기)
- threshold: 0.75 / 0.78 / 0.80 (밀도 기반 adaptive)
- centroid: EMA α=0.7 (단순 평균의 seed bias / drift 방어)
- min_articles_per_topic = 3, max_topics_per_country = 10
- importance_score: country 0~1 normalize + max(score, 0.01) floor
- raw_weight_sum 별도 보존 (cross-day 트렌드 분석용)
"""
import math
from datetime import datetime, timezone
import numpy as np
from core.utils import setup_logger
from services.clustering_common import (
adaptive_threshold_by_density,
greedy_assign_cluster,
)
logger = setup_logger("digest_clustering")
@@ -18,32 +22,94 @@ LAMBDA = math.log(2) / 3 # 3일 반감기 — 사용자 확정값
CENTROID_ALPHA = 0.7 # EMA: 기존 중심 70% 유지, 새 멤버 30% 반영
MIN_ARTICLES_PER_TOPIC = 3
MAX_TOPICS_PER_COUNTRY = 10
SCORE_FLOOR = 0.01 # UI 0 표시 문제 사전 차단
def adaptive_threshold(n_docs: int) -> float:
"""Phase 4 임계 (0.75 / 0.78 / 0.80). 외부 import 호환용 alias."""
return adaptive_threshold_by_density(n_docs)
"""문서 밀도 기반 동적 threshold — fragmentation/blob 동시 방어."""
if n_docs > 200:
return 0.80
if n_docs < 50:
return 0.75
return 0.78
def _normalize(v: np.ndarray) -> np.ndarray:
norm = float(np.linalg.norm(v))
if norm == 0.0:
return v
return v / norm
def _decay_weight(now: datetime, created_at: datetime) -> float:
"""exp(-λ * days_ago). created_at 이 naive 면 UTC 가정."""
if created_at.tzinfo is None:
created_at = created_at.replace(tzinfo=timezone.utc)
days = (now - created_at).total_seconds() / 86400.0
if days < 0:
days = 0.0
return math.exp(-LAMBDA * days)
def cluster_country(country: str, docs: list[dict]) -> list[dict]:
"""단일 country 의 docs 를 cluster 로 묶어 정렬 + normalize 후 반환.
공통 util `greedy_assign_cluster` 위에 country 라벨 로깅만 추가.
Args:
country: 국가 코드 (KR, US, ...)
docs: loader.load_news_window 출력 (단일 country 슬라이스)
Returns:
[{centroid, members, weight_sum, raw_weight_sum, importance_score}, ...]
- members weight 채워진 doc dict 리스트
- 정렬: importance_score 내림차순, 최대 MAX_TOPICS_PER_COUNTRY
"""
if not docs:
logger.info(f"[{country}] docs=0 → skip")
return []
threshold = adaptive_threshold(len(docs))
clusters, raw_count = greedy_assign_cluster(
docs,
threshold=threshold,
centroid_alpha=CENTROID_ALPHA,
min_articles=MIN_ARTICLES_PER_TOPIC,
max_topics=MAX_TOPICS_PER_COUNTRY,
lambda_val=LAMBDA,
)
now = datetime.now(timezone.utc)
# time-decay weight 계산 + 가중치 높은 순으로 seed 우선
for d in docs:
d["weight"] = _decay_weight(now, d["created_at"])
docs.sort(key=lambda d: -d["weight"])
clusters: list[dict] = []
for d in docs:
v = _normalize(d["embedding"])
best_idx, best_sim = -1, 0.0
for i, c in enumerate(clusters):
sim = float(np.dot(c["centroid"], v))
if sim > best_sim and sim >= threshold:
best_sim, best_idx = sim, i
if best_idx >= 0:
c = clusters[best_idx]
# EMA centroid update — drift 방지
c["centroid"] = CENTROID_ALPHA * c["centroid"] + (1.0 - CENTROID_ALPHA) * v
c["centroid"] = _normalize(c["centroid"])
c["members"].append(d)
c["weight_sum"] += d["weight"]
else:
clusters.append({
"centroid": v,
"members": [d],
"weight_sum": d["weight"],
})
raw_count = len(clusters)
clusters = [c for c in clusters if len(c["members"]) >= MIN_ARTICLES_PER_TOPIC]
dropped = raw_count - len(clusters)
clusters.sort(key=lambda c: -c["weight_sum"])
clusters = clusters[:MAX_TOPICS_PER_COUNTRY]
# country 내 normalize (0~1) + floor
if clusters:
max_w = max(c["weight_sum"] for c in clusters)
for c in clusters:
normalized = (c["weight_sum"] / max_w) if max_w > 0 else 0.0
c["raw_weight_sum"] = c["weight_sum"]
c["importance_score"] = max(normalized, SCORE_FLOOR)
logger.info(
f"[{country}] docs={len(docs)} threshold={threshold} "
+14 -15
View File
@@ -6,27 +6,24 @@ ai_summary 길이는 LLM 토큰 보호를 위해 SUMMARY_TRUNCATE 로 제한.
import numpy as np
from services.clustering_common import normalize_vector as _normalize
K_PER_CLUSTER = 5
LAMBDA_MMR = 0.7 # relevance 70% / diversity 30%
SUMMARY_TRUNCATE = 300 # long tail ai_summary 방어
def select_for_llm(
cluster: dict,
k: int = K_PER_CLUSTER,
*,
lambda_mmr: float = LAMBDA_MMR,
summary_truncate: int = SUMMARY_TRUNCATE,
) -> list[dict]:
def _normalize(v: np.ndarray) -> np.ndarray:
norm = float(np.linalg.norm(v))
if norm == 0.0:
return v
return v / norm
def select_for_llm(cluster: dict, k: int = K_PER_CLUSTER) -> list[dict]:
"""cluster 내 LLM 호출용 대표 article 들 선정.
Args:
cluster: clustering.cluster_country / briefing.cluster_global 결과 단일 cluster
k: 선정 개수 (Phase 4=5, briefing=7)
lambda_mmr: relevance vs diversity (Phase 4=0.7, briefing=0.6)
summary_truncate: ai_summary 자르기 길이 (LLM 토큰 보호)
cluster: clustering.cluster_country 결과 단일 cluster
k: 선정 개수 (기본 5)
Returns:
선정된 doc dict 리스트. 항목에 ai_summary_truncated 필드가 추가됨.
@@ -36,6 +33,7 @@ def select_for_llm(
selected = list(members)
else:
centroid = cluster["centroid"]
# relevance = centroid 유사도 × decay weight
for m in members:
v = _normalize(m["embedding"])
m["_rel"] = float(np.dot(centroid, v)) * m["weight"]
@@ -51,13 +49,14 @@ def select_for_llm(
float(np.dot(v, _normalize(s["embedding"])))
for s in selected
)
return lambda_mmr * c["_rel"] - (1.0 - lambda_mmr) * max_sim
return LAMBDA_MMR * c["_rel"] - (1.0 - LAMBDA_MMR) * max_sim
pick = max(candidates, key=mmr_score)
selected.append(pick)
candidates.remove(pick)
# LLM 입력 토큰 보호
for m in selected:
m["ai_summary_truncated"] = (m.get("ai_summary") or "")[:summary_truncate]
m["ai_summary_truncated"] = (m.get("ai_summary") or "")[:SUMMARY_TRUNCATE]
return selected
-43
View File
@@ -1,43 +0,0 @@
"""Morning Briefing 워커 — 야간 수집 뉴스 (KST 00:00~05:00) topic×country 비교 분석.
- APScheduler cron (매일 05:10 KST, PR-3 에서 등록) + 수동 호출 공용 진입점
- PIPELINE_HARD_CAP = 600 hard cap 으로 cron stuck 절대 방지
- 단독 실행: `python -m workers.briefing_worker`
"""
import asyncio
from datetime import date
from core.utils import setup_logger
from services.briefing.pipeline import run_briefing_pipeline
logger = setup_logger("briefing_worker")
PIPELINE_HARD_CAP = 600
async def run(target_date: date | None = None) -> dict | None:
"""APScheduler + 수동 호출 공용 진입점.
Args:
target_date: KST 기준 briefing_date (None = 오늘). API regenerate 명시 지정 가능.
"""
try:
result = await asyncio.wait_for(
run_briefing_pipeline(target_date),
timeout=PIPELINE_HARD_CAP,
)
logger.info(f"[briefing] 워커 완료: {result}")
return result
except asyncio.TimeoutError:
logger.error(
f"[briefing] HARD CAP {PIPELINE_HARD_CAP}s 초과 — 워커 강제 중단. "
f"기존 briefing 은 commit 시점에만 갱신되므로 그대로 유지됨."
)
except Exception as e:
logger.exception(f"[briefing] 워커 실패: {e}")
return None
if __name__ == "__main__":
asyncio.run(run())
+4 -113
View File
@@ -21,12 +21,9 @@ PR-B B-1 tier triage (신규, 4B gemma Ollama):
from __future__ import annotations
import json
import re
import time
from datetime import datetime, timezone
import yaml
from pydantic import BaseModel, Field, ValidationError
from sqlalchemy import text as sql_text
from sqlalchemy.ext.asyncio import AsyncSession
@@ -87,11 +84,6 @@ class TriageOutput(BaseModel):
escalate_to_26b: bool = False
risk_flags: list[str] = Field(default_factory=list)
# Memo Intake Upgrade PR-2B — 메모 의도 분류 hint (선택 응답)
# 4B 가 출력하지 않으면 None 유지. AI 자동 events 생성 X (사용자 promote 시점만).
event_kind_hint: str | None = None # 'note' | 'task' | 'calendar_event' | 'activity_log' | 'reference'
event_kind_confidence: float | None = None # 0.01.0
# ───────────────────────── legacy classify (primary) ──────────────────
@@ -309,42 +301,8 @@ def _distill(triage_out: TriageOutput, limit: int = 2000) -> str:
return "\n".join(parts)[:limit]
# ───────────────────── frontmatter 파싱 (옵션 C) ──────────────────────
# YAML frontmatter (--- ... ---) + body 분리. body 가 없거나 frontmatter 가 형식 오류여도 안전하게 fallback.
_FM_PATTERN = re.compile("^---\\s*\\n(.*?)\\n---\\s*\\n?(.*)$", re.DOTALL)
def _parse_frontmatter(extracted_text: str) -> tuple[dict, str]:
"""extracted_text 시작에 YAML frontmatter 가 있으면 (frontmatter_dict, body) 반환.
없으면 ({}, extracted_text). YAML 파싱 실패 시도 ({}, extracted_text) 안전 fallback.
"""
if not extracted_text or not extracted_text.startswith("---"):
return {}, extracted_text
m = _FM_PATTERN.match(extracted_text)
if not m:
return {}, extracted_text
fm_text, body = m.group(1), m.group(2)
try:
fm = yaml.safe_load(fm_text)
if not isinstance(fm, dict):
return {}, extracted_text
return fm, body
except yaml.YAMLError:
return {}, extracted_text
# frontmatter 우선 인식: code/section/source_pdf/source_pages/source_basis/verified_level/verification_pending
# 등 원문 추적 메타데이터는 LLM 이 절대 덮어쓰지 못하게 차단.
_FRONTMATTER_PRESERVED_KEYS = {
"code", "section", "source_pdf", "source_pages", "source_basis",
"verified_level", "verification_pending", "source_type", "kgs_code",
}
# ───────────────────────── main process ────────────────────────────────
async def process(document_id: int, session: AsyncSession) -> None:
"""문서 분류 + 요약 + tier triage.
@@ -376,59 +334,6 @@ async def process(document_id: int, session: AsyncSession) -> None:
if not doc.extracted_text:
raise ValueError(f"문서 ID {document_id}: extracted_text가 비어있음")
# ─── 옵션 C: markdown frontmatter 우선 인식 ───────────────────────────
# KGS Code 등 외부 작성 마크다운은 frontmatter 에 정확한 메타가 있다.
# title / tags / ai_summary / ai_domain 은 frontmatter 에 있으면 그대로 사용,
# 없는 필드만 LLM 호출. code/section/source_pages/verified_level 등 원문
# 추적 메타는 documents.md_frontmatter JSONB 에 보존하고 LLM 이 덮어쓰지 못하게 한다.
fm, body = _parse_frontmatter(doc.extracted_text)
if fm:
# frontmatter 전체를 md_frontmatter JSONB 에 저장 (원문 추적용 single source)
doc.md_frontmatter = fm
# 우선 반영 (LLM 보다 신뢰도 높음, frontmatter 가 authoritative)
if fm.get("title"):
doc.title = str(fm["title"])
fm_tags = fm.get("tags")
if isinstance(fm_tags, list) and fm_tags:
# ai_tags 에 frontmatter 태그 우선 적재 (LLM 이 추가만 가능)
doc.ai_tags = [str(t) for t in fm_tags]
if fm.get("ai_domain"):
doc.ai_domain = str(fm["ai_domain"])
parts = doc.ai_domain.split("/")
if len(parts) > 1 and not doc.ai_sub_group:
doc.ai_sub_group = parts[1]
if fm.get("ai_sub_group"):
doc.ai_sub_group = str(fm["ai_sub_group"])
if fm.get("document_type"):
doc.document_type = str(fm["document_type"])
if fm.get("ai_summary"):
doc.ai_summary = str(fm["ai_summary"])
if fm.get("importance") in ("high", "medium", "low"):
doc.importance = fm["importance"]
# 핵심 메타 (title + ai_domain + ai_summary) 가 모두 frontmatter 로 채워졌으면
# LLM classify/summarize 스킵. tier triage 도 스킵 (frontmatter 가 더 정확).
# frontmatter 미커버 필드는 그대로 두어 향후 필요 시 manual UI 채움.
if doc.title and doc.ai_domain and doc.ai_summary:
if not doc.ai_confidence:
doc.ai_confidence = 1.0 # frontmatter 는 사람이 작성한 단정값
doc.ai_processed_at = datetime.now(timezone.utc)
doc.ai_model_version = "frontmatter@manual"
await session.commit()
logger.info(f"doc {document_id}: frontmatter 옵션 C → classify/summarize/triage 전부 skip")
return
# 일부만 frontmatter 에 있을 때는 LLM 으로 미설정 필드 보완. 단 _FRONTMATTER_PRESERVED_KEYS
# 는 이미 md_frontmatter 에 있으므로 LLM 이 ai_domain/document_type 등에 영향 못 준다.
logger.info(f"doc {document_id}: frontmatter 부분 인식 → LLM 으로 미설정 필드 보완")
client = AIClient()
try:
# ─── 1. Legacy classify (primary 26B) ───
@@ -439,19 +344,17 @@ async def process(document_id: int, session: AsyncSession) -> None:
if not parsed:
raise ValueError(f"AI 응답에서 JSON 추출 실패: {raw_response[:200]}")
# domain 검증 (frontmatter 가 이미 채웠으면 LLM 결과 무시)
# domain 검증
domain = _validate_domain(parsed.get("domain", ""))
if not doc.ai_domain:
doc.ai_domain = domain
doc.ai_domain = domain
# sub_group은 domain 경로에서 추출 (호환성)
parts = domain.split("/")
doc.ai_sub_group = parts[1] if len(parts) > 1 else ""
# document_type 검증 (frontmatter 가 이미 채웠으면 LLM 결과 무시)
# document_type 검증
doc_type = parsed.get("document_type", "")
if not doc.document_type:
doc.document_type = doc_type if doc_type in DOCUMENT_TYPES else "Note"
doc.document_type = doc_type if doc_type in DOCUMENT_TYPES else "Note"
# confidence
confidence = parsed.get("confidence", 0.5)
@@ -661,18 +564,6 @@ async def _apply_triage_result(
if not parse_error:
doc.ai_tldr = (triage_out.tldr or "").strip() or None
doc.ai_bullets = triage_out.bullets or []
# Memo Intake Upgrade PR-2B — event kind hint (4B 가 출력했을 때만)
# 허용 enum 외 값이면 무시 (DB enum 제약). AI worker 는 events row 직접 생성 X.
valid_kinds = {"note", "task", "calendar_event", "activity_log", "reference"}
hint = (triage_out.event_kind_hint or "").strip().lower() or None
if hint in valid_kinds:
doc.ai_event_kind = hint
try:
conf = triage_out.event_kind_confidence
if conf is not None and 0.0 <= float(conf) <= 1.0:
doc.ai_event_confidence = float(conf)
except (TypeError, ValueError):
pass
doc.ai_analysis_tier = "triage"
# R2 — backlog guard (hard 제외 soft escalate 만 억제)
+1 -19
View File
@@ -31,9 +31,6 @@ AUDIO_EXTS = {".mp3", ".m4a", ".opus", ".wav", ".flac", ".ogg"}
VIDEO_DIRECT_EXTS = {".mp4", ".webm"} # 브라우저 direct play
VIDEO_QUARANTINE_EXTS = {".mov", ".mkv", ".avi"} # 변환 필요, 보관만
# library (외부 작성 학습 자료) 폴더 — md/pdf/docx 등 문서 확장자만 수락
LIBRARY_DOC_EXTS = {".md", ".pdf", ".docx", ".doc", ".txt", ".rtf", ".html", ".odt"}
# 스캔 대상: (하위경로, 예상 category) — None 은 문서함(카테고리 미지정)
SCAN_TARGETS: list[tuple[str, str | None]] = [
("Inbox", None),
@@ -80,15 +77,6 @@ def _route_media(path: Path, expected_category: str | None) -> tuple[str | None,
return ("video", True, None)
return (None, False, None) # 기타 → skip
if expected_category == "library":
# 외부 작성 학습 자료 (KGS Code, 시행규칙 등). 문서 확장자만 수락.
# frontmatter 해석은 classify_worker (옵션 C) 가 담당. file_watcher 는 라우팅만.
if ext in LIBRARY_DOC_EXTS:
return ("library", False, "extract")
if ext in AUDIO_EXTS or ext in VIDEO_DIRECT_EXTS or ext in VIDEO_QUARANTINE_EXTS:
return (None, False, None) # audio/video 잘못 들어오면 skip
return (None, False, None) # 기타 알 수 없는 확장자 skip
# Inbox: 문서 파이프 (기존). audio/video 확장자가 실수로 여기 들어오면 skip.
if ext in AUDIO_EXTS or ext in VIDEO_DIRECT_EXTS or ext in VIDEO_QUARANTINE_EXTS:
return (None, False, None)
@@ -104,14 +92,8 @@ async def watch_inbox():
new_count = 0
changed_count = 0
# 동적 스캔 대상 합성: 기본 (Inbox/Recordings/Videos) + env 로 확장된 library 경로
# settings.additional_watch_targets 는 PKM 상대 경로 리스트 (예: "Knowledge/Industrial_Safety/가스기사/KGS_Code")
targets = list(SCAN_TARGETS)
for extra_path in settings.additional_watch_targets:
targets.append((extra_path, "library"))
async with async_session() as session:
for sub, expected_category in targets:
for sub, expected_category in SCAN_TARGETS:
scan_root = pkm_root / sub
if not scan_root.exists():
continue
-337
View File
@@ -1,337 +0,0 @@
"""inbox@hyungi.net IMAP ingest 워커 — PR-4 Email Ingest.
Plan: ~/.claude/plans/document-enchanted-candy.md
MailPlus `DocumentServer/Ingest` 폴더 (또는 사용자 지정 폴더) 5 cron 으로
polling. 메일을 source_channel='email' memo 생성하고 classify queue 등록.
기존 mailplus_archive (INBOX root archive) 폴더 분리 + source_external_id dedup 으로
중복 .
정책 (사용자 라운드 2026-05-12):
- email_ingest events row 직접 생성 X. memo 만들고 사용자 promote events 생성.
- source_external_id always non-null (Message-ID 정규화 또는 imap:{folder}:{uidvalidity}:{uid}).
- DB unique index dedup 진실원장. \\Seen flag best-effort.
- 첨부는 metadata , NAS 연동은 PR.
"""
import asyncio
import email
import hashlib
import imaplib
import os
import re
from datetime import datetime, timedelta, timezone
from email.header import decode_header
from email.utils import parsedate_to_datetime
from typing import Any
import httpx
from sqlalchemy import select
from sqlalchemy.dialects.postgresql import insert as pg_insert
from core.database import async_session
from core.utils import setup_logger
from models.automation import AutomationState
from models.document import Document
from models.queue import enqueue_stage
logger = setup_logger("inbox_ingest")
_TAG_RE = re.compile(r"<[^>]+>")
_WS_RE = re.compile(r"[ \t]+")
_NL_RE = re.compile(r"\n{3,}")
_MID_BRACKETS = re.compile(r"^<|>$")
def _decode_mime_header(raw: str | None) -> str:
if not raw:
return ""
parts = decode_header(raw)
out = []
for data, charset in parts:
if isinstance(data, bytes):
out.append(data.decode(charset or "utf-8", errors="replace"))
else:
out.append(data)
return "".join(out)
def _normalize_message_id(raw: str | None) -> str | None:
if not raw:
return None
mid = raw.strip()
if not mid:
return None
mid = _MID_BRACKETS.sub("", mid).strip()
return mid.lower() or None
def _build_source_external_id(message_id: str | None, folder: str, uidvalidity: str, uid: int) -> str:
if message_id:
return message_id
return f"imap:{folder}:{uidvalidity}:{uid}"
def _strip_html(html: str) -> str:
text = _TAG_RE.sub("", html)
text = text.replace("&nbsp;", " ").replace("&amp;", "&").replace("&lt;", "<").replace("&gt;", ">").replace("&quot;", '"').replace("&#39;", "'")
text = _WS_RE.sub(" ", text)
text = _NL_RE.sub("\n\n", text)
return text.strip()
def _extract_body(msg: email.message.Message) -> str:
text_part = None
html_part = None
if msg.is_multipart():
for part in msg.walk():
ctype = part.get_content_type()
if part.get("Content-Disposition", "").startswith("attachment"):
continue
if ctype == "text/plain" and text_part is None:
text_part = part
elif ctype == "text/html" and html_part is None:
html_part = part
else:
ctype = msg.get_content_type()
if ctype == "text/plain":
text_part = msg
elif ctype == "text/html":
html_part = msg
target = text_part or html_part
if target is None:
return ""
try:
payload = target.get_payload(decode=True)
if payload is None:
return ""
charset = target.get_content_charset() or "utf-8"
body = payload.decode(charset, errors="replace")
except Exception as e:
logger.warning(f"[inbox_ingest] body decode 실패: {e}")
return ""
if target.get_content_type() == "text/html":
body = _strip_html(body)
return body.strip()
def _extract_attachments_meta(msg: email.message.Message) -> list[dict]:
out: list[dict] = []
if not msg.is_multipart():
return out
for idx, part in enumerate(msg.walk()):
disp = part.get("Content-Disposition", "")
if not disp.startswith("attachment") and not part.get_filename():
continue
filename = _decode_mime_header(part.get_filename()) or f"unnamed_{idx}"
size: int | None = None
try:
payload = part.get_payload(decode=True)
if payload is not None:
size = len(payload)
except Exception:
size = None
out.append({
"filename": filename,
"mime": part.get_content_type(),
"size": size,
"part_id": idx,
})
return out
def _parse_received_at(raw: str | None) -> str | None:
if not raw:
return None
try:
dt = parsedate_to_datetime(raw)
if dt.tzinfo is None:
dt = dt.replace(tzinfo=timezone.utc)
return dt.astimezone(timezone.utc).isoformat()
except Exception:
return None
def _fetch_uids_sync(host: str, port: int, user: str, password: str, folder: str, since_days: int) -> tuple[str, list[tuple[int, bytes]]]:
"""동기 IMAP fetch — 폴더 선택 + UID SEARCH SINCE + RFC822 fetch.
Returns (uidvalidity_str, [(uid, raw_bytes), ...]).
"""
conn = imaplib.IMAP4_SSL(host, port, timeout=30)
try:
conn.login(user, password)
typ, _ = conn.select(f'"{folder}"')
if typ != "OK":
raise RuntimeError(f"folder select 실패: {folder}")
# uidvalidity 추출 (STATUS 또는 SELECT 응답)
typ, status_data = conn.status(f'"{folder}"', "(UIDVALIDITY)")
uidvalidity = "0"
if typ == "OK" and status_data:
m = re.search(rb"UIDVALIDITY (\d+)", status_data[0])
if m:
uidvalidity = m.group(1).decode()
since = (datetime.now(timezone.utc) - timedelta(days=since_days)).strftime("%d-%b-%Y")
typ, data = conn.uid("search", None, f"SINCE {since}")
if typ != "OK":
return uidvalidity, []
uids = data[0].split() if data and data[0] else []
results = []
for uid_bytes in uids:
uid = int(uid_bytes)
typ, msg_data = conn.uid("fetch", uid_bytes, "(RFC822)")
if typ != "OK" or not msg_data or msg_data[0] is None:
continue
raw = msg_data[0][1]
if isinstance(raw, bytes):
results.append((uid, raw))
return uidvalidity, results
finally:
try:
conn.logout()
except Exception:
pass
async def _heartbeat(url: str | None, success: bool):
if not url:
return
target = url if success else f"{url.rstrip('/')}/fail"
try:
async with httpx.AsyncClient(timeout=10) as c:
await c.get(target)
except Exception as e:
logger.debug(f"[inbox_ingest] HC heartbeat 실패: {e}")
async def run():
"""5분 cron 진입점 — DocumentServer/Ingest 폴더 polling + memo 생성."""
enabled = os.getenv("INBOX_INGEST_ENABLED", "false").lower() == "true"
if not enabled:
logger.debug("[inbox_ingest] INBOX_INGEST_ENABLED=false, skip")
return
host = os.getenv("MAILPLUS_HOST", "")
port = int(os.getenv("MAILPLUS_PORT", "993"))
user = os.getenv("MAILPLUS_USER", "")
password = os.getenv("MAILPLUS_PASS", "")
folder = os.getenv("INBOX_INGEST_FOLDER", "DocumentServer/Ingest")
since_days = int(os.getenv("INBOX_INGEST_DAYS", "14"))
hc_url = os.getenv("INBOX_INGEST_HC_URL", "").strip() or None
if not all([host, user, password]):
logger.warning("[inbox_ingest] MailPlus 인증 정보 미설정 — skip")
return
try:
uidvalidity, emails = await asyncio.to_thread(
_fetch_uids_sync, host, port, user, password, folder, since_days,
)
except Exception as e:
logger.error(f"[inbox_ingest] IMAP fetch 실패: {e}")
await _heartbeat(hc_url, success=False)
return
if not emails:
logger.info("[inbox_ingest] 새 메일 0건")
await _heartbeat(hc_url, success=True)
return
created = 0
skipped = 0
parse_failed = 0
async with async_session() as session:
for uid, raw_bytes in emails:
try:
msg = email.message_from_bytes(raw_bytes)
message_id = _normalize_message_id(msg.get("Message-ID"))
source_external_id = _build_source_external_id(message_id, folder, uidvalidity, uid)
# ON CONFLICT DO NOTHING (DB unique 진실원장)
stmt = (
pg_insert(Document)
.values(
file_path=None,
file_hash=hashlib.sha256(raw_bytes).hexdigest(),
file_format="eml",
file_size=len(raw_bytes),
file_type="note",
source_channel="email",
source_external_id=source_external_id,
# 이메일 본문/제목/metadata 는 아래에서 채움 (placeholder 로 일단 row 생성)
title=_decode_mime_header(msg.get("Subject"))[:500] or "(제목없음)",
extracted_text="",
email_metadata={},
)
.on_conflict_do_nothing(
index_elements=["source_external_id"],
index_where="source_channel = 'email' AND source_external_id IS NOT NULL",
)
.returning(Document.id)
)
result = await session.execute(stmt)
row = result.first()
if row is None:
skipped += 1
continue
doc_id = row[0]
# 본문/metadata parse (row 생성 후 실패 = email_metadata.parse_error 기록)
try:
body = _extract_body(msg)
attachments = _extract_attachments_meta(msg)
metadata: dict[str, Any] = {
"from": _decode_mime_header(msg.get("From")),
"to": [_decode_mime_header(a) for a in msg.get_all("To", [])],
"cc": [_decode_mime_header(a) for a in msg.get_all("Cc", [])],
"subject": _decode_mime_header(msg.get("Subject")),
"folder": folder,
"uidvalidity": uidvalidity,
"uid": uid,
"received_at": _parse_received_at(msg.get("Date")),
"attachments": attachments,
}
except Exception as parse_exc:
logger.warning(f"[inbox_ingest] doc {doc_id} parse 실패: {parse_exc}")
body = ""
metadata = {"parse_error": str(parse_exc), "folder": folder, "uidvalidity": uidvalidity, "uid": uid}
parse_failed += 1
# UPDATE 로 본문/metadata 채움
doc = (await session.execute(select(Document).where(Document.id == doc_id))).scalar_one()
doc.extracted_text = body
doc.email_metadata = metadata
# classify pipeline 진입 (4B triage fail 상태는 별 PR. ingest 자체는 정상 진행)
await enqueue_stage(session, doc_id, "classify")
created += 1
except Exception as e:
logger.error(f"[inbox_ingest] UID {uid} 처리 실패: {e}")
# automation_state 기록 (참고용, dedup 진실원장 아님)
state = await session.execute(
select(AutomationState).where(AutomationState.job_name == "inbox_ingest")
)
state_row = state.scalar_one_or_none()
now_utc = datetime.now(timezone.utc)
if state_row:
state_row.last_run_at = now_utc
state_row.last_check_value = str(uid) if emails else state_row.last_check_value
else:
session.add(AutomationState(
job_name="inbox_ingest",
last_check_value=str(uid) if emails else "0",
last_run_at=now_utc,
))
await session.commit()
logger.info(
f"[inbox_ingest] folder={folder} fetched={len(emails)} created={created} "
f"skipped(dedup)={skipped} parse_failed={parse_failed}"
)
await _heartbeat(hc_url, success=True)
+10 -242
View File
@@ -1,19 +1,13 @@
"""marker_worker — markdown stage 소비. Phase 1B + Phase 1B.5 (ImgAuth).
"""marker_worker — markdown stage 소비. Phase 1B Round 5.
플로우:
classify_worker 완료 enqueue 'markdown' stage (또는 reprocess 스크립트가 force=True enqueue)
classify_worker 완료 enqueue 'markdown' stage
marker_worker.process()
doc_type / 확장자 / page_count 가드 marker-service POST /convert
응답 이미지 NAS persist + document_images UPSERT + md_content ref 정규화
md_content 저장 또는 doc-level failed (404/422) 또는 transient raise (5xx queue retry)
이미지 저장 위치: NAS `/documents/extracted_images/{document_id}/{image_key}.{ext}`
md_content ref 형식: `![alt](docimg:img_001)` image_key sequence 기반 결정적 idempotent.
plan: ~/.claude/plans/piped-humming-crystal.md
plan: ~/.claude/plans/plan-idempotent-sundae.md
"""
import base64
import hashlib
import logging
import os
import re
@@ -22,13 +16,10 @@ from typing import Any
import fitz # PyMuPDF
import httpx
from sqlalchemy import delete, desc, select, update
from sqlalchemy.dialects.postgresql import insert as pg_insert
from sqlalchemy import update
from sqlalchemy.ext.asyncio import AsyncSession
from models.document import Document
from models.document_image import DocumentImage
from models.queue import ProcessingQueue
logger = logging.getLogger(__name__)
@@ -36,22 +27,6 @@ MARKER_ENDPOINT = "http://marker-service:3300/convert"
MARKER_TIMEOUT = 300 # 큰 PDF 5 분 한도
MAX_PAGES = 200 # 페이지 hard limit
# Phase 1B.5: 이미지 NAS persist 토글. rollback 시 false → 응답 images 무시 + md_content
# rewrite skip → placeholder card 폴백 자연 유지. 환경변수 미설정 = 기본 활성화.
MARKDOWN_IMAGE_PERSIST = os.getenv("MARKDOWN_IMAGE_PERSIST", "true").lower() in ("1", "true", "yes")
EXTRACTED_IMAGES_ROOT = Path("/documents/extracted_images")
# md_content image ref 정규식. alt + href 캡처. 외부 URL 은 보존 (slug 매칭 안 되는 경우).
_IMAGE_REF_RE = re.compile(r"!\[([^\]]*)\]\(([^)]+)\)")
_FORMAT_TO_MIME = {
"png": "image/png",
"jpeg": "image/jpeg",
"jpg": "image/jpeg",
"webp": "image/webp",
"gif": "image/gif",
}
# Phase 1B = PDF only. DOCX 등은 후속 Phase.
SUPPORTED_EXTENSIONS = {".pdf"}
@@ -112,11 +87,6 @@ async def process(document_id: int, session: AsyncSession) -> None:
logger.warning(f"[marker] document {document_id} not found")
return
# ---- (0) force_reprocess flag — reprocess 스크립트가 queue payload 로 전달 ----
force_reprocess = await _read_force_reprocess(session, document_id)
if force_reprocess:
logger.info(f"markdown_force_reprocess id={document_id}")
# ---- (1) doc_type skip ----
if doc.document_type in SKIP_DOC_TYPES:
logger.info(
@@ -197,46 +167,17 @@ async def process(document_id: int, session: AsyncSession) -> None:
return
logger.exception(f"[marker] http error id={document_id}: {exc}")
raise
except httpx.TransportError as exc:
# transport 계층 오류 — Connect/Read/Write/RemoteProtocol/Timeout 류 전부 transient.
# marker-service 응답끊김(RemoteProtocolError) / 모델 워밍업 / 큰 PDF I/O 끊김 등에서 발생.
# queue retry 대상 (max_attempts 까지 자동 재시도).
logger.warning(
f"[marker] transient error id={document_id} kind={type(exc).__name__}: {exc}"
)
except (httpx.ConnectError, httpx.TimeoutException) as exc:
logger.warning(f"[marker] transient error id={document_id}: {exc}")
raise
except Exception as exc:
logger.exception(f"[marker] unexpected error id={document_id}: {exc}")
await _fail(session, document_id, str(exc)[:1000])
return
# ---- (7) image persist + md_content rewrite (Phase 1B.5) ----
md_content_raw = data["md_content"]
images_resp = data.get("images") if MARKDOWN_IMAGE_PERSIST else None
saved_images: list[dict[str, Any]] = []
if images_resp:
try:
saved_images = _persist_images_to_nas(document_id, images_resp)
except OSError as exc:
# NAS 일시 끊김 등 — transient. queue retry 로 복구.
logger.warning(
f"[marker] image persist NAS write failed id={document_id}: "
f"{type(exc).__name__}: {exc}"
)
raise
# md_content 안의 ref 를 stable internal scheme `docimg:img_NNN` 으로 정규화.
slug_to_key = {img["source_slug"]: img["image_key"] for img in saved_images}
md_content = _rewrite_image_refs(md_content_raw, slug_to_key)
# quality 메트릭은 정규화 *후* md_content 기준 (실제 저장본). image_count 도 정확.
# ---- (7) success ----
md_content = data["md_content"]
quality = _compute_quality(md_content, doc.extracted_text or "", data["raw_metrics"])
if data.get("images_truncated"):
quality.setdefault("warnings", []).append("images_truncated")
# ---- (8) DB 트랜잭션 — documents UPDATE + document_images UPSERT + 고아 row DELETE ----
orphan_paths = await _sync_document_images(session, document_id, saved_images, data)
await session.execute(
update(Document).where(Document.id == document_id).values(
@@ -245,7 +186,7 @@ async def process(document_id: int, session: AsyncSession) -> None:
md_extraction_engine=data["engine"],
md_extraction_engine_version=data["engine_version"],
md_extraction_quality=quality,
md_content_hash=hashlib.sha256(md_content.encode("utf-8")).hexdigest(),
md_content_hash=data["md_content_hash"],
md_source_hash=doc.file_hash,
md_generated_at=_now(),
md_extraction_error=None,
@@ -255,184 +196,11 @@ async def process(document_id: int, session: AsyncSession) -> None:
)
)
await session.commit()
# ---- (9) commit 후 고아 NAS 파일 unlink (best-effort, 실패해도 DB 정합 유지) ----
for orphan_path in orphan_paths:
try:
Path(orphan_path).unlink(missing_ok=True)
except Exception as exc:
logger.warning(
f"[marker] orphan image unlink failed id={document_id} path={orphan_path}: "
f"{type(exc).__name__}: {exc}"
)
logger.info(
f"[marker] success id={document_id} len={len(md_content)} "
f"images={len(saved_images)} orphans_removed={len(orphan_paths)} "
f"elapsed_ms={data['elapsed_ms']}"
f"[marker] success id={document_id} len={len(md_content)} elapsed_ms={data['elapsed_ms']}"
)
async def _read_force_reprocess(session: AsyncSession, document_id: int) -> bool:
"""현재 markdown stage queue 행의 payload.force_reprocess 조회. 없으면 False."""
row = await session.scalar(
select(ProcessingQueue)
.where(
ProcessingQueue.document_id == document_id,
ProcessingQueue.stage == "markdown",
ProcessingQueue.status == "processing",
)
.order_by(desc(ProcessingQueue.id))
.limit(1)
)
if not row or not row.payload:
return False
return bool(row.payload.get("force_reprocess"))
def _persist_images_to_nas(
document_id: int, images_resp: list[dict[str, Any]]
) -> list[dict[str, Any]]:
"""marker 응답 이미지 list 를 NAS 에 저장하고 메타 dict 리스트 반환.
image_key sequence 기반 결정적 (`img_001` `img_NNN`, marker 출력 순서 = 안정적).
같은 doc 재변환 같은 key 같은 path overwrite idempotent.
"""
img_root = EXTRACTED_IMAGES_ROOT / str(document_id)
img_root.mkdir(parents=True, exist_ok=True)
saved: list[dict[str, Any]] = []
for seq, img in enumerate(images_resp, start=1):
try:
raw_bytes = base64.b64decode(img["bytes_b64"])
except Exception as exc:
logger.warning(
f"[marker] image base64 decode failed id={document_id} "
f"seq={seq} slug={img.get('slug')}: {exc}"
)
continue
fmt = (img.get("format") or "png").lower()
ext = "jpeg" if fmt == "jpg" else fmt
image_key = f"img_{seq:03d}"
filename = f"{image_key}.{ext}"
rel_path = f"extracted_images/{document_id}/{filename}"
abs_path = img_root / filename
# NAS write — 실패 시 OSError raise (transient retry).
abs_path.write_bytes(raw_bytes)
saved.append({
"image_key": image_key,
"source_slug": img.get("slug") or "",
"relative_path": rel_path,
"file_path": str(abs_path),
"mime_type": _FORMAT_TO_MIME.get(ext, "application/octet-stream"),
"file_size": len(raw_bytes),
"content_hash": hashlib.sha256(raw_bytes).hexdigest(),
"width": img.get("width"),
"height": img.get("height"),
})
return saved
def _rewrite_image_refs(md_text: str, slug_to_key: dict[str, str]) -> str:
"""md_content 안의 `![alt](slug)` ref 를 `![alt](docimg:image_key)` 로 정규화.
- slug_to_key 없는 href 원본 유지 (외부 URL / 경로 변형 )
- alt 그대로 보존
- 매칭 = href (a) 정확히 slug 같음 OR (b) basename slug 같음
(marker `_page_0_Picture_3.jpeg` 또는 `subdir/_page_0_Picture_3.jpeg` 어느 쪽도 emit 가능)
"""
if not slug_to_key:
return md_text
def _replace(match: re.Match) -> str:
alt, href = match.group(1), match.group(2)
# 정확 매치 우선
if href in slug_to_key:
return f"![{alt}](docimg:{slug_to_key[href]})"
# basename 매치 fallback
basename = href.rsplit("/", 1)[-1]
if basename in slug_to_key:
return f"![{alt}](docimg:{slug_to_key[basename]})"
return match.group(0)
return _IMAGE_REF_RE.sub(_replace, md_text)
async def _sync_document_images(
session: AsyncSession,
document_id: int,
saved_images: list[dict[str, Any]],
response_data: dict[str, Any],
) -> list[str]:
"""document_images 동기화 — 신규 keys UPSERT + 고아 row DELETE.
반환: commit unlink 해야 NAS 파일 경로 리스트.
"""
# 기존 row 조회 (file_path 보존 — 고아 파일 unlink 용)
existing = (await session.execute(
select(DocumentImage.image_key, DocumentImage.file_path)
.where(DocumentImage.document_id == document_id)
)).all()
existing_map = {key: path for key, path in existing}
new_keys = {img["image_key"] for img in saved_images}
orphan_keys = set(existing_map.keys()) - new_keys
orphan_paths = [existing_map[k] for k in orphan_keys]
# UPSERT — image_key 가 같으면 file_path/hash/dimensions 등을 갱신.
for img in saved_images:
stmt = (
pg_insert(DocumentImage)
.values(
document_id=document_id,
image_key=img["image_key"],
relative_path=img["relative_path"],
file_path=img["file_path"],
mime_type=img["mime_type"],
file_size=img["file_size"],
content_hash=img["content_hash"],
width=img.get("width"),
height=img.get("height"),
page_index=img.get("page_index"),
alt_text=img.get("alt_text"),
source_slug=img.get("source_slug"),
extraction_engine=response_data.get("engine") or "marker",
extraction_engine_version=response_data.get("engine_version"),
)
.on_conflict_do_update(
index_elements=["document_id", "image_key"],
set_={
"relative_path": img["relative_path"],
"file_path": img["file_path"],
"mime_type": img["mime_type"],
"file_size": img["file_size"],
"content_hash": img["content_hash"],
"width": img.get("width"),
"height": img.get("height"),
"page_index": img.get("page_index"),
"alt_text": img.get("alt_text"),
"source_slug": img.get("source_slug"),
"extraction_engine": response_data.get("engine") or "marker",
"extraction_engine_version": response_data.get("engine_version"),
},
)
)
await session.execute(stmt)
if orphan_keys:
await session.execute(
delete(DocumentImage).where(
DocumentImage.document_id == document_id,
DocumentImage.image_key.in_(orphan_keys),
)
)
return orphan_paths
def _compute_quality(md: str, raw_text: str, raw_metrics: dict[str, Any]) -> dict[str, Any]:
"""1B 휴리스틱 quality. 임계 판정 미적용 (Phase 1D 후행)."""
heading_lines = re.findall(r"^(#{1,6})\s", md, flags=re.MULTILINE)
-7
View File
@@ -32,13 +32,6 @@ MAILPLUS_PORT=993
MAILPLUS_SMTP_PORT=465
MAILPLUS_USER=hyungi
MAILPLUS_PASS=
# PR-4 inbox@hyungi.net IMAP ingest (DocumentServer/Ingest 폴더 → memo).
# enable 시 INBOX_INGEST_ENABLED=true. 폴더는 alias 수신 메일이 자동 이동되는 곳.
INBOX_INGEST_ENABLED=false
INBOX_INGEST_FOLDER=DocumentServer/Ingest
INBOX_INGEST_DAYS=14
INBOX_INGEST_HC_URL=
# ─── Synology Calendar (CalDAV, 태스크 관리) ───
CALDAV_URL=https://ds1525.hyungi.net/caldav/
+8 -8
View File
@@ -83,10 +83,11 @@ services:
restart: unless-stopped
stt-service:
# 2026-05-08 (D9 Track B revised): GPU is canonical STT owner.
# 정책: Mac mini = Gemma 26B 전용 우선이므로 STT/Whisper 는 호출량 무관 GPU 서버 소유.
# 이전 "Mac mini 이전본" 주석은 trace 오인 기반이었고 본 revised 결정으로 폐기.
# fastapi 의 STT_ENDPOINT 는 `http://stt-service:3300` (compose 내부 DNS) 사용.
# 2026-04-24: STT 가 Mac mini (faster-whisper, 192.168.1.122:8804 / 100.76.254.116:8804)
# 로 이전됨. GPU 에서 컨테이너는 더 이상 기동하지 않는다. 복원이 필요하면
# `docker compose --profile legacy up -d stt-service` 로 legacy 프로파일 활성화.
# fastapi 의 STT_ENDPOINT 도 Mac mini 주소를 가리킴 (아래 environment 참고).
profiles: [legacy]
build: ./services/stt
expose:
- "3300"
@@ -190,10 +191,9 @@ services:
- OCR_ENDPOINT=http://ocr-service:3200
- MARKER_ENDPOINT=http://marker-service:3300
- MARKER_CONTAINER_PATH_PREFIX=/documents
# 2026-05-08 (D9 Track B revised): GPU stt-service 정식 승격, 내부 DNS 사용.
- STT_ENDPOINT=http://stt-service:3300
# KGS Code 등 외부 학습 자료 추가 스캔 경로 (host .env 에서 주입). 빈 값이면 비활성.
- ADDITIONAL_WATCH_TARGETS=${ADDITIONAL_WATCH_TARGETS:-}
# 2026-04-24 STT Mac mini 이전: 기본값 100.76.254.116:8804 (Tailscale), 필요 시
# MAC_MINI_HOST env 로 192.168.1.122 등 LAN IP 주입.
- STT_ENDPOINT=http://${MAC_MINI_HOST:-100.76.254.116}:8804
restart: unless-stopped
frontend:
-95
View File
@@ -1,95 +0,0 @@
# events API quickref (PR-1 → PR-2 frontend reference)
**Plan**: `~/.claude/plans/beszel-tingly-sloth.md` v6
**PR-1 closure**: 2026-05-11, schema + endpoint registration 검증 완료. JWT 의존 HTTP behavior 는 PR-2 UI smoke 로 자연 검증.
**Authoritative contract**: `GET /openapi.json` (자동 생성). 본 문서는 frontend 개발자용 cheat sheet.
## 인증
모든 events endpoint = JWT Bearer (기존 `get_current_user` 의존성).
```bash
# 로그인 → access_token
TOKEN=$(curl -s -X POST https://document.hyungi.net/api/auth/login \
-H "Content-Type: application/json" \
-d '{"username":"<USERNAME>","password":"<PASSWORD>","totp_code":"<TOTP_IF_ENABLED>"}' \
| jq -r .access_token)
# events 호출
curl -H "Authorization: Bearer $TOKEN" https://document.hyungi.net/api/events/today
```
Frontend (SvelteKit 5) 는 기존 `lib/api.ts` 의 fetch wrapper (JWT 자동 첨부) 그대로 사용.
## 9 endpoint
### Create / List / Detail
| Method | Path | 용도 |
|---|---|---|
| `POST` | `/api/events/` | 생성. kind=task/calendar_event/activity_log. activity_log 면 status=done + ended_at=now() default. |
| `GET` | `/api/events/` | 목록. `?kind&status&from&to&project_tag&source&page&page_size`. Upcoming = `?from=now&to=now+7d&status=scheduled,next,deferred`. |
| `GET` | `/api/events/{id}` | 상세. |
| `PATCH` | `/api/events/{id}` | edit. **허용**: title/description/시간 7필드/priority/project_tag/tags/memo_document_id. **금지** (422): status/completed_at/cancelled_at/defer_until/source/source_ref/raw_metadata/user_id/created_by. 시간 필드 변경 시 `reschedule` history 자동. |
### Lifecycle (각 호출당 events_history 1 row 자동)
| Method | Path | 효과 |
|---|---|---|
| `POST` | `/api/events/{id}/complete` | status=done + completed_at=now() |
| `POST` | `/api/events/{id}/cancel` | status=cancelled + cancelled_at=now() |
| `POST` | `/api/events/{id}/defer` | body `{defer_until: ISO}` → status=deferred + defer_until 설정 |
| `POST` | `/api/events/{id}/reactivate` | task → inbox, calendar_event → scheduled, activity_log → 400 거부 |
### View
| Method | Path | 정책 |
|---|---|---|
| `GET` | `/api/events/today` | `?timezone=Asia/Seoul` (default). due_at/start_at/started_at 이 오늘이고 status ∈ {inbox, next, scheduled, in_progress} 또는 (deferred AND defer_until <= now). |
| `GET` | `/api/events/inbox` | status=inbox 만. |
| `GET` | `/api/events/activity` | `?from&to`. kind=activity_log + status=done 만. Today 와 분리. |
## kind / status 의미 (UI 분기 가이드)
| kind | 주요 시간 필드 | default status | UI hint |
|---|---|---|---|
| `task` | due_at (start_at/end_at optional, "14:00 전화" 같은 시각 task 허용) | inbox | 체크박스 + 마감일 표시 |
| `calendar_event` | start_at (필수) + end_at (optional) | scheduled | 캘린더 일정 카드 |
| `activity_log` | started_at OR ended_at (둘 다 NULL 금지) | done | "방금 한 일" 입력 / Activity 타임라인 |
| status | 의미 |
|---|---|
| `inbox` | 아직 정리 안 됨 |
| `next` | 다음 행동으로 선정 (시간 미정) |
| `scheduled` | 시간/날짜 잡힘 |
| `in_progress` | 진행 중 |
| `done` | 완료 |
| `cancelled` | 취소 |
| `deferred` | defer_until 까지 숨김 |
## 빠른 행동 기록 5초 UX (PR-2 핵심 가설)
```js
// 1입력 필드 → Enter → POST /api/events
api.post('/api/events/', {
kind: 'activity_log',
title: '<사용자 입력>'
// status/started_at/ended_at/completed_at 모두 server-side default
})
```
→ status=done + started_at=ended_at=completed_at=now() 자동 → Activity 탭 즉시 반영 + 새로고침 유지.
## 검증되지 않은 HTTP behavior 항목 (PR-2 smoke 시 닫기)
PR-1 closure 시 schema/endpoint registration 만 자동 검증. 아래 5건은 frontend 호출하며 자연 검증:
1. POST `/api/events` kind=activity_log + title only → status=done + 시간 default 채워짐
2. POST `/api/events/{id}/complete` 호출 → events_history row 1건 (change_kind=complete) 자동 생성
3. PATCH `/api/events/{id}` 시간 필드 변경 → reschedule history 자동
4. PATCH `/api/events/{id}` 금지 필드 (status 등) 시도 → 422 응답 (Pydantic extra=forbid)
5. GET `/api/events/today?timezone=Asia/Seoul` → 오늘 (KST 기준) 항목만 반환, deferred 는 defer_until ≤ now 조건 만족 시만
## events_history 조회 (PR-2 history timeline)
PR-1 에는 history 조회 endpoint 없음. PR-2 시 상세 페이지 timeline 필요하면 `GET /api/events/{id}/history` 신규 추가 (간단 — events_history 테이블 ORDER BY changed_at). 별 endpoint 추가 시 본 문서 갱신.
-142
View File
@@ -1,142 +0,0 @@
# Storage Layout — Document Server
**버전**: 2026-05-11 (Storage PR-1 read-only inventory 결과 반영)
**역할 분리 원칙**: `feedback_knowledge_in_db_blob_in_nas.md` — 지식·검색 대상 = DB, binary·source = NAS.
**Plan 정합**: `~/.claude/plans/beszel-tingly-sloth.md` v6 의 Storage Backbone 트랙.
## 전제
- DB 본체 (Postgres) = GPU 서버 잔류 (CPU/RAM 우위, 결정 라운드 9).
- NAS = bulk binary / source / 변경 적은 자료 backbone.
- AI 호출은 GPU/맥미니로 라우팅 (Document Server 와 별 채널).
## 머신 / 마운트 토폴로지 (2026-05-11 실측)
| 머신 | 역할 | IP | 디스크 |
|---|---|---|---|
| GPU 서버 (RTX 4070 Ti Super, Ubuntu) | Docker 호스트 — fastapi / postgres / worker / Caddy / frontend | LAN 192.168.1.186 / Tailscale 100.111.160.84 | rootfs 936G / 182G used / 715G free |
| DS1525+ (Synology) | NFS export + Hyper Backup + DR Tier 1 | LAN 192.168.1.227 / Tailscale 100.101.79.37 | `/volume4` 3.5TB / 401GB used / 3.1TB free |
| Mac mini M4 Pro | MLX inference (Document Server 외부) | LAN 192.168.1.x / Tailscale 100.76.254.116 | (Document Server storage 영역 0) |
GPU 서버의 NFS mount (`/proc/mounts` 실측):
```
192.168.1.227:/volume4/Document_Server /mnt/nas/Document_Server
nfs4 rw,relatime,vers=4.1,rsize=1048576,wsize=1048576,
soft,proto=tcp,timeo=10,retrans=3,_netdev
```
**의미**:
- 이미 **soft mount + timeo=10 + retrans=3** 적용 중 → plan v6 Storage PR-2 의 "실측 후 mount 전략 결정" 의 *현 baseline 이 곧 권고안*. 변경 필요 X.
- vers=4.1, NFSv4 (NFSv3 아님 → grace period / lock 처리 OK).
- LAN 직결 (Tailscale 경유 아님) — D10 (DS1525+ Tailscale TCP 차단) 영향 없음.
## NAS volume 안 Document Server layout
`/mnt/nas/Document_Server/` (= NAS `/volume4/Document_Server/`):
| 경로 | 용도 | 크기 | 변경 빈도 | 분류 |
|---|---|---|---|---|
| `PKM/` | 사용자 컨텐츠 원본 (knowledge + 입력 inbox + references) | **38GB** | 일 단위 | 이미 NAS ✓ |
| `PKM/Knowledge/` | 정리된 지식 자료 (압력용기, ASME, 가스, 등) | 31GB | 일 단위 | 이미 NAS ✓ |
| `PKM/References/` | 레퍼런스 PDF | 5.5GB | 주 단위 | 이미 NAS ✓ |
| `PKM/Inbox/` | 업로드 진입 (drag-drop / file_watcher) | 1.5GB | 시간 단위 | 이미 NAS ✓ |
| `PKM/.preview/` | LibreOffice PDF 변환 cache | (소량, 6 entry) | 신규 업로드 시 | 이미 NAS ✓ (검토: GPU 로 이동 가치 작음) |
| `PKM/Recordings/` | 음성 녹음 | 20KB | 거의 X | 이미 NAS ✓ |
| `PKM/Videos/` | 비디오 | 116KB | 거의 X | 이미 NAS ✓ |
| `PKM/Archive/` | 아카이브 | 0 | — | 이미 NAS ✓ |
| `Main/` | 별 디렉토리 (의미 확인 필요) | 14GB | ? | 이미 NAS ✓ |
| `extracted_images/` | Markdown Phase 1B.5 추출 이미지 (`document_images`) | 1.2MB | 변환 시 | 이미 NAS ✓ |
| `study_question_images/` | 가스기사 문제 첨부 이미지 (PR-8) | 12MB | 회차 업로드 시 | 이미 NAS ✓ |
| `프로그래밍 전 참고 자료/` | (소량) | 56KB | 거의 X | 이미 NAS ✓ |
**결론**: NAS 의 binary/source layer 는 **이미 잘 구성되어 있음**. 추가 이관 후보 (Storage PR-3/4 candidate) 는 사실상 없음 — 단, **PR-4 MailPlus ingest** 도입 시 `extracted_emails/` 신규 디렉토리 추가 예정.
## GPU 서버 local storage (Postgres + worker 모델 + 로그)
| 경로 / Volume | 용도 | 크기 | 분류 |
|---|---|---|---|
| Postgres data dir (`pgdata` named volume) | DB 본체 | **1.1GB** | GPU 잔류 (plan 결정) |
| `hyungi_document_server_ollama_data` | Ollama 모델 weight (embedding / 분류 / 임베딩 등) | **11.9GB** | GPU 잔류 (GPU 인접 필수) |
| `hyungi_document_server_stt_models` | STT (faster-whisper 등) 모델 | 3.0GB | GPU 잔류 |
| `hyungi_document_server_reranker_cache` | bge-reranker-v2-m3 cache | 2.1GB | GPU 잔류 |
| `hyungi_document_server_ocr_models` | OCR 모델 | 1.4GB | GPU 잔류 |
| `hyungi_document_server_marker_models` | marker-pdf cache | 4KB (외부 HF cache 사용 추정) | GPU 잔류 |
| `hyungi_document_server_caddy_data` | Caddy ACME / log | (작음) | GPU 잔류 |
| `~/Documents/code/hyungi_Document_Server/logs/` | fastapi 로그 (rotation) | 21MB | GPU 잔류 |
**합계**: AI 모델 캐시 ~18.5GB + Postgres 1.1GB + 로그/Caddy < 100MB ≈ GPU local 20GB 사용 중.
## 컨테이너 마운트 매트릭스 (`docker-compose.yml` 발췌)
| 컨테이너 | 마운트 | 모드 | 비고 |
|---|---|---|---|
| postgres | `pgdata:/var/lib/postgresql/data` + `./migrations:/docker-entrypoint-initdb.d` | rw | DB 본체 named volume |
| kordoc-service | `${NAS}/Document_Server:/documents` | **ro** | PDF/HWP parse |
| ocr-service | `${NAS}/Document_Server:/documents` + `ocr_models:/root/.cache` | **ro** + rw | |
| marker-service | `${NAS}/Document_Server:/documents` + `marker_models:/models` | **ro** + rw | PDF→markdown |
| stt-service | `${NAS}/Document_Server:/documents` + `stt_models:/root/.cache` | **ro** + rw | |
| ai-gateway / ollama | `ollama_data:/root/.ollama` | rw | GPU 인접 |
| reranker | `reranker_cache:/data` | rw | |
| fastapi | `${NAS}/Document_Server:/documents` + `./config.yaml ./domain_policy.yaml ./scripts ./logs ./migrations` | **rw** + ro/rw | NAS 쓰기 권한은 fastapi 만 |
| caddy | `./Caddyfile + caddy_data:/data` | rw | |
**관찰**:
- worker 컨테이너 (kordoc/ocr/marker/stt) 는 모두 NAS **read-only** 마운트 → 원본 안전.
- fastapi 만 NAS **rw** → 업로드/preview/extracted_images 쓰기 단일 책임.
- `./migrations` 이 postgres 의 `docker-entrypoint-initdb.d` 와 fastapi 의 `/app/migrations` 양쪽에 마운트. 단 실제 migration runner 는 fastapi `init_db()` 만 사용 (postgres init scripts 는 첫 생성 시만 실행 → 효과 X, 안전).
## 정책 정리
### NAS 에 두는 것 (binary / source / 변경 적음)
- 사용자 컨텐츠 원본 (`PKM/*`)
- 변환 cache (`PKM/.preview/`, `extracted_images/`)
- 학습 자료 첨부 (`study_question_images/`)
- 향후: `extracted_emails/` (PR-4)
### GPU 서버 local 에 두는 것 (latency 민감 / hot path / 큰 caching)
- Postgres data dir (DB 본체)
- HNSW index, pgvector index (Postgres 내부)
- AI 모델 weight cache (embedding / OCR / STT / reranker / marker / ollama)
- worker 임시 캐시 (각 컨테이너 named volume)
- 로그 (`./logs`)
### 명확하게 어디에 두지 않음
- 검색 인덱스: Postgres 안 (pgvector + pg_trgm) — NAS X
- chunk text: Postgres 안 (`document_chunks.text_content`) — NAS X
- 메모 본문: Postgres 안 (`documents.extracted_text`) — NAS X
- AI 분류 결과: Postgres 안 (`documents.ai_domain/ai_tags`) — NAS X
## NFS mount 옵션 권고 (Storage PR-2 baseline)
현 GPU 서버 mount = 이미 권고안과 같음:
```
192.168.1.227:/volume4/Document_Server /mnt/nas/Document_Server nfs4
rw,relatime,vers=4.1,rsize=1048576,wsize=1048576,
soft,proto=tcp,timeo=10,retrans=3,_netdev
```
**근거**:
- `soft` — NAS down 시 process 영구 hang 회피 (plan v6 Storage PR-2 핵심 요구). 단 쓰기 도중 부분 실패 위험 → fastapi 의 `upload` / `extract` / `marker` worker 가 fsync + rename(atomic) + DB row 작성 ordering 필수 (코드 검토는 PR-2 영역).
- `timeo=10/retrans=3` — 1초 단위 timeout × 3회. 30초 안 NAS 응답 없으면 EIO. 운영 1년+ 안정 (별 incident 없음).
- `vers=4.1` — NFSv3 의 lock daemon 불필요, idempotent.
- `_netdev` — boot 시 network 이후 마운트.
### Storage PR-2 시 점검 항목 (read-only PR-1 scope 외)
- 한글 NFC/NFD 정규화 wrapper (`feedback_nfs_korean_path_normalize.md`) — 기존 `document_images` 경로 처리 코드 일반화
- uid/gid 매핑 (Synology user ↔ Docker container user)
- NAS 장애 시뮬레이션 gate (운영 mount 아닌 isolated test mount 부터)
- 쓰기 경로 fsync + rename(atomic) + 실패 명시 처리
## 백업 정책
- NAS = Synology Hyper Backup (DR Tier 1, 별 트랙 `project_db_backup.md` 참조)
- Postgres = pkm DB → Tier 0 GPU + Tier 1 DS1525+ Vol3 + Tier 2 C2 Hyper Backup (별 트랙)
- AI 모델 캐시 = 백업 불필요 (재다운로드 가능)
## 다음 단계
- Storage PR-2 (NFS 표준화 + 정규화 wrapper + 장애 시뮬 gate) — 운영 변경 동반 → 별 PR
- Storage PR-3/4 (원본 이관) — **현 상태 이미 NAS, 추가 작업 0**
- Storage PR-5 (events `raw_metadata` 첨부 = extracted_emails) — events PR-4 (MailPlus ingest) 와 합류
-89
View File
@@ -1,89 +0,0 @@
# 음성 메모 — iOS Shortcuts 가이드
**Endpoint**: `POST https://document.hyungi.net/api/memos/voice` (multipart/form-data)
**인증**: Bearer JWT (단기 access token, 1시간). 만료 시 web 로그인 후 token 재복사.
**제한**: 10분 / 50MB. m4a/mp3/wav/webm/ogg/opus/aac.
**플로우**: 업로드 → STT (faster-whisper) → 자동 분류 (4B Gemma triage) → `/memos` inbox 노출.
장기 service token 발급 endpoint 는 PR-2C 안 포함 X — 1–2주 운영 후 끊김 빈도 보고 별 PR 결정 (plan v9 Memo Intake Upgrade 백로그).
## 1. JWT access token 복사
1. 데스크탑/모바일 브라우저에서 `https://document.hyungi.net/login` 로그인 + TOTP (활성 시)
2. 개발자 도구 콘솔에서 다음 명령 1줄 — access_token 출력
```js
// Memo Intake Upgrade 임시 패턴 — service token PR 도입 전까지만
localStorage.getItem('access_token') ??
// 또는 fetch refresh 후 res.access_token (refresh cookie 유효 시)
(await (await fetch('/api/auth/refresh', {method:'POST',credentials:'include'})).json()).access_token
```
3. 출력된 긴 문자열 (eyJ...) 을 Shortcuts 에 입력 (다음 단계)
> 1시간 후 만료. iOS Shortcut 사용 중 401 발생 시 token 다시 복사 + Shortcut 의 Text action 값 갱신.
## 2. iOS Shortcuts 만들기 (Apple Watch + iPhone 공용)
iPhone 의 단축어 (Shortcuts) 앱에서 새 단축어 생성:
### 단축어 이름
`메모 녹음` (또는 임의)
### Action 1: Dictate Text (받아쓰기 텍스트) 또는 Record Audio (오디오 녹음)
**옵션 A — Whisper STT 사용 (권장, 한국어 정확도 ↑)**:
- `Record Audio` action 추가
- Audio Quality: `Normal` (m4a 4kbps 정도, 충분)
- Start Recording: `On Tap`
- Stop Recording: `On Tap`
**옵션 B — iOS native 받아쓰기 (간단, 정확도 보통)**:
- `Dictate Text` action 추가 + 결과를 본문에 직접 넣어 `POST /api/memos/` (text endpoint) 호출
이 가이드는 옵션 A (Whisper STT 경유) 기준.
### Action 2: Get Contents of URL
- URL: `https://document.hyungi.net/api/memos/voice`
- Method: `POST`
- Headers:
- `Authorization`: `Bearer <위 1단계 토큰>` (Text action 으로 분리해두면 갱신 편함)
- Request Body: `Form`
- `audio`: 위 Action 1 의 출력 (Recorded Audio File)
- `recorded_at` (옵션): `Current Date``Format Date` → ISO 8601
- `device_hint` (옵션): `apple_watch` 또는 `iphone`
### Action 3: Show Result (옵션)
- 결과 JSON 의 `id` 확인 (메모 inbox 에 들어갔는지)
### Apple Watch 노출
- 단축어 이름이 `메모 녹음` 이면 Siri 로 "Hey Siri, 메모 녹음" 호출 시 Watch 에서 실행
- 또는 단축어 앱의 별 표시 → Watch 의 단축어 앱 첫 화면에 고정
## 3. 동작 확인
녹음 후 1–수분 안에:
1. `https://document.hyungi.net/memos` 진입
2. 최상단에 새 메모 카드 (🎙️ 음성 배지 + audio player)
3. STT 완료 후 카드 본문에 변환 텍스트 + AI 분류 배지 (task/calendar/activity/reference/note 중 하나)
4. AI 추천이 task/일정/활동이면 4 버튼 표시 — 1-click 으로 events 승급
## 4. 트러블슈팅
| 증상 | 원인 / 조치 |
|---|---|
| 401 응답 | JWT 만료 → web 에서 다시 복사 |
| 413 응답 | 50MB 초과 → 녹음 시간 단축 또는 quality 낮춤 |
| 415 응답 | Content-Type 또는 확장자 미지원 → m4a/mp3 사용 |
| 503 응답 | NAS 쓰기 실패 (드물게) → 1–2분 후 재시도 |
| 메모는 생겼는데 본문 비어 있음 | STT 처리 중 (5분 polling). `stt-service` 컨테이너 health 확인 |
| AI 분류 배지 안 나타남 | classify worker 가 아직 큐 처리 안 함. `processing_queue` stage=classify pending 확인 |
## 5. 운영 1–2주 후 결정 사항
- **service token 발급 endpoint 신설**: 단기 access token 만료가 자주 끊기면 별 PR.
- **frontend mic widget**: 브라우저 직접 녹음 진입점 추가 (현재는 iOS Shortcuts 만).
- **녹음 자동 정리**: 음성 메모 인입 빈도가 높아지면 archive 정책 또는 NAS 폴더 size 모니터링.
- **STT 한국어 정확도 평가**: sample 5건 사용자 평가 후 large-v3 vs 다른 모델 (whisper-v3-turbo 등) 비교.
+4 -7
View File
@@ -189,10 +189,7 @@ cd ~/Documents/code/hyungi_Document_Server && git fetch origin \
&& git merge --ff-only origin/feat/phase2-backfill && git push origin main
```
이유: 미검증 코드를 main 에 미리 박지 않음 / canary 결과 따라 worktree 에서 hot-fix 가능 / nightly cron 은 canonical path 사용 (script 자체).
**추가 (2026-05-03)**: nightly cron 의 `--log-tsv` 와 post-report 출력은 `/app/logs/` 사용 (위 표의 canonical path 가 아님). `/app/evals/markdown/` 는 fastapi 컨테이너에 **bind-mount 되어 있지 않아** 컨테이너 writable layer 에 쓰면 컨테이너 재기동 시 유실. `/app/logs/` 는 rw bind-mount → host `~/Documents/code/hyungi_Document_Server/logs/` 에 영구 저장. 주 1회 commit 시 `cp ~/Documents/code/hyungi_Document_Server/logs/phase2_nightly_log.tsv evals/markdown/` 로 복사 후 git add.
이유: 미검증 코드를 main 에 미리 박지 않음 / canary 결과 따라 worktree 에서 hot-fix 가능 / nightly cron 은 canonical path 사용 (cp 단계 없음).
### enqueue (one-shot, 사용자 승인 게이트)
```bash
@@ -211,7 +208,7 @@ docker exec hyungi_document_server-fastapi-1 python /app/logs/phase2_backfill.py
```bash
docker exec hyungi_document_server-fastapi-1 python /app/scripts/phase2_backfill.py nightly-enqueue \
--limit 50 --max-active-queue 5 \
--log-tsv /app/logs/phase2_nightly_log.tsv # /app/evals/ 미 bind-mount, /app/logs/ rw 사용
--log-tsv /app/evals/markdown/phase2_nightly_log.tsv
```
- 가드 순서: disable flag (`/tmp/phase2_disable`) → marker /ready → active_queue ≤ threshold → DB pool 비어있지 않음 → enqueue.
- 매 sweep log_tsv 한 줄. abort_reason ∈ {disable_flag / marker_unhealthy / active_queue_threshold / pool_empty / empty}.
@@ -220,8 +217,8 @@ docker exec hyungi_document_server-fastapi-1 python /app/scripts/phase2_backfill
### post-report
```bash
docker exec hyungi_document_server-fastapi-1 python /app/scripts/phase2_backfill.py post-report \
--output-csv /app/logs/phase2_post_report.csv \
--output-md /app/logs/phase2_post_report.md \
--output-csv /app/evals/markdown/phase2_post_report.csv \
--output-md /app/evals/markdown/phase2_post_report.md \
--phase2-start 2026-05-03T00:00:00Z
```
- `--phase2-start` ISO timestamp 이후 `md_generated_at` 만 집계 (Phase 2 코드 push 시점 권장).
-10
View File
@@ -200,15 +200,5 @@ body {
}
.markdown-doc .md-image-placeholder-icon { font-style: normal; opacity: 0.7; }
/* Phase 1B.5: ImgAuth selector placeholder figure 안에 삽입하는 실제 <img>.
selector figure 자식만 교체하므로 figure margin 유지 별도 wrapper 불필요. */
.markdown-doc .md-image-placeholder > .md-image {
display: block;
max-width: 100%;
height: auto;
border-radius: 6px;
/* placeholder card 의 dashed 테두리는 swap 시 children 만 교체되어 자연 제거됨. */
}
/* Phase 1C: frontmatter 박스 — 본문 위 메타 표시 */
.md-frontmatter dt { font-weight: 500; }
@@ -16,12 +16,9 @@
* - md_status badge (processing/success/skipped/failed) — MarkdownStatusBadge 위임
*/
import { renderDocMarkdown } from '$lib/utils/docMarkdown';
import { getAccessToken } from '$lib/api';
import MarkdownStatusBadge from '$lib/components/MarkdownStatusBadge.svelte';
type Props = {
/** Phase 1B.5: docimg:img_NNN ref 를 /api/documents/{id}/images/{key}/raw 로 변환할 때 필요. */
documentId?: number | null;
mdContent?: string | null;
mdFrontmatter?: Record<string, unknown> | null;
extractedText?: string | null;
@@ -34,7 +31,6 @@
};
let {
documentId = null,
mdContent = null,
mdFrontmatter = null,
extractedText = null,
@@ -97,47 +93,6 @@
h.insertBefore(a, h.firstChild);
}
});
// Phase 1B.5: docimg:img_NNN placeholder → 실제 <img> swap.
// - data-md-image-internal="1" 인 figure 만 대상 (외부 URL placeholder 는 그대로 유지)
// - documentId 미전달 시 swap 안 함 (prop 누락 / list view 등 안전)
// - fetch 실패 시 placeholder 유지 (onerror 로 figure 복원)
$effect(() => {
void renderedHtml;
if (!containerRef || documentId == null) return;
const placeholders = containerRef.querySelectorAll<HTMLElement>(
'figure.md-image-placeholder[data-md-image-internal="1"]',
);
for (const ph of placeholders) {
if (ph.dataset.mdImageSwapped === '1') continue;
const ref = ph.getAttribute('data-md-image-src');
if (!ref) continue;
const key = ref.split(':', 2)[1];
if (!key) continue;
const alt = ph.getAttribute('data-md-image-alt') ?? '';
// <img> 는 Authorization header 를 못 보내므로 ?token= 쿼리 파라미터로 access
// token 전달 (기존 /api/documents/{id}/file?token= iframe 패턴과 동일).
const accessToken = getAccessToken();
if (!accessToken) {
// 로그아웃 상태 — placeholder 유지
continue;
}
const img = document.createElement('img');
img.src = `/api/documents/${documentId}/images/${encodeURIComponent(key)}/raw?token=${encodeURIComponent(accessToken)}`;
img.alt = alt;
img.loading = 'lazy';
img.className = 'md-image';
img.onerror = () => {
// fetch 실패 → placeholder 복원 (img 만 제거, figure 는 그대로)
try { img.remove(); } catch {}
delete ph.dataset.mdImageSwapped;
};
// figure 안 내용을 img 로 교체 (figure 자체는 보존 → 다음 effect 재실행 시 idempotent 마커 검사 가능)
ph.replaceChildren(img);
ph.dataset.mdImageSwapped = '1';
}
});
</script>
<div class="mb-2">
+1 -11
View File
@@ -2,7 +2,7 @@
import { page } from '$app/stores';
import { goto } from '$app/navigation';
import { api } from '$lib/api';
import { ChevronRight, ChevronDown, FolderOpen, FolderTree, Inbox, Clock, Mail, Scale, StickyNote, GraduationCap, CalendarCheck } from 'lucide-svelte';
import { ChevronRight, ChevronDown, FolderOpen, FolderTree, Inbox, Clock, Mail, Scale, StickyNote, GraduationCap } from 'lucide-svelte';
let tree = $state([]);
let loading = $state(true);
@@ -209,16 +209,6 @@
메모
</span>
</a>
<a
href="/events"
class="flex items-center justify-between px-3 py-2 rounded-md text-sm transition-colors
{$page.url.pathname.startsWith('/events') ? 'bg-accent/15 text-accent' : 'text-text hover:bg-surface'}"
>
<span class="flex items-center gap-2">
<CalendarCheck size={16} />
events
</span>
</a>
<a
href="/study"
class="flex items-center justify-between px-3 py-2 rounded-md text-sm transition-colors
+6 -21
View File
@@ -1,14 +1,12 @@
/**
* markdown (Phase 1C + Phase 1B.5 MarkdownDoc ).
* markdown (Phase 1C MarkdownDoc ).
*
* mathMarkdown.ts (study ·) study .
*
* :
* - GFM heading id (anchor id , prefix=doc-)
* - placeholder card . `data-md-image-internal="1"` (href
* `docimg:img_NNN`) MarkdownDoc selector mount <img> .
* `data-md-image-internal="0"` ( URL) placeholder
* fetch (privacy + dependency).
* - placeholder card (1B.5 ImgAuth wiring ).
* src data-md-image-src escape 1B.5 selector .
*
* KaTeX / DOMPurify mathMarkdown.ts .
*/
@@ -48,15 +46,12 @@ docMarked.use({
image(token: any): string {
const rawHref = (token?.href ?? '') as string;
const rawAlt = (token?.text ?? '') as string;
const isInternal = rawHref.startsWith('docimg:');
const basename = rawHref.split('/').pop() ?? rawHref;
const labelSrc = rawAlt || basename || '이미지';
const safeHref = escAttr(rawHref);
const safeAlt = escAttr(rawAlt);
const safeLabel = escText(`[이미지: ${labelSrc} — 아직 표시되지 않음]`);
const internalFlag = isInternal ? '1' : '0';
return (
`<figure class="md-image-placeholder" data-md-img="1" data-md-image-src="${safeHref}" data-md-image-internal="${internalFlag}" data-md-image-alt="${safeAlt}">` +
`<figure class="md-image-placeholder" data-md-img="1" data-md-image-src="${safeHref}">` +
`<div class="md-image-placeholder-card">` +
`<span class="md-image-placeholder-icon" aria-hidden="true">🖼️</span>` +
`<span class="md-image-placeholder-label">${safeLabel}</span>` +
@@ -70,18 +65,8 @@ docMarked.use({
const SANITIZE_OPTS = {
USE_PROFILES: { html: true },
// KaTeX (style + aria-hidden), heading anchor (id), 이미지 마킹 (data-md-img,
// data-md-image-src + data-md-image-internal + data-md-image-alt — 1B.5 ImgAuth
// selector 용), figure caption (figure/figcaption)
ADD_ATTR: [
'style',
'aria-hidden',
'id',
'data-md-img',
'data-md-image-src',
'data-md-image-internal',
'data-md-image-alt',
'loading',
],
// data-md-image-src — 1B.5 ImgAuth selector 용), figure caption (figure/figcaption)
ADD_ATTR: ['style', 'aria-hidden', 'id', 'data-md-img', 'data-md-image-src', 'loading'],
ADD_TAGS: ['figure', 'figcaption'],
FORBID_TAGS: ['script', 'iframe', 'object', 'embed', 'link', 'meta'],
FORBID_ATTR: ['onerror', 'onclick', 'onload', 'onmouseover', 'onfocus'],
-242
View File
@@ -1,242 +0,0 @@
// events 도메인 helper (PR-2 UI MVP).
// type 정의 + 표시용 라벨 + lifecycle 호출 wrapper.
import { api } from '$lib/api';
export type EventKind = 'task' | 'calendar_event' | 'activity_log';
export type EventStatus =
| 'inbox'
| 'next'
| 'scheduled'
| 'in_progress'
| 'done'
| 'cancelled'
| 'deferred';
export type EventSource =
| 'manual'
| 'memo'
| 'email'
| 'chat'
| 'webhook'
| 'git_commit'
| 'claude_code';
export type EventActor = 'manual' | 'eid' | 'email_ingest' | 'system';
export interface EventItem {
id: number;
title: string;
description: string | null;
kind: EventKind;
status: EventStatus;
due_at: string | null;
start_at: string | null;
end_at: string | null;
started_at: string | null;
ended_at: string | null;
all_day: boolean;
timezone: string | null;
defer_until: string | null;
completed_at: string | null;
cancelled_at: string | null;
priority: number | null;
project_tag: string | null;
tags: unknown[];
source: EventSource;
source_ref: string | null;
raw_metadata: Record<string, unknown>;
memo_document_id: number | null;
user_id: number;
created_by: EventActor;
created_at: string;
updated_at: string;
}
export interface EventListResponse {
items: EventItem[];
total: number;
}
export const KIND_LABEL: Record<EventKind, string> = {
task: '할 일',
calendar_event: '일정',
activity_log: '기록',
};
export const STATUS_LABEL: Record<EventStatus, string> = {
inbox: 'Inbox',
next: '다음',
scheduled: '예정',
in_progress: '진행 중',
done: '완료',
cancelled: '취소',
deferred: '연기',
};
export const STATUS_COLOR: Record<EventStatus, string> = {
inbox: 'bg-slate-100 text-slate-700',
next: 'bg-indigo-100 text-indigo-700',
scheduled: 'bg-blue-100 text-blue-700',
in_progress: 'bg-amber-100 text-amber-700',
done: 'bg-emerald-100 text-emerald-700',
cancelled: 'bg-rose-100 text-rose-700',
deferred: 'bg-slate-100 text-slate-500',
};
export const KIND_COLOR: Record<EventKind, string> = {
task: 'border-l-4 border-indigo-400',
calendar_event: 'border-l-4 border-blue-400',
activity_log: 'border-l-4 border-emerald-400',
};
// ─── API wrappers ───
export interface EventCreatePayload {
title: string;
description?: string | null;
kind: EventKind;
status?: EventStatus | null;
due_at?: string | null;
start_at?: string | null;
end_at?: string | null;
started_at?: string | null;
ended_at?: string | null;
all_day?: boolean;
timezone?: string | null;
priority?: number | null;
project_tag?: string | null;
tags?: unknown[];
memo_document_id?: number | null;
source?: EventSource;
source_ref?: string | null;
raw_metadata?: Record<string, unknown>;
}
export async function createEvent(payload: EventCreatePayload): Promise<EventItem> {
return api<EventItem>('/events/', {
method: 'POST',
body: JSON.stringify(payload),
});
}
// 빠른 행동 기록 (5초 UX 핵심) — title 1개만 받고 서버 default 활용.
export async function logActivity(title: string, projectTag?: string | null): Promise<EventItem> {
const payload: EventCreatePayload = {
title,
kind: 'activity_log',
};
if (projectTag) payload.project_tag = projectTag;
return createEvent(payload);
}
export interface EventPatchPayload {
title?: string;
description?: string | null;
due_at?: string | null;
start_at?: string | null;
end_at?: string | null;
started_at?: string | null;
ended_at?: string | null;
all_day?: boolean;
timezone?: string | null;
priority?: number | null;
project_tag?: string | null;
tags?: unknown[];
memo_document_id?: number | null;
}
export async function patchEvent(id: number, payload: EventPatchPayload): Promise<EventItem> {
return api<EventItem>(`/events/${id}`, {
method: 'PATCH',
body: JSON.stringify(payload),
});
}
export async function getEvent(id: number): Promise<EventItem> {
return api<EventItem>(`/events/${id}`);
}
export async function listToday(timezone = 'Asia/Seoul'): Promise<EventListResponse> {
return api<EventListResponse>(`/events/today?timezone=${encodeURIComponent(timezone)}`);
}
export async function listInbox(): Promise<EventListResponse> {
return api<EventListResponse>('/events/inbox');
}
export async function listActivity(fromIso?: string, toIso?: string): Promise<EventListResponse> {
const params = new URLSearchParams();
if (fromIso) params.set('from', fromIso);
if (toIso) params.set('to', toIso);
const qs = params.toString();
return api<EventListResponse>(`/events/activity${qs ? '?' + qs : ''}`);
}
export async function listUpcoming(days = 7): Promise<EventListResponse> {
const now = new Date();
const to = new Date(now.getTime() + days * 86400_000);
const params = new URLSearchParams({
from: now.toISOString(),
to: to.toISOString(),
status: 'scheduled,next,deferred',
});
return api<EventListResponse>(`/events/?${params.toString()}`);
}
// Lifecycle endpoints
export async function completeEvent(id: number): Promise<EventItem> {
return api<EventItem>(`/events/${id}/complete`, { method: 'POST' });
}
export async function cancelEvent(id: number): Promise<EventItem> {
return api<EventItem>(`/events/${id}/cancel`, { method: 'POST' });
}
export async function deferEvent(id: number, deferUntilIso: string): Promise<EventItem> {
return api<EventItem>(`/events/${id}/defer`, {
method: 'POST',
body: JSON.stringify({ defer_until: deferUntilIso }),
});
}
export async function reactivateEvent(id: number): Promise<EventItem> {
return api<EventItem>(`/events/${id}/reactivate`, { method: 'POST' });
}
// 표시 유틸
export function formatDateTimeKst(iso: string | null | undefined, opts?: { dateOnly?: boolean }): string {
if (!iso) return '—';
const d = new Date(iso);
if (isNaN(d.getTime())) return '—';
if (opts?.dateOnly) {
return d.toLocaleDateString('ko-KR', { timeZone: 'Asia/Seoul', month: 'numeric', day: 'numeric', weekday: 'short' });
}
return d.toLocaleString('ko-KR', {
timeZone: 'Asia/Seoul',
month: 'numeric',
day: 'numeric',
hour: '2-digit',
minute: '2-digit',
});
}
export function relativeTimeKo(iso: string | null | undefined, nowMs: number = Date.now()): string {
if (!iso) return '';
const t = new Date(iso).getTime();
if (isNaN(t)) return '';
const diff = nowMs - t;
const abs = Math.abs(diff);
const future = diff < 0;
if (abs < 60_000) return future ? '곧' : '방금';
const min = Math.round(abs / 60_000);
if (min < 60) return future ? `${min}분 후` : `${min}분 전`;
const hr = Math.round(min / 60);
if (hr < 24) return future ? `${hr}시간 후` : `${hr}시간 전`;
const day = Math.round(hr / 24);
if (day < 30) return future ? `${day}일 후` : `${day}일 전`;
const mo = Math.round(day / 30);
if (mo < 12) return future ? `${mo}달 후` : `${mo}달 전`;
const yr = Math.round(mo / 12);
return future ? `${yr}년 후` : `${yr}년 전`;
}
+1 -1
View File
@@ -98,7 +98,7 @@
<Button variant="ghost" size="sm" href="/ask" class={isActive('/ask') ? 'text-accent' : ''}>질문</Button>
<Button variant="ghost" size="sm" href="/memos" class={isActive('/memos') ? 'text-accent' : ''}>메모</Button>
<Button variant="ghost" size="sm" href="/study" class={isActive('/study') ? 'text-accent' : ''}>공부</Button>
<Button variant="ghost" size="sm" href="/news" class={isActive('/news') ? 'text-accent' : ''}>아침 브리핑</Button>
<Button variant="ghost" size="sm" href="/news" class={isActive('/news') ? 'text-accent' : ''}>뉴스</Button>
<Button variant="ghost" size="sm" href="/inbox" class={isActive('/inbox') ? 'text-accent' : ''}>Inbox</Button>
<div class="relative">
<IconButton
@@ -249,7 +249,6 @@
<Card class="min-h-[500px]">
{#if viewerType === 'markdown' || viewerType === 'hwp-markdown'}
<MarkdownDoc
documentId={doc.id}
mdContent={doc.md_content}
mdFrontmatter={doc.md_frontmatter}
mdStatus={doc.md_status}
@@ -284,7 +283,6 @@
</div>
{#if pdfViewMode === 'markdown' && canShowMarkdown}
<MarkdownDoc
documentId={doc.id}
mdContent={doc.md_content}
mdFrontmatter={doc.md_frontmatter}
mdStatus={doc.md_status}
@@ -337,7 +335,6 @@
</div>
{#if doc.md_content || doc.extracted_text}
<MarkdownDoc
documentId={doc.id}
mdContent={doc.md_content}
mdFrontmatter={doc.md_frontmatter}
mdStatus={doc.md_status}
-311
View File
@@ -1,311 +0,0 @@
<script lang="ts">
import { onMount } from 'svelte';
import { goto } from '$app/navigation';
import { addToast } from '$lib/stores/toast';
import Button from '$lib/components/ui/Button.svelte';
import Card from '$lib/components/ui/Card.svelte';
import EmptyState from '$lib/components/ui/EmptyState.svelte';
import Skeleton from '$lib/components/ui/Skeleton.svelte';
import Tabs from '$lib/components/ui/Tabs.svelte';
import {
Inbox,
Sun,
CalendarClock,
History,
Check,
X as XIcon,
Pause,
RotateCcw,
Plus,
} from 'lucide-svelte';
import {
type EventItem,
type EventKind,
KIND_LABEL,
STATUS_LABEL,
STATUS_COLOR,
KIND_COLOR,
logActivity,
listToday,
listInbox,
listActivity,
listUpcoming,
completeEvent,
cancelEvent,
deferEvent,
reactivateEvent,
formatDateTimeKst,
relativeTimeKo,
} from '$lib/utils/events';
// 활성 탭
let activeTab = $state('today');
// 데이터
let todayItems = $state<EventItem[]>([]);
let inboxItems = $state<EventItem[]>([]);
let upcomingItems = $state<EventItem[]>([]);
let activityItems = $state<EventItem[]>([]);
let loading = $state<Record<string, boolean>>({ today: true, inbox: true, upcoming: true, activity: true });
// 빠른 행동 기록 (5초 UX 핵심)
let quickInput = $state('');
let quickSubmitting = $state(false);
let quickInputRef = $state<HTMLInputElement | null>(null);
// 상대 시각 tick (1분)
let nowTick = $state(Date.now());
$effect(() => {
const id = setInterval(() => { nowTick = Date.now(); }, 60_000);
return () => clearInterval(id);
});
onMount(() => {
loadAll();
});
async function loadAll() {
await Promise.all([loadToday(), loadInbox(), loadActivity(), loadUpcoming()]);
}
async function loadToday() {
loading.today = true;
try {
const res = await listToday();
todayItems = res.items;
} catch (err) {
addToast('error', '오늘 일정 로드 실패');
} finally {
loading.today = false;
}
}
async function loadInbox() {
loading.inbox = true;
try {
const res = await listInbox();
inboxItems = res.items;
} catch (err) {
addToast('error', 'Inbox 로드 실패');
} finally {
loading.inbox = false;
}
}
async function loadActivity() {
loading.activity = true;
try {
// 최근 7일
const to = new Date();
const from = new Date(to.getTime() - 7 * 86400_000);
const res = await listActivity(from.toISOString(), to.toISOString());
activityItems = res.items;
} catch (err) {
addToast('error', '활동 로드 실패');
} finally {
loading.activity = false;
}
}
async function loadUpcoming() {
loading.upcoming = true;
try {
const res = await listUpcoming(7);
upcomingItems = res.items;
} catch (err) {
addToast('error', 'Upcoming 로드 실패');
} finally {
loading.upcoming = false;
}
}
// 빠른 행동 기록 — Enter 한 번에 저장
async function submitQuick(e?: Event) {
e?.preventDefault();
const title = quickInput.trim();
if (!title || quickSubmitting) return;
quickSubmitting = true;
try {
const item = await logActivity(title);
// 새 항목을 Activity 탭 최상단에 즉시 반영
activityItems = [item, ...activityItems];
quickInput = '';
addToast('success', '기록됨');
// 입력 포커스 유지 (연속 입력)
quickInputRef?.focus();
} catch (err) {
const detail = (err as { detail?: string })?.detail ?? '저장 실패';
addToast('error', detail);
} finally {
quickSubmitting = false;
}
}
// Lifecycle ops
async function doComplete(item: EventItem) {
try {
const updated = await completeEvent(item.id);
replaceItem(updated);
addToast('success', '완료 처리');
} catch (err) {
addToast('error', '완료 실패');
}
}
async function doCancel(item: EventItem) {
if (!confirm(`"${item.title}" 취소할까요?`)) return;
try {
const updated = await cancelEvent(item.id);
replaceItem(updated);
addToast('success', '취소됨');
} catch (err) {
addToast('error', '취소 실패');
}
}
async function doDefer(item: EventItem) {
// 기본 = 내일 같은 시각
const dt = new Date(Date.now() + 86400_000);
const isoLocal = dt.toISOString().slice(0, 16);
const input = prompt('연기할 시각 (YYYY-MM-DDTHH:MM, KST)', isoLocal);
if (!input) return;
try {
const iso = new Date(input).toISOString();
const updated = await deferEvent(item.id, iso);
replaceItem(updated);
addToast('success', '연기됨');
} catch (err) {
addToast('error', '연기 실패');
}
}
async function doReactivate(item: EventItem) {
try {
const updated = await reactivateEvent(item.id);
replaceItem(updated);
addToast('success', '재활성');
} catch (err) {
addToast('error', '재활성 실패');
}
}
function replaceItem(item: EventItem) {
const replacer = (arr: EventItem[]) =>
arr.map((x) => (x.id === item.id ? item : x));
todayItems = replacer(todayItems);
inboxItems = replacer(inboxItems);
upcomingItems = replacer(upcomingItems);
activityItems = replacer(activityItems);
}
function timeLabel(item: EventItem): string {
if (item.kind === 'task') {
return item.due_at ? `마감 ${formatDateTimeKst(item.due_at)}` : '';
}
if (item.kind === 'calendar_event') {
const s = item.start_at ? formatDateTimeKst(item.start_at) : '';
const e = item.end_at ? formatDateTimeKst(item.end_at) : '';
return s && e ? `${s} ${e}` : s;
}
// activity_log
return item.started_at ? relativeTimeKo(item.started_at, nowTick) : '';
}
const tabs = [
{ id: 'today', label: '오늘' },
{ id: 'inbox', label: 'Inbox' },
{ id: 'upcoming', label: '예정' },
{ id: 'activity', label: '활동' },
];
</script>
<svelte:head>
<title>events · hyungi PKM</title>
</svelte:head>
<div class="mx-auto max-w-3xl space-y-6 px-4 py-6">
<header class="flex items-end justify-between gap-3">
<div class="space-y-1">
<h1 class="text-2xl font-semibold">events</h1>
<p class="text-sm text-slate-500">개인 운영 로그 · 일정 · 할 일 · 회고</p>
</div>
<Button variant="secondary" size="sm" icon={Plus} href="/events/new">새 항목</Button>
</header>
<!-- 빠른 행동 기록 (5초 UX 핵심) -->
<form onsubmit={submitQuick} class="flex items-stretch gap-2 rounded-lg border border-slate-200 bg-white p-2 shadow-sm">
<input
bind:this={quickInputRef}
bind:value={quickInput}
type="text"
placeholder="방금 한 일 기록… (Enter 저장)"
class="flex-1 rounded-md border-0 bg-transparent px-2 py-2 text-sm outline-none focus:ring-0"
disabled={quickSubmitting}
autocomplete="off"
/>
<Button type="submit" variant="primary" size="sm" loading={quickSubmitting} disabled={!quickInput.trim()} icon={Plus}>
기록
</Button>
</form>
<Tabs {tabs} bind:value={activeTab}>
{#snippet children(activeId)}
{#if activeId === 'today'}
{@render eventList(todayItems, loading.today, 'today', Sun, '오늘 할 일이 없습니다')}
{:else if activeId === 'inbox'}
{@render eventList(inboxItems, loading.inbox, 'inbox', Inbox, 'Inbox 가 비어 있습니다')}
{:else if activeId === 'upcoming'}
{@render eventList(upcomingItems, loading.upcoming, 'upcoming', CalendarClock, '예정된 일정 없음')}
{:else if activeId === 'activity'}
{@render eventList(activityItems, loading.activity, 'activity', History, '최근 활동 없음')}
{/if}
{/snippet}
</Tabs>
</div>
{#snippet eventList(items: EventItem[], isLoading: boolean, tabId: string, EmptyIcon: typeof Sun, emptyMsg: string)}
{#if isLoading}
<div class="space-y-2">
{#each Array(3) as _}
<Skeleton class="h-16 rounded-lg" />
{/each}
</div>
{:else if items.length === 0}
<EmptyState icon={EmptyIcon} message={emptyMsg} />
{:else}
<ul class="space-y-2">
{#each items as item (item.id)}
<li>
<Card class="flex items-start gap-3 p-3 {KIND_COLOR[item.kind]}">
<div class="flex-1 min-w-0">
<div class="flex items-center gap-2 text-xs text-slate-500">
<span>{KIND_LABEL[item.kind]}</span>
<span class="rounded px-1.5 py-0.5 text-[10px] {STATUS_COLOR[item.status]}">
{STATUS_LABEL[item.status]}
</span>
{#if item.project_tag}
<span class="text-slate-400">#{item.project_tag}</span>
{/if}
</div>
<a href="/events/{item.id}" class="mt-1 block break-words text-sm font-medium hover:underline">
{item.title}
</a>
{#if timeLabel(item)}
<div class="mt-0.5 text-xs text-slate-500">{timeLabel(item)}</div>
{/if}
</div>
<div class="flex shrink-0 items-center gap-1">
{#if item.status !== 'done' && item.status !== 'cancelled' && item.kind !== 'activity_log'}
<Button variant="ghost" size="sm" icon={Check} onclick={() => doComplete(item)} title="완료" />
<Button variant="ghost" size="sm" icon={Pause} onclick={() => doDefer(item)} title="연기" />
<Button variant="ghost" size="sm" icon={XIcon} onclick={() => doCancel(item)} title="취소" />
{:else if (item.status === 'done' || item.status === 'cancelled') && item.kind !== 'activity_log'}
<Button variant="ghost" size="sm" icon={RotateCcw} onclick={() => doReactivate(item)} title="재활성" />
{/if}
</div>
</Card>
</li>
{/each}
</ul>
{/if}
{/snippet}
@@ -1,308 +0,0 @@
<script lang="ts">
import { onMount } from 'svelte';
import { goto } from '$app/navigation';
import { page } from '$app/state';
import { api } from '$lib/api';
import { addToast } from '$lib/stores/toast';
import Button from '$lib/components/ui/Button.svelte';
import Card from '$lib/components/ui/Card.svelte';
import Skeleton from '$lib/components/ui/Skeleton.svelte';
import TextInput from '$lib/components/ui/TextInput.svelte';
import Textarea from '$lib/components/ui/Textarea.svelte';
import {
ArrowLeft,
Check,
X as XIcon,
Pause,
RotateCcw,
Save,
} from 'lucide-svelte';
import {
type EventItem,
type EventKind,
KIND_LABEL,
STATUS_LABEL,
STATUS_COLOR,
KIND_COLOR,
getEvent,
patchEvent,
completeEvent,
cancelEvent,
deferEvent,
reactivateEvent,
formatDateTimeKst,
relativeTimeKo,
} from '$lib/utils/events';
const eventId = $derived(parseInt(page.params.id ?? '0', 10));
let item = $state<EventItem | null>(null);
let history = $state<HistoryEntry[]>([]);
let loading = $state(true);
let saving = $state(false);
// edit fields (PATCH 허용만)
let editTitle = $state('');
let editDescription = $state('');
let editDueAt = $state('');
let editStartAt = $state('');
let editEndAt = $state('');
let editAllDay = $state(false);
let editPriority = $state<number | ''>('');
let editProjectTag = $state('');
interface HistoryEntry {
id: number;
event_id: number;
changed_at: string;
changed_by: string;
change_kind: string;
before: Record<string, unknown> | null;
after: Record<string, unknown>;
}
const CHANGE_KIND_LABEL: Record<string, string> = {
create: '생성',
reschedule: '시간 변경',
defer: '연기',
reactivate: '재활성',
complete: '완료',
cancel: '취소',
};
onMount(() => {
if (!eventId) {
addToast('error', '유효하지 않은 ID');
goto('/events');
return;
}
void load();
});
async function load() {
loading = true;
try {
const [ev, hist] = await Promise.all([
getEvent(eventId),
api<{ items: HistoryEntry[] }>(`/events/${eventId}/history`),
]);
item = ev;
history = hist.items;
// edit fields seed
editTitle = ev.title;
editDescription = ev.description ?? '';
editDueAt = isoToLocalInput(ev.due_at);
editStartAt = isoToLocalInput(ev.start_at);
editEndAt = isoToLocalInput(ev.end_at);
editAllDay = ev.all_day;
editPriority = ev.priority ?? '';
editProjectTag = ev.project_tag ?? '';
} catch (err) {
const detail = (err as { detail?: string })?.detail ?? '로드 실패';
addToast('error', detail);
if ((err as { status?: number })?.status === 404) goto('/events');
} finally {
loading = false;
}
}
function isoToLocalInput(iso: string | null): string {
if (!iso) return '';
const d = new Date(iso);
if (isNaN(d.getTime())) return '';
// YYYY-MM-DDTHH:MM (KST 변환은 브라우저 default)
const off = d.getTimezoneOffset();
const local = new Date(d.getTime() - off * 60_000);
return local.toISOString().slice(0, 16);
}
function localInputToIso(s: string): string | null {
if (!s) return null;
const d = new Date(s);
if (isNaN(d.getTime())) return null;
return d.toISOString();
}
async function save() {
if (!item || saving) return;
saving = true;
try {
const patch: Record<string, unknown> = {};
if (editTitle !== item.title) patch.title = editTitle;
if ((editDescription || null) !== (item.description ?? null))
patch.description = editDescription || null;
const newDue = localInputToIso(editDueAt);
if (newDue !== item.due_at) patch.due_at = newDue;
const newStart = localInputToIso(editStartAt);
if (newStart !== item.start_at) patch.start_at = newStart;
const newEnd = localInputToIso(editEndAt);
if (newEnd !== item.end_at) patch.end_at = newEnd;
if (editAllDay !== item.all_day) patch.all_day = editAllDay;
const newPriority = editPriority === '' ? null : Number(editPriority);
if (newPriority !== item.priority) patch.priority = newPriority;
if ((editProjectTag || null) !== (item.project_tag ?? null))
patch.project_tag = editProjectTag || null;
if (Object.keys(patch).length === 0) {
addToast('info', '변경 사항 없음');
return;
}
const updated = await patchEvent(item.id, patch);
item = updated;
addToast('success', '저장됨');
// history 가 새 reschedule row 가질 수 있어 reload
await reloadHistory();
} catch (err) {
const detail = (err as { detail?: string })?.detail ?? '저장 실패';
addToast('error', detail);
} finally {
saving = false;
}
}
async function reloadHistory() {
if (!item) return;
try {
const hist = await api<{ items: HistoryEntry[] }>(`/events/${item.id}/history`);
history = hist.items;
} catch (err) {
// silent
}
}
async function lifecycle(action: 'complete' | 'cancel' | 'defer' | 'reactivate') {
if (!item) return;
try {
let updated: EventItem;
if (action === 'complete') updated = await completeEvent(item.id);
else if (action === 'cancel') {
if (!confirm(`"${item.title}" 취소할까요?`)) return;
updated = await cancelEvent(item.id);
} else if (action === 'defer') {
const dt = new Date(Date.now() + 86400_000);
const input = prompt('연기할 시각 (YYYY-MM-DDTHH:MM, KST)', isoToLocalInput(dt.toISOString()));
if (!input) return;
updated = await deferEvent(item.id, new Date(input).toISOString());
} else {
updated = await reactivateEvent(item.id);
}
item = updated;
addToast('success', '처리됨');
await reloadHistory();
} catch (err) {
const detail = (err as { detail?: string })?.detail ?? '실패';
addToast('error', detail);
}
}
function actorLabel(actor: string): string {
return { manual: '나', eid: '이드', email_ingest: '메일', system: '시스템' }[actor] ?? actor;
}
</script>
<svelte:head>
<title>{item?.title ?? 'events'} · hyungi PKM</title>
</svelte:head>
<div class="mx-auto max-w-3xl space-y-6 px-4 py-6">
<div class="flex items-center gap-2">
<Button variant="ghost" size="sm" icon={ArrowLeft} href="/events" />
<h1 class="text-xl font-semibold">events 상세</h1>
</div>
{#if loading}
<Skeleton class="h-64 rounded-lg" />
{:else if item}
<Card class="space-y-4 p-4 {KIND_COLOR[item.kind]}">
<div class="flex flex-wrap items-center gap-2 text-xs text-slate-500">
<span>{KIND_LABEL[item.kind]}</span>
<span class="rounded px-1.5 py-0.5 text-[10px] {STATUS_COLOR[item.status]}">
{STATUS_LABEL[item.status]}
</span>
<span class="text-slate-400">id #{item.id}</span>
<span class="text-slate-400">source: {item.source}</span>
</div>
<div class="space-y-3">
<label class="block">
<span class="block text-xs text-slate-500">제목</span>
<TextInput bind:value={editTitle} />
</label>
<label class="block">
<span class="block text-xs text-slate-500">설명 (markdown)</span>
<Textarea bind:value={editDescription} rows={4} />
</label>
{#if item.kind === 'task'}
<label class="block">
<span class="block text-xs text-slate-500">마감 (due_at)</span>
<input type="datetime-local" bind:value={editDueAt} class="rounded-md border border-slate-200 px-2 py-1 text-sm" />
</label>
{/if}
{#if item.kind === 'calendar_event' || item.kind === 'task'}
<div class="flex flex-wrap gap-3">
<label class="block">
<span class="block text-xs text-slate-500">시작 (start_at)</span>
<input type="datetime-local" bind:value={editStartAt} class="rounded-md border border-slate-200 px-2 py-1 text-sm" />
</label>
<label class="block">
<span class="block text-xs text-slate-500">종료 (end_at)</span>
<input type="datetime-local" bind:value={editEndAt} class="rounded-md border border-slate-200 px-2 py-1 text-sm" />
</label>
</div>
<label class="inline-flex items-center gap-2 text-sm">
<input type="checkbox" bind:checked={editAllDay} />
<span>all-day</span>
</label>
{/if}
<div class="flex flex-wrap gap-3">
<label class="block">
<span class="block text-xs text-slate-500">우선순위 (14)</span>
<input type="number" min="1" max="4" bind:value={editPriority} class="w-20 rounded-md border border-slate-200 px-2 py-1 text-sm" />
</label>
<label class="block">
<span class="block text-xs text-slate-500">프로젝트 태그</span>
<TextInput bind:value={editProjectTag} placeholder="(없음)" />
</label>
</div>
</div>
<div class="flex flex-wrap items-center justify-between gap-2 pt-2">
<div class="flex flex-wrap items-center gap-2">
{#if item.status !== 'done' && item.status !== 'cancelled' && item.kind !== 'activity_log'}
<Button variant="primary" size="sm" icon={Check} onclick={() => lifecycle('complete')}>완료</Button>
<Button variant="secondary" size="sm" icon={Pause} onclick={() => lifecycle('defer')}>연기</Button>
<Button variant="ghost" size="sm" icon={XIcon} onclick={() => lifecycle('cancel')}>취소</Button>
{:else if (item.status === 'done' || item.status === 'cancelled') && item.kind !== 'activity_log'}
<Button variant="secondary" size="sm" icon={RotateCcw} onclick={() => lifecycle('reactivate')}>재활성</Button>
{/if}
</div>
<Button variant="primary" size="sm" icon={Save} loading={saving} onclick={save}>저장</Button>
</div>
</Card>
<section class="space-y-2">
<h2 class="text-sm font-semibold text-slate-700">변경 이력</h2>
{#if history.length === 0}
<p class="text-sm text-slate-500">이력 없음</p>
{:else}
<ol class="space-y-1.5 border-l border-slate-200 pl-4">
{#each history as h (h.id)}
<li class="relative">
<span class="absolute -left-[21px] top-1.5 h-2 w-2 rounded-full bg-slate-300"></span>
<div class="text-xs text-slate-500">
{formatDateTimeKst(h.changed_at)} · {actorLabel(h.changed_by)}
</div>
<div class="text-sm">
{CHANGE_KIND_LABEL[h.change_kind] ?? h.change_kind}
</div>
</li>
{/each}
</ol>
{/if}
</section>
{/if}
</div>
-171
View File
@@ -1,171 +0,0 @@
<script lang="ts">
import { goto } from '$app/navigation';
import { addToast } from '$lib/stores/toast';
import Button from '$lib/components/ui/Button.svelte';
import Card from '$lib/components/ui/Card.svelte';
import TextInput from '$lib/components/ui/TextInput.svelte';
import Textarea from '$lib/components/ui/Textarea.svelte';
import { ArrowLeft } from 'lucide-svelte';
import {
type EventKind,
type EventCreatePayload,
KIND_LABEL,
createEvent,
} from '$lib/utils/events';
let kind = $state<EventKind>('task');
let title = $state('');
let description = $state('');
let dueAt = $state('');
let startAt = $state('');
let endAt = $state('');
let allDay = $state(false);
let startedAt = $state('');
let endedAt = $state('');
let priority = $state<number | ''>('');
let projectTag = $state('');
let submitting = $state(false);
function localToIso(s: string): string | null {
if (!s) return null;
const d = new Date(s);
return isNaN(d.getTime()) ? null : d.toISOString();
}
async function submit() {
if (submitting) return;
if (!title.trim()) {
addToast('error', '제목을 입력하세요');
return;
}
submitting = true;
try {
const payload: EventCreatePayload = {
title: title.trim(),
kind,
};
if (description.trim()) payload.description = description.trim();
if (kind === 'task') {
if (dueAt) payload.due_at = localToIso(dueAt);
if (startAt) payload.start_at = localToIso(startAt);
} else if (kind === 'calendar_event') {
if (!startAt) {
addToast('error', 'calendar_event 는 시작 시간 필수');
submitting = false;
return;
}
payload.start_at = localToIso(startAt);
if (endAt) payload.end_at = localToIso(endAt);
payload.all_day = allDay;
} else if (kind === 'activity_log') {
if (startedAt) payload.started_at = localToIso(startedAt);
if (endedAt) payload.ended_at = localToIso(endedAt);
// 모두 비우면 서버 default (now())
}
if (priority !== '') payload.priority = Number(priority);
if (projectTag.trim()) payload.project_tag = projectTag.trim();
const item = await createEvent(payload);
addToast('success', '생성됨');
goto(`/events/${item.id}`);
} catch (err) {
const detail = (err as { detail?: string })?.detail ?? '생성 실패';
addToast('error', detail);
} finally {
submitting = false;
}
}
</script>
<svelte:head>
<title>새 events · hyungi PKM</title>
</svelte:head>
<div class="mx-auto max-w-xl space-y-4 px-4 py-6">
<div class="flex items-center gap-2">
<Button variant="ghost" size="sm" icon={ArrowLeft} href="/events" />
<h1 class="text-xl font-semibold">새 events</h1>
</div>
<Card class="space-y-4 p-4">
<div>
<span class="block text-xs text-slate-500">유형</span>
<div class="mt-1 flex gap-2">
{#each ['task', 'calendar_event', 'activity_log'] as k}
<button
type="button"
class="rounded-md border px-3 py-1.5 text-sm transition {kind === k ? 'border-indigo-500 bg-indigo-50 text-indigo-700' : 'border-slate-200 hover:bg-slate-50'}"
onclick={() => (kind = k as EventKind)}
>
{KIND_LABEL[k as EventKind]}
</button>
{/each}
</div>
</div>
<label class="block">
<span class="block text-xs text-slate-500">제목 *</span>
<TextInput bind:value={title} placeholder="할 / 일정 / 기록한 행동" />
</label>
<label class="block">
<span class="block text-xs text-slate-500">설명 (markdown, 선택)</span>
<Textarea bind:value={description} rows={3} />
</label>
{#if kind === 'task'}
<label class="block">
<span class="block text-xs text-slate-500">마감 (due_at, 선택)</span>
<input type="datetime-local" bind:value={dueAt} class="rounded-md border border-slate-200 px-2 py-1 text-sm" />
</label>
<label class="block">
<span class="block text-xs text-slate-500">시작 (start_at, 선택 — "14:00 전화" 같은 경우)</span>
<input type="datetime-local" bind:value={startAt} class="rounded-md border border-slate-200 px-2 py-1 text-sm" />
</label>
{:else if kind === 'calendar_event'}
<div class="flex flex-wrap gap-3">
<label class="block">
<span class="block text-xs text-slate-500">시작 (start_at) *</span>
<input type="datetime-local" bind:value={startAt} class="rounded-md border border-slate-200 px-2 py-1 text-sm" required />
</label>
<label class="block">
<span class="block text-xs text-slate-500">종료 (end_at, 선택)</span>
<input type="datetime-local" bind:value={endAt} class="rounded-md border border-slate-200 px-2 py-1 text-sm" />
</label>
</div>
<label class="inline-flex items-center gap-2 text-sm">
<input type="checkbox" bind:checked={allDay} />
<span>all-day</span>
</label>
{:else if kind === 'activity_log'}
<p class="text-xs text-slate-500">
비워두면 시작·종료 모두 <code>now()</code> 로 기록됩니다 (방금 한 일).
</p>
<div class="flex flex-wrap gap-3">
<label class="block">
<span class="block text-xs text-slate-500">시작 시각 (선택)</span>
<input type="datetime-local" bind:value={startedAt} class="rounded-md border border-slate-200 px-2 py-1 text-sm" />
</label>
<label class="block">
<span class="block text-xs text-slate-500">종료 시각 (선택)</span>
<input type="datetime-local" bind:value={endedAt} class="rounded-md border border-slate-200 px-2 py-1 text-sm" />
</label>
</div>
{/if}
<div class="flex flex-wrap gap-3">
<label class="block">
<span class="block text-xs text-slate-500">우선순위 (14)</span>
<input type="number" min="1" max="4" bind:value={priority} class="w-20 rounded-md border border-slate-200 px-2 py-1 text-sm" />
</label>
<label class="block">
<span class="block text-xs text-slate-500">프로젝트 태그</span>
<TextInput bind:value={projectTag} placeholder="예: doc-server" />
</label>
</div>
<div class="flex justify-end">
<Button variant="primary" size="md" loading={submitting} onclick={submit}>생성</Button>
</div>
</Card>
</div>
+5 -108
View File
@@ -3,8 +3,7 @@
import { api } from '$lib/api';
import { addToast } from '$lib/stores/toast';
import { renderMemoHtml, todayIso, countHiddenTasks, DEFAULT_HIDE_AFTER_MS } from '$lib/utils/memoRenderer';
import { Pin, PinOff, Pencil, Trash2, Eye, EyeOff, X, Check, Archive, ArchiveRestore, ListChecks, Bold, Heading, CalendarDays, Mic, Calendar, Activity, ArrowRight, FileText, BookOpen, Mail } from 'lucide-svelte';
import { getAccessToken } from '$lib/api';
import { Pin, PinOff, Pencil, Trash2, Eye, EyeOff, X, Check, Archive, ArchiveRestore, ListChecks, Bold, Heading, CalendarDays } from 'lucide-svelte';
import Button from '$lib/components/ui/Button.svelte';
import Card from '$lib/components/ui/Card.svelte';
import EmptyState from '$lib/components/ui/EmptyState.svelte';
@@ -250,54 +249,6 @@
}
}
// ─── PR-2B: 메모 → events 1-click promote ───
async function promoteMemo(memoId, kind) {
const labels = { task: '할 일', calendar_event: '일정', activity_log: '활동 기록' };
try {
const res = await api(`/memos/${memoId}/promote-to-event`, {
method: 'POST',
body: JSON.stringify({ kind }),
});
addToast('success', `${labels[kind]} 로 승급 (events #${res.event_id})`);
// 로컬 상태 갱신 — promoted 표시를 위해 메모에 임시 마킹 (서버 미반영, UX 힌트만)
memos = memos.map((m) => m.id === memoId ? { ...m, _last_promoted: { event_id: res.event_id, kind } } : m);
} catch (err) {
addToast('error', err?.detail || '승급 실패');
}
}
async function dismissEventSuggestion(memoId) {
try {
const updated = await api(`/memos/${memoId}/dismiss-event-suggestion`, { method: 'POST' });
memos = memos.map((m) => (m.id === memoId ? updated : m));
} catch (err) {
addToast('error', '처리 실패');
}
}
// voice 메모 audio URL — /api/documents/{id}/file?token= 패턴 재사용
function voiceAudioUrl(memoId) {
const token = getAccessToken();
return `/api/documents/${memoId}/file?token=${encodeURIComponent(token ?? '')}`;
}
// ai_event_kind 별 라벨 / 색상
const KIND_LABELS = {
note: '메모',
task: '할 일',
calendar_event: '일정',
activity_log: '활동',
reference: '참조',
};
const KIND_BADGE_CLASS = {
note: 'bg-surface text-dim',
task: 'bg-indigo-100 text-indigo-700',
calendar_event: 'bg-blue-100 text-blue-700',
activity_log: 'bg-emerald-100 text-emerald-700',
reference: 'bg-amber-100 text-amber-700',
};
async function handleCheckboxClick(e, memo) {
const target = e.target;
if (target.tagName !== 'INPUT' || target.type !== 'checkbox') return;
@@ -522,41 +473,9 @@
</div>
{:else}
<!-- ═══ 읽기 모드 ═══ -->
<!-- PR-2B/2C: 분류 배지 + voice icon + 마지막 promote 결과 -->
{#if memo.source_channel === 'voice' || memo.source_channel === 'email' || memo.ai_event_kind || memo._last_promoted}
<div class="flex flex-wrap items-center gap-1.5 mb-1.5">
{#if memo.source_channel === 'voice'}
<span class="inline-flex items-center gap-1 rounded px-1.5 py-0.5 text-[10px] bg-rose-100 text-rose-700" title="음성 메모">
<Mic size={10} /> 음성
</span>
{/if}
{#if memo.source_channel === 'email'}
<span class="inline-flex items-center gap-1 rounded px-1.5 py-0.5 text-[10px] bg-sky-100 text-sky-700" title={memo.email_subject || '이메일 inbox'}>
<Mail size={10} /> 이메일
</span>
{/if}
{#if memo.ai_event_kind && memo.ai_event_kind !== 'note'}
<span class="inline-flex items-center rounded px-1.5 py-0.5 text-[10px] {KIND_BADGE_CLASS[memo.ai_event_kind] || 'bg-surface text-dim'}">
AI 추천: {KIND_LABELS[memo.ai_event_kind] || memo.ai_event_kind}{memo.ai_event_confidence != null ? ` · ${Math.round(memo.ai_event_confidence * 100)}%` : ''}
</span>
{/if}
{#if memo._last_promoted}
<a href={`/events/${memo._last_promoted.event_id}`} class="inline-flex items-center gap-1 rounded px-1.5 py-0.5 text-[10px] bg-emerald-100 text-emerald-700 hover:bg-emerald-200">
<ArrowRight size={10} /> events #{memo._last_promoted.event_id}
</a>
{/if}
</div>
{/if}
{#if memo.title && memo.title !== memo.content?.split('\n')[0]?.replace(/^#+\s*/, '').slice(0, 80)}
<p class="text-xs font-semibold text-dim mb-1">{memo.title}</p>
{/if}
<!-- voice 메모 audio player -->
{#if memo.source_channel === 'voice' && memo.file_path}
<audio controls preload="metadata" src={voiceAudioUrl(memo.id)} class="w-full mb-2 h-9"></audio>
{/if}
<!-- svelte-ignore a11y_click_events_have_key_events -->
<!-- svelte-ignore a11y_no_static_element_interactions -->
<div
@@ -564,14 +483,10 @@
class:show-hidden={showHiddenByMemo[memo.id]}
onclick={(e) => handleCheckboxClick(e, memo)}
>
{#if memo.content}
{@html renderMemoHtml(memo.content || '', {
taskStates: memo.memo_task_state ?? {},
now: nowTick,
})}
{:else if memo.source_channel === 'voice'}
<p class="text-xs text-dim italic">음성 → 텍스트 변환 대기 중…</p>
{/if}
{@html renderMemoHtml(memo.content || '', {
taskStates: memo.memo_task_state ?? {},
now: nowTick,
})}
</div>
{#if countHiddenTasks(memo.memo_task_state, nowTick, DEFAULT_HIDE_AFTER_MS) > 0 || showHiddenByMemo[memo.id]}
<button
@@ -588,24 +503,6 @@
{/if}
{/if}
<!-- PR-2B: AI triage 결과 → 1-click promote 버튼 (분류 결과 있고 dismissed 아닌 메모) -->
{#if editingId !== memo.id && memo.ai_event_kind && memo.ai_event_kind !== 'note' && !memo._last_promoted && !showArchived}
<div class="flex flex-wrap gap-1 mt-2 pt-2 border-t border-default/30">
<button onclick={() => promoteMemo(memo.id, 'task')} class={`inline-flex items-center gap-1 px-2 py-1 rounded text-[11px] transition-colors ${memo.ai_event_kind === 'task' ? 'bg-indigo-500 text-white hover:bg-indigo-600' : 'bg-surface text-dim hover:bg-surface-hover hover:text-text'}`}>
<FileText size={11} /> 할 일로
</button>
<button onclick={() => promoteMemo(memo.id, 'calendar_event')} class={`inline-flex items-center gap-1 px-2 py-1 rounded text-[11px] transition-colors ${memo.ai_event_kind === 'calendar_event' ? 'bg-blue-500 text-white hover:bg-blue-600' : 'bg-surface text-dim hover:bg-surface-hover hover:text-text'}`}>
<Calendar size={11} /> 일정으로
</button>
<button onclick={() => promoteMemo(memo.id, 'activity_log')} class={`inline-flex items-center gap-1 px-2 py-1 rounded text-[11px] transition-colors ${memo.ai_event_kind === 'activity_log' ? 'bg-emerald-500 text-white hover:bg-emerald-600' : 'bg-surface text-dim hover:bg-surface-hover hover:text-text'}`}>
<Activity size={11} /> 활동으로
</button>
<button onclick={() => dismissEventSuggestion(memo.id)} class="inline-flex items-center gap-1 px-2 py-1 rounded text-[11px] bg-surface text-dim hover:bg-surface-hover hover:text-text transition-colors">
<X size={11} /> 그냥 메모
</button>
</div>
{/if}
<!-- 태그 + 하단 -->
{#if editingId !== memo.id}
{#if memo.user_tags?.length || memo.ai_tags?.length}
+335 -194
View File
@@ -1,216 +1,357 @@
<script lang="ts">
// 야간 수집 뉴스 브리핑 (Morning Briefing) — 매일 KST 05:10 cron 으로 만들어진
// topic×country 비교 분석 1페이지 카드. 기존 article list / source tree /
// 북마크 / 노트 / 필터 UI 는 폐기 (PR-MorningBriefing-2 swap).
<script>
import { onMount } from 'svelte';
import { api, type ApiError } from '$lib/api';
import Card from '$lib/components/ui/Card.svelte';
import { page } from '$app/stores';
import { goto } from '$app/navigation';
import { api } from '$lib/api';
import { addToast } from '$lib/stores/toast';
import { marked } from 'marked';
import DOMPurify from 'dompurify';
import { Bookmark, BookmarkCheck } from 'lucide-svelte';
type CountryPerspective = {
country: string;
summary: string;
article_ids: number[];
};
type KeyQuote = {
country: string;
source: string;
quote: string;
};
type BriefingTopic = {
topic_rank: number;
topic_label: string;
headline: string;
country_perspectives: CountryPerspective[];
divergences: string[];
convergences: string[];
key_quotes: KeyQuote[];
historical_context: string | null;
cluster_members: number[];
article_count: number;
country_count: number;
importance_score: number;
llm_fallback_used: boolean;
};
type Briefing = {
briefing_date: string;
window_start: string;
window_end: string;
total_articles: number;
total_countries: number;
total_topics: number;
llm_calls: number;
llm_failures: number;
status: 'success' | 'partial' | 'failed' | 'empty';
headline_oneliner: string | null;
topics: BriefingTopic[];
};
const COUNTRY_META: Record<string, { flag: string; label: string }> = {
KR: { flag: '🇰🇷', label: '한국' },
US: { flag: '🇺🇸', label: '미국' },
JP: { flag: '🇯🇵', label: '일본' },
CN: { flag: '🇨🇳', label: '중국' },
HK: { flag: '🇭🇰', label: '홍콩' },
TW: { flag: '🇹🇼', label: '대만' },
DE: { flag: '🇩🇪', label: '독일' },
FR: { flag: '🇫🇷', label: '프랑스' },
GB: { flag: '🇬🇧', label: '영국' },
UK: { flag: '🇬🇧', label: '영국' },
IN: { flag: '🇮🇳', label: '인도' },
RU: { flag: '🇷🇺', label: '러시아' },
IR: { flag: '🇮🇷', label: '이란' },
IL: { flag: '🇮🇱', label: '이스라엘' },
PH: { flag: '🇵🇭', label: '필리핀' },
AU: { flag: '🇦🇺', label: '호주' },
NL: { flag: '🇳🇱', label: '네덜란드' },
};
function countryLabel(code: string): string {
const meta = COUNTRY_META[code?.toUpperCase()];
return meta ? `${meta.flag} ${meta.label}` : code;
function renderMd(text) {
return DOMPurify.sanitize(marked(text), {
USE_PROFILES: { html: true }, FORBID_TAGS: ['style', 'script'],
FORBID_ATTR: ['onerror', 'onclick'], ALLOW_UNKNOWN_PROTOCOLS: false,
});
}
let briefing = $state<Briefing | null>(null);
let articles = $state([]);
let total = $state(0);
let loading = $state(true);
let errorMsg = $state<string | null>(null);
let selectedArticle = $state(null);
let filterSource = $state('');
let showUnreadOnly = $state(false);
let showPinnedOnly = $state(false);
let sourceTree = $state({});
let currentPage = $state(1);
let noteEditing = $state(false);
let noteText = $state('');
let contentEditing = $state(false);
let contentText = $state('');
let filterOpen = $state(false);
let expandedSources = $state({});
let previewEl = $state(null);
const PAPER_NAMES = {
'경향신문': '경향신문', '朝日新聞': '朝日新聞', 'NYT': 'NYT',
'Le Monde': 'Le Monde', 'Der Spiegel': 'Der Spiegel', '新华网': '新华网',
};
onMount(async () => {
try {
briefing = await api<Briefing>('/briefing/latest');
} catch (e) {
const err = e as ApiError;
errorMsg = err?.status === 404
? '아직 생성된 브리핑이 없습니다. 매일 새벽 05:10 KST 에 자동 생성됩니다.'
: (err?.detail || '브리핑을 불러오지 못했습니다.');
} finally {
loading = false;
}
const srcData = await api('/news/sources');
const tree = {};
srcData.forEach(s => {
let paper = '';
for (const [key] of Object.entries(PAPER_NAMES)) {
if (s.name.startsWith(key)) { paper = key; break; }
}
if (!paper) paper = s.name;
const cat = s.category || '';
if (!tree[paper]) tree[paper] = [];
if (cat && !tree[paper].includes(cat)) tree[paper].push(cat);
});
sourceTree = tree;
} catch (e) {}
loadArticles();
});
const fallbackPct = $derived(
briefing && briefing.llm_calls > 0
? Math.round((briefing.llm_failures / briefing.llm_calls) * 100)
: 0
);
async function loadArticles() {
loading = true;
try {
const params = new URLSearchParams();
params.set('page', String(currentPage));
params.set('page_size', '30');
if (filterSource) params.set('source', filterSource);
if (showUnreadOnly) params.set('unread_only', 'true');
if (showPinnedOnly) params.set('pinned_only', 'true');
const data = await api(`/news/articles?${params}`);
articles = data.items;
total = data.total;
// 필터 변경 후 selectedArticle 재동기화
if (selectedArticle) {
const match = articles.find(a => a.id === selectedArticle.id);
selectedArticle = match || null;
}
} catch (err) {
addToast('error', '뉴스 로딩 실패');
} finally { loading = false; }
}
function selectArticle(article) {
selectedArticle = article;
noteEditing = false;
contentEditing = false;
if (!article.is_read) markRead(article);
// 미리보기 스크롤 초기화
requestAnimationFrame(() => {
previewEl?.scrollTo(0, 0);
});
}
function closeArticle() {
selectedArticle = null;
noteEditing = false;
contentEditing = false;
}
async function markRead(article) {
try {
await api(`/documents/${article.id}`, { method: 'PATCH', body: JSON.stringify({ is_read: true }) });
article.is_read = true;
articles = [...articles];
} catch (e) {}
}
async function markAllRead() {
try {
const result = await api('/news/mark-all-read', { method: 'POST' });
addToast('success', `${result.marked}건 읽음 처리`);
articles = articles.map(a => ({ ...a, is_read: true }));
} catch (e) { addToast('error', '실패'); }
}
async function togglePin(article) {
if (!article) return;
try {
const newPinned = !article.pinned;
await api(`/documents/${article.id}`, {
method: 'PATCH',
body: JSON.stringify({ pinned: newPinned })
});
articles = articles.map(a =>
a.id === article.id ? { ...a, pinned: newPinned } : a
);
if (selectedArticle?.id === article.id) {
selectedArticle = { ...selectedArticle, pinned: newPinned };
}
} catch (e) {
addToast('error', '책갈피 변경 실패');
}
}
async function saveContent() {
try {
const newText = selectedArticle.extracted_text + '\n\n---\n\n' + contentText;
await api(`/documents/${selectedArticle.id}/content`, { method: 'PUT', body: JSON.stringify({ content: newText }) });
selectedArticle.extracted_text = newText;
contentEditing = false;
addToast('success', '본문 저장됨');
} catch (e) { addToast('error', '저장 실패'); }
}
async function saveNote() {
try {
await api(`/documents/${selectedArticle.id}`, { method: 'PATCH', body: JSON.stringify({ user_note: noteText }) });
selectedArticle.user_note = noteText;
noteEditing = false;
addToast('success', '메모 저장됨');
} catch (e) { addToast('error', '저장 실패'); }
}
function timeAgo(dateStr) {
const diff = Date.now() - new Date(dateStr).getTime();
const mins = Math.floor(diff / 60000);
if (mins < 60) return `${mins} `;
const hours = Math.floor(mins / 60);
if (hours < 24) return `${hours}시간 `;
return `${Math.floor(hours / 24)}일 전`;
}
function applyFilter(src) {
filterSource = src;
filterOpen = false;
}
let prevFilter = '';
$effect(() => {
const key = `${filterSource}|${showUnreadOnly}|${showPinnedOnly}`;
if (key !== prevFilter) { prevFilter = key; currentPage = 1; loadArticles(); }
});
let unreadCount = $derived(articles.filter(a => !a.is_read).length);
</script>
<div class="mx-auto max-w-3xl px-4 py-6 space-y-4">
<header class="space-y-1">
<h1 class="text-xl font-semibold">야간 뉴스 브리핑</h1>
<p class="text-sm text-dim">
{#if briefing}
{briefing.briefing_date} 새벽 수집 · 총 {briefing.total_articles}건 / {briefing.total_countries}개국 / {briefing.total_topics}개 토픽
<div class="flex h-full overflow-hidden">
<!-- 데스크톱 사이드바 필터 -->
<div class="hidden lg:block w-48 shrink-0 border-r border-[var(--border)] bg-[var(--sidebar-bg)] p-3 overflow-y-auto">
<h2 class="text-xs font-semibold text-[var(--text-dim)] uppercase mb-3">필터</h2>
<button onclick={() => applyFilter('')} class="w-full text-left px-2 py-1.5 rounded text-sm mb-1 {filterSource === '' ? 'bg-[var(--accent)]/15 text-[var(--accent)]' : 'text-[var(--text-dim)] hover:bg-[var(--surface)]'}">📰 전체</button>
{#each Object.entries(sourceTree) as [paper, categories]}
<div class="mb-0.5">
<button onclick={() => { applyFilter(paper); expandedSources[paper] = !expandedSources[paper]; }}
class="w-full text-left px-2 py-1.5 rounded text-sm flex items-center justify-between {filterSource === paper ? 'bg-[var(--accent)]/15 text-[var(--accent)]' : 'text-[var(--text-dim)] hover:bg-[var(--surface)]'}">
<span>{paper}</span>
<span class="text-[10px]">{expandedSources[paper] ? '▼' : '▶'}</span>
</button>
{#if expandedSources[paper] && categories.length > 0}
{#each categories as cat}
<button onclick={() => applyFilter(`${paper}/${cat}`)}
class="w-full text-left pl-6 pr-2 py-1 rounded text-xs {filterSource === `${paper}/${cat}` ? 'bg-[var(--accent)]/15 text-[var(--accent)]' : 'text-[var(--text-dim)] hover:bg-[var(--surface)]'}">{cat}</button>
{/each}
{/if}
</div>
{/each}
<hr class="my-3 border-[var(--border)]">
<label class="flex items-center gap-2 px-2 text-xs text-[var(--text-dim)]">
<input type="checkbox" bind:checked={showUnreadOnly} class="rounded">
읽지 않음만
</label>
<label class="flex items-center gap-2 px-2 mt-2 text-xs text-[var(--text-dim)]">
<input type="checkbox" bind:checked={showPinnedOnly} class="rounded">
책갈피만
</label>
</div>
<!-- 메인 -->
<div class="flex-1 flex flex-col min-h-0">
<!-- 상단 바 -->
<div class="flex items-center justify-between px-3 py-2 border-b border-[var(--border)] shrink-0">
<div class="flex items-center gap-2">
<!-- 모바일 필터 드롭다운 -->
<button onclick={() => filterOpen = !filterOpen}
class="lg:hidden flex items-center gap-1 px-2 py-1 text-xs rounded border border-[var(--border)] text-[var(--text-dim)]">
📰 {filterSource || '전체'} <span class="text-[10px]">{filterOpen ? '▲' : '▼'}</span>
</button>
<!-- 모바일 안읽음 -->
<label class="lg:hidden flex items-center gap-1 text-xs text-[var(--text-dim)]">
<input type="checkbox" bind:checked={showUnreadOnly} class="rounded"> 안읽음
</label>
<!-- 모바일 책갈피 -->
<label class="lg:hidden flex items-center gap-1 text-xs text-[var(--text-dim)]">
<input type="checkbox" bind:checked={showPinnedOnly} class="rounded"> 책갈피
</label>
<span class="text-xs text-[var(--text-dim)]">{total}</span>
{#if unreadCount > 0}
<span class="text-[10px] text-[var(--accent)]">{unreadCount} 안읽음</span>
{/if}
</div>
<button onclick={markAllRead} class="text-xs text-[var(--text-dim)] hover:text-[var(--accent)] px-2 py-1 rounded border border-[var(--border)]">전체 읽음</button>
</div>
<!-- 모바일 필터 드롭다운 내용 -->
{#if filterOpen}
<div class="lg:hidden border-b border-[var(--border)] bg-[var(--sidebar-bg)] px-3 py-2 max-h-60 overflow-y-auto">
<button onclick={() => applyFilter('')} class="block w-full text-left px-2 py-2 rounded text-sm {filterSource === '' ? 'text-[var(--accent)]' : 'text-[var(--text-dim)]'}">📰 전체</button>
{#each Object.entries(sourceTree) as [paper, categories]}
<button onclick={() => applyFilter(paper)} class="block w-full text-left px-2 py-2 rounded text-sm {filterSource === paper ? 'text-[var(--accent)]' : 'text-[var(--text-dim)]'}">{paper}</button>
{#each categories as cat}
<button onclick={() => applyFilter(`${paper}/${cat}`)} class="block w-full text-left pl-6 pr-2 py-1.5 rounded text-xs {filterSource === `${paper}/${cat}` ? 'text-[var(--accent)]' : 'text-[var(--text-dim)]'}">{cat}</button>
{/each}
{/each}
</div>
{/if}
<!-- 기사 리스트 -->
<div class="news-list overflow-y-auto {selectedArticle
? 'shrink-0 h-[35%] min-h-[180px] max-h-[260px] border-b border-[var(--border)] lg:h-[40%] lg:min-h-0 lg:max-h-none'
: 'flex-1'}">
{#if loading}
<div class="p-4 space-y-2">
{#each Array(5) as _}
<div class="h-16 bg-[var(--surface)] rounded animate-pulse"></div>
{/each}
</div>
{:else if articles.length === 0}
<div class="text-center py-16 text-[var(--text-dim)]"><p class="text-sm">뉴스가 없습니다</p></div>
{:else}
매일 KST 자정~05:00 누적 뉴스를 주제별로 다국 비교 분석합니다.
{/if}
</p>
</header>
{#if loading}
<Card>
<p class="text-sm text-dim">불러오는 중…</p>
</Card>
{:else if errorMsg}
<Card>
<p class="text-sm">{errorMsg}</p>
</Card>
{:else if briefing}
{#if briefing.status === 'empty'}
<Card>
<p class="text-sm">
오늘 새벽({briefing.briefing_date}) 다국 비교 가능한 토픽이 없습니다.
</p>
<p class="mt-2 text-xs text-dim">
(수집 뉴스 0건 또는 2개국 이상 다룬 주제 없음)
</p>
</Card>
{:else}
{#if briefing.status === 'failed'}
<div class="border border-error/40 bg-error/10 text-sm rounded-md px-4 py-3">
⚠ LLM 분석 실패율이 높습니다 ({briefing.llm_failures}/{briefing.llm_calls}, {fallbackPct}%). 일부 토픽이 원문 묶음으로 표시됩니다.
</div>
{:else if briefing.status === 'partial'}
<div class="border border-warning/40 bg-warning/10 text-sm rounded-md px-4 py-3">
일부 토픽 LLM 실패 ({briefing.llm_failures}/{briefing.llm_calls}). 다른 토픽은 정상 분석되었습니다.
</div>
{/if}
{#each briefing.topics as topic (topic.topic_rank)}
<Card>
<div class="space-y-3">
{#each articles as article}
<!-- svelte-ignore a11y_no_static_element_interactions -->
<div onclick={() => selectArticle(article)} role="button" tabindex="0"
onkeydown={(e) => { if (e.key === 'Enter' || e.key === ' ') { e.preventDefault(); selectArticle(article); } }}
class="w-full text-left px-3 py-2 lg:px-4 lg:py-3 border-b border-[var(--border)]/30 hover:bg-[var(--surface)] active:bg-[var(--surface)] transition-colors cursor-pointer
{selectedArticle?.id === article.id ? 'bg-[var(--accent)]/5 border-l-2 border-l-[var(--accent)]' : ''}">
<div class="flex items-start gap-2">
<span class="text-xs text-faint shrink-0 pt-1">#{topic.topic_rank}</span>
<div class="flex-1 min-w-0">
<h2 class="text-base font-semibold leading-snug">
{topic.topic_label}
{#if topic.llm_fallback_used}
<span class="ml-1 text-xs text-dim">(원문 묶음)</span>
{/if}
</h2>
<p class="text-sm text-dim mt-1">{topic.headline}</p>
<p class="text-xs text-faint mt-1">
{topic.country_count}개국 · {topic.article_count}
</p>
<span class="mt-1.5 text-[10px] shrink-0 {article.is_read ? 'text-[var(--text-dim)]' : 'text-[var(--accent)]'}">{article.is_read ? '○' : '●'}</span>
<div class="min-w-0 flex-1">
<p class="text-[13px] lg:text-sm leading-snug {article.is_read ? 'text-[var(--text-dim)]' : 'font-medium'}">{article.title}</p>
<p class="text-xs text-[var(--text-dim)] mt-1 line-clamp-1 lg:line-clamp-2">{(article.ai_summary?.replace(/[*#_`~]/g, '') || article.extracted_text?.split('\n').filter(l => l.trim() && l !== article.title)[0] || '').slice(0, 120)}</p>
<div class="flex items-center gap-2 mt-1 text-[10px] text-[var(--text-dim)]">
<span>{article.ai_sub_group || ''}</span>
{#if article.ai_tags?.length}<span>{article.ai_tags[0]?.split('/').pop()}</span>{/if}
<span>{timeAgo(article.created_at)}</span>
</div>
</div>
<button
onclick={(e) => { e.stopPropagation(); togglePin(article); }}
class="shrink-0 p-2 -m-1 rounded text-[var(--text-dim)] hover:text-[var(--accent)] transition-colors"
aria-label={article.pinned ? '책갈피 해제' : '책갈피 추가'}
>
{#if article.pinned}
<BookmarkCheck size={14} class="text-[var(--accent)]" />
{:else}
<Bookmark size={14} />
{/if}
</button>
</div>
</div>
{/each}
{/if}
</div>
{#if topic.country_perspectives.length > 0}
<div class="space-y-1.5">
{#each topic.country_perspectives as cp}
<div class="text-sm leading-relaxed">
<span class="font-medium">{countryLabel(cp.country)}</span>
<span class="text-dim mx-1">·</span>
<span>{cp.summary}</span>
{#if cp.article_ids.length > 0}
<span class="ml-1 text-xs text-faint">
{#each cp.article_ids as id, i}
{#if i > 0}<span class="mx-0.5">·</span>{/if}<a
href={`/documents/${id}`}
class="hover:text-accent"
>#{id}</a>
{/each}
</span>
{/if}
</div>
{/each}
</div>
{/if}
<!-- 페이지네이션 -->
{#if total > 30 && !selectedArticle}
<div class="flex justify-center gap-1 py-2 border-t border-[var(--border)] shrink-0">
{#each Array(Math.min(Math.ceil(total / 30), 10)) as _, i}
<button onclick={() => { currentPage = i + 1; loadArticles(); }}
class="px-2.5 py-1 rounded text-xs {currentPage === i + 1 ? 'bg-[var(--accent)] text-white' : 'text-[var(--text-dim)]'}">{i + 1}</button>
{/each}
</div>
{/if}
{#if topic.divergences.length > 0}
<div class="text-xs">
<span class="text-dim">차이 </span>
<span class="text-text">{topic.divergences.join(' · ')}</span>
</div>
{/if}
{#if topic.convergences.length > 0}
<div class="text-xs">
<span class="text-dim">공통 </span>
<span class="text-text">{topic.convergences.join(' · ')}</span>
</div>
{/if}
{#if topic.key_quotes.length > 0}
<ul class="text-xs space-y-1 border-l-2 border-default pl-3">
{#each topic.key_quotes as q}
<li>
<span class="text-dim">{countryLabel(q.country)} · {q.source}</span>
<span class="text-text">"{q.quote}"</span>
</li>
{/each}
</ul>
{/if}
{#if topic.historical_context}
<p class="text-xs text-faint italic">
↩ 지난 흐름 · {topic.historical_context}
</p>
<!-- 미리보기 패널 (모바일+데스크톱 통합) -->
{#if selectedArticle}
<div bind:this={previewEl} class="news-preview flex-1 overflow-y-auto bg-[var(--surface)] min-h-0">
<div class="flex items-center justify-between px-3 py-1.5 lg:px-5 lg:py-2 border-b border-[var(--border)] sticky top-0 bg-[var(--surface)] z-10">
<div class="flex items-center gap-2 text-xs text-[var(--text-dim)]">
<span>{selectedArticle.ai_sub_group}</span><span>·</span><span>{timeAgo(selectedArticle.created_at)}</span>
</div>
<div class="flex items-center gap-2">
<button
onclick={() => togglePin(selectedArticle)}
class="text-xs flex items-center gap-1 {selectedArticle.pinned ? 'text-[var(--accent)]' : 'text-[var(--text-dim)] hover:text-[var(--accent)]'}"
aria-label={selectedArticle.pinned ? '책갈피 해제' : '책갈피 추가'}
>
{#if selectedArticle.pinned}
<BookmarkCheck size={14} /> 책갈피
{:else}
<Bookmark size={14} /> 책갈피
{/if}
</button>
{#if selectedArticle.edit_url}<a href={selectedArticle.edit_url} target="_blank" rel="noopener noreferrer" class="text-xs text-[var(--accent)]">원문 →</a>{/if}
<button onclick={closeArticle} class="text-xs text-[var(--text-dim)] hover:text-[var(--text)]">닫기</button>
</div>
</div>
<div class="p-3 lg:p-5">
<h2 class="text-base lg:text-lg font-bold mb-2 lg:mb-3">{selectedArticle.title}</h2>
{#if selectedArticle.ai_summary}
<div class="mb-4 p-3 bg-[var(--accent)]/5 border border-[var(--accent)]/20 rounded-lg">
<h4 class="text-[10px] font-semibold text-[var(--accent)] uppercase mb-1">AI 요약</h4>
<div class="text-sm markdown-body">{@html renderMd(selectedArticle.ai_summary)}</div>
</div>
{/if}
<div class="markdown-body mb-4">{@html renderMd(selectedArticle.extracted_text || '')}</div>
<div class="border-t border-[var(--border)] pt-4 mt-4">
<h4 class="text-xs font-semibold text-[var(--text-dim)] mb-2">본문 입력</h4>
{#if contentEditing}
<textarea bind:value={contentText} class="w-full h-32 px-3 py-2 bg-[var(--bg)] border border-[var(--border)] rounded-lg text-sm text-[var(--text)] resize-y outline-none" placeholder="기사 전문을 붙여넣으세요..."></textarea>
<div class="flex gap-2 mt-2"><button onclick={saveContent} class="px-3 py-1 text-xs bg-[var(--accent)] text-white rounded">저장</button><button onclick={() => contentEditing = false} class="px-3 py-1 text-xs text-[var(--text-dim)]">취소</button></div>
{:else}<button onclick={() => { contentText = ''; contentEditing = true; }} class="text-xs text-[var(--text-dim)] hover:text-[var(--accent)]">+ 본문 입력</button>{/if}
</div>
<div class="border-t border-[var(--border)] pt-4 mt-4">
<h4 class="text-xs font-semibold text-[var(--text-dim)] mb-2">메모</h4>
{#if noteEditing}
<textarea bind:value={noteText} class="w-full h-20 px-3 py-2 bg-[var(--bg)] border border-[var(--border)] rounded-lg text-sm text-[var(--text)] resize-y outline-none" placeholder="메모..."></textarea>
<div class="flex gap-2 mt-2"><button onclick={saveNote} class="px-3 py-1 text-xs bg-[var(--accent)] text-white rounded">저장</button><button onclick={() => noteEditing = false} class="px-3 py-1 text-xs text-[var(--text-dim)]">취소</button></div>
{:else}
<button onclick={() => { noteText = selectedArticle.user_note || ''; noteEditing = true; }} class="text-xs text-[var(--text-dim)] hover:text-[var(--accent)]">+ 메모 추가</button>
{#if selectedArticle.user_note}<div class="mt-2 p-3 bg-[var(--bg)] rounded-lg text-sm">{selectedArticle.user_note}</div>{/if}
{/if}
</div>
</Card>
{/each}
</div>
</div>
{/if}
{/if}
</div>
</div>
-38
View File
@@ -1,38 +0,0 @@
-- 236_document_images.sql (1/3) — Markdown Canonical Phase 1B.5 ImgAuth
--
-- marker-service 가 PDF 변환 시 추출한 이미지를 NAS 에 영구 저장하고 메타를 DB 에 기록.
--
-- 저장 위치: NAS `/documents/extracted_images/{document_id}/{image_key}.{ext}`
-- (file_watcher 가 보는 PKM 경로와 분리 — 자동 인덱싱 안 됨).
--
-- 표시: GET /api/documents/{doc_id}/images/{image_key}/raw (인증 필요)
--
-- md_content 의 ref 형식: `![alt](docimg:img_001)` — image_key 가 sequence 기반 결정적
-- 이라 재변환 시 idempotent.
--
-- soft delete 미사용. 재변환 시 UPSERT + 고아 row DELETE + NAS 파일 unlink.
-- ON DELETE CASCADE — 문서 삭제 시 이미지 row 도 정리 (NAS 파일 unlink 는 worker 책임).
--
-- 참고 패턴: study_question_images (PR-8, migration 198). 동일 NFS trail.
--
-- single statement per file (asyncpg exec_driver_sql 제약, feedback_migration_runner_single_statement.md).
CREATE TABLE IF NOT EXISTS document_images (
id BIGSERIAL PRIMARY KEY,
document_id BIGINT NOT NULL REFERENCES documents(id) ON DELETE CASCADE,
image_key VARCHAR(32) NOT NULL,
relative_path TEXT NOT NULL,
file_path TEXT NOT NULL,
mime_type TEXT NOT NULL,
file_size BIGINT NOT NULL,
content_hash VARCHAR(64) NOT NULL,
width INTEGER,
height INTEGER,
page_index INTEGER,
alt_text TEXT,
source_slug TEXT,
extraction_engine VARCHAR(32) NOT NULL DEFAULT 'marker',
extraction_engine_version VARCHAR(32),
created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
UNIQUE (document_id, image_key)
);
@@ -1,7 +0,0 @@
-- 237_document_images_doc_idx.sql (2/3) — Markdown Canonical Phase 1B.5
--
-- document_id 단독 인덱스. UNIQUE(document_id, image_key) 가 prefix 매칭으로 같은 효과를
-- 내지만 explicit 단독 인덱스를 둬서 향후 image_key 변경 시에도 안정적으로 유지.
CREATE INDEX IF NOT EXISTS idx_document_images_document_id
ON document_images (document_id);
@@ -1,7 +0,0 @@
-- 238_document_images_hash_idx.sql (3/3) — Markdown Canonical Phase 1B.5
--
-- content_hash 인덱스 — cross-document 이미지 dedup 후속 PR 대비. 본 1B.5 에서는
-- 미사용 (활성 dedup 은 후속). 인덱스만 미리 깔아둠.
CREATE INDEX IF NOT EXISTS idx_document_images_content_hash
ON document_images (content_hash);
-6
View File
@@ -1,6 +0,0 @@
-- events 도메인 PR-1 (1/11) — event_kind enum
-- task: 해야 할 일 (due_at/start_at/end_at 자유 조합 허용)
-- calendar_event: 시간 블록이 있는 일정 (start_at 필수)
-- activity_log: 이미 한 행동 기록 (started_at 또는 ended_at 필수)
CREATE TYPE event_kind AS ENUM ('task', 'calendar_event', 'activity_log');
-18
View File
@@ -1,18 +0,0 @@
-- events 도메인 PR-1 (2/11) — event_status enum
-- inbox: 아직 정리 안 됨
-- next: 다음 행동으로 선정됨 (시간 미정)
-- scheduled: 시간/날짜가 잡힘
-- in_progress: 진행 중
-- done: 완료
-- cancelled: 취소
-- deferred: 특정 시점 전까지 숨김 (defer_until 사용)
CREATE TYPE event_status AS ENUM (
'inbox',
'next',
'scheduled',
'in_progress',
'done',
'cancelled',
'deferred'
);
-12
View File
@@ -1,12 +0,0 @@
-- events 도메인 PR-1 (3/11) — event_source enum
-- 데이터가 어디서 왔는가 (created_by 와는 별 축)
CREATE TYPE event_source AS ENUM (
'manual',
'memo',
'email',
'chat',
'webhook',
'git_commit',
'claude_code'
);
-10
View File
@@ -1,10 +0,0 @@
-- events 도메인 PR-1 (4/11) — event_actor enum
-- 어떤 actor/process 가 row 를 만들었는가 (created_by + events_history.changed_by 공용)
-- source 와 분리: source=email + created_by=email_ingest 같은 직교 축
CREATE TYPE event_actor AS ENUM (
'manual',
'eid',
'email_ingest',
'system'
);
@@ -1,17 +0,0 @@
-- events 도메인 PR-1 (5/11) — history_change_kind enum
-- events_history 의 lifecycle 변경 유형
-- create: 신규 생성
-- reschedule: 시간 필드 변경 (due_at/start_at/end_at/started_at/ended_at/timezone/all_day)
-- defer: /defer endpoint 호출 (defer_until 설정)
-- reactivate: 완료/취소/연기 해제
-- complete: /complete endpoint 호출
-- cancel: /cancel endpoint 호출
CREATE TYPE history_change_kind AS ENUM (
'create',
'reschedule',
'defer',
'reactivate',
'complete',
'cancel'
);
-38
View File
@@ -1,38 +0,0 @@
-- events 도메인 PR-1 (6/11) — events 1차 컨테이너 테이블
-- 개인 운영 로그 / 일정 / 할 일 / 회고용 activity database 의 본체.
-- 기존 documents/tasks/ask_events 와 직교한 새 도메인.
CREATE TABLE IF NOT EXISTS events (
id BIGSERIAL PRIMARY KEY,
title TEXT NOT NULL,
description TEXT, -- markdown
kind event_kind NOT NULL,
status event_status NOT NULL DEFAULT 'inbox',
due_at TIMESTAMPTZ, -- task 위주
start_at TIMESTAMPTZ, -- calendar_event 위주
end_at TIMESTAMPTZ, -- calendar_event 위주
started_at TIMESTAMPTZ, -- 실제 수행 시각 (activity_log 위주)
ended_at TIMESTAMPTZ, -- 실제 수행 종료
all_day BOOLEAN NOT NULL DEFAULT false,
timezone TEXT,
defer_until TIMESTAMPTZ,
completed_at TIMESTAMPTZ,
cancelled_at TIMESTAMPTZ,
priority SMALLINT CHECK (priority BETWEEN 1 AND 4),
project_tag VARCHAR(64),
tags JSONB NOT NULL DEFAULT '[]'::jsonb,
source event_source NOT NULL DEFAULT 'manual',
source_ref TEXT, -- Message-ID 등 (TEXT, 충분한 길이)
raw_metadata JSONB NOT NULL DEFAULT '{}'::jsonb,
memo_document_id BIGINT REFERENCES documents(id) ON DELETE SET NULL,
user_id BIGINT NOT NULL REFERENCES users(id),
created_by event_actor NOT NULL DEFAULT 'manual',
created_at TIMESTAMPTZ NOT NULL DEFAULT now(),
updated_at TIMESTAMPTZ NOT NULL DEFAULT now(),
-- CHECK 제약 (라운드 10: task 제약 제거 + activity_log 완화 — 미래 확장 막지 않게)
CONSTRAINT events_calendar_event_requires_start
CHECK (kind <> 'calendar_event' OR start_at IS NOT NULL),
CONSTRAINT events_activity_log_requires_time
CHECK (kind <> 'activity_log' OR started_at IS NOT NULL OR ended_at IS NOT NULL)
);
-7
View File
@@ -1,7 +0,0 @@
-- events 도메인 PR-1 (7/11) — partial unique on (source, source_ref)
-- 외부 source dedup (이메일 Message-ID, git commit hash, webhook event_id 등).
-- PR-1 에서는 schema 만 박고 dedup 동작 검증은 PR-4 MailPlus ingest 시.
CREATE UNIQUE INDEX IF NOT EXISTS events_source_ref_uq
ON events (source, source_ref)
WHERE source_ref IS NOT NULL;
-7
View File
@@ -1,7 +0,0 @@
-- events 도메인 PR-1 (8/11) — active events partial index
-- Today/Upcoming/Inbox view 의 핵심 인덱스.
-- done/cancelled 는 활성 list 에서 빠지므로 partial 로 사이즈 절감.
CREATE INDEX IF NOT EXISTS idx_events_active
ON events (user_id, due_at, start_at)
WHERE status IN ('inbox', 'next', 'scheduled', 'deferred');
-7
View File
@@ -1,7 +0,0 @@
-- events 도메인 PR-1 (9/11) — activity_log timeline 전용 partial index
-- Activity 탭 (한 일 / 회고) view 핵심.
-- events_history 와 이름 분리 (라운드 11 — idx_events_history_user 가 아닌 events 본 테이블 index).
CREATE INDEX IF NOT EXISTS idx_events_activity_user_started
ON events (user_id, started_at DESC)
WHERE kind = 'activity_log';
-15
View File
@@ -1,15 +0,0 @@
-- events 도메인 PR-1 (10/11) — events_history 변경 이력 테이블
-- lifecycle op 자동 기록 (create / reschedule / defer / reactivate / complete / cancel).
-- 일반 PATCH (title/description 등) 의 history 는 v1 X (폭증 회피).
-- ON DELETE RESTRICT: 이력은 시점 사실 → 부모 events row 직접 삭제 차단
-- (feedback_history_table_fk_restrict.md). events 자체는 /cancel 로 soft-cancel.
CREATE TABLE IF NOT EXISTS events_history (
id BIGSERIAL PRIMARY KEY,
event_id BIGINT NOT NULL REFERENCES events(id) ON DELETE RESTRICT,
changed_at TIMESTAMPTZ NOT NULL DEFAULT now(),
changed_by event_actor NOT NULL,
change_kind history_change_kind NOT NULL,
before JSONB, -- create 시 NULL
after JSONB NOT NULL
);
-5
View File
@@ -1,5 +0,0 @@
-- events 도메인 PR-1 (11/11) — events_history (event_id, changed_at) index
-- 일정 상세 페이지의 history timeline 조회 핵심.
CREATE INDEX IF NOT EXISTS idx_events_history_event
ON events_history (event_id, changed_at);
-11
View File
@@ -1,11 +0,0 @@
-- Memo Intake Upgrade PR-2B (1/4) — event_kind_hint enum
-- 메모 본문에서 AI(Gemma 4B triage) 가 추론한 메모 의도 분류.
-- events.kind (event_kind) 와 별 type 으로 분리 — 의도 hint 만, 실제 events row 생성은 사용자 promote 시점.
CREATE TYPE event_kind_hint AS ENUM (
'note',
'task',
'calendar_event',
'activity_log',
'reference'
);
@@ -1,6 +0,0 @@
-- Memo Intake Upgrade PR-2B (2/4) — documents.ai_event_kind 컬럼 추가
-- AI 추천값. 사용자 1-click 승급 또는 dismiss 기준이 됨.
-- nullable: classify worker 미통과 (extracted_text NULL 등) 행은 비어 있음.
ALTER TABLE documents
ADD COLUMN IF NOT EXISTS ai_event_kind event_kind_hint;
@@ -1,5 +0,0 @@
-- Memo Intake Upgrade PR-2B (3/4) — documents.ai_event_confidence 컬럼 추가
-- 4B triage 의 ai_event_kind 신뢰도 (0.001.00). full-auto promote 결정 임계값에 활용.
ALTER TABLE documents
ADD COLUMN IF NOT EXISTS ai_event_confidence NUMERIC(3, 2) CHECK (ai_event_confidence IS NULL OR (ai_event_confidence >= 0 AND ai_event_confidence <= 1));
@@ -1,7 +0,0 @@
-- Memo Intake Upgrade PR-2B (4/4) — partial index on ai_event_kind
-- 메모 list 의 분류 결과 필터 + 사용자 inbox triage 흐름 핵심 인덱스.
-- ai_event_kind IS NULL (분류 대기 / 미통과) 행은 size 절감 목적 partial 로 제외.
CREATE INDEX IF NOT EXISTS idx_documents_ai_event_kind
ON documents (ai_event_kind, created_at DESC)
WHERE ai_event_kind IS NOT NULL;
-5
View File
@@ -1,5 +0,0 @@
-- Memo Intake Upgrade PR-2C — source_channel enum 에 'voice' 추가
-- 음성 메모 진입점 (애플워치 / 기타 mic) 의 source 식별.
-- file_type='audio' + category='audio' + source_channel='voice' 조합으로 메모 UI 에 노출.
ALTER TYPE source_channel ADD VALUE IF NOT EXISTS 'voice';
@@ -1,19 +0,0 @@
-- Morning Briefing 부모 테이블 (일일 1행, briefing_date UNIQUE).
-- asyncpg prepared statement 가 multi-statement 불허라 인덱스 + 자식 테이블은 별 migration.
CREATE TABLE morning_briefings (
id BIGSERIAL PRIMARY KEY,
briefing_date DATE NOT NULL,
window_start TIMESTAMPTZ NOT NULL,
window_end TIMESTAMPTZ NOT NULL,
decay_lambda DOUBLE PRECISION NOT NULL,
total_articles INTEGER NOT NULL DEFAULT 0,
total_countries INTEGER NOT NULL DEFAULT 0,
total_topics INTEGER NOT NULL DEFAULT 0,
generation_ms INTEGER,
llm_calls INTEGER NOT NULL DEFAULT 0,
llm_failures INTEGER NOT NULL DEFAULT 0,
status VARCHAR(20) NOT NULL DEFAULT 'success',
headline_oneliner TEXT,
created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
UNIQUE (briefing_date)
)
-1
View File
@@ -1 +0,0 @@
CREATE INDEX idx_morning_briefings_date ON morning_briefings (briefing_date DESC)
-26
View File
@@ -1,26 +0,0 @@
-- Morning Briefing 자식 (topic_rank 순, UNIQUE briefing_id + topic_rank).
-- country_perspectives/divergences/convergences/key_quotes JSONB (cap 은 application).
-- historical_* 3 컬럼은 BRIEFING_HISTORICAL_ENABLED on 시만 채움 (nullable).
CREATE TABLE briefing_topics (
id BIGSERIAL PRIMARY KEY,
briefing_id BIGINT NOT NULL REFERENCES morning_briefings(id) ON DELETE CASCADE,
topic_rank INTEGER NOT NULL,
topic_label VARCHAR(120) NOT NULL,
headline TEXT NOT NULL,
country_perspectives JSONB NOT NULL DEFAULT '[]',
divergences JSONB NOT NULL DEFAULT '[]',
convergences JSONB NOT NULL DEFAULT '[]',
key_quotes JSONB NOT NULL DEFAULT '[]',
historical_article_ids JSONB,
historical_context TEXT,
historical_window_days INTEGER,
cluster_members JSONB NOT NULL DEFAULT '[]',
article_count INTEGER NOT NULL,
country_count INTEGER NOT NULL,
importance_score DOUBLE PRECISION NOT NULL,
raw_weight_sum DOUBLE PRECISION NOT NULL,
llm_model VARCHAR(100),
llm_fallback_used BOOLEAN NOT NULL DEFAULT FALSE,
created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
UNIQUE (briefing_id, topic_rank)
)
-1
View File
@@ -1 +0,0 @@
CREATE INDEX idx_briefing_topics_briefing_rank ON briefing_topics (briefing_id, topic_rank)
@@ -1,7 +0,0 @@
-- PR-4 Email Ingest — documents.source_external_id 컬럼 추가
-- 외부 source 의 dedup key. email source 에서는 always non-null (ingest 코드 책임).
-- Message-ID 정규화 또는 imap:{folder}:{uidvalidity}:{uid} fallback.
-- 다른 source_channel 에서는 NULL 허용 (별 의미 부여 시 nullable→unique 검토).
ALTER TABLE documents
ADD COLUMN IF NOT EXISTS source_external_id TEXT;
@@ -1,9 +0,0 @@
-- PR-4 Email Ingest — documents.email_metadata JSONB 컬럼 추가
-- 구조: {from, to[], cc[], subject, folder, uidvalidity, uid, received_at,
-- mailplus_link, attachments: [{filename, mime, size, part_id}],
-- parse_error?: string}
-- mailplus_archive (기존 INBOX root archive 워커) 가 만든 row 는 NULL 유지.
-- inbox_ingest 가 만든 row 만 채움.
ALTER TABLE documents
ADD COLUMN IF NOT EXISTS email_metadata JSONB;
@@ -1,7 +0,0 @@
-- PR-4 Email Ingest — partial unique on (source_external_id) for email source
-- inbox_ingest 의 dedup 진실원장. 같은 메일 재 ingest 시 ON CONFLICT DO NOTHING.
-- mailplus_archive 의 INBOX root archive row 는 source_external_id=NULL 이라 자동 제외.
CREATE UNIQUE INDEX IF NOT EXISTS uq_documents_email_source_external_id
ON documents (source_external_id)
WHERE source_channel = 'email' AND source_external_id IS NOT NULL;
-115
View File
@@ -1,115 +0,0 @@
# Storage Inventory — 2026-05-11
**Storage PR-1 read-only inventory 실측 결과**.
**측정 명령**: GPU 서버 (100.111.160.84) SSH `df` / `mount` / `du` / `docker volume ls` / `docker run -v ... alpine du`. **read-only**, 운영 변경 없음.
**참조**: `docs/storage_layout.md` (영구 문서) + plan `~/.claude/plans/beszel-tingly-sloth.md` v6.
## 1. 디스크 사용 요약
### GPU 서버 (Ubuntu, RTX 4070 Ti Super)
```
Filesystem Size Used Avail Use% Mounted on
/dev/mapper/ubuntu--vg-ubuntu--lv 936G 182G 715G 21% /
192.168.1.227:/volume4/Document_Server 3.5T 401G 3.1T 12% /mnt/nas/Document_Server (NFS)
192.168.1.227:/volume4/Comic 3.5T 401G 3.1T 12% /mnt/comic (NFS, Document Server 외)
192.168.1.227:/volume1/Video (별 volume)
```
### NAS NFS 옵션 (이미 plan v6 권고안 baseline)
```
nfs4 rw,relatime,vers=4.1,rsize=1048576,wsize=1048576,
soft,proto=tcp,timeo=10,retrans=3,_netdev
```
## 2. NAS volume 안 Document Server 디렉토리 (`du -sh`)
| 경로 | 크기 |
|---|---|
| `/mnt/nas/Document_Server/PKM/` | **38GB** |
| `/mnt/nas/Document_Server/PKM/Knowledge/` | 31GB |
| `/mnt/nas/Document_Server/PKM/References/` | 5.5GB |
| `/mnt/nas/Document_Server/PKM/Inbox/` | 1.5GB |
| `/mnt/nas/Document_Server/PKM/Videos/` | 116KB |
| `/mnt/nas/Document_Server/PKM/Recordings/` | 20KB |
| `/mnt/nas/Document_Server/PKM/Archive/` | 0 |
| `/mnt/nas/Document_Server/PKM/.preview/` | (6 entry, 소량) |
| `/mnt/nas/Document_Server/Main/` | **14GB** (의미 확인 필요) |
| `/mnt/nas/Document_Server/extracted_images/` | 1.2MB |
| `/mnt/nas/Document_Server/study_question_images/` | 12MB |
| `/mnt/nas/Document_Server/프로그래밍 전 참고 자료/` | 56KB |
**합계** ≈ 52GB (NAS volume 401GB used 중 일부)
## 3. GPU 서버 local Docker volume (`docker run -v ... alpine du -sh /data`)
| Volume | 크기 |
|---|---|
| `hyungi_document_server_pgdata` | 1.1GB (postgres container 안 `du -sh /var/lib/postgresql/data`) |
| `hyungi_document_server_ollama_data` | **11.9GB** |
| `hyungi_document_server_stt_models` | 3.0GB |
| `hyungi_document_server_reranker_cache` | 2.1GB |
| `hyungi_document_server_ocr_models` | 1.4GB |
| `hyungi_document_server_marker_models` | 4KB (HF cache 다른 위치로 추정) |
| `hyungi_document_server_caddy_data` | (작음, 미측정) |
**GPU local 합계** ≈ 19.5GB (시스템 rootfs 182GB used 중 일부, 나머지는 OS/Docker image/홈/별 작업)
## 4. 컨테이너 마운트 매트릭스 (`docker-compose.yml` 추출)
```
postgres:
- pgdata:/var/lib/postgresql/data
- ./migrations:/docker-entrypoint-initdb.d (init scripts, 실제 사용 X)
kordoc-service / ocr-service / marker-service / stt-service:
- ${NAS_NFS_PATH}:/documents (ro)
- {worker}_models:/root/.cache OR /models (rw)
fastapi:
- ${NAS_NFS_PATH}:/documents (rw) ← NAS 쓰기 권한 단일 책임
- ./config.yaml ./domain_policy.yaml ./scripts ./logs ./migrations
ai-gateway: ollama_data
reranker: reranker_cache
caddy: ./Caddyfile + caddy_data
```
## 5. 이관 후보 분류
### 확정 NAS (이미 잘 분리됨, 추가 작업 X)
- `PKM/*` (38GB) — 사용자 컨텐츠 원본
- `extracted_images/` — Markdown Phase 1B.5
- `study_question_images/` — 가스기사 PR-8
- `Main/` (14GB) — 의미 확인 필요지만 이미 NAS
### 확정 GPU local (latency 민감 / AI 인접)
- Postgres pgdata (1.1GB)
- AI 모델 cache (ollama / STT / reranker / OCR / marker) ≈ 18.5GB
- 로그 / Caddy data
### 회색 → 검토 후 결정 (Storage PR-2 영역)
- `PKM/.preview/` (LibreOffice PDF 변환 cache) — 현재 NAS, 재생성 가능 → GPU local 로 옮길지 검토. *단 가치 작음, 후순위*.
- `Main/` 14GB 의 실제 내용 확인 — Document Server 가 사용하는지, 별 트랙인지.
### PR-4 도입 시 신규 디렉토리
- `extracted_emails/{event_id}/` — MailPlus forward ingest 첨부 (Storage PR-5 / events PR-4 합류)
## 6. 핵심 인사이트
1. **NAS 의 binary layer 는 이미 잘 구성**. Storage 백본 트랙의 큰 부분이 *이미 진행됨*. 추가 PR-3/4 의 작업량 = 거의 없음 (PR-5 만 events PR-4 와 합류).
2. **NFS mount 옵션 = plan v6 권고안 baseline 과 정확히 같음** (`soft,timeo=10,retrans=3,vers=4.1`). Storage PR-2 는 mount 옵션 변경 아닌 *애플리케이션 layer* (정규화 wrapper / 장애 처리 / uid 매핑 / 시뮬 gate) 에 집중.
3. **Postgres data dir 1.1GB 매우 작음**. DB 본체 이관 자체는 기술적으로 SSD 480GB NAS 에 1000+ 배 여유. 단 plan 결정 = GPU 잔류 (CPU/RAM 우위, 라운드 9). 본 inventory 는 그 결정을 *변경하지 않음*.
4. **GPU rootfs 21% 사용 (715GB 여유)** — AI 모델 캐시 늘어도 1–2년 여유. 단기 storage 압박 없음.
5. **fastapi 만 NAS rw**, 다른 worker 는 ro → 원본 안전 보장. 좋은 분리.
## 7. 후속 작업 (Storage PR-1 외)
- **Storage PR-2** (별 PR): 정규화 wrapper / uid·gid 매핑 / NAS 장애 시뮬 gate (test mount 부터) / fastapi 쓰기 경로 fsync+rename 패턴 점검
- **Storage PR-5** (events PR-4 합류): `extracted_emails/{event_id}/` 표준 + 첨부 저장 패턴
- **Main/ 14GB 의미 확인** (1회성 chore, 본 PR 범위 외): 사용 안 하는 legacy 면 archive 검토
## 8. 본 PR 산출물 (커밋 대상)
- `docs/storage_layout.md` (영구 정책 문서)
- `reports/storage_inventory_2026-05-11.md` (본 파일, 측정 결과 snapshot)
- **코드 / 운영 / mount / fstab / DB / file_path 변경 0건** (read-only inventory PR)
@@ -1,195 +0,0 @@
"""Phase 1B.5 ImgAuth — 기존 marker success 문서 targeted 재변환.
목적:
Phase 1B 배포 (2026-05-01) ~ 1B.5 직전 사이에 marker_worker 만든 success 문서들은
md_content 안에 깨진 ref (`![](_page_0_Picture_3.jpeg)` ) 있고 NAS 파일 부재.
1B.5 deploy 문서들을 force_reprocess=true 다시 큐에 넣어 이미지 persist +
md_content ref 정규화 (`docimg:img_NNN`) 적용한다.
사용자 못박은 절차 (plan: piped-humming-crystal.md Step 6):
1. 1B.5 merge/deploy 전에는 기존 success 문서 절대 건드리지 않음
2. 1B.5 deploy + 신규 업로드 1~2 + sample 5 검증 후에만 28 진행
3. anchor/baseline 보존: pre-snapshot CSV + 재변환 전후 quality metric 비교값
4. Phase 2 cron 분리 ( 단계, 야간 시간 단발 실행)
실행:
# 1) 후보 + pre-snapshot 출력 (CSV stdout)
docker compose exec fastapi python /app/scripts/marker_reprocess_existing_success.py --dry-run
# 2) sample 모드 (지정한 doc_id 만 enqueue)
docker compose exec fastapi python /app/scripts/marker_reprocess_existing_success.py \\
--apply --only 4809,5127,5180,5183
# 3) 전체 28건 enqueue (sample 검증 통과 후)
docker compose exec fastapi python /app/scripts/marker_reprocess_existing_success.py --apply
# 4) snapshot CSV 파일 저장
docker compose exec fastapi python /app/scripts/marker_reprocess_existing_success.py \\
--dry-run --snapshot-csv /app/logs/marker_pre_imgauth_snapshot_2026MMDD.csv
배포 검증:
- 모든 후보가 md_status='success' 유지
- document_images row 분포 (예상: ~20 이미지 보유, ~8 없음)
- md_content `docimg:` ref == document_images row per doc
- anchor doc 4809 quality 비교 (heading_count / table_row_count / text_length_ratio ±5%)
"""
import argparse
import asyncio
import csv
import json
import os
import sys
from io import StringIO
sys.path.insert(0, os.path.join(os.path.dirname(__file__), "..", "app"))
from sqlalchemy import text
from sqlalchemy.ext.asyncio import AsyncSession, async_sessionmaker, create_async_engine
CANDIDATES_SQL = """
SELECT id, md_status, md_content_hash, md_extraction_engine, md_extraction_engine_version,
md_extraction_quality, md_generated_at,
file_format, file_path, title
FROM documents
WHERE md_status = 'success'
AND md_extraction_engine = 'marker'
ORDER BY id
"""
def _serialize_row(row) -> dict:
quality = row.md_extraction_quality
return {
"id": row.id,
"md_status": row.md_status,
"md_content_hash": row.md_content_hash,
"md_extraction_engine": row.md_extraction_engine,
"md_extraction_engine_version": row.md_extraction_engine_version,
"md_extraction_quality": json.dumps(quality, ensure_ascii=False) if quality else "",
"md_generated_at": row.md_generated_at.isoformat() if row.md_generated_at else "",
"file_format": row.file_format,
"file_path": row.file_path,
"title": row.title or "",
}
async def run(*, apply: bool, only_ids: set[int] | None, snapshot_csv: str | None) -> int:
database_url = os.getenv(
"DATABASE_URL",
"postgresql+asyncpg://pkm:pkm@localhost:5432/pkm",
)
engine = create_async_engine(database_url)
session_factory = async_sessionmaker(
engine, class_=AsyncSession, expire_on_commit=False
)
try:
async with session_factory() as session:
rows = (await session.execute(text(CANDIDATES_SQL))).all()
if only_ids:
rows = [r for r in rows if r.id in only_ids]
print(f"=== marker success 후보 = {len(rows)}건 ===")
if not rows:
print("후보 없음 — 종료.")
return 0
# pre-snapshot CSV 출력
buf = StringIO()
writer = csv.DictWriter(
buf,
fieldnames=[
"id", "md_status", "md_content_hash", "md_extraction_engine",
"md_extraction_engine_version", "md_extraction_quality",
"md_generated_at", "file_format", "file_path", "title",
],
)
writer.writeheader()
for row in rows:
writer.writerow(_serialize_row(row))
csv_text = buf.getvalue()
if snapshot_csv:
with open(snapshot_csv, "w", encoding="utf-8") as f:
f.write(csv_text)
print(f"[snapshot] {snapshot_csv}{len(rows)}행 기록")
else:
print("\n=== Pre-snapshot CSV ===")
print(csv_text)
if not apply:
print(f"\n[dry-run] {len(rows)}건 영향. --apply 로 실제 enqueue.")
return 0
# enqueue — UNIQUE(document_id, stage) WHERE status IN ('pending', 'processing')
# 가 있으므로 활성 markdown 행이 없는 doc 만 통과. 충돌 시 silent skip.
# CAST(:payload AS jsonb) — `::jsonb` postfix 캐스트는 SQLAlchemy text()
# 의 named-param prefix `:` 와 충돌해 syntax error 발생.
ENQUEUE_SQL = text("""
INSERT INTO processing_queue (document_id, stage, status, payload)
VALUES (:doc_id, 'markdown', 'pending', CAST(:payload AS jsonb))
ON CONFLICT DO NOTHING
""")
payload = json.dumps({
"force_reprocess": True,
"reason": "phase_1b5_imgauth_targeted_reprocess",
})
inserted = 0
for row in rows:
result = await session.execute(
ENQUEUE_SQL, {"doc_id": row.id, "payload": payload}
)
if result.rowcount > 0:
inserted += 1
await session.commit()
print(f"\n[apply] enqueue 완료 — {inserted}/{len(rows)} 건 신규 markdown 큐 추가")
print(" (skip = 이미 활성 markdown 큐 행이 있는 문서)")
return 0
finally:
await engine.dispose()
def _parse_only_ids(arg: str | None) -> set[int] | None:
if not arg:
return None
out: set[int] = set()
for part in arg.split(","):
part = part.strip()
if part:
out.add(int(part))
return out or None
def main() -> int:
parser = argparse.ArgumentParser(description=__doc__)
parser.add_argument("--apply", action="store_true", help="실제 enqueue (기본 dry-run)")
parser.add_argument("--dry-run", action="store_true", help="명시적 dry-run (default 동등)")
parser.add_argument(
"--only", type=str, default=None,
help="쉼표 구분 doc_id 화이트리스트 (sample 검증용, 예: 4809,5127,5180)",
)
parser.add_argument(
"--snapshot-csv", type=str, default=None,
help="pre-snapshot 을 stdout 대신 이 경로의 CSV 파일로 저장",
)
args = parser.parse_args()
if args.apply and args.dry_run:
parser.error("--apply 와 --dry-run 동시 지정 불가")
only_ids = _parse_only_ids(args.only)
return asyncio.run(run(
apply=args.apply,
only_ids=only_ids,
snapshot_csv=args.snapshot_csv,
))
if __name__ == "__main__":
sys.exit(main())
+6 -87
View File
@@ -1,14 +1,9 @@
"""marker-service — POST /convert: PDF → markdown + 추출 이미지 base64.
"""marker-service — POST /convert: PDF → markdown (텍스트만, 이미지 제외).
Phase 1B (2026-05-01) 텍스트만 응답, 이미지 폐기.
Phase 1B.5 ( 변경) `_images` 직렬화해서 base64 응답에 포함. NAS write 권한이
없는 stateless 변환기 유지 (fastapi NAS persist 담당).
plan: ~/.claude/plans/piped-humming-crystal.md
Phase 1B Round 5 /ready 정확한 status code, warmup 실패 가시화, 변환 실패 = 422.
plan: ~/.claude/plans/plan-idempotent-sundae.md
"""
import base64
import hashlib
import io
import logging
import os
import threading
@@ -16,7 +11,7 @@ import time
from pathlib import Path
from fastapi import FastAPI, HTTPException, Response
from pydantic import BaseModel, Field
from pydantic import BaseModel
from marker.converters.pdf import PdfConverter
from marker.models import create_model_dict
@@ -40,12 +35,6 @@ _warmup_done = False
_warmup_error: str | None = None
_warmup_lock = threading.Lock()
# 이미지 응답 cap. base64 응답 크기 폭주 방지. 사용자 PDF 풀 측정 (Phase 1D) 시
# 가장 이미지 많은 문서가 ~30건 수준 → 200 은 안전 마진. 초과 시 truncate flag 응답.
MAX_IMAGES_PER_DOC = int(os.getenv("MARKER_MAX_IMAGES_PER_DOC", "200"))
# per-image 최대 raw bytes (base64 전). 그래픽이 많은 풀페이지 스캔 회피.
MAX_BYTES_PER_IMAGE = int(os.getenv("MARKER_MAX_BYTES_PER_IMAGE", str(10 * 1024 * 1024)))
def _ensure_warmup() -> None:
"""첫 /convert 또는 startup hook 시 모델 로드. HF cache volume 활용."""
@@ -80,15 +69,6 @@ class ConvertRequest(BaseModel):
max_pages: int | None = None
class ConvertImage(BaseModel):
"""marker 추출 이미지 1건. fastapi 가 NAS 에 쓰고 docimg:img_NNN 으로 ref 정규화."""
slug: str # marker 원본 slug (예: '_page_0_Picture_3.jpeg')
format: str # 'png' | 'jpeg' | 'webp' | 'gif'
width: int | None = None
height: int | None = None
bytes_b64: str # base64-encoded raw bytes
class ConvertResponse(BaseModel):
md_content: str
md_content_hash: str
@@ -96,8 +76,6 @@ class ConvertResponse(BaseModel):
engine_version: str
elapsed_ms: int
raw_metrics: dict
images: list[ConvertImage] = Field(default_factory=list)
images_truncated: bool = False
@app.get("/ready")
@@ -146,11 +124,9 @@ async def convert(req: ConvertRequest):
},
) from exc
md_text, _meta, raw_images = text_from_rendered(rendered)
md_text, _meta, _images = text_from_rendered(rendered)
elapsed_ms = int((time.monotonic() - start) * 1000)
images_payload, truncated = _serialize_images(raw_images, str(p))
return ConvertResponse(
md_content=md_text,
md_content_hash=hashlib.sha256(md_text.encode("utf-8")).hexdigest(),
@@ -159,63 +135,6 @@ async def convert(req: ConvertRequest):
elapsed_ms=elapsed_ms,
raw_metrics={
"page_count": getattr(rendered, "page_count", None),
"image_count_extracted": len(raw_images) if raw_images else 0,
"image_count_returned": len(images_payload),
"image_count_extracted": len(_images) if _images else 0,
},
images=images_payload,
images_truncated=truncated,
)
def _serialize_images(raw_images, src_path: str) -> tuple[list[ConvertImage], bool]:
"""marker 의 `_images` (dict[slug, PIL.Image]) → base64 ConvertImage 리스트.
가드:
- MAX_IMAGES_PER_DOC 초과 head 반환 + truncated=True
- per-image 직렬화 실패 해당 이미지만 skip + warn (전체 fail )
- per-image 결과 byte 크기가 MAX_BYTES_PER_IMAGE 초과 skip + warn
"""
if not raw_images:
return [], False
items = list(raw_images.items())
truncated = len(items) > MAX_IMAGES_PER_DOC
if truncated:
logger.warning(
f"[marker-service] images truncated path={src_path} "
f"total={len(items)} cap={MAX_IMAGES_PER_DOC}"
)
items = items[:MAX_IMAGES_PER_DOC]
out: list[ConvertImage] = []
for slug, pil_img in items:
try:
fmt_raw = (pil_img.format or "PNG").upper()
# WebP/GIF 도 marker 가 emit 가능하지만 본 1B.5 기준은 PNG/JPEG 우선.
# 알 수 없는 포맷이면 PNG 로 강제 (lossless re-encode).
fmt = fmt_raw if fmt_raw in {"PNG", "JPEG", "WEBP", "GIF"} else "PNG"
buf = io.BytesIO()
pil_img.save(buf, format=fmt)
raw_bytes = buf.getvalue()
if len(raw_bytes) > MAX_BYTES_PER_IMAGE:
logger.warning(
f"[marker-service] image too large skipped path={src_path} "
f"slug={slug} bytes={len(raw_bytes)} cap={MAX_BYTES_PER_IMAGE}"
)
continue
out.append(
ConvertImage(
slug=slug,
format=fmt.lower(),
width=pil_img.width,
height=pil_img.height,
bytes_b64=base64.b64encode(raw_bytes).decode("ascii"),
)
)
except Exception as exc:
logger.warning(
f"[marker-service] image serialize failed path={src_path} "
f"slug={slug}: {type(exc).__name__}: {exc}"
)
continue
return out, truncated
-269
View File
@@ -1,269 +0,0 @@
"""Briefing historical 분기 회귀 — Plan §"Verification 9".
3 경로 검증:
1. flag off retrieve_historical 호출 , prompt {historical_block} = "(과거 참고 자료 없음)"
2. flag on + fixture top-K similarity 0.70 docs 반환
3. flag on + zero match list (no fallback hallucination)
"""
import os
import sys
from datetime import datetime, timezone, timedelta
from pathlib import Path
import numpy as np
import pytest
# PYTHONPATH = /app (디렉토리 안에서 실행 가정 또는 sys.path 추가)
APP_DIR = Path(__file__).resolve().parent.parent / "app"
if str(APP_DIR) not in sys.path:
sys.path.insert(0, str(APP_DIR))
from services.briefing.comparator import (
HISTORICAL_SIMILARITY_MIN,
HISTORICAL_TOP_K,
_build_historical_block,
_make_fallback,
_sanitize_envelope,
build_prompt,
historical_enabled,
retrieve_historical,
)
from services.clustering_common import normalize_vector
def _make_doc(doc_id: int, embedding: np.ndarray, hours_ago: int = 1) -> dict:
return {
"id": doc_id,
"title": f"doc {doc_id}",
"ai_summary": f"summary {doc_id}",
"embedding": embedding,
"created_at": datetime.now(timezone.utc) - timedelta(hours=hours_ago),
}
def _make_cluster_with_centroid(centroid_vec: np.ndarray) -> dict:
return {
"centroid": normalize_vector(centroid_vec),
"members": [],
}
def test_flag_default_off():
"""env 미설정 → historical disabled."""
os.environ.pop("BRIEFING_HISTORICAL_ENABLED", None)
assert historical_enabled() is False
def test_flag_on():
os.environ["BRIEFING_HISTORICAL_ENABLED"] = "true"
try:
assert historical_enabled() is True
finally:
os.environ.pop("BRIEFING_HISTORICAL_ENABLED", None)
def test_historical_block_empty_when_no_docs():
"""경로 1: flag off 또는 historical_docs=[] → 빈 라벨."""
block = _build_historical_block([])
assert block == "(과거 참고 자료 없음)"
def test_historical_block_has_label_when_docs():
docs = [_make_doc(1, np.ones(1024, dtype=np.float32))]
block = _build_historical_block(docs)
assert "이전 30일 흐름" in block
assert "직접 인용 금지" in block
assert "[H1]" in block
def test_retrieve_historical_topk():
"""경로 2: flag on + fixture top-K similarity ≥ threshold."""
# cluster centroid = 모두 1 방향
centroid = np.ones(8, dtype=np.float32)
cluster = _make_cluster_with_centroid(centroid)
# 후보 10개: 5개는 centroid 와 유사 (sim≈1.0), 5개는 직교 (sim≈0)
similar_emb = np.ones(8, dtype=np.float32)
orthogonal_emb = np.array([1, -1, 1, -1, 1, -1, 1, -1], dtype=np.float32)
candidates = (
[_make_doc(i, similar_emb + np.random.rand(8).astype(np.float32) * 0.01) for i in range(1, 6)]
+ [_make_doc(10 + i, orthogonal_emb) for i in range(5)]
)
out = retrieve_historical(cluster, candidates, top_k=5, sim_min=0.70)
assert len(out) == 5
# 모두 similar 그룹 (id 1~5) 만 선택됨
selected_ids = {d["id"] for d in out}
assert selected_ids.issubset({1, 2, 3, 4, 5})
def test_retrieve_historical_zero_match():
"""경로 3: 모든 candidate similarity < threshold → 빈 list."""
centroid = np.ones(8, dtype=np.float32)
cluster = _make_cluster_with_centroid(centroid)
orthogonal_emb = np.array([1, -1, 1, -1, 1, -1, 1, -1], dtype=np.float32)
candidates = [_make_doc(i, orthogonal_emb) for i in range(5)]
out = retrieve_historical(cluster, candidates, top_k=5, sim_min=0.70)
assert out == []
def test_retrieve_historical_empty_candidates():
centroid = np.ones(8, dtype=np.float32)
cluster = _make_cluster_with_centroid(centroid)
assert retrieve_historical(cluster, [], top_k=5) == []
def test_sanitize_envelope_valid():
cluster = {"members": [{"id": 1}, {"id": 2}]}
parsed = {
"topic_label": "이란 충돌",
"headline": "긴장 격화",
"country_perspectives": [
{"country": "kr", "summary": "유가 충격", "article_ids": [1]},
{"country": "us", "summary": "외교 압박", "article_ids": [2]},
],
"divergences": ["KR=경제 / US=외교"],
"convergences": ["민간 사상 우려 공통"],
"key_quotes": [{"country": "US", "source": "NYT", "quote": "Tehran ..."}],
"historical_context": "지난 3주 6회 공방",
}
sanitized = _sanitize_envelope(parsed, cluster)
assert sanitized is not None
assert sanitized["topic_label"] == "이란 충돌"
# country 대문자 변환
assert sanitized["country_perspectives"][0]["country"] == "KR"
assert sanitized["historical_context"] == "지난 3주 6회 공방"
assert sanitized["llm_fallback_used"] is False
def test_sanitize_envelope_empty_perspectives_to_fallback():
"""country_perspectives 비어 있으면 None (caller 가 fallback 발동)."""
cluster = {"members": []}
parsed = {
"topic_label": "X",
"headline": "Y",
"country_perspectives": [],
}
assert _sanitize_envelope(parsed, cluster) is None
def test_fallback_row_fixed_form():
"""Plan §"Fallback Topic Row 고정 형태"."""
cluster = {"members": [{"id": 1}]}
fb = _make_fallback(cluster)
assert fb["topic_label"] == "주요 뉴스 묶음"
assert fb["country_perspectives"] == []
assert fb["divergences"] == []
assert fb["convergences"] == []
assert fb["key_quotes"] == []
assert fb["historical_context"] is None
assert fb["llm_fallback_used"] is True
def test_prompt_includes_both_blocks():
selected = [_make_doc(1, np.ones(8, dtype=np.float32))]
selected[0]["country"] = "KR"
selected[0]["ai_sub_group"] = "경향신문"
selected[0]["ai_summary_truncated"] = "오늘 한국 뉴스"
prompt = build_prompt(selected, historical_docs=[])
assert "{articles_block}" not in prompt # 치환됨
assert "{historical_block}" not in prompt
assert "(KR · 경향신문)" in prompt
assert "(과거 참고 자료 없음)" in prompt
def test_perspective_summary_cap_enforced():
"""sanitize 가 길이 cap 강제."""
cluster = {"members": []}
long_summary = "" * 500 # 500자, cap=240
parsed = {
"topic_label": "T",
"headline": "H",
"country_perspectives": [{"country": "KR", "summary": long_summary, "article_ids": []}],
}
s = _sanitize_envelope(parsed, cluster)
assert s is not None
assert len(s["country_perspectives"][0]["summary"]) <= 241 # 240 + "…"
def test_article_ids_fallback_when_llm_empty():
"""LLM 이 article_ids 를 비워두면 같은 country cluster member top-N 자동 주입."""
cluster = {
"members": [
{"id": 101, "country": "KR", "weight": 0.9},
{"id": 102, "country": "KR", "weight": 0.8},
{"id": 103, "country": "KR", "weight": 0.7},
{"id": 201, "country": "US", "weight": 0.5},
]
}
parsed = {
"topic_label": "T",
"headline": "H",
"country_perspectives": [
{"country": "KR", "summary": "한국 시각", "article_ids": []},
{"country": "US", "summary": "미국 시각", "article_ids": []},
],
}
s = _sanitize_envelope(parsed, cluster)
assert s is not None
kr = next(p for p in s["country_perspectives"] if p["country"] == "KR")
us = next(p for p in s["country_perspectives"] if p["country"] == "US")
assert kr["article_ids"] == [101, 102, 103] # weight desc
assert us["article_ids"] == [201]
def test_article_ids_intersect_with_cluster():
"""LLM 이 엉뚱한 id 를 넣으면 cluster member 와 교집합만."""
cluster = {
"members": [
{"id": 101, "country": "KR", "weight": 0.9},
{"id": 102, "country": "KR", "weight": 0.8},
]
}
parsed = {
"topic_label": "T",
"headline": "H",
"country_perspectives": [
{"country": "KR", "summary": "한국 시각", "article_ids": [101, 999, 888]},
],
}
s = _sanitize_envelope(parsed, cluster)
assert s is not None
assert s["country_perspectives"][0]["article_ids"] == [101]
def test_article_ids_capped_to_max():
"""후처리 후에도 country 당 MAX_ARTICLE_IDS_PER_COUNTRY cap."""
cluster = {
"members": [
{"id": i, "country": "KR", "weight": 1.0 / i} for i in range(1, 15)
]
}
parsed = {
"topic_label": "T",
"headline": "H",
"country_perspectives": [
{"country": "KR", "summary": "한국 시각", "article_ids": []},
],
}
s = _sanitize_envelope(parsed, cluster)
assert s is not None
from services.briefing.comparator import MAX_ARTICLE_IDS_PER_COUNTRY
assert len(s["country_perspectives"][0]["article_ids"]) == MAX_ARTICLE_IDS_PER_COUNTRY
def test_max_perspectives_cap():
cluster = {"members": []}
parsed = {
"topic_label": "T",
"headline": "H",
"country_perspectives": [
{"country": f"C{i}", "summary": "s", "article_ids": []} for i in range(20)
],
}
s = _sanitize_envelope(parsed, cluster)
assert s is not None
assert len(s["country_perspectives"]) <= 10
-169
View File
@@ -1,169 +0,0 @@
"""Phase 1B.5 ImgAuth — marker_worker 의 순수 헬퍼 단위 테스트.
DB / NAS / marker-service 실접속이 필요한 통합 테스트는 파일 (배포 실행).
파일은 image-bytes mocking 만으로 검증 가능한 부분 (rewrite 로직 + persist 매핑).
plan: ~/.claude/plans/piped-humming-crystal.md
"""
from __future__ import annotations
import base64
import os
import sys
import pytest
# tests/ → 프로젝트 루트 → app/
sys.path.insert(0, os.path.join(os.path.dirname(__file__), "..", "app"))
from workers.marker_worker import (
_persist_images_to_nas,
_rewrite_image_refs,
)
# ─── _rewrite_image_refs ───
def test_rewrite_exact_slug_match():
md = "본문\n\n![도식 1](_page_0_Picture_3.jpeg)\n\n"
out = _rewrite_image_refs(md, {"_page_0_Picture_3.jpeg": "img_001"})
assert "![도식 1](docimg:img_001)" in out
assert "_page_0_Picture_3.jpeg" not in out
def test_rewrite_basename_match_with_subdir_href():
md = "![](sub/_page_2_Figure_1.png)"
out = _rewrite_image_refs(md, {"_page_2_Figure_1.png": "img_007"})
assert out == "![](docimg:img_007)"
def test_rewrite_preserves_external_urls():
md = "외부 ![logo](https://example.com/x.png) 와 내부 ![](slug.png)"
out = _rewrite_image_refs(md, {"slug.png": "img_002"})
# 외부 URL 는 그대로, 내부 slug 만 docimg 로 치환.
assert "https://example.com/x.png" in out
assert "(docimg:img_002)" in out
def test_rewrite_preserves_alt_text():
md = "![긴 한국어 alt 설명 with $math$](slug.jpeg)"
out = _rewrite_image_refs(md, {"slug.jpeg": "img_001"})
assert out == "![긴 한국어 alt 설명 with $math$](docimg:img_001)"
def test_rewrite_no_slug_map_is_noop():
md = "![](slug.png)"
assert _rewrite_image_refs(md, {}) == md
def test_rewrite_unknown_slug_kept():
md = "![](unknown_slug.png)"
out = _rewrite_image_refs(md, {"other.png": "img_001"})
assert out == md
def test_rewrite_idempotent_on_already_normalized():
"""이미 docimg:img_NNN 인 ref 는 slug 매칭 실패 → 변경 없음 (재변환 idempotent)."""
md = "![alt](docimg:img_001)"
out = _rewrite_image_refs(md, {"_page_0.jpeg": "img_001"})
assert out == md
def test_rewrite_multiple_images():
md = "![a](s1.png) text ![b](s2.png) ![c](s3.jpg)"
out = _rewrite_image_refs(md, {
"s1.png": "img_001",
"s2.png": "img_002",
"s3.jpg": "img_003",
})
assert "(docimg:img_001)" in out
assert "(docimg:img_002)" in out
assert "(docimg:img_003)" in out
# ─── _persist_images_to_nas ───
def _make_png_bytes() -> bytes:
"""1x1 transparent PNG (signature + IHDR + IDAT + IEND)."""
return bytes.fromhex(
"89504e470d0a1a0a" # signature
"0000000d49484452" # IHDR len + type
"00000001000000010806000000" # 1x1 RGBA
"1f15c4890000000d4944415478"
"9c626001000000ffff03000006"
"00057ce4ec5d0000000049454e44ae426082"
)
def test_persist_sequential_image_keys(tmp_path, monkeypatch):
# NAS root 를 tmp_path 로 redirect
monkeypatch.setattr(
"workers.marker_worker.EXTRACTED_IMAGES_ROOT",
tmp_path / "extracted_images",
)
payload = [
{"slug": "_page_0.png", "format": "png",
"bytes_b64": base64.b64encode(_make_png_bytes()).decode("ascii")},
{"slug": "_page_1.png", "format": "png",
"bytes_b64": base64.b64encode(_make_png_bytes()).decode("ascii")},
{"slug": "_page_2.png", "format": "png",
"bytes_b64": base64.b64encode(_make_png_bytes()).decode("ascii")},
]
saved = _persist_images_to_nas(document_id=999, images_resp=payload)
assert [s["image_key"] for s in saved] == ["img_001", "img_002", "img_003"]
assert all(s["mime_type"] == "image/png" for s in saved)
assert all(s["file_size"] > 0 for s in saved)
assert all(s["source_slug"].startswith("_page_") for s in saved)
# NAS 파일 실재 확인
for s in saved:
from pathlib import Path
assert Path(s["file_path"]).is_file()
def test_persist_idempotent_on_rerun(tmp_path, monkeypatch):
"""같은 doc_id 두번 persist → 같은 image_key 같은 path 에 overwrite."""
monkeypatch.setattr(
"workers.marker_worker.EXTRACTED_IMAGES_ROOT",
tmp_path / "extracted_images",
)
raw = _make_png_bytes()
payload = [{"slug": "_page_0.png", "format": "png",
"bytes_b64": base64.b64encode(raw).decode("ascii")}]
s1 = _persist_images_to_nas(document_id=42, images_resp=payload)
s2 = _persist_images_to_nas(document_id=42, images_resp=payload)
assert s1[0]["image_key"] == s2[0]["image_key"] == "img_001"
assert s1[0]["file_path"] == s2[0]["file_path"]
assert s1[0]["content_hash"] == s2[0]["content_hash"]
def test_persist_skips_invalid_base64(tmp_path, monkeypatch):
"""깨진 base64 는 skip — 다른 이미지 처리는 계속."""
monkeypatch.setattr(
"workers.marker_worker.EXTRACTED_IMAGES_ROOT",
tmp_path / "extracted_images",
)
raw = _make_png_bytes()
payload = [
{"slug": "_page_0.png", "format": "png", "bytes_b64": "@@@invalid@@@"},
{"slug": "_page_1.png", "format": "png",
"bytes_b64": base64.b64encode(raw).decode("ascii")},
]
saved = _persist_images_to_nas(document_id=7, images_resp=payload)
# 첫 번째 invalid skip, 두 번째만 저장. seq 는 그대로 진행 → img_002 가 됨.
assert len(saved) == 1
assert saved[0]["image_key"] == "img_002"
assert saved[0]["source_slug"] == "_page_1.png"
def test_persist_empty_images_returns_empty(tmp_path, monkeypatch):
monkeypatch.setattr(
"workers.marker_worker.EXTRACTED_IMAGES_ROOT",
tmp_path / "extracted_images",
)
assert _persist_images_to_nas(document_id=1, images_resp=[]) == []