431d4fe010
One-page card with topic×country comparative analysis of news collected overnight (KST 00:00~05:00).
Code, logic, and tables are kept separate from the Phase 4 Global Digest; only the algorithms are shared via services/clustering_common.
Backend (new):
- migrations/255_morning_briefings.sql: morning_briefings + briefing_topics
  (briefing_date UNIQUE, UNIQUE(briefing_id,topic_rank), FK CASCADE,
  three nullable historical_* columns, cluster_members JSONB,
  country_perspectives JSONB, 4-state status success|partial|failed|empty)
- app/models/briefing.py: SQLAlchemy ORM
- app/services/briefing/loader.py: 5h KST window + news_sources prefix
  fallback (mirrors the Phase 4 pattern) + historical candidate pool loader
- app/services/briefing/clustering.py: cluster_global topic-first
  (LAMBDA=ln(2)/2h, MIN_COUNTRIES_PER_TOPIC=2, MAX_TOPICS=7)
- app/services/briefing/comparator.py: call_primary 26B + JSON envelope
  sanitize (cap perspectives 10 / divergences 3 / convergences 2 /
  quotes 5) + fixed-shape fallback row + retrieve_historical cosine top-K
- app/services/briefing/pipeline.py: load→cluster→select(K=7,λ=0.6)
  →historical→compare→4-state status→delete+insert transaction
- app/workers/briefing_worker.py: shared entry point for APScheduler and
  manual invocation, 600s hard cap
- app/prompts/briefing_comparative.txt: Korean-language comparative
  analysis JSON prompt, two sections {articles_block} + {historical_block},
  do-not-quote label
- app/api/briefing.py: GET /latest, GET ?date=, POST /regenerate?date=
  (admin, sync delete+insert tx, regenerated:true)
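The loader window and the clustering decay constant listed above can be sketched as follows. This is a minimal illustration, not the code in loader.py or clustering.py: the helper names `kst_window` and `decay_weight` are hypothetical, and measuring article age against a caller-supplied reference time is an assumption. The only values taken from this message are the KST 00:00~05:00 window and LAMBDA=ln(2)/2h, which corresponds to a two-hour half-life.

```python
import math
from datetime import date, datetime, timedelta, timezone

KST = timezone(timedelta(hours=9))     # Korea Standard Time, fixed UTC+9
LAMBDA_PER_HOUR = math.log(2) / 2.0    # ln(2)/2h: the weight halves every 2 hours


def kst_window(briefing_date: date) -> tuple[datetime, datetime]:
    """Return the 00:00 and 05:00 KST bounds for a date (hypothetical helper)."""
    start = datetime(
        briefing_date.year, briefing_date.month, briefing_date.day, tzinfo=KST
    )
    return start, start + timedelta(hours=5)


def decay_weight(published_at: datetime, reference: datetime) -> float:
    """Exponential time decay exp(-lambda * age_hours); age is clamped at zero."""
    age_hours = max((reference - published_at).total_seconds() / 3600.0, 0.0)
    return math.exp(-LAMBDA_PER_HOUR * age_hours)
```

With this constant, an article published two hours before the reference time gets weight 0.5 and one published four hours before gets 0.25.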
Backend (modified):
- app/main.py: register briefing_router (/api/briefing prefix). Scheduler
  registration is deferred to PR-3.
- app/services/digest/selection.py: parameterize select_for_llm (K and λ
  injected by the caller). Phase 4 behavior is preserved through default values.
Historical policy:
- BRIEFING_HISTORICAL_ENABLED env flag, default off.
- flag off → all historical_* columns NULL, prompt {historical_block} gets
  an empty label, retrieval is not called.
- flag on (enabled in PR-1b) → cosine top-K 5 (sim≥0.70) between the cluster
  centroid and doc embeddings from the past 30 days, injected into the prompt.
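The flag-on path of the historical policy can be pictured as a thresholded cosine top-K. The sketch below is an illustration under assumptions, not the actual retrieve_historical in comparator.py: the function name `retrieve_historical_sketch`, its signature, the in-memory candidate mapping, and the plain-list vectors are all hypothetical. Only K=5 and sim≥0.70 come from the policy. Returning None when the flag is off mirrors the NULL historical_* columns; returning an empty list for a zero-match run is an assumption.

```python
import math
from typing import Mapping, Optional, Sequence

TOP_K = 5        # from the policy: cosine top-K 5
MIN_SIM = 0.70   # from the policy: sim >= 0.70


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve_historical_sketch(
    centroid: Sequence[float],
    candidates: Mapping[int, Sequence[float]],
    enabled: bool,
    top_k: int = TOP_K,
    min_sim: float = MIN_SIM,
) -> Optional[list[tuple[int, float]]]:
    """Return (doc_id, sim) pairs, best first, or None when the flag is off."""
    if not enabled:
        return None  # flag off: retrieval is skipped entirely
    scored = [(doc_id, cosine(centroid, emb)) for doc_id, emb in candidates.items()]
    kept = [(doc_id, sim) for doc_id, sim in scored if sim >= min_sim]
    kept.sort(key=lambda pair: pair[1], reverse=True)
    return kept[:top_k]
```

The candidate mapping here stands in for the 30-day historical candidate pool that the loader is described as providing.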
Country canonical (after checking real data):
- documents.country column confirmed absent
- document_chunks.country match rate 0% (chunks are never created for news)
- the only country signal = news_sources prefix mapping (same as Phase 4)
Tests:
- tests/test_briefing_historical.py: regression over 3 paths (flag off / on
  with fixture / on with zero match) + sanitize cap + fallback row shape.
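The sanitize-cap case in the test list above could be exercised against something like the following. It is a rough sketch, not the comparator's real sanitizer: the function name `sanitize_envelope` is hypothetical, and the key names are borrowed from the briefing_topics columns, which is an assumption about the keys the LLM envelope actually uses. The cap values (10 / 3 / 2 / 5) are the ones stated in this message.

```python
from typing import Any

# Caps from the commit message. Key names follow the ORM columns and are an
# assumption about the envelope's real keys.
CAPS = {
    "country_perspectives": 10,
    "divergences": 3,
    "convergences": 2,
    "key_quotes": 5,
}


def sanitize_envelope(raw: Any) -> dict[str, list]:
    """Coerce each capped field to a list and truncate it to its cap."""
    data = raw if isinstance(raw, dict) else {}
    clean: dict[str, list] = {}
    for key, cap in CAPS.items():
        value = data.get(key)
        items = value if isinstance(value, list) else []
        clean[key] = items[:cap]
    return clean
```

Missing or wrongly typed fields collapse to empty lists here, which matches the non-nullable JSONB columns with `default=list` in the ORM.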
Verification: GPU-container pytest + manual regenerate, to be done in PR-1.8.
98 lines
3.8 KiB
Python
"""ORM for the morning_briefings + briefing_topics tables (overnight news briefing).

Axes are inverted: Phase 4 = country×topic / Briefing = topic×country.
The country_perspectives JSONB holds an array of several countries' perspectives on one topic.
"""

from datetime import date, datetime

from sqlalchemy import (
    BigInteger,
    Boolean,
    Date,
    DateTime,
    Float,
    ForeignKey,
    Integer,
    String,
    Text,
)
from sqlalchemy.dialects.postgresql import JSONB
from sqlalchemy.orm import Mapped, mapped_column, relationship

from core.database import Base


class MorningBriefing(Base):
    """Per-day briefing metadata (KST midnight~05:00 window)"""

    __tablename__ = "morning_briefings"

    id: Mapped[int] = mapped_column(BigInteger, primary_key=True)
    briefing_date: Mapped[date] = mapped_column(Date, nullable=False, unique=True)
    window_start: Mapped[datetime] = mapped_column(DateTime(timezone=True), nullable=False)
    window_end: Mapped[datetime] = mapped_column(DateTime(timezone=True), nullable=False)
    decay_lambda: Mapped[float] = mapped_column(Float, nullable=False)

    total_articles: Mapped[int] = mapped_column(Integer, nullable=False, default=0)
    total_countries: Mapped[int] = mapped_column(Integer, nullable=False, default=0)
    total_topics: Mapped[int] = mapped_column(Integer, nullable=False, default=0)

    generation_ms: Mapped[int | None] = mapped_column(Integer)
    llm_calls: Mapped[int] = mapped_column(Integer, nullable=False, default=0)
    llm_failures: Mapped[int] = mapped_column(Integer, nullable=False, default=0)
    status: Mapped[str] = mapped_column(String(20), nullable=False, default="success")

    headline_oneliner: Mapped[str | None] = mapped_column(Text)

    created_at: Mapped[datetime] = mapped_column(
        DateTime(timezone=True), nullable=False, default=datetime.now
    )

    topics: Mapped[list["BriefingTopic"]] = relationship(
        back_populates="briefing",
        cascade="all, delete-orphan",
        order_by="BriefingTopic.topic_rank",
    )


class BriefingTopic(Base):
    """Cross-country comparative analysis result for one topic, ordered by topic_rank within a briefing"""

    __tablename__ = "briefing_topics"

    id: Mapped[int] = mapped_column(BigInteger, primary_key=True)
    briefing_id: Mapped[int] = mapped_column(
        BigInteger,
        ForeignKey("morning_briefings.id", ondelete="CASCADE"),
        nullable=False,
    )

    topic_rank: Mapped[int] = mapped_column(Integer, nullable=False)
    topic_label: Mapped[str] = mapped_column(String(120), nullable=False)
    headline: Mapped[str] = mapped_column(Text, nullable=False)

    country_perspectives: Mapped[list] = mapped_column(JSONB, nullable=False, default=list)
    divergences: Mapped[list] = mapped_column(JSONB, nullable=False, default=list)
    convergences: Mapped[list] = mapped_column(JSONB, nullable=False, default=list)
    key_quotes: Mapped[list] = mapped_column(JSONB, nullable=False, default=list)

    historical_article_ids: Mapped[list | None] = mapped_column(JSONB)
    historical_context: Mapped[str | None] = mapped_column(Text)
    historical_window_days: Mapped[int | None] = mapped_column(Integer)

    cluster_members: Mapped[list] = mapped_column(JSONB, nullable=False, default=list)
    article_count: Mapped[int] = mapped_column(Integer, nullable=False)
    country_count: Mapped[int] = mapped_column(Integer, nullable=False)
    importance_score: Mapped[float] = mapped_column(Float, nullable=False)
    raw_weight_sum: Mapped[float] = mapped_column(Float, nullable=False)

    llm_model: Mapped[str | None] = mapped_column(String(100))
    llm_fallback_used: Mapped[bool] = mapped_column(Boolean, nullable=False, default=False)

    created_at: Mapped[datetime] = mapped_column(
        DateTime(timezone=True), nullable=False, default=datetime.now
    )

    briefing: Mapped["MorningBriefing"] = relationship(back_populates="topics")