feat(briefing): add morning briefing schema + services + api (historical off)

One-page card comparing overnight news (collected KST 00:00~05:00) across topic×country.
Code, logic, and tables are kept separate from the Phase 4 Global Digest; only the algorithms are shared via services/clustering_common.

Backend (new):
- migrations/255_morning_briefings.sql: morning_briefings + briefing_topics
  (briefing_date UNIQUE, UNIQUE(briefing_id,topic_rank), FK CASCADE,
  three nullable historical_* columns, cluster_members JSONB, country_perspectives
  JSONB, status 4-state success|partial|failed|empty)
- app/models/briefing.py: SQLAlchemy ORM
- app/services/briefing/loader.py: 5-hour KST window + news_sources prefix
  fallback (mirrors the Phase 4 pattern) + historical candidate pool loader
- app/services/briefing/clustering.py: cluster_global topic-first
  (LAMBDA=ln(2)/2h, MIN_COUNTRIES_PER_TOPIC=2, MAX_TOPICS=7)
- app/services/briefing/comparator.py: call_primary 26B + JSON envelope
  sanitize (cap perspectives 10 / divergences 3 / convergences 2 /
  quotes 5) + fixed-shape fallback row + retrieve_historical cosine top-K
- app/services/briefing/pipeline.py: load→cluster→select(K=7,λ=0.6)
  →historical→compare→status 4-state→delete+insert transaction
- app/workers/briefing_worker.py: shared entry point for APScheduler and
  manual invocation, 600s hard cap
- app/prompts/briefing_comparative.txt: Korean-language comparative-analysis
  JSON prompt, two sections ({articles_block} + {historical_block}),
  do-not-quote label
- app/api/briefing.py: GET /latest, GET ?date=, POST /regenerate?date=
  (admin, sync delete+insert tx, regenerated:true)
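
The window and decay constants listed above can be sketched as follows. This is a minimal illustration, not the code in loader.py or clustering.py: the function names `briefing_window` and `decay_weight` are hypothetical, and measuring article age relative to the end of the window is an assumption the commit message does not state.

```python
import math
from datetime import date, datetime, time, timedelta, timezone

KST = timezone(timedelta(hours=9))  # Korea Standard Time, fixed UTC+9 (no DST)
HALF_LIFE_HOURS = 2.0
DECAY_LAMBDA = math.log(2) / HALF_LIFE_HOURS  # per hour, i.e. ln(2)/2h


def briefing_window(briefing_date: date) -> tuple[datetime, datetime]:
    """Return (window_start, window_end) in UTC for KST 00:00-05:00 on briefing_date."""
    start_kst = datetime.combine(briefing_date, time(0, 0), tzinfo=KST)
    end_kst = start_kst + timedelta(hours=5)
    return start_kst.astimezone(timezone.utc), end_kst.astimezone(timezone.utc)


def decay_weight(published_at: datetime, window_end: datetime) -> float:
    """Exponential time-decay weight; age is measured from window_end (assumption)."""
    age_hours = max((window_end - published_at).total_seconds() / 3600.0, 0.0)
    return math.exp(-DECAY_LAMBDA * age_hours)
```

With a 2-hour half-life, an article published two hours before the end of the window carries half the weight of one published at the very end, so a 5-hour window spans roughly 2.5 half-lives.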

Backend (modified):
- app/main.py: register briefing_router (/api/briefing prefix). Scheduler
  registration is deferred to PR-3.
- app/services/digest/selection.py: parameterize select_for_llm (K and λ
  injected by the caller). Phase 4 behavior is preserved through the default values.

Historical policy:
- BRIEFING_HISTORICAL_ENABLED env flag, default off.
- flag off → all historical_* columns NULL, the prompt's {historical_block}
  gets an empty label, and retrieval is not called.
- flag on (enabled in PR-1b) → cosine top-K (K=5, sim≥0.70) between the cluster
  centroid and doc embeddings from the past 30 days, injected into the prompt.
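
The flag-gated retrieval described in this policy could look roughly like the sketch below. It is an illustration under assumptions: `retrieve_historical_sketch` is a hypothetical name (the real `retrieve_historical` signature is not shown in this commit message), embeddings are plain Python lists, and the flag is passed in as a boolean rather than read from the environment.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve_historical_sketch(centroid, candidates, enabled, top_k=5, min_sim=0.70):
    """candidates: list of (doc_id, embedding) from the past-30-day pool.

    Returns None when the flag is off (retrieval is skipped entirely),
    otherwise up to top_k doc_ids with similarity >= min_sim, best first.
    """
    if not enabled:
        return None
    scored = [(cosine(centroid, emb), doc_id) for doc_id, emb in candidates]
    kept = [pair for pair in scored if pair[0] >= min_sim]
    kept.sort(key=lambda pair: pair[0], reverse=True)
    return [doc_id for _, doc_id in kept[:top_k]]
```

Returning None for "flag off" and an empty list for "flag on, zero matches" keeps the two cases distinguishable, which lines up with the separate regression paths listed under Tests.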

Country canonical (after checking real data):
- documents.country column confirmed absent
- document_chunks.country match rate is 0% (chunks are not created for news at all)
- the only country signal is the news_sources prefix mapping (same as Phase 4)

Tests:
- tests/test_briefing_historical.py: regression over 3 paths (flag off / on
  with fixture / on with zero matches) + sanitize cap + fallback row shape.
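
The sanitize cap covered by this test can be sketched as below. This is a guess at the shape, not the code in comparator.py: `sanitize_envelope` is a hypothetical name, and the envelope keys are assumed to match the JSONB column names in the migration.

```python
# Caps from the commit message: perspectives 10 / divergences 3 / convergences 2 / quotes 5.
# Key names are assumed to mirror the briefing_topics JSONB columns.
CAPS = {
    "country_perspectives": 10,
    "divergences": 3,
    "convergences": 2,
    "key_quotes": 5,
}


def sanitize_envelope(payload):
    """Keep only the known list fields and truncate each to its cap.

    Missing keys, non-list values, and a non-dict payload all yield empty lists,
    so the result always has the same four keys.
    """
    clean = {}
    for key, cap in CAPS.items():
        value = payload.get(key) if isinstance(payload, dict) else None
        clean[key] = list(value[:cap]) if isinstance(value, list) else []
    return clean
```

Because the output always carries the same four keys, a malformed LLM response degrades to empty lists instead of raising.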

Verification: GPU-container pytest + manual regenerate in PR-1.8.
Hyungi Ahn
2026-05-12 12:58:50 +09:00
parent 1ca6d8b522
commit 431d4fe010
13 changed files with 1466 additions and 7 deletions
@@ -0,0 +1,67 @@
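-- Overnight news briefing (Morning Briefing)
-- One-page card comparing news collected daily between KST midnight and 05:00 across topic×country.
-- Trigger: APScheduler cron at 05:10 KST (registered in PR-3). Axis is the opposite of Phase 4 (topic-first).
-- Code, logic, and tables are all separate from Phase 4. Only the common algorithms are shared via services/clustering_common.
-- Parent: one row per day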
CREATE TABLE morning_briefings (
id BIGSERIAL PRIMARY KEY,
briefing_date DATE NOT NULL, -- KST-based (date on which the window starts at midnight)
window_start TIMESTAMPTZ NOT NULL, -- KST midnight converted to UTC
window_end TIMESTAMPTZ NOT NULL, -- 05:00 KST converted to UTC
decay_lambda DOUBLE PRECISION NOT NULL, -- time-decay λ actually used (briefing = ln(2)/2h)
total_articles INTEGER NOT NULL DEFAULT 0,
total_countries INTEGER NOT NULL DEFAULT 0,
total_topics INTEGER NOT NULL DEFAULT 0,
generation_ms INTEGER,
llm_calls INTEGER NOT NULL DEFAULT 0,
llm_failures INTEGER NOT NULL DEFAULT 0,
status VARCHAR(20) NOT NULL DEFAULT 'success', -- success | partial | failed | empty
headline_oneliner TEXT, -- planned as a separate later step (NULL for now)
created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
UNIQUE (briefing_date) -- idempotency: DELETE+INSERT on regenerate
);
CREATE INDEX idx_morning_briefings_date ON morning_briefings (briefing_date DESC);
-- Child: ordered by topic_rank within one briefing (importance_score descending)
CREATE TABLE briefing_topics (
id BIGSERIAL PRIMARY KEY,
briefing_id BIGINT NOT NULL REFERENCES morning_briefings(id) ON DELETE CASCADE,
topic_rank INTEGER NOT NULL, -- 1..N
topic_label VARCHAR(120) NOT NULL, -- Korean label of 5~10 words, e.g. "이란-이스라엘 충돌" (Iran-Israel conflict)
headline TEXT NOT NULL, -- one-line summary
-- LLM comparative-analysis result (JSONB caps: perspectives ≤10, divergences ≤3, convergences ≤2, quotes ≤5)
country_perspectives JSONB NOT NULL DEFAULT '[]'::jsonb, -- [{country, summary, article_ids}, ...]
divergences JSONB NOT NULL DEFAULT '[]'::jsonb, -- ["country A: X / country B: Y", ...]
convergences JSONB NOT NULL DEFAULT '[]'::jsonb, -- ["all agree on Z", ...]
key_quotes JSONB NOT NULL DEFAULT '[]'::jsonb, -- [{country, source, quote}, ...]
-- Historical context (BRIEFING_HISTORICAL_ENABLED=false, off by default; turned on in PR-1b)
historical_article_ids JSONB, -- [doc_id, ...] top-K past references (not shown on the page)
historical_context TEXT, -- LLM-generated "past trend" in 1~2 sentences
historical_window_days INTEGER, -- N actually used (default 30)
-- Cluster metadata (independent of LLM success/failure; used for UI fallback links)
cluster_members JSONB NOT NULL DEFAULT '[]'::jsonb, -- [doc_id, ...] all articles in the cluster
article_count INTEGER NOT NULL,
country_count INTEGER NOT NULL, -- number of distinct countries in the cluster (passed the MIN_COUNTRIES_PER_TOPIC=2 filter)
importance_score DOUBLE PRECISION NOT NULL, -- normalized to 0~1 within the briefing
raw_weight_sum DOUBLE PRECISION NOT NULL,
llm_model VARCHAR(100),
llm_fallback_used BOOLEAN NOT NULL DEFAULT FALSE,
created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
UNIQUE (briefing_id, topic_rank)
);
CREATE INDEX idx_briefing_topics_briefing_rank ON briefing_topics (briefing_id, topic_rank);
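
The idempotency noted on UNIQUE (briefing_date) relies on a DELETE+INSERT inside one transaction, with child rows removed by ON DELETE CASCADE. The sketch below illustrates that pattern only; it uses an in-memory SQLite database as a stand-in for PostgreSQL, a trimmed-down version of the two tables, and a hypothetical `regenerate` helper that is not the project's API handler.

```python
import sqlite3


def regenerate(conn, briefing_date, topic_labels):
    """Atomically replace the briefing for briefing_date.

    Deleting the parent row cascades to briefing_topics, so the subsequent
    INSERTs never collide with UNIQUE (briefing_date) or
    UNIQUE (briefing_id, topic_rank).
    """
    with conn:  # commits on success, rolls back if any statement raises
        conn.execute(
            "DELETE FROM morning_briefings WHERE briefing_date = ?", (briefing_date,)
        )
        cur = conn.execute(
            "INSERT INTO morning_briefings (briefing_date, total_topics) VALUES (?, ?)",
            (briefing_date, len(topic_labels)),
        )
        briefing_id = cur.lastrowid
        conn.executemany(
            "INSERT INTO briefing_topics (briefing_id, topic_rank, topic_label) "
            "VALUES (?, ?, ?)",
            [(briefing_id, rank, label) for rank, label in enumerate(topic_labels, start=1)],
        )
    return briefing_id


conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite needs this for ON DELETE CASCADE
conn.execute(
    "CREATE TABLE morning_briefings ("
    "id INTEGER PRIMARY KEY, briefing_date TEXT NOT NULL UNIQUE, "
    "total_topics INTEGER NOT NULL DEFAULT 0)"
)
conn.execute(
    "CREATE TABLE briefing_topics ("
    "id INTEGER PRIMARY KEY, "
    "briefing_id INTEGER NOT NULL REFERENCES morning_briefings(id) ON DELETE CASCADE, "
    "topic_rank INTEGER NOT NULL, topic_label TEXT NOT NULL, "
    "UNIQUE (briefing_id, topic_rank))"
)

regenerate(conn, "2026-05-12", ["topic A", "topic B", "topic C"])
regenerate(conn, "2026-05-12", ["topic X"])  # second run replaces the first
```

Running `regenerate` twice for the same date leaves exactly one parent row and only the topics from the latest run.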