hyungi_document_server

Author	SHA1	Message	Date
hyungi	dc9cbcc669	fix(pipeline): 조용한 실패 3건 — 빈 추출/요약 success 박제 + misfire 침묵 스킵 차단 H1 marker_worker: PDF arm + split arm 에 빈 md_content 가드(office arm 동형 raise → queue 재시도 후 failed). 빈 추출(스캔/이미지 PDF)을 md_status=success+빈 md 로 박제하던 불변식 위반 제거. H2 summarize_worker: 빈/think-only 요약을 ai_summary= 로 박제(completed 마크)하던 것 raise 로 가시화 + briefing/digest loader 에 length(ai_summary)>0 방어(기존 누출 행도 배제). H4 main.py: AsyncIOScheduler job_defaults misfire_grace_time 1s→45s — 단일 루프 1초 혼잡에 1분 컨슈머 틱이 run time missed 로 침묵 스킵하던 것 차단(coalesce 유지). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-20 05:03:03 +00:00
hyungi	f325bd0509	feat(observability): digest/briefing 을 처리 보드에 맥미니 작업으로 노출 (background_jobs) 큐 밖 cron 생성 작업(global_digest/morning_briefing)이 processing_queue stage 가 아니라 보드에 안 잡혀, 맥미니가 11분짜리 digest 를 돌려도 idle 처럼 보였다. `ebbcaf8` 의 background_jobs 메커니즘 재사용: - digest_worker/briefing_worker = start_job→finish_job (best-effort, 본작업 무해) - pipeline = cluster 완료마다 heartbeat(processed/total) → 진행바 - queue_overview = kind→machine 맵으로 payload 에 machine 필드 (맥미니 귀속) - 보드 = 머신 레인에 dot 점등 + "생성 중: <label> N/T" 표시 Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-15 03:36:57 +00:00
hyungi	a82b0724df	fix(news): digest/briefing 생성 LLM 타임아웃 게이트 단일소스화 + deep_summary 컨슈머 분리 2026-06-11 맥미니 모델 교체(Gemma4 26B→Qwen3.6-27B-6bit, 콜당 ~90~300s)의 타임아웃 상향 sweep 이 config.yaml/synthesis 만 갱신하고 digest/briefing 코드의 하드코딩 LLM_CALL_TIMEOUT=25(빠른 Gemma 기준)를 누락 → digest 600s 하드캡 초과로 06-10 이후 미생성, briefing 4/4 LLM 폴백(status=failed). (적대 리뷰로 블로커 정정: concurrency=1 사설 세마포로는 digest 44~68 클러스터가 하드캡에 여전히 걸림 + llm_gate 영구 룰 위반.) - 타임아웃·재시도·하드캡을 config.pipeline 단일소스로 이관(digest_llm_timeout_s=300, attempts=2, pipeline_hard_cap_s=3000). 다음 모델 교체 때 재발 차단. - digest/briefing LLM 호출을 사설 Semaphore 제거하고 전역 MLX gate(BACKGROUND) 경유로 변경 — llm_gate 영구 룰(같은 endpoint 단일 게이트, 새 Semaphore 금지) 준수 + ask/eid(FOREGROUND)와 조율. 동시성 lever = 기존 mlx_gate_concurrency 2→4 (continuous batching 실측 — 3동시콜 wall 121s ≈ 단일콜, 직렬 대비 ~3배). - digest/briefing pipeline cluster 루프를 asyncio.gather 동시 실행으로 전환 (실동시성은 게이트가 제한, rank/순서 보존). - deep_summary(70~300s)를 메인 consume_queue 에서 분리해 consume_deep_queue 신설 (markdown/fast split 선례) — 단일 deep 호출이 1분 틱 초과로 메인 큐를 영구 coalesce 시키던 문제 제거. - 죽은 PIPELINE_HARD_CAP=600(briefing/pipeline.py) 제거, summarizer docstring 갱신, deep 컨슈머 disjoint/hold 테스트 추가. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-14 23:29:56 +00:00
hyungi	a6db6c999b	fix(safety): B-4 리뷰 반영 — 단일 술어 중앙화 + study/briefing 경로 커버 적대 리뷰(10에이전트) 확정 반영: - license_filter.py 신설 — restricted_exclude_sql(raw)/restricted_exclude_orm(ORM) 단일 정의. retrieval _license_sql·digest·briefing·study 풀이가 공유(드리프트 방지). - major: explanation_rag(study 문제 AI 풀이 RAG)에 술어 누락 → doc_meta 쿼리에 ORM 적용(valid_doc_ids 경유로 청크도 차단). briefing/loader 2쿼리에 누락 → digest 와 동일 술어 추가(news restricted 부재=방어적·경로 일관성). - blocker(low-impact): file_watcher changed-doc 경로 material/license 보정(merge 주입· license 부재 시만 — extract_meta clobber 회피, pre-B-4 적재분 동기화). - 테스트: 단일-source 검증 + ORM 구성 스모크 2건 추가. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-13 14:52:04 +09:00
hyungi	1d3d61d31e	fix(briefing): lower clustering threshold 0.78 → 0.70 배포 후 관측 결과 (2026-05-13 새벽): - 126 docs / 7 countries 인데 THRESHOLD=0.78 로 raw_clusters=124, dropped_min_articles=122, kept=1. - 거의 매 article 이 별 cluster 로 갈려 토픽 묶음 실패. - 같은 cron 어제 (5/12) 는 101 docs 에서 6 topics 성공 — 그날 뉴스가 우연히 같은 토픽으로 더 모인 case. 수동 측정 (5/13 동일 docs): - 0.78 → kept=1 - 0.70 → kept=5 (allowed) 영구 변경 = THRESHOLD=0.70. cross-country 필터 (MIN_COUNTRIES≥2) + min_articles(≥2) 그대로 유지하므로 noise topic 위험은 제한적. 원본 주석 (0.75~0.80 중간값) 도 갱신.	2026-05-12 21:44:00 +00:00
Hyungi Ahn	6966be9cf6	fix(briefing): backfill country_perspectives[].article_ids from cluster members LLM 이 article_ids 를 자율적으로 비워두는 케이스 (2026-05-12 첫 briefing 6 topics 모두 빈 list) 를 서버에서 보정. 후처리 정책 (_resolve_article_ids): 1. LLM 이 준 id ∩ cluster member id (엉뚱한 id 차단, hallucination 방어) 2. 비어있으면 같은 country cluster member top weight N 개 자동 주입 3. cluster 안 country 매칭 멤버 0 → [] per-country cap = MAX_ARTICLE_IDS_PER_COUNTRY = 5. weight 내림차순. API 계약 강화: country_perspectives 가 있는 topic 은 article_ids ≥ 1 보장 (같은 country cluster member 존재 시). frontend / 외부 채널 / archive UI 모두 신뢰 가능. tests 3 케이스 추가.	2026-05-12 13:15:26 +09:00
Hyungi Ahn	431d4fe010	feat(briefing): add morning briefing schema + services + api (historical off) 야간 수집 뉴스 (KST 00:00~05:00) topic×country 비교 분석 1페이지 카드. Phase 4 Global Digest 와 코드/로직/테이블 분리, 알고리즘만 services/clustering_common 공유. Backend 신규: - migrations/255_morning_briefings.sql: morning_briefings + briefing_topics (briefing_date UNIQUE, UNIQUE(briefing_id,topic_rank), FK CASCADE, historical_* 3컬럼 nullable, cluster_members JSONB, country_perspectives JSONB, status 4-state success\|partial\|failed\|empty) - app/models/briefing.py: SQLAlchemy ORM - app/services/briefing/loader.py: KST 5h 윈도우 + news_sources prefix fallback (Phase 4 패턴 미러) + historical candidate pool 로더 - app/services/briefing/clustering.py: cluster_global topic-first (LAMBDA=ln(2)/2h, MIN_COUNTRIES_PER_TOPIC=2, MAX_TOPICS=7) - app/services/briefing/comparator.py: call_primary 26B + JSON envelope sanitize (cap perspectives 10 / divergences 3 / convergences 2 / quotes 5) + fallback row 고정 형태 + retrieve_historical cosine top-K - app/services/briefing/pipeline.py: load→cluster→select(K=7,λ=0.6) →historical→compare→status 4-state→delete+insert transaction - app/workers/briefing_worker.py: APScheduler/수동 호출 공용 진입점, 600s hard cap - app/prompts/briefing_comparative.txt: 한국어 비교 분석 JSON 프롬프트, {articles_block} + {historical_block} 2섹션, 인용 금지 라벨 - app/api/briefing.py: GET /latest, GET ?date=, POST /regenerate?date= (admin, sync delete+insert tx, regenerated:true) Backend 수정: - app/main.py: briefing_router 등록 (/api/briefing prefix). scheduler 등록은 PR-3 에서. - app/services/digest/selection.py: select_for_llm 매개변수화 (K, λ caller 주입). Phase 4 동작은 default 값으로 보존. Historical 정책: - BRIEFING_HISTORICAL_ENABLED env flag, default off. - flag off → historical_* 컬럼 모두 NULL, prompt {historical_block} 빈 라벨, retrieval 호출 안 함. - flag on (PR-1b 에서 enable) → cluster centroid 와 과거 30일 doc embedding cosine top-K 5 (sim≥0.70), prompt 에 주입. Country canonical (실측 확인 후): - documents.country 컬럼 부재 확정 - document_chunks.country 매칭률 0% (chunks 자체가 뉴스에 안 만들어짐) - 유일 country 신호 = news_sources prefix 매핑 (Phase 4 와 동일) Tests: - tests/test_briefing_historical.py: 3 경로 회귀 (flag off/on with fixture/on zero match) + sanitize cap + fallback row 형태. Verification: PR-1.8 에서 GPU 컨테이너 pytest + 수동 regenerate.	2026-05-12 12:58:50 +09:00

7 Commits