feat(category): split out a law category (enum + backfill + classify skip)

- migrations/152: ALTER TYPE doc_category ADD VALUE 'law' (DDL only; the backfill runs separately because in PG16 a value added by ALTER TYPE ... ADD VALUE cannot be used until the transaction that added it commits)
- models/document.py: add 'law' to the Enum (7 active + 3 reserved)
- workers/law_monitor.py: Document(..., category='law'), set for newly ingested documents from now on
- workers/classify_worker.py: early-return for source_channel='law_monitor' that fills only minimal fields (ai_domain='법령', ai_tags=['법령'], importance='medium'; '법령' means "laws and regulations"). AI classification is skipped because statutes have a fixed structure, an external source of truth, and are re-collected automatically
- scripts/backfill_category.py: law branch + WHERE re-target (source_channel='law_monitor' AND category='document') + VERIFY cat_law/law_source_count + fail condition
- api/documents.py: add law_monitor to the default list exclusion (same pattern as news)
- api/dashboard.py: exclude law_monitor from the documents count FILTER (category_counts.law is exposed automatically by the existing GROUP BY category)
- frontend/Sidebar.svelte: '법령 알림' ("law alerts") button changed from ?source=law_monitor to ?category=law (the explicit category path skips the default exclusion)
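The classify_worker change amounts to an early return keyed on source_channel. A minimal sketch, assuming a plain dataclass in place of the real Document model; `Doc` and `classify_law_shortcut` are hypothetical names, since the worker code itself is not part of the diff below:

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Doc:
    # Hypothetical stand-in for the real Document model: only the fields
    # named in the commit message are modelled.
    source_channel: str
    category: Optional[str] = None
    ai_domain: Optional[str] = None
    ai_tags: List[str] = field(default_factory=list)
    importance: Optional[str] = None


def classify_law_shortcut(doc: Doc) -> bool:
    """Return True if the document was handled without AI classification."""
    if doc.source_channel != "law_monitor":
        return False  # not a statute: the normal AI classify path runs
    # Statutes have a fixed structure, an external source of truth and are
    # re-collected automatically, so only minimal fields are filled in.
    doc.ai_domain = "법령"
    doc.ai_tags = ["법령"]
    doc.importance = "medium"
    return True
```

In a worker loop, a `True` result would mean "skip the AI call for this document"; every other source_channel falls through unchanged.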

plan: ~/.claude/plans/stateless-churning-raccoon.md
axis principle: category = UI axis; policy/telemetry = source_channel + ai_domain axis (feedback_category_vs_ai_domain_axis.md)
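The axis split explains the Sidebar change: the default list hides rows by source_channel (policy axis), while an explicit category filter (UI axis) bypasses that exclusion. A minimal sketch of that rule as a row predicate; `DEFAULT_EXCLUDED_SOURCES`, `visible_in_list` and the `"upload"` channel value are hypothetical, and the real api/documents.py query is not shown here:

```python
from typing import Optional

# Source channels hidden from the default document list (news already was,
# law_monitor is added by this commit). The constant name is hypothetical.
DEFAULT_EXCLUDED_SOURCES = frozenset({"news", "law_monitor"})


def visible_in_list(row: dict, category: Optional[str] = None) -> bool:
    """Decide whether a row appears in the document list (sketch)."""
    if category is not None:
        # Explicit category filter (UI axis): the default exclusion is
        # skipped, so ?category=law still shows law_monitor rows.
        return row.get("category") == category
    # Default list: visibility is decided on the source_channel axis.
    return row.get("source_channel") not in DEFAULT_EXCLUDED_SOURCES
```

Under this rule a ?source=law_monitor request that still passes through the default exclusion would come back empty, which is why the button now sends ?category=law.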

Deployment order: push → GPU pull → compose up --build fastapi frontend → backfill --dry-run → --apply.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Hyungi Ahn
2026-04-24 09:14:56 +09:00
parent 91d0fcd023
commit e88640d3d8
8 changed files with 58 additions and 13 deletions
+29 -8
@@ -1,13 +1,21 @@
 """§1 backfill: populate documents.category for every row.
 plan: luminous-sprouting-hamster.md §1
+extension: stateless-churning-raccoon.md (law category branch + idempotent re-run)
-mapping rules (target: every row where category IS NULL):
+mapping rules:
+source_channel='law_monitor' → category='law'
 source_channel='news' → category='news'
 source_channel='memo' → category='memo'
 file_type='note' → category='memo'
 user_tags contains an '@library/' tag → category='library'
 otherwise → category='document'
+target WHERE:
+category IS NULL
+OR (source_channel='law_monitor' AND category='document')
+Re-classifies the 206 rows that the earlier backfill assigned to 'document' before the law enum value existed.
 No automatic transition to library: only rows that already carry an @library/ tag move to 'library'.
 audio/video are created from §3 onward (nothing to backfill).
@@ -36,6 +44,7 @@ WITH classified AS (
 SELECT
 id,
 CASE
+WHEN source_channel = 'law_monitor' THEN 'law'
 WHEN source_channel = 'news' THEN 'news'
 WHEN source_channel = 'memo' THEN 'memo'
 WHEN file_type = 'note' THEN 'memo'
@@ -49,6 +58,7 @@ WITH classified AS (
 END AS target_category
 FROM documents
 WHERE category IS NULL
+OR (source_channel = 'law_monitor' AND category = 'document')
 )
 SELECT target_category, COUNT(*) AS n FROM classified GROUP BY 1 ORDER BY 2 DESC;
 """
@@ -56,6 +66,7 @@ SELECT target_category, COUNT(*) AS n FROM classified GROUP BY 1 ORDER BY 2 DESC
 APPLY_SQL = """
 UPDATE documents
 SET category = CASE
+WHEN source_channel = 'law_monitor' THEN 'law'::doc_category
 WHEN source_channel = 'news' THEN 'news'::doc_category
 WHEN source_channel = 'memo' THEN 'memo'::doc_category
 WHEN file_type = 'note' THEN 'memo'::doc_category
@@ -67,7 +78,8 @@ UPDATE documents
 ) THEN 'library'::doc_category
 ELSE 'document'::doc_category
 END
-WHERE category IS NULL;
+WHERE category IS NULL
+OR (source_channel = 'law_monitor' AND category = 'document');
 """
 VERIFY_SQL = """
@@ -82,7 +94,9 @@ SELECT
 COALESCE(user_tags, '[]'::jsonb)
 ) AS t
 WHERE t LIKE '@library/%'
-)) AS has_library_tag;
+)) AS has_library_tag,
+(SELECT COUNT(*) FROM documents WHERE category = 'law') AS cat_law,
+(SELECT COUNT(*) FROM documents WHERE source_channel = 'law_monitor') AS law_source_count;
 """
 DIST_SQL = """
@@ -138,11 +152,13 @@ async def run(apply: bool) -> int:
 # 4. verify
 print("\n=== post-backfill verification ===")
 row = (await session.execute(text(VERIFY_SQL))).one()
-print(f"  all_rows        = {row.all_rows}")
-print(f"  categorized     = {row.categorized}")
-print(f"  uncategorized   = {row.uncategorized}")
-print(f"  cat_library     = {row.cat_library}")
-print(f"  has_library_tag = {row.has_library_tag}")
+print(f"  all_rows         = {row.all_rows}")
+print(f"  categorized      = {row.categorized}")
+print(f"  uncategorized    = {row.uncategorized}")
+print(f"  cat_library      = {row.cat_library}")
+print(f"  has_library_tag  = {row.has_library_tag}")
+print(f"  cat_law          = {row.cat_law}")
+print(f"  law_source_count = {row.law_source_count}")
 fail = []
 if row.uncategorized != 0:
@@ -154,6 +170,11 @@ async def run(apply: bool) -> int:
 f"cat_library={row.cat_library} != has_library_tag={row.has_library_tag} "
 "(violates the no-automatic-transition policy)"
 )
+if row.cat_law != row.law_source_count:
+fail.append(
+f"cat_law={row.cat_law} != law_source_count={row.law_source_count} "
+"(every law_monitor row must have category='law')"
+)
 if fail:
 print("\n!! verification failed:")
 for f in fail:
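The mapping, the re-target WHERE clause and the verification in this diff can be modelled in plain Python, which makes the CASE precedence and the idempotent re-run easy to unit-test. A minimal sketch under stated assumptions: rows are plain dicts, the helper names (`target_category`, `in_backfill_scope`, `apply_backfill`, `verify`) are hypothetical, and the message for the `uncategorized` check is invented because that part of the script falls outside the diff:

```python
from typing import List, Optional

LIBRARY_TAG_PREFIX = "@library/"


def target_category(source_channel: Optional[str], file_type: Optional[str],
                    user_tags: Optional[List[str]]) -> str:
    """Python mirror of the CASE expression; branch order is the precedence."""
    if source_channel == "law_monitor":
        return "law"
    if source_channel == "news":
        return "news"
    if source_channel == "memo":
        return "memo"
    if file_type == "note":
        return "memo"
    # LIKE '@library/%' on each tag is a plain prefix test (no wildcards in it).
    if any(tag.startswith(LIBRARY_TAG_PREFIX) for tag in (user_tags or [])):
        return "library"
    return "document"


def in_backfill_scope(category: Optional[str], source_channel: Optional[str]) -> bool:
    """Mirror of the WHERE clause: unset rows plus law_monitor rows filed as 'document'."""
    return category is None or (source_channel == "law_monitor" and category == "document")


def apply_backfill(rows: List[dict]) -> int:
    """Update rows in place like APPLY_SQL and return how many were touched."""
    touched = 0
    for row in rows:
        if in_backfill_scope(row.get("category"), row.get("source_channel")):
            row["category"] = target_category(
                row.get("source_channel"), row.get("file_type"), row.get("user_tags"))
            touched += 1
    return touched


def verify(uncategorized: int, cat_library: int, has_library_tag: int,
           cat_law: int, law_source_count: int) -> List[str]:
    """The three fail conditions visible in the script's post-backfill check."""
    fail: List[str] = []
    if uncategorized != 0:
        fail.append(f"uncategorized={uncategorized} != 0")  # hypothetical message
    if cat_library != has_library_tag:
        fail.append(f"cat_library={cat_library} != has_library_tag={has_library_tag}")
    if cat_law != law_source_count:
        fail.append(f"cat_law={cat_law} != law_source_count={law_source_count}")
    return fail
```

Running `apply_backfill` a second time on the same rows touches nothing: once a law_monitor row holds 'law' and every other row is non-NULL, no row matches the WHERE predicate any more. This is the idempotent re-run the docstring refers to.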