6fdc48e5b6
Reuse the PR-A policy layer to add a tier triage path to classify_worker.
Legacy ai_summary / ai_domain / ai_suggestion are kept (zero regressions);
tldr / bullets / detail / inconsistencies are split into separate fields.
Migrations (156~160):
- 156 documents: 5 columns: ai_tldr, ai_bullets, ai_detail_summary,
  ai_inconsistencies, ai_analysis_tier
- 157 process_stage: ADD VALUE 'deep_summary' in a migration of its own (works
  around the Postgres same-transaction restriction)
- 158 processing_queue.payload JSONB (carries the envelope)
- 159 analyze_events: tier + suppressed_reason
- 160 partial index on suppressed_reason
Models/ORM:
- Document: add Mapped attributes for the 5 columns
- ProcessingQueue: extend the enum with deep_summary + add a payload field;
  enqueue_stage takes an optional payload
- AnalyzeEvent: 6 PR-A shadow columns + PR-B tier/suppressed_reason
Workers:
- classify_worker: add _run_tier_triage after the existing legacy path.
  - _match_subject_domain(doc, text): determines the PR-A policy subject_domain
    name from source_channel + body keywords + ai_domain prefix (category
    matching is forbidden).
  - R1: TriageOutput pydantic model + fallback for broken JSON
    (triage_json_invalid).
  - R2: _check_backlog_guard(): suppresses soft escalation when the 30-minute
    window ratio > threshold OR the pending limit is exceeded. Hard escalation
    passes through.
  - R3: _slice_text_ranges(): above 260k, split into 3 slices: head 120k +
    mid 20k + tail 120k.
  - On escalation, build an EscalationEnvelope and enqueue deep_summary with a
    {envelope, subject_domain} payload.
- deep_summary_worker (new): read envelope + subject_domain from the queue
  payload → render_26b("p3c_deep_summary", subject_domain) + MLX call (via
  llm_gate Semaphore(1)) → store ai_detail_summary + ai_inconsistencies and set
  ai_analysis_tier='deep'. _filter_inconsistencies passes only the allowed
  kinds (version_drift / procedure_conflict / source_conflict / missing_basis);
  purchase/contract kinds are dropped.
- queue_consumer: add deep_summary to the workers dict with BATCH_SIZE=1.
  next_stages is untouched: classify → embed/chunk stays as is, and
  deep_summary is an independent chain.
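A minimal sketch of the R3 slicing rule and the inconsistency-kind filter
described above, for reviewers who want the numbers in executable form. The
function names mirror the commit but the bodies are assumptions, not the
committed code: the 260k/120k/20k thresholds are treated as character counts,
the middle slice is assumed to be centered, and each inconsistency is assumed
to be a dict with a "kind" key.

```python
from typing import Any

# Thresholds from the commit message (R3); the unit (characters) is an assumption.
MAX_CHARS = 260_000
HEAD_CHARS = 120_000
MID_CHARS = 20_000
TAIL_CHARS = 120_000

# Allowed kinds as listed for _filter_inconsistencies.
ALLOWED_KINDS = frozenset(
    {"version_drift", "procedure_conflict", "source_conflict", "missing_basis"}
)


def slice_text_ranges(text: str) -> list[str]:
    """Return the text whole if it fits, else head + centered middle + tail."""
    if len(text) <= MAX_CHARS:
        return [text]
    # Centering the middle slice is an assumption; the commit only gives sizes.
    mid_start = (len(text) - MID_CHARS) // 2
    return [
        text[:HEAD_CHARS],
        text[mid_start:mid_start + MID_CHARS],
        text[-TAIL_CHARS:],
    ]


def filter_inconsistencies(items: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Keep only entries whose "kind" is in the allowed set."""
    return [item for item in items if item.get("kind") in ALLOWED_KINDS]
```

For any input longer than 260k characters the three slices do not overlap,
because the centered middle window starts at or after offset 120k and ends at
or before the start of the tail.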
Telemetry:
- record_analyze_event: extended with the parameters subject_domain /
  risk_flags / escalation_reasons / confidence / policy_version /
  shadow_would_route_to / tier / escalated_to_26b / suppressed_reason. The
  classify/deep workers record events with mode="summary_triage" or
  "summary_deep".
API:
- DocumentResponse exposes 5 fields: ai_tldr / ai_bullets / ai_detail_summary /
  ai_inconsistencies / ai_analysis_tier.
Prompts:
- classify.txt: only a DEPRECATED comment is added (file kept to preserve the
  rollback path).
- PR-A's app/prompts/policy/p3a_short_summary.txt (4B) and p3c_deep_summary.txt
  (26B) are used as is. My own summary_triage.txt / summary_deep.txt were
  duplicates, so they were deleted before being created rather than removed in
  a separate commit.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
120 lines
4.9 KiB
Plaintext
[DEPRECATED 2026-04-24] Migrated to summary_triage.txt (PR-B B-1 tier routing).
This file is kept as a rollback path during the B-1 stabilization period. New call paths
use the summary_triage.txt + summary_deep.txt combination. Actual deletion happens in a separate cleanup PR.

You are a document classification AI. Analyze the document below and respond ONLY in JSON format. No other text.

## Response Format
{
  "domain": "Level1/Level2/Level3",
  "document_type": "one of document_types",
  "facet_doctype": "one of facet_doctypes or null",
  "confidence": 0.85,
  "tags": ["tag1", "tag2"],
  "importance": "medium",
  "sourceChannel": "inbox_route",
  "dataOrigin": "work or external",
  "docPurpose": "business or knowledge"
}

## Domain Taxonomy (select the most specific leaf node)

Philosophy/
  Ethics, Metaphysics, Epistemology, Logic, Aesthetics, Eastern_Philosophy, Western_Philosophy

Language/
  Korean, English, Japanese, Translation, Linguistics

Engineering/
  Mechanical/ Piping, HVAC, Equipment
  Electrical/ Power, Instrumentation
  Chemical/ Process, Material
  Civil
  Network/ Server, Security, Infrastructure

Industrial_Safety/
  Legislation/ Act, Decree, Foreign_Law, Korea_Law_Archive, Enforcement_Rule, Public_Notice, SAPA
  Theory/ Industrial_Safety_General, Safety_Health_Fundamentals
  Academic_Papers/ Safety_General, Risk_Assessment_Research
  Cases/ Domestic, International
  Practice/ Checklist, Contractor_Management, Safety_Education, Emergency_Plan, Patrol_Inspection, Permit_to_Work, PPE, Safety_Plan
  Risk_Assessment/ KRAS, JSA, Checklist_Method
  Safety_Manager/ Appointment, Duty_Record, Improvement, Inspection, Meeting
  Health_Manager/ Appointment, Duty_Record, Ergonomics, Health_Checkup, Mental_Health, MSDS, Work_Environment

Programming/
  Programming_Language/ Python, JavaScript, Go, Rust
  Framework/ FastAPI, SvelteKit, React
  DevOps/ Docker, CI_CD, Linux_Administration
  AI_ML/ Large_Language_Model, Computer_Vision, Data_Science
  Database
  Software_Architecture

General/
  Reading_Notes, Self_Development, Business, Science, History

## Classification Rules
- domain MUST be the most specific leaf node (e.g., Industrial_Safety/Practice/Patrol_Inspection, NOT Industrial_Safety/Practice)
- domain MUST be exactly ONE path
- If content spans multiple domains, choose by PRIMARY purpose
- If safety content is >30%, prefer Industrial_Safety
- If code is included, prefer Programming
- 2-level paths allowed ONLY when no leaf exists (e.g., Engineering/Civil)

## Document Types (select exactly ONE)
Reference, Standard, Manual, Drawing, Template, Note, Academic_Paper, Law_Document, Report, Memo, Checklist, Meeting_Minutes, Specification, 발주서, 세금계산서, 명세표, 도면, 증명서, 계획서, 시방서

### Document Type Detection Rules
- Step-by-step instructions → Manual
- Legal clauses/regulations → Law_Document
- Technical requirements → Specification
- Meeting discussion → Meeting_Minutes
- Checklist format → Checklist
- Academic/research format → Academic_Paper
- Technical drawings → Drawing / 도면
- Order details; table of items, quantities, unit prices → 발주서
- Supplier / recipient / tax-amount form → 세금계산서
- Transaction statement / delivery statement → 명세표
- Proof of qualification, course completion, or employment → 증명서
- Work or project proposal → 계획서
- Construction specification / material standards → 시방서
- If unclear → Note

## facet_doctype (signal identifying the practical business-document type)
Select ONE of: 발주서, 세금계산서, 명세표, 도면, 증명서, 계획서, 시방서
If the document clearly does NOT fit any of the above, return null.
- This field is independent of document_type; use it to flag business-document types
  that drive automatic classification suggestions for the 자료실 (library).
- 발주서 / 세금계산서 / 명세표 feed pending-approval suggestions for the library's "거래" (transactions) category.

## Confidence (0.0 ~ 1.0)
- How confident are you in the domain classification?
- 0.85+ = high confidence, 0.6~0.85 = moderate, <0.6 = uncertain

## Tags
- Free-form tags (Korean or English)
- Include: person names, technology names, concepts, project names
- Maximum 5 tags

## Importance
- high: urgent or critical documents
- medium: normal working documents
- low: reference or archive material

## sourceChannel
- inbox_route (this classification)

## dataOrigin
- work: company-related (TK, Technicalkorea, factory, production)
- external: external reference (news, papers, laws, general info)

## docPurpose
- business: used directly in carrying out work (forms, reports, checklists, submissions, plans)
- knowledge: for reference, study, or archiving (laws, papers, articles, references, technical documents, training material)
- Template, Checklist, Report, Specification → likely business
- Academic_Paper, Law_Document, Reference, Standard → likely knowledge
- Meeting_Minutes, Memo → judge from context (business if it records execution, knowledge if kept for reference)

## Document to classify
{document_text}