8fdea88676
plan: ~/.claude/plans/luminous-sprouting-hamster.md §1
- migrations/143_category.sql: doc_category enum (6 active + 3 reserved) +
documents.category + documents.ai_suggestion JSONB + 2 indexes.
- app/models/document.py: category (Enum, create_type=False), ai_suggestion (JSONB).
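- app/prompts/classify.txt: add 7 practical doctypes to the document_type enum
(발주서/세금계산서/명세표/도면/증명서/계획서/시방서, i.e. purchase order, tax invoice,
itemized statement, drawing, certificate, plan, construction specification)
+ facet_doctype field directive.
- config.yaml: add 7 entries to document_types (passes worker validation).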
- app/workers/classify_worker.py: FACET_DOCTYPES / LIBRARY_SUGGESTION_DOCTYPES
constants, facet_doctype parsing (never overwrites an existing value); on detecting
발주서/세금계산서/명세표, sets ai_suggestion={proposed_category=library,
proposed_path=@library/거래/{YYYY}/{doctype},
source_updated_at=doc.updated_at.isoformat(), ...}.
category / user_tags are never transitioned automatically (suggestion-only).
- app/api/documents.py:
· Expose category / ai_suggestion on DocumentResponse
· GET /documents ?category=<cat> / ?has_suggestion / ?proposed_category
(when category is specified, the default news/memo exclusion is lifted; §2 approval UI contract)
· GET /documents/library reimplemented on top of Document.category=='library'
(the path subquery still uses user_tags: the shelf path inside the category)
· POST /documents/{id}/accept-suggestion: FOR UPDATE + idempotent no-op +
dual 409 stale (payload source_updated_at / documents.updated_at) +
user_tags idempotent append
· DELETE /documents/{id}/suggestion: idempotent, no stale check
- scripts/backfill_category.py: dry-run / apply. Mapping (news/memo/@library/else)
+ 3-way relative check (all_rows==categorized, uncategorized==0,
cat_library==has_library_tag; verifies the no-automatic-transition policy).
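The suggestion-only behavior listed above for classify_worker.py could look roughly like the sketch below. It is an illustration under stated assumptions, not the actual worker code: the `Doc` class and `apply_facet` function are hypothetical, only the three ai_suggestion keys named above are filled in (the elided ones are left out), and taking {YYYY} from doc.updated_at is a guess.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional

# Constant names come from the commit message; the values follow its description.
FACET_DOCTYPES = frozenset(
    {"발주서", "세금계산서", "명세표", "도면", "증명서", "계획서", "시방서"}
)
LIBRARY_SUGGESTION_DOCTYPES = frozenset({"발주서", "세금계산서", "명세표"})


@dataclass
class Doc:
    """Hypothetical stand-in for the real Document model."""
    updated_at: datetime
    category: str
    facet_doctype: Optional[str] = None
    ai_suggestion: Optional[dict] = None


def apply_facet(doc: Doc, raw_facet: Optional[str]) -> None:
    """Record facet_doctype and, for trade documents, a pending suggestion."""
    if raw_facet not in FACET_DOCTYPES:
        return  # null or unknown value: nothing to record
    if doc.facet_doctype is None:
        doc.facet_doctype = raw_facet  # never overwrite an existing value
    if raw_facet in LIBRARY_SUGGESTION_DOCTYPES:
        # Assumption: {YYYY} is taken from doc.updated_at; the source does not say.
        year = doc.updated_at.year
        doc.ai_suggestion = {
            "proposed_category": "library",
            "proposed_path": f"@library/거래/{year}/{raw_facet}",
            "source_updated_at": doc.updated_at.isoformat(),
        }
    # category and user_tags are deliberately left untouched (suggestion-only).
```

Leaving category and user_tags alone keeps the worker safe to re-run: a wrong guess costs the user one dismissed suggestion instead of a misfiled document.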
Remaining DoD (after remote deployment): docker compose up → apply migration 143 → backfill
apply → smoke test (drive_sync upload of a 발주서 creates a suggestion / category is preserved,
accept-suggestion idempotency + both 409 stale vectors, /documents?category=library
== /documents/library counts match).
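The accept-suggestion checks named in the smoke test (idempotent no-op plus two stale vectors) can be sketched in memory as follows. This is one reading of the commit message, not the endpoint's code: `Doc`, `StaleSuggestion`, and `accept_suggestion` are hypothetical names, the real handler holds the row under FOR UPDATE, and clearing ai_suggestion and appending proposed_path to user_tags on accept are both assumptions.

```python
from dataclasses import dataclass, field
from typing import Optional


class StaleSuggestion(Exception):
    """Would map to HTTP 409 in the API layer."""


@dataclass
class Doc:
    """Hypothetical stand-in for a row locked with SELECT ... FOR UPDATE."""
    updated_at: str  # ISO-8601 string
    category: str
    user_tags: list = field(default_factory=list)
    ai_suggestion: Optional[dict] = None


def accept_suggestion(doc: Doc, payload_source_updated_at: str) -> bool:
    """Apply the pending suggestion; return False when there is nothing to do."""
    suggestion = doc.ai_suggestion
    if suggestion is None:
        return False  # idempotent no-op: already accepted or dismissed
    # Stale vector 1: the client acted on a different suggestion than the stored one.
    if payload_source_updated_at != suggestion["source_updated_at"]:
        raise StaleSuggestion("payload source_updated_at mismatch")
    # Stale vector 2: the document changed after the suggestion was generated.
    if suggestion["source_updated_at"] != doc.updated_at:
        raise StaleSuggestion("document updated since suggestion")
    doc.category = suggestion["proposed_category"]
    path = suggestion["proposed_path"]
    if path not in doc.user_tags:  # idempotent append
        doc.user_tags.append(path)
    doc.ai_suggestion = None  # assumption: an accepted suggestion is cleared
    return True
```

Calling it twice with the same payload returns True and then False, which is the idempotency the smoke test exercises; either mismatch raises before anything is modified.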
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
116 lines
4.6 KiB
Plaintext
You are a document classification AI. Analyze the document below and respond ONLY in JSON format. No other text.

## Response Format
{
  "domain": "Level1/Level2/Level3",
  "document_type": "one of document_types",
  "facet_doctype": "one of facet_doctypes or null",
  "confidence": 0.85,
  "tags": ["tag1", "tag2"],
  "importance": "medium",
  "sourceChannel": "inbox_route",
  "dataOrigin": "work or external",
  "docPurpose": "business or knowledge"
}

## Domain Taxonomy (select the most specific leaf node)

Philosophy/
  Ethics, Metaphysics, Epistemology, Logic, Aesthetics, Eastern_Philosophy, Western_Philosophy

Language/
  Korean, English, Japanese, Translation, Linguistics

Engineering/
  Mechanical/ Piping, HVAC, Equipment
  Electrical/ Power, Instrumentation
  Chemical/ Process, Material
  Civil
  Network/ Server, Security, Infrastructure

Industrial_Safety/
  Legislation/ Act, Decree, Foreign_Law, Korea_Law_Archive, Enforcement_Rule, Public_Notice, SAPA
  Theory/ Industrial_Safety_General, Safety_Health_Fundamentals
  Academic_Papers/ Safety_General, Risk_Assessment_Research
  Cases/ Domestic, International
  Practice/ Checklist, Contractor_Management, Safety_Education, Emergency_Plan, Patrol_Inspection, Permit_to_Work, PPE, Safety_Plan
  Risk_Assessment/ KRAS, JSA, Checklist_Method
  Safety_Manager/ Appointment, Duty_Record, Improvement, Inspection, Meeting
  Health_Manager/ Appointment, Duty_Record, Ergonomics, Health_Checkup, Mental_Health, MSDS, Work_Environment

Programming/
  Programming_Language/ Python, JavaScript, Go, Rust
  Framework/ FastAPI, SvelteKit, React
  DevOps/ Docker, CI_CD, Linux_Administration
  AI_ML/ Large_Language_Model, Computer_Vision, Data_Science
  Database
  Software_Architecture

General/
  Reading_Notes, Self_Development, Business, Science, History

## Classification Rules
- domain MUST be the most specific leaf node (e.g., Industrial_Safety/Practice/Patrol_Inspection, NOT Industrial_Safety/Practice)
- domain MUST be exactly ONE path
- If content spans multiple domains, choose by PRIMARY purpose
- If safety content is >30%, prefer Industrial_Safety
- If code is included, prefer Programming
- 2-level paths allowed ONLY when no leaf exists (e.g., Engineering/Civil)

## Document Types (select exactly ONE)
Reference, Standard, Manual, Drawing, Template, Note, Academic_Paper, Law_Document, Report, Memo, Checklist, Meeting_Minutes, Specification, 발주서, 세금계산서, 명세표, 도면, 증명서, 계획서, 시방서

### Document Type Detection Rules
- Step-by-step instructions → Manual
- Legal clauses/regulations → Law_Document
- Technical requirements → Specification
- Meeting discussion → Meeting_Minutes
- Checklist format → Checklist
- Academic/research format → Academic_Paper
- Technical drawings → Drawing / 도면
- Order details; tables of items, quantities, and unit prices → 발주서
- Form with supplier / recipient / tax amount fields → 세금계산서
- Transaction statement / delivery statement → 명세표
- Proof of qualification, course completion, or employment → 증명서
- Work or project implementation proposal → 계획서
- Construction specifications, material standards → 시방서
- If unclear → Note

## facet_doctype (signal identifying the practical document type)
Select ONE of: 발주서, 세금계산서, 명세표, 도면, 증명서, 계획서, 시방서
If the document clearly does NOT fit any of the above, return null.
- This field is independent of document_type; use it to flag business-document types
  that drive automatic classification suggestions for the library (자료실).
- 발주서 / 세금계산서 / 명세표 feed pending-approval suggestions for the library's "거래" (transactions) category.

## Confidence (0.0 ~ 1.0)
- How confident are you in the domain classification?
- 0.85+ = high confidence, 0.6~0.85 = moderate, <0.6 = uncertain

## Tags
- Free-form tags (Korean or English)
- Include: person names, technology names, concepts, project names
- Maximum 5 tags

## Importance
- high: urgent or critical documents
- medium: normal working documents
- low: reference or archive material

## sourceChannel
- inbox_route (this classification)

## dataOrigin
- work: company-related (TK, Technicalkorea, factory, production)
- external: external reference (news, papers, laws, general info)

## docPurpose
- business: used directly in carrying out work (forms, reports, checklists, submissions, plans)
- knowledge: for reference, study, or archiving (statutes, papers, articles, references, technical documents, training materials)
- Template, Checklist, Report, Specification → likely business
- Academic_Paper, Law_Document, Reference, Standard → likely knowledge
- Meeting_Minutes, Memo → decide from context (business if it is a record of actions taken, knowledge if it is for reference)

## Document to classify
{document_text}
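
A template like the one above mixes a `{document_text}` placeholder with literal JSON braces, so `str.format` would fail on it. The sketch below shows one way a caller might fill the placeholder and sanity-check the reply. It is illustrative only: `TEMPLATE` is a shortened stand-in for the file, `DOCUMENT_TYPES` is a small assumed subset of the config values, and `render_prompt` / `parse_classification` are hypothetical names, not the worker's functions.

```python
import json

# Shortened stand-in for app/prompts/classify.txt (not the full file).
TEMPLATE = (
    "Respond ONLY in JSON format.\n"
    '{\n  "document_type": "one of document_types"\n}\n'
    "## Document to classify\n"
    "{document_text}\n"
)

# Assumption: a small subset of document_types from config.yaml, for illustration.
DOCUMENT_TYPES = {"Note", "Manual", "발주서"}


def render_prompt(template: str, document_text: str) -> str:
    """Fill the placeholder without str.format, which would trip on the JSON braces."""
    return template.replace("{document_text}", document_text)


def parse_classification(raw: str) -> dict:
    """Parse the model reply and apply the limits stated in the prompt."""
    data = json.loads(raw)
    if data.get("document_type") not in DOCUMENT_TYPES:
        raise ValueError(f"unknown document_type: {data.get('document_type')!r}")
    confidence = float(data.get("confidence", 0.0))
    data["confidence"] = min(max(confidence, 0.0), 1.0)  # clamp to 0.0 ~ 1.0
    data["tags"] = list(data.get("tags") or [])[:5]  # "Maximum 5 tags"
    return data
```

Plain string replacement also leaves any braces inside the document text itself untouched, which matters when the classified document contains code or JSON.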