Compare commits

2 Commits

Author SHA1 Message Date
Hyungi Ahn 96d57789bd feat(ai): align primary model with mlx-proxy actually loaded model
mlx-proxy on the mac mini currently loads
mlx-community/gemma-4-26b-a4b-it-8bit, but config.yaml was still
requesting mlx-community/Qwen3.5-35B-A3B-4bit. The proxy was silently
serving the loaded model regardless, but the mismatch made debugging
and log tracing harder.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-08 02:56:38 +00:00
Hyungi Ahn 32c96d6191 fix(deploy): primary endpoint -> mlx-proxy 8801
100.76.254.116:8800 -> :8801 to route through mlx-proxy and gain
/status observability (active_jobs / total_requests).
2026-04-08 02:56:08 +00:00
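The mismatch described in the first commit (config requesting one model while the proxy serves another) could be caught with a small comparison. This is a minimal sketch only: the commits do not say how the loaded model name is obtained, so `describe_model_drift` and both of its inputs are hypothetical and not part of the repository.

```python
from __future__ import annotations


def describe_model_drift(configured: str, loaded: str) -> str | None:
    """Return a human-readable drift message, or None when both names agree.

    Hypothetical helper: `configured` stands for the model name in config.yaml,
    `loaded` for whatever name the proxy reports as actually loaded.
    """
    if configured.strip() == loaded.strip():
        return None
    return f"config requests {configured!r} but the proxy has {loaded!r} loaded"
```

Returning a message instead of raising keeps the check usable in a log line or a health report.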
859 changed files with 2372 additions and 98960 deletions
-4
@@ -1,4 +0,0 @@
clients/
**/.build/
**/*.xcodeproj/
**/DerivedData/
-15
@@ -17,11 +17,6 @@ logs/
# Data (law downloads, etc.)
data/
# eval/calibration run results (baseline jsonl, etc.)
# reports/ already contains tracked files, so it is not ignored wholesale
results/
artifacts/
# macOS
.DS_Store
._*
@@ -37,13 +32,3 @@ node_modules/
# Docker volumes
pgdata/
caddy_data/
# Host venv (run_eval etc. run on the host)
.venv/
# Pre-work backups / rollback snapshots (working tree only; preserving git history is the source of truth)
*.bak
*.bak-*
*.bak_*
*.pre-*
.pre-*/
+151 -106
@@ -1,73 +1,113 @@
# hyungi_Document_Server: Claude Code working guide
## Infrastructure Reference 📌
Single source of truth (SSOT) for operational facts (model names / endpoints / IPs / containers / ports / drift):
**`~/.claude/projects/-Users-hyungiahn/memory/infra_inventory.md`**
If this file and the inventory conflict, **the inventory wins**. This CLAUDE.md focuses on coding rules, workflow, and code structure, and does not hardcode operational values.
Operational change policy (inventory → config → deploy → verify):
1. Update `infra_inventory.md` first
2. Update `config.yaml` / `credentials.env`
3. Deploy (commit → push → GPU pull → `docker compose up -d --build`)
4. Verify (smoke endpoint, postgres count, monitoring)
Breaking this order causes drift. When drift is found, record it in the inventory's `Drift Log`.
**Search experiment soft lock**: while the Phase 2 search refactor / QueryAnalyzer / run_eval is in progress, do not run `docker compose restart`, edit `config.yaml`, or pull Ollama models on the GPU server. Flag = `~/.claude/.search-experiment-active`.
---
## Project overview
Self-hosted PKM (Personal Knowledge Management) + multi-country news comparison web application.
The GPU server is the main host (Docker Compose / DB / search / OCR / marker); Mac mini = MLX inference + Whisper STT; Synology NAS = original files.
Self-hosted PKM (Personal Knowledge Management) web application.
Built on FastAPI + PostgreSQL (pgvector) + SvelteKit + Docker Compose.
Uses the GPU server as the main server, the Mac mini for AI inference, and the Synology NAS for file storage.
## Key documents
1. `README.md`: external introduction (tech stack / main features / Quick Start)
2. `docs/architecture.md`: overall system architecture
3. `docs/deploy.md`: Docker Compose deployment guide
4. `docs/development-stages.md`: Phase roadmap (historical context)
1. `docs/architecture.md`: overall system architecture (DB schema, AI strategy, infrastructure, UI design)
2. `docs/deploy.md`: Docker Compose deployment guide
3. `docs/development-stages.md`: Phase 0~5 development-stage guide
## Tech stack
| Area | Technology |
|------|------|
| Backend | FastAPI (Python 3.11+), SQLAlchemy 2.0 async, APScheduler |
| DB | PostgreSQL 16 + pgvector + pg_trgm (single `pkm` DB) |
| Backend | FastAPI (Python 3.11+) |
| Database | PostgreSQL 16 + pgvector + pg_trgm |
| Frontend | SvelteKit 5 (runes mode) + Tailwind CSS 4 |
| Document parsing | kordoc (HWP/HWPX/PDF → MD), LibreOffice headless (office files), marker (PDF → markdown) |
| OCR | Surya OCR (docker compose `ocr-service`, GPU) |
| STT | MLX Whisper (Mac mini); GPU faster-whisper is a legacy profile |
| Reverse proxy | Caddy (HTTP only; the upstream home-caddy terminates HTTPS) |
| Auth | JWT (access) + HttpOnly cookie (refresh) + TOTP 2FA |
| Document parsing | kordoc (HWP/HWPX/PDF → Markdown) + LibreOffice (office files → text/PDF) |
| Reverse proxy | Caddy (HTTP only; HTTPS is handled by the upstream proxy) |
| Auth | JWT + TOTP 2FA |
| Containers | Docker Compose |
## Machine roles (detailed IPs / ports → inventory)
## Network environment
| Machine | Role |
|------|------|
| GPU server | Main Docker Compose host: fastapi · frontend · postgres `pkm` · kordoc · ocr-service · marker-service · reranker (TEI) · caddy. Ollama (embedding / 4B inference). home-gateway runs as a separate compose (ingress + 나노클로 + searxng) |
| Mac mini | MLX 26B inference endpoint + MLX Whisper STT. No ingress role |
| Synology NAS | Original files (`/volume4/Document_Server/PKM/` → GPU `/mnt/nas/Document_Server` over NFS), Synology Office/Drive/Calendar/MailPlus |
| VPS-2 (OVH) | Mail relay (`relay.hyungi.net:587`), Gitea bare mirror, Secondary MX |
```
GPU server (RTX 4070 Ti Super, Ubuntu, main server):
- Docker Compose: FastAPI(:8000), PostgreSQL(:5432), kordoc(:3100),
Caddy(:8080 HTTP only), Ollama(127.0.0.1:11434), AI Gateway(127.0.0.1:8081), frontend(:3000)
- NFS mount: /mnt/nas/Document_Server → NAS /volume4/Document_Server
- External access: document.hyungi.net (Mac mini nginx → Caddy)
- Local IP: 192.168.1.186
## AI pipeline (by role; the actual model mapping lives in the inventory)
Mac mini M4 Pro (AI server + front proxy):
- MLX Server: http://100.76.254.116:8800/v1/chat/completions (Qwen3.5-35B-A3B)
- nginx: terminates HTTPS → proxies to the GPU server Caddy (:8080)
- Tailscale IP: 100.76.254.116
| Role | Location |
|------|------|
| Classification / deep summary primary | Mac mini MLX 26B |
| Triage (first-pass classification) / Fallback / Chat | GPU Ollama 4B |
| Embedding | GPU Ollama (1024d, multilingual) |
| Reranker | GPU TEI container |
| OCR | docker compose `ocr-service` (Surya OCR GPU); `ai.models.vision` is unused |
| STT | Mac mini MLX Whisper large-v3 |
| Premium (manual trigger) | Anthropic API (`require_explicit_trigger`, daily limit) |
Synology NAS (DS1525+):
- LAN IP: 192.168.1.227
- Tailscale IP: 100.101.79.37
- Original files: /volume4/Document_Server/PKM/
- NFS export → GPU server
- Synology Drive: https://link.hyungi.net (document editing)
- Synology Calendar: CalDAV task management
- MailPlus: IMAP(993) + SMTP(465)
```
Always call models through `AIClient` in `app/ai/client.py` (`call_triage` / `call_primary` / `call_fallback`). Direct HTTP calls are forbidden.
## Credentials
- Location: `credentials.env` (project root, listed in .gitignore)
- Template: `credentials.env.example`
- Loaded in scripts via python-dotenv or Docker env_file
## AI model configuration
```
Primary (Mac mini MLX, via Tailscale, always on, free):
mlx-community/Qwen3.5-35B-A3B-4bit: classification, tags, summaries
→ http://100.76.254.116:8800/v1/chat/completions
Fallback (GPU Ollama, same Docker network, used when MLX is down):
qwen3.5:35b-a3b
→ http://ollama:11434/v1/chat/completions
Premium (Claude API, pay-as-you-go, manual trigger only):
claude-sonnet: complex analysis, long-document processing
→ daily limit $5, require_explicit_trigger: true
Embedding (GPU Ollama, same Docker network):
nomic-embed-text → vector embeddings
Qwen2.5-VL-7B → image/drawing OCR
bge-reranker-v2-m3 → RAG reranking
```
## Project structure
```
hyungi_Document_Server/
├── docker-compose.yml
├── Caddyfile ← HTTP only, auto_https off
├── config.yaml ← AI endpoints, NAS paths, schedules
├── credentials.env.example
├── app/ ← FastAPI backend
│ ├── main.py ← entry point + APScheduler (includes watcher/consumer)
│ ├── Dockerfile ← includes LibreOffice headless
│ ├── core/ (config, database, auth, utils)
│ ├── models/ (document, task, queue)
│ ├── api/ (documents, search, dashboard, auth, setup)
│ ├── workers/ (file_watcher, extract, classify, embed, preview, law_monitor, mailplus, digest, queue_consumer)
│ ├── prompts/classify.txt
│ └── ai/client.py ← AIClient + parse_json_response (handles Qwen3.5 thinking output)
├── services/kordoc/ ← Node.js microservice (HWP/PDF parsing)
├── gpu-server/ ← AI Gateway (deprecated, merged)
├── frontend/ ← SvelteKit 5
│ └── src/
│ ├── routes/ ← pages (documents, inbox, settings, login)
│ └── lib/
│ ├── components/ ← Sidebar, DocumentCard, DocumentViewer, PreviewPanel,
│ │ TagPill, FormatIcon, UploadDropzone
│ ├── stores/ ← auth, ui
│ └── api.ts ← fetch wrapper (JWT token management)
├── migrations/ ← PostgreSQL schema (tracked via schema_migrations)
├── scripts/
├── docs/
└── tests/
```
## Document processing pipeline
@@ -75,77 +115,82 @@ The GPU server is the main host (Docker Compose / DB / search / OCR / marker); Mac mini = M
File upload (drag and drop or file_watcher)
extract (text extraction)
- kordoc: HWP, HWPX, PDF → Markdown
- LibreOffice: xlsx, docx, pptx, etc. → txt/csv
- Read directly: md, txt, csv, json, xml, html
classify_worker (tier triage) preview / marker
- 4B Ollama → TriageOutput - LibreOffice → PDF conversion
- deep_summary on escalate_to_26b - marker → PDF → markdown
- ai_tldr / ai_bullets / inconsistencies
- kordoc: HWP, HWPX, PDF → Markdown
- LibreOffice: xlsx, docx, pptx, odt, etc. → txt/csv
- Read directly: md, txt, csv, json, xml, html
↓ ↓
classify (AI classification) preview (PDF preview generation)
- Qwen3.5 → domain - LibreOffice → PDF conversion
- tags, summary - cache: PKM/.preview/{id}.pdf
embed_worker (bge-m3 1024d, doc-level)
chunk_worker (chunking per document type)
embed (vector embedding)
- nomic-embed-text (768 dimensions)
```
Core principles:
**Core principles:**
- Files stay where they were uploaded (no physical moves)
- Classification (`ai_domain` / `ai_sub_group` / `ai_tags` / `category` / `tier`) is managed only as DB metadata
- preview / marker run in parallel with classify
- Classification (domain/sub_group/tags) is managed only as DB metadata
- preview runs in parallel with classify (it does not need AI results)
## Workers / scheduler (scheduler.add_job in `app/main.py`)
## UI structure
- queue_consumer (interval 1m), file_watcher (5m), upload_cleanup (10m)
- study_q_embed (1m), study_q_related_refresh (1m), study_queue (1m), study_session_queue (1m)
- tier_backfill (30m)
- law_monitor (07:00 KST), mailplus_archive (07/18:00 KST)
- daily_digest (20:00 KST)
- **global_digest** (04:00 KST): Phase 4 country×topic, 7-day rolling
- **morning_briefing** (05:10 KST): topic×country comparison of news collected overnight (00:00~05:00 KST)
```
┌──────────────────────────────────────────────────┐
│ [☰ Sidebar] [PKM / Documents] [ℹ Info] buttons │ ← top nav
├──────────────────────────────────────────────────┤
│ [Search bar] [Mode] [ℹ] │
│ Document list (30%), drag-upload supported │ ← top area
│ █ Document card (domain color bar + format icon) │
├──────────────────────────────────────────────────┤
│ Bottom viewer/editor (70%), full width │ ← bottom area
│ Markdown: split editor (textarea + preview) │
│ PDF: browser built-in viewer │
│ Office: PDF-converted preview + [Edit] new-tab button │
│ Image: img tag │
└──────────────────────────────────────────────────┘
scheduler timezone = `Asia/Seoul`.
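Sidebar: collapsed by default, opened as an overlay via ☰ (domain tree + smart groups + Inbox)
Info panel: ℹ button → full-height right-side drawer (notes / tag editing / metadata / processing status / edit URL)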
```
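## Data layers
1. **Original files**: NAS `/volume4/Document_Server/PKM/`. The only originals; their location never changes
2. **Processed data**: PostgreSQL `pkm` (text, AI classification, search index, notes, tags, briefing, digest, …)
3. **Derived artifacts**: pgvector embeddings, PDF preview cache (`.preview/`), marker output (markdown + extracted_images stored on the NAS)
1. **Original files** (NAS `/volume4/Document_Server/PKM/`): the only originals; their location never changes
2. **Processed data** (PostgreSQL): text extraction, AI classification, search index, notes, tags
3. **Derived artifacts**: vector embeddings (pgvector), PDF preview cache (`.preview/`)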
## Coding rules
- Python 3.11+, asyncio, type hints
- SQLAlchemy 2.0+ async sessions
- Svelte 5 runes mode (`$state`, `$derived`, `$effect`; `$:` is forbidden)
- Load credentials from `credentials.env` (no hardcoding)
- Logs go to `logs/` (Docker volume)
- AI calls must go through `AIClient` in `app/ai/client.py`
- Svelte 5 runes mode ($state, $derived, $effect; $: is forbidden)
- Load credentials from credentials.env (no hardcoding)
- Logs are stored in `logs/` (Docker volume)
- AI calls must go through `AIClient` in `app/ai/client.py` (no direct HTTP calls)
- Write code comments in Korean
- Migration: `migrations/NNN_*.sql`, run automatically by `init_db()` (tracked in `schema_migrations`)
- No `BEGIN/COMMIT` in SQL (it breaks the outer transaction)
- asyncpg prepared statements do not allow multiple statements → split into one statement per file
- On an existing DB, history may need to be registered in `schema_migrations` manually
- Design-system tokens only (`bg-surface`, `text-dim`, `border-default`, `text-accent`, …). `bg-[var(--*)]` is forbidden (blocked by `lint:tokens`)
- Commit messages: `type(scope): summary` (`feat` / `fix` / `refactor` / `ops` / `incident` / `docs`)
- Migration: write in `migrations/*.sql`; run automatically by `init_db()` (tracked in schema_migrations)
- No BEGIN/COMMIT in SQL (it breaks the outer transaction)
- On an existing DB, history may need to be registered in schema_migrations manually
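The two migration rules above (no `BEGIN/COMMIT`, one statement per file) lend themselves to a quick pre-commit check. The following is a rough sketch, not a tool that exists in the repository: `lint_migration` and `strip_sql_comments` are hypothetical names, and the parsing is deliberately naive (it splits on `;` and would misjudge semicolons or `BEGIN` inside string literals and PL/pgSQL bodies).

```python
import re

# Hypothetical helper, not part of the repository.
# Matches BEGIN or COMMIT at the start of a line, case-insensitively.
_TXN_RE = re.compile(r"^\s*(BEGIN|COMMIT)\b", re.IGNORECASE | re.MULTILINE)


def strip_sql_comments(sql: str) -> str:
    """Remove /* */ block comments and -- line comments (naive: ignores string literals)."""
    sql = re.sub(r"/\*.*?\*/", "", sql, flags=re.DOTALL)
    return re.sub(r"--[^\n]*", "", sql)


def lint_migration(sql: str) -> list[str]:
    """Return the rule violations found in one migration file's contents."""
    body = strip_sql_comments(sql)
    problems = []
    if _TXN_RE.search(body):
        problems.append("contains BEGIN/COMMIT")
    # Naive statement count: split on ';' and ignore empty fragments.
    statements = [part for part in body.split(";") if part.strip()]
    if len(statements) > 1:
        problems.append(f"contains {len(statements)} statements, expected 1")
    return problems
```

A caller could run this over every file in `migrations/` and fail the commit when any list is non-empty.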
## Development / deployment workflow
## Development/deployment workflow
```bash
# Development (MacBook Pro)
cd ~/Documents/code/hyungi_Document_Server/
# Write code → git commit → push (Gitea)
# Deployment (GPU server)
ssh gpu
cd ~/Documents/code/hyungi_Document_Server/
git pull
docker compose up -d --build fastapi frontend
```
MacBook Pro (development) → Gitea push → pull on the GPU server
PRs are merged with **Rebase and merge** in the Gitea UI by default (linear history + avoids force-push conflicts). Use a local rebase + fast-forward only when you are certain you are the sole contributor.
Development:
cd ~/Documents/code/hyungi_Document_Server/
# Write code → git commit & push
GPU server deployment (main):
ssh hyungi@100.111.160.84
cd ~/Documents/code/hyungi_Document_Server/
git pull
docker compose up -d --build fastapi frontend
```
## v1 code reference
v1 (DEVONthink-based) code is preserved under the `v1-final` tag:
v1 (DEVONthink-based) code is preserved under the `v1-final` tag:
```bash
git show v1-final:scripts/law_monitor.py
git show v1-final:scripts/pkm_utils.py
@@ -153,10 +198,10 @@ git show v1-final:scripts/pkm_utils.py
## Cautions
- `credentials.env` is never committed to git (`.gitignore`)
- NAS NFS mount: `/documents` inside the Docker container. FastAPI checks that `/documents/PKM` exists at startup
- The law API (LAW_OC) is awaiting approval
- Ollama binds to 127.0.0.1 (blocks external access)
- Caddy uses `auto_https off` + `http://` only (HTTPS is terminated by the upstream home-caddy)
- Synology Office editing opens in a new tab (no iframe; `edit_url` is registered manually)
- Korean NFS paths are NFC↔NFD asymmetric: when receiving a path, an NFC→NFD→parent glob fallback is required
- credentials.env is never committed to git (.gitignore)
- NAS NFS mount path: `/documents` inside the Docker container
- FastAPI checks that `/documents/PKM` exists at startup (guards against an unmounted NFS)
- The law API (LAW_OC) is awaiting approval
- Ollama/AI Gateway ports bind to 127.0.0.1 (blocks external access)
- Caddy uses `auto_https off` + `http://` only (HTTPS is handled by the Mac mini nginx)
- Synology Office editing opens in a new tab (no iframe; edit_url is registered manually)
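The NFC→NFD→parent glob fallback named in the list above might look roughly like the sketch below. It is an illustration under assumptions, not the project's implementation: `resolve_nfs_path` is a hypothetical name, and the parent scan only reconciles the final path component, not intermediate directories.

```python
from __future__ import annotations

import unicodedata
from pathlib import Path


def resolve_nfs_path(path: Path) -> Path | None:
    """Try the NFC form, then the NFD form, then scan the parent directory.

    Hypothetical helper: returns an existing path whose name equals the
    requested name after Unicode normalization, or None if nothing matches.
    """
    for form in ("NFC", "NFD"):
        candidate = Path(unicodedata.normalize(form, str(path)))
        if candidate.exists():
            return candidate
    # Parent glob fallback: compare directory entries by NFC-normalized name.
    parent = path.parent
    if not parent.is_dir():
        return None
    target = unicodedata.normalize("NFC", path.name)
    for entry in parent.glob("*"):
        if unicodedata.normalize("NFC", entry.name) == target:
            return entry
    return None
```

Comparing on a single normalized form in the last step also covers names stored in a mixed form, which neither direct attempt would find.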
-6
@@ -1,11 +1,5 @@
{
auto_https off
# home-caddy (on a private docker bridge network) terminates TLS and forwards
# X-Forwarded-Proto: https. Without trusted_proxies, Caddy overwrites it with the incoming scheme (http),
# so the Location header of FastAPI 307 redirects goes out as http:// and is blocked as mixed content.
servers {
trusted_proxies static private_ranges
}
}
http://document.hyungi.net {
+37 -81
@@ -1,108 +1,64 @@
# hyungi_Document_Server
Self-hosted personal knowledge management (PKM) + multi-country news comparison web application.
> Model names, endpoints, and machine details change with operational state, so they are not hardcoded in the README.
> Operational single source of truth (SSOT): `~/.claude/projects/-Users-hyungiahn/memory/infra_inventory.md`.
> For models, endpoints, ports, or SSH alike, if the README and the inventory conflict, **the inventory wins**.
Self-hosted personal knowledge management (PKM) web application
## Tech stack
- **Backend**: FastAPI + SQLAlchemy 2.0 async, APScheduler cron
- **DB**: PostgreSQL 16 + pgvector + pg_trgm (single `pkm` DB)
- **Frontend**: SvelteKit 5 (runes mode) + Tailwind CSS 4
- **Document parsing**: kordoc microservice (HWP/HWPX/PDF → Markdown), LibreOffice headless (office files), marker (PDF → markdown Phase 1B)
- **AI pipeline** (by role; see the inventory for the detailed model mapping):
- Classification/summary core: Mac mini MLX 26B (primary)
- Triage / fallback / chat: GPU Ollama 4B
- Embedding: GPU Ollama `bge-m3` (1024d)
- Reranker: GPU TEI container `bge-reranker-v2-m3`
- OCR: docker compose `ocr-service` (Surya OCR GPU)
- STT: Mac mini MLX Whisper large-v3
- Premium (manual trigger): Anthropic Claude (`require_explicit_trigger`)
- **Auth**: JWT (access) + HttpOnly cookie (refresh) + TOTP 2FA
- **Infrastructure**: Docker Compose, Caddy (HTTP only; the upstream home-caddy terminates HTTPS), Synology NAS NFS
- **Backend**: FastAPI + SQLAlchemy (async)
- **Database**: PostgreSQL 16 + pgvector + pg_trgm
- **Frontend**: SvelteKit
- **Document parsing**: kordoc (HWP/HWPX/PDF → Markdown)
- **AI**: Qwen3.5-35B-A3B (MLX), nomic-embed-text, Claude API (fallback)
- **Infrastructure**: Docker Compose, Caddy, Synology NAS
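## Main features
- **Automatic document classification/tagging/summarization**: tiered Triage (4B) → Deep summary (26B), backlog guard / text slicing / inconsistency detection
- **Hybrid search**: pgvector vectors + pg_trgm full-text search + reranker (bge-reranker-v2-m3) + Ask pipeline (HyDE / evidence_service)
- **Multilingual OCR**: Surya OCR GPU (Korean/English/Japanese/Chinese/German/French, etc.), NFC/NFD path normalization
- **Audio/video transcription**: MLX Whisper large-v3, `/audio` `/video` routes + direct play
- **Law change monitoring**: `law_monitor` cron, freshness decay (365-day half-life)
- **Automatic email collection**: MailPlus IMAP, stored over NFS
- **Phase 4 Global Digest**: daily at 04:00 KST, 7-day rolling two-level country×topic news comparison (`/digest`)
- **Overnight news briefing**: daily at 05:10 KST, covering the five-hour window from midnight to 05:00 KST; one-page cards of topic×country comparative analysis (`/news`)
- **Library**: category facet classification + one-click approval of AI suggestions
- **Notes/events/study**: 5-second action-log notes; an events domain for schedules, to-dos, and retrospectives; a study workspace for the 가스기사 (gas engineer) exam (274 concepts + 2,100 past exam questions)
- **Markdown canonical layer**: extracted_images stored on the NAS + `document_images` metadata + short-lived token auth (`?token=`)
- Automatic document classification/tagging/summarization (AI-based)
- Full-text search + vector similarity search
- HWP/PDF/Markdown document viewer
- Law change monitoring (Occupational Safety and Health Act, etc.)
- Automatic email collection (MailPlus IMAP)
- Daily digest
- CalDAV task integration (Synology Calendar)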
## Quick Start
```bash
git clone https://git.hyungi.net/hyungi/hyungi_document_server.git
cd hyungi_document_server
git clone https://git.hyungi.net/hyungi/hyungi_document_server.git hyungi_Document_Server
cd hyungi_Document_Server
# Credentials (DB password, JWT secret, Claude API key, etc.)
# Set up credentials
cp credentials.env.example credentials.env
$EDITOR credentials.env
nano credentials.env # enter the real values
# AI models / endpoints / paths
$EDITOR config.yaml # fill in while consulting the inventory
$EDITOR .env # POSTGRES_PASSWORD, MAC_MINI_HOST, NAS_NFS_PATH, etc.
docker compose up -d --build
# Run
docker compose up -d
```
Production domain (GPU server deployment): `https://document.hyungi.net`
API docs: `https://document.hyungi.net/docs`
See the API docs at `http://localhost:8000/docs`
## Directory structure
```
├── app/ FastAPI backend
│ ├── api/ routers (documents, search, briefing, digest, memos, events, study, …)
│ ├── workers/ APScheduler / queue (briefing_worker, digest_worker, classify_worker, …)
│ ├── services/ domain logic (briefing/, digest/, search/, clustering_common, …)
│ ├── ai/client.py AIClient (call_triage / call_primary / call_fallback, parse_json_response)
│ ├── prompts/ *.txt prompts (classification, summary, briefing_comparative, digest_topic, …)
│ ├── policy/ AI envelope + prompt_render
│ └── models/ SQLAlchemy ORM
├── frontend/ SvelteKit 5 (runes mode) + Tailwind
│ └── src/routes/ /news (morning briefing) /library /memos /audio /video /study /digest /ask …
├── services/
│ ├── kordoc/ HWP/HWPX/PDF parsing (Node.js)
│ ├── ocr/ Surya OCR GPU service (FastAPI)
│ └── marker/ PDF → markdown Phase 1B
├── migrations/ 255+ SQL migrations (tracked via schema_migrations)
├── docs/ design documents
└── tests/ pytest
├── app/ FastAPI backend (API, workers, AI client)
├── frontend/ SvelteKit frontend
├── services/kordoc/ document-parsing microservice (Node.js)
├── gpu-server/ GPU server deployment (AI Gateway)
├── migrations/ PostgreSQL schema
├── docs/ design documents, deployment guide
└── tests/ test code
```
The `gpu-server/` folder is a deprecated v1 leftover (the AI Gateway now lives in a separate repo, `~/home-gateway/`).
## Infrastructure
## Infrastructure (production)
| Machine | Role |
|---|---|
| **GPU server** (main) | Docker Compose (fastapi, frontend, postgres pkm, kordoc, ocr-service, marker-service, reranker(TEI), caddy), Ollama (`bge-m3`, 4B chat), home-gateway as a separate compose |
| **Mac mini** | MLX 26B primary inference + MLX Whisper STT (HTTP inference endpoint only, no ingress role) |
| **Synology NAS** | Original files (`/volume4/Document_Server/PKM/`), Synology Office/Drive/Calendar/MailPlus, NFS export → GPU |
| **VPS-2** (OVH) | Mail relay (`relay.hyungi.net:587` SASL+TLS+DKIM+LE), Gitea bare mirror, Secondary MX |
For detailed IPs / models / containers / drift / verify commands, see `infra_inventory.md`.
## Operational change policy
1. Update the inventory first
2. Update `config.yaml` / `credentials.env`
3. Deploy (commit → push to Gitea → on the GPU server `git pull && docker compose up -d --build`)
4. Verify (smoke endpoints, postgres count, monitoring)
Breaking this order causes drift. When drift is found, record it in the Drift Log of `infra_inventory.md` and then correct it.
| Server | Role |
|------|------|
| Mac mini M4 Pro | Docker Compose (FastAPI, PostgreSQL, kordoc, Caddy) + MLX AI |
| Synology NAS | Original file storage, Synology Office/Drive/Calendar/MailPlus |
| GPU server | AI Gateway, vector embedding, OCR, reranking |
## Documentation
- [Architecture](docs/architecture.md): DB schema, AI strategy, UI design
- [Deployment guide](docs/deploy.md): Docker Compose deployment
- [Development stages](docs/development-stages.md): per-Phase roadmap (for newer phases such as Phase 4 Global Digest / overnight briefing, the inventory + plan files take precedence)
- [Architecture](docs/architecture.md): overall system design
- [Deployment guide](docs/deploy.md): how to deploy with Docker Compose
- [Development stages](docs/development-stages.md): Phase 0~5 development plan
-34
@@ -1,34 +0,0 @@
# Third Party Licenses
This project uses the following open-source software.
## perfect-freehand
- License: **MIT**
- Repository: https://github.com/steveruizok/perfect-freehand
- Used by: `frontend/src/lib/components/HandwriteCanvas.svelte`, for handwriting stroke
rendering that reflects Apple Pencil pressure/tilt.
```
MIT License
Copyright (c) 2021 Stephen Ruiz Ltd
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
```
+3 -4
@@ -2,13 +2,12 @@ FROM python:3.11-slim
WORKDIR /app
# LibreOffice headless (for PDF conversion) + Korean/CJK fonts + ffmpeg (video thumbnails)
# LibreOffice headless (for PDF conversion) + Korean/CJK fonts
RUN apt-get update && \
apt-get install -y --no-install-recommends \
libreoffice-core libreoffice-calc libreoffice-writer libreoffice-impress \
fonts-noto-cjk fonts-noto-cjk-extra fonts-nanum \
fonts-noto-core fonts-noto-extra \
ffmpeg && \
fonts-noto-core fonts-noto-extra && \
apt-get clean && rm -rf /var/lib/apt/lists/*
COPY requirements.txt .
@@ -16,4 +15,4 @@ RUN pip install --no-cache-dir -r requirements.txt
COPY . .
CMD ["uvicorn", "main:app", "--host", "0.0.0.0", "--port", "8000", "--proxy-headers", "--forwarded-allow-ips", "*"]
CMD ["uvicorn", "main:app", "--host", "0.0.0.0", "--port", "8000"]
+29 -190
@@ -21,119 +21,25 @@ def strip_thinking(text: str) -> str:
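def parse_json_response(raw: str) -> dict | None:
"""Extract a JSON object from an AI response (strips think tags, code blocks, etc.).
Parsing attempts, in order (returns as soon as one step succeeds):
1. The first ``{...}`` inside a ``` json fenced block (DOTALL)
2. The last match from the balanced-regex finditer
3. json.loads on the whole cleaned text
4. (Phase 4-A follow-up) Greedy slice from the first ``{`` to the last ``}``: guards against cases
where the balanced regex cannot match because of inner quotes/backticks/newlines in the envelope JSON.
Slices the raw text from the first ``{`` to the last ``}`` and runs json.loads. Even when the model
mixes free text before and after the JSON, only the body is extracted.
"""
"""Extract a JSON object from an AI response (strips think tags, code blocks, etc.)"""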
cleaned = strip_thinking(raw)
# 1. Extract JSON from inside a code block
# Extract JSON from inside a code block
code_match = re.search(r"```(?:json)?\s*(\{.*?\})\s*```", cleaned, re.DOTALL)
if code_match:
cleaned = code_match.group(1)
# 2. Find the last valid JSON object (balanced, one level of nesting)
# Find the last valid JSON object
matches = list(re.finditer(r"\{[^{}]*(?:\{[^{}]*\}[^{}]*)*\}", cleaned, re.DOTALL))
for m in reversed(matches):
try:
return json.loads(m.group())
except json.JSONDecodeError:
continue
# 3. The whole cleaned text
# Last resort: parse the whole text as JSON
try:
result = json.loads(cleaned)
if isinstance(result, dict):
return result
except json.JSONDecodeError:
pass
# 4. greedy slice fallback: from the first '{' to the last '}'
first = cleaned.find("{")
last = cleaned.rfind("}")
if first < 0 or last <= first:
return None
candidate = cleaned[first : last + 1]
try:
obj = json.loads(candidate)
return obj if isinstance(obj, dict) else None
except json.JSONDecodeError:
pass
# 5. (Phase 4-A follow-up) Guards against Markdown line breaks + LaTeX math landing raw
# inside a JSON string literal. Two kinds of invalid input:
# - raw newline (LF/CR/TAB): control chars are forbidden inside standard JSON strings
# - invalid backslash: LaTeX such as `\circ`, `\text`, `\,`. The only valid JSON escapes
# are `\"`, `\\`, `\/`, `\b`, `\f`, `\n`, `\r`, `\t`, `\uXXXX`.
# stateful walker: fixes only inside string literals. Newlines outside them (in the object
# structure) are valid whitespace and are preserved.
escaped = _fix_json_string_escapes(candidate)
try:
obj = json.loads(escaped)
return obj if isinstance(obj, dict) else None
return json.loads(cleaned)
except json.JSONDecodeError:
return None
_VALID_JSON_ESCAPES = set('"\\/bfnrtu')
def _fix_json_string_escapes(s: str) -> str:
"""Escape only raw newlines + invalid backslashes inside JSON string literals.
state machine: toggles in_string (on encountering `"`). Only inside a string:
- raw LF/CR/TAB → converted to ``\\n``/``\\r``/``\\t``
- a backslash followed by a valid escape char (`"\\/bfnrtu`) is kept as-is
- a backslash followed by an invalid char (`\\c`, `\\,`) has the backslash itself escaped as ``\\\\``
Raw newlines etc. outside strings (between `{` `,` `:`) are JSON whitespace and are preserved.
"""
out: list[str] = []
i = 0
n = len(s)
in_string = False
while i < n:
ch = s[i]
if not in_string:
if ch == '"':
in_string = True
out.append(ch)
i += 1
continue
# in_string
if ch == "\\":
nxt = s[i + 1] if i + 1 < n else ""
if nxt in _VALID_JSON_ESCAPES:
out.append(ch)
out.append(nxt)
i += 2
continue
# invalid escape: escape the backslash itself
out.append("\\\\")
i += 1
continue
if ch == '"':
in_string = False
out.append(ch)
i += 1
continue
if ch == "\n":
out.append("\\n")
i += 1
continue
if ch == "\r":
out.append("\\r")
i += 1
continue
if ch == "\t":
out.append("\\t")
i += 1
continue
out.append(ch)
i += 1
return "".join(out)
# Prompt loading
PROMPTS_DIR = Path(__file__).parent.parent / "prompts"
@@ -146,61 +52,24 @@ CLASSIFY_PROMPT = _load_prompt("classify.txt") if (PROMPTS_DIR / "classify.txt")
class AIClient:
"""Unified AI model client.
B-0 3-tier routing:
- call_triage(): Mac mini 26B MLX, called routinely (outside llm_gate; concurrency safety is reviewed separately)
- call_primary(): Mac mini 26B MLX, escalation only (the llm_gate Semaphore(1) is the **caller's responsibility**)
- call_fallback(): last line of defense when triage/primary fail. Claude Sonnet 4 API (swap completed in PR #20)
Legacy: classify() / summarize() are kept for existing call sites (tests/eval runner).
All new worker paths use call_triage / call_primary.
"""
"""Unified client via the AI Gateway. The default is always Qwen3.5."""
def __init__(self):
self.ai = settings.ai
self._http = httpx.AsyncClient(timeout=120)
# ─── 3-tier routing (B-0) ───────────────────────────────────────────────
async def call_triage(self, prompt: str) -> str:
"""Calls the Mac mini 26B MLX directly (config.yaml ai.models.triage). Runs outside llm_gate; since PR #20 triage and primary share the same endpoint, so concurrency safety is reviewed separately.
The timeout is config.yaml ai.models.triage.timeout (default 30s).
On failure, the caller decides whether to escalate or fall back.
"""
return await self._request(self.ai.triage, prompt)
async def call_primary(self, prompt: str, system: str | None = None) -> str:
"""Calls the 26B MLX. Escalation only.
**The caller must invoke this inside an `async with get_mlx_gate():` block.**
Semaphore(1) limits concurrent calls to one, and the gate is dedicated to primary.
system: when set, injected as a separate system message (이드 substrate compose, etc.). None = existing behavior (single user message).
"""
return await self._request(self.ai.primary, prompt, system=system)
async def call_fallback(self, prompt: str) -> str:
"""Last line of defense when triage/primary fail. Claude Sonnet 4 API (config.yaml ai.models.fallback); swap completed after PR #20."""
return await self._request(self.ai.fallback, prompt)
# ─── Legacy API (to be removed when classify_worker is replaced) ───────────────────
async def classify(self, text: str) -> dict:
"""[DEPRECATED] For the existing classify_worker only. Replaced by summary_triage in B-1.
Kept until the call sites are cleaned up. New code should use call_triage + prompt_render.
"""
"""Document classification: always uses primary (Qwen3.5)"""
prompt = CLASSIFY_PROMPT.replace("{document_text}", text)
response = await self._call_chat(self.ai.primary, prompt)
return response
async def summarize(self, text: str, force_premium: bool = False) -> str:
"""[DEPRECATED] For existing call sites. In B-1, summary_triage takes over the tldr."""
if force_premium:
return await self._call_chat(self.ai.premium, f"다음 문서를 500자 이내로 요약해주세요:\n\n{text}")
return await self._call_chat(self.ai.primary, f"다음 문서를 500자 이내로 요약해주세요:\n\n{text}")
"""Document summary: Qwen3.5 by default; Claude only for long text or on explicit request"""
model = self.ai.primary
if force_premium or len(text) > 15000:
model = self.ai.premium
return await self._call_chat(model, f"다음 문서를 500자 이내로 요약해주세요:\n\n{text}")
async def embed(self, text: str) -> list[float]:
"""Vector embedding: GPU server only"""
@@ -211,24 +80,10 @@ class AIClient:
response.raise_for_status()
return response.json()["embedding"]
async def rerank(self, query: str, texts: list[str]) -> list[dict]:
"""Calls TEI bge-reranker-v2-m3 (Phase 1.3).
TEI POST /rerank API:
request: {"query": str, "texts": [str, ...]}
response: [{"index": int, "score": float}, ...] (sorted)
The timeout is self.ai.rerank.timeout (config.yaml).
The caller (rerank_service) wraps this in asyncio.Semaphore + try/except.
"""
timeout = float(self.ai.rerank.timeout) if self.ai.rerank.timeout else 5.0
response = await self._http.post(
self.ai.rerank.endpoint,
json={"query": query, "texts": texts},
timeout=timeout,
)
response.raise_for_status()
return response.json()
async def ocr(self, image_bytes: bytes) -> str:
"""Image OCR: GPU server only"""
# TODO: implement the Qwen2.5-VL-7B vision model call
raise NotImplementedError("OCR는 Phase 1에서 구현")
async def _call_chat(self, model_config, prompt: str) -> str:
"""OpenAI-compatible API call + automatic fallback"""
@@ -239,12 +94,8 @@ class AIClient:
return await self._request(self.ai.fallback, prompt)
raise
async def _request(self, model_config, prompt: str, system: str | None = None) -> str:
"""Single-model API call (OpenAI-compatible + Anthropic Messages API).
system: when set, injected as the system prompt (OpenAI = system-role message / Anthropic = top-level system field).
None = single user message (existing behavior, backward compatible).
"""
async def _request(self, model_config, prompt: str) -> str:
"""Single-model API call (OpenAI-compatible + Anthropic Messages API)"""
is_anthropic = "anthropic.com" in model_config.endpoint
if is_anthropic:
@@ -254,40 +105,28 @@ class AIClient:
"anthropic-version": "2023-06-01",
"content-type": "application/json",
}
body = {
"model": model_config.model,
"max_tokens": model_config.max_tokens,
"messages": [{"role": "user", "content": prompt}],
}
if system:
body["system"] = system
response = await self._http.post(
model_config.endpoint,
headers=headers,
json=body,
json={
"model": model_config.model,
"max_tokens": model_config.max_tokens,
"messages": [{"role": "user", "content": prompt}],
},
timeout=model_config.timeout,
)
response.raise_for_status()
data = response.json()
return data["content"][0]["text"]
else:
messages = []
if system:
messages.append({"role": "system", "content": system})
messages.append({"role": "user", "content": prompt})
payload = {
"model": model_config.model,
"messages": messages,
"max_tokens": model_config.max_tokens,
"chat_template_kwargs": {"enable_thinking": False},
}
if model_config.temperature is not None:
payload["temperature"] = model_config.temperature
if model_config.top_p is not None:
payload["top_p"] = model_config.top_p
response = await self._http.post(
model_config.endpoint,
json=payload,
json={
"model": model_config.model,
"messages": [{"role": "user", "content": prompt}],
"max_tokens": model_config.max_tokens,
"chat_template_kwargs": {"enable_thinking": False},
},
timeout=model_config.timeout,
)
response.raise_for_status()
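The greedy-slice fallback that this diff removes from `parse_json_response` can be shown in isolation. The sketch below is a standalone approximation of that one step, not the project's function: `extract_json_object` is a hypothetical name, and it skips the think-tag stripping, code-block extraction, and escape repair that surround the step in the original.

```python
from __future__ import annotations

import json


def extract_json_object(text: str) -> dict | None:
    """Slice from the first '{' to the last '}' and parse the slice as a JSON object.

    Hypothetical standalone version of the greedy-slice step. Returns None when
    no braces are found, the slice is not valid JSON, or it is not an object.
    """
    first = text.find("{")
    last = text.rfind("}")
    if first < 0 or last <= first:
        return None
    try:
        obj = json.loads(text[first : last + 1])
    except json.JSONDecodeError:
        return None
    return obj if isinstance(obj, dict) else None
```

The trade-off is visible with two separate objects in one response: the slice spans both, fails to parse, and yields `None`, which is why the original tries the balanced-regex match before this step.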
-97
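@@ -1,97 +0,0 @@
"""EscalationEnvelope: the 4B → 26B handoff contract.
A structured message that 4B passes to 26B when it judges that it cannot handle the request itself.
26B takes its bearings from distilled_context and re-fetches only the source text it needs via original_pointers.
PR-A defines only the dataclass contract. Actual creation/consumption is handled by escalation_service in PR-B.
"""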
from __future__ import annotations
import json
from dataclasses import asdict, dataclass, field
from typing import Any
ValidFromStage = {
"triage",
"classify",
"summarize_short",
"advice_trigger",
"night_sweep",
"ask_pre",
"unknown", # for compatibility
}
@dataclass(frozen=True)
class EscalationEnvelope:
from_stage: str
escalation_reasons: tuple[str, ...]
risk_flags: tuple[str, ...]
distilled_context: str
original_pointers: dict[str, Any] = field(default_factory=dict)
synthesis_directives: tuple[str, ...] = ()
user_intent: str | None = None
draft_hint: str | None = None
def __post_init__(self) -> None:
if self.from_stage not in ValidFromStage:
raise ValueError(
f"from_stage '{self.from_stage}' not in {ValidFromStage}"
)
if not isinstance(self.escalation_reasons, tuple):
raise TypeError("escalation_reasons must be tuple (for hashability)")
if not isinstance(self.risk_flags, tuple):
raise TypeError("risk_flags must be tuple (for hashability)")
if not isinstance(self.synthesis_directives, tuple):
raise TypeError("synthesis_directives must be tuple (for hashability)")
# -- Text for injection into the 26B system prompt ------------------------
def to_system_injection(self) -> str:
lines = [
"=== ESCALATION ENVELOPE (from 4B) ===",
f"from_stage: {self.from_stage}",
f"reasons: {', '.join(self.escalation_reasons) or '(none)'}",
f"risk_flags: {', '.join(self.risk_flags) or '(none)'}",
]
if self.user_intent:
lines.append(f"user_intent: {self.user_intent}")
if self.draft_hint:
lines.append(f"draft_hint: {self.draft_hint}")
if self.synthesis_directives:
lines.append("")
lines.append("synthesis_directives (각 risk_flag 별 지시사항, 반드시 준수):")
for d in self.synthesis_directives:
lines.append(f" - {d}")
if self.distilled_context:
lines.append("")
lines.append("distilled_context (4B 가 압축한 요지 — 참고용, 숫자·인용은 원문 재확인 필수):")
lines.append(self.distilled_context)
if self.original_pointers:
lines.append("")
lines.append("original_pointers (필요 시 재조회):")
lines.append(json.dumps(self.original_pointers, ensure_ascii=False, indent=2))
return "\n".join(lines)
# -- JSON round-trip ---------------------------------------------------
def to_json(self) -> str:
return json.dumps(asdict(self), ensure_ascii=False)
@classmethod
def from_json(cls, s: str) -> EscalationEnvelope:
raw = json.loads(s)
return cls(
from_stage=raw["from_stage"],
escalation_reasons=tuple(raw.get("escalation_reasons", ())),
risk_flags=tuple(raw.get("risk_flags", ())),
distilled_context=raw.get("distilled_context", ""),
original_pointers=raw.get("original_pointers", {}) or {},
synthesis_directives=tuple(raw.get("synthesis_directives", ())),
user_intent=raw.get("user_intent"),
draft_hint=raw.get("draft_hint"),
)
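The JSON round trip in this file relies on `from_json` turning JSON arrays back into tuples, because `json.loads` yields lists and `__post_init__` rejects them. The following is a reduced sketch of that behaviour only; `MiniEnvelope` and its sample values are hypothetical and carry just three of the real fields.

```python
from __future__ import annotations

import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class MiniEnvelope:
    """Hypothetical cut-down envelope: only tuple-valued fields, no dict field."""

    from_stage: str
    escalation_reasons: tuple[str, ...] = ()
    risk_flags: tuple[str, ...] = ()

    def to_json(self) -> str:
        # Tuples are serialized as JSON arrays.
        return json.dumps(asdict(self), ensure_ascii=False)

    @classmethod
    def from_json(cls, s: str) -> MiniEnvelope:
        raw = json.loads(s)
        # JSON arrays come back as lists; convert them to tuples again.
        return cls(
            from_stage=raw["from_stage"],
            escalation_reasons=tuple(raw.get("escalation_reasons", ())),
            risk_flags=tuple(raw.get("risk_flags", ())),
        )


original = MiniEnvelope("triage", ("long_context",), ("numeric_claim",))
restored = MiniEnvelope.from_json(original.to_json())
```

Because this reduced class has no dict field, its instances are also hashable, which is not the case for a frozen dataclass that includes a `dict` among its compared fields.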
-72
@@ -1,72 +0,0 @@
"""Audio transcription (STT) lookup API: /api/audio
AudioPlayer renders the transcript line by line and jumps to audio.currentTime on click.
"""
from typing import Annotated
from fastapi import APIRouter, Depends, HTTPException
from pydantic import BaseModel
from sqlalchemy import select
from sqlalchemy.ext.asyncio import AsyncSession
from core.auth import get_current_user
from core.database import get_session
from models.audio_segment import AudioSegment
from models.document import Document
from models.user import User
router = APIRouter()
class AudioSegmentResponse(BaseModel):
start: float
end: float
text: str
model_config = {"from_attributes": True}
class AudioSegmentsResponse(BaseModel):
document_id: int
language: str | None
duration: float | None
segments: list[AudioSegmentResponse]
@router.get("/{doc_id}/segments", response_model=AudioSegmentsResponse)
async def get_audio_segments(
doc_id: int,
user: Annotated[User, Depends(get_current_user)],
session: Annotated[AsyncSession, Depends(get_session)],
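):
"""Return the transcript segments of an audio document.
Documents whose category is not 'audio' get a 404. If no segments exist yet, an empty list is returned.
language / duration are None because the ORM currently has no dedicated columns (to be extended later if needed).
"""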
doc = await session.get(Document, doc_id)
if not doc or doc.deleted_at is not None:
raise HTTPException(status_code=404, detail="Document not found")
if getattr(doc, "category", None) != "audio":
raise HTTPException(status_code=404, detail="Not an audio document")
result = await session.execute(
select(AudioSegment)
.where(AudioSegment.document_id == doc_id)
.order_by(AudioSegment.start_s.asc())
)
rows = result.scalars().all()
segments = [
AudioSegmentResponse(start=r.start_s, end=r.end_s, text=r.text)
for r in rows
]
return AudioSegmentsResponse(
document_id=doc_id,
language=None,
duration=None,
segments=segments,
)
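The endpoint returns segments ordered by start time, which a consumer can exploit when it needs to map a playback position back to a transcript line. This is a sketch of one possible consumer-side lookup, not part of the API itself; `find_active_segment` and the sample data are hypothetical, and only the `start` / `end` / `text` keys come from the response model.

```python
from __future__ import annotations

from bisect import bisect_right


def find_active_segment(segments: list[dict], current_time: float) -> int | None:
    """Return the index of the segment covering current_time, or None in a gap."""
    starts = [seg["start"] for seg in segments]
    # Rightmost segment whose start is <= current_time.
    i = bisect_right(starts, current_time) - 1
    if i < 0:
        return None  # before the first segment
    if current_time >= segments[i]["end"]:
        return None  # between two segments, or past the last one
    return i


# Sample data shaped like the "segments" list of the response (values made up).
segments = [
    {"start": 0.0, "end": 2.5, "text": "first line"},
    {"start": 2.5, "end": 5.0, "text": "second line"},
    {"start": 6.0, "end": 8.0, "text": "third line"},
]
```

The binary search assumes the list is sorted by `start`, which matches the `order_by(AudioSegment.start_s.asc())` in the query.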
-15
@@ -15,12 +15,9 @@ from sqlalchemy.ext.asyncio import AsyncSession
from core.auth import (
REFRESH_TOKEN_EXPIRE_DAYS,
create_access_token,
create_laptop_worker_bot_token,
create_refresh_token,
create_voice_memo_bot_token,
decode_token,
get_current_user,
verify_password_changed_at,
hash_password,
verify_password,
verify_totp,
@@ -120,16 +117,6 @@ async def login(
user.last_login_at = datetime.now(timezone.utc)
await session.commit()
# Voice Memo PoC v1: long-expiry token for the bot account only (env gate). No effect on the regular user flow.
bot_token = create_voice_memo_bot_token(user.username)
if bot_token is not None:
return AccessTokenResponse(access_token=bot_token)
# PR-Worker-Pool-Registry-1B: long-expiry token for laptop-worker-bot only (the voice-memo branch is evaluated first).
laptop_bot_token = create_laptop_worker_bot_token(user.username)
if laptop_bot_token is not None:
return AccessTokenResponse(access_token=laptop_bot_token)
# refresh token → HttpOnly cookie
_set_refresh_cookie(response, create_refresh_token(user.username))
@@ -168,7 +155,6 @@ async def refresh_token(
status_code=status.HTTP_401_UNAUTHORIZED,
detail="User not found",
)
verify_password_changed_at(payload, user)
# new refresh token → cookie
_set_refresh_cookie(response, create_refresh_token(user.username))
@@ -211,6 +197,5 @@ async def change_password(
)
user.password_hash = hash_password(body.new_password)
user.password_changed_at = datetime.now(timezone.utc)
await session.commit()
return {"message": "Password has been changed"}
-323
@@ -1,323 +0,0 @@
"""Morning Briefing API: read-only plus manual regenerate.
Endpoints:
- GET /api/briefing/latest : most recent briefing
- GET /api/briefing?date=YYYY-MM-DD : briefing for a specific date
- POST /api/briefing/regenerate?date=... : synchronous worker trigger (admin), DELETE+INSERT tx
The response is a flat list of topics (the opposite axis from Phase 4: no country grouping).
Each topic carries a country_perspectives JSONB that expresses the cross-country comparison.
"""
from datetime import date as date_type
from datetime import datetime
from typing import Annotated
from fastapi import APIRouter, Depends, HTTPException, Query
from pydantic import BaseModel
from sqlalchemy import select
from sqlalchemy.ext.asyncio import AsyncSession
from sqlalchemy.orm import selectinload
from core.auth import get_current_user, require_admin
from core.database import get_session
from models.briefing import BriefingTopic, MorningBriefing
from models.user import User
router = APIRouter()
# ─── Pydantic response models ───
class CountryPerspective(BaseModel):
country: str
summary: str
article_ids: list[int] = []
class KeyQuote(BaseModel):
country: str = ""
source: str = ""
quote: str
class TopicResponse(BaseModel):
id: int # 2026-05-13: identifier used when calling card actions (read/highlight)
topic_rank: int
topic_label: str
headline: str
country_perspectives: list[CountryPerspective]
divergences: list[str]
convergences: list[str]
key_quotes: list[KeyQuote]
historical_context: str | None = None
cluster_members: list[int] = []
article_count: int
country_count: int
importance_score: float
llm_fallback_used: bool
# 2026-05-13 user actions: per-card toggles in the UI
is_read: bool = False
read_at: datetime | None = None
highlighted: bool = False
highlighted_at: datetime | None = None
class BriefingResponse(BaseModel):
briefing_date: date_type
window_start: datetime
window_end: datetime
decay_lambda: float
total_articles: int
total_countries: int
total_topics: int
generation_ms: int | None
llm_calls: int
llm_failures: int
status: str
headline_oneliner: str | None = None
topics: list[TopicResponse]
class RegenerateResponse(BaseModel):
status: str
briefing_id: int | None
briefing_date: date_type
total_topics: int
total_articles: int
llm_calls: int
llm_failures: int
generation_ms: int
regenerated: bool
# ─── helpers ───
def _build_response(b: MorningBriefing) -> BriefingResponse:
topics = []
for t in sorted(b.topics, key=lambda x: x.topic_rank):
topics.append(
TopicResponse(
id=t.id,
topic_rank=t.topic_rank,
topic_label=t.topic_label,
headline=t.headline,
country_perspectives=[
CountryPerspective(**cp) for cp in (t.country_perspectives or [])
],
divergences=list(t.divergences or []),
convergences=list(t.convergences or []),
key_quotes=[KeyQuote(**q) for q in (t.key_quotes or [])],
historical_context=t.historical_context,
cluster_members=list(t.cluster_members or []),
article_count=t.article_count,
country_count=t.country_count,
importance_score=t.importance_score,
llm_fallback_used=t.llm_fallback_used,
is_read=t.is_read,
read_at=t.read_at,
highlighted=t.highlighted,
highlighted_at=t.highlighted_at,
)
)
return BriefingResponse(
briefing_date=b.briefing_date,
window_start=b.window_start,
window_end=b.window_end,
decay_lambda=b.decay_lambda,
total_articles=b.total_articles,
total_countries=b.total_countries,
total_topics=b.total_topics,
generation_ms=b.generation_ms,
llm_calls=b.llm_calls,
llm_failures=b.llm_failures,
status=b.status,
headline_oneliner=b.headline_oneliner,
topics=topics,
)
async def _load_briefing(
session: AsyncSession,
target_date: date_type | None,
) -> MorningBriefing | None:
query = select(MorningBriefing).options(selectinload(MorningBriefing.topics))
if target_date is not None:
query = query.where(MorningBriefing.briefing_date == target_date)
else:
query = query.order_by(MorningBriefing.briefing_date.desc())
query = query.limit(1)
result = await session.execute(query)
return result.scalar_one_or_none()
# ─── Routes ───
@router.get("/latest", response_model=BriefingResponse)
async def get_latest(
user: Annotated[User, Depends(get_current_user)],
session: Annotated[AsyncSession, Depends(get_session)],
):
"""Most recent morning briefing."""
b = await _load_briefing(session, target_date=None)
if b is None:
raise HTTPException(status_code=404, detail="No briefing has been generated yet")
return _build_response(b)
@router.get("", response_model=BriefingResponse)
async def get_briefing(
user: Annotated[User, Depends(get_current_user)],
session: Annotated[AsyncSession, Depends(get_session)],
date: date_type | None = Query(default=None, description="YYYY-MM-DD (KST briefing_date)"),
):
"""Briefing for a specific date (latest when date is omitted)."""
b = await _load_briefing(session, target_date=date)
if b is None:
raise HTTPException(
status_code=404,
detail=f"No briefing found (date={date})" if date else "No briefing has been generated yet",
)
return _build_response(b)
@router.post("/regenerate", response_model=RegenerateResponse)
async def regenerate(
user: Annotated[User, Depends(require_admin)],
date: date_type | None = Query(default=None, description="YYYY-MM-DD briefing_date (KST)"),
):
"""Manual trigger (admin). Runs synchronously as a delete+insert transaction.
Defaults to today (KST) when date is omitted. If a row already exists for that day, it is deleted and recreated inside the transaction.
Response status is 'success' | 'partial' | 'failed' | 'empty'.
"""
from workers.briefing_worker import run
result = await run(target_date=date)
if result is None:
raise HTTPException(status_code=500, detail="Briefing worker run failed (check the logs)")
return RegenerateResponse(
status=result["status"],
briefing_id=result.get("briefing_id"),
briefing_date=date or datetime.now().date(),
total_topics=result["total_topics"],
total_articles=result["total_articles"],
llm_calls=result["llm_calls"],
llm_failures=result["llm_failures"],
generation_ms=result["generation_ms"],
regenerated=result.get("regenerated", True),
)
# ─── Added 2026-05-13: date selection + card actions ───
class BriefingDateSummary(BaseModel):
briefing_date: date_type
total_topics: int
total_articles: int
status: str
read_count: int # number of topics the user has marked as read
highlighted_count: int
class TopicActionRequest(BaseModel):
value: bool
class TopicActionResponse(BaseModel):
id: int
is_read: bool
read_at: datetime | None
highlighted: bool
highlighted_at: datetime | None
@router.get("/dates", response_model=list[BriefingDateSummary])
async def list_dates(
user: Annotated[User, Depends(get_current_user)],
session: Annotated[AsyncSession, Depends(get_session)],
limit: int = Query(default=60, ge=1, le=365),
):
"""List of available briefing dates (newest first). Data source for the UI date picker."""
from sqlalchemy import func, case
stmt = (
select(
MorningBriefing.briefing_date,
MorningBriefing.total_topics,
MorningBriefing.total_articles,
MorningBriefing.status,
func.count(case((BriefingTopic.is_read.is_(True), 1))).label("read_count"),
func.count(case((BriefingTopic.highlighted.is_(True), 1))).label("highlighted_count"),
)
.outerjoin(BriefingTopic, BriefingTopic.briefing_id == MorningBriefing.id)
.group_by(MorningBriefing.id)
.order_by(MorningBriefing.briefing_date.desc())
.limit(limit)
)
rows = (await session.execute(stmt)).all()
return [
BriefingDateSummary(
briefing_date=r.briefing_date,
total_topics=r.total_topics,
total_articles=r.total_articles,
status=r.status,
read_count=r.read_count or 0,
highlighted_count=r.highlighted_count or 0,
)
for r in rows
]
@router.patch("/topics/{topic_id}/read", response_model=TopicActionResponse)
async def set_topic_read(
topic_id: int,
body: TopicActionRequest,
user: Annotated[User, Depends(get_current_user)],
session: Annotated[AsyncSession, Depends(get_session)],
):
"""Toggle the read state of a topic card. value=true marks it read and sets read_at=now; false clears it and sets read_at=NULL."""
topic = await session.get(BriefingTopic, topic_id)
if topic is None:
raise HTTPException(status_code=404, detail=f"Topic not found id={topic_id}")
topic.is_read = body.value
topic.read_at = datetime.now() if body.value else None
await session.commit()
await session.refresh(topic)
return TopicActionResponse(
id=topic.id,
is_read=topic.is_read,
read_at=topic.read_at,
highlighted=topic.highlighted,
highlighted_at=topic.highlighted_at,
)
@router.patch("/topics/{topic_id}/highlight", response_model=TopicActionResponse)
async def set_topic_highlight(
topic_id: int,
body: TopicActionRequest,
user: Annotated[User, Depends(get_current_user)],
session: Annotated[AsyncSession, Depends(get_session)],
):
"""Toggle the highlight of a topic card. value=true sets highlighted and highlighted_at=now; false clears it."""
topic = await session.get(BriefingTopic, topic_id)
if topic is None:
raise HTTPException(status_code=404, detail=f"Topic not found id={topic_id}")
topic.highlighted = body.value
topic.highlighted_at = datetime.now() if body.value else None
await session.commit()
await session.refresh(topic)
return TopicActionResponse(
id=topic.id,
is_read=topic.is_read,
read_at=topic.read_at,
highlighted=topic.highlighted,
highlighted_at=topic.highlighted_at,
)
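The regenerate handler fills `briefing_date` with `datetime.now().date()` when no date is passed, while its docstring describes the default as today in KST. On a host whose local timezone is not KST the two can disagree around midnight. A minimal sketch of resolving the default explicitly in KST follows; `resolve_briefing_date` and the fixed `KST` offset are assumptions for illustration, not names from the original module.

```python
from __future__ import annotations

from datetime import date, datetime, timedelta, timezone

# Assumption: KST modelled as a fixed UTC+9 offset (Korea observes no DST).
KST = timezone(timedelta(hours=9))


def resolve_briefing_date(requested: date | None, now: datetime | None = None) -> date:
    """Return the requested date, or today's date in KST when none is given.

    `now` must be timezone-aware; it exists so the function can be tested
    with a fixed clock.
    """
    if requested is not None:
        return requested
    current = now if now is not None else datetime.now(timezone.utc)
    return current.astimezone(KST).date()


# 16:00 UTC on 7 April is already 01:00 on 8 April in KST.
late_utc = datetime(2026, 4, 7, 16, 0, tzinfo=timezone.utc)
kst_day = resolve_briefing_date(None, now=late_utc)
```

Passing the clock in as a parameter keeps the date logic independent of the host timezone and easy to unit-test.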
-34
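@@ -1,34 +0,0 @@
"""Public config endpoint
Scope of this endpoint:
- Provides only the minimal public settings that contain no sensitive data and that the frontend needs to function.
- It is not a general-purpose channel for exposing arbitrary server settings to the frontend.
- Any new field must pass two checks: it is not sensitive, and the frontend requires it.
"""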
from fastapi import APIRouter
from pydantic import BaseModel
from core.config import settings
router = APIRouter()
class UploadPublicConfig(BaseModel):
max_bytes: int
class PublicConfigResponse(BaseModel):
upload: UploadPublicConfig
@router.get("/public", response_model=PublicConfigResponse)
async def get_public_config() -> PublicConfigResponse:
"""Public settings the frontend fetches on initial load.
Currently provided: upload.max_bytes (for the upload pre-check UX).
Server-internal policies such as slack_ratio and stream_chunk_bytes are not exposed.
"""
return PublicConfigResponse(
upload=UploadPublicConfig(max_bytes=settings.upload.max_bytes),
)
+5 -167
@@ -35,42 +35,6 @@ class PipelineStatus(BaseModel):
count: int
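class QueueLag(BaseModel):
"""Processing lag per pipeline stage, for the operations card.
pipeline_status is a 24h cumulative statistic, so it is a weak signal for the current backlog.
queue_lag reports the current pending/processing/failed counts plus the oldest pending age,
showing whether anything is stuck right now.
"""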
stage: str
pending: int
processing: int
failed: int
oldest_pending_age_sec: int | None # seconds elapsed since created_at of the oldest pending item
class TierHealthStack(BaseModel):
"""PR-B B-3: source for the tier observability cards (24h window).
Dashboard cards (Day 4 tuning: thresholds readjusted on 2026-04-27):
- "Escalation ratio": escalated_total / triage_total
· <80% red (policy match failures increasing; real tuning needed)
· 80~99% normal (intended by the safety/health policy)
- "Triage JSON health": triage_json_invalid / triage_total (>5% red)
- "Backlog Suppression": suppressed_total / triage_total (>10% orange)
- "Deep summary stability": deep_err_total / deep_total (>5% red)
"""
triage_total: int = 0
escalated_total: int = 0
escalation_by_reason: dict[str, int] = {} # long_context / low_confidence / deep_requested / self_declare
escalation_by_domain: dict[str, int] = {} # safety_reference / news_item / ...
triage_json_invalid: int = 0 # error_code='triage_json_invalid'
suppressed_total: int = 0 # suppressed_reason IS NOT NULL
# Added in Day 4 tuning: stability of deep_summary calls
deep_total: int = 0 # all rows with mode='summary_deep'
deep_err_total: int = 0 # error_code IS NOT NULL (call_failed / parse:*)
class DashboardResponse(BaseModel):
today_added: int
today_by_domain: list[DomainCount]
@@ -80,16 +44,6 @@ class DashboardResponse(BaseModel):
pipeline_status: list[PipelineStatus]
failed_count: int
total_documents: int
# Split counts: documents (non-note/non-news) / memos (memo+note) / news (news)
documents_count: int = 0
memos_count: int = 0
news_count: int = 0
# §4: category-based cards + pending approvals + queue lag
category_counts: dict[str, int] = {}
library_pending_suggestions: int = 0
queue_lag: list[QueueLag] = []
# PR-B B-3: tier observability
tier_health: TierHealthStack = TierHealthStack()
@router.get("/", response_model=DashboardResponse)
@@ -128,11 +82,11 @@ async def get_dashboard(
)
law_alerts = law_result.scalar() or 0
# 7 most recent documents
# 5 most recent documents
recent_result = await session.execute(
select(Document)
.order_by(Document.created_at.desc())
.limit(7)
.limit(5)
)
recent_docs = recent_result.scalars().all()
@@ -154,118 +108,9 @@ async def get_dashboard(
)
failed_count = failed_result.scalar() or 0
# Total document count plus per-category split (single query)
# Documents: non-note, non-news / memos: memo+note / news: based on the news ingest channel
count_result = await session.execute(
text("""
SELECT
COUNT(*) AS total,
COUNT(*) FILTER (WHERE source_channel NOT IN ('news', 'law_monitor') AND file_type != 'note') AS documents,
COUNT(*) FILTER (WHERE source_channel = 'memo' AND file_type = 'note') AS memos,
COUNT(*) FILTER (WHERE source_channel = 'news') AS news
FROM documents WHERE deleted_at IS NULL
""")
)
counts = count_result.one()
total_documents = counts[0]
documents_count = counts[1]
memos_count = counts[2]
news_count = counts[3]
# §4: count per category (§1 documents.category enum)
cat_result = await session.execute(
text("""
SELECT category, COUNT(*)
FROM documents
WHERE deleted_at IS NULL AND category IS NOT NULL
GROUP BY category
""")
)
category_counts = {row[0]: row[1] for row in cat_result.all()}
# §4: awaiting approval (library suggestions)
pending_result = await session.execute(
text("""
SELECT COUNT(*)
FROM documents
WHERE deleted_at IS NULL
AND ai_suggestion IS NOT NULL
AND ai_suggestion->>'proposed_category' = 'library'
""")
)
library_pending_suggestions = pending_result.scalar() or 0
# §4: queue lag (backlog signal per stage at the current moment)
# Besides extract/classify/embed, stt/thumbnail (§3) are included automatically.
lag_result = await session.execute(
text("""
SELECT
stage,
COUNT(*) FILTER (WHERE status='pending') AS pending,
COUNT(*) FILTER (WHERE status='processing') AS processing,
COUNT(*) FILTER (WHERE status='failed') AS failed,
EXTRACT(EPOCH FROM (NOW() - MIN(created_at) FILTER (WHERE status='pending')))::int
AS oldest_pending_age_sec
FROM processing_queue
GROUP BY stage
ORDER BY stage
""")
)
queue_lag = [
QueueLag(
stage=row[0],
pending=row[1] or 0,
processing=row[2] or 0,
failed=row[3] or 0,
oldest_pending_age_sec=row[4],
)
for row in lag_result.all()
]
# ─── PR-B B-3: tier observability (24h) + Day 4 deep_err added ───
tier_rows = (await session.execute(text("""
SELECT
COUNT(*) FILTER (WHERE mode = 'summary_triage') AS triage_total,
COUNT(*) FILTER (WHERE mode = 'summary_triage' AND escalated_to_26b = true) AS escalated_total,
COUNT(*) FILTER (WHERE mode = 'summary_triage' AND error_code = 'triage_json_invalid') AS json_invalid,
COUNT(*) FILTER (WHERE mode = 'summary_triage' AND suppressed_reason IS NOT NULL) AS suppressed_total,
COUNT(*) FILTER (WHERE mode = 'summary_deep') AS deep_total,
COUNT(*) FILTER (WHERE mode = 'summary_deep' AND error_code IS NOT NULL) AS deep_err_total
FROM analyze_events
WHERE created_at > NOW() - INTERVAL '24 hours'
"""))).one()
reason_rows = await session.execute(text("""
SELECT unnest(escalation_reasons) AS reason, COUNT(*) AS n
FROM analyze_events
WHERE created_at > NOW() - INTERVAL '24 hours'
AND mode = 'summary_triage'
AND escalated_to_26b = true
GROUP BY 1 ORDER BY 2 DESC
"""))
escalation_by_reason = {r[0]: r[1] for r in reason_rows if r[0]}
domain_rows = await session.execute(text("""
SELECT subject_domain, COUNT(*) AS n
FROM analyze_events
WHERE created_at > NOW() - INTERVAL '24 hours'
AND mode = 'summary_triage'
AND escalated_to_26b = true
AND subject_domain IS NOT NULL
GROUP BY 1 ORDER BY 2 DESC
"""))
escalation_by_domain = {r[0]: r[1] for r in domain_rows}
tier_health = TierHealthStack(
triage_total=int(tier_rows.triage_total or 0),
escalated_total=int(tier_rows.escalated_total or 0),
triage_json_invalid=int(tier_rows.json_invalid or 0),
suppressed_total=int(tier_rows.suppressed_total or 0),
deep_total=int(tier_rows.deep_total or 0),
deep_err_total=int(tier_rows.deep_err_total or 0),
escalation_by_reason=escalation_by_reason,
escalation_by_domain=escalation_by_domain,
)
# Total document count
total_result = await session.execute(select(func.count(Document.id)))
total_documents = total_result.scalar() or 0
return DashboardResponse(
today_added=today_added,
@@ -290,11 +135,4 @@ async def get_dashboard(
],
failed_count=failed_count,
total_documents=total_documents,
documents_count=documents_count,
memos_count=memos_count,
news_count=news_count,
category_counts=category_counts,
library_pending_suggestions=library_pending_suggestions,
queue_lag=queue_lag,
tier_health=tier_health,
)
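The `TierHealthStack` docstring in this diff lists the thresholds the dashboard cards apply to the 24h counters. As a rough sketch of how those ratios and colour levels could be derived from the counters, assuming hypothetical helper names (`ratio`, `level_when_above`, `tier_card_levels`) and level labels that do not appear in the original code:

```python
from __future__ import annotations


def ratio(numerator: int, denominator: int) -> float | None:
    """Return numerator / denominator, or None when there is nothing to judge."""
    if denominator <= 0:
        return None
    return numerator / denominator


def level_when_above(value: float | None, limit: float, alarm: str) -> str:
    """Map a ratio to 'no_data', 'ok', or the given alarm label."""
    if value is None:
        return "no_data"
    return alarm if value > limit else "ok"


def tier_card_levels(stats: dict[str, int]) -> dict[str, str]:
    triage = stats.get("triage_total", 0)
    escalation = ratio(stats.get("escalated_total", 0), triage)
    if escalation is None:
        escalation_level = "no_data"
    else:
        # Assumption: the docstring only names <80% (red) and 80~99% (normal);
        # this sketch treats everything at or above 80% as normal.
        escalation_level = "red" if escalation < 0.80 else "ok"
    return {
        "escalation_ratio": escalation_level,
        "triage_json_health": level_when_above(
            ratio(stats.get("triage_json_invalid", 0), triage), 0.05, "red"
        ),
        "backlog_suppression": level_when_above(
            ratio(stats.get("suppressed_total", 0), triage), 0.10, "orange"
        ),
        "deep_summary_stability": level_when_above(
            ratio(stats.get("deep_err_total", 0), stats.get("deep_total", 0)), 0.05, "red"
        ),
    }


# Made-up counters for illustration.
levels = tier_card_levels(
    {
        "triage_total": 100,
        "escalated_total": 70,
        "triage_json_invalid": 6,
        "suppressed_total": 10,
        "deep_total": 0,
        "deep_err_total": 0,
    }
)
```

Returning `None` for an empty denominator avoids a division by zero on a quiet day and lets the card show "no data" instead of a misleading 0%.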
-250
@@ -1,250 +0,0 @@
"""Phase 4 Global Digest API: read-only plus debug regenerate.
Endpoints:
- GET /api/digest/latest : most recent digest
- GET /api/digest/dates : list of generated digest dates (for the date picker)
- GET /api/digest?date=YYYY-MM-DD : digest for a specific date
- GET /api/digest?country=KR : a specific country only
- POST /api/digest/regenerate : trigger the digest worker in the background (admin required)
The response is a two-level country → topic structure. Countries with no topics are omitted from the response automatically.
Each topic returns articles([{id, title}]) alongside article_ids(doc_id). Titles are filled by a
batch lookup on documents (one query per digest); ids with no match (hard-deleted, etc.) keep title=null
(the frontend renders "(no title)"; empty links are not allowed). Used for article → /documents/{id} routing.
"""
import asyncio
from datetime import date as date_type
from datetime import datetime
from typing import Annotated
from fastapi import APIRouter, Depends, HTTPException, Query
from pydantic import BaseModel
from sqlalchemy import select
from sqlalchemy.ext.asyncio import AsyncSession
from sqlalchemy.orm import selectinload
from core.auth import get_current_user, require_admin
from core.database import get_session
from models.digest import DigestTopic, GlobalDigest
from models.document import Document
from models.user import User
router = APIRouter()
# ─── Pydantic response models (no schemas/ directory in use → defined inline) ───
class ArticleRef(BaseModel):
id: int
title: str | None = None
class TopicResponse(BaseModel):
topic_rank: int
topic_label: str
summary: str
article_ids: list[int]
articles: list[ArticleRef]
article_count: int
importance_score: float
raw_weight_sum: float
llm_fallback_used: bool
class CountryGroup(BaseModel):
country: str
topics: list[TopicResponse]
class DigestResponse(BaseModel):
digest_date: date_type
window_start: datetime
window_end: datetime
decay_lambda: float
total_articles: int
total_countries: int
total_topics: int
generation_ms: int | None
llm_calls: int
llm_failures: int
status: str
countries: list[CountryGroup]
class DigestDateSummary(BaseModel):
"""Lightweight summary for the date picker (same shape as the briefing /briefing/dates)."""
digest_date: date_type
total_topics: int
total_countries: int
total_articles: int
status: str
# ─── helpers ───
def _collect_article_ids(digest: GlobalDigest) -> set[int]:
"""Deduplicated set of every topic's article_ids in the digest (for the batch title lookup).
The same article can appear in several topics and produce duplicate ids, so they are collapsed into a set once.
"""
ids: set[int] = set()
for t in digest.topics:
for aid in t.article_ids or []:
try:
ids.add(int(aid))
except (TypeError, ValueError):
continue
return ids
async def _fetch_titles(session: AsyncSession, ids: set[int]) -> dict[int, str | None]:
"""Batch lookup of doc_id → title. Ids with no match are absent from the map (the caller treats them as None)."""
if not ids:
return {}
result = await session.execute(
select(Document.id, Document.title).where(Document.id.in_(ids))
)
return {row.id: row.title for row in result.all()}
def _build_response(
digest: GlobalDigest,
title_map: dict[int, str | None],
country_filter: str | None = None,
) -> DigestResponse:
"""ORM object → DigestResponse. If country_filter is given, only that country is included.
A title_map miss (deleted or archived document) yields title=None; the frontend renders it as "(no title)".
"""
topics_by_country: dict[str, list[TopicResponse]] = {}
for t in sorted(digest.topics, key=lambda x: (x.country, x.topic_rank)):
if country_filter and t.country != country_filter:
continue
ids = [int(a) for a in (t.article_ids or [])]
topics_by_country.setdefault(t.country, []).append(
TopicResponse(
topic_rank=t.topic_rank,
topic_label=t.topic_label,
summary=t.summary,
article_ids=ids,
articles=[ArticleRef(id=aid, title=title_map.get(aid)) for aid in ids],
article_count=t.article_count,
importance_score=t.importance_score,
raw_weight_sum=t.raw_weight_sum,
llm_fallback_used=t.llm_fallback_used,
)
)
countries = [
CountryGroup(country=c, topics=topics_by_country[c])
for c in sorted(topics_by_country.keys())
]
return DigestResponse(
digest_date=digest.digest_date,
window_start=digest.window_start,
window_end=digest.window_end,
decay_lambda=digest.decay_lambda,
total_articles=digest.total_articles,
total_countries=digest.total_countries,
total_topics=digest.total_topics,
generation_ms=digest.generation_ms,
llm_calls=digest.llm_calls,
llm_failures=digest.llm_failures,
status=digest.status,
countries=countries,
)
async def _load_digest(
session: AsyncSession,
target_date: date_type | None,
) -> GlobalDigest | None:
"""The digest for the given date, or the single latest digest when no date is given."""
query = select(GlobalDigest).options(selectinload(GlobalDigest.topics))
if target_date is not None:
query = query.where(GlobalDigest.digest_date == target_date)
else:
query = query.order_by(GlobalDigest.digest_date.desc())
query = query.limit(1)
result = await session.execute(query)
return result.scalar_one_or_none()
async def _respond(session: AsyncSession, digest: GlobalDigest, country_filter: str | None = None) -> DigestResponse:
"""One digest → batch-enrich article titles, then build the response."""
title_map = await _fetch_titles(session, _collect_article_ids(digest))
return _build_response(digest, title_map, country_filter=country_filter)
# ─── Routes ───
@router.get("/latest", response_model=DigestResponse)
async def get_latest(
user: Annotated[User, Depends(get_current_user)],
session: Annotated[AsyncSession, Depends(get_session)],
):
"""Most recently generated global digest."""
digest = await _load_digest(session, target_date=None)
if digest is None:
raise HTTPException(status_code=404, detail="No digest has been generated yet")
return await _respond(session, digest)
@router.get("/dates", response_model=list[DigestDateSummary])
async def list_dates(
user: Annotated[User, Depends(get_current_user)],
session: Annotated[AsyncSession, Depends(get_session)],
limit: int = Query(default=60, ge=1, le=365, description="N most recent entries"),
):
"""List of generated digest dates (for the date picker, newest first)."""
query = (
select(GlobalDigest)
.order_by(GlobalDigest.digest_date.desc())
.limit(limit)
)
rows = (await session.execute(query)).scalars().all()
return [
DigestDateSummary(
digest_date=g.digest_date,
total_topics=g.total_topics,
total_countries=g.total_countries,
total_articles=g.total_articles,
status=g.status,
)
for g in rows
]
@router.get("", response_model=DigestResponse)
async def get_digest(
user: Annotated[User, Depends(get_current_user)],
session: Annotated[AsyncSession, Depends(get_session)],
date: date_type | None = Query(default=None, description="YYYY-MM-DD (KST)"),
country: str | None = Query(default=None, description="Country code (e.g. KR)"),
):
"""Digest filtered by a specific date or country. Latest when date is omitted."""
digest = await _load_digest(session, target_date=date)
if digest is None:
raise HTTPException(
status_code=404,
detail=f"No digest found (date={date})" if date else "No digest has been generated yet",
)
country_filter = country.upper() if country else None
return await _respond(session, digest, country_filter=country_filter)
@router.post("/regenerate")
async def regenerate(
user: Annotated[User, Depends(require_admin)],
):
"""Manual trigger: runs the worker as a background task (admin required)."""
from workers.digest_worker import run
asyncio.create_task(run())
return {"status": "started", "message": "global_digest worker started in the background"}
-151
@@ -1,151 +0,0 @@
"""Per-document handwritten note API.
Flow:
GET /api/documents/{id}/note → fetch a single note (strokes_json=None if absent)
PUT /api/documents/{id}/note → upsert (strokes_json + canvas size)
DELETE /api/documents/{id}/note → delete the note
ownership:
- documents has no user_id (single-user). Separation relies on document_notes.user_id alone.
- GET/PUT/DELETE all use WHERE user_id=current_user.id AND document_id=:doc_id.
"""
import logging
from datetime import datetime
from typing import Annotated, Any
from fastapi import APIRouter, Depends, HTTPException
from pydantic import BaseModel
from sqlalchemy import select
from sqlalchemy.dialects.postgresql import insert as pg_insert
from sqlalchemy.ext.asyncio import AsyncSession
from core.auth import get_current_user
from core.database import get_session
from models.document import Document
from models.document_note import DocumentNote
from models.user import User
logger = logging.getLogger(__name__)
router = APIRouter()
class NoteResponse(BaseModel):
document_id: int
strokes_json: dict[str, Any] | None
canvas_width: int | None
canvas_height: int | None
schema_version: int
updated_at: datetime | None
created_at: datetime | None
class NoteUpdate(BaseModel):
strokes_json: dict[str, Any] | None = None
canvas_width: int | None = None
canvas_height: int | None = None
async def _verify_document(session: AsyncSession, document_id: int) -> Document:
doc = await session.get(Document, document_id)
if doc is None or getattr(doc, "deleted_at", None) is not None:
raise HTTPException(status_code=404, detail="Document not found")
return doc
def _empty_response(document_id: int) -> NoteResponse:
return NoteResponse(
document_id=document_id,
strokes_json=None,
canvas_width=None,
canvas_height=None,
schema_version=1,
updated_at=None,
created_at=None,
)
def _to_response(note: DocumentNote) -> NoteResponse:
return NoteResponse(
document_id=note.document_id,
strokes_json=note.strokes_json,
canvas_width=note.canvas_width,
canvas_height=note.canvas_height,
schema_version=note.schema_version,
updated_at=note.updated_at,
created_at=note.created_at,
)
@router.get("/{document_id}/note", response_model=NoteResponse)
async def get_note(
document_id: int,
user: Annotated[User, Depends(get_current_user)],
session: Annotated[AsyncSession, Depends(get_session)],
):
await _verify_document(session, document_id)
res = await session.execute(
select(DocumentNote).where(
DocumentNote.user_id == user.id,
DocumentNote.document_id == document_id,
)
)
note = res.scalar_one_or_none()
if note is None:
return _empty_response(document_id)
return _to_response(note)
@router.put("/{document_id}/note", response_model=NoteResponse)
async def upsert_note(
document_id: int,
body: NoteUpdate,
user: Annotated[User, Depends(get_current_user)],
session: Annotated[AsyncSession, Depends(get_session)],
):
"""Upsert: update if the same (user, document) pair exists, insert otherwise. Uses PostgreSQL ON CONFLICT."""
await _verify_document(session, document_id)
values: dict[str, Any] = {
"user_id": user.id,
"document_id": document_id,
"strokes_json": body.strokes_json,
"canvas_width": body.canvas_width,
"canvas_height": body.canvas_height,
}
stmt = (
pg_insert(DocumentNote)
.values(**values)
.on_conflict_do_update(
index_elements=["user_id", "document_id"],
set_={
"strokes_json": body.strokes_json,
"canvas_width": body.canvas_width,
"canvas_height": body.canvas_height,
"updated_at": datetime.now(),
},
)
.returning(DocumentNote)
)
result = await session.execute(stmt)
note = result.scalar_one()
await session.commit()
return _to_response(note)
@router.delete("/{document_id}/note", status_code=204)
async def delete_note(
document_id: int,
user: Annotated[User, Depends(get_current_user)],
session: Annotated[AsyncSession, Depends(get_session)],
):
await _verify_document(session, document_id)
res = await session.execute(
select(DocumentNote).where(
DocumentNote.user_id == user.id,
DocumentNote.document_id == document_id,
)
)
note = res.scalar_one_or_none()
if note is not None:
await session.delete(note)
await session.commit()
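The PUT handler above relies on `INSERT ... ON CONFLICT (user_id, document_id) DO UPDATE`, which presupposes a unique constraint or index on that column pair. To make the semantics concrete without a PostgreSQL server, the sketch below swaps in the standard-library `sqlite3` module, whose upsert syntax has the same shape (SQLite 3.24 or newer); the table definition and `save_note` helper are assumptions for illustration, not the project's schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE document_notes (
        user_id INTEGER NOT NULL,
        document_id INTEGER NOT NULL,
        strokes_json TEXT,
        updated_at TEXT,
        UNIQUE (user_id, document_id)  -- the conflict target must be unique
    )
    """
)

UPSERT_SQL = """
INSERT INTO document_notes (user_id, document_id, strokes_json, updated_at)
VALUES (?, ?, ?, ?)
ON CONFLICT (user_id, document_id) DO UPDATE SET
    strokes_json = excluded.strokes_json,
    updated_at = excluded.updated_at
"""


def save_note(user_id: int, document_id: int, strokes_json: str, now: str) -> None:
    """Insert the note, or overwrite it when (user_id, document_id) already exists."""
    conn.execute(UPSERT_SQL, (user_id, document_id, strokes_json, now))
    conn.commit()


# Saving twice for the same (user, document) leaves exactly one row.
save_note(1, 42, '{"strokes": []}', "2026-01-01T00:00:00")
save_note(1, 42, '{"strokes": [1]}', "2026-01-02T00:00:00")
rows = conn.execute("SELECT strokes_json, updated_at FROM document_notes").fetchall()
```

Doing the insert-or-update in one statement avoids the race that a separate SELECT followed by INSERT or UPDATE would have under concurrent saves.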
-112
@@ -1,112 +0,0 @@
"""Library read-count API, backed by an append-only log.
Behaviour rules (as specified by the user):
- Merely opening the detail page must never add +1 automatically. Called only on an explicit click.
- POST /api/documents/{id}/read → insert one row (read count +1)
- GET /api/documents/{id}/read-stats → {read_count, last_read_at}
- DELETE /api/documents/{id}/read/last → delete only the current user's last row for that document
ownership:
- The documents table has no user_id (single-user). Users are separated by
document_reads.user_id. When moving to multi-user, add documents.user_id and then an ownership check.
"""
import logging
from datetime import datetime
from typing import Annotated
from fastapi import APIRouter, Depends, HTTPException
from pydantic import BaseModel
from sqlalchemy import delete, func, select
from sqlalchemy.ext.asyncio import AsyncSession
from core.auth import get_current_user
from core.database import get_session
from models.document import Document
from models.document_read import DocumentRead
from models.user import User
logger = logging.getLogger(__name__)
router = APIRouter()
class ReadStats(BaseModel):
read_count: int
last_read_at: datetime | None
async def _get_stats(
session: AsyncSession, user_id: int, document_id: int
) -> ReadStats:
row = await session.execute(
select(
func.count(DocumentRead.id),
func.max(DocumentRead.read_at),
).where(
DocumentRead.user_id == user_id,
DocumentRead.document_id == document_id,
)
)
count, last = row.one()
return ReadStats(read_count=int(count or 0), last_read_at=last)
async def _verify_document_visible(
session: AsyncSession, document_id: int
) -> Document:
"""Check that the document exists and is not soft-deleted. Ownership passes under the single-user assumption."""
doc = await session.get(Document, document_id)
if doc is None or getattr(doc, "deleted_at", None) is not None:
raise HTTPException(status_code=404, detail="문서를 찾을 수 없습니다")
return doc
@router.post("/{document_id}/read", response_model=ReadStats, status_code=201)
async def add_read(
document_id: int,
user: Annotated[User, Depends(get_current_user)],
session: Annotated[AsyncSession, Depends(get_session)],
):
"""Read count +1 on an explicit user click. May be called several times a day (each call is a separate read)."""
await _verify_document_visible(session, document_id)
session.add(DocumentRead(user_id=user.id, document_id=document_id))
await session.commit()
return await _get_stats(session, user.id, document_id)
@router.get("/{document_id}/read-stats", response_model=ReadStats)
async def get_read_stats(
document_id: int,
user: Annotated[User, Depends(get_current_user)],
session: Annotated[AsyncSession, Depends(get_session)],
):
"""Read statistics for the current user on this document."""
await _verify_document_visible(session, document_id)
return await _get_stats(session, user.id, document_id)
@router.delete("/{document_id}/read/last", response_model=ReadStats)
async def delete_last_read(
document_id: int,
user: Annotated[User, Depends(get_current_user)],
session: Annotated[AsyncSession, Depends(get_session)],
):
"""Delete only the current user's most recent read row for this document (undo an accidental click)."""
await _verify_document_visible(session, document_id)
# Only the single most recent row for the current user and this document.
last = await session.execute(
select(DocumentRead.id)
.where(
DocumentRead.user_id == user.id,
DocumentRead.document_id == document_id,
)
.order_by(DocumentRead.read_at.desc(), DocumentRead.id.desc())
.limit(1)
)
last_id = last.scalar_one_or_none()
if last_id is not None:
await session.execute(
delete(DocumentRead).where(DocumentRead.id == last_id)
)
await session.commit()
return await _get_stats(session, user.id, document_id)
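The read-count file above follows an append-only pattern: each explicit click appends one row, the statistics are derived by counting rows, and undo removes only the newest row. A minimal in-memory sketch of that pattern follows. It uses no database, and `ReadLog` and its method names are hypothetical stand-ins, not code from the repository.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class ReadLog:
    """Hypothetical in-memory stand-in for the document_reads table."""

    # Each row is (user_id, document_id, read_at); rows are only appended or removed whole.
    rows: list[tuple[int, int, datetime]] = field(default_factory=list)

    def add_read(self, user_id: int, document_id: int, at: datetime) -> None:
        """Append one read row (read count +1)."""
        self.rows.append((user_id, document_id, at))

    def stats(self, user_id: int, document_id: int) -> tuple[int, datetime | None]:
        """Return (read_count, last_read_at) for one user and one document."""
        times = [t for u, d, t in self.rows if (u, d) == (user_id, document_id)]
        return len(times), max(times, default=None)

    def undo_last(self, user_id: int, document_id: int) -> None:
        """Remove only the newest matching row; do nothing if there is none."""
        best = None
        for i, (u, d, t) in enumerate(self.rows):
            # On equal timestamps the later insert wins, mirroring
            # ORDER BY read_at DESC, id DESC LIMIT 1.
            if (u, d) == (user_id, document_id) and (
                best is None or t >= self.rows[best][2]
            ):
                best = i
        if best is not None:
            del self.rows[best]
```

Because the counter is never stored, it cannot drift from the log: a row that is removed simply stops being counted.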
+50 -1309
File diff suppressed because it is too large
-680
@@ -1,680 +0,0 @@
"""events API: personal operations log / schedule / to-dos / retrospectives (PR-1).
PR-1 scope (plan beszel-tingly-sloth.md v6):
- POST /api/events (kind=task/calendar_event/activity_log)
- GET /api/events/{id}
- GET /api/events?kind&status&from&to&project_tag&source
- PATCH /api/events/{id} (allowed fields only; changing a time field records reschedule history)
- POST /api/events/{id}/complete | /cancel | /defer | /reactivate
- GET /api/events/today (timezone policy applied)
- GET /api/events/inbox
- GET /api/events/activity?from&to
Excluded from PR-1: DELETE / log shortcut / upcoming / ingest / iCal / ntfy.
"""
import json
import logging
from datetime import date, datetime, timedelta, timezone
from typing import Annotated, Any
from zoneinfo import ZoneInfo
from fastapi import APIRouter, Body, Depends, HTTPException, Query
from pydantic import BaseModel, Field
from sqlalchemy import and_, or_, select
from sqlalchemy.ext.asyncio import AsyncSession
from core.auth import get_current_user
from core.database import get_session
from models.event import Event
from models.event_history import EventHistory
from models.user import User
logger = logging.getLogger(__name__)
router = APIRouter()
DEFAULT_TIMEZONE = "Asia/Seoul"
# Fields PATCH may change. status/completed_at/cancelled_at/defer_until/source/source_ref/
# raw_metadata/user_id/created_by are set by lifecycle endpoints or by the system.
PATCH_ALLOWED_FIELDS = {
"title",
"description",
"due_at",
"start_at",
"end_at",
"started_at",
"ended_at",
"all_day",
"timezone",
"priority",
"project_tag",
"tags",
"memo_document_id",
}
# Changing any of these time fields automatically records one reschedule history row (defer_until is /defer only).
RESCHEDULE_TIME_FIELDS = {
"due_at",
"start_at",
"end_at",
"started_at",
"ended_at",
"all_day",
"timezone",
}
# ─── Schemas ───
class EventCreate(BaseModel):
title: str
description: str | None = None
kind: str # task | calendar_event | activity_log
status: str | None = None # falls back to a per-kind default when omitted
due_at: datetime | None = None
start_at: datetime | None = None
end_at: datetime | None = None
started_at: datetime | None = None
ended_at: datetime | None = None
all_day: bool = False
timezone: str | None = None
priority: int | None = None
project_tag: str | None = None
tags: list[Any] = Field(default_factory=list)
memo_document_id: int | None = None
source: str = "manual"
source_ref: str | None = None
raw_metadata: dict[str, Any] = Field(default_factory=dict)
class EventPatch(BaseModel):
"""Patchable fields only. Lifecycle fields such as status/completed_at are explicitly rejected."""
title: str | None = None
description: str | None = None
due_at: datetime | None = None
start_at: datetime | None = None
end_at: datetime | None = None
started_at: datetime | None = None
ended_at: datetime | None = None
all_day: bool | None = None
timezone: str | None = None
priority: int | None = None
project_tag: str | None = None
tags: list[Any] | None = None
memo_document_id: int | None = None
model_config = {"extra": "forbid"} # any field outside the allowed set → 422
class DeferRequest(BaseModel):
defer_until: datetime
class EventResponse(BaseModel):
id: int
title: str
description: str | None
kind: str
status: str
due_at: datetime | None
start_at: datetime | None
end_at: datetime | None
started_at: datetime | None
ended_at: datetime | None
all_day: bool
timezone: str | None
defer_until: datetime | None
completed_at: datetime | None
cancelled_at: datetime | None
priority: int | None
project_tag: str | None
tags: list[Any]
source: str
source_ref: str | None
raw_metadata: dict[str, Any]
memo_document_id: int | None
user_id: int
created_by: str
created_at: datetime
updated_at: datetime
class EventListResponse(BaseModel):
items: list[EventResponse]
total: int
class EventHistoryResponse(BaseModel):
id: int
event_id: int
changed_at: datetime
changed_by: str
change_kind: str
before: dict[str, Any] | None
after: dict[str, Any]
class EventHistoryListResponse(BaseModel):
items: list[EventHistoryResponse]
# ─── Helpers ───
def _to_response(ev: Event) -> EventResponse:
return EventResponse.model_validate(ev, from_attributes=True)
def _serialize_for_history(ev: Event) -> dict[str, Any]:
"""Dict snapshot for events_history.before/after (JSON-friendly)."""
payload: dict[str, Any] = {}
for col in (
"id",
"title",
"description",
"kind",
"status",
"due_at",
"start_at",
"end_at",
"started_at",
"ended_at",
"all_day",
"timezone",
"defer_until",
"completed_at",
"cancelled_at",
"priority",
"project_tag",
"tags",
"source",
"source_ref",
"raw_metadata",
"memo_document_id",
"user_id",
"created_by",
):
v = getattr(ev, col, None)
if isinstance(v, datetime):
payload[col] = v.isoformat()
else:
payload[col] = v
return payload
def _actor_for_user(user: User) -> str:
"""A direct user call is recorded as manual. Later, 이드/email_ingest will branch on a service token (PR-3)."""
return "manual"
async def _record_history(
session: AsyncSession,
*,
event: Event,
change_kind: str,
changed_by: str,
before: dict[str, Any] | None,
after: dict[str, Any],
) -> None:
history = EventHistory(
event_id=event.id,
changed_by=changed_by,
change_kind=change_kind,
before=before,
after=after,
)
session.add(history)
async def _load_owned(
session: AsyncSession, event_id: int, user: User
) -> Event:
ev = await session.get(Event, event_id)
if ev is None or ev.user_id != user.id:
raise HTTPException(status_code=404, detail="event not found")
return ev
def _resolve_timezone(tz_name: str | None) -> ZoneInfo:
try:
return ZoneInfo(tz_name or DEFAULT_TIMEZONE)
except Exception:
raise HTTPException(status_code=400, detail=f"invalid timezone: {tz_name}")
def _local_day_bounds(tz_name: str | None) -> tuple[datetime, datetime, datetime]:
"""Return today's [start_utc, end_utc) range plus now_utc."""
tz = _resolve_timezone(tz_name)
now_local = datetime.now(tz)
today_local = now_local.replace(hour=0, minute=0, second=0, microsecond=0)
tomorrow_local = today_local + timedelta(days=1)
return (
today_local.astimezone(timezone.utc),
tomorrow_local.astimezone(timezone.utc),
now_local.astimezone(timezone.utc),
)
def _apply_activity_log_defaults(payload: dict[str, Any]) -> None:
"""Five-second quick-log UX: fill status/time defaults when kind=activity_log."""
if payload.get("kind") != "activity_log":
return
now = datetime.now(timezone.utc)
if not payload.get("status"):
payload["status"] = "done"
if payload.get("ended_at") is None:
payload["ended_at"] = now
if payload.get("started_at") is None:
payload["started_at"] = payload["ended_at"]
if payload.get("status") == "done":
payload.setdefault("completed_at", now)
def _apply_kind_default_status(payload: dict[str, Any]) -> None:
"""Fill in the per-kind default status."""
if payload.get("status"):
return
kind = payload.get("kind")
if kind == "calendar_event":
payload["status"] = "scheduled"
elif kind == "task":
payload["status"] = "inbox"
# ─── Create ───
@router.post("/", response_model=EventResponse, status_code=201)
async def create_event(
body: EventCreate,
user: Annotated[User, Depends(get_current_user)],
session: Annotated[AsyncSession, Depends(get_session)],
):
"""Create an event. For kind=activity_log, status defaults to done and ended_at to now()."""
payload = body.model_dump(exclude_none=False)
_apply_activity_log_defaults(payload)
_apply_kind_default_status(payload)
if payload["kind"] not in ("task", "calendar_event", "activity_log"):
raise HTTPException(status_code=400, detail="invalid kind")
actor = _actor_for_user(user)
ev = Event(
title=payload["title"],
description=payload.get("description"),
kind=payload["kind"],
status=payload.get("status") or "inbox",
due_at=payload.get("due_at"),
start_at=payload.get("start_at"),
end_at=payload.get("end_at"),
started_at=payload.get("started_at"),
ended_at=payload.get("ended_at"),
all_day=payload.get("all_day") or False,
timezone=payload.get("timezone"),
completed_at=payload.get("completed_at"),
priority=payload.get("priority"),
project_tag=payload.get("project_tag"),
tags=payload.get("tags") or [],
source=payload.get("source") or "manual",
source_ref=payload.get("source_ref"),
raw_metadata=payload.get("raw_metadata") or {},
memo_document_id=payload.get("memo_document_id"),
user_id=user.id,
created_by=actor,
)
session.add(ev)
await session.flush()
await _record_history(
session,
event=ev,
change_kind="create",
changed_by=actor,
before=None,
after=_serialize_for_history(ev),
)
await session.commit()
await session.refresh(ev)
return _to_response(ev)
# ─── List / Get ───
@router.get("/", response_model=EventListResponse)
async def list_events(
user: Annotated[User, Depends(get_current_user)],
session: Annotated[AsyncSession, Depends(get_session)],
kind: str | None = Query(None),
status: str | None = Query(None, description="comma-separated list"),
from_: datetime | None = Query(None, alias="from"),
to: datetime | None = Query(None),
project_tag: str | None = Query(None),
source: str | None = Query(None),
page: int = Query(1, ge=1),
page_size: int = Query(50, ge=1, le=200),
):
"""List events, always filtered by current_user.id. For upcoming, use ?from=now&to=now+7d."""
where = [Event.user_id == user.id]
if kind:
where.append(Event.kind == kind)
if status:
statuses = [s.strip() for s in status.split(",") if s.strip()]
if statuses:
where.append(Event.status.in_(statuses))
if project_tag:
where.append(Event.project_tag == project_tag)
if source:
where.append(Event.source == source)
if from_ is not None:
# task: due_at, calendar_event: start_at, activity_log: started_at
where.append(
or_(
Event.due_at >= from_,
Event.start_at >= from_,
Event.started_at >= from_,
)
)
if to is not None:
where.append(
or_(
Event.due_at < to,
Event.start_at < to,
Event.started_at < to,
)
)
base = select(Event).where(and_(*where))
total_q = await session.execute(
select(Event.id).where(and_(*where))
)
total = len(total_q.scalars().all())
rows = await session.execute(
base.order_by(Event.created_at.desc())
.offset((page - 1) * page_size)
.limit(page_size)
)
items = [_to_response(e) for e in rows.scalars().all()]
return EventListResponse(items=items, total=total)
@router.get("/today", response_model=EventListResponse)
async def list_today(
user: Annotated[User, Depends(get_current_user)],
session: Annotated[AsyncSession, Depends(get_session)],
timezone: str | None = Query(None, description="기본 Asia/Seoul"),
):
"""Things to do or scheduled for today, in the given timezone.
Includes: task(due_at today) / calendar_event(start_at today) / activity_log(started_at today)
status: inbox/next/scheduled/in_progress, or deferred (only when defer_until <= now()).
"""
start_utc, end_utc, now_utc = _local_day_bounds(timezone)
today_clause = or_(
and_(Event.kind == "task", Event.due_at >= start_utc, Event.due_at < end_utc),
and_(
Event.kind == "calendar_event",
Event.start_at >= start_utc,
Event.start_at < end_utc,
),
and_(
Event.kind == "activity_log",
Event.started_at >= start_utc,
Event.started_at < end_utc,
),
)
active_clause = or_(
Event.status.in_(("inbox", "next", "scheduled", "in_progress")),
and_(Event.status == "deferred", Event.defer_until <= now_utc),
)
rows = await session.execute(
select(Event)
.where(Event.user_id == user.id, today_clause, active_clause)
.order_by(Event.start_at.asc(), Event.due_at.asc(), Event.started_at.asc())
)
items = [_to_response(e) for e in rows.scalars().all()]
return EventListResponse(items=items, total=len(items))
@router.get("/inbox", response_model=EventListResponse)
async def list_inbox(
user: Annotated[User, Depends(get_current_user)],
session: Annotated[AsyncSession, Depends(get_session)],
):
"""Inbox: items not yet triaged."""
rows = await session.execute(
select(Event)
.where(Event.user_id == user.id, Event.status == "inbox")
.order_by(Event.created_at.desc())
)
items = [_to_response(e) for e in rows.scalars().all()]
return EventListResponse(items=items, total=len(items))
@router.get("/activity", response_model=EventListResponse)
async def list_activity(
user: Annotated[User, Depends(get_current_user)],
session: Annotated[AsyncSession, Depends(get_session)],
from_: datetime | None = Query(None, alias="from"),
to: datetime | None = Query(None),
):
"""Activity timeline: things done (kind=activity_log + status=done). Kept separate from Today."""
where = [
Event.user_id == user.id,
Event.kind == "activity_log",
Event.status == "done",
]
if from_ is not None:
where.append(Event.started_at >= from_)
if to is not None:
where.append(Event.started_at < to)
rows = await session.execute(
select(Event).where(and_(*where)).order_by(Event.started_at.desc())
)
items = [_to_response(e) for e in rows.scalars().all()]
return EventListResponse(items=items, total=len(items))
@router.get("/{event_id}", response_model=EventResponse)
async def get_event(
event_id: int,
user: Annotated[User, Depends(get_current_user)],
session: Annotated[AsyncSession, Depends(get_session)],
):
ev = await _load_owned(session, event_id, user)
return _to_response(ev)
@router.get("/{event_id}/history", response_model=EventHistoryListResponse)
async def get_event_history(
event_id: int,
user: Annotated[User, Depends(get_current_user)],
session: Annotated[AsyncSession, Depends(get_session)],
):
"""Fetch events_history for the detail-page timeline. Only automatically recorded lifecycle ops (v1)."""
await _load_owned(session, event_id, user) # verify ownership
rows = await session.execute(
select(EventHistory)
.where(EventHistory.event_id == event_id)
.order_by(EventHistory.changed_at.desc())
)
items = [
EventHistoryResponse.model_validate(h, from_attributes=True)
for h in rows.scalars().all()
]
return EventHistoryListResponse(items=items)
# ─── PATCH ───
@router.patch("/{event_id}", response_model=EventResponse)
async def patch_event(
event_id: int,
body: EventPatch,
user: Annotated[User, Depends(get_current_user)],
session: Annotated[AsyncSession, Depends(get_session)],
):
"""PATCH: allowed fields only. Changing a time field automatically records reschedule history.
Lifecycle fields such as status/completed_at/cancelled_at/defer_until must go through their dedicated endpoints.
"""
ev = await _load_owned(session, event_id, user)
patch = body.model_dump(exclude_unset=True)
if not patch:
return _to_response(ev)
# Safety check: extra=forbid already blocks these, but verify once more.
for k in patch:
if k not in PATCH_ALLOWED_FIELDS:
raise HTTPException(status_code=400, detail=f"field not patchable: {k}")
time_changed = any(k in RESCHEDULE_TIME_FIELDS for k in patch)
before_snapshot = _serialize_for_history(ev) if time_changed else None
for k, v in patch.items():
setattr(ev, k, v)
await session.flush()
if time_changed:
actor = _actor_for_user(user)
await _record_history(
session,
event=ev,
change_kind="reschedule",
changed_by=actor,
before=before_snapshot,
after=_serialize_for_history(ev),
)
await session.commit()
await session.refresh(ev)
return _to_response(ev)
# ─── Lifecycle ───
async def _transition(
session: AsyncSession,
*,
event: Event,
change_kind: str,
new_status: str,
user: User,
extra_apply: dict[str, Any] | None = None,
) -> Event:
actor = _actor_for_user(user)
before = _serialize_for_history(event)
event.status = new_status
if extra_apply:
for k, v in extra_apply.items():
setattr(event, k, v)
await session.flush()
await _record_history(
session,
event=event,
change_kind=change_kind,
changed_by=actor,
before=before,
after=_serialize_for_history(event),
)
return event
@router.post("/{event_id}/complete", response_model=EventResponse)
async def complete_event(
event_id: int,
user: Annotated[User, Depends(get_current_user)],
session: Annotated[AsyncSession, Depends(get_session)],
):
ev = await _load_owned(session, event_id, user)
now = datetime.now(timezone.utc)
await _transition(
session,
event=ev,
change_kind="complete",
new_status="done",
user=user,
extra_apply={"completed_at": now},
)
await session.commit()
await session.refresh(ev)
return _to_response(ev)
@router.post("/{event_id}/cancel", response_model=EventResponse)
async def cancel_event(
event_id: int,
user: Annotated[User, Depends(get_current_user)],
session: Annotated[AsyncSession, Depends(get_session)],
):
ev = await _load_owned(session, event_id, user)
now = datetime.now(timezone.utc)
await _transition(
session,
event=ev,
change_kind="cancel",
new_status="cancelled",
user=user,
extra_apply={"cancelled_at": now},
)
await session.commit()
await session.refresh(ev)
return _to_response(ev)
@router.post("/{event_id}/defer", response_model=EventResponse)
async def defer_event(
event_id: int,
body: DeferRequest,
user: Annotated[User, Depends(get_current_user)],
session: Annotated[AsyncSession, Depends(get_session)],
):
ev = await _load_owned(session, event_id, user)
await _transition(
session,
event=ev,
change_kind="defer",
new_status="deferred",
user=user,
extra_apply={"defer_until": body.defer_until},
)
await session.commit()
await session.refresh(ev)
return _to_response(ev)
@router.post("/{event_id}/reactivate", response_model=EventResponse)
async def reactivate_event(
event_id: int,
user: Annotated[User, Depends(get_current_user)],
session: Annotated[AsyncSession, Depends(get_session)],
):
"""Undo complete/cancel/defer and return to the kind's default status.
task: inbox, calendar_event: scheduled. activity_log is rejected with 400 because done is its natural state, so reactivate does not apply.
"""
ev = await _load_owned(session, event_id, user)
if ev.kind == "activity_log":
raise HTTPException(
status_code=400, detail="activity_log 는 reactivate 대상 아님"
)
new_status = "scheduled" if ev.kind == "calendar_event" else "inbox"
await _transition(
session,
event=ev,
change_kind="reactivate",
new_status=new_status,
user=user,
extra_apply={"completed_at": None, "cancelled_at": None, "defer_until": None},
)
await session.commit()
await session.refresh(ev)
return _to_response(ev)
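The `/today` endpoint in the events file above depends on turning "today in the user's timezone" into a half-open UTC range. The sketch below isolates that calculation as a pure function so it can be checked with a fixed clock. `local_day_bounds` and `KST` are hypothetical names, and a fixed +09:00 offset stands in for `ZoneInfo("Asia/Seoul")` so the example needs no tz database.

```python
from datetime import datetime, timedelta, timezone, tzinfo


def local_day_bounds(now_utc: datetime, tz: tzinfo) -> tuple[datetime, datetime]:
    """Return the UTC [start, end) of the local calendar day that contains now_utc."""
    now_local = now_utc.astimezone(tz)
    # Truncate to local midnight, then step one calendar day forward.
    start_local = now_local.replace(hour=0, minute=0, second=0, microsecond=0)
    end_local = start_local + timedelta(days=1)
    return start_local.astimezone(timezone.utc), end_local.astimezone(timezone.utc)


# Assumption for illustration: a fixed offset instead of ZoneInfo("Asia/Seoul").
KST = timezone(timedelta(hours=9))

# 16:30 UTC on April 7 is already 01:30 on April 8 in Seoul.
start, end = local_day_bounds(datetime(2026, 4, 7, 16, 30, tzinfo=timezone.utc), KST)
```

Comparing stored UTC timestamps with `>= start` and `< end` then selects exactly the rows that fall on the local date, even when the UTC date differs from it.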
-75
@@ -1,75 +0,0 @@
"""PR-MacMini-Derived-Worker-1 internal endpoint.
Called by the Mac mini derived-worker to process study explanations.
GPU = RAG context provider (no LLM generation), Mac mini = LLM processing factory.
Protected by a Bearer token (settings.internal_worker_token).
"""
from __future__ import annotations
import logging
from fastapi import APIRouter, Depends, Header, HTTPException, Path, Response, status
from sqlalchemy.ext.asyncio import AsyncSession
from core.config import settings
from core.database import async_session
from models.study_question import StudyQuestion
from services.study.explanation_rag import gather_explanation_context, render_evidence_block
from workers.study_explanation_worker import _render_envelope_prompt
logger = logging.getLogger(__name__)
router = APIRouter()
def _verify_token(authorization: str | None = Header(default=None)) -> None:
if not settings.internal_worker_token:
raise HTTPException(status_code=503, detail="internal_worker_token not configured")
if not authorization or not authorization.lower().startswith("bearer "):
raise HTTPException(status_code=401, detail="missing Bearer token")
token = authorization[7:].strip()
if token != settings.internal_worker_token:
raise HTTPException(status_code=403, detail="invalid token")
async def _session() -> AsyncSession:
async with async_session() as s:
yield s
@router.get("/explanation-context/{question_id}")
async def get_explanation_context(
question_id: int = Path(..., ge=1),
_auth: None = Depends(_verify_token),
session: AsyncSession = Depends(_session),
):
question = await session.get(StudyQuestion, question_id)
if question is None or question.deleted_at is not None:
raise HTTPException(status_code=410, detail="question deleted or missing")
if question.ai_explanation_status == "ready":
raise HTTPException(status_code=410, detail="explanation already ready")
ctx = await gather_explanation_context(session, question.user_id, question)
docs_count = len(ctx.documents)
qs_count = len(ctx.questions)
if docs_count == 0 and qs_count == 0:
return Response(status_code=204)
doc_block = render_evidence_block(ctx.documents)
q_block = render_evidence_block(ctx.questions)
rendered_prompt = _render_envelope_prompt(question, doc_block, q_block)
logger.info(
"internal_study_context qid=%s docs=%s questions=%s prompt_len=%s",
question_id, docs_count, qs_count, len(rendered_prompt),
)
return {
"question_id": question.id,
"question_correct_choice": question.correct_choice,
"rendered_prompt": rendered_prompt,
"evidence_summary": {
"documents_count": docs_count,
"questions_count": qs_count,
},
}
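The `_verify_token` dependency in the file above maps a missing configuration, a missing header, and a wrong token to 503, 401, and 403. The sketch below reproduces that decision as a pure function, with one deliberate difference: it compares tokens with `hmac.compare_digest` instead of `!=`. That constant-time comparison is a hardening suggestion, not what the original code does, and `check_bearer` is a hypothetical name.

```python
from __future__ import annotations

import hmac


def check_bearer(authorization: str | None, expected: str) -> int:
    """Return the HTTP status the Authorization header would receive."""
    if not expected:
        return 503  # no token configured on the server side
    if not authorization or not authorization.lower().startswith("bearer "):
        return 401  # header missing, or not the Bearer scheme
    token = authorization[7:].strip()
    # compare_digest avoids leaking, through timing, where the strings differ.
    if not hmac.compare_digest(token.encode(), expected.encode()):
        return 403
    return 200
```

Returning a status code instead of raising keeps the sketch independent of FastAPI; in a real dependency each non-200 branch would raise `HTTPException` as the original does.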
-327
@@ -1,327 +0,0 @@
"""PR-Worker-Pool-Registry-1B: real implementation of the five /internal/worker/* endpoints.
Mapping to worker-pool-policy §B.2 invariants:
- inv 2: drain = heartbeat INSERT only (advisory). Refusing claims belongs to Notebook-Pilot-1.
- inv 3: /result result = raw JSONB only. No canonical promotion.
- inv 4: ProcessingQueue is untouched; worker_jobs is a separate table.
- inv 5: no change to automatic production routing: there is no SQL that judges heartbeat liveness, and classify_worker/queue_consumer are not touched.
Five corrections from user review (2026-05-19):
- #1: worker_jobs.user_id = job owner (the real user). Worker authentication is separate: worker_id + JWT.
- #2: /result ownership check (WHERE id AND worker_id AND status='processing'). No match → 404.
- #3: retry on explicit failed (attempts<max → back to pending, attempts>=max → final failed).
- #4: /claim 204 = Response(status_code=204) with an empty body.
- #5: mig 275 status CHECK ('pending','processing','completed','failed').
"""
import json
import os
from datetime import datetime, timezone
from typing import Annotated, Any
from fastapi import APIRouter, Depends, HTTPException, Response, status
from pydantic import BaseModel, Field
from sqlalchemy import select, update
from sqlalchemy.dialects.postgresql import insert as pg_insert
from sqlalchemy.ext.asyncio import AsyncSession
from core.auth import get_current_user, require_worker_user
from core.database import get_session
from models.worker_pool import WorkerCapability, WorkerHeartbeat, WorkerJob
from services.worker_recap_context import fetch_recap_context
# PR-Worker-Pool-Registry-1C: payload size guard (rejects an oversized recap context).
# User decision 2026-05-19: raise the cap to 1MB and add deterministic compaction in
# fetch_recap_context (top-N memos + daily/kind aggregates). Production 7d data is ~1.36MB, so 100KB was too small → 1MB.
# Env override for operational tuning = `WORKER_RECAP_PAYLOAD_MAX_BYTES`.
def _payload_max_bytes() -> int:
return int(os.getenv("WORKER_RECAP_PAYLOAD_MAX_BYTES", "1000000"))
router = APIRouter()
# ─── Pydantic schemas ───
class WorkerRegisterRequest(BaseModel):
worker_id: str
device_label: str
worker_class: str
tier: str
capabilities: list[str] = []
models_loaded: list[str] = []
endpoint: str | None = None
class WorkerHeartbeatRequest(BaseModel):
worker_id: str
status: str # starting/available/busy/draining
current_job_id: int | None = None
battery: str | None = None
thermal: str | None = None
raw_payload: dict[str, Any] = {}
class WorkerClaimRequest(BaseModel):
worker_id: str
job_type: str
class WorkerClaimResponse(BaseModel):
id: int
job_type: str
payload: dict[str, Any]
attempts: int
class WorkerResultRequest(BaseModel):
job_id: int
worker_id: str # correction #2: ownership check
status: str # completed | failed
result: dict[str, Any] | None = None
error_message: str | None = None
class WorkerDrainRequest(BaseModel):
worker_id: str
reason: str | None = None
# ─── Endpoints ───
@router.post("/register")
async def register(
body: WorkerRegisterRequest,
user: Annotated[Any, Depends(require_worker_user)],
session: Annotated[AsyncSession, Depends(get_session)],
):
"""worker_capabilities UPSERT: register a worker or refresh its capabilities."""
now = datetime.now(timezone.utc)
stmt = pg_insert(WorkerCapability).values(
worker_id=body.worker_id,
user_id=user.id,
device_label=body.device_label,
worker_class=body.worker_class,
tier=body.tier,
capabilities=body.capabilities,
models_loaded=body.models_loaded,
endpoint=body.endpoint,
created_at=now,
last_registered_at=now,
).on_conflict_do_update(
index_elements=["worker_id"],
set_={
"device_label": body.device_label,
"worker_class": body.worker_class,
"tier": body.tier,
"capabilities": body.capabilities,
"models_loaded": body.models_loaded,
"endpoint": body.endpoint,
"last_registered_at": now,
},
)
await session.execute(stmt)
await session.commit()
return {"ok": True, "worker_id": body.worker_id}
@router.post("/heartbeat")
async def heartbeat(
body: WorkerHeartbeatRequest,
user: Annotated[Any, Depends(require_worker_user)],
session: Annotated[AsyncSession, Depends(get_session)],
):
"""worker_heartbeats append-only INSERT.
Enforces inv 5: no SQL judges liveness. This endpoint only appends a row and returns ok.
"""
hb = WorkerHeartbeat(
worker_id=body.worker_id,
status=body.status,
current_job_id=body.current_job_id,
battery=body.battery,
thermal=body.thermal,
raw_payload=body.raw_payload,
)
session.add(hb)
await session.commit()
return {"ok": True}
@router.post(
"/claim",
responses={
200: {"model": WorkerClaimResponse},
204: {"description": "queue empty"},
},
)
async def claim(
body: WorkerClaimRequest,
user: Annotated[Any, Depends(require_worker_user)],
session: Annotated[AsyncSession, Depends(get_session)],
):
"""Claim one pending job using SELECT FOR UPDATE SKIP LOCKED.
Correction #4: on a miss, return Response(status_code=204) with an empty body, avoiding WorkerClaimResponse | None.
"""
now = datetime.now(timezone.utc)
stmt = (
select(WorkerJob)
.where(WorkerJob.status == "pending", WorkerJob.job_type == body.job_type)
.order_by(WorkerJob.created_at)
.limit(1)
.with_for_update(skip_locked=True)
)
result = await session.execute(stmt)
job = result.scalar_one_or_none()
if job is None:
await session.commit() # end the FOR UPDATE transaction
return Response(status_code=204)
job.status = "processing"
job.worker_id = body.worker_id
job.claimed_at = now
job.attempts = job.attempts + 1
await session.commit()
return WorkerClaimResponse(
id=job.id,
job_type=job.job_type,
payload=job.payload,
attempts=job.attempts,
)
@router.post("/result")
async def result(
body: WorkerResultRequest,
user: Annotated[Any, Depends(require_worker_user)],
session: Annotated[AsyncSession, Depends(get_session)],
):
"""Submit a job result. Enforces corrections #2 (ownership) and #3 (retry).
Ownership check: WHERE id AND worker_id AND status='processing'. No match → 404.
completed: status='completed' + result + completed_at.
failed:
attempts < max_attempts → status='pending' (worker_id/claimed_at/completed_at NULL).
attempts >= max_attempts → status='failed' final + completed_at.
The result column is never updated: request.result is ignored (prevents storing a partial result on failure).
"""
if body.status not in ("completed", "failed"):
raise HTTPException(
status_code=status.HTTP_400_BAD_REQUEST,
detail="status must be 'completed' or 'failed'",
)
stmt = select(WorkerJob).where(
WorkerJob.id == body.job_id,
WorkerJob.worker_id == body.worker_id,
WorkerJob.status == "processing",
)
res = await session.execute(stmt)
job = res.scalar_one_or_none()
if job is None:
raise HTTPException(
status_code=status.HTTP_404_NOT_FOUND,
detail="job not found or not owned by this worker (or not in processing)",
)
now = datetime.now(timezone.utc)
if body.status == "completed":
job.status = "completed"
job.result = body.result # raw JSONB (inv 3 — canonical promote 0)
job.completed_at = now
job.error_message = None
else: # failed
job.error_message = body.error_message
# Correction #3 policy: never update the result column (request.result is ignored)
if job.attempts < job.max_attempts:
job.status = "pending"
job.worker_id = None
job.claimed_at = None
job.completed_at = None
else:
job.status = "failed"
job.completed_at = now
await session.commit()
return {"ok": True, "status": job.status, "attempts": job.attempts}
class JobsRecapRequest(BaseModel):
days: int = Field(default=7, ge=1, le=30)
class JobsRecapResponse(BaseModel):
job_id: int
memo_count: int
event_count: int
payload_bytes: int
payload_compacted: bool
omitted_memos: int
@router.post("/jobs/recap", response_model=JobsRecapResponse)
async def enqueue_recap(
body: JobsRecapRequest,
user: Annotated[Any, Depends(get_current_user)],
session: Annotated[AsyncSession, Depends(get_session)],
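):
"""PR-Worker-Pool-Registry-1C: assemble the recap context and INSERT into worker_jobs.
Auth = regular user JWT (not require_worker_user). Only the user's own memos/events are bundled.
Payload size guard = 413 when the JSON-serialized context exceeds the cap from _payload_max_bytes() (default 1,000,000 bytes; in the spirit of correction #4, recap-specific).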
"""
context = await fetch_recap_context(session, user_id=user.id, days=body.days)
payload_bytes = len(json.dumps(context, ensure_ascii=False).encode("utf-8"))
cap = _payload_max_bytes()
if payload_bytes > cap:
raise HTTPException(
status_code=status.HTTP_413_REQUEST_ENTITY_TOO_LARGE,
detail=(
f"recap context payload {payload_bytes} bytes > {cap} bytes (after compaction). "
f"days 를 줄여 재시도 (현재 {body.days}d) 또는 운영자에게 RECAP_MEMO_TOP_N / "
"WORKER_RECAP_PAYLOAD_MAX_BYTES 조정 요청."
),
)
job = WorkerJob(
user_id=user.id,
job_type="recap",
payload=context,
)
session.add(job)
await session.commit()
await session.refresh(job)
return JobsRecapResponse(
job_id=job.id,
memo_count=context["memo_count"],
event_count=context["event_count"],
payload_bytes=payload_bytes,
payload_compacted=context["payload_compacted"],
omitted_memos=context["summary_stats"]["omitted_memos"],
)
@router.post("/drain")
async def drain(
body: WorkerDrainRequest,
user: Annotated[Any, Depends(require_worker_user)],
session: Annotated[AsyncSession, Depends(get_session)],
):
"""drain = heartbeat INSERT status='draining' (advisory/audit only, inv 2).
There is no claim-refusal logic here; that belongs to Notebook-Pilot-1.
"""
payload: dict[str, Any] = {}
if body.reason:
payload["reason"] = body.reason
hb = WorkerHeartbeat(
worker_id=body.worker_id,
status="draining",
raw_payload=payload,
)
session.add(hb)
await session.commit()
return {"ok": True}
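The `/result` handler in the file above decides a job's next status from what the worker reported and how many attempts have been used. Since `/claim` already increments `attempts`, the comparison happens after the current attempt has been counted. A minimal sketch of that rule as a pure function, with `next_job_status` as a hypothetical name:

```python
def next_job_status(reported: str, attempts: int, max_attempts: int) -> str:
    """Return the status a processing job moves to after a worker reports back.

    attempts is the count after the claim that produced this report.
    """
    if reported == "completed":
        return "completed"
    if reported == "failed":
        # Retries remain: hand the job back to the queue. Otherwise fail for good.
        return "pending" if attempts < max_attempts else "failed"
    raise ValueError("status must be 'completed' or 'failed'")
```

With `max_attempts=3`, a job that fails on its first and second claims returns to `pending`, and a failure on the third claim is final.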
-544
@@ -1,544 +0,0 @@
"""Library classification scheme CRUD API: /api/library"""
from datetime import datetime
from typing import Annotated
from fastapi import APIRouter, Depends, HTTPException, Query
from pydantic import BaseModel
from sqlalchemy import func, select
from sqlalchemy import text as sql_text
from sqlalchemy.ext.asyncio import AsyncSession
from core.auth import get_current_user
from core.database import get_session
from core.library import LIBRARY_PREFIX, MAX_DEPTH, normalize_library_path
from models.category import LibraryCategory
from models.document import Document
from models.facet_value import FacetValue
from models.user import User
FACET_TYPES = ("company", "topic", "doctype") # year needs no predefined value dictionary
router = APIRouter()
# ─── Schemas ───
class CategoryCreate(BaseModel):
path: str
class CategoryRename(BaseModel):
path: str
new_name: str
class CategoryResponse(BaseModel):
id: int
path: str
name: str
parent_path: str | None
depth: int
is_system: bool
created_at: datetime
updated_at: datetime
model_config = {"from_attributes": True}
class CategoryTreeNode(BaseModel):
name: str
path: str
count: int
# Number of documents under this path (sub-paths included) that the current user has not read yet.
# 0 means every document has been read at least once.
unread_count: int = 0
is_category: bool
is_system: bool
has_children: bool
children: list["CategoryTreeNode"]
# ─── Endpoints ───
@router.get("/categories", response_model=list[CategoryResponse])
async def list_categories(
user: Annotated[User, Depends(get_current_user)],
session: Annotated[AsyncSession, Depends(get_session)],
):
"""Flat list of all categories, ordered by path."""
result = await session.execute(
select(LibraryCategory).order_by(LibraryCategory.path)
)
return [CategoryResponse.model_validate(c) for c in result.scalars().all()]
@router.post("/categories", response_model=CategoryResponse, status_code=201)
async def create_category(
body: CategoryCreate,
user: Annotated[User, Depends(get_current_user)],
session: Annotated[AsyncSession, Depends(get_session)],
):
"""Create a category, auto-creating any missing ancestors."""
try:
normalized = normalize_library_path(body.path)
except ValueError as e:
raise HTTPException(status_code=400, detail=str(e))
segments = normalized.split("/")
if len(segments) > MAX_DEPTH:
raise HTTPException(status_code=400, detail=f"최대 {MAX_DEPTH}단계까지 가능")
# Duplicate check
existing = await session.execute(
select(LibraryCategory).where(LibraryCategory.path == normalized)
)
if existing.scalar_one_or_none():
raise HTTPException(status_code=409, detail="이미 존재하는 분류 경로")
# Auto-create missing ancestors
for i in range(1, len(segments)):
ancestor_path = "/".join(segments[:i])
ancestor_name = segments[i - 1]
ancestor_parent = "/".join(segments[: i - 1]) or None
exists = await session.execute(
select(LibraryCategory.id).where(
LibraryCategory.path == ancestor_path
)
)
if not exists.scalar_one_or_none():
session.add(LibraryCategory(
path=ancestor_path,
name=ancestor_name,
parent_path=ancestor_parent,
depth=i,
))
# Create the category itself
category = LibraryCategory(
path=normalized,
name=segments[-1],
parent_path="/".join(segments[:-1]) or None,
depth=len(segments),
)
session.add(category)
await session.commit()
await session.refresh(category)
return CategoryResponse.model_validate(category)
@router.patch("/categories", response_model=CategoryResponse)
async def rename_category(
body: CategoryRename,
user: Annotated[User, Depends(get_current_user)],
session: Annotated[AsyncSession, Depends(get_session)],
):
"""Rename a category (leaf only, identified by path)."""
# Look up the category
result = await session.execute(
select(LibraryCategory).where(LibraryCategory.path == body.path)
)
category = result.scalar_one_or_none()
if not category:
raise HTTPException(status_code=404, detail="분류를 찾을 수 없습니다")
# Protect system categories
if category.is_system:
raise HTTPException(status_code=422, detail="시스템 분류는 변경할 수 없습니다")
# Leaf check
children = await session.execute(
select(func.count()).where(
LibraryCategory.parent_path == category.path
)
)
if children.scalar() > 0:
raise HTTPException(
status_code=422, detail="하위 분류가 있어 이름을 변경할 수 없습니다"
)
# Validate new_name
new_name = body.new_name.strip()
if not new_name:
raise HTTPException(status_code=400, detail="빈 이름")
if len(new_name) > 30:
raise HTTPException(status_code=400, detail="이름은 30자 이하")
# Compute the new path
new_path = (
f"{category.parent_path}/{new_name}" if category.parent_path else new_name
)
# Duplicate check
dup = await session.execute(
select(LibraryCategory.id).where(LibraryCategory.path == new_path)
)
if dup.scalar_one_or_none():
raise HTTPException(status_code=409, detail="같은 이름의 분류가 이미 존재합니다")
old_tag = f"{LIBRARY_PREFIX}{category.path}"
new_tag = f"{LIBRARY_PREFIX}{new_path}"
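# Update document tags
import json # local import: only needed to build the containment operand
await session.execute(
sql_text("""
UPDATE documents
SET user_tags = COALESCE((
SELECT jsonb_agg(
CASE WHEN elem = :old_tag THEN :new_tag ELSE elem END
)
FROM jsonb_array_elements_text(
COALESCE(user_tags, '[]'::jsonb)
) AS elem
), '[]'::jsonb)
WHERE user_tags @> :old_tag_jsonb
""").bindparams(
old_tag=old_tag,
new_tag=new_tag,
# json.dumps escapes quotes and backslashes that a hand-built f-string literal would not
old_tag_jsonb=json.dumps([old_tag], ensure_ascii=False),
)
)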
# Update the category row (path and name only; parent_path is unchanged)
category.path = new_path
category.name = new_name
await session.commit()
await session.refresh(category)
return CategoryResponse.model_validate(category)
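The `old_tag_jsonb` bind value in the rename query is a JSON array literal handed to the `@>` containment operator. The following is a minimal sketch of how such an operand could be built safely, assuming tag strings may contain double quotes or backslashes. `containment_operand` and `naive_operand` are hypothetical helpers for illustration and are not part of this module; the tag values are made up.

```python
import json


def containment_operand(tag: str) -> str:
    """Build a JSON array literal holding one tag, with proper escaping."""
    # ensure_ascii=False keeps non-ASCII tag text as-is instead of \uXXXX escapes.
    return json.dumps([tag], ensure_ascii=False)


def naive_operand(tag: str) -> str:
    """String concatenation, shown only for contrast: no escaping at all."""
    return '["' + tag + '"]'


plain = "library/reports"    # hypothetical tag value
tricky = 'library/say "hi"'  # hypothetical tag containing double quotes

# For plain tags both forms produce the same literal.
print(containment_operand(plain) == naive_operand(plain))

# For the tricky tag only the json.dumps form parses back to the original.
print(json.loads(containment_operand(tricky)) == [tricky])
```

The concatenated form yields invalid JSON as soon as a tag contains a double quote, which the database would reject when casting the operand to `jsonb`.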
@router.delete("/categories", status_code=204)
async def delete_category(
user: Annotated[User, Depends(get_current_user)],
session: Annotated[AsyncSession, Depends(get_session)],
path: str = Query(..., description="삭제할 카테고리 경로"),
):
"""Delete a category (leaf only, and only when no documents reference it)."""
result = await session.execute(
select(LibraryCategory).where(LibraryCategory.path == path)
)
category = result.scalar_one_or_none()
if not category:
raise HTTPException(status_code=404, detail="분류를 찾을 수 없습니다")
if category.is_system:
raise HTTPException(status_code=422, detail="시스템 분류는 삭제할 수 없습니다")
# Leaf check
children = await session.execute(
select(func.count()).where(
LibraryCategory.parent_path == category.path
)
)
if children.scalar() > 0:
raise HTTPException(
status_code=422, detail="하위 분류가 있어 삭제할 수 없습니다"
)
# Check for linked documents
tag = f"{LIBRARY_PREFIX}{category.path}"
doc_count = await session.execute(
sql_text("""
SELECT COUNT(*) FROM documents
WHERE deleted_at IS NULL
AND EXISTS (
SELECT 1 FROM jsonb_array_elements_text(
COALESCE(user_tags, '[]'::jsonb)
) AS t
WHERE t = :tag
)
""").bindparams(tag=tag)
)
if doc_count.scalar() > 0:
raise HTTPException(
status_code=422,
detail="이 분류에 속한 문서가 있어 삭제할 수 없습니다. 문서를 먼저 이동하세요.",
)
await session.delete(category)
await session.commit()
@router.get("/tree", response_model=list[CategoryTreeNode])
async def get_library_tree(
user: Annotated[User, Depends(get_current_user)],
session: Annotated[AsyncSession, Depends(get_session)],
):
"""Tree that merges stored categories with document-tag counts."""
# 1. Fetch all categories
cat_result = await session.execute(
select(LibraryCategory).order_by(LibraryCategory.path)
)
categories = cat_result.scalars().all()
# path → category mapping
cat_map: dict[str, LibraryCategory] = {c.path: c for c in categories}
# 2. Aggregate document counts from document tags
doc_result = await session.execute(
select(Document.id, Document.user_tags).where(
Document.deleted_at == None, # noqa: E711
Document.user_tags != None, # noqa: E711
)
)
# path → set of doc_ids
path_docs: dict[str, set[int]] = {}
for doc_id, tags in doc_result:
if not tags:
continue
seen_ancestors: set[str] = set()
for tag in tags:
if not isinstance(tag, str) or not tag.startswith(LIBRARY_PREFIX):
continue
path = tag[len(LIBRARY_PREFIX):]
parts = path.split("/")
for i in range(1, len(parts) + 1):
ancestor = "/".join(parts[:i])
if ancestor not in seen_ancestors:
path_docs.setdefault(ancestor, set()).add(doc_id)
seen_ancestors.add(ancestor)
# 2.5 Set of doc_ids the current user has read at least once (unread = all - read)
from models.document_read import DocumentRead
read_result = await session.execute(
select(DocumentRead.document_id)
.where(DocumentRead.user_id == user.id)
.group_by(DocumentRead.document_id)
)
read_doc_ids: set[int] = {r[0] for r in read_result}
# 3. Union of all paths (categories + tags)
all_paths = set(cat_map.keys()) | set(path_docs.keys())
# 4. Build the tree
root: dict = {}
for p in sorted(all_paths):
parts = p.split("/")
node = root
for i, part in enumerate(parts):
if part not in node:
node[part] = {"_children": {}}
node = node[part]["_children"] if i < len(parts) - 1 else node[part]
def build_tree(d: dict, prefix: str = "") -> list[CategoryTreeNode]:
nodes = []
for name, data in sorted(d.items()):
if name.startswith("_"):
continue
path = f"{prefix}/{name}" if prefix else name
children_dict = data.get("_children", {})
children = build_tree(children_dict, path)
cat = cat_map.get(path)
# path_docs[path] already includes documents from this node's descendants (see the ancestor accumulation above),
# so unread_count also covers the whole subtree without a separate bottom-up pass.
docs_at_path = path_docs.get(path, set())
unread = len(docs_at_path - read_doc_ids)
nodes.append(CategoryTreeNode(
name=name,
path=path,
count=len(docs_at_path),
unread_count=unread,
is_category=path in cat_map,
is_system=cat.is_system if cat else False,
has_children=len(children) > 0,
children=children,
))
return nodes
return build_tree(root)
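The ancestor accumulation used by the tree endpoint can be illustrated in isolation. This is a minimal in-memory sketch, assuming each document carries a list of tag strings; the `LIBRARY_PREFIX` value, the function names, and the sample data are assumptions for illustration (the real prefix lives in `core.library`).

```python
LIBRARY_PREFIX = "@library/"  # hypothetical value; the real constant is defined in core.library


def accumulate_paths(doc_tags: dict[int, list[str]]) -> dict[str, set[int]]:
    """Map every library path, ancestors included, to the doc ids tagged under it."""
    path_docs: dict[str, set[int]] = {}
    for doc_id, tags in doc_tags.items():
        for tag in tags:
            if not tag.startswith(LIBRARY_PREFIX):
                continue  # not a library tag
            parts = tag[len(LIBRARY_PREFIX):].split("/")
            # Register the document on the path itself and on each ancestor.
            for i in range(1, len(parts) + 1):
                path_docs.setdefault("/".join(parts[:i]), set()).add(doc_id)
    return path_docs


def unread_counts(path_docs: dict[str, set[int]], read_ids: set[int]) -> dict[str, int]:
    """Unread count per path: documents under the path minus the ones already read."""
    return {path: len(ids - read_ids) for path, ids in path_docs.items()}


docs = {
    1: ["@library/a/b"],
    2: ["@library/a/c", "unrelated"],
    3: ["@library/a/b"],
}
paths = accumulate_paths(docs)
print(sorted(paths))
print(unread_counts(paths, read_ids={1}))
```

Because each document is added to every ancestor's set, a parent's count and unread count cover its whole subtree, and a document tagged under two siblings is still counted once at the parent.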
# ─── Facet API (Phase 2) ───
class FacetValueResponse(BaseModel):
facet_type: str
value: str
model_config = {"from_attributes": True}
class FacetCountItem(BaseModel):
value: str
count: int
class FacetCountsResponse(BaseModel):
company: list[FacetCountItem]
topic: list[FacetCountItem]
year: list[FacetCountItem]
doctype: list[FacetCountItem]
@router.get("/facets", response_model=dict[str, list[str]])
async def get_facet_values(
user: Annotated[User, Depends(get_current_user)],
session: Annotated[AsyncSession, Depends(get_session)],
):
"""Allowed values per facet axis (year is derived from actual data)."""
result: dict[str, list[str]] = {}
for ft in FACET_TYPES:
rows = await session.execute(
select(FacetValue.value)
.where(FacetValue.facet_type == ft)
.order_by(FacetValue.value)
)
result[ft] = [r[0] for r in rows]
# year has no dictionary; extract it from actual document values
year_rows = await session.execute(
select(Document.facet_year)
.where(
Document.deleted_at == None, # noqa: E711
Document.facet_year != None, # noqa: E711
)
.distinct()
.order_by(Document.facet_year.desc())
)
result["year"] = [str(r[0]) for r in year_rows]
return result
@router.post("/facets", response_model=FacetValueResponse, status_code=201)
async def add_facet_value(
body: FacetValueResponse,
user: Annotated[User, Depends(get_current_user)],
session: Annotated[AsyncSession, Depends(get_session)],
):
"""Add a new value to the facet dictionary."""
if body.facet_type not in FACET_TYPES:
raise HTTPException(status_code=400, detail=f"허용 facet: {', '.join(FACET_TYPES)}")
value = body.value.strip()
if not value:
raise HTTPException(status_code=400, detail="빈 값")
existing = await session.execute(
select(FacetValue).where(
FacetValue.facet_type == body.facet_type,
FacetValue.value == value,
)
)
if existing.scalar_one_or_none():
raise HTTPException(status_code=409, detail="이미 존재하는 값")
fv = FacetValue(facet_type=body.facet_type, value=value)
session.add(fv)
await session.commit()
return FacetValueResponse(facet_type=body.facet_type, value=value)
@router.get("/facet-counts", response_model=FacetCountsResponse)
async def get_facet_counts(
user: Annotated[User, Depends(get_current_user)],
session: Annotated[AsyncSession, Depends(get_session)],
library_path: str | None = None,
facet_company: str | None = None,
facet_topic: str | None = None,
facet_year: int | None = None,
facet_doctype: str | None = None,
q: str | None = None,
):
"""Per-facet aggregate counts under the current filters."""
def base_query():
query = select(Document).where(
Document.deleted_at == None, # noqa: E711
Document.doc_purpose == "business",
)
if library_path:
exact = f"{LIBRARY_PREFIX}{library_path}"
prefix = f"{LIBRARY_PREFIX}{library_path}/%"
query = query.where(
sql_text("""
EXISTS (
SELECT 1 FROM jsonb_array_elements_text(
COALESCE(documents.user_tags, '[]'::jsonb)
) AS t
WHERE t = :exact OR t LIKE :prefix
)
""").bindparams(exact=exact, prefix=prefix)
)
if q:
query = query.where(Document.title.ilike(f"%{q}%"))
return query
result = FacetCountsResponse(company=[], topic=[], year=[], doctype=[])
# company counts (apply the other facet filters, excluding the company filter itself)
q_company = base_query()
if facet_topic:
q_company = q_company.where(Document.facet_topic == facet_topic)
if facet_year:
q_company = q_company.where(Document.facet_year == facet_year)
if facet_doctype:
q_company = q_company.where(Document.facet_doctype == facet_doctype)
rows = await session.execute(
select(Document.facet_company, func.count())
.where(Document.facet_company != None) # noqa: E711
.where(Document.id.in_(q_company.with_only_columns(Document.id).subquery().select()))
.group_by(Document.facet_company)
.order_by(func.count().desc())
)
result.company = [FacetCountItem(value=r[0], count=r[1]) for r in rows]
# topic counts
q_topic = base_query()
if facet_company:
q_topic = q_topic.where(Document.facet_company == facet_company)
if facet_year:
q_topic = q_topic.where(Document.facet_year == facet_year)
if facet_doctype:
q_topic = q_topic.where(Document.facet_doctype == facet_doctype)
rows = await session.execute(
select(Document.facet_topic, func.count())
.where(Document.facet_topic != None) # noqa: E711
.where(Document.id.in_(q_topic.with_only_columns(Document.id).subquery().select()))
.group_by(Document.facet_topic)
.order_by(func.count().desc())
)
result.topic = [FacetCountItem(value=r[0], count=r[1]) for r in rows]
# year counts
q_year = base_query()
if facet_company:
q_year = q_year.where(Document.facet_company == facet_company)
if facet_topic:
q_year = q_year.where(Document.facet_topic == facet_topic)
if facet_doctype:
q_year = q_year.where(Document.facet_doctype == facet_doctype)
rows = await session.execute(
select(Document.facet_year, func.count())
.where(Document.facet_year != None) # noqa: E711
.where(Document.id.in_(q_year.with_only_columns(Document.id).subquery().select()))
.group_by(Document.facet_year)
.order_by(Document.facet_year.desc())
)
result.year = [FacetCountItem(value=str(r[0]), count=r[1]) for r in rows]
# doctype counts
q_doctype = base_query()
if facet_company:
q_doctype = q_doctype.where(Document.facet_company == facet_company)
if facet_topic:
q_doctype = q_doctype.where(Document.facet_topic == facet_topic)
if facet_year:
q_doctype = q_doctype.where(Document.facet_year == facet_year)
rows = await session.execute(
select(Document.facet_doctype, func.count())
.where(Document.facet_doctype != None) # noqa: E711
.where(Document.id.in_(q_doctype.with_only_columns(Document.id).subquery().select()))
.group_by(Document.facet_doctype)
.order_by(func.count().desc())
)
result.doctype = [FacetCountItem(value=r[0], count=r[1]) for r in rows]
return result
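The four count queries above share one rule: the counts for an axis apply every active filter except the filter on that same axis. Here is a minimal in-memory sketch of that rule, assuming documents are plain dicts. `facet_counts` and the sample data are hypothetical, and the ordering applied by the real queries is omitted.

```python
from collections import Counter

AXES = ("company", "topic", "year", "doctype")


def facet_counts(docs: list[dict], filters: dict) -> dict[str, dict]:
    """Count values per axis, applying all active filters except the axis's own."""
    result: dict[str, dict] = {}
    for axis in AXES:
        # Drop the filter for the axis being counted, and ignore unset filters.
        others = {k: v for k, v in filters.items() if k != axis and v is not None}
        counter = Counter(
            doc[axis]
            for doc in docs
            if doc.get(axis) is not None
            and all(doc.get(k) == v for k, v in others.items())
        )
        result[axis] = dict(counter)
    return result


sample = [
    {"company": "A", "topic": "x", "year": 2024, "doctype": "report"},
    {"company": "A", "topic": "y", "year": 2023, "doctype": "report"},
    {"company": "B", "topic": "x", "year": 2024, "doctype": "memo"},
]
print(facet_counts(sample, {"company": "A"}))
```

With `company="A"` selected, the company axis still lists both A and B, so the user can switch companies, while the other axes only count documents from company A.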
-798
@@ -1,798 +0,0 @@
"""Memo CRUD API: text memos (file_type='note') and voice memos (file_type='immutable', category='audio', source_channel='voice').
The doc_type enum is (immutable, editable, note). Existing audio files already use the file_type='immutable' + category='audio'
pattern, so voice memos follow the same pattern (this avoids extending the enum).
"""
import hashlib
import logging
import os
import re
import uuid
from datetime import datetime, timezone
from pathlib import Path
from typing import Annotated, Any
from fastapi import APIRouter, Depends, File, Form, HTTPException, Query, UploadFile
from pydantic import BaseModel, Field
from sqlalchemy import delete, func, select
from sqlalchemy.ext.asyncio import AsyncSession
from core.auth import get_current_user
from core.config import settings
from core.database import get_session
from models.document import Document
from models.event import Event
from models.event_history import EventHistory
from models.queue import ProcessingQueue, enqueue_stage
from models.user import User
# Voice upload limits (decided in plan v9: 10 minutes / 50MB)
VOICE_MAX_BYTES = 50 * 1024 * 1024
VOICE_ALLOWED_EXTS = {".m4a", ".mp3", ".wav", ".webm", ".ogg", ".opus", ".aac"}
VOICE_ALLOWED_CONTENT_PREFIXES = ("audio/",)
VOICE_NAS_SUBDIR = "PKM/Recordings" # /mnt/nas/Document_Server/PKM/Recordings/{YYYY-MM}/{uuid}.{ext}
logger = logging.getLogger(__name__)
router = APIRouter()
# markdown task line: "- [ ] ..." or "- [x] ..."
TASK_LINE_RE = re.compile(r"^(\s*- \[)([ xX])(\].*)$")
# #tag parsing pattern: Hangul / Latin letters / digits / underscore, at least 2 characters
TAG_PATTERN = re.compile(r"(?:^|(?<=\s))#([가-힣a-zA-Z0-9_]{2,})")
def _parse_hashtags(content: str) -> list[str]:
    """Extract #tags from the body, deduplicated, preserving first-seen order."""
    seen: set[str] = set()
    tags: list[str] = []
    for m in TAG_PATTERN.finditer(content):
        tag = m.group(1)
        if tag not in seen:
            seen.add(tag)
            tags.append(tag)
    return tags
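A short usage sketch of the hashtag rule may help: a tag must sit at the very start of the text or directly after whitespace, and must be at least two characters long. The block restates the pattern so that it runs on its own; `parse_hashtags` is a standalone copy for illustration and the sample text is made up.

```python
import re

# Same pattern as TAG_PATTERN: '#' at the start of the text or after whitespace,
# followed by 2+ Hangul / Latin / digit / underscore characters.
TAG_PATTERN = re.compile(r"(?:^|(?<=\s))#([가-힣a-zA-Z0-9_]{2,})")


def parse_hashtags(content: str) -> list[str]:
    """Return tags in first-seen order without duplicates."""
    seen: set[str] = set()
    tags: list[str] = []
    for match in TAG_PATTERN.finditer(content):
        tag = match.group(1)
        if tag not in seen:
            seen.add(tag)
            tags.append(tag)
    return tags


text = "#업무 meeting notes #todo\n#todo mail#nope #a\n# Heading"
print(parse_hashtags(text))
```

In the sample, `mail#nope` is skipped because the `#` is not preceded by whitespace, `#a` is too short, and `# Heading` has a space after the `#`, so markdown headings are not mistaken for tags.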
def _content_hash(content: str) -> str:
    """SHA-256 hash of the memo body (a note's file_hash is its content hash)."""
    return hashlib.sha256(content.encode("utf-8")).hexdigest()
def _auto_title(content: str) -> str:
    """Derive a title from the first line (markdown heading markers stripped, truncated to 80 characters)."""
    first_line = content.split("\n", 1)[0].strip()
    title = re.sub(r"^#+\s*", "", first_line)[:80] or "메모"
    return title
def _toggle_task_line(content: str, target_index: int, checked: bool) -> tuple[str, bool]:
    """Find the N-th markdown task line and set it to checked or unchecked.
    Returns (new_content, found). found=False means no task line exists at target_index
    (the index drifted because the body was edited).
    """
    lines = content.split("\n")
    ti = 0
    found = False
    for i, line in enumerate(lines):
        m = TASK_LINE_RE.match(line)
        if not m:
            continue
        if ti == target_index:
            mark = "x" if checked else " "
            lines[i] = m.group(1) + mark + m.group(3)
            found = True
            break
        ti += 1
    return "\n".join(lines), found
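The toggle helper counts only task lines, so `target_index` is the position among checkboxes rather than a line number. The following standalone sketch shows a successful toggle and the drift case; `toggle_task_line` is a compact copy for illustration and the memo text is made up.

```python
import re

TASK_LINE_RE = re.compile(r"^(\s*- \[)([ xX])(\].*)$")


def toggle_task_line(content: str, target_index: int, checked: bool) -> tuple[str, bool]:
    """Set the N-th task line (0-based, counting task lines only) to checked or unchecked."""
    lines = content.split("\n")
    task_idx = 0
    for i, line in enumerate(lines):
        match = TASK_LINE_RE.match(line)
        if not match:
            continue  # plain text lines do not consume an index
        if task_idx == target_index:
            lines[i] = match.group(1) + ("x" if checked else " ") + match.group(3)
            return "\n".join(lines), True
        task_idx += 1
    return content, False  # drift: no task line at that index


memo = "Groceries\n- [ ] milk\n  - [x] eggs\nnote"
print(toggle_task_line(memo, 0, True))
print(toggle_task_line(memo, 2, True))
```

The second call targets an index that no longer exists, so the content comes back unchanged with `found=False`. The endpoint uses that signal to clean up stale state instead of failing.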
def _sync_task_state_with_content(content: str, existing_state: dict | None) -> dict:
    """Synchronize memo_task_state with the checklist state in content.
    - For each `- [x]` in content with no checked_at in state, record the current time,
      so legacy items typed directly as `- [x]` are also hidden 10 seconds after the save time.
    - Indexes that correspond to `- [ ]` in content are removed from state.
    - Indexes that disappeared because content now has fewer tasks are also cleaned up.
    """
    state = dict(existing_state or {})
    current_keys: set[str] = set()
    task_idx = 0
    now_iso = datetime.now(timezone.utc).isoformat()
    for line in (content or "").split("\n"):
        m = TASK_LINE_RE.match(line)
        if not m:
            continue
        key = str(task_idx)
        is_checked = m.group(2).lower() == "x"
        if is_checked:
            current_keys.add(key)
            entry = state.get(key) or {}
            if not entry.get("checked_at"):
                state[key] = {"checked_at": now_iso}
        # unchecked lines are not added to current_keys, so they are removed below
        task_idx += 1
    # Clean up state for indexes that became unchecked or vanished from content
    for k in list(state.keys()):
        if k not in current_keys:
            state.pop(k, None)
    return state
async def _enqueue_ai_stages(session: AsyncSession, document_id: int):
    """Enqueue the classify, embed, and chunk stages, first clearing existing pending rows to avoid duplicates."""
    stages = ["classify", "embed", "chunk"]
    await session.execute(
        delete(ProcessingQueue).where(
            ProcessingQueue.document_id == document_id,
            ProcessingQueue.stage.in_(stages),
            ProcessingQueue.status == "pending",
        )
    )
    for stage in stages:
        await enqueue_stage(session, document_id, stage)
# ─── Schemas ───
class MemoCreate(BaseModel):
content: str
title: str | None = None # optional title (auto-generated from the first line when omitted)
ask_includable: bool = True
# PR-Hermes-Docsrv-Bridge-1: identifies the external entry channel. default='memo' (web UI compatible).
# Allowed values: memo / voice / hermes / ... (source_channel enum in app/models/document.py).
source_channel: str | None = None
# PR-Hermes-Docsrv-Bridge-1: channel metadata such as channel/user/message_id/timestamp.
source_metadata: dict | None = None
class MemoUpdate(BaseModel):
content: str
title: str | None = None # explicit title change (auto-generated when None)
class ArchiveSet(BaseModel):
archived: bool
class TaskToggle(BaseModel):
checked: bool
class MemoResponse(BaseModel):
id: int
title: str | None
content: str | None # extracted_text
file_format: str
user_tags: list | None
ai_tags: list | None
ai_domain: str | None
ai_sub_group: str | None
ai_summary: str | None
pinned: bool
archived: bool
ask_includable: bool
memo_task_state: dict # {"<task_index>": {"checked_at": "<ISO8601>"}}
# Memo Intake Upgrade PR-2B: AI-suggested classification (a hint for the user's 1-click promote)
ai_event_kind: str | None = None
ai_event_confidence: float | None = None
source_channel: str | None = None # entry-point identifier such as voice/memo/hermes (UI badge)
source_metadata: dict = {} # PR-Hermes-Docsrv-Bridge-1: channel/user/message_id/timestamp
file_type: str | None = None # 'immutable' (voice memo, category='audio') vs 'note' (text memo)
file_path: str | None = None # NAS audio path of a voice memo (for the audio player)
created_at: datetime
updated_at: datetime
model_config = {"from_attributes": True}
class MemoListResponse(BaseModel):
items: list[MemoResponse]
total: int
page: int
page_size: int
def _to_memo_response(doc: Document) -> MemoResponse:
return MemoResponse(
id=doc.id,
title=doc.title,
content=doc.extracted_text,
file_format=doc.file_format,
user_tags=doc.user_tags,
ai_tags=doc.ai_tags,
ai_domain=doc.ai_domain,
ai_sub_group=doc.ai_sub_group,
ai_summary=doc.ai_summary,
pinned=doc.pinned,
archived=doc.archived,
ask_includable=doc.ask_includable,
memo_task_state=dict(doc.memo_task_state or {}),
ai_event_kind=doc.ai_event_kind,
ai_event_confidence=doc.ai_event_confidence,
source_channel=doc.source_channel,
source_metadata=dict(doc.source_metadata or {}),
file_type=doc.file_type,
file_path=doc.file_path,
created_at=doc.created_at,
updated_at=doc.updated_at,
)
# ─── Endpoints ───
@router.post("/", response_model=MemoResponse, status_code=201)
async def create_memo(
body: MemoCreate,
user: Annotated[User, Depends(get_current_user)],
session: Annotated[AsyncSession, Depends(get_session)],
):
"""Create a memo: file_type='note', a document with no backing file."""
content = body.content.strip()
if not content:
raise HTTPException(status_code=400, detail="메모 내용이 비어있습니다")
# PR-Hermes-Docsrv-Bridge-1: source_channel/metadata can be overridden. default='memo' (compatible with the existing web UI).
channel = body.source_channel or "memo"
if channel not in ("memo", "voice", "hermes"):
raise HTTPException(
status_code=400,
detail=f"source_channel '{channel}' 허용 안 됨 (memo/voice/hermes 만)",
)
doc = Document(
file_path=None,
file_hash=_content_hash(content),
file_format="md",
file_size=len(content.encode("utf-8")),
file_type="note",
title=body.title.strip() if body.title and body.title.strip() else _auto_title(content),
extracted_text=content,
review_status="approved",
source_channel=channel,
source_metadata=body.source_metadata or {},
user_tags=_parse_hashtags(content),
pinned=False,
archived=False,
ask_includable=body.ask_includable,
# Sync so that items typed as `- [x]` in the body also become auto-hide targets 10 seconds after creation time.
memo_task_state=_sync_task_state_with_content(content, None),
)
session.add(doc)
await session.flush()
await _enqueue_ai_stages(session, doc.id)
await session.commit()
await session.refresh(doc)
return _to_memo_response(doc)
@router.get("/", response_model=MemoListResponse)
async def list_memos(
user: Annotated[User, Depends(get_current_user)],
session: Annotated[AsyncSession, Depends(get_session)],
page: int = Query(1, ge=1),
page_size: int = Query(20, ge=1, le=100),
tag: str | None = Query(None, description="user_tags 또는 ai_tags 필터"),
archived: bool = Query(False, description="true면 아카이브 목록"),
pinned: bool | None = Query(None, description="true면 핀 고정된 메모만"),
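):
"""Memo list. Active: pinned first, then newest first. Archived: newest first (pins ignored).
PR-2C: source_channel='voice' (voice memos) is included too. User intent: memos are the inbox for every kind of input.
Voice memos use the file_type='immutable' + category='audio' + source_channel='voice' pattern.
They are separated by source_channel alone (a file_type filter would also pull in other immutable binaries, so it is avoided).
PR-Hermes-Docsrv-Bridge-1: source_channel='hermes' (entries from external channels such as Hermes Discord) is also part of the inbox.
"""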
base = select(Document).where(
Document.source_channel.in_(("memo", "voice", "hermes")),
Document.deleted_at == None, # noqa: E711
Document.archived == archived,
)
if pinned is not None:
base = base.where(Document.pinned == pinned)
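if tag:
import json # local import: build a properly escaped JSON array operand
tag_operand = json.dumps([tag], ensure_ascii=False)
base = base.where(
Document.user_tags.op("@>")(tag_operand)
| Document.ai_tags.op("@>")(tag_operand)
)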
count_query = select(func.count()).select_from(base.subquery())
total = (await session.execute(count_query)).scalar() or 0
# Active: pinned DESC + created_at DESC / Archived: created_at DESC (pins ignored)
if archived:
query = base.order_by(Document.created_at.desc())
else:
query = base.order_by(Document.pinned.desc(), Document.created_at.desc())
query = query.offset((page - 1) * page_size).limit(page_size)
result = await session.execute(query)
items = result.scalars().all()
return MemoListResponse(
items=[_to_memo_response(doc) for doc in items],
total=total,
page=page,
page_size=page_size,
)
@router.get("/{memo_id}", response_model=MemoResponse)
async def get_memo(
memo_id: int,
user: Annotated[User, Depends(get_current_user)],
session: Annotated[AsyncSession, Depends(get_session)],
):
"""Fetch a single memo."""
doc = await session.get(Document, memo_id)
if not doc or doc.file_type != "note" or doc.deleted_at is not None:
raise HTTPException(status_code=404, detail="메모를 찾을 수 없습니다")
return _to_memo_response(doc)
@router.patch("/{memo_id}", response_model=MemoResponse)
async def update_memo(
memo_id: int,
body: MemoUpdate,
user: Annotated[User, Depends(get_current_user)],
session: Annotated[AsyncSession, Depends(get_session)],
):
"""Update a memo: a content change resets AI data and re-enqueues processing."""
doc = await session.get(Document, memo_id)
if not doc or doc.file_type != "note" or doc.deleted_at is not None:
raise HTTPException(status_code=404, detail="메모를 찾을 수 없습니다")
content = body.content.strip()
if not content:
raise HTTPException(status_code=400, detail="메모 내용이 비어있습니다")
doc.extracted_text = content
doc.file_hash = _content_hash(content)
doc.file_size = len(content.encode("utf-8"))
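# A body edit may have reordered, added, or removed tasks, so resynchronize the state.
# A `- [x]` without checked_at is stamped with this update's time, so it is auto-hidden 10 seconds later.
doc.memo_task_state = _sync_task_state_with_content(content, doc.memo_task_state)
# PATCH semantics: overwrite title only when the title field was sent explicitly.
# A PATCH carrying only {content}, as the checkbox toggle path does, must preserve the existing title
# (previously None fell through to _auto_title(content), a bug that overwrote the title with a checkbox line).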
if "title" in body.model_fields_set:
doc.title = body.title.strip() if body.title and body.title.strip() else _auto_title(content)
elif not (doc.title or "").strip():
# Fill in only when the existing title was empty
doc.title = _auto_title(content)
doc.user_tags = _parse_hashtags(content)
# Reset stale AI data immediately
doc.ai_summary = None
doc.ai_domain = None
doc.ai_sub_group = None
doc.ai_tags = None
doc.ai_confidence = None
doc.ai_processed_at = None
doc.embedding = None
doc.embedded_at = None
# Delete existing chunks
from models.chunk import DocumentChunk
await session.execute(
delete(DocumentChunk).where(DocumentChunk.doc_id == memo_id)
)
# Re-enqueue for processing
await _enqueue_ai_stages(session, memo_id)
doc.updated_at = datetime.now(timezone.utc)
await session.commit()
await session.refresh(doc)
return _to_memo_response(doc)
@router.patch("/{memo_id}/tasks/{task_index}", response_model=MemoResponse)
async def toggle_memo_task(
memo_id: int,
task_index: int,
body: TaskToggle,
user: Annotated[User, Depends(get_current_user)],
session: Annotated[AsyncSession, Depends(get_session)],
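):
"""Dedicated endpoint for toggling a memo checkbox.
Sets the checked state of the N-th markdown task line and records the time in memo_task_state.
AI reprocessing (classify/embed/chunk) is **intentionally skipped**: triggering a re-analysis for a single checkbox is excessive.
Uses SELECT ... FOR UPDATE to prevent a race when the same row is toggled concurrently.
"""
# ❶ FOR UPDATE: blocks concurrent-toggle races on the same row (required because the JSONB is replaced wholesale)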
doc = await session.get(Document, memo_id, with_for_update=True)
if not doc or doc.file_type != "note" or doc.deleted_at is not None:
raise HTTPException(status_code=404, detail="메모를 찾을 수 없습니다")
state = dict(doc.memo_task_state or {})
key = str(task_index)
# ❷ Toggle the N-th task line in content
new_content, found = _toggle_task_line(doc.extracted_text or "", task_index, body.checked)
if not found:
# drift: a body edit broke the task_index match → just clean up the stale state and return 200 OK
stale_removed = key in state
if stale_removed:
state.pop(key, None)
doc.memo_task_state = state
await session.commit()
await session.refresh(doc)
logger.info(
"memo_task_toggle_drift memo_id=%s task_index=%s stale_removed=%s",
memo_id, task_index, stale_removed,
)
return _to_memo_response(doc)
doc.extracted_text = new_content
doc.file_hash = _content_hash(new_content)
doc.file_size = len(new_content.encode("utf-8"))
# ❸ Update task_state (whole-JSONB replace; race safe under the FOR UPDATE lock)
if body.checked:
state[key] = {"checked_at": datetime.now(timezone.utc).isoformat()}
else:
state.pop(key, None)
doc.memo_task_state = state
doc.updated_at = datetime.now(timezone.utc)
# AI reprocessing / user_tags re-parsing / chunk deletion / queue enqueue are all intentionally skipped.
# Log it explicitly so nobody has to debug later why they were skipped.
logger.info(
"memo_task_toggle_skip_ai memo_id=%s task_index=%s checked=%s",
memo_id, task_index, body.checked,
)
await session.commit()
await session.refresh(doc)
return _to_memo_response(doc)
@router.delete("/{memo_id}", status_code=204)
async def delete_memo(
memo_id: int,
user: Annotated[User, Depends(get_current_user)],
session: Annotated[AsyncSession, Depends(get_session)],
):
"""Soft-delete a memo."""
doc = await session.get(Document, memo_id)
if not doc or doc.file_type != "note" or doc.deleted_at is not None:
raise HTTPException(status_code=404, detail="메모를 찾을 수 없습니다")
doc.deleted_at = datetime.now(timezone.utc)
await session.commit()
@router.patch("/{memo_id}/pin", response_model=MemoResponse)
async def toggle_pin(
memo_id: int,
user: Annotated[User, Depends(get_current_user)],
session: Annotated[AsyncSession, Depends(get_session)],
):
"""Toggle a memo's pin."""
doc = await session.get(Document, memo_id)
if not doc or doc.file_type != "note" or doc.deleted_at is not None:
raise HTTPException(status_code=404, detail="메모를 찾을 수 없습니다")
doc.pinned = not doc.pinned
doc.updated_at = datetime.now(timezone.utc)
await session.commit()
await session.refresh(doc)
return _to_memo_response(doc)
@router.patch("/{memo_id}/archive", response_model=MemoResponse)
async def set_archive(
memo_id: int,
body: ArchiveSet,
user: Annotated[User, Depends(get_current_user)],
session: Annotated[AsyncSession, Depends(get_session)],
):
"""Set a memo's archived flag (idempotent, not a toggle)."""
doc = await session.get(Document, memo_id)
if not doc or doc.file_type != "note" or doc.deleted_at is not None:
raise HTTPException(status_code=404, detail="메모를 찾을 수 없습니다")
doc.archived = body.archived
doc.updated_at = datetime.now(timezone.utc)
await session.commit()
await session.refresh(doc)
return _to_memo_response(doc)
@router.patch("/{memo_id}/ask-includable", response_model=MemoResponse)
async def toggle_ask_includable(
memo_id: int,
user: Annotated[User, Depends(get_current_user)],
session: Annotated[AsyncSession, Depends(get_session)],
):
"""Toggle whether the memo is included in /ask synthesis."""
doc = await session.get(Document, memo_id)
if not doc or doc.file_type != "note" or doc.deleted_at is not None:
raise HTTPException(status_code=404, detail="메모를 찾을 수 없습니다")
doc.ask_includable = not doc.ask_includable
doc.updated_at = datetime.now(timezone.utc)
await session.commit()
await session.refresh(doc)
return _to_memo_response(doc)
# ─── Memo Intake Upgrade PR-2B: promote to event ───
class PromotePayload(BaseModel):
"""Promote a memo to events. When kind is omitted, documents.ai_event_kind is used.
The AI worker never creates events rows directly; this endpoint is the only channel for user intent.
"""
kind: str | None = None # 'task' | 'calendar_event' | 'activity_log'
due_at: datetime | None = None
start_at: datetime | None = None
end_at: datetime | None = None
started_at: datetime | None = None
ended_at: datetime | None = None
priority: int | None = None
project_tag: str | None = None
_PROMOTE_KIND_MAP = {
# AI suggestion (event_kind_hint) → events.kind
"task": "task",
"calendar_event": "calendar_event",
"activity_log": "activity_log",
# 'note' / 'reference' are not promotable (the user must specify a kind explicitly)
}
@router.post("/{memo_id}/promote-to-event", status_code=201)
async def promote_memo_to_event(
memo_id: int,
body: PromotePayload,
user: Annotated[User, Depends(get_current_user)],
session: Annotated[AsyncSession, Depends(get_session)],
):
"""Create one events row from one memo. memo_document_id is linked automatically.
kind resolution order: body.kind > documents.ai_event_kind > reject with 400.
One memo may produce N events (policy: no dedup, follow user intent).
"""
doc = await session.get(Document, memo_id)
if (
not doc
or doc.deleted_at is not None
or doc.source_channel not in ("memo", "voice")
):
raise HTTPException(status_code=404, detail="메모를 찾을 수 없습니다")
# Resolve kind
requested = (body.kind or "").strip().lower() or None
ai_hint = (doc.ai_event_kind or "").strip().lower() or None
chosen = requested or ai_hint
event_kind = _PROMOTE_KIND_MAP.get(chosen or "")
if not event_kind:
raise HTTPException(
status_code=400,
detail="promote 할 kind 가 명확하지 않습니다 (task/calendar_event/activity_log 중 1개 지정 또는 ai_event_kind 필요)",
)
# Time-field defaults: activity_log keeps the quick action-logging UX as is
now = datetime.now(timezone.utc)
started_at = body.started_at
ended_at = body.ended_at
completed_at: datetime | None = None
status_val = "inbox"
if event_kind == "activity_log":
ended_at = ended_at or now
started_at = started_at or ended_at
completed_at = now
status_val = "done"
elif event_kind == "calendar_event":
status_val = "scheduled" if body.start_at else "inbox"
title = (doc.title or "").strip() or "메모"
description = doc.extracted_text
ev = Event(
title=title,
description=description,
kind=event_kind,
status=status_val,
due_at=body.due_at,
start_at=body.start_at,
end_at=body.end_at,
started_at=started_at,
ended_at=ended_at,
completed_at=completed_at,
priority=body.priority,
project_tag=body.project_tag,
source="memo",
source_ref=str(doc.id), # promoting the same memo N times yields separate rows; dedup is not intended
raw_metadata={
"memo_id": doc.id,
"ai_event_kind": doc.ai_event_kind,
"ai_event_confidence": doc.ai_event_confidence,
"promoted_at": now.isoformat(),
},
memo_document_id=doc.id,
user_id=user.id,
created_by="manual",
)
session.add(ev)
await session.flush()
# events_history create row (events-domain pattern, same shape as _record_history in events/api/events.py)
history = EventHistory(
event_id=ev.id,
changed_by="manual",
change_kind="create",
before=None,
after={
"id": ev.id,
"title": ev.title,
"kind": ev.kind,
"status": ev.status,
"source": ev.source,
"memo_document_id": ev.memo_document_id,
},
)
session.add(history)
await session.commit()
await session.refresh(ev)
return {
"event_id": ev.id,
"kind": ev.kind,
"status": ev.status,
"memo_document_id": ev.memo_document_id,
}
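The kind resolution order used by the promote handler above (explicit request first, then the AI hint, otherwise reject) can be sketched as a small pure function. This is a minimal sketch under assumptions: `PROMOTE_KIND_MAP` below is a hypothetical stand-in, since the real `_PROMOTE_KIND_MAP` is not shown in this diff, and `resolve_event_kind` is an illustrative name.

```python
from typing import Optional

# Hypothetical mapping for illustration only; the real _PROMOTE_KIND_MAP is not shown here.
PROMOTE_KIND_MAP = {
    "task": "task",
    "calendar_event": "calendar_event",
    "activity_log": "activity_log",
}


def _norm(value: Optional[str]) -> Optional[str]:
    """Trim and lowercase; treat empty or missing input as None."""
    return (value or "").strip().lower() or None


def resolve_event_kind(requested: Optional[str], ai_hint: Optional[str]) -> Optional[str]:
    """Return the event kind to promote to, or None when the caller should answer 400."""
    chosen = _norm(requested) or _norm(ai_hint)
    return PROMOTE_KIND_MAP.get(chosen or "")
```

A caller would raise `HTTPException(status_code=400, ...)` when this returns `None`, which keeps the precedence rule testable without a database session.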
@router.post("/{memo_id}/dismiss-event-suggestion", response_model=MemoResponse)
async def dismiss_event_suggestion(
memo_id: int,
user: Annotated[User, Depends(get_current_user)],
session: Annotated[AsyncSession, Depends(get_session)],
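):
"""'Just a memo': ignore the AI suggestion and force ai_event_kind='note'. Signals the UI to hide the 4 buttons.
MVP: the AI suggestion and the user-confirmed value share one column (this can blur accuracy measurement).
Backlog: split out a separate user_event_kind column (plan: Memo Intake Upgrade backlog).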
"""
doc = await session.get(Document, memo_id)
if (
not doc
or doc.deleted_at is not None
or doc.source_channel not in ("memo", "voice")
):
raise HTTPException(status_code=404, detail="메모를 찾을 수 없습니다")
doc.ai_event_kind = "note"
doc.updated_at = datetime.now(timezone.utc)
await session.commit()
await session.refresh(doc)
return _to_memo_response(doc)
# ─── Memo Intake Upgrade PR-2C: voice upload ───
@router.post("/voice", response_model=MemoResponse, status_code=201)
async def upload_voice_memo(
user: Annotated[User, Depends(get_current_user)],
session: Annotated[AsyncSession, Depends(get_session)],
audio: UploadFile = File(...),
recorded_at: str | None = Form(None),
device_hint: str | None = Form(None),
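):
"""Upload a voice memo from Apple Watch / mobile / other devices → STT queue → automatic classification.
PR-2C: source_channel='voice' + category='audio' (file_type stays 'immutable'). Passes through the existing
stt_worker → classify pipeline automatically. Plan principle: the AI worker never creates events directly.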
"""
# Validate Content-Type
if audio.content_type and not audio.content_type.startswith(VOICE_ALLOWED_CONTENT_PREFIXES):
raise HTTPException(status_code=415, detail=f"지원되지 않는 Content-Type: {audio.content_type}")
# Determine the extension
orig_name = audio.filename or ""
ext = (Path(orig_name).suffix or "").lower()
if ext and ext not in VOICE_ALLOWED_EXTS:
raise HTTPException(status_code=415, detail=f"지원되지 않는 확장자: {ext}")
if not ext:
# No extension in the filename: default to .m4a (content_type is not actually inspected here)
ext = ".m4a"
# Read the body and validate its size
payload: bytes = await audio.read()
if len(payload) > VOICE_MAX_BYTES:
raise HTTPException(status_code=413, detail=f"50MB 초과 ({len(payload)//1024//1024}MB)")
if len(payload) == 0:
raise HTTPException(status_code=400, detail="빈 audio")
# Storage path (NAS): inside the fastapi container, /documents is the NAS mount
nas_root = Path(settings.nas_mount_path)
yyyy_mm = datetime.now(timezone.utc).astimezone().strftime("%Y-%m")
target_dir = nas_root / VOICE_NAS_SUBDIR / yyyy_mm
target_dir.mkdir(parents=True, exist_ok=True)
file_uuid = uuid.uuid4().hex
target_path = target_dir / f"{file_uuid}{ext}"
# fsync + atomic rename pattern: safe on a NAS soft mount (in line with feedback_nfs_korean_path_normalize)
tmp_path = target_path.with_suffix(target_path.suffix + ".tmp")
try:
with open(tmp_path, "wb") as fh:
fh.write(payload)
fh.flush()
os.fsync(fh.fileno())
os.replace(tmp_path, target_path)
except OSError as e:
# NAS write failed: fail gracefully without creating a DB row
if tmp_path.exists():
try:
tmp_path.unlink()
except OSError:
pass
logger.error("voice upload NAS write 실패: %s", e)
raise HTTPException(status_code=503, detail="NAS 저장 실패 (재시도 권장)")
# Parse recorded_at
rec_at: datetime | None = None
if recorded_at:
try:
rec_at = datetime.fromisoformat(recorded_at.replace("Z", "+00:00"))
except ValueError:
rec_at = None
raw_metadata: dict[str, Any] = {}
if device_hint:
raw_metadata["device_hint"] = device_hint
if rec_at:
raw_metadata["recorded_at"] = rec_at.isoformat()
# file_path is relative to the NAS root (same convention as other documents; compatible with the /api/documents/{id}/file endpoint)
relative_path = target_path.relative_to(nas_root)
# Document row: file_type='immutable' (binary; doc_type enum constraint) + category='audio' + source_channel='voice'.
# Same pattern as the existing audio container ingestion; source_channel='voice' distinguishes it from ordinary audio.
title_seed = (orig_name or "음성 메모").rsplit(".", 1)[0]
doc = Document(
file_path=str(relative_path),
file_hash=hashlib.sha256(payload).hexdigest(),
file_format=ext.lstrip(".") or "m4a",
file_size=len(payload),
file_type="immutable",
title=title_seed[:80] or "음성 메모",
extracted_text=None,  # filled in after STT
review_status="approved",
source_channel="voice",
category="audio",
ask_includable=True,
pinned=False,
archived=False,
memo_task_state={},
extract_meta=raw_metadata or None,
)
session.add(doc)
await session.flush()
# Enqueue for STT: the existing stt_worker → classify → embed → chunk pipeline takes over from here
await enqueue_stage(session, doc.id, "stt")
await session.commit()
await session.refresh(doc)
return _to_memo_response(doc)
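The write path in the voice upload handler above (temp file, fsync, then rename) can be isolated into a helper. This is a minimal sketch of that pattern using only the standard library; `atomic_write_bytes` is an illustrative name, not a function from this repository.

```python
import os
from pathlib import Path


def atomic_write_bytes(target: Path, payload: bytes) -> None:
    """Write payload to target so readers never observe a partially written file."""
    tmp = target.with_suffix(target.suffix + ".tmp")
    try:
        with open(tmp, "wb") as fh:
            fh.write(payload)
            fh.flush()
            os.fsync(fh.fileno())  # push the bytes to storage before the rename
        os.replace(tmp, target)  # atomic when tmp and target are on the same filesystem
    except OSError:
        # Best-effort cleanup of the temp file, then let the caller decide how to respond.
        try:
            tmp.unlink()
        except OSError:
            pass
        raise
```

Re-raising `OSError` lets the endpoint map the failure to a 503 and skip creating the DB row, as the handler above does.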
+9 -39
@@ -8,7 +8,7 @@ from pydantic import BaseModel
from sqlalchemy import String, select
from sqlalchemy.ext.asyncio import AsyncSession
from core.auth import get_current_user, require_admin
from core.auth import get_current_user
from core.database import get_session
from models.news_source import NewsSource
from models.user import User
@@ -60,14 +60,9 @@ async def list_sources(
@router.post("/sources")
async def create_source(
body: NewsSourceCreate,
user: Annotated[User, Depends(require_admin)],
user: Annotated[User, Depends(get_current_user)],
session: Annotated[AsyncSession, Depends(get_session)],
):
from core.url_validator import validate_feed_url
try:
validate_feed_url(body.feed_url)
except ValueError as e:
raise HTTPException(status_code=422, detail=f"feed_url 검증 실패: {e}")
source = NewsSource(**body.model_dump())
session.add(source)
await session.commit()
@@ -78,18 +73,12 @@ async def create_source(
async def update_source(
source_id: int,
body: NewsSourceUpdate,
user: Annotated[User, Depends(require_admin)],
user: Annotated[User, Depends(get_current_user)],
session: Annotated[AsyncSession, Depends(get_session)],
):
source = await session.get(NewsSource, source_id)
if not source:
raise HTTPException(status_code=404)
if body.feed_url is not None:
from core.url_validator import validate_feed_url
try:
validate_feed_url(body.feed_url)
except ValueError as e:
raise HTTPException(status_code=422, detail=f"feed_url 검증 실패: {e}")
for field, value in body.model_dump(exclude_unset=True).items():
setattr(source, field, value)
await session.commit()
@@ -99,7 +88,7 @@ async def update_source(
@router.delete("/sources/{source_id}")
async def delete_source(
source_id: int,
user: Annotated[User, Depends(require_admin)],
user: Annotated[User, Depends(get_current_user)],
session: Annotated[AsyncSession, Depends(get_session)],
):
source = await session.get(NewsSource, source_id)
@@ -116,7 +105,6 @@ async def list_articles(
session: Annotated[AsyncSession, Depends(get_session)],
source: str | None = None,
unread_only: bool = False,
pinned_only: bool = False,
page: int = 1,
page_size: int = 30,
):
@@ -139,8 +127,6 @@ async def list_articles(
query = query.where(Document.ai_sub_group == source)
if unread_only:
query = query.where(Document.is_read == False)
if pinned_only:
query = query.where(Document.pinned.is_(True))
count_q = select(func.count()).select_from(query.subquery())
total = (await session.execute(count_q)).scalar()
@@ -176,28 +162,12 @@ async def mark_all_read(
return {"marked": result.rowcount}
import asyncio
_collect_lock = asyncio.Lock()
@router.post("/collect")
async def trigger_collect(
user: Annotated[User, Depends(require_admin)],
user: Annotated[User, Depends(get_current_user)],
):
"""Manual collection trigger (admin only).
asyncio.Lock is scoped to a single process / event loop.
It is valid while FastAPI runs as a single instance,
but it must be replaced with a DB advisory lock on scale-out.
"""
if _collect_lock.locked():
raise HTTPException(status_code=429, detail="수집이 이미 진행 중입니다")
async def _run_with_lock():
async with _collect_lock:
from workers.news_collector import run
await run()
asyncio.create_task(_run_with_lock())
"""Manual collection trigger."""
from workers.news_collector import run
import asyncio
asyncio.create_task(run())
return {"message": "뉴스 수집 시작됨"}
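The docstring in the diff above describes a single-process guard against overlapping collection runs. As a hedged illustration, the sketch below uses a different mechanism than the `asyncio.Lock` in the diff: it tracks the task handle, so the "already running" check and the start happen in the same synchronous step. `SingleFlight` and `try_start` are hypothetical names; like the lock, this only holds within one event loop.

```python
import asyncio
from typing import Any, Callable, Coroutine, Optional


class SingleFlight:
    """Allow at most one background job at a time within a single event loop."""

    def __init__(self) -> None:
        self._task: Optional[asyncio.Task] = None

    def try_start(self, job: Callable[[], Coroutine[Any, Any, None]]) -> bool:
        """Start job() unless a previous run is still in flight."""
        if self._task is not None and not self._task.done():
            return False
        # Keeping the handle also stops the task from being garbage-collected mid-run.
        self._task = asyncio.create_task(job())
        return True
```

An endpoint could answer 429 when `try_start` returns `False`. A multi-instance deployment would still need a shared lock, as the docstring notes.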
+78 -1052
File diff suppressed because it is too large
-2
@@ -8,7 +8,6 @@ from pathlib import Path
from typing import Annotated
import pyotp
from datetime import datetime, timezone
from fastapi import APIRouter, Depends, HTTPException, Request, status
from fastapi.responses import HTMLResponse
from fastapi.templating import Jinja2Templates
@@ -138,7 +137,6 @@ async def create_admin(
username=body.username,
password_hash=hash_password(body.password),
is_active=True,
password_changed_at=datetime.now(timezone.utc),
)
session.add(user)
await session.commit()
-417
@@ -1,417 +0,0 @@
"""study_cards API: flashcard approval (study memo-note Phase 1 approval UI).
Shows needs_review=true cards grouped by source question, to approve / edit / discard (delete).
Separate router (prefix=/api/study-cards), so its paths do not collide with /api/study-questions/{id}.
Static paths (/needs-review/count, /approve-batch) are defined before /{card_id}.
Decisions (2026-06-07):
- Editing (cue/fact/cloze) recomputes dedup_hash and sets needs_review=false (user-confirmed version); clears flagged.
- No global bulk-approve button: approve-batch works per source_question_id (only that question's cards).
"""
from __future__ import annotations
from datetime import datetime, timezone
from typing import Annotated
from fastapi import APIRouter, Depends, HTTPException, Query
from pydantic import BaseModel
from sqlalchemy import and_, func, or_, select, update
from sqlalchemy.exc import IntegrityError
from sqlalchemy.ext.asyncio import AsyncSession
from core.auth import get_current_user
from core.database import get_session
from models.study_memo_card import StudyMemoCard, StudyMemoCardEvidence, record_card_view
from models.study_memo_card_progress import StudyMemoCardProgress, rate_card
from models.study_question import StudyQuestion
from models.user import User
from services.study.card_normalize import compute_dedup_hash
router = APIRouter()
class CardEvidence(BaseModel):
source_type: str
source_id: int | None = None
snippet: str | None = None
class CardItem(BaseModel):
id: int
source_kind: str = "question"
format: str
cue: str
fact: str
cloze_text: str | None = None
needs_review: bool
flagged_by: str | None = None
evidence: list[CardEvidence] = []
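# Filled only for the review (SR) queue: used to compute the next-review-date preview label for a correct ('암') answer
# (dynamic per stage: +3/7/14 days, then graduation). None in deck/approval responses.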
review_stage: int | None = None
class CardQuestionGroup(BaseModel):
source_question_id: int | None = None
question_text: str | None = None
correct_choice: int | None = None
cards: list[CardItem] = []
class CardUpdate(BaseModel):
needs_review: bool | None = None
cue: str | None = None
fact: str | None = None
cloze_text: str | None = None
class ApproveBatch(BaseModel):
source_question_id: int
class RateBody(BaseModel):
outcome: str  # 암/애매/모름 (know / unsure / don't know) or correct/unsure/wrong
class RateResult(BaseModel):
card_id: int
outcome: str
review_stage: int | None = None
due_at: datetime | None = None
# Self-rating mapping applied at read time (no new enum values; last_outcome reuses the existing vocabulary)
_RATE_MAP = {
"암": "correct", "애매": "unsure", "모름": "wrong",
"correct": "correct", "unsure": "unsure", "wrong": "wrong",
}
async def _build_card_items(
session: AsyncSession,
cards: list[StudyMemoCard],
stages: dict[int, int | None] | None = None,
) -> list[CardItem]:
"""Card list → CardItem (with evidence). Shared by the due/deck study flows.
stages: card_id → review_stage (passed only for the review queue, for the dynamic label preview).
"""
if not cards:
return []
stages = stages or {}
ids = [c.id for c in cards]
ev_rows = (
await session.execute(
select(StudyMemoCardEvidence).where(StudyMemoCardEvidence.card_id.in_(ids))
)
).scalars().all()
ev_by: dict[int, list[CardEvidence]] = {}
for e in ev_rows:
ev_by.setdefault(e.card_id, []).append(
CardEvidence(source_type=e.source_type, source_id=e.source_id, snippet=e.snippet)
)
return [
CardItem(
id=c.id, source_kind=c.source_kind, format=c.format, cue=c.cue, fact=c.fact,
cloze_text=c.cloze_text, needs_review=c.needs_review, flagged_by=c.flagged_by,
evidence=ev_by.get(c.id, []), review_stage=stages.get(c.id),
)
for c in cards
]
def _verify_card(card: StudyMemoCard | None, user: User) -> StudyMemoCard:
if card is None or card.user_id != user.id or card.deleted_at is not None:
raise HTTPException(status_code=404, detail="카드를 찾을 수 없습니다")
return card
@router.get("/needs-review/count")
async def count_needs_review_cards(
user: Annotated[User, Depends(get_current_user)],
session: Annotated[AsyncSession, Depends(get_session)],
):
"""Number of cards awaiting approval (for the badge)."""
n = (
await session.execute(
select(func.count())
.select_from(StudyMemoCard)
.where(
StudyMemoCard.user_id == user.id,
StudyMemoCard.deleted_at.is_(None),
StudyMemoCard.needs_review,
)
)
).scalar_one()
return {"count": n}
@router.get("", response_model=list[CardQuestionGroup])
async def list_cards(
user: Annotated[User, Depends(get_current_user)],
session: Annotated[AsyncSession, Depends(get_session)],
needs_review: Annotated[bool, Query()] = True,
format: Annotated[str | None, Query()] = None,
limit: Annotated[int, Query(ge=1, le=2000)] = 600,
):
"""Card list grouped by source question. Defaults to the needs_review=true approval queue."""
conds = [StudyMemoCard.user_id == user.id, StudyMemoCard.deleted_at.is_(None)]
if needs_review:
conds.append(StudyMemoCard.needs_review)
if format in ("qa", "cloze"):
conds.append(StudyMemoCard.format == format)
rows = (
await session.execute(
select(StudyMemoCard)
.where(*conds)
.order_by(StudyMemoCard.source_question_id.asc().nulls_last(), StudyMemoCard.id.asc())
.limit(limit)
)
).scalars().all()
if not rows:
return []
# Fetch evidence in one query
card_ids = [c.id for c in rows]
ev_rows = (
await session.execute(
select(StudyMemoCardEvidence).where(StudyMemoCardEvidence.card_id.in_(card_ids))
)
).scalars().all()
ev_by_card: dict[int, list[CardEvidence]] = {}
for e in ev_rows:
ev_by_card.setdefault(e.card_id, []).append(
CardEvidence(source_type=e.source_type, source_id=e.source_id, snippet=e.snippet)
)
# Fetch source-question metadata in one query
qids = sorted({c.source_question_id for c in rows if c.source_question_id is not None})
q_meta: dict[int, tuple[str, int]] = {}
if qids:
q_rows = (
await session.execute(
select(StudyQuestion.id, StudyQuestion.question_text, StudyQuestion.correct_choice)
.where(StudyQuestion.id.in_(qids))
)
).all()
q_meta = {r.id: (r.question_text, r.correct_choice) for r in q_rows}
# Grouping (preserves the order of rows). Question cards are grouped by source question;
# manual (user-added) cards are grouped by extra.material.
groups: dict[str, CardQuestionGroup] = {}
order: list[str] = []
for c in rows:
if c.source_question_id is not None:
gkey = f"q:{c.source_question_id}"
else:
material = c.extra.get("material") if isinstance(c.extra, dict) else None
gkey = f"m:{material or '직접 추가'}"
if gkey not in groups:
if c.source_question_id is not None:
qt, cc = q_meta.get(c.source_question_id, (None, None))
groups[gkey] = CardQuestionGroup(
source_question_id=c.source_question_id, question_text=qt, correct_choice=cc, cards=[]
)
else:
material = c.extra.get("material") if isinstance(c.extra, dict) else None
groups[gkey] = CardQuestionGroup(
source_question_id=None,
question_text=(f"[자료] {material}" if material else "직접 추가 카드"),
correct_choice=None, cards=[],
)
order.append(gkey)
groups[gkey].cards.append(
CardItem(
id=c.id, source_kind=c.source_kind, format=c.format, cue=c.cue, fact=c.fact,
cloze_text=c.cloze_text, needs_review=c.needs_review, flagged_by=c.flagged_by,
evidence=ev_by_card.get(c.id, []),
)
)
return [groups[k] for k in order]
@router.post("/approve-batch")
async def approve_batch(
body: ApproveBatch,
user: Annotated[User, Depends(get_current_user)],
session: Annotated[AsyncSession, Depends(get_session)],
):
"""Bulk-approve one source question's cards awaiting approval (needs_review=false). There is no global bulk approve."""
result = await session.execute(
update(StudyMemoCard)
.where(
StudyMemoCard.user_id == user.id,
StudyMemoCard.source_question_id == body.source_question_id,
StudyMemoCard.deleted_at.is_(None),
StudyMemoCard.needs_review,
)
.values(needs_review=False, flagged_by=None, flagged_at=None)
)
await session.commit()
return {"approved": result.rowcount or 0}
# ─── Review (SR) track ───
@router.get("/due", response_model=list[CardItem])
async def due_cards(
user: Annotated[User, Depends(get_current_user)],
session: Annotated[AsyncSession, Depends(get_session)],
limit: Annotated[int, Query(ge=1, le=200)] = 30,
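):
"""Cards to review today (approved cards only). Two groups:
- Newly approved cards (no progress row = before the first recall): the entry path into the SR queue. A '암' (correct)
rating ends there without setting due_at ('prevent queue explosion'); 애매/모름 (unsure/wrong) enqueues the card as due (tomorrow) right away.
- Scheduled due cards (due_at<=now, stage<4).
progress is UNIQUE on user+card, so the outer join yields at most one row. Scheduled due cards come first, new cards (due NULL) last."""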
now = datetime.now(timezone.utc)
P = StudyMemoCardProgress
rows = (
await session.execute(
select(StudyMemoCard, P.review_stage)
.outerjoin(P, and_(P.card_id == StudyMemoCard.id, P.user_id == user.id))
.where(
StudyMemoCard.user_id == user.id,
StudyMemoCard.deleted_at.is_(None),
StudyMemoCard.needs_review.is_(False),
or_(
P.id.is_(None),  # new (before the first recall): no progress row yet
and_(
P.due_at.is_not(None),
P.due_at <= now,
or_(P.review_stage.is_(None), P.review_stage < 4),
),
),
)
.order_by(P.due_at.asc().nulls_last(), StudyMemoCard.id.asc())
.limit(limit)
)
).all()
cards = [r[0] for r in rows]
stages = {r[0].id: r[1] for r in rows}
return await _build_card_items(session, cards, stages)
@router.post("/{card_id}/rate", response_model=RateResult)
async def rate(
card_id: int,
body: RateBody,
user: Annotated[User, Depends(get_current_user)],
session: Annotated[AsyncSession, Depends(get_session)],
):
"""Self-rate a card (암/애매/모름) → enqueue into SR immediately."""
card = await session.get(StudyMemoCard, card_id)
card = _verify_card(card, user)
if card.needs_review:
raise HTTPException(status_code=400, detail="검수 안 된 카드는 복습(SR) 대상이 아닙니다")
outcome = _RATE_MAP.get((body.outcome or "").strip())
if outcome is None:
raise HTTPException(status_code=422, detail=f"invalid outcome: {body.outcome!r}")
progress = await rate_card(session, card=card, outcome=outcome, now=datetime.now(timezone.utc))
await session.commit()
return RateResult(
card_id=card.id, outcome=outcome, review_stage=progress.review_stage, due_at=progress.due_at
)
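The rating endpoint above accepts either the Korean self-rating labels or the English outcome names and rejects anything else with 422. A minimal sketch of that normalization step, assuming the three Korean labels named in the `RateBody` comment (`암`, `애매`, `모름`); `normalize_outcome` is an illustrative name:

```python
from typing import Optional

# Korean self-rating labels and English outcome names both map to the stored vocabulary.
RATE_MAP = {
    "암": "correct", "애매": "unsure", "모름": "wrong",
    "correct": "correct", "unsure": "unsure", "wrong": "wrong",
}


def normalize_outcome(raw: Optional[str]) -> Optional[str]:
    """Return the canonical outcome, or None when the input is not a known label."""
    return RATE_MAP.get((raw or "").strip())
```

The lookup is case-sensitive and an empty string is not a valid key, so a blank or unknown outcome yields `None` rather than silently counting as correct.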
# ─── Cram ("just study") track: records views only, independent of SR ───
@router.get("/deck", response_model=list[CardItem])
async def deck(
user: Annotated[User, Depends(get_current_user)],
session: Annotated[AsyncSession, Depends(get_session)],
material: Annotated[str | None, Query()] = None,
format: Annotated[str | None, Query()] = None,
limit: Annotated[int, Query(ge=1, le=100)] = 20,
):
"""Cram ("just study") deck: approved cards, least-viewed first. Filters: material/format. Independent of SR."""
conds = [
StudyMemoCard.user_id == user.id,
StudyMemoCard.deleted_at.is_(None),
StudyMemoCard.needs_review.is_(False),
]
if format in ("qa", "cloze"):
conds.append(StudyMemoCard.format == format)
if material:
conds.append(StudyMemoCard.extra["material"].astext == material)
rows = (
await session.execute(
select(StudyMemoCard)
.where(*conds)
.order_by(StudyMemoCard.last_viewed_at.asc().nulls_first(), StudyMemoCard.id.asc())
.limit(limit)
)
).scalars().all()
return await _build_card_items(session, list(rows))
@router.post("/{card_id}/view", status_code=204)
async def view_card(
card_id: int,
user: Annotated[User, Depends(get_current_user)],
session: Annotated[AsyncSession, Depends(get_session)],
):
"""Record a view in cram mode (view_count++; independent of SR)."""
ok = await record_card_view(session, user_id=user.id, card_id=card_id)
await session.commit()
if not ok:
raise HTTPException(status_code=404, detail="카드를 찾을 수 없습니다")
@router.patch("/{card_id}", response_model=CardItem)
async def update_card(
card_id: int,
body: CardUpdate,
user: Annotated[User, Depends(get_current_user)],
session: Annotated[AsyncSession, Depends(get_session)],
):
"""Approve (needs_review=false) or edit (cue/fact/cloze). Editing content recomputes dedup_hash and marks the card as approved."""
card = await session.get(StudyMemoCard, card_id)
card = _verify_card(card, user)
fields_set = body.model_fields_set
content_changed = False
for fname in {"cue", "fact", "cloze_text"} & fields_set:
setattr(card, fname, getattr(body, fname))
content_changed = True
if content_changed:
# Recompute dedup_hash from the answer token (fact); a user-confirmed version counts as approved.
card.dedup_hash = compute_dedup_hash(card.source_question_id, card.format, card.fact)
card.needs_review = False
card.flagged_by = None
card.flagged_at = None
elif "needs_review" in fields_set:
card.needs_review = bool(body.needs_review)
if card.needs_review:
card.flagged_by = "user"
card.flagged_at = datetime.now(timezone.utc)
else:
card.flagged_by = None
card.flagged_at = None
try:
await session.commit()
except IntegrityError:
await session.rollback()
raise HTTPException(status_code=409, detail="같은 정답의 중복 카드가 이미 있습니다")
return CardItem(
id=card.id, source_kind=card.source_kind, format=card.format, cue=card.cue, fact=card.fact,
cloze_text=card.cloze_text, needs_review=card.needs_review, flagged_by=card.flagged_by, evidence=[],
)
@router.delete("/{card_id}", status_code=204)
async def delete_card(
card_id: int,
user: Annotated[User, Depends(get_current_user)],
session: Annotated[AsyncSession, Depends(get_session)],
):
"""Soft-delete a low-quality card. The partial unique index (WHERE deleted_at IS NULL) stays consistent naturally."""
card = await session.get(StudyMemoCard, card_id)
card = _verify_card(card, user)
card.deleted_at = datetime.now(timezone.utc)
await session.commit()
-728
@@ -1,728 +0,0 @@
"""Study progress API: review-complete + review-queue + stats.
review-complete: marks that the user has reviewed a wrong/unsure question. Assigns due_at for the first time.
review-queue: queries progress across 5 tabs (due_today / pending_review / chronic / regressed / mastered).
stats (Phase 2-D): stats dashboard covering progress / pattern distribution / review queue / session trend / daily attempts / per-subject breakdown.
"""
from __future__ import annotations
from datetime import date, datetime, timedelta, timezone
from typing import Annotated
from fastapi import APIRouter, Depends, HTTPException, Query
from pydantic import BaseModel
from sqlalchemy import and_, case, cast, func, or_, select
from sqlalchemy.ext.asyncio import AsyncSession
from sqlalchemy.types import Date as SQLDate
from core.auth import get_current_user
from core.database import get_session
from models.study_question import StudyQuestion, StudyQuestionAttempt
from models.study_question_progress import StudyQuestionProgress
from models.study_quiz_session import StudyQuizSession
from models.study_topic import StudyTopic
from models.user import User
router = APIRouter(prefix="/study-topics", tags=["study-progress"])
# The first due_at defaults to 1 day later. SR constants live in sr_schedule.py as the single source (re-exported here).
from services.study.sr_schedule import DEFAULT_FIRST_DUE_DAYS # noqa: E402,F401
def _verify_topic_owner(topic: StudyTopic | None, user: User) -> None:
if topic is None or topic.deleted_at is not None or topic.user_id != user.id:
raise HTTPException(status_code=404, detail="주제를 찾을 수 없습니다")
@router.post("/{topic_id}/questions/{question_id}/review-complete", status_code=204)
async def review_complete(
topic_id: int,
question_id: int,
user: Annotated[User, Depends(get_current_user)],
session: Annotated[AsyncSession, Depends(get_session)],
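):
"""Mark as reviewed: sets last_reviewed_at and, for wrong/unsure, assigns due_at for the first time.
If due_at is already set, it is kept as-is (preserves the queue position).
For correctly answered questions no due_at is set (prevents queue explosion).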
"""
topic = await session.get(StudyTopic, topic_id)
_verify_topic_owner(topic, user)
q = await session.get(StudyQuestion, question_id)
if q is None or q.deleted_at is not None or q.user_id != user.id or q.study_topic_id != topic_id:
raise HTTPException(status_code=404, detail="문제를 찾을 수 없습니다")
progress = (
await session.execute(
select(StudyQuestionProgress).where(
StudyQuestionProgress.user_id == user.id,
StudyQuestionProgress.study_topic_id == topic_id,
StudyQuestionProgress.study_question_id == question_id,
)
)
).scalar_one_or_none()
if progress is None:
# review-complete was requested without any attempt. There is no progress state, so the call is meaningless.
raise HTTPException(status_code=409, detail="아직 시도한 적이 없는 문제입니다")
now = datetime.now(timezone.utc)
progress.last_reviewed_at = now
# due_at is first assigned only for wrong/unsure. If it is already set, keep it.
if progress.last_outcome in ("wrong", "unsure") and progress.due_at is None:
progress.review_stage = 0
progress.due_at = now + timedelta(days=DEFAULT_FIRST_DUE_DAYS)
await session.commit()
# ─── review-queue ───
class ReviewQueueItem(BaseModel):
question_id: int
question_text: str
subject: str | None
scope: str | None
exam_round: str | None
exam_question_number: int | None
last_outcome: str | None
last_attempted_at: datetime | None
last_reviewed_at: datetime | None
due_at: datetime | None
review_stage: int | None
pattern_state: str | None
class ReviewQueueResponse(BaseModel):
tab: str
total: int
items: list[ReviewQueueItem]
page: int
page_size: int
# Phase 2-F: filled only for the due_today tab. due_at < today 00:00 (UTC) and stage < 4.
# The UI uses it to decide whether to show the "N stalled" warning and the [Clean up] button.
overdue_count: int = 0
def _truncate(text: str, n: int = 80) -> str:
if not text:
return ""
s = text.strip()
return s if len(s) <= n else s[:n].rstrip() + "…"
@router.get("/{topic_id}/review-queue", response_model=ReviewQueueResponse)
async def review_queue(
topic_id: int,
user: Annotated[User, Depends(get_current_user)],
session: Annotated[AsyncSession, Depends(get_session)],
tab: str = Query(..., pattern="^(due_today|pending_review|chronic|regressed|mastered)$"),
page: int = Query(1, ge=1),
page_size: int = Query(50, ge=1, le=200),
):
"""Query progress for one of the 5 tabs.
- due_today: progress.due_at <= now() AND review_stage < 4
- pending_review: last_outcome IN (wrong, unsure)
AND (last_reviewed_at IS NULL OR last_reviewed_at < last_attempted_at)
- chronic: pattern_state = 'chronic_wrong'
- regressed: pattern_state = 'regressed'
- mastered: review_stage >= 4
"""
topic = await session.get(StudyTopic, topic_id)
_verify_topic_owner(topic, user)
base = (
select(StudyQuestionProgress, StudyQuestion)
.join(StudyQuestion, StudyQuestion.id == StudyQuestionProgress.study_question_id)
.where(
StudyQuestionProgress.user_id == user.id,
StudyQuestionProgress.study_topic_id == topic_id,
StudyQuestion.deleted_at.is_(None),
)
)
now = datetime.now(timezone.utc)
if tab == "due_today":
base = base.where(
StudyQuestionProgress.due_at.is_not(None),
StudyQuestionProgress.due_at <= now,
or_(
StudyQuestionProgress.review_stage.is_(None),
StudyQuestionProgress.review_stage < 4,
),
).order_by(StudyQuestionProgress.due_at.asc())
elif tab == "pending_review":
base = base.where(
StudyQuestionProgress.last_outcome.in_(("wrong", "unsure")),
or_(
StudyQuestionProgress.last_reviewed_at.is_(None),
and_(
StudyQuestionProgress.last_reviewed_at.is_not(None),
StudyQuestionProgress.last_attempted_at.is_not(None),
StudyQuestionProgress.last_reviewed_at
< StudyQuestionProgress.last_attempted_at,
),
),
).order_by(StudyQuestionProgress.last_attempted_at.desc().nulls_last())
elif tab == "chronic":
base = base.where(
StudyQuestionProgress.pattern_state == "chronic_wrong",
).order_by(StudyQuestionProgress.last_attempted_at.desc().nulls_last())
elif tab == "regressed":
base = base.where(
StudyQuestionProgress.pattern_state == "regressed",
).order_by(StudyQuestionProgress.last_attempted_at.desc().nulls_last())
elif tab == "mastered":
base = base.where(
StudyQuestionProgress.review_stage.is_not(None),
StudyQuestionProgress.review_stage >= 4,
).order_by(StudyQuestionProgress.last_attempted_at.desc().nulls_last())
# total
total_row = await session.execute(
select(func.count()).select_from(base.subquery())
)
total = int(total_row.scalar() or 0)
# paged
rows = (
await session.execute(
base.offset((page - 1) * page_size).limit(page_size)
)
).all()
items = [
ReviewQueueItem(
question_id=q.id,
question_text=_truncate(q.question_text, 80),
subject=q.subject,
scope=q.scope,
exam_round=q.exam_round,
exam_question_number=q.exam_question_number,
last_outcome=p.last_outcome,
last_attempted_at=p.last_attempted_at,
last_reviewed_at=p.last_reviewed_at,
due_at=p.due_at,
review_stage=p.review_stage,
pattern_state=p.pattern_state,
)
for (p, q) in rows
]
# Phase 2-F: on the due_today tab, count overdue items (due before today 00:00 UTC) for the UI warning
overdue_count = 0
if tab == "due_today":
today_start = now.replace(hour=0, minute=0, second=0, microsecond=0)
overdue_row = await session.execute(
select(func.count())
.select_from(StudyQuestionProgress)
.where(
StudyQuestionProgress.user_id == user.id,
StudyQuestionProgress.study_topic_id == topic_id,
StudyQuestionProgress.due_at.is_not(None),
StudyQuestionProgress.due_at < today_start,
or_(
StudyQuestionProgress.review_stage.is_(None),
StudyQuestionProgress.review_stage < 4,
),
)
)
overdue_count = int(overdue_row.scalar() or 0)
return ReviewQueueResponse(
tab=tab, total=total, items=items, page=page, page_size=page_size,
overdue_count=overdue_count,
)
# ─── redistribute (Phase 2-F: clearing stalled due_at) ───
class RedistributeRequest(BaseModel):
spread_days: int = 7  # between 1 and 14 days; default 7.
class RedistributeResponse(BaseModel):
redistributed_count: int
spread_days: int
@router.post(
"/{topic_id}/review-queue/redistribute", response_model=RedistributeResponse
)
async def redistribute_overdue(
topic_id: int,
body: RedistributeRequest,
user: Annotated[User, Depends(get_current_user)],
session: Annotated[AsyncSession, Depends(get_session)],
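):
"""Redistribute overdue items (due_at < today 00:00 UTC and stage < 4) round-robin over tomorrow through spread_days days ahead.
Behavior:
- Fetch every item that was due before today 00:00 (oldest first)
- Set due_at to midnight of day (i % spread_days) + 1, plus (i*7) % 1440 minutes to spread items within the day
- review_stage is left untouched (stall handling only reschedules the time)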
"""
if not (1 <= body.spread_days <= 14):
raise HTTPException(status_code=400, detail="spread_days 는 1~14 사이여야 합니다")
topic = await session.get(StudyTopic, topic_id)
_verify_topic_owner(topic, user)
now = datetime.now(timezone.utc)
today_start = now.replace(hour=0, minute=0, second=0, microsecond=0)
overdue = (
await session.execute(
select(StudyQuestionProgress)
.where(
StudyQuestionProgress.user_id == user.id,
StudyQuestionProgress.study_topic_id == topic_id,
StudyQuestionProgress.due_at.is_not(None),
StudyQuestionProgress.due_at < today_start,
or_(
StudyQuestionProgress.review_stage.is_(None),
StudyQuestionProgress.review_stage < 4,
),
)
.order_by(StudyQuestionProgress.due_at.asc())
)
).scalars().all()
if not overdue:
return RedistributeResponse(redistributed_count=0, spread_days=body.spread_days)
base_day = today_start  # based on today 00:00; spreading starts from +1 day
for i, p in enumerate(overdue):
days_offset = (i % body.spread_days) + 1
# Add i*7 minutes to spread items within the same day too (up to about 200 items fit before the minute offset wraps around 24 hours)
minute_offset = (i * 7) % (24 * 60)
p.due_at = base_day + timedelta(days=days_offset, minutes=minute_offset)
await session.commit()
return RedistributeResponse(
redistributed_count=len(overdue), spread_days=body.spread_days
)
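The rescheduling arithmetic in the redistribute handler above can be checked in isolation. The sketch below reproduces the same offsets (day `(i % spread_days) + 1`, plus `(i * 7) % 1440` minutes) as a pure function; `spread_due_dates` is an illustrative name, not part of the repository.

```python
from datetime import datetime, timedelta, timezone
from typing import List


def spread_due_dates(count: int, spread_days: int, today_start: datetime) -> List[datetime]:
    """Return new due times for `count` overdue items, oldest first."""
    result = []
    for i in range(count):
        days_offset = (i % spread_days) + 1      # round-robin over the next spread_days days
        minute_offset = (i * 7) % (24 * 60)      # stagger items within a day
        result.append(today_start + timedelta(days=days_offset, minutes=minute_offset))
    return result
```

With 4 items and `spread_days=3`, the fourth item wraps back to the first day but lands 21 minutes after midnight, so it does not collide with the first item.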
# ─── stats (Phase 2-D stats dashboard) ───
class StatsQuestions(BaseModel):
total: int
attempted: int
unattempted: int
class StatsDue(BaseModel):
today: int
this_week: int
later: int
mastered: int
class StatsSessionTrendItem(BaseModel):
id: int
finished_at: datetime
total: int
correct_count: int
wrong_count: int
unsure_count: int
accuracy: int # 0~100
newly_correct_count: int
relapsed_count: int
recovered_count: int
class StatsDailyAttempt(BaseModel):
date: date
count: int
class StatsSubjectBreakdown(BaseModel):
subject: str
total: int
attempted: int
last_correct: int
accuracy: int # 0~100
pending_review: int
chronic: int
class StatsAiExplanation(BaseModel):
"""Phase 4-A operational observation: AI explanation cache progress + worker results for the last 7 days."""
# Distribution of study_questions.ai_explanation_status (whole topic)
status_distribution: dict # 'none' / 'ready' / 'failed' / 'skipped' / 'stale' / 'pending'
# Share of wrong/unsure questions that are 'ready' (estimates cache-hit likelihood)
target_total: int # number of qids with progress.last_outcome IN (wrong, unsure)
target_ready: int # of those, how many have ai_explanation_status='ready'
# Distribution of (status, error_code) in study_question_jobs over the last 7 days
recent_jobs: dict # {'completed': N, 'failed:guard_fail': N, 'failed:parse_fail': N, 'skipped:evidence_missing': N, 'pending': N, ...}
class StatsResponse(BaseModel):
questions: StatsQuestions
pattern_distribution: dict # state(or "unattempted") → count
review_stage_distribution: dict # "0"/"1"/"2"/"3"/"mastered" → count
due: StatsDue
session_trend: list[StatsSessionTrendItem] # recent done sessions, newest→oldest
daily_attempts_30d: list[StatsDailyAttempt]
subject_breakdown: list[StatsSubjectBreakdown]
ai_explanation: StatsAiExplanation
@router.get("/{topic_id}/stats", response_model=StatsResponse)
async def topic_stats(
topic_id: int,
user: Annotated[User, Depends(get_current_user)],
session: Annotated[AsyncSession, Depends(get_session)],
session_trend_limit: int = Query(20, ge=1, le=100),
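):
"""Stats dashboard: aggregates progress + quiz_sessions + attempts in one call.
A bundle of 6 to 7 light queries. Assuming single-user operation and a few thousand progress rows per topic, this is fine without extra indexes.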
"""
topic = await session.get(StudyTopic, topic_id)
_verify_topic_owner(topic, user)
now = datetime.now(timezone.utc)
# 1. Question progress: total questions in the topic + number of progress rows (attempted)
total_q_row = await session.execute(
select(func.count())
.select_from(StudyQuestion)
.where(
StudyQuestion.user_id == user.id,
StudyQuestion.study_topic_id == topic_id,
StudyQuestion.deleted_at.is_(None),
)
)
total_q = int(total_q_row.scalar() or 0)
attempted_row = await session.execute(
select(func.count())
.select_from(StudyQuestionProgress)
.where(
StudyQuestionProgress.user_id == user.id,
StudyQuestionProgress.study_topic_id == topic_id,
StudyQuestionProgress.last_outcome.is_not(None),
)
)
attempted = int(attempted_row.scalar() or 0)
unattempted = max(0, total_q - attempted)
# 2. pattern_state distribution (NULL is counted as "unattempted")
pattern_rows = (
await session.execute(
select(
func.coalesce(StudyQuestionProgress.pattern_state, "unattempted").label("state"),
func.count().label("cnt"),
)
.where(
StudyQuestionProgress.user_id == user.id,
StudyQuestionProgress.study_topic_id == topic_id,
)
.group_by("state")
)
).all()
pattern_distribution = {r.state: int(r.cnt) for r in pattern_rows}
# Fill every key with a default of 0 so the UI never has to handle a missing key
for k in ("stable", "unstable", "unsure", "regressed", "recovered", "chronic_wrong", "unattempted"):
pattern_distribution.setdefault(k, 0)
# Add questions that were never attempted (no progress row at all) to unattempted
pattern_distribution["unattempted"] += unattempted
# 3. review_stage distribution: 0/1/2/3/mastered (>=4)
stage_rows = (
await session.execute(
select(
StudyQuestionProgress.review_stage.label("stage"),
func.count().label("cnt"),
)
.where(
StudyQuestionProgress.user_id == user.id,
StudyQuestionProgress.study_topic_id == topic_id,
StudyQuestionProgress.review_stage.is_not(None),
)
.group_by(StudyQuestionProgress.review_stage)
)
).all()
review_stage_distribution = {"0": 0, "1": 0, "2": 0, "3": 0, "mastered": 0}
for r in stage_rows:
st = int(r.stage)
if st >= 4:
review_stage_distribution["mastered"] += int(r.cnt)
elif 0 <= st <= 3:
review_stage_distribution[str(st)] += int(r.cnt)
# 4. Due buckets: today / this_week / later / mastered
end_today = now.replace(hour=23, minute=59, second=59, microsecond=999999)
end_week = end_today + timedelta(days=7)
due_rows = (
await session.execute(
select(
func.count().filter(
and_(
StudyQuestionProgress.due_at.is_not(None),
StudyQuestionProgress.due_at <= end_today,
or_(
StudyQuestionProgress.review_stage.is_(None),
StudyQuestionProgress.review_stage < 4,
),
)
).label("today"),
func.count().filter(
and_(
StudyQuestionProgress.due_at.is_not(None),
StudyQuestionProgress.due_at > end_today,
StudyQuestionProgress.due_at <= end_week,
or_(
StudyQuestionProgress.review_stage.is_(None),
StudyQuestionProgress.review_stage < 4,
),
)
).label("this_week"),
func.count().filter(
and_(
StudyQuestionProgress.due_at.is_not(None),
StudyQuestionProgress.due_at > end_week,
or_(
StudyQuestionProgress.review_stage.is_(None),
StudyQuestionProgress.review_stage < 4,
),
)
).label("later"),
func.count().filter(
and_(
StudyQuestionProgress.review_stage.is_not(None),
StudyQuestionProgress.review_stage >= 4,
)
).label("mastered"),
)
.where(
StudyQuestionProgress.user_id == user.id,
StudyQuestionProgress.study_topic_id == topic_id,
)
)
).first()
due = StatsDue(
today=int(due_rows.today or 0),
this_week=int(due_rows.this_week or 0),
later=int(due_rows.later or 0),
mastered=int(due_rows.mastered or 0),
)
# 5. Recent done-session trend (uses the four Phase 2-B columns)
sess_rows = (
await session.execute(
select(StudyQuizSession)
.where(
StudyQuizSession.user_id == user.id,
StudyQuizSession.study_topic_id == topic_id,
StudyQuizSession.status == "done",
StudyQuizSession.finished_at.is_not(None),
)
.order_by(StudyQuizSession.finished_at.desc())
.limit(session_trend_limit)
)
).scalars().all()
session_trend: list[StatsSessionTrendItem] = []
for s in sess_rows:
total_n = len(s.question_ids or [])
acc = round((s.correct_count / total_n) * 100) if total_n > 0 else 0
session_trend.append(StatsSessionTrendItem(
id=s.id,
finished_at=s.finished_at,
total=total_n,
correct_count=s.correct_count,
wrong_count=s.wrong_count,
unsure_count=s.unsure_count,
accuracy=acc,
newly_correct_count=s.newly_correct_count,
relapsed_count=s.relapsed_count,
recovered_count=s.recovered_count,
))
# 6. Daily attempt counts for 30 days (by UTC date; time-zone differences are a Phase 5 candidate)
start_30d = (now - timedelta(days=29)).replace(hour=0, minute=0, second=0, microsecond=0)
daily_rows = (
await session.execute(
select(
cast(StudyQuestionAttempt.answered_at, SQLDate).label("d"),
func.count().label("cnt"),
)
.where(
StudyQuestionAttempt.user_id == user.id,
StudyQuestionAttempt.study_topic_id == topic_id,
StudyQuestionAttempt.answered_at >= start_30d,
)
.group_by("d")
.order_by("d")
)
).all()
daily_attempts_30d = [StatsDailyAttempt(date=r.d, count=int(r.cnt)) for r in daily_rows]
# 7. Weak spots by subject
subj_rows = (
await session.execute(
select(
func.coalesce(StudyQuestion.subject, "(미분류)").label("subject"),
func.count(StudyQuestion.id.distinct()).label("total"),
func.count(StudyQuestionProgress.id.distinct()).filter(
StudyQuestionProgress.last_outcome.is_not(None)
).label("attempted"),
func.count(StudyQuestionProgress.id.distinct()).filter(
StudyQuestionProgress.last_outcome == "correct"
).label("last_correct"),
func.count(StudyQuestionProgress.id.distinct()).filter(
and_(
StudyQuestionProgress.last_outcome.in_(("wrong", "unsure")),
or_(
StudyQuestionProgress.last_reviewed_at.is_(None),
and_(
StudyQuestionProgress.last_reviewed_at.is_not(None),
StudyQuestionProgress.last_attempted_at.is_not(None),
StudyQuestionProgress.last_reviewed_at
< StudyQuestionProgress.last_attempted_at,
),
),
)
).label("pending_review"),
func.count(StudyQuestionProgress.id.distinct()).filter(
StudyQuestionProgress.pattern_state == "chronic_wrong"
).label("chronic"),
)
.select_from(StudyQuestion)
.outerjoin(
StudyQuestionProgress,
and_(
StudyQuestionProgress.user_id == StudyQuestion.user_id,
StudyQuestionProgress.study_topic_id == StudyQuestion.study_topic_id,
StudyQuestionProgress.study_question_id == StudyQuestion.id,
),
)
.where(
StudyQuestion.user_id == user.id,
StudyQuestion.study_topic_id == topic_id,
StudyQuestion.deleted_at.is_(None),
)
.group_by("subject")
.order_by(func.count(StudyQuestion.id.distinct()).desc())
)
).all()
subject_breakdown = [
StatsSubjectBreakdown(
subject=r.subject,
total=int(r.total),
attempted=int(r.attempted),
last_correct=int(r.last_correct),
accuracy=round((int(r.last_correct) / int(r.attempted)) * 100) if int(r.attempted) > 0 else 0,
pending_review=int(r.pending_review),
chronic=int(r.chronic),
)
for r in subj_rows
]
# 8. Phase 4-A: AI explanation cache progress + worker results for the last 7 days
# 8a. Distribution of study_questions.ai_explanation_status (entire topic)
ai_status_rows = (
await session.execute(
select(
func.coalesce(StudyQuestion.ai_explanation_status, "none").label("st"),
func.count().label("cnt"),
)
.where(
StudyQuestion.user_id == user.id,
StudyQuestion.study_topic_id == topic_id,
StudyQuestion.deleted_at.is_(None),
)
.group_by("st")
)
).all()
ai_status_distribution = {r.st: int(r.cnt) for r in ai_status_rows}
for k in ("none", "ready", "failed", "skipped", "stale", "pending"):
ai_status_distribution.setdefault(k, 0)
# 8b. Share of wrong/unsure questions that are ready (cache-hit likelihood)
target_total_row = await session.execute(
select(func.count())
.select_from(StudyQuestionProgress)
.where(
StudyQuestionProgress.user_id == user.id,
StudyQuestionProgress.study_topic_id == topic_id,
StudyQuestionProgress.last_outcome.in_(("wrong", "unsure")),
)
)
target_total = int(target_total_row.scalar() or 0)
target_ready_row = await session.execute(
select(func.count())
.select_from(StudyQuestionProgress)
.join(
StudyQuestion,
and_(
StudyQuestion.id == StudyQuestionProgress.study_question_id,
StudyQuestion.deleted_at.is_(None),
),
)
.where(
StudyQuestionProgress.user_id == user.id,
StudyQuestionProgress.study_topic_id == topic_id,
StudyQuestionProgress.last_outcome.in_(("wrong", "unsure")),
StudyQuestion.ai_explanation_status == "ready",
)
)
target_ready = int(target_ready_row.scalar() or 0)
# 8c. study_question_jobs distribution for the last 7 days: terminal status × error_code
from models.study_question_job import StudyQuestionJob
recent_cutoff = now - timedelta(days=7)
job_rows = (
await session.execute(
select(
StudyQuestionJob.status.label("st"),
func.coalesce(StudyQuestionJob.error_code, "").label("err"),
func.count().label("cnt"),
)
.join(
StudyQuestion,
and_(
StudyQuestion.id == StudyQuestionJob.study_question_id,
StudyQuestion.study_topic_id == topic_id,
StudyQuestion.user_id == user.id,
),
)
.where(
StudyQuestionJob.user_id == user.id,
StudyQuestionJob.created_at >= recent_cutoff,
)
.group_by("st", "err")
)
).all()
recent_jobs: dict[str, int] = {}
for r in job_rows:
key = f"{r.st}:{r.err}" if r.err else r.st
recent_jobs[key] = int(r.cnt)
return StatsResponse(
questions=StatsQuestions(
total=total_q, attempted=attempted, unattempted=unattempted
),
pattern_distribution=pattern_distribution,
review_stage_distribution=review_stage_distribution,
due=due,
session_trend=session_trend,
daily_attempts_30d=daily_attempts_30d,
subject_breakdown=subject_breakdown,
ai_explanation=StatsAiExplanation(
status_distribution=ai_status_distribution,
target_total=target_total,
target_ready=target_ready,
recent_jobs=recent_jobs,
),
)
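The due bucketing in step 4 of `topic_stats` can be sketched in plain Python. This is a minimal sketch that mirrors the four filtered counts, not the production code: the function name `classify_due` and the constant `MASTERED_STAGE` are hypothetical, and rows without a due date that are not mastered fall into no bucket, as in the query.

```python
from datetime import datetime, timedelta, timezone

MASTERED_STAGE = 4  # hypothetical name; the query treats review_stage >= 4 as mastered


def classify_due(due_at, review_stage, now):
    """Return 'mastered', 'today', 'this_week', 'later', or None (not counted)."""
    # Mastered rows are counted regardless of due_at.
    if review_stage is not None and review_stage >= MASTERED_STAGE:
        return "mastered"
    # Non-mastered rows without a due date match none of the filters.
    if due_at is None:
        return None
    end_today = now.replace(hour=23, minute=59, second=59, microsecond=999999)
    end_week = end_today + timedelta(days=7)
    if due_at <= end_today:
        return "today"  # overdue rows also land here
    if due_at <= end_week:
        return "this_week"
    return "later"


now = datetime(2026, 4, 8, 12, 0, tzinfo=timezone.utc)
for due_at, stage in [
    (now - timedelta(days=3), 1),
    (now + timedelta(days=2), None),
    (now + timedelta(days=30), 2),
    (now + timedelta(days=1), 5),
]:
    print(due_at.date(), stage, classify_due(due_at, stage, now))
```

Because every non-mastered bucket requires `review_stage` to be NULL or below 4, the four buckets are mutually exclusive, so a row is never counted twice.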
-54
View File
@@ -1,54 +0,0 @@
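"""study_reminders API: read the reminder payload (study memorization notes Phase 1, workstream A).
GET /latest = the single most recently fired reminder (a snapshot of what is currently due). Returns 204 if there is none.
After a full day offline, earlier slots (09/13) are lost; this is intentional ("current due only"). Push channels and device UX are deferred to P3.
The router (prefix=/api/study-reminders) avoids path collisions with /study-topics and /study-questions.
"""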
from __future__ import annotations
from datetime import datetime
from typing import Annotated
from fastapi import APIRouter, Depends, Response
from pydantic import BaseModel
from sqlalchemy import select
from sqlalchemy.ext.asyncio import AsyncSession
from core.auth import get_current_user
from core.database import get_session
from models.study_reminder import StudyReminder
from models.user import User
router = APIRouter()
class ReminderResponse(BaseModel):
id: int
due_count: int | None = None
focus_topic_names: list | None = None
fired_at: datetime
@router.get("/latest", response_model=ReminderResponse)
async def latest_reminder(
user: Annotated[User, Depends(get_current_user)],
session: Annotated[AsyncSession, Depends(get_session)],
):
"""One summary of what is currently due. Returns 204 No Content if there is none."""
row = (
await session.execute(
select(StudyReminder)
.where(StudyReminder.user_id == user.id)
.order_by(StudyReminder.fired_at.desc())
.limit(1)
)
).scalar_one_or_none()
if row is None:
return Response(status_code=204)
return ReminderResponse(
id=row.id,
due_count=row.due_count,
focus_topic_names=row.focus_topic_names,
fired_at=row.fired_at,
)
-927
View File
@@ -1,927 +0,0 @@
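"""Study session API: Phase 1 MVP (generalized for certification + language study)
iPad handwriting transcription, mobile memorization notes, and mobile quizzes share the same study_sessions data.
This module covers Phase 1 only: iPad transcription sessions plus the DB/API generalization.
Key points:
- study_type branches on 'certification' | 'language'. metadata is jsonb holding free-form, domain-specific metadata.
- No single *_document_id column. All media links go through study_session_assets.
- The documents themselves are never deleted (only the assets link is removed).
- Ownership check: study_sessions.user_id == current_user.id (required).
documents has no user_id column because the system is single-user; for a future multi-user setup,
`getattr(doc, 'user_id', None)` checks leniently (compare if present, pass if absent).
- 409 duplicate: violation of UNIQUE(study_session_id, document_id, asset_type, role).
Fields unused until Phase 2~4 (review_state / quiz / ocr_text / ai_summary / prompt)
exist only in the schema with no automatic logic. They will be enabled in separate PRs.
"""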
import asyncio
import logging
from datetime import datetime, timezone
from pathlib import Path
from typing import Annotated, Any
from fastapi import (
APIRouter,
Depends,
Form,
HTTPException,
Query,
Request,
UploadFile,
)
from pydantic import BaseModel, Field
from sqlalchemy import and_, delete, func, select
from sqlalchemy.exc import IntegrityError
from sqlalchemy.ext.asyncio import AsyncSession
from sqlalchemy.orm import selectinload
from starlette.requests import ClientDisconnect
from core.auth import get_current_user
from core.config import settings
from core.database import get_session
from core.utils import file_hash
from models.document import Document
from models.queue import enqueue_stage
from models.study_session import StudySession, StudySessionAsset
from models.user import User
logger = logging.getLogger(__name__)
router = APIRouter()
# ─── Enum validation constants ───
VALID_STUDY_TYPES: set[str] = {"certification", "language"}
VALID_MODES: set[str] = {
"copy", "trace", "blank-repeat",
"dictation", "shadowing",
"quiz", "flashcard", # enabled in Phase 2~4; for now only accepted by the schema
}
VALID_ASSET_TYPES: set[str] = {
"source_scan", "handwriting_png", "audio", "video", "transcript", "reference",
}
VALID_ROLES: set[str | None] = {
None,
"prompt", "answer", "pronunciation", "lecture",
"listening_source", "shadowing_source", "reference",
}
VALID_REVIEW_STATES: set[str | None] = {
None, "new", "learning", "weak", "mastered",
}
VALID_ORDERS: set[str] = {"created_at", "next_review_at", "last_quiz_at"}
# ─── Helpers ───
def _upload_error(status_code: int, error_code: str, message: str) -> HTTPException:
"""Upload failure response, following the same pattern as documents.py."""
return HTTPException(
status_code=status_code,
detail={"error_code": error_code, "message": message},
)
def _verify_session_ownership(
sess: StudySession | None, user: User
) -> StudySession:
"""Verify session ownership. A mismatch also returns 404 to avoid leaking information."""
if sess is None or sess.user_id != user.id:
raise HTTPException(status_code=404, detail="학습 세션을 찾을 수 없습니다")
return sess
def _verify_document_ownership(doc: Document | None, user: User) -> Document:
"""Verify document ownership.
The documents.user_id column does not exist yet because this is currently a single-user system.
`getattr` keeps the comparison safe for a future multi-user setup.
"""
if doc is None or getattr(doc, "deleted_at", None) is not None:
raise HTTPException(status_code=404, detail="문서를 찾을 수 없습니다")
doc_user_id = getattr(doc, "user_id", None)
if doc_user_id is not None and doc_user_id != user.id:
raise HTTPException(status_code=404, detail="문서를 찾을 수 없습니다")
return doc
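The lenient ownership rule in `_verify_document_ownership` (compare when a `user_id` attribute exists, pass when it does not) can be illustrated without the ORM. A minimal sketch, assuming two hypothetical stand-in classes `LegacyDoc` and `OwnedDoc` and a hypothetical boolean helper `can_access` in place of raising 404:

```python
from dataclasses import dataclass


@dataclass
class LegacyDoc:
    """Stand-in for today's documents row: no user_id column."""
    id: int
    deleted_at: object = None


@dataclass
class OwnedDoc:
    """Stand-in for a future multi-user documents row."""
    id: int
    user_id: int
    deleted_at: object = None


def can_access(doc, user_id):
    """Mirror the helper's checks, returning a bool instead of raising."""
    # Missing or soft-deleted documents are never accessible.
    if doc is None or getattr(doc, "deleted_at", None) is not None:
        return False
    # getattr falls back to None when the attribute does not exist,
    # so rows without an owner column pass the check.
    owner = getattr(doc, "user_id", None)
    return owner is None or owner == user_id


print(can_access(LegacyDoc(id=1), user_id=7))
print(can_access(OwnedDoc(id=2, user_id=8), user_id=7))
```

The trade-off of this design is that a row whose `user_id` is NULL is treated as accessible by everyone, which is only safe while the system stays single-user.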
# ─── Pydantic Schemas ───
class StudySessionAssetCreate(BaseModel):
document_id: int
asset_type: str
role: str | None = None
sort_order: int = 0
class StudySessionAssetResponse(BaseModel):
id: int
document_id: int
asset_type: str
role: str | None
sort_order: int
created_at: datetime
class Config:
from_attributes = True
class StudySessionCreate(BaseModel):
study_type: str = "certification"
certification: str | None = None
language_code: str | None = None
learning_level: str | None = None
subject: str | None = None
topic: str | None = None
source_text: str | None = None
source_page: int | None = None
mode: str = "copy"
prompt_question: str | None = None
expected_answer: str | None = None
metadata: dict[str, Any] | None = None
target_count: int | None = None
canvas_width: int | None = None
canvas_height: int | None = None
strokes_json: dict[str, Any] | None = None
# Study workspace grouping. Uncategorized when omitted.
study_topic_id: int | None = None
class StudySessionUpdate(BaseModel):
"""PATCH partial update: only explicitly set fields are applied."""
certification: str | None = None
language_code: str | None = None
learning_level: str | None = None
subject: str | None = None
topic: str | None = None
source_text: str | None = None
source_page: int | None = None
mode: str | None = None
prompt_question: str | None = None
expected_answer: str | None = None
metadata: dict[str, Any] | None = None
target_count: int | None = None
repetition_count: int | None = None
canvas_width: int | None = None
canvas_height: int | None = None
strokes_json: dict[str, Any] | None = None
ocr_text: str | None = None
user_corrected_text: str | None = None
review_state: str | None = None
next_review_at: datetime | None = None
# Reassign the topic (set to NULL to detach)
study_topic_id: int | None = None
class StudySessionResponse(BaseModel):
id: int
user_id: int
study_type: str
certification: str | None
language_code: str | None
learning_level: str | None
subject: str | None
topic: str | None
source_text: str | None
source_page: int | None
mode: str
prompt_question: str | None
expected_answer: str | None
metadata: dict[str, Any] | None = Field(default=None)
target_count: int | None
repetition_count: int
canvas_width: int | None
canvas_height: int | None
schema_version: int
strokes_json: dict[str, Any] | None
ocr_text: str | None
user_corrected_text: str | None
ai_summary: str | None
review_state: str | None
next_review_at: datetime | None
last_quiz_at: datetime | None
correct_count: int
incorrect_count: int
study_topic_id: int | None = None
assets: list[StudySessionAssetResponse]
created_at: datetime
updated_at: datetime
class StudySessionListResponse(BaseModel):
items: list[StudySessionResponse]
total: int
limit: int
offset: int
def _to_session_response(sess: StudySession) -> StudySessionResponse:
return StudySessionResponse(
id=sess.id,
user_id=sess.user_id,
study_type=sess.study_type,
certification=sess.certification,
language_code=sess.language_code,
learning_level=sess.learning_level,
subject=sess.subject,
topic=sess.topic,
source_text=sess.source_text,
source_page=sess.source_page,
mode=sess.mode,
prompt_question=sess.prompt_question,
expected_answer=sess.expected_answer,
metadata=sess.metadata_json,
target_count=sess.target_count,
repetition_count=sess.repetition_count,
canvas_width=sess.canvas_width,
canvas_height=sess.canvas_height,
schema_version=sess.schema_version,
strokes_json=sess.strokes_json,
ocr_text=sess.ocr_text,
user_corrected_text=sess.user_corrected_text,
ai_summary=sess.ai_summary,
review_state=sess.review_state,
next_review_at=sess.next_review_at,
last_quiz_at=sess.last_quiz_at,
correct_count=sess.correct_count,
incorrect_count=sess.incorrect_count,
study_topic_id=sess.study_topic_id,
assets=[
StudySessionAssetResponse.model_validate(a) for a in (sess.assets or [])
],
created_at=sess.created_at,
updated_at=sess.updated_at,
)
def _validate_create_payload(body: StudySessionCreate) -> None:
if body.study_type not in VALID_STUDY_TYPES:
raise HTTPException(
status_code=422,
detail=f"study_type 은 {sorted(VALID_STUDY_TYPES)} 중 하나여야 합니다",
)
if body.mode not in VALID_MODES:
raise HTTPException(
status_code=422,
detail=f"mode 는 {sorted(VALID_MODES)} 중 하나여야 합니다",
)
# ─── Endpoints ───
@router.post("/", response_model=StudySessionResponse, status_code=201)
async def create_study_session(
body: StudySessionCreate,
user: Annotated[User, Depends(get_current_user)],
session: Annotated[AsyncSession, Depends(get_session)],
):
"""Create a new study session.
Certification example: study_type='certification', certification='산업안전기사',
subject='산업안전보건법', topic='안전보건관리책임자의 직무', mode='copy'
Language example: study_type='language', language_code='ja', learning_level='JLPT N3',
subject='漢字', topic='安全', source_text='安全',
metadata={'reading':'あんぜん','meaning':'안전','unit_type':'kanji'}
"""
_validate_create_payload(body)
# If study_topic_id is given, verify ownership (blocks mapping to another user's topic)
if body.study_topic_id is not None:
from models.study_topic import StudyTopic as _Topic
topic = await session.get(_Topic, body.study_topic_id)
if topic is None or topic.user_id != user.id or topic.deleted_at is not None:
raise HTTPException(status_code=404, detail="학습 주제를 찾을 수 없습니다")
sess = StudySession(
user_id=user.id,
study_type=body.study_type,
certification=body.certification,
language_code=body.language_code,
learning_level=body.learning_level,
subject=body.subject,
topic=body.topic,
source_text=body.source_text,
source_page=body.source_page,
mode=body.mode,
prompt_question=body.prompt_question,
expected_answer=body.expected_answer,
metadata_json=body.metadata,
target_count=body.target_count,
canvas_width=body.canvas_width,
canvas_height=body.canvas_height,
strokes_json=body.strokes_json,
study_topic_id=body.study_topic_id,
)
session.add(sess)
await session.flush()
await session.commit()
# A new session has no assets, but refresh explicitly to avoid a lazy load under the async session
await session.refresh(sess, attribute_names=["assets"])
return _to_session_response(sess)
@router.get("/", response_model=StudySessionListResponse)
async def list_study_sessions(
user: Annotated[User, Depends(get_current_user)],
session: Annotated[AsyncSession, Depends(get_session)],
study_type: str | None = Query(None),
certification: str | None = Query(None),
language_code: str | None = Query(None),
learning_level: str | None = Query(None),
subject: str | None = Query(None),
topic: str | None = Query(None),
review_state: str | None = Query(None),
document_id: int | None = Query(None, description="이 문서가 연결된 세션만"),
asset_type: str | None = Query(None, description="이 asset_type 보유 세션만"),
mode: str | None = Query(None),
due_before: datetime | None = Query(None, description="next_review_at <= due_before"),
study_topic_id: int | None = Query(None, description="학습 워크스페이스(주제) id"),
order: str = Query("created_at"),
limit: int = Query(50, ge=1, le=200),
offset: int = Query(0, ge=0),
):
"""List study sessions. All filters are accepted from Phase 1 onward (in preparation for Phase 3/4 activation)."""
if study_type is not None and study_type not in VALID_STUDY_TYPES:
raise HTTPException(status_code=422, detail="study_type 값이 올바르지 않습니다")
if review_state is not None and review_state not in VALID_REVIEW_STATES:
raise HTTPException(status_code=422, detail="review_state 값이 올바르지 않습니다")
if asset_type is not None and asset_type not in VALID_ASSET_TYPES:
raise HTTPException(status_code=422, detail="asset_type 값이 올바르지 않습니다")
if mode is not None and mode not in VALID_MODES:
raise HTTPException(status_code=422, detail="mode 값이 올바르지 않습니다")
if order not in VALID_ORDERS:
raise HTTPException(status_code=422, detail="order 값이 올바르지 않습니다")
base = select(StudySession).where(StudySession.user_id == user.id)
if study_type is not None:
base = base.where(StudySession.study_type == study_type)
if certification is not None:
base = base.where(StudySession.certification == certification)
if language_code is not None:
base = base.where(StudySession.language_code == language_code)
if learning_level is not None:
base = base.where(StudySession.learning_level == learning_level)
if subject is not None:
base = base.where(StudySession.subject == subject)
if topic is not None:
base = base.where(StudySession.topic == topic)
if review_state is not None:
base = base.where(StudySession.review_state == review_state)
if mode is not None:
base = base.where(StudySession.mode == mode)
if due_before is not None:
base = base.where(StudySession.next_review_at <= due_before)
if study_topic_id is not None:
base = base.where(StudySession.study_topic_id == study_topic_id)
# assets join filter, implemented as an EXISTS subquery
if document_id is not None or asset_type is not None:
asset_conditions = [StudySessionAsset.study_session_id == StudySession.id]
if document_id is not None:
asset_conditions.append(StudySessionAsset.document_id == document_id)
if asset_type is not None:
asset_conditions.append(StudySessionAsset.asset_type == asset_type)
base = base.where(
select(StudySessionAsset.id)
.where(and_(*asset_conditions))
.exists()
)
count_query = select(func.count()).select_from(base.subquery())
total = (await session.execute(count_query)).scalar() or 0
if order == "next_review_at":
ordered = base.order_by(StudySession.next_review_at.asc().nullslast(), StudySession.id.desc())
elif order == "last_quiz_at":
ordered = base.order_by(StudySession.last_quiz_at.desc().nullslast(), StudySession.id.desc())
else:
ordered = base.order_by(StudySession.created_at.desc(), StudySession.id.desc())
ordered = (
ordered.options(selectinload(StudySession.assets))
.offset(offset)
.limit(limit)
)
rows = (await session.execute(ordered)).scalars().all()
return StudySessionListResponse(
items=[_to_session_response(s) for s in rows],
total=total,
limit=limit,
offset=offset,
)
@router.get("/groups")
async def get_study_groups(
user: Annotated[User, Depends(get_current_user)],
session: Annotated[AsyncSession, Depends(get_session)],
):
"""Group counts per domain (endpoint provided from Phase 1 ahead of the Phase 3 mobile card menu).
Response: {by_type: {certification: {...}, language: {...}}}
"""
# certification groups: certification → subject → topic
cert_query = (
select(
StudySession.certification,
StudySession.subject,
StudySession.topic,
func.count().label("session_count"),
func.count().filter(StudySession.review_state == "weak").label("weak_count"),
func.count()
.filter(
and_(
StudySession.next_review_at.is_not(None),
StudySession.next_review_at <= datetime.now(timezone.utc),
)
)
.label("due_count"),
)
.where(
StudySession.user_id == user.id,
StudySession.study_type == "certification",
)
.group_by(StudySession.certification, StudySession.subject, StudySession.topic)
)
cert_rows = (await session.execute(cert_query)).all()
# language groups: language_code → learning_level → subject → topic + whether assets exist
lang_query = (
select(
StudySession.language_code,
StudySession.learning_level,
StudySession.subject,
StudySession.topic,
func.count().label("session_count"),
func.count().filter(StudySession.review_state == "weak").label("weak_count"),
func.count()
.filter(
and_(
StudySession.next_review_at.is_not(None),
StudySession.next_review_at <= datetime.now(timezone.utc),
)
)
.label("due_count"),
)
.where(
StudySession.user_id == user.id,
StudySession.study_type == "language",
)
.group_by(
StudySession.language_code,
StudySession.learning_level,
StudySession.subject,
StudySession.topic,
)
)
lang_rows = (await session.execute(lang_query)).all()
# has_audio / has_video for language groups: counted separately (join with assets)
media_query = (
select(
StudySession.language_code,
StudySession.learning_level,
StudySession.subject,
StudySession.topic,
StudySessionAsset.asset_type,
func.count().label("c"),
)
.join(StudySessionAsset, StudySessionAsset.study_session_id == StudySession.id)
.where(
StudySession.user_id == user.id,
StudySession.study_type == "language",
StudySessionAsset.asset_type.in_(["audio", "video"]),
)
.group_by(
StudySession.language_code,
StudySession.learning_level,
StudySession.subject,
StudySession.topic,
StudySessionAsset.asset_type,
)
)
media_rows = (await session.execute(media_query)).all()
media_map: dict[tuple, dict[str, int]] = {}
for r in media_rows:
key = (r.language_code, r.learning_level, r.subject, r.topic)
media_map.setdefault(key, {"audio": 0, "video": 0})[r.asset_type] = r.c
# Build the certification tree
cert_groups: dict[str | None, dict[str | None, dict[str | None, dict]]] = {}
for r in cert_rows:
cert_groups.setdefault(r.certification, {}).setdefault(r.subject, {})[r.topic] = {
"session_count": r.session_count,
"weak_count": r.weak_count,
"due_count": r.due_count,
}
cert_out = []
for cert_name, subjects in cert_groups.items():
subj_list = []
sess_total = weak_total = due_total = 0
for subj_name, topics in subjects.items():
topic_list = []
s_count = w_count = d_count = 0
for topic_name, stats in topics.items():
topic_list.append({
"topic": topic_name,
"session_count": stats["session_count"],
"weak_count": stats["weak_count"],
"due_count": stats["due_count"],
})
s_count += stats["session_count"]
w_count += stats["weak_count"]
d_count += stats["due_count"]
subj_list.append({
"subject": subj_name,
"topics": topic_list,
"session_count": s_count,
"weak_count": w_count,
"due_count": d_count,
})
sess_total += s_count
weak_total += w_count
due_total += d_count
cert_out.append({
"certification": cert_name,
"subjects": subj_list,
"session_count": sess_total,
"weak_count": weak_total,
"due_count": due_total,
})
# Build the language tree
lang_groups: dict[str | None, dict[str | None, dict[str | None, dict[str | None, dict]]]] = {}
for r in lang_rows:
media = media_map.get(
(r.language_code, r.learning_level, r.subject, r.topic),
{"audio": 0, "video": 0},
)
(
lang_groups
.setdefault(r.language_code, {})
.setdefault(r.learning_level, {})
.setdefault(r.subject, {})[r.topic]
) = {
"session_count": r.session_count,
"weak_count": r.weak_count,
"due_count": r.due_count,
"has_audio": media["audio"] > 0,
"has_video": media["video"] > 0,
}
lang_out = []
for lang_code, levels in lang_groups.items():
for level_name, subjects in levels.items():
subj_list = []
for subj_name, topics in subjects.items():
topic_list = []
for topic_name, stats in topics.items():
topic_list.append({
"topic": topic_name,
"session_count": stats["session_count"],
"weak_count": stats["weak_count"],
"due_count": stats["due_count"],
"has_audio": stats["has_audio"],
"has_video": stats["has_video"],
})
subj_list.append({"subject": subj_name, "topics": topic_list})
lang_out.append({
"language_code": lang_code,
"learning_level": level_name,
"subjects": subj_list,
})
return {
"by_type": {
"certification": {"groups": cert_out},
"language": {"groups": lang_out},
}
}
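The certification tree in `get_study_groups` is built in two passes: nested `setdefault` calls index the flat rows, then counts are rolled up from topic to subject to certification. A compact sketch of that shape, assuming rows arrive as plain tuples; `build_cert_tree` and `COUNT_KEYS` are hypothetical names, and the sample values are made up for illustration:

```python
COUNT_KEYS = ("session_count", "weak_count", "due_count")


def build_cert_tree(rows):
    """rows: (certification, subject, topic, session_count, weak_count, due_count)."""
    # Pass 1: index the flat rows as certification -> subject -> topic -> stats.
    groups = {}
    for cert, subj, topic, sessions, weak, due in rows:
        groups.setdefault(cert, {}).setdefault(subj, {})[topic] = {
            "session_count": sessions,
            "weak_count": weak,
            "due_count": due,
        }
    # Pass 2: emit lists and roll the counts up one level at a time.
    out = []
    for cert, subjects in groups.items():
        subj_list = []
        for subj, topics in subjects.items():
            topic_list = [{"topic": name, **stats} for name, stats in topics.items()]
            subj_totals = {k: sum(t[k] for t in topic_list) for k in COUNT_KEYS}
            subj_list.append({"subject": subj, "topics": topic_list, **subj_totals})
        cert_totals = {k: sum(s[k] for s in subj_list) for k in COUNT_KEYS}
        out.append({"certification": cert, "subjects": subj_list, **cert_totals})
    return out


sample_rows = [
    ("CertA", "Law", "T1", 3, 1, 2),
    ("CertA", "Law", "T2", 2, 0, 1),
    ("CertA", "Safety", "T3", 4, 2, 0),
    (None, None, None, 1, 0, 0),  # NULL grouping columns become None keys
]
tree = build_cert_tree(sample_rows)
print(tree[0]["certification"], tree[0]["session_count"])
```

Since `None` is a valid dictionary key, sessions with NULL certification, subject, or topic end up in their own group rather than being dropped.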
@router.get("/{session_id}", response_model=StudySessionResponse)
async def get_study_session(
session_id: int,
user: Annotated[User, Depends(get_current_user)],
session: Annotated[AsyncSession, Depends(get_session)],
):
sess = await session.get(
StudySession, session_id, options=[selectinload(StudySession.assets)]
)
sess = _verify_session_ownership(sess, user)
return _to_session_response(sess)
@router.patch("/{session_id}", response_model=StudySessionResponse)
async def update_study_session(
session_id: int,
body: StudySessionUpdate,
user: Annotated[User, Depends(get_current_user)],
session: Annotated[AsyncSession, Depends(get_session)],
):
sess = await session.get(
StudySession, session_id, options=[selectinload(StudySession.assets)]
)
sess = _verify_session_ownership(sess, user)
# Apply only the fields that were explicitly set
fields_set = body.model_fields_set
if "mode" in fields_set:
if body.mode not in VALID_MODES:
raise HTTPException(status_code=422, detail="mode 값이 올바르지 않습니다")
sess.mode = body.mode
if "review_state" in fields_set:
if body.review_state not in VALID_REVIEW_STATES:
raise HTTPException(status_code=422, detail="review_state 값이 올바르지 않습니다")
sess.review_state = body.review_state
# Verify ownership when study_topic_id changes
if "study_topic_id" in fields_set and body.study_topic_id is not None:
from models.study_topic import StudyTopic as _Topic
topic = await session.get(_Topic, body.study_topic_id)
if topic is None or topic.user_id != user.id or topic.deleted_at is not None:
raise HTTPException(status_code=404, detail="학습 주제를 찾을 수 없습니다")
# Simple mapped fields (no validation needed)
SIMPLE_FIELDS = {
"certification", "language_code", "learning_level", "subject", "topic",
"source_text", "source_page", "prompt_question", "expected_answer",
"target_count", "repetition_count",
"canvas_width", "canvas_height", "strokes_json",
"ocr_text", "user_corrected_text", "next_review_at",
"study_topic_id",
}
for fname in SIMPLE_FIELDS & fields_set:
setattr(sess, fname, getattr(body, fname))
if "metadata" in fields_set:
sess.metadata_json = body.metadata
sess.updated_at = datetime.now(timezone.utc)
await session.commit()
await session.refresh(sess, attribute_names=["assets"])
return _to_session_response(sess)
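`update_study_session` relies on Pydantic's `model_fields_set` to tell "field omitted" apart from "field explicitly set to null", which is what lets a client detach a topic by sending `study_topic_id: null`. The same semantics can be sketched with a plain dict standing in for the parsed request body; `apply_patch` and `ALLOWED` are hypothetical names used only for this illustration:

```python
ALLOWED = {"topic", "study_topic_id"}  # hypothetical subset of the simple fields


def apply_patch(current: dict, payload: dict, allowed: set) -> dict:
    """Copy only keys that are both present in the payload and allowed."""
    updated = dict(current)
    # Key presence, not value truthiness, decides whether a field is applied,
    # so an explicit None overwrites while a missing key leaves the value alone.
    for name in payload.keys() & allowed:
        updated[name] = payload[name]
    return updated


current = {"topic": "A", "study_topic_id": 3, "mode": "copy"}
print(apply_patch(current, {"study_topic_id": None}, ALLOWED))
print(apply_patch(current, {}, ALLOWED))
```

Fields outside the allowed set (here `mode`) are ignored by this helper, matching the endpoint's approach of validating such fields separately before assigning them.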
@router.delete("/{session_id}", status_code=204)
async def delete_study_session(
session_id: int,
user: Annotated[User, Depends(get_current_user)],
session: Annotated[AsyncSession, Depends(get_session)],
):
"""Delete a study session. Linked assets are removed with it by cascade (DB ON DELETE CASCADE).
The documents themselves are kept; only the assets rows disappear.
"""
sess = await session.get(StudySession, session_id)
sess = _verify_session_ownership(sess, user)
await session.delete(sess)
await session.commit()
# ─── Assets endpoints ───
@router.post(
"/{session_id}/assets",
response_model=StudySessionAssetResponse,
status_code=201,
)
async def link_study_asset(
session_id: int,
body: StudySessionAssetCreate,
user: Annotated[User, Depends(get_current_user)],
session: Annotated[AsyncSession, Depends(get_session)],
):
"""Link the id of an existing document to a study_session as an asset.
409: the same (session, document, asset_type, role) combination already exists.
"""
if body.asset_type not in VALID_ASSET_TYPES:
raise HTTPException(
status_code=422,
detail=f"asset_type 은 {sorted(VALID_ASSET_TYPES)} 중 하나여야 합니다",
)
if body.role not in VALID_ROLES:
raise HTTPException(
status_code=422,
detail=f"role 은 {sorted(r for r in VALID_ROLES if r is not None)} 중 하나 또는 NULL 이어야 합니다",
)
sess = await session.get(StudySession, session_id)
sess = _verify_session_ownership(sess, user)
doc = await session.get(Document, body.document_id)
_verify_document_ownership(doc, user)
# Check for duplicates with a prior SELECT and also rely on the DB UNIQUE constraint, so a race is still handled.
existing = await session.execute(
select(StudySessionAsset).where(
StudySessionAsset.study_session_id == session_id,
StudySessionAsset.document_id == body.document_id,
StudySessionAsset.asset_type == body.asset_type,
StudySessionAsset.role.is_(body.role) if body.role is None
else StudySessionAsset.role == body.role,
)
)
if existing.scalar_one_or_none() is not None:
raise HTTPException(
status_code=409,
detail={
"error_code": "asset_already_linked",
"message": "해당 문서가 이미 같은 asset_type/role 로 연결되어 있습니다",
},
)
asset = StudySessionAsset(
study_session_id=session_id,
document_id=body.document_id,
asset_type=body.asset_type,
role=body.role,
sort_order=body.sort_order,
)
session.add(asset)
try:
await session.commit()
except IntegrityError:
await session.rollback()
# UNIQUE violation: probably raced with the prior SELECT above. Respond with the same message.
raise HTTPException(
status_code=409,
detail={
"error_code": "asset_already_linked",
"message": "해당 문서가 이미 같은 asset_type/role 로 연결되어 있습니다",
},
)
await session.refresh(asset)
return StudySessionAssetResponse.model_validate(asset)
@router.delete(
"/{session_id}/assets/{asset_id}", status_code=204
)
async def unlink_study_asset(
session_id: int,
asset_id: int,
user: Annotated[User, Depends(get_current_user)],
session: Annotated[AsyncSession, Depends(get_session)],
):
"""Unlink an asset. The document itself is kept."""
sess = await session.get(StudySession, session_id)
sess = _verify_session_ownership(sess, user)
asset = await session.get(StudySessionAsset, asset_id)
if asset is None or asset.study_session_id != session_id:
raise HTTPException(status_code=404, detail="asset 을 찾을 수 없습니다")
await session.delete(asset)
await session.commit()
# ─── Snapshot (PNG 업로드) ───
@router.post("/{session_id}/snapshot", response_model=StudySessionAssetResponse, status_code=201)
async def upload_handwriting_snapshot(
session_id: int,
request: Request,
file: UploadFile,
user: Annotated[User, Depends(get_current_user)],
session: Annotated[AsyncSession, Depends(get_session)],
sort_order: int = Form(0),
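):
"""Canvas PNG upload → register in documents + link as a handwriting_png asset.
Borrows the atomic rename + error_code pattern of upload_document in documents.py, specialized for PNG.
Multiple snapshots can accumulate in one session (the UNIQUE constraint is per (session, document, type, role), and
document_id is new each time, so there is no conflict).
"""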
sess = await session.get(StudySession, session_id)
sess = _verify_session_ownership(sess, user)
if not file.filename:
raise _upload_error(400, "invalid_input", "파일명이 필요합니다")
safe_name = Path(file.filename).name
if not safe_name or safe_name.startswith("."):
raise _upload_error(400, "invalid_input", "유효하지 않은 파일명")
ext = Path(safe_name).suffix.lower()
if ext != ".png":
raise _upload_error(
400, "invalid_input", "snapshot 은 PNG 파일만 지원합니다",
)
max_bytes = settings.upload.max_bytes
slack_ratio = settings.upload.content_length_slack_ratio
chunk_size = settings.upload.stream_chunk_bytes
# Content-Length 사전 차단
cl_header = request.headers.get("content-length")
if cl_header:
try:
cl = int(cl_header)
if cl > int(max_bytes * slack_ratio):
raise _upload_error(413, "body_too_large", "파일이 너무 큽니다")
except ValueError:
pass
# NAS Inbox 경로 결정 + 충돌 회피
inbox_dir = Path(settings.nas_mount_path) / "PKM" / "Inbox"
inbox_dir.mkdir(parents=True, exist_ok=True)
target = (inbox_dir / safe_name).resolve()
if not target.is_relative_to(inbox_dir.resolve()):
raise _upload_error(400, "invalid_input", "잘못된 파일 경로")
counter = 1
stem, suffix = target.stem, target.suffix
staging = target.with_name(target.name + ".uploading")
while target.exists() or staging.exists():
target = inbox_dir.resolve() / f"{stem}_{counter}{suffix}"
staging = target.with_name(target.name + ".uploading")
counter += 1
# 스트리밍 저장 + 누적 사이즈 검증
written = 0
try:
with staging.open("wb") as f:
while chunk := await file.read(chunk_size):
written += len(chunk)
if written > max_bytes:
raise _upload_error(413, "body_too_large", "파일이 너무 큽니다")
f.write(chunk)
if written == 0:
raise _upload_error(400, "empty_file", "빈 파일은 업로드할 수 없습니다")
except ClientDisconnect:
staging.unlink(missing_ok=True)
logger.info("snapshot aborted by client: %s (written=%d)", safe_name, written)
raise _upload_error(499, "network_abort", "업로드가 취소되었습니다")
except asyncio.TimeoutError:
staging.unlink(missing_ok=True)
logger.warning("snapshot timeout: %s (written=%d)", safe_name, written)
raise _upload_error(408, "upload_timeout", "업로드 시간 초과")
except HTTPException:
staging.unlink(missing_ok=True)
raise
except Exception:
staging.unlink(missing_ok=True)
logger.exception("snapshot internal error: %s (written=%d)", safe_name, written)
raise _upload_error(500, "internal", "업로드 처리 중 오류가 발생했습니다")
# atomic rename → 최종 경로
try:
staging.replace(target)
except OSError:
staging.unlink(missing_ok=True)
logger.exception("snapshot rename failed: %s -> %s", staging, target)
raise _upload_error(500, "internal", "파일 저장 후 정리 중 오류가 발생했습니다")
# Document + ProcessingQueue('extract') + StudySessionAsset 단일 트랜잭션
rel_path = str(target.relative_to(Path(settings.nas_mount_path)))
fhash = file_hash(target)
# 학습 세션 메타에서 user_tags 합성
domain_tag = sess.certification or sess.language_code or "general"
user_tags = ["handwriting", domain_tag]
if sess.subject:
user_tags.append(sess.subject)
title = f"필기 — {sess.topic or sess.subject or 'study session'} #{session_id}"
try:
doc = Document(
file_path=rel_path,
file_hash=fhash,
file_format="png",
file_size=written,
file_type="immutable",
title=title,
user_tags=user_tags,
)
session.add(doc)
await session.flush()
await enqueue_stage(session, doc.id, "extract")
asset = StudySessionAsset(
study_session_id=session_id,
document_id=doc.id,
asset_type="handwriting_png",
role="answer",
sort_order=sort_order,
)
session.add(asset)
await session.commit()
await session.refresh(asset)
except Exception:
# The DB transaction rolls back automatically. The file is a separate resource, so unlink it explicitly.
target.unlink(missing_ok=True)
raise
return StudySessionAssetResponse.model_validate(asset)
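The snapshot upload above writes to a `.uploading` staging file, enforces the size limit while streaming, and only then renames onto the final path. A minimal synchronous sketch of that pattern follows. The names `save_atomically` and `UploadTooLarge` are hypothetical and not part of this repository, and the chunks come from a plain iterable rather than an `UploadFile`.

```python
from pathlib import Path


class UploadTooLarge(Exception):
    """Hypothetical error for a body that exceeds the configured limit."""


def save_atomically(chunks, target: Path, max_bytes: int) -> int:
    """Stream chunks into '<target>.uploading', then rename onto target.

    Returns the number of bytes written. On any failure the staging file
    is removed, so no partial file is left behind.
    """
    staging = target.with_name(target.name + ".uploading")
    written = 0
    try:
        with staging.open("wb") as f:
            for chunk in chunks:
                written += len(chunk)
                if written > max_bytes:
                    raise UploadTooLarge(f"{written} > {max_bytes}")
                f.write(chunk)
        if written == 0:
            raise ValueError("empty upload")
        # Path.replace wraps os.replace, which is atomic when the source
        # and the destination live on the same filesystem.
        staging.replace(target)
    except Exception:
        staging.unlink(missing_ok=True)
        raise
    return written
```

Because readers of the Inbox directory never see the final name until the rename succeeds, a crashed or aborted upload leaves at most a `.uploading` file for later cleanup.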
File diff suppressed because it is too large
-56
View File
@@ -1,56 +0,0 @@
"""Video thumbnail serving API: /api/video
ffmpeg thumbnail generation is done in thumbnail_worker. The router only serves the stored files.
"""
from pathlib import Path
from typing import Annotated
from fastapi import APIRouter, Depends, HTTPException, Query
from fastapi.responses import FileResponse
from sqlalchemy.ext.asyncio import AsyncSession
from core.auth import decode_token, get_current_user
from core.database import get_session
from models.document import Document
from models.user import User
router = APIRouter()
@router.get("/{doc_id}/thumbnail")
async def get_video_thumbnail(
doc_id: int,
session: Annotated[AsyncSession, Depends(get_session)],
token: str | None = Query(None, description="Bearer token (img src 용)"),
user: User | None = Depends(lambda: None),
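):
"""Serve the video thumbnail jpg. Can be bound as `<img src="...?token=...">`.
Authenticates with either the query token or the Authorization header. Same policy as the /file endpoint.
"""
# Validate the query token (for img src); same pattern as /file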
if not token:
raise HTTPException(status_code=401, detail="토큰이 필요합니다")
payload = decode_token(token)
if not payload or payload.get("type") != "access":
raise HTTPException(status_code=401, detail="유효하지 않은 토큰")
doc = await session.get(Document, doc_id)
if not doc or doc.deleted_at is not None:
raise HTTPException(status_code=404, detail="문서를 찾을 수 없습니다")
thumb = getattr(doc, "thumbnail_path", None)
if not thumb:
raise HTTPException(status_code=404, detail="썸네일이 아직 생성되지 않았습니다")
path = Path(thumb)
if not path.exists():
raise HTTPException(status_code=404, detail="썸네일 파일이 없습니다")
return FileResponse(
path=str(path),
media_type="image/jpeg",
headers={"Content-Disposition": "inline"},
)
+5 -80
View File
@@ -1,6 +1,5 @@
"""JWT + TOTP 2FA 인증"""
import os
from datetime import datetime, timedelta, timezone
from typing import Annotated
@@ -31,41 +30,15 @@ def hash_password(password: str) -> str:
return bcrypt.hashpw(password.encode(), bcrypt.gensalt()).decode()
def create_access_token(subject: str, expires_minutes: int | None = None) -> str:
minutes = expires_minutes if expires_minutes is not None else ACCESS_TOKEN_EXPIRE_MINUTES
now = datetime.now(timezone.utc)
expire = now + timedelta(minutes=minutes)
payload = {"sub": subject, "exp": expire, "iat": int(now.timestamp()), "type": "access"}
def create_access_token(subject: str) -> str:
expire = datetime.now(timezone.utc) + timedelta(minutes=ACCESS_TOKEN_EXPIRE_MINUTES)
payload = {"sub": subject, "exp": expire, "type": "access"}
return jwt.encode(payload, settings.jwt_secret, algorithm=ALGORITHM)
def create_voice_memo_bot_token(username: str) -> str | None:
# Voice Memo PoC v1: long-expiry access token for the bot account only (env gate + username hard-match).
# Returns None when called for a regular user. Proper service-account/api_keys come in Phase 2.
if os.getenv("VOICE_MEMO_BOT_TOKEN_ENABLED", "false").lower() != "true":
return None
bot_username = os.getenv("VOICE_MEMO_BOT_USERNAME", "voice-memo-bot")
if username != bot_username:
return None
expire_days = int(os.getenv("VOICE_MEMO_BOT_TOKEN_EXPIRE_DAYS", "365"))
return create_access_token(username, expires_minutes=expire_days * 24 * 60)
def create_laptop_worker_bot_token(username: str) -> str | None:
# PR-Worker-Pool-Registry-1B — laptop-worker-bot 계정 한정 long-expiry token (voice-memo 동형).
if os.getenv("LAPTOP_WORKER_BOT_TOKEN_ENABLED", "false").lower() != "true":
return None
bot_username = os.getenv("LAPTOP_WORKER_BOT_USERNAME", "laptop-worker-bot")
if username != bot_username:
return None
expire_days = int(os.getenv("LAPTOP_WORKER_BOT_TOKEN_EXPIRE_DAYS", "365"))
return create_access_token(username, expires_minutes=expire_days * 24 * 60)
def create_refresh_token(subject: str) -> str:
now = datetime.now(timezone.utc)
expire = now + timedelta(days=REFRESH_TOKEN_EXPIRE_DAYS)
payload = {"sub": subject, "exp": expire, "iat": int(now.timestamp()), "type": "refresh"}
expire = datetime.now(timezone.utc) + timedelta(days=REFRESH_TOKEN_EXPIRE_DAYS)
payload = {"sub": subject, "exp": expire, "type": "refresh"}
return jwt.encode(payload, settings.jwt_secret, algorithm=ALGORITHM)
@@ -76,21 +49,6 @@ def decode_token(token: str) -> dict | None:
return None
def verify_password_changed_at(payload: dict, user) -> None:
# Legacy compatibility: if password_changed_at is NULL, skip the check (tokens issued before the migration stay valid)
# Only tokens issued after the password change are accepted: iat (int seconds) >= int(password_changed_at.timestamp())
if user.password_changed_at is None:
return
iat = payload.get("iat")
pwd_changed_int = int(user.password_changed_at.timestamp())
if iat is None or pwd_changed_int > int(iat):
raise HTTPException(
status_code=status.HTTP_401_UNAUTHORIZED,
detail="비밀번호 변경 후 재로그인 필요",
)
def verify_totp(code: str, secret: str | None = None) -> bool:
"""TOTP 코드 검증 (유저별 secret 또는 글로벌 설정)"""
totp_secret = secret or settings.totp_secret
@@ -124,37 +82,4 @@ async def get_current_user(
status_code=status.HTTP_401_UNAUTHORIZED,
detail="유저를 찾을 수 없음",
)
verify_password_changed_at(payload, user)
return user
async def require_admin(
credentials: Annotated[HTTPAuthorizationCredentials, Depends(security)],
session: Annotated[AsyncSession, Depends(get_session)],
):
"""관리자 권한 확인 — 뉴스 소스 CRUD, 수집 트리거, digest 재생성 등"""
user = await get_current_user(credentials, session)
if not user.is_admin:
raise HTTPException(
status_code=status.HTTP_403_FORBIDDEN,
detail="관리자 권한 필요",
)
return user
async def require_worker_user(
credentials: Annotated[HTTPAuthorizationCredentials, Depends(security)],
session: Annotated[AsyncSession, Depends(get_session)],
):
"""PR-Worker-Pool-Registry-1B: authentication for /internal/worker/*.
Only laptop-worker-bot is allowed. voice-memo-bot or regular user tokens → 403.
"""
user = await get_current_user(credentials, session)
bot_username = os.getenv("LAPTOP_WORKER_BOT_USERNAME", "laptop-worker-bot")
if user.username != bot_username:
raise HTTPException(
status_code=status.HTTP_403_FORBIDDEN,
detail="worker user only",
)
return user
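The `verify_password_changed_at` check in this diff compares the token's `iat` claim with the user's `password_changed_at`, both truncated to whole seconds. The sketch below restates that comparison as a pure function. `token_is_stale` is a hypothetical name, and the real code raises a 401 `HTTPException` instead of returning a boolean.

```python
from __future__ import annotations

from datetime import datetime, timezone


def token_is_stale(iat: int | None, password_changed_at: datetime | None) -> bool:
    """Return True when a token predates the last password change."""
    if password_changed_at is None:
        # Legacy rows: no recorded password change, so nothing to compare.
        return False
    if iat is None:
        # Tokens issued without an iat claim cannot be proven fresh.
        return True
    # Both sides are whole seconds, so a token issued in the same second
    # as the password change still counts as fresh.
    return int(password_changed_at.timestamp()) > int(iat)
```

Truncating both values to integers avoids rejecting a token that was minted a few milliseconds after the change but carries a second-resolution `iat`.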
+7 -176
View File
@@ -7,15 +7,6 @@ import yaml
from pydantic import BaseModel
class UploadConfig(BaseModel):
max_bytes: int = 100_000_000
content_length_slack_ratio: float = 1.05
stream_chunk_bytes: int = 1_048_576
# orphan cleanup (`*.uploading` — 크래시/abort 후 잔존물)
orphan_max_age_sec: int = 3600
cleanup_warn_threshold: int = 10
class AIModelConfig(BaseModel):
endpoint: str
model: str
@@ -23,86 +14,16 @@ class AIModelConfig(BaseModel):
timeout: int = 60
daily_budget_usd: float | None = None
require_explicit_trigger: bool = False
# B-0: practical context limit (chars) assigned to the 4B/26B models. triage=120k, primary=260k.
# classify_worker consults it when deciding on escalation. 0/None means no limit.
context_char_limit: int | None = None
# P1 of family-adaptive-bengio (2026-05-23): config-driven sampling profile.
# None = MLX/OpenAI server default. Anthropic branch 는 미적용 (별 plan 범위).
temperature: float | None = None
top_p: float | None = None
class DeepSummaryBacklogConfig(BaseModel):
"""B-1 R2 — deep_summary enqueue 폭발 억제 임계치."""
ratio_threshold: float = 0.3 # 지난 window 의 deep_n/classify_n
pending_threshold: int = 5 # deep_summary pending+processing
window_minutes: int = 30
class SearchAskBackendConfig(BaseModel):
"""PR-2 of DS AI routing policy ([[document-server-ai-routing-policy]], 2026-05-23):
/api/search/ask backend dispatcher llm-router :8890 단일 경유.
- backend 미지정 / "gemma-macmini" / "mac-mini-default" router tier_b
- backend "qwen-macbook" router named upstream (M5 Max)
- backend "claude-cloud" router 503 명시 (scaffold)
- backend "auto" router rule + LLM triage
Unavailable BackendUnavailable 503 명시 (silent fallback 0).
Rollback: DS_BACKENDS_VIA_ROUTER=false legacy 직접 호출 path.
legacy macmini_url / macbook_url / macbook_model fallback 시만 사용.
"""
# PR-2 신규: llm-router URL. 비면 env LLM_ROUTER_URL 또는 hardcoded default.
router_url: str = ""
# Legacy fields (DS_BACKENDS_VIA_ROUTER=false 시만 사용)
macmini_url: str = "http://100.76.254.116:8801"
macbook_url: str = "http://100.118.112.84:8810"
macbook_model: str = "mlx-community/Qwen3.6-27B-8bit"
timeout_connect_s: int = 5
timeout_read_s: int = 60
class SearchAskReactConfig(BaseModel):
"""PR-DocSrv-Ask-ToolCalling-ReAct-1: /api/search/ask/react ReAct loop.
qwen-macbook only (endpoint 자체가 implicit opt-in). G0-2 counter semantics:
max_tool_rounds=2 LLM 호출 최대 3 (tool round 2 + final 1), search 실행 최대 2.
"""
enabled: bool = True
max_tool_rounds: int = 2
search_tool_limit: int = 5
search_tool_mode: str = "hybrid"
class SearchAskConfig(BaseModel):
backend: SearchAskBackendConfig = SearchAskBackendConfig()
react: SearchAskReactConfig = SearchAskReactConfig()
class SearchConfig(BaseModel):
ask: SearchAskConfig = SearchAskConfig()
class AIConfig(BaseModel):
gateway_endpoint: str
# B-0: 3-tier routing. triage/primary = Mac mini 26B MLX (PR #20 endpoint 통합). fallback = Claude Sonnet 4 API.
triage: AIModelConfig
primary: AIModelConfig
fallback: AIModelConfig
premium: AIModelConfig
embedding: AIModelConfig
vision: AIModelConfig
rerank: AIModelConfig
# Phase 3.5a: answerability classifier (optional — 없으면 score-only gate). PR #20 이후 Mac mini 26B MLX endpoint (initial = exaone3.5).
classifier: AIModelConfig | None = None
# Phase 3.5b: semantic verifier (optional — 없으면 grounding-only). PR #20 이후 Mac mini 26B MLX endpoint (initial = exaone3.5).
verifier: AIModelConfig | None = None
# Legacy: vision 슬롯 (현재 사용처 0 — Document Server 는 OCR/STT 별도 서비스).
# 제거 진행 중이므로 optional 로 관대한 로딩 유지.
vision: AIModelConfig | None = None
# B-1 R2: backlog guard 임계치
deep_summary_backlog: DeepSummaryBacklogConfig = DeepSummaryBacklogConfig()
class Settings(BaseModel):
@@ -112,9 +33,6 @@ class Settings(BaseModel):
# AI
ai: AIConfig | None = None
# PR-MacBook-RAG-Backend-1: /api/search/ask backend dispatcher
search: SearchConfig = SearchConfig()
# NAS
nas_mount_path: str = "/documents"
nas_pkm_root: str = "/documents/PKM"
@@ -123,65 +41,21 @@ class Settings(BaseModel):
jwt_secret: str = ""
totp_secret: str = ""
# Phase 3.5: eval runner shared secret — X-Source=eval / X-Eval-Case-Id 헤더 신뢰 검증.
# 비어있으면 모든 eval 헤더 거부 (부재 = 비활성).
eval_runner_token: str = ""
# kordoc
kordoc_endpoint: str = "http://kordoc-service:3100"
# OCR (Surya)
ocr_endpoint: str = "http://ocr-service:3200"
# STT (faster-whisper, §3)
stt_endpoint: str = "http://stt-service:3300"
# §3 file_watcher: Roon music library path (skipped by prefix match).
# Empty string = nothing is skipped. e.g. "/documents/PKM/../Music/roon-library" or
# a Roon library mounted separately over NFS.
roon_library_path: str = ""
# Extra scan paths for externally authored markdown material such as KGS Code (relative to PKM, comma-separated).
# env: ADDITIONAL_WATCH_TARGETS=Knowledge/Industrial_Safety/가스기사/KGS_Code,...
# All are handled as expected_category="library" (only document extensions such as md/pdf/docx are accepted).
# Only additions on top of the default Inbox/Recordings/Videos scan are allowed.
additional_watch_targets: list[str] = []
# 분류 체계
taxonomy: dict = {}
document_types: list[str] = []
# 업로드 한도 (authoritative policy)
upload: UploadConfig = UploadConfig()
# PR-MacMini-Derived-Worker-1: study explanation owner = Mac mini
# GPU 측은 false 로 설정 (.env), explanation 분기 skip guard 트리거.
study_explanation_enabled: bool = True
# 공부 암기노트 Phase 1: card_extract 폴러/consumer 게이트. owner 분리 시 false 로.
study_card_extract_enabled: bool = True
# internal endpoint Bearer token (Mac mini derived-worker 호출용)
internal_worker_token: str = ""
def load_settings() -> Settings:
"""config.yaml + 환경변수에서 설정 로딩"""
# 환경변수 (docker-compose에서 주입)
database_url = os.getenv("DATABASE_URL", "")
study_explanation_enabled = os.getenv("STUDY_EXPLANATION_ENABLED", "true").lower() in ("1", "true", "yes")
study_card_extract_enabled = os.getenv("STUDY_CARD_EXTRACT_ENABLED", "true").lower() in ("1", "true", "yes")
internal_worker_token = os.getenv("INTERNAL_WORKER_TOKEN", "")
jwt_secret = os.getenv("JWT_SECRET", "")
totp_secret = os.getenv("TOTP_SECRET", "")
eval_runner_token = os.getenv("EVAL_RUNNER_TOKEN", "")
kordoc_endpoint = os.getenv("KORDOC_ENDPOINT", "http://kordoc-service:3100")
ocr_endpoint = os.getenv("OCR_ENDPOINT", "http://ocr-service:3200")
stt_endpoint = os.getenv("STT_ENDPOINT", "http://stt-service:3300")
roon_library_path = os.getenv("ROON_LIBRARY_PATH", "")
# ADDITIONAL_WATCH_TARGETS — 쉼표 구분 (공백 제거)
awt_raw = os.getenv("ADDITIONAL_WATCH_TARGETS", "")
additional_watch_targets = [p.strip() for p in awt_raw.split(",") if p.strip()]
# config.yaml — Docker 컨테이너 내부(/app/config.yaml) 또는 프로젝트 루트
config_path = Path("/app/config.yaml")
@@ -197,76 +71,33 @@ def load_settings() -> Settings:
if "ai" in raw:
ai_raw = raw["ai"]
models = ai_raw.get("models", {})
# B-0: triage 는 config.yaml 에 없을 수도 있는 신규 슬롯. 구버전 호환을 위해
# 없으면 fallback 를 triage 로 대체 (동일 모델 재사용).
triage_raw = models.get("triage") or models.get("fallback")
if triage_raw is None:
raise ValueError("config.yaml: ai.models.triage (or fallback) required")
ai_config = AIConfig(
gateway_endpoint=ai_raw.get("gateway", {}).get("endpoint", ""),
triage=AIModelConfig(**triage_raw),
primary=AIModelConfig(**models["primary"]),
fallback=AIModelConfig(**models["fallback"]),
premium=AIModelConfig(**models["premium"]),
embedding=AIModelConfig(**models["embedding"]),
rerank=AIModelConfig(**models["rerank"]),
vision=(AIModelConfig(**models["vision"]) if "vision" in models else None),
classifier=(
AIModelConfig(**models["classifier"]) if "classifier" in models else None
),
verifier=(
AIModelConfig(**models["verifier"]) if "verifier" in models else None
),
deep_summary_backlog=DeepSummaryBacklogConfig(
**ai_raw.get("deep_summary_backlog", {})
),
primary=AIModelConfig(**ai_raw["models"]["primary"]),
fallback=AIModelConfig(**ai_raw["models"]["fallback"]),
premium=AIModelConfig(**ai_raw["models"]["premium"]),
embedding=AIModelConfig(**ai_raw["models"]["embedding"]),
vision=AIModelConfig(**ai_raw["models"]["vision"]),
rerank=AIModelConfig(**ai_raw["models"]["rerank"]),
)
if "nas" in raw:
nas_mount = raw["nas"].get("mount_path", nas_mount)
nas_pkm = raw["nas"].get("pkm_root", nas_pkm)
search_cfg = SearchConfig()
if config_path.exists() and raw and "search" in raw:
ask_raw = (raw.get("search") or {}).get("ask", {}) or {}
sb = ask_raw.get("backend", {}) or {}
sr = ask_raw.get("react", {}) or {}
search_cfg = SearchConfig(
ask=SearchAskConfig(
backend=SearchAskBackendConfig(**sb),
react=SearchAskReactConfig(**sr),
)
)
taxonomy = raw.get("taxonomy", {}) if config_path.exists() and raw else {}
document_types = raw.get("document_types", []) if config_path.exists() and raw else []
upload_cfg = (
UploadConfig(**raw["upload"])
if config_path.exists() and raw and "upload" in raw
else UploadConfig()
)
return Settings(
database_url=database_url,
ai=ai_config,
search=search_cfg,
nas_mount_path=nas_mount,
nas_pkm_root=nas_pkm,
jwt_secret=jwt_secret,
totp_secret=totp_secret,
eval_runner_token=eval_runner_token,
kordoc_endpoint=kordoc_endpoint,
ocr_endpoint=ocr_endpoint,
stt_endpoint=stt_endpoint,
roon_library_path=roon_library_path,
additional_watch_targets=additional_watch_targets,
taxonomy=taxonomy,
document_types=document_types,
upload=upload_cfg,
study_explanation_enabled=study_explanation_enabled,
study_card_extract_enabled=study_card_extract_enabled,
internal_worker_token=internal_worker_token,
)
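`load_settings` repeats two small parsing idioms: boolean flags that accept `1`, `true`, or `yes`, and comma-separated lists with blank entries dropped. A hedged sketch of both as helpers follows. `env_flag` and `env_csv` are hypothetical names that do not exist in this codebase, and the `environ` parameter is added here only so the helpers can be exercised without touching the real process environment.

```python
import os

_TRUTHY = ("1", "true", "yes")


def env_flag(name: str, default: str = "true", environ=None) -> bool:
    """Parse a boolean flag the way STUDY_EXPLANATION_ENABLED is parsed."""
    env = os.environ if environ is None else environ
    return env.get(name, default).lower() in _TRUTHY


def env_csv(name: str, environ=None) -> list:
    """Parse a comma-separated value, trimming whitespace and dropping blanks."""
    env = os.environ if environ is None else environ
    return [part.strip() for part in env.get(name, "").split(",") if part.strip()]
```

Anything outside the truthy set, including an empty string, reads as `False`, which matches the behaviour of the inline expressions in the diff.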
+2 -10
View File
@@ -95,8 +95,7 @@ async def _run_migrations(conn) -> None:
applied = {row[0] for row in result}
# migration 파일 스캔
# /app/core/database.py → parent.parent = /app → /app/migrations (volume mount 위치)
migrations_dir = Path(__file__).resolve().parent.parent / "migrations"
migrations_dir = Path(__file__).resolve().parent.parent.parent / "migrations"
if not migrations_dir.is_dir():
logger.info("[migration] migrations/ 디렉토리 없음, 스킵")
return
@@ -114,15 +113,8 @@ async def _run_migrations(conn) -> None:
for version, name, path in pending:
sql = path.read_text(encoding="utf-8")
_validate_sql_content(name, sql)
if "schema_migrations" in sql.lower():
raise ValueError(
f"Migration {name} must not modify schema_migrations table"
)
logger.info(f"[migration] {name} 실행 중...")
# Use raw driver SQL: text() interprets :name as a bind parameter, so
# a colon inside an SQL comment or literal raises InvalidRequestError.
# exec_driver_sql passes the SQL to the driver (asyncpg) as-is.
await conn.exec_driver_sql(sql)
await conn.execute(text(sql))
await conn.execute(
text("INSERT INTO schema_migrations (version, name) VALUES (:v, :n)"),
{"v": version, "n": name},
-80
View File
@@ -1,80 +0,0 @@
"""Library path utilities.
Normalizes, validates, and extracts the @library/-prefixed tags in user_tags.
"""
LIBRARY_PREFIX = "@library/"
DEFAULT_LIBRARY_PATH = "미분류"
MAX_DEPTH = 5
MAX_SEGMENT_LEN = 30
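def normalize_library_path(raw: str) -> str:
"""Normalize a path. Strict policy: any rule violation raises ValueError immediately.
Rules:
- Strip leading/trailing whitespace and slashes
- Trim each segment
- Empty segment (// or whitespace only) → ValueError
- Segment longer than 30 characters → ValueError
- More than 5 levels → ValueError
The same rules apply to the GET /documents/library?path= query.
"""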
stripped = raw.strip().strip("/")
if not stripped:
raise ValueError("빈 경로")
segments = stripped.split("/")
normalized: list[str] = []
for s in segments:
s = s.strip()
if not s:
raise ValueError("빈 세그먼트 (// 또는 공백만 있는 구간)")
if len(s) > MAX_SEGMENT_LEN:
raise ValueError(f"세그먼트 '{s}' {MAX_SEGMENT_LEN}자 초과")
normalized.append(s)
if len(normalized) > MAX_DEPTH:
raise ValueError(f"최대 {MAX_DEPTH}단계까지 가능")
return "/".join(normalized)
def extract_library_paths(user_tags: list[str] | None) -> list[str]:
"""user_tags에서 @library/ 경로만 추출 (prefix 포함)."""
if not user_tags:
return []
return [t for t in user_tags if t.startswith(LIBRARY_PREFIX)]
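def validate_user_tags(tags: list) -> list[str]:
"""Validate the whole user_tags list. Preserves input order and removes duplicates.
- Non-string element → TypeError
- Empty / whitespace-only tags → dropped
- Regular tags → pass through after strip()
- @library/ tags → normalize_library_path() applied
- Duplicates → only the first occurrence is kept (input order preserved)
"""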
result: list[str] = []
for tag in tags:
if not isinstance(tag, str):
raise TypeError(f"태그는 문자열이어야 합니다: {tag!r}")
tag = tag.strip()
if not tag:
continue
if tag.startswith(LIBRARY_PREFIX):
path = tag[len(LIBRARY_PREFIX):]
normalized = normalize_library_path(path)
tag = f"{LIBRARY_PREFIX}{normalized}"
result.append(tag)
# 중복 제거 (입력 순서 보존)
seen: set[str] = set()
deduped: list[str] = []
for t in result:
if t not in seen:
seen.add(t)
deduped.append(t)
return deduped
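To make the normalization rules concrete, here is a condensed restatement of the two functions above with a few inputs worked through. It is a sketch for illustration rather than the deleted module itself: `normalize_tags` is a hypothetical shortened stand-in for `validate_user_tags`, the type check is omitted, and the error messages are simplified.

```python
LIBRARY_PREFIX = "@library/"
MAX_DEPTH = 5
MAX_SEGMENT_LEN = 30


def normalize_library_path(raw: str) -> str:
    """Strict normalization: trim, reject empty/long segments and deep paths."""
    segments = [s.strip() for s in raw.strip().strip("/").split("/")]
    if any(not s for s in segments):
        raise ValueError("empty path or empty segment")
    if any(len(s) > MAX_SEGMENT_LEN for s in segments):
        raise ValueError("segment too long")
    if len(segments) > MAX_DEPTH:
        raise ValueError("path too deep")
    return "/".join(segments)


def normalize_tags(tags: list) -> list:
    """Strip tags, normalize @library/ paths, keep first occurrences only."""
    cleaned = []
    for tag in tags:
        tag = tag.strip()
        if not tag:
            continue
        if tag.startswith(LIBRARY_PREFIX):
            tag = LIBRARY_PREFIX + normalize_library_path(tag[len(LIBRARY_PREFIX):])
        cleaned.append(tag)
    # dict preserves insertion order, so this is an order-preserving dedupe.
    return list(dict.fromkeys(cleaned))
```

For example, `" /a/ b /c/ "` normalizes to `a/b/c`, while `a//b` and a six-level path both raise `ValueError`.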
-62
View File
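@@ -1,62 +0,0 @@
"""External feed URL validation: SSRF blocking + re-validation of redirect targets.
validate_feed_url() performs the first check at registration time; at fetch time every
redirect target is re-validated with the same function. Full TOCTOU protection would require hooking at
the httpx transport level, so double validation is the practical upper bound for now.
"""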
import ipaddress
import socket
from urllib.parse import urlparse
ALLOWED_SCHEMES = {"https"}
# HTTP 예외 도메인 — 여기에 없으면 HTTPS만 허용
# 추가 시 사유/승인일/재검토일을 주석에 기록
HTTP_EXCEPTION_DOMAINS: set[str] = {
"www.scmp.com", # 2026-04-13 승인, HTTPS→HTTP 301 redirect. 2026-07 재검토
}
def _is_blocked_ip(ip: ipaddress.IPv4Address | ipaddress.IPv6Address) -> bool:
"""ipaddress 내장 속성으로 넓게 차단 (단순 대역 비교보다 안전)"""
return (
ip.is_private
or ip.is_loopback
or ip.is_link_local
or ip.is_reserved
or ip.is_multicast
or ip.is_unspecified
# Tailscale CGNAT 대역 (is_private에 포함 안 됨)
or ip in ipaddress.ip_network("100.64.0.0/10")
)
def validate_feed_url(url: str, allow_http: bool = False) -> str:
"""URL 검증. 실패 시 ValueError raise.
allow_http는 HTTP_EXCEPTION_DOMAINS allowlist 연동 시에만 사용.
API 파라미터로 노출하지 않는다.
"""
parsed = urlparse(url)
allowed = ALLOWED_SCHEMES | ({"http"} if allow_http else set())
if parsed.scheme not in allowed:
raise ValueError(f"허용되지 않은 스킴: {parsed.scheme}")
if not parsed.hostname:
raise ValueError("호스트명 누락")
# DNS 해석 후 IP 차단
try:
addrs = socket.getaddrinfo(parsed.hostname, None)
except socket.gaierror:
raise ValueError(f"DNS 해석 실패: {parsed.hostname}")
for _, _, _, _, sockaddr in addrs:
ip = ipaddress.ip_address(sockaddr[0])
if _is_blocked_ip(ip):
# IP 자체를 에러에 노출하지 않음 — hostname만
raise ValueError(f"차단된 네트워크: {parsed.hostname}")
return url
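The module docstring says every redirect target is re-validated with the same function at fetch time. One way that loop could look is sketched below, with no network access. `resolve_with_revalidation`, `MAX_REDIRECTS`, and the injected `next_location` callable are assumptions made for illustration; in a real fetcher `next_location` would issue a request with automatic redirects disabled and return the `Location` header, or `None` for a final response.

```python
from typing import Callable, Optional
from urllib.parse import urljoin

MAX_REDIRECTS = 5  # assumed limit, not taken from the source


def resolve_with_revalidation(
    url: str,
    validate: Callable[[str], str],
    next_location: Callable[[str], Optional[str]],
    max_redirects: int = MAX_REDIRECTS,
) -> str:
    """Follow redirects manually, validating every hop before using it."""
    validate(url)
    for _ in range(max_redirects):
        location = next_location(url)
        if location is None:
            return url  # final, already-validated URL
        # Location may be relative, so resolve it against the current URL.
        url = urljoin(url, location)
        validate(url)  # raises ValueError for a blocked target
    raise ValueError("too many redirects")
```

Passing `validate_feed_url` as `validate` would apply the scheme and blocked-network checks to each hop, so a public feed cannot bounce the fetcher to an internal address unnoticed.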
+30
View File
@@ -106,3 +106,33 @@ END:VCALENDAR"""
except Exception as e:
logging.getLogger("caldav").error(f"CalDAV VTODO 생성 실패: {e}")
return None
# ─── SMTP 헬퍼 ───
def send_smtp_email(
host: str,
port: int,
username: str,
password: str,
subject: str,
body: str,
to_addr: str | None = None,
):
"""Synology MailPlus SMTP로 이메일 발송"""
import smtplib
from email.mime.text import MIMEText
to_addr = to_addr or username
msg = MIMEText(body, "plain", "utf-8")
msg["Subject"] = subject
msg["From"] = username
msg["To"] = to_addr
try:
with smtplib.SMTP_SSL(host, port, timeout=30) as server:
server.login(username, password)
server.send_message(msg)
except Exception as e:
logging.getLogger("smtp").error(f"SMTP 발송 실패: {e}")
-1
View File
@@ -1 +0,0 @@
"""이드(eid) — 운영 비서 substrate compose + 액션 dispatch 모듈."""
-41
View File
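@@ -1,41 +0,0 @@
"""LLM client for the Eid execution context: egress capability removed at the code layer (W4-1).
Design 0-4 / project_eid_persona_substrate invariant #5: Eid's LLM is call_primary (:8801 Mac mini MLX) only.
The public Claude (ai.fallback) path is blocked *structurally*. Even though legitimate egress workers
(daily_digest SMTP, law_monitor CalDAV, etc.) are importable in the same fastapi container, Eid uses this client and so cannot call the fallback or any external
endpoint (no silent fallback, per rules no-silent-fallback).
Three blocking points (code layer = primary, definitive guard. Network default-deny = W4-2 belt, conditional):
- call_fallback() → raise (direct calls to public Claude are sealed off)
- _call_chat() → automatic fallback branch removed (primary failure = re-raise → caller maps to 503)
- _request() → raise if the endpoint contains anthropic.com (guards against primary misrouting; double assurance)
call_primary / call_triage / embed / rerank are unchanged (internal inference and embedding are allowed).
Egress workers and system paths keep the existing AIClient: fallback stays for the system and is removed only for Eid (separation).
"""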
from __future__ import annotations
from ai.client import AIClient
class EidEgressBlocked(RuntimeError):
"""이드 컨텍스트에서 외부 egress(공인 Claude 등) 시도 — 코드층 박탈로 차단."""
class EidAIClient(AIClient):
"""이드 전용 — call_primary only. fallback/외부 endpoint 구조적 봉쇄. AIClient drop-in."""
async def call_fallback(self, prompt: str) -> str:
raise EidEgressBlocked(
"이드: 공인 Claude fallback 금지(egress 코드층 박탈). call_primary(:8801) 만 허용."
)
async def _call_chat(self, model_config, prompt: str) -> str:
# 자동 fallback 분기 제거 — primary 실패는 그대로 raise(caller 가 503 매핑, silent fallback 0).
return await self._request(model_config, prompt)
async def _request(self, model_config, prompt: str, system: str | None = None) -> str:
endpoint = getattr(model_config, "endpoint", "") or ""
if "anthropic.com" in endpoint:
raise EidEgressBlocked(f"이드: 외부 endpoint 차단 ({endpoint}). 내부 inference 만.")
return await super()._request(model_config, prompt, system=system)
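The subclass above removes a capability by overriding the method that provides it, rather than by asking callers not to use it. A self-contained sketch of that pattern follows. `BaseClient`, `RestrictedClient`, and `EgressBlocked` are stand-in names invented for this example, and the base methods return canned strings instead of calling a model.

```python
import asyncio


class EgressBlocked(RuntimeError):
    """Raised when a restricted client is asked to leave the internal network."""


class BaseClient:
    async def call_primary(self, prompt: str) -> str:
        return f"primary:{prompt}"

    async def call_fallback(self, prompt: str) -> str:
        return f"fallback:{prompt}"


class RestrictedClient(BaseClient):
    """Drop-in replacement whose fallback path always fails loudly."""

    async def call_fallback(self, prompt: str) -> str:
        raise EgressBlocked("fallback is not available in this context")


async def ask(client: BaseClient, prompt: str) -> str:
    # Caller code is unchanged; only the client type decides what is allowed.
    return await client.call_primary(prompt)
```

Because the restricted type is substitutable for the base type, existing call sites keep working, while any code path that reaches for the fallback gets an exception instead of a silent external call.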
-162
View File
@@ -1,162 +0,0 @@
"""Eid substrate compose: persona → rules → overlay → task as a single system string.
Design source : PKM plans/2026-06-05-eid-persona-substrate-plan.html (eid-persona-substrate, converged over r1~r3)
Implementation plan : plans/2026-06-07-eid-persona-impl-plan.html (W2-1)
Invariants : memory project_eid_persona_substrate (9 load-bearing)
Core invariants (do not change; a violation is a design regression):
#3 "Strongly" = bounded by the output contract (not uniform injection). Free-prose surfaces = persona ON,
STRICT JSON machine surfaces = persona ZERO. Decided by the static ROUTE_MAP (not runtime sniffing).
#4 Composition = persona → rules → overlay → task. rules is an *explicit term* of the composition (compose must always include it),
which is what makes 'missing rules = fail-loud' hold. On conflict, rules outrank both persona and overlay.
Missing persona = quiet fail-open / missing rules = fail-loud (degraded banner + log).
#2 An overlay is delta-only. Injection defence lives in the shared rules (rules.md), not in overlays, and is never dropped.
Scope: user-facing free-prose surfaces only. The 9 STRICT JSON machine surfaces are absent from ROUTE_MAP and bypass compose (task-only).
Dependencies: stdlib only (no DB, yaml, or LLM needed). Input = vendored artifacts under app/prompts/substrate/.
"""
"""
from __future__ import annotations
import logging
from functools import lru_cache
from pathlib import Path
logger = logging.getLogger("eid.compose")
# vendored 아티팩트 (sync = app/prompts/substrate/README.md)
_SUBSTRATE_DIR = Path(__file__).resolve().parent.parent / "prompts" / "substrate"
_OVERLAY_DIR = _SUBSTRATE_DIR / "overlays"
# 합본 구분자 — MLX 다중 system role 위험 회피용 단일 문자열 join (설계 0-3)
SEP = "\n\n---\n\n"
# variant → persona 아티팩트 파일명. 26B/27B = full, 4B = compact.
_PERSONA_FILES = {"full": "persona.full.md", "compact": "persona.compact.md"}
# rules 미주입 시 degraded 배너 (fail-loud — silent 빈문자열 금지, 불변식 #4)
_RULES_DEGRADED = (
"[substrate-degraded: 운영 규칙(rules) 미주입 — 안전·정책 가드 없이 동작 중. "
"app/prompts/substrate/rules.md 부재. 관리자 확인 필요.]"
)
# ── Static ROUTE_MAP (surface → overlay + variant). Not runtime output sniffing (invariant #3). ──
# overlay=None → free-prose surface (persona + rules + task, no feature overlay).
# overlay name → future active eid surface (wired in W3+). variant = persona variant (currently all 26B/27B = full).
# Unregistered surface (.get → None) → base (persona + rules + task) + visible log.
_ROUTE: dict[str, dict] = {
# W2-2 wire 대상 — 자유-prose, 기능 overlay 없음(base)
"react_ask": {"overlay": None, "variant": "full"},
"study_subject_note": {"overlay": None, "variant": "full"},
"study_question_explanation": {"overlay": None, "variant": "full"},
# 미래 active eid 표면 — 기능 overlay (W3+ 에서 호출 배선)
"study_diagnosis": {"overlay": "study", "variant": "full"},
"document_brief": {"overlay": "document", "variant": "full"},
"news_brief": {"overlay": "news", "variant": "full"},
"recap_brief": {"overlay": "recap", "variant": "full"},
"schedule_brief": {"overlay": "schedule", "variant": "full"},
}
class SubstrateOverflow(RuntimeError):
"""non-droppable floor 가 모델 budget 초과 — fail-loud(26B 에스컬레이트), 절대 silent drop 안 함."""
@lru_cache(maxsize=8)
def _read(path_str: str) -> str | None:
"""파일 읽기(캐시). 부재 = None (호출부가 quiet/loud 결정)."""
p = Path(path_str)
if not p.is_file():
return None
return p.read_text(encoding="utf-8").strip()
def _persona(variant: str) -> str:
"""persona 변형 로드. 부재 = quiet fail-open(빈 문자열) — voice 는 cosmetic(불변식 #4)."""
fname = _PERSONA_FILES.get(variant)
if fname is None:
logger.debug("eid.compose: unknown persona variant %r → quiet skip", variant)
return ""
text = _read(str(_SUBSTRATE_DIR / fname))
if text is None:
logger.debug("eid.compose: persona %r absent → quiet fail-open", fname)
return ""
return text
def _rules() -> str:
"""rules 로드. 부재 = fail-loud(degraded 배너 + error 로그) — 정책은 silent 누락 금지(불변식 #4)."""
text = _read(str(_SUBSTRATE_DIR / "rules.md"))
if text is None:
logger.error(
"eid.compose: rules.md ABSENT — substrate degraded (안전·정책 가드 없이 동작). "
"app/prompts/substrate/rules.md 확인 필요."
)
return _RULES_DEGRADED
return text
def _overlay(name: str | None) -> str:
"""기능 overlay 로드. name=None → 빈 문자열(base). 미존재 파일 = fail-loud(error 로그 + 빈)."""
if name is None:
return ""
text = _read(str(_OVERLAY_DIR / f"{name}.txt"))
if text is None:
logger.error("eid.compose: overlay %r 파일 부재 → base 로 degrade", name)
return ""
return text
def is_composed_surface(surface: str) -> bool:
"""이 surface 가 ROUTE_MAP 에 등록된 compose 대상인가(= persona 주입 표면인가)."""
return surface in _ROUTE
def compose(surface: str, task: str, *, variant: str | None = None,
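budget_chars: int | None = None) -> str:
"""Compose persona → rules → overlay → task into a single system string.
surface : key into the static ROUTE_MAP. If unregistered, falls back to base (persona+rules+task) with a visible log.
task : the surface-specific instruction (body of the existing prompt txt). Always the last term of the composition.
variant : persona variant override. None = the ROUTE_MAP variant (default full).
budget_chars: the model's system budget (chars). None = unlimited (26B/27B path). When set, a non-droppable
floor (persona+rules+overlay) over the budget raises SubstrateOverflow (fail-loud, never a silent drop).
Returns: the system string joined with SEP. Empty terms (e.g. a missing persona) are left out of the join.
"""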
route = _ROUTE.get(surface)
if route is None:
logger.info(
"eid.compose: surface %r ROUTE_MAP 미등록 → base(persona+rules+task)", surface
)
v = variant or "full"
overlay_name = None
else:
v = variant or route["variant"]
overlay_name = route["overlay"]
persona = _persona(v)
rules = _rules() # 항상 비-빈(degraded 배너라도) → 합본의 명시 항 보장
overlay = _overlay(overlay_name)
# non-droppable floor = persona + rules + overlay (task 제외). budget 초과 = fail-loud.
if budget_chars is not None:
floor = len(SEP.join(p for p in (persona, rules, overlay) if p))
if floor > budget_chars:
logger.error(
"eid.compose: non-droppable floor %d char > budget %d (surface=%r, variant=%r) "
"→ fail-loud, 26B 에스컬레이트 필요(silent drop 안 함)",
floor, budget_chars, surface, v,
)
raise SubstrateOverflow(
f"floor {floor} > budget {budget_chars} for surface={surface!r} variant={v!r}"
)
parts = [persona, rules, overlay, task]
return SEP.join(p for p in parts if p)
def clear_cache() -> None:
    """Reload hook for use after a vendored-artifact sync (load-once cache invariant). Alternative to a process restart."""
    _read.cache_clear()
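The budget check in `compose` above can be illustrated with a minimal, self-contained sketch. It is not the module itself: `SEP`, `SubstrateOverflow`, and `compose_sketch` are stand-ins defined here for illustration (the real `SEP` and exception are defined outside this excerpt, so the separator value is an assumption). It shows the two properties the docstring describes: empty terms are skipped in the join, and only persona + rules + overlay count toward the floor.

```python
SEP = "\n\n"  # assumed separator; the real SEP is defined outside this excerpt


class SubstrateOverflow(Exception):
    """Stand-in for the module's overflow error."""


def compose_sketch(persona, rules, overlay, task, budget_chars=None):
    # The floor is everything except the task: these parts may not be dropped.
    floor = len(SEP.join(p for p in (persona, rules, overlay) if p))
    if budget_chars is not None and floor > budget_chars:
        raise SubstrateOverflow(f"floor {floor} > budget {budget_chars}")
    # Empty parts (e.g. an absent persona) are skipped, so no doubled separators appear.
    return SEP.join(p for p in (persona, rules, overlay, task) if p)


# An absent persona and overlay simply drop out of the join.
composed = compose_sketch("", "RULES", "", "TASK")

# A floor larger than the budget raises instead of trimming anything.
try:
    compose_sketch("PERSONA", "RULES", "", "TASK", budget_chars=5)
    overflowed = False
except SubstrateOverflow:
    overflowed = True
```

Note that the task length is not part of the floor, so a long task alone never triggers the overflow in this sketch; that mirrors the check shown in the diff.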
-1
View File
@@ -1 +0,0 @@
"""Eid action tools: fixed enum dispatch (zero dynamic resolution)."""
-131
View File
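@@ -1,131 +0,0 @@
"""Eid action dispatch: fixed enum, zero dynamic resolution (first layer of code-level egress capability removal).

Design source : PKM plans/2026-06-05-eid-persona-substrate-plan.html §3-1 (fixed-dispatch invariant)
Impl plan     : plans/2026-06-07-eid-persona-impl-plan.html (W2-4)
Invariants    : memory project_eid_persona_substrate #5, #8

Core rules (do not change; a violation regresses the egress lock):
- The action name produced by the LLM is checked against a *closed enum*. No getattr/eval/dynamic import/setattr. Unknown = reject.
  ReAct *choosing* an action is allowed (that is the nature of the loop); what is blocked is *dynamic resolution of names*.
- The enum *excluding* egress verbs (send_smtp_email/create_caldav_todo/httpx/call_fallback) is
  doubly guaranteed (enforced by an import-time assert). Even if egress functions are imported in the
  same container, Eid cannot dispatch those names.
- Handlers = static dict mapping (explicit registration via register_handler). Not dynamic discovery. Unregistered = reject.
- T3 external = zero permission. In Phase1, request_external_approval = *immediate reject* (no INSERT).
  This avoids unbounded pending rows that nothing consumes while no dispatcher exists. Pending INSERTs
  start in Phase3, once a dispatcher exists (resolves the W2-4 'INSERT only' vs. D-2 silence mismatch).

Dependencies: stdlib only. Real read/write handlers are injected via register_handler after W3 (eid_* migration).
"""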
from __future__ import annotations
import logging
from dataclasses import dataclass, field
from enum import Enum
from typing import Any, Callable
logger = logging.getLogger("eid.dispatch")
class EidAction(str, Enum):
    """Whitelist of actions Eid may invoke. *Internal actions only*; egress verbs are never included.

    Tier (project_eid_persona_substrate #8):
      T0 read = autonomous / T1 write-derived = autonomous (append-only) / T2 action = conditional (one click)
      T3 external = zero permission (approval_requests queue only; Phase1 = immediate reject)
    """
    # ── T0 read (autonomous) ──
    READ_DOCUMENTS = "read_documents"
    READ_EVENTS = "read_events"
    READ_STUDY = "read_study"
    READ_NEWS = "read_news"
    # ── T1 write-derived (append-only, autonomous); handlers arrive after W3 (eid_* tables) ──
    WRITE_STUDY_WEAKNESS = "write_study_weakness"
    WRITE_REVIEW_SET_DRAFT = "write_review_set_draft"
    WRITE_WEEKLY_RECAP = "write_weekly_recap"
    # ── T2 conditional (after one-click user approval) ──
    SCHEDULE_REVIEW_SET = "schedule_review_set"
    # ── T3 external = zero permission. Phase1 = immediate reject (special branch in dispatch below) ──
    REQUEST_EXTERNAL_APPROVAL = "request_external_approval"
ALLOWED_ACTIONS: frozenset[str] = frozenset(a.value for a in EidAction)
# Egress verb blacklist: must *never* appear in the enum (double guarantee). Refers to things imported
# into the same process, such as core/utils.send_smtp_email, create_caldav_todo / httpx / ai.client.call_fallback.
_FORBIDDEN_EGRESS_VERBS: frozenset[str] = frozenset({
    "send_smtp_email", "create_caldav_todo", "call_fallback",
    "httpx", "http_get", "http_post", "fetch_url", "fetch",
    "webhook", "push", "send_email", "upload", "post_external",
})
# Import-time assertion: whitelist ∩ egress verbs = empty (double guarantee for invariant #5)
assert not (ALLOWED_ACTIONS & _FORBIDDEN_EGRESS_VERBS), (
    "eid dispatch enum contains an egress verb (violates invariant #5): "
    f"{sorted(ALLOWED_ACTIONS & _FORBIDDEN_EGRESS_VERBS)}"
)
@dataclass
class DispatchResult:
    ok: bool
    action: str
    reason: str = ""
    data: Any = None
    meta: dict = field(default_factory=dict)
# Static handler mapping: action(str) → callable(args: dict) → data. No getattr / dynamic lookup.
# Registered explicitly at boot via register_handler (W3+). Unregistered action = reject (no handler).
_HANDLERS: dict[str, Callable[[dict], Any]] = {}
def register_handler(action: EidAction, fn: Callable[[dict], Any]) -> None:
    """Register a handler statically (explicitly). Not dynamic discovery. Egress branches cannot be registered (guards below)."""
    if action.value in _FORBIDDEN_EGRESS_VERBS:  # unreachable (enum guard), kept as a defensive double check
        raise ValueError(f"refusing to register a handler for an egress verb: {action.value}")
    if action == EidAction.REQUEST_EXTERNAL_APPROVAL:
        raise ValueError("request_external_approval is rejected immediately in Phase1; no handler may be registered")
    _HANDLERS[action.value] = fn
def _reject(action: str, reason: str) -> DispatchResult:
    logger.warning("eid.dispatch REJECT action=%r reason=%s", action, reason)
    return DispatchResult(ok=False, action=action, reason=reason)
def dispatch(action: str, args: dict | None = None) -> DispatchResult:
    """Run the action Eid chose through a *fixed branch*. Zero dynamic name resolution.

    1) Check against the closed enum whitelist → unknown = reject (no getattr/eval).
    2) T3 external in Phase1 = immediate reject (no INSERT).
    3) Static handler dict lookup → unregistered = reject (before W3 no read/write handlers exist).
    """
    args = args or {}
    # 1) Allowlist (closed enum). Membership check only, no dynamic resolution.
    if action not in ALLOWED_ACTIONS:
        return _reject(action, "unknown action: outside the eid enum whitelist (dynamic resolution refused)")
    # 2) T3 external = zero permission. Phase1 rejects immediately (nothing is stored).
    if action == EidAction.REQUEST_EXTERNAL_APPROVAL.value:
        return _reject(
            action,
            "external egress = zero permission. Phase1: approval queue disabled → rejected (no pending row stored). "
            "External sends go through the user (requester ≠ executor).",
        )
    # 3) Static handler lookup (dict, not getattr). Unregistered = reject.
    fn = _HANDLERS.get(action)
    if fn is None:
        return _reject(action, "handler not registered (before W3 eid_* handler injection)")
    try:
        data = fn(args)
    except Exception as exc:  # handler error = reject (loud); never falls through to another branch
        logger.exception("eid.dispatch handler error action=%r", action)
        return _reject(action, f"handler error: {type(exc).__name__}")
    return DispatchResult(ok=True, action=action, data=data)
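The three-step order in `dispatch` (closed whitelist, hard-denied external branch, static handler lookup) can be tried out with a small stand-alone sketch. Everything here is hypothetical and trimmed for illustration: `Action`, `HANDLERS`, `register`, and `dispatch_sketch` are not the module's names, and results are plain tuples rather than `DispatchResult`.

```python
from enum import Enum


class Action(str, Enum):  # hypothetical two-member whitelist
    READ_NEWS = "read_news"
    REQUEST_EXTERNAL_APPROVAL = "request_external_approval"


ALLOWED = frozenset(a.value for a in Action)
HANDLERS = {}  # static mapping, filled by explicit registration only


def register(action, fn):
    HANDLERS[action.value] = fn


def dispatch_sketch(name, args=None):
    # 1) Membership in the closed set; the name is never resolved dynamically.
    if name not in ALLOWED:
        return (False, "unknown action")
    # 2) The external branch is denied before any handler lookup.
    if name == Action.REQUEST_EXTERNAL_APPROVAL.value:
        return (False, "external egress denied")
    # 3) Plain dict lookup; a missing entry is a reject, not a fallback.
    fn = HANDLERS.get(name)
    if fn is None:
        return (False, "handler not registered")
    return (True, fn(args or {}))


register(Action.READ_NEWS, lambda args: ["headline"])
results = [
    dispatch_sketch("read_news"),
    dispatch_sketch("send_smtp_email"),
    dispatch_sketch("__class__"),
    dispatch_sketch("request_external_approval"),
]
```

Because the lookup is a string membership test followed by a dict `get`, a name such as `__class__` or an egress verb is rejected at step 1 without touching any attribute of any object.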
+6 -108
View File
@@ -6,30 +6,12 @@ from fastapi import FastAPI, Request
from fastapi.responses import RedirectResponse
from sqlalchemy import func, select, text
from api.audio import router as audio_router
from api.internal_study import router as internal_study_router
from api.internal_worker import router as internal_worker_router
from api.auth import router as auth_router
from api.briefing import router as briefing_router
from api.config import router as config_router
from api.dashboard import router as dashboard_router
from api.digest import router as digest_router
from api.document_notes import router as document_notes_router
from api.document_reads import router as document_reads_router
from api.documents import router as documents_router
from api.events import router as events_router
from api.library import router as library_router
from api.memos import router as memos_router
from api.news import router as news_router
from api.search import router as search_router
from api.setup import router as setup_router
from api.study_question_progress import router as study_question_progress_router
from api.study_questions import router as study_questions_router
from api.study_sessions import router as study_sessions_router
from api.study_topics import router as study_topics_router
from api.study_reminders import router as study_reminders_router
from api.study_cards import router as study_cards_router
from api.video import router as video_router
from core.config import settings
from core.database import async_session, engine, init_db
from models.user import User
@@ -38,35 +20,14 @@ from models.user import User
@asynccontextmanager
async def lifespan(app: FastAPI):
"""Lifespan handler that runs on app startup/shutdown"""
import asyncio
from apscheduler.schedulers.asyncio import AsyncIOScheduler
from apscheduler.triggers.cron import CronTrigger
from zoneinfo import ZoneInfo
KST = ZoneInfo("Asia/Seoul")
from services.search.query_analyzer import prewarm_analyzer
from workers.briefing_worker import run as morning_briefing_run
from workers.daily_digest import run as daily_digest_run
from workers.dedup_reconcile import run as dedup_reconcile_run
from workers.digest_worker import run as global_digest_run
from workers.file_watcher import watch_inbox
from workers.law_monitor import run as law_monitor_run
from workers.mailplus_archive import run as mailplus_run
from workers.news_collector import run as news_collector_run
from workers.queue_consumer import consume_queue, consume_markdown_queue
from workers.study_queue_consumer import consume_study_queue
from workers.study_session_queue_consumer import consume_study_session_queue
from workers.study_memo_card_jobs_consumer import consume_study_memo_card_queue
from workers.study_card_enqueue import run as study_card_enqueue_run
from workers.study_reminder import run as study_reminder_run
from workers.study_weakness import run as study_weakness_run
from workers.study_question_embed_worker import (
refresh_stale_related as study_q_related_refresh,
run as study_q_embed_run,
)
from workers.tier_backfill import run as tier_backfill_run
from workers.upload_cleanup import cleanup_orphan_uploads
from workers.queue_consumer import consume_queue
# Startup: verify the DB connection
await init_db()
@@ -84,57 +45,15 @@ async def lifespan(app: FastAPI):
scheduler = AsyncIOScheduler(timezone="Asia/Seoul")
# Always-on jobs
scheduler.add_job(consume_queue, "interval", minutes=1, id="queue_consumer")
# PR-DocSrv-Markdown-Consumer-Split-1: dedicated consumer for markdown (marker).
# Removes the problem where large-PDF split conversion (tens of minutes) occupied the main
# consume_queue and stalled the whole pipeline. max_instances=1 (default) prevents two concurrent marker conversions.
scheduler.add_job(consume_markdown_queue, "interval", minutes=1, id="markdown_consumer")
scheduler.add_job(watch_inbox, "interval", minutes=5, id="file_watcher")
scheduler.add_job(cleanup_orphan_uploads, "interval", minutes=10, id="upload_cleanup")
# PR-4: automatic embedding of study_questions (rows with status='none/failed/stale', processed in batches of 10).
# No separate queue table; the status itself is the queue. Backfill happens naturally as the cron picks up 'none' rows.
scheduler.add_job(study_q_embed_run, "interval", minutes=1, id="study_q_embed")
# PR-12-A follow-up: recompute stale rows in the related-types cache. A separate cron, split from the embedding worker.
# Processes NULL-marked rows in batches of 20 when a new question becomes ready, a same-topic invalidation occurs, or the threshold changes.
scheduler.add_job(study_q_related_refresh, "interval", minutes=1, id="study_q_related_refresh")
# Phase 4-A: process study_question_jobs, prefetching AI solutions for wrong/unsure answers.
# GPU load is controlled by MLX gate serialization + BATCH_SIZE=1. STALE_MINUTES=10 provides self-recovery.
scheduler.add_job(consume_study_queue, "interval", minutes=1, id="study_queue_consumer")
# Phase 4-B v1: process study_quiz_session_jobs, free-form markdown analysis per session.
# Shares the same MLX gate as 4-A; waits in series while 4-A is processing.
scheduler.add_job(consume_study_session_queue, "interval", minutes=1, id="study_session_queue_consumer")
# Study memorization notes Phase 1: card_extract queue consumer + version-key poller (study_card_enqueue).
# Isolated from the existing study queue by a separate table and consumer. Gated by settings.study_card_extract_enabled.
scheduler.add_job(consume_study_memo_card_queue, "interval", minutes=1, id="study_memo_card_consumer")
scheduler.add_job(study_card_enqueue_run, "interval", minutes=1, id="study_card_enqueue")
# PR-B legacy tier backfill: invoked every 30 minutes but only enqueues during KST 00:00~06:00.
# 25 rows at a time in safety > law > manual priority. 6720 legacy rows → ~150 per night → about 45 days to drain.
scheduler.add_job(tier_backfill_run, "interval", minutes=30, id="tier_backfill")
# Daily schedule (KST)
scheduler.add_job(law_monitor_run, CronTrigger(hour=7, timezone=KST), id="law_monitor")
scheduler.add_job(mailplus_run, CronTrigger(hour=7, timezone=KST), id="mailplus_morning")
scheduler.add_job(mailplus_run, CronTrigger(hour=18, timezone=KST), id="mailplus_evening")
scheduler.add_job(daily_digest_run, CronTrigger(hour=20, timezone=KST), id="daily_digest")
scheduler.add_job(global_digest_run, CronTrigger(hour=4, minute=0, timezone=KST), id="global_digest")
scheduler.add_job(morning_briefing_run, CronTrigger(hour=5, minute=10, timezone=KST), id="morning_briefing")
# Study memorization notes Phase 1: material for due-summary alarms on topics being studied (09/13/19 KST). Zero LLM calls.
scheduler.add_job(study_reminder_run, CronTrigger(hour="9,13,19", timezone=KST), id="study_reminder")
# Eid W3-2: derived weakness snapshot for topics being studied (nightly 04:30 KST, zero LLM calls). Source for the study_diagnosis surface.
scheduler.add_job(study_weakness_run, CronTrigger(hour=4, minute=30, timezone=KST), id="study_weakness")
scheduler.add_job(law_monitor_run, CronTrigger(hour=7), id="law_monitor")
scheduler.add_job(mailplus_run, CronTrigger(hour=7), id="mailplus_morning")
scheduler.add_job(mailplus_run, CronTrigger(hour=18), id="mailplus_evening")
scheduler.add_job(daily_digest_run, CronTrigger(hour=20), id="daily_digest")
scheduler.add_job(news_collector_run, "interval", hours=6, id="news_collector")
# plan ds-s1-backend-1 B-4: nightly absolute recomputation of the dedup columns (duplicate_of/duplicate_count).
# Cleans up residual soft-delete drift (idempotent; no-op when there is no drift). Cron at 03:30 (does not collide with other jobs).
scheduler.add_job(dedup_reconcile_run, CronTrigger(hour=3, minute=30, timezone=KST), id="dedup_reconcile")
scheduler.start()
# Phase 2.1 (async structure): QueryAnalyzer prewarm.
# Analyzes 15~20 representative queries in a background task and loads them into the cache.
# Targets a 70~80% cache hit rate from the first user request.
# Non-blocking: does not hold up startup. delay_between=0.5 eases MLX load.
prewarm_task = asyncio.create_task(prewarm_analyzer())
prewarm_task.add_done_callback(
lambda t: t.exception() and None # exceptions are logged inside query_analyzer
)
yield
# Shutdown: clean up in scheduler → DB order
@@ -151,33 +70,12 @@ app = FastAPI(
# ─── Router registration ───
app.include_router(setup_router, prefix="/api/setup", tags=["setup"])
app.include_router(config_router, prefix="/api/config", tags=["config"])
app.include_router(auth_router, prefix="/api/auth", tags=["auth"])
app.include_router(documents_router, prefix="/api/documents", tags=["documents"])
# Read counts: /api/documents/{id}/read* paths. Shares the prefix with documents_router without conflict.
app.include_router(document_reads_router, prefix="/api/documents", tags=["document-reads"])
app.include_router(document_notes_router, prefix="/api/documents", tags=["document-notes"])
app.include_router(search_router, prefix="/api/search", tags=["search"])
app.include_router(memos_router, prefix="/api/memos", tags=["memos"])
app.include_router(events_router, prefix="/api/events", tags=["events"])
app.include_router(dashboard_router, prefix="/api/dashboard", tags=["dashboard"])
app.include_router(library_router, prefix="/api/library", tags=["library"])
app.include_router(news_router, prefix="/api/news", tags=["news"])
app.include_router(digest_router, prefix="/api/digest", tags=["digest"])
app.include_router(briefing_router, prefix="/api/briefing", tags=["briefing"])
app.include_router(audio_router, prefix="/api/audio", tags=["audio"])
app.include_router(internal_study_router, prefix="/internal/study", tags=["internal-study"])
app.include_router(internal_worker_router, prefix="/internal/worker", tags=["internal-worker"])
app.include_router(video_router, prefix="/api/video", tags=["video"])
app.include_router(study_sessions_router, prefix="/api/study-sessions", tags=["study-sessions"])
app.include_router(study_topics_router, prefix="/api/study-topics", tags=["study-topics"])
# study_questions: the router defines both the /study-topics/{id}/questions and /study-questions/{id} branches, so it is registered with prefix=/api
app.include_router(study_questions_router, prefix="/api", tags=["study-questions"])
app.include_router(study_reminders_router, prefix="/api/study-reminders", tags=["study-reminders"])
app.include_router(study_cards_router, prefix="/api/study-cards", tags=["study-cards"])
# Phase 1: study progress state (review-complete + review-queue). Routes are defined under /api/study-topics inside the router.
app.include_router(study_question_progress_router, prefix="/api", tags=["study-progress"])
# TODO: add in Phase 5
# app.include_router(tasks.router, prefix="/api/tasks", tags=["tasks"])
@@ -186,7 +84,7 @@ app.include_router(study_question_progress_router, prefix="/api", tags=["study-p
# ─── Setup middleware: redirect to /setup when there are zero users ───
SETUP_BYPASS_PREFIXES = (
"/api/setup", "/api/config", "/setup", "/health", "/docs", "/openapi.json", "/redoc",
"/api/setup", "/setup", "/health", "/docs", "/openapi.json", "/redoc",
)
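One side of the scheduler hunk above pins `timezone=KST` on each `CronTrigger`, while the other side passes only `hour=...`. To make concrete what a wall-clock schedule such as "04:30 KST" means as an absolute instant, here is a stdlib-only sketch. It does not use APScheduler; `next_daily_run` is a hypothetical helper, and the fixed +09:00 offset is an assumption that holds because Korea does not observe daylight saving time.

```python
from datetime import datetime, timedelta, timezone

KST = timezone(timedelta(hours=9))  # fixed offset; assumed adequate since KST has no DST


def next_daily_run(now_utc, hour, minute=0, tz=KST):
    """Return the next occurrence of hour:minute in tz, expressed in UTC."""
    local_now = now_utc.astimezone(tz)
    candidate = local_now.replace(hour=hour, minute=minute, second=0, microsecond=0)
    if candidate <= local_now:
        candidate += timedelta(days=1)  # today's slot has passed, use tomorrow's
    return candidate.astimezone(timezone.utc)


now = datetime(2026, 4, 8, 2, 56, tzinfo=timezone.utc)  # 11:56 KST
weakness_run = next_daily_run(now, hour=4, minute=30)    # 04:30 KST slot
digest_run = next_daily_run(now, hour=20)                # 20:00 KST slot
```

If the same hour were interpreted in UTC instead, each job would fire nine hours later in Korean wall-clock terms, which is why the timezone attached to a trigger matters for jobs described as "nightly".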
-63
View File
@@ -1,63 +0,0 @@
"""analyze_events table ORM: observes POST /documents/{id}/analyze calls (Phase E.2)
Purpose: classify analysis failure modes (timeout / parse / llm / missing_summary) +
source usage patterns (document_server / synology_chat / ui_search / ui_detail / eval).
Serves as input to the stage-3 snapshot DB design.
"""
from datetime import datetime
from typing import Any
from sqlalchemy import ARRAY, BigInteger, Boolean, DateTime, Float, ForeignKey, Integer, Text
from sqlalchemy.dialects.postgresql import JSONB
from sqlalchemy.orm import Mapped, mapped_column
from core.database import Base
class AnalyzeEvent(Base):
__tablename__ = "analyze_events"
id: Mapped[int] = mapped_column(BigInteger, primary_key=True)
doc_id: Mapped[int] = mapped_column(
BigInteger, ForeignKey("documents.id", ondelete="CASCADE"), nullable=False
)
user_id: Mapped[int | None] = mapped_column(
BigInteger, ForeignKey("users.id", ondelete="SET NULL")
)
mode: Mapped[str] = mapped_column(Text, default="quick", nullable=False) # quick / full / summary_triage / summary_deep / retrieval_select / synthesis
text_limit: Mapped[int | None] = mapped_column(Integer)
truncated: Mapped[bool] = mapped_column(Boolean, default=False)
layers_returned: Mapped[list[Any] | None] = mapped_column(JSONB, default=list)
cached: Mapped[bool] = mapped_column(Boolean, default=False)
latency_ms: Mapped[int | None] = mapped_column(Integer)
model_name: Mapped[str | None] = mapped_column(Text)
prompt_version: Mapped[str | None] = mapped_column(Text)
# None (success) | "timeout" | "llm" | "parse" | "missing_summary" | "no_text"
error_code: Mapped[str | None] = mapped_column(Text)
# document_server / synology_chat / ui_search / ui_detail / eval / unknown
source: Mapped[str] = mapped_column(Text, default="document_server", nullable=False)
created_at: Mapped[datetime] = mapped_column(
DateTime(timezone=True), default=datetime.now, nullable=False
)
# PR-A (migration 153) — routing shadow observability
subject_domain: Mapped[str | None] = mapped_column(Text)
risk_flags: Mapped[list[str] | None] = mapped_column(ARRAY(Text))
high_impact_task: Mapped[bool | None] = mapped_column(Boolean)
escalated_to_26b: Mapped[bool | None] = mapped_column(Boolean)
escalation_reasons: Mapped[list[str] | None] = mapped_column(ARRAY(Text))
confidence: Mapped[float | None] = mapped_column(Float)
policy_violation: Mapped[bool | None] = mapped_column(Boolean)
policy_violation_ids: Mapped[list[str] | None] = mapped_column(ARRAY(Text))
shadow_would_route_to: Mapped[str | None] = mapped_column(Text)
policy_version: Mapped[str | None] = mapped_column(Text)
# PR-B (migration 159): actual call tier and R2 backlog guard events
tier: Mapped[str | None] = mapped_column(Text) # 'triage' | 'primary' | 'fallback'
suppressed_reason: Mapped[str | None] = mapped_column(Text) # 'backlog_guard(ratio=0.42,pending=7)'
# PR-B B-2 (migration 161): standalone columns for /ask 3-state answerability
answerability: Mapped[str | None] = mapped_column(Text) # 'direct' | 'partial' | 'insufficient'
partial_basis: Mapped[bool | None] = mapped_column(Boolean) # whether a partial answer was actually generated
suggested_query_count: Mapped[int | None] = mapped_column(Integer)
-48
View File
@@ -1,48 +0,0 @@
"""ask_events table ORM: observes /ask calls (Phase 3.5a migration 102, Phase 3.5b wiring)
Data for threshold calibration + verifier FP analysis + defense layer debugging.
"""
from datetime import datetime
from typing import Any
from sqlalchemy import BigInteger, Boolean, DateTime, Float, ForeignKey, Integer, String, Text
from sqlalchemy.dialects.postgresql import JSONB
from sqlalchemy.orm import Mapped, mapped_column
from core.database import Base
class AskEvent(Base):
__tablename__ = "ask_events"
id: Mapped[int] = mapped_column(BigInteger, primary_key=True)
query: Mapped[str] = mapped_column(Text, nullable=False)
user_id: Mapped[int | None] = mapped_column(
BigInteger, ForeignKey("users.id", ondelete="SET NULL")
)
completeness: Mapped[str | None] = mapped_column(Text) # full / partial / insufficient
synthesis_status: Mapped[str | None] = mapped_column(Text)
confidence: Mapped[str | None] = mapped_column(Text) # high / medium / low
refused: Mapped[bool] = mapped_column(Boolean, default=False, nullable=False)
classifier_verdict: Mapped[str | None] = mapped_column(Text) # sufficient / insufficient
max_rerank_score: Mapped[float | None] = mapped_column(Float)
aggregate_score: Mapped[float | None] = mapped_column(Float)
hallucination_flags: Mapped[list[Any] | None] = mapped_column(JSONB, default=list)
evidence_count: Mapped[int | None] = mapped_column(Integer)
citation_count: Mapped[int | None] = mapped_column(Integer)
defense_layers: Mapped[dict[str, Any] | None] = mapped_column(JSONB)
total_ms: Mapped[int | None] = mapped_column(Integer)
# Phase E.1: extended measurement fields (answer_length is the key to the E.3 400→600 character comparison)
answer_length: Mapped[int | None] = mapped_column(Integer)
covered_aspects: Mapped[list[Any] | None] = mapped_column(JSONB)
missing_aspects: Mapped[list[Any] | None] = mapped_column(JSONB)
model_name: Mapped[str | None] = mapped_column(Text)
prompt_version: Mapped[str | None] = mapped_column(Text)
# Phase 3.5 calibration: eval/production split + golden join key
# Stages 138~141: nullable. After 142 is applied, source is NOT NULL (enforced by the DB; the app always fills it).
source: Mapped[str | None] = mapped_column(Text)
eval_case_id: Mapped[str | None] = mapped_column(Text)
created_at: Mapped[datetime] = mapped_column(
DateTime(timezone=True), default=datetime.now, nullable=False
)
-18
View File
@@ -1,18 +0,0 @@
"""audio_segments table ORM: timestamped segments of STT transcription results."""
from sqlalchemy import BigInteger, Float, ForeignKey, Text
from sqlalchemy.orm import Mapped, mapped_column
from core.database import Base
class AudioSegment(Base):
__tablename__ = "audio_segments"
id: Mapped[int] = mapped_column(BigInteger, primary_key=True)
document_id: Mapped[int] = mapped_column(
BigInteger, ForeignKey("documents.id", ondelete="CASCADE"), nullable=False
)
start_s: Mapped[float] = mapped_column(Float, nullable=False)
end_s: Mapped[float] = mapped_column(Float, nullable=False)
text: Mapped[str] = mapped_column(Text, nullable=False)
-103
View File
@@ -1,103 +0,0 @@
"""morning_briefings + briefing_topics table ORM (briefing of news collected overnight).
Axes are reversed: Phase 4 = country×topic / Briefing = topic×country.
The country_perspectives JSONB holds an array of multiple country perspectives for one topic.
"""
from datetime import date, datetime
from sqlalchemy import (
BigInteger,
Boolean,
Date,
DateTime,
Float,
ForeignKey,
Integer,
String,
Text,
)
from sqlalchemy.dialects.postgresql import JSONB
from sqlalchemy.orm import Mapped, mapped_column, relationship
from core.database import Base
class MorningBriefing(Base):
"""Per-day briefing metadata (KST midnight~05:00 window)"""
__tablename__ = "morning_briefings"
id: Mapped[int] = mapped_column(BigInteger, primary_key=True)
briefing_date: Mapped[date] = mapped_column(Date, nullable=False, unique=True)
window_start: Mapped[datetime] = mapped_column(DateTime(timezone=True), nullable=False)
window_end: Mapped[datetime] = mapped_column(DateTime(timezone=True), nullable=False)
decay_lambda: Mapped[float] = mapped_column(Float, nullable=False)
total_articles: Mapped[int] = mapped_column(Integer, nullable=False, default=0)
total_countries: Mapped[int] = mapped_column(Integer, nullable=False, default=0)
total_topics: Mapped[int] = mapped_column(Integer, nullable=False, default=0)
generation_ms: Mapped[int | None] = mapped_column(Integer)
llm_calls: Mapped[int] = mapped_column(Integer, nullable=False, default=0)
llm_failures: Mapped[int] = mapped_column(Integer, nullable=False, default=0)
status: Mapped[str] = mapped_column(String(20), nullable=False, default="success")
headline_oneliner: Mapped[str | None] = mapped_column(Text)
created_at: Mapped[datetime] = mapped_column(
DateTime(timezone=True), nullable=False, default=datetime.now
)
topics: Mapped[list["BriefingTopic"]] = relationship(
back_populates="briefing",
cascade="all, delete-orphan",
order_by="BriefingTopic.topic_rank",
)
class BriefingTopic(Base):
"""Cross-country comparative analysis results within one briefing, ordered by topic_rank"""
__tablename__ = "briefing_topics"
id: Mapped[int] = mapped_column(BigInteger, primary_key=True)
briefing_id: Mapped[int] = mapped_column(
BigInteger,
ForeignKey("morning_briefings.id", ondelete="CASCADE"),
nullable=False,
)
topic_rank: Mapped[int] = mapped_column(Integer, nullable=False)
topic_label: Mapped[str] = mapped_column(String(120), nullable=False)
headline: Mapped[str] = mapped_column(Text, nullable=False)
country_perspectives: Mapped[list] = mapped_column(JSONB, nullable=False, default=list)
divergences: Mapped[list] = mapped_column(JSONB, nullable=False, default=list)
convergences: Mapped[list] = mapped_column(JSONB, nullable=False, default=list)
key_quotes: Mapped[list] = mapped_column(JSONB, nullable=False, default=list)
historical_article_ids: Mapped[list | None] = mapped_column(JSONB)
historical_context: Mapped[str | None] = mapped_column(Text)
historical_window_days: Mapped[int | None] = mapped_column(Integer)
cluster_members: Mapped[list] = mapped_column(JSONB, nullable=False, default=list)
article_count: Mapped[int] = mapped_column(Integer, nullable=False)
country_count: Mapped[int] = mapped_column(Integer, nullable=False)
importance_score: Mapped[float] = mapped_column(Float, nullable=False)
raw_weight_sum: Mapped[float] = mapped_column(Float, nullable=False)
llm_model: Mapped[str | None] = mapped_column(String(100))
llm_fallback_used: Mapped[bool] = mapped_column(Boolean, nullable=False, default=False)
# 2026-05-13 per-card user actions (accompanies the date picker).
is_read: Mapped[bool] = mapped_column(Boolean, nullable=False, default=False)
read_at: Mapped[datetime | None] = mapped_column(DateTime(timezone=True))
highlighted: Mapped[bool] = mapped_column(Boolean, nullable=False, default=False)
highlighted_at: Mapped[datetime | None] = mapped_column(DateTime(timezone=True))
created_at: Mapped[datetime] = mapped_column(
DateTime(timezone=True), nullable=False, default=datetime.now
)
briefing: Mapped["MorningBriefing"] = relationship(back_populates="topics")
-25
View File
@@ -1,25 +0,0 @@
"""library_categories table ORM: independent management of the resource library's classification scheme"""
from datetime import datetime
from sqlalchemy import BigInteger, Boolean, DateTime, Integer, Text
from sqlalchemy.orm import Mapped, mapped_column
from core.database import Base
class LibraryCategory(Base):
__tablename__ = "library_categories"
id: Mapped[int] = mapped_column(BigInteger, primary_key=True)
path: Mapped[str] = mapped_column(Text, unique=True, nullable=False)
name: Mapped[str] = mapped_column(Text, nullable=False)
parent_path: Mapped[str | None] = mapped_column(Text, nullable=True)
depth: Mapped[int] = mapped_column(Integer, nullable=False, default=1)
is_system: Mapped[bool] = mapped_column(Boolean, default=False)
created_at: Mapped[datetime] = mapped_column(
DateTime(timezone=True), default=datetime.now
)
updated_at: Mapped[datetime] = mapped_column(
DateTime(timezone=True), default=datetime.now, onupdate=datetime.now
)
+1 -9
View File
@@ -3,7 +3,7 @@
from datetime import datetime
from pgvector.sqlalchemy import Vector
from sqlalchemy import BigInteger, Boolean, DateTime, ForeignKey, Integer, SmallInteger, String, Text, UniqueConstraint
from sqlalchemy import BigInteger, DateTime, ForeignKey, Integer, String, Text, UniqueConstraint
from sqlalchemy.orm import Mapped, mapped_column, relationship
from core.database import Base
@@ -34,14 +34,6 @@ class DocumentChunk(Base):
text: Mapped[str] = mapped_column(Text, nullable=False)
embedding = mapped_column(Vector(1024), nullable=True)
# Hier-Decomp-1: hierarchical decomposition tree (migration 282). Existing chunk_worker INSERTs leave these unset →
# server_default guarantees legacy rows = in_corpus=true / is_leaf=false.
parent_id: Mapped[int | None] = mapped_column(BigInteger) # tree parent. No DB FK (app-level).
level: Mapped[int | None] = mapped_column(SmallInteger) # authoritative depth.
node_type: Mapped[str | None] = mapped_column(Text) # nullable hint; not used as a condition for enabling retrieval/replace.
is_leaf: Mapped[bool] = mapped_column(Boolean, nullable=False, server_default="false") # authoritative leaf marker.
in_corpus: Mapped[bool] = mapped_column(Boolean, nullable=False, server_default="true") # whether the row is part of the search corpus.
created_at: Mapped[datetime] = mapped_column(
DateTime(timezone=True), default=datetime.now
)
-87
View File
@@ -1,87 +0,0 @@
"""global_digests + digest_topics table ORM (Phase 4)"""
from datetime import date, datetime
from sqlalchemy import (
BigInteger,
Boolean,
Date,
DateTime,
Float,
ForeignKey,
Integer,
String,
Text,
)
from sqlalchemy.dialects.postgresql import JSONB
from sqlalchemy.orm import Mapped, mapped_column, relationship
from core.database import Base
class GlobalDigest(Base):
"""Metadata for a per-day digest run"""
__tablename__ = "global_digests"
id: Mapped[int] = mapped_column(BigInteger, primary_key=True)
digest_date: Mapped[date] = mapped_column(Date, nullable=False, unique=True)
window_start: Mapped[datetime] = mapped_column(DateTime(timezone=True), nullable=False)
window_end: Mapped[datetime] = mapped_column(DateTime(timezone=True), nullable=False)
decay_lambda: Mapped[float] = mapped_column(Float, nullable=False)
total_articles: Mapped[int] = mapped_column(Integer, nullable=False, default=0)
total_countries: Mapped[int] = mapped_column(Integer, nullable=False, default=0)
total_topics: Mapped[int] = mapped_column(Integer, nullable=False, default=0)
generation_ms: Mapped[int | None] = mapped_column(Integer)
llm_calls: Mapped[int] = mapped_column(Integer, nullable=False, default=0)
llm_failures: Mapped[int] = mapped_column(Integer, nullable=False, default=0)
status: Mapped[str] = mapped_column(String(20), nullable=False, default="success")
created_at: Mapped[datetime] = mapped_column(
DateTime(timezone=True), nullable=False, default=datetime.now
)
topics: Mapped[list["DigestTopic"]] = relationship(
back_populates="digest",
cascade="all, delete-orphan",
order_by="DigestTopic.country, DigestTopic.topic_rank",
)
class DigestTopic(Base):
"""Cluster results per country × topic"""
__tablename__ = "digest_topics"
id: Mapped[int] = mapped_column(BigInteger, primary_key=True)
digest_id: Mapped[int] = mapped_column(
BigInteger,
ForeignKey("global_digests.id", ondelete="CASCADE"),
nullable=False,
)
country: Mapped[str] = mapped_column(String(10), nullable=False)
topic_rank: Mapped[int] = mapped_column(Integer, nullable=False)
topic_label: Mapped[str] = mapped_column(Text, nullable=False)
summary: Mapped[str] = mapped_column(Text, nullable=False)
article_ids: Mapped[list] = mapped_column(JSONB, nullable=False)
article_count: Mapped[int] = mapped_column(Integer, nullable=False)
importance_score: Mapped[float] = mapped_column(Float, nullable=False)
raw_weight_sum: Mapped[float] = mapped_column(Float, nullable=False)
centroid_sample: Mapped[dict | None] = mapped_column(JSONB)
llm_model: Mapped[str | None] = mapped_column(String(100))
llm_fallback_used: Mapped[bool] = mapped_column(
Boolean, nullable=False, default=False
)
created_at: Mapped[datetime] = mapped_column(
DateTime(timezone=True), nullable=False, default=datetime.now
)
digest: Mapped["GlobalDigest"] = relationship(back_populates="topics")
+3 -104
View File
@@ -3,12 +3,10 @@
from datetime import datetime
from pgvector.sqlalchemy import Vector
from sqlalchemy import BigInteger, Boolean, DateTime, Enum, ForeignKey, Integer, String, Text
from sqlalchemy import BigInteger, Boolean, DateTime, Enum, String, Text
from sqlalchemy.dialects.postgresql import JSONB
from sqlalchemy.orm import Mapped, mapped_column
# Note: documents with file_type='note' (memos) have file_path=NULL, file_hash=content SHA-256
from core.database import Base
@@ -18,7 +16,7 @@ class Document(Base):
id: Mapped[int] = mapped_column(BigInteger, primary_key=True)
# Layer 1: original file
file_path: Mapped[str | None] = mapped_column(Text, nullable=True)
file_path: Mapped[str] = mapped_column(Text, unique=True, nullable=False)
file_hash: Mapped[str] = mapped_column(String(64), nullable=False)
file_format: Mapped[str] = mapped_column(String(20), nullable=False)
file_size: Mapped[int | None] = mapped_column(BigInteger)
@@ -28,28 +26,11 @@ class Document(Base):
)
import_source: Mapped[str | None] = mapped_column(Text)
# Layer 1: original name + duplicate check (S1-ADD, migration 287)
# original_filename = uploaded original file name (used as the download label). file_path gets renamed with _N on collision.
# cf. distinct in meaning from original_format (for ODF conversion) and original_path / original_hash (007 legacy, dead).
# duplicate_of = canonical doc id (NULL when the row itself is canonical). FK ON DELETE SET NULL.
# duplicate_count = stored on the canonical row: number of copies judged identical, excluding itself (group_size-1). Updated by upload/backfill.
original_filename: Mapped[str | None] = mapped_column(Text)
duplicate_of: Mapped[int | None] = mapped_column(
BigInteger, ForeignKey("documents.id", ondelete="SET NULL")
)
duplicate_count: Mapped[int] = mapped_column(
Integer, nullable=False, default=0, server_default="0"
)
# Layer 2: text extraction
extracted_text: Mapped[str | None] = mapped_column(Text)
extracted_at: Mapped[datetime | None] = mapped_column(DateTime(timezone=True))
extractor_version: Mapped[str | None] = mapped_column(String(50))
# Layer 2: extraction metadata (OCR decision/execution)
extract_meta: Mapped[dict | None] = mapped_column(JSONB, default=dict)
ocr_derived: Mapped[bool] = mapped_column(Boolean, nullable=False, default=False)
# Layer 2: AI processing
ai_summary: Mapped[str | None] = mapped_column(Text)
ai_tags: Mapped[dict | None] = mapped_column(JSONB, default=[])
@@ -61,15 +42,6 @@ class Document(Base):
importance: Mapped[str | None] = mapped_column(String(20), default="medium")
ai_confidence: Mapped[float | None] = mapped_column()
# Memo Intake Upgrade PR-2B: memo intent classification hint inferred by Gemma 4B triage
# ('note' | 'task' | 'calendar_event' | 'activity_log' | 'reference')
# AI never auto-creates events; an events row is created only when the user promotes with one click (safety boundary).
ai_event_kind: Mapped[str | None] = mapped_column(
Enum("note", "task", "calendar_event", "activity_log", "reference",
name="event_kind_hint")
)
ai_event_confidence: Mapped[float | None] = mapped_column()
# Layer 3: vector embedding
embedding = mapped_column(Vector(1024), nullable=True)
embed_model_version: Mapped[str | None] = mapped_column(String(50))
@@ -78,22 +50,6 @@ class Document(Base):
# User note
user_note: Mapped[str | None] = mapped_column(Text)
# User tags (separate from ai_tags; parsed from #tags or entered manually)
user_tags: Mapped[list | None] = mapped_column(JSONB, default=[])
# Pinned
pinned: Mapped[bool] = mapped_column(Boolean, default=False)
# Whether to include in /ask synthesis (if false, still searchable but excluded from evidence)
ask_includable: Mapped[bool] = mapped_column(Boolean, default=True)
# Archive (currently memo UX only; not exposed on the document side)
archived: Mapped[bool] = mapped_column(Boolean, default=False)
# Per-checkbox memo metadata: {"<task_index>": {"checked_at": "<ISO8601 UTC>"}}
# Used by the UI to hide items 10 seconds after they are checked. Only meaningful for file_type='note'.
memo_task_state: Mapped[dict] = mapped_column(JSONB, nullable=False, default=dict)
# ODF conversion
derived_path: Mapped[str | None] = mapped_column(Text) # path of the converted copy (.derived/)
original_format: Mapped[str | None] = mapped_column(String(20))
@@ -117,71 +73,14 @@ class Document(Base):
# Metadata
source_channel: Mapped[str | None] = mapped_column(
Enum("law_monitor", "devonagent", "email", "web_clip",
"tksafety", "inbox_route", "manual", "drive_sync", "news", "memo",
"voice", "hermes",
"tksafety", "inbox_route", "manual", "drive_sync", "news",
name="source_channel")
)
# channel/user/message_id/timestamp metadata from external channels (Hermes Discord, etc.).
# Kept separate from extract_meta (OCR only).
source_metadata: Mapped[dict] = mapped_column(JSONB, nullable=False, default=dict)
data_origin: Mapped[str | None] = mapped_column(
Enum("work", "external", name="data_origin")
)
# Purpose (priority: manual edit > explicit upload value > AI inference)
doc_purpose: Mapped[str | None] = mapped_column(
Enum("business", "knowledge", name="document_purpose")
)
title: Mapped[str | None] = mapped_column(Text)
# Category (primary entry point: UI tab/route branching)
# 7 active: document / library / news / memo / audio / video / law
# 3 reserved: mail / calendar / plex
category: Mapped[str | None] = mapped_column(
Enum("document", "library", "news", "memo", "audio", "video", "law",
"mail", "calendar", "plex",
name="doc_category", create_type=False)
)
# Change candidates proposed by AI but not yet approved (category / path / doctype)
# Applied to category / user_tags only on /accept-suggestion approval (no automatic transition)
ai_suggestion: Mapped[dict | None] = mapped_column(JSONB)
# PR-B B-1: split outputs of summary_triage (4B, always on) / summary_deep (26B, escalation)
ai_tldr: Mapped[str | None] = mapped_column(Text) # TL;DR, at most 60 characters
ai_bullets: Mapped[list | None] = mapped_column(JSONB) # 3~5 key bullets
ai_detail_summary: Mapped[str | None] = mapped_column(Text) # 26B, 2~3 paragraphs
ai_inconsistencies: Mapped[list | None] = mapped_column(JSONB) # [{kind, desc}]
# 'triage' | 'deep' | NULL: which tier the analysis of this document has completed
ai_analysis_tier: Mapped[str | None] = mapped_column(String(10))
# Video thumbnail (§3): one frame at the 50% mark via ffmpeg. Absolute path PKM/Videos/.thumbs/{id}.jpg.
thumbnail_path: Mapped[str | None] = mapped_column(Text)
# Quarantine flag for mov/mkv/avi files dropped on the NAS (§3). If true, only a "cannot play" notice is shown.
needs_conversion: Mapped[bool] = mapped_column(Boolean, default=False, server_default="false")
# facet exploration axes (Phase 2)
facet_company: Mapped[str | None] = mapped_column(Text)
facet_topic: Mapped[str | None] = mapped_column(Text)
facet_year: Mapped[int | None] = mapped_column(Integer)
facet_doctype: Mapped[str | None] = mapped_column(Text)
# === Phase 1A canonical Markdown layer columns (migrations 211~219) ===
# plan: ~/.claude/plans/plan-idempotent-sundae.md
md_content: Mapped[str | None] = mapped_column(Text)
md_frontmatter: Mapped[dict] = mapped_column(JSONB, nullable=False, default=dict)
md_format_version: Mapped[str] = mapped_column(Text, nullable=False, default='1.0')
md_status: Mapped[str] = mapped_column(Text, nullable=False, default='pending')
md_extraction_engine: Mapped[str | None] = mapped_column(Text)
md_extraction_engine_version: Mapped[str | None] = mapped_column(Text)
md_extraction_quality: Mapped[dict | None] = mapped_column(JSONB)
md_extraction_error: Mapped[str | None] = mapped_column(Text)
md_content_hash: Mapped[str | None] = mapped_column(Text)
md_source_hash: Mapped[str | None] = mapped_column(Text)
md_generated_at: Mapped[datetime | None] = mapped_column(DateTime(timezone=True))
content_origin: Mapped[str] = mapped_column(Text, nullable=False, default='extracted')
md_draft_status: Mapped[str | None] = mapped_column(Text)
# Timestamps
created_at: Mapped[datetime] = mapped_column(
DateTime(timezone=True), default=datetime.now
-42
View File
@@ -1,42 +0,0 @@
"""document_images ORM (Phase 1B.5): metadata for images extracted by marker.
Storage: NAS `/documents/extracted_images/{document_id}/{image_key}.{ext}`
Display: GET /api/documents/{doc_id}/images/{image_key}/raw (authentication required)
md_content refs use the form `![alt](docimg:img_001)`; image_key is sequence-based and deterministic,
so re-conversion is idempotent.
"""
from datetime import datetime
from sqlalchemy import BigInteger, DateTime, ForeignKey, Integer, String, Text
from sqlalchemy.orm import Mapped, mapped_column
from core.database import Base
class DocumentImage(Base):
__tablename__ = "document_images"
id: Mapped[int] = mapped_column(BigInteger, primary_key=True)
document_id: Mapped[int] = mapped_column(
BigInteger, ForeignKey("documents.id", ondelete="CASCADE"), nullable=False
)
image_key: Mapped[str] = mapped_column(String(32), nullable=False)
relative_path: Mapped[str] = mapped_column(Text, nullable=False)
file_path: Mapped[str] = mapped_column(Text, nullable=False)
mime_type: Mapped[str] = mapped_column(Text, nullable=False)
file_size: Mapped[int] = mapped_column(BigInteger, nullable=False)
content_hash: Mapped[str] = mapped_column(String(64), nullable=False)
width: Mapped[int | None] = mapped_column(Integer)
height: Mapped[int | None] = mapped_column(Integer)
page_index: Mapped[int | None] = mapped_column(Integer)
alt_text: Mapped[str | None] = mapped_column(Text)
source_slug: Mapped[str | None] = mapped_column(Text)
extraction_engine: Mapped[str] = mapped_column(
String(32), nullable=False, default="marker"
)
extraction_engine_version: Mapped[str | None] = mapped_column(String(32))
created_at: Mapped[datetime] = mapped_column(
DateTime(timezone=True), default=datetime.now, nullable=False
)
-44
View File
@@ -1,44 +0,0 @@
"""document_notes table ORM: handwritten notes per document (1:1 with the document).
Design:
- UNIQUE on user×document: one canvas per user per document.
- Upsert style. PUT /api/documents/{id}/note replaces the entire strokes_json.
- Separate from read tracking (document_reads, append-only log).
NOTE: documents has no user_id (single-user). Ownership is tracked by document_notes.user_id.
When moving to multi-user, documents.user_id and a separate check must be added.
"""
from datetime import datetime
from typing import Any
from sqlalchemy import BigInteger, DateTime, ForeignKey, Integer, UniqueConstraint
from sqlalchemy.dialects.postgresql import JSONB
from sqlalchemy.orm import Mapped, mapped_column
from core.database import Base
class DocumentNote(Base):
__tablename__ = "document_notes"
__table_args__ = (
UniqueConstraint("user_id", "document_id", name="document_notes_user_id_document_id_key"),
)
id: Mapped[int] = mapped_column(BigInteger, primary_key=True)
user_id: Mapped[int] = mapped_column(
BigInteger, ForeignKey("users.id", ondelete="CASCADE"), nullable=False
)
document_id: Mapped[int] = mapped_column(
BigInteger, ForeignKey("documents.id", ondelete="CASCADE"), nullable=False
)
strokes_json: Mapped[dict[str, Any] | None] = mapped_column(JSONB)
canvas_width: Mapped[int | None] = mapped_column(Integer)
canvas_height: Mapped[int | None] = mapped_column(Integer)
schema_version: Mapped[int] = mapped_column(Integer, default=1, nullable=False)
created_at: Mapped[datetime] = mapped_column(
DateTime(timezone=True), default=datetime.now, nullable=False
)
updated_at: Mapped[datetime] = mapped_column(
DateTime(timezone=True), default=datetime.now, onupdate=datetime.now, nullable=False
)
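The upsert behaviour described in the `document_notes` docstring can be sketched as follows. This is a minimal illustration, not the project's endpoint code: it uses the standard-library `sqlite3` module instead of PostgreSQL, a reduced table with only the columns needed, and a hypothetical helper name `upsert_note`.

```python
import json
import sqlite3

# Reduced stand-in for document_notes (assumption: only the columns needed here).
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE document_notes (
        id INTEGER PRIMARY KEY,
        user_id INTEGER NOT NULL,
        document_id INTEGER NOT NULL,
        strokes_json TEXT,
        UNIQUE (user_id, document_id)
    )
    """
)


def upsert_note(conn, user_id, document_id, strokes):
    """Insert the note, or overwrite strokes_json when the pair already exists."""
    conn.execute(
        """
        INSERT INTO document_notes (user_id, document_id, strokes_json)
        VALUES (?, ?, ?)
        ON CONFLICT (user_id, document_id)
        DO UPDATE SET strokes_json = excluded.strokes_json
        """,
        (user_id, document_id, json.dumps(strokes)),
    )


# Two saves for the same user/document leave a single row holding the latest strokes.
upsert_note(conn, 1, 10, {"strokes": [[0, 0], [5, 5]]})
upsert_note(conn, 1, 10, {"strokes": []})
note_rows = conn.execute("SELECT strokes_json FROM document_notes").fetchall()
```

The UNIQUE constraint is what makes the conflict target valid; without it the second save would create a second canvas row.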
-33
View File
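@@ -1,33 +0,0 @@
"""document_reads table ORM: read-through tracking for the library.
NOTE: the documents table has no user_id column (single-user assumption).
Read ownership is tracked solely via document_reads.user_id.
When moving to multi-user, documents.user_id and a separate ownership check must be added.
Design:
- Append-only log. Read count = COUNT(*), last read time = MAX(read_at).
- Rows are inserted only by an explicit user action (button click). No automatic +1.
- Multiple rows per user/document are allowed (the read count accumulates).
"""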
from datetime import datetime
from sqlalchemy import BigInteger, DateTime, ForeignKey
from sqlalchemy.orm import Mapped, mapped_column
from core.database import Base
class DocumentRead(Base):
__tablename__ = "document_reads"
id: Mapped[int] = mapped_column(BigInteger, primary_key=True)
user_id: Mapped[int] = mapped_column(
BigInteger, ForeignKey("users.id", ondelete="CASCADE"), nullable=False
)
document_id: Mapped[int] = mapped_column(
BigInteger, ForeignKey("documents.id", ondelete="CASCADE"), nullable=False
)
read_at: Mapped[datetime] = mapped_column(
DateTime(timezone=True), default=datetime.now, nullable=False
)
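The append-only design above derives both the read count and the last read time from the log itself. A minimal sketch, assuming a reduced table in the standard-library `sqlite3` module and hypothetical helpers `record_read` and `read_stats` (timestamps are stored as ISO 8601 strings with the same offset so that `MAX` orders them correctly):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE document_reads ("
    "id INTEGER PRIMARY KEY, user_id INTEGER NOT NULL, "
    "document_id INTEGER NOT NULL, read_at TEXT NOT NULL)"
)


def record_read(conn, user_id, document_id, read_at):
    """Append one row per explicit user action; rows are never updated."""
    conn.execute(
        "INSERT INTO document_reads (user_id, document_id, read_at) VALUES (?, ?, ?)",
        (user_id, document_id, read_at),
    )


def read_stats(conn, user_id, document_id):
    """Return (read count, last read time) computed from the log."""
    return conn.execute(
        "SELECT COUNT(*), MAX(read_at) FROM document_reads "
        "WHERE user_id = ? AND document_id = ?",
        (user_id, document_id),
    ).fetchone()


record_read(conn, 1, 10, "2026-04-01T09:00:00+00:00")
record_read(conn, 1, 10, "2026-04-08T09:00:00+00:00")
stats = read_stats(conn, 1, 10)
unread_stats = read_stats(conn, 1, 99)
```

Because nothing is stored as a counter, there is no counter to drift; a document that was never read simply yields a count of zero and no timestamp.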
-43
View File
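@@ -1,43 +0,0 @@
"""eid_review_set_draft ORM: Eid review-set drafts (append-only proposals). migration 302.
The worker INSERTs chronic/relapse questions from the weakness snapshot as a 'proposed' review-set draft.
Actual scheduling (study_question_progress.due_at) is a one-click T2 user action; the draft is an immutable proposal record.
UPDATE/DELETE are blocked by a DB RULE. The stamp columns actor and source_generated_at are NOT NULL with no default.
"""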
from __future__ import annotations
from datetime import datetime
from sqlalchemy import BigInteger, DateTime, ForeignKey, String, func
from sqlalchemy.dialects.postgresql import JSONB
from sqlalchemy.orm import Mapped, mapped_column
from core.database import Base
class EidReviewSetDraft(Base):
__tablename__ = "eid_review_set_draft"
id: Mapped[int] = mapped_column(BigInteger, primary_key=True)
user_id: Mapped[int] = mapped_column(
BigInteger, ForeignKey("users.id", ondelete="CASCADE"), nullable=False
)
study_topic_id: Mapped[int | None] = mapped_column(
BigInteger, ForeignKey("study_topics.id", ondelete="CASCADE")
) # nullable = cross-topic set
question_ids: Mapped[list] = mapped_column(JSONB, nullable=False) # ordered list[int]
reason: Mapped[str] = mapped_column(String(40), nullable=False) # chronic|relapse|coverage|overdue
actor: Mapped[str] = mapped_column(String(20), nullable=False) # stamp
source_weakness_id: Mapped[int | None] = mapped_column(
BigInteger, ForeignKey("eid_study_weakness.id", ondelete="SET NULL")
)
source_generated_at: Mapped[datetime] = mapped_column(
DateTime(timezone=True), nullable=False
) # stamp
supersedes_id: Mapped[int | None] = mapped_column(
BigInteger, ForeignKey("eid_review_set_draft.id", ondelete="SET NULL")
)
created_at: Mapped[datetime] = mapped_column(
DateTime(timezone=True), nullable=False, server_default=func.now()
)
-51
View File
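@@ -1,51 +0,0 @@
"""eid_study_weakness ORM: Eid study-weakness snapshots (append-only). migration 301.
The worker (workers/study_weakness.py) INSERTs; the study_diagnosis surface SELECTs the latest active row.
UPDATE/DELETE are blocked by a DB RULE (DO INSTEAD NOTHING), so ORM mutation attempts are also no-ops (rows are immutable).
The stamp columns actor and source_generated_at are NOT NULL with no default; the worker must supply them explicitly (an INSERT that omits them is rejected).
"""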
from __future__ import annotations
from datetime import datetime
from sqlalchemy import (
BigInteger,
Boolean,
DateTime,
ForeignKey,
Integer,
String,
func,
)
from sqlalchemy.dialects.postgresql import JSONB
from sqlalchemy.orm import Mapped, mapped_column
from core.database import Base
class EidStudyWeakness(Base):
__tablename__ = "eid_study_weakness"
id: Mapped[int] = mapped_column(BigInteger, primary_key=True)
user_id: Mapped[int] = mapped_column(
BigInteger, ForeignKey("users.id", ondelete="CASCADE"), nullable=False
)
# [{topic_id, topic, chronic, relapsed, unsure, coverage_gap, overdue, trend, tier}]
weaknesses: Mapped[list] = mapped_column(JSONB, nullable=False)
# {avoidance_topics, session_abandon_rate, stale_due_count, skew_topics}
habit_signals: Mapped[dict] = mapped_column(JSONB, nullable=False)
trend_label: Mapped[str] = mapped_column(String(20), nullable=False)
sample_attempts: Mapped[int] = mapped_column(Integer, nullable=False, default=0)
is_shallow_sample: Mapped[bool] = mapped_column(Boolean, nullable=False, default=False)
status: Mapped[str] = mapped_column(String(20), nullable=False, default="active")
supersedes_id: Mapped[int | None] = mapped_column(
BigInteger, ForeignKey("eid_study_weakness.id", ondelete="SET NULL")
)
actor: Mapped[str] = mapped_column(String(20), nullable=False) # stamp (no default)
source_generated_at: Mapped[datetime] = mapped_column(
DateTime(timezone=True), nullable=False
) # stamp (no default)
created_at: Mapped[datetime] = mapped_column(
DateTime(timezone=True), nullable=False, server_default=func.now()
)
-113
View File
@@ -1,113 +0,0 @@
"""events primary container ORM (personal operations log / schedule / to-dos / retrospectives)
Core of PR-1 (migrations 239~247). The kind enum unifies the task/calendar_event/activity_log
variants. memo_document_id is an optional link to a memo.
"""
from datetime import datetime
from typing import Any
from sqlalchemy import (
BigInteger,
Boolean,
DateTime,
ForeignKey,
SmallInteger,
String,
Text,
)
from sqlalchemy.dialects.postgresql import ENUM as PgEnum
from sqlalchemy.dialects.postgresql import JSONB
from sqlalchemy.orm import Mapped, mapped_column
from core.database import Base
# Do not redeclare the Postgres enums (create_type=False); migrations 239~243 are authoritative.
EventKindEnum = PgEnum(
"task",
"calendar_event",
"activity_log",
name="event_kind",
create_type=False,
)
EventStatusEnum = PgEnum(
"inbox",
"next",
"scheduled",
"in_progress",
"done",
"cancelled",
"deferred",
name="event_status",
create_type=False,
)
EventSourceEnum = PgEnum(
"manual",
"memo",
"email",
"chat",
"webhook",
"git_commit",
"claude_code",
name="event_source",
create_type=False,
)
EventActorEnum = PgEnum(
"manual",
"eid",
"email_ingest",
"system",
name="event_actor",
create_type=False,
)
class Event(Base):
__tablename__ = "events"
id: Mapped[int] = mapped_column(BigInteger, primary_key=True)
title: Mapped[str] = mapped_column(Text, nullable=False)
description: Mapped[str | None] = mapped_column(Text)
kind: Mapped[str] = mapped_column(EventKindEnum, nullable=False)
status: Mapped[str] = mapped_column(EventStatusEnum, nullable=False, default="inbox")
# Time fields: the meaning differs per kind (CHECK constraints are in migration 244)
due_at: Mapped[datetime | None] = mapped_column(DateTime(timezone=True))
start_at: Mapped[datetime | None] = mapped_column(DateTime(timezone=True))
end_at: Mapped[datetime | None] = mapped_column(DateTime(timezone=True))
started_at: Mapped[datetime | None] = mapped_column(DateTime(timezone=True))
ended_at: Mapped[datetime | None] = mapped_column(DateTime(timezone=True))
all_day: Mapped[bool] = mapped_column(Boolean, nullable=False, default=False)
timezone: Mapped[str | None] = mapped_column(Text)
# lifecycle
defer_until: Mapped[datetime | None] = mapped_column(DateTime(timezone=True))
completed_at: Mapped[datetime | None] = mapped_column(DateTime(timezone=True))
cancelled_at: Mapped[datetime | None] = mapped_column(DateTime(timezone=True))
priority: Mapped[int | None] = mapped_column(SmallInteger)
project_tag: Mapped[str | None] = mapped_column(String(64))
tags: Mapped[list[Any]] = mapped_column(JSONB, nullable=False, default=list)
# Source / external identifier
source: Mapped[str] = mapped_column(EventSourceEnum, nullable=False, default="manual")
source_ref: Mapped[str | None] = mapped_column(Text)
raw_metadata: Mapped[dict[str, Any]] = mapped_column(JSONB, nullable=False, default=dict)
# Memo link (optional, ON DELETE SET NULL)
memo_document_id: Mapped[int | None] = mapped_column(
BigInteger, ForeignKey("documents.id", ondelete="SET NULL")
)
# Auth / actor
user_id: Mapped[int] = mapped_column(
BigInteger, ForeignKey("users.id"), nullable=False
)
created_by: Mapped[str] = mapped_column(EventActorEnum, nullable=False, default="manual")
created_at: Mapped[datetime] = mapped_column(
DateTime(timezone=True), default=datetime.now, nullable=False
)
updated_at: Mapped[datetime] = mapped_column(
DateTime(timezone=True), default=datetime.now, onupdate=datetime.now, nullable=False
)
-43
View File
@@ -1,43 +0,0 @@
"""events_history ORM: lifecycle change history for events (append-only).
PR-1 (migrations 248~249). FK ON DELETE RESTRICT blocks direct deletion of the parent events row
(feedback_history_table_fk_restrict.md: history records facts as of a point in time).
"""
from datetime import datetime
from typing import Any
from sqlalchemy import BigInteger, DateTime, ForeignKey
from sqlalchemy.dialects.postgresql import ENUM as PgEnum
from sqlalchemy.dialects.postgresql import JSONB
from sqlalchemy.orm import Mapped, mapped_column
from core.database import Base
from models.event import EventActorEnum
HistoryChangeKindEnum = PgEnum(
"create",
"reschedule",
"defer",
"reactivate",
"complete",
"cancel",
name="history_change_kind",
create_type=False,
)
class EventHistory(Base):
__tablename__ = "events_history"
id: Mapped[int] = mapped_column(BigInteger, primary_key=True)
event_id: Mapped[int] = mapped_column(
BigInteger, ForeignKey("events.id", ondelete="RESTRICT"), nullable=False
)
changed_at: Mapped[datetime] = mapped_column(
DateTime(timezone=True), default=datetime.now, nullable=False
)
changed_by: Mapped[str] = mapped_column(EventActorEnum, nullable=False)
change_kind: Mapped[str] = mapped_column(HistoryChangeKindEnum, nullable=False)
before: Mapped[dict[str, Any] | None] = mapped_column(JSONB)
after: Mapped[dict[str, Any]] = mapped_column(JSONB, nullable=False)
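The effect of `ondelete="RESTRICT"` on the history table can be illustrated with a small sketch. It uses the standard-library `sqlite3` module with reduced tables (an assumption for portability; the project targets PostgreSQL), and shows that a parent `events` row with history cannot be deleted directly.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite only enforces foreign keys when this pragma is enabled.
conn.execute("PRAGMA foreign_keys = ON")
conn.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, title TEXT NOT NULL)")
conn.execute(
    "CREATE TABLE events_history ("
    "id INTEGER PRIMARY KEY, "
    "event_id INTEGER NOT NULL REFERENCES events(id) ON DELETE RESTRICT, "
    "change_kind TEXT NOT NULL)"
)
conn.execute("INSERT INTO events (id, title) VALUES (1, 'write report')")
conn.execute("INSERT INTO events_history (event_id, change_kind) VALUES (1, 'create')")

# Deleting the parent while history rows reference it is rejected by the database.
try:
    conn.execute("DELETE FROM events WHERE id = 1")
    delete_blocked = False
except sqlite3.IntegrityError:
    delete_blocked = True

remaining_events = conn.execute("SELECT COUNT(*) FROM events").fetchone()[0]
```

With CASCADE the history would vanish along with the event, and with SET NULL it would lose its subject; RESTRICT keeps the point-in-time record intact by refusing the delete.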
-20
View File
@@ -1,20 +0,0 @@
"""facet_values table ORM: dictionary of allowed values per facet axis"""
from datetime import datetime
from sqlalchemy import BigInteger, Boolean, DateTime, Text
from sqlalchemy.orm import Mapped, mapped_column
from core.database import Base
class FacetValue(Base):
__tablename__ = "facet_values"
id: Mapped[int] = mapped_column(BigInteger, primary_key=True)
facet_type: Mapped[str] = mapped_column(Text, nullable=False) # company, topic, doctype
value: Mapped[str] = mapped_column(Text, nullable=False)
is_system: Mapped[bool] = mapped_column(Boolean, default=False)
created_at: Mapped[datetime] = mapped_column(
DateTime(timezone=True), default=datetime.now
)
+4 -47
View File
@@ -2,9 +2,7 @@
from datetime import datetime
from sqlalchemy import BigInteger, DateTime, Enum, ForeignKey, SmallInteger, Text, text
from sqlalchemy.dialects.postgresql import JSONB, insert as pg_insert
from sqlalchemy.ext.asyncio import AsyncSession
from sqlalchemy import BigInteger, DateTime, Enum, ForeignKey, SmallInteger, Text, UniqueConstraint
from sqlalchemy.orm import Mapped, mapped_column
from core.database import Base
@@ -16,16 +14,7 @@ class ProcessingQueue(Base):
id: Mapped[int] = mapped_column(BigInteger, primary_key=True)
document_id: Mapped[int] = mapped_column(BigInteger, ForeignKey("documents.id"), nullable=False)
stage: Mapped[str] = mapped_column(
# 'stt' (audio): migration 150 / 'thumbnail' (video): enqueued by queue_consumer.
# 'deep_summary' (PR-B B-1): enqueued by classify_worker on escalation.
# DB enum changes are handled by migrations, hence create_type=False.
Enum(
"extract", "classify", "summarize", "embed", "chunk", "preview",
"stt", "thumbnail", "deep_summary", "markdown",
name="process_stage",
create_type=False,
),
nullable=False,
Enum("extract", "classify", "summarize", "embed", "chunk", "preview", name="process_stage"), nullable=False
)
status: Mapped[str] = mapped_column(
Enum("pending", "processing", "completed", "failed", name="process_status"),
@@ -34,44 +23,12 @@ class ProcessingQueue(Base):
attempts: Mapped[int] = mapped_column(SmallInteger, default=0)
max_attempts: Mapped[int] = mapped_column(SmallInteger, default=3)
error_message: Mapped[str | None] = mapped_column(Text)
# B-1: the deep_summary stage carries an EscalationEnvelope as its payload. NULL for other stages.
payload: Mapped[dict | None] = mapped_column(JSONB)
created_at: Mapped[datetime] = mapped_column(
DateTime(timezone=True), default=datetime.now
)
started_at: Mapped[datetime | None] = mapped_column(DateTime(timezone=True))
completed_at: Mapped[datetime | None] = mapped_column(DateTime(timezone=True))
# The DB constraint is managed by the partial unique index uq_queue_active (migration 117)
async def enqueue_stage(
session: AsyncSession,
document_id: int,
stage: str,
*,
status: str = "pending",
payload: dict | None = None,
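) -> bool:
"""Add a row to ProcessingQueue (duplicate protection at the DB level).
If an active (pending/processing) row already exists for the same (document_id, stage),
do nothing and return False.
B-1: the optional payload can carry the deep_summary EscalationEnvelope JSON.
If deep_summary is proposed again for the same document, on_conflict_do_nothing keeps the existing payload
(the original, first envelope). Later reprocessing and re-analysis are triggered by classify.
"""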
values: dict = {"document_id": document_id, "stage": stage, "status": status}
if payload is not None:
values["payload"] = payload
stmt = (
pg_insert(ProcessingQueue)
.values(**values)
.on_conflict_do_nothing(
index_elements=["document_id", "stage"],
index_where=text("status IN ('pending', 'processing')"),
)
__table_args__ = (
UniqueConstraint("document_id", "stage", "status"),
)
result = await session.execute(stmt)
return result.rowcount > 0
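The duplicate protection in `enqueue_stage` relies on a partial unique index combined with `ON CONFLICT ... DO NOTHING`. The sketch below reproduces the idea with the standard-library `sqlite3` module; the table name `processing_queue` and the reduced column set are assumptions, and the real code goes through SQLAlchemy's PostgreSQL `insert`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE processing_queue ("
    "id INTEGER PRIMARY KEY, document_id INTEGER NOT NULL, "
    "stage TEXT NOT NULL, status TEXT NOT NULL DEFAULT 'pending')"
)
# Uniqueness applies only to active rows; finished rows may repeat freely.
conn.execute(
    "CREATE UNIQUE INDEX uq_queue_active ON processing_queue (document_id, stage) "
    "WHERE status IN ('pending', 'processing')"
)


def enqueue_stage(conn, document_id, stage):
    """Return True if a row was inserted, False if an active row already existed."""
    cur = conn.execute(
        "INSERT INTO processing_queue (document_id, stage) VALUES (?, ?) "
        "ON CONFLICT (document_id, stage) WHERE status IN ('pending', 'processing') "
        "DO NOTHING",
        (document_id, stage),
    )
    return cur.rowcount > 0


first = enqueue_stage(conn, 7, "embed")
second = enqueue_stage(conn, 7, "embed")  # an active row exists, so nothing is inserted
conn.execute(
    "UPDATE processing_queue SET status = 'completed' "
    "WHERE document_id = 7 AND stage = 'embed'"
)
third = enqueue_stage(conn, 7, "embed")  # the earlier row is no longer active
total_rows = conn.execute("SELECT COUNT(*) FROM processing_queue").fetchone()[0]
```

A plain `UNIQUE (document_id, stage, status)` constraint behaves differently: it would reject a second `completed` row for the same document and stage, while still allowing one `pending` and one `processing` row to coexist.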
-49
View File
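@@ -1,49 +0,0 @@
"""chunk_section_analysis table ORM (PR-DocSrv-Hier-Section-Summary-1).
Stores Mac mini analysis results per (hier_section is_leaf) section. This is section-level analysis kept separate
from document_chunks (retrieval-hot). The table is created in migration 286.
The pilot stage (scripts/section_summary_pilot.py) runs via the `./scripts` mount without a rebuild,
but this model lives under `app/` and is baked into the image, so the pilot script does not import it and
uses raw SQL instead. The model serves to (1) document the schema and (2) support wiring a future always-on worker
(a separate PR, accompanied by an image rebuild). Column definitions are kept in sync with migration 286 as the single source of truth.
"""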
from datetime import datetime
from sqlalchemy import BigInteger, DateTime, Float, ForeignKey, Text, text
from sqlalchemy.orm import Mapped, mapped_column
from core.database import Base
class ChunkSectionAnalysis(Base):
__tablename__ = "chunk_section_analysis"
id: Mapped[int] = mapped_column(BigInteger, primary_key=True)
# FK CASCADE: analysis data dependent on document_chunks (1:1). Intentionally different from parent_id (self-FK, app-level).
chunk_id: Mapped[int] = mapped_column(
BigInteger, ForeignKey("document_chunks.id", ondelete="CASCADE"), nullable=False
)
# summarized | skipped_tiny | failed: skips are also recorded as rows (to distinguish unprocessed from intentionally skipped)
status: Mapped[str] = mapped_column(Text, nullable=False)
summary: Mapped[str | None] = mapped_column(Text)
# Section-specific role enum (loose text, no CHECK yet; to be tightened after observing the pilot).
# definition/requirement/procedure/formula/data_table/example/case_study/question/reference/overview/other
section_type: Mapped[str | None] = mapped_column(Text)
# Inherited snapshot of the doc-level taxonomy path (documents.ai_domain).
domain: Mapped[str | None] = mapped_column(Text)
confidence: Mapped[float | None] = mapped_column(Float)
model: Mapped[str | None] = mapped_column(Text)
prompt_version: Mapped[str] = mapped_column(Text, nullable=False)
# Snapshot of the leaf chunk_content_hash at analysis time: used to detect staleness when the source text changes (re-chunking).
source_content_hash: Mapped[str | None] = mapped_column(Text)
error: Mapped[str | None] = mapped_column(Text)
created_at: Mapped[datetime] = mapped_column(
DateTime(timezone=True), server_default=text("now()"), nullable=False
)
updated_at: Mapped[datetime] = mapped_column(
DateTime(timezone=True), server_default=text("now()"), nullable=False
)
# UNIQUE(chunk_id, prompt_version) is defined in migration 286 (not reflected in the ORM; lookups/upserts use raw SQL).
-235
View File
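@@ -1,235 +0,0 @@
"""study_memo_cards / study_memo_card_evidence ORM (study memorization notes, Phase 1).
Separate from study_questions (MCQ): the memorization flashcards themselves, extracted from solutions/evidence.
- source_kind: question (P1) / subject_note / document (reserved for P3)
- format: qa (cue->fact) / cloze (fill-in-the-blank). No strict enum (mapped at read time).
- source_generated_at: the ai_explanation_generated_at version at extraction time; used for version/stale checks.
- needs_review DEFAULT true: generated content enters as pending review.
The dedup_hash PARTIAL UNIQUE index (migration 288, WHERE deleted_at IS NULL) is the last line of defense against duplicates.
On correction/deletion, supersede (marking old-version cards with deleted_at) leaves zero stale cards; call it before append so that
live old cards do not block the new extraction via ON CONFLICT.
"""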
from __future__ import annotations
from datetime import datetime
from typing import Any, Sequence
from sqlalchemy import (
BigInteger,
Boolean,
DateTime,
ForeignKey,
Integer,
String,
Text,
func,
text,
update,
)
from sqlalchemy.dialects.postgresql import JSONB, insert as pg_insert
from sqlalchemy.ext.asyncio import AsyncSession
from sqlalchemy.orm import Mapped, mapped_column
from core.database import Base
class StudyMemoCard(Base):
__tablename__ = "study_memo_cards"
id: Mapped[int] = mapped_column(BigInteger, primary_key=True)
user_id: Mapped[int] = mapped_column(
BigInteger, ForeignKey("users.id", ondelete="CASCADE"), nullable=False
)
study_topic_id: Mapped[int] = mapped_column(
BigInteger, ForeignKey("study_topics.id", ondelete="CASCADE"), nullable=False
)
source_kind: Mapped[str] = mapped_column(String(40), nullable=False)
source_question_id: Mapped[int | None] = mapped_column(
BigInteger, ForeignKey("study_questions.id", ondelete="CASCADE")
)
source_subject_note_id: Mapped[int | None] = mapped_column(BigInteger)
format: Mapped[str] = mapped_column(String(20), nullable=False)
cue: Mapped[str] = mapped_column(Text, nullable=False)
fact: Mapped[str] = mapped_column(Text, nullable=False)
cloze_text: Mapped[str | None] = mapped_column(Text)
extra: Mapped[dict | None] = mapped_column(JSONB)
source_generated_at: Mapped[datetime | None] = mapped_column(DateTime(timezone=True))
dedup_hash: Mapped[str] = mapped_column(String(64), nullable=False)
needs_review: Mapped[bool] = mapped_column(Boolean, nullable=False, default=True)
flagged_at: Mapped[datetime | None] = mapped_column(DateTime(timezone=True))
flagged_by: Mapped[str | None] = mapped_column(String(40))
model: Mapped[str | None] = mapped_column(String(120))
generated_at: Mapped[datetime | None] = mapped_column(DateTime(timezone=True))
# 'Just study' (cram) view record (independent of SR, migration 300)
view_count: Mapped[int] = mapped_column(Integer, nullable=False, default=0)
last_viewed_at: Mapped[datetime | None] = mapped_column(DateTime(timezone=True))
created_at: Mapped[datetime] = mapped_column(
DateTime(timezone=True), default=datetime.now, nullable=False
)
deleted_at: Mapped[datetime | None] = mapped_column(DateTime(timezone=True))
class StudyMemoCardEvidence(Base):
"""Append-only citation. No UPDATE/DELETE."""
__tablename__ = "study_memo_card_evidence"
id: Mapped[int] = mapped_column(BigInteger, primary_key=True)
card_id: Mapped[int] = mapped_column(
BigInteger, ForeignKey("study_memo_cards.id", ondelete="CASCADE"), nullable=False
)
source_type: Mapped[str] = mapped_column(String(40), nullable=False)
source_id: Mapped[int | None] = mapped_column(BigInteger)
chunk_index: Mapped[int | None] = mapped_column(Integer)
snippet: Mapped[str | None] = mapped_column(Text)
created_at: Mapped[datetime] = mapped_column(
DateTime(timezone=True), default=datetime.now, nullable=False
)
async def supersede_old_cards(
session: AsyncSession,
*,
source_question_id: int,
keep_generated_at: datetime | None,
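) -> int:
"""Retire cards of 'other versions' for the same question by marking deleted_at.
Call this 'before' loading cards for the new source_generated_at, so that live old-version cards do not block
the new extraction through the dedup PARTIAL UNIQUE index (zero stale cards left after a correction). Cards of the same version are kept.
Returns: number of cards retired.
"""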
stmt = (
update(StudyMemoCard)
.where(
StudyMemoCard.source_question_id == source_question_id,
StudyMemoCard.deleted_at.is_(None),
StudyMemoCard.source_generated_at.is_distinct_from(keep_generated_at),
)
.values(deleted_at=func.now())
)
result = await session.execute(stmt)
return result.rowcount or 0
async def append_card(
session: AsyncSession,
*,
user_id: int,
study_topic_id: int,
source_kind: str,
source_question_id: int | None,
format: str,
cue: str,
fact: str,
cloze_text: str | None,
dedup_hash: str,
source_generated_at: datetime | None,
model: str | None,
generated_at: datetime | None,
needs_review: bool = True,
) -> int | None:
"""INSERT one card. Returns None on a dedup_hash PARTIAL UNIQUE conflict (DO NOTHING).
Returns: card.id, or None if skipped as a duplicate.
"""
stmt = (
pg_insert(StudyMemoCard)
.values(
user_id=user_id,
study_topic_id=study_topic_id,
source_kind=source_kind,
source_question_id=source_question_id,
format=format,
cue=cue,
fact=fact,
cloze_text=cloze_text,
dedup_hash=dedup_hash,
source_generated_at=source_generated_at,
needs_review=needs_review,
model=model,
generated_at=generated_at,
)
.on_conflict_do_nothing(
index_elements=["dedup_hash"],
index_where=text("deleted_at IS NULL"),
)
.returning(StudyMemoCard.id)
)
result = await session.execute(stmt)
return result.scalar_one_or_none()
async def append_card_evidence(
session: AsyncSession,
*,
card_id: int,
refs: Sequence[dict[str, Any]],
) -> int:
"""Append-only INSERT of card citations. refs: [{source_type, source_id?, chunk_index?, snippet?}]."""
rows = [
{
"card_id": card_id,
"source_type": r.get("source_type") or "unknown",
"source_id": r.get("source_id"),
"chunk_index": r.get("chunk_index"),
"snippet": r.get("snippet"),
}
for r in refs
]
if not rows:
return 0
await session.execute(pg_insert(StudyMemoCardEvidence).values(rows))
return len(rows)
async def record_card_view(
session: AsyncSession, *, user_id: int, card_id: int
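) -> bool:
"""Record a 'just study' (cram) view: view_count++ plus last_viewed_at. Independent of SR (progress).
Independent of needs_review (cards awaiting review can also be browsed casually); only the user's own, non-deleted cards.
Returns: whether the view was recorded.
"""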
stmt = (
update(StudyMemoCard)
.where(
StudyMemoCard.id == card_id,
StudyMemoCard.user_id == user_id,
StudyMemoCard.deleted_at.is_(None),
)
.values(view_count=StudyMemoCard.view_count + 1, last_viewed_at=func.now())
)
result = await session.execute(stmt)
return (result.rowcount or 0) > 0
async def flag_cards_for_source(
session: AsyncSession,
*,
source_question_id: int,
reason: str,
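) -> int:
"""When the source question is corrected/deleted, automatically mark derived cards as needs_review (temporary flag).
Final stale cleanup is the responsibility of the worker's supersede; this is an immediate flag for user visibility.
reason: 'source_changed' | 'source_deleted'.
Returns: number of cards flagged.
"""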
stmt = (
update(StudyMemoCard)
.where(
StudyMemoCard.source_question_id == source_question_id,
StudyMemoCard.deleted_at.is_(None),
)
.values(needs_review=True, flagged_by=reason, flagged_at=func.now())
)
result = await session.execute(stmt)
return result.rowcount or 0
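The ordering requirement between `supersede_old_cards` and `append_card` is easier to see in a reduced sketch. The version below uses the standard-library `sqlite3` module with a cut-down table, a hypothetical index name `uq_cards_dedup`, and SQLite's null-safe `IS NOT` in place of PostgreSQL's `IS DISTINCT FROM`; it is an illustration of the pattern, not the project's code.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE study_memo_cards ("
    "id INTEGER PRIMARY KEY, source_question_id INTEGER, "
    "dedup_hash TEXT NOT NULL, source_generated_at TEXT, deleted_at TEXT)"
)
# Only live (not soft-deleted) cards take part in deduplication.
conn.execute(
    "CREATE UNIQUE INDEX uq_cards_dedup ON study_memo_cards (dedup_hash) "
    "WHERE deleted_at IS NULL"
)


def supersede_old_cards(conn, question_id, keep_generated_at):
    """Soft-delete live cards of the question whose version differs from the kept one."""
    cur = conn.execute(
        "UPDATE study_memo_cards SET deleted_at = datetime('now') "
        "WHERE source_question_id = ? AND deleted_at IS NULL "
        "AND source_generated_at IS NOT ?",
        (question_id, keep_generated_at),
    )
    return cur.rowcount


def append_card(conn, question_id, dedup_hash, generated_at):
    """Insert one card; return its id, or None when a live duplicate exists."""
    cur = conn.execute(
        "INSERT INTO study_memo_cards "
        "(source_question_id, dedup_hash, source_generated_at) VALUES (?, ?, ?) "
        "ON CONFLICT (dedup_hash) WHERE deleted_at IS NULL DO NOTHING",
        (question_id, dedup_hash, generated_at),
    )
    return cur.lastrowid if cur.rowcount > 0 else None


v1, v2 = "2026-04-01T00:00:00+00:00", "2026-04-08T00:00:00+00:00"
first_id = append_card(conn, 5, "h1", v1)
blocked = append_card(conn, 5, "h1", v2)  # the live v1 card still holds the hash
retired = supersede_old_cards(conn, 5, v2)
new_id = append_card(conn, 5, "h1", v2)   # succeeds once v1 has been retired
live_cards = conn.execute(
    "SELECT COUNT(*) FROM study_memo_cards WHERE deleted_at IS NULL"
).fetchone()[0]
```

If the append ran first, the regenerated card would be silently dropped and the stale one would remain visible, which is the situation the docstring warns against.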
-92
View File
@@ -1,92 +0,0 @@
"""study_memo_card_jobs ORM: asynchronous job queue for card_extract (polymorphic source).
Clone of 231_study_question_jobs plus source_kind/source_id/source_version (= ai_explanation_generated_at).
Separate table and separate consumer (study_memo_card_jobs_consumer.py), isolated from the existing study_queue_consumer.
Recommended error_code values:
- parse_fail / llm_timeout / unknown: retryable (attempts < max_attempts)
- all_dropped: 0 cards generated. Closed as completed to block re-extraction of the same version.
- no_ready_explanation: ai_explanation not ready (race). Skipped, not retried.
Two-layer idempotency: the active partial unique index (migration 292) allows only one concurrently active row,
while version idempotency (blocking re-extraction of the same source_version) is the poller's responsibility via NOT EXISTS(source_version).
"""
from __future__ import annotations
from datetime import datetime
from typing import Any
from sqlalchemy import BigInteger, DateTime, ForeignKey, SmallInteger, String, Text, text
from sqlalchemy.dialects.postgresql import JSONB, insert as pg_insert
from sqlalchemy.ext.asyncio import AsyncSession
from sqlalchemy.orm import Mapped, mapped_column
from core.database import Base
class StudyMemoCardJob(Base):
__tablename__ = "study_memo_card_jobs"
id: Mapped[int] = mapped_column(BigInteger, primary_key=True)
user_id: Mapped[int] = mapped_column(
BigInteger, ForeignKey("users.id", ondelete="CASCADE"), nullable=False
)
source_kind: Mapped[str] = mapped_column(String(40), nullable=False)
source_id: Mapped[int] = mapped_column(BigInteger, nullable=False)
source_version: Mapped[datetime | None] = mapped_column(DateTime(timezone=True))
kind: Mapped[str] = mapped_column(String(40), nullable=False)
status: Mapped[str] = mapped_column(String(20), nullable=False, default="pending")
attempts: Mapped[int] = mapped_column(SmallInteger, nullable=False, default=0)
max_attempts: Mapped[int] = mapped_column(SmallInteger, nullable=False, default=2)
error_code: Mapped[str | None] = mapped_column(String(40))
error_message: Mapped[str | None] = mapped_column(Text)
payload: Mapped[dict | None] = mapped_column(JSONB)
created_at: Mapped[datetime] = mapped_column(
DateTime(timezone=True), default=datetime.now, nullable=False
)
started_at: Mapped[datetime | None] = mapped_column(DateTime(timezone=True))
completed_at: Mapped[datetime | None] = mapped_column(DateTime(timezone=True))
# active partial unique idx (source_kind, source_id) WHERE active 는 migration 292.
async def enqueue_study_memo_card_job(
session: AsyncSession,
*,
user_id: int,
source_kind: str,
source_id: int,
source_version: datetime | None,
kind: str = "card_extract",
payload: dict[str, Any] | None = None,
) -> bool:
"""study_memo_card_jobs 에 행 추가 (DB 레벨 동시 active 중복 방어).
같은 (source_kind, source_id) 활성 (pending/processing) 있으면 False.
버전 멱등(같은 source_version 재추출 차단) 호출 폴러의 NOT EXISTS 선판단.
Returns: True = enqueue, False = active 중복으로 건너뜀.
"""
values: dict[str, Any] = {
"user_id": user_id,
"source_kind": source_kind,
"source_id": source_id,
"source_version": source_version,
"kind": kind,
"status": "pending",
}
if payload is not None:
values["payload"] = payload
stmt = (
pg_insert(StudyMemoCardJob)
.values(**values)
.on_conflict_do_nothing(
index_elements=["source_kind", "source_id"],
index_where=text("status IN ('pending', 'processing')"),
)
)
result = await session.execute(stmt)
return result.rowcount > 0
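The active-duplicate guard above relies on a partial unique index plus `ON CONFLICT DO NOTHING`. The sketch below reproduces that behavior in isolation. It is an illustration under stated assumptions, not the project's code: it uses the standard-library `sqlite3` module instead of PostgreSQL, and the `jobs` table, the `uq_jobs_active` index, and the `enqueue` helper are hypothetical names.

```python
import sqlite3

# Hypothetical, reduced stand-in for study_memo_card_jobs (assumption: SQLite
# instead of PostgreSQL, and only the columns the guard needs).
SCHEMA = """
CREATE TABLE jobs (
    id INTEGER PRIMARY KEY,
    source_kind TEXT NOT NULL,
    source_id INTEGER NOT NULL,
    status TEXT NOT NULL DEFAULT 'pending'
);
CREATE UNIQUE INDEX uq_jobs_active
    ON jobs (source_kind, source_id)
    WHERE status IN ('pending', 'processing');
"""


def enqueue(conn: sqlite3.Connection, source_kind: str, source_id: int) -> bool:
    """Insert a pending job unless an active one exists; True if a row was added."""
    cur = conn.execute(
        "INSERT INTO jobs (source_kind, source_id, status) VALUES (?, ?, 'pending') "
        "ON CONFLICT (source_kind, source_id) "
        "WHERE status IN ('pending', 'processing') DO NOTHING",
        (source_kind, source_id),
    )
    return cur.rowcount > 0


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
first = enqueue(conn, "study_question", 1)   # no active row yet
second = enqueue(conn, "study_question", 1)  # blocked by the active row
conn.execute("UPDATE jobs SET status = 'completed'")
third = enqueue(conn, "study_question", 1)   # terminal rows do not block
```

Because the index covers only `pending`/`processing` rows, terminal rows stay in the table as history while still allowing a new active row for the same source.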
-88
@@ -1,88 +0,0 @@
"""study_memo_card_progress ORM: card SR (spaced repetition) state
(a 'separate mirror' of question progress).

Migration 294. A reduced version of the 226 skeleton: four SR columns
(last_outcome/last_reviewed_at/due_at/review_stage) and no pattern classification
columns (the card review box uses only due / unreviewed / done, 3 in total).
UNIQUE(user_id, card_id).
Interval arithmetic has a single source: sr_schedule.py.

Intake policy (decided 2026-06-07): 'automatic intake on rating'. Cards rated
unsure/wrong get a due date immediately on rating (automatic, unlike the manual
[study complete] gate of question SR). Known (correct) cards get no due date
(prevents queue explosion).
"""
from __future__ import annotations

from datetime import datetime

from sqlalchemy import BigInteger, DateTime, ForeignKey, SmallInteger, String, UniqueConstraint, select
from sqlalchemy.ext.asyncio import AsyncSession
from sqlalchemy.orm import Mapped, mapped_column

from core.database import Base
from models.study_memo_card import StudyMemoCard
from services.study import sr_schedule


class StudyMemoCardProgress(Base):
    __tablename__ = "study_memo_card_progress"
    __table_args__ = (UniqueConstraint("user_id", "card_id", name="uq_card_progress_user_card"),)

    id: Mapped[int] = mapped_column(BigInteger, primary_key=True)
    user_id: Mapped[int] = mapped_column(
        BigInteger, ForeignKey("users.id", ondelete="CASCADE"), nullable=False
    )
    study_topic_id: Mapped[int] = mapped_column(
        BigInteger, ForeignKey("study_topics.id", ondelete="CASCADE"), nullable=False
    )
    card_id: Mapped[int] = mapped_column(
        BigInteger, ForeignKey("study_memo_cards.id", ondelete="CASCADE"), nullable=False
    )
    last_outcome: Mapped[str | None] = mapped_column(String(20))
    last_reviewed_at: Mapped[datetime | None] = mapped_column(DateTime(timezone=True))
    due_at: Mapped[datetime | None] = mapped_column(DateTime(timezone=True))
    review_stage: Mapped[int | None] = mapped_column(SmallInteger)
    created_at: Mapped[datetime] = mapped_column(
        DateTime(timezone=True), default=datetime.now, nullable=False
    )
    updated_at: Mapped[datetime] = mapped_column(
        DateTime(timezone=True), default=datetime.now, onupdate=datetime.now, nullable=False
    )


async def rate_card(
    session: AsyncSession, *, card: StudyMemoCard, outcome: str, now: datetime
) -> StudyMemoCardProgress:
    """Process one card self-rating (immediate automatic SR intake).

    outcome is one of correct/wrong/unsure.
    - Create the progress row if it does not exist. Update last_outcome/last_reviewed_at.
    - If the card already has a due date (in review): sr_schedule.advance
      (advance/reset/graduate).
    - If there is no due date: only unsure/wrong ratings get first_due (immediate
      intake); correct gets no due date.
    The caller commits.
    """
    progress = (
        await session.execute(
            select(StudyMemoCardProgress).where(
                StudyMemoCardProgress.user_id == card.user_id,
                StudyMemoCardProgress.card_id == card.id,
            )
        )
    ).scalar_one_or_none()
    if progress is None:
        progress = StudyMemoCardProgress(
            user_id=card.user_id, study_topic_id=card.study_topic_id, card_id=card.id
        )
        session.add(progress)
    progress.last_outcome = outcome
    progress.last_reviewed_at = now
    if progress.due_at is not None:
        result = sr_schedule.advance(progress.review_stage, outcome, now)
        if result is not None:  # None means skipped: leave the row unchanged
            progress.review_stage, progress.due_at = result
    elif outcome in ("wrong", "unsure"):
        # Immediate automatic intake: unsure/wrong go straight into the review
        # queue on rating (stage 0 + tomorrow)
        progress.review_stage, progress.due_at = sr_schedule.first_due(now)
    # outcome == 'correct' with no due date: no due date is set (prevents queue explosion)
    return progress
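The branching in `rate_card` can be exercised without a database. The sketch below is a hypothetical, in-memory version: the `INTERVAL_DAYS` table, the graduation rule, and the bodies of `first_due` and `advance` are assumptions made for illustration only, since the real interval arithmetic lives in `sr_schedule.py` and is not shown here. Only the three-way decision (already due, newly unsure/wrong, newly correct) mirrors the function above.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional, Tuple

# Hypothetical interval table; the real values live in sr_schedule.py.
INTERVAL_DAYS = (1, 3, 7)


def first_due(now: datetime) -> Tuple[int, datetime]:
    """Stage 0 plus one day, matching the 'stage 0 + tomorrow' comment."""
    return 0, now + timedelta(days=INTERVAL_DAYS[0])


def advance(stage: Optional[int], outcome: str, now: datetime):
    """Assumed rules: correct advances (or graduates), wrong/unsure resets."""
    if outcome == "correct":
        nxt = (stage or 0) + 1
        if nxt >= len(INTERVAL_DAYS):
            return None, None  # graduated: leaves the review queue
        return nxt, now + timedelta(days=INTERVAL_DAYS[nxt])
    if outcome in ("wrong", "unsure"):
        return first_due(now)
    return None  # anything else is treated as skipped


@dataclass
class Progress:
    review_stage: Optional[int] = None
    due_at: Optional[datetime] = None
    last_outcome: Optional[str] = None


def rate(progress: Progress, outcome: str, now: datetime) -> Progress:
    """Same decision structure as rate_card, minus the session and ORM."""
    progress.last_outcome = outcome
    if progress.due_at is not None:
        result = advance(progress.review_stage, outcome, now)
        if result is not None:
            progress.review_stage, progress.due_at = result
    elif outcome in ("wrong", "unsure"):
        progress.review_stage, progress.due_at = first_due(now)
    return progress


now = datetime(2026, 6, 7, tzinfo=timezone.utc)
known = rate(Progress(), "correct", now)    # never enters the queue
queued = rate(Progress(), "unsure", now)    # enters at stage 0
```

Keeping the scheduling functions pure (taking `now` as an argument) is what makes this kind of check possible without patching the clock.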
-140
@@ -1,140 +0,0 @@
"""study_questions / study_question_attempts ORM: the question-bank track of the
study workspace.

PR-2 guardrails:
- study_topic is the primary container; each asset type gets its own join table.
  A polymorphic single table is permanently forbidden.
- subject/scope do not use strict enums (leaves room to extend to language
  classifications such as jlpt).
- Question deletion is soft delete only at the API. The attempts FK is
  ON DELETE RESTRICT for DB-level protection (blocks accidental hard deletes,
  preserves history).
- Changing correct_choice does not recompute existing attempt.is_correct
  (a record is the fact as of that moment).
"""
from datetime import datetime

from pgvector.sqlalchemy import Vector
from sqlalchemy import BigInteger, Boolean, DateTime, ForeignKey, Integer, SmallInteger, String, Text
from sqlalchemy.dialects.postgresql import JSONB
from sqlalchemy.orm import Mapped, mapped_column, relationship

from core.database import Base


class StudyQuestion(Base):
    __tablename__ = "study_questions"

    id: Mapped[int] = mapped_column(BigInteger, primary_key=True)
    user_id: Mapped[int] = mapped_column(
        BigInteger, ForeignKey("users.id", ondelete="CASCADE"), nullable=False
    )
    study_topic_id: Mapped[int] = mapped_column(
        BigInteger, ForeignKey("study_topics.id", ondelete="CASCADE"), nullable=False
    )
    question_text: Mapped[str] = mapped_column(Text, nullable=False)
    choice_1: Mapped[str] = mapped_column(Text, nullable=False)
    choice_2: Mapped[str] = mapped_column(Text, nullable=False)
    choice_3: Mapped[str] = mapped_column(Text, nullable=False)
    choice_4: Mapped[str] = mapped_column(Text, nullable=False)
    correct_choice: Mapped[int] = mapped_column(SmallInteger, nullable=False)
    subject: Mapped[str | None] = mapped_column(String(120))
    scope: Mapped[str | None] = mapped_column(String(200))
    exam_name: Mapped[str | None] = mapped_column(String(120))
    exam_round: Mapped[str | None] = mapped_column(String(120))
    explanation: Mapped[str | None] = mapped_column(Text)
    source_note: Mapped[str | None] = mapped_column(Text)
    is_active: Mapped[bool] = mapped_column(Boolean, default=True, nullable=False)

    # PR-6: question number within an exam round (1 to exam_round_size).
    # NULL allowed: existing rows and inputs with no round set.
    exam_question_number: Mapped[int | None] = mapped_column(SmallInteger)

    # PR-3: AI explanation cache (manually triggered)
    # status: none | pending | ready | failed | stale (no strict enum; recommended VARCHAR values)
    ai_explanation: Mapped[str | None] = mapped_column(Text)
    ai_explanation_status: Mapped[str] = mapped_column(
        String(20), default="none", nullable=False
    )
    ai_explanation_generated_at: Mapped[datetime | None] = mapped_column(
        DateTime(timezone=True)
    )
    ai_explanation_model: Mapped[str | None] = mapped_column(String(120))

    # PR-4: automatic embedding (bge-m3, 1024 dimensions). The status column doubles as the queue.
    # Recompute trigger = a change to question_text / choice_1..4.
    # Changes to correct_choice / subject / scope / explanation do not trigger a recompute.
    embedding = mapped_column(Vector(1024), nullable=True)
    embedding_status: Mapped[str] = mapped_column(
        String(20), default="none", nullable=False
    )
    embedding_updated_at: Mapped[datetime | None] = mapped_column(
        DateTime(timezone=True)
    )
    embedding_model: Mapped[str | None] = mapped_column(String(120))

    # PR-12-A follow-up: persistent cache for related-types. The worker fills it once
    # the embedding is ready; when another question in the same topic becomes ready,
    # related_computed_at is set to NULL so the next cron run recomputes it.
    related_repeat: Mapped[list | None] = mapped_column(JSONB)
    related_similar: Mapped[list | None] = mapped_column(JSONB)
    related_repeat_round_count: Mapped[int | None] = mapped_column(Integer)
    related_similar_round_count: Mapped[int | None] = mapped_column(Integer)
    related_repeat_grade: Mapped[str | None] = mapped_column(String(50))
    related_computed_at: Mapped[datetime | None] = mapped_column(DateTime(timezone=True))
    related_threshold_version: Mapped[str | None] = mapped_column(String(20))

    # Study memo notes Phase 1: pending-review flag (DDL = migration 296).
    # Set/cleared by the correction/deletion hooks and the needs_review queue.
    # Recommended flagged_by values: 'user' / 'source_changed' / 'source_deleted'
    # (server-side constants, mapped at read time).
    needs_review: Mapped[bool] = mapped_column(Boolean, default=False, nullable=False)
    flagged_at: Mapped[datetime | None] = mapped_column(DateTime(timezone=True))
    flagged_by: Mapped[str | None] = mapped_column(String(40))

    created_at: Mapped[datetime] = mapped_column(
        DateTime(timezone=True), default=datetime.now, nullable=False
    )
    updated_at: Mapped[datetime] = mapped_column(
        DateTime(timezone=True), default=datetime.now, onupdate=datetime.now, nullable=False
    )
    deleted_at: Mapped[datetime | None] = mapped_column(DateTime(timezone=True))

    # Relationships: loaded with selectinload for unified views / stats queries
    topic: Mapped["StudyTopic | None"] = relationship(  # type: ignore[name-defined]  # noqa: F821
        "StudyTopic", back_populates="questions", lazy="noload"
    )
    attempts: Mapped[list["StudyQuestionAttempt"]] = relationship(
        back_populates="question",
        cascade="all, delete-orphan",  # ORM-level cascade; an actual hard delete is blocked by the RESTRICT FK
        order_by="StudyQuestionAttempt.answered_at.desc()",
        lazy="noload",
    )


class StudyQuestionAttempt(Base):
    __tablename__ = "study_question_attempts"

    id: Mapped[int] = mapped_column(BigInteger, primary_key=True)
    user_id: Mapped[int] = mapped_column(
        BigInteger, ForeignKey("users.id", ondelete="CASCADE"), nullable=False
    )
    study_question_id: Mapped[int] = mapped_column(
        BigInteger,
        ForeignKey("study_questions.id", ondelete="RESTRICT"),
        nullable=False,
    )
    study_topic_id: Mapped[int] = mapped_column(
        BigInteger, ForeignKey("study_topics.id", ondelete="CASCADE"), nullable=False
    )
    # PR-9: selected_choice allows NULL (the unsure case). is_correct is then stored as false.
    selected_choice: Mapped[int | None] = mapped_column(SmallInteger, nullable=True)
    correct_choice: Mapped[int] = mapped_column(SmallInteger, nullable=False)
    is_correct: Mapped[bool] = mapped_column(Boolean, nullable=False)
    # PR-9: recommended outcome values (correct/wrong/unsure). No strict enum.
    outcome: Mapped[str] = mapped_column(String(20), nullable=False)
    answered_at: Mapped[datetime] = mapped_column(
        DateTime(timezone=True), default=datetime.now, nullable=False
    )
    # PR-10: which quiz session this attempt belongs to
    # (NULL = entered directly outside a session, or the session was deleted).
    quiz_session_id: Mapped[int | None] = mapped_column(
        BigInteger, ForeignKey("study_quiz_sessions.id", ondelete="SET NULL"), nullable=True
    )
    # PR-10: set when "study complete" is checked on the result card. NULL = not yet reviewed.
    reviewed_at: Mapped[datetime | None] = mapped_column(DateTime(timezone=True))

    question: Mapped["StudyQuestion"] = relationship(back_populates="attempts")
-31
@@ -1,31 +0,0 @@
"""study_question_images ORM (PR-8): images attached to a question.

Storage: NAS /documents/study_question_images/{topic_id}/{qid}/{img_id}.{ext}
Display: GET /api/study-questions/{qid}/images/{img_id}/raw (authentication required)
"""
from datetime import datetime

from sqlalchemy import BigInteger, DateTime, ForeignKey, Integer, String, Text
from sqlalchemy.orm import Mapped, mapped_column

from core.database import Base


class StudyQuestionImage(Base):
    __tablename__ = "study_question_images"

    id: Mapped[int] = mapped_column(BigInteger, primary_key=True)
    user_id: Mapped[int] = mapped_column(
        BigInteger, ForeignKey("users.id", ondelete="CASCADE"), nullable=False
    )
    study_question_id: Mapped[int] = mapped_column(
        BigInteger, ForeignKey("study_questions.id", ondelete="CASCADE"), nullable=False
    )
    file_path: Mapped[str] = mapped_column(Text, nullable=False)
    file_size: Mapped[int] = mapped_column(BigInteger, nullable=False)
    mime_type: Mapped[str] = mapped_column(String(80), nullable=False)
    sort_order: Mapped[int] = mapped_column(Integer, default=0, nullable=False)
    created_at: Mapped[datetime] = mapped_column(
        DateTime(timezone=True), default=datetime.now, nullable=False
    )
-87
@@ -1,87 +0,0 @@
"""study_question_jobs ORM (Phase 4-A): async job queue dedicated to the study domain.

processing_queue has an FK to documents.id, so it cannot be reused directly for
study_questions. Separate table + separate consumer (study_queue_consumer.py).

Recommended kind values:
- 'explanation' (Phase 4-A): prefetch the AI explanation for wrong/unsure questions
- 'session_summary' (reserved for Phase 4-B): per-session overall analysis.
  session_summary fits awkwardly on a per-question table, so splitting it into a
  separate study_quiz_session_jobs table is to be considered when Phase 4-B is built.

Terminal statuses (completed/failed/skipped) always record completed_at.
Retrying a failed job does not revive the existing row to pending; a new row is
created, so history accumulates.
"""
from __future__ import annotations

from datetime import datetime
from typing import Any

from sqlalchemy import BigInteger, DateTime, ForeignKey, SmallInteger, String, Text, text
from sqlalchemy.dialects.postgresql import JSONB, insert as pg_insert
from sqlalchemy.ext.asyncio import AsyncSession
from sqlalchemy.orm import Mapped, mapped_column

from core.database import Base


class StudyQuestionJob(Base):
    __tablename__ = "study_question_jobs"

    id: Mapped[int] = mapped_column(BigInteger, primary_key=True)
    study_question_id: Mapped[int] = mapped_column(
        BigInteger, ForeignKey("study_questions.id", ondelete="CASCADE"), nullable=False
    )
    user_id: Mapped[int] = mapped_column(
        BigInteger, ForeignKey("users.id", ondelete="CASCADE"), nullable=False
    )
    kind: Mapped[str] = mapped_column(String(40), nullable=False)
    status: Mapped[str] = mapped_column(String(20), nullable=False, default="pending")
    attempts: Mapped[int] = mapped_column(SmallInteger, nullable=False, default=0)
    max_attempts: Mapped[int] = mapped_column(SmallInteger, nullable=False, default=2)
    error_code: Mapped[str | None] = mapped_column(String(40))
    error_message: Mapped[str | None] = mapped_column(Text)
    payload: Mapped[dict | None] = mapped_column(JSONB)
    created_at: Mapped[datetime] = mapped_column(
        DateTime(timezone=True), default=datetime.now, nullable=False
    )
    started_at: Mapped[datetime | None] = mapped_column(DateTime(timezone=True))
    completed_at: Mapped[datetime | None] = mapped_column(DateTime(timezone=True))
    # The active partial unique index is managed by migration 232.


async def enqueue_study_question_job(
    session: AsyncSession,
    *,
    study_question_id: int,
    user_id: int,
    kind: str,
    payload: dict[str, Any] | None = None,
) -> bool:
    """Insert a row into study_question_jobs (DB-level duplicate guard).

    If an active row (pending/processing) already exists for the same
    (study_question_id, kind), do nothing and return False. Terminal history
    accumulates as separate rows, so this call can create a new active row
    regardless of existing failed/skipped/completed rows.
    Returns: True = enqueued, False = skipped as a duplicate.
    """
    values: dict[str, Any] = {
        "study_question_id": study_question_id,
        "user_id": user_id,
        "kind": kind,
        "status": "pending",
    }
    if payload is not None:
        values["payload"] = payload
    stmt = (
        pg_insert(StudyQuestionJob)
        .values(**values)
        .on_conflict_do_nothing(
            index_elements=["study_question_id", "kind"],
            index_where=text("status IN ('pending', 'processing')"),
        )
    )
    result = await session.execute(stmt)
    return result.rowcount > 0
-73
@@ -1,73 +0,0 @@
"""study_question_progress: current-state cache per user x topic x question (Phase 1).

Kept separate from attempts (the append-only source log). Recorded attempts are
never updated. progress holds derived metadata along four dimensions:
last attempt / user review / review queue / pattern classification.

finalize at session end updates:
- last_outcome / last_attempted_at / last_attempt_id
- pattern_state / pattern_updated_at / pattern_window_attempts
- (only rows that already have due_at) review_stage / due_at: review stage update

review-complete updates:
- last_reviewed_at
- (for wrong/unsure) due_at, assigned for the first time

Assumes each study_question_id belongs to a single topic (currently a single
operation: Gas Engineer (가스기사) topic 4). To allow for future question reuse / N:M,
the unique key is the 3-tuple (user_id, study_topic_id, study_question_id).
"""
from __future__ import annotations

from datetime import datetime

from sqlalchemy import BigInteger, DateTime, ForeignKey, Integer, SmallInteger, String, UniqueConstraint
from sqlalchemy.orm import Mapped, mapped_column

from core.database import Base


class StudyQuestionProgress(Base):
    __tablename__ = "study_question_progress"
    __table_args__ = (
        UniqueConstraint(
            "user_id", "study_topic_id", "study_question_id",
            name="uq_progress_user_topic_question",
        ),
    )

    id: Mapped[int] = mapped_column(BigInteger, primary_key=True)
    user_id: Mapped[int] = mapped_column(
        BigInteger, ForeignKey("users.id", ondelete="CASCADE"), nullable=False
    )
    study_topic_id: Mapped[int] = mapped_column(
        BigInteger, ForeignKey("study_topics.id", ondelete="CASCADE"), nullable=False
    )
    study_question_id: Mapped[int] = mapped_column(
        BigInteger, ForeignKey("study_questions.id", ondelete="RESTRICT"), nullable=False
    )

    # Last attempt summary
    last_outcome: Mapped[str | None] = mapped_column(String(20))
    last_attempted_at: Mapped[datetime | None] = mapped_column(DateTime(timezone=True))
    last_attempt_id: Mapped[int | None] = mapped_column(
        BigInteger, ForeignKey("study_question_attempts.id", ondelete="SET NULL")
    )

    # User review state
    last_reviewed_at: Mapped[datetime | None] = mapped_column(DateTime(timezone=True))

    # Review queue
    due_at: Mapped[datetime | None] = mapped_column(DateTime(timezone=True))
    review_stage: Mapped[int | None] = mapped_column(SmallInteger)

    # Pattern classification (derived)
    pattern_state: Mapped[str | None] = mapped_column(String(30))
    pattern_updated_at: Mapped[datetime | None] = mapped_column(DateTime(timezone=True))
    pattern_window_attempts: Mapped[int | None] = mapped_column(SmallInteger)

    created_at: Mapped[datetime] = mapped_column(
        DateTime(timezone=True), default=datetime.now, nullable=False
    )
    updated_at: Mapped[datetime] = mapped_column(
        DateTime(timezone=True), default=datetime.now, onupdate=datetime.now, nullable=False
    )
-58
@@ -1,58 +0,0 @@
"""study_quiz_sessions ORM (PR-10): quiz-solving session records + resume.

One round of solving for a topic = one session row. question_ids is a snapshot of
the presentation order.
status: in_progress / done / abandoned (no strict enum; recommended VARCHAR values).
At most one in_progress session per topic is enforced by a partial unique index
(migration 207).
"""
from datetime import datetime

from sqlalchemy import BigInteger, Boolean, DateTime, ForeignKey, Integer, String
from sqlalchemy.dialects.postgresql import JSONB
from sqlalchemy.orm import Mapped, mapped_column

from core.database import Base


class StudyQuizSession(Base):
    __tablename__ = "study_quiz_sessions"

    id: Mapped[int] = mapped_column(BigInteger, primary_key=True)
    user_id: Mapped[int] = mapped_column(
        BigInteger, ForeignKey("users.id", ondelete="CASCADE"), nullable=False
    )
    study_topic_id: Mapped[int] = mapped_column(
        BigInteger, ForeignKey("study_topics.id", ondelete="CASCADE"), nullable=False
    )
    target_per_subject: Mapped[int] = mapped_column(Integer, nullable=False, default=20)
    subject_filter: Mapped[str | None] = mapped_column(String(120))
    wrong_only: Mapped[bool] = mapped_column(Boolean, nullable=False, default=False)
    # PR-12-B: quiz mode. Recommended values = random (first iteration) /
    # frequent_focus / wrong_variants (reserved).
    quiz_mode: Mapped[str] = mapped_column(String(30), nullable=False, default="random")
    # Snapshot of the presentation order: list[int] of question ids. Not changed after the quiz is generated.
    question_ids: Mapped[list] = mapped_column(JSONB, nullable=False)
    # {subject: count} distribution. Used for result-card statistics.
    subject_distribution: Mapped[dict] = mapped_column(JSONB, nullable=False, default=dict)
    status: Mapped[str] = mapped_column(String(20), nullable=False, default="in_progress")
    cursor: Mapped[int] = mapped_column(Integer, nullable=False, default=0)
    correct_count: Mapped[int] = mapped_column(Integer, nullable=False, default=0)
    wrong_count: Mapped[int] = mapped_column(Integer, nullable=False, default=0)
    unsure_count: Mapped[int] = mapped_column(Integer, nullable=False, default=0)
    # Phase 2-B: summary snapshot of the finalize results. Stored at session end and
    # shown in the result screen header.
    newly_correct_count: Mapped[int] = mapped_column(Integer, nullable=False, default=0)
    relapsed_count: Mapped[int] = mapped_column(Integer, nullable=False, default=0)
    recovered_count: Mapped[int] = mapped_column(Integer, nullable=False, default=0)
    chronic_remaining_count: Mapped[int] = mapped_column(Integer, nullable=False, default=0)
    finished_at: Mapped[datetime | None] = mapped_column(DateTime(timezone=True))
    created_at: Mapped[datetime] = mapped_column(
        DateTime(timezone=True), default=datetime.now, nullable=False
    )
    updated_at: Mapped[datetime] = mapped_column(
        DateTime(timezone=True), default=datetime.now, onupdate=datetime.now, nullable=False
    )
-35
@@ -1,35 +0,0 @@
"""study_quiz_session_analysis ORM (Phase 4-B v1): cache of per-session analysis results.

session_id is the PK: one session = one analysis result. The worker upserts with
ON CONFLICT DO UPDATE.
Job history accumulates separately in study_quiz_session_jobs; the result cache is
a single row.
is_stale=TRUE only from the [Regenerate] click until the worker finishes processing.
"""
from __future__ import annotations

from datetime import datetime

from sqlalchemy import BigInteger, Boolean, DateTime, ForeignKey, String, Text
from sqlalchemy.orm import Mapped, mapped_column

from core.database import Base


class StudyQuizSessionAnalysis(Base):
    __tablename__ = "study_quiz_session_analysis"

    study_quiz_session_id: Mapped[int] = mapped_column(
        BigInteger,
        ForeignKey("study_quiz_sessions.id", ondelete="CASCADE"),
        primary_key=True,
    )
    user_id: Mapped[int] = mapped_column(
        BigInteger, ForeignKey("users.id", ondelete="CASCADE"), nullable=False
    )
    summary_md: Mapped[str] = mapped_column(Text, nullable=False)
    confidence: Mapped[str | None] = mapped_column(String(10))
    model_name: Mapped[str | None] = mapped_column(String(120))
    generated_at: Mapped[datetime] = mapped_column(
        DateTime(timezone=True), default=datetime.now, nullable=False
    )
    is_stale: Mapped[bool] = mapped_column(Boolean, nullable=False, default=False)
-80
@@ -1,80 +0,0 @@
"""study_quiz_session_jobs ORM (Phase 4-B v1): per-session analysis job queue.

Split from study_question_jobs for three reasons: a single-meaning FK
(study_quiz_session_id NOT NULL), clearer operational SQL, and differing
guard/retry policies between 4-A and 4-B.

Terminal statuses (completed/failed/skipped) always record completed_at.
Retries do not revive the existing row to pending; a new row is created, so
history accumulates.
v1 has a single job type ('analysis'), so there is no kind column, only session_id.
"""
from __future__ import annotations

from datetime import datetime
from typing import Any

from sqlalchemy import BigInteger, DateTime, ForeignKey, SmallInteger, String, Text, text
from sqlalchemy.dialects.postgresql import JSONB, insert as pg_insert
from sqlalchemy.ext.asyncio import AsyncSession
from sqlalchemy.orm import Mapped, mapped_column

from core.database import Base


class StudyQuizSessionJob(Base):
    __tablename__ = "study_quiz_session_jobs"

    id: Mapped[int] = mapped_column(BigInteger, primary_key=True)
    study_quiz_session_id: Mapped[int] = mapped_column(
        BigInteger,
        ForeignKey("study_quiz_sessions.id", ondelete="CASCADE"),
        nullable=False,
    )
    user_id: Mapped[int] = mapped_column(
        BigInteger, ForeignKey("users.id", ondelete="CASCADE"), nullable=False
    )
    status: Mapped[str] = mapped_column(String(20), nullable=False, default="pending")
    attempts: Mapped[int] = mapped_column(SmallInteger, nullable=False, default=0)
    max_attempts: Mapped[int] = mapped_column(SmallInteger, nullable=False, default=2)
    error_code: Mapped[str | None] = mapped_column(String(40))
    error_message: Mapped[str | None] = mapped_column(Text)
    payload: Mapped[dict | None] = mapped_column(JSONB)
    created_at: Mapped[datetime] = mapped_column(
        DateTime(timezone=True), default=datetime.now, nullable=False
    )
    started_at: Mapped[datetime | None] = mapped_column(DateTime(timezone=True))
    completed_at: Mapped[datetime | None] = mapped_column(DateTime(timezone=True))


async def enqueue_session_analysis_job(
    session: AsyncSession,
    *,
    study_quiz_session_id: int,
    user_id: int,
    payload: dict[str, Any] | None = None,
) -> bool:
    """Insert a row into study_quiz_session_jobs (DB-level duplicate guard).

    If an active row (pending/processing) already exists for the same session_id,
    return False. Terminal history accumulates as separate rows, so this call can
    create a new active row regardless of existing failed/skipped/completed rows.
    Returns: True = enqueued, False = skipped as a duplicate.
    """
    values: dict[str, Any] = {
        "study_quiz_session_id": study_quiz_session_id,
        "user_id": user_id,
        "status": "pending",
    }
    if payload is not None:
        values["payload"] = payload
    stmt = (
        pg_insert(StudyQuizSessionJob)
        .values(**values)
        .on_conflict_do_nothing(
            index_elements=["study_quiz_session_id"],
            index_where=text("status IN ('pending', 'processing')"),
        )
    )
    result = await session.execute(stmt)
    return result.rowcount > 0
-37
@@ -1,37 +0,0 @@
"""study_reminders ORM: append-only alarm material (study memo notes Phase 1).

The study_reminder cron (09/13/19 KST) INSERTs one row summarizing the due items of
the focus topics, and GET /reminders/latest reads it. No UPDATE/DELETE. fired_at is
truncated to the time slot so that UNIQUE(user, fired_at) makes the insert idempotent
(on_conflict_do_nothing); a raw now() with microseconds would defeat the idempotency.
study_topic_id is nullable (NULL for the overall aggregate row) + ON DELETE SET NULL
(preserves history).
"""
from __future__ import annotations

from datetime import datetime

from sqlalchemy import BigInteger, DateTime, ForeignKey, Integer
from sqlalchemy.dialects.postgresql import JSONB
from sqlalchemy.orm import Mapped, mapped_column

from core.database import Base


class StudyReminder(Base):
    __tablename__ = "study_reminders"

    id: Mapped[int] = mapped_column(BigInteger, primary_key=True)
    user_id: Mapped[int] = mapped_column(
        BigInteger, ForeignKey("users.id", ondelete="CASCADE"), nullable=False
    )
    study_topic_id: Mapped[int | None] = mapped_column(
        BigInteger, ForeignKey("study_topics.id", ondelete="SET NULL")
    )
    due_count: Mapped[int | None] = mapped_column(Integer)
    focus_topic_names: Mapped[list | None] = mapped_column(JSONB)
    fired_at: Mapped[datetime] = mapped_column(DateTime(timezone=True), nullable=False)
    created_at: Mapped[datetime] = mapped_column(
        DateTime(timezone=True), default=datetime.now, nullable=False
    )
    # No active partial unique index; UNIQUE(user_id, fired_at) is an inline
    # constraint in migration 298.
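The docstring's point about truncating `fired_at` can be shown with a few lines of standard-library code. This is a minimal sketch under assumptions: the slot granularity is taken to be one hour (the source only says "time slot"), and `slot_start` and `KST` are hypothetical names, not part of the project.

```python
from datetime import datetime, timedelta, timezone

# Fixed +09:00 offset used here as a stand-in for KST.
KST = timezone(timedelta(hours=9))


def slot_start(now: datetime) -> datetime:
    """Truncate an aware datetime to the start of its hour in KST (assumed slot size)."""
    return now.astimezone(KST).replace(minute=0, second=0, microsecond=0)


# Two cron firings within the same 09:00 slot, a few seconds apart.
run_a = datetime(2026, 6, 7, 9, 0, 0, 123456, tzinfo=KST)
run_b = datetime(2026, 6, 7, 9, 0, 41, 987, tzinfo=KST)

# With raw timestamps the (user_id, fired_at) keys differ, so a unique
# constraint would accept both rows.
raw_keys = {(1, run_a), (1, run_b)}

# With truncated timestamps both runs map to the same key, so the second
# insert would hit the unique constraint and be skipped.
slot_keys = {(1, slot_start(run_a)), (1, slot_start(run_b))}
```

The set sizes stand in for what the unique constraint would see: two distinct keys for raw timestamps, one key after truncation.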
-144
@@ -1,144 +0,0 @@
"""study_sessions / study_session_assets table ORM: Phase 1 MVP

Purpose: iPad handwriting study sessions (certification + language), plus general
study sessions for mobile memo notes / quizzes.

Design principles:
- study_type branches between certification / language. The metadata jsonb column
  holds free-form per-domain metadata.
- Do not create single audio_document_id / video_document_id / source_document_id /
  handwriting_document_id columns. All media links go through study_session_assets.
- The documents themselves are never deleted. Only assets cascade, when a session
  or a document is deleted.
- Fields unused in Phase 1 (review_state / quiz / ocr / ai_summary / prompt) allow
  NULL; the automated logic is enabled in separate PRs in Phases 2 to 4.
"""
from datetime import datetime
from typing import Any

from sqlalchemy import (
    BigInteger,
    DateTime,
    ForeignKey,
    Integer,
    String,
    Text,
    UniqueConstraint,
)
from sqlalchemy.dialects.postgresql import JSONB
from sqlalchemy.orm import Mapped, mapped_column, relationship

from core.database import Base


class StudySession(Base):
    __tablename__ = "study_sessions"

    id: Mapped[int] = mapped_column(BigInteger, primary_key=True)
    user_id: Mapped[int] = mapped_column(
        BigInteger, ForeignKey("users.id", ondelete="CASCADE"), nullable=False
    )

    # Domain branch: 'certification' | 'language'
    study_type: Mapped[str] = mapped_column(
        String(30), default="certification", nullable=False
    )

    # Certification / language metadata
    certification: Mapped[str | None] = mapped_column(String(120))
    language_code: Mapped[str | None] = mapped_column(String(20))
    learning_level: Mapped[str | None] = mapped_column(String(80))

    # Common subject / topic
    subject: Mapped[str | None] = mapped_column(String(120))
    topic: Mapped[str | None] = mapped_column(String(200))

    # Snapshot of the source text (only the excerpted text, kept separately from
    # the source_scan asset)
    source_text: Mapped[str | None] = mapped_column(Text)
    source_page: Mapped[int | None] = mapped_column(Integer)

    # Study mode: 'copy'/'trace'/'blank-repeat'/'dictation'/'shadowing'/'quiz'/'flashcard'
    mode: Mapped[str] = mapped_column(String(30), default="copy", nullable=False)
    prompt_question: Mapped[str | None] = mapped_column(Text)
    expected_answer: Mapped[str | None] = mapped_column(Text)

    # Free-form per-domain metadata (language reading/meaning, certification law_article, etc.)
    metadata_json: Mapped[dict[str, Any] | None] = mapped_column(
        "metadata", JSONB
    )

    # Repetition counters (auxiliary)
    target_count: Mapped[int | None] = mapped_column(Integer)
    repetition_count: Mapped[int] = mapped_column(Integer, default=0, nullable=False)

    # Handwriting data (raw): the core of Phase 1
    strokes_json: Mapped[dict[str, Any] | None] = mapped_column(JSONB)
    canvas_width: Mapped[int | None] = mapped_column(Integer)
    canvas_height: Mapped[int | None] = mapped_column(Integer)
    schema_version: Mapped[int] = mapped_column(Integer, default=1, nullable=False)

    # Text derived from handwriting: filled in Phase 2 (NULL in Phase 1)
    ocr_text: Mapped[str | None] = mapped_column(Text)
    user_corrected_text: Mapped[str | None] = mapped_column(Text)
    ai_summary: Mapped[str | None] = mapped_column(Text)

    # SRS / quiz statistics: enabled in Phase 4, NULL in Phase 1
    review_state: Mapped[str | None] = mapped_column(String(20))
    next_review_at: Mapped[datetime | None] = mapped_column(DateTime(timezone=True))
    last_quiz_at: Mapped[datetime | None] = mapped_column(DateTime(timezone=True))
    correct_count: Mapped[int] = mapped_column(Integer, default=0, nullable=False)
    incorrect_count: Mapped[int] = mapped_column(Integer, default=0, nullable=False)

    # Study workspace (study_topic) 1:N. NULL allowed: an unclassified session is a normal state.
    study_topic_id: Mapped[int | None] = mapped_column(
        BigInteger, ForeignKey("study_topics.id", ondelete="SET NULL")
    )

    created_at: Mapped[datetime] = mapped_column(
        DateTime(timezone=True), default=datetime.now, nullable=False
    )
    updated_at: Mapped[datetime] = mapped_column(
        DateTime(timezone=True), default=datetime.now, onupdate=datetime.now, nullable=False
    )

    # Related assets: deleted together with the session (matches DB ON DELETE CASCADE)
    assets: Mapped[list["StudySessionAsset"]] = relationship(
        back_populates="session",
        cascade="all, delete-orphan",
        order_by="StudySessionAsset.sort_order",
    )
    # Related study workspace
    study_topic: Mapped["StudyTopic | None"] = relationship(
        "StudyTopic", back_populates="sessions", lazy="noload"
    )


class StudySessionAsset(Base):
    __tablename__ = "study_session_assets"
    __table_args__ = (
        # Basis for the 409 from POST /assets. NULL roles are treated as distinct
        # values, per the Postgres default.
        UniqueConstraint(
            "study_session_id", "document_id", "asset_type", "role",
            name="study_session_assets_session_id_document_id_asset_type_rol_key",
        ),
    )

    id: Mapped[int] = mapped_column(BigInteger, primary_key=True)
    study_session_id: Mapped[int] = mapped_column(
        BigInteger, ForeignKey("study_sessions.id", ondelete="CASCADE"), nullable=False
    )
    document_id: Mapped[int] = mapped_column(
        BigInteger, ForeignKey("documents.id", ondelete="CASCADE"), nullable=False
    )
    # 'source_scan' | 'handwriting_png' | 'audio' | 'video' | 'transcript' | 'reference'
    asset_type: Mapped[str] = mapped_column(String(30), nullable=False)
    # 'prompt' | 'answer' | 'pronunciation' | 'lecture' | 'listening_source'
    # | 'shadowing_source' | 'reference'
    role: Mapped[str | None] = mapped_column(String(40))
    sort_order: Mapped[int] = mapped_column(Integer, default=0, nullable=False)
    created_at: Mapped[datetime] = mapped_column(
        DateTime(timezone=True), default=datetime.now, nullable=False
    )

    session: Mapped["StudySession"] = relationship(back_populates="assets")
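The note on the unique constraint says NULL roles are treated as distinct values. The sketch below illustrates what that means in practice. It is an assumption-laden stand-in: it uses the standard-library `sqlite3` module, which also treats NULLs as distinct in UNIQUE constraints, rather than PostgreSQL, and the `assets` table and `try_insert` helper are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Reduced stand-in for study_session_assets with the same four-column key.
conn.execute(
    "CREATE TABLE assets ("
    " session_id INTEGER NOT NULL,"
    " document_id INTEGER NOT NULL,"
    " asset_type TEXT NOT NULL,"
    " role TEXT,"
    " UNIQUE (session_id, document_id, asset_type, role))"
)


def try_insert(role):
    """Insert the same (session, document, type) with the given role; True if accepted."""
    try:
        conn.execute("INSERT INTO assets VALUES (1, 10, 'audio', ?)", (role,))
        return True
    except sqlite3.IntegrityError:
        return False


results = [
    try_insert("prompt"),  # first row with role='prompt'
    try_insert("prompt"),  # exact duplicate: rejected
    try_insert(None),      # role NULL
    try_insert(None),      # another NULL role: still accepted
]
```

The consequence for the API is that the constraint can only back a 409 when `role` is set; duplicate links with a NULL role are not caught by it and would need a separate check if they matter.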
-92
@@ -1,92 +0,0 @@
"""study_topics / study_topic_documents 테이블 ORM — 학습 워크스페이스 1차 컨테이너
목적: 필기 세션(StudySession) 자료(documents) 학습 주제(: 가스기사)
아래로 묶는 컨테이너. 향후 단어장/오디오/문제세트 같은 학습 자산이 같은
컨테이너 아래로 들어올 있도록 설계.
설계 원칙:
- documents.category(자료실 UI ) 직교한 별도 분류 . 자료실 facet/카테고리 미터치.
- StudySession.certification/subject/topic 컬럼은 보존, 컨테이너 직교 세부 메타.
- study_type 느슨한 분류. DB/Pydantic 강한 enum 미사용. 권장값: certification /
language / school / work / general (UI 드롭다운에서만 안내).
- soft delete (deleted_at). 동일 user_id+name active 행만 partial unique index
중복 방지 삭제된 주제명 재생성 가능.
- 자산 다대다 매핑: PR documents (study_topic_documents). 향후 자산 타입별
조인 테이블 추가 (study_topic_audio_assets ). polymorphic 단일 테이블 금지.
"""
from datetime import datetime
from sqlalchemy import BigInteger, DateTime, ForeignKey, Integer, String, Text
from sqlalchemy.dialects.postgresql import JSONB
from sqlalchemy.orm import Mapped, mapped_column, relationship
from core.database import Base
class StudyTopic(Base):
__tablename__ = "study_topics"
id: Mapped[int] = mapped_column(BigInteger, primary_key=True)
user_id: Mapped[int] = mapped_column(
BigInteger, ForeignKey("users.id", ondelete="CASCADE"), nullable=False
)
name: Mapped[str] = mapped_column(String(120), nullable=False)
description: Mapped[str | None] = mapped_column(Text)
color: Mapped[str | None] = mapped_column(String(20))
# 느슨한 분류 (certification/language/school/work/general 권장)
study_type: Mapped[str | None] = mapped_column(String(40))
sort_order: Mapped[int] = mapped_column(Integer, default=0, nullable=False)
# PR-6: 시험 메타 (회차당 문항 수 + 과목 리스트)
exam_round_size: Mapped[int | None] = mapped_column(Integer)
exam_subjects: Mapped[list] = mapped_column(JSONB, nullable=False, default=list)
# Study memorization notes, Phase 1: "currently studying" tag (DDL = migration 295).
# focused_at IS NOT NULL = in focus (eligible for reminders / session prep).
focused_at: Mapped[datetime | None] = mapped_column(DateTime(timezone=True))
created_at: Mapped[datetime] = mapped_column(
DateTime(timezone=True), default=datetime.now, nullable=False
)
updated_at: Mapped[datetime] = mapped_column(
DateTime(timezone=True), default=datetime.now, onupdate=datetime.now, nullable=False
)
deleted_at: Mapped[datetime | None] = mapped_column(DateTime(timezone=True))
# Relationships: sessions (1:N), document mappings (N:M), questions (1:N, PR-2)
sessions: Mapped[list["StudySession"]] = relationship( # type: ignore[name-defined] # noqa: F821
"StudySession", back_populates="study_topic", lazy="noload"
)
document_links: Mapped[list["StudyTopicDocument"]] = relationship(
back_populates="topic",
cascade="all, delete-orphan",
order_by="StudyTopicDocument.sort_order",
lazy="noload",
)
questions: Mapped[list["StudyQuestion"]] = relationship( # type: ignore[name-defined] # noqa: F821
"StudyQuestion", back_populates="topic", lazy="noload"
)
class StudyTopicDocument(Base):
__tablename__ = "study_topic_documents"
study_topic_id: Mapped[int] = mapped_column(
BigInteger, ForeignKey("study_topics.id", ondelete="CASCADE"), primary_key=True
)
document_id: Mapped[int] = mapped_column(
BigInteger, ForeignKey("documents.id", ondelete="CASCADE"), primary_key=True
)
user_id: Mapped[int] = mapped_column(
BigInteger, ForeignKey("users.id", ondelete="CASCADE"), nullable=False
)
sort_order: Mapped[int] = mapped_column(Integer, default=0, nullable=False)
created_at: Mapped[datetime] = mapped_column(
DateTime(timezone=True), default=datetime.now, nullable=False
)
topic: Mapped["StudyTopic"] = relationship(back_populates="document_links")
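Review note: the module docstring above relies on a partial unique index so that a soft-deleted topic name can be reused. The sketch below illustrates that behavior only. It assumes SQLite as a stand-in for PostgreSQL (both support partial indexes), and the index name and helper functions are hypothetical, not taken from the project's migrations.

```python
import sqlite3

# Assumption: SQLite in memory stands in for the project's PostgreSQL database.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE study_topics (
        id INTEGER PRIMARY KEY,
        user_id INTEGER NOT NULL,
        name TEXT NOT NULL,
        deleted_at TEXT
    )
    """
)
# Hypothetical index name. Uniqueness applies only to rows that are still active.
conn.execute(
    """
    CREATE UNIQUE INDEX uq_study_topics_active_name
    ON study_topics (user_id, name)
    WHERE deleted_at IS NULL
    """
)


def create_topic(user_id: int, name: str) -> int:
    """Insert an active topic and return its id."""
    cur = conn.execute(
        "INSERT INTO study_topics (user_id, name) VALUES (?, ?)", (user_id, name)
    )
    return cur.lastrowid


def soft_delete(topic_id: int) -> None:
    """Mark a topic as deleted without removing the row."""
    conn.execute(
        "UPDATE study_topics SET deleted_at = datetime('now') WHERE id = ?",
        (topic_id,),
    )


first = create_topic(1, "가스기사")
try:
    create_topic(1, "가스기사")  # same user, same name, still active
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

soft_delete(first)
second = create_topic(1, "가스기사")  # accepted: the old row is no longer active
```

With a plain unique constraint on (user_id, name) the last insert would fail, which is why the index is restricted to active rows.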
-38
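@@ -1,38 +0,0 @@
"""ORM for study_topic_subject_notes (PR-9): cache of subject-area explanations.
Unique per (user, study_topic, subject, scope). Generated on demand by AI, then cached.
Invoked when the user clicks a "모르겠음" (don't know) card on the answer-results screen.
status: none/pending/ready/failed/stale (same pattern as PR-3).
"""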
from datetime import datetime
from sqlalchemy import BigInteger, DateTime, ForeignKey, String, Text
from sqlalchemy.orm import Mapped, mapped_column
from core.database import Base
class StudyTopicSubjectNote(Base):
__tablename__ = "study_topic_subject_notes"
id: Mapped[int] = mapped_column(BigInteger, primary_key=True)
user_id: Mapped[int] = mapped_column(
BigInteger, ForeignKey("users.id", ondelete="CASCADE"), nullable=False
)
study_topic_id: Mapped[int] = mapped_column(
BigInteger, ForeignKey("study_topics.id", ondelete="CASCADE"), nullable=False
)
subject: Mapped[str] = mapped_column(String(120), nullable=False)
scope: Mapped[str] = mapped_column(String(200), nullable=False, default="")
content: Mapped[str | None] = mapped_column(Text)
status: Mapped[str] = mapped_column(String(20), default="none", nullable=False)
generated_at: Mapped[datetime | None] = mapped_column(DateTime(timezone=True))
model: Mapped[str | None] = mapped_column(String(120))
created_at: Mapped[datetime] = mapped_column(
DateTime(timezone=True), default=datetime.now, nullable=False
)
updated_at: Mapped[datetime] = mapped_column(
DateTime(timezone=True), default=datetime.now, onupdate=datetime.now, nullable=False
)
-2
@@ -16,9 +16,7 @@ class User(Base):
password_hash: Mapped[str] = mapped_column(Text, nullable=False)
totp_secret: Mapped[str | None] = mapped_column(String(64))
is_active: Mapped[bool] = mapped_column(Boolean, default=True)
is_admin: Mapped[bool] = mapped_column(Boolean, default=False, server_default="false")
created_at: Mapped[datetime] = mapped_column(
DateTime(timezone=True), default=datetime.now
)
last_login_at: Mapped[datetime | None] = mapped_column(DateTime(timezone=True))
password_changed_at: Mapped[datetime | None] = mapped_column(DateTime(timezone=True))
-76
@@ -1,76 +0,0 @@
"""ORM for the worker_capabilities + worker_heartbeats + worker_jobs tables.
1A scaffold (mig 270~274) + 1B activation (mig 275~276). 1B = new WorkerJob + 5 endpoints implemented.
"""
from datetime import datetime
from sqlalchemy import BigInteger, DateTime, ForeignKey, SmallInteger, Text
from sqlalchemy.dialects.postgresql import JSONB
from sqlalchemy.orm import Mapped, mapped_column
from core.database import Base
class WorkerCapability(Base):
__tablename__ = "worker_capabilities"
worker_id: Mapped[str] = mapped_column(Text, primary_key=True)
user_id: Mapped[int] = mapped_column(
BigInteger, ForeignKey("users.id"), nullable=False
)
device_label: Mapped[str] = mapped_column(Text, nullable=False)
worker_class: Mapped[str] = mapped_column(Text, nullable=False)
tier: Mapped[str] = mapped_column(Text, nullable=False)
capabilities: Mapped[list] = mapped_column(JSONB, default=list, nullable=False)
models_loaded: Mapped[list] = mapped_column(JSONB, default=list, nullable=False)
endpoint: Mapped[str | None] = mapped_column(Text)
created_at: Mapped[datetime] = mapped_column(
DateTime(timezone=True), default=datetime.now, nullable=False
)
last_registered_at: Mapped[datetime] = mapped_column(
DateTime(timezone=True), default=datetime.now, nullable=False
)
class WorkerHeartbeat(Base):
__tablename__ = "worker_heartbeats"
id: Mapped[int] = mapped_column(BigInteger, primary_key=True)
worker_id: Mapped[str] = mapped_column(
Text, ForeignKey("worker_capabilities.worker_id"), nullable=False
)
heartbeat_at: Mapped[datetime] = mapped_column(
DateTime(timezone=True), default=datetime.now, nullable=False
)
status: Mapped[str] = mapped_column(Text, nullable=False)
current_job_id: Mapped[int | None] = mapped_column(BigInteger)
battery: Mapped[str | None] = mapped_column(Text)
thermal: Mapped[str | None] = mapped_column(Text)
raw_payload: Mapped[dict] = mapped_column(JSONB, default=dict, nullable=False)
class WorkerJob(Base):
# user_id = the job owner's user_id (a real user), not a worker bot. Worker authentication is separate (worker_id + JWT).
# result = raw JSONB only (policy §B.2 invariant 3; canonical promote = Notebook-Pilot-1).
__tablename__ = "worker_jobs"
id: Mapped[int] = mapped_column(BigInteger, primary_key=True)
user_id: Mapped[int] = mapped_column(
BigInteger, ForeignKey("users.id"), nullable=False
)
job_type: Mapped[str] = mapped_column(Text, nullable=False)
status: Mapped[str] = mapped_column(Text, nullable=False, default="pending")
worker_id: Mapped[str | None] = mapped_column(
Text, ForeignKey("worker_capabilities.worker_id")
)
payload: Mapped[dict] = mapped_column(JSONB, default=dict, nullable=False)
result: Mapped[dict | None] = mapped_column(JSONB)
error_message: Mapped[str | None] = mapped_column(Text)
attempts: Mapped[int] = mapped_column(SmallInteger, default=0, nullable=False)
max_attempts: Mapped[int] = mapped_column(SmallInteger, default=3, nullable=False)
created_at: Mapped[datetime] = mapped_column(
DateTime(timezone=True), default=datetime.now, nullable=False
)
claimed_at: Mapped[datetime | None] = mapped_column(DateTime(timezone=True))
completed_at: Mapped[datetime | None] = mapped_column(DateTime(timezone=True))
-5
@@ -1,5 +0,0 @@
"""AI policy layer: pure-function judgment engine.
No change to runtime behavior. Do not import this package from app/workers or app/api
(PR-A CI gate: import isolation check).
"""
-56
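@@ -1,56 +0,0 @@
"""Audit: detects forbidden patterns when the 4B model answered on its own.
Called only for events with escalate_to_26b=False. A detected violation is recorded in
analyze_events with policy_violation=true and picked up by the nightly sweep as an under_escalation candidate.
detection_patterns are evaluated with Python re.search() (not Postgres regex).
"""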
from __future__ import annotations
import re
from functools import lru_cache
from typing import Iterable
from policy.loader import load_policy
from policy.schema import DomainPolicy, ForbiddenRule
@lru_cache(maxsize=256)
def _compiled_patterns(pattern_tuple: tuple[str, ...]) -> tuple[re.Pattern[str], ...]:
return tuple(re.compile(p) for p in pattern_tuple)
def _rules_for_subject(
policy: DomainPolicy, subject_domain: str
) -> Iterable[ForbiddenRule]:
for rule in policy.forbidden_for_4b:
if subject_domain in rule.applies_when_subject_in:
yield rule
def check_4b_output_violations(
output_text: str,
subject_domain: str,
*,
policy: DomainPolicy | None = None,
) -> list[str]:
"""Return the list of violated forbidden-rule IDs (an empty list means no violation).
Parameters
----------
output_text: the answer text the 4B model generated on its own.
subject_domain: domain name decided by routing. The fallback domain is `generic`.
policy: for injection (tests). If None, load_policy() is used.
"""
if not output_text:
return []
if policy is None:
policy = load_policy()
violations: list[str] = []
for rule in _rules_for_subject(policy, subject_domain):
patterns = _compiled_patterns(tuple(rule.detection_patterns))
if any(p.search(output_text) for p in patterns):
violations.append(rule.id)
return violations
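Review note: a minimal, self-contained sketch of the audit loop above, for readers who want to see its behavior without the `policy` package. `Rule` is a hypothetical stand-in for `policy.schema.ForbiddenRule`, and the rule IDs and patterns are invented for illustration; they are not taken from domain_policy.yaml.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    """Hypothetical stand-in for policy.schema.ForbiddenRule."""

    id: str
    applies_when_subject_in: tuple[str, ...]
    detection_patterns: tuple[str, ...]


def violations(output_text: str, subject_domain: str, rules: tuple[Rule, ...]) -> list[str]:
    """Return one rule ID for each applicable rule with at least one matching pattern."""
    if not output_text:
        return []
    hits: list[str] = []
    for rule in rules:
        # Rules are scoped by domain: a rule for another domain is never evaluated.
        if subject_domain not in rule.applies_when_subject_in:
            continue
        if any(re.search(p, output_text) for p in rule.detection_patterns):
            hits.append(rule.id)
    return hits


# Invented example rules.
RULES = (
    Rule("no_legal_verdict", ("law",), (r"위반(입니다|이다)", r"illegal")),
    Rule("no_dosage", ("medical",), (r"\d+\s*mg",)),
)

law_hits = violations("This is illegal under article 6.", "law", RULES)
other_hits = violations("Take 50 mg daily.", "law", RULES)
```

Because `re.search` matches anywhere in the text, patterns that should apply only to whole words or sentence ends need explicit anchors or word boundaries.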
-67
@@ -1,67 +0,0 @@
"""domain_policy.yaml loader with lru_cache."""
from __future__ import annotations
import os
from functools import lru_cache
from pathlib import Path
import yaml
from policy.schema import DomainPolicy
DEFAULT_POLICY_FILENAME = "domain_policy.yaml"
POLICY_PATH_ENV = "POLICY_PATH"
def _resolve_path(path: str | None) -> Path:
if path is not None:
return Path(path)
env_path = os.environ.get(POLICY_PATH_ENV)
if env_path:
return Path(env_path)
# Search order (multi-env compatible):
# 1. cwd / domain_policy.yaml: local pytest (run from repo root)
# 2. /app / domain_policy.yaml: container bind-mount path
# 3. /app/../domain_policy.yaml: container, parent of /app
# 4. <this>.parent.parent.parent / yaml: repo root relative to the policy package
candidates = [
Path.cwd() / DEFAULT_POLICY_FILENAME,
Path("/app") / DEFAULT_POLICY_FILENAME,
Path("/app").parent / DEFAULT_POLICY_FILENAME,
Path(__file__).resolve().parent.parent.parent / DEFAULT_POLICY_FILENAME,
]
for c in candidates:
if c.is_file():
return c
# If nothing is found, return the first candidate so that the later read fails clearly with FileNotFoundError
return candidates[0]
@lru_cache(maxsize=8)
def _load_cached(resolved: str) -> DomainPolicy:
text = Path(resolved).read_text(encoding="utf-8")
raw = yaml.safe_load(text)
return DomainPolicy.model_validate(raw)
def load_policy(path: str | None = None) -> DomainPolicy:
"""Load the policy yaml and validate it via pydantic.
Cache key = resolved absolute path (string). Passing a different path in tests yields a separate cache entry.
"""
resolved = str(_resolve_path(path).resolve())
return _load_cached(resolved)
def clear_cache() -> None:
"""For tests: use when consecutive calls must pick up different yaml contents."""
_load_cached.cache_clear()
def read_policy_bytes(path: str | None = None) -> bytes:
"""For computing the policy_version hash: the raw yaml bytes."""
resolved = _resolve_path(path).resolve()
return resolved.read_bytes()
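Review note: because the cache key is the resolved path only, editing the file on disk does not invalidate a cached policy until the cache is cleared. The sketch below shows that effect under simplifying assumptions: `json` stands in for `yaml.safe_load` plus pydantic validation, and the file name and `load` helper are hypothetical.

```python
import json
import tempfile
from functools import lru_cache
from pathlib import Path


@lru_cache(maxsize=8)
def _load_cached(resolved: str) -> dict:
    # Assumption: json replaces yaml.safe_load + DomainPolicy.model_validate.
    return json.loads(Path(resolved).read_text(encoding="utf-8"))


def load(path: str) -> dict:
    """Resolve the path first so that equivalent paths share one cache entry."""
    return _load_cached(str(Path(path).resolve()))


with tempfile.TemporaryDirectory() as tmp:
    policy_file = Path(tmp) / "policy.json"

    policy_file.write_text(json.dumps({"version": 1}), encoding="utf-8")
    before = load(str(policy_file))["version"]

    policy_file.write_text(json.dumps({"version": 2}), encoding="utf-8")
    stale = load(str(policy_file))["version"]  # served from the cache, file not re-read

    _load_cached.cache_clear()
    fresh = load(str(policy_file))["version"]  # re-read after clearing the cache
```

This is presumably why the module exposes `clear_cache()` for tests; a long-running process would need the same call, or a restart, to pick up an edited policy file.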
-153
@@ -1,153 +0,0 @@
"""Prompt rendering: injects yaml excerpts into template placeholders.
Templates contain the following placeholders:
{forbidden_block} the forbidden_for_4b block for the subject
{subject_description} subject_domains[domain].description
{confidence_threshold} escalation.confidence_threshold
{context_cap} escalation.context_char_cap_4b (context_char_cap_26b in render_26b)
{context_cap_doc_count} P6 only (batch document cap, default 500)
policy_version() = sha256(yaml_bytes + template_bytes)[:12].
A change to either the yaml or the template bumps it automatically; tracked via analyze_events.policy_version.
"""
from __future__ import annotations
import hashlib
from functools import lru_cache
from pathlib import Path
from policy.loader import load_policy, read_policy_bytes
from policy.schema import DomainPolicy
# Default template path, relative to the repo root
TEMPLATE_DIR = Path(__file__).resolve().parent.parent / "prompts" / "policy"
# 4B / 26B split (observability + test convenience)
KNOWN_4B_TASKS = {
"p1_triage",
"p2_nas_rule",
"p3a_short_summary",
"p3b_entities",
"p4a_advice_trigger",
"p4b_retrieval",
"p6_night_sweep",
}
KNOWN_26B_TASKS = {
"p3c_deep_summary",
"p4b_synthesis",
}
def _template_path(task: str) -> Path:
return TEMPLATE_DIR / f"{task}.txt"
@lru_cache(maxsize=64)
def _read_template(task: str) -> str:
path = _template_path(task)
if not path.exists():
raise FileNotFoundError(f"policy template '{task}' not found at {path}")
return path.read_text(encoding="utf-8")
@lru_cache(maxsize=64)
def _read_template_bytes(task: str) -> bytes:
return _template_path(task).read_bytes()
def _forbidden_block_for(
policy: DomainPolicy, subject_domain: str
) -> str:
"""Render the forbidden_for_4b rules that apply to this domain as a prompt block."""
lines = ["=== 4B 절대 금지 작업 ===",
"다음에 해당하면 자체 답변 금지, escalate_to_26b=true + envelope 만 응답.",
""]
count = 0
for rule in policy.forbidden_for_4b:
if subject_domain in rule.applies_when_subject_in:
count += 1
lines.append(f"{count}. [{rule.id}] {rule.description}")
if count == 0:
lines.append("(해당 도메인에 등록된 금지 항목 없음 — 일반 규칙만 적용)")
lines.append("")
lines.append("금지 위반 시 사후 audit (check_4b_output_violations) 에서 탐지되어")
lines.append("policy_violation=true 로 기록 + under_escalation 큐로 재처리.")
return "\n".join(lines)
def render_4b(
task: str,
subject_domain: str,
*,
policy: DomainPolicy | None = None,
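) -> str:
"""Inject policy excerpts into a 4B template and return the result.
User-input placeholders ({{filename}}, {{extracted_text}}, etc., written with double braces)
survive this pass as single-brace fields. The PR-B worker does the final injection with str.format or Template.
"""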
if task not in KNOWN_4B_TASKS:
raise ValueError(f"'{task}' is not a 4B task (known: {KNOWN_4B_TASKS})")
if policy is None:
policy = load_policy()
template = _read_template(task)
domain_spec = (
policy.subject_domains.get(subject_domain)
or policy.fallback_domain
)
return template.format(
forbidden_block=_forbidden_block_for(policy, subject_domain),
subject_description=domain_spec.description,
confidence_threshold=policy.escalation.confidence_threshold,
context_cap=policy.escalation.context_char_cap_4b,
context_cap_doc_count=500,
)
def render_26b(
task: str,
subject_domain: str,
*,
policy: DomainPolicy | None = None,
) -> str:
"""Render a 26B template."""
if task not in KNOWN_26B_TASKS:
raise ValueError(f"'{task}' is not a 26B task (known: {KNOWN_26B_TASKS})")
if policy is None:
policy = load_policy()
template = _read_template(task)
domain_spec = (
policy.subject_domains.get(subject_domain)
or policy.fallback_domain
)
return template.format(
forbidden_block=_forbidden_block_for(policy, subject_domain),
subject_description=domain_spec.description,
confidence_threshold=policy.escalation.confidence_threshold,
context_cap=policy.escalation.context_char_cap_26b,
context_cap_doc_count=500,
)
def policy_version(task: str, *, policy_path: str | None = None) -> str:
"""Return sha256(yaml_bytes + template_bytes)[:12].
Deterministic: the same (yaml, template) pair gives the same hash. Changing either one changes it.
Stored in analyze_events.policy_version to track drift.
"""
yaml_bytes = read_policy_bytes(policy_path)
template_bytes = _read_template_bytes(task)
h = hashlib.sha256(yaml_bytes + template_bytes).hexdigest()
return h[:12]
def clear_cache() -> None:
"""For tests: forces templates to be re-read."""
_read_template.cache_clear()
_read_template_bytes.cache_clear()
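Review note: two behaviors in this module are easy to misread, so here is a small sketch of both: the two-stage `str.format` rendering (double braces become single-brace fields after the first pass) and the 12-character version hash. The template text, the values, and the `version` helper name are hypothetical; only the hashing formula comes from the code above.

```python
import hashlib

# Hypothetical template text. The real templates live under prompts/policy/.
TEMPLATE = "cap={context_cap}\nfile={{filename}}\n"


def version(yaml_bytes: bytes, template_bytes: bytes) -> str:
    """Same formula as policy_version(): sha256 over yaml + template, first 12 hex chars."""
    return hashlib.sha256(yaml_bytes + template_bytes).hexdigest()[:12]


# Stage 1: policy values are injected; {{filename}} is unescaped to {filename}.
stage1 = TEMPLATE.format(context_cap=4000)
# Stage 2: the worker injects user input into the remaining single-brace field.
stage2 = stage1.format(filename="report.pdf")

v1 = version(b"version: 1\n", TEMPLATE.encode("utf-8"))
v2 = version(b"version: 2\n", TEMPLATE.encode("utf-8"))
```

One consequence of the second `format` pass: any literal brace in the stage-1 output, for example inside an injected policy description, would be interpreted as a field. A `string.Template` second stage would avoid that.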
-178
@@ -1,178 +0,0 @@
"""Routing engine: takes the 4B output plus context and decides whether to escalate to 26B.
6 invariants (all deterministic, code-level HARD rules):
INV-1 self_declare_add_only
deterministic_high_impact=True AND self_declare=False → high_impact_task=True
(self_declare is ADD only; it cannot turn the flag OFF)
INV-2 risk_flag_requires_26b_forces_escalation
any(flag where policy.risk_flags[flag].requires_26b) → escalate=True
INV-3 context_cap_forces_escalation
content_chars > policy.escalation.context_char_cap_4b → escalate=True, reason="long_context"
INV-4 multi_doc_forces_escalation
evidence_doc_count >= policy.escalation.escalate_on_multi_doc_count
→ escalate=True, reason="multi_doc", add "multi_doc_dependency" to risk_flags
INV-5 risk_flags_union
final risk_flags = UNION(domain.default_risk_flags, self_declared, derived)
self_declared is ADD only; flags added by self-declaration are unioned with the domain defaults
INV-6 fallback_domain for unknown
subject_domain not in policy.subject_domains → use policy.fallback_domain
(no edge case where routing ends up None/undefined)
"""
from __future__ import annotations
from dataclasses import dataclass, field
from typing import Iterable
from policy.loader import load_policy
from policy.schema import DomainPolicy, SubjectDomain, FallbackDomain
# --- Reason string constants (referenced by tests) ---------------------------
REASON_HIGH_IMPACT = "high_impact"
REASON_RISK_FLAG = "risk_flag_requires_26b"
REASON_LOW_CONFIDENCE = "low_confidence"
REASON_LONG_CONTEXT = "long_context"
REASON_MULTI_DOC = "multi_doc"
REASON_FALLBACK_DOMAIN = "fallback_domain"
@dataclass(frozen=True)
class RoutingDecision:
escalate_to_26b: bool
escalation_reasons: tuple[str, ...]
risk_flags: tuple[str, ...]
high_impact_task: bool
synthesis_directives: tuple[str, ...]
subject_domain_used: str # domain name actually applied (fallback_domain.name when the fallback is used)
used_fallback: bool = False
def _resolve_domain(
policy: DomainPolicy, subject_domain: str
) -> tuple[SubjectDomain | FallbackDomain, str, bool]:
"""INV-6: use fallback_domain when no domain matches."""
spec = policy.subject_domains.get(subject_domain)
if spec is not None:
return spec, subject_domain, False
return policy.fallback_domain, policy.fallback_domain.name, True
def decide_routing(
*,
subject_domain: str,
content_chars: int,
deterministic_keyword_hits: Iterable[str] = (),
self_declared_high_impact: bool = False,
self_declared_risk_flags: Iterable[str] = (),
confidence: float = 1.0,
evidence_doc_count: int = 0,
policy: DomainPolicy | None = None,
) -> RoutingDecision:
"""Pure function: the result is determined by the yaml and the inputs alone.
Parameters
----------
subject_domain: domain name chosen upstream (keyword/source_channel matching).
content_chars: number of characters of body text fed to the 4B model.
deterministic_keyword_hits: upstream keyword-match results (even when empty, the invariants
still apply if domain.high_impact is True).
self_declared_high_impact: the high_impact_self_declared field of the 4B output.
self_declared_risk_flags: risk_flags self-declared in the 4B output.
confidence: confidence of the 4B output (0.0~1.0).
evidence_doc_count: number of documents to synthesize, e.g. on the /ask path.
policy: for injection (tests). If None, loader.load_policy() is used.
"""
if policy is None:
policy = load_policy()
domain_spec, domain_name, used_fallback = _resolve_domain(policy, subject_domain)
reasons: list[str] = []
flags: set[str] = set()
# --- INV-1: high_impact (deterministic; self_declare is ADD only) -------
deterministic_high_impact = (
bool(list(deterministic_keyword_hits))
or domain_spec.high_impact
)
high_impact = deterministic_high_impact
if self_declared_high_impact:
high_impact = True # ADD only: cannot be reset to False
if high_impact:
reasons.append(REASON_HIGH_IMPACT)
# --- INV-5: risk_flags UNION merge -------------------------------------
# (a) domain defaults
flags.update(domain_spec.default_risk_flags)
# (b) 4B self-declaration (ADD only)
flags.update(self_declared_risk_flags)
# --- INV-3: long_context (evaluated before derived flags are added) ----
if content_chars > policy.escalation.context_char_cap_4b:
reasons.append(REASON_LONG_CONTEXT)
# --- INV-4: multi_doc (adds a derived flag) ----------------------------
if evidence_doc_count >= policy.escalation.escalate_on_multi_doc_count:
reasons.append(REASON_MULTI_DOC)
flags.add("multi_doc_dependency")
# --- low_confidence (adds a derived flag) ------------------------------
if confidence < policy.escalation.confidence_threshold:
reasons.append(REASON_LOW_CONFIDENCE)
flags.add("low_confidence_reasoning")
# --- INV-2: risk_flag_requires_26b -------------------------------------
requires_26b_flag = any(
policy.risk_flags[f].requires_26b
for f in flags
if f in policy.risk_flags
)
if requires_26b_flag:
reasons.append(REASON_RISK_FLAG)
# --- INV-6: record that the fallback was used --------------------------
if used_fallback:
# Does not force escalation by itself; added to reasons for visibility
reasons.append(REASON_FALLBACK_DOMAIN)
# --- Collect synthesis directives (rules passed to 26B) ----------------
directives: list[str] = []
for f in sorted(flags):
rf = policy.risk_flags.get(f)
if rf is not None and rf.synthesis_directive:
directives.append(rf.synthesis_directive)
# --- Final escalate decision -------------------------------------------
escalate = (
high_impact
or requires_26b_flag
or content_chars > policy.escalation.context_char_cap_4b
or evidence_doc_count >= policy.escalation.escalate_on_multi_doc_count
or confidence < policy.escalation.confidence_threshold
)
# Remove duplicate reasons (order preserved)
seen: set[str] = set()
dedup_reasons: list[str] = []
for r in reasons:
if r not in seen:
seen.add(r)
dedup_reasons.append(r)
return RoutingDecision(
escalate_to_26b=escalate,
escalation_reasons=tuple(dedup_reasons),
risk_flags=tuple(sorted(flags)),
high_impact_task=high_impact,
synthesis_directives=tuple(directives),
subject_domain_used=domain_name,
used_fallback=used_fallback,
)
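Review note: a reduced sketch of the escalation predicate, meant only to make INV-1, INV-3 and INV-4 concrete. It deliberately omits risk flags, the fallback domain and synthesis directives. The threshold values and the `should_escalate` name are hypothetical; the real values come from domain_policy.yaml.

```python
from types import SimpleNamespace

# Hypothetical thresholds standing in for policy.escalation.
ESCALATION = SimpleNamespace(
    confidence_threshold=0.7,
    context_char_cap_4b=4000,
    escalate_on_multi_doc_count=3,
)


def should_escalate(
    *,
    domain_high_impact: bool,
    self_declared_high_impact: bool = False,
    content_chars: int = 0,
    evidence_doc_count: int = 0,
    confidence: float = 1.0,
) -> tuple[bool, list[str]]:
    """Return (escalate, reasons) for a reduced version of the routing rules."""
    reasons: list[str] = []
    # INV-1: the self-declaration can only add to the deterministic signal.
    if domain_high_impact or self_declared_high_impact:
        reasons.append("high_impact")
    # INV-3: strictly greater than the cap, so content exactly at the cap stays on 4B.
    if content_chars > ESCALATION.context_char_cap_4b:
        reasons.append("long_context")
    # INV-4: greater than or equal to the configured document count.
    if evidence_doc_count >= ESCALATION.escalate_on_multi_doc_count:
        reasons.append("multi_doc")
    if confidence < ESCALATION.confidence_threshold:
        reasons.append("low_confidence")
    return bool(reasons), reasons


a = should_escalate(domain_high_impact=True, self_declared_high_impact=False)
b = should_escalate(domain_high_impact=False, content_chars=4000)
c = should_escalate(domain_high_impact=False, evidence_doc_count=3, confidence=0.5)
```

Case `a` is the INV-1 situation: the model declaring "not high impact" does not lower a domain that is high impact by policy.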
-133
@@ -1,133 +0,0 @@
"""Pydantic v2 models for domain_policy.yaml.
The loader parses the yaml into DomainPolicy. A schema violation raises ValidationError and blocks deployment.
"""
from __future__ import annotations
from typing import Literal
from pydantic import BaseModel, ConfigDict, Field, field_validator, model_validator
# documents.category enum (migration 143 + 152)
UICategory = Literal["document", "library", "news", "memo", "audio", "video", "law"]
SelfDeclareSemantics = Literal["additive_trigger_only"]
class SubjectDomain(BaseModel):
model_config = ConfigDict(extra="forbid", frozen=True)
description: str
suggested_ui_category: UICategory
high_impact: bool = False
default_risk_flags: tuple[str, ...] = ()
deep_summary_risk_flags: tuple[str, ...] = ()
keywords: tuple[str, ...] = ()
note: str | None = None
class FallbackDomain(BaseModel):
model_config = ConfigDict(extra="forbid", frozen=True)
name: str
description: str
suggested_ui_category: UICategory
high_impact: bool = False
default_risk_flags: tuple[str, ...] = ()
requires_human_review: bool = True
class RiskFlag(BaseModel):
model_config = ConfigDict(extra="forbid", frozen=True)
description: str
requires_26b: bool
synthesis_directive: str | None = None
output_mask_required: bool = False
@field_validator("synthesis_directive")
@classmethod
def _directive_length(cls, v: str | None) -> str | None:
if v is not None and len(v) > 500:
raise ValueError("synthesis_directive must be <= 500 chars")
return v
class ForbiddenRule(BaseModel):
model_config = ConfigDict(extra="forbid", frozen=True)
id: str
description: str
applies_when_subject_in: tuple[str, ...]
detection_patterns: tuple[str, ...] = ()
class Escalation(BaseModel):
model_config = ConfigDict(extra="forbid", frozen=True)
confidence_threshold: float = Field(ge=0.0, le=1.0)
context_char_cap_4b: int = Field(gt=0)
context_char_cap_26b: int = Field(gt=0)
escalate_on_multi_doc_count: int = Field(ge=1)
class HealthRange(BaseModel):
model_config = ConfigDict(extra="forbid", frozen=True)
min: float | None = None
max: float | None = None
class Observability(BaseModel):
model_config = ConfigDict(extra="forbid", frozen=True)
required_event_fields: tuple[str, ...]
health_ranges: dict[str, HealthRange]
class DomainPolicy(BaseModel):
model_config = ConfigDict(extra="forbid", frozen=True)
version: int
last_updated: str
scope: tuple[str, ...]
self_declare_semantics: SelfDeclareSemantics
subject_domains: dict[str, SubjectDomain]
fallback_domain: FallbackDomain
risk_flags: dict[str, RiskFlag]
forbidden_for_4b: tuple[ForbiddenRule, ...]
escalation: Escalation
observability: Observability
@model_validator(mode="after")
def _cross_reference_check(self) -> "DomainPolicy":
"""Cross-field validation: internal consistency of the yaml."""
known_flags = set(self.risk_flags.keys())
# 1. Every flag in subject_domain.default_risk_flags and deep_summary_risk_flags must be defined in risk_flags
for name, dom in self.subject_domains.items():
for flag in (*dom.default_risk_flags, *dom.deep_summary_risk_flags):
if flag not in known_flags:
raise ValueError(
f"subject_domain '{name}' references unknown risk_flag '{flag}'"
)
for flag in self.fallback_domain.default_risk_flags:
if flag not in known_flags:
raise ValueError(
f"fallback_domain references unknown risk_flag '{flag}'"
)
# 2. Every domain in forbidden_for_4b.applies_when_subject_in must exist in subject_domains
known_domains = set(self.subject_domains.keys())
for rule in self.forbidden_for_4b:
for dom_name in rule.applies_when_subject_in:
if dom_name not in known_domains:
raise ValueError(
f"forbidden rule '{rule.id}' references unknown subject_domain '{dom_name}'"
)
return self
-90
@@ -1,90 +0,0 @@
"""ShadowLogger: Protocol + in-memory implementation.
During the shadow period ahead of the live switch, records "where would this policy have routed it".
The real DB writer (DBShadowLogger) is PR-B's responsibility. PR-A only:
1. Fixes the Protocol interface.
2. Provides InMemoryShadowLogger as a testable fake.
Invariant: PR-B must not change the Protocol signature.
"""
from __future__ import annotations
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Any, Protocol, runtime_checkable
from policy.routing import RoutingDecision
@dataclass(frozen=True)
class ShadowRecord:
"""A single shadow event; InMemoryShadowLogger keeps these in a list."""
doc_id: str
decision: RoutingDecision
actual_model_used: str
prompt_version: str
policy_version: str
recorded_at: datetime
extra: dict[str, Any] = field(default_factory=dict)
@runtime_checkable
class ShadowLogger(Protocol):
"""Shadow recording interface defined by PR-A.
When PR-B implements DBShadowLogger(ShadowLogger), it must keep this signature as is.
"""
async def record_would_route(
self,
*,
doc_id: str,
decision: RoutingDecision,
actual_model_used: str,
prompt_version: str,
policy_version: str,
extra: dict[str, Any] | None = None,
) -> None:
...
class InMemoryShadowLogger:
"""Test-only implementation. Signature-compatible with PR-B's DBShadowLogger."""
def __init__(self) -> None:
self._records: list[ShadowRecord] = []
async def record_would_route(
self,
*,
doc_id: str,
decision: RoutingDecision,
actual_model_used: str,
prompt_version: str,
policy_version: str,
extra: dict[str, Any] | None = None,
) -> None:
self._records.append(
ShadowRecord(
doc_id=doc_id,
decision=decision,
actual_model_used=actual_model_used,
prompt_version=prompt_version,
policy_version=policy_version,
recorded_at=datetime.now(timezone.utc),
extra=dict(extra or {}),
)
)
# --- Inspection helpers (test only) ------------------------------------
@property
def records(self) -> tuple[ShadowRecord, ...]:
return tuple(self._records)
def clear(self) -> None:
self._records.clear()
def count(self) -> int:
return len(self._records)
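Review note: a trimmed-down sketch of how an async `Protocol` plus an in-memory fake can be exercised in a test. `Recorder` and `MemoryRecorder` are hypothetical, reduced versions of `ShadowLogger` and `InMemoryShadowLogger` with fewer parameters; they are not the project's classes.

```python
import asyncio
from typing import Any, Optional, Protocol, runtime_checkable


@runtime_checkable
class Recorder(Protocol):
    """Hypothetical, reduced version of the ShadowLogger protocol."""

    async def record_would_route(
        self, *, doc_id: str, extra: Optional[dict[str, Any]] = None
    ) -> None: ...


class MemoryRecorder:
    """In-memory fake: satisfies Recorder structurally, without inheriting from it."""

    def __init__(self) -> None:
        self.records: list[dict[str, Any]] = []

    async def record_would_route(
        self, *, doc_id: str, extra: Optional[dict[str, Any]] = None
    ) -> None:
        # Copy `extra` so that later caller-side mutation cannot alter the record.
        self.records.append({"doc_id": doc_id, "extra": dict(extra or {})})


async def main() -> MemoryRecorder:
    recorder = MemoryRecorder()
    payload = {"tier": "4b"}
    await recorder.record_would_route(doc_id="doc-1", extra=payload)
    payload["tier"] = "26b"  # does not affect the stored copy
    return recorder


recorder = asyncio.run(main())
conforms = isinstance(recorder, Recorder)
```

Note that `isinstance` against a `runtime_checkable` protocol only checks that the method exists, not that its signature matches, so the "do not change the signature" invariant still needs a type checker or a dedicated test.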
-46
@@ -1,46 +0,0 @@
너는 다국적 뉴스 비교 분석가다.
아래는 같은 주제로 군집된 야간 수집 뉴스들 — 각 줄 앞 (국가코드 · 소스) 표시로 출처가 표시되어 있다.
이 정보만으로 cross-country 비교 분석을 JSON 으로만 출력하라.
목표:
- 같은 사건을 각 나라가 어떻게 다르게 다루는지 / 무엇이 공통인지를 1페이지 카드 형태로 정리.
- 사용자는 한국어 독자. 한국어로 출력.
절대 금지:
- 제공된 summary 에 없는 사실 추가
- 추측 표현 ("보인다", "~할 것이다", "~할 전망" 등)
- JSON 외의 모든 텍스트 (설명, 마크다운, 코드블록 금지)
- 인용부호 안 원문에 없던 단어 생성 (key_quotes 는 원문 그대로만)
분량 cap (반드시 지킬 것):
- country_perspectives: 최대 10개, 각 summary 는 1~2문장 (한국어 120자 이내)
- divergences: 최대 3개, 각 200자 이내
- convergences: 최대 2개, 각 200자 이내
- key_quotes: 최대 5개, 각 quote 240자 이내
- historical_context: 1~2문장 (한국어 120자 이내), 의미 있을 때만 채우고 아니면 null
출력 형식 (JSON 객체 하나만 출력, 위 cap 초과 금지):
{
"topic_label": "5~10 단어의 한국어 토픽 제목",
"headline": "전체를 한 줄로 압축한 한국어 headline (≤80자)",
"country_perspectives": [
{"country": "KR", "summary": "...", "article_ids": []},
{"country": "US", "summary": "...", "article_ids": []}
],
"divergences": ["A국=X 강조 / B국=Y 비판 / C국=Z 부각"],
"convergences": ["모든 매체가 Z 사실은 일치"],
"key_quotes": [{"country": "US", "source": "NYT", "quote": "..."}],
"historical_context": null
}
규칙:
- country_perspectives 의 country 는 입력 기사의 국가코드 그대로 (대문자).
- article_ids 는 비워둬도 됨 (서버가 채움).
- 단일 국가만 다룬 경우 divergences 는 빈 배열.
- historical_context 는 아래 "이전 흐름 참고" 섹션이 비어있으면 반드시 null.
오늘 새벽 기사 묶음:
{articles_block}
이전 흐름 참고 (직접 인용 금지, 맥락 파악 용도):
{historical_block}
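Review note: this template mixes a literal JSON object with `{articles_block}` and `{historical_block}` placeholders. The diff does not show how the caller fills it in, so the following is only a sketch under the assumption that plain `str.format` is not used: with single-brace JSON in the text, `str.format` would fail. The shortened template and the `render` helper are hypothetical.

```python
# Hypothetical, shortened version of the template above.
TEMPLATE = (
    "Output format:\n"
    "{\n"
    '  "topic_label": "...",\n'
    '  "historical_context": null\n'
    "}\n"
    "Articles:\n"
    "{articles_block}\n"
    "History:\n"
    "{historical_block}\n"
)


def render(template: str, **values: str) -> str:
    """Replace only the named placeholders; every other brace is left alone."""
    out = template
    for key, value in values.items():
        out = out.replace("{" + key + "}", value)
    return out


# str.format treats the JSON object as a replacement field and raises.
try:
    TEMPLATE.format(articles_block="x", historical_block="y")
    format_failed = False
except (KeyError, ValueError, IndexError):
    format_failed = True

rendered = render(TEMPLATE, articles_block="(KR · Yonhap) ...", historical_block="")
```

A caveat of the replace-based approach: if an injected value itself contains the text of a later placeholder, that text is substituted too, so untrusted article text is safer when injected last or escaped first.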
-33
@@ -1,33 +0,0 @@
You are an answerability judge. Given a query and evidence chunks, determine if the evidence can answer the query. Respond ONLY in JSON.
## CALIBRATION (CRITICAL)
- verdict=full: evidence is SUFFICIENT to answer the CORE of the query. Missing minor details does NOT make it insufficient.
- verdict=partial: evidence covers SOME major aspects but CLEARLY MISSES others the user explicitly asked about.
- verdict=insufficient: evidence has NO relevant information for the query, or is completely off-topic.
Example: Query="제6장 주요 내용", Evidence covers 제6장 definition+scope → verdict=full (core is covered).
Example: Query="제6장 처벌 조항", Evidence covers 제6장 definition but NOT 처벌 → verdict=partial.
Example: Query="감귤 출하량", Evidence about 산업안전보건법 → verdict=insufficient.
## Rules
1. Your "verdict" must be based ONLY on whether the CONTENT semantically answers the query. Ignore retrieval scores for this field.
2. "covered_aspects": query aspects that evidence covers. Korean labels for Korean queries.
3. "missing_aspects": query aspects that evidence does NOT cover. Korean labels.
4. Keep aspects concise (2-5 words each), non-overlapping.
## Output Schema
{
"verdict": "full" | "partial" | "insufficient",
"covered_aspects": ["aspect1"],
"missing_aspects": ["aspect2"],
"confidence": "high" | "medium" | "low"
}
## Query
{query}
## Evidence chunks:
{chunks}
## Retrieval scores (for reference only, NOT for verdict):
[{scores}]
+3 -29
@@ -1,20 +1,14 @@
[DEPRECATED 2026-04-24] — summary_triage.txt 로 이관됨 (PR-B B-1 tier routing).
이 파일은 B-1 안정화 기간 동안 rollback 경로를 위해 유지. 신규 호출 경로는
summary_triage.txt + summary_deep.txt 조합 사용. 실제 삭제는 별도 cleanup PR.
You are a document classification AI. Analyze the document below and respond ONLY in JSON format. No other text.
## Response Format
{
"domain": "Level1/Level2/Level3",
"document_type": "one of document_types",
"facet_doctype": "one of facet_doctypes or null",
"confidence": 0.85,
"tags": ["tag1", "tag2"],
"importance": "medium",
"sourceChannel": "inbox_route",
"dataOrigin": "work or external",
"docPurpose": "business or knowledge"
"dataOrigin": "work or external"
}
## Domain Taxonomy (select the most specific leaf node)
@@ -62,7 +56,7 @@ General/
- 2-level paths allowed ONLY when no leaf exists (e.g., Engineering/Civil)
## Document Types (select exactly ONE)
Reference, Standard, Manual, Drawing, Template, Note, Academic_Paper, Law_Document, Report, Memo, Checklist, Meeting_Minutes, Specification, 발주서, 세금계산서, 명세표, 도면, 증명서, 계획서, 시방서
Reference, Standard, Manual, Drawing, Template, Note, Academic_Paper, Law_Document, Report, Memo, Checklist, Meeting_Minutes, Specification
### Document Type Detection Rules
- Step-by-step instructions → Manual
@@ -71,22 +65,9 @@ Reference, Standard, Manual, Drawing, Template, Note, Academic_Paper, Law_Docume
- Meeting discussion → Meeting_Minutes
- Checklist format → Checklist
- Academic/research format → Academic_Paper
- Technical drawings → Drawing / 도면
- 발주 내역, 품목·수량·단가 표 → 발주서
- 공급자/공급받는자/세액 양식 → 세금계산서
- 거래 명세/납품 명세 → 명세표
- 자격 증빙·수료·재직 → 증명서
- 업무·프로젝트 추진안 → 계획서
- 공사 시방·재료 기준 → 시방서
- Technical drawings → Drawing
- If unclear → Note
## facet_doctype (실무 문서 유형 식별 신호)
Select ONE of: 발주서, 세금계산서, 명세표, 도면, 증명서, 계획서, 시방서
If the document clearly does NOT fit any of the above, return null.
- This field is independent of document_type — use it to flag business-document types
that drive 자료실(library) 자동 분류 제안.
- 발주서 / 세금계산서 / 명세표 는 자료실 "거래" 분류의 승인 대기 제안으로 연결된다.
## Confidence (0.0 ~ 1.0)
- How confident are you in the domain classification?
- 0.85+ = high confidence, 0.6~0.85 = moderate, <0.6 = uncertain
@@ -108,12 +89,5 @@ If the document clearly does NOT fit any of the above, return null.
- work: company-related (TK, Technicalkorea, factory, production)
- external: external reference (news, papers, laws, general info)
## docPurpose
- business: 업무 수행에 직접 사용 (양식, 보고서, 체크리스트, 제출물, 계획서)
- knowledge: 참조·학습·보관 목적 (법령, 논문, 기사, 레퍼런스, 기술 문서, 교육 자료)
- Template, Checklist, Report, Specification → business 가능성 높음
- Academic_Paper, Law_Document, Reference, Standard → knowledge 가능성 높음
- Meeting_Minutes, Memo → 문맥 판단 (실행 기록이면 business, 참조용이면 knowledge)
## Document to classify
{document_text}
-19
@@ -1,19 +0,0 @@
너는 팩트 기반 뉴스 토픽 요약 도우미다.
아래는 같은 사건으로 군집된 기사들의 ai_summary다.
이 정보만으로 다음을 JSON으로만 출력하라.
절대 금지:
- 제공된 summary에 없는 사실 추가
- 해석/비교/예측/의견
- "보인다", "~할 것이다", "~할 전망" 같은 추측 표현
- 인용부호 안 원문 외 단어 생성
- JSON 외의 모든 텍스트 (설명, 마크다운, 코드블록 금지)
출력 형식 (JSON 객체 하나만 출력):
{
"topic_label": "5~10 단어의 한국어 제목",
"summary": "1~2 문장, 사실만, 수동태 허용"
}
기사 요약:
{articles_block}
-30
@@ -1,30 +0,0 @@
You are a document analyzer. Respond ONLY in JSON. No markdown wrapping, no explanation.
## Task
Given a document, produce a structured analysis with up to 4 layers.
Skip any layer that does not apply. Always include "summary".
## Output Schema
{
"layers": [
{"layer": "evidence", "title": "근거", "content": "..."},
{"layer": "explanation", "title": "해설", "content": "..."},
{"layer": "examples", "title": "사례", "content": "..."},
{"layer": "summary", "title": "요약", "content": "..."}
]
}
## Rules
- Each content: 200~400 characters, in the same language as the document (Korean documents → Korean).
- "evidence": Key factual claims or data points stated in the document. Skip for narrative/opinion documents.
- "explanation": Why the facts matter, context, or interpretation. Skip for pure data/tables.
- "examples": Concrete cases, scenarios, or instances explicitly mentioned. Skip if none exist.
- "summary": Always present. 2-3 sentences capturing the document's core message.
- Use ONLY information in the document. No outside knowledge.
- If a layer does not apply, OMIT it entirely from the layers array. Do NOT write "해당 없음", "정보 없음", "N/A" — just skip.
- Maximum 4 layers. Minimum 1 (summary).
## Document
Title: {document_title}
Content:
{document_text}
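
The layer rules in this prompt (omit layers that do not apply, never emit placeholder text, always include "summary") can be checked after the model responds. The following is a minimal sketch under assumptions: `clean_layers` is a hypothetical helper, not code from this repository, and it assumes the response has already been received as a JSON string.

```python
import json

ALLOWED_LAYERS = ("evidence", "explanation", "examples", "summary")
# Placeholder strings the prompt explicitly forbids.
PLACEHOLDERS = {"해당 없음", "정보 없음", "N/A"}


def clean_layers(raw: str) -> list[dict]:
    """Drop unknown, duplicate, or placeholder layers; require a summary."""
    layers = json.loads(raw)["layers"]
    kept = []
    seen = set()
    for layer in layers:
        name = layer.get("layer")
        content = (layer.get("content") or "").strip()
        if name not in ALLOWED_LAYERS or name in seen:
            continue
        if not content or content in PLACEHOLDERS:
            continue
        seen.add(name)
        kept.append(layer)
    if "summary" not in seen:
        raise ValueError("summary layer is required")
    return kept
```

Because each layer name is accepted at most once, the result can never exceed the four-layer maximum.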
-77
@@ -1,77 +0,0 @@
You are an evidence span extractor. Respond ONLY in JSON. No markdown, no explanation.
## Task
For each numbered candidate, extract the most query-relevant span from the original text (copy verbatim, 50-200 chars) and rate relevance 0.0~1.0. If the candidate has no connection at all to the query topic, set span=null, relevance=0.0, skip_reason. Partial or indirect relevance should still get a span and relevance >= 0.3.
## Output Schema
{
"items": [
{
"n": 1,
"span": "...",
"relevance": 0.0,
"skip_reason": null
}
]
}
## Rules
- `n`: candidate 번호 (1-based, 입력 순서와 동일). **모든 n을 반환** (skip된 것도 포함).
- `span`: 원문에서 **그대로 복사한** 50~200자. 요약/변형 금지. 원문에 없는 단어는 절대 포함하지 말 것. 여러 문장이어도 무방.
- 관련 span이 없으면 `span: null`, `relevance: 0.0`, `skip_reason`에 한 줄 사유.
- `relevance`: 0.0~1.0 float
- 0.9+ query에 직접 답함
- 0.7~0.9 강한 연관
- 0.5~0.7 명확한 부분 연관 (query의 핵심 측면 일부를 커버)
- 0.3~0.5 약한 부분 연관 (query 주제에 관련되나 직접 답은 아님)
- <0.3 무관
- `skip_reason`: span=null 일 때만 필수. 예: "no_direct_relevance", "off_topic", "generic_boilerplate"
- **원문 그대로 복사 강제**: 번역/paraphrase/요약 모두 금지. evidence span은 citation 원문이 되어야 한다.
## Example 1 (hit)
query: `산업안전보건법 제6장 주요 내용`
candidates:
[1] title: 산업안전보건법 해설 / text: 제6장은 "안전보건관리체제"에 관한 장으로, 사업주의 안전보건관리책임자 선임 의무와 관리감독자 지정 등을 규정한다. 제15조부터 제19조까지 구성된다...
[2] title: 회사 복지 규정 / text: 직원의 연차휴가 사용 규정과 경조사 지원 내용을 담고 있다...
{
"items": [
{
"n": 1,
"span": "제6장은 \"안전보건관리체제\"에 관한 장으로, 사업주의 안전보건관리책임자 선임 의무와 관리감독자 지정 등을 규정한다. 제15조부터 제19조까지 구성된다",
"relevance": 0.95,
"skip_reason": null
},
{
"n": 2,
"span": null,
"relevance": 0.0,
"skip_reason": "off_topic"
}
]
}
## Example 2 (partial)
query: `Python async best practice`
candidates:
[1] title: FastAPI tutorial / text: FastAPI supports both async and sync endpoints. For I/O-bound operations, use async def with await for database and HTTP calls. Avoid blocking calls in async functions or use run_in_executor...
{
"items": [
{
"n": 1,
"span": "For I/O-bound operations, use async def with await for database and HTTP calls. Avoid blocking calls in async functions or use run_in_executor",
"relevance": 0.82,
"skip_reason": null
}
]
}
## Query
{query}
## Candidates
{numbered_candidates}
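
The verbatim-copy rule above lends itself to a mechanical check: a span that is not a substring of its candidate text cannot serve as a citation. The sketch below is one possible validator, not the repository's implementation; `validate_evidence_items` is a hypothetical name, and candidates are assumed to be passed as a list of raw texts in input order.

```python
import json


def validate_evidence_items(raw: str, candidates: list[str]) -> list[dict]:
    """Return only items whose span is a verbatim 50-200 char copy."""
    items = json.loads(raw)["items"]
    by_n = {item["n"]: item for item in items}
    # The prompt requires every candidate number to be returned,
    # including skipped ones.
    missing = [n for n in range(1, len(candidates) + 1) if n not in by_n]
    if missing:
        raise ValueError(f"missing candidate numbers: {missing}")
    accepted = []
    for n, text in enumerate(candidates, start=1):
        span = by_n[n]["span"]
        if span is None:
            continue  # explicitly skipped by the model
        if span not in text:
            continue  # paraphrased or invented, not a verbatim copy
        if not 50 <= len(span) <= 200:
            continue  # outside the length window the prompt asks for
        accepted.append(by_n[n])
    return accepted
```

Rejecting rather than repairing a non-verbatim span keeps the citation guarantee simple: anything that survives can be located in the source text.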
-41
@@ -1,41 +0,0 @@
[System]
너는 Document Server 의 업로드 라우터다. 업로드된 파일의 메타데이터와 (있다면) 텍스트 preview 를 보고, 어떤 처리 파이프라인이 필요한지만 결정한다. 문서 내용을 요약하거나 태깅하지 않는다.
subject_description: {subject_description}
규칙:
- mime/확장자가 명확하면 그대로 따른다. 모르겠으면 "unknown" 으로 표시하고 needs_ocr=true.
- 이미지·PDF 의 text_density < 0.3 → needs_ocr=true.
- 오디오(m4a/mp3/wav)·비디오(mp4/webm) → needs_stt=true.
- 확신도 낮으면 priority="needs_human" 로만 표시하고 추측하지 않는다.
{forbidden_block}
출력 (JSON only, 다른 텍스트 금지):
{{
"subject_domain": "safety_reference|safety_operational|msds|hazard_specific|incident_report|health_record|safety_video|news_item|news_digest_request|generic",
"needs_ocr": bool,
"needs_stt": bool,
"needs_summary": bool,
"summary_tier": "short|standard|deep|none",
"priority": "normal|high|needs_human",
"high_impact_self_declared": bool,
"high_impact_reason": "한 줄 한국어",
"confidence": 0.0~1.0,
"escalate_to_26b": bool,
"escalation_reason": "한 줄 한국어 (escalate=true 일 때만)"
}}
에스컬레이션 기준 (one-of):
- 입력 preview > {context_cap} chars
- confidence < {confidence_threshold}
- 규칙 충돌 / 다중 도메인 혼재
- 사용자 대면 자연어 응답 필요 (여긴 해당 없음)
[User]
파일명: {{filename}}
MIME: {{mime}}
크기: {{size_bytes}} bytes
소스: {{source}} (upload | nas_watcher | law_monitor | news_collector)
Text preview (처음 2000자):
{{text_preview_or_empty}}
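
This template mixes single-brace fields (`{subject_description}`, `{context_cap}`) with double-brace fields (`{{filename}}`) and a double-braced JSON block. With Python's `str.format`, one pass fills the single-brace fields and collapses every `{{` / `}}` into a literal brace. The sketch below illustrates that behavior on a shortened, hypothetical template. Whether the repository renders its prompts this way is an assumption, and `render_config` and `fill_request_fields` are illustrative names.

```python
TEMPLATE = (
    "subject_description: {subject_description}\n"
    "output:\n"
    "{{\n"
    '  "needs_ocr": bool\n'
    "}}\n"
    "[User]\n"
    "filename: {{filename}}\n"
)


def render_config(template: str, **config: str) -> str:
    """Stage one: fill single-brace fields; '{{' and '}}' become '{' and '}'."""
    return template.format(**config)


def fill_request_fields(text: str, **fields: str) -> str:
    """Stage two (assumed): plain replacement of the remaining {name} fields.

    A second str.format call would fail here, because the JSON block now
    contains single literal braces.
    """
    for key, value in fields.items():
        text = text.replace("{" + key + "}", value)
    return text


rendered = render_config(TEMPLATE, subject_description="safety documents")
final = fill_request_fields(rendered, filename="report.pdf")
```

Using plain replacement in the second stage also means braces inside user-supplied values such as file names are never interpreted as format fields.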
-40
@@ -1,40 +0,0 @@
[System]
너는 Document Server 의 자료 분류 어시스턴트다. 문서 메타데이터 + 짧은 요약 + 추출 태그를 보고 사용자 승인용 UI 카테고리/파일명/태그 제안을 생성한다.
**자동 이동 금지** — 네 출력은 승인 대기용 제안일 뿐, 즉시 DB category 를 변경하지 않는다 (PR-B 의 ai_suggestion 플로우에서 사용자 승인 후 반영).
subject_description: {subject_description}
{forbidden_block}
제약:
- suggested_ui_category 는 {{document, library, news, memo, audio, video, law}} 중에서만 선택.
- 규칙에 없는 카테고리는 만들지 않는다. 애매하면 needs_human_review=true.
- cat_library=1 과 has_library_tag=1 자동 전이 금지 (정책).
- 개인정보 (주민번호/계좌/전화/차량번호) 가 본문에 보이면 tags 에 "pii" 추가 + confidence 감점.
- category 매칭을 subject_domain 판정 키로 절대 역산하지 말 것 (UI 축과 정책 축 분리 원칙).
출력 (JSON only):
{{
"suggested_ui_category": "document|library|news|memo|audio|video|law",
"target_subfolder": "...",
"suggested_filename": "...",
"tags_auto": ["tag1", "tag2"],
"library_suggestion": bool,
"confidence": 0.0~1.0,
"needs_human_review": bool,
"reason": "한 줄 한국어",
"escalate_to_26b": bool
}}
에스컬레이션:
- 입력 > {context_cap} chars → escalate
- confidence < {confidence_threshold} → escalate
- 도메인·카테고리 조합에 대한 룰이 상충 → escalate
[User]
파일명: {{filename}}
subject_domain: {{subject_domain}}
추출 요약 (P3a short tier): {{short_summary}}
추출 태그 후보: {{extracted_tags}}
유사 기존 문서 top3: {{similar_docs_titles}}
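
The constraints in this prompt (a closed category list, no automatic moves, escalation on long input or low confidence) can be enforced on the model's output before anything reaches the approval UI. This is a sketch under assumptions: `gate_suggestion` and the `status` value are hypothetical, and the thresholds correspond to the `{context_cap}` and `{confidence_threshold}` template fields.

```python
ALLOWED_UI_CATEGORIES = {"document", "library", "news", "memo", "audio", "video", "law"}


def gate_suggestion(suggestion: dict, input_chars: int,
                    context_cap: int, confidence_threshold: float) -> dict:
    """Return a copy of the suggestion that is safe to queue for approval."""
    result = dict(suggestion)
    # Categories outside the closed list are never accepted as-is.
    if result.get("suggested_ui_category") not in ALLOWED_UI_CATEGORIES:
        result["suggested_ui_category"] = None
        result["needs_human_review"] = True
    # Escalation rules from the prompt: long input or low confidence.
    if input_chars > context_cap or result.get("confidence", 0.0) < confidence_threshold:
        result["escalate_to_26b"] = True
    # Hypothetical marker: the suggestion is queued, never applied directly.
    result["status"] = "pending_approval"
    return result
```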
-60
@@ -1,60 +0,0 @@
[System]
너는 한국어 문서 태거 + 짧은 요약기다. 입력 본문을 읽고 TL;DR + 핵심 bullets + tags 만 생성한다. **상세 문단·entities 는 생성하지 않는다** (깊은 요약은 26B, entity 는 P3b 담당).
subject_description: {subject_description}
{forbidden_block}
태깅 원칙:
- 태그 5~12개, 명사구. 동사/조사 금지.
- "문서 종류" 태그 1개 필수 (예: 법령, MSDS, 회의록, 보고서, 메모, 뉴스, 영상전사).
- 시점 태그 (YYYY-QN / YYYY-MM) 추출 가능 시 포함.
- 중복 의미 태그 금지 ("계약" + "계약서" → "계약서" 하나).
- pii 감지 시 "pii" 추가 + confidence 감점.
요약 규칙:
- **TL;DR**: 1문장, 최대 60자.
- **Bullets**: 정확히 5개, 각 30~60자.
- 본문에 없는 정보 추가 금지 (hallucination 금지).
- 숫자·날짜·고유명사는 원문 그대로.
출력 (JSON only):
{{
"tldr": "1문장 최대 60자",
"bullets": ["...", "...", "...", "...", "..."],
"tags": ["..."],
"doc_type": "...",
"time_scope": "YYYY-QN|YYYY-MM|null",
"confidence": 0.0~1.0,
"high_impact_self_declared": bool,
"high_impact_reason": "한 줄",
"recommend_deep_summary": bool,
"recommend_entity_pass": bool,
"escalate_to_26b": bool,
"risk_flags": ["..."],
"event_kind_hint": "note|task|calendar_event|activity_log|reference|null",
"event_kind_confidence": 0.0~1.0
}}
event_kind_hint 분류 (사용자 메모 inbox triage 용 — AI 가 events row 직접 생성하지 않고 사용자 1-click promote 의 추천만 제공):
- "task": 사용자가 미래에 해야 할 일 (예: "내일 견적 요청", "세무사 전화하기"). due 시각 있어도 task 가능.
- "calendar_event": 시간/날짜가 고정된 일정 (예: "5/15 14:00 회의", "내일 2시 세무사 전화"). 본문에 명시적 시간 단서.
- "activity_log": 이미 한 행동 기록 (예: "방금 PR 머지 완료", "오늘 GPU 서버 점검함"). 과거형 또는 "방금/오늘/지금" 표지.
- "reference": 나중에 참조할 자료/링크/요약 (예: 웹 클립, 외부 자료, "이거 나중에 봐야 함").
- "note": 위 4개 어디에도 명확하지 않은 일반 메모/생각 (default).
- event_kind_confidence: 0.0~1.0. 명확하지 않으면 낮게 (< 0.5). 사용자가 결정.
- 본문이 짧거나 의도 불명이면 "note" + confidence 낮게.
recommend_deep_summary=true 조건:
- 본문 > 40,000 chars
- 다수 당사자 또는 시계열 전개가 있는 법령/절차/보고서
- 사용자가 이 문서를 기반으로 결정을 내려야 할 가능성
에스컬레이션 (escalate_to_26b=true):
- 본문 > {context_cap} chars
- confidence < {confidence_threshold}
- subject_domain 의 high_impact=true 이고 판단 정확성이 중요
- 5개 이상 핵심 주장 교차 — 상세 분석 필요
[User]
{{extracted_text}}
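
The length and count limits in this prompt (TL;DR of at most 60 characters, exactly 5 bullets of 30 to 60 characters each, 5 to 12 tags) are easy to verify once the JSON is parsed. A minimal sketch, assuming the response is already a dict; `check_short_summary` is a hypothetical helper name.

```python
def check_short_summary(out: dict) -> list[str]:
    """Return a list of rule violations; an empty list means the output passes."""
    problems = []
    if len(out["tldr"]) > 60:
        problems.append("tldr longer than 60 chars")
    bullets = out["bullets"]
    if len(bullets) != 5:
        problems.append(f"expected 5 bullets, got {len(bullets)}")
    for i, bullet in enumerate(bullets, start=1):
        if not 30 <= len(bullet) <= 60:
            problems.append(f"bullet {i} length {len(bullet)} outside 30-60")
    if not 5 <= len(out["tags"]) <= 12:
        problems.append(f"expected 5-12 tags, got {len(out['tags'])}")
    return problems
```

Note that `len` counts Unicode code points, so a Hangul syllable counts as one character, which matches how the limits are most naturally read.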
-42
@@ -1,42 +0,0 @@
[System]
너는 고유명사 추출기다. 본문에서 인물/조직/프로젝트명만 추출한다.
subject_description: {subject_description}
{forbidden_block}
원칙:
- 추측·유추·번역 금지. 본문에 문자 그대로 등장하는 것만.
- 각 entity 는 원문 근접 5단어를 evidence 로 제공 (fabrication 방지).
- 확신 없으면 빈 배열 + abstained=true. 과추출 페널티 > 과소추출 페널티.
- 동의어·별칭 병합 금지 (원문 그대로 두 개 각각 기록).
abstained=true 가 되는 경우 (P3c 26B 가 재추출):
- 이름 후보가 10개 이상인데 문맥 구분 불가
- 익명 주체가 주요 행위자인 문서
- 번역·음역으로 표기 불일치 심한 경우
출력 (JSON only):
{{
"people": [
{{"name": "...", "evidence": "원문 그대로 주변 5단어"}}
],
"orgs": [
{{"name": "...", "evidence": "..."}}
],
"projects": [
{{"name": "...", "evidence": "..."}}
],
"confidence": 0.0~1.0,
"abstained": bool,
"abstain_reason": "한 줄 한국어 (abstained=true 일 때만)",
"escalate_to_26b": bool
}}
에스컬레이션:
- 본문 > {context_cap} chars
- confidence < {confidence_threshold}
- subject_domain 의 high_impact=true (안전/법령/MSDS 등 — entity 오독 실무 피해)
[User]
{{extracted_text}}
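
Since the prompt requires that every entity and its evidence appear literally in the source text, fabricated entries can be filtered with a substring check. The sketch below assumes the parsed output and the original text are both available; `drop_unsupported_entities` is a hypothetical name.

```python
def drop_unsupported_entities(out: dict, text: str) -> dict:
    """Keep only entities whose name and evidence both occur verbatim in text."""
    cleaned = dict(out)
    for kind in ("people", "orgs", "projects"):
        cleaned[kind] = [
            entity for entity in out.get(kind, [])
            if entity["name"] in text and entity["evidence"] in text
        ]
    return cleaned
```

The check is deliberately strict: it drops an entity rather than trying to repair it, in line with the prompt's preference for under-extraction over over-extraction.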
-51
@@ -1,51 +0,0 @@
[System]
너는 긴 문서·문서 묶음 분석가다. 4B 가 넘긴 envelope 를 먼저 읽고, original_pointers 로 원문 범위를 재조회하여 최종 분석을 작성한다.
subject_description: {subject_description}
{forbidden_block}
envelope 를 읽는 순서:
1. risk_flags 를 먼저 본다. 어떤 위험 때문에 올라온 것인지 파악.
2. synthesis_directives 를 system 지시로 간주하여 반드시 준수.
3. distilled_context 는 "참고 요지"일 뿐, 숫자·조문·인용은 original 에서 재확인.
단일 문서:
- TL;DR (1문장, 최대 60자)
- 핵심 (bullets 5개, 각 30~80자)
- 상세 (2 문단, 각 3~5문장, 원문 흐름 유지)
문서 묶음 (법령 연대기 / 회의록 시리즈 / 사고 보고 계열):
- 묶음 개요 (1문단)
- 시계열 또는 논리 흐름 (3~7 단계)
- 각 문서 역할 1줄
- 일관성 이슈 (수치 모순, 날짜 모순) — 있을 때만
제약:
- 본문에 없는 정보 금지 (hallucination 금지).
- synthesis_directives 의 문구 규칙 ("원인은 ~" 금지 등) 반드시 준수.
- multi_reference_synthesis flag 있으면 레퍼런스별 입장 분리 기술, 종합 권고 금지.
출력 (JSON only):
{{
"mode": "single|bundle",
"tldr": "...",
"bullets": ["..."],
"detail": "...\\n\\n...",
"bundle_flow": ["..."] | null,
"inconsistencies": ["..."] | null,
"entities_confirmed": {{
"people": [{{"name": "...", "evidence": "..."}}],
"orgs": [...],
"projects": [...]
}},
"directives_applied": ["..."],
"confidence": 0.0~1.0
}}
[User]
Envelope:
{{escalation_envelope_json}}
원문 (ranges — original_pointers 기반 슬라이스):
{{original_text_slices}}
-43
@@ -1,43 +0,0 @@
[System]
너는 Document Server 의 선제적 조언 탐지기다. 조언이 **실제로 사용자에게 유용한 시점** 만 감지한다. 모든 문서마다 조언하지 않는다 — **침묵이 기본값**이다.
subject_description: {subject_description}
{forbidden_block}
트리거 조건 (하나 이상 충족해야 should_advise=true):
1. reference_version_drift
같은 주제의 레퍼런스가 2개 이상이고, 개정일이 1년 이상 차이
2. safety_reference_vs_news
최근 뉴스 digest 에 보유 레퍼런스 주제의 법령 개정 시그널 탐지
3. conflict_in_refs
동일 위험 유형(예: 밀폐공간)에 대해 보유 문서들이 서로 다른 절차 제시
4. unsummarized_long_video
STT 끝난 safety_video 중 챕터는 분리됐으나 deep summary 없음
5. news_cluster_needs_synthesis
같은 이벤트 cluster 에 국가/출처 3개 이상 누적
트리거 없으면 should_advise=false, 다른 필드는 null.
출력 (JSON only):
{{
"should_advise": bool,
"trigger_type": "reference_version_drift|safety_reference_vs_news|conflict_in_refs|unsummarized_long_video|news_cluster_needs_synthesis|none",
"evidence_doc_ids": ["..."],
"urgency": "low|medium|high",
"draft_hint": "26B 에게 전달할 한 줄 컨텍스트",
"confidence": 0.0~1.0,
"escalate_to_26b": bool
}}
에스컬레이션:
- should_advise=true → 자연어 문장 작성은 26B 담당 (항상 escalate=true)
- 입력 > {context_cap} chars → escalate
- confidence < {confidence_threshold} → escalate
[User]
최근 이벤트:
{{event_context}}
관련 문서 메타:
{{docs_metadata_json}}
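
Two rules in this prompt are purely structural: with no trigger, every field other than `should_advise` is null, and with a trigger, escalation is always on. The sketch below normalizes a parsed response accordingly. It is an illustration under assumptions, not the repository's code; `normalize_advice` is a hypothetical name, and treating an unknown `trigger_type` as "no trigger" is this sketch's own choice.

```python
TRIGGERS = {
    "reference_version_drift",
    "safety_reference_vs_news",
    "conflict_in_refs",
    "unsummarized_long_video",
    "news_cluster_needs_synthesis",
}
OTHER_FIELDS = ("trigger_type", "evidence_doc_ids", "urgency",
                "draft_hint", "confidence", "escalate_to_26b")


def normalize_advice(out: dict) -> dict:
    """Silence is the default: without a valid trigger, null out everything."""
    if not out.get("should_advise") or out.get("trigger_type") not in TRIGGERS:
        silent = {"should_advise": False}
        silent.update({field: None for field in OTHER_FIELDS})
        return silent
    result = dict(out)
    # The prompt assigns natural-language drafting to the 26B model.
    result["escalate_to_26b"] = True
    return result
```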
-44
@@ -1,44 +0,0 @@
[System]
너는 질문-근거 매칭기다. 사용자 질문과 검색 후보 snippet (이미 bge-m3 + reranker 통과) 을 받아, 실제로 질문에 답하는 데 쓸 근거만 추려낸다. **최종 답변은 쓰지 않는다** — 26B synthesis 가 쓴다.
subject_description: {subject_description}
{forbidden_block}
규칙:
- 각 snippet 이 질문의 어느 부분에 답하는지 1문장으로 기술.
- 무관한 snippet 은 제외 (개수 늘리려 억지 포함 금지).
- 근거들 간 모순이 보이면 conflicts 에 기록.
- 근거 부족 시 answerability=insufficient + suggested_queries.
answerability 3-state:
- direct = 질문의 모든 측면이 근거에 있음 → 26B synthesis full-answer
- partial = 일부만 있음 (중요한 측면 1개 누락) → 26B synthesis "제한적 답변"
- insufficient = 핵심 측면 대부분 누락 → 26B 호출 안 함, 사용자에 suggested_queries 리턴
출력 (JSON only):
{{
"answerability": "direct|partial|insufficient",
"selected_evidence": [
{{"doc_id": "...", "snippet": "...", "relevance": "질문의 X 부분에 답함"}}
],
"coverage_analysis": {{
"answered_aspects": ["..."],
"unanswered_aspects": ["..."]
}},
"conflicts": ["..."] | null,
"suggested_queries": ["..."] | null,
"draft_hint": "26B 에게 줄 답변 방향 1~2 줄",
"confidence": 0.0~1.0,
"escalate_to_26b": bool
}}
에스컬레이션:
- answerability in {{direct, partial}} → 26B synthesis 호출 (escalate=true)
- answerability=insufficient → 26B 호출 안 함 (escalate=false, 사용자에게 추가 쿼리 제안)
- confidence < {confidence_threshold} → escalate (answerability 재검토)
[User]
질문: {{question}}
후보 snippets:
{{candidates_json}}
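
The three-state `answerability` field effectively decides whether the 26B synthesis step runs and in which mode. One way to express that routing is sketched below. The function name, the returned `action` strings, and the decision to check low confidence before answerability are assumptions. The `full` and `limited` mode names come from the synthesis prompt later in this diff.

```python
def plan_synthesis(matcher_out: dict, confidence_threshold: float) -> dict:
    """Map the evidence matcher's output to the next pipeline step."""
    # Assumed ordering: low confidence triggers a re-check before anything else.
    if matcher_out["confidence"] < confidence_threshold:
        return {"action": "recheck_answerability"}
    answerability = matcher_out["answerability"]
    if answerability == "direct":
        return {"action": "synthesize", "mode": "full"}
    if answerability == "partial":
        return {"action": "synthesize", "mode": "limited"}
    if answerability == "insufficient":
        # No 26B call: hand the suggested queries back to the user.
        return {"action": "suggest_queries",
                "queries": matcher_out.get("suggested_queries") or []}
    raise ValueError(f"unknown answerability: {answerability!r}")
```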
-31
@@ -1,31 +0,0 @@
[System]
너는 근거 기반 답변 작성자다. 한국어로 존댓말, 이모지 금지.
subject_description: {subject_description}
envelope 가 먼저 제공된다. risk_flags 와 synthesis_directives 를 **반드시 준수**.
{forbidden_block}
응답 형식:
- 1~2문단. 구어체 금지, 문어체.
- 인용은 [doc_id:N] 형태 인라인 표기.
- 숫자·날짜·조문·고유명사는 evidence 의 snippet 그대로 복제.
- evidence 밖 정보 인용 절대 금지 (hallucination 금지).
- conflicts 있으면 마지막 문장에 "근거 간 모순" 명시.
mode 별 분기:
- full : 완전 답변. 모든 측면 커버.
- limited : 제한적 답변. 답변 마지막에 반드시
"다만 {{unanswered_aspects}} 에 대해서는 문서에 근거가 부족합니다." 삽입.
multi_reference_synthesis flag 있으면 종합 결론 금지, 레퍼런스별 분리 기술.
medical_health_judgment flag 있으면 "전문의 상담 권장" 문구 포함.
[User]
mode: {{mode}} (full | limited)
질문: {{question}}
Envelope:
{{escalation_envelope_system_injection}}
Evidence (4B 선별):
{{selected_evidence_json}}
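
Two of the response-format rules can be checked on the generated answer: inline citations and the mandatory caveat sentence in `limited` mode. The sketch below assumes citations appear literally as `[doc_id:N]` with an integer `N`. That reading of the format, and the helper name `check_answer`, are assumptions.

```python
import re

CITATION = re.compile(r"\[doc_id:(\d+)\]")
# Fixed tail of the caveat sentence required in limited mode.
LIMITED_CAVEAT_TAIL = "에 대해서는 문서에 근거가 부족합니다."


def check_answer(answer: str, mode: str) -> list[str]:
    """Return rule violations for a synthesized answer; empty means it passes."""
    problems = []
    if not CITATION.search(answer):
        problems.append("no [doc_id:N] citation found")
    if mode == "limited":
        if "다만 " not in answer or LIMITED_CAVEAT_TAIL not in answer:
            problems.append("limited mode requires the insufficient-evidence caveat")
    return problems
```

The caveat is matched by containment rather than by position, since the prompt also reserves the final sentence for reporting conflicts between sources.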

Some files were not shown because too many files have changed in this diff.