Compare commits

..

18 Commits

Author SHA1 Message Date
hyungi 0cbb188375 feat(ui): DS 웹 UI 세이지 재설계 5페이지 일괄 배포 (대시보드·문서·뉴스·digest·메모)
goal 1~4 + digest. 적대 리뷰(15-agent) 확정 결함 반영본. frontend-only.
원본 커밋 = gitea feat/ui-sage-all a1a46f2. eid 마이그레이션 301~305 워킹트리
격리(미티게이션) 미접촉.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-07 11:12:17 +00:00
hyungi 3a865978af feat(ui): 홈 대시보드 데일리 홈 cockpit 재설계 (안1 골격+안2 위젯+안3 분포)
확정 시안 dashboard-sage-3 의 권장 합성(안1 데일리 홈 골격 + 안2 검토/파이프라인
위젯 + 안3 도메인 분포 한 줄)으로 콘텐츠 재구조화. F1 세이지 테마 위 레이아웃 개편.
- 인사 헤더 + 오늘 요약 띠(검토 대기 + 디제스트 톱 + 스탯 띠)
- 2열: 좌(빠른 캡처·활동 타임라인) / 우(학습·도메인 분포+파이프라인 칩·고정)
- digest/도메인 분포는 기존 엔드포인트 wiring(백엔드 변경 0), 학습 streak는 링크형 degrade

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-07 07:57:59 +00:00
hyungi 4001949d78 fix(study): 복습함 탭 전환 시 선택 초기화 (탭별 독립 선택)
검토 지적: 탭 바꿔도 selected 잔존 → 탭별 독립 선택으로 setTab 에서 selected={} 리셋. (선택 복습은 이미 현재 탭 shown 기준이라 데이터 오염은 없었고 UX 정합 개선.)

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-07 07:22:36 +00:00
hyungi af07fb427b feat(study): 복습함(B4 v1) — 오늘 할 일/미확인 2탭 + 멀티셀렉트 선택 복습
/study/review-box: GET /study-cards/due(review_stage) 를 2탭 분리(오늘 할 일=review_stage 보유 / 미확인=review_stage null 신규). 카드 멀티셀렉트 → pendingReviewCards store 로 cards-study 복습 세션에 선택분 전달(백엔드 세션 X = eid contention 중 fastapi 무재빌드). '이 탭 전체 복습'도. 완료 탭은 졸업카드 엔드포인트 필요라 비활성('추후'). 허브에 복습함 진입 카드.
- 신규 store /stores/studySession.ts(pendingReviewCards). cards-study startReview 가 consume. 전부 frontend-only.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-07 07:18:13 +00:00
hyungi 50b070fcd3 polish(study): 암기카드 학습 문구 다듬기 + '이 카드 이상해요' 버튼 강조
시안 합의본 문구 실제 반영: 탭하면 정답이 보여요 / 봤어요·다음 / 오늘 복습을 마쳤어요 / 애매하거나 몰랐던 카드는 내일 다시 만나요 / 공부로 돌아가기 / 앞—떠올리기 / 평가 sublabel 내일 다시·N일 뒤. 키보드 힌트(Space·Enter)는 sm:inline(데스크탑만). 플래그 버튼=흐린 텍스트→테두리 칩(hover 경고색).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-07 07:07:02 +00:00
hyungi 56d1caf1c8 fix(study): 검수함 카드 마크다운+수식 렌더 — 근거/앞면/정답
cards-review view 모드가 cue/cloze/fact/근거를 평문으로 뿌려 표·**굵게**·수식이 raw 노출. cards-study와 동일하게 renderMathMarkdown(근거 블록)·renderMathMarkdownInline(앞면·정답) 적용. 편집모드 textarea는 raw 유지.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-07 06:43:10 +00:00
hyungi c0772bc21c fix(study): 검수함 each_key_duplicate 크래시 — 자료(수동) 그룹 null 키 중복 해소
manual 카드 그룹은 source_question_id=null 이라 자료가 2개+ 면 {#each ... (g.source_question_id)} 키 중복 → Svelte each_key_duplicate 크래시. 키를 (source_question_id ?? question_text) 고유값으로 변경. 추가로 자료 그룹은 approve-batch 가 source_question_id:int 필수라 422 → 일괄승인 버튼을 question 그룹에만 노출. 개별 승인/수정/삭제는 cardId 기반이라 자료도 정상.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-07 06:39:16 +00:00
hyungi a812d93c64 Merge remote-tracking branch 'origin/feat/study-memo-card-p1' into feat/email-pkm-folder 2026-06-07 06:14:22 +00:00
hyungi e0cd89aef7 Merge remote-tracking branch 'origin/feat/study-memo-card-p1' into feat/email-pkm-folder 2026-06-07 06:07:49 +00:00
hyungi 420ee0fefb Merge remote-tracking branch 'origin/feat/study-memo-card-p1' into feat/email-pkm-folder 2026-06-07 05:47:08 +00:00
hyungi 56df3d3b4f Merge remote-tracking branch 'origin/feat/study-memo-card-p1' into feat/email-pkm-folder 2026-06-07 05:37:29 +00:00
hyungi b9802fcee5 Merge remote-tracking branch 'origin/feat/study-memo-card-p1' into feat/email-pkm-folder 2026-06-07 02:45:15 +00:00
hyungi 4eb01bcd11 Merge remote-tracking branch 'origin/feat/study-memo-card-p1' into feat/email-pkm-folder 2026-06-07 02:38:39 +00:00
hyungi eda1da1507 merge(study): 카드 그룹핑(material) 2026-06-07 00:41:26 +00:00
hyungi b679f3ecb5 merge(study): 암기카드 검수 UI 2026-06-06 23:49:25 +00:00
hyungi 9334c50bd3 merge(study): 암기노트 P1 HR/A (정정훅·needs_review 큐·알람) 2026-06-06 23:09:10 +00:00
hyungi 0bd82a7b0f merge(study): 암기노트 Phase 1 추출 파이프라인 (additive, disjoint) 2026-06-06 12:39:39 +00:00
hyungi 2e5baa8329 feat(email): PKM 폴더 선별 수집 + eml 본문 추출 활성화
mailplus_archive: INBOX 전체 → MAILPLUS_FOLDER(기본 PKM) 선별 수집, eml extract 스킵 가드 제거(검색·색인 편입), 폴더별 UID 상태(job_name=mailplus:<folder>), 폴더 부재 시 no-op. extract_worker: _extract_eml 신설(From/To/Date/Subject 헤더 prepend + 본문 text/plain 우선·html→bs4 fallback, 첨부 extract_meta 인벤토리 scaffold). preview/marker 는 eml 자동 skip 이라 라우팅 무변경. DNS drift(D25, mailplus.hyungi.net→.227) 교정 후 활성. 할일(events) 연계 없음.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-05 15:41:17 +09:00
83 changed files with 510 additions and 2979 deletions
+10 -185
View File
@@ -21,8 +21,8 @@ from fastapi import (
UploadFile,
status,
)
from fastapi.responses import FileResponse, StreamingResponse
from pydantic import BaseModel, field_validator
from fastapi.responses import FileResponse
from pydantic import BaseModel
from sqlalchemy import func, select
from sqlalchemy.ext.asyncio import AsyncSession
from starlette.requests import ClientDisconnect
@@ -30,19 +30,12 @@ from starlette.requests import ClientDisconnect
from ai.client import AIClient, _load_prompt, parse_json_response
from core.auth import get_current_user
from core.config import settings
from core.database import async_session, get_session
from core.database import get_session
from core.utils import file_hash
from models.document import Document
from models.document_image import DocumentImage
from models.queue import ProcessingQueue, enqueue_stage
from models.user import User
from services.dedup import (
DUPLICATE_GROUPS_SQL,
DEDUP_OFF_CHANNELS,
find_canonical_for_hash,
find_near_duplicates,
)
from services.storage import StorageNotConfigured, get_storage_backend
from services.document_telemetry import record_analyze_event, sanitize_source
from services.prompt_versions import ANALYZE_PROMPT_VERSION, resolve_primary_model
from services.search.llm_gate import Priority, acquire_mlx_gate
@@ -69,53 +62,6 @@ def _upload_error(status_code: int, error_code: str, message: str) -> HTTPExcept
)
async def _near_dup_scan_bg(doc_id: int) -> None:
"""B-3: post-upload near_duplicate 스캔 (BackgroundTask). 자체 세션, best-effort.
업로드 직후엔 doc.embedding 이 아직 없을 수 있어(embed stage 미완) trigram 후보만
기록되는 경우가 많다 — non-gating. 어떤 예외도 업로드 결과(201)에 영향 주지 않는다.
영속화는 보류(on-the-fly) — 현재는 로깅까지. /duplicates 의 near-dup 노출은 phase2.
"""
try:
async with async_session() as bg_session:
findings = await find_near_duplicates(bg_session, doc_id)
if findings:
top = findings[0]
logger.info(
"[dedup] near_dup_scan doc=%s candidates=%d top=%s(cosine=%s)",
doc_id, len(findings), top["doc_id"], top.get("cosine"),
)
except Exception:
logger.warning("[dedup] near_dup_scan failed doc=%s", doc_id, exc_info=True)
def _parse_byte_range(range_header: str | None, size: int) -> tuple[int | None, int | None]:
"""HTTP Range 헤더(`bytes=start-end`) 파싱 → (start, end) inclusive. 없거나 무효면 (None, None).
D-2 원격 백엔드 Range pass-through 용 (local 은 FileResponse 가 자동 처리). suffix 형식
(`bytes=-N`) 도 지원. 다중 range 는 첫 구간만.
"""
if not range_header or not range_header.startswith("bytes=") or size <= 0:
return None, None
spec = range_header[len("bytes="):].split(",")[0].strip()
if "-" not in spec:
return None, None
lo, hi = spec.split("-", 1)
try:
if lo == "": # suffix range: 마지막 N 바이트
n = int(hi)
if n <= 0:
return None, None
return max(0, size - n), size - 1
start = int(lo)
end = int(hi) if hi else size - 1
except ValueError:
return None, None
if start > end or start >= size:
return None, None
return start, min(end, size - 1)
# ─── 스키마 ───
@@ -167,10 +113,6 @@ class DocumentResponse(BaseModel):
# 회독 추적 (자료실 등) — 현재 사용자 기준. 다른 endpoint 응답에선 0/None.
read_count: int = 0
last_read_at: datetime | None = None
# S1-ADD (migration 287): 원본 파일명 + 중복검사. 앱은 옵셔널 디코딩, 없으면 폴백.
original_filename: str | None = None # 다운로드 라벨용. 없으면 file_path basename 폴백(앱 측).
duplicate_of: int | None = None # canonical doc id (자기 자신이 canonical 이면 None).
duplicate_count: int = 0 # 본인 제외 동일 판정 사본 수 (canonical 행 기준).
class Config:
from_attributes = True
@@ -198,16 +140,6 @@ class DocumentDetailResponse(DocumentResponse):
md_extraction_engine_version: str | None = None
md_generated_at: datetime | None = None
@field_validator("md_status", mode="before")
@classmethod
def _db_success_to_completed(cls, v: str | None) -> str | None:
"""DB CHECK enum 은 'success'; 계약/fixture·앱 MD-first 렌더 트리거는 'completed'.
read-time(DB→API) 단방향 매핑만 — write 경로(ORM)는 이 모델을 거치지 않아 미적용.
pending/processing/partial/failed/skipped 는 양쪽 동일하므로 'success' 만 매핑한다.
(불변식: md_status ∈ {success,partial} ⟹ md_content 非공백 = 워커 postcondition, C-5.)
"""
return "completed" if v == "success" else v
class AcceptSuggestionRequest(BaseModel):
"""§1 accept-suggestion 요청 body — stale payload / doc 수정 검출."""
@@ -260,11 +192,6 @@ async def get_document_tree(
FROM documents
WHERE ai_domain IS NOT NULL AND ai_domain != '' AND ai_domain != 'News'
AND deleted_at IS NULL
-- 문서함(list) 기본 제외와 동일하게 맞춤: 뉴스/법령 채널·메모는 문서함에 안 뜨므로
-- 트리 카운트도 제외해야 "트리 N건인데 클릭하면 0건" 불일치가 안 생긴다.
AND source_channel != 'news'
AND source_channel != 'law_monitor'
AND file_type != 'note'
GROUP BY ai_domain
ORDER BY ai_domain
""")
@@ -597,53 +524,6 @@ async def list_documents(
)
# ─── 중복검사 (dedup) — B-2 ───
# ★ 고정 path 라우트(/duplicates)는 동적 /{doc_id} 라우트보다 *위*에 등록해야 매칭 충돌이 없다.
class DuplicateGroup(BaseModel):
canonical_id: int
members: list[int]
reason: str
detail: str | None = None
class DuplicatesResponse(BaseModel):
groups: list[DuplicateGroup]
total_groups: int
total_duplicate_docs: int
@router.get("/duplicates", response_model=DuplicatesResponse)
async def list_duplicates(
user: Annotated[User, Depends(get_current_user)],
session: Annotated[AsyncSession, Depends(get_session)],
):
"""content_hash(= file_hash exact) 중복 그룹 목록.
OFF-whitelist(law_monitor) 제외 + deleted 제외. idx_documents_hash 재사용(신규 인덱스/테이블 불요).
near_duplicate(유사도 기반) 그룹은 영속화 보류 → S1 은 exact 그룹만 노출(계약 shape 동일,
detail 문구만 'file_hash' 기준). 응답 shape = ds-app contract `documents_duplicates.json`.
"""
rows = (
await session.execute(DUPLICATE_GROUPS_SQL, {"off_channels": list(DEDUP_OFF_CHANNELS)})
).all()
groups = [
DuplicateGroup(
canonical_id=r.canonical_id,
members=list(r.members),
reason="content_hash",
detail="동일 file_hash (원본 바이트 SHA-256 일치)",
)
for r in rows
]
return DuplicatesResponse(
groups=groups,
total_groups=len(groups),
# 사본 수 = 그룹별 (멤버수-1) 합 (canonical 제외) — fixture total_duplicate_docs 정의와 동일.
total_duplicate_docs=sum(len(g.members) - 1 for g in groups),
)
@router.get("/{doc_id}", response_model=DocumentDetailResponse)
async def get_document(
doc_id: int,
@@ -802,7 +682,6 @@ async def get_document_file(
session: Annotated[AsyncSession, Depends(get_session)],
token: str | None = Query(None, description="Bearer token (iframe용)"),
download: bool = Query(False, description="true면 attachment (브라우저 다운로드)"),
range_header: str | None = Header(None, alias="Range"),
user: User | None = Depends(lambda: None),
):
"""문서 원본 파일 서빙 (Bearer 헤더 또는 ?token= 쿼리 파라미터)"""
@@ -825,10 +704,9 @@ async def get_document_file(
if not doc.file_path:
raise HTTPException(status_code=404, detail="파일이 없는 문서입니다 (메모)")
# D-2: 물리 경로 해석을 storage 백엔드로 단일화. local=FileResponse(Range 자동) /
# 원격=ABC.stream(range). /file URL·바디 shape 불변(non-breaking). 현재 활성 백엔드는
# LocalBackend only 라 동작 변경 0.
backend = get_storage_backend()
file_path = Path(settings.nas_mount_path) / doc.file_path
if not file_path.exists():
raise HTTPException(status_code=404, detail="파일을 찾을 수 없습니다")
# 미디어 타입 매핑
# HTML5 <audio>/<video> 직접 재생을 위해 audio/video mime 포함. Starlette
@@ -849,7 +727,7 @@ async def get_document_file(
# 비디오 — direct play 호환 (§3 최소판)
".mp4": "video/mp4", ".webm": "video/webm",
}
suffix = Path(doc.file_path).suffix.lower()
suffix = file_path.suffix.lower()
media_type = media_types.get(suffix, "application/octet-stream")
# Content-Disposition: download=true면 attachment (한글 filename* 호환)
@@ -861,40 +739,10 @@ async def get_document_file(
else:
disposition = "inline"
# 로컬 백엔드: 기존과 동일하게 FileResponse (Range 자동 처리).
if backend.is_local:
local = backend.local_path(doc.file_path)
if local is None or not Path(local).exists():
raise HTTPException(status_code=404, detail="파일을 찾을 수 없습니다")
return FileResponse(
path=str(local),
media_type=media_type,
headers={"Content-Disposition": disposition},
)
# 원격 백엔드: D-1 ABC 의 Range pass-through. 미프로비전 백엔드는 stat() 가
# StorageNotConfigured → 503 (silent fallback 금지). 현재 LocalBackend only 라 미도달.
try:
st = await backend.stat(doc.file_path)
except StorageNotConfigured as exc:
raise HTTPException(status_code=503, detail=str(exc))
if not st.exists:
raise HTTPException(status_code=404, detail="파일을 찾을 수 없습니다")
start, end = _parse_byte_range(range_header, st.size)
headers = {"Content-Disposition": disposition, "Accept-Ranges": "bytes"}
if start is None:
headers["Content-Length"] = str(st.size)
status_code = 200
else:
headers["Content-Range"] = f"bytes {start}-{end}/{st.size}"
headers["Content-Length"] = str(end - start + 1)
status_code = 206
return StreamingResponse(
backend.stream(doc.file_path, start=start, end=end),
status_code=status_code,
return FileResponse(
path=str(file_path),
media_type=media_type,
headers=headers,
headers={"Content-Disposition": disposition},
)
@@ -955,7 +803,6 @@ async def get_document_image_raw(
async def upload_document(
request: Request,
file: UploadFile,
background_tasks: BackgroundTasks,
user: Annotated[User, Depends(get_current_user)],
session: Annotated[AsyncSession, Depends(get_session)],
doc_purpose: str | None = Form(None, description="business | knowledge"),
@@ -1107,9 +954,6 @@ async def upload_document(
file_size=written,
file_type="immutable",
title=target.stem,
# B-1: 업로드 원본 파일명(다운로드 라벨용). file_path 는 충돌 시 _N 리네임되므로
# 원본명을 별도 보존. safe_name = Path(file.filename).name (경로 이탈 제거된 basename).
original_filename=safe_name,
source_channel="manual",
doc_purpose=doc_purpose,
user_tags=[library_tag] if library_tag else [],
@@ -1120,22 +964,6 @@ async def upload_document(
)
session.add(doc)
await session.flush()
# B-1: file_hash exact 중복 채움 (OFF-whitelist=law_monitor 제외). 거부(409) 아님 —
# 허용 + duplicate_of 링크 + canonical duplicate_count++ (법령 의도적 중복 보존 정책).
# 홈랩 저동시성이라 동시 동일-hash 업로드 TOCTOU 는 멱등/B-4 backfill 로 수습(락 불요).
canonical = await find_canonical_for_hash(session, fhash, exclude_id=doc.id)
if canonical is not None:
# 원래 canonical 이 soft-delete(deleted_at) 되어 former member 가 승격되면, 그 survivor 의
# stale duplicate_of 를 비워 'member 이자 counter' 모순을 막는다(B-4 불변식 유지). 문서는
# soft-delete only 라 FK ON DELETE SET NULL 이 발화하지 않아 잔여가 남기 때문(리뷰 발견).
# (삭제된 canonical 을 가리키는 다른 sibling 멤버의 잔여 포인터·overcount 는 야간
# dedup_reconcile 잡(B-4, 03:30 KST 멱등 절대 재계산)이 정리.)
if canonical.duplicate_of is not None:
canonical.duplicate_of = None
doc.duplicate_of = canonical.id
canonical.duplicate_count = (canonical.duplicate_count or 0) + 1
# document + processing_queue 는 단일 트랜잭션으로 묶어 원자적 정리
await enqueue_stage(session, doc.id, "extract")
await session.commit()
@@ -1145,9 +973,6 @@ async def upload_document(
target.unlink(missing_ok=True)
raise
# B-3: near_duplicate 스캔은 post-upload 비동기 — 201 응답을 막지 않는다(non-gating 기록).
background_tasks.add_task(_near_dup_scan_bg, doc.id)
return DocumentResponse.model_validate(doc)
-4
View File
@@ -48,7 +48,6 @@ async def lifespan(app: FastAPI):
from services.search.query_analyzer import prewarm_analyzer
from workers.briefing_worker import run as morning_briefing_run
from workers.daily_digest import run as daily_digest_run
from workers.dedup_reconcile import run as dedup_reconcile_run
from workers.digest_worker import run as global_digest_run
from workers.file_watcher import watch_inbox
from workers.law_monitor import run as law_monitor_run
@@ -121,9 +120,6 @@ async def lifespan(app: FastAPI):
# 이드 W3-2: 공부중 토픽 약점 derived 스냅샷 (nightly 04:30 KST, LLM 0). study_diagnosis 표면 source.
scheduler.add_job(study_weakness_run, CronTrigger(hour=4, minute=30, timezone=KST), id="study_weakness")
scheduler.add_job(news_collector_run, "interval", hours=6, id="news_collector")
# plan ds-s1-backend-1 B-4: dedup 컬럼(duplicate_of/duplicate_count) 야간 절대 재계산.
# soft-delete 잔여 드리프트 정리(멱등, 드리프트 없으면 no-op). cron 03:30 (다른 잡과 비충돌).
scheduler.add_job(dedup_reconcile_run, CronTrigger(hour=3, minute=30, timezone=KST), id="dedup_reconcile")
scheduler.start()
# Phase 2.1 (async 구조): QueryAnalyzer prewarm.
+1 -14
View File
@@ -3,7 +3,7 @@
from datetime import datetime
from pgvector.sqlalchemy import Vector
from sqlalchemy import BigInteger, Boolean, DateTime, Enum, ForeignKey, Integer, String, Text
from sqlalchemy import BigInteger, Boolean, DateTime, Enum, Integer, String, Text
from sqlalchemy.dialects.postgresql import JSONB
from sqlalchemy.orm import Mapped, mapped_column
@@ -28,19 +28,6 @@ class Document(Base):
)
import_source: Mapped[str | None] = mapped_column(Text)
# 1계층: 원본명 + 중복검사 (S1-ADD, migration 287)
# original_filename = 업로드 원본 파일명(다운로드 라벨용). file_path 는 충돌 시 _N 리네임됨.
# cf. original_format(ODF 변환용) / original_path·original_hash(007 legacy dead) 와 의미 구분.
# duplicate_of = canonical doc id (자기 자신이 canonical 이면 NULL). FK ON DELETE SET NULL.
# duplicate_count = canonical 행에 담는 '본인 제외 동일 판정 사본 수' (group_size-1). 업로드/backfill 가 갱신.
original_filename: Mapped[str | None] = mapped_column(Text)
duplicate_of: Mapped[int | None] = mapped_column(
BigInteger, ForeignKey("documents.id", ondelete="SET NULL")
)
duplicate_count: Mapped[int] = mapped_column(
Integer, nullable=False, default=0, server_default="0"
)
# 2계층: 텍스트 추출
extracted_text: Mapped[str | None] = mapped_column(Text)
extracted_at: Mapped[datetime | None] = mapped_column(DateTime(timezone=True))
-3
View File
@@ -21,6 +21,3 @@ pymupdf>=1.24.0
trafilatura>=1.12.0
readability-lxml>=0.8.1
markdownify>=0.13.1
# office OOXML(docx/xlsx/pptx) → md (plan ds-s1-backend-1 C-1). hwp 는 LibreOffice+markdownify 경로.
# 정확한 핀은 E-1 markitdown OOXML PoC(devsbx/버전핀 컨텍스트)에서 확정.
markitdown[docx,xlsx,pptx]>=0.1.0
-239
View File
@@ -1,239 +0,0 @@
"""중복검사(dedup) 공용 로직 — plan ds-s1-backend-1 B 그룹.
세 소비처가 공유:
- B-1 업로드 채움 (api/documents.upload_document) → find_canonical_for_hash
- B-2 GET /documents/duplicates → DEDUP_OFF_CHANNELS (그룹 SQL 은 라우터에)
- B-4 backfill (scripts/backfill_dedup.py) → DEDUP_OFF_CHANNELS / canonical = min(id)
- B-3 near_duplicate → find_near_duplicates
OFF-whitelist (DEDUP_OFF_CHANNELS):
law_monitor = 법령 개정본을 의도적으로 별 행으로 보존(개정일 추적). file_hash 가 같아도
collapse 하면 개정 이력이 사라지므로 dedup 비참여. (P0-2 실측: dup 18그룹/36행 중
law_monitor 17그룹 = 의도된 개정 보존, manual 1그룹 = 진짜 content dedup.)
file_hash 는 이미 채널별 키를 인코딩(note=본문SHA / devonagent=URL / news=article_id)하므로
채널별 키 분기는 두지 않고 단일 OFF-list 만 데이터로 둔다(P0-2 결정).
near_duplicate (B-3):
title trigram 후보 → 후보에만 doc-level embedding 코사인 rerank. 전수 28.9k 임베딩 스캔 회피.
저장된 embedding read-only(검색실험 Soft Lock: 재생성 금지). 임계·결과는 전부 non-gating 기록값
(trigram-first recall gap = 본문동일·제목상이 near-dup 은 놓침 → phase2 ivfflat 회수 대상).
영속화는 보류(on-the-fly) — S1 은 helper + 호출부 로깅까지. duplicate_of 영속화는 exact(file_hash)만.
"""
from __future__ import annotations
import logging
from sqlalchemy import bindparam, or_, select, text
from sqlalchemy.ext.asyncio import AsyncSession
logger = logging.getLogger(__name__)
# file_hash dedup 제외 채널 (단일 OFF-whitelist). B-1/B-2/B-4 공용.
DEDUP_OFF_CHANNELS: tuple[str, ...] = ("law_monitor",)
# near_duplicate 파라미터 — 전부 기록값·non-gating (phase2 ivfflat 가 recall gap 회수).
NEAR_DUP_TRGM_THRESHOLD = 0.30 # pg_trgm title 후보 컷 (느슨 — 후보 생성용)
NEAR_DUP_COSINE_THRESHOLD = 0.95 # 후보 embedding 코사인 near-dup 판정 컷 (≈0.95~0.97)
NEAR_DUP_MAX_CANDIDATES = 50 # trigram 후보 상한 — 전수 임베딩 스캔 회피
async def find_canonical_for_hash(
session: AsyncSession, file_hash: str, *, exclude_id: int | None = None
):
"""주어진 file_hash 의 canonical 문서(가장 오래된 = min id)를 반환. 없으면 None.
OFF-whitelist 채널(law_monitor)은 canonical 후보에서 제외 → 업로드가 법령 개정본에
링크되지 않는다. exclude_id = 방금 INSERT 한 신규 행 자신 제외(B-1).
"""
from models.document import Document # 지연 import (순환 회피)
stmt = (
select(Document)
.where(
Document.file_hash == file_hash,
Document.deleted_at.is_(None),
or_(
Document.source_channel.is_(None),
Document.source_channel.notin_(DEDUP_OFF_CHANNELS),
),
)
.order_by(Document.id.asc())
)
if exclude_id is not None:
stmt = stmt.where(Document.id != exclude_id)
return (await session.execute(stmt)).scalars().first()
# B-2 /documents/duplicates 의 file_hash 그룹 SQL. 라우터가 직접 execute (Pydantic 응답은 라우터에).
# reason='content_hash' = file_hash exact 그룹(idx_documents_hash 재사용, 신규 인덱스/테이블 불요).
# canonical_id = min(id), members = id 오름차순 배열, n = 그룹 크기.
DUPLICATE_GROUPS_SQL = text(
"""
SELECT file_hash,
min(id) AS canonical_id,
array_agg(id ORDER BY id) AS members,
count(*) AS n
FROM documents
WHERE deleted_at IS NULL
AND file_hash IS NOT NULL
AND (source_channel IS NULL OR source_channel NOT IN :off_channels)
GROUP BY file_hash
HAVING count(*) > 1
ORDER BY min(id)
"""
).bindparams(bindparam("off_channels", expanding=True))
async def reconcile_dedup(
session: AsyncSession, *, apply: bool = True, chunk_size: int = 500, sample_size: int = 40
) -> dict:
"""file_hash exact 그룹의 duplicate_of/duplicate_count 를 재계산해 정합화 (B-4 코어).
멱등 — 목표값과 다른 행만 UPDATE. 야간 잡(workers.dedup_reconcile)과 backfill 스크립트가
공유한다. 문서는 soft-delete only(FK ON DELETE SET NULL 미발화) → 비정규화 dedup 컬럼이
삭제 시 드리프트(멤버의 stale 포인터·canonical overcount)하므로 절대 재계산이 정합 보장.
반환 = {groups, docs, changes, applied, sample}. sample = 적용될/된 변경 미리보기(최대 sample_size).
canonical = 그룹 최古(min id): duplicate_of=NULL, duplicate_count=group_size-1. 멤버: duplicate_of=canonical, count=0.
"""
groups = (
await session.execute(
DUPLICATE_GROUPS_SQL, {"off_channels": list(DEDUP_OFF_CHANNELS)}
)
).all()
desired: dict[int, tuple[int | None, int]] = {}
for g in groups:
members = list(g.members)
canonical = g.canonical_id
desired[canonical] = (None, len(members) - 1)
for m in members:
if m != canonical:
desired[m] = (canonical, 0)
if not desired:
return {"groups": 0, "docs": 0, "changes": 0, "applied": 0, "sample": []}
ids = list(desired.keys())
current: dict[int, tuple[int | None, int]] = {}
for i in range(0, len(ids), 1000):
batch = ids[i : i + 1000]
rows = (
await session.execute(
text(
"SELECT id, duplicate_of, duplicate_count "
"FROM documents WHERE id = ANY(:ids)"
).bindparams(ids=batch)
)
).all()
for r in rows:
current[r.id] = (r.duplicate_of, int(r.duplicate_count or 0))
changes = [
(i, dof, dcnt)
for i, (dof, dcnt) in desired.items()
if current.get(i) != (dof, dcnt)
]
sample = [
{"id": i, "duplicate_of": dof, "duplicate_count": dcnt}
for (i, dof, dcnt) in changes[:sample_size]
]
applied = 0
if apply and changes:
for i in range(0, len(changes), chunk_size):
for did, dof, dcnt in changes[i : i + chunk_size]:
await session.execute(
text(
"UPDATE documents SET duplicate_of = :dof, duplicate_count = :dcnt "
"WHERE id = :id"
).bindparams(dof=dof, dcnt=dcnt, id=did)
)
await session.commit()
applied += len(changes[i : i + chunk_size])
return {
"groups": len(groups),
"docs": len(ids),
"changes": len(changes),
"applied": applied,
"sample": sample,
}
async def find_near_duplicates(
session: AsyncSession,
doc_id: int,
*,
cosine_threshold: float = NEAR_DUP_COSINE_THRESHOLD,
trgm_threshold: float = NEAR_DUP_TRGM_THRESHOLD,
max_candidates: int = NEAR_DUP_MAX_CANDIDATES,
) -> list[dict]:
"""anchor doc 의 near-duplicate 후보를 trigram→embedding 2단계로 찾는다(read-only).
반환 = [{doc_id, title, title_sim?, cosine}] (cosine 내림차순). embedding 미생성 시
(업로드 직후 흔함) trigram 후보만 cosine=None 으로 반환(non-gating 기록). 어떤 행도
수정/삭제하지 않으며 저장된 embedding 만 읽는다(Soft Lock 준수).
"""
anchor = (
await session.execute(
text(
"SELECT id, title, (embedding IS NOT NULL) AS has_emb "
"FROM documents WHERE id = :id AND deleted_at IS NULL"
).bindparams(id=doc_id)
)
).first()
if anchor is None or not anchor.title:
return []
# (1) title trigram 후보. similarity() 컷으로 후보를 max_candidates 로 줄여 전수 임베딩
# 스캔을 회피한다. (index-accelerated `%` 연산자 경로는 후보 생성이 병목이 될 때의
# phase2 최적화 — 짧은 title 28.9k seq 평가는 비동기 post-upload 에서 충분히 저렴.)
cand_rows = (
await session.execute(
text(
"""
SELECT id, title, similarity(title, :t) AS title_sim
FROM documents
WHERE id <> :id
AND deleted_at IS NULL
AND title IS NOT NULL
AND similarity(title, :t) >= :trgm
ORDER BY similarity(title, :t) DESC
LIMIT :lim
"""
).bindparams(id=doc_id, t=anchor.title, trgm=trgm_threshold, lim=max_candidates)
)
).all()
if not cand_rows:
return []
if not anchor.has_emb:
# 임베딩 미생성 — 후보만 기록(cosine rerank 는 embed stage 완료 후). non-gating.
return [
{"doc_id": r.id, "title": r.title, "title_sim": float(r.title_sim), "cosine": None}
for r in cand_rows
]
# (2) 후보에만 doc-level embedding 코사인 rerank. 저장값 read-only.
cand_ids = [r.id for r in cand_rows]
rer = (
await session.execute(
text(
"""
SELECT c.id, c.title,
(1 - (c.embedding <=> (SELECT embedding FROM documents WHERE id = :id))) AS cosine
FROM documents c
WHERE c.id = ANY(:ids) AND c.embedding IS NOT NULL
"""
).bindparams(id=doc_id, ids=cand_ids)
)
).all()
out = [
{"doc_id": r.id, "title": r.title, "cosine": float(r.cosine)}
for r in rer
if r.cosine is not None and float(r.cosine) >= cosine_threshold
]
out.sort(key=lambda x: x["cosine"], reverse=True)
return out
-39
View File
@@ -1,39 +0,0 @@
"""스토리지 계층 추상화 패키지 (plan ds-s1-backend-1 D 그룹, scaffold-first).
활성 백엔드 선택 = get_storage_backend():
- env DS_STORAGE_BACKEND (기본 'local') 로 결정 — config.yaml storage 섹션 편집 없이도
동작(검색실험 Soft Lock 동안 config 불가침). 실 활성(외부 백엔드)은 D-3.
- 'local' → LocalBackend(settings.nas_mount_path) : 현행 NAS NFS, /file 동작 불변.
- 'nas_api'/'nas'→ NasApiBackend(env DS_NAS_API_BASE_URL) : 미프로비전 시 503(silent fallback X).
"""
from __future__ import annotations
import os
from functools import lru_cache
from core.config import settings
from .base import StatResult, StorageBackend, StorageNotConfigured
from .local import LocalBackend
from .nas_api import NasApiBackend
__all__ = [
"StorageBackend",
"StorageNotConfigured",
"StatResult",
"LocalBackend",
"NasApiBackend",
"get_storage_backend",
]
@lru_cache(maxsize=1)
def get_storage_backend() -> StorageBackend:
"""활성 스토리지 백엔드 1개 반환 (프로세스 단위 캐시)."""
backend = os.getenv("DS_STORAGE_BACKEND", "local").lower()
if backend == "local":
return LocalBackend(settings.nas_mount_path)
if backend in ("nas_api", "nas"):
return NasApiBackend(os.getenv("DS_NAS_API_BASE_URL"))
raise StorageNotConfigured(f"unknown DS_STORAGE_BACKEND={backend!r}")
-50
View File
@@ -1,50 +0,0 @@
"""스토리지 백엔드 추상 인터페이스 — plan ds-s1-backend-1 D-1.
ABC 는 첫날부터 Range(offset/length) stream 계약을 포함한다 — D-2 의 원격 streaming
Range pass-through 가 afterthought 가 아니라 인터페이스 의무가 되도록.
is_local=True 백엔드는 로컬 파일시스템 경로를 노출 → 호출부가 Starlette FileResponse
(Range 자동 처리)를 그대로 쓴다. 원격 백엔드는 stream()/stat() 로 Range 를 구현한다.
"""
from __future__ import annotations
import os
from abc import ABC, abstractmethod
from collections.abc import AsyncIterator
from dataclasses import dataclass
class StorageNotConfigured(RuntimeError):
"""활성화되지 않은(미프로비전) 백엔드 호출 — 503 으로 표면화. silent fallback 금지."""
@dataclass
class StatResult:
exists: bool
size: int
class StorageBackend(ABC):
"""원본 파일 접근 추상 인터페이스."""
# 로컬 파일시스템 경로를 노출하는가 (FileResponse 직결 가능 여부).
is_local: bool = False
@abstractmethod
def local_path(self, rel_path: str) -> os.PathLike[str] | None:
"""is_local=True 면 물리 경로 반환(FileResponse 용). 원격 백엔드는 None."""
@abstractmethod
async def stat(self, rel_path: str) -> StatResult:
"""크기/존재 여부. 미구성 백엔드는 StorageNotConfigured raise."""
@abstractmethod
def stream(
self, rel_path: str, *, start: int | None = None, end: int | None = None
) -> AsyncIterator[bytes]:
"""[start, end] 바이트 범위(inclusive)를 async 청크로 yield (Range pass-through).
start/end 가 None 이면 전체. 미구성 백엔드는 StorageNotConfigured raise.
"""
raise NotImplementedError
-50
View File
@@ -1,50 +0,0 @@
"""LocalBackend — 현행 NAS NFS(volume4) 마운트. /file 동작 불변 (plan D-1)."""
from __future__ import annotations
import os
from collections.abc import AsyncIterator
from pathlib import Path
from .base import StatResult, StorageBackend
_STREAM_CHUNK = 256 * 1024
class LocalBackend(StorageBackend):
"""루트(=settings.nas_mount_path) 하위 상대경로를 로컬 파일시스템으로 해석."""
is_local = True
def __init__(self, root: str) -> None:
self._root = Path(root)
def local_path(self, rel_path: str) -> os.PathLike[str]:
return self._root / rel_path
async def stat(self, rel_path: str) -> StatResult:
p = self._root / rel_path
if not p.exists():
return StatResult(exists=False, size=0)
return StatResult(exists=True, size=p.stat().st_size)
async def stream(
self, rel_path: str, *, start: int | None = None, end: int | None = None
) -> AsyncIterator[bytes]:
"""로컬 파일을 청크 stream (Range 지원). /file 의 로컬 경로는 FileResponse 가
Range 를 자동 처리하므로 이 메서드는 인터페이스 대칭/원격 동등성을 위한 구현."""
p = self._root / rel_path
with p.open("rb") as f:
if start:
f.seek(start)
remaining = None if end is None else (end - (start or 0) + 1)
while True:
to_read = _STREAM_CHUNK if remaining is None else min(_STREAM_CHUNK, remaining)
if to_read <= 0:
break
data = f.read(to_read)
if not data:
break
yield data
if remaining is not None:
remaining -= len(data)
-33
View File
@@ -1,33 +0,0 @@
"""NasApiBackend — 외부 스토리지(맥미니4TB / NAS Docker API) stub (plan D-1).
★ 미프로비전 = 503. silent fallback 금지(다른 백엔드로 자동 우회 X). 실 프로비전 후
D-3 에서 활성화. infra_inventory.md 갱신(Update Rule) 이 선행이다.
"""
from __future__ import annotations
import os
from collections.abc import AsyncIterator
from .base import StatResult, StorageBackend, StorageNotConfigured
_MSG = "NasApiBackend 미구성 — 외부 스토리지 프로비전 후 활성(D-3). silent fallback 없음."
class NasApiBackend(StorageBackend):
is_local = False
def __init__(self, base_url: str | None = None) -> None:
self._base_url = base_url
def local_path(self, rel_path: str) -> os.PathLike[str] | None:
return None
async def stat(self, rel_path: str) -> StatResult:
raise StorageNotConfigured(_MSG)
async def stream(
self, rel_path: str, *, start: int | None = None, end: int | None = None
) -> AsyncIterator[bytes]:
raise StorageNotConfigured(_MSG)
yield b"" # 도달 불가 — async generator 형태 유지용(호출부 `async for` 계약 일치).
-32
View File
@@ -1,32 +0,0 @@
"""야간 dedup 컬럼 재계산 잡 (plan ds-s1-backend-1 B-4 '야간 배치').
duplicate_of / duplicate_count 는 비정규화 캐시다. 문서는 soft-delete only(deleted_at)라
FK ON DELETE SET NULL 이 발화하지 않아, canonical/멤버를 soft-delete 하면 잔여 드리프트가
생긴다(멤버의 stale 포인터·canonical overcount). B-1 업로드 채움은 신규 행만 다루므로,
이 야간 절대 재계산이 전체 정합을 보장한다. 멱등 — 드리프트 없으면 no-op(로그만).
응답 계약(DocumentResponse.duplicate_count/duplicate_of)을 앱(S3)이 읽으므로 정합이 중요.
"""
import logging
from core.database import async_session
from services.dedup import reconcile_dedup
logger = logging.getLogger("dedup_reconcile")
async def run() -> None:
try:
async with async_session() as session:
r = await reconcile_dedup(session, apply=True)
if r["changes"]:
logger.info(
"[dedup_reconcile] groups=%s docs=%s changes=%s applied=%s",
r["groups"], r["docs"], r["changes"], r["applied"],
)
else:
logger.info(
"[dedup_reconcile] no drift (groups=%s docs=%s)", r["groups"], r["docs"]
)
except Exception:
logger.exception("[dedup_reconcile] failed")
+95
View File
@@ -1,9 +1,11 @@
"""텍스트 추출 워커 — kordoc / PyMuPDF / Surya OCR / LibreOffice / 직접 읽기 / 웹 HTML"""
import email
import hashlib
import re
import subprocess
from datetime import datetime, timezone
from email.header import decode_header
from pathlib import Path
import httpx
@@ -23,6 +25,8 @@ TEXT_FORMATS = {"md", "txt", "csv", "json", "xml", "html"}
OFFICE_FORMATS = {"xlsx", "xls", "docx", "doc", "pptx", "ppt", "odt", "ods", "odp", "odoc", "osheet"}
# OCR 대상 이미지 포맷
IMAGE_FORMATS = {"jpg", "jpeg", "png", "tiff", "tif", "bmp", "gif", "webp"}
# 이메일 (선별 PKM 폴더 수집 → 헤더+본문 추출)
EML_FORMATS = {"eml"}
EXTRACTOR_VERSION = "kordoc@1.7"
PYMUPDF_VERSION = "pymupdf"
@@ -233,6 +237,90 @@ async def _extract_web_html(doc: Document, html_path: Path) -> None:
)
# ─── 이메일(.eml) 추출 ───
def _decode_eml_header(raw: str) -> str:
"""MIME 인코딩 헤더 디코딩."""
if not raw:
return ""
out = []
for data, charset in decode_header(raw):
if isinstance(data, bytes):
out.append(data.decode(charset or "utf-8", errors="replace"))
else:
out.append(data)
return "".join(out)
async def _extract_eml(doc: Document, eml_path: Path) -> None:
"""이메일(.eml) 본문 추출 — From/To/Date/Subject 헤더 블록 + 본문.
본문은 text/plain 우선, 없으면 text/html → bs4 평문(_extract_web_with_bs4 재사용).
헤더를 본문 머리에 prepend 해 검색·요약이 발신자/제목 맥락을 갖게 함.
첨부는 extract_meta['email_attachments'] 에 인벤토리만 (본문 추출은 후속 — scaffold).
"""
raw = eml_path.read_bytes()
msg = email.message_from_bytes(raw)
hdr_lines = []
for label in ("From", "To", "Date", "Subject"):
val = _decode_eml_header(msg.get(label, ""))
if val:
hdr_lines.append(f"{label}: {val}")
body = ""
html_body = ""
attachments = []
if msg.is_multipart():
for part in msg.walk():
ctype = part.get_content_type()
disp = (part.get("Content-Disposition") or "").lower()
if "attachment" in disp:
payload = part.get_payload(decode=True)
attachments.append({
"filename": _decode_eml_header(part.get_filename() or ""),
"content_type": ctype,
"size": len(payload) if payload else 0,
})
continue
if ctype == "text/plain" and not body:
payload = part.get_payload(decode=True)
if payload is not None:
body = payload.decode(part.get_content_charset() or "utf-8", errors="replace")
elif ctype == "text/html" and not html_body:
payload = part.get_payload(decode=True)
if payload is not None:
html_body = payload.decode(part.get_content_charset() or "utf-8", errors="replace")
else:
payload = msg.get_payload(decode=True)
if payload is not None:
decoded = payload.decode(msg.get_content_charset() or "utf-8", errors="replace")
if msg.get_content_type() == "text/html":
html_body = decoded
else:
body = decoded
# text/plain 없으면 html → bs4 평문 (devonagent 최종 fallback 재사용, 신규 의존성 0)
if not body and html_body:
body, _ = _extract_web_with_bs4(html_body)
if attachments:
names = ", ".join(a["filename"] for a in attachments if a["filename"])
hdr_lines.append(f"Attachments: {len(attachments)}" + (f" ({names})" if names else ""))
header_block = "\n".join(hdr_lines)
full_text = (header_block + "\n\n" + (body or "")).replace("\x00", "").strip()
doc.extracted_text = full_text
doc.extracted_at = datetime.now(timezone.utc)
doc.extractor_version = "eml@stdlib"
if attachments:
meta = dict(doc.extract_meta or {})
meta["email_attachments"] = attachments
doc.extract_meta = meta
logger.info(f"[eml] {doc.file_path} ({len(full_text)}자, 첨부 {len(attachments)})")
# ─── 메인 처리 ───
async def process(document_id: int, session: AsyncSession) -> None:
@@ -257,6 +345,13 @@ async def process(document_id: int, session: AsyncSession) -> None:
await _extract_web_html(doc, full_path)
return
# ─── 이메일 (.eml) — 헤더+본문 추출 (선별 PKM 폴더 수집) ───
if fmt in EML_FORMATS:
if not full_path.exists():
raise FileNotFoundError(f"파일 없음: {full_path}")
await _extract_eml(doc, full_path)
return
# ─── 텍스트 파일 — 직접 읽기 ───
if fmt in TEXT_FORMATS:
if not full_path.exists():
+22 -18
View File
@@ -55,13 +55,20 @@ def _detect_origin(subject: str, body: str) -> str:
return "external"
def _fetch_emails_sync(host: str, port: int, user: str, password: str, last_uid: int | None):
"""동기 IMAP 메일 가져오기 (asyncio.to_thread에서 실행)"""
def _fetch_emails_sync(host: str, port: int, user: str, password: str, last_uid: int | None, folder: str):
"""동기 IMAP 메일 가져오기 (asyncio.to_thread에서 실행).
선별 폴더(MAILPLUS_FOLDER, 기본 'PKM')만 수집 — INBOX 전체 X.
폴더 부재 시 no-op (사용자가 MailPlus 규칙으로 폴더 생성 전까진 안전하게 0건).
"""
results = []
conn = imaplib.IMAP4_SSL(host, port, timeout=30)
try:
conn.login(user, password)
conn.select("INBOX")
typ, _ = conn.select(folder)
if typ != "OK":
logger.info(f"[메일] 폴더 '{folder}' 없음/접근불가 — 수집 건너뜀 (no-op)")
return results
if last_uid:
# 증분 동기화: last_uid 이후
@@ -71,14 +78,13 @@ def _fetch_emails_sync(host: str, port: int, user: str, password: str, last_uid:
since = (datetime.now() - timedelta(days=7)).strftime("%d-%b-%Y")
_, data = conn.uid("search", None, f"SINCE {since}")
uids = data[0].split()
uids = (data[0] or b"").split()
for uid_bytes in uids:
uid = int(uid_bytes)
_, msg_data = conn.uid("fetch", uid_bytes, "(RFC822)")
if msg_data[0] is None:
continue
raw = msg_data[0][1]
results.append((uid, raw))
results.append((uid, msg_data[0][1]))
finally:
conn.logout()
@@ -91,15 +97,18 @@ async def run():
port = int(os.getenv("MAILPLUS_PORT", "993"))
user = os.getenv("MAILPLUS_USER", "")
password = os.getenv("MAILPLUS_PASS", "")
folder = os.getenv("MAILPLUS_FOLDER", "PKM")
if not all([host, user, password]):
logger.warning("MailPlus 인증 정보 미설정")
return
job_name = f"mailplus:{folder}"
async with async_session() as session:
# 마지막 UID 조회
# 마지막 UID 조회 (UID 는 폴더별 네임스페이스 → job_name 에 폴더 포함)
state = await session.execute(
select(AutomationState).where(AutomationState.job_name == "mailplus")
select(AutomationState).where(AutomationState.job_name == job_name)
)
state_row = state.scalar_one_or_none()
last_uid = int(state_row.last_check_value) if state_row and state_row.last_check_value else None
@@ -107,7 +116,7 @@ async def run():
# IMAP 동기 호출을 비동기로 래핑
try:
emails = await asyncio.to_thread(
_fetch_emails_sync, host, port, user, password, last_uid,
_fetch_emails_sync, host, port, user, password, last_uid, folder,
)
except Exception as e:
logger.error(f"IMAP 연결 실패: {e}")
@@ -174,15 +183,10 @@ async def run():
session.add(doc)
await session.flush()
safe_subj = subject.replace("\n", " ").replace("\r", " ")[:200]
# 검색·색인 편입 (extract → classify → embed/chunk). 할일 연계 없음.
await enqueue_stage(session, doc.id, "extract")
# TODO: extract_worker가 eml 본문/첨부 파싱 지원 시 이 조건 제거
if doc.file_format != "eml":
await enqueue_stage(session, doc.id, "extract")
else:
logger.debug(f"[메일] {safe_subj} — eml extract 미지원, 큐 스킵")
archived.append(safe_subj)
archived.append(subject.replace("\n", " ").replace("\r", " ")[:200])
max_uid = max(max_uid, uid)
except Exception as e:
@@ -194,7 +198,7 @@ async def run():
state_row.last_run_at = datetime.now(timezone.utc)
else:
session.add(AutomationState(
job_name="mailplus",
job_name=job_name,
last_check_value=str(max_uid),
last_run_at=datetime.now(timezone.utc),
))
+6 -72
View File
@@ -17,7 +17,6 @@ md_content ref 형식: `![alt](docimg:img_001)` — image_key 가 sequence 기
plan: ~/.claude/plans/piped-humming-crystal.md
"""
import asyncio
import base64
import hashlib
import json
@@ -69,13 +68,9 @@ _FORMAT_TO_MIME = {
"gif": "image/gif",
}
# Phase 1B = PDF only (marker-service). office/hwp 는 C-2 에서 office_md 하이브리드로 분기.
# Phase 1B = PDF only. DOCX 등은 후속 Phase.
SUPPORTED_EXTENSIONS = {".pdf"}
# C-2: office/hwp → md (OOXML=markitdown / hwp=LibreOffice). 변환기가 지원하는 suffix 집합.
# 레거시 바이너리(.doc/.xls/.ppt)는 markitdown 미지원 → 여기 없음(=PDF-only 게이트에서 skip).
from workers.office_md import SUPPORTED as OFFICE_MD_SUPPORTED # noqa: E402
# config.yaml document_types 의 한국어 label 직접 사용 (Pre-flight 결과).
# Round 0 사용자 의도 = 표 중심 발주/계산/명세 도메인.
SKIP_DOC_TYPES = {
@@ -182,18 +177,9 @@ async def process(document_id: int, session: AsyncSession) -> None:
return
container_path = _to_marker_path(doc.file_path)
# ---- (3) PDF only ----
suffix = Path(container_path).suffix.lower()
# ---- (3) office/hwp → md (C-2): PDF 외 지원 포맷은 office_md 하이브리드 변환 ----
if suffix in OFFICE_MD_SUPPORTED:
await session.execute(
update(Document).where(Document.id == document_id).values(md_status="processing")
)
await session.commit()
await _process_office(doc, document_id, container_path, session)
return
# ---- (3.5) PDF only (그 외 확장자 = skip) ----
if suffix not in SUPPORTED_EXTENSIONS:
logger.info(f"markdown_skip_unsupported_extension id={document_id} ext={suffix}")
await _set_skipped(
@@ -382,56 +368,6 @@ async def _process_markdown_passthrough(
)
async def _process_office(
doc: Document, document_id: int, container_path: str, session: AsyncSession
) -> None:
"""office/hwp → md (C-2). C-5 상태머신 postcondition 의 office arm.
office_md.convert_office_to_md 는 이진 계약: 성공=비공백 md 반환 / 실패·빈출력·타임아웃·
의존성부재=OfficeMdError raise. 따라서:
- 성공 → md_status='success' (+ 비공백 md). 불변식 md_status ∈ {success,partial} ⟹ md 非공백 유지.
- 실패/예외 → _fail (md_status='failed', ¬success·¬skipped). silent 'success+빈md' 절대 없음.
partial arm 은 PDF split 전용 — office 는 이진이라 여기 없음. 'completed' 는 A-3 직렬화 전용(워커 미사용).
quality 는 content-type-aware: office=scored(_compute_quality). 동기 변환은 to_thread 로 event loop 비차단.
"""
from workers.office_md import OfficeMdError, convert_office_to_md
is_hwp = Path(container_path).suffix.lower() in (".hwp", ".hwpx")
engine = "libreoffice_hwp" if is_hwp else "markitdown"
try:
# 동기 subprocess(LibreOffice)/markitdown — 스레드로 빼서 이벤트 루프 비차단.
md_content = await asyncio.to_thread(convert_office_to_md, container_path)
except OfficeMdError as exc:
logger.warning(f"[marker] office md 변환 실패 id={document_id} engine={engine}: {exc}")
await _fail(session, document_id, f"office_md: {str(exc)[:990]}", engine=engine)
return
except Exception as exc: # 예기치 못한 예외도 failed (success+빈md 절대 금지)
logger.exception(f"[marker] office md unexpected error id={document_id}: {exc}")
await _fail(session, document_id, f"office_md_unexpected: {str(exc)[:980]}", engine=engine)
return
# 성공 — 계약상 md_content 는 비공백(빈출력은 raise). quality scored.
quality = _compute_quality(md_content, doc.extracted_text or "", {"page_count": None})
await session.execute(
update(Document).where(Document.id == document_id).values(
md_content=md_content,
md_status="success",
md_extraction_engine=engine,
md_extraction_engine_version=None,
md_extraction_quality=quality,
md_content_hash=hashlib.sha256(md_content.encode("utf-8")).hexdigest(),
md_source_hash=doc.file_hash,
md_generated_at=_now(),
md_extraction_error=None,
md_frontmatter=doc.md_frontmatter or {},
md_format_version="1.0",
content_origin="extracted",
)
)
await session.commit()
logger.info(f"[marker] office success id={document_id} engine={engine} len={len(md_content)}")
async def _process_split(
doc: Document,
document_id: int,
@@ -843,17 +779,15 @@ async def _set_skipped(session: AsyncSession, document_id: int, reason: str) ->
await session.commit()
async def _fail(
session: AsyncSession, document_id: int, error: str, *, engine: str = "marker"
) -> None:
"""doc-level failed (재시도 무의미). engine = 실패한 변환 엔진(office=markitdown/libreoffice_hwp)."""
async def _fail(session: AsyncSession, document_id: int, error: str) -> None:
"""doc-level failed (재시도 무의미)."""
await session.execute(
update(Document).where(Document.id == document_id).values(
md_status="failed",
md_content=None,
md_content_hash=None,
md_extraction_error=error,
md_extraction_engine=engine,
md_extraction_engine="marker",
md_generated_at=_now(),
content_origin="extracted",
)
-136
View File
@@ -1,136 +0,0 @@
"""office/hwp → Markdown 하이브리드 변환기 (plan ds-s1-backend-1, C-1 PoC).
★ PoC 상태 — marker_worker 에 아직 연결하지 않음(그건 C-2). 본 모듈은 변환 *계약*과
PoC 하니스(scripts/poc_office_md.py)가 호출하는 순수 함수만 제공한다.
전략 (하이브리드):
- OOXML(.docx/.xlsx/.pptx) → markitdown ← 신규 의존성(pip install markitdown). lazy import.
- .hwp/.hwpx → LibreOffice(headless) → HTML → markdownify ← markdownify 기존 의존성.
(LibreOffice 가 hwp import 필터 보유. .hwpx 는 .hwp 와 다른 필터·버전 의존 → E-1: prod LibreOffice
버전핀 안전컨텍스트에서 PoC 실행. 표 fidelity 가 진짜 리스크 — 하니스가 측정.)
실패 계약 (C-5 postcondition 의 backend 절반):
변환 실패·빈 출력·타임아웃·의존성 부재 → OfficeMdError 를 raise 한다.
**success + 빈 md 를 절대 반환하지 않는다** — 호출부(C-2 marker_worker)가 이를 잡아
md_status='failed'(¬success·¬skipped) 로 라우팅한다. 불변식: md_status ∈ {success,partial} ⟹ md_content 非공백.
"""
from __future__ import annotations
import os
import shutil
import subprocess
import tempfile
from pathlib import Path
OOXML_FORMATS = {".docx", ".xlsx", ".pptx"}
HWP_FORMATS = {".hwp", ".hwpx"}
SUPPORTED = OOXML_FORMATS | HWP_FORMATS
# 빈 출력 판정 임계 — 공백 제거 후 이 미만이면 '실패(빈 변환)'로 본다.
_MIN_BODY_CHARS = 16
# extract_worker.py 가 이미 `libreoffice` 바이너리로 office 텍스트 추출에 성공(컨테이너 검증된
# 이름) → 기본값 정합. soffice 만 있는 환경은 LIBREOFFICE_BIN 으로 override.
_SOFFICE_BIN = os.environ.get("LIBREOFFICE_BIN", "libreoffice")
class OfficeMdError(Exception):
"""office/hwp → md 변환 실패 신호. 호출부는 md_status='failed' 로 라우팅."""
def convert_office_to_md(path: str | Path, *, timeout: int = 90) -> str:
"""office/hwp 파일을 Markdown 문자열로 변환. 실패/빈출력 시 OfficeMdError raise."""
p = Path(path)
suffix = p.suffix.lower()
if suffix not in SUPPORTED:
raise OfficeMdError(f"unsupported suffix for office_md: {suffix!r}")
if not p.exists():
raise OfficeMdError(f"file not found: {p}")
if suffix in OOXML_FORMATS:
md = _via_markitdown(p)
else: # .hwp / .hwpx
md = _via_libreoffice_html(p, timeout=timeout)
md = (md or "").strip()
if len(md) < _MIN_BODY_CHARS:
raise OfficeMdError(f"empty/too-short conversion ({len(md)} chars) for {p.name}")
return md
def _via_markitdown(path: Path) -> str:
try:
from markitdown import MarkItDown # lazy — 신규 의존성
except ImportError as e: # noqa: BLE001
raise OfficeMdError(
"markitdown 미설치 (OOXML 변환에 필요) — `pip install markitdown`. "
"C-1 PoC 는 prod worker 이미지/버전핀 컨텍스트에서 실행(E-1)."
) from e
try:
result = MarkItDown().convert(str(path))
except Exception as e: # noqa: BLE001 — 어떤 변환 예외든 failed 로 라우팅
raise OfficeMdError(f"markitdown 변환 실패: {path.name}: {e}") from e
return getattr(result, "text_content", "") or ""
def _via_libreoffice_html(path: Path, *, timeout: int) -> str:
"""LibreOffice headless 로 HTML 변환 후 markdownify. hwp/hwpx 용."""
try:
from markdownify import markdownify # 기존 의존성
except ImportError as e: # noqa: BLE001
raise OfficeMdError("markdownify 미설치(기존 의존성이어야 함)") from e
with tempfile.TemporaryDirectory(prefix="office_md_") as tmp:
tmpdir = Path(tmp)
# soffice 동시 실행 시 user profile 락 충돌 회피 — 호출별 격리 프로필.
profile = tmpdir / "lo_profile"
cmd = [
_SOFFICE_BIN,
"--headless",
"--nologo",
"--nofirststartwizard",
f"-env:UserInstallation=file://{profile}",
"--convert-to",
"html",
"--outdir",
str(tmpdir),
str(path),
]
try:
proc = subprocess.run(
cmd, capture_output=True, text=True, timeout=timeout, check=False
)
except FileNotFoundError as e:
raise OfficeMdError(
f"LibreOffice 바이너리 부재({_SOFFICE_BIN}) — LIBREOFFICE_BIN 설정 또는 설치 필요"
) from e
except subprocess.TimeoutExpired as e:
raise OfficeMdError(f"LibreOffice 변환 타임아웃({timeout}s): {path.name}") from e
html_path = tmpdir / f"{path.stem}.html"
if proc.returncode != 0 or not html_path.exists():
raise OfficeMdError(
f"LibreOffice html 변환 실패: {path.name} (rc={proc.returncode}): "
f"{(proc.stderr or proc.stdout or '').strip()[:300]}"
)
html = html_path.read_text(encoding="utf-8", errors="replace")
# 표 보존 위해 markdownify 가 table 을 GFM 으로 — heading_style ATX.
return markdownify(html, heading_style="ATX", strip=["span", "font"])
def table_fidelity(md: str) -> dict:
"""E-1 표 fidelity 의 crude 지표 — GFM 표 행/구분행 카운트 (정밀 평가 아님, 회귀 신호)."""
lines = md.splitlines()
pipe_rows = sum(1 for ln in lines if ln.strip().startswith("|") and ln.strip().endswith("|"))
sep_rows = sum(
1 for ln in lines
if ln.strip().startswith("|") and set(ln.strip()) <= set("|-: ")
)
return {
"chars": len(md),
"lines": len(lines),
"table_pipe_rows": pipe_rows,
"table_separator_rows": sep_rows, # 표 개수의 근사
"has_heading": any(ln.lstrip().startswith("#") for ln in lines),
}
+4 -20
View File
@@ -1,32 +1,16 @@
import SwiftUI
import AppFeature
/// Thin @main entry: window + DI only. = (GPU DS) (AppModel.live
/// LiveDSClient + AIFabric , base publicTLS = https://document.hyungi.net/api).
/// env : DSAPP_FIXTURE=1 (Fixture+Mock) / DSAPP_DS_URL base
/// (: http://100.110.63.63:8000/api). Feature logic lives in AppFeature, keeping the seam to a
/// future iPhone/Watch target trivial.
/// Thin @main entry: window + DI only. Injects AppModel (FixtureDSClient + AIRouter(MockAIProvider))
/// so the whole pipeline renders with zero real backend / zero real LLM. Feature logic lives in
/// AppFeature, keeping the seam to a future Xcode/iPhone target trivial.
@main
struct DSApp: App {
@State private var model: AppModel
@MainActor
init() {
let env = ProcessInfo.processInfo.environment
let initial: AppModel
if env["DSAPP_FIXTURE"] == "1" {
initial = .preview
} else if let raw = env["DSAPP_DS_URL"] {
// dev prod(publicTLS) silent fallback , .
let trimmed = raw.hasSuffix("/") ? String(raw.dropLast()) : raw
guard let url = URL(string: trimmed), url.scheme != nil, url.host() != nil else {
fatalError("DSAPP_DS_URL 파싱 실패: \(raw)")
}
initial = .live(base: .custom(url))
} else {
initial = .live()
}
_model = State(initialValue: initial)
_model = State(initialValue: AppModel.preview)
}
var body: some Scene {
-5
View File
@@ -47,11 +47,6 @@ let package = Package(
dependencies: ["DSKit", "AIFabric"],
swiftSettings: [.swiftLanguageMode(.v6)]
),
.testTarget(
name: "AppFeatureTests",
dependencies: ["AppFeature", "DSKit"],
swiftSettings: [.swiftLanguageMode(.v6)]
),
.testTarget(
name: "AITests",
dependencies: ["AIFabric"],
@@ -11,14 +11,15 @@ struct DashboardView: View {
if let s = model.stats {
LazyVGrid(columns: [GridItem(.adaptive(minimum: 150), spacing: 12)], spacing: 12) {
StatCard(title: "전체", value: s.total, color: Sage.brand)
StatCard(title: "문서", value: s.counts["document"] ?? 0, color: Sage.brand)
StatCard(title: "승인 대기", value: s.libraryPendingSuggestions, color: Sage.amber)
StatCard(title: "문서", value: s.documents, color: Sage.brand)
StatCard(title: "검토 대기", value: s.reviewPending, color: Sage.amber)
StatCard(title: "파이프라인 실패", value: s.pipelineFailed, color: Sage.danger)
}
VStack(alignment: .leading, spacing: 10) {
Text("카테고리 분포").font(.headline).foregroundStyle(Sage.ink)
ForEach(s.counts.sorted { $0.value > $1.value }, id: \.key) { key, value in
DomainBar(name: Self.categoryLabel(key), count: value, max: s.counts.values.max() ?? 1)
Text("도메인 분포").font(.headline).foregroundStyle(Sage.ink)
ForEach(s.byDomain.sorted { $0.value > $1.value }, id: \.key) { key, value in
DomainBar(name: key, count: value, max: s.byDomain.values.max() ?? 1)
.contentShape(Rectangle())
.onTapGesture { model.section = .documents }
}
@@ -34,18 +35,4 @@ struct DashboardView: View {
}
.background(Sage.surface)
}
/// category enum ( raw ).
static func categoryLabel(_ key: String) -> String {
switch key {
case "document": return "문서"
case "library": return "자료실"
case "news": return "뉴스"
case "law": return "법령"
case "memo": return "메모"
case "audio": return "오디오"
case "video": return "비디오"
default: return key
}
}
}
@@ -1,97 +0,0 @@
import SwiftUI
import DSKit
/// (GPU DS) ( FU-E ).
/// refresh / ; HttpOnly refresh
/// . TOTP ( ).
public struct LoginView: View {
@Environment(AppModel.self) private var model
@State private var username = ""
@State private var password = ""
@State private var totp = ""
@State private var submitting = false
@FocusState private var focus: Field?
private enum Field { case username, password, totp }
public init() {}
public var body: some View {
VStack(spacing: 0) {
Spacer()
VStack(alignment: .leading, spacing: 16) {
VStack(alignment: .leading, spacing: 4) {
Text("Document Server")
.font(.title2.weight(.semibold))
.foregroundStyle(Sage.ink)
Text(serverHost)
.font(.caption)
.foregroundStyle(Sage.muted)
}
VStack(spacing: 10) {
TextField("아이디", text: $username)
.textFieldStyle(.roundedBorder)
.focused($focus, equals: .username)
.onSubmit { focus = .password }
SecureField("비밀번호", text: $password)
.textFieldStyle(.roundedBorder)
.focused($focus, equals: .password)
.onSubmit { submit() }
TextField("2FA 코드 (설정한 경우)", text: $totp)
.textFieldStyle(.roundedBorder)
.focused($focus, equals: .totp)
.onSubmit { submit() }
}
if let error = model.loginError {
Text(error)
.font(.callout)
.foregroundStyle(Sage.danger)
.fixedSize(horizontal: false, vertical: true)
}
Button(action: submit) {
Group {
if submitting {
ProgressView().controlSize(.small)
} else {
Text("로그인")
}
}
.frame(maxWidth: .infinity)
}
.buttonStyle(.borderedProminent)
.tint(Sage.brand)
.disabled(submitting || username.isEmpty || password.isEmpty)
}
.padding(28)
.frame(width: 360)
.background(Sage.card, in: RoundedRectangle(cornerRadius: 12))
.overlay(RoundedRectangle(cornerRadius: 12).stroke(Sage.line))
Spacer()
}
.frame(maxWidth: .infinity, maxHeight: .infinity)
.background(Sage.surface)
.onAppear { focus = .username }
}
/// base host (: document.hyungi.net / 100.110.63.63).
private var serverHost: String {
model.base.url.host() ?? model.base.url.absoluteString
}
private func submit() {
guard !submitting, !username.isEmpty, !password.isEmpty else { return }
submitting = true
Task {
await model.login(username: username, password: password, totp: totp)
submitting = false
}
}
}
#if DEBUG
#Preview("로그인") {
@Previewable @State var model = AppModel.preview
LoginView()
.environment(model)
.frame(width: 700, height: 500)
}
#endif
@@ -3,7 +3,6 @@ import DSKit
/// DEVONthink-style 3-column shell. RootView only ROUTES; each page owns its own interior treatment
/// (no shell-level auto-inherit). macOS-only target.
/// : checking( refresh ) loggedOut(LoginView) ready(3-pane ).
public struct RootView: View {
@Environment(AppModel.self) private var model
@State private var columnVisibility: NavigationSplitViewVisibility = .all
@@ -11,22 +10,6 @@ public struct RootView: View {
public init() {}
public var body: some View {
Group {
switch model.authPhase {
case .checking:
ProgressView("서버 연결 확인 중")
.frame(maxWidth: .infinity, maxHeight: .infinity)
.background(Sage.surface)
case .loggedOut:
LoginView()
case .ready:
shell
}
}
.task { await model.bootstrap() }
}
private var shell: some View {
NavigationSplitView(columnVisibility: $columnVisibility) {
Sidebar()
.navigationSplitViewColumnWidth(min: 220, ideal: 250)
@@ -38,24 +21,7 @@ public struct RootView: View {
}
.navigationSplitViewStyle(.balanced)
.tint(Sage.brand)
.safeAreaInset(edge: .bottom) {
// (no-silent-fallback) .
if let err = model.errorText {
HStack(spacing: 10) {
Text(err)
.font(.callout)
.foregroundStyle(.white)
.lineLimit(2)
Spacer()
Button("닫기") { model.errorText = nil }
.buttonStyle(.plain)
.foregroundStyle(.white.opacity(0.85))
}
.padding(.horizontal, 14)
.padding(.vertical, 8)
.background(Sage.danger)
}
}
.task { await model.loadInitial() }
}
}
@@ -154,6 +120,7 @@ struct EmptyState: View {
@Previewable @State var model = AppModel.preview
RootView()
.environment(model)
.task { await model.loadInitial() }
.frame(minWidth: 1000, minHeight: 660)
}
#endif
@@ -23,10 +23,6 @@ public final class AppModel {
}
}
/// : refresh (checking) (loggedOut)
/// (ready). Fixture refresh fixture ready.
public enum AuthPhase: Equatable { case checking, loggedOut, ready }
public var section: Section = .dashboard
public var selectedDocumentID: Int?
public var selectedMemoID: Int?
@@ -45,23 +41,14 @@ public final class AppModel {
public var digest: DigestResponse?
public var errorText: String?
public private(set) var authPhase: AuthPhase = .checking
/// ( ).
public var loginError: String?
/// bootstrap single-shot ( ).
private var didBootstrap = false
let client: any DSClient
let ai: AIService
/// DS base URL (live()/preview ).
let base: DSBaseURL
/// access ( ?token= ). bootstrap/login .
/// Placeholder token from the auth fixture builds a real-SHAPED download URL with no expectation it resolves offline.
public private(set) var accessToken: String = ""
public init(client: any DSClient, ai: AIService, base: DSBaseURL = .publicTLS) {
public init(client: any DSClient, ai: AIService) {
self.client = client
self.ai = ai
self.base = base
}
@MainActor
@@ -69,66 +56,8 @@ public final class AppModel {
AppModel(client: FixtureDSClient(), ai: AIService(router: AppAIComposition.mockRouter()))
}
/// (GPU DS) : LiveDSClient + AIFabric (realRouter). ask closure
/// client TokenProvider (401 refresh ). = InMemory
/// access 15 , HttpOnly refresh (7,
/// HTTPCookieStorage ) . Keychain .
@MainActor
public static func live(
base: DSBaseURL = .publicTLS,
persistence: TokenPersistence = InMemoryTokenStore()
) -> AppModel {
let client = LiveDSClient(base: base, persistence: persistence)
let router = AppAIComposition.realRouter(base: base) { await client.currentAccessToken() }
return AppModel(client: client, ai: AIService(router: router), base: base)
}
/// 1 (single-shot / .task ):
/// refresh . 401( /) = loggedOut( ) /
/// ( ) = loggedOut + loginError (no-silent-fallback) /
/// task ( ) = appear .
public func bootstrap() async {
guard !didBootstrap else { return }
didBootstrap = true
// authPhase .checking ready UI .
do {
let token = try await client.refresh().accessToken
accessToken = token
authPhase = .ready
await loadInitial()
} catch let e as DSError where e.isAuthExpired {
authPhase = .loggedOut
} catch {
if Task.isCancelled {
didBootstrap = false
return
}
authPhase = .loggedOut
loginError = (error as? LocalizedError)?.errorDescription ?? "\(error)"
}
}
/// (POST /auth/login JWT). totp / .
public func login(username: String, password: String, totp: String?) async {
loginError = nil
do {
let code = totp.map {
$0.replacingOccurrences(of: " ", with: "").trimmingCharacters(in: .whitespacesAndNewlines)
}
let response = try await client.login(
username: username,
password: password,
totpCode: (code?.isEmpty ?? true) ? nil : code
)
accessToken = response.accessToken
authPhase = .ready
await loadInitial()
} catch {
loginError = (error as? LocalizedError)?.errorDescription ?? "\(error)"
}
}
public func loadInitial() async {
await guarded { self.accessToken = (try? await self.client.login(username: "hyungi", password: "x", totpCode: nil).accessToken) ?? "" }
await guarded { self.tree = try await self.client.documentTree() }
await guarded { self.stats = try await self.client.categoryCounts() }
await guarded { self.documentList = try await self.client.documents(DocumentListQuery()).items }
@@ -159,26 +88,11 @@ public final class AppModel {
public func downloadURL(for doc: DocumentResponse) -> URL? {
guard doc.hasDownloadableOriginal, !accessToken.isEmpty else { return nil }
return DSDownload.fileURL(base: base, documentID: doc.id, accessToken: accessToken)
return DSDownload.fileURL(base: .publicTLS, documentID: doc.id, accessToken: accessToken)
}
private func guarded(_ work: () async throws -> Void) async {
do {
try await work()
} catch let e as DSError where e.isAuthExpired {
// LiveDSClient refresh+ (refresh /) .
authPhase = .loggedOut
loginError = "세션이 만료되었습니다. 다시 로그인하세요."
} catch {
errorText = (error as? LocalizedError)?.errorDescription ?? "\(error)"
}
await syncAccessToken()
}
/// 401 (LiveDSClient refresh) ?token= guarded
/// . = TokenProvider.
private func syncAccessToken() async {
guard let live = client as? LiveDSClient, let t = await live.currentAccessToken() else { return }
if t != accessToken { accessToken = t }
do { try await work() }
catch { errorText = (error as? LocalizedError)?.errorDescription ?? "\(error)" }
}
}
+4 -4
View File
@@ -1,8 +1,8 @@
import Foundation
/// Injectable base URL. Public TLS by default; Tailscale alternative = GPU canonical
/// Tailscale IP (infra_inventory.md , 2026-06-07 DS = GPU ,
/// contract/CONTRACT.md·CompositionTests ).
/// Injectable base URL. Public TLS by default; Tailscale alternative uses a MagicDNS hostname
/// (NOT a hardcoded 100.x IP, which changes on node re-registration). Scaffold never makes a live
/// call, so the Tailscale host is a placeholder until FU-A.
public enum DSBaseURL: Sendable {
case publicTLS
case tailscale
@@ -11,7 +11,7 @@ public enum DSBaseURL: Sendable {
public var url: URL {
switch self {
case .publicTLS: return URL(string: "https://document.hyungi.net/api")!
case .tailscale: return URL(string: "http://100.110.63.63:8000/api")!
case .tailscale: return URL(string: "http://ds-gpu.tailnet-name.ts.net:8000/api")!
case .custom(let u): return u
}
}
@@ -39,9 +39,6 @@ public final class LiveDSClient: DSClient, @unchecked Sendable {
public func setAccessToken(_ token: String) async { await tokens.set(token) }
/// realRouter ask closure TokenProvider (401 refresh ).
public func currentAccessToken() async -> String? { await tokens.current() }
// MARK: - Request building / sending
private func makeRequest(_ endpoint: DSEndpoint, token: String?) throws -> URLRequest {
@@ -12,21 +12,19 @@ public struct DomainTreeNode: Codable, Sendable, Identifiable {
public var kids: [DomainTreeNode] { children ?? [] }
}
/// GET /documents/stats/category-counts Pydantic response model raw dict
/// . shape (total/by_domain/...) decode
/// 2026-06-07 (fixture documents_stats.json = CAPTURED_LIVE).
public struct CategoryCounts: Codable, Sendable {
/// category(enum) : document/library/news/law/memo/audio.
public let counts: [String: Int]
public let libraryPendingSuggestions: Int
public let total: Int
public let documents: Int
public let byDomain: [String: Int]
public let reviewPending: Int
public let pipelineFailed: Int
enum CodingKeys: String, CodingKey {
case counts
case libraryPendingSuggestions = "library_pending_suggestions"
case total, documents
case byDomain = "by_domain"
case reviewPending = "review_pending"
case pipelineFailed = "pipeline_failed"
}
/// (counts ) .
public var total: Int { counts.values.reduce(0, +) }
}
public struct DuplicateGroup: Codable, Sendable, Identifiable {
@@ -1,11 +1,14 @@
{
"counts": {
"library": 391,
"law": 229,
"document": 381,
"news": 6182,
"memo": 4,
"audio": 2
"total": 1163,
"documents": 783,
"by_domain": {
"Industrial_Safety": 426,
"Engineering": 351,
"General": 189,
"Programming": 60,
"법령": 23,
"Philosophy": 12
},
"library_pending_suggestions": 0
"review_pending": 725,
"pipeline_failed": 19
}
@@ -1,182 +0,0 @@
import XCTest
@testable import AppFeature
import DSKit
/// 0 (Fixture/stub ).
/// bootstrap: refresh =ready / =loggedOut. login: =ready+ / 401= .
final class AppModelAuthTests: XCTestCase {
@MainActor
private func makeModel(client: any DSClient) -> AppModel {
AppModel(client: client, ai: AIService(router: AppAIComposition.mockRouter()))
}
// refresh ( Fixture fixture ) ready +
@MainActor
func testBootstrapRefreshSuccessGoesReady() async {
let model = AppModel.preview
await model.bootstrap()
XCTAssertEqual(model.authPhase, .ready)
XCTAssertFalse(model.accessToken.isEmpty)
XCTAssertFalse(model.documentList.isEmpty, "ready 진입 시 초기 로드까지 수행해야 함")
}
// refresh ( /) loggedOut,
@MainActor
func testBootstrapRefreshFailureGoesLoggedOut() async {
let model = makeModel(client: AuthStubClient(refreshFails: true))
await model.bootstrap()
XCTAssertEqual(model.authPhase, .loggedOut)
XCTAssertTrue(model.accessToken.isEmpty)
XCTAssertTrue(model.documentList.isEmpty)
}
// loggedOut login ready +
@MainActor
func testLoginSuccessTransitionsToReady() async {
let model = makeModel(client: AuthStubClient(refreshFails: true))
await model.bootstrap()
XCTAssertEqual(model.authPhase, .loggedOut)
await model.login(username: "hyungi", password: "pw", totp: nil)
XCTAssertEqual(model.authPhase, .ready)
XCTAssertFalse(model.accessToken.isEmpty)
XCTAssertNil(model.loginError)
XCTAssertFalse(model.documentList.isEmpty)
}
// login 401 loginError + loggedOut +
@MainActor
func testLoginFailureSurfacesErrorAndStaysLoggedOut() async {
let model = makeModel(client: AuthStubClient(refreshFails: true, loginFails: true))
await model.bootstrap()
await model.login(username: "hyungi", password: "wrong", totp: nil)
XCTAssertEqual(model.authPhase, .loggedOut)
XCTAssertNotNil(model.loginError)
XCTAssertTrue(model.accessToken.isEmpty)
}
// totp / totpCode nil ( totp )
@MainActor
func testLoginSendsNilForBlankTotp() async {
let stub = AuthStubClient(refreshFails: true)
let model = makeModel(client: stub)
await model.login(username: "u", password: "p", totp: " ")
XCTAssertNotNil(stub.recordedLogin, "login 이 호출돼야 함")
XCTAssertNil(stub.recordedLogin?.totp, "공백 totp 는 nil 로 정규화")
await model.login(username: "u", password: "p", totp: "123456")
XCTAssertEqual(stub.recordedLogin?.totp, "123456")
}
// totp (/ ) "123 456\n" "123456"
@MainActor
func testLoginNormalizesTotpNewlineAndSpaces() async {
let stub = AuthStubClient(refreshFails: true)
let model = makeModel(client: stub)
await model.login(username: "u", password: "p", totp: "123 456\n")
XCTAssertEqual(stub.recordedLogin?.totp, "123456")
await model.login(username: "u", password: "p", totp: " \n ")
XCTAssertNil(stub.recordedLogin?.totp, "개행+공백뿐이면 nil")
}
// bootstrap single-shot (.task ) refresh 1, ready
@MainActor
func testBootstrapIsSingleShot() async {
let stub = AuthStubClient()
let model = makeModel(client: stub)
await model.bootstrap()
XCTAssertEqual(model.authPhase, .ready)
await model.bootstrap() // appear
XCTAssertEqual(model.authPhase, .ready, "재진입이 checking 으로 리셋하면 안 됨")
XCTAssertEqual(stub.refreshCount, 1, "refresh 는 1회만")
}
// bootstrap transport ( ) loggedOut + ( )
@MainActor
func testBootstrapTransportFailureExposesReason() async {
let model = makeModel(client: AuthStubClient(refreshTransportFails: true))
await model.bootstrap()
XCTAssertEqual(model.authPhase, .loggedOut)
XCTAssertNotNil(model.loginError, "transport 실패 사유가 로그인 화면에 노출돼야 함")
}
// ( refresh+ ) ready loggedOut
@MainActor
func testAuthExpiredDuringUseDemotesToLoggedOut() async {
let stub = AuthStubClient()
let model = makeModel(client: stub)
await model.bootstrap()
XCTAssertEqual(model.authPhase, .ready)
stub.dataAuthExpired = true // 401 (refresh )
await model.openDocument(1)
XCTAssertEqual(model.authPhase, .loggedOut)
XCTAssertNotNil(model.loginError)
}
// live : LiveDSClient + base ( )
@MainActor
func testLiveFactoryComposition() {
let model = AppModel.live(base: .tailscale)
XCTAssertTrue(model.client is LiveDSClient)
XCTAssertEqual(model.base.url.absoluteString, DSBaseURL.tailscale.url.absoluteString)
}
}
/// FixtureDSClient + ( 0).
/// task @unchecked Sendable .
final class AuthStubClient: DSClient, @unchecked Sendable {
private let inner = FixtureDSClient()
private let refreshFails: Bool
private let refreshTransportFails: Bool
private let loginFails: Bool
private(set) var recordedLogin: (username: String, totp: String?)?
private(set) var refreshCount = 0
/// true 401 ( LiveDSClient )
var dataAuthExpired = false
init(refreshFails: Bool = false, refreshTransportFails: Bool = false, loginFails: Bool = false) {
self.refreshFails = refreshFails
self.refreshTransportFails = refreshTransportFails
self.loginFails = loginFails
}
private func gateData() throws {
if dataAuthExpired { throw DSError.unauthorized(message: nil) }
}
// Auth
func login(username: String, password: String, totpCode: String?) async throws -> AccessTokenResponse {
recordedLogin = (username, totpCode)
if loginFails { throw DSError.unauthorized(message: "아이디 또는 비밀번호가 올바르지 않습니다") }
return try await inner.login(username: username, password: password, totpCode: totpCode)
}
func refresh() async throws -> AccessTokenResponse {
refreshCount += 1
if refreshTransportFails { throw DSError.transport(underlying: "Could not connect to the server") }
if refreshFails { throw DSError.unauthorized(message: "refresh failed") }
return try await inner.refresh()
}
func me() async throws -> UserResponse { try await inner.me() }
func logout() async throws { try await inner.logout() }
// Fixture (dataAuthExpired )
func documents(_ query: DocumentListQuery) async throws -> DocumentListResponse { try gateData(); return try await inner.documents(query) }
func document(id: Int) async throws -> DocumentDetailResponse { try gateData(); return try await inner.document(id: id) }
func documentContent(id: Int) async throws -> DocumentContentResponse { try await inner.documentContent(id: id) }
func documentTree() async throws -> [DomainTreeNode] { try await inner.documentTree() }
func categoryCounts() async throws -> CategoryCounts { try await inner.categoryCounts() }
func duplicates() async throws -> DuplicatesResponse { try await inner.duplicates() }
func patchDocument(id: Int, _ update: DocumentUpdate) async throws -> DocumentResponse { try await inner.patchDocument(id: id, update) }
func putContent(id: Int, content: String) async throws { try await inner.putContent(id: id, content: content) }
func deleteDocument(id: Int) async throws { try await inner.deleteDocument(id: id) }
func search(q: String, mode: SearchMode?, page: Int?, debug: Bool?) async throws -> SearchResponse { try await inner.search(q: q, mode: mode, page: page, debug: debug) }
func ask(q: String, limit: Int?, backend: String?, debug: Bool?) async throws -> AskResponse { try await inner.ask(q: q, limit: limit, backend: backend, debug: debug) }
func memos(_ query: MemoListQuery) async throws -> MemoListResponse { try await inner.memos(query) }
func memo(id: Int) async throws -> MemoResponse { try await inner.memo(id: id) }
func createMemo(_ create: MemoCreate) async throws -> MemoResponse { try await inner.createMemo(create) }
func patchMemo(id: Int, _ update: MemoUpdate) async throws -> MemoResponse { try await inner.patchMemo(id: id, update) }
func pinMemo(id: Int, pinned: Bool) async throws -> MemoResponse { try await inner.pinMemo(id: id, pinned: pinned) }
func archiveMemo(id: Int, archived: Bool) async throws -> MemoResponse { try await inner.archiveMemo(id: id, archived: archived) }
func toggleMemoTask(id: Int, taskIndex: Int, checked: Bool) async throws -> MemoResponse { try await inner.toggleMemoTask(id: id, taskIndex: taskIndex, checked: checked) }
func deleteMemo(id: Int) async throws { try await inner.deleteMemo(id: id) }
func digest(date: String?, country: String?) async throws -> DigestResponse { try await inner.digest(date: date, country: country) }
}
@@ -65,10 +65,8 @@ final class FixtureDecodeTests: XCTestCase {
func testStats() async throws {
let s = try await client.categoryCounts()
XCTAssertEqual(s.counts["news"], 6182)
XCTAssertEqual(s.counts["library"], 391)
XCTAssertEqual(s.libraryPendingSuggestions, 0)
XCTAssertEqual(s.total, 391 + 229 + 381 + 6182 + 4 + 2) // = counts
XCTAssertEqual(s.documents, 783)
XCTAssertEqual(s.byDomain["법령"], 23) // non-ASCII dict key
}
func testDuplicates() async throws {
+1 -1
View File
@@ -53,7 +53,7 @@ UserResponse { id: Int, username: String, is_active: Bool, totp_enabled: Bool, l
| GET | `/documents/{id}/file` | `?token=<access>&download=true` | **바이너리 원본** (PDF/이미지/오디오/원본) | — |
| GET | `/documents/{id}/content` | — | 경량 텍스트(`content` 15k cap) | `document_content.json` |
| GET | `/documents/tree` | — | 도메인 트리(사이드바) | `documents_tree.json` |
| GET | `/documents/stats/category-counts` | — | `{counts: {category: n}, library_pending_suggestions}`**raw dict 반환(Pydantic 모델 없음), 2026-06-07 라이브 재캡처로 정정**(초기 추출이 shape 합성 오류) | `documents_stats.json` |
| GET | `/documents/stats/category-counts` | — | 카테고리 카운트 | `documents_stats.json` |
| POST | `/documents/` (multipart) | 파일 업로드 | `DocumentResponse` (201) | `document_detail.json` |
| PATCH | `/documents/{id}` | `DocumentUpdate` | `DocumentResponse` | — |
| PUT | `/documents/{id}/content` | `{content}` (md 편집 저장) | `{}` | — |
@@ -1,11 +1,14 @@
{
"counts": {
"library": 391,
"law": 229,
"document": 381,
"news": 6182,
"memo": 4,
"audio": 2
"total": 1163,
"documents": 783,
"by_domain": {
"Industrial_Safety": 426,
"Engineering": 351,
"General": 189,
"Programming": 60,
"법령": 23,
"Philosophy": 12
},
"library_pending_suggestions": 0
"review_pending": 725,
"pipeline_failed": 19
}
+16 -10
View File
@@ -114,14 +114,20 @@ services:
start_period: 300s
restart: unless-stopped
# ── ollama 서비스 제거 (2026-06-08) ──
# 정본 ollama = standalone `~/ollama/docker-compose.yml`(container_name: ollama).
# 그 컨테이너가 hyungi_document_server_default 망(external) + 동일 볼륨
# hyungi_document_server_ollama_data(external, bge-m3) 부착으로 fastapi 의 `ollama:11434`
# 임베딩을 이미 서빙(재부팅에도 durable). 본 중복 서비스는 같은 host 127.0.0.1:11434 를
# 점유 다퉈, 재부팅 후 `docker compose up` 을 'port already allocated' 로 abort →
# 뒤 의존서비스(caddy·frontend) 미기동 = 웹 outage 유발 → 제거. (ollama_data 볼륨 def 는
# standalone 이 external 로 참조하므로 아래 volumes: 에 보존.)
ollama:
image: ollama/ollama
volumes:
- ollama_data:/root/.ollama
deploy:
resources:
reservations:
devices:
- driver: nvidia
count: 1
capabilities: [gpu]
ports:
- "127.0.0.1:11434:11434"
restart: unless-stopped
# Phase 1.3: bge-reranker-v2-m3 (TEI) — internal only, fastapi에서 reranker:80으로 호출
# fastapi가 depends_on 안 함 → 단독 시작 가능, 없어도 fastapi 동작 (rerank=false fallback)
@@ -167,8 +173,8 @@ services:
- FALLBACK_ENDPOINT=http://ollama:11434/v1/chat/completions
- CLAUDE_API_KEY=${CLAUDE_API_KEY:-}
- DAILY_BUDGET_USD=${DAILY_BUDGET_USD:-5.00}
# depends_on: ollama 제거 (2026-06-08) — ollama 서비스가 standalone 으로 이관됨.
# FALLBACK_ENDPOINT 의 ollama:11434 는 standalone(동일 hostname, DS 망 부착)으로 해소.
depends_on:
- ollama
restart: unless-stopped
fastapi:
-87
View File
@@ -1,87 +0,0 @@
# S1 데이터·백엔드 트랙 적용 runbook (plan ds-s1-backend-1)
> 코드는 `feat/s1-dedup-fields` 브랜치에 완성. 이 문서는 **prod(GPU) 적용 게이트** 절차.
> ⚠ 적용은 사용자 명시 go 필요 — 본 runbook 은 자동 실행되지 않는다.
## 0. 사전 조건 (게이트)
- [ ] **검색실험 Soft Lock 확인**`~/.claude/.search-experiment-active` 부재여야 함.
현재(2026-06-05) 부재 = 비활성. migration 317 은 startup 자동적용 → `docker compose up`
이 restart 를 유발하므로, 실험 활성 시엔 예외창 합의 후에만.
- [ ] **불가침 면 (검색실험 유효성)**: embedding 모델 / 벡터 인덱스(ivfflat/partial) /
retrieval config / config.yaml 의 ai·model 섹션 **미접촉**. 본 트랙 변경면은
dedup 컬럼 + office_md + storage scaffold(env) 뿐.
## 1. migration 번호
- 317(dedup 3컬럼) **단일** 클레임. P0-4=(C) 무변경이라 신규 migration 미추가.
- S2/S3 트랙이 같은 317 을 발행하지 않도록 조율(startup 카오스 방지).
## 2. restart 셋 (한 번에 배치)
| 서비스 | 변경 | 재시작 사유 |
|---|---|---|
| `fastapi` | A(317 dedup) + B(dedup API) + D(storage scaffold) | startup migration 자동적용 + 코드 |
| `marker_worker`(fastapi 내 스케줄러) | C(office_md 분기) + **markitdown 신규 pip dep** | rebuild 필요 |
> markitdown 은 신규 의존성 → `docker compose build` 필수(force-recreate 만으론 image 미갱신,
> feedback_docker_compose_build_vs_force_recreate). office 변환(OOXML)에만 필요.
## 3. 적용 순서 (inventory → config → deploy → verify)
```bash
ssh gpu && cd ~/Documents/code/hyungi_Document_Server
# (1) pre-A-1 안전망 — DB 덤프 (repo 밖)
bash scripts/s1_pre_change_backup.sh pre-a1
# (2) 코드 가져오기 + 빌드(markitdown dep 반영) + 적용
git fetch && git checkout feat/s1-dedup-fields # 또는 main 머지 후 main
docker compose build fastapi # markitdown 설치 (requirements 에 추가 필요)
docker compose up -d fastapi # startup 에서 migration 317 자동적용
# (3) migration 317 적용 확인
docker compose exec -T postgres psql -U pkm -d pkm -c \
"SELECT version,name FROM schema_migrations WHERE version=317;"
docker compose exec -T postgres psql -U pkm -d pkm -c \
"\d documents" | grep -E 'original_filename|duplicate_of|duplicate_count'
```
> **requirements**: office OOXML 변환에 `markitdown` 추가 필요(`requirements.txt`/pyproject).
> markdownify·LibreOffice 는 기존. 빌드 전 dep 추가 PR 필수(없으면 OOXML 변환이 OfficeMdError→failed,
> hwp/PDF/passthrough 는 정상).
## 4. backfill (코드 적용·검증 후, 야간 비중첩창)
> dedup 컬럼 정합은 **야간 잡 `dedup_reconcile`(03:30 KST, main.py)** 이 매일 멱등 재계산한다
> (soft-delete 잔여 드리프트 자동 정리). 아래 `backfill_dedup.py` 수동 실행은 적용 직후 1회
> 초기 채움/즉시 확인용 — 이후엔 야간 잡이 유지.
```bash
# (4a) dedup backfill (초기 1회) — 먼저 dry-run 으로 정확한 UPDATE set 확인
bash scripts/s1_pre_change_backup.sh pre-b4
docker compose exec fastapi python /app/scripts/backfill_dedup.py --dry-run
docker compose exec fastapi python /app/scripts/backfill_dedup.py --apply
# (4b) office/hwp pending markdown 백필 — C-2 라이브 ingestion 과 비중첩 야간창
docker compose exec fastapi python /app/scripts/backfill_nonpdf_markdown.py --dry-run
docker compose exec fastapi python /app/scripts/backfill_nonpdf_markdown.py --apply --limit 20 # sample 먼저
docker compose exec fastapi python /app/scripts/backfill_nonpdf_markdown.py --apply # 전체
```
## 5. verify (smoke)
```bash
# /duplicates shape
curl -s -H "Authorization: Bearer $TOK" https://document.hyungi.net/api/documents/duplicates | jq '{total_groups,total_duplicate_docs, g0:.groups[0]}'
# office 변환 결과 (sample doc)
docker compose exec -T postgres psql -U pkm -d pkm -c \
"SELECT md_status,md_extraction_engine,length(md_content) FROM documents WHERE id=<office_doc_id>;"
# md_status success→completed 직렬화 (앱 계약)
curl -s -H "Authorization: Bearer $TOK" https://document.hyungi.net/api/documents/<id> | jq '.md_status'
```
## 6. 롤백
- 컬럼만 빠른 롤백: `scripts/rollback_317.sql` (수동, schema_migrations 317 행도 삭제).
- 전체 복원: `scripts/s1_pre_change_backup.sh` 가 출력한 `.sql.gz` → psql 복원.
+114 -201
View File
@@ -3,17 +3,10 @@
import { addToast } from '$lib/stores/toast';
import { marked } from 'marked';
import DOMPurify from 'dompurify';
import { ExternalLink, Save } from 'lucide-svelte';
import { ExternalLink, Save, RefreshCw } from 'lucide-svelte';
import Tabs from '$lib/components/ui/Tabs.svelte';
import MarkdownDoc from '$lib/components/MarkdownDoc.svelte';
import MarkdownStatusBadge from '$lib/components/MarkdownStatusBadge.svelte';
import SectionOutline from '$lib/components/SectionOutline.svelte';
import { getViewerType } from '$lib/utils/viewerType';
import { isMdSuccess } from '$lib/utils/mdStatus';
import { buildAnchorMap } from '$lib/utils/outlineAnchors';
import { cleanHeading } from '$lib/utils/headingPath';
// 편집 미리보기 전용 plain marked (본문 렌더는 MarkdownDoc 가 담당).
// marked + sanitize
marked.use({ mangle: false, headerIds: false });
function renderMd(text) {
return DOMPurify.sanitize(marked(text), {
@@ -29,19 +22,33 @@
let loading = $state(true);
let viewerType = $state('none');
// Markdown 편집 (md/txt — extracted_text 가 표시·편집 단일 필드)
// Markdown 편집
let editMode = $state(false);
let editContent = $state('');
let editTab = $state('edit');
let saving = $state(false);
let rawMarkdown = $state('');
function getViewerType(format) {
if (['md', 'txt'].includes(format)) return 'markdown';
if (format === 'pdf') return 'pdf';
if (['hwp', 'hwpx'].includes(format)) return 'preview-pdf';
if (['odoc', 'osheet', 'docx', 'xlsx', 'pptx', 'odt', 'ods', 'odp'].includes(format)) return 'preview-pdf';
if (['jpg', 'jpeg', 'png', 'gif', 'bmp', 'tiff'].includes(format)) return 'image';
if (['csv', 'json', 'xml', 'html'].includes(format)) return 'text';
if (['dwg', 'dxf'].includes(format)) return 'cad';
return 'unsupported';
}
const ODF_FORMATS = ['ods', 'odt', 'odp', 'odoc', 'osheet'];
function getEditInfo(d) {
if (d.edit_url) return { url: d.edit_url, label: '편집' };
if (ODF_FORMATS.includes(d.file_format)) return { url: 'https://link.hyungi.net', label: 'Synology Drive에서 열기' };
if (['dwg', 'dxf'].includes(d.file_format)) return { url: 'https://web.autocad.com', label: 'AutoCAD Web' };
function getEditInfo(doc) {
// DB에 저장된 편집 URL 우선
if (doc.edit_url) return { url: doc.edit_url, label: '편집' };
// ODF 포맷 → Synology Drive
if (ODF_FORMATS.includes(doc.file_format)) return { url: 'https://link.hyungi.net', label: 'Synology Drive에서 열기' };
// CAD
if (['dwg', 'dxf'].includes(doc.file_format)) return { url: 'https://web.autocad.com', label: 'AutoCAD Web' };
return null;
}
@@ -54,19 +61,18 @@
async function loadFullDoc(id) {
loading = true;
rawMarkdown = '';
sections = [];
try {
fullDoc = await api(`/documents/${id}`);
viewerType = getViewerType(fullDoc.file_format, fullDoc.source_channel);
loadSections(id);
viewerType = fullDoc.source_channel === 'news' ? 'article' : getViewerType(fullDoc.file_format);
// 본문 markdown(md/txt) 인데 extracted_text 가 비면 원본 파일 직접 로드.
// Markdown: extracted_text 없으면 원본 파일 직접 가져오기
if (viewerType === 'markdown' && !fullDoc.extracted_text) {
try {
const resp = await fetch(`/api/documents/${id}/file?token=${getAccessToken()}`);
if (resp.ok) rawMarkdown = await resp.text();
} catch (e) { rawMarkdown = ''; }
} else {
rawMarkdown = '';
}
} catch (err) {
fullDoc = null;
@@ -76,86 +82,6 @@
}
}
// PDF markdown-first: marker 가 만든 canonical md_content 가 있으면 기본으로 그것을 보여주고
// "PDF 원본" 토글 제공. lastDocId 는 prop(fullDoc.id) 로 키잉 — 3-pane 은 라우트 리마운트가
// 없어 page.params 가드는 no-op 이 된다.
let pdfViewMode = $state('markdown');
let lastDocId = $state(null);
let canShowMarkdown = $derived(
!!(isMdSuccess(fullDoc?.md_status) && fullDoc?.md_content?.trim())
);
$effect(() => {
if (!fullDoc) return;
if (fullDoc.id !== lastDocId) {
lastDocId = fullDoc.id;
pdfViewMode = canShowMarkdown ? 'markdown' : 'pdf';
}
if (!canShowMarkdown && pdfViewMode === 'markdown') pdfViewMode = 'pdf';
});
// ── 절 목차(개요) rail + 점프 + scroll-spy (outlineAnchors, 경로 A) ──
let sections = $state([]);
async function loadSections(id) {
try {
const r = await api(`/documents/${id}/sections`);
if (id === doc?.id) sections = r?.sections ?? [];
} catch {
if (id === doc?.id) sections = [];
}
}
// window 빈제목(31% 노이즈) 등 표시 가능한 제목 없는 항목은 rail 에서 제외(클린업).
let outlineSections = $derived(
sections.filter(
(s) => !!(cleanHeading(s.section_title) || cleanHeading((s.heading_path || '').split('>').pop() || '')),
),
);
// MarkdownDoc 가 실제 렌더하는 텍스트(anchor offset 기준과 일치해야 함).
let mdRenderText = $derived.by(() => {
if (!fullDoc) return '';
if (viewerType === 'pdf') return pdfViewMode === 'markdown' && canShowMarkdown ? (fullDoc.md_content || '') : '';
if (viewerType === 'markdown') return fullDoc.extracted_text || rawMarkdown || '';
if (viewerType === 'hwp-markdown' || viewerType === 'article') return fullDoc.md_content || fullDoc.extracted_text || '';
return '';
});
let anchorMap = $derived(sections.length && mdRenderText ? buildAnchorMap(mdRenderText, sections).anchors : {});
let showRail = $derived(outlineSections.length > 0 && !!mdRenderText);
let scrollEl = $state();
let activeKey = $state(null);
function jumpTo(chunkId) {
const el = scrollEl?.querySelector(`#sec-${chunkId}`);
if (el) el.scrollIntoView({ block: 'start', behavior: 'smooth' });
}
// scroll-spy: scrollEl 내 .md-anchor 중 컨테이너 상단(+120) 지난 마지막 = 현재 절.
$effect(() => {
void anchorMap;
const el = scrollEl;
if (!el) return;
let raf = 0;
const onScroll = () => {
if (raf) return;
raf = requestAnimationFrame(() => {
raf = 0;
const threshold = el.getBoundingClientRect().top + 120;
let cur = null;
el.querySelectorAll('.md-anchor').forEach((a) => {
if (a.getBoundingClientRect().top <= threshold) cur = a;
});
if (cur) {
const m = cur.id.match(/^sec-(\d+)$/);
if (m) activeKey = Number(m[1]);
}
});
};
el.addEventListener('scroll', onScroll, { passive: true });
const t = setTimeout(onScroll, 0);
return () => {
el.removeEventListener('scroll', onScroll);
clearTimeout(t);
if (raf) cancelAnimationFrame(raf);
};
});
function startEdit() {
editContent = fullDoc?.extracted_text || rawMarkdown || '';
editMode = true;
@@ -187,7 +113,6 @@
}
let editInfo = $derived(fullDoc ? getEditInfo(fullDoc) : null);
const PROSE = 'prose prose-invert prose-base max-w-none';
</script>
<svelte:window on:keydown={handleKeydown} />
@@ -200,161 +125,149 @@
<div class="flex items-center gap-2">
{#if viewerType === 'markdown'}
{#if editMode}
<button onclick={saveContent} disabled={saving}
class="flex items-center gap-1 px-2 py-1 text-xs bg-accent text-white rounded hover:bg-accent-hover disabled:opacity-50">
<button
onclick={saveContent}
disabled={saving}
class="flex items-center gap-1 px-2 py-1 text-xs bg-accent text-white rounded hover:bg-accent-hover disabled:opacity-50"
>
<Save size={12} /> {saving ? '저장 중...' : '저장'}
</button>
<button onclick={() => editMode = false} class="px-2 py-1 text-xs text-dim hover:text-text">취소</button>
<button
onclick={() => editMode = false}
class="px-2 py-1 text-xs text-dim hover:text-text"
>취소</button>
{:else}
<button onclick={startEdit} class="px-2 py-1 text-xs text-dim hover:text-accent border border-default rounded">편집</button>
<button
onclick={startEdit}
class="px-2 py-1 text-xs text-dim hover:text-accent border border-default rounded"
>편집</button>
{/if}
{/if}
{#if editInfo}
<a href={editInfo.url} target="_blank" rel="noopener"
class="flex items-center gap-1 px-2 py-1 text-xs text-dim hover:text-accent border border-default rounded">
<a
href={editInfo.url}
target="_blank"
rel="noopener"
class="flex items-center gap-1 px-2 py-1 text-xs text-dim hover:text-accent border border-default rounded"
>
<ExternalLink size={12} /> {editInfo.label}
</a>
{/if}
<a href="/documents/{fullDoc.id}" class="px-2 py-1 text-xs text-dim hover:text-accent border border-default rounded">전체 보기</a>
<a
href="/documents/{fullDoc.id}"
class="px-2 py-1 text-xs text-dim hover:text-accent border border-default rounded"
>전체 보기</a>
</div>
</div>
{/if}
<!-- 뷰어 본문 (+ 절 목차 rail) -->
<div class="flex-1 flex min-h-0">
{#if showRail}
<aside class="hidden lg:block w-[176px] shrink-0 overflow-y-auto border-r border-default p-2 bg-sidebar">
<SectionOutline sections={outlineSections} onJump={jumpTo} {activeKey} />
</aside>
{/if}
<div class="flex-1 overflow-auto min-h-0" bind:this={scrollEl}>
<!-- 뷰어 본문 -->
<div class="flex-1 overflow-auto min-h-0">
{#if loading}
<div class="flex items-center justify-center h-full"><p class="text-sm text-dim">로딩 중...</p></div>
<div class="flex items-center justify-center h-full">
<p class="text-sm text-dim">로딩 중...</p>
</div>
{:else if fullDoc}
{#if viewerType === 'markdown'}
{#if editMode}
<!-- Markdown 편집 (Tabs 프리미티브 — E.4) -->
<div class="flex flex-col h-full">
<Tabs tabs={[{ id: 'edit', label: '편집' }, { id: 'preview', label: '미리보기' }]} bind:value={editTab} class="flex flex-col h-full">
<Tabs
tabs={[
{ id: 'edit', label: '편집' },
{ id: 'preview', label: '미리보기' },
]}
bind:value={editTab}
class="flex flex-col h-full"
>
{#snippet children(activeId)}
{#if activeId === 'edit'}
<textarea bind:value={editContent}
<textarea
bind:value={editContent}
class="flex-1 w-full p-4 bg-bg text-text text-sm font-mono resize-none outline-none min-h-[300px]"
spellcheck="false" aria-label="마크다운 편집"></textarea>
spellcheck="false"
aria-label="마크다운 편집"
></textarea>
{:else}
<div class="flex-1 overflow-auto p-4 markdown-body">{@html renderMd(editContent)}</div>
<div class="flex-1 overflow-auto p-4 markdown-body">
{@html renderMd(editContent)}
</div>
{/if}
{/snippet}
</Tabs>
</div>
{:else}
<!-- md/txt = extracted_text 단일 필드(표시=편집), MarkdownDoc 로 앵커/KaTeX/이미지 렌더 -->
<div class="p-4">
<MarkdownDoc
documentId={fullDoc.id}
mdContent={null}
mdStatus={fullDoc.md_status}
mdExtractionError={fullDoc.md_extraction_error}
mdExtractionQuality={fullDoc.md_extraction_quality}
anchorMap={anchorMap}
extractedText={fullDoc.extracted_text || rawMarkdown}
class={PROSE}
/>
<div class="p-4 markdown-body">
{@html renderMd(fullDoc.extracted_text || rawMarkdown || '*텍스트 추출 대기 중*')}
</div>
{/if}
{:else if viewerType === 'pdf'}
<div class="p-4">
<div class="mb-2 flex items-center gap-2">
<MarkdownStatusBadge mdStatus={fullDoc.md_status} mdExtractionError={fullDoc.md_extraction_error} mdExtractionQuality={fullDoc.md_extraction_quality} />
{#if canShowMarkdown}
<button onclick={() => (pdfViewMode = 'markdown')}
class="px-2 py-1 text-xs rounded border {pdfViewMode === 'markdown' ? 'bg-accent text-white border-accent' : 'text-dim border-default hover:text-accent'}">Markdown</button>
<button onclick={() => (pdfViewMode = 'pdf')}
class="px-2 py-1 text-xs rounded border {pdfViewMode === 'pdf' ? 'bg-accent text-white border-accent' : 'text-dim border-default hover:text-accent'}">PDF 원본</button>
{/if}
</div>
{#if pdfViewMode === 'markdown' && canShowMarkdown}
<MarkdownDoc
documentId={fullDoc.id}
mdContent={fullDoc.md_content}
mdFrontmatter={fullDoc.md_frontmatter}
mdStatus={fullDoc.md_status}
mdExtractionError={fullDoc.md_extraction_error}
mdExtractionQuality={fullDoc.md_extraction_quality}
anchorMap={anchorMap}
extractedText={fullDoc.extracted_text}
class={PROSE}
/>
{:else}
<iframe src="/api/documents/{fullDoc.id}/file?token={getAccessToken()}" class="w-full h-[80vh] border-0 rounded" title={fullDoc.title}></iframe>
{/if}
</div>
{:else if viewerType === 'hwp-markdown'}
<div class="p-4">
<MarkdownDoc
documentId={fullDoc.id}
mdContent={fullDoc.md_content}
mdFrontmatter={fullDoc.md_frontmatter}
mdStatus={fullDoc.md_status}
mdExtractionError={fullDoc.md_extraction_error}
mdExtractionQuality={fullDoc.md_extraction_quality}
extractedText={fullDoc.extracted_text}
class={PROSE}
/>
</div>
<iframe
src="/api/documents/{fullDoc.id}/file?token={getAccessToken()}"
class="w-full h-full border-0"
title={fullDoc.title}
></iframe>
{:else if viewerType === 'preview-pdf'}
<iframe src="/api/documents/{fullDoc.id}/preview?token={getAccessToken()}" class="w-full h-full border-0" title={fullDoc.title} onerror={() => {}}></iframe>
<iframe
src="/api/documents/{fullDoc.id}/preview?token={getAccessToken()}"
class="w-full h-full border-0"
title={fullDoc.title}
onerror={() => {}}
></iframe>
{:else if viewerType === 'image'}
<div class="flex items-center justify-center h-full p-4">
<img src="/api/documents/{fullDoc.id}/file?token={getAccessToken()}" alt={fullDoc.title} class="max-w-full max-h-full object-contain rounded" />
<img
src="/api/documents/{fullDoc.id}/file?token={getAccessToken()}"
alt={fullDoc.title}
class="max-w-full max-h-full object-contain rounded"
/>
</div>
{:else if viewerType === 'text'}
<div class="p-4"><pre class="text-sm text-text whitespace-pre-wrap font-mono">{fullDoc.extracted_text || '텍스트 없음'}</pre></div>
{:else if viewerType === 'synology'}
<div class="flex flex-col items-center justify-center h-full gap-3">
<p class="text-sm text-dim">Synology Office 문서 — 외부 편집기에서 열어야 합니다.</p>
<a href={fullDoc.edit_url || 'https://link.hyungi.net'} target="_blank" rel="noopener"
class="flex items-center gap-1 px-3 py-1.5 text-sm bg-accent text-white rounded-lg hover:bg-accent-hover">
<ExternalLink size={14} /> 새 창에서 열기
</a>
<div class="p-4">
<pre class="text-sm text-text whitespace-pre-wrap font-mono">{fullDoc.extracted_text || '텍스트 없음'}</pre>
</div>
{:else if viewerType === 'cad'}
<div class="flex flex-col items-center justify-center h-full gap-3">
<p class="text-sm text-dim">CAD 미리보기 (향후 지원 예정)</p>
<a href="https://web.autocad.com" target="_blank" rel="noopener" class="px-3 py-1.5 text-sm bg-accent text-white rounded hover:bg-accent-hover">AutoCAD Web에서 열기</a>
<a
href="https://web.autocad.com"
target="_blank"
class="px-3 py-1.5 text-sm bg-accent text-white rounded hover:bg-accent-hover"
>AutoCAD Web에서 열기</a>
</div>
{:else if viewerType === 'article'}
<!-- 뉴스 전용 뷰어 -->
<div class="p-5 max-w-3xl mx-auto">
<h1 class="text-lg font-bold mb-2 text-text">{fullDoc.title}</h1>
<h1 class="text-lg font-bold mb-2">{fullDoc.title}</h1>
<div class="flex items-center gap-2 mb-4 text-xs text-dim">
{#if fullDoc.ai_tags?.length}
{#each fullDoc.ai_tags.filter(t => t.startsWith('News/')) as tag}
<span class="px-1.5 py-0.5 rounded bg-accent/15 text-accent-hover">{tag.replace('News/', '')}</span>
<span class="px-1.5 py-0.5 rounded bg-blue-900/30 text-blue-400">{tag.replace('News/', '')}</span>
{/each}
{/if}
<span>{new Date(fullDoc.created_at).toLocaleDateString('ko-KR', { year: 'numeric', month: 'short', day: 'numeric', hour: '2-digit', minute: '2-digit' })}</span>
</div>
<MarkdownDoc
documentId={fullDoc.id}
mdContent={fullDoc.md_content}
mdStatus={fullDoc.md_status}
mdExtractionError={fullDoc.md_extraction_error}
mdExtractionQuality={fullDoc.md_extraction_quality}
extractedText={fullDoc.extracted_text}
class="{PROSE} mb-6"
/>
{#if fullDoc.edit_url}
<div class="flex items-center gap-3 pt-4 border-t border-default">
<a href={fullDoc.edit_url} target="_blank" rel="noopener noreferrer"
class="flex items-center gap-1 px-3 py-1.5 text-sm bg-accent text-white rounded-lg hover:bg-accent-hover">
<div class="markdown-body mb-6">
{@html renderMd(fullDoc.extracted_text || '')}
</div>
<div class="flex items-center gap-3 pt-4 border-t border-default">
{#if fullDoc.edit_url}
<a
href={fullDoc.edit_url}
target="_blank"
rel="noopener noreferrer"
class="flex items-center gap-1 px-3 py-1.5 text-sm bg-accent text-white rounded-lg hover:bg-accent-hover"
>
<ExternalLink size={14} /> 원문 보기
</a>
</div>
{/if}
{/if}
</div>
</div>
{:else}
<div class="flex items-center justify-center h-full"><p class="text-sm text-dim">미리보기를 지원하지 않는 형식입니다 ({fullDoc.file_format})</p></div>
<div class="flex items-center justify-center h-full">
<p class="text-sm text-dim">미리보기를 지원하지 않는 형식입니다 ({fullDoc.file_format})</p>
</div>
{/if}
{/if}
</div>
</div>
</div>
+1 -21
View File
@@ -28,9 +28,6 @@
mdStatus?: string | null;
mdExtractionError?: string | null;
mdExtractionQuality?: Record<string, unknown> | null;
/** 개요 점프용 anchor: {chunk_id: md_content char offset}. 렌더 전 해당 위치에
* <span id="sec-{chunk_id}"> 주입(점프 타깃). buildAnchorMap(outlineAnchors) 산출물. */
anchorMap?: Record<number, number> | null;
placeholder?: string;
/** 추가 래퍼 클래스. tailwind prose-* / spacing 등을 호출 측에서 입혀야 할 때. */
class?: string;
@@ -44,27 +41,10 @@
mdStatus = null,
mdExtractionError = null,
mdExtractionQuality = null,
anchorMap = null,
placeholder = '*텍스트 추출 대기 중*',
class: klass = '',
}: Props = $props();
// 개요 anchor 주입: body 의 각 offset(내림차순)에 빈 <span id="sec-N"> 삽입(점프 타깃).
// offset 은 buildAnchorMap 이 body 와 동일 문자열 기준으로 산출했어야 함(호출측 책임).
function spliceAnchors(text: string, map: Record<number, number> | null): string {
if (!map) return text;
const ents = Object.entries(map)
.map(([id, off]) => [id, Number(off)] as [string, number])
.filter(([, o]) => Number.isFinite(o) && o >= 0 && o <= text.length)
.sort((a, b) => b[1] - a[1]);
if (!ents.length) return text;
let out = text;
for (const [id, off] of ents) {
out = out.slice(0, off) + `<span id="sec-${id}" class="md-anchor"></span>\n` + out.slice(off);
}
return out;
}
let usingMarkdown = $derived(!!(mdContent && mdContent.trim()));
let body = $derived(
usingMarkdown
@@ -73,7 +53,7 @@
? extractedText
: placeholder,
);
let renderedHtml = $derived(renderDocMarkdown(spliceAnchors(body, anchorMap)));
let renderedHtml = $derived(renderDocMarkdown(body));
let frontmatterEntries = $derived.by(() => {
if (!usingMarkdown || !mdFrontmatter) return [] as [string, unknown][];
@@ -9,7 +9,7 @@
*
* 정책 (사용자 결정):
* - pending 은 표시 안 함 (legacy 9792 건에 모두 노출되는 시각적 노이즈 회피).
* - processing/success/partial/skipped/failed 5 상태 표시 (partial = 대형 split 일부 실패).
* - processing/success/skipped/failed 4 상태 표시.
* - success 도 작은 chip 으로 노출 — 1D pilot 에서 markdown 화면 식별용.
* - skipped/failed 는 tooltip 으로 reason/error 보조 표시.
*
@@ -77,18 +77,11 @@
case 'processing':
return { tone: 'accent', label: 'Markdown 변환 중', tooltip: null };
case 'success':
case 'completed': // API field_validator 가 DB 'success'→'completed' remap (S1 backend) — 동의어
return {
tone: 'success',
label: 'Markdown',
tooltip: qualitySummary(mdExtractionQuality),
};
case 'partial':
return {
tone: 'warning',
label: 'Markdown 일부',
tooltip: qualitySummary(mdExtractionQuality) ?? mdExtractionError ?? null,
};
case 'skipped':
return {
tone: 'neutral',
@@ -15,12 +15,8 @@
interface Props {
sections: DocumentSection[];
/** 항목 클릭 시 본문 점프 콜백(부모가 #sec-{chunkId} scrollIntoView). 없으면 아코디언만. */
onJump?: (chunkId: number) => void;
/** scroll-spy 현재 절(chunk_id) — 강조용. */
activeKey?: number | null;
}
let { sections, onJump, activeKey = null }: Props = $props();
let { sections }: Props = $props();
let layout = $derived(groupOrFlat(sections));
let total = $derived(sections.length);
@@ -41,17 +37,15 @@
{#snippet itemRow(item: OutlineItem)}
{@const s = item.section}
{@const open = selectedId === s.chunk_id}
{@const active = activeKey != null && activeKey === s.chunk_id}
{@const typeLabel = sectionTypeLabel(s.section_type)}
<li>
<button
type="button"
onclick={() => { toggle(item); onJump?.(s.chunk_id); }}
onclick={() => toggle(item)}
aria-expanded={open}
aria-current={active ? 'true' : undefined}
class={[
'w-full text-left px-2 py-1.5 rounded-md text-xs flex items-start gap-1.5 transition-colors border-l-2',
open ? 'bg-surface-active text-text border-accent' : active ? 'bg-surface text-accent-hover border-accent' : 'text-dim hover:bg-surface hover:text-text border-transparent',
'w-full text-left px-2 py-1.5 rounded-md text-xs flex items-start gap-1.5 transition-colors',
open ? 'bg-surface-active text-text' : 'text-dim hover:bg-surface hover:text-text',
].join(' ')}
>
<span class="flex-1 min-w-0 leading-snug break-words">{title(s)}</span>
@@ -1,78 +0,0 @@
<script>
/**
* 학습 진단 패널 (study_diagnosis surface) — 이드 코치 표면.
*
* 워커(study_weakness)가 산출한 최신 약점 스냅샷을 코치 언어로 번역. 데이터 없으면 status='none'.
* LLM 호출이라 버튼 트리거(자동 호출 X). /study/diagnosis 와 /study/topics 양쪽에서 재사용.
*/
import { api } from '$lib/api';
import { addToast } from '$lib/stores/toast';
import { Activity } from 'lucide-svelte';
import Button from '$lib/components/ui/Button.svelte';
import Card from '$lib/components/ui/Card.svelte';
import Skeleton from '$lib/components/ui/Skeleton.svelte';
import { renderMathMarkdown } from '$lib/utils/mathMarkdown';
let { class: className = '' } = $props();
let diag = $state(null); // StudyDiagnosisResponse | null
let diagLoading = $state(false);
async function generateDiagnosis() {
if (diagLoading) return;
diagLoading = true;
try {
diag = await api('/study-topics/diagnosis/generate', { method: 'POST' });
} catch {
addToast('error', '진단 생성 실패');
} finally {
diagLoading = false;
}
}
function fmtDiagTime(s) {
if (!s) return '';
const d = new Date(s);
if (isNaN(d.getTime())) return '';
return d.toLocaleString('ko-KR', { month: 'short', day: 'numeric', hour: '2-digit', minute: '2-digit' });
}
</script>
<Card class={className}>
{#snippet children()}
<div class="p-3">
<div class="flex items-center justify-between gap-2">
<div class="flex items-center gap-2 min-w-0">
<Activity size={16} class="text-accent shrink-0" />
<span class="text-sm font-semibold text-text">학습 진단</span>
<span class="text-[11px] text-faint truncate hidden sm:inline">누적 풀이 약점·학습 태도 코치</span>
</div>
<Button onclick={generateDiagnosis} size="sm" variant={diag ? 'ghost' : 'primary'} loading={diagLoading}>
{diag ? '새로고침' : '진단 생성'}
</Button>
</div>
{#if diagLoading}
<div class="mt-3 space-y-2">
<Skeleton w="w-full" h="h-4" /><Skeleton w="w-5/6" h="h-4" /><Skeleton w="w-2/3" h="h-4" />
</div>
{:else if diag && diag.status === 'ready'}
<div class="markdown-body math-area mt-3 text-sm leading-relaxed text-text">{@html renderMathMarkdown(diag.content)}</div>
{#if diag.review_set_draft_id}
<div class="mt-2.5 inline-block text-xs text-accent-hover bg-accent/10 rounded-md px-2.5 py-1.5">
권장 복습세트 초안 #{diag.review_set_draft_id} — 복습함에서 1클릭 확인 후 편성
</div>
{/if}
<div class="mt-2 text-[11px] text-faint">
{#if diag.snapshot_at}스냅샷 {fmtDiagTime(diag.snapshot_at)}{/if}{#if diag.generated_at} · 생성 {fmtDiagTime(diag.generated_at)}{/if}{#if diag.model} · {diag.model}{/if}
</div>
{:else if diag && diag.status === 'none'}
<p class="mt-3 text-xs text-dim leading-relaxed">
아직 진단할 약점 데이터가 없습니다. 학습 주제를 <b class="text-text">공부중</b>으로 표시하면 매일 새벽 누적 풀이에서 약점·태도 스냅샷이 만들어지고, 여기서 진단 코치를 받을 수 있습니다.
</p>
{:else}
<p class="mt-3 text-xs text-dim leading-relaxed">
누적 학습 이력을 근거로 약점 토픽과 학습 태도를 진단합니다. <span class="text-text font-medium">진단 생성</span>을 눌러보세요.
</p>
{/if}
</div>
{/snippet}
</Card>
-25
View File
@@ -1,25 +0,0 @@
// md_status 어휘 단일 source.
//
// DB CHECK enum 은 'success' 이지만, API 직렬화 시 field_validator
// `_db_success_to_completed`(app/api/documents.py) 가 'success' → 'completed' 로 remap 한다
// (S1 backend). 나머지 상태(pending/processing/partial/skipped/failed)는 양쪽 동일.
//
// 따라서 프론트는 두 어휘를 모두 "성공" 으로 취급해야 S1 backend 배포 전(API='success')·
// 후(API='completed') 모두 안전하다. (DB↔API enum divergence guard — md_status 비교는
// 반드시 이 헬퍼 경유, raw `=== 'success'` / `=== 'completed'` 산재 금지.)
/** DB 'success' 또는 API 'completed' = 변환 성공(markdown 준비됨). */
export function isMdSuccess(status: string | null | undefined): boolean {
return status === 'success' || status === 'completed';
}
/** md상태 칩 렌더 대상 상태. pending/null 은 숨김(legacy 대량 노이즈 회피). */
export function isMdStatusVisible(status: string | null | undefined): boolean {
return (
status === 'processing' ||
isMdSuccess(status) ||
status === 'partial' ||
status === 'skipped' ||
status === 'failed'
);
}
@@ -1,128 +0,0 @@
// 순수함수 회귀 테스트. 실행(로컬, 의존성 0): node --test src/lib/utils/outlineAnchors.test.ts
// (Node ≥23 또는 22.6+ --experimental-strip-types — TS 타입 네이티브 strip.)
import { test } from 'node:test';
import assert from 'node:assert/strict';
import { buildAnchorMap } from './outlineAnchors.ts';
import { type DocumentSection } from './headingPath.ts';
let _id = 0;
function sec(p: Partial<DocumentSection>): DocumentSection {
return {
chunk_id: ++_id,
section_title: null,
heading_path: null,
level: null,
node_type: null,
is_leaf: true,
section_type: null,
summary: null,
confidence: null,
...p,
};
}
const md = (lines: string[]) => lines.join('\n');
const lineOff = (lines: string[], idx: number) => {
let o = 0;
for (let i = 0; i < idx; i++) o += lines[i].length + 1;
return o;
};
test('ATX heading 정확 매칭 + offset', () => {
const lines = ['# 개요', '본문 a', '## 설계 기준', '본문 b'];
const s = [
sec({ chunk_id: 101, section_title: '개요' }),
sec({ chunk_id: 102, section_title: '설계 기준' }),
];
const r = buildAnchorMap(md(lines), s);
assert.equal(r.anchors[101], lineOff(lines, 0));
assert.equal(r.anchors[102], lineOff(lines, 2));
assert.equal(r.matched, 2);
});
test('★ false early match 방어 — 상호참조가 heading 보다 먼저', () => {
const lines = ['# 개요', '본 절은 Part UW 를 참조한다.', '내용', '# Part UW', '강판'];
const s = [
sec({ chunk_id: 1, section_title: '개요' }),
sec({ chunk_id: 2, section_title: 'Part UW' }),
];
const r = buildAnchorMap(md(lines), s);
// 상호참조(line 1)가 아니라 실제 heading(line 3)으로
assert.equal(r.anchors[2], lineOff(lines, 3));
assert.notEqual(r.anchors[2], lineOff(lines, 1));
});
test('중복 제목 — 단조 커서로 N번째 출현 매칭', () => {
const lines = ['## General', 'a', '## Scope', 'b', '## General', 'c'];
const s = [
sec({ chunk_id: 1, section_title: 'General' }),
sec({ chunk_id: 2, section_title: 'Scope' }),
sec({ chunk_id: 3, section_title: 'General' }),
];
const r = buildAnchorMap(md(lines), s);
assert.equal(r.anchors[1], lineOff(lines, 0)); // 첫 General
assert.equal(r.anchors[2], lineOff(lines, 2)); // Scope
assert.equal(r.anchors[3], lineOff(lines, 4)); // 둘째 General (오점프 아님)
});
test('prefix 가드 — 제1조 가 제1조의2 를 오매칭 안 함', () => {
const lines = ['# 제1조의2', 'x', '# 제1조', 'y'];
const s = [sec({ chunk_id: 1, section_title: '제1조' })];
const r = buildAnchorMap(md(lines), s);
assert.equal(r.anchors[1], lineOff(lines, 2)); // 제1조의2(line0) 아님
});
test('비-ATX 평문 제N조 (전체-라인 매칭)', () => {
const lines = ['제1조(목적) 이 법은 OO 을 정한다.', '본문', '제2조(정의) 용어는...'];
const s = [
sec({ chunk_id: 1, section_title: '제1조(목적) 이 법은 OO 을 정한다.', node_type: 'clause' }),
sec({ chunk_id: 2, section_title: '제2조(정의) 용어는...', node_type: 'clause' }),
];
const r = buildAnchorMap(md(lines), s);
assert.equal(r.anchors[1], lineOff(lines, 0));
assert.equal(r.anchors[2], lineOff(lines, 2));
});
test('window 조각 skip (anchor 없음)', () => {
const lines = ['## 절', 'aaa', 'bbb'];
const s = [
sec({ chunk_id: 1, section_title: '절' }),
sec({ chunk_id: 2, section_title: '절', node_type: 'window' }), // 부모 제목 상속 조각
];
const r = buildAnchorMap(md(lines), s);
assert.equal(r.anchors[1], lineOff(lines, 0));
assert.equal(r.anchors[2], undefined); // window = 점프 비활성
assert.equal(r.total, 1);
});
test('코드펜스 내부 heading 제외', () => {
const lines = ['```', '# General', '```', '# General', 'x'];
const s = [sec({ chunk_id: 1, section_title: 'General' })];
const r = buildAnchorMap(md(lines), s);
assert.equal(r.anchors[1], lineOff(lines, 3)); // 펜스 밖
});
test('miss = anchor 없음 (점프 비활성, 오점프 아님)', () => {
const lines = ['# 개요', '본문'];
const s = [
sec({ chunk_id: 1, section_title: '개요' }),
sec({ chunk_id: 2, section_title: '존재하지 않는 절' }),
];
const r = buildAnchorMap(md(lines), s);
assert.equal(r.anchors[1], lineOff(lines, 0));
assert.equal(r.anchors[2], undefined);
assert.equal(r.total, 2);
assert.equal(r.matched, 1);
});
test('heading_path 마지막 세그먼트 fallback', () => {
const lines = ['# 도입', 'x'];
const s = [sec({ chunk_id: 1, section_title: null, heading_path: 'A > 도입' })];
const r = buildAnchorMap(md(lines), s);
assert.equal(r.anchors[1], lineOff(lines, 0));
});
test('빈 입력 안전', () => {
assert.deepEqual(buildAnchorMap('', [sec({ section_title: 'x' })]).anchors, {});
assert.deepEqual(buildAnchorMap('# x', []).anchors, {});
assert.deepEqual(buildAnchorMap(null, null).anchors, {});
});
-101
View File
@@ -1,101 +0,0 @@
// 개요(절 목차) → 본문 deterministic 점프용 anchor offset 산출 (경로 A: FE-only).
//
// hier 절(section_title)은 md_content 의 heading 라인에서 나왔으나(builder.py build_hier_tree,
// md_content 순수함수), 비-ATX(제N조/Chapter)는 본문에 markdown heading 요소·id 가 안 생기고
// 중복 제목(표-1·Part UW…)이 흔해 슬러그·textContent 매칭이 깨진다. 그래서 md_content 에서
// 각 절의 heading 위치(char offset)를 직접 찾아 <a id="sec-{chunk_id}"> 를 주입할 좌표를 만든다.
//
// ★ false early match 방어 3중 (리뷰 반영):
// 1. 라인-시작(전체-라인) 매칭 — 본문 중간 상호참조("see Part UW for…")는 라인 전체가 제목과
// 같지 않으므로 제외. heading 라인(선두 #/리스트마커 제거 후 전체)만 매칭.
// 2. 전체 매칭 + truncation 처리 — 'first-N-chars' prefix 금지('제1조'가 '제1조의2' 오매칭 차단).
// builder 가 KO/ENG 제목을 [:200] truncate 하므로 truncated(매우 긴 제목)일 때만 startsWith.
// 3. 단조 커서 + 코드펜스 회피 — 매칭은 직전 매칭 다음 라인부터(역행 불가) + ``` ~~~ 펜스 내부 제외.
// 미스/역행은 anchor 없음 = 점프 비활성(아코디언 폴백). 오점프보다 무점프.
//
// ⚠ 잔여 한계: 본문 앞 '목차(TOC)'가 절 제목을 단독 라인으로 순서대로 나열하면 커서가 TOC 를
// 먼저 잡을 수 있다(연쇄 시프트). 4-1 의 '정확도' 측정으로 검출 — 빈번하면 경로 B(builder offset).
import { cleanHeading, type DocumentSection } from './headingPath.ts';
const TRUNCATE_HINT = 180; // builder.py 가 KO/ENG 제목을 [:200] 으로 자름 → 거의 그 길이면 truncated 로 간주
function norm(s: string | null | undefined): string {
return cleanHeading(s).toLowerCase();
}
/** 한 라인을 heading 후보 텍스트로: 선두 ATX #(1~6) / 리스트마커(-*+) / blockquote(>) 제거 후 정규화. */
function normLine(raw: string): string {
const stripped = raw.replace(/^\s{0,3}(?:#{1,6}\s+|[-*+]\s+|>\s+)?/, '');
return cleanHeading(stripped).toLowerCase();
}
export interface AnchorMapResult {
/** chunk_id → md_content 내 heading 라인 시작 char offset. (없으면 점프 비활성) */
anchors: Record<number, number>;
/** 후보(비-window·제목有) 절 수 — 4-1 커버리지 분모. */
total: number;
/** 신뢰 anchor 수 — 4-1 커버리지 분자. (정확도는 별도 수작업 검증) */
matched: number;
}
/**
* sections chunk_index ( ) (GET /documents/{id}/sections ORDER BY).
*/
export function buildAnchorMap(
mdContent: string | null | undefined,
sections: DocumentSection[] | null | undefined,
): AnchorMapResult {
const anchors: Record<number, number> = {};
if (!mdContent || !sections || sections.length === 0) {
return { anchors, total: 0, matched: 0 };
}
// 라인별 (offset, 정규화 텍스트, 펜스 여부) 사전계산.
const rawLines = mdContent.split('\n');
const lines: { off: number; norm: string }[] = [];
let off = 0;
let inFence = false;
for (const raw of rawLines) {
const fenceToggle = /^\s{0,3}(```|~~~)/.test(raw);
const fencedHere = inFence || fenceToggle; // 펜스 경계 라인도 매칭 제외
lines.push({ off, norm: fencedHere ? '' : normLine(raw) });
if (fenceToggle) inFence = !inFence;
off += raw.length + 1; // '\n'
}
let cursor = 0; // 단조 전진 라인 인덱스
let total = 0;
let matched = 0;
for (const s of sections) {
// window/section_split 조각은 자체 heading 없음(부모 제목 상속) → 건너뜀.
if (s.node_type === 'window' || s.node_type === 'section_split') continue;
let nt = norm(s.section_title);
if (!nt && s.heading_path) {
const last = s.heading_path.split('>').pop();
nt = norm(last);
}
if (!nt) continue;
total++;
const truncated = nt.length >= TRUNCATE_HINT;
let foundIdx = -1;
for (let i = cursor; i < lines.length; i++) {
const ln = lines[i].norm;
if (!ln) continue; // 빈 라인 / 펜스 내부
if (ln === nt || (truncated && ln.startsWith(nt))) {
foundIdx = i;
break;
}
}
if (foundIdx >= 0) {
anchors[s.chunk_id] = lines[foundIdx].off;
cursor = foundIdx + 1; // 단조: 다음 절은 이 라인 이후만
matched++;
}
// 미스 → anchor 없음(점프 비활성, 폴백)
}
return { anchors, total, matched };
}
-46
View File
@@ -1,46 +0,0 @@
// 뷰어 타입 분류 단일 source — 상세페이지(/documents/[id])와 3-pane 중앙 리더
// (DocumentViewer)가 공유한다. 두 곳이 각자 getViewerType 을 두면 csv/hwp/office 분기가
// drift 하므로(이원화 재발) 여기 하나로 수렴한다.
//
// ⚠ 소비 컴포넌트는 이 함수가 낼 수 있는 모든 ViewerType 에 render 분기가 있어야 한다.
// (분류 통합 ≠ render 통합 — 양쪽 컴포넌트의 {#if viewerType===...} 에 누락 없는지 확인.)
export type ViewerType =
| 'article'
| 'markdown'
| 'hwp-markdown'
| 'pdf'
| 'preview-pdf'
| 'image'
| 'text'
| 'synology'
| 'cad'
| 'unsupported';
const MARKDOWN = new Set(['md', 'txt']);
// csv/json/xml/html 은 markdown 으로 렌더하면 콤마/행이 한 문단으로 뭉친다 → <pre> 로 원형 보존.
const TEXT = new Set(['csv', 'json', 'xml', 'html']);
const HWP = new Set(['hwp', 'hwpx']);
// LibreOffice headless → PDF preview (/preview) 로 인앱 표시.
const OFFICE_PREVIEW = new Set(['docx', 'xlsx', 'pptx', 'odt', 'ods', 'odp']);
// Synology Office 네이티브 — 인앱 변환 부적합, 외부 편집기로.
const SYNOLOGY = new Set(['odoc', 'osheet']);
const IMAGE = new Set(['jpg', 'jpeg', 'png', 'gif', 'bmp', 'tiff']);
const CAD = new Set(['dwg', 'dxf']);
export function getViewerType(
format: string | null | undefined,
sourceChannel?: string | null,
): ViewerType {
if (sourceChannel === 'news') return 'article';
const f = (format ?? '').toLowerCase();
if (MARKDOWN.has(f)) return 'markdown';
if (f === 'pdf') return 'pdf';
if (HWP.has(f)) return 'hwp-markdown';
if (OFFICE_PREVIEW.has(f)) return 'preview-pdf';
if (SYNOLOGY.has(f)) return 'synology';
if (IMAGE.has(f)) return 'image';
if (TEXT.has(f)) return 'text';
if (CAD.has(f)) return 'cad';
return 'unsupported';
}
+2 -16
View File
@@ -3,7 +3,7 @@
import { browser } from '$app/environment';
import { page } from '$app/stores';
import { goto } from '$app/navigation';
import { Menu, EllipsisVertical, ChevronDown, FileText, Newspaper, HelpCircle, StickyNote, Inbox, PanelLeft } from 'lucide-svelte';
import { Menu, EllipsisVertical, ChevronDown, FileText, Newspaper, HelpCircle, StickyNote, Inbox } from 'lucide-svelte';
import { isAuthenticated, user, tryRefresh, logout } from '$lib/stores/auth';
import { toasts, removeToast } from '$lib/stores/toast';
import { refresh as refreshPublicConfig } from '$lib/stores/config';
@@ -32,15 +32,6 @@
let menuOpen = $state(false); // ⋮ 설정 메뉴
let navMenu = $state(''); // '' | 'docs' | 'news' — 상단 드롭다운
// 데스크탑 분류(소스트리) 사이드바 접기/펴기 — localStorage 기억. 접으면 콘텐츠가 넓어짐.
let sidebarCollapsed = $state(
typeof localStorage !== 'undefined' ? localStorage.getItem('sidebarCollapsed') === 'true' : false
);
function toggleSidebarCollapse() {
sidebarCollapsed = !sidebarCollapsed;
if (typeof localStorage !== 'undefined') localStorage.setItem('sidebarCollapsed', String(sidebarCollapsed));
}
function isActive(path) {
return $page.url.pathname.startsWith(path);
}
@@ -94,11 +85,6 @@
<div class="lg:hidden">
<IconButton icon={Menu} size="sm" aria-label="사이드바" onclick={() => ui.openDrawer('sidebar')} />
</div>
<div class="hidden lg:block">
<IconButton icon={PanelLeft} size="sm" aria-label={sidebarCollapsed ? '사이드바 펴기' : '사이드바 접기'}
aria-pressed={!sidebarCollapsed} title={sidebarCollapsed ? '사이드바 펴기' : '사이드바 접기'}
onclick={toggleSidebarCollapse} />
</div>
{/if}
<a href="/" class="flex items-center gap-2 shrink-0">
<span class="w-7 h-7 rounded-md bg-accent text-white grid place-items-center text-[10px] font-extrabold tracking-wide">DS</span>
@@ -164,7 +150,7 @@
<!-- 메인: 데스크탑 상시 사이드바 + 콘텐츠 -->
<div class="flex-1 min-h-0 flex">
{#if showSidebar}
<aside class="hidden lg:block shrink-0 overflow-hidden transition-[width] duration-200 ease-out {sidebarCollapsed ? 'w-0 border-r-0' : 'w-sidebar border-r border-default'}">
<aside class="hidden lg:block w-sidebar shrink-0 overflow-hidden border-r border-default">
<Sidebar />
</aside>
{/if}
+7 -7
View File
@@ -195,7 +195,7 @@
</script>
<div class="p-4 lg:p-8">
<div class="max-w-[1680px] mx-auto">
<div class="max-w-5xl mx-auto">
<!-- ═══ 인사 헤더 ═══ -->
<div class="flex items-baseline gap-2.5 flex-wrap">
@@ -211,7 +211,7 @@
<Skeleton w="w-full" h="h-4" class="mt-4" />
<Skeleton w="w-2/3" h="h-4" class="mt-2" />
</div>
<div class="grid grid-cols-1 lg:grid-cols-[1fr_360px] gap-5">
<div class="grid grid-cols-1 lg:grid-cols-[1fr_320px] gap-5">
<div class="space-y-5">
<div class="bg-surface border border-default rounded-card p-5"><Skeleton w="w-24" h="h-4" /><Skeleton w="w-full" h="h-10" class="mt-3" /></div>
<div class="bg-surface border border-default rounded-card p-5"><Skeleton w="w-24" h="h-4" /><Skeleton w="w-full" h="h-40" class="mt-3" /></div>
@@ -280,7 +280,7 @@
</div>
<!-- ═══ 2열 본문 ═══ -->
<div class="grid grid-cols-1 lg:grid-cols-[1fr_360px] gap-5 items-start">
<div class="grid grid-cols-1 lg:grid-cols-[1fr_320px] gap-5 items-start">
<!-- ─── 왼쪽 ─── -->
<div class="space-y-5">
@@ -329,13 +329,13 @@
<div class="flex flex-col">
{#each summary.recent_documents as doc, i (doc.id)}
<a href="/documents/{doc.id}"
class="grid grid-cols-[auto_14px_minmax(0,1fr)] gap-x-3 py-2.5 {i > 0 ? 'border-t border-default' : ''} group">
class="grid grid-cols-[auto_14px_1fr] gap-x-3 py-2.5 {i > 0 ? 'border-t border-default' : ''} group">
<div class="text-[10px] text-faint text-right pt-1 whitespace-nowrap tabular-nums w-14">{formatTime(doc.created_at)}</div>
<div class="flex flex-col items-center">
<span class="w-2 h-2 rounded-full mt-1.5 shrink-0 {domainBgClass(doc.ai_domain)}"></span>
{#if i < summary.recent_documents.length - 1}<span class="flex-1 w-px bg-default mt-1"></span>{/if}
</div>
<div class="pb-1 min-w-0">
<div class="pb-1">
<div class="text-[10px] font-bold uppercase tracking-wide text-dim mb-0.5">{domainLabel(doc.ai_domain)}</div>
<div class="text-[13px] text-text leading-snug group-hover:text-accent transition-colors truncate">{doc.title || '제목 없음'}</div>
</div>
@@ -382,7 +382,7 @@
{#each domainDist.slice(0, 6) as d (d.name)}
<a href="/documents?domain={encodeURIComponent(d.name)}" class="flex items-center gap-2 text-xs hover:text-accent transition-colors group">
<span class="w-2.5 h-2.5 rounded-sm shrink-0 {domainBgClass(d.name)}"></span>
<span class="flex-1 min-w-0 text-text truncate group-hover:text-accent">{domainLabel(d.name)}</span>
<span class="flex-1 text-text truncate group-hover:text-accent">{domainLabel(d.name)}</span>
<span class="font-semibold text-dim tabular-nums">{d.count.toLocaleString()}</span>
</a>
{/each}
@@ -410,7 +410,7 @@
{#each pinnedMemos as memo (memo.id)}
<a href="/memos" class="flex items-start gap-2.5 px-3 py-2.5 rounded-lg bg-bg hover:bg-surface-hover transition-colors">
<span class="text-[9px] font-bold rounded px-1.5 py-0.5 uppercase tracking-wide shrink-0 mt-0.5 text-accent-hover bg-accent/10">메모</span>
<span class="text-xs text-text leading-snug flex-1 min-w-0 break-words">{pinTitle(memo)}</span>
<span class="text-xs text-text leading-snug flex-1">{pinTitle(memo)}</span>
<Pin size={11} class="text-faint shrink-0 mt-0.5" />
</a>
{/each}
+4 -4
View File
@@ -204,8 +204,8 @@
<div class="h-full overflow-auto">
<!-- 상단 검색바 (sticky) -->
<div class="sticky top-0 z-10 bg-bg/80 backdrop-blur border-b border-default px-4 py-3">
<div class="flex flex-wrap items-center gap-2 max-w-[1680px] mx-auto">
<div class="relative flex-1 min-w-0">
<div class="flex items-center gap-2 max-w-5xl mx-auto">
<div class="relative flex-1">
<Search
size={14}
class="absolute left-3 top-1/2 -translate-y-1/2 text-dim pointer-events-none"
@@ -234,7 +234,7 @@
<select
bind:value={selectedBackend}
title="Backend 선택 — silent fallback 0 정책 (선택한 backend 만 시도, 실패 시 503)."
class="py-2 px-2 bg-surface border border-default rounded-lg text-text text-xs focus:border-accent outline-none min-w-0 max-w-[42vw] truncate"
class="py-2 px-2 bg-surface border border-default rounded-lg text-text text-xs focus:border-accent outline-none"
>
<option value="auto">Auto (router)</option>
<option value="mac-mini-default">Mac mini (default)</option>
@@ -261,7 +261,7 @@
</div>
<!-- 본문 -->
<div class="max-w-[1680px] mx-auto p-4">
<div class="max-w-5xl mx-auto p-4">
{#if backendUnavailable}
<div class="py-16">
<EmptyState
+1 -1
View File
@@ -53,7 +53,7 @@
}
</script>
<div class="p-6 max-w-[1680px] mx-auto">
<div class="p-6 max-w-[1200px] mx-auto">
<header class="flex items-center gap-2 mb-4">
<Mic size={20} />
<h1 class="text-xl font-semibold">Audio</h1>
+1 -1
View File
@@ -428,7 +428,7 @@
/* ── App shell ── */
.app {
max-width: 1680px;
max-width: 1180px;
margin: 0 auto;
background: var(--surface);
min-height: 100vh;
+1 -3
View File
@@ -10,8 +10,6 @@
import { addToast } from '$lib/stores/toast';
import { Info, X, Plus, Trash2, Tag, FolderTree, Sparkles, ChevronLeft, ArrowUpDown } from 'lucide-svelte';
import DocumentViewer from '$lib/components/DocumentViewer.svelte';
import MarkdownStatusBadge from '$lib/components/MarkdownStatusBadge.svelte';
import { isMdStatusVisible } from '$lib/utils/mdStatus';
import UploadDropzone from '$lib/components/UploadDropzone.svelte';
import Drawer from '$lib/components/ui/Drawer.svelte';
import Modal from '$lib/components/ui/Modal.svelte';
@@ -680,7 +678,7 @@
{#if doc.ai_sub_group}<div class="flex justify-between gap-2 text-xs py-1"><span class="text-dim">하위</span><span class="text-text font-medium text-right truncate">{doc.ai_sub_group}</span></div>{/if}
<div class="flex justify-between gap-2 text-xs py-1"><span class="text-dim">수정</span><span class="text-text font-medium text-right">{shortDate(doc.updated_at || doc.created_at)}</span></div>
{#if size}<div class="flex justify-between gap-2 text-xs py-1"><span class="text-dim">원본</span><span class="text-text font-medium text-right">{size}</span></div>{/if}
{#if isMdStatusVisible(doc.md_status)}<div class="flex items-center justify-between gap-2 text-xs py-1"><span class="text-dim">md 상태</span><MarkdownStatusBadge mdStatus={doc.md_status} mdExtractionError={doc.md_extraction_error} mdExtractionQuality={doc.md_extraction_quality} /></div>{/if}
{#if doc.md_status}<div class="flex justify-between gap-2 text-xs py-1"><span class="text-dim">md 상태</span><span class="font-medium text-right {doc.md_status === 'completed' ? 'text-success' : 'text-warning'}">{doc.md_status}</span></div>{/if}
{#if doc.read_count}<div class="flex justify-between gap-2 text-xs py-1"><span class="text-dim">읽음</span><span class="text-text font-medium text-right">{doc.read_count}</span></div>{/if}
</div>
</div>
@@ -6,8 +6,6 @@
import { page } from '$app/stores';
import { goto } from '$app/navigation';
import { api, getAccessToken } from '$lib/api';
import { isMdSuccess } from '$lib/utils/mdStatus';
import { buildAnchorMap } from '$lib/utils/outlineAnchors';
import { addToast } from '$lib/stores/toast';
import { marked } from 'marked';
import DOMPurify from 'dompurify';
@@ -149,7 +147,7 @@
let pdfViewMode = $state('markdown'); // 'markdown' | 'pdf'
let lastDocId = $state(null);
let canShowMarkdown = $derived(
!!(isMdSuccess(doc?.md_status) && doc?.md_content?.trim())
!!(doc?.md_status === 'success' && doc?.md_content?.trim())
);
$effect(() => {
@@ -164,45 +162,6 @@
}
});
// ── 개요 점프 (outlineAnchors, 경로 A) ──
// anchorMap = md_content 의 각 절 heading offset. MarkdownDoc 가 <span id="sec-N"> 주입.
let anchorMap = $derived(
hasSections && canShowMarkdown && doc?.md_content
? buildAnchorMap(doc.md_content, sections).anchors
: {}
);
let activeKey = $state(null);
function jumpToSection(chunkId) {
const el = document.getElementById(`sec-${chunkId}`);
if (el) el.scrollIntoView({ behavior: 'smooth', block: 'start' });
}
// scroll-spy: 화면 상단(120px)을 지난 마지막 .md-anchor = 현재 절. [id] 는 window 스크롤.
$effect(() => {
void anchorMap; // 문서/섹션 변화 시 재바인딩
if (typeof window === 'undefined') return;
let raf = 0;
const onScroll = () => {
if (raf) return;
raf = requestAnimationFrame(() => {
raf = 0;
let cur = null;
document.querySelectorAll('.md-anchor').forEach((a) => {
if (a.getBoundingClientRect().top <= 120) cur = a;
});
if (cur) {
const m = cur.id.match(/^sec-(\d+)$/);
if (m) activeKey = Number(m[1]);
}
});
};
window.addEventListener('scroll', onScroll, { passive: true });
onScroll();
return () => {
window.removeEventListener('scroll', onScroll);
if (raf) cancelAnimationFrame(raf);
};
});
function getViewerType(format) {
if (['md', 'txt', 'csv', 'html'].includes(format)) return 'markdown';
if (format === 'pdf') return 'pdf';
@@ -269,7 +228,7 @@
<!-- 좌측 절 목차 — xl+ sticky rail (그 아래 viewport 는 본문 상단 collapsible) -->
<aside class="hidden xl:block xl:sticky xl:top-6 xl:self-start xl:max-h-[calc(100vh-3rem)] xl:overflow-y-auto">
<Card>
<SectionOutline {sections} onJump={jumpToSection} {activeKey} />
<SectionOutline {sections} />
</Card>
</aside>
{/if}
@@ -280,7 +239,7 @@
<!-- xl 미만: 절 목차 접이식 -->
<details class="xl:hidden">
<summary class="cursor-pointer text-sm text-dim px-1 py-2 select-none">절 목차 ({sections.length})</summary>
<Card class="mt-2"><SectionOutline {sections} onJump={jumpToSection} {activeKey} /></Card>
<Card class="mt-2"><SectionOutline {sections} /></Card>
</details>
{/if}
<!-- Affordance row -->
@@ -329,7 +288,6 @@
mdStatus={doc.md_status}
mdExtractionError={doc.md_extraction_error}
mdExtractionQuality={doc.md_extraction_quality}
anchorMap={anchorMap}
extractedText={doc.extracted_text || rawMarkdown}
class="prose prose-invert prose-base lg:prose-sm max-w-none"
/>
+3 -3
View File
@@ -223,7 +223,7 @@
<title>events · hyungi PKM</title>
</svelte:head>
<div class="mx-auto max-w-[1240px] space-y-6 px-4 py-6 sm:px-6 lg:px-8">
<div class="mx-auto max-w-3xl space-y-6 px-4 py-6">
<header class="flex items-end justify-between gap-3">
<div class="space-y-1">
<h1 class="text-2xl font-semibold">events</h1>
@@ -278,13 +278,13 @@
<li>
<Card class="flex items-start gap-3 p-3 {KIND_COLOR[item.kind]}">
<div class="flex-1 min-w-0">
<div class="flex min-w-0 items-center gap-2 text-xs text-slate-500">
<div class="flex items-center gap-2 text-xs text-slate-500">
<span>{KIND_LABEL[item.kind]}</span>
<span class="rounded px-1.5 py-0.5 text-[10px] {STATUS_COLOR[item.status]}">
{STATUS_LABEL[item.status]}
</span>
{#if item.project_tag}
<span class="min-w-0 break-all text-slate-400">#{item.project_tag}</span>
<span class="text-slate-400">#{item.project_tag}</span>
{/if}
</div>
<a href="/events/{item.id}" class="mt-1 block break-words text-sm font-medium hover:underline">
+2 -2
View File
@@ -229,7 +229,7 @@
}
</script>
<div class="p-4 lg:p-6 max-w-[1240px] mx-auto">
<div class="p-4 lg:p-6 max-w-5xl mx-auto">
<!-- 헤더 -->
<div class="flex items-center justify-between mb-4">
<div class="flex items-center gap-3">
@@ -355,7 +355,7 @@
<span class="text-faint"><FormatIcon format={doc.file_format} size={14} /></span>
<a
href="/documents/{doc.id}"
class="text-sm font-medium text-text hover:text-accent truncate min-w-0"
class="text-sm font-medium text-text hover:text-accent truncate"
>
{doc.title || '제목 없음'}
</a>
+6 -6
View File
@@ -438,7 +438,7 @@
<div class="p-4 lg:p-6">
<!-- breadcrumb -->
<div class="flex flex-wrap items-center gap-2 text-sm mb-4 text-dim">
<div class="flex items-center gap-2 text-sm mb-4 text-dim">
<a href="/documents" class="hover:text-text">문서</a>
<span class="text-faint">/</span>
<span class="text-text">자료실</span>
@@ -448,7 +448,7 @@
<button
type="button"
onclick={() => navigate(activePath.split('/').slice(0, i + 1).join('/'))}
class="hover:text-text min-w-0 truncate max-w-[40vw]"
class="hover:text-text"
>
{segment}
</button>
@@ -457,14 +457,14 @@
</div>
<!-- 승인 대기함 (§2) — ai_suggestion.proposed_category='library' 문서 -->
<div class="max-w-[1680px] mx-auto mb-4">
<div class="max-w-7xl mx-auto mb-4">
<SuggestionReview
proposedCategory="library"
onChange={handleSuggestionChange}
/>
</div>
<div class="max-w-[1680px] mx-auto grid grid-cols-1 lg:grid-cols-12 gap-6">
<div class="max-w-7xl mx-auto grid grid-cols-1 lg:grid-cols-12 gap-6">
<!-- 왼쪽: 트리 (5/12) -->
<aside class="lg:col-span-5 xl:col-span-4">
<div class="bg-surface border border-default rounded-card p-3">
@@ -532,14 +532,14 @@
<button
onclick={() => navigate(n.path)}
class="flex-1 min-w-0 flex items-center justify-between px-2 py-1.5 rounded-md text-sm transition-colors
class="flex-1 flex items-center justify-between px-2 py-1.5 rounded-md text-sm transition-colors
{isActive
? 'bg-accent/15 text-accent'
: isParent
? 'text-text'
: 'text-dim hover:bg-surface-hover hover:text-text'}"
>
<span class="truncate min-w-0">{n.name}</span>
<span class="truncate">{n.name}</span>
<span class="text-xs text-dim shrink-0 ml-2">{n.count}</span>
</button>
+2 -4
View File
@@ -656,14 +656,12 @@
</div>
<style>
.memo-content { overflow-wrap: anywhere; word-break: break-word; }
.memo-content :global(p) { margin: 0.2em 0; }
.memo-content :global(ul), .memo-content :global(ol) { margin: 0.2em 0; padding-left: 1.5em; }
.memo-content :global(li) { margin: 0.1em 0; }
.memo-content :global(code) { background: var(--bg); padding: 0.1em 0.3em; border-radius: 3px; font-size: 0.85em; overflow-wrap: anywhere; word-break: break-word; }
.memo-content :global(code) { background: var(--bg); padding: 0.1em 0.3em; border-radius: 3px; font-size: 0.85em; }
.memo-content :global(pre) { background: var(--bg); padding: 0.75em; border-radius: 6px; overflow-x: auto; margin: 0.5em 0; }
.memo-content :global(table) { display: block; overflow-x: auto; max-width: 100%; }
.memo-content :global(a) { color: var(--accent); overflow-wrap: anywhere; word-break: break-word; }
.memo-content :global(a) { color: var(--accent); }
.memo-content :global(blockquote) { border-left: 3px solid var(--border-default); padding-left: 0.75em; color: var(--text-dim); margin: 0.5em 0; }
.memo-content :global(.memo-checkbox) {
cursor: pointer;
+1 -12
View File
@@ -3,7 +3,7 @@
// 주제로 보기(퀴즈·복습·통계) / 자료 학습 / 필사 세션 / 암기카드 검수.
import { onMount } from 'svelte';
import { api } from '$lib/api';
import { BookOpen, PenLine, GraduationCap, FolderKanban, Layers, Repeat, Flag, Inbox, Activity } from 'lucide-svelte';
import { BookOpen, PenLine, GraduationCap, FolderKanban, Layers, Repeat, Flag, Inbox } from 'lucide-svelte';
let cardReviewCount = $state(0);
let questionFlagCount = $state(0);
@@ -38,17 +38,6 @@
<p class="text-xs text-dim">"가스기사" 같은 학습 주제 아래에 필기 세션과 자료를 함께 묶어 본다. 한 주제 안에서 필기·자료를 한눈에.</p>
</a>
<a
href="/study/diagnosis"
class="block mb-3 p-5 rounded-lg border border-default bg-surface hover:border-accent hover:bg-accent/5 transition-colors"
>
<div class="flex items-center gap-2 mb-2">
<Activity size={18} class="text-accent" />
<h2 class="text-base font-semibold text-text">학습 진단</h2>
</div>
<p class="text-xs text-dim">누적 풀이 이력에서 약점 토픽과 학습 태도를 코치(이드)가 진단합니다. 매일 새벽 약점 스냅샷을 만들고, 권장 복습세트 초안까지 제안.</p>
</a>
<div class="grid grid-cols-1 sm:grid-cols-2 gap-3">
<a
href="/study/sources"
@@ -1,31 +0,0 @@
<script>
/**
* /study/diagnosis — 학습 진단(이드 코치) 전용 페이지.
*
* 누적 풀이 약점·학습 태도를 코치 언어로 진단하는 cross-topic 표면. 허브(/study)에서 진입.
* 패널 본체는 공유 컴포넌트 StudyDiagnosisPanel (/study/topics 상단에도 동일 노출).
*/
import { ArrowLeft, Activity } from 'lucide-svelte';
import StudyDiagnosisPanel from '$lib/components/StudyDiagnosisPanel.svelte';
</script>
<svelte:head><title>학습 진단 — 공부</title></svelte:head>
<div class="p-4 md:p-6 max-w-5xl mx-auto">
<div class="flex items-center gap-2 text-xs md:text-sm mb-3">
<a href="/study" class="text-dim hover:text-text flex items-center gap-1">
<ArrowLeft size={14} /> 공부
</a>
<span class="text-faint">/</span>
<span class="text-text font-medium flex items-center gap-1.5">
<Activity size={14} class="text-accent" /> 학습 진단
</span>
</div>
<header class="mb-4">
<h1 class="text-lg font-semibold text-text">학습 진단</h1>
<p class="text-xs text-dim mt-1">누적 풀이 이력을 근거로 약점 토픽과 학습 태도를 코치가 진단합니다. 약점·수치는 매일 새벽 약점 스냅샷에서만 인용되며, 스냅샷에 없는 토픽은 만들지 않습니다.</p>
</header>
<StudyDiagnosisPanel />
</div>
@@ -21,7 +21,6 @@
import TextInput from '$lib/components/ui/TextInput.svelte';
import Select from '$lib/components/ui/Select.svelte';
import Textarea from '$lib/components/ui/Textarea.svelte';
import StudyDiagnosisPanel from '$lib/components/StudyDiagnosisPanel.svelte';
const STUDY_TYPE_OPTIONS = [
{ value: '', label: '미지정' },
@@ -206,9 +205,6 @@
<p class="text-xs text-dim mt-1">한 주제 아래에 필기 세션과 자료를 묶어 보고 진도 관리. 향후 단어장·오디오·문제세트도 같은 묶음으로 연결됩니다.</p>
</header>
<!-- 이드 학습 진단 (공유 컴포넌트 — /study/diagnosis 와 동일 패널) -->
<StudyDiagnosisPanel class="mb-4" />
<!-- 새 주제 -->
<Card class="mb-4">
{#snippet children()}
+1 -1
View File
@@ -50,7 +50,7 @@
}
</script>
<div class="p-6 max-w-[1680px] mx-auto">
<div class="p-6 max-w-[1400px] mx-auto">
<header class="flex items-center gap-2 mb-4">
<Film size={20} />
<h1 class="text-xl font-semibold">Video</h1>
+40
View File
@@ -0,0 +1,40 @@
-- 301_eid_study_weakness.sql
-- 이드 학습 약점 스냅샷 (append-only derived-fact). eid_study_weakness 워커가 study_question_progress
-- + study_quiz_sessions 집계로 산출(LLM 0). study_diagnosis 표면이 최신 행을 읽어 코치 발화.
--
-- ★ append-only 구조강제 (project_eid_persona_substrate 불변식 #8) — 2중:
-- (1) INSERT 스탬프 누락 거부: actor·source_generated_at = NOT NULL·DEFAULT 없음
-- → 스탬프 없는 INSERT 를 DB 가 거부. NOT NULL 은 owner 포함 모든 role 에 적용(role 독립).
-- (2) UPDATE/DELETE 차단: CREATE RULE ... DO INSTEAD NOTHING → 행 불변(owner·superuser 독립).
--
-- ★ 설계 원안 'REVOKE UPDATE,DELETE' 정정(load-bearing): 단일 DB role `pkm` 이 테이블 OWNER 라
-- REVOKE 가 무효(owner 는 GRANT/REVOKE 우회). plpgsql trigger(RAISE)는 migration 검증기가
-- 본문의 BEGIN 키워드를 거부(_validate_sql_content)해 불가. → RULE 이 owner 독립 + 검증기 통과하는
-- 유일한 구조 enforcement(silent no-op, 행은 구조적으로 불변). 별도 read-only role 미존재.
--
-- ★ '현재' 스냅샷 = 최신 created_at 행(WHERE status='active'). 상태전이 UPDATE 없음(append-only).
-- dispute = status='disputed' + supersedes_id 로 특정 스냅샷 무효화(새 INSERT). 표면이 disputed 제외.
--
-- runner = exec_driver_sql(simple protocol) → multi-statement 처리(001_initial_schema 선례, 18 stmt).
-- BEGIN/COMMIT/ROLLBACK 없음(검증기 통과). CREATE RULE 은 IF NOT EXISTS 미지원 → OR REPLACE 로 idempotent.
CREATE TABLE IF NOT EXISTS eid_study_weakness (
id BIGSERIAL PRIMARY KEY,
user_id BIGINT NOT NULL REFERENCES users(id) ON DELETE CASCADE,
weaknesses JSONB NOT NULL, -- [{topic_id,topic,chronic,relapsed,leech,coverage_gap,unsure,overdue,trend,tier}]
habit_signals JSONB NOT NULL, -- {avoidance_topics,session_abandon_rate,stale_due_count,skew_topics}
trend_label VARCHAR(20) NOT NULL, -- overall 개선|정체|악화 (코드 산출)
sample_attempts INTEGER NOT NULL DEFAULT 0,
is_shallow_sample BOOLEAN NOT NULL DEFAULT false,
status VARCHAR(20) NOT NULL DEFAULT 'active', -- 'active' | 'disputed'(dispute marker)
supersedes_id BIGINT REFERENCES eid_study_weakness(id) ON DELETE SET NULL,
actor VARCHAR(20) NOT NULL, -- 스탬프(no default) = 'eid'
source_generated_at TIMESTAMPTZ NOT NULL, -- 스탬프(no default) = 집계 시점
created_at TIMESTAMPTZ NOT NULL DEFAULT now()
);
CREATE OR REPLACE RULE eid_study_weakness_no_update AS ON UPDATE TO eid_study_weakness DO INSTEAD NOTHING;
CREATE OR REPLACE RULE eid_study_weakness_no_delete AS ON DELETE TO eid_study_weakness DO INSTEAD NOTHING;
CREATE INDEX IF NOT EXISTS idx_eid_weakness_user_current
ON eid_study_weakness (user_id, created_at DESC) WHERE status = 'active';
@@ -1,16 +0,0 @@
-- 301_eid_study_weakness_table.sql — 301_eid_study_weakness.sql 분리본 (single-statement). asyncpg 러너가 prepared statement 로 처리해 multi-statement 거부 → 1 stmt/파일. 내용 불변.
CREATE TABLE IF NOT EXISTS eid_study_weakness (
id BIGSERIAL PRIMARY KEY,
user_id BIGINT NOT NULL REFERENCES users(id) ON DELETE CASCADE,
weaknesses JSONB NOT NULL, -- [{topic_id,topic,chronic,relapsed,leech,coverage_gap,unsure,overdue,trend,tier}]
habit_signals JSONB NOT NULL, -- {avoidance_topics,session_abandon_rate,stale_due_count,skew_topics}
trend_label VARCHAR(20) NOT NULL, -- overall 개선|정체|악화 (코드 산출)
sample_attempts INTEGER NOT NULL DEFAULT 0,
is_shallow_sample BOOLEAN NOT NULL DEFAULT false,
status VARCHAR(20) NOT NULL DEFAULT 'active', -- 'active' | 'disputed'(dispute marker)
supersedes_id BIGINT REFERENCES eid_study_weakness(id) ON DELETE SET NULL,
actor VARCHAR(20) NOT NULL, -- 스탬프(no default) = 'eid'
source_generated_at TIMESTAMPTZ NOT NULL, -- 스탬프(no default) = 집계 시점
created_at TIMESTAMPTZ NOT NULL DEFAULT now()
);
+26
View File
@@ -0,0 +1,26 @@
-- 302_eid_review_set_draft.sql
-- 이드 복습세트 초안 (append-only derived-fact). 워커가 약점 스냅샷에서 권장 복습세트를 '제안'만 한다.
-- study overlay 항목6: "복습세트를 실제 복습 큐에 편성은 자율로 못 한다 — 초안만 제시, 사용자 1클릭".
-- 실제 편성(study_question_progress.due_at 편집)은 별도 T2 액션 — 이 draft 는 불변 제안 기록.
--
-- append-only 구조강제(=301 동일): actor·source_generated_at NOT NULL no-default(스탬프) + RULE(불변).
-- 상태전이 없음 — '현재 제안' = 최신 created_at. 새 제안은 supersedes_id 로 이전 것 가리킴(새 INSERT).
-- question_ids = ordered list[int] snapshot (study_quiz_sessions.question_ids 패턴, junction 안 씀).
CREATE TABLE IF NOT EXISTS eid_review_set_draft (
id BIGSERIAL PRIMARY KEY,
user_id BIGINT NOT NULL REFERENCES users(id) ON DELETE CASCADE,
study_topic_id BIGINT REFERENCES study_topics(id) ON DELETE CASCADE, -- nullable = cross-topic 세트
question_ids JSONB NOT NULL, -- ordered list[int]
reason VARCHAR(40) NOT NULL, -- chronic | relapse | coverage | overdue
actor VARCHAR(20) NOT NULL, -- 스탬프 = 'eid'
source_weakness_id BIGINT REFERENCES eid_study_weakness(id) ON DELETE SET NULL,
source_generated_at TIMESTAMPTZ NOT NULL, -- 스탬프
supersedes_id BIGINT REFERENCES eid_review_set_draft(id) ON DELETE SET NULL,
created_at TIMESTAMPTZ NOT NULL DEFAULT now()
);
CREATE OR REPLACE RULE eid_review_set_draft_no_update AS ON UPDATE TO eid_review_set_draft DO INSTEAD NOTHING;
CREATE OR REPLACE RULE eid_review_set_draft_no_delete AS ON DELETE TO eid_review_set_draft DO INSTEAD NOTHING;
CREATE INDEX IF NOT EXISTS idx_eid_review_draft_user ON eid_review_set_draft (user_id, created_at DESC);
@@ -1,3 +0,0 @@
-- 302_eid_study_weakness_no_update.sql — 301_eid_study_weakness.sql 분리본 (single-statement). asyncpg 러너가 prepared statement 로 처리해 multi-statement 거부 → 1 stmt/파일. 내용 불변.
CREATE OR REPLACE RULE eid_study_weakness_no_update AS ON UPDATE TO eid_study_weakness DO INSTEAD NOTHING;
@@ -1,3 +0,0 @@
-- 303_eid_study_weakness_no_delete.sql — 301_eid_study_weakness.sql 분리본 (single-statement). asyncpg 러너가 prepared statement 로 처리해 multi-statement 거부 → 1 stmt/파일. 내용 불변.
CREATE OR REPLACE RULE eid_study_weakness_no_delete AS ON DELETE TO eid_study_weakness DO INSTEAD NOTHING;
+27
View File
@@ -0,0 +1,27 @@
-- 303_eid_weekly_recap.sql
-- 이드 주간 회고 카드 (append-only derived-fact). 회고 워커(scaffold, 미배선 — W4/Phase2)가 산출.
-- recap overlay: 'T1 write 자율 eid_weekly_recap(append-only)'. 미결 액션아이템 open/done UPDATE 는
-- events 측(가변)이지 이 카드가 아님 — 카드 자체는 불변 스냅샷.
-- 현재는 통합 migration 의 scaffold 테이블(dispatch enum WRITE_WEEKLY_RECAP 의 write target 예약).
--
-- append-only 구조강제(=301 동일): 스탬프 NOT NULL no-default + RULE(불변). '현재' = 최신 created_at.
CREATE TABLE IF NOT EXISTS eid_weekly_recap (
id BIGSERIAL PRIMARY KEY,
user_id BIGINT NOT NULL REFERENCES users(id) ON DELETE CASCADE,
period_start DATE NOT NULL,
period_end DATE NOT NULL,
recap JSONB NOT NULL, -- {activity_summary, open_action_items:[event_id], recurring_topics}
trend_label VARCHAR(20),
status VARCHAR(20) NOT NULL DEFAULT 'active', -- 'active' | 'disputed'
supersedes_id BIGINT REFERENCES eid_weekly_recap(id) ON DELETE SET NULL,
actor VARCHAR(20) NOT NULL, -- 스탬프 = 'eid'
source_generated_at TIMESTAMPTZ NOT NULL, -- 스탬프
created_at TIMESTAMPTZ NOT NULL DEFAULT now()
);
CREATE OR REPLACE RULE eid_weekly_recap_no_update AS ON UPDATE TO eid_weekly_recap DO INSTEAD NOTHING;
CREATE OR REPLACE RULE eid_weekly_recap_no_delete AS ON DELETE TO eid_weekly_recap DO INSTEAD NOTHING;
CREATE INDEX IF NOT EXISTS idx_eid_recap_user_current
ON eid_weekly_recap (user_id, created_at DESC) WHERE status = 'active';
+24
View File
@@ -0,0 +1,24 @@
-- 304_approval_requests.sql
-- 외부 전송 승인 큐 (★ 가변 workflow queue — append-only 아님). 설계 3-4 명시 카브아웃:
-- "approval_requests 는 status 를 pending→approved 로 바꾸는 가변 state 라 eid_* 불변 REVOKE/RULE 대상 아님".
-- → 여기엔 RULE(append-only) 안 건다. status 전이(UPDATE) 허용.
--
-- ★ Phase1 현재: app/eid/tools/dispatch.py 의 request_external_approval = 즉시 거부(INSERT 0).
-- dispatcher 워커(유일 egress 집행)는 Phase3. 이 테이블은 그때까지 scaffold(빈 상태).
-- ★ payload 는 고정 템플릿 슬롯만(free-form 금지) — app 층이 request_type 별 화이트리스트 검증.
-- 승인 UI 는 전송 body 전문 diff 노출. 불변 결정 원장이 필요하면 별도 append-only approval_events(Phase3).
CREATE TABLE IF NOT EXISTS approval_requests (
id BIGSERIAL PRIMARY KEY,
user_id BIGINT NOT NULL REFERENCES users(id) ON DELETE CASCADE,
request_type VARCHAR(40) NOT NULL, -- 고정 템플릿 슬롯 타입(app 화이트리스트)
payload JSONB NOT NULL, -- 고정 템플릿 슬롯만
status VARCHAR(20) NOT NULL DEFAULT 'pending', -- pending | approved | rejected (전이 허용)
requester VARCHAR(20) NOT NULL, -- 'eid'
decided_by VARCHAR(40),
decided_at TIMESTAMPTZ,
created_at TIMESTAMPTZ NOT NULL DEFAULT now(),
updated_at TIMESTAMPTZ NOT NULL DEFAULT now()
);
CREATE INDEX IF NOT EXISTS idx_approval_requests_status ON approval_requests (status, created_at);
@@ -1,4 +0,0 @@
-- 304_eid_study_weakness_idx.sql — 301_eid_study_weakness.sql 분리본 (single-statement). asyncpg 러너가 prepared statement 로 처리해 multi-statement 거부 → 1 stmt/파일. 내용 불변.
CREATE INDEX IF NOT EXISTS idx_eid_weakness_user_current
ON eid_study_weakness (user_id, created_at DESC) WHERE status = 'active';
@@ -1,14 +0,0 @@
-- 305_eid_review_set_draft_table.sql — 302_eid_review_set_draft.sql 분리본 (single-statement). asyncpg 러너가 prepared statement 로 처리해 multi-statement 거부 → 1 stmt/파일. 내용 불변.
CREATE TABLE IF NOT EXISTS eid_review_set_draft (
id BIGSERIAL PRIMARY KEY,
user_id BIGINT NOT NULL REFERENCES users(id) ON DELETE CASCADE,
study_topic_id BIGINT REFERENCES study_topics(id) ON DELETE CASCADE, -- nullable = cross-topic 세트
question_ids JSONB NOT NULL, -- ordered list[int]
reason VARCHAR(40) NOT NULL, -- chronic | relapse | coverage | overdue
actor VARCHAR(20) NOT NULL, -- 스탬프 = 'eid'
source_weakness_id BIGINT REFERENCES eid_study_weakness(id) ON DELETE SET NULL,
source_generated_at TIMESTAMPTZ NOT NULL, -- 스탬프
supersedes_id BIGINT REFERENCES eid_review_set_draft(id) ON DELETE SET NULL,
created_at TIMESTAMPTZ NOT NULL DEFAULT now()
);
+33
View File
@@ -0,0 +1,33 @@
-- 305_eid_schedule_views.sql
-- 이드 일정(schedule_brief, 미래 surface) 파생뷰 2. 신규 schedule 테이블 0 — events/events_history 재활용.
-- quadrant(중요×긴급)·D-N 정렬은 app 층(schedule overlay). 뷰는 raw 입력 필드 + today/defer 집계만.
-- CREATE VIEW 선례 = 010_soft_delete / 283_corpus_chunks. BEGIN/COMMIT 없음.
--
-- v_schedule_today: 오늘(Asia/Seoul local day) 활성 일정. active 필터 = events.py:list_today reference.
-- today 경계 = Seoul 자정→UTC 변환(date_trunc ... AT TIME ZONE 왕복). LATERAL 로 1회 계산.
-- v_schedule_defer_pattern: events_history change_kind IN(defer,reschedule) 를 event_id 별 COUNT.
-- '반복 미룸' 임계 3회+ (schedule overlay 판단근거 #5). reactivate 는 제외.
CREATE OR REPLACE VIEW v_schedule_today AS
SELECT e.id, e.user_id, e.title, e.kind, e.status, e.priority,
e.due_at, e.start_at, e.end_at, e.started_at, e.defer_until, e.project_tag
FROM events e
CROSS JOIN LATERAL (
SELECT (date_trunc('day', now() AT TIME ZONE 'Asia/Seoul') AT TIME ZONE 'Asia/Seoul') AS lo
) b
WHERE (e.status IN ('inbox','next','scheduled','in_progress')
OR (e.status = 'deferred' AND e.defer_until IS NOT NULL AND e.defer_until <= now()))
AND (
(e.kind = 'task' AND e.due_at >= b.lo AND e.due_at < b.lo + interval '1 day')
OR (e.kind = 'calendar_event' AND e.start_at >= b.lo AND e.start_at < b.lo + interval '1 day')
OR (e.kind = 'activity_log' AND e.started_at >= b.lo AND e.started_at < b.lo + interval '1 day')
);
CREATE OR REPLACE VIEW v_schedule_defer_pattern AS
SELECT eh.event_id,
COUNT(*)::int AS defer_reschedule_count,
MAX(eh.changed_at) AS last_changed_at,
(COUNT(*) >= 3) AS is_repeat_defer
FROM events_history eh
WHERE eh.change_kind IN ('defer','reschedule')
GROUP BY eh.event_id;
@@ -1,3 +0,0 @@
-- 306_eid_review_set_draft_no_update.sql — 302_eid_review_set_draft.sql 분리본 (single-statement). asyncpg 러너가 prepared statement 로 처리해 multi-statement 거부 → 1 stmt/파일. 내용 불변.
CREATE OR REPLACE RULE eid_review_set_draft_no_update AS ON UPDATE TO eid_review_set_draft DO INSTEAD NOTHING;
@@ -1,3 +0,0 @@
-- 307_eid_review_set_draft_no_delete.sql — 302_eid_review_set_draft.sql 분리본 (single-statement). asyncpg 러너가 prepared statement 로 처리해 multi-statement 거부 → 1 stmt/파일. 내용 불변.
CREATE OR REPLACE RULE eid_review_set_draft_no_delete AS ON DELETE TO eid_review_set_draft DO INSTEAD NOTHING;
@@ -1,3 +0,0 @@
-- 308_eid_review_set_draft_idx.sql — 302_eid_review_set_draft.sql 분리본 (single-statement). asyncpg 러너가 prepared statement 로 처리해 multi-statement 거부 → 1 stmt/파일. 내용 불변.
CREATE INDEX IF NOT EXISTS idx_eid_review_draft_user ON eid_review_set_draft (user_id, created_at DESC);
-15
View File
@@ -1,15 +0,0 @@
-- 309_eid_weekly_recap_table.sql — 303_eid_weekly_recap.sql 분리본 (single-statement). asyncpg 러너가 prepared statement 로 처리해 multi-statement 거부 → 1 stmt/파일. 내용 불변.
CREATE TABLE IF NOT EXISTS eid_weekly_recap (
id BIGSERIAL PRIMARY KEY,
user_id BIGINT NOT NULL REFERENCES users(id) ON DELETE CASCADE,
period_start DATE NOT NULL,
period_end DATE NOT NULL,
recap JSONB NOT NULL, -- {activity_summary, open_action_items:[event_id], recurring_topics}
trend_label VARCHAR(20),
status VARCHAR(20) NOT NULL DEFAULT 'active', -- 'active' | 'disputed'
supersedes_id BIGINT REFERENCES eid_weekly_recap(id) ON DELETE SET NULL,
actor VARCHAR(20) NOT NULL, -- 스탬프 = 'eid'
source_generated_at TIMESTAMPTZ NOT NULL, -- 스탬프
created_at TIMESTAMPTZ NOT NULL DEFAULT now()
);
@@ -1,3 +0,0 @@
-- 310_eid_weekly_recap_no_update.sql — 303_eid_weekly_recap.sql 분리본 (single-statement). asyncpg 러너가 prepared statement 로 처리해 multi-statement 거부 → 1 stmt/파일. 내용 불변.
CREATE OR REPLACE RULE eid_weekly_recap_no_update AS ON UPDATE TO eid_weekly_recap DO INSTEAD NOTHING;
@@ -1,3 +0,0 @@
-- 311_eid_weekly_recap_no_delete.sql — 303_eid_weekly_recap.sql 분리본 (single-statement). asyncpg 러너가 prepared statement 로 처리해 multi-statement 거부 → 1 stmt/파일. 내용 불변.
CREATE OR REPLACE RULE eid_weekly_recap_no_delete AS ON DELETE TO eid_weekly_recap DO INSTEAD NOTHING;
-4
View File
@@ -1,4 +0,0 @@
-- 312_eid_weekly_recap_idx.sql — 303_eid_weekly_recap.sql 분리본 (single-statement). asyncpg 러너가 prepared statement 로 처리해 multi-statement 거부 → 1 stmt/파일. 내용 불변.
CREATE INDEX IF NOT EXISTS idx_eid_recap_user_current
ON eid_weekly_recap (user_id, created_at DESC) WHERE status = 'active';
@@ -1,14 +0,0 @@
-- 313_approval_requests_table.sql — 304_approval_requests.sql 분리본 (single-statement). asyncpg 러너가 prepared statement 로 처리해 multi-statement 거부 → 1 stmt/파일. 내용 불변.
CREATE TABLE IF NOT EXISTS approval_requests (
id BIGSERIAL PRIMARY KEY,
user_id BIGINT NOT NULL REFERENCES users(id) ON DELETE CASCADE,
request_type VARCHAR(40) NOT NULL, -- 고정 템플릿 슬롯 타입(app 화이트리스트)
payload JSONB NOT NULL, -- 고정 템플릿 슬롯만
status VARCHAR(20) NOT NULL DEFAULT 'pending', -- pending | approved | rejected (전이 허용)
requester VARCHAR(20) NOT NULL, -- 'eid'
decided_by VARCHAR(40),
decided_at TIMESTAMPTZ,
created_at TIMESTAMPTZ NOT NULL DEFAULT now(),
updated_at TIMESTAMPTZ NOT NULL DEFAULT now()
);
-3
View File
@@ -1,3 +0,0 @@
-- 314_approval_requests_idx.sql — 304_approval_requests.sql 분리본 (single-statement). asyncpg 러너가 prepared statement 로 처리해 multi-statement 거부 → 1 stmt/파일. 내용 불변.
CREATE INDEX IF NOT EXISTS idx_approval_requests_status ON approval_requests (status, created_at);
@@ -1,16 +0,0 @@
-- 315_eid_schedule_views_v_schedule_today.sql — 305_eid_schedule_views.sql 분리본 (single-statement). asyncpg 러너가 prepared statement 로 처리해 multi-statement 거부 → 1 stmt/파일. 내용 불변.
CREATE OR REPLACE VIEW v_schedule_today AS
SELECT e.id, e.user_id, e.title, e.kind, e.status, e.priority,
e.due_at, e.start_at, e.end_at, e.started_at, e.defer_until, e.project_tag
FROM events e
CROSS JOIN LATERAL (
SELECT (date_trunc('day', now() AT TIME ZONE 'Asia/Seoul') AT TIME ZONE 'Asia/Seoul') AS lo
) b
WHERE (e.status IN ('inbox','next','scheduled','in_progress')
OR (e.status = 'deferred' AND e.defer_until IS NOT NULL AND e.defer_until <= now()))
AND (
(e.kind = 'task' AND e.due_at >= b.lo AND e.due_at < b.lo + interval '1 day')
OR (e.kind = 'calendar_event' AND e.start_at >= b.lo AND e.start_at < b.lo + interval '1 day')
OR (e.kind = 'activity_log' AND e.started_at >= b.lo AND e.started_at < b.lo + interval '1 day')
);
@@ -1,10 +0,0 @@
-- 316_eid_schedule_views_v_schedule_defer_pattern.sql — 305_eid_schedule_views.sql 분리본 (single-statement). asyncpg 러너가 prepared statement 로 처리해 multi-statement 거부 → 1 stmt/파일. 내용 불변.
CREATE OR REPLACE VIEW v_schedule_defer_pattern AS
SELECT eh.event_id,
COUNT(*)::int AS defer_reschedule_count,
MAX(eh.changed_at) AS last_changed_at,
(COUNT(*) >= 3) AS is_repeat_defer
FROM events_history eh
WHERE eh.change_kind IN ('defer','reschedule')
GROUP BY eh.event_id;
-18
View File
@@ -1,18 +0,0 @@
-- 317_documents_dedup_fields.sql
-- S1-ADD (plan ds-s1-backend-1, A-1): 원본 파일명 + 중복검사 메타 3컬럼.
-- 계약: ds-app contract/CONTRACT.md [S1-ADD] — original_filename / duplicate_of / duplicate_count.
--
-- asyncpg exec_driver_sql 단일 statement 제약 — ALTER TABLE 다중 ADD COLUMN 절은 단일 statement 라 허용.
-- BEGIN/COMMIT 금지. PG 16: ADD COLUMN ... DEFAULT <constant> 는 fast path (table rewrite 없음).
-- duplicate_of self-FK 는 신규 all-NULL 컬럼이라 검증 스캔 trivial (NOT VALID 불요).
-- ON DELETE SET NULL: 원본(canonical) hard delete 허용 (RESTRICT=삭제 차단 / CASCADE=사본 연쇄삭제 위험 회피).
-- 기존 dup 그룹(law_monitor 제외)의 duplicate_of/duplicate_count backfill 은 B-4 별 배치 스크립트.
-- 28,941행 대량 UPDATE 를 startup migration(단일 트랜잭션)에 넣지 않는다.
--
-- original_filename 은 original_format(ODF 변환용)·original_path/original_hash(migration 007 legacy dead,
-- app 코드 미참조 — P0-1 grep 0건) 와 의미가 다르다: 업로드 시점 원본 파일명(다운로드 라벨용).
ALTER TABLE documents
ADD COLUMN IF NOT EXISTS original_filename TEXT,
ADD COLUMN IF NOT EXISTS duplicate_of BIGINT REFERENCES documents(id) ON DELETE SET NULL,
ADD COLUMN IF NOT EXISTS duplicate_count INTEGER NOT NULL DEFAULT 0;
-90
View File
@@ -1,90 +0,0 @@
"""기존 file_hash 중복 그룹 backfill — plan ds-s1-backend-1 B-4.
목적:
A-1 migration 287 추가된 duplicate_of / duplicate_count *기존* 중복 그룹에 채운다.
migration(단일 트랜잭션) 분리한 배치(database.py:29-30 정책 대량 UPDATE
startup migration 넣지 않는다). 업로드 시점 채움(B-1) 신규 행만 다루므로 과거는 스크립트.
판정:
- file_hash exact 그룹(OFF-whitelist=law_monitor 제외, deleted 제외, count>1).
near_duplicate 영속화 보류(on-the-fly) 여기서 다루지 않는다.
- canonical = 그룹 최古(min id). canonical.duplicate_of=NULL, duplicate_count=group_size-1.
- -canonical 멤버 = duplicate_of=canonical, duplicate_count=0.
안전:
- 멱등 이미 목표값인 행은 UPDATE (재실행 안전). --dry-run 적용될 정확한 set 미리보기.
- --chunk(기본 500)/txn 청크 커밋 28,941 단일 트랜잭션 lock 회피.
실행:
docker compose exec fastapi python /app/scripts/backfill_dedup.py --dry-run
docker compose exec fastapi python /app/scripts/backfill_dedup.py --apply
# 변경 전 안전망은 E-3 pre-B-4 pg_dump (별 단계).
"""
import argparse
import asyncio
import os
import sys
sys.path.insert(0, os.path.join(os.path.dirname(__file__), "..", "app"))
from sqlalchemy.ext.asyncio import AsyncSession, async_sessionmaker, create_async_engine
from services.dedup import reconcile_dedup # 코어 재계산 (야간 잡과 공유)
async def run(*, apply: bool, chunk_size: int) -> int:
database_url = os.getenv(
"DATABASE_URL", "postgresql+asyncpg://pkm:pkm@localhost:5432/pkm"
)
engine = create_async_engine(database_url)
session_factory = async_sessionmaker(engine, class_=AsyncSession, expire_on_commit=False)
try:
async with session_factory() as session:
result = await reconcile_dedup(session, apply=apply, chunk_size=chunk_size)
print(f"=== dedup 그룹 {result['groups']}개 · 관련 문서 {result['docs']}건 ===")
if result["groups"] == 0:
print("dedup 그룹 없음(OFF-whitelist 제외 후 count>1 없음) — 종료.")
return 0
already = result["docs"] - result["changes"]
print(f"변경 필요 {result['changes']}건 / 이미 목표값 {already}건 (멱등)")
if result["changes"] == 0:
print("모두 목표값 — 적용할 변경 없음.")
return 0
# 적용될/된 정확한 UPDATE set 미리보기 (상위 40건)
print("\n=== UPDATE set (id → duplicate_of / duplicate_count) ===")
for s in result["sample"]:
role = "canonical" if s["duplicate_of"] is None else f"dup→{s['duplicate_of']}"
print(
f" id={s['id']:>7} duplicate_of={s['duplicate_of']} "
f"duplicate_count={s['duplicate_count']} [{role}]"
)
if result["changes"] > len(result["sample"]):
print(f" ... 외 {result['changes'] - len(result['sample'])}")
if not apply:
print(f"\n[dry-run] {result['changes']}건 변경 예정. --apply 로 실제 적용.")
else:
print(f"\n[apply] 완료 — {result['applied']}건 갱신.")
return 0
finally:
await engine.dispose()
def main() -> int:
parser = argparse.ArgumentParser(description=__doc__)
parser.add_argument("--apply", action="store_true", help="실제 적용 (기본 dry-run)")
parser.add_argument("--dry-run", action="store_true", help="명시적 dry-run (default 동등)")
parser.add_argument("--chunk", type=int, default=500, help="txn 당 UPDATE 행 수 (기본 500)")
args = parser.parse_args()
if args.apply and args.dry_run:
parser.error("--apply 와 --dry-run 동시 지정 불가")
return asyncio.run(run(apply=args.apply, chunk_size=args.chunk))
if __name__ == "__main__":
sys.exit(main())
-146
View File
@@ -1,146 +0,0 @@
"""과거 office/hwp pending 문서 markdown stage 백필 — plan ds-s1-backend-1 C-4.
신규 ingest classifymarkdown 전이(queue_consumer.py:142) 자동 도달하므로 스크립트는
*과거* office/hwp 행만 다룬다. C-2 office_md 변환을 붙이기 전까지 markdown stage 에서
skip 행들을 다시 큐에 넣어 md_content 생성한다.
대상 (WHERE):
- file_format IN (office_md 지원 실값: docx, xlsx, pptx, hwp, hwpx)
제외 = file_format. INCLUDE 필터가 article(file_format='article') 구조적으로 배제
P0-3 가드(md 없는 article completed 도달 금지, correctness-critical). source_channel 불필요.
레거시 바이너리(.doc/.xls/.ppt) markitdown 미지원 기본 목록 제외(넣어도 marker skip).
- md_status = 'pending' (이미 success/failed/skipped 건드리지 않음)
- extracted_text IS NOT NULL (폴백 존재 모집단)
C-5 failed-postcondition 상속: 변환 실패는 md_status='failed' 시끄럽게 남는다(앱이
'변환 실패' 표시). extracted_text NULL office(폴백 없음) 배제 실패 시끄러운
집합이라 phase2 재검토(C-4 배제 honest).
스케줄:
C-2 라이브 office ingestion 백필 비중첩 markdown 워커는 BATCH=1 직렬이라
야간 단발로 돌려 라이브 office 업로드 stall 회피(plan C-2 reflection).
실행:
docker compose exec fastapi python /app/scripts/backfill_nonpdf_markdown.py --dry-run
docker compose exec fastapi python /app/scripts/backfill_nonpdf_markdown.py --apply
docker compose exec fastapi python /app/scripts/backfill_nonpdf_markdown.py --apply --limit 50
"""
import argparse
import asyncio
import json
import os
import sys
sys.path.insert(0, os.path.join(os.path.dirname(__file__), "..", "app"))
from sqlalchemy import bindparam, text
from sqlalchemy.ext.asyncio import AsyncSession, async_sessionmaker, create_async_engine
# office_md 가 실제 변환하는 file_format(확장자 소문자, 점 없음). 단일 source.
DEFAULT_FORMATS = ("docx", "xlsx", "pptx", "hwp", "hwpx")
CANDIDATES_SQL = text(
"""
SELECT id, file_format, title, file_path
FROM documents
WHERE deleted_at IS NULL
AND md_status = 'pending'
AND extracted_text IS NOT NULL
AND file_format IN :formats
ORDER BY id
"""
).bindparams(bindparam("formats", expanding=True))
# 활성 markdown 큐 행이 없는 doc 만 통과 (UNIQUE 부분 인덱스). 충돌 = silent skip.
ENQUEUE_SQL = text(
"""
INSERT INTO processing_queue (document_id, stage, status, payload)
VALUES (:doc_id, 'markdown', 'pending', CAST(:payload AS jsonb))
ON CONFLICT DO NOTHING
"""
)
def _chunks(seq, size):
for i in range(0, len(seq), size):
yield seq[i : i + size]
async def run(*, apply: bool, formats: tuple[str, ...], limit: int | None, chunk_size: int) -> int:
database_url = os.getenv(
"DATABASE_URL", "postgresql+asyncpg://pkm:pkm@localhost:5432/pkm"
)
engine = create_async_engine(database_url)
session_factory = async_sessionmaker(engine, class_=AsyncSession, expire_on_commit=False)
try:
async with session_factory() as session:
rows = (
await session.execute(CANDIDATES_SQL, {"formats": list(formats)})
).all()
if limit:
rows = rows[:limit]
print(f"=== office/hwp pending 후보 = {len(rows)}건 (formats={','.join(formats)}) ===")
if not rows:
print("후보 없음 — 종료.")
return 0
by_fmt: dict[str, int] = {}
for r in rows:
by_fmt[r.file_format] = by_fmt.get(r.file_format, 0) + 1
print("포맷별:", ", ".join(f"{k}={v}" for k, v in sorted(by_fmt.items())))
for r in rows[:20]:
print(f" id={r.id:>7} {r.file_format:<5} {(r.title or '')[:70]}")
if len(rows) > 20:
print(f" ... 외 {len(rows) - 20}")
if not apply:
print(f"\n[dry-run] {len(rows)}건 markdown 큐 enqueue 예정. --apply 로 실제 적용.")
print(" (적용 전 C-2 라이브 office ingestion 과 비중첩 야간창 확인.)")
return 0
payload = json.dumps(
{"force_reprocess": True, "reason": "c4_nonpdf_markdown_backfill"}
)
inserted = 0
processed = 0
for batch in _chunks(rows, chunk_size):
for r in batch:
result = await session.execute(
ENQUEUE_SQL, {"doc_id": r.id, "payload": payload}
)
if result.rowcount > 0:
inserted += 1
await session.commit()
processed += len(batch)
print(f"[apply] {processed}/{len(rows)} 처리 (enqueue 누적 {inserted})")
print(f"\n[apply] 완료 — {inserted}/{len(rows)} 신규 markdown 큐 추가.")
print(" (skip = 이미 활성 markdown 큐 행이 있는 문서)")
return 0
finally:
await engine.dispose()
def main() -> int:
parser = argparse.ArgumentParser(description=__doc__)
parser.add_argument("--apply", action="store_true", help="실제 enqueue (기본 dry-run)")
parser.add_argument("--dry-run", action="store_true", help="명시적 dry-run (default 동등)")
parser.add_argument(
"--formats", type=str, default=",".join(DEFAULT_FORMATS),
help=f"쉼표 구분 file_format (기본 {','.join(DEFAULT_FORMATS)})",
)
parser.add_argument("--limit", type=int, default=None, help="후보 상한(샘플 검증용)")
parser.add_argument("--chunk", type=int, default=200, help="enqueue txn 청크 크기")
args = parser.parse_args()
if args.apply and args.dry_run:
parser.error("--apply 와 --dry-run 동시 지정 불가")
formats = tuple(f.strip().lower() for f in args.formats.split(",") if f.strip())
return asyncio.run(
run(apply=args.apply, formats=formats, limit=args.limit, chunk_size=args.chunk)
)
if __name__ == "__main__":
sys.exit(main())
-77
View File
@@ -1,77 +0,0 @@
#!/usr/bin/env python3
"""C-1 PoC 하니스 — office/hwp → md 변환 품질(특히 표 fidelity) 측정.
plan ds-s1-backend-1 C-1/E-1:
- hwp/hwpx 결과는 LibreOffice 버전 의존 **prod extract_worker 동일 버전(버전핀 안전컨텍스트)** 에서 실행해야
신호가 transfer . live worker job 태우는 아님(점유 0).
- OOXML markitdown(신규 dep): `pip install markitdown`.
- 샘플은 trivial 말고 **대표 복잡본**(법령·KGS 중심 .hwp/.hwpx, 병합셀/다중시트 xlsx).
사용:
python scripts/poc_office_md.py <file_or_dir> [<file_or_dir> ...]
# 예: 현 코퍼스 백필 후보(doc/docx/xls/xlsx/hwp) 샘플 디렉토리
python scripts/poc_office_md.py ~/poc_samples/
파일: 변환 성공 char/ 행수/heading 지표 + 본문 미리보기.
실패(OfficeMdError) FAILED 출력 이것이 C-5 md_status='failed' 라우팅할 케이스(설계대로).
"""
from __future__ import annotations
import os
import sys
from pathlib import Path
# app/ 를 path 에 (모듈 import 용).
sys.path.insert(0, os.path.join(os.path.dirname(__file__), "..", "app"))
from workers.office_md import SUPPORTED, OfficeMdError, convert_office_to_md, table_fidelity # noqa: E402
def _iter_targets(args: list[str]):
for a in args:
p = Path(a).expanduser()
if p.is_dir():
for child in sorted(p.rglob("*")):
if child.is_file() and child.suffix.lower() in SUPPORTED:
yield child
elif p.is_file():
yield p
else:
print(f" (skip, 경로 없음: {p})")
def main(argv: list[str]) -> int:
if not argv:
print(__doc__)
return 2
targets = list(_iter_targets(argv))
if not targets:
print("변환 대상(.docx/.xlsx/.pptx/.hwp/.hwpx) 없음.")
return 1
ok = fail = 0
for path in targets:
print(f"\n=== {path.name} ({path.suffix.lower()}) ===")
try:
md = convert_office_to_md(path)
except OfficeMdError as e:
fail += 1
print(f" FAILED → (C-5 가 md_status='failed' 라우팅) : {e}")
continue
ok += 1
fid = table_fidelity(md)
print(f" OK chars={fid['chars']} lines={fid['lines']} "
f"table_rows={fid['table_pipe_rows']} (sep≈표수 {fid['table_separator_rows']}) "
f"heading={fid['has_heading']}")
preview = "\n".join(f" | {ln}" for ln in md.splitlines()[:12])
print(preview)
print(f"\n--- 합계: OK {ok} / FAILED {fail} / 총 {len(targets)} ---")
print("표 fidelity 가 낮으면(table_rows 0 등) 해당 포맷은 변환기/필터 재검토 — "
"OOXML↔markitdown, hwp/hwpx↔LibreOffice 경계를 데이터로 확정(C-1).")
return 0 if fail == 0 else 1
if __name__ == "__main__":
raise SystemExit(main(sys.argv[1:]))
-18
View File
@@ -1,18 +0,0 @@
-- rollback_317.sql — plan ds-s1-backend-1 E-3. migration 317(dedup 3컬럼) 되돌림.
--
-- ★ migrations/ 밖에 둔다 — init_db() 자동 스캔(NNN_*.sql) 대상이 아니므로 자동 적용되지 않는다.
-- 수동 실행 전용:
-- docker compose cp scripts/rollback_317.sql postgres:/tmp/rollback_317.sql
-- docker compose exec -T postgres psql -U pkm -d pkm -f /tmp/rollback_317.sql
-- (또는) docker compose exec -T postgres psql -U pkm -d pkm < scripts/rollback_317.sql
--
-- 주의: original_filename / duplicate_of / duplicate_count 데이터 영구 삭제(B-1 채움·B-4 backfill 결과 포함).
-- schema_migrations 의 317 행도 함께 제거해야 재적용(다음 startup)이 가능하다.
-- 전체 복원이 필요하면 E-3 pre-change pg_dump 를 쓴다(이 스크립트는 '컬럼만 빠른 롤백').
ALTER TABLE documents
DROP COLUMN IF EXISTS duplicate_of,
DROP COLUMN IF EXISTS duplicate_count,
DROP COLUMN IF EXISTS original_filename;
DELETE FROM schema_migrations WHERE version = 317;
-30
View File
@@ -1,30 +0,0 @@
#!/usr/bin/env bash
# pre-change pg_dump — plan ds-s1-backend-1 E-3.
# A-1(migration 287) / B-4 backfill 적용 *전* 안전망. repo cp -p 가 아니라 진짜 DB 덤프.
#
# 사용 (GPU 서버, repo 루트에서):
# bash scripts/s1_pre_change_backup.sh # pre-A-1
# bash scripts/s1_pre_change_backup.sh pre-b4 # pre-B-4 (라벨만 다름)
#
# 백업 위치 = repo 밖 (feedback_backup_outside_repo): $HOME/.local/share/ds-s1-backups/
set -euo pipefail
LABEL="${1:-pre-a1}"
DATE="$(date +%Y%m%d-%H%M%S)"
BACKUP_DIR="${BACKUP_DIR:-$HOME/.local/share/ds-s1-backups}"
mkdir -p "$BACKUP_DIR"
OUT="$BACKUP_DIR/pkm-${LABEL}-${DATE}.sql.gz"
echo "[s1-backup] pg_dump pkm → $OUT"
# 단일 pkm DB 덤프(pg_dumpall 아님). gzip 은 redirect(파일명 추측 함정 회피).
docker compose exec -T postgres pg_dump -U pkm -d pkm | gzip > "$OUT"
echo "[s1-backup] done: $(du -h "$OUT" | cut -f1)"
echo -n "[s1-backup] gzip 무결성: "
gzip -t "$OUT" && echo "OK"
echo
echo "[s1-backup] 롤백 옵션:"
echo " (a) 287 컬럼만 되돌림(빠름): scripts/rollback_287.sql 수동 실행"
echo " (b) 전체 복원: gunzip -c '$OUT' | docker compose exec -T postgres psql -U pkm -d pkm"
echo "[s1-backup] 보존 7일 권장. (DR-grade 검증은 ephemeral restore — D5 트랙, 본 안전망 범위 밖.)"
-96
View File
@@ -1,96 +0,0 @@
"""S1-ADD (plan ds-s1-backend-1) B-2 /duplicates shape + D-2 Range 파서 + dedup 상수 단위 검증.
순수 단위(DB 불요). 실행 환경 = app/ 의존성 설치 컨텍스트(devsbx/GPU) 기존
test_s1_dedup_shape.py 동일 부트스트랩. DB 타는 검증(find_canonical/near_dup/엔드포인트)
GPU read-only/통합 매트릭스(E-1)에서.
"""
from __future__ import annotations
import json
import logging
import os
import sys
from pathlib import Path
import pytest
# logs/ 가 운영 daemon 소유일 때 import-time FileHandler PermissionError 방어 (test 한정).
_orig_file_handler = logging.FileHandler
def _safe_file_handler(filename, *args, **kwargs): # type: ignore[no-untyped-def]
try:
return _orig_file_handler(filename, *args, **kwargs)
except PermissionError:
return logging.NullHandler()
logging.FileHandler = _safe_file_handler # type: ignore[assignment]
os.environ.setdefault("DATABASE_URL", "postgresql+asyncpg://test:test@localhost:5432/test")
sys.path.insert(0, os.path.join(os.path.dirname(__file__), "..", "app"))
from api.documents import ( # noqa: E402
DuplicateGroup,
DuplicatesResponse,
_parse_byte_range,
)
from services.dedup import DEDUP_OFF_CHANNELS # noqa: E402
_FIXDIR = Path(os.path.expanduser("~/Documents/code/ds-app/contract/fixtures"))
# ── 1. /duplicates 응답 shape = contract fixture ───────────────────────────────
def test_duplicates_response_shape_matches_total_formula():
# 엔드포인트 정의: total_duplicate_docs = Σ(멤버수-1). fixture 와 동일해야 함.
groups = [
DuplicateGroup(canonical_id=4912, members=[4912, 4977], reason="content_hash"),
DuplicateGroup(canonical_id=5120, members=[5120, 5121, 5260], reason="content_hash"),
]
total_dup = sum(len(g.members) - 1 for g in groups)
resp = DuplicatesResponse(
groups=groups, total_groups=len(groups), total_duplicate_docs=total_dup
)
assert resp.total_groups == 2
assert resp.total_duplicate_docs == 3 # (2-1)+(3-1)
@pytest.mark.skipif(not _FIXDIR.exists(), reason="ds-app contract fixtures 미존재")
def test_duplicates_contract_fixture_decodes():
payload = json.loads((_FIXDIR / "documents_duplicates.json").read_text())
m = DuplicatesResponse.model_validate(payload)
assert m.total_groups == payload["total_groups"]
assert m.total_duplicate_docs == payload["total_duplicate_docs"]
# Σ(멤버수-1) 정의가 fixture total 과 일치(계약 self-consistency).
assert sum(len(g.members) - 1 for g in m.groups) == payload["total_duplicate_docs"]
assert m.groups[0].canonical_id == payload["groups"][0]["canonical_id"]
# ── 2. D-2 Range 파서 (원격 백엔드 pass-through; local 은 FileResponse 자동) ──────
@pytest.mark.parametrize(
"header,size,expected",
[
(None, 1000, (None, None)),
("", 1000, (None, None)),
("bytes=0-99", 1000, (0, 99)),
("bytes=100-", 1000, (100, 999)), # 끝까지
("bytes=-200", 1000, (800, 999)), # suffix: 마지막 200
("bytes=0-99999", 1000, (0, 999)), # end clamp
("bytes=2000-3000", 1000, (None, None)), # start >= size → 무효(전체)
("bytes=abc-def", 1000, (None, None)), # 파싱 실패
("bytes=50-10", 1000, (None, None)), # start>end
("bytes=0-99", 0, (None, None)), # 빈 파일
],
)
def test_parse_byte_range(header, size, expected):
assert _parse_byte_range(header, size) == expected
# ── 3. dedup OFF-whitelist 단일 source ─────────────────────────────────────────
def test_dedup_off_channels_is_law_monitor_only():
# P0-2 결정: 단일 OFF-list = law_monitor (법령 개정본 보존). 확장은 의도적 결정으로만.
assert DEDUP_OFF_CHANNELS == ("law_monitor",)
-168
View File
@@ -1,168 +0,0 @@
"""S1-ADD (plan ds-s1-backend-1) A-4 — call-shape regression + md_status 매핑 동작 검증.
검증 대상 (값이 아니라 *동작*):
1. DB md_status='success' 응답 'completed' 단방향 매핑 (P0-3 silent-fallback 함정 가드의 backend 절반).
- partial/pending/failed/skipped/None 그대로 통과 ('success' 매핑).
2. [S1-ADD] 3필드(original_filename / duplicate_of / duplicate_count) 디코드 + 기본값(duplicate_count=0).
3. (있으면) ds-app contract fixtures 응답 모델로 디코드 계약 shape 비파괴.
주의 테스트는 backend 직렬화 절반만 커버한다.
앱이 'completed' 실제 md-first 렌더 분기로 태우는지(¬extracted_text) S3 fixture-render 테스트가 책임진다
(A 그룹 close = backend green AND S3 render green, owner 명기 plan A-4).
실행 환경: app/ 의존성 설치된 컨텍스트(devsbx/GPU). 순수 단위(DB 불요).
"""
from __future__ import annotations
import json
import logging
import os
import sys
from datetime import datetime, timezone
from pathlib import Path
import pytest
# logs/ 가 운영 daemon(root) 소유일 때 import-time FileHandler PermissionError 방어 (test 한정).
_orig_file_handler = logging.FileHandler
def _safe_file_handler(filename, *args, **kwargs): # type: ignore[no-untyped-def]
try:
return _orig_file_handler(filename, *args, **kwargs)
except PermissionError:
return logging.NullHandler()
logging.FileHandler = _safe_file_handler # type: ignore[assignment]
# api.documents import 가 SQLAlchemy engine init 를 트리거 — dummy DATABASE_URL (실제 connect X).
os.environ.setdefault("DATABASE_URL", "postgresql+asyncpg://test:test@localhost:5432/test")
# tests/ → 프로젝트 루트 → app/
sys.path.insert(0, os.path.join(os.path.dirname(__file__), "..", "app"))
from api.documents import ( # noqa: E402
DocumentDetailResponse,
DocumentListResponse,
DocumentResponse,
)
_NOW = datetime(2026, 6, 4, 8, 0, 0, tzinfo=timezone.utc)
def _base_detail(**overrides) -> dict:
"""DocumentDetailResponse 가 요구하는 전 필드(필수 포함) 완비 dict. overrides 로 일부 교체."""
d = {
"id": 4912,
"file_path": "Engineering/ASME/x.pdf",
"file_format": "pdf",
"file_size": 1338920,
"file_type": "document",
"title": "x",
"ai_domain": "Engineering",
"ai_sub_group": "압력용기",
"ai_tags": ["ASME"],
"ai_summary": "요약",
"document_type": "standard",
"importance": "high",
"ai_confidence": 0.9,
"user_note": None,
"user_tags": None,
"pinned": True,
"ask_includable": True,
"derived_path": None,
"original_format": "pdf",
"conversion_status": "completed",
"is_read": True,
"review_status": "approved",
"edit_url": None,
"preview_status": "ready",
"source_channel": "upload",
"data_origin": "external",
"doc_purpose": "reference",
"extracted_at": _NOW,
"ai_processed_at": _NOW,
"embedded_at": _NOW,
"created_at": _NOW,
"updated_at": _NOW,
# detail 전용
"extracted_text": "원문 폴백 텍스트",
"md_content": "# 제목\n본문",
"md_frontmatter": {},
"md_status": "success",
"md_extraction_engine": "marker",
"md_generated_at": _NOW,
}
d.update(overrides)
return d
# ── 1. ★ md_status 단방향 매핑 (success → completed) ──────────────────────────
def test_db_success_serializes_as_completed():
m = DocumentDetailResponse.model_validate(_base_detail(md_status="success"))
assert m.md_status == "completed", "DB 'success' 는 응답에서 'completed' 로 매핑돼야 함(MD-first 렌더 트리거)"
# model_dump(직렬화) 까지 확인 — 앱이 받는 실제 값.
assert m.model_dump()["md_status"] == "completed"
@pytest.mark.parametrize("raw", ["pending", "processing", "partial", "failed", "skipped", None])
def test_non_success_statuses_pass_through(raw):
m = DocumentDetailResponse.model_validate(_base_detail(md_status=raw))
assert m.md_status == raw, f"'{raw}' 는 매핑 대상 아님 — 그대로 통과해야 함"
def test_mapping_is_read_only_not_a_write_path():
# 이 모델은 응답 직렬화 전용 — write(ORM) 경로가 'completed' 를 DB 로 되쓰지 않는지의 1차 방어선.
# 'completed' 입력이 들어와도(예: fixture) 그대로 'completed' (재매핑 없음, 멱등).
m = DocumentDetailResponse.model_validate(_base_detail(md_status="completed"))
assert m.md_status == "completed"
# ── 2. [S1-ADD] 3필드 디코드 + 기본값 ────────────────────────────────────────
def test_s1add_fields_default_on_list_response():
# DocumentResponse(리스트 행)에도 3필드 존재 — 미제공 시 기본값.
base = {k: v for k, v in _base_detail().items()
if k not in {"extracted_text", "md_content", "md_frontmatter", "md_status",
"md_extraction_engine", "md_generated_at"}}
m = DocumentResponse.model_validate(base)
assert m.duplicate_count == 0
assert m.duplicate_of is None
assert m.original_filename is None
def test_s1add_fields_roundtrip_values():
m = DocumentDetailResponse.model_validate(
_base_detail(original_filename="보고서.docx", duplicate_of=4912, duplicate_count=2)
)
assert m.original_filename == "보고서.docx"
assert m.duplicate_of == 4912
assert m.duplicate_count == 2
# ── 3. ds-app contract fixtures 디코드 (있으면) ──────────────────────────────
_FIXDIR = Path(os.path.expanduser("~/Documents/code/ds-app/contract/fixtures"))
@pytest.mark.skipif(not _FIXDIR.exists(), reason="ds-app contract fixtures 미존재(독립 repo) — 디코드 회귀 skip")
@pytest.mark.parametrize("fname", ["document_detail.json", "document_detail_pending_md.json"])
def test_contract_detail_fixture_decodes(fname):
payload = json.loads((_FIXDIR / fname).read_text())
m = DocumentDetailResponse.model_validate(payload)
# fixture 의 md_status 는 이미 API 어휘('completed'/'pending') — 매핑 멱등.
assert m.md_status == payload["md_status"]
# [S1-ADD] 필드가 fixture 에 있으면 디코드 일치.
if "duplicate_count" in payload:
assert m.duplicate_count == payload["duplicate_count"]
@pytest.mark.skipif(not _FIXDIR.exists(), reason="ds-app contract fixtures 미존재")
def test_contract_list_fixture_decodes():
payload = json.loads((_FIXDIR / "documents_list.json").read_text())
m = DocumentListResponse.model_validate(payload)
assert m.total == payload["total"]
assert len(m.items) == len(payload["items"])