0cbba0ceeb
Web pages discovered by DEVONagent/DEVONthink: drop into NAS Web/ → file_watcher
ingest → extract with 4-tier fallback (trafilatura/sibling-md/readability/bs4)
→ through embed + chunk. classify/preview/markdown are SKIPPED.
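
The stage routing summarized above can be sketched roughly as follows. This is an illustration only, not the project's queue_consumer: the function name `next_stages`, the `DEFAULT_NEXT_STAGES` table, and the "other" channel are hypothetical. Only the devonagent override of the extract stage to [embed, chunk] comes from this commit message.

```python
from typing import Dict, List, Tuple

# Hypothetical default routing; the real stage order is not shown in this commit.
DEFAULT_NEXT_STAGES: Dict[str, List[str]] = {
    "extract": ["classify"],
    "classify": ["embed", "chunk"],
}

# From the commit message: only the extract stage is overridden, and only
# for source_channel='devonagent', which goes straight to embed + chunk.
CHANNEL_OVERRIDES: Dict[Tuple[str, str], List[str]] = {
    ("extract", "devonagent"): ["embed", "chunk"],
}


def next_stages(current_stage: str, source_channel: str) -> List[str]:
    """Return the stages to enqueue after current_stage for this channel."""
    override = CHANNEL_OVERRIDES.get((current_stage, source_channel))
    if override is not None:
        return list(override)
    return list(DEFAULT_NEXT_STAGES.get(current_stage, []))
```

Keeping the override in a lookup keyed by (stage, channel) leaves every other stage on its default path, which matches the "extract stage only" scope stated in the commit.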
- source_channel='devonagent' (activates what migration 001 left dormant)
- file_watcher: SCAN_TARGETS consolidation + Web/ rglob + canonical_url dedup +
missing-sidecar policy (do not skip; set web_meta.sidecar_missing=true flag)
- extract_worker: HTML+devonagent branch + md_extraction_engine records which of the 4 tiers ran
(trafilatura → sibling .md ≥200 chars → readability+markdownify → bs4_text)
- queue_consumer: source_channel-aware override in enqueue_next_stage, for the
extract stage only (devonagent → [embed, chunk])
- classify_worker: devonagent safety skip (mirrors the law_monitor pattern,
ai_domain='Web', ai_tags=['Web/{host}'])
- requirements: add trafilatura/readability-lxml/markdownify
- docs: devonthink-web-bridge.md installation guide + first-wins policy documented
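
The 4-tier extraction order listed for extract_worker can be sketched as a generic fallback chain. This is a minimal sketch under stated assumptions: `extract_with_fallback`, `sibling_md_tier`, and the `Tier` type are hypothetical names, and the tiers are passed in as plain callables so the example runs without third-party packages. In a real worker each callable would wrap trafilatura, readability-lxml plus markdownify, or BeautifulSoup. Only the tier order and the 200-character threshold for the sibling .md come from the commit message.

```python
from typing import Callable, Optional, Sequence, Tuple

# A tier is (engine_name, extractor); the extractor returns text or None.
Tier = Tuple[str, Callable[[str], Optional[str]]]

SIBLING_MD_MIN_CHARS = 200  # threshold stated in the commit message


def sibling_md_tier(sibling_md: Optional[str]) -> Callable[[str], Optional[str]]:
    """Build a tier that accepts a sibling .md only if it is long enough."""
    def _tier(_html: str) -> Optional[str]:
        if sibling_md is not None and len(sibling_md) >= SIBLING_MD_MIN_CHARS:
            return sibling_md
        return None
    return _tier


def extract_with_fallback(
    html: str, tiers: Sequence[Tier]
) -> Tuple[Optional[str], Optional[str]]:
    """Return (text, engine_name) from the first tier yielding non-empty text."""
    for name, extractor in tiers:
        try:
            text = extractor(html)
        except Exception:
            # Assumption: a failing tier falls through to the next one.
            continue
        if text and text.strip():
            return text.strip(), name
    return None, None
```

The returned engine name is the kind of value a field such as md_extraction_engine could store, which is what makes the engine distribution observable later.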
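Phase 1 closure criterion = material quality (searchability + noise rate + dedup + engine distribution).
Downstream uses (ai_tldr/digest/PKM retrospective) will be decided in a separate PR after 1-2 weeks OR 30-50 items of observation.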
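
Since dedup is one of the closure criteria, a first-wins dedup keyed on a canonical URL might look like the sketch below. The normalization rules here (lowercase host, drop the fragment, strip a trailing slash) are assumptions chosen for illustration, as are the names `canonicalize` and `dedup_first_wins`. The commit does not state how canonical_url is computed, and it is also an assumption that the first-wins policy refers to this dedup step.

```python
from typing import Dict, Iterable, Tuple
from urllib.parse import urlsplit, urlunsplit


def canonicalize(url: str) -> str:
    """Normalize a URL for comparison (assumed rules, not the project's)."""
    parts = urlsplit(url.strip())
    netloc = parts.netloc.lower()
    path = parts.path.rstrip("/") or "/"
    # The fragment is dropped; the query string is kept unchanged.
    return urlunsplit((parts.scheme.lower(), netloc, path, parts.query, ""))


def dedup_first_wins(candidates: Iterable[Tuple[str, str]]) -> Dict[str, str]:
    """Map canonical URL -> file path, keeping the first file seen per URL."""
    winners: Dict[str, str] = {}
    for file_path, url in candidates:
        # setdefault only stores the path when the key is not yet present.
        winners.setdefault(canonicalize(url), file_path)
    return winners
```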
Plan: ~/.claude/plans/db-snuggly-petal.md
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
24 lines
490 B
Plaintext
fastapi>=0.110.0
uvicorn[standard]>=0.27.0
sqlalchemy[asyncio]>=2.0.0
asyncpg>=0.29.0
pgvector>=0.3.0
python-dotenv>=1.0.0
pyyaml>=6.0
httpx>=0.27.0
python-jose[cryptography]>=3.3.0
bcrypt>=4.0.0
pyotp>=2.9.0
caldav>=1.3.0
apscheduler>=3.10.0
anthropic>=0.40.0
markdown>=3.5.0
python-multipart>=0.0.9
jinja2>=3.1.0
feedparser>=6.0.0
pymupdf>=1.24.0
# Web/Blog ingest (devonagent track): 4-tier fallback for HTML body cleanup
trafilatura>=1.12.0
readability-lxml>=0.8.1
markdownify>=0.13.1