hyungi_document_server/app/workers/study_card_enqueue.py
hyungi 0a7402b327 feat(study): study memo notes Phase 1 - card_extract extraction pipeline (purely additive)
study_memo_cards extraction pipeline + version-key poller + needs_review columns. Production SR code (session_finalize/quiz_selection) is left unmodified.

- Migrations 287-298: study_memo_cards/_evidence/_jobs/_progress (_progress dormant in P1), study_reminders, study_topics.focused_at, and 3 needs_review columns on study_questions. Dedup uses a PARTIAL UNIQUE index (WHERE deleted_at IS NULL).
- Worker: in-process RAG gather → MLX {cards} → card guards (quantities must appear in the evidence source text; cue/cloze leakage; dedup) → retire the superseded old version → append. Runs as a separate consumer, isolated from the existing study_queue.
- Poller study_card_enqueue: version-key NOT EXISTS(source_version) for idempotency + ai_explanation_generated_at NOT NULL guard + per-poll LIMIT (against a thundering herd).
- Verification: all 12 migrations applied cleanly on top of a real prod schema dump; dedup/supersede/active-unique functional checks 7/7 PASS; normalization util 15/15.

plan: PKM plans/2026-06-05-study-memo-card-p1-plan.html

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-06 21:33:12 +09:00
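The migration bullet above says card dedup relies on a PARTIAL UNIQUE index restricted to `deleted_at IS NULL`. The migrations are not part of this file, so the following is only a minimal sketch that uses the standard-library `sqlite3` module as a stand-in database (SQLite and PostgreSQL both accept `CREATE UNIQUE INDEX ... WHERE ...`). The table layout, the `dedup_key` column, and the `insert_card`/`retire` helpers are hypothetical. The sketch shows the index rejecting a second active duplicate while still allowing a superseded card to be retired and replaced under the same key.

```python
import sqlite3

# Hypothetical, simplified schema; the real migration columns are not shown here.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE study_memo_cards (
        id INTEGER PRIMARY KEY,
        user_id INTEGER NOT NULL,
        dedup_key TEXT NOT NULL,
        deleted_at TEXT
    );
    -- Uniqueness applies only to live rows; retired rows drop out of the index.
    CREATE UNIQUE INDEX uq_memo_card_active
        ON study_memo_cards (user_id, dedup_key)
        WHERE deleted_at IS NULL;
    """
)


def insert_card(user_id: int, key: str) -> bool:
    """Try to append a live card; return False if an active duplicate exists."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO study_memo_cards (user_id, dedup_key) VALUES (?, ?)",
                (user_id, key),
            )
        return True
    except sqlite3.IntegrityError:
        return False


def retire(user_id: int, key: str) -> None:
    """Soft-delete the active card so that a new version may take its key."""
    with conn:
        conn.execute(
            "UPDATE study_memo_cards SET deleted_at = datetime('now') "
            "WHERE user_id = ? AND dedup_key = ? AND deleted_at IS NULL",
            (user_id, key),
        )


first = insert_card(1, "batch size|20")       # new active card
dup = insert_card(1, "batch size|20")         # rejected: active duplicate
retire(1, "batch size|20")                    # supersede: retire the old version
second = insert_card(1, "batch size|20")      # allowed again
other_user = insert_card(2, "batch size|20")  # different user, independent key space
```

Because retired rows fall outside the index predicate, the supersede step has to retire the old card before appending the new one, which matches the retire → append order in the worker bullet; appending first would trip the constraint.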


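The worker bullet in the commit message filters generated cards through guards before appending them: quantities must appear in the evidence source text, the answer must not leak into the cue/cloze, and duplicates are dropped. That worker is not in this file, so this is only a minimal sketch of what such guards could look like. The `Card` shape with `cue` and `answer` fields, the numeric-token rule, and the `normalize`/`guard_cards` helpers are hypothetical illustrations, not the repository's normalization util.

```python
from __future__ import annotations

import re
import unicodedata
from dataclasses import dataclass


# Hypothetical card shape; the real MLX {cards} payload is not shown in this file.
@dataclass(frozen=True)
class Card:
    cue: str     # prompt shown to the learner
    answer: str  # text that must stay hidden until recall


_NUMBER = re.compile(r"\d+(?:[.,]\d+)*")


def normalize(text: str) -> str:
    """NFKC + casefold + collapse whitespace; used for dedup and leak checks."""
    return " ".join(unicodedata.normalize("NFKC", text).casefold().split())


def quantities_grounded(card: Card, evidence: str) -> bool:
    """Every number token in the answer must also occur as a token in the evidence."""
    evidence_numbers = set(_NUMBER.findall(evidence))
    return all(num in evidence_numbers for num in _NUMBER.findall(card.answer))


def leaks_answer(card: Card) -> bool:
    """True if the hidden answer already appears inside the cue."""
    ans = normalize(card.answer)
    return bool(ans) and ans in normalize(card.cue)


def guard_cards(cards: list[Card], evidence: str, existing: set[tuple[str, str]]) -> list[Card]:
    """Keep cards passing all three guards; `existing` holds dedup keys already stored."""
    kept: list[Card] = []
    seen = set(existing)
    for card in cards:
        key = (normalize(card.cue), normalize(card.answer))
        if not quantities_grounded(card, evidence):
            continue  # quantity not backed by the evidence text
        if leaks_answer(card):
            continue  # cue gives the answer away
        if key in seen:
            continue  # duplicate after normalization
        seen.add(key)
        kept.append(card)
    return kept


evidence = "The default batch size is 20 and the timeout is 30 seconds."
cards = [
    Card(cue="Default batch size?", answer="20"),
    Card(cue="Timeout in seconds?", answer="45"),                # 45 not in evidence
    Card(cue="The batch size is 20. Batch size?", answer="20"),  # answer leaks into cue
    Card(cue="default  BATCH size?", answer="20"),               # duplicate of card 0
]
kept = guard_cards(cards, evidence, existing=set())
```

Matching number tokens rather than raw substrings keeps an answer of `3` from passing just because the evidence contains `30`.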
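"""study_card_enqueue: version-key poller (study memo notes, Phase 1).

Enqueues questions whose ai_explanation is ready but which have no card_extract job
for the *current* version.

Version idempotency (the core idea): NOT EXISTS(job WHERE source_kind='question'
AND source_id=q.id AND source_version=q.ai_explanation_generated_at)
  - Blocks re-extraction of the same version: even a completed/all_dropped job, if it
    exists for the current version, yields zero re-enqueues (prevents livelock).
  - If the explanation is regenerated (new generated_at), no job exists for the new
    version, so cards are re-extracted automatically (clears cards made stale by the
    correction).

NULL guard: requires ai_explanation_generated_at IS NOT NULL. If it were NULL, the
comparison NULL = NULL evaluates to UNKNOWN, the subquery never matches, NOT EXISTS is
always true, and every poll would re-enqueue the question. This guard covers the race
right after the transition to ready.

Thundering herd: a per-poll LIMIT (CARD_ENQUEUE_BATCH) plus most-recent-first ordering
(generated_at DESC) keeps the backfill gradual.
"""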

from __future__ import annotations

import logging

from sqlalchemy import select

from core.config import settings
from core.database import async_session
from models.study_memo_card_job import StudyMemoCardJob, enqueue_study_memo_card_job
from models.study_question import StudyQuestion

logger = logging.getLogger("study_card_enqueue")

CARD_ENQUEUE_BATCH = 20
SOURCE_KIND_QUESTION = "question"


async def run() -> None:
    """APScheduler entry point.

    Enqueue up to CARD_ENQUEUE_BATCH questions that are ready and have no job for
    the current version.
    """
    if not getattr(settings, "study_card_extract_enabled", True):
        return

    async with async_session() as session:
        # Does a job already exist for the current ai_explanation_generated_at version?
        # Correlated NOT EXISTS: StudyQuestion is auto-correlated against the outer SELECT.
        job_exists = (
            select(StudyMemoCardJob.id)
            .where(
                StudyMemoCardJob.source_kind == SOURCE_KIND_QUESTION,
                StudyMemoCardJob.source_id == StudyQuestion.id,
                StudyMemoCardJob.source_version == StudyQuestion.ai_explanation_generated_at,
            )
            .exists()
        )

        rows = (
            await session.execute(
                select(
                    StudyQuestion.id,
                    StudyQuestion.user_id,
                    StudyQuestion.ai_explanation_generated_at,
                )
                .where(
                    StudyQuestion.deleted_at.is_(None),
                    StudyQuestion.ai_explanation_status == "ready",
                    # NULL guard (see module docstring): without it NOT EXISTS is always true.
                    StudyQuestion.ai_explanation_generated_at.is_not(None),
                    ~job_exists,
                )
                # Most recent first, capped per poll to avoid a thundering herd.
                .order_by(StudyQuestion.ai_explanation_generated_at.desc())
                .limit(CARD_ENQUEUE_BATCH)
            )
        ).all()
        if not rows:
            return

        enqueued = 0
        for r in rows:
            ok = await enqueue_study_memo_card_job(
                session,
                user_id=r.user_id,
                source_kind=SOURCE_KIND_QUESTION,
                source_id=r.id,
                source_version=r.ai_explanation_generated_at,
                kind="card_extract",
            )
            if ok:
                enqueued += 1
        await session.commit()

        if enqueued:
            logger.info("study_card_enqueue candidates=%d enqueued=%d", len(rows), enqueued)
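The version-key NOT EXISTS, the NULL guard, and the per-poll cap described in the module docstring can be exercised end to end against an in-memory database. This is a minimal sketch, assuming SQLite from the standard library in place of the production database and a plain `INSERT` in place of `enqueue_study_memo_card_job`, whose internals are not shown here. The table layouts, the `BATCH` constant, and the `poll` helper are simplified, hypothetical stand-ins; `null_guard=False` reproduces the failure the docstring warns about.

```python
import sqlite3

BATCH = 2  # hypothetical stand-in for CARD_ENQUEUE_BATCH

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE study_questions (
        id INTEGER PRIMARY KEY,
        user_id INTEGER NOT NULL,
        ai_explanation_status TEXT NOT NULL,
        ai_explanation_generated_at TEXT,
        deleted_at TEXT
    );
    CREATE TABLE study_memo_card_jobs (
        id INTEGER PRIMARY KEY,
        source_kind TEXT NOT NULL,
        source_id INTEGER NOT NULL,
        source_version TEXT
    );
    """
)

POLL_SQL = """
SELECT q.id, q.user_id, q.ai_explanation_generated_at
FROM study_questions AS q
WHERE q.deleted_at IS NULL
  AND q.ai_explanation_status = 'ready'
  {null_guard}
  AND NOT EXISTS (
      SELECT 1 FROM study_memo_card_jobs AS j
      WHERE j.source_kind = 'question'
        AND j.source_id = q.id
        AND j.source_version = q.ai_explanation_generated_at
  )
ORDER BY q.ai_explanation_generated_at DESC
LIMIT ?
"""


def poll(null_guard: bool = True) -> list[int]:
    """One poll: enqueue up to BATCH questions lacking a job for their current version."""
    guard = "AND q.ai_explanation_generated_at IS NOT NULL" if null_guard else ""
    rows = conn.execute(POLL_SQL.format(null_guard=guard), (BATCH,)).fetchall()
    with conn:
        conn.executemany(
            "INSERT INTO study_memo_card_jobs (source_kind, source_id, source_version) "
            "VALUES ('question', ?, ?)",
            [(qid, version) for qid, _user, version in rows],
        )
    return [qid for qid, _user, _version in rows]


with conn:
    conn.executemany(
        "INSERT INTO study_questions (id, user_id, ai_explanation_status, "
        "ai_explanation_generated_at) VALUES (?, 1, 'ready', ?)",
        [(1, "2026-06-01T10:00"), (2, "2026-06-02T10:00"), (3, "2026-06-03T10:00")],
    )

first = poll()   # newest first, capped at BATCH: [3, 2]
second = poll()  # remaining backlog: [1]
third = poll()   # every current version already has a job: []

with conn:  # explanation regenerated, so question 2 gets a new version key
    conn.execute(
        "UPDATE study_questions SET ai_explanation_generated_at = '2026-06-04T10:00' WHERE id = 2"
    )
after_regen = poll()  # [2]

with conn:  # ready, but generated_at not set yet (the race the guard covers)
    conn.execute(
        "INSERT INTO study_questions (id, user_id, ai_explanation_status) VALUES (4, 1, 'ready')"
    )
guarded = poll()                      # []
unguarded_1 = poll(null_guard=False)  # [4]
unguarded_2 = poll(null_guard=False)  # [4] again
```

With the guard removed, question 4 comes back on every poll: the stored job's `source_version` is NULL and `NULL = NULL` never evaluates to true, which is the re-enqueue storm the `is_not(None)` filter prevents.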