fix(search): split markdown into dedicated queue consumer to prevent pipeline stall
Removes the problem where a large PDF split conversion (5210 ≈ 40 min, measured) held the single consume_queue coroutine and stalled the entire pipeline (extract/classify/embed/chunk, etc.).

- New consume_markdown_queue: a markdown-only scheduler job (id=markdown_consumer)
- consume_queue now processes only MAIN_QUEUE_STAGES (markdown excluded)
- Per-stage logic is shared through the _process_stage / _load_workers helpers
- reset_stale_items(stages, threshold_minutes) is parameterized: main = 10 min (markdown excluded), markdown = MARKDOWN_STALE_MINUTES (default 120). Because marker_worker records no heartbeat, a 40-minute conversion used to be mistaken for stale under the 10-minute threshold; this closes that trap.
- The enqueue flow (classify -> embed, chunk, markdown) is unchanged

Splitting out STT/deep_summary and tuning GPU concurrency are out of scope (follow-up).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
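A minimal sketch of why the stale reset needs a per-stage threshold, assuming an in-memory list of queue rows. The QueueItem class, MAIN_STALE_MINUTES, and the body of reset_stale_items are hypothetical; only the names MAIN_QUEUE_STAGES, MARKDOWN_STALE_MINUTES, reset_stale_items(stages, threshold_minutes), the stage names, and the 10/120-minute values come from the commit message. The real function presumably updates database rows instead.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional

# Stage names from the commit message ("extract/classify/embed/chunk", etc.);
# the real list may contain more stages.
ALL_STAGES = ("extract", "classify", "embed", "chunk", "markdown")
MAIN_QUEUE_STAGES = tuple(s for s in ALL_STAGES if s != "markdown")
MARKDOWN_STAGES = ("markdown",)

MAIN_STALE_MINUTES = 10       # hypothetical name for the "main = 10 min" threshold
MARKDOWN_STALE_MINUTES = 120  # default stated in the commit message


@dataclass
class QueueItem:
    # Hypothetical in-memory stand-in for a queue row.
    stage: str
    status: str  # "pending" or "processing" in this sketch
    started_at: Optional[datetime] = None


def reset_stale_items(items, stages, threshold_minutes, now=None):
    """Put 'processing' items of the given stages back to 'pending' once they
    have been running longer than threshold_minutes. Returns the reset count."""
    now = now or datetime.now()
    cutoff = now - timedelta(minutes=threshold_minutes)
    reset = 0
    for item in items:
        if (
            item.stage in stages
            and item.status == "processing"
            and item.started_at is not None
            and item.started_at < cutoff
        ):
            item.status = "pending"
            item.started_at = None
            reset += 1
    return reset


# Usage: a 15-minute embed job is reset, a 40-minute marker conversion is not.
now = datetime(2024, 1, 1, 12, 0)
items = [
    QueueItem("embed", "processing", now - timedelta(minutes=15)),
    QueueItem("markdown", "processing", now - timedelta(minutes=40)),
]
print(reset_stale_items(items, MAIN_QUEUE_STAGES, MAIN_STALE_MINUTES, now))          # → 1
print(reset_stale_items(items, MARKDOWN_STAGES, MARKDOWN_STALE_MINUTES, now))        # → 0
```

Scoping each call by stage is what lets a worker without a heartbeat (marker_worker) run for 40 minutes without being re-queued, while the main stages keep their short 10-minute recovery window.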
+5 -1
@@ -51,7 +51,7 @@ async def lifespan(app: FastAPI):
 from workers.law_monitor import run as law_monitor_run
 from workers.mailplus_archive import run as mailplus_run
 from workers.news_collector import run as news_collector_run
-from workers.queue_consumer import consume_queue
+from workers.queue_consumer import consume_queue, consume_markdown_queue
 from workers.study_queue_consumer import consume_study_queue
 from workers.study_session_queue_consumer import consume_study_session_queue
 from workers.study_question_embed_worker import (
@@ -77,6 +77,10 @@ async def lifespan(app: FastAPI):
 scheduler = AsyncIOScheduler(timezone="Asia/Seoul")
 # Always running
 scheduler.add_job(consume_queue, "interval", minutes=1, id="queue_consumer")
+# PR-DocSrv-Markdown-Consumer-Split-1: dedicated markdown (marker) consumer.
+# Removes the stall where a large PDF split conversion (tens of minutes) held the main
+# consume_queue and blocked the whole pipeline. max_instances=1 (the default) prevents two concurrent marker conversions.
+scheduler.add_job(consume_markdown_queue, "interval", minutes=1, id="markdown_consumer")
 scheduler.add_job(watch_inbox, "interval", minutes=5, id="file_watcher")
 scheduler.add_job(cleanup_orphan_uploads, "interval", minutes=10, id="upload_cleanup")
 # PR-4: automatic embedding of study_questions (processes rows with status='none/failed/stale' in batches of 10).
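The hunk above registers markdown as its own interval job. The sketch below illustrates why that removes the stall, using plain asyncio rather than APScheduler. run_interval is a hypothetical stand-in for an "interval" trigger that skips a tick while the previous run is still active, roughly how APScheduler treats a job that has reached max_instances=1. The stage timings, the log argument, consume_all, and the body of _process_stage are all invented for the demonstration; only the consumer names and the stage split come from the commit.

```python
import asyncio

# Hypothetical stage split mirroring MAIN_QUEUE_STAGES vs. the markdown-only consumer.
MAIN_QUEUE_STAGES = ("extract", "classify", "embed", "chunk")
MARKDOWN_STAGES = ("markdown",)


async def _process_stage(stage, log):
    # Hypothetical body: markdown simulates a long marker conversion, other stages are quick.
    await asyncio.sleep(0.5 if stage == "markdown" else 0.01)
    log.append(stage)


async def consume_queue(log):
    for stage in MAIN_QUEUE_STAGES:
        await _process_stage(stage, log)


async def consume_markdown_queue(log):
    for stage in MARKDOWN_STAGES:
        await _process_stage(stage, log)


async def consume_all(log):
    # Pre-split behaviour: one consumer walks every stage, markdown included.
    for stage in MAIN_QUEUE_STAGES + MARKDOWN_STAGES:
        await _process_stage(stage, log)


async def run_interval(job, interval, ticks, log):
    """Start `job` every `interval` seconds for `ticks` ticks, skipping a tick while
    the previous run is still active (similar to max_instances=1). Returns skips."""
    running = None
    skipped = 0
    for _ in range(ticks):
        if running is None or running.done():
            running = asyncio.create_task(job(log))
        else:
            skipped += 1
        await asyncio.sleep(interval)
    if running is not None:
        await running
    return skipped


async def demo(split):
    log = []
    if split:
        skips = await asyncio.gather(
            run_interval(consume_queue, 0.1, 3, log),
            run_interval(consume_markdown_queue, 0.1, 3, log),
        )
    else:
        skips = [await run_interval(consume_all, 0.1, 3, log)]
    return log, list(skips)


if __name__ == "__main__":
    for split in (False, True):
        log, skips = asyncio.run(demo(split))
        print(f"split={split}: extract ran {log.count('extract')}x, skipped ticks {skips}")
```

With the combined consumer, the slow markdown stage should make the job skip its later ticks, so extract runs only once. With the split, the main job should run on every tick and only the markdown job skips, which is the behaviour the separate markdown_consumer job is meant to achieve.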