fix(ops): exclude empty extracted_text from backfill queries to prevent infinite retries

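After 3 days in production, docs 4811 and 5181 were found to have
extracted_text='' (empty string). The backfill query filtered only on
IS NOT NULL, so both were enqueued. The `not doc.extracted_text`
truthiness check in classify_worker then raised ValueError until
max_attempts (3) was reached and the row ended as status=failed.
Each following backfill cycle enqueued them again. After 12 cycles,
24 failed rows had accumulated (2 docs x 12 cycles).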
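
The failure loop described above can be reproduced with a small simulation.
This is only a sketch: classify, run_queue_item and simulate are hypothetical
stand-ins, not the project's actual worker or backfill code, and the only
behavior taken from this commit is the falsy check, max_attempts=3, and the
IS NOT NULL-only filter.

```python
# Hypothetical simulation; names are illustrative, not the project's real code.
MAX_ATTEMPTS = 3


def classify(extracted_text):
    # Mirrors the `not doc.extracted_text` check: None and '' are both falsy.
    if not extracted_text:
        raise ValueError("extracted_text is empty")
    return "classified"


def run_queue_item(extracted_text):
    """Retry up to MAX_ATTEMPTS times, then report the final queue status."""
    for _ in range(MAX_ATTEMPTS):
        try:
            classify(extracted_text)
            return "done"
        except ValueError:
            continue
    return "failed"


def simulate(docs, cycles):
    """Count failed queue rows when backfill filters only on IS NOT NULL."""
    failed_rows = 0
    for _ in range(cycles):
        for text in docs.values():
            if text is not None:  # old filter: '' slips through
                if run_queue_item(text) == "failed":
                    failed_rows += 1
    return failed_rows


failed = simulate({4811: "", 5181: ""}, cycles=12)
print(failed)  # 2 documents x 12 cycles = 24
```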

Fix: add LENGTH(extracted_text) > 0 to the SQL in both tier_backfill.py
and backfill_tier.py. Documents with an empty string are now excluded
at enqueue time.
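
A minimal sketch of what the extra predicate changes, using Python's built-in
sqlite3 as a stand-in for the real database (an assumption: the production
engine is not named in this commit, and the two-column table below is a
simplified, hypothetical version of documents).

```python
import sqlite3

# In-memory stand-in for the documents table (simplified, hypothetical schema).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE documents (id INTEGER PRIMARY KEY, extracted_text TEXT)")
conn.executemany(
    "INSERT INTO documents (id, extracted_text) VALUES (?, ?)",
    [(1, "some text"), (2, ""), (3, None)],
)

# Old filter: the empty string in row 2 is not NULL, so it is selected.
old_ids = [row[0] for row in conn.execute(
    "SELECT id FROM documents "
    "WHERE extracted_text IS NOT NULL ORDER BY id"
)]

# New filter: the added LENGTH(...) > 0 predicate drops row 2 as well.
new_ids = [row[0] for row in conn.execute(
    "SELECT id FROM documents "
    "WHERE extracted_text IS NOT NULL AND LENGTH(extracted_text) > 0 ORDER BY id"
)]

print(old_ids, new_ids)  # [1, 2] [1]
```

Whitespace-only text still passes LENGTH(...) > 0. It is also truthy in
Python, so the SQL filter and the worker check stay consistent for that case.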

Cleanup SQL for the existing 24 failed rows (to be run manually by the user):
  DELETE FROM processing_queue
  WHERE stage='classify' AND status='failed'
    AND error_message LIKE '%extracted_text%';
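
Before running the DELETE by hand, it may help to count the rows the same
WHERE clause matches. The sketch below does this against an in-memory
sqlite3 table. The schema and the error_message values are hypothetical,
chosen only so that some rows match and some do not.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE processing_queue ("
    "id INTEGER PRIMARY KEY, document_id INTEGER, stage TEXT, "
    "status TEXT, error_message TEXT)"
)
# Hypothetical sample rows; the real error text is not shown in this commit.
conn.executemany(
    "INSERT INTO processing_queue (document_id, stage, status, error_message) "
    "VALUES (?, ?, ?, ?)",
    [
        (4811, "classify", "failed", "ValueError: extracted_text is empty"),
        (5181, "classify", "failed", "ValueError: extracted_text is empty"),
        (7, "classify", "failed", "timeout"),
        (8, "embed", "failed", "extracted_text missing"),
    ],
)

# Same predicate as the cleanup SQL above.
WHERE = "stage='classify' AND status='failed' AND error_message LIKE '%extracted_text%'"

# Dry run: how many rows would the DELETE touch?
to_delete = conn.execute(
    f"SELECT COUNT(*) FROM processing_queue WHERE {WHERE}"
).fetchone()[0]

# Actual delete; rowcount should equal the dry-run count.
deleted = conn.execute(f"DELETE FROM processing_queue WHERE {WHERE}").rowcount
remaining = conn.execute("SELECT COUNT(*) FROM processing_queue").fetchone()[0]

print(to_delete, deleted, remaining)  # 2 2 2
```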

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Hyungi Ahn
2026-04-27 08:25:12 +09:00
parent 10364fbe1b
commit 95bcdb851b
2 changed files with 9 additions and 1 deletions
+6 -1
@@ -61,13 +61,18 @@ async def _classify_pending(session: AsyncSession) -> int:
async def _enqueue_domain(session: AsyncSession, filter_clause: str, limit: int) -> int:
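"""Enqueue up to limit documents that match the domain condition and have a NULL tier onto the classify queue. Returns the number actually enqueued."""
"""Enqueue up to limit documents that match the domain condition and have a NULL tier onto the classify queue. Returns the number actually enqueued.
Documents whose extracted_text is an empty string (LENGTH=0) are also excluded: classify_worker uses a
`not doc.extracted_text` truthiness check, so an empty string raises ValueError. This prevents an infinite retry loop.
"""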
sql = text(f"""
INSERT INTO processing_queue (document_id, stage, status, attempts, max_attempts)
SELECT id, 'classify', 'pending', 0, 3
FROM documents
WHERE deleted_at IS NULL
AND extracted_text IS NOT NULL
AND LENGTH(extracted_text) > 0
AND ai_analysis_tier IS NULL
AND {filter_clause}
ORDER BY created_at DESC
+3
@@ -59,6 +59,7 @@ SELECT COUNT(*)
FROM documents
WHERE deleted_at IS NULL
AND extracted_text IS NOT NULL
AND LENGTH(extracted_text) > 0
AND ai_analysis_tier IS NULL
AND {filter}
"""
@@ -69,6 +70,7 @@ SELECT id, LEFT(title, 60) AS title, ai_domain, source_channel,
FROM documents
WHERE deleted_at IS NULL
AND extracted_text IS NOT NULL
AND LENGTH(extracted_text) > 0
AND ai_analysis_tier IS NULL
AND {filter}
ORDER BY created_at DESC
@@ -81,6 +83,7 @@ SELECT id, 'classify', 'pending', 0, 3
FROM documents
WHERE deleted_at IS NULL
AND extracted_text IS NOT NULL
AND LENGTH(extracted_text) > 0
AND ai_analysis_tier IS NULL
AND {filter}
ORDER BY created_at DESC