"""crawl-24x7 cycle 2: pure-function / call-shape regression tests (no DB required).

Covers the Guardian request shape, fixture response parsing, channel identity,
and the B-5 quirk.

Fixture: tests/fixtures/guardian_open_platform_search_response.json
(captured live with a real key on 2026-06-10; verified that the response body
does not contain the api-key; see [[feedback_external_api_fixture_first]]).
"""

import json
import re
from pathlib import Path
from types import SimpleNamespace

from workers.news_collector import (
    _article_hash,
    _doc_identity,
    _guardian_request,
    _normalize_category,
)

FIXTURE = Path(__file__).parent / "fixtures" / "guardian_open_platform_search_response.json"


def _make_source(**kw):
    """Stand-in for a source ORM row without creating an ORM instance.

    Only the attributes are mimicked, because the identity functions use
    nothing but attribute access.
    """
    return SimpleNamespace(
        source_channel=kw.get("source_channel", "news"),
        parser_quirk=kw.get("parser_quirk"),
    )


class TestGuardianCallShape:
    def test_request_shape_matches_fixture_recipe(self):
        """Must match the call used to capture the fixture (single source of
        truth; see [[feedback_fixture_first_call_shape]])."""
        endpoint, params = _guardian_request(
            "https://content.guardianapis.com/search?section=world", "KEY"
        )
        assert endpoint == "https://content.guardianapis.com/search"
        assert params["section"] == "world"
        assert params["show-fields"] == "bodyText,trailText"
        assert params["order-by"] == "newest"
        assert params["api-key"] == "KEY"

    def test_feed_url_query_overridden_by_fixed_fields(self):
        # Even if feed_url carries a wrong show-fields value, the fixed fields
        # win (dict merge order).
        _, params = _guardian_request(
            "https://content.guardianapis.com/search?section=world&show-fields=headline", "K"
        )
        assert params["show-fields"] == "bodyText,trailText"


class TestGuardianFixtureParsing:
    def test_fixture_response_shape(self):
        payload = json.loads(FIXTURE.read_text(encoding="utf-8"))["response"]
        assert payload["status"] == "ok"
        assert payload["results"], "fixture has zero results"
        for item in payload["results"]:
            assert item["webTitle"].strip()
            assert item["webUrl"].startswith("https://")
            assert "webPublicationDate" in item
            assert "sectionName" in item
            fields = item.get("fields") or {}
            assert "bodyText" in fields and "trailText" in fields

    def test_fixture_bodytext_is_fulltext_grade(self):
        payload = json.loads(FIXTURE.read_text(encoding="utf-8"))["response"]
        # The fixture must pass the full-text gate (200 characters), otherwise
        # the adapter's is_full path is never exercised.
        assert any(len(i["fields"]["bodyText"]) >= 200 for i in payload["results"])

    def test_fixture_contains_no_api_key(self):
        assert "api-key" not in FIXTURE.read_text(encoding="utf-8")


class TestChannelIdentity:
    def test_news_channel_unchanged(self):
        # "경향신문" (Kyunghyang Shinmun) is a source name, kept verbatim.
        ident = _doc_identity(_make_source(source_channel="news"), "경향신문", "Society")
        assert ident == {
            "path_prefix": "news",
            "ai_domain": "News",
            "ai_tags": ["News/경향신문/Society"],
        }

    def test_crawl_channel_domain_identity(self):
        ident = _doc_identity(_make_source(source_channel="crawl"), "TWI", "Engineering")
        assert ident["path_prefix"] == "crawl"
        assert ident["ai_domain"] == "Engineering"
        assert ident["ai_tags"] == ["Engineering/TWI"]

    def test_crawl_channel_unknown_category_falls_back(self):
        ident = _doc_identity(_make_source(source_channel="crawl"), "X", "Other")
        assert ident["ai_domain"] == "Domain"

    def test_category_map_has_domain_axes(self):
        assert _normalize_category("안전") == "Safety"  # "안전" = safety
        assert _normalize_category("Engineering") == "Engineering"
        assert _normalize_category("철학") == "Philosophy"  # "철학" = philosophy


class TestSkipVideoQuirk:
    PATTERN = re.compile(r"/videos?/")

    def test_video_urls_match(self):
        assert self.PATTERN.search("https://psyche.co/videos/some-film")
        assert self.PATTERN.search("https://aeon.co/video/another")

    def test_article_urls_pass(self):
        assert not self.PATTERN.search("https://psyche.co/ideas/how-to-think")


class TestArticleHashStability:
    def test_static_corpus_hash_deterministic(self):
        # "기술 아티클" = "technical article"; the title string is test data.
        a = _article_hash("Creep and Creep Failures", "static", "National Board 기술 아티클")
        b = _article_hash("Creep and Creep Failures", "static", "National Board 기술 아티클")
        assert a == b
        assert len(a) == 32
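

# Illustrative sketch of the dict-merge order that
# test_feed_url_query_overridden_by_fixed_fields relies on. This is a
# HYPOTHETICAL helper written for this module, NOT the real
# workers.news_collector._guardian_request. It assumes the feed_url query
# string is parsed into a dict first and the fixed fields are merged in last,
# so on a key collision the fixed value wins.
from urllib.parse import parse_qsl, urlsplit

# Assumed fixed fields, copied from the values asserted in TestGuardianCallShape.
_SKETCH_FIXED_FIELDS = {
    "show-fields": "bodyText,trailText",
    "order-by": "newest",
}


def _sketch_guardian_request(feed_url, api_key):
    """Hypothetical: split feed_url into (endpoint, params), fixed fields last."""
    parts = urlsplit(feed_url)
    endpoint = f"{parts.scheme}://{parts.netloc}{parts.path}"
    query = dict(parse_qsl(parts.query))
    # In a dict display, later keys overwrite earlier ones, so the fixed
    # fields and the api-key override anything carried in feed_url.
    params = {**query, **_SKETCH_FIXED_FIELDS, "api-key": api_key}
    return endpoint, params


class TestMergeOrderSketch:
    def test_fixed_fields_override_query(self):
        endpoint, params = _sketch_guardian_request(
            "https://content.guardianapis.com/search?section=world&show-fields=headline", "K"
        )
        assert endpoint == "https://content.guardianapis.com/search"
        assert params == {
            "section": "world",
            "show-fields": "bodyText,trailText",
            "order-by": "newest",
            "api-key": "K",
        }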