hyungi_document_server/tests/fixtures/csb_sitemap_sample.xml
hyungi 8583465c58 feat(news): crawl-24x7 cycle 3: B-4 signal, C-4 engineering continuation, CSB sitemap, CCPS Beacon (migration 327)
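- B-4 fetch_method='signal-only': zero page fetches + summarize skipped (search indexing only,
  zero load on the Mac mini) + body kept untruncated (_entry_body; preserves the 1.6K arXiv
  abstract). Digests exclude these entries naturally through the rule that drops rows with a
  NULL ai_summary. Added a defensive guard against registry misconfiguration (page).
- Seeded 9 sources (all URLs verified live on 2026-06-11): Bloomberg Markets/Technology (skip-video;
  mixed-in video items observed), Economist Latest, Nikkei Asia (RDF; native to feedparser, no
  branch needed, captured as a fixture), ASME JPVT (site_1000037, mapping confirmed by measurement),
  2 arXiv feeds, 2 IEEE Spectrum feeds (feed-full; the feed description was measured to carry the
  full text, 7.9K to 14K characters).
- csb_collector: sitemap lastmod diff (weekly, Mon 06:50): watermark (selector_override)
  + incremental backfill capped at 40 per run + diff sanity limit of 300 + report PDFs
  (/assets/, excluding recommendation) → extract pipeline. Initial bulk load = CLI --bulk.
- api_standards_collector: parses links from the announcement list (as measured: not a page diff,
  10 detail URLs per page) → ingests only new detail pages (monthly, 5th at 07:05). Initial
  backfill = CLI --bulk.
- ccps_collector: plain requests to aiche.org return 403 (measured, regardless of UA) → fetches the
  monthly Beacon PDF through a playwright-fetcher anonymous context plus a new /download (base64)
  endpoint that inherits referer and cookies (monthly, 5th at 07:20). When headless access is
  blocked, CrawlBlocked → surfaced in health (following the Le Monde PARK precedent).
- B-5 remainder: rdf/feed-reader-UA = measured and recorded as needing no code branch (Economist
  returns 200 with the Archiver UA). table-strip/gn-redirect stay in the backlog because no
  source that needs them has been onboarded yet.
- 24 new tests (9 fixtures captured live; economist/ieee trimmed to fewer items); 39 passed.
- Migration 327 is a single statement (watch for number contention with the PKM track; this
  track claims 327).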

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-11 07:13:17 +09:00

1 line
2.3 KiB
XML

<?xml version="1.0" encoding="utf-8"?><urlset xmlns:xsi="https://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="https://www.sitemaps.org/schemas/sitemap/0.9 https://www.sitemaps.org/schemas/sitemap/0.9/sitemap.xsd" xmlns="https://www.sitemaps.org/schemas/sitemap/0.9"><url><loc>https://www.csb.gov/recommendations/preventive-maintenance/</loc><lastmod>2022-06-02T17:17:27-06:00</lastmod><changefreq>weekly</changefreq><priority>0.5</priority></url><url><loc>https://www.csb.gov/site-map/</loc><lastmod>2017-05-05T23:59:28-06:00</lastmod><changefreq>weekly</changefreq><priority>0.5</priority></url><url><loc>https://www.csb.gov/recommendations/preventive-maintenance-investigations/</loc><lastmod>2018-04-27T14:32:25-06:00</lastmod><changefreq>weekly</changefreq><priority>0.5</priority></url><url><loc>https://www.csb.gov/investigations/data-quality-/</loc><lastmod>2025-07-28T13:37:44-06:00</lastmod><changefreq>weekly</changefreq><priority>0.5</priority></url><url><loc>https://www.csb.gov/recommendations/preventive-maintenances/</loc><lastmod>2022-06-02T17:19:06-06:00</lastmod><changefreq>weekly</changefreq><priority>0.5</priority></url><url><loc>https://www.csb.gov/videos/video-feedback-form/</loc><lastmod>2017-05-04T18:17:43-06:00</lastmod><changefreq>weekly</changefreq><priority>0.5</priority></url><url><loc>https://www.csb.gov/investigations/</loc><lastmod>2017-05-08T16:06:42-06:00</lastmod><changefreq>weekly</changefreq><priority>0.5</priority></url><url><loc>https://www.csb.gov/investigations/completed-investigations/</loc><lastmod>2017-05-30T19:02:58-06:00</lastmod><changefreq>weekly</changefreq><priority>0.5</priority></url><url><loc>https://www.csb.gov/investigations/current-investigations/</loc><lastmod>2020-10-19T15:06:55-06:00</lastmod><changefreq>weekly</changefreq><priority>0.5</priority></url><url><loc>https://www.csb.gov/videos/</loc><lastmod>2017-03-09T13:38:53-06:00</lastmod><changefreq>weekly</changefreq><priority>0.5</priority></url><url><loc>https://www.csb.gov/videos/take-more-action-to-prevent-dust-explosions/</loc><lastmod>2013-05-17T16:46:08-06:00</lastmod><changefreq>weekly</changefreq><priority>0.5</priority></url><url><loc>https://www.csb.gov/videos/protect-public-employees-from-workplace-accidents/</loc><lastmod>2013-05-17T16:46:34-06:00</lastmod><changefreq>weekly</changefreq><priority>0.5</priority></url></urlset>
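
The commit message describes csb_collector as a sitemap lastmod diff with a watermark, a cap of 40 per run, and a diff sanity limit of 300. The following is only a minimal sketch of how such a diff might be run against a fixture shaped like this one; it is not the repository's code. The function names `parse_sitemap` and `select_changed`, the oldest-first ordering, and the choice to raise an error when the sanity limit is exceeded are all assumptions made for illustration.

```python
from datetime import datetime
import xml.etree.ElementTree as ET

# Namespace exactly as it appears in the fixture (note the https scheme).
NS = {"sm": "https://www.sitemaps.org/schemas/sitemap/0.9"}


def parse_sitemap(xml_bytes):
    """Return (loc, lastmod) pairs; lastmod is a timezone-aware datetime."""
    root = ET.fromstring(xml_bytes)
    entries = []
    for url in root.findall("sm:url", NS):
        loc = url.findtext("sm:loc", default="", namespaces=NS).strip()
        lastmod = url.findtext("sm:lastmod", default="", namespaces=NS).strip()
        if loc and lastmod:
            entries.append((loc, datetime.fromisoformat(lastmod)))
    return entries


def select_changed(entries, watermark, cap=40, sanity=300):
    """Hypothetical diff step: pick entries modified after the watermark.

    Assumed behaviour, not confirmed by the source:
    - an oversized diff (more than `sanity` entries) is treated as an error;
    - entries are processed oldest first, so the watermark advances
      gradually and a backlog drains over several runs (`cap` per run).
    Entries sharing the exact lastmod of the last item in a batch would be
    skipped by the strict comparison; a real collector needs a tie-breaker.
    """
    changed = sorted((e for e in entries if e[1] > watermark), key=lambda e: e[1])
    if len(changed) > sanity:
        raise ValueError(f"diff of {len(changed)} entries exceeds sanity limit {sanity}")
    batch = changed[:cap]
    new_watermark = batch[-1][1] if batch else watermark
    return batch, new_watermark


# Three entries copied from the fixture above.
SAMPLE = (
    b'<?xml version="1.0" encoding="utf-8"?>'
    b'<urlset xmlns="https://www.sitemaps.org/schemas/sitemap/0.9">'
    b"<url><loc>https://www.csb.gov/recommendations/preventive-maintenance/</loc>"
    b"<lastmod>2022-06-02T17:17:27-06:00</lastmod></url>"
    b"<url><loc>https://www.csb.gov/site-map/</loc>"
    b"<lastmod>2017-05-05T23:59:28-06:00</lastmod></url>"
    b"<url><loc>https://www.csb.gov/investigations/data-quality-/</loc>"
    b"<lastmod>2025-07-28T13:37:44-06:00</lastmod></url>"
    b"</urlset>"
)

if __name__ == "__main__":
    watermark = datetime.fromisoformat("2018-01-01T00:00:00-06:00")
    batch, watermark = select_changed(parse_sitemap(SAMPLE), watermark, cap=1)
    for loc, lastmod in batch:
        print(lastmod.isoformat(), loc)
```

With `cap=1`, the first run returns only the oldest changed page and moves the watermark to its lastmod, so a second run picks up the next one. That is one way a capped, incremental backfill could behave.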