Compare commits

23 Commits

Author SHA1 Message Date
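hyungi 540bc00dba feat(docpage): D3 section-structure explorer: slim insight rail + section tree (frontend only)
Restructures the document detail page /documents/[id] (no backend changes):
- Right-hand tabs (Info/AI/Manage) → slim global insight rail: summary, deep analysis, and
  discrepancies are always shown without a tab gate (details open; collapsible on mobile); Info/Manage are collapsible.
  → Resolves the IA problem (G1) where processed material was buried behind tabs/on-demand actions.
- SectionOutline: level-based indentation of the section outline (flat → tree shape).
- Mobile: body as main + collapsible section outline/insights/info/manage + tapping a section jumps to the body (reuses the existing structure).
Related documents (See Also) are excluded from v1 (placeholder kept). Detailed mockup = comparisons/2026-06-13-ds-docpage-d3-deepened.html.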

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-13 15:18:15 +09:00
hyungi 30c235e4c1 Merge feat/safety-library-a1 (C-1 follow-up version_status+facets) into ds-board-merged
Search-result wrapper decoration: statute version_status + facets aggregation (ranking-independent, additive).
2026-06-13 15:08:24 +09:00
hyungi 8a3bea6b31 feat(safety): C-1 follow-up: version_status decorate + facets aggregation
Search endpoint wrapper decoration (run_search core untouched, ranking-independent):
- version_status: attaches legal_meta.version_status to statute results (material_type=law)
  (decorate_version_status; the query is skipped when there are no law results). Adds SearchResult.version_status.
- facets=true: distribution labels of the classification axes (material_type/jurisdiction/version_status) of the top-K results
  (compute_facets). Not requested = None (byte-identical). Adds SearchResponse.facets.
- Adds result_decorate.py. 4 unit tests.
The freshness incident change (law_365d removal + absorption) is a ranking change, so it is deferred to a separate slice.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-13 15:07:57 +09:00
hyungi cd439b0ff4 Merge feat/safety-library-a1 (B-4 licensed_restricted + watch-folder license) into ds-board-merged
B-4 PR①②: single licensed_restricted predicate (shared by the 3 retrieval legs, digest, briefing, and study explanations)
+ license injection for file_watcher Books/Manuals/Papers_Purchased. For deploying the prod integration branch.
2026-06-13 14:53:34 +09:00
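hyungi a6db6c999b fix(safety): apply B-4 review: centralize the single predicate + cover the study/briefing paths
Applies the confirmed findings of the adversarial review (10 agents):
- Adds license_filter.py: a single definition of restricted_exclude_sql (raw) / restricted_exclude_orm (ORM).
  Shared by retrieval _license_sql, digest, briefing, and study explanations (prevents drift).
- major: the predicate was missing from explanation_rag (RAG for AI explanations of study problems) → applied the ORM form
  to the doc_meta query (chunks are also blocked via valid_doc_ids). It was missing from 2 briefing/loader queries → added
  the same predicate as digest (news has no restricted documents, so this is defensive and keeps the paths consistent).
- blocker (low-impact): the file_watcher changed-doc path now corrects material/license (merge injection,
  only when license is absent, to avoid clobbering extract_meta and to sync documents ingested before B-4).
- Tests: added single-source verification + ORM construction smoke (2 tests).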

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-13 14:52:04 +09:00
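hyungi ed7740beee feat(safety): B-4 PR①②: licensed_restricted exclusion predicate + watch-folder license injection
PR① single licensed_restricted predicate (_license_sql), shared by the 3 retrieval legs (text/vec-doc/
vec-chunk) + the digest loader. Option a (U-2①): indexing is allowed, but verbatim text from purchased material is
structurally excluded from RAG evidence and digest publication. Predicate = COALESCE(extract_meta->'license'->>'restricted',
'false')<>'true' (restricted absent/false is not excluded → results on the existing corpus are unchanged). Viewing personal
files is not blocked. The chunk leg applies it post-rank via the outer query's (always present) documents JOIN (restricted documents are a small minority).
PR② extends file_watcher _TARGET_AXIS: Books/Papers_Purchased = restricted / Manuals =
non-restricted (user decision) / KGS = law·KR·kogl. extract_meta.license is injected deterministically
at ingest (material is proposed only when classify finds material IS NULL; meta is not written, so it is preserved).
PR③ (KGS version flip) = deferred to a separate slice (file-format investigation comes first).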

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-13 14:34:42 +09:00
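hyungi 595f4b7d5e feat(board): unified board v3: machine lanes + honest burndown/ETA (B-1·2·3·5)
Rewrites ProcessingFlowBoard as the unified design:
- 3 machine lanes (GPU/Mac mini/MacBook) = "who is doing the work" + summarize-offload visibility
  (summarize chip share bar, Mac mini vs MacBook + a 'joined summarize' chip on the MacBook lane; consumes summarize_by_machine)
- Dominant-backlog strip + honest ETA (summarize_eta, net of inflow / null = cannot be drained)
- 24h burndown SVG (inflow vs processed) + MacBook join inflection point + per-stage honest ETA mini-list
- Freshness 'updated N seconds ago' + stale warning (queueUpdatedAt, B-4)
- The failure drawer and node detail panel reuse the v2 assets as-is
- Mobile responsive via lane stack + chip wrap

svelte-check: 0 errors in changed files.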

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-13 14:01:50 +09:00
hyungi b630c31077 feat(board): expose summarize_by_machine for offload visibility (A-1)
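Exposes the per-machine completion counts of the summarize pool (Mac mini vs MacBook) in the /api/queue/overview
response as summarize_by_machine. This is an additive projection of values rows_to_summarize_split already
computed: 0 new collection SQL/migrations. It feeds the unified board lanes' offload visibility
(the MacBook handles 86% of summaries). Also syncs the FE types and adds a store freshness timestamp (B-4).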

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-13 13:54:39 +09:00
hyungi 235aa648ad feat(safety): B-2 KOSHA fatal-accident bulletin collector (callApiId=1040)
The data.go.kr 15119137 usage application has propagated → news_api02/getNews_api02 is live.
collect_fatal_accidents: arno dedup (kosha-fatal|{arno}) + material_type=incident/
jurisdiction=KR + license=kogl. contents=HTML → _clean_html, published_date =
first 8 digits of arno (YYYYMMDD registration date; verified to have the same form across all live records from 2019 onward).
A separate channel (1040) with no attachment API or business field. Joins the run() daily job (06:40 KST) tuple; per-source failure isolation is kept.
Pure helper _fatal_fields + fixture test (tests/test_kosha_fatal.py).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-13 13:42:12 +09:00
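
The arno-based date and dedup key described in 235aa648ad can be sketched as follows. This is a minimal illustration, not the collector's actual `_fatal_fields`; the function names and the sample arno value are hypothetical, and only the "first 8 digits = YYYYMMDD" rule and the `kosha-fatal|{arno}` key format come from the commit message.

```python
from datetime import date, datetime
from typing import Optional


def published_date_from_arno(arno: str) -> Optional[date]:
    """Parse the leading 8 characters of arno as YYYYMMDD (hypothetical helper)."""
    prefix = arno[:8]
    if len(prefix) != 8 or not prefix.isdigit():
        return None
    try:
        return datetime.strptime(prefix, "%Y%m%d").date()
    except ValueError:
        # 8 digits that do not form a real calendar date
        return None


def dedup_key(arno: str) -> str:
    """Dedup key in the kosha-fatal|{arno} form named in the commit message."""
    return f"kosha-fatal|{arno}"


sample_arno = "2024031500012"  # made-up value, for illustration only
print(published_date_from_arno(sample_arno), dedup_key(sample_arno))
```

Returning None for a malformed prefix, rather than raising, keeps one bad record from failing the whole source; whether the real collector does the same is not stated in the source.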
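hyungi 60cb48bbe4 fix(safety): C-1 propagate classification-axis metadata on fusion reconstruction: 3 SearchResult re-creation sites
fusion legacy (line 66), RRF (122), and multi_query rewrite (pipeline 456) rebuild SearchResult by listing fields explicitly
→ material_type/jurisdiction/published_date were dropped (filters stay correct; only the D-1
type display is None). Synced all three sites. Scar: adding a SearchResult field requires syncing every reconstruction site.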

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-13 13:03:27 +09:00
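
The failure mode behind 60cb48bbe4 can be reproduced in a few lines. The sketch below uses a hypothetical `Result` dataclass as a stand-in for SearchResult and contrasts rebuilding by explicit field listing with copy-and-override; the second form is an alternative shown for comparison, not what the commit does (the commit syncs the three listing sites by hand).

```python
from dataclasses import dataclass, replace
from datetime import date
from typing import Optional


@dataclass
class Result:
    """Hypothetical stand-in for SearchResult (only the fields needed here)."""
    id: int
    score: float
    match_reason: str
    material_type: Optional[str] = None
    jurisdiction: Optional[str] = None
    published_date: Optional[date] = None


def merge_by_listing(existing: Result, vector_hit: Result) -> Result:
    # Explicit field listing: any field not named here silently resets to its default.
    return Result(
        id=existing.id,
        score=existing.score + vector_hit.score * 0.5,
        match_reason=f"{existing.match_reason}+vector",
    )


def merge_by_copy(existing: Result, vector_hit: Result) -> Result:
    # Copy-and-override: fields added to Result later are carried over automatically.
    return replace(
        existing,
        score=existing.score + vector_hit.score * 0.5,
        match_reason=f"{existing.match_reason}+vector",
    )


text_hit = Result(1, 0.8, "title", "law", "KR", date(2024, 1, 1))
vec_hit = Result(1, 0.4, "vector")
print(merge_by_listing(text_hit, vec_hit).material_type)  # None
print(merge_by_copy(text_hit, vec_hit).material_type)     # law
```

For a Pydantic v2 model the analogous call would be `model_copy(update={...})`; whether that fits the project's fusion code is an assumption.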
hyungi 79deae0644 feat(safety): C-1 explicit search filters: material_type/jurisdiction/year applied equally to 3 legs + documents exclude lifted
plan safety-library-1 C-1 (core search path, byte-identical invariant):
- AxisFilter + _axis_sql helper: when unspecified, every SQL clause is an empty string (guarantees 0 run_eval regressions)
- Applied equally to 3 legs: search_text (WHERE after JOIN) / _search_vector_docs (prod+cand) /
  _search_vector_chunks (★inner top-k JOIN; R6 decision: with an outer post-filter, a narrow filter
  collapses the candidate set after ANN top-k. When unspecified there is no JOIN = byte-identical)
- SearchResult + material_type/jurisdiction/published_date (additive in the 3 legs' SELECT)
- year = COALESCE(published_date, created_at) (same idea as freshness)
- GET /documents/: specifying material_type lifts the default exclude (news·law_monitor·note)
- _axis_sql unit tests PASS (unspecified = empty string + 0 params / 4 active clauses / alias branch)

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-13 12:59:18 +09:00
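
The byte-identical invariant of 79deae0644 rests on one property: an inactive filter contributes an empty SQL fragment and no bind parameters. The sketch below is a condensed version of the `_axis_sql` idea with only two of the four axes; the base query and its `deleted_at` column are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class AxisFilter:
    """Condensed stand-in: the year_from/year_to axes are omitted for brevity."""
    material_types: Optional[List[str]] = None
    jurisdiction: Optional[str] = None

    def active(self) -> bool:
        return bool(self.material_types or self.jurisdiction)


def axis_sql(alias: str, af: Optional[AxisFilter], params: dict) -> str:
    """Return ' AND ...' for an active filter, '' otherwise (params untouched)."""
    if af is None or not af.active():
        return ""
    prefix = f"{alias}." if alias else ""
    clauses = []
    if af.material_types:
        clauses.append(f"{prefix}material_type = ANY(:af_mt)")
        params["af_mt"] = af.material_types
    if af.jurisdiction:
        clauses.append(f"{prefix}jurisdiction = :af_jur")
        params["af_jur"] = af.jurisdiction
    return " AND " + " AND ".join(clauses)


BASE = "SELECT d.id FROM documents d WHERE d.deleted_at IS NULL"  # hypothetical query

params: dict = {}
unfiltered = BASE + axis_sql("d", AxisFilter(), params)
filtered = BASE + axis_sql("d", AxisFilter(["law"], "KR"), params)
print(unfiltered == BASE)  # True: the SQL text is unchanged when no axis is set
print(filtered)
```

Appending the fragment unconditionally means callers need no branching, and a run with no axis set sends exactly the pre-change SQL.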
hyungi 9a7e231dcc fix(safety): verify_statute_chain sys.path: auto-detect the /app root (workers import)
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-13 12:44:58 +09:00
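hyungi 1646617a31 feat(safety): B-1 PR③: statute-chain verification script with 3 predicates (read-only diagnostics)
plan safety-library-1 B-1 PR③. Doubles as the E-1 statute gate tool (safe to run repeatedly):
- ① Existence: each watch family has exactly 1 primary current + ≤1 per annex series
- ② Exposure uniqueness: 1 exposed document per family that has a primary current (absorbed into ③a)
- ③ Orphan net: normalized-equality mapping; missed flips (legacy documents still exposed for a current family) and unmapped documents (mapping holes) must be 0
- repealed families are exempt from ①②. Exit code 0/1 (for the observation gate)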

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-13 12:44:25 +09:00
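hyungi bacb36924b feat(safety): B-1 PR②: fetch_version (payload list) + ingest 4 axes + whole lifecycle job + bootstrap
plan safety-library-1 B-1 PR② (R8-B1: promotion, supersede, sweep, and repeal are deployed as one whole job):
- kr.fetch_version: 1 full-text call → list of primary+annex payloads (R4-M4)
  ★Defect 2 caught by the fixture: without the annex-type dimension (별표/서식), 4 (number, branch) pairs collide
  → version_key='MST|{구분}{번호}-{가지}' / 3 deletion tombstones (별표10·서식1·2) are skipped
  (a KR annex deletion is an explicit tombstone, not an absence; R7-M3: absence inference confirmed unnecessary)
- ingest: all versions loaded as pending + 4 axes (law/KR/COALESCE date/public_domain) + backfill marker
- Lifecycle job: promotion/supersede per version series (R7-B1) + state-based legacy sweep (only for families
  that have a primary current) + repeal (including legacy-mapped documents, R7-M2); single transaction, KST
- Statute-name mapping: normalized equality comparison (prefix matching forbidden, to prevent false hits on 시행령, enforcement decrees); absorbs middle dots and whitespace
- Watermark = persisted only after parse validation passes / scheduled daily 07:00 KST (takes over the law_monitor slot)
- Tests 14/14 (mapping samples, series keys, payload fixtures)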

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-13 09:37:51 +09:00
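
The "normalized equality, no prefix matching" rule from bacb36924b can be illustrated with a small sketch. The helper names and the exact set of ignored characters are assumptions; the commit only says that middle dots and whitespace are absorbed and that prefix matching is forbidden because it would match an enforcement decree (시행령) against its parent act.

```python
import re

# Whitespace plus several middle-dot code points (U+318D, U+00B7, U+30FB, U+2027).
# The exact character set is an assumption for illustration.
_IGNORED = re.compile(r"[\s\u318D\u00B7\u30FB\u2027]")


def normalize_title(title: str) -> str:
    """Drop whitespace and middle dots so spelling variants compare equal."""
    return _IGNORED.sub("", title)


def same_statute(a: str, b: str) -> bool:
    """Equality after normalization; deliberately not startswith()."""
    return normalize_title(a) == normalize_title(b)


act = "산업안전보건법"
decree = "산업안전보건법 시행령"

print(same_statute("유해ㆍ위험작업", "유해위험작업"))  # True: the middle dot is absorbed
print(same_statute(act, decree))                        # False: different statutes
print(normalize_title(decree).startswith(normalize_title(act)))  # True: the prefix false hit
```

The last line shows why a prefix test is unsafe here: the decree's title begins with the act's title, so a prefix match would map both to the same family.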
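hyungi a28f12b12e feat(safety): B-1 PR①: remove the law_monitor schedule + statute KR poll_changes + captured fixtures (mig 356)
plan safety-library-1 B-1 PR① (fixture-first):
- Captured 5 live law.go.kr fixtures (OC sanitization verified; pitfall: the 법령상세링크 field in the response contains the key)
- R7-M3 ruling: the full-text XML from 1 call = a complete snapshot of 853 articles + 23 annexes (no notion of partial failure)
  + 별표번호/가지번호 are structured fields; article retrieval = 1 full-text call + local parsing, confirmed (R2-m1)
- legal_acts KR seed of 26 rows (법령ID measured live, all 26 watched, 9 FK family groups)
  ★ The official name '유해ㆍ위험작업...' uses a middle dot; the name hard-coded in law_monitor (no dot) was a latent permanent mismatch
- statute_adapters/kr.py: poll_changes (lawSearch MST diff); pure parser split out, fixture tests 8/8
- statute_collector.py: observation-only core (0 watermark persistence; ingest = PR②), schedule not registered (R8-B1)
- main.py: removed the law_monitor schedule, cutting off the only path by which legacy documents outside the version chain multiplied daily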

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-13 09:01:21 +09:00
hyungi 0c8fb41366 fix(safety): backfill: text() misreads a colon as a bind parameter; replaced with exec_driver_sql
text() interprets the colon in the regex '(?:' as a bind param (same pitfall as the migration runner).

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-13 06:49:58 +09:00
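hyungi e5ddd0e4d6 feat(safety): A-3 backfill script: retroactive classification axes for the existing corpus (corrected predicates)
plan safety-library-1 A-3. Reflects prod measurements:
- KGS frontmatter = 'code' key confirmed (117/118, kgs_code 0) → path predicate
- 243 legacy law documents: extract_meta is empty, so the promulgation date is extracted from '(YYYYMMDD)' in the title
- GUIDE ofancYmd = 'YYYY-MM-DD', as measured
- KOSHA body = source_id JOIN (kind='case' does not exist; the R2 blocker correction is kept as-is)
- dry-run = transaction ROLLBACK approach (exact rowcount + verification table, 0 changes)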

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-13 06:48:30 +09:00
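
The ROLLBACK-based dry run from e5ddd0e4d6 gives exact row counts because the UPDATE really executes before being undone. The sketch below shows the pattern with the standard-library `sqlite3` module so it runs anywhere; the project itself targets PostgreSQL, and the table layout, `source` column, and function names here are hypothetical.

```python
import sqlite3


def make_demo_db() -> sqlite3.Connection:
    """In-memory table with two backfill candidates and one unrelated row."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE documents (id INTEGER PRIMARY KEY, source TEXT, material_type TEXT)"
    )
    conn.executemany(
        "INSERT INTO documents (source, material_type) VALUES (?, ?)",
        [("law_monitor", None), ("law_monitor", None), ("news", None)],
    )
    conn.commit()
    return conn


def backfill_law(conn: sqlite3.Connection, dry_run: bool = True) -> int:
    """Run the real UPDATE, then roll back (dry run) or commit."""
    cur = conn.execute(
        "UPDATE documents SET material_type = 'law' "
        "WHERE source = 'law_monitor' AND material_type IS NULL"
    )
    changed = cur.rowcount  # exact count, because the statement actually ran
    if dry_run:
        conn.rollback()
    else:
        conn.commit()
    return changed


def count_law(conn: sqlite3.Connection) -> int:
    row = conn.execute(
        "SELECT COUNT(*) FROM documents WHERE material_type = 'law'"
    ).fetchone()
    return row[0]


conn = make_demo_db()
print(backfill_law(conn, dry_run=True), count_law(conn))   # 2 0
print(backfill_law(conn, dry_run=False), count_law(conn))  # 2 2
```

Compared with a separate SELECT COUNT estimate, this approach exercises the same predicate as the live run, so the dry-run number cannot drift from what the real backfill would change.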
hyungi 3feddd012b feat(safety): A-2 assign classification axes at collector ingest time: registry propagation + approval guard (mig 352~355)
plan safety-library-1 A-2 (covers every classify-skip path):
- news_sources gains material_type/license_scheme/license_redistribute + a seed of 12 safety/engineering rows
- news_collector: registry → documents propagation (_material_axis; paper forces jurisdiction to NULL)
- Deterministic assignment for kosha (cases/attachments=incident, GUIDE=guide)/csb (incident·US)/api_std (standard·US)/law_monitor (law·KR)
  /file_watcher (KGS=law·KR target mapping) + extract_meta.license injection
- published_date: the date available per source (GUIDE publication date, CSB lastmod, API announcement date, statute promulgation date, news publication date)
- classify_worker: proposes a deterministic document_type→material_type mapping (automatic transition forbidden)
- accept-suggestion: applies the material proposal + law requires jurisdiction (no default) + syncs the chunk mirror in 1 statement
- chunk_worker: mirrors country=jurisdiction for non-news documents (R3-m3: 0 search-side consumers of country, as measured)

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-13 06:23:22 +09:00
hyungi 5da94213ec feat(safety): classification axes A-1: material_type/jurisdiction/published_date + legal_acts/legal_meta (mig 340~351)
Safety library plan safety-library-1 A-1 (reflects the r3 contract):
- 3 documents columns (TEXT+CHECK, nullable additive) + structural enforcement of law→jurisdiction NOT NULL
- legal_acts single registry (doubles as the watchlist; includes watermark and repeal_detected_at)
- Minimal legal_meta (composite version_key UNIQUE; contract: all versions are loaded as pending)
- 2 partial indexes + family index + paper DOI partial UNIQUE (contract: a DOI belongs to a single bibliographic record)
- Ephemeral PG16 smoke test: applied 12 files + verified 6 CHECK/UNIQUE contracts, PASS

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-12 21:25:04 +09:00
hyungi 85304878f4 test(eval): Phase 2A eval artifacts: baseline exact 0.6315 vs qwen06/-0.041 qwen4/-0.032 qwen4m/-0.033 (no-go)
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-12 05:53:20 +00:00
hyungi adce639445 ops(eval): Phase 2A eval finished: embed/chunk hold reverted (E-4 = no-go, keeping bge-m3)
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-12 12:03:39 +09:00
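hyungi d05e41128a fix(board): clipped failure badge (scroll-container headroom) + removed the old per-stage status section + ETA shown in days at 48h+
- Flow container pt/px headroom: badges protruding via -top/-right were clipped by overflow-x-auto
- Per-stage status details = replaced by the flow board (intent of the R2 unified design); its dedicated derived values/helpers/chevron were removed with it
- etaShort: 48 hours or more is shown in days (about 131 hours → about 5.5 days)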

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-12 03:02:04 +00:00
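
The etaShort rule in d05e41128a (switch to days at 48 hours or more) amounts to a single threshold. The real helper lives in the Svelte frontend and is not shown in this diff; the Python sketch below only mirrors the stated rule, and the hours-based input, output wording, and rounding are assumptions.

```python
def eta_short(hours: float) -> str:
    """Format an ETA: hours below 48h, days with one decimal from 48h on."""
    if hours >= 48:
        return f"about {hours / 24:.1f} days"
    return f"about {round(hours)} hours"


print(eta_short(131))  # about 5.5 days
print(eta_short(47))   # about 47 hours
```

The 131-hour case matches the commit's own example: 131 / 24 ≈ 5.46, which rounds to 5.5 at one decimal place.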
hyungi 2bbdf63d86 ops(eval): Phase 2A eval freeze: temporary embed/chunk hold (restored after eval)
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-12 11:17:42 +09:00
70 changed files with 2893 additions and 285 deletions
+57 -4
View File
@@ -210,8 +210,14 @@ class DocumentDetailResponse(DocumentResponse):
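class AcceptSuggestionRequest(BaseModel):
"""§1 accept-suggestion request body: detects stale payloads / document modifications."""
"""§1 accept-suggestion request body: detects stale payloads / document modifications.
jurisdiction: safety library A-2. The jurisdiction the user specifies when approving a material_type suggestion.
Required for law approvals (no default: auto-assigning KR would mislabel foreign material as a KR
statute, so that path is blocked; plan A-2 contract).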
"""
expected_source_updated_at: datetime
jurisdiction: str | None = None
class DocumentUpdate(BaseModel):
@@ -537,6 +543,8 @@ async def list_documents(
category: str | None = Query(None, description="doc_category enum — 지정 시 기본 news/memo 제외 해제"),
has_suggestion: bool | None = Query(None, description="true: ai_suggestion IS NOT NULL"),
proposed_category: str | None = Query(None, description="ai_suggestion.proposed_category 필터"),
material_type: str | None = Query(None, description="안전 자료실 C-1: 자료유형. 지정 시 기본 exclude 해제"),
jurisdiction: str | None = Query(None, description="안전 자료실 C-1: 관할 (KR/US/...)"),
):
"""문서 목록 조회 (페이지네이션 + 필터).
@@ -550,6 +558,10 @@ async def list_documents(
if category:
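# Explicit category filter: lifts the default exclude
query = query.where(Document.category == category)
elif material_type:
# Safety library C-1: specifying material_type lifts the default exclude (news·law_monitor·note).
# The core safety corpus (KOSHA cases, CSB, statutes, etc.) lives entirely in the note/crawl channels, so keeping the exclude would leave the screen empty.
query = query.where(Document.material_type == material_type)
else:
# Default list: excludes news/notes/statutes (document-box use)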
query = query.where(
@@ -558,6 +570,9 @@ async def list_documents(
Document.file_type != "note",
)
if jurisdiction:
query = query.where(Document.jurisdiction == jurisdiction)
if has_suggestion is True:
query = query.where(Document.ai_suggestion.isnot(None))
elif has_suggestion is False:
@@ -1244,11 +1259,49 @@ async def accept_suggestion(
# Apply the payload
proposed_category = doc.ai_suggestion.get("proposed_category")
proposed_path = doc.ai_suggestion.get("proposed_path")
# Safety library A-2: material_type suggestion (classify's deterministic document_type mapping)
proposed_material = doc.ai_suggestion.get("proposed_material_type")
if not proposed_category:
raise HTTPException(status_code=422, detail="proposed_category 누락된 suggestion")
if not proposed_category and not proposed_material:
raise HTTPException(
status_code=422,
detail="proposed_category/proposed_material_type 둘 다 누락된 suggestion",
)
doc.category = proposed_category
if proposed_category:
doc.category = proposed_category
if proposed_material:
_MATERIAL_TYPES = {"law", "paper", "book", "incident", "manual", "standard", "guide"}
_JURISDICTIONS = {"KR", "US", "EU", "JP", "GB", "INT"}
if proposed_material not in _MATERIAL_TYPES:
raise HTTPException(
status_code=422, detail=f"허용 밖 material_type: {proposed_material}"
)
jur = body.jurisdiction or doc.ai_suggestion.get("proposed_jurisdiction")
if jur is not None and jur not in _JURISDICTIONS:
raise HTTPException(status_code=422, detail=f"허용 밖 jurisdiction: {jur}")
# law = country is a required input, no default (plan A-2: auto-assigning KR would mislabel foreign statutes.
# The DB CHECK (chk_documents_law_jurisdiction) also rejects it, but a 422 gives an explicit message).
if proposed_material == "law" and not jur:
raise HTTPException(
status_code=422,
detail="법령(law) 승인은 jurisdiction 필수 — body.jurisdiction 으로 국가를 지정하세요 (기본값 없음)",
)
doc.material_type = proposed_material
doc.jurisdiction = jur
# Mirror sync in 1 statement: when jurisdiction is assigned/corrected, UPDATE the chunks' country with it
# (prevents country mismatch between legs, plan A-2 contract. Single point = this approval path).
if jur:
from sqlalchemy import update as sa_update
from models.chunk import DocumentChunk
await session.execute(
sa_update(DocumentChunk)
.where(DocumentChunk.doc_id == doc.id)
.values(country=jur)
)
# Append user_tags (duplicates prevented; goes through normalize + dedup)
if proposed_path:
+15
View File
@@ -59,6 +59,20 @@ class SummarizeEta(BaseModel):
eta_minutes: int | None
class MachineDone(BaseModel):
"""Summarize completion counts for one machine (for the share display)."""
done_1h: int
done_today: int
class SummarizeByMachine(BaseModel):
"""Per-machine share of summarize completions in the summarize pool, for the board lanes'
'Mac mini vs MacBook' offload visibility. Exposes values that rows_to_summarize_split
already computed (ds-board-merged A-1, 0 new collection SQL)."""
macmini: MachineDone
macbook: MachineDone
class TrendBucket(BaseModel):
"""Summarize 24h trend bucket; hour is a KST "HH:00" label."""
hour: str
@@ -93,6 +107,7 @@ class QueueOverviewResponse(BaseModel):
machines: list[MachineCard]
stages: list[StageRow]
summarize_eta: SummarizeEta
summarize_by_machine: SummarizeByMachine
trend_24h: list[TrendBucket]
totals: Totals
+34
View File
@@ -12,6 +12,7 @@
import asyncio
import hmac
import time
from datetime import date
from typing import Annotated, Literal
from fastapi import APIRouter, BackgroundTasks, Depends, Header, Query
@@ -31,6 +32,8 @@ from services.search.fusion_service import DEFAULT_FUSION
from services.search.grounding_check import check as grounding_check
from services.search.refusal_gate import RefusalDecision, decide as refusal_decide
from services.search import query_rewriter
from services.search.retrieval_service import AxisFilter
from services.search.result_decorate import compute_facets, decorate_version_status
from services.search.search_pipeline import PipelineResult, run_search
from services.search.synthesis_service import SynthesisResult, synthesize
from services.search.verifier_service import VerifierResult, verify
@@ -70,6 +73,14 @@ class SearchResult(BaseModel):
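# PR-RAG-Time-1: freshness decay debug metadata, filled by apply_freshness_decay.
# Also filled for rows where decay is not applied (freshness_policy=None). base_score is always preserved.
freshness_debug: dict | None = None
# Safety library C-1: classification-axis metadata (filled by the 3 legs' SELECT; additive, ranking-independent).
# Prerequisite for per-type rendering of D-1 UI result cards and for preventing unlabeled mixing of countries once foreign statutes (B-5) go live.
material_type: str | None = None
jurisdiction: str | None = None
published_date: date | None = None
# Safety library C-1 follow-up: statute version status (legal_meta.version_status), decorated once in the wrapper.
# Filled only for law results (legal_meta satellite); anything else, or a law with no mapping, stays None. Prerequisite for the D-1 version badge.
version_status: str | None = None
# ─── Phase 0.4: debug response schema ─────────────────────────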
@@ -101,6 +112,9 @@ class SearchResponse(BaseModel):
query: str
mode: str
debug: SearchDebug | None = None
# Safety library C-1 follow-up: filled only when facets=true (not requested = None, byte-identical).
# Distribution labels of the classification axes within the top-K results: {axis: {label: count}}.
facets: dict[str, dict[str, int]] | None = None
def _to_debug_candidates(rows: list[SearchResult], n: int = 20) -> list[DebugCandidate]:
@@ -205,9 +219,23 @@ async def search(
"분리용. production 검색에는 사용 금지 (latency 큼)."
),
),
material_type: str | None = Query(
None, description="안전 자료실 C-1: 자료유형 필터 CSV (law,paper,incident,...). material_type = ANY"),
jurisdiction: str | None = Query(
None, description="안전 자료실 C-1: 관할 필터 (KR/US/EU/JP/GB/INT)"),
year_from: int | None = Query(None, ge=1900, le=2100, description="published_date 연도 하한 (NULL=created_at fallback)"),
year_to: int | None = Query(None, ge=1900, le=2100, description="published_date 연도 상한"),
facets: bool = Query(False, description="안전 자료실 C-1 후속: top-K 결과 분류 축 분포(material_type/jurisdiction/version_status)를 응답 facets 에 집계. 미지정=계산/노출 0"),
):
"""Document search: FTS + ILIKE + vector combined (run_search wrapper since Phase 3.1)"""
try:
axis = AxisFilter(
material_types=[m.strip() for m in material_type.split(",") if m.strip()]
if material_type else None,
jurisdiction=jurisdiction,
year_from=year_from,
year_to=year_to,
)
pr = await run_search(
session,
q,
@@ -223,6 +251,7 @@ async def search(
rewrite_backend=rewrite_backend,
corpus_variant=corpus_variant,
exact_knn=exact_knn,
axis=axis,
)
except ValueError as e:
# _resolve_backend / _resolve_reranker / _resolve_rewrite_backend / _resolve_corpus_variant unknown slug → HTTP 400
@@ -313,12 +342,17 @@ async def search(
debug_obj = _build_search_debug(pr) if debug else None
# Safety library C-1 follow-up: wrapper decoration (search core untouched, ranking-independent)
await decorate_version_status(session, pr.results) # attach version_status to law results
facets_obj = compute_facets(pr.results) if facets else None
return SearchResponse(
results=pr.results,
total=len(pr.results),
query=q,
mode=pr.mode,
debug=debug_obj,
facets=facets_obj,
)
+4 -2
View File
@@ -53,8 +53,8 @@ async def lifespan(app: FastAPI):
from workers.dedup_reconcile import run as dedup_reconcile_run
from workers.digest_worker import run as global_digest_run
from workers.file_watcher import watch_inbox
from workers.law_monitor import run as law_monitor_run
from workers.mailplus_archive import run as mailplus_run
from workers.statute_collector import run as statute_run
from workers.news_collector import run as news_collector_run
from workers.fulltext_worker import reconcile_unresolved as fulltext_reconcile_run
from workers.kosha_collector import run as kosha_collector_run
@@ -120,7 +120,9 @@ async def lifespan(app: FastAPI):
# 25 at a time in safety > law > manual priority order. 6720 legacy documents → ~150 per night → drained in about 45 days.
scheduler.add_job(tier_backfill_run, "interval", minutes=30, id="tier_backfill")
# Daily schedule (KST)
scheduler.add_job(law_monitor_run, CronTrigger(hour=7, timezone=KST), id="law_monitor")
# statute_collector = replaces the old law_monitor (safety-library-1 B-1 PR②): poll→ingest→
# lifecycle job (version-series promotion, supersede, legacy sweep, repeal) as a whole (R8-B1).
scheduler.add_job(statute_run, CronTrigger(hour=7, timezone=KST), id="statute_collector")
scheduler.add_job(mailplus_run, CronTrigger(hour=7, timezone=KST), id="mailplus_morning")
scheduler.add_job(mailplus_run, CronTrigger(hour=18, timezone=KST), id="mailplus_evening")
scheduler.add_job(daily_digest_run, CronTrigger(hour=20, timezone=KST), id="daily_digest")
+12 -2
View File
@@ -1,9 +1,9 @@
"""ORM for the documents table"""
from datetime import datetime
from datetime import date, datetime
from pgvector.sqlalchemy import Vector
from sqlalchemy import BigInteger, Boolean, DateTime, Enum, ForeignKey, Integer, String, Text
from sqlalchemy import BigInteger, Boolean, Date, DateTime, Enum, ForeignKey, Integer, String, Text
from sqlalchemy.dialects.postgresql import JSONB
from sqlalchemy.orm import Mapped, mapped_column
@@ -146,6 +146,16 @@ class Document(Base):
# category / user_tags are applied only on /accept-suggestion approval (automatic transition forbidden)
ai_suggestion: Mapped[dict | None] = mapped_column(JSONB)
# === Safety library classification axes (plan safety-library-1, migrations 340~345) ===
# Material type: law/paper/book/incident/manual/standard/guide (TEXT+CHECK, not an enum).
# Assigned deterministically at collector ingest time (many paths skip classify, so do not depend on classify_worker).
# Must not be used as a matching key for AI routing (subject_domain) (axis separation; same invariant as category).
material_type: Mapped[str | None] = mapped_column(Text)
# Jurisdiction: KR/US/EU/JP/GB/INT. For law, a CHECK structurally enforces jurisdiction NOT NULL (migration 344).
jurisdiction: Mapped[str | None] = mapped_column(Text)
# Representative date per type: statute = COALESCE(effective date, promulgation date) / paper = publication date / incident = occurrence date
published_date: Mapped[date | None] = mapped_column(Date)
# PR-B B-1: split output of summary_triage (4B, always on) / summary_deep (26B, escalation)
ai_tldr: Mapped[str | None] = mapped_column(Text) # TL;DR of ≤60 characters
ai_bullets: Mapped[list | None] = mapped_column(JSONB) # 3~5 key bullets
+73
View File
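@@ -0,0 +1,73 @@
"""ORM for the legal_acts / legal_meta tables: statute registry (doubles as the watchlist) + version satellite
plan: safety-library-1 (migrations 346~347).
- legal_acts = the table itself is the list of targets the poller iterates over (the statute counterpart of the news_sources pattern).
KOSHA GUIDE (not a statute) and KGS Code (a separate watch-folder-only track) are out of scope.
- legal_meta = one row per version of a statute document (or per annex / interpretation); a 1:0..1 satellite of documents.
version_status transitions happen at a single code point, the statute_collector daily job
(all versions are loaded as pending; the job handles promotion, supersede, and repeal in a transaction).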
"""
from datetime import date, datetime
from sqlalchemy import BigInteger, Boolean, Date, DateTime, ForeignKey, Text, UniqueConstraint
from sqlalchemy.orm import Mapped, mapped_column
from core.database import Base
class LegalAct(Base):
__tablename__ = "legal_acts"
# Format 'kr-law:{법령ID}' / 'us-cfr:29-1910'. KGS is not seeded (R3-M5).
family_id: Mapped[str] = mapped_column(Text, primary_key=True)
# Fixed adapter constant; never inferred from parse results (the core asserts it right before loading)
jurisdiction: Mapped[str] = mapped_column(Text, nullable=False)
# statute (act) / decree (enforcement decree) / rule (enforcement rule, ministerial ordinance) / admin_rule (public notice, established rule) / code (detailed standards delegated by statute)
law_level: Mapped[str] = mapped_column(Text, nullable=False)
title: Mapped[str] = mapped_column(Text, nullable=False)
title_ko: Mapped[str | None] = mapped_column(Text)
# Hierarchy: act → enforcement decree → enforcement rule
parent_family_id: Mapped[str | None] = mapped_column(ForeignKey("legal_acts.family_id"))
# Source-specific identifier such as 법령ID / CFR part / CELEX / e-Gov law_id
native_id: Mapped[str] = mapped_column(Text, nullable=False)
# 'law.go.kr' / 'ecfr' / 'cellar' / 'egov_v2' / 'leg_gov_uk'
source_api: Mapped[str] = mapped_column(Text, nullable=False)
# All 26 seeds are true; 'priority' is only an ordering, not a watch exclusion (R3-B1)
watch: Mapped[bool] = mapped_column(Boolean, nullable=False, default=True)
poll_cycle: Mapped[str] = mapped_column(Text, nullable=False, default="daily")
# Change-history polling watermark; persisted only after parse validation passes
watermark: Mapped[str | None] = mapped_column(Text)
# The adapter only marks repeal detection; the transition to repealed is done by the daily job (R3-M3)
repeal_detected_at: Mapped[datetime | None] = mapped_column(DateTime(timezone=True))
created_at: Mapped[datetime] = mapped_column(
DateTime(timezone=True), default=datetime.now
)
updated_at: Mapped[datetime] = mapped_column(
DateTime(timezone=True), default=datetime.now, onupdate=datetime.now
)
class LegalMeta(Base):
__tablename__ = "legal_meta"
__table_args__ = (
# Structurally enforces version dedup; annex uses the composite form version_key='MST|별표N' (R3-M4)
UniqueConstraint("family_id", "law_doc_kind", "version_key", name="uq_legal_meta_version"),
)
document_id: Mapped[int] = mapped_column(
BigInteger, ForeignKey("documents.id", ondelete="CASCADE"), primary_key=True
)
family_id: Mapped[str] = mapped_column(
ForeignKey("legal_acts.family_id"), nullable=False
)
# primary (main text) / annex (annexes and forms) / interpretation (interpretation cases)
law_doc_kind: Mapped[str] = mapped_column(Text, nullable=False, default="primary")
version_key: Mapped[str] = mapped_column(Text, nullable=False)
promulgation_date: Mapped[date | None] = mapped_column(Date)
effective_date: Mapped[date | None] = mapped_column(Date)
# pending → current → superseded / repealed. Transitions happen at a single point, the daily job, based on KST.
version_status: Mapped[str] = mapped_column(Text, nullable=False, default="pending")
created_at: Mapped[datetime] = mapped_column(
DateTime(timezone=True), default=datetime.now
)
+9
View File
@@ -53,3 +53,12 @@ class NewsSource(Base):
name="source_channel"),
default="news",
)
# ── Safety library classification axes (plan safety-library-1 A-2, migrations 352~355) ──
# Default material type; propagated to documents.material_type at ingest time (NULL = out of scope).
# jurisdiction has no separate column and is propagated from country, except that paper is forced to NULL in code.
material_type: Mapped[str | None] = mapped_column(Text)
# For extract_meta.license injection: kogl/ogl/public_domain/proprietary/unknown.
# Undetermined = conservative (unknown + redistribute=false), relaxed once evidence is obtained.
license_scheme: Mapped[str | None] = mapped_column(Text)
license_redistribute: Mapped[bool | None] = mapped_column(Boolean)
+7 -2
View File
@@ -15,11 +15,12 @@ from sqlalchemy import text
from core.database import async_session
from core.utils import setup_logger
from services.search.license_filter import restricted_exclude_sql
logger = setup_logger("briefing_loader")
_NEWS_WINDOW_SQL = text("""
_NEWS_WINDOW_SQL = text(f"""
SELECT
d.id,
d.title,
@@ -41,6 +42,8 @@ _NEWS_WINDOW_SQL = text("""
AND d.created_at < :window_end
AND d.embedding IS NOT NULL
AND d.ai_summary IS NOT NULL
-- Safety library B-4: block publication of licensed_restricted (same shared predicate as digest, path consistency)
AND {restricted_exclude_sql("d")}
""")
@@ -49,7 +52,7 @@ _SOURCE_COUNTRY_SQL = text("""
""")
_HISTORICAL_CANDIDATES_SQL = text("""
_HISTORICAL_CANDIDATES_SQL = text(f"""
SELECT
d.id,
d.title,
@@ -63,6 +66,8 @@ _HISTORICAL_CANDIDATES_SQL = text("""
AND d.created_at < :hist_end
AND d.embedding IS NOT NULL
AND d.ai_summary IS NOT NULL
-- Safety library B-4: block publication of licensed_restricted (shared predicate)
AND {restricted_exclude_sql("d")}
""")
+5 -1
View File
@@ -15,11 +15,12 @@ from sqlalchemy import text
from core.database import async_session
from core.utils import setup_logger
from services.search.license_filter import restricted_exclude_sql
logger = setup_logger("digest_loader")
_NEWS_WINDOW_SQL = text("""
_NEWS_WINDOW_SQL = text(f"""
SELECT
d.id,
d.title,
@@ -41,6 +42,9 @@ _NEWS_WINDOW_SQL = text("""
AND d.created_at < :window_end
AND d.embedding IS NOT NULL
AND d.ai_summary IS NOT NULL
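-- Safety library B-4: block publication of licensed_restricted (predicate shared by all paths = license_filter).
-- The news channel currently has no restricted documents = defensive gate (for future paid news sources; prevents a missed path).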
AND {restricted_exclude_sql("d")}
""")
+11
View File
@@ -213,6 +213,16 @@ def build_summarize_eta(stage_stats: dict[str, dict]) -> dict:
}
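def build_summarize_by_machine(summarize_split: dict[str, dict]) -> dict:
"""Per-machine share of summarize completions (macmini vs macbook), for the board lanes'
offload visibility. Projects the values rows_to_summarize_split has already built into the
response shape (done_1h/done_today only; done_15m is excluded because it is used solely for internal state decisions)."""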
def m(key: str) -> dict:
s = summarize_split.get(key, {})
return {"done_1h": int(s.get("done_1h", 0)), "done_today": int(s.get("done_today", 0))}
return {"macmini": m("macmini"), "macbook": m("macbook")}
def build_trend(
inflow_buckets: dict[str, int],
done_buckets: dict[str, int],
@@ -292,6 +302,7 @@ def compose_overview(
),
"stages": build_stages(stage_stats),
"summarize_eta": build_summarize_eta(stage_stats),
"summarize_by_machine": build_summarize_by_machine(summarize_split),
"trend_24h": build_trend(inflow_buckets, done_buckets, now_kst),
"totals": build_totals(stage_stats),
}
+8
View File
@@ -72,6 +72,10 @@ class LegacyWeightedSum(FusionStrategy):
score=existing.score + r.score * 0.5,
snippet=existing.snippet,
match_reason=f"{existing.match_reason}+vector",
# C-1: propagate classification-axis metadata (dropped on reconstruction = D-1 type display is None)
material_type=existing.material_type,
jurisdiction=existing.jurisdiction,
published_date=existing.published_date,
)
elif r.score > 0.3:
merged[r.id] = r
@@ -128,6 +132,10 @@ class RRFOnly(FusionStrategy):
score=rrf_score,
snippet=base.snippet,
match_reason="+".join(reasons),
# C-1: propagate classification-axis metadata (dropped on reconstruction = D-1 type display is None)
material_type=base.material_type,
jurisdiction=base.jurisdiction,
published_date=base.published_date,
)
)
return merged[:limit]
+28
View File
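@@ -0,0 +1,28 @@
"""Safety library B-4: single licensed_restricted predicate (option a, U-2①; one definition shared by all paths).
Indexing is allowed, but every path by which verbatim spans of restricted=true material (purchased e-books, paid material)
enter RAG evidence or published output (search/ask, digest, morning_briefing, study explanations) is structurally blocked.
The predicate is not copy-pasted per path; one shared definition prevents missed guards and drift
([[feedback_structural_integrity_over_path_discipline]]).
Viewing personal files (GET /documents/{id}?download) is allowed under option a, so the predicate is not applied there.
Both forms (raw SQL / ORM) have the same meaning: restricted absent, false, or extract_meta NULL → COALESCE →
not excluded (even with redistribute=false, a document is not excluded when restricted is absent; the key point is that redistribute ≠ restricted).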
"""
def restricted_exclude_sql(alias: str = "") -> str:
"""Bare predicate for raw text() queries (no leading 'AND'). alias='' = reference the column directly."""
p = (alias + ".") if alias else ""
return f"COALESCE({p}extract_meta -> 'license' ->> 'restricted', 'false') <> 'true'"
def restricted_exclude_orm():
"""SQLAlchemy ORM .where() clause; same meaning as restricted_exclude_sql (JSONB extract_meta)."""
from sqlalchemy import func
from models.document import Document
return func.coalesce(
Document.extract_meta["license"]["restricted"].astext, "false"
) != "true"
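
The semantics the docstring describes (absent, false, and NULL are all kept; only an explicit true is excluded) can be checked with a pure-Python emulation. This is a sketch of the predicate's truth table, not project code: `is_excluded` is a hypothetical helper that mimics how `->>` yields text and how COALESCE supplies 'false' for a missing value.

```python
import json
from typing import Any, Optional


def is_excluded(extract_meta: Optional[dict]) -> bool:
    """Emulate COALESCE(extract_meta -> 'license' ->> 'restricted', 'false') = 'true'."""
    license_obj: Any = (extract_meta or {}).get("license")
    value = license_obj.get("restricted") if isinstance(license_obj, dict) else None
    if value is None:
        text = "false"            # COALESCE default for a missing key or JSON null
    elif isinstance(value, str):
        text = value              # ->> returns a JSON string without quotes
    else:
        text = json.dumps(value)  # JSON true -> 'true', false -> 'false'
    return text == "true"


cases = {
    "extract_meta NULL": None,
    "no license key": {},
    "restricted absent": {"license": {"redistribute": False}},
    "restricted false": {"license": {"restricted": False}},
    "restricted true": {"license": {"restricted": True}},
}
for label, meta in cases.items():
    print(f"{label}: {'excluded' if is_excluded(meta) else 'kept'}")
```

Only the last case is excluded. The third case is the one the docstring singles out: redistribute=false alone does not make a document restricted.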
+55
View File
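@@ -0,0 +1,55 @@
"""Safety library C-1 follow-up: search-result wrapper decoration (version_status + facets).
Applied once to the run_search() result in the endpoint wrapper; the search core (run_search) is untouched (r3).
- version_status: attaches legal_meta.version_status (current/superseded/pending/repealed)
to statute results (material_type='law'). legal_meta.document_id is a 1:0..1 satellite,
so a law with no mapping (e.g. legacy) stays None. The query is skipped when there are no law results.
- facets: distribution labels of the classification axes (material_type/jurisdiction/version_status) of the top-K results (r2-M4).
Computed only when facets=true (not requested = None = byte-identical, ranking-independent).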
"""
from __future__ import annotations
from collections import Counter
from typing import TYPE_CHECKING
from sqlalchemy import text
from sqlalchemy.ext.asyncio import AsyncSession
if TYPE_CHECKING:
from api.search import SearchResult
async def decorate_version_status(
session: AsyncSession, results: list["SearchResult"]
) -> None:
"""Attach legal_meta.version_status to law results (in place). Skips the query when there are no law results."""
law_ids = [r.id for r in results if r.material_type == "law" and r.id is not None]
if not law_ids:
return
rows = await session.execute(
text(
"SELECT document_id, version_status FROM legal_meta "
"WHERE document_id = ANY(:ids)"
),
{"ids": law_ids},
)
status_by_id = {row.document_id: row.version_status for row in rows}
for r in results:
if r.id in status_by_id:
r.version_status = status_by_id[r.id]
def compute_facets(results: list["SearchResult"]) -> dict[str, dict[str, int]]:
"""Distribution labels of the classification axes of the top-K results. None values are excluded (present labels only; empty axes are omitted)."""
axes = {
"material_type": [r.material_type for r in results],
"jurisdiction": [r.jurisdiction for r in results],
"version_status": [getattr(r, "version_status", None) for r in results],
}
facets: dict[str, dict[str, int]] = {}
for axis, vals in axes.items():
counter = Counter(v for v in vals if v is not None)
if counter:
facets[axis] = dict(counter.most_common())
return facets
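
A short usage sketch of the facet aggregation above. To stay self-contained it repeats the counting logic in condensed form and uses a hypothetical `Hit` dataclass in place of SearchResult; the sample values are made up.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Hit:
    """Hypothetical stand-in for SearchResult (only the three axis fields)."""
    material_type: Optional[str] = None
    jurisdiction: Optional[str] = None
    version_status: Optional[str] = None


def compute_facets(results: List[Hit]) -> Dict[str, Dict[str, int]]:
    """Count present labels per axis; axes with no labels are left out."""
    axes = {
        "material_type": [r.material_type for r in results],
        "jurisdiction": [r.jurisdiction for r in results],
        "version_status": [r.version_status for r in results],
    }
    facets: Dict[str, Dict[str, int]] = {}
    for axis, values in axes.items():
        counter = Counter(v for v in values if v is not None)
        if counter:
            facets[axis] = dict(counter.most_common())
    return facets


hits = [
    Hit("law", "KR", "current"),
    Hit("law", "KR", None),   # e.g. a legacy law with no legal_meta row
    Hit("incident", "US"),
    Hit(),                    # an unclassified document
]
print(compute_facets(hits))
```

Because None values are dropped, the counts on one axis can sum to less than the number of results, so a consumer should not treat a facet total as the result total.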
+102 -12
View File
@@ -24,6 +24,7 @@ import asyncio
import hashlib
import re
import time
from dataclasses import dataclass
from typing import TYPE_CHECKING, Any
from sqlalchemy import text
@@ -98,6 +99,63 @@ QWEN3_QUERY_INSTRUCT = (
"\nQuery: "
)
# ─── Safety library C-1: explicit classification-axis filter (identical on all 3 legs, byte-invariant) ───
# When unset (active=False) every SQL clause is an empty string → the existing SQL is byte-identical (zero run_eval regressions).
# year uses published_date and falls back to created_at when NULL (same COALESCE approach as freshness).
@dataclass
class AxisFilter:
    material_types: list[str] | None = None  # CSV → list, material_type = ANY
    jurisdiction: str | None = None
    year_from: int | None = None
    year_to: int | None = None

    def active(self) -> bool:
        return bool(self.material_types or self.jurisdiction
                    or self.year_from is not None or self.year_to is not None)


def _axis_sql(alias: str, af: "AxisFilter | None", params: dict) -> str:
    """Axis-filter SQL relative to alias; '' when unset (byte-invariant). Returned form: ' AND ...'.

    alias='' references columns directly (single-table FROM documents path). Parameters use the
    af_ prefix to avoid clashing with the caller's existing binds.
    """
    if af is None or not af.active():
        return ""
    p = (alias + ".") if alias else ""
    cl: list[str] = []
    if af.material_types:
        cl.append(f"{p}material_type = ANY(:af_mt)")
        params["af_mt"] = af.material_types
    if af.jurisdiction:
        cl.append(f"{p}jurisdiction = :af_jur")
        params["af_jur"] = af.jurisdiction
    if af.year_from is not None:
        cl.append(f"COALESCE({p}published_date, {p}created_at::date) >= make_date(:af_yf, 1, 1)")
        params["af_yf"] = af.year_from
    if af.year_to is not None:
        cl.append(f"COALESCE({p}published_date, {p}created_at::date) <= make_date(:af_yt, 12, 31)")
        params["af_yt"] = af.year_to
    return " AND " + " AND ".join(cl)


# ─── Safety library B-4: single licensed_restricted predicate (option (a) U-2①, always applied) ───
def _license_sql(alias: str) -> str:
    """Exclude licensed_restricted documents (extract_meta.license.restricted=true) from retrieval.

    Option (a): indexing is allowed, but the paths by which verbatim spans of purchased e-books or
    paid material enter RAG evidence or digest publication are blocked structurally. Every retrieval
    leg and the digest loader share a single predicate, which prevents a per-path guard from being
    missed ([[feedback_structural_integrity_over_path_discipline]]).
    Personal file access (GET /documents/{id}?download) is permitted under option (a), so it is not applied there.

    Unlike the (conditional) axis filter this is always applied. A missing or false restricted flag is
    not excluded (COALESCE) → results are unchanged on the existing corpus (zero restricted=true rows).
    Returns ' AND ...' (alias='' = direct column reference).
    The predicate definition is shared via license_filter.restricted_exclude_sql (single source with digest/briefing/study answers).
    """
    from services.search.license_filter import restricted_exclude_sql
    return " AND " + restricted_exclude_sql(alias)


# Stage-2 gate (R2-B1): final allowlist right before SQL string interpolation.
_VALID_DOCS_TABLE = re.compile(r"^(documents|documents_cand_[a-z0-9_]+)$")
# corpus_chunks = view over document_chunks WHERE in_corpus=true (Hier-Decomp-1 c2 choke point).
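The byte-invariance claim and the `WHERE{_where[4:]}` idiom used in `search_text` can be checked in isolation. The following is a sketch under stated assumptions: `LICENSE_CLAUSE` is a hypothetical placeholder predicate (the real one comes from `license_filter.restricted_exclude_sql`, which is not shown in this diff), and `where_clause` is an illustrative helper, not a function in the codebase.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class AxisFilter:
    material_types: Optional[list[str]] = None
    jurisdiction: Optional[str] = None
    year_from: Optional[int] = None

    def active(self) -> bool:
        return bool(self.material_types or self.jurisdiction or self.year_from is not None)


def axis_sql(alias: str, af: Optional[AxisFilter], params: dict[str, Any]) -> str:
    """Return ' AND ...' for the active axes, or '' so the caller's SQL is unchanged."""
    if af is None or not af.active():
        return ""
    p = f"{alias}." if alias else ""
    clauses: list[str] = []
    if af.material_types:
        clauses.append(f"{p}material_type = ANY(:af_mt)")
        params["af_mt"] = af.material_types
    if af.jurisdiction:
        clauses.append(f"{p}jurisdiction = :af_jur")
        params["af_jur"] = af.jurisdiction
    if af.year_from is not None:
        clauses.append(
            f"COALESCE({p}published_date, {p}created_at::date) >= make_date(:af_yf, 1, 1)"
        )
        params["af_yf"] = af.year_from
    return " AND " + " AND ".join(clauses)


# Hypothetical placeholder for the always-on license predicate.
LICENSE_CLAUSE = " AND NOT d.is_restricted"


def where_clause(axis: Optional[AxisFilter], params: dict[str, Any]) -> str:
    """The license clause always starts with ' AND', so dropping 4 chars yields a valid WHERE."""
    combined = LICENSE_CLAUSE + axis_sql("d", axis, params)
    return "WHERE" + combined[4:]


params: dict[str, Any] = {"q": "valve", "limit": 10}
baseline = where_clause(None, params)
filtered = where_clause(AxisFilter(jurisdiction="KR", year_from=2020), params)
print(baseline)
print(filtered)
print(params)
```

An inactive filter contributes no SQL text and no bind parameters, which is what keeps the unfiltered query identical to the pre-change one apart from the always-on license predicate.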
@@ -235,7 +293,7 @@ def query_embed_cache_stats() -> dict[str, int]:
async def search_text(
session: AsyncSession, query: str, limit: int
session: AsyncSession, query: str, limit: int, *, axis: "AxisFilter | None" = None
) -> list["SearchResult"]:
"""FTS + trigram search with per-field weights (Phase 1.2-B UNION decomposition).
@@ -266,8 +324,12 @@ async def search_text(
# Two execute calls within one SQLAlchemy async session use the same connection
await session.execute(text("SELECT set_limit(0.15)"))
_params: dict[str, Any] = {"q": query, "limit": limit}
# license (always) + axis (conditional). license always yields ' AND ...', so the WHERE is always present.
_where = _license_sql("d") + _axis_sql("d", axis, _params)
result = await session.execute(
text("""
text(f"""
WITH candidates AS (
-- title trigram (idx_documents_title_trgm)
SELECT id FROM documents
@@ -320,13 +382,15 @@ async def search_text(
WHEN similarity(coalesce(d.ai_summary, ''), :q) >= 0.3 THEN 'summary'
WHEN similarity(coalesce(d.extracted_text, ''), :q) >= 0.3 THEN 'content'
ELSE 'fts'
END AS match_reason
END AS match_reason,
d.material_type, d.jurisdiction, d.published_date
FROM documents d
JOIN candidates c ON d.id = c.id
WHERE{_where[4:]}
ORDER BY score DESC
LIMIT :limit
"""),
{"q": query, "limit": limit},
_params,
)
return [SearchResult(**row._mapping) for row in result]
@@ -341,6 +405,7 @@ async def search_vector(
snapshot_chunk_id_max: int | None = None,
corpus_variant: str | None = None,
exact_knn: bool = False,
axis: "AxisFilter | None" = None,
) -> list["SearchResult"]:
"""Hybrid vector search: concurrent doc + chunks retrieval (Phase 1.2-G).
@@ -415,6 +480,7 @@ async def search_vector(
docs_table=docs_table,
snapshot_doc_id_max=snapshot_doc_id_max,
exact_knn=exact_knn,
axis=axis,
)
async def _chunks_call() -> list["SearchResult"]:
@@ -424,6 +490,7 @@ async def search_vector(
chunks_table=chunks_table,
snapshot_chunk_id_max=snapshot_chunk_id_max,
exact_knn=exact_knn,
axis=axis,
)
doc_results, chunk_results = await asyncio.gather(_docs_call(), _chunks_call())
@@ -439,6 +506,7 @@ async def _search_vector_docs(
docs_table: str = "documents",
snapshot_doc_id_max: int | None = None,
exact_knn: bool = False,
axis: "AxisFilter | None" = None,
) -> list["SearchResult"]:
"""Direct search on documents (or documents_cand_<slug>).embedding.
@@ -463,28 +531,34 @@ async def _search_vector_docs(
if snapshot_doc_id_max is not None:
snapshot_clause = " AND id <= :snapshot_doc_id_max"
params["snapshot_doc_id_max"] = snapshot_doc_id_max
axis_clause = _axis_sql("", axis, params) # no alias (single FROM documents)
license_clause = _license_sql("") # B-4: always exclude restricted
sql = f"""
SELECT id, title, ai_domain, ai_summary, file_format,
(1 - (embedding <=> cast(:embedding AS vector))) AS score,
left(extracted_text, 1200) AS snippet,
'vector_doc' AS match_reason,
NULL::bigint AS chunk_id, NULL::integer AS chunk_index, NULL::text AS section_title
NULL::bigint AS chunk_id, NULL::integer AS chunk_index, NULL::text AS section_title,
material_type, jurisdiction, published_date
FROM documents
WHERE embedding IS NOT NULL AND deleted_at IS NULL{snapshot_clause}
WHERE embedding IS NOT NULL AND deleted_at IS NULL{snapshot_clause}{axis_clause}{license_clause}
ORDER BY embedding <=> cast(:embedding AS vector)
LIMIT :limit
"""
else:
# candidate: docs_table holds only (doc_id, embed_input, embed_input_hash, embedding) → JOIN documents
axis_clause = _axis_sql("d", axis, params)
license_clause = _license_sql("d") # B-4: always exclude restricted
sql = f"""
SELECT d.id, d.title, d.ai_domain, d.ai_summary, d.file_format,
(1 - (c.embedding <=> cast(:embedding AS vector))) AS score,
left(d.extracted_text, 1200) AS snippet,
'vector_doc' AS match_reason,
NULL::bigint AS chunk_id, NULL::integer AS chunk_index, NULL::text AS section_title
NULL::bigint AS chunk_id, NULL::integer AS chunk_index, NULL::text AS section_title,
d.material_type, d.jurisdiction, d.published_date
FROM {docs_table} c
JOIN documents d ON d.id = c.doc_id
WHERE d.deleted_at IS NULL
WHERE d.deleted_at IS NULL{axis_clause}{license_clause}
ORDER BY c.embedding <=> cast(:embedding AS vector)
LIMIT :limit
"""
@@ -500,6 +574,7 @@ async def _search_vector_chunks(
chunks_table: str = "document_chunks",
snapshot_chunk_id_max: int | None = None,
exact_knn: bool = False,
axis: "AxisFilter | None" = None,
) -> list["SearchResult"]:
"""Window partition over document_chunks (or document_chunks_cand_<slug>).embedding.
@@ -525,12 +600,25 @@ async def _search_vector_chunks(
snapshot_clause = " AND c.id <= :snapshot_chunk_id_max"
params["snapshot_chunk_id_max"] = snapshot_chunk_id_max
# C-1: the axis filter is JOINed into the inner topk (R6 decision: an outer post-filter would filter only
# after the ANN top-:inner_k candidates are picked, so a narrow filter such as GB statutes collapses the candidate set). When unset there is no JOIN = byte-identical.
if axis and axis.active():
chunk_join = " JOIN documents df ON df.id = c.doc_id"
chunk_axis = _axis_sql("df", axis, params)
else:
chunk_join = ""
chunk_axis = ""
# B-4: exclude restricted. The outer query always JOINs documents d, so this sits post-rank.
# Restricted documents are few (purchased material), so excluding them after the inner topk causes no candidate collapse (unlike axis).
license_clause = _license_sql("d")
sql = f"""
WITH topk AS (
SELECT c.id AS chunk_id, c.doc_id, c.chunk_index, c.section_title, c.text,
c.embedding <=> cast(:embedding AS vector) AS dist
FROM {chunks_table} c
WHERE c.embedding IS NOT NULL{snapshot_clause}
FROM {chunks_table} c{chunk_join}
WHERE c.embedding IS NOT NULL{snapshot_clause}{chunk_axis}
ORDER BY c.embedding <=> cast(:embedding AS vector)
LIMIT :inner_k
),
@@ -543,10 +631,12 @@ async def _search_vector_chunks(
d.ai_summary AS ai_summary, d.file_format AS file_format,
(1 - r.dist) AS score, left(r.text, 1200) AS snippet,
'vector_chunk' AS match_reason,
r.chunk_id AS chunk_id, r.chunk_index AS chunk_index, r.section_title AS section_title
r.chunk_id AS chunk_id, r.chunk_index AS chunk_index, r.section_title AS section_title,
d.material_type AS material_type, d.jurisdiction AS jurisdiction,
d.published_date AS published_date
FROM ranked r
JOIN documents d ON d.id = r.doc_id
WHERE r.rn <= 2 AND d.deleted_at IS NULL
WHERE r.rn <= 2 AND d.deleted_at IS NULL{license_clause}
ORDER BY r.dist
LIMIT :limit
"""
+9 -1
@@ -47,6 +47,7 @@ from .rerank_service import (
rerank_chunks,
)
from .retrieval_service import (
AxisFilter,
compress_chunks_to_docs,
search_text,
search_vector,
@@ -148,6 +149,7 @@ async def run_search(
rewrite_backend: str | None = None,
corpus_variant: str | None = None,
exact_knn: bool = False,
axis: AxisFilter | None = None,
) -> PipelineResult:
"""Run the search pipeline.
@@ -275,6 +277,7 @@ async def run_search(
snapshot_chunk_id_max=snapshot_chunk_id_max,
corpus_variant=corpus_variant,
exact_knn=exact_knn,
axis=axis,
)
timing["vector_ms"] = (time.perf_counter() - t0) * 1000
if not raw_chunks:
@@ -284,7 +287,7 @@ async def run_search(
results = vector_results
else:
t0 = time.perf_counter()
text_results = await search_text(session, q, limit)
text_results = await search_text(session, q, limit, axis=axis)
timing["text_ms"] = (time.perf_counter() - t0) * 1000
if mode == "hybrid":
@@ -306,6 +309,7 @@ async def run_search(
snapshot_chunk_id_max=snapshot_chunk_id_max,
corpus_variant=corpus_variant,
exact_knn=exact_knn,
axis=axis,
)
timing["vector_ms"] = (time.perf_counter() - t1) * 1000
@@ -458,6 +462,10 @@ def _rrf_fuse_variants(
score=rrf_score,
snippet=doc.snippet,
match_reason=f"{doc.match_reason}+multi_query_rrf",
# C-1: propagate classification-axis metadata (SearchResult is rebuilt here; kept in sync with the 2 fusion sites)
material_type=doc.material_type,
jurisdiction=doc.jurisdiction,
published_date=doc.published_date,
))
return fused[:limit]
+4
@@ -24,6 +24,7 @@ from models.chunk import DocumentChunk
from models.document import Document
from models.study_question import StudyQuestion
from models.study_topic import StudyTopicDocument
from services.search.license_filter import restricted_exclude_orm
logger = logging.getLogger(__name__)
@@ -124,11 +125,14 @@ async def _gather_document_evidence(
return []
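# Metadata of the mapped documents (for showing title and summary)
# B-4: excluding licensed_restricted drops those ids from valid_doc_ids, so the chunk query below (doc_id IN)
# is blocked automatically. The study-answer RAG shares the same single predicate as retrieval/digest (option (a) U-2①).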
doc_meta_rows = (
await session.execute(
select(Document.id, Document.title, Document.ai_summary).where(
Document.id.in_(doc_ids),
Document.deleted_at.is_(None),
restricted_exclude_orm(),
)
)
).all()
+6
@@ -175,10 +175,16 @@ async def _ingest_detail(session, source: NewsSource, url: str) -> str:
ai_domain="Engineering",
ai_sub_group=_SOURCE_NAME,
ai_tags=["Engineering/API 표준 공지"],
# Safety library A-2: a standards 'announcement' maps to standard (not the code text itself; ASME/API full text is behind a paywall)
material_type="standard",
jurisdiction="US",
published_date=pub_dt.date() if pub_dt else None,
extract_meta={
"source_id": source.id,
"source_name": _SOURCE_NAME,
"published_at": pub_dt.isoformat() if pub_dt else None,
"license": {"scheme": "proprietary", "redistribute": False,
"attribution": "American Petroleum Institute"},
"fulltext": {
"status": "api_announcement",
"engine": engine,
+4
@@ -311,6 +311,10 @@ async def process(document_id: int, session: AsyncSession) -> None:
country, source, src_lang = await _lookup_news_source(session, doc)
if src_lang:
language = src_lang
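# Safety library A-2: for documents the news lookup does not cover (crawl/law/upload), mirror jurisdiction
# into chunk.country (keeps the country consistent across legs; EU/INT also first enter via this path, and String(10) accommodates them).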
if country is None and doc.jurisdiction:
country = doc.jurisdiction
domain_category = "news" if doc.source_channel == "news" else "document"
# Delete existing chunks (reprocessing)
+27
@@ -62,6 +62,15 @@ FACET_DOCTYPES = {"발주서", "세금계산서", "명세표", "도면", "증명
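# Doctypes eligible for automatic library classification suggestions (transaction sub-types)
LIBRARY_SUGGESTION_DOCTYPES = {"발주서", "세금계산서", "명세표"}
# Safety library A-2: deterministic document_type → material_type mapping (suggestion only, no automatic transition).
# Ambiguous doctypes (Reference/Report, etc.) are not mapped: do not force an exhaustive classification (plan 0-1).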
_DOCTYPE_TO_MATERIAL = {
"Law_Document": "law",
"Academic_Paper": "paper",
"Manual": "manual",
"Standard": "standard",
}
# PR-B prompt_version task name
SUMMARY_TRIAGE_TASK = "p3a_short_summary"
@@ -492,6 +501,24 @@ async def process(
if not doc.document_type:
doc.document_type = doc_type if doc_type in DOCUMENT_TYPES else "Note"
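# ─── Safety library A-2: material_type suggestion (upload path; the LLM must not assign it directly) ───
# Only the deterministic document_type → material_type mapping is stored, as a suggestion (zero prompt changes).
# The transition happens only on approval (accept-suggestion). For law the country is a required input
# (prevents contamination by a KR default; the no-automatic-transition principle is the same as for category).
# Collector deterministic paths are already filled in (material_type IS NOT NULL), so this suggestion does not apply to them.
# On conflict with a transaction-document suggestion (ai_suggestion already occupied) the existing suggestion wins
# (in practice no document gets both suggestions: transactions vs safety material).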
_mt_prop = _DOCTYPE_TO_MATERIAL.get(doc.document_type or "")
if _mt_prop and doc.material_type is None and doc.ai_suggestion is None:
doc.ai_suggestion = {
"proposed_material_type": _mt_prop,
"proposed_jurisdiction": None,
"confidence": doc.ai_confidence,
"source_updated_at": (
doc.updated_at.isoformat() if doc.updated_at else None
),
"reason": "document_type→material_type 결정적 매핑",
}
# confidence
confidence = parsed.get("confidence", 0.5)
doc.ai_confidence = max(0.0, min(1.0, float(confidence)))
+12 -1
@@ -202,7 +202,12 @@ async def _ingest_pdf(session, page_slug: str, pdf_url: str) -> bool:
import_source="csb_sitemap",
edit_url=pdf_url,
ai_tags=["Safety/CSB/보고서"],
extract_meta={"csb": {"page_slug": page_slug, "kind": "report_pdf"}},
# Safety library A-2: deterministic at ingest time. CSB is a US federal agency, so its output is public domain.
material_type="incident",
jurisdiction="US",
extract_meta={"csb": {"page_slug": page_slug, "kind": "report_pdf"},
"license": {"scheme": "public_domain", "redistribute": True,
"attribution": "U.S. Chemical Safety Board"}},
)
session.add(doc)
await session.flush()
@@ -290,10 +295,16 @@ async def _ingest_url(session, source: NewsSource, url: str, lastmod: datetime)
ai_domain="Safety",
ai_sub_group=_SOURCE_NAME,
ai_tags=["Safety/CSB"],
# Safety library A-2: deterministic at ingest time (classify-skip path)
material_type="incident",
jurisdiction="US",
published_date=lastmod.date() if lastmod else None,
extract_meta={
"source_id": source.id,
"source_name": _SOURCE_NAME,
"published_at": lastmod.isoformat(),
"license": {"scheme": "public_domain", "redistribute": True,
"attribution": "U.S. Chemical Safety Board"},
"fulltext": {
"status": "csb_sitemap",
"engine": engine,
+38
@@ -58,6 +58,23 @@ SCAN_TARGETS: list[tuple[str, str | None]] = [
("Videos", "video"),
]
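# Safety library A-2/B-4: deterministic (material_type, jurisdiction, license) axes per watch target.
# Key = last component of the target path. license is injected into extract_meta.license as-is (None = not injected).
# restricted=true → retrieval_service._license_sql excludes the document from RAG evidence and digests
# (option (a) U-2①: blocks verbatim spans of purchased material; indexing itself is allowed and personal file access is not blocked).
# User decision (2026-06-13): Books/Papers = proprietary + restricted / Manuals = proprietary, restricted=false
# (used for search and RAG) / KGS = statutorily delegated detailed standards, law/KR, KOGL public, not restricted.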
_TARGET_AXIS: dict[str, tuple[str, str | None, dict | None]] = {
"KGS_Code": ("law", "KR", {"scheme": "kogl", "redistribute": True,
"restricted": False, "attribution": "한국가스안전공사(KGS)"}),
"Books": ("book", None, {"scheme": "proprietary", "redistribute": False,
"restricted": True, "attribution": "구매 도서"}),
"Papers_Purchased": ("paper", None, {"scheme": "proprietary", "redistribute": False,
"restricted": True, "attribution": "구매 논문"}),
"Manuals": ("manual", None, {"scheme": "proprietary", "redistribute": False,
"restricted": False, "attribution": "기술 매뉴얼"}),
}
def should_skip(path: Path) -> bool:
if path.name in SKIP_NAMES or path.name.startswith("._"):
@@ -242,6 +259,11 @@ async def watch_inbox():
if not scan_root.exists():
continue
# Safety library A-2/B-4: (material, jurisdiction, license) derived from the target folder
target_mt, target_jur, target_license = _TARGET_AXIS.get(
Path(sub).name, (None, None, None)
)
for file_path in scan_root.rglob("*"):
if not file_path.is_file() or should_skip(file_path):
continue
@@ -275,7 +297,14 @@ async def watch_inbox():
source_channel="drive_sync",
category=category,
needs_conversion=needs_conversion,
# Safety library A-2/B-4: watch-target mapping (e.g. KGS=law/KR; non-targets = NULL)
material_type=target_mt,
jurisdiction=target_jur,
)
# B-4: inject the target folder's license (including restricted; non-targets get nothing). classify only
# suggests when material_type IS NULL and does not write extract_meta, so the injected value is preserved.
if target_license:
doc.extract_meta = {"license": dict(target_license)}
session.add(doc)
await session.flush()
@@ -291,6 +320,15 @@ async def watch_inbox():
existing.category = category
if needs_conversion and not getattr(existing, "needs_conversion", False):
existing.needs_conversion = True
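# B-4: backfill axes/license (when a document ingested before B-4 changes again): inject material when unset,
# and merge-inject license only when it is absent (avoids clobbering; existing extract_meta keys are preserved).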
if existing.material_type is None and target_mt is not None:
existing.material_type = target_mt
existing.jurisdiction = target_jur
if target_license and not (existing.extract_meta or {}).get("license"):
meta = dict(existing.extract_meta or {})
meta["license"] = dict(target_license)
existing.extract_meta = meta
if next_stage:
await enqueue_stage(session, existing.id, next_stage)
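The non-clobbering license backfill for existing documents can be sketched as a pure function. This is an illustration only: `merge_license` and `BOOKS_LICENSE` are hypothetical names, and the watcher itself performs the same steps inline on `existing.extract_meta`.

```python
from typing import Optional


def merge_license(extract_meta: Optional[dict], target_license: Optional[dict]) -> Optional[dict]:
    """Return extract_meta with the folder license added only when none is present."""
    if not target_license or (extract_meta or {}).get("license"):
        return extract_meta  # nothing to inject, or the existing license wins
    merged = dict(extract_meta or {})          # shallow copy keeps the other keys
    merged["license"] = dict(target_license)   # copy so the shared constant is never mutated
    return merged


# Hypothetical folder license, shaped like the Books entry of _TARGET_AXIS.
BOOKS_LICENSE = {"scheme": "proprietary", "redistribute": False, "restricted": True}

legacy = {"csb": {"kind": "report_pdf"}}       # ingested before B-4, no license key
backfilled = merge_license(legacy, BOOKS_LICENSE)

already = {"license": {"scheme": "kogl"}}      # a license is already recorded
untouched = merge_license(already, BOOKS_LICENSE)

print(backfilled)
print(untouched)
```

Assigning a fresh dict back to the attribute, rather than mutating the stored one in place, also tends to make the change visible to ORM change tracking on plain JSON columns.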
+141 -4
@@ -1,14 +1,17 @@
"""C-2 KOSHA Open API collection worker (plan crawl-24x7-1).
3 APIs (verified live with a real key on 2026-06-10 + fixtures captured in tests/fixtures/kosha_*_response.json):
4 APIs (verified live with a real key on 2026-06-10/06-13 + fixtures captured in tests/fixtures/kosha_*_response.json):
Disaster case board: GET /B552468/disaster_api02/getdisaster_api02 callApiId=1060
Disaster case attachments: GET /B552468/disaster_attach_api02/Disaster_attach_api02 callApiId=1070
KOSHA GUIDE: GET /B552468/koshaguide/getKoshaGuide callApiId=1050
Fatal accident bulletins: GET /B552468/news_api02/getNews_api02 callApiId=1040
Daily schedule, one run (main.py):
Disaster cases = diff of the most recent pages only (boardno dedup) → case body Document (text-native)
+ attached PDF/HWP downloaded and saved under /documents/crawl_raw/kosha/{boardno}/
→ file Document + extract enqueue (reuses the existing kordoc HWP/PDF pipeline).
Fatal accidents = diff of the most recent pages only (arno dedup) → bulletin body Document (HTML → _clean_html).
No attachment API and no business field. Registration date = the 8-digit arno prefix (YYYYMMDD).
GUIDE = metadata diff over the whole registry (1039 entries, 100/page = 11 calls) → new/revised only;
the daily ingest cap (default 25) gives automatic gradual backfill of the backlog (~6) and flattens the load.
Whatever the cap leaves unprocessed is logged on every run (no silent cap).
@@ -23,7 +26,7 @@ import hashlib
import os
import random
import re
from datetime import datetime, timezone
from datetime import date, datetime, timezone
from pathlib import Path
import httpx
@@ -38,6 +41,7 @@ from models.news_source import NewsSource
from models.queue import enqueue_stage
from workers.news_collector import (
FeedError,
_clean_html,
_get_or_create_health,
_record_failure,
_record_success,
@@ -49,17 +53,36 @@ _BASE = "https://apis.data.go.kr/B552468"
_BOARD_EP = f"{_BASE}/disaster_api02/getdisaster_api02"
_ATTACH_EP = f"{_BASE}/disaster_attach_api02/Disaster_attach_api02"
_GUIDE_EP = f"{_BASE}/koshaguide/getKoshaGuide"
_FATAL_EP = f"{_BASE}/news_api02/getNews_api02"
_CASE_SOURCE = "KOSHA 재해사례"
_GUIDE_SOURCE = "KOSHA GUIDE"
_FATAL_SOURCE = "KOSHA 사망사고"
_CASE_PAGES = 2  # daily diff range (30×2 = latest 60 items; the API returns newest registration first)
_CASE_ROWS = 30
_FATAL_PAGES = 2  # daily diff of fatal-accident bulletins (30×2 = latest 60 items; newest registration first)
_FATAL_ROWS = 30
_GUIDE_ROWS = 100
_GUIDE_DAILY_CAP = int(os.getenv("KOSHA_GUIDE_DAILY_CAP", "25"))
_MAX_FILE_BYTES = 50 * 1024 * 1024
_DOWNLOAD_DELAY = (2.0, 5.0)  # portal.kosha.or.kr file server: delay between consecutive downloads
# Safety library A-2: license for KOSHA output (KOGL type not yet confirmed → conservative redistribute=False,
# to be relaxed once evidence is obtained; deterministic injection of the 0-3 license metadata).
_KOSHA_LICENSE = {"scheme": "kogl", "redistribute": False, "attribution": "한국산업안전보건공단(KOSHA)"}


def _ymd_to_date(ymd: str | None) -> date | None:
    """'YYYYMMDD'/'YYYY-MM-DD' → date. A format mismatch yields None (fail-quiet; the date is an auxiliary axis)."""
    digits = re.sub(r"\D", "", ymd or "")
    if len(digits) != 8:
        return None
    try:
        return date(int(digits[:4]), int(digits[4:6]), int(digits[6:8]))
    except ValueError:
        return None
def _api_key() -> str:
key = os.getenv("KOSHA_API_KEY", "")
@@ -93,6 +116,29 @@ def _items(payload: dict) -> list[dict]:
return [item] if isinstance(item, dict) else list(item)
def _fatal_fields(item: dict) -> dict | None:
    """Map a fatal-accident item (fixed 3 fields: arno/keyword/contents) → Document fields.

    Pure function (no httpx/DB needed; target of fixture unit tests). Required = arno + keyword;
    if either is missing, returns None (skip). There is no dedicated date field, so the date is
    derived from the prefix of the registration identifier arno:
    arno = 'YYYYMMDDHHMMSS' + a 6-character arbitrary suffix (same shape verified across all live
    records from 2019 on). The 8-digit prefix = KST registration date → published_date; the 14-digit
    prefix = registration time → reg_dt (kept verbatim, no timezone interpretation claimed).
    """
    arno = str(item.get("arno") or "").strip()
    title = (item.get("keyword") or "").strip()
    if not arno or not title:
        return None
    text = _clean_html(item.get("contents") or "", max_len=None)
    reg_dt = arno[:14] if re.fullmatch(r"\d{14}", arno[:14]) else None
    return {
        "arno": arno,
        "title": title,
        "text": text,
        "published_date": _ymd_to_date(arno[:8]),
        "reg_dt": reg_dt,
    }
def _safe_filename(name: str) -> str:
"""Sanitize a NAS file name: strip path separators, control characters, and runs of whitespace (avoids shell pitfalls)."""
name = re.sub(r"[/\\\x00-\x1f]", "_", name).strip()
@@ -155,7 +201,11 @@ async def _ingest_attachment(session, boardno: str, filenm: str, filepath: str)
import_source="kosha_api",
edit_url=filepath,
ai_tags=["Safety/KOSHA재해사례/첨부"],
extract_meta={"kosha": {"boardno": boardno, "kind": "case_attachment"}},
# Safety library A-2: deterministic at ingest time (independent of the LLM even when it goes through classify)
material_type="incident",
jurisdiction="KR",
extract_meta={"kosha": {"boardno": boardno, "kind": "case_attachment"},
"license": dict(_KOSHA_LICENSE)},
)
session.add(doc)
await session.flush()
@@ -213,12 +263,16 @@ async def collect_disaster_cases(session) -> int:
ai_domain="Safety",
ai_sub_group=_CASE_SOURCE,
ai_tags=[f"Safety/KOSHA재해사례/{business or '기타'}"],
# Safety library A-2: deterministic at ingest time (classify-skip path)
material_type="incident",
jurisdiction="KR",
extract_meta={
"source_id": source.id,
"source_name": _CASE_SOURCE,
"published_at": None,
"kosha": {"boardno": boardno, "business": business,
"atcflcnt": item.get("atcflcnt")},
"license": dict(_KOSHA_LICENSE),
},
)
session.add(doc)
@@ -250,6 +304,83 @@ async def collect_disaster_cases(session) -> int:
return new_count
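async def collect_fatal_accidents(session) -> int:
"""Daily diff of fatal-accident bulletins: the most recent _FATAL_PAGES pages, deduplicated by arno.
A channel (1040) separate from disaster cases (1060): no business field or attachment API, and contents is HTML.
The body is text-native (_clean_html) → not subject to markdown conversion; only summarize/embed/chunk are enqueued.
"""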
key = _api_key()
source = await _get_or_create_source(session, _FATAL_SOURCE, _FATAL_EP)
new_count = 0
for page in range(1, _FATAL_PAGES + 1):
payload = await _api_get(
f"{_FATAL_EP}?serviceKey={key}&callApiId=1040&pageNo={page}&numOfRows={_FATAL_ROWS}"
)
items = _items(payload)
if not items:
break
page_all_dup = True
for item in items:
fields = _fatal_fields(item)
if fields is None:
continue
arno = fields["arno"]
fhash = hashlib.sha256(f"kosha-fatal|{arno}".encode()).hexdigest()[:32]
existing = await session.execute(
select(Document).where(Document.file_hash == fhash).limit(1)
)
if existing.scalars().first():
continue
page_all_dup = False
text = fields["text"]
now = datetime.now(timezone.utc)
doc = Document(
file_path=f"crawl/{_FATAL_SOURCE}/{arno}",
file_hash=fhash,
file_format="article",
file_size=len(text.encode()),
file_type="note",
title=fields["title"],
extracted_text=f"{fields['title']}\n\n{text}",
extracted_at=now,
extractor_version="kosha_api",
md_status="skipped",
md_extraction_error="kosha fatal: 텍스트 네이티브, markdown 변환 비대상",
source_channel="crawl",
data_origin="external",
review_status="approved",
ai_domain="Safety",
ai_sub_group=_FATAL_SOURCE,
ai_tags=["Safety/KOSHA사망사고"],
# Safety library A-2: deterministic at ingest time (classify-skip path)
material_type="incident",
jurisdiction="KR",
published_date=fields["published_date"],
extract_meta={
"source_id": source.id,
"source_name": _FATAL_SOURCE,
"published_at": None,
"kosha": {"arno": arno, "kind": "fatal_accident",
"reg_dt": fields["reg_dt"]},
"license": dict(_KOSHA_LICENSE),
},
)
session.add(doc)
await session.flush()
await enqueue_stage(session, doc.id, "summarize")
await enqueue_stage(session, doc.id, "embed")
await enqueue_stage(session, doc.id, "chunk")
new_count += 1
if page_all_dup:
break # newest registration first: if a whole page already exists, later pages do too
logger.info(f"[kosha] 사망사고 신규 {new_count}")
return new_count
async def collect_kosha_guide(session, cap: int = _GUIDE_DAILY_CAP) -> int:
"""Metadata diff over the whole GUIDE registry → download only new/revised entries (gradual backfill under the daily cap)."""
key = _api_key()
@@ -307,8 +438,13 @@ async def collect_kosha_guide(session, cap: int = _GUIDE_DAILY_CAP) -> int:
import_source="kosha_api",
edit_url=spec["url"],
ai_tags=["Safety/KOSHA GUIDE"],
# Safety library A-2: GUIDE = non-binding advisory technical guidance (not law, plan 0-1)
material_type="guide",
jurisdiction="KR",
published_date=_ymd_to_date(spec["ymd"]),
extract_meta={"kosha": {"kind": "guide", "techGdlnNo": spec["no"],
"ofancYmd": spec["ymd"]}},
"ofancYmd": spec["ymd"]},
"license": dict(_KOSHA_LICENSE)},
)
session.add(doc)
await session.flush()
@@ -325,6 +461,7 @@ async def run() -> None:
"""Runs once daily; failures are isolated per source (so a disaster-case failure does not block GUIDE)."""
now = datetime.now(timezone.utc)
for name, collector in ((_CASE_SOURCE, collect_disaster_cases),
(_FATAL_SOURCE, collect_fatal_accidents),
(_GUIDE_SOURCE, collect_kosha_guide)):
async with async_session() as session:
result = await session.execute(select(NewsSource).where(NewsSource.name == name))
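How a registration date is derived from an `arno` identifier can be shown without the API. A minimal sketch, assuming the `'YYYYMMDDHHMMSS' + 6-character suffix` shape described in `_fatal_fields`; `split_arno` is a hypothetical helper and the sample identifier is made up.

```python
import re
from datetime import date
from typing import Optional


def ymd_to_date(ymd: Optional[str]) -> Optional[date]:
    """'YYYYMMDD' or 'YYYY-MM-DD' -> date; anything else -> None (fail-quiet)."""
    digits = re.sub(r"\D", "", ymd or "")
    if len(digits) != 8:
        return None
    try:
        return date(int(digits[:4]), int(digits[4:6]), int(digits[6:8]))
    except ValueError:  # e.g. month 13 or day 30 in February
        return None


def split_arno(arno: str) -> tuple[Optional[date], Optional[str]]:
    """Return (published_date, reg_dt) taken from the 8- and 14-digit prefixes of arno."""
    reg_dt = arno[:14] if re.fullmatch(r"\d{14}", arno[:14]) else None
    return ymd_to_date(arno[:8]), reg_dt


sample_arno = "20260613151815" + "000123"  # made-up identifier for illustration
published, reg_dt = split_arno(sample_arno)
print(published, reg_dt)

# Malformed identifiers degrade to None instead of raising.
print(split_arno("2026"))
print(ymd_to_date("2026-02-30"))
```

Returning `None` on malformed input fits the role of the date as an auxiliary axis: a bad identifier costs the document its `published_date` but does not stop ingestion.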
+18 -1
@@ -6,7 +6,7 @@
import os
import re
from datetime import datetime, timezone
from datetime import date, datetime, timezone
from pathlib import Path
from xml.etree import ElementTree as ET
@@ -262,6 +262,16 @@ async def _save_law_split(
f"개정구분: {revision_type}"
)
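# Safety library A-2: parse the promulgation date (contract: law published_date = COALESCE(effective date, promulgation date);
# this legacy worker only has the promulgation date, and the effective-date-based version chain belongs to the B-1 statute_collector)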
_digits = re.sub(r"\D", "", str(proclamation_date or ""))
pub_date = None
if len(_digits) == 8:
try:
pub_date = date(int(_digits[:4]), int(_digits[4:6]), int(_digits[6:8]))
except ValueError:
pub_date = None
doc = Document(
file_path=rel_path,
file_hash=file_hash(file_path),
@@ -272,6 +282,13 @@ async def _save_law_split(
source_channel="law_monitor",
data_origin="work",
category="law",
# Safety library A-2: deterministic at ingest time. Statute text is an unprotected work under Article 7 of the
# Copyright Act (public domain). This worker is dormant (LAW_OC unset), but the code path is kept.
material_type="law",
jurisdiction="KR",
published_date=pub_date,
extract_meta={"license": {"scheme": "public_domain", "redistribute": True,
"attribution": "국가법령정보센터"}},
user_note=note or None,
)
session.add(doc)
+42 -1
@@ -341,11 +341,35 @@ def _entry_body(source: NewsSource, entry, summary: str) -> tuple[str, str]:
def _build_extract_meta(source: NewsSource, pub_dt: datetime) -> dict:
"""Source metadata used by fulltext_worker / the panel (recorded here because documents has no source FK)."""
return {
meta = {
"source_id": source.id,
"source_name": source.name,
"published_at": pub_dt.isoformat(),
}
# Safety library A-2: deterministically inject the license from the source registry (0-3 license metadata).
# This is the gate input that lets P3 digests/publications structurally exclude redistribute=false sources.
if source.license_scheme:
meta["license"] = {
"scheme": source.license_scheme,
"redistribute": bool(source.license_redistribute),
"attribution": source.name,
}
return meta
def _material_axis(source: NewsSource) -> tuple[str | None, str | None]:
    """Safety library classification axes (material_type, jurisdiction), deterministic from the registry.

    - material_type = news_sources.material_type (NULL = not applicable, e.g. news/philosophy)
    - jurisdiction = propagated from source.country, except that paper forces NULL
      (the notion of jurisdiction does not fit international journals; plan 0-1 contract.
      Not propagated even if the registry has country=US).
    """
    mt = source.material_type
    if not mt:
        return None, None
    if mt == "paper":
        return mt, None
    return mt, source.country
def _doc_identity(source: NewsSource, source_short: str, category: str) -> dict:
@@ -354,17 +378,22 @@ def _doc_identity(source: NewsSource, source_short: str, category: str) -> dict:
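The file_path prefix is the channel directory. ai_domain is the branching axis for digest/search filters, so it is
kept separate to stop the crawl channel from polluting 'News' (0-5 channel-level separation principle).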
"""
material_type, jurisdiction = _material_axis(source)
if source.source_channel == "crawl":
domain = category if category and category != "Other" else "Domain"
return {
"path_prefix": "crawl",
"ai_domain": domain,
"ai_tags": [f"{domain}/{source_short}"],
"material_type": material_type,
"jurisdiction": jurisdiction,
}
return {
"path_prefix": "news",
"ai_domain": "News",
"ai_tags": [f"News/{source_short}/{category}"],
"material_type": material_type,
"jurisdiction": jurisdiction,
}
@@ -528,6 +557,10 @@ async def _fetch_rss(session, source: NewsSource) -> tuple[int, str]:
ai_domain=ident["ai_domain"],
ai_sub_group=source_short,
ai_tags=ident["ai_tags"],
# Safety library A-2: deterministic from the registry (required at ingest time because this is a classify-skip path)
material_type=ident["material_type"],
jurisdiction=ident["jurisdiction"],
published_date=pub_dt.date() if pub_dt else None,
extract_meta=_build_extract_meta(source, pub_dt),
)
session.add(doc)
@@ -661,6 +694,10 @@ async def _fetch_api_guardian(session, source: NewsSource) -> tuple[int, str]:
ai_domain=ident["ai_domain"],
ai_sub_group=source_short,
ai_tags=ident["ai_tags"],
# Safety library A-2: deterministic from the registry (required at ingest time because this is a classify-skip path)
material_type=ident["material_type"],
jurisdiction=ident["jurisdiction"],
published_date=pub_dt.date() if pub_dt else None,
extract_meta=_build_extract_meta(source, pub_dt),
)
session.add(doc)
@@ -757,6 +794,10 @@ async def _fetch_api_nyt(session, source: NewsSource) -> tuple[int, str]:
ai_domain=ident["ai_domain"],
ai_sub_group=source_short,
ai_tags=ident["ai_tags"],
# Safety library A-2: deterministic from the registry (required at ingest time because this is a classify-skip path)
material_type=ident["material_type"],
jurisdiction=ident["jurisdiction"],
published_date=pub_dt.date() if pub_dt else None,
extract_meta=_build_extract_meta(source, pub_dt),
)
session.add(doc)
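The axis rule applied by the three fetchers (registry-driven `material_type`, jurisdiction copied from the source country except for papers) can be exercised on its own. A sketch under stated assumptions: `Source` is a hypothetical stand-in for `NewsSource` with only the two relevant columns, and the sample values are made up.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Source:
    """Hypothetical stand-in for NewsSource; only the two registry columns are modeled."""
    material_type: Optional[str] = None
    country: Optional[str] = None


def material_axis(source: Source) -> tuple[Optional[str], Optional[str]]:
    """Return (material_type, jurisdiction) for documents ingested from this source."""
    mt = source.material_type
    if not mt:
        return None, None        # ordinary news source: no safety-library axes
    if mt == "paper":
        return mt, None          # journals carry no jurisdiction, whatever the country says
    return mt, source.country


print(material_axis(Source()))                                   # plain news feed
print(material_axis(Source(material_type="incident", country="US")))
print(material_axis(Source(material_type="paper", country="US")))
```

Keeping the rule in one function means the RSS, Guardian, and NYT paths cannot drift apart on how jurisdiction is assigned.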
+43
@@ -0,0 +1,43 @@
"""Per-country adapter package for statute_collector (plan safety-library-1 B-1).

Adapter contract (2 functions + a constant):
    JURISDICTION: str   fixed adapter constant. The core asserts it right before loading (no inference from parsed results).
    poll_changes(client, watch_rows) -> list[ChangeEvent]   amendment detection only (lightweight calls).
    fetch_version(client, act, change) -> list[VersionPayload]   PR②.
        payload list: primary + annex, each with its own version_key (R4-M4).

ChangeEvent.kind: amend / repeal / bootstrap (synthetic: the PR② bootstrap reuses the same
ingest path as amend, R6-m2).
"""
from dataclasses import dataclass, field


@dataclass
class ChangeEvent:
    """Amendment-detection event: the output of poll_changes."""
    family_id: str
    kind: str  # amend / repeal / bootstrap
    new_version_key: str  # KR = MST (법령일련번호, statute serial number)
    title: str
    promulgation_date: str | None = None  # YYYYMMDD
    effective_date: str | None = None  # YYYYMMDD (effective date from the list; articles may take effect on different dates)
    revision_type: str | None = None  # 제개정구분명 (enactment/amendment type name)


@dataclass
class VersionPayload:
    """One fetch_version output item: a primary or an annex, each with its own version_key (R4-M4).

    Snapshot semantics of one full-text call (R7-M3 fixture verdict): the primary and all annexes are
    produced from that single response.
    annex version_key = 'MST|{별표번호}-{별표가지번호}' (the zero-padded structured fields as-is;
    field-based rather than parsed from a suffix string, R7-B1 (a) upgrade).
    """
    law_doc_kind: str  # primary / annex
    version_key: str
    title: str
    content: str  # markdown text of the articles/annex
    promulgation_date: str | None = None  # YYYYMMDD (basic info in the body)
    effective_date: str | None = None  # YYYYMMDD (basic info in the body; may differ from the list value)
    annex_label: str | None = None  # '별표1' / '별표5의2' (for display)
    meta: dict = field(default_factory=dict)
+213
@@ -0,0 +1,213 @@
"""KR 법령 어댑터 — 국가법령정보센터 (law.go.kr DRF) (plan safety-library-1 B-1 PR①).
poll_changes = lawSearch 목록 diff: 워치리스트 행별 정식 법령명 exact 조회
MST(법령일련번호) != watermark 이면 ChangeEvent. law_monitor 검증된 호출 형태 재사용.
fixture (2026-06-13 라이브 박제, tests/fixtures/statute_kr/):
- lawsearch_*.xml 목록 필드: 법령ID(불변)·법령일련번호(MST)·공포일자·시행일자·제개정구분명
- lawservice_*.xml.gz 전문 1 XML: 조문단위 853(산안기준규칙) + 별표단위 23 전부 포함
= 스냅샷 의미론 확정(R7-M3 : annex 부분 fetch 실패 개념 없음 같은 응답에 없는
별표 = 삭제 간주 가능). 별표번호+별표가지번호 = 구조화 필드(R7-M3 suffix 문자열
파싱 불요, version_key 합성은 필드 기반. PR② fetch_version 소관).
- 조문 취득 방식 판정(R2-m1): 전문 1 + 로컬 파싱 확정 lawjosub 단위 호출이면
산안기준규칙(853) 개정당 호출 폭증. lawjosub fixture 보조 박제.
주의: 응답의 '법령상세링크' 필드에 OC 키가 포함됨 fixture/로그에 raw 응답을 남길
새니타이즈 필수 (repo fixture __OC_REDACTED__ 처리됨).
"""
import asyncio
import os
import xml.etree.ElementTree as ET

import httpx

from core.crawl_politeness import CRAWL_UA
from core.utils import setup_logger
from workers.statute_adapters import ChangeEvent, VersionPayload

logger = setup_logger("statute_kr")

JURISDICTION = "KR"
SOURCE_API = "law.go.kr"
LAW_SEARCH_URL = "https://www.law.go.kr/DRF/lawSearch.do"
LAW_SERVICE_URL = "https://www.law.go.kr/DRF/lawService.do"

# Delay between consecutive calls to the same domain (once a day x 26 calls; conservative)
_POLL_DELAY_S = 1.5


def _oc() -> str:
    oc = os.getenv("LAW_OC", "")
    if not oc:
        raise RuntimeError("LAW_OC 미설정 — statute KR 어댑터 사용 불가")
    return oc
def parse_search_hit(xml_text: str, official_title: str) -> dict | None:
    """Extract the one exact match on the official law name from lawSearch XML (pure function; fixture-tested).

    Matching is exact against the official name. The watchlist title is the official
    name (middle dot included), so this is safe.
    (Lesson from the law_monitor trap, where the hard-coded '유해위험작업...' (no middle dot)
    never matched: the lookup key must always be the official name from the registry.)
    """
    root = ET.fromstring(xml_text)
    for law in root.findall(".//law"):
        if (law.findtext("법령명한글") or "").strip() != official_title:
            continue
        mst = (law.findtext("법령일련번호") or "").strip()
        if not mst:
            continue
        return {
            "mst": mst,
            "law_id": (law.findtext("법령ID") or "").strip(),
            "promulgation_date": (law.findtext("공포일자") or "").strip() or None,
            "effective_date": (law.findtext("시행일자") or "").strip() or None,
            "revision_type": (law.findtext("제개정구분명") or "").strip() or None,
            "status_code": (law.findtext("현행연혁코드") or "").strip() or None,
        }
    return None
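
The exact-match lookup above can be sketched in isolation as follows. This is a minimal illustration, not the adapter itself: `find_exact` and `SAMPLE_XML` are hypothetical names, the element names follow the list fields named in the module docstring, and the MST values are made up.

```python
import xml.etree.ElementTree as ET
from typing import Optional

# Hypothetical lawSearch response. Element names follow the list fields from the
# adapter docstring; the serial numbers are invented for illustration.
SAMPLE_XML = """<LawSearch>
  <law><법령명한글>산업안전보건법 시행령</법령명한글><법령일련번호>200002</법령일련번호></law>
  <law><법령명한글>산업안전보건법</법령명한글><법령일련번호>200001</법령일련번호></law>
</LawSearch>"""


def find_exact(xml_text: str, official_title: str) -> Optional[str]:
    """Return the MST of the entry whose name equals official_title exactly."""
    root = ET.fromstring(xml_text)
    for law in root.findall(".//law"):
        name = (law.findtext("법령명한글") or "").strip()
        if name == official_title:
            return (law.findtext("법령일련번호") or "").strip() or None
    return None


# A search for the act also returns its enforcement decree; only equality picks the act.
mst_act = find_exact(SAMPLE_XML, "산업안전보건법")
mst_missing = find_exact(SAMPLE_XML, "산업안전보건법 시행규칙")
```

A miss returns `None` rather than the closest name, which is what lets the caller log it loudly instead of silently tracking the wrong law.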
def detect_change(hit: dict | None, act_family_id: str, act_title: str,
                  watermark: str | None) -> ChangeEvent | None:
    """List hit + watermark → ChangeEvent (pure function; fixture-tested).

    - No hit = cannot detect (returns None; the caller logs fail-loud. Never conclude
      a repeal here: a search miss or a renamed law cannot be told apart, so repeal is
      decided only from 제개정구분명)
    - MST == watermark = no change
    - '폐지' in 제개정구분명 = repeal, otherwise = amend
    """
    if hit is None:
        return None
    if watermark and hit["mst"] == watermark:
        return None
    kind = "repeal" if "폐지" in (hit.get("revision_type") or "") else "amend"
    return ChangeEvent(
        family_id=act_family_id,
        kind=kind,
        new_version_key=hit["mst"],
        title=act_title,
        promulgation_date=hit.get("promulgation_date"),
        effective_date=hit.get("effective_date"),
        revision_type=hit.get("revision_type"),
    )
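
The three outcomes of the watermark comparison can be exercised with a small stand-alone sketch. `Change` and `classify` are hypothetical stand-ins for `ChangeEvent` and `detect_change`, reduced to the fields the decision depends on; the revision type strings are example values.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Change:
    """Hypothetical, reduced stand-in for ChangeEvent."""
    kind: str
    new_version_key: str


def classify(hit: Optional[dict], watermark: Optional[str]) -> Optional[Change]:
    """Mirror the rules above: no hit or unchanged MST gives None, else amend/repeal."""
    if hit is None:
        return None  # undetectable; the caller is expected to log this loudly
    if watermark and hit["mst"] == watermark:
        return None  # same serial number as last time: nothing changed
    kind = "repeal" if "폐지" in (hit.get("revision_type") or "") else "amend"
    return Change(kind, hit["mst"])


unchanged = classify({"mst": "100", "revision_type": "일부개정"}, "100")
amended = classify({"mst": "101", "revision_type": "일부개정"}, "100")
repealed = classify({"mst": "102", "revision_type": "폐지"}, "101")
first_seen = classify({"mst": "100", "revision_type": None}, None)
```

An empty watermark always produces an event, which is how a law that has never been ingested gets picked up on its first poll.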
def _article_markdown(art: ET.Element) -> str:
    """One 조문단위 (article unit) → text: 조문내용 (already in '제N조(제목) ...' form) plus every 항/호/목 beneath it.

    Metadata fields (조문번호/조문여부/조문시행일자, etc.) are excluded; only 조문내용 and the subtree are kept.
    """
    parts = []
    body = (art.findtext("조문내용") or "").strip()
    if body:
        parts.append(body)
    # Each 항 (paragraph) subtree carries its nested 호/목 text, collected via itertext().
    for hang in art.findall("항"):
        text = "\n".join(t.strip() for t in hang.itertext() if t.strip())
        if text:
            parts.append(text)
    return "\n".join(parts)
def parse_service_payloads(xml_text: str, official_title: str, mst: str) -> list[VersionPayload]:
    """lawService full-text XML → list of VersionPayload (pure function; fixture-tested).

    Snapshot semantics: the annexes present in the response are the complete set of
    annexes for that version (R7-M3 fixture finding).
    - primary x1: article markdown (units with 조문여부 != '조문' are headings, rendered with '## ')
    - annex xN: one per 별표단위, version_key = 'MST|{별표구분}{별표번호}-{가지번호}' (zero-padded, as-is)
    """
    root = ET.fromstring(xml_text)
    base = root.find(".//기본정보")
    prom = ((base.findtext("공포일자") or "").strip() or None) if base is not None else None
    eff = ((base.findtext("시행일자") or "").strip() or None) if base is not None else None

    lines: list[str] = [f"# {official_title}", ""]
    for art in root.findall(".//조문단위"):
        is_article = (art.findtext("조문여부") or "").strip() == "조문"
        text = _article_markdown(art)
        if not text:
            continue
        if is_article:
            # 조문내용 normally already starts with '제N조'; add a heading marker only otherwise.
            lines.append(f"### {text}" if not text.startswith("제") else text)
        else:
            lines.append(f"## {text}")
        lines.append("")
    primary_content = "\n".join(lines).strip()

    payloads = [VersionPayload(
        law_doc_kind="primary",
        version_key=mst,
        title=official_title,
        content=primary_content,
        promulgation_date=prom,
        effective_date=eff,
    )]

    for annex in root.findall(".//별표단위"):
        no = (annex.findtext("별표번호") or "").strip()
        sub = (annex.findtext("별표가지번호") or "").strip() or "00"
        kind = (annex.findtext("별표구분") or "별표").strip()  # 별표 / 서식: a separate dimension!
        a_title = (annex.findtext("별표제목") or "").strip()
        a_body = (annex.findtext("별표내용") or "").strip()
        if not no:
            continue
        # Deletion tombstone: in KR a deleted 별표/서식 is not absent but an explicit
        # '삭제 <date>' row (observed in fixture: 산안기준규칙 서식1 and 서식2).
        # Tombstones without content are skipped at load time.
        # What to do with the series' older version that stays current is a candidate
        # for observation in PR③ (absence inference is confirmed unnecessary).
        if a_title.startswith("삭제") and len(a_body) < 50:
            continue
        label = f"{kind}{int(no)}" + (f"의{int(sub)}" if sub not in ("", "0", "00") else "")
        payloads.append(VersionPayload(
            law_doc_kind="annex",
            # Includes the 별표구분 dimension: (번호, 가지번호) alone makes 별표1 and 서식1 collide (observed in fixture)
            version_key=f"{mst}|{kind}{no}-{sub}",
            title=f"{official_title} {label} {a_title}".strip(),
            content=f"# {official_title} {label}\n## {a_title}\n\n{a_body}".strip(),
            promulgation_date=prom,
            effective_date=eff,
            annex_label=label,
        ))
    return payloads
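
To make the annex key scheme concrete, here is a small sketch of how the structured fields combine into a version key and a display label. `annex_key` and `annex_label` are hypothetical helper names; the '의' join in the label follows the '별표5의2' format given for `annex_label`, and the MST value is invented.

```python
def annex_key(mst: str, kind: str, no: str, sub: str) -> str:
    """Compose the version key from the zero-padded fields without reparsing them."""
    return f"{mst}|{kind}{no}-{sub}"


def annex_label(kind: str, no: str, sub: str) -> str:
    """Human-readable label: padding dropped, branch number shown only when present."""
    base = f"{kind}{int(no)}"
    return base + (f"의{int(sub)}" if sub not in ("", "0", "00") else "")


key_table = annex_key("123456", "별표", "0001", "00")
key_form = annex_key("123456", "서식", "0001", "00")
label_branch = annex_label("별표", "0005", "02")
label_plain = annex_label("서식", "0001", "00")
```

Because the kind is part of the key, table 1 and form 1 of the same law version stay distinct even though their number and branch number are identical.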
async def fetch_version(client: httpx.AsyncClient, act, change: ChangeEvent) -> list[VersionPayload]:
    """One full-text call → payload list (R2-m1 decision: no per-article lawjosub calls, avoiding an 853-article call explosion)."""
    resp = await client.get(
        LAW_SERVICE_URL,
        params={"OC": _oc(), "target": "law", "MST": change.new_version_key, "type": "XML"},
        headers={"User-Agent": CRAWL_UA},
    )
    resp.raise_for_status()
    payloads = parse_service_payloads(resp.text, act.title, change.new_version_key)
    if not payloads or len(payloads[0].content) < 200:
        # Parse-validation floor: raising here means the watermark is not persisted (the state stays retryable)
        raise ValueError(f"전문 파싱 결과 빈약 ({act.family_id}): payloads={len(payloads)}")
    return payloads
async def poll_changes(client: httpx.AsyncClient, watch_rows: list) -> list[ChangeEvent]:
    """lawSearch diff per watchlist row. Failures are isolated per row (one failing law does not block the rest)."""
    oc = _oc()
    events: list[ChangeEvent] = []
    for act in watch_rows:
        try:
            resp = await client.get(
                LAW_SEARCH_URL,
                params={"OC": oc, "target": "law", "type": "XML", "query": act.title},
                headers={"User-Agent": CRAWL_UA},
            )
            resp.raise_for_status()
            hit = parse_search_hit(resp.text, act.title)
            if hit is None:
                # fail-loud: no match on the official name suggests a renamed law or a search miss; never skip silently
                logger.warning(f"[statute-kr] 목록 미매칭: {act.family_id} {act.title!r}")
            else:
                ev = detect_change(hit, act.family_id, act.title, act.watermark)
                if ev:
                    events.append(ev)
        except Exception as e:
            logger.error(f"[statute-kr] poll 실패 ({act.family_id}): {type(e).__name__}: {e!r}")
        await asyncio.sleep(_POLL_DELAY_S)
    return events
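
The per-row failure isolation used by the polling loop can be shown without any network access. This is a sketch under stated assumptions: `poll_all` and `fake_fetch` are hypothetical, and the fetch callable stands in for the search request plus change detection.

```python
import asyncio


async def poll_all(rows, fetch, delay_s=0.0):
    """Call fetch(row) for each row; one failing row never aborts the others."""
    events, errors = [], []
    for row in rows:
        try:
            ev = await fetch(row)
            if ev is not None:
                events.append(ev)
        except Exception as e:
            # Record and continue; the failing row is simply retried on the next poll.
            errors.append((row, type(e).__name__))
        await asyncio.sleep(delay_s)  # politeness delay applies after success and failure alike
    return events, errors


async def fake_fetch(row):
    """Hypothetical fetcher: the row named 'bad' simulates a failing request."""
    if row == "bad":
        raise RuntimeError("boom")
    return f"changed:{row}"


events, errors = asyncio.run(poll_all(["a", "bad", "b"], fake_fetch))
```

Keeping the sleep outside the `try` block means a burst of failures cannot turn into a burst of back-to-back requests against the same host.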
+381
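@@ -0,0 +1,381 @@
"""statute_collector: statute collection core (plan safety-library-1 B-1, PR②).

Composition (shipped as one unit, R8-B1: splitting promotion and sweep across PRs would
open a double-exposure window between deploys):
  poll_changes (adapter) → fetch_version (one full-text call, payload list) → ingest (load
  every version as pending + inject the 4 axes) → lifecycle job (promotion/supersede per
  version series + state-based legacy sweep + repeal, in a single transaction, KST-based).

Core contracts (card = spec):
- Watermark is persisted only after ingest passes parse validation (on failure the next poll re-detects)
- Unit of promotion/supersede = version series = (family_id, law_doc_kind, annex identifier)
  R7-B1: family-level implementation is forbidden (an annex promotion would wipe out the
  primary, a body-loss path)
- Legacy sweep = state-based: on each run, for every family that holds a current primary
  series and has no detected repeal, set in_corpus=false on the chunks of the legacy
  (law_monitor snapshot) documents mapped by law name (idempotent)
- Mapping = never assume an exact match: extract the law name from the title pattern
  '법령명 (YYYYMMDD)', normalize it (absorbing whitespace and middle-dot variants), then
  compare for **equality**. Prefix comparison is forbidden (it blocks the path where
  '산업안전보건법' wrongly hits the legacy documents of '산업안전보건법 시행령')
- ingest 4 axes (R8-M1): material_type='law' / jurisdiction=adapter constant /
  published_date=COALESCE(effective date, promulgation date) / license=public_domain (Copyright Act, Article 7)
- Bootstrap (--bootstrap) = synthetic event with kind='bootstrap', same path as amend +
  extract_meta.backfill=true (marker that excludes it from the E-1 gate aggregation)
- Visibility: success/failure is recorded in source_health (HC.io is absent since the alert
  layer was retired on 2026-05-30; the intent of the silent-skip guard is carried by the
  crawl-health board + health rows)

Run:
  schedule = daily 07:00 KST (inherits the law_monitor slot in main.py)
  manual   = docker compose exec -T fastapi python -m workers.statute_collector [--bootstrap]
"""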
import argparse
import asyncio
import hashlib
import re
import unicodedata
from datetime import date, datetime, timezone
from zoneinfo import ZoneInfo

import httpx
from sqlalchemy import select, update

from core.database import async_session
from core.utils import setup_logger
from models.chunk import DocumentChunk
from models.document import Document
from models.legal_act import LegalAct, LegalMeta
from models.news_source import NewsSource
from models.queue import enqueue_stage
from workers.news_collector import _get_or_create_health, _record_failure, _record_success
from workers.statute_adapters import ChangeEvent, VersionPayload
from workers.statute_adapters import kr

logger = setup_logger("statute_collector")

_KST = ZoneInfo("Asia/Seoul")
_SOURCE_NAME = "KR 법령 (law.go.kr)"
_LICENSE = {"scheme": "public_domain", "redistribute": True, "attribution": "국가법령정보센터"}
_FETCH_DELAY_S = 2.5  # delay between consecutive lawService full-text calls (up to ~1.3MB each)

# jurisdiction → adapter module (Phase 1 = KR only; overseas sources come after the B-5 gate)
_ADAPTERS = {"KR": kr}
# ─── Law-name mapping (R8-m1: never assume an exact match; normalize away variants, then compare for equality) ───
_LEGACY_TITLE_RE = re.compile(r"^(.*?)\s*\((\d{8})\)")


def normalize_law_name(name: str) -> str:
    """Absorb whitespace and middle-dot variants: NFC-normalize, then strip whitespace and ㆍ·・."""
    s = unicodedata.normalize("NFC", name or "")
    return re.sub(r"[\sㆍ·・]", "", s)


def legacy_law_name(title: str) -> str | None:
    """Extract the law name from a legacy law_monitor title ('법령명 (YYYYMMDD) 섹션')."""
    m = _LEGACY_TITLE_RE.match(title or "")
    return m.group(1).strip() if m else None


def series_suffix(version_key: str) -> str | None:
    """Annex identifier of a version series: the part after '|' in version_key 'MST|NNNN-SS' (None for primary)."""
    return version_key.split("|", 1)[1] if "|" in version_key else None


def _to_date(ymd: str | None) -> date | None:
    digits = re.sub(r"\D", "", ymd or "")
    if len(digits) != 8:
        return None
    try:
        return date(int(digits[:4]), int(digits[4:6]), int(digits[6:8]))
    except ValueError:
        return None
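
The reason the mapping insists on equality after normalization, and rules out prefix matching, is easiest to see on data. The sketch below copies the two helpers in shortened form so it runs on its own; the legacy titles are made-up examples in the '법령명 (YYYYMMDD) 섹션' shape.

```python
import re
import unicodedata

_TITLE_RE = re.compile(r"^(.*?)\s*\((\d{8})\)")


def normalize(name):
    """NFC-normalize, then drop whitespace and middle-dot variants."""
    s = unicodedata.normalize("NFC", name or "")
    return re.sub(r"[\sㆍ·・]", "", s)


def legacy_name(title):
    """Take the law name that precedes the '(YYYYMMDD)' part of a legacy title."""
    m = _TITLE_RE.match(title or "")
    return m.group(1).strip() if m else None


# Hypothetical legacy titles for the act and for its enforcement decree.
legacy_titles = [
    "산업안전보건법 (20240101) 제1장",
    "산업안전보건법 시행령 (20240101) 제1장",
]
want = normalize("산업안전보건법")

equal_hits = [t for t in legacy_titles if normalize(legacy_name(t) or "") == want]
prefix_hits = [t for t in legacy_titles if normalize(legacy_name(t) or "").startswith(want)]

# Spelling variants of the same name collapse to one normalized form.
same_after_normalize = normalize("유해ㆍ위험 작업") == normalize("유해·위험작업")
```

With prefix matching, sweeping the act would also retire the decree's legacy chunks; equality keeps the two families separate while still tolerating spacing and middle-dot differences.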
# ─── ingest (every version is loaded as pending; R2-B2/R3 contract) ─────────────
async def _ingest_payload(session, act: LegalAct, ev: ChangeEvent,
                          payload: VersionPayload, backfill: bool) -> bool:
    """One payload → Document + legal_meta (pending). Returns whether it is new (dedup keeps this idempotent)."""
    fhash = hashlib.sha256(
        f"statute|{act.jurisdiction}|{act.native_id}|{payload.version_key}".encode()
    ).hexdigest()[:32]
    existing = await session.execute(
        select(Document.id).where(Document.file_hash == fhash).limit(1)
    )
    if existing.scalars().first():
        return False

    prom = _to_date(payload.promulgation_date or ev.promulgation_date)
    eff = _to_date(payload.effective_date or ev.effective_date)
    now = datetime.now(timezone.utc)
    extra = {"backfill": True} if backfill else {}

    doc = Document(
        file_path=f"crawl/statute/{act.family_id}/{payload.version_key.replace('|', '_')}",
        file_hash=fhash,
        file_format="article",
        file_size=len(payload.content.encode()),
        file_type="note",
        title=f"{payload.title} ({payload.promulgation_date or ev.promulgation_date or ''})".strip(),
        extracted_text=payload.content,
        extracted_at=now,
        extractor_version="statute_kr@law.go.kr",
        md_status="skipped",
        md_extraction_error="statute: 텍스트 네이티브, markdown 변환 비대상",
        source_channel="crawl",
        data_origin="external",
        review_status="approved",
        ai_domain="법령",
        ai_sub_group=act.title,
        ai_tags=[f"법령/KR/{act.title}"],
        # Safety library ingest 4 axes (R8-M1; required at ingest time because this path skips classify)
        material_type="law",
        jurisdiction=kr.JURISDICTION,
        published_date=eff or prom,
        extract_meta={
            "statute": {"family_id": act.family_id, "law_id": act.native_id,
                        "kind": payload.law_doc_kind, "version_key": payload.version_key,
                        "annex_label": payload.annex_label,
                        "event_kind": ev.kind, "revision_type": ev.revision_type},
            "license": dict(_LICENSE),
            **extra,
        },
    )
    session.add(doc)
    await session.flush()

    session.add(LegalMeta(
        document_id=doc.id,
        family_id=act.family_id,
        law_doc_kind=payload.law_doc_kind,
        version_key=payload.version_key,
        promulgation_date=prom,
        effective_date=eff,
        version_status="pending",  # every version is loaded as pending; only the lifecycle job promotes
    ))
    # No summarize (the articles themselves are the canonical text, so zero load on the Mac mini); embed + chunk only
    await enqueue_stage(session, doc.id, "embed")
    await enqueue_stage(session, doc.id, "chunk")
    return True
# ─── Lifecycle job (the only code location for transitions, supersede, sweep, and repeal) ───
async def _flip_chunks(session, doc_ids: list[int]) -> int:
    if not doc_ids:
        return 0
    result = await session.execute(
        update(DocumentChunk)
        .where(DocumentChunk.doc_id.in_(doc_ids), DocumentChunk.in_corpus.is_(True))
        .values(in_corpus=False)
    )
    return result.rowcount or 0


async def _legacy_doc_ids(session, act: LegalAct) -> list[int]:
    """Ids of legacy (law_monitor) documents mapped by law name: equality after normalization (no prefix matching)."""
    result = await session.execute(
        select(Document.id, Document.title).where(
            Document.source_channel == "law_monitor",
            Document.deleted_at.is_(None),
        )
    )
    want = normalize_law_name(act.title)
    ids = []
    for doc_id, title in result.all():
        name = legacy_law_name(title or "")
        if name and normalize_law_name(name) == want:
            ids.append(doc_id)
    return ids
async def run_lifecycle(session) -> dict:
    """Daily lifecycle job. The caller commits it as a single transaction. KST-based, idempotent."""
    today = datetime.now(_KST).date()
    stats = {"promoted": 0, "superseded": 0, "repealed": 0,
             "legacy_flipped_docs": 0, "legacy_flipped_chunks": 0}

    acts_result = await session.execute(select(LegalAct).where(LegalAct.watch.is_(True)))
    acts = {a.family_id: a for a in acts_result.scalars().all()}

    lm_result = await session.execute(
        select(LegalMeta).where(LegalMeta.family_id.in_(list(acts.keys())))
    )
    metas = lm_result.scalars().all()

    # 1) repeal: for marked families, set every current + pending row to repealed,
    #    flip their chunks, and flip the legacy chunks (R7-M2)
    repeal_families = {fid for fid, a in acts.items() if a.repeal_detected_at is not None}
    for fid in repeal_families:
        rows = [m for m in metas if m.family_id == fid and m.version_status in ("pending", "current")]
        for m in rows:
            m.version_status = "repealed"
            stats["repealed"] += 1
        await _flip_chunks(session, [m.document_id for m in rows])
        legacy_ids = await _legacy_doc_ids(session, acts[fid])
        stats["legacy_flipped_chunks"] += await _flip_chunks(session, legacy_ids)

    # 2) Promotion + supersede, per version series (R7-B1 a: family-level handling is forbidden)
    series: dict[tuple, list[LegalMeta]] = {}
    for m in metas:
        if m.family_id in repeal_families:
            continue
        series.setdefault(
            (m.family_id, m.law_doc_kind, series_suffix(m.version_key)), []
        ).append(m)

    for key, rows in series.items():
        due = sorted(
            (m for m in rows if m.version_status == "pending"
             and (m.effective_date or m.promulgation_date)
             and (m.effective_date or m.promulgation_date) <= today),
            key=lambda m: (m.effective_date or m.promulgation_date),
        )
        for m in due:
            prev = [c for c in rows if c.version_status == "current" and c is not m]
            for c in prev:
                c.version_status = "superseded"
                stats["superseded"] += 1
            await _flip_chunks(session, [c.document_id for c in prev])
            m.version_status = "current"
            stats["promoted"] += 1

    # 3) Legacy sweep, state-based (R6-B1 a / R7-B1 b: only families whose primary series holds a current version)
    for fid, act in acts.items():
        if fid in repeal_families:
            continue
        has_primary_current = any(
            m.family_id == fid and m.law_doc_kind == "primary" and m.version_status == "current"
            for m in metas
        )
        if not has_primary_current:
            continue  # R3-B1 ② built in: legacy documents are kept for families whose fetch failed
        legacy_ids = await _legacy_doc_ids(session, act)
        flipped = await _flip_chunks(session, legacy_ids)
        if flipped:
            stats["legacy_flipped_docs"] += len(legacy_ids)
            stats["legacy_flipped_chunks"] += flipped

    return stats
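
The series-level promotion rule can be checked with an in-memory sketch that leaves out the database and chunk flips. `Meta`, `suffix`, and `promote` are hypothetical reductions of `LegalMeta`, `series_suffix`, and step 2 of the lifecycle job; the keys and dates are invented.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Meta:
    """Hypothetical, reduced stand-in for a LegalMeta row."""
    family_id: str
    kind: str
    version_key: str
    effective: date
    status: str = "pending"


def suffix(version_key):
    """Annex identifier after '|', or None for a primary document."""
    return version_key.split("|", 1)[1] if "|" in version_key else None


def promote(metas, today):
    """Promote due pending rows and supersede the old current row, per series only."""
    series = {}
    for m in metas:
        series.setdefault((m.family_id, m.kind, suffix(m.version_key)), []).append(m)
    for rows in series.values():
        due = sorted(
            (m for m in rows if m.status == "pending" and m.effective <= today),
            key=lambda m: m.effective,
        )
        for m in due:
            for c in rows:
                if c.status == "current" and c is not m:
                    c.status = "superseded"
            m.status = "current"


metas = [
    Meta("F1", "primary", "100", date(2024, 1, 1), "current"),
    Meta("F1", "annex", "100|별표0001-00", date(2024, 1, 1), "current"),
    Meta("F1", "annex", "101|별표0001-00", date(2025, 1, 1)),
    Meta("F1", "primary", "101", date(2099, 1, 1)),
]
promote(metas, date(2025, 6, 1))
statuses = [m.status for m in metas]
```

Here the newer annex replaces only the older annex of the same series. The current primary is untouched because its successor is not yet in force, which is the body-loss case that family-level grouping would have triggered.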
# ─── Main run ────────────────────────────────────────────────────────────────────
async def run(bootstrap: bool = False) -> None:
    """poll → fetch → ingest (commit per family) → lifecycle job. Failures are isolated per family."""
    async with async_session() as session:
        result = await session.execute(
            select(LegalAct).where(LegalAct.watch.is_(True)).order_by(LegalAct.family_id)
        )
        rows = result.scalars().all()
        if not rows:
            logger.warning("[statute] 워치리스트 비어 있음 — 시드(migration 356) 미적용?")
            return
        source = await _get_source(session)
        await session.commit()
        source_id = source.id

    ingested = 0
    failed = 0
    by_jur: dict[str, list] = {}
    for row in rows:
        by_jur.setdefault(row.jurisdiction, []).append(row)

    async with httpx.AsyncClient(timeout=60) as client:
        for jur, acts in by_jur.items():
            adapter = _ADAPTERS.get(jur)
            if adapter is None:
                logger.warning(f"[statute] 어댑터 없는 jurisdiction skip: {jur}")
                continue
            assert adapter.JURISDICTION == jur, \
                f"어댑터/행 jurisdiction 불일치: {adapter.JURISDICTION} != {jur}"
            events = await adapter.poll_changes(client, acts)
            acts_by_id = {a.family_id: a for a in acts}
            for ev in events:
                if bootstrap:
                    ev.kind = "bootstrap"  # synthetic event; same path as amend (R6-m2)
                act_ref = acts_by_id[ev.family_id]
                try:
                    payloads = await adapter.fetch_version(client, act_ref, ev)
                    async with async_session() as session:
                        act = await session.get(LegalAct, ev.family_id)
                        new_docs = 0
                        for p in payloads:
                            if await _ingest_payload(session, act, ev, p, backfill=bootstrap):
                                new_docs += 1
                        # The watermark is persisted only after parse validation (payload floor) has passed
                        act.watermark = ev.new_version_key
                        if ev.kind == "repeal":
                            act.repeal_detected_at = datetime.now(timezone.utc)
                        await session.commit()
                    ingested += new_docs
                    logger.info(f"[statute] ingest {ev.family_id} ({ev.kind}): "
                                f"payload {len(payloads)}건 중 신규 {new_docs}")
                except Exception as e:
                    failed += 1
                    logger.error(f"[statute] ingest 실패 ({ev.family_id}): "
                                 f"{type(e).__name__}: {e!r} — 워터마크 미영속, 다음 폴링 재감지")
                await asyncio.sleep(_FETCH_DELAY_S)

    # Lifecycle job: right after the collection cycle, in a single transaction (0-2 ②)
    async with async_session() as session:
        stats = await run_lifecycle(session)
        await session.commit()
    logger.info(f"[statute] lifecycle: {stats}")

    # health: fail-loud visibility (with HC.io retired, the board and health rows are the primary observation surface)
    async with async_session() as session:
        h = await _get_or_create_health(session, source_id)
        now = datetime.now(timezone.utc)
        if failed:
            _record_failure(h, f"ingest 실패 {failed}", now)
        else:
            _record_success(h, ingested, False, now)
        await session.commit()

    logger.info(f"[statute] run 완료 — 신규 문서 {ingested}건, 실패 {failed}"
                + (" (bootstrap)" if bootstrap else ""))
async def _get_source(session) -> NewsSource:
    result = await session.execute(select(NewsSource).where(NewsSource.name == _SOURCE_NAME))
    source = result.scalars().first()
    if source is None:
        source = NewsSource(
            name=_SOURCE_NAME, feed_url=kr.LAW_SEARCH_URL, feed_type="rss",
            fetch_method="api", fulltext_policy="none", source_channel="crawl",
            category="Safety", language="ko", country="KR",
            enabled=False,  # not part of the 6h news cycle; this worker polls daily
        )
        session.add(source)
        await session.flush()
    return source
async def poll_once() -> int:
    """Observation-only poll (CLI left over from PR①; changes no state)."""
    async with async_session() as session:
        result = await session.execute(
            select(LegalAct).where(LegalAct.watch.is_(True)).order_by(LegalAct.family_id)
        )
        rows = result.scalars().all()
    total = 0
    async with httpx.AsyncClient(timeout=30) as client:
        events = await kr.poll_changes(client, [r for r in rows if r.jurisdiction == "KR"])
        for ev in events:
            logger.info(f"[statute] 변경 감지 ({ev.kind}): {ev.family_id} {ev.title} "
                        f"MST={ev.new_version_key}")
        total = len(events)
    logger.info(f"[statute] poll 완료 — 변경 {total}건 (관찰 전용)")
    return total
if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument("--bootstrap", action="store_true",
                        help="26 family 현행판 1회 부트스트랩 (backfill 마커, R4-M1)")
    parser.add_argument("--poll-only", action="store_true", help="관찰 전용 폴링")
    args = parser.parse_args()
    if args.poll_only:
        asyncio.run(poll_once())
    else:
        asyncio.run(run(bootstrap=args.bootstrap))
@@ -1,16 +1,18 @@
<script lang="ts">
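// Processing-machine board v2: pipeline flow view (plan ds-board-engines-1, R2 merged proposal).
// Main = left-to-right flow nodes (bottleneck in amber, failure badge); node click = detail panel (variant of proposal 1),
// failure badge click = failure-handling drawer (retry/skip; the only action path for permanent failures).
// Data = GET /api/queue/overview (60s polling store) + GET /api/queue/failed (when the drawer opens).
// Processing-machine board v3: merged proposal (plan ds-board-merged: C2 machine lanes + C3 burndown/honest ETA).
// · 3 machine lanes (GPU/Mac mini/MacBook) = "who is working" + makes the summarize offload (MacBook joining) visible
// · dominant-backlog burndown panel = "when does it finish" + honest ETA net of inflow (summarize_eta)
// · freshness label '갱신 N초 전' + stale warning / the failure drawer and detail panel reuse the v2 assets.
// Data = GET /api/queue/overview (60s polling store) + GET /api/queue/failed (drawer).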
import { api } from '$lib/api';
import { refreshQueueOverview } from '$lib/stores/queueOverview';
import { refreshQueueOverview, queueUpdatedAt } from '$lib/stores/queueOverview';
import { addToast } from '$lib/stores/toast';
import {
AUX_NODES,
FLOW_NODES,
MACHINE_META,
type FlowNodeDef,
type FlowMachine,
etaShort,
flowStageLabel,
formatAgeSec,
@@ -20,6 +22,7 @@
FailedItem,
FailedListResponse,
MachineCurrentItem,
MachineOverview,
QueueOverview,
QueueStageRow,
RetryResponse,
@@ -82,14 +85,6 @@
);
const totalFailed = $derived(overview.totals.failed);
// 머신 스트립 — overview.machines 의 state/처리율 + 정적 모델 메타
const machineStrip = $derived(
overview.machines.map((m) => ({
...m,
meta: MACHINE_META[m.key],
})),
);
// ─── 선택 상태 (노드 상세 / 실패 드로어 — 동시에 하나만) ───
let selected = $state<string | null>(null);
let failOpen = $state(false);
@@ -194,22 +189,105 @@
await Promise.all([loadFailures(), refreshQueueOverview()]);
}
// ─── trend_24h 스파크라인 (summarize 유입 vs 소화 — API 가 주는데 미렌더이던 슬롯) ───
const spark = $derived.by(() => {
// ─── 머신 레인 (C2) — mainNodes 를 머신별로 그룹 + 머신 카드(state/처리율) 결합 ───
const machineByKey = $derived(
new Map<FlowMachine, MachineOverview>(overview.machines.map((m) => [m.key as FlowMachine, m])),
);
const LANE_ORDER: FlowMachine[] = ['gpu', 'macmini', 'macbook'];
const lanes = $derived(
LANE_ORDER.map((key) => ({
key,
meta: MACHINE_META[key],
card: machineByKey.get(key) ?? null,
nodes: mainNodes.filter((n) => n.def.machine === key),
})),
);
// 요약 오프로드 분담 — 맥미니 vs 맥북 (A-1 summarize_by_machine)
const split = $derived(overview.summarize_by_machine);
const splitTotal1h = $derived(Math.max(1, split.macmini.done_1h + split.macbook.done_1h));
const macbookSharePct = $derived(Math.round((split.macbook.done_1h / splitTotal1h) * 100));
// 맥북이 요약을 실제로 가져가는 중인가 (합류 표식 게이트)
const offloadActive = $derived(split.macbook.done_1h > 0);
// ─── 지배 백로그 = 요약. 정직 ETA(유입 차감) — summarize_eta ───
const eta = $derived(overview.summarize_eta);
// 정직 ETA 라벨: eta_minutes null = 유입이 소화를 앞섬(소진 불가)
const honestEtaLabel = $derived(
eta.pending === 0
? '비어 있음'
: eta.eta_minutes != null
? etaShort(eta.eta_minutes)
: '소진 불가',
);
const honestEtaWarn = $derived(eta.pending > 0 && eta.eta_minutes == null);
/** 단계별 정직 ETA(순소화율) — 노드용. 유입>소화면 null(소진 불가) */
function netEtaLabel(n: NodeStats): string | null {
if (n.pending === 0) return '한가';
const net = n.done1h - n.created1h;
if (net > 0) return etaShort(Math.round((n.pending / net) * 60));
if (n.created1h > n.done1h) return '유입 우세';
return null;
}
// ─── 신선도 (B-4) — '갱신 N초 전' + stale 경고 (폴링 60s) ───
let now = $state(Date.now());
$effect(() => {
const id = setInterval(() => (now = Date.now()), 1000);
return () => clearInterval(id);
});
const ageSec = $derived(
$queueUpdatedAt != null ? Math.max(0, Math.round((now - $queueUpdatedAt) / 1000)) : null,
);
const stale = $derived(ageSec != null && ageSec > 90);
const freshLabel = $derived(
ageSec == null
? '갱신 대기'
: ageSec < 60
? `갱신 ${ageSec}초 전`
: `갱신 ${Math.round(ageSec / 60)}분 전`,
);
// ─── 24h 번다운 (C3) — 요약 유입 vs 소화 + 맥북 합류 변곡점 마커 ───
const burn = $derived.by(() => {
const t = overview.trend_24h;
if (!t || t.length === 0) return null;
const max = Math.max(1, ...t.map((b) => Math.max(b.inflow, b.done)));
const w = 120;
const h = 24;
const w = 300;
const h = 64;
const step = w / Math.max(1, t.length - 1);
const pts = (sel: (b: (typeof t)[number]) => number) =>
t.map((b, i) => `${(i * step).toFixed(1)},${(h - (sel(b) / max) * (h - 3) + 1).toFixed(1)}`).join(' ');
return { inflow: pts((b) => b.inflow), done: pts((b) => b.done) };
const y = (v: number) => (h - (v / max) * (h - 8) + 4).toFixed(1);
const line = (sel: (b: (typeof t)[number]) => number) =>
t.map((b, i) => `${(i * step).toFixed(1)},${y(sel(b))}`).join(' ');
const doneLine = line((b) => b.done);
const area = `0,${h} ${doneLine} ${w.toFixed(1)},${h}`;
// 합류 변곡점 = done 최대 버킷 (맥북 야간 drain 합류 추정)
let mi = 0;
t.forEach((b, i) => {
if (b.done > t[mi].done) mi = i;
});
return {
w,
h,
area,
doneLine,
inflowLine: line((b) => b.inflow),
markX: (mi * step).toFixed(1),
markHour: t[mi].hour,
markDone: t[mi].done,
peak: max,
};
});
// 머신 상태 dot 색 클래스
function dotClass(state: string): string {
return state === 'active' ? 'bg-success' : state === 'deferred' ? 'bg-warning' : 'bg-faint';
}
</script>
<div class="mt-5">
<!-- 헤더: 타이틀 + 요약 24h 스파크라인 + 실패 합계 -->
<!-- 헤더: 타이틀 + 신선도 + 실패 합계 -->
<div class="flex items-center justify-between gap-3 mb-3">
<div class="text-[11px] font-bold text-dim uppercase tracking-wider">처리 머신</div>
<div class="flex items-center gap-3">
@@ -219,80 +297,107 @@
onclick={openFailures}
>실패 {totalFailed}건 처리</button>
{/if}
{#if spark}
<div class="flex items-center gap-2 text-[10px] text-faint tabular-nums" title="요약(summarize) 단계 24시간 — 유입(회색) vs 소화(녹색)">
<svg width="120" height="24" viewBox="0 0 120 24" class="block">
<polyline points={spark.inflow} fill="none" stroke="currentColor" stroke-width="1.5" class="text-faint" />
<polyline points={spark.done} fill="none" stroke="currentColor" stroke-width="1.5" class="text-success" />
</svg>
<span>요약 24h 유입/소화</span>
</div>
{/if}
<span class="flex items-center gap-1.5 text-[10px] tabular-nums {stale ? 'text-warning' : 'text-faint'}" title="60초 폴링">
<span class="w-1.5 h-1.5 rounded-full {stale ? 'bg-warning' : 'bg-success'}"></span>
{freshLabel}{#if stale} · 갱신 지연{/if}
</span>
</div>
</div>
<!-- 머신 스트립 -->
<div class="flex flex-wrap gap-2 mb-3">
{#each machineStrip as m (m.key)}
<div class="flex items-center gap-2 bg-surface border border-default rounded-full px-3.5 py-1.5 text-xs">
<span class="w-2 h-2 rounded-full shrink-0 {m.state === 'active' ? 'bg-success' : m.state === 'deferred' ? 'bg-warning' : 'bg-faint'}"></span>
<span class="font-bold text-text">{m.meta?.label ?? m.label}</span>
<span class="text-[10px] text-faint font-mono">{m.meta?.model}</span>
<span class="text-[11px] text-dim tabular-nums">{formatRate(m.done_1h)}/h</span>
{#if m.key === 'macbook' && m.deferred_pending > 0}
<span class="text-[10px] font-semibold text-warning tabular-nums">보류 {m.deferred_pending}</span>
{/if}
<!-- 지배 백로그 스트립 (요약) + 정직 ETA -->
<div class="flex items-center flex-wrap gap-x-3 gap-y-1 bg-surface border border-warning/50 rounded-card px-3.5 py-2 mb-3">
<span class="text-[9px] font-bold text-warning border border-warning/60 rounded-full px-2 py-px">지배 백로그</span>
<span class="text-xs font-bold text-text">요약</span>
<span class="text-[11px] text-dim tabular-nums">대기 <b class="text-text">{eta.pending.toLocaleString()}</b> · 순소화 <b class="text-text">{formatRate(eta.done_rate_1h)}</b>/h · 유입 {formatRate(eta.inflow_rate_1h)}/h</span>
<span class="ml-auto flex items-center gap-1.5 border rounded-full px-2.5 py-0.5 {honestEtaWarn ? 'border-warning text-warning' : 'border-accent text-accent'}">
<span class="text-[10px] font-semibold">정직 ETA</span>
<span class="text-xs font-bold tabular-nums">{honestEtaLabel}</span>
</span>
</div>
<!-- 머신 레인 (누가 일하나 + 요약 오프로드) -->
<div class="grid gap-2 mb-3">
{#each lanes as lane (lane.key)}
<div class="bg-surface border border-default rounded-card px-3.5 py-2.5">
<div class="flex items-center gap-2 flex-wrap mb-2">
<span class="w-2 h-2 rounded-full shrink-0 {dotClass(lane.card?.state ?? 'idle')}"></span>
<span class="text-[9px] font-bold rounded px-1.5 py-px mtag-{lane.key}">{lane.meta.label}</span>
<span class="text-[10px] text-faint font-mono">{lane.meta.model}</span>
<span class="text-[11px] text-dim tabular-nums ml-1">{formatRate(lane.card?.done_1h ?? 0)}/h</span>
{#if lane.key === 'macbook' && (lane.card?.deferred_pending ?? 0) > 0}
<span class="text-[10px] font-semibold text-warning tabular-nums">보류 {lane.card?.deferred_pending}</span>
{/if}
{#if lane.card?.state === 'deferred'}
<span class="text-[9px] text-warning">잠듦 — 요약은 맥미니로 복귀</span>
{/if}
</div>
<div class="flex items-stretch gap-1.5 flex-wrap">
{#each lane.nodes as n (n.def.key)}
{@const idle = n.pending + n.processing + n.doneToday + n.failed === 0}
<button
class="relative text-left rounded-lg border px-2.5 py-1.5 transition-colors cursor-pointer hover:bg-surface-hover min-w-[96px]
{idle ? 'border-dashed border-default opacity-55' : n.inflowDominant ? 'border-warning' : 'border-default'}
{selected === n.def.key ? 'node-sel' : ''}"
onclick={() => toggleNode(n.def.key)}
title="{n.def.label} — 클릭하면 상세"
>
{#if n.failed > 0}
<span class="absolute -top-1.5 -right-1 text-[9px] font-extrabold bg-error text-white rounded-full px-1.5">{n.failed}</span>
{/if}
<div class="flex items-center gap-1 text-[11px] font-semibold text-text whitespace-nowrap">
{n.def.label}
{#if n.processing > 0}<span class="inline-block w-1.5 h-1.5 rounded-full bg-accent animate-pulse"></span>{/if}
</div>
<div class="text-sm font-extrabold tabular-nums leading-tight text-text">{n.pending.toLocaleString()}<span class="text-[9px] text-faint font-normal ml-0.5">대기</span></div>
<div class="text-[9px] text-dim tabular-nums whitespace-nowrap">{formatRate(n.done1h)}/h · 오늘 {n.doneToday.toLocaleString()}</div>
{#if n.def.key === 'summarize'}
<div class="mt-1 h-1 w-full rounded-full overflow-hidden flex" title="맥미니 {split.macmini.done_1h}/h · 맥북 {split.macbook.done_1h}/h">
<span class="block h-full mtag-macmini-bar" style="width:{100 - macbookSharePct}%"></span>
<span class="block h-full mtag-macbook-bar" style="width:{macbookSharePct}%"></span>
</div>
<div class="text-[9px] text-faint tabular-nums whitespace-nowrap mt-0.5">맥미니 {split.macmini.done_1h} · 맥북 {split.macbook.done_1h}/h</div>
{/if}
</button>
{/each}
{#if lane.key === 'macbook' && offloadActive}
<button
class="text-left rounded-lg border border-dashed border-warning/50 px-2.5 py-1.5 cursor-pointer hover:bg-surface-hover min-w-[96px]"
onclick={() => toggleNode('summarize')}
title="맥북이 요약을 맥미니에서 가져와 처리 중"
>
<div class="flex items-center gap-1 text-[11px] font-semibold text-text whitespace-nowrap">요약 합류 <span class="text-[8px] font-bold text-warning">OFFLOAD</span></div>
<div class="text-sm font-extrabold tabular-nums leading-tight text-text">{split.macbook.done_1h}<span class="text-[9px] text-faint font-normal ml-0.5">/h</span></div>
<div class="text-[9px] text-dim tabular-nums whitespace-nowrap">요약의 {macbookSharePct}% 담당</div>
</button>
{/if}
</div>
</div>
{/each}
</div>
<!-- 흐름 노드 -->
<div class="flex items-stretch overflow-x-auto pb-1">
{#each mainNodes as n, i (n.def.key)}
{#if i > 0}
<div class="flex items-center text-faint text-sm px-1.5 shrink-0" aria-hidden="true"></div>
{/if}
<div
class="relative bg-surface border-[1.5px] rounded-card px-3 py-2.5 min-w-[124px] shrink-0 text-left transition-colors cursor-pointer hover:bg-surface-hover
{n.inflowDominant ? 'border-warning' : n.etaMinutes != null && n.def.stages.includes('chunk') ? 'border-success' : 'border-default'}
{selected === n.def.key ? 'node-sel' : ''}"
role="button"
tabindex="0"
onclick={() => toggleNode(n.def.key)}
onkeydown={(e) => { if (e.key === 'Enter' || e.key === ' ') { e.preventDefault(); toggleNode(n.def.key); } }}
title="{n.def.label} — 클릭하면 상세"
>
{#if n.failed > 0}
<button
class="absolute -top-2 -right-1.5 text-[9px] font-extrabold bg-error text-white rounded-full px-1.5 py-px shadow cursor-pointer"
onclick={(e) => { e.stopPropagation(); openFailures(); }}
title="실패 {n.failed}건 — 클릭하면 실패 처리"
>{n.failed}</button>
{/if}
<span class="inline-block text-[9px] font-bold rounded px-1.5 py-px mb-1.5 mtag-{n.def.machine}">
{MACHINE_META[n.def.machine].label} · {n.def.engine}
</span>
<div class="text-xs font-bold text-text flex items-center gap-1.5">
{n.def.label}
{#if n.processing > 0}
<span class="inline-block w-1.5 h-1.5 rounded-full bg-accent animate-pulse" title="처리 중 {n.processing}"></span>
{/if}
{#if n.inflowDominant}
<span class="text-[9px] font-bold text-warning">유입 우세</span>
{/if}
</div>
<div class="text-base font-extrabold tabular-nums tracking-tight leading-tight mt-0.5 text-text">
{n.pending.toLocaleString()}
</div>
<div class="text-[10px] text-dim tabular-nums">
{formatRate(n.done1h)}/h · 오늘 {n.doneToday.toLocaleString()}
{#if n.etaMinutes != null && !n.inflowDominant && n.pending > 0}
· <span class="text-accent font-semibold">{etaShort(n.etaMinutes)}</span>
{/if}
</div>
<!-- 번다운 / ETA 패널 -->
{#if burn}
<div class="bg-surface border border-default rounded-card px-3.5 py-3 mb-1">
<div class="flex items-center gap-2 mb-2">
<span class="text-[11px] font-bold text-text">요약 백로그 24시간</span>
<span class="text-[9px] text-faint">유입(회색) vs 소화(녹색)</span>
{#if offloadActive}<span class="text-[9px] text-warning ml-auto">맥북 합류 {burn.markHour} — 소화 급증</span>{/if}
</div>
{/each}
</div>
<svg viewBox="0 0 {burn.w} {burn.h}" class="block w-full" style="height:64px" preserveAspectRatio="none" role="img" aria-label="요약 백로그 24시간 번다운">
<polygon points={burn.area} fill="currentColor" class="text-success" opacity="0.12" />
<polyline points={burn.inflowLine} fill="none" stroke="currentColor" stroke-width="1.2" class="text-faint" />
<polyline points={burn.doneLine} fill="none" stroke="currentColor" stroke-width="1.6" class="text-success" />
{#if offloadActive}
<line x1={burn.markX} y1="0" x2={burn.markX} y2={burn.h} stroke="currentColor" stroke-width="1" stroke-dasharray="2 2" class="text-warning" opacity="0.7" />
{/if}
</svg>
<div class="flex flex-wrap gap-x-4 gap-y-1 mt-2 pt-2 border-t border-default text-[10px] text-dim tabular-nums">
{#each mainNodes.filter((n) => n.pending > 0 && n.def.key !== 'summarize') as n (n.def.key)}
<span class="whitespace-nowrap">{n.def.label} 대기 <b class="text-text">{n.pending.toLocaleString()}</b>{#if netEtaLabel(n)} · <span class="text-accent font-semibold">{netEtaLabel(n)}</span>{/if}</span>
{/each}
</div>
</div>
{/if}
<!-- Auxiliary line -->
<p class="text-[10px] text-faint mt-1.5 tabular-nums">
@@ -413,6 +518,9 @@
.mtag-gpu { background: #e7eef6; color: #3b6ea5; }
.mtag-macmini { background: #efe9f7; color: #8a5fbf; }
.mtag-macbook { background: #f7eedd; color: #b07a10; }
/* Fill colors for the summarize offload share bars (Mac mini purple / MacBook amber) */
.mtag-macmini-bar { background: #8a5fbf; }
.mtag-macbook-bar { background: #b07a10; }
.node-sel { outline: 2px solid #3b6ea5; outline-offset: 1px; }
.detail-frame { border-color: #3b6ea5; }
.detail-head { background: #e7eef6; }
@@ -43,14 +43,17 @@
{@const open = selectedId === s.chunk_id}
{@const active = activeKey != null && activeKey === s.chunk_id}
{@const typeLabel = sectionTypeLabel(s.section_type)}
{@const depth = Math.max(0, (s.level ?? 1) - 1)}
<li>
<button
type="button"
onclick={() => { toggle(item); onJump?.(s.chunk_id); }}
aria-expanded={open}
aria-current={active ? 'true' : undefined}
style="padding-left:{8 + depth * 13}px"
class={[
'w-full text-left px-2 py-1.5 rounded-md text-xs flex items-start gap-1.5 transition-colors border-l-2',
'w-full text-left pr-2 py-1.5 rounded-md text-xs flex items-start gap-1.5 transition-colors border-l-2',
depth > 0 ? 'text-[11px]' : '',
open ? 'bg-surface-active text-text border-accent' : active ? 'bg-surface text-accent-hover border-accent' : 'text-dim hover:bg-surface hover:text-text border-transparent',
].join(' ')}
>
+8 -1
@@ -17,6 +17,11 @@ let pollHandle: ReturnType<typeof setInterval> | null = null;
let subscriberCount = 0;
let inFlight: Promise<void> | null = null;
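// Time of the last successful refresh (epoch ms), used for the board freshness label ('갱신 N초 전') and the stale warning
// (ds-board-merged B-4). Not updated on failure (result collapses to null), so the age keeps growing and surfaces as stale.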
const updatedAt = writable<number | null>(null);
export const queueUpdatedAt = { subscribe: updatedAt.subscribe };
const internal = writable<QueueOverview | null>(null, (_set) => {
subscriberCount += 1;
if (subscriberCount === 1 && browser) {
@@ -54,7 +59,9 @@ export async function refreshQueueOverview(): Promise<void> {
if (inFlight) return inFlight;
inFlight = (async () => {
try {
internal.set(await fetchOverview());
const ov = await fetchOverview();
internal.set(ov);
if (ov) updatedAt.set(Date.now()); // update freshness only on success (failure = stays stale)
} finally {
inFlight = null;
}
+8 -1
@@ -43,13 +43,19 @@ export interface SummarizeEta {
eta_minutes: number | null;
}
/** Hourly inflow vs. done (not rendered in this track; slot for a follow-up trend widget) */
/** Hourly inflow vs. done (summarize 24h trend) */
export interface TrendPoint {
hour: string;
inflow: number;
done: number;
}
/** Per-machine share of completed summarize work (offload visibility, ds-board-merged A-1) */
export interface SummarizeByMachine {
macmini: { done_1h: number; done_today: number };
macbook: { done_1h: number; done_today: number };
}
export interface QueueTotals {
pending: number;
processing: number;
@@ -72,6 +78,7 @@ export interface QueueStageRow {
export interface QueueOverview {
machines: MachineOverview[];
summarize_eta: SummarizeEta;
summarize_by_machine: SummarizeByMachine;
trend_24h: TrendPoint[];
stages: QueueStageRow[];
totals: QueueTotals;
+5 -1
@@ -37,10 +37,14 @@ export function etaPhrase(minutes: number): string {
return `${text}시간 후 소진 예상`;
}
/** ETA minutes → short chip label ("약 4.6시간" / "약 12분") */
/** ETA minutes → short chip label ("약 12분" / "약 4.6시간" / 48h+ = "약 5.5일") */
export function etaShort(minutes: number): string {
if (minutes < 60) return `약 ${Math.max(1, Math.round(minutes))}분`;
const hours = minutes / 60;
if (hours >= 48) {
const days = hours / 24;
return `약 ${days >= 10 ? Math.round(days) : Math.round(days * 10) / 10}일`;
}
const text = hours >= 10 ? String(Math.round(hours)) : String(Math.round(hours * 10) / 10);
return `약 ${text}시간`;
}
+1 -115
@@ -19,7 +19,7 @@
import EmptyState from '$lib/components/ui/EmptyState.svelte';
import Skeleton from '$lib/components/ui/Skeleton.svelte';
import {
Scale, FileText, Pin, ChevronRight, GraduationCap, Upload, Newspaper,
Scale, FileText, Pin, GraduationCap, Upload, Newspaper,
} from 'lucide-svelte';
import { addToast } from '$lib/stores/toast';
@@ -133,17 +133,6 @@
// If the backend is not deployed or the request fails, store=null and the board itself is quietly omitted (silent, non-blocking).
let queue = $derived<QueueOverview | null>($queueOverview);
// Stage labels per machine: reuse STAGE_LABEL plus overview-only stages
// (kept in sync with backend services/queue_overview.py _STAGE_ORDER); unknown keys fall back to the raw value
const QUEUE_STAGE_LABEL: Record<string, string> = {
...STAGE_LABEL,
summarize: '요약', chunk: '청크', markdown: '마크다운',
fulltext: '전문', deep_summary: '심층분석',
};
function queueStageLabel(stage: string): string {
return QUEUE_STAGE_LABEL[stage] ?? stage;
}
onMount(() => {
void refreshQueueOverview();
const handle = setInterval(() => void refreshQueueOverview(), 30_000);
@@ -191,35 +180,10 @@
let pipelineRows = $derived(
summary ? buildPipelineRows(summary.pipeline_status, summary.queue_lag ?? []) : []
);
let pipelineMax = $derived(Math.max(1, ...pipelineRows.map((r) => r.total)));
let totalFailed = $derived(summary?.failed_count ?? 0);
let totalPending = $derived(pipelineRows.reduce((s, r) => s + r.pending, 0));
let totalProcessing = $derived(pipelineRows.reduce((s, r) => s + r.processing, 0));
let pipelineManualClosed = $state(false);
let pipelineOpen = $derived(
pipelineManualClosed ? false : (queue?.totals.failed ?? totalFailed) > 0
);
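// Per-stage status (2026-06-11 feedback redesign: completed work must be visible; overview.stages is the single source)
// active = only stages with activity today; idle = stages where everything is 0, collapsed into one line.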
let stageRows = $derived(queue?.stages ?? []);
let activeStageRows = $derived(
stageRows.filter((r) => r.pending + r.processing + r.failed + r.done_today > 0)
);
let idleStageRows = $derived(
stageRows.filter((r) => r.pending + r.processing + r.failed + r.done_today === 0)
);
let stageDoneToday = $derived(stageRows.reduce((s, r) => s + r.done_today, 0));
function formatAge(sec: number | null): string {
if (sec == null || sec <= 0) return '';
if (sec < 60) return `${sec}초 전`;
if (sec < 3600) return `${Math.floor(sec / 60)}분 전`;
if (sec < 86400) return `${Math.floor(sec / 3600)}시간 전`;
return `${Math.floor(sec / 86400)}일 전`;
}
function formatTime(dateStr: string) {
const d = new Date(dateStr);
if (isNaN(d.getTime())) return ''; // empty or invalid created_at: avoid rendering 'Invalid Date'
@@ -463,80 +427,6 @@
<ProcessingFlowBoard overview={queue} />
{/if}
<!-- ═══ Stage details (existing stage table: demoted to collapsed, auto-expands when there are failures) ═══ -->
<details
class="mt-5"
open={pipelineOpen}
ontoggle={(e) => { if (!e.currentTarget.open) pipelineManualClosed = true; }}
>
<summary class="flex items-center justify-between px-5 py-3.5 bg-surface border border-default rounded-card cursor-pointer hover:bg-surface-hover transition-colors select-none list-none">
<span class="text-sm font-semibold text-text flex items-center gap-2">
<ChevronRight size={14} class="transition-transform details-chevron" />
단계별 현황
</span>
<span class="text-xs text-dim flex items-center gap-2.5">
{#if queue}
{#if stageDoneToday > 0}<span class="text-success">오늘 {stageDoneToday.toLocaleString()} 완료</span>{/if}
{#if queue.totals.failed > 0}<span class="text-error font-medium">실패 {queue.totals.failed}</span>{/if}
{#if queue.totals.pending > 0}<span>대기 {queue.totals.pending.toLocaleString()}</span>{/if}
{#if stageDoneToday === 0 && queue.totals.failed === 0 && queue.totals.pending === 0}<span>모든 단계 한가함</span>{/if}
{:else}
{#if totalFailed > 0}<span class="text-error font-medium">실패 {totalFailed}</span>{/if}
{#if totalPending > 0}<span>대기 {totalPending}</span>{/if}
{/if}
</span>
</summary>
<div class="mt-2 px-5 py-4 bg-surface border border-default rounded-card">
{#if queue}
{#if activeStageRows.length > 0}
<div class="space-y-3.5">
{#each activeStageRows as row (row.stage)}
{@const total = row.done_today + row.pending + row.processing}
{@const donePct = total > 0 ? (row.done_today / total) * 100 : 0}
{@const procPct = total > 0 ? (row.processing / total) * 100 : 0}
<div>
<div class="flex items-baseline justify-between text-xs mb-1.5 gap-2">
<span class="font-medium text-text flex items-center gap-1.5 whitespace-nowrap">
{queueStageLabel(row.stage)}
{#if row.processing > 0}
<span class="inline-block w-1.5 h-1.5 rounded-full bg-accent animate-pulse"></span>
<span class="text-accent font-normal">처리 중 {row.processing}</span>
{/if}
</span>
<span class="text-dim tabular-nums flex items-center gap-2.5 whitespace-nowrap">
{#if row.done_today > 0}<span class="text-success">오늘 {row.done_today.toLocaleString()} 완료</span>{/if}
{#if row.pending > 0}<span>대기 {row.pending.toLocaleString()}</span>{/if}
{#if row.failed > 0}<span class="text-error font-medium">실패 {row.failed}</span>{/if}
</span>
</div>
<!-- Gauge = today's progress for this stage (done / done+pending); a full green bar means everything is finished -->
<div class="flex h-1.5 w-full overflow-hidden rounded-sm bg-bg" title="오늘 완료 {row.done_today.toLocaleString()} / 잔여 {row.pending.toLocaleString()}">
{#if donePct > 0}<div class="bg-success/70 h-full" style="width: {donePct}%"></div>{/if}
{#if procPct > 0}<div class="bg-accent h-full" style="width: {Math.max(procPct, 1)}%"></div>{/if}
</div>
{#if row.pending > 0 && row.oldest_pending_age_sec && row.oldest_pending_age_sec > 600}
<p class="text-[10px] mt-1 tabular-nums {row.oldest_pending_age_sec > 21600 ? 'text-warning' : 'text-faint'}">
가장 오래 기다린 항목 {formatAge(row.oldest_pending_age_sec)}
</p>
{/if}
</div>
{/each}
</div>
{:else}
<p class="text-xs text-dim text-center py-3">대기·처리·실패 없음 — 모든 단계가 한가합니다</p>
{/if}
{#if idleStageRows.length > 0}
<p class="text-[11px] text-faint mt-4 pt-3 border-t border-default">
비어 있음: {idleStageRows.map((r) => queueStageLabel(r.stage)).join(' · ')}
</p>
{/if}
{:else}
<p class="text-xs text-dim text-center py-3">현황을 불러오지 못했습니다</p>
{/if}
</div>
</details>
{/if}
</div>
</div>
@@ -548,7 +438,3 @@
</div>
{/snippet}
<style>
details[open] :global(.details-chevron) { transform: rotate(90deg); }
details summary::-webkit-details-marker { display: none; }
</style>
+62 -48
@@ -30,7 +30,6 @@
import AnalysisPanel from '$lib/components/AnalysisPanel.svelte';
import ReadCounter from '$lib/components/ReadCounter.svelte';
import SectionOutline from '$lib/components/SectionOutline.svelte';
import Tabs from '$lib/components/ui/Tabs.svelte';
marked.use({ mangle: false, headerIds: false });
function renderMd(text) {
@@ -460,53 +459,68 @@
{/if}
</div>
<!-- Right: meta Tabs [정보 | AI | 관리] (resolves the vertical sprawl of 11 cards) -->
<aside class="min-w-0">
<Card>
<Tabs
tabs={[
{ id: 'info', label: '정보' },
{ id: 'ai', label: 'AI' },
{ id: 'manage', label: '관리' },
]}
>
{#snippet children(tab)}
<div class="pt-3 space-y-4">
{#if tab === 'info'}
{#if doc.category === 'library'}
<ReadCounter
documentId={doc.id}
initialCount={doc.read_count ?? 0}
initialLastReadAt={doc.last_read_at ?? null}
/>
{/if}
<FileInfoView {doc} />
<ProcessingStatusView {doc} />
{:else if tab === 'ai'}
<AnalysisPanel docId={doc.id} doc={doc} />
<AIClassificationEditor {doc} />
<div>
<h4 class="text-xs font-semibold text-dim uppercase mb-1.5">관련 문서</h4>
<!-- TODO(backend): GET /documents/{id}/related?limit=10 (vector similarity) -->
<EmptyState
icon={FileText}
title="추후 지원"
description="관련 문서 추천은 backend 연동 후 제공됩니다."
/>
</div>
{:else}
<LibraryPathEditor {doc} />
<NoteEditor {doc} />
<EditUrlEditor {doc} />
<TagsEditor {doc} />
<div class="pt-2 border-t border-default">
<DocumentDangerZone {doc} ondelete={handleDocDelete} />
</div>
{/if}
</div>
{/snippet}
</Tabs>
</Card>
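<!-- Right: slim global insight rail (D3: tab gate removed; summary, deep analysis, and discrepancies are always visible).
Info and Manage are collapsible (<details>): on desktop the insights stay visible; on mobile the body is the main view and panels are opened on demand. -->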
<aside class="min-w-0 space-y-3">
{#if doc.category === 'library'}
<Card>
<ReadCounter
documentId={doc.id}
initialCount={doc.read_count ?? 0}
initialLastReadAt={doc.last_read_at ?? null}
/>
</Card>
{/if}
<!-- Summary and analysis: open by default (always-on feel on desktop, collapsible on mobile) -->
<details open class="bg-surface border border-default rounded-card overflow-hidden group">
<summary class="cursor-pointer list-none flex items-center justify-between px-3.5 py-2.5 text-xs font-semibold text-dim uppercase tracking-wide select-none">
<span>요약 · 분석</span>
<ChevronRight size={14} class="transition-transform group-open:rotate-90 text-faint" />
</summary>
<div class="px-3.5 pb-3.5 space-y-4">
<AnalysisPanel docId={doc.id} doc={doc} />
<AIClassificationEditor {doc} />
<div>
<h4 class="text-xs font-semibold text-dim uppercase mb-1.5">관련 문서</h4>
<!-- TODO(backend): GET /documents/{id}/related?limit=10 (vector similarity); excluded from v1 (placeholder only) -->
<EmptyState
icon={FileText}
title="추후 지원"
description="관련 문서 추천은 backend 연동 후 제공됩니다."
/>
</div>
</div>
</details>
<!-- Document info: collapsible (closed by default) -->
<details class="bg-surface border border-default rounded-card overflow-hidden group">
<summary class="cursor-pointer list-none flex items-center justify-between px-3.5 py-2.5 text-xs font-semibold text-dim uppercase tracking-wide select-none">
<span>문서 정보</span>
<ChevronRight size={14} class="transition-transform group-open:rotate-90 text-faint" />
</summary>
<div class="px-3.5 pb-3.5 space-y-3">
<FileInfoView {doc} />
<ProcessingStatusView {doc} />
</div>
</details>
<!-- Manage: collapsible (closed by default) -->
<details class="bg-surface border border-default rounded-card overflow-hidden group">
<summary class="cursor-pointer list-none flex items-center justify-between px-3.5 py-2.5 text-xs font-semibold text-dim uppercase tracking-wide select-none">
<span>관리</span>
<ChevronRight size={14} class="transition-transform group-open:rotate-90 text-faint" />
</summary>
<div class="px-3.5 pb-3.5 space-y-3">
<LibraryPathEditor {doc} />
<NoteEditor {doc} />
<EditUrlEditor {doc} />
<TagsEditor {doc} />
<div class="pt-2 border-t border-default">
<DocumentDangerZone {doc} ondelete={handleDocDelete} />
</div>
</div>
</details>
</aside>
</div>
@@ -0,0 +1,6 @@
-- 340_documents_material_type.sql
-- Safety library classification axis A-1 (1/12): material type column.
-- plan: safety-library-1 (PKM plans/2026-06-12-safety-library-plan.html)
-- TEXT+CHECK approach (not a PG enum: avoids the pitfall from 152 where an enum ADD VALUE cannot be used in the same run).
-- Values are assigned deterministically at collector ingest time (not by classify_worker, since many paths skip classification).
ALTER TABLE documents ADD COLUMN IF NOT EXISTS material_type TEXT;
@@ -0,0 +1,6 @@
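-- 341_documents_material_type_check.sql
-- Safety library classification axis A-1 (2/12): named CHECK on the material_type value space.
-- plan: safety-library-1 0-1, 7 confirmed values. Adding a value = DROP this constraint + re-ADD, in 2 files (possible because it is named).
-- NULL passes the CHECK (non-safety/general documents stay NULL; do not attempt to classify everything).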
ALTER TABLE documents ADD CONSTRAINT chk_documents_material_type
CHECK (material_type IN ('law', 'paper', 'book', 'incident', 'manual', 'standard', 'guide'));
@@ -0,0 +1,5 @@
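-- 342_documents_jurisdiction.sql
-- Safety library classification axis A-1 (3/12): jurisdiction (country) column. A first-class axis for laws.
-- plan: safety-library-1 0-1. Uses 'GB' (aligned with the vocabulary observed in news_sources.country; only the UI label says UK).
-- paper allows NULL (international journals: jurisdiction does not apply). INT = reserved for ISO-style international body material.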
ALTER TABLE documents ADD COLUMN IF NOT EXISTS jurisdiction TEXT;
@@ -0,0 +1,4 @@
-- 343_documents_jurisdiction_check.sql
-- Safety library classification axis A-1 (4/12): named CHECK on the jurisdiction value space.
ALTER TABLE documents ADD CONSTRAINT chk_documents_jurisdiction
CHECK (jurisdiction IN ('KR', 'US', 'EU', 'JP', 'GB', 'INT'));
@@ -0,0 +1,7 @@
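-- 344_documents_law_jurisdiction_check.sql
-- Safety library classification axis A-1 (5/12): structurally enforce that countries cannot be mixed up.
-- A row that is a law (material_type='law') but has jurisdiction NULL is rejected at load time.
-- The upload approval path requires proposed_jurisdiction as input (no pollution from a KR default; plan A-2).
-- If material_type is NULL the expression is never FALSE (it is NULL, or TRUE when jurisdiction is set), so the CHECK passes (non-law rows unaffected).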
ALTER TABLE documents ADD CONSTRAINT chk_documents_law_jurisdiction
CHECK (material_type <> 'law' OR jurisdiction IS NOT NULL);
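Note on the NULL behavior of `chk_documents_law_jurisdiction`: the following is a minimal Python sketch of SQL three-valued logic, written only to illustrate why rows with `material_type` NULL pass while a law without jurisdiction is rejected. The helper names (`sql_or`, `sql_ne`, `law_jurisdiction_check_passes`) are hypothetical and not part of this changeset.

```python
from typing import Optional


def sql_or(a: Optional[bool], b: Optional[bool]) -> Optional[bool]:
    """SQL three-valued OR: TRUE wins, then NULL (None), then FALSE."""
    if a is True or b is True:
        return True
    if a is None or b is None:
        return None
    return False


def sql_ne(left: Optional[str], right: Optional[str]) -> Optional[bool]:
    """SQL `<>`: any comparison involving NULL yields NULL."""
    if left is None or right is None:
        return None
    return left != right


def law_jurisdiction_check_passes(
    material_type: Optional[str], jurisdiction: Optional[str]
) -> bool:
    """Mirror of CHECK (material_type <> 'law' OR jurisdiction IS NOT NULL).

    A CHECK constraint rejects a row only when the expression is FALSE;
    TRUE and NULL both pass.
    """
    result = sql_or(sql_ne(material_type, "law"), jurisdiction is not None)
    return result is not False


if __name__ == "__main__":
    for row in [("law", None), ("law", "KR"), (None, None), ("paper", None)]:
        print(row, law_jurisdiction_check_passes(*row))
```

Only the `('law', None)` combination evaluates to FALSE, which is the case the constraint is meant to block.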
@@ -0,0 +1,5 @@
-- 345_documents_published_date.sql
-- Safety library classification axis A-1 (6/12): representative date per type (single date axis for facet year and freshness).
-- law = COALESCE(effective_date, promulgation_date), confirmed in plan 0-1 R2-M2.
-- paper = publication date / incident = occurrence date / news and crawl = backfilled from extract_meta.published_at (A-3).
ALTER TABLE documents ADD COLUMN IF NOT EXISTS published_date DATE;
+22
@@ -0,0 +1,22 @@
-- 346_legal_acts_table.sql
-- Safety library A-1 (7/12): law registry = watchlist (the law counterpart of the news_sources pattern).
-- plan: safety-library-1 0-2. No separate statute_watchlist table (R2 blocker: removes the double definition, absorbs the watermark).
-- KOSHA GUIDE / KGS Code are out of scope (guide = not a law; KGS = watch-folder-only track, R3-M5).
-- Seed = B-1 PR① (superset of the 26 legacy law_monitor entries, all watch=true; R3-B1).
-- repeal_detected_at: the adapter (core) only marks repeal detection; the transition happens at a single point, the daily job (R3-M3).
CREATE TABLE IF NOT EXISTS legal_acts (
family_id TEXT PRIMARY KEY,
jurisdiction TEXT NOT NULL CHECK (jurisdiction IN ('KR', 'US', 'EU', 'JP', 'GB', 'INT')),
law_level TEXT NOT NULL CHECK (law_level IN ('statute', 'decree', 'rule', 'admin_rule', 'code')),
title TEXT NOT NULL,
title_ko TEXT,
parent_family_id TEXT REFERENCES legal_acts(family_id),
native_id TEXT NOT NULL,
source_api TEXT NOT NULL,
watch BOOLEAN NOT NULL DEFAULT TRUE,
poll_cycle TEXT NOT NULL DEFAULT 'daily' CHECK (poll_cycle IN ('daily', 'weekly', 'monthly', 'quarterly')),
watermark TEXT,
repeal_detected_at TIMESTAMPTZ,
created_at TIMESTAMPTZ NOT NULL DEFAULT now(),
updated_at TIMESTAMPTZ NOT NULL DEFAULT now()
);
+20
@@ -0,0 +1,20 @@
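-- 347_legal_meta_table.sql
-- Safety library A-1 (8/12): one row per law document (= one version or one annex document). A 1:0..1 satellite of documents, minimal form.
-- plan: safety-library-1 0-2. No supersedes chain column yet (to be promoted after observing 10 amendment events).
-- version_key: KR primary = MST / annex = composite 'MST|별표N' (avoids UNIQUE collisions when one MST has several annexes)
-- / interpretation = source native id. The dedup key uses this same composite form (blocks the R3-M4 silent skip).
-- version_status operating contract (the B-1 PR② daily job is the only transition point, R2-B2 and R3-M3):
-- every version is loaded as pending → the job promotes versions whose effective date has arrived (KST) to current, marks the previous current as superseded,
-- and sets in_corpus=false on the old version's chunks, all in one transaction. Repeal also goes through the job.
-- Signal-type documents such as legislative notices have no legal_meta (legal_meta exists = law body text).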
CREATE TABLE IF NOT EXISTS legal_meta (
document_id BIGINT PRIMARY KEY REFERENCES documents(id) ON DELETE CASCADE,
family_id TEXT NOT NULL REFERENCES legal_acts(family_id),
law_doc_kind TEXT NOT NULL DEFAULT 'primary' CHECK (law_doc_kind IN ('primary', 'annex', 'interpretation')),
version_key TEXT NOT NULL,
promulgation_date DATE,
effective_date DATE,
version_status TEXT NOT NULL DEFAULT 'pending' CHECK (version_status IN ('pending', 'current', 'superseded', 'repealed')),
created_at TIMESTAMPTZ NOT NULL DEFAULT now(),
CONSTRAINT uq_legal_meta_version UNIQUE (family_id, law_doc_kind, version_key)
);
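Note on the `version_status` contract: the sketch below models the described daily transition in memory (pending → current once the effective date has arrived in KST, previous current → superseded). It is an illustration under assumptions, not the B-1 PR② job: the `Version` dataclass and the `kst_today` / `promote_due_versions` helpers are hypothetical, versions without an `effective_date` are simply left pending, and the chunk-level `in_corpus=false` update and the surrounding transaction are not modeled.

```python
from dataclasses import dataclass
from datetime import date, datetime, timedelta, timezone
from typing import List, Optional

KST = timezone(timedelta(hours=9))  # fixed offset; Korea does not observe DST


@dataclass
class Version:
    family_id: str
    version_key: str
    effective_date: Optional[date]
    version_status: str = "pending"  # same default as the legal_meta column


def kst_today(now_utc: datetime) -> date:
    """Calendar date in KST for a timezone-aware UTC timestamp."""
    return now_utc.astimezone(KST).date()


def promote_due_versions(versions: List[Version], today: date) -> None:
    """Promote due pending versions in effective-date order (mutates in place).

    Each promotion first demotes the family's existing 'current' row, so at
    most one version per family is 'current' afterwards.
    """
    due = sorted(
        (
            v
            for v in versions
            if v.version_status == "pending"
            and v.effective_date is not None
            and v.effective_date <= today
        ),
        key=lambda v: v.effective_date,
    )
    for new in due:
        for old in versions:
            if old.family_id == new.family_id and old.version_status == "current":
                old.version_status = "superseded"
        new.version_status = "current"


if __name__ == "__main__":
    rows = [
        Version("kr-law:001766", "v1", date(2024, 1, 1), "current"),
        Version("kr-law:001766", "v2", date(2026, 6, 13)),
        Version("kr-law:001766", "v3", date(2026, 7, 1)),
    ]
    # 15:30 UTC on 06-12 is already 00:30 on 06-13 in KST.
    today = kst_today(datetime(2026, 6, 12, 15, 30, tzinfo=timezone.utc))
    promote_due_versions(rows, today)
    print(today, [(r.version_key, r.version_status) for r in rows])
```

The example uses a timestamp just after midnight KST to show why the date comparison has to be done in KST rather than UTC.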
@@ -0,0 +1,4 @@
-- 348_documents_material_type_idx.sql
-- Safety library A-1 (9/12): material_type partial index (following the 128~131 facet index precedent).
CREATE INDEX IF NOT EXISTS idx_documents_material_type
ON documents (material_type) WHERE material_type IS NOT NULL;
@@ -0,0 +1,4 @@
-- 349_documents_jurisdiction_idx.sql
-- Safety library A-1 (10/12): jurisdiction partial index.
CREATE INDEX IF NOT EXISTS idx_documents_jurisdiction
ON documents (jurisdiction) WHERE jurisdiction IS NOT NULL;
+6
@@ -0,0 +1,6 @@
-- 350_legal_meta_family_idx.sql
-- Safety library A-1 (11/12): point-in-time lookup axis.
-- The predicate is COALESCE(effective_date, promulgation_date) (so KGS-style rows with an unknown effective date do not silently drop out),
-- but the index starts with plain effective_date; a COALESCE expression index is deferred until measured.
CREATE INDEX IF NOT EXISTS idx_legal_meta_family
ON legal_meta (family_id, effective_date DESC);
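Note on the point-in-time predicate: a small Python sketch of what a lookup over `COALESCE(effective_date, promulgation_date)` selects, assuming the intent is "latest version whose representative date is on or before the query date". `LegalMetaRow`, `representative_date`, and `version_as_of` are hypothetical names for illustration; the real query is not part of this diff.

```python
from datetime import date
from typing import Iterable, NamedTuple, Optional


class LegalMetaRow(NamedTuple):
    version_key: str
    promulgation_date: Optional[date]
    effective_date: Optional[date]


def representative_date(row: LegalMetaRow) -> Optional[date]:
    """COALESCE(effective_date, promulgation_date)."""
    if row.effective_date is not None:
        return row.effective_date
    return row.promulgation_date


def version_as_of(rows: Iterable[LegalMetaRow], as_of: date) -> Optional[LegalMetaRow]:
    """Latest row whose representative date is on or before `as_of`."""
    dated = [
        (representative_date(r), r)
        for r in rows
        if representative_date(r) is not None and representative_date(r) <= as_of
    ]
    if not dated:
        return None
    return max(dated, key=lambda pair: pair[0])[1]


if __name__ == "__main__":
    rows = [
        LegalMetaRow("A", date(2019, 12, 1), date(2020, 1, 1)),
        LegalMetaRow("B", date(2023, 5, 1), None),  # effective date unknown
        LegalMetaRow("C", date(2024, 11, 1), date(2025, 1, 1)),
    ]
    print(version_as_of(rows, date(2024, 1, 1)))
```

Row `B` has no effective date; filtering on `effective_date` alone would drop it silently, while the COALESCE form keeps it eligible.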
@@ -0,0 +1,9 @@
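-- 351_documents_paper_doi_uq.sql
-- Safety library A-1 (12/12): structurally enforce paper DOI dedup (partial UNIQUE).
-- DOI ownership contract (R3, R2-B1): paper.doi is held only by the bibliographic Document.
-- OA full-text PDFs / purchased file Documents do not carry paper.doi and link via paper.parent_doi,
-- so the index expression is NULL for them: multiple rows are allowed and there is no conflict with the 2-Document structure.
-- DOI normalization (lowercase, prefix removal) goes through a single function: the same function for store and lookup (B-3).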
CREATE UNIQUE INDEX IF NOT EXISTS uq_documents_paper_doi
ON documents (lower(extract_meta #>> '{paper,doi}'))
WHERE material_type = 'paper' AND extract_meta #>> '{paper,doi}' IS NOT NULL;
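Note on DOI normalization: the comment above only states "lowercase, prefix removal" through a single shared function. A minimal sketch of such a function follows; the name `normalize_doi` and the exact set of stripped prefixes (`https://doi.org/`, `http://dx.doi.org/`, `doi:`) are assumptions for illustration, not the B-3 implementation. The DOI values used are made-up examples.

```python
import re
from typing import Optional

# Assumed prefix forms; the real function may accept a different set.
_DOI_PREFIX = re.compile(r"^(?:https?://(?:dx\.)?doi\.org/|doi:\s*)", re.IGNORECASE)


def normalize_doi(raw: Optional[str]) -> Optional[str]:
    """Strip a known prefix and lowercase; return None for empty input.

    Using one function for both storing and looking up keeps the value
    consistent with the lower(...) expression in the unique index.
    """
    if raw is None:
        return None
    value = _DOI_PREFIX.sub("", raw.strip()).lower()
    return value or None


if __name__ == "__main__":
    for sample in ["https://doi.org/10.1234/Example.DOI", "DOI:10.1234/example.doi", "  "]:
        print(repr(sample), "->", repr(normalize_doi(sample)))
```

Both non-empty samples collapse to the same key, which is what lets the partial UNIQUE index catch duplicates regardless of how the DOI was written at the source.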
@@ -0,0 +1,8 @@
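-- 352_news_sources_material_type.sql
-- Safety library A-2 (1/4): default material type on the source registry.
-- plan: safety-library-1 A-2. The single source of truth for deterministic assignment at collector ingest time is
-- the registry row (same pattern as country; avoids hardcoding in code or name matching).
-- NULL = not subject to material typing (news, philosophy, etc.). For paper sources, even when country is set,
-- documents.jurisdiction is NULL (international journals; a code-level rule).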
ALTER TABLE news_sources ADD COLUMN IF NOT EXISTS material_type TEXT
CHECK (material_type IN ('law', 'paper', 'book', 'incident', 'manual', 'standard', 'guide'));
@@ -0,0 +1,6 @@
-- 353_news_sources_license_scheme.sql
-- Safety library A-2 (2/4): per-source license scheme (deterministic injection of the 0-3 license metadata).
-- kogl (KOGL-style) / ogl (UK) / public_domain (US federal) / proprietary / unknown.
-- Unconfirmed sources start conservatively at unknown/proprietary + redistribute=false
-- (updates only relax once evidence is obtained; conservative = strict).
ALTER TABLE news_sources ADD COLUMN IF NOT EXISTS license_scheme TEXT;
@@ -0,0 +1,4 @@
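-- 354_news_sources_license_redistribute.sql
-- Safety library A-2 (3/4): whether redistribution is allowed. Input to the structural gate for P3 digest/publishing outputs
-- (sources with redistribute=false are excluded; removes reliance on human memory, 0-3).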
ALTER TABLE news_sources ADD COLUMN IF NOT EXISTS license_redistribute BOOLEAN;
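Note on the redistribution gate: `license_redistribute` is a nullable BOOLEAN, so a gate has to decide what NULL means. The sketch below assumes the conservative reading (NULL is treated like false), which matches the `COALESCE(ns.license_redistribute, false)` used by the backfill script in this changeset. `Source` and `redistributable` are hypothetical names, and 'Unseeded source' is an invented row for illustration.

```python
from typing import List, Optional, TypedDict


class Source(TypedDict):
    name: str
    license_redistribute: Optional[bool]


def redistributable(sources: List[Source]) -> List[Source]:
    """Keep only sources explicitly marked redistribute=True.

    None (column not set) is excluded, same as False.
    """
    return [s for s in sources if s["license_redistribute"] is True]


if __name__ == "__main__":
    sources: List[Source] = [
        {"name": "UK HSE Press", "license_redistribute": True},
        {"name": "KOSHA GUIDE", "license_redistribute": False},
        {"name": "Unseeded source", "license_redistribute": None},  # hypothetical
    ]
    print([s["name"] for s in redistributable(sources)])
```

Checking `is True` rather than truthiness keeps the gate closed for any source whose license has not been reviewed yet.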
@@ -0,0 +1,45 @@
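-- 355_news_sources_material_seed.sql
-- Safety library A-2 (4/4): material_type + license seed for the 12 existing safety/engineering source rows.
-- Mapping basis = boundaries confirmed in plan safety-library-1 0-1 (checked against the prod registry on 2026-06-12):
-- law = legislative notices (signal) / incident = HSE, KOSHA cases, CSB, CCPS / guide = KOSHA GUIDE, TWI
-- / standard = NB, API announcements / paper = JPVT, arXiv (jurisdiction is forced to NULL in code).
-- News/philosophy sources stay NULL (not subject to material typing). Name keys = fixed values assigned by the seed migration.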
UPDATE news_sources SET
material_type = CASE name
WHEN '고용노동부 입법행정예고' THEN 'law'
WHEN 'UK HSE Press' THEN 'incident'
WHEN 'KOSHA 재해사례' THEN 'incident'
WHEN 'US CSB 사고조사보고서' THEN 'incident'
WHEN 'CCPS Process Safety Beacon' THEN 'incident'
WHEN 'KOSHA GUIDE' THEN 'guide'
WHEN 'TWI Job Knowledge' THEN 'guide'
WHEN 'National Board 기술 아티클' THEN 'standard'
WHEN 'API 표준 공지' THEN 'standard'
WHEN 'ASME J. Pressure Vessel Technology' THEN 'paper'
WHEN 'arXiv physics.app-ph' THEN 'paper'
WHEN 'arXiv cond-mat.mtrl-sci' THEN 'paper'
END,
license_scheme = CASE name
WHEN '고용노동부 입법행정예고' THEN 'kogl'
WHEN 'KOSHA 재해사례' THEN 'kogl'
WHEN 'KOSHA GUIDE' THEN 'kogl'
WHEN 'UK HSE Press' THEN 'ogl'
WHEN 'US CSB 사고조사보고서' THEN 'public_domain'
WHEN 'TWI Job Knowledge' THEN 'proprietary'
WHEN 'National Board 기술 아티클' THEN 'proprietary'
WHEN 'API 표준 공지' THEN 'proprietary'
WHEN 'CCPS Process Safety Beacon' THEN 'proprietary'
WHEN 'ASME J. Pressure Vessel Technology' THEN 'proprietary'
WHEN 'arXiv physics.app-ph' THEN 'unknown'
WHEN 'arXiv cond-mat.mtrl-sci' THEN 'unknown'
END,
license_redistribute = CASE name
WHEN 'UK HSE Press' THEN TRUE
WHEN 'US CSB 사고조사보고서' THEN TRUE
ELSE FALSE
END
WHERE name IN ('고용노동부 입법행정예고', 'UK HSE Press', 'KOSHA 재해사례',
'US CSB 사고조사보고서', 'CCPS Process Safety Beacon', 'KOSHA GUIDE',
'TWI Job Knowledge', 'National Board 기술 아티클', 'API 표준 공지',
'ASME J. Pressure Vessel Technology', 'arXiv physics.app-ph',
'arXiv cond-mat.mtrl-sci');
+41
@@ -0,0 +1,41 @@
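-- 356_seed_legal_acts_kr.sql
-- Safety library B-1 PR①: legal_acts KR seed, 26 rows (superset of the 26 legacy law_monitor entries).
-- plan: safety-library-1 B-1. All 26 are watch=true (R3-B1 ①: 'priority' is only an ordering, not an exclusion).
-- Law ID / promulgation / effective dates = measured live from lawSearch on 2026-06-13 (tests/fixtures/statute_kr/seed_26laws.tsv).
-- ★ '유해ㆍ위험작업...' = the official title contains a middle dot (U+318D); the law_monitor hardcoded name (no dot) failed the exact match,
-- so this law was permanently unmatched: a latent omission (the watchlist instance of R8-m1).
-- parent lineage: statute → enforcement decree / enforcement rule / delegated ministerial rule. VALUES order = parents first (FK).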
INSERT INTO legal_acts (family_id, jurisdiction, law_level, title, parent_family_id, native_id, source_api, watch, poll_cycle)
SELECT v.family_id, v.jurisdiction, v.law_level, v.title, v.parent_family_id, v.native_id, v.source_api, v.watch, v.poll_cycle
FROM (VALUES
-- Acts (statute, 14)
('kr-law:001766', 'KR', 'statute', '산업안전보건법', NULL, '001766', 'law.go.kr', TRUE, 'daily'),
('kr-law:013993', 'KR', 'statute', '중대재해 처벌 등에 관한 법률', NULL, '013993', 'law.go.kr', TRUE, 'daily'),
('kr-law:001807', 'KR', 'statute', '건설기술 진흥법', NULL, '001807', 'law.go.kr', TRUE, 'daily'),
('kr-law:000237', 'KR', 'statute', '시설물의 안전 및 유지관리에 관한 특별법', NULL, '000237', 'law.go.kr', TRUE, 'daily'),
('kr-law:009502', 'KR', 'statute', '위험물안전관리법', NULL, '009502', 'law.go.kr', TRUE, 'daily'),
('kr-law:000162', 'KR', 'statute', '화학물질관리법', NULL, '000162', 'law.go.kr', TRUE, 'daily'),
('kr-law:011857', 'KR', 'statute', '화학물질의 등록 및 평가 등에 관한 법률', NULL, '011857', 'law.go.kr', TRUE, 'daily'),
('kr-law:009503', 'KR', 'statute', '소방시설 설치 및 관리에 관한 법률', NULL, '009503', 'law.go.kr', TRUE, 'daily'),
('kr-law:001854', 'KR', 'statute', '전기사업법', NULL, '001854', 'law.go.kr', TRUE, 'daily'),
('kr-law:013718', 'KR', 'statute', '전기안전관리법', NULL, '013718', 'law.go.kr', TRUE, 'daily'),
('kr-law:001850', 'KR', 'statute', '고압가스 안전관리법', NULL, '001850', 'law.go.kr', TRUE, 'daily'),
('kr-law:001849', 'KR', 'statute', '액화석유가스의 안전관리 및 사업법', NULL, '001849', 'law.go.kr', TRUE, 'daily'),
('kr-law:001872', 'KR', 'statute', '근로기준법', NULL, '001872', 'law.go.kr', TRUE, 'daily'),
('kr-law:002016', 'KR', 'statute', '환경영향평가법', NULL, '002016', 'law.go.kr', TRUE, 'daily'),
-- Presidential decrees (decree, 7)
('kr-law:003786', 'KR', 'decree', '산업안전보건법 시행령', 'kr-law:001766', '003786', 'law.go.kr', TRUE, 'daily'),
('kr-law:014159', 'KR', 'decree', '중대재해 처벌 등에 관한 법률 시행령', 'kr-law:013993', '014159', 'law.go.kr', TRUE, 'daily'),
('kr-law:002111', 'KR', 'decree', '건설기술 진흥법 시행령', 'kr-law:001807', '002111', 'law.go.kr', TRUE, 'daily'),
('kr-law:009707', 'KR', 'decree', '위험물안전관리법 시행령', 'kr-law:009502', '009707', 'law.go.kr', TRUE, 'daily'),
('kr-law:004390', 'KR', 'decree', '화학물질관리법 시행령', 'kr-law:000162', '004390', 'law.go.kr', TRUE, 'daily'),
('kr-law:009694', 'KR', 'decree', '소방시설 설치 및 관리에 관한 법률 시행령', 'kr-law:009503', '009694', 'law.go.kr', TRUE, 'daily'),
('kr-law:002246', 'KR', 'decree', '고압가스 안전관리법 시행령', 'kr-law:001850', '002246', 'law.go.kr', TRUE, 'daily'),
-- Ministerial rules (rule, 5)
('kr-law:007364', 'KR', 'rule', '산업안전보건법 시행규칙', 'kr-law:001766', '007364', 'law.go.kr', TRUE, 'daily'),
('kr-law:007363', 'KR', 'rule', '산업안전보건기준에 관한 규칙', 'kr-law:001766', '007363', 'law.go.kr', TRUE, 'daily'),
('kr-law:007844', 'KR', 'rule', '유해ㆍ위험작업의 취업 제한에 관한 규칙', 'kr-law:001766', '007844', 'law.go.kr', TRUE, 'daily'),
('kr-law:006175', 'KR', 'rule', '건설기술 진흥법 시행규칙', 'kr-law:001807', '006175', 'law.go.kr', TRUE, 'daily'),
('kr-law:009732', 'KR', 'rule', '위험물안전관리법 시행규칙', 'kr-law:009502', '009732', 'law.go.kr', TRUE, 'daily')
) AS v(family_id, jurisdiction, law_level, title, parent_family_id, native_id, source_api, watch, poll_cycle)
WHERE NOT EXISTS (SELECT 1 FROM legal_acts la WHERE la.family_id = v.family_id);
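Note on the U+318D title mismatch: the seed comment describes a hardcoded title without the middle dot failing an exact match. The sketch below reproduces that failure and, purely as an illustration, shows a dot-insensitive comparison key. The exact spelling of the legacy hardcoded string is an assumption (only "no dot" is stated), and `loose_key` is a hypothetical helper; the migration itself sidesteps the problem by keying rows on `family_id` / `native_id` rather than on the title.

```python
MIDDLE_DOT = "\u318d"  # the character noted in the seed comment as U+318D

# Official title as seeded in legal_acts (kr-law:007844).
official = f"유해{MIDDLE_DOT}위험작업의 취업 제한에 관한 규칙"
# Assumed shape of the legacy hardcoded name: same text with the dot omitted.
hardcoded = "유해위험작업의 취업 제한에 관한 규칙"


def exact_match(a: str, b: str) -> bool:
    """Plain string equality, as an exact-match lookup would use."""
    return a == b


def loose_key(title: str) -> str:
    """Illustrative comparison key: drop the middle dot and all spaces."""
    return title.replace(MIDDLE_DOT, "").replace(" ", "")


if __name__ == "__main__":
    print("exact:", exact_match(official, hardcoded))
    print("loose:", loose_key(official) == loose_key(hardcoded))
```

Name normalization like this is fragile, which is one reason a stable identifier is the safer join key.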
+221
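@@ -0,0 +1,221 @@
"""Safety library A-3 backfill: retroactively set material_type/jurisdiction/published_date/license on the existing corpus.
plan: safety-library-1 A-3 (PKM plans/2026-06-12-safety-library-plan.html)
Precedent: backfill_category.py (a one-off idempotent script, not a migration; avoids the single-transaction constraint from 152)
Predicates (corrected against prod measurements on 2026-06-13, reflecting the R2 blocker):
1. extract_meta.source_id JOIN news_sources: propagate the registry material_type/country
(covers KOSHA case bodies, CSB pages, HSE, MOEL, JPVT, arXiv, NB, TWI, and API announcements.
paper forces jurisdiction NULL, per plan 0-1. The assumption that KOSHA bodies carry kosha.kind='case'
was disproved by measurement: kind exists only on attachments/GUIDE, so the source_id JOIN is the canonical predicate)
2. kosha.kind='case_attachment': incident/KR
3. kosha.kind='guide': guide/KR (+ ofancYmd measured as 'YYYY-MM-DD')
4. csb.kind='report_pdf': incident/US (no source_id, so not covered by the JOIN)
5. source_channel='law_monitor': law/KR (243 documents. legal_meta is skipped because MST was not preserved;
the version chain starts when B-1 goes live. published_date = promulgation date extracted from the title '(YYYYMMDD)',
since extract_meta was measured empty; one regex is cheaper than the R3-m1 'allow NULL' option, so it is filled in)
6. file_path LIKE '%KGS_Code%': law/KR (frontmatter = 'code' measured in 117/118,
'kgs_code' in 0. The path predicate is simple and exhaustive. license is owned by B-4 and not injected here)
Invariants:
- UPDATEs are guarded by material_type IS NULL (idempotent, safe to re-run; rows newly ingested via A-2 are untouched)
- material_type + jurisdiction are set in the same statement (satisfies the law CHECK chk_documents_law_jurisdiction)
- published_date / license each have their own field-absent guard (untouched if already present)
- Uploaded Industrial_Safety documents are not targets (LLM proposal + approval path only; no automatic transition)
- The corpus (chunks/embeddings) is untouched: unchanged search metrics are the expected result
Usage:
docker compose exec -T fastapi python /app/scripts/backfill_material_axis.py --dry-run
docker compose exec -T fastapi python /app/scripts/backfill_material_axis.py --apply
"""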
import argparse
import asyncio
import os
import sys
sys.path.insert(0, os.path.join(os.path.dirname(__file__), "..", "app"))
# text() is not used; exec_driver_sql instead (regex colon pitfall)
from sqlalchemy.ext.asyncio import create_async_engine
# ─── Per predicate: (label, apply SQL) ───────────────────────────────────────
_KOSHA_LICENSE = ("kogl", "false", "한국산업안전보건공단(KOSHA)")
_CSB_LICENSE = ("public_domain", "true", "U.S. Chemical Safety Board")
_LAW_LICENSE = ("public_domain", "true", "국가법령정보센터")
def _license_obj(scheme: str, redistribute: str, attribution: str) -> str:
return (
f"jsonb_build_object('license', jsonb_build_object("
f"'scheme', '{scheme}', 'redistribute', {redistribute}::boolean, "
f"'attribution', '{attribution}'))"
)
STEPS: list[tuple[str, str]] = [
# 1) Registry propagation (source_id JOIN)
("1. src_join material/jurisdiction", """
UPDATE documents d SET
material_type = ns.material_type,
jurisdiction = CASE WHEN ns.material_type = 'paper' THEN NULL ELSE ns.country END
FROM news_sources ns
WHERE d.material_type IS NULL AND d.deleted_at IS NULL
AND d.extract_meta->>'source_id' ~ '^[0-9]+$'
AND ns.id = (d.extract_meta->>'source_id')::int
AND ns.material_type IS NOT NULL
"""),
# 2) KOSHA attachments
("2. kosha 첨부 incident/KR", """
UPDATE documents SET material_type = 'incident', jurisdiction = 'KR'
WHERE material_type IS NULL AND deleted_at IS NULL
AND extract_meta#>>'{kosha,kind}' = 'case_attachment'
"""),
# 3) KOSHA GUIDE
("3. kosha GUIDE guide/KR", """
UPDATE documents SET material_type = 'guide', jurisdiction = 'KR'
WHERE material_type IS NULL AND deleted_at IS NULL
AND extract_meta#>>'{kosha,kind}' = 'guide'
"""),
# 4) CSB report PDFs
("4. csb PDF incident/US", """
UPDATE documents SET material_type = 'incident', jurisdiction = 'US'
WHERE material_type IS NULL AND deleted_at IS NULL
AND extract_meta#>>'{csb,kind}' = 'report_pdf'
"""),
# 5) Legacy law_monitor
("5. law_monitor law/KR", """
UPDATE documents SET material_type = 'law', jurisdiction = 'KR'
WHERE material_type IS NULL AND deleted_at IS NULL
AND source_channel = 'law_monitor'
"""),
# 6) KGS Code watch folder
("6. KGS law/KR", """
UPDATE documents SET material_type = 'law', jurisdiction = 'KR'
WHERE material_type IS NULL AND deleted_at IS NULL
AND file_path LIKE '%KGS_Code%'
"""),
# 7) published_date: common to crawl/news (extract_meta.published_at, ISO format)
("7. published_date (published_at)", """
UPDATE documents SET published_date = (extract_meta->>'published_at')::date
WHERE published_date IS NULL AND deleted_at IS NULL
AND extract_meta->>'published_at' ~ '^\\d{4}-\\d{2}-\\d{2}'
"""),
# 8) published_date: KOSHA GUIDE announcement date (measured as 'YYYY-MM-DD')
("8. published_date (GUIDE ofancYmd)", """
UPDATE documents SET published_date = (extract_meta#>>'{kosha,ofancYmd}')::date
WHERE published_date IS NULL AND deleted_at IS NULL
AND extract_meta#>>'{kosha,ofancYmd}' ~ '^\\d{4}-\\d{2}-\\d{2}$'
"""),
# 9) published_date: promulgation date '(YYYYMMDD)' from legacy law titles
("9. published_date (law title 공포일)", """
UPDATE documents
SET published_date = to_date(substring(title from '\\((20\\d{2}(?:0[1-9]|1[0-2])(?:0[1-9]|[12]\\d|3[01]))\\)'), 'YYYYMMDD')
WHERE published_date IS NULL AND deleted_at IS NULL
AND source_channel = 'law_monitor'
AND title ~ '\\((20\\d{2}(?:0[1-9]|1[0-2])(?:0[1-9]|[12]\\d|3[01]))\\)'
"""),
# 10) license: registry propagation (only sources that have a scheme)
("10. license (src_join)", """
UPDATE documents d SET
extract_meta = COALESCE(d.extract_meta, '{}'::jsonb)
|| jsonb_build_object('license', jsonb_build_object(
'scheme', ns.license_scheme,
'redistribute', COALESCE(ns.license_redistribute, false),
'attribution', ns.name))
FROM news_sources ns
WHERE d.deleted_at IS NULL AND NOT (COALESCE(d.extract_meta, '{}'::jsonb) ? 'license')
AND d.extract_meta->>'source_id' ~ '^[0-9]+$'
AND ns.id = (d.extract_meta->>'source_id')::int
AND ns.license_scheme IS NOT NULL
"""),
# 11) license: KOSHA attachments/GUIDE (no source_id)
("11. license (kosha kinds)", f"""
UPDATE documents SET
extract_meta = COALESCE(extract_meta, '{{}}'::jsonb) || {_license_obj(*_KOSHA_LICENSE)}
WHERE deleted_at IS NULL AND NOT (COALESCE(extract_meta, '{{}}'::jsonb) ? 'license')
AND extract_meta#>>'{{kosha,kind}}' IN ('case_attachment', 'guide')
"""),
# 12) license: CSB PDFs
("12. license (csb PDF)", f"""
UPDATE documents SET
extract_meta = COALESCE(extract_meta, '{{}}'::jsonb) || {_license_obj(*_CSB_LICENSE)}
WHERE deleted_at IS NULL AND NOT (COALESCE(extract_meta, '{{}}'::jsonb) ? 'license')
AND extract_meta#>>'{{csb,kind}}' = 'report_pdf'
"""),
# 13) license: legacy statutes (not protected under Article 7 of the Copyright Act)
("13. license (law_monitor)", f"""
UPDATE documents SET
extract_meta = COALESCE(extract_meta, '{{}}'::jsonb) || {_license_obj(*_LAW_LICENSE)}
WHERE deleted_at IS NULL AND NOT (COALESCE(extract_meta, '{{}}'::jsonb) ? 'license')
AND source_channel = 'law_monitor'
"""),
]
VERIFY_SQL = [
("Full axis table (material_type x jurisdiction)", """
SELECT material_type, jurisdiction, count(*) AS docs,
count(published_date) AS with_date,
count(*) FILTER (WHERE extract_meta ? 'license') AS with_license
FROM documents WHERE material_type IS NOT NULL AND deleted_at IS NULL
GROUP BY 1, 2 ORDER BY 1, 2
"""),
("law & jurisdiction NULL (must be 0; hard requirement)", """
SELECT count(*) FROM documents
WHERE material_type = 'law' AND jurisdiction IS NULL AND deleted_at IS NULL
"""),
("Remaining unclassified safety candidates (has kosha/csb metadata but material_type is NULL; must be 0)", """
SELECT count(*) FROM documents
WHERE material_type IS NULL AND deleted_at IS NULL
AND (extract_meta ? 'kosha' OR extract_meta ? 'csb')
"""),
]
async def main() -> None:
    parser = argparse.ArgumentParser()
    mode = parser.add_mutually_exclusive_group(required=True)
    mode.add_argument("--dry-run", action="store_true",
                      help="Run every UPDATE inside a transaction, show exact rowcounts and the verification tables, then ROLLBACK (no changes)")
    mode.add_argument("--apply", action="store_true", help="Run the backfill (committed as a single transaction)")
    args = parser.parse_args()
    db_url = os.getenv(
        "DATABASE_URL", "postgresql+asyncpg://pkm:pkm@localhost:5432/pkm"
    )
    engine = create_async_engine(db_url)
    tag = "apply" if args.apply else "dry-run"
    async with engine.connect() as conn:
        trans = await conn.begin()
        try:
            for label, sql in STEPS:
                # text() misreads the colon in the regex '(?:' as a bind parameter
                # (same pitfall as the migration runner), so execute directly on the driver
                result = await conn.exec_driver_sql(sql)
                print(f"[{tag}] {label}: {result.rowcount}")
            print("\n─── Verification (preview inside the transaction) ───")
            for label, sql in VERIFY_SQL:
                result = await conn.exec_driver_sql(sql)
                rows = result.fetchall()
                print(f"\n{label}:")
                for row in rows:
                    print(" ", tuple(row))
            if args.apply:
                await trans.commit()
                print("\n[apply] committed")
            else:
                await trans.rollback()
                print("\n[dry-run] rolled back everything; no changes")
        except Exception:
            await trans.rollback()
            raise
    await engine.dispose()


if __name__ == "__main__":
    asyncio.run(main())
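Step 9's title pattern can be sanity-checked outside the database before running the backfill. The following is a minimal sketch, assuming Python's `re` interprets the pattern the same way PostgreSQL does for these inputs; the helper name `promulgation_date` is hypothetical and not part of the script.

```python
import re
from datetime import date, datetime
from typing import Optional

# Same pattern as step 9: '(YYYYMMDD)' with year 20xx, month 01-12, day 01-31.
_TITLE_DATE = re.compile(r"\((20\d{2}(?:0[1-9]|1[0-2])(?:0[1-9]|[12]\d|3[01]))\)")


def promulgation_date(title: str) -> Optional[date]:
    """Hypothetical helper: first '(YYYYMMDD)' date found in a title, or None."""
    match = _TITLE_DATE.search(title)
    if match is None:
        return None
    try:
        return datetime.strptime(match.group(1), "%Y%m%d").date()
    except ValueError:
        # The regex alone accepts impossible dates such as 20260231;
        # this sketch treats them as missing.
        return None


if __name__ == "__main__":
    print(promulgation_date("산업안전보건법 (20260219)"))
```

The pattern checks digit ranges only, not calendar validity, so a title containing something like `(20260231)` would pass the `WHERE` filter in step 9. It may be worth checking how `to_date` handles such a value on the target PostgreSQL version before using `--apply`.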
+138
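@@ -0,0 +1,138 @@
"""B-1 PR③: three verification predicates for the statute version chain (plan safety-library-1).

Read-only diagnostic. Also reused as the statute gate tool for the E-1 observation (safe to run repeatedly).

Three verification predicates (R7-M1, B-1 single source of truth):
  Existence            each watch family has exactly 1 current in its primary series (0 is also a violation)
                       + 1 current per annex series
  Exposure uniqueness  per family holding a primary current, exactly 1 primary exposure
                       (chain + mapped legacy documents combined)
                       (population limited to families holding a primary current, R8-M2)
  Orphan net           among law_monitor legacy documents with in_corpus=true:
                       (a) mapped to a family holding a current but not flipped (missed flip) = 0
                       (b) not mapped to any watch family (mapping hole after a title amendment) = 0
                       Legacy documents kept for repealed families or for families without a
                       primary current are not violations.
                       Repealed families are exempt, with an expected value of 0.

Run:
  docker compose exec -T fastapi python /app/scripts/verify_statute_chain.py
Exit code: 0 = all PASS, 1 = violation (for CI / observation gates)
"""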
import asyncio
import os
import sys

# Container: /app/scripts → /app (package root for workers/core/models). Local: repo/scripts → repo/app
_here = os.path.dirname(os.path.abspath(__file__))
for _cand in (os.path.join(_here, ".."), os.path.join(_here, "..", "app")):
    if os.path.isdir(os.path.join(_cand, "workers")):
        sys.path.insert(0, os.path.abspath(_cand))
        break

from collections import defaultdict

from sqlalchemy import select, text
from sqlalchemy.ext.asyncio import create_async_engine

from workers.statute_collector import legacy_law_name, normalize_law_name, series_suffix
async def main() -> int:
    db_url = os.getenv("DATABASE_URL", "postgresql+asyncpg://pkm:pkm@localhost:5432/pkm")
    engine = create_async_engine(db_url)
    violations: list[str] = []
    async with engine.connect() as conn:
        # ── Load ──
        acts = (await conn.execute(text(
            "SELECT family_id, title, repeal_detected_at IS NOT NULL AS repealed "
            "FROM legal_acts WHERE watch"))).all()
        metas = (await conn.execute(text(
            "SELECT family_id, law_doc_kind, version_key, version_status FROM legal_meta"))).all()
        act_title = {a.family_id: a.title for a in acts}
        repealed = {a.family_id for a in acts if a.repealed}
        active = [a for a in acts if not a.repealed]

        # family → number of primary currents / number of currents per annex series
        prim_current = defaultdict(int)
        annex_series_current = defaultdict(int)
        for m in metas:
            if m.version_status != "current":
                continue
            if m.law_doc_kind == "primary":
                prim_current[m.family_id] += 1
            else:
                annex_series_current[(m.family_id, series_suffix(m.version_key))] += 1

        # ── ① Existence ──
        for a in active:
            n = prim_current[a.family_id]
            if n != 1:
                violations.append(f"{a.family_id} ({a.title}): primary current count {n} (expected exactly 1)")
        for (fid, suf), n in annex_series_current.items():
            if fid not in repealed and n > 1:
                violations.append(f"{fid} annex series {suf}: current count {n} (expected ≤1)")

        # ── ③ Orphan net (mapping by normalized-name equality) ──
        # normalized watch-family name → family_id (whether the family holds a current
        # is looked up in prim_current)
        norm_to_fid = {}
        for a in active:
            norm_to_fid[normalize_law_name(a.title)] = a.family_id
        legacy = (await conn.execute(text(
            "SELECT d.id, d.title, "
            " EXISTS(SELECT 1 FROM document_chunks c WHERE c.doc_id=d.id AND c.in_corpus) AS exposed "
            "FROM documents d WHERE d.source_channel='law_monitor' AND d.deleted_at IS NULL"))).all()
        orphan_flip_miss = 0
        orphan_unmapped = 0
        unmapped_names = set()
        for row in legacy:
            if not row.exposed:
                continue  # in_corpus=false is normal (already swept, or no chunks)
            name = legacy_law_name(row.title or "")
            norm = normalize_law_name(name) if name else None
            fid = norm_to_fid.get(norm) if norm else None
            if fid is None:
                orphan_unmapped += 1
                if name:
                    unmapped_names.add(name)
            elif prim_current.get(fid, 0) >= 1:
                # the family holds a current but the legacy document is still exposed = missed flip
                orphan_flip_miss += 1
        if orphan_flip_miss:
            violations.append(
                f"③(a) missed flip: {orphan_flip_miss} exposed legacy document(s) in families holding a current")
        if orphan_unmapped:
            violations.append(
                f"③(b) {orphan_unmapped} exposed legacy document(s) with no mapping; "
                "mapping hole (signal to extend the mapping): "
                + ", ".join(sorted(unmapped_names))[:200])

        # ── ② Exposure uniqueness (families holding a primary current only) ──
        # Exposed primary = chain primary current (=1) + exposed mapped legacy documents.
        # If ③(a)=0, legacy exposure is 0, so only the single chain document remains = exactly 1.
        # Any violation here is already reported under ③(a).
        # (Generalizing to asynchronous annex exposure is a "may" item; not applied in Phase 1)

        # ── Status summary ──
        print("=== Statute chain verification (B-1 PR③, 3 predicates) ===")
        print(f"watch family: {len(acts)} (active {len(active)}, repealed {len(repealed)})")
        print(f"families holding a primary current: {sum(1 for a in active if prim_current[a.family_id]==1)}/{len(active)}")
        print(f"annex series with a current: {len(annex_series_current)}")
        exposed_legacy = sum(1 for r in legacy if r.exposed)
        print(f"legacy law_monitor: {len(legacy)} document(s) ({exposed_legacy} exposed via in_corpus)")
        print()
    await engine.dispose()
    if violations:
        print(f"[FAIL] {len(violations)} violation(s):")
        for v in violations:
            print(" -", v)
        return 1
    print("[PASS] all 3 predicates passed (existence, exposure uniqueness, orphan net)")
    return 0


if __name__ == "__main__":
    sys.exit(asyncio.run(main()))
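The existence predicate (①) can be exercised without a database. The sketch below is an illustration only, assuming in-memory rows: the `Meta` tuple, its `series` field (standing in for `series_suffix(version_key)`), and the function name `existence_violations` are hypothetical and do not appear in the script.

```python
from collections import defaultdict
from typing import Iterable, List, NamedTuple, Set


class Meta(NamedTuple):
    family_id: str
    law_doc_kind: str    # "primary", anything else is treated as annex
    series: str          # hypothetical stand-in for series_suffix(version_key)
    version_status: str


def existence_violations(active: Iterable[str], repealed: Set[str],
                         metas: Iterable[Meta]) -> List[str]:
    """Count 'current' rows per family / annex series and report mismatches."""
    prim = defaultdict(int)
    annex = defaultdict(int)
    for m in metas:
        if m.version_status != "current":
            continue
        if m.law_doc_kind == "primary":
            prim[m.family_id] += 1
        else:
            annex[(m.family_id, m.series)] += 1

    out = []
    # Every active family needs exactly one primary current (zero also fails).
    for fid in active:
        if prim[fid] != 1:
            out.append(f"{fid}: primary current {prim[fid]} (expected 1)")
    # An annex series may hold at most one current; repealed families are exempt.
    for (fid, series), n in annex.items():
        if fid not in repealed and n > 1:
            out.append(f"{fid} annex {series}: current {n} (expected <=1)")
    return out


if __name__ == "__main__":
    rows = [
        Meta("A", "primary", "", "current"),
        Meta("A", "annex", "t1", "current"),
        Meta("A", "annex", "t1", "current"),
        Meta("B", "primary", "", "historical"),
    ]
    for line in existence_violations(["A", "B"], set(), rows):
        print(line)
```

Keeping the counting separate from the SQL load in this way would let the predicate be unit-tested with fixture rows, which is one possible design and not necessarily how the repository tests it.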
+1
@@ -0,0 +1 @@
{"header": {"resultCode": "00", "resultMsg": "NORMAL_CODE"}, "body": {"pageNo": 1, "totalCount": 2845, "numOfRows": 3, "items": {"item": [{"contents": "<p><img src='https://portal.kosha.or.kr/api/compn24/auth/stdtboard/getImage.do?bbsId=B2025021314108&pstNo=20260611111536KIZXJ8&bbsAtcflNo=E0802000030001' style='width: 931px;' data-filename='6-9 부산 사상구.jpg' data-tboard-img-cvrt='Y'></p><p><br></p><p>2026. 6. 9. (화), 12:22경부산 사상구 소재 아파트에서</p><p><br></p><p>재해자가 2명이 실외기 설치 작업 중</p><p><br></p><p>베란다 난간이 파손되며 바닥으로 떨어짐</p><p><br></p><p>(사망 2명)</p><p>※ 위 내용은 신고 및 현재 파악된 내용으로 조사결과에 따라 변경될 수 있습니다.</p><div><br></div>", "keyword": "[6/9, 부산 사상구] 실외기 설치 작업 중 베란다 난간이 파손되어 떨어짐", "arno": "20260611111536KIZXJ8"}, {"contents": "<p><br><img src='https://portal.kosha.or.kr/api/compn24/auth/stdtboard/getImage.do?bbsId=B2025021314108&pstNo=20260611111355OZSS9T&bbsAtcflNo=E0802000030001' style='width: 931px;' data-filename='서 울관악구.jpg' data-tboard-img-cvrt='Y'></p><p><br></p><p>2026. 6. 9. (화), 17:26경서울 관악구 철도 공사 현장에서</p><p><br></p><p>재해자가 수직형 케이블 거치대 설치 준비 작업 중</p><p><br></p><p>개구부로 떨어짐(사망 1명)</p><p><br></p><p>※ 위 내용은 신고 및 현재 파악된 내용으로 조사결과에 따라 변경될 수 있습니다.</p><div><br></div><p></p>", "keyword": "[6/9, 서울 관악구] 수직형 케이블 거치대 설치 준비 중 개구부로 떨어짐", "arno": "20260611111355OZSS9T"}, {"contents": "<p><img src='https://portal.kosha.or.kr/api/compn24/auth/stdtboard/getImage.do?bbsId=B2025021314108&pstNo=202606111110595AR9QY&bbsAtcflNo=E0802000030001' style='width: 931px;' data-filename='5-14 전남 광양시.jpg' data-tboard-img-cvrt='Y'><br></p><p><br></p><p>2026. 5. 14. (목), 16:51경전남 광양시 소재 화학물질 제조사업장에서</p><p><br></p><p>재해자가 정제설비 내부에서 플랜지 해체 작업 중</p><p><br></p><p>고온 응축수가 쏟아져 화상을 입음(사망 1명)※ 위 내용은 신고 및 현재 파악된 내용으로 조사결과에 따라 변경될 수 있습니다.<br></p>", "keyword": "[5/14, 전남 광양시] 플랜지 해체 작업 중 고온 응축수가 쏟아져 화상", "arno": "202606111110595AR9QY"}]}}}
+46
@@ -0,0 +1,46 @@
<?xml version="1.0" encoding="utf-8"?>
<법령 법령키="0017662026021921374">
<기본정보>
<법령ID>001766</법령ID>
<공포일자>20260219</공포일자>
<공포번호>21374</공포번호>
<언어>한글</언어>
<법종구분 법종구분코드="A0002">법률</법종구분>
<법령명_한글><![CDATA[산업안전보건법]]></법령명_한글>
<법령명_한자><![CDATA[산업안전보건법]]></법령명_한자>
<제명변경여부>N</제명변경여부>
<한글법령여부>Y</한글법령여부>
<편장절관>40040000</편장절관>
<소관부처 소관부처코드="1492000">고용노동부</소관부처>
<전화번호>044-202-8810, 8813, 8815, 8997</전화번호>
<시행일자>20260601</시행일자>
<제개정구분>일부개정</제개정구분>
<조문별시행일자>20260601</조문별시행일자>
<조문시행일자문자열>20260801:제10조의2, 제23조, 제175조제4항제1호의2</조문시행일자문자열>
<별표편집여부>N</별표편집여부>
<공포법령여부>Y</공포법령여부>
<시행일기준편집여부>Y</시행일기준편집여부>
</기본정보>
<조문>
<조문단위 조문키="0001000">
<조문번호>1</조문번호>
<조문여부>전문</조문여부>
<조문시행일자>20260601</조문시행일자>
<조문이동이전></조문이동이전>
<조문이동이후></조문이동이후>
<조문변경여부>N</조문변경여부>
</조문단위>
<조문단위 조문키="0001001">
<조문번호>1</조문번호>
<조문여부>조문</조문여부>
<조문제목><![CDATA[목적]]></조문제목>
<조문시행일자>20260601</조문시행일자>
<조문이동이전></조문이동이전>
<조문이동이후></조문이동이후>
<조문변경여부>N</조문변경여부>
<조문내용>
<![CDATA[제1조(목적) 이 법은 산업 안전 및 보건에 관한 기준을 확립하고 그 책임의 소재를 명확하게 하여 산업재해를 예방하고 쾌적한 작업환경을 조성함으로써 노무를 제공하는 사람의 안전 및 보건을 유지ㆍ증진함을 목적으로 한다. <개정 2020.5.26>]]>
</조문내용>
</조문단위>
</조문>
</법령>
+2
@@ -0,0 +1,2 @@
<?xml version="1.0" encoding="UTF-8"?><LawSearch><target>law</target><키워드>산업안전보건기준에 관한 규칙</키워드><section>lawNm</section><totalCnt>1</totalCnt><page>1</page><numOfRows>1</numOfRows><resultCode>00</resultCode><resultMsg>success</resultMsg><law id="1"><법령일련번호>273603</법령일련번호><현행연혁코드>현행</현행연혁코드><법령명한글><![CDATA[산업안전보건기준에 관한 규칙]]></법령명한글><법령약칭명><![CDATA[안전보건규칙]]></법령약칭명><법령ID>007363</법령ID><공포일자>20250901</공포일자><공포번호>00450</공포번호><제개정구분명>일부개정</제개정구분명><소관부처코드>1492000</소관부처코드><소관부처명>고용노동부</소관부처명><법령구분명>고용노동부령</법령구분명><공동부령정보></공동부령정보><시행일자>20260302</시행일자><자법타법여부></자법타법여부><법령상세링크>/DRF/lawService.do?OC=__OC_REDACTED__&amp;target=law&amp;MST=273603&amp;type=HTML&amp;mobileYn=&amp;efYd=20260302</법령상세링크></law></LawSearch>
+2
@@ -0,0 +1,2 @@
<?xml version="1.0" encoding="UTF-8"?><LawSearch><target>law</target><키워드>산업안전보건법</키워드><section>lawNm</section><totalCnt>3</totalCnt><page>1</page><numOfRows>3</numOfRows><resultCode>00</resultCode><resultMsg>success</resultMsg><law id="1"><법령일련번호>283449</법령일련번호><현행연혁코드>현행</현행연혁코드><법령명한글><![CDATA[산업안전보건법]]></법령명한글><법령약칭명><![CDATA[]]></법령약칭명><법령ID>001766</법령ID><공포일자>20260219</공포일자><공포번호>21374</공포번호><제개정구분명>일부개정</제개정구분명><소관부처코드>1492000</소관부처코드><소관부처명>고용노동부</소관부처명><법령구분명>법률</법령구분명><공동부령정보></공동부령정보><시행일자>20260601</시행일자><자법타법여부></자법타법여부><법령상세링크>/DRF/lawService.do?OC=__OC_REDACTED__&amp;target=law&amp;MST=283449&amp;type=HTML&amp;mobileYn=&amp;efYd=20260601</법령상세링크></law><law id="2"><법령일련번호>284771</법령일련번호><현행연혁코드>현행</현행연혁코드><법령명한글><![CDATA[산업안전보건법 시행령]]></법령명한글><법령약칭명><![CDATA[]]></법령약칭명><법령ID>003786</법령ID><공포일자>20260324</공포일자><공포번호>36220</공포번호><제개정구분명>타법개정</제개정구분명><소관부처코드>1492000</소관부처코드><소관부처명>고용노동부</소관부처명><법령구분명>대통령령</법령구분명><공동부령정보></공동부령정보><시행일자>20260324</시행일자><자법타법여부></자법타법여부><법령상세링크>/DRF/lawService.do?OC=__OC_REDACTED__&amp;target=law&amp;MST=284771&amp;type=HTML&amp;mobileYn=&amp;efYd=20260324</법령상세링크></law><law id="3"><법령일련번호>286657</법령일련번호><현행연혁코드>현행</현행연혁코드><법령명한글><![CDATA[산업안전보건법 시행규칙]]></법령명한글><법령약칭명><![CDATA[]]></법령약칭명><법령ID>007364</법령ID><공포일자>20260529</공포일자><공포번호>00470</공포번호><제개정구분명>일부개정</제개정구분명><소관부처코드>1492000</소관부처코드><소관부처명>고용노동부</소관부처명><법령구분명>고용노동부령</법령구분명><공동부령정보></공동부령정보><시행일자>20260601</시행일자><자법타법여부></자법타법여부><법령상세링크>/DRF/lawService.do?OC=__OC_REDACTED__&amp;target=law&amp;MST=286657&amp;type=HTML&amp;mobileYn=&amp;efYd=20260601</법령상세링크></law></LawSearch>
Binary file not shown.
Binary file not shown.
+26
@@ -0,0 +1,26 @@
산업안전보건법 001766 283449 20260219 20260601 법률
산업안전보건법 시행령 003786 284771 20260324 20260324 대통령령
산업안전보건법 시행규칙 007364 286657 20260529 20260601 고용노동부령
산업안전보건기준에 관한 규칙 007363 273603 20250901 20260302 고용노동부령
유해위험작업의 취업 제한에 관한 규칙 MISS
중대재해 처벌 등에 관한 법률 013993 228817 20210126 20220127 법률
중대재해 처벌 등에 관한 법률 시행령 014159 277417 20251001 20251001 대통령령
건설기술 진흥법 001807 276921 20251001 20251001 법률
건설기술 진흥법 시행령 002111 286847 20260609 20260609 대통령령
건설기술 진흥법 시행규칙 006175 286885 20260611 20260611 국토교통부령
시설물의 안전 및 유지관리에 관한 특별법 000237 266683 20241203 20251204 법률
위험물안전관리법 009502 259933 20240206 20250807 법률
위험물안전관리법 시행령 009707 273077 20250805 20250807 대통령령
위험물안전관리법 시행규칙 009732 262765 20240520 20250521 행정안전부령
화학물질관리법 000162 276815 20251001 20251001 법률
화학물질관리법 시행령 004390 280507 20251223 20251223 대통령령
화학물질의 등록 및 평가 등에 관한 법률 011857 279805 20251111 20260512 법률
소방시설 설치 및 관리에 관한 법률 009503 236977 20211130 20241201 법률
소방시설 설치 및 관리에 관한 법률 시행령 009694 284781 20260324 20260324 대통령령
전기사업법 001854 283981 20260310 20260310 법률
전기안전관리법 013718 268805 20250131 20260201 법률
고압가스 안전관리법 001850 283919 20260310 20260310 법률
고압가스 안전관리법 시행령 002246 286839 20260609 20260609 대통령령
액화석유가스의 안전관리 및 사업법 001849 276549 20251001 20251128 법률
근로기준법 001872 265959 20241022 20251023 법률
환경영향평가법 002016 276833 20251001 20251023 법률
@@ -0,0 +1,52 @@
label,id,category,legacy_category,intent,domain_hint,language,ocr_derived,failure_expected,query,relevant_ids,graded_relevance,returned_ids_top10,latency_ms,recall_at_10,mrr_at_10,ndcg_at_10,top3_hit,graded_ndcg_at_10,graded_recall_at_10_t2,graded_recall_at_10_t3,dedup_count,error
single,kw_001,standards,exact_keyword,fact_lookup,document,ko,0,0,산업안전보건법 제6장,3856;3868;3879,3856:3;3868:2;3879:2,30411;30423;30406;30418;3879;3868;3890;3863;3856;3908,925.1,1.000,0.200,0.490,0,0.417,1.000,1.000,0,
single,kw_002,standards,exact_keyword,fact_lookup,document,ko,0,0,중대재해 처벌 등에 관한 법률 제2장 중대산업재해,3917;3921,3917:3;3921:2,3921;3917;3919;3923;30418;10573;10571;3916;3874;3918,549.1,1.000,1.000,1.000,1,0.834,1.000,1.000,0,
single,kw_003,standards,exact_keyword,fact_lookup,document,ko,0,0,화학물질관리법 유해화학물질 영업자,3981,3981:3,3981;3985;3980;3984;3993;30412;3857;3978;3986;3983,350.4,1.000,1.000,1.000,1,1.000,1.000,1.000,0,
single,kw_004,standards,exact_keyword,fact_lookup,document,ko,0,0,근로기준법 안전과 보건,4041,4041:3,4041;3852;30407;3851;30406;30421;3877;30410;30422;3905,657.8,1.000,1.000,1.000,1,1.000,1.000,1.000,0,
single,kw_005,standards,exact_keyword,fact_lookup,document,ko,0,0,산업안전보건기준에 관한 규칙 보호구,3888,3888:3,10570;3888;3912;3913;3911;3905;3909;3889;3910;3902,665.5,1.000,0.500,0.631,1,0.631,1.000,1.000,0,
single,nl_001,korean_only,natural_language_ko,semantic_search,document,ko,0,0,기계로 인한 산업재해 관련 법령,3856;3868;3879;3854,3854:1;3856:3;3868:2;3879:2,5244;30422;30410;3878;5249;3897;3863;30421;3868;5253,697.8,0.250,0.111,0.118,0,0.083,0.333,0.000,0,
single,nl_002,korean_only,natural_language_ko,semantic_search,document,ko,0,0,사업주가 도급을 줄 때 산업재해를 예방하기 위해 해야 할 일,3855;3867;3878,3855:3;3867:2;3878:2,30410;3855;5227;30422;30409;3917;3854;5244;30406;3867,566.2,0.667,0.500,0.432,1,0.508,0.667,1.000,0,
single,nl_003,korean_only,natural_language_ko,semantic_search,document,ko,0,0,유해화학물질을 다루는 회사가 지켜야 할 안전 의무,3980;3981;3982,3980:2;3981:2;3982:2,3980;3903;3904;3981;3760;5253;3985;3896;3917;3909,554.8,0.667,1.000,0.671,1,0.671,0.667,0.000,0,
single,nl_004,korean_only,natural_language_ko,semantic_search,document,ko,0,0,중대재해가 발생했을 때 경영책임자가 처벌받는 기준,3916;3917;3920;3921,3916:2;3917:3;3920:2;3921:2,10572;10573;3917;3918;3916;3923;5244;3919;10571;30409,418.8,0.500,0.333,0.346,1,0.399,0.500,1.000,0,
single,nl_005,korean_only,natural_language_ko,semantic_search,document,ko,0,0,안전보건교육은 누가 받아야 하고 어떤 내용을 다루는가,3853;3865,3853:3;3865:2,3853;3876;5249;5234;4025;30420;30408;30414;6675;11677,918.5,0.500,1.000,0.613,1,0.787,0.500,1.000,0,
single,news_001,korean_only,news_ko,semantic_search,news,ko,0,0,이란과 미국의 군사 충돌,4303;4304;4307;4316;4322;4323;4327;4335,4303:2;4304:2;4307:2;4316:2;4322:2;4323:2;4327:2;4335:2,37599;27583;35225;27350;37601;32410;16081;37592;28407;37608,674.1,0.000,0.000,0.000,1,0.000,0.000,0.000,0,
single,news_002,korean_only,news_ko,semantic_search,news,ko,0,0,호르무즈 해협 봉쇄,4316;4320;4322;4327,4316:3;4320:2;4322:2;4327:2,41652;41650;41634;31894;35899;22049;16392;9022;17123;11945,411.6,0.000,0.000,0.000,0,0.000,0.000,0.000,0,
single,misc_001,korean_only,other_domain,fact_lookup,document,ko,0,0,강체의 평면 운동학,4063;4065,4063:3;4065:2,4063;4065;4064;4071;4066;4068;4058;4067;5063;5105,653.8,1.000,1.000,1.000,1,1.000,1.000,1.000,0,
single,misc_002,korean_only,other_domain,semantic_search,document,ko,0,0,질점의 운동역학,4060;4061;4062,4060:2;4061:2;4062:2,4060;4062;4070;4059;4061;4064;4065;4063;4058;5095,552.8,1.000,1.000,0.947,1,0.947,1.000,0.000,0,
single,news_003,english_only,news_en,semantic_search,news,en,0,0,Trump Iran ultimatum,4258;4260;4262,4258:2;4260:2;4262:2,21155;4775;4202;4776;4679;4199;4519;4668;37733;25592,462.7,0.000,0.000,0.000,1,0.000,0.000,0.000,0,
single,cl_001,mixed,crosslingual_ko_en,semantic_search,document,mixed,0,0,기계 안전 가드 설계 원리,3770;3856,3770:3;3856:2,35778;5239;5244;3770;3817;3774;4540;3762;3789;5249,819.4,0.500,0.250,0.264,0,0.339,0.500,1.000,0,
single,cl_002,mixed,crosslingual_ko_en,semantic_search,document,mixed,0,0,산업 안전 입문서,3755;3775;3776;3777,3755:2;3775:2;3776:2;3777:2,5249;3760;5229;3755;3774;5230;10573;3775;3787;3818,644.5,0.500,0.250,0.291,1,0.291,0.500,0.000,0,
single,cl_003,mixed,crosslingual_ko_en,semantic_search,document,mixed,0,0,전기 안전 위험,3772;3790,3772:2;3790:2,3772;35779;3790;5260;3897;37326;3755;37216;37011;5235,1193.3,1.000,1.000,0.920,1,0.920,1.000,0.000,0,
single,news_004,mixed,news_fr,semantic_search,news,mixed,0,0,guerre en Iran,4199;4202;4210;4361;4363;4507;4519;4521,4199:2;4202:2;4210:2;4361:2;4363:2;4507:2;4519:2;4521:2,33101;33368;7740;27078;5925;5840;16010;16457;6945;4507,469.1,0.125,0.100,0.073,1,0.073,0.125,0.000,0,
single,news_005,mixed,news_crosslingual,semantic_search,news,mixed,0,0,이란 미국 전쟁 글로벌 반응,4202;4258;4262;4536;4303;4304;4316,4202:2;4258:2;4262:2;4303:2;4304:2;4316:2;4536:2,21275;16927;4457;19111;16761;35246;41629;35894;6789;35908,844.5,0.000,0.000,0.000,1,0.000,0.000,0.000,0,
single,fail_001,failure_expected,failure_expected,semantic_search,document,mixed,0,1,Rust async runtime tokio scheduler 내부 구조,,,5262;23732;36544;35645;37582;3774;30259;28766;31709;35635,533.4,0.000,0.000,0.000,1,0.000,0.000,0.000,0,
single,fail_002,failure_expected,failure_expected,semantic_search,document,ko,0,1,양자컴퓨터 큐비트 디코히어런스,,,31984;29686;16289;16384;5066;33238;20507;30566;5092;32721,290.9,0.000,0.000,0.000,1,0.000,0.000,0.000,0,
single,fail_003,failure_expected,failure_expected,semantic_search,news,ko,0,1,재즈 보컬리스트 빌리 홀리데이,,,31643;20470;35249;4634;26593;16059;9102;23082;23336;31187,265.7,0.000,0.000,0.000,1,0.000,0.000,0.000,0,
single,kw_006,standards,standards,fact_lookup,document,ko,0,0,산업안전보건기준에 관한 규칙 작업장 통로,3886;3887,3886:3;3887:2,3898;3886;3908;3887;3881;3911;3912;3892;3900;3909,506.2,1.000,0.500,0.651,1,0.642,1.000,1.000,0,
single,kw_007,standards,standards,fact_lookup,document,ko,0,0,산업안전보건기준 폭발 화재 위험물 누출 방지,3896;3766,3766:1;3896:3,3896;3895;3903;13930;13931;13929;3894;3886;13944;3892,477.6,0.500,1.000,0.613,1,0.917,1.000,1.000,0,
single,kw_008,standards,standards,fact_lookup,document,ko,0,0,고압가스 안전관리법 전문,4025;4026,4025:3;4026:2,11644;11579;35775;4026;4025;11693;11645;4034;13299;13749,538.7,1.000,0.250,0.501,0,0.450,1.000,1.000,0,
single,kw_009,standards,standards,fact_lookup,document,ko,0,0,KGS FP111 가스설비 배관설비 기준,13305,13305:3,13305;13311;13306;13312;13302;13304;13309;13299;13313;13918,446.0,1.000,1.000,1.000,1,1.000,1.000,1.000,0,
single,kw_010,standards,standards,fact_lookup,document,ko,0,0,KGS FU551 가스설비 압력조정기 가스계량기,13652,13652:3,13652;11689;13657;13655;13656;13649;13651;13752;13659;13650,312.5,1.000,1.000,1.000,1,1.000,1.000,1.000,0,
single,kw_011,standards,standards,fact_lookup,document,ko,0,0,산업안전보건기준 전기로 인한 위험 방지,3897;3772,3772:1;3897:3,3897;3895;3887;13935;3902;3896;3877;3900;3899;3854,587.9,0.500,1.000,0.613,1,0.917,1.000,1.000,0,
single,en_001,english_only,english_only,semantic_search,document,en,0,0,pressure vessel flange design,5144;5136,5136:2;5144:3,5144;5140;5136;5180;36899;5137;5149;5178;5207;5148,4162.7,1.000,1.000,0.920,1,0.956,1.000,1.000,0,
single,en_002,english_only,english_only,semantic_search,document,en,0,0,ASME Section VIII introduction,5204;5206,5204:3;5206:2,5204;5212;5180;5208;5210;5143;5141;5206;5137;11634,3038.6,1.000,1.000,0.807,1,0.894,1.000,1.000,0,
single,en_003,english_only,english_only,semantic_search,document,en,0,0,impact test requirements ASME,5205;5148,5148:1;5205:3,36721;5205;5204;5178;5214;5224;5148;5140;5209;5145,5221.9,1.000,0.500,0.591,1,0.622,1.000,1.000,0,
single,en_004,english_only,english_only,semantic_search,document,en,0,0,design of vessel supports,5149,5149:3,5149;5186;5141;5137;36901;36869;5140;5139;5136;5178,5732.5,1.000,1.000,1.000,1,1.000,1.000,1.000,0,
single,en_005,english_only,english_only,semantic_search,document,en,0,0,hydrogen piping ASME code,5178,5178:3,5178;5180;5210;5212;5182;5133;5207;5148;5204;5209,4941.9,1.000,1.000,1.000,1,1.000,1.000,1.000,0,
single,en_006,english_only,english_only,semantic_search,document,en,0,0,ASME welding qualification requirements,5209;3771,3771:1;5209:3,37097;5224;36877;36721;5208;5209;5205;5178;5204;5180,2432.9,0.500,0.167,0.218,0,0.327,1.000,1.000,0,
single,en_007,english_only,english_only,semantic_search,document,en,0,0,pressure vessel fabrication and inspection,5208;5145,5145:2;5208:3,5208;38747;38791;36935;5189;36874;5180;36908;5187;36901,3828.4,0.500,1.000,0.613,1,0.787,0.500,1.000,0,
single,en_008,english_only,english_only,semantic_search,document,en,0,0,Industrial Safety and Health Management ergonomics,3763;3755,3755:1;3763:3,3763;3759;3774;3755;3818;3812;3778;3756;3761;3769,1866.0,1.000,1.000,0.877,1,0.974,1.000,1.000,0,
single,cl_004,mixed,mixed,semantic_search,document,mixed,0,0,ASME 압력용기 설계 실무,5207;5210;5139;5135,5135:2;5139:2;5207:3;5210:3,5139;5207;5204;5225;5206;5208;5210;5137;5182;5145,805.5,0.750,1.000,0.767,1,0.686,0.750,1.000,0,
single,cl_005,mixed,mixed,semantic_search,document,mixed,0,0,ASME 용접 코드 해설,5224;5209,5209:2;5224:3,5224;5222;5225;5209;5204;5210;5205;5178;5143;5141,737.6,1.000,1.000,0.877,1,0.932,1.000,1.000,0,
single,cl_006,mixed,mixed,semantic_search,document,mixed,0,0,pressure vessel Korean industrial safety regulation,4025;3856;5136,3856:2;4025:2;5136:1,38941;4026;36874;5145;5186;38791;39084;5143;3895;5210,1231.3,0.000,0.000,0.000,1,0.000,0.000,0.000,0,
single,cl_007,mixed,mixed,semantic_search,document,mixed,0,0,KGS 코드 LPG safety standard,11647;11689;11645;4025,4025:1;11645:2;11647:3;11689:2,11647;13760;13674;13669;13774;13773;13675;13755;13924;11688,528.5,0.250,1.000,0.390,1,0.647,0.333,1.000,0,
single,cl_008,mixed,mixed,semantic_search,document,mixed,0,0,수소 가스 안전 기준 hydrogen safety,5178;5169,5169:2;5178:3,10575;5177;11671;11649;11648;13915;5241;5173;11653;5170,872.1,0.000,0.000,0.000,0,0.000,0.000,0.000,0,
single,exam_001,exam,exam,fact_lookup,document,ko,0,0,레이놀즈수 정의 공식,11504;11505,11504:3;11505:2,11504;11533;5090;4544;11509;11505;11476;11486;5132;5140,628.8,1.000,1.000,0.832,1,0.907,1.000,1.000,0,
single,exam_002,exam,exam,fact_lookup,document,ko,0,0,탱크 바닥 구멍 유체 유속 토리첼리,11500;11495;11496,11495:2;11496:2;11500:3,11500;11495;11501;5139;5090;5140;5178;11515;11517;5210,338.1,0.667,1.000,0.765,1,0.856,0.667,1.000,0,
single,exam_003,exam,exam,fact_lookup,document,ko,0,0,이상기체 음속 마하수,11514;11515,11514:3;11515:2,11514;5083;11475;5090;5084;11531;11476;11473;11479;5124,675.8,0.500,1.000,0.613,1,0.787,0.500,1.000,0,
single,exam_004,exam,exam,fact_lookup,document,ko,0,0,고압가스 용기 내압시험 영구증가량,11591;11644;11691,11591:3;11644:2;11691:2,11591;11664;13660;5177;11691;11665;13652;13942;13917;11660,357.3,0.667,1.000,0.651,1,0.785,0.667,1.000,0,
single,exam_005,exam,exam,fact_lookup,document,ko,0,0,도시가스 배관 매설 이격거리,11627;11625;11646,11625:2;11627:3;11646:1,11627;11658;11600;11625;13918;11692;13751;5177;11655;13653,400.9,0.667,1.000,0.671,1,0.883,1.000,1.000,0,
single,exam_006,exam,exam,fact_lookup,document,ko,0,0,LPG 저장탱크 안전거리 분말소화기,11617;11669;11620,11617:3;11620:1;11669:2,11595;11616;13669;11617;11649;11605;11655;11690;11658;13915,349.0,0.333,0.250,0.202,0,0.321,0.500,1.000,0,
single,exam_007,exam,exam,fact_lookup,document,ko,0,0,오리피스 차압식 유량계,11712;11711;11503,11503:2;11711:2;11712:3,11711;11712;11503;11500;11714;11713;13930;11717;11701;11715,416.5,1.000,1.000,1.000,1,0.858,1.000,1.000,0,
single,fail_004,failure_expected,failure_expected,fact_lookup,document,ko,0,1,KGS AC999 임의 가스 코드,,,11691;11647;11693;11692;13665;13661;13664;13666;13670;13773,404.6,0.000,0.000,0.000,1,0.000,0.000,0.000,0,
single,fail_005,failure_expected,failure_expected,fact_lookup,document,ko,0,1,초전도 안전 관리법 시행규칙,,,4026;35775;5236;3966;3977;5260;3971;4018;3972;3973,569.1,0.000,0.000,0.000,1,0.000,0.000,0.000,0,
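The `recall_at_10`, `mrr_at_10` and `ndcg_at_10` columns above are consistent with the usual binary-relevance definitions computed from `relevant_ids` and `returned_ids_top10`. The evaluation harness itself is not shown here, so the following is a sketch under that assumption; all function names are hypothetical.

```python
import math
from typing import List


def parse_ids(cell: str) -> List[int]:
    """Split a ';'-separated id cell from the CSV into integers."""
    return [int(x) for x in cell.split(";") if x]


def recall_at_k(relevant: List[int], returned: List[int], k: int = 10) -> float:
    rel = set(relevant)
    if not rel:
        return 0.0
    return len(rel & set(returned[:k])) / len(rel)


def mrr_at_k(relevant: List[int], returned: List[int], k: int = 10) -> float:
    rel = set(relevant)
    for rank, doc_id in enumerate(returned[:k], start=1):
        if doc_id in rel:
            return 1.0 / rank  # reciprocal rank of the first relevant hit
    return 0.0


def ndcg_at_k(relevant: List[int], returned: List[int], k: int = 10) -> float:
    """Binary-gain nDCG: gain 1 for a relevant id, discount 1/log2(rank + 1)."""
    rel = set(relevant)
    dcg = sum(1.0 / math.log2(rank + 1)
              for rank, doc_id in enumerate(returned[:k], start=1)
              if doc_id in rel)
    ideal = sum(1.0 / math.log2(rank + 1)
                for rank in range(1, min(len(rel), k) + 1))
    return dcg / ideal if ideal else 0.0


if __name__ == "__main__":
    # Row kw_001 from the CSV above.
    relevant = parse_ids("3856;3868;3879")
    returned = parse_ids("30411;30423;30406;30418;3879;3868;3890;3863;3856;3908")
    print(recall_at_k(relevant, returned),
          mrr_at_k(relevant, returned),
          round(ndcg_at_k(relevant, returned), 3))
```

For row `kw_001` these functions give 1.0, 0.2 and 0.490, matching the recorded `1.000`, `0.200` and `0.490`. The graded columns (`graded_ndcg_at_10`, `graded_recall_at_10_t2`, `graded_recall_at_10_t3`) depend on the `graded_relevance` weights and are not covered by this sketch.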
35 single en_005 english_only english_only semantic_search document en 0 0 hydrogen piping ASME code 5178 5178:3 5178;5180;5210;5212;5182;5133;5207;5148;5204;5209 4941.9 1.000 1.000 1.000 1 1.000 1.000 1.000 0
36 single en_006 english_only english_only semantic_search document en 0 0 ASME welding qualification requirements 5209;3771 3771:1;5209:3 37097;5224;36877;36721;5208;5209;5205;5178;5204;5180 2432.9 0.500 0.167 0.218 0 0.327 1.000 1.000 0
37 single en_007 english_only english_only semantic_search document en 0 0 pressure vessel fabrication and inspection 5208;5145 5145:2;5208:3 5208;38747;38791;36935;5189;36874;5180;36908;5187;36901 3828.4 0.500 1.000 0.613 1 0.787 0.500 1.000 0
38 single en_008 english_only english_only semantic_search document en 0 0 Industrial Safety and Health Management ergonomics 3763;3755 3755:1;3763:3 3763;3759;3774;3755;3818;3812;3778;3756;3761;3769 1866.0 1.000 1.000 0.877 1 0.974 1.000 1.000 0
39 single cl_004 mixed mixed semantic_search document mixed 0 0 ASME 압력용기 설계 실무 5207;5210;5139;5135 5135:2;5139:2;5207:3;5210:3 5139;5207;5204;5225;5206;5208;5210;5137;5182;5145 805.5 0.750 1.000 0.767 1 0.686 0.750 1.000 0
40 single cl_005 mixed mixed semantic_search document mixed 0 0 ASME 용접 코드 해설 5224;5209 5209:2;5224:3 5224;5222;5225;5209;5204;5210;5205;5178;5143;5141 737.6 1.000 1.000 0.877 1 0.932 1.000 1.000 0
41 single cl_006 mixed mixed semantic_search document mixed 0 0 pressure vessel Korean industrial safety regulation 4025;3856;5136 3856:2;4025:2;5136:1 38941;4026;36874;5145;5186;38791;39084;5143;3895;5210 1231.3 0.000 0.000 0.000 1 0.000 0.000 0.000 0
42 single cl_007 mixed mixed semantic_search document mixed 0 0 KGS 코드 LPG safety standard 11647;11689;11645;4025 4025:1;11645:2;11647:3;11689:2 11647;13760;13674;13669;13774;13773;13675;13755;13924;11688 528.5 0.250 1.000 0.390 1 0.647 0.333 1.000 0
43 single cl_008 mixed mixed semantic_search document mixed 0 0 수소 가스 안전 기준 hydrogen safety 5178;5169 5169:2;5178:3 10575;5177;11671;11649;11648;13915;5241;5173;11653;5170 872.1 0.000 0.000 0.000 0 0.000 0.000 0.000 0
44 single exam_001 exam exam fact_lookup document ko 0 0 레이놀즈수 정의 공식 11504;11505 11504:3;11505:2 11504;11533;5090;4544;11509;11505;11476;11486;5132;5140 628.8 1.000 1.000 0.832 1 0.907 1.000 1.000 0
45 single exam_002 exam exam fact_lookup document ko 0 0 탱크 바닥 구멍 유체 유속 토리첼리 11500;11495;11496 11495:2;11496:2;11500:3 11500;11495;11501;5139;5090;5140;5178;11515;11517;5210 338.1 0.667 1.000 0.765 1 0.856 0.667 1.000 0
46 single exam_003 exam exam fact_lookup document ko 0 0 이상기체 음속 마하수 11514;11515 11514:3;11515:2 11514;5083;11475;5090;5084;11531;11476;11473;11479;5124 675.8 0.500 1.000 0.613 1 0.787 0.500 1.000 0
47 single exam_004 exam exam fact_lookup document ko 0 0 고압가스 용기 내압시험 영구증가량 11591;11644;11691 11591:3;11644:2;11691:2 11591;11664;13660;5177;11691;11665;13652;13942;13917;11660 357.3 0.667 1.000 0.651 1 0.785 0.667 1.000 0
48 single exam_005 exam exam fact_lookup document ko 0 0 도시가스 배관 매설 이격거리 11627;11625;11646 11625:2;11627:3;11646:1 11627;11658;11600;11625;13918;11692;13751;5177;11655;13653 400.9 0.667 1.000 0.671 1 0.883 1.000 1.000 0
49 single exam_006 exam exam fact_lookup document ko 0 0 LPG 저장탱크 안전거리 분말소화기 11617;11669;11620 11617:3;11620:1;11669:2 11595;11616;13669;11617;11649;11605;11655;11690;11658;13915 349.0 0.333 0.250 0.202 0 0.321 0.500 1.000 0
50 single exam_007 exam exam fact_lookup document ko 0 0 오리피스 차압식 유량계 11712;11711;11503 11503:2;11711:2;11712:3 11711;11712;11503;11500;11714;11713;13930;11717;11701;11715 416.5 1.000 1.000 1.000 1 0.858 1.000 1.000 0
51 single fail_004 failure_expected failure_expected fact_lookup document ko 0 1 KGS AC999 임의 가스 코드 11691;11647;11693;11692;13665;13661;13664;13666;13670;13773 404.6 0.000 0.000 0.000 1 0.000 0.000 0.000 0
52 single fail_005 failure_expected failure_expected fact_lookup document ko 0 1 초전도 안전 관리법 시행규칙 4026;35775;5236;3966;3977;5260;3971;4018;3972;3973 569.1 0.000 0.000 0.000 1 0.000 0.000 0.000 0
@@ -0,0 +1,52 @@
label,id,category,legacy_category,intent,domain_hint,language,ocr_derived,failure_expected,query,relevant_ids,graded_relevance,returned_ids_top10,latency_ms,recall_at_10,mrr_at_10,ndcg_at_10,top3_hit,graded_ndcg_at_10,graded_recall_at_10_t2,graded_recall_at_10_t3,dedup_count,error
single,kw_001,standards,exact_keyword,fact_lookup,document,ko,0,0,산업안전보건법 제6장,3856;3868;3879,3856:3;3868:2;3879:2,30411;30423;3879;3868;3856;3851;4041;10573;5163;30406,659.8,1.000,0.333,0.618,0,0.529,1.000,1.000,0,
single,kw_002,standards,exact_keyword,fact_lookup,document,ko,0,0,중대재해 처벌 등에 관한 법률 제2장 중대산업재해,3917;3921,3917:3;3921:2,3921;3917;10573;5244;3919;3923;30418;5227;3916;3874,517.4,1.000,1.000,1.000,1,0.834,1.000,1.000,0,
single,kw_003,standards,exact_keyword,fact_lookup,document,ko,0,0,화학물질관리법 유해화학물질 영업자,3981,3981:3,3981;3985;3980;3984;3993;3869;30412;3857;3978;3986,336.1,1.000,1.000,1.000,1,1.000,1.000,1.000,0,
single,kw_004,standards,exact_keyword,fact_lookup,document,ko,0,0,근로기준법 안전과 보건,4041,4041:3,4041;3852;30407;5227;30421;30409;30406;10573;3851;3858,599.6,1.000,1.000,1.000,1,1.000,1.000,1.000,0,
single,kw_005,standards,exact_keyword,fact_lookup,document,ko,0,0,산업안전보건기준에 관한 규칙 보호구,3888,3888:3,10570;3888;5249;3893;3887;3904;3899;3890;3895;3902,629.5,1.000,0.500,0.631,1,0.631,1.000,1.000,0,
single,nl_001,korean_only,natural_language_ko,semantic_search,document,ko,0,0,기계로 인한 산업재해 관련 법령,3856;3868;3879;3854,3854:1;3856:3;3868:2;3879:2,5244;30422;30410;3878;5249;3855;3897;30421;3868;3863,664.7,0.250,0.111,0.118,0,0.083,0.333,0.000,0,
single,nl_002,korean_only,natural_language_ko,semantic_search,document,ko,0,0,사업주가 도급을 줄 때 산업재해를 예방하기 위해 해야 할 일,3855;3867;3878,3855:3;3867:2;3878:2,30410;3855;5227;30409;3917;3854;5244;3867;30421;3878,545.5,1.000,0.500,0.580,1,0.599,1.000,1.000,0,
single,nl_003,korean_only,natural_language_ko,semantic_search,document,ko,0,0,유해화학물질을 다루는 회사가 지켜야 할 안전 의무,3980;3981;3982,3980:2;3981:2;3982:2,3980;3904;3903;3981;3905;35780;3985;10569;11648;3857,477.8,0.667,1.000,0.671,1,0.671,0.667,0.000,0,
single,nl_004,korean_only,natural_language_ko,semantic_search,document,ko,0,0,중대재해가 발생했을 때 경영책임자가 처벌받는 기준,3916;3917;3920;3921,3916:2;3917:3;3920:2;3921:2,10573;3917;3916;3918;5227;5244;3919;10571;30409;3854,391.5,0.500,0.500,0.441,1,0.506,0.500,1.000,0,
single,nl_005,korean_only,natural_language_ko,semantic_search,document,ko,0,0,안전보건교육은 누가 받아야 하고 어떤 내용을 다루는가,3853;3865,3853:3;3865:2,3853;3876;5249;5234;30420;30408;30414;10573;11677;3867,708.8,0.500,1.000,0.613,1,0.787,0.500,1.000,0,
single,news_001,korean_only,news_ko,semantic_search,news,ko,0,0,이란과 미국의 군사 충돌,4303;4304;4307;4316;4322;4323;4327;4335,4303:2;4304:2;4307:2;4316:2;4322:2;4323:2;4327:2;4335:2,37599;27583;35225;27350;34850;32177;15924;15976;28316;16378,601.4,0.000,0.000,0.000,1,0.000,0.000,0.000,0,
single,news_002,korean_only,news_ko,semantic_search,news,ko,0,0,호르무즈 해협 봉쇄,4316;4320;4322;4327,4316:3;4320:2;4322:2;4327:2,41652;41650;41634;31894;35899;22055;9022;17123;11945;9105,402.3,0.000,0.000,0.000,0,0.000,0.000,0.000,0,
single,misc_001,korean_only,other_domain,fact_lookup,document,ko,0,0,강체의 평면 운동학,4063;4065,4063:3;4065:2,4063;4065;4066;4071;4067;4058;5105;4068;4064;5083,578.4,1.000,1.000,1.000,1,1.000,1.000,1.000,0,
single,misc_002,korean_only,other_domain,semantic_search,document,ko,0,0,질점의 운동역학,4060;4061;4062,4060:2;4061:2;4062:2,4070;4060;4062;4059;4064;4066;4058;4068;5083;4065,509.9,0.667,0.500,0.531,1,0.531,0.667,0.000,0,
single,news_003,english_only,news_en,semantic_search,news,en,0,0,Trump Iran ultimatum,4258;4260;4262,4258:2;4260:2;4262:2,21155;21441;4775;4202;4776;4679;4519;37733;25592;28001,339.8,0.000,0.000,0.000,1,0.000,0.000,0.000,0,
single,cl_001,mixed,crosslingual_ko_en,semantic_search,document,mixed,0,0,기계 안전 가드 설계 원리,3770;3856,3770:3;3856:2,35778;5239;5244;3758;3770;3763;4540;3787;3810;3791,699.5,0.500,0.200,0.237,0,0.305,0.500,1.000,0,
single,cl_002,mixed,crosslingual_ko_en,semantic_search,document,mixed,0,0,산업 안전 입문서,3755;3775;3776;3777,3755:2;3775:2;3776:2;3777:2,5230;5249;3755;3774;5229;10573;3851;3863;30406;3802,543.0,0.250,0.333,0.195,1,0.195,0.250,0.000,0,
single,cl_003,mixed,crosslingual_ko_en,semantic_search,document,mixed,0,0,전기 안전 위험,3772;3790,3772:2;3790:2,35779;5260;3772;3897;3790;37326;3755;37216;37011;37330,1114.4,1.000,0.333,0.544,1,0.544,1.000,0.000,0,
single,news_004,mixed,news_fr,semantic_search,news,mixed,0,0,guerre en Iran,4199;4202;4210;4361;4363;4507;4519;4521,4199:2;4202:2;4210:2;4361:2;4363:2;4507:2;4519:2;4521:2,33101;33368;7740;33374;27078;5925;26631;5840;16010;16457,426.3,0.000,0.000,0.000,1,0.000,0.000,0.000,0,
single,news_005,mixed,news_crosslingual,semantic_search,news,mixed,0,0,이란 미국 전쟁 글로벌 반응,4202;4258;4262;4536;4303;4304;4316,4202:2;4258:2;4262:2;4303:2;4304:2;4316:2;4536:2,23242;4457;19111;41629;26741;6789;17242;9469;4765;4345,839.7,0.000,0.000,0.000,1,0.000,0.000,0.000,0,
single,fail_001,failure_expected,failure_expected,semantic_search,document,mixed,0,1,Rust async runtime tokio scheduler 내부 구조,,,23732;36544;25771;35645;37582;17810;28766;40739;40432;40679,465.6,0.000,0.000,0.000,1,0.000,0.000,0.000,0,
single,fail_002,failure_expected,failure_expected,semantic_search,document,ko,0,1,양자컴퓨터 큐비트 디코히어런스,,,32096;31984;16289;16384;32043;32035;28233;33551;32721;32028,270.7,0.000,0.000,0.000,1,0.000,0.000,0.000,0,
single,fail_003,failure_expected,failure_expected,semantic_search,news,ko,0,1,재즈 보컬리스트 빌리 홀리데이,,,18567;20022;20470;35249;26594;16232;23336;35498;18286;16694,258.9,0.000,0.000,0.000,1,0.000,0.000,0.000,0,
single,kw_006,standards,standards,fact_lookup,document,ko,0,0,산업안전보건기준에 관한 규칙 작업장 통로,3886;3887,3886:3;3887:2,3898;3886;3887;3900;3881;3889;3912;3894;3897;3888,480.6,1.000,0.500,0.693,1,0.665,1.000,1.000,0,
single,kw_007,standards,standards,fact_lookup,document,ko,0,0,산업안전보건기준 폭발 화재 위험물 누출 방지,3896;3766,3766:1;3896:3,3896;3895;13930;3903;3897;13769;11679;11562;11567;10571,513.9,0.500,1.000,0.613,1,0.917,1.000,1.000,0,
single,kw_008,standards,standards,fact_lookup,document,ko,0,0,고압가스 안전관리법 전문,4025;4026,4025:3;4026:2,11644;11579;4025;4026;35775;11693;4034;13750;13299;13941,493.9,1.000,0.333,0.571,1,0.539,1.000,1.000,0,
single,kw_009,standards,standards,fact_lookup,document,ko,0,0,KGS FP111 가스설비 배관설비 기준,13305,13305:3,13305;13306;13311;13312;13302;13304;11688;13309;13313;13918,434.6,1.000,1.000,1.000,1,1.000,1.000,1.000,0,
single,kw_010,standards,standards,fact_lookup,document,ko,0,0,KGS FU551 가스설비 압력조정기 가스계량기,13652,13652:3,13652;11689;13657;13655;13649;13752;13648;13659;13650;13656,307.8,1.000,1.000,1.000,1,1.000,1.000,1.000,0,
single,kw_011,standards,standards,fact_lookup,document,ko,0,0,산업안전보건기준 전기로 인한 위험 방지,3897;3772,3772:1;3897:3,3897;3886;3887;13935;3895;3902;3896;3899;3877;3900,583.1,0.500,1.000,0.613,1,0.917,1.000,1.000,0,
single,en_001,english_only,english_only,semantic_search,document,en,0,0,pressure vessel flange design,5144;5136,5136:2;5144:3,5140;5136;5186;5144;5180;5212;5137;5193;36899;5182,4040.7,1.000,0.500,0.651,0,0.552,1.000,1.000,0,
single,en_002,english_only,english_only,semantic_search,document,en,0,0,ASME Section VIII introduction,5204;5206,5204:3;5206:2,5204;5212;5208;5210;5206;11634;5207;5183;5205;5182,2996.1,1.000,1.000,0.850,1,0.918,1.000,1.000,0,
single,en_003,english_only,english_only,semantic_search,document,en,0,0,impact test requirements ASME,5205;5148,5148:1;5205:3,36721;5205;5204;5178;5214;5224;5148;5186;5192;5185,5084.5,1.000,0.500,0.591,1,0.622,1.000,1.000,0,
single,en_004,english_only,english_only,semantic_search,document,en,0,0,design of vessel supports,5149,5149:3,5149;5186;5141;5136;5137;36901;36869;5139;5178;5145,5563.0,1.000,1.000,1.000,1,1.000,1.000,1.000,0,
single,en_005,english_only,english_only,semantic_search,document,en,0,0,hydrogen piping ASME code,5178,5178:3,5178;5180;5212;5182;5207;5210;5148;5204;5209;5224,4870.2,1.000,1.000,1.000,1,1.000,1.000,1.000,0,
single,en_006,english_only,english_only,semantic_search,document,en,0,0,ASME welding qualification requirements,5209;3771,3771:1;5209:3,5180;5178;5224;36721;36877;37097;5205;5204;36878;37051,2374.8,0.000,0.000,0.000,0,0.000,0.000,0.000,0,
single,en_007,english_only,english_only,semantic_search,document,en,0,0,pressure vessel fabrication and inspection,5208;5145,5145:2;5208:3,5186;5208;38747;38791;36935;36874;5187;5191;5182;5175,3752.4,0.500,0.500,0.387,1,0.497,0.500,1.000,0,
single,en_008,english_only,english_only,semantic_search,document,en,0,0,Industrial Safety and Health Management ergonomics,3763;3755,3755:1;3763:3,3759;3763;3774;3755;3818;5230;3812;3770;3756;3760,1817.5,1.000,0.500,0.651,1,0.635,1.000,1.000,0,
single,cl_004,mixed,mixed,semantic_search,document,mixed,0,0,ASME 압력용기 설계 실무,5207;5210;5139;5135,5135:2;5139:2;5207:3;5210:3,5139;5207;5204;5206;5225;5210;5133;5143;5141;5140,795.3,0.750,1.000,0.776,1,0.697,0.750,1.000,0,
single,cl_005,mixed,mixed,semantic_search,document,mixed,0,0,ASME 용접 코드 해설,5224;5209,5209:2;5224:3,5224;5222;5209;5225;5210;5208;5180;5204;5195;5205,724.0,1.000,1.000,0.920,1,0.956,1.000,1.000,0,
single,cl_006,mixed,mixed,semantic_search,document,mixed,0,0,pressure vessel Korean industrial safety regulation,4025;3856;5136,3856:2;4025:2;5136:1,38941;3762;4026;36889;36874;38791;5182;39084;5143;5186,1217.1,0.000,0.000,0.000,1,0.000,0.000,0.000,0,
single,cl_007,mixed,mixed,semantic_search,document,mixed,0,0,KGS 코드 LPG safety standard,11647;11689;11645;4025,4025:1;11645:2;11647:3;11689:2,11647;13674;13675;11688;11645;13310;13773;13759;13668;11652,512.6,0.500,1.000,0.541,1,0.754,0.667,1.000,0,
single,cl_008,mixed,mixed,semantic_search,document,mixed,0,0,수소 가스 안전 기준 hydrogen safety,5178;5169,5169:2;5178:3,10575;5177;13924;13925;5241;11562;5178;5173;11568;11563,838.4,0.500,0.143,0.204,0,0.262,0.500,1.000,0,
single,exam_001,exam,exam,fact_lookup,document,ko,0,0,레이놀즈수 정의 공식,11504;11505,11504:3;11505:2,11504;11533;5081;11482;5088;5090;5082;5077;11509;11505,590.3,1.000,1.000,0.790,1,0.885,1.000,1.000,0,
single,exam_002,exam,exam,fact_lookup,document,ko,0,0,탱크 바닥 구멍 유체 유속 토리첼리,11500;11495;11496,11495:2;11496:2;11500:3,11500;11495;11501;5085;5071;11515;5090;5139;5086;5088,387.2,0.667,1.000,0.765,1,0.856,0.667,1.000,0,
single,exam_003,exam,exam,fact_lookup,document,ko,0,0,이상기체 음속 마하수,11514;11515,11514:3;11515:2,11479;5083;5084;11476;11514;11516;11515;11518;11517;11472,645.5,1.000,0.200,0.442,0,0.417,1.000,1.000,0,
single,exam_004,exam,exam,fact_lookup,document,ko,0,0,고압가스 용기 내압시험 영구증가량,11591;11644;11691,11591:3;11644:2;11691:2,11591;11664;5177;13652;13660;13942;11660;13653;13752;11579,338.3,0.333,1.000,0.469,1,0.674,0.333,1.000,0,
single,exam_005,exam,exam,fact_lookup,document,ko,0,0,도시가스 배관 매설 이격거리,11627;11625;11646,11625:2;11627:3;11646:1,11627;11658;11625;11692;13753;13751;11655;13653;13752;11624,399.8,0.667,1.000,0.704,1,0.905,1.000,1.000,0,
single,exam_006,exam,exam,fact_lookup,document,ko,0,0,LPG 저장탱크 안전거리 분말소화기,11617;11669;11620,11617:3;11620:1;11669:2,11595;11616;13669;11617;11649;11605;11655;11690;11658;11653,353.2,0.333,0.250,0.202,0,0.321,0.500,1.000,0,
single,exam_007,exam,exam,fact_lookup,document,ko,0,0,오리피스 차압식 유량계,11712;11711;11503,11503:2;11711:2;11712:3,11712;11711;11500;11503;11713;13930;11701;11502;11715;11717,408.6,1.000,1.000,0.967,1,0.980,1.000,1.000,0,
single,fail_004,failure_expected,failure_expected,fact_lookup,document,ko,0,1,KGS AC999 임의 가스 코드,,,11647;11668;5177;11583;11692;13665;13661;11680;11644;13666,417.6,0.000,0.000,0.000,1,0.000,0.000,0.000,0,
single,fail_005,failure_expected,failure_expected,fact_lookup,document,ko,0,1,초전도 안전 관리법 시행규칙,,,35775;3966;3875;5227;4026;4025;30419;3971;4018;4021,897.2,0.000,0.000,0.000,1,0.000,0.000,0.000,0,
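The metric columns in the rows above can be recomputed from `relevant_ids`, `graded_relevance`, and `returned_ids_top10`. The sketch below is not the evaluation script from this repository; it is a minimal reconstruction that assumes binary gains for `ndcg_at_10`, exponential gains (`2**grade - 1`) for `graded_ndcg_at_10`, and the standard `log2(rank + 1)` discount. The helper names (`parse_ids`, `parse_grades`, `recall_at_k`, `mrr_at_k`, `ndcg_at_k`) are hypothetical. Under those assumptions it reproduces the `kw_001` row of this file.

```python
import math

def parse_ids(field):
    """Split a ';'-separated id field such as '3856;3868;3879'."""
    return [int(x) for x in field.split(";") if x]

def parse_grades(field):
    """Parse 'id:grade' pairs such as '3856:3;3868:2' into a dict."""
    return {int(k): int(v) for k, v in (p.split(":") for p in field.split(";") if p)}

def recall_at_k(relevant, returned, k=10):
    """Fraction of relevant ids that appear in the top-k results."""
    rel = set(relevant)
    if not rel:
        return 0.0
    return len(rel & set(returned[:k])) / len(rel)

def mrr_at_k(relevant, returned, k=10):
    """Reciprocal rank of the first relevant id in the top-k, else 0."""
    rel = set(relevant)
    for rank, doc in enumerate(returned[:k], start=1):
        if doc in rel:
            return 1.0 / rank
    return 0.0

def ndcg_at_k(gains, returned, k=10):
    """nDCG with a log2(rank + 1) discount; `gains` maps id -> gain."""
    dcg = sum(gains.get(doc, 0.0) / math.log2(rank + 1)
              for rank, doc in enumerate(returned[:k], start=1))
    ideal = sorted(gains.values(), reverse=True)[:k]
    idcg = sum(g / math.log2(rank + 1) for rank, g in enumerate(ideal, start=1))
    return dcg / idcg if idcg else 0.0

# Values copied from the kw_001 row above.
relevant = parse_ids("3856;3868;3879")
grades = parse_grades("3856:3;3868:2;3879:2")
returned = parse_ids("30411;30423;3879;3868;3856;3851;4041;10573;5163;30406")

binary_gains = {doc: 1.0 for doc in relevant}              # assumption: binary nDCG
graded_gains = {doc: 2 ** g - 1 for doc, g in grades.items()}  # assumption: 2**grade - 1

recall = round(recall_at_k(relevant, returned), 3)
mrr = round(mrr_at_k(relevant, returned), 3)
ndcg = round(ndcg_at_k(binary_gains, returned), 3)
graded_ndcg = round(ndcg_at_k(graded_gains, returned), 3)
print(recall, mrr, ndcg, graded_ndcg)
```

Rows with `failure_expected=1` have empty `relevant_ids`, so every helper returns 0.0 for them, which matches the zeros recorded in those rows.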
@@ -0,0 +1,52 @@
label,id,category,legacy_category,intent,domain_hint,language,ocr_derived,failure_expected,query,relevant_ids,graded_relevance,returned_ids_top10,latency_ms,recall_at_10,mrr_at_10,ndcg_at_10,top3_hit,graded_ndcg_at_10,graded_recall_at_10_t2,graded_recall_at_10_t3,dedup_count,error
single,kw_001,standards,exact_keyword,fact_lookup,document,ko,0,0,산업안전보건법 제6장,3856;3868;3879,3856:3;3868:2;3879:2,30417;30418;30406;10573;3863;3871;3862;3874;3851;3859,437.7,0.000,0.000,0.000,0,0.000,0.000,0.000,0,
single,kw_002,standards,exact_keyword,fact_lookup,document,ko,0,0,중대재해 처벌 등에 관한 법률 제2장 중대산업재해,3917;3921,3917:3;3921:2,10573;3923;30418;3916;3921;3874;3917;3918;3920;3854,449.2,1.000,0.200,0.442,0,0.393,1.000,1.000,0,
single,kw_003,standards,exact_keyword,fact_lookup,document,ko,0,0,화학물질관리법 유해화학물질 영업자,3981,3981:3,3981;3985;3980;3978;3984;3869;3983;30424;3982;3903,320.2,1.000,1.000,1.000,1,1.000,1.000,1.000,0,
single,kw_004,standards,exact_keyword,fact_lookup,document,ko,0,0,근로기준법 안전과 보건,4041,4041:3,4041;30406;3851;10573;30419;30407;4036;3915;3852;3875,530.2,1.000,1.000,1.000,1,1.000,1.000,1.000,0,
single,kw_005,standards,exact_keyword,fact_lookup,document,ko,0,0,산업안전보건기준에 관한 규칙 보호구,3888,3888:3,3905;3888;3885;3910;3895;3911;3890;3906;3913;3889,605.2,1.000,0.500,0.631,1,0.631,1.000,1.000,0,
single,nl_001,korean_only,natural_language_ko,semantic_search,document,ko,0,0,기계로 인한 산업재해 관련 법령,3856;3868;3879;3854,3854:1;3856:3;3868:2;3879:2,3856;30411;30423;3879;5249;5244;3868;30421;30406;3851,644.8,0.750,1.000,0.689,1,0.859,1.000,1.000,0,
single,nl_002,korean_only,natural_language_ko,semantic_search,document,ko,0,0,사업주가 도급을 줄 때 산업재해를 예방하기 위해 해야 할 일,3855;3867;3878,3855:3;3867:2;3878:2,3855;3867;30410;5227;3917;3878;30418;30422;3874;30421,538.4,1.000,1.000,0.933,1,0.958,1.000,1.000,0,
single,nl_003,korean_only,natural_language_ko,semantic_search,document,ko,0,0,유해화학물질을 다루는 회사가 지켜야 할 안전 의무,3980;3981;3982,3980:2;3981:2;3982:2,3981;3985;3903;3980;30424;3867;3904;3905;3880;3760,474.4,0.667,1.000,0.671,1,0.671,0.667,0.000,0,
single,nl_004,korean_only,natural_language_ko,semantic_search,document,ko,0,0,중대재해가 발생했을 때 경영책임자가 처벌받는 기준,3916;3917;3920;3921,3916:2;3917:3;3920:2;3921:2,3917;10573;3918;3916;10572;3872;10571;3923;30418;30417,403.9,0.500,1.000,0.559,1,0.710,0.500,1.000,0,
single,nl_005,korean_only,natural_language_ko,semantic_search,document,ko,0,0,안전보건교육은 누가 받아야 하고 어떤 내용을 다루는가,3853;3865,3853:3;3865:2,5249;10573;30420;30408;3876;3853;3867;5234;30424;4842,711.7,0.500,0.167,0.218,0,0.280,0.500,1.000,0,
single,news_001,korean_only,news_ko,semantic_search,news,ko,0,0,이란과 미국의 군사 충돌,4303;4304;4307;4316;4322;4323;4327;4335,4303:2;4304:2;4307:2;4316:2;4322:2;4323:2;4327:2;4335:2,27583;35225;34298;30806;30679;27350;27974;32155;26534;26734,566.4,0.000,0.000,0.000,1,0.000,0.000,0.000,0,
single,news_002,korean_only,news_ko,semantic_search,news,ko,0,0,호르무즈 해협 봉쇄,4316;4320;4322;4327,4316:3;4320:2;4322:2;4327:2,35904;35247;22055;16532;35893;26204;35899;16392;17123;4327,333.3,0.250,0.100,0.113,0,0.074,0.250,0.000,0,
single,misc_001,korean_only,other_domain,fact_lookup,document,ko,0,0,강체의 평면 운동학,4063;4065,4063:3;4065:2,4063;4071;4065;4064;4066;4058;5083;4067;5105;4068,468.5,1.000,1.000,0.920,1,0.956,1.000,1.000,0,
single,misc_002,korean_only,other_domain,semantic_search,document,ko,0,0,질점의 운동역학,4060;4061;4062,4060:2;4061:2;4062:2,4059;4060;4070;4062;4058;4061;5113;5083;4064;4066,382.4,1.000,0.500,0.665,1,0.665,1.000,0.000,0,
single,news_003,english_only,news_en,semantic_search,news,en,0,0,Trump Iran ultimatum,4258;4260;4262,4258:2;4260:2;4262:2,4775;37733;4776;4658;4679;4519;4202;4668;4333;25974,321.8,0.000,0.000,0.000,1,0.000,0.000,0.000,0,
single,cl_001,mixed,crosslingual_ko_en,semantic_search,document,mixed,0,0,기계 안전 가드 설계 원리,3770;3856,3770:3;3856:2,35778;5239;5244;3758;3770;3763;3774;4540;5253;3787,776.2,0.500,0.200,0.237,0,0.305,0.500,1.000,0,
single,cl_002,mixed,crosslingual_ko_en,semantic_search,document,mixed,0,0,산업 안전 입문서,3755;3775;3776;3777,3755:2;3775:2;3776:2;3777:2,5230;5249;3756;3882;3774;37259;3863;3755;10573;3802,599.2,0.250,0.125,0.123,1,0.123,0.250,0.000,0,
single,cl_003,mixed,crosslingual_ko_en,semantic_search,document,mixed,0,0,전기 안전 위험,3772;3790,3772:2;3790:2,35779;3790;3772;5260;3897;37326;3755;37216;37011;3774,1205.5,1.000,0.500,0.693,1,0.693,1.000,0.000,0,
single,news_004,mixed,news_fr,semantic_search,news,mixed,0,0,guerre en Iran,4199;4202;4210;4361;4363;4507;4519;4521,4199:2;4202:2;4210:2;4361:2;4363:2;4507:2;4519:2;4521:2,33101;33368;7740;33374;27078;5925;26631;5840;16010;16457,494.7,0.000,0.000,0.000,1,0.000,0.000,0.000,0,
single,news_005,mixed,news_crosslingual,semantic_search,news,mixed,0,0,이란 미국 전쟁 글로벌 반응,4202;4258;4262;4536;4303;4304;4316,4202:2;4258:2;4262:2;4303:2;4304:2;4316:2;4536:2,23242;15924;4457;36648;36490;19111;35556;41629;26631;26741,875.2,0.000,0.000,0.000,1,0.000,0.000,0.000,0,
single,fail_001,failure_expected,failure_expected,semantic_search,document,mixed,0,1,Rust async runtime tokio scheduler 내부 구조,,,23732;36544;35645;37582;17810;28766;40739;40432;40679;40700,502.6,0.000,0.000,0.000,1,0.000,0.000,0.000,0,
single,fail_002,failure_expected,failure_expected,semantic_search,document,ko,0,1,양자컴퓨터 큐비트 디코히어런스,,,32096;31984;33521;16289;16384;32043;32035;22202;28233;33551,342.0,0.000,0.000,0.000,1,0.000,0.000,0.000,0,
single,fail_003,failure_expected,failure_expected,semantic_search,news,ko,0,1,재즈 보컬리스트 빌리 홀리데이,,,18567;31643;20022;20470;16059;27146;26559;26637;18286;31187,289.1,0.000,0.000,0.000,1,0.000,0.000,0.000,0,
single,kw_006,standards,standards,fact_lookup,document,ko,0,0,산업안전보건기준에 관한 규칙 작업장 통로,3886;3887,3886:3;3887:2,5236;3898;3886;3902;3895;3887;5244;3889;3912;3894,537.9,1.000,0.333,0.525,1,0.514,1.000,1.000,0,
single,kw_007,standards,standards,fact_lookup,document,ko,0,0,산업안전보건기준 폭발 화재 위험물 누출 방지,3896;3766,3766:1;3896:3,3896;3895;3903;13930;3897;13769;3766;13935;10571;3867,583.1,1.000,1.000,0.818,1,0.961,1.000,1.000,0,
single,kw_008,standards,standards,fact_lookup,document,ko,0,0,고압가스 안전관리법 전문,4025;4026,4025:3;4026:2,11644;11579;4026;4025;11693;35775;11645;4034;13299;13941,539.9,1.000,0.333,0.571,0,0.508,1.000,1.000,0,
single,kw_009,standards,standards,fact_lookup,document,ko,0,0,KGS FP111 가스설비 배관설비 기준,13305,13305:3,13305;13311;13306;13312;13302;13304;11688;13309;13313;13918,488.1,1.000,1.000,1.000,1,1.000,1.000,1.000,0,
single,kw_010,standards,standards,fact_lookup,document,ko,0,0,KGS FU551 가스설비 압력조정기 가스계량기,13652,13652:3,13652;11689;13657;13655;13649;13656;13648;13659;13752;13650,362.0,1.000,1.000,1.000,1,1.000,1.000,1.000,0,
single,kw_011,standards,standards,fact_lookup,document,ko,0,0,산업안전보건기준 전기로 인한 위험 방지,3897;3772,3772:1;3897:3,3897;3758;3886;3755;5260;3772;3887;13935;3895;3774,716.8,1.000,1.000,0.832,1,0.964,1.000,1.000,0,
single,en_001,english_only,english_only,semantic_search,document,en,0,0,pressure vessel flange design,5144;5136,5136:2;5144:3,5136;5178;5140;5180;5144;5193;36899;5182;5207;5206,4109.2,1.000,1.000,0.850,0,0.642,1.000,1.000,0,
single,en_002,english_only,english_only,semantic_search,document,en,0,0,ASME Section VIII introduction,5204;5206,5204:3;5206:2,5204;5212;5208;5206;5207;5183;5210;5205;5182;5148,3037.9,1.000,1.000,0.877,1,0.932,1.000,1.000,0,
single,en_003,english_only,english_only,semantic_search,document,en,0,0,impact test requirements ASME,5205;5148,5148:1;5205:3,5205;5204;36721;5225;5178;5224;37029;5214;5210;5190,5183.0,0.500,1.000,0.613,1,0.917,1.000,1.000,0,
single,en_004,english_only,english_only,semantic_search,document,en,0,0,design of vessel supports,5149,5149:3,5149;5186;5141;5136;5140;5137;36901;36869;5139;5178,5657.1,1.000,1.000,1.000,1,1.000,1.000,1.000,0,
single,en_005,english_only,english_only,semantic_search,document,en,0,0,hydrogen piping ASME code,5178,5178:3,5143;5178;5180;5212;5182;5133;5207;5210;5148;5204,4902.9,1.000,0.500,0.631,1,0.631,1.000,1.000,0,
single,en_006,english_only,english_only,semantic_search,document,en,0,0,ASME welding qualification requirements,5209;3771,3771:1;5209:3,5204;5178;5224;36721;37097;5205;37051;36963;36951;5208,2455.9,0.000,0.000,0.000,0,0.000,0.000,0.000,0,
single,en_007,english_only,english_only,semantic_search,document,en,0,0,pressure vessel fabrication and inspection,5208;5145,5145:2;5208:3,5208;5186;38747;38791;36935;5180;5182;36874;5187;36889,3808.4,0.500,1.000,0.613,1,0.787,0.500,1.000,0,
single,en_008,english_only,english_only,semantic_search,document,en,0,0,Industrial Safety and Health Management ergonomics,3763;3755,3755:1;3763:3,3759;3763;3818;3774;3812;3760;3756;3778;3757;3775,1870.9,0.500,0.500,0.387,1,0.579,1.000,1.000,0,
single,cl_004,mixed,mixed,semantic_search,document,mixed,0,0,ASME 압력용기 설계 실무,5207;5210;5139;5135,5135:2;5139:2;5207:3;5210:3,5207;5204;5139;5225;5210;5182;5140;5148;5212;5144,832.1,0.750,1.000,0.737,1,0.789,0.750,1.000,0,
single,cl_005,mixed,mixed,semantic_search,document,mixed,0,0,ASME 용접 코드 해설,5224;5209,5209:2;5224:3,5224;5222;5204;5225;5210;5205;36877;5209;37051;36721,737.6,1.000,1.000,0.807,1,0.894,1.000,1.000,0,
single,cl_006,mixed,mixed,semantic_search,document,mixed,0,0,pressure vessel Korean industrial safety regulation,4025;3856;5136,3856:2;4025:2;5136:1,38941;3762;11644;4026;5186;38791;3895;39084;13299;5185,1267.9,0.000,0.000,0.000,1,0.000,0.000,0.000,0,
single,cl_007,mixed,mixed,semantic_search,document,mixed,0,0,KGS 코드 LPG safety standard,11647;11689;11645;4025,4025:1;11645:2;11647:3;11689:2,11647;13674;13669;13774;13675;11688;13667;13670;11645;13310,558.8,0.500,1.000,0.508,1,0.730,0.667,1.000,0,
single,cl_008,mixed,mixed,semantic_search,document,mixed,0,0,수소 가스 안전 기준 hydrogen safety,5178;5169,5169:2;5178:3,5177;10572;11671;5241;11688;11653;3774;5178;5173;38701,879.9,0.500,0.125,0.193,0,0.248,0.500,1.000,0,
single,exam_001,exam,exam,fact_lookup,document,ko,0,0,레이놀즈수 정의 공식,11504;11505,11504:3;11505:2,11504;11533;5081;5087;11482;5088;5090;4544;11509;11505,629.4,1.000,1.000,0.790,1,0.885,1.000,1.000,0,
single,exam_002,exam,exam,fact_lookup,document,ko,0,0,탱크 바닥 구멍 유체 유속 토리첼리,11500;11495;11496,11495:2;11496:2;11500:3,11500;11495;11501;5085;5090;3788;5086;5071;11515;11503,405.7,0.667,1.000,0.765,1,0.856,0.667,1.000,0,
single,exam_003,exam,exam,fact_lookup,document,ko,0,0,이상기체 음속 마하수,11514;11515,11514:3;11515:2,11479;5083;11514;5084;11476;11527;11516;11515;5124;11472,691.4,1.000,0.333,0.500,1,0.500,1.000,1.000,0,
single,exam_004,exam,exam,fact_lookup,document,ko,0,0,고압가스 용기 내압시험 영구증가량,11591;11644;11691,11591:3;11644:2;11691:2,11591;13660;11668;5177;13652;13943;13942;13917;11592;13752,400.9,0.333,1.000,0.469,1,0.674,0.333,1.000,0,
single,exam_005,exam,exam,fact_lookup,document,ko,0,0,도시가스 배관 매설 이격거리,11627;11625;11646,11625:2;11627:3;11646:1,11627;11658;11600;11625;11692;13918;13751;5177;11655;13653,435.4,0.667,1.000,0.671,1,0.883,1.000,1.000,0,
single,exam_006,exam,exam,fact_lookup,document,ko,0,0,LPG 저장탱크 안전거리 분말소화기,11617;11669;11620,11617:3;11620:1;11669:2,11595;11616;11617;11649;13304;11655;11690;11658;13669;13915,417.2,0.333,0.333,0.235,1,0.373,0.500,1.000,0,
single,exam_007,exam,exam,fact_lookup,document,ko,0,0,오리피스 차압식 유량계,11712;11711;11503,11503:2;11711:2;11712:3,11712;11711;11503;11500;11713;13930;11502;11716;11612;11715,436.7,1.000,1.000,1.000,1,1.000,1.000,1.000,0,
single,fail_004,failure_expected,failure_expected,fact_lookup,document,ko,0,1,KGS AC999 임의 가스 코드,,,11647;11668;5177;11688;11583;11692;13665;13661;11680;13666,447.7,0.000,0.000,0.000,1,0.000,0.000,0.000,0,
single,fail_005,failure_expected,failure_expected,fact_lookup,document,ko,0,1,초전도 안전 관리법 시행규칙,,,35775;11644;3966;4026;3977;4022;3972;13299;3974;3971,586.4,0.000,0.000,0.000,1,0.000,0.000,0.000,0,
@@ -0,0 +1,52 @@
label,id,category,legacy_category,intent,domain_hint,language,ocr_derived,failure_expected,query,relevant_ids,graded_relevance,returned_ids_top10,latency_ms,recall_at_10,mrr_at_10,ndcg_at_10,top3_hit,graded_ndcg_at_10,graded_recall_at_10_t2,graded_recall_at_10_t3,dedup_count,error
single,kw_001,standards,exact_keyword,fact_lookup,document,ko,0,0,산업안전보건법 제6장,3856;3868;3879,3856:3;3868:2;3879:2,30411;3879;3868;3856;3851;10573;5249;3867;3873;3853,697.4,1.000,0.500,0.733,0,0.617,1.000,1.000,0,
single,kw_002,standards,exact_keyword,fact_lookup,document,ko,0,0,중대재해 처벌 등에 관한 법률 제2장 중대산업재해,3917;3921,3917:3;3921:2,3921;3917;10573;3919;3923;30418;5227;10571;3916;3874,513.6,1.000,1.000,1.000,1,0.834,1.000,1.000,0,
single,kw_003,standards,exact_keyword,fact_lookup,document,ko,0,0,화학물질관리법 유해화학물질 영업자,3981,3981:3,3981;3985;3984;3983;3904;3980;3903;3905;3978;3869,319.1,1.000,1.000,1.000,1,1.000,1.000,1.000,0,
single,kw_004,standards,exact_keyword,fact_lookup,document,ko,0,0,근로기준법 안전과 보건,4041,4041:3,4041;3852;30407;3851;30406;3915;10573;3858;5249;3863,603.5,1.000,1.000,1.000,1,1.000,1.000,1.000,0,
single,kw_005,standards,exact_keyword,fact_lookup,document,ko,0,0,산업안전보건기준에 관한 규칙 보호구,3888,3888:3,10570;3888;3912;3913;3908;3911;3905;5249;3910;3893,642.7,1.000,0.500,0.631,1,0.631,1.000,1.000,0,
single,nl_001,korean_only,natural_language_ko,semantic_search,document,ko,0,0,기계로 인한 산업재해 관련 법령,3856;3868;3879;3854,3854:1;3856:3;3868:2;3879:2,5244;5227;5249;3867;3878;3863;30411;3868;3856;3879,664.3,0.750,0.125,0.354,0,0.362,1.000,1.000,0,
single,nl_002,korean_only,natural_language_ko,semantic_search,document,ko,0,0,사업주가 도급을 줄 때 산업재해를 예방하기 위해 해야 할 일,3855;3867;3878,3855:3;3867:2;3878:2,30410;3855;30409;3917;5244;30406;3851;3867;5249;30421,547.9,0.667,0.500,0.444,1,0.516,0.667,1.000,0,
single,nl_003,korean_only,natural_language_ko,semantic_search,document,ko,0,0,유해화학물질을 다루는 회사가 지켜야 할 안전 의무,3980;3981;3982,3980:2;3981:2;3982:2,3903;3982;3760;3905;3981;3904;3980;5249;3985;10569,466.6,1.000,0.500,0.634,1,0.634,1.000,0.000,0,
single,nl_004,korean_only,natural_language_ko,semantic_search,document,ko,0,0,중대재해가 발생했을 때 경영책임자가 처벌받는 기준,3916;3917;3920;3921,3916:2;3917:3;3920:2;3921:2,10572;10573;3917;3918;3916;5227;3919;10571;30409;3854,392.0,0.500,0.333,0.346,1,0.399,0.500,1.000,0,
single,nl_005,korean_only,natural_language_ko,semantic_search,document,ko,0,0,안전보건교육은 누가 받아야 하고 어떤 내용을 다루는가,3853;3865,3853:3;3865:2,3853;5249;5234;4025;30420;30408;10573;11677;3876;3867,726.2,0.500,1.000,0.613,1,0.787,0.500,1.000,0,
single,news_001,korean_only,news_ko,semantic_search,news,ko,0,0,이란과 미국의 군사 충돌,4303;4304;4307;4316;4322;4323;4327;4335,4303:2;4304:2;4307:2;4316:2;4322:2;4323:2;4327:2;4335:2,37599;27583;35225;27350;34850;30518;15924;32410;16081;28407,593.2,0.000,0.000,0.000,1,0.000,0.000,0.000,0,
single,news_002,korean_only,news_ko,semantic_search,news,ko,0,0,호르무즈 해협 봉쇄,4316;4320;4322;4327,4316:3;4320:2;4322:2;4327:2,41652;41650;41634;31894;35899;22055;16392;9022;17123;11945,399.4,0.000,0.000,0.000,0,0.000,0.000,0.000,0,
single,misc_001,korean_only,other_domain,fact_lookup,document,ko,0,0,강체의 평면 운동학,4063;4065,4063:3;4065:2,4063;4065;4066;4071;4064;4068;4067;4058;5083;4059,592.2,1.000,1.000,1.000,1,1.000,1.000,1.000,0,
single,misc_002,korean_only,other_domain,semantic_search,document,ko,0,0,질점의 운동역학,4060;4061;4062,4060:2;4061:2;4062:2,4070;4061;4060;4062;4059;4065;4066;4058;4068;4071,512.9,1.000,0.500,0.733,1,0.733,1.000,0.000,0,
single,news_003,english_only,news_en,semantic_search,news,en,0,0,Trump Iran ultimatum,4258;4260;4262,4258:2;4260:2;4262:2,21155;4775;4202;4776;4679;4199;4519;4668;37733;4658,342.4,0.000,0.000,0.000,1,0.000,0.000,0.000,0,
single,cl_001,mixed,crosslingual_ko_en,semantic_search,document,mixed,0,0,기계 안전 가드 설계 원리,3770;3856,3770:3;3856:2,35778;5239;5244;3758;3791;3770;3763;3774;4540;5253,683.2,0.500,0.167,0.218,0,0.280,0.500,1.000,0,
single,cl_002,mixed,crosslingual_ko_en,semantic_search,document,mixed,0,0,산업 안전 입문서,3755;3775;3776;3777,3755:2;3775:2;3776:2;3777:2,5230;5249;3756;3761;3757;3774;3863;3755;10573;3802,565.0,0.250,0.125,0.123,1,0.123,0.250,0.000,0,
single,cl_003,mixed,crosslingual_ko_en,semantic_search,document,mixed,0,0,전기 안전 위험,3772;3790,3772:2;3790:2,35779;3790;5260;3772;3897;37326;3755;37216;37011;3774,1136.0,1.000,0.500,0.651,1,0.651,1.000,0.000,0,
single,news_004,mixed,news_fr,semantic_search,news,mixed,0,0,guerre en Iran,4199;4202;4210;4361;4363;4507;4519;4521,4199:2;4202:2;4210:2;4361:2;4363:2;4507:2;4519:2;4521:2,33101;33368;7740;33374;27078;5925;5840;16010;16457;6945,423.2,0.000,0.000,0.000,1,0.000,0.000,0.000,0,
single,news_005,mixed,news_crosslingual,semantic_search,news,mixed,0,0,이란 미국 전쟁 글로벌 반응,4202;4258;4262;4536;4303;4304;4316,4202:2;4258:2;4262:2;4303:2;4304:2;4316:2;4536:2,23242;4688;15924;4457;36648;36490;19111;35556;41629;26631,823.5,0.000,0.000,0.000,1,0.000,0.000,0.000,0,
single,fail_001,failure_expected,failure_expected,semantic_search,document,mixed,0,1,Rust async runtime tokio scheduler 내부 구조,,,36544;35645;37582;17810;28766;40739;40432;40679;40700;40734,461.6,0.000,0.000,0.000,1,0.000,0.000,0.000,0,
single,fail_002,failure_expected,failure_expected,semantic_search,document,ko,0,1,양자컴퓨터 큐비트 디코히어런스,,,32096;31984;33521;16289;16384;32043;32035;22202;28233;33551,253.5,0.000,0.000,0.000,1,0.000,0.000,0.000,0,
single,fail_003,failure_expected,failure_expected,semantic_search,news,ko,0,1,재즈 보컬리스트 빌리 홀리데이,,,18567;20022;20470;27146;26559;26637;18286;31187;26759;38352,245.5,0.000,0.000,0.000,1,0.000,0.000,0.000,0,
single,kw_006,standards,standards,fact_lookup,document,ko,0,0,산업안전보건기준에 관한 규칙 작업장 통로,3886;3887,3886:3;3887:2,5236;3898;3886;3902;3895;3887;5244;3889;3912;3894,498.7,1.000,0.333,0.525,1,0.514,1.000,1.000,0,
single,kw_007,standards,standards,fact_lookup,document,ko,0,0,산업안전보건기준 폭발 화재 위험물 누출 방지,3896;3766,3766:1;3896:3,3896;3895;3903;13930;3897;13769;3766;13935;10571;3867,532.8,1.000,1.000,0.818,1,0.961,1.000,1.000,0,
single,kw_008,standards,standards,fact_lookup,document,ko,0,0,고압가스 안전관리법 전문,4025;4026,4025:3;4026:2,11644;11579;4026;4025;11693;35775;11645;4034;13299;13941,505.4,1.000,0.333,0.571,0,0.508,1.000,1.000,0,
single,kw_009,standards,standards,fact_lookup,document,ko,0,0,KGS FP111 가스설비 배관설비 기준,13305,13305:3,13305;13306;13311;13312;13302;13304;11688;13309;13313;13918,439.9,1.000,1.000,1.000,1,1.000,1.000,1.000,0,
single,kw_010,standards,standards,fact_lookup,document,ko,0,0,KGS FU551 가스설비 압력조정기 가스계량기,13652,13652:3,13652;11689;13657;13655;13649;13656;13651;13648;13752;13659,324.4,1.000,1.000,1.000,1,1.000,1.000,1.000,0,
single,kw_011,standards,standards,fact_lookup,document,ko,0,0,산업안전보건기준 전기로 인한 위험 방지,3897;3772,3772:1;3897:3,3897;3758;3886;3755;5260;3772;3887;13935;3895;3774,873.6,1.000,1.000,0.832,1,0.964,1.000,1.000,0,
single,en_001,english_only,english_only,semantic_search,document,en,0,0,pressure vessel flange design,5144;5136,5136:2;5144:3,5136;5178;5140;5180;5144;36899;5182;5207;5206;5149,4311.0,1.000,1.000,0.850,0,0.642,1.000,1.000,0,
single,en_002,english_only,english_only,semantic_search,document,en,0,0,ASME Section VIII introduction,5204;5206,5204:3;5206:2,5204;5212;5208;5210;5206;5141;5207;5205;5182;5148,11067.7,1.000,1.000,0.850,1,0.918,1.000,1.000,0,
single,en_003,english_only,english_only,semantic_search,document,en,0,0,impact test requirements ASME,5205;5148,5148:1;5205:3,5205;5204;36721;5225;5178;5224;37029;5214;5210;5190,5144.1,0.500,1.000,0.613,1,0.917,1.000,1.000,0,
single,en_004,english_only,english_only,semantic_search,document,en,0,0,design of vessel supports,5149,5149:3,5149;5186;5141;5140;5136;5137;36901;36869;5139;5210,5560.7,1.000,1.000,1.000,1,1.000,1.000,1.000,0,
single,en_005,english_only,english_only,semantic_search,document,en,0,0,hydrogen piping ASME code,5178,5178:3,5143;5178;5180;5212;5182;5207;5210;5148;5204;5209,4909.7,1.000,0.500,0.631,1,0.631,1.000,1.000,0,
single,en_006,english_only,english_only,semantic_search,document,en,0,0,ASME welding qualification requirements,5209;3771,3771:1;5209:3,5180;5204;5224;36721;37097;5205;37051;36963;5178;36951,2380.2,0.000,0.000,0.000,0,0.000,0.000,0.000,0,
single,en_007,english_only,english_only,semantic_search,document,en,0,0,pressure vessel fabrication and inspection,5208;5145,5145:2;5208:3,5186;5208;38747;38791;36935;5180;5182;36874;5187;36897,3795.7,0.500,0.500,0.387,1,0.497,0.500,1.000,0,
single,en_008,english_only,english_only,semantic_search,document,en,0,0,Industrial Safety and Health Management ergonomics,3763;3755,3755:1;3763:3,3759;3763;3818;3774;3812;3757;3756;3775;3762;37280,1851.3,0.500,0.500,0.387,1,0.579,1.000,1.000,0,
single,cl_004,mixed,mixed,semantic_search,document,mixed,0,0,ASME 압력용기 설계 실무,5207;5210;5139;5135,5135:2;5139:2;5207:3;5210:3,5207;5204;5139;5225;5210;5182;5140;5212;5144;5224,783.3,0.750,1.000,0.737,1,0.789,0.750,1.000,0,
single,cl_005,mixed,mixed,semantic_search,document,mixed,0,0,ASME 용접 코드 해설,5224;5209,5209:2;5224:3,5224;5222;5204;5225;5210;5182;5195;5205;36877;5209,704.0,1.000,1.000,0.790,1,0.885,1.000,1.000,0,
single,cl_006,mixed,mixed,semantic_search,document,mixed,0,0,pressure vessel Korean industrial safety regulation,4025;3856;5136,3856:2;4025:2;5136:1,38941;3762;11644;4026;5186;38791;3895;39084;5204;13299,1227.3,0.000,0.000,0.000,1,0.000,0.000,0.000,0,
single,cl_007,mixed,mixed,semantic_search,document,mixed,0,0,KGS 코드 LPG safety standard,11647;11689;11645;4025,4025:1;11645:2;11647:3;11689:2,11647;13674;13669;13774;13675;11688;13769;13667;13670;11645,525.2,0.500,1.000,0.503,1,0.727,0.667,1.000,0,
single,cl_008,mixed,mixed,semantic_search,document,mixed,0,0,수소 가스 안전 기준 hydrogen safety,5178;5169,5169:2;5178:3,10575;5177;10572;5241;11688;11653;5178;11693;5173;11563,838.0,0.500,0.143,0.204,0,0.262,0.500,1.000,0,
single,exam_001,exam,exam,fact_lookup,document,ko,0,0,레이놀즈수 정의 공식,11504;11505,11504:3;11505:2,11504;11533;5081;5087;11482;5088;5090;5082;11509;11505,576.2,1.000,1.000,0.790,1,0.885,1.000,1.000,0,
single,exam_002,exam,exam,fact_lookup,document,ko,0,0,탱크 바닥 구멍 유체 유속 토리첼리,11500;11495;11496,11495:2;11496:2;11500:3,11500;11495;11501;5090;5085;3788;5086;5071;5075;11515,377.2,0.667,1.000,0.765,1,0.856,0.667,1.000,0,
single,exam_003,exam,exam,fact_lookup,document,ko,0,0,이상기체 음속 마하수,11514;11515,11514:3;11515:2,11479;5083;11514;5084;11527;11516;11515;5124;11518;11471,650.0,1.000,0.333,0.511,1,0.506,1.000,1.000,0,
single,exam_004,exam,exam,fact_lookup,document,ko,0,0,고압가스 용기 내압시험 영구증가량,11591;11644;11691,11591:3;11644:2;11691:2,11591;11664;11668;5177;13943;13942;13917;11592;13752;4026,354.7,0.333,1.000,0.469,1,0.674,0.333,1.000,0,
single,exam_005,exam,exam,fact_lookup,document,ko,0,0,도시가스 배관 매설 이격거리,11627;11625;11646,11625:2;11627:3;11646:1,11627;11658;11600;11625;11692;13918;13751;5177;11655;13653,393.6,0.667,1.000,0.671,1,0.883,1.000,1.000,0,
single,exam_006,exam,exam,fact_lookup,document,ko,0,0,LPG 저장탱크 안전거리 분말소화기,11617;11669;11620,11617:3;11620:1;11669:2,11595;11616;11617;11649;13304;11655;11690;11658;13669;13915,393.7,0.333,0.333,0.235,1,0.373,0.500,1.000,0,
single,exam_007,exam,exam,fact_lookup,document,ko,0,0,오리피스 차압식 유량계,11712;11711;11503,11503:2;11711:2;11712:3,11712;11711;11503;11500;11713;11502;11716;11612;11715;11717,397.8,1.000,1.000,1.000,1,1.000,1.000,1.000,0,
single,fail_004,failure_expected,failure_expected,fact_lookup,document,ko,0,1,KGS AC999 임의 가스 코드,,,11647;11668;5177;11688;11583;11692;13665;13661;13664;11680,405.8,0.000,0.000,0.000,1,0.000,0.000,0.000,0,
single,fail_005,failure_expected,failure_expected,fact_lookup,document,ko,0,1,초전도 안전 관리법 시행규칙,,,35775;11644;3966;4026;3977;3972;13299;3974;3971;4025,548.8,0.000,0.000,0.000,1,0.000,0.000,0.000,0,
@@ -0,0 +1,80 @@
"""B-4: pure tests for the licensed_restricted exclusion predicate and the watch-target
(material/jurisdiction/license) mapping.

The exclusion predicate (_license_sql) is the single predicate shared by the retrieval 3-leg
path and digest. The actual exclusion behavior (a synthetic restricted doc dropped from search)
is verified in the GPU live environment; this file checks only the predicate shape and the
mapping contract.
[[feedback_external_api_fixture_first]] / [[feedback_structural_integrity_over_path_discipline]]
"""
import sys
from pathlib import Path

sys.path.insert(0, str(Path(__file__).parent.parent / "app"))

from services.search.license_filter import (  # noqa: E402
    restricted_exclude_orm,
    restricted_exclude_sql,
)
from services.search.retrieval_service import _license_sql  # noqa: E402
from workers.file_watcher import _TARGET_AXIS  # noqa: E402


def test_shared_predicate_single_source():
    # retrieval/digest/briefing share one predicate definition to prevent drift (single-source contract)
    assert _license_sql("d") == " AND " + restricted_exclude_sql("d")
    assert _license_sql("") == " AND " + restricted_exclude_sql("")
    assert restricted_exclude_sql("d").startswith("COALESCE(d.extract_meta")


def test_restricted_exclude_orm_constructs():
    # ORM expression for study explanations (explanation_rag): check that the compiled SQL
    # has the same structure as the raw predicate
    from sqlalchemy.dialects import postgresql

    clause = restricted_exclude_orm()
    sql = str(clause.compile(dialect=postgresql.dialect(),
                             compile_kwargs={"literal_binds": True}))
    assert "extract_meta" in sql
    assert "'license'" in sql and "'restricted'" in sql  # JSONB path keys
    assert "'false'" in sql and "'true'" in sql  # COALESCE default + comparison value


def test_license_sql_shape_with_alias():
    sql = _license_sql("d")
    assert sql.startswith(" AND ")  # always ' AND ...' (for composing into a WHERE clause)
    assert "COALESCE(d.extract_meta -> 'license' ->> 'restricted', 'false')" in sql
    assert "<> 'true'" in sql  # exclude only restricted=true


def test_license_sql_shape_no_alias():
    # alias='' means a single FROM documents (column referenced directly)
    sql = _license_sql("")
    assert "COALESCE(extract_meta -> 'license' ->> 'restricted', 'false')" in sql
    assert ".extract_meta" not in sql  # bare column, no dot


def test_axis_books_papers_are_restricted():
    for folder, mt in (("Books", "book"), ("Papers_Purchased", "paper")):
        material, jur, lic = _TARGET_AXIS[folder]
        assert material == mt
        assert jur is None  # books/papers have no jurisdiction (consistent with A-2 forcing NULL for papers)
        assert lic["scheme"] == "proprietary"
        assert lic["restricted"] is True  # excluded from RAG/digest
        assert lic["redistribute"] is False


def test_axis_manuals_proprietary_but_not_restricted():
    material, jur, lic = _TARGET_AXIS["Manuals"]
    assert material == "manual"
    assert lic["scheme"] == "proprietary"
    assert lic["restricted"] is False  # user decision 2026-06-13 (usable for search and RAG)


def test_axis_kgs_law_kr_public_not_restricted():
    material, jur, lic = _TARGET_AXIS["KGS_Code"]
    assert (material, jur) == ("law", "KR")
    assert lic["scheme"] == "kogl"
    assert lic["restricted"] is False  # public, statutorily delegated -> not excluded


def test_axis_non_target_folder_yields_none():
    # non-target folders such as Inbox/Recordings yield (None, None, None) -> no material/license injected
    assert _TARGET_AXIS.get("Inbox", (None, None, None)) == (None, None, None)
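The tests above pin the predicate string and the folder mapping tightly enough to sketch an implementation that would satisfy them. This is a hypothetical reconstruction, not the code in `license_filter.py` or `file_watcher.py`: the names `license_sql` and `TARGET_AXIS` are illustrative, and the `Manuals` jurisdiction is an assumption because the tests do not assert it. The license dicts for `Manuals` and `KGS_Code` carry only the keys the tests check.

```python
def restricted_exclude_sql(alias: str = "") -> str:
    """Raw SQL predicate that keeps every row except license.restricted = true.

    A missing license key is treated as 'false' via COALESCE, so unlabeled
    documents stay searchable.
    """
    column = f"{alias}.extract_meta" if alias else "extract_meta"
    return f"COALESCE({column} -> 'license' ->> 'restricted', 'false') <> 'true'"


def license_sql(alias: str = "") -> str:
    """Same predicate, prefixed with ' AND ' so it can be appended to a WHERE clause."""
    return " AND " + restricted_exclude_sql(alias)


# Hypothetical watch-folder mapping: folder -> (material_type, jurisdiction, license).
PROPRIETARY_RESTRICTED = {"scheme": "proprietary", "restricted": True, "redistribute": False}

TARGET_AXIS = {
    "Books": ("book", None, PROPRIETARY_RESTRICTED),
    "Papers_Purchased": ("paper", None, PROPRIETARY_RESTRICTED),
    # jurisdiction for Manuals is an assumption; the tests leave it unchecked
    "Manuals": ("manual", None, {"scheme": "proprietary", "restricted": False}),
    "KGS_Code": ("law", "KR", {"scheme": "kogl", "restricted": False}),
}

where = "SELECT d.id FROM documents d WHERE d.deleted_at IS NULL" + license_sql("d")
print(where)
```

Keeping the string in one function and deriving the `' AND '` form from it is what lets the first test compare the two call sites for equality instead of re-asserting the SQL text in each consumer.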
@@ -0,0 +1,57 @@
"""C-1 follow-up: pure-logic tests for facets aggregation and version_status decoration.

The actual legal_meta lookup for version_status is verified in the GPU live environment
(statute search); this file covers only the facets distribution contract and the decorate
skip path when there are no law results (no DB access).
"""
import asyncio
import sys
from pathlib import Path

sys.path.insert(0, str(Path(__file__).parent.parent / "app"))

from services.search.result_decorate import (  # noqa: E402
    compute_facets,
    decorate_version_status,
)


class _R:
    """Stand-in for SearchResult: classification-axis attributes only."""

    def __init__(self, material_type=None, jurisdiction=None, version_status=None, id=1):
        self.material_type = material_type
        self.jurisdiction = jurisdiction
        self.version_status = version_status
        self.id = id


def test_compute_facets_distribution():
    results = [
        _R("law", "KR", "current"),
        _R("law", "KR", "superseded"),
        _R("incident", "KR", None),
        _R("paper", None, None),
    ]
    f = compute_facets(results)
    assert f["material_type"] == {"law": 2, "incident": 1, "paper": 1}
    assert f["jurisdiction"] == {"KR": 3}  # the paper's None jurisdiction is excluded
    assert f["version_status"] == {"current": 1, "superseded": 1}  # None excluded


def test_compute_facets_empty_and_all_none():
    assert compute_facets([]) == {}
    assert compute_facets([_R(), _R()]) == {}  # every axis None -> empty axes omitted


def test_compute_facets_excludes_empty_axes():
    f = compute_facets([_R(jurisdiction="US"), _R(jurisdiction="EU")])
    assert f == {"jurisdiction": {"US": 1, "EU": 1}}
    assert "material_type" not in f


def test_decorate_version_status_skips_without_law():
    # no law results -> the legal_meta lookup is skipped, so the session is unused (None raises no error)
    results = [_R("incident", "KR"), _R("paper")]
    asyncio.run(decorate_version_status(None, results))
    assert all(r.version_status is None for r in results)
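The facets contract exercised above (count non-None values per axis, omit axes with no values, skip the lookup when no law results exist) can be sketched in a few lines. This is an illustrative version under stated assumptions, not the code in `result_decorate.py`: in particular, the real `decorate_version_status` takes a DB session, while this sketch takes a hypothetical async `fetch_status` callable so that it runs without a database.

```python
import asyncio
from collections import Counter
from types import SimpleNamespace

FACET_AXES = ("material_type", "jurisdiction", "version_status")


def compute_facets(results) -> dict[str, dict[str, int]]:
    """Count non-None values per classification axis; axes with no values are omitted."""
    facets: dict[str, dict[str, int]] = {}
    for axis in FACET_AXES:
        values = [getattr(r, axis, None) for r in results]
        counts = Counter(v for v in values if v is not None)
        if counts:
            facets[axis] = dict(counts)
    return facets


async def decorate_version_status(fetch_status, results) -> None:
    """Attach version_status to law results only.

    `fetch_status` is a hypothetical async callable: list of ids -> {id: status}.
    It is never called when there are no law results, so None is acceptable then.
    """
    law_ids = [r.id for r in results if r.material_type == "law"]
    if not law_ids:
        return
    status_by_id = await fetch_status(law_ids)
    for r in results:
        if r.material_type == "law":
            r.version_status = status_by_id.get(r.id)


def make(material_type=None, jurisdiction=None, version_status=None, id=1):
    return SimpleNamespace(material_type=material_type, jurisdiction=jurisdiction,
                           version_status=version_status, id=id)


async def fake_fetch(ids):
    # Stand-in for the legal_meta query.
    return {i: "current" for i in ids}


results = [make("law", "KR", id=1), make("incident", "KR", id=2), make("paper", id=3)]
asyncio.run(decorate_version_status(fake_fetch, results))
facets = compute_facets(results)
print(facets)
```

Because facets are computed from the already ranked top-K list, adding them does not touch ranking, which matches the commit message's "additive" framing.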
@@ -0,0 +1,67 @@
"""B-2 KOSHA fatal-accident bulletin (callApiId=1040): pure parser fixture tests (plan safety-library-1).

fixture = live capture from data.go.kr on 2026-06-13 (confirmed that the response body contains
no serviceKey, tests/fixtures/kosha_fatal_response.json). _fatal_fields/_items are pure functions,
so they are verified without DB/httpx calls [[feedback_external_api_fixture_first]].
"""
import json
from datetime import date
from pathlib import Path

from workers.kosha_collector import _fatal_fields, _items

FIXTURE = Path(__file__).parent / "fixtures" / "kosha_fatal_response.json"


def _payload() -> dict:
    return json.loads(FIXTURE.read_text(encoding="utf-8"))


def test_items_envelope_parse():
    """Parse the body.items.item envelope (same envelope as the accident-case API)."""
    items = _items(_payload())
    assert len(items) == 3
    assert all({"arno", "keyword", "contents"} <= set(it) for it in items)


def test_fatal_fields_basic_mapping():
    item = _items(_payload())[0]
    f = _fatal_fields(item)
    assert f is not None
    assert f["arno"] == "20260611111536KIZXJ8"
    assert f["title"].startswith("[6/9, 부산 사상구]")
    # HTML tags and image-server URL noise fully removed (clean body text for search/embedding)
    assert "<" not in f["text"]
    assert "portal.kosha.or.kr" not in f["text"]
    assert "data-filename" not in f["text"]
    # body text is preserved
    assert "(사망 2명)" in f["text"]
    assert "베란다 난간" in f["text"]
    # published_date = first 8 digits of arno (KST registration date);
    # reg_dt = raw 14-digit registration timestamp
    assert f["published_date"] == date(2026, 6, 11)
    assert f["reg_dt"] == "20260611111536"


def test_fatal_fields_all_three_items_well_formed():
    for item in _items(_payload()):
        f = _fatal_fields(item)
        assert f is not None
        assert f["published_date"] == date(2026, 6, 11)  # all 3 items registered on 06-11
        assert f["reg_dt"] is not None
        assert f["text"] and "<" not in f["text"]


def test_fatal_fields_skips_missing_required():
    assert _fatal_fields({"arno": "20260611111536XX", "contents": "x"}) is None  # keyword missing
    assert _fatal_fields({"keyword": "제목만", "contents": "x"}) is None  # arno missing
    assert _fatal_fields({"arno": " ", "keyword": " ", "contents": "x"}) is None  # whitespace only


def test_fatal_fields_malformed_arno_date_is_fail_quiet():
    # if the arno prefix does not parse as an 8-digit date, published_date/reg_dt = None
    # (auxiliary axis, so fail quietly)
    f = _fatal_fields({"arno": "ABC123", "keyword": "제목", "contents": "<p>본문</p>"})
    assert f is not None
    assert f["published_date"] is None
    assert f["reg_dt"] is None
    assert f["text"] == "본문"
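The field contract these tests describe (required `arno` and `keyword`, tag-stripped body, date derived from the `arno` prefix, quiet failure on a malformed prefix) could be implemented roughly as follows. This is a sketch under assumptions, not the code in `kosha_collector.py`: it requires a full 14-digit prefix before it derives either date field, and it strips tags with a regular expression without decoding HTML entities. The helper names are illustrative.

```python
import re
from datetime import date, datetime
from typing import Optional

_TAG_RE = re.compile(r"<[^>]+>")      # any HTML tag, attributes included
_WS_RE = re.compile(r"\s+")
_STAMP_RE = re.compile(r"\d{14}")     # YYYYMMDDHHMMSS at the start of arno


def strip_html(raw: str) -> str:
    """Drop tags (and with them attribute noise such as image URLs), collapse whitespace."""
    return _WS_RE.sub(" ", _TAG_RE.sub(" ", raw or "")).strip()


def parse_arno_stamp(arno: str) -> tuple[Optional[date], Optional[str]]:
    """Return (published_date, reg_dt), or (None, None) if the prefix is not a timestamp."""
    match = _STAMP_RE.match(arno)
    if not match:
        return None, None
    stamp = match.group(0)
    try:
        parsed = datetime.strptime(stamp, "%Y%m%d%H%M%S")
    except ValueError:  # 14 digits that do not form a valid date/time
        return None, None
    return parsed.date(), stamp


def fatal_fields(item: dict) -> Optional[dict]:
    """Map one API item to document fields; None when a required field is blank."""
    arno = (item.get("arno") or "").strip()
    title = (item.get("keyword") or "").strip()
    if not arno or not title:
        return None
    published_date, reg_dt = parse_arno_stamp(arno)
    return {
        "arno": arno,
        "title": title,
        "text": strip_html(item.get("contents") or ""),
        "published_date": published_date,
        "reg_dt": reg_dt,
    }


sample = fatal_fields({
    "arno": "20260611111536KIZXJ8",
    "keyword": "제목",
    "contents": '<p>본문 <img src="https://example.invalid/a.png" data-filename="a.png"></p>',
})
print(sample)
```

Returning `None` for the date fields instead of raising keeps a malformed identifier from dropping an otherwise usable bulletin, which is the "fail quietly" behavior the last test asserts.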
@@ -103,6 +103,32 @@ def test_summarize_pool_split_attribution():
    assert macbook["pending"] == 0  # pool pending is attributed to macmini only


def test_summarize_by_machine_projection():
    """build_summarize_by_machine projects the split's done_1h/done_today per machine
    (done_15m is excluded: it is used only for internal state decisions)."""
    from services.queue_overview import build_summarize_by_machine
    split = _split(
        macbook={"done_1h": 226, "done_today": 312, "done_15m": 60},
        macmini={"done_1h": 37, "done_today": 94, "done_15m": 9},
    )
    sbm = build_summarize_by_machine(split)
    assert sbm == {
        "macmini": {"done_1h": 37, "done_today": 94},
        "macbook": {"done_1h": 226, "done_today": 312},
    }
    assert "done_15m" not in sbm["macbook"]


def test_compose_overview_includes_summarize_by_machine():
    """The compose_overview response contract includes summarize_by_machine (data the FE uses for the per-lane split)."""
    now_kst = datetime(2026, 6, 13, 13, 0, tzinfo=KST)
    stats = {"summarize": _stage(pending=1317, done_1h=264)}
    split = _split(macbook={"done_1h": 226, "done_today": 312}, macmini={"done_1h": 37, "done_today": 94})
    ov = compose_overview(stats, split, {}, {}, [], deep_enabled=True, now_kst=now_kst)
    assert ov["summarize_by_machine"]["macbook"]["done_1h"] == 226
    assert ov["summarize_by_machine"]["macmini"]["done_today"] == 94


def test_deep_disabled_deep_summary_counts_to_macmini():
    stats = {"deep_summary": _stage(pending=2, processing=1, done_1h=3, done_today=4)}
    machines = build_machines(stats, _split(), [], deep_enabled=False)
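The projection that `test_summarize_by_machine_projection` checks is a whitelist copy: keep `done_1h` and `done_today` per machine and drop everything else. A minimal sketch follows, assuming the split is a plain dict of machine name to counter dict; the actual shape produced by the `_split` test helper is not shown in this diff, so that is an assumption, as is the `SUMMARY_KEYS` name.

```python
SUMMARY_KEYS = ("done_1h", "done_today")  # done_15m stays internal


def build_summarize_by_machine(split: dict[str, dict[str, int]]) -> dict[str, dict[str, int]]:
    """Project per-machine counters down to the keys exposed in the overview response."""
    return {
        machine: {key: int(counters.get(key, 0)) for key in SUMMARY_KEYS}
        for machine, counters in split.items()
    }


split = {
    "macbook": {"done_1h": 226, "done_today": 312, "done_15m": 60},
    "macmini": {"done_1h": 37, "done_today": 94, "done_15m": 9},
}
by_machine = build_summarize_by_machine(split)
print(by_machine)
```

Whitelisting the exposed keys, rather than deleting `done_15m`, means any counter added to the split later stays out of the response until it is listed explicitly.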
@@ -0,0 +1,89 @@
"""B-1 PR①: pure parser fixture tests for the KR adapter (plan safety-library-1).

fixture = live capture from law.go.kr on 2026-06-13 (OC sanitized, tests/fixtures/statute_kr/).
The parsers are pure functions, so no httpx/DB is needed and they run locally in the container.
"""
import gzip
import sys
import xml.etree.ElementTree as ET
from pathlib import Path

sys.path.insert(0, str(Path(__file__).parent.parent / "app"))

from workers.statute_adapters import ChangeEvent  # noqa: E402
from workers.statute_adapters.kr import detect_change, parse_search_hit  # noqa: E402

FIX = Path(__file__).parent / "fixtures" / "statute_kr"


def _read(name: str) -> str:
    p = FIX / name
    if name.endswith(".gz"):
        return gzip.decompress(p.read_bytes()).decode("utf-8")
    return p.read_text(encoding="utf-8")


def test_parse_search_hit_exact_match():
    hit = parse_search_hit(_read("lawsearch_sanab.xml"), "산업안전보건법")
    assert hit is not None
    assert hit["law_id"] == "001766"
    assert hit["mst"] == "283449"
    assert hit["promulgation_date"] == "20260219"
    assert hit["effective_date"] == "20260601"
    assert hit["status_code"] == "현행"


def test_parse_search_hit_rejects_partial_name():
    # in a response with totalCnt 3, partial matches such as '산업안전보건법 시행령' must not match
    hit = parse_search_hit(_read("lawsearch_sanab.xml"), "산업안전보건")
    assert hit is None


def test_detect_change_same_watermark_is_silent():
    hit = parse_search_hit(_read("lawsearch_sanab.xml"), "산업안전보건법")
    assert detect_change(hit, "kr-law:001766", "산업안전보건법", watermark="283449") is None


def test_detect_change_new_mst_is_amend():
    hit = parse_search_hit(_read("lawsearch_sanab.xml"), "산업안전보건법")
    ev = detect_change(hit, "kr-law:001766", "산업안전보건법", watermark="283448")
    assert isinstance(ev, ChangeEvent)
    assert ev.kind == "amend"
    assert ev.new_version_key == "283449"
    assert ev.effective_date == "20260601"


def test_detect_change_empty_watermark_is_amend():
    # first poll (no watermark) is detected as a change: the expected behavior in
    # observation mode before the PR② bootstrap
    hit = parse_search_hit(_read("lawsearch_sanab.xml"), "산업안전보건법")
    ev = detect_change(hit, "kr-law:001766", "산업안전보건법", watermark=None)
    assert ev is not None and ev.kind == "amend"


def test_detect_change_repeal_keyword():
    hit = {"mst": "9", "revision_type": "폐지", "promulgation_date": None,
           "effective_date": None, "law_id": "x", "status_code": None}
    ev = detect_change(hit, "kr-law:x", "x", watermark="1")
    assert ev is not None and ev.kind == "repeal"


def test_lawservice_snapshot_semantics_rule():
    """Pins the R7-M3 decision: the one-call full-text XML is a complete snapshot of
    articles plus annexes (premise of the PR② payload contract)."""
    root = ET.fromstring(_read("lawservice_rule.xml.gz"))
    articles = root.findall(".//조문단위")
    annexes = root.findall(".//별표단위")
    assert len(articles) >= 800, "expected 853 articles in the occupational safety standards rule: basis for the one-call full-text decision"
    assert len(annexes) == 23, "all 23 annexes are included in the body XML = snapshot semantics"
    # R7-M3 ②: annexes are identified by structured fields (no suffix string parsing needed)
    first = annexes[0]
    assert first.findtext("별표번호") is not None
    assert first.findtext("별표가지번호") is not None


def test_lawservice_sanab_basic_info():
    root = ET.fromstring(_read("lawservice_sanab.xml.gz"))
    assert root.findtext(".//법령ID") == "001766"
    assert len(root.findall(".//조문단위")) >= 200
    # a statute without annexes has zero 별표단위 (the opposite case of snapshot semantics)
    assert len(root.findall(".//별표단위")) == 0
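The `detect_change` tests above describe a watermark comparison: an unchanged `mst` is silent, a revision type containing `폐지` is a repeal, and anything else (including the first poll with no watermark) is an amendment. The sketch below implements only that decision table. The `ChangeEvent` dataclass here is a hypothetical stand-in that carries the three attributes the tests read plus two assumed ones (`family_id`, `law_name`); the real class in `workers.statute_adapters` may differ.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ChangeEvent:
    # family_id and law_name are assumed field names; kind, new_version_key and
    # effective_date are the attributes the tests read.
    family_id: str
    law_name: str
    kind: str
    new_version_key: str
    effective_date: Optional[str]


def detect_change(hit: dict, family_id: str, law_name: str,
                  watermark: Optional[str]) -> Optional[ChangeEvent]:
    """Compare the latest search hit against the stored watermark (last seen mst)."""
    mst = hit["mst"]
    if watermark is not None and watermark == mst:
        return None  # nothing new since the last poll
    revision_type = hit.get("revision_type") or ""
    kind = "repeal" if "폐지" in revision_type else "amend"
    return ChangeEvent(
        family_id=family_id,
        law_name=law_name,
        kind=kind,
        new_version_key=mst,
        effective_date=hit.get("effective_date"),
    )


hit = {"mst": "283449", "revision_type": "일부개정", "effective_date": "20260601"}
same = detect_change(hit, "kr-law:001766", "산업안전보건법", watermark="283449")
amended = detect_change(hit, "kr-law:001766", "산업안전보건법", watermark="283448")
first_poll = detect_change(hit, "kr-law:001766", "산업안전보건법", watermark=None)
repealed = detect_change({"mst": "9", "revision_type": "폐지"}, "kr-law:x", "x", watermark="1")
print(same, amended, first_poll, repealed, sep="\n")
```

Treating a missing watermark as a change means the first poll always emits an event, which is the observation-mode behavior the test comment calls out.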
@@ -0,0 +1,87 @@
"""B-1 PR②: pure unit tests for mapping, series, and payload (plan safety-library-1).

The law-name mapping unit tests are the R8-B1 companion contract (the sweep runs before
verification (PR③), so the mapping is fixed at the code level first). Title samples follow
the form measured in prod documents on 2026-06-13.
"""
import gzip
import sys
from pathlib import Path

sys.path.insert(0, str(Path(__file__).parent.parent / "app"))

from workers.statute_adapters.kr import parse_service_payloads  # noqa: E402
from workers.statute_collector import (  # noqa: E402
    legacy_law_name,
    normalize_law_name,
    series_suffix,
)

FIX = Path(__file__).parent / "fixtures" / "statute_kr"


# ─── Law-name mapping (real title samples) ───

def test_legacy_law_name_extraction():
    # form observed in prod: '<law name> (YYYYMMDD) <section>'
    assert legacy_law_name("건설기술 진흥법 시행규칙 (20260611) 제6장_보칙") == "건설기술 진흥법 시행규칙"
    assert legacy_law_name("산업안전보건법 (20260219) 전문") == "산업안전보건법"
    assert legacy_law_name("패턴 불일치 제목") is None


def test_mapping_equality_not_prefix():
    """No prefix comparison: the '산업안전보건법' family must not wrongly match the legacy 시행령 (enforcement decree) documents."""
    name = legacy_law_name("산업안전보건법 시행령 (20260324) 제1장")
    assert name == "산업안전보건법 시행령"
    assert normalize_law_name(name) != normalize_law_name("산업안전보건법")
    assert normalize_law_name(name) == normalize_law_name("산업안전보건법 시행령")


def test_mapping_absorbs_middle_dot_and_space():
    """Absorb middle-dot and whitespace variants: 유해ㆍ위험작업 (official) vs 유해위험작업 (law_monitor spelling)."""
    assert (normalize_law_name("유해ㆍ위험작업의 취업 제한에 관한 규칙")
            == normalize_law_name("유해위험작업의 취업 제한에 관한 규칙"))
    assert (normalize_law_name("산업안전보건기준에 관한 규칙")
            == normalize_law_name("산업안전보건기준에관한규칙"))


# ─── Version-series identifier (R7-B1 a) ───

def test_series_suffix():
    assert series_suffix("283449") is None  # primary
    assert series_suffix("273603|별표0001-00") == "별표0001-00"  # annex (includes the kind dimension)
    assert series_suffix("273603|서식0003-00") == "서식0003-00"


# ─── fetch_version payload (fixture; R4-M4 list contract) ───

def _read_gz(name: str) -> str:
    return gzip.decompress((FIX / name).read_bytes()).decode("utf-8")


def test_parse_service_payloads_rule_with_annexes():
    payloads = parse_service_payloads(
        _read_gz("lawservice_rule.xml.gz"), "산업안전보건기준에 관한 규칙", "273603")
    assert payloads[0].law_doc_kind == "primary"
    assert payloads[0].version_key == "273603"
    assert len(payloads[0].content) > 100_000  # body of 853 articles
    annexes = [p for p in payloads if p.law_doc_kind == "annex"]
    # of the 23 별표단위, 3 deletion tombstones are skipped
    # (별표10 '삭제 <2023.11.14>'; 서식1 and 서식2 '삭제 <2012.3.5>').
    # KR annex/form deletions are explicit tombstones, not absence
    # (fixture evidence that R7-M3 needs no absence inference)
    assert len(annexes) == 20
    keys = [p.version_key for p in annexes]
    assert len(keys) == len(set(keys)), "annex version_key must be unique (premise of uq_legal_meta_version)"
    assert all(k.startswith("273603|") for k in keys)
    # kind dimension: 별표1 and 서식N coexist
    # (measured on the fixture: (number, branch) alone collides in 4 cases)
    assert any("별표" in k for k in keys) and any("서식" in k for k in keys)


def test_parse_service_payloads_sanab_no_annex():
    payloads = parse_service_payloads(
        _read_gz("lawservice_sanab.xml.gz"), "산업안전보건법", "283449")
    assert len(payloads) == 1  # statute without annexes -> primary only
    p = payloads[0]
    assert p.promulgation_date == "20260219"
    assert p.effective_date == "20260601"
    assert "제2조(정의)" in p.content  # article content preserved
    assert p.content.startswith("# 산업안전보건법")
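The three mapping helpers tested above are small string functions: extract the law name from a legacy title, normalize it for equality comparison, and split the annex suffix off a version key. The sketch below is one way to satisfy those tests, not the code in `statute_collector.py`. The regular expression and the exact set of dot characters removed (U+318D, U+00B7, U+30FB) are assumptions.

```python
import re
from typing import Optional

# '<law name> (YYYYMMDD) <section>': capture everything before the dated parenthesis.
_LEGACY_TITLE_RE = re.compile(r"^(?P<name>.+?)\s*\(\d{8}\)")
# Middle-dot variants (assumed set) and any whitespace are dropped before comparison.
_NOISE_RE = re.compile("[\u318d\u00b7\u30fb\\s]+")


def legacy_law_name(title: str) -> Optional[str]:
    """Return the law name from a legacy document title, or None if the pattern is absent."""
    match = _LEGACY_TITLE_RE.match(title)
    return match.group("name").strip() if match else None


def normalize_law_name(name: str) -> str:
    """Canonical form for equality comparison (never prefix comparison)."""
    return _NOISE_RE.sub("", name)


def series_suffix(version_key: str) -> Optional[str]:
    """'273603|별표0001-00' -> '별표0001-00'; a bare mst (primary document) -> None."""
    _, sep, suffix = version_key.partition("|")
    return suffix or None if sep else None


decree = legacy_law_name("산업안전보건법 시행령 (20260324) 제1장")
print(decree, normalize_law_name(decree), series_suffix("273603|별표0001-00"))
```

Comparing normalized names for equality is what keeps `산업안전보건법` from matching `산업안전보건법 시행령`: the two normalize to different strings even though one is a prefix of the other.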