hyungi_document_server

Author	SHA1	Message	Date
Hyungi Ahn	dfc5913c5e	fix(tests): explanation cap test setup — 한글 chunk 길이 부족 보정 case 3/4 의 setup 이 EXPLANATION_MAX_CHARS (1200) 보다 작은 text 를 만들어 assert 실패. 한글 chunk 반복 횟수 늘려 1200 자 이상 보장. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-02 08:35:34 +09:00
Hyungi Ahn	6b52d57bac	feat(study): Phase 4-A explanation_md 길이 cap + prompt 강화 운영 데이터에서 ready 박힌 풀이가 793/838/866자 — 권장 200~400 대비 큰 편. 1차 운영 후 결과 화면 가독성 + 토큰 사용량 통제 위해 prompt 강화 + 저장 전 cap. Prompt (study_explanation_envelope.txt): - explanation_md 권장 300~600자, 최대 900자 명시 - 핵심 개념 + 정답 근거 + 헷갈리는 1~2개 오답만 — 모든 오답 풀이 X - explanation_md 안 줄바꿈 최소화 (parse_json fix 와 결합 — invalid escape 줄임) - LaTeX 수식 자제 — \\circ/\\text/\\, 매크로 가능하면 평문 ('0°C', 'C') - 출력은 raw JSON 한 객체만 — 코드 펜스/thinking/메타 X 강조 Worker (study_explanation_worker.py): - _cap_explanation_md(text, max_chars=1200) 헬퍼 신규 · 1200자 이하 passthrough · 초과 시 마지막 200자 안에서 \\n\\n / \\n / '. ' / '다.' / '요.' 경계 탐색 · 경계에서 자르기 + '…' (단어 중간 자르기 회피) · 경계 못 찾으면 단순 자르기 + '…' - save 전 cap 적용. ai_explanation_status='ready' 유지 (cap 됐다고 failed X) - payload 에 운영 분석 metadata: explanation_len_original / _saved / capped 플래그 검증: - tests/test_explanation_cap.py (6 케이스) · short passthrough / exact at limit / paragraph boundary / sentence boundary · no boundary fallback / empty input - scripts/phase4_health.sql 섹션 8/9 추가 · ai_explanation 길이 p50/p95/max (study_questions.ready) · cap 작동 빈도 (job.payload 의 explanation_capped/_original/_saved) cap 1200 = 800 (4-B summary_md) 보다 여유 — 기사시험 풀이는 공식+오답+개념 묶이면 800 빡빡함. 운영 후 800~1000 으로 조정 검토. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-02 08:33:18 +09:00
Hyungi Ahn	8074be6b6d	feat(study): Phase 4-D 운영 관찰 + confidence calibration Phase 4-B v1 첫 검증 결과 자료 부족 토픽인데도 모델이 confidence='high' 박는 케이스 발견. 정의 (high = 자료 + 다른 ai_explanation 으로 패턴 명확) 보다 과신 — UX 신뢰도 위험. 자동 cap 보정 + 운영 관찰 SQL 추가. confidence calibration (services/study/session_summary_guard): - calibrate_confidence(c, ctx_docs_count, ready_explanation_count) 신규 · ctx_docs_count == 0 AND ready_explanation_count == 0 → 'low' cap · ctx_docs_count == 0 (ready 만 있음) → 'medium' cap · ctx_docs_count >= 1 → 모델 값 그대로 - 모델이 정의보다 더 보수적인 값 박은 경우 (모델 'low' + cap 'medium') 는 보존 — 더 보수적인 값을 절대 올리지 않음 worker 적용 (study_session_analysis_worker): - ctx_docs_count = len(ctx_docs) - ready_explanation_count = sum(1 for a in prompt_attempts if a.get('ai_explanation')) - calibrate_confidence 호출 → study_quiz_session_analysis.confidence 박힘 - job.payload 에 운영 분석 metadata 보존: · ctx_docs_count / ready_explanation_count · model_confidence_raw (모델 응답) vs calibrated_confidence (cap 후) · prompt_attempts / valid_attempts_total / summary_len → SQL 4 번 쿼리가 cap 작동 빈도 측정 scripts/phase4_health.sql (신규 운영 점검 SQL 7 섹션): 1. 4-A study_question_jobs status × error_code 분포 2. 4-B study_quiz_session_jobs status × error_code 분포 3. 4-B confidence 분포 (calibrated) 4. 4-B model_confidence_raw vs calibrated 차이 (cap 작동 빈도) 5. 4-A/4-B 최근 7일 처리 지연 p50/p95/max/avg 6. 4-A/4-B skipped 사유 분포 7. 4-B guard_fail / parse_fail / llm_timeout 비율 ship gate (단위 테스트): - test_calibrate_confidence_no_evidence_caps_to_low (3 케이스) - test_calibrate_confidence_only_explanations_caps_to_medium (3 케이스) - test_calibrate_confidence_with_documents_passthrough (3 케이스) - test_calibrate_confidence_normalizes_invalid_first (2 케이스) Plan: ~/.claude/plans/nifty-sparking-spindle.md (Phase 4-B v1 후속) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-02 07:33:57 +09:00
Hyungi Ahn	6785d53d3d	feat(study): Phase 4-B v1 세션 단위 종합 분석 (자유 마크다운) Phase 4-A 가 wrong/unsure 한 문제씩 풀이 캐시. 4-B 는 세션 전체 wrong/unsure 5~30건을 묶어 200~400자 자연어 요약 1건 생성. 결과 화면 헤더 카드. 큐 인프라는 4-A study_question_jobs 와 분리 — FK 단일 의미 + 운영 SQL 명확성 + 4-A/4-B 가드/payload/재시도 정책 차이. 신규 study_quiz_session_jobs (큐) + study_quiz_session_analysis (결과 캐시 PK=session_id, UPSERT) + 전용 consumer. Backend: - migrations/233 — study_quiz_session_jobs (FK study_quiz_sessions NOT NULL, status pending/processing/completed/failed/skipped, max_attempts=2) - migrations/234 — partial unique idx (session_id) WHERE pending/processing - migrations/235 — study_quiz_session_analysis (session_id PK, summary_md, confidence, model_name, generated_at, is_stale) - models/study_quiz_session_job — ORM + enqueue_session_analysis_job() (멱등) - models/study_quiz_session_analysis — ORM (PK = session_id) - services/study/session_summary_guard — GUARD_PATTERN (정규식) + normalize_confidence() 단일 source, worker + tests 가 import 공유 - services/study/session_summary_rag — gather_session_summary_context() documents 만 (PR-3 _gather_document_evidence 재사용). evidence 없어도 호출 허용 (4-A 와 다른 정책 — 세션 기록 자체가 evidence) - services/study/session_analysis_enqueue — auto (finalize/fallback) + request_session_analysis_regenerate (manual). manual 은 wrong/unsure < 5 즉시 차단, active job 차단, 기존 analysis 있으면 is_stale=true 박기 - prompts/study_session_summary_envelope.txt — envelope JSON {summary_md, confidence}. 정량 정수만 인용 가능, 비율/추세/범위/날짜 금지 - workers/study_session_analysis_worker — terminal status 분기: · wrong/unsure < 5 → status=skipped, error_code=insufficient_attempts · question_text/outcome 부족 → skipped, evidence_missing · GUARD_PATTERN match → failed, guard_fail · 800자 hard cap + confidence normalize · timeout/parse/unknown → 재시도 후보 · UPSERT study_quiz_session_analysis ON CONFLICT DO UPDATE (PK session_id) - workers/study_session_queue_consumer — 4-A consumer 패턴 복제. BATCH_SIZE=1 + STALE_MINUTES=10. MLX gate 4-A 와 공유 (Semaphore(1)) - main.py — APScheduler add_job(consume_study_session_queue, ..., 1분 주기) - session_finalize — 끝에서 enqueue_session_analysis_auto (best-effort) - api/study_topics: · QuizSessionAnalysisOut + ai_session_analysis 응답 필드 (analysis row + 최신 job status/error_code) · GET fallback enqueue (기존 analysis 또는 active job 없으면만, non-blocking) · POST /quiz-sessions/{sid}/regenerate-summary — manual 트리거 Frontend (quiz-sessions/[sid]/+page.svelte): - 결과 헤더에 세션 요약 카드 (AI 풀이 indicator 직후, 바로 할 일 직전) - summary_md 박혔으면 markdown 렌더, 없으면 job_status / error_code 분기: · pending/processing → "AI 가 세션 분석 중" · insufficient_attempts → "오답·모르겠음 5건 미만" · evidence_missing → "자료 부족" · guard_fail → "환각 검증 차단" + 재생성 링크 - confidence='low' 배지 + is_stale "재생성 중" 배지 - 재생성 버튼 + regenerateSummary() — reason 별 toast 분기 ship gate: - tests/test_session_summary_guard_pattern.py — 허용 5 + 차단 7 케이스 + normalize_confidence 표준/비표준 검증. python3 직접 실행 패스. Plan: ~/.claude/plans/nifty-sparking-spindle.md (Phase 4-B v1) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-02 07:20:29 +09:00
Hyungi Ahn	99672292d3	fix(policy): use container-compatible imports (drop app. prefix) 프로덕션 컨테이너는 /app 을 cwd 로 실행하고 import 는 `from api...`, `from core...`, `from workers...` 처럼 무접두 스타일을 사용한다. PR-A 내부 import 가 `from app.policy...`, `from app.ai.envelope` 로 되어 있어서 컨테이너에서 ModuleNotFoundError 발생. 변경: - app/policy/.py: `from app.policy.X` → `from policy.X` - app/services/prompt_versions.py: lazy import 도 `from policy.prompt_render` - app/ai/envelope.py: 영향 없음 (내부 import 없음) - tests/policy/.py: 모두 `from policy.X` / `from ai.envelope` 로 통일 - tests/policy/conftest.py: 로컬 pytest 용 sys.path.insert(app/) 추가 (MacBook 에서 repo-root 기준 실행 시 app/ 를 package root 로 취급) CI: pytest tests/policy/ -q → 98 passed (로컬, 동일 결과) 프로덕션: docker exec fastapi python -c "from policy.loader import load_policy" → OK Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-24 09:42:24 +09:00
Hyungi Ahn	ba97766d45	feat(policy): INV-1~6 테스트 + loader/audit/envelope/shadow 검증 tests/policy/ 7개 테스트 파일 + conftest + __init__. 98 tests passed. 커버: - test_policy_loader_schema.py (9) — yaml 로드, cross-reference, unknown flag reject, invalid UI category reject, synthesis_directive 500 chars 초과 reject - test_self_declare_add_only.py (4) — INV-1 invariant 엄격 검증 - test_routing_decisions.py (27) — INV-2~6 + low_confidence + 도메인 × 시나리오 parametrize (9 도메인 x 기본 시나리오) - test_audit_patterns.py (11) — detection_patterns 양성/음성, 도메인 미스매치, 빈 텍스트 엣지 - test_envelope_contract.py (6) — JSON round-trip, invalid from_stage reject, tuple 강제 - test_prompt_render.py (16) — 모든 템플릿 렌더, placeholder 치환, policy_version deterministic/yaml-sensitive hash - test_shadow_logger_inmem.py (5) — record/clear/multiple/extra/ Protocol 호환 conftest.py: autouse _clear_policy_cache fixture — lru_cache 로 인한 테스트 간 오염 방지. policy fixture 는 repo root domain_policy.yaml 로드. plan: ~/.claude/plans/wise-gliding-hippo.md Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-24 09:34:49 +09:00
Hyungi Ahn	51a6f7c9af	feat(eval): 발주건 단위 baseline 평가 경로 추가 - run_eval.py: --queries-order / --order-groups / --output-order / --debug 옵션 추가. 기존 legacy CSV 스키마/값 불변 (출력 소비자 보호). - Tier 1A/1B/2 지표 구현: cross_format_link_success (top-10 공식 + top-5 보조, eligible/success 분수), top_5_document_match (guardrail + 절대 건수), manual_refind_flag (v0 heuristic), chunk_idx_stddev, range/page_citation_available capability flags. - order_groups.yaml: 발주건 3건 매핑 (TKP-26-0114/0132/0112, 10 docs). - queries_order_baseline.yaml: 12개 질문 (A:4 B:4 C:3 D:1). plan: ~/.claude/plans/merry-yawning-owl.md	2026-04-20 15:04:39 +09:00
Hyungi Ahn	3971cf08d2	fix(search): re-gate Tier 0 — synthesis self-refuse / timeout / empty answer 일관 처리 이전 버그: synthesis LLM self-refuse(status=completed + refused=True) 또는 timeout/parse_failed/llm_error/empty answer 시 grounding/verifier flag 가 0건이라 re-gate 체인이 `else clean` 분기로 빠지며 `completeness="full"` 초기값이 보존됨. 결과: `completeness=full + refused=True + re_gate=clean` 모순 row 생성. 실측: baseline v1-400char (2026-04-17) 223 row 중 24 (10.8%) 해당. - LLM self-refuse: 20 (completed + refused=True) - synthesis timeout: 4 (timeout + refused=False + empty answer) 수정: re-gate 최상위에 Tier 0 삽입 + 판정 로직을 `_detect_synthesis_failure()` helper 로 분리. self-refuse 는 `synthesis_self_refuse`, 메커니즘 실패는 `synthesis_failed({status})` 라벨로 구분. no_reason fallback 도 refuse_reason 우선 활용하도록 보강. 테스트: tests/test_synthesis_failure_regate.py — self-refuse / timeout / parse_failed / llm_error / empty answer / whitespace / valid answer 총 10 case. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-17 08:29:49 +09:00
Hyungi Ahn	5bfbb79641	feat(verifier): Phase 3.5 B2 — numeric_conflict promote (env flag) + Tier 4 VERIFIER_NUMERIC_PROMOTE 환경변수로 numeric_conflict severity 승격 실험. verifier_service.py: - _NUMERIC_PROMOTE = os.getenv('VERIFIER_NUMERIC_PROMOTE', '0') == '1' (import time 평가 — env 변경 시 process restart 필수) - _SEVERITY_MAP['numeric_conflict']: env=1 → critical=strong / minor=medium, env=0 (기본) → 둘 다 medium (기존 동작 유지) - direct_negation 은 env 무관 항상 strong (안전장치) verifier.txt: - numeric_conflict 정의에 critical/minor 분리 명시 (core quantity vs peripheral) - "Range values satisfy any answer within range" rule 추가 - severity mapping 갱신: numeric_conflict 분기 명시 search.py re-gate (Tier 1~7 재번호, B2 신규 Tier 4): - v_strong_numeric = sum(1 for f in v_strong if f.startswith('verifier_numeric_conflict')) - Tier 4 (신규): g_strong + v_strong_numeric >= 1 + low_conf → refuse re_gate value: 'refuse(grounding+verifier_numeric)' - 원칙 유지: verifier strong 단독 refuse 금지 — g_strong 교차 필수 - 호환성: 기존 re_gate string literals 그대로 유지, 신규 1개만 추가 credentials.env.example: VERIFIER_NUMERIC_PROMOTE=0 (off, B3 통과 후 production 전환) tests/test_verifier_numeric_promote.py: 4 케이스 (env off / on / explicit 0 / direct_negation invariant). monkeypatch.setenv + importlib.reload 패턴. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-17 08:11:06 +09:00
Hyungi Ahn	2665d4eb60	feat(grounding): Phase 3.5 B1 — unit-aware fabricated_number + bound semantics Codex adversarial review (no-ship) 반영: fix1: unit-aware numeric clearing - _extract_numeric_corpus(): 단위별 bucket dict (exact_by_unit) + ranges_by_unit (양방향 + 단방향 bound 통합) - _within_unit_range / _close_to_unit_pool: 같은 unit 안에서만 매칭 bare answer 는 보수적으로 range/tolerance 패스 X - 2-pass cleared_pairs (unit, digits): cross-unit cleared 절대 skip 안 함. bare(None) 답변은 unit-anchored cleared 시 duplicate 로 skip (콤마 normalize 부산물 보호 — Codex 케이스는 그대로 flag) fix3: 최대/최소 bound semantics - _APPROX_PREFIX_RE 에서 최대/최소 제거 (약/대략/거의/얼추 만 strip) - _BOUND_PATTERN_RE: 최대 N → range (0, N-1), 최소 N → range (N+1, 1e18) - 경계값 자체는 cleared 대상 아님 ("최대 100명" + answer "100명" → flag) - bound span 내 숫자는 exact pool 에서 제외 기존 prefix strip / 콤마 / 부터 separator / 단위 동의어 / tolerance 4자리+ / 식별자성 단위 1자리 flag 동작 모두 유지. tests/test_grounding_fabricated_number.py: 25 케이스 — 기존 17 + Codex unit-mismatch 3 (won_vs_myeong_range/tol, pct_vs_myeong_range) + bound 5 (최대/최소 boundary/inner/outer). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-17 08:11:06 +09:00
Hyungi Ahn	99abd287dc	feat(scripts): Phase 3.5 — calibrate_ask.py CLI (Q0~Q8 + render + FP CSV) scripts/calibrate_ask.py — ask_events 집계 + markdown report 영구 도구. 기능: - argparse: --source / --prompt-version / --since / --until / --eval-split (tuning\|confirm\|all, id 해시 기반 deterministic split) / --run-label / --output / --format md\|json / --compare-against / --sample-limit / --fp-artifacts / --inspect-shape / --dry-run - 9개 fetcher (모두 read-only SELECT): - Q0 defense_layers shape inspect - Q1 re-gate tier 분포 - Q2 max_rerank_score 히스토그램 (bucket × bin) - Q3 classifier 혼동행렬 - Q4 verifier severity 분포 (cast + COALESCE NULL safe) - Q5 hallucination_flags top-K (UNION ALL outer wrap, strong/weak 컬럼 유지) - Q6 eval golden mismatch (eval_case_id 기반 join + query string fallback) - Q7 FP candidate (case A/B/C 분리 + candidate_reason 컬럼 + LIMIT/3 분배) - Q8 answer_length p25/p50/p75 분포 (E.3 v1↔v2 비교 축) - markdown render + json baseline + delta compare (compare-against) - FP CSV dump (artifacts/fp_candidates_{run_label}.csv) + is_true_fp 공란 - dry-run: tests/calibrate_fixtures/sample_ask_events.json 로 출력 검증 - --threshold-overrides: Step 0 feasibility 통과 후 v2 (현재 stub raise) read-only 강제: INSERT/UPDATE/DELETE/ALTER/DROP/TRUNCATE 0건. tests/calibrate_fixtures/sample_ask_events.json: dry-run snapshot fixture. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-17 08:11:06 +09:00
Hyungi Ahn	09883d0358	feat(ask): Phase 3.5 A0 — ask_events source/eval_case_id + eval auth boundary - migrations 138~142: source TEXT DEFAULT 'document_server' + eval_case_id TEXT 추가, 인덱스 2개, backfill, 1주 관찰 후 NOT NULL (140 적용 분리) - app/models/ask_event.py: source / eval_case_id ORM 필드 (138~141 단계 nullable) - app/services/search_telemetry.py: record_ask_event 시그니처에 source / eval_case_id - app/core/config.py: settings.eval_runner_token + EVAL_RUNNER_TOKEN env 로드 - app/api/search.py: - X-Source / X-Eval-Case-Id / X-Eval-Token 헤더 수신 - _resolve_eval_identity(): hmac.compare_digest 로 token 검증, 실패 시 source 'document_server' 강등 + warning log + eval_case_id=None - 두 record_ask_event 호출에 검증된 source/eval_case_id 전달 - credentials.env.example: EVAL_RUNNER_TOKEN= (empty default = 모든 eval claim 거부) - tests/test_ask_eval_auth.py: 9 케이스 — token 없음/틀림/일치, env 미설정, case_id only, non-eval source forces case_id None trust boundary: 일반 client 의 X-Source=eval / X-Eval-Case-Id 시도는 무시되어 calibration telemetry 오염 불가. eval runner 만 EVAL_RUNNER_TOKEN 으로 인증. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-17 08:11:06 +09:00
Hyungi Ahn	06443947bf	feat(ask): Phase 3.5a guardrails (classifier + refusal gate + grounding + partial) 신규 파일: - classifier_service.py: exaone binary classifier (sufficient/insufficient) parallel with evidence, circuit breaker, timeout 5s - refusal_gate.py: multi-signal fusion (score + classifier) AND 조건, conservative fallback 3-tier (classifier 부재 시) - grounding_check.py: strong/weak flag 분리 strong: fabricated_number + intent_misalignment(important keywords) weak: uncited_claim + low_overlap + intent_misalignment(generic) re-gate: 2+ strong → refuse, 1 strong → partial - sentence_splitter.py: regex 기반 (Phase 3.5b KSS 업그레이드) - classifier.txt: exaone Y+ prompt (calibration examples 포함) - search_synthesis_partial.txt: partial answer 전용 프롬프트 - 102_ask_events.sql: /ask 관측 테이블 (completeness 3-분리 지표) - queries.yaml: Phase 3.5 smoke test 평가셋 10개 수정 파일: - search.py /ask: classifier parallel + refusal gate + grounding re-gate + defense_layers 로깅 + AskResponse completeness/aspects/confirmed_items - config.yaml: classifier model 섹션 (exaone3.5:7.8b GPU Ollama) - config.py: classifier optional 파싱 - AskAnswer.svelte: 4분기 렌더 (full/partial/insufficient/loading) - ask.ts: Completeness + ConfirmedItem 타입 P1 실측: exaone ternary 불안정 → binary gate 축소. partial은 grounding이 담당. 토론 9라운드 확정. plan: quiet-meandering-nova.md Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-10 08:49:11 +09:00
Hyungi Ahn	21a78fbbf0	fix(search): semaphore로 LLM concurrency=1 강제 + run_eval analyze 파라미터 추가 ## 배경 1차 Phase 2.2 eval에서 발견: 23개 쿼리가 순차 호출되지만 각 request의 background analyzer task는 모두 동시에 MLX에 요청 날림 → MLX single-inference 서버 queue 폭발 → 22개가 15초 timeout. cache 채워지지 않음. ## 수정 ### query_analyzer.py - LLM_CONCURRENCY = 1 상수 추가 - _LLM_SEMAPHORE: lazy init asyncio.Semaphore (event loop 바인딩) - analyze() 내부: semaphore → timeout(실제 LLM 호출만) 이중 래핑 semaphore 대기 시간이 timeout에 포함되지 않도록 주의 ### run_eval.py - --analyze true\|false 파라미터 추가 (Phase 2.1+ 측정용) - call_search / evaluate 시그니처에 analyze 전달 ## 기대 효과 - prewarm/background/동기 호출 모두 1개씩 순차 MLX 호출 - 23개 대기 시 최악 230초 소요, 단 모두 성공해서 cache 채움 - MLX 서버 부하 안정 Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-08 15:12:13 +09:00
Hyungi Ahn	76e723cdb1	feat(search): Phase 1.3 TEI reranker 통합 (코드 골격) 데이터 흐름 원칙: fusion=doc 기준 / reranker=chunk 기준 — 절대 섞지 말 것. 신규/수정: - ai/client.py: rerank() 메서드 추가 (TEI POST /rerank API) - services/search/rerank_service.py: - rerank_chunks() — asyncio.Semaphore(2) + 5s soft timeout + RRF fallback - _make_snippet/_extract_window — title + query 중심 200~400 토큰 (keyword 매치 없으면 첫 800자 fallback) - apply_diversity() — max_per_doc=2, top score>=0.90 unlimited - warmup_reranker() — 10회 retry + 3초 간격 (TEI 모델 로딩 대기) - MAX_RERANK_INPUT=200, MAX_CHUNKS_PER_DOC=2 hard cap - services/search_telemetry.py: compute_confidence_reranked() — sigmoid score 임계값 - api/search.py: - ?rerank=true\|false 파라미터 (기본 true, hybrid 모드만) - 흐름: fused_docs(limit*5) → chunks_by_doc 회수 → rerank_chunks → apply_diversity - text-only 매치 doc은 doc 자체를 chunk처럼 wrap (fallback) - rerank 활성 시 confidence는 reranker score 기반 - tests/search_eval/run_eval.py: --rerank true\|false 플래그 GPU 적용 보류: - TEI 컨테이너 추가 (docker-compose.yml) — 별도 작업 - config.yaml rerank.endpoint 갱신 — GPU 직접 (commit 없음) - 재인덱싱 완료 후 build + warmup + 평가셋 측정	2026-04-08 12:41:47 +09:00
Hyungi Ahn	42dfe82c9b	feat(chunk): Phase 1.2-E reindex 스크립트 추가 tests/scripts/reindex_all_chunks.py — 전체 documents chunk 재인덱싱 도구. 핵심 요건 (사용자 정의): - asyncio.Semaphore(N) — 동시 처리 수 제한 (기본 3, Ollama bge-m3 부하 조절) - checkpoint resume — JSON 파일 atomic swap, 중간 실패/중단 후 재시작 가능 - rate limiting — 작업 간 sleep 0.1초 (Ollama API 보호) - 진행 로그 — [REINDEX] N/total (P%) ETA: ... fails: N (~2% 단위) CLI: - --concurrency, --checkpoint, --rate-limit, --limit (dry-run), --skip-existing 야간 배치 (00:00~06:00): PYTHONPATH=app .venv/bin/python tests/scripts/reindex_all_chunks.py \ --concurrency 3 --checkpoint checkpoints/reindex.json \ > logs/reindex.log 2>&1 &	2026-04-08 12:31:29 +09:00
Hyungi Ahn	161ff18a31	feat(search): Phase 0.5 RRF fusion + 강한 신호 boost 기존 weighted-sum merge를 Reciprocal Rank Fusion으로 교체. 정확 키워드 매치에서 RRF가 평탄화되는 문제는 boost로 보완. 신규 모듈 app/services/search_fusion.py: - FusionStrategy ABC - LegacyWeightedSum : 기존 _merge_results 동작 (A/B 비교용) - RRFOnly : 순수 RRF, k=60 - RRFWithBoost : RRF + title/tags/법령조문/high-text-score boost (default) - normalize_display_scores: SearchResult.score를 [0..1] 랭크 기반 정규화 (프론트엔드가 score*100을 % 표시하므로 RRF 원본 점수 노출 시 표시 깨짐) search.py: - ?fusion=legacy\|rrf\|rrf_boost 파라미터 (default rrf_boost) - _merge_results 제거 (LegacyWeightedSum에 흡수) - pre-fusion confidence: hybrid는 raw text/vector 신호로 계산 (fused score는 fusion 전략마다 스케일이 달라 일관 비교 불가) - timing에 fusion_ms 추가 - debug notes에 fusion 전략 표시 telemetry: - compute_confidence_hybrid(text_results, vector_results) 헬퍼 - record_search_event에 confidence override 파라미터 run_eval.py: - --fusion CLI 옵션, call_search 쿼리 파라미터에 전달 Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-07 08:58:33 +09:00
Hyungi Ahn	8490cfed10	test(search): Phase 0.2 평가셋 + 평가 스크립트 22개 쿼리(6개 카테고리)와 Recall/MRR/NDCG@10 + latency p50/p95 측정 스크립트 추가. wiggly-weaving-puppy 플랜 Phase 0.2 산출물. - queries.yaml: 정확키워드/한국어자연어/crosslingual/뉴스/실패 케이스 실제 코퍼스(2026-04-07, 753 docs) 기반 정답 doc_id 매핑 - run_eval.py: 단일 평가 + A/B 비교 모드, CSV 저장 Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-07 08:19:38 +09:00
Hyungi Ahn	131dbd7b7c	feat: scaffold v2 project structure with Docker, FastAPI, and config 동작하는 최소 코드 수준의 v2 스캐폴딩: - docker-compose.yml: postgres, fastapi, kordoc, frontend, caddy - app/: FastAPI 백엔드 (main, core, models, ai, prompts) - services/kordoc/: Node.js 문서 파싱 마이크로서비스 - gpu-server/: AI Gateway + GPU docker-compose - frontend/: SvelteKit 기본 구조 - migrations/: PostgreSQL 초기 스키마 (documents, tasks, processing_queue) - tests/: pytest conftest 기본 설정 - config.yaml, Caddyfile, credentials.env.example 갱신 Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-02 10:20:15 +09:00
Hyungi Ahn	e48b6a2bb4	chore: remove v1 files from main branch v1 코드는 v1-archive 브랜치 + v1-final 태그로 보존. 필요시 git show v1-final:<파일경로>로 참조 가능. 삭제: applescript/, launchd/, v1 scripts, v1 docs, requirements.txt Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-02 09:35:09 +09:00
Hyungi Ahn	948be1624f	fix: Qwen3.5 /nothink 모드 + json_mode 파라미터 추가 Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-26 13:33:23 +09:00
Hyungi Ahn	084d3a8c63	feat: 전체 PKM 스크립트 일괄 작성 — 분류/법령/메일/다이제스트/임베딩 - scripts/pkm_utils.py: 공통 유틸 (로거, dotenv, osascript 래퍼) - scripts/prompts/classify_document.txt: Ollama 분류 프롬프트 - applescript/auto_classify.scpt: Inbox → AI 분류 → DB 이동 - applescript/omnifocus_sync.scpt: Projects → OmniFocus 작업 생성 - scripts/law_monitor.py: 법령 변경 모니터링 + DEVONthink 임포트 - scripts/mailplus_archive.py: MailPlus IMAP → Archive DB - scripts/pkm_daily_digest.py: 일일 다이제스트 + OmniFocus 액션 - scripts/embed_to_chroma.py: GPU 서버 벡터 임베딩 → ChromaDB - launchd/*.plist: 3개 스케줄 (07:00, 07:00+18:00, 20:00) - docs/deploy.md: Mac mini 배포 가이드 - docs/devonagent-setup.md: 검색 세트 9종 설정 가이드 - tests/test_classify.py: 5종 문서 분류 테스트 Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-26 12:32:36 +09:00

22 Commits