hyungi_document_server

Author	SHA1	Message	Date
hyungi	51c3f6df10	feat(search): /ask/react endpoint with Qwen native tool calling ReAct loop PR-DocSrv-Ask-ToolCalling-ReAct-1 — Qwen3.6-27B-8bit 의 native tool calling 으로 ReAct loop 도입. 기존 /api/search/ask 무수정. 트랙 B (frontend /ask SSE) 와 파일 단위 충돌 0 (search.py 의 ask() 함수 line diff = 0, 순수 추가). 핵심 invariant: - 별 endpoint /api/search/ask/react (qwen-macbook only, implicit opt-in) - MacBook unavailable 시 HTTP 503 + error_reason=macbook_unavailable. Gemma 자동 fallback X (정정 4 의 연장) G0 (구현 전 hard gate, plan b-velvety-hare.md): - G0-1 fixture (tests/fixtures/qwen_tool_call_response.json): 실제 mlx-vlm 응답 박제. shape = OpenAI 표준 호환 (choices[0].message.tool_calls + function.arguments JSON string). generate_with_tools() 가 본 shape 기준 구현. - G0-2 counter semantics: max_tool_rounds=2 + max_llm_calls=3 + search_exec_max=2. 마지막 LLM 호출은 tool_choice="none" + system instruction 으로 final 강제. - G0-3 trace exposure: default response 의 debug_trace=null. debug=true 시만 채움. server log 에는 항상 round 기록. backends.py (193 → 261줄): - QwenMacBookBackend.generate_with_tools(messages, tools, tool_choice) 신규 method. 기존 generate() 무수정. BackendUnavailable 처리 동일. react_loop.py 신규 (275줄): - agentic_ask_loop(session, query, *, backend, max_tool_rounds, debug) - tool round 안에서 run_search 호출, results dedup by id, final round 강제, partial=True 조건 (final content 빈 경우) search.py (+82줄): - POST /api/search/ask/react + AskReactRequest/Response schema - BackendUnavailable → JSONResponse(503, error_reason=macbook_unavailable) config.yaml + config.py: - search.ask.react: { enabled, max_tool_rounds=2, search_tool_limit=5, search_tool_mode=hybrid } tests (566줄, 18 신규 + 23 회귀 모두 PASS): - test_react_loop.py 13건: G0-1 fixture shape / G0-2 counter cap / G0-3 trace exposure / BackendUnavailable propagation / sources dedup - test_search_ask_react_endpoint.py 5건: 503 + run_search 호출 0 / 정상 200 / debug=true trace 노출 / max rounds partial - 회귀 (test_ask_eval_auth 9 + test_search_ask_macbook_503 5 + test_backend_dispatcher 9) 모두 PASS Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-22 13:43:47 +00:00
hyungi	a7b8f15870	feat(search): /ask backend dispatcher (qwen-macbook opt-in, no silent fallback) PR-MacBook-RAG-Backend-1 — /api/search/ask 의 명시 backend 선택 진입점. 핵심 invariant (정정 4): - backend 미지정 = Gemma Mac mini default, 응답 contract 변동 0 - backend="qwen-macbook" 명시 opt-in 만 MacBook M5 Max mlx-vlm.server 호출 - MacBook unavailable 시 HTTP 503 + error_reason=macbook_unavailable - 자동 fallback 절대 금지 — 실패 path 에서 Gemma backend.generate() 호출 0 backend dispatcher (services/llm/): - BackendBase / GemmaMacMiniBackend / QwenMacBookBackend / BackendUnavailable - Qwen backend 는 Mac mini llm_gate 점유 X, 별 Semaphore(1) — llm_gate docstring 의 single-inference 영구 룰은 같은 endpoint 한정으로 scope 명시 - httpx Connect/Read/Pool/Timeout/5xx → BackendUnavailable, 4xx 전파 synthesis_service.py: - backend 인자 추가, status="backend_unavailable" 신규 - cache key 에 backend_name 포함 (qwen ↔ gemma 캐시 충돌 차단) config: - search.ask.backend.{macmini_url, macbook_url, macbook_model, timeout_connect_s=1, timeout_read_s=30} - MacBook endpoint = http://100.118.112.84:8810 (M5 Max Tailscale bind) tests (14 신규): - tests/services/test_backend_dispatcher.py (9): dispatcher 정합성 + Qwen generate path (mock 200 / dead port / 5xx / 4xx) + cache identity - tests/api/test_search_ask_macbook_503.py (5): 정정 4 핵심 invariant. backend=qwen-macbook 비가용 시 gemma.generate.assert_not_called() 기존 ask 회귀 0 (test_ask_eval_auth 9건 등 85건 모두 PASS). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-22 13:10:44 +00:00
Hyungi Ahn	a332a8aabe	fix(search): classifier timeout 15s → 30s (concurrent load 2x margin) A1+config(15s) 후속 진단: voice memo PoC plan 호출 elapsed_ms=14432 — 15s 한계 거의 밀착. Mac mini 26B 동시 부하 (classifier + evidence + synthesis 3-way) 시 빈번 ReadTimeout 잔존. 30s 로 2x 마진 확보 — config.yaml + classifier_service.py 양쪽 align. Phase 3.5 guardrail 동작 자체에는 영향 없음 (timeout 시 fallback 경로 동일). 향후 별 트랙 (DS-Mac-mini-26B-Concurrent-Load-1): asyncio.Semaphore 도입으로 Mac mini 26B 동시 호출 제한 vs triage 만 작은 모델 재도입. 본 PR 은 timeout 완화만.	2026-05-16 19:42:49 +09:00
Hyungi Ahn	a8b84e641a	fix(search): classifier.timeout config 10s → 15s (httpx inner timeout align) A1 timeout 5s → 15s 후 진단 로그가 httpx.ReadTimeout('') 확정. classifier_service 의 asyncio.timeout 외부 wrap (15s) 보다 AIClient._request 내부 httpx timeout (10s, config.yaml classifier.timeout) 가 먼저 fire → ReadTimeout 빈 메시지 raise. 두 timeout 을 15s 로 align — Mac mini 26B 동시 부하 (PR #20 후속) 시 classifier 지연 ≤15s 까지 허용. 후속: evidence_service.py / synthesis_service.py 의 timeout 도 동일 패턴 검토 필요 (별 PR, DS-Mac-mini-26B-Concurrent-Load-1 트랙).	2026-05-16 19:12:51 +09:00
Hyungi Ahn	4eed0bc4f8	refactor(ai): GPU Ollama LLM 제거 — Mac mini 26B 단일 generation 호스트로 통일 GPU 서버 정체성 = embedding/rerank/STT/OCR/marker 특화 백엔드. Generative LLM 0. Mac mini gemma-4-26B-A4B 가 triage + primary + classifier 모두 흡수. fallback 은 Claude Sonnet 4 API (자동 trigger, premium 과 budget 공유). - triage: GPU Ollama gemma4:e4b → Mac mini :8801 26B (primary 동일 endpoint) - fallback: GPU Ollama gemma4:e4b → Claude Sonnet 4 API (require_explicit_trigger=false) - classifier: GPU Ollama gemma4:e4b → Mac mini :8801 26B (max_tokens 512) - primary / premium / embedding / rerank: 변경 0 후속 (별 커밋): `ssh gpu "ollama rm gemma4:e4b-it-q8_0"` — VRAM ~11GB 회수. Mac mini 단일화 위험 mitigation = (1) Mac mini uptime 31d 무중단 검증, (2) Claude Sonnet 4 API daily_budget $5 안 (Mac mini up 가정 호출 빈도 낮음), (3) Beszel siteMonitor :8801 health check + Synology Chat alert. plan: ~/.claude/plans/rosy-launching-otter.md §C/§D/§E (7-device LLM 배치 + 운영 전략) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-14 08:16:40 +09:00
Hyungi Ahn	d3303cec1c	fix(search): point reranker endpoint to TEI service	2026-05-13 12:02:26 +09:00
Hyungi Ahn	18d684b501	ops(infra): STT Mac mini 이전 + classifier 섹션 복원 (gemma4:e4b) - docker-compose.yml stt-service 를 profiles:[legacy] 로 이동. GPU 의 stt-service 는 더 이상 기동하지 않고, fastapi STT_ENDPOINT 가 Mac mini (기본 100.76.254.116:8804 Tailscale, MAC_MINI_HOST env 로 LAN IP 주입) 를 바라보도록 변경. 복원 필요 시 `docker compose --profile legacy up -d stt-service`. - config.yaml: classifier 섹션을 gemma4:e4b-it-q8_0 으로 복원. 이전 B-0 커밋이 classifier 를 주석 처리했는데, 실제로는 classifier_service 가 쓰고 있어 gate 유효. exaone 은 이미 제거됐으니 모델만 gemma4 로 통일. classifier_service 의 hasattr 체크는 유지되어 fallback 안전. D13 (STT 이전) drift 를 main 으로 승격. inventory 갱신은 B-3 마감 단계에서 3-tier + STT 경로 묶어서 일괄. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-24 10:08:00 +09:00
Hyungi Ahn	490bef1136	feat(ai): B-0 3-tier routing — triage/primary/fallback 슬롯 + AIClient - config.yaml: ai.models 에 triage (gemma4:e4b-it-q8_0, GPU Ollama, context_char_limit=120k, timeout 30s) 신규. primary (MLX gemma-4-26b) 는 에스컬레이션 전용 역할 명시. fallback 을 gemma4:e4b 로 통일 (exaone 제거 이미 반영). classifier/verifier 는 optional 유지, vision 은 optional 로 완화 (미사용 정리 준비). - core/config.py: AIConfig 에 triage 필드 추가, vision 은 Optional 로 전환. AIModelConfig.context_char_limit + DeepSummaryBacklogConfig (R2 backlog guard 임계치 ratio 0.3 / pending 5 / window 30min) 스키마 신설. load_settings 가 models.get("vision") graceful. - ai/client.py: call_triage / call_primary / call_fallback 3-tier 진입점 신규. primary 는 caller 가 get_mlx_gate() 블록 안에서 호출 해야 한다는 계약 docstring. classify/summarize 는 DEPRECATED 주석 만 추가, 기존 호출부 (eval runner 등) 를 위해 유지. PR-B B-0 Day 1. 기존 primary 경로 변경 없음 — 회귀 0 기대. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-24 10:05:24 +09:00
Hyungi Ahn	8fdea88676	feat(documents): §1 category enum + ai_suggestion 승인 파이프 plan: ~/.claude/plans/luminous-sprouting-hamster.md §1 - migrations/143_category.sql: doc_category enum (6 활성 + 3 유보) + documents.category + documents.ai_suggestion JSONB + 2 idx. - app/models/document.py: category (Enum, create_type=False), ai_suggestion (JSONB). - app/prompts/classify.txt: document_type enum 에 7 실무 doctype 추가 (발주서/세금계산서/명세표/도면/증명서/계획서/시방서) + facet_doctype 필드 directive. - config.yaml: document_types 에 7 항목 추가 (worker 검증 통과). - app/workers/classify_worker.py: FACET_DOCTYPES / LIBRARY_SUGGESTION_DOCTYPES 상수, facet_doctype 파싱(기존값 미덮어씀), 발주서/세금계산서/명세표 감지 시 ai_suggestion={proposed_category=library, proposed_path=@library/ 거래/{YYYY}/{doctype}, source_updated_at=doc.updated_at.isoformat(), ...}. category / user_tags 자동 전이 금지 (suggestion-only). - app/api/documents.py: · DocumentResponse 에 category / ai_suggestion 노출 · GET /documents ?category=<cat> / ?has_suggestion / ?proposed_category (category 지정 시 기본 news/memo 제외 해제 — §2 승인 UI 계약) · GET /documents/library 를 Document.category=='library' 기반으로 재구현 (path subquery 는 user_tags 유지 — 분류 내부 서가 경로) · POST /documents/{id}/accept-suggestion — FOR UPDATE + idempotent no-op + dual 409 stale (payload source_updated_at / documents.updated_at) + user_tags idempotent append · DELETE /documents/{id}/suggestion — idempotent, stale 검사 없음 - scripts/backfill_category.py: dry-run / apply. 매핑(news/memo/@library/else) + 3-way 상대 검증 (all_rows==categorized, uncategorized==0, cat_library==has_library_tag — 자동 전이 금지 정책 검증). 남은 DoD (원격 배포 후): docker compose up → migration 143 적용 → backfill apply → smoke (drive_sync 발주서 업로드 suggestion 생성 / category 유지, accept-suggestion idempotency + 409 stale 두 벡터, /documents?category=library == /documents/library 건수 일치). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-23 15:32:01 +09:00
Hyungi Ahn	8622a97e7d	feat(upload): backend-owned upload size contract + public config 엔드포인트 업로드 크기 한도를 프론트 하드코딩이 아닌 서버 config 의 단일 진실 공급원 으로 이동. 프론트는 Phase B 후속 커밋에서 이 값을 읽어 pre-check UX 에 사용. - config.yaml 에 `upload` 블록 추가: * max_bytes (authoritative policy) * content_length_slack_ratio (multipart 오버헤드 여유) * stream_chunk_bytes (스트리밍 IO 단위) - app/core/config.py 에 UploadConfig pydantic 모델 + Settings.upload 필드 - app/api/config.py 신규 — GET /api/config/public 엔드포인트 * 민감정보 없는 프론트 필수 설정만 노출 * 범용 서버 설정 공개 창구로 확대 금지 (docstring 명시) - /api/config 를 setup redirect bypass 에 추가 (초기 setup 전에도 조회 가능) 이 커밋 자체는 기존 upload 동작에 영향 없음. 후속 커밋에서 enforcement + 프론트 구독을 연결. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-17 08:02:19 +09:00
Hyungi Ahn	06443947bf	feat(ask): Phase 3.5a guardrails (classifier + refusal gate + grounding + partial) 신규 파일: - classifier_service.py: exaone binary classifier (sufficient/insufficient) parallel with evidence, circuit breaker, timeout 5s - refusal_gate.py: multi-signal fusion (score + classifier) AND 조건, conservative fallback 3-tier (classifier 부재 시) - grounding_check.py: strong/weak flag 분리 strong: fabricated_number + intent_misalignment(important keywords) weak: uncited_claim + low_overlap + intent_misalignment(generic) re-gate: 2+ strong → refuse, 1 strong → partial - sentence_splitter.py: regex 기반 (Phase 3.5b KSS 업그레이드) - classifier.txt: exaone Y+ prompt (calibration examples 포함) - search_synthesis_partial.txt: partial answer 전용 프롬프트 - 102_ask_events.sql: /ask 관측 테이블 (completeness 3-분리 지표) - queries.yaml: Phase 3.5 smoke test 평가셋 10개 수정 파일: - search.py /ask: classifier parallel + refusal gate + grounding re-gate + defense_layers 로깅 + AskResponse completeness/aspects/confirmed_items - config.yaml: classifier model 섹션 (exaone3.5:7.8b GPU Ollama) - config.py: classifier optional 파싱 - AskAnswer.svelte: 4분기 렌더 (full/partial/insufficient/loading) - ask.ts: Completeness + ConfirmedItem 타입 P1 실측: exaone ternary 불안정 → binary gate 축소. partial은 grounding이 담당. 토론 9라운드 확정. plan: quiet-meandering-nova.md Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-10 08:49:11 +09:00
Hyungi Ahn	de08735420	fix(ai): primary -> mlx-proxy 8801 + align model to gemma - endpoint: 100.76.254.116:8800 -> :8801 (route through mlx-proxy for /status observability - active_jobs / total_requests) - model: Qwen3.5-35B-A3B-4bit -> gemma-4-26b-a4b-it-8bit (match the model actually loaded on mlx-proxy) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-08 04:40:06 +00:00
Hyungi Ahn	bf8efd1cd3	feat: 임베딩 모델 변경 — nomic-embed-text → bge-m3 (1024차원, 다국어) - config.yaml: embedding model → bge-m3 - document.py: Vector(768) → Vector(1024) - embed_worker.py: 모델 버전 업데이트 - migration 011: 벡터 컬럼 재생성 (기존 임베딩 초기화) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-06 12:49:45 +09:00
Hyungi Ahn	9b0705b79f	config: fallback 모델 qwen3.5:35b → qwen3.5:9b-q8_0 (GPU VRAM 제한) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-03 13:40:25 +09:00
Hyungi Ahn	6d73e7ee12	feat: 분류 체계 전면 개편 — taxonomy + document_type + confidence - config.yaml: 6개 domain × 3단계 taxonomy + 13개 document_types 정의 - classify.txt: 영문 프롬프트, taxonomy 경로 기반 분류 + 분류 규칙 주입 - classify_worker: taxonomy 검증, confidence 기반 분류, document_type 저장 - migration 008: document_type, importance, ai_confidence 컬럼 - API: DocumentResponse에 document_type, importance, ai_confidence 추가 Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-03 13:32:20 +09:00
Hyungi Ahn	0ca78640ee	infra: migrate application from Mac mini to GPU server - Integrate ollama + ai-gateway into root docker-compose.yml (NVIDIA GPU runtime, single compose for all services) - Change NAS mount from SMB (NAS_SMB_PATH) to NFS (NAS_NFS_PATH) Default: /mnt/nas/Document_Server (fstab registered on GPU server) - Update config.yaml AI endpoints: primary → Mac mini MLX via Tailscale (100.76.254.116:8800) fallback/embedding/vision/rerank → ollama (same Docker network) gateway → ai-gateway (same Docker network) - Update credentials.env.example (remove GPU_SERVER_IP, add NFS path) - Mark gpu-server/docker-compose.yml as deprecated - Update CLAUDE.md network diagram and AI model config - Update architecture.md, deploy.md, devlog.md for GPU server as main - Caddyfile: auto_https off, HTTP only (TLS at upstream proxy) - Caddy port: 127.0.0.1:8080:80 (localhost only) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-03 07:47:09 +09:00
Hyungi Ahn	131dbd7b7c	feat: scaffold v2 project structure with Docker, FastAPI, and config 동작하는 최소 코드 수준의 v2 스캐폴딩: - docker-compose.yml: postgres, fastapi, kordoc, frontend, caddy - app/: FastAPI 백엔드 (main, core, models, ai, prompts) - services/kordoc/: Node.js 문서 파싱 마이크로서비스 - gpu-server/: AI Gateway + GPU docker-compose - frontend/: SvelteKit 기본 구조 - migrations/: PostgreSQL 초기 스키마 (documents, tasks, processing_queue) - tests/: pytest conftest 기본 설정 - config.yaml, Caddyfile, credentials.env.example 갱신 Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-02 10:20:15 +09:00

17 Commits