hyungi_document_server

Hyungi Ahn b80116243f feat(search): Phase 1.2-C chunk-based vector retrieval + preserve raw chunks

Switch retrieval_service.search_vector from documents.embedding to document_chunks.embedding.
Fetch raw chunks broadly with fetch_limit = limit*5, then compress them per doc.

New: compress_chunks_to_docs(chunks, limit) → (doc_results, chunks_by_doc)
- doc_results holds only the best-scoring chunk per doc_id (fusion input)
- all raw chunks are preserved in the chunks_by_doc dict (for the Phase 1.3 reranker)
- prevents RRF from giving a false boost to a doc that appears multiple times
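
A minimal sketch of the compression step described above, not the repository's actual code. The `Chunk` class is a hypothetical stand-in for a chunk-level hit, and it assumes a higher score means a better match and that chunks_by_doc keeps every doc seen (not only the top `limit`).

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Chunk:
    # Hypothetical stand-in for one chunk-level vector hit.
    doc_id: int
    chunk_id: int
    score: float  # assumption: higher is better


def compress_chunks_to_docs(chunks, limit):
    """Keep the best-scoring chunk per doc_id; retain all raw chunks by doc."""
    chunks_by_doc = defaultdict(list)
    best = {}
    for c in chunks:
        chunks_by_doc[c.doc_id].append(c)
        if c.doc_id not in best or c.score > best[c.doc_id].score:
            best[c.doc_id] = c
    # One entry per doc, best first, truncated to the requested limit.
    doc_results = sorted(best.values(), key=lambda c: c.score, reverse=True)[:limit]
    return doc_results, dict(chunks_by_doc)


raw = [Chunk(1, 10, 0.9), Chunk(1, 11, 0.8), Chunk(2, 20, 0.7), Chunk(3, 30, 0.6)]
doc_results, chunks_by_doc = compress_chunks_to_docs(raw, limit=2)
```

Here doc 1 contributes a single entry to doc_results even though two of its chunks matched, while both chunks remain available in chunks_by_doc[1].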

SearchResult: add optional fields chunk_id / chunk_index / section_title.
- None for text search results (doc-level)
- populated for vector search results (chunk-level)
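
As an illustration only, the optional fields could look like the following. The real SearchResult may be a different kind of model; the `id` and `score` fields and the `is_chunk_level` helper are hypothetical, and only the three chunk fields come from the commit message.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SearchResult:
    id: int       # hypothetical: document id
    score: float  # hypothetical: retrieval score
    # Optional chunk-level fields named in the commit message.
    chunk_id: Optional[int] = None
    chunk_index: Optional[int] = None
    section_title: Optional[str] = None


def is_chunk_level(result: SearchResult) -> bool:
    # Hypothetical helper: vector hits carry a chunk_id, text hits do not.
    return result.chunk_id is not None


text_hit = SearchResult(id=1, score=0.4)
vector_hit = SearchResult(id=1, score=0.9, chunk_id=10, chunk_index=3,
                          section_title="Intro")
```

Defaulting the new fields to None keeps existing text-search call sites unchanged.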

search.py flow:
1. raw_chunks = await search_vector(...)
2. vector_results, chunks_by_doc = compress_chunks_to_docs(raw_chunks, limit)
3. fusion(text_results, vector_results) (doc-level)
4. (Phase 1.3) chunks_by_doc → reranker (chunk-level)
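
Why step 2 must precede step 3 can be seen with a toy example. This sketch uses the standard reciprocal rank fusion formula with the conventional constant k=60; the repository's own fusion function is not shown here, so the `rrf` helper and the sample rankings are assumptions.

```python
def rrf(rankings, k=60):
    """Standard RRF: each list entry adds 1 / (k + rank) to its doc's score."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return scores


text_ranking = ["B", "A"]
raw_vector = ["A", "A", "A", "B"]  # chunk-level hits: doc A matched three times
compressed_vector = ["A", "B"]     # one entry per doc after compression

boosted = rrf([text_ranking, raw_vector])     # A collects three vector terms
fair = rrf([text_ranking, compressed_vector])  # A and B each collect one
```

With the raw chunk list, doc A outscores doc B purely because it appears several times; after compression the two docs tie, since each is ranked first in one list and second in the other.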

debug notes: verify the flow with raw=N compressed=M unique_docs=K.

Data dependency: validate against the evaluation set once reindexing completes (reindex_all_chunks.py is in progress).

2026-04-08 12:36:47 +09:00

__init__.py

feat: scaffold v2 project structure with Docker, FastAPI, and config

2026-04-02 10:20:15 +09:00

auth.py

fix: set refresh cookie secure=False, samesite=lax for reverse proxy chain

2026-04-03 06:53:59 +09:00

dashboard.py

fix: address 5 Codex review findings (1 critical + 4 high)

2026-04-06 07:15:13 +09:00

documents.py

fix: exclude News from the sidebar tree + hide ☰ on the news page