fix: band-level windowed refine_layout + programmatic map_fields to prevent 91.5% content loss

Root cause: when given the full 34k-char JRXML, the LLM would regenerate the
report from scratch instead of modifying coordinates in place, shrinking the
output to ~3k chars.

Solution (programmatic node control, not prompt engineering):
- New agent/jrxml_windower.py: decompose the JRXML into a header (never sent
  to the LLM) plus individual bands. Split bands larger than 4000 chars at
  element boundaries. Reassemble with element count validation (a change of
  more than 10% triggers a rollback).
- Rewrite refine_layout: per-band windowed LLM processing (~2-4k chars
  each), so the LLM cannot "reimagine" the entire report.
- Rewrite map_fields: 100% programmatic regex replacement of
  $F{field_N} with the real field name. Zero LLM calls, zero content loss.
- _sanitize_field_name: non-ASCII chars are escaped to the _uXXXX_ format to
  produce valid JRXML identifiers.
- Tests: 48 new unit tests (windower 28 + map_fields 20), all passing.
  Full suite: 385 tests, zero regressions.
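
The programmatic field mapping described above can be sketched roughly as follows. This is a minimal illustration, not the code from this commit: the public names sanitize_field_name and map_fields, the mapping dict shape, the uppercase four-digit hex in the _uXXXX_ escape, and the choice to leave unmapped placeholders untouched are all assumptions.

```python
import re

# Matches placeholder references such as $F{field_1}; the placeholder
# naming scheme (field_N) is taken from the commit message.
FIELD_REF = re.compile(r"\$F\{(field_\d+)\}")


def sanitize_field_name(name: str) -> str:
    """Escape every non-ASCII character as _uXXXX_ (hex code point).

    Assumption: uppercase hex, zero-padded to at least four digits.
    """
    return "".join(
        ch if ch.isascii() else f"_u{ord(ch):04X}_"
        for ch in name
    )


def map_fields(jrxml: str, mapping: dict[str, str]) -> str:
    """Replace $F{field_N} references with real (sanitized) field names.

    Placeholders missing from `mapping` are left unchanged, so the text
    outside the matched references is never rewritten.
    """
    def _replace(match: re.Match) -> str:
        real_name = mapping.get(match.group(1))
        if real_name is None:
            return match.group(0)  # unknown placeholder: keep as-is
        return "$F{" + sanitize_field_name(real_name) + "}"

    return FIELD_REF.sub(_replace, jrxml)
```

Because the substitution only touches the matched $F{...} spans, everything else in the JRXML passes through byte for byte, which is the property the commit relies on to avoid content loss.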
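
The element count validation with rollback could look something like the sketch below. It is an assumption-laden illustration rather than the contents of agent/jrxml_windower.py: the helper names count_elements and accept_or_rollback are hypothetical, elements are counted with a simple opening-tag regex instead of a real XML parser, and the check is applied to a single band.

```python
import re

# Counts opening and self-closing tags; closing tags (</...>), comments
# and CDATA markers (<!...) do not match because the character after
# "<" must be a letter or underscore.
OPEN_TAG = re.compile(r"<[A-Za-z_][\w:.-]*")


def count_elements(xml_fragment: str) -> int:
    """Approximate the number of XML elements in a fragment."""
    return len(OPEN_TAG.findall(xml_fragment))


def accept_or_rollback(original: str, refined: str,
                       tolerance: float = 0.10) -> str:
    """Return `refined` unless its element count drifted too far.

    If the relative change in element count exceeds `tolerance`
    (10% by default), the LLM output is discarded and the original
    band is returned unchanged.
    """
    before = count_elements(original)
    after = count_elements(refined)
    if before == 0:
        # Nothing to compare against: accept only if still empty.
        return refined if after == 0 else original
    change = abs(after - before) / before
    return original if change > tolerance else refined
```

A coordinate-only edit keeps the element count identical and is accepted, while a response that drops or invents elements is rejected in favour of the original band.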
@@ -150,6 +150,12 @@ def _get_searcher() -> RAGSearcher:
     return _searcher
 
 
-def search_chunks(query: str, k: int = 5) -> str:
-    """Search the JRXML knowledge base and return the concatenated context text (convenience function)."""
+def search_chunks(query: str, k: int = 5, kb_id: str = "") -> str:
+    """Search the knowledge base and return the concatenated context text.
+
+    If kb_id is given, use that KB's dedicated ChromaDB; otherwise use the global default store.
+    """
+    if kb_id:
+        from backend.kb_searcher import search_kb
+        return search_kb(kb_id, query, k=k)
     return _get_searcher().search_as_context(query, k=k)