Commit Graph

4 Commits

Author SHA1 Message Date
panda bd5bfbac2d fix: band-level windowed refine_layout + programmatic map_fields to prevent 91.5% content loss
Root cause: LLM receiving full 34k-char JRXML would regenerate from scratch
instead of modifying coordinates in-place, shrinking output to ~3k chars.

Solution (programmatic node control, not prompt engineering):

- New agent/jrxml_windower.py: decompose JRXML into header (never sent to
  LLM) + individual bands. Split bands >4000 chars at element boundaries.
  Reassemble with element count validation (>10% change = rollback).

- Rewrite refine_layout: per-band windowed LLM processing (~2-4k chars
  each). LLM cannot "reimagine" the entire report.

- Rewrite map_fields: 100% programmatic regex $F{field_N} -> real name
  replacement. Zero LLM calls, zero content loss.

- _sanitize_field_name: non-ASCII chars escaped to _uXXXX_ format for
  valid JRXML identifiers.

- Tests: 48 new unit tests (windower 28 + map_fields 20). All passing.
  Full suite 385 tests, zero regressions.
2026-05-24 08:55:38 +08:00
panda b280c2b453 feat: integrate RAG rag_jrxml submodule and fix Anthropic API key
Add rag submodule for semantic JRXML chunk retrieval, refactor
retrieve node to use RAGSearcher, and fix missing api_key in
Anthropic SDK client initialization.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-19 09:42:57 +08:00
panda 4416c20b77 feat: update init_kb script 2026-05-15 08:29:01 +08:00
panda 4b43c5d3e4 feat: LangGraph工作流核心 — Agent状态/节点/图 + 验证服务 + 知识库
agent/
  state.py: AgentState TypedDict(20字段含意图/压缩/会话/撤销)
  nodes.py: 17个节点函数(生成/修改/验证/纠错/意图分类/压缩/撤销/重置)
  graph.py: 17节点状态图,8意图路由分发

验证服务 validation_service/
  main.py: FastAPI服务,lxml XSD验证 + 结构化检查(字段引用/SQL/尺寸)

数据 data/
  sample_templates/: 4个JRXML示例模板
  corrections/: 3个错误修正案例

脚本 scripts/
  init_kb.py: Chroma知识库初始化
2026-05-14 23:21:10 +08:00