agent_jrxml

panda/agent_jrxml

Fork 0

Commit Graph

Author	SHA1	Message	Date
panda	bd5bfbac2d	fix: band-level windowed refine_layout + programmatic map_fields to prevent 91.5% content loss Root cause: LLM receiving full 34k-char JRXML would regenerate from scratch instead of modifying coordinates in-place, shrinking output to ~3k chars. Solution (programmatic node control, not prompt engineering): - New agent/jrxml_windower.py: decompose JRXML into header (never sent to LLM) + individual bands. Split bands >4000 chars at element boundaries. Reassemble with element count validation (>10% change = rollback). - Rewrite refine_layout: per-band windowed LLM processing (~2-4k chars each). LLM cannot "reimagine" the entire report. - Rewrite map_fields: 100% programmatic regex $F{field_N} -> real name replacement. Zero LLM calls, zero content loss. - _sanitize_field_name: non-ASCII chars escaped to _uXXXX_ format for valid JRXML identifiers. - Tests: 48 new unit tests (windower 28 + map_fields 20). All passing. Full suite 385 tests, zero regressions.	2026-05-24 08:55:38 +08:00
panda	4dfc418fc5	fix: escape {field_N} braces in prompt templates to prevent .format() KeyError $F{field_1} literal text in skeleton_generation/refine_layout/field_mapping prompts was being parsed as Python .format() placeholder, causing KeyError on every image-based initial_generation request. Escaped with double braces so .format() outputs literal {field_1} for the LLM.	2026-05-22 08:12:56 +08:00
panda	43a0542a11	feat: layered precise generation for A4 report images 3-phase pipeline to solve LLM prompt overflow from too many OCR elements: Phase 1 (generate_skeleton): compressed layout schema → skeleton JRXML Phase 2 (refine_layout): sampled coordinates → pixel-level position tuning Phase 3 (map_fields): OCR field names → replace $F{field_N} placeholders Only triggered when layout_schema.total_rows > 0 on initial_generation intent. Text requests and all other intents are unaffected (zero behavior change).	2026-05-21 08:34:32 +08:00

Author

SHA1

Message

Date

panda

bd5bfbac2d

fix: band-level windowed refine_layout + programmatic map_fields to prevent 91.5% content loss

Root cause: LLM receiving full 34k-char JRXML would regenerate from scratch
instead of modifying coordinates in-place, shrinking output to ~3k chars.

Solution (programmatic node control, not prompt engineering):

- New agent/jrxml_windower.py: decompose JRXML into header (never sent to
  LLM) + individual bands. Split bands >4000 chars at element boundaries.
  Reassemble with element count validation (>10% change = rollback).

- Rewrite refine_layout: per-band windowed LLM processing (~2-4k chars
  each). LLM cannot "reimagine" the entire report.

- Rewrite map_fields: 100% programmatic regex $F{field_N} -> real name
  replacement. Zero LLM calls, zero content loss.

- _sanitize_field_name: non-ASCII chars escaped to _uXXXX_ format for
  valid JRXML identifiers.

- Tests: 48 new unit tests (windower 28 + map_fields 20). All passing.
  Full suite 385 tests, zero regressions.

2026-05-24 08:55:38 +08:00

panda

4dfc418fc5

fix: escape {field_N} braces in prompt templates to prevent .format() KeyError

$F{field_1} literal text in skeleton_generation/refine_layout/field_mapping
prompts was being parsed as Python .format() placeholder, causing KeyError
on every image-based initial_generation request. Escaped with double braces
so .format() outputs literal {field_1} for the LLM.

2026-05-22 08:12:56 +08:00

panda

43a0542a11

feat: layered precise generation for A4 report images

3-phase pipeline to solve LLM prompt overflow from too many OCR elements:
Phase 1 (generate_skeleton): compressed layout schema → skeleton JRXML
Phase 2 (refine_layout): sampled coordinates → pixel-level position tuning
Phase 3 (map_fields): OCR field names → replace $F{field_N} placeholders

Only triggered when layout_schema.total_rows > 0 on initial_generation intent.
Text requests and all other intents are unaffected (zero behavior change).

2026-05-21 08:34:32 +08:00

3 Commits