Root cause (from review): field_coverage compared English JRXML field names
against Chinese OCR field names using set intersection, so it was always zero.
With its 0.5 weight in the score formula, this made valid JRXML (XSD pass, 82%
element coverage) score 0.41 < 0.5 → fail → correction loop → progressive
destruction.
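The arithmetic behind the 0.41 can be reproduced in a few lines. This is only a sketch: the field names are made up for illustration, and dividing by the number of OCR fields is an assumption about how field_coverage is normalized.

```python
# Hypothetical field names, for illustration only.
jrxml_fields = {"invoice_no", "amount", "date"}   # English names from the JRXML
ocr_fields = {"发票号", "金额", "日期"}             # Chinese names from OCR

# Set intersection across two languages never matches anything.
matched = jrxml_fields & ocr_fields
field_coverage = len(matched) / len(ocr_fields)   # assumed normalization

element_coverage = 0.82
# Old formula: 0.5 * 0.82 + 0.5 * 0 = 0.41, below the 0.5 threshold.
old_score = 0.5 * element_coverage + 0.5 * field_coverage
print(old_score, old_score < 0.5)
```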
Changes:
- Scoring weight: element_coverage 0.8 + field_coverage 0.2 (was 0.5+0.5)
- Validate node: only fail on fidelity when BOTH score<0.5 AND element_coverage<0.4
- Field name regex: \w+ → [^"]+ to support non-ASCII field names
- Field matching: also try _sanitize_field_name conversion (Chinese→_uXXXX_)
- correction.md: namespace check always active, not conditional on error keywords
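A minimal sketch of the new weighting and the two-condition gate described above. The function names are hypothetical and do not come from the codebase; only the weights and thresholds are taken from this change.

```python
def fidelity_score(element_coverage: float, field_coverage: float) -> float:
    """Weighted score with the new 0.8 / 0.2 split (hypothetical helper)."""
    return 0.8 * element_coverage + 0.2 * field_coverage


def fails_fidelity(element_coverage: float, field_coverage: float) -> bool:
    """Fail only when BOTH the score and the element coverage are low."""
    score = fidelity_score(element_coverage, field_coverage)
    return score < 0.5 and element_coverage < 0.4


# The case from the review: 82% element coverage, zero field coverage.
print(fidelity_score(0.82, 0.0), fails_fidelity(0.82, 0.0))
```

With these rules the reviewed report no longer enters the correction loop, because its score is above 0.5 even when field_coverage is zero. A report with a low score but element coverage of at least 0.4 also passes the fidelity check.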
Root cause: when given the full 34k-char JRXML, the LLM regenerated the report
from scratch instead of modifying coordinates in place, shrinking the output
to ~3k chars.
Solution (programmatic node control, not prompt engineering):
- New agent/jrxml_windower.py: decompose JRXML into header (never sent to
LLM) + individual bands. Split bands >4000 chars at element boundaries.
Reassemble with element count validation (>10% change = rollback).
- Rewrite refine_layout: per-band windowed LLM processing (~2-4k chars
each). LLM cannot "reimagine" the entire report.
- Rewrite map_fields: fully programmatic regex replacement of $F{field_N}
with the real field name. Zero LLM calls, zero content loss.
- _sanitize_field_name: non-ASCII chars escaped to _uXXXX_ format for
valid JRXML identifiers.
- Tests: 48 new unit tests (windower 28 + map_fields 20). All passing.
Full suite 385 tests, zero regressions.
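Two of the programmatic controls above, the element-count rollback and the regex field mapping, could look roughly like the sketch below. All function names are hypothetical. Assumptions not confirmed by this change: elements are counted with ElementTree, a parse failure also triggers rollback, the escape uses four uppercase hex digits, and placeholders look like field_<digits>.

```python
import re
import xml.etree.ElementTree as ET


def count_elements(xml_text: str) -> int:
    """Count every element in a well-formed XML fragment, root included."""
    return sum(1 for _ in ET.fromstring(xml_text).iter())


def accept_or_rollback(original: str, revised: str, tolerance: float = 0.10) -> str:
    """Keep the LLM's revised band unless its element count moved by >10%."""
    try:
        before, after = count_elements(original), count_elements(revised)
    except ET.ParseError:
        return original  # assumption: unparseable output is also rolled back
    if abs(after - before) > tolerance * before:
        return original
    return revised


def sanitize_field_name_sketch(name: str) -> str:
    """Escape each non-ASCII character as _uXXXX_ (assumed hex format)."""
    return "".join(ch if ch.isascii() else f"_u{ord(ch):04X}_" for ch in name)


# Assumed placeholder shape: $F{field_1}, $F{field_2}, ...
PLACEHOLDER = re.compile(r"\$F\{(field_\d+)\}")


def map_fields_sketch(jrxml: str, mapping: dict) -> str:
    """Replace $F{field_N} with the sanitized real name; no LLM involved."""
    def repl(match):
        real = mapping.get(match.group(1))
        if real is None:
            return match.group(0)  # unknown placeholder: leave untouched
        return "$F{" + sanitize_field_name_sketch(real) + "}"
    return PLACEHOLDER.sub(repl, jrxml)


band = "<band><a/><b/><c/></band>"
print(accept_or_rollback(band, "<band><a/></band>") == band)
print(map_fields_sketch("$F{field_1} + $F{field_9}", {"field_1": "amount"}))
```

Because the replacement runs as a callback over matches only, text outside the placeholders is never rewritten, which is what makes the "zero content loss" property easy to test.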
$F{field_N} was being parsed by str.format() as a replacement field,
raising a KeyError that crashed the correct_jrxml node.
Changed to $F{{field_N}} (double braces -> literal braces in output).
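The failure and the fix can be reproduced with plain str.format(). The template text here is invented for illustration and is not the real prompt.

```python
# Hypothetical templates; only the brace handling matters.
broken = "Fix $F{field_N} references. Errors: {errors}"
fixed = "Fix $F{{field_N}} references. Errors: {errors}"

try:
    broken.format(errors="none")
    broken_error = None
except KeyError as exc:
    # str.format() treats {field_N} as a named field and cannot find it.
    broken_error = exc

# Doubled braces are emitted as literal single braces.
rendered = fixed.format(errors="none")
print(repr(broken_error), rendered)
```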