13 Commits

Author SHA1 Message Date
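panda 00f718fbda fix: add mandatory prompt rules for field declarations and font tag format 2026-05-26 08:23:39 +08:00
panda 520c8b19d0 fix: root-cause fixes for the five-round correction failure - remove field_coverage weight from the score formula, check namespace unconditionally, auto-detect document type via OCR 2026-05-24 22:44:37 +08:00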
panda f25a93b539 WIP: baseline on fix/retry-failure-root-causes 2026-05-24 22:38:30 +08:00
panda 2d5183d2bd fix: OCR fidelity scoring reform — prevent false fail from language-mismatched field names
Root cause (from review): field_coverage compared English JRXML field names
against Chinese OCR field names with set intersection — always zero. Combined
with 0.5 weight in score formula, caused valid JRXML (XSD pass, 82% element
coverage) to score 0.41 < 0.5 → fail → correction loop → progressive destruction.

Changes:
- Scoring weight: element_coverage 0.8 + field_coverage 0.2 (was 0.5+0.5)
- Validate node: only fail on fidelity when BOTH score<0.5 AND element_coverage<0.4
- Field name regex: \w+ → [^"]+ to support non-ASCII field names
- Field matching: also try _sanitize_field_name conversion (Chinese→_uXXXX_)
- correction.md: namespace check always active, not conditional on error keywords
2026-05-24 15:36:40 +08:00
panda bd5bfbac2d fix: band-level windowed refine_layout + programmatic map_fields to prevent 91.5% content loss
Root cause: LLM receiving full 34k-char JRXML would regenerate from scratch
instead of modifying coordinates in-place, shrinking output to ~3k chars.

Solution (programmatic node control, not prompt engineering):

- New agent/jrxml_windower.py: decompose JRXML into header (never sent to
  LLM) + individual bands. Split bands >4000 chars at element boundaries.
  Reassemble with element count validation (>10% change = rollback).

- Rewrite refine_layout: per-band windowed LLM processing (~2-4k chars
  each). LLM cannot "reimagine" the entire report.

- Rewrite map_fields: 100% programmatic regex $F{field_N} -> real name
  replacement. Zero LLM calls, zero content loss.

- _sanitize_field_name: non-ASCII chars escaped to _uXXXX_ format for
  valid JRXML identifiers.

- Tests: 48 new unit tests (windower 28 + map_fields 20). All passing.
  Full suite 385 tests, zero regressions.
2026-05-24 08:55:38 +08:00
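
Note: the programmatic map_fields step and the _uXXXX_ escaping described in bd5bfbac2d could look roughly like the sketch below. This is an illustration under assumptions, not the code in agent/: the function signatures, the uppercase four-digit hex form, and the choice to leave unmapped placeholders untouched are all guesses.

```python
import re

# Matches $F{field_1}, $F{field_27}, ... and captures the placeholder key.
_PLACEHOLDER_RE = re.compile(r"\$F\{(field_\d+)\}")


def sanitize_field_name(name: str) -> str:
    """Escape every non-ASCII character as _uXXXX_ (hex code point, assumed uppercase)."""
    return "".join(ch if ord(ch) < 128 else f"_u{ord(ch):04X}_" for ch in name)


def map_fields(jrxml: str, names: dict[str, str]) -> str:
    """Replace $F{field_N} placeholders with real field names, without any LLM call."""
    def _swap(match: re.Match) -> str:
        key = match.group(1)
        if key not in names:
            return match.group(0)  # assumption: unknown placeholders stay as-is
        return "$F{" + sanitize_field_name(names[key]) + "}"

    return _PLACEHOLDER_RE.sub(_swap, jrxml)


snippet = "<textFieldExpression>$F{field_1}</textFieldExpression> $F{field_9}"
mapped = map_fields(snippet, {"field_1": "total"})
```

Because the substitution only touches the matched placeholders, the rest of the JRXML is carried over byte for byte, which is the property the commit relies on to avoid content loss.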
panda bb6cc6e241 feat: add Java JRXML-to-PNG rendering pipeline with pixel-level SSIM comparison
- lib/java/: Java renderer (JrxmlRenderer) using JasperReports 6.21.0
  - JrxmlDebug for diagnostics, JrxmlGen for format reference
  - download_jars.sh for one-time dependency setup
- agent/nodes.py: _render_jrxml_to_png() and _compute_pixel_similarity()
  - Pixel comparison integrates into validate node (SSIM < 0.4 fails)
  - Pixel fidelity context injected into correct_jrxml for targeted fixes
- tests/test_pixel_comparison.py: 15 unit tests (render, SSIM, integration)
- .gitignore: exclude lib/java/*.jar, lib/java/*.class, tmp/
- CLAUDE.md: v11 changelog documenting the rendering pipeline
- All non-LLM tests pass (97/97)
2026-05-23 15:09:55 +08:00
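
Note: the SSIM gate from bb6cc6e241 can be illustrated with a simplified, single-window SSIM over flat grayscale pixel lists. This is a sketch only: the commit does not say which SSIM implementation _compute_pixel_similarity() uses, and production implementations normally average SSIM over sliding windows. The function names here are hypothetical; the 0.4 threshold is the one stated in the commit.

```python
def global_ssim(x: list[float], y: list[float], max_value: float = 255.0) -> float:
    """Single-window SSIM of two equally sized grayscale images given as flat lists."""
    if len(x) != len(y) or not x:
        raise ValueError("images must be non-empty and the same size")
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    vx = sum((a - mx) * (a - mx) for a in x) / n            # variance of x
    vy = sum((b - my) * (b - my) for b in y) / n            # variance of y
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
    c1 = (0.01 * max_value) ** 2                            # standard stabilizing constants
    c2 = (0.03 * max_value) ** 2
    return ((2 * mx * my + c1) * (2 * cov + c2)) / ((mx * mx + my * my + c1) * (vx + vy + c2))


def fails_pixel_check(ssim: float, threshold: float = 0.4) -> bool:
    """Validate-node rule from the commit: SSIM below 0.4 counts as a failure."""
    return ssim < threshold


reference = [0, 255, 0, 255]
inverted = [255, 0, 255, 0]
same = global_ssim(reference, reference)
opposite = global_ssim(reference, inverted)
```

Identical images score 1 and pass the gate, while an inverted image has negative covariance, so its score is negative and fails it.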
panda 9de75d2f25 fix: escape $F{field_N} in correction.md to prevent Python format KeyError
$F{field_N} was being parsed by str.format() as a replacement field,
causing KeyError and crashing correct_jrxml node.
Changed to $F{{field_N}} (double braces -> literal brace in output).
2026-05-23 11:27:31 +08:00
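
Note: the KeyError fixed in 9de75d2f25 (and earlier in 4dfc418fc5) is standard str.format() behavior and is easy to reproduce. The template strings below are invented for illustration; they are not the contents of correction.md.

```python
# A single-brace {field_N} is treated as a replacement field by str.format().
TEMPLATE_BROKEN = "Declare every $F{field_N} you use. Errors: {errors}"
# Doubling the braces makes str.format() emit them literally.
TEMPLATE_FIXED = "Declare every $F{{field_N}} you use. Errors: {errors}"


def render(template: str, **values: str) -> str:
    """Fill a prompt template the way a .format()-based loader would."""
    return template.format(**values)


try:
    render(TEMPLATE_BROKEN, errors="undeclared field")
    broken_error = None
except KeyError as exc:
    broken_error = exc.args[0]  # name of the missing replacement field

fixed_prompt = render(TEMPLATE_FIXED, errors="undeclared field")
```

The broken template raises KeyError for `field_N`. The fixed one renders the literal `$F{field_N}` that the LLM is supposed to see.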
panda 23cdfa8c2b fix: map_fields empty-retry + correction prompt field_N guidance
- map_fields: retry with simplified prompt on empty LLM response
- correction.md: add explicit guidance for undeclared field_N errors
  (add <field> declarations + try OCR name replacement)
- MAX_RETRY=5 now effective (was overridden by .env:3)
2026-05-23 11:15:09 +08:00
panda 1210b926c3 fix: MAX_RETRY 5 + rolling continuation + namespace-aware JRXML extraction
- MAX_RETRY: 3→5 (graph.py:35, nodes.py:25) with env override
- Rolling continuation: _generate_with_continuation() auto-detects
  truncated JRXML and sends anchor-based continuation, max 3 rounds
- JRXML extraction: regex/end-tag now namespace-prefix aware
  (ns0:jasperReport, ns:jasperReport, etc.)
- All 5 generation nodes refactored to use continuation helper
- Tests updated: scenario1 accepts ns-prefixed root, max_retry
  verifies graph termination
- stop_reason capture + WARNING log on max_tokens truncation
- Correction prompt now injects OCR context + layout schema
2026-05-23 10:58:46 +08:00
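
Note: a namespace-prefix-aware extraction like the one described in 1210b926c3 might be sketched as follows. The regex and the function names are assumptions for illustration, and the sketch does not handle `>` inside root attribute values. The "truncated" check simply treats a missing closing root tag as truncation, which is one plausible reading of the commit.

```python
import re
from typing import Optional

# Capture the root tag name with its optional prefix (jasperReport,
# ns0:jasperReport, ...) and require the SAME name in the closing tag.
_JRXML_RE = re.compile(
    r"<((?:[\w.-]+:)?jasperReport)\b[^>]*>.*?</\1\s*>",
    re.DOTALL,
)


def extract_jrxml(text: str) -> Optional[str]:
    """Return the first complete jasperReport document in text, or None."""
    match = _JRXML_RE.search(text)
    return match.group(0) if match else None


def looks_truncated(text: str) -> bool:
    """True when a root tag opens but no matching closing tag is found."""
    opened = re.search(r"<(?:[\w.-]+:)?jasperReport\b", text) is not None
    return opened and extract_jrxml(text) is None


reply = 'Here you go: <ns0:jasperReport name="r"><ns0:title/></ns0:jasperReport> Done.'
extracted = extract_jrxml(reply)
```

A continuation helper could call looks_truncated() on each LLM reply and request another round only while it returns True, up to the three-round cap mentioned in the commit.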
panda 4dfc418fc5 fix: escape {field_N} braces in prompt templates to prevent .format() KeyError
$F{field_1} literal text in skeleton_generation/refine_layout/field_mapping
prompts was being parsed as Python .format() placeholder, causing KeyError
on every image-based initial_generation request. Escaped with double braces
so .format() outputs literal {field_1} for the LLM.
2026-05-22 08:12:56 +08:00
panda 43a0542a11 feat: layered precise generation for A4 report images
3-phase pipeline to solve LLM prompt overflow from too many OCR elements:
Phase 1 (generate_skeleton): compressed layout schema → skeleton JRXML
Phase 2 (refine_layout): sampled coordinates → pixel-level position tuning
Phase 3 (map_fields): OCR field names → replace $F{field_N} placeholders

Only triggered when layout_schema.total_rows > 0 on initial_generation intent.
Text requests and all other intents are unaffected (zero behavior change).
2026-05-21 08:34:32 +08:00
panda 9bb011e429 feat: v4 multimodal chat input, multi-format support, and annotation detection
- Replace st.chat_input with st-multimodal-chatinput (Ctrl+V paste, drag-drop, file button)
- Extract _process_uploaded_file() shared handler (eliminates ~70 duplicated lines)
- Add XLSX (openpyxl), XLS (xlrd), DOC (olefile) parsers to file_parser.py
- Add backend/annotation_detector.py: circle detection (HoughCircles) + arrow detection (HoughLinesP clustering) + OCR correlation + LLM context formatting
- Add annotation_result field to AgentState with session persistence
- Wire annotation detection into process_input and _format_ocr_context
- Add 11 new tests: 7 annotation detector + 4 multi-format parser
- Update all docs: CLAUDE.md, README.md, CODE_GUIDE.md, ROADMAP.md
2026-05-20 23:43:16 +08:00
panda 70614dff5e feat: comprehensive v2 upgrade — streaming, error KB, file upload, layout analysis
Major changes:
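- Streaming: unified _BaseLLM interface for LLMs (invoke + stream); generate/modify/correct
  nodes use get_stream_writer() for character-by-character output; UI nodes are laid out flat, expanded, and auto-collapsed
- Prompt externalization: 7 prompts split into prompts/*.md; loader.py supports hot reload
- Self-growing error KB: backend/error_kb.py, with fingerprint deduplication + ChromaDB persistence;
  entries are stored automatically when correct_jrxml→validate passes; retrieve also searches the error KB
- File upload: backend/file_parser.py, with PDF/DOCX/image/text parsing;
  multi-file upload in the sidebar; extracted text is injected automatically into the next message
- A4 template recognition: backend/layout_analyzer.py, with three modes (full A4 / row-fragment modification / row-fragment creation);
  PaddleOCR element extraction + row grouping + JRXML section matching
- Session history download: jrxml_versions version tracking + sidebar download buttons for historical versions
- Preview fix: route_after_save skips the validation loop for preview/export intents
- Ctrl+C fix: injected JS intercepts Streamlit's bare "c" key clear-cache shortcut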

Docs: CLAUDE.md (complete project documentation), ROADMAP.md (improvement roadmap)

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-19 15:02:53 +08:00