agent_jrxml

Author	SHA1	Message	Date
panda	bd5bfbac2d	fix: band-level windowed refine_layout + programmatic map_fields to prevent 91.5% content loss. Root cause: when given the full 34k-char JRXML, the LLM regenerated it from scratch instead of modifying coordinates in place, shrinking the output to ~3k chars. Solution (programmatic node control, not prompt engineering): - New agent/jrxml_windower.py: decompose JRXML into a header (never sent to the LLM) + individual bands. Split bands >4000 chars at element boundaries. Reassemble with element-count validation (>10% change = rollback). - Rewrite refine_layout: per-band windowed LLM processing (~2-4k chars each), so the LLM cannot "reimagine" the entire report. - Rewrite map_fields: 100% programmatic regex replacement of $F{field_N} with real names. Zero LLM calls, zero content loss. - _sanitize_field_name: non-ASCII chars escaped to _uXXXX_ format for valid JRXML identifiers. - Tests: 48 new unit tests (windower 28 + map_fields 20), all passing. Full suite: 385 tests, zero regressions.	2026-05-24 08:55:38 +08:00
panda	bb6cc6e241	feat: add Java JRXML-to-PNG rendering pipeline with pixel-level SSIM comparison - lib/java/: Java renderer (JrxmlRenderer) using JasperReports 6.21.0 - JrxmlDebug for diagnostics, JrxmlGen for format reference - download_jars.sh for one-time dependency setup - agent/nodes.py: _render_jrxml_to_png() and _compute_pixel_similarity() - Pixel comparison integrates into validate node (SSIM < 0.4 fails) - Pixel fidelity context injected into correct_jrxml for targeted fixes - tests/test_pixel_comparison.py: 15 unit tests (render, SSIM, integration) - .gitignore: exclude lib/java/*.jar, lib/java/*.class, tmp/ - CLAUDE.md: v11 changelog documenting the rendering pipeline - All non-LLM tests pass (97/97)	2026-05-23 15:09:55 +08:00
panda	93ad5e8876	fix: address audit findings — session_id validation, streaming reset, state isolation - Replace truncated 12-char UUID with full 32-char UUID (128-bit entropy) - Add validate_session_id() regex check to prevent path traversal - Add _check_session_id() guard on all 6 API endpoints - Change _step_counter from module global to contextvars.ContextVar - Filter None values from node_state before merging into agent_state - Log save_session failures instead of silently swallowing them - Add finishStreaming() in catch/finally blocks to prevent UI lockup - Fix broken multiline docstring in chat() endpoint	2026-05-23 09:08:53 +08:00
panda	1e5ce9725b	feat: FastAPI+SSE API server, JRXML auto-reorder, session integrity fixes	2026-05-22 17:53:59 +08:00
panda	1144a86d02	fix: session persistence, multi-turn memory, OCR pipeline, download UX (v7) - graph.stream() state fix: agent_state now properly accumulates node updates - atomic session save (tempfile + os.replace) - uploaded_file_path injection for OcrExtractor + annotation_detector - download section always visible; refreshFromApi auto-reloads after generation - node_start/complete unfiltered for full progress visibility - modification_request without status=='pass' check	2026-05-22 11:13:25 +08:00
panda	a364e1de81	feat: 5-issue fix (OCR image parse bug + Vue frontend feature parity + streaming UX). Fix 1 (CRITICAL): file_parser.py suffix normalization (".jpg"), api_server.py uses Path.suffix. Fix 2: Sidebar version history download; ProcessSection replaces old components. Fix 3: OCR content/position layer structured logging in agent/nodes.py. Fix 4: collapsible process sections with per-section stream routing + auto-fold. Fix 5: agent_complete total_duration_ms, SummaryCard duration display. - backend/file_parser.py: normalize suffix to always include leading dot - api_server.py: step_index in node_start, total_duration_ms in agent_complete - agent/nodes.py: _log_ocr_layers() for [内容层] (content layer)/[位置层] (position layer)/[合并] (merge) logging - frontend: ProcessSection.vue (NEW), chat.ts sections model, Sidebar versions - CLAUDE.md: updated component list and v6 changelog	2026-05-21 23:43:21 +08:00
panda	60e2f520ba	fix: image files silently falling through to text parser due to suffix dot mismatch. api_server.py passed "jpg" (no dot) from rsplit, but the parser dict keys in file_parser.py all have dots (".jpg"), so image files fell through to _parse_text(), which fails on binary data, skipping ALL OCR and layout analysis. Every image upload was affected. - file_parser.py: normalize file_type to always have leading dot - api_server.py: use Path.suffix instead of manual rsplit	2026-05-21 23:05:27 +08:00
panda	aa1d8a6c52	fix: logging KeyError with reserved 'filename' key, pytest return-not-none warnings - api_server.py: rename 'filename' to 'file_name' in upload_file log extra dict to avoid collision with Python logging's reserved LogRecord attribute - test_e2e_ocr.py: replace return statements with assert in test functions to fix PytestReturnNotNoneWarning	2026-05-21 22:28:07 +08:00
panda	74f3f03d2c	feat: frontend/backend separation with FastAPI SSE backend + Vue 3 frontend. Splits the monolithic Streamlit app into a three-tier architecture: - api_server.py: FastAPI SSE streaming backend (port 8000) - frontend/: Vue 3 + Vite + Pinia chat frontend (port 5173) - agent/graph.py: add node_start callback support - Update startup scripts to run three services Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-21 20:04:27 +08:00

9 Commits
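The band windowing described in commit bd5bfbac2d can be sketched as follows. This is a minimal sketch under stated assumptions, not the code in agent/jrxml_windower.py: it assumes bands can be found with a non-nested regex over `<band ...>...</band>`, that "element count" means the number of opening tags, and that the >10% rollback applies to the whole reassembled document. The names `segment`, `element_count_ok`, and `refine_windowed` are hypothetical, and the split of bands over 4000 chars at element boundaries is omitted.

```python
import re
from typing import Callable

# Assumption: bands are not nested, so a lazy match up to the first </band> is enough.
BAND_RE = re.compile(r"<band\b.*?</band>", re.DOTALL)
# Opening or self-closing tags; skips </...>, <?...?> and <!...> (comments, CDATA).
OPEN_TAG_RE = re.compile(r"<(?![/?!])[A-Za-z_][\w.:-]*")


def segment(jrxml: str) -> list[tuple[bool, str]]:
    """Split into (is_band, text) pieces that concatenate back to the input exactly."""
    pieces: list[tuple[bool, str]] = []
    pos = 0
    for m in BAND_RE.finditer(jrxml):
        pieces.append((False, jrxml[pos:m.start()]))  # header or text between bands
        pieces.append((True, m.group(0)))
        pos = m.end()
    pieces.append((False, jrxml[pos:]))
    return pieces


def element_count_ok(before: str, after: str, tolerance: float = 0.10) -> bool:
    """True if the number of elements changed by at most `tolerance` (10% by default)."""
    n_before = len(OPEN_TAG_RE.findall(before))
    n_after = len(OPEN_TAG_RE.findall(after))
    if n_before == 0:
        return n_after == 0
    return abs(n_after - n_before) / n_before <= tolerance


def refine_windowed(jrxml: str, refine_band: Callable[[str], str]) -> str:
    """Send only bands to `refine_band`; roll back the whole document on count drift."""
    rebuilt = "".join(
        refine_band(text) if is_band else text for is_band, text in segment(jrxml)
    )
    return rebuilt if element_count_ok(jrxml, rebuilt) else jrxml
```

Passing bands one at a time keeps the header out of the model's context, and a rewrite that drops or invents many elements is caught by the count check, in which case the original document is returned unchanged.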
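Commit bd5bfbac2d also replaces the LLM in map_fields with a regex pass and adds _sanitize_field_name. A hedged sketch of both ideas: `map_fields_sketch` and `sanitize_field_name` are hypothetical stand-ins, the uppercase hex digits in `_uXXXX_` are an assumption, and renaming the matching `<field name="...">` declarations (so the report still compiles) is an assumption about what a complete replacement needs, not something the commit states.

```python
import re

# $F{field_N} references inside expressions.
FIELD_REF_RE = re.compile(r"\$F\{(field_\d+)\}")
# <field name="field_N" ...> declarations (assumes the name attribute comes first).
FIELD_DECL_RE = re.compile(r'(<field\s+name=")(field_\d+)(")')


def sanitize_field_name(name: str) -> str:
    """Escape each non-ASCII character as _uXXXX_ (hex code point, at least 4 digits)."""
    return "".join(ch if ord(ch) < 128 else f"_u{ord(ch):04X}_" for ch in name)


def map_fields_sketch(jrxml: str, mapping: dict[str, str]) -> str:
    """Rename placeholder fields using `mapping`; unmapped placeholders are left as-is."""

    def real_name(placeholder: str) -> str:
        if placeholder in mapping:
            return sanitize_field_name(mapping[placeholder])
        return placeholder

    out = FIELD_REF_RE.sub(lambda m: "$F{" + real_name(m.group(1)) + "}", jrxml)
    return FIELD_DECL_RE.sub(lambda m: m.group(1) + real_name(m.group(2)) + m.group(3), out)
```

Because the regex captures the whole `field_\d+` token, a mapping for `field_1` never touches `$F{field_10}`, which a plain `str.replace` would corrupt.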
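Commit bb6cc6e241 gates validation on an SSIM score (below 0.4 fails). The sketch below computes a single global SSIM over two equal-size grayscale images given as flat lists of 0-255 values, with the standard constants C1 = (0.01·L)² and C2 = (0.03·L)². It is a simplification: standard SSIM averages this statistic over local windows (typically an 11x11 Gaussian), as scikit-image's `skimage.metrics.structural_similarity` does, and the commit does not say which variant `_compute_pixel_similarity()` uses. `global_ssim` and `pixel_check_passes` are hypothetical names.

```python
def global_ssim(x: list[float], y: list[float], data_range: float = 255.0) -> float:
    """Single-window SSIM over two same-size grayscale images (flat pixel lists)."""
    if not x or len(x) != len(y):
        raise ValueError("images must be non-empty and the same size")
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    vx = sum((a - mx) ** 2 for a in x) / n
    vy = sum((b - my) ** 2 for b in y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
    c1 = (0.01 * data_range) ** 2  # stabilizes the luminance term
    c2 = (0.03 * data_range) ** 2  # stabilizes the contrast/structure term
    return ((2 * mx * my + c1) * (2 * cov + c2)) / (
        (mx ** 2 + my ** 2 + c1) * (vx + vy + c2)
    )


SSIM_FAIL_THRESHOLD = 0.4  # from the commit: SSIM < 0.4 fails validation


def pixel_check_passes(rendered: list[float], reference: list[float]) -> bool:
    return global_ssim(rendered, reference) >= SSIM_FAIL_THRESHOLD
```

Identical images score 1.0, while an inverted image scores close to -1, so the 0.4 cut-off separates "roughly the same layout" from "structurally different".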
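Commit 93ad5e8876 switches to full 32-character UUIDs and validates them with a regex before use, which blocks path traversal when the id becomes part of a file path. A minimal sketch, assuming session ids are `uuid.uuid4().hex` (32 lowercase hex digits); `new_session_id` and `is_valid_session_id` are hypothetical names, not the project's validate_session_id().

```python
import re
import uuid

# Assumed format: uuid4().hex, i.e. 32 lowercase hex digits (128 bits, 122 of them random).
SESSION_ID_RE = re.compile(r"[0-9a-f]{32}")


def new_session_id() -> str:
    return uuid.uuid4().hex


def is_valid_session_id(session_id: str) -> bool:
    """Reject anything that is not exactly 32 lowercase hex digits (e.g. '../', truncated ids)."""
    return isinstance(session_id, str) and SESSION_ID_RE.fullmatch(session_id) is not None
```

Using `re.fullmatch` rather than a `^...$` pattern matters here: `$` also matches just before a trailing newline, so an id with an appended `"\n"` would pass a `^[0-9a-f]{32}$` check.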
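The atomic session save in commit 1144a86d02 (tempfile + os.replace) follows a common pattern: write the whole file to a temporary file in the same directory, flush and fsync it, then `os.replace()` it over the target, so on POSIX a reader sees either the old file or the new one, never a partial write. A minimal sketch, assuming sessions are stored as JSON; `save_session_atomic` is a hypothetical name.

```python
import json
import os
import tempfile
from pathlib import Path


def save_session_atomic(path: Path, data: dict) -> None:
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    # Temp file in the same directory so os.replace() is a same-filesystem rename.
    fd, tmp_name = tempfile.mkstemp(dir=path.parent, prefix=path.name + ".", suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as f:
            json.dump(data, f, ensure_ascii=False)
            f.flush()
            os.fsync(f.fileno())  # make sure the bytes are on disk before the rename
        os.replace(tmp_name, path)
    except BaseException:
        # Remove the partial temp file; the previous session file is left untouched.
        try:
            os.unlink(tmp_name)
        except FileNotFoundError:
            pass
        raise
```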
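The suffix bug fixed in commits 60e2f520ba and a364e1de81 is easy to reproduce: `"scan.jpg".rsplit(".", 1)[-1]` yields `"jpg"`, which never matches a `".jpg"` dict key, while `Path("scan.jpg").suffix` yields `".jpg"`. The sketch below normalizes either form; the `PARSERS` table, the lowercasing, and the text fallback are illustrative assumptions, not file_parser.py's actual contents.

```python
from pathlib import Path

# Illustrative parser table; keys carry a leading dot, like the dict described in the commit.
PARSERS = {".jpg": "image", ".jpeg": "image", ".png": "image", ".txt": "text"}


def normalize_suffix(file_type: str) -> str:
    """Return the suffix lowercased and with exactly one leading dot ('' stays '')."""
    file_type = file_type.strip().lower().lstrip(".")
    return "." + file_type if file_type else ""


def pick_parser(filename: str) -> str:
    # Path.suffix already includes the dot; normalizing also covers callers passing "jpg".
    suffix = normalize_suffix(Path(filename).suffix)
    return PARSERS.get(suffix, "text")  # unknown types fall back to the text parser
```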
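The KeyError fixed in commit aa1d8a6c52 comes from the standard library: `Logger.makeRecord()` refuses `extra` keys that would overwrite an existing LogRecord attribute, and `filename` is one of them (it holds the source file of the logging call). A small reproduction; the logger name and the `log_upload` helper are hypothetical.

```python
import logging

logger = logging.getLogger("agent_jrxml.upload_demo")  # hypothetical logger name
logger.addHandler(logging.NullHandler())  # keep the demo quiet
logger.propagate = False
logger.setLevel(logging.INFO)


def log_upload(name: str, use_reserved_key: bool) -> None:
    # 'filename' is already a LogRecord attribute, so makeRecord() raises KeyError for it.
    key = "filename" if use_reserved_key else "file_name"
    logger.info("file uploaded", extra={key: name})
```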