Commit Graph

10 Commits

Author SHA1 Message Date
panda bd5bfbac2d fix: band-level windowed refine_layout + programmatic map_fields to prevent 91.5% content loss
Root cause: LLM receiving full 34k-char JRXML would regenerate from scratch
instead of modifying coordinates in-place, shrinking output to ~3k chars.

Solution (programmatic node control, not prompt engineering):

- New agent/jrxml_windower.py: decompose JRXML into header (never sent to
  LLM) + individual bands. Split bands >4000 chars at element boundaries.
  Reassemble with element count validation (>10% change = rollback).

- Rewrite refine_layout: per-band windowed LLM processing (~2-4k chars
  each). LLM cannot "reimagine" the entire report.

- Rewrite map_fields: 100% programmatic regex $F{field_N} -> real name
  replacement. Zero LLM calls, zero content loss.

- _sanitize_field_name: non-ASCII chars escaped to _uXXXX_ format for
  valid JRXML identifiers.

- Tests: 48 new unit tests (windower 28 + map_fields 20). All passing.
  Full suite 385 tests, zero regressions.
2026-05-24 08:55:38 +08:00
panda bb6cc6e241 feat: add Java JRXML-to-PNG rendering pipeline with pixel-level SSIM comparison
- lib/java/: Java renderer (JrxmlRenderer) using JasperReports 6.21.0
  - JrxmlDebug for diagnostics, JrxmlGen for format reference
  - download_jars.sh for one-time dependency setup
- agent/nodes.py: _render_jrxml_to_png() and _compute_pixel_similarity()
  - Pixel comparison integrates into validate node (SSIM < 0.4 fails)
  - Pixel fidelity context injected into correct_jrxml for targeted fixes
- tests/test_pixel_comparison.py: 15 unit tests (render, SSIM, integration)
- .gitignore: exclude lib/java/*.jar, lib/java/*.class, tmp/
- CLAUDE.md: v11 changelog documenting the rendering pipeline
- All non-LLM tests pass (97/97)
2026-05-23 15:09:55 +08:00
panda 93ad5e8876 fix: address audit findings — session_id validation, streaming reset, state isolation
- Replace truncated 12-char UUID with full 32-char UUID (128-bit entropy)
- Add validate_session_id() regex check to prevent path traversal
- Add _check_session_id() guard on all 6 API endpoints
- Change _step_counter from module global to contextvars.ContextVar
- Filter None values from node_state before merging into agent_state
- Log save_session failures instead of silently swallowing them
- Add finishStreaming() in catch/finally blocks to prevent UI lockup
- Fix broken multiline docstring in chat() endpoint
2026-05-23 09:08:53 +08:00
panda 1952d75f13 test: add unit/integration/E2E test suites, fix create_session bug, update docs
- Unit tests: test_session.py (27), test_error_kb.py (24), test_agent.py hardened
- Integration tests: test_api_integration.py (25) with FastAPI TestClient
- E2E tests: main-flows.spec.ts (8) with Playwright + API mocking
- Bug fix: backend/session.py create_session() missing session_id parameter
- Config: frontend/playwright.config.ts, npm run test:e2e
- Docs: update CLAUDE.md v9, .gitignore for test artifacts/eval reports
2026-05-23 08:38:29 +08:00
panda 1144a86d02 fix: session persistence, multi-turn memory, OCR pipeline, download UX (v7)
- graph.stream() state fix: agent_state now properly accumulates node updates

- atomic session save (tempfile + os.replace)

- uploaded_file_path injection for OcrExtractor + annotation_detector

- download section always visible; refreshFromApi auto-reloads after generation

- node_start/complete unfiltered for full progress visibility

- modification_request without status=='pass' check
2026-05-22 11:13:25 +08:00
panda 339d415322 fix: crash 'list' object has no attribute 'keys' on image upload, output disappearing on error
Root cause: layout_schema.regions is a list of region dicts, not a dict.
_log_ocr_layers() was calling .keys() on it, causing agent_error.

Also fixed: ProcessSection now stays visible after streaming ends (error or
completion), so generated content is not lost. Header shows ✓/✕/pulse indicators.
Error handler now refreshes session state for partial JRXML download.
2026-05-22 00:01:54 +08:00
panda d600cbf285 feat: add quick action buttons (preview/undo/reset) to sidebar
Sidebar now has 快捷操作 section matching Streamlit app functionality:
- 预览 — sends "预览报表" to preview current JRXML
- 撤销 — sends "撤销上一步修改" to revert last change
- 重置 — sends "重新来,清空当前报表" to reset session

Session store now tracks history_states for undo availability check.
2026-05-21 23:54:57 +08:00
panda a364e1de81 feat: 5-issue fix — OCR image parse bug + Vue frontend feature parity + streaming UX
Fix 1 (CRITICAL): file_parser.py suffix normalization ".jpg", api_server.py Path.suffix
Fix 2: Sidebar version history download, ProcessSection replaces old components
Fix 3: OCR content/position layer structured logging in agent/nodes.py
Fix 4: collapsible process sections with per-section stream routing + auto-fold
Fix 5: agent_complete total_duration_ms, SummaryCard duration display

- backend/file_parser.py: normalize suffix to always include leading dot
- api_server.py: step_index in node_start, total_duration_ms in agent_complete
- agent/nodes.py: _log_ocr_layers() for [内容层]/[位置层]/[合并] logging
- frontend: ProcessSection.vue (NEW), chat.ts sections model, Sidebar versions
- CLAUDE.md: updated component list and v6 changelog
2026-05-21 23:43:21 +08:00
panda 7c1aa7d934 docs: update architecture docs for Vue 3 + FastAPI separation, add one-click start.bat
- CLAUDE.md: remove duplicate architecture section, fix MAX_RETRY 5→3
- README.md: update architecture diagram to 3-tier, add start.bat instructions
- ROADMAP.md: add 阶段六 layered generation v5 (items 16-20)
- start.bat: one-click startup with auto port-kill and path-with-spaces fix
- package-lock.json: updated from npm install
2026-05-21 22:10:22 +08:00
panda 74f3f03d2c feat: 前后端分离架构 — FastAPI SSE后端 + Vue 3前端
将单体 Streamlit 应用拆分为三层架构:
- api_server.py: FastAPI SSE 流式后端 (端口 8000)
- frontend/: Vue 3 + Vite + Pinia 聊天前端 (端口 5173)
- agent/graph.py: 新增 node_start 回调支持
- 更新启动脚本为三服务模式

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-21 20:04:27 +08:00