Root cause: LLM receiving full 34k-char JRXML would regenerate from scratch
instead of modifying coordinates in-place, shrinking output to ~3k chars.
Solution (programmatic node control, not prompt engineering):
- New agent/jrxml_windower.py: decompose JRXML into header (never sent to
LLM) + individual bands. Split bands >4000 chars at element boundaries.
Reassemble with element count validation (>10% change = rollback).
- Rewrite refine_layout: per-band windowed LLM processing (~2-4k chars
each). LLM cannot "reimagine" the entire report.
- Rewrite map_fields: 100% programmatic regex $F{field_N} -> real name
replacement. Zero LLM calls, zero content loss.
- _sanitize_field_name: non-ASCII chars escaped to _uXXXX_ format for
valid JRXML identifiers.
- Tests: 48 new unit tests (windower 28 + map_fields 20). All passing.
Full suite 385 tests, zero regressions.
- Replace truncated 12-char UUID with full 32-char UUID (128-bit entropy)
- Add validate_session_id() regex check to prevent path traversal
- Add _check_session_id() guard on all 6 API endpoints
- Change _step_counter from module global to contextvars.ContextVar
- Filter None values from node_state before merging into agent_state
- Log save_session failures instead of silently swallowing them
- Add finishStreaming() in catch/finally blocks to prevent UI lockup
- Fix broken multiline docstring in chat() endpoint
api_server.py passed "jpg" (no dot) from rsplit, but file_parser.py
parser dict keys all have dots (".jpg"), causing image files to fall
through to _parse_text() which fails on binary data, skipping ALL OCR
and layout analysis. Every image upload was affected.
- file_parser.py: normalize file_type to always have leading dot
- api_server.py: use Path.suffix instead of manual rsplit
3-phase pipeline to solve LLM prompt overflow from too many OCR elements:
Phase 1 (generate_skeleton): compressed layout schema → skeleton JRXML
Phase 2 (refine_layout): sampled coordinates → pixel-level position tuning
Phase 3 (map_fields): OCR field names → replace $F{field_N} placeholders
Only triggered when layout_schema.total_rows > 0 on initial_generation intent.
Text requests and all other intents are unaffected (zero behavior change).
- OCR: EasyOCR (primary, ch_sim+en) with PaddleOCR fallback for Windows compatibility
- Validation: _check_minimum_content() rejects empty-shell JRXML (no band/textField)
- Retry: MAX_RETRY 3→5, exhaustion records pending_failure_context for next-turn auto-injection
- Finalize: only saves jrxml_versions on pass, preserves last good final_jrxml on fail
- Extract JRXML: improved empty markdown block handling and XML fragment fallback
- UI: real-time node progress via placeholder updates, initial "analyzing" feedback
- UI: use agent_state (full) instead of node_state (partial) for summary card routing
- UI: unknown template_type now gives LLM meaningful image context instead of metadata
- Docs: updated CLAUDE.md and CODE_GUIDE.md to reflect all v3 changes
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Add rag submodule for semantic JRXML chunk retrieval, refactor
retrieve node to use RAGSearcher, and fix missing api_key in
Anthropic SDK client initialization.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
The langchain-anthropic wrapper fails auth with MiniMax because
it sends an api_key that conflicts with ANTHROPIC_AUTH_TOKEN at
the SDK level, causing the request to be sent with incorrect
auth headers. Use raw Anthropic SDK directly with a simple
MiniMaxLLM wrapper class instead.
Root cause: MiniMax requires the API key ONLY via ANTHROPIC_AUTH_TOKEN
(system env), not via api_key parameter or OPENAI_API_KEY. Setting
os.environ["NO_PROXY"]="*" is also needed to prevent httpx from
using a proxy that interferes with the auth header.
Note: E2E testing with streamlit run app.py still pending.
- Add LLM_PROVIDER env var (openai/anthropic) to switch cloud backend
- Use ChatAnthropic for anthropic provider with custom base_url
- Add CONTEXT_MAX_TOKENS, CONTEXT_KEEP_RECENT, SESSIONS_DIR,
HISTORY_MAX_SNAPSHOTS to .env and .env.example
- Add langchain-anthropic dependency to requirements.txt
Note: E2E testing blocked — the configured MiniMax API key
(sk-cp-...) returns 401 across all endpoints (Anthropic and OpenAI).
The API key may be expired or lack text-generation model access.