feat: v4 multimodal chat input, multi-format support, and annotation detection
- Replace st.chat_input with st-multimodal-chatinput (Ctrl+V paste, drag-drop, file button)
- Extract _process_uploaded_file() shared handler (eliminates ~70 duplicated lines)
- Add XLSX (openpyxl), XLS (xlrd), DOC (olefile) parsers to file_parser.py
- Add backend/annotation_detector.py: circle detection (HoughCircles) + arrow detection (HoughLinesP clustering) + OCR correlation + LLM context formatting
- Add annotation_result field to AgentState with session persistence
- Wire annotation detection into process_input and _format_ocr_context
- Add 11 new tests: 7 annotation detector + 4 multi-format parser
- Update all docs: CLAUDE.md, README.md, CODE_GUIDE.md, ROADMAP.md
+39
-1
@@ -122,4 +122,42 @@
10. Structured logging system
```
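
Phase 1 can start immediately and has no external dependencies. Phase 2 is the bulk of the work. Phase 3 is wrap-up. Phase 4 lays the foundation for observability.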
---
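
## Phase 5: OCR and Smart Upload (v3/v4) ✓

### 11. Precise OCR extraction of document fields ✓
- [x] `backend/ocr_extractor.py`: extraction with 4 strategies tried in priority order (exact_match → kv_pair → regex → table_match)
- [x] After the first PaddleOCR recognition pass, the raw result (all text elements plus bbox coordinates) is persisted
- [x] `_format_ocr_context()`: formats OCR results for injection into the LLM prompt
- [x] The `process_input` node automatically triggers OCR field extraction when an image is uploaded
- [x] OCR results are persisted to the session file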
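
The priority chain in item 11 can be sketched as a list of strategies tried in order, where the first non-empty result wins. This is a minimal illustration, not the code in `backend/ocr_extractor.py`: the element shape (`text` plus `bbox` as `[x1, y1, x2, y2]`), the `FIELD_PATTERNS` table, the sample data, and every function body are assumptions. Only the four strategy names and their order come from the roadmap.

```python
import re
from typing import Optional

# Assumed OCR element shape: {"text": str, "bbox": [x1, y1, x2, y2]}.
# Hypothetical per-field patterns used by the regex strategy.
FIELD_PATTERNS = {"date": r"\d{4}-\d{2}-\d{2}"}


def exact_match(field: str, elements: list) -> Optional[str]:
    """The label is its own element; the following element holds the value."""
    for i, el in enumerate(elements[:-1]):
        if el["text"].strip() == field:
            return elements[i + 1]["text"].strip()
    return None


def kv_pair(field: str, elements: list) -> Optional[str]:
    """Label and value share one element, e.g. 'Customer: ACME'."""
    pattern = re.compile(rf"\s*{re.escape(field)}\s*[:：]\s*(.+)")
    for el in elements:
        m = pattern.match(el["text"])
        if m:
            return m.group(1).strip()
    return None


def regex_match(field: str, elements: list) -> Optional[str]:
    """Search every element with a field-specific pattern, if one is defined."""
    pattern = FIELD_PATTERNS.get(field)
    if pattern is None:
        return None
    for el in elements:
        m = re.search(pattern, el["text"])
        if m:
            return m.group(0)
    return None


def table_match(field: str, elements: list) -> Optional[str]:
    """A header cell starts with the label; take the nearest cell below it."""
    for header in elements:
        if not header["text"].strip().startswith(field):
            continue
        hx1, _, hx2, hy2 = header["bbox"]
        below = [
            el for el in elements
            if el is not header
            and el["bbox"][1] >= hy2                          # starts under the header
            and el["bbox"][0] < hx2 and el["bbox"][2] > hx1   # overlaps horizontally
        ]
        if below:
            return min(below, key=lambda el: el["bbox"][1])["text"].strip()
    return None


# Order matters: cheaper, more precise strategies run first.
STRATEGIES = [
    ("exact_match", exact_match),
    ("kv_pair", kv_pair),
    ("regex", regex_match),
    ("table_match", table_match),
]


def extract_field(field: str, elements: list):
    """Return (value, strategy_name) from the first strategy that succeeds."""
    for name, strategy in STRATEGIES:
        value = strategy(field, elements)
        if value:
            return value, name
    return None, None


# Made-up sample data for illustration only.
ELEMENTS = [
    {"text": "Invoice No", "bbox": [0, 0, 50, 10]},
    {"text": "INV-001", "bbox": [60, 0, 100, 10]},
    {"text": "Customer: ACME", "bbox": [0, 20, 100, 30]},
    {"text": "Issued 2024-03-01", "bbox": [0, 40, 100, 50]},
    {"text": "Amount (USD)", "bbox": [0, 100, 50, 110]},
    {"text": "42.00", "bbox": [0, 120, 50, 130]},
]
```

Returning the strategy name alongside the value makes it easy to log which rule produced each field, which helps when a lower-priority strategy silently picks up a wrong value.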

### 12. Multimodal chat input ✓
- [x] `app.py`: `st.chat_input` replaced with `st_multimodal_chatinput`
- [x] Supports pasting files with Ctrl+V, drag-and-drop, and a file button
- [x] `_process_uploaded_file()`: extracts the shared file-handling logic (removes ~70 lines of duplicated code)
- [x] Clipboard files: base64 decoding plus MIME type → file extension inference
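
The clipboard handling in item 12 (base64 decoding plus MIME type → extension inference) might look roughly like the sketch below. The payload shape is an assumption: it accepts either a `data:` URL or a bare base64 string with a separately supplied MIME type, which may not match what `st_multimodal_chatinput` actually returns. The function name and the `_PREFERRED_EXT` table are hypothetical.

```python
import base64
import mimetypes
from typing import Optional, Tuple

# mimetypes.guess_extension can vary between Python versions for some types,
# so the formats this project cares about are pinned explicitly (assumed list).
_PREFERRED_EXT = {
    "image/png": ".png",
    "image/jpeg": ".jpg",
    "application/pdf": ".pdf",
    "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet": ".xlsx",
    "application/vnd.ms-excel": ".xls",
    "application/msword": ".doc",
}


def decode_clipboard_file(payload: str, mime_type: Optional[str] = None) -> Tuple[bytes, str]:
    """Decode a pasted file and infer its extension from the MIME type.

    `payload` is either a data URL ("data:image/png;base64,....") or a bare
    base64 string, in which case `mime_type` should be passed separately.
    """
    if payload.startswith("data:"):
        header, _, payload = payload.partition(",")
        # header looks like "data:image/png;base64"
        mime_type = header[len("data:"):].split(";")[0] or mime_type

    data = base64.b64decode(payload)

    ext = None
    if mime_type:
        ext = _PREFERRED_EXT.get(mime_type) or mimetypes.guess_extension(mime_type)
    return data, ext or ".bin"  # fall back to a neutral extension
```

For example, `decode_clipboard_file("data:image/png;base64," + b64)` returns the raw bytes together with `".png"`, which can then be handed to the same code path that handles sidebar uploads.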

### 13. Multi-format file support ✓
- [x] `backend/file_parser.py`: adds XLSX (openpyxl), XLS (xlrd), and DOC (olefile)
- [x] Sidebar uploader type list now includes xlsx/xls/doc
- [x] Unit tests: `tests/test_file_parser_formats.py` (4 tests)
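
One way to organize the three parsers from item 13 is a dispatch table keyed by file extension, with each third-party library imported lazily so that a missing optional dependency only breaks its own format. This is a sketch under assumptions, not the contents of `backend/file_parser.py`: the function names, the tab-separated output, and especially the DOC text recovery (a crude heuristic) are illustrative.

```python
from pathlib import Path


def parse_xlsx(path: str) -> str:
    import openpyxl  # third-party, imported lazily

    wb = openpyxl.load_workbook(path, read_only=True, data_only=True)
    lines = []
    for ws in wb.worksheets:
        lines.append(f"# {ws.title}")
        for row in ws.iter_rows(values_only=True):
            lines.append("\t".join("" if v is None else str(v) for v in row))
    wb.close()
    return "\n".join(lines)


def parse_xls(path: str) -> str:
    import xlrd  # third-party, imported lazily

    book = xlrd.open_workbook(path)
    lines = []
    for sheet in book.sheets():
        lines.append(f"# {sheet.name}")
        for i in range(sheet.nrows):
            lines.append("\t".join(str(v) for v in sheet.row_values(i)))
    return "\n".join(lines)


def parse_doc(path: str) -> str:
    import olefile  # third-party, imported lazily

    with olefile.OleFileIO(path) as ole:
        if not ole.exists("WordDocument"):
            raise ValueError(f"{path} has no WordDocument stream")
        raw = ole.openstream("WordDocument").read()
    # Crude heuristic (assumption): treat the stream as UTF-16LE and keep
    # printable characters. This only recovers text stored as 16-bit
    # characters and is not a full .doc text extractor.
    text = raw.decode("utf-16-le", errors="ignore")
    return "".join(ch for ch in text if ch.isprintable() or ch in "\n\t")


# Extension → parser function.
PARSERS = {".xlsx": parse_xlsx, ".xls": parse_xls, ".doc": parse_doc}


def get_parser(filename: str):
    """Pick a parser by file extension (case-insensitive)."""
    ext = Path(filename).suffix.lower()
    try:
        return PARSERS[ext]
    except KeyError:
        raise ValueError(f"unsupported file type: {ext or filename}") from None
```

A caller would then write `text = get_parser(name)(path)`, and adding another format means adding one function and one table entry.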

### 14. Annotation detection ✓
- [x] `backend/annotation_detector.py`: circled regions + arrows + OCR correlation
- [x] Circle detection: red-channel enhancement → HoughCircles
- [x] Arrow detection: Canny → HoughLinesP → line-segment clustering → direction inferred from endpoints
- [x] `format_annotation_context()`: formats annotation results as a Chinese-language prompt
- [x] The `process_input` node automatically runs annotation detection after OCR extraction
- [x] `annotation_result` field persisted to AgentState and the session file
- [x] Unit tests: `tests/test_annotation_detector.py` (7 tests)
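
The "OCR correlation" step in item 14 is, at its core, a geometry problem: once circles and arrowheads are known, find which OCR text elements they refer to. The sketch below shows one plausible approach in pure Python, assuming circles arrive as `(cx, cy, r)` (the shape `HoughCircles` reports) and OCR elements carry a `bbox` of `[x1, y1, x2, y2]`. The rule used here (bbox center inside the circle, nearest bbox center to the arrowhead) and all names are assumptions, not necessarily what `backend/annotation_detector.py` does.

```python
import math
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Circle:
    cx: float
    cy: float
    r: float


def bbox_center(bbox) -> Tuple[float, float]:
    """Center point of an [x1, y1, x2, y2] box."""
    x1, y1, x2, y2 = bbox
    return (x1 + x2) / 2, (y1 + y2) / 2


def texts_inside_circle(circle: Circle, elements: List[dict]) -> List[str]:
    """Texts whose bbox center lies within the circle (assumed rule)."""
    hits = []
    for el in elements:
        x, y = bbox_center(el["bbox"])
        if math.hypot(x - circle.cx, y - circle.cy) <= circle.r:
            hits.append(el["text"])
    return hits


def text_nearest_point(point: Tuple[float, float], elements: List[dict]) -> Optional[str]:
    """Text whose bbox center is closest to a point, e.g. an arrowhead."""
    if not elements:
        return None
    px, py = point

    def distance(el: dict) -> float:
        x, y = bbox_center(el["bbox"])
        return math.hypot(x - px, y - py)

    return min(elements, key=distance)["text"]


# Made-up sample data for illustration only.
OCR_ELEMENTS = [
    {"text": "Total", "bbox": [90, 95, 110, 105]},
    {"text": "Date", "bbox": [300, 300, 340, 310]},
]
```

With this data, a circle centered at (100, 100) with radius 30 selects "Total", and an arrowhead at (290, 300) resolves to "Date". Using bbox centers keeps the test cheap; a stricter variant could require the whole box to lie inside the circle.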

### 15. OCR context injection into the LLM ✓
- [x] `prompts/modification.md`: adds the `{ocr_context}` placeholder
- [x] The `modify_jrxml` and `generate` nodes inject the OCR context
- [x] The OCR context includes structured fields, all text elements (with coordinates), and annotation detection results
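
The injection described in item 15 can be sketched as building one text block from the three inputs and substituting it for the `{ocr_context}` placeholder. The function names, the section labels, and the data shapes below are assumptions. The sketch uses `str.replace` rather than `str.format` because a JRXML-oriented prompt is likely to contain other braces (for example `$F{amount}`) that `format` would try to interpret.

```python
from typing import Dict, List


def build_ocr_context(fields: Dict[str, str], elements: List[dict], annotation_text: str = "") -> str:
    """Combine structured fields, raw text elements, and annotations into one block."""
    lines = ["Structured fields:"]
    lines += [f"- {name}: {value}" for name, value in fields.items()]
    lines.append("Text elements (with bbox):")
    lines += [f"- {el['text']} @ {el['bbox']}" for el in elements]
    if annotation_text:
        lines.append("Annotations:")
        lines.append(annotation_text)
    return "\n".join(lines)


def inject_ocr_context(template: str, ocr_context: str) -> str:
    """Fill only the {ocr_context} placeholder, leaving other braces untouched."""
    return template.replace("{ocr_context}", ocr_context)


# Made-up template and data for illustration only.
TEMPLATE = "Modify the report field $F{amount}.\n\n{ocr_context}"
CONTEXT = build_ocr_context(
    fields={"total": "42.00"},
    elements=[{"text": "Total", "bbox": [90, 95, 110, 105]}],
    annotation_text="Circled: Total",
)
PROMPT = inject_ocr_context(TEMPLATE, CONTEXT)
```

If no image was uploaded, passing an empty string for the context keeps the prompt valid without a special case in the template.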
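
---

Phase 1 can start immediately and has no external dependencies. Phase 2 is the bulk of the work. Phase 3 is wrap-up. Phase 4 lays the foundation for observability. Phase 5 covers OCR-driven enhancements and user-experience improvements.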