feat: v4 multimodal chat input, multi-format support, and annotation detection

- Replace st.chat_input with st-multimodal-chatinput (Ctrl+V paste, drag-drop, file button)
- Extract _process_uploaded_file() shared handler (eliminates ~70 duplicated lines)
- Add XLSX (openpyxl), XLS (xlrd), DOC (olefile) parsers to file_parser.py
- Add backend/annotation_detector.py: circle detection (HoughCircles) + arrow detection (HoughLinesP clustering) + OCR correlation + LLM context formatting
- Add annotation_result field to AgentState with session persistence
- Wire annotation detection into process_input and _format_ocr_context
- Add 11 new tests: 7 annotation detector + 4 multi-format parser
- Update all docs: CLAUDE.md, README.md, CODE_GUIDE.md, ROADMAP.md
2026-05-20 23:43:16 +08:00
parent c9f003e1b7
commit 9bb011e429
16 changed files with 1257 additions and 164 deletions
@@ -26,6 +26,24 @@ python-dotenv>=1.0.0
 httpx>=0.27.0
 tiktoken>=0.7.0

+# OCR dependencies (PaddleOCR preferred for accurate recognition, EasyOCR as fallback)
+# Pinned: paddleocr 2.9.x + paddlepaddle 2.6.x known-stable on Windows CPU
+# 3.x has ONEDNN compatibility issues on Windows
+paddleocr>=2.9.0,<3.0.0
+paddlepaddle>=2.6.0,<3.0.0
+easyocr>=1.7.0
+# Chat input enhancements (paste / drag-and-drop upload)
+st-multimodal-chatinput>=0.2.1
+
+# Multi-format file parsing
+openpyxl>=3.1.0
+xlrd>=2.0.0
+olefile>=0.47
+
+# Annotation detection (circle / arrow recognition)
+opencv-python-headless>=4.8.0
+
 # Tests
 pytest>=8.0.0
 pytest-asyncio>=0.24.0
+xlwt>=1.3.0
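
The OCR comment in the hunk above describes a preference order: PaddleOCR first, EasyOCR as fallback. A minimal sketch of how such an order could be resolved at runtime is shown here. The names `OCR_BACKENDS` and `pick_ocr_backend` are hypothetical and not taken from this commit; the sketch only checks which package is importable and does not reflect how the project actually initializes either engine.

```python
from importlib.util import find_spec

# Hypothetical preference order mirroring the requirements comment:
# PaddleOCR first, EasyOCR as the fallback.
OCR_BACKENDS = ("paddleocr", "easyocr")


def pick_ocr_backend(candidates=OCR_BACKENDS):
    """Return the first importable top-level module name, or None if none is installed."""
    for name in candidates:
        # find_spec returns None when a top-level module cannot be located.
        if find_spec(name) is not None:
            return name
    return None


if __name__ == "__main__":
    backend = pick_ocr_backend()
    print(backend if backend else "no OCR backend installed")
```

Checking importability before importing keeps the heavy OCR packages out of startup until one is actually selected, which may matter given the Windows compatibility note on the pinned versions.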