feat: v4 multimodal chat input, multi-format support, and annotation detection

- Replace st.chat_input with st-multimodal-chatinput (Ctrl+V paste, drag-drop, file button)
- Extract _process_uploaded_file() shared handler (eliminates ~70 duplicated lines)
- Add XLSX (openpyxl), XLS (xlrd), DOC (olefile) parsers to file_parser.py
- Add backend/annotation_detector.py: circle detection (HoughCircles) + arrow detection (HoughLinesP clustering) + OCR correlation + LLM context formatting
- Add annotation_result field to AgentState with session persistence
- Wire annotation detection into process_input and _format_ocr_context
- Add 11 new tests: 7 annotation detector + 4 multi-format parser
- Update all docs: CLAUDE.md, README.md, CODE_GUIDE.md, ROADMAP.md
2026-05-20 23:43:16 +08:00
parent c9f003e1b7
commit 9bb011e429
16 changed files with 1257 additions and 164 deletions
@@ -284,13 +284,13 @@ class OcrExtractor:
        try:
            import numpy as np

-            easyocr_result = self._try_easyocr(np.array(img))
-            if easyocr_result:
-                return easyocr_result
-
            paddleocr_result = self._try_paddleocr(img, file_path)
            if paddleocr_result:
                return paddleocr_result
+
+            easyocr_result = self._try_easyocr(np.array(img))
+            if easyocr_result:
+                return easyocr_result
        except Exception:
            pass
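
Note on the hunk above: it only swaps the order in which the two engines are tried, so PaddleOCR now runs first and EasyOCR is the fallback. A minimal standalone sketch of that "first non-empty result wins" pattern follows. The names `first_successful`, `fake_paddleocr`, and `fake_easyocr` are hypothetical stand-ins, not code from this repository. Unlike the hunk, which wraps both calls in a single `try`, this sketch guards each engine separately so that an exception in one engine does not skip the next; that is an assumption about the desired behavior, not a description of what `OcrExtractor` does.

```python
from typing import Callable, Optional, Sequence

# An engine takes raw image bytes and returns recognized text,
# or None / "" when it finds nothing. (Hypothetical signature.)
Engine = Callable[[bytes], Optional[str]]


def first_successful(engines: Sequence[Engine], image: bytes) -> Optional[str]:
    """Try each engine in priority order; return the first non-empty result."""
    for engine in engines:
        try:
            result = engine(image)
        except Exception:
            # A failing engine (missing dependency, bad input) should not
            # prevent the remaining engines from being tried.
            continue
        if result:
            return result
    return None


# Stand-ins for the real engines, used only to exercise the ordering logic.
def fake_paddleocr(image: bytes) -> Optional[str]:
    raise RuntimeError("paddleocr not installed")


def fake_easyocr(image: bytes) -> Optional[str]:
    return "text from easyocr"


if __name__ == "__main__":
    # PaddleOCR is listed first, matching the new order in the hunk.
    print(first_successful([fake_paddleocr, fake_easyocr], b""))
```

Keeping the priority as an ordered list means a later reordering like the one in this commit would be a one-line change.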