From bba0419695a80b6752e390d4ea2104102805cdd8 Mon Sep 17 00:00:00 2001
From: Doiiars
Date: Fri, 7 Nov 2025 20:39:20 +0800
Subject: [PATCH 1/2] =?UTF-8?q?=E6=9B=B4=E6=96=B0readme?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

---
 MindSpider/README.md | 29 ++++++++++++-----------------
 README-EN.md         |  3 ++-
 README.md            |  3 ++-
 3 files changed, 16 insertions(+), 19 deletions(-)

diff --git a/MindSpider/README.md b/MindSpider/README.md
index 1f576b1..2dba233 100644
--- a/MindSpider/README.md
+++ b/MindSpider/README.md
@@ -1,10 +1,3 @@
-> [!warning]
-> 好像最近项目中用来请求每日热点新闻的api接口被ban了,可以自己部署一下[newsnow](https://github.com/ourongxing/newsnow),很快的可以一键部署,然后替换掉这个URL即可,最近一个月我也会commit一版更通用的解决方案。
-> ```python
-> #新闻API基础URL
-> BASE URL = "https://newsnow.busiyi.world"
-> ```
-
 # MindSpider - 专为舆情分析设计的AI爬虫
 
 > 免责声明:
@@ -193,7 +186,7 @@ flowchart TB
    - 记录任务状态、进度、结果等
 
 5. **平台内容表**(继承自MediaCrawler)
-   - xhs_note - 小红书笔记
+   - xhs_note - 小红书笔记(暂时废弃,详情查看:https://github.com/NanmiCoder/MediaCrawler/issues/754)
    - douyin_aweme - 抖音视频
    - kuaishou_video - 快手视频
    - bilibili_video - B站视频
@@ -206,10 +199,11 @@ flowchart TB
 ### 环境要求
 
 - Python 3.9 或更高版本
-- MySQL 5.7 或更高版本
+- MySQL 5.7 或更高版本,或 PostgreSQL
 - Conda环境:pytorch_python11(推荐)
 - 操作系统:Windows/Linux/macOS
 
+
 ### 1. 克隆项目
 
 ```bash
@@ -275,7 +269,7 @@ DB_PASSWORD = "your_password"
 DB_NAME = "mindspider"
 DB_CHARSET = "utf8mb4"
 
-# DeepSeek API密钥
+# MINDSPIDER API密钥
 MINDSPIDER_BASE_URL=your_api_base_url
 MINDSPIDER_API_KEY=sk-your-key
 MINDSPIDER_MODEL_NAME=deepseek-chat
@@ -286,9 +280,6 @@ MINDSPIDER_MODEL_NAME=deepseek-chat
 ```bash
 # 检查系统状态
 python main.py --status
-
-# 初始化数据库表
-python main.py --setup
 ```
 
 ## 使用指南
@@ -325,7 +316,7 @@ python main.py --broad-topic --date 2024-01-15
 
 **首次使用每个平台都需要登录,这是最关键的步骤:**
 
-1. **小红书登录**
+1. **小红书登录**(暂时废弃,详情查看:https://github.com/NanmiCoder/MediaCrawler/issues/754)
 ```bash
 # 测试小红书爬取(会弹出二维码)
 python main.py --deep-sentiment --platforms xhs --test
@@ -369,6 +360,10 @@ python main.py --deep-sentiment --platforms zhihu --test
 3. **手动处理验证**:有些平台可能需要手动滑动验证码
 4. **重新登录**:删除 `DeepSentimentCrawling/MediaCrawler/browser_data/` 目录重新登录
 
+### 其他问题
+
+https://github.com/666ghj/BettaFish/issues/185
+
 ### 爬取参数调整
 
 在实际使用前建议调整爬取参数:
@@ -394,8 +389,8 @@ python main.py --deep-sentiment --date 2024-01-15
 #### 2. 指定平台爬取
 
 ```bash
-# 只爬取小红书和抖音
-python main.py --deep-sentiment --platforms xhs dy --test
+# 只爬取B站和抖音
+python main.py --deep-sentiment --platforms bili dy --test
 
 # 爬取所有平台的特定数量内容
 python main.py --deep-sentiment --max-keywords 30 --max-notes 20
@@ -405,7 +400,7 @@ python main.py --deep-sentiment --max-keywords 30 --max-notes 20
 
 ```bash
 --status # 检查项目状态
---setup # 初始化项目
+--setup # 初始化项目(废弃,已自动初始化)
 --broad-topic # 话题提取
 --deep-sentiment # 爬虫模块
 --complete # 完整流程
diff --git a/README-EN.md b/README-EN.md
index b23db57..82f36cd 100644
--- a/README-EN.md
+++ b/README-EN.md
@@ -337,7 +337,7 @@ Recommended LLM API Provider: [Reasoning Era](https://aihubmix.com/?aff=8Ds9)
 ```bash
 # Local MySQL database initialization
 cd MindSpider
-# Project initialization
+# Project initialization, deprecated, initialization is now automatic.
 python main.py --setup
 ```
 
@@ -667,6 +667,7 @@ This project is licensed under the [GPL-2.0 License](LICENSE). Please see the LI
 ### Get Help
 
 - **Project Homepage**: [GitHub Repository](https://github.com/666ghj/Weibo_PublicOpinion_AnalysisSystem)
+- **FAQ**: [Frequently Asked Questions](https://github.com/666ghj/BettaFish/issues/185)
 - **Issue Reporting**: [Issues Page](https://github.com/666ghj/Weibo_PublicOpinion_AnalysisSystem/issues)
 - **Feature Requests**: [Discussions Page](https://github.com/666ghj/Weibo_PublicOpinion_AnalysisSystem/discussions)
 
diff --git a/README.md b/README.md
index cca162a..0150546 100644
--- a/README.md
+++ b/README.md
@@ -337,7 +337,7 @@ INSIGHT_ENGINE_MODEL_NAME=
 ```bash
 # 本地MySQL数据库初始化
 cd MindSpider
-# 项目初始化
+# 项目初始化(废弃,已自动初始化)
 python main.py --setup
 ```
 
@@ -665,6 +665,7 @@ class DeepSearchAgent:
 ### 获取帮助
 
 - **项目主页**:[GitHub仓库](https://github.com/666ghj/Weibo_PublicOpinion_AnalysisSystem)
+- **常见问题解答**:[FAQ](https://github.com/666ghj/BettaFish/issues/185)
 - **问题反馈**:[Issues页面](https://github.com/666ghj/Weibo_PublicOpinion_AnalysisSystem/issues)
 - **功能建议**:[Discussions页面](https://github.com/666ghj/Weibo_PublicOpinion_AnalysisSystem/discussions)
 
From 64e24532b550a78249a2607e34f8fc3aa4b8bb21 Mon Sep 17 00:00:00 2001
From: Zhang Yuxiang
Date: Fri, 7 Nov 2025 22:50:42 +0800
Subject: [PATCH 2/2] fix(streamlit): disable file watcher to avoid
 torch.classes conflict

---
 docker-compose.yml | 1 +
 1 file changed, 1 insertion(+)

diff --git a/docker-compose.yml b/docker-compose.yml
index f4465b4..7210aff 100644
--- a/docker-compose.yml
+++ b/docker-compose.yml
@@ -9,6 +9,7 @@ services:
     restart: unless-stopped
     environment:
       - PYTHONUNBUFFERED=1
+      - STREAMLIT_SERVER_ENABLE_FILE_WATCHER=false
     ports:
       - "5000:5000"
       - "8501:8501"