Update readme.
+655
@@ -0,0 +1,655 @@
|
||||
<div align="center">
|
||||
|
||||
# 📊 Weibo Public Opinion Multi-Agent Analysis System
|
||||
|
||||
<img src="static/image/logo_compressed.png" alt="Weibo Public Opinion Analysis System Logo" width="600">
|
||||
|
||||
[Stars](https://github.com/666ghj/Weibo_PublicOpinion_AnalysisSystem/stargazers)
[Forks](https://github.com/666ghj/Weibo_PublicOpinion_AnalysisSystem/network)
[Issues](https://github.com/666ghj/Weibo_PublicOpinion_AnalysisSystem/issues)
[License](https://github.com/666ghj/Weibo_PublicOpinion_AnalysisSystem/blob/main/LICENSE)
|
||||
|
||||
[English](./README-EN.md) | [中文文档](./README.md)
|
||||
|
||||
</div>
|
||||
|
||||
<div align="center">
|
||||
<img src="static/image/banner_compressed.png" alt="banner" width="800">
|
||||
</div>
|
||||
|
||||
## 📝 Project Overview
|
||||
|
||||
**Weibo Public Opinion Multi-Agent Analysis System** is a public opinion analysis platform built from scratch on a multi-agent collaborative architecture. It aims to provide accurate, real-time, and comprehensive monitoring and analysis of public opinion on Weibo. Five specialized AI agents work together to automate the full pipeline, from data collection and sentiment analysis to report generation.
|
||||
|
||||
### 🚀 Key Features
|
||||
|
||||
- **Multi-Agent Collaborative Architecture**: Five specialized agents, each with its own role, work together to cover the full public opinion analysis process
- **Comprehensive Data Collection**: Integrates Weibo crawlers, news search, multimedia content, and other data sources
- **Deep Sentiment Analysis**: Multilingual sentiment recognition based on fine-tuned BERT, GPT-2, and Qwen models
- **Intelligent Report Generation**: Automatically generates structured HTML analysis reports, with support for custom templates
- **Agent Forum Communication**: ForumEngine gives agents a shared platform for exchanging information and making decisions collaboratively
- **High-Performance Asynchronous Processing**: Processes multiple public opinion tasks concurrently, with real-time status monitoring
- **Cloud Data Support**: An optional cloud database service that adds 100,000+ real data records per day
|
||||
|
||||
## 🏗️ System Architecture
|
||||
|
||||
### Overall Architecture Diagram
|
||||
|
||||
```mermaid
graph TB
    subgraph "Frontend Display Layer"
        UI[Web Interface<br/>Flask + Streamlit]
    end

    subgraph "Multi-Agent Collaboration Layer"
        QE[QueryEngine<br/>News Search Agent]
        ME[MediaEngine<br/>Multimedia Search Agent]
        IE[InsightEngine<br/>Deep Insight Agent]
        RE[ReportEngine<br/>Report Generation Agent]
        Forum[ForumEngine<br/>Agent Forum Communication Center]
    end

    subgraph "Data Processing Layer"
        MS[MindSpider<br/>Weibo Crawler System]
        SA[SentimentAnalysis<br/>Sentiment Analysis Model Collection]
        DB[(MySQL<br/>Database)]
    end

    subgraph "External Service Layer"
        LLM[LLM API<br/>DeepSeek/Kimi/Gemini]
        Search[Search API<br/>Tavily/Bocha]
    end

    UI --> QE
    UI --> ME
    UI --> IE
    UI --> RE

    QE --> Search
    ME --> Search
    IE --> MS
    IE --> SA

    QE --> LLM
    ME --> LLM
    IE --> LLM
    RE --> LLM

    MS --> DB
    SA --> DB

    %% Agent Forum Communication Mechanism
    QE <--> Forum
    ME <--> Forum
    IE <--> Forum
    RE <--> Forum
```
|
||||
|
||||
### Agent Collaboration Workflow
|
||||
|
||||
The system's core workflow is based on multi-agent collaboration:
|
||||
|
||||
1. **QueryEngine (News Query Agent)**: Searches authoritative news reports through the Tavily API to provide official information sources
2. **MediaEngine (Multimedia Search Agent)**: Runs multimodal content searches through the Bocha API to gather social media perspectives
3. **InsightEngine (Deep Insight Agent)**: Queries the local Weibo database and combines multiple sentiment analysis models for in-depth analysis
4. **ForumEngine (Forum Monitoring Agent)**: Monitors the agents' log output in real time, extracts key information, and promotes collaboration between agents
5. **ReportEngine (Report Generation Agent)**: Uses the Gemini LLM to generate a comprehensive HTML report from the analysis results of all agents
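
The hand-off between agents can be pictured with a small sketch. It is illustrative only: the function names (`run_query_engine`, `build_report`, and so on) are hypothetical placeholders rather than the project's actual API, ForumEngine is left out for brevity, and the sketch assumes the three search agents can run concurrently before the report step.

```python
from concurrent.futures import ThreadPoolExecutor
from html import escape

# Hypothetical stand-ins for the three search agents. The real agents call
# the Tavily API, the Bocha API, and the local Weibo database.
def run_query_engine(topic: str) -> str:
    return f"news coverage of {topic}"

def run_media_engine(topic: str) -> str:
    return f"multimedia content about {topic}"

def run_insight_engine(topic: str) -> str:
    return f"sentiment insight on {topic}"

SEARCH_AGENTS = {
    "QueryEngine": run_query_engine,
    "MediaEngine": run_media_engine,
    "InsightEngine": run_insight_engine,
}

def build_report(topic: str, summaries: dict[str, str]) -> str:
    """Hypothetical report step: merge the agent summaries into one HTML page."""
    items = "".join(
        f"<li><b>{escape(name)}</b>: {escape(text)}</li>"
        for name, text in summaries.items()
    )
    return f"<h1>{escape(topic)}</h1><ul>{items}</ul>"

def analyze(topic: str) -> str:
    """Run the search agents concurrently, then pass their results to the report step."""
    with ThreadPoolExecutor(max_workers=len(SEARCH_AGENTS)) as pool:
        futures = {name: pool.submit(fn, topic) for name, fn in SEARCH_AGENTS.items()}
        summaries = {name: future.result() for name, future in futures.items()}
    return build_report(topic, summaries)
```

Threads suit this kind of work because each agent mostly waits on network or database I/O.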
|
||||
|
||||
### Project Code Structure
|
||||
|
||||
```
Weibo_PublicOpinion_AnalysisSystem/
├── QueryEngine/  # News Query Engine Agent
│   ├── agent.py  # Agent main logic
│   ├── llms/  # LLM interface wrapper
│   ├── nodes/  # Processing nodes
│   ├── tools/  # Search tools
│   └── utils/  # Utility functions
├── MediaEngine/  # Multimedia Search Engine Agent
│   ├── agent.py  # Agent main logic
│   ├── llms/  # LLM interfaces
│   ├── tools/  # Search tools
│   └── ...  # Other modules
├── InsightEngine/  # Data Insight Engine Agent
│   ├── agent.py  # Agent main logic
│   ├── llms/  # LLM interface wrapper
│   │   ├── deepseek.py  # DeepSeek API
│   │   ├── kimi.py  # Kimi API
│   │   ├── openai_llm.py  # OpenAI format API
│   │   └── base.py  # LLM base class
│   ├── nodes/  # Processing nodes
│   │   ├── first_search_node.py  # First search node
│   │   ├── reflection_node.py  # Reflection node
│   │   ├── summary_nodes.py  # Summary nodes
│   │   ├── search_node.py  # Search node
│   │   ├── sentiment_node.py  # Sentiment analysis node
│   │   └── insight_node.py  # Insight generation node
│   ├── tools/  # Database query and analysis tools
│   │   ├── media_crawler_db.py  # Database query tool
│   │   └── sentiment_analyzer.py  # Sentiment analysis integration tool
│   ├── state/  # State management
│   │   ├── __init__.py
│   │   └── state.py  # Agent state definition
│   ├── prompts/  # Prompt templates
│   │   ├── __init__.py
│   │   └── prompts.py  # Various prompts
│   └── utils/  # Utility functions
│       ├── __init__.py
│       ├── config.py  # Configuration management
│       └── helpers.py  # Helper functions
├── ReportEngine/  # Report Generation Engine Agent
│   ├── agent.py  # Agent main logic
│   ├── llms/  # LLM interfaces
│   │   └── gemini.py  # Gemini API dedicated
│   ├── nodes/  # Report generation nodes
│   │   ├── template_selection.py  # Template selection node
│   │   └── html_generation.py  # HTML generation node
│   ├── report_template/  # Report template library
│   │   ├── 社会公共热点事件分析.md
│   │   ├── 商业品牌舆情监测.md
│   │   └── ...  # More templates
│   └── flask_interface.py  # Flask API interface
├── ForumEngine/  # Forum Communication Engine Agent
│   └── monitor.py  # Log monitoring and forum management
├── MindSpider/  # Weibo Crawler System
│   ├── main.py  # Crawler main program
│   ├── BroadTopicExtraction/  # Topic extraction module
│   │   ├── get_today_news.py  # Today's news fetching
│   │   └── topic_extractor.py  # Topic extractor
│   ├── DeepSentimentCrawling/  # Deep sentiment crawling
│   │   ├── MediaCrawler/  # Media crawler core
│   │   └── platform_crawler.py  # Platform crawler management
│   └── schema/  # Database schema
│       └── init_database.py  # Database initialization
├── SentimentAnalysisModel/  # Sentiment Analysis Model Collection
│   ├── WeiboSentiment_Finetuned/  # Fine-tuned BERT/GPT-2 models
│   ├── WeiboMultilingualSentiment/  # Multilingual sentiment analysis
│   ├── WeiboSentiment_SmallQwen/  # Small Qwen model
│   └── WeiboSentiment_MachineLearning/  # Traditional machine learning methods
├── SingleEngineApp/  # Individual Agent Streamlit apps
│   ├── query_engine_streamlit_app.py
│   ├── media_engine_streamlit_app.py
│   └── insight_engine_streamlit_app.py
├── templates/  # Flask templates
│   └── index.html  # Main interface template
├── static/  # Static resources
├── logs/  # Runtime log directory
├── app.py  # Flask main application entry
├── config.py  # Global configuration file
└── requirements.txt  # Python dependency list
```
|
||||
|
||||
## 🚀 Quick Start
|
||||
|
||||
### System Requirements
|
||||
|
||||
- **Operating System**: Windows 10/11 (Linux/macOS also supported)
|
||||
- **Python Version**: 3.11+
|
||||
- **Conda**: Anaconda or Miniconda
|
||||
- **Database**: MySQL 8.0+ (or choose our cloud database service)
|
||||
- **Memory**: 8GB+ recommended
|
||||
|
||||
### 1. Create Conda Environment
|
||||
|
||||
```bash
# Create a conda environment named pytorch_python11
conda create -n pytorch_python11 python=3.11
conda activate pytorch_python11
```
|
||||
|
||||
### 2. Install Dependencies
|
||||
|
||||
```bash
# Install basic dependencies
pip install -r requirements.txt

# If you need local sentiment analysis functionality, install PyTorch
# CPU version
pip install torch torchvision torchaudio

# CUDA 11.8 version (if you have a GPU)
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118

# Install transformers and other AI-related dependencies
pip install transformers scikit-learn xgboost
```
|
||||
|
||||
### 3. Install Playwright Browser Drivers
|
||||
|
||||
```bash
# Install browser drivers (for crawler functionality)
playwright install chromium
```
|
||||
|
||||
### 4. System Configuration
|
||||
|
||||
#### 4.1 Configure API Keys
|
||||
|
||||
Edit the `config.py` file and fill in your API keys:
|
||||
|
||||
```python
# MySQL Database Configuration
DB_HOST = "localhost"
DB_PORT = 3306
DB_USER = "your_username"
DB_PASSWORD = "your_password"
DB_NAME = "weibo_analysis"
DB_CHARSET = "utf8mb4"

# DeepSeek API (Apply at: https://www.deepseek.com/)
DEEPSEEK_API_KEY = "your_deepseek_api_key"

# Tavily Search API (Apply at: https://www.tavily.com/)
TAVILY_API_KEY = "your_tavily_api_key"

# Kimi API (Apply at: https://www.kimi.com/)
KIMI_API_KEY = "your_kimi_api_key"

# Gemini API (Apply at: https://api.chataiapi.com/)
GEMINI_API_KEY = "your_gemini_api_key"

# Bocha Search API (Apply at: https://open.bochaai.com/)
BOCHA_Web_Search_API_KEY = "your_bocha_api_key"

# Silicon Flow API (Apply at: https://siliconflow.cn/)
GUIJI_QWEN3_API_KEY = "your_guiji_api_key"
```
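
Keys written directly into `config.py` are easy to commit by accident. As an optional alternative, the sketch below reads the same setting names from environment variables and reports any that are missing. This is an assumption about how one might structure it, not how the project's `config.py` works; the helper name `load_settings` is hypothetical.

```python
import os
from typing import Mapping

# Setting names taken from the config.py excerpt above.
REQUIRED_KEYS = (
    "DEEPSEEK_API_KEY",
    "TAVILY_API_KEY",
    "KIMI_API_KEY",
    "GEMINI_API_KEY",
    "BOCHA_Web_Search_API_KEY",
    "GUIJI_QWEN3_API_KEY",
)

def load_settings(env: Mapping[str, str] = os.environ) -> dict:
    """Hypothetical helper: collect API keys and DB settings from a mapping."""
    missing = [key for key in REQUIRED_KEYS if not env.get(key)]
    if missing:
        raise RuntimeError("Missing API keys: " + ", ".join(missing))
    settings = {key: env[key] for key in REQUIRED_KEYS}
    # Database settings fall back to the defaults shown in config.py.
    settings["DB_HOST"] = env.get("DB_HOST", "localhost")
    settings["DB_PORT"] = int(env.get("DB_PORT", "3306"))
    return settings
```

Passing the mapping as a parameter keeps the helper easy to test with a plain dictionary.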
|
||||
|
||||
#### 4.2 Database Initialization
|
||||
|
||||
**Option 1: Use Local Database**
|
||||
```bash
# Local MySQL database initialization
cd MindSpider
python schema/init_database.py
```
|
||||
|
||||
**Option 2: Use Cloud Database Service (Recommended)**
|
||||
|
||||
We provide a convenient cloud database service that adds 100,000+ real Weibo data records per day. Access is currently **free on application** during the promotion period!
|
||||
|
||||
- Real Weibo data, updated in real-time
|
||||
- Pre-processed sentiment annotation data
|
||||
- Multi-dimensional tag classification
|
||||
- High-availability cloud service
|
||||
- Professional technical support
|
||||
|
||||
**Contact us to apply for free cloud database access: 📧 670939375@qq.com**
|
||||
|
||||
### 5. Launch System
|
||||
|
||||
#### 5.1 Complete System Launch (Recommended)
|
||||
|
||||
```bash
# In the project root directory, activate the conda environment
conda activate pytorch_python11

# Start the main application (automatically starts all agents)
python app.py
```
|
||||
|
||||
Visit http://localhost:5000 to use the complete system
|
||||
|
||||
#### 5.2 Launch Individual Agents
|
||||
|
||||
```bash
# Start QueryEngine
streamlit run SingleEngineApp/query_engine_streamlit_app.py --server.port 8503

# Start MediaEngine
streamlit run SingleEngineApp/media_engine_streamlit_app.py --server.port 8502

# Start InsightEngine
streamlit run SingleEngineApp/insight_engine_streamlit_app.py --server.port 8501
```
|
||||
|
||||
#### 5.3 Standalone Crawler System
|
||||
|
||||
```bash
# Enter crawler directory
cd MindSpider

# Project initialization
python main.py --setup

# Run complete crawler workflow
python main.py --complete --date 2024-01-20

# Run topic extraction only
python main.py --broad-topic --date 2024-01-20

# Run deep crawling only
python main.py --deep-sentiment --platforms xhs dy wb
```
|
||||
|
||||
## 💾 Database Configuration
|
||||
|
||||
### Local Database Configuration
|
||||
|
||||
1. **Install MySQL 8.0+**
2. **Create Database**:
   ```sql
   CREATE DATABASE weibo_analysis CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;
   ```
3. **Run Initialization Script**:
   ```bash
   cd MindSpider
   python schema/init_database.py
   ```
|
||||
|
||||
### Auto-Crawling Configuration
|
||||
|
||||
Configure automatic crawling tasks for continuous data updates:
|
||||
|
||||
```python
# Configure crawler parameters in MindSpider/config.py
CRAWLER_CONFIG = {
    'max_pages': 200,                           # Maximum pages to crawl
    'delay': 1,                                 # Request delay (seconds)
    'timeout': 30,                              # Timeout (seconds)
    'platforms': ['xhs', 'dy', 'wb', 'bili'],   # Crawling platforms
    'daily_keywords': 100,                      # Daily keywords count
    'max_notes_per_keyword': 50,                # Max content per keyword
    'use_proxy': False,                         # Whether to use proxy
}
```
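
To make the roles of `max_pages` and `delay` concrete, here is a minimal sketch of a paced crawl loop. It is not MindSpider's implementation: `fetch_page` is a hypothetical callable that returns the items on one page, and the loop simply assumes that an empty page means there is nothing more to fetch.

```python
import time
from typing import Callable, Iterator

def crawl_pages(
    fetch_page: Callable[[int], list],
    config: dict,
    sleep: Callable[[float], None] = time.sleep,
) -> Iterator:
    """Yield items page by page, pausing `delay` seconds after each non-empty page."""
    for page in range(1, config["max_pages"] + 1):
        items = fetch_page(page)
        if not items:
            break  # assumed convention: an empty page ends the crawl
        yield from items
        sleep(config["delay"])
```

The `sleep` parameter exists only so the pacing can be replaced in tests; in normal use the default `time.sleep` applies.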
|
||||
|
||||
### Cloud Database Service (Recommended)
|
||||
|
||||
**Why Choose Our Cloud Database Service?**
|
||||
|
||||
- **Rich Data Sources**: 100,000+ real Weibo data records per day, covering hot topics across all industries
- **High-Quality Annotations**: Sentiment data manually annotated by a professional team, with 95%+ accuracy
- **Multi-Dimensional Analysis**: Tags for topic classification, sentiment tendency, influence scoring, and other dimensions
- **Real-Time Updates**: Continuous 24/7 data collection keeps the data current
- **Technical Support**: A professional team provides technical support and customization services
|
||||
|
||||
**Application Method**:
|
||||
📧 Email Contact: 670939375@qq.com
|
||||
📝 Email Subject: Apply for Weibo Public Opinion Cloud Database Access
|
||||
📝 Email Content: Please describe your use case and expected data volume requirements
|
||||
|
||||
**Promotion Period Benefits**:
|
||||
- Free basic cloud database access
|
||||
- Free technical support and deployment guidance
|
||||
- Priority access to new features
|
||||
|
||||
## ⚙️ Advanced Configuration
|
||||
|
||||
### Modify Key Parameters
|
||||
|
||||
#### Agent Configuration Parameters
|
||||
|
||||
Each agent has its own configuration file, which you can adjust as needed:
|
||||
|
||||
```python
# QueryEngine/utils/config.py
class Config:
    max_reflections = 2           # Reflection rounds
    max_search_results = 15       # Maximum search results
    max_content_length = 8000     # Maximum content length

# MediaEngine/utils/config.py
class Config:
    comprehensive_search_limit = 10  # Comprehensive search limit
    web_search_limit = 15            # Web search limit

# InsightEngine/utils/config.py
class Config:
    default_search_topic_globally_limit = 200  # Global search limit
    default_get_comments_limit = 500           # Comment retrieval limit
    max_search_results_for_llm = 50            # Max results for LLM
```
|
||||
|
||||
#### Sentiment Analysis Model Configuration
|
||||
|
||||
```python
# InsightEngine/tools/sentiment_analyzer.py
SENTIMENT_CONFIG = {
    'model_type': 'multilingual',    # Options: 'bert', 'multilingual', 'qwen'
    'confidence_threshold': 0.8,     # Confidence threshold
    'batch_size': 32,                # Batch size
    'max_sequence_length': 512,      # Max sequence length
}
```
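
As a rough illustration of what `batch_size` and `confidence_threshold` control, the sketch below splits texts into batches and drops low-confidence predictions. How `sentiment_analyzer.py` actually applies these values is not shown in this README, so both helper functions and the prediction dictionary shape (`label`, `confidence`) are assumptions.

```python
from typing import Iterator

def batched(texts: list[str], batch_size: int) -> Iterator[list[str]]:
    """Split texts into consecutive batches of at most batch_size items."""
    for start in range(0, len(texts), batch_size):
        yield texts[start:start + batch_size]

def keep_confident(predictions: list[dict], threshold: float) -> list[dict]:
    """Keep only predictions whose confidence reaches the threshold."""
    return [p for p in predictions if p["confidence"] >= threshold]
```

With `batch_size` set to 32, a list of 70 texts would be processed in batches of 32, 32, and 6.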
|
||||
|
||||
### Integrate Different LLM Models
|
||||
|
||||
The system supports multiple LLM providers, switchable in each agent's configuration:
|
||||
|
||||
```python
# Configure in each Engine's utils/config.py
class Config:
    default_llm_provider = "deepseek"  # Options: "deepseek", "openai", "kimi", "gemini"

    # DeepSeek configuration
    deepseek_api_key = "your_api_key"
    deepseek_model = "deepseek-chat"

    # OpenAI compatible configuration
    openai_api_key = "your_api_key"
    openai_model = "gpt-3.5-turbo"
    openai_base_url = "https://api.openai.com/v1"

    # Kimi configuration
    kimi_api_key = "your_api_key"
    kimi_model = "moonshot-v1-8k"

    # Gemini configuration
    gemini_api_key = "your_api_key"
    gemini_model = "gemini-pro"
```
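
One way to turn `default_llm_provider` into the matching key and model is a lookup by attribute name, sketched below. The `resolve_llm` helper is hypothetical and relies only on the `<provider>_api_key` / `<provider>_model` naming pattern visible in the excerpt above; the engines may select their LLM differently.

```python
from dataclasses import dataclass

@dataclass
class Config:
    # Trimmed copy of the settings above, used only for this sketch.
    default_llm_provider: str = "deepseek"
    deepseek_api_key: str = "your_api_key"
    deepseek_model: str = "deepseek-chat"
    kimi_api_key: str = "your_api_key"
    kimi_model: str = "moonshot-v1-8k"

def resolve_llm(config: Config) -> tuple[str, str]:
    """Return (api_key, model) for the configured provider."""
    provider = config.default_llm_provider
    try:
        return (
            getattr(config, f"{provider}_api_key"),
            getattr(config, f"{provider}_model"),
        )
    except AttributeError:
        raise ValueError(f"Unknown LLM provider: {provider!r}") from None
```

Failing early with a clear error on an unknown provider name avoids a confusing failure later, when the first API call is made.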
|
||||
|
||||
### Change Sentiment Analysis Models
|
||||
|
||||
The system integrates multiple sentiment analysis methods, selectable based on needs:
|
||||
|
||||
#### 1. BERT-based Fine-tuned Model (Highest Accuracy)
|
||||
|
||||
```bash
# Use BERT Chinese model
cd SentimentAnalysisModel/WeiboSentiment_Finetuned/BertChinese-Lora
python predict.py --text "This product is really great"
```
|
||||
|
||||
#### 2. GPT-2 LoRA Fine-tuned Model (Faster Speed)
|
||||
|
||||
```bash
cd SentimentAnalysisModel/WeiboSentiment_Finetuned/GPT2-Lora
python predict.py --text "I'm not feeling great today"
```
|
||||
|
||||
#### 3. Small Qwen Model (Balanced)
|
||||
|
||||
```bash
cd SentimentAnalysisModel/WeiboSentiment_SmallQwen
python predict_universal.py --text "This event was very successful"
```
|
||||
|
||||
#### 4. Traditional Machine Learning Methods (Lightweight)
|
||||
|
||||
```bash
cd SentimentAnalysisModel/WeiboSentiment_MachineLearning
python predict.py --model_type "svm" --text "Service attitude needs improvement"
```
|
||||
|
||||
#### 5. Multilingual Sentiment Analysis (Supports 22 Languages)
|
||||
|
||||
```bash
cd SentimentAnalysisModel/WeiboMultilingualSentiment
python predict.py --text "This product is amazing!" --lang "en"
```
|
||||
|
||||
### Integrate Custom Business Database
|
||||
|
||||
#### 1. Modify Database Connection Configuration
|
||||
|
||||
```python
# Add your business database configuration in config.py
BUSINESS_DB_HOST = "your_business_db_host"
BUSINESS_DB_PORT = 3306
BUSINESS_DB_USER = "your_business_user"
BUSINESS_DB_PASSWORD = "your_business_password"
BUSINESS_DB_NAME = "your_business_database"
```
|
||||
|
||||
#### 2. Create Custom Data Access Tools
|
||||
|
||||
```python
# InsightEngine/tools/custom_db_tool.py
import config  # global config.py; assumes the project root is on sys.path


class CustomBusinessDBTool:
    """Custom business database query tool"""

    def __init__(self):
        # Connection settings come from the BUSINESS_DB_* values in config.py
        self.connection_config = {
            'host': config.BUSINESS_DB_HOST,
            'port': config.BUSINESS_DB_PORT,
            'user': config.BUSINESS_DB_USER,
            'password': config.BUSINESS_DB_PASSWORD,
            'database': config.BUSINESS_DB_NAME,
        }

    def search_business_data(self, query: str, table: str):
        """Query business data"""
        # Implement your business logic
        pass

    def get_customer_feedback(self, product_id: str):
        """Get customer feedback data"""
        # Implement customer feedback query logic
        pass
```
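
The `search_business_data` stub above is left for you to implement. The sketch below shows one possible shape for it, using Python's built-in `sqlite3` as a stand-in so that it runs without a MySQL server. The table name `customer_feedback` and its `id` and `content` columns are made up for illustration. With a MySQL driver such as PyMySQL the placeholder would be `%s` rather than `?`.

```python
import sqlite3

# Hypothetical allowlist: table names cannot be bound as SQL parameters,
# so they are checked against a fixed set instead.
ALLOWED_TABLES = {"customer_feedback"}

def search_business_data(conn: sqlite3.Connection, query: str, table: str, limit: int = 20) -> list[dict]:
    """Return rows whose content contains the query text."""
    if table not in ALLOWED_TABLES:
        raise ValueError(f"Table not allowed: {table!r}")
    sql = f"SELECT id, content FROM {table} WHERE content LIKE ? LIMIT ?"
    rows = conn.execute(sql, (f"%{query}%", limit)).fetchall()
    return [{"id": row[0], "content": row[1]} for row in rows]

def make_demo_connection() -> sqlite3.Connection:
    """Build an in-memory database with a few sample rows for the sketch."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE customer_feedback (id INTEGER PRIMARY KEY, content TEXT)")
    conn.executemany(
        "INSERT INTO customer_feedback (content) VALUES (?)",
        [("delivery was slow",), ("great battery life",), ("slow customer service",)],
    )
    return conn
```

Binding the search text as a parameter, instead of formatting it into the SQL string, keeps user input from being interpreted as SQL.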
|
||||
|
||||
#### 3. Integrate into InsightEngine
|
||||
|
||||
```python
# Integrate custom tools in InsightEngine/agent.py
from .tools.custom_db_tool import CustomBusinessDBTool


class DeepSearchAgent:
    def __init__(self, config=None):
        # ... other initialization code
        self.custom_db_tool = CustomBusinessDBTool()

    def execute_custom_search(self, query: str):
        """Execute custom business data search"""
        return self.custom_db_tool.search_business_data(query, "your_table")
```
|
||||
|
||||
### Custom Report Templates
|
||||
|
||||
#### 1. Create Template Files
|
||||
|
||||
Create new Markdown templates in the `ReportEngine/report_template/` directory:
|
||||
|
||||
```markdown
<!-- Enterprise Brand Monitoring Report.md -->
# Enterprise Brand Public Opinion Monitoring Report

## 📊 Executive Summary
{executive_summary}

## 🔍 Brand Mention Analysis
### Mention Volume Trends
{mention_trend}

### Sentiment Distribution
{sentiment_distribution}

## 📈 Competitor Analysis
{competitor_analysis}

## 🎯 Key Insights Summary
{key_insights}

## ⚠️ Risk Alerts
{risk_alerts}

## 📋 Improvement Recommendations
{recommendations}

---
*Report Type: Enterprise Brand Public Opinion Monitoring*
*Generation Time: {generation_time}*
*Data Sources: {data_sources}*
```
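
The `{placeholder}` fields in the template suggest simple substitution. How ReportEngine actually fills a template is not described here (it may pass the template to the LLM), so the following is only a sketch of the idea using Python's `str.format_map`. The `fill_template` helper is hypothetical, and fields without a value are deliberately left in place.

```python
class _KeepMissing(dict):
    """Mapping that leaves unknown placeholders unchanged."""

    def __missing__(self, key: str) -> str:
        return "{" + key + "}"

def fill_template(template: str, values: dict[str, str]) -> str:
    """Replace {name} fields with values, keeping unfilled fields as-is."""
    return template.format_map(_KeepMissing(values))
```

This approach assumes the template contains no literal braces other than placeholders; a template that does would need `{{` and `}}` escapes.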
|
||||
|
||||
#### 2. Use in Web Interface
|
||||
|
||||
The web interface accepts custom template files in `.md` or `.txt` format. Uploaded templates can then be selected when generating a report.
|
||||
|
||||
## 🤝 Contributing Guide
|
||||
|
||||
We welcome all forms of contributions!
|
||||
|
||||
### How to Contribute
|
||||
|
||||
1. **Fork the project** to your GitHub account
|
||||
2. **Create Feature branch**: `git checkout -b feature/AmazingFeature`
|
||||
3. **Commit changes**: `git commit -m 'Add some AmazingFeature'`
|
||||
4. **Push to branch**: `git push origin feature/AmazingFeature`
|
||||
5. **Open Pull Request**
|
||||
|
||||
### Contribution Types
|
||||
|
||||
- 🐛 Bug fixes
|
||||
- ✨ New feature development
|
||||
- 📚 Documentation improvements
|
||||
- 🎨 UI/UX improvements
|
||||
- ⚡ Performance optimization
|
||||
- 🧪 Test case additions
|
||||
|
||||
### Development Standards
|
||||
|
||||
- Code follows PEP8 standards
|
||||
- Commit messages use clear Chinese/English descriptions
|
||||
- New features need corresponding test cases
|
||||
- Update related documentation
|
||||
|
||||
## 📄 License
|
||||
|
||||
This project is licensed under the [MIT License](LICENSE). Please see the LICENSE file for details.
|
||||
|
||||
## 🎉 Support & Contact
|
||||
|
||||
### Get Help
|
||||
|
||||
- **Project Homepage**: [GitHub Repository](https://github.com/666ghj/Weibo_PublicOpinion_AnalysisSystem)
|
||||
- **Issue Reporting**: [Issues Page](https://github.com/666ghj/Weibo_PublicOpinion_AnalysisSystem/issues)
|
||||
- **Feature Requests**: [Discussions Page](https://github.com/666ghj/Weibo_PublicOpinion_AnalysisSystem/discussions)
|
||||
|
||||
### Contact Information
|
||||
|
||||
- 📧 **Email**: 670939375@qq.com
|
||||
- 💬 **QQ Group**: [Join Technical Discussion Group]
|
||||
- 🐦 **WeChat**: [Scan QR Code for Technical Support]
|
||||
|
||||
### Business Cooperation
|
||||
|
||||
- 🏢 **Enterprise Custom Development**
|
||||
- 📊 **Big Data Services**
|
||||
- 🎓 **Academic Collaboration**
|
||||
- 💼 **Technical Training**
|
||||
|
||||
### Cloud Service Application
|
||||
|
||||
**Free Cloud Database Service Application**:
|
||||
📧 Send email to: 670939375@qq.com
|
||||
📝 Subject: Weibo Public Opinion Cloud Database Application
|
||||
📝 Description: Your use case and requirements
|
||||
|
||||
## 👥 Contributors
|
||||
|
||||
Thanks to these excellent contributors:
|
||||
|
||||
[Contributors](https://github.com/666ghj/Weibo_PublicOpinion_AnalysisSystem/graphs/contributors)
|
||||
|
||||
---
|
||||
|
||||
<div align="center">
|
||||
|
||||
**⭐ If this project helps you, please give us a star!**
|
||||
|
||||
Made with ❤️ by [Weibo Public Opinion Analysis Team](https://github.com/666ghj)
|
||||
|
||||
</div>
|
||||
@@ -1,35 +1,37 @@
|
||||
<div align="center">
|
||||
|
||||
<!-- # 📊 Weibo Public Opinion Analysis System -->
|
||||
<img src="static/image/logo_compressed.png" alt="Weibo Public Opinion Analysis System Logo" width="600">
|
||||
|
||||
<img src="https://github.com/666ghj/Weibo_PublicOpinion_AnalysisSystem/blob/main/static/image/logo_compressed.png" alt="Weibo Public Opinion Analysis System Logo" width="800">
|
||||
# 微舆 - Committed to building a simple, general-purpose public opinion analysis platform
|
||||
|
||||
[Stars](https://github.com/666ghj/Weibo_PublicOpinion_AnalysisSystem/stargazers)
[Forks](https://github.com/666ghj/Weibo_PublicOpinion_AnalysisSystem/network)
[Issues](https://github.com/666ghj/Weibo_PublicOpinion_AnalysisSystem/issues)
[Contributors](https://github.com/666ghj/Weibo_PublicOpinion_AnalysisSystem/graphs/contributors)
[License](https://github.com/666ghj/Weibo_PublicOpinion_AnalysisSystem/blob/main/LICENSE)
[Stars](https://github.com/666ghj/Weibo_PublicOpinion_AnalysisSystem/stargazers)
[Forks](https://github.com/666ghj/Weibo_PublicOpinion_AnalysisSystem/network)
[Issues](https://github.com/666ghj/Weibo_PublicOpinion_AnalysisSystem/issues)
[License](https://github.com/666ghj/Weibo_PublicOpinion_AnalysisSystem/blob/main/LICENSE)
|
||||
|
||||
[English](./README-EN.md) | [中文文档](./README.md)
|
||||
|
||||
</div>
|
||||
|
||||
<div align="center">
|
||||
<img src="https://github.com/666ghj/Weibo_PublicOpinion_AnalysisSystem/blob/main/static/image/banner_compressed.png" alt="banner" width="800">
|
||||
<img src="static/image/system_schematic.png" alt="banner" width="800">
|
||||
</div>
|
||||
|
||||
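## Project Overview
## 📝 Project Overview

**Weibo Public Opinion Multi-Agent Analysis System** is an innovative public opinion analysis platform built from scratch. It uses a multi-agent collaborative architecture and aims to provide accurate, real-time, and comprehensive Weibo public opinion monitoring and analysis. Multiple specialized AI agents work together to automate the full pipeline, from data collection and sentiment analysis to report generation.
**Weibo Public Opinion Multi-Agent Analysis System** is an innovative public opinion analysis platform built from scratch. It uses a multi-agent collaborative architecture and aims to provide accurate, real-time, and comprehensive Weibo public opinion monitoring and analysis. Five specialized AI agents work together to automate the full pipeline, from data collection and sentiment analysis to report generation.

### Core Features
### 🚀 Key Highlights

- **Multi-Agent Collaborative Architecture**: 5 specialized agents work together, each with its own role
- **Comprehensive Data Collection**: Integrates Weibo crawlers, news search, and web information as multi-dimensional data sources
- **Deep Sentiment Analysis**: Accurate sentiment recognition based on fine-tuned BERT/GPT-2/Qwen models
- **Intelligent Report Generation**: Automatically generates structured HTML analysis reports
- **Agent Forum Communication**: Forum Engine provides a platform for information sharing and collaborative decision-making among agents
- **High-Performance Asynchronous Processing**: Supports concurrent processing of multiple public opinion tasks
- **Multi-Agent Collaborative Architecture**: 5 specialized agents, each with its own role, work together to complete the full public opinion analysis process
- **Comprehensive Data Collection**: Integrates Weibo crawlers, news search, multimedia content, and other multi-dimensional data sources
- **Deep Sentiment Analysis**: Accurate multilingual sentiment recognition based on fine-tuned BERT/GPT-2/Qwen models
- **Intelligent Report Generation**: Automatically generates structured HTML analysis reports, with support for custom templates
- **Agent Forum Communication**: ForumEngine provides a platform for information sharing and collaborative decision-making among agents
- **High-Performance Asynchronous Processing**: Supports concurrent processing of multiple public opinion tasks, with real-time status monitoring
- **Cloud Data Support**: Provides a convenient cloud database service with 100,000+ real data records per day

## System Architecture
## 🏗️ System Architecture

### Overall Architecture Diagram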
|
||||
|
||||
@@ -49,7 +51,7 @@ graph TB
|
||||
|
||||
    subgraph "Data Processing Layer"
        MS[MindSpider<br/>Weibo Crawler System]
        SA[SentimentAnalysis<br/>Sentiment Analysis Model]
        SA[SentimentAnalysis<br/>Sentiment Analysis Model Collection]
        DB[(MySQL<br/>Database)]
    end
|
||||
|
||||
@@ -81,129 +83,110 @@ graph TB
|
||||
ME <--> Forum
|
||||
IE <--> Forum
|
||||
RE <--> Forum
|
||||
|
||||
style UI fill:#e1f5fe
|
||||
style QE fill:#fff3e0
|
||||
style ME fill:#fff3e0
|
||||
style IE fill:#fff3e0
|
||||
style RE fill:#f3e5f5
|
||||
style Forum fill:#e8f5e9
|
||||
style MS fill:#fce4ec
|
||||
style SA fill:#fce4ec
|
||||
style DB fill:#fff9c4
|
||||
style LLM fill:#e3f2fd
|
||||
style Search fill:#e3f2fd
|
||||
```
|
||||
|
||||
### Data Flow Diagram
### Agent Collaboration Workflow

```mermaid
sequenceDiagram
    participant User as User
    participant UI as Web Interface
    participant QE as QueryEngine
    participant ME as MediaEngine
    participant IE as InsightEngine
    participant Forum as ForumEngine
    participant RE as ReportEngine
    participant DB as Database

    User->>UI: Enter query keywords
    UI->>QE: Send search request
    UI->>ME: Send search request
    UI->>IE: Send search request

    Note over QE,IE: Agents read forum information before executing
    QE->>Forum: Read forum discussion
    ME->>Forum: Read forum discussion
    IE->>Forum: Read forum discussion

    par Parallel processing with continuous chain-of-thought exchange
        Note over QE: Structure planning → reflective search → continuous exchange
        QE->>QE: Determine news search structure
        QE->>Forum: Chain-of-thought exchange (structure planning)
        QE->>QE: Multi-step reflection and search analysis
        QE->>Forum: Chain-of-thought exchange (search progress)
        QE->>QE: Generate summary report
        QE->>Forum: Chain-of-thought exchange (key findings)
    and
        Note over ME: Structure planning → reflective search → continuous exchange
        ME->>ME: Determine multimedia search structure
        ME->>Forum: Chain-of-thought exchange (structure planning)
        ME->>ME: Multi-step reflection and search analysis
        ME->>Forum: Chain-of-thought exchange (search progress)
        ME->>ME: Generate summary report
        ME->>Forum: Chain-of-thought exchange (key findings)
    and
        Note over IE: Structure planning → reflective search → continuous exchange
        IE->>IE: Determine insight analysis structure
        IE->>Forum: Chain-of-thought exchange (structure planning)
        IE->>DB: Query Weibo data
        IE->>IE: Multi-step reflection and sentiment insight
        IE->>Forum: Chain-of-thought exchange (insight progress)
        IE->>IE: Generate summary report
        IE->>Forum: Chain-of-thought exchange (key findings)
    end

    Note over Forum: Forum aggregates the agents' discussion
    Forum->>RE: Trigger report generation
    RE->>Forum: Read the discussion from all agents
    RE->>QE: Fetch QueryEngine summary report
    RE->>ME: Fetch MediaEngine summary report
    RE->>IE: Fetch InsightEngine summary report

    Note over RE: ReportEngine intelligent report generation
    RE->>RE: Load template and style libraries and select
    RE->>RE: Generate report sections step by step
    RE->>RE: Assemble the final report
    RE->>UI: Deliver comprehensive HTML report
    UI->>User: Display analysis results
```
|
||||
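The system's core workflow is based on multi-agent collaboration:

## Project Structure
1. **QueryEngine (News Query Agent)**: Searches authoritative news reports through the Tavily API to provide official information sources
2. **MediaEngine (Multimedia Search Agent)**: Runs multimodal content searches through the Bocha API to gather social media perspectives
3. **InsightEngine (Deep Insight Agent)**: Queries the local Weibo database and combines multiple sentiment analysis models for in-depth analysis
4. **ForumEngine (Forum Monitoring Agent)**: Monitors each agent's log output in real time, extracts key information, and promotes collaboration
5. **ReportEngine (Report Generation Agent)**: Uses the Gemini LLM to generate a comprehensive HTML report from the analysis results of all agents

### Project Code Structure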
|
||||
|
||||
```
Weibo_PublicOpinion_AnalysisSystem/
├── QueryEngine/  # Web query engine Agent
│   ├── agent.py  # Agent main logic
│   ├── llms/  # LLM interface wrapper
│   ├── nodes/  # Processing nodes
│   ├── tools/  # Search tools
│   └── utils/  # Utility functions
├── MediaEngine/  # Media engine Agent
│   └── (similar structure)
├── InsightEngine/  # Database engine Agent
│   └── (similar structure)
├── ReportEngine/  # Report generation Agent
│   ├── report_template/  # Report templates
│   └── flask_interface.py  # API interface
├── ForumEgine/  # Forum communication Agent
│   └── monitor.py  # Forum communication manager
├── MindSpider/  # Weibo crawler system
│   ├── BroadTopicExtraction/  # Topic extraction
│   ├── DeepSentimentCrawling/  # Deep crawling
│   └── schema/  # Database schema
├── SentimentAnalysisModel/  # Sentiment analysis models
│   ├── BertTopicDetection_Finetuned/
│   ├── WeiboSentiment_Finetuned/
│   └── WeiboSentiment_MachineLearning/
├── SingleEngineApp/  # Streamlit apps
├── templates/  # Flask templates
├── static/  # Static resources
├── logs/  # Runtime logs
├── app.py  # Main application entry
├── config.py  # Configuration file
└── requirements.txt  # Dependencies
├── QueryEngine/  # News Query Engine Agent
│   ├── agent.py  # Agent main logic
│   ├── llms/  # LLM interface wrapper
│   ├── nodes/  # Processing nodes
│   ├── tools/  # Search tools
│   └── utils/  # Utility functions
├── MediaEngine/  # Multimedia Search Engine Agent
│   ├── agent.py  # Agent main logic
│   ├── llms/  # LLM interfaces
│   ├── tools/  # Search tools
│   └── ...  # Other modules
├── InsightEngine/  # Data Insight Engine Agent
│   ├── agent.py  # Agent main logic
│   ├── llms/  # LLM interface wrapper
│   │   ├── deepseek.py  # DeepSeek API
│   │   ├── kimi.py  # Kimi API
│   │   ├── openai_llm.py  # OpenAI format API
│   │   └── base.py  # LLM base class
│   ├── nodes/  # Processing nodes
│   │   ├── first_search_node.py  # First search node
│   │   ├── reflection_node.py  # Reflection node
│   │   ├── summary_nodes.py  # Summary nodes
│   │   ├── search_node.py  # Search node
│   │   ├── sentiment_node.py  # Sentiment analysis node
│   │   └── insight_node.py  # Insight generation node
│   ├── tools/  # Database query and analysis tools
│   │   ├── media_crawler_db.py  # Database query tool
│   │   └── sentiment_analyzer.py  # Sentiment analysis integration tool
│   ├── state/  # State management
│   │   ├── __init__.py
│   │   └── state.py  # Agent state definition
│   ├── prompts/  # Prompt templates
│   │   ├── __init__.py
│   │   └── prompts.py  # Various prompts
│   └── utils/  # Utility functions
│       ├── __init__.py
│       ├── config.py  # Configuration management
│       └── helpers.py  # Helper functions
├── ReportEngine/  # Report Generation Engine Agent
│   ├── agent.py  # Agent main logic
│   ├── llms/  # LLM interfaces
│   │   └── gemini.py  # Gemini API dedicated
│   ├── nodes/  # Report generation nodes
│   │   ├── template_selection.py  # Template selection node
│   │   └── html_generation.py  # HTML generation node
│   ├── report_template/  # Report template library
│   │   ├── 社会公共热点事件分析.md
│   │   ├── 商业品牌舆情监测.md
│   │   └── ...  # More templates
│   └── flask_interface.py  # Flask API interface
├── ForumEngine/  # Forum Communication Engine Agent
│   └── monitor.py  # Log monitoring and forum management
├── MindSpider/  # Weibo Crawler System
│   ├── main.py  # Crawler main program
│   ├── BroadTopicExtraction/  # Topic extraction module
│   │   ├── get_today_news.py  # Today's news fetching
│   │   └── topic_extractor.py  # Topic extractor
│   ├── DeepSentimentCrawling/  # Deep sentiment crawling
│   │   ├── MediaCrawler/  # Media crawler core
│   │   └── platform_crawler.py  # Platform crawler management
│   └── schema/  # Database schema
│       └── init_database.py  # Database initialization
├── SentimentAnalysisModel/  # Sentiment Analysis Model Collection
│   ├── WeiboSentiment_Finetuned/  # Fine-tuned BERT/GPT-2 models
│   ├── WeiboMultilingualSentiment/  # Multilingual sentiment analysis
│   ├── WeiboSentiment_SmallQwen/  # Small Qwen model
│   └── WeiboSentiment_MachineLearning/  # Traditional machine learning methods
├── SingleEngineApp/  # Individual Agent Streamlit apps
│   ├── query_engine_streamlit_app.py
│   ├── media_engine_streamlit_app.py
│   └── insight_engine_streamlit_app.py
├── templates/  # Flask templates
│   └── index.html  # Main interface template
├── static/  # Static resources
├── logs/  # Runtime log directory
├── app.py  # Flask main application entry
├── config.py  # Global configuration file
└── requirements.txt  # Python dependency list
```

## Quick Start
## 🚀 Quick Start

### Requirements

- **Operating system**: Windows 10/11
- **Operating system**: Windows 10/11 (Linux and macOS are also supported)
- **Python version**: 3.11+
- **Conda**: Anaconda or Miniconda
- **Database**: MySQL 8.0+
- **Database**: MySQL 8.0+ (or our hosted cloud database service)
- **Memory**: 8 GB or more recommended

### 1. Create the Conda Environment
@@ -220,14 +203,14 @@ conda activate pytorch_python11
# Install the base dependencies
pip install -r requirements.txt

# If you need sentiment analysis, install PyTorch (pick the build that matches your CUDA version)
# If you need local sentiment analysis, install PyTorch
# CPU build
pip install torch torchvision torchaudio

# CUDA 11.8 build
# CUDA 11.8 build (if you have a GPU)
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118

# Install transformers (for the BERT/GPT models)
# Install transformers and the other AI-related dependencies
pip install transformers scikit-learn xgboost
```
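
After running the optional install commands above, it can help to confirm which packages are importable. The following is a minimal sketch using only the standard library. The package list mirrors the `pip install` lines above (note that `scikit-learn` is imported as `sklearn`), and the helper name `missing_packages` is made up for this illustration and is not part of the project.

```python
from importlib.util import find_spec

# Import names of the optional packages installed above.
# scikit-learn is installed as "scikit-learn" but imported as "sklearn".
OPTIONAL_PACKAGES = ("torch", "transformers", "sklearn", "xgboost")


def missing_packages(names=OPTIONAL_PACKAGES):
    """Return the import names that cannot be found in the current environment."""
    return [name for name in names if find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages()
    print("missing:", ", ".join(missing) if missing else "none")
```

Run it inside the activated conda environment. An empty result means the local sentiment analysis dependencies are at least discoverable.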

@@ -272,16 +255,30 @@ BOCHA_Web_Search_API_KEY = "your_bocha_api_key"
GUIJI_QWEN3_API_KEY = "your_guiji_api_key"
```

#### 4.2 Initialize the Database
#### 4.2 Database Initialization

**Option 1: Use a local database**
```bash
# Initialize the local MySQL database
cd MindSpider
python schema/init_database.py
```

**Option 2: Use the cloud database service (recommended)**

We offer a hosted cloud database service with more than 100,000 real Weibo posts added per day. During the current promotion period, access is **free on request**.

- Real Weibo data, updated in real time
- Pre-processed data with sentiment labels
- Multi-dimensional tag classification
- Highly available cloud service
- Professional technical support

**Contact us to request free cloud database access: 📧 670939375@qq.com**

### 5. Start the System

#### Option 1: Start the Full System (Recommended)
#### 5.1 Start the Full System (Recommended)

```bash
# From the project root, activate the conda environment
@@ -291,9 +288,9 @@ conda activate pytorch_python11
python app.py
```

Open http://localhost:5000 to use the system
Open http://localhost:5000 to use the full system

#### Option 2: Start a Single Agent
#### 5.2 Start a Single Agent

```bash
# Start QueryEngine
@@ -306,147 +303,353 @@ streamlit run SingleEngineApp/media_engine_streamlit_app.py --server.port 8502
streamlit run SingleEngineApp/insight_engine_streamlit_app.py --server.port 8501
```

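## Usage Guide
#### 5.3 Run the Crawler System on Its Own

### Basic Workflow
```bash
# Enter the crawler directory
cd MindSpider

1. **Start the system**: run `python app.py`; the system starts all agents automatically
# Initialize the project
python main.py --setup

2. **Enter a query**: type the public opinion keyword you want to analyze into the search box of the web interface
# Run the complete crawler pipeline
python main.py --complete --date 2024-01-20

3. **Agent collaboration**:
   - QueryEngine: searches news and official reports and posts key findings to the forum
   - MediaEngine: searches multimedia content and shares important information with the other agents
   - InsightEngine: analyzes Weibo data and sentiment and discusses its insights in the forum
   - ForumEngine: provides the communication platform between agents and aggregates what they share
# Run topic extraction only
python main.py --broad-topic --date 2024-01-20

4. **View the results**:
   - Agent forum: watch the real-time exchange of information between agents
   - Analysis report: read the consolidated HTML report built from the agents' collaboration
# Run deep crawling only
python main.py --deep-sentiment --platforms xhs dy wb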
```
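
The crawler commands above use four mode flags plus `--date` and `--platforms`. As a rough illustration of how such a command line could be parsed, here is a sketch built on `argparse`. It is not the project's actual `MindSpider/main.py`: the mutual exclusivity of the mode flags, the date format check, and the set of allowed platform codes (taken from the examples in this README) are all assumptions.

```python
import argparse
from datetime import date, datetime


def parse_iso_date(value: str) -> date:
    """Parse a YYYY-MM-DD string; argparse reports a ValueError as a usage error."""
    return datetime.strptime(value, "%Y-%m-%d").date()


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="main.py", description="Crawler CLI sketch")
    # Assumption: exactly one mode is chosen per run.
    mode = parser.add_mutually_exclusive_group(required=True)
    mode.add_argument("--setup", action="store_true", help="initialize the project")
    mode.add_argument("--complete", action="store_true", help="run the full pipeline")
    mode.add_argument("--broad-topic", action="store_true", help="topic extraction only")
    mode.add_argument("--deep-sentiment", action="store_true", help="deep crawling only")
    parser.add_argument("--date", type=parse_iso_date, default=None)
    # Assumption: platform codes as they appear in this README.
    parser.add_argument("--platforms", nargs="+", choices=["xhs", "dy", "wb", "bili"])
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args(["--deep-sentiment", "--platforms", "xhs", "dy", "wb"])
    print(args.deep_sentiment, args.platforms)
```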

### Advanced Configuration
## 💾 Database Configuration

#### Configure the Crawler System
### Local Database Configuration

1. **Install MySQL 8.0+**
2. **Create the database**:
```sql
CREATE DATABASE weibo_analysis CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;
```
3. **Run the initialization script**:
```bash
cd MindSpider
python schema/init_database.py
```
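
The `CREATE DATABASE` statement in step 2 uses `utf8mb4` rather than MySQL's legacy `utf8` character set. The legacy set stores at most three bytes per character, while emoji, which are common in Weibo posts, take four bytes in UTF-8. The short sketch below shows the difference. The helper name `needs_utf8mb4` is hypothetical and exists only for this illustration.

```python
def needs_utf8mb4(text: str) -> bool:
    """Return True if any character in the text takes four bytes in UTF-8."""
    return any(len(ch.encode("utf-8")) == 4 for ch in text)


if __name__ == "__main__":
    # Common CJK characters take three bytes each, so they fit the legacy charset.
    print(needs_utf8mb4("今天心情不太好"))
    # An emoji takes four bytes and therefore needs utf8mb4.
    print(needs_utf8mb4("这个产品真的很不错👍"))
```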

### Automatic Crawling Configuration

Configure automatic crawling jobs to keep the data continuously updated:

1. **Configure the crawler parameters**:
```python
# MindSpider/config.py
# Crawler parameters are set in MindSpider/config.py
CRAWLER_CONFIG = {
    'max_pages': 100, # Maximum number of pages to crawl
    'delay': 1, # Delay between requests (seconds)
    'timeout': 30, # Timeout (seconds)
    'use_proxy': False, # Whether to use a proxy
    'max_pages': 200, # Maximum number of pages to crawl
    'delay': 1, # Delay between requests (seconds)
    'timeout': 30, # Timeout (seconds)
    'platforms': ['xhs', 'dy', 'wb', 'bili'], # Platforms to crawl
    'daily_keywords': 100, # Number of keywords per day
    'max_notes_per_keyword': 50, # Maximum items per keyword
    'use_proxy': False, # Whether to use a proxy
}
```

2. **Run the crawler**:
```bash
cd MindSpider
python main.py --topic "topic keyword" --days 7
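### Cloud Database Service (Recommended)

**Why choose our cloud database service?**

- **Rich data sources**: more than 100,000 real Weibo posts per day, covering trending topics across industries
- **High-quality labels**: sentiment data annotated manually by a professional team, with 95%+ accuracy
- **Multi-dimensional analysis**: labels include topic category, sentiment polarity, influence score, and more
- **Real-time updates**: data is collected around the clock to keep it current
- **Technical support**: a professional team provides technical support and customization services

**How to apply**:
📧 Email: 670939375@qq.com
📝 Subject: Request for access to the Weibo public opinion cloud database
📝 Body: please describe your use case and expected data volume

**Promotion benefits**:
- Free access to the basic tier of the cloud database
- Free technical support and deployment guidance
- Early access to new features
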
## ⚙️ Advanced Configuration

### Changing Key Parameters

#### Agent Configuration Parameters

Each agent has its own configuration file, which you can adjust as needed:

```python
# QueryEngine/utils/config.py
class Config:
    max_reflections = 2 # Number of reflection rounds
    max_search_results = 15 # Maximum number of search results
    max_content_length = 8000 # Maximum content length

# MediaEngine/utils/config.py
class Config:
    comprehensive_search_limit = 10 # Comprehensive search limit
    web_search_limit = 15 # Web search limit

# InsightEngine/utils/config.py
class Config:
    default_search_topic_globally_limit = 200 # Global search limit
    default_get_comments_limit = 500 # Comment retrieval limit
    max_search_results_for_llm = 50 # Maximum number of results passed to the LLM
```
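
To make the effect of these limits concrete, the sketch below shows one plausible way a cap on result count and a cap on content length could be applied before results reach an LLM. This is an assumption about usage, not the engines' actual code: the `Limits` dataclass and `prepare_for_llm` are hypothetical names, the sketch combines parameters from two different engine configs purely for illustration, and whether `max_content_length` applies per result or to the total is not stated in this README.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Limits:
    """Hypothetical container reusing two parameter names from the configs above."""
    max_search_results_for_llm: int = 50
    max_content_length: int = 8000  # assumed here to apply per result


def prepare_for_llm(results: list[str], limits: Limits = Limits()) -> list[str]:
    """Keep at most N results and truncate each one to the configured length."""
    kept = results[: limits.max_search_results_for_llm]
    return [text[: limits.max_content_length] for text in kept]


if __name__ == "__main__":
    trimmed = prepare_for_llm(["x" * 10_000] * 60)
    print(len(trimmed), len(trimmed[0]))
```

Lower limits reduce prompt size and cost, while higher limits give the model more context to reason over.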

#### Configure the Sentiment Analysis Model
#### Sentiment Analysis Model Configuration

1. **Choose a model**:
   - Fine-tuned BERT model (high accuracy)
   - GPT-2 LoRA (fast)
   - Small Qwen model (balanced)
   - Machine learning baseline (lightweight)

2. **Switch models**:
```python
# InsightEngine/tools/sentiment_analyzer.py
MODEL_TYPE = "bert" # Options: "bert", "gpt2", "qwen", "ml"
SENTIMENT_CONFIG = {
    'model_type': 'multilingual', # Options: 'bert', 'multilingual', 'qwen'
    'confidence_threshold': 0.8, # Confidence threshold
    'batch_size': 32, # Batch size
    'max_sequence_length': 512, # Maximum sequence length
}
```
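
The `batch_size` and `confidence_threshold` settings above can be read as follows: texts are scored in fixed-size groups, and predictions below the threshold are not trusted. The sketch below illustrates that reading with plain Python. The `batched` and `apply_threshold` helpers and the `"uncertain"` fallback label are assumptions for illustration, not the behavior of `sentiment_analyzer.py`.

```python
from typing import Iterator, Sequence

SENTIMENT_CONFIG = {
    "confidence_threshold": 0.8,
    "batch_size": 32,
}


def batched(items: Sequence[str], size: int) -> Iterator[Sequence[str]]:
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def apply_threshold(predictions: list[tuple[str, float]], threshold: float) -> list[str]:
    """Keep a label only if its score reaches the threshold (fallback label is assumed)."""
    return [label if score >= threshold else "uncertain" for label, score in predictions]


if __name__ == "__main__":
    texts = [f"post {i}" for i in range(70)]
    print([len(batch) for batch in batched(texts, SENTIMENT_CONFIG["batch_size"])])
    print(apply_threshold([("positive", 0.93), ("negative", 0.55)],
                          SENTIMENT_CONFIG["confidence_threshold"]))
```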

#### Custom Report Templates
### Connecting Different LLM Models

Create a new template in the `ReportEngine/report_template/` directory:
The system supports several LLM providers, which can be switched in each agent's configuration:

```python
# Set in each engine's utils/config.py
class Config:
    default_llm_provider = "deepseek" # Options: "deepseek", "openai", "kimi", "gemini"

    # DeepSeek settings
    deepseek_api_key = "your_api_key"
    deepseek_model = "deepseek-chat"

    # OpenAI-compatible settings
    openai_api_key = "your_api_key"
    openai_model = "gpt-3.5-turbo"
    openai_base_url = "https://api.openai.com/v1"

    # Kimi settings
    kimi_api_key = "your_api_key"
    kimi_model = "moonshot-v1-8k"

    # Gemini settings
    gemini_api_key = "your_api_key"
    gemini_model = "gemini-pro"
```
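
Because every provider's settings share a common prefix (`deepseek_`, `openai_`, and so on), the active provider's settings can be collected generically. The sketch below shows one way to do that. The `resolve_llm_settings` helper is hypothetical and is not how the engines necessarily select a provider. The `Config` class is a trimmed copy of the example above.

```python
SUPPORTED_PROVIDERS = ("deepseek", "openai", "kimi", "gemini")


class Config:
    """Trimmed copy of the configuration shown above."""
    default_llm_provider = "deepseek"
    deepseek_api_key = "your_api_key"
    deepseek_model = "deepseek-chat"
    openai_api_key = "your_api_key"
    openai_model = "gpt-3.5-turbo"
    openai_base_url = "https://api.openai.com/v1"


def resolve_llm_settings(config, provider=None) -> dict:
    """Collect the `<provider>_*` attributes of a config object into one dict."""
    provider = provider or config.default_llm_provider
    if provider not in SUPPORTED_PROVIDERS:
        raise ValueError(f"unsupported provider: {provider!r}")
    prefix = f"{provider}_"
    settings = {
        name[len(prefix):]: getattr(config, name)
        for name in dir(config)
        if name.startswith(prefix)
    }
    if not settings.get("api_key"):
        raise ValueError(f"no API key configured for {provider!r}")
    return {"provider": provider, **settings}


if __name__ == "__main__":
    print(resolve_llm_settings(Config))
    print(resolve_llm_settings(Config, "openai"))
```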

### Changing the Sentiment Analysis Model

The system integrates several sentiment analysis methods; choose one based on your needs:

#### 1. Fine-tuned BERT model (highest accuracy)

```bash
# Use the Chinese BERT model
cd SentimentAnalysisModel/WeiboSentiment_Finetuned/BertChinese-Lora
python predict.py --text "这个产品真的很不错"  # "This product is really good"
```

#### 2. GPT-2 LoRA fine-tuned model (faster)

```bash
cd SentimentAnalysisModel/WeiboSentiment_Finetuned/GPT2-Lora
python predict.py --text "今天心情不太好"  # "I'm not in a great mood today"
```

#### 3. Small Qwen model (balanced)

```bash
cd SentimentAnalysisModel/WeiboSentiment_SmallQwen
python predict_universal.py --text "这次活动办得很成功"  # "This event was a great success"
```

#### 4. Traditional machine learning methods (lightweight)

```bash
cd SentimentAnalysisModel/WeiboSentiment_MachineLearning
python predict.py --model_type "svm" --text "服务态度需要改进"  # "The service attitude needs improvement"
```

#### 5. Multilingual sentiment analysis (22 languages supported)

```bash
cd SentimentAnalysisModel/WeiboMultilingualSentiment
python predict.py --text "This product is amazing!" --lang "en"
```

### Connecting a Custom Business Database

#### 1. Update the database connection settings

```python
# Add your business database settings to config.py
BUSINESS_DB_HOST = "your_business_db_host"
BUSINESS_DB_PORT = 3306
BUSINESS_DB_USER = "your_business_user"
BUSINESS_DB_PASSWORD = "your_business_password"
BUSINESS_DB_NAME = "your_business_database"
```

#### 2. Create a custom data access tool

```python
# InsightEngine/tools/custom_db_tool.py
import config  # Assumes the project-level config.py is importable from here


class CustomBusinessDBTool:
    """Custom query tool for the business database"""

    def __init__(self):
        self.connection_config = {
            'host': config.BUSINESS_DB_HOST,
            'port': config.BUSINESS_DB_PORT,
            'user': config.BUSINESS_DB_USER,
            'password': config.BUSINESS_DB_PASSWORD,
            'database': config.BUSINESS_DB_NAME,
        }

    def search_business_data(self, query: str, table: str):
        """Query business data"""
        # Implement your business logic here
        pass

    def get_customer_feedback(self, product_id: str):
        """Fetch customer feedback data"""
        # Implement the customer feedback query here
        pass
```
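
The two methods above are left as stubs. One point worth getting right when filling them in is that the `table` argument cannot be passed as a SQL parameter, so it should be checked against a fixed allow-list, while the search term should always go through a placeholder. The sketch below demonstrates the pattern with the standard-library `sqlite3` module as a stand-in for MySQL, so that it runs without a database server. The table name, column names, and `ALLOWED_TABLES` set are hypothetical, and a MySQL driver would typically use `%s` placeholders instead of `?`.

```python
import sqlite3

# Hypothetical allow-list: table names cannot be bound as SQL parameters,
# so only known names are ever interpolated into the statement.
ALLOWED_TABLES = {"customer_feedback"}


def search_business_data(conn: sqlite3.Connection, query: str, table: str, limit: int = 20):
    """Return (id, content) rows whose content contains the query string."""
    if table not in ALLOWED_TABLES:
        raise ValueError(f"table not allowed: {table!r}")
    sql = f"SELECT id, content FROM {table} WHERE content LIKE ? ORDER BY id LIMIT ?"
    return conn.execute(sql, (f"%{query}%", limit)).fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE customer_feedback (id INTEGER PRIMARY KEY, content TEXT)")
    conn.executemany(
        "INSERT INTO customer_feedback (content) VALUES (?)",
        [("服务态度需要改进",), ("这个产品真的很不错",)],
    )
    print(search_business_data(conn, "产品", "customer_feedback"))
```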

#### 3. Integrate with InsightEngine

```python
# Integrate the custom tool in InsightEngine/agent.py
from .tools.custom_db_tool import CustomBusinessDBTool

class DeepSearchAgent:
    def __init__(self, config=None):
        # ... other initialization code
        self.custom_db_tool = CustomBusinessDBTool()

    def execute_custom_search(self, query: str):
        """Run a search against the custom business data"""
        return self.custom_db_tool.search_business_data(query, "your_table")
```

### Custom Report Templates

#### 1. Create a template file

Create a new Markdown template in the `ReportEngine/report_template/` directory:

```markdown
# Custom Report Template
## Public Opinion Overview
${overview}
<!-- 企业品牌监测报告.md -->
# Corporate Brand Public Opinion Monitoring Report

## Sentiment Analysis
${sentiment_analysis}
## 📊 Executive Summary
{executive_summary}

## Key Insights
${key_insights}
## 🔍 Brand Mention Analysis
### Mention Volume Trend
{mention_trend}

## Trend Forecast
${trend_prediction}
### Sentiment Distribution
{sentiment_distribution}

## 📈 Competitor Comparison
{competitor_analysis}

## 🎯 Key Insights Summary
{key_insights}

## ⚠️ Risk Alerts
{risk_alerts}

## 📋 Recommendations
{recommendations}

---
*Report type: corporate brand public opinion monitoring*
*Generated at: {generation_time}*
*Data sources: {data_sources}*
```
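
How ReportEngine consumes the `{placeholder}` fields is not shown in this README, and it may well hand the template to an LLM rather than substitute strings. Purely to illustrate the placeholder convention, the sketch below fills `{name}` fields with plain string substitution and inserts a default for any field without data. The `fill_template` helper and the `"(no data)"` default are assumptions. Older `${name}` fields are deliberately left untouched.

```python
import re

# Matches {name} but not ${name}, so old-style placeholders are left alone.
PLACEHOLDER = re.compile(r"(?<!\$)\{([a-z_]+)\}")


def fill_template(template: str, values: dict, missing: str = "(no data)") -> str:
    """Replace each {name} with its value, or with a default when no value is given."""
    return PLACEHOLDER.sub(lambda match: str(values.get(match.group(1), missing)), template)


if __name__ == "__main__":
    template = (
        "## 📊 Executive Summary\n{executive_summary}\n\n"
        "## ⚠️ Risk Alerts\n{risk_alerts}\n"
    )
    print(fill_template(template, {"executive_summary": "Mentions rose during the week."}))
```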

### Monitoring and Logs
#### 2. Use It in the Web Interface

#### Viewing System Logs
The system supports uploading custom template files (.md or .txt), which can then be selected when generating a report.

All log files are located in the `logs/` directory:
- `query.log`: QueryEngine runtime log
- `media.log`: MediaEngine runtime log
- `insight.log`: InsightEngine runtime log
- `forum.log`: ForumEngine forum communication log
- `report.log`: ReportEngine generation log
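
To glance at the most recent entries of all five logs at once, a small helper like the one below can be used. It is a sketch that assumes the logs are UTF-8 text files in `logs/` with the names listed above. The `tail` and `tail_all` functions are not part of the project.

```python
from collections import deque
from pathlib import Path

LOG_NAMES = ("query.log", "media.log", "insight.log", "forum.log", "report.log")


def tail(path: Path, n: int = 20) -> list:
    """Return the last n lines of a text file without loading all lines into a list."""
    with path.open(encoding="utf-8", errors="replace") as handle:
        return [line.rstrip("\n") for line in deque(handle, maxlen=n)]


def tail_all(log_dir: str = "logs", n: int = 20) -> dict:
    """Map each existing log file name to its last n lines; missing files are skipped."""
    base = Path(log_dir)
    return {name: tail(base / name, n) for name in LOG_NAMES if (base / name).is_file()}


if __name__ == "__main__":
    for name, lines in tail_all().items():
        print(f"== {name} ==")
        print("\n".join(lines))
```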

#### Agent Forum Communication

ForumEngine provides collaborative communication among the agents:
1. Agents read the forum messages before acting
2. After reasoning, each agent decides whether to share its key findings
3. The messages from all agents are aggregated
4. The aggregated exchange gives ReportEngine its collaborative data foundation

## Troubleshooting

### Common Issues

#### 1. Port already in use
```bash
# Check which process is using the port (Windows)
netstat -ano | findstr :5000
netstat -ano | findstr :8501

# Kill the process that holds the port
taskkill /F /PID <PID>
```
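
The `netstat` commands above are Windows-specific. As a cross-platform alternative, the sketch below checks whether something is already listening on a local port using Python's standard `socket` module. The `port_in_use` helper is an illustration and not part of the project. The ports checked are the ones used elsewhere in this README.

```python
import socket


def port_in_use(port: int, host: str = "127.0.0.1") -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(0.5)
        # connect_ex returns 0 on success instead of raising an exception.
        return sock.connect_ex((host, port)) == 0


if __name__ == "__main__":
    for port in (5000, 8501, 8502):
        print(port, "in use" if port_in_use(port) else "free")
```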
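
#### 2. Encoding issues
```python
# Add at the top of your code
import sys
import os
# These variables are read when an interpreter starts, so setting them here
# only affects child processes launched from this script.
os.environ['PYTHONIOENCODING'] = 'utf-8'
os.environ['PYTHONUTF8'] = '1'
# For the current process, switch the standard output encoding directly.
sys.stdout.reconfigure(encoding='utf-8')
```

#### 3. Playwright installation fails
```bash
# Install manually
python -m playwright install chromium --with-deps
```

#### 4. MySQL connection fails
- Check that the MySQL service is running
- Confirm the user's permissions
- Check the firewall settings
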
## Contributing
## 🤝 Contributing

We welcome contributions of every kind!

1. Fork the project
2. Create a feature branch (`git checkout -b feature/AmazingFeature`)
3. Commit your changes (`git commit -m 'Add some AmazingFeature'`)
4. Push to the branch (`git push origin feature/AmazingFeature`)
5. Open a Pull Request
### How to Contribute

## License
1. **Fork the project** to your GitHub account
2. **Create a feature branch**: `git checkout -b feature/AmazingFeature`
3. **Commit your changes**: `git commit -m 'Add some AmazingFeature'`
4. **Push to the branch**: `git push origin feature/AmazingFeature`
5. **Open a Pull Request**

This project is licensed under the MIT License. See the [LICENSE](LICENSE) file for details.
### Types of Contributions

## Contact Us
- 🐛 Bug fixes
- ✨ New features
- 📚 Documentation improvements
- 🎨 UI/UX improvements
- ⚡ Performance optimizations
- 🧪 Additional test cases

- Project: [https://github.com/666ghj/Weibo_PublicOpinion_AnalysisSystem](https://github.com/666ghj/Weibo_PublicOpinion_AnalysisSystem)
- Email: 670939375@qq.com
- Issues: [Project Issues](https://github.com/666ghj/Weibo_PublicOpinion_AnalysisSystem/issues)
### Development Guidelines

- Code follows PEP 8
- Commit messages are clear and written in Chinese or English
- New features include corresponding test cases
- Related documentation is updated

## 📄 License

This project is licensed under the [MIT License](LICENSE). See the LICENSE file for details.

## 🎉 Support and Contact

### Getting Help

- **Project home**: [GitHub repository](https://github.com/666ghj/Weibo_PublicOpinion_AnalysisSystem)
- **Bug reports**: [Issues page](https://github.com/666ghj/Weibo_PublicOpinion_AnalysisSystem/issues)
- **Feature requests**: [Discussions page](https://github.com/666ghj/Weibo_PublicOpinion_AnalysisSystem/discussions)

### Contact

- 📧 **Email**: 670939375@qq.com
- 💬 **QQ group**: [Join the technical discussion group]
- 🐦 **WeChat**: [Scan the QR code to add technical support]

### Business Cooperation

- 🏢 **Custom enterprise development**
- 📊 **Big data services**
- 🎓 **Academic collaboration**
- 💼 **Technical training**

### Cloud Service Application

**Applying for the free cloud database service**:
📧 Send an email to: 670939375@qq.com
📝 Subject: Weibo public opinion cloud database application
📝 Details: your use case and requirements

## 👥 Contributors

Thanks to these excellent contributors:

[](https://github.com/666ghj/Weibo_PublicOpinion_AnalysisSystem/graphs/contributors)

---

<div align="center">

**⭐ If this project helps you, please give us a star!**

Made with ❤️ by [Weibo Public Opinion Analysis Team](https://github.com/666ghj)

</div>

Binary file not shown.
|
Before Width: | Height: | Size: 90 KiB After Width: | Height: | Size: 60 KiB |
Binary file not shown.
|
After Width: | Height: | Size: 91 KiB |
Binary file not shown.
|
After Width: | Height: | Size: 98 KiB |