{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# anhuidianxinzhidao Dataset Notes\n",
    "0. **Download:** [Baidu Netdisk](https://pan.baidu.com/s/1nrg5SRU3Xy1VN85dd85-vg)\n",
    "1. **Overview:** about 156,000 telecom question-answer records\n",
    "2. **Suggested experiment:** FAQ question-answering system\n",
    "3. **Source:** Baidu Zhidao\n",
    "4. **Processing:**\n",
    "   1. Dropped the id, url, qid, reply_t and user fields\n",
    "   2. Masked sensitive information in the question and reply fields"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [],
   "source": [
    "import pandas as pd"
   ]
  },
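  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Directory that contains the anhuidianxinzhidao files. Keep the trailing slash:\n",
    "# the file name is appended to it with plain string concatenation when loading.\n",
    "path = 'path/to/anhuidianxinzhidao/'"
   ]
  },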
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# 1. anhuidianxinzhidao_filter.csv"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Load the data"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [],
   "source": [
    "pd_all = pd.read_csv(path + 'anhuidianxinzhidao_filter.csv')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Fields\n",
    "\n",
    "| Field | Description |\n",
    "| ---- | ---- |\n",
    "| title | Question title |\n",
    "| question | Question details (may be empty) |\n",
    "| reply | Reply text answering the question |\n",
    "| is_best | Whether the reply is the best answer (1 = yes, 0 = no) |"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       " .dataframe tbody tr th:only-of-type {\n",
       " vertical-align: middle;\n",
       " }\n",
       "\n",
       " .dataframe tbody tr th {\n",
       " vertical-align: top;\n",
       " }\n",
       "\n",
       " .dataframe thead th {\n",
       " text-align: right;\n",
       " }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       " <thead>\n",
       " <tr style=\"text-align: right;\">\n",
       " <th></th>\n",
       " <th>title</th>\n",
       " <th>question</th>\n",
       " <th>reply</th>\n",
       " <th>is_best</th>\n",
       " </tr>\n",
       " </thead>\n",
       " <tbody>\n",
       " <tr>\n",
       " <th>129754</th>\n",
       " <td>红米no##4x</td>\n",
       " <td>NaN</td>\n",
       " <td>可以,</td>\n",
       " <td>0</td>\n",
       " </tr>\n",
       " <tr>\n",
       " <th>15843</th>\n",
       " <td>为什么不能同时用两个电信卡</td>\n",
       " <td>NaN</td>\n",
       " <td>您好不可以的,目前推出的手机都是不能同时支持两张电信手机卡的,即使是全网通手机也只能在其中的...</td>\n",
       " <td>1</td>\n",
       " </tr>\n",
       " <tr>\n",
       " <th>23985</th>\n",
       " <td>电信181、177、133哪个号段好?</td>\n",
       " <td>NaN</td>\n",
       " <td>133的</td>\n",
       " <td>0</td>\n",
       " </tr>\n",
       " <tr>\n",
       " <th>72065</th>\n",
       " <td>华*荣耀7x和魅蓝note6哪个好</td>\n",
       " <td>NaN</td>\n",
       " <td>荣耀畅玩7X很不错,性价比很高,以下是手机的配置:1、外观方面:荣耀畅玩7X采用5.93英寸...</td>\n",
       " <td>1</td>\n",
       " </tr>\n",
       " <tr>\n",
       " <th>11843</th>\n",
       " <td>p8青春版电信版多少钱</td>\n",
       " <td>NaN</td>\n",
       " <td>您好,这款手机价格参考如下</td>\n",
       " <td>1</td>\n",
       " </tr>\n",
       " <tr>\n",
       " <th>3280</th>\n",
       " <td>华为di####00叫什么</td>\n",
       " <td>华为di####00叫什么</td>\n",
       " <td>DI####00是华为畅享6S全网通版。华为畅享6S性价比高,是一款很不错的手机。电信新出流...</td>\n",
       " <td>1</td>\n",
       " </tr>\n",
       " <tr>\n",
       " <th>143200</th>\n",
       " <td>电信版酷派9190L双卡双通可以用移动网络吗</td>\n",
       " <td>NaN</td>\n",
       " <td>您好电信版双卡双待手机只能使用电信手机卡上网,卡槽2的移动或联通手机卡只能支持2G网络,一般...</td>\n",
       " <td>1</td>\n",
       " </tr>\n",
       " <tr>\n",
       " <th>120692</th>\n",
       " <td>苹果微信载图怎么载图</td>\n",
       " <td>苹果微信载图怎么载图</td>\n",
       " <td>您说的应该是截图吧。您可以直接通过苹果手机截图组合按键进行截图操作。直接同时安装电源键和ho...</td>\n",
       " <td>1</td>\n",
       " </tr>\n",
       " <tr>\n",
       " <th>109786</th>\n",
       " <td>天翼网关的wifi被我关了又没有邦定客户端怎么办想再连wifi该怎么办</td>\n",
       " <td>NaN</td>\n",
       " <td>您好电信光纤猫的无线网络一般需要破解才能使用的,但破解可能会到帐宽带不稳定或不能正常上网,建...</td>\n",
       " <td>1</td>\n",
       " </tr>\n",
       " <tr>\n",
       " <th>29030</th>\n",
       " <td>v*v*x21是不是全网通</td>\n",
       " <td>v*v*x21是不是全网通</td>\n",
       " <td>vi###21系列是有vi###21A全网通版本与vi###21移动全网通版本的;此两款机型...</td>\n",
       " <td>0</td>\n",
       " </tr>\n",
       " <tr>\n",
       " <th>72603</th>\n",
       " <td>电信网上营业厅手机卡办理步骤</td>\n",
       " <td>NaN</td>\n",
       " <td>中*电信目前是支持网上办理手机号的,下面分享下网上营业厅办理号卡的步骤:1、首先打开浏览器,...</td>\n",
       " <td>1</td>\n",
       " </tr>\n",
       " <tr>\n",
       " <th>103229</th>\n",
       " <td>花呗可以充话费吗</td>\n",
       " <td>NaN</td>\n",
       " <td>您好,是可以的,目前花呗进行充值话费,每个月只能使用花呗一次,最高不超过500元,如果您已经...</td>\n",
       " <td>0</td>\n",
       " </tr>\n",
       " <tr>\n",
       " <th>91507</th>\n",
       " <td>荣耀8好还是三星noT4好</td>\n",
       " <td>NaN</td>\n",
       " <td>如果我选择三星,华为去论坛发个意见都很尴尬。</td>\n",
       " <td>0</td>\n",
       " </tr>\n",
       " <tr>\n",
       " <th>143504</th>\n",
       " <td>ios10.2.1能降级吗ios10.2.1怎么降级</td>\n",
       " <td>NaN</td>\n",
       " <td>IOS设备一旦升级IOS系统就无法降级了,因为:1、IOS采用推荐升级、强制保持最新的升级策...</td>\n",
       " <td>1</td>\n",
       " </tr>\n",
       " <tr>\n",
       " <th>21999</th>\n",
       " <td>电信校园网宽带超一分钟多少钱</td>\n",
       " <td>NaN</td>\n",
       " <td>由于各地业务情况不同,建议用户通过当地的电信网是营业厅或者手机营业厅了解,也可以直接到附近的...</td>\n",
       " <td>1</td>\n",
       " </tr>\n",
       " <tr>\n",
       " <th>7644</th>\n",
       " <td>有没有人办过开发区的电信卡</td>\n",
       " <td>NaN</td>\n",
       " <td>您好目前使用电信手机卡的用户非常多,电信手机卡资费更优惠、网络更稳定、网速更快,请放心办理使...</td>\n",
       " <td>1</td>\n",
       " </tr>\n",
       " <tr>\n",
       " <th>76835</th>\n",
       " <td>请问67###18这个电话号码是哪里的</td>\n",
       " <td>NaN</td>\n",
       " <td>查吧</td>\n",
       " <td>0</td>\n",
       " </tr>\n",
       " <tr>\n",
       " <th>76752</th>\n",
       " <td>电信,铁通,移动,广电。那个网速好呢?</td>\n",
       " <td>NaN</td>\n",
       " <td>办理宽带推荐您办理电信宽带使用。由于中*电信的服务器、网络架设等较完善,且每年都在不断完善和...</td>\n",
       " <td>1</td>\n",
       " </tr>\n",
       " <tr>\n",
       " <th>94290</th>\n",
       " <td>三星s8+好用不</td>\n",
       " <td>NaN</td>\n",
       " <td>S8+的主要特征:1.全视曲面屏:超窄边框、沉浸感视效、双曲面侧屏的显示屏,为您带来更纯粹的...</td>\n",
       " <td>1</td>\n",
       " </tr>\n",
       " <tr>\n",
       " <th>79345</th>\n",
       " <td>一加手机5玩王者会卡吗?</td>\n",
       " <td>NaN</td>\n",
       " <td>不会卡,我也推荐你买一加5,它运行内存有8G,玩游戏的时候就能感受到性能有多好,手机不卡,丢...</td>\n",
       " <td>1</td>\n",
       " </tr>\n",
       " </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       " title question \\\n",
       "129754 红米no##4x NaN \n",
       "15843 为什么不能同时用两个电信卡 NaN \n",
       "23985 电信181、177、133哪个号段好? NaN \n",
       "72065 华*荣耀7x和魅蓝note6哪个好 NaN \n",
       "11843 p8青春版电信版多少钱 NaN \n",
       "3280 华为di####00叫什么 华为di####00叫什么 \n",
       "143200 电信版酷派9190L双卡双通可以用移动网络吗 NaN \n",
       "120692 苹果微信载图怎么载图 苹果微信载图怎么载图 \n",
       "109786 天翼网关的wifi被我关了又没有邦定客户端怎么办想再连wifi该怎么办 NaN \n",
       "29030 v*v*x21是不是全网通 v*v*x21是不是全网通 \n",
       "72603 电信网上营业厅手机卡办理步骤 NaN \n",
       "103229 花呗可以充话费吗 NaN \n",
       "91507 荣耀8好还是三星noT4好 NaN \n",
       "143504 ios10.2.1能降级吗ios10.2.1怎么降级 NaN \n",
       "21999 电信校园网宽带超一分钟多少钱 NaN \n",
       "7644 有没有人办过开发区的电信卡 NaN \n",
       "76835 请问67###18这个电话号码是哪里的 NaN \n",
       "76752 电信,铁通,移动,广电。那个网速好呢? NaN \n",
       "94290 三星s8+好用不 NaN \n",
       "79345 一加手机5玩王者会卡吗? NaN \n",
       "\n",
       " reply is_best \n",
       "129754 可以, 0 \n",
       "15843 您好不可以的,目前推出的手机都是不能同时支持两张电信手机卡的,即使是全网通手机也只能在其中的... 1 \n",
       "23985 133的 0 \n",
       "72065 荣耀畅玩7X很不错,性价比很高,以下是手机的配置:1、外观方面:荣耀畅玩7X采用5.93英寸... 1 \n",
       "11843 您好,这款手机价格参考如下 1 \n",
       "3280 DI####00是华为畅享6S全网通版。华为畅享6S性价比高,是一款很不错的手机。电信新出流... 1 \n",
       "143200 您好电信版双卡双待手机只能使用电信手机卡上网,卡槽2的移动或联通手机卡只能支持2G网络,一般... 1 \n",
       "120692 您说的应该是截图吧。您可以直接通过苹果手机截图组合按键进行截图操作。直接同时安装电源键和ho... 1 \n",
       "109786 您好电信光纤猫的无线网络一般需要破解才能使用的,但破解可能会到帐宽带不稳定或不能正常上网,建... 1 \n",
       "29030 vi###21系列是有vi###21A全网通版本与vi###21移动全网通版本的;此两款机型... 0 \n",
       "72603 中*电信目前是支持网上办理手机号的,下面分享下网上营业厅办理号卡的步骤:1、首先打开浏览器,... 1 \n",
       "103229 您好,是可以的,目前花呗进行充值话费,每个月只能使用花呗一次,最高不超过500元,如果您已经... 0 \n",
       "91507 如果我选择三星,华为去论坛发个意见都很尴尬。 0 \n",
       "143504 IOS设备一旦升级IOS系统就无法降级了,因为:1、IOS采用推荐升级、强制保持最新的升级策... 1 \n",
       "21999 由于各地业务情况不同,建议用户通过当地的电信网是营业厅或者手机营业厅了解,也可以直接到附近的... 1 \n",
       "7644 您好目前使用电信手机卡的用户非常多,电信手机卡资费更优惠、网络更稳定、网速更快,请放心办理使... 1 \n",
       "76835 查吧 0 \n",
       "76752 办理宽带推荐您办理电信宽带使用。由于中*电信的服务器、网络架设等较完善,且每年都在不断完善和... 1 \n",
       "94290 S8+的主要特征:1.全视曲面屏:超窄边框、沉浸感视效、双曲面侧屏的显示屏,为您带来更纯粹的... 1 \n",
       "79345 不会卡,我也推荐你买一加5,它运行内存有8G,玩游戏的时候就能感受到性能有多好,手机不卡,丢... 1 "
      ]
     },
     "execution_count": 4,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "pd_all.sample(n=20)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.7.0"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
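The notebook stops after sampling 20 rows, while the dataset notes suggest an FAQ question-answering experiment. Below is a minimal sketch of turning this table into FAQ pairs. It assumes the column names from the notebook's field table and that `is_best == 1` marks the best reply, which matches the 0/1 values in the sample output. Several choices here are illustrative assumptions, not part of the dataset's own tooling: the helper name `build_faq_pairs`, the use of `title` as the FAQ question (because `question` may be empty), and the one-answer-per-title rule.

```python
import pandas as pd


def build_faq_pairs(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: build (query, answer) FAQ pairs from best replies.

    Assumes the columns title, question, reply, is_best from the field table
    and that is_best == 1 marks the best answer. The title is used as the FAQ
    question because the question column may be empty (NaN).
    """
    # Keep only replies flagged as the best answer.
    best = df[df["is_best"] == 1]
    # Drop rows whose title or reply is missing; neither can form a usable pair.
    best = best.dropna(subset=["title", "reply"])
    # One answer per title: keep the first best reply if a title repeats.
    best = best.drop_duplicates(subset=["title"], keep="first")
    pairs = best[["title", "reply"]].rename(columns={"title": "query", "reply": "answer"})
    return pairs.reset_index(drop=True)


if __name__ == "__main__":
    # Tiny hand-made frame with the same columns (not real dataset rows).
    demo = pd.DataFrame(
        {
            "title": ["q1", "q1", "q2", "q3"],
            "question": [None, None, "q2 details", None],
            "reply": ["a1 best", "a1 other", "a2 best", "a3 not best"],
            "is_best": [1, 0, 1, 0],
        }
    )
    print(build_faq_pairs(demo))
```

Once `pd_all` is loaded as in the notebook, `build_faq_pairs(pd_all)` should yield at most one candidate answer per title. A retrieval or matching model could then compare user queries against the `query` column.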