{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# About ez_douban\n",
"0. **Download:** [Baidu Netdisk](https://pan.baidu.com/s/1DkN1LmdSMzm_jCBKhbPbig)\n",
"1. **Overview:** 57,424 movies (33,258 with a title, 24,166 without), 28,718 users, and 2,828,585 ratings\n",
"2. **Suggested experiments:** recommender systems\n",
"3. **Source:** [Douban Movie](https://movie.douban.com/)\n",
"4. **Original datasets:** [Douban-1 and Douban-2](https://sites.google.com/site/erhengzhong/datasets), collected by Dr. Erheng Zhong for papers published at KDD'12, TKDD'14, and SDM'12\n",
"5. **Processing:**\n",
"   1. Removed the unused `status` field and invalid ratings from Douban-1, and converted the data into a format compatible with [MovieLens](https://grouplens.org/datasets/movielens/)\n",
"   2. Extracted movie information and link information from Douban-2, and joined them with the Douban-1 ratings\n",
"   3. Anonymized the data to protect user privacy"
]
},
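{
"cell_type": "markdown",
"metadata": {},
"source": [
"The processing steps above can be sketched with pandas as follows. This is a minimal illustration on made-up rows, not the author's actual pipeline. Several details are assumptions: the raw column names (`user`, `movie`, `rating`, `status`, `timestamp`), using 0 to stand for an invalid rating, and using `pd.factorize` to replace raw IDs with consecutive integers starting from 0."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import pandas as pd\n",
"\n",
"# Hypothetical raw rows; the column names are assumptions, not the Douban-1 schema\n",
"raw = pd.DataFrame({\n",
"    'user': ['u_9001', 'u_9001', 'u_4242', 'u_7777'],\n",
"    'movie': ['m_55', 'm_12', 'm_55', 'm_12'],\n",
"    'rating': [5, 0, 3, 4],  # 0 stands in for an invalid rating\n",
"    'status': [1, 1, 2, 1],  # the unused field\n",
"    'timestamp': [1263084471, 1259054160, 1255344370, 1124800342],\n",
"})\n",
"\n",
"# Step 1: drop the unused field and keep only ratings between 1 and 5\n",
"clean = raw.drop(columns='status')\n",
"clean = clean[clean.rating.between(1, 5)].copy()\n",
"\n",
"# Step 3: replace raw IDs with consecutive integers starting from 0.\n",
"# user_index / movie_index map the new IDs back to the raw ones;\n",
"# for anonymization they would be kept private, not published.\n",
"clean['userId'], user_index = pd.factorize(clean.user)\n",
"clean['movieId'], movie_index = pd.factorize(clean.movie)\n",
"\n",
"# Keep MovieLens-style columns only\n",
"ratings_demo = clean[['userId', 'movieId', 'rating', 'timestamp']].reset_index(drop=True)\n",
"print(ratings_demo)"
]
},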
{
"cell_type": "code",
"execution_count": 11,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"import pandas as pd"
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"# Path to the ez_douban folder. Keep the trailing slash, because the\n",
"# file names below are appended to it with +\n",
"path = 'path/to/ez_douban/'"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# 1. movies.csv"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Load the data"
]
},
{
"cell_type": "code",
"execution_count": 13,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Movies with a title: 33258\n",
"Movies without a title: 24166\n",
"Movies in total: 57424\n"
]
}
],
"source": [
"movies = pd.read_csv(path + 'movies.csv')\n",
"\n",
"print('Movies with a title: %d' % movies[~pd.isnull(movies.title)].shape[0])\n",
"print('Movies without a title: %d' % movies[pd.isnull(movies.title)].shape[0])\n",
"print('Movies in total: %d' % movies.shape[0])"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Fields\n",
"\n",
"| Field | Description |\n",
"| ---- | ---- |\n",
"| movieId | Movie ID (consecutive integers starting from 0) |\n",
"| title | Movie title (missing for some movies) |"
]
},
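{
"cell_type": "markdown",
"metadata": {},
"source": [
"The table says `movieId` runs over consecutive integers starting from 0, and ratings.csv describes `userId` the same way. The hypothetical helper `is_consecutive_from_zero` below is one way to check that claim. It ignores repeated values, since a user appears once per rating. After loading the data it can be applied as `is_consecutive_from_zero(movies.movieId)`, and once ratings.csv is loaded as `is_consecutive_from_zero(ratings.userId)`. Here it runs on toy lists so the cell works on its own."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import pandas as pd\n",
"\n",
"def is_consecutive_from_zero(ids):\n",
"    # Distinct, non-missing values only; repeats are allowed\n",
"    uniq = pd.Series(ids).dropna().drop_duplicates()\n",
"    if len(uniq) == 0 or not pd.api.types.is_integer_dtype(uniq):\n",
"        return False\n",
"    # n distinct integers with min 0 and max n - 1 must be exactly 0..n-1\n",
"    return bool(uniq.min() == 0 and uniq.max() == len(uniq) - 1)\n",
"\n",
"print(is_consecutive_from_zero([0, 1, 2, 3]))  # True\n",
"print(is_consecutive_from_zero([2, 0, 1, 1]))  # True: order and repeats do not matter\n",
"print(is_consecutive_from_zero([1, 2, 3]))     # False: does not start at 0\n",
"print(is_consecutive_from_zero([0, 1, 3]))     # False: 2 is missing"
]
},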
{
"cell_type": "code",
"execution_count": 14,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style>\n",
" .dataframe thead tr:only-child th {\n",
" text-align: right;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: left;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>movieId</th>\n",
" <th>title</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>41807</th>\n",
" <td>41807</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>16521</th>\n",
" <td>16521</td>\n",
" <td>五女拜寿</td>\n",
" </tr>\n",
" <tr>\n",
" <th>10689</th>\n",
" <td>10689</td>\n",
" <td>La pelote de laine</td>\n",
" </tr>\n",
" <tr>\n",
" <th>21653</th>\n",
" <td>21653</td>\n",
" <td>Ma mha 4 khaa khrap</td>\n",
" </tr>\n",
" <tr>\n",
" <th>36630</th>\n",
" <td>36630</td>\n",
" <td>the sky the earth and the rain</td>\n",
" </tr>\n",
" <tr>\n",
" <th>31734</th>\n",
" <td>31734</td>\n",
" <td>Viva María!</td>\n",
" </tr>\n",
" <tr>\n",
" <th>31530</th>\n",
" <td>31530</td>\n",
" <td>远路</td>\n",
" </tr>\n",
" <tr>\n",
" <th>22553</th>\n",
" <td>22553</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>32346</th>\n",
" <td>32346</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>29429</th>\n",
" <td>29429</td>\n",
" <td>The Crazies</td>\n",
" </tr>\n",
" <tr>\n",
" <th>34912</th>\n",
" <td>34912</td>\n",
" <td>Stestí</td>\n",
" </tr>\n",
" <tr>\n",
" <th>10350</th>\n",
" <td>10350</td>\n",
" <td>羊のうた</td>\n",
" </tr>\n",
" <tr>\n",
" <th>31487</th>\n",
" <td>31487</td>\n",
" <td>一触即发</td>\n",
" </tr>\n",
" <tr>\n",
" <th>50688</th>\n",
" <td>50688</td>\n",
" <td>还君明珠</td>\n",
" </tr>\n",
" <tr>\n",
" <th>40769</th>\n",
" <td>40769</td>\n",
" <td>Red Riding Hood</td>\n",
" </tr>\n",
" <tr>\n",
" <th>32748</th>\n",
" <td>32748</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>17204</th>\n",
" <td>17204</td>\n",
" <td>작은아씨들</td>\n",
" </tr>\n",
" <tr>\n",
" <th>55870</th>\n",
" <td>55870</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>42879</th>\n",
" <td>42879</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>26432</th>\n",
" <td>26432</td>\n",
" <td>后门</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" movieId title\n",
"41807 41807 NaN\n",
"16521 16521 五女拜寿\n",
"10689 10689 La pelote de laine\n",
"21653 21653 Ma mha 4 khaa khrap\n",
"36630 36630 the sky the earth and the rain\n",
"31734 31734 Viva María!\n",
"31530 31530 远路\n",
"22553 22553 NaN\n",
"32346 32346 NaN\n",
"29429 29429 The Crazies\n",
"34912 34912 Stestí\n",
"10350 10350 羊のうた\n",
"31487 31487 一触即发\n",
"50688 50688 还君明珠\n",
"40769 40769 Red Riding Hood\n",
"32748 32748 NaN\n",
"17204 17204 작은아씨들\n",
"55870 55870 NaN\n",
"42879 42879 NaN\n",
"26432 26432 后门"
]
},
"execution_count": 14,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"movies.sample(20)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# 2. ratings.csv"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Load the data"
]
},
{
"cell_type": "code",
"execution_count": 15,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Users: 28718\n",
"Ratings: 2828585\n"
]
}
],
"source": [
"ratings = pd.read_csv(path + 'ratings.csv')\n",
"\n",
"print('Users: %d' % ratings.userId.unique().shape[0])\n",
"print('Ratings: %d' % ratings.shape[0])"
]
},
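{
"cell_type": "markdown",
"metadata": {},
"source": [
"For recommender experiments it helps to know how sparse the user-by-movie rating matrix is. Density is the number of ratings divided by the number of possible (user, movie) pairs. With the counts printed above, and counting every movie in movies.csv, that is 2,828,585 / (28,718 × 57,424) = 2,828,585 / 1,649,102,432, or roughly 0.17%. The hypothetical `rating_density` helper below does the arithmetic. With the data loaded, it could equally be called as `rating_density(ratings.shape[0], ratings.userId.nunique(), movies.shape[0])`."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"def rating_density(n_ratings, n_users, n_items):\n",
"    # Share of the n_users x n_items matrix that holds a rating\n",
"    return n_ratings / (n_users * n_items)\n",
"\n",
"# Counts printed by the cells above (all 57,424 movies in movies.csv)\n",
"density = rating_density(2828585, 28718, 57424)\n",
"print('density: %.4f%%' % (100 * density))"
]
},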
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Fields"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"| Field | Description |\n",
"| ---- | ---- |\n",
"| userId | User ID (consecutive integers starting from 0) |\n",
"| movieId | The movieId from movies.csv |\n",
"| rating | Rating, an integer from 1 to 5 |\n",
"| timestamp | Time of the rating, as a Unix timestamp |"
]
},
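{
"cell_type": "markdown",
"metadata": {},
"source": [
"The `timestamp` values look like Unix timestamps in seconds. For instance, 1263084471 falls on 10 January 2010 (UTC). So `pd.to_datetime(..., unit='s')` can turn them into dates. Treating them as UTC seconds is an assumption based on their magnitude. The cell below also checks the documented rating range. It uses three rows copied from the `ratings.sample(20)` output in this notebook so that it runs on its own; on the full data the same calls apply to `ratings`."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import pandas as pd\n",
"\n",
"# Three rows copied from the ratings.sample(20) output in this notebook\n",
"demo = pd.DataFrame({\n",
"    'userId': [4825, 7121, 9449],\n",
"    'movieId': [14852, 140, 116],\n",
"    'rating': [5, 4, 3],\n",
"    'timestamp': [1263084471, 1259054160, 1255344370],\n",
"})\n",
"\n",
"# Ratings should be integers from 1 to 5\n",
"assert pd.api.types.is_integer_dtype(demo.rating)\n",
"assert demo.rating.between(1, 5).all()\n",
"\n",
"# Assumption: timestamp counts seconds since the Unix epoch (UTC)\n",
"demo['datetime'] = pd.to_datetime(demo.timestamp, unit='s')\n",
"print(demo)"
]
},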
{
"cell_type": "code",
"execution_count": 16,
"metadata": {
"scrolled": false
},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style>\n",
" .dataframe thead tr:only-child th {\n",
" text-align: right;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: left;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>userId</th>\n",
" <th>movieId</th>\n",
" <th>rating</th>\n",
" <th>timestamp</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>1234569</th>\n",
" <td>4825</td>\n",
" <td>14852</td>\n",
" <td>5</td>\n",
" <td>1263084471</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1817521</th>\n",
" <td>7121</td>\n",
" <td>140</td>\n",
" <td>4</td>\n",
" <td>1259054160</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2417373</th>\n",
" <td>9449</td>\n",
" <td>116</td>\n",
" <td>3</td>\n",
" <td>1255344370</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1234106</th>\n",
" <td>4822</td>\n",
" <td>685</td>\n",
" <td>5</td>\n",
" <td>1124800342</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2044878</th>\n",
" <td>7996</td>\n",
" <td>22343</td>\n",
" <td>4</td>\n",
" <td>1254639194</td>\n",
" </tr>\n",
" <tr>\n",
" <th>239277</th>\n",
" <td>947</td>\n",
" <td>5730</td>\n",
" <td>5</td>\n",
" <td>1253992436</td>\n",
" </tr>\n",
" <tr>\n",
" <th>305034</th>\n",
" <td>1178</td>\n",
" <td>9839</td>\n",
" <td>5</td>\n",
" <td>1304648204</td>\n",
" </tr>\n",
" <tr>\n",
" <th>121193</th>\n",
" <td>527</td>\n",
" <td>1512</td>\n",
" <td>4</td>\n",
" <td>1125694603</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2563603</th>\n",
" <td>10758</td>\n",
" <td>738</td>\n",
" <td>4</td>\n",
" <td>1301927887</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2034193</th>\n",
" <td>7949</td>\n",
" <td>1671</td>\n",
" <td>5</td>\n",
" <td>1276176595</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1373543</th>\n",
" <td>5369</td>\n",
" <td>893</td>\n",
" <td>3</td>\n",
" <td>1299972980</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1798131</th>\n",
" <td>7027</td>\n",
" <td>4530</td>\n",
" <td>3</td>\n",
" <td>1178099769</td>\n",
" </tr>\n",
" <tr>\n",
" <th>572517</th>\n",
" <td>2243</td>\n",
" <td>9773</td>\n",
" <td>3</td>\n",
" <td>1187275220</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2160230</th>\n",
" <td>8470</td>\n",
" <td>12</td>\n",
" <td>3</td>\n",
" <td>1306330169</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1672554</th>\n",
" <td>6554</td>\n",
" <td>5637</td>\n",
" <td>3</td>\n",
" <td>1168168788</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1504944</th>\n",
" <td>5920</td>\n",
" <td>6659</td>\n",
" <td>3</td>\n",
" <td>1254041654</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2657986</th>\n",
" <td>17116</td>\n",
" <td>738</td>\n",
" <td>4</td>\n",
" <td>1238829652</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2123663</th>\n",
" <td>8319</td>\n",
" <td>1242</td>\n",
" <td>4</td>\n",
" <td>1225941971</td>\n",
" </tr>\n",
" <tr>\n",
" <th>561109</th>\n",
" <td>2206</td>\n",
" <td>4209</td>\n",
" <td>3</td>\n",
" <td>1307884947</td>\n",
" </tr>\n",
" <tr>\n",
" <th>208970</th>\n",
" <td>887</td>\n",
" <td>4723</td>\n",
" <td>3</td>\n",
" <td>1306314265</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" userId movieId rating timestamp\n",
"1234569 4825 14852 5 1263084471\n",
"1817521 7121 140 4 1259054160\n",
"2417373 9449 116 3 1255344370\n",
"1234106 4822 685 5 1124800342\n",
"2044878 7996 22343 4 1254639194\n",
"239277 947 5730 5 1253992436\n",
"305034 1178 9839 5 1304648204\n",
"121193 527 1512 4 1125694603\n",
"2563603 10758 738 4 1301927887\n",
"2034193 7949 1671 5 1276176595\n",
"1373543 5369 893 3 1299972980\n",
"1798131 7027 4530 3 1178099769\n",
"572517 2243 9773 3 1187275220\n",
"2160230 8470 12 3 1306330169\n",
"1672554 6554 5637 3 1168168788\n",
"1504944 5920 6659 3 1254041654\n",
"2657986 17116 738 4 1238829652\n",
"2123663 8319 1242 4 1225941971\n",
"561109 2206 4209 3 1307884947\n",
"208970 887 4723 3 1306314265"
]
},
"execution_count": 16,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"ratings.sample(20)"
]
},
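{
"cell_type": "markdown",
"metadata": {},
"source": [
"Since `ratings.movieId` refers to `movies.movieId`, a left join attaches titles to ratings. On the full data that would be `ratings.merge(movies, on='movieId', how='left')`. The sketch below shows the effect on a few movie rows copied from the `movies.sample(20)` output and three made-up ratings. Many movies have no title, so the joined `title` column contains NaN for them."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"import pandas as pd\n",
"\n",
"# Movie rows copied from the movies.sample(20) output in this notebook\n",
"movies_demo = pd.DataFrame({\n",
"    'movieId': [29429, 40769, 41807],\n",
"    'title': ['The Crazies', 'Red Riding Hood', np.nan],\n",
"})\n",
"\n",
"# Made-up ratings, for illustration only\n",
"ratings_demo = pd.DataFrame({\n",
"    'userId': [0, 0, 1],\n",
"    'movieId': [29429, 41807, 40769],\n",
"    'rating': [4, 3, 5],\n",
"})\n",
"\n",
"# A left join keeps every rating, even when the movie has no title\n",
"joined = ratings_demo.merge(movies_demo, on='movieId', how='left')\n",
"print(joined)\n",
"print('ratings without a title: %d' % joined.title.isnull().sum())"
]
},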
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# 3. links.csv"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Load the data"
]
},
{
"cell_type": "code",
"execution_count": 17,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"links = pd.read_csv(path + 'links.csv')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Fields"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"| Field | Description |\n",
"| ---- | ---- |\n",
"| movieId | The movieId used in movies.csv and ratings.csv |\n",
"| imdbId | IMDb movie ID (missing for some movies, so it is read as a float) |\n",
"| doubanId | Douban movie ID |"
]
},
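{
"cell_type": "markdown",
"metadata": {},
"source": [
"`imdbId` is printed as a float (for example `86992.0`) because missing IDs are stored as NaN. The sketch below converts the IDs back to integers and builds page links with the hypothetical helpers `douban_url` and `imdb_url`. The URL patterns, `https://movie.douban.com/subject/{doubanId}/` and `https://www.imdb.com/title/tt{imdbId zero-padded to 7 digits}/`, are assumptions about how the two sites address movie pages. They are not part of this dataset. The rows are copied from the `links.sample(20)` output."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"import pandas as pd\n",
"\n",
"# Rows copied from the links.sample(20) output in this notebook\n",
"links_demo = pd.DataFrame({\n",
"    'movieId': [50304, 54191, 3418],\n",
"    'imdbId': [np.nan, 86992.0, 87406.0],\n",
"    'doubanId': [3712319, 1294617, 1533608],\n",
"})\n",
"\n",
"def douban_url(douban_id):\n",
"    # Assumed pattern for Douban movie pages\n",
"    return 'https://movie.douban.com/subject/%d/' % douban_id\n",
"\n",
"def imdb_url(imdb_id):\n",
"    # Assumed pattern: 'tt' plus the ID zero-padded to 7 digits; None if missing\n",
"    if pd.isnull(imdb_id):\n",
"        return None\n",
"    return 'https://www.imdb.com/title/tt%07d/' % int(imdb_id)\n",
"\n",
"links_demo['douban_url'] = links_demo.doubanId.map(douban_url)\n",
"links_demo['imdb_url'] = links_demo.imdbId.map(imdb_url)\n",
"print(links_demo[['movieId', 'douban_url', 'imdb_url']])"
]
},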
{
"cell_type": "code",
"execution_count": 18,
"metadata": {
"scrolled": false
},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style>\n",
" .dataframe thead tr:only-child th {\n",
" text-align: right;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: left;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>movieId</th>\n",
" <th>imdbId</th>\n",
" <th>doubanId</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>50304</th>\n",
" <td>50304</td>\n",
" <td>NaN</td>\n",
" <td>3712319</td>\n",
" </tr>\n",
" <tr>\n",
" <th>46231</th>\n",
" <td>46231</td>\n",
" <td>NaN</td>\n",
" <td>3035298</td>\n",
" </tr>\n",
" <tr>\n",
" <th>56597</th>\n",
" <td>56597</td>\n",
" <td>NaN</td>\n",
" <td>2980174</td>\n",
" </tr>\n",
" <tr>\n",
" <th>54191</th>\n",
" <td>54191</td>\n",
" <td>86992.0</td>\n",
" <td>1294617</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3418</th>\n",
" <td>3418</td>\n",
" <td>87406.0</td>\n",
" <td>1533608</td>\n",
" </tr>\n",
" <tr>\n",
" <th>6586</th>\n",
" <td>6586</td>\n",
" <td>NaN</td>\n",
" <td>6383567</td>\n",
" </tr>\n",
" <tr>\n",
" <th>52685</th>\n",
" <td>52685</td>\n",
" <td>376706.0</td>\n",
" <td>1770079</td>\n",
" </tr>\n",
" <tr>\n",
" <th>53372</th>\n",
" <td>53372</td>\n",
" <td>218839.0</td>\n",
" <td>1295836</td>\n",
" </tr>\n",
" <tr>\n",
" <th>27540</th>\n",
" <td>27540</td>\n",
" <td>NaN</td>\n",
" <td>2371674</td>\n",
" </tr>\n",
" <tr>\n",
" <th>34467</th>\n",
" <td>34467</td>\n",
" <td>NaN</td>\n",
" <td>4868728</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2301</th>\n",
" <td>2301</td>\n",
" <td>NaN</td>\n",
" <td>3732699</td>\n",
" </tr>\n",
" <tr>\n",
" <th>16687</th>\n",
" <td>16687</td>\n",
" <td>NaN</td>\n",
" <td>4840386</td>\n",
" </tr>\n",
" <tr>\n",
" <th>36301</th>\n",
" <td>36301</td>\n",
" <td>364457.0</td>\n",
" <td>1764523</td>\n",
" </tr>\n",
" <tr>\n",
" <th>44922</th>\n",
" <td>44922</td>\n",
" <td>452640.0</td>\n",
" <td>1920065</td>\n",
" </tr>\n",
" <tr>\n",
" <th>27815</th>\n",
" <td>27815</td>\n",
" <td>114687.0</td>\n",
" <td>1773480</td>\n",
" </tr>\n",
" <tr>\n",
" <th>25370</th>\n",
" <td>25370</td>\n",
" <td>NaN</td>\n",
" <td>4192036</td>\n",
" </tr>\n",
" <tr>\n",
" <th>36070</th>\n",
" <td>36070</td>\n",
" <td>NaN</td>\n",
" <td>4848096</td>\n",
" </tr>\n",
" <tr>\n",
" <th>40954</th>\n",
" <td>40954</td>\n",
" <td>115906.0</td>\n",
" <td>1302469</td>\n",
" </tr>\n",
" <tr>\n",
" <th>38395</th>\n",
" <td>38395</td>\n",
" <td>436784.0</td>\n",
" <td>1857858</td>\n",
" </tr>\n",
" <tr>\n",
" <th>49680</th>\n",
" <td>49680</td>\n",
" <td>NaN</td>\n",
" <td>4168480</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" movieId imdbId doubanId\n",
"50304 50304 NaN 3712319\n",
"46231 46231 NaN 3035298\n",
"56597 56597 NaN 2980174\n",
"54191 54191 86992.0 1294617\n",
"3418 3418 87406.0 1533608\n",
"6586 6586 NaN 6383567\n",
"52685 52685 376706.0 1770079\n",
"53372 53372 218839.0 1295836\n",
"27540 27540 NaN 2371674\n",
"34467 34467 NaN 4868728\n",
"2301 2301 NaN 3732699\n",
"16687 16687 NaN 4840386\n",
"36301 36301 364457.0 1764523\n",
"44922 44922 452640.0 1920065\n",
"27815 27815 114687.0 1773480\n",
"25370 25370 NaN 4192036\n",
"36070 36070 NaN 4848096\n",
"40954 40954 115906.0 1302469\n",
"38395 38395 436784.0 1857858\n",
"49680 49680 NaN 4168480"
]
},
"execution_count": 18,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"links.sample(20)"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "keras"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.2"
}
},
"nbformat": 4,
"nbformat_minor": 2
}