{ "cells": [ { "cell_type": "markdown", "metadata": { "cell_id": 39 }, "source": [ "### Load the dataset" ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "cell_id": 1 }, "outputs": [], "source": [ "from utils import load_corpus_bert\n", "\n", "TRAIN_PATH = \"./data/weibo2018/train.txt\"\n", "TEST_PATH = \"./data/weibo2018/test.txt\"" ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "cell_id": 3 }, "outputs": [], "source": [ "# Load the training set and the test set separately\n", "train_data = load_corpus_bert(TRAIN_PATH)\n", "test_data = load_corpus_bert(TEST_PATH)" ] }, { "cell_type": "code", "execution_count": 3, "metadata": { "cell_id": 4 }, "outputs": [ { "data": { "text/html": [ "
|  | text | label |
|---|---|---|
| 0 | “书中自有黄金屋,书中自有颜如玉”。沿着岁月的长河跋涉,或是风光旖旎,或是姹紫嫣红,万千... | 1 |
| 1 | 这是英超被黑的最惨的一次[二哈][二哈]十几年来,中国只有孙继海,董方卓,郑智,李铁登陆过英... | 0 |
| 2 | 中国远洋海运集团副总经理俞曾港4月21日在 上表示,中央企业“走出去”是要站在更高的平台参... | 1 |
| 3 | 看《流星花园》其实也还好啦,现在的观念以及时尚眼光都不一样了,或许十几年之后的人看我们的现在... | 1 |
| 4 | 汉武帝的罪己诏的真实性尽管存在着争议,然而“轮台罪己诏”作为中国历史上第一份皇帝自我批评的文... | 1 |