Chinese text classification (single-label and multi-label versions) using PyTorch & BERT
Implemented in PyTorch on top of BERT, with CPU and multi-GPU versions.
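The single-label and multi-label settings differ mainly in how the classifier's output logits become labels. The sketch below is a hedged illustration, not code from this repository's model.py, and the function names are hypothetical. Single-label classification typically applies a softmax and keeps the highest-scoring class (trained with cross-entropy), while multi-label classification typically applies an independent sigmoid per label and keeps every label above a threshold (trained with binary cross-entropy). The 0.5 threshold is an assumption.

```python
import math
from typing import List


def softmax(logits: List[float]) -> List[float]:
    """Numerically stable softmax over one example's logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def decode_single_label(logits: List[float]) -> int:
    """Single-label (hypothetical helper): exactly one class, the argmax of the softmax."""
    probs = softmax(logits)
    return max(range(len(probs)), key=lambda i: probs[i])


def decode_multi_label(logits: List[float], threshold: float = 0.5) -> List[int]:
    """Multi-label (hypothetical helper): each label is an independent yes/no decision."""
    return [i for i, x in enumerate(logits) if sigmoid(x) >= threshold]


if __name__ == "__main__":
    logits = [2.0, -1.0, 0.5]
    print(decode_single_label(logits))  # 0
    print(decode_multi_label(logits))   # [0, 2]
```

With the same logits, the single-label decoder always returns one class, while the multi-label decoder may return zero, one or several.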
Requirements: pytorch==1.4.0, python==3.6, pytorch_pretrained_bert==0.6.2 (versions must match exactly!); plus argparse, pandas, glob, sklearn and numpy (see requirements.txt).
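Because the versions must match exactly, a small pre-flight check can catch a mismatch before a confusing runtime error. This is a hypothetical helper, not part of the repository. The distribution names `torch` and `pytorch-pretrained-bert` are assumptions about how pip registers these packages, and the comparison strips local build tags such as `+cu100` so that CUDA-specific builds of torch 1.4.0 still count as 1.4.0.

```python
import sys
from typing import Dict, List, Optional

# Pins copied from the requirement list above; distribution names are assumptions.
PINNED_PACKAGES = {"torch": "1.4.0", "pytorch-pretrained-bert": "0.6.2"}
PINNED_PYTHON = (3, 6)


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        from importlib import metadata  # Python 3.8+
    except ImportError:
        metadata = None
    if metadata is not None:
        try:
            return metadata.version(dist_name)
        except metadata.PackageNotFoundError:
            return None
    import pkg_resources  # fallback for Python 3.6/3.7 (ships with setuptools)
    try:
        return pkg_resources.get_distribution(dist_name).version
    except pkg_resources.DistributionNotFound:
        return None


def find_mismatches(installed: Dict[str, Optional[str]],
                    pins: Dict[str, str]) -> List[str]:
    """Compare installed versions with exact pins, ignoring local tags like '+cu100'."""
    problems = []
    for name, wanted in pins.items():
        got = installed.get(name)
        if got is None:
            problems.append("{}: not installed (need {})".format(name, wanted))
        elif got.split("+")[0] != wanted:
            problems.append("{}: found {}, need {}".format(name, got, wanted))
    return problems


if __name__ == "__main__":
    if sys.version_info[:2] != PINNED_PYTHON:
        print("warning: Python {}.{} detected, README pins 3.6".format(*sys.version_info[:2]))
    found = {name: installed_version(name) for name in PINNED_PACKAGES}
    for line in find_mismatches(found, PINNED_PACKAGES):
        print("mismatch:", line)
```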
IDE: PyCharm
data folder:
Hotel_comment folder: Chinese hotel reviews, binary classification task;
cnews folder: Chinese news articles, multi-class classification task.
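Both datasets reduce to (label, text) pairs plus a mapping from label names to integer ids; the binary task yields two ids and the news task yields one per category. The sketch below assumes, hypothetically, that each line holds `label<TAB>text`; the actual file layout is whatever data.py expects, and the function names here are illustrative only.

```python
from typing import Dict, Iterable, List, Tuple


def read_examples(lines: Iterable[str]) -> List[Tuple[str, str]]:
    """Parse assumed 'label<TAB>text' lines into (label, text) pairs, skipping blanks."""
    examples = []
    for line in lines:
        line = line.rstrip("\n")
        if not line.strip():
            continue
        label, text = line.split("\t", 1)
        examples.append((label, text))
    return examples


def build_label_map(examples: List[Tuple[str, str]]) -> Dict[str, int]:
    """Give each distinct label a stable integer id (sorted for reproducibility)."""
    labels = sorted({label for label, _ in examples})
    return {label: i for i, label in enumerate(labels)}


if __name__ == "__main__":
    sample = ["体育\t比赛在今晚举行\n", "财经\t股市今日上涨\n", "\n", "体育\t球队获胜\n"]
    examples = read_examples(sample)
    label_map = build_label_map(examples)
    print(len(examples), len(label_map))  # 3 2
```

Sorting the label set keeps the ids identical across runs, which matters when a saved model is later reloaded for prediction.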
mytask_classifier.py: entry point;
config.py: definitions of the parameters required at runtime; their values can be assigned through the run.sh script;
data.py: processes the original dataset into a structured dataset;
model.py: model code for text classification with BERT, containing both the single-label and the multi-label implementations;
preprocess.py: input preprocessing for BERT;
util.py: utility functions;
run.sh: script file runnable under Linux that assigns the parameters.
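BERT input preprocessing for a single text usually means truncating to `max_seq_length - 2` tokens, wrapping them in `[CLS]` and `[SEP]`, mapping tokens to ids, and padding `input_ids`, `input_mask` and `segment_ids` to a fixed length. The following is a minimal sketch of that standard scheme, not the repository's preprocess.py. In practice the tokens and ids would come from a BERT vocabulary, for example via `BertTokenizer.tokenize` and `BertTokenizer.convert_tokens_to_ids` in pytorch_pretrained_bert; here a made-up `TOY_VOCAB` stands in so the sketch runs without the library, and padding with id 0 assumes `[PAD]` is id 0.

```python
from typing import Dict, List, NamedTuple

# Hypothetical toy vocabulary; a real BERT vocab.txt is far larger.
TOY_VOCAB = {"[PAD]": 0, "[UNK]": 1, "[CLS]": 2, "[SEP]": 3, "酒": 4, "店": 5, "好": 6}


class InputFeatures(NamedTuple):
    input_ids: List[int]
    input_mask: List[int]
    segment_ids: List[int]


def convert_single_text(tokens: List[str], vocab: Dict[str, int],
                        max_seq_length: int) -> InputFeatures:
    """Build padded BERT inputs for one sentence (hypothetical helper)."""
    if max_seq_length < 2:
        raise ValueError("max_seq_length must leave room for [CLS] and [SEP]")
    # Reserve two positions for the special tokens, then wrap the text.
    tokens = ["[CLS]"] + tokens[: max_seq_length - 2] + ["[SEP]"]
    unk = vocab["[UNK]"]
    input_ids = [vocab.get(t, unk) for t in tokens]  # unknown tokens map to [UNK]
    input_mask = [1] * len(input_ids)                # 1 = real token, 0 = padding
    segment_ids = [0] * len(input_ids)               # single sentence: all segment A
    padding = max_seq_length - len(input_ids)
    input_ids += [vocab["[PAD]"]] * padding
    input_mask += [0] * padding
    segment_ids += [0] * padding
    return InputFeatures(input_ids, input_mask, segment_ids)


if __name__ == "__main__":
    # Chinese BERT vocabularies are largely character-level, so split into characters.
    features = convert_single_text(list("酒店很好"), TOY_VOCAB, max_seq_length=8)
    print(features.input_ids)   # [2, 4, 5, 1, 6, 3, 0, 0]
    print(features.input_mask)  # [1, 1, 1, 1, 1, 1, 0, 0]
```

The attention mask lets the model ignore padded positions, so every example in a batch can share the same `max_seq_length`.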