TextRCNN

Contents

  • TextRCNN Description
  • Model Architecture
  • Dataset
  • Environment Requirements
  • Quick Start
  • Script Description
  • Performance
  • ModelZoo Homepage

TextRCNN Description

TextRCNN is a model for text classification proposed by the Chinese Academy of Sciences in 2015. It combines an RNN and a CNN: a bidirectional RNN first captures the contextual semantic and syntactic information of the input text, max pooling then automatically selects the most important features, and a fully connected layer finally performs the classification.

The TextCNN network consists of a convolutional layer and a pooling layer. In RCNN, the feature-extraction role of the convolutional layer is taken over by an RNN, so the overall structure consists of an RNN followed by a pooling layer, hence the name RCNN (recurrent convolutional neural network).

Paper: Siwei Lai, Liheng Xu, Kang Liu, Jun Zhao: Recurrent Convolutional Neural Networks for Text Classification. AAAI 2015: 2267-2273

Model Architecture

Specifically, TextRCNN is composed of three parts: a recurrent structure layer, a max-pooling layer, and a fully connected layer. In the paper, the word-vector size is |e| = 50, the context-vector size is |c| = 50, the hidden-layer size is H = 100, the learning rate is α = 0.01, and |V| is the vocabulary size. The input is a sequence of words and the output is a vector of class scores.
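
A minimal sketch of this structure, assuming the MindSpore nn API; the class and helper names below are illustrative and do not reproduce the repository's src/textrcnn.py:

    import mindspore.nn as nn
    import mindspore.ops as ops

    class TextRCNNSketch(nn.Cell):
        """Recurrent structure layer -> max pooling over time -> fully connected layer."""
        def __init__(self, vocab_size, embed_size=300, hidden_size=100, num_classes=2):
            super().__init__()
            self.embedding = nn.Embedding(vocab_size, embed_size)
            # the bidirectional RNN captures left and right context for every word
            self.rnn = nn.LSTM(embed_size, hidden_size,
                               batch_first=True, bidirectional=True)
            # max pooling over the time axis keeps the strongest feature per channel
            self.max_over_time = ops.ReduceMax(keep_dims=False)
            self.fc = nn.Dense(2 * hidden_size, num_classes)

        def construct(self, x):
            emb = self.embedding(x)              # (batch, seq_len, embed_size)
            # initial RNN states default to zeros in recent MindSpore versions
            out, _ = self.rnn(emb)               # (batch, seq_len, 2 * hidden_size)
            pooled = self.max_over_time(out, 1)  # (batch, 2 * hidden_size)
            return self.fc(pooled)               # (batch, num_classes)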

Dataset

Dataset used: Sentence polarity dataset v1.0

  • Dataset size: 10662 movie comments in 2 classes; 9596 comments for the training set and 1066 comments for the test set (see the split sketch below).
  • Data format: text files. The processed data is in ./data/.
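
The 9596/1066 counts correspond to a 90/10 train/test split of the 10662 comments. A minimal sketch of such a split (the actual logic lives in data_helpers.py and may differ, e.g. in shuffling and file layout):

    import random

    def split_lines(lines, test_ratio=0.1, seed=0):
        """Shuffle and split into (train, test); 10662 lines -> 9596/1066."""
        lines = list(lines)
        random.Random(seed).shuffle(lines)
        n_test = int(len(lines) * test_ratio)  # int(10662 * 0.1) == 1066
        return lines[n_test:], lines[:n_test]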

Environment Requirements

Quick Start

  • Preparing the environment
  # download the pretrained GoogleNews-vectors-negative300.bin and put it into /tmp
  # you can download it from https://code.google.com/archive/p/word2vec/,
  # or from https://pan.baidu.com/s/1NC2ekA_bJ0uSL7BF3SjhIg, code: yk9a

  mkdir -p word2vec
  mv /tmp/GoogleNews-vectors-negative300.bin ./word2vec/
  • Preparing data
  # split the dataset with the following commands
  # (replace dataset_dir with the path to the source data)
  mkdir -p data/test && mkdir -p data/train
  python data_helpers.py --task dataset_split --data_dir dataset_dir

  • Running on Ascend
# run training
DEVICE_ID=7 python train.py
# or use the shell script to train in the background
bash scripts/run_train.sh

# run evaluation
DEVICE_ID=7 python eval.py --ckpt_path {checkpoint path}
# or use the shell script to evaluate in the background
bash scripts/run_eval.sh
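
The pretrained word2vec binary downloaded in the environment step is used to initialize the embedding table. A hedged sketch of one way to build such a table with gensim (data_helpers.py may do this differently; build_embedding_table is a hypothetical helper, not a repository function):

    import numpy as np
    from gensim.models import KeyedVectors

    def build_embedding_table(vocab, bin_path='./word2vec/GoogleNews-vectors-negative300.bin'):
        """Map each word index to its 300-d word2vec vector; random-init OOV words."""
        w2v = KeyedVectors.load_word2vec_format(bin_path, binary=True)
        table = np.random.uniform(-0.25, 0.25, (len(vocab), 300)).astype(np.float32)
        for word, idx in vocab.items():
            if word in w2v:
                table[idx] = w2v[word]
        return table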

Script Description

Script and Sample Code

├── model_zoo
    ├── README.md                     // descriptions of all the models
    ├── textrcnn
        ├── README.md                 // descriptions of TextRCNN
        ├── data_src
           ├── rt-polaritydata        // directory holding the source data
           ├── rt-polaritydata.README.1.0.txt    // readme file of the dataset
        ├── scripts
           ├── run_train.sh           // shell script for training on Ascend
           ├── run_eval.sh            // shell script for evaluation on Ascend
           ├── sample.txt             // example shell commands for running the two scripts above
        ├── src
           ├── dataset.py             // dataset creation
           ├── textrcnn.py            // TextRCNN architecture
           ├── config.py              // parameter configuration
        ├── train.py                  // training script
        ├── export.py                 // export script
        ├── eval.py                   // evaluation script
        ├── data_helpers.py           // dataset split script
        ├── sample.txt                // shell commands to train and evaluate the model without the scripts

Script Parameters

Parameters for both training and evaluation can be set in config.py.

  • config for TextRCNN, Sentence polarity dataset v1.0.

    'num_epochs': 10, # total training epochs
    'lstm_num_epochs': 15, # total training epochs when using lstm
    'batch_size': 64, # training batch size
    'cell': 'gru', # the RNN architecture, one of 'vanilla', 'gru' and 'lstm'
    'ckpt_folder_path': './ckpt', # the path to save the checkpoints
    'preprocess_path': './preprocess', # the directory to save the preprocessed data
    'preprocess': 'false', # whether to preprocess the data
    'data_path': './data/', # the path to store the split data
    'lr': 1e-3, # the training learning rate
    'lstm_lr_init': 2e-3, # initial learning rate when using lstm
    'lstm_lr_end': 5e-4, # final learning rate when using lstm
    'lstm_lr_max': 3e-3, # maximum learning rate when using lstm
    'lstm_lr_warm_up_epochs': 2, # number of warm-up epochs when using lstm
    'lstm_lr_adjust_epochs': 9, # the lr is adjusted during these epochs; afterwards it stays at lstm_lr_end
    'emb_path': './word2vec', # the directory holding the embedding file
    'embed_size': 300, # the dimension of the word embedding
    'save_checkpoint_steps': 149, # interval (in steps) for saving checkpoints
    'keep_checkpoint_max': 10 # maximum number of checkpoints to keep

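The lstm lr parameters describe a warm-up phase followed by a decay. A hedged sketch of one schedule consistent with the comments above, assuming linear warm-up to lstm_lr_max and cosine decay to lstm_lr_end (train.py may implement the exact curve differently):

    import math

    def lstm_lr(epoch, lr_init=2e-3, lr_max=3e-3, lr_end=5e-4,
                warmup_epochs=2, adjust_epochs=9):
        """Per-epoch learning rate: warm up, decay, then hold at lr_end."""
        if epoch < warmup_epochs:
            # linear warm-up from lr_init to lr_max
            return lr_init + (lr_max - lr_init) * epoch / warmup_epochs
        if epoch < adjust_epochs:
            # cosine decay from lr_max down to lr_end
            t = (epoch - warmup_epochs) / (adjust_epochs - warmup_epochs)
            return lr_end + (lr_max - lr_end) * 0.5 * (1.0 + math.cos(math.pi * t))
        # after adjust_epochs the learning rate stays at lr_end
        return lr_end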

Performance

Model        MindSpore + Ascend               TensorFlow + GPU
Resource     Ascend 910                       NV SMX2 V100-32G
Version      1.0.1                            1.4.0
Dataset      Sentence polarity dataset v1.0   Sentence polarity dataset v1.0
batch_size   64                               64
Accuracy     0.78                             0.78
Speed        35 ms/step                       77 ms/step

ModelZoo Homepage

Please check the official homepage.