Contents

  • WarpCTC Description
  • Model Architecture
  • Dataset
  • Environment Requirements
  • Quick Start
  • Script Description
  • Script Parameters
  • Dataset Preparation
  • Training Process
  • Evaluation Process
  • Inference Process
  • Model Description
  • Description of Random Situation
  • ModelZoo Homepage

WarpCTC Description

This is an example of training WarpCTC with a self-generated captcha image dataset in MindSpore.

Model Architecture

WarpCTC is a two-layer stacked LSTM followed by a one-layer fully-connected (FC) network. See src/warpctc.py for details.
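
For orientation, here is a minimal sketch of such a backbone in MindSpore. The class name, input sizes and number of classes below are illustrative assumptions, not taken from src/warpctc.py, which remains the reference implementation.

```python
# Minimal sketch of a two-layer stacked LSTM plus FC head, assuming image
# columns are fed as a sequence; see src/warpctc.py for the real model.
import numpy as np
import mindspore as ms
import mindspore.nn as nn

class WarpCTCSketch(nn.Cell):
    def __init__(self, input_size=64, hidden_size=512, num_classes=11, batch_size=64):
        super().__init__()
        self.hidden_size = hidden_size
        self.lstm = nn.LSTM(input_size, hidden_size, num_layers=2)  # two stacked LSTM layers
        self.fc = nn.Dense(hidden_size, num_classes)                # 10 digits + 1 blank label
        # initial hidden/cell states: (num_layers, batch_size, hidden_size)
        self.h0 = ms.Tensor(np.zeros((2, batch_size, hidden_size), np.float32))
        self.c0 = ms.Tensor(np.zeros((2, batch_size, hidden_size), np.float32))

    def construct(self, x):
        # x: (seq_len, batch_size, input_size) -- image columns treated as a sequence
        out, _ = self.lstm(x, (self.h0, self.c0))
        out = out.reshape((-1, self.hidden_size))  # merge time and batch dimensions
        return self.fc(out)                        # per-step logits consumed by CTCLoss

```

The per-step logits produced this way are then fed into the CTC loss defined in src/loss.py.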

Dataset

The dataset is self-generated with a third-party library called captcha, which randomly generates images of digits from 0 to 9. In this network, the number of digits per image varies from 1 to 4.
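
As an illustration of how such an image can be produced with the captcha library (the bundled process_data.py and scripts/run_process_data.sh already do this for you; the file-naming convention below is only an assumption):

```python
# Illustrative captcha generation; the actual dataset is produced by
# process_data.py via scripts/run_process_data.sh.
import random
from captcha.image import ImageCaptcha

generator = ImageCaptcha(width=160, height=64)   # matches captcha_width / captcha_height
length = random.randint(1, 4)                    # 1 to 4 digits per image
digits = "".join(random.choice("0123456789") for _ in range(length))
generator.write(digits, f"{digits}.png")         # hypothetical naming: label encoded in the file name
```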

Environment Requirements

Quick Start

  • Generate dataset.

    Run the script scripts/run_process_data.sh to generate a dataset. By default, the shell script generates 10000 test images and 50000 training images.

     $ cd scripts
     $ sh run_process_data.sh
    
     # after execution, the dataset directory is organized as follows:
     .
     └─warpctc
       └─data
         ├─ train  # training dataset
         └─ test   # evaluation dataset
    
  • After the dataset is prepared, you may start running the training or the evaluation scripts as follows:

    • Running on Ascend
    # distribute training example on Ascend
    $ bash run_distribute_train.sh rank_table.json ../data/train
    
    # evaluation example on Ascend
    $ bash run_eval.sh ../data/test warpctc-30-97.ckpt Ascend
    
    # standalone training example on Ascend
    $ bash run_standalone_train.sh ../data/train Ascend
    
    

    For distributed training, an hccl configuration file in JSON format needs to be created in advance.

    Please follow the instructions in the link below:

    https://gitee.com/mindspore/mindspore/tree/master/model_zoo/utils/hccl_tools.

    • Running on GPU
    # distribute training example on GPU
    $ bash run_distribute_train_for_gpu.sh 8 ../data/train
    
    # standalone training example on GPU
    $ bash run_standalone_train.sh ../data/train GPU
    
    # evaluation example on GPU
    $ bash run_eval.sh ../data/test warpctc-30-97.ckpt GPU
    
    • Running on CPU
    # training example on CPU
    $ bash run_standalone_train.sh ../data/train CPU
    or
    python train.py --train_data_dir=./data/train --device_target=CPU
    
    # evaluation example on CPU
    $ bash run_eval.sh ../data/test warpctc-30-97.ckpt CPU
    or
    python eval.py --test_data_dir=./data/test --checkpoint_path=warpctc-30-97.ckpt --device_target=CPU
    
    • Running on ModelArts
      If you want to run the model on ModelArts, please check the official ModelArts documentation; you can then start training as follows.
      • Train with 8 devices (distributed) on ModelArts

        # (1) Upload the code folder to an S3 bucket.
        # (2) Click "create training task" on the website UI interface.
        # (3) Set the code directory to "/{path}/warpctc" on the website UI interface.
        # (4) Set the startup file to "/{path}/warpctc/train.py" on the website UI interface.
        # (5) Perform a or b.
        #     a. Set the parameters in /{path}/warpctc/default_config.yaml.
        #         1. Set "run_distributed=True"
        #         2. Set "enable_modelarts=True"
        #         3. Set "modelarts_dataset_unzip_name={filename}" if the dataset is uploaded as a zip package.
        #     b. Add the parameters on the website UI interface.
        #         1. Add "run_distributed=True"
        #         2. Add "enable_modelarts=True"
        #         3. Add "modelarts_dataset_unzip_name={filename}" if the dataset is uploaded as a zip package.
        # (6) Upload the dataset or the zip package of the dataset to the S3 bucket.
        # (7) Check the "data storage location" on the website UI interface and set the "Dataset path" (only the data or the zip package should be under this path).
        # (8) Set the "Output file path" and "Job log path" to your own paths on the website UI interface.
        # (9) Create your job.
        
      • Evaluate with a single device on ModelArts

        # (1) Upload the code folder to an S3 bucket.
        # (2) Click "create training task" on the website UI interface.
        # (3) Set the code directory to "/{path}/warpctc" on the website UI interface.
        # (4) Set the startup file to "/{path}/warpctc/eval.py" on the website UI interface.
        # (5) Perform a or b.
        #     a. Set the parameters in the default_config.yaml file under /{path}/warpctc.
        #         1. Set "enable_modelarts=True"
        #         2. Set "checkpoint_path={checkpoint_path}" ({checkpoint_path} is the path of the checkpoint to be evaluated, relative to eval.py; the checkpoint file must be included in the code directory.)
        #         3. Add "modelarts_dataset_unzip_name={filename}" if the dataset is uploaded as a zip package.
        #     b. Add the parameters on the website UI interface.
        #         1. Set "enable_modelarts=True"
        #         2. Set "checkpoint_path={checkpoint_path}" ({checkpoint_path} is the path of the checkpoint to be evaluated, relative to eval.py; the checkpoint file must be included in the code directory.)
        #         3. Add "modelarts_dataset_unzip_name={filename}" if the dataset is uploaded as a zip package.
        # (6) Upload the dataset or the zip package of the dataset to the S3 bucket.
        # (7) Check the "data storage location" on the website UI interface and set the "Dataset path" (only the data or the zip package should be under this path).
        # (8) Set the "Output file path" and "Job log path" to your own paths on the website UI interface.
        # (9) Create your job.
        

Script Description

Script and Sample Code

.
└──warpctc
  ├── README.md                         # descriptions of warpctc
  ├── README_CN.md                      # chinese descriptions of warpctc
  ├── ascend310_infer                   # application for 310 inference
  ├── scripts
    ├── run_distribute_train.sh         # launch distributed training in Ascend(8 pcs)
    ├── run_distribute_train_for_gpu.sh # launch distributed training in GPU
    ├── run_eval.sh                     # launch evaluation
    ├── run_infer_310.sh                # launch Ascend 310 inference
    ├── run_process_data.sh             # launch dataset generation
    └── run_standalone_train.sh         # launch standalone training(1 pcs)
  ├── src
    ├── model_utils
      ├── config.py                     # parse the "*.yaml" parameter configuration file
      ├── device_adapter.py             # select the adapter for local or ModelArts training
      ├── local_adapter.py              # get related environment variables in local training
      └── moxing_adapter.py             # get related environment variables in ModelArts training
    ├── dataset.py                      # data preprocessing
    ├── loss.py                         # ctcloss definition
    ├── lr_generator.py                 # generate learning rate for each step
    ├── metric.py                       # accuracy metric for warpctc network
    ├── warpctc.py                      # warpctc network definition
    └── warpctc_for_train.py            # warpctc network with grad, loss and gradient clip
  ├── default_config.yaml               # parameter configuration
  ├── export.py                         # export the model to MINDIR/AIR for inference
  ├── mindspore_hub_conf.py             # mindspore hub interface
  ├── eval.py                           # eval net
  ├── process_data.py                   # dataset generation script
  ├── postprocess.py                    # post-processing script for Ascend 310 inference
  ├── preprocess.py                     # pre-processing script for Ascend 310 inference
  └── train.py                          # train net

Script Parameters

Training Script Parameters

# distributed training in Ascend
Usage: bash run_distribute_train.sh [RANK_TABLE_FILE] [TRAIN_DATA_DIR]

# distributed training in GPU
Usage: bash run_distribute_train_for_gpu.sh [RANK_SIZE] [TRAIN_DATA_DIR]

# standalone training
Usage: bash run_standalone_train.sh [TRAIN_DATA_DIR] [DEVICE_TARGET]

Parameters Configuration

Parameters for both training and evaluation can be set in default_config.yaml; a minimal sketch of loading this file follows the listing below.

max_captcha_digits: 4                       # max number of digits in each captcha image
captcha_width: 160                          # width of captcha images
captcha_height: 64                          # height of captcha images
batch_size: 64                              # batch size of input tensor
epoch_size: 30                              # only valid for training; it is always 1 for inference
hidden_size: 512                            # hidden size in LSTM layers
learning_rate: 0.01                         # initial learning rate
momentum: 0.9                               # momentum of SGD optimizer
save_checkpoint: True                       # whether to save checkpoints or not
save_checkpoint_steps: 97                   # the step interval between two checkpoints; by default, the last checkpoint is saved after the last step
keep_checkpoint_max: 30                     # only keep the last keep_checkpoint_max checkpoints
save_checkpoint_path: "./checkpoint"        # path to save checkpoint
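
The sketch below shows one way to read these values programmatically. It is only an illustration: the project itself uses src/model_utils/config.py, which additionally merges command-line overrides into the configuration.

```python
# Minimal sketch of reading default_config.yaml; src/model_utils/config.py is
# the parser actually used by train.py and eval.py.
from types import SimpleNamespace
import yaml

with open("default_config.yaml", "r") as f:
    cfg = SimpleNamespace(**yaml.safe_load(f))

print(cfg.batch_size, cfg.learning_rate, cfg.epoch_size)  # 64 0.01 30 with the defaults above
```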

Dataset Preparation

  • You may refer to "Generate dataset" in Quick Start to automatically generate a dataset, or you may choose to generate a captcha dataset by yourself.

Training Process

Training

  • Run run_standalone_train.sh for non-distributed training of the WarpCTC model, either on Ascend or on GPU.

    bash run_standalone_train.sh [TRAIN_DATA_DIR] [DEVICE_TARGET]

Distributed Training

  • Run run_distribute_train.sh for distributed training of the WarpCTC model on Ascend.

    bash run_distribute_train.sh [RANK_TABLE_FILE] [TRAIN_DATA_DIR]

  • Run run_distribute_train_for_gpu.sh for distributed training of the WarpCTC model on GPU.

    bash run_distribute_train_for_gpu.sh [RANK_SIZE] [TRAIN_DATA_DIR]

Evaluation Process

Evaluation

  • Run run_eval.sh for evaluation.

    bash run_eval.sh [TEST_DATA_DIR] [CHECKPOINT_PATH] [DEVICE_TARGET]

Inference Process

Export MindIR

python export.py --ckpt_file [CKPT_PATH] --file_name [FILE_NAME] --file_format [FILE_FORMAT]

The ckpt_file parameter is required, and file_format must be chosen from ["AIR", "MINDIR"].
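
The outline below sketches roughly what the export step involves; the network class name and constructor arguments are placeholders, and export.py in this directory is the authoritative script.

```python
# Rough sketch of exporting a trained checkpoint to MINDIR; WarpCTCNet and its
# constructor arguments are placeholders for the class defined in src/warpctc.py.
import numpy as np
import mindspore as ms
from src.warpctc import WarpCTCNet  # placeholder: use the class actually defined in src/warpctc.py

net = WarpCTCNet(hidden_size=512, batch_size=1)
ms.load_param_into_net(net, ms.load_checkpoint("warpctc-30-97.ckpt"))
# dummy input with the assumed captcha image shape (batch, channels, height, width)
dummy = ms.Tensor(np.zeros((1, 3, 64, 160), np.float32))
ms.export(net, dummy, file_name="warpctc", file_format="MINDIR")
```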

Infer on Ascend310

Before performing inference, the MindIR file must be exported with the export.py script. We only provide an example of inference using the MINDIR model. Currently, batch_size can only be set to 1. Inference uses the MindIR model together with .bin files, where each .bin file is a preprocessed image in binary format; a sketch of this preprocessing step follows the parameter list below.

# Ascend310 inference
bash run_infer_310.sh [MINDIR_PATH] [DATA_PATH] [DEVICE_ID]

  • DATA_PATH is mandatory; it is the path of the data in .bin format.
  • DEVICE_ID is optional; the default value is 0.
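
As a rough illustration of the .bin preparation step: preprocess.py in this directory is the script actually invoked by run_infer_310.sh, and the resize and normalization details below are assumptions.

```python
# Illustrative conversion of captcha PNGs into per-sample .bin files for
# Ascend 310 inference; see preprocess.py for the real preprocessing.
import os
import numpy as np
from PIL import Image

src_dir, dst_dir = "./data/test", "./preprocess_output"
os.makedirs(dst_dir, exist_ok=True)
for name in sorted(os.listdir(src_dir)):
    if not name.endswith(".png"):
        continue
    img = Image.open(os.path.join(src_dir, name)).convert("RGB").resize((160, 64))
    arr = np.asarray(img, dtype=np.float32).transpose(2, 0, 1) / 255.0  # HWC -> CHW, scale to [0, 1]
    arr.tofile(os.path.join(dst_dir, name.replace(".png", ".bin")))     # one .bin file per image
```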

Result

The inference result is saved in the current path; you can find results like the following in the acc.log file.

'Accuracy': 0.952

Model Description

Performance

Training Performance

| Parameters | Ascend 910 | GPU |
| ---------- | ---------- | --- |
| Model Version | v1.0 | v1.0 |
| Resource | Ascend 910; CPU 2.60GHz, 192 cores; Memory 755G; OS Euler2.8 | GPU (Tesla V100 SXM2); CPU 2.1GHz, 24 cores; Memory 128G |
| Uploaded Date | 07/01/2020 (month/day/year) | 08/01/2020 (month/day/year) |
| MindSpore Version | 0.5.0-alpha | 0.6.0-alpha |
| Dataset | Captcha | Captcha |
| Training Parameters | epoch=30, steps per epoch=98, batch_size=64 | epoch=30, steps per epoch=98, batch_size=64 |
| Optimizer | SGD | SGD |
| Loss Function | CTCLoss | CTCLoss |
| Outputs | probability | probability |
| Loss | 0.0000157 | 0.0000246 |
| Speed | 980 ms/step (8 pcs) | 150 ms/step (8 pcs) |
| Total time | 30 mins | 5 mins |
| Parameters (M) | 2.75 | 2.75 |
| Checkpoint for Fine tuning | 20.3M (.ckpt file) | 20.3M (.ckpt file) |
| Scripts | Link | Link |

Evaluation Performance

| Parameters | WarpCTC |
| ---------- | ------- |
| Model Version | V1.0 |
| Resource | Ascend 910; OS Euler2.8 |
| Uploaded Date | 08/01/2020 (month/day/year) |
| MindSpore Version | 0.6.0-alpha |
| Dataset | Captcha |
| batch_size | 64 |
| Outputs | ACC |
| Accuracy | 99.0% |
| Model for inference | 20.3M (.ckpt file) |

Inference Performance

| Parameters | Ascend |
| ---------- | ------ |
| Model Version | WarpCTC |
| Resource | Ascend 310; CentOS 3.10 |
| Uploaded Date | 24/05/2021 (day/month/year) |
| MindSpore Version | 1.2.0 |
| Dataset | Captcha |
| batch_size | 1 |
| Outputs | Accuracy |
| Accuracy | 0.952 |
| Model for inference | 40.6M (.ckpt file) |

Description of Random Situation

In dataset.py, we set the seed inside the create_dataset function. We also use a random seed in train.py for weight initialization.
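
If full reproducibility is needed, the seeds can also be pinned explicitly at the top of a script, along the lines of the sketch below (the seed value 1 is arbitrary).

```python
# One way to pin all sources of randomness; the repository's own seeds live in
# dataset.py (create_dataset) and train.py, as noted above.
import random
import numpy as np
import mindspore as ms

ms.set_seed(1)      # MindSpore global seed (weight initialization, dropout, ...)
np.random.seed(1)   # numpy-based preprocessing and augmentation
random.seed(1)      # Python-level randomness such as generated captcha digits
```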

ModelZoo Homepage

Please check the official homepage.