
Contents

  • CNNCTC Description
  • Model Architecture
  • Dataset
  • Features
  • Environment Requirements
  • Quick Start
  • Script Description
  • Training Process
  • Evaluation Process
  • Inference Process
  • Model Description
  • How to use
  • ModelZoo Homepage

CNNCTC Description

This paper makes three major contributions to scene text recognition (STR). First, it examines the inconsistencies among training and evaluation datasets and the performance gap that results from them. Second, it introduces a unified four-stage STR framework into which most existing STR models fit. This framework allows an extensive evaluation of previously proposed STR modules and the discovery of previously unexplored module combinations. Third, it analyzes the module-wise contributions to performance in terms of accuracy, speed, and memory demand under one consistent set of training and evaluation datasets. These analyses remove the obstacles that have hindered fair comparison and clarify the performance gains of existing modules. Paper: J. Baek, G. Kim, J. Lee, S. Park, D. Han, S. Yun, S. J. Oh, and H. Lee, "What is wrong with scene text recognition model comparisons? Dataset and model analysis," ArXiv, vol. abs/1904.01906, 2019.

Model Architecture

This is an example of training a CNN+CTC model for text recognition on the MJSynth and SynthText datasets with MindSpore.
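In essence, a convolutional backbone turns the input image into a sequence of per-column features (roughly FINAL_FEATURE_WIDTH time steps of HIDDEN_SIZE features each), and a linear classifier over NUM_CLASS labels produces the logits consumed by CTCLoss. The following is a minimal, illustrative MindSpore sketch of that shape flow; the actual network in src/cnn_ctc.py is much deeper, so treat the layer choices here as placeholders.

    import mindspore.nn as nn
    import mindspore.ops as ops

    class CNNCTCSketch(nn.Cell):
        """Illustrative shape flow only; the real network is in src/cnn_ctc.py."""
        def __init__(self, num_class, hidden_size):
            super().__init__()
            self.backbone = nn.SequentialCell([
                nn.Conv2d(3, 64, 3), nn.ReLU(), nn.MaxPool2d(2, 2),
                nn.Conv2d(64, hidden_size, 3), nn.ReLU(), nn.MaxPool2d(2, 2),
            ])
            self.pool_h = ops.ReduceMean(keep_dims=False)  # collapse height axis
            self.fc = nn.Dense(hidden_size, num_class)     # per-step classifier

        def construct(self, x):
            feat = self.backbone(x)         # (N, C, H', W')
            feat = self.pool_h(feat, 2)     # (N, C, W'): one column per time step
            feat = feat.transpose(0, 2, 1)  # (N, W', C), W' ~ FINAL_FEATURE_WIDTH
            n, w, c = feat.shape
            logits = self.fc(feat.reshape((n * w, c)))
            return logits.reshape((n, w, -1))  # (N, W', NUM_CLASS) fed to CTCLoss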

Dataset

Note that you can run the scripts with the datasets mentioned in the original paper or with datasets widely used in this domain. The following sections describe how to run the scripts using the datasets below.

The MJSynth and SynthText datasets are used for model training. The IIIT 5K-word dataset is used for evaluation.

  • step 1:

All the datasets have been preprocessed and stored in .lmdb format and can be downloaded HERE. (A quick sanity check for the downloaded .lmdb files is sketched after step 4 below.)

  • step 2:

Uncompress the downloaded file and rename the MJSynth dataset to MJ, the SynthText dataset to ST, and the IIIT dataset to IIIT.

  • step 3:

Move the three datasets mentioned above into a cnnctc_data folder; the structure should be as follows:

|--- CNNCTC/
    |--- cnnctc_data/
        |--- ST/
            data.mdb
            lock.mdb
        |--- MJ/
            data.mdb
            lock.mdb
        |--- IIIT/
            data.mdb
            lock.mdb

    ......
  • step 4:

Preprocess the dataset by running:

python src/preprocess_dataset.py

This takes around 75 minutes.
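If you want to verify the downloaded archives before this fairly long preprocessing run, a short check like the one below can help. It is a hedged sketch: it assumes the conventional num-samples key used by common STR .lmdb datasets, and that you run it from the folder containing cnnctc_data.

    # Quick sanity check for the downloaded .lmdb folders. The 'num-samples'
    # key is an assumption based on common STR lmdb dataset conventions.
    import lmdb

    for name in ("MJ", "ST", "IIIT"):
        env = lmdb.open("cnnctc_data/" + name, readonly=True, lock=False)
        with env.begin() as txn:
            n = txn.get(b"num-samples")
            print(name, int(n) if n is not None else "no 'num-samples' key")
        env.close()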

Features

Mixed Precision

Mixed precision training accelerates the training of deep neural networks by using both single-precision and half-precision data formats, while maintaining the accuracy achieved with single-precision training. It speeds up computation, reduces memory usage, and enables larger models or batch sizes to be trained on specific hardware. For FP16 operators, if the input data type is FP32, the MindSpore backend automatically handles it with reduced precision. Users can check the reduced-precision operators by enabling the INFO log and searching for "reduce precision".
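In this repository, mixed precision is switched on through the Model wrapper. A minimal sketch, where net, loss, and opt stand for a network, loss function, and optimizer defined as in the examples later in this README:

    from mindspore import Model

    # amp_level="O2" casts the network to FP16; keep_batchnorm_fp32=False lets
    # batch normalization run in FP16 as well (see Continue Training below).
    model = Model(net, loss_fn=loss, optimizer=opt,
                  amp_level="O2", keep_batchnorm_fp32=False)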

Environment Requirements

Quick Start

  • Install dependencies:
pip install lmdb
pip install Pillow
pip install tqdm
pip install six
  • Standalone Training:
bash scripts/run_standalone_train_ascend.sh $PRETRAINED_CKPT
  • Distributed Training:
bash scripts/run_distribute_train_ascend.sh $RANK_TABLE_FILE $PRETRAINED_CKPT
  • Evaluation:
bash scripts/run_eval_ascend.sh $TRAINED_CKPT

Script Description

Script and Sample Code

The entire code structure is as follows:

|--- CNNCTC/
    |---README.md    // descriptions about cnnctc
    |---train.py    // train scripts
    |---eval.py    // eval scripts
    |---export.py    // export scripts
    |---postprocess.py    // postprocess scripts
    |---ascend310_infer    // application for 310 inference
    |---scripts
        |---run_infer_310.sh    // shell script for infer on ascend310
        |---run_standalone_train_ascend.sh    // shell script for standalone on ascend
        |---run_distribute_train_ascend.sh    // shell script for distributed on ascend
        |---run_eval_ascend.sh    // shell script for eval on ascend
    |---src
        |---__init__.py    // init file
        |---cnn_ctc.py    // cnn_ctc network
        |---config.py    // total config
        |---callback.py    // loss callback file
        |---dataset.py    // process dataset
        |---util.py    // routine operation
        |---preprocess_dataset.py    // preprocess dataset

Script Parameters

Parameters for both training and evaluation can be set in config.py.

Arguments:

  • --CHARACTER: Character labels.
  • --NUM_CLASS: The number of classes, including all character labels and the blank label required by CTCLoss.
  • --HIDDEN_SIZE: Model hidden size.
  • --FINAL_FEATURE_WIDTH: The number of features.
  • --IMG_H: The height of the input image.
  • --IMG_W: The width of the input image.
  • --TRAIN_DATASET_PATH: The path to the training dataset.
  • --TRAIN_DATASET_INDEX_PATH: The path to the training dataset index file, which determines the sample order.
  • --TRAIN_BATCH_SIZE: Training batch size. The batch size and index file must together ensure that the input data has a fixed shape.
  • --TRAIN_DATASET_SIZE: Training dataset size.
  • --TEST_DATASET_PATH: The path to the test dataset.
  • --TEST_BATCH_SIZE: Test batch size.
  • --TRAIN_EPOCHS: Total training epochs.
  • --CKPT_PATH: The path to a model checkpoint file; can be used to resume training or for evaluation.
  • --SAVE_PATH: The path where model checkpoint files are saved.
  • --LR: Learning rate for standalone training.
  • --LR_PARA: Learning rate for distributed training.
  • --MOMENTUM: Momentum.
  • --LOSS_SCALE: Loss scale to prevent gradient underflow.
  • --SAVE_CKPT_PER_N_STEP: Save a model checkpoint file every N steps.
  • --KEEP_CKPT_MAX_NUM: The maximum number of saved model checkpoint files.
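For example, to change a few of these values before launching training, you can edit config.py directly or override the attributes in a driver script. The snippet below is a hedged sketch: the container name Config_CNNCTC is an assumption, so check src/config.py for the actual structure.

    # Hedged example: overriding a few parameters before training.
    from src.config import Config_CNNCTC  # name assumed; check src/config.py

    cfg = Config_CNNCTC()
    cfg.TRAIN_BATCH_SIZE = 192        # must stay consistent with the index file
    cfg.TRAIN_EPOCHS = 3
    cfg.CKPT_PATH = "./cnnctc.ckpt"   # resume training from this checkpoint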

Training Process

Training

  • Standalone Training:
bash scripts/run_standalone_train_ascend.sh $PRETRAINED_CKPT

Results and checkpoints are written to the ./train folder. The log can be found in ./train/log, and loss values are recorded in ./train/loss.log.

$PRETRAINED_CKPT is the path to a model checkpoint and is optional. If none is given, the model will be trained from scratch.

  • Distributed Training:
bash scripts/run_distribute_train_ascend.sh $RANK_TABLE_FILE $PRETRAINED_CKPT

Results and checkpoints are written to the ./train_parallel_{i} folder for each device i. Logs can be found in ./train_parallel_{i}/log_{i}.log, and loss values are recorded in ./train_parallel_{i}/loss.log.

$RANK_TABLE_FILE is needed when you run a distributed task on Ascend. $PRETRAINED_CKPT is the path to a model checkpoint and is optional. If none is given, the model will be trained from scratch.

Training Result

Training results are stored in the example path, in folders whose names begin with "train" or "train_parallel". You can find checkpoint files together with results like the following in loss.log.

# distribute training result(8p)
epoch: 1 step: 1 , loss is 76.25, average time per step is 0.235177839748392712
epoch: 1 step: 2 , loss is 73.46875, average time per step is 0.25798572540283203
epoch: 1 step: 3 , loss is 69.46875, average time per step is 0.229678678512573
epoch: 1 step: 4 , loss is 64.3125, average time per step is 0.23512671788533527
epoch: 1 step: 5 , loss is 58.375, average time per step is 0.23149147033691406
epoch: 1 step: 6 , loss is 52.7265625, average time per step is 0.2292975425720215
...
epoch: 1 step: 8689 , loss is 9.706798802612482, average time per step is 0.2184656601312549
epoch: 1 step: 8690 , loss is 9.70612545289855, average time per step is 0.2184725407765116
epoch: 1 step: 8691 , loss is 9.70695776049204, average time per step is 0.21847309686135555
epoch: 1 step: 8692 , loss is 9.707279624277456, average time per step is 0.21847339290613375
epoch: 1 step: 8693 , loss is 9.70763437950938, average time per step is 0.2184720295013031
epoch: 1 step: 8694 , loss is 9.707695425072046, average time per step is 0.21847410284595573
epoch: 1 step: 8695 , loss is 9.708408273381295, average time per step is 0.21847338271072345
epoch: 1 step: 8696 , loss is 9.708703753591953, average time per step is 0.2184726025560777
epoch: 1 step: 8697 , loss is 9.709536406025824, average time per step is 0.21847212061114694
epoch: 1 step: 8698 , loss is 9.708542263610315, average time per step is 0.2184715309307257
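Because each line follows the fixed pattern shown above, the loss curve is easy to extract programmatically. A minimal sketch (the path train/loss.log matches the standalone layout described earlier):

    # Extract (step, loss) pairs from loss.log, whose lines look like
    # "epoch: 1 step: 8698 , loss is 9.708..., average time per step is ...".
    import re

    pattern = re.compile(r"step:\s*(\d+)\s*, loss is ([\d.]+)")
    with open("train/loss.log") as f:
        points = [(int(m.group(1)), float(m.group(2)))
                  for m in map(pattern.search, f) if m]
    print(points[-1])  # last recorded (step, loss)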

Evaluation Process

Evaluation

  • Evaluation:
bash scripts/run_eval_ascend.sh $TRAINED_CKPT

The model will be evaluated on the IIIT dataset; sample results and the overall accuracy will be printed.

Inference process

Export MindIR

python export.py --ckpt_file [CKPT_PATH] --file_name [FILE_NAME] --file_format [EXPORT_FORMAT]

The ckpt_file parameter is required. The file_name parameter is the name of the exported file. EXPORT_FORMAT should be one of ["AIR", "MINDIR"].
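For reference, export.py essentially restores the checkpoint into the network and calls mindspore.export with a fixed-shape dummy input. The sketch below illustrates this; the config import and class names are assumptions, so check the actual script for the exact details.

    # Hedged sketch of what export.py does: restore the checkpoint into the
    # network, then export a fixed-shape graph.
    import numpy as np
    from mindspore import Tensor, export, load_checkpoint, load_param_into_net
    from src.cnn_ctc import CNNCTC            # class name as used in this README
    from src.config import Config_CNNCTC      # name assumed; check src/config.py

    cfg = Config_CNNCTC()
    net = CNNCTC(cfg.NUM_CLASS, cfg.HIDDEN_SIZE, cfg.FINAL_FEATURE_WIDTH)
    load_param_into_net(net, load_checkpoint("cnnctc.ckpt"))
    dummy = Tensor(np.zeros([cfg.TEST_BATCH_SIZE, 3, cfg.IMG_H, cfg.IMG_W], np.float32))
    export(net, dummy, file_name="cnnctc", file_format="MINDIR")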

Infer on Ascend310

Before performing inference, the MINDIR file must be exported by the export.py script. We only provide an example of inference using the MINDIR model. Currently, batch_size can only be set to 1; modify the parameter TEST_BATCH_SIZE in config.py to 1 before exporting the model.

# Ascend310 inference
bash run_infer_310.sh [MINDIR_PATH] [DATA_PATH] [LABEL_PATH] [DVPP] [DEVICE_ID]
  • DVPP is mandatory and must be chosen from ["DVPP", "CPU"]; it is case-insensitive. CNNCTC only supports CPU mode.
  • DEVICE_ID is optional; the default value is 0.

Result

The inference result is saved in the current path; you can find a result like the following in the acc.log file.

'Accuracy': 0.8546

Model Description

Performance

Training Performance

Parameters CNNCTC
Model Version V1
Resource Ascend 910; CPU 2.60GHz, 192cores; Memory 755G; OS Euler2.8
Uploaded Date 09/28/2020 (month/day/year)
MindSpore Version 1.0.0
Dataset MJSynth, SynthText
Training Parameters epoch=3, batch_size=192
Optimizer RMSProp
Loss Function CTCLoss
Speed 1pc: 250 ms/step; 8pcs: 260 ms/step
Total time 1pc: 15 hours; 8pcs: 1.92 hours
Parameters (M) 177
Scripts https://gitee.com/mindspore/mindspore/tree/master/model_zoo/official/cv/cnnctc

Evaluation Performance

Parameters CNNCTC
Model Version V1
Resource Ascend 910; OS Euler2.8
Uploaded Date 09/28/2020 (month/day/year)
MindSpore Version 1.0.0
Dataset IIIT5K
batch_size 192
outputs Accuracy
Accuracy 85%
Model for inference 675M (.ckpt file)

Inference Performance

Parameters Ascend
Model Version CNNCTC
Resource Ascend 310; CentOS 3.10
Uploaded Date 05/19/2021 (month/day/year)
MindSpore Version 1.2.0
Dataset IIIT5K
batch_size 1
outputs Accuracy
Accuracy 0.8546
Model for inference 675M (.ckpt file)

How to use

Inference

If you need to use the trained model to perform inference on multiple hardware platforms, such as GPU, Ascend 910, or Ascend 310, you can refer to this Link. The following is a simple example:

  • Running on Ascend

    # Set context
    context.set_context(mode=context.GRAPH_MODE, device_target=cfg.device_target)
    context.set_context(device_id=cfg.device_id)
    
    # Load unseen dataset for inference
    dataset = dataset.create_dataset(cfg.data_path, 1, False)
    
    # Define model
    net = CNNCTC(cfg.NUM_CLASS, cfg.HIDDEN_SIZE, cfg.FINAL_FEATURE_WIDTH)
    opt = Momentum(filter(lambda x: x.requires_grad, net.get_parameters()), 0.01,
                   cfg.momentum, weight_decay=cfg.weight_decay)
    loss = P.CTCLoss(preprocess_collapse_repeated=False,
                     ctc_merge_repeated=True,
                     ignore_longer_outputs_than_inputs=False)
    model = Model(net, loss_fn=loss, optimizer=opt, metrics={'acc'})
    
    # Load pre-trained model
    param_dict = load_checkpoint(cfg.checkpoint_path)
    load_param_into_net(net, param_dict)
    net.set_train(False)
    
    # Make predictions on the unseen dataset
    acc = model.eval(dataset)
    print("accuracy: ", acc)
    

Continue Training on the Pretrained Model

  • Running on Ascend

    # Load dataset
    dataset = create_dataset(cfg.data_path, 1)
    batch_num = dataset.get_dataset_size()
    
    # Define model
    net = CNNCTC(cfg.NUM_CLASS, cfg.HIDDEN_SIZE, cfg.FINAL_FEATURE_WIDTH)
    # Continue training if pre_trained is set to True
    if cfg.pre_trained:
        param_dict = load_checkpoint(cfg.checkpoint_path)
        load_param_into_net(net, param_dict)
    lr = lr_steps(0, lr_max=cfg.lr_init, total_epochs=cfg.epoch_size,
                  steps_per_epoch=batch_num)
    opt = Momentum(filter(lambda x: x.requires_grad, net.get_parameters()),
                   Tensor(lr), cfg.momentum, weight_decay=cfg.weight_decay)
    loss = P.CTCLoss(preprocess_collapse_repeated=False,
                     ctc_merge_repeated=True,
                     ignore_longer_outputs_than_inputs=False)
    model = Model(net, loss_fn=loss, optimizer=opt, metrics={'acc'},
                  amp_level="O2", keep_batchnorm_fp32=False,                   loss_scale_manager=None)
    
    # Set callbacks
    config_ck = CheckpointConfig(save_checkpoint_steps=batch_num * 5,
                                 keep_checkpoint_max=cfg.keep_checkpoint_max)
    time_cb = TimeMonitor(data_size=batch_num)
    ckpoint_cb = ModelCheckpoint(prefix="train_cnnctc", directory="./",
                                 config=config_ck)
    loss_cb = LossMonitor()
    
    # Start training
    model.train(cfg.epoch_size, dataset, callbacks=[time_cb, ckpoint_cb, loss_cb])
    print("train success")
    
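Note that amp_level="O2" casts the network to FP16, and keep_batchnorm_fp32=False lets batch normalization run in FP16 as well, matching the Mixed Precision feature described above. If you observe gradient underflow, pass a loss-scale manager instead of None (see the --LOSS_SCALE parameter).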

ModelZoo Homepage

Please check the official homepage.