DeepText for Ascend

DeepText Description

DeepText is a convolutional neural network architecture for text detection in non-specific scenarios. The DeepText system is based on the elegant framework of Faster R-CNN. This idea was proposed in the paper "DeepText: A new approach for text proposal generation and text detection in natural images.", published in 2017.

Paper Zhuoyao Zhong, Lianwen Jin, Shuangping Huang, South China University of Technology (SCUT), Published in ICASSP 2017.

Model architecture

The overall network architecture of DeepText is shown below:

Link

Dataset

Here we use 4 datasets for training and 1 dataset for evaluation.

  • Dataset1: ICDAR 2013: Focused Scene Text
    • Train: 142MB, 229 images
    • Test: 110MB, 233 images
  • Dataset2: ICDAR 2013: Born-Digital Images
    • Train: 27.7MB, 410 images
  • Dataset3: SCUT-FORU: Flickr OCR Universal Database
    • Train: 388MB, 1715 images
  • Dataset4: CocoText v2 (subset of MSCOCO2017):
    • Train: 13GB, 63686 images

Features

Environment Requirements

Script description

Script and sample code

.
└─deeptext
  ├─README.md
  ├─scripts
    ├─run_standalone_train_ascend.sh    # launch standalone training on the Ascend platform (1p)
    ├─run_distribute_train_ascend.sh    # launch distributed training on the Ascend platform (8p)
    └─run_eval_ascend.sh                # launch evaluation on the Ascend platform
  ├─src
    ├─DeepText
      ├─__init__.py                     # package init file
      ├─anchor_genrator.py              # anchor generator
      ├─bbox_assign_sample.py           # proposal layer for stage 1
      ├─bbox_assign_sample_stage2.py    # proposal layer for stage 2
      ├─deeptext_vgg16.py               # main network definition
      ├─proposal_generator.py           # proposal generator
      ├─rcnn.py                         # rcnn
      ├─roi_align.py                    # roi_align cell wrapper
      ├─rpn.py                          # region-proposal network
      └─vgg16.py                        # backbone
    ├─config.py                       # training configuration
    ├─dataset.py                      # data preprocessing
    ├─lr_schedule.py                  # learning rate scheduler
    ├─network_define.py               # network definition
    └─utils.py                        # commonly used utility functions
  ├─eval.py                           # eval net
  ├─export.py                         # export checkpoint; supports .onnx, .air and .mindir conversion
  └─train.py                          # train net

Training process

Usage

  • Ascend:
# distribute training example(8p)
sh run_distribute_train_ascend.sh [IMGS_PATH] [ANNOS_PATH] [RANK_TABLE_FILE] [PRETRAINED_PATH] [COCO_TEXT_PARSER_PATH]
# standalone training
sh run_standalone_train_ascend.sh [IMGS_PATH] [ANNOS_PATH] [PRETRAINED_PATH] [COCO_TEXT_PARSER_PATH] [DEVICE_ID]
# evaluation:
sh run_eval_ascend.sh [IMGS_PATH] [ANNOS_PATH] [CHECKPOINT_PATH] [COCO_TEXT_PARSER_PATH] [DEVICE_ID]

Notes: For RANK_TABLE_FILE, refer to Link, and the device_ip can be obtained as described in Link. For large models, it is better to export the environment variable export HCCL_CONNECT_TIMEOUT=600 to extend the HCCL connection checking time from the default 120 seconds to 600 seconds. Otherwise, the connection may time out, because compilation time increases with model size.

The scripts bind processor cores according to device_num and the total number of processor cores. If you do not want this, remove the taskset operations in scripts/run_distribute_train_ascend.sh.

PRETRAINED_PATH should be a checkpoint of VGG16 trained on ImageNet2012. The weight names in the checkpoint dict must match those expected by the network exactly, and batch normalization must have been enabled when training VGG16; otherwise later steps will fail. For COCO_TEXT_PARSER_PATH (coco_text.py), refer to Link.
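
The snippet below is a minimal sketch (not part of this repository) for inspecting such a checkpoint before training; the checkpoint file name is a placeholder.

```python
# Minimal sketch (not part of this repository): list the parameter names stored in a
# VGG16 checkpoint so they can be compared against the backbone defined in
# src/DeepText/vgg16.py. The checkpoint file name below is a placeholder.
from mindspore.train.serialization import load_checkpoint

param_dict = load_checkpoint("vgg16_imagenet2012_bn.ckpt")  # placeholder path
for name in sorted(param_dict):
    print(name)
# Every backbone weight name must match exactly, and batch-norm parameters
# (e.g. gamma, beta, moving_mean, moving_variance) must be present, because the
# pretrained VGG16 is expected to have been trained with batch normalization enabled.
```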

Launch

# training example
  shell:
    Ascend:
      # distribute training example(8p)
      sh run_distribute_train_ascend.sh [IMGS_PATH] [ANNOS_PATH] [RANK_TABLE_FILE] [PRETRAINED_PATH] [COCO_TEXT_PARSER_PATH]
      # standalone training
      sh run_standalone_train_ascend.sh [IMGS_PATH] [ANNOS_PATH] [PRETRAINED_PATH] [COCO_TEXT_PARSER_PATH] [DEVICE_ID]

Result

Training results will be stored in the example path. Checkpoints will be stored at ckpt_path by default, the training log will be redirected to ./log, and the loss will be redirected to ./loss_0.log as shown below.

469 epoch: 1 step: 982 ,rpn_loss: 0.03940, rcnn_loss: 0.48169, rpn_cls_loss: 0.02910, rpn_reg_loss: 0.00344, rcnn_cls_loss: 0.41943, rcnn_reg_loss: 0.06223, total_loss: 0.52109
659 epoch: 2 step: 982 ,rpn_loss: 0.03607, rcnn_loss: 0.32129, rpn_cls_loss: 0.02916, rpn_reg_loss: 0.00230, rcnn_cls_loss: 0.25732, rcnn_reg_loss: 0.06390, total_loss: 0.35736
847 epoch: 3 step: 982 ,rpn_loss: 0.07074, rcnn_loss: 0.40527, rpn_cls_loss: 0.03494, rpn_reg_loss: 0.01193, rcnn_cls_loss: 0.30591, rcnn_reg_loss: 0.09937, total_loss: 0.47601
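
If you want to track the training curve, the per-step losses can be extracted from loss_0.log; the following is only an illustrative sketch based on the log format shown above.

```python
# Illustrative sketch (not part of the repository): parse ./loss_0.log, whose lines
# follow the format shown above, and collect the total_loss value per epoch/step.
import re

pattern = re.compile(r"epoch:\s*(\d+)\s+step:\s*(\d+).*total_loss:\s*([0-9.]+)")
records = []
with open("loss_0.log") as log_file:
    for line in log_file:
        match = pattern.search(line)
        if match:
            epoch, step = int(match.group(1)), int(match.group(2))
            total_loss = float(match.group(3))
            records.append((epoch, step, total_loss))

for epoch, step, total_loss in records:
    print(f"epoch {epoch}, step {step}: total_loss {total_loss:.5f}")
```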

Eval process

Usage

You can start evaluation using Python or shell scripts. The usage of shell scripts is as follows:

  • Ascend:
  sh run_eval_ascend.sh [IMGS_PATH] [ANNOS_PATH] [CHECKPOINT_PATH] [COCO_TEXT_PARSER_PATH] [DEVICE_ID]

Launch

# eval example
  shell:
      Ascend:
            sh run_eval_ascend.sh [IMGS_PATH] [ANNOS_PATH] [CHECKPOINT_PATH] [COCO_TEXT_PARSER_PATH] [DEVICE_ID]

The checkpoint can be produced during the training process.

Result

Evaluation results will be stored in the example path. You can find results like the following in the log.

========================================

class 1 precision is 88.01%, recall is 82.77%
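
For reference, an F1-score can be derived from the reported precision and recall; the snippet below is just a small arithmetic sketch using the numbers above.

```python
# Sketch: derive an F1-score from the precision and recall reported above.
precision, recall = 0.8801, 0.8277
f1 = 2 * precision * recall / (precision + recall)
print(f"F1 = {f1:.4f}")  # ~0.8531
```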

Model description

Performance

Training Performance

| Parameters          | Ascend                                                                              |
| ------------------- | ----------------------------------------------------------------------------------- |
| Model Version       | DeepText                                                                            |
| Resource            | Ascend 910; CPU 2.60 GHz, 192 cores; memory 755 GB                                  |
| Uploaded Date       | 12/26/2020                                                                          |
| MindSpore Version   | 1.1.0                                                                               |
| Dataset             | 66040 images                                                                        |
| Batch_size          | 2                                                                                   |
| Training Parameters | src/config.py                                                                       |
| Optimizer           | Momentum                                                                            |
| Loss Function       | SoftmaxCrossEntropyWithLogits for classification, SmoothL2Loss for bbox regression |
| Loss                | ~0.008                                                                              |
| Total time (8p)     | 4h                                                                                  |
| Scripts             | deeptext script                                                                     |

Inference Performance

| Parameters          | Ascend                                              |
| ------------------- | --------------------------------------------------- |
| Model Version       | DeepText                                            |
| Resource            | Ascend 910; CPU 2.60 GHz, 192 cores; memory 755 GB  |
| Uploaded Date       | 12/26/2020                                          |
| MindSpore Version   | 1.1.0                                               |
| Dataset             | 229 images                                          |
| Batch_size          | 2                                                   |
| Accuracy            | precision = 0.8801, recall = 0.8277                 |
| Total time          | 1 min                                               |
| Model for inference | 3492M (.ckpt file)                                  |

Training performance results

| Ascend | train performance |
| ------ | ----------------- |
| 1p     | 14 img/s          |
| 8p     | 50 img/s          |

Description of Random Situation

We set the seed to 1 in train.py.
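
A minimal illustration of how this fixed seed is applied in a MindSpore training script (the set_seed call below reflects the common MindSpore helper; the exact code in train.py may differ):

```python
# Minimal sketch: fix the global random seed, as train.py does with seed = 1,
# so that weight initialization and dataset shuffling are reproducible.
from mindspore.common import set_seed

set_seed(1)
```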

ModelZoo Homepage

Please check the official homepage.