DeepText for Ascend
- DeepText Description
- Model Architecture
- Dataset
- Features
- Environment Requirements
- Script Description
- Model Description
- Description of Random Situation
- ModelZoo Homepage
DeepText Description
DeepText is a convolutional neural network architecture for text detection in unconstrained natural scenes. The DeepText system is built on the Faster R-CNN framework. The approach was proposed in the paper "DeepText: A new approach for text proposal generation and text detection in natural images", published in 2017.
Paper: Zhuoyao Zhong, Lianwen Jin, Shuangping Huang, South China University of Technology (SCUT), published in ICASSP 2017.
Model architecture
DeepText follows the Faster R-CNN framework: a VGG16 backbone extracts shared convolutional features, a region proposal network (RPN) generates text proposals, and an R-CNN head with ROI Align classifies each proposal and refines its bounding box.
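As a rough orientation, the sketch below shows how these pieces compose in a forward pass. It is illustrative only: the class and argument names are hypothetical placeholders, not the definitions in src/DeepText/deeptext_vgg16.py.

```python
# Illustrative sketch of the DeepText / Faster R-CNN style forward pass.
# All names here are placeholders; the real implementation lives in src/DeepText.
import mindspore.nn as nn

class TextDetectorSketch(nn.Cell):
    def __init__(self, backbone, rpn, roi_align, rcnn):
        super().__init__()
        self.backbone = backbone    # e.g. VGG16 feature extractor (vgg16.py)
        self.rpn = rpn              # region proposal network (rpn.py)
        self.roi_align = roi_align  # pools each proposal to a fixed size (roi_align.py)
        self.rcnn = rcnn            # text/non-text classification + bbox refinement (rcnn.py)

    def construct(self, images):
        features = self.backbone(images)            # shared convolutional features
        proposals = self.rpn(features)              # candidate text regions
        rois = self.roi_align(features, proposals)  # fixed-size features per proposal
        cls_scores, bbox_deltas = self.rcnn(rois)   # scores and box refinements
        return proposals, cls_scores, bbox_deltas
```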
Dataset
Here we use four datasets for training and one dataset for evaluation; the four training splits are totaled just after the list.
- Dataset1: ICDAR 2013: Focused Scene Text
- Train: 142MB, 229 images
- Test: 110MB, 233 images
- Dataset2: ICDAR 2013: Born-Digital Images
- Train: 27.7MB, 410 images
- Dataset3: SCUT-FORU: Flickr OCR Universal Database
- Train: 388MB, 1715 images
- Dataset4: CocoText v2(Subset of MSCOCO2017):
- Train: 13GB, 63686 images
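As a quick check, the four training splits listed above add up to the dataset size quoted in the Training Performance table later in this README:

```python
# Sum of the training splits listed above; matches the "Dataset | 66040 images"
# entry in the Training Performance table below.
train_images = 229 + 410 + 1715 + 63686  # ICDAR2013-Focused, ICDAR2013 Born-Digital, SCUT-FORU, COCO-Text v2
print(train_images)  # 66040
```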
Features
Environment Requirements
- Hardware (Ascend)
  - Prepare the hardware environment with an Ascend processor. If you want to try Ascend, please send the application form to ascend@huawei.com. Once approved, you can get access to the resources.
- Framework
  - MindSpore
- For more information, please check the official MindSpore tutorials and Python API documentation.
Script description
Script and sample code
.
└─deeptext
  ├─README.md
  ├─scripts
  │ ├─run_standalone_train_ascend.sh   # launch standalone training on Ascend (1p)
  │ ├─run_distribute_train_ascend.sh   # launch distributed training on Ascend (8p)
  │ └─run_eval_ascend.sh               # launch evaluation on Ascend
  ├─src
  │ ├─DeepText
  │ │ ├─__init__.py                    # package init file
  │ │ ├─anchor_genrator.py             # anchor generator
  │ │ ├─bbox_assign_sample.py          # proposal layer for stage 1
  │ │ ├─bbox_assign_sample_stage2.py   # proposal layer for stage 2
  │ │ ├─deeptext_vgg16.py              # main network definition
  │ │ ├─proposal_generator.py          # proposal generator
  │ │ ├─rcnn.py                        # R-CNN head
  │ │ ├─roi_align.py                   # ROI Align cell wrapper
  │ │ ├─rpn.py                         # region proposal network
  │ │ └─vgg16.py                       # VGG16 backbone
  │ ├─config.py                        # training configuration
  │ ├─dataset.py                       # data preprocessing
  │ ├─lr_schedule.py                   # learning rate scheduler
  │ ├─network_define.py                # network definition
  │ └─utils.py                         # commonly used utility functions
  ├─eval.py                            # evaluation script
  ├─export.py                          # export checkpoint to ONNX, AIR or MINDIR format
  └─train.py                           # training script
Training process
Usage
- Ascend:
# distributed training example (8p)
sh run_distribute_train_ascend.sh [IMGS_PATH] [ANNOS_PATH] [RANK_TABLE_FILE] [PRETRAINED_PATH] [COCO_TEXT_PARSER_PATH]
# standalone training
sh run_standalone_train_ascend.sh [IMGS_PATH] [ANNOS_PATH] [PRETRAINED_PATH] [COCO_TEXT_PARSER_PATH] [DEVICE_ID]
# evaluation:
sh run_eval_ascend.sh [IMGS_PATH] [ANNOS_PATH] [CHECKPOINT_PATH] [COCO_TEXT_PARSER_PATH] [DEVICE_ID]
Notes: RANK_TABLE_FILE can refer to Link, and the device_ip can be obtained as shown in Link. For large models such as DeepText, it is better to export an external environment variable

export HCCL_CONNECT_TIMEOUT=600

to extend the HCCL connection checking time from the default 120 seconds to 600 seconds; otherwise the connection may time out, since compilation time increases with model size.

The distributed script binds processor cores according to `device_num` and the total number of processor cores. If you do not want this behavior, remove the `taskset` operations in `scripts/run_distribute_train_ascend.sh`.

The `pretrained_path` should be a checkpoint of VGG16 trained on ImageNet2012. The parameter names in the checkpoint dict must match the network exactly, and batch normalization must be enabled when training VGG16; otherwise later steps will fail. For `COCO_TEXT_PARSER_PATH`, coco_text.py can refer to Link.
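Since mismatched parameter names are a common cause of the failure mentioned above, the hedged sketch below (not part of the repo; the function and checkpoint path are hypothetical) shows one way to compare the names in a VGG16 checkpoint against the network before training:

```python
# Hypothetical sanity check: report checkpoint parameter names that the network
# does not contain. `net` is your constructed DeepText/VGG16 network instance.
from mindspore.train.serialization import load_checkpoint

def report_name_mismatches(net, ckpt_path="vgg16_imagenet2012.ckpt"):
    param_dict = load_checkpoint(ckpt_path)             # dict: name -> Parameter
    net_names = {p.name for p in net.get_parameters()}
    missing = [name for name in param_dict if name not in net_names]
    if missing:
        print("In checkpoint but not in network:", missing)
    else:
        print("All checkpoint parameter names match the network.")
```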
Launch
# training example
shell:
Ascend:
# distributed training example (8p)
sh run_distribute_train_ascend.sh [IMGS_PATH] [ANNOS_PATH] [RANK_TABLE_FILE] [PRETRAINED_PATH] [COCO_TEXT_PARSER_PATH]
# standalone training
sh run_standalone_train_ascend.sh [IMGS_PATH] [ANNOS_PATH] [PRETRAINED_PATH] [COCO_TEXT_PARSER_PATH] [DEVICE_ID]
Result
Training results will be stored in the example path. Checkpoints are saved in `ckpt_path` by default, the training log is redirected to `./log`, and the loss values are written to `./loss_0.log` like the following (a small parsing sketch is given after the sample lines).
469 epoch: 1 step: 982 ,rpn_loss: 0.03940, rcnn_loss: 0.48169, rpn_cls_loss: 0.02910, rpn_reg_loss: 0.00344, rcnn_cls_loss: 0.41943, rcnn_reg_loss: 0.06223, total_loss: 0.52109
659 epoch: 2 step: 982 ,rpn_loss: 0.03607, rcnn_loss: 0.32129, rpn_cls_loss: 0.02916, rpn_reg_loss: 0.00230, rcnn_cls_loss: 0.25732, rcnn_reg_loss: 0.06390, total_loss: 0.35736
847 epoch: 3 step: 982 ,rpn_loss: 0.07074, rcnn_loss: 0.40527, rpn_cls_loss: 0.03494, rpn_reg_loss: 0.01193, rcnn_cls_loss: 0.30591, rcnn_reg_loss: 0.09937, total_loss: 0.47601
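If you want to track convergence programmatically, a small helper like the one below (not part of the repo) can pull the total_loss values out of ./loss_0.log, assuming the line format shown above:

```python
# Hypothetical helper: extract total_loss values from ./loss_0.log,
# assuming lines formatted like the samples above.
import re

def read_total_loss(log_path="./loss_0.log"):
    pattern = re.compile(r"total_loss:\s*([0-9.]+)")
    losses = []
    with open(log_path) as f:
        for line in f:
            match = pattern.search(line)
            if match:
                losses.append(float(match.group(1)))
    return losses

# Example: print the most recent total loss.
# print(read_total_loss()[-1])
```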
Eval process
Usage
You can start evaluation using python or shell scripts. The usage of the shell script is as follows:
- Ascend:
sh run_eval_ascend.sh [IMGS_PATH] [ANNOS_PATH] [CHECKPOINT_PATH] [COCO_TEXT_PARSER_PATH] [DEVICE_ID]
Launch
# eval example
shell:
Ascend:
sh run_eval_ascend.sh [IMGS_PATH] [ANNOS_PATH] [CHECKPOINT_PATH] [COCO_TEXT_PARSER_PATH] [DEVICE_ID]
The checkpoint can be produced during the training process.
Result
Evaluation results will be stored in the example path; you can find results like the following in `log`.
========================================
class 1 precision is 88.01%, recall is 82.77%
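The script reports per-class precision and recall. If you also want the F-measure commonly reported for ICDAR-style benchmarks, it can be derived from those two numbers; the helper below is illustrative, not part of eval.py:

```python
# Hypothetical helper: F-measure (harmonic mean of precision and recall).
def f_measure(precision, recall):
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# From the sample output above: f_measure(0.8801, 0.8277) ~= 0.853
```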
Model description
Performance
Training Performance
Parameters | Ascend |
---|---|
Model Version | DeepText |
Resource | Ascend 910; CPU 2.60 GHz, 192 cores; memory 755 GB |
Uploaded Date | 12/26/2020 |
MindSpore Version | 1.1.0 |
Dataset | 66040 images |
Batch_size | 2 |
Training Parameters | src/config.py |
Optimizer | Momentum |
Loss Function | SoftmaxCrossEntropyWithLogits for classification, SmoothL1Loss for bbox regression |
Loss | ~0.008 |
Total time (8p) | 4h |
Scripts | deeptext script |
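The optimizer and classification loss named in the table correspond to standard MindSpore primitives. The sketch below is illustrative only; the hyperparameter values are placeholders, and the real settings are defined in src/config.py:

```python
# Illustrative only: constructing the optimizer and classification loss named above.
# The learning rate, momentum and weight decay here are placeholders, not the
# values from src/config.py.
import mindspore.nn as nn

def build_optimizer(net, learning_rate=0.01, momentum=0.9, weight_decay=1e-4):
    return nn.Momentum(net.trainable_params(), learning_rate=learning_rate,
                       momentum=momentum, weight_decay=weight_decay)

cls_loss = nn.SoftmaxCrossEntropyWithLogits(sparse=True, reduction="mean")
```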
Inference Performance
Parameters | Ascend |
---|---|
Model Version | DeepText |
Resource | Ascend 910; CPU 2.60 GHz, 192 cores; memory 755 GB |
Uploaded Date | 12/26/2020 |
MindSpore Version | 1.1.0 |
Dataset | 229 images |
Batch_size | 2 |
Accuracy | precision=0.8801, recall=0.8277 |
Total time | 1 min |
Model for inference | 3492M (.ckpt file) |
Training performance results
Ascend | train performance |
---|---|
1p | 14 img/s |
8p | 50 img/s |
Description of Random Situation
We set the random seed to 1 in train.py.
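A minimal illustration of how such a seed is fixed in MindSpore (see train.py for the exact call used by this model):

```python
# Fix the global random seed so weight initialization and shuffling are reproducible.
from mindspore.common import set_seed

set_seed(1)
```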
ModelZoo Homepage
Please check the official homepage.