diff --git a/model_zoo/official/cv/maskrcnn/README.md b/model_zoo/official/cv/maskrcnn/README.md index 5e70c41f706..df4e699d2b5 100644 --- a/model_zoo/official/cv/maskrcnn/README.md +++ b/model_zoo/official/cv/maskrcnn/README.md @@ -1,104 +1,334 @@ -# MaskRcnn Example +# Contents + +- [MaskRCNN Description](#maskrcnn-description) +- [Model Architecture](#model-architecture) +- [Dataset](#dataset) +- [Environment Requirements](#environment-requirements) +- [Quick Start](#quick-start) +- [Script Description](#script-description) + - [Script and Sample Code](#script-and-sample-code) + - [Script Parameters](#script-parameters) + - [Training Script Parameters](#training-script-parameters) + - [Parameters Configuration](#parameters-configuration) + - [Training Process](#training-process) + - [Training](#training) + - [Distributed Training](#distributed-training) + - [Training Result](#training-result) + - [Evaluation Process](#evaluation-process) + - [Evaluation](#evaluation) + - [Evaluation Result](#evaluation-result) +- [Model Description](#model-description) + - [Performance](#performance) + - [Training Performance](#training-performance) + - [Evaluation Performance](#evaluation-performance) +- [Description of Random Situation](#description-of-random-situation) +- [ModelZoo Homepage](#modelzoo-homepage) + +# [MaskRCNN Description](#contents) +MaskRCNN is a conceptually simple, flexible, and general framework for object instance segmentation. The approach efficiently detects objects in an image while simultaneously generating a high-quality segmentation mask for each instance. The method, called Mask R-CNN, extends Faster R-CNN by adding a branch for predicting an object mask in +parallel with the existing branch for bounding box recognition. Mask R-CNN is simple to train and adds only a small overhead to Faster R-CNN, running at 5 fps. Moreover, Mask R-CNN is easy to generalize to other tasks, e.g., allowing to estimate human poses in the same framework. 
+It shows top results in all three tracks of the COCO suite of challenges, including instance segmentation, bounding-box object detection, and person keypoint detection. Without bells and whistles, Mask R-CNN outperforms all existing, single-model entries on every task, including the COCO 2016 challenge winners. + +# [Model Architecture](#contents) +MaskRCNN is a two-stage target detection network. It extends FasterRCNN by adding a branch for predicting an object mask in parallel with the existing branch for bounding box recognition. This network uses a region proposal network (RPN), which shares the convolution features of the whole image with the detection network, so that computing region proposals is almost cost-free. The whole network further combines the RPN and the mask branch into a single network by sharing the convolution features. + +[Paper](http://cn.arxiv.org/pdf/1703.06870v3): Kaiming He, Georgia Gkioxari, Piotr Dollár and Ross Girshick. "Mask R-CNN" + +# [Dataset](#contents) +- [COCO2017](https://cocodataset.org/) is a popular dataset with bounding-box and pixel-level stuff annotations. These annotations can be used for scene understanding tasks like semantic segmentation, object detection and image captioning. There are 118K/5K images for train/val. + +- Dataset size: 19G + - Train: 18G, 118000 images + - Val: 1G, 5000 images + - Annotations: 241M, instances, captions, person_keypoints, etc. -## Description +- Data format: image and json files + - Note: Data will be processed in dataset.py + +# [Environment Requirements](#contents) + +- Hardware (Ascend) + - Prepare hardware environment with Ascend processor. If you want to try Ascend, please send the [application form](https://obs-9be7.obs.cn-east-2.myhuaweicloud.com/file/other/Ascend%20Model%20Zoo%E4%BD%93%E9%AA%8C%E8%B5%84%E6%BA%90%E7%94%B3%E8%AF%B7%E8%A1%A8.docx) to ascend@huawei.com. Once approved, you can get the resources. 
+- Framework + - [MindSpore](https://gitee.com/mindspore/mindspore) +- For more information, please check the resources below: + - [MindSpore tutorials](https://www.mindspore.cn/tutorial/en/master/index.html) + - [MindSpore API](https://www.mindspore.cn/api/en/master/index.html) -MaskRcnn is a two-stage target detection network,This network uses a region proposal network (RPN), which can share the convolution features of the whole image with the detection network, so that the calculation of region proposal is almost cost free. The whole network further combines RPN and MaskRcnn into a network by sharing the convolution features. +- Third-party libraries -## Requirements +```bash +pip install Cython +pip install pycocotools +pip install mmcv==0.2.14 +``` + + +# [Quick Start](#contents) -- Install [MindSpore](https://www.mindspore.cn/install/en). +1. Download the dataset COCO2017. -- Download the dataset COCO2017. +2. Change the COCO_ROOT and other settings you need in `config.py`. The directory structure should look as follows: + ``` + . + └─cocodataset + ├─annotations + ├─instances_train2017.json + └─instances_val2017.json + ├─val2017 + └─train2017 + ``` + If you use your own dataset to train the network, **set the dataset to `other` when running the script.** + Create a TXT file that stores the dataset information, organized as follows: + ``` + train2017/0000001.jpg 0,259,401,459,7 35,28,324,201,2 0,30,59,80,2 + ``` + Each row is an image annotation split by spaces. The first column is the relative path of the image, followed by columns containing box and class information in the format [xmin,ymin,xmax,ymax,class]. Each image is read from the path formed by joining `IMAGE_DIR` (dataset directory) with the relative path listed in `ANNO_PATH` (the TXT file path); both can be set in `config.py`. -- We use coco2017 as training dataset in this example by default, and you can also use your own datasets. +3. Execute the train script. 
+ After dataset preparation, you can start training as follows: + ``` + # distributed training + sh run_distribute_train.sh [RANK_TABLE_FILE] [PRETRAINED_CKPT] + + # standalone training + sh run_standalone_train.sh [PRETRAINED_CKPT] + ``` + Note: + 1. To speed up data preprocessing, MindSpore provides a data format named MindRecord, so the first step before training is to generate MindRecord files from the COCO2017 dataset. Converting the raw COCO2017 dataset to MindRecord format may take about 4 hours. + 2. For distributed training, a [hccl configuration file](https://gitee.com/mindspore/mindspore/tree/master/model_zoo/utils/hccl_tools) in JSON format needs to be created in advance. + 3. PRETRAINED_CKPT is a resnet50 checkpoint trained on ImageNet2012. - 1. If coco dataset is used. **Select dataset to coco when run script.** - Install Cython and pycocotool, and you can also install mmcv to process data. +4. Execute the eval script. + After training, you can start evaluation as follows: + + ```bash + # Evaluation + sh run_eval.sh [VALIDATION_JSON_FILE] [CHECKPOINT_PATH] + ``` + Note: + 1. VALIDATION_JSON_FILE is the annotation JSON file used for evaluation. - ``` - pip install Cython +# [Script Description](#contents) - pip install pycocotools +## [Script and Sample Code](#contents) - pip install mmcv - ``` - And change the COCO_ROOT and other settings you need in `config.py`. The directory structure is as follows: - - - ``` - . - └─cocodataset - ├─annotations - ├─instance_train2017.json - └─instance_val2017.json - ├─val2017 - └─train2017 - - ``` - Notice that the coco2017 dataset will be converted to MindRecord which is a data format in MindSpore. The dataset conversion may take about 4 hours. - 2. If your own dataset is used. 
**Select dataset to other when run script.** - Organize the dataset infomation into a TXT file, each row in the file is as follows: - - ``` - train2017/0000001.jpg 0,259,401,459,7 35,28,324,201,2 0,30,59,80,2 - ``` - - Each row is an image annotation which split by space, the first column is a relative path of image, the others are box and class infomations of the format [xmin,ymin,xmax,ymax,class]. We read image from an image path joined by the `IMAGE_DIR`(dataset directory) and the relative path in `ANNO_PATH`(the TXT file path), `IMAGE_DIR` and `ANNO_PATH` are setting in `config.py`. - - -## Example structure - ```shell . -└─maskrcnn - ├─README.md - ├─scripts - ├─run_download_process_data.sh - ├─run_standalone_train.sh - ├─run_train.sh - └─run_eval.sh +└─MaskRcnn + ├─README.md # README + ├─scripts # shell script + ├─run_standalone_train.sh # training in standalone mode (1 pc) + ├─run_distribute_train.sh # training in parallel mode (8 pcs) + └─run_eval.sh # evaluation ├─src ├─maskrcnn ├─__init__.py - ├─anchor_generator.py - ├─bbox_assign_sample.py - ├─bbox_assign_sample_stage2.py - ├─mask_rcnn_r50.py - ├─fpn_neck.py - ├─proposal_generator.py - ├─rcnn_cls.py - ├─rcnn_mask.py - ├─resnet50.py - ├─roi_align.py - └─rpn.py - ├─config.py - ├─dataset.py - ├─lr_schedule.py - ├─network_define.py - └─util.py - ├─eval.py - └─train.py + ├─anchor_generator.py # generate base bounding box anchors + ├─bbox_assign_sample.py # filter positive and negative bbox for the first stage learning + ├─bbox_assign_sample_stage2.py # filter positive and negative bbox for the second stage learning + ├─mask_rcnn_r50.py # main network architecture of maskrcnn + ├─fpn_neck.py # fpn network + ├─proposal_generator.py # generate proposals based on feature map + ├─rcnn_cls.py # rcnn bounding box regression branch + ├─rcnn_mask.py # rcnn mask branch + ├─resnet50.py # backbone network + ├─roi_align.py # roi align network + └─rpn.py # region proposal network + ├─config.py # network configuration + 
├─dataset.py # dataset utils + ├─lr_schedule.py # learning rate generator + ├─network_define.py # network define for maskrcnn + └─util.py # routine operation + ├─eval.py # evaluation scripts + └─train.py # training scripts ``` -## Running the example - -### Train - -#### Usage +## [Script Parameters](#contents) +### [Training Script Parameters](#contents) ``` # distributed training -sh run_distribute_train.sh [RANK_TABLE_FILE] [PRETRAINED_MODEL] +Usage: sh run_distribute_train.sh [RANK_TABLE_FILE] [PRETRAINED_MODEL] +# standalone training +Usage: sh run_standalone_train.sh [PRETRAINED_MODEL] +``` + +### [Parameters Configuration](#contents) +``` +"img_width": 1280, # width of the input images +"img_height": 768, # height of the input images + +# random threshold in data augmentation +"keep_ratio": True, +"flip_ratio": 0.5, +"photo_ratio": 0.5, +"expand_ratio": 1.0, + +"max_instance_count": 128, # max number of bbox for each image +"mask_shape": (28, 28), # shape of mask in rcnn_mask + +# anchor +"feature_shapes": [(192, 320), (96, 160), (48, 80), (24, 40), (12, 20)], # shape of fpn feature maps +"anchor_scales": [8], # area of base anchor +"anchor_ratios": [0.5, 1.0, 2.0], # ratio between width and height of base anchors +"anchor_strides": [4, 8, 16, 32, 64], # stride size of each feature map level +"num_anchors": 3, # anchor number for each pixel + +# resnet +"resnet_block": [3, 4, 6, 3], # block number in each layer +"resnet_in_channels": [64, 256, 512, 1024], # in channel size for each layer +"resnet_out_channels": [256, 512, 1024, 2048], # out channel size for each layer + +# fpn +"fpn_in_channels": [256, 512, 1024, 2048], # in channel size for each layer +"fpn_out_channels": 256, # out channel size for every layer +"fpn_num_outs": 5, # number of output feature maps + +# rpn +"rpn_in_channels": 256, # in channel size +"rpn_feat_channels": 256, # feature out channel size +"rpn_loss_cls_weight": 1.0, # weight of bbox classification in rpn loss +"rpn_loss_reg_weight": 1.0, 
# weight of bbox regression in rpn loss +"rpn_cls_out_channels": 1, # classification out channel size +"rpn_target_means": [0., 0., 0., 0.], # bounding box decode/encode means +"rpn_target_stds": [1.0, 1.0, 1.0, 1.0], # bounding box decode/encode stds + +# bbox_assign_sampler +"neg_iou_thr": 0.3, # negative sample threshold after IOU +"pos_iou_thr": 0.7, # positive sample threshold after IOU +"min_pos_iou": 0.3, # minimal positive sample threshold after IOU +"num_bboxes": 245520, # total bbox number +"num_gts": 128, # total ground truth number +"num_expected_neg": 256, # negative sample number +"num_expected_pos": 128, # positive sample number + +# proposal +"activate_num_classes": 2, # class number in rpn classification +"use_sigmoid_cls": True, # whether to use sigmoid as loss function in rpn classification + +# roi_align +"roi_layer": dict(type='RoIAlign', out_size=7, mask_out_size=14, sample_num=2), # ROIAlign parameters +"roi_align_out_channels": 256, # ROIAlign out channels size +"roi_align_featmap_strides": [4, 8, 16, 32], # stride size for different levels of ROIAlign feature map +"roi_align_finest_scale": 56, # finest scale of ROIAlign +"roi_sample_num": 640, # sample number in ROIAlign layer + +# bbox_assign_sampler_stage2 # bbox assign sample for the second stage, parameter meaning is similar to bbox_assign_sampler +"neg_iou_thr_stage2": 0.5, +"pos_iou_thr_stage2": 0.5, +"min_pos_iou_stage2": 0.5, +"num_bboxes_stage2": 2000, +"num_expected_pos_stage2": 128, +"num_expected_neg_stage2": 512, +"num_expected_total_stage2": 512, + +# rcnn # rcnn parameter for the second stage, parameter meaning is similar to fpn +"rcnn_num_layers": 2, +"rcnn_in_channels": 256, +"rcnn_fc_out_channels": 1024, +"rcnn_mask_out_channels": 256, +"rcnn_loss_cls_weight": 1, +"rcnn_loss_reg_weight": 1, +"rcnn_loss_mask_fb_weight": 1, +"rcnn_target_means": [0., 0., 0., 0.], +"rcnn_target_stds": [0.1, 0.1, 0.2, 0.2], + +# train proposal +"rpn_proposal_nms_across_levels": False, 
+"rpn_proposal_nms_pre": 2000, # proposal number before nms in rpn +"rpn_proposal_nms_post": 2000, # proposal number after nms in rpn +"rpn_proposal_max_num": 2000, # max proposal number in rpn +"rpn_proposal_nms_thr": 0.7, # nms threshold for nms in rpn +"rpn_proposal_min_bbox_size": 0, # min size of box in rpn + +# test proposal # part of parameters are similar to train proposal +"rpn_nms_across_levels": False, +"rpn_nms_pre": 1000, +"rpn_nms_post": 1000, +"rpn_max_num": 1000, +"rpn_nms_thr": 0.7, +"rpn_min_bbox_min_size": 0, +"test_score_thr": 0.05, # score threshold +"test_iou_thr": 0.5, # IOU threshold +"test_max_per_img": 100, # max number of instance +"test_batch_size": 2, # batch size + +"rpn_head_loss_type": "CrossEntropyLoss", # loss type in rpn +"rpn_head_use_sigmoid": True, # whether to use sigmoid or not in rpn +"rpn_head_weight": 1.0, # rpn head weight in loss +"mask_thr_binary": 0.5, # mask threshold in rcnn + +# LR +"base_lr": 0.02, # base learning rate +"base_step": 58633, # base step in lr generator +"total_epoch": 13, # total epoch in lr generator +"warmup_step": 500, # warm-up step in lr generator +"warmup_mode": "linear", # warm-up mode +"warmup_ratio": 1/3.0, # warm-up ratio +"sgd_momentum": 0.9, # momentum in optimizer + +# train +"batch_size": 2, +"loss_scale": 1, +"momentum": 0.91, +"weight_decay": 1e-4, +"pretrain_epoch_size": 0, # pretrained epoch size +"epoch_size": 12, # total epoch size +"save_checkpoint": True, # whether to save checkpoints or not +"save_checkpoint_epochs": 1, # save checkpoint interval +"keep_checkpoint_max": 12, # max number of saved checkpoint +"save_checkpoint_path": "./checkpoint", # path of checkpoint + +"mindrecord_dir": "/home/maskrcnn/MindRecord_COCO2017_Train", # path of mindrecord +"coco_root": "/home/maskrcnn/", # path of coco root dataset +"train_data_type": "train2017", # name of train dataset +"val_data_type": "val2017", # name of evaluation dataset +"instance_set": "annotations/instances_{}.json", # 
name of annotation +"coco_classes": ('background', 'person', 'bicycle', 'car', 'motorcycle', 'airplane', 'bus', + 'train', 'truck', 'boat', 'traffic light', 'fire hydrant', + 'stop sign', 'parking meter', 'bench', 'bird', 'cat', 'dog', + 'horse', 'sheep', 'cow', 'elephant', 'bear', 'zebra', + 'giraffe', 'backpack', 'umbrella', 'handbag', 'tie', + 'suitcase', 'frisbee', 'skis', 'snowboard', 'sports ball', + 'kite', 'baseball bat', 'baseball glove', 'skateboard', + 'surfboard', 'tennis racket', 'bottle', 'wine glass', 'cup', + 'fork', 'knife', 'spoon', 'bowl', 'banana', 'apple', + 'sandwich', 'orange', 'broccoli', 'carrot', 'hot dog', 'pizza', + 'donut', 'cake', 'chair', 'couch', 'potted plant', 'bed', + 'dining table', 'toilet', 'tv', 'laptop', 'mouse', 'remote', + 'keyboard', 'cell phone', 'microwave', 'oven', 'toaster', 'sink', + 'refrigerator', 'book', 'clock', 'vase', 'scissors', + 'teddy bear', 'hair drier', 'toothbrush'), +"num_classes": 81 +``` + + +## [Training Process](#contents) + +- Set options in `config.py`, including loss_scale, learning rate and network hyperparameters. Click [here](https://www.mindspore.cn/tutorial/zh-CN/master/use/data_preparation/loading_the_datasets.html#mindspore) for more information about datasets. + +### [Training](#contents) +- Run `run_standalone_train.sh` for non-distributed training of the MaskRCNN model. + +``` # standalone training sh run_standalone_train.sh [PRETRAINED_MODEL] ``` + +### [Distributed Training](#contents) + +- Run `run_distribute_train.sh` for distributed training of the MaskRCNN model. +``` +sh run_distribute_train.sh [RANK_TABLE_FILE] [PRETRAINED_MODEL] +``` > hccl.json which is specified by RANK_TABLE_FILE is needed when you are running a distribute task. You can generate it by using the [hccl_tools](https://gitee.com/mindspore/mindspore/tree/master/model_zoo/utils/hccl_tools). -> As for PRETRAINED_MODEL,if not set, the model will be trained from the very beginning.Ready-made pretrained_models are not available now. 
Stay tuned. +> As for PRETRAINED_MODEL, if not set, the model will be trained from the very beginning. Ready-made pretrained_models are not available now. Stay tuned. + +### [Training Result](#content) -#### Result - Training result will be stored in the example path, whose folder name begins with "train" or "train_parallel". You can find checkpoint file together with result like the followings in loss.log. @@ -113,10 +343,11 @@ epoch: 11 step: 7393 ,rpn_loss: 0.03772, rcnn_loss: 0.60791, rpn_cls_loss: 0.030 epoch: 12 step: 7393 ,rpn_loss: 0.06482, rcnn_loss: 0.47681, rpn_cls_loss: 0.04770, rpn_reg_loss: 0.01709, rcnn_cls_loss: 0.16492, rcnn_reg_loss: 0.04990, rcnn_mask_loss: 0.26196, total_loss: 0.54163 ``` -### Evaluation - -#### Usage - +## [Evaluation Process](#contents) + +### [Evaluation](#content) + +- Run `run_eval.sh` for evaluation. ``` # infer sh run_eval.sh [VALIDATION_ANN_FILE_JSON] [CHECKPOINT_PATH] @@ -124,38 +355,80 @@ sh run_eval.sh [VALIDATION_ANN_FILE_JSON] [CHECKPOINT_PATH] > As for the COCO2017 dataset, VALIDATION_ANN_FILE_JSON is refer to the annotations/instances_val2017.json in the dataset directory. > checkpoint can be produced and saved in training process, whose folder name begins with "train/checkpoint" or "train_parallel*/checkpoint". -#### Result +### [Evaluation result](#content) Inference result will be stored in the example path, whose folder name is "eval". Under this, you can find result like the followings in log. ``` Evaluate annotation type *bbox* Accumulating evaluation results... 
- Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.366 - Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=100 ] = 0.591 - Average Precision (AP) @[ IoU=0.75 | area= all | maxDets=100 ] = 0.393 - Average Precision (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.241 - Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.405 - Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.454 - Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 1 ] = 0.304 - Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 10 ] = 0.492 - Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.521 - Average Recall (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.372 - Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.560 - Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.637 + Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.376 + Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=100 ] = 0.598 + Average Precision (AP) @[ IoU=0.75 | area= all | maxDets=100 ] = 0.405 + Average Precision (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.239 + Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.414 + Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.475 + Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 1 ] = 0.311 + Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 10 ] = 0.500 + Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.528 + Average Recall (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.371 + Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.572 + Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.653 Evaluate annotation type *segm* Accumulating evaluation results... 
- Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.318 - Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=100 ] = 0.546 - Average Precision (AP) @[ IoU=0.75 | area= all | maxDets=100 ] = 0.332 - Average Precision (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.165 - Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.348 - Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.449 - Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 1 ] = 0.272 - Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 10 ] = 0.421 - Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.440 - Average Recall (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.292 - Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.479 + Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.326 + Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=100 ] = 0.553 + Average Precision (AP) @[ IoU=0.75 | area= all | maxDets=100 ] = 0.344 + Average Precision (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.169 + Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.356 + Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.462 + Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 1 ] = 0.278 + Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 10 ] = 0.426 + Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.445 + Average Recall (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.294 + Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.484 Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.558 -``` \ No newline at end of file +``` + +# Model Description +## Performance + +### Training Performance + +| Parameters | MaskRCNN | +| -------------------------- | ----------------------------------------------------------- | +| Model 
Version | V1 | +| Resource | Ascend 910; CPU 2.60GHz, 56 cores; Memory 314G | +| Uploaded Date | 08/01/2020 (month/day/year) | +| MindSpore Version | 0.6.0-alpha | +| Dataset | COCO2017 | +| Training Parameters | epoch=12, batch_size=2 | +| Optimizer | SGD | +| Loss Function | Softmax Cross Entropy, Sigmoid Cross Entropy, SmoothL1Loss | +| Speed | 1pc: 250 ms/step; 8pcs: 260 ms/step | +| Total time | 1pc: 52 hours; 8pcs: 6.6 hours | +| Parameters (M) | 280 | +| Scripts | https://gitee.com/mindspore/mindspore/tree/master/model_zoo/official/cv/maskrcnn | + + +### Evaluation Performance + +| Parameters | MaskRCNN | +| ------------------- | --------------------------- | +| Model Version | V1 | +| Resource | Ascend 910 | +| Uploaded Date | 08/01/2020 (month/day/year) | +| MindSpore Version | 0.6.0-alpha | +| Dataset | COCO2017 | +| batch_size | 2 | +| outputs | mAP | +| Accuracy | IoU=0.50:0.95 32.4% | +| Model for inference | 254M (.ckpt file) | + + +# [Description of Random Situation](#contents) +In dataset.py, we set the seed inside the `create_dataset` function. We also use a random seed in train.py for weight initialization. + +# [ModelZoo Homepage](#contents) +Please check the official [homepage](https://gitee.com/mindspore/mindspore/tree/master/model_zoo). 
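The custom-dataset annotation format described in the MaskRCNN Quick Start (one image per row: a relative path followed by `xmin,ymin,xmax,ymax,class` fields separated by spaces) could be parsed along these lines. This is a minimal sketch for illustration only, not the loader in `dataset.py`; the `Box` tuple and the `parse_annotation_line` helper are hypothetical names that do not appear in the repository.

```python
from typing import List, NamedTuple, Tuple


class Box(NamedTuple):
    """One bounding box with its class id (hypothetical helper type)."""
    xmin: int
    ymin: int
    xmax: int
    ymax: int
    label: int


def parse_annotation_line(line: str) -> Tuple[str, List[Box]]:
    """Split one annotation row into (relative image path, list of boxes)."""
    fields = line.split()
    if not fields:
        raise ValueError("empty annotation line")
    image_path, box_fields = fields[0], fields[1:]
    boxes = []
    for field in box_fields:
        values = [int(v) for v in field.split(",")]
        if len(values) != 5:
            raise ValueError(f"expected 5 values per box, got {field!r}")
        boxes.append(Box(*values))
    return image_path, boxes


# The sample row from the README.
path, boxes = parse_annotation_line(
    "train2017/0000001.jpg 0,259,401,459,7 35,28,324,201,2 0,30,59,80,2"
)
print(path, len(boxes), boxes[0])
```

The relative path returned here would still need to be joined with `IMAGE_DIR` before the image is opened, as the Quick Start note explains.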
\ No newline at end of file diff --git a/model_zoo/official/cv/warpctc/README.md b/model_zoo/official/cv/warpctc/README.md index 9f59e89f604..cf046bfd775 100644 --- a/model_zoo/official/cv/warpctc/README.md +++ b/model_zoo/official/cv/warpctc/README.md @@ -1,30 +1,104 @@ -# Warpctc Example +# Contents -## Description +- [WarpCTC Description](#warpctc-description) +- [Model Architecture](#model-architecture) +- [Dataset](#dataset) +- [Environment Requirements](#environment-requirements) +- [Quick Start](#quick-start) +- [Script Description](#script-description) + - [Script and Sample Code](#script-and-sample-code) + - [Script Parameters](#script-parameters) + - [Training Script Parameters](#training-script-parameters) + - [Parameters Configuration](#parameters-configuration) + - [Dataset Preparation](#dataset-preparation) + - [Training Process](#training-process) + - [Training](#training) + - [Distributed Training](#distributed-training) + - [Evaluation Process](#evaluation-process) + - [Evaluation](#evaluation) +- [Model Description](#model-description) + - [Performance](#performance) + - [Training Performance](#training-performance) + - [Evaluation Performance](#evaluation-performance) +- [Description of Random Situation](#description-of-random-situation) +- [ModelZoo Homepage](#modelzoo-homepage) -These is an example of training Warpctc with self-generated captcha image dataset in MindSpore. +# [WarpCTC Description](#contents) -## Requirements +This is an example of training WarpCTC with self-generated captcha image dataset in MindSpore. -- Install [MindSpore](https://www.mindspore.cn/install/en). +# [Model Architecture](#content) -- Generate captcha images. +WarpCTC is a two-layer stacked LSTM appending with one-layer FC neural network. See src/warpctc.py for details. -> The [captcha](https://github.com/lepture/captcha) library can be used to generate captcha images. 
You can generate the train and test dataset by yourself or just run the script `scripts/run_process_data.sh`. By default, the shell script will generate 10000 test images and 50000 train images separately. -> ``` -> $ cd scripts -> $ sh run_process_data.sh -> -> # after execution, you will find the dataset like the follows: -> . -> └─warpctc -> └─data -> ├─ train # train dataset -> └─ test # evaluate dataset -> ... +# [Dataset](#content) + +The dataset is self-generated using a third-party library called [captcha](https://github.com/lepture/captcha), which can randomly generate digits from 0 to 9 in image. In this network, we set the length of digits varying from 1 to 4. + +# [Environment Requirements](#contents) + +- Hardware(Ascend/GPU) + - Prepare hardware environment with Ascend or GPU processor. If you want to try Ascend, please send the [application form](https://obs-9be7.obs.cn-east-2.myhuaweicloud.com/file/other/Ascend%20Model%20Zoo%E4%BD%93%E9%AA%8C%E8%B5%84%E6%BA%90%E7%94%B3%E8%AF%B7%E8%A1%A8.docx) to ascend@huawei.com. You will be able to have access to related resources once approved. +- Framework + - [MindSpore](https://gitee.com/mindspore/mindspore) +- For more information, please check the resources below: + - [MindSpore tutorials](https://www.mindspore.cn/tutorial/en/master/index.html) + - [MindSpore API](https://www.mindspore.cn/api/en/master/index.html) -## Structure +# [Quick Start](#contents) + +- Generate dataset. + + Run the script `scripts/run_process_data.sh` to generate a dataset. By default, the shell script will generate 10000 test images and 50000 train images separately. + + ``` + $ cd scripts + $ sh run_process_data.sh + + # after execution, you will find the dataset like the follows: + . 
+ └─warpctc + └─data + ├─ train # train dataset + └─ test # evaluate dataset + ``` + +- After the dataset is prepared, you may start running the training or the evaluation scripts as follows: + + - Running on Ascend + ``` + # distribute training example in Ascend + $ bash run_distribute_train.sh rank_table.json ../data/train + + # evaluation example in Ascend + $ bash run_eval.sh ../data/test warpctc-30-97.ckpt Ascend + + # standalone training example in Ascend + $ bash run_standalone_train.sh ../data/train Ascend + ``` + For distributed training, a hccl configuration file with JSON format needs to be created in advance. + + Please follow the instructions in the link below: + + https://gitee.com/mindspore/mindspore/tree/master/model_zoo/utils/hccl_tools. + + - Running on GPU + + ``` + # distribute training example in GPU + $ bash run_distribute_train_for_gpu.sh 8 ../data/train + + # standalone training example in GPU + $ bash run_standalone_train.sh ../data/train GPU + + # evaluation example in GPU + $ bash run_eval.sh ../data/test warpctc-30-97.ckpt GPU + ``` + +# [Script Description](#contents) + +## [Script and Sample Code](#contents) ```shell . 
@@ -43,14 +117,27 @@ These is an example of training Warpctc with self-generated captcha image datase ├── lr_generator.py # generate learning rate for each step ├── metric.py # accuracy metric for warpctc network ├── warpctc.py # warpctc network definition - └── warpctc_for_train.py # warp network with grad, loss and gradient clip + └── warpctc_for_train.py # warpctc network with grad, loss and gradient clip ├── eval.py # eval net ├── process_data.py # dataset generation script └── train.py # train net ``` +## [Script Parameters](#contents) -## Parameter configuration +### Training Script Parameters +``` +# distributed training in Ascend +Usage: bash run_distribute_train.sh [RANK_TABLE_FILE] [DATASET_PATH] + +# distributed training in GPU +Usage: bash run_distribute_train_for_gpu.sh [RANK_SIZE] [DATASET_PATH] + +# standalone training +Usage: bash run_standalone_train.sh [DATASET_PATH] [PLATFORM] +``` + +### Parameters Configuration Parameters for both training and evaluation can be set in config.py. @@ -69,82 +156,82 @@ Parameters for both training and evaluation can be set in config.py. "save_checkpoint_path": "./checkpoint", # path to save checkpoint ``` -## Running the example +## [Dataset Preparation](#contents) +- You may refer to "Generate dataset" in [Quick Start](#quick-start) to automatically generate a dataset, or you may choose to generate a captcha dataset by yourself. -### Train +## [Training Process](#contents) -#### Usage +- Set options in `config.py`, including learning rate and other network hyperparameters. Click [MindSpore dataset preparation tutorial](https://www.mindspore.cn/tutorial/zh-CN/master/use/data_preparation/loading_the_datasets.html#mindspore) for more information about dataset. + +### [Training](#contents) +- Run `run_standalone_train.sh` for non-distributed training of WarpCTC model, either on Ascend or on GPU. 
+```bash +bash run_standalone_train.sh [DATASET_PATH] [PLATFORM] ``` -# distributed training in Ascend -Usage: bash run_distribute_train.sh [RANK_TABLE_FILE] [DATASET_PATH] + +### [Distributed Training](#contents) +- Run `run_distribute_train.sh` for distributed training of the WarpCTC model on Ascend. -# distributed training in GPU -Usage: bash run_distribute_train_for_gpu.sh [RANK_SIZE] [DATASET_PATH] - -# standalone training -Usage: bash run_standalone_train.sh [DATASET_PATH] [PLATFORM] +```bash +bash run_distribute_train.sh [RANK_TABLE_FILE] [DATASET_PATH] ``` -#### Launch - -``` -# distribute training example in Ascend -bash run_distribute_train.sh rank_table.json ../data/train - -# distribute training example in GPU -bash run_distribute_train_for_gpu.sh 8 ../data/train - -# standalone training example in Ascend -bash run_standalone_train.sh ../data/train Ascend - -# standalone training example in GPU -bash run_standalone_train.sh ../data/train GPU +- Run `run_distribute_train_for_gpu.sh` for distributed training of the WarpCTC model on GPU. +```bash +bash run_distribute_train_for_gpu.sh [RANK_SIZE] [DATASET_PATH] ``` -> About rank_table.json, you can refer to the [distributed training tutorial](https://www.mindspore.cn/tutorial/en/master/advanced_use/distributed_training.html). +## [Evaluation Process](#contents) +### [Evaluation](#contents) -#### Result - -Training result will be stored in folder `scripts`, whose name begins with "train" or "train_parallel". Under this, you can find checkpoint file together with result like the followings in log. 
- -``` -# distribute training result(8 pcs) -Epoch: [ 1/ 30], step: [ 97/ 97], loss: [0.5853/0.5853], time: [376813.7944] -Epoch: [ 2/ 30], step: [ 97/ 97], loss: [0.4007/0.4007], time: [75882.0951] -Epoch: [ 3/ 30], step: [ 97/ 97], loss: [0.0921/0.0921], time: [75150.9385] -Epoch: [ 4/ 30], step: [ 97/ 97], loss: [0.1472/0.1472], time: [75135.0193] -Epoch: [ 5/ 30], step: [ 97/ 97], loss: [0.0186/0.0186], time: [75199.5809] -... +- Run `run_eval.sh` for evaluation. +```bash +bash run_eval.sh [DATASET_PATH] [CHECKPOINT_PATH] [PLATFORM] ``` +# [Model Description](#contents) -### Evaluation +## [Performance](#contents) -#### Usage +### [Training Performance](#contents) -``` -# evaluation -Usage: bash run_eval.sh [DATASET_PATH] [CHECKPOINT_PATH] [PLATFORM] -``` +| Parameters | Ascend 910 | GPU | +| -------------------------- | --------------------------------------------- |---------------------------------- | +| Model Version | v1.0 | v1.0 | +| Resource | Ascend 910, CPU 2.60GHz 56 cores, Memory 314G | GPU (Tesla V100 SXM2), CPU 2.1GHz 24 cores, Memory 128G | +| Uploaded Date | 07/01/2020 (month/day/year) | 08/01/2020 (month/day/year) | +| MindSpore Version | 0.5.0-alpha | 0.6.0-alpha | +| Dataset | Captcha | Captcha | +| Training Parameters | epoch=30, steps per epoch=98, batch_size = 64 | epoch=30, steps per epoch=98, batch_size = 64 | +| Optimizer | SGD | SGD | +| Loss Function | CTCLoss | CTCLoss | +| outputs | probability | probability | +| Loss | 0.0000157 | 0.0000246 | +| Speed | 980ms/step(8pcs) | 150ms/step(8pcs)| +| Total time | 30 mins | 5 mins| +| Parameters (M) | 2.75 | 2.75 | +| Checkpoint for Fine tuning | 20.3M (.ckpt file) | 20.3M (.ckpt file) | +| Scripts | [Link](https://gitee.com/mindspore/mindspore/tree/master/model_zoo/official/cv/warpctc) | [Link](https://gitee.com/mindspore/mindspore/tree/master/model_zoo/official/cv/warpctc) | -#### Launch -``` -# evaluation example in Ascend -bash run_eval.sh ../data/test warpctc-30-97.ckpt Ascend +### [Evaluation 
Performance](#contents) -# evaluation example in GPU -bash run_eval.sh ../data/test warpctc-30-97.ckpt GPU -``` +| Parameters | WarpCTC | +| ------------------- | --------------------------- | +| Model Version | V1.0 | +| Resource | Ascend 910 | +| Uploaded Date | 08/01/2020 (month/day/year) | +| MindSpore Version | 0.6.0-alpha | +| Dataset | Captcha | +| batch_size | 64 | +| outputs | ACC | +| Accuracy | 99.0% | +| Model for inference | 20.3M (.ckpt file) | -> checkpoint can be produced in training process. +# [Description of Random Situation](#contents) +In dataset.py, we set the seed inside the `create_dataset` function. We also use a random seed in train.py for weight initialization. -#### Result - -Evaluation result will be stored in the example path, whose folder name is "eval". Under this, you can find result like the followings in log. - -``` -result: {'WarpCTCAccuracy': 0.9901472929936306} -``` +# [ModelZoo Homepage](#contents) +Please check the official [homepage](https://gitee.com/mindspore/mindspore/tree/master/model_zoo). \ No newline at end of file
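The WarpCTC README reports a sequence-level accuracy for a network trained with CTCLoss on digit captchas of length 1 to 4. The sketch below shows one common way such a metric can be computed: greedy CTC decoding (collapse repeated labels, then drop blanks) followed by an exact-match comparison. It is an illustration under stated assumptions and not the code in `src/metric.py`; the blank index of 10 and the helper names `ctc_greedy_decode` and `sequence_accuracy` are hypothetical.

```python
from typing import List, Sequence

# Assumption: digits 0-9 map to classes 0-9 and the CTC blank is the extra class 10.
BLANK = 10


def ctc_greedy_decode(step_labels: Sequence[int], blank: int = BLANK) -> List[int]:
    """Collapse consecutive repeats, then remove blanks."""
    decoded = []
    previous = blank
    for label in step_labels:
        if label != blank and label != previous:
            decoded.append(label)
        previous = label
    return decoded


def sequence_accuracy(predictions: Sequence[Sequence[int]],
                      targets: Sequence[Sequence[int]]) -> float:
    """Fraction of samples whose decoded sequence matches the target exactly."""
    if not targets:
        raise ValueError("targets must not be empty")
    correct = sum(
        ctc_greedy_decode(pred) == list(target)
        for pred, target in zip(predictions, targets)
    )
    return correct / len(targets)


# Per-time-step argmax labels for three made-up samples.
preds = [[1, 1, 10, 2, 2, 10, 10], [3, 10, 3, 10, 4, 4], [5, 5, 5, 10]]
labels = [[1, 2], [3, 3, 4], [6]]
accuracy = sequence_accuracy(preds, labels)
print(accuracy)
```

The blank between the two `3` labels in the second sample is what lets CTC emit a repeated digit; without it the repeats would collapse into a single `3`.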