forked from OSSInnovation/mindspore
4.5 KiB
4.5 KiB
YOLOV3-DarkNet53 Example
Description
This is an example of training YOLOV3-DarkNet53 with COCO2014 dataset in MindSpore.
Requirements
-
Install MindSpore.
-
Download the dataset COCO2014.
Unzip the COCO2014 dataset to any path you want, the folder should include train and eval dataset as follows:
.
└─dataset
├─train2014
├─val2014
└─annotations
Structure
.
└─yolov3_darknet53
├─README.md
├─scripts
├─run_standalone_train.sh # launch standalone training(1p)
├─run_distribute_train.sh # launch distributed training(8p)
└─run_eval.sh # launch evaluating
├─src
├─config.py # parameter configuration
├─darknet.py # backbone of network
├─distributed_sampler.py # iterator of dataset
├─initializer.py # initializer of parameters
├─logger.py # log function
├─loss.py # loss function
├─lr_scheduler.py # generate learning rate
├─transforms.py # Preprocess data
├─util.py # util function
├─yolo.py # yolov3 network
├─yolo_dataset.py # create dataset for YOLOV3
├─eval.py # eval net
└─train.py # train net
Running the example
Train
Usage
# distributed training
sh run_distribute_train.sh [DATASET_PATH] [PRETRAINED_BACKBONE] [MINDSPORE_HCCL_CONFIG_PATH]
# standalone training
sh run_standalone_train.sh [DATASET_PATH] [PRETRAINED_BACKBONE]
Launch
# distributed training example(8p)
sh run_distribute_train.sh dataset/coco2014 backbone/backbone.ckpt rank_table_8p.json
# standalone training example(1p)
sh run_standalone_train.sh dataset/coco2014 backbone/backbone.ckpt
About rank_table.json, you can refer to the distributed training tutorial.
Result
Training result will be stored in the scripts path, whose folder name begins with "train" or "train_parallel". You can find checkpoint file together with result like the followings in log.txt.
# distribute training result(8p)
epoch[0], iter[0], loss:14623.384766, 1.23 imgs/sec, lr:7.812499825377017e-05
epoch[0], iter[100], loss:1486.253051, 15.01 imgs/sec, lr:0.007890624925494194
epoch[0], iter[200], loss:288.579535, 490.41 imgs/sec, lr:0.015703124925494194
epoch[0], iter[300], loss:153.136754, 531.99 imgs/sec, lr:0.023515624925494194
epoch[1], iter[400], loss:106.429322, 405.14 imgs/sec, lr:0.03132812678813934
...
epoch[318], iter[102000], loss:34.135306, 431.06 imgs/sec, lr:9.63797629083274e-06
epoch[319], iter[102100], loss:35.652469, 449.52 imgs/sec, lr:2.409552052995423e-06
epoch[319], iter[102200], loss:34.652273, 384.02 imgs/sec, lr:2.409552052995423e-06
epoch[319], iter[102300], loss:35.430038, 423.49 imgs/sec, lr:2.409552052995423e-06
...
Infer
Usage
# infer
sh run_eval.sh [DATASET_PATH] [CHECKPOINT_PATH]
Launch
# infer with checkpoint
sh run_eval.sh dataset/coco2014/ checkpoint/0-319_102400.ckpt
checkpoint can be produced in training process.
Result
Inference result will be stored in the scripts path, whose folder name is "eval". Under this, you can find result like the followings in log.txt.
=============coco eval reulst=========
Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.311
Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=100 ] = 0.528
Average Precision (AP) @[ IoU=0.75 | area= all | maxDets=100 ] = 0.322
Average Precision (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.127
Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.323
Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.428
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 1 ] = 0.259
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 10 ] = 0.398
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.423
Average Recall (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.224
Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.442
Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.551