Merge pull request !20574 from 张晓晓/code_docs_ssd
i-robot 2021-07-21 02:32:07 +00:00 committed by Gitee
commit d976a5cce3
17 changed files with 587 additions and 65 deletions

View File

@ -257,10 +257,15 @@ Then you can run everything just like on ascend.
│ ├── device_adapter.py ## device adapter
│ ├── local_adapter.py ## local adapter
│ ├── moxing_adapter.py ## moxing adapter
├─ ssd_mobilenet_v1_fpn_config.yaml ## parameter configuration
├─ ssd_resnet50_fpn_config.yaml ## parameter configuration
├─ ssd_vgg16_config.yaml ## parameter configuration
├─ ssd300_config.yaml ## parameter configuration
├─ config
  ├─ ssd_mobilenet_v1_fpn_config.yaml ## parameter configuration
  ├─ ssd_resnet50_fpn_config.yaml ## parameter configuration
  ├─ ssd_vgg16_config.yaml ## parameter configuration
  ├─ ssd300_config.yaml ## parameter configuration
  ├─ ssd_mobilenet_v1_fpn_config_gpu.yaml ## GPU parameter configuration
  ├─ ssd_resnet50_fpn_config_gpu.yaml ## GPU parameter configuration
  ├─ ssd_vgg16_config_gpu.yaml ## GPU parameter configuration
  ├─ ssd300_config_gpu.yaml ## GPU parameter configuration
├─ Dockerfile ## docker file
├─ eval.py ## eval scripts
├─ export.py ## export mindir script
@ -357,7 +362,7 @@ epoch time: 39064.8467540741, per step time: 85.29442522723602
bash run_distribute_train_gpu.sh [DEVICE_NUM] [EPOCH_SIZE] [LR] [DATASET] [CONFIG_PATH] [PRE_TRAINED](optional) [PRE_TRAINED_EPOCH_SIZE](optional)
```
We need four or six parameters for this script.
We need five or seven parameters for this script; a sample invocation is sketched after the parameter list below.
- `DEVICE_NUM`: the number of devices for distributed training.
- `EPOCH_NUM`: the number of epochs for distributed training.
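As a concrete sketch built from the example values shipped in the launcher script (the config and checkpoint paths are placeholders to adapt):

```shell
# distributed training on 8 GPUs, 500 epochs, lr 0.2, COCO, resuming from an optional checkpoint
bash run_distribute_train_gpu.sh 8 500 0.2 coco config/ssd300_config_gpu.yaml /opt/ssd-300.ckpt 200
```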
@ -401,7 +406,7 @@ You can train your own model based on either pretrained classification model or
bash run_eval.sh [DATASET] [CHECKPOINT_PATH] [DEVICE_ID] [CONFIG_PATH]
```
We need two parameters for this script.
We need four parameters for this script; a sample run follows the parameter list below.
- `DATASET`: the dataset mode of the evaluation dataset.
- `CHECKPOINT_PATH`: the absolute path of the checkpoint file.
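A sample run, assuming a checkpoint named like the default in ssd300_config.yaml and device 0 (paths are placeholders):

```shell
# evaluate on COCO with device 0 and the default Ascend config
bash run_eval.sh coco /path/to/ssd-500_458.ckpt 0 config/ssd300_config.yaml
```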
@ -437,7 +442,7 @@ mAP: 0.23808886505483504
bash run_eval_gpu.sh [DATASET] [CHECKPOINT_PATH] [DEVICE_ID] [CONFIG_PATH]
```
We need two parameters for this script.
We need four parameters for this script; an example invocation follows the parameter list below.
- `DATASET`: the dataset mode of the evaluation dataset.
- `CHECKPOINT_PATH`: the absolute path of the checkpoint file.
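The GPU variant takes the same four arguments; a hypothetical invocation against the new GPU config:

```shell
# GPU evaluation using the GPU-specific config added in this change
bash run_eval_gpu.sh coco /path/to/ssd-500_458.ckpt 0 config/ssd300_config_gpu.yaml
```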
@ -506,8 +511,8 @@ Export on ModelArts (If you want to run in modelarts, please check the official
### Infer on Ascend310
Before performing inference, the mindir file must bu exported by `export.py` script. We only provide an example of inference using MINDIR model.
Current batch_Size can only be set to 1. The precision calculation process needs about 70G+ memory space, otherwise the process will be killed for execeeding memory limits.
Before performing inference, the MINDIR file must be exported by the `export.py` script. We only provide an example of inference using the MINDIR model.
Currently the batch size can only be set to 1. The precision calculation needs about 70 GB of memory; otherwise the process will be killed for exceeding the memory limit.
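A minimal export sketch; it assumes `export.py` accepts the shared yaml keys (`checkpoint_file_path`, `file_name`, `file_format`) as command-line overrides through the common config parser, so treat the flags as illustrative:

```shell
# export a trained checkpoint to MINDIR before running Ascend 310 inference
python export.py --config_path=config/ssd300_config.yaml \
                 --checkpoint_file_path=/path/to/ssd-500_458.ckpt \
                 --file_name=ssd --file_format=MINDIR
```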
```shell
# Ascend310 inference
@ -542,34 +547,34 @@ mAP: 0.33880018942412393
#### Evaluation Performance
| Parameters | Ascend | GPU | Ascend |
| ------------------- | ----------------------------------------------------------------------------- | ----------------------------------------------------------------------------- | ----------------------------------------------------------------------------- |
| Model Version | SSD V1 | SSD V1 | SSD-Mobilenet-V1-Fpn |
| Resource | Ascend 910; CPU 2.60GHz, 192cores; Memory 755G; OS Euler2.8 | NV SMX2 V100-16G | Ascend 910; CPU 2.60GHz, 192cores; Memory 755G; OS Euler2.8 |
| uploaded Date | 07/05/2021 (month/day/year) | 09/24/2020 (month/day/year) | 01/13/2021 (month/day/year) |
| MindSpore Version | 1.3.0 | 1.0.0 | 1.1.0 |
| Dataset | COCO2017 | COCO2017 | COCO2017 |
| Training Parameters | epoch = 500, batch_size = 32 | epoch = 800, batch_size = 32 | epoch = 60, batch_size = 32 |
| Optimizer | Momentum | Momentum | Momentum |
| Loss Function | Sigmoid Cross Entropy,SmoothL1Loss | Sigmoid Cross Entropy,SmoothL1Loss | Sigmoid Cross Entropy,SmoothL1Loss |
| Speed | 8pcs: 90ms/step | 8pcs: 121ms/step | 8pcs: 547ms/step |
| Total time | 8pcs: 4.81hours | 8pcs: 12.31hours | 8pcs: 4.22hours |
| Parameters (M) | 34 | 34 | 48M |
| Scripts | <https://gitee.com/mindspore/mindspore/tree/master/model_zoo/official/cv/ssd> | <https://gitee.com/mindspore/mindspore/tree/master/model_zoo/official/cv/ssd> | <https://gitee.com/mindspore/mindspore/tree/master/model_zoo/official/cv/ssd> |
| Parameters | Ascend | GPU | Ascend | GPU |
| ------------------- | ------------------- | ------------------- | ------------------- | ------------------- |
| Model Version | SSD V1 | SSD V1 | SSD-Mobilenet-V1-Fpn | SSD-Mobilenet-V1-Fpn |
| Resource | Ascend 910; CPU 2.60GHz, 192 cores; Memory 755G; OS Euler2.8 | NV SMX2 V100-16G | Ascend 910; CPU 2.60GHz, 192 cores; Memory 755G; OS Euler2.8 | NV SMX2 V100-32G |
| Uploaded Date | 07/05/2021 (month/day/year) | 09/24/2020 (month/day/year) | 01/13/2021 (month/day/year) | 07/20/2021 (month/day/year) |
| MindSpore Version | 1.3.0 | 1.0.0 | 1.1.0 | 1.3.0 |
| Dataset | COCO2017 | COCO2017 | COCO2017 | COCO2017 |
| Training Parameters | epoch = 500, batch_size = 32 | epoch = 800, batch_size = 32 | epoch = 60, batch_size = 32 | epoch = 60, batch_size = 16 |
| Optimizer | Momentum | Momentum | Momentum | Momentum |
| Loss Function | Sigmoid Cross Entropy, SmoothL1Loss | Sigmoid Cross Entropy, SmoothL1Loss | Sigmoid Cross Entropy, SmoothL1Loss | Sigmoid Cross Entropy, SmoothL1Loss |
| Speed | 8pcs: 90ms/step | 8pcs: 121ms/step | 8pcs: 547ms/step | 1pcs: 547ms/step |
| Total time | 8pcs: 4.81 hours | 8pcs: 12.31 hours | 8pcs: 4.22 hours | 1pcs: 4.22 hours |
| Parameters (M) | 34 | 34 | 48 | 97 |
| Scripts | <https://gitee.com/mindspore/mindspore/tree/master/model_zoo/official/cv/ssd> | <https://gitee.com/mindspore/mindspore/tree/master/model_zoo/official/cv/ssd> | <https://gitee.com/mindspore/mindspore/tree/master/model_zoo/official/cv/ssd> | <https://gitee.com/mindspore/mindspore/tree/master/model_zoo/official/cv/ssd> |
#### Inference Performance
| Parameters | Ascend | GPU | Ascend |
| ------------------- | --------------------------- | --------------------------- | --------------------------- |
| Model Version | SSD V1 | SSD V1 | SSD-Mobilenet-V1-Fpn |
| Resource | Ascend 910; OS Euler2.8 | GPU |Ascend 910; OS Euler2.8 |
| Uploaded Date | 07/05/2020 (month/day/year) | 09/24/2020 (month/day/year) | 09/24/2020 (month/day/year) |
| MindSpore Version | 1.3.0 | 1.0.0 | 1.1.0 |
| Dataset | COCO2017 | COCO2017 | COCO2017 |
| batch_size | 1 | 1 | 1 |
| outputs | mAP | mAP | mAP |
| Accuracy | IoU=0.50: 23.8% | IoU=0.50: 22.4% | Iout=0.50: 30% |
| Model for inference | 34M(.ckpt file) | 34M(.ckpt file) | 48M(.ckpt file) |
| Parameters | Ascend | GPU | Ascend | GPU |
| ------------------- | --------------------------- | --------------------------- | --------------------------- | --------------------------- |
| Model Version | SSD V1 | SSD V1 | SSD-Mobilenet-V1-Fpn | SSD-Mobilenet-V1-Fpn |
| Resource | Ascend 910; OS Euler2.8 | GPU | Ascend 910; OS Euler2.8 | NV SMX2 V100-32G |
| Uploaded Date | 07/05/2020 (month/day/year) | 09/24/2020 (month/day/year) | 09/24/2020 (month/day/year) | 07/20/2021 (month/day/year) |
| MindSpore Version | 1.3.0 | 1.0.0 | 1.1.0 | 1.3.0 |
| Dataset | COCO2017 | COCO2017 | COCO2017 | COCO2017 |
| batch_size | 1 | 1 | 1 | 1 |
| outputs | mAP | mAP | mAP | mAP |
| Accuracy | IoU=0.50: 23.8% | IoU=0.50: 22.4% | IoU=0.50: 30% | IoU=0.50: 30% |
| Model for inference | 34M (.ckpt file) | 34M (.ckpt file) | 48M (.ckpt file) | 97M (.ckpt file) |
## [Description of Random Situation](#contents)

View File

@ -124,7 +124,7 @@ sh run_eval.sh [DATASET] [CHECKPOINT_PATH] [DEVICE_ID] [CONFIG_PATH]
```shell script
# GPU distributed training
sh run_distribute_train_gpu.sh [DEVICE_NUM] [EPOCH_SIZE] [LR] [DATASET] [CONFIG_PATH]
bash run_distribute_train_gpu.sh [DEVICE_NUM] [EPOCH_SIZE] [LR] [DATASET] [CONFIG_PATH]
```
```shell script
@ -207,10 +207,15 @@ sh run_eval_gpu.sh [DATASET] [CHECKPOINT_PATH] [DEVICE_ID] [CONFIG_PATH]
│ ├──device_adapter.py ## device configuration
│ ├──local_adapter.py ## local device configuration
│ ├──moxing_adapter.py ## ModelArts device configuration
├─ ssd_mobilenet_v1_fpn_config.yaml ## parameter configuration
├─ ssd_resnet50_fpn_config.yaml ## parameter configuration
├─ ssd_vgg16_config.yaml ## parameter configuration
├─ ssd300_config.yaml ## parameter configuration
├─ config
  ├─ ssd_mobilenet_v1_fpn_config.yaml ## parameter configuration
  ├─ ssd_resnet50_fpn_config.yaml ## parameter configuration
  ├─ ssd_vgg16_config.yaml ## parameter configuration
  ├─ ssd300_config.yaml ## parameter configuration
  ├─ ssd_mobilenet_v1_fpn_config_gpu.yaml ## GPU parameter configuration
  ├─ ssd_resnet50_fpn_config_gpu.yaml ## GPU parameter configuration
  ├─ ssd_vgg16_config_gpu.yaml ## GPU parameter configuration
  ├─ ssd300_config_gpu.yaml ## GPU parameter configuration
├─ Dockerfile ## docker file
├─ eval.py ## evaluation script
├─ export.py ## script to export AIR and MINDIR models
@ -291,10 +296,10 @@ epoch time: 39064.8467540741, per step time: 85.29442522723602
- Distributed training
```shell script
sh run_distribute_train_gpu.sh [DEVICE_NUM] [EPOCH_SIZE] [LR] [DATASET] [CONFIG_PATH] [PRE_TRAINED](optional) [PRE_TRAINED_EPOCH_SIZE](optional)
bash run_distribute_train_gpu.sh [DEVICE_NUM] [EPOCH_SIZE] [LR] [DATASET] [CONFIG_PATH] [PRE_TRAINED](optional) [PRE_TRAINED_EPOCH_SIZE](optional)
```
This script requires four or six parameters.
This script requires five or seven parameters.
- `DEVICE_NUM`: the number of devices for distributed training.
- `EPOCH_NUM`: the number of epochs for distributed training.
@ -325,7 +330,7 @@ epoch time: 150753.701, per step time: 329.157
sh run_eval.sh [DATASET] [CHECKPOINT_PATH] [DEVICE_ID] [CONFIG_PATH]
```
This script requires two parameters.
This script requires four parameters.
- `DATASET`: the mode of the evaluation dataset.
- `CHECKPOINT_PATH`: the absolute path of the checkpoint file.
@ -361,7 +366,7 @@ mAP: 0.23808886505483504
sh run_eval_gpu.sh [DATASET] [CHECKPOINT_PATH] [DEVICE_ID] [CONFIG_PATH]
```
This script requires two parameters.
This script requires four parameters.
- `DATASET`: the mode of the evaluation dataset.
- `CHECKPOINT_PATH`: the absolute path of the checkpoint file.

View File

@ -23,6 +23,7 @@ match_threshold: 0.5
nms_threshold: 0.6
min_score: 0.1
max_boxes: 100
all_reduce_fusion_config: [29, 58, 89]
# learning rate settings
lr_init: 0.001

View File

@ -0,0 +1,127 @@
# Builtin Configurations(DO NOT CHANGE THESE CONFIGURATIONS unless you know exactly what you are doing)
enable_modelarts: False
# Url for modelarts
data_url: ""
train_url: ""
checkpoint_url: ""
# Path for local
run_distribute: False
enable_profiling: False
data_path: "/cache/data"
output_path: "/cache/train"
load_path: "/cache/checkpoint_path/"
device_target: "GPU"
checkpoint_path: "./checkpoint/"
checkpoint_file_path: "ssd-500_458.ckpt"
# ==============================================================================
# Training options
model_name: "ssd300"
img_shape: [300, 300]
num_ssd_boxes: 1917
match_threshold: 0.5
nms_threshold: 0.6
min_score: 0.1
max_boxes: 100
all_reduce_fusion_config: [29, 58, 89]
use_float16: True
# learning rate settings
lr_init: 0.001
lr_end_rate: 0.001
warmup_epochs: 2
momentum: 0.9
weight_decay: 0.00015
ssd_vgg_bn: False
pretrain_vgg_bn: False
# network
num_default: [3, 6, 6, 6, 6, 6]
extras_in_channels: [256, 576, 1280, 512, 256, 256]
extras_out_channels: [576, 1280, 512, 256, 256, 128]
extras_strides: [1, 1, 2, 2, 2, 2]
extras_ratio: [0.2, 0.2, 0.2, 0.25, 0.5, 0.25]
feature_size: [19, 10, 5, 3, 2, 1]
min_scale: 0.2
max_scale: 0.95
aspect_ratios: [[], [2, 3], [2, 3], [2, 3], [2, 3], [2, 3]]
steps: [16, 32, 64, 100, 150, 300]
prior_scaling: [0.1, 0.2]
gamma: 2.0
alpha: 0.75
dataset: "coco"
lr: 0.05
mode_sink: "sink"
device_id: 0
device_num: 1
epoch_size: 500
batch_size: 32
loss_scale: 1024
pre_trained: ""
pre_trained_epoch_size: 0
save_checkpoint_epochs: 10
only_create_dataset: False
eval_start_epoch: 40
eval_interval: 1
run_eval: False
filter_weight: False
freeze_layer: None
save_best_ckpt: True
result_path: ""
img_path: ""
drop: False
# `mindrecord_dir` and `coco_root` are better to use absolute path.
feature_extractor_base_param: ""
checkpoint_filter_list: ['multi_loc_layers', 'multi_cls_layers']
mindrecord_dir: "MindRecord_COCO"
coco_root: "coco_ori"
train_data_type: "train2017"
val_data_type: "val2017"
instances_set: "annotations/instances_{}.json"
classes: ['background', 'person', 'bicycle', 'car', 'motorcycle', 'airplane', 'bus',
'train', 'truck', 'boat', 'traffic light', 'fire hydrant',
'stop sign', 'parking meter', 'bench', 'bird', 'cat', 'dog',
'horse', 'sheep', 'cow', 'elephant', 'bear', 'zebra',
'giraffe', 'backpack', 'umbrella', 'handbag', 'tie',
'suitcase', 'frisbee', 'skis', 'snowboard', 'sports ball',
'kite', 'baseball bat', 'baseball glove', 'skateboard',
'surfboard', 'tennis racket', 'bottle', 'wine glass', 'cup',
'fork', 'knife', 'spoon', 'bowl', 'banana', 'apple',
'sandwich', 'orange', 'broccoli', 'carrot', 'hot dog', 'pizza',
'donut', 'cake', 'chair', 'couch', 'potted plant', 'bed',
'dining table', 'toilet', 'tv', 'laptop', 'mouse', 'remote',
'keyboard', 'cell phone', 'microwave', 'oven', 'toaster', 'sink',
'refrigerator', 'book', 'clock', 'vase', 'scissors',
'teddy bear', 'hair drier', 'toothbrush']
num_classes: 81
# The annotation.json position of voc validation dataset.
voc_json: "annotations/voc_instances_val.json"
# voc original dataset.
voc_root: "/data/voc_dataset"
# if coco or voc used, `image_dir` and `anno_path` are useless.
image_dir: ""
anno_path: ""
file_name: "ssd"
file_format: "AIR"
---
# Help description for each configuration
enable_modelarts: "Whether training on modelarts, default: False"
data_url: "Dataset url for obs"
train_url: "Training output url for obs"
checkpoint_url: "The location of checkpoint for obs"
data_path: "Dataset path for local"
output_path: "Training output path for local"
load_path: "The location of checkpoint for obs"
device_target: "Target device type, available: [Ascend, GPU, CPU]"
enable_profiling: "Whether enable profiling while training, default: False"
num_classes: "Class for dataset"
batch_size: "Batch size for training and evaluation"
epoch_size: "Total training epochs."
keep_checkpoint_max: "keep the last keep_checkpoint_max checkpoint"
checkpoint_path: "The location of the checkpoint file."
checkpoint_file_path: "The location of the checkpoint file."

View File

@ -23,6 +23,7 @@ match_threshold: 0.5
nms_threshold: 0.6
min_score: 0.1
max_boxes: 100
all_reduce_fusion_config: [29, 58, 89]
# learning rate settings
global_step: 0

View File

@ -0,0 +1,131 @@
# Builtin Configurations(DO NOT CHANGE THESE CONFIGURATIONS unless you know exactly what you are doing)
enable_modelarts: False
# Url for modelarts
data_url: ""
train_url: ""
checkpoint_url: ""
# Path for local
run_distribute: False
enable_profiling: False
data_path: "/cache/data"
output_path: "/cache/train"
load_path: "/cache/checkpoint_path/"
device_target: "GPU"
checkpoint_path: "./checkpoint/"
checkpoint_file_path: "mobilenet_v1.ckpt"
# ==============================================================================
# Training options
model_name: "ssd_mobilenet_v1_fpn"
img_shape: [640, 640]
num_ssd_boxes: -1
match_threshold: 0.5
nms_threshold: 0.6
min_score: 0.1
max_boxes: 100
all_reduce_fusion_config: [29, 58, 89]
use_float16: False
# learning rate settings
global_step: 0
lr_init: 0.01333
lr_end_rate: 0.0
warmup_epochs: 2
weight_decay: 0.00004
momentum: 0.9
ssd_vgg_bn: False
pretrain_vgg_bn: False
# network
num_default: [6, 6, 6, 6, 6]
extras_in_channels: [256, 512, 1024, 256, 256]
extras_out_channels: [256, 256, 256, 256, 256]
extras_strides: [1, 1, 2, 2, 2, 2]
extras_ratio: [0.2, 0.2, 0.2, 0.25, 0.5, 0.25]
feature_size: [80, 40, 20, 10, 5]
min_scale: 0.2
max_scale: 0.95
aspect_ratios: [[2, 3], [2, 3], [2, 3], [2, 3], [2, 3], [2, 3]]
steps: [8, 16, 32, 64, 128]
prior_scaling: [0.1, 0.2]
gamma: 2.0
alpha: 0.25
num_addition_layers: 4
use_anchor_generator: True
use_global_norm: True
dataset: "coco"
lr: 0.05
mode_sink: "sink"
device_id: 0
device_num: 1
epoch_size: 500
batch_size: 16
loss_scale: 1024
pre_trained: ""
pre_trained_epoch_size: 0
save_checkpoint_epochs: 10
only_create_dataset: False
eval_start_epoch: 40
eval_interval: 1
run_eval: False
filter_weight: False
freeze_layer: None
save_best_ckpt: True
result_path: ""
img_path: ""
drop: False
# `mindrecord_dir` and `coco_root` are better to use absolute path.
feature_extractor_base_param: "/ckpt/mobilenet_v1.ckpt"
checkpoint_filter_list: ['network.multi_box.cls_layers.0.weight', 'network.multi_box.cls_layers.0.bias',
'network.multi_box.loc_layers.0.weight', 'network.multi_box.loc_layers.0.bias']
mindrecord_dir: "MindRecord_COCO"
coco_root: "coco_ori"
train_data_type: "train2017"
val_data_type: "val2017"
instances_set: "annotations/instances_{}.json"
classes: ['background', 'person', 'bicycle', 'car', 'motorcycle', 'airplane', 'bus',
'train', 'truck', 'boat', 'traffic light', 'fire hydrant',
'stop sign', 'parking meter', 'bench', 'bird', 'cat', 'dog',
'horse', 'sheep', 'cow', 'elephant', 'bear', 'zebra',
'giraffe', 'backpack', 'umbrella', 'handbag', 'tie',
'suitcase', 'frisbee', 'skis', 'snowboard', 'sports ball',
'kite', 'baseball bat', 'baseball glove', 'skateboard',
'surfboard', 'tennis racket', 'bottle', 'wine glass', 'cup',
'fork', 'knife', 'spoon', 'bowl', 'banana', 'apple',
'sandwich', 'orange', 'broccoli', 'carrot', 'hot dog', 'pizza',
'donut', 'cake', 'chair', 'couch', 'potted plant', 'bed',
'dining table', 'toilet', 'tv', 'laptop', 'mouse', 'remote',
'keyboard', 'cell phone', 'microwave', 'oven', 'toaster', 'sink',
'refrigerator', 'book', 'clock', 'vase', 'scissors',
'teddy bear', 'hair drier', 'toothbrush']
num_classes: 81
# The annotation.json position of voc validation dataset.
voc_json: "annotations/voc_instances_val.json"
# voc original dataset.
voc_root: "/data/voc_dataset"
# if coco or voc used, `image_dir` and `anno_path` are useless.
image_dir: ""
anno_path: ""
file_name: "ssd"
file_format: "AIR"
---
# Help description for each configuration
enable_modelarts: 'Whether training on modelarts, default: False'
data_url: 'Dataset url for obs'
train_url: 'Training output url for obs'
checkpoint_url: 'The location of checkpoint for obs'
data_path: 'Dataset path for local'
output_path: 'Training output path for local'
load_path: 'The location of checkpoint for obs'
device_target: 'Target device type, available: [Ascend, GPU, CPU]'
enable_profiling: 'Whether enable profiling while training, default: False'
num_classes: 'Class for dataset'
batch_size: "Batch size for training and evaluation"
epoch_size: "Total training epochs."
keep_checkpoint_max: "keep the last keep_checkpoint_max checkpoint"
checkpoint_path: "The location of the checkpoint file."
checkpoint_file_path: "The location of the checkpoint file."

View File

@ -23,6 +23,7 @@ match_threshold: 0.5
nms_threshold: 0.6
min_score: 0.1
max_boxes: 100
all_reduce_fusion_config: [90, 183, 279]
# learning rate settings
global_step: 0

View File

@ -0,0 +1,131 @@
# Builtin Configurations(DO NOT CHANGE THESE CONFIGURATIONS unless you know exactly what you are doing)
enable_modelarts: False
# Url for modelarts
data_url: ""
train_url: ""
checkpoint_url: ""
# Path for local
run_distribute: False
enable_profiling: False
data_path: "/cache/data"
output_path: "/cache/train"
load_path: "/cache/checkpoint_path/"
device_target: "GPU"
checkpoint_path: "./checkpoint/"
checkpoint_file_path: "resnet50.ckpt"
# ==============================================================================
# Training options
model_name: "ssd_resnet50_fpn"
img_shape: [640, 640]
num_ssd_boxes: -1
match_threshold: 0.5
nms_threshold: 0.6
min_score: 0.1
max_boxes: 100
all_reduce_fusion_config: [90, 183, 279]
use_float16: False
# learning rate settings
global_step: 0
lr_init: 0.01333
lr_end_rate: 0.0
warmup_epochs: 2
weight_decay: 0.0004
momentum: 0.9
ssd_vgg_bn: False
pretrain_vgg_bn: False
# network
num_default: [6, 6, 6, 6, 6]
extras_in_channels: [256, 512, 1024, 256, 256]
extras_out_channels: [256, 256, 256, 256, 256]
extras_strides: [1, 1, 2, 2, 2, 2]
extras_ratio: [0.2, 0.2, 0.2, 0.25, 0.5, 0.25]
feature_size: [80, 40, 20, 10, 5]
min_scale: 0.2
max_scale: 0.95
aspect_ratios: [[2, 3], [2, 3], [2, 3], [2, 3], [2, 3], [2, 3]]
steps: [8, 16, 32, 64, 128]
prior_scaling: [0.1, 0.2]
gamma: 2.0
alpha: 0.25
num_addition_layers: 4
use_anchor_generator: True
use_global_norm: True
dataset: "coco"
lr: 0.05
mode_sink: "sink"
device_id: 0
device_num: 1
epoch_size: 500
batch_size: 16
loss_scale: 1024
pre_trained: ""
pre_trained_epoch_size: 0
save_checkpoint_epochs: 10
only_create_dataset: False
eval_start_epoch: 40
eval_interval: 1
run_eval: False
filter_weight: False
freeze_layer: None
save_best_ckpt: True
result_path: ""
img_path: ""
drop: False
# `mindrecord_dir` and `coco_root` are better to use absolute path.
feature_extractor_base_param: "/ckpt/resnet50.ckpt"
checkpoint_filter_list: ['network.multi_box.cls_layers.0.weight', 'network.multi_box.cls_layers.0.bias',
'network.multi_box.loc_layers.0.weight', 'network.multi_box.loc_layers.0.bias']
mindrecord_dir: "MindRecord_COCO"
coco_root: "coco_ori"
train_data_type: "train2017"
val_data_type: "val2017"
instances_set: "annotations/instances_{}.json"
classes: ['background', 'person', 'bicycle', 'car', 'motorcycle', 'airplane', 'bus',
'train', 'truck', 'boat', 'traffic light', 'fire hydrant',
'stop sign', 'parking meter', 'bench', 'bird', 'cat', 'dog',
'horse', 'sheep', 'cow', 'elephant', 'bear', 'zebra',
'giraffe', 'backpack', 'umbrella', 'handbag', 'tie',
'suitcase', 'frisbee', 'skis', 'snowboard', 'sports ball',
'kite', 'baseball bat', 'baseball glove', 'skateboard',
'surfboard', 'tennis racket', 'bottle', 'wine glass', 'cup',
'fork', 'knife', 'spoon', 'bowl', 'banana', 'apple',
'sandwich', 'orange', 'broccoli', 'carrot', 'hot dog', 'pizza',
'donut', 'cake', 'chair', 'couch', 'potted plant', 'bed',
'dining table', 'toilet', 'tv', 'laptop', 'mouse', 'remote',
'keyboard', 'cell phone', 'microwave', 'oven', 'toaster', 'sink',
'refrigerator', 'book', 'clock', 'vase', 'scissors',
'teddy bear', 'hair drier', 'toothbrush']
num_classes: 81
# The annotation.json position of voc validation dataset.
voc_json: "annotations/voc_instances_val.json"
# voc original dataset.
voc_root: "/data/voc_dataset"
# if coco or voc used, `image_dir` and `anno_path` are useless.
image_dir: ""
anno_path: ""
file_name: "ssd"
file_format: "AIR"
---
# Help description for each configuration
enable_modelarts: "Whether training on modelarts, default: False"
data_url: "Dataset url for obs"
train_url: "Training output url for obs"
checkpoint_url: "The location of checkpoint for obs"
data_path: "Dataset path for local"
output_path: "Training output path for local"
load_path: "The location of checkpoint for obs"
device_target: "Target device type, available: [Ascend, GPU, CPU]"
enable_profiling: "Whether enable profiling while training, default: False"
num_classes: "Class for dataset"
batch_size: "Batch size for training and evaluation"
epoch_size: "Total training epochs."
keep_checkpoint_max: "keep the last keep_checkpoint_max checkpoint"
checkpoint_path: "The location of the checkpoint file."
checkpoint_file_path: "The location of the checkpoint file."

View File

@ -23,6 +23,7 @@ match_threshold: 0.5
nms_threshold: 0.6
min_score: 0.1
max_boxes: 100
all_reduce_fusion_config: [20, 41, 62]
# learning rate settings
lr_init: 0.001

View File

@ -0,0 +1,126 @@
# Builtin Configurations(DO NOT CHANGE THESE CONFIGURATIONS unless you know exactly what you are doing)
enable_modelarts: False
# Url for modelarts
data_url: ""
train_url: ""
checkpoint_url: ""
# Path for local
run_distribute: False
enable_profiling: False
data_path: "/cache/data"
output_path: "/cache/train"
load_path: "/cache/checkpoint_path/"
device_target: "GPU"
checkpoint_path: "./checkpoint/"
checkpoint_file_path: "ssd-500_458.ckpt"
# ==============================================================================
# Training options
model_name: "ssd_vgg16"
img_shape: [300, 300]
num_ssd_boxes: 7308
match_threshold: 0.5
nms_threshold: 0.6
min_score: 0.1
max_boxes: 100
all_reduce_fusion_config: []
use_float16: False
# learning rate settings
lr_init: 0.001
lr_end_rate: 0.001
warmup_epochs: 2
momentum: 0.9
weight_decay: 0.00015
ssd_vgg_bn: False
pretrain_vgg_bn: False
# network
num_default: [3, 6, 6, 6, 6, 6]
extras_in_channels: [256, 512, 1024, 512, 256, 256]
extras_out_channels: [512, 1024, 512, 256, 256, 256]
extras_strides: [1, 1, 2, 2, 2, 2]
extras_ratio: [0.2, 0.2, 0.2, 0.25, 0.5, 0.25]
feature_size: [38, 19, 10, 5, 3, 1]
min_scale: 0.2
max_scale: 0.95
aspect_ratios: [[], [2, 3], [2, 3], [2, 3], [2, 3], [2, 3]]
steps: [8, 16, 32, 64, 100, 300]
prior_scaling: [0.1, 0.2]
gamma: 2.0
alpha: 0.75
dataset: "coco"
lr: 0.05
mode_sink: "sink"
device_id: 0
device_num: 1
epoch_size: 500
batch_size: 32
loss_scale: 1024
pre_trained: ""
pre_trained_epoch_size: 0
save_checkpoint_epochs: 10
only_create_dataset: False
eval_start_epoch: 40
eval_interval: 1
run_eval: False
filter_weight: False
freeze_layer: None
save_best_ckpt: True
result_path: ""
img_path: ""
drop: False
# `mindrecord_dir` and `coco_root` are better to use absolute path.
feature_extractor_base_param: ""
checkpoint_filter_list: ['multi_loc_layers', 'multi_cls_layers']
mindrecord_dir: "MindRecord_COCO"
coco_root: "coco_ori"
train_data_type: "train2017"
val_data_type: "val2017"
instances_set: "annotations/instances_{}.json"
classes: ['background', 'person', 'bicycle', 'car', 'motorcycle', 'airplane', 'bus',
'train', 'truck', 'boat', 'traffic light', 'fire hydrant',
'stop sign', 'parking meter', 'bench', 'bird', 'cat', 'dog',
'horse', 'sheep', 'cow', 'elephant', 'bear', 'zebra',
'giraffe', 'backpack', 'umbrella', 'handbag', 'tie',
'suitcase', 'frisbee', 'skis', 'snowboard', 'sports ball',
'kite', 'baseball bat', 'baseball glove', 'skateboard',
'surfboard', 'tennis racket', 'bottle', 'wine glass', 'cup',
'fork', 'knife', 'spoon', 'bowl', 'banana', 'apple',
'sandwich', 'orange', 'broccoli', 'carrot', 'hot dog', 'pizza',
'donut', 'cake', 'chair', 'couch', 'potted plant', 'bed',
'dining table', 'toilet', 'tv', 'laptop', 'mouse', 'remote',
'keyboard', 'cell phone', 'microwave', 'oven', 'toaster', 'sink',
'refrigerator', 'book', 'clock', 'vase', 'scissors',
'teddy bear', 'hair drier', 'toothbrush']
num_classes: 81
# The annotation.json position of voc validation dataset.
voc_json: "annotations/voc_instances_val.json"
# voc original dataset.
voc_root: "/data/voc_dataset"
# if coco or voc used, `image_dir` and `anno_path` are useless.
image_dir: ""
anno_path: ""
file_name: "ssd"
file_format: "AIR"
---
# Help description for each configuration
enable_modelarts: "Whether training on modelarts, default: False"
data_url: "Dataset url for obs"
train_url: "Training output url for obs"
checkpoint_url: "The location of checkpoint for obs"
data_path: "Dataset path for local"
output_path: "Training output path for local"
load_path: "The location of checkpoint for obs"
device_target: "Target device type, available: [Ascend, GPU, CPU]"
enable_profiling: "Whether enable profiling while training, default: False"
num_classes: "Class for dataset"
batch_size: "Batch size for training and evaluation"
epoch_size: "Total training epochs."
keep_checkpoint_max: "keep the last keep_checkpoint_max checkpoint"
checkpoint_path: "The location of the checkpoint file."
checkpoint_file_path: "The location of the checkpoint file."

View File

@ -50,7 +50,7 @@ do
rm -rf LOG$i
mkdir ./LOG$i
cp ./*.py ./LOG$i
cp ./*.yaml ./LOG$i
cp ./config/*.yaml ./LOG$i
cp -r ./src ./LOG$i
cd ./LOG$i || exit
export RANK_ID=$i

View File

@ -1,5 +1,5 @@
#!/bin/bash
# Copyright 2020 Huawei Technologies Co., Ltd
# Copyright 2020-2021 Huawei Technologies Co., Ltd
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
@ -16,14 +16,14 @@
echo "=============================================================================================================="
echo "Please run the script as: "
echo "sh run_distribute_train_gpu.sh DEVICE_NUM EPOCH_SIZE LR DATASET CONFIG_PATH PRE_TRAINED PRE_TRAINED_EPOCH_SIZE"
echo "for example: sh run_distribute_train_gpu.sh 8 500 0.2 coco /config_path /opt/ssd-300.ckpt(optional) 200(optional)"
echo "bash run_distribute_train_gpu.sh DEVICE_NUM EPOCH_SIZE LR DATASET CONFIG_PATH PRE_TRAINED PRE_TRAINED_EPOCH_SIZE"
echo "for example: bash run_distribute_train_gpu.sh 8 500 0.2 coco /config_path /opt/ssd-300.ckpt(optional) 200(optional)"
echo "It is better to use absolute path."
echo "================================================================================================================="
if [ $# != 5 ] && [ $# != 7 ]
then
echo "Usage: sh run_distribute_train_gpu.sh [DEVICE_NUM] [EPOCH_SIZE] [LR] [DATASET] \
echo "Usage: bash run_distribute_train_gpu.sh [DEVICE_NUM] [EPOCH_SIZE] [LR] [DATASET] \
[CONFIG_PATH] [PRE_TRAINED](optional) [PRE_TRAINED_EPOCH_SIZE](optional)"
exit 1
fi
@ -46,7 +46,7 @@ PRE_TRAINED_EPOCH_SIZE=$7
rm -rf LOG
mkdir ./LOG
cp ./*.py ./LOG
cp ./*.yaml ./LOG
cp ./config/*.yaml ./LOG
cp -r ./src ./LOG
cd ./LOG || exit
@ -79,5 +79,5 @@ then
--device_target="GPU" \
--epoch_size=$EPOCH_SIZE \
--config_path=$CONFIG_PATH \
--output_path './output' > log.txt 2>&1 &
--output_path './output' > log.txt 2>&1 &
fi

View File

@ -56,7 +56,7 @@ fi
mkdir ./eval$3
cp ./*.py ./eval$3
cp ./*.yaml ./eval$3
cp ./config/*.yaml ./eval$3
cp -r ./src ./eval$3
cd ./eval$3 || exit
env > env.log

View File

@ -56,7 +56,7 @@ fi
mkdir ./eval$3
cp ./*.py ./eval$3
cp ./*.yaml ./eval$3
cp ./config/*.yaml ./eval$3
cp -r ./src ./eval$3
cd ./eval$3 || exit
env > env.log

View File

@ -118,7 +118,7 @@ def get_config():
parser = argparse.ArgumentParser(description="default name", add_help=False)
current_dir = os.path.dirname(os.path.abspath(__file__))
parser.add_argument("--config_path", type=str, default=os.path.join(current_dir, \
"../../ssd300_config.yaml"), help="Config file path")
"../../config/ssd300_config.yaml"), help="Config file path")
path_args, _ = parser.parse_known_args()
default, helper, choices = parse_yaml(path_args.config_path)
pprint(default)
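Because the default now resolves to `config/ssd300_config.yaml`, other configurations are selected by overriding `--config_path`; a hypothetical example, run from the model directory:

```shell
# train with the GPU FPN configuration instead of the default ssd300 config
python train.py --config_path=config/ssd_mobilenet_v1_fpn_config_gpu.yaml
```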

View File

@ -1,4 +1,4 @@
# Copyright 2020 Huawei Technologies Co., Ltd
# Copyright 2020-2021 Huawei Technologies Co., Ltd
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
@ -84,14 +84,6 @@ def set_graph_kernel_context(device_target, model):
context.set_context(enable_graph_kernel=True,
graph_kernel_flags="--enable_parallel_fusion --enable_expand_ops=Conv2D")
def set_parameter(model_name):
if model_name == "ssd_resnet50_fpn":
context.set_auto_parallel_context(all_reduce_fusion_config=[90, 183, 279])
if model_name == "ssd_vgg16":
context.set_auto_parallel_context(all_reduce_fusion_config=[20, 41, 62])
else:
context.set_auto_parallel_context(all_reduce_fusion_config=[29, 58, 89])
@moxing_wrapper()
def train_net():
if hasattr(config, 'num_ssd_boxes') and config.num_ssd_boxes == -1:
@ -114,7 +106,8 @@ def train_net():
context.set_auto_parallel_context(parallel_mode=ParallelMode.DATA_PARALLEL, gradients_mean=True,
device_num=device_num)
init()
set_parameter(model_name=config.model_name)
if config.all_reduce_fusion_config:
context.set_auto_parallel_context(all_reduce_fusion_config=config.all_reduce_fusion_config)
rank = get_rank()
mindrecord_file = create_mindrecord(config.dataset, "ssd.mindrecord", True)
@ -134,7 +127,7 @@ def train_net():
dataset_size = dataset.get_dataset_size()
print(f"Create dataset done! dataset size is {dataset_size}")
ssd = ssd_model_build()
if (hasattr(config, 'use_float16') and config.use_float16) or config.device_target == "GPU":
if (hasattr(config, 'use_float16') and config.use_float16):
ssd.to_float(dtype.float16)
net = SSDWithLossCell(ssd, config)
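With the hardcoded `set_parameter` branches gone, the AllReduce fusion split points now come entirely from the yaml; a quick sketch to see which values each configuration carries (assumes the `config/` directory introduced by this change and a POSIX shell):

```shell
grep -n all_reduce_fusion_config config/*.yaml
```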

View File

@ -334,7 +334,7 @@ mAP: 0.24270569394180577
| -------------------------- | -------------------------------------------------------------| -------------------------------------------------|
| Model Version | SSD ghostnet |SSD ghostnet |
| Resource | Ascend 910; CPU 2.60GHz, 192cores; Memory 755G; OS Euler2.8 |NV SMX2 V100-32G |
| MindSpore Version | 1.3.0 |07/19/2021 (month/day/year) |
| MindSpore Version | 1.3.0 |1.3.0 |
| Dataset | COCO2017 |COCO2017 |
| Training Parameters | epoch = 500, batch_size = 32 | epoch = 500, batch_size = 32 |
| Optimizer | Momentum |Momentum |