Network consolidation

huchunmei 2021-09-02 15:20:15 +08:00
parent 032c8e28db
commit 4f4aefc3e5
45 changed files with 450 additions and 2571 deletions


@ -56,7 +56,7 @@
ResNet (residual neural network) was proposed by Kaiming He and his colleagues at Microsoft Research. Using ResNet units, they successfully trained a 152-layer neural network and won ILSVRC 2015 with a top-5 error rate of 3.57%, while using fewer parameters than VGGNet. Traditional convolutional or fully connected networks lose some information as it passes through their layers. They also suffer from vanishing or exploding gradients, which can cause the training of deep networks to fail. ResNet mitigates this problem to a certain extent: by passing the input directly to the output through shortcut connections, it preserves the integrity of the information. The network then only needs to learn the difference (the residual) between input and output, which simplifies the learning objective. This structure greatly speeds up the training of neural networks and substantially improves model accuracy. ResNet is also very popular, and can even be used directly in the ConceptNet network.
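The residual idea described above can be illustrated with a minimal, framework-free sketch. It uses plain Python lists as vectors, and `residual_fn` is a hypothetical stand-in for the stacked convolution layers of a real block. This is not the MindSpore implementation in `src/resnet.py`.

```python
from typing import Callable, List

Vector = List[float]


def residual_block(x: Vector, residual_fn: Callable[[Vector], Vector]) -> Vector:
    """Return F(x) + x, where F is the (hypothetical) learned residual mapping."""
    fx = residual_fn(x)
    if len(fx) != len(x):
        raise ValueError("identity shortcut requires F(x) and x to have the same shape")
    # The identity shortcut adds the input back element-wise, so the block
    # only has to learn the difference between its output and its input.
    return [a + b for a, b in zip(fx, x)]


def zero_residual(v: Vector) -> Vector:
    # If the residual branch outputs zeros, the block is an identity mapping.
    return [0.0] * len(v)


def relu_scale(v: Vector) -> Vector:
    # A toy residual branch: ReLU followed by scaling by 0.5.
    return [0.5 * max(0.0, a) for a in v]


print(residual_block([1.0, -2.0, 3.0], zero_residual))  # [1.0, -2.0, 3.0]
print(residual_block([1.0, -2.0, 3.0], relu_scale))     # [1.5, -2.0, 4.5]
```

Because the shortcut is an identity, a block whose residual branch outputs zeros passes its input through unchanged. This is one common intuition for why very deep stacks of such blocks remain trainable.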
These are examples of training ResNet18/ResNet50/ResNet101/SE-ResNet50 with the CIFAR-10/ImageNet2012 datasets in MindSpore. ResNet50 and ResNet101 are described in [paper 1](https://arxiv.org/pdf/1512.03385.pdf) below, and SE-ResNet50, a variant of ResNet50, is described in [paper 2](https://arxiv.org/abs/1709.01507) and [paper 3](https://arxiv.org/abs/1812.01187) below. Trained for just 24 epochs on 8 Ascend 910 devices, SE-ResNet50 reaches a top-1 accuracy of 75.9%. (Training ResNet101 or SE-ResNet50 on CIFAR-10 is not supported yet.)
These are examples of training ResNet18/ResNet50/ResNet101/ResNet152/SE-ResNet50 with the CIFAR-10/ImageNet2012 datasets in MindSpore. ResNet50, ResNet101 and ResNet152 are described in [paper 1](https://arxiv.org/pdf/1512.03385.pdf) below, and SE-ResNet50, a variant of ResNet50, is described in [paper 2](https://arxiv.org/abs/1709.01507) and [paper 3](https://arxiv.org/abs/1812.01187) below. Trained for just 24 epochs on 8 Ascend 910 devices, SE-ResNet50 reaches a top-1 accuracy of 75.9%. (Training ResNet101 or SE-ResNet50 on CIFAR-10 is not supported yet.)
## Paper
@ -214,6 +214,7 @@ If you want to run in modelarts, please check the official documentation of [mod
├── resnet50_imagenet2012_config.yaml
├── resnet50_imagenet2012_GPU_Thor_config.yaml
├── resnet101_imagenet2012_config.yaml
├── resnet152_imagenet2012_config.yaml
└── se-resnet50_imagenet2012_config.yaml
├── scripts
├── run_distribute_train.sh # launch ascend distributed training(8 pcs)
@ -334,6 +335,27 @@ Parameters for both training and evaluation can be set in config file.
"lr": 0.1 # base learning rate
```
- Config for ResNet152, ImageNet2012 dataset
```bash
"class_num": 1001, # dataset class number
"batch_size": 32, # batch size of input tensor
"loss_scale": 1024, # loss scale
"momentum": 0.9, # momentum optimizer
"weight_decay": 1e-4, # weight decay
"epoch_size": 140, # epoch size for training
"save_checkpoint": True, # whether to save checkpoints
"save_checkpoint_path":"./", # the save path of the checkpoint relative to the execution path
"save_checkpoint_epochs": 5, # the epoch interval between two checkpoints. By default, the last checkpoint will be saved after the last epoch
"keep_checkpoint_max": 10, # only keep the last keep_checkpoint_max checkpoints
"warmup_epochs": 0, # number of warmup epochs
"lr_decay_mode": "step", # decay mode for generating learning rate
"use_label_smooth": True, # whether to use label smoothing
"label_smooth_factor": 0.1, # label smoothing factor
"lr": 0.1, # base learning rate
"lr_end": 0.0001, # end learning rate
```
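`use_label_smooth` and `label_smooth_factor` above control label smoothing. The sketch below shows one common formulation. It assumes the true class keeps `1 - factor` of the probability mass and the remaining `factor` is spread evenly over the other `class_num - 1` classes. The exact rule used by the training scripts is defined by their loss implementation, which is not part of this diff.

```python
from typing import List


def smooth_one_hot(label: int, class_num: int, factor: float) -> List[float]:
    """Build a label-smoothed target vector (sketch; formulation assumed)."""
    if not 0 <= label < class_num:
        raise ValueError("label out of range")
    off_value = factor / (class_num - 1)  # mass given to each wrong class
    on_value = 1.0 - factor               # mass kept on the true class
    target = [off_value] * class_num
    target[label] = on_value
    return target


# Values from the ResNet152 config: class_num=1001, label_smooth_factor=0.1
t = smooth_one_hot(label=3, class_num=1001, factor=0.1)
print(t[3])              # 0.9
print(round(sum(t), 6))  # 1.0
```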
- Config for SE-ResNet50, ImageNet2012 dataset
```bash
@ -516,6 +538,18 @@ epoch: 5 step: 5004, loss is 3.1718972
...
```
- Training ResNet152 with ImageNet2012 dataset
```bash
# distributed training result (8P)
epoch: 1 step: 5004, loss is 4.184874
epoch: 2 step: 5004, loss is 4.013571
epoch: 3 step: 5004, loss is 3.695777
epoch: 4 step: 5004, loss is 3.3244863
epoch: 5 step: 5004, loss is 3.4899402
...
```
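To extract the loss curve from logs like the one above, a small parser is enough. This sketch assumes every relevant line follows the `epoch: N step: M, loss is X` pattern shown here. Other lines, such as the `...` placeholder, are skipped. The regular expression is an illustration and has not been checked against every log format the scripts can emit.

```python
import re
from typing import List, Tuple

# Matches lines such as "epoch: 1 step: 5004, loss is 4.184874"
LOG_PATTERN = re.compile(r"epoch:\s*(\d+)\s+step:\s*(\d+),\s*loss is\s*([0-9.]+)")


def parse_losses(log_text: str) -> List[Tuple[int, int, float]]:
    """Return (epoch, step, loss) tuples for every matching log line."""
    records = []
    for line in log_text.splitlines():
        match = LOG_PATTERN.search(line)
        if match:
            epoch, step, loss = match.groups()
            records.append((int(epoch), int(step), float(loss)))
    return records


sample = """epoch: 1 step: 5004, loss is 4.184874
epoch: 2 step: 5004, loss is 4.013571
..."""
print(parse_losses(sample))  # [(1, 5004, 4.184874), (2, 5004, 4.013571)]
```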
- Training SE-ResNet50 with ImageNet2012 dataset
```bash
@ -604,6 +638,12 @@ result: {'top_1_accuracy': 0.736758814102564} ckpt=train_parallel0/resnet-90_625
result: {'top_5_accuracy': 0.9429417413572343, 'top_1_accuracy': 0.7853513124199744} ckpt=train_parallel0/resnet-120_5004.ckpt
```
- Evaluating ResNet152 with ImageNet2012 dataset
```bash
result: {'top_5_accuracy': 0.9438420294494239, 'top_1_accuracy': 0.78817221518} ckpt= resnet152-140_5004.ckpt
```
- Evaluating SE-ResNet50 with ImageNet2012 dataset
```bash
@ -655,10 +695,10 @@ Current batch_Size can only be set to 1. The precision calculation process needs
```shell
# Ascend310 inference
bash run_infer_310.sh [MINDIR_PATH] [NET_TYPE] [DATASET] [DATA_PATH] [DEVICE_ID]
bash run_infer_310.sh [MINDIR_PATH] [NET_TYPE] [DATASET] [DATA_PATH] [CONFIG_PATH] [DEVICE_ID]
```
- `NET_TYPE` can choose from [resnet18, se-resnet50, resnet34, resnet50, resnet101].
- `NET_TYPE` can choose from [resnet18, se-resnet50, resnet34, resnet50, resnet101, resnet152].
- `DATASET` can choose from [cifar10, imagenet].
- `DEVICE_ID` is optional, default value is 0.
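The argument rules documented above can be mirrored in a short Python sketch. This may help when wrapping the script in other tooling. The sketch assumes the positional order from the usage line, including the `[CONFIG_PATH]` argument added in this commit. `parse_infer_args` and the returned dictionary keys are hypothetical names, the paths in the example are placeholders, and `run_infer_310.sh` itself remains the reference.

```python
from typing import Dict, List, Union

NET_TYPES = {"resnet18", "resnet34", "se-resnet50", "resnet50", "resnet101", "resnet152"}
DATASETS = {"cifar10", "imagenet"}


def parse_infer_args(argv: List[str]) -> Dict[str, Union[str, int]]:
    """Validate [MINDIR_PATH] [NET_TYPE] [DATASET] [DATA_PATH] [CONFIG_PATH] [DEVICE_ID]."""
    if not 5 <= len(argv) <= 6:
        raise ValueError("expected 5 or 6 arguments")
    mindir_path, net_type, dataset, data_path, config_path = argv[:5]
    if net_type not in NET_TYPES:
        raise ValueError(f"NET_TYPE can choose from {sorted(NET_TYPES)}")
    if dataset not in DATASETS:
        raise ValueError(f"DATASET can choose from {sorted(DATASETS)}")
    # DEVICE_ID is optional and defaults to 0, as documented above.
    device_id = int(argv[5]) if len(argv) == 6 else 0
    return {
        "mindir_path": mindir_path,
        "net_type": net_type,
        "dataset": dataset,
        "data_path": data_path,
        "config_path": config_path,
        "device_id": device_id,
    }


args = parse_infer_args(["resnet152.mindir", "resnet152", "imagenet",
                         "/path/to/val", "resnet152_imagenet2012_config.yaml"])
print(args["device_id"])  # 0
```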
@ -693,7 +733,7 @@ Total data: 10000, top1 accuracy: 0.9310, top5 accuracy: 0.9980.
- Evaluating ResNet50 with ImageNet2012 dataset
```bash
Total data: 50000, top1 accuracy: 0.0.7696, top5 accuracy: 0.93432.
Total data: 50000, top1 accuracy: 0.7696, top5 accuracy: 0.93432.
```
- Evaluating ResNet101 with ImageNet2012 dataset
@ -702,6 +742,12 @@ Total data: 50000, top1 accuracy: 0.0.7696, top5 accuracy: 0.93432.
Total data: 50000, top1 accuracy: 0.7871, top5 accuracy: 0.94354.
```
- Evaluating ResNet152 with ImageNet2012 dataset
```bash
Total data: 50000, top1 accuracy: 0.78625, top5 accuracy: 0.94358.
```
- Evaluating SE-ResNet50 with ImageNet2012 dataset
```bash
@ -834,6 +880,26 @@ Total data: 50000, top1 accuracy: 0.76844, top5 accuracy: 0.93522.
| Checkpoint for Fine tuning | 343M (.ckpt file) |343M (.ckpt file) |
| Scripts | [Link](https://gitee.com/mindspore/mindspore/tree/master/model_zoo/official/cv/resnet) | [Link](https://gitee.com/mindspore/mindspore/tree/master/model_zoo/official/cv/resnet) |
#### ResNet152 on ImageNet2012
| Parameters | Ascend 910 |
|---|---|
| Model Version | ResNet152 |
| Resource | Ascend 910; CPU 2.60GHz, 192cores; Memory 755G; OS Euler2.8 |
| Uploaded Date | 02/10/2021 (month/day/year) |
| MindSpore Version | 1.0.1 |
| Dataset | ImageNet2012 |
| Training Parameters | epoch=140, steps per epoch=5004, batch_size = 32 |
| Optimizer | Momentum |
| Loss Function | Softmax Cross Entropy |
| outputs | probability |
| Loss | 1.7375104 |
| Speed | 47.47 ms/step (8pcs) |
| Total time | 577 mins |
| Parameters (M) | 60.19 |
| Checkpoint for Fine tuning | 462M (.ckpt file) |
| Scripts | [Link](https://gitee.com/panpanrui/mindspore/tree/master/model_zoo/official/cv/resnet152) |
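The figures in this table can be cross-checked with simple arithmetic. The sketch assumes that `batch_size = 32` is per device and that all 8 devices step together. This is consistent with 5004 steps × 256 images ≈ 1.28 million ImageNet training images per epoch.

```python
# Figures taken from the ResNet152 training table above.
step_time_s = 47.47e-3   # seconds per step
devices = 8
batch_per_device = 32    # assumption: batch_size is per device
steps_per_epoch = 5004
epochs = 140

global_batch = devices * batch_per_device            # 256 images per step
images_per_epoch = global_batch * steps_per_epoch    # images seen per epoch
throughput = global_batch / step_time_s              # images per second
step_minutes = steps_per_epoch * epochs * step_time_s / 60

print(images_per_epoch)     # 1281024
print(round(throughput))    # 5393
print(round(step_minutes))  # 554
```

Pure step time accounts for roughly 554 of the reported 577 minutes. The source does not break down the remaining time.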
#### SE-ResNet50 on ImageNet2012
| Parameters | Ascend 910
@ -940,6 +1006,20 @@ Total data: 50000, top1 accuracy: 0.76844, top5 accuracy: 0.93522.
| Accuracy | 78.53% | 78.64% |
| Model for inference | 171M (.air file) | |
#### ResNet152 on ImageNet2012
| Parameters | Ascend |
| ------------------- | --------------------------- |
| Model Version | ResNet152 |
| Resource | Ascend 910; OS Euler2.8 |
| Uploaded Date | 09/01/2021 (month/day/year) |
| MindSpore Version | 1.4.0 |
| Dataset | ImageNet2012 |
| batch_size | 32 |
| outputs | probability |
| Accuracy | 78.60% |
| Model for inference | 236M (.air file) |
#### SE-ResNet50 on ImageNet2012
| Parameters | Ascend |


@ -52,7 +52,7 @@
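ResNet (residual neural network) was proposed by Kaiming He and his colleagues at Microsoft Research. Using ResNet units, they successfully trained a 152-layer neural network and won ILSVRC 2015. ResNet's top-5 error rate was 3.57% and it has fewer parameters than VGGNet, so its results are remarkable. Traditional convolutional or fully connected networks suffer from some degree of information loss and can also cause vanishing or exploding gradients, which makes deep networks fail to train; ResNet solves this problem to a certain extent. By passing the input directly to the output, it preserves the integrity of the information. The whole network only needs to learn the difference between input and output, which simplifies the learning objective and its difficulty. The ResNet structure greatly speeds up neural network training and greatly improves model accuracy. For this reason, ResNet is very popular and can even be used directly in the ConceptNet network.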
The following are examples of training ResNet18/ResNet50/ResNet101/SE-ResNet50 with the CIFAR-10/ImageNet2012 datasets in MindSpore. For ResNet50 and ResNet101, see [Paper 1](https://arxiv.org/pdf/1512.03385.pdf); SE-ResNet50 is a variant of ResNet50, see [Paper 2](https://arxiv.org/abs/1709.01507) and [Paper 3](https://arxiv.org/abs/1812.01187). Trained on 8 Ascend 910 devices, SE-ResNet50 reaches a TOP1 accuracy of 75.9% in only 24 epochs. (Training ResNet101 or SE-ResNet50 on the CIFAR-10 dataset is not supported yet.)
The following are examples of training ResNet18/ResNet50/ResNet101/ResNet152/SE-ResNet50 with the CIFAR-10/ImageNet2012 datasets in MindSpore. For ResNet50 and ResNet101, see [Paper 1](https://arxiv.org/pdf/1512.03385.pdf); SE-ResNet50 is a variant of ResNet50, see [Paper 2](https://arxiv.org/abs/1709.01507) and [Paper 3](https://arxiv.org/abs/1812.01187). Trained on 8 Ascend 910 devices, SE-ResNet50 reaches a TOP1 accuracy of 75.9% in only 24 epochs. (Training ResNet101 or SE-ResNet50 on the CIFAR-10 dataset is not supported yet.)
## Paper
@ -200,6 +200,7 @@ bash run_eval_gpu.sh [DATASET_PATH] [CHECKPOINT_PATH] [CONFIG_PATH]
├── resnet50_imagenet2012_config.yaml
├── resnet50_imagenet2012_GPU_Thor_config.yaml
├── resnet101_imagenet2012_config.yaml
├── resnet152_imagenet2012_config.yaml
├── se-resnet50_imagenet2012_config.yaml
├── scripts
├── run_distribute_train.sh # launch Ascend distributed training (8 pcs)
@ -314,6 +315,27 @@ bash run_eval_gpu.sh [DATASET_PATH] [CHECKPOINT_PATH] [CONFIG_PATH]
"lr":0.1 # base learning rate
```
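- Config for ResNet152, ImageNet2012 dataset
```text
"class_num":1001, # number of dataset classes
"batch_size":32, # batch size of input tensor
"loss_scale":1024, # loss scale
"momentum":0.9, # momentum optimizer
"weight_decay":1e-4, # weight decay
"epoch_size":140, # number of training epochs
"save_checkpoint":True, # whether to save checkpoints
"save_checkpoint_epochs":5, # epoch interval between two checkpoints; by default, the last checkpoint is saved after the last epoch
"keep_checkpoint_max":10, # only keep the last keep_checkpoint_max checkpoints
"save_checkpoint_path":"./", # checkpoint save path relative to the execution path
"warmup_epochs":0, # number of warmup epochs
"lr_decay_mode":"step", # decay mode for generating the learning rate
"use_label_smooth":True, # whether to use label smoothing
"label_smooth_factor":0.1, # label smoothing factor
"lr":0.1, # base learning rate
"lr_end":0.0001, # end learning rate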
```
- Config for SE-ResNet50, ImageNet2012 dataset
```text
@ -490,6 +512,18 @@ epoch:70 step:5004, loss is 1.8717369
...
```
- Training ResNet152 with ImageNet2012 dataset
```text
# distributed training result (8P)
epoch: 1 step: 5004, loss is 4.184874
epoch: 2 step: 5004, loss is 4.013571
epoch: 3 step: 5004, loss is 3.695777
epoch: 4 step: 5004, loss is 3.3244863
epoch: 5 step: 5004, loss is 3.4899402
...
```
- Training SE-ResNet50 with ImageNet2012 dataset
```text
@ -544,31 +578,37 @@ result: {'acc': 0.7053685897435897} ckpt=train_parallel0/resnet-90_5004.ckpt
- Evaluating ResNet50 with CIFAR-10 dataset
```text
```bash
result:{'acc':0.91446314102564111} ckpt=~/resnet50_cifar10/train_parallel0/resnet-90_195.ckpt
```
- Evaluating ResNet50 with ImageNet2012 dataset
```text
```bash
result:{'acc':0.7671054737516005} ckpt=train_parallel0/resnet-90_5004.ckpt
```
- Evaluating ResNet34 with ImageNet2012 dataset
```text
```bash
result: {'top_1_accuracy': 0.736758814102564} ckpt=train_parallel0/resnet-90_625 .ckpt
```
- Evaluating ResNet101 with ImageNet2012 dataset
```text
```bash
result:{'top_5_accuracy':0.9429417413572343, 'top_1_accuracy':0.7853513124199744} ckpt=train_parallel0/resnet-120_5004.ckpt
```
- Evaluating ResNet152 with ImageNet2012 dataset
```bash
result: {'top_5_accuracy': 0.9438420294494239, 'top_1_accuracy': 0.78817221518} ckpt= resnet152-140_5004.ckpt
```
- Evaluating SE-ResNet50 with ImageNet2012 dataset
```text
```bash
result:{'top_5_accuracy':0.9342589628681178, 'top_1_accuracy':0.768065781049936} ckpt=train_parallel0/resnet-24_5004.ckpt
```
@ -615,10 +655,10 @@ Export MindIR on ModelArts
```shell
# Ascend310 inference
bash run_infer_310.sh [MINDIR_PATH] [NET_TYPE] [DATASET] [DATA_PATH] [DEVICE_ID]
bash run_infer_310.sh [MINDIR_PATH] [NET_TYPE] [DATASET] [DATA_PATH] [CONFIG_PATH] [DEVICE_ID]
```
- `NET_TYPE` can choose from [resnet18, resnet34, se-resnet50, resnet50, resnet101].
- `NET_TYPE` can choose from [resnet18, resnet34, se-resnet50, resnet50, resnet101, resnet152].
- `DATASET` can choose from [cifar10, imagenet].
- `DEVICE_ID` is optional, default value is 0.
@ -640,31 +680,37 @@ Total data: 50000, top1 accuracy: 0.70668, top5 accuracy: 0.89698.
- Evaluating ResNet50 with CIFAR-10 dataset
```text
```bash
Total data: 10000, top1 accuracy: 0.9310, top5 accuracy: 0.9980.
```
- Evaluating ResNet50 with ImageNet2012 dataset
```text
```bash
Total data: 50000, top1 accuracy: 0.7696, top5 accuracy: 0.93432.
```
- Evaluating ResNet34 with ImageNet2012 dataset
```text
```bash
Total data: 50000, top1 accuracy: 0.7367.
```
- Evaluating ResNet101 with ImageNet2012 dataset
```text
```bash
Total data: 50000, top1 accuracy: 0.7871, top5 accuracy: 0.94354.
```
- Evaluating ResNet152 with ImageNet2012 dataset
```bash
Total data: 50000, top1 accuracy: 0.78625, top5 accuracy: 0.94358.
```
- Evaluating SE-ResNet50 with ImageNet2012 dataset
```text
```bash
Total data: 50000, top1 accuracy: 0.76844, top5 accuracy: 0.93522.
```
@ -794,6 +840,26 @@ Total data: 50000, top1 accuracy: 0.76844, top5 accuracy: 0.93522.
| Checkpoint for Fine tuning | 343M (.ckpt file) | 343M (.ckpt file) |
| Scripts | [Link](https://gitee.com/mindspore/mindspore/tree/master/model_zoo/official/cv/resnet) | [Link](https://gitee.com/mindspore/mindspore/tree/master/model_zoo/official/cv/resnet) |
#### ResNet152 on ImageNet2012
| Parameters | Ascend 910 |
|---|---|
| Model Version | ResNet152 |
| Resource | Ascend 910; CPU 2.60GHz, 192 cores; Memory 755G; OS Euler2.8 |
| Uploaded Date | 2021-02-10 |
| MindSpore Version | 1.0.1 |
| Dataset | ImageNet2012 |
| Training Parameters | epoch=140, steps per epoch=5004, batch_size = 32 |
| Optimizer | Momentum |
| Loss Function | Softmax Cross Entropy |
| outputs | probability |
| Loss | 1.7375104 |
| Speed | 47.47 ms/step (8pcs) |
| Total time | 577 mins |
| Parameters (M) | 60.19 |
| Checkpoint for Fine tuning | 462M (.ckpt file) |
| Scripts | [Link](https://gitee.com/panpanrui/mindspore/tree/master/model_zoo/official/cv/resnet152) |
#### SE-ResNet50 on ImageNet2012
| Parameters | Ascend 910


@ -19,7 +19,6 @@ checkpoint_file_path: ""
optimizer: "Momentum"
infer_label: ""
class_num: 1001
train_image_size: 224
batch_size: 32
loss_scale: 1024
momentum: 0.9
@ -42,7 +41,6 @@ dataset: "imagenet2012"
device_num: 1
pre_trained: ""
run_eval: False
eval_image_size: 224
eval_dataset_path: ""
parameter_server: False
filter_weight: False
@ -59,16 +57,22 @@ all_reduce_fusion_config:
- 2
- 60
- 220
train_image_size: 224
eval_image_size: 224
# Export options
device_id: 0
width: 224
height: 224
file_name: "resnet101"
file_format: "AIR"
file_format: "MINDIR"
ckpt_file: ""
network_dataset: "resnet101_imagenet2012"
# postprocess resnet inference
result_path: ''
label_path: ''
---
# Help description for each configuration
enable_modelarts: "Whether training on modelarts, default: False"


@ -0,0 +1,95 @@
# Builtin Configurations(DO NOT CHANGE THESE CONFIGURATIONS unless you know exactly what you are doing)
enable_modelarts: False
# Url for modelarts
data_url: ""
train_url: ""
checkpoint_url: ""
# Path for local
run_distribute: False
enable_profiling: False
data_path: "/cache/data"
output_path: "/cache/train"
load_path: "/cache/checkpoint_path/"
device_target: "Ascend"
checkpoint_path: "./checkpoint/"
checkpoint_file_path: ""
# ==============================================================================
# Training options
optimizer: "Momentum"
infer_label: ""
class_num: 1001
batch_size: 32
loss_scale: 1024
momentum: 0.9
weight_decay: 0.0001
epoch_size: 140
pretrain_epoch_size: 0
save_checkpoint: True
save_checkpoint_epochs: 5
keep_checkpoint_max: 10
save_checkpoint_path: "./"
warmup_epochs: 0
lr_decay_mode: "step"
use_label_smooth: True
label_smooth_factor: 0.1
lr_init: 0.0
lr_max: 0.1
lr_end: 0.0001
lars_epsilon: 0.0
lars_coefficient: 0.001
net_name: "resnet152"
dataset: "imagenet2012"
device_num: 1
pre_trained: ""
run_eval: False
eval_dataset_path: ""
parameter_server: False
filter_weight: False
save_best_ckpt: True
eval_start_epoch: 40
eval_interval: 1
enable_cache: False
cache_session_id: ""
mode_name: "GRAPH"
acc_mode: "O0"
conv_init: "XavierUniform"
dense_init: "TruncatedNormal"
all_reduce_fusion_config:
- 180
- 313
train_image_size: 224
eval_image_size: 224
# Export options
device_id: 0
width: 224
height: 224
file_name: "resnet152"
file_format: "MINDIR"
ckpt_file: ""
network_dataset: "resnet152_imagenet2012"
# postprocess resnet inference
result_path: ''
label_path: ''
---
# Help description for each configuration
enable_modelarts: "Whether training on modelarts, default: False"
data_url: "Dataset url for obs"
checkpoint_url: "The location of checkpoint for obs"
data_path: "Dataset path for local"
output_path: "Training output path for local"
load_path: "The location of checkpoint for obs"
device_target: "Target device type, available: [Ascend, GPU, CPU]"
enable_profiling: "Whether enable profiling while training, default: False"
num_classes: "Class for dataset"
batch_size: "Batch size for training and evaluation"
epoch_size: "Total training epochs."
checkpoint_path: "The location of the checkpoint file."
checkpoint_file_path: "The location of the checkpoint file."
result_path: "result files path."
label_path: "image file path."
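In this commit, `postprocess.py` drops its own `argparse` options and reads `config.dataset`, `config.result_path` and `config.label_path`. Meanwhile, `run_infer_310.sh` still passes `--dataset`, `--result_path` and `--label_path` on the command line. That combination works only if yaml keys can be overridden by flags of the same name. The sketch below illustrates that pattern with `argparse`, using a plain dict in place of the parsed yaml. It is an assumption about the mechanism, not the code of `src/model_utils/config.py`, and `build_config` is a hypothetical name.

```python
import argparse
from typing import Dict, List, Union

Scalar = Union[str, int, float]


def build_config(defaults: Dict[str, Scalar], argv: List[str]) -> argparse.Namespace:
    """Expose each yaml key as a --flag whose default is the yaml value.

    Sketch only: bool values are not supported, because type=bool would
    turn any non-empty string (including "False") into True.
    """
    parser = argparse.ArgumentParser(description="yaml defaults with CLI overrides")
    for key, value in defaults.items():
        parser.add_argument(f"--{key}", type=type(value), default=value)
    return parser.parse_args(argv)


# A few keys from resnet152_imagenet2012_config.yaml
yaml_defaults = {"dataset": "imagenet2012", "result_path": "", "label_path": "", "batch_size": 32}
cfg = build_config(yaml_defaults, ["--dataset=imagenet", "--result_path=./result_Files"])
print(cfg.dataset, cfg.result_path, cfg.batch_size)  # imagenet ./result_Files 32
```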


@ -19,7 +19,6 @@ checkpoint_file_path: ""
optimizer: "Momentum"
infer_label: ""
class_num: 10
train_image_size: 224
batch_size: 32
loss_scale: 1024
momentum: 0.9
@ -42,7 +41,6 @@ dataset: "cifar10"
device_num: 1
pre_trained: ""
run_eval: False
eval_image_size: 224
eval_dataset_path: ""
parameter_server: False
filter_weight: False
@ -55,16 +53,22 @@ mode_name: "GRAPH"
acc_mode: "O0"
conv_init: "XavierUniform"
dense_init: "TruncatedNormal"
train_image_size: 224
eval_image_size: 224
# Export options
device_id: 0
width: 224
height: 224
file_name: "resnet18"
file_format: "AIR"
file_format: "MINDIR"
ckpt_file: ""
network_dataset: "resnet18_cifar10"
# postprocess resnet inference
result_path: ''
label_path: ''
---
# Help description for each configuration
enable_modelarts: "Whether training on modelarts, default: False"


@ -19,7 +19,6 @@ checkpoint_file_path: ""
optimizer: "Momentum"
infer_label: ""
class_num: 10
train_image_size: 224
batch_size: 32
loss_scale: 1024
momentum: 0.9
@ -42,7 +41,6 @@ dataset: "cifar10"
device_num: 1
pre_trained: ""
run_eval: False
eval_image_size: 224
eval_dataset_path: ""
parameter_server: False
filter_weight: False
@ -55,16 +53,22 @@ mode_name: "GRAPH"
acc_mode: "O0"
conv_init: "XavierUniform"
dense_init: "TruncatedNormal"
train_image_size: 224
eval_image_size: 224
# Export options
device_id: 0
width: 224
height: 224
file_name: "resnet18"
file_format: "AIR"
file_format: "MINDIR"
ckpt_file: ""
network_dataset: "resnet18_cifar10"
# postprocess resnet inference
result_path: ''
label_path: ''
---
# Help description for each configuration
enable_modelarts: "Whether training on modelarts, default: False"


@ -19,7 +19,6 @@ checkpoint_file_path: ""
optimizer: "Momentum"
infer_label: ""
class_num: 1001
train_image_size: 224
batch_size: 256
loss_scale: 1024
momentum: 0.9
@ -44,7 +43,6 @@ dataset: "imagenet2012"
device_num: 1
pre_trained: ""
run_eval: False
eval_image_size: 224
eval_dataset_path: ""
parameter_server: False
filter_weight: False
@ -57,16 +55,22 @@ mode_name: "GRAPH"
acc_mode: "O0"
conv_init: "XavierUniform"
dense_init: "TruncatedNormal"
train_image_size: 224
eval_image_size: 224
# Export options
device_id: 0
width: 224
height: 224
file_name: "resnet18"
file_format: "AIR"
file_format: "MINDIR"
ckpt_file: ""
network_dataset: "resnet18_imagenet2012"
# postprocess resnet inference
result_path: ''
label_path: ''
---
# Help description for each configuration
enable_modelarts: "Whether training on modelarts, default: False"


@ -19,7 +19,6 @@ checkpoint_file_path: ""
optimizer: "Momentum"
infer_label: ""
class_num: 1001
train_image_size: 224
batch_size: 256
loss_scale: 1024
momentum: 0.9
@ -44,7 +43,6 @@ dataset: "imagenet2012"
device_num: 1
pre_trained: ""
run_eval: False
eval_image_size: 224
eval_dataset_path: ""
parameter_server: False
filter_weight: False
@ -57,16 +55,22 @@ mode_name: "GRAPH"
acc_mode: "O0"
conv_init: "XavierUniform"
dense_init: "TruncatedNormal"
train_image_size: 224
eval_image_size: 224
# Export options
device_id: 0
width: 224
height: 224
file_name: "resnet18"
file_format: "AIR"
file_format: "MINDIR"
ckpt_file: ""
network_dataset: "resnet18_imagenet2012"
# postprocess resnet inference
result_path: ''
label_path: ''
---
# Help description for each configuration
enable_modelarts: "Whether training on modelarts, default: False"


@ -19,7 +19,6 @@ checkpoint_file_path: ""
optimizer: "Momentum"
infer_label: ""
class_num: 1001
train_image_size: 224
batch_size: 256
loss_scale: 1024
momentum: 0.9
@ -44,7 +43,6 @@ dataset: "imagenet2012"
device_num: 1
pre_trained: ""
run_eval: False
eval_image_size: 224
eval_dataset_path: ""
parameter_server: False
filter_weight: False
@ -57,16 +55,22 @@ mode_name: "GRAPH"
acc_mode: "O0"
conv_init: "XavierUniform"
dense_init: "TruncatedNormal"
train_image_size: 224
eval_image_size: 224
# Export options
device_id: 0
width: 224
height: 224
file_name: "resnet34"
file_format: "AIR"
file_format: "MINDIR"
ckpt_file: ""
network_dataset: "resnet34_imagenet2012"
# postprocess resnet inference
result_path: ''
label_path: ''
---
# Help description for each configuration
enable_modelarts: "Whether training on modelarts, default: False"


@ -19,7 +19,6 @@ checkpoint_file_path: ""
optimizer: "Momentum"
infer_label: ""
class_num: 10
train_image_size: 224
batch_size: 32
loss_scale: 1024
momentum: 0.9
@ -42,7 +41,6 @@ dataset: "cifar10"
device_num: 1
pre_trained: ""
run_eval: False
eval_image_size: 224
eval_dataset_path: ""
parameter_server: False
filter_weight: False
@ -58,16 +56,22 @@ dense_init: "TruncatedNormal"
all_reduce_fusion_config:
- 2
- 115
train_image_size: 224
eval_image_size: 224
# Export options
device_id: 0
width: 224
height: 224
file_name: "resnet50"
file_format: "AIR"
file_format: "MINDIR"
ckpt_file: ""
network_dataset: "resnet50_cifar10"
# postprocess resnet inference
result_path: ''
label_path: ''
---
# Help description for each configuration
enable_modelarts: "Whether training on modelarts, default: False"


@ -19,7 +19,6 @@ checkpoint_file_path: ""
optimizer: "LARS"
infer_label: ""
class_num: 1001
train_image_size: 192
batch_size: 256
loss_scale: 1024
momentum: 0.85
@ -44,7 +43,6 @@ dataset: "imagenet2012"
device_num: 1
pre_trained: ""
run_eval: False
eval_image_size: 224
eval_dataset_path: ""
parameter_server: False
filter_weight: False
@ -60,16 +58,22 @@ dense_init: "RandomNormal"
all_reduce_fusion_config:
- 85
- 160
train_image_size: 192
eval_image_size: 224
# Export options
device_id: 0
width: 224
height: 224
file_name: "resnet50"
file_format: "AIR"
file_format: "MINDIR"
ckpt_file: ""
network_dataset: "resnet50_imagenet2012"
# postprocess resnet inference
result_path: ''
label_path: ''
---
# Help description for each configuration
enable_modelarts: "Whether training on modelarts, default: False"


@ -19,7 +19,6 @@ checkpoint_file_path: ""
optimizer: "Thor"
infer_label: ""
class_num: 1001
train_image_size: 224
batch_size: 32
loss_scale: 128
momentum: 0.9
@ -43,7 +42,6 @@ dataset: "imagenet2012"
device_num: 1
pre_trained: ""
run_eval: False
eval_image_size: 224
eval_dataset_path: ""
parameter_server: False
filter_weight: False
@ -59,16 +57,22 @@ dense_init: "HeUniform"
all_reduce_fusion_config:
- 85
- 160
train_image_size: 224
eval_image_size: 224
# Export options
device_id: 0
width: 224
height: 224
file_name: "resnet50"
file_format: "AIR"
file_format: "MINDIR"
ckpt_file: ""
network_dataset: "resnet50_imagenet2012"
# postprocess resnet inference
result_path: ''
label_path: ''
---
# Help description for each configuration
enable_modelarts: "Whether training on modelarts, default: False"


@ -19,7 +19,6 @@ checkpoint_file_path: ""
optimizer: "Thor"
infer_label: ""
class_num: 1001
train_image_size: 224
batch_size: 32
loss_scale: 128
momentum: 0.9
@ -43,7 +42,6 @@ dataset: "imagenet2012"
device_num: 1
pre_trained: ""
run_eval: False
eval_image_size: 224
eval_dataset_path: ""
parameter_server: False
filter_weight: False
@ -59,16 +57,22 @@ dense_init: "HeUniform"
all_reduce_fusion_config:
- 85
- 160
train_image_size: 224
eval_image_size: 224
# Export options
device_id: 0
width: 224
height: 224
file_name: "resnet50"
file_format: "AIR"
file_format: "MINDIR"
ckpt_file: ""
network_dataset: "resnet50_imagenet2012"
# postprocess resnet inference
result_path: ''
label_path: ''
---
# Help description for each configuration
enable_modelarts: "Whether training on modelarts, default: False"


@ -19,7 +19,6 @@ checkpoint_file_path: ""
optimizer: "Momentum"
infer_label: ""
class_num: 1001
train_image_size: 224
batch_size: 256
loss_scale: 1024
momentum: 0.9
@ -44,7 +43,6 @@ dataset: "imagenet2012"
device_num: 1
pre_trained: ""
run_eval: False
eval_image_size: 224
eval_dataset_path: ""
parameter_server: False
filter_weight: False
@ -60,16 +58,22 @@ dense_init: "TruncatedNormal"
all_reduce_fusion_config:
- 85
- 160
train_image_size: 224
eval_image_size: 224
# Export options
device_id: 0
width: 224
height: 224
file_name: "resnet50"
file_format: "AIR"
file_format: "MINDIR"
ckpt_file: ""
network_dataset: "resnet50_imagenet2012"
# postprocess resnet inference
result_path: ''
label_path: ''
---
# Help description for each configuration
enable_modelarts: "Whether training on modelarts, default: False"


@ -18,28 +18,32 @@ checkpoint_file_path: ''
# Training options
optimizer: "Momentum"
infer_label: ""
train_image_size: 224
batch_size: 256
epoch_size: 2
print_per_steps: 20
eval: False
eval_image_size: 224
save_ckpt: False
mode_name: "GRAPH"
dtype: "fp16"
acc_mode: "O0"
conv_init: "XavierUniform"
dense_init: "TruncatedNormal"
train_image_size: 224
eval_image_size: 224
# Export options
device_id: 0
width: 224
height: 224
file_name: "resnet"
file_format: "AIR"
file_format: "MINDIR"
ckpt_file: ""
network_dataset: "resnet50_imagenet2012"
# postprocess resnet inference
result_path: ''
label_path: ''
---
# Help description for each configuration
enable_modelarts: "Whether training on modelarts, default: False"


@ -19,7 +19,6 @@ checkpoint_file_path: ""
optimizer: "Momentum"
infer_label: ""
class_num: 1001
train_image_size: 224
batch_size: 32
loss_scale: 1024
momentum: 0.9
@ -45,7 +44,6 @@ dataset: "imagenet2012"
device_num: 1
pre_trained: ""
run_eval: False
eval_image_size: 256
eval_dataset_path: ""
parameter_server: False
filter_weight: False
@ -61,16 +59,22 @@ dense_init: "TruncatedNormal"
all_reduce_fusion_config:
- 1
- 100
train_image_size: 224
eval_image_size: 256
# Export options
device_id: 0
width: 256
height: 256
file_name: "se-resnet50"
file_format: "AIR"
file_format: "MINDIR"
ckpt_file: ""
network_dataset: "se-resnet50_imagenet2012"
# postprocess resnet inference
result_path: ''
label_path: ''
---
# Help description for each configuration
enable_modelarts: "Whether training on modelarts, default: False"


@ -12,7 +12,7 @@
# See the License for the specific language governing permissions and
# limitations under the License.
# ============================================================================
"""train resnet."""
"""eval resnet."""
import os
from mindspore import context
from mindspore.common import set_seed
@ -25,13 +25,15 @@ from src.model_utils.moxing_adapter import moxing_wrapper
set_seed(1)
if config.net_name in ("resnet18", "resnet34", "resnet50"):
if config.net_name in ("resnet18", "resnet34", "resnet50", "resnet152"):
    if config.net_name == "resnet18":
        from src.resnet import resnet18 as resnet
    if config.net_name == "resnet34":
    elif config.net_name == "resnet34":
        from src.resnet import resnet34 as resnet
    if config.net_name == "resnet50":
    elif config.net_name == "resnet50":
        from src.resnet import resnet50 as resnet
    else:
        from src.resnet import resnet152 as resnet
    if config.dataset == "cifar10":
        from src.dataset import create_dataset1 as create_dataset
    else:


@ -24,7 +24,7 @@ from src.model_utils.config import config
from src.model_utils.moxing_adapter import moxing_wrapper
context.set_context(mode=context.GRAPH_MODE, device_target=config.device_target)
if config.device_target == "Ascend":
if config.device_target != "GPU":
    context.set_context(device_id=config.device_id)
def modelarts_pre_process():
@ -34,18 +34,16 @@ def modelarts_pre_process():
@moxing_wrapper(pre_process=modelarts_pre_process)
def run_export():
    """run export."""
    if config.network_dataset == 'resnet18_cifar10':
        from src.resnet import resnet18 as resnet
    elif config.network_dataset == 'resnet18_imagenet2012':
    if config.network_dataset in ['resnet18_cifar10', 'resnet18_imagenet2012']:
        from src.resnet import resnet18 as resnet
    elif config.network_dataset == 'resnet34_imagenet2012':
        from src.resnet import resnet34 as resnet
    elif config.network_dataset == 'resnet50_cifar10':
        from src.resnet import resnet50 as resnet
    elif config.network_dataset == 'resnet50_imagenet2012':
    elif config.network_dataset in ['resnet50_cifar10', 'resnet50_imagenet2012']:
        from src.resnet import resnet50 as resnet
    elif config.network_dataset == 'resnet101_imagenet2012':
        from src.resnet import resnet101 as resnet
    elif config.network_dataset == 'resnet152_imagenet2012':
        from src.resnet import resnet152 as resnet
    elif config.network_dataset == 'se-resnet50_imagenet2012':
        from src.resnet import se_resnet50 as resnet
    else:
@ -61,5 +59,6 @@ def run_export():
    input_arr = Tensor(np.zeros([config.batch_size, 3, config.height, config.width], np.float32))
    export(net, input_arr, file_name=config.file_name, file_format=config.file_format)
if __name__ == '__main__':
    run_export()
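The `if`/`elif` chain in `run_export()` maps each `network_dataset` value to a network constructor. The same mapping could be kept in a lookup table, so that adding a network such as ResNet152 becomes a one-line change. In this sketch the builders are hypothetical stand-ins that return strings. In the real script they would be the functions imported from `src.resnet`.

```python
from typing import Callable, Dict


def make_builder(name: str) -> Callable[[int], str]:
    """Hypothetical stand-in for a constructor such as src.resnet.resnet152."""
    def build(class_num: int) -> str:
        return f"{name}(class_num={class_num})"
    return build


NETWORKS: Dict[str, Callable[[int], str]] = {
    "resnet18_cifar10": make_builder("resnet18"),
    "resnet18_imagenet2012": make_builder("resnet18"),
    "resnet34_imagenet2012": make_builder("resnet34"),
    "resnet50_cifar10": make_builder("resnet50"),
    "resnet50_imagenet2012": make_builder("resnet50"),
    "resnet101_imagenet2012": make_builder("resnet101"),
    "resnet152_imagenet2012": make_builder("resnet152"),
    "se-resnet50_imagenet2012": make_builder("se_resnet50"),
}


def select_network(network_dataset: str, class_num: int) -> str:
    """Look up the builder for a network_dataset value and call it."""
    try:
        builder = NETWORKS[network_dataset]
    except KeyError:
        raise ValueError(f"unsupported network_dataset: {network_dataset}") from None
    return builder(class_num)


print(select_network("resnet152_imagenet2012", 1001))  # resnet152(class_num=1001)
```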


@ -15,15 +15,11 @@
"""post process for 310 inference"""
import os
import json
import argparse
import numpy as np
from src.model_utils.config import config
batch_size = 1
parser = argparse.ArgumentParser(description="resnet inference")
parser.add_argument("--dataset", type=str, required=True, help="dataset type.")
parser.add_argument("--result_path", type=str, required=True, help="result files path.")
parser.add_argument("--label_path", type=str, required=True, help="image file path.")
args = parser.parse_args()
def get_top5_acc(top5_arg, gt_class):
    sub_count = 0
@ -80,7 +76,7 @@ def cal_acc_imagenet(result_path, label_path):
if __name__ == '__main__':
    if args.dataset.lower() == "cifar10":
        cal_acc_cifar10(args.result_path, args.label_path)
    if config.dataset.lower() == "cifar10":
        cal_acc_cifar10(config.result_path, config.label_path)
    else:
        cal_acc_imagenet(args.result_path, args.label_path)
        cal_acc_imagenet(config.result_path, config.label_path)
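`postprocess.py` reports top-1 and top-5 accuracy from the per-image outputs of the Ascend 310 inference run. The core computation can be sketched as below. The sketch assumes each result is a list of class scores and each label is an integer class index. It does not reproduce how the script reads `result_Files` or the label files.

```python
from typing import Sequence, Tuple


def topk_accuracy(scores: Sequence[Sequence[float]], labels: Sequence[int]) -> Tuple[float, float]:
    """Return (top1, top5) accuracy for per-sample score vectors."""
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    top1_hits = top5_hits = 0
    for row, label in zip(scores, labels):
        # Class indices sorted by descending score.
        ranked = sorted(range(len(row)), key=lambda i: row[i], reverse=True)
        top1_hits += int(ranked[0] == label)
        top5_hits += int(label in ranked[:5])
    n = len(labels)
    return top1_hits / n, top5_hits / n


scores = [
    [0.1, 0.7, 0.2, 0.0, 0.0, 0.0, 0.0],       # predicts class 1
    [0.3, 0.1, 0.2, 0.15, 0.05, 0.12, 0.08],   # predicts class 0; class 5 ranks 4th
]
labels = [1, 5]
print(topk_accuracy(scores, labels))  # (0.5, 1.0)
```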


@ -14,9 +14,9 @@
# limitations under the License.
# ============================================================================
if [[ $# -lt 4 || $# -gt 5 ]]; then
echo "Usage: bash run_infer_310.sh [MINDIR_PATH] [NET_TYPE] [DATASET] [DATA_PATH] [DEVICE_ID]
NET_TYPE can choose from [resnet18, resnet34, se-resnet50, resnet50, resnet101]
if [[ $# -lt 5 || $# -gt 6 ]]; then
echo "Usage: bash run_infer_310.sh [MINDIR_PATH] [NET_TYPE] [DATASET] [DATA_PATH] [CONFIG_PATH] [DEVICE_ID]
NET_TYPE can choose from [resnet18, resnet34, se-resnet50, resnet50, resnet101, resnet152]
DATASET can choose from [cifar10, imagenet]
DEVICE_ID is optional, it can be set by environment variable device_id, otherwise the value is zero"
exit 1
@ -30,7 +30,7 @@ get_real_path(){
fi
}
model=$(get_real_path $1)
if [ $2 == 'resnet18' ] || [ $2 == 'resnet34' ] || [ $2 == 'se-resnet50' ] || [ $2 == 'resnet50' ] || [ $2 == 'resnet101' ]; then
if [ $2 == 'resnet18' ] || [ $2 == 'resnet34' ] || [ $2 == 'se-resnet50' ] || [ $2 == 'resnet50' ] || [ $2 == 'resnet152' ] || [ $2 == 'resnet101' ]; then
network=$2
else
echo "NET_TYPE can choose from [resnet18, resnet34, se-resnet50, resnet50, resnet101, resnet152]"
@ -45,10 +45,11 @@ else
fi
data_path=$(get_real_path $4)
config_path=$(get_real_path $5)
device_id=0
if [ $# == 5 ]; then
device_id=$5
if [ $# == 6 ]; then
device_id=$6
fi
echo "mindir name: "$model
@ -86,10 +87,7 @@ function preprocess_data()
rm -rf ./preprocess_Result
fi
mkdir preprocess_Result
BASE_PATH=$(dirname "$(dirname "$(readlink -f $0)")")
CONFIG_FILE="${BASE_PATH}/config/$1"
python3.7 ../preprocess.py --data_path=$data_path --output_path=./preprocess_Result --config_path=$CONFIG_FILE &> preprocess.log
python3.7 ../preprocess.py --data_path=$data_path --output_path=./preprocess_Result --config_path=$config_path &> preprocess.log
}
function infer()
@ -109,10 +107,10 @@ function infer()
function cal_acc()
{
if [ "x${dataset}" == "xcifar10" ] || [ "x${dataset}" == "xCifar10" ]; then
python ../postprocess.py --dataset=$dataset --label_path=./preprocess_Result/label --result_path=result_Files &> acc.log
python ../postprocess.py --dataset=$dataset --label_path=./preprocess_Result/label --result_path=result_Files --config_path=$config_path &> acc.log
else
python3.7 ../create_imagenet2012_label.py --img_path=$data_path
python3.7 ../postprocess.py --dataset=$dataset --result_path=./result_Files --label_path=./imagenet_label.json &> acc.log
python3.7 ../postprocess.py --dataset=$dataset --result_path=./result_Files --label_path=./imagenet_label.json --config_path=$config_path &> acc.log
fi
if [ $? -ne 0 ]; then
echo "calculate accuracy failed"

@@ -209,7 +209,6 @@ def create_dataset_pynative(dataset_path, do_train, repeat_num=1, batch_size=32,
data_set = ds.ImageFolderDataset(dataset_path, num_parallel_workers=2, shuffle=True,
num_shards=device_num, shard_id=rank_id)
mean = [0.485 * 255, 0.456 * 255, 0.406 * 255]
std = [0.229 * 255, 0.224 * 255, 0.225 * 255]
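The mean and standard deviation above are the usual ImageNet channel statistics scaled to the 0 to 255 pixel range; they are the same numbers the Ascend 310 `main.cc` passes to `Normalize` ({123.675, 116.28, 103.53} and {58.395, 57.12, 57.375}). Below is a minimal NumPy sketch of the tail of that evaluation pipeline (center crop, normalize, HWC to CHW), assuming the image has already been decoded and resized so its shorter side is 256. The helper names are hypothetical, and the crop offset uses floor division, which may differ by one pixel from MindSpore's `CenterCrop` when the margin is odd.

```python
import numpy as np

# ImageNet statistics in the 0..255 range, as in create_dataset_pynative (dataset.py).
MEAN = np.array([0.485, 0.456, 0.406]) * 255
STD = np.array([0.229, 0.224, 0.225]) * 255


def center_crop(img: np.ndarray, size: int = 224) -> np.ndarray:
    """Crop a size x size window from the middle of an HWC image (floor-rounded offsets)."""
    h, w = img.shape[:2]
    top, left = (h - size) // 2, (w - size) // 2
    return img[top:top + size, left:left + size]


def normalize_hwc2chw(img: np.ndarray) -> np.ndarray:
    """Per-channel (x - mean) / std, then transpose HWC -> CHW as contiguous float32."""
    x = (img.astype(np.float32) - MEAN) / STD
    return np.ascontiguousarray(x.transpose(2, 0, 1), dtype=np.float32)


if __name__ == "__main__":
    # Hypothetical already-resized uint8 image (shorter side 256).
    resized = np.random.default_rng(0).integers(0, 256, size=(256, 341, 3), dtype=np.uint8)
    chw = normalize_hwc2chw(center_crop(resized))
    print(chw.shape, chw.dtype)
```

The result has the `[3, 224, 224]` layout that `export.py` uses for its dummy input (with a leading batch dimension of 1).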

@@ -48,6 +48,36 @@ def _generate_steps_lr(lr_init, lr_max, total_steps, warmup_steps):
return lr_each_step
def _generate_step_lr(lr_init, lr_max, total_steps, warmup_steps):
"""
Applies a four-step decay (at 20%, 50%, 70% and 90% of total_steps) to generate learning rate array.
Args:
lr_init(float): init learning rate.
lr_max(float): max learning rate.
total_steps(int): all steps in training.
warmup_steps(int): all steps in warmup epochs.
Returns:
np.array, learning rate array.
"""
decay_epoch_index = [0.2 * total_steps, 0.5 * total_steps, 0.7 * total_steps, 0.9 * total_steps]
lr_each_step = []
for i in range(total_steps):
if i < decay_epoch_index[0]:
lr = lr_max
elif i < decay_epoch_index[1]:
lr = lr_max * 0.1
elif i < decay_epoch_index[2]:
lr = lr_max * 0.01
elif i < decay_epoch_index[3]:
lr = lr_max * 0.001
else:
lr = 0.00005
lr_each_step.append(lr)
return lr_each_step
def _generate_poly_lr(lr_init, lr_end, lr_max, total_steps, warmup_steps):
"""
Applies polynomial decay to generate learning rate array.
@@ -155,6 +185,9 @@ def get_lr(lr_init, lr_end, lr_max, warmup_epochs, total_epochs, steps_per_epoch
if lr_decay_mode == 'steps':
lr_each_step = _generate_steps_lr(lr_init, lr_max, total_steps, warmup_steps)
elif lr_decay_mode == 'step':
warmup_steps = warmup_epochs
lr_each_step = _generate_step_lr(lr_init, lr_max, total_steps, warmup_steps)
elif lr_decay_mode == 'poly':
lr_each_step = _generate_poly_lr(lr_init, lr_end, lr_max, total_steps, warmup_steps)
elif lr_decay_mode == 'cosine':
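The new `'step'` mode is a piecewise-constant schedule: `lr_max` for the first 20% of `total_steps`, then `lr_max` times 0.1, 0.01 and 0.001 from the 20%, 50% and 70% marks, and a fixed 0.00005 for the last 10%. The sketch below re-expresses `_generate_step_lr` with a boundary table so the stages are easy to read off. `step_lr` is a hypothetical name; it drops the `lr_init` and `warmup_steps` parameters because `_generate_step_lr` in this diff never reads them, which also means the `warmup_steps = warmup_epochs` assignment in `get_lr` has no effect on this mode.

```python
from typing import List


def step_lr(lr_max: float, total_steps: int, final_lr: float = 0.00005) -> List[float]:
    """Piecewise-constant decay at 20/50/70/90% of training, mirroring _generate_step_lr."""
    boundaries = [0.2 * total_steps, 0.5 * total_steps, 0.7 * total_steps, 0.9 * total_steps]
    factors = [1.0, 0.1, 0.01, 0.001]
    lrs = []
    for i in range(total_steps):
        for bound, factor in zip(boundaries, factors):
            if i < bound:
                lrs.append(lr_max * factor)
                break
        else:
            # Past 90% of training the rate is an absolute value, not a fraction of lr_max.
            lrs.append(final_lr)
    return lrs


if __name__ == "__main__":
    schedule = step_lr(lr_max=0.1, total_steps=100)
    print([schedule[i] for i in (0, 20, 50, 70, 90)])
```

Note that the final stage is hard-coded rather than derived from `lr_max` or `lr_end`, so with a small `lr_max` the last 10% of training can use a larger rate than the stage before it.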

@@ -29,6 +29,8 @@ def conv_variance_scaling_initializer(in_channel, out_channel, kernel_size):
scale = 1.0
scale /= max(1., fan_in)
stddev = (scale ** 0.5) / .87962566103423978
if config.net_name == "resnet152":
stddev = (scale ** 0.5)
mu, sigma = 0, stddev
weight = truncnorm(-2, 2, loc=mu, scale=sigma).rvs(out_channel * in_channel * kernel_size * kernel_size)
weight = np.reshape(weight, (out_channel, in_channel, kernel_size, kernel_size))
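For resnet152, the only change to `conv_variance_scaling_initializer` is dropping the division by 0.87962566103423978. That constant is the standard deviation of a standard normal truncated to [-2, 2], so dividing sigma by it makes the samples from `truncnorm(-2, 2, scale=sigma)` actually have standard deviation sqrt(scale / fan_in); without the correction, the resnet152 weights come out about 12% narrower (1 - 0.8796). The sketch below checks this numerically. It is an illustration, not repository code: the helper names are hypothetical, and a NumPy rejection sampler stands in for `scipy.stats.truncnorm`.

```python
import math

import numpy as np


def truncated_std(a: float = 2.0) -> float:
    """Std of a standard normal truncated to [-a, a]: sqrt(1 - 2*a*pdf(a) / P(|Z| <= a))."""
    pdf_a = math.exp(-0.5 * a * a) / math.sqrt(2.0 * math.pi)
    mass = math.erf(a / math.sqrt(2.0))
    return math.sqrt(1.0 - 2.0 * a * pdf_a / mass)


def init_sigma(fan_in: int, scale: float = 1.0, correct_truncation: bool = True) -> float:
    """Sigma handed to truncnorm(-2, 2, scale=sigma) in the two branches of the diff."""
    base = math.sqrt(scale / max(1.0, fan_in))
    return base / truncated_std(2.0) if correct_truncation else base


def sample_truncated(sigma: float, n: int, rng: np.random.Generator) -> np.ndarray:
    """N(0, sigma^2) truncated at +/-2 sigma, via simple rejection sampling."""
    kept = np.empty(0)
    while kept.size < n:
        z = rng.standard_normal(2 * n)
        kept = np.concatenate([kept, z[np.abs(z) <= 2.0]])
    return sigma * kept[:n]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    fan_in = 64 * 3 * 3                    # e.g. a 3x3 conv with 64 input channels
    target = math.sqrt(1.0 / fan_in)
    for corrected in (True, False):
        w = sample_truncated(init_sigma(fan_in, correct_truncation=corrected), 200_000, rng)
        print(f"corrected={corrected}: std/target = {w.std() / target:.3f}")
```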
@@ -113,6 +115,8 @@ def _conv3x3(in_channel, out_channel, stride=1, use_se=False, res_base=False):
else:
weight_shape = (out_channel, in_channel, 3, 3)
weight = Tensor(kaiming_normal(weight_shape, mode="fan_out", nonlinearity='relu'))
if config.net_name == "resnet152":
weight = _weight_variable(weight_shape)
if res_base:
return nn.Conv2d(in_channel, out_channel, kernel_size=3, stride=stride,
padding=1, pad_mode='pad', weight_init=weight)
@@ -126,6 +130,8 @@ def _conv1x1(in_channel, out_channel, stride=1, use_se=False, res_base=False):
else:
weight_shape = (out_channel, in_channel, 1, 1)
weight = Tensor(kaiming_normal(weight_shape, mode="fan_out", nonlinearity='relu'))
if config.net_name == "resnet152":
weight = _weight_variable(weight_shape)
if res_base:
return nn.Conv2d(in_channel, out_channel, kernel_size=1, stride=stride,
padding=0, pad_mode='pad', weight_init=weight)
@@ -139,6 +145,8 @@ def _conv7x7(in_channel, out_channel, stride=1, use_se=False, res_base=False):
else:
weight_shape = (out_channel, in_channel, 7, 7)
weight = Tensor(kaiming_normal(weight_shape, mode="fan_out", nonlinearity='relu'))
if config.net_name == "resnet152":
weight = _weight_variable(weight_shape)
if res_base:
return nn.Conv2d(in_channel, out_channel,
kernel_size=7, stride=stride, padding=3, pad_mode='pad', weight_init=weight)
@@ -166,6 +174,8 @@ def _fc(in_channel, out_channel, use_se=False):
else:
weight_shape = (out_channel, in_channel)
weight = Tensor(kaiming_uniform(weight_shape, a=math.sqrt(5)))
if config.net_name == "resnet152":
weight = _weight_variable(weight_shape)
return nn.Dense(in_channel, out_channel, has_bias=True, weight_init=weight, bias_init=0)
@@ -209,7 +219,7 @@ class ResidualBlock(nn.Cell):
self.conv3 = _conv1x1(channel, out_channel, stride=1, use_se=self.use_se)
self.bn3 = _bn(out_channel)
if config.optimizer == "Thor":
if config.optimizer == "Thor" or config.net_name == "resnet152":
self.bn3 = _bn_last(out_channel)
if self.se_block:
self.se_global_pool = P.ReduceMean(keep_dims=False)
@@ -594,3 +604,24 @@ def resnet101(class_num=1001):
[256, 512, 1024, 2048],
[1, 2, 2, 2],
class_num)
def resnet152(class_num=1001):
"""
Get ResNet152 neural network.
Args:
class_num (int): Class number.
Returns:
Cell, cell instance of ResNet152 neural network.
Examples:
# >>> net = resnet152(1001)
"""
return ResNet(ResidualBlock,
[3, 8, 36, 3],
[64, 256, 512, 1024],
[256, 512, 1024, 2048],
[1, 2, 2, 2],
class_num)
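`resnet152` reuses the bottleneck `ResidualBlock` and only changes the per-stage block counts to [3, 8, 36, 3]. The name follows from counting weighted layers: one stem convolution, three convolutions in each of the 50 bottleneck blocks, and the final Dense layer, 1 + 3 × 50 + 1 = 152. The sketch below also counts trainable parameters from the same constructor arguments. It assumes the standard bottleneck design (4× channel expansion, bias-free convolutions, a BatchNorm gamma/beta pair after each convolution, and a 1×1 projection shortcut whenever the stride or channel count changes) and does not count BatchNorm moving statistics; the function names are hypothetical.

```python
from typing import Sequence, Tuple


def bottleneck_params(in_c: int, out_c: int, stride: int) -> int:
    """Trainable parameters of one bottleneck block (convs without bias, BN gamma/beta)."""
    mid = out_c // 4                       # assumed expansion factor of 4
    params = in_c * mid + 2 * mid          # 1x1 conv + BN
    params += mid * mid * 9 + 2 * mid      # 3x3 conv + BN
    params += mid * out_c + 2 * out_c      # 1x1 conv + BN
    if stride != 1 or in_c != out_c:       # projection shortcut: 1x1 conv + BN
        params += in_c * out_c + 2 * out_c
    return params


def resnet_stats(layer_nums: Sequence[int], in_channels: Sequence[int],
                 out_channels: Sequence[int], strides: Sequence[int],
                 class_num: int) -> Tuple[int, int]:
    """Return (weighted layer count, trainable parameter count)."""
    params = 3 * 64 * 7 * 7 + 2 * 64       # stem 7x7 conv + BN
    for n, cin, cout, s in zip(layer_nums, in_channels, out_channels, strides):
        params += bottleneck_params(cin, cout, s)              # first block of the stage
        params += (n - 1) * bottleneck_params(cout, cout, 1)   # remaining blocks
    params += out_channels[-1] * class_num + class_num          # Dense with bias
    depth = 1 + 3 * sum(layer_nums) + 1
    return depth, params


if __name__ == "__main__":
    depth, params = resnet_stats([3, 8, 36, 3], [64, 256, 512, 1024],
                                 [256, 512, 1024, 2048], [1, 2, 2, 2], class_num=1001)
    print(depth, f"{params / 1e6:.2f}M")
```

With `class_num=1001` this gives 152 layers and about 60.19 M parameters, in line with the "Parameters (M)" row of the ResNet152 performance table further down.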

@@ -40,15 +40,18 @@ from src.model_utils.moxing_adapter import moxing_wrapper
from src.model_utils.device_adapter import get_rank_id, get_device_num
from src.resnet import conv_variance_scaling_initializer
set_seed(1)
if config.net_name in ("resnet18", "resnet34", "resnet50"):
if config.net_name in ("resnet18", "resnet34", "resnet50", "resnet152"):
if config.net_name == "resnet18":
from src.resnet import resnet18 as resnet
if config.net_name == "resnet34":
elif config.net_name == "resnet34":
from src.resnet import resnet34 as resnet
if config.net_name == "resnet50":
elif config.net_name == "resnet50":
from src.resnet import resnet50 as resnet
else:
from src.resnet import resnet152 as resnet
if config.dataset == "cifar10":
from src.dataset import create_dataset1 as create_dataset
else:
@@ -109,7 +112,7 @@ def set_parameter():
if config.net_name == "resnet50" or config.net_name == "se-resnet50":
if config.acc_mode not in ["O1", "O2"]:
context.set_auto_parallel_context(all_reduce_fusion_config=config.all_reduce_fusion_config)
elif config.net_name == "resnet101":
elif config.net_name in ["resnet101", "resnet152"]:
context.set_auto_parallel_context(all_reduce_fusion_config=config.all_reduce_fusion_config)
init()
# GPU target
@@ -159,7 +162,7 @@ def init_lr(step_size):
from src.lr_generator import get_thor_lr
lr = get_thor_lr(0, config.lr_init, config.lr_decay, config.lr_end_epoch, step_size, decay_epochs=39)
else:
if config.net_name in ("resnet18", "resnet34", "resnet50", "se-resnet50"):
if config.net_name in ("resnet18", "resnet34", "resnet50", "resnet152", "se-resnet50"):
lr = get_lr(lr_init=config.lr_init, lr_end=config.lr_end, lr_max=config.lr_max,
warmup_epochs=config.warmup_epochs, total_epochs=config.epoch_size, steps_per_epoch=step_size,
lr_decay_mode=config.lr_decay_mode)
@@ -249,7 +252,7 @@ def train_net():
metrics = {"acc"}
if config.run_distribute:
metrics = {'acc': DistAccuracy(batch_size=config.batch_size, device_num=config.device_num)}
if (config.net_name not in ("resnet18", "resnet34", "resnet50", "resnet101", "se-resnet50")) or \
if (config.net_name not in ("resnet18", "resnet34", "resnet50", "resnet101", "resnet152", "se-resnet50")) or \
config.parameter_server or target == "CPU":
## fp32 training
model = Model(net, loss_fn=loss, optimizer=opt, metrics=metrics, eval_network=dist_eval_network)

@@ -1,236 +0,0 @@
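# ResNet152 Description
## Overview
The ResNet family of models was proposed in 2015. Using ResNet units, it successfully trained a 152-layer neural network and won first place in ILSVRC2015. The network introduced the residual structure, and a ResNet is built by stacking multiple residual blocks. Traditional convolutional or fully connected networks lose some information as it passes through the layers and can suffer from vanishing or exploding gradients, which makes deep networks fail to train; ResNet solves this problem to a certain extent. Passing the input directly to the output preserves the integrity of the information, so the network only needs to learn the difference between input and output, which simplifies the learning objective. For these reasons ResNet is very popular and can even be used directly in ConceptNet networks.
The following is an example of training ResNet152 on the ImageNet2012 dataset with MindSpore.
## Paper
1. [Paper](https://arxiv.org/pdf/1512.03385.pdf): Kaiming He, Xiangyu Zhang, Shaoqing Ren, Jian Sun. "Deep Residual Learning for Image Recognition"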
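# Model Architecture
The overall network architecture of ResNet152 is described here: [link](https://arxiv.org/pdf/1512.03385.pdf)
# Dataset
Dataset used: [ImageNet2012](http://www.image-net.org/)
- Dataset size: 1000 classes, 224*224 color images
- Training set: 1,281,167 images
- Test set: 50,000 images
- Data format: JPEG
- Note: data is processed in dataset.py.
- Download the dataset. The directory structure is as follows: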
```text
└─dataset
├─ilsvrc # training dataset
└─validation_preprocess # evaluation dataset
```
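# Environment Requirements
- Hardware
- Set up the hardware environment with Ascend processors.
- Framework
- [MindSpore](https://www.mindspore.cn/install/en)
- For more information, see the following resources:
- [MindSpore Tutorials](https://www.mindspore.cn/tutorials/zh-CN/master/index.html)
- [MindSpore Python API](https://www.mindspore.cn/docs/api/zh-CN/master/index.html)
# Quick Start
After installing MindSpore from the official website, you can start training and evaluation as follows:
- Running on Ascend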
```Shell
# distributed training
Usage: bash run_distribute_train.sh [RANK_TABLE_FILE] [DATASET_PATH] [PRETRAINED_CKPT_PATH] (optional)
# standalone training
Usage: bash run_standalone_train.sh [DATASET_PATH] [PRETRAINED_CKPT_PATH] (optional)
# run evaluation example
Usage: bash run_eval.sh [DATASET_PATH] [CHECKPOINT_PATH]
```
# Script Description
## Script and Sample Code
```text
└──resnet
├── README.md
├── scripts
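├── run_distribute_train.sh # launch Ascend distributed training (8 devices)
├── run_eval.sh # launch Ascend evaluation
└── run_standalone_train.sh # launch Ascend standalone training (1 device)
├── src
├── config.py # parameter configuration
├── dataset.py # data preprocessing
├── CrossEntropySmooth.py # loss definition for the ImageNet2012 dataset
├── lr_generator.py # generate the learning rate for each step
└── resnet.py # ResNet backbone, including ResNet50, ResNet101, SE-ResNet50 and ResNet152
├── eval.py # evaluate the network
└── train.py # train the network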
```
# Script Parameters
Both training and evaluation parameters can be configured in config.py.
- Config for ResNet152 and the ImageNet2012 dataset.
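```Python
"class_num":1001, # number of dataset classes
"batch_size":32, # batch size of the input tensor
"loss_scale":1024, # loss scale
"momentum":0.9, # momentum of the optimizer
"weight_decay":1e-4, # weight decay
"epoch_size":140, # number of training epochs
"save_checkpoint":True, # whether to save checkpoints
"save_checkpoint_epochs":5, # epoch interval between two checkpoints; by default, the last checkpoint is saved after the last epoch
"keep_checkpoint_max":10, # keep only the last keep_checkpoint_max checkpoints
"save_checkpoint_path":"./", # checkpoint save path, relative to the execution path
"warmup_epochs":0, # number of warmup epochs
"lr_decay_mode":"steps", # decay mode used to generate the learning rate
"use_label_smooth":True, # whether to use label smoothing
"label_smooth_factor":0.1, # label smoothing factor
"lr":0.1, # base learning rate
"lr_end":0.0001, # final learning rate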
```
# Training Process
## Usage
### Running on Ascend
```Shell
# distributed training
Usage: bash run_distribute_train.sh [RANK_TABLE_FILE] [DATASET_PATH] [PRETRAINED_CKPT_PATH] (optional)
# standalone training
Usage: bash run_standalone_train.sh [DATASET_PATH] [PRETRAINED_CKPT_PATH] (optional)
```
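Distributed training requires an HCCL configuration file in JSON format to be created in advance.
For details, see the instructions in [hccl_tools](https://gitee.com/mindspore/mindspore/tree/master/model_zoo/utils/hccl_tools).
Training results are saved in the example path, in a folder whose name starts with "train" or "train_parallel". You can find the checkpoint files and results like the following in the logs under this path.
## Results
- Training ResNet152 with the ImageNet2012 dataset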
```text
# distributed training results (8P)
epoch: 1 step: 5004, loss is 4.184874
epoch: 2 step: 5004, loss is 4.013571
epoch: 3 step: 5004, loss is 3.695777
epoch: 4 step: 5004, loss is 3.3244863
epoch: 5 step: 5004, loss is 3.4899402
...
```
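The 5004 steps per epoch in the log follow from the dataset size and the global batch. A quick arithmetic check, assuming 8 devices with `batch_size` 32 each and that the last incomplete batch is dropped (computing per device instead, with 160,145 or 160,146 images per shard, gives the same 5004):

```python
def steps_per_epoch(num_images: int, batch_size: int, device_num: int) -> int:
    """Number of full global batches per epoch (incomplete last batch dropped)."""
    return num_images // (batch_size * device_num)


if __name__ == "__main__":
    steps = steps_per_epoch(1_281_167, batch_size=32, device_num=8)
    # Steps per epoch, then total steps for epoch_size=140.
    print(steps, 140 * steps)
```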
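# Evaluation Process
## Usage
### Running on Ascend
```Shell
# evaluation
Usage: bash run_eval.sh [DATASET_PATH] [CHECKPOINT_PATH]
```
```Shell
# evaluation example
bash run_eval.sh /data/dataset/ImageNet/imagenet_original Resnet152-140_5004.ckpt
```
Checkpoints can be generated during training.
## Results
Evaluation results are saved in the example path, in a folder named "eval". You can find results like the following in the log under this path:
- Evaluating ResNet152 with the ImageNet2012 dataset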
```text
result: {'top_5_accuracy': 0.9438420294494239, 'top_1_accuracy': 0.78817221518} ckpt= resnet152-140_5004.ckpt
```
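# Inference Process
## [Export MindIR](#contents)
```shell
python export.py --ckpt_file [CKPT_PATH] --file_name [FILE_NAME] --file_format [FILE_FORMAT]
```
The ckpt_file parameter is required.
`FILE_FORMAT` must be chosen from ["AIR", "MINDIR"].
## Inference on Ascend 310
Before running inference, the MindIR file must be exported with the `export.py` script. The following is an example of running inference with a MindIR model.
Currently, inference on the ImageNet2012 dataset only supports a batch_size of 1.
```shell
# Ascend310 inference
bash run_infer_310.sh [MINDIR_PATH] [DATASET_PATH] [DEVICE_ID]
```
- `MINDIR_PATH`: path of the MindIR file
- `DATASET_PATH`: path of the inference dataset
- `DEVICE_ID`: optional; the default value is 0.
## Results
Inference results are saved in the current directory where the script is run. You can find the following accuracy result in acc.log.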
```bash
'acc': 0.7871
```
# Model Description
## Performance
### Evaluation Performance
#### ResNet152 on ImageNet2012
| Parameters | Ascend 910 |
|---|---|
| Model version | ResNet152 |
| Resources | Ascend 910; CPU 2.60 GHz, 192 cores; memory 755G; OS Euler2.8 |
| Upload date | 2021-02-10 |
| MindSpore version | 1.0.1 |
| Dataset | ImageNet2012 |
| Training parameters | epoch=140, steps per epoch=5004, batch_size=32 |
| Optimizer | Momentum |
| Loss function | Softmax cross entropy |
| Outputs | probability |
| Loss | 1.7375104 |
| Speed | 47.47 ms/step (8 devices) |
| Total time | 577 min |
| Parameters (M) | 60.19 |
| Fine-tuned checkpoint | 462M (.ckpt file) |
| Scripts | [link](https://gitee.com/panpanrui/mindspore/tree/master/model_zoo/official/cv/resnet152) |
# Description of Random Situation
The seed is set inside the "create_dataset" function in dataset.py, and a random seed is also set in train.py.
# ModelZoo Homepage
Please visit the official [homepage](https://gitee.com/mindspore/mindspore/tree/master/model_zoo).

@@ -1,14 +0,0 @@
cmake_minimum_required(VERSION 3.14.1)
project(Ascend310Infer)
add_compile_definitions(_GLIBCXX_USE_CXX11_ABI=0)
set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -O0 -g -std=c++17 -Werror -Wall -fPIE -Wl,--allow-shlib-undefined")
set(PROJECT_SRC_ROOT ${CMAKE_CURRENT_LIST_DIR}/)
option(MINDSPORE_PATH "mindspore install path" "")
include_directories(${MINDSPORE_PATH})
include_directories(${MINDSPORE_PATH}/include)
include_directories(${PROJECT_SRC_ROOT})
find_library(MS_LIB libmindspore.so ${MINDSPORE_PATH}/lib)
file(GLOB_RECURSE MD_LIB ${MINDSPORE_PATH}/_c_dataengine*)
add_executable(main src/main.cc src/utils.cc)
target_link_libraries(main ${MS_LIB} ${MD_LIB} gflags)

@@ -1,29 +0,0 @@
#!/bin/bash
# Copyright 2021 Huawei Technologies Co., Ltd
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ============================================================================
if [ -d out ]; then
rm -rf out
fi
mkdir out
cd out || exit
if [ -f "Makefile" ]; then
make clean
fi
cmake .. \
-DMINDSPORE_PATH="`pip3.7 show mindspore-ascend | grep Location | awk '{print $2"/mindspore"}' | xargs realpath`"
make

@@ -1,35 +0,0 @@
/**
* Copyright 2021 Huawei Technologies Co., Ltd
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
#ifndef MINDSPORE_INFERENCE_UTILS_H_
#define MINDSPORE_INFERENCE_UTILS_H_
#include <sys/stat.h>
#include <dirent.h>
#include <vector>
#include <string>
#include <memory>
#include "include/api/types.h"
std::vector<std::string> GetAllFiles(std::string_view dirName);
DIR *OpenDir(std::string_view dirName);
std::string RealPath(std::string_view path);
mindspore::MSTensor ReadFileToTensor(const std::string &file);
int WriteResult(const std::string& imageFile, const std::vector<mindspore::MSTensor> &outputs);
std::vector<std::string> GetAllFiles(std::string dir_name);
std::vector<std::vector<std::string>> GetAllInputData(std::string dir_name);
#endif

@@ -1,181 +0,0 @@
/**
* Copyright 2021 Huawei Technologies Co., Ltd
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
#include <sys/time.h>
#include <gflags/gflags.h>
#include <dirent.h>
#include <iostream>
#include <string>
#include <algorithm>
#include <iosfwd>
#include <vector>
#include <map>
#include <fstream>
#include <sstream>
#include "include/api/model.h"
#include "include/api/context.h"
#include "include/api/types.h"
#include "include/api/serialization.h"
#include "include/dataset/vision_ascend.h"
#include "include/dataset/execute.h"
#include "include/dataset/transforms.h"
#include "include/dataset/vision.h"
#include "inc/utils.h"
using mindspore::Context;
using mindspore::Serialization;
using mindspore::Model;
using mindspore::Status;
using mindspore::ModelType;
using mindspore::GraphCell;
using mindspore::kSuccess;
using mindspore::MSTensor;
using mindspore::dataset::Execute;
using mindspore::dataset::vision::Decode;
using mindspore::dataset::vision::Resize;
using mindspore::dataset::vision::CenterCrop;
using mindspore::dataset::vision::Normalize;
using mindspore::dataset::vision::HWC2CHW;
DEFINE_string(mindir_path, "", "mindir path");
DEFINE_string(dataset_name, "imagenet2012", "['cifar10', 'imagenet2012']");
DEFINE_string(input0_path, ".", "input0 path");
DEFINE_int32(device_id, 0, "device id");
int load_model(Model *model, std::vector<MSTensor> *model_inputs, std::string mindir_path, int device_id) {
if (RealPath(mindir_path).empty()) {
std::cout << "Invalid mindir" << std::endl;
return 1;
}
auto context = std::make_shared<Context>();
auto ascend310 = std::make_shared<mindspore::Ascend310DeviceInfo>();
ascend310->SetDeviceID(device_id);
context->MutableDeviceInfo().push_back(ascend310);
mindspore::Graph graph;
Serialization::Load(mindir_path, ModelType::kMindIR, &graph);
Status ret = model->Build(GraphCell(graph), context);
if (ret != kSuccess) {
std::cout << "ERROR: Build failed." << std::endl;
return 1;
}
*model_inputs = model->GetInputs();
if (model_inputs->empty()) {
std::cout << "Invalid model, inputs is empty." << std::endl;
return 1;
}
return 0;
}
int main(int argc, char **argv) {
gflags::ParseCommandLineFlags(&argc, &argv, true);
Model model;
std::vector<MSTensor> model_inputs;
if (load_model(&model, &model_inputs, FLAGS_mindir_path, FLAGS_device_id) != 0) {
return 1;
}
std::map<double, double> costTime_map;
struct timeval start = {0};
struct timeval end = {0};
double startTimeMs;
double endTimeMs;
if (FLAGS_dataset_name == "cifar10") {
auto input0_files = GetAllFiles(FLAGS_input0_path);
if (input0_files.empty()) {
std::cout << "ERROR: no input data." << std::endl;
return 1;
}
size_t size = input0_files.size();
for (size_t i = 0; i < size; ++i) {
std::vector<MSTensor> inputs;
std::vector<MSTensor> outputs;
std::cout << "Start predict input files:" << input0_files[i] <<std::endl;
auto input0 = ReadFileToTensor(input0_files[i]);
inputs.emplace_back(model_inputs[0].Name(), model_inputs[0].DataType(), model_inputs[0].Shape(),
input0.Data().get(), input0.DataSize());
gettimeofday(&start, nullptr);
Status ret = model.Predict(inputs, &outputs);
gettimeofday(&end, nullptr);
if (ret != kSuccess) {
std::cout << "Predict " << input0_files[i] << " failed." << std::endl;
return 1;
}
startTimeMs = (1.0 * start.tv_sec * 1000000 + start.tv_usec) / 1000;
endTimeMs = (1.0 * end.tv_sec * 1000000 + end.tv_usec) / 1000;
costTime_map.insert(std::pair<double, double>(startTimeMs, endTimeMs));
WriteResult(input0_files[i], outputs);
}
} else {
auto input0_files = GetAllInputData(FLAGS_input0_path);
if (input0_files.empty()) {
std::cout << "ERROR: no input data." << std::endl;
return 1;
}
size_t size = input0_files.size();
for (size_t i = 0; i < size; ++i) {
for (size_t j = 0; j < input0_files[i].size(); ++j) {
std::vector<MSTensor> inputs;
std::vector<MSTensor> outputs;
std::cout << "Start predict input files:" << input0_files[i][j] <<std::endl;
auto decode = Decode();
auto resize = Resize({256});
auto centercrop = CenterCrop({224, 224});
auto normalize = Normalize({123.675, 116.28, 103.53}, {58.395, 57.12, 57.375});
auto hwc2chw = HWC2CHW();
Execute SingleOp({decode, resize, centercrop, normalize, hwc2chw});
auto imgDvpp = std::make_shared<MSTensor>();
SingleOp(ReadFileToTensor(input0_files[i][j]), imgDvpp.get());
inputs.emplace_back(model_inputs[0].Name(), model_inputs[0].DataType(), model_inputs[0].Shape(),
imgDvpp->Data().get(), imgDvpp->DataSize());
gettimeofday(&start, nullptr);
Status ret = model.Predict(inputs, &outputs);
gettimeofday(&end, nullptr);
if (ret != kSuccess) {
std::cout << "Predict " << input0_files[i][j] << " failed." << std::endl;
return 1;
}
startTimeMs = (1.0 * start.tv_sec * 1000000 + start.tv_usec) / 1000;
endTimeMs = (1.0 * end.tv_sec * 1000000 + end.tv_usec) / 1000;
costTime_map.insert(std::pair<double, double>(startTimeMs, endTimeMs));
WriteResult(input0_files[i][j], outputs);
}
}
}
double average = 0.0;
int inferCount = 0;
for (auto iter = costTime_map.begin(); iter != costTime_map.end(); iter++) {
double diff = 0.0;
diff = iter->second - iter->first;
average += diff;
inferCount++;
}
average = average / inferCount;
std::stringstream timeCost;
timeCost << "NN inference cost average time: "<< average << " ms of infer_count " << inferCount << std::endl;
std::cout << "NN inference cost average time: "<< average << "ms of infer_count " << inferCount << std::endl;
std::string fileName = "./time_Result" + std::string("/test_perform_static.txt");
std::ofstream fileStream(fileName.c_str(), std::ios::trunc);
fileStream << timeCost.str();
fileStream.close();
costTime_map.clear();
return 0;
}

@@ -1,185 +0,0 @@
/**
* Copyright 2021 Huawei Technologies Co., Ltd
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
#include <fstream>
#include <algorithm>
#include <iostream>
#include "inc/utils.h"
using mindspore::MSTensor;
using mindspore::DataType;
std::vector<std::vector<std::string>> GetAllInputData(std::string dir_name) {
std::vector<std::vector<std::string>> ret;
DIR *dir = OpenDir(dir_name);
if (dir == nullptr) {
return {};
}
struct dirent *filename;
/* read all the files in the dir ~ */
std::vector<std::string> sub_dirs;
while ((filename = readdir(dir)) != nullptr) {
std::string d_name = std::string(filename->d_name);
// get rid of "." and ".."
if (d_name == "." || d_name == ".." || d_name.empty()) {
continue;
}
std::string dir_path = RealPath(std::string(dir_name) + "/" + filename->d_name);
struct stat s;
lstat(dir_path.c_str(), &s);
if (!S_ISDIR(s.st_mode)) {
continue;
}
sub_dirs.emplace_back(dir_path);
}
std::sort(sub_dirs.begin(), sub_dirs.end());
(void)std::transform(sub_dirs.begin(), sub_dirs.end(), std::back_inserter(ret),
[](const std::string &d) { return GetAllFiles(d); });
return ret;
}
std::vector<std::string> GetAllFiles(std::string dir_name) {
struct dirent *filename;
DIR *dir = OpenDir(dir_name);
if (dir == nullptr) {
return {};
}
std::vector<std::string> res;
while ((filename = readdir(dir)) != nullptr) {
std::string d_name = std::string(filename->d_name);
if (d_name == "." || d_name == ".." || d_name.size() <= 3) {
continue;
}
res.emplace_back(std::string(dir_name) + "/" + filename->d_name);
}
std::sort(res.begin(), res.end());
return res;
}
std::vector<std::string> GetAllFiles(std::string_view dirName) {
struct dirent *filename;
DIR *dir = OpenDir(dirName);
if (dir == nullptr) {
return {};
}
std::vector<std::string> res;
while ((filename = readdir(dir)) != nullptr) {
std::string dName = std::string(filename->d_name);
if (dName == "." || dName == ".." || filename->d_type != DT_REG) {
continue;
}
res.emplace_back(std::string(dirName) + "/" + filename->d_name);
}
std::sort(res.begin(), res.end());
for (auto &f : res) {
std::cout << "image file: " << f << std::endl;
}
return res;
}
int WriteResult(const std::string& imageFile, const std::vector<MSTensor> &outputs) {
std::string homePath = "./result_Files";
for (size_t i = 0; i < outputs.size(); ++i) {
size_t outputSize;
std::shared_ptr<const void> netOutput;
netOutput = outputs[i].Data();
outputSize = outputs[i].DataSize();
int pos = imageFile.rfind('/');
std::string fileName(imageFile, pos + 1);
fileName.replace(fileName.find('.'), fileName.size() - fileName.find('.'), '_' + std::to_string(i) + ".bin");
std::string outFileName = homePath + "/" + fileName;
FILE *outputFile = fopen(outFileName.c_str(), "wb");
fwrite(netOutput.get(), outputSize, sizeof(char), outputFile);
fclose(outputFile);
outputFile = nullptr;
}
return 0;
}
mindspore::MSTensor ReadFileToTensor(const std::string &file) {
if (file.empty()) {
std::cout << "Pointer file is nullptr" << std::endl;
return mindspore::MSTensor();
}
std::ifstream ifs(file);
if (!ifs.good()) {
std::cout << "File: " << file << " is not exist" << std::endl;
return mindspore::MSTensor();
}
if (!ifs.is_open()) {
std::cout << "File: " << file << "open failed" << std::endl;
return mindspore::MSTensor();
}
ifs.seekg(0, std::ios::end);
size_t size = ifs.tellg();
mindspore::MSTensor buffer(file, mindspore::DataType::kNumberTypeUInt8, {static_cast<int64_t>(size)}, nullptr, size);
ifs.seekg(0, std::ios::beg);
ifs.read(reinterpret_cast<char *>(buffer.MutableData()), size);
ifs.close();
return buffer;
}
DIR *OpenDir(std::string_view dirName) {
if (dirName.empty()) {
std::cout << " dirName is null ! " << std::endl;
return nullptr;
}
std::string realPath = RealPath(dirName);
struct stat s;
lstat(realPath.c_str(), &s);
if (!S_ISDIR(s.st_mode)) {
std::cout << "dirName is not a valid directory !" << std::endl;
return nullptr;
}
DIR *dir;
dir = opendir(realPath.c_str());
if (dir == nullptr) {
std::cout << "Can not open dir " << dirName << std::endl;
return nullptr;
}
std::cout << "Successfully opened the dir " << dirName << std::endl;
return dir;
}
std::string RealPath(std::string_view path) {
char realPathMem[PATH_MAX] = {0};
char *realPathRet = nullptr;
realPathRet = realpath(path.data(), realPathMem);
if (realPathRet == nullptr) {
std::cout << "File: " << path << " is not exist.";
return "";
}
std::string realPath(realPathMem);
std::cout << path << " realpath is: " << realPath << std::endl;
return realPath;
}

@@ -1,65 +0,0 @@
# Copyright 2021 Huawei Technologies Co., Ltd
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ============================================================================
"""eval resnet."""
import argparse
from mindspore import context
from mindspore.common import set_seed
from mindspore.train.model import Model
from mindspore.train.serialization import load_checkpoint, load_param_into_net
from src.CrossEntropySmooth import CrossEntropySmooth
from src.resnet import resnet152 as resnet
from src.config import config5 as config
from src.dataset import create_dataset2 as create_dataset
parser = argparse.ArgumentParser(description='Image classification')
parser.add_argument('--checkpoint_path', type=str, default=None, help='Checkpoint file path')
parser.add_argument('--data_url', type=str, default=None, help='Dataset path')
args_opt = parser.parse_args()
set_seed(1)
if __name__ == '__main__':
target = "Ascend"
# init context
context.set_context(mode=context.GRAPH_MODE, device_target=target, save_graphs=False)
# create dataset
local_data_path = args_opt.data_url
print('Download data.')
dataset = create_dataset(dataset_path=local_data_path, do_train=False, batch_size=config.batch_size,
target=target)
step_size = dataset.get_dataset_size()
# define net
net = resnet(class_num=config.class_num)
ckpt_name = args_opt.checkpoint_path
param_dict = load_checkpoint(ckpt_name)
load_param_into_net(net, param_dict)
net.set_train(False)
# define loss, model
if not config.use_label_smooth:
config.label_smooth_factor = 0.0
loss = CrossEntropySmooth(sparse=True, reduction='mean',
smooth_factor=config.label_smooth_factor, num_classes=config.class_num)
# define model
model = Model(net, loss_fn=loss, metrics={'top_1_accuracy', 'top_5_accuracy'})
# eval model
res = model.eval(dataset)
print("result:", res, "ckpt=", ckpt_name)

@@ -1,47 +0,0 @@
# Copyright 2021 Huawei Technologies Co., Ltd
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ============================================================================
"""Export ResNet152
suggest run as python export.py --ckpt_file [ckpt path] --file_name [filename] --file_format [file format]
"""
import argparse
import numpy as np
from mindspore import Tensor, context
from mindspore.train.serialization import export, load_checkpoint
from src.resnet import resnet152 as resnet
from src.config import config5 as config
parser = argparse.ArgumentParser(description="resnet152 export ")
parser.add_argument("--device_id", type=int, default=0, help="Device id")
parser.add_argument("--ckpt_file", type=str, required=True, help="checkpoint file path")
parser.add_argument("--dataset", type=str, default="imagenet2012", help="Dataset, either cifar10 or imagenet2012")
parser.add_argument("--width", type=int, default=224, help="input width")
parser.add_argument("--height", type=int, default=224, help="input height")
parser.add_argument("--file_name", type=str, default='resnet152', help="output file name")
parser.add_argument("--file_format", type=str, choices=['AIR', 'ONNX', 'MINDIR'], default='AIR', help="file format")
parser.add_argument("--device_target", type=str, choices=['Ascend', 'GPU', 'CPU'], default='Ascend', help="target")
args = parser.parse_args()
context.set_context(mode=context.GRAPH_MODE, device_target=args.device_target)
if __name__ == "__main__":
target = args.device_target
if target != "GPU":
context.set_context(device_id=args.device_id)
# define net
network = resnet(class_num=config.class_num)
param_dict = load_checkpoint(args.ckpt_file, net=network)
network.set_train(False)
input_data = Tensor(np.zeros([1, 3, args.height, args.width]).astype(np.float32))
export(network, input_data, file_name=args.file_name, file_format=args.file_format)

@@ -1,51 +0,0 @@
# Copyright 2021 Huawei Technologies Co., Ltd
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ============================================================================
"""postprocess for 310 inference"""
import os
import json
import argparse
import numpy as np
from mindspore.nn import Top1CategoricalAccuracy, Top5CategoricalAccuracy
parser = argparse.ArgumentParser(description="postprocess")
parser.add_argument("--result_dir", type=str, required=True, help="result files path.")
parser.add_argument("--label_dir", type=str, required=True, help="label file path (JSON).")
parser.add_argument('--dataset_name', type=str, choices=["cifar10", "imagenet2012"], default="imagenet2012")
args = parser.parse_args()
def calcul_acc(lab, preds):
    return sum(1 for x, y in zip(lab, preds) if x == y) / len(lab)
if __name__ == '__main__':
    batch_size = 1
    top1_acc = Top1CategoricalAccuracy()
    rst_path = args.result_dir
    label_list = []
    pred_list = []
    top5_acc = Top5CategoricalAccuracy()
    file_list = os.listdir(rst_path)
    with open(args.label_dir, "r") as label_file:
        labels = json.load(label_file)
    for f in file_list:
        # result files are named "<image name>_0.bin"; map each back to its JPEG name
        label = f.split("_0.bin")[0] + ".JPEG"
        label_list.append(labels[label])
        pred = np.fromfile(os.path.join(rst_path, f), np.float32)
        pred = pred.reshape(batch_size, int(pred.shape[0] / batch_size))
        top1_acc.update(pred, [labels[label],])
        top5_acc.update(pred, [labels[label],])
    print("Top1 acc: ", top1_acc.eval())
    print("Top5 acc: ", top5_acc.eval())
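The postprocess step feeds each per-image logit file to MindSpore's `Top1CategoricalAccuracy` and `Top5CategoricalAccuracy`. As a rough sketch of what those metrics measure, the NumPy function below computes top-k accuracy from a matrix of logits and integer labels. The name `topk_accuracy` is hypothetical and not part of this repository, and ties between equal logits are broken by `np.argsort` order.

```python
import numpy as np


def topk_accuracy(logits, labels, k):
    """Fraction of rows whose true label is among the k highest logits.

    logits: float array of shape (N, num_classes)
    labels: int array of shape (N,)
    """
    logits = np.asarray(logits, dtype=np.float32)
    labels = np.asarray(labels)
    # column indices of the k largest logits in each row (order within the k does not matter)
    topk = np.argsort(logits, axis=1)[:, -k:]
    hits = (topk == labels[:, None]).any(axis=1)
    return float(hits.mean())


if __name__ == "__main__":
    logits = np.array([
        [0.1, 0.7, 0.2],  # true label 1 is the highest logit: top-1 hit
        [0.5, 0.3, 0.2],  # true label 1 is second highest: top-1 miss, top-2 hit
    ])
    labels = np.array([1, 1])
    print(topk_accuracy(logits, labels, k=1))  # 0.5
    print(topk_accuracy(logits, labels, k=2))  # 1.0
```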

View File

@ -1,52 +0,0 @@
# Copyright 2021 Huawei Technologies Co., Ltd
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ============================================================================
"""preprocess"""
import os
import argparse
import json
from src.config import config2 as cfg
def create_label(result_path, dir_path):
    print("[WARNING] Create imagenet label. Currently only used for ImageNet2012!")
    dirs = os.listdir(dir_path)
    file_list = []
    for file in dirs:
        file_list.append(file)
    file_list = sorted(file_list)
    total = 0
    img_label = {}
    for i, file_dir in enumerate(file_list):
        files = os.listdir(os.path.join(dir_path, file_dir))
        for f in files:
            img_label[f] = i
        total += len(files)
    json_file = os.path.join(result_path, "imagenet_label.json")
    with open(json_file, "w+") as label:
        json.dump(img_label, label)
    print("[INFO] Completed! Total {} images.".format(total))
parser = argparse.ArgumentParser('preprocess')
parser.add_argument('--dataset', type=str, choices=["cifar10", "imagenet2012"], default="cifar10")
parser.add_argument('--data_path', type=str, default='', help='eval data dir')
parser.add_argument('--result_path', type=str, default='./preprocess_Result/', help='result path')
args = parser.parse_args()
args.per_batch_size = cfg.batch_size
if __name__ == "__main__":
    create_label(args.result_path, args.data_path)
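`create_label` gives every image the index of its class folder, with folders sorted by name, and stores the mapping keyed by bare file name. The self-contained sketch below reproduces that mapping on a throwaway directory tree; the helper name `build_label_map` and the folder and file names are made up for illustration.

```python
import os
import tempfile


def build_label_map(dir_path):
    """Map each image file name to the index of its class folder (folders sorted by name)."""
    img_label = {}
    for idx, class_dir in enumerate(sorted(os.listdir(dir_path))):
        for fname in os.listdir(os.path.join(dir_path, class_dir)):
            img_label[fname] = idx
    return img_label


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        # hypothetical two-class tree: "n02" is created first but sorts after "n01"
        for class_dir, fname in [("n02", "b.JPEG"), ("n01", "a.JPEG")]:
            os.makedirs(os.path.join(root, class_dir))
            open(os.path.join(root, class_dir, fname), "w").close()
        print(build_label_map(root))  # {'a.JPEG': 0, 'b.JPEG': 1}
```

Because the keys are bare file names, this scheme only works if file names are unique across class folders; a duplicate name would silently overwrite the earlier entry.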

View File

@ -1,3 +0,0 @@
numpy
scipy
easydict

View File

@ -1,89 +0,0 @@
#!/bin/bash
# Copyright 2021 Huawei Technologies Co., Ltd
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ============================================================================
echo "=============================================================================================================="
echo "Please run the script as: "
echo "bash run_distribute_train.sh RANK_TABLE_FILE DATA_PATH PRETRAINED_CKPT_PATH(optional)"
echo "For example: bash run_distribute_train.sh hccl_8p_01234567_127.0.0.1.json /path/dataset"
echo "It is better to use the absolute path."
echo "=============================================================================================================="
get_real_path(){
if [ "${1:0:1}" == "/" ]; then
echo "$1"
else
echo "$(realpath -m $PWD/$1)"
fi
}
PATH1=$(get_real_path $1)
PATH2=$(get_real_path $2)
if [ $# == 3 ]
then
PATH3=$(get_real_path $3)
fi
if [ ! -f $PATH1 ]
then
echo "error: RANK_TABLE_FILE=$PATH1 is not a file"
exit 1
fi
if [ ! -d $PATH2 ]
then
echo "error: DATA_PATH=$PATH2 is not a directory"
exit 1
fi
if [ $# == 3 ] && [ ! -f $PATH3 ]
then
echo "error: PRETRAINED_CKPT_PATH=$PATH3 is not a file"
exit 1
fi
ulimit -u unlimited
export DEVICE_NUM=8
export RANK_SIZE=8
export RANK_TABLE_FILE=$PATH1
DATA_PATH=$2
export DATA_PATH=${DATA_PATH}
for((i=0;i<${RANK_SIZE};i++))
do
rm -rf device$i
mkdir device$i
cp ../*.py ./device$i
cp *.sh ./device$i
cp -r ../src ./device$i
cd ./device$i || exit
export DEVICE_ID=$i
export RANK_ID=$i
echo "start training for device $i"
env > env$i.log
if [ $# == 2 ]
then
python train.py --run_distribute=True --data_url=$PATH2 &> train.log &
fi
if [ $# == 3 ]
then
python train.py --run_distribute=True --data_url=$PATH2 --pre_trained=$PATH3 &> train.log &
fi
cd ../
done

View File

@ -1,64 +0,0 @@
#!/bin/bash
# Copyright 2021 Huawei Technologies Co., Ltd
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ============================================================================
echo "=============================================================================================================="
echo "Please run the script as: "
echo "bash run_eval.sh DATA_PATH CHECKPOINT_PATH"
echo "For example: bash run_eval.sh /path/dataset Resnet152-140_5004.ckpt"
echo "It is better to use the absolute path."
echo "=============================================================================================================="
get_real_path(){
if [ "${1:0:1}" == "/" ]; then
echo "$1"
else
echo "$(realpath -m $PWD/$1)"
fi
}
PATH1=$(get_real_path $1)
PATH2=$(get_real_path $2)
if [ ! -d $PATH1 ]
then
echo "error: DATASET_PATH=$PATH1 is not a directory"
exit 1
fi
if [ ! -f $PATH2 ]
then
echo "error: CHECKPOINT_PATH=$PATH2 is not a file"
exit 1
fi
ulimit -u unlimited
export DEVICE_NUM=1
export DEVICE_ID=0
export RANK_SIZE=$DEVICE_NUM
export RANK_ID=0
if [ -d "eval" ];
then
rm -rf ./eval
fi
mkdir ./eval
cp ../*.py ./eval
cp *.sh ./eval
cp -r ../src ./eval
cd ./eval || exit
env > env.log
echo "start evaluation for device $DEVICE_ID"
python eval.py --data_url=$PATH1 --checkpoint_path=$PATH2 &> eval.log &
cd ..

View File

@ -1,130 +0,0 @@
#!/bin/bash
# Copyright 2021 Huawei Technologies Co., Ltd
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ============================================================================
if [[ $# -lt 2 || $# -gt 3 ]]; then
echo "Usage: bash run_infer_310.sh [MINDIR_PATH] [DATASET_PATH] [DEVICE_ID]
DEVICE_ID is optional; if it is not given, it defaults to 0"
exit 1
fi
get_real_path(){
if [ "${1:0:1}" == "/" ]; then
echo "$1"
else
echo "$(realpath -m $PWD/$1)"
fi
}
model=$(get_real_path $1)
dataset_path=$(get_real_path $2)
dataset_name="imagenet2012"
device_id=0
if [ $# == 3 ]; then
device_id=$3
fi
echo "mindir name: "$model
echo "dataset path: "$dataset_path
echo "device id: "$device_id
export ASCEND_HOME=/usr/local/Ascend/
if [ -d ${ASCEND_HOME}/ascend-toolkit ]; then
export PATH=$ASCEND_HOME/ascend-toolkit/latest/fwkacllib/ccec_compiler/bin:$ASCEND_HOME/ascend-toolkit/latest/atc/bin:$PATH
export LD_LIBRARY_PATH=/usr/local/lib:$ASCEND_HOME/ascend-toolkit/latest/atc/lib64:$ASCEND_HOME/ascend-toolkit/latest/fwkacllib/lib64:$ASCEND_HOME/driver/lib64:$ASCEND_HOME/add-ons:$LD_LIBRARY_PATH
export TBE_IMPL_PATH=$ASCEND_HOME/ascend-toolkit/latest/opp/op_impl/built-in/ai_core/tbe
export PYTHONPATH=${TBE_IMPL_PATH}:$ASCEND_HOME/ascend-toolkit/latest/fwkacllib/python/site-packages:$PYTHONPATH
export ASCEND_OPP_PATH=$ASCEND_HOME/ascend-toolkit/latest/opp
else
export PATH=$ASCEND_HOME/atc/ccec_compiler/bin:$ASCEND_HOME/atc/bin:$PATH
export LD_LIBRARY_PATH=/usr/local/lib:$ASCEND_HOME/atc/lib64:$ASCEND_HOME/acllib/lib64:$ASCEND_HOME/driver/lib64:$ASCEND_HOME/add-ons:$LD_LIBRARY_PATH
export PYTHONPATH=$ASCEND_HOME/atc/python/site-packages:$PYTHONPATH
export ASCEND_OPP_PATH=$ASCEND_HOME/opp
fi
export ASCEND_HOME=/usr/local/Ascend
export PATH=$ASCEND_HOME/fwkacllib/ccec_compiler/bin:$ASCEND_HOME/fwkacllib/bin:$ASCEND_HOME/toolkit/bin:$PATH
export LD_LIBRARY_PATH=/usr/local/lib/:/usr/local/fwkacllib/lib64:$ASCEND_HOME/driver/lib64:$ASCEND_HOME/add-ons:/usr/local/Ascend/toolkit/lib64:$LD_LIBRARY_PATH
export PYTHONPATH=$ASCEND_HOME/fwkacllib/python/site-packages
export PATH=/usr/local/python375/bin:$PATH
export NPU_HOST_LIB=/usr/local/Ascend/acllib/lib64/stub
export ASCEND_OPP_PATH=/usr/local/Ascend/opp
export ASCEND_AICPU_PATH=/usr/local/Ascend
export LD_LIBRARY_PATH=/usr/local/lib64/:$LD_LIBRARY_PATH
function preprocess_data()
{
if [ -d preprocess_Result ]; then
rm -rf ./preprocess_Result
fi
mkdir preprocess_Result
python3.7 ../preprocess.py --dataset=$dataset_name --data_path=$dataset_path --result_path=./preprocess_Result/
}
function compile_app()
{
cd ../ascend310_infer/ || exit
bash build.sh &> build.log
}
function infer()
{
cd - || exit
if [ -d result_Files ]; then
rm -rf ./result_Files
fi
if [ -d time_Result ]; then
rm -rf ./time_Result
fi
mkdir result_Files
mkdir time_Result
../ascend310_infer/out/main --mindir_path=$model --dataset_name=$dataset_name --input0_path=$dataset_path --device_id=$device_id &> infer.log
}
function cal_acc()
{
python3.7 ../postprocess.py --result_dir=./result_Files --label_dir=./preprocess_Result/imagenet_label.json &> acc.log
}
preprocess_data
if [ $? -ne 0 ]; then
echo "preprocess dataset failed"
exit 1
fi
compile_app
if [ $? -ne 0 ]; then
echo "compile app code failed"
exit 1
fi
infer
if [ $? -ne 0 ]; then
echo "execute inference failed"
exit 1
fi
cal_acc
if [ $? -ne 0 ]; then
echo "calculate accuracy failed"
exit 1
fi

View File

@ -1,77 +0,0 @@
#!/bin/bash
# Copyright 2021 Huawei Technologies Co., Ltd
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ============================================================================
echo "=============================================================================================================="
echo "Please run the script as: "
echo "bash run_standalone_train.sh DATA_PATH PRETRAINED_CKPT_PATH(optional)"
echo "For example: bash run_standalone_train.sh /path/dataset"
echo "It is better to use the absolute path."
echo "=============================================================================================================="
get_real_path(){
if [ "${1:0:1}" == "/" ]; then
echo "$1"
else
echo "$(realpath -m $PWD/$1)"
fi
}
PATH1=$(get_real_path $1)
if [ $# == 2 ]
then
PATH2=$(get_real_path $2)
fi
if [ ! -d $PATH1 ]
then
echo "error: DATASET_PATH=$PATH1 is not a directory"
exit 1
fi
if [ $# == 2 ] && [ ! -f $PATH2 ]
then
echo "error: PRETRAINED_CKPT_PATH=$PATH2 is not a file"
exit 1
fi
ulimit -u unlimited
export DEVICE_NUM=1
export DEVICE_ID=6
export RANK_SIZE=$DEVICE_NUM
export RANK_ID=0
if [ -d "train" ];
then
rm -rf ./train
fi
mkdir ./train
cp ../*.py ./train
cp *.sh ./train
cp -r ../src ./train
cd ./train || exit
echo "start training for device $DEVICE_ID"
env > env.log
if [ $# == 1 ]
then
python train.py --data_url=$PATH1 &> train.log &
fi
if [ $# == 2 ]
then
python train.py --data_url=$PATH1 --pre_trained=$PATH2 &> train.log &
fi
cd ..

View File

@ -1,38 +0,0 @@
# Copyright 2021 Huawei Technologies Co., Ltd
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ============================================================================
"""define loss function for network"""
import mindspore.nn as nn
from mindspore import Tensor
from mindspore.common import dtype as mstype
from mindspore.nn.loss.loss import LossBase
from mindspore.ops import functional as F
from mindspore.ops import operations as P
class CrossEntropySmooth(LossBase):
    """Cross-entropy loss with optional label smoothing."""
    def __init__(self, sparse=True, reduction='mean', smooth_factor=0., num_classes=1000):
        super(CrossEntropySmooth, self).__init__()
        self.onehot = P.OneHot()
        self.sparse = sparse
        # smoothed targets: the true class gets 1 - smooth_factor,
        # every other class gets smooth_factor / (num_classes - 1)
        self.on_value = Tensor(1.0 - smooth_factor, mstype.float32)
        self.off_value = Tensor(1.0 * smooth_factor / (num_classes - 1), mstype.float32)
        self.ce = nn.SoftmaxCrossEntropyWithLogits(reduction=reduction)
    def construct(self, logit, label):
        if self.sparse:
            # convert integer labels to smoothed one-hot vectors
            label = self.onehot(label, F.shape(logit)[1], self.on_value, self.off_value)
        loss = self.ce(logit, label)
        return loss
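To make the smoothing concrete, the NumPy sketch below builds the same on/off target values as `CrossEntropySmooth` and evaluates a softmax cross-entropy against them. It is an illustration under the assumption that targets are computed exactly as in `__init__` above; the function names are hypothetical, and it does not call MindSpore. With the ImageNet configs (`class_num=1001`, `label_smooth_factor=0.1`) the true class would get 0.9 and each other class 0.1 / 1000.

```python
import numpy as np


def smoothed_one_hot(labels, num_classes, smooth_factor):
    """Smoothed one-hot targets using the same on/off values as CrossEntropySmooth."""
    on_value = 1.0 - smooth_factor
    off_value = smooth_factor / (num_classes - 1)
    targets = np.full((len(labels), num_classes), off_value, dtype=np.float32)
    targets[np.arange(len(labels)), labels] = on_value
    return targets


def softmax_cross_entropy(logits, targets):
    """Mean cross-entropy between softmax(logits) and (soft) targets."""
    shifted = logits - logits.max(axis=1, keepdims=True)  # subtract row max for numerical stability
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return float(-(targets * log_probs).sum(axis=1).mean())


if __name__ == "__main__":
    t = smoothed_one_hot(np.array([2]), num_classes=5, smooth_factor=0.1)
    print(t)  # true class gets 0.9, each of the other 4 classes gets 0.025
```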

View File

@ -1,124 +0,0 @@
# Copyright 2021 Huawei Technologies Co., Ltd
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ============================================================================
"""
network config setting, will be used in train.py and eval.py
"""
from easydict import EasyDict as ed
# config for resnet50, cifar10
config1 = ed({
"class_num": 10,
"batch_size": 32,
"loss_scale": 1024,
"momentum": 0.9,
"weight_decay": 1e-4,
"epoch_size": 90,
"pretrain_epoch_size": 0,
"save_checkpoint": True,
"save_checkpoint_epochs": 5,
"keep_checkpoint_max": 10,
"save_checkpoint_path": "./",
"warmup_epochs": 5,
"lr_decay_mode": "poly",
"lr_init": 0.01,
"lr_end": 0.00001,
"lr_max": 0.1
})
# config for resnet50, imagenet2012
config2 = ed({
"class_num": 1001,
"batch_size": 256,
"loss_scale": 1024,
"momentum": 0.9,
"weight_decay": 1e-4,
"epoch_size": 90,
"pretrain_epoch_size": 0,
"save_checkpoint": True,
"save_checkpoint_epochs": 5,
"keep_checkpoint_max": 10,
"save_checkpoint_path": "./",
"warmup_epochs": 0,
"lr_decay_mode": "linear",
"use_label_smooth": True,
"label_smooth_factor": 0.1,
"lr_init": 0,
"lr_max": 0.8,
"lr_end": 0.0
})
# config for resnet101, imagenet2012
config3 = ed({
"class_num": 1001,
"batch_size": 32,
"loss_scale": 1024,
"momentum": 0.9,
"weight_decay": 1e-4,
"epoch_size": 120,
"pretrain_epoch_size": 0,
"save_checkpoint": True,
"save_checkpoint_epochs": 5,
"keep_checkpoint_max": 10,
"save_checkpoint_path": "./",
"warmup_epochs": 0,
"lr_decay_mode": "cosine",
"use_label_smooth": True,
"label_smooth_factor": 0.1,
"lr": 0.1
})
# config for se-resnet50, imagenet2012
config4 = ed({
"class_num": 1001,
"batch_size": 32,
"loss_scale": 1024,
"momentum": 0.9,
"weight_decay": 1e-4,
"epoch_size": 28,
"train_epoch_size": 24,
"pretrain_epoch_size": 0,
"save_checkpoint": True,
"save_checkpoint_epochs": 4,
"keep_checkpoint_max": 10,
"save_checkpoint_path": "./",
"warmup_epochs": 3,
"lr_decay_mode": "cosine",
"use_label_smooth": True,
"label_smooth_factor": 0.1,
"lr_init": 0.0,
"lr_max": 0.3,
"lr_end": 0.0001
})
# config for resnet152, imagenet2012
config5 = ed({
"class_num": 1001,
"batch_size": 32,
"loss_scale": 1024,
"momentum": 0.9,
"weight_decay": 1e-4,
"epoch_size": 140,
"save_checkpoint": True,
"save_checkpoint_epochs": 5,
"keep_checkpoint_max": 10,
"save_checkpoint_path": "./",
"warmup_epochs": 0,
"lr_decay_mode": "steps",
"use_label_smooth": True,
"label_smooth_factor": 0.1,
"lr_init": 0.0,
"lr_max": 0.1,
"lr_end": 0.0001
})
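One way to sanity-check these settings is to derive steps per epoch and total steps. The sketch below assumes the standard ILSVRC2012 training split of 1,281,167 images, 8 devices (as exported by `run_distribute_train.sh`), config5's per-device `batch_size` of 32, and `drop_remainder=True` as in the dataset code. Under those assumptions it gives 5004 steps per epoch, which is consistent with the `Resnet152-140_5004.ckpt` example name in `run_eval.sh`. Rounding each shard up is an approximation of how the data is split across devices.

```python
import math

IMAGENET_TRAIN_IMAGES = 1281167  # assumption: standard ILSVRC2012 train split size
DEVICE_NUM = 8                   # as exported by run_distribute_train.sh
BATCH_SIZE = 32                  # config5["batch_size"], per device
EPOCH_SIZE = 140                 # config5["epoch_size"]


def steps_per_epoch(num_images, device_num, batch_size):
    # each device reads roughly 1/device_num of the data (rounded up here)
    per_device = math.ceil(num_images / device_num)
    # drop_remainder=True discards the final partial batch
    return per_device // batch_size


if __name__ == "__main__":
    spe = steps_per_epoch(IMAGENET_TRAIN_IMAGES, DEVICE_NUM, BATCH_SIZE)
    print(spe)               # 5004
    print(spe * EPOCH_SIZE)  # 700560
```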

View File

@ -1,300 +0,0 @@
# Copyright 2021 Huawei Technologies Co., Ltd
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ============================================================================
"""
create train or eval dataset.
"""
import os
import mindspore.common.dtype as mstype
import mindspore.dataset.engine as de
import mindspore.dataset.vision.c_transforms as C
import mindspore.dataset.transforms.c_transforms as C2
from mindspore.communication.management import init, get_rank, get_group_size
def create_dataset1(dataset_path, do_train, repeat_num=1, batch_size=32, target="Ascend", distribute=False):
    """
    create a train or evaluate cifar10 dataset for resnet50
    Args:
        dataset_path(string): the path of dataset.
        do_train(bool): whether dataset is used for train or eval.
        repeat_num(int): the repeat times of dataset. Default: 1
        batch_size(int): the batch size of dataset. Default: 32
        target(str): the device target. Default: Ascend
        distribute(bool): data for distribute or not. Default: False
    Returns:
        dataset
    """
    if not do_train:
        dataset_path = os.path.join(dataset_path, 'eval')
    else:
        dataset_path = os.path.join(dataset_path, 'train')
    if target == "Ascend":
        device_num, rank_id = _get_rank_info()
    else:
        if distribute:
            init()
            rank_id = get_rank()
            device_num = get_group_size()
        else:
            device_num = 1
    if device_num == 1:
        ds = de.Cifar10Dataset(dataset_path, num_parallel_workers=8, shuffle=True)
    else:
        ds = de.Cifar10Dataset(dataset_path, num_parallel_workers=8, shuffle=True,
                               num_shards=device_num, shard_id=rank_id)
    # define map operations
    trans = []
    if do_train:
        trans += [
            C.RandomCrop((32, 32), (4, 4, 4, 4)),
            C.RandomHorizontalFlip(prob=0.5)
        ]
    trans += [
        C.Resize((224, 224)),
        C.Rescale(1.0 / 255.0, 0.0),
        C.Normalize([0.4914, 0.4822, 0.4465], [0.2023, 0.1994, 0.2010]),
        C.HWC2CHW()
    ]
    type_cast_op = C2.TypeCast(mstype.int32)
    ds = ds.map(operations=type_cast_op, input_columns="label", num_parallel_workers=8)
    ds = ds.map(operations=trans, input_columns="image", num_parallel_workers=8)
    # apply batch operations
    ds = ds.batch(batch_size, drop_remainder=True)
    # apply dataset repeat operation
    ds = ds.repeat(repeat_num)
    return ds
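The last three image operations of the CIFAR-10 pipeline (`Rescale`, `Normalize`, `HWC2CHW`) are plain arithmetic. The NumPy sketch below reproduces them on a single image, assuming `Normalize` computes `(x - mean) / std` per channel; it leaves out `Resize` and the random augmentations and is an illustration, not the MindSpore implementation. The ImageNet pipelines skip `Rescale` and instead multiply mean and std by 255, which gives the same result.

```python
import numpy as np

CIFAR_MEAN = np.array([0.4914, 0.4822, 0.4465], dtype=np.float32)
CIFAR_STD = np.array([0.2023, 0.1994, 0.2010], dtype=np.float32)


def rescale_normalize_chw(image_hwc_uint8):
    """Rescale to [0, 1], normalize per channel, then transpose HWC -> CHW."""
    x = image_hwc_uint8.astype(np.float32) * (1.0 / 255.0)  # like Rescale(1/255, 0)
    x = (x - CIFAR_MEAN) / CIFAR_STD                       # like Normalize(mean, std), broadcast over H and W
    return np.transpose(x, (2, 0, 1))                      # like HWC2CHW


if __name__ == "__main__":
    img = np.full((224, 224, 3), 255, dtype=np.uint8)  # an all-white image
    out = rescale_normalize_chw(img)
    print(out.shape)  # (3, 224, 224)
```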
def create_dataset2(dataset_path, do_train, repeat_num=1, batch_size=32, target="Ascend", distribute=False):
    """
    create a train or eval imagenet2012 dataset for resnet50
    Args:
        dataset_path(string): the path of dataset.
        do_train(bool): whether dataset is used for train or eval.
        repeat_num(int): the repeat times of dataset. Default: 1
        batch_size(int): the batch size of dataset. Default: 32
        target(str): the device target. Default: Ascend
        distribute(bool): data for distribute or not. Default: False
    Returns:
        dataset
    """
    if not do_train:
        dataset_path = os.path.join(dataset_path, 'val')
    else:
        dataset_path = os.path.join(dataset_path, 'train')
    if target == "Ascend":
        device_num, rank_id = _get_rank_info()
    else:
        if distribute:
            init()
            rank_id = get_rank()
            device_num = get_group_size()
        else:
            device_num = 1
    if device_num == 1:
        ds = de.ImageFolderDataset(dataset_path, num_parallel_workers=8, shuffle=True)
    else:
        ds = de.ImageFolderDataset(dataset_path, num_parallel_workers=8, shuffle=True,
                                   num_shards=device_num, shard_id=rank_id)
    image_size = 224
    mean = [0.485 * 255, 0.456 * 255, 0.406 * 255]
    std = [0.229 * 255, 0.224 * 255, 0.225 * 255]
    # define map operations
    if do_train:
        trans = [
            C.RandomCropDecodeResize(image_size, scale=(0.08, 1.0), ratio=(0.75, 1.333)),
            C.RandomHorizontalFlip(prob=0.5),
            C.Normalize(mean=mean, std=std),
            C.HWC2CHW()
        ]
    else:
        trans = [
            C.Decode(),
            C.Resize(256),
            C.CenterCrop(image_size),
            C.Normalize(mean=mean, std=std),
            C.HWC2CHW()
        ]
    type_cast_op = C2.TypeCast(mstype.int32)
    ds = ds.map(operations=trans, input_columns="image", num_parallel_workers=8)
    ds = ds.map(operations=type_cast_op, input_columns="label", num_parallel_workers=8)
    # apply batch operations
    ds = ds.batch(batch_size, drop_remainder=True)
    # apply dataset repeat operation
    ds = ds.repeat(repeat_num)
    return ds
def create_dataset3(dataset_path, do_train, repeat_num=1, batch_size=32, target="Ascend", distribute=False):
    """
    create a train or eval imagenet2012 dataset for resnet101
    Args:
        dataset_path(string): the path of dataset.
        do_train(bool): whether dataset is used for train or eval.
        repeat_num(int): the repeat times of dataset. Default: 1
        batch_size(int): the batch size of dataset. Default: 32
        target(str): the device target. Default: Ascend
        distribute(bool): data for distribute or not. Default: False
    Returns:
        dataset
    """
    if not do_train:
        dataset_path = os.path.join(dataset_path, 'val')
    else:
        dataset_path = os.path.join(dataset_path, 'train')
    if target == "Ascend":
        device_num, rank_id = _get_rank_info()
    else:
        if distribute:
            init()
            rank_id = get_rank()
            device_num = get_group_size()
        else:
            device_num = 1
            rank_id = 1
    if device_num == 1:
        ds = de.ImageFolderDataset(dataset_path, num_parallel_workers=8, shuffle=True)
    else:
        ds = de.ImageFolderDataset(dataset_path, num_parallel_workers=8, shuffle=True,
                                   num_shards=device_num, shard_id=rank_id)
    image_size = 224
    mean = [0.475 * 255, 0.451 * 255, 0.392 * 255]
    std = [0.275 * 255, 0.267 * 255, 0.278 * 255]
    # define map operations
    if do_train:
        trans = [
            C.RandomCropDecodeResize(image_size, scale=(0.08, 1.0), ratio=(0.75, 1.333)),
            C.RandomHorizontalFlip(rank_id / (rank_id + 1)),
            C.Normalize(mean=mean, std=std),
            C.HWC2CHW()
        ]
    else:
        trans = [
            C.Decode(),
            C.Resize(256),
            C.CenterCrop(image_size),
            C.Normalize(mean=mean, std=std),
            C.HWC2CHW()
        ]
    type_cast_op = C2.TypeCast(mstype.int32)
    ds = ds.map(operations=trans, input_columns="image", num_parallel_workers=8)
    ds = ds.map(operations=type_cast_op, input_columns="label", num_parallel_workers=8)
    # apply batch operations
    ds = ds.batch(batch_size, drop_remainder=True)
    # apply dataset repeat operation
    ds = ds.repeat(repeat_num)
    return ds
def create_dataset4(dataset_path, do_train, repeat_num=1, batch_size=32, target="Ascend", distribute=False):
    """
    create a train or eval imagenet2012 dataset for se-resnet50
    Args:
        dataset_path(string): the path of dataset.
        do_train(bool): whether dataset is used for train or eval.
        repeat_num(int): the repeat times of dataset. Default: 1
        batch_size(int): the batch size of dataset. Default: 32
        target(str): the device target. Default: Ascend
        distribute(bool): data for distribute or not. Default: False
    Returns:
        dataset
    """
    if target == "Ascend":
        device_num, rank_id = _get_rank_info()
    else:
        if distribute:
            init()
            rank_id = get_rank()
            device_num = get_group_size()
        else:
            device_num = 1
    if device_num == 1:
        ds = de.ImageFolderDataset(dataset_path, num_parallel_workers=12, shuffle=True)
    else:
        ds = de.ImageFolderDataset(dataset_path, num_parallel_workers=12, shuffle=True,
                                   num_shards=device_num, shard_id=rank_id)
    image_size = 224
    mean = [123.68, 116.78, 103.94]
    std = [1.0, 1.0, 1.0]
    # define map operations
    if do_train:
        trans = [
            C.RandomCropDecodeResize(image_size, scale=(0.08, 1.0), ratio=(0.75, 1.333)),
            C.RandomHorizontalFlip(prob=0.5),
            C.Normalize(mean=mean, std=std),
            C.HWC2CHW()
        ]
    else:
        trans = [
            C.Decode(),
            C.Resize(292),
            C.CenterCrop(256),
            C.Normalize(mean=mean, std=std),
            C.HWC2CHW()
        ]
    type_cast_op = C2.TypeCast(mstype.int32)
    ds = ds.map(operations=trans, input_columns="image", num_parallel_workers=12)
    ds = ds.map(operations=type_cast_op, input_columns="label", num_parallel_workers=12)
    # apply batch operations
    ds = ds.batch(batch_size, drop_remainder=True)
    # apply dataset repeat operation
    ds = ds.repeat(repeat_num)
    return ds
def _get_rank_info():
    """
    get rank size and rank id
    """
    rank_size = int(os.environ.get("RANK_SIZE", 1))
    if rank_size > 1:
        rank_size = get_group_size()
        rank_id = get_rank()
    else:
        rank_size = 1
        rank_id = 0
    return rank_size, rank_id

View File

@ -1,198 +0,0 @@
# Copyright 2021 Huawei Technologies Co., Ltd
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ============================================================================
"""learning rate generator"""
import math
import numpy as np
def _generate_steps_lr(lr_init, lr_max, total_steps, warmup_steps):
    """
    Applies step decay to generate learning rate array: lr_max is divided by 10 at 20%, 50% and 70%
    of total_steps, and a fixed 0.00005 is used from 90% onwards.
    Args:
        lr_init(float): init learning rate (unused in this mode).
        lr_max(float): max learning rate.
        total_steps(int): all steps in training.
        warmup_steps(int): all steps in warmup epochs (unused in this mode).
    Returns:
        list, learning rate of each step.
    """
    # decay boundaries are step indices, not epochs
    decay_epoch_index = [0.2 * total_steps, 0.5 * total_steps, 0.7 * total_steps, 0.9 * total_steps]
    lr_each_step = []
    for i in range(total_steps):
        if i < decay_epoch_index[0]:
            lr = lr_max
        elif i < decay_epoch_index[1]:
            lr = lr_max * 0.1
        elif i < decay_epoch_index[2]:
            lr = lr_max * 0.01
        elif i < decay_epoch_index[3]:
            lr = lr_max * 0.001
        else:
            lr = 0.00005
        lr_each_step.append(lr)
    return lr_each_step
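`_generate_steps_lr` is a piecewise-constant schedule. The same lookup can be written per step with `bisect`, which makes the boundaries easier to see. This `step_lr` helper is a hypothetical illustration, not code from the repository; with config5 (`lr_max=0.1`) it steps through 0.1, 0.01, 0.001, 0.0001 and finally the hard-coded 0.00005.

```python
import bisect


def step_lr(step, total_steps, lr_max, final_lr=0.00005):
    """Learning rate at one step: divide by 10 at 20/50/70% of training, fixed final_lr from 90%."""
    boundaries = [0.2 * total_steps, 0.5 * total_steps, 0.7 * total_steps, 0.9 * total_steps]
    factors = [1.0, 0.1, 0.01, 0.001]
    # number of boundaries already reached (boundary <= step), matching the "i < boundary" checks
    idx = bisect.bisect_right(boundaries, step)
    return lr_max * factors[idx] if idx < len(factors) else final_lr


if __name__ == "__main__":
    print([step_lr(s, 100, 0.1) for s in (0, 20, 50, 70, 90)])
```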
def _generate_poly_lr(lr_init, lr_end, lr_max, total_steps, warmup_steps):
    """
    Applies polynomial decay to generate learning rate array.
    Args:
        lr_init(float): init learning rate.
        lr_end(float): end learning rate (unused; the rate decays towards 0).
        lr_max(float): max learning rate.
        total_steps(int): all steps in training.
        warmup_steps(int): all steps in warmup epochs.
    Returns:
        list, learning rate of each step.
    """
    lr_each_step = []
    if warmup_steps != 0:
        inc_each_step = (float(lr_max) - float(lr_init)) / float(warmup_steps)
    else:
        inc_each_step = 0
    for i in range(total_steps):
        if i < warmup_steps:
            lr = float(lr_init) + inc_each_step * float(i)
        else:
            base = (1.0 - (float(i) - float(warmup_steps)) / (float(total_steps) - float(warmup_steps)))
            lr = float(lr_max) * base * base
            if lr < 0.0:
                lr = 0.0
        lr_each_step.append(lr)
    return lr_each_step
def _generate_cosine_lr(lr_init, lr_end, lr_max, total_steps, warmup_steps):
    """
    Applies cosine decay (multiplied by a linear decay factor) to generate learning rate array.
    Args:
        lr_init(float): init learning rate.
        lr_end(float): end learning rate (unused in this mode).
        lr_max(float): max learning rate.
        total_steps(int): all steps in training.
        warmup_steps(int): all steps in warmup epochs.
    Returns:
        list, learning rate of each step.
    """
    decay_steps = total_steps - warmup_steps
    lr_each_step = []
    for i in range(total_steps):
        if i < warmup_steps:
            lr_inc = (float(lr_max) - float(lr_init)) / float(warmup_steps)
            lr = float(lr_init) + lr_inc * (i + 1)
        else:
            linear_decay = (total_steps - i) / decay_steps
            cosine_decay = 0.5 * (1 + math.cos(math.pi * 2 * 0.47 * i / decay_steps))
            decayed = linear_decay * cosine_decay + 0.00001
            lr = lr_max * decayed
        lr_each_step.append(lr)
    return lr_each_step
def _generate_liner_lr(lr_init, lr_end, lr_max, total_steps, warmup_steps):
    """
    Applies linear decay to generate learning rate array.
    Args:
        lr_init(float): init learning rate.
        lr_end(float): end learning rate
        lr_max(float): max learning rate.
        total_steps(int): all steps in training.
        warmup_steps(int): all steps in warmup epochs.
    Returns:
        list, learning rate of each step.
    """
    lr_each_step = []
    for i in range(total_steps):
        if i < warmup_steps:
            lr = lr_init + (lr_max - lr_init) * i / warmup_steps
        else:
            lr = lr_max - (lr_max - lr_end) * (i - warmup_steps) / (total_steps - warmup_steps)
        lr_each_step.append(lr)
    return lr_each_step
def get_lr(lr_init, lr_end, lr_max, warmup_epochs, total_epochs, steps_per_epoch, lr_decay_mode):
    """
    generate learning rate array
    Args:
        lr_init(float): init learning rate
        lr_end(float): end learning rate
        lr_max(float): max learning rate
        warmup_epochs(int): number of warmup epochs
        total_epochs(int): total epoch of training
        steps_per_epoch(int): steps of one epoch
        lr_decay_mode(string): learning rate decay mode: 'steps', 'poly' or 'cosine'; any other value selects linear decay
    Returns:
        np.array, learning rate array
    """
    lr_each_step = []
    total_steps = int(steps_per_epoch * total_epochs)
    # the schedule helpers compare warmup_steps against a step index, so convert epochs to steps
    warmup_steps = int(steps_per_epoch * warmup_epochs)
    if lr_decay_mode == 'steps':
        lr_each_step = _generate_steps_lr(lr_init, lr_max, total_steps, warmup_steps)
    elif lr_decay_mode == 'poly':
        lr_each_step = _generate_poly_lr(lr_init, lr_end, lr_max, total_steps, warmup_steps)
    elif lr_decay_mode == 'cosine':
        lr_each_step = _generate_cosine_lr(lr_init, lr_end, lr_max, total_steps, warmup_steps)
    else:
        lr_each_step = _generate_liner_lr(lr_init, lr_end, lr_max, total_steps, warmup_steps)
    lr_each_step = np.array(lr_each_step).astype(np.float32)
    return lr_each_step
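For the default (linear) mode, the schedule ramps linearly from `lr_init` to `lr_max` during warmup and then decays linearly to `lr_end`. The standalone sketch below reproduces that shape without MindSpore. Because the loop index counts steps, it converts `warmup_epochs` to steps by multiplying by `steps_per_epoch`. The function name is hypothetical.

```python
def linear_warmup_linear_decay(lr_init, lr_end, lr_max, warmup_epochs, total_epochs, steps_per_epoch):
    """Per-step learning rates: linear ramp lr_init -> lr_max, then linear decay lr_max -> lr_end."""
    total_steps = int(steps_per_epoch * total_epochs)
    warmup_steps = int(steps_per_epoch * warmup_epochs)  # warmup measured in steps, not epochs
    lrs = []
    for i in range(total_steps):
        if i < warmup_steps:
            lr = lr_init + (lr_max - lr_init) * i / warmup_steps
        else:
            lr = lr_max - (lr_max - lr_end) * (i - warmup_steps) / (total_steps - warmup_steps)
        lrs.append(lr)
    return lrs


if __name__ == "__main__":
    # 1 warmup epoch out of 3, 10 steps per epoch
    lrs = linear_warmup_linear_decay(0.0, 0.0, 1.0, 1, 3, 10)
    print(lrs[0], lrs[5], lrs[10], lrs[20])  # 0.0 0.5 1.0 0.5
```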
def linear_warmup_lr(current_step, warmup_steps, base_lr, init_lr):
    lr_inc = (float(base_lr) - float(init_lr)) / float(warmup_steps)
    lr = float(init_lr) + lr_inc * current_step
    return lr
def warmup_cosine_annealing_lr(lr, steps_per_epoch, warmup_epochs, max_epoch=120, global_step=0):
    """
    Generate a learning rate array with linear warmup followed by cosine annealing.
    Args:
        lr(float): base learning rate
        steps_per_epoch(int): number of steps per epoch
        warmup_epochs(int): number of warmup epochs
        max_epoch(int): total number of training epochs
        global_step(int): index of the first step to return (used when resuming training)
    Returns:
        np.array, learning rate array
    """
    base_lr = lr
    warmup_init_lr = 0
    total_steps = int(max_epoch * steps_per_epoch)
    warmup_steps = int(warmup_epochs * steps_per_epoch)
    decay_steps = total_steps - warmup_steps
    lr_each_step = []
    for i in range(total_steps):
        if i < warmup_steps:
            lr = linear_warmup_lr(i + 1, warmup_steps, base_lr, warmup_init_lr)
        else:
            linear_decay = (total_steps - i) / decay_steps
            cosine_decay = 0.5 * (1 + math.cos(math.pi * 2 * 0.47 * i / decay_steps))
            decayed = linear_decay * cosine_decay + 0.00001
            lr = base_lr * decayed
        lr_each_step.append(lr)
    lr_each_step = np.array(lr_each_step).astype(np.float32)
    learning_rate = lr_each_step[global_step:]
    return learning_rate
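For reference, the schedule produced by `warmup_cosine_annealing_lr` can be reproduced with the standard library alone. The sketch below is a plain-Python transcription (assumption: no NumPy, so it returns a list instead of a `float32` array). It keeps the source's constants, including the `2 * 0.47` factor inside the cosine and the `0.00001` floor, and, like the source, it indexes the cosine term by the global step `i` rather than by the number of steps since warmup ended.

```python
import math


def warmup_cosine_lr_list(lr, steps_per_epoch, warmup_epochs, max_epoch=120, global_step=0):
    """Plain-Python transcription of warmup_cosine_annealing_lr (returns a list)."""
    base_lr = lr
    warmup_init_lr = 0.0
    total_steps = int(max_epoch * steps_per_epoch)
    warmup_steps = int(warmup_epochs * steps_per_epoch)
    decay_steps = total_steps - warmup_steps
    lrs = []
    for i in range(total_steps):
        if i < warmup_steps:
            # Linear warmup: reaches base_lr exactly at the last warmup step (i + 1 == warmup_steps).
            lr_inc = (base_lr - warmup_init_lr) / warmup_steps
            lrs.append(warmup_init_lr + lr_inc * (i + 1))
        else:
            # Linear decay times a partial cosine curve, plus a small positive floor.
            linear_decay = (total_steps - i) / decay_steps
            cosine_decay = 0.5 * (1 + math.cos(math.pi * 2 * 0.47 * i / decay_steps))
            lrs.append(base_lr * (linear_decay * cosine_decay + 0.00001))
    # Drop the steps that were already taken when resuming from global_step.
    return lrs[global_step:]


schedule = warmup_cosine_lr_list(0.1, steps_per_epoch=10, warmup_epochs=2, max_epoch=10)
print(len(schedule))   # 100
print(schedule[19])    # last warmup step, approximately 0.1 (the base rate)
```

Note that `global_step` only slices the finished list, so resuming never changes the values of the remaining steps.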


@@ -1,407 +0,0 @@
"""ResNet"""
import numpy as np
from scipy.stats import truncnorm
import mindspore.nn as nn
import mindspore.common.dtype as mstype
from mindspore.ops import operations as P
from mindspore.ops import functional as F
from mindspore.common.tensor import Tensor
def _conv_variance_scaling_initializer(in_channel, out_channel, kernel_size):
    fan_in = in_channel * kernel_size * kernel_size
    scale = 1.0
    scale /= max(1., fan_in)
    stddev = (scale ** 0.5)
    mu, sigma = 0, stddev
    weight = truncnorm(-2, 2, loc=mu, scale=sigma).rvs(out_channel * in_channel * kernel_size * kernel_size)
    weight = np.reshape(weight, (out_channel, in_channel, kernel_size, kernel_size))
    return Tensor(weight, dtype=mstype.float32)
def _weight_variable(shape, factor=0.01):
    init_value = np.random.randn(*shape).astype(np.float32) * factor
    return Tensor(init_value)
def _conv3x3(in_channel, out_channel, stride=1, use_se=False):
    if use_se:
        weight = _conv_variance_scaling_initializer(in_channel, out_channel, kernel_size=3)
    else:
        weight_shape = (out_channel, in_channel, 3, 3)
        weight = _weight_variable(weight_shape)
    return nn.Conv2d(in_channel, out_channel,
                     kernel_size=3, stride=stride, padding=0, pad_mode='same', weight_init=weight)
def _conv1x1(in_channel, out_channel, stride=1, use_se=False):
    if use_se:
        weight = _conv_variance_scaling_initializer(in_channel, out_channel, kernel_size=1)
    else:
        weight_shape = (out_channel, in_channel, 1, 1)
        weight = _weight_variable(weight_shape)
    return nn.Conv2d(in_channel, out_channel,
                     kernel_size=1, stride=stride, padding=0, pad_mode='same', weight_init=weight)
def _conv7x7(in_channel, out_channel, stride=1, use_se=False):
    if use_se:
        weight = _conv_variance_scaling_initializer(in_channel, out_channel, kernel_size=7)
    else:
        weight_shape = (out_channel, in_channel, 7, 7)
        weight = _weight_variable(weight_shape)
    return nn.Conv2d(in_channel, out_channel,
                     kernel_size=7, stride=stride, padding=0, pad_mode='same', weight_init=weight)
def _bn(channel):
    return nn.BatchNorm2d(channel, eps=1e-4, momentum=0.9,
                          gamma_init=1, beta_init=0, moving_mean_init=0, moving_var_init=1)
def _bn_last(channel):
    return nn.BatchNorm2d(channel, eps=1e-4, momentum=0.9,
                          gamma_init=0, beta_init=0, moving_mean_init=0, moving_var_init=1)
def _fc(in_channel, out_channel, use_se=False):
    if use_se:
        weight = np.random.normal(loc=0, scale=0.01, size=out_channel*in_channel)
        weight = Tensor(np.reshape(weight, (out_channel, in_channel)), dtype=mstype.float32)
    else:
        weight_shape = (out_channel, in_channel)
        weight = _weight_variable(weight_shape)
    return nn.Dense(in_channel, out_channel, has_bias=True, weight_init=weight, bias_init=0)
class ResidualBlock(nn.Cell):
    """
    ResNet V1 residual block definition.
    Args:
        in_channel (int): Input channel.
        out_channel (int): Output channel.
        stride (int): Stride of the block, applied by the 3x3 convolution (or, in the SE variant,
            by a max pool after it). Default: 1.
        use_se (bool): Build the block for SE-ResNet50 (variance-scaling initialization and
            max-pool downsampling). Default: False.
        se_block (bool): Add a squeeze-and-excitation (SE) module after conv3. Default: False.
    Returns:
        Tensor, output tensor.
    Examples:
        # >>> ResidualBlock(3, 256, stride=2)
    """
    expansion = 4
    def __init__(self,
                 in_channel,
                 out_channel,
                 stride=1,
                 use_se=False,
                 se_block=False):
        super(ResidualBlock, self).__init__()
        self.stride = stride
        self.use_se = use_se
        self.se_block = se_block
        channel = out_channel // self.expansion
        self.conv1 = _conv1x1(in_channel, channel, stride=1, use_se=self.use_se)
        self.bn1 = _bn(channel)
        if self.use_se and self.stride != 1:
            self.e2 = nn.SequentialCell([_conv3x3(channel, channel, stride=1, use_se=True), _bn(channel),
                                         nn.ReLU(), nn.MaxPool2d(kernel_size=2, stride=2, pad_mode='same')])
        else:
            self.conv2 = _conv3x3(channel, channel, stride=stride, use_se=self.use_se)
            self.bn2 = _bn(channel)
        self.conv3 = _conv1x1(channel, out_channel, stride=1, use_se=self.use_se)
        self.bn3 = _bn_last(out_channel)
        if self.se_block:
            self.se_global_pool = P.ReduceMean(keep_dims=False)
            self.se_dense_0 = _fc(out_channel, int(out_channel/4), use_se=self.use_se)
            self.se_dense_1 = _fc(int(out_channel/4), out_channel, use_se=self.use_se)
            self.se_sigmoid = nn.Sigmoid()
            self.se_mul = P.Mul()
        self.relu = nn.ReLU()
        self.down_sample = False
        if stride != 1 or in_channel != out_channel:
            self.down_sample = True
        self.down_sample_layer = None
        if self.down_sample:
            if self.use_se:
                if stride == 1:
                    self.down_sample_layer = nn.SequentialCell([_conv1x1(in_channel, out_channel,
                                                                         stride, use_se=self.use_se), _bn(out_channel)])
                else:
                    self.down_sample_layer = nn.SequentialCell([nn.MaxPool2d(kernel_size=2, stride=2, pad_mode='same'),
                                                                _conv1x1(in_channel, out_channel, 1,
                                                                         use_se=self.use_se), _bn(out_channel)])
            else:
                self.down_sample_layer = nn.SequentialCell([_conv1x1(in_channel, out_channel, stride,
                                                                     use_se=self.use_se), _bn(out_channel)])
        self.add = P.TensorAdd()
    def construct(self, x):
        identity = x
        out = self.conv1(x)
        out = self.bn1(out)
        out = self.relu(out)
        if self.use_se and self.stride != 1:
            out = self.e2(out)
        else:
            out = self.conv2(out)
            out = self.bn2(out)
            out = self.relu(out)
        out = self.conv3(out)
        out = self.bn3(out)
        if self.se_block:
            out_se = out
            out = self.se_global_pool(out, (2, 3))
            out = self.se_dense_0(out)
            out = self.relu(out)
            out = self.se_dense_1(out)
            out = self.se_sigmoid(out)
            out = F.reshape(out, F.shape(out) + (1, 1))
            out = self.se_mul(out, out_se)
        if self.down_sample:
            identity = self.down_sample_layer(identity)
        out = self.add(out, identity)
        out = self.relu(out)
        return out
class ResNet(nn.Cell):
    """
    ResNet architecture.
    Args:
        block (Cell): Block type used to build each stage.
        layer_nums (list): Number of blocks in each of the four stages.
        in_channels (list): Input channels of each stage.
        out_channels (list): Output channels of each stage.
        strides (list): Stride of the first block in each stage.
        num_classes (int): Number of classes in the training data.
        use_se (bool): Build SE-ResNet50; an SE module is then added to the last block of layer3
            and of layer4. Default: False.
    Returns:
        Tensor, output tensor.
    Examples:
        # >>> ResNet(ResidualBlock,
        # >>>        [3, 4, 6, 3],
        # >>>        [64, 256, 512, 1024],
        # >>>        [256, 512, 1024, 2048],
        # >>>        [1, 2, 2, 2],
        # >>>        10)
    """
    def __init__(self,
                 block,
                 layer_nums,
                 in_channels,
                 out_channels,
                 strides,
                 num_classes,
                 use_se=False):
        super(ResNet, self).__init__()
        if not len(layer_nums) == len(in_channels) == len(out_channels) == 4:
            raise ValueError("the length of layer_nums, in_channels and out_channels lists must be 4!")
        self.use_se = use_se
        # SE modules are only used by the SE-ResNet50 variant
        self.se_block = use_se
        if self.use_se:
            self.conv1_0 = _conv3x3(3, 32, stride=2, use_se=self.use_se)
            self.bn1_0 = _bn(32)
            self.conv1_1 = _conv3x3(32, 32, stride=1, use_se=self.use_se)
            self.bn1_1 = _bn(32)
            self.conv1_2 = _conv3x3(32, 64, stride=1, use_se=self.use_se)
        else:
            self.conv1 = _conv7x7(3, 64, stride=2)  # (224, 224, 3) --> (112, 112, 64)
        self.bn1 = _bn(64)
        self.relu = P.ReLU()
        self.maxpool = nn.MaxPool2d(kernel_size=3, stride=2, pad_mode="same")
        self.layer1 = self._make_layer(block,
                                       layer_nums[0],
                                       in_channel=in_channels[0],
                                       out_channel=out_channels[0],
                                       stride=strides[0],
                                       use_se=self.use_se)
        self.layer2 = self._make_layer(block,
                                       layer_nums[1],
                                       in_channel=in_channels[1],
                                       out_channel=out_channels[1],
                                       stride=strides[1],
                                       use_se=self.use_se)
        self.layer3 = self._make_layer(block,
                                       layer_nums[2],
                                       in_channel=in_channels[2],
                                       out_channel=out_channels[2],
                                       stride=strides[2],
                                       use_se=self.use_se,
                                       se_block=self.se_block)
        self.layer4 = self._make_layer(block,
                                       layer_nums[3],
                                       in_channel=in_channels[3],
                                       out_channel=out_channels[3],
                                       stride=strides[3],
                                       use_se=self.use_se,
                                       se_block=self.se_block)
        self.mean = P.ReduceMean(keep_dims=True)
        self.flatten = nn.Flatten()
        self.end_point = _fc(out_channels[3], num_classes, use_se=self.use_se)
    def _make_layer(self, block, layer_num, in_channel, out_channel, stride, use_se=False, se_block=False):
        """
        Make stage network of ResNet.
        Args:
            block (Cell): Resnet block.
            layer_num (int): Number of blocks in the stage.
            in_channel (int): Input channel.
            out_channel (int): Output channel.
            stride (int): Stride of the first block in the stage.
            use_se (bool): Build the blocks for SE-ResNet50. Default: False.
            se_block (bool): Add an SE module to the last block of the stage. Default: False.
        Returns:
            SequentialCell, the output layer.
        Examples:
            # >>> _make_layer(ResidualBlock, 3, 128, 256, 2)
        """
        layers = []
        resnet_block = block(in_channel, out_channel, stride=stride, use_se=use_se)
        layers.append(resnet_block)
        if se_block:
            for _ in range(1, layer_num - 1):
                resnet_block = block(out_channel, out_channel, stride=1, use_se=use_se)
                layers.append(resnet_block)
            resnet_block = block(out_channel, out_channel, stride=1, use_se=use_se, se_block=se_block)
            layers.append(resnet_block)
        else:
            for _ in range(1, layer_num):
                resnet_block = block(out_channel, out_channel, stride=1, use_se=use_se)
                layers.append(resnet_block)
        return nn.SequentialCell(layers)
    def construct(self, x):
        if self.use_se:
            x = self.conv1_0(x)
            x = self.bn1_0(x)
            x = self.relu(x)
            x = self.conv1_1(x)
            x = self.bn1_1(x)
            x = self.relu(x)
            x = self.conv1_2(x)
        else:
            x = self.conv1(x)
        x = self.bn1(x)
        x = self.relu(x)
        c1 = self.maxpool(x)
        c2 = self.layer1(c1)
        c3 = self.layer2(c2)
        c4 = self.layer3(c3)
        c5 = self.layer4(c4)
        out = self.mean(c5, (2, 3))
        out = self.flatten(out)
        out = self.end_point(out)
        return out
def resnet50(class_num=10):
    """
    Get ResNet50 neural network.
    Args:
        class_num (int): Class number.
    Returns:
        Cell, cell instance of ResNet50 neural network.
    Examples:
        # >>> net = resnet50(10)
    """
    return ResNet(ResidualBlock,
                  [3, 4, 6, 3],
                  [64, 256, 512, 1024],
                  [256, 512, 1024, 2048],
                  [1, 2, 2, 2],
                  class_num)
def se_resnet50(class_num=1001):
    """
    Get SE-ResNet50 neural network.
    Args:
        class_num (int): Class number.
    Returns:
        Cell, cell instance of SE-ResNet50 neural network.
    Examples:
        # >>> net = se_resnet50(1001)
    """
    return ResNet(ResidualBlock,
                  [3, 4, 6, 3],
                  [64, 256, 512, 1024],
                  [256, 512, 1024, 2048],
                  [1, 2, 2, 2],
                  class_num,
                  use_se=True)
def resnet101(class_num=1001):
    """
    Get ResNet101 neural network.
    Args:
        class_num (int): Class number.
    Returns:
        Cell, cell instance of ResNet101 neural network.
    Examples:
        # >>> net = resnet101(1001)
    """
    return ResNet(ResidualBlock,
                  [3, 4, 23, 3],
                  [64, 256, 512, 1024],
                  [256, 512, 1024, 2048],
                  [1, 2, 2, 2],
                  class_num)
def resnet152(class_num=1001):
    """
    Get ResNet152 neural network.
    Args:
        class_num (int): Class number.
    Returns:
        Cell, cell instance of ResNet152 neural network.
    Examples:
        # >>> net = resnet152(1001)
    """
    return ResNet(ResidualBlock,
                  [3, 8, 36, 3],
                  [64, 256, 512, 1024],
                  [256, 512, 1024, 2048],
                  [1, 2, 2, 2],
                  class_num)
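Every `ResidualBlock` above is a bottleneck with three weighted convolutions (`conv1`, `conv2` or the conv inside `e2`, and `conv3`), so the depth in each model name follows from `layer_nums`: three layers per block, plus the stem convolution and the final Dense layer. The sketch below checks that arithmetic for the three configurations in this file and lists which blocks receive an SE module when `use_se=True`, mirroring `_make_layer` (only the last block of `layer3` and `layer4`; the mirror assumes every SE stage has at least two blocks, which holds for all configurations here). The counting rule is the conventional naming rule; the SE stem uses three 3x3 convolutions instead of one 7x7, so it is not an exact layer count for that variant.

```python
# Stage configurations copied from resnet50 / resnet101 / resnet152 above.
LAYER_NUMS = {
    "resnet50": [3, 4, 6, 3],
    "resnet101": [3, 4, 23, 3],
    "resnet152": [3, 8, 36, 3],
}


def nominal_depth(layer_nums):
    """3 weighted layers per bottleneck block + the stem conv + the final Dense layer."""
    return 3 * sum(layer_nums) + 2


def se_block_flags(layer_nums, use_se):
    """Per stage, which blocks get se_block=True (assumes >= 2 blocks in SE stages)."""
    flags = []
    for stage, num in enumerate(layer_nums):
        # ResNet passes se_block only to layer3 and layer4 (stage indices 2 and 3).
        stage_has_se = use_se and stage >= 2
        # _make_layer attaches the SE module to the last block of the stage only.
        flags.append([stage_has_se and b == num - 1 for b in range(num)])
    return flags


for name, nums in LAYER_NUMS.items():
    print(name, nominal_depth(nums))   # resnet50 50 / resnet101 101 / resnet152 152

print(se_block_flags(LAYER_NUMS["resnet50"], use_se=True))
```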

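With `pad_mode='same'`, a convolution or pooling layer produces `ceil(input / stride)` outputs along each spatial axis, so the feature-map sizes in `ResNet` follow directly from the strides used above: 2 in the stem (`conv1`, or `conv1_0` in the SE stem), 2 in the max pool, and `[1, 2, 2, 2]` for the four stages. A small sketch tracing those sizes, assuming square inputs and taking channel counts from `out_channels`; the 224-pixel default matches the shape comment next to `conv1`.

```python
import math


def same_out(size, stride):
    """Spatial output length of a 'same'-padded conv or pool: ceil(size / stride)."""
    return math.ceil(size / stride)


def trace_resnet_shapes(image_size=224, strides=(1, 2, 2, 2), out_channels=(256, 512, 1024, 2048)):
    """Return (name, channels, spatial size) after the stem, the max pool and each stage."""
    shapes = []
    size = same_out(image_size, 2)      # stem convolution, stride 2
    shapes.append(("stem", 64, size))
    size = same_out(size, 2)            # 3x3 max pool, stride 2
    shapes.append(("maxpool", 64, size))
    for idx, (stride, channels) in enumerate(zip(strides, out_channels), start=1):
        size = same_out(size, stride)   # only the first block of a stage uses the stage stride
        shapes.append(("layer%d" % idx, channels, size))
    return shapes


for name, channels, size in trace_resnet_shapes():
    print(f"{name}: {channels} x {size} x {size}")
# last line printed: layer4: 2048 x 7 x 7
```

The final `ReduceMean` over axes `(2, 3)` then collapses the 7 x 7 map, which is why `end_point` takes `out_channels[3]` (2048) inputs regardless of image size.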

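The SE branch in `ResidualBlock.construct` (enabled through `se_block`) squeezes each channel to its spatial mean, passes the result through two Dense layers with a reduction factor of 4 (`int(out_channel/4)`), a ReLU and a sigmoid, and then rescales the block output channel by channel. The sketch below restates that computation for a single sample with plain Python lists. It is an illustration under stated assumptions (channel-first layout without a batch axis, random placeholder weights), not MindSpore code.

```python
import math
import random


def squeeze_excite(feature, w0, b0, w1, b1):
    """Gate each channel of one sample shaped [C][H][W] (nested lists), SE-style.

    w0: [C // 4][C] and b0: [C // 4] stand in for se_dense_0;
    w1: [C][C // 4] and b1: [C] stand in for se_dense_1.
    """
    # Squeeze: global average over H and W (the ReduceMean over axes (2, 3)).
    pooled = [sum(sum(row) for row in ch) / (len(ch) * len(ch[0])) for ch in feature]
    # Excitation: Dense -> ReLU -> Dense -> Sigmoid, giving one gate in (0, 1) per channel.
    hidden = [max(0.0, sum(w * p for w, p in zip(w_row, pooled)) + b)
              for w_row, b in zip(w0, b0)]
    gates = [1.0 / (1.0 + math.exp(-(sum(w * h for w, h in zip(w_row, hidden)) + b)))
             for w_row, b in zip(w1, b1)]
    # Scale: multiply every value of a channel by that channel's gate (the se_mul step).
    return [[[g * v for v in row] for row in ch] for g, ch in zip(gates, feature)]


# Usage with random inputs and small random weights (shapes only, no trained values).
random.seed(0)
C, H, W = 8, 3, 3
feature = [[[random.uniform(-1.0, 1.0) for _ in range(W)] for _ in range(H)] for _ in range(C)]
w0 = [[random.gauss(0.0, 0.01) for _ in range(C)] for _ in range(C // 4)]
b0 = [0.0] * (C // 4)
w1 = [[random.gauss(0.0, 0.01) for _ in range(C // 4)] for _ in range(C)]
b1 = [0.0] * C
out = squeeze_excite(feature, w0, b0, w1, b1)
print(len(out), len(out[0]), len(out[0][0]))  # 8 3 3
```

In the network, this gated tensor is what gets added to the (possibly downsampled) identity before the final ReLU.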
@@ -1,150 +0,0 @@
# Copyright 2021 Huawei Technologies Co., Ltd
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ============================================================================
"""train resnet."""
import os
import argparse
import ast
from mindspore import context
from mindspore import Tensor
from mindspore.nn.optim.momentum import Momentum
from mindspore.train.model import Model
from mindspore.context import ParallelMode
from mindspore.train.callback import ModelCheckpoint, CheckpointConfig, LossMonitor, TimeMonitor
from mindspore.train.loss_scale_manager import FixedLossScaleManager
from mindspore.train.serialization import load_checkpoint, load_param_into_net
from mindspore.communication.management import init, get_rank
from mindspore.common import set_seed
import mindspore.nn as nn
import mindspore.common.initializer as weight_init
from src.lr_generator import get_lr
from src.CrossEntropySmooth import CrossEntropySmooth
from src.resnet import resnet152 as resnet
from src.config import config5 as config
from src.dataset import create_dataset2 as create_dataset # imagenet2012
parser = argparse.ArgumentParser(description='Image classification--resnet152')
parser.add_argument('--data_url', type=str, default=None, help='Dataset path')
parser.add_argument('--run_distribute', type=ast.literal_eval, default=False, help='Run distributed training')
parser.add_argument('--pre_trained', type=str, default=None, help='Pretrained checkpoint path')
parser.add_argument('--rank', type=int, default=0, help='Local rank for distributed training')
parser.add_argument('--is_save_on_master', type=ast.literal_eval, default=True,
                    help='Save checkpoints on the master rank only (True) or on all ranks (False)')
args_opt = parser.parse_args()
set_seed(1)
if __name__ == '__main__':
    ckpt_save_dir = config.save_checkpoint_path
    # init context
    print(args_opt.run_distribute)
    context.set_context(mode=context.GRAPH_MODE, device_target="Ascend", save_graphs=False)
    if args_opt.run_distribute:
        device_id = int(os.getenv('DEVICE_ID'))
        rank_size = int(os.environ.get("RANK_SIZE", 1))
        print(rank_size)
        device_num = rank_size
        context.set_context(device_id=device_id, enable_auto_mixed_precision=True)
        context.set_auto_parallel_context(device_num=device_num, parallel_mode=ParallelMode.DATA_PARALLEL,
                                          gradients_mean=True, all_reduce_fusion_config=[180, 313])
        init()
        args_opt.rank = get_rank()
        print(args_opt.rank)
    # save checkpoints on the master rank only, or on all ranks (compatible with model parallel)
    args_opt.rank_save_ckpt_flag = 0
    if args_opt.is_save_on_master:
        if args_opt.rank == 0:
            args_opt.rank_save_ckpt_flag = 1
    else:
        args_opt.rank_save_ckpt_flag = 1
    local_data_path = args_opt.data_url
    print('Download data:')
    # create dataset
    dataset = create_dataset(dataset_path=local_data_path, do_train=True, repeat_num=1,
                             batch_size=config.batch_size, target="Ascend", distribute=args_opt.run_distribute)
    step_size = dataset.get_dataset_size()
    print("step"+str(step_size))
    # define net
    net = resnet(class_num=config.class_num)
    # init weight
    if args_opt.pre_trained:
        param_dict = load_checkpoint(args_opt.pre_trained)
        load_param_into_net(net, param_dict)
    else:
        for _, cell in net.cells_and_names():
            if isinstance(cell, nn.Conv2d):
                cell.weight.set_data(weight_init.initializer(weight_init.HeUniform(),
                                                             cell.weight.shape,
                                                             cell.weight.dtype))
            if isinstance(cell, nn.Dense):
                cell.weight.set_data(weight_init.initializer(weight_init.HeNormal(),
                                                             cell.weight.shape,
                                                             cell.weight.dtype))
    # init lr
    lr = get_lr(lr_init=config.lr_init, lr_end=config.lr_end, lr_max=config.lr_max,
                warmup_epochs=config.warmup_epochs, total_epochs=config.epoch_size, steps_per_epoch=step_size,
                lr_decay_mode=config.lr_decay_mode)
    lr = Tensor(lr)
    # define opt: no weight decay for BatchNorm gamma/beta and biases
    decayed_params = []
    no_decayed_params = []
    for param in net.trainable_params():
        if 'beta' not in param.name and 'gamma' not in param.name and 'bias' not in param.name:
            decayed_params.append(param)
        else:
            no_decayed_params.append(param)
    group_params = [{'params': decayed_params, 'weight_decay': config.weight_decay},
                    {'params': no_decayed_params},
                    {'order_params': net.trainable_params()}]
    opt = Momentum(group_params, lr, config.momentum, loss_scale=config.loss_scale)
    # define loss, model
    if not config.use_label_smooth:
        config.label_smooth_factor = 0.0
    loss = CrossEntropySmooth(sparse=True, reduction="mean",
                              smooth_factor=config.label_smooth_factor, num_classes=config.class_num)
    loss_scale = FixedLossScaleManager(config.loss_scale, drop_overflow_update=False)
    model = Model(net, loss_fn=loss, optimizer=opt, loss_scale_manager=loss_scale,
                  metrics={'top_1_accuracy', 'top_5_accuracy'},
                  amp_level="O3", keep_batchnorm_fp32=False)
    # define callbacks
    time_cb = TimeMonitor(data_size=step_size)
    loss_cb = LossMonitor()
    cb = [time_cb, loss_cb]
    if config.save_checkpoint:
        if args_opt.rank_save_ckpt_flag:
            config_ck = CheckpointConfig(save_checkpoint_steps=config.save_checkpoint_epochs * step_size,
                                         keep_checkpoint_max=config.keep_checkpoint_max)
            ckpt_cb = ModelCheckpoint(prefix="resnet152", directory=ckpt_save_dir, config=config_ck)
            cb += [ckpt_cb]
    # train model
    dataset_sink_mode = True
    print(dataset.get_dataset_size())
    model.train(config.epoch_size, dataset, callbacks=cb,
                sink_size=dataset.get_dataset_size(), dataset_sink_mode=dataset_sink_mode)