wideresnet
This commit is contained in:
parent 36820f38dc
commit e8a96b9b81

@ -0,0 +1,258 @@
# Contents

<!-- TOC -->

- [Contents](#contents)
- [WideResNet Description](#wideresnet-description)
- [Model Architecture](#model-architecture)
- [Dataset](#dataset)
- [Environment Requirements](#environment-requirements)
- [Quick Start](#quick-start)
- [Script Description](#script-description)
    - [Script and Sample Code](#script-and-sample-code)
- [Script Parameters](#script-parameters)
- [Training Process](#training-process)
    - [Usage](#usage)
        - [Running on Ascend](#running-on-ascend)
    - [Results](#results)
- [Evaluation Process](#evaluation-process)
    - [Usage](#usage-1)
        - [Running on Ascend](#running-on-ascend-1)
    - [Results](#results-1)
- [Model Description](#model-description)
    - [Performance](#performance)
        - [Evaluation Performance](#evaluation-performance)
            - [WideResNet on cifar10](#wideresnet-on-cifar10)
- [Description of Random Situation](#description-of-random-situation)
- [ModelZoo Homepage](#modelzoo-homepage)

<!-- /TOC -->
# WideResNet Description

## Overview

Zagoruyko proposed WideResNet on top of ResNet to address the problem that, in thin and deep networks, only a limited number of layers learn useful knowledge while many layers contribute little to the final result. This problem is also known as diminishing feature reuse. The WideResNet authors widen the residual blocks, which speeds up training severalfold and also noticeably improves accuracy.

The following is an example of training WideResNet on the cifar10 dataset with MindSpore.

## Paper

1. [Paper](https://arxiv.org/abs/1605.07146): Sergey Zagoruyko, Nikos Komodakis. "Wide Residual Networks"
# Model Architecture

The overall network architecture of WideResNet is described in the paper: [link](https://arxiv.org/abs/1605.07146)

# Dataset

Dataset used: [cifar10](http://www.cs.toronto.edu/~kriz/cifar.html)

- Dataset size: 10 classes, 32*32 color images
    - Training set: 50,000 images
    - Test set: 10,000 images
- Note: data is processed in dataset.py.
- Download the dataset; the directory structure is as follows:

```text
└─cifar-10-batches-bin
  ├─data_batch_1.bin  # training set
  ├─data_batch_2.bin  # training set
  ├─data_batch_3.bin  # training set
  ├─data_batch_4.bin  # training set
  ├─data_batch_5.bin  # training set
  └─test_batch.bin    # evaluation set
```
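Each cifar10 `.bin` batch file stores 10,000 fixed-size records of 3073 bytes: 1 label byte followed by 3072 image bytes (3 channels x 32 x 32, channel-major). The repo itself reads these files through MindSpore's `Cifar10Dataset`; the sketch below, with the hypothetical helper `read_record`, only illustrates the record layout:

```python
def read_record(buf, index):
    """Parse record `index` from a cifar-10 .bin buffer.

    Each record is 3073 bytes: 1 label byte, then 3072 image bytes
    (3 channels x 32 x 32, channel-major / CHW layout).
    """
    record_size = 1 + 3 * 32 * 32
    offset = index * record_size
    record = buf[offset:offset + record_size]
    label = record[0]
    image = record[1:]  # 3072 raw pixel bytes
    return label, image

# Tiny synthetic buffer with two records: labels 7 and 3, constant pixels.
buf = bytes([7] + [128] * 3072 + [3] + [255] * 3072)
label0, image0 = read_record(buf, 0)
label1, image1 = read_record(buf, 1)
print(label0, len(image0), label1, image1[0])
```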
# Environment Requirements

- Hardware
    - Set up the hardware environment with Ascend processors.
- Framework
    - [MindSpore](https://www.mindspore.cn/install/)
- For more information, see the following resources:
    - [MindSpore Tutorials](https://www.mindspore.cn/tutorials/zh-CN/master/index.html)
    - [MindSpore Python API](https://www.mindspore.cn/docs/api/zh-CN/master/index.html)
# Quick Start

After installing MindSpore from the official website, you can follow the steps below for training and evaluation:

- Running on Ascend

```Shell
# distributed training
Usage: sh run_distribute_train.sh [RANK_TABLE_FILE] [DATASET_PATH] [PRETRAINED_CKPT_PATH] (optional)

# standalone training
Usage: sh run_standalone_train.sh [DATASET_PATH] [PRETRAINED_CKPT_PATH] (optional)

# run the evaluation example
Usage: sh run_eval.sh [DATASET_PATH] [CHECKPOINT_PATH]
```
# Script Description

## Script and Sample Code

```text
└──wideresnet
  ├── README.md
  ├── scripts
    ├── run_distribute_train.sh    # launch Ascend distributed training (8 devices)
    ├── run_eval.sh                # launch Ascend evaluation
    └── run_standalone_train.sh    # launch Ascend standalone training (single device)
  ├── src
    ├── config.py                  # parameter configuration
    ├── dataset.py                 # data preprocessing
    ├── cross_entropy_smooth.py    # loss definition for the cifar10 dataset
    ├── generator_lr.py            # generate the learning rate for each step
    ├── save_callback.py           # custom callback that saves the best checkpoint
    └── wide_resnet.py             # WideResNet network structure
  ├── eval.py                      # evaluate the network
  ├── export.py                    # export the network
  └── train.py                     # train the network
```
# Script Parameters

Both training and evaluation parameters can be configured in config.py.

- Configuration for WideResNet and the cifar10 dataset.

```Python
"num_classes": 10,            # number of dataset classes
"batch_size": 32,             # batch size of the input tensor
"epoch_size": 300,            # number of training epochs
"save_checkpoint_path": "./", # checkpoint save path, relative to the execution path
"repeat_num": 1,              # dataset repeat count
"widen_factor": 10,           # network width
"depth": 40,                  # network depth
"lr_init": 0.1,               # initial learning rate
"weight_decay": 5e-4,         # weight decay
"momentum": 0.9,              # momentum for the optimizer
"loss_scale": 32,             # loss scale
"save_checkpoint": True,      # whether to save checkpoints
"save_checkpoint_epochs": 5,  # epoch interval between two checkpoints; by default the last checkpoint is saved after the final epoch
"keep_checkpoint_max": 10,    # keep only the last keep_checkpoint_max checkpoints
"use_label_smooth": True,     # label smoothing
"label_smooth_factor": 0.1,   # label smoothing factor
"pretrain_epoch_size": 0,     # pretraining epochs
"warmup_epochs": 5,           # warmup epochs
```
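With `use_label_smooth` enabled, the loss in src/cross_entropy_smooth.py spreads `label_smooth_factor` over the non-target classes: the target class gets probability `1 - smooth_factor` and every other class gets `smooth_factor / (num_classes - 1)`. A standalone sketch of the smoothed target construction (illustration only, in plain Python instead of MindSpore ops):

```python
def smooth_one_hot(label, num_classes=10, smooth_factor=0.1):
    """Build a label-smoothed one-hot target vector."""
    on_value = 1.0 - smooth_factor
    off_value = smooth_factor / (num_classes - 1)
    return [on_value if i == label else off_value for i in range(num_classes)]

# With the config defaults, class 3 gets 0.9 and the other nine classes
# share the remaining 0.1; the vector still sums to 1.
target = smooth_one_hot(3, num_classes=10, smooth_factor=0.1)
print(target[3], target[0])
```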
# Training Process

## Usage

### Running on Ascend

```Shell
# distributed training
Usage: sh run_distribute_train.sh [RANK_TABLE_FILE] [DATASET_PATH] [PRETRAINED_CKPT_PATH] (optional)

# standalone training
Usage: sh run_standalone_train.sh [DATASET_PATH] [PRETRAINED_CKPT_PATH] (optional)
```

Distributed training requires an HCCL configuration file in JSON format to be created in advance.

For details, see the instructions in [hccn_tools](https://gitee.com/mindspore/mindspore/tree/master/model_zoo/utils/hccl_tools).

Training results are saved in the example path, in folders whose names start with "train" or "train_parallel". You can find checkpoint files and results in the logs under this path, as shown below.
## Results

- Training WideResNet with the cifar10 dataset

```text
# distributed training results (8P)
epoch: 2 step: 195, loss is 1.4352043
epoch: 2 step: 195, loss is 1.4611206
epoch: 2 step: 195, loss is 1.2635705
epoch: 2 step: 195, loss is 1.3457444
epoch: 2 step: 195, loss is 1.4664338
epoch: 2 step: 195, loss is 1.3559061
epoch: 2 step: 195, loss is 1.5225968
epoch: 2 step: 195, loss is 1.246567
epoch: 3 step: 195, loss is 1.0763402
epoch: 3 step: 195, loss is 1.3007892
epoch: 3 step: 195, loss is 1.2473519
epoch: 3 step: 195, loss is 1.3249974
epoch: 3 step: 195, loss is 1.3388557
epoch: 3 step: 195, loss is 1.2402486
epoch: 3 step: 195, loss is 1.2878766
epoch: 3 step: 195, loss is 1.1507874
epoch: 4 step: 195, loss is 1.014946
epoch: 4 step: 195, loss is 1.1934564
epoch: 4 step: 195, loss is 0.9506259
epoch: 4 step: 195, loss is 1.2101849
epoch: 4 step: 195, loss is 1.0160742
epoch: 4 step: 195, loss is 1.2643425
epoch: 4 step: 195, loss is 1.3422029
epoch: 4 step: 195, loss is 1.221174
...
```
# Evaluation Process

## Usage

### Running on Ascend

```Shell
# evaluation
Usage: sh run_eval.sh [DATASET_PATH] [CHECKPOINT_PATH]
```

```Shell
# evaluation example
sh run_eval.sh /cifar10 WideResNet_best.ckpt
```

Checkpoints can be generated during training.

## Results

Evaluation results are saved in the example path, in a folder named "eval". You can find the following result in the logs under this path:

- Evaluating WideResNet with the cifar10 dataset

```text
result: {'top_1_accuracy': 0.9622395833333334}
```
# Model Description

## Performance

### Evaluation Performance

#### WideResNet on cifar10

| Parameters | Ascend 910 |
|---|---|
| Model version | WideResNet |
| Resource | Ascend 910; CPU 2.60GHz, 192 cores; memory 755G |
| Uploaded | 2021-05-20 |
| MindSpore version | 1.2.1 |
| Dataset | cifar10 |
| Training parameters | epoch=300, steps per epoch=195, batch_size=32 |
| Optimizer | Momentum |
| Loss function | Softmax cross entropy |
| Output | probability |
| Loss | 0.545541 |
| Speed | 65.2 ms/step (8 devices) |
| Total time | 70 min |
| Parameters (M) | 52.1 |
| Checkpoint for fine-tuning | 426.49M (.ckpt file) |
| Script | [link](https://gitee.com/mindspore/mindspore/tree/master/model_zoo/research/cv/wideresnet) |
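The 195 steps per epoch in the table follow from sharding the 50,000 training images across 8 devices with batch size 32 and `drop_remainder=True` (a back-of-the-envelope check using numbers from the dataset section and config.py):

```python
train_images = 50_000   # cifar10 training set size
batch_size = 32         # per-device batch size from config.py
devices = 8             # 8P distributed training

# drop_remainder=True in dataset.py, so the final partial batch is discarded
steps_per_epoch = train_images // (batch_size * devices)
print(steps_per_epoch)  # → 195
```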
# Description of Random Situation

A seed is set inside the "create_dataset" function in dataset.py; the random seed in train.py is also used.

# ModelZoo Homepage

Please check the official [homepage](https://gitee.com/mindspore/mindspore/tree/master/model_zoo).
@ -0,0 +1,83 @@
# Copyright 2021 Huawei Technologies Co., Ltd
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ============================================================================
"""
##############test WideResNet example on cifar10#################
python eval.py
"""
import os
import ast
import argparse

from mindspore import context
from mindspore.train.model import Model
from mindspore.train.serialization import load_checkpoint, load_param_into_net

from src.cross_entropy_smooth import CrossEntropySmooth
from src.wide_resnet import wideresnet
from src.dataset import create_dataset
from src.config import config_WideResnet as cfg


parser = argparse.ArgumentParser(description='Ascend WideResNet CIFAR10 Eval')
parser.add_argument('--data_url', required=True, default=None, help='Location of data')
parser.add_argument('--ckpt_url', type=str, default=None, help='Location of ckpt')
parser.add_argument('--modelart', required=True, type=ast.literal_eval, default=False,
                    help='training on ModelArts or not, default is False')
args = parser.parse_args()

device_id = int(os.getenv('DEVICE_ID'))
device_num = int(os.getenv('RANK_SIZE'))

if __name__ == '__main__':

    target = 'Ascend'

    context.set_context(mode=context.GRAPH_MODE, device_target=target, save_graphs=False,
                        device_id=device_id)

    data_path = '/cache/data_path'

    if args.modelart:
        import moxing as mox
        mox.file.copy_parallel(src_url=args.data_url, dst_url=data_path)
    else:
        data_path = args.data_url

    ds_eval = create_dataset(dataset_path=data_path,
                             do_train=False,
                             repeat_num=cfg.repeat_num,
                             batch_size=cfg.batch_size)

    net = wideresnet()

    ckpt_path = '/cache/ckpt_path/'
    if args.modelart:
        import moxing as mox
        mox.file.copy_parallel(args.ckpt_url, dst_url=ckpt_path)
        param_dict = load_checkpoint(os.path.join(ckpt_path, 'WideResNet_best.ckpt'))
    else:
        param_dict = load_checkpoint(args.ckpt_url)
    load_param_into_net(net, param_dict)
    net.set_train(False)

    if not cfg.use_label_smooth:
        cfg.label_smooth_factor = 0.0
    loss = CrossEntropySmooth(sparse=True, reduction='mean',
                              smooth_factor=cfg.label_smooth_factor, num_classes=cfg.num_classes)

    model = Model(net, loss_fn=loss, metrics={'top_1_accuracy'})

    output = model.eval(ds_eval)

    print("result:", output)
@ -0,0 +1,62 @@
# Copyright 2021 Huawei Technologies Co., Ltd
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ============================================================================
"""
##############export checkpoint file into air, onnx, mindir models#################
python export.py
"""
import os
import ast
import argparse
import numpy as np
import mindspore.common.dtype as mstype
from mindspore import Tensor, load_checkpoint, load_param_into_net, export, context

from src.wide_resnet import wideresnet

parser = argparse.ArgumentParser(description='WideResNet export')
parser.add_argument("--run_modelart", type=ast.literal_eval, default=False, help="Run on ModelArts, default is False.")
parser.add_argument('--data_url', default=None, help='Directory containing the cifar10 dataset.')
parser.add_argument('--train_url', default=None, help='Directory containing the checkpoint file.')
parser.add_argument("--ckpt_file", type=str, required=True, help="Checkpoint file name.")
parser.add_argument("--batch_size", type=int, default=1, help="batch size")
parser.add_argument('--file_format', type=str, choices=["AIR", "ONNX", "MINDIR"], default='AIR', help='file format')
args = parser.parse_args()

context.set_context(mode=context.GRAPH_MODE, device_target="Ascend")
context.set_context(device_id=int(os.environ["DEVICE_ID"]))

# Resolve where the checkpoint lives; on ModelArts it is first copied to the
# local cache (otherwise `local_output_url` would be undefined below).
ckpt_file = args.ckpt_file
if args.run_modelart:
    import moxing as mox
    device_id = int(os.getenv('DEVICE_ID'))
    local_output_url = '/cache/ckpt' + str(device_id)
    mox.file.copy_parallel(src_url=os.path.join(args.train_url, args.ckpt_file),
                           dst_url=os.path.join(local_output_url, args.ckpt_file))
    ckpt_file = os.path.join(local_output_url, args.ckpt_file)

if __name__ == '__main__':

    net = wideresnet()

    param_dict = load_checkpoint(ckpt_file)
    print('load ckpt')
    load_param_into_net(net, param_dict)
    print('load ckpt to net')
    net.set_train(False)
    input_arr = Tensor(np.ones([args.batch_size, 3, 32, 32]), mstype.float32)
    print('input')
    export(net, input_arr, file_name="WideResNet", file_format=args.file_format)
    if args.run_modelart:
        file_name = "WideResNet." + args.file_format.lower()
        mox.file.copy_parallel(src_url=file_name,
                               dst_url=os.path.join(args.train_url, file_name))
@ -0,0 +1,74 @@
#!/bin/bash
# Copyright 2020-2021 Huawei Technologies Co., Ltd
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==========================================================================

if [ $# != 4 ]
then
  echo "Usage: bash run_distribute_train.sh [RANK_TABLE_FILE] [DATA_URL] [CKPT_URL] [MODELART]"
  exit 1
fi

get_real_path(){
  if [ "${1:0:1}" == "/" ]; then
    echo "$1"
  else
    echo "$(realpath -m $PWD/$1)"
  fi
}

PATH1=$(get_real_path $1)
PATH2=$(get_real_path $2)
PATH3=$(get_real_path $3)
PATH4=$4
echo "$PATH1"
echo "$PATH2"
echo "$PATH3"
echo "$PATH4"

if [ ! -d $PATH2 ]
then
  echo "error: DATA_URL=$PATH2 is not a directory"
  exit 1
fi

if [ ! -d $PATH3 ]
then
  echo "error: CKPT_URL=$PATH3 is not a directory"
  exit 1
fi

ulimit -u unlimited
export DEVICE_NUM=8
export RANK_SIZE=8
export RANK_TABLE_FILE=$PATH1
export MINDSPORE_HCCL_CONFIG_PATH=$PATH1

DATA_URL=$2
export DATA_URL=${DATA_URL}

for((i=0;i<${RANK_SIZE};i++))
do
  rm -rf device$i
  mkdir device$i
  cp ../*.py ./device$i
  cp *.sh ./device$i
  cp -r ../src ./device$i
  cd ./device$i
  export DEVICE_ID=$i
  export RANK_ID=$i
  echo "start training for device $i"
  env > env$i.log

  python train.py --data_url=$PATH2 --ckpt_url=$PATH3 --modelart=$PATH4 &> train.log &

  cd ../
done
@ -0,0 +1,69 @@
#!/bin/bash
# Copyright 2020-2021 Huawei Technologies Co., Ltd
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ============================================================================

if [ $# != 3 ]
then
  echo "Usage: bash run_eval.sh [DATA_URL] [CKPT_URL] [MODELART]"
  exit 1
fi

get_real_path(){
  if [ "${1:0:1}" == "/" ]; then
    echo "$1"
  else
    echo "$(realpath -m $PWD/$1)"
  fi
}

PATH1=$(get_real_path $1)
PATH2=$(get_real_path $2)
PATH3=$3

echo "$PATH1"
echo "$PATH2"
echo "$PATH3"

if [ ! -d $PATH1 ]
then
  echo "error: DATA_URL=$PATH1 is not a directory"
  exit 1
fi

if [ ! -f $PATH2 ]
then
  echo "error: CKPT_URL=$PATH2 is not a file"
  exit 1
fi

ulimit -u unlimited
export DEVICE_NUM=1
export DEVICE_ID=0
export RANK_SIZE=$DEVICE_NUM
export RANK_ID=0

if [ -d "eval" ];
then
  rm -rf ./eval
fi
mkdir ./eval
cp ../*.py ./eval
cp *.sh ./eval
cp -r ../src ./eval
cd ./eval
env > env.log
echo "start evaluation for device $DEVICE_ID"
python eval.py --data_url=$PATH1 --ckpt_url=$PATH2 --modelart=$PATH3 &> eval.log &
cd ..
@ -0,0 +1,77 @@
#!/bin/bash
# Copyright 2020-2021 Huawei Technologies Co., Ltd
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ============================================================================

if [ $# != 3 ]
then
  echo "Usage: bash run_standalone_train.sh [DATA_URL] [CKPT_URL] [MODELART]"
  exit 1
fi

get_real_path(){
  if [ "${1:0:1}" == "/" ]; then
    echo "$1"
  else
    echo "$(realpath -m $PWD/$1)"
  fi
}

PATH1=$(get_real_path $1)
PATH2=$(get_real_path $2)

if [ ! -d $PATH1 ]
then
  echo "error: DATA_URL=$PATH1 is not a directory"
  exit 1
fi

if [ ! -f $PATH2 ]
then
  echo "error: CKPT_URL=$PATH2 is not a file"
  exit 1
fi

PATH3=$3

echo "$PATH1"
echo "$PATH2"
echo "$PATH3"

ulimit -u unlimited
export DEVICE_NUM=1
export DEVICE_ID=6
export RANK_SIZE=$DEVICE_NUM
export RANK_ID=0

if [ -d "train" ];
then
  rm -rf ./train
fi
mkdir ./train
cp ../*.py ./train
cp *.sh ./train
cp -r ../src ./train
cd ./train
echo "start training for device $DEVICE_ID"
env > env.log

python train.py --data_url=$PATH1 --ckpt_url=$PATH2 --modelart=$PATH3 &> train.log &

cd ..
@ -0,0 +1,45 @@
# Copyright 2021 Huawei Technologies Co., Ltd
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ============================================================================
"""Config parameters for WideResNet models."""


class Config_WideResNet:
    """
    Config parameters for the WideResNet.

    Examples:
        Config_WideResNet()
    """
    num_classes = 10
    batch_size = 32
    epoch_size = 300
    save_checkpoint_path = "./"
    repeat_num = 1
    widen_factor = 10
    depth = 40
    lr_init = 0.1
    weight_decay = 5e-4
    momentum = 0.9
    loss_scale = 32
    save_checkpoint = True
    save_checkpoint_epochs = 5
    keep_checkpoint_max = 10
    use_label_smooth = True
    label_smooth_factor = 0.1
    pretrain_epoch_size = 0
    warmup_epochs = 5


config_WideResnet = Config_WideResNet()
@ -0,0 +1,39 @@
# Copyright 2021 Huawei Technologies Co., Ltd
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ============================================================================
"""define loss function for network"""

import mindspore.nn as nn
from mindspore import Tensor
from mindspore.common import dtype as mstype
from mindspore.nn.loss.loss import _Loss
from mindspore.ops import functional as F
from mindspore.ops import operations as P


class CrossEntropySmooth(_Loss):
    """Cross entropy with label smoothing."""
    def __init__(self, sparse=True, reduction='mean', smooth_factor=0., num_classes=1000):
        super(CrossEntropySmooth, self).__init__()
        self.onehot = P.OneHot()
        self.sparse = sparse
        # the target class gets 1 - smooth_factor; the rest share smooth_factor
        self.on_value = Tensor(1.0 - smooth_factor, mstype.float32)
        self.off_value = Tensor(1.0 * smooth_factor / (num_classes - 1), mstype.float32)
        self.ce = nn.SoftmaxCrossEntropyWithLogits(reduction=reduction)

    def construct(self, logit, label):
        if self.sparse:
            label = self.onehot(label, F.shape(logit)[1], self.on_value, self.off_value)
        loss = self.ce(logit, label)
        return loss
@ -0,0 +1,78 @@
# Copyright 2021 Huawei Technologies Co., Ltd
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ============================================================================
"""
Data operations, will be used in train.py and eval.py
"""

import os
import mindspore.common.dtype as mstype
import mindspore.dataset.engine as de
import mindspore.dataset.vision.c_transforms as C
import mindspore.dataset.transforms.c_transforms as C2


def create_dataset(dataset_path, do_train, repeat_num=1, batch_size=32):
    """
    create a train or evaluate cifar10 dataset for WideResnet

    Args:
        dataset_path(string): the path of dataset.
        do_train(bool): whether dataset is used for train or eval.
        repeat_num(int): the repeat times of dataset. Default: 1
        batch_size(int): the batch size of dataset. Default: 32

    Returns:
        dataset
    """

    device_id = int(os.getenv('DEVICE_ID'))
    device_num = int(os.getenv('RANK_SIZE'))

    if do_train:
        dataset_path = os.path.join(dataset_path, 'train')
    else:
        dataset_path = os.path.join(dataset_path, 'eval')

    if device_num == 1:
        ds = de.Cifar10Dataset(dataset_path)
    else:
        if do_train:
            # shard the training set across devices for data-parallel training
            ds = de.Cifar10Dataset(dataset_path, num_parallel_workers=8, shuffle=True,
                                   num_shards=device_num, shard_id=device_id)
        else:
            ds = de.Cifar10Dataset(dataset_path)

    # define map operations
    trans = []
    if do_train:
        trans += [
            C.RandomCrop((32, 32), (4, 4, 4, 4)),
            C.RandomHorizontalFlip(prob=0.5)
        ]

    trans += [
        C.Normalize([0.4914, 0.4822, 0.4465], [0.2023, 0.1994, 0.2010]),
        C.HWC2CHW()
    ]

    type_cast_op = C2.TypeCast(mstype.int32)

    ds = ds.map(operations=type_cast_op, input_columns="label", num_parallel_workers=8)
    ds = ds.map(operations=trans, input_columns="image", num_parallel_workers=8)

    ds = ds.batch(batch_size, drop_remainder=True)
    ds = ds.repeat(repeat_num)

    return ds
@ -0,0 +1,45 @@
# Copyright 2021 Huawei Technologies Co., Ltd
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ============================================================================
"""generate learning rate"""
import numpy as np


def get_lr(total_epochs,
           steps_per_epoch,
           lr_init
           ):
    """
    generate a piecewise-constant learning rate for each step
    """
    lr_each_step = []
    total_steps = steps_per_epoch * total_epochs
    lr = lr_init
    for i in range(int(total_steps)):
        if i <= int(60 * steps_per_epoch):
            lr = lr_init
        elif i <= int(120 * steps_per_epoch):
            lr = lr_init * 0.1 + 0.01
        elif i <= int(160 * steps_per_epoch):
            lr = lr_init * 0.1 * 0.1 + 0.003
        elif i <= int(200 * steps_per_epoch):
            lr = 0.001
        elif i <= int(240 * steps_per_epoch):
            lr = 0.0008
        elif i <= int(260 * steps_per_epoch):
            lr = 0.0006
        # beyond epoch 260 no branch matches, so the last value is kept
        lr_each_step.append(lr)

    lr_each_step = np.array(lr_each_step).astype(np.float32)

    return lr_each_step
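A standalone sanity check of the schedule above (an inline plain-Python copy for illustration; the repo itself imports `get_lr` from `src.generator_lr`): with the config defaults (300 epochs, 195 steps per epoch, lr_init=0.1), the rate starts at 0.1, drops to 0.02 after epoch 60, and stays at 0.0006 from epoch 260 to the end.

```python
def get_lr(total_epochs, steps_per_epoch, lr_init):
    """Piecewise-constant schedule mirroring src/generator_lr.py."""
    lr_each_step = []
    lr = lr_init
    for i in range(int(steps_per_epoch * total_epochs)):
        if i <= int(60 * steps_per_epoch):
            lr = lr_init
        elif i <= int(120 * steps_per_epoch):
            lr = lr_init * 0.1 + 0.01          # 0.02 for lr_init=0.1
        elif i <= int(160 * steps_per_epoch):
            lr = lr_init * 0.1 * 0.1 + 0.003   # 0.004 for lr_init=0.1
        elif i <= int(200 * steps_per_epoch):
            lr = 0.001
        elif i <= int(240 * steps_per_epoch):
            lr = 0.0008
        elif i <= int(260 * steps_per_epoch):
            lr = 0.0006
        # past epoch 260 the last value (0.0006) is kept
        lr_each_step.append(lr)
    return lr_each_step

lr = get_lr(total_epochs=300, steps_per_epoch=195, lr_init=0.1)
print(lr[0], lr[100 * 195], lr[-1])
```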
@@ -0,0 +1,51 @@
# Copyright 2021 Huawei Technologies Co., Ltd
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ============================================================================
"""save best ckpt"""

from mindspore.train.serialization import save_checkpoint
from mindspore.train.callback import Callback

from src.config import config_WideResnet as cfg


class SaveCallback(Callback):
    """
    Evaluate at the end of each step and save the checkpoint whenever
    accuracy beats the best seen so far (starting from a 0.96 threshold).
    """
    def __init__(self, model, eval_dataset, ckpt_path, modelart):
        super(SaveCallback, self).__init__()
        self.model = model
        self.eval_dataset = eval_dataset
        self.ckpt_path = ckpt_path
        self.acc = 0.96
        self.cur_acc = 0.0
        self.modelart = modelart

    def step_end(self, run_context):
        """
        step end
        """
        cb_params = run_context.original_args()
        result = self.model.eval(self.eval_dataset)
        self.cur_acc = result['accuracy']
        print("cur_acc is", self.cur_acc)

        if result['accuracy'] > self.acc:
            self.acc = result['accuracy']
            file_name = "WideResNet_best" + ".ckpt"
            save_checkpoint(save_obj=cb_params.train_network, ckpt_file_name=file_name)
            if self.modelart:
                import moxing as mox
                mox.file.copy_parallel(src_url=cfg.save_checkpoint_path, dst_url=self.ckpt_path)
            print("Save the maximum accuracy checkpoint, the accuracy is", self.acc)
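The callback's keep-best rule is independent of MindSpore and can be sketched in plain Python (the `BestKeeper` class and its `save` stand-in are illustrative, not part of the repo):

```python
class BestKeeper:
    """Keep-best logic as in SaveCallback: save only when accuracy
    beats the best (or the initial 0.96 threshold) seen so far."""
    def __init__(self, threshold=0.96):
        self.best = threshold
        self.saved = []  # accuracies at which a checkpoint was written

    def update(self, accuracy, save):
        if accuracy > self.best:
            self.best = accuracy
            save()  # stand-in for save_checkpoint(...)
            self.saved.append(accuracy)

keeper = BestKeeper()
for acc in [0.90, 0.961, 0.958, 0.97]:
    keeper.update(acc, save=lambda: None)
print(keeper.saved)  # only 0.961 and 0.97 trigger a save
```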
@@ -0,0 +1,124 @@
# Copyright 2021 Huawei Technologies Co., Ltd
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ============================================================================
"""WideResNet"""

import mindspore.nn as nn
import mindspore.ops as ops


class WideBasic(nn.Cell):
    """
    WideBasic residual block: BN-ReLU-Dropout-Conv, BN-ReLU-Conv,
    plus a 1x1 convolution shortcut when the shape changes.
    """
    def __init__(self, in_channels, out_channels, stride=1):
        super(WideBasic, self).__init__()

        self.bn1 = nn.BatchNorm2d(in_channels)
        self.relu = nn.ReLU()
        self.conv1 = nn.Conv2d(in_channels, out_channels, kernel_size=3, stride=stride, has_bias=True)
        self.bn2 = nn.BatchNorm2d(out_channels)
        self.dropout = nn.Dropout(keep_prob=0.7)
        self.conv2 = nn.Conv2d(out_channels, out_channels, kernel_size=3, stride=1, has_bias=True)

        self.shortcut = nn.SequentialCell()

        if in_channels != out_channels or stride != 1:
            self.shortcut = nn.SequentialCell(
                [nn.Conv2d(in_channels, out_channels, 1, stride=stride, has_bias=True)]
            )

    def construct(self, x):
        """
        basic construct
        """
        identity = x

        x = self.bn1(x)
        x = self.relu(x)
        x = self.dropout(x)
        x = self.conv1(x)
        x = self.bn2(x)
        x = self.relu(x)
        x = self.conv2(x)

        shortcut = self.shortcut(identity)

        return x + shortcut


class WideResNet(nn.Cell):
    """
    WideResNet
    """
    def __init__(self, num_classes, block, depth=50, widen_factor=1):
        """
        classes, block, depth, widen_factor
        """
        super(WideResNet, self).__init__()

        self.depth = depth
        k = widen_factor
        n = int((depth - 4) / 6)  # blocks per group
        self.in_channels = 16
        self.conv1 = nn.Conv2d(3, self.in_channels, 3, 1, padding=0, pad_mode='same')
        self.conv2 = self._make_layer(block, 16 * k, n, 1)
        self.conv3 = self._make_layer(block, 32 * k, n, 2)
        self.conv4 = self._make_layer(block, 64 * k, n, 2)
        self.bn = nn.BatchNorm2d(64 * k, momentum=0.9)
        self.relu = nn.ReLU()
        self.mean = ops.ReduceMean(keep_dims=True)
        self.flatten = nn.Flatten()
        self.linear = nn.Dense(64 * k, num_classes, has_bias=True)

        self.bn1 = nn.BatchNorm2d(16)

    def construct(self, x):
        """
        WideResNet construct
        """
        x = self.conv1(x)
        x = self.bn1(x)
        x = self.relu(x)
        x = self.conv2(x)
        x = self.conv3(x)
        x = self.conv4(x)
        x = self.bn(x)
        x = self.relu(x)
        x = self.mean(x, (2, 3))
        x = self.flatten(x)
        x = self.linear(x)

        return x

    def _make_layer(self, block, out_channels, num_blocks, stride):
        """
        make layer
        """
        strides = [stride] + [1] * (num_blocks - 1)
        layers = []
        for s in strides:
            layers.append(block(self.in_channels, out_channels, s))
            self.in_channels = out_channels

        return nn.SequentialCell(*layers)


def wideresnet(depth=40, widen_factor=10):
    net = WideResNet(10, WideBasic, depth=depth, widen_factor=widen_factor)
    return net
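For the WRN-40-10 built by `wideresnet()`, the network shape follows directly from `n = (depth - 4) / 6` blocks per group and channel widths `16k, 32k, 64k`. A quick standalone check (the helper `wrn_shape` is illustrative, not part of the repo):

```python
def wrn_shape(depth, widen_factor):
    """Blocks per group and channel widths for a WideResNet-depth-k."""
    n = int((depth - 4) / 6)
    widths = [16 * widen_factor, 32 * widen_factor, 64 * widen_factor]
    return n, widths

# Defaults used by wideresnet(): depth=40, widen_factor=10
n, widths = wrn_shape(depth=40, widen_factor=10)
print(n, widths)  # 6 blocks per group; widths 160, 320, 640
```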
@@ -0,0 +1,129 @@
# Copyright 2021 Huawei Technologies Co., Ltd
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ============================================================================
"""
#################train WideResNet example on cifar10########################
python train.py
"""
import ast
import os
import argparse

import numpy as np

from mindspore.common import set_seed
from mindspore import context
from mindspore.communication.management import init
from mindspore.context import ParallelMode
from mindspore import Tensor
from mindspore.nn.optim import Momentum
from mindspore.train.loss_scale_manager import FixedLossScaleManager
from mindspore.train.callback import LossMonitor, TimeMonitor
from mindspore.train.model import Model
import mindspore.nn as nn
import mindspore.common.initializer as weight_init

from src.wide_resnet import wideresnet
from src.dataset import create_dataset
from src.config import config_WideResnet as cfg
from src.generator_lr import get_lr
from src.cross_entropy_smooth import CrossEntropySmooth
from src.save_callback import SaveCallback

set_seed(1)

if __name__ == '__main__':

    device_id = int(os.getenv('DEVICE_ID'))
    device_num = int(os.getenv('RANK_SIZE'))
    parser = argparse.ArgumentParser(description='Ascend WideResnet+CIFAR10 Training')
    parser.add_argument('--data_url', required=True, default=None, help='Location of data')
    parser.add_argument('--ckpt_url', required=True, default=None, help='Location of ckpt.')
    parser.add_argument('--modelart', required=True, type=ast.literal_eval, default=False,
                        help='training on modelart or not, default is False')
    args = parser.parse_args()

    target = "Ascend"
    context.set_context(mode=context.GRAPH_MODE, device_target=target, save_graphs=False,
                        device_id=device_id)

    if device_num > 1:
        init()
        context.set_auto_parallel_context(parallel_mode=ParallelMode.DATA_PARALLEL, gradients_mean=True)
    dataset_sink_mode = True

    if args.modelart:
        import moxing as mox
        data_path = '/cache/data_path'
        mox.file.copy_parallel(src_url=args.data_url, dst_url=data_path)
    else:
        data_path = args.data_url

    ds_train = create_dataset(dataset_path=data_path,
                              do_train=True,
                              batch_size=cfg.batch_size)
    ds_eval = create_dataset(dataset_path=data_path,
                             do_train=False,
                             batch_size=cfg.batch_size)
    step_size = ds_train.get_dataset_size()

    net = wideresnet()

    for _, cell in net.cells_and_names():
        if isinstance(cell, nn.Conv2d):
            cell.weight.set_data(weight_init.initializer(weight_init.XavierUniform(gain=np.sqrt(2)),
                                                         cell.weight.shape,
                                                         cell.weight.dtype))

    loss = CrossEntropySmooth(sparse=True, reduction="mean",
                              smooth_factor=cfg.label_smooth_factor,
                              num_classes=cfg.num_classes)
    loss_scale = FixedLossScaleManager(loss_scale=cfg.loss_scale, drop_overflow_update=False)

    lr = get_lr(total_epochs=cfg.epoch_size, steps_per_epoch=step_size, lr_init=cfg.lr_init)
    lr = Tensor(lr)

    decayed_params = []
    no_decayed_params = []
    for param in net.trainable_params():
        if 'beta' not in param.name and 'gamma' not in param.name and 'bias' not in param.name:
            decayed_params.append(param)
        else:
            no_decayed_params.append(param)

    group_params = [{'params': decayed_params, 'weight_decay': cfg.weight_decay},
                    {'params': no_decayed_params},
                    {'order_params': net.trainable_params()}]
    opt = Momentum(group_params,
                   learning_rate=lr,
                   momentum=cfg.momentum,
                   loss_scale=cfg.loss_scale,
                   use_nesterov=True,
                   weight_decay=cfg.weight_decay)

    model = Model(net,
                  amp_level="O2",
                  loss_fn=loss,
                  optimizer=opt,
                  loss_scale_manager=loss_scale,
                  metrics={'accuracy'},
                  keep_batchnorm_fp32=False)

    loss_cb = LossMonitor()
    time_cb = TimeMonitor()
    cb = [loss_cb, time_cb]
    ckpt_path = args.ckpt_url
    cb += [SaveCallback(model, ds_eval, ckpt_path, args.modelart)]

    model.train(epoch=cfg.epoch_size, train_dataset=ds_train, callbacks=cb,
                dataset_sink_mode=dataset_sink_mode)
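The weight-decay grouping in train.py excludes BatchNorm `beta`/`gamma` and bias parameters from decay, a common trick for BN networks. The name filter can be checked standalone (parameter names below are illustrative):

```python
def split_decay_params(param_names):
    """Mirror of the grouping in train.py: apply weight decay to all
    parameters except BatchNorm beta/gamma and bias parameters."""
    decayed, no_decayed = [], []
    for name in param_names:
        if 'beta' not in name and 'gamma' not in name and 'bias' not in name:
            decayed.append(name)
        else:
            no_decayed.append(name)
    return decayed, no_decayed

names = ['conv1.weight', 'bn1.gamma', 'bn1.beta', 'linear.weight', 'linear.bias']
decayed, no_decayed = split_decay_params(names)
print(decayed)     # only the conv and dense weights are decayed
print(no_decayed)  # BN statistics and biases are exempt
```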