!1964 add backbone for resnet101

Merge pull request !1964 from meixiaowei/master
2020-06-10 19:43:35 +08:00
parent 18e2a0f12e 0fe4f09fb5
commit 8ab436910f
14 changed files with 264 additions and 767 deletions
--- a/example/resnet101_imagenet2012/README.md
+++ b/example/resnet101_imagenet2012/README.md
@@ -1,142 +0,0 @@
-# ResNet101 Example
- 
-## Description
- 
-This is an example of training ResNet101 with ImageNet dataset in MindSpore.
-
-## Requirements
-
- Install [MindSpore](https://www.mindspore.cn/install/en).
-
- Download the dataset ImageNet2012.
- 
-> Unzip the ImageNet2012 dataset to any path you want, the folder should include train and eval dataset as follows:
- 
-```
-.
-└─dataset
-    ├─ilsvrc
-    │
-    └─validation_preprocess
-```
-
-## Example structure
- 
-```shell
-.
-├── crossentropy.py                 # CrossEntropy loss function
-├── config.py                       # parameter configuration
-├── dataset.py                      # data preprocessing
-├── eval.py                         # eval net
-├── lr_generator.py                 # generate learning rate
-├── run_distribute_train.sh         # launch distributed training(8p)
-├── run_infer.sh                    # launch evaluating
-├── run_standalone_train.sh         # launch standalone training(1p)
-└── train.py                        # train net
-```
- 
-## Parameter configuration
- 
-Parameters for both training and evaluating can be set in config.py.
- 
-```
-"class_num": 1001,                # dataset class number
-"batch_size": 32,                 # batch size of input tensor
-"loss_scale": 1024,               # loss scale
-"momentum": 0.9,                  # momentum optimizer
-"weight_decay": 1e-4,             # weight decay
-"epoch_size": 120,                # epoch sizes for training
-"pretrain_epoch_size": 0,         # epoch size of pretrain checkpoint
-"buffer_size": 1000,              # number of queue size in data preprocessing
-"image_height": 224,              # image height
-"image_width": 224,               # image width
-"save_checkpoint": True,          # whether save checkpoint or not
-"save_checkpoint_epochs": 1,      # the epoch interval between two checkpoints. By default, the last checkpoint will be saved after the last epoch
-"keep_checkpoint_max": 10,        # only keep the last keep_checkpoint_max checkpoint
-"save_checkpoint_path": "./",     # path to save checkpoint relative to the executed path
-"warmup_epochs": 0,               # number of warmup epoch
-"lr_decay_mode": "cosine"         # decay mode for generating learning rate
-"label_smooth": 1,                # label_smooth
-"label_smooth_factor": 0.1,       # label_smooth_factor
-"lr": 0.1                         # base learning rate
-```
-
-## Running the example
-
-### Train
- 
-#### Usage
-
-```
-# distributed training
-sh run_distribute_train.sh [MINDSPORE_HCCL_CONFIG_PATH] [DATASET_PATH] [PRETRAINED_PATH](optional)
- 
-# standalone training
-sh run_standalone_train.sh [DATASET_PATH] [PRETRAINED_PATH](optional)
-```
- 
-#### Launch
- 
-```bash
-# distributed training example(8p)
-sh run_distribute_train.sh rank_table_8p.json dataset/ilsvrc
-
-If you want to load pretrained ckpt file, 
-sh run_distribute_train.sh rank_table_8p.json dataset/ilsvrc ./ckpt/pretrained.ckpt
-
-# standalone training example（1p）
-sh run_standalone_train.sh dataset/ilsvrc
-
-If you want to load pretrained ckpt file,
-sh run_standalone_train.sh dataset/ilsvrc ./ckpt/pretrained.ckpt
-```
- 
-> About rank_table.json, you can refer to the [distributed training tutorial](https://www.mindspore.cn/tutorial/en/master/advanced_use/distributed_training.html).
-
-#### Result
- 
-Training result will be stored in the example path, whose folder name begins with "train" or "train_parallel". You can find checkpoint file together with result like the followings in log.
-
- 
-```
-# distribute training result(8p)
-epoch: 1 step: 5004, loss is 4.805483
-epoch: 2 step: 5004, loss is 3.2121816
-epoch: 3 step: 5004, loss is 3.429647
-epoch: 4 step: 5004, loss is 3.3667371
-epoch: 5 step: 5004, loss is 3.1718972
-...
-epoch: 67 step: 5004, loss is 2.2768745
-epoch: 68 step: 5004, loss is 1.7223864
-epoch: 69 step: 5004, loss is 2.0665488
-epoch: 70 step: 5004, loss is 1.8717369
-...
-```
-
-### Infer
- 
-#### Usage
- 
-```
-# infer
-sh run_infer.sh [VALIDATION_DATASET_PATH] [CHECKPOINT_PATH]
-```
- 
-#### Launch
- 
-```bash
-# infer with checkpoint
-sh run_infer.sh dataset/validation_preprocess/ train_parallel0/resnet-120_5004.ckpt
-
-```
- 
-> checkpoint can be produced in training process.
- 
-
-#### Result
- 
-Inference result will be stored in the example path, whose folder name is "infer". Under this, you can find result like the followings in log.
- 
-```
-result: {'top_5_accuracy': 0.9429417413572343, 'top_1_accuracy': 0.7853513124199744} ckpt=train_parallel0/resnet-120_5004.ckpt
-```
--- a/example/resnet101_imagenet2012/config.py
+++ b/example/resnet101_imagenet2012/config.py
@@ -1,40 +0,0 @@
-# Copyright 2020 Huawei Technologies Co., Ltd
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-# http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-# ============================================================================
-"""
-network config setting, will be used in train.py and eval.py
-"""
-from easydict import EasyDict as ed
-
-config = ed({
-    "class_num": 1001,
-    "batch_size": 32,
-    "loss_scale": 1024,
-    "momentum": 0.9,
-    "weight_decay": 1e-4,
-    "epoch_size": 120,
-    "pretrain_epoch_size": 0,
-    "buffer_size": 1000,
-    "image_height": 224,
-    "image_width": 224,
-    "save_checkpoint": True,
-    "save_checkpoint_epochs": 5,
-    "keep_checkpoint_max": 10,
-    "save_checkpoint_path": "./",
-    "warmup_epochs": 0,
-    "lr_decay_mode": "cosine",
-    "label_smooth": 1,
-    "label_smooth_factor": 0.1,
-    "lr": 0.1
-})
--- a/example/resnet101_imagenet2012/crossentropy.py
+++ b/example/resnet101_imagenet2012/crossentropy.py
@@ -1,36 +0,0 @@
-# Copyright 2020 Huawei Technologies Co., Ltd
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-# http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-# ============================================================================
-"""define loss function for network"""
-from mindspore.nn.loss.loss import _Loss
-from mindspore.ops import operations as P
-from mindspore.ops import functional as F
-from mindspore import Tensor
-from mindspore.common import dtype as mstype
-import mindspore.nn as nn
-
-class CrossEntropy(_Loss):
-    """the redefined loss function with SoftmaxCrossEntropyWithLogits"""
-    def __init__(self, smooth_factor=0., num_classes=1001):
-        super(CrossEntropy, self).__init__()
-        self.onehot = P.OneHot()
-        self.on_value = Tensor(1.0 - smooth_factor, mstype.float32)
-        self.off_value = Tensor(1.0 * smooth_factor / (num_classes -1), mstype.float32)
-        self.ce = nn.SoftmaxCrossEntropyWithLogits()
-        self.mean = P.ReduceMean(False)
-    def construct(self, logit, label):
-        one_hot_label = self.onehot(label, F.shape(logit)[1], self.on_value, self.off_value)
-        loss = self.ce(logit, one_hot_label)
-        loss = self.mean(loss, 0)
-        return loss
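Review note on `crossentropy.py`: the removed `CrossEntropy` cell builds label-smoothed targets with `on_value = 1 - smooth_factor` and `off_value = smooth_factor / (num_classes - 1)`. The sketch below reproduces only that target construction in plain NumPy. NumPy stands in for `P.OneHot`, and the helper name `smoothed_one_hot` is hypothetical; it is an illustration of the arithmetic, not the MindSpore implementation.

```python
import numpy as np


def smoothed_one_hot(labels, num_classes, smooth_factor=0.1):
    """Build label-smoothed targets the way the on/off values above imply."""
    on_value = 1.0 - smooth_factor
    off_value = smooth_factor / (num_classes - 1)
    # Start with every class at off_value, then raise the true class to on_value.
    targets = np.full((len(labels), num_classes), off_value, dtype=np.float32)
    targets[np.arange(len(labels)), labels] = on_value
    return targets


# Two samples, four classes (toy sizes chosen for readability).
targets = smoothed_one_hot(np.array([0, 2]), num_classes=4, smooth_factor=0.1)
row_sums = targets.sum(axis=1)
```

With these values each row still sums to 1, so the smoothed targets remain a valid probability distribution.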
--- a/example/resnet101_imagenet2012/dataset.py
+++ b/example/resnet101_imagenet2012/dataset.py
@@ -1,89 +0,0 @@
-# Copyright 2020 Huawei Technologies Co., Ltd
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-# http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-# ============================================================================
-"""
-create train or eval dataset.
-"""
-import os
-import mindspore.common.dtype as mstype
-import mindspore.dataset.engine as de
-import mindspore.dataset.transforms.vision.c_transforms as C
-import mindspore.dataset.transforms.c_transforms as C2
-from config import config
-
-def create_dataset(dataset_path, do_train, repeat_num=1, batch_size=32):
-    """
-    create a train or evaluate dataset
-    Args:
-        dataset_path(string): the path of dataset.
-        do_train(bool): whether dataset is used for train or eval.
-        repeat_num(int): the repeat times of dataset. Default: 1
-        batch_size(int): the batch size of dataset. Default: 32
-
-    Returns:
-        dataset
-    """
-    device_num = int(os.getenv("RANK_SIZE"))
-    rank_id = int(os.getenv("RANK_ID"))
-
-    if device_num == 1:
-        ds = de.ImageFolderDatasetV2(dataset_path, num_parallel_workers=8, shuffle=True)
-    else:
-        ds = de.ImageFolderDatasetV2(dataset_path, num_parallel_workers=8, shuffle=True,
-                                     num_shards=device_num, shard_id=rank_id)
-    resize_height = 224
-    rescale = 1.0 / 255.0
-    shift = 0.0
-
-    # define map operations
-    decode_op = C.Decode()
-
-    random_resize_crop_op = C.RandomResizedCrop(resize_height, (0.08, 1.0), (0.75, 1.33), max_attempts=100)
-    horizontal_flip_op = C.RandomHorizontalFlip(rank_id / (rank_id + 1))
-    resize_op_256 = C.Resize((256, 256))
-    center_crop = C.CenterCrop(224)
-    rescale_op = C.Rescale(rescale, shift)
-    normalize_op = C.Normalize((0.475, 0.451, 0.392), (0.275, 0.267, 0.278))
-    changeswap_op = C.HWC2CHW()
-
-    trans = []
-    if do_train:
-        trans = [decode_op,
-                 random_resize_crop_op,
-                 horizontal_flip_op,
-                 rescale_op,
-                 normalize_op,
-                 changeswap_op]
-
-    else:
-        trans = [decode_op,
-                 resize_op_256,
-                 center_crop,
-                 rescale_op,
-                 normalize_op,
-                 changeswap_op]
-
-    type_cast_op = C2.TypeCast(mstype.int32)
-
-    ds = ds.map(input_columns="image", operations=trans, num_parallel_workers=8)
-    ds = ds.map(input_columns="label", operations=type_cast_op, num_parallel_workers=8)
-
-    # apply shuffle operations
-    ds = ds.shuffle(buffer_size=config.buffer_size)
-    # apply batch operations
-    ds = ds.batch(batch_size, drop_remainder=True)
-    # apply dataset repeat operation
-    ds = ds.repeat(repeat_num)
-
-    return ds
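Review note on `dataset.py`: the removed pipeline passes `rank_id / (rank_id + 1)` to `C.RandomHorizontalFlip`, so the flip probability depends on the device rank. Whether that was intentional is not stated in this change. The plain-Python sketch below only evaluates that expression for the eight ranks used by `run_distribute_train.sh`; the variable name `flip_probs` is hypothetical.

```python
# Evaluate the flip-probability expression for ranks 0..7 (8 devices).
flip_probs = {rank_id: rank_id / (rank_id + 1) for rank_id in range(8)}

for rank_id, prob in flip_probs.items():
    print("rank %d: flip probability %.3f" % (rank_id, prob))
```

Rank 0 (which is also the standalone case, where `RANK_ID=0`) ends up with probability 0, so no horizontal flips are applied there.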
--- a/example/resnet101_imagenet2012/eval.py
+++ b/example/resnet101_imagenet2012/eval.py
@@ -1,75 +0,0 @@
-# Copyright 2020 Huawei Technologies Co., Ltd
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-# http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-# ============================================================================
-"""
-eval.
-"""
-import os
-import argparse
-import random
-import numpy as np
-from dataset import create_dataset
-from config import config
-from mindspore import context
-from mindspore.model_zoo.resnet import resnet101
-from mindspore.parallel._auto_parallel_context import auto_parallel_context
-from mindspore.train.model import Model, ParallelMode
-from mindspore.train.serialization import load_checkpoint, load_param_into_net
-import mindspore.dataset.engine as de
-from mindspore.communication.management import init
-from crossentropy import CrossEntropy
-
-random.seed(1)
-np.random.seed(1)
-de.config.set_seed(1)
-
-parser = argparse.ArgumentParser(description='Image classification')
-parser.add_argument('--run_distribute', type=bool, default=False, help='Run distribute')
-parser.add_argument('--device_num', type=int, default=1, help='Device num.')
-parser.add_argument('--do_train', type=bool, default=False, help='Do train or not.')
-parser.add_argument('--do_eval', type=bool, default=True, help='Do eval or not.')
-parser.add_argument('--checkpoint_path', type=str, default=None, help='Checkpoint file path')
-parser.add_argument('--dataset_path', type=str, default=None, help='Dataset path')
-args_opt = parser.parse_args()
-
-device_id = int(os.getenv('DEVICE_ID'))
-
-context.set_context(mode=context.GRAPH_MODE, device_target="Ascend", save_graphs=False, device_id=device_id)
-
-if __name__ == '__main__':
-    if not args_opt.do_eval and args_opt.run_distribute:
-        context.set_auto_parallel_context(device_num=args_opt.device_num, parallel_mode=ParallelMode.DATA_PARALLEL,
-                                          mirror_mean=True, parameter_broadcast=True)
-        auto_parallel_context().set_all_reduce_fusion_split_indices([180, 313])
-        init()
-
-    epoch_size = config.epoch_size
-    net = resnet101(class_num=config.class_num)
-
-    if not config.label_smooth:
-        config.label_smooth_factor = 0.0
-    loss = CrossEntropy(smooth_factor=config.label_smooth_factor, num_classes=config.class_num)
-
-    if args_opt.do_eval:
-        dataset = create_dataset(dataset_path=args_opt.dataset_path, do_train=False, batch_size=config.batch_size)
-        step_size = dataset.get_dataset_size()
-
-        if args_opt.checkpoint_path:
-            param_dict = load_checkpoint(args_opt.checkpoint_path)
-            load_param_into_net(net, param_dict)
-        net.set_train(False)
-
-        model = Model(net, loss_fn=loss, metrics={'top_1_accuracy', 'top_5_accuracy'})
-        res = model.eval(dataset)
-        print("result:", res, "ckpt=", args_opt.checkpoint_path)
--- a/example/resnet101_imagenet2012/lr_generator.py
+++ b/example/resnet101_imagenet2012/lr_generator.py
@@ -1,56 +0,0 @@
-# Copyright 2020 Huawei Technologies Co., Ltd
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-# http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-# ============================================================================
-"""learning rate generator"""
-import math
-import numpy as np
-
-def linear_warmup_lr(current_step, warmup_steps, base_lr, init_lr):
-    lr_inc = (float(base_lr) - float(init_lr)) / float(warmup_steps)
-    lr = float(init_lr) + lr_inc * current_step
-    return lr
-
-def warmup_cosine_annealing_lr(lr, steps_per_epoch, warmup_epochs, max_epoch=120, global_step=0):
-    """
-    generate learning rate array with cosine
-
-    Args:
-       lr(float): base learning rate
-       steps_per_epoch(int): steps size of one epoch
-       warmup_epochs(int): number of warmup epochs
-       max_epoch(int): total epochs of training
-       global_step(int): the current start index of lr array
-    Returns:
-       np.array, learning rate array
-    """
-    base_lr = lr
-    warmup_init_lr = 0
-    total_steps = int(max_epoch * steps_per_epoch)
-    warmup_steps = int(warmup_epochs * steps_per_epoch)
-    decay_steps = total_steps - warmup_steps
-
-    lr_each_step = []
-    for i in range(total_steps):
-        if i < warmup_steps:
-            lr = linear_warmup_lr(i + 1, warmup_steps, base_lr, warmup_init_lr)
-        else:
-            linear_decay = (total_steps - i) / decay_steps
-            cosine_decay = 0.5 * (1 + math.cos(math.pi * 2 * 0.47 * i / decay_steps))
-            decayed = linear_decay * cosine_decay + 0.00001
-            lr = base_lr * decayed
-        lr_each_step.append(lr)
-
-    lr_each_step = np.array(lr_each_step).astype(np.float32)
-    learning_rate = lr_each_step[global_step:]
-    return learning_rate
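Review note on `lr_generator.py`: after warmup, the removed `warmup_cosine_annealing_lr` multiplies a linear decay by a cosine term whose argument is scaled by `2 * 0.47`, then adds a small floor of `0.00001`. The sketch below is a scalar restatement of that per-step formula. The helper name `lr_at_step` is hypothetical, and the step count of 5004 per epoch is taken from the training log in the removed README.

```python
import math


def lr_at_step(i, base_lr, total_steps, warmup_steps=0):
    """Learning rate at step i, following the per-step formula in the removed file."""
    if i < warmup_steps:
        # Linear warmup from 0 up to base_lr.
        return base_lr * (i + 1) / warmup_steps
    decay_steps = total_steps - warmup_steps
    linear_decay = (total_steps - i) / decay_steps
    cosine_decay = 0.5 * (1 + math.cos(math.pi * 2 * 0.47 * i / decay_steps))
    return base_lr * (linear_decay * cosine_decay + 0.00001)


total_steps = 120 * 5004  # epoch_size * steps per epoch
schedule_start = lr_at_step(0, 0.1, total_steps)
schedule_end = lr_at_step(total_steps - 1, 0.1, total_steps)
```

With `warmup_epochs` set to 0, the schedule starts at approximately the base learning rate and falls to a value close to `base_lr * 0.00001` on the final step.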
--- a/example/resnet101_imagenet2012/run_distribute_train.sh
+++ b/example/resnet101_imagenet2012/run_distribute_train.sh
@@ -1,86 +0,0 @@
-#!/bin/bash
-# Copyright 2020 Huawei Technologies Co., Ltd
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-# http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-# ============================================================================
-
-if [ $# != 2 ] && [ $# != 3 ]
-then 
-    echo "Usage: sh run_distribute_train.sh [MINDSPORE_HCCL_CONFIG_PATH] [DATASET_PATH] [PRETRAINED_PATH](optional)"
-exit 1
-fi
-
-get_real_path(){
-  if [ "${1:0:1}" == "/" ]; then
-    echo "$1"
-  else
-    echo "$(realpath -m $PWD/$1)"
-  fi
-}
-PATH1=$(get_real_path $1)
-PATH2=$(get_real_path $2)
-echo $PATH1
-echo $PATH2
-if [ $# == 3 ]
-then 
-    PATH3=$(get_real_path $3)
-    echo $PATH3
-fi
-
-if [ ! -f $PATH1 ]
-then 
-    echo "error: MINDSPORE_HCCL_CONFIG_PATH=$PATH1 is not a file"
-exit 1
-fi 
-
-if [ ! -d $PATH2 ]
-then 
-    echo "error: DATASET_PATH=$PATH2 is not a directory"
-exit 1
-fi 
-
-if [ $# == 3 ] && [ ! -f $PATH3 ]
-then
-    echo "error: PRETRAINED_PATH=$PATH3 is not a file"
-exit 1
-fi
-
-ulimit -u unlimited
-export DEVICE_NUM=8
-export RANK_SIZE=8
-export MINDSPORE_HCCL_CONFIG_PATH=$PATH1
-export RANK_TABLE_FILE=$PATH1
-
-for((i=0; i<${DEVICE_NUM}; i++))
-do
-    export DEVICE_ID=$i
-    export RANK_ID=$i
-    rm -rf ./train_parallel$i
-    mkdir ./train_parallel$i
-    cp *.py ./train_parallel$i
-    cp *.sh ./train_parallel$i
-    cd ./train_parallel$i || exit
-    echo "start training for rank $RANK_ID, device $DEVICE_ID"
-    env > env.log
-    if [ $# == 2 ]
-    then	    
-        python train.py --do_train=True --run_distribute=True --device_num=$DEVICE_NUM --dataset_path=$PATH2 &> log &
-    fi
-    
-    if [ $# == 3 ]
-    then
-        python train.py --do_train=True --run_distribute=True --device_num=$DEVICE_NUM --dataset_path=$PATH2 --pre_trained=$PATH3 &> log &
-    fi
-
-    cd ..
-done
--- a/example/resnet101_imagenet2012/run_infer.sh
+++ b/example/resnet101_imagenet2012/run_infer.sh
@@ -1,64 +0,0 @@
-#!/bin/bash
-# Copyright 2020 Huawei Technologies Co., Ltd
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-# http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-# ============================================================================
-
-if [ $# != 2 ]
-then 
-    echo "Usage: sh run_infer.sh [DATASET_PATH] [CHECKPOINT_PATH]"
-exit 1
-fi
-
-get_real_path(){
-  if [ "${1:0:1}" == "/" ]; then
-    echo "$1"
-  else
-    echo "$(realpath -m $PWD/$1)"
-  fi
-}
-PATH1=$(get_real_path $1)
-PATH2=$(get_real_path $2)
-echo $PATH1
-echo $PATH2
-
-if [ ! -d $PATH1 ]
-then 
-    echo "error: DATASET_PATH=$PATH1 is not a directory"
-exit 1
-fi 
-
-if [ ! -f $PATH2 ]
-then 
-    echo "error: CHECKPOINT_PATH=$PATH2 is not a file"
-exit 1
-fi 
-
-ulimit -u unlimited
-export DEVICE_NUM=1
-export DEVICE_ID=0
-export RANK_SIZE=$DEVICE_NUM
-export RANK_ID=0
-
-if [ -d "infer" ];
-then
-    rm -rf ./infer
-fi
-mkdir ./infer
-cp *.py ./infer
-cp *.sh ./infer
-cd ./infer || exit
-env > env.log
-echo "start infering for device $DEVICE_ID"
-python eval.py --do_eval=True --dataset_path=$PATH1 --checkpoint_path=$PATH2 &> log &
-cd ..
--- a/example/resnet101_imagenet2012/run_standalone_train.sh
+++ b/example/resnet101_imagenet2012/run_standalone_train.sh
@@ -1,75 +0,0 @@
-#!/bin/bash
-# Copyright 2020 Huawei Technologies Co., Ltd
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-# http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-# ============================================================================
-
-if [ $# != 1 ] && [ $# != 2 ]
-then 
-    echo "Usage: sh run_standalone_train.sh [DATASET_PATH] [PRETRAINED_PATH](optional)"
-exit 1
-fi
-
-get_real_path(){
-  if [ "${1:0:1}" == "/" ]; then
-    echo "$1"
-  else
-    echo "$(realpath -m $PWD/$1)"
-  fi
-}
-PATH1=$(get_real_path $1)
-echo $PATH1
-if [ $# == 2 ]
-then
-    PATH2=$(get_real_path $2)
-    echo $PATH2
-fi
-
-if [ ! -d $PATH1 ]
-then 
-    echo "error: DATASET_PATH=$PATH1 is not a directory"
-exit 1
-fi
-
-if [ $# == 2 ] && [ ! -f $PATH2 ]
-then
-    echo "error: PRETRAINED_PATH=$PATH2 is not a file"
-exit 1
-fi
-
-ulimit -u unlimited
-export DEVICE_NUM=1
-export DEVICE_ID=0
-export RANK_ID=0
-export RANK_SIZE=1
-
-if [ -d "train" ];
-then
-    rm -rf ./train
-fi
-mkdir ./train
-cp *.py ./train
-cp *.sh ./train
-cd ./train || exit
-echo "start training for device $DEVICE_ID"
-env > env.log
-if [ $# == 1 ]
-then
-    python train.py --do_train=True --dataset_path=$PATH1 &> log &
-fi
-
-if [ $# == 2 ]
-then
-    python train.py --do_train=True --dataset_path=$PATH1 --pre_trained=$PATH2 &> log &
-fi
-cd ..
--- a/example/resnet101_imagenet2012/train.py
+++ b/example/resnet101_imagenet2012/train.py
@@ -1,102 +0,0 @@
-# Copyright 2020 Huawei Technologies Co., Ltd
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-# http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-# ============================================================================
-"""train_imagenet."""
-import os
-import argparse
-import random
-import numpy as np
-from dataset import create_dataset
-from lr_generator import warmup_cosine_annealing_lr
-from config import config
-from mindspore import context
-from mindspore import Tensor
-from mindspore.model_zoo.resnet import resnet101
-from mindspore.parallel._auto_parallel_context import auto_parallel_context
-from mindspore.nn.optim.momentum import Momentum
-from mindspore.train.model import Model, ParallelMode
-from mindspore.train.callback import ModelCheckpoint, CheckpointConfig, LossMonitor, TimeMonitor
-from mindspore.train.loss_scale_manager import FixedLossScaleManager
-from mindspore.train.serialization import load_checkpoint, load_param_into_net
-import mindspore.dataset.engine as de
-from mindspore.communication.management import init
-import mindspore.nn as nn
-import mindspore.common.initializer as weight_init
-from crossentropy import CrossEntropy
-
-random.seed(1)
-np.random.seed(1)
-de.config.set_seed(1)
-
-parser = argparse.ArgumentParser(description='Image classification')
-parser.add_argument('--run_distribute', type=bool, default=False, help='Run distribute')
-parser.add_argument('--device_num', type=int, default=1, help='Device num.')
-parser.add_argument('--do_train', type=bool, default=True, help='Do train or not.')
-parser.add_argument('--do_eval', type=bool, default=False, help='Do eval or not.')
-parser.add_argument('--dataset_path', type=str, default=None, help='Dataset path')
-parser.add_argument('--pre_trained', type=str, default=None, help='Pretrained checkpoint path')
-args_opt = parser.parse_args()
-
-device_id = int(os.getenv('DEVICE_ID'))
-
-context.set_context(mode=context.GRAPH_MODE, device_target="Ascend", save_graphs=False, device_id=device_id,
-                    enable_auto_mixed_precision=True)
-
-if __name__ == '__main__':
-    if not args_opt.do_eval and args_opt.run_distribute:
-        context.set_auto_parallel_context(device_num=args_opt.device_num, parallel_mode=ParallelMode.DATA_PARALLEL,
-                                          mirror_mean=True, parameter_broadcast=True)
-        auto_parallel_context().set_all_reduce_fusion_split_indices([180, 313])
-        init()
-
-    epoch_size = config.epoch_size
-    net = resnet101(class_num=config.class_num)
-    # weight init
-    for _, cell in net.cells_and_names():
-        if isinstance(cell, nn.Conv2d):
-            cell.weight.default_input = weight_init.initializer(weight_init.XavierUniform(),
-                                                                cell.weight.default_input.shape(),
-                                                                cell.weight.default_input.dtype()).to_tensor()
-        if isinstance(cell, nn.Dense):
-            cell.weight.default_input = weight_init.initializer(weight_init.TruncatedNormal(),
-                                                                cell.weight.default_input.shape(),
-                                                                cell.weight.default_input.dtype()).to_tensor()
-    if not config.label_smooth:
-        config.label_smooth_factor = 0.0
-    loss = CrossEntropy(smooth_factor=config.label_smooth_factor, num_classes=config.class_num)
-    if args_opt.do_train:
-        dataset = create_dataset(dataset_path=args_opt.dataset_path, do_train=True,
-                                 repeat_num=epoch_size, batch_size=config.batch_size)
-        step_size = dataset.get_dataset_size()
-        loss_scale = FixedLossScaleManager(config.loss_scale, drop_overflow_update=False)
-        if args_opt.pre_trained:
-            param_dict = load_checkpoint(args_opt.pre_trained)
-            load_param_into_net(net, param_dict)
-
-        # learning rate strategy with cosine
-        lr = Tensor(warmup_cosine_annealing_lr(config.lr, step_size, config.warmup_epochs, 120,
-                                               config.pretrain_epoch_size*step_size))
-        opt = Momentum(filter(lambda x: x.requires_grad, net.get_parameters()), lr, config.momentum,
-                       config.weight_decay, config.loss_scale)
-        model = Model(net, loss_fn=loss, optimizer=opt, amp_level='O2', keep_batchnorm_fp32=False,
-                      loss_scale_manager=loss_scale, metrics={'acc'})
-        time_cb = TimeMonitor(data_size=step_size)
-        loss_cb = LossMonitor()
-        cb = [time_cb, loss_cb]
-        if config.save_checkpoint:
-            config_ck = CheckpointConfig(save_checkpoint_steps=config.save_checkpoint_epochs*step_size,
-                                         keep_checkpoint_max=config.keep_checkpoint_max)
-            ckpt_cb = ModelCheckpoint(prefix="resnet", directory=config.save_checkpoint_path, config=config_ck)
-            cb += [ckpt_cb]
-        model.train(epoch_size, dataset, callbacks=cb)
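Review note on the argument parsing in `train.py` and `eval.py`: the flags `--run_distribute`, `--do_train` and `--do_eval` are declared with `type=bool`. `argparse` calls `bool()` on the raw string, and any non-empty string is truthy, so `--do_train=False` would still parse as `True`. The launch scripts only ever pass `True`, so the issue stays hidden there. The sketch below contrasts that behaviour with a stricter converter; `str2bool` is a hypothetical helper, not part of this repository.

```python
import argparse


def str2bool(value):
    """Convert common textual booleans; reject anything else."""
    lowered = value.lower()
    if lowered in ("true", "1", "yes"):
        return True
    if lowered in ("false", "0", "no"):
        return False
    raise argparse.ArgumentTypeError("expected a boolean, got %r" % value)


# Behaviour of type=bool, as declared in the removed scripts.
naive = argparse.ArgumentParser()
naive.add_argument('--do_train', type=bool, default=True)
naive_result = naive.parse_args(['--do_train=False']).do_train

# Same flag with the stricter converter.
strict = argparse.ArgumentParser()
strict.add_argument('--do_train', type=str2bool, default=True)
strict_result = strict.parse_args(['--do_train=False']).do_train

print(naive_result, strict_result)
```

Here `naive_result` is `True` even though `False` was requested, while `strict_result` is `False`.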
--- a/model_zoo/resnet101/README.md
+++ b/model_zoo/resnet101/README.md
@@ -35,6 +35,7 @@ This is an example of training ResNet101 with ImageNet dataset in MindSpore.
    ├─crossentropy.py                 # CrossEntropy loss function
    ├─dataset.py                      # data preprocessin
    ├─lr_generator.py                 # generate learning rate
+    ├─resnet101.py                    # resnet101 backbone
  ├─eval.py                           # eval net
  └─train.py                          # train net
 ```
--- a/model_zoo/resnet101/eval.py
+++ b/model_zoo/resnet101/eval.py
@@ -20,12 +20,12 @@ import argparse
 import random
 import numpy as np
 from mindspore import context
-from mindspore.model_zoo.resnet import resnet101
 from mindspore.parallel._auto_parallel_context import auto_parallel_context
 from mindspore.train.model import Model, ParallelMode
 from mindspore.train.serialization import load_checkpoint, load_param_into_net
 import mindspore.dataset.engine as de
 from mindspore.communication.management import init
+from src.resnet101 import resnet101
 from src.dataset import create_dataset
 from src.config import config
 from src.crossentropy import CrossEntropy
--- a/model_zoo/resnet101/src/resnet101.py
+++ b/model_zoo/resnet101/src/resnet101.py
@@ -0,0 +1,261 @@
+# Copyright 2020 Huawei Technologies Co., Ltd
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+# ============================================================================
+"""ResNet101."""
+import numpy as np
+import mindspore.nn as nn
+from mindspore.ops import operations as P
+from mindspore.common.tensor import Tensor
+
+
+def _weight_variable(shape, factor=0.01):
+    init_value = np.random.randn(*shape).astype(np.float32) * factor
+    return Tensor(init_value)
+
+
+def _conv3x3(in_channel, out_channel, stride=1):
+    weight_shape = (out_channel, in_channel, 3, 3)
+    weight = _weight_variable(weight_shape)
+    return nn.Conv2d(in_channel, out_channel,
+                     kernel_size=3, stride=stride, padding=0, pad_mode='same', weight_init=weight)
+
+
+def _conv1x1(in_channel, out_channel, stride=1):
+    weight_shape = (out_channel, in_channel, 1, 1)
+    weight = _weight_variable(weight_shape)
+    return nn.Conv2d(in_channel, out_channel,
+                     kernel_size=1, stride=stride, padding=0, pad_mode='same', weight_init=weight)
+
+
+def _conv7x7(in_channel, out_channel, stride=1):
+    weight_shape = (out_channel, in_channel, 7, 7)
+    weight = _weight_variable(weight_shape)
+    return nn.Conv2d(in_channel, out_channel,
+                     kernel_size=7, stride=stride, padding=0, pad_mode='same', weight_init=weight)
+
+
+def _bn(channel):
+    return nn.BatchNorm2d(channel, eps=1e-4, momentum=0.9,
+                          gamma_init=1, beta_init=0, moving_mean_init=0, moving_var_init=1)
+
+
+def _bn_last(channel):
+    return nn.BatchNorm2d(channel, eps=1e-4, momentum=0.9,
+                          gamma_init=0, beta_init=0, moving_mean_init=0, moving_var_init=1)
+
+
+def _fc(in_channel, out_channel):
+    weight_shape = (out_channel, in_channel)
+    weight = _weight_variable(weight_shape)
+    return nn.Dense(in_channel, out_channel, has_bias=True, weight_init=weight, bias_init=0)
+
+
+class ResidualBlock(nn.Cell):
+    """
+    ResNet V1 residual block definition.
+
+    Args:
+        in_channel (int): Input channel.
+        out_channel (int): Output channel.
+        stride (int): Stride of the 3x3 convolution and of the shortcut projection. Default: 1.
+
+    Returns:
+        Tensor, output tensor.
+
+    Examples:
+        >>> ResidualBlock(3, 256, stride=2)
+    """
+    expansion = 4
+
+    def __init__(self,
+                 in_channel,
+                 out_channel,
+                 stride=1):
+        super(ResidualBlock, self).__init__()
+
+        channel = out_channel // self.expansion
+        self.conv1 = _conv1x1(in_channel, channel, stride=1)
+        self.bn1 = _bn(channel)
+
+        self.conv2 = _conv3x3(channel, channel, stride=stride)
+        self.bn2 = _bn(channel)
+
+        self.conv3 = _conv1x1(channel, out_channel, stride=1)
+        self.bn3 = _bn_last(out_channel)
+
+        self.relu = nn.ReLU()
+
+        self.down_sample = False
+
+        if stride != 1 or in_channel != out_channel:
+            self.down_sample = True
+        self.down_sample_layer = None
+
+        if self.down_sample:
+            self.down_sample_layer = nn.SequentialCell([_conv1x1(in_channel, out_channel, stride),
+                                                        _bn(out_channel)])
+        self.add = P.TensorAdd()
+
+    def construct(self, x):
+        identity = x
+
+        out = self.conv1(x)
+        out = self.bn1(out)
+        out = self.relu(out)
+
+        out = self.conv2(out)
+        out = self.bn2(out)
+        out = self.relu(out)
+
+        out = self.conv3(out)
+        out = self.bn3(out)
+
+        if self.down_sample:
+            identity = self.down_sample_layer(identity)
+
+        out = self.add(out, identity)
+        out = self.relu(out)
+
+        return out
+
+
+class ResNet(nn.Cell):
+    """
+    ResNet architecture.
+
+    Args:
+        block (Cell): Block for network.
+        layer_nums (list): Number of blocks in each layer.
+        in_channels (list): Input channel in each layer.
+        out_channels (list): Output channel in each layer.
+        strides (list): Stride size in each layer.
+        num_classes (int): Number of classes that the training images belong to.
+    Returns:
+        Tensor, output tensor.
+
+    Examples:
+        >>> ResNet(ResidualBlock,
+        >>>        [3, 4, 6, 3],
+        >>>        [64, 256, 512, 1024],
+        >>>        [256, 512, 1024, 2048],
+        >>>        [1, 2, 2, 2],
+        >>>        10)
+    """
+
+    def __init__(self,
+                 block,
+                 layer_nums,
+                 in_channels,
+                 out_channels,
+                 strides,
+                 num_classes):
+        super(ResNet, self).__init__()
+
+        if not len(layer_nums) == len(in_channels) == len(out_channels) == len(strides) == 4:
+            raise ValueError("the length of layer_nums, in_channels, out_channels and strides list must be 4!")
+
+        self.conv1 = _conv7x7(3, 64, stride=2)
+        self.bn1 = _bn(64)
+        self.relu = P.ReLU()
+        self.maxpool = nn.MaxPool2d(kernel_size=3, stride=2, pad_mode="same")
+
+        self.layer1 = self._make_layer(block,
+                                       layer_nums[0],
+                                       in_channel=in_channels[0],
+                                       out_channel=out_channels[0],
+                                       stride=strides[0])
+        self.layer2 = self._make_layer(block,
+                                       layer_nums[1],
+                                       in_channel=in_channels[1],
+                                       out_channel=out_channels[1],
+                                       stride=strides[1])
+        self.layer3 = self._make_layer(block,
+                                       layer_nums[2],
+                                       in_channel=in_channels[2],
+                                       out_channel=out_channels[2],
+                                       stride=strides[2])
+        self.layer4 = self._make_layer(block,
+                                       layer_nums[3],
+                                       in_channel=in_channels[3],
+                                       out_channel=out_channels[3],
+                                       stride=strides[3])
+
+        self.mean = P.ReduceMean(keep_dims=True)
+        self.flatten = nn.Flatten()
+        self.end_point = _fc(out_channels[3], num_classes)
+
+    def _make_layer(self, block, layer_num, in_channel, out_channel, stride):
+        """
+        Make stage network of ResNet.
+
+        Args:
+            block (Cell): Resnet block.
+            layer_num (int): Number of blocks in the stage.
+            in_channel (int): Input channel.
+            out_channel (int): Output channel.
+            stride (int): Stride size for the first block of the stage.
+
+        Returns:
+            SequentialCell, the stacked blocks of the stage.
+
+        Examples:
+            >>> _make_layer(ResidualBlock, 3, 128, 256, 2)
+        """
+        layers = []
+
+        resnet_block = block(in_channel, out_channel, stride=stride)
+        layers.append(resnet_block)
+
+        for _ in range(1, layer_num):
+            resnet_block = block(out_channel, out_channel, stride=1)
+            layers.append(resnet_block)
+
+        return nn.SequentialCell(layers)
+
+    def construct(self, x):
+        x = self.conv1(x)
+        x = self.bn1(x)
+        x = self.relu(x)
+        c1 = self.maxpool(x)
+
+        c2 = self.layer1(c1)
+        c3 = self.layer2(c2)
+        c4 = self.layer3(c3)
+        c5 = self.layer4(c4)
+
+        out = self.mean(c5, (2, 3))
+        out = self.flatten(out)
+        out = self.end_point(out)
+
+        return out
+
+def resnet101(class_num=1001):
+    """
+    Get ResNet101 neural network.
+
+    Args:
+        class_num (int): Number of output classes. Default: 1001.
+
+    Returns:
+        Cell, cell instance of ResNet101 neural network.
+
+    Examples:
+        >>> net = resnet101(1001)
+    """
+    return ResNet(ResidualBlock,
+                  [3, 4, 23, 3],
+                  [64, 256, 512, 1024],
+                  [256, 512, 1024, 2048],
+                  [1, 2, 2, 2],
+                  class_num)
--- a/model_zoo/resnet101/train.py
+++ b/model_zoo/resnet101/train.py
@ -19,7 +19,6 @@ import random
 import numpy as np
 from mindspore import context
 from mindspore import Tensor
-from mindspore.model_zoo.resnet import resnet101
 from mindspore.parallel._auto_parallel_context import auto_parallel_context
 from mindspore.nn.optim.momentum import Momentum
 from mindspore.train.model import Model, ParallelMode
@ -30,6 +29,7 @@ import mindspore.dataset.engine as de
 from mindspore.communication.management import init
 import mindspore.nn as nn
 import mindspore.common.initializer as weight_init
+from src.resnet101 import resnet101
 from src.dataset import create_dataset
 from src.lr_generator import warmup_cosine_annealing_lr
 from src.config import config