diff --git a/model_zoo/densenet121/README.md b/model_zoo/densenet121/README.md
new file mode 100644
index 00000000000..c299db3b8f2
--- /dev/null
+++ b/model_zoo/densenet121/README.md
@@ -0,0 +1,272 @@
+# Contents
+
+- [DenseNet121 Description](#densenet121-description)
+- [Model Architecture](#model-architecture)
+- [Dataset](#dataset)
+- [Features](#features)
+    - [Mixed Precision](#mixed-precision)
+- [Environment Requirements](#environment-requirements)
+- [Quick Start](#quick-start)
+- [Script Description](#script-description)
+    - [Script and Sample Code](#script-and-sample-code)
+    - [Script Parameters](#script-parameters)
+    - [Training Process](#training-process)
+        - [Training](#training)
+        - [Distributed Training](#distributed-training)
+    - [Evaluation Process](#evaluation-process)
+        - [Evaluation](#evaluation)
+- [Model Description](#model-description)
+    - [Performance](#performance)
+        - [Training accuracy results](#training-accuracy-results)
+        - [Training performance results](#training-performance-results)
+- [Description of Random Situation](#description-of-random-situation)
+- [ModelZoo Homepage](#modelzoo-homepage)
+
+
+# [DenseNet121 Description](#contents)
+
+DenseNet121 is a convolution-based neural network for the task of image classification. The paper describing the model can be found [here](https://arxiv.org/abs/1608.06993). Huawei's DenseNet121 is an implementation on [MindSpore](https://www.mindspore.cn/).
+
+The repository also contains scripts to launch training and inference routines.
+
+# [Model Architecture](#contents)
+
+DenseNet121 is built from four densely connected blocks. In every dense block, each layer obtains additional inputs from all preceding layers and passes its own feature maps on to all subsequent layers. Feature maps are combined by concatenation, so each layer receives the "collective knowledge" of all preceding layers.
+
+
+
+# [Dataset](#contents)
+
+Dataset used: ImageNet
+The default dataset configuration is as follows:
+ - Training dataset preprocess:
+    - Input size of images is 224\*224
+    - Range (min, max) of the cropped area, as a fraction of the original image, is (0.08, 1.0)
+    - Range (min, max) of the aspect ratio of the crop is (0.75, 1.333)
+    - Probability of the image being flipped is 0.5
+    - Randomly adjust the brightness, contrast and saturation (0.4, 0.4, 0.4)
+    - Normalize the input image with respect to mean and standard deviation
+
+ - Test dataset preprocess:
+    - Input size of images is 224\*224 (resize to 256\*256, then crop at the center)
+    - Normalize the input image with respect to mean and standard deviation
+
+
+
+# [Features](#contents)
+
+## Mixed Precision
+
+The [mixed precision](https://www.mindspore.cn/tutorial/zh-CN/master/advanced_use/mixed_precision.html) training method accelerates the deep learning neural network training process by using both the single-precision and half-precision data formats, while maintaining the network precision achieved by single-precision training. Mixed precision training can accelerate the computation process, reduce memory usage, and enable a larger model or batch size to be trained on specific hardware.
+For FP16 operators, if the input data type is FP32, the MindSpore backend will automatically handle it with reduced precision. Users can check the reduced-precision operators by enabling the INFO log and searching for 'reduce precision'.
+
+
+
+# [Environment Requirements](#contents)
+
+- Hardware (Ascend)
+    - Prepare hardware environment with Ascend AI processor.
If you want to try Ascend , please send the [application form](https://obs-9be7.obs.cn-east-2.myhuaweicloud.com/file/other/Ascend%20Model%20Zoo%E4%BD%93%E9%AA%8C%E8%B5%84%E6%BA%90%E7%94%B3%E8%AF%B7%E8%A1%A8.docx) to ascend@huawei.com. Once approved, you can get the resources. +- Framework + - [MindSpore](https://www.mindspore.cn/install/en) +- For more information, please check the resources below: + - [MindSpore tutorials](https://www.mindspore.cn/tutorial/zh-CN/master/index.html) + - [MindSpore API](https://www.mindspore.cn/api/zh-CN/master/index.html) + + + +# [Quick Start](#contents) + +After installing MindSpore via the official website, you can start training and evaluation as follows: + + ```python + # run training example + python train.py --data_dir /PATH/TO/DATASET --is_distributed 0> train.log 2>&1 & + + # run distributed training example + sh scripts/run_distribute_train.sh 8 rank_table.json /PATH/TO/DATASET + + # run evaluation example + python eval.py --data_dir /PATH/TO/DATASET --pretrained /PATH/TO/CHECKPOINT> eval.log 2>&1 & + OR + sh scripts/run_distribute_eval.sh 8 rank_table.json /PATH/TO/DATASET /PATH/TO/CHECKPOINT + ``` + + For distributed training, a hccl configuration file with JSON format needs to be created in advance. + + Please follow the instructions in the link below: + + https://gitee.com/mindspore/mindspore/tree/master/model_zoo/utils/hccl_tools. + + + +# [Script Description](#contents) + +## [Script and Sample Code](#contents) + +``` +├── model_zoo + ├── README.md // descriptions about all the models + ├── densenet121 + ├── README.md // descriptions about densenet121 + ├── scripts + │ ├── run_distribute_train.sh // shell script for distributed on Ascend + │ ├── run_distribute_eval.sh // shell script for evaluation on Ascend + ├── src + │ ├── datasets // dataset processing function + │ ├── losses + │ ├──crossentropy.py // densenet loss function + │ ├── lr_scheduler + │ ├──lr_scheduler.py // densenet learning rate schedule function + │ ├── network + │ ├──densenet.py // densenet architecture + │ ├──optimizers // densenet optimize function + │ ├──utils + │ ├──logging.py // logging function + │ ├──var_init.py // densenet variable init function + │ ├── config.py // network config + ├── train.py // training script + ├── eval.py // evaluation script +``` + +## [Script Parameters](#contents) + +You can modify the training behaviour through the various flags in the `train.py` script. 
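+
+Most of these flags have defaults that mirror the values in `src/config.py`. As an illustration only (the paths and values below are placeholders, not a tuned recipe), a single-device run that overrides the learning-rate options listed below could look like:
+
+```
+python train.py --data_dir /PATH/TO/DATASET --is_distributed 0 \
+    --lr_scheduler exponential --lr 0.1 --lr_epochs 30,60,90 --lr_gamma 0.1 > train.log 2>&1 &
+```
+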
+Flags in the `train.py` script are as follows:
+
+```
+  --data_dir              train data dir
+  --num_classes           num of classes in dataset (default: 1000)
+  --image_size            image size of the dataset
+  --per_batch_size        mini-batch size (default: 256) per device
+  --pretrained            path of pretrained model
+  --lr_scheduler          type of LR schedule: exponential, cosine_annealing
+  --lr                    initial learning rate
+  --lr_epochs             epoch milestones for lr changing
+  --lr_gamma              decrease lr by this factor for the exponential lr_scheduler
+  --eta_min               eta_min in cosine_annealing scheduler
+  --T_max                 T_max in cosine_annealing scheduler
+  --max_epoch             max epoch num to train the model
+  --warmup_epochs         warmup epochs (useful when the batch size is large)
+  --weight_decay          weight decay (default: 1e-4)
+  --momentum              momentum (default: 0.9)
+  --label_smooth          whether to use label smoothing in CE
+  --label_smooth_factor   smooth strength of the original one-hot label
+  --log_interval          logging interval (default: 100)
+  --ckpt_path             path to save checkpoints
+  --ckpt_interval         the interval to save checkpoints
+  --is_save_on_master     save checkpoint on master rank only or on all ranks
+  --is_distributed        if multi device (default: 1)
+  --rank                  local rank of distributed (default: 0)
+  --group_size            world size of distributed (default: 1)
+```
+
+
+
+## [Training Process](#contents)
+
+### Training
+
+- running on Ascend
+
+  ```
+  python train.py --data_dir /PATH/TO/DATASET --is_distributed 0 > train.log 2>&1 &
+  ```
+
+  The python command above runs in the background. The log and model checkpoints will be generated in `output/202x-xx-xx_time_xx_xx_xx/`. The loss values will be reported as follows:
+
+  ```
+  2020-08-22 16:58:56,617:INFO:epoch[0], iter[5003], loss:4.367, mean_fps:0.00 imgs/sec
+  2020-08-22 16:58:56,619:INFO:local passed
+  2020-08-22 17:02:19,920:INFO:epoch[1], iter[10007], loss:3.193, mean_fps:6301.11 imgs/sec
+  2020-08-22 17:02:19,921:INFO:local passed
+  2020-08-22 17:05:43,112:INFO:epoch[2], iter[15011], loss:3.096, mean_fps:6304.53 imgs/sec
+  2020-08-22 17:05:43,113:INFO:local passed
+  ...
+  ```
+
+
+
+### Distributed Training
+
+- running on Ascend
+
+  ```
+  sh scripts/run_distribute_train.sh 8 rank_table.json /PATH/TO/DATASET
+  ```
+
+  The above shell script runs distributed training in the background. You can view the log and model checkpoints in `LOG[X]/output/202x-xx-xx_time_xx_xx_xx/`. The loss values will be reported as follows:
+
+  ```
+  2020-08-22 16:58:54,556:INFO:epoch[0], iter[5003], loss:3.857, mean_fps:0.00 imgs/sec
+  2020-08-22 17:02:19,188:INFO:epoch[1], iter[10007], loss:3.18, mean_fps:6260.18 imgs/sec
+  2020-08-22 17:05:42,490:INFO:epoch[2], iter[15011], loss:2.621, mean_fps:6301.11 imgs/sec
+  2020-08-22 17:09:05,686:INFO:epoch[3], iter[20015], loss:3.113, mean_fps:6304.37 imgs/sec
+  2020-08-22 17:12:28,925:INFO:epoch[4], iter[25019], loss:3.29, mean_fps:6303.07 imgs/sec
+  2020-08-22 17:15:52,167:INFO:epoch[5], iter[30023], loss:2.865, mean_fps:6302.98 imgs/sec
+  ...
+  ...
+  ```
+
+
+
+## [Evaluation Process](#contents)
+
+### Evaluation
+
+- evaluation on Ascend
+
+  Run the command below for evaluation.
+
+  ```
+  python eval.py --data_dir /PATH/TO/DATASET --pretrained /PATH/TO/CHECKPOINT > eval.log 2>&1 &
+  OR
+  sh scripts/run_distribute_eval.sh 8 rank_table.json /PATH/TO/DATASET /PATH/TO/CHECKPOINT
+  ```
+
+  The above python command runs in the background. You can view the results through the file "output/202x-xx-xx_time_xx_xx_xx/202x_xxxx.log".
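+
+  For reference, the reported top-1 / top-5 numbers are plain correct-count ratios over all evaluated images. A minimal NumPy sketch of the per-batch bookkeeping that `eval.py` performs (names and shapes are illustrative only) is:
+
+  ```python
+  import numpy as np
+
+  def count_topk(logits, labels):
+      """Return (top1_correct, top5_correct) for one batch of logits with shape [N, C]."""
+      top1 = np.argmax(logits, axis=-1)            # highest-scoring class per sample
+      top5 = np.argsort(logits, axis=-1)[:, -5:]   # five highest-scoring classes per sample
+      top1_correct = int(np.equal(top1, labels).sum())
+      top5_correct = int(sum(label in row for label, row in zip(labels, top5)))
+      return top1_correct, top5_correct
+
+  # final accuracy over the whole run: 100.0 * correct / total_images
+  ```
+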
+The accuracy of the test dataset will be as follows:
+
+  ```
+  2020-08-24 09:21:50,551:INFO:after allreduce eval: top1_correct=37657, tot=49920, acc=75.43%
+  2020-08-24 09:21:50,551:INFO:after allreduce eval: top5_correct=46224, tot=49920, acc=92.60%
+  ```
+
+
+
+# [Model Description](#contents)
+
+## [Performance](#contents)
+
+### Training accuracy results
+
+| Parameters          | DenseNet121                 |
+| ------------------- | --------------------------- |
+| Model Version       | DenseNet121                 |
+| Resource            | Ascend 910                  |
+| Uploaded Date       | 08/28/2020 (month/day/year) |
+| MindSpore Version   | 0.5.0-alpha                 |
+| Dataset             | ImageNet                    |
+| epochs              | 120                         |
+| outputs             | probability                 |
+| train performance   | Top1: 75.13%; Top5: 92.57%  |
+
+### Training performance results
+
+| Parameters          | DenseNet121                     |
+| ------------------- | ------------------------------- |
+| Model Version       | DenseNet121                     |
+| Resource            | Ascend 910                      |
+| Uploaded Date       | 08/28/2020 (month/day/year)     |
+| MindSpore Version   | 0.5.0-alpha                     |
+| Dataset             | ImageNet                        |
+| batch_size          | 32                              |
+| outputs             | probability                     |
+| speed               | 1pc: 760 img/s; 8pc: 6000 img/s |
+
+
+
+# [Description of Random Situation](#contents)
+
+In dataset.py, we set the seed inside the "create_dataset" function. We also use a random seed in train.py.
+
+
+# [ModelZoo Homepage](#contents)
+
+Please check the official [homepage](https://gitee.com/mindspore/mindspore/tree/master/model_zoo).
diff --git a/model_zoo/densenet121/eval.py b/model_zoo/densenet121/eval.py
new file mode 100644
index 00000000000..cf7fea6c95c
--- /dev/null
+++ b/model_zoo/densenet121/eval.py
@@ -0,0 +1,247 @@
+# Copyright 2020 Huawei Technologies Co., Ltd
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+# ============================================================================ + +""" +##############test densenet example################# +python eval.py --data_dir /PATH/TO/DATASET --pretrained /PATH/TO/CHECKPOINT +""" + +import os +import argparse +import datetime +import glob +import numpy as np +from mindspore import context + +import mindspore.nn as nn +from mindspore import Tensor +from mindspore.communication.management import init, get_rank, get_group_size, release +from mindspore.train.serialization import load_checkpoint, load_param_into_net +from mindspore.ops import operations as P +from mindspore.ops import functional as F +from mindspore.common import dtype as mstype + +from src.utils.logging import get_logger +from src.datasets import classification_dataset +from src.network import DenseNet121 +from src.config import config + +devid = int(os.getenv('DEVICE_ID')) +context.set_context(mode=context.GRAPH_MODE, device_target="Davinci", + save_graphs=True, device_id=devid) + + +class ParameterReduce(nn.Cell): + """ + reduce parameter + """ + def __init__(self): + super(ParameterReduce, self).__init__() + self.cast = P.Cast() + self.reduce = P.AllReduce() + + def construct(self, x): + one = self.cast(F.scalar_to_array(1.0), mstype.float32) + out = x * one + ret = self.reduce(out) + return ret + + +def parse_args(cloud_args=None): + """ + parse args + """ + parser = argparse.ArgumentParser('mindspore classification test') + + # dataset related + parser.add_argument('--data_dir', type=str, default='', help='eval data dir') + parser.add_argument('--num_classes', type=int, default=1000, help='num of classes in dataset') + parser.add_argument('--image_size', type=str, default='224,224', help='image size of the dataset') + # network related + parser.add_argument('--backbone', default='resnet50', help='backbone') + parser.add_argument('--pretrained', default='', type=str, help='fully path of pretrained model to load.' + 'If it is a direction, it will test all ckpt') + + # logging related + parser.add_argument('--log_path', type=str, default='outputs/', help='path to save log') + parser.add_argument('--is_distributed', type=int, default=1, help='if multi device') + parser.add_argument('--rank', type=int, default=0, help='local rank of distributed') + parser.add_argument('--group_size', type=int, default=1, help='world size of distributed') + + # roma obs + parser.add_argument('--train_url', type=str, default="", help='train url') + + args, _ = parser.parse_known_args() + args = merge_args(args, cloud_args) + + args.per_batch_size = config.per_batch_size + args.image_size = list(map(int, args.image_size.split(','))) + + return args + + +def get_top5_acc(top5_arg, gt_class): + sub_count = 0 + for top5, gt in zip(top5_arg, gt_class): + if gt in top5: + sub_count += 1 + return sub_count + +def merge_args(args, cloud_args): + """ + merge args and cloud_args + """ + args_dict = vars(args) + if isinstance(cloud_args, dict): + for key in cloud_args.keys(): + val = cloud_args[key] + if key in args_dict and val: + arg_type = type(args_dict[key]) + if arg_type is not type(None): + val = arg_type(val) + args_dict[key] = val + return args + +def test(cloud_args=None): + """ + network eval function. Get top1 and top5 ACC from classification. + The result will be save at [./outputs] by default. 
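+
+    Note on the distributed path below: when is_distributed is set, each rank saves its
+    top-1/top-5 correct counts and image total as .npy files under /cache, waits until the
+    corresponding files from all other ranks exist, and only then sums them to compute the
+    final accuracy.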
+ """ + args = parse_args(cloud_args) + + # init distributed + if args.is_distributed: + init() + args.rank = get_rank() + args.group_size = get_group_size() + + args.outputs_dir = os.path.join(args.log_path, + datetime.datetime.now().strftime('%Y-%m-%d_time_%H_%M_%S')) + + args.logger = get_logger(args.outputs_dir, args.rank) + args.logger.save_args(args) + + # network + args.logger.important_info('start create network') + if os.path.isdir(args.pretrained): + models = list(glob.glob(os.path.join(args.pretrained, '*.ckpt'))) + print(models) + + f = lambda x: -1 * int(os.path.splitext(os.path.split(x)[-1])[0].split('-')[-1].split('_')[0]) + + args.models = sorted(models, key=f) + else: + args.models = [args.pretrained,] + + for model in args.models: + de_dataset = classification_dataset(args.data_dir, image_size=args.image_size, + per_batch_size=args.per_batch_size, + max_epoch=1, rank=args.rank, group_size=args.group_size, + mode='eval') + eval_dataloader = de_dataset.create_tuple_iterator() + network = DenseNet121(args.num_classes) + + param_dict = load_checkpoint(model) + param_dict_new = {} + for key, values in param_dict.items(): + if key.startswith('moments.'): + continue + elif key.startswith('network.'): + param_dict_new[key[8:]] = values + else: + param_dict_new[key] = values + print("key:" + key) + print(values.data) + load_param_into_net(network, param_dict_new) + args.logger.info('load model {} success'.format(model)) + + # must add + network.add_flags_recursive(fp16=True) + + img_tot = 0 + top1_correct = 0 + top5_correct = 0 + network.set_train(False) + for data, gt_classes in eval_dataloader: + output = network(Tensor(data, mstype.float32)) + output = output.asnumpy() + + top1_output = np.argmax(output, (-1)) + top5_output = np.argsort(output)[:, -5:] + + t1_correct = np.equal(top1_output, gt_classes).sum() + top1_correct += t1_correct + top5_correct += get_top5_acc(top5_output, gt_classes) + img_tot += args.per_batch_size + + results = [[top1_correct], [top5_correct], [img_tot]] + args.logger.info('before results={}'.format(results)) + if args.is_distributed: + model_md5 = model.replace('/', '') + tmp_dir = '/cache' + if not os.path.exists(tmp_dir): + os.mkdir(tmp_dir) + top1_correct_npy = '/cache/top1_rank_{}_{}.npy'.format(args.rank, model_md5) + top5_correct_npy = '/cache/top5_rank_{}_{}.npy'.format(args.rank, model_md5) + img_tot_npy = '/cache/img_tot_rank_{}_{}.npy'.format(args.rank, model_md5) + np.save(top1_correct_npy, top1_correct) + np.save(top5_correct_npy, top5_correct) + np.save(img_tot_npy, img_tot) + while True: + rank_ok = True + for other_rank in range(args.group_size): + top1_correct_npy = '/cache/top1_rank_{}_{}.npy'.format(other_rank, model_md5) + top5_correct_npy = '/cache/top5_rank_{}_{}.npy'.format(other_rank, model_md5) + img_tot_npy = '/cache/img_tot_rank_{}_{}.npy'.format(other_rank, model_md5) + if not os.path.exists(top1_correct_npy) or not os.path.exists(top5_correct_npy) \ + or not os.path.exists(img_tot_npy): + rank_ok = False + if rank_ok: + break + + top1_correct_all = 0 + top5_correct_all = 0 + img_tot_all = 0 + for other_rank in range(args.group_size): + top1_correct_npy = '/cache/top1_rank_{}_{}.npy'.format(other_rank, model_md5) + top5_correct_npy = '/cache/top5_rank_{}_{}.npy'.format(other_rank, model_md5) + img_tot_npy = '/cache/img_tot_rank_{}_{}.npy'.format(other_rank, model_md5) + top1_correct_all += np.load(top1_correct_npy) + top5_correct_all += np.load(top5_correct_npy) + img_tot_all += np.load(img_tot_npy) + results = 
[[top1_correct_all], [top5_correct_all], [img_tot_all]] + results = np.array(results) + + else: + results = np.array(results) + + args.logger.info('after results={}'.format(results)) + top1_correct = results[0, 0] + top5_correct = results[1, 0] + img_tot = results[2, 0] + acc1 = 100.0 * top1_correct / img_tot + acc5 = 100.0 * top5_correct / img_tot + args.logger.info('after allreduce eval: top1_correct={}, tot={}, acc={:.2f}%'.format(top1_correct, + img_tot, + acc1)) + args.logger.info('after allreduce eval: top5_correct={}, tot={}, acc={:.2f}%'.format(top5_correct, + img_tot, + acc5)) + if args.is_distributed: + release() + + +if __name__ == "__main__": + test() diff --git a/model_zoo/densenet121/scripts/run_distribute_eval.sh b/model_zoo/densenet121/scripts/run_distribute_eval.sh new file mode 100644 index 00000000000..0e9271dbe11 --- /dev/null +++ b/model_zoo/densenet121/scripts/run_distribute_eval.sh @@ -0,0 +1,48 @@ +#!/bin/bash +# Copyright 2020 Huawei Technologies Co., Ltd +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# ============================================================================ + +echo "==============================================================================================================" +echo "Please run the scipt as: " +echo "sh run_distribute_eval.sh DEVICE_NUM RANK_TABLE_FILE DATASET CKPT_PATH" +echo "for example: sh run_distribute_train.sh 8 /data/hccl.json /path/to/dataset /path/to/ckpt" +echo "It is better to use absolute path." +echo "=================================================================================================================" + +echo "After running the scipt, the network runs in the background. The log will be generated in LOGx/log.txt" + +export RANK_SIZE=$1 +export RANK_TABLE_FILE=$2 +DATASET=$3 +CKPT_PATH=$4 + +for((i=0;i env.log + python eval.py \ + --data_dir=$DATASET \ + --pretrained=$CKPT_PATH > log.txt 2>&1 & + + cd ../ +done + diff --git a/model_zoo/densenet121/scripts/run_distribute_train.sh b/model_zoo/densenet121/scripts/run_distribute_train.sh new file mode 100644 index 00000000000..086856d04dc --- /dev/null +++ b/model_zoo/densenet121/scripts/run_distribute_train.sh @@ -0,0 +1,45 @@ +#!/bin/bash +# Copyright 2020 Huawei Technologies Co., Ltd +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+# ============================================================================ + +echo "==============================================================================================================" +echo "Please run the scipt as: " +echo "sh run_distribute_train.sh DEVICE_NUM RANK_TABLE_FILE DATASET" +echo "for example: sh run_distribute_train.sh 8 /data/hccl.json /path/to/dataset" +echo "It is better to use absolute path." +echo "=================================================================================================================" + +echo "After running the scipt, the network runs in the background. The log will be generated in LOGx/log.txt" + +export RANK_SIZE=$1 +export RANK_TABLE_FILE=$2 +DATASET=$3 + +for((i=0;i env.log + python train.py \ + --data_dir=$DATASET > log.txt 2>&1 & + + cd ../ +done diff --git a/model_zoo/densenet121/src/__init__.py b/model_zoo/densenet121/src/__init__.py new file mode 100644 index 00000000000..e69de29bb2d diff --git a/model_zoo/densenet121/src/config.py b/model_zoo/densenet121/src/config.py new file mode 100644 index 00000000000..b925ac7d94b --- /dev/null +++ b/model_zoo/densenet121/src/config.py @@ -0,0 +1,46 @@ +# Copyright 2020 Huawei Technologies Co., Ltd +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# ============================================================================ +"""config""" +from easydict import EasyDict as ed + +config = ed({ + "image_size": '224,224', + "num_classes": 1000, + + "lr": 0.1, + "lr_scheduler": 'cosine_annealing', + "lr_epochs": '30,60,90,120', + "lr_gamma": 0.1, + "eta_min": 0, + "T_max": 120, + "max_epoch": 120, + "per_batch_size": 32, + "warmup_epochs": 0, + + "weight_decay": 0.0001, + "momentum": 0.9, + "is_dynamic_loss_scale": 0, + "loss_scale": 1024, + "label_smooth": 0, + "label_smooth_factor": 0.1, + + "log_interval": 100, + "ckpt_interval": 2000, + "ckpt_path": 'outputs/', + "is_save_on_master": 1, + + "rank": 0, + "group_size": 1 +}) diff --git a/model_zoo/densenet121/src/datasets/__init__.py b/model_zoo/densenet121/src/datasets/__init__.py new file mode 100644 index 00000000000..a1e6a794227 --- /dev/null +++ b/model_zoo/densenet121/src/datasets/__init__.py @@ -0,0 +1,22 @@ +# Copyright 2020 Huawei Technologies Co., Ltd +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+# ============================================================================ + +""" +read dataset for classification +""" + +from .classification import classification_dataset + +__all__ = ["classification_dataset"] diff --git a/model_zoo/densenet121/src/datasets/classification.py b/model_zoo/densenet121/src/datasets/classification.py new file mode 100644 index 00000000000..6c3a066fc13 --- /dev/null +++ b/model_zoo/densenet121/src/datasets/classification.py @@ -0,0 +1,155 @@ +# Copyright 2020 Huawei Technologies Co., Ltd +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# ============================================================================ + +""" +A function that returns a dataset for classification. +""" + +import os +from PIL import Image, ImageFile +from mindspore import dtype as mstype +import mindspore.dataset as de +import mindspore.dataset.transforms.vision.c_transforms as vision_C +import mindspore.dataset.transforms.c_transforms as normal_C +from src.datasets.sampler import DistributedSampler + +ImageFile.LOAD_TRUNCATED_IMAGES = True + +class TxtDataset(): + """ + read dataset from txt + """ + def __init__(self, root, txt_name): + super(TxtDataset, self).__init__() + self.imgs = [] + self.labels = [] + fin = open(txt_name, "r") + for line in fin: + img_name, label = line.strip().split(' ') + self.imgs.append(os.path.join(root, img_name)) + self.labels.append(int(label)) + fin.close() + + def __getitem__(self, index): + img = Image.open(self.imgs[index]).convert('RGB') + return img, self.labels[index] + + def __len__(self): + return len(self.imgs) + + +def classification_dataset(data_dir, image_size, per_batch_size, max_epoch, rank, group_size, + mode='train', + input_mode='folder', + root='', + num_parallel_workers=None, + shuffle=None, + sampler=None, + class_indexing=None, + drop_remainder=True, + transform=None, + target_transform=None): + """ + A function that returns a dataset for classification. The mode of input dataset could be "folder" or "txt". + If it is "folder", all images within one folder have the same label. If it is "txt", all paths of images + are written into a textfile. + + Args: + data_dir (str): Path to the root directory that contains the dataset for "input_mode="folder"". + Or path of the textfile that contains every image's path of the dataset. + image_size (str): Size of the input images. + per_batch_size (int): the batch size of evey step during training. + max_epoch (int): the number of epochs. + rank (int): The shard ID within num_shards (default=None). + group_size (int): Number of shards that the dataset should be divided + into (default=None). + mode (str): "train" or others. Default: " train". + input_mode (str): The form of the input dataset. "folder" or "txt". Default: "folder". + root (str): the images path for "input_mode="txt"". Default: " ". + num_parallel_workers (int): Number of workers to read the data. Default: None. + shuffle (bool): Whether or not to perform shuffle on the dataset + (default=None, performs shuffle). 
+ sampler (Sampler): Object used to choose samples from the dataset. Default: None. + class_indexing (dict): A str-to-int mapping from folder name to index + (default=None, the folder names will be sorted + alphabetically and each class will be given a + unique index starting from 0). + + Examples: + >>> from src.datasets.classification import classification_dataset + >>> # path to imagefolder directory. This directory needs to contain sub-directories which contain the images + >>> dataset_dir = "/path/to/imagefolder_directory" + >>> de_dataset = classification_dataset(train_data_dir, image_size=[224, 244], + >>> per_batch_size=64, max_epoch=100, + >>> rank=0, group_size=4) + >>> # Path of the textfile that contains every image's path of the dataset. + >>> dataset_dir = "/path/to/dataset/images/train.txt" + >>> images_dir = "/path/to/dataset/images" + >>> de_dataset = classification_dataset(train_data_dir, image_size=[224, 244], + >>> per_batch_size=64, max_epoch=100, + >>> rank=0, group_size=4, + >>> input_mode="txt", root=images_dir) + """ + + mean = [0.485 * 255, 0.456 * 255, 0.406 * 255] + std = [0.229 * 255, 0.224 * 255, 0.225 * 255] + + if transform is None: + if mode == 'train': + transform_img = [ + vision_C.RandomCropDecodeResize(image_size, scale=(0.08, 1.0), ratio=(0.75, 1.333)), + vision_C.RandomHorizontalFlip(prob=0.5), + vision_C.RandomColorAdjust(brightness=0.4, contrast=0.4, saturation=0.4), + vision_C.Normalize(mean=mean, std=std), + vision_C.HWC2CHW() + ] + else: + transform_img = [ + vision_C.Decode(), + vision_C.Resize((256, 256)), + vision_C.CenterCrop(image_size), + vision_C.Normalize(mean=mean, std=std), + vision_C.HWC2CHW() + ] + else: + transform_img = transform + + if target_transform is None: + transform_label = [ + normal_C.TypeCast(mstype.int32) + ] + else: + transform_label = target_transform + + if input_mode == 'folder': + de_dataset = de.ImageFolderDatasetV2(data_dir, num_parallel_workers=num_parallel_workers, + shuffle=shuffle, sampler=sampler, class_indexing=class_indexing, + num_shards=group_size, shard_id=rank) + else: + dataset = TxtDataset(root, data_dir) + sampler = DistributedSampler(dataset, rank, group_size, shuffle=shuffle) + de_dataset = de.GeneratorDataset(dataset, ["image", "label"], sampler=sampler) + de_dataset.set_dataset_size(len(sampler)) + + de_dataset = de_dataset.map(input_columns="image", num_parallel_workers=8, operations=transform_img) + de_dataset = de_dataset.map(input_columns="label", num_parallel_workers=8, operations=transform_label) + + columns_to_project = ["image", "label"] + de_dataset = de_dataset.project(columns=columns_to_project) + + de_dataset = de_dataset.batch(per_batch_size, drop_remainder=drop_remainder) + de_dataset = de_dataset.repeat(max_epoch) + + return de_dataset diff --git a/model_zoo/densenet121/src/datasets/sampler.py b/model_zoo/densenet121/src/datasets/sampler.py new file mode 100644 index 00000000000..52c2cbab44f --- /dev/null +++ b/model_zoo/densenet121/src/datasets/sampler.py @@ -0,0 +1,51 @@ +# Copyright 2020 Huawei Technologies Co., Ltd +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 
+# See the License for the specific language governing permissions and +# limitations under the License. +# ============================================================================ + +""" +shuffle and distribute sample +""" + +import math +import numpy as np + + +class DistributedSampler(): + """ + function to distribute and shuffle sample + """ + def __init__(self, dataset, rank, group_size, shuffle=True, seed=0): + self.dataset = dataset + self.rank = rank + self.group_size = group_size + self.dataset_length = len(self.dataset) + self.num_samples = int(math.ceil(self.dataset_length * 1.0 / self.group_size)) + self.total_size = self.num_samples * self.group_size + self.shuffle = shuffle + self.seed = seed + + def __iter__(self): + if self.shuffle: + self.seed = (self.seed + 1) & 0xffffffff + np.random.seed(self.seed) + indices = np.random.permutation(self.dataset_length).tolist() + else: + indices = list(range(len(self.dataset_length))) + + indices += indices[:(self.total_size - len(indices))] + indices = indices[self.rank::self.group_size] + return iter(indices) + + def __len__(self): + return self.num_samples diff --git a/model_zoo/densenet121/src/losses/__init__.py b/model_zoo/densenet121/src/losses/__init__.py new file mode 100644 index 00000000000..297d16bb4e7 --- /dev/null +++ b/model_zoo/densenet121/src/losses/__init__.py @@ -0,0 +1,19 @@ +# Copyright 2020 Huawei Technologies Co., Ltd +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# ============================================================================ +""" +loss function +""" + +from .crossentropy import * diff --git a/model_zoo/densenet121/src/losses/crossentropy.py b/model_zoo/densenet121/src/losses/crossentropy.py new file mode 100644 index 00000000000..5edf448aea2 --- /dev/null +++ b/model_zoo/densenet121/src/losses/crossentropy.py @@ -0,0 +1,44 @@ +# Copyright 2020 Huawei Technologies Co., Ltd +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+# ============================================================================ +""" +loss function CrossEntropy +""" + +from mindspore.nn.loss.loss import _Loss +from mindspore.ops import operations as P +from mindspore.ops import functional as F +from mindspore import Tensor +from mindspore.common import dtype as mstype +import mindspore.nn as nn + + +class CrossEntropy(_Loss): + """ + loss function CrossEntropy + """ + def __init__(self, smooth_factor=0., num_classes=1000): + super(CrossEntropy, self).__init__() + self.onehot = P.OneHot() + self.on_value = Tensor(1.0 - smooth_factor, mstype.float32) + self.off_value = Tensor(1.0 * smooth_factor / (num_classes -1), mstype.float32) + self.ce = nn.SoftmaxCrossEntropyWithLogits() + self.mean = P.ReduceMean(False) + + def construct(self, logit, label): + one_hot_label = self.onehot(label, + F.shape(logit)[1], self.on_value, self.off_value) + loss = self.ce(logit, one_hot_label) + loss = self.mean(loss, 0) + return loss diff --git a/model_zoo/densenet121/src/lr_scheduler/__init__.py b/model_zoo/densenet121/src/lr_scheduler/__init__.py new file mode 100644 index 00000000000..6ee3c24f622 --- /dev/null +++ b/model_zoo/densenet121/src/lr_scheduler/__init__.py @@ -0,0 +1,19 @@ +# Copyright 2020 Huawei Technologies Co., Ltd +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# ============================================================================ + +""" +learning rate scheduler +""" +from .lr_scheduler import * diff --git a/model_zoo/densenet121/src/lr_scheduler/lr_scheduler.py b/model_zoo/densenet121/src/lr_scheduler/lr_scheduler.py new file mode 100644 index 00000000000..db75492e5d1 --- /dev/null +++ b/model_zoo/densenet121/src/lr_scheduler/lr_scheduler.py @@ -0,0 +1,656 @@ +# Copyright 2020 Huawei Technologies Co., Ltd +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+# ============================================================================ +""" +learning rate scheduler +""" + +import math +from collections import Counter +import numpy as np + +__all__ = ["LambdaLR", "MultiplicativeLR", "StepLR", "MultiStepLR", "ExponentialLR", + "CosineAnnealingLR", "CyclicLR", "CosineAnnealingWarmRestarts", "OneCycleLR"] + +class _WarmUp(): + def __init__(self, warmup_init_lr): + self.warmup_init_lr = warmup_init_lr + + def get_lr(self): + # Get learning rate during warmup + raise NotImplementedError + +class _LinearWarmUp(_WarmUp): + """ + linear warmup function + """ + def __init__(self, lr, warmup_epochs, steps_per_epoch, warmup_init_lr=0): + self.base_lr = lr + self.warmup_init_lr = warmup_init_lr + self.warmup_steps = int(warmup_epochs * steps_per_epoch) + + super(_LinearWarmUp, self).__init__(warmup_init_lr) + + def get_warmup_steps(self): + return self.warmup_steps + + def get_lr(self, current_step): + lr_inc = (float(self.base_lr) - float(self.warmup_init_lr)) / float(self.warmup_steps) + lr = float(self.warmup_init_lr) + lr_inc * current_step + return lr + +class _ConstWarmUp(_WarmUp): + + def get_lr(self): + return self.warmup_init_lr + +class _LRScheduler(): + + def __init__(self, lr, max_epoch, steps_per_epoch): + self.base_lr = lr + self.steps_per_epoch = steps_per_epoch + self.total_steps = int(max_epoch * steps_per_epoch) + + def get_lr(self): + # Compute learning rate using chainable form of the scheduler + raise NotImplementedError + + +class LambdaLR(_LRScheduler): + """Sets the learning rate to the initial lr times a given function. + + Args: + lr (float): Initial learning rate which is the + lower boundary in the cycle. + steps_per_epoch (int): The number of steps per epoch to train for. This is + used along with epochs in order to infer the total number of steps in the cycle. + max_epoch (int): The number of epochs to train for. This is used along + with steps_per_epoch in order to infer the total number of steps in the cycle. + lr_lambda (function or list): A function which computes a multiplicative + factor given an integer parameter epoch. + warmup_epochs (int): The number of epochs to Warmup. + Default: 0 + Example: + >>> # Assuming optimizer has two groups. + >>> lambda1 = lambda epoch: epoch // 30 + >>> scheduler = LambdaLR(lr=0.1, lr_lambda=lambda1, steps_per_epoch=5000, + >>> max_epoch=90, warmup_epochs=0) + >>> lr = scheduler.get_lr() + """ + + def __init__(self, lr, lr_lambda, steps_per_epoch, max_epoch, warmup_epochs=0): + self.lr_lambda = lr_lambda + self.warmup = _LinearWarmUp(lr, warmup_epochs, steps_per_epoch) + super(LambdaLR, self).__init__(lr, max_epoch, steps_per_epoch) + + def get_lr(self): + warmup_steps = self.warmup.get_warmup_steps() + + lr_each_step = [] + for i in range(self.total_steps): + if i < warmup_steps: + lr = self.warmup.get_lr(i+1) + else: + cur_ep = i // self.steps_per_epoch + lr = self.base_lr * self.lr_lambda(cur_ep) + lr_each_step.append(lr) + + return np.array(lr_each_step).astype(np.float32) + + +class MultiplicativeLR(_LRScheduler): + """Multiply the learning rate by the factor given + in the specified function. + + Args: + lr_lambda (function or list): A function which computes a multiplicative + factor given an integer parameter epoch,. 
+ + Example: + >>> lmbda = lambda epoch: 0.95 + >>> scheduler = MultiplicativeLR(lr=0.1, lr_lambda=lambda1, steps_per_epoch=5000, + >>> max_epoch=90, warmup_epochs=0) + >>> lr = scheduler.get_lr() + """ + def __init__(self, lr, lr_lambda, steps_per_epoch, max_epoch, warmup_epochs=0): + self.lr_lambda = lr_lambda + self.warmup = _LinearWarmUp(lr, warmup_epochs, steps_per_epoch) + super(MultiplicativeLR, self).__init__(lr, max_epoch, steps_per_epoch) + + def get_lr(self): + warmup_steps = self.warmup.get_warmup_steps() + + lr_each_step = [] + current_lr = self.base_lr + for i in range(self.total_steps): + if i < warmup_steps: + lr = self.warmup.get_lr(i+1) + else: + cur_ep = i // self.steps_per_epoch + if i % self.steps_per_epoch == 0 and cur_ep > 0: + current_lr = current_lr * self.lr_lambda(cur_ep) + + lr = current_lr + + lr_each_step.append(lr) + + return np.array(lr_each_step).astype(np.float32) + + +class StepLR(_LRScheduler): + """Decays the learning rate by gamma every epoch_size epochs. + + Args: + lr (float): Initial learning rate which is the + lower boundary in the cycle. + steps_per_epoch (int): The number of steps per epoch to train for. This is + used along with epochs in order to infer the total number of steps in the cycle. + max_epoch (int): The number of epochs to train for. This is used along + with steps_per_epoch in order to infer the total number of steps in the cycle. + epoch_size (int): Period of learning rate decay. + gamma (float): Multiplicative factor of learning rate decay. + Default: 0.1. + warmup_epochs (int): The number of epochs to Warmup. + Default: 0 + + Example: + >>> # Assuming optimizer uses lr = 0.05 for all groups + >>> # lr = 0.05 if epoch < 30 + >>> # lr = 0.005 if 30 <= epoch < 60 + >>> # lr = 0.0005 if 60 <= epoch < 90 + >>> # ... + >>> scheduler = StepLR(lr=0.1, epoch_size=30, gamma=0.1, steps_per_epoch=5000, + >>> max_epoch=90, warmup_epochs=0) + >>> lr = scheduler.get_lr() + """ + + def __init__(self, lr, epoch_size, gamma, steps_per_epoch, max_epoch, warmup_epochs=0): + self.epoch_size = epoch_size + self.gamma = gamma + self.warmup = _LinearWarmUp(lr, warmup_epochs, steps_per_epoch) + super(StepLR, self).__init__(lr, max_epoch, steps_per_epoch) + + def get_lr(self): + warmup_steps = self.warmup.get_warmup_steps() + + lr_each_step = [] + for i in range(self.total_steps): + if i < warmup_steps: + lr = self.warmup.get_lr(i+1) + else: + cur_ep = i // self.steps_per_epoch + lr = self.base_lr * self.gamma**(cur_ep // self.epoch_size) + + lr_each_step.append(lr) + + return np.array(lr_each_step).astype(np.float32) + + +class MultiStepLR(_LRScheduler): + """Decays the learning rate by gamma once the number of epoch reaches one + of the milestones. + + Args: + lr (float): Initial learning rate which is the + lower boundary in the cycle. + steps_per_epoch (int): The number of steps per epoch to train for. This is + used along with epochs in order to infer the total number of steps in the cycle. + max_epoch (int): The number of epochs to train for. This is used along + with steps_per_epoch in order to infer the total number of steps in the cycle. + milestones (list): List of epoch indices. Must be increasing. + gamma (float): Multiplicative factor of learning rate decay. + Default: 0.1. + warmup_epochs (int): The number of epochs to Warmup. 
+ Default: 0 + + Example: + >>> # Assuming optimizer uses lr = 0.05 for all groups + >>> # lr = 0.05 if epoch < 30 + >>> # lr = 0.005 if 30 <= epoch < 80 + >>> # lr = 0.0005 if epoch >= 80 + >>> scheduler = MultiStepLR(lr=0.1, milestones=[30,80], gamma=0.1, steps_per_epoch=5000, + >>> max_epoch=90, warmup_epochs=0) + >>> lr = scheduler.get_lr() + """ + + def __init__(self, lr, milestones, gamma, steps_per_epoch, max_epoch, warmup_epochs=0): + self.milestones = Counter(milestones) + self.gamma = gamma + self.warmup = _LinearWarmUp(lr, warmup_epochs, steps_per_epoch) + super(MultiStepLR, self).__init__(lr, max_epoch, steps_per_epoch) + + def get_lr(self): + warmup_steps = self.warmup.get_warmup_steps() + + lr_each_step = [] + current_lr = self.base_lr + for i in range(self.total_steps): + if i < warmup_steps: + lr = self.warmup.get_lr(i+1) + else: + cur_ep = i // self.steps_per_epoch + if i % self.steps_per_epoch == 0 and cur_ep in self.milestones: + current_lr = current_lr * self.gamma + lr = current_lr + + lr_each_step.append(lr) + + return np.array(lr_each_step).astype(np.float32) + + +class ExponentialLR(_LRScheduler): + """Decays the learning rate of each parameter group by gamma every epoch. + + Args: + lr (float): Initial learning rate which is the + lower boundary in the cycle. + gamma (float): Multiplicative factor of learning rate decay. + steps_per_epoch (int): The number of steps per epoch to train for. This is + used along with epochs in order to infer the total number of steps in the cycle. + max_epoch (int): The number of epochs to train for. This is used along + with steps_per_epoch in order to infer the total number of steps in the cycle. + warmup_epochs (int): The number of epochs to Warmup. + Default: 0 + """ + + def __init__(self, lr, gamma, steps_per_epoch, max_epoch, warmup_epochs=0): + self.gamma = gamma + self.warmup = _LinearWarmUp(lr, warmup_epochs, steps_per_epoch) + super(ExponentialLR, self).__init__(lr, max_epoch, steps_per_epoch) + + def get_lr(self): + warmup_steps = self.warmup.get_warmup_steps() + + lr_each_step = [] + current_lr = self.base_lr + for i in range(self.total_steps): + if i < warmup_steps: + lr = self.warmup.get_lr(i+1) + else: + if i % self.steps_per_epoch == 0 and i > 0: + current_lr = current_lr * self.gamma + lr = current_lr + + lr_each_step.append(lr) + + return np.array(lr_each_step).astype(np.float32) + + +class CosineAnnealingLR(_LRScheduler): + r"""Set the learning rate using a cosine annealing schedule, where + :math:`\eta_{max}` is set to the initial lr and :math:`T_{cur}` is the + number of epochs since the last restart in SGDR: + + .. math:: + \begin{aligned} + \eta_t & = \eta_{min} + \frac{1}{2}(\eta_{max} - \eta_{min})\left(1 + + \cos\left(\frac{T_{cur}}{T_{max}}\pi\right)\right), + & T_{cur} \neq (2k+1)T_{max}; \\ + \eta_{t+1} & = \eta_{t} + \frac{1}{2}(\eta_{max} - \eta_{min}) + \left(1 - \cos\left(\frac{1}{T_{max}}\pi\right)\right), + & T_{cur} = (2k+1)T_{max}. + \end{aligned} + + It has been proposed in + `SGDR: Stochastic Gradient Descent with Warm Restarts`_. Note that this only + implements the cosine annealing part of SGDR, and not the restarts. + + Args: + lr (float): Initial learning rate which is the + lower boundary in the cycle. + T_max (int): Maximum number of iterations. + eta_min (float): Minimum learning rate. Default: 0. + steps_per_epoch (int): The number of steps per epoch to train for. This is + used along with epochs in order to infer the total number of steps in the cycle. 
+ max_epoch (int): The number of epochs to train for. This is used along + with steps_per_epoch in order to infer the total number of steps in the cycle. + warmup_epochs (int): The number of epochs to Warmup. + Default: 0 + + .. _SGDR\: Stochastic Gradient Descent with Warm Restarts: + https://arxiv.org/abs/1608.03983 + """ + + def __init__(self, lr, T_max, steps_per_epoch, max_epoch, warmup_epochs=0, eta_min=0): + self.T_max = T_max + self.eta_min = eta_min + self.warmup = _LinearWarmUp(lr, warmup_epochs, steps_per_epoch) + super(CosineAnnealingLR, self).__init__(lr, max_epoch, steps_per_epoch) + + def get_lr(self): + warmup_steps = self.warmup.get_warmup_steps() + + lr_each_step = [] + current_lr = self.base_lr + for i in range(self.total_steps): + if i < warmup_steps: + lr = self.warmup.get_lr(i+1) + else: + cur_ep = i // self.steps_per_epoch + if i % self.steps_per_epoch == 0 and i > 0: + current_lr = self.eta_min + \ + (self.base_lr - self.eta_min) * (1. + math.cos(math.pi*cur_ep / self.T_max)) / 2 + + lr = current_lr + + lr_each_step.append(lr) + + return np.array(lr_each_step).astype(np.float32) + + +class CyclicLR(_LRScheduler): + r"""Sets the learning rate according to cyclical learning rate policy (CLR). + The policy cycles the learning rate between two boundaries with a constant + frequency, as detailed in the paper `Cyclical Learning Rates for Training + Neural Networks`_. The distance between the two boundaries can be scaled on + a per-iteration or per-cycle basis. + + Cyclical learning rate policy changes the learning rate after every batch. + + This class has three built-in policies, as put forth in the paper: + + * "triangular": A basic triangular cycle without amplitude scaling. + * "triangular2": A basic triangular cycle that scales initial amplitude by half each cycle. + * "exp_range": A cycle that scales initial amplitude by :math:`\text{gamma}^{\text{cycle iterations}}` + at each cycle iteration. + + This implementation was adapted from the github repo: `bckenstler/CLR`_ + + Args: + lr (float): Initial learning rate which is the + lower boundary in the cycle. + max_lr (float): Upper learning rate boundaries in the cycle. + Functionally, it defines the cycle amplitude (max_lr - base_lr). + The lr at any cycle is the sum of base_lr and some scaling + of the amplitude; therefore max_lr may not actually be reached + depending on scaling function. + steps_per_epoch (int): The number of steps per epoch to train for. This is + used along with epochs in order to infer the total number of steps in the cycle. + max_epoch (int): The number of epochs to train for. This is used along + with steps_per_epoch in order to infer the total number of steps in the cycle. + step_size_up (int): Number of training iterations in the + increasing half of a cycle. Default: 2000 + step_size_down (int): Number of training iterations in the + decreasing half of a cycle. If step_size_down is None, + it is set to step_size_up. Default: None + mode (str): One of {triangular, triangular2, exp_range}. + Values correspond to policies detailed above. + If scale_fn is not None, this argument is ignored. + Default: 'triangular' + gamma (float): Constant in 'exp_range' scaling function: + gamma**(cycle iterations) + Default: 1.0 + scale_fn (function): Custom scaling policy defined by a single + argument lambda function, where + 0 <= scale_fn(x) <= 1 for all x >= 0. + If specified, then 'mode' is ignored. + Default: None + scale_mode (str): {'cycle', 'iterations'}. 
+ Defines whether scale_fn is evaluated on + cycle number or cycle iterations (training + iterations since start of cycle). + Default: 'cycle' + warmup_epochs (int): The number of epochs to Warmup. + Default: 0 + + .. _Cyclical Learning Rates for Training Neural Networks: https://arxiv.org/abs/1506.01186 + .. _bckenstler/CLR: https://github.com/bckenstler/CLR + """ + + def __init__(self, + lr, + max_lr, + steps_per_epoch, + max_epoch, + step_size_up=2000, + step_size_down=None, + mode='triangular', + gamma=1., + scale_fn=None, + scale_mode='cycle', + warmup_epochs=0): + + self.max_lr = max_lr + + step_size_up = float(step_size_up) + step_size_down = float(step_size_down) if step_size_down is not None else step_size_up + self.total_size = step_size_up + step_size_down + self.step_ratio = step_size_up / self.total_size + + if mode not in ['triangular', 'triangular2', 'exp_range'] \ + and scale_fn is None: + raise ValueError('mode is invalid and scale_fn is None') + + self.mode = mode + self.gamma = gamma + + if scale_fn is None: + if self.mode == 'triangular': + self.scale_fn = self._triangular_scale_fn + self.scale_mode = 'cycle' + elif self.mode == 'triangular2': + self.scale_fn = self._triangular2_scale_fn + self.scale_mode = 'cycle' + elif self.mode == 'exp_range': + self.scale_fn = self._exp_range_scale_fn + self.scale_mode = 'iterations' + else: + self.scale_fn = scale_fn + self.scale_mode = scale_mode + + self.warmup = _LinearWarmUp(lr, warmup_epochs, steps_per_epoch) + super(CyclicLR, self).__init__(lr, max_epoch, steps_per_epoch) + + def _triangular_scale_fn(self, x): + return 1. + + def _triangular2_scale_fn(self, x): + return 1 / (2. ** (x - 1)) + + def _exp_range_scale_fn(self, x): + return self.gamma**(x) + + def get_lr(self): + warmup_steps = self.warmup.get_warmup_steps() + + lr_each_step = [] + for i in range(self.total_steps): + if i < warmup_steps: + lr = self.warmup.get_lr(i+1) + else: + # Calculates the learning rate at batch index. + cycle = math.floor(1 + i / self.total_size) + x = 1. + i / self.total_size - cycle + if x <= self.step_ratio: + scale_factor = x / self.step_ratio + else: + scale_factor = (x - 1) / (self.step_ratio - 1) + + base_height = (self.max_lr - self.base_lr) * scale_factor + if self.scale_mode == 'cycle': + lr = self.base_lr + base_height * self.scale_fn(cycle) + else: + lr = self.base_lr + base_height * self.scale_fn(i) + + lr_each_step.append(lr) + + return np.array(lr_each_step).astype(np.float32) + + +class CosineAnnealingWarmRestarts(_LRScheduler): + r"""Set the learning rate using a cosine annealing schedule, where + :math:`\eta_{max}` is set to the initial lr, :math:`T_{cur}` is the + number of epochs since the last restart and :math:`T_{i}` is the number + of epochs between two warm restarts in SGDR: + + .. math:: + \eta_t = \eta_{min} + \frac{1}{2}(\eta_{max} - \eta_{min})\left(1 + + \cos\left(\frac{T_{cur}}{T_{i}}\pi\right)\right) + + When :math:`T_{cur}=T_{i}`, set :math:`\eta_t = \eta_{min}`. + When :math:`T_{cur}=0` after restart, set :math:`\eta_t=\eta_{max}`. + + It has been proposed in + `SGDR: Stochastic Gradient Descent with Warm Restarts`_. + + Args: + lr (float): Initial learning rate. + steps_per_epoch (int): The number of steps per epoch to train for. This is + used along with epochs in order to infer the total number of steps in the cycle. + max_epoch (int): The number of epochs to train for. This is used along + with steps_per_epoch in order to infer the total number of steps in the cycle. 
+ T_0 (int): Number of iterations for the first restart. + T_mult (int, optional): A factor increases :math:`T_{i}` after a restart. Default: 1. + eta_min (float, optional): Minimum learning rate. Default: 0. + warmup_epochs (int): The number of epochs to Warmup. + Default: 0 + + .. _SGDR\: Stochastic Gradient Descent with Warm Restarts: + https://arxiv.org/abs/1608.03983 + """ + + def __init__(self, lr, steps_per_epoch, max_epoch, T_0, T_mult=1, eta_min=0, warmup_epochs=0): + if T_0 <= 0 or not isinstance(T_0, int): + raise ValueError("Expected positive integer T_0, but got {}".format(T_0)) + if T_mult < 1 or not isinstance(T_mult, int): + raise ValueError("Expected integer T_mult >= 1, but got {}".format(T_mult)) + self.T_0 = T_0 + self.T_i = T_0 + self.T_mult = T_mult + self.eta_min = eta_min + self.T_cur = 0 + + self.warmup = _LinearWarmUp(lr, warmup_epochs, steps_per_epoch) + super(CosineAnnealingWarmRestarts, self).__init__(lr, max_epoch, steps_per_epoch) + + def get_lr(self): + warmup_steps = self.warmup.get_warmup_steps() + + lr_each_step = [] + for i in range(self.total_steps): + if i < warmup_steps: + lr = self.warmup.get_lr(i+1) + else: + if i % self.steps_per_epoch == 0 and i > 0: + self.T_cur += 1 + if self.T_cur >= self.T_i: + self.T_cur = self.T_cur - self.T_i + self.T_i = self.T_i * self.T_mult + + lr = self.eta_min + (self.base_lr - self.eta_min) * \ + (1 + math.cos(math.pi * self.T_cur / self.T_i)) / 2 + + lr_each_step.append(lr) + + return np.array(lr_each_step).astype(np.float32) + + +class OneCycleLR(_LRScheduler): + r"""Sets the learning rate of each parameter group according to the + 1cycle learning rate policy. The 1cycle policy anneals the learning + rate from an initial learning rate to some maximum learning rate and then + from that maximum learning rate to some minimum learning rate much lower + than the initial learning rate. + This policy was initially described in the paper `Super-Convergence: + Very Fast Training of Neural Networks Using Large Learning Rates`_. + + The 1cycle learning rate policy changes the learning rate after every batch. + This scheduler is not chainable. + + + Args: + lr (float): Initial learning rate. + steps_per_epoch (int): The number of steps per epoch to train for. This is + used along with epochs in order to infer the total number of steps in the cycle. + max_epoch (int): The number of epochs to train for. This is used along + with steps_per_epoch in order to infer the total number of steps in the cycle. + pct_start (float): The percentage of the cycle (in number of steps) spent + increasing the learning rate. + Default: 0.3 + anneal_strategy (str): {'cos', 'linear'} + Specifies the annealing strategy: "cos" for cosine annealing, "linear" for + linear annealing. + Default: 'cos' + div_factor (float): Determines the max learning rate via + max_lr = lr * div_factor + Default: 25 + final_div_factor (float): Determines the minimum learning rate via + min_lr = lr / final_div_factor + Default: 1e4 + warmup_epochs (int): The number of epochs to Warmup. + Default: 0 + + + .. 
_Super-Convergence\: Very Fast Training of Neural Networks Using Large Learning Rates:
+        https://arxiv.org/abs/1708.07120
+    """
+    def __init__(self,
+                 lr,
+                 steps_per_epoch,
+                 max_epoch,
+                 pct_start=0.3,
+                 anneal_strategy='cos',
+                 div_factor=25.,
+                 final_div_factor=1e4,
+                 warmup_epochs=0):
+
+        self.warmup = _LinearWarmUp(lr, warmup_epochs, steps_per_epoch)
+        super(OneCycleLR, self).__init__(lr, max_epoch, steps_per_epoch)
+
+        self.step_size_up = float(pct_start * self.total_steps) - 1
+        self.step_size_down = float(self.total_steps - self.step_size_up) - 1
+
+        # Validate pct_start
+        if pct_start < 0 or pct_start > 1 or not isinstance(pct_start, float):
+            raise ValueError("Expected float between 0 and 1 for pct_start, but got {}".format(pct_start))
+
+        # Validate anneal_strategy
+        if anneal_strategy not in ['cos', 'linear']:
+            raise ValueError("anneal_strategy must be one of 'cos' or 'linear', instead got {}".format(anneal_strategy))
+        if anneal_strategy == 'cos':
+            self.anneal_func = self._annealing_cos
+        elif anneal_strategy == 'linear':
+            self.anneal_func = self._annealing_linear
+
+        # Initialize learning rate variables
+        self.max_lr = lr * div_factor
+        self.min_lr = lr / final_div_factor
+
+    def _annealing_cos(self, start, end, pct):
+        "Cosine anneal from `start` to `end` as pct goes from 0.0 to 1.0."
+        cos_out = math.cos(math.pi * pct) + 1
+        return end + (start - end) / 2.0 * cos_out
+
+    def _annealing_linear(self, start, end, pct):
+        "Linearly anneal from `start` to `end` as pct goes from 0.0 to 1.0."
+        return (end - start) * pct + start
+
+    def get_lr(self):
+        warmup_steps = self.warmup.get_warmup_steps()
+
+        lr_each_step = []
+        for i in range(self.total_steps):
+            if i < warmup_steps:
+                lr = self.warmup.get_lr(i+1)
+            else:
+                if i <= self.step_size_up:
+                    lr = self.anneal_func(self.base_lr, self.max_lr, i / self.step_size_up)
+
+                else:
+                    down_step_num = i - self.step_size_up
+                    lr = self.anneal_func(self.max_lr, self.min_lr, down_step_num / self.step_size_down)
+
+            lr_each_step.append(lr)
+
+        return np.array(lr_each_step).astype(np.float32)
diff --git a/model_zoo/densenet121/src/network/__init__.py b/model_zoo/densenet121/src/network/__init__.py
new file mode 100644
index 00000000000..bb1727b0635
--- /dev/null
+++ b/model_zoo/densenet121/src/network/__init__.py
@@ -0,0 +1,18 @@
+# Copyright 2020 Huawei Technologies Co., Ltd
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+# ============================================================================
+"""
+densenet network
+"""
+from .densenet import DenseNet121
diff --git a/model_zoo/densenet121/src/network/densenet.py b/model_zoo/densenet121/src/network/densenet.py
new file mode 100644
index 00000000000..182de580197
--- /dev/null
+++ b/model_zoo/densenet121/src/network/densenet.py
@@ -0,0 +1,230 @@
+# Copyright 2020 Huawei Technologies Co., Ltd
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# ============================================================================ + +""" +model architecture of densenet +""" + +import math +from collections import OrderedDict + +import mindspore.nn as nn +from mindspore.ops import operations as P +from mindspore.common import initializer as init +from src.utils.var_init import default_recurisive_init, KaimingNormal + +__all__ = ["DenseNet121"] + +class GlobalAvgPooling(nn.Cell): + """ + GlobalAvgPooling function. + """ + def __init__(self): + super(GlobalAvgPooling, self).__init__() + self.mean = P.ReduceMean(True) + self.shape = P.Shape() + self.reshape = P.Reshape() + + def construct(self, x): + x = self.mean(x, (2, 3)) + b, c, _, _ = self.shape(x) + x = self.reshape(x, (b, c)) + return x + +class CommonHead(nn.Cell): + def __init__(self, num_classes, out_channels): + super(CommonHead, self).__init__() + self.avgpool = GlobalAvgPooling() + self.fc = nn.Dense(out_channels, num_classes, has_bias=True) + + def construct(self, x): + x = self.avgpool(x) + x = self.fc(x) + return x + +def conv7x7(in_channels, out_channels, stride=1, padding=3, has_bias=False): + return nn.Conv2d(in_channels, out_channels, kernel_size=7, stride=stride, has_bias=has_bias, + padding=padding, pad_mode="pad") + + +def conv3x3(in_channels, out_channels, stride=1, padding=1, has_bias=False): + return nn.Conv2d(in_channels, out_channels, kernel_size=3, stride=stride, has_bias=has_bias, + padding=padding, pad_mode="pad") + + +def conv1x1(in_channels, out_channels, stride=1, padding=0, has_bias=False): + return nn.Conv2d(in_channels, out_channels, kernel_size=1, stride=stride, has_bias=has_bias, + padding=padding, pad_mode="pad") + + +class _DenseLayer(nn.Cell): + """ + the dense layer, include 2 conv layer + """ + def __init__(self, num_input_features, growth_rate, bn_size, drop_rate): + super(_DenseLayer, self).__init__() + self.norm1 = nn.BatchNorm2d(num_input_features) + self.relu1 = nn.ReLU() + self.conv1 = conv1x1(num_input_features, bn_size*growth_rate) + + self.norm2 = nn.BatchNorm2d(bn_size*growth_rate) + self.relu2 = nn.ReLU() + self.conv2 = conv3x3(bn_size*growth_rate, growth_rate) + + # nn.Dropout in MindSpore use keep_prob, diff from Pytorch + self.keep_prob = 1 - drop_rate + self.dropout = nn.Dropout(keep_prob=self.keep_prob) + + def construct(self, features): + bottleneck = self.conv1(self.relu1(self.norm1(features))) + new_features = self.conv2(self.relu2(self.norm2(bottleneck))) + if self.keep_prob < 1: + new_features = self.dropout(new_features) + return new_features + +class _DenseBlock(nn.Cell): + """ + the dense block + """ + def __init__(self, num_layers, num_input_features, bn_size, growth_rate, drop_rate): + super(_DenseBlock, self).__init__() + self.cell_list = nn.CellList() + for i in range(num_layers): + layer = _DenseLayer( + num_input_features + i * growth_rate, + growth_rate=growth_rate, + bn_size=bn_size, + drop_rate=drop_rate + ) + self.cell_list.append(layer) + + self.concate = P.Concat(axis=1) + + def construct(self, init_features): + features = init_features + for layer in self.cell_list: + new_features = layer(features) + 
features = self.concate((features, new_features))
+        return features
+
+class _Transition(nn.Cell):
+    """
+    the transition layer
+    """
+    def __init__(self, num_input_features, num_output_features):
+        super(_Transition, self).__init__()
+        self.features = nn.SequentialCell(OrderedDict([
+            ('norm', nn.BatchNorm2d(num_input_features)),
+            ('relu', nn.ReLU()),
+            ('conv', conv1x1(num_input_features, num_output_features)),
+            ('pool', nn.MaxPool2d(kernel_size=2, stride=2))
+        ]))
+
+    def construct(self, x):
+        x = self.features(x)
+        return x
+
+class Densenet(nn.Cell):
+    """
+    the densenet architecture
+    """
+    __constants__ = ['features']
+
+    def __init__(self, growth_rate, block_config, num_init_features, bn_size=4, drop_rate=0):
+        super(Densenet, self).__init__()
+
+        layers = OrderedDict()
+        layers['conv0'] = conv7x7(3, num_init_features, stride=2, padding=3)
+        layers['norm0'] = nn.BatchNorm2d(num_init_features)
+        layers['relu0'] = nn.ReLU()
+        layers['pool0'] = nn.MaxPool2d(kernel_size=3, stride=2, pad_mode='same')
+
+        # Each denseblock
+        num_features = num_init_features
+        for i, num_layers in enumerate(block_config):
+            block = _DenseBlock(
+                num_layers=num_layers,
+                num_input_features=num_features,
+                bn_size=bn_size,
+                growth_rate=growth_rate,
+                drop_rate=drop_rate
+            )
+            layers['denseblock%d'%(i+1)] = block
+            num_features = num_features + num_layers*growth_rate
+
+            if i != len(block_config)-1:
+                trans = _Transition(num_input_features=num_features,
+                                    num_output_features=num_features // 2)
+                layers['transition%d'%(i+1)] = trans
+                num_features = num_features // 2
+
+        # Final batch norm
+        layers['norm5'] = nn.BatchNorm2d(num_features)
+        layers['relu5'] = nn.ReLU()
+
+        self.features = nn.SequentialCell(layers)
+        self.out_channels = num_features
+
+    def construct(self, x):
+        x = self.features(x)
+        return x
+
+    def get_out_channels(self):
+        return self.out_channels
+
+def _densenet121(**kwargs):
+    return Densenet(growth_rate=32, block_config=(6, 12, 24, 16), num_init_features=64, **kwargs)
+
+
+def _densenet161(**kwargs):
+    return Densenet(growth_rate=48, block_config=(6, 12, 36, 24), num_init_features=96, **kwargs)
+
+
+def _densenet169(**kwargs):
+    return Densenet(growth_rate=32, block_config=(6, 12, 32, 32), num_init_features=64, **kwargs)
+
+
+def _densenet201(**kwargs):
+    return Densenet(growth_rate=32, block_config=(6, 12, 48, 32), num_init_features=64, **kwargs)
+
+
+
+class DenseNet121(nn.Cell):
+    """
+    the densenet121 architecture
+    """
+    def __init__(self, num_classes):
+        super(DenseNet121, self).__init__()
+        self.backbone = _densenet121()
+        out_channels = self.backbone.get_out_channels()
+        self.head = CommonHead(num_classes, out_channels)
+
+        default_recurisive_init(self)
+        # override the default init: KaimingNormal for conv weights, ones/zeros for BN, zero bias for Dense
+        for _, cell in self.cells_and_names():
+            if isinstance(cell, nn.Conv2d):
+                cell.weight.default_input = init.initializer(KaimingNormal(a=math.sqrt(5), mode='fan_out',
+                                                                           nonlinearity='relu'),
+                                                             cell.weight.default_input.shape,
+                                                             cell.weight.default_input.dtype).to_tensor()
+            elif isinstance(cell, nn.BatchNorm2d):
+                cell.gamma.default_input = init.initializer('ones', cell.gamma.default_input.shape).to_tensor()
+                cell.beta.default_input = init.initializer('zeros', cell.beta.default_input.shape).to_tensor()
+            elif isinstance(cell, nn.Dense):
+                cell.bias.default_input = init.initializer('zeros', cell.bias.default_input.shape).to_tensor()
+
+    def construct(self, x):
+        x = self.backbone(x)
+        x = self.head(x)
+        return x
diff --git a/model_zoo/densenet121/src/optimizers/__init__.py b/model_zoo/densenet121/src/optimizers/__init__.py
new file mode 100644
index 00000000000..32b9242b288
--- /dev/null
+++ b/model_zoo/densenet121/src/optimizers/__init__.py
@@ -0,0 +1,41 @@
+# Copyright 2020 Huawei Technologies Co., Ltd
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+# ============================================================================
+"""
+parameter group helpers
+"""
+def get_param_groups(network):
+    """
+    get parameter groups
+    """
+    decay_params = []
+    no_decay_params = []
+    for x in network.trainable_params():
+        parameter_name = x.name
+        if parameter_name.endswith('.bias'):
+            # bias parameters do not use weight decay
+            # print('no decay:{}'.format(parameter_name))
+            no_decay_params.append(x)
+        elif parameter_name.endswith('.gamma'):
+            # BatchNorm gamma does not use weight decay (matched by parameter name, not by cell type)
+            # print('no decay:{}'.format(parameter_name))
+            no_decay_params.append(x)
+        elif parameter_name.endswith('.beta'):
+            # BatchNorm beta does not use weight decay (matched by parameter name, not by cell type)
+            # print('no decay:{}'.format(parameter_name))
+            no_decay_params.append(x)
+        else:
+            decay_params.append(x)
+
+    return [{'params': no_decay_params, 'weight_decay': 0.0}, {'params': decay_params}]
diff --git a/model_zoo/densenet121/src/utils/__init__.py b/model_zoo/densenet121/src/utils/__init__.py
new file mode 100644
index 00000000000..e69de29bb2d
diff --git a/model_zoo/densenet121/src/utils/logging.py b/model_zoo/densenet121/src/utils/logging.py
new file mode 100644
index 00000000000..ac37bec4ecc
--- /dev/null
+++ b/model_zoo/densenet121/src/utils/logging.py
@@ -0,0 +1,82 @@
+# Copyright 2020 Huawei Technologies Co., Ltd
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+# ============================================================================
+"""
+get logger.
+"""
+import logging
+import os
+import sys
+from datetime import datetime
+
+class LOGGER(logging.Logger):
+    """
+    set up logging file.
+
+    Args:
+        logger_name (string): logger name.
+        log_dir (string): path of logger.
+ + Returns: + string, logger path + """ + def __init__(self, logger_name, rank=0): + super(LOGGER, self).__init__(logger_name) + if rank % 8 == 0: + console = logging.StreamHandler(sys.stdout) + console.setLevel(logging.INFO) + formatter = logging.Formatter('%(asctime)s:%(levelname)s:%(message)s') + console.setFormatter(formatter) + self.addHandler(console) + + def setup_logging_file(self, log_dir, rank=0): + """set up log file""" + self.rank = rank + if not os.path.exists(log_dir): + os.makedirs(log_dir, exist_ok=True) + log_name = datetime.now().strftime('%Y-%m-%d_time_%H_%M_%S') + '_rank_{}.log'.format(rank) + self.log_fn = os.path.join(log_dir, log_name) + fh = logging.FileHandler(self.log_fn) + fh.setLevel(logging.INFO) + formatter = logging.Formatter('%(asctime)s:%(levelname)s:%(message)s') + fh.setFormatter(formatter) + self.addHandler(fh) + + def info(self, msg, *args, **kwargs): + if self.isEnabledFor(logging.INFO): + self._log(logging.INFO, msg, args, **kwargs) + + def save_args(self, args): + self.info('Args:') + args_dict = vars(args) + for key in args_dict.keys(): + self.info('--> %s: %s', key, args_dict[key]) + self.info('') + + def important_info(self, msg, *args, **kwargs): + if self.isEnabledFor(logging.INFO) and self.rank == 0: + line_width = 2 + important_msg = '\n' + important_msg += ('*'*70 + '\n')*line_width + important_msg += ('*'*line_width + '\n')*2 + important_msg += '*'*line_width + ' '*8 + msg + '\n' + important_msg += ('*'*line_width + '\n')*2 + important_msg += ('*'*70 + '\n')*line_width + self.info(important_msg, *args, **kwargs) + + +def get_logger(path, rank): + logger = LOGGER("mindversion", rank) + logger.setup_logging_file(path, rank) + return logger diff --git a/model_zoo/densenet121/src/utils/var_init.py b/model_zoo/densenet121/src/utils/var_init.py new file mode 100644 index 00000000000..0512c7d6ae4 --- /dev/null +++ b/model_zoo/densenet121/src/utils/var_init.py @@ -0,0 +1,211 @@ +# Copyright 2020 Huawei Technologies Co., Ltd +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# ============================================================================ +""" +Initialize. +""" +import math +from functools import reduce +import numpy as np +import mindspore.nn as nn +from mindspore import Tensor +from mindspore.common import initializer as init + +def _calculate_gain(nonlinearity, param=None): + r""" + Return the recommended gain value for the given nonlinearity function. 
+ + The values are as follows: + ================= ==================================================== + nonlinearity gain + ================= ==================================================== + Linear / Identity :math:`1` + Conv{1,2,3}D :math:`1` + Sigmoid :math:`1` + Tanh :math:`\frac{5}{3}` + ReLU :math:`\sqrt{2}` + Leaky Relu :math:`\sqrt{\frac{2}{1 + \text{negative\_slope}^2}}` + ================= ==================================================== + + Args: + nonlinearity: the non-linear function + param: optional parameter for the non-linear function + + Examples: + >>> gain = calculate_gain('leaky_relu', 0.2) # leaky_relu with negative_slope=0.2 + """ + linear_fns = ['linear', 'conv1d', 'conv2d', 'conv3d', 'conv_transpose1d', 'conv_transpose2d', 'conv_transpose3d'] + if nonlinearity in linear_fns or nonlinearity == 'sigmoid': + return 1 + if nonlinearity == 'tanh': + return 5.0 / 3 + if nonlinearity == 'relu': + return math.sqrt(2.0) + if nonlinearity == 'leaky_relu': + if param is None: + negative_slope = 0.01 + elif not isinstance(param, bool) and isinstance(param, int) or isinstance(param, float): + negative_slope = param + else: + raise ValueError("negative_slope {} not a valid number".format(param)) + return math.sqrt(2.0 / (1 + negative_slope ** 2)) + + raise ValueError("Unsupported nonlinearity {}".format(nonlinearity)) + +def _assignment(arr, num): + """Assign the value of `num` to `arr`.""" + if arr.shape == (): + arr = arr.reshape((1)) + arr[:] = num + arr = arr.reshape(()) + else: + if isinstance(num, np.ndarray): + arr[:] = num[:] + else: + arr[:] = num + return arr + +def _calculate_in_and_out(arr): + """ + Calculate n_in and n_out. + + Args: + arr (Array): Input array. + + Returns: + Tuple, a tuple with two elements, the first element is `n_in` and the second element is `n_out`. + """ + dim = len(arr.shape) + if dim < 2: + raise ValueError("If initialize data with xavier uniform, the dimension of data must greater than 1.") + + n_in = arr.shape[1] + n_out = arr.shape[0] + + if dim > 2: + counter = reduce(lambda x, y: x * y, arr.shape[2:]) + n_in *= counter + n_out *= counter + return n_in, n_out + +def _select_fan(array, mode): + mode = mode.lower() + valid_modes = ['fan_in', 'fan_out'] + if mode not in valid_modes: + raise ValueError("Mode {} not supported, please use one of {}".format(mode, valid_modes)) + + fan_in, fan_out = _calculate_in_and_out(array) + return fan_in if mode == 'fan_in' else fan_out + +class KaimingInit(init.Initializer): + r""" + Base Class. Initialize the array with He kaiming algorithm. + + Args: + a: the negative slope of the rectifier used after this layer (only + used with ``'leaky_relu'``) + mode: either ``'fan_in'`` (default) or ``'fan_out'``. Choosing ``'fan_in'`` + preserves the magnitude of the variance of the weights in the + forward pass. Choosing ``'fan_out'`` preserves the magnitudes in the + backwards pass. + nonlinearity: the non-linear function, recommended to use only with + ``'relu'`` or ``'leaky_relu'`` (default). + """ + def __init__(self, a=0, mode='fan_in', nonlinearity='leaky_relu'): + super(KaimingInit, self).__init__() + self.mode = mode + self.gain = _calculate_gain(nonlinearity, a) + + +class KaimingUniform(KaimingInit): + r""" + Initialize the array with He kaiming uniform algorithm. The resulting tensor will + have values sampled from :math:`\mathcal{U}(-\text{bound}, \text{bound})` where + + .. 
math:: + \text{bound} = \text{gain} \times \sqrt{\frac{3}{\text{fan\_mode}}} + + Input: + arr (Array): The array to be assigned. + + Returns: + Array, assigned array. + + Examples: + >>> w = np.empty(3, 5) + >>> KaimingUniform(w, mode='fan_in', nonlinearity='relu') + """ + + def _initialize(self, arr): + fan = _select_fan(arr, self.mode) + bound = math.sqrt(3.0) * self.gain / math.sqrt(fan) + np.random.seed(1) + data = np.random.uniform(-bound, bound, arr.shape) + + _assignment(arr, data) + + +class KaimingNormal(KaimingInit): + r""" + Initialize the array with He kaiming normal algorithm. The resulting tensor will + have values sampled from :math:`\mathcal{N}(0, \text{std}^2)` where + + .. math:: + \text{std} = \frac{\text{gain}}{\sqrt{\text{fan\_mode}}} + + Input: + arr (Array): The array to be assigned. + + Returns: + Array, assigned array. + + Examples: + >>> w = np.empty(3, 5) + >>> KaimingNormal(w, mode='fan_out', nonlinearity='relu') + """ + + def _initialize(self, arr): + fan = _select_fan(arr, self.mode) + std = self.gain / math.sqrt(fan) + np.random.seed(1) + data = np.random.normal(0, std, arr.shape) + + _assignment(arr, data) + + +def default_recurisive_init(custom_cell): + """default_recurisive_init""" + for _, cell in custom_cell.cells_and_names(): + if isinstance(cell, nn.Conv2d): + cell.weight.default_input = init.initializer(KaimingUniform(a=math.sqrt(5)), + cell.weight.default_input.shape, + cell.weight.default_input.dtype).to_tensor() + if cell.bias is not None: + fan_in, _ = _calculate_in_and_out(cell.weight.default_input.asnumpy()) + bound = 1 / math.sqrt(fan_in) + np.random.seed(1) + cell.bias.default_input = Tensor(np.random.uniform(-bound, bound, cell.bias.default_input.shape), + cell.bias.default_input.dtype) + elif isinstance(cell, nn.Dense): + cell.weight.default_input = init.initializer(KaimingUniform(a=math.sqrt(5)), + cell.weight.default_input.shape, + cell.weight.default_input.dtype).to_tensor() + if cell.bias is not None: + fan_in, _ = _calculate_in_and_out(cell.weight.default_input.asnumpy()) + bound = 1 / math.sqrt(fan_in) + np.random.seed(1) + cell.bias.default_input = Tensor(np.random.uniform(-bound, bound, cell.bias.default_input.shape), + cell.bias.default_input.dtype) + elif isinstance(cell, (nn.BatchNorm2d, nn.BatchNorm1d)): + pass diff --git a/model_zoo/densenet121/train.py b/model_zoo/densenet121/train.py new file mode 100644 index 00000000000..25bf0fd42d9 --- /dev/null +++ b/model_zoo/densenet121/train.py @@ -0,0 +1,286 @@ +# Copyright 2020 Huawei Technologies Co., Ltd +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+# ============================================================================ +"""train launch.""" +import os +import time +import argparse +import datetime + +import mindspore.nn as nn +from mindspore import Tensor, ParallelMode +from mindspore.nn.optim import Momentum +from mindspore.communication.management import init, get_rank, get_group_size +from mindspore.train.callback import ModelCheckpoint +from mindspore.train.callback import CheckpointConfig, Callback +from mindspore.train.serialization import load_checkpoint, load_param_into_net +from mindspore.train.model import Model +from mindspore.train.loss_scale_manager import DynamicLossScaleManager, FixedLossScaleManager +from mindspore import context + +from src.optimizers import get_param_groups +from src.network import DenseNet121 +from src.datasets import classification_dataset +from src.losses.crossentropy import CrossEntropy +from src.lr_scheduler import MultiStepLR, CosineAnnealingLR +from src.utils.logging import get_logger +from src.config import config + +devid = int(os.getenv('DEVICE_ID')) +context.set_context(mode=context.GRAPH_MODE, enable_auto_mixed_precision=True, + device_target="Davinci", save_graphs=False, device_id=devid) + +class BuildTrainNetwork(nn.Cell): + """build training network""" + def __init__(self, network, criterion): + super(BuildTrainNetwork, self).__init__() + self.network = network + self.criterion = criterion + + def construct(self, input_data, label): + output = self.network(input_data) + loss = self.criterion(output, label) + return loss + +class ProgressMonitor(Callback): + """monitor loss and time""" + def __init__(self, args): + super(ProgressMonitor, self).__init__() + self.me_epoch_start_time = 0 + self.me_epoch_start_step_num = 0 + self.args = args + self.ckpt_history = [] + + def begin(self, run_context): + self.args.logger.info('start network train...') + + def epoch_begin(self, run_context): + pass + + def epoch_end(self, run_context, *me_args): + """process epoch end""" + cb_params = run_context.original_args() + me_step = cb_params.cur_step_num - 1 + + real_epoch = me_step // self.args.steps_per_epoch + time_used = time.time() - self.me_epoch_start_time + fps_mean = self.args.per_batch_size * (me_step-self.me_epoch_start_step_num) * self.args.group_size / time_used + self.args.logger.info('epoch[{}], iter[{}], loss:{},' + 'mean_fps:{:.2f} imgs/sec'.format(real_epoch, me_step, cb_params.net_outputs, fps_mean)) + if self.args.rank_save_ckpt_flag: + import glob + ckpts = glob.glob(os.path.join(self.args.outputs_dir, '*.ckpt')) + for ckpt in ckpts: + ckpt_fn = os.path.basename(ckpt) + if not ckpt_fn.startswith('{}-'.format(self.args.rank)): + continue + if ckpt in self.ckpt_history: + continue + self.ckpt_history.append(ckpt) + self.args.logger.info('epoch[{}], iter[{}], loss:{}, ckpt:{},' + 'ckpt_fn:{}'.format(real_epoch, me_step, cb_params.net_outputs, ckpt, ckpt_fn)) + + self.me_epoch_start_step_num = me_step + self.me_epoch_start_time = time.time() + + def step_begin(self, run_context): + pass + + def step_end(self, run_context, *me_args): + pass + + def end(self, run_context): + self.args.logger.info('end network train...') + + +def parse_args(cloud_args=None): + """parameters""" + parser = argparse.ArgumentParser('mindspore classification training') + + # dataset related + parser.add_argument('--data_dir', type=str, default='', help='train data dir') + + # network related + parser.add_argument('--pretrained', default='', type=str, help='model_path, local pretrained model to load') + 
+    # distributed related
+    parser.add_argument('--is_distributed', type=int, default=1, help='if multi device')
+
+    # roma obs
+    parser.add_argument('--train_url', type=str, default="", help='train url')
+
+    args, _ = parser.parse_known_args()
+    args = merge_args(args, cloud_args)
+    # the remaining hyperparameters are taken from src/config.py
+    args.image_size = config.image_size
+    args.num_classes = config.num_classes
+    args.lr = config.lr
+    args.lr_scheduler = config.lr_scheduler
+    args.lr_epochs = config.lr_epochs
+    args.lr_gamma = config.lr_gamma
+    args.eta_min = config.eta_min
+    args.T_max = config.T_max
+    args.max_epoch = config.max_epoch
+    args.warmup_epochs = config.warmup_epochs
+    args.weight_decay = config.weight_decay
+    args.momentum = config.momentum
+    args.is_dynamic_loss_scale = config.is_dynamic_loss_scale
+    args.loss_scale = config.loss_scale
+    args.label_smooth = config.label_smooth
+    args.label_smooth_factor = config.label_smooth_factor
+    args.ckpt_interval = config.ckpt_interval
+    args.ckpt_path = config.ckpt_path
+    args.is_save_on_master = config.is_save_on_master
+    args.rank = config.rank
+    args.group_size = config.group_size
+    args.log_interval = config.log_interval
+    args.per_batch_size = config.per_batch_size
+
+    args.lr_epochs = list(map(int, args.lr_epochs.split(',')))
+    args.image_size = list(map(int, args.image_size.split(',')))
+
+    return args
+
+def merge_args(args, cloud_args):
+    """merge values from cloud_args into the parsed args"""
+    args_dict = vars(args)
+    if isinstance(cloud_args, dict):
+        for key in cloud_args.keys():
+            val = cloud_args[key]
+            if key in args_dict and val:
+                arg_type = type(args_dict[key])
+                if arg_type is not type(None):
+                    val = arg_type(val)
+                args_dict[key] = val
+    return args
+
+def train(cloud_args=None):
+    """training process"""
+    args = parse_args(cloud_args)
+
+    # init distributed
+    if args.is_distributed:
+        init()
+        args.rank = get_rank()
+        args.group_size = get_group_size()
+
+    if args.is_dynamic_loss_scale == 1:
+        args.loss_scale = 1  # with dynamic loss scale, the loss scale must not also be set in the Momentum optimizer
+
+    # select whether only the master rank saves checkpoints or all ranks do; compatible with model parallel
+    args.rank_save_ckpt_flag = 0
+    if args.is_save_on_master:
+        if args.rank == 0:
+            args.rank_save_ckpt_flag = 1
+    else:
+        args.rank_save_ckpt_flag = 1
+
+    # logger
+    args.outputs_dir = os.path.join(args.ckpt_path,
+                                    datetime.datetime.now().strftime('%Y-%m-%d_time_%H_%M_%S'))
+    args.logger = get_logger(args.outputs_dir, args.rank)
+
+    # dataloader
+    de_dataset = classification_dataset(args.data_dir, args.image_size,
+                                        args.per_batch_size, args.max_epoch,
+                                        args.rank, args.group_size)
+    de_dataset.map_model = 4  # !!!important
+    args.steps_per_epoch = de_dataset.get_dataset_size()
+
+    args.logger.save_args(args)
+
+    # network
+    args.logger.important_info('start create network')
+    # get network and init
+    network = DenseNet121(args.num_classes)
+    # loss
+    if not args.label_smooth:
+        args.label_smooth_factor = 0.0
+    criterion = CrossEntropy(smooth_factor=args.label_smooth_factor,
+                             num_classes=args.num_classes)
+
+    # load pretrained model
+    if os.path.isfile(args.pretrained):
+        param_dict = load_checkpoint(args.pretrained)
+        param_dict_new = {}
+        for key, values in param_dict.items():
+            if key.startswith('moments.'):
+                continue
+            elif key.startswith('network.'):
+                param_dict_new[key[8:]] = values
+            else:
+                param_dict_new[key] = values
+        load_param_into_net(network, param_dict_new)
+        args.logger.info('load model {} success'.format(args.pretrained))
+
+    # lr scheduler
+    if args.lr_scheduler == 'exponential':
+        lr_scheduler = MultiStepLR(args.lr,
+
args.lr_epochs, + args.lr_gamma, + args.steps_per_epoch, + args.max_epoch, + warmup_epochs=args.warmup_epochs) + elif args.lr_scheduler == 'cosine_annealing': + lr_scheduler = CosineAnnealingLR(args.lr, + args.T_max, + args.steps_per_epoch, + args.max_epoch, + warmup_epochs=args.warmup_epochs, + eta_min=args.eta_min) + else: + raise NotImplementedError(args.lr_scheduler) + lr_schedule = lr_scheduler.get_lr() + + # optimizer + opt = Momentum(params=get_param_groups(network), + learning_rate=Tensor(lr_schedule), + momentum=args.momentum, + weight_decay=args.weight_decay, + loss_scale=args.loss_scale) + + # mixed precision training + criterion.add_flags_recursive(fp32=True) + + # package training process, adjust lr + forward + backward + optimizer + train_net = BuildTrainNetwork(network, criterion) + if args.is_distributed: + parallel_mode = ParallelMode.DATA_PARALLEL + else: + parallel_mode = ParallelMode.STAND_ALONE + if args.is_dynamic_loss_scale == 1: + loss_scale_manager = DynamicLossScaleManager(init_loss_scale=65536, scale_factor=2, scale_window=2000) + else: + loss_scale_manager = FixedLossScaleManager(args.loss_scale, drop_overflow_update=False) + + context.set_auto_parallel_context(parallel_mode=parallel_mode, device_num=args.group_size, + parameter_broadcast=True, mirror_mean=True) + model = Model(train_net, optimizer=opt, metrics=None, loss_scale_manager=loss_scale_manager, amp_level="O3") + + # checkpoint save + progress_cb = ProgressMonitor(args) + callbacks = [progress_cb,] + if args.rank_save_ckpt_flag: + ckpt_max_num = args.max_epoch * args.steps_per_epoch // args.ckpt_interval + ckpt_config = CheckpointConfig(save_checkpoint_steps=args.ckpt_interval, + keep_checkpoint_max=ckpt_max_num) + ckpt_cb = ModelCheckpoint(config=ckpt_config, + directory=args.outputs_dir, + prefix='{}'.format(args.rank)) + callbacks.append(ckpt_cb) + + model.train(args.max_epoch, de_dataset, callbacks=callbacks) + + +if __name__ == "__main__": + train()
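
The learning-rate schedules implemented above can be sanity-checked offline before launching a full training run. The following is only a usage sketch, not part of the shipped scripts: it assumes the scheduler classes live in `src/lr_scheduler/lr_scheduler.py` and are importable from the `densenet121` root (as `train.py` does for `MultiStepLR` and `CosineAnnealingLR`), and that `get_lr()` returns one value per training step; adjust the import to the actual package layout.

```python
# Minimal sketch for inspecting a generated LR curve offline.
# Assumptions: the scheduler classes above are importable from
# src.lr_scheduler.lr_scheduler, and get_lr() returns a numpy array
# with one learning rate per training step.
from src.lr_scheduler.lr_scheduler import OneCycleLR

steps_per_epoch = 100   # hypothetical value; use the real dataset size / batch size
max_epoch = 10

scheduler = OneCycleLR(lr=0.1,
                       steps_per_epoch=steps_per_epoch,
                       max_epoch=max_epoch,
                       pct_start=0.3,
                       anneal_strategy='cos',
                       warmup_epochs=0)
lr_curve = scheduler.get_lr()

print(lr_curve.shape)                             # total number of training steps
print(lr_curve[0], lr_curve.max(), lr_curve[-1])  # initial, peak and final learning rate
```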