add mobilenetV2 quant

2020-06-12 10:25:10 +08:00 · 2020-06-12 10:25:10 +08:00 · 60dc921186
parent 0e4fab2368
commit 60dc921186
20 changed files with 1711 additions and 2 deletions
--- a/example/mobilenetv2_imagenet/Readme.md
+++ b/example/mobilenetv2_imagenet/Readme.md
@ -3,13 +3,13 @@

 MobileNetV2 is tuned to mobile phone CPUs through a combination of hardware-aware network architecture search (NAS) complemented by the NetAdapt algorithm and then subsequently improved through novel architecture advances.

-[Paper](https://arxiv.org/pdf/1905.02244) Howard, Andrew, Mark Sandler, Grace Chu, Liang-Chieh Chen, Bo Chen, Mingxing Tan, Weijun Wang et al. "Searching for MobileNetV2." In Proceedings of the IEEE International Conference on Computer Vision, pp. 1314-1324. 2019.
+[Paper](https://arxiv.org/pdf/1801.04381) Howard, Andrew, Mark Sandler, Grace Chu, Liang-Chieh Chen, Bo Chen, Mingxing Tan, Weijun Wang et al. "Searching for MobileNetV2." In Proceedings of the IEEE International Conference on Computer Vision, pp. 1314-1324. 2019.

 # Model architecture

 The overall network architecture of MobileNetV2 is shown below:

-[Link](https://arxiv.org/pdf/1905.02244)
+[Link](https://ai.googleblog.com/2018/04/mobilenetv2-next-generation-of-on.html)

 # Dataset

--- a/example/mobilenetv2_imagenet/eval.py
+++ b/example/mobilenetv2_imagenet/eval.py
--- a/example/mobilenetv2_imagenet/scripts/run_infer.sh
+++ b/example/mobilenetv2_imagenet/scripts/run_infer.sh
--- a/example/mobilenetv2_imagenet/scripts/run_train.sh
+++ b/example/mobilenetv2_imagenet/scripts/run_train.sh
--- a/example/mobilenetv2_imagenet/src/config.py
+++ b/example/mobilenetv2_imagenet/src/config.py
--- a/example/mobilenetv2_imagenet/src/dataset.py
+++ b/example/mobilenetv2_imagenet/src/dataset.py
--- a/example/mobilenetv2_imagenet/src/launch.py
+++ b/example/mobilenetv2_imagenet/src/launch.py
--- a/example/mobilenetv2_imagenet/src/lr_generator.py
+++ b/example/mobilenetv2_imagenet/src/lr_generator.py
--- a/example/mobilenetv2_imagenet/train.py
+++ b/example/mobilenetv2_imagenet/train.py
--- a/example/mobilenetv2_quant/Readme.md
+++ b/example/mobilenetv2_quant/Readme.md
@ -0,0 +1,259 @@
+# MobileNetV2 Description
+
+
+MobileNetV2 is tuned to mobile phone CPUs through a combination of hardware-aware network architecture search (NAS) complemented by the NetAdapt algorithm and then subsequently improved through novel architecture advances.
+
+[Paper](https://arxiv.org/pdf/1905.02244) Howard, Andrew, Mark Sandler, Grace Chu, Liang-Chieh Chen, Bo Chen, Mingxing Tan, Weijun Wang et al. "Searching for MobileNetV2." In Proceedings of the IEEE International Conference on Computer Vision, pp. 1314-1324. 2019.
+
+# Model architecture
+
+The overall network architecture of MobileNetV2 is shown below:
+
+[Link](https://ai.googleblog.com/2018/04/mobilenetv2-next-generation-of-on.html)
+
+# Dataset
+
+Dataset used: imagenet
+
+- Dataset size: ~125G, about 1.2 million color images in 1000 classes
+	- Train: 120G, about 1.2 million images
+	- Test: 5G, 50000 images
+- Data format: RGB images.
+	- Note: Data will be processed in src/dataset.py 
+
+
+# Features
+
+
+# Environment Requirements
+
+- Hardware (Ascend)
+  - Prepare hardware environment with Ascend processor. If you want to try Ascend, please send the [application form](https://obs-9be7.obs.cn-east-2.myhuaweicloud.com/file/other/Ascend%20Model%20Zoo%E4%BD%93%E9%AA%8C%E8%B5%84%E6%BA%90%E7%94%B3%E8%AF%B7%E8%A1%A8.docx) to ascend@huawei.com. Once approved, you can get the resources. 
+- Framework
+  - [MindSpore](http://10.90.67.50/mindspore/archive/20200506/OpenSource/me_vm_x86/)
+- For more information, please check the resources below:
+  - [MindSpore tutorials](https://www.mindspore.cn/tutorial/zh-CN/master/index.html) 
+  - [MindSpore API](https://www.mindspore.cn/api/zh-CN/master/index.html)
+
+
+# Script description
+
+## Script and sample code
+
+```python
+├── MobileNetV2        
+  ├── Readme.md                      
+  ├── scripts 
+  │   ├──run_train.sh                  
+  │   ├──run_infer.sh
+  ├── src                              
+  │   ├──config.py                     
+  │   ├──dataset.py
+  │   ├──launch.py
+  │   ├──lr_generator.py                                 
+  │   ├──mobilenetV2.py
+  ├── train.py
+  ├── eval.py
+```
+
+## Training process
+
+### Usage
+
+- Ascend: sh run_train.sh Ascend [DEVICE_NUM] [SERVER_IP(x.x.x.x)] [VISIBLE_DEVICES(0,1,2,3,4,5,6,7)] [DATASET_PATH] [CKPT_PATH]
+
+
+### Launch
+
+``` 
+# training example
+  Ascend: sh run_train.sh Ascend 8 192.168.0.1 0,1,2,3,4,5,6,7 ~/imagenet/train/
+```
+
+### Result
+
+Training results are stored in the example path. Checkpoints are stored at `./checkpoint` by default, and the training log is redirected to `./train/train.log`, as shown below.
+
+``` 
+epoch: [  0/200], step:[  624/  625], loss:[5.258/5.258], time:[140412.236], lr:[0.100]
+epoch time: 140522.500, per step time: 224.836, avg loss: 5.258
+epoch: [  1/200], step:[  624/  625], loss:[3.917/3.917], time:[138221.250], lr:[0.200]
+epoch time: 138331.250, per step time: 221.330, avg loss: 3.917
+```
+
+## Eval process
+
+### Usage
+
+- Ascend: sh run_infer.sh Ascend [DATASET_PATH] [CHECKPOINT_PATH]
+
+### Launch
+
+``` 
+# infer example
+    Ascend: sh run_infer.sh Ascend ~/imagenet/val/ ~/train/mobilenet-200_625.ckpt
+```
+
+> The checkpoint is produced during the training process.
+
+### Result
+
+Inference results are stored in the example path; you can find results like the following in `infer.log`.
+
+``` 
+result: {'acc': 0.71976314102564111} ckpt=/path/to/checkpoint/mobilenet-200_625.ckpt
+```
+
+# Model description
+
+## Performance
+
+### Training Performance
+
+<table>
+<thead>
+<tr>
+<th>Parameters</th>
+<th>MobilenetV2</th>
+<th>MobilenetV2 Quant</th>
+</tr>
+</thead>
+<tbody>
+<tr>
+<td>Resource</td>
+<td>Ascend 910 <br />
+	cpu:2.60GHz 56cores <br />
+	memory:314G</td>
+<td>Ascend 910 <br />
+	cpu:2.60GHz 56cores <br />
+	memory:314G</td>
+</tr>
+<tr>
+<td>uploaded Date</td>
+<td>05/06/2020</td>
+<td>06/12/2020</td>
+</tr>
+<tr>
+<td>MindSpore Version</td>
+<td>0.3.0</td>
+<td>0.3.0</td>
+</tr>
+<tr>
+<td>Dataset</td>
+<td>ImageNet</td>
+<td>ImageNet</td>
+</tr>
+<tr>
+<td>Training Parameters</td>
+<td>src/config.py</td>
+<td>src/config.py</td>
+</tr>
+<tr>
+<td>Optimizer</td>
+<td>Momentum</td>
+<td>Momentum</td>
+</tr>
+<tr>
+<td>Loss Function</td>
+<td>CrossEntropyWithLabelSmooth</td>
+<td>CrossEntropyWithLabelSmooth</td>
+</tr>
+<tr>
+<td>Loss</td>
+<td>200 epoch:1.913</td>
+<td>50 epoch:1.912</td>
+</tr>
+<tr>
+<td>Train Accuracy</td>
+<td>ACC1[77.09%] ACC5[92.57%]</td>
+<td>ACC1[77.09%] ACC5[92.57%]</td>
+</tr>
+<tr>
+<td>Eval Accuracy</td>
+<td>ACC1[77.09%] ACC5[92.57%]</td>
+<td>ACC1[77.09%] ACC5[92.57%]</td>
+</tr>
+<tr>
+<td>Total time</td>
+<td>48h</td>
+<td>12h</td>
+</tr>
+<tr>
+<td>Checkpoint</td>
+<td>/</td>
+<td>mobilenetv2.ckpt</td>
+</tr>
+</tbody>
+</table>
+
+### Inference Performance
+
+<table>
+<thead>
+<tr>
+<th>Parameters</th>
+<th>Ascend 910</th>
+<th>Ascend 310</th>
+<th>Nvidia V100</th>
+</tr>
+</thead>
+<tbody>
+<tr>
+<td>uploaded Date</td>
+<td>06/12/2020</td>
+<td></td>
+<td></td>
+</tr>
+<tr>
+<td>MindSpore Version</td>
+<td>0.3.0</td>
+<td></td>
+<td></td>
+</tr>
+<tr>
+<td>Dataset</td>
+<td>ImageNet, 1.2W</td>
+<td></td>
+<td></td>
+</tr>
+<tr>
+<td>batch_size</td>
+<td></td>
+<td></td>
+<td></td>
+</tr>
+<tr>
+<td>outputs</td>
+<td></td>
+<td></td>
+<td></td>
+</tr>
+<tr>
+<td>Accuracy</td>
+<td></td>
+<td></td>
+<td></td>
+</tr>
+<tr>
+<td>Speed</td>
+<td></td>
+<td></td>
+<td></td>
+</tr>
+<tr>
+<td>Total time</td>
+<td></td>
+<td></td>
+<td></td>
+</tr>
+<tr>
+<td>Model for inference</td>
+<td></td>
+<td></td>
+<td></td>
+</tr>
+</tbody>
+</table>	
+
+# ModelZoo Homepage  
+ [Link](https://gitee.com/mindspore/mindspore/tree/master/mindspore/model_zoo)  
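The per-step lines in the README's sample training log follow a regular pattern, so they can be split into fields with a regular expression. The following is a minimal sketch and not part of the commit: the name `parse_step_line` and the field names are hypothetical, and because the README does not say what the two values inside `loss:[...]` mean, they are kept as an uninterpreted pair.

```python
import re

# Pattern written against the sample log lines in the README; other log
# formats are not covered.
STEP_LINE = re.compile(
    r"epoch: \[\s*(\d+)/\s*(\d+)\], step:\[\s*(\d+)/\s*(\d+)\], "
    r"loss:\[([\d.]+)/([\d.]+)\], time:\[([\d.]+)\], lr:\[([\d.]+)\]"
)


def parse_step_line(line):
    """Return the fields of one per-step log line, or None if it does not match."""
    match = STEP_LINE.search(line)
    if match is None:
        return None
    groups = match.groups()
    return {
        "epoch": int(groups[0]),
        "total_epochs": int(groups[1]),
        "step": int(groups[2]),
        "steps_per_epoch": int(groups[3]),
        # Meaning of the two loss values is not documented; kept as a pair.
        "loss_pair": (float(groups[4]), float(groups[5])),
        "time": float(groups[6]),
        "lr": float(groups[7]),
    }


sample = "epoch: [  0/200], step:[  624/  625], loss:[5.258/5.258], time:[140412.236], lr:[0.100]"
record = parse_step_line(sample)
print(record["epoch"], record["step"], record["lr"])
```

Lines that do not match, such as the `epoch time: ...` summary lines, return `None` and can be skipped by the caller.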
--- a/example/mobilenetv2_quant/eval.py
+++ b/example/mobilenetv2_quant/eval.py
@ -0,0 +1,77 @@
+# Copyright 2020 Huawei Technologies Co., Ltd
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+# ============================================================================
+"""
+eval.
+"""
+import os
+import argparse
+from mindspore import context
+from mindspore import nn
+from mindspore.train.model import Model
+from mindspore.train.serialization import load_checkpoint, load_param_into_net
+from mindspore.common import dtype as mstype
+from mindspore.model_zoo.mobilenetV2 import mobilenet_v2
+from src.dataset import create_dataset
+from src.config import config_ascend, config_gpu
+
+
+parser = argparse.ArgumentParser(description='Image classification')
+parser.add_argument('--checkpoint_path', type=str, default=None, help='Checkpoint file path')
+parser.add_argument('--dataset_path', type=str, default=None, help='Dataset path')
+parser.add_argument('--platform', type=str, default=None, help='run platform')
+args_opt = parser.parse_args()
+
+
+if __name__ == '__main__':
+    config_platform = None
+    net = None
+    if args_opt.platform == "Ascend":
+        config_platform = config_ascend
+        device_id = int(os.getenv('DEVICE_ID'))
+        context.set_context(mode=context.GRAPH_MODE, device_target="Ascend",
+                            device_id=device_id, save_graphs=False)
+        net = mobilenet_v2(num_classes=config_platform.num_classes, platform="Ascend")
+    elif args_opt.platform == "GPU":
+        config_platform = config_gpu
+        context.set_context(mode=context.GRAPH_MODE,
+                            device_target="GPU", save_graphs=False)
+        net = mobilenet_v2(num_classes=config_platform.num_classes, platform="GPU")
+    else:
+        raise ValueError("Unsupported platform.")
+
+    loss = nn.SoftmaxCrossEntropyWithLogits(
+        is_grad=False, sparse=True, reduction='mean')
+
+    if args_opt.platform == "Ascend":
+        net.to_float(mstype.float16)
+        for _, cell in net.cells_and_names():
+            if isinstance(cell, nn.Dense):
+                cell.to_float(mstype.float32)
+
+    dataset = create_dataset(dataset_path=args_opt.dataset_path,
+                             do_train=False,
+                             config=config_platform,
+                             platform=args_opt.platform,
+                             batch_size=config_platform.batch_size)
+    step_size = dataset.get_dataset_size()
+
+    if args_opt.checkpoint_path:
+        param_dict = load_checkpoint(args_opt.checkpoint_path)
+        load_param_into_net(net, param_dict)
+    net.set_train(False)
+
+    model = Model(net, loss_fn=loss, metrics={'acc'})
+    res = model.eval(dataset)
+    print("result:", res, "ckpt=", args_opt.checkpoint_path)
--- a/example/mobilenetv2_quant/scripts/run_infer.sh
+++ b/example/mobilenetv2_quant/scripts/run_infer.sh
@ -0,0 +1,55 @@
+#!/usr/bin/env bash
+# Copyright 2020 Huawei Technologies Co., Ltd
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+# ============================================================================
+if [ $# != 3 ]
+then
+    echo "Ascend: sh run_infer.sh [PLATFORM] [DATASET_PATH] [CHECKPOINT_PATH] \
+          GPU: sh run_infer.sh [PLATFORM] [DATASET_PATH] [CHECKPOINT_PATH]"
+exit 1
+fi
+
+# check dataset path
+if [ ! -d $2 ]
+then
+    echo "error: DATASET_PATH=$2 is not a directory"
+exit 1
+fi
+
+# check checkpoint file
+if [ ! -f $3 ]
+then
+    echo "error: CHECKPOINT_PATH=$3 is not a file"
+exit 1
+fi
+
+# set environment
+BASEPATH=$(cd "`dirname $0`" || exit; pwd)
+export PYTHONPATH=${BASEPATH}:$PYTHONPATH
+export DEVICE_ID=0
+export RANK_ID=0
+export RANK_SIZE=1
+if [ -d "../eval" ];
+then
+    rm -rf ../eval
+fi
+mkdir ../eval
+cd ../eval || exit
+
+# launch
+python ${BASEPATH}/../eval.py \
+        --platform=$1 \
+        --dataset_path=$2 \
+        --checkpoint_path=$3 \
+        &> ../infer.log &  # dataset val folder path
--- a/example/mobilenetv2_quant/scripts/run_train.sh
+++ b/example/mobilenetv2_quant/scripts/run_train.sh
@ -0,0 +1,97 @@
+#!/usr/bin/env bash
+# Copyright 2020 Huawei Technologies Co., Ltd
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+# ============================================================================
+
+run_ascend()
+{
+    if [ $2 -lt 1 ] || [ $2 -gt 8 ]
+    then
+        echo "error: DEVICE_NUM=$2 is not in (1-8)"
+    exit 1
+    fi
+
+    if [ ! -d $5 ]
+    then
+        echo "error: DATASET_PATH=$5 is not a directory"
+    exit 1
+    fi
+
+    BASEPATH=$(cd "`dirname $0`" || exit; pwd)
+    export PYTHONPATH=${BASEPATH}:$PYTHONPATH
+    if [ -d "../train" ];
+    then
+        rm -rf ../train
+    fi
+    mkdir ../train
+    cd ../train || exit
+    python ${BASEPATH}/../src/launch.py \
+            --nproc_per_node=$2 \
+            --visible_devices=$4 \
+            --server_id=$3 \
+            --training_script=${BASEPATH}/../train.py \
+            --dataset_path=$5 \
+            --pre_trained=$6 \
+            --platform=$1 &> ../train.log &  # dataset train folder
+}
+
+run_gpu()
+{
+    if [ $2 -lt 1 ] || [ $2 -gt 8 ]
+    then
+        echo "error: DEVICE_NUM=$2 is not in (1-8)"
+    exit 1
+    fi
+
+    if [ ! -d $4 ]
+    then
+        echo "error: DATASET_PATH=$4 is not a directory"
+    exit 1
+    fi
+
+    BASEPATH=$(cd "`dirname $0`" || exit; pwd)
+    export PYTHONPATH=${BASEPATH}:$PYTHONPATH
+    if [ -d "../train" ];
+    then
+        rm -rf ../train
+    fi
+    mkdir ../train
+    cd ../train || exit
+
+    export CUDA_VISIBLE_DEVICES="$3"
+    mpirun -n $2 --allow-run-as-root \
+    python ${BASEPATH}/../train.py \
+        --dataset_path=$4 \
+        --platform=$1 \
+        --pre_trained=$5 \
+        &> ../train.log &  # dataset train folder
+}
+
+if [ $# -gt 6 ] || [ $# -lt 4 ]
+then
+    echo "Usage:\n \
+          Ascend: sh run_train.sh Ascend [DEVICE_NUM] [SERVER_IP(x.x.x.x)] [VISIBLE_DEVICES(0,1,2,3,4,5,6,7)] [DATASET_PATH] [CKPT_PATH]\n \
+          GPU: sh run_train.sh GPU [DEVICE_NUM] [VISIBLE_DEVICES(0,1,2,3,4,5,6,7)] [DATASET_PATH] [CKPT_PATH]\n \
+          "
+exit 1
+fi
+
+if [ $1 = "Ascend" ] ; then
+    run_ascend "$@"
+elif [ $1 = "GPU" ] ; then
+    run_gpu "$@"
+else
+    echo "error: unsupported platform"
+    exit 1
+fi;
+
--- a/example/mobilenetv2_quant/src/config.py
+++ b/example/mobilenetv2_quant/src/config.py
@ -0,0 +1,54 @@
+# Copyright 2020 Huawei Technologies Co., Ltd
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+# ============================================================================
+"""
+network config setting, will be used in train.py and eval.py
+"""
+from easydict import EasyDict as ed
+
+config_ascend = ed({
+    "num_classes": 1000,
+    "image_height": 224,
+    "image_width": 224,
+    "batch_size": 256,
+    "epoch_size": 200,
+    "warmup_epochs": 4,
+    "lr": 0.4,
+    "momentum": 0.9,
+    "weight_decay": 4e-5,
+    "label_smooth": 0.1,
+    "loss_scale": 1024,
+    "save_checkpoint": True,
+    "save_checkpoint_epochs": 1,
+    "keep_checkpoint_max": 200,
+    "save_checkpoint_path": "./checkpoint",
+})
+
+config_gpu = ed({
+    "num_classes": 1000,
+    "image_height": 224,
+    "image_width": 224,
+    "batch_size": 64,
+    "epoch_size": 200,
+    "warmup_epochs": 4,
+    "lr": 0.5,
+    "momentum": 0.9,
+    "weight_decay": 4e-5,
+    "label_smooth": 0.1,
+    "loss_scale": 1024,
+    "save_checkpoint": True,
+    "save_checkpoint_epochs": 1,
+    "keep_checkpoint_max": 200,
+    "save_checkpoint_path": "./checkpoint",
+})
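The `batch_size` in `config_ascend` can be related to the `625` steps per epoch that appear in the README's sample log. The sketch below works through that arithmetic under stated assumptions: 8 devices (taken from the README launch example), the ILSVRC-2012 training set size of 1,281,167 images, and an even split across devices with the last partial batch dropped. The function name and the exact rounding of the sharded sampler are assumptions, not something the commit states.

```python
from types import SimpleNamespace

# Subset of config_ascend from src/config.py, rebuilt here so the sketch
# runs without the easydict dependency.
config = SimpleNamespace(batch_size=256, epoch_size=200, warmup_epochs=4)

IMAGENET_TRAIN_IMAGES = 1_281_167  # ILSVRC-2012 training set size
DEVICE_NUM = 8                     # assumption: from the README launch example


def steps_per_epoch(num_images, batch_size, device_num):
    """Batches per device per epoch, dropping the last partial batch."""
    images_per_device = num_images // device_num  # assumed even sharding
    return images_per_device // batch_size


steps = steps_per_epoch(IMAGENET_TRAIN_IMAGES, config.batch_size, DEVICE_NUM)
total_steps = steps * config.epoch_size
warmup_steps = steps * config.warmup_epochs
print(steps, total_steps, warmup_steps)
```

Under these assumptions each device runs 625 steps per epoch, which agrees with the `step:[  624/  625]` counter in the sample log.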
--- a/example/mobilenetv2_quant/src/dataset.py
+++ b/example/mobilenetv2_quant/src/dataset.py
@ -0,0 +1,89 @@
+# Copyright 2020 Huawei Technologies Co., Ltd
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+# ============================================================================
+"""
+create train or eval dataset.
+"""
+import os
+import mindspore.common.dtype as mstype
+import mindspore.dataset.engine as de
+import mindspore.dataset.transforms.vision.c_transforms as C
+import mindspore.dataset.transforms.c_transforms as C2
+
+def create_dataset(dataset_path, do_train, config, platform, repeat_num=1, batch_size=32):
+    """
+    create a train or eval dataset
+
+    Args:
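+        dataset_path(string): the path of dataset.
+        do_train(bool): whether dataset is used for train or eval.
+        config(EasyDict): platform config; image_height and image_width are read from it.
+        platform(string): target platform, "Ascend" or "GPU".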
+        repeat_num(int): the repeat times of dataset. Default: 1.
+        batch_size(int): the batch size of dataset. Default: 32.
+
+    Returns:
+        dataset
+    """
+    if platform == "Ascend":
+        rank_size = int(os.getenv("RANK_SIZE"))
+        rank_id = int(os.getenv("RANK_ID"))
+        if rank_size == 1:
+            ds = de.ImageFolderDatasetV2(dataset_path, num_parallel_workers=8, shuffle=True)
+        else:
+            ds = de.ImageFolderDatasetV2(dataset_path, num_parallel_workers=8, shuffle=True,
+                                         num_shards=rank_size, shard_id=rank_id)
+    elif platform == "GPU":
+        if do_train:
+            from mindspore.communication.management import get_rank, get_group_size
+            ds = de.ImageFolderDatasetV2(dataset_path, num_parallel_workers=8, shuffle=True,
+                                         num_shards=get_group_size(), shard_id=get_rank())
+        else:
+            ds = de.ImageFolderDatasetV2(dataset_path, num_parallel_workers=8, shuffle=True)
+    else:
+        raise ValueError("Unsupported platform.")
+
+    resize_height = config.image_height
+    resize_width = config.image_width
+    buffer_size = 1000
+
+    # define map operations
+    decode_op = C.Decode()
+    resize_crop_op = C.RandomCropDecodeResize(resize_height, scale=(0.08, 1.0), ratio=(0.75, 1.333))
+    horizontal_flip_op = C.RandomHorizontalFlip(prob=0.5)
+
+    resize_op = C.Resize((256, 256))
+    center_crop = C.CenterCrop(resize_width)
+    rescale_op = C.RandomColorAdjust(brightness=0.4, contrast=0.4, saturation=0.4)
+    normalize_op = C.Normalize(mean=[0.485*255, 0.456*255, 0.406*255], std=[0.229*255, 0.224*255, 0.225*255])
+    change_swap_op = C.HWC2CHW()
+
+    if do_train:
+        trans = [resize_crop_op, horizontal_flip_op, rescale_op, normalize_op, change_swap_op]
+    else:
+        trans = [decode_op, resize_op, center_crop, normalize_op, change_swap_op]
+
+    type_cast_op = C2.TypeCast(mstype.int32)
+
+    ds = ds.map(input_columns="image", operations=trans, num_parallel_workers=8)
+    ds = ds.map(input_columns="label", operations=type_cast_op, num_parallel_workers=8)
+
+    # apply shuffle operations
+    ds = ds.shuffle(buffer_size=buffer_size)
+
+    # apply batch operations
+    ds = ds.batch(batch_size, drop_remainder=True)
+
+    # apply dataset repeat operation
+    ds = ds.repeat(repeat_num)
+
+    return ds
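The evaluation branch of `create_dataset` resizes to 256x256, center-crops to the configured width, and normalizes with ImageNet statistics scaled by 255 because decoded pixel values lie in the 0 to 255 range. The sketch below reproduces only that arithmetic in plain Python. The helper names are hypothetical, and the rounding used for the crop offsets is an assumption rather than a statement about how MindSpore's `CenterCrop` is implemented.

```python
# Per-channel statistics as written in src/dataset.py (RGB order, 0..255 scale).
MEAN = [0.485 * 255, 0.456 * 255, 0.406 * 255]
STD = [0.229 * 255, 0.224 * 255, 0.225 * 255]


def center_crop_box(height, width, crop):
    """Return (top, left, bottom, right) of a centered square crop."""
    top = (height - crop) // 2
    left = (width - crop) // 2
    return top, left, top + crop, left + crop


def normalize_pixel(rgb):
    """Apply (value - mean) / std to one RGB pixel."""
    return [(value - mean) / std for value, mean, std in zip(rgb, MEAN, STD)]


box = center_crop_box(256, 256, 224)
mean_pixel = normalize_pixel(MEAN)             # a pixel equal to the mean
white_pixel = normalize_pixel([255, 255, 255])
print(box, mean_pixel, white_pixel)
```

A pixel equal to the channel means maps to zero in every channel, and the 224 crop leaves a 16-pixel border on each side of the 256x256 image.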
--- a/example/mobilenetv2_quant/src/launch.py
+++ b/example/mobilenetv2_quant/src/launch.py
@ -0,0 +1,163 @@
+# Copyright 2020 Huawei Technologies Co., Ltd
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+# ============================================================================
+"""launch train script"""
+import os
+import sys
+import json
+import subprocess
+import shutil
+from argparse import ArgumentParser
+
+def parse_args():
+    """
+    parse args .
+
+    Args:
+
+    Returns:
+        args.
+
+    Examples:
+        >>> parse_args()
+    """
+    parser = ArgumentParser(description="mindspore distributed training launch "
+                                        "helper utility that will spawn up "
+                                        "multiple distributed processes")
+    parser.add_argument("--nproc_per_node", type=int, default=1,
+                        help="The number of processes to launch on each node, "
+                             "for D training, this is recommended to be set "
+                             "to the number of D in your system so that "
+                             "each process can be bound to a single D.")
+    parser.add_argument("--visible_devices", type=str, default="0,1,2,3,4,5,6,7",
+                        help="will use the visible devices sequentially")
+    parser.add_argument("--server_id", type=str, default="",
+                        help="server ip")
+    parser.add_argument("--training_script", type=str,
+                        help="The full path to the single D training "
+                             "program/script to be launched in parallel, "
+                             "followed by all the arguments for the "
+                             "training script")
+    # rest from the training program
+    args, unknown = parser.parse_known_args()
+    args.training_script_args = unknown
+    return args
+
+
+def main():
+    print("start", __file__)
+    args = parse_args()
+    print(args)
+    visible_devices = args.visible_devices.split(',')
+    assert os.path.isfile(args.training_script)
+    assert len(visible_devices) >= args.nproc_per_node
+    print('visible_devices:{}'.format(visible_devices))
+    if not args.server_id:
+        print('please input server ip!!!')
+        sys.exit(1)
+    print('server_id:{}'.format(args.server_id))
+
+    # construct hccn_table
+    with open('/etc/hccn.conf', 'r') as hccn_file:
+        hccn_configs = hccn_file.readlines()
+    device_ips = {}
+    for hccn_item in hccn_configs:
+        hccn_item = hccn_item.strip()
+        if hccn_item.startswith('address_'):
+            device_id, device_ip = hccn_item.split('=')
+            device_id = device_id.split('_')[1]
+            device_ips[device_id] = device_ip
+            print('device_id:{}, device_ip:{}'.format(device_id, device_ip))
+    hccn_table = {}
+    hccn_table['board_id'] = '0x0000'
+    hccn_table['chip_info'] = '910'
+    hccn_table['deploy_mode'] = 'lab'
+    hccn_table['group_count'] = '1'
+    hccn_table['group_list'] = []
+    instance_list = []
+    usable_dev = ''
+    for instance_id in range(args.nproc_per_node):
+        instance = {}
+        instance['devices'] = []
+        device_id = visible_devices[instance_id]
+        device_ip = device_ips[device_id]
+        usable_dev += str(device_id)
+        instance['devices'].append({
+            'device_id': device_id,
+            'device_ip': device_ip,
+        })
+        instance['rank_id'] = str(instance_id)
+        instance['server_id'] = args.server_id
+        instance_list.append(instance)
+    hccn_table['group_list'].append({
+        'device_num': str(args.nproc_per_node),
+        'server_num': '1',
+        'group_name': '',
+        'instance_count': str(args.nproc_per_node),
+        'instance_list': instance_list,
+    })
+    hccn_table['para_plane_nic_location'] = 'device'
+    hccn_table['para_plane_nic_name'] = []
+    for instance_id in range(args.nproc_per_node):
+        eth_id = visible_devices[instance_id]
+        hccn_table['para_plane_nic_name'].append('eth{}'.format(eth_id))
+    hccn_table['para_plane_nic_num'] = str(args.nproc_per_node)
+    hccn_table['status'] = 'completed'
+
+    # save hccn_table to file
+    table_path = os.getcwd()
+    if not os.path.exists(table_path):
+        os.mkdir(table_path)
+    table_fn = os.path.join(table_path,
+                            'rank_table_{}p_{}_{}.json'.format(args.nproc_per_node, usable_dev, args.server_id))
+    with open(table_fn, 'w') as table_fp:
+        json.dump(hccn_table, table_fp, indent=4)
+    sys.stdout.flush()
+
+    # spawn the processes
+    processes = []
+    cmds = []
+    log_files = []
+    env = os.environ.copy()
+    env['RANK_SIZE'] = str(args.nproc_per_node)
+    cur_path = os.getcwd()
+    for rank_id in range(0, args.nproc_per_node):
+        os.chdir(cur_path)
+        device_id = visible_devices[rank_id]
+        device_dir = os.path.join(cur_path, 'device{}'.format(rank_id))
+        env['RANK_ID'] = str(rank_id)
+        env['DEVICE_ID'] = str(device_id)
+        if args.nproc_per_node > 1:
+            env['MINDSPORE_HCCL_CONFIG_PATH'] = table_fn
+            env['RANK_TABLE_FILE'] = table_fn
+        if os.path.exists(device_dir):
+            shutil.rmtree(device_dir)
+        os.mkdir(device_dir)
+        os.chdir(device_dir)
+        cmd = [sys.executable, '-u']
+        cmd.append(args.training_script)
+        cmd.extend(args.training_script_args)
+        log_file = open('{dir}/log{id}.log'.format(dir=device_dir, id=rank_id), 'w')
+        process = subprocess.Popen(cmd, stdout=log_file, stderr=log_file, env=env)
+        processes.append(process)
+        cmds.append(cmd)
+        log_files.append(log_file)
+    for process, cmd, log_file in zip(processes, cmds, log_files):
+        process.wait()
+        if process.returncode != 0:
+            raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
+        log_file.close()
+
+
+if __name__ == "__main__":
+    main()
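The rank-table construction in `launch.py` has two steps: read `address_<id>=<ip>` lines into a device-to-IP map, then build one instance entry per process in visible-device order. The sketch below isolates those two steps so they can be tried without an Ascend host. The function names are hypothetical, and the sample configuration lines are invented for illustration; real values come from `/etc/hccn.conf`.

```python
def parse_device_ips(lines):
    """Map device id -> device ip from `address_<id>=<ip>` lines."""
    device_ips = {}
    for line in lines:
        line = line.strip()
        if line.startswith("address_"):
            key, device_ip = line.split("=", 1)
            device_ips[key.split("_")[1]] = device_ip
    return device_ips


def build_instances(visible_devices, nproc_per_node, server_id, device_ips):
    """Build one rank-table instance per process, in visible-device order."""
    instances = []
    for rank_id in range(nproc_per_node):
        device_id = visible_devices[rank_id]
        instances.append({
            "devices": [{"device_id": device_id, "device_ip": device_ips[device_id]}],
            "rank_id": str(rank_id),
            "server_id": server_id,
        })
    return instances


# Hypothetical file content, not taken from a real machine.
sample_conf = [
    "address_0=192.168.100.101\n",
    "netmask_0=255.255.255.0\n",
    "address_1=192.168.101.101\n",
]
device_ips = parse_device_ips(sample_conf)
instances = build_instances(["0", "1"], 2, "192.168.0.1", device_ips)
print(instances)
```

Lines that do not start with `address_` are ignored, and a visible device with no address entry raises `KeyError`, which matches the behavior of the dictionary lookup in the original script.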
--- a/example/mobilenetv2_quant/src/lr_generator.py
+++ b/example/mobilenetv2_quant/src/lr_generator.py
@ -0,0 +1,54 @@
+# Copyright 2020 Huawei Technologies Co., Ltd
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+# ============================================================================
+"""learning rate generator"""
+import math
+import numpy as np
+
+
+def get_lr(global_step, lr_init, lr_end, lr_max, warmup_epochs, total_epochs, steps_per_epoch):
+    """
+    generate learning rate array
+
+    Args:
+       global_step(int): total steps of the training
+       lr_init(float): init learning rate
+       lr_end(float): end learning rate
+       lr_max(float): max learning rate
+       warmup_epochs(int): number of warmup epochs
+       total_epochs(int): total epoch of training
+       steps_per_epoch(int): steps of one epoch
+
+    Returns:
+       np.array, learning rate array
+    """
+    lr_each_step = []
+    total_steps = steps_per_epoch * total_epochs
+    warmup_steps = steps_per_epoch * warmup_epochs
+    for i in range(total_steps):
+        if i < warmup_steps:
+            lr = lr_init + (lr_max - lr_init) * i / warmup_steps
+        else:
+            lr = lr_end + \
+                 (lr_max - lr_end) * \
+                 (1. + math.cos(math.pi * (i - warmup_steps) / (total_steps - warmup_steps))) / 2.
+        if lr < 0.0:
+            lr = 0.0
+        lr_each_step.append(lr)
+
+    current_step = global_step
+    lr_each_step = np.array(lr_each_step).astype(np.float32)
+    learning_rate = lr_each_step[current_step:]
+
+    return learning_rate
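The schedule in `get_lr` is a linear warmup from `lr_init` to `lr_max` followed by a cosine decay toward `lr_end`. The sketch below evaluates the same two formulas one step at a time without NumPy. The values `lr_init=0.0` and `lr_end=0.0` are assumptions, since the call site in `train.py` is not shown here; `lr_max=0.4`, 4 warmup epochs, and 200 epochs come from `config_ascend`, and 625 steps per epoch comes from the README's sample log.

```python
import math


def warmup_cosine_lr(step, lr_init, lr_end, lr_max, warmup_steps, total_steps):
    """Learning rate at one step: linear warmup, then cosine decay."""
    if step < warmup_steps:
        lr = lr_init + (lr_max - lr_init) * step / warmup_steps
    else:
        progress = (step - warmup_steps) / (total_steps - warmup_steps)
        lr = lr_end + (lr_max - lr_end) * (1.0 + math.cos(math.pi * progress)) / 2.0
    return max(lr, 0.0)


steps_per_epoch = 625                 # from the README sample log
total_steps = 200 * steps_per_epoch   # epoch_size in config_ascend
warmup_steps = 4 * steps_per_epoch    # warmup_epochs in config_ascend

# lr_init and lr_end are assumed to be 0.0 for this illustration.
schedule = [
    warmup_cosine_lr(i, 0.0, 0.0, 0.4, warmup_steps, total_steps)
    for i in range(total_steps)
]
print(round(schedule[624], 3), round(schedule[1249], 3), schedule[warmup_steps])
```

With these assumed values, the rate at the last step of epoch 0 rounds to 0.100 and at the last step of epoch 1 to 0.200, which is consistent with the `lr` column in the README's sample log.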
--- a/example/mobilenetv2_quant/src/mobilenetV2.py
+++ b/example/mobilenetv2_quant/src/mobilenetV2.py
@ -0,0 +1,291 @@
+# Copyright 2020 Huawei Technologies Co., Ltd
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+# ============================================================================
+"""MobileNetV2 model define"""
+import numpy as np
+import mindspore.nn as nn
+from mindspore.ops import operations as P
+from mindspore.ops.operations import TensorAdd
+from mindspore import Parameter, Tensor
+from mindspore.common.initializer import initializer
+
+__all__ = ['mobilenet_v2']
+
+
+def _make_divisible(v, divisor, min_value=None):
+    if min_value is None:
+        min_value = divisor
+    new_v = max(min_value, int(v + divisor / 2) // divisor * divisor)
+    # Make sure that round down does not go down by more than 10%.
+    if new_v < 0.9 * v:
+        new_v += divisor
+    return new_v
+
+
+class GlobalAvgPooling(nn.Cell):
+    """
+    Global avg pooling definition.
+
+    Args:
+
+    Returns:
+        Tensor, output tensor.
+
+    Examples:
+        >>> GlobalAvgPooling()
+    """
+
+    def __init__(self):
+        super(GlobalAvgPooling, self).__init__()
+        self.mean = P.ReduceMean(keep_dims=False)
+
+    def construct(self, x):
+        x = self.mean(x, (2, 3))
+        return x
+
+
+class DepthwiseConv(nn.Cell):
+    """
+    Depthwise Convolution wrapper definition.
+
+    Args:
+        in_planes (int): Input channel.
+        kernel_size (int): Input kernel size.
+        stride (int): Stride size.
+        pad_mode (str): pad mode in (pad, same, valid)
+        pad (int): padding size passed to the depthwise operator
+        channel_multiplier (int): Output channel multiplier
+        has_bias (bool): has bias or not
+
+    Returns:
+        Tensor, output tensor.
+
+    Examples:
+        >>> DepthwiseConv(16, 3, 1, 'pad', 1, channel_multiplier=1)
+    """
+
+    def __init__(self, in_planes, kernel_size, stride, pad_mode, pad, channel_multiplier=1, has_bias=False):
+        super(DepthwiseConv, self).__init__()
+        self.has_bias = has_bias
+        self.in_channels = in_planes
+        self.channel_multiplier = channel_multiplier
+        self.out_channels = in_planes * channel_multiplier
+        self.kernel_size = (kernel_size, kernel_size)
+        self.depthwise_conv = P.DepthwiseConv2dNative(channel_multiplier=channel_multiplier,
+                                                      kernel_size=self.kernel_size,
+                                                      stride=stride, pad_mode=pad_mode, pad=pad)
+        self.bias_add = P.BiasAdd()
+        weight_shape = [channel_multiplier, in_planes, *self.kernel_size]
+        self.weight = Parameter(initializer('ones', weight_shape), name='weight')
+
+        if has_bias:
+            bias_shape = [channel_multiplier * in_planes]
+            self.bias = Parameter(initializer('zeros', bias_shape), name='bias')
+        else:
+            self.bias = None
+
+    def construct(self, x):
+        output = self.depthwise_conv(x, self.weight)
+        if self.has_bias:
+            output = self.bias_add(output, self.bias)
+        return output
+
+
+class ConvBNReLU(nn.Cell):
+    """
+    Convolution/Depthwise fused with Batchnorm and ReLU block definition.
+
+    Args:
+        in_planes (int): Input channel.
+        out_planes (int): Output channel.
+        kernel_size (int): Input kernel size.
+        stride (int): Stride size for the first convolutional layer. Default: 1.
+        groups (int): Channel groups: 1 for regular convolution, input channels for depthwise. Default: 1.
+
+    Returns:
+        Tensor, output tensor.
+
+    Examples:
+        >>> ConvBNReLU("GPU", 16, 256, kernel_size=1, stride=1, groups=1)
+    """
+
+    def __init__(self, platform, in_planes, out_planes, kernel_size=3, stride=1, groups=1):
+        super(ConvBNReLU, self).__init__()
+        padding = (kernel_size - 1) // 2
+        if groups == 1:
+            conv = nn.Conv2d(in_planes, out_planes, kernel_size, stride, pad_mode='pad', padding=padding)
+        else:
+            if platform == "Ascend":
+                conv = DepthwiseConv(in_planes, kernel_size, stride, pad_mode='pad', pad=padding)
+            elif platform == "GPU":
+                conv = nn.Conv2d(in_planes, out_planes, kernel_size, stride,
+                                 group=in_planes, pad_mode='pad', padding=padding)
+
+        layers = [conv, nn.BatchNorm2d(out_planes), nn.ReLU6()]
+        self.features = nn.SequentialCell(layers)
+
+    def construct(self, x):
+        output = self.features(x)
+        return output
+
+
+class InvertedResidual(nn.Cell):
+    """
+    Mobilenetv2 residual block definition.
+
+    Args:
+        inp (int): Input channel.
+        oup (int): Output channel.
+        stride (int): Stride size for the first convolutional layer. Default: 1.
+        expand_ratio (int): Expansion ratio applied to the input channel count.
+
+    Returns:
+        Tensor, output tensor.
+
+    Examples:
+        >>> InvertedResidual("GPU", 3, 256, 1, 1)
+    """
+
+    def __init__(self, platform, inp, oup, stride, expand_ratio):
+        super(InvertedResidual, self).__init__()
+        assert stride in [1, 2]
+
+        hidden_dim = int(round(inp * expand_ratio))
+        self.use_res_connect = stride == 1 and inp == oup
+
+        layers = []
+        if expand_ratio != 1:
+            layers.append(ConvBNReLU(platform, inp, hidden_dim, kernel_size=1))
+        layers.extend([
+            # dw
+            ConvBNReLU(platform, hidden_dim, hidden_dim,
+                       stride=stride, groups=hidden_dim),
+            # pw-linear
+            nn.Conv2d(hidden_dim, oup, kernel_size=1,
+                      stride=1, has_bias=False),
+            nn.BatchNorm2d(oup),
+        ])
+        self.conv = nn.SequentialCell(layers)
+        self.add = TensorAdd()
+        self.cast = P.Cast()
+
+    def construct(self, x):
+        identity = x
+        x = self.conv(x)
+        if self.use_res_connect:
+            return self.add(identity, x)
+        return x
+
+
+class MobileNetV2(nn.Cell):
+    """
+    MobileNetV2 architecture.
+
+    Args:
+        num_classes (int): Number of classes. Default: 1000.
+        width_mult (float): Width multiplier applied to channel counts before rounding. Default: 1.
+        has_dropout (bool): Whether to apply dropout before the classifier. Default: False.
+        inverted_residual_setting (list): Inverted residual settings as rows of [t, c, n, s]. Default: None.
+        round_nearest (int): Round each channel count to a multiple of this value. Default: 8.
+    Returns:
+        Tensor, output tensor.
+
+    Examples:
+        >>> MobileNetV2("GPU", num_classes=1000)
+    """
+
+    def __init__(self, platform, num_classes=1000, width_mult=1.,
+                 has_dropout=False, inverted_residual_setting=None, round_nearest=8):
+        super(MobileNetV2, self).__init__()
+        block = InvertedResidual
+        input_channel = 32
+        last_channel = 1280
+        # setting of inverted residual blocks
+        self.cfgs = inverted_residual_setting
+        if inverted_residual_setting is None:
+            self.cfgs = [
+                # t, c, n, s
+                [1, 16, 1, 1],
+                [6, 24, 2, 2],
+                [6, 32, 3, 2],
+                [6, 64, 4, 2],
+                [6, 96, 3, 1],
+                [6, 160, 3, 2],
+                [6, 320, 1, 1],
+            ]
+
+        # building first layer
+        input_channel = _make_divisible(input_channel * width_mult, round_nearest)
+        self.out_channels = _make_divisible(last_channel * max(1.0, width_mult), round_nearest)
+        features = [ConvBNReLU(platform, 3, input_channel, stride=2)]
+        # building inverted residual blocks
+        for t, c, n, s in self.cfgs:
+            output_channel = _make_divisible(c * width_mult, round_nearest)
+            for i in range(n):
+                stride = s if i == 0 else 1
+                features.append(block(platform, input_channel, output_channel, stride, expand_ratio=t))
+                input_channel = output_channel
+        # building last several layers
+        features.append(ConvBNReLU(platform, input_channel, self.out_channels, kernel_size=1))
+        # wrap the layers in nn.SequentialCell
+        self.features = nn.SequentialCell(features)
+        # mobilenet head
+        head = ([GlobalAvgPooling(), nn.Dense(self.out_channels, num_classes, has_bias=True)] if not has_dropout else
+                [GlobalAvgPooling(), nn.Dropout(0.2), nn.Dense(self.out_channels, num_classes, has_bias=True)])
+        self.head = nn.SequentialCell(head)
+
+        self._initialize_weights()
+
+    def construct(self, x):
+        x = self.features(x)
+        x = self.head(x)
+        return x
+
+    def _initialize_weights(self):
+        """
+        Initialize weights.
+
+        Args:
+
+        Returns:
+            None.
+
+        Examples:
+            >>> _initialize_weights()
+        """
+        for _, m in self.cells_and_names():
+            if isinstance(m, (nn.Conv2d, DepthwiseConv)):
+                n = m.kernel_size[0] * m.kernel_size[1] * m.out_channels
+                m.weight.set_parameter_data(Tensor(np.random.normal(0, np.sqrt(2. / n),
+                                                                    m.weight.data.shape()).astype("float32")))
+                if m.bias is not None:
+                    m.bias.set_parameter_data(
+                        Tensor(np.zeros(m.bias.data.shape(), dtype="float32")))
+            elif isinstance(m, nn.BatchNorm2d):
+                m.gamma.set_parameter_data(
+                    Tensor(np.ones(m.gamma.data.shape(), dtype="float32")))
+                m.beta.set_parameter_data(
+                    Tensor(np.zeros(m.beta.data.shape(), dtype="float32")))
+            elif isinstance(m, nn.Dense):
+                m.weight.set_parameter_data(Tensor(np.random.normal(
+                    0, 0.01, m.weight.data.shape()).astype("float32")))
+                if m.bias is not None:
+                    m.bias.set_parameter_data(
+                        Tensor(np.zeros(m.bias.data.shape(), dtype="float32")))
+
+
+def mobilenet_v2(**kwargs):
+    """
+    Constructs a MobileNet V2 model
+    """
+    return MobileNetV2(**kwargs)
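
Review note: the channel rounding done by `_make_divisible` in `mobilenetV2.py` is easier to follow with concrete numbers. The sketch below is a standalone copy of that helper (renamed `make_divisible` here) applied to the `c` column of the default configuration table. The `width_mult` value of 0.75 is an assumption chosen for illustration only; the commit itself uses the default of 1.

```python
def make_divisible(v, divisor, min_value=None):
    """Round v to the nearest multiple of divisor, never dropping below 90% of v."""
    if min_value is None:
        min_value = divisor
    new_v = max(min_value, int(v + divisor / 2) // divisor * divisor)
    # If rounding down lost more than 10% of the channels, go up one step instead.
    if new_v < 0.9 * v:
        new_v += divisor
    return new_v


# Output channels (the "c" column) of the default inverted residual settings.
base_channels = [16, 24, 32, 64, 96, 160, 320]
width_mult = 0.75  # hypothetical multiplier, not taken from the commit

rounded = [make_divisible(c * width_mult, 8) for c in base_channels]
print(rounded)  # [16, 24, 24, 48, 72, 120, 240]
```

Note the second entry: 24 * 0.75 = 18 would round down to 16, which is below 90% of 18, so the helper bumps it up to 24.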
--- a/example/mobilenetv2_quant/src/mobilenetV2_quant.py
+++ b/example/mobilenetv2_quant/src/mobilenetV2_quant.py
@ -0,0 +1,296 @@
+# Copyright 2020 Huawei Technologies Co., Ltd
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+# ============================================================================
+"""MobileNetV2 Quant model define"""
+import numpy as np
+import mindspore.nn as nn
+from mindspore.ops import operations as P
+from mindspore.ops.operations import TensorAdd
+from mindspore import Parameter, Tensor
+from mindspore.common.initializer import initializer
+
+__all__ = ['mobilenet_v2_quant']
+
+_ema_decay = 0.999
+_symmetric = False
+
+def _make_divisible(v, divisor, min_value=None):
+    if min_value is None:
+        min_value = divisor
+    new_v = max(min_value, int(v + divisor / 2) // divisor * divisor)
+    # Make sure that round down does not go down by more than 10%.
+    if new_v < 0.9 * v:
+        new_v += divisor
+    return new_v
+
+
+class GlobalAvgPooling(nn.Cell):
+    """
+    Global avg pooling definition.
+
+    Args:
+
+    Returns:
+        Tensor, output tensor.
+
+    Examples:
+        >>> GlobalAvgPooling()
+    """
+
+    def __init__(self):
+        super(GlobalAvgPooling, self).__init__()
+        self.mean = P.ReduceMean(keep_dims=False)
+
+    def construct(self, x):
+        x = self.mean(x, (2, 3))
+        return x
+
+
+class DepthwiseConv(nn.Cell):
+    """
+    Depthwise convolution wrapper definition.
+
+    Args:
+        in_planes (int): Input channel.
+        kernel_size (int): Input kernel size.
+        stride (int): Stride size.
+        pad_mode (str): pad mode in (pad, same, valid)
+        channel_multiplier (int): Output channel multiplier
+        has_bias (bool): has bias or not
+
+    Returns:
+        Tensor, output tensor.
+
+    Examples:
+        >>> DepthwiseConv(16, 3, 1, 'pad', 1, channel_multiplier=1)
+    """
+
+    def __init__(self, in_planes, kernel_size, stride, pad_mode, pad, channel_multiplier=1, has_bias=False):
+        super(DepthwiseConv, self).__init__()
+        self.has_bias = has_bias
+        self.in_channels = in_planes
+        self.channel_multiplier = channel_multiplier
+        self.out_channels = in_planes * channel_multiplier
+        self.kernel_size = (kernel_size, kernel_size)
+        self.depthwise_conv = P.DepthwiseConv2dNative(channel_multiplier=channel_multiplier,
+                                                      kernel_size=self.kernel_size,
+                                                      stride=stride, pad_mode=pad_mode, pad=pad)
+        self.bias_add = P.BiasAdd()
+        weight_shape = [channel_multiplier, in_planes, *self.kernel_size]
+        self.weight = Parameter(initializer('ones', weight_shape), name='weight')
+
+        if has_bias:
+            bias_shape = [channel_multiplier * in_planes]
+            self.bias = Parameter(initializer('zeros', bias_shape), name='bias')
+        else:
+            self.bias = None
+
+    def construct(self, x):
+        output = self.depthwise_conv(x, self.weight)
+        if self.has_bias:
+            output = self.bias_add(output, self.bias)
+        return output
+
+
+class ConvBNReLU(nn.Cell):
+    """
+    Convolution/Depthwise fused with Batchnorm and ReLU block definition.
+
+    Args:
+        in_planes (int): Input channel.
+        out_planes (int): Output channel.
+        kernel_size (int): Input kernel size.
+        stride (int): Stride size for the first convolutional layer. Default: 1.
+        groups (int): Channel groups: 1 for regular convolution, input channels for depthwise. Default: 1.
+
+    Returns:
+        Tensor, output tensor.
+
+    Examples:
+        >>> ConvBNReLU("Ascend", 16, 256, kernel_size=1, stride=1, groups=1)
+    """
+
+    def __init__(self, platform, in_planes, out_planes, kernel_size=3, stride=1, groups=1):
+        super(ConvBNReLU, self).__init__()
+        padding = (kernel_size - 1) // 2
+        if groups == 1:
+            conv = nn.Conv2d(in_planes, out_planes, kernel_size, stride, pad_mode='pad', padding=padding)
+        else:
+            if platform == "Ascend":
+                conv = DepthwiseConv(in_planes, kernel_size, stride, pad_mode='pad', pad=padding)
+            elif platform == "GPU":
+                conv = nn.Conv2d(in_planes, out_planes, kernel_size, stride,
+                                 group=in_planes, pad_mode='pad', padding=padding)
+
+        layers = [conv, nn.BatchNorm2d(out_planes), nn.ReLU6()]
+        self.features = nn.SequentialCell(layers)
+        self.fake = nn.FakeQuantWithMinMax(in_planes, ema=True, ema_decay=_ema_decay, symmetric=_symmetric)
+
+    def construct(self, x):
+        output = self.features(x)
+        output = self.fake(output)
+        return output
+
+
+class InvertedResidual(nn.Cell):
+    """
+    Mobilenetv2 residual block definition.
+
+    Args:
+        inp (int): Input channel.
+        oup (int): Output channel.
+        stride (int): Stride size for the first convolutional layer. Default: 1.
+        expand_ratio (int): Expansion ratio applied to the input channel count.
+
+    Returns:
+        Tensor, output tensor.
+
+    Examples:
+        >>> InvertedResidual("Ascend", 3, 256, 1, 1)
+    """
+
+    def __init__(self, platform, inp, oup, stride, expand_ratio):
+        super(InvertedResidual, self).__init__()
+        assert stride in [1, 2]
+
+        hidden_dim = int(round(inp * expand_ratio))
+        self.use_res_connect = stride == 1 and inp == oup
+
+        layers = []
+        if expand_ratio != 1:
+            layers.append(ConvBNReLU(platform, inp, hidden_dim, kernel_size=1))
+        layers.extend([
+            # dw
+            ConvBNReLU(platform, hidden_dim, hidden_dim,
+                       stride=stride, groups=hidden_dim),
+            # pw-linear
+            nn.Conv2dBatchNormQuant(hidden_dim, oup, kernel_size=1, stride=1, pad_mode='pad', padding=0, group=1),
+            nn.FakeQuantWithMinMax(oup, ema=True, ema_decay=_ema_decay, symmetric=_symmetric)
+        ])
+        self.conv = nn.SequentialCell(layers)
+        self.add = TensorAdd()
+        self.add_fake = nn.FakeQuantWithMinMax(oup, ema=True, ema_decay=_ema_decay, symmetric=_symmetric)
+        self.cast = P.Cast()
+
+    def construct(self, x):
+        identity = x
+        x = self.conv(x)
+        if self.use_res_connect:
+            x = self.add(identity, x)
+            x = self.add_fake(x)
+        return x
+
+
+class MobileNetV2Quant(nn.Cell):
+    """
+    MobileNetV2Quant architecture.
+
+    Args:
+        num_classes (int): Number of classes. Default: 1000.
+        width_mult (float): Width multiplier applied to channel counts before rounding. Default: 1.
+        has_dropout (bool): Whether to apply dropout before the classifier. Default: False.
+        inverted_residual_setting (list): Inverted residual settings as rows of [t, c, n, s]. Default: None.
+        round_nearest (int): Round each channel count to a multiple of this value. Default: 8.
+    Returns:
+        Tensor, output tensor.
+
+    Examples:
+        >>> MobileNetV2Quant("Ascend", num_classes=1000)
+    """
+
+    def __init__(self, platform, num_classes=1000, width_mult=1.,
+                 has_dropout=False, inverted_residual_setting=None, round_nearest=8):
+        super(MobileNetV2Quant, self).__init__()
+        block = InvertedResidual
+        input_channel = 32
+        last_channel = 1280
+        # setting of inverted residual blocks
+        self.cfgs = inverted_residual_setting
+        if inverted_residual_setting is None:
+            self.cfgs = [
+                # t, c, n, s
+                [1, 16, 1, 1],
+                [6, 24, 2, 2],
+                [6, 32, 3, 2],
+                [6, 64, 4, 2],
+                [6, 96, 3, 1],
+                [6, 160, 3, 2],
+                [6, 320, 1, 1],
+            ]
+
+        # building first layer
+        input_channel = _make_divisible(input_channel * width_mult, round_nearest)
+        self.out_channels = _make_divisible(last_channel * max(1.0, width_mult), round_nearest)
+        features = [ConvBNReLU(platform, 3, input_channel, stride=2)]
+        # building inverted residual blocks
+        for t, c, n, s in self.cfgs:
+            output_channel = _make_divisible(c * width_mult, round_nearest)
+            for i in range(n):
+                stride = s if i == 0 else 1
+                features.append(block(platform, input_channel, output_channel, stride, expand_ratio=t))
+                input_channel = output_channel
+        # building last several layers
+        features.append(ConvBNReLU(platform, input_channel, self.out_channels, kernel_size=1))
+        # wrap the layers in nn.SequentialCell
+        self.features = nn.SequentialCell(features)
+        # mobilenet head
+        head = ([GlobalAvgPooling(), nn.Dense(self.out_channels, num_classes, has_bias=True)] if not has_dropout else
+                [GlobalAvgPooling(), nn.Dropout(0.2), nn.Dense(self.out_channels, num_classes, has_bias=True)])
+        self.head = nn.SequentialCell(head)
+
+        self._initialize_weights()
+
+    def construct(self, x):
+        x = self.features(x)
+        x = self.head(x)
+        return x
+
+    def _initialize_weights(self):
+        """
+        Initialize weights.
+
+        Args:
+
+        Returns:
+            None.
+
+        Examples:
+            >>> _initialize_weights()
+        """
+        for _, m in self.cells_and_names():
+            if isinstance(m, (nn.Conv2d, DepthwiseConv)):
+                n = m.kernel_size[0] * m.kernel_size[1] * m.out_channels
+                m.weight.set_parameter_data(Tensor(np.random.normal(0, np.sqrt(2. / n),
+                                                                    m.weight.data.shape()).astype("float32")))
+                if m.bias is not None:
+                    m.bias.set_parameter_data(
+                        Tensor(np.zeros(m.bias.data.shape(), dtype="float32")))
+            elif isinstance(m, nn.BatchNorm2d):
+                m.gamma.set_parameter_data(
+                    Tensor(np.ones(m.gamma.data.shape(), dtype="float32")))
+                m.beta.set_parameter_data(
+                    Tensor(np.zeros(m.beta.data.shape(), dtype="float32")))
+            elif isinstance(m, nn.Dense):
+                m.weight.set_parameter_data(Tensor(np.random.normal(
+                    0, 0.01, m.weight.data.shape()).astype("float32")))
+                if m.bias is not None:
+                    m.bias.set_parameter_data(
+                        Tensor(np.zeros(m.bias.data.shape(), dtype="float32")))
+
+
+def mobilenet_v2_quant(**kwargs):
+    """
+    Constructs a MobileNet V2 model with fake-quantization layers.
+    """
+    return MobileNetV2Quant(**kwargs)
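
Review note: `mobilenetV2_quant.py` places fake-quantization layers after each block, configured with `_ema_decay = 0.999` and `_symmetric = False`. The NumPy sketch below shows the general idea behind such a layer: track the activation range with an exponential moving average, then round values onto an asymmetric 8-bit grid and map them back to floats. It is a generic illustration, not MindSpore's `nn.FakeQuantWithMinMax` implementation; the function names `ema_update` and `fake_quant` and the 8-bit default are assumptions made for this example.

```python
import numpy as np


def ema_update(running, observed, decay=0.999):
    """Exponential moving average used to track the activation min or max."""
    return decay * running + (1.0 - decay) * observed


def fake_quant(x, x_min, x_max, num_bits=8):
    """Quantize x to an asymmetric integer grid and immediately dequantize it."""
    qmin, qmax = 0, 2 ** num_bits - 1
    # The range is widened to include zero so that 0.0 is exactly representable.
    x_min = min(float(x_min), 0.0)
    x_max = max(float(x_max), 0.0)
    if x_max <= x_min:
        raise ValueError("quantization range must have positive width")
    scale = (x_max - x_min) / (qmax - qmin)
    zero_point = int(np.clip(round(qmin - x_min / scale), qmin, qmax))
    # Round to the integer grid, clamp to the representable range, then map back.
    q = np.clip(np.round(x / scale) + zero_point, qmin, qmax)
    return (q - zero_point) * scale


x = np.array([-1.0, -0.3, 0.0, 0.5, 1.0, 10.0])
y = fake_quant(x, x_min=-1.0, x_max=1.0)
print(y)  # in-range values move by at most one grid step; 10.0 is clamped near 1.0
```

During quantization-aware training the forward pass sees the rounded values, so the network learns weights that tolerate the precision loss of integer inference.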
--- a/example/mobilenetv2_quant/train.py
+++ b/example/mobilenetv2_quant/train.py
@ -0,0 +1,274 @@
+# Copyright 2020 Huawei Technologies Co., Ltd
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+# ============================================================================
+"""train_imagenet."""
+import os
+import time
+import argparse
+import random
+import numpy as np
+from mindspore import context
+from mindspore import Tensor
+from mindspore import nn
+from mindspore.parallel._auto_parallel_context import auto_parallel_context
+from mindspore.nn.optim.momentum import Momentum
+from mindspore.nn.loss import SoftmaxCrossEntropyWithLogits
+from mindspore.nn.loss.loss import _Loss
+from mindspore.ops import operations as P
+from mindspore.ops import functional as F
+from mindspore.common import dtype as mstype
+from mindspore.train.model import Model, ParallelMode
+from mindspore.train.callback import ModelCheckpoint, CheckpointConfig, Callback
+from mindspore.train.loss_scale_manager import FixedLossScaleManager
+from mindspore.train.serialization import load_checkpoint, load_param_into_net
+from mindspore.communication.management import init, get_group_size
+import mindspore.dataset.engine as de
+from src.dataset import create_dataset
+from src.lr_generator import get_lr
+from src.config import config_gpu, config_ascend
+from src.mobilenetV2 import mobilenet_v2
+from src.mobilenetV2_quant import mobilenet_v2_quant
+
+random.seed(1)
+np.random.seed(1)
+de.config.set_seed(1)
+
+parser = argparse.ArgumentParser(description='Image classification')
+parser.add_argument('--dataset_path', type=str, default=None, help='Dataset path')
+parser.add_argument('--pre_trained', type=str, default=None, help='Pretrained checkpoint path')
+parser.add_argument('--platform', type=str, default=None, help='run platform')
+args_opt = parser.parse_args()
+
+if args_opt.platform == "Ascend":
+    device_id = int(os.getenv('DEVICE_ID'))
+    rank_id = int(os.getenv('RANK_ID'))
+    rank_size = int(os.getenv('RANK_SIZE'))
+    run_distribute = rank_size > 1
+    device_id = int(os.getenv('DEVICE_ID'))
+    context.set_context(mode=context.GRAPH_MODE,
+                        device_target="Ascend",
+                        device_id=device_id, save_graphs=False)
+elif args_opt.platform == "GPU":
+    context.set_context(mode=context.GRAPH_MODE,
+                        device_target="GPU", save_graphs=False)
+else:
+    raise ValueError("Unsupported platform.")
+
+
+class CrossEntropyWithLabelSmooth(_Loss):
+    """
+    Cross entropy loss with label smoothing.
+
+    Args:
+        smooth_factor (float): smooth factor, default=0.
+        num_classes (int): num classes
+
+    Returns:
+        Tensor, the mean loss over the batch.
+
+    Examples:
+        >>> CrossEntropyWithLabelSmooth(smooth_factor=0., num_classes=1000)
+    """
+
+    def __init__(self, smooth_factor=0., num_classes=1000):
+        super(CrossEntropyWithLabelSmooth, self).__init__()
+        self.onehot = P.OneHot()
+        self.on_value = Tensor(1.0 - smooth_factor, mstype.float32)
+        self.off_value = Tensor(1.0 * smooth_factor /
+                                (num_classes - 1), mstype.float32)
+        self.ce = nn.SoftmaxCrossEntropyWithLogits()
+        self.mean = P.ReduceMean(False)
+        self.cast = P.Cast()
+
+    def construct(self, logit, label):
+        one_hot_label = self.onehot(self.cast(label, mstype.int32), F.shape(logit)[1],
+                                    self.on_value, self.off_value)
+        out_loss = self.ce(logit, one_hot_label)
+        out_loss = self.mean(out_loss, 0)
+        return out_loss
+
+
+class Monitor(Callback):
+    """
+    Monitor loss and time.
+
+    Args:
+        lr_init (numpy array): Learning rate for every training step.
+
+    Returns:
+        None
+
+    Examples:
+        >>> Monitor(lr_init=Tensor([0.05]*100).asnumpy())
+    """
+
+    def __init__(self, lr_init=None):
+        super(Monitor, self).__init__()
+        self.lr_init = lr_init
+        self.lr_init_len = len(lr_init)
+
+    def epoch_begin(self, run_context):
+        self.losses = []
+        self.epoch_time = time.time()
+
+    def epoch_end(self, run_context):
+        cb_params = run_context.original_args()
+
+        epoch_mseconds = (time.time() - self.epoch_time) * 1000
+        per_step_mseconds = epoch_mseconds / cb_params.batch_num
+        print("epoch time: {:5.3f}, per step time: {:5.3f}, avg loss: {:5.3f}".format(epoch_mseconds,
+                                                                                      per_step_mseconds,
+                                                                                      np.mean(self.losses)))
+
+    def step_begin(self, run_context):
+        self.step_time = time.time()
+
+    def step_end(self, run_context):
+        cb_params = run_context.original_args()
+        step_mseconds = (time.time() - self.step_time) * 1000
+        step_loss = cb_params.net_outputs
+
+        if isinstance(step_loss, (tuple, list)) and isinstance(step_loss[0], Tensor):
+            step_loss = step_loss[0]
+        if isinstance(step_loss, Tensor):
+            step_loss = np.mean(step_loss.asnumpy())
+
+        self.losses.append(step_loss)
+        cur_step_in_epoch = (cb_params.cur_step_num - 1) % cb_params.batch_num
+
+        print("epoch: [{:3d}/{:3d}], step:[{:5d}/{:5d}], loss:[{:5.3f}/{:5.3f}], time:[{:5.3f}], lr:[{:5.5f}]".format(
+            cb_params.cur_epoch_num -
+            1, cb_params.epoch_num, cur_step_in_epoch, cb_params.batch_num, step_loss,
+            np.mean(self.losses), step_mseconds, self.lr_init[cb_params.cur_step_num - 1]))
+
+
+if __name__ == '__main__':
+    if args_opt.platform == "GPU":
+        # train on gpu
+        print("train args: ", args_opt, "\ncfg: ", config_gpu)
+
+        init('nccl')
+        context.set_auto_parallel_context(parallel_mode="data_parallel",
+                                          mirror_mean=True,
+                                          device_num=get_group_size())
+
+        # define net
+        net = mobilenet_v2(num_classes=config_gpu.num_classes, platform="GPU")
+        # define loss
+        if config_gpu.label_smooth > 0:
+            loss = CrossEntropyWithLabelSmooth(
+                smooth_factor=config_gpu.label_smooth, num_classes=config_gpu.num_classes)
+        else:
+            loss = SoftmaxCrossEntropyWithLogits(
+                is_grad=False, sparse=True, reduction='mean')
+        # define dataset
+        epoch_size = config_gpu.epoch_size
+        dataset = create_dataset(dataset_path=args_opt.dataset_path,
+                                 do_train=True,
+                                 config=config_gpu,
+                                 platform=args_opt.platform,
+                                 repeat_num=epoch_size,
+                                 batch_size=config_gpu.batch_size)
+        step_size = dataset.get_dataset_size()
+        # resume
+        if args_opt.pre_trained:
+            param_dict = load_checkpoint(args_opt.pre_trained)
+            load_param_into_net(net, param_dict)
+        # define optimizer
+        loss_scale = FixedLossScaleManager(
+            config_gpu.loss_scale, drop_overflow_update=False)
+        lr = Tensor(get_lr(global_step=0,
+                           lr_init=0,
+                           lr_end=0,
+                           lr_max=config_gpu.lr,
+                           warmup_epochs=config_gpu.warmup_epochs,
+                           total_epochs=epoch_size,
+                           steps_per_epoch=step_size))
+        opt = Momentum(filter(lambda x: x.requires_grad, net.get_parameters()), lr, config_gpu.momentum,
+                       config_gpu.weight_decay, config_gpu.loss_scale)
+        # define model
+        model = Model(net, loss_fn=loss, optimizer=opt,
+                      loss_scale_manager=loss_scale)
+
+        cb = [Monitor(lr_init=lr.asnumpy())]
+        if config_gpu.save_checkpoint:
+            config_ck = CheckpointConfig(save_checkpoint_steps=config_gpu.save_checkpoint_epochs * step_size,
+                                         keep_checkpoint_max=config_gpu.keep_checkpoint_max)
+            ckpt_cb = ModelCheckpoint(
+                prefix="mobilenet", directory=config_gpu.save_checkpoint_path, config=config_ck)
+            cb += [ckpt_cb]
+        # begin training
+        model.train(epoch_size, dataset, callbacks=cb)
+    elif args_opt.platform == "Ascend":
+        # train on ascend
+        print("train args: ", args_opt, "\ncfg: ", config_ascend,
+              "\nparallel args: rank_id {}, device_id {}, rank_size {}".format(rank_id, device_id, rank_size))
+
+        if run_distribute:
+            context.set_auto_parallel_context(device_num=rank_size, parallel_mode=ParallelMode.DATA_PARALLEL,
+                                              parameter_broadcast=True, mirror_mean=True)
+            auto_parallel_context().set_all_reduce_fusion_split_indices([140])
+            init()
+
+        epoch_size = config_ascend.epoch_size
+        net = mobilenet_v2(num_classes=config_ascend.num_classes, platform="Ascend")
+        net = mobilenet_v2_quant(num_classes=config_ascend.num_classes, platform="Ascend")
+        net.to_float(mstype.float16)
+        for _, cell in net.cells_and_names():
+            if isinstance(cell, nn.Dense):
+                cell.to_float(mstype.float32)
+        if config_ascend.label_smooth > 0:
+            loss = CrossEntropyWithLabelSmooth(
+                smooth_factor=config_ascend.label_smooth, num_classes=config_ascend.num_classes)
+        else:
+            loss = SoftmaxCrossEntropyWithLogits(
+                is_grad=False, sparse=True, reduction='mean')
+        dataset = create_dataset(dataset_path=args_opt.dataset_path,
+                                 do_train=True,
+                                 config=config_ascend,
+                                 platform=args_opt.platform,
+                                 repeat_num=epoch_size,
+                                 batch_size=config_ascend.batch_size)
+        step_size = dataset.get_dataset_size()
+        if args_opt.pre_trained:
+            param_dict = load_checkpoint(args_opt.pre_trained)
+            load_param_into_net(net, param_dict)
+
+        loss_scale = FixedLossScaleManager(
+            config_ascend.loss_scale, drop_overflow_update=False)
+        lr = Tensor(get_lr(global_step=0,
+                           lr_init=0,
+                           lr_end=0,
+                           lr_max=config_ascend.lr,
+                           warmup_epochs=config_ascend.warmup_epochs,
+                           total_epochs=epoch_size,
+                           steps_per_epoch=step_size))
+        opt = Momentum(filter(lambda x: x.requires_grad, net.get_parameters()), lr, config_ascend.momentum,
+                       config_ascend.weight_decay, config_ascend.loss_scale)
+
+        model = Model(net, loss_fn=loss, optimizer=opt,
+                      loss_scale_manager=loss_scale)
+
+        cb = None
+        if rank_id == 0:
+            cb = [Monitor(lr_init=lr.asnumpy())]
+            if config_ascend.save_checkpoint:
+                config_ck = CheckpointConfig(save_checkpoint_steps=config_ascend.save_checkpoint_epochs * step_size,
+                                             keep_checkpoint_max=config_ascend.keep_checkpoint_max)
+                ckpt_cb = ModelCheckpoint(
+                    prefix="mobilenet", directory=config_ascend.save_checkpoint_path, config=config_ck)
+                cb += [ckpt_cb]
+        model.train(epoch_size, dataset, callbacks=cb)
+    else:
+        raise ValueError("Unsupported platform.")
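
Review note: `CrossEntropyWithLabelSmooth` in `train.py` builds its targets with `on_value = 1 - smooth_factor` and `off_value = smooth_factor / (num_classes - 1)`. The NumPy sketch below reproduces that arithmetic outside MindSpore so the numbers can be checked by hand. The helper names `smooth_one_hot` and `softmax_cross_entropy`, and the choice of 4 classes with a smoothing factor of 0.1, are assumptions for illustration.

```python
import numpy as np


def smooth_one_hot(labels, num_classes, smooth_factor):
    """Build smoothed targets: 1 - s on the true class, s / (K - 1) elsewhere."""
    on_value = 1.0 - smooth_factor
    off_value = smooth_factor / (num_classes - 1)
    targets = np.full((len(labels), num_classes), off_value, dtype=np.float32)
    targets[np.arange(len(labels)), labels] = on_value
    return targets


def softmax_cross_entropy(logits, targets):
    """Mean cross entropy between softmax(logits) and soft targets."""
    shifted = logits - logits.max(axis=1, keepdims=True)  # for numerical stability
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return float(-(targets * log_probs).sum(axis=1).mean())


labels = np.array([0, 2])
targets = smooth_one_hot(labels, num_classes=4, smooth_factor=0.1)
print(targets.sum(axis=1))  # each row still sums to 1 (up to float32 rounding)

uniform_logits = np.zeros((2, 4))
loss = softmax_cross_entropy(uniform_logits, targets)
print(loss)  # equals log(4) for uniform predictions, whatever the smoothing
```

Because each row of smoothed targets sums to 1, the loss stays a proper cross entropy; smoothing only shifts a small share of the target mass from the true class to the others.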