add deeplabv3plus gpu

modified: README_CN.md
2021-08-23 15:26:47 +08:00 · 2021-08-23 15:26:47 +08:00 · aba56674c6
parent f960f0671f
commit aba56674c6
22 changed files with 711 additions and 49 deletions
--- a/model_zoo/research/cv/deeplabv3plus/README_CN.md
+++ b/model_zoo/research/cv/deeplabv3plus/README_CN.md
@@ -17,10 +17,12 @@
    - [Training process](#训练过程)
        - [Usage](#用法)
            - [Running on Ascend processors](#ascend处理器环境运行)
+            - [Running on GPU processors](#gpu处理器环境运行)
        - [Results](#结果)
    - [Evaluation process](#评估过程)
        - [Usage](#用法-1)
            - [Running on Ascend processors](#ascend处理器环境运行-1)
+            - [Running on GPU processors](#gpu处理器环境运行-1)
        - [Results](#结果-1)
            - [Training accuracy](#训练准确率)
 - [Model description](#模型描述)
@@ -173,6 +175,66 @@ run_eval_s8_multiscale.sh
 run_eval_s8_multiscale_flip.sh
 ```

+- Running on GPU processors
+
+Follow the training steps below to train on 8 GPUs:
+
+1. Train s16 on the VOCaug dataset, fine-tuning the pretrained ResNet-101 model. The script is as follows:
+
+```bash
+bash run_distribute_train_s16_r1_gpu.sh /PATH/TO/MINDRECORD_NAME /PATH/TO/PRETRAIN_MODEL
+```
+
+2. Train s8 on the VOCaug dataset, fine-tuning the model from the previous step. The script is as follows:
+
+```bash
+bash run_distribute_train_s8_r1_gpu.sh /PATH/TO/MINDRECORD_NAME /PATH/TO/PRETRAIN_MODEL
+```
+
+3. Train s8 on the VOCtrain dataset, fine-tuning the model from the previous step. The script is as follows:
+
+```bash
+bash run_distribute_train_s8_r2_gpu.sh /PATH/TO/MINDRECORD_NAME /PATH/TO/PRETRAIN_MODEL
+```
+
+The evaluation steps are as follows:
+
+1. Evaluate s16 on the VOC val dataset. The evaluation script is as follows:
+
+```bash
+bash run_eval_s16_gpu.sh /PATH/TO/DATA /PATH/TO/DATA_lst.txt /PATH/TO/PRETRAIN_MODEL DEVICE_ID
+```
+
+2. Evaluate multi-scale s16 on the VOC val dataset. The evaluation script is as follows:
+
+```bash
+bash run_eval_s16_multiscale_gpu.sh /PATH/TO/DATA /PATH/TO/DATA_lst.txt /PATH/TO/PRETRAIN_MODEL DEVICE_ID
+```
+
+3. Evaluate multi-scale and flipped s16 on the VOC val dataset. The evaluation script is as follows:
+
+```bash
+bash run_eval_s16_multiscale_flip_gpu.sh /PATH/TO/DATA /PATH/TO/DATA_lst.txt /PATH/TO/PRETRAIN_MODEL DEVICE_ID
+```
+
+4. Evaluate s8 on the VOC val dataset. The evaluation script is as follows:
+
+```bash
+bash run_eval_s8_gpu.sh /PATH/TO/DATA /PATH/TO/DATA_lst.txt /PATH/TO/PRETRAIN_MODEL DEVICE_ID
+```
+
+5. Evaluate multi-scale s8 on the VOC val dataset. The evaluation script is as follows:
+
+```bash
+bash run_eval_s8_multiscale_gpu.sh /PATH/TO/DATA /PATH/TO/DATA_lst.txt /PATH/TO/PRETRAIN_MODEL DEVICE_ID
+```
+
+6. Evaluate multi-scale and flipped s8 on the VOC val dataset. The evaluation script is as follows:
+
+```bash
+bash run_eval_s8_multiscale_flip_gpu.sh /PATH/TO/DATA /PATH/TO/DATA_lst.txt /PATH/TO/PRETRAIN_MODEL DEVICE_ID
+```
+
 # Script description

 ## Scripts and sample code
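@@ -192,6 +254,15 @@ run_eval_s8_multiscale_flip.sh
    ├── run_eval_s8.sh                            # Launch Ascend evaluation with the s8 structure
    ├── run_eval_s8_multiscale.sh                 # Launch Ascend evaluation with the multi-scale s8 structure
    ├── run_eval_s8_multiscale_flip.sh            # Launch Ascend evaluation with the multi-scale and flipped s8 structure
+    ├── run_distribute_train_s16_r1_gpu.sh            # Launch GPU distributed training (8 GPUs) on the VOCaug dataset with the s16 structure
+    ├── run_distribute_train_s8_r1_gpu.sh             # Launch GPU distributed training (8 GPUs) on the VOCaug dataset with the s8 structure
+    ├── run_distribute_train_s8_r2_gpu.sh             # Launch GPU distributed training (8 GPUs) on the VOCtrain dataset with the s8 structure
+    ├── run_eval_s16_gpu.sh                           # Launch GPU evaluation with the s16 structure
+    ├── run_eval_s16_multiscale_gpu.sh                # Launch GPU evaluation with the multi-scale s16 structure
+    ├── run_eval_s16_multiscale_flip_gpu.sh           # Launch GPU evaluation with the multi-scale and flipped s16 structure
+    ├── run_eval_s8_gpu.sh                            # Launch GPU evaluation with the s8 structure
+    ├── run_eval_s8_multiscale_gpu.sh                 # Launch GPU evaluation with the multi-scale s8 structure
+    ├── run_eval_s8_multiscale_flip_gpu.sh            # Launch GPU evaluation with the multi-scale and flipped s8 structure
  ├── src
    ├── tools
        ├── get_dataset_list.py               # Generate the dataset list file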
@@ -274,7 +345,7 @@ do
    echo 'start rank='$i', device id='$DEVICE_ID'...'
    mkdir ${train_path}/device$DEVICE_ID
    cd ${train_path}/device$DEVICE_ID
-    ython ${train_code_path}/train.py --train_dir=${train_path}/ckpt  \
+    python ${train_code_path}/train.py --train_dir=${train_path}/ckpt  \
                                               --data_file=/PATH/TO/MINDRECORD_NAME  \
                                               --train_epochs=300  \
                                               --batch_size=32  \
@@ -374,6 +445,10 @@ python  train.py    --train_url=/PATH/TO/OUTPUT_DIR \
                    --save_steps=410  \
 ```

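+#### Running on GPU processors
+
+For the detailed parameter configuration, refer to the 8-GPU training scripts in [Quick start](#快速入门).
+
 ### Results

 #### Running on Ascend processors
@@ -483,6 +558,10 @@ python ${train_code_path}/eval.py --data_root=/PATH/TO/DATA  \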
                    --ckpt_path=/PATH/TO/PRETRAIN_MODEL >${eval_path}/eval_log 2>&1 &
 ```

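+#### Running on GPU processors
+
+For the detailed parameter configuration, refer to the evaluation scripts in [Quick start](#快速入门).
+
 ### Results

 Run the applicable training script to obtain results. To reproduce the same results, follow the steps in Quick start.
@@ -506,21 +585,21 @@ python ${train_code_path}/eval.py --data_root=/PATH/TO/DATA  \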

 ### Evaluation performance

-| Parameter | Ascend 910|
-| -------------------------- | -------------------------------------- |
-| Model version | DeepLabV3+ |
-| Resource | Ascend 910 |
-| Upload date | 2021-03-16 |
-| MindSpore version | 1.1.1 |
-| Dataset | PASCAL VOC2012 + SBD |
-| Training parameters | epoch = 300, batch_size = 32 (s16_r1)  epoch = 800, batch_size = 16 (s8_r1)  epoch = 300, batch_size = 16 (s8_r2) |
-| Optimizer | Momentum |
-| Loss function | Softmax cross-entropy |
-| Output | Probability |
-| Loss | 0.0041095633 |
-| Performance | 187736.386 ms (single device, s16)<br>  44474.187 ms (eight devices, s16) |
-| Fine-tuned checkpoint | 453M (.ckpt file) |
-| Script | [Link](https://gitee.com/mindspore/mindspore/tree/master/model_zoo/research/cv/deeplabv3plus) |
+| Parameter | Ascend 910| GPU |
+| -------------------------- | -------------------------------------- | -------------------------------------- |
+| Model version | DeepLabV3+ | DeepLabV3+ |
+| Resource | Ascend 910 |NV SXM2 V100-32G|
+| Upload date | 2021-03-16 |2021-08-23|
+| MindSpore version | 1.1.1 |1.4.0|
+| Dataset | PASCAL VOC2012 + SBD | PASCAL VOC2012 + SBD |
+| Training parameters | epoch = 300, batch_size = 32 (s16_r1)  epoch = 800, batch_size = 16 (s8_r1)  epoch = 300, batch_size = 16 (s8_r2) |epoch = 300, batch_size = 16 (s16_r1)  epoch = 800, batch_size = 8 (s8_r1)  epoch = 300, batch_size = 8 (s8_r2) |
+| Optimizer | Momentum | Momentum |
+| Loss function | Softmax cross-entropy |Softmax cross-entropy |
+| Output | Probability |Probability |
+| Loss | 0.0041095633 |0.003395824|
+| Performance | 187736.386 ms (single device, s16)<br>  44474.187 ms (eight devices, s16) |  1080 ms/step (single device, s16)|
+| Fine-tuned checkpoint | 453M (.ckpt file) | 454M (.ckpt file)|
+| Script | [Link](https://gitee.com/mindspore/mindspore/tree/master/model_zoo/research/cv/deeplabv3plus) |[Link](https://gitee.com/mindspore/mindspore/tree/master/model_zoo/research/cv/deeplabv3plus) |

 # Description of random situation

--- a/model_zoo/research/cv/deeplabv3plus/eval.py
+++ b/model_zoo/research/cv/deeplabv3plus/eval.py
@@ -24,9 +24,6 @@ from mindspore import context
 from mindspore.train.serialization import load_checkpoint, load_param_into_net
 from src.deeplab_v3plus import DeepLabV3Plus

-context.set_context(mode=context.GRAPH_MODE, device_target="Ascend", save_graphs=False,
-                    device_id=int(os.getenv('DEVICE_ID')))
-

 def parse_args():
    """parse_args"""
@@ -44,6 +41,11 @@ def parse_args():
    parser.add_argument('--ignore_label', type=int, default=255, help='ignore label')
    parser.add_argument('--num_classes', type=int, default=21, help='number of classes')

+    # device info
+    parser.add_argument('--device_target', type=str, default='Ascend', choices=['Ascend', 'GPU', 'CPU'],
+                        help='device where the code will be implemented. (Default: Ascend)')
+    parser.add_argument('--device_id', type=int, default=0, help='device id')
+
    # model
    parser.add_argument('--model', type=str, default='', help='select model')
    parser.add_argument('--freeze_bn', action='store_true', default=False, help='freeze bn')
@@ -154,7 +156,8 @@ def eval_batch_scales(args, eval_net, img_lst, scales,
 def net_eval():
    """net_eval"""
    args = parse_args()
-
+    context.set_context(mode=context.GRAPH_MODE, device_target=args.device_target, save_graphs=False,
+                        device_id=args.device_id)
    # data list
    with open(args.data_lst) as f:
        img_lst = f.readlines()
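Note on the eval.py change above: the device is no longer configured at import time from the `DEVICE_ID` environment variable, but inside `net_eval()` from two new command-line flags. A minimal sketch of that pattern follows. It models only the two flags added in this diff; `build_context_kwargs` is a hypothetical helper that does not exist in the repository, and the `mode=context.GRAPH_MODE` argument is left out so the sketch runs without MindSpore installed.

```python
import argparse


def parse_device_args(argv=None):
    """Parse only the device-related flags that this diff adds to eval.py."""
    parser = argparse.ArgumentParser(description="device flag sketch")
    parser.add_argument("--device_target", type=str, default="Ascend",
                        choices=["Ascend", "GPU", "CPU"],
                        help="backend to run on (default: Ascend)")
    parser.add_argument("--device_id", type=int, default=0, help="device id")
    # parse_known_args skips the other eval.py flags, which this sketch does not model.
    args, _unknown = parser.parse_known_args(argv)
    return args


def build_context_kwargs(args):
    """Hypothetical helper: collect the keyword arguments that net_eval()
    forwards to context.set_context(); 'mode' is omitted (needs mindspore)."""
    return {
        "device_target": args.device_target,
        "save_graphs": False,
        "device_id": args.device_id,
    }


if __name__ == "__main__":
    gpu_args = parse_device_args(["--device_target", "GPU", "--device_id", "0"])
    print(build_context_kwargs(gpu_args))
```

Because both flags have defaults, running without `--device_target` keeps the previous Ascend behavior, except that the device id now defaults to 0 rather than being read from `DEVICE_ID`.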
--- a/model_zoo/research/cv/deeplabv3plus/scripts/run_alone_train.sh
+++ b/model_zoo/research/cv/deeplabv3plus/scripts/run_alone_train.sh
@@ -14,7 +14,7 @@
 # limitations under the License.
 # ============================================================================

-export DEVICE_ID=5
+DEVICE_ID=5
 export SLOG_PRINT_TO_STDOUT=0
 train_path=/PATH/TO/EXPERIMENTS_DIR
 train_code_path=/PATH/TO/MODEL_ZOO_CODE
@@ -41,4 +41,5 @@ python ${train_code_path}/train.py --data_file=/PATH/TO/MINDRECORD_NAME  \
                    --model=DeepLabV3plus_s16  \
                    --ckpt_pre_trained=/PATH/TO/PRETRAIN_MODEL  \
                    --save_steps=1500  \
-                    --keep_checkpoint_max=200 >log 2>&1 &
+                    --keep_checkpoint_max=200 \
+                    --device_id=$DEVICE_ID >log 2>&1 &
--- a/model_zoo/research/cv/deeplabv3plus/scripts/run_distribute_train_s16_r1.sh
+++ b/model_zoo/research/cv/deeplabv3plus/scripts/run_distribute_train_s16_r1.sh
@@ -31,7 +31,7 @@ mkdir ${train_path}/ckpt
 for((i=0;i<=$RANK_SIZE-1;i++));
 do
    export RANK_ID=${i}
-    export DEVICE_ID=$((i + RANK_START_ID))
+    DEVICE_ID=$((i + RANK_START_ID))
    echo 'start rank='${i}', device id='${DEVICE_ID}'...'
    mkdir ${train_path}/device${DEVICE_ID}
    cd ${train_path}/device${DEVICE_ID} || exit
@@ -50,5 +50,6 @@ do
                                               --ckpt_pre_trained=/PATH/TO/PRETRAIN_MODEL  \
                                               --is_distributed  \
                                               --save_steps=410  \
-                                               --keep_checkpoint_max=200 >log 2>&1 &
+                                               --keep_checkpoint_max=200 \
+                                               --device_id=$DEVICE_ID >log 2>&1 &
 done
--- a/model_zoo/research/cv/deeplabv3plus/scripts/run_distribute_train_s16_r1_gpu.sh
+++ b/model_zoo/research/cv/deeplabv3plus/scripts/run_distribute_train_s16_r1_gpu.sh
@@ -0,0 +1,66 @@
+#!/bin/bash
+# Copyright 2021 Huawei Technologies Co., Ltd
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+# ============================================================================
+
+if [ $# != 2 ]
+then
+    echo "=============================================================================================================="
+    echo "Please run the script as: "
+    echo "bash run_distribute_train_s16_r1_gpu.sh /PATH/TO/MINDRECORD_NAME /PATH/TO/PRETRAIN_MODEL"
+    echo "for example:"
+    echo "bash run_distribute_train_s16_r1_gpu.sh \
+      voc2012/mindrecord_train/vocaug_mindrecord0 resnet101_ascend_v120_imagenet2012_official_cv_bs32_acc78.ckpt"
+    echo "It is better to use absolute path."
+    echo "=============================================================================================================="
+exit 1
+fi
+
+DATA_FILE=$1
+CKPT_PRE_TRAINED=$2
+
+ulimit -c unlimited
+export SLOG_PRINT_TO_STDOUT=0
+
+export RANK_SIZE=8
+export GLOG_v=2
+
+train_path=s16_train
+if [ -d ${train_path} ]; then
+  rm -rf ${train_path}
+fi
+mkdir -p ${train_path}
+mkdir ${train_path}/ckpt
+cp ../*.py  ${train_path}
+cp -r ../src  ${train_path}
+cd ${train_path} || exit
+
+mpirun -allow-run-as-root -n $RANK_SIZE --output-filename log_output --merge-stderr-to-stdout \
+python ./train.py --train_dir=${train_path}/ckpt  \
+                  --data_file=$DATA_FILE  \
+                  --train_epochs=300  \
+                  --batch_size=16  \
+                  --crop_size=513  \
+                  --base_lr=0.04  \
+                  --lr_type=cos  \
+                  --min_scale=0.5  \
+                  --max_scale=2.0  \
+                  --ignore_label=255  \
+                  --num_classes=21  \
+                  --model=DeepLabV3plus_s16  \
+                  --ckpt_pre_trained=$CKPT_PRE_TRAINED  \
+                  --is_distributed  \
+                  --save_steps=410  \
+                  --keep_checkpoint_max=200 \
+                  --device_target="GPU"  >log 2>&1 &
--- a/model_zoo/research/cv/deeplabv3plus/scripts/run_distribute_train_s8_r1.sh
+++ b/model_zoo/research/cv/deeplabv3plus/scripts/run_distribute_train_s8_r1.sh
@@ -31,7 +31,7 @@ mkdir ${train_path}/ckpt
 for((i=0;i<=$RANK_SIZE-1;i++));
 do
    export RANK_ID=${i}
-    export DEVICE_ID=$((i + RANK_START_ID))
+    DEVICE_ID=$((i + RANK_START_ID))
    echo 'start rank='${i}', device id='${DEVICE_ID}'...'
    mkdir ${train_path}/device${DEVICE_ID}
    cd ${train_path}/device${DEVICE_ID} || exit
@@ -51,5 +51,6 @@ do
                                               --ckpt_pre_trained=/PATH/TO/PRETRAIN_MODEL  \
                                               --is_distributed  \
                                               --save_steps=820  \
-                                               --keep_checkpoint_max=200 >log 2>&1 &
+                                               --keep_checkpoint_max=200 \
+                                               --device_id=$DEVICE_ID >log 2>&1 &
 done
--- a/model_zoo/research/cv/deeplabv3plus/scripts/run_distribute_train_s8_r1_gpu.sh
+++ b/model_zoo/research/cv/deeplabv3plus/scripts/run_distribute_train_s8_r1_gpu.sh
@@ -0,0 +1,67 @@
+#!/bin/bash
+# Copyright 2021 Huawei Technologies Co., Ltd
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+# ============================================================================
+if [ $# != 2 ]
+then
+    echo "=============================================================================================================="
+    echo "Please run the script as: "
+    echo "bash run_distribute_train_s8_r1_gpu.sh /PATH/TO/MINDRECORD_NAME /PATH/TO/PRETRAIN_MODEL"
+    echo "for example:"
+    echo "bash run_distribute_train_s8_r1_gpu.sh \
+      voc2012/mindrecord_train/vocaug_mindrecord0  ckpt/DeepLabV3plus_s16-300_82.ckpt"
+    echo "It is better to use absolute path."
+    echo "=============================================================================================================="
+    exit 1
+fi
+
+DATA_FILE=$1
+CKPT_PRE_TRAINED=$2
+
+ulimit -c unlimited
+export SLOG_PRINT_TO_STDOUT=0
+
+export RANK_SIZE=8
+export GLOG_v=2
+
+train_path=s8_r1_train
+if [ -d ${train_path} ]; then
+  rm -rf ${train_path}
+fi
+mkdir -p ${train_path}
+mkdir ${train_path}/ckpt
+cp ../*.py  ${train_path}
+cp -r ../src  ${train_path}
+cd ${train_path} || exit
+
+
+mpirun -allow-run-as-root -n $RANK_SIZE --output-filename log_output --merge-stderr-to-stdout \
+python ./train.py --train_dir=${train_path}/ckpt  \
+                   --data_file=$DATA_FILE  \
+                   --train_epochs=800  \
+                   --batch_size=8  \
+                   --crop_size=513  \
+                   --base_lr=0.01  \
+                   --lr_type=cos  \
+                   --min_scale=0.5  \
+                   --max_scale=2.0  \
+                   --ignore_label=255  \
+                   --num_classes=21  \
+                   --model=DeepLabV3plus_s8  \
+                   --loss_scale=2048  \
+                   --ckpt_pre_trained=$CKPT_PRE_TRAINED  \
+                   --is_distributed  \
+                   --save_steps=820  \
+                   --keep_checkpoint_max=200 \
+                   --device_target="GPU" >log 2>&1 &
--- a/model_zoo/research/cv/deeplabv3plus/scripts/run_distribute_train_s8_r2.sh
+++ b/model_zoo/research/cv/deeplabv3plus/scripts/run_distribute_train_s8_r2.sh
@@ -31,7 +31,7 @@ mkdir ${train_path}/ckpt
 for((i=0;i<=$RANK_SIZE-1;i++));
 do
    export RANK_ID=${i}
-    export DEVICE_ID=$((i + RANK_START_ID))
+    DEVICE_ID=$((i + RANK_START_ID))
    echo 'start rank='${i}', device id='${DEVICE_ID}'...'
    mkdir ${train_path}/device${DEVICE_ID}
    cd ${train_path}/device${DEVICE_ID} || exit
@@ -51,5 +51,6 @@ do
                                               --ckpt_pre_trained=/PATH/TO/PRETRAIN_MODEL  \
                                               --is_distributed  \
                                               --save_steps=110  \
-                                               --keep_checkpoint_max=200 >log 2>&1 &
+                                               --keep_checkpoint_max=200 \
+                                               --device_id=$DEVICE_ID >log 2>&1 &
 done
--- a/model_zoo/research/cv/deeplabv3plus/scripts/run_distribute_train_s8_r2_gpu.sh
+++ b/model_zoo/research/cv/deeplabv3plus/scripts/run_distribute_train_s8_r2_gpu.sh
@@ -0,0 +1,70 @@
+#!/bin/bash
+# Copyright 2021 Huawei Technologies Co., Ltd
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+# ============================================================================
+
+if [ $# != 2 ]
+then
+    echo "=============================================================================================================="
+    echo "Please run the script as: "
+    echo "bash run_distribute_train_s8_r2_gpu.sh /PATH/TO/MINDRECORD_NAME /PATH/TO/PRETRAIN_MODEL"
+    echo "for example:"
+    echo "bash run_distribute_train_s8_r2_gpu.sh \
+      voc2012/mindrecord_voctrain/mindrecord_voctrain0  ckpt/DeepLabV3plus_s8-800_165.ckpt"
+    echo "It is better to use absolute path."
+    echo "=============================================================================================================="
+    exit 1
+fi
+
+
+DATA_FILE=$1
+CKPT_PRE_TRAINED=$2
+
+ulimit -c unlimited
+export SLOG_PRINT_TO_STDOUT=0
+
+export RANK_SIZE=8
+export GLOG_v=2
+
+train_path=s8_r2_train
+if [ -d ${train_path} ]; then
+  rm -rf ${train_path}
+fi
+mkdir -p ${train_path}
+mkdir ${train_path}/ckpt
+cp ../*.py  ${train_path}
+cp -r ../src  ${train_path}
+cd ${train_path} || exit
+
+
+mpirun -allow-run-as-root -n $RANK_SIZE --output-filename log_output --merge-stderr-to-stdout \
+python ./train.py --train_dir=${train_path}/ckpt  \
+                  --data_file=$DATA_FILE  \
+                  --train_epochs=300  \
+                  --batch_size=8  \
+                  --crop_size=513  \
+                  --base_lr=0.004  \
+                  --lr_type=cos  \
+                  --min_scale=0.5  \
+                  --max_scale=2.0  \
+                  --ignore_label=255  \
+                  --num_classes=21  \
+                  --model=DeepLabV3plus_s8  \
+                  --loss_scale=2048  \
+                  --ckpt_pre_trained=$CKPT_PRE_TRAINED  \
+                  --is_distributed  \
+                  --save_steps=110  \
+                  --keep_checkpoint_max=200 \
+                  --device_target="GPU" >log 2>&1 &
+
--- a/model_zoo/research/cv/deeplabv3plus/scripts/run_eval_s16.sh
+++ b/model_zoo/research/cv/deeplabv3plus/scripts/run_eval_s16.sh
@@ -14,7 +14,7 @@
 # limitations under the License.
 # ============================================================================

-export DEVICE_ID=3
+DEVICE_ID=3
 export SLOG_PRINT_TO_STDOUT=0
 train_code_path=/PATH/TO/MODEL_ZOO_CODE
 eval_path=/PATH/TO/EVAL
@@ -33,5 +33,6 @@ python ${train_code_path}/eval.py --data_root=/PATH/TO/DATA  \
                    --model=DeepLabV3plus_s16  \
                    --scales=1.0  \
                    --freeze_bn  \
-                    --ckpt_path=/PATH/TO/PRETRAIN_MODEL >${eval_path}/eval_log 2>&1 &
+                    --ckpt_path=/PATH/TO/PRETRAIN_MODEL \
+                    --device_id=$DEVICE_ID >${eval_path}/eval_log 2>&1 &

--- a/model_zoo/research/cv/deeplabv3plus/scripts/run_eval_s16_gpu.sh
+++ b/model_zoo/research/cv/deeplabv3plus/scripts/run_eval_s16_gpu.sh
@@ -0,0 +1,56 @@
+#!/bin/bash
+# Copyright 2021 Huawei Technologies Co., Ltd
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+# ============================================================================
+if [ $# != 4 ]
+then
+    echo "=============================================================================================================="
+    echo "Please run the script as: "
+    echo "bash run_eval_s16_gpu.sh /PATH/TO/DATA /PATH/TO/DATA_lst.txt /PATH/TO/PRETRAIN_MODEL DEVICE_ID"
+    echo "for example:"
+    echo "bash run_eval_s16_gpu.sh \
+      voc2012/VOCdevkit/VOC2012 voc2012/voc_val_lst.txt ckpt/DeepLabV3plus_s16-300_82.ckpt 0"
+    echo "It is better to use absolute path."
+    echo "=============================================================================================================="
+    exit 1
+fi
+
+DATA_ROOT=$1
+DATA_LST=$2
+CKPT_PATH=$3
+export CUDA_VISIBLE_DEVICES=$4
+
+export SLOG_PRINT_TO_STDOUT=0
+eval_path=s16_eval
+if [ -d ${eval_path} ]; then
+  rm -rf ${eval_path}
+fi
+mkdir -p ${eval_path}
+cp ../*.py  ${eval_path}
+cp -r ../src  ${eval_path}
+cd ${eval_path} || exit
+
+python ./eval.py  --data_root=$DATA_ROOT  \
+                  --data_lst=$DATA_LST  \
+                  --batch_size=32  \
+                  --crop_size=513  \
+                  --ignore_label=255  \
+                  --num_classes=21  \
+                  --model=DeepLabV3plus_s16  \
+                  --scales=1.0  \
+                  --freeze_bn  \
+                  --ckpt_path=$CKPT_PATH \
+                  --device_target="GPU" \
+                  --device_id=0 > eval_log 2>&1 &
+
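Note on the GPU evaluation scripts: each one exports `CUDA_VISIBLE_DEVICES=$4` and then always passes `--device_id=0`. This works because CUDA renumbers the devices listed in `CUDA_VISIBLE_DEVICES` from 0 inside the process, so the single visible GPU is always logical device 0. The sketch below illustrates that mapping; `logical_to_physical` is a hypothetical helper written for this note, and it assumes the variable holds a comma-separated list of integer indices (GPU UUIDs are not handled).

```python
def logical_to_physical(cuda_visible_devices):
    """Map in-process (logical) device ids to the physical ids listed in
    CUDA_VISIBLE_DEVICES. Assumption: a comma-separated list of integers."""
    physical_ids = [int(part) for part in cuda_visible_devices.split(",") if part.strip()]
    # Logical ids are simply the positions in the list, starting at 0.
    return dict(enumerate(physical_ids))


if __name__ == "__main__":
    # e.g. `bash run_eval_s16_gpu.sh ... 3` sets CUDA_VISIBLE_DEVICES=3,
    # so eval.py's --device_id=0 refers to physical GPU 3.
    print(logical_to_physical("3"))
```

Passing the physical index a second time through `--device_id` would point at a logical device that is not visible whenever the index is greater than 0.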
--- a/model_zoo/research/cv/deeplabv3plus/scripts/run_eval_s16_multiscale.sh
+++ b/model_zoo/research/cv/deeplabv3plus/scripts/run_eval_s16_multiscale.sh
@@ -14,7 +14,7 @@
 # limitations under the License.
 # ============================================================================

-export DEVICE_ID=3
+DEVICE_ID=3
 export SLOG_PRINT_TO_STDOUT=0
 train_code_path=/PATH/TO/MODEL_ZOO_CODE
 eval_path=/PATH/TO/EVAL
@@ -37,4 +37,5 @@ python ${train_code_path}/eval.py --data_root=/PATH/TO/DATA  \
                    --scales=1.25  \
                    --scales=1.75  \
                    --freeze_bn  \
-                    --ckpt_path=/PATH/TO/PRETRAIN_MODEL >${eval_path}/eval_log 2>&1 &
+                    --ckpt_path=/PATH/TO/PRETRAIN_MODEL \
+                    --device_id=$DEVICE_ID >${eval_path}/eval_log 2>&1 &
--- a/model_zoo/research/cv/deeplabv3plus/scripts/run_eval_s16_multiscale_flip.sh
+++ b/model_zoo/research/cv/deeplabv3plus/scripts/run_eval_s16_multiscale_flip.sh
@@ -14,7 +14,7 @@
 # limitations under the License.
 # ============================================================================

-export DEVICE_ID=3
+DEVICE_ID=3
 export SLOG_PRINT_TO_STDOUT=0
 train_code_path=/PATH/TO/MODEL_ZOO_CODE
 eval_path=/PATH/TO/EVAL
@@ -38,5 +38,6 @@ python ${train_code_path}/eval.py --data_root=/PATH/TO/DATA  \
                    --scales=1.75  \
                    --flip  \
                    --freeze_bn  \
-                    --ckpt_path=/PATH/TO/PRETRAIN_MODEL >${eval_path}/eval_log 2>&1 &
+                    --ckpt_path=/PATH/TO/PRETRAIN_MODEL \
+                    --device_id=$DEVICE_ID >${eval_path}/eval_log 2>&1 &

--- a/model_zoo/research/cv/deeplabv3plus/scripts/run_eval_s16_multiscale_flip_gpu.sh
+++ b/model_zoo/research/cv/deeplabv3plus/scripts/run_eval_s16_multiscale_flip_gpu.sh
@@ -0,0 +1,63 @@
+#!/bin/bash
+# Copyright 2021 Huawei Technologies Co., Ltd
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+# ============================================================================
+
+if [ $# != 4 ]
+then
+    echo "=============================================================================================================="
+    echo "Please run the script as: "
+    echo "bash run_eval_s16_multiscale_flip_gpu.sh /PATH/TO/DATA /PATH/TO/DATA_lst.txt /PATH/TO/PRETRAIN_MODEL DEVICE_ID"
+    echo "for example:"
+    echo "bash run_eval_s16_multiscale_flip_gpu.sh \
+      voc2012/VOCdevkit/VOC2012 voc2012/voc_val_lst.txt ckpt/DeepLabV3plus_s16-300_82.ckpt 0"
+    echo "It is better to use absolute path."
+    echo "=============================================================================================================="
+    exit 1
+fi
+
+DATA_ROOT=$1
+DATA_LST=$2
+CKPT_PATH=$3
+export CUDA_VISIBLE_DEVICES=$4
+
+export SLOG_PRINT_TO_STDOUT=0
+eval_path=s16_multiscale_flip_eval
+if [ -d ${eval_path} ]; then
+  rm -rf ${eval_path}
+fi
+mkdir -p ${eval_path}
+cp ../*.py  ${eval_path}
+cp -r ../src  ${eval_path}
+cd ${eval_path} || exit
+
+python ./eval.py  --data_root=$DATA_ROOT  \
+                  --data_lst=$DATA_LST  \
+                  --batch_size=16  \
+                  --crop_size=513  \
+                  --ignore_label=255  \
+                  --num_classes=21  \
+                  --model=DeepLabV3plus_s16  \
+                  --scales=0.5  \
+                  --scales=0.75  \
+                  --scales=1.0  \
+                  --scales=1.25  \
+                  --scales=1.75  \
+                  --flip  \
+                  --freeze_bn  \
+                  --ckpt_path=$CKPT_PATH \
+                  --device_target="GPU" \
+                  --device_id=0 > eval_log 2>&1 &
+
+
--- a/model_zoo/research/cv/deeplabv3plus/scripts/run_eval_s16_multiscale_gpu.sh
+++ b/model_zoo/research/cv/deeplabv3plus/scripts/run_eval_s16_multiscale_gpu.sh
@@ -0,0 +1,59 @@
+#!/bin/bash
+# Copyright 2021 Huawei Technologies Co., Ltd
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+# ============================================================================
+if [ $# != 4 ]
+then
+    echo "=============================================================================================================="
+    echo "Please run the script as: "
+    echo "bash run_eval_s16_multiscale_gpu.sh /PATH/TO/DATA /PATH/TO/DATA_lst.txt /PATH/TO/PRETRAIN_MODEL DEVICE_ID"
+    echo "for example:"
+    echo "bash run_eval_s16_multiscale_gpu.sh \
+      voc2012/VOCdevkit/VOC2012 voc2012/voc_val_lst.txt ckpt/DeepLabV3plus_s16-300_82.ckpt 0"
+    echo "It is better to use absolute path."
+    echo "=============================================================================================================="
+    exit 1
+fi
+
+DATA_ROOT=$1
+DATA_LST=$2
+CKPT_PATH=$3
+export CUDA_VISIBLE_DEVICES=$4
+
+export SLOG_PRINT_TO_STDOUT=0
+eval_path=s16_multiscale_eval
+if [ -d ${eval_path} ]; then
+  rm -rf ${eval_path}
+fi
+mkdir -p ${eval_path}
+cp ../*.py  ${eval_path}
+cp -r ../src  ${eval_path}
+cd ${eval_path} || exit
+
+python ./eval.py  --data_root=$DATA_ROOT  \
+                  --data_lst=$DATA_LST  \
+                  --batch_size=16  \
+                  --crop_size=513  \
+                  --ignore_label=255  \
+                  --num_classes=21  \
+                  --model=DeepLabV3plus_s16  \
+                  --scales=0.5  \
+                  --scales=0.75  \
+                  --scales=1.0  \
+                  --scales=1.25  \
+                  --scales=1.75  \
+                  --freeze_bn  \
+                  --ckpt_path=$CKPT_PATH \
+                  --device_target="GPU" \
+                  --device_id=0 > eval_log 2>&1 &
--- a/model_zoo/research/cv/deeplabv3plus/scripts/run_eval_s8.sh
+++ b/model_zoo/research/cv/deeplabv3plus/scripts/run_eval_s8.sh
@@ -14,7 +14,7 @@
 # limitations under the License.
 # ============================================================================

-export DEVICE_ID=3
+DEVICE_ID=3
 export SLOG_PRINT_TO_STDOUT=0
 train_code_path=/PATH/TO/MODEL_ZOO_CODE
 eval_path=/PATH/TO/EVAL
@@ -33,5 +33,6 @@ python ${train_code_path}/eval.py --data_root=/PATH/TO/DATA  \
                    --model=DeepLabV3plus_s8  \
                    --scales=1.0  \
                    --freeze_bn  \
-                    --ckpt_path=/PATH/TO/PRETRAIN_MODEL >${eval_path}/eval_log 2>&1 &
+                    --ckpt_path=/PATH/TO/PRETRAIN_MODEL \
+                    --device_id=$DEVICE_ID >${eval_path}/eval_log 2>&1 &

--- a/model_zoo/research/cv/deeplabv3plus/scripts/run_eval_s8_gpu.sh
+++ b/model_zoo/research/cv/deeplabv3plus/scripts/run_eval_s8_gpu.sh
@@ -0,0 +1,59 @@
+#!/bin/bash
+# Copyright 2021 Huawei Technologies Co., Ltd
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+# ============================================================================
+
+if [ $# != 4 ]
+then
+    echo "=============================================================================================================="
+    echo "Please run the script as: "
+    echo "bash run_eval_s8_gpu.sh /PATH/TO/DATA /PATH/TO/DATA_lst.txt /PATH/TO/PRETRAIN_MODEL DEVICE_ID"
+    echo "for example:"
+    echo "bash run_eval_s8_gpu.sh \
+      voc2012/VOCdevkit/VOC2012 voc2012/voc_val_lst.txt ckpt/DeepLabV3plus_s16-300_82.ckpt 0"
+    echo "It is better to use absolute path."
+    echo "=============================================================================================================="
+    exit 1
+fi
+
+DATA_ROOT=$1
+DATA_LST=$2
+CKPT_PATH=$3
+export CUDA_VISIBLE_DEVICES=$4
+
+export SLOG_PRINT_TO_STDOUT=0
+eval_path=s8_eval
+if [ -d ${eval_path} ]; then
+  rm -rf ${eval_path}
+fi
+mkdir -p ${eval_path}
+cp ../*.py  ${eval_path}
+cp -r ../src  ${eval_path}
+cd ${eval_path} || exit
+
+
+python ./eval.py  --data_root="$DATA_ROOT"  \
+                  --data_lst="$DATA_LST"  \
+                  --batch_size=16  \
+                  --crop_size=513  \
+                  --ignore_label=255  \
+                  --num_classes=21  \
+                  --model=DeepLabV3plus_s8  \
+                  --scales=1.0  \
+                  --freeze_bn  \
+                  --ckpt_path="$CKPT_PATH" \
+                  --device_target="GPU" \
+                  --device_id=0 > eval_log 2>&1 &
+
+
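Review note on the script above: it copies the sources into a fresh `${eval_path}` directory and changes into it before calling `eval.py`, so a relative `DATA_ROOT`, `DATA_LST`, or `CKPT_PATH` would be resolved against the new directory. That is presumably why the usage text recommends absolute paths. A minimal sketch of one way to make relative arguments safe, assuming a hypothetical helper `abs_path` that is not part of this commit:

```bash
#!/bin/bash
# Hypothetical helper (not part of this commit): turn a possibly relative
# path into an absolute one, anchored at the current working directory.
abs_path() {
    case "$1" in
        /*) printf '%s\n' "$1" ;;        # already absolute: keep as-is
        *)  printf '%s\n' "$PWD/$1" ;;   # relative: prefix the caller's cwd
    esac
}

# Resolve a sample argument before any `cd`, as an eval script would need to.
DATA_ROOT=$(abs_path "voc2012/VOCdevkit/VOC2012")
echo "DATA_ROOT=$DATA_ROOT"
```

In a script like `run_eval_s8_gpu.sh`, the same call could wrap `$1`, `$2`, and `$3` before the `cd ${eval_path}` line. The helper does not normalize `..` segments or check that the path exists.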
--- a/model_zoo/research/cv/deeplabv3plus/scripts/run_eval_s8_multiscale.sh
+++ b/model_zoo/research/cv/deeplabv3plus/scripts/run_eval_s8_multiscale.sh
@ -14,7 +14,7 @@
 # limitations under the License.
 # ============================================================================

-export DEVICE_ID=3
+DEVICE_ID=3
 export SLOG_PRINT_TO_STDOUT=0
 train_code_path=/PATH/TO/MODEL_ZOO_CODE
 eval_path=/PATH/TO/EVAL
@ -37,5 +37,6 @@ python ${train_code_path}/eval.py --data_root=/PATH/TO/DATA  \
                    --scales=1.25  \
                    --scales=1.75  \
                    --freeze_bn  \
-                    --ckpt_path=/PATH/TO/PRETRAIN_MODEL >${eval_path}/eval_log 2>&1 &
+                    --ckpt_path=/PATH/TO/PRETRAIN_MODEL \
+                    --device_id=$DEVICE_ID >${eval_path}/eval_log 2>&1 &

--- a/model_zoo/research/cv/deeplabv3plus/scripts/run_eval_s8_multiscale_flip.sh
+++ b/model_zoo/research/cv/deeplabv3plus/scripts/run_eval_s8_multiscale_flip.sh
@ -14,7 +14,7 @@
 # limitations under the License.
 # ============================================================================

-export DEVICE_ID=3
+DEVICE_ID=3
 export SLOG_PRINT_TO_STDOUT=0
 train_code_path=/PATH/TO/MODEL_ZOO_CODE
 eval_path=/PATH/TO/EVAL
@ -38,5 +38,6 @@ python ${train_code_path}/eval.py --data_root=/PATH/TO/DATA  \
                    --scales=1.75  \
                    --flip  \
                    --freeze_bn  \
-                    --ckpt_path=/PATH/TO/PRETRAIN_MODEL >${eval_path}/eval_log 2>&1 &
+                    --ckpt_path=/PATH/TO/PRETRAIN_MODEL \
+                    --device_id=$DEVICE_ID >${eval_path}/eval_log 2>&1 &

--- a/model_zoo/research/cv/deeplabv3plus/scripts/run_eval_s8_multiscale_flip_gpu.sh
+++ b/model_zoo/research/cv/deeplabv3plus/scripts/run_eval_s8_multiscale_flip_gpu.sh
@ -0,0 +1,63 @@
+#!/bin/bash
+# Copyright 2021 Huawei Technologies Co., Ltd
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+# ============================================================================
+if [ "$#" -ne 4 ]
+then
+    echo "=============================================================================================================="
+    echo "Please run the script as: "
+    echo "bash run_eval_s8_multiscale_flip_gpu.sh /PATH/TO/DATA /PATH/TO/DATA_lst.txt /PATH/TO/PRETRAIN_MODEL DEVICE_ID"
+    echo "for example:"
+    echo "bash run_eval_s8_multiscale_flip_gpu.sh \
+      voc2012/VOCdevkit/VOC2012 voc2012/voc_val_lst.txt ckpt/DeepLabV3plus_s16-300_82.ckpt 0"
+    echo "It is better to use absolute path."
+    echo "=============================================================================================================="
+    exit 1
+fi
+
+DATA_ROOT=$1
+DATA_LST=$2
+CKPT_PATH=$3
+export CUDA_VISIBLE_DEVICES=$4
+
+export SLOG_PRINT_TO_STDOUT=0
+eval_path=s8_multiscale_flip_eval
+if [ -d ${eval_path} ]; then
+  rm -rf ${eval_path}
+fi
+mkdir -p ${eval_path}
+cp ../*.py  ${eval_path}
+cp -r ../src  ${eval_path}
+cd ${eval_path} || exit
+
+
+python ./eval.py  --data_root="$DATA_ROOT"  \
+                  --data_lst="$DATA_LST"  \
+                  --batch_size=16  \
+                  --crop_size=513  \
+                  --ignore_label=255  \
+                  --num_classes=21  \
+                  --model=DeepLabV3plus_s8  \
+                  --scales=0.5  \
+                  --scales=0.75  \
+                  --scales=1.0  \
+                  --scales=1.25  \
+                  --scales=1.75  \
+                  --flip  \
+                  --freeze_bn  \
+                  --ckpt_path="$CKPT_PATH" \
+                  --device_target="GPU" \
+                  --device_id=0 > eval_log 2>&1 &
+
+
--- a/model_zoo/research/cv/deeplabv3plus/scripts/run_eval_s8_multiscale_gpu.sh
+++ b/model_zoo/research/cv/deeplabv3plus/scripts/run_eval_s8_multiscale_gpu.sh
@ -0,0 +1,59 @@
+#!/bin/bash
+# Copyright 2021 Huawei Technologies Co., Ltd
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+# ============================================================================
+if [ "$#" -ne 4 ]
+then
+    echo "=============================================================================================================="
+    echo "Please run the script as: "
+    echo "bash run_eval_s8_multiscale_gpu.sh /PATH/TO/DATA /PATH/TO/DATA_lst.txt /PATH/TO/PRETRAIN_MODEL DEVICE_ID"
+    echo "for example:"
+    echo "bash run_eval_s8_multiscale_gpu.sh \
+      voc2012/VOCdevkit/VOC2012 voc2012/voc_val_lst.txt ckpt/DeepLabV3plus_s16-300_82.ckpt 0"
+    echo "It is better to use absolute path."
+    echo "=============================================================================================================="
+    exit 1
+fi
+DATA_ROOT=$1
+DATA_LST=$2
+CKPT_PATH=$3
+export CUDA_VISIBLE_DEVICES=$4
+
+export SLOG_PRINT_TO_STDOUT=0
+eval_path=s8_multiscale_eval
+if [ -d ${eval_path} ]; then
+  rm -rf ${eval_path}
+fi
+mkdir -p ${eval_path}
+cp ../*.py  ${eval_path}
+cp -r ../src  ${eval_path}
+cd ${eval_path} || exit
+
+python ./eval.py  --data_root="$DATA_ROOT"  \
+                  --data_lst="$DATA_LST"  \
+                  --batch_size=16  \
+                  --crop_size=513  \
+                  --ignore_label=255  \
+                  --num_classes=21  \
+                  --model=DeepLabV3plus_s8  \
+                  --scales=0.5  \
+                  --scales=0.75  \
+                  --scales=1.0  \
+                  --scales=1.25  \
+                  --scales=1.75  \
+                  --freeze_bn  \
+                  --ckpt_path="$CKPT_PATH" \
+                  --device_target="GPU" \
+                  --device_id=0 > eval_log 2>&1 &
+
--- a/model_zoo/research/cv/deeplabv3plus/train.py
+++ b/model_zoo/research/cv/deeplabv3plus/train.py
@ -76,8 +76,9 @@ def parse_args():
    parser.add_argument('--ckpt_pre_trained', type=str, default='', help='PreTrained model')

    # train
-    parser.add_argument('--device_target', type=str, default='Ascend', choices=['Ascend', 'CPU'],
+    parser.add_argument('--device_target', type=str, default='Ascend', choices=['Ascend', 'GPU', 'CPU'],
                        help='device where the code will be implemented. (Default: Ascend)')
+    parser.add_argument('--device_id', type=int, default=0, help='device id')
    parser.add_argument('--is_distributed', action='store_true', help='distributed training')
    parser.add_argument('--rank', type=int, default=0, help='local rank of distributed')
    parser.add_argument('--group_size', type=int, default=1, help='world size of distributed')
@ -99,17 +100,16 @@ def parse_args():
 def train():
    """train"""
    args = parse_args()
-    if args.device_target == "CPU":
-        context.set_context(mode=context.GRAPH_MODE, save_graphs=False, device_target="CPU")
-    else:
-        context.set_context(mode=context.GRAPH_MODE, enable_auto_mixed_precision=True, save_graphs=False,
-                            device_target="Ascend", device_id=int(os.getenv('DEVICE_ID')))
+    context.set_context(mode=context.GRAPH_MODE, save_graphs=False, device_target=args.device_target)
+    if args.device_target != "CPU":
+        context.set_context(enable_auto_mixed_precision=True, device_id=args.device_id)
+
    # init multicards training
    if args.modelArts_mode:
        import moxing as mox
        local_data_url = '/cache/data'
        local_train_url = '/cache/ckpt'
-        device_id = int(os.getenv('DEVICE_ID'))
+        device_id = args.device_id
        device_num = int(os.getenv('RANK_SIZE'))
        if device_num > 1:
            init()
@ -169,7 +169,15 @@ def train():
    # load pretrained model
    if args.ckpt_pre_trained or args.pretrainedmodel_filename:
        param_dict = load_checkpoint(ckpt_file)
-        load_param_into_net(train_net, param_dict)
+        if args.model == 'DeepLabV3plus_s16':  # remap keys to train_net's network.resnet.* names
+            trans_param_dict = {}
+            for key, val in param_dict.items():
+                key = key.replace("down_sample_layer", "downsample")
+                trans_param_dict[f"network.resnet.{key}"] = val
+            load_param_into_net(train_net, trans_param_dict)
+        else:
+            load_param_into_net(train_net, param_dict)
+        print('load_model {} success'.format(ckpt_file))

    # optimizer
    iters_per_epoch = dataset.get_dataset_size()
@ -188,7 +196,7 @@ def train():

    # loss scale
    manager_loss_scale = FixedLossScaleManager(args.loss_scale, drop_overflow_update=False)
-    amp_level = "O0" if args.device_target == "CPU" else "O3"
+    amp_level = "O3" if args.device_target == "Ascend" else "O0"
    model = Model(train_net, optimizer=opt, amp_level=amp_level, loss_scale_manager=manager_loss_scale)

    # callback for saving ckpts