forked from mindspore-Ecosystem/mindspore
parent f960f0671f
commit aba56674c6
@ -17,10 +17,12 @@
- [Training Process](#训练过程)
- [Usage](#用法)
- [Running on Ascend](#ascend处理器环境运行)
- [Running on GPU](#gpu处理器环境运行)
- [Results](#结果)
- [Evaluation Process](#评估过程)
- [Usage](#用法-1)
- [Running on Ascend](#ascend处理器环境运行-1)
- [Running on GPU](#gpu处理器环境运行-1)
- [Results](#结果-1)
- [Training Accuracy](#训练准确率)
- [Model Description](#模型描述)
|
@ -173,6 +175,66 @@ run_eval_s8_multiscale.sh
|
|||
run_eval_s8_multiscale_flip.sh
|
||||
```

- Running on GPU

Follow the steps below to train on 8 GPUs:

1. Train s16 on the VOCaug dataset, fine-tuning from the ResNet-101 pretrained model. The script is:

```bash
bash run_distribute_train_s16_r1_gpu.sh /PATH/TO/MINDRECORD_NAME /PATH/TO/PRETRAIN_MODEL
```

2. Train s8 on the VOCaug dataset, fine-tuning from the model produced in the previous step. The script is:

```bash
bash run_distribute_train_s8_r1_gpu.sh /PATH/TO/MINDRECORD_NAME /PATH/TO/PRETRAIN_MODEL
```

3. Train s8 on the VOCtrain dataset, fine-tuning from the model produced in the previous step. The script is:

```bash
bash run_distribute_train_s8_r2_gpu.sh /PATH/TO/MINDRECORD_NAME /PATH/TO/PRETRAIN_MODEL
```

Evaluation steps:

1. Evaluate s16 on the VOC val dataset. The evaluation script is:

```bash
bash run_eval_s16_gpu.sh /PATH/TO/DATA /PATH/TO/DATA_lst.txt /PATH/TO/PRETRAIN_MODEL DEVICE_ID
```

2. Evaluate multi-scale s16 on the VOC val dataset. The evaluation script is:

```bash
bash run_eval_s16_multiscale_gpu.sh /PATH/TO/DATA /PATH/TO/DATA_lst.txt /PATH/TO/PRETRAIN_MODEL DEVICE_ID
```

3. Evaluate multi-scale s16 with flipping on the VOC val dataset. The evaluation script is:

```bash
bash run_eval_s16_multiscale_flip_gpu.sh /PATH/TO/DATA /PATH/TO/DATA_lst.txt /PATH/TO/PRETRAIN_MODEL DEVICE_ID
```

4. Evaluate s8 on the VOC val dataset. The evaluation script is:

```bash
bash run_eval_s8_gpu.sh /PATH/TO/DATA /PATH/TO/DATA_lst.txt /PATH/TO/PRETRAIN_MODEL DEVICE_ID
```

5. Evaluate multi-scale s8 on the VOC val dataset. The evaluation script is:

```bash
bash run_eval_s8_multiscale_gpu.sh /PATH/TO/DATA /PATH/TO/DATA_lst.txt /PATH/TO/PRETRAIN_MODEL DEVICE_ID
```

6. Evaluate multi-scale s8 with flipping on the VOC val dataset. The evaluation script is:

```bash
bash run_eval_s8_multiscale_flip_gpu.sh /PATH/TO/DATA /PATH/TO/DATA_lst.txt /PATH/TO/PRETRAIN_MODEL DEVICE_ID
```
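
The six GPU evaluation scripts share one naming pattern: the output stride (`s16` or `s8`), an optional `_multiscale` or `_multiscale_flip` suffix, and a trailing `_gpu`. As a small sketch, the names and their command lines can be generated in Python. The `eval_script` and `eval_command` helpers are illustrative assumptions, not repository code, and nothing is executed here.

```python
from itertools import product
from typing import List

STRIDES = ("s16", "s8")
VARIANTS = ("", "_multiscale", "_multiscale_flip")


def eval_script(stride: str, variant: str) -> str:
    """Build the script file name for one stride/variant combination."""
    return f"run_eval_{stride}{variant}_gpu.sh"


def eval_command(stride: str, variant: str, data_root: str, data_lst: str,
                 ckpt: str, device_id: int) -> List[str]:
    """Build the argv list in the order the scripts expect their four arguments."""
    return ["bash", eval_script(stride, variant), data_root, data_lst, ckpt, str(device_id)]


# Same order as the six evaluation steps: all s16 variants first, then all s8 variants.
SCRIPTS = [eval_script(s, v) for s, v in product(STRIDES, VARIANTS)]
for name in SCRIPTS:
    print(name)
```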

# Script Description

## Scripts and Sample Code

@ -192,6 +254,15 @@ run_eval_s8_multiscale_flip.sh
├── run_eval_s8.sh                        # Launch Ascend evaluation with the s8 structure
├── run_eval_s8_multiscale.sh             # Launch Ascend evaluation with the multi-scale s8 structure
├── run_eval_s8_multiscale_flip.sh        # Launch Ascend evaluation with the multi-scale and flip s8 structure
├── run_distribute_train_s16_r1_gpu.sh    # Launch GPU distributed training (8 GPUs) with the s16 structure on the VOCaug dataset
├── run_distribute_train_s8_r1_gpu.sh     # Launch GPU distributed training (8 GPUs) with the s8 structure on the VOCaug dataset
├── run_distribute_train_s8_r2_gpu.sh     # Launch GPU distributed training (8 GPUs) with the s8 structure on the VOCtrain dataset
├── run_eval_s16_gpu.sh                   # Launch GPU evaluation with the s16 structure
├── run_eval_s16_multiscale_gpu.sh        # Launch GPU evaluation with the multi-scale s16 structure
├── run_eval_s16_multiscale_flip_gpu.sh   # Launch GPU evaluation with the multi-scale and flip s16 structure
├── run_eval_s8_gpu.sh                    # Launch GPU evaluation with the s8 structure
├── run_eval_s8_multiscale_gpu.sh         # Launch GPU evaluation with the multi-scale s8 structure
├── run_eval_s8_multiscale_flip_gpu.sh    # Launch GPU evaluation with the multi-scale and flip s8 structure
├── src
├── tools
├── get_dataset_list.py                   # Generate the dataset list file
|
@ -274,7 +345,7 @@ do
|
|||
echo 'start rank='$i', device id='$DEVICE_ID'...'
|
||||
mkdir ${train_path}/device$DEVICE_ID
|
||||
cd ${train_path}/device$DEVICE_ID
|
||||
ython ${train_code_path}/train.py --train_dir=${train_path}/ckpt \
|
||||
python ${train_code_path}/train.py --train_dir=${train_path}/ckpt \
|
||||
--data_file=/PATH/TO/MINDRECORD_NAME \
|
||||
--train_epochs=300 \
|
||||
--batch_size=32 \
|
||||
|
@ -374,6 +445,10 @@ python train.py --train_url=/PATH/TO/OUTPUT_DIR \
|
|||
--save_steps=410 \
|
||||
```

#### Running on GPU

For the detailed parameter configuration, refer to the 8-GPU training scripts in [Quick Start](#快速入门).

### Results

#### Running on Ascend

|
@ -483,6 +558,10 @@ python ${train_code_path}/eval.py --data_root=/PATH/TO/DATA \
|
|||
--ckpt_path=/PATH/TO/PRETRAIN_MODEL >${eval_path}/eval_log 2>&1 &
|
||||
```

#### Running on GPU

For the detailed parameter configuration, refer to the evaluation scripts in [Quick Start](#快速入门).
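
The GPU evaluation scripts in this change export `CUDA_VISIBLE_DEVICES` from their fourth argument and then pass `--device_target="GPU" --device_id=0` to `eval.py`. With only one GPU visible to the process, index 0 refers to that GPU. A rough Python sketch of the same idea follows; it reproduces only the two device options that this change adds to `eval.py`, and the `parse_device_args` and `gpu_eval_env` helpers are hypothetical names used for illustration.

```python
import argparse
import os


def parse_device_args(argv):
    """Parse only the two device options that this change adds to eval.py."""
    parser = argparse.ArgumentParser()
    parser.add_argument('--device_target', type=str, default='Ascend',
                        choices=['Ascend', 'GPU', 'CPU'])
    parser.add_argument('--device_id', type=int, default=0)
    return parser.parse_args(argv)


def gpu_eval_env(physical_gpu: int) -> dict:
    """Copy the current environment and expose a single physical GPU to the child process."""
    env = dict(os.environ)
    env['CUDA_VISIBLE_DEVICES'] = str(physical_gpu)
    return env


# Physical GPU 3 becomes logical device 0 inside the evaluation process.
ARGS = parse_device_args(['--device_target=GPU', '--device_id=0'])
ENV = gpu_eval_env(3)
print(ARGS.device_target, ARGS.device_id, ENV['CUDA_VISIBLE_DEVICES'])
```

The resulting `ENV` could be passed as the `env` argument of `subprocess.run` when launching `eval.py` from Python instead of from the shell scripts.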

### Results

Run the applicable training script to obtain the results. To reproduce them, follow the steps in Quick Start.
|
@ -506,21 +585,21 @@ python ${train_code_path}/eval.py --data_root=/PATH/TO/DATA \

### Evaluation Performance

| Parameter | Ascend 910 |
| -------------------------- | -------------------------------------- |
| Model version | DeepLabV3+ |
| Resource | Ascend 910 |
| Upload date | 2021-03-16 |
| MindSpore version | 1.1.1 |
| Dataset | PASCAL VOC2012 + SBD |
| Training parameters | epoch = 300, batch_size = 32 (s16_r1); epoch = 800, batch_size = 16 (s8_r1); epoch = 300, batch_size = 16 (s8_r2) |
| Optimizer | Momentum |
| Loss function | Softmax cross entropy |
| Output | Probability |
| Loss | 0.0041095633 |
| Performance | 187736.386 ms (1 card, s16)<br> 44474.187 ms (8 cards, s16) |
| Fine-tuned checkpoint | 453M (.ckpt file) |
| Scripts | [Link](https://gitee.com/mindspore/mindspore/tree/master/model_zoo/research/cv/deeplabv3plus) |

| Parameter | Ascend 910 | GPU |
| -------------------------- | -------------------------------------- | -------------------------------------- |
| Model version | DeepLabV3+ | DeepLabV3+ |
| Resource | Ascend 910 | NV SXM2 V100-32G |
| Upload date | 2021-03-16 | 2021-08-23 |
| MindSpore version | 1.1.1 | 1.4.0 |
| Dataset | PASCAL VOC2012 + SBD | PASCAL VOC2012 + SBD |
| Training parameters | epoch = 300, batch_size = 32 (s16_r1); epoch = 800, batch_size = 16 (s8_r1); epoch = 300, batch_size = 16 (s8_r2) | epoch = 300, batch_size = 16 (s16_r1); epoch = 800, batch_size = 8 (s8_r1); epoch = 300, batch_size = 8 (s8_r2) |
| Optimizer | Momentum | Momentum |
| Loss function | Softmax cross entropy | Softmax cross entropy |
| Output | Probability | Probability |
| Loss | 0.0041095633 | 0.003395824 |
| Performance | 187736.386 ms (1 card, s16)<br> 44474.187 ms (8 cards, s16) | 1080 ms/step (1 card, s16) |
| Fine-tuned checkpoint | 453M (.ckpt file) | 454M (.ckpt file) |
| Scripts | [Link](https://gitee.com/mindspore/mindspore/tree/master/model_zoo/research/cv/deeplabv3plus) | [Link](https://gitee.com/mindspore/mindspore/tree/master/model_zoo/research/cv/deeplabv3plus) |
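
The GPU batch sizes in the table can be related to the example checkpoint names that appear in the GPU scripts' usage messages (`DeepLabV3plus_s16-300_82.ckpt`, `DeepLabV3plus_s8-800_165.ckpt`). The sketch below rests on three assumptions that this table does not state: `batch_size` is per device, the last incomplete batch is dropped, and the VOCaug training set contains 10582 images. Under those assumptions the step counts in the file names follow from simple integer division.

```python
def steps_per_epoch(num_images: int, per_device_batch: int, num_devices: int) -> int:
    """Steps per epoch under data parallelism, dropping the last incomplete batch."""
    global_batch = per_device_batch * num_devices
    return num_images // global_batch


VOCAUG_TRAIN_IMAGES = 10582  # assumption: size of the augmented VOC training set
NUM_GPUS = 8                 # the GPU training scripts set RANK_SIZE=8

# s16_r1 uses batch_size=16 per device, s8_r1 uses batch_size=8 per device.
S16_R1_STEPS = steps_per_epoch(VOCAUG_TRAIN_IMAGES, per_device_batch=16, num_devices=NUM_GPUS)
S8_R1_STEPS = steps_per_epoch(VOCAUG_TRAIN_IMAGES, per_device_batch=8, num_devices=NUM_GPUS)
print(S16_R1_STEPS, S8_R1_STEPS)
```

If these assumptions hold, `--save_steps=410` in the s16 GPU training script corresponds to one checkpoint every five epochs.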

# Description of Random Situation

|
|
|
@ -24,9 +24,6 @@ from mindspore import context
|
|||
from mindspore.train.serialization import load_checkpoint, load_param_into_net
|
||||
from src.deeplab_v3plus import DeepLabV3Plus
|
||||
|
||||
context.set_context(mode=context.GRAPH_MODE, device_target="Ascend", save_graphs=False,
|
||||
device_id=int(os.getenv('DEVICE_ID')))
|
||||
|
||||
|
||||
def parse_args():
|
||||
"""parse_args"""
|
||||
|
@ -44,6 +41,11 @@ def parse_args():
|
|||
parser.add_argument('--ignore_label', type=int, default=255, help='ignore label')
|
||||
parser.add_argument('--num_classes', type=int, default=21, help='number of classes')
|
||||
|
||||
# device info
|
||||
parser.add_argument('--device_target', type=str, default='Ascend', choices=['Ascend', 'GPU', 'CPU'],
|
||||
help='device where the code will be implemented. (Default: Ascend)')
|
||||
parser.add_argument('--device_id', type=int, default=0, help='device id')
|
||||
|
||||
# model
|
||||
parser.add_argument('--model', type=str, default='', help='select model')
|
||||
parser.add_argument('--freeze_bn', action='store_true', default=False, help='freeze bn')
|
||||
|
@ -154,7 +156,8 @@ def eval_batch_scales(args, eval_net, img_lst, scales,
|
|||
def net_eval():
|
||||
"""net_eval"""
|
||||
args = parse_args()
|
||||
|
||||
context.set_context(mode=context.GRAPH_MODE, device_target=args.device_target, save_graphs=False,
|
||||
device_id=args.device_id)
|
||||
# data list
|
||||
with open(args.data_lst) as f:
|
||||
img_lst = f.readlines()
|
||||
|
|
|
@ -14,7 +14,7 @@
|
|||
# limitations under the License.
|
||||
# ============================================================================
|
||||
|
||||
export DEVICE_ID=5
|
||||
DEVICE_ID=5
|
||||
export SLOG_PRINT_TO_STDOUT=0
|
||||
train_path=/PATH/TO/EXPERIMENTS_DIR
|
||||
train_code_path=/PATH/TO/MODEL_ZOO_CODE
|
||||
|
@ -41,4 +41,5 @@ python ${train_code_path}/train.py --data_file=/PATH/TO/MINDRECORD_NAME \
|
|||
--model=DeepLabV3plus_s16 \
|
||||
--ckpt_pre_trained=/PATH/TO/PRETRAIN_MODEL \
|
||||
--save_steps=1500 \
|
||||
--keep_checkpoint_max=200 >log 2>&1 &
|
||||
--keep_checkpoint_max=200 \
|
||||
--device_id=$DEVICE_ID >log 2>&1 &
|
|
@ -31,7 +31,7 @@ mkdir ${train_path}/ckpt
|
|||
for((i=0;i<=$RANK_SIZE-1;i++));
|
||||
do
|
||||
export RANK_ID=${i}
|
||||
export DEVICE_ID=$((i + RANK_START_ID))
|
||||
DEVICE_ID=$((i + RANK_START_ID))
|
||||
echo 'start rank='${i}', device id='${DEVICE_ID}'...'
|
||||
mkdir ${train_path}/device${DEVICE_ID}
|
||||
cd ${train_path}/device${DEVICE_ID} || exit
|
||||
|
@ -50,5 +50,6 @@ do
|
|||
--ckpt_pre_trained=/PATH/TO/PRETRAIN_MODEL \
|
||||
--is_distributed \
|
||||
--save_steps=410 \
|
||||
--keep_checkpoint_max=200 >log 2>&1 &
|
||||
--keep_checkpoint_max=200 \
|
||||
--device_id=$DEVICE_ID >log 2>&1 &
|
||||
done
|
||||
|
|
|
@ -0,0 +1,66 @@
|
|||
#!/bin/bash
|
||||
# Copyright 2021 Huawei Technologies Co., Ltd
|
||||
#
|
||||
# Licensed under the Apache License, Version 2.0 (the "License");
|
||||
# you may not use this file except in compliance with the License.
|
||||
# You may obtain a copy of the License at
|
||||
#
|
||||
# http://www.apache.org/licenses/LICENSE-2.0
|
||||
#
|
||||
# Unless required by applicable law or agreed to in writing, software
|
||||
# distributed under the License is distributed on an "AS IS" BASIS,
|
||||
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||
# See the License for the specific language governing permissions and
|
||||
# limitations under the License.
|
||||
# ============================================================================
|
||||
|
||||
if [ $# != 2 ]
|
||||
then
|
||||
echo "=============================================================================================================="
|
||||
echo "Please run the script as: "
|
||||
echo "bash run_distribute_train_s16_r1_gpu.sh /PATH/TO/MINDRECORD_NAME /PATH/TO/PRETRAIN_MODEL"
|
||||
echo "for example:"
|
||||
echo "bash run_distribute_train_s16_r1_gpu.sh \
|
||||
voc2012/mindrecord_train/vocaug_mindrecord0 resnet101_ascend_v120_imagenet2012_official_cv_bs32_acc78.ckpt"
|
||||
echo "It is better to use absolute path."
|
||||
echo "=============================================================================================================="
|
||||
exit 1
|
||||
fi
|
||||
|
||||
DATA_FILE=$1
|
||||
CKPT_PRE_TRAINED=$2
|
||||
|
||||
ulimit -c unlimited
|
||||
export SLOG_PRINT_TO_STDOUT=0
|
||||
|
||||
export RANK_SIZE=8
|
||||
export GLOG_v=2
|
||||
|
||||
train_path=s16_train
|
||||
if [ -d ${train_path} ]; then
|
||||
rm -rf ${train_path}
|
||||
fi
|
||||
mkdir -p ${train_path}
|
||||
mkdir ${train_path}/ckpt
|
||||
cp ../*.py ${train_path}
|
||||
cp -r ../src ${train_path}
|
||||
cd ${train_path} || exit
|
||||
|
||||
mpirun -allow-run-as-root -n $RANK_SIZE --output-filename log_output --merge-stderr-to-stdout \
|
||||
python ./train.py --train_dir=${train_path}/ckpt \
|
||||
--data_file=$DATA_FILE \
|
||||
--train_epochs=300 \
|
||||
--batch_size=16 \
|
||||
--crop_size=513 \
|
||||
--base_lr=0.04 \
|
||||
--lr_type=cos \
|
||||
--min_scale=0.5 \
|
||||
--max_scale=2.0 \
|
||||
--ignore_label=255 \
|
||||
--num_classes=21 \
|
||||
--model=DeepLabV3plus_s16 \
|
||||
--ckpt_pre_trained=$CKPT_PRE_TRAINED \
|
||||
--is_distributed \
|
||||
--save_steps=410 \
|
||||
--keep_checkpoint_max=200 \
|
||||
--device_target="GPU" >log 2>&1 &
|
|
@ -31,7 +31,7 @@ mkdir ${train_path}/ckpt
|
|||
for((i=0;i<=$RANK_SIZE-1;i++));
|
||||
do
|
||||
export RANK_ID=${i}
|
||||
export DEVICE_ID=$((i + RANK_START_ID))
|
||||
DEVICE_ID=$((i + RANK_START_ID))
|
||||
echo 'start rank='${i}', device id='${DEVICE_ID}'...'
|
||||
mkdir ${train_path}/device${DEVICE_ID}
|
||||
cd ${train_path}/device${DEVICE_ID} || exit
|
||||
|
@ -51,5 +51,6 @@ do
|
|||
--ckpt_pre_trained=/PATH/TO/PRETRAIN_MODEL \
|
||||
--is_distributed \
|
||||
--save_steps=820 \
|
||||
--keep_checkpoint_max=200 >log 2>&1 &
|
||||
--keep_checkpoint_max=200 \
|
||||
--device_id=$DEVICE_ID >log 2>&1 &
|
||||
done
|
||||
|
|
|
@ -0,0 +1,67 @@
|
|||
#!/bin/bash
|
||||
# Copyright 2021 Huawei Technologies Co., Ltd
|
||||
#
|
||||
# Licensed under the Apache License, Version 2.0 (the "License");
|
||||
# you may not use this file except in compliance with the License.
|
||||
# You may obtain a copy of the License at
|
||||
#
|
||||
# http://www.apache.org/licenses/LICENSE-2.0
|
||||
#
|
||||
# Unless required by applicable law or agreed to in writing, software
|
||||
# distributed under the License is distributed on an "AS IS" BASIS,
|
||||
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||
# See the License for the specific language governing permissions and
|
||||
# limitations under the License.
|
||||
# ============================================================================
|
||||
if [ $# != 2 ]
|
||||
then
|
||||
echo "=============================================================================================================="
|
||||
echo "Please run the script as: "
|
||||
echo "bash run_distribute_train_s8_r1_gpu.sh /PATH/TO/MINDRECORD_NAME /PATH/TO/PRETRAIN_MODEL"
|
||||
echo "for example:"
|
||||
echo "bash run_distribute_train_s8_r1_gpu.sh \
|
||||
voc2012/mindrecord_train/vocaug_mindrecord0 ckpt/DeepLabV3plus_s16-300_82.ckpt"
|
||||
echo "It is better to use absolute path."
|
||||
echo "=============================================================================================================="
|
||||
exit 1
|
||||
fi
|
||||
|
||||
DATA_FILE=$1
|
||||
CKPT_PRE_TRAINED=$2
|
||||
|
||||
ulimit -c unlimited
|
||||
export SLOG_PRINT_TO_STDOUT=0
|
||||
|
||||
export RANK_SIZE=8
|
||||
export GLOG_v=2
|
||||
|
||||
train_path=s8_r1_train
|
||||
if [ -d ${train_path} ]; then
|
||||
rm -rf ${train_path}
|
||||
fi
|
||||
mkdir -p ${train_path}
|
||||
mkdir ${train_path}/ckpt
|
||||
cp ../*.py ${train_path}
|
||||
cp -r ../src ${train_path}
|
||||
cd ${train_path} || exit
|
||||
|
||||
|
||||
mpirun -allow-run-as-root -n $RANK_SIZE --output-filename log_output --merge-stderr-to-stdout \
|
||||
python ./train.py --train_dir=${train_path}/ckpt \
|
||||
--data_file=$DATA_FILE \
|
||||
--train_epochs=800 \
|
||||
--batch_size=8 \
|
||||
--crop_size=513 \
|
||||
--base_lr=0.01 \
|
||||
--lr_type=cos \
|
||||
--min_scale=0.5 \
|
||||
--max_scale=2.0 \
|
||||
--ignore_label=255 \
|
||||
--num_classes=21 \
|
||||
--model=DeepLabV3plus_s8 \
|
||||
--loss_scale=2048 \
|
||||
--ckpt_pre_trained=$CKPT_PRE_TRAINED \
|
||||
--is_distributed \
|
||||
--save_steps=820 \
|
||||
--keep_checkpoint_max=200 \
|
||||
--device_target="GPU" >log 2>&1 &
|
|
@ -31,7 +31,7 @@ mkdir ${train_path}/ckpt
|
|||
for((i=0;i<=$RANK_SIZE-1;i++));
|
||||
do
|
||||
export RANK_ID=${i}
|
||||
export DEVICE_ID=$((i + RANK_START_ID))
|
||||
DEVICE_ID=$((i + RANK_START_ID))
|
||||
echo 'start rank='${i}', device id='${DEVICE_ID}'...'
|
||||
mkdir ${train_path}/device${DEVICE_ID}
|
||||
cd ${train_path}/device${DEVICE_ID} || exit
|
||||
|
@ -51,5 +51,6 @@ do
|
|||
--ckpt_pre_trained=/PATH/TO/PRETRAIN_MODEL \
|
||||
--is_distributed \
|
||||
--save_steps=110 \
|
||||
--keep_checkpoint_max=200 >log 2>&1 &
|
||||
--keep_checkpoint_max=200 \
|
||||
--device_id=$DEVICE_ID >log 2>&1 &
|
||||
done
|
||||
|
|
|
@ -0,0 +1,70 @@
|
|||
#!/bin/bash
|
||||
# Copyright 2021 Huawei Technologies Co., Ltd
|
||||
#
|
||||
# Licensed under the Apache License, Version 2.0 (the "License");
|
||||
# you may not use this file except in compliance with the License.
|
||||
# You may obtain a copy of the License at
|
||||
#
|
||||
# http://www.apache.org/licenses/LICENSE-2.0
|
||||
#
|
||||
# Unless required by applicable law or agreed to in writing, software
|
||||
# distributed under the License is distributed on an "AS IS" BASIS,
|
||||
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||
# See the License for the specific language governing permissions and
|
||||
# limitations under the License.
|
||||
# ============================================================================
|
||||
|
||||
if [ $# != 2 ]
|
||||
then
|
||||
echo "=============================================================================================================="
|
||||
echo "Please run the script as: "
|
||||
echo "bash run_distribute_train_s8_r2_gpu.sh /PATH/TO/MINDRECORD_NAME /PATH/TO/PRETRAIN_MODEL"
|
||||
echo "for example:"
|
||||
echo "bash run_distribute_train_s8_r2_gpu.sh \
|
||||
voc2012/mindrecord_voctrain/mindrecord_voctrain0 ckpt/DeepLabV3plus_s8-800_165.ckpt"
|
||||
echo "It is better to use absolute path."
|
||||
echo "=============================================================================================================="
|
||||
exit 1
|
||||
fi
|
||||
|
||||
|
||||
DATA_FILE=$1
|
||||
CKPT_PRE_TRAINED=$2
|
||||
|
||||
ulimit -c unlimited
|
||||
export SLOG_PRINT_TO_STDOUT=0
|
||||
|
||||
export RANK_SIZE=8
|
||||
export GLOG_v=2
|
||||
|
||||
train_path=s8_r2_train
|
||||
if [ -d ${train_path} ]; then
|
||||
rm -rf ${train_path}
|
||||
fi
|
||||
mkdir -p ${train_path}
|
||||
mkdir ${train_path}/ckpt
|
||||
cp ../*.py ${train_path}
|
||||
cp -r ../src ${train_path}
|
||||
cd ${train_path} || exit
|
||||
|
||||
|
||||
mpirun -allow-run-as-root -n $RANK_SIZE --output-filename log_output --merge-stderr-to-stdout \
|
||||
python ./train.py --train_dir=${train_path}/ckpt \
|
||||
--data_file=$DATA_FILE \
|
||||
--train_epochs=300 \
|
||||
--batch_size=8 \
|
||||
--crop_size=513 \
|
||||
--base_lr=0.004 \
|
||||
--lr_type=cos \
|
||||
--min_scale=0.5 \
|
||||
--max_scale=2.0 \
|
||||
--ignore_label=255 \
|
||||
--num_classes=21 \
|
||||
--model=DeepLabV3plus_s8 \
|
||||
--loss_scale=2048 \
|
||||
--ckpt_pre_trained=$CKPT_PRE_TRAINED \
|
||||
--is_distributed \
|
||||
--save_steps=110 \
|
||||
--keep_checkpoint_max=200 \
|
||||
--device_target="GPU" >log 2>&1 &
|
||||
|
|
@ -14,7 +14,7 @@
|
|||
# limitations under the License.
|
||||
# ============================================================================
|
||||
|
||||
export DEVICE_ID=3
|
||||
DEVICE_ID=3
|
||||
export SLOG_PRINT_TO_STDOUT=0
|
||||
train_code_path=/PATH/TO/MODEL_ZOO_CODE
|
||||
eval_path=/PATH/TO/EVAL
|
||||
|
@ -33,5 +33,6 @@ python ${train_code_path}/eval.py --data_root=/PATH/TO/DATA \
|
|||
--model=DeepLabV3plus_s16 \
|
||||
--scales=1.0 \
|
||||
--freeze_bn \
|
||||
--ckpt_path=/PATH/TO/PRETRAIN_MODEL >${eval_path}/eval_log 2>&1 &
|
||||
--ckpt_path=/PATH/TO/PRETRAIN_MODEL \
|
||||
--device_id=$DEVICE_ID >${eval_path}/eval_log 2>&1 &
|
||||
|
||||
|
|
|
@ -0,0 +1,56 @@
|
|||
#!/bin/bash
|
||||
# Copyright 2021 Huawei Technologies Co., Ltd
|
||||
#
|
||||
# Licensed under the Apache License, Version 2.0 (the "License");
|
||||
# you may not use this file except in compliance with the License.
|
||||
# You may obtain a copy of the License at
|
||||
#
|
||||
# http://www.apache.org/licenses/LICENSE-2.0
|
||||
#
|
||||
# Unless required by applicable law or agreed to in writing, software
|
||||
# distributed under the License is distributed on an "AS IS" BASIS,
|
||||
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||
# See the License for the specific language governing permissions and
|
||||
# limitations under the License.
|
||||
# ============================================================================
|
||||
if [ $# != 4 ]
|
||||
then
|
||||
echo "=============================================================================================================="
|
||||
echo "Please run the script as: "
|
||||
echo "bash run_eval_s16_gpu.sh /PATH/TO/DATA /PATH/TO/DATA_lst.txt /PATH/TO/PRETRAIN_MODEL DEVICE_ID"
|
||||
echo "for example:"
|
||||
echo "bash run_eval_s16_gpu.sh \
|
||||
voc2012/VOCdevkit/VOC2012 voc2012/voc_val_lst.txt ckpt/DeepLabV3plus_s16-300_82.ckpt 0"
|
||||
echo "It is better to use absolute path."
|
||||
echo "=============================================================================================================="
|
||||
exit 1
|
||||
fi
|
||||
|
||||
DATA_ROOT=$1
|
||||
DATA_LST=$2
|
||||
CKPT_PATH=$3
|
||||
export CUDA_VISIBLE_DEVICES=$4
|
||||
|
||||
export SLOG_PRINT_TO_STDOUT=0
|
||||
eval_path=s16_eval
|
||||
if [ -d ${eval_path} ]; then
|
||||
rm -rf ${eval_path}
|
||||
fi
|
||||
mkdir -p ${eval_path}
|
||||
cp ../*.py ${eval_path}
|
||||
cp -r ../src ${eval_path}
|
||||
cd ${eval_path} || exit
|
||||
|
||||
python ./eval.py --data_root=$DATA_ROOT \
|
||||
--data_lst=$DATA_LST \
|
||||
--batch_size=32 \
|
||||
--crop_size=513 \
|
||||
--ignore_label=255 \
|
||||
--num_classes=21 \
|
||||
--model=DeepLabV3plus_s16 \
|
||||
--scales=1.0 \
|
||||
--freeze_bn \
|
||||
--ckpt_path=$CKPT_PATH \
|
||||
--device_target="GPU" \
|
||||
--device_id=0 > eval_log 2>&1 &
|
||||
|
|
@ -14,7 +14,7 @@
|
|||
# limitations under the License.
|
||||
# ============================================================================
|
||||
|
||||
export DEVICE_ID=3
|
||||
DEVICE_ID=3
|
||||
export SLOG_PRINT_TO_STDOUT=0
|
||||
train_code_path=/PATH/TO/MODEL_ZOO_CODE
|
||||
eval_path=/PATH/TO/EVAL
|
||||
|
@ -37,4 +37,5 @@ python ${train_code_path}/eval.py --data_root=/PATH/TO/DATA \
|
|||
--scales=1.25 \
|
||||
--scales=1.75 \
|
||||
--freeze_bn \
|
||||
--ckpt_path=/PATH/TO/PRETRAIN_MODEL >${eval_path}/eval_log 2>&1 &
|
||||
--ckpt_path=/PATH/TO/PRETRAIN_MODEL \
|
||||
--device_id=$DEVICE_ID >${eval_path}/eval_log 2>&1 &
|
|
@ -14,7 +14,7 @@
|
|||
# limitations under the License.
|
||||
# ============================================================================
|
||||
|
||||
export DEVICE_ID=3
|
||||
DEVICE_ID=3
|
||||
export SLOG_PRINT_TO_STDOUT=0
|
||||
train_code_path=/PATH/TO/MODEL_ZOO_CODE
|
||||
eval_path=/PATH/TO/EVAL
|
||||
|
@ -38,5 +38,6 @@ python ${train_code_path}/eval.py --data_root=/PATH/TO/DATA \
|
|||
--scales=1.75 \
|
||||
--flip \
|
||||
--freeze_bn \
|
||||
--ckpt_path=/PATH/TO/PRETRAIN_MODEL >${eval_path}/eval_log 2>&1 &
|
||||
--ckpt_path=/PATH/TO/PRETRAIN_MODEL \
|
||||
--device_id=$DEVICE_ID >${eval_path}/eval_log 2>&1 &
|
||||
|
||||
|
|
|
@ -0,0 +1,63 @@
|
|||
#!/bin/bash
|
||||
# Copyright 2021 Huawei Technologies Co., Ltd
|
||||
#
|
||||
# Licensed under the Apache License, Version 2.0 (the "License");
|
||||
# you may not use this file except in compliance with the License.
|
||||
# You may obtain a copy of the License at
|
||||
#
|
||||
# http://www.apache.org/licenses/LICENSE-2.0
|
||||
#
|
||||
# Unless required by applicable law or agreed to in writing, software
|
||||
# distributed under the License is distributed on an "AS IS" BASIS,
|
||||
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||
# See the License for the specific language governing permissions and
|
||||
# limitations under the License.
|
||||
# ============================================================================
|
||||
|
||||
if [ $# != 4 ]
|
||||
then
|
||||
echo "=============================================================================================================="
|
||||
echo "Please run the script as: "
|
||||
echo "bash run_eval_s16_multiscale_flip_gpu.sh /PATH/TO/DATA /PATH/TO/DATA_lst.txt /PATH/TO/PRETRAIN_MODEL DEVICE_ID"
|
||||
echo "for example:"
|
||||
echo "bash run_eval_s16_multiscale_flip_gpu.sh \
|
||||
voc2012/VOCdevkit/VOC2012 voc2012/voc_val_lst.txt ckpt/DeepLabV3plus_s16-300_82.ckpt 0"
|
||||
echo "It is better to use absolute path."
|
||||
echo "=============================================================================================================="
|
||||
exit 1
|
||||
fi
|
||||
|
||||
DATA_ROOT=$1
|
||||
DATA_LST=$2
|
||||
CKPT_PATH=$3
|
||||
export CUDA_VISIBLE_DEVICES=$4
|
||||
|
||||
export SLOG_PRINT_TO_STDOUT=0
|
||||
eval_path=s16_multiscale_flip_eval
|
||||
if [ -d ${eval_path} ]; then
|
||||
rm -rf ${eval_path}
|
||||
fi
|
||||
mkdir -p ${eval_path}
|
||||
cp ../*.py ${eval_path}
|
||||
cp -r ../src ${eval_path}
|
||||
cd ${eval_path} || exit
|
||||
|
||||
python ./eval.py --data_root=$DATA_ROOT \
|
||||
--data_lst=$DATA_LST \
|
||||
--batch_size=16 \
|
||||
--crop_size=513 \
|
||||
--ignore_label=255 \
|
||||
--num_classes=21 \
|
||||
--model=DeepLabV3plus_s16 \
|
||||
--scales=0.5 \
|
||||
--scales=0.75 \
|
||||
--scales=1.0 \
|
||||
--scales=1.25 \
|
||||
--scales=1.75 \
|
||||
--flip \
|
||||
--freeze_bn \
|
||||
--ckpt_path=$CKPT_PATH \
|
||||
--device_target="GPU" \
|
||||
--device_id=0 > eval_log 2>&1 &
|
||||
|
||||
|
|
@ -0,0 +1,59 @@
|
|||
#!/bin/bash
|
||||
# Copyright 2021 Huawei Technologies Co., Ltd
|
||||
#
|
||||
# Licensed under the Apache License, Version 2.0 (the "License");
|
||||
# you may not use this file except in compliance with the License.
|
||||
# You may obtain a copy of the License at
|
||||
#
|
||||
# http://www.apache.org/licenses/LICENSE-2.0
|
||||
#
|
||||
# Unless required by applicable law or agreed to in writing, software
|
||||
# distributed under the License is distributed on an "AS IS" BASIS,
|
||||
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||
# See the License for the specific language governing permissions and
|
||||
# limitations under the License.
|
||||
# ============================================================================
|
||||
if [ $# != 4 ]
|
||||
then
|
||||
echo "=============================================================================================================="
|
||||
echo "Please run the script as: "
|
||||
echo "bash run_eval_s16_multiscale_gpu.sh /PATH/TO/DATA /PATH/TO/DATA_lst.txt /PATH/TO/PRETRAIN_MODEL DEVICE_ID"
|
||||
echo "for example:"
|
||||
echo "bash run_eval_s16_multiscale_gpu.sh \
|
||||
voc2012/VOCdevkit/VOC2012 voc2012/voc_val_lst.txt ckpt/DeepLabV3plus_s16-300_82.ckpt 0"
|
||||
echo "It is better to use absolute path."
|
||||
echo "=============================================================================================================="
|
||||
exit 1
|
||||
fi
|
||||
|
||||
DATA_ROOT=$1
|
||||
DATA_LST=$2
|
||||
CKPT_PATH=$3
|
||||
export CUDA_VISIBLE_DEVICES=$4
|
||||
|
||||
export SLOG_PRINT_TO_STDOUT=0
|
||||
eval_path=s16_multiscale_eval
|
||||
if [ -d ${eval_path} ]; then
|
||||
rm -rf ${eval_path}
|
||||
fi
|
||||
mkdir -p ${eval_path}
|
||||
cp ../*.py ${eval_path}
|
||||
cp -r ../src ${eval_path}
|
||||
cd ${eval_path} || exit
|
||||
|
||||
python ./eval.py --data_root=$DATA_ROOT \
|
||||
--data_lst=$DATA_LST \
|
||||
--batch_size=16 \
|
||||
--crop_size=513 \
|
||||
--ignore_label=255 \
|
||||
--num_classes=21 \
|
||||
--model=DeepLabV3plus_s16 \
|
||||
--scales=0.5 \
|
||||
--scales=0.75 \
|
||||
--scales=1.0 \
|
||||
--scales=1.25 \
|
||||
--scales=1.75 \
|
||||
--freeze_bn \
|
||||
--ckpt_path=$CKPT_PATH \
|
||||
--device_target="GPU" \
|
||||
--device_id=0 > eval_log 2>&1 &
|
|
@ -14,7 +14,7 @@
|
|||
# limitations under the License.
|
||||
# ============================================================================
|
||||
|
||||
export DEVICE_ID=3
|
||||
DEVICE_ID=3
|
||||
export SLOG_PRINT_TO_STDOUT=0
|
||||
train_code_path=/PATH/TO/MODEL_ZOO_CODE
|
||||
eval_path=/PATH/TO/EVAL
|
||||
|
@ -33,5 +33,6 @@ python ${train_code_path}/eval.py --data_root=/PATH/TO/DATA \
|
|||
--model=DeepLabV3plus_s8 \
|
||||
--scales=1.0 \
|
||||
--freeze_bn \
|
||||
--ckpt_path=/PATH/TO/PRETRAIN_MODEL >${eval_path}/eval_log 2>&1 &
|
||||
--ckpt_path=/PATH/TO/PRETRAIN_MODEL \
|
||||
--device_id=$DEVICE_ID >${eval_path}/eval_log 2>&1 &
|
||||
|
||||
|
|
|
@ -0,0 +1,59 @@
|
|||
#!/bin/bash
|
||||
# Copyright 2021 Huawei Technologies Co., Ltd
|
||||
#
|
||||
# Licensed under the Apache License, Version 2.0 (the "License");
|
||||
# you may not use this file except in compliance with the License.
|
||||
# You may obtain a copy of the License at
|
||||
#
|
||||
# http://www.apache.org/licenses/LICENSE-2.0
|
||||
#
|
||||
# Unless required by applicable law or agreed to in writing, software
|
||||
# distributed under the License is distributed on an "AS IS" BASIS,
|
||||
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||
# See the License for the specific language governing permissions and
|
||||
# limitations under the License.
|
||||
# ============================================================================
|
||||
|
||||
if [ $# != 4 ]
|
||||
then
|
||||
echo "=============================================================================================================="
|
||||
echo "Please run the script as: "
|
||||
echo "bash run_eval_s8_gpu.sh /PATH/TO/DATA /PATH/TO/DATA_lst.txt /PATH/TO/PRETRAIN_MODEL DEVICE_ID"
|
||||
echo "for example:"
|
||||
echo "bash run_eval_s8_gpu.sh \
|
||||
voc2012/VOCdevkit/VOC2012 voc2012/voc_val_lst.txt ckpt/DeepLabV3plus_s16-300_82.ckpt 0"
|
||||
echo "It is better to use absolute path."
|
||||
echo "=============================================================================================================="
|
||||
exit 1
|
||||
fi
|
||||
|
||||
DATA_ROOT=$1
|
||||
DATA_LST=$2
|
||||
CKPT_PATH=$3
|
||||
export CUDA_VISIBLE_DEVICES=$4
|
||||
|
||||
export SLOG_PRINT_TO_STDOUT=0
|
||||
eval_path=s8_eval
|
||||
if [ -d ${eval_path} ]; then
|
||||
rm -rf ${eval_path}
|
||||
fi
|
||||
mkdir -p ${eval_path}
|
||||
cp ../*.py ${eval_path}
|
||||
cp -r ../src ${eval_path}
|
||||
cd ${eval_path} || exit
|
||||
|
||||
|
||||
python ./eval.py --data_root=$DATA_ROOT \
|
||||
--data_lst=$DATA_LST \
|
||||
--batch_size=16 \
|
||||
--crop_size=513 \
|
||||
--ignore_label=255 \
|
||||
--num_classes=21 \
|
||||
--model=DeepLabV3plus_s8 \
|
||||
--scales=1.0 \
|
||||
--freeze_bn \
|
||||
--ckpt_path=$CKPT_PATH \
|
||||
--device_target="GPU" \
|
||||
--device_id=0 > eval_log 2>&1 &
|
||||
|
||||
|
|
@ -14,7 +14,7 @@
|
|||
# limitations under the License.
|
||||
# ============================================================================
|
||||
|
||||
export DEVICE_ID=3
|
||||
DEVICE_ID=3
|
||||
export SLOG_PRINT_TO_STDOUT=0
|
||||
train_code_path=/PATH/TO/MODEL_ZOO_CODE
|
||||
eval_path=/PATH/TO/EVAL
|
||||
|
@ -37,5 +37,6 @@ python ${train_code_path}/eval.py --data_root=/PATH/TO/DATA \
|
|||
--scales=1.25 \
|
||||
--scales=1.75 \
|
||||
--freeze_bn \
|
||||
--ckpt_path=/PATH/TO/PRETRAIN_MODEL >${eval_path}/eval_log 2>&1 &
|
||||
--ckpt_path=/PATH/TO/PRETRAIN_MODEL \
|
||||
--device_id=$DEVICE_ID >${eval_path}/eval_log 2>&1 &
|
||||
|
||||
|
|
|
@ -14,7 +14,7 @@
|
|||
# limitations under the License.
|
||||
# ============================================================================
|
||||
|
||||
export DEVICE_ID=3
|
||||
DEVICE_ID=3
|
||||
export SLOG_PRINT_TO_STDOUT=0
|
||||
train_code_path=/PATH/TO/MODEL_ZOO_CODE
|
||||
eval_path=/PATH/TO/EVAL
|
||||
|
@ -38,5 +38,6 @@ python ${train_code_path}/eval.py --data_root=/PATH/TO/DATA \
|
|||
--scales=1.75 \
|
||||
--flip \
|
||||
--freeze_bn \
|
||||
--ckpt_path=/PATH/TO/PRETRAIN_MODEL >${eval_path}/eval_log 2>&1 &
|
||||
--ckpt_path=/PATH/TO/PRETRAIN_MODEL \
|
||||
--device_id=$DEVICE_ID >${eval_path}/eval_log 2>&1 &
|
||||
|
||||
|
|
|
@ -0,0 +1,63 @@
|
|||
#!/bin/bash
|
||||
# Copyright 2021 Huawei Technologies Co., Ltd
|
||||
#
|
||||
# Licensed under the Apache License, Version 2.0 (the "License");
|
||||
# you may not use this file except in compliance with the License.
|
||||
# You may obtain a copy of the License at
|
||||
#
|
||||
# http://www.apache.org/licenses/LICENSE-2.0
|
||||
#
|
||||
# Unless required by applicable law or agreed to in writing, software
|
||||
# distributed under the License is distributed on an "AS IS" BASIS,
|
||||
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||
# See the License for the specific language governing permissions and
|
||||
# limitations under the License.
|
||||
# ============================================================================
|
||||
if [ $# != 4 ]
|
||||
then
|
||||
echo "=============================================================================================================="
|
||||
echo "Please run the script as: "
|
||||
echo "bash run_eval_s8_multiscale_flip_gpu.sh /PATH/TO/DATA /PATH/TO/DATA_lst.txt /PATH/TO/PRETRAIN_MODEL DEVICE_ID"
|
||||
echo "for example:"
|
||||
echo "bash run_eval_s8_multiscale_flip_gpu.sh \
|
||||
voc2012/VOCdevkit/VOC2012 voc2012/voc_val_lst.txt ckpt/DeepLabV3plus_s16-300_82.ckpt 0"
|
||||
echo "It is better to use absolute path."
|
||||
echo "=============================================================================================================="
|
||||
exit 1
|
||||
fi
|
||||
|
||||
DATA_ROOT=$1
|
||||
DATA_LST=$2
|
||||
CKPT_PATH=$3
|
||||
export CUDA_VISIBLE_DEVICES=$4
|
||||
|
||||
export SLOG_PRINT_TO_STDOUT=0
|
||||
eval_path=s8_multiscale_flip_eval
|
||||
if [ -d ${eval_path} ]; then
|
||||
rm -rf ${eval_path}
|
||||
fi
|
||||
mkdir -p ${eval_path}
|
||||
cp ../*.py ${eval_path}
|
||||
cp -r ../src ${eval_path}
|
||||
cd ${eval_path} || exit
|
||||
|
||||
|
||||
python ./eval.py --data_root=$DATA_ROOT \
|
||||
--data_lst=$DATA_LST \
|
||||
--batch_size=16 \
|
||||
--crop_size=513 \
|
||||
--ignore_label=255 \
|
||||
--num_classes=21 \
|
||||
--model=DeepLabV3plus_s8 \
|
||||
--scales=0.5 \
|
||||
--scales=0.75 \
|
||||
--scales=1.0 \
|
||||
--scales=1.25 \
|
||||
--scales=1.75 \
|
||||
--flip \
|
||||
--freeze_bn \
|
||||
--ckpt_path=$CKPT_PATH \
|
||||
--device_target="GPU" \
|
||||
--device_id=0 > eval_log 2>&1 &
|
||||
|
||||
|
|
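A note on the device arguments in the script above: it exports `CUDA_VISIBLE_DEVICES=$4` and then passes `--device_id=0`. CUDA numbers only the visible devices, starting from 0, so logical device 0 is the physical GPU chosen on the command line. The sketch below models that mapping in plain Python. `physical_device` is a hypothetical helper, not part of this repository, and it assumes the variable holds comma-separated integer indices.

```python
def physical_device(cuda_visible_devices: str, logical_id: int) -> int:
    """Map a logical device index to the physical GPU index.

    Illustration only: assumes CUDA_VISIBLE_DEVICES is a comma-separated
    list of integer indices (UUID-style entries are not handled).
    """
    visible = [int(tok) for tok in cuda_visible_devices.split(",") if tok.strip()]
    if not 0 <= logical_id < len(visible):
        raise IndexError(f"logical device {logical_id} is not visible")
    return visible[logical_id]


# With `export CUDA_VISIBLE_DEVICES=3`, `--device_id=0` refers to physical GPU 3.
selected = physical_device("3", 0)
```

This is why the GPU scripts can hard-code `--device_id=0` while still letting the caller pick the card through the fourth argument.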
@ -0,0 +1,59 @@
|
|||
#!/bin/bash
|
||||
# Copyright 2021 Huawei Technologies Co., Ltd
|
||||
#
|
||||
# Licensed under the Apache License, Version 2.0 (the "License");
|
||||
# you may not use this file except in compliance with the License.
|
||||
# You may obtain a copy of the License at
|
||||
#
|
||||
# http://www.apache.org/licenses/LICENSE-2.0
|
||||
#
|
||||
# Unless required by applicable law or agreed to in writing, software
|
||||
# distributed under the License is distributed on an "AS IS" BASIS,
|
||||
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||
# See the License for the specific language governing permissions and
|
||||
# limitations under the License.
|
||||
# ============================================================================
|
||||
if [ $# != 4 ]
|
||||
then
|
||||
echo "=============================================================================================================="
|
||||
echo "Please run the script as: "
|
||||
echo "bash run_eval_s8_multiscale_gpu.sh /PATH/TO/DATA /PATH/TO/DATA_lst.txt /PATH/TO/PRETRAIN_MODEL DEVICE_ID"
|
||||
echo "for example:"
|
||||
echo "bash run_eval_s8_multiscale_gpu.sh \
|
||||
voc2012/VOCdevkit/VOC2012 voc2012/voc_val_lst.txt ckpt/DeepLabV3plus_s16-300_82.ckpt 0"
|
||||
echo "It is better to use absolute path."
|
||||
echo "=============================================================================================================="
|
||||
exit 1
|
||||
fi
|
||||
DATA_ROOT=$1
|
||||
DATA_LST=$2
|
||||
CKPT_PATH=$3
|
||||
export CUDA_VISIBLE_DEVICES=$4
|
||||
|
||||
export SLOG_PRINT_TO_STDOUT=0
|
||||
eval_path=s8_multiscale_eval
|
||||
if [ -d ${eval_path} ]; then
|
||||
rm -rf ${eval_path}
|
||||
fi
|
||||
mkdir -p ${eval_path}
|
||||
cp ../*.py ${eval_path}
|
||||
cp -r ../src ${eval_path}
|
||||
cd ${eval_path} || exit
|
||||
|
||||
python ./eval.py --data_root=$DATA_ROOT \
|
||||
--data_lst=$DATA_LST \
|
||||
--batch_size=16 \
|
||||
--crop_size=513 \
|
||||
--ignore_label=255 \
|
||||
--num_classes=21 \
|
||||
--model=DeepLabV3plus_s8 \
|
||||
--scales=0.5 \
|
||||
--scales=0.75 \
|
||||
--scales=1.0 \
|
||||
--scales=1.25 \
|
||||
--scales=1.75 \
|
||||
--freeze_bn \
|
||||
--ckpt_path=$CKPT_PATH \
|
||||
--device_target="GPU" \
|
||||
--device_id=0 > eval_log 2>&1 &
|
||||
|
|
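The multiscale scripts above pass `--scales` once per scale. A minimal sketch of how such repeated flags can be collected, assuming `eval.py` declares the option with argparse's `action='append'` (the parser below is an illustration, not the file's actual code):

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring only the flags used in the scripts."""
    parser = argparse.ArgumentParser()
    # action="append" accumulates every occurrence of the flag into a list.
    parser.add_argument("--scales", type=float, action="append",
                        help="evaluation scale; may be given several times")
    parser.add_argument("--flip", action="store_true",
                        help="also evaluate horizontally flipped inputs")
    return parser


args = build_parser().parse_args(
    ["--scales=0.5", "--scales=0.75", "--scales=1.0", "--flip"]
)
scales = args.scales
```

If the real parser used a different action, only the last `--scales` value would survive, so the append behaviour is worth confirming in `eval.py` before relying on it.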
@ -76,8 +76,9 @@ def parse_args():
|
|||
parser.add_argument('--ckpt_pre_trained', type=str, default='', help='PreTrained model')
|
||||
|
||||
# train
|
||||
parser.add_argument('--device_target', type=str, default='Ascend', choices=['Ascend', 'CPU'],
|
||||
parser.add_argument('--device_target', type=str, default='Ascend', choices=['Ascend', 'GPU', 'CPU'],
|
||||
help='device where the code will be implemented. (Default: Ascend)')
|
||||
parser.add_argument('--device_id', type=int, default=0, help='device id')
|
||||
parser.add_argument('--is_distributed', action='store_true', help='distributed training')
|
||||
parser.add_argument('--rank', type=int, default=0, help='local rank of distributed')
|
||||
parser.add_argument('--group_size', type=int, default=1, help='world size of distributed')
|
||||
|
@ -99,17 +100,16 @@ def parse_args():
|
|||
def train():
|
||||
"""train"""
|
||||
args = parse_args()
|
||||
if args.device_target == "CPU":
|
||||
context.set_context(mode=context.GRAPH_MODE, save_graphs=False, device_target="CPU")
|
||||
else:
|
||||
context.set_context(mode=context.GRAPH_MODE, enable_auto_mixed_precision=True, save_graphs=False,
|
||||
device_target="Ascend", device_id=int(os.getenv('DEVICE_ID')))
|
||||
context.set_context(mode=context.GRAPH_MODE, save_graphs=False, device_target=args.device_target)
|
||||
if args.device_target != "CPU":
|
||||
context.set_context(enable_auto_mixed_precision=True, device_id=args.device_id)
|
||||
|
||||
# init multicards training
|
||||
if args.modelArts_mode:
|
||||
import moxing as mox
|
||||
local_data_url = '/cache/data'
|
||||
local_train_url = '/cache/ckpt'
|
||||
device_id = int(os.getenv('DEVICE_ID'))
|
||||
device_id = args.device_id
|
||||
device_num = int(os.getenv('RANK_SIZE'))
|
||||
if device_num > 1:
|
||||
init()
|
||||
|
@ -169,7 +169,15 @@ def train():
|
|||
# load pretrained model
|
||||
if args.ckpt_pre_trained or args.pretrainedmodel_filename:
|
||||
param_dict = load_checkpoint(ckpt_file)
|
||||
load_param_into_net(train_net, param_dict)
|
||||
if args.model == 'DeepLabV3plus_s16':
|
||||
trans_param_dict = {}
|
||||
for key, val in param_dict.items():
|
||||
key = key.replace("down_sample_layer", "downsample")
|
||||
trans_param_dict[f"network.resnet.{key}"] = val
|
||||
load_param_into_net(train_net, trans_param_dict)
|
||||
else:
|
||||
load_param_into_net(train_net, param_dict)
|
||||
print('load_model {} success'.format(args.ckpt_pre_trained))
|
||||
|
||||
# optimizer
|
||||
iters_per_epoch = dataset.get_dataset_size()
|
||||
|
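The hunk above renames checkpoint keys before loading them when the model is `DeepLabV3plus_s16`: `down_sample_layer` becomes `downsample`, and every key gains the `network.resnet.` prefix. The same transformation as a standalone function, shown on a plain dict so it runs without MindSpore (`remap_backbone_keys` and the sample key are hypothetical, chosen only for illustration):

```python
def remap_backbone_keys(param_dict, prefix="network.resnet."):
    """Return a new dict whose keys match the training network's naming.

    Mirrors the loop in the diff: rename the downsample layers, then
    prepend the prefix under which the backbone lives in the network.
    """
    remapped = {}
    for key, val in param_dict.items():
        new_key = key.replace("down_sample_layer", "downsample")
        remapped[f"{prefix}{new_key}"] = val
    return remapped


# Placeholder values stand in for the Parameter objects a checkpoint holds.
pretrained = {"layer1.0.down_sample_layer.0.weight": "w0", "conv1.weight": "w1"}
renamed = remap_backbone_keys(pretrained)
```

Building a new dict rather than mutating `param_dict` in place avoids changing a dictionary while iterating over it.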
@ -188,7 +196,7 @@ def train():
|
|||
|
||||
# loss scale
|
||||
manager_loss_scale = FixedLossScaleManager(args.loss_scale, drop_overflow_update=False)
|
||||
amp_level = "O0" if args.device_target == "CPU" else "O3"
|
||||
amp_level = "O0" if args.device_target != "Ascend" else "O3"
|
||||
model = Model(train_net, optimizer=opt, amp_level=amp_level, loss_scale_manager=manager_loss_scale)
|
||||
|
||||
# callback for saving ckpts
|
||||
|
|
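The last hunk changes which targets use mixed precision: `O3` is now selected only for Ascend, so GPU and CPU both fall back to `O0`. Restated as a small function (`select_amp_level` is a hypothetical name used only for illustration):

```python
def select_amp_level(device_target: str) -> str:
    """Return the amp_level chosen by the new line in the diff."""
    # Before the change, every non-CPU target (including GPU) would get "O3".
    return "O3" if device_target == "Ascend" else "O0"


levels = {target: select_amp_level(target) for target in ("Ascend", "GPU", "CPU")}
```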