modify README by adding GPU usage
parent 191b3f0c8c
commit 71a7102573
@@ -47,7 +47,7 @@ Dataset used: [COCO2017](<https://cocodataset.org/>)

 # Environment Requirements

-- Hardware(Ascend)
+- Hardware(Ascend/GPU)
 - Prepare hardware environment with Ascend processor. If you want to try Ascend, please send the [application form](https://obs-9be7.obs.cn-east-2.myhuaweicloud.com/file/other/Ascend%20Model%20Zoo%E4%BD%93%E9%AA%8C%E8%B5%84%E6%BA%90%E7%94%B3%E8%AF%B7%E8%A1%A8.docx) to ascend@huawei.com. Once approved, you can get the resources.

 - Docker base image
@@ -96,9 +96,13 @@ Dataset used: [COCO2017](<https://cocodataset.org/>)

 After installing MindSpore via the official website, you can start training and evaluation as follows:

-Note: 1.the first run will generate the mindeocrd file, which will take a long time.
-2.pretrained model is a resnet50 checkpoint that trained over ImageNet2012.you can train it with [resnet50](https://gitee.com/qujianwei/mindspore/tree/master/model_zoo/official/cv/resnet) scripts in modelzoo, and use src/convert_checkpoint.py to get the pretrain model.
-3.BACKBONE_MODEL is a checkpoint file trained with [resnet50](https://gitee.com/qujianwei/mindspore/tree/master/model_zoo/official/cv/resnet) scripts in modelzoo.PRETRAINED_MODEL is a checkpoint file after convert.VALIDATION_JSON_FILE is label file. CHECKPOINT_PATH is a checkpoint file after trained.
+Note:
+
+1. The first run generates the MindRecord file, which takes a long time.
+2. The pretrained model is a ResNet-50 checkpoint trained on ImageNet2012. You can train it with the [resnet50](https://gitee.com/qujianwei/mindspore/tree/master/model_zoo/official/cv/resnet) scripts in ModelZoo, then use src/convert_checkpoint.py to convert it into the pretrained model.
+3. BACKBONE_MODEL is a checkpoint file trained with the [resnet50](https://gitee.com/qujianwei/mindspore/tree/master/model_zoo/official/cv/resnet) scripts in ModelZoo. PRETRAINED_MODEL is the checkpoint file produced by the conversion. VALIDATION_JSON_FILE is the label (annotation) file. CHECKPOINT_PATH is a checkpoint file produced by training.
+
+## Run on Ascend

 ```shell

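The notes above define four placeholders (BACKBONE_MODEL, PRETRAINED_MODEL, VALIDATION_JSON_FILE, CHECKPOINT_PATH) that must each point to an existing file, and they are passed to different scripts. A small preflight check can catch a wrong path before a long run starts. This is a hypothetical sketch, not part of the repository; the function name `require_files` and its NAME=PATH argument style are illustrative assumptions.

```shell
#!/bin/bash
# Hypothetical helper (not part of the repository): verify that each
# NAME=PATH pair, e.g. PRETRAINED_MODEL=/path/to/model.ckpt, refers to an
# existing regular file. Reports every missing file, then returns non-zero
# if any check failed.
require_files() {
    local pair name path status=0
    for pair in "$@"; do
        name="${pair%%=*}"   # text before the first '='
        path="${pair#*=}"    # text after the first '='
        if [ ! -f "$path" ]; then
            echo "error: $name=$path is not a file" >&2
            status=1
        fi
    done
    return "$status"
}
```

For example, `require_files "VALIDATION_JSON_FILE=$json" "CHECKPOINT_PATH=$ckpt" && sh run_eval_gpu.sh "$json" "$ckpt"`, where `$json` and `$ckpt` hold your own paths, only starts evaluation when both files exist.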
@@ -118,7 +122,25 @@ sh run_eval_ascend.sh [VALIDATION_JSON_FILE] [CHECKPOINT_PATH]
 sh run_infer_310.sh [AIR_PATH] [DATA_PATH] [ANN_FILE_PATH]
 ```

-# Run in docker
+## Run on GPU
+
+```shell
+
+# convert checkpoint
+python convert_checkpoint.py --ckpt_file=[BACKBONE_MODEL]
+
+# standalone training
+sh run_standalone_train_gpu.sh [PRETRAINED_MODEL]
+
+# distributed training
+sh run_distribute_train_gpu.sh [DEVICE_NUM] [PRETRAINED_MODEL]
+
+# eval
+sh run_eval_gpu.sh [VALIDATION_JSON_FILE] [CHECKPOINT_PATH]
+
+```
+
+## Run in docker

 1. Build docker images

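`run_distribute_train_gpu.sh` takes [DEVICE_NUM] as its first argument. Before picking a value, it can help to confirm how many GPUs the machine exposes. The sketch below is a hypothetical helper, not part of the repository: it counts the lines printed by `nvidia-smi -L` (one `GPU <index>: <name> (UUID: ...)` line per device) and rejects a DEVICE_NUM outside 1..N. The function names are illustrative.

```shell
#!/bin/bash
# Count GPUs in `nvidia-smi -L` output read from stdin. Each device appears
# on its own line starting with "GPU <index>:". grep -c prints 0 and exits
# non-zero when nothing matches, so "|| true" keeps the status clean.
count_gpus() {
    grep -c '^GPU [0-9]' || true
}

# Hypothetical guard: succeed only if 1 <= requested <= available.
check_device_num() {
    local requested="$1" available="$2"
    if [ "$requested" -lt 1 ] || [ "$requested" -gt "$available" ]; then
        echo "error: DEVICE_NUM=$requested, but $available GPU(s) are listed" >&2
        return 1
    fi
}
```

Usage, assuming the NVIDIA driver tools are installed: `check_device_num 8 "$(nvidia-smi -L | count_gpus)" && sh run_distribute_train_gpu.sh 8 [PRETRAINED_MODEL]`. Reading from stdin keeps `count_gpus` testable against saved output.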
@@ -169,9 +191,12 @@ sh run_infer_310.sh [AIR_PATH] [DATA_PATH] [ANN_FILE_PATH]
 ├─ascend310_infer //application for 310 inference
 ├─scripts
 ├─run_standalone_train_ascend.sh // shell script for standalone on ascend
+├─run_standalone_train_gpu.sh // shell script for standalone on GPU
 ├─run_distribute_train_ascend.sh // shell script for distributed on ascend
+├─run_distribute_train_gpu.sh // shell script for distributed on GPU
 ├─run_infer_310.sh // shell script for 310 inference
 └─run_eval_ascend.sh // shell script for eval on ascend
+└─run_eval_gpu.sh // shell script for eval on GPU
 ├─src
 ├─FasterRcnn
 ├─__init__.py // init file
@@ -201,6 +226,8 @@ sh run_infer_310.sh [AIR_PATH] [DATA_PATH] [ANN_FILE_PATH]

 ### Usage

+#### on Ascend
+
 ```shell
 # standalone training on ascend
 sh run_standalone_train_ascend.sh [PRETRAINED_MODEL]
@@ -209,6 +236,16 @@ sh run_standalone_train_ascend.sh [PRETRAINED_MODEL]
 sh run_distribute_train_ascend.sh [RANK_TABLE_FILE] [PRETRAINED_MODEL]
 ```

+#### on GPU
+
+```shell
+# standalone training on GPU
+sh run_standalone_train_gpu.sh [PRETRAINED_MODEL]
+
+# distributed training on GPU
+sh run_distribute_train_gpu.sh [DEVICE_NUM] [PRETRAINED_MODEL]
+```
+
 Notes:

 1. Rank_table.json which is specified by RANK_TABLE_FILE is needed when you are running a distribute task. You can generate it by using the [hccl_tools](https://gitee.com/mindspore/mindspore/tree/master/model_zoo/utils/hccl_tools).
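The new `run_standalone_train_gpu.sh` starts `train.py` in the background (`&> log &`) from inside a freshly created `train` directory, so the command returns immediately while training continues. A hypothetical helper for summarizing progress: it extracts the epoch, step, `rpn_loss` and `rcnn_loss` from lines shaped like the training-log line quoted in the next hunk header (`epoch: 12 step: 7393, rpn_loss: 0.00691, rcnn_loss: 0.10168, ...`). Which file receives these loss lines (`train/log` or a separate file written by `train.py`) is not shown in this commit, so the path in the usage note is an assumption.

```shell
#!/bin/bash
# Print "epoch step rpn_loss rcnn_loss" for each line shaped like
#   epoch: 12 step: 7393, rpn_loss: 0.00691, rcnn_loss: 0.10168, ...
# Reads the files given as arguments, or stdin when there are none.
extract_losses() {
    awk '
        /^epoch: [0-9]+ step: [0-9]+,/ {
            epoch = $2
            step = $4; sub(/,$/, "", step)        # "7393," -> "7393"
            rpn = ""; rcnn = ""
            for (i = 5; i < NF; i++) {
                # exact field match, so "rpn_cls_loss:" is not mistaken for "rpn_loss:"
                if ($i == "rpn_loss:")  { rpn = $(i + 1);  sub(/,$/, "", rpn) }
                if ($i == "rcnn_loss:") { rcnn = $(i + 1); sub(/,$/, "", rcnn) }
            }
            print epoch, step, rpn, rcnn
        }
    ' "$@"
}
```

For example, `extract_losses train/log | tail -n 5` shows the five most recent loss records, if the loss lines land in that file. Lines that do not match the pattern are skipped.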
@@ -259,11 +296,20 @@ epoch: 12 step: 7393, rpn_loss: 0.00691, rcnn_loss: 0.10168, rpn_cls_loss: 0.005

 ### Usage

+#### on Ascend
+
 ```shell
 # eval on ascend
 sh run_eval_ascend.sh [VALIDATION_JSON_FILE] [CHECKPOINT_PATH]
 ```

+#### on GPU
+
+```shell
+# eval on GPU
+sh run_eval_gpu.sh [VALIDATION_JSON_FILE] [CHECKPOINT_PATH]
+```
+
 > checkpoint can be produced in training process.
 >
 > Images size in dataset should be equal to the annotation size in VALIDATION_JSON_FILE, otherwise the evaluation result cannot be displayed properly.
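The evaluation note above asks that the images in the dataset agree with VALIDATION_JSON_FILE. A COCO-format annotation file keeps one entry per image in its top-level `images` list, so a rough check is to compare that list's length with the number of `.jpg` files in the image directory. The helper names are hypothetical, and the layout (all images directly inside one folder, such as COCO2017's `val2017`) is an assumption; `python3` is used only to parse the JSON.

```shell
#!/bin/bash
# Number of entries in the top-level "images" list of a COCO annotation file.
count_json_images() {
    python3 -c 'import json, sys; print(len(json.load(open(sys.argv[1]))["images"]))' "$1"
}

# Number of .jpg files directly inside a directory (no recursion).
count_jpg_files() {
    find "$1" -maxdepth 1 -type f -name '*.jpg' | wc -l
}

# Hypothetical pre-evaluation check: succeed only if the two counts agree.
check_eval_inputs() {
    local json_file="$1" image_dir="$2" n_json n_files
    n_json=$(count_json_images "$json_file") || return 1
    n_files=$(count_jpg_files "$image_dir")
    if [ "$n_json" -ne "$n_files" ]; then
        echo "mismatch: $n_json images in $json_file, $n_files .jpg files in $image_dir" >&2
        return 1
    fi
    echo "ok: $n_json images"
}
```

Usage: `check_eval_inputs [VALIDATION_JSON_FILE] /path/to/val2017` before `sh run_eval_gpu.sh ...`. Matching counts do not prove the file names line up; it is only a quick guard.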
@@ -331,34 +377,34 @@ Inference result is saved in current path, you can find result like this in acc.

 ### Evaluation Performance

-| Parameters | Ascend |
-| -------------------------- | ----------------------------------------------------------- |
-| Model Version | V1 |
-| Resource | Ascend 910 ;CPU 2.60GHz,192cores;Memory,755G |
-| uploaded Date | 08/31/2020 (month/day/year) |
-| MindSpore Version | 1.0.0 |
-| Dataset | COCO2017 |
-| Training Parameters | epoch=12, batch_size=2 |
-| Optimizer | SGD |
-| Loss Function | Softmax Cross Entropy ,Sigmoid Cross Entropy,SmoothL1Loss |
-| Speed | 1pc: 190 ms/step; 8pcs: 200 ms/step |
-| Total time | 1pc: 37.17 hours; 8pcs: 4.89 hours |
-| Parameters (M) | 250 |
-| Scripts | [fasterrcnn script](https://gitee.com/mindspore/mindspore/tree/master/model_zoo/official/cv/faster_rcnn) |
+| Parameters | Ascend | GPU |
+| -------------------------- | ----------------------------------------------------------- | ----------------------------------------------------------- |
+| Model Version | V1 | V1 |
+| Resource | Ascend 910; CPU 2.60GHz, 192 cores; Memory 755G | V100-PCIE 32G |
+| Uploaded Date | 08/31/2020 (month/day/year) | 02/10/2021 (month/day/year) |
+| MindSpore Version | 1.0.0 | 1.2.0 |
+| Dataset | COCO2017 | COCO2017 |
+| Training Parameters | epoch=12, batch_size=2 | epoch=12, batch_size=2 |
+| Optimizer | SGD | SGD |
+| Loss Function | Softmax Cross Entropy, Sigmoid Cross Entropy, SmoothL1Loss | Softmax Cross Entropy, Sigmoid Cross Entropy, SmoothL1Loss |
+| Speed | 1pc: 190 ms/step; 8pcs: 200 ms/step | 1pc: 320 ms/step; 8pcs: 335 ms/step |
+| Total time | 1pc: 37.17 hours; 8pcs: 4.89 hours | 1pc: 63.09 hours; 8pcs: 8.25 hours |
+| Parameters (M) | 250 | 250 |
+| Scripts | [fasterrcnn script](https://gitee.com/mindspore/mindspore/tree/master/model_zoo/official/cv/faster_rcnn) | [fasterrcnn script](https://gitee.com/mindspore/mindspore/tree/master/model_zoo/official/cv/faster_rcnn) |

 ### Inference Performance

-| Parameters | Ascend |
-| ------------------- | --------------------------- |
-| Model Version | V1 |
-| Resource | Ascend 910 |
-| Uploaded Date | 08/31/2020 (month/day/year) |
-| MindSpore Version | 1.0.0 |
-| Dataset | COCO2017 |
-| batch_size | 2 |
-| outputs | mAP |
-| Accuracy | IoU=0.50: 57.6% |
-| Model for inference | 250M (.ckpt file) |
+| Parameters | Ascend | GPU |
+| ------------------- | --------------------------- | --------------------------- |
+| Model Version | V1 | V1 |
+| Resource | Ascend 910 | GPU |
+| Uploaded Date | 08/31/2020 (month/day/year) | 02/10/2021 (month/day/year) |
+| MindSpore Version | 1.0.0 | 1.2.0 |
+| Dataset | COCO2017 | COCO2017 |
+| batch_size | 2 | 2 |
+| outputs | mAP | mAP |
+| Accuracy | IoU=0.50: 58.6% | IoU=0.50: 59.1% |
+| Model for inference | 250M (.ckpt file) | 250M (.ckpt file) |

 # [ModelZoo Homepage](#contents)

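The Total time row can be cross-checked against the Speed row with total hours ≈ steps per epoch × epochs × ms per step / 1000 / 3600. The training-log line quoted in an earlier hunk header reports step 7393 at epoch 12; assuming 7393 is the per-epoch step count of an 8-device run, the estimate lands close to the reported 8-device times. `estimate_hours` is an illustrative helper, not part of the repository.

```shell
#!/bin/bash
# Estimated wall-clock training time in hours, printed with two decimals:
#   hours = steps_per_epoch * epochs * ms_per_step / 1000 / 3600
# Arguments: steps_per_epoch epochs ms_per_step
estimate_hours() {
    awk -v steps="$1" -v epochs="$2" -v ms="$3" \
        'BEGIN { printf "%.2f\n", steps * epochs * ms / 1000 / 3600 }'
}
```

Under that assumption, `estimate_hours 7393 12 200` prints `4.93` (table: 4.89 hours, Ascend, 8pcs) and `estimate_hours 7393 12 335` prints `8.26` (table: 8.25 hours, GPU, 8pcs).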
@@ -48,7 +48,7 @@ Faster R-CNN is a two-stage object detection network. The network uses an RPN, which can

 # Environment Requirements

-- Hardware (Ascend)
+- Hardware (Ascend/GPU)
 - Set up the hardware environment with Ascend processors. To try Ascend processors, send the [application form](https://obs-9be7.obs.cn-east-2.myhuaweicloud.com/file/other/Ascend%20Model%20Zoo%E4%BD%93%E9%AA%8C%E8%B5%84%E6%BA%90%E7%94%B3%E8%AF%B7%E8%A1%A8.docx) to ascend@huawei.com; once the application is approved, you can obtain the resources.

 - Obtain the base image
@@ -103,6 +103,8 @@ Faster R-CNN is a two-stage object detection network. The network uses an RPN, which can
 2. The pretrained model is a ResNet-50 checkpoint trained on ImageNet2012. You can train it with the [resnet50](https://gitee.com/qujianwei/mindspore/tree/master/model_zoo/official/cv/resnet) scripts in ModelZoo, then use src/convert_checkpoint.py to convert the trained ResNet-50 weight file into a loadable weight file.
 3. BACKBONE_MODEL is trained with the [resnet50](https://gitee.com/qujianwei/mindspore/tree/master/model_zoo/official/cv/resnet) scripts in ModelZoo. PRETRAINED_MODEL is the converted weight file. VALIDATION_JSON_FILE is the label file. CHECKPOINT_PATH is the checkpoint file produced by training.

+## Run on Ascend
+
 ```shell

 # convert weight file
@@ -121,7 +123,25 @@ sh run_eval_ascend.sh [VALIDATION_JSON_FILE] [CHECKPOINT_PATH]
 sh run_infer_310.sh [AIR_PATH] [DATA_PATH] [ANN_FILE_PATH] [DEVICE_ID]
 ```

-# Run in docker
+## Run on GPU
+
+```shell
+
+# convert weight file
+python convert_checkpoint.py --ckpt_file=[BACKBONE_MODEL]
+
+# standalone training
+sh run_standalone_train_gpu.sh [PRETRAINED_MODEL]
+
+# distributed training
+sh run_distribute_train_gpu.sh [DEVICE_NUM] [PRETRAINED_MODEL]
+
+# evaluation
+sh run_eval_gpu.sh [VALIDATION_JSON_FILE] [CHECKPOINT_PATH]
+
+```
+
+## Run in docker

 1. Build the image

@@ -172,9 +192,12 @@ sh run_infer_310.sh [AIR_PATH] [DATA_PATH] [ANN_FILE_PATH] [DEVICE_ID]
 ├─ascend310_infer //source code for Ascend 310 inference
 ├─scripts
 ├─run_standalone_train_ascend.sh // shell script for standalone training on Ascend
+├─run_standalone_train_gpu.sh // shell script for standalone training on GPU
 ├─run_distribute_train_ascend.sh // shell script for distributed training on Ascend
+├─run_distribute_train_gpu.sh // shell script for distributed training on GPU
 ├─run_infer_310.sh // shell script for inference on Ascend
 └─run_eval_ascend.sh // shell script for evaluation on Ascend
+└─run_eval_gpu.sh // shell script for evaluation on GPU
 ├─src
 ├─FasterRcnn
 ├─__init__.py // init file
@@ -204,6 +227,8 @@ sh run_infer_310.sh [AIR_PATH] [DATA_PATH] [ANN_FILE_PATH] [DEVICE_ID]

 ### Usage

+#### Run on Ascend
+
 ```shell
 # standalone training on Ascend
 sh run_standalone_train_ascend.sh [PRETRAINED_MODEL]
@@ -212,6 +237,16 @@ sh run_standalone_train_ascend.sh [PRETRAINED_MODEL]
 sh run_distribute_train_ascend.sh [RANK_TABLE_FILE] [PRETRAINED_MODEL]
 ```

+#### Run on GPU
+
+```shell
+# standalone training on GPU
+sh run_standalone_train_gpu.sh [PRETRAINED_MODEL]
+
+# distributed training on GPU
+sh run_distribute_train_gpu.sh [DEVICE_NUM] [PRETRAINED_MODEL]
+```
+
 Notes:

 1. Running a distributed task requires the rank_table.json specified by RANK_TABLE_FILE. You can generate it with [hccl_tools](https://gitee.com/mindspore/mindspore/tree/master/model_zoo/utils/hccl_tools).
@@ -262,11 +297,20 @@ epoch: 12 step: 7393, rpn_loss: 0.00691, rcnn_loss: 0.10168, rpn_cls_loss: 0.005

 ### Usage

+#### Run on Ascend
+
 ```shell
 # evaluation on Ascend
 sh run_eval_ascend.sh [VALIDATION_JSON_FILE] [CHECKPOINT_PATH]
 ```

+#### Run on GPU
+
+```shell
+# evaluation on GPU
+sh run_eval_gpu.sh [VALIDATION_JSON_FILE] [CHECKPOINT_PATH]
+```
+
 > Checkpoints are generated during training.
 >
 > The number of images in the dataset must match the number of annotations in VALIDATION_JSON_FILE; otherwise, the accuracy results may be displayed abnormally.
@@ -334,34 +378,34 @@ sh run_infer_310.sh [AIR_PATH] [DATA_PATH] [ANN_FILE_PATH] [DEVICE_ID]

 ### Training Performance

-| Parameters | Ascend |
-| -------------------------- | ----------------------------------------------------------- |
-| Model Version | V1 |
-| Resource | Ascend 910; CPU 2.60GHz, 192 cores; Memory: 755G |
-| Uploaded Date | 2020/8/31 |
-| MindSpore Version | 1.0.0 |
-| Dataset | COCO 2017 |
-| Training Parameters | epoch=12, batch_size=2 |
-| Optimizer | SGD |
-| Loss Function | Softmax cross entropy, Sigmoid cross entropy, SmoothL1Loss |
-| Speed | 1 card: 190 ms/step; 8 cards: 200 ms/step |
-| Total Time | 1 card: 37.17 hours; 8 cards: 4.89 hours |
-| Parameters (M) | 250 |
-| Scripts | [Faster R-CNN script](https://gitee.com/mindspore/mindspore/tree/master/model_zoo/official/cv/faster_rcnn) |
+| Parameters | Ascend | GPU |
+| -------------------------- | ----------------------------------------------------------- | ----------------------------------------------------------- |
+| Model Version | V1 | V1 |
+| Resource | Ascend 910; CPU 2.60GHz, 192 cores; Memory: 755G | V100-PCIE 32G |
+| Uploaded Date | 2020/8/31 | 2021/2/10 |
+| MindSpore Version | 1.0.0 | 1.2.0 |
+| Dataset | COCO 2017 | COCO 2017 |
+| Training Parameters | epoch=12, batch_size=2 | epoch=12, batch_size=2 |
+| Optimizer | SGD | SGD |
+| Loss Function | Softmax cross entropy, Sigmoid cross entropy, SmoothL1Loss | Softmax cross entropy, Sigmoid cross entropy, SmoothL1Loss |
+| Speed | 1 card: 190 ms/step; 8 cards: 200 ms/step | 1 card: 320 ms/step; 8 cards: 335 ms/step |
+| Total Time | 1 card: 37.17 hours; 8 cards: 4.89 hours | 1 card: 63.09 hours; 8 cards: 8.25 hours |
+| Parameters (M) | 250 | 250 |
+| Scripts | [Faster R-CNN script](https://gitee.com/mindspore/mindspore/tree/master/model_zoo/official/cv/faster_rcnn) | [Faster R-CNN script](https://gitee.com/mindspore/mindspore/tree/master/model_zoo/official/cv/faster_rcnn) |

 ### Evaluation Performance

-| Parameters | Ascend |
-| ------------------- | --------------------------- |
-| Model Version | V1 |
-| Resource | Ascend 910 |
-| Uploaded Date | 2020/8/31 |
-| MindSpore Version | 1.0.0 |
-| Dataset | COCO2017 |
-| batch_size | 2 |
-| Outputs | mAP |
-| Accuracy | IoU=0.50: 57.6% |
-| Model for inference | 250M (.ckpt file) |
+| Parameters | Ascend | GPU |
+| ------------------- | --------------------------- | --------------------------- |
+| Model Version | V1 | V1 |
+| Resource | Ascend 910 | V100-PCIE 32G |
+| Uploaded Date | 2020/8/31 | 2021/2/10 |
+| MindSpore Version | 1.0.0 | 1.2.0 |
+| Dataset | COCO2017 | COCO2017 |
+| batch_size | 2 | 2 |
+| Outputs | mAP | mAP |
+| Accuracy | IoU=0.50: 58.6% | IoU=0.50: 59.1% |
+| Model for inference | 250M (.ckpt file) | 250M (.ckpt file) |

 # ModelZoo Homepage

@@ -0,0 +1,58 @@
+#!/bin/bash
+# Copyright 2021 Huawei Technologies Co., Ltd
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+# ============================================================================
+
+if [ $# -ne 1 ]
+then
+    echo "Usage: sh run_standalone_train_gpu.sh [PRETRAINED_PATH]"
+    exit 1
+fi
+
+get_real_path(){
+    if [ "${1:0:1}" == "/" ]; then
+        echo "$1"
+    else
+        echo "$(realpath -m "$PWD/$1")"
+    fi
+}
+
+PATH1=$(get_real_path "$1")
+echo "$PATH1"
+
+if [ ! -f "$PATH1" ]
+then
+    echo "error: PRETRAINED_PATH=$PATH1 is not a file"
+    exit 1
+fi
+
+ulimit -u unlimited
+export DEVICE_NUM=1
+export DEVICE_ID=0
+export RANK_ID=0
+export RANK_SIZE=1
+
+if [ -d "train" ];
+then
+    rm -rf ./train  # start from a clean working directory
+fi
+mkdir ./train
+cp ../*.py ./train
+cp *.sh ./train
+cp -r ../src ./train
+cd ./train || exit
+echo "start training for device $DEVICE_ID"
+env > env.log
+python train.py --device_id=$DEVICE_ID --pre_trained="$PATH1" --device_target="GPU" &> log &  # background job; output goes to ./train/log
+cd ..
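The new script resolves its [PRETRAINED_PATH] argument with `get_real_path`. Below is a lightly tidied copy, isolated so its behavior can be checked: absolute paths are echoed untouched (so `..` segments survive), while relative paths are joined to `$PWD` and normalized by GNU `realpath -m`, which does not require the target to exist. Dropping the redundant `echo "$(...)"` wrapper and quoting the argument are editorial changes, not the script's exact text.

```shell
#!/bin/bash
# Absolute path: returned as-is. Relative path: resolved against the current
# directory; `realpath -m` canonicalizes it even if components are missing.
# Requires bash (substring expansion ${1:0:1}) and GNU coreutils realpath.
get_real_path() {
    if [ "${1:0:1}" == "/" ]; then
        echo "$1"
    else
        realpath -m "$PWD/$1"
    fi
}
```

For example, `get_real_path /data/../ckpt/model.ckpt` prints the path unchanged, while `get_real_path ../ckpt/model.ckpt` prints a fully resolved absolute path. Because `${1:0:1}`, `==` inside `[` and `&>` are bash features, invoking the script as `sh run_standalone_train_gpu.sh ...` on a system where `/bin/sh` is dash (for example Debian or Ubuntu) bypasses the `#!/bin/bash` line and fails; `bash run_standalone_train_gpu.sh ...` avoids that.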