!17593 update readme

From: @huchunmei
Reviewed-by: @wuxuejian,@c_34
Signed-off-by: @wuxuejian
mindspore-ci-bot 2021-06-03 15:01:04 +08:00 committed by Gitee
commit 44dd7d994b
14 changed files with 895 additions and 93 deletions

View File

@ -114,10 +114,10 @@ Major parameters in train.py and config.py as follows:
#### Training
- running on Ascend
- Running on Ascend
```bash
python train.py --data_path cifar-10-batches-bin --ckpt_path ckpt > log 2>&1 &
python train.py --config_path default_config.yaml --data_path cifar-10-batches-bin --ckpt_path ckpt > log 2>&1 &
# or enter script dir, and run the script
sh run_standalone_train_ascend.sh cifar-10-batches-bin ckpt
```
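The hunk above adds a `--config_path` flag next to the existing command-line flags. The loader itself is not part of this diff, so the following is only a minimal sketch of one way YAML defaults and command-line overrides could be merged. `parse_flat_yaml` and `merge_config` are hypothetical names, and the parser handles flat `key: value` lines only (a real loader would more likely use a YAML library).

```python
import argparse


def parse_flat_yaml(text):
    """Parse flat `key: value` lines; comments and blank lines are skipped.

    Hypothetical helper for illustration: no nesting, no lists, and every
    value is kept as a string.
    """
    cfg = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()
        if not line or ":" not in line:
            continue
        key, value = line.split(":", 1)
        cfg[key.strip()] = value.strip().strip("'\"")
    return cfg


def merge_config(defaults, argv):
    """Expose every default as a --flag so the command line can override it."""
    parser = argparse.ArgumentParser()
    for key, value in defaults.items():
        parser.add_argument("--" + key, default=value)
    return vars(parser.parse_args(argv))
```

With this shape, a value given on the command line (for example `--device_target GPU`) replaces the file default, while untouched keys keep the value from the config file.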
@ -139,7 +139,7 @@ Major parameters in train.py and config.py as follows:
- running on GPU
```bash
python train.py --device_target "GPU" --data_path cifar-10-batches-bin --ckpt_path ckpt > log 2>&1 &
python train.py --config_path default_config.yaml --device_target "GPU" --data_path cifar-10-batches-bin --ckpt_path ckpt > log 2>&1 &
# or enter script dir, and run the script
sh run_standalone_train_for_gpu.sh cifar-10-batches-bin ckpt
```
@ -164,7 +164,7 @@ Before running the command below, please check the checkpoint path used for eval
- running on Ascend
```bash
python eval.py --data_path cifar-10-verify-bin --ckpt_path ckpt/checkpoint_alexnet-1_1562.ckpt > eval_log.txt 2>&1 &
python eval.py --config_path default_config.yaml --data_path cifar-10-verify-bin --ckpt_path ckpt/checkpoint_alexnet-1_1562.ckpt > eval_log.txt 2>&1 &
# or enter script dir, and run the script
sh run_standalone_eval_ascend.sh cifar-10-verify-bin ckpt/checkpoint_alexnet-1_1562.ckpt
```
@ -179,7 +179,7 @@ Before running the command below, please check the checkpoint path used for eval
- running on GPU
```bash
python eval.py --device_target "GPU" --data_path cifar-10-verify-bin --ckpt_path ckpt/checkpoint_alexnet-30_1562.ckpt > eval_log 2>&1 &
python eval.py --config_path default_config.yaml --device_target "GPU" --data_path cifar-10-verify-bin --ckpt_path ckpt/checkpoint_alexnet-30_1562.ckpt > eval_log 2>&1 &
# or enter script dir, and run the script
sh run_standalone_eval_for_gpu.sh cifar-10-verify-bin ckpt/checkpoint_alexnet-30_1562.ckpt
```
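The checkpoint names used in these commands (`checkpoint_alexnet-1_1562.ckpt`, `checkpoint_alexnet-30_1562.ckpt`) appear to follow a `<prefix>-<epoch>_<step>.ckpt` pattern. Assuming that pattern holds, a small helper can pick the most recent checkpoint before running evaluation. `parse_ckpt_name` and `latest_checkpoint` are hypothetical names, not part of the repository.

```python
import re

# Assumed naming pattern: <prefix>-<epoch>_<step>.ckpt
_CKPT_NAME = re.compile(r"^(?P<prefix>.+)-(?P<epoch>\d+)_(?P<step>\d+)\.ckpt$")


def parse_ckpt_name(filename):
    """Split a checkpoint file name into (prefix, epoch, step)."""
    match = _CKPT_NAME.match(filename)
    if match is None:
        raise ValueError("unexpected checkpoint name: " + filename)
    return match["prefix"], int(match["epoch"]), int(match["step"])


def latest_checkpoint(filenames):
    """Return the name with the highest (epoch, step) pair."""
    return max(filenames, key=lambda name: parse_ckpt_name(name)[1:])
```

Comparing the parsed integers avoids the trap of sorting names as strings, where epoch `30` would not reliably sort after epoch `4`.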
@ -196,7 +196,7 @@ Before running the command below, please check the checkpoint path used for eval
### [Export MindIR](#contents)
```shell
python export.py --ckpt_file [CKPT_PATH] --file_name [FILE_NAME] --file_format [FILE_FORMAT]
python export.py --config_path [CONFIG_PATH] --ckpt_file [CKPT_PATH] --file_name [FILE_NAME] --file_format [FILE_FORMAT]
```
The ckpt_file parameter is required,
@ -225,6 +225,70 @@ Inference result is saved in current path, you can find result like this in acc.
'acc': 0.88772
```
- Running on [ModelArts](https://support.huaweicloud.com/modelarts/)
```bash
# Train 8p with Ascend
# (1) Do either a or b.
#     a. Set "enable_modelarts=True" in default_config.yaml.
#        Set "distribute=True" in default_config.yaml.
#        Set "data_path='/cache/data'" in default_config.yaml.
#        Set "ckpt_path='/cache/train'" in default_config.yaml.
#        (Optional) Set "checkpoint_url='s3://dir_to_your_pretrained/'" in default_config.yaml.
#        Set any other parameters you need in default_config.yaml.
#     b. Add "enable_modelarts=True" in the web UI.
#        Add "distribute=True" in the web UI.
#        Add "data_path=/cache/data" in the web UI.
#        Add "ckpt_path=/cache/train" in the web UI.
#        (Optional) Add "checkpoint_url='s3://dir_to_your_pretrained/'" in the web UI.
#        Add any other parameters you need in the web UI.
# (2) Prepare the model code.
# (3) To fine-tune, upload or copy your pretrained model to the S3 bucket.
# (4) Upload the original cifar10 dataset to the S3 bucket.
# (5) Set the code directory to "/path/alexnet" in the web UI.
# (6) Set the startup file to "train.py" in the web UI.
# (7) Set "Dataset path", "Output file path", and "Job log path" in the web UI.
# (8) Create your job.
#
# Train 1p with Ascend
# (1) Do either a or b.
#     a. Set "enable_modelarts=True" in default_config.yaml.
#        Set "data_path='/cache/data'" in default_config.yaml.
#        Set "ckpt_path='/cache/train'" in default_config.yaml.
#        (Optional) Set "checkpoint_url='s3://dir_to_your_pretrained/'" in default_config.yaml.
#        Set any other parameters you need in default_config.yaml.
#     b. Add "enable_modelarts=True" in the web UI.
#        Add "data_path=/cache/data" in the web UI.
#        Add "ckpt_path=/cache/train" in the web UI.
#        (Optional) Add "checkpoint_url='s3://dir_to_your_pretrained/'" in the web UI.
#        Add any other parameters you need in the web UI.
# (2) Prepare the model code.
# (3) To fine-tune, upload or copy your pretrained model to the S3 bucket.
# (4) Upload the original cifar10 dataset to the S3 bucket.
# (5) Set the code directory to "/path/alexnet" in the web UI.
# (6) Set the startup file to "train.py" in the web UI.
# (7) Set "Dataset path", "Output file path", and "Job log path" in the web UI.
# (8) Create your job.
#
# Eval 1p with Ascend
# (1) Do either a or b.
#     a. Set "enable_modelarts=True" in default_config.yaml.
#        Set "data_path='/cache/data'" in default_config.yaml.
#        Set "ckpt_file='/cache/train/checkpoint_alexnet-30_1562.ckpt'" in default_config.yaml.
#        Set any other parameters you need in default_config.yaml.
#     b. Add "enable_modelarts=True" in the web UI.
#        Add "data_path=/cache/data" in the web UI.
#        Add "ckpt_file=/cache/train/checkpoint_alexnet-30_1562.ckpt" in the web UI.
#        Add any other parameters you need in the web UI.
# (2) Prepare the model code.
# (3) Upload or copy your trained model to the S3 bucket.
# (4) Upload the original cifar10 dataset to the S3 bucket.
# (5) Set the code directory to "/path/alexnet" in the web UI.
# (6) Set the startup file to "eval.py" in the web UI.
# (7) Set "Dataset path", "Output file path", and "Job log path" in the web UI.
# (8) Create your job.
```
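In the lists above, the 8p and 1p training jobs differ only in `distribute=True`, and `checkpoint_url` is optional in both. A rough sketch of assembling those settings in one place, so the two variants cannot drift apart, might look like this. The helper names are hypothetical, and the values are the ones quoted in the steps.

```python
def modelarts_train_settings(distributed, pretrained_url=None):
    """Collect the settings listed in step (1) for a training job.

    Hypothetical helper: `distributed` selects the 8p variant, and
    `pretrained_url` is the optional checkpoint_url for fine-tuning.
    """
    settings = {
        "enable_modelarts": True,
        "data_path": "/cache/data",
        "ckpt_path": "/cache/train",
    }
    if distributed:
        settings["distribute"] = True
    if pretrained_url is not None:
        settings["checkpoint_url"] = pretrained_url
    return settings


def as_ui_parameters(settings):
    """Render settings as key=value strings, the form used for option b."""
    return [f"{key}={value}" for key, value in settings.items()]
```

The same dictionary can be written into the config file for option a or rendered as `key=value` pairs for option b.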
## [Model Description](#contents)
### [Performance](#contents)

View File

@ -71,6 +71,70 @@ sh run_standalone_train_ascend.sh [DATA_PATH] [CKPT_SAVE_PATH]
sh run_standalone_eval_ascend.sh [DATA_PATH] [CKPT_NAME]
```
- Training on ModelArts (to run on ModelArts, see the [modelarts](https://support.huaweicloud.com/modelarts/) documentation)
```bash
# Train 8p on ModelArts
# (1) Do either a or b.
#     a. Set "enable_modelarts=True" in default_config.yaml.
#        Set "distribute=True" in default_config.yaml.
#        Set "data_path='/cache/data'" in default_config.yaml.
#        Set "ckpt_path='/cache/train'" in default_config.yaml.
#        (Optional) Set "checkpoint_url='s3://dir_to_your_pretrained/'" in default_config.yaml.
#        Set any other parameters in default_config.yaml.
#     b. Set "enable_modelarts=True" in the web UI.
#        Set "distribute=True" in the web UI.
#        Set "data_path='/cache/data'" in the web UI.
#        Set "ckpt_path='/cache/train'" in the web UI.
#        (Optional) Set "checkpoint_url='s3://dir_to_your_pretrained/'" in the web UI.
#        Set any other parameters in the web UI.
# (2) Prepare the model code.
# (3) To fine-tune your model, upload your pretrained model to the S3 bucket.
# (4) Upload the original cifar10 dataset to the S3 bucket.
# (5) Set the code directory to "/path/alexnet" in the web UI.
# (6) Set the startup file to "train.py" in the web UI.
# (7) Set "Training dataset", "Training output file path", "Job log path", etc. in the web UI.
# (8) Create the training job.
#
# Train 1p on ModelArts
# (1) Do either a or b.
#     a. Set "enable_modelarts=True" in default_config.yaml.
#        Set "data_path='/cache/data'" in default_config.yaml.
#        Set "ckpt_path='/cache/train'" in default_config.yaml.
#        (Optional) Set "checkpoint_url='s3://dir_to_your_pretrained/'" in default_config.yaml.
#        Set any other parameters in default_config.yaml.
#     b. Set "enable_modelarts=True" in the web UI.
#        Set "data_path='/cache/data'" in the web UI.
#        Set "ckpt_path='/cache/train'" in the web UI.
#        (Optional) Set "checkpoint_url='s3://dir_to_your_pretrained/'" in the web UI.
#        Set any other parameters in the web UI.
# (2) Prepare the model code.
# (3) To fine-tune your model, upload your pretrained model to the S3 bucket.
# (4) Upload the original cifar10 dataset to the S3 bucket.
# (5) Set the code directory to "/path/alexnet" in the web UI.
# (6) Set the startup file to "train.py" in the web UI.
# (7) Set "Training dataset", "Training output file path", "Job log path", etc. in the web UI.
# (8) Create the training job.
#
# Eval 1p on ModelArts
# (1) Do either a or b.
#     a. Set "enable_modelarts=True" in default_config.yaml.
#        Set "data_path='/cache/data'" in default_config.yaml.
#        Set "ckpt_file='/cache/train/checkpoint_alexnet-30_1562.ckpt'" in default_config.yaml.
#        Set any other parameters in default_config.yaml.
#     b. Set "enable_modelarts=True" in the web UI.
#        Set "data_path='/cache/data'" in the web UI.
#        Set "ckpt_file='/cache/train/checkpoint_alexnet-30_1562.ckpt'" in the web UI.
#        Set any other parameters in the web UI.
# (2) Prepare the model code.
# (3) Upload your trained model to the S3 bucket.
# (4) Upload the original cifar10 dataset to the S3 bucket.
# (5) Set the code directory to "/path/alexnet" in the web UI.
# (6) Set the startup file to "eval.py" in the web UI.
# (7) Set "Training dataset", "Training output file path", "Job log path", etc. in the web UI.
# (8) Create the training job.
```
## Script Description
### Scripts and Sample Code
@ -121,7 +185,7 @@ Major parameters in train.py and config.py are as follows
- Running on Ascend
```bash
python train.py --data_path cifar-10-batches-bin --ckpt_path ckpt > log 2>&1 &
python train.py --config_path default_config.yaml --data_path cifar-10-batches-bin --ckpt_path ckpt > log 2>&1 &
# or enter the script directory and run the script
sh run_standalone_train_ascend.sh cifar-10-batches-bin ckpt
```
@ -143,7 +207,7 @@ Major parameters in train.py and config.py are as follows
- Running on GPU
```bash
python train.py --device_target "GPU" --data_path cifar-10-batches-bin --ckpt_path ckpt > log 2>&1 &
python train.py --config_path default_config.yaml --device_target "GPU" --data_path cifar-10-batches-bin --ckpt_path ckpt > log 2>&1 &
# or enter the script directory and run the script
sh run_standalone_train_for_gpu.sh cifar-10-batches-bin ckpt
```
@ -168,7 +232,7 @@ Major parameters in train.py and config.py are as follows
- Running on Ascend
```bash
python eval.py --data_path cifar-10-verify-bin --ckpt_path ckpt/checkpoint_alexnet-1_1562.ckpt > eval_log.txt 2>&1 &
python eval.py --config_path default_config.yaml --data_path cifar-10-verify-bin --ckpt_path ckpt/checkpoint_alexnet-1_1562.ckpt > eval_log.txt 2>&1 &
# or enter the script directory and run the script
sh run_standalone_eval_ascend.sh cifar-10-verify-bin ckpt/checkpoint_alexnet-1_1562.ckpt
```
@ -183,7 +247,7 @@ Major parameters in train.py and config.py are as follows
- Running on GPU
```bash
python eval.py --device_target "GPU" --data_path cifar-10-verify-bin --ckpt_path ckpt/checkpoint_alexnet-30_1562.ckpt > eval_log 2>&1 &
python eval.py --config_path default_config.yaml --device_target "GPU" --data_path cifar-10-verify-bin --ckpt_path ckpt/checkpoint_alexnet-30_1562.ckpt > eval_log 2>&1 &
# or enter the script directory and run the script
sh run_standalone_eval_for_gpu.sh cifar-10-verify-bin ckpt/checkpoint_alexnet-30_1562.ckpt
```
@ -200,7 +264,7 @@ Major parameters in train.py and config.py are as follows
### [Export MindIR](#contents)
```shell
python export.py --ckpt_file [CKPT_PATH] --file_name [FILE_NAME] --file_format [FILE_FORMAT]
python export.py --config_path [CONFIG_PATH] --ckpt_file [CKPT_PATH] --file_name [FILE_NAME] --file_format [FILE_FORMAT]
```
The ckpt_file parameter is required

View File

@ -11,8 +11,8 @@ checkpoint_file: './checkpoint/checkpoint_alexnet-30_1562.ckpt'
device_target: Ascend
enable_profiling: False
ckpt_path: "/cache/data"
ckpt_file: "/cache/data/checkpoint_alexnet-30_1562.ckpt"
ckpt_path: "/cache/train"
ckpt_file: "/cache/train/checkpoint_alexnet-30_1562.ckpt"
# ==============================================================================
# Training options
epoch_size: 30

View File

@ -38,7 +38,6 @@ def modelarts_process():
@moxing_wrapper(pre_process=modelarts_process)
def eval_alexnet():
    print("============== Starting Testing ==============")
    device_num = get_device_num()
    if device_num > 1:
        # context.set_context(mode=context.GRAPH_MODE, device_target=config.device_target)
@ -53,8 +52,7 @@ def eval_alexnet():
network = AlexNet(config.num_classes, phase='test')
loss = nn.SoftmaxCrossEntropyWithLogits(sparse=True, reduction="mean")
opt = nn.Momentum(network.trainable_params(), config.learning_rate, config.momentum)
ds_eval = create_dataset_cifar10(config, config.data_path, config.batch_size, status="test", \
target=config.device_target)
ds_eval = create_dataset_cifar10(config, config.data_path, config.batch_size, target=config.device_target)
param_dict = load_checkpoint(config.ckpt_path)
print("load checkpoint from [{}].".format(config.ckpt_path))
load_param_into_net(network, param_dict)
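
The `@moxing_wrapper(pre_process=modelarts_process)` decorator in the hunk above is defined outside this diff. As a rough illustration only, a decorator with that call shape can be written as below. `run_with_hooks` is a hypothetical stand-in and makes no claim about what the real wrapper does beyond running a hook before the wrapped function; the `post_process` argument is an added assumption.

```python
import functools


def run_with_hooks(pre_process=None, post_process=None):
    """Build a decorator that runs optional hooks around a function.

    Hypothetical stand-in for a wrapper shaped like moxing_wrapper.
    """
    def decorator(func):
        @functools.wraps(func)
        def wrapped(*args, **kwargs):
            if pre_process is not None:
                pre_process()   # e.g. copy data from remote storage to /cache
            result = func(*args, **kwargs)
            if post_process is not None:
                post_process()  # e.g. upload outputs after the run
            return result
        return wrapped
    return decorator


calls = []


def prepare():
    calls.append("pre")


@run_with_hooks(pre_process=prepare)
def evaluate():
    calls.append("eval")
    return "done"
```

Calling `evaluate()` runs `prepare` first and then the function body, which is the ordering the `pre_process` name suggests.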

View File

@ -66,6 +66,98 @@ For FP16 operators, if the input data type is FP32, the backend of MindSpore wil
- [MindSpore Tutorials](https://www.mindspore.cn/tutorial/training/en/master/index.html)
- [MindSpore Python API](https://www.mindspore.cn/doc/api_python/en/master/index.html)
- Running on [ModelArts](https://support.huaweicloud.com/modelarts/)
```bash
# Train 8p with Ascend
# (1) Do either a or b.
#     a. Set "enable_modelarts=True" in default_config.yaml.
#        Set "distribute=True" in default_config.yaml.
#        Set "need_modelarts_dataset_unzip=True" in default_config.yaml.
#        Set "modelarts_dataset_unzip_name='ImageNet_Original'" in default_config.yaml.
#        Set "lr_init=0.00004" in default_config.yaml.
#        Set "dataset_path='/cache/data'" in default_config.yaml.
#        Set "epoch_size=250" in default_config.yaml.
#        (Optional) Set "checkpoint_url='s3://dir_to_your_pretrained/'" in default_config.yaml.
#        Set any other parameters you need in default_config.yaml.
#     b. Add "enable_modelarts=True" in the web UI.
#        Add "need_modelarts_dataset_unzip=True" in the web UI.
#        Add "modelarts_dataset_unzip_name='ImageNet_Original'" in the web UI.
#        Add "distribute=True" in the web UI.
#        Add "lr_init=0.00004" in the web UI.
#        Add "dataset_path=/cache/data" in the web UI.
#        Add "epoch_size=250" in the web UI.
#        (Optional) Add "checkpoint_url='s3://dir_to_your_pretrained/'" in the web UI.
#        Add any other parameters you need in the web UI.
# (2) Prepare the model code.
# (3) To fine-tune, upload or copy your pretrained model to the S3 bucket.
# (4) Do either a or b (a is recommended).
#     a. First, compress the MindRecord dataset into a single zip file.
#        Second, upload the zip file to the S3 bucket. (You can also upload the uncompressed MindRecord dataset, but this can be very slow.)
#     b. Upload the original coco dataset to the S3 bucket.
#        (The dataset is then converted during training, which takes a long time and is repeated on every training run.)
# (5) Set the code directory to "/path/inceptionv3" in the web UI.
# (6) Set the startup file to "train.py" in the web UI.
# (7) Set "Dataset path", "Output file path", and "Job log path" in the web UI.
# (8) Create your job.
#
# Train 1p with Ascend
# (1) Do either a or b.
#     a. Set "enable_modelarts=True" in default_config.yaml.
#        Set "need_modelarts_dataset_unzip=True" in default_config.yaml.
#        Set "modelarts_dataset_unzip_name='ImageNet_Original'" in default_config.yaml.
#        Set "dataset_path='/cache/data'" in default_config.yaml.
#        Set "epoch_size=250" in default_config.yaml.
#        (Optional) Set "checkpoint_url='s3://dir_to_your_pretrained/'" in default_config.yaml.
#        Set any other parameters you need in default_config.yaml.
#     b. Add "enable_modelarts=True" in the web UI.
#        Add "need_modelarts_dataset_unzip=True" in the web UI.
#        Add "modelarts_dataset_unzip_name='ImageNet_Original'" in the web UI.
#        Add "dataset_path='/cache/data'" in the web UI.
#        Add "epoch_size=250" in the web UI.
#        (Optional) Add "checkpoint_url='s3://dir_to_your_pretrained/'" in the web UI.
#        Add any other parameters you need in the web UI.
# (2) Prepare the model code.
# (3) To fine-tune, upload or copy your pretrained model to the S3 bucket.
# (4) Do either a or b (a is recommended).
#     a. First, compress the MindRecord dataset into a single zip file.
#        Second, upload the zip file to the S3 bucket. (You can also upload the uncompressed MindRecord dataset, but this can be very slow.)
#     b. Upload the original coco dataset to the S3 bucket.
#        (The dataset is then converted during training, which takes a long time and is repeated on every training run.)
# (5) Set the code directory to "/path/inceptionv3" in the web UI.
# (6) Set the startup file to "train.py" in the web UI.
# (7) Set "Dataset path", "Output file path", and "Job log path" in the web UI.
# (8) Create your job.
#
# Eval 1p with Ascend
# (1) Do either a or b.
#     a. Set "enable_modelarts=True" in default_config.yaml.
#        Set "need_modelarts_dataset_unzip=True" in default_config.yaml.
#        Set "modelarts_dataset_unzip_name='ImageNet_Original'" in default_config.yaml.
#        Set "checkpoint_url='s3://dir_to_your_trained_model/'" in default_config.yaml.
#        Set "checkpoint='./inceptionv3/inceptionv3-rank3_1-247_1251.ckpt'" in default_config.yaml.
#        Set "dataset_path='/cache/data'" in default_config.yaml.
#        Set any other parameters you need in default_config.yaml.
#     b. Add "enable_modelarts=True" in the web UI.
#        Add "need_modelarts_dataset_unzip=True" in the web UI.
#        Add "modelarts_dataset_unzip_name='ImageNet_Original'" in the web UI.
#        Add "checkpoint_url='s3://dir_to_your_trained_model/'" in the web UI.
#        Add "checkpoint='./inceptionv3/inceptionv3-rank3_1-247_1251.ckpt'" in the web UI.
#        Add "dataset_path='/cache/data'" in the web UI.
#        Add any other parameters you need in the web UI.
# (2) Prepare the model code.
# (3) Upload or copy your trained model to the S3 bucket.
# (4) Do either a or b (a is recommended).
#     a. First, compress the MindRecord dataset into a single zip file.
#        Second, upload the zip file to the S3 bucket. (You can also upload the uncompressed MindRecord dataset, but this can be very slow.)
#     b. Upload the original coco dataset to the S3 bucket.
#        (The dataset is then converted during training, which takes a long time and is repeated on every training run.)
# (5) Set the code directory to "/path/inceptionv3" in the web UI.
# (6) Set the startup file to "eval.py" in the web UI.
# (7) Set "Dataset path", "Output file path", and "Job log path" in the web UI.
# (8) Create your job.
```
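Step (4)a above asks for the dataset to be compressed into a single zip file before upload. A minimal sketch using only the standard library follows. It assumes the archive should be named after the dataset directory (so a directory called `ImageNet_Original` yields `ImageNet_Original.zip`, matching the `modelarts_dataset_unzip_name` value in the steps); `zip_dataset` is a hypothetical helper, and uploading to the bucket is left out.

```python
import shutil
from pathlib import Path


def zip_dataset(dataset_dir, output_dir):
    """Compress dataset_dir into <output_dir>/<dataset name>.zip.

    The top-level directory name is kept inside the archive, so unzipping
    recreates the same folder.
    """
    dataset_dir = Path(dataset_dir)
    base_name = Path(output_dir) / dataset_dir.name
    archive = shutil.make_archive(
        str(base_name),
        "zip",
        root_dir=str(dataset_dir.parent),
        base_dir=dataset_dir.name,
    )
    return Path(archive)
```

Keep `output_dir` outside the dataset directory, otherwise the archive would try to include itself.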
# [Script description](#contents)
## [Script and sample code](#contents)
@ -167,8 +259,8 @@ sh scripts/run_standalone_train_cpu.sh DATA_PATH
```python
# training example
python:
Ascend: python train.py --dataset_path DATA_PATH --platform Ascend
CPU: python train.py --dataset_path DATA_PATH --platform CPU
Ascend: python train.py --config_path CONFIG_FILE --dataset_path DATA_PATH --platform Ascend
CPU: python train.py --config_path CONFIG_FILE --dataset_path DATA_PATH --platform CPU
shell:
Ascend:
@ -229,8 +321,8 @@ You can start training using python or shell scripts. The usage of shell scripts
```python
# eval example
python:
Ascend: python eval.py --dataset_path DATA_PATH --checkpoint PATH_CHECKPOINT --platform Ascend
CPU: python eval.py --dataset_path DATA_PATH --checkpoint PATH_CHECKPOINT --platform CPU
Ascend: python eval.py --config_path CONFIG_FILE --dataset_path DATA_PATH --checkpoint PATH_CHECKPOINT --platform Ascend
CPU: python eval.py --config_path CONFIG_FILE --dataset_path DATA_PATH --checkpoint PATH_CHECKPOINT --platform CPU
shell:
Ascend: sh scripts/run_eval.sh DEVICE_ID DATA_PATH PATH_CHECKPOINT
@ -250,7 +342,7 @@ metric: {'Loss': 1.778, 'Top1-Acc':0.788, 'Top5-Acc':0.942}
## Model Export
```shell
python export.py --ckpt_file [CKPT_PATH] --device_target [DEVICE_TARGET] --file_format [EXPORT_FORMAT]
python export.py --config_path [CONFIG_FILE] --ckpt_file [CKPT_PATH] --device_target [DEVICE_TARGET] --file_format [EXPORT_FORMAT]
```
`EXPORT_FORMAT` should be in ["AIR", "MINDIR"]
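
The restriction on `EXPORT_FORMAT` can be enforced at argument-parsing time. The sketch below is not the repository's `export.py`. It only mirrors the four flags shown in the command, and the default values and the `required=True` on `--ckpt_file` are assumptions made for illustration.

```python
import argparse


def build_export_parser():
    """Argument parser mirroring the export command's flags (illustrative)."""
    parser = argparse.ArgumentParser(description="export sketch")
    parser.add_argument("--config_path", default="default_config.yaml")  # assumed default
    parser.add_argument("--ckpt_file", required=True)                    # assumed required
    parser.add_argument("--device_target", default="Ascend")             # assumed default
    parser.add_argument("--file_format", choices=["AIR", "MINDIR"], default="MINDIR")
    return parser
```

With `choices` set, argparse rejects any other format with a usage error before any export work starts.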
@ -290,7 +382,7 @@ accuracy:78.742
| MindSpore Version | 0.6.0-beta |
| Dataset | 1200k images |
| Batch_size | 128 |
| Training Parameters | src/config.py |
| Training Parameters | src/model_utils/default_config.yaml |
| Optimizer | RMSProp |
| Loss Function | SoftmaxCrossEntropy |
| Outputs | probability |

View File

@ -77,6 +77,98 @@ The overall network architecture of InceptionV3 is as follows
- [MindSpore Tutorials](https://www.mindspore.cn/tutorial/training/zh-CN/master/index.html)
- [MindSpore Python API](https://www.mindspore.cn/doc/api_python/zh-CN/master/index.html)
- Training on ModelArts (to run on ModelArts, see the [modelarts](https://support.huaweicloud.com/modelarts/) documentation)
```bash
# Train 8p on ModelArts
# (1) Do either a or b.
#     a. Set "enable_modelarts=True" in default_config.yaml.
#        Set "distribute=True" in default_config.yaml.
#        Set "need_modelarts_dataset_unzip=True" in default_config.yaml.
#        Set "modelarts_dataset_unzip_name='imagenet_original'" in default_config.yaml.
#        Set "lr_init=0.00004" in default_config.yaml.
#        Set "dataset_path='/cache/data'" in default_config.yaml.
#        Set "epoch_size=250" in default_config.yaml.
#        (Optional) Set "checkpoint_url='s3://dir_to_your_pretrained/'" in default_config.yaml.
#        Set any other parameters in default_config.yaml.
#     b. Set "enable_modelarts=True" in the web UI.
#        Set "need_modelarts_dataset_unzip=True" in the web UI.
#        Set "modelarts_dataset_unzip_name='imagenet_original'" in the web UI.
#        Set "distribute=True" in the web UI.
#        Set "lr_init=0.00004" in the web UI.
#        Set "dataset_path=/cache/data" in the web UI.
#        Set "epoch_size=250" in the web UI.
#        (Optional) Set "checkpoint_url='s3://dir_to_your_pretrained/'" in the web UI.
#        Set any other parameters in the web UI.
# (2) Prepare the model code.
# (3) To fine-tune your model, upload your pretrained model to the S3 bucket.
# (4) Do either a or b (a is recommended).
#     a. First, compress the dataset into a single ".zip" file.
#        Second, upload the compressed dataset to the S3 bucket. (You can also upload the uncompressed dataset, but that may be very slow.)
#     b. Upload the original coco dataset to the S3 bucket.
#        (Dataset conversion then happens during training and takes considerable time; it is repeated on every training run.)
# (5) Set the code directory to "/path/inceptionv3" in the web UI.
# (6) Set the startup file to "train.py" in the web UI.
# (7) Set "Training dataset", "Training output file path", "Job log path", etc. in the web UI.
# (8) Create the training job.
#
# Train 1p on ModelArts
# (1) Do either a or b.
#     a. Set "enable_modelarts=True" in default_config.yaml.
#        Set "need_modelarts_dataset_unzip=True" in default_config.yaml.
#        Set "modelarts_dataset_unzip_name='imagenet_original'" in default_config.yaml.
#        Set "dataset_path='/cache/data'" in default_config.yaml.
#        Set "epoch_size=250" in default_config.yaml.
#        (Optional) Set "checkpoint_url='s3://dir_to_your_pretrained/'" in default_config.yaml.
#        Set any other parameters in default_config.yaml.
#     b. Set "enable_modelarts=True" in the web UI.
#        Set "need_modelarts_dataset_unzip=True" in the web UI.
#        Set "modelarts_dataset_unzip_name='imagenet_original'" in the web UI.
#        Set "dataset_path='/cache/data'" in the web UI.
#        Set "epoch_size=250" in the web UI.
#        (Optional) Set "checkpoint_url='s3://dir_to_your_pretrained/'" in the web UI.
#        Set any other parameters in the web UI.
# (2) Prepare the model code.
# (3) To fine-tune your model, upload your pretrained model to the S3 bucket.
# (4) Do either a or b (a is recommended).
#     a. First, compress the dataset into a single ".zip" file.
#        Second, upload the compressed dataset to the S3 bucket. (You can also upload the uncompressed dataset, but that may be very slow.)
#     b. Upload the original coco dataset to the S3 bucket.
#        (Dataset conversion then happens during training and takes considerable time; it is repeated on every training run.)
# (5) Set the code directory to "/path/inceptionv3" in the web UI.
# (6) Set the startup file to "train.py" in the web UI.
# (7) Set "Training dataset", "Training output file path", "Job log path", etc. in the web UI.
# (8) Create the training job.
#
# Eval 1p on ModelArts
# (1) Do either a or b.
#     a. Set "enable_modelarts=True" in default_config.yaml.
#        Set "need_modelarts_dataset_unzip=True" in default_config.yaml.
#        Set "modelarts_dataset_unzip_name='imagenet_original'" in default_config.yaml.
#        Set "checkpoint_url='s3://dir_to_your_trained_model/'" in default_config.yaml.
#        Set "checkpoint='./inceptionv3/inceptionv3-rank3_1-247_1251.ckpt'" in default_config.yaml.
#        Set "dataset_path='/cache/data'" in default_config.yaml.
#        Set any other parameters in default_config.yaml.
#     b. Set "enable_modelarts=True" in the web UI.
#        Set "need_modelarts_dataset_unzip=True" in the web UI.
#        Set "modelarts_dataset_unzip_name='imagenet_original'" in the web UI.
#        Set "checkpoint_url='s3://dir_to_your_trained_model/'" in the web UI.
#        Set "checkpoint='./inceptionv3/inceptionv3-rank3_1-247_1251.ckpt'" in the web UI.
#        Set "dataset_path='/cache/data'" in the web UI.
#        Set any other parameters in the web UI.
# (2) Prepare the model code.
# (3) Upload your trained model to the S3 bucket.
# (4) Do either a or b (a is recommended).
#     a. First, compress the dataset into a single ".zip" file.
#        Second, upload the compressed dataset to the S3 bucket. (You can also upload the uncompressed dataset, but that may be very slow.)
#     b. Upload the original coco dataset to the S3 bucket.
#        (Dataset conversion then happens during training and takes considerable time; it is repeated on every training run.)
# (5) Set the code directory to "/path/inceptionv3" in the web UI.
# (6) Set the startup file to "eval.py" in the web UI.
# (7) Set "Training dataset", "Training output file path", "Job log path", etc. in the web UI.
# (8) Create the training job.
```
# Script Description
## Scripts and Sample Code
@ -172,8 +264,8 @@ Major parameters in train.py and config.py are as follows
``` launch
# training example
python:
Ascend: python train.py --dataset_path /dataset/train --platform Ascend
CPU: python train.py --dataset_path DATA_PATH --platform CPU
Ascend: python train.py --config_path default_config.yaml --dataset_path /dataset/train --platform Ascend
CPU: python train.py --config_path CONFIG_FILE --dataset_path DATA_PATH --platform CPU
shell:
Ascend:
@ -234,8 +326,8 @@ epoch time: 6358482.104 ms, per step time: 16303.800 ms
``` launch
# eval example
python:
Ascend: python eval.py --dataset_path DATA_DIR --checkpoint PATH_CHECKPOINT --platform Ascend
CPU: python eval.py --dataset_path DATA_PATH --checkpoint PATH_CHECKPOINT --platform CPU
Ascend: python eval.py --config_path CONFIG_FILE --dataset_path DATA_DIR --checkpoint PATH_CHECKPOINT --platform Ascend
CPU: python eval.py --config_path CONFIG_FILE --dataset_path DATA_PATH --checkpoint PATH_CHECKPOINT --platform CPU
shell:
Ascend: sh scripts/run_eval.sh DEVICE_ID DATA_DIR PATH_CHECKPOINT
@ -255,7 +347,7 @@ metric:{'Loss':1.778, 'Top1-Acc':0.788, 'Top5-Acc':0.942}
## Model Export
```shell
python export.py --ckpt_file [CKPT_PATH] --device_target [DEVICE_TARGET] --file_format [EXPORT_FORMAT]
python export.py --config_path [CONFIG_FILE] --ckpt_file [CKPT_PATH] --device_target [DEVICE_TARGET] --file_format [EXPORT_FORMAT]
```
`EXPORT_FORMAT` should be one of ["AIR", "MINDIR"]
@ -287,39 +379,39 @@ accuracy:78.742
### Training Performance
| Parameters                 | Ascend                                         |
| -------------------------- | ---------------------------------------------- |
| Model Version              | InceptionV3                                    |
| Resource                   | Ascend 910; CPU 2.60GHz, 192 cores; Memory 755G; OS Euler2.8 |
| Uploaded Date              | 2020-08-21                                     |
| MindSpore Version          | 0.6.0-beta                                     |
| Dataset                    | 1.2 million images                             |
| Batch_size                 | 128                                            |
| Training Parameters        | src/config.py                                  |
| Optimizer                  | RMSProp                                        |
| Loss Function              | Softmax cross entropy                          |
| Outputs                    | probability                                    |
| Loss                       | 1.98                                           |
| Total time (8p)            | 11 hours                                       |
| Params (M)                 | 103M                                           |
| Checkpoint for fine-tuning | 313M                                           |
| Speed                      | 1p: 1050 img/s; 8p: 8000 img/s                 |
| Scripts                    | [inceptionv3 script](https://gitee.com/mindspore/mindspore/tree/master/model_zoo/official/cv/inceptionv3) |
| Parameters                 | Ascend                                                  |
| -------------------------- | ------------------------------------------------------- |
| Model Version              | InceptionV3                                             |
| Resource                   | Ascend 910; CPU 2.60GHz, 192 cores; Memory 755G; OS Euler2.8 |
| Uploaded Date              | 2020-08-21                                              |
| MindSpore Version          | 0.6.0-beta                                              |
| Dataset                    | 1.2 million images                                      |
| Batch_size                 | 128                                                     |
| Training Parameters        | src/model_utils/default_config.yaml                     |
| Optimizer                  | RMSProp                                                 |
| Loss Function              | Softmax cross entropy                                   |
| Outputs                    | probability                                             |
| Loss                       | 1.98                                                    |
| Total time (8p)            | 11 hours                                                |
| Params (M)                 | 103M                                                    |
| Checkpoint for fine-tuning | 313M                                                    |
| Speed                      | 1p: 1050 img/s; 8p: 8000 img/s                          |
| Scripts                    | [inceptionv3 script](https://gitee.com/mindspore/mindspore/tree/master/model_zoo/official/cv/inceptionv3) |
#### Inference Performance
| Parameters          | Ascend                      |
| Parameters          | Ascend                      |
| ------------------- | --------------------------- |
| Model Version       | InceptionV3                 |
| Resource            | Ascend 910; CPU 2.60GHz, 192 cores; Memory 755G; OS Euler2.8 |
| Uploaded Date       | 2020-08-22                  |
| Model Version       | InceptionV3                 |
| Resource            | Ascend 910; CPU 2.60GHz, 192 cores; Memory 755G; OS Euler2.8 |
| Uploaded Date       | 2020-08-22                  |
| MindSpore Version   | 0.6.0-beta                  |
| Dataset             | 50,000 images               |
| Batch_size          | 128                         |
| Dataset             | 50,000 images               |
| Batch_size          | 128                         |
| Outputs             | probability                 |
| Accuracy            | ACC1[78.8%] ACC5[94.2%]     |
| Total time          | 2 minutes                   |
| Model for inference | 92M (.onnx file)            |
| Accuracy            | ACC1[78.8%] ACC5[94.2%]     |
| Total time          | 2 minutes                   |
| Model for inference | 92M (.onnx file)            |
# Description of Random Situation

View File

@ -190,6 +190,5 @@ def train_inceptionv3():
    model.train(config.epoch_size, dataset, callbacks=callbacks, dataset_sink_mode=config.ds_sink_mode)
    print("train success")
if __name__ == '__main__':
    train_inceptionv3()

View File

@ -59,6 +59,98 @@ For FP16 operators, if the input data type is FP32, the backend of MindSpore wil
- [MindSpore Tutorials](https://www.mindspore.cn/tutorial/training/en/master/index.html)
- [MindSpore Python API](https://www.mindspore.cn/doc/api_python/en/master/index.html)
- Running on [ModelArts](https://support.huaweicloud.com/modelarts/)
```bash
# Train 8p with Ascend
# (1) Perform a or b.
# a. Set "enable_modelarts=True" on default_config.yaml file.
# Set "distribute=True" on default_config.yaml file.
# Set "need_modelarts_dataset_unzip=True" on default_config.yaml file.
# Set "modelarts_dataset_unzip_name='ImageNet_Original'" on default_config.yaml file.
# Set "lr_init=0.00004" on default_config.yaml file.
# Set "dataset_path='/cache/data'" on default_config.yaml file.
# Set "epoch_size=250" on default_config.yaml file.
# (optional)Set "checkpoint_url='s3://dir_to_your_pretrained/'" on default_config.yaml file.
# Set other parameters on default_config.yaml file you need.
# b. Add "enable_modelarts=True" on the website UI interface.
# Add "need_modelarts_dataset_unzip=True" on the website UI interface.
# Add "modelarts_dataset_unzip_name='ImageNet_Original'" on the website UI interface.
# Add "distribute=True" on the website UI interface.
# Add "lr_init=0.00004" on the website UI interface.
# Add "dataset_path=/cache/data" on the website UI interface.
# Add "epoch_size=250" on the website UI interface.
# (optional)Add "checkpoint_url='s3://dir_to_your_pretrained/'" on the website UI interface.
# Add other parameters on the website UI interface.
# (2) Prepare model code
# (3) Upload or copy your pretrained model to S3 bucket if you want to finetune.
# (4) Perform a or b. (suggested option a)
# a. First, zip MindRecord dataset to one zip file.
# Second, upload your zip dataset to S3 bucket.(you could also upload the origin mindrecord dataset, but it can be so slow.)
# b. Upload the original coco dataset to S3 bucket.
# (Data set conversion occurs during training process and costs a lot of time. it happens every time you train.)
# (5) Set the code directory to "/path/inceptionv4" on the website UI interface.
# (6) Set the startup file to "train.py" on the website UI interface.
# (7) Set the "Dataset path" and "Output file path" and "Job log path" to your path on the website UI interface.
# (8) Create your job.
#
# Train 1p with Ascend
# (1) Perform a or b.
# a. Set "enable_modelarts=True" on default_config.yaml file.
# Set "need_modelarts_dataset_unzip=True" on default_config.yaml file.
# Set "modelarts_dataset_unzip_name='ImageNet_Original'" on default_config.yaml file.
# Set "dataset_path='/cache/data'" on default_config.yaml file.
# Set "epoch_size=250" on default_config.yaml file.
# (optional)Set "checkpoint_url='s3://dir_to_your_pretrained/'" on default_config.yaml file.
# Set other parameters on default_config.yaml file you need.
# b. Add "enable_modelarts=True" on the website UI interface.
# Add "need_modelarts_dataset_unzip=True" on the website UI interface.
# Add "modelarts_dataset_unzip_name='ImageNet_Original'" on the website UI interface.
# Add "dataset_path='/cache/data'" on the website UI interface.
# Add "epoch_size=250" on the website UI interface.
# (optional)Add "checkpoint_url='s3://dir_to_your_pretrained/'" on the website UI interface.
# Add other parameters on the website UI interface.
# (2) Prepare model code
# (3) Upload or copy your pretrained model to S3 bucket if you want to finetune.
# (4) Perform a or b. (suggested option a)
# a. zip MindRecord dataset to one zip file.
# Second, upload your zip dataset to S3 bucket.(you could also upload the origin mindrecord dataset, but it can be so slow.)
# b. Upload the original coco dataset to S3 bucket.
# (Data set conversion occurs during training process and costs a lot of time. it happens every time you train.)
# (5) Set the code directory to "/path/inceptionv4" on the website UI interface.
# (6) Set the startup file to "train.py" on the website UI interface.
# (7) Set the "Dataset path" and "Output file path" and "Job log path" to your path on the website UI interface.
# (8) Create your job.
#
# Eval 1p with Ascend
# (1) Perform a or b.
# a. Set "enable_modelarts=True" on default_config.yaml file.
# Set "need_modelarts_dataset_unzip=True" on default_config.yaml file.
# Set "modelarts_dataset_unzip_name='ImageNet_Original'" on default_config.yaml file.
# Set "checkpoint_url='s3://dir_to_your_trained_model/'" on base_config.yaml file.
# Set "checkpoint_path='./inceptionv4/inceptionv4-train-250_1251.ckpt'" on default_config.yaml file.
# Set "dataset_path='/cache/data'" on default_config.yaml file.
# Set other parameters on default_config.yaml file you need.
# b. Add "enable_modelarts=True" on the website UI interface.
# Add "need_modelarts_dataset_unzip=True" on the website UI interface.
# Add "modelarts_dataset_unzip_name='ImageNet_Original'" on the website UI interface.
# Add "checkpoint_url='s3://dir_to_your_trained_model/'" on the website UI interface.
# Add "checkpoint_path='./inceptionv4/inceptionv4-train-250_1251.ckpt'" on the website UI interface.
# Add "dataset_path='/cache/data'" on the website UI interface.
# Add other parameters on the website UI interface.
# (2) Prepare model code
# (3) Upload or copy your trained model to S3 bucket.
# (4) Perform a or b. (suggested option a)
# a. First, zip MindRecord dataset to one zip file.
# Second, upload your zip dataset to S3 bucket. (You could also upload the original MindRecord dataset, but it can be very slow.)
# b. Upload the original coco dataset to S3 bucket.
# (Dataset conversion occurs during the evaluation job and costs a lot of time. It happens every time you run it.)
# (5) Set the code directory to "/path/inceptionv4" on the website UI interface.
# (6) Set the startup file to "eval.py" on the website UI interface.
# (7) Set the "Dataset path" and "Output file path" and "Job log path" to your path on the website UI interface.
# (8) Create your job.
```
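Step (4)a above asks for the dataset to be packed into a single zip file before upload. A minimal sketch of that packing step with the Python standard library follows; the helper name `zip_dataset` is hypothetical, and the choice to keep the top-level folder inside the archive (so that it can match `modelarts_dataset_unzip_name`) is an assumption rather than something the README states.

```python
import shutil
from pathlib import Path


def zip_dataset(dataset_dir: str, archive_stem: str) -> str:
    """Pack dataset_dir into <archive_stem>.zip and return the archive path.

    The archive keeps the dataset folder itself as its single top-level entry
    (assumption: the unzip name on ModelArts should match that folder name).
    """
    src = Path(dataset_dir).resolve()
    # root_dir is the parent, base_dir the folder name, so entries look like
    # "<folder>/<file>" rather than bare file names.
    return shutil.make_archive(
        archive_stem, "zip", root_dir=str(src.parent), base_dir=src.name
    )
```

For example, `zip_dataset("./ImageNet_Original", "./ImageNet_Original")` would produce `./ImageNet_Original.zip`, which can then be uploaded to the S3 bucket.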
# [Script description](#contents)
## [Script and sample code](#contents)
@ -248,7 +340,7 @@ metric: {'Loss': 0.8144, 'Top1-Acc': 0.8009, 'Top5-Acc': 0.9457}
## Model Export
```shell
python export.py --ckpt_file [CKPT_PATH] --device_target [DEVICE_TARGET] --file_format[EXPORT_FORMAT]
python export.py --config_path [CONFIG_FILE] --ckpt_file [CKPT_PATH] --device_target [DEVICE_TARGET] --file_format [EXPORT_FORMAT]
```
`EXPORT_FORMAT` should be in ["AIR", "MINDIR"]
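The restriction on `EXPORT_FORMAT` can be checked when the arguments are parsed. The sketch below is not the actual `export.py`; it only reuses the flag names from the command above, and the function name `build_export_parser` and all default values are assumptions made for illustration.

```python
import argparse

EXPORT_FORMATS = ("AIR", "MINDIR")  # the two values listed in the README


def build_export_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the flags shown in the export command."""
    parser = argparse.ArgumentParser(description="illustrative export front end")
    parser.add_argument("--config_path", default="default_config.yaml")  # assumed default
    parser.add_argument("--ckpt_file", required=True)
    parser.add_argument("--device_target", default="Ascend")  # assumed default
    # choices makes argparse reject any format outside the allowed set.
    parser.add_argument("--file_format", choices=EXPORT_FORMATS, default="MINDIR")
    return parser
```

With `choices`, an unsupported value such as `--file_format ONNX` makes argparse print a usage error and exit instead of failing later during export.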
@ -288,7 +380,7 @@ accuracy:80.044
| MindSpore Version | 1.0.0 | 1.0.0 |
| Dataset | 1200k images | 1200K images |
| Batch_size | 128 | 128 |
| Training Parameters | src/config.py (Ascend) | src/config.py (GPU) |
| Training Parameters | src/model_utils/default_config.yaml (Ascend) | src/model_utils/default_config.yaml (GPU)|
| Optimizer | RMSProp | RMSProp |
| Loss Function | SoftmaxCrossEntropyWithLogits | SoftmaxCrossEntropyWithLogits |
| Outputs | probability | probability |

View File

@ -78,6 +78,72 @@ sh run_standalone_train_ascend.sh [DATA_PATH] [CKPT_SAVE_PATH]
sh run_standalone_eval_ascend.sh [DATA_PATH] [CKPT_NAME]
```
- Running on [ModelArts](https://support.huaweicloud.com/modelarts/)
```bash
# Train 8p with Ascend
# (1) Perform a or b.
# a. Set "enable_modelarts=True" on default_config.yaml file.
# Set "distribute=True" on default_config.yaml file.
# Set "data_path='/cache/data'" on default_config.yaml file.
# Set "ckpt_path='/cache/data'" on default_config.yaml file.
# (optional)Set "checkpoint_url='s3://dir_to_your_pretrained/'" on default_config.yaml file.
# Set other parameters on default_config.yaml file you need.
# b. Add "enable_modelarts=True" on the website UI interface.
# Add "distribute=True" on the website UI interface.
# Add "data_path='/cache/data'" on the website UI interface.
# Add "ckpt_path='/cache/data'" on the website UI interface.
# (optional)Add "checkpoint_url='s3://dir_to_your_pretrained/'" on the website UI interface.
# Add other parameters on the website UI interface.
# (2) Prepare model code.
# (3) Upload or copy your pretrained model to S3 bucket if you want to finetune.
# (4) Upload the original mnist_data dataset to S3 bucket.
# (5) Set the code directory to "/path/lenet" on the website UI interface.
# (6) Set the startup file to "train.py" on the website UI interface.
# (7) Set the "Dataset path" and "Output file path" and "Job log path" to your path on the website UI interface.
# (8) Create your job.
#
# Train 1p with Ascend
# (1) Perform a or b.
# a. Set "enable_modelarts=True" on default_config.yaml file.
# Set "data_path='/cache/data'" on default_config.yaml file.
# Set "ckpt_path='/cache/data'" on default_config.yaml file.
# (optional)Set "checkpoint_url='s3://dir_to_your_pretrained/'" on default_config.yaml file.
# Set other parameters on default_config.yaml file you need.
# b. Add "enable_modelarts=True" on the website UI interface.
# Add "data_path='/cache/data'" on the website UI interface.
# Add "ckpt_path='/cache/data'" on the website UI interface.
# (optional)Add "checkpoint_url='s3://dir_to_your_pretrained/'" on the website UI interface.
# Add other parameters on the website UI interface.
# (2) Prepare model code.
# (3) Upload or copy your pretrained model to S3 bucket if you want to finetune.
# (4) Upload the original mnist_data dataset to S3 bucket.
# (5) Set the code directory to "/path/lenet" on the website UI interface.
# (6) Set the startup file to "train.py" on the website UI interface.
# (7) Set the "Dataset path" and "Output file path" and "Job log path" to your path on the website UI interface.
# (8) Create your job.
#
# Eval 1p with Ascend
# (1) Perform a or b.
# a. Set "enable_modelarts=True" on default_config.yaml file.
# Set "data_path='/cache/data'" on default_config.yaml file.
# Set "checkpoint_url='s3://dir_to_your_trained_model/'" on default_config.yaml file.
# Set "ckpt_file='/cache/data/checkpoint_lenet-10_1875.ckpt'" on default_config.yaml file.
# Set other parameters on default_config.yaml file you need.
# b. Add "enable_modelarts=True" on the website UI interface.
# Add "data_path='/cache/data'" on the website UI interface.
# Add "checkpoint_url='s3://dir_to_your_trained_model/'" on the website UI interface.
# Add "ckpt_file='/cache/data/checkpoint_lenet-10_1875.ckpt'" on the website UI interface.
# Add other parameters on the website UI interface.
# (2) Prepare model code.
# (3) Upload or copy your trained model to S3 bucket.
# (4) Upload the original mnist_data dataset to S3 bucket.
# (5) Set the code directory to "/path/lenet" on the website UI interface.
# (6) Set the startup file to "eval.py" on the website UI interface.
# (7) Set the "Dataset path" and "Output file path" and "Job log path" to your path on the website UI interface.
# (8) Create your job.
```
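Options a and b above set the same keys in two places: in `default_config.yaml` or as parameters passed at job creation. One way such a scheme can work is to treat the file values as defaults and let `--key=value` arguments override them. The sketch below assumes that behaviour; it is not the repository's config loader, the names `str2bool` and `merge_overrides` are hypothetical, and the `DEFAULTS` dict stands in for values that would be read from the YAML file.

```python
import argparse


def str2bool(text: str) -> bool:
    """Parse 'True'/'False' style strings, since bool('False') would be True."""
    lowered = text.strip().lower()
    if lowered in ("true", "1", "yes"):
        return True
    if lowered in ("false", "0", "no"):
        return False
    raise argparse.ArgumentTypeError(f"expected a boolean, got {text!r}")


def merge_overrides(defaults: dict, argv: list) -> dict:
    """Return a copy of defaults updated with any --key=value pairs in argv."""
    parser = argparse.ArgumentParser(description="illustrative config front end")
    for key, value in defaults.items():
        if isinstance(value, bool):
            arg_type = str2bool
        elif value is None:
            arg_type = str
        else:
            arg_type = type(value)  # int, float or str keep their own type
        parser.add_argument(f"--{key}", type=arg_type, default=value)
    return vars(parser.parse_args(argv))


# Stand-in for default_config.yaml; the values here are illustrative only.
DEFAULTS = {
    "enable_modelarts": False,
    "distribute": False,
    "data_path": "./Data",
    "ckpt_path": "./ckpt",
    "epoch_size": 10,
}

config = merge_overrides(DEFAULTS, ["--enable_modelarts=True", "--data_path=/cache/data"])
print(config["enable_modelarts"], config["data_path"], config["epoch_size"])
```

Keys that are not overridden keep their file values, which is why option b only needs to list the parameters that differ from the defaults.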
## [Script Description](#contents)
### [Script and Sample Code](#contents)
@ -115,7 +181,7 @@ sh run_standalone_eval_ascend.sh [DATA_PATH] [CKPT_NAME]
## [Script Parameters](#contents)
```python
Major parameters in train.py and config.py as follows:
Major parameters in train.py and default_config.yaml as follows:
--data_path: The absolute full path to the train and evaluation datasets.
--epoch_size: Total training epochs.
@ -134,7 +200,7 @@ Major parameters in train.py and config.py as follows:
### Training
```bash
python train.py --data_path Data --ckpt_path ckpt > log.txt 2>&1 &
python train.py --config_path CONFIG_PATH --data_path Data --ckpt_path ckpt > log.txt 2>&1 &
# or enter script dir, and run the script
sh run_standalone_train_ascend.sh Data ckpt
```
@ -160,7 +226,7 @@ The model checkpoint will be saved in the current directory.
Before running the command below, please check the checkpoint path used for evaluation.
```bash
python eval.py --data_path Data --ckpt_path ckpt/checkpoint_lenet-1_1875.ckpt > log.txt 2>&1 &
python eval.py --config_path CONFIG_PATH --data_path Data --ckpt_path ckpt/checkpoint_lenet-1_1875.ckpt > log.txt 2>&1 &
# or enter script dir, and run the script
sh run_standalone_eval_ascend.sh Data ckpt/checkpoint_lenet-1_1875.ckpt
```
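The checkpoint names used above, such as `checkpoint_lenet-1_1875.ckpt`, appear to follow a `<prefix>-<epoch>_<step>.ckpt` pattern. Assuming that pattern holds, a small helper can pick the most recent checkpoint from a list of file names before evaluation. The function names are hypothetical and this is only a sketch.

```python
import re
from typing import Iterable, Optional, Tuple

# Assumed naming pattern: <prefix>-<epoch>_<step>.ckpt
_CKPT_NAME = re.compile(r"^(?P<prefix>.+)-(?P<epoch>\d+)_(?P<step>\d+)\.ckpt$")


def parse_ckpt_name(name: str) -> Optional[Tuple[str, int, int]]:
    """Split a checkpoint file name into (prefix, epoch, step), or None if it does not match."""
    match = _CKPT_NAME.match(name)
    if match is None:
        return None
    return match["prefix"], int(match["epoch"]), int(match["step"])


def latest_ckpt(names: Iterable[str]) -> Optional[str]:
    """Return the name with the highest (epoch, step), ignoring non-matching names."""
    candidates = []
    for name in names:
        info = parse_ckpt_name(name)
        if info is not None:
            candidates.append(((info[1], info[2]), name))
    if not candidates:
        return None
    return max(candidates)[1]
```

Comparing epochs as integers matters here: a plain string sort would place `checkpoint_lenet-10_1875.ckpt` before `checkpoint_lenet-2_1875.ckpt`.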
@ -177,7 +243,7 @@ You can view the results through the file "log.txt". The accuracy of the test da
### Export MindIR
```shell
python export.py --ckpt_file [CKPT_PATH] --file_name [FILE_NAME] --file_format [FILE_FORMAT]
python export.py --config_path [CONFIG_PATH] --ckpt_file [CKPT_PATH] --file_name [FILE_NAME] --file_format [FILE_FORMAT]
```
The ckpt_file parameter is required,

View File

@ -82,6 +82,69 @@ sh run_standalone_train_ascend.sh [DATA_PATH] [CKPT_SAVE_PATH]
sh run_standalone_eval_ascend.sh [DATA_PATH] [CKPT_NAME]
```
- Running on ModelArts (if you want to run on ModelArts, refer to the following documentation: [modelarts](https://support.huaweicloud.com/modelarts/))
```bash
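# Train 8p on ModelArts
# (1) Perform a or b.
# a. Set "enable_modelarts=True" in the default_config.yaml file.
# Set "distribute=True" in the default_config.yaml file.
# Set "data_path='/cache/data'" in the default_config.yaml file.
# Set "ckpt_path='/cache/data'" in the default_config.yaml file.
# (optional) Set "checkpoint_url='s3://dir_to_your_pretrained/'" in the default_config.yaml file.
# Set other parameters in the default_config.yaml file.
# b. Set "enable_modelarts=True" on the web page.
# Set "distribute=True" on the web page.
# Set "data_path=/cache/data" on the web page.
# Set "ckpt_path=/cache/data" on the web page.
# (optional) Set "checkpoint_url='s3://dir_to_your_pretrained/'" on the web page.
# Set other parameters on the web page.
# (3) If you want to fine-tune your model, upload your pretrained model to the S3 bucket.
# (4) Upload the original mnist_data dataset to the S3 bucket.
# (5) Set the code directory to "/path/lenet" on the web page.
# (6) Set the startup file to "train.py" on the web page.
# (7) Set "Training dataset", "Training output file path", "Job log path", etc. on the web page.
# (8) Create the training job.
#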
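# Train 1p on ModelArts
# (1) Perform a or b.
# a. Set "enable_modelarts=True" in the default_config.yaml file.
# Set "data_path='/cache/data'" in the default_config.yaml file.
# Set "ckpt_path='/cache/data'" in the default_config.yaml file.
# (optional) Set "checkpoint_url='s3://dir_to_your_pretrained/'" in the default_config.yaml file.
# Set other parameters in the default_config.yaml file.
# b. Set "enable_modelarts=True" on the web page.
# Set "data_path='/cache/data'" on the web page.
# Set "ckpt_path='/cache/data'" on the web page.
# (optional) Set "checkpoint_url='s3://dir_to_your_pretrained/'" on the web page.
# Set other parameters on the web page.
# (3) If you want to fine-tune your model, upload your pretrained model to the S3 bucket.
# (4) Upload the original mnist_data dataset to the S3 bucket.
# (5) Set the code directory to "/path/lenet" on the web page.
# (6) Set the startup file to "train.py" on the web page.
# (7) Set "Training dataset", "Training output file path", "Job log path", etc. on the web page.
# (8) Create the training job.
#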
# Eval 1p on ModelArts
# (1) Perform a or b.
# a. Set "enable_modelarts=True" in the default_config.yaml file.
# Set "checkpoint_url='s3://dir_to_your_trained_model/'" in the default_config.yaml file.
# Set "data_path='/cache/data'" in the default_config.yaml file.
# Set "ckpt_file='/cache/data/checkpoint_lenet-10_1875.ckpt'" in the default_config.yaml file.
# Set other parameters in the default_config.yaml file.
# b. Set "enable_modelarts=True" on the web page.
# Set "checkpoint_url='s3://dir_to_your_trained_model/'" on the web page.
# Set "data_path='/cache/data'" on the web page.
# Set "ckpt_file='/cache/data/checkpoint_lenet-10_1875.ckpt'" on the web page.
# Set other parameters on the web page.
# (3) Upload your trained model to the S3 bucket.
# (4) Upload the original mnist_data dataset to the S3 bucket.
# (5) Set the code directory to "/path/lenet" on the web page.
# (6) Set the startup file to "eval.py" on the web page.
# (7) Set "Training dataset", "Training output file path", "Job log path", etc. on the web page.
# (8) Create the training job.
```
## Script Description
### Script and Sample Code
@ -119,7 +182,7 @@ sh run_standalone_eval_ascend.sh [DATA_PATH] [CKPT_NAME]
## Script Parameters
```python
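Major parameters in train.py and config.py are as follows:
Major parameters in train.py and default_config.yaml are as follows:
--data_path: The absolute full path to the training and evaluation datasets.
--epoch_size: Total number of training epochs.
@ -136,7 +199,7 @ Major parameters in train.py and config.py are as follows: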
### Training
```bash
python train.py --data_path Data --ckpt_path ckpt > log.txt 2>&1 &
python train.py --config_path CONFIG_PATH --data_path Data --ckpt_path ckpt > log.txt 2>&1 &
# or enter script dir, and run the script
sh run_standalone_train_ascend.sh Data ckpt
```
@ -162,7 +225,7 @@ epoch:1 step:1538, loss is 1.0221305
Before running the command below, please check the checkpoint path used for evaluation.
```bash
python eval.py --data_path Data --ckpt_path ckpt/checkpoint_lenet-1_1875.ckpt > log.txt 2>&1 &
python eval.py --config_path CONFIG_PATH --data_path Data --ckpt_path ckpt/checkpoint_lenet-1_1875.ckpt > log.txt 2>&1 &
# or enter script dir, and run the script
sh run_standalone_eval_ascend.sh Data ckpt/checkpoint_lenet-1_1875.ckpt
```
@ -179,7 +242,7 @@ sh run_standalone_eval_ascend.sh Data ckpt/checkpoint_lenet-1_1875.ckpt
### Export MindIR
```shell
python export.py --ckpt_file [CKPT_PATH] --file_name [FILE_NAME] --file_format [FILE_FORMAT]
python export.py --config_path [CONFIG_PATH] --ckpt_file [CKPT_PATH] --file_name [FILE_NAME] --file_format [FILE_FORMAT]
```
The ckpt_file parameter is required
@ -213,33 +276,33 @@ bash run_infer_310.sh [MINDIR_PATH] [DATA_PATH] [DVPP] [DEVICE_ID]
### Evaluation Performance
| Parameters | LeNet |
| -------------------------- | ----------------------------------------------------------- |
| Resource | Ascend 910; CPU 2.60GHz, 192 cores; Memory 755G; OS Euler2.8 |
| Uploaded Date | 2020-06-09 |
| MindSpore Version | 0.5.0-beta |
| Dataset | MNIST |
| Training Parameters | epoch=10, steps=1875, batch_size = 32, lr=0.01 |
| Optimizer | Momentum |
| Loss Function | Softmax Cross Entropy |
| Outputs | probability |
| Loss | 0.002 |
| Speed | 1.70 ms/step |
| Total time | 43.1s | |
| Checkpoint for Fine tuning | 482k (.ckpt file) |
| Scripts | [LeNet Scripts](https://gitee.com/mindspore/mindspore/tree/master/model_zoo/official/cv/lenet) |
| -------------------- | ------------------------------------------------------- |
| Resource | Ascend 910; CPU 2.60GHz, 192 cores; Memory 755G; OS Euler2.8 |
| Uploaded Date | 2020-06-09 |
| MindSpore Version | 0.5.0-beta |
| Dataset | MNIST |
| Training Parameters | epoch=10, steps=1875, batch_size = 32, lr=0.01 |
| Optimizer | Momentum |
| Loss Function | Softmax Cross Entropy |
| Outputs | probability |
| Loss | 0.002 |
| Speed | 1.70 ms/step |
| Total time | 43.1s |
| Checkpoint for Fine tuning | 482k (.ckpt file) |
| Scripts | [LeNet Scripts](https://gitee.com/mindspore/mindspore/tree/master/model_zoo/official/cv/lenet) |
### Inference Performance
| Parameters | Ascend |
| ------------- | ----------------------------|
| Model Version | LeNet |
| Resource | Ascend 310; OS CentOS 3.10 |
| Uploaded Date | 2021-05-07 |
| MindSpore Version | 1.2.0 |
| Dataset | Mnist |
| batch_size | 1 |
| Outputs | Accuracy |
| Accuracy | Accuracy=0.9843 |
| Parameters | Ascend |
| --------------- | -----------------------------|
| Model Version | LeNet |
| Resource | Ascend 310; OS CentOS 3.10 |
| Uploaded Date | 2021-05-07 |
| MindSpore Version | 1.2.0 |
| Dataset | Mnist |
| batch_size | 1 |
| Outputs | Accuracy |
| Accuracy | Accuracy=0.9843 |
| Model for inference | 482K (.ckpt file) |
## Description of Random Situation

View File

@ -175,6 +175,121 @@ bash run_eval.sh [VALIDATION_JSON_FILE] [CHECKPOINT_PATH] [DATA_PATH]
bash run_infer_310.sh [MINDIR_PATH] [DATA_PATH] [ANN_FILE] [DEVICE_ID]
```
- Running on [ModelArts](https://support.huaweicloud.com/modelarts/)
```bash
# Train 8p with Ascend
# (1) Perform a or b.
# a. Set "enable_modelarts=True" on default_config.yaml file.
# Set "distribute=True" on default_config.yaml file.
# Set "need_modelarts_dataset_unzip=True" on default_config.yaml file.
# Set "modelarts_dataset_unzip_name='cocodataset'" on default_config.yaml file.
# Set "base_lr=0.02" on default_config.yaml file.
# Set "mindrecord_dir='./MindRecord_COCO'" on default_config.yaml file.
# Set "data_path='/cache/data'" on default_config.yaml file.
# Set "ann_file='./annotations/instances_val2017.json'" on default_config.yaml file.
# Set "epoch_size=12" on default_config.yaml file.
# Set "ckpt_path='./ckpt_maskrcnn/mask_rcnn-12_7393.ckpt'" on default_config.yaml file.
# (optional)Set "checkpoint_url='s3://dir_to_your_pretrained/'" on default_config.yaml file.
# Set other parameters on default_config.yaml file you need.
# b. Add "enable_modelarts=True" on the website UI interface.
# Add "need_modelarts_dataset_unzip=True" on the website UI interface.
# Add "modelarts_dataset_unzip_name='cocodataset'" on the website UI interface.
# Add "distribute=True" on the website UI interface.
# Add "base_lr=0.02" on the website UI interface.
# Add "mindrecord_dir='./MindRecord_COCO'" on the website UI interface.
# Add "data_path='/cache/data'" on the website UI interface.
# Add "ann_file='./annotations/instances_val2017.json'" on the website UI interface.
# Add "epoch_size=12" on the website UI interface.
# Add "ckpt_path='./ckpt_maskrcnn/mask_rcnn-12_7393.ckpt'" on the website UI interface.
# (optional)Add "checkpoint_url='s3://dir_to_your_pretrained/'" on the website UI interface.
# Add other parameters on the website UI interface.
# (2) Prepare model code
# (3) Upload or copy your pretrained model to S3 bucket if you want to finetune.
# (4) Perform a or b. (suggested option a)
# a. First, run "train.py" like the following to create MindRecord dataset locally from coco2017.
# "python train.py --only_create_dataset=True --mindrecord_dir=$MINDRECORD_DIR --data_path=$DATA_PATH --ann_file=$ANNO_PATH"
# Second, zip MindRecord dataset to one zip file.
# Finally, upload your zip dataset to S3 bucket. (You could also upload the original MindRecord dataset, but it can be very slow.)
# b. Upload the original coco dataset to S3 bucket.
# (Dataset conversion occurs during the training process and costs a lot of time. It happens every time you train.)
# (5) Set the code directory to "/path/maskrcnn" on the website UI interface.
# (6) Set the startup file to "train.py" on the website UI interface.
# (7) Set the "Dataset path" and "Output file path" and "Job log path" to your path on the website UI interface.
# (8) Create your job.
#
# Train 1p with Ascend
# (1) Perform a or b.
# a. Set "enable_modelarts=True" on default_config.yaml file.
# Set "need_modelarts_dataset_unzip=True" on default_config.yaml file.
# Set "modelarts_dataset_unzip_name='cocodataset'" on default_config.yaml file.
# Set "mindrecord_dir='./MindRecord_COCO'" on default_config.yaml file.
# Set "data_path='/cache/data'" on default_config.yaml file.
# Set "ann_file='./annotations/instances_val2017.json'" on default_config.yaml file.
# Set "epoch_size=12" on default_config.yaml file.
# Set "ckpt_path='./ckpt_maskrcnn/mask_rcnn-12_7393.ckpt'" on default_config.yaml file.
# (optional)Set "checkpoint_url='s3://dir_to_your_pretrained/'" on default_config.yaml file.
# Set other parameters on default_config.yaml file you need.
# b. Add "enable_modelarts=True" on the website UI interface.
# Add "need_modelarts_dataset_unzip=True" on the website UI interface.
# Add "modelarts_dataset_unzip_name='cocodataset'" on the website UI interface.
# Add "mindrecord_dir='./MindRecord_COCO'" on the website UI interface.
# Add "data_path='/cache/data'" on the website UI interface.
# Add "ann_file='./annotations/instances_val2017.json'" on the website UI interface.
# Add "epoch_size=12" on the website UI interface.
# Add "ckpt_path='./ckpt_maskrcnn/mask_rcnn-12_7393.ckpt'" on the website UI interface.
# (optional)Add "checkpoint_url='s3://dir_to_your_pretrained/'" on the website UI interface.
# Add other parameters on the website UI interface.
# (2) Prepare model code
# (3) Upload or copy your pretrained model to S3 bucket if you want to finetune.
# (4) Perform a or b. (suggested option a)
# a. First, run "train.py" like the following to create MindRecord dataset locally from coco2017.
# "python train.py --only_create_dataset=True --mindrecord_dir=$MINDRECORD_DIR --data_path=$DATA_PATH --ann_file=$ANNO_PATH"
# Second, zip MindRecord dataset to one zip file.
# Finally, upload your zip dataset to S3 bucket. (You could also upload the original MindRecord dataset, but it can be very slow.)
# b. Upload the original coco dataset to S3 bucket.
# (Dataset conversion occurs during the training process and costs a lot of time. It happens every time you train.)
# (5) Set the code directory to "/path/maskrcnn" on the website UI interface.
# (6) Set the startup file to "train.py" on the website UI interface.
# (7) Set the "Dataset path" and "Output file path" and "Job log path" to your path on the website UI interface.
# (8) Create your job.
#
# Eval 1p with Ascend
# (1) Perform a or b.
# a. Set "enable_modelarts=True" on default_config.yaml file.
# Set "need_modelarts_dataset_unzip=True" on default_config.yaml file.
# Set "modelarts_dataset_unzip_name='cocodataset'" on default_config.yaml file.
# Set "checkpoint_url='s3://dir_to_your_trained_model/'" on default_config.yaml file.
# Set "checkpoint_path='./ckpt_maskrcnn/mask_rcnn-12_7393.ckpt'" on default_config.yaml file.
# Set "mindrecord_file='/cache/data/cocodataset/MindRecord_COCO'" on default_config.yaml file.
# Set "data_path='/cache/data'" on default_config.yaml file.
# Set "ann_file='./annotations/instances_val2017.json'" on default_config.yaml file.
# Set other parameters on default_config.yaml file you need.
# b. Add "enable_modelarts=True" on the website UI interface.
# Add "need_modelarts_dataset_unzip=True" on the website UI interface.
# Add "modelarts_dataset_unzip_name='cocodataset'" on the website UI interface.
# Add "checkpoint_url='s3://dir_to_your_trained_model/'" on the website UI interface.
# Add "checkpoint_path='./ckpt_maskrcnn/mask_rcnn-12_7393.ckpt'" on the website UI interface.
# Add "mindrecord_file='/cache/data/cocodataset/MindRecord_COCO'" on the website UI interface.
# Add "data_path='/cache/data'" on the website UI interface.
# Add "ann_file='./annotations/instances_val2017.json'" on the website UI interface.
# Add other parameters on the website UI interface.
# (2) Prepare model code
# (3) Upload or copy your trained model to S3 bucket.
# (4) Perform a or b. (suggested option a)
# a. First, run "eval.py" like the following to create MindRecord dataset locally from coco2017.
# "python eval.py --only_create_dataset=True --mindrecord_dir=$MINDRECORD_DIR --data_path=$DATA_PATH --ann_file=$ANNO_PATH \
# --checkpoint_path=$CHECKPOINT_PATH"
# Second, zip MindRecord dataset to one zip file.
# Finally, upload your zip dataset to S3 bucket. (You could also upload the original MindRecord dataset, but it can be very slow.)
# b. Upload the original coco dataset to S3 bucket.
# (Dataset conversion occurs during the evaluation job and costs a lot of time. It happens every time you run it.)
# (5) Set the code directory to "/path/maskrcnn" on the website UI interface.
# (6) Set the startup file to "eval.py" on the website UI interface.
# (7) Set the "Dataset path" and "Output file path" and "Job log path" to your path on the website UI interface.
# (8) Create your job.
```
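The eval settings above combine `data_path='/cache/data'`, `modelarts_dataset_unzip_name='cocodataset'` and the MindRecord folder into `mindrecord_file='/cache/data/cocodataset/MindRecord_COCO'`. The sketch below shows one way an unzip step could produce that layout. It is an assumption about what `need_modelarts_dataset_unzip=True` implies, not the repository's implementation, and the helper name `unzip_dataset` is hypothetical.

```python
import zipfile
from pathlib import Path


def unzip_dataset(zip_path: str, data_path: str, unzip_name: str) -> str:
    """Extract zip_path under data_path and return <data_path>/<unzip_name>.

    Assumes the archive has a single top-level folder named unzip_name.
    Extraction is skipped when that folder already exists.
    """
    target = Path(data_path) / unzip_name
    if not target.exists():
        with zipfile.ZipFile(zip_path) as archive:
            archive.extractall(data_path)  # creates data_path if it is missing
    return str(target)
```

Under these assumptions, `unzip_dataset(zip_path, "/cache/data", "cocodataset")` returns `/cache/data/cocodataset`, and the MindRecord files would sit in the `MindRecord_COCO` folder below it.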
# [Script Description](#contents)
## [Script and Sample Code](#contents)
@ -503,7 +618,7 @@ Accumulating evaluation results...
## Model Export
```shell
python export.py --ckpt_file [CKPT_PATH] --device_target [DEVICE_TARGET] --file_format[EXPORT_FORMAT]
python export.py --config_path [CONFIG_FILE] --ckpt_file [CKPT_PATH] --device_target [DEVICE_TARGET] --file_format [EXPORT_FORMAT]
```
`EXPORT_FORMAT` should be in ["AIR", "MINDIR"]

View File

@ -169,6 +169,119 @@ bash run_distribute_train.sh [RANK_TABLE_FILE] [PRETRAINED_CKPT]
bash run_eval.sh [VALIDATION_JSON_FILE] [CHECKPOINT_PATH]
```
- Running on ModelArts (if you want to run on ModelArts, refer to the following documentation: [modelarts](https://support.huaweicloud.com/modelarts/))
```bash
# Train 8p on ModelArts
# (1) Perform a or b.
# a. Set "enable_modelarts=True" in the default_config.yaml file.
# Set "distribute=True" in the default_config.yaml file.
# Set "need_modelarts_dataset_unzip=True" in the default_config.yaml file.
# Set "modelarts_dataset_unzip_name='cocodataset'" in the default_config.yaml file.
# Set "base_lr=0.02" in the default_config.yaml file.
# Set "mindrecord_dir='./MindRecord_COCO'" in the default_config.yaml file.
# Set "data_path='/cache/data'" in the default_config.yaml file.
# Set "ann_file='./annotations/instances_val2017.json'" in the default_config.yaml file.
# Set "epoch_size=12" in the default_config.yaml file.
# Set "ckpt_path='./ckpt_maskrcnn/mask_rcnn-12_7393.ckpt'" in the default_config.yaml file.
# (optional) Set "checkpoint_url='s3://dir_to_your_pretrained/'" in the default_config.yaml file.
# Set other parameters in the default_config.yaml file.
# b. Set "enable_modelarts=True" on the web page.
# Set "need_modelarts_dataset_unzip=True" on the web page.
# Set "modelarts_dataset_unzip_name='cocodataset'" on the web page.
# Set "distribute=True" on the web page.
# Set "base_lr=0.02" on the web page.
# Set "mindrecord_dir='./MindRecord_COCO'" on the web page.
# Set "data_path='/cache/data'" on the web page.
# Set "ann_file='./annotations/instances_val2017.json'" on the web page.
# Set "epoch_size=12" on the web page.
# Set "ckpt_path='./ckpt_maskrcnn/mask_rcnn-12_7393.ckpt'" on the web page.
# (optional) Set "checkpoint_url='s3://dir_to_your_pretrained/'" on the web page.
# Set other parameters on the web page.
# (2) If you want to fine-tune your model, upload your pretrained model to the S3 bucket.
# (3) Perform a or b. (option a is recommended)
# a. First, run "train.py" locally as follows to generate the dataset in MindRecord format.
# "python train.py --only_create_dataset=True --mindrecord_dir=$MINDRECORD_DIR --data_path=$DATA_PATH --ann_file=$ANNO_PATH"
# Second, compress the dataset into a single ".zip" file.
# Finally, upload your zipped dataset to the S3 bucket. (You can also upload the uncompressed dataset, but that may be slow.)
# b. Upload the original coco dataset to the S3 bucket.
# (Dataset conversion happens during training and takes considerable time. The conversion is repeated every time you train.)
# (4) Set the code directory to "/path/maskrcnn" on the web page.
# (5) Set the startup file to "train.py" on the web page.
# (6) Set "Training dataset", "Training output file path", "Job log path", etc. on the web page.
# (7) Create the training job.
#
# Train 1p on ModelArts
# (1) Perform a or b.
# a. Set "enable_modelarts=True" in the default_config.yaml file.
# Set "need_modelarts_dataset_unzip=True" in the default_config.yaml file.
# Set "modelarts_dataset_unzip_name='cocodataset'" in the default_config.yaml file.
# Set "mindrecord_dir='./MindRecord_COCO'" in the default_config.yaml file.
# Set "data_path='/cache/data'" in the default_config.yaml file.
# Set "ann_file='./annotations/instances_val2017.json'" in the default_config.yaml file.
# Set "epoch_size=12" in the default_config.yaml file.
# Set "ckpt_path='./ckpt_maskrcnn/mask_rcnn-12_7393.ckpt'" in the default_config.yaml file.
# (optional) Set "checkpoint_url='s3://dir_to_your_pretrained/'" in the default_config.yaml file.
# Set other parameters in the default_config.yaml file.
# b. Set "enable_modelarts=True" on the web page.
# Set "need_modelarts_dataset_unzip=True" on the web page.
# Set "modelarts_dataset_unzip_name='cocodataset'" on the web page.
# Set "mindrecord_dir='./MindRecord_COCO'" on the web page.
# Set "data_path='/cache/data'" on the web page.
# Set "ann_file='./annotations/instances_val2017.json'" on the web page.
# Set "epoch_size=12" on the web page.
# Set "ckpt_path='./ckpt_maskrcnn/mask_rcnn-12_7393.ckpt'" on the web page.
# (optional) Set "checkpoint_url='s3://dir_to_your_pretrained/'" on the web page.
# Set other parameters on the web page.
# (2) If you want to fine-tune your model, upload your pretrained model to the S3 bucket.
# (3) Perform a or b. (option a is recommended)
# a. First, run "train.py" locally as follows to generate the dataset in MindRecord format.
# "python train.py --only_create_dataset=True --mindrecord_dir=$MINDRECORD_DIR --data_path=$DATA_PATH --ann_file=$ANNO_PATH"
# Second, compress the dataset into a single ".zip" file.
# Finally, upload your zipped dataset to the S3 bucket. (You can also upload the uncompressed dataset, but that may be slow.)
# b. Upload the original coco dataset to the S3 bucket.
# (Dataset conversion happens during training and takes considerable time. The conversion is repeated every time you train.)
# (4) Set the code directory to "/path/maskrcnn" on the web page.
# (5) Set the startup file to "train.py" on the web page.
# (6) Set "Training dataset", "Training output file path", "Job log path", etc. on the web page.
# (7) Create the training job.
#
# Eval 1p on ModelArts
# (1) Perform a or b.
# a. Set "enable_modelarts=True" in the default_config.yaml file.
# Set "need_modelarts_dataset_unzip=True" in the default_config.yaml file.
# Set "modelarts_dataset_unzip_name='cocodataset'" in the default_config.yaml file.
# Set "checkpoint_url='s3://dir_to_your_trained_model/'" in the default_config.yaml file.
# Set "checkpoint_path='./ckpt_maskrcnn/mask_rcnn-12_7393.ckpt'" in the default_config.yaml file.
# Set "mindrecord_file='/cache/data/cocodataset/MindRecord_COCO'" in the default_config.yaml file.
# Set "data_path='/cache/data'" in the default_config.yaml file.
# Set "ann_file='./annotations/instances_val2017.json'" in the default_config.yaml file.
# Set other parameters in the default_config.yaml file.
# b. Set "enable_modelarts=True" on the web page.
# Set "need_modelarts_dataset_unzip=True" on the web page.
# Set "modelarts_dataset_unzip_name='cocodataset'" on the web page.
# Set "checkpoint_url='s3://dir_to_your_trained_model/'" on the web page.
# Set "checkpoint_path='./ckpt_maskrcnn/mask_rcnn-12_7393.ckpt'" on the web page.
# Set "mindrecord_file='/cache/data/cocodataset/MindRecord_COCO'" on the web page.
# Set "data_path='/cache/data'" on the web page.
# Set "ann_file='./annotations/instances_val2017.json'" on the web page.
# Set other parameters on the web page.
# (2) Upload your trained model to the S3 bucket.
# (3) Perform a or b. (option a is recommended)
# a. First, run "eval.py" locally as follows to generate the dataset in MindRecord format.
# "python eval.py --only_create_dataset=True --mindrecord_dir=$MINDRECORD_DIR --data_path=$DATA_PATH --ann_file=$ANNO_PATH \
# --checkpoint_path=$CHECKPOINT_PATH"
# Second, compress the dataset into a single ".zip" file.
# Finally, upload your zipped dataset to the S3 bucket. (You can also upload the uncompressed dataset, but that may be slow.)
# b. Upload the original coco dataset to the S3 bucket.
# (Dataset conversion happens during the job and takes considerable time. The conversion is repeated every time you run it.)
# (4) Set the code directory to "/path/maskrcnn" on the web page.
# (5) Set the startup file to "eval.py" on the web page.
# (6) Set "Training dataset", "Training output file path", "Job log path", etc. on the web page.
# (7) Create the training job.
```
# Script Description
## Script and Sample Code
@ -501,7 +614,7 @@ sh run_eval.sh [VALIDATION_ANN_FILE_JSON] [CHECKPOINT_PATH]
## Model Export
```shell
python export.py --ckpt_file [CKPT_PATH] --device_target [DEVICE_TARGET] --file_format[EXPORT_FORMAT]
python export.py --config_path [CONFIG_PATH] --ckpt_file [CKPT_PATH] --device_target [DEVICE_TARGET] --file_format [EXPORT_FORMAT]
```
`EXPORT_FORMAT` should be in ["AIR", "MINDIR"]

View File

@ -114,6 +114,29 @@ After installing MindSpore via the official website and Dataset is correctly gen
sh run_train_ascend.sh [DATASET_NAME]
```
- Running on [ModelArts](https://support.huaweicloud.com/modelarts/)
```bash
# Train/eval 1p with Ascend
# (1) Perform a or b.
# a. Set "enable_modelarts=True" on default_config.yaml file.
# Set "data_dir='/cache/data'" on default_config.yaml file.
# (optional)Set "checkpoint_url='s3://dir_to_your_pretrained/'" on default_config.yaml file.
# Set other parameters on default_config.yaml file you need.
# b. Add "enable_modelarts=True" on the website UI interface.
# Add "data_dir='/cache/data'" on the website UI interface.
# (optional)Add "checkpoint_url='s3://dir_to_your_pretrained/'" on the website UI interface.
# Add other parameters on the website UI interface.
# (3) Upload or copy your pretrained model to S3 bucket if you want to finetune.
# (4) Upload the original Cora/Citeseer dataset to S3 bucket.
# (5) Set the code directory to "/path/gat" on the website UI interface.
# (6) Set the startup file to "train.py" on the website UI interface.
# (7) Set the "Dataset path" and "Output file path" and "Job log path" to your path on the website UI interface.
# (8) Create your job.
#
```
## [Script Description](#contents)
## [Script and Sample Code](#contents)
@ -142,7 +165,7 @@ After installing MindSpore via the official website and Dataset is correctly gen
## [Script Parameters](#contents)
Parameters for both training and evaluation can be set in config.py.
Parameters for both training and evaluation can be set in default_config.yaml.
- config for GAT, CORA dataset
@ -191,7 +214,7 @@ Parameters for both training and evaluation can be set in config.py.
### [Export MindIR](#contents)
```shell
python export.py --ckpt_file [CKPT_PATH] --file_name [FILE_NAME] --file_format [FILE_FORMAT]
python export.py --config_path [CONFIG_PATH] --ckpt_file [CKPT_PATH] --file_name [FILE_NAME] --file_format [FILE_FORMAT]
```
The ckpt_file parameter is required,

View File

@ -112,6 +112,27 @@
sh run_train_ascend.sh [DATASET_NAME]
```
- Running on ModelArts (if you want to run on ModelArts, refer to the following documentation: [modelarts](https://support.huaweicloud.com/modelarts/))
```bash
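# Train/eval 1p on ModelArts
# (1) Perform a or b.
# a. Set "enable_modelarts=True" in the default_config.yaml file.
# Set "data_dir='/cache/data'" in the default_config.yaml file.
# (optional) Set "checkpoint_url='s3://dir_to_your_pretrained/'" in the default_config.yaml file.
# Set other parameters in the default_config.yaml file.
# b. Set "enable_modelarts=True" on the web page.
# Set "data_dir='/cache/data'" on the web page.
# (optional) Set "checkpoint_url='s3://dir_to_your_pretrained/'" on the web page.
# Set other parameters on the web page.
# (3) If you want to fine-tune your model, upload your pretrained model to the S3 bucket.
# (4) Upload the original Cora/Citeseer dataset to the S3 bucket.
# (5) Set the code directory to "/path/gat" on the web page.
# (6) Set the startup file to "train.py" on the web page.
# (7) Set "Training dataset", "Training output file path", "Job log path", etc. on the web page.
# (8) Create the training job.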
```
## Script Description
### Script and Sample Code