forked from mindspore-Ecosystem/mindspore
!17593 update readme
From: @huchunmei Reviewed-by: @wuxuejian,@c_34 Signed-off-by: @wuxuejian
commit
44dd7d994b
|
@ -114,10 +114,10 @@ Major parameters in train.py and config.py as follows:
|
|||
|
||||
#### Training
|
||||
|
||||
- running on Ascend
|
||||
- Running on Ascend
|
||||
|
||||
```bash
|
||||
python train.py --data_path cifar-10-batches-bin --ckpt_path ckpt > log 2>&1 &
|
||||
python train.py --config_path default_config.yaml --data_path cifar-10-batches-bin --ckpt_path ckpt > log 2>&1 &
|
||||
# or enter script dir, and run the script
|
||||
sh run_standalone_train_ascend.sh cifar-10-batches-bin ckpt
|
||||
```
|
||||
|
@ -139,7 +139,7 @@ Major parameters in train.py and config.py as follows:
|
|||
- running on GPU
|
||||
|
||||
```bash
|
||||
python train.py --device_target "GPU" --data_path cifar-10-batches-bin --ckpt_path ckpt > log 2>&1 &
|
||||
python train.py --config_path default_config.yaml --device_target "GPU" --data_path cifar-10-batches-bin --ckpt_path ckpt > log 2>&1 &
|
||||
# or enter script dir, and run the script
|
||||
sh run_standalone_train_for_gpu.sh cifar-10-batches-bin ckpt
|
||||
```
|
||||
|
@ -164,7 +164,7 @@ Before running the command below, please check the checkpoint path used for eval
|
|||
- running on Ascend
|
||||
|
||||
```bash
|
||||
python eval.py --data_path cifar-10-verify-bin --ckpt_path ckpt/checkpoint_alexnet-1_1562.ckpt > eval_log.txt 2>&1 &
|
||||
python eval.py --config_path default_config.yaml --data_path cifar-10-verify-bin --ckpt_path ckpt/checkpoint_alexnet-1_1562.ckpt > eval_log.txt 2>&1 &
|
||||
# or enter script dir, and run the script
|
||||
sh run_standalone_eval_ascend.sh cifar-10-verify-bin ckpt/checkpoint_alexnet-1_1562.ckpt
|
||||
```
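
Since the checkpoint path has to be checked by hand before evaluation, a small helper can pick the most recent checkpoint from the output directory. The following is a minimal sketch, not part of the repository: `latest_checkpoint` is a hypothetical helper, and it assumes file names follow the `<prefix>-<epoch>_<step>.ckpt` pattern seen in `checkpoint_alexnet-30_1562.ckpt`.

```python
import re
from pathlib import Path
from typing import Optional

# Assumed naming scheme: <prefix>-<epoch>_<step>.ckpt,
# e.g. checkpoint_alexnet-30_1562.ckpt
_CKPT_PATTERN = re.compile(r"^(?P<prefix>.+)-(?P<epoch>\d+)_(?P<step>\d+)\.ckpt$")


def latest_checkpoint(ckpt_dir: str, prefix: str = "checkpoint_alexnet") -> Optional[Path]:
    """Return the checkpoint with the highest (epoch, step), or None if none match."""
    best_key, best_path = None, None
    for path in Path(ckpt_dir).glob("*.ckpt"):
        match = _CKPT_PATTERN.match(path.name)
        if match is None or match.group("prefix") != prefix:
            continue
        # Compare numerically so that epoch 30 sorts after epoch 9.
        key = (int(match.group("epoch")), int(match.group("step")))
        if best_key is None or key > best_key:
            best_key, best_path = key, path
    return best_path
```

The returned path can then be passed to `eval.py` through `--ckpt_path`.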
|
||||
|
@ -179,7 +179,7 @@ Before running the command below, please check the checkpoint path used for eval
|
|||
- running on GPU
|
||||
|
||||
```bash
|
||||
python eval.py --device_target "GPU" --data_path cifar-10-verify-bin --ckpt_path ckpt/checkpoint_alexnet-30_1562.ckpt > eval_log 2>&1 &
|
||||
python eval.py --config_path default_config.yaml --device_target "GPU" --data_path cifar-10-verify-bin --ckpt_path ckpt/checkpoint_alexnet-30_1562.ckpt > eval_log 2>&1 &
|
||||
# or enter script dir, and run the script
|
||||
sh run_standalone_eval_for_gpu.sh cifar-10-verify-bin ckpt/checkpoint_alexnet-30_1562.ckpt
|
||||
```
|
||||
|
@ -196,7 +196,7 @@ Before running the command below, please check the checkpoint path used for eval
|
|||
### [Export MindIR](#contents)
|
||||
|
||||
```shell
|
||||
python export.py --ckpt_file [CKPT_PATH] --file_name [FILE_NAME] --file_format [FILE_FORMAT]
|
||||
python export.py --config_path [CONFIG_PATH] --ckpt_file [CKPT_PATH] --file_name [FILE_NAME] --file_format [FILE_FORMAT]
|
||||
```
|
||||
|
||||
The ckpt_file parameter is required,
|
||||
|
@ -225,6 +225,70 @@ Inference result is saved in current path, you can find result like this in acc.
|
|||
'acc': 0.88772
|
||||
```
|
||||
|
||||
- Running on [ModelArts](https://support.huaweicloud.com/modelarts/)
|
||||
|
||||
```bash
|
||||
# Train 8p with Ascend
|
||||
# (1) Perform a or b.
|
||||
# a. Set "enable_modelarts=True" on default_config.yaml file.
|
||||
# Set "distribute=True" on default_config.yaml file.
|
||||
# Set "data_path='/cache/data'" on default_config.yaml file.
|
||||
# Set "ckpt_path='/cache/train'" on default_config.yaml file.
|
||||
# (optional)Set "checkpoint_url='s3://dir_to_your_pretrained/'" on default_config.yaml file.
|
||||
# Set other parameters on default_config.yaml file you need.
|
||||
# b. Add "enable_modelarts=True" on the website UI interface.
|
||||
# Add "distribute=True" on the website UI interface.
|
||||
# Add "data_path=/cache/data" on the website UI interface.
|
||||
# Add "ckpt_path=/cache/train" on the website UI interface.
|
||||
# (optional)Add "checkpoint_url='s3://dir_to_your_pretrained/'" on the website UI interface.
|
||||
# Add other parameters on the website UI interface.
|
||||
# (2) Prepare model code
|
||||
# (3) Upload or copy your pretrained model to S3 bucket if you want to finetune.
|
||||
# (4) Upload the original cifar10 dataset to S3 bucket.
|
||||
# (5) Set the code directory to "/path/alexnet" on the website UI interface.
|
||||
# (6) Set the startup file to "train.py" on the website UI interface.
|
||||
# (7) Set the "Dataset path" and "Output file path" and "Job log path" to your path on the website UI interface.
|
||||
# (8) Create your job.
|
||||
#
|
||||
# Train 1p with Ascend
|
||||
# (1) Perform a or b.
|
||||
# a. Set "enable_modelarts=True" on default_config.yaml file.
|
||||
# Set "data_path='/cache/data'" on default_config.yaml file.
|
||||
# Set "ckpt_path='/cache/train'" on default_config.yaml file.
|
||||
# (optional)Set "checkpoint_url='s3://dir_to_your_pretrained/'" on default_config.yaml file.
|
||||
# Set other parameters on default_config.yaml file you need.
|
||||
# b. Add "enable_modelarts=True" on the website UI interface.
|
||||
# Add "data_path=/cache/data" on the website UI interface.
|
||||
# Add "ckpt_path=/cache/train" on the website UI interface.
|
||||
# (optional)Add "checkpoint_url='s3://dir_to_your_pretrained/'" on the website UI interface.
|
||||
# Add other parameters on the website UI interface.
|
||||
# (2) Prepare model code
|
||||
# (3) Upload or copy your pretrained model to S3 bucket if you want to finetune.
|
||||
# (4) Upload the original cifar10 dataset to S3 bucket.
|
||||
# (5) Set the code directory to "/path/alexnet" on the website UI interface.
|
||||
# (6) Set the startup file to "train.py" on the website UI interface.
|
||||
# (7) Set the "Dataset path" and "Output file path" and "Job log path" to your path on the website UI interface.
|
||||
# (8) Create your job.
|
||||
#
|
||||
# Eval 1p with Ascend
|
||||
# (1) Perform a or b.
|
||||
# a. Set "enable_modelarts=True" on default_config.yaml file.
|
||||
# Set "data_path='/cache/data'" on default_config.yaml file.
|
||||
# Set "ckpt_file='/cache/train/checkpoint_alexnet-30_1562.ckpt'" on default_config.yaml file.
|
||||
# Set other parameters on default_config.yaml file you need.
|
||||
# b. Add "enable_modelarts=True" on the website UI interface.
|
||||
# Add "data_path=/cache/data" on the website UI interface.
|
||||
# Add "ckpt_file=/cache/train/checkpoint_alexnet-30_1562.ckpt" on the website UI interface.
|
||||
# Add other parameters on the website UI interface.
|
||||
# (2) Prepare model code
|
||||
# (3) Upload or copy your trained model to S3 bucket.
|
||||
# (4) Upload the original cifar10 dataset to S3 bucket.
|
||||
# (5) Set the code directory to "/path/alexnet" on the website UI interface.
|
||||
# (6) Set the startup file to "eval.py" on the website UI interface.
|
||||
# (7) Set the "Dataset path" and "Output file path" and "Job log path" to your path on the website UI interface.
|
||||
# (8) Create your job.
|
||||
```
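
Options a and b above set the same keys in two places: the YAML file or parameters added in the UI. One way to picture how the two could combine is to overlay command-line overrides on file defaults. This is a rough sketch under stated assumptions, not the repository's implementation: `DEFAULTS` stands in for values read from `default_config.yaml`, `merge_config` is a hypothetical helper, and UI parameters are assumed to arrive as `--key=value` arguments.

```python
import argparse
from typing import Any, Dict, List

# Stand-in for values read from default_config.yaml (hypothetical subset).
DEFAULTS: Dict[str, Any] = {
    "enable_modelarts": False,
    "distribute": False,
    "data_path": "/cache/data",
    "ckpt_path": "/cache/train",
    "checkpoint_url": "",
}


def _to_bool(text: str) -> bool:
    """Parse 'True'/'False' strings; plain type=bool would treat 'False' as True."""
    lowered = text.strip().lower()
    if lowered in ("true", "1", "yes"):
        return True
    if lowered in ("false", "0", "no"):
        return False
    raise argparse.ArgumentTypeError(f"expected a boolean, got {text!r}")


def merge_config(argv: List[str], defaults: Dict[str, Any] = DEFAULTS) -> Dict[str, Any]:
    """Overlay command-line overrides on top of the file defaults."""
    parser = argparse.ArgumentParser(description="config override sketch")
    for key, value in defaults.items():
        arg_type = _to_bool if isinstance(value, bool) else type(value)
        # default=None distinguishes "not passed" from an explicit value.
        parser.add_argument(f"--{key}", type=arg_type, default=None)
    args = parser.parse_args(argv)
    merged = dict(defaults)
    merged.update({k: v for k, v in vars(args).items() if v is not None})
    return merged
```

For example, `merge_config(["--enable_modelarts=True", "--distribute=True"])` keeps the default `data_path` and `ckpt_path` while switching on the two flags.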
|
||||
|
||||
## [Model Description](#contents)
|
||||
|
||||
### [Performance](#contents)
|
||||
|
|
|
@ -71,6 +71,70 @@ sh run_standalone_train_ascend.sh [DATA_PATH] [CKPT_SAVE_PATH]
|
|||
sh run_standalone_eval_ascend.sh [DATA_PATH] [CKPT_NAME]
|
||||
```
|
||||
|
||||
- Training on ModelArts (if you want to run on ModelArts, refer to the [ModelArts](https://support.huaweicloud.com/modelarts/) documentation)
|
||||
|
||||
```bash
|
||||
# Train with 8 devices on ModelArts
# (1) Perform a or b.
#       a. Set "enable_modelarts=True" in the default_config.yaml file.
#          Set "distribute=True" in the default_config.yaml file.
#          Set "data_path='/cache/data'" in the default_config.yaml file.
#          Set "ckpt_path='/cache/train'" in the default_config.yaml file.
#          (Optional) Set "checkpoint_url='s3://dir_to_your_pretrained/'" in the default_config.yaml file.
#          Set any other parameters in the default_config.yaml file.
#       b. Set "enable_modelarts=True" on the web page.
#          Set "distribute=True" on the web page.
#          Set "data_path='/cache/data'" on the web page.
#          Set "ckpt_path='/cache/train'" on the web page.
#          (Optional) Set "checkpoint_url='s3://dir_to_your_pretrained/'" on the web page.
#          Set any other parameters on the web page.
# (2) Prepare the model code.
# (3) If you want to fine-tune, upload your pretrained model to the S3 bucket.
# (4) Upload the original cifar10 dataset to the S3 bucket.
# (5) Set the code directory to "/path/alexnet" on the web page.
# (6) Set the startup file to "train.py" on the web page.
# (7) Set "Training dataset", "Training output file path", "Job log path", etc. on the web page.
# (8) Create the training job.
#
# Train with a single device on ModelArts
# (1) Perform a or b.
#       a. Set "enable_modelarts=True" in the default_config.yaml file.
#          Set "data_path='/cache/data'" in the default_config.yaml file.
#          Set "ckpt_path='/cache/train'" in the default_config.yaml file.
#          (Optional) Set "checkpoint_url='s3://dir_to_your_pretrained/'" in the default_config.yaml file.
#          Set any other parameters in the default_config.yaml file.
#       b. Set "enable_modelarts=True" on the web page.
#          Set "data_path='/cache/data'" on the web page.
#          Set "ckpt_path='/cache/train'" on the web page.
#          (Optional) Set "checkpoint_url='s3://dir_to_your_pretrained/'" on the web page.
#          Set any other parameters on the web page.
# (2) Prepare the model code.
# (3) If you want to fine-tune, upload your pretrained model to the S3 bucket.
# (4) Upload the original cifar10 dataset to the S3 bucket.
# (5) Set the code directory to "/path/alexnet" on the web page.
# (6) Set the startup file to "train.py" on the web page.
# (7) Set "Training dataset", "Training output file path", "Job log path", etc. on the web page.
# (8) Create the training job.
#
# Evaluate with a single device on ModelArts
# (1) Perform a or b.
#       a. Set "enable_modelarts=True" in the default_config.yaml file.
#          Set "data_path='/cache/data'" in the default_config.yaml file.
#          Set "ckpt_file='/cache/train/checkpoint_alexnet-30_1562.ckpt'" in the default_config.yaml file.
#          Set any other parameters in the default_config.yaml file.
#       b. Set "enable_modelarts=True" on the web page.
#          Set "data_path='/cache/data'" on the web page.
#          Set "ckpt_file='/cache/train/checkpoint_alexnet-30_1562.ckpt'" on the web page.
#          Set any other parameters on the web page.
# (2) Prepare the model code.
# (3) Upload your trained model to the S3 bucket.
# (4) Upload the original cifar10 dataset to the S3 bucket.
# (5) Set the code directory to "/path/alexnet" on the web page.
# (6) Set the startup file to "eval.py" on the web page.
# (7) Set "Training dataset", "Training output file path", "Job log path", etc. on the web page.
# (8) Create the training job.
|
||||
```
|
||||
|
||||
## Script Description
|
||||
|
||||
### Script and Sample Code
|
||||
|
@ -121,7 +185,7 @@ Major parameters in train.py and config.py are as follows:
|
|||
- Running on Ascend
|
||||
|
||||
```bash
|
||||
python train.py --data_path cifar-10-batches-bin --ckpt_path ckpt > log 2>&1 &
|
||||
python train.py --config_path default_config.yaml --data_path cifar-10-batches-bin --ckpt_path ckpt > log 2>&1 &
|
||||
# or enter the script directory and run the script
|
||||
sh run_standalone_train_ascend.sh cifar-10-batches-bin ckpt
|
||||
```
|
||||
|
@ -143,7 +207,7 @@ Major parameters in train.py and config.py are as follows:
|
|||
- Running on GPU
|
||||
|
||||
```bash
|
||||
python train.py --device_target "GPU" --data_path cifar-10-batches-bin --ckpt_path ckpt > log 2>&1 &
|
||||
python train.py --config_path default_config.yaml --device_target "GPU" --data_path cifar-10-batches-bin --ckpt_path ckpt > log 2>&1 &
|
||||
# or enter the script directory and run the script
|
||||
sh run_standalone_train_for_gpu.sh cifar-10-batches-bin ckpt
|
||||
```
|
||||
|
@ -168,7 +232,7 @@ Major parameters in train.py and config.py are as follows:
|
|||
- Running on Ascend
|
||||
|
||||
```bash
|
||||
python eval.py --data_path cifar-10-verify-bin --ckpt_path ckpt/checkpoint_alexnet-1_1562.ckpt > eval_log.txt 2>&1 &
|
||||
python eval.py --config_path default_config.yaml --data_path cifar-10-verify-bin --ckpt_path ckpt/checkpoint_alexnet-1_1562.ckpt > eval_log.txt 2>&1 &
|
||||
# or enter the script directory and run the script
|
||||
sh run_standalone_eval_ascend.sh cifar-10-verify-bin ckpt/checkpoint_alexnet-1_1562.ckpt
|
||||
```
|
||||
|
@ -183,7 +247,7 @@ Major parameters in train.py and config.py are as follows:
|
|||
- Running on GPU
|
||||
|
||||
```bash
|
||||
python eval.py --device_target "GPU" --data_path cifar-10-verify-bin --ckpt_path ckpt/checkpoint_alexnet-30_1562.ckpt > eval_log 2>&1 &
|
||||
python eval.py --config_path default_config.yaml --device_target "GPU" --data_path cifar-10-verify-bin --ckpt_path ckpt/checkpoint_alexnet-30_1562.ckpt > eval_log 2>&1 &
|
||||
# or enter the script directory and run the script
|
||||
sh run_standalone_eval_for_gpu.sh cifar-10-verify-bin ckpt/checkpoint_alexnet-30_1562.ckpt
|
||||
```
|
||||
|
@ -200,7 +264,7 @@ Major parameters in train.py and config.py are as follows:
|
|||
### [Export MindIR](#contents)
|
||||
|
||||
```shell
|
||||
python export.py --ckpt_file [CKPT_PATH] --file_name [FILE_NAME] --file_format [FILE_FORMAT]
|
||||
python export.py --config_path [CONFIG_PATH] --ckpt_file [CKPT_PATH] --file_name [FILE_NAME] --file_format [FILE_FORMAT]
|
||||
```
|
||||
|
||||
The ckpt_file parameter is required,
|
||||
|
|
|
@ -11,8 +11,8 @@ checkpoint_file: './checkpoint/checkpoint_alexnet-30_1562.ckpt'
|
|||
device_target: Ascend
|
||||
enable_profiling: False
|
||||
|
||||
ckpt_path: "/cache/data"
|
||||
ckpt_file: "/cache/data/checkpoint_alexnet-30_1562.ckpt"
|
||||
ckpt_path: "/cache/train"
|
||||
ckpt_file: "/cache/train/checkpoint_alexnet-30_1562.ckpt"
|
||||
# ==============================================================================
|
||||
# Training options
|
||||
epoch_size: 30
|
||||
|
|
|
@ -38,7 +38,6 @@ def modelarts_process():
|
|||
@moxing_wrapper(pre_process=modelarts_process)
|
||||
def eval_alexnet():
|
||||
print("============== Starting Testing ==============")
|
||||
|
||||
device_num = get_device_num()
|
||||
if device_num > 1:
|
||||
# context.set_context(mode=context.GRAPH_MODE, device_target=config.device_target)
|
||||
|
@ -53,8 +52,7 @@ def eval_alexnet():
|
|||
network = AlexNet(config.num_classes, phase='test')
|
||||
loss = nn.SoftmaxCrossEntropyWithLogits(sparse=True, reduction="mean")
|
||||
opt = nn.Momentum(network.trainable_params(), config.learning_rate, config.momentum)
|
||||
ds_eval = create_dataset_cifar10(config, config.data_path, config.batch_size, status="test", \
|
||||
target=config.device_target)
|
||||
ds_eval = create_dataset_cifar10(config, config.data_path, config.batch_size, target=config.device_target)
|
||||
param_dict = load_checkpoint(config.ckpt_path)
|
||||
print("load checkpoint from [{}].".format(config.ckpt_path))
|
||||
load_param_into_net(network, param_dict)
|
||||
|
|
|
@ -66,6 +66,98 @@ For FP16 operators, if the input data type is FP32, the backend of MindSpore wil
|
|||
- [MindSpore Tutorials](https://www.mindspore.cn/tutorial/training/en/master/index.html)
|
||||
- [MindSpore Python API](https://www.mindspore.cn/doc/api_python/en/master/index.html)
|
||||
|
||||
- Running on [ModelArts](https://support.huaweicloud.com/modelarts/)
|
||||
|
||||
```bash
|
||||
# Train 8p with Ascend
|
||||
# (1) Perform a or b.
|
||||
# a. Set "enable_modelarts=True" on default_config.yaml file.
|
||||
# Set "distribute=True" on default_config.yaml file.
|
||||
# Set "need_modelarts_dataset_unzip=True" on default_config.yaml file.
|
||||
# Set "modelarts_dataset_unzip_name='ImageNet_Original'" on default_config.yaml file.
|
||||
# Set "lr_init=0.00004" on default_config.yaml file.
|
||||
# Set "dataset_path='/cache/data'" on default_config.yaml file.
|
||||
# Set "epoch_size=250" on default_config.yaml file.
|
||||
# (optional)Set "checkpoint_url='s3://dir_to_your_pretrained/'" on default_config.yaml file.
|
||||
# Set other parameters on default_config.yaml file you need.
|
||||
# b. Add "enable_modelarts=True" on the website UI interface.
|
||||
# Add "need_modelarts_dataset_unzip=True" on the website UI interface.
|
||||
# Add "modelarts_dataset_unzip_name='ImageNet_Original'" on the website UI interface.
|
||||
# Add "distribute=True" on the website UI interface.
|
||||
# Add "lr_init=0.00004" on the website UI interface.
|
||||
# Add "dataset_path=/cache/data" on the website UI interface.
|
||||
# Add "epoch_size=250" on the website UI interface.
|
||||
# (optional)Add "checkpoint_url='s3://dir_to_your_pretrained/'" on the website UI interface.
|
||||
# Add other parameters on the website UI interface.
|
||||
# (2) Prepare model code
|
||||
# (3) Upload or copy your pretrained model to S3 bucket if you want to finetune.
|
||||
# (4) Perform a or b. (suggested option a)
|
||||
# a. First, zip MindRecord dataset to one zip file.
|
||||
#             Second, upload your zip dataset to S3 bucket. (You could also upload the original MindRecord dataset, but that can be very slow.)
|
||||
# b. Upload the original coco dataset to S3 bucket.
|
||||
#             (Dataset conversion occurs during the training process and costs a lot of time. It happens every time you train.)
|
||||
# (5) Set the code directory to "/path/inceptionv3" on the website UI interface.
|
||||
# (6) Set the startup file to "train.py" on the website UI interface.
|
||||
# (7) Set the "Dataset path" and "Output file path" and "Job log path" to your path on the website UI interface.
|
||||
# (8) Create your job.
|
||||
#
|
||||
# Train 1p with Ascend
|
||||
# (1) Perform a or b.
|
||||
# a. Set "enable_modelarts=True" on default_config.yaml file.
|
||||
# Set "need_modelarts_dataset_unzip=True" on default_config.yaml file.
|
||||
# Set "modelarts_dataset_unzip_name='ImageNet_Original'" on default_config.yaml file.
|
||||
# Set "dataset_path='/cache/data'" on default_config.yaml file.
|
||||
# Set "epoch_size=250" on default_config.yaml file.
|
||||
# (optional)Set "checkpoint_url='s3://dir_to_your_pretrained/'" on default_config.yaml file.
|
||||
# Set other parameters on default_config.yaml file you need.
|
||||
# b. Add "enable_modelarts=True" on the website UI interface.
|
||||
# Add "need_modelarts_dataset_unzip=True" on the website UI interface.
|
||||
# Add "modelarts_dataset_unzip_name='ImageNet_Original'" on the website UI interface.
|
||||
# Add "dataset_path='/cache/data'" on the website UI interface.
|
||||
# Add "epoch_size=250" on the website UI interface.
|
||||
# (optional)Add "checkpoint_url='s3://dir_to_your_pretrained/'" on the website UI interface.
|
||||
# Add other parameters on the website UI interface.
|
||||
# (2) Prepare model code
|
||||
# (3) Upload or copy your pretrained model to S3 bucket if you want to finetune.
|
||||
# (4) Perform a or b. (suggested option a)
|
||||
#          a. First, zip MindRecord dataset to one zip file.
|
||||
#             Second, upload your zip dataset to S3 bucket. (You could also upload the original MindRecord dataset, but that can be very slow.)
|
||||
# b. Upload the original coco dataset to S3 bucket.
|
||||
#             (Dataset conversion occurs during the training process and costs a lot of time. It happens every time you train.)
|
||||
# (5) Set the code directory to "/path/inceptionv3" on the website UI interface.
|
||||
# (6) Set the startup file to "train.py" on the website UI interface.
|
||||
# (7) Set the "Dataset path" and "Output file path" and "Job log path" to your path on the website UI interface.
|
||||
# (8) Create your job.
|
||||
#
|
||||
# Eval 1p with Ascend
|
||||
# (1) Perform a or b.
|
||||
# a. Set "enable_modelarts=True" on default_config.yaml file.
|
||||
# Set "need_modelarts_dataset_unzip=True" on default_config.yaml file.
|
||||
# Set "modelarts_dataset_unzip_name='ImageNet_Original'" on default_config.yaml file.
|
||||
#          Set "checkpoint_url='s3://dir_to_your_trained_model/'" on default_config.yaml file.
|
||||
# Set "checkpoint='./inceptionv3/inceptionv3-rank3_1-247_1251.ckpt'" on default_config.yaml file.
|
||||
# Set "dataset_path='/cache/data'" on default_config.yaml file.
|
||||
# Set other parameters on default_config.yaml file you need.
|
||||
# b. Add "enable_modelarts=True" on the website UI interface.
|
||||
# Add "need_modelarts_dataset_unzip=True" on the website UI interface.
|
||||
# Add "modelarts_dataset_unzip_name='ImageNet_Original'" on the website UI interface.
|
||||
# Add "checkpoint_url='s3://dir_to_your_trained_model/'" on the website UI interface.
|
||||
# Add "checkpoint='./inceptionv3/inceptionv3-rank3_1-247_1251.ckpt'" on the website UI interface.
|
||||
# Add "dataset_path='/cache/data'" on the website UI interface.
|
||||
# Add other parameters on the website UI interface.
|
||||
# (2) Prepare model code
|
||||
# (3) Upload or copy your trained model to S3 bucket.
|
||||
# (4) Perform a or b. (suggested option a)
|
||||
# a. First, zip MindRecord dataset to one zip file.
|
||||
#             Second, upload your zip dataset to S3 bucket. (You could also upload the original MindRecord dataset, but that can be very slow.)
|
||||
# b. Upload the original coco dataset to S3 bucket.
|
||||
#             (Dataset conversion occurs during the training process and costs a lot of time. It happens every time you train.)
|
||||
# (5) Set the code directory to "/path/inceptionv3" on the website UI interface.
|
||||
# (6) Set the startup file to "eval.py" on the website UI interface.
|
||||
# (7) Set the "Dataset path" and "Output file path" and "Job log path" to your path on the website UI interface.
|
||||
# (8) Create your job.
|
||||
```
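
Step (4) option a asks for the dataset to be zipped into one file before upload. The sketch below shows one way to do that with the standard library. It is an illustration only: `zip_dataset` is a hypothetical helper, and it assumes the archive should contain a single top-level folder named like the dataset directory (for example `ImageNet_Original`, matching `modelarts_dataset_unzip_name`). Entries are stored uncompressed, on the assumption that image data gains little from further compression.

```python
import zipfile
from pathlib import Path


def zip_dataset(dataset_dir: str, zip_path: str) -> int:
    """Pack dataset_dir into zip_path under its own folder name; return the file count."""
    root = Path(dataset_dir).resolve()
    target = Path(zip_path).resolve()
    count = 0
    with zipfile.ZipFile(target, "w", compression=zipfile.ZIP_STORED) as archive:
        for path in sorted(root.rglob("*")):
            # Skip directories and the archive itself if it lives inside root.
            if path.is_file() and path.resolve() != target:
                arcname = (Path(root.name) / path.relative_to(root)).as_posix()
                archive.write(path, arcname=arcname)
                count += 1
    return count
```

The resulting zip file is what would then be uploaded to the S3 bucket.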
|
||||
|
||||
# [Script description](#contents)
|
||||
|
||||
## [Script and sample code](#contents)
|
||||
|
@ -167,8 +259,8 @@ sh scripts/run_standalone_train_cpu.sh DATA_PATH
|
|||
```python
|
||||
# training example
|
||||
python:
|
||||
Ascend: python train.py --dataset_path DATA_PATH --platform Ascend
|
||||
CPU: python train.py --dataset_path DATA_PATH --platform CPU
|
||||
Ascend: python train.py --config_path CONFIG_FILE --dataset_path DATA_PATH --platform Ascend
|
||||
CPU: python train.py --config_path CONFIG_FILE --dataset_path DATA_PATH --platform CPU
|
||||
|
||||
shell:
|
||||
Ascend:
|
||||
|
@ -229,8 +321,8 @@ You can start training using python or shell scripts. The usage of shell scripts
|
|||
```python
|
||||
# eval example
|
||||
python:
|
||||
Ascend: python eval.py --dataset_path DATA_PATH --checkpoint PATH_CHECKPOINT --platform Ascend
|
||||
CPU: python eval.py --dataset_path DATA_PATH --checkpoint PATH_CHECKPOINT --platform CPU
|
||||
Ascend: python eval.py --config_path CONFIG_FILE --dataset_path DATA_PATH --checkpoint PATH_CHECKPOINT --platform Ascend
|
||||
CPU: python eval.py --config_path CONFIG_FILE --dataset_path DATA_PATH --checkpoint PATH_CHECKPOINT --platform CPU
|
||||
|
||||
shell:
|
||||
Ascend: sh scripts/run_eval.sh DEVICE_ID DATA_PATH PATH_CHECKPOINT
|
||||
|
@ -250,7 +342,7 @@ metric: {'Loss': 1.778, 'Top1-Acc':0.788, 'Top5-Acc':0.942}
|
|||
## Model Export
|
||||
|
||||
```shell
|
||||
python export.py --ckpt_file [CKPT_PATH] --device_target [DEVICE_TARGET] --file_format [EXPORT_FORMAT]
|
||||
python export.py --config_path [CONFIG_FILE] --ckpt_file [CKPT_PATH] --device_target [DEVICE_TARGET] --file_format [EXPORT_FORMAT]
|
||||
```
|
||||
|
||||
`EXPORT_FORMAT` should be in ["AIR", "MINDIR"]
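
The restriction on `EXPORT_FORMAT` can be enforced at argument-parsing time. The following is a sketch only, not the actual `export.py`: `parse_export_args` is a hypothetical function that covers just two of the flags shown above, and accepting lower-case input is an added convenience that the source does not mention.

```python
import argparse
from typing import List, Optional

EXPORT_FORMATS = ("AIR", "MINDIR")


def parse_export_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    """Parse export arguments and reject formats outside EXPORT_FORMATS."""
    parser = argparse.ArgumentParser(description="export argument sketch")
    parser.add_argument("--ckpt_file", required=True, help="checkpoint to export")
    # type=str.upper normalizes the value before it is checked against choices.
    parser.add_argument("--file_format", type=str.upper,
                        choices=EXPORT_FORMATS, required=True)
    return parser.parse_args(argv)
```

With this, an unsupported value such as `ONNX` makes argparse print a usage error and exit instead of failing later during export.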
|
||||
|
@ -290,7 +382,7 @@ accuracy:78.742
|
|||
| MindSpore Version | 0.6.0-beta |
|
||||
| Dataset | 1200k images |
|
||||
| Batch_size | 128 |
|
||||
| Training Parameters | src/config.py |
|
||||
| Training Parameters | src/model_utils/default_config.yaml |
|
||||
| Optimizer | RMSProp |
|
||||
| Loss Function | SoftmaxCrossEntropy |
|
||||
| Outputs | probability |
|
||||
|
|
|
@ -77,6 +77,98 @@ The overall network architecture of InceptionV3 is as follows:
|
|||
- [MindSpore Tutorials](https://www.mindspore.cn/tutorial/training/zh-CN/master/index.html)
|
||||
- [MindSpore Python API](https://www.mindspore.cn/doc/api_python/zh-CN/master/index.html)
|
||||
|
||||
- Training on ModelArts (if you want to run on ModelArts, refer to the [ModelArts](https://support.huaweicloud.com/modelarts/) documentation)
|
||||
|
||||
```bash
|
||||
# Train with 8 devices on ModelArts
# (1) Perform a or b.
#       a. Set "enable_modelarts=True" in the default_config.yaml file.
#          Set "distribute=True" in the default_config.yaml file.
#          Set "need_modelarts_dataset_unzip=True" in the default_config.yaml file.
#          Set "modelarts_dataset_unzip_name='imagenet_original'" in the default_config.yaml file.
#          Set "lr_init=0.00004" in the default_config.yaml file.
#          Set "dataset_path='/cache/data'" in the default_config.yaml file.
#          Set "epoch_size=250" in the default_config.yaml file.
#          (Optional) Set "checkpoint_url='s3://dir_to_your_pretrained/'" in the default_config.yaml file.
#          Set any other parameters in the default_config.yaml file.
#       b. Set "enable_modelarts=True" on the web page.
#          Set "need_modelarts_dataset_unzip=True" on the web page.
#          Set "modelarts_dataset_unzip_name='imagenet_original'" on the web page.
#          Set "distribute=True" on the web page.
#          Set "lr_init=0.00004" on the web page.
#          Set "dataset_path=/cache/data" on the web page.
#          Set "epoch_size=250" on the web page.
#          (Optional) Set "checkpoint_url='s3://dir_to_your_pretrained/'" on the web page.
#          Set any other parameters on the web page.
# (2) Prepare the model code.
# (3) If you want to fine-tune, upload your pretrained model to the S3 bucket.
# (4) Perform a or b. (Option a is recommended.)
#       a. First, compress the dataset into a single ".zip" file.
#          Second, upload the zipped dataset to the S3 bucket. (You can also upload the uncompressed dataset, but that may be slow.)
#       b. Upload the original coco dataset to the S3 bucket.
#           (Dataset conversion happens during training and takes considerable time. It is repeated every time you train.)
# (5) Set the code directory to "/path/inceptionv3" on the web page.
# (6) Set the startup file to "train.py" on the web page.
# (7) Set "Training dataset", "Training output file path", "Job log path", etc. on the web page.
# (8) Create the training job.
#
# Train with a single device on ModelArts
# (1) Perform a or b.
#       a. Set "enable_modelarts=True" in the default_config.yaml file.
#          Set "need_modelarts_dataset_unzip=True" in the default_config.yaml file.
#          Set "modelarts_dataset_unzip_name='imagenet_original'" in the default_config.yaml file.
#          Set "dataset_path='/cache/data'" in the default_config.yaml file.
#          Set "epoch_size=250" in the default_config.yaml file.
#          (Optional) Set "checkpoint_url='s3://dir_to_your_pretrained/'" in the default_config.yaml file.
#          Set any other parameters in the default_config.yaml file.
#       b. Set "enable_modelarts=True" on the web page.
#          Set "need_modelarts_dataset_unzip=True" on the web page.
#          Set "modelarts_dataset_unzip_name='imagenet_original'" on the web page.
#          Set "dataset_path='/cache/data'" on the web page.
#          Set "epoch_size=250" on the web page.
#          (Optional) Set "checkpoint_url='s3://dir_to_your_pretrained/'" on the web page.
#          Set any other parameters on the web page.
# (2) Prepare the model code.
# (3) If you want to fine-tune, upload your pretrained model to the S3 bucket.
# (4) Perform a or b. (Option a is recommended.)
#       a. First, compress the dataset into a single ".zip" file.
#          Second, upload the zipped dataset to the S3 bucket. (You can also upload the uncompressed dataset, but that may be slow.)
#       b. Upload the original coco dataset to the S3 bucket.
#           (Dataset conversion happens during training and takes considerable time. It is repeated every time you train.)
# (5) Set the code directory to "/path/inceptionv3" on the web page.
# (6) Set the startup file to "train.py" on the web page.
# (7) Set "Training dataset", "Training output file path", "Job log path", etc. on the web page.
# (8) Create the training job.
#
# Evaluate with a single device on ModelArts
# (1) Perform a or b.
#       a. Set "enable_modelarts=True" in the default_config.yaml file.
#          Set "need_modelarts_dataset_unzip=True" in the default_config.yaml file.
#          Set "modelarts_dataset_unzip_name='imagenet_original'" in the default_config.yaml file.
#          Set "checkpoint_url='s3://dir_to_your_trained_model/'" in the default_config.yaml file.
#          Set "checkpoint='./inceptionv3/inceptionv3-rank3_1-247_1251.ckpt'" in the default_config.yaml file.
#          Set "dataset_path='/cache/data'" in the default_config.yaml file.
#          Set any other parameters in the default_config.yaml file.
#       b. Set "enable_modelarts=True" on the web page.
#          Set "need_modelarts_dataset_unzip=True" on the web page.
#          Set "modelarts_dataset_unzip_name='imagenet_original'" on the web page.
#          Set "checkpoint_url='s3://dir_to_your_trained_model/'" on the web page.
#          Set "checkpoint='./inceptionv3/inceptionv3-rank3_1-247_1251.ckpt'" on the web page.
#          Set "dataset_path='/cache/data'" on the web page.
#          Set any other parameters on the web page.
# (2) Prepare the model code.
# (3) Upload your trained model to the S3 bucket.
# (4) Perform a or b. (Option a is recommended.)
#       a. First, compress the dataset into a single ".zip" file.
#          Second, upload the zipped dataset to the S3 bucket. (You can also upload the uncompressed dataset, but that may be slow.)
#       b. Upload the original coco dataset to the S3 bucket.
#           (Dataset conversion happens during training and takes considerable time. It is repeated every time you train.)
# (5) Set the code directory to "/path/inceptionv3" on the web page.
# (6) Set the startup file to "eval.py" on the web page.
# (7) Set "Training dataset", "Training output file path", "Job log path", etc. on the web page.
# (8) Create the training job.
|
||||
```
|
||||
|
||||
# Script Description
|
||||
|
||||
## Script and Sample Code
|
||||
|
@ -172,8 +264,8 @@ Major parameters in train.py and config.py are as follows:
|
|||
``` launch
|
||||
# training example
|
||||
python:
|
||||
Ascend: python train.py --dataset_path /dataset/train --platform Ascend
|
||||
CPU: python train.py --dataset_path DATA_PATH --platform CPU
|
||||
Ascend: python train.py --config_path default_config.yaml --dataset_path /dataset/train --platform Ascend
|
||||
CPU: python train.py --config_path CONFIG_FILE --dataset_path DATA_PATH --platform CPU
|
||||
|
||||
shell:
|
||||
Ascend:
|
||||
|
@ -234,8 +326,8 @@ epoch time: 6358482.104 ms, per step time: 16303.800 ms
|
|||
``` launch
|
||||
# eval example
|
||||
python:
|
||||
Ascend: python eval.py --dataset_path DATA_DIR --checkpoint PATH_CHECKPOINT --platform Ascend
|
||||
CPU: python eval.py --dataset_path DATA_PATH --checkpoint PATH_CHECKPOINT --platform CPU
|
||||
Ascend: python eval.py --config_path CONFIG_FILE --dataset_path DATA_DIR --checkpoint PATH_CHECKPOINT --platform Ascend
|
||||
CPU: python eval.py --config_path CONFIG_FILE --dataset_path DATA_PATH --checkpoint PATH_CHECKPOINT --platform CPU
|
||||
|
||||
shell:
|
||||
Ascend: sh scripts/run_eval.sh DEVICE_ID DATA_DIR PATH_CHECKPOINT
|
||||
|
@ -255,7 +347,7 @@ metric:{'Loss':1.778, 'Top1-Acc':0.788, 'Top5-Acc':0.942}
|
|||
## Model Export
|
||||
|
||||
```shell
|
||||
python export.py --ckpt_file [CKPT_PATH] --device_target [DEVICE_TARGET] --file_format [EXPORT_FORMAT]
|
||||
python export.py --config_path [CONFIG_FILE] --ckpt_file [CKPT_PATH] --device_target [DEVICE_TARGET] --file_format [EXPORT_FORMAT]
|
||||
```
|
||||
|
||||
`EXPORT_FORMAT` should be one of ["AIR", "MINDIR"]
|
||||
|
@ -288,14 +380,14 @@ accuracy:78.742
|
|||
### Training Performance
|
||||
|
||||
| Parameters                 | Ascend                                         |
|
||||
| -------------------------- | ---------------------------------------------- |
|
||||
| -------------------------- | ------------------------------------------------------- |
|
||||
| Model Version              | InceptionV3                                    |
|
||||
| Resource                   | Ascend 910; CPU 2.60GHz, 192 cores; memory 755G; OS Euler2.8 |
|
||||
| Uploaded Date              | 2020-08-21                                     |
|
||||
| MindSpore Version          | 0.6.0-beta                                     |
|
||||
| Dataset                    | 1200k images                                   |
|
||||
| Batch_size | 128 |
|
||||
| Training Parameters        | src/config.py                                  |
|
||||
| Training Parameters        | src/model_utils/default_config.yaml            |
|
||||
| Optimizer                  | RMSProp                                        |
|
||||
| Loss Function              | SoftmaxCrossEntropy                            |
|
||||
| Outputs                    | probability                                    |
|
||||
|
|
|
@ -190,6 +190,5 @@ def train_inceptionv3():
|
|||
model.train(config.epoch_size, dataset, callbacks=callbacks, dataset_sink_mode=config.ds_sink_mode)
|
||||
print("train success")
|
||||
|
||||
|
||||
if __name__ == '__main__':
|
||||
train_inceptionv3()
|
||||
|
|
|
@ -59,6 +59,98 @@ For FP16 operators, if the input data type is FP32, the backend of MindSpore wil
|
|||
- [MindSpore Tutorials](https://www.mindspore.cn/tutorial/training/en/master/index.html)
|
||||
- [MindSpore Python API](https://www.mindspore.cn/doc/api_python/en/master/index.html)
|
||||
|
||||
- Running on [ModelArts](https://support.huaweicloud.com/modelarts/)
|
||||
|
||||
```bash
|
||||
# Train 8p with Ascend
|
||||
# (1) Perform a or b.
|
||||
# a. Set "enable_modelarts=True" on default_config.yaml file.
|
||||
# Set "distribute=True" on default_config.yaml file.
|
||||
# Set "need_modelarts_dataset_unzip=True" on default_config.yaml file.
|
||||
# Set "modelarts_dataset_unzip_name='ImageNet_Original'" on default_config.yaml file.
|
||||
# Set "lr_init=0.00004" on default_config.yaml file.
|
||||
# Set "dataset_path='/cache/data'" on default_config.yaml file.
|
||||
# Set "epoch_size=250" on default_config.yaml file.
|
||||
# (optional)Set "checkpoint_url='s3://dir_to_your_pretrained/'" on default_config.yaml file.
|
||||
# Set other parameters on default_config.yaml file you need.
|
||||
# b. Add "enable_modelarts=True" on the website UI interface.
|
||||
# Add "need_modelarts_dataset_unzip=True" on the website UI interface.
|
||||
# Add "modelarts_dataset_unzip_name='ImageNet_Original'" on the website UI interface.
|
||||
# Add "distribute=True" on the website UI interface.
|
||||
# Add "lr_init=0.00004" on the website UI interface.
|
||||
# Add "dataset_path=/cache/data" on the website UI interface.
|
||||
# Add "epoch_size=250" on the website UI interface.
|
||||
# (optional)Add "checkpoint_url='s3://dir_to_your_pretrained/'" on the website UI interface.
|
||||
# Add other parameters on the website UI interface.
|
||||
# (2) Prepare model code
|
||||
# (3) Upload or copy your pretrained model to S3 bucket if you want to finetune.
|
||||
# (4) Perform a or b. (suggested option a)
|
||||
# a. First, zip MindRecord dataset to one zip file.
|
||||
#             Second, upload your zip dataset to S3 bucket. (You could also upload the original MindRecord dataset, but that can be very slow.)
|
||||
# b. Upload the original coco dataset to S3 bucket.
|
||||
#             (Dataset conversion occurs during the training process and costs a lot of time. It happens every time you train.)
|
||||
# (5) Set the code directory to "/path/inceptionv4" on the website UI interface.
|
||||
# (6) Set the startup file to "train.py" on the website UI interface.
|
||||
# (7) Set the "Dataset path" and "Output file path" and "Job log path" to your path on the website UI interface.
|
||||
# (8) Create your job.
|
||||
#
|
||||
# Train 1p with Ascend
|
||||
# (1) Perform a or b.
|
||||
# a. Set "enable_modelarts=True" on default_config.yaml file.
|
||||
# Set "need_modelarts_dataset_unzip=True" on default_config.yaml file.
|
||||
# Set "modelarts_dataset_unzip_name='ImageNet_Original'" on default_config.yaml file.
|
||||
# Set "dataset_path='/cache/data'" on default_config.yaml file.
|
||||
# Set "epoch_size=250" on default_config.yaml file.
|
||||
# (optional)Set "checkpoint_url='s3://dir_to_your_pretrained/'" on default_config.yaml file.
|
||||
# Set other parameters on default_config.yaml file you need.
|
||||
# b. Add "enable_modelarts=True" on the website UI interface.
|
||||
# Add "need_modelarts_dataset_unzip=True" on the website UI interface.
|
||||
# Add "modelarts_dataset_unzip_name='ImageNet_Original'" on the website UI interface.
|
||||
# Add "dataset_path='/cache/data'" on the website UI interface.
|
||||
# Add "epoch_size=250" on the website UI interface.
|
||||
# (optional)Add "checkpoint_url='s3://dir_to_your_pretrained/'" on the website UI interface.
|
||||
# Add other parameters on the website UI interface.
|
||||
# (2) Prepare model code
|
||||
# (3) Upload or copy your pretrained model to S3 bucket if you want to finetune.
|
||||
# (4) Perform a or b. (suggested option a)
|
||||
# a. zip MindRecord dataset to one zip file.
|
||||
# Second, upload your zip dataset to S3 bucket.(you could also upload the origin mindrecord dataset, but it can be so slow.)
|
||||
# b. Upload the original coco dataset to S3 bucket.
|
||||
# (Data set conversion occurs during training process and costs a lot of time. it happens every time you train.)
|
||||
# (5) Set the code directory to "/path/inceptionv4" on the website UI interface.
|
||||
# (6) Set the startup file to "train.py" on the website UI interface.
|
||||
# (7) Set the "Dataset path" and "Output file path" and "Job log path" to your path on the website UI interface.
|
||||
# (8) Create your job.
|
||||
#
|
||||
# Eval 1p with Ascend
|
||||
# (1) Perform a or b.
|
||||
# a. Set "enable_modelarts=True" on default_config.yaml file.
|
||||
# Set "need_modelarts_dataset_unzip=True" on default_config.yaml file.
|
||||
# Set "modelarts_dataset_unzip_name='ImageNet_Original'" on default_config.yaml file.
|
||||
# Set "checkpoint_url='s3://dir_to_your_trained_model/'" on base_config.yaml file.
|
||||
# Set "checkpoint_path='./inceptionv4/inceptionv4-train-250_1251.ckpt'" on default_config.yaml file.
|
||||
# Set "dataset_path='/cache/data'" on default_config.yaml file.
|
||||
# Set other parameters on default_config.yaml file you need.
|
||||
# b. Add "enable_modelarts=True" on the website UI interface.
|
||||
# Add "need_modelarts_dataset_unzip=True" on the website UI interface.
|
||||
# Add "modelarts_dataset_unzip_name='ImageNet_Original'" on the website UI interface.
|
||||
# Add "checkpoint_url='s3://dir_to_your_trained_model/'" on the website UI interface.
|
||||
# Add "checkpoint_path='./inceptionv4/inceptionv4-train-250_1251.ckpt'" on the website UI interface.
|
||||
# Add "dataset_path='/cache/data'" on the website UI interface.
|
||||
# (2) Prepare model code
|
||||
# Add other parameters on the website UI interface.
|
||||
# (3) Upload or copy your trained model to S3 bucket.
|
||||
# (4) Perform a or b. (suggested option a)
|
||||
# a. First, zip MindRecord dataset to one zip file.
|
||||
# Second, upload your zip dataset to S3 bucket.(you could also upload the origin mindrecord dataset, but it can be so slow.)
|
||||
# b. Upload the original coco dataset to S3 bucket.
|
||||
# (Data set conversion occurs during training process and costs a lot of time. it happens every time you train.)
|
||||
# (5) Set the code directory to "/path/inceptionv4" on the website UI interface.
|
||||
# (6) Set the startup file to "eval.py" on the website UI interface.
|
||||
# (7) Set the "Dataset path" and "Output file path" and "Job log path" to your path on the website UI interface.
|
||||
# (8) Create your job.
|
||||
```
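Options a and b above set the same parameters in two places: as values in `default_config.yaml`, or as `key=value` entries on the website UI. The following is a minimal sketch of how such entries could be layered over file defaults. The helper names `parse_value` and `apply_overrides` and the placeholder defaults are assumptions for illustration, not the repository's actual config loader.

```python
import ast

# Placeholder defaults standing in for default_config.yaml (option a).
# These are illustrative values, not the repository's real defaults.
DEFAULTS = {
    "enable_modelarts": False,
    "distribute": False,
    "dataset_path": "",
    "epoch_size": 1,
    "lr_init": 0.0,
}


def parse_value(text):
    """Turn 'True', '250', '0.00004' or "'/cache/data'" into a Python value."""
    try:
        return ast.literal_eval(text)
    except (ValueError, SyntaxError):
        # Bare strings such as /cache/data are kept unchanged.
        return text


def apply_overrides(defaults, pairs):
    """Return a copy of defaults updated with key=value strings (option b)."""
    config = dict(defaults)
    for pair in pairs:
        key, _, raw = pair.partition("=")
        config[key.strip()] = parse_value(raw.strip())
    return config


# The 8p training settings listed above, written as website-UI style entries.
overrides = [
    "enable_modelarts=True",
    "distribute=True",
    "lr_init=0.00004",
    "dataset_path=/cache/data",
    "epoch_size=250",
]
config = apply_overrides(DEFAULTS, overrides)
print(config)
```

Quoted and unquoted paths resolve to the same string here, which is why `dataset_path=/cache/data` and `dataset_path='/cache/data'` can be treated as equivalent in this sketch.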
|
||||
|
||||
# [Script description](#contents)
|
||||
|
||||
## [Script and sample code](#contents)
|
||||
|
@ -248,7 +340,7 @@ metric: {'Loss': 0.8144, 'Top1-Acc': 0.8009, 'Top5-Acc': 0.9457}
|
|||
## Model Export
|
||||
|
||||
```shell
|
||||
python export.py --ckpt_file [CKPT_PATH] --device_target [DEVICE_TARGET] --file_format [EXPORT_FORMAT]
python export.py --config_path [CONFIG_FILE] --ckpt_file [CKPT_PATH] --device_target [DEVICE_TARGET] --file_format [EXPORT_FORMAT]
|
||||
```
|
||||
|
||||
`EXPORT_FORMAT` should be in ["AIR", "MINDIR"]
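As a rough illustration of the export invocation, the sketch below assembles the argument list and rejects formats outside the two listed values before anything is run. The function name `build_export_command` and the example file names are hypothetical; only the flags come from the command shown above.

```python
import shlex

# The two formats named in the README.
VALID_FORMATS = ("AIR", "MINDIR")


def build_export_command(config_path, ckpt_file, device_target, file_format):
    """Build the export.py argument list, validating file_format first."""
    if file_format not in VALID_FORMATS:
        raise ValueError(f"file_format must be one of {VALID_FORMATS}, got {file_format!r}")
    return [
        "python", "export.py",
        "--config_path", config_path,
        "--ckpt_file", ckpt_file,
        "--device_target", device_target,
        "--file_format", file_format,
    ]


# Hypothetical paths, for illustration only.
cmd = build_export_command("default_config.yaml", "model.ckpt", "Ascend", "MINDIR")
print(shlex.join(cmd))
```

Passing the list to `subprocess.run(cmd, check=True)` would launch the export. It is left out here so the sketch has no side effects.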
|
||||
|
@ -288,7 +380,7 @@ accuracy:80.044
|
|||
| MindSpore Version | 1.0.0 | 1.0.0 |
|
||||
| Dataset | 1200k images | 1200K images |
|
||||
| Batch_size | 128 | 128 |
|
||||
| Training Parameters | src/config.py (Ascend) | src/config.py (GPU) |
|
||||
| Training Parameters | src/model_utils/default_config.yaml (Ascend) | src/model_utils/default_config.yaml (GPU)|
|
||||
| Optimizer | RMSProp | RMSProp |
|
||||
| Loss Function | SoftmaxCrossEntropyWithLogits | SoftmaxCrossEntropyWithLogits |
|
||||
| Outputs | probability | probability |
|
||||
|
|
|
@ -78,6 +78,72 @@ sh run_standalone_train_ascend.sh [DATA_PATH] [CKPT_SAVE_PATH]
|
|||
sh run_standalone_eval_ascend.sh [DATA_PATH] [CKPT_NAME]
|
||||
```
|
||||
|
||||
- Running on [ModelArts](https://support.huaweicloud.com/modelarts/)
|
||||
|
||||
```bash
# Train 8p with Ascend
# (1) Perform a or b.
#       a. Set "enable_modelarts=True" in the default_config.yaml file.
#          Set "distribute=True" in the default_config.yaml file.
#          Set "data_path='/cache/data'" in the default_config.yaml file.
#          Set "ckpt_path='/cache/data'" in the default_config.yaml file.
#          (optional) Set "checkpoint_url='s3://dir_to_your_pretrained/'" in the default_config.yaml file.
#          Set any other parameters you need in the default_config.yaml file.
#       b. Add "enable_modelarts=True" on the website UI.
#          Add "distribute=True" on the website UI.
#          Add "data_path='/cache/data'" on the website UI.
#          Add "ckpt_path='/cache/data'" on the website UI.
#          (optional) Add "checkpoint_url='s3://dir_to_your_pretrained/'" on the website UI.
#          Add any other parameters on the website UI.
# (2) Prepare the model code.
# (3) Upload or copy your pretrained model to the S3 bucket if you want to fine-tune.
# (4) Upload the original mnist_data dataset to the S3 bucket.
# (5) Set the code directory to "/path/lenet" on the website UI.
# (6) Set the startup file to "train.py" on the website UI.
# (7) Set the "Dataset path", "Output file path", and "Job log path" to your paths on the website UI.
# (8) Create your job.
#
# Train 1p with Ascend
# (1) Perform a or b.
#       a. Set "enable_modelarts=True" in the default_config.yaml file.
#          Set "data_path='/cache/data'" in the default_config.yaml file.
#          Set "ckpt_path='/cache/data'" in the default_config.yaml file.
#          (optional) Set "checkpoint_url='s3://dir_to_your_pretrained/'" in the default_config.yaml file.
#          Set any other parameters you need in the default_config.yaml file.
#       b. Add "enable_modelarts=True" on the website UI.
#          Add "data_path='/cache/data'" on the website UI.
#          Add "ckpt_path='/cache/data'" on the website UI.
#          (optional) Add "checkpoint_url='s3://dir_to_your_pretrained/'" on the website UI.
#          Add any other parameters on the website UI.
# (2) Prepare the model code.
# (3) Upload or copy your pretrained model to the S3 bucket if you want to fine-tune.
# (4) Upload the original mnist_data dataset to the S3 bucket.
# (5) Set the code directory to "/path/lenet" on the website UI.
# (6) Set the startup file to "train.py" on the website UI.
# (7) Set the "Dataset path", "Output file path", and "Job log path" to your paths on the website UI.
# (8) Create your job.
#
# Eval 1p with Ascend
# (1) Perform a or b.
#       a. Set "enable_modelarts=True" in the default_config.yaml file.
#          Set "data_path='/cache/data'" in the default_config.yaml file.
#          Set "checkpoint_url='s3://dir_to_your_trained_model/'" in the default_config.yaml file.
#          Set "ckpt_file='/cache/data/checkpoint_lenet-10_1875.ckpt'" in the default_config.yaml file.
#          Set any other parameters you need in the default_config.yaml file.
#       b. Add "enable_modelarts=True" on the website UI.
#          Add "data_path='/cache/data'" on the website UI.
#          Add "checkpoint_url='s3://dir_to_your_trained_model/'" on the website UI.
#          Add "ckpt_file='/cache/data/checkpoint_lenet-10_1875.ckpt'" on the website UI.
#          Add any other parameters on the website UI.
# (2) Prepare the model code.
# (3) Upload or copy your trained model to the S3 bucket.
# (4) Upload the original mnist_data dataset to the S3 bucket.
# (5) Set the code directory to "/path/lenet" on the website UI.
# (6) Set the startup file to "eval.py" on the website UI.
# (7) Set the "Dataset path", "Output file path", and "Job log path" to your paths on the website UI.
# (8) Create your job.
```
|
||||
|
||||
## [Script Description](#contents)
|
||||
|
||||
### [Script and Sample Code](#contents)
|
||||
|
@ -115,7 +181,7 @@ sh run_standalone_eval_ascend.sh [DATA_PATH] [CKPT_NAME]
|
|||
## [Script Parameters](#contents)
|
||||
|
||||
```python
|
||||
Major parameters in train.py and config.py are as follows:
Major parameters in train.py and default_config.yaml are as follows:
|
||||
|
||||
--data_path: The absolute full path to the train and evaluation datasets.
|
||||
--epoch_size: Total training epochs.
|
||||
|
@ -134,7 +200,7 @@ Major parameters in train.py and config.py as follows:
|
|||
### Training
|
||||
|
||||
```bash
|
||||
python train.py --data_path Data --ckpt_path ckpt > log.txt 2>&1 &
|
||||
python train.py --config_path CONFIG_PATH --data_path Data --ckpt_path ckpt > log.txt 2>&1 &
|
||||
# or enter script dir, and run the script
|
||||
sh run_standalone_train_ascend.sh Data ckpt
|
||||
```
|
||||
|
@ -160,7 +226,7 @@ The model checkpoint will be saved in the current directory.
|
|||
Before running the command below, please check the checkpoint path used for evaluation.
|
||||
|
||||
```bash
|
||||
python eval.py --data_path Data --ckpt_path ckpt/checkpoint_lenet-1_1875.ckpt > log.txt 2>&1 &
|
||||
python eval.py --config_path CONFIG_PATH --data_path Data --ckpt_path ckpt/checkpoint_lenet-1_1875.ckpt > log.txt 2>&1 &
|
||||
# or enter script dir, and run the script
|
||||
sh run_standalone_eval_ascend.sh Data ckpt/checkpoint_lenet-1_1875.ckpt
|
||||
```
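Since the evaluation command needs an explicit checkpoint path, it can help to pick the most recent checkpoint programmatically. The sketch below assumes file names follow the `<prefix>-<epoch>_<step>.ckpt` pattern seen in `checkpoint_lenet-1_1875.ckpt`. The pattern and both helper names are assumptions drawn from the examples in this README, not a documented guarantee.

```python
import re

# Assumed naming scheme: <prefix>-<epoch>_<step>.ckpt
CKPT_PATTERN = re.compile(r"^(?P<prefix>.+)-(?P<epoch>\d+)_(?P<step>\d+)\.ckpt$")


def parse_ckpt_name(name):
    """Return (prefix, epoch, step) for a matching file name, else None."""
    match = CKPT_PATTERN.match(name)
    if match is None:
        return None
    return match.group("prefix"), int(match.group("epoch")), int(match.group("step"))


def latest_checkpoint(names):
    """Pick the name with the highest (epoch, step); ignore non-matching names."""
    candidates = []
    for name in names:
        parsed = parse_ckpt_name(name)
        if parsed is not None:
            _, epoch, step = parsed
            candidates.append(((epoch, step), name))
    if not candidates:
        return None
    return max(candidates)[1]


# Example directory listing (illustrative file names).
files = [
    "checkpoint_lenet-1_1875.ckpt",
    "checkpoint_lenet-10_1875.ckpt",
    "checkpoint_lenet-2_1875.ckpt",
    "notes.txt",
]
print(latest_checkpoint(files))
```

Comparing epochs as integers matters here: a plain string sort would rank epoch 2 above epoch 10.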
|
||||
|
@ -177,7 +243,7 @@ You can view the results through the file "log.txt". The accuracy of the test da
|
|||
### Export MindIR
|
||||
|
||||
```shell
|
||||
python export.py --ckpt_file [CKPT_PATH] --file_name [FILE_NAME] --file_format [FILE_FORMAT]
|
||||
python export.py --config_path [CONFIG_PATH] --ckpt_file [CKPT_PATH] --file_name [FILE_NAME] --file_format [FILE_FORMAT]
|
||||
```
|
||||
|
||||
The ckpt_file parameter is required,
|
||||
|
|
|
@ -82,6 +82,69 @@ sh run_standalone_train_ascend.sh [DATA_PATH] [CKPT_SAVE_PATH]
|
|||
sh run_standalone_eval_ascend.sh [DATA_PATH] [CKPT_NAME]
|
||||
```
|
||||
|
||||
- Training on ModelArts (to run on ModelArts, refer to the [ModelArts](https://support.huaweicloud.com/modelarts/) documentation)

```bash
# Train with 8 devices on ModelArts
# (1) Perform a or b.
#       a. Set "enable_modelarts=True" in the default_config.yaml file.
#          Set "distribute=True" in the default_config.yaml file.
#          Set "data_path='/cache/data'" in the default_config.yaml file.
#          Set "ckpt_path='/cache/data'" in the default_config.yaml file.
#          (optional) Set "checkpoint_url='s3://dir_to_your_pretrained/'" in the default_config.yaml file.
#          Set any other parameters in the default_config.yaml file.
#       b. Set "enable_modelarts=True" on the website UI.
#          Set "distribute=True" on the website UI.
#          Set "data_path='/cache/data'" on the website UI.
#          Set "ckpt_path='/cache/data'" on the website UI.
#          (optional) Set "checkpoint_url='s3://dir_to_your_pretrained/'" on the website UI.
#          Set any other parameters on the website UI.
# (2) Prepare the model code.
# (3) If you want to fine-tune your model, upload your pretrained model to the S3 bucket.
# (4) Upload the original mnist_data dataset to the S3 bucket.
# (5) Set the code directory to "/path/lenet" on the website UI.
# (6) Set the startup file to "train.py" on the website UI.
# (7) Set the "Training dataset", "Training output file path", "Job log path", and so on, on the website UI.
# (8) Create the training job.
#
# Train with a single device on ModelArts
# (1) Perform a or b.
#       a. Set "enable_modelarts=True" in the default_config.yaml file.
#          Set "data_path='/cache/data'" in the default_config.yaml file.
#          Set "ckpt_path='/cache/data'" in the default_config.yaml file.
#          (optional) Set "checkpoint_url='s3://dir_to_your_pretrained/'" in the default_config.yaml file.
#          Set any other parameters in the default_config.yaml file.
#       b. Set "enable_modelarts=True" on the website UI.
#          Set "data_path='/cache/data'" on the website UI.
#          Set "ckpt_path='/cache/data'" on the website UI.
#          (optional) Set "checkpoint_url='s3://dir_to_your_pretrained/'" on the website UI.
#          Set any other parameters on the website UI.
# (2) Prepare the model code.
# (3) If you want to fine-tune your model, upload your pretrained model to the S3 bucket.
# (4) Upload the original mnist_data dataset to the S3 bucket.
# (5) Set the code directory to "/path/lenet" on the website UI.
# (6) Set the startup file to "train.py" on the website UI.
# (7) Set the "Training dataset", "Training output file path", "Job log path", and so on, on the website UI.
# (8) Create the training job.
#
# Evaluate with a single device on ModelArts
# (1) Perform a or b.
#       a. Set "enable_modelarts=True" in the default_config.yaml file.
#          Set "checkpoint_url='s3://dir_to_your_trained_model/'" in the default_config.yaml file.
#          Set "data_path='/cache/data'" in the default_config.yaml file.
#          Set "ckpt_file='/cache/data/checkpoint_lenet-10_1875.ckpt'" in the default_config.yaml file.
#          Set any other parameters in the default_config.yaml file.
#       b. Set "enable_modelarts=True" on the website UI.
#          Set "checkpoint_url='s3://dir_to_your_trained_model/'" on the website UI.
#          Set "data_path='/cache/data'" on the website UI.
#          Set "ckpt_file='/cache/data/checkpoint_lenet-10_1875.ckpt'" on the website UI.
#          Set any other parameters on the website UI.
# (2) Prepare the model code.
# (3) Upload your trained model to the S3 bucket.
# (4) Upload the original mnist_data dataset to the S3 bucket.
# (5) Set the code directory to "/path/lenet" on the website UI.
# (6) Set the startup file to "eval.py" on the website UI.
# (7) Set the "Training dataset", "Training output file path", "Job log path", and so on, on the website UI.
# (8) Create the job.
```
|
||||
|
||||
## Script Description

### Script and Sample Code
|
||||
|
@ -119,7 +182,7 @@ sh run_standalone_eval_ascend.sh [DATA_PATH] [CKPT_NAME]
|
|||
## Script Parameters

```python
Major parameters in train.py and config.py are as follows:
Major parameters in train.py and default_config.yaml are as follows:

--data_path: The absolute full path to the training and evaluation datasets.
--epoch_size: Total training epochs.
|
||||
|
@ -136,7 +199,7 @@ Major parameters in train.py and config.py are as follows:
### Training
|
||||
|
||||
```bash
|
||||
python train.py --data_path Data --ckpt_path ckpt > log.txt 2>&1 &
|
||||
python train.py --config_path CONFIG_PATH --data_path Data --ckpt_path ckpt > log.txt 2>&1 &
|
||||
# or enter script dir, and run the script
|
||||
sh run_standalone_train_ascend.sh Data ckpt
|
||||
```
|
||||
|
@ -162,7 +225,7 @@ epoch:1 step:1538, loss is 1.0221305
|
|||
Before running the command below, please check the checkpoint path used for evaluation.
|
||||
|
||||
```bash
|
||||
python eval.py --data_path Data --ckpt_path ckpt/checkpoint_lenet-1_1875.ckpt > log.txt 2>&1 &
|
||||
python eval.py --config_path CONFIG_PATH --data_path Data --ckpt_path ckpt/checkpoint_lenet-1_1875.ckpt > log.txt 2>&1 &
|
||||
# or enter script dir, and run the script
|
||||
sh run_standalone_eval_ascend.sh Data ckpt/checkpoint_lenet-1_1875.ckpt
|
||||
```
|
||||
|
@ -179,7 +242,7 @@ sh run_standalone_eval_ascend.sh Data ckpt/checkpoint_lenet-1_1875.ckpt
|
|||
### Export MindIR
|
||||
|
||||
```shell
|
||||
python export.py --ckpt_file [CKPT_PATH] --file_name [FILE_NAME] --file_format [FILE_FORMAT]
|
||||
python export.py --config_path [CONFIG_PATH] --ckpt_file [CKPT_PATH] --file_name [FILE_NAME] --file_format [FILE_FORMAT]
|
||||
```
|
||||
|
||||
The ckpt_file parameter is required,
|
||||
|
@ -213,7 +276,7 @@ bash run_infer_310.sh [MINDIR_PATH] [DATA_PATH] [DVPP] [DEVICE_ID]
|
|||
### Evaluation Performance

| Parameter                  | LeNet                                                       |
| -------------------------- | ----------------------------------------------------------- |
| -------------------- | ------------------------------------------------------- |
| Resource                   | Ascend 910; CPU 2.60GHz, 192 cores; memory 755G; OS Euler2.8 |
| Upload Date                | 2020-06-09                                                  |
| MindSpore Version          | 0.5.0-beta                                                  |
|
||||
|
@ -224,14 +287,14 @@ bash run_infer_310.sh [MINDIR_PATH] [DATA_PATH] [DVPP] [DEVICE_ID]
|
|||
| Outputs                    | probability                                                 |
| Loss                       | 0.002                                                       |
| Speed                      | 1.70 ms/step                                                |
| Total Time                 | 43.1 s                                                      | |
| Total Time                 | 43.1 s                                                      |
| Checkpoint for Fine-tuning | 482k (.ckpt file)                                           |
| Scripts                    | [LeNet scripts](https://gitee.com/mindspore/mindspore/tree/master/model_zoo/official/cv/lenet) |

### Inference Performance

| Parameter       | Ascend                       |
| ------------- | ----------------------------|
| --------------- | -----------------------------|
| Model Version   | LeNet                        |
| Resource        | Ascend 310; OS CentOS 3.10   |
| Upload Date     | 2021-05-07                   |
||||
|
|
|
@ -175,6 +175,121 @@ bash run_eval.sh [VALIDATION_JSON_FILE] [CHECKPOINT_PATH] [DATA_PATH]
|
|||
bash run_infer_310.sh [MINDIR_PATH] [DATA_PATH] [ANN_FILE] [DEVICE_ID]
|
||||
```
|
||||
|
||||
- Running on [ModelArts](https://support.huaweicloud.com/modelarts/)
|
||||
|
||||
```bash
# Train 8p with Ascend
# (1) Perform a or b.
#       a. Set "enable_modelarts=True" in the default_config.yaml file.
#          Set "distribute=True" in the default_config.yaml file.
#          Set "need_modelarts_dataset_unzip=True" in the default_config.yaml file.
#          Set "modelarts_dataset_unzip_name='cocodataset'" in the default_config.yaml file.
#          Set "base_lr=0.02" in the default_config.yaml file.
#          Set "mindrecord_dir='./MindRecord_COCO'" in the default_config.yaml file.
#          Set "data_path='/cache/data'" in the default_config.yaml file.
#          Set "ann_file='./annotations/instances_val2017.json'" in the default_config.yaml file.
#          Set "epoch_size=12" in the default_config.yaml file.
#          Set "ckpt_path='./ckpt_maskrcnn/mask_rcnn-12_7393.ckpt'" in the default_config.yaml file.
#          (optional) Set "checkpoint_url='s3://dir_to_your_pretrained/'" in the default_config.yaml file.
#          Set any other parameters you need in the default_config.yaml file.
#       b. Add "enable_modelarts=True" on the website UI.
#          Add "need_modelarts_dataset_unzip=True" on the website UI.
#          Add "modelarts_dataset_unzip_name='cocodataset'" on the website UI.
#          Add "distribute=True" on the website UI.
#          Add "base_lr=0.02" on the website UI.
#          Add "mindrecord_dir='./MindRecord_COCO'" on the website UI.
#          Add "data_path='/cache/data'" on the website UI.
#          Add "ann_file='./annotations/instances_val2017.json'" on the website UI.
#          Add "epoch_size=12" on the website UI.
#          Add "ckpt_path='./ckpt_maskrcnn/mask_rcnn-12_7393.ckpt'" on the website UI.
#          (optional) Add "checkpoint_url='s3://dir_to_your_pretrained/'" on the website UI.
#          Add any other parameters on the website UI.
# (2) Prepare the model code.
# (3) Upload or copy your pretrained model to the S3 bucket if you want to fine-tune.
# (4) Perform a or b. (Option a is recommended.)
#       a. First, run "train.py" as follows to create the MindRecord dataset locally from coco2017.
#          "python train.py --only_create_dataset=True --mindrecord_dir=$MINDRECORD_DIR --data_path=$DATA_PATH --ann_file=$ANNO_PATH"
#          Second, zip the MindRecord dataset into a single zip file.
#          Finally, upload the zipped dataset to the S3 bucket. (You could also upload the original MindRecord dataset, but that can be very slow.)
#       b. Upload the original coco dataset to the S3 bucket.
#          (Dataset conversion then happens during training, takes a long time, and is repeated every time you train.)
# (5) Set the code directory to "/path/maskrcnn" on the website UI.
# (6) Set the startup file to "train.py" on the website UI.
# (7) Set the "Dataset path", "Output file path", and "Job log path" to your paths on the website UI.
# (8) Create your job.
#
# Train 1p with Ascend
# (1) Perform a or b.
#       a. Set "enable_modelarts=True" in the default_config.yaml file.
#          Set "need_modelarts_dataset_unzip=True" in the default_config.yaml file.
#          Set "modelarts_dataset_unzip_name='cocodataset'" in the default_config.yaml file.
#          Set "mindrecord_dir='./MindRecord_COCO'" in the default_config.yaml file.
#          Set "data_path='/cache/data'" in the default_config.yaml file.
#          Set "ann_file='./annotations/instances_val2017.json'" in the default_config.yaml file.
#          Set "epoch_size=12" in the default_config.yaml file.
#          Set "ckpt_path='./ckpt_maskrcnn/mask_rcnn-12_7393.ckpt'" in the default_config.yaml file.
#          (optional) Set "checkpoint_url='s3://dir_to_your_pretrained/'" in the default_config.yaml file.
#          Set any other parameters you need in the default_config.yaml file.
#       b. Add "enable_modelarts=True" on the website UI.
#          Add "need_modelarts_dataset_unzip=True" on the website UI.
#          Add "modelarts_dataset_unzip_name='cocodataset'" on the website UI.
#          Add "mindrecord_dir='./MindRecord_COCO'" on the website UI.
#          Add "data_path='/cache/data'" on the website UI.
#          Add "ann_file='./annotations/instances_val2017.json'" on the website UI.
#          Add "epoch_size=12" on the website UI.
#          Add "ckpt_path='./ckpt_maskrcnn/mask_rcnn-12_7393.ckpt'" on the website UI.
#          (optional) Add "checkpoint_url='s3://dir_to_your_pretrained/'" on the website UI.
#          Add any other parameters on the website UI.
# (2) Prepare the model code.
# (3) Upload or copy your pretrained model to the S3 bucket if you want to fine-tune.
# (4) Perform a or b. (Option a is recommended.)
#       a. First, run "train.py" as follows to create the MindRecord dataset locally from coco2017.
#          "python train.py --only_create_dataset=True --mindrecord_dir=$MINDRECORD_DIR --data_path=$DATA_PATH --ann_file=$ANNO_PATH"
#          Second, zip the MindRecord dataset into a single zip file.
#          Finally, upload the zipped dataset to the S3 bucket. (You could also upload the original MindRecord dataset, but that can be very slow.)
#       b. Upload the original coco dataset to the S3 bucket.
#          (Dataset conversion then happens during training, takes a long time, and is repeated every time you train.)
# (5) Set the code directory to "/path/maskrcnn" on the website UI.
# (6) Set the startup file to "train.py" on the website UI.
# (7) Set the "Dataset path", "Output file path", and "Job log path" to your paths on the website UI.
# (8) Create your job.
#
# Eval 1p with Ascend
# (1) Perform a or b.
#       a. Set "enable_modelarts=True" in the default_config.yaml file.
#          Set "need_modelarts_dataset_unzip=True" in the default_config.yaml file.
#          Set "modelarts_dataset_unzip_name='cocodataset'" in the default_config.yaml file.
#          Set "checkpoint_url='s3://dir_to_your_trained_model/'" in the default_config.yaml file.
#          Set "checkpoint_path='./ckpt_maskrcnn/mask_rcnn-12_7393.ckpt'" in the default_config.yaml file.
#          Set "mindrecord_file='/cache/data/cocodataset/MindRecord_COCO'" in the default_config.yaml file.
#          Set "data_path='/cache/data'" in the default_config.yaml file.
#          Set "ann_file='./annotations/instances_val2017.json'" in the default_config.yaml file.
#          Set any other parameters you need in the default_config.yaml file.
#       b. Add "enable_modelarts=True" on the website UI.
#          Add "need_modelarts_dataset_unzip=True" on the website UI.
#          Add "modelarts_dataset_unzip_name='cocodataset'" on the website UI.
#          Add "checkpoint_url='s3://dir_to_your_trained_model/'" on the website UI.
#          Add "checkpoint_path='./ckpt_maskrcnn/mask_rcnn-12_7393.ckpt'" on the website UI.
#          Add "mindrecord_file='/cache/data/cocodataset/MindRecord_COCO'" on the website UI.
#          Add "data_path='/cache/data'" on the website UI.
#          Add "ann_file='./annotations/instances_val2017.json'" on the website UI.
#          Add any other parameters on the website UI.
# (2) Prepare the model code.
# (3) Upload or copy your trained model to the S3 bucket.
# (4) Perform a or b. (Option a is recommended.)
#       a. First, run "eval.py" as follows to create the MindRecord dataset locally from coco2017.
#          "python eval.py --only_create_dataset=True --mindrecord_dir=$MINDRECORD_DIR --data_path=$DATA_PATH --ann_file=$ANNO_PATH \
#           --checkpoint_path=$CHECKPOINT_PATH"
#          Second, zip the MindRecord dataset into a single zip file.
#          Finally, upload the zipped dataset to the S3 bucket. (You could also upload the original MindRecord dataset, but that can be very slow.)
#       b. Upload the original coco dataset to the S3 bucket.
#          (Dataset conversion then happens while the job runs, takes a long time, and is repeated on every run.)
# (5) Set the code directory to "/path/maskrcnn" on the website UI.
# (6) Set the startup file to "eval.py" on the website UI.
# (7) Set the "Dataset path", "Output file path", and "Job log path" to your paths on the website UI.
# (8) Create your job.
```
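Step (4) option a asks for the MindRecord dataset to be zipped into a single file before upload. One possible way to do that with the Python standard library is sketched below. The helper name `zip_dataset`, the archive name `cocodataset.zip`, and the placeholder file are assumptions for illustration; check the archive layout your job expects before uploading.

```python
import shutil
import tempfile
import zipfile
from pathlib import Path


def zip_dataset(dataset_dir, archive_stem):
    """Pack dataset_dir (folder included) into <archive_stem>.zip and return its path."""
    dataset_dir = Path(dataset_dir).resolve()
    archive = shutil.make_archive(
        base_name=str(Path(archive_stem).resolve()),
        format="zip",
        root_dir=str(dataset_dir.parent),  # paths in the archive are relative to this
        base_dir=dataset_dir.name,         # keep the top-level folder in the archive
    )
    return Path(archive)


def demo():
    """Build a throwaway MindRecord_COCO folder and zip it in a temp directory."""
    with tempfile.TemporaryDirectory() as tmp:
        records = Path(tmp) / "MindRecord_COCO"
        records.mkdir()
        (records / "placeholder.mindrecord").write_bytes(b"demo")
        archive = zip_dataset(records, Path(tmp) / "cocodataset")
        with zipfile.ZipFile(archive) as zf:
            return archive.name, sorted(zf.namelist())


name, members = demo()
print(name, members)
```

Uploading one archive instead of many small files is the reason the steps above recommend option a.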
|
||||
|
||||
# [Script Description](#contents)
|
||||
|
||||
## [Script and Sample Code](#contents)
|
||||
|
@ -503,7 +618,7 @@ Accumulating evaluation results...
|
|||
## Model Export
|
||||
|
||||
```shell
|
||||
python export.py --ckpt_file [CKPT_PATH] --device_target [DEVICE_TARGET] --file_format [EXPORT_FORMAT]
python export.py --config_path [CONFIG_FILE] --ckpt_file [CKPT_PATH] --device_target [DEVICE_TARGET] --file_format [EXPORT_FORMAT]
|
||||
```
|
||||
|
||||
`EXPORT_FORMAT` should be in ["AIR", "MINDIR"]
|
||||
|
|
|
@ -169,6 +169,119 @@ bash run_distribute_train.sh [RANK_TABLE_FILE] [PRETRAINED_CKPT]
|
|||
bash run_eval.sh [VALIDATION_JSON_FILE] [CHECKPOINT_PATH]
|
||||
```
|
||||
|
||||
- 在 ModelArts 进行训练 (如果你想在modelarts上运行,可以参考以下文档 [modelarts](https://support.huaweicloud.com/modelarts/))
|
||||
|
||||
```bash
|
||||
|
||||
# 在 ModelArts 上使用8卡训练
|
||||
# (1) 执行a或者b
|
||||
# a. 在 default_config.yaml 文件中设置 "enable_modelarts=True"
|
||||
# 在 default_config.yaml 文件中设置 "distribute=True"
|
||||
# 在 default_config.yaml 文件中设置 "need_modelarts_dataset_unzip=True"
|
||||
# 在 default_config.yaml 文件中设置 "modelarts_dataset_unzip_name='cocodataset'"
|
||||
# 在 default_config.yaml 文件中设置 "base_lr=0.02"
|
||||
# 在 default_config.yaml 文件中设置 "mindrecord_dir='./MindRecord_COCO'"
|
||||
# 在 default_config.yaml 文件中设置 "data_path='/cache/data'"
|
||||
# 在 default_config.yaml 文件中设置 "ann_file='./annotations/instances_val2017.json'"
|
||||
# 在 default_config.yaml 文件中设置 "epoch_size=12"
|
||||
# 在 default_config.yaml 文件中设置 "ckpt_path='./ckpt_maskrcnn/mask_rcnn-12_7393.ckpt'"
|
||||
# (可选)在 default_config.yaml 文件中设置 "checkpoint_url='s3://dir_to_your_pretrained/'"
|
||||
# 在 default_config.yaml 文件中设置 其他参数
|
||||
# b. 在网页上设置 "enable_modelarts=True"
|
||||
# 在网页上设置 "need_modelarts_dataset_unzip=True"
|
||||
# 在网页上设置 "modelarts_dataset_unzip_name='cocodataset'"
|
||||
# 在网页上设置 "distribute=True"
|
||||
# 在网页上设置 "base_lr=0.02"
|
||||
# 在网页上设置 "mindrecord_dir='./MindRecord_COCO'"
|
||||
# 在网页上设置 "data_path='/cache/data'"
|
||||
# 在网页上设置 "ann_file='./annotations/instances_val2017.json'"
|
||||
# 在网页上设置 "epoch_size=12"
|
||||
# 在网页上设置 "ckpt_path='./ckpt_maskrcnn/mask_rcnn-12_7393.ckpt'"
|
||||
# (可选)在网页上设置 "checkpoint_url='s3://dir_to_your_pretrained/'"
|
||||
# 在网页上设置 其他参数
|
||||
# (2) 如果选择微调您的模型,请上传你的预训练模型到 S3 桶上
|
||||
# (3) 执行a或者b (推荐选择 a)
|
||||
# a. 第一, 根据以下方式在本地运行 "train.py" 脚本来生成 MindRecord 格式的数据集。
|
||||
# "python train.py --only_create_dataset=True --mindrecord_dir=$MINDRECORD_DIR --data_path=$DATA_PATH --ann_file=$ANNO_PATH"
|
||||
# 第二, 将该数据集压缩为一个 ".zip" 文件。
|
||||
# 最后, 上传你的压缩数据集到 S3 桶上 (你也可以上传未压缩的数据集,但那可能会很慢。)
|
||||
# b. 上传原始 coco 数据集到 S3 桶上。
|
||||
# (数据集转换发生在训练过程中,需要花费较多的时间。每次训练的时候都会重新进行转换。)
|
||||
# (4) 在网页上设置你的代码路径为 "/path/maskrcnn"
|
||||
# (5) 在网页上设置启动文件为 "train.py"
|
||||
# (6) 在网页上设置"训练数据集"、"训练输出文件路径"、"作业日志路径"等
|
||||
# (7) 创建训练作业
|
||||
#
|
||||
# 在 ModelArts 上使用单卡训练
|
||||
# (1) 执行a或者b
|
||||
# a. 在 default_config.yaml 文件中设置 "enable_modelarts=True"
|
||||
# 在 default_config.yaml 文件中设置 "need_modelarts_dataset_unzip=True"
|
||||
# 在 default_config.yaml 文件中设置 "modelarts_dataset_unzip_name='cocodataset'"
|
||||
# 在 default_config.yaml 文件中设置 "mindrecord_dir='./MindRecord_COCO'"
|
||||
# 在 default_config.yaml 文件中设置 "data_path='/cache/data'"
|
||||
# 在 default_config.yaml 文件中设置 "ann_file='./annotations/instances_val2017.json'"
|
||||
# 在 default_config.yaml 文件中设置 "epoch_size=12"
|
||||
# 在 default_config.yaml 文件中设置 "ckpt_path='./ckpt_maskrcnn/mask_rcnn-12_7393.ckpt'"
|
||||
# (可选)在 default_config.yaml 文件中设置 "checkpoint_url='s3://dir_to_your_pretrained/'"
|
||||
# 在 default_config.yaml 文件中设置 其他参数
|
||||
# b. 在网页上设置 "enable_modelarts=True"
|
||||
# 在网页上设置 "need_modelarts_dataset_unzip=True"
|
||||
# 在网页上设置 "modelarts_dataset_unzip_name='cocodataset'"
|
||||
# 在网页上设置 "mindrecord_dir='./MindRecord_COCO'"
|
||||
# 在网页上设置 "data_path='/cache/data'"
|
||||
# 在网页上设置 "ann_file='./annotations/instances_val2017.json'"
|
||||
# 在网页上设置 "epoch_size=12"
|
||||
# 在网页上设置 "ckpt_path='./ckpt_maskrcnn/mask_rcnn-12_7393.ckpt'"
|
||||
# (可选)在网页上设置 "checkpoint_url='s3://dir_to_your_pretrained/'"
|
||||
# 在网页上设置 其他参数
|
||||
# (2) 如果选择微调您的模型,上传你的预训练模型到 S3 桶上
|
||||
# (3) 执行a或者b (推荐选择 a)
|
||||
# a. 第一, 根据以下方式在本地运行 "train.py" 脚本来生成 MindRecord 格式的数据集。
|
||||
# "python train.py --only_create_dataset=True --mindrecord_dir=$MINDRECORD_DIR --data_path=$DATA_PATH --ann_file=$ANNO_PATH"
|
||||
# 第二, 将该数据集压缩为一个 ".zip" 文件。
|
||||
# 最后, 上传你的压缩数据集到 S3 桶上 (你也可以上传未压缩的数据集,但那可能会很慢。)
|
||||
# b. 上传原始 coco 数据集到 S3 桶上。
|
||||
# (数据集转换发生在训练过程中,需要花费较多的时间。每次训练的时候都会重新进行转换。)
|
||||
# (4) 在网页上设置你的代码路径为 "/path/maskrcnn"
|
||||
# (5) 在网页上设置启动文件为 "train.py"
|
||||
# (6) 在网页上设置"训练数据集"、"训练输出文件路径"、"作业日志路径"等
|
||||
# (7) 创建训练作业
|
||||
#
|
||||
# Evaluate on a single device on ModelArts
|
||||
# (1) Perform a or b.
|
||||
# a. Set "enable_modelarts=True" in the default_config.yaml file.
|
||||
# Set "need_modelarts_dataset_unzip=True" in the default_config.yaml file.
|
||||
# Set "modelarts_dataset_unzip_name='cocodataset'" in the default_config.yaml file.
|
||||
# Set "checkpoint_url='s3://dir_to_your_trained_model/'" in the default_config.yaml file.
|
||||
# Set "checkpoint_path='./ckpt_maskrcnn/mask_rcnn-12_7393.ckpt'" in the default_config.yaml file.
|
||||
# Set "mindrecord_file='/cache/data/cocodataset/MindRecord_COCO'" in the default_config.yaml file.
|
||||
# Set "data_path='/cache/data'" in the default_config.yaml file.
|
||||
# Set "ann_file='./annotations/instances_val2017.json'" in the default_config.yaml file.
|
||||
# Set other parameters in the default_config.yaml file.
|
||||
# b. Set "enable_modelarts=True" on the website UI.
|
||||
# Set "need_modelarts_dataset_unzip=True" on the website UI.
|
||||
# Set "modelarts_dataset_unzip_name='cocodataset'" on the website UI.
|
||||
# Set "checkpoint_url='s3://dir_to_your_trained_model/'" on the website UI.
|
||||
# Set "checkpoint_path='./ckpt_maskrcnn/mask_rcnn-12_7393.ckpt'" on the website UI.
|
||||
# Set "mindrecord_file='/cache/data/cocodataset/MindRecord_COCO'" on the website UI.
|
||||
# Set "data_path='/cache/data'" on the website UI.
|
||||
# Set "ann_file='./annotations/instances_val2017.json'" on the website UI.
|
||||
# Set other parameters on the website UI.
|
||||
# (2) Upload your trained model to the S3 bucket.
|
||||
# (3) Perform a or b (a is recommended).
|
||||
# a. First, run the "eval.py" script locally as follows to generate the dataset in MindRecord format.
|
||||
# "python eval.py --only_create_dataset=True --mindrecord_dir=$MINDRECORD_DIR --data_path=$DATA_PATH --ann_file=$ANNO_PATH \
|
||||
# --checkpoint_path=$CHECKPOINT_PATH"
|
||||
# Second, compress the dataset into a ".zip" file.
|
||||
# Finally, upload the compressed dataset to the S3 bucket. (You can also upload the uncompressed dataset, but that may be slow.)
|
||||
# b. Upload the original COCO dataset to the S3 bucket.
|
||||
# (The dataset is then converted while the job runs, which takes considerable time and is repeated on every run.)
|
||||
# (4) Set your code directory to "/path/maskrcnn" on the website UI.
|
||||
# (5) Set the startup file to "eval.py" on the website UI.
|
||||
# (6) Set the "training dataset", "training output path", "job log path", etc. on the website UI.
|
||||
# (7) Create the training job.
|
||||
```
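
The local packaging step in option a (generate the MindRecord files, compress them, upload the archive) could be scripted roughly as follows. This is a minimal sketch and not part of the repository: `pack_dataset` and its arguments are hypothetical, and it assumes the archive should unpack into a top-level folder named after `modelarts_dataset_unzip_name` (`cocodataset` in the settings above). Check that layout against how your job actually unpacks the data.

```python
import zipfile
from pathlib import Path


def pack_dataset(dataset_dir, archive_path, top_level="cocodataset"):
    """Zip every file under dataset_dir beneath a single top-level folder.

    Hypothetical helper for illustration only. Write the archive outside
    dataset_dir so it is not picked up while the directory is being walked.
    """
    src = Path(dataset_dir)
    if not src.is_dir():
        raise FileNotFoundError(f"dataset directory not found: {src}")

    archive = Path(archive_path)
    with zipfile.ZipFile(archive, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for path in sorted(src.rglob("*")):
            if path.is_file():
                # Keep the relative layout, prefixed by the assumed top-level folder.
                arcname = (Path(top_level) / path.relative_to(src)).as_posix()
                zf.write(path, arcname=arcname)
    return archive
```

For example, `pack_dataset("./coco_mindrecord", "./cocodataset.zip")` (both paths are placeholders) would produce an archive whose entries all start with `cocodataset/`, ready to upload to the S3 bucket.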
|
||||
|
||||
# Script Description
|
||||
|
||||
## Script and Sample Code
|
||||
|
@ -501,7 +614,7 @@ sh run_eval.sh [VALIDATION_ANN_FILE_JSON] [CHECKPOINT_PATH]
|
|||
## Model Export
|
||||
|
||||
```shell
|
||||
python export.py --ckpt_file [CKPT_PATH] --device_target [DEVICE_TARGET] --file_format[EXPORT_FORMAT]
|
||||
python export.py --config_path [CONFIG_PATH] --ckpt_file [CKPT_PATH] --device_target [DEVICE_TARGET] --file_format [EXPORT_FORMAT]
|
||||
```
|
||||
|
||||
`EXPORT_FORMAT` options: ["AIR", "MINDIR"]
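
When the export command is generated from a script, the format can be checked against the two listed options before anything is launched. The sketch below is illustrative only: `build_export_command` is a hypothetical helper, the flag names are the ones shown in the export command above, and the file names in the example call are placeholders.

```python
import shlex

# The two options listed for EXPORT_FORMAT.
EXPORT_FORMATS = ("AIR", "MINDIR")


def build_export_command(config_path, ckpt_file, device_target, file_format):
    """Return the export.py invocation as an argument list.

    Hypothetical helper: it only assembles the command and rejects formats
    outside the documented options; it does not run anything.
    """
    if file_format not in EXPORT_FORMATS:
        raise ValueError(
            f"unsupported export format {file_format!r}; expected one of {EXPORT_FORMATS}"
        )
    return [
        "python", "export.py",
        "--config_path", config_path,
        "--ckpt_file", ckpt_file,
        "--device_target", device_target,
        "--file_format", file_format,
    ]


# Placeholder arguments; substitute your own paths and target.
print(shlex.join(build_export_command(
    "default_config.yaml", "mask_rcnn.ckpt", "Ascend", "MINDIR")))
```

Returning a list rather than a single string keeps the arguments safe to pass to `subprocess.run` without shell quoting concerns.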
|
||||
|
|
|
@ -114,6 +114,29 @@ After installing MindSpore via the official website and Dataset is correctly gen
|
|||
sh run_train_ascend.sh [DATASET_NAME]
|
||||
```
|
||||
|
||||
- Running on [ModelArts](https://support.huaweicloud.com/modelarts/)
|
||||
|
||||
```bash
|
||||
|
||||
# Train/eval on a single Ascend device (1p)
|
||||
# (1) Perform a or b.
|
||||
# a. Set "enable_modelarts=True" in the default_config.yaml file.
|
||||
# Set "data_dir='/cache/data'" in the default_config.yaml file.
|
||||
# (optional) Set "checkpoint_url='s3://dir_to_your_pretrained/'" in the default_config.yaml file.
|
||||
# Set any other parameters you need in the default_config.yaml file.
|
||||
# b. Add "enable_modelarts=True" on the website UI.
|
||||
# Add "data_dir='/cache/data'" on the website UI.
|
||||
# (optional) Add "checkpoint_url='s3://dir_to_your_pretrained/'" on the website UI.
|
||||
# Add other parameters on the website UI.
|
||||
# (2) Upload or copy your pretrained model to the S3 bucket if you want to fine-tune.
|
||||
# (3) Upload the original Cora/Citeseer dataset to the S3 bucket.
|
||||
# (4) Set the code directory to "/path/gat" on the website UI.
|
||||
# (5) Set the startup file to "train.py" on the website UI.
|
||||
# (6) Set the "Dataset path", "Output file path", and "Job log path" on the website UI.
|
||||
# (7) Create your job.
|
||||
#
|
||||
```
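
Options a and b above supply the same key/value settings, either from default_config.yaml or from the website UI. As a rough illustration of that idea only, the sketch below layers UI-style `key=value` strings over file defaults. `apply_overrides` and the default values are hypothetical; the real merging is done by the model's own config code, which may behave differently.

```python
import ast


def apply_overrides(defaults, overrides):
    """Return a copy of defaults updated with 'key=value' override strings.

    Hypothetical helper for illustration. Values are parsed as Python
    literals where possible (True, 12, '/cache/data'); anything else is
    kept as a plain string.
    """
    merged = dict(defaults)
    for item in overrides:
        key, sep, raw = item.partition("=")
        key, raw = key.strip(), raw.strip()
        if not sep or not key:
            raise ValueError(f"expected key=value, got {item!r}")
        try:
            value = ast.literal_eval(raw)
        except (ValueError, SyntaxError):
            value = raw  # not a literal, e.g. an unquoted path
        merged[key] = value
    return merged


# Made-up defaults standing in for values read from default_config.yaml.
file_defaults = {"enable_modelarts": False, "data_dir": "./data", "checkpoint_url": ""}

# The settings named in step (1), written the way they appear above.
ui_settings = [
    "enable_modelarts=True",
    "data_dir='/cache/data'",
    "checkpoint_url='s3://dir_to_your_pretrained/'",
]

config = apply_overrides(file_defaults, ui_settings)
print(config["enable_modelarts"], config["data_dir"])
```

The function returns a new dictionary instead of mutating the defaults, so the file-level values stay available for comparison.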
|
||||
|
||||
## [Script Description](#contents)
|
||||
|
||||
## [Script and Sample Code](#contents)
|
||||
|
@ -142,7 +165,7 @@ After installing MindSpore via the official website and Dataset is correctly gen
|
|||
|
||||
## [Script Parameters](#contents)
|
||||
|
||||
Parameters for both training and evaluation can be set in config.py.
|
||||
Parameters for both training and evaluation can be set in default_config.yaml.
|
||||
|
||||
- config for GAT, CORA dataset
|
||||
|
||||
|
@ -191,7 +214,7 @@ Parameters for both training and evaluation can be set in config.py.
|
|||
### [Export MindIR](#contents)
|
||||
|
||||
```shell
|
||||
python export.py --ckpt_file [CKPT_PATH] --file_name [FILE_NAME] --file_format [FILE_FORMAT]
|
||||
python export.py --config_path [CONFIG_PATH] --ckpt_file [CKPT_PATH] --file_name [FILE_NAME] --file_format [FILE_FORMAT]
|
||||
```
|
||||
|
||||
The ckpt_file parameter is required,
|
||||
|
|
|
@ -112,6 +112,27 @@
|
|||
sh run_train_ascend.sh [DATASET_NAME]
|
||||
```
|
||||
|
||||
- Running on ModelArts (to run on ModelArts, see the [ModelArts](https://support.huaweicloud.com/modelarts/) documentation)
|
||||
|
||||
```bash
|
||||
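# Train/eval on a single device on ModelArts
|
||||
# (1) Perform a or b.
|
||||
# a. Set "enable_modelarts=True" in the default_config.yaml file.
|
||||
# Set "data_dir='/cache/data'" in the default_config.yaml file.
|
||||
# (optional) Set "checkpoint_url='s3://dir_to_your_pretrained/'" in the default_config.yaml file.
|
||||
# Set other parameters in the default_config.yaml file.
|
||||
# b. Set "enable_modelarts=True" on the website UI.
|
||||
# Set "data_dir='/cache/data'" on the website UI.
|
||||
# (optional) Set "checkpoint_url='s3://dir_to_your_pretrained/'" on the website UI.
|
||||
# Set other parameters on the website UI.
|
||||
# (2) If you want to fine-tune your model, upload your pretrained model to the S3 bucket.
|
||||
# (3) Upload the original Cora/Citeseer dataset to the S3 bucket.
|
||||
# (4) Set your code directory to "/path/gat" on the website UI.
|
||||
# (5) Set the startup file to "train.py" on the website UI.
|
||||
# (6) Set the "training dataset", "training output path", "job log path", etc. on the website UI.
|
||||
# (7) Create the training job.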
|
||||
```
|
||||
|
||||
## Script Description
|
||||
|
||||
### Script and Sample Code
|
||||
|
|