add network run demo

maijianqiang 2021-09-06 09:22:33 +08:00
parent ff81c3ce13
commit 962c7a33a1
18 changed files with 357 additions and 174 deletions

@ -97,15 +97,18 @@ After installing MindSpore via the official website, you can start training and
```python
# run training example; densenet121 is trained by default. To train densenet100, modify _config_path in /src/model_utils/config.py
python train.py --net [NET_NAME] --dataset [DATASET_NAME] --train_data_dir /PATH/TO/DATASET --train_pretrained /PATH/TO/PRETRAINED_CKPT --is_distributed 0 > train.log 2>&1 &
python train.py --net [NET_NAME] --dataset [DATASET_NAME] --train_data_dir /PATH/TO/DATASET --is_distributed 0 > train.log 2>&1 &
# example: python train.py --net densenet121 --dataset imagenet --train_data_dir /home/DataSet/ImageNet_Original/train/
# run distributed training example
bash scripts/run_distribute_train.sh 8 rank_table.json [NET_NAME] [DATASET_NAME] /PATH/TO/DATASET /PATH/TO/PRETRAINED_CKPT
bash scripts/run_distribute_train.sh [DEVICE_NUM] [RANK_TABLE_FILE] [NET_NAME] [DATASET_NAME] [TRAIN_DATA_DIR]
# example bash scripts/run_distribute_train.sh 8 /root/hccl_8p_01234567_10.155.170.71.json densenet121 imagenet /home/DataSet/ImageNet_Original/train/
# run evaluation example
python eval.py --net [NET_NAME] --dataset [DATASET_NAME] --eval_data_dir /PATH/TO/DATASET --ckpt_files /PATH/TO/CHECKPOINT > eval.log 2>&1 &
OR
bash scripts/run_distribute_eval.sh 8 rank_table.json [NET_NAME] [DATASET_NAME] /PATH/TO/DATASET /PATH/TO/CHECKPOINT
bash scripts/run_distribute_eval.sh [DEVICE_NUM] [RANK_TABLE_FILE] [NET_NAME] [DATASET_NAME] [EVAL_DATA_DIR] [CKPT_PATH]
# example: bash scripts/run_distribute_eval.sh 8 /root/hccl_8p_01234567_10.155.170.71.json densenet121 imagenet /home/DataSet/ImageNet_Original/train/validation_preprocess/ /home/model/densenet/ckpt/0-120_500.ckpt
```
For distributed training, an HCCL configuration file in JSON format must be created in advance.
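Before launching a distributed job, it can help to check that the rank table lists as many devices as `DEVICE_NUM`. The sketch below is only an illustration: it assumes the layout commonly produced by the `hccl_tools` script (a top-level `server_list` whose entries each hold a `device` list), which this README does not confirm. The names `count_ranks` and `check_rank_table` and the sample values are hypothetical.

```python
import json


def count_ranks(rank_table: dict) -> int:
    """Count device entries across all servers in a rank table.

    Assumed layout (an assumption, not taken from this README):
    {"server_list": [{"device": [{"rank_id": "0", ...}, ...]}, ...]}
    """
    return sum(len(server.get("device", []))
               for server in rank_table.get("server_list", []))


def check_rank_table(rank_table: dict, device_num: int) -> None:
    """Raise ValueError if the table does not describe device_num devices."""
    found = count_ranks(rank_table)
    if found != device_num:
        raise ValueError(
            f"rank table lists {found} devices, expected {device_num}")


if __name__ == "__main__":
    # Hypothetical two-device table, used only for demonstration.
    sample = json.loads("""
    {"server_list": [{"server_id": "10.0.0.1",
                      "device": [{"device_id": "0", "rank_id": "0"},
                                 {"device_id": "1", "rank_id": "1"}]}]}
    """)
    check_rank_table(sample, 2)
    print(count_ranks(sample))
```

If your rank table uses different keys, adjust the two lookups in `count_ranks` accordingly.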
@ -118,6 +121,7 @@ After installing MindSpore via the official website, you can start training and
- If you want to train the model on ModelArts, you can refer to the [official guidance document](https://support.huaweicloud.com/modelarts/) of ModelArts.
```python
# Example of using distributed training densenet121 on modelarts :
# Data set storage method
@ -164,6 +168,7 @@ After installing MindSpore via the official website, you can start training and
# (6) Set the data path of the model on the ModelArts interface to ".../ImageNet_Original" (choose the ImageNet_Original folder path),
# the output path of the model "Output file path", and the log path of the model "Job log path".
# (7) Start model inference.
```
- running on GPU
@ -171,6 +176,7 @@ After installing MindSpore via the official website, you can start training and
- For running on GPU, please change `platform` from `Ascend` to `GPU`
```python
# run training example
export CUDA_VISIBLE_DEVICES=0
python train.py --net=[NET_NAME] --dataset=[DATASET_NAME] --train_data_dir=[DATASET_PATH] --is_distributed=0 --device_target='GPU' > train.log 2>&1 &
@ -182,13 +188,15 @@ After installing MindSpore via the official website, you can start training and
python eval.py --net=[NET_NAME] --dataset=[DATASET_NAME] --eval_data_dir=[DATASET_PATH] --device_target='GPU' --ckpt_files=[CHECKPOINT_PATH] > eval.log 2>&1 &
OR
bash run_distribute_eval_gpu.sh 1 0 [NET_NAME] [DATASET_NAME] [DATASET_PATH] [CHECKPOINT_PATH]
```
# [Script Description](#contents)
## [Script and Sample Code](#contents)
```text
```densenet
├── model_zoo
├── README.md // descriptions about all the models
├── densenet
@ -227,6 +235,7 @@ After installing MindSpore via the official website, you can start training and
├── export.py // Export script
├── densenet100_config.yaml // config file
├── densenet100_config.yaml // config file
```
## [Script Parameters](#contents)
@ -234,6 +243,7 @@ After installing MindSpore via the official website, you can start training and
You can modify the training behaviour through the various flags in the `densenet100.yaml/densenet121.yaml` script. Flags in the `densenet100.yaml/densenet121.yaml` script are as follows:
```densenet100.yaml/densenet121.yaml
--train_data_dir train data dir
--num_classes num of classes in dataset(default:1000 for densenet121; 10 for densenet100)
--image_size image size of the dataset
@ -258,6 +268,7 @@ You can modify the training behaviour through the various flags in the `densenet
--is_distributed if multi device(default: 1)
--rank local rank of distributed(default: 0)
--group_size world size of distributed(default: 1)
```
## [Training Process](#contents)
@ -267,26 +278,31 @@ You can modify the training behaviour through the various flags in the `densenet
- running on Ascend
```python
python train.py --net [NET_NAME] --dataset [DATASET_NAME] --train_data_dir /PATH/TO/DATASET --train_pretrained /PATH/TO/PRETRAINED_CKPT --is_distributed 0 > train.log 2>&1 &
```
The python command above runs in the background. The log and model checkpoints are written to `output/202x-xx-xx_time_xx_xx_xx/`. The loss values from training DenseNet121 on ImageNet look like this:
```shell
```log
2020-08-22 16:58:56,617:INFO:epoch[0], iter[5003], loss:4.367, mean_fps:0.00 imgs/sec
2020-08-22 16:58:56,619:INFO:local passed
2020-08-22 17:02:19,920:INFO:epoch[1], iter[10007], loss:3.193, mean_fps:6301.11 imgs/sec
2020-08-22 17:02:19,921:INFO:local passed
2020-08-22 17:05:43,112:INFO:epoch[2], iter[15011], loss:3.096, mean_fps:6304.53 imgs/sec
2020-08-22 17:05:43,113:INFO:local passed
...
```
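To track the loss or throughput over time, the training log can be parsed with a small script. The following is a minimal sketch whose pattern is modelled only on the log lines shown above; the names `LINE_RE` and `parse_train_log` are hypothetical, and the pattern may need adjusting if your log format differs.

```python
import re

# Pattern modelled on lines such as:
# "2020-08-22 17:02:19,920:INFO:epoch[1], iter[10007], loss:3.193, mean_fps:6301.11 imgs/sec"
LINE_RE = re.compile(
    r"epoch\[(?P<epoch>\d+)\], iter\[(?P<iter>\d+)\], "
    r"loss:(?P<loss>[\d.]+), mean_fps:(?P<fps>[\d.]+) imgs/sec"
)


def parse_train_log(lines):
    """Return one dict per matching line; other lines (e.g. 'local passed') are skipped."""
    records = []
    for line in lines:
        match = LINE_RE.search(line)
        if match:
            records.append({
                "epoch": int(match["epoch"]),
                "iter": int(match["iter"]),
                "loss": float(match["loss"]),
                "mean_fps": float(match["fps"]),
            })
    return records


if __name__ == "__main__":
    sample = [
        "2020-08-22 16:58:56,617:INFO:epoch[0], iter[5003], loss:4.367, mean_fps:0.00 imgs/sec",
        "2020-08-22 16:58:56,619:INFO:local passed",
        "2020-08-22 17:02:19,920:INFO:epoch[1], iter[10007], loss:3.193, mean_fps:6301.11 imgs/sec",
    ]
    for record in parse_train_log(sample):
        print(record)
```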
- running on GPU
```python
export CUDA_VISIBLE_DEVICES=0
python train.py --net [NET_NAME] --dataset [DATASET_NAME] --train_data_dir=[DATASET_PATH] --is_distributed=0 --device_target='GPU' > train.log 2>&1 &
```
The python command above runs in the background. You can view the results in the file `train.log`.
@ -296,7 +312,9 @@ You can modify the training behaviour through the various flags in the `densenet
- running on CPU
```python
python train.py --net=[NET_NAME] --dataset=[DATASET_NAME] --train_data_dir=[DATASET_PATH] --is_distributed=0 --device_target='CPU' > train.log 2>&1 &
```
The python command above runs in the background. The log and model checkpoints are written to `output/202x-xx-xx_time_xx_xx_xx/`.
@ -306,27 +324,32 @@ You can modify the training behaviour through the various flags in the `densenet
- running on Ascend
```bash
bash scripts/run_distribute_train.sh 8 rank_table.json [NET_NAME] [DATASET_NAME] /PATH/TO/DATASET /PATH/TO/PRETRAINED_CKPT
bash scripts/run_distribute_train.sh [DEVICE_NUM] [RANK_TABLE_FILE] [NET_NAME] [DATASET_NAME] [TRAIN_DATA_DIR]
# example bash scripts/run_distribute_train.sh 8 /root/hccl_8p_01234567_10.155.170.71.json densenet121 imagenet /home/DataSet/ImageNet_Original/train/
```
The above shell script runs distributed training in the background. You can find the log and model checkpoints under `train[X]/output/202x-xx-xx_time_xx_xx_xx/`. The loss values from training DenseNet121 on ImageNet look like this:
```log
2020-08-22 16:58:54,556:INFO:epoch[0], iter[5003], loss:3.857, mean_fps:0.00 imgs/sec
2020-08-22 17:02:19,188:INFO:epoch[1], iter[10007], loss:3.18, mean_fps:6260.18 imgs/sec
2020-08-22 17:05:42,490:INFO:epoch[2], iter[15011], loss:2.621, mean_fps:6301.11 imgs/sec
2020-08-22 17:09:05,686:INFO:epoch[3], iter[20015], loss:3.113, mean_fps:6304.37 imgs/sec
2020-08-22 17:12:28,925:INFO:epoch[4], iter[25019], loss:3.29, mean_fps:6303.07 imgs/sec
2020-08-22 17:15:52,167:INFO:epoch[5], iter[30023], loss:2.865, mean_fps:6302.98 imgs/sec
...
...
```
- running on GPU
```bash
cd scripts
bash run_distribute_train_gpu.sh 8 0,1,2,3,4,5,6,7 [NET_NAME] [DATASET_NAME] [DATASET_PATH]
```
The above shell script runs distributed training in the background. You can view the results in the file `train/train.log`.
@ -340,16 +363,21 @@ You can modify the training behaviour through the various flags in the `densenet
Run the command below for evaluation.
```python
python eval.py --net [NET_NAME] --dataset [DATASET_NAME] --eval_data_dir /PATH/TO/DATASET --ckpt_files /PATH/TO/CHECKPOINT > eval.log 2>&1 &
OR
bash scripts/run_distribute_eval.sh 8 rank_table.json [NET_NAME] [DATASET_NAME] /PATH/TO/DATASET /PATH/TO/CHECKPOINT
bash scripts/run_distribute_eval.sh [DEVICE_NUM] [RANK_TABLE_FILE] [NET_NAME] [DATASET_NAME] [EVAL_DATA_DIR] [CKPT_PATH]
# example: bash scripts/run_distribute_eval.sh 8 /root/hccl_8p_01234567_10.155.170.71.json densenet121 imagenet /home/DataSet/ImageNet_Original/train/validation_preprocess/ /home/model/densenet/ckpt/0-120_500.ckpt
```
The above python command will run in the background. You can view the results through the file "output/202x-xx-xx_time_xx_xx_xx/202x_xxxx.log". The accuracy of evaluating DenseNet121 on the test dataset of ImageNet will be as follows:
```log
2020-08-24 09:21:50,551:INFO:after allreduce eval: top1_correct=37657, tot=49920, acc=75.43%
2020-08-24 09:21:50,551:INFO:after allreduce eval: top5_correct=46224, tot=49920, acc=92.60%
```
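The logged accuracy is simply `topN_correct / tot`, so it can be recomputed from the counts as a sanity check. Below is a small sketch based only on the log format shown above; `EVAL_RE` and `recompute_accuracy` are hypothetical names, not part of the repository.

```python
import re

# Pattern modelled on lines such as:
# "...:INFO:after allreduce eval: top1_correct=37657, tot=49920, acc=75.43%"
EVAL_RE = re.compile(
    r"(?P<name>top\d+)_correct=(?P<correct>\d+), "
    r"tot=(?P<total>\d+), acc=(?P<acc>[\d.]+)%"
)


def recompute_accuracy(line):
    """Return (metric name, logged accuracy, accuracy recomputed from the counts).

    Returns None when the line does not look like an eval summary line.
    """
    match = EVAL_RE.search(line)
    if match is None:
        return None
    recomputed = 100.0 * int(match["correct"]) / int(match["total"])
    return match["name"], float(match["acc"]), round(recomputed, 2)


if __name__ == "__main__":
    sample = ("2020-08-24 09:21:50,551:INFO:after allreduce eval: "
              "top1_correct=37657, tot=49920, acc=75.43%")
    print(recompute_accuracy(sample))
```

A mismatch between the logged and recomputed values usually means the line came from a different run or was edited by hand.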
- evaluation on GPU
@ -357,22 +385,28 @@ You can modify the training behaviour through the various flags in the `densenet
Run the command below for evaluation.
```python
python eval.py --net=[NET_NAME] --dataset=[DATASET_NAME] --eval_data_dir=[DATASET_PATH] --device_target='GPU' --ckpt_files=[CHECKPOINT_PATH] > eval.log 2>&1 &
OR
bash run_distribute_eval_gpu.sh 1 0 [NET_NAME] [DATASET_NAME] [DATASET_PATH] [CHECKPOINT_PATH]
```
The above python command will run in the background. You can view the results through the file "eval/eval.log". The accuracy of evaluating DenseNet121 on the test dataset of ImageNet will be as follows:
```log
2021-02-04 14:20:50,551:INFO:after allreduce eval: top1_correct=37637, tot=49984, acc=75.30%
2021-02-04 14:20:50,551:INFO:after allreduce eval: top5_correct=46370, tot=49984, acc=92.77%
```
The accuracy of evaluating DenseNet100 on the test dataset of Cifar-10 will be as follows:
```log
2021-03-12 18:04:07,893:INFO:after allreduce eval: top1_correct=9536, tot=9984, acc=95.51%
```
- evaluation on CPU
@ -380,13 +414,17 @@ You can modify the training behaviour through the various flags in the `densenet
Run the command below for evaluation.
```python
python eval.py --net=[NET_NAME] --dataset=[DATASET_NAME] --eval_data_dir=[DATASET_PATH] --device_target='CPU' --ckpt_files=[CHECKPOINT_PATH] > eval.log 2>&1 &
```
The above python command will run in the background. You can view the results through the file "eval/eval.log". The accuracy of evaluating DenseNet100 on the test dataset of Cifar-10 will be as follows:
```log
2021-03-18 09:06:43,247:INFO:after allreduce eval: top1_correct=9492, tot=9984, acc=95.07%
```
## [Export Process](#contents)
@ -394,7 +432,9 @@ You can modify the training behaviour through the various flags in the `densenet
### export
```shell
python export.py --net [NET_NAME] --ckpt_file [CKPT_PATH] --device_target [DEVICE_TARGET] --file_format [EXPORT_FORMAT] --batch_size [BATCH_SIZE]
```
`EXPORT_FORMAT` should be in ["AIR", "MINDIR"]
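One way to enforce such a restriction on the command line is `argparse`'s `choices` option. The sketch below is a hypothetical front end written for illustration; it is not the actual `export.py`, and the default values shown are assumptions.

```python
import argparse


def build_parser():
    """Build a parser that only accepts the two export formats named above."""
    parser = argparse.ArgumentParser(description="hypothetical export front end")
    # choices makes argparse reject anything other than AIR or MINDIR.
    parser.add_argument("--file_format", choices=["AIR", "MINDIR"],
                        default="MINDIR")  # default is an assumption
    parser.add_argument("--batch_size", type=int, default=1)  # assumption
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args(["--file_format", "AIR", "--batch_size", "32"])
    print(args.file_format, args.batch_size)
```

With this setup an unsupported format fails at argument parsing, before any checkpoint is loaded.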
@ -402,6 +442,7 @@ python export.py --net [NET_NAME] --ckpt_file [CKPT_PATH] --device_target [DEVIC
- Export MindIR on Modelarts
```Modelarts
Export MindIR example on ModelArts
Data storage method is the same as training
# (1) Choose either a (modify the yaml file parameters) or b (create a ModelArts training job and modify the parameters there).
@ -418,6 +459,7 @@ Data storage method is the same as training
# (4) Set the model's startup file on the ModelArts interface to "export.py".
# (5) Set the data path of the model on the ModelArts interface to ".../ImageNet_Original/checkpoint" (choose the ImageNet_Original/checkpoint folder path),
# the output path of the model "Output file path", and the log path of the model "Job log path".
```
## [Inference Process](#contents)
@ -427,8 +469,10 @@ Data storage method is the same as training
Before performing inference, the model must be exported. AIR models can only be exported in an Ascend 910 environment; MindIR models can be exported in any environment.
```shell
# Ascend310 inference
bash run_infer_310.sh [MINDIR_PATH] [DATASET] [DATA_PATH] [LABEL_FILE] [DEVICE_ID]
```
- NOTE: Ascend310 inference uses the ImageNet dataset. The label of each image is the index of its folder, counted from 0 after the folders are sorted.
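The labelling rule in the note can be sketched as follows. This is an illustration only: it assumes a plain lexicographic sort of the folder names, which the note does not spell out, and the names `label_map` and `labels_for_directory` are hypothetical.

```python
import os


def label_map(folder_names):
    """Map each class folder name to its index in sorted order, starting at 0."""
    return {name: index for index, name in enumerate(sorted(folder_names))}


def labels_for_directory(data_path):
    """Apply label_map to the sub-directories of data_path (plain files are ignored)."""
    folders = [entry for entry in os.listdir(data_path)
               if os.path.isdir(os.path.join(data_path, entry))]
    return label_map(folders)


if __name__ == "__main__":
    # Hypothetical ImageNet-style folder names, in arbitrary order.
    print(label_map(["n01530575", "n01440764", "n01443537"]))
```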
@ -437,8 +481,10 @@ Inference result is saved in current path, you can find result like this in acc.
The accuracy of evaluating DenseNet121 on the test dataset of ImageNet will be as follows:
```log
2020-08-24 09:21:50,551:INFO:after allreduce eval: top1_correct=37657, tot=49920, acc=75.56%
2020-08-24 09:21:50,551:INFO:after allreduce eval: top5_correct=46224, tot=49920, acc=92.74%
```
# [Model Description](#contents)

@ -99,18 +99,22 @@ Dataset used by DenseNet-100: Cifar-10
- Running on Ascend
```DenseNet121 is trained by default; to train DenseNet100, modify the value of _config_path in src/model_utils/config.py
# standalone training example
python train.py --net [NET_NAME] --dataset [DATASET_NAME] --train_data_dir /PATH/TO/DATASET --train_pretrained /PATH/TO/PRETRAINED_CKPT --is_distributed 0 > train.log 2>&1 &
```DenseNet121 is trained by default; to train DenseNet100, modify the value of _config_path in src/model_utils/config.py
# distributed training example
bash scripts/run_distribute_train.sh 8 /PATH/TO/RANK_TABLE.JSON [NET_NAME] [DATASET_NAME] /PATH/TO/DATASET /PATH/TO/PRETRAINED_CKPT
# standalone training example
python train.py --net [NET_NAME] --dataset [DATASET_NAME] --train_data_dir /PATH/TO/DATASET --train_pretrained /PATH/TO/PRETRAINED_CKPT --is_distributed 0 > train.log 2>&1 &
# example: python train.py --net densenet121 --dataset imagenet --train_data_dir /home/DataSet/ImageNet_Original/train/
# standalone evaluation example
python eval.py --net [NET_NAME] --dataset [DATASET_NAME] --eval_data_dir /PATH/TO/DATASET --ckpt_files /PATH/TO/CHECKPOINT > eval.log 2>&1 &
# distributed training example
bash scripts/run_distribute_train.sh [DEVICE_NUM] [RANK_TABLE_FILE] [NET_NAME] [DATASET_NAME] [TRAIN_DATA_DIR]
# example bash scripts/run_distribute_train.sh 8 /root/hccl_8p_01234567_10.155.170.71.json densenet121 imagenet /home/DataSet/ImageNet_Original/train/
bash scripts/run_distribute_eval.sh 8 rank_table.json [NET_NAME] [DATASET_NAME] /PATH/TO/DATASET /PATH/TO/CHECKPOINT
```
# standalone evaluation example
python eval.py --net [NET_NAME] --dataset [DATASET_NAME] --eval_data_dir /PATH/TO/DATASET --ckpt_files /PATH/TO/CHECKPOINT > eval.log 2>&1 &
bash scripts/run_distribute_eval.sh [DEVICE_NUM] [RANK_TABLE_FILE] [NET_NAME] [DATASET_NAME] [EVAL_DATA_DIR] [CKPT_PATH]
# example: bash scripts/run_distribute_eval.sh 8 /root/hccl_8p_01234567_10.155.170.71.json densenet121 imagenet /home/DataSet/ImageNet_Original/train/validation_preprocess/ /home/model/densenet/ckpt/0-120_500.ckpt
```
Distributed training requires an HCCL configuration file in JSON format to be created in advance.
@ -267,66 +271,67 @@ Dataset used by DenseNet-100: Cifar-10
- Running on Ascend
```python
python train.py --net [NET_NAME] --dataset [DATASET_NAME] --train_data_dir /PATH/TO/DATASET --train_pretrained /PATH/TO/PRETRAINED_CKPT --is_distributed 0 > train.log 2>&1 &
```
```python
python train.py --net [NET_NAME] --dataset [DATASET_NAME] --train_data_dir /PATH/TO/DATASET --train_pretrained /PATH/TO/PRETRAINED_CKPT --is_distributed 0 > train.log 2>&1 &
```
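The python command above runs in the background and writes the log and model checkpoints to `output/202x-xx-xx_time_xx_xx/`. The loss values from training DenseNet-121 on ImageNet are as follows:
The python command above runs in the background and writes the log and model checkpoints to `output/202x-xx-xx_time_xx_xx/`. The loss values from training DenseNet-121 on ImageNet are as follows: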
```log
2020-08-22 16:58:56,617:INFO:epoch[0], iter[5003], loss:4.367, mean_fps:0.00 imgs/sec
2020-08-22 16:58:56,619:INFO:local passed
2020-08-22 17:02:19,920:INFO:epoch[1], iter[10007], loss:3.193, mean_fps:6301.11 imgs/sec
2020-08-22 17:02:19,921:INFO:local passed
2020-08-22 17:05:43,112:INFO:epoch[2], iter[15011], loss:3.096, mean_fps:6304.53 imgs/sec
2020-08-22 17:05:43,113:INFO:local passed
...
```
```log
2020-08-22 16:58:56,617:INFO:epoch[0], iter[5003], loss:4.367, mean_fps:0.00 imgs/sec
2020-08-22 16:58:56,619:INFO:local passed
2020-08-22 17:02:19,920:INFO:epoch[1], iter[10007], loss:3.193, mean_fps:6301.11 imgs/sec
2020-08-22 17:02:19,921:INFO:local passed
2020-08-22 17:05:43,112:INFO:epoch[2], iter[15011], loss:3.096, mean_fps:6304.53 imgs/sec
2020-08-22 17:05:43,113:INFO:local passed
...
```
- Running on GPU
```python
export CUDA_VISIBLE_DEVICES=0
python train.py --net=[NET_NAME] --dataset=[DATASET_NAME] --train_data_dir=[DATASET_PATH] --is_distributed=0 --device_target='GPU' > train.log 2>&1 &
```
```python
export CUDA_VISIBLE_DEVICES=0
python train.py --net=[NET_NAME] --dataset=[DATASET_NAME] --train_data_dir=[DATASET_PATH] --is_distributed=0 --device_target='GPU' > train.log 2>&1 &
```
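The python command above runs in the background and writes the log and model checkpoints to `output/202x-xx-xx_time_xx_xx/`.
The python command above runs in the background and writes the log and model checkpoints to `output/202x-xx-xx_time_xx_xx/`.
- Running on CPU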
```python
python train.py --net=[NET_NAME] --dataset=[DATASET_NAME] --train_data_dir=[DATASET_PATH] --is_distributed=0 --device_target='CPU' > train.log 2>&1 &
```
```python
python train.py --net=[NET_NAME] --dataset=[DATASET_NAME] --train_data_dir=[DATASET_PATH] --is_distributed=0 --device_target='CPU' > train.log 2>&1 &
```
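The python command above runs in the background and writes the log and model checkpoints to `output/202x-xx-xx_time_xx_xx/`.
The python command above runs in the background and writes the log and model checkpoints to `output/202x-xx-xx_time_xx_xx/`.
### Distributed Training
- Running on Ascend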
```shell
bash scripts/run_distribute_train.sh 8 rank_table.json [NET_NAME] [DATASET_NAME] /PATH/TO/DATASET /PATH/TO/PRETRAINED_CKPT
```
```shell
bash scripts/run_distribute_train.sh [DEVICE_NUM] [RANK_TABLE_FILE] [NET_NAME] [DATASET_NAME] [TRAIN_DATA_DIR]
# example bash scripts/run_distribute_train.sh 8 /root/hccl_8p_01234567_10.155.170.71.json densenet121 imagenet /home/DataSet/ImageNet_Original/train/
```
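The above shell script runs distributed training in the background. You can find the log and model checkpoints under `train[X]/output/202x-xx-xx_time_xx_xx_xx/`. The loss values from training DenseNet-121 on ImageNet are as follows: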
```log
2020-08-22 16:58:54,556:INFO:epoch[0], iter[5003], loss:3.857, mean_fps:0.00 imgs/sec
2020-08-22 17:02:19,188:INFO:epoch[1], iter[10007], loss:3.18, mean_fps:6260.18 imgs/sec
2020-08-22 17:05:42,490:INFO:epoch[2], iter[15011], loss:2.621, mean_fps:6301.11 imgs/sec
2020-08-22 17:09:05,686:INFO:epoch[3], iter[20015], loss:3.113, mean_fps:6304.37 imgs/sec
2020-08-22 17:12:28,925:INFO:epoch[4], iter[25019], loss:3.29, mean_fps:6303.07 imgs/sec
2020-08-22 17:15:52,167:INFO:epoch[5], iter[30023], loss:2.865, mean_fps:6302.98 imgs/sec
...
...
```
```log
2020-08-22 16:58:54,556:INFO:epoch[0], iter[5003], loss:3.857, mean_fps:0.00 imgs/sec
2020-08-22 17:02:19,188:INFO:epoch[1], iter[10007], loss:3.18, mean_fps:6260.18 imgs/sec
2020-08-22 17:05:42,490:INFO:epoch[2], iter[15011], loss:2.621, mean_fps:6301.11 imgs/sec
2020-08-22 17:09:05,686:INFO:epoch[3], iter[20015], loss:3.113, mean_fps:6304.37 imgs/sec
2020-08-22 17:12:28,925:INFO:epoch[4], iter[25019], loss:3.29, mean_fps:6303.07 imgs/sec
2020-08-22 17:15:52,167:INFO:epoch[5], iter[30023], loss:2.865, mean_fps:6302.98 imgs/sec
...
...
```
- Running on GPU
```bash
cd scripts
bash run_distribute_train_gpu.sh 8 0,1,2,3,4,5,6,7 [NET_NAME] [DATASET_NAME] [DATASET_PATH]
```
```bash
cd scripts
bash run_distribute_train_gpu.sh 8 0,1,2,3,4,5,6,7 [NET_NAME] [DATASET_NAME] [DATASET_PATH]
```
The above shell script runs distributed training in the background. You can find the log and model checkpoints under `train[X]/output/202x-xx-xx_time_xx_xx_xx/`.
@ -341,7 +346,8 @@ Dataset used by DenseNet-100: Cifar-10
```eval
python eval.py --net [NET_NAME] --dataset [DATASET_NAME] --eval_data_dir /PATH/TO/DATASET --ckpt_files /PATH/TO/CHECKPOINT > eval.log 2>&1 &
OR
bash scripts/run_distribute_eval.sh 8 rank_table.json [NET_NAME] [DATASET_NAME] /PATH/TO/DATASET /PATH/TO/CHECKPOINT
bash scripts/run_distribute_eval.sh [DEVICE_NUM] [RANK_TABLE_FILE] [NET_NAME] [DATASET_NAME] [EVAL_DATA_DIR] [CKPT_PATH]
# example: bash scripts/run_distribute_eval.sh 8 /root/hccl_8p_01234567_10.155.170.71.json densenet121 imagenet /home/DataSet/ImageNet_Original/train/validation_preprocess/ /home/model/densenet/ckpt/0-120_500.ckpt
```
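The above python command runs in the background. You can view the results in the file "output/202x-xx-xx_time_xx_xx_xx/202x_xxxx.log". The accuracy of DenseNet-121 on the ImageNet test dataset is as follows: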

@ -94,12 +94,14 @@ To train the DPNs, run the shell script `scripts/train_standalone.sh` with the f
```shell
bash scripts/train_standalone.sh [device_id] [train_data_dir] [ckpt_path_to_save] [eval_each_epoch] [pretrained_ckpt(optional)]
# example: bash scripts/train_standalone.sh 0 /home/DataSet/ImageNet_Original/train/ ./ckpt 0
```
To validate the DPNs, run the shell script `scripts/eval.sh` with the format below:
```shell
bash scripts/eval.sh [device_id] [eval_data_dir] [checkpoint_path]
# example bash scripts/eval.sh 0 /home/DataSet/ImageNet_Original/validation_preprocess/ /home/model/dpn/ckpt/dpn-100_40036.ckpt
```
# [Script Description](#contents)
@ -184,12 +186,7 @@ Run `scripts/train_standalone.sh` to train the model standalone. The usage of th
```shell
bash scripts/train_standalone.sh [device_id] [train_data_dir] [ckpt_path_to_save] [eval_each_epoch] [pretrained_ckpt(optional)]
```
For example, you can run the shell command below to launch the training procedure.
```shell
bash scripts/train_standalone.sh 0 /data/dataset/imagenet/ scripts/pretrian/ 0
# example: bash scripts/train_standalone.sh 0 /home/DataSet/ImageNet_Original/train/ ./ckpt 0
```
If `eval_each_epoch` is 1, the script evaluates after each epoch and saves the parameters with the highest accuracy, at the cost of a longer epoch time.
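The keep-the-best behaviour described here can be sketched generically as follows. This is not the repository's training loop: `train_with_best_checkpoint` and its callback parameters are hypothetical names used only to illustrate evaluating every epoch and saving when the accuracy improves.

```python
def train_with_best_checkpoint(num_epochs, train_one_epoch, evaluate, save_checkpoint):
    """Evaluate after every epoch and save only when the accuracy improves.

    The three callbacks are placeholders for the real training, evaluation
    and checkpoint-saving steps.
    """
    best_acc = float("-inf")
    best_epoch = None
    for epoch in range(num_epochs):
        train_one_epoch(epoch)
        acc = evaluate(epoch)  # the extra pass that makes each epoch slower
        if acc > best_acc:
            best_acc, best_epoch = acc, epoch
            save_checkpoint(epoch, acc)
    return best_epoch, best_acc


if __name__ == "__main__":
    fake_accuracies = [0.50, 0.70, 0.60]  # made-up values for the demo
    result = train_with_best_checkpoint(
        num_epochs=3,
        train_one_epoch=lambda epoch: None,
        evaluate=lambda epoch: fake_accuracies[epoch],
        save_checkpoint=lambda epoch, acc: print(f"saved epoch {epoch}, acc {acc}"),
    )
    print(result)
```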
@ -231,12 +228,7 @@ Run `scripts/train_distributed.sh` to train the model distributed. The usage of
```text
bash scripts/train_distributed.sh [rank_table] [train_data_dir] [ckpt_path_to_save] [rank_size] [eval_each_epoch] [pretrained_ckpt(optional)]
```
For example, you can run the shell command below to launch the training procedure.
```shell
bash scripts/train_distributed.sh /home/rank_table.json /data/dataset/imagenet/ ../scripts 8 0 ../pretrain/dpn92.ckpt
# example: bash scripts/train_distributed.sh /root/hccl_8p_01234567_10.155.170.71.json /home/DataSet/ImageNet_Original/train/ ./ckpt/ 8 0
```
The above shell script runs distributed training in the background. You can view the results in the file `train_parallel[X]/log.txt` as follows:
@ -259,12 +251,7 @@ Run `scripts/eval.sh` to evaluate the model with one Ascend processor. The usage
```text
bash scripts/eval.sh [device_id] [eval_data_dir] [checkpoint_path]
```
For example, you can run the shell command below to launch the validation procedure.
```text
bash scripts/eval.sh 0 /data/dataset/imagenet/ pretrain/dpn-180_5004.ckpt
# example bash scripts/eval.sh 0 /home/DataSet/ImageNet_Original/validation_preprocess/ /home/model/dpn/ckpt/dpn-100_40036.ckpt
```
The above shell script runs evaluation in the background. You can view the results in the file `eval_log.txt`. The result looks like this:

@ -80,11 +80,16 @@ Dataset used [ICDAR 2015](https://rrc.cvc.uab.es/?ch=4&com=downloads)
```bash
# distribute training example(8p)
sh run_distribute_train.sh [DATASET_PATH] [PRETRAINED_BACKBONE] [RANK_TABLE_FILE]
bash run_distribute_train.sh [DATASET_PATH] [PRETRAINED_BACKBONE] [RANK_TABLE_FILE]
# example: bash run_distribute_train.sh /home/DataSet/ICDAR2015/ic15/ home/model/east/pretrained/0-150_5004.ckpt /root/hccl_8p_01234567_10.155.170.71.json
# standalone training
sh run_standalone_train_ascend.sh [DATASET_PATH] [PRETRAINED_BACKBONE] [DEVICE_ID]
bash run_standalone_train_ascend.sh [DATASET_PATH] [PRETRAINED_BACKBONE] [DEVICE_ID]
# example: bash run_standalone_train_ascend.sh /home/DataSet/ICDAR2015/ic15/ home/model/east/pretrained/0-150_5004.ckpt 0
# evaluation:
sh run_eval_ascend.sh [DATASET_PATH] [CKPT_PATH] [DEVICE_ID]
bash run_eval_ascend.sh [DATASET_PATH] [CKPT_PATH] [DEVICE_ID]
# example: bash run_eval_ascend.sh /home/DataSet/ICDAR2015/ch4_test_images/ home/model/east/ckpt/checkpoint_east-600_15.ckpt
```
> Notes:
@ -100,9 +105,12 @@ sh run_eval_ascend.sh [DATASET_PATH] [CKPT_PATH] [DEVICE_ID]
shell:
Ascend:
# distribute training example(8p)
sh run_distribute_train.sh [DATASET_PATH] [PRETRAINED_BACKBONE] [RANK_TABLE_FILE]
bash run_distribute_train.sh [DATASET_PATH] [PRETRAINED_BACKBONE] [RANK_TABLE_FILE]
# example: bash run_distribute_train.sh /home/DataSet/ICDAR2015/ic15/ home/model/east/pretrained/0-150_5004.ckpt /root/hccl_8p_01234567_10.155.170.71.json
# standalone training
sh run_standalone_train_ascend.sh [DATASET_PATH] [PRETRAINED_BACKBONE] [DEVICE_ID]
bash run_standalone_train_ascend.sh [DATASET_PATH] [PRETRAINED_BACKBONE] [DEVICE_ID]
# example: bash run_standalone_train_ascend.sh /home/DataSet/ICDAR2015/ic15/ home/model/east/pretrained/0-150_5004.ckpt 0
```
### Result
@ -199,7 +207,8 @@ You can start training using python or shell scripts. The usage of shell scripts
- Ascend:
```bash
sh run_eval_ascend.sh [DATASET_PATH] [CKPT_PATH] [DEVICE_ID]
bash run_eval_ascend.sh [DATASET_PATH] [CKPT_PATH] [DEVICE_ID]
# example: bash run_eval_ascend.sh /home/DataSet/ICDAR2015/ch4_test_images/ home/model/east/ckpt/checkpoint_east-600_15.ckpt
```
### Launch
@ -224,7 +233,8 @@ You can start training using python or shell scripts. The usage of shell scripts
# eval example
shell:
Ascend:
sh run_eval_ascend.sh [DATASET_PATH] [CKPT_PATH] [DEVICE_ID]
bash run_eval_ascend.sh [DATASET_PATH] [CKPT_PATH] [DEVICE_ID]
# example: bash run_eval_ascend.sh /home/DataSet/ICDAR2015/ch4_test_images/ home/model/east/ckpt/checkpoint_east-600_15.ckpt
```
> checkpoint can be produced in training process.

@ -90,17 +90,28 @@ After installing MindSpore via the official website, you can start training and
- running on Ascend
```yaml
# Add data set path, take training cifar10 as an example
train_data_path: /home/DataSet/cifar10/
val_data_path: /home/DataSet/cifar10/
# Add checkpoint path parameters before inference
checkpoint_path: /home/model/googlenet/ckpt/train_googlenet_cifar10-125_390.ckpt
```
```python
# run training example
python train.py > train.log 2>&1 &
# run distributed training example
bash scripts/run_train.sh rank_table.json
bash scripts/run_train.sh [RANK_TABLE_FILE] [DATASET_NAME]
# example: bash scripts/run_train.sh /root/hccl_8p_01234567_10.155.170.71.json cifar10
# run evaluation example
python eval.py > eval.log 2>&1 &
OR
bash run_eval.sh
bash run_eval.sh [DATASET_NAME]
# example: bash run_eval.sh cifar10
# run inference example
bash run_infer_310.sh [MINDIR_PATH] [DATASET] [DATA_PATH] [LABEL_FILE] [DEVICE_ID]
@ -390,7 +401,7 @@ For more configuration details, please refer the script `config.py`.
- running on Ascend
```bash
bash scripts/run_train.sh rank_table.json
bash scripts/run_train.sh /root/hccl_8p_01234567_10.155.170.71.json cifar10
```
The above shell script runs distributed training in the background. You can view the results in the file `train_parallel[X]/log`. The loss values look like this:
@ -425,7 +436,7 @@ For more configuration details, please refer the script `config.py`.
```python
python eval.py > eval.log 2>&1 &
OR
bash scripts/run_eval.sh
bash run_eval.sh cifar10
```
The above python command will run in the background. You can view the results through the file "eval.log". The accuracy of the test dataset will be as follows:
@ -460,7 +471,7 @@ For more configuration details, please refer the script `config.py`.
OR,
```bash
bash scripts/run_eval_gpu.sh [CHECKPOINT_PATH]
bash run_eval_gpu.sh [CHECKPOINT_PATH]
```
The above python command will run in the background. You can view the results through the file "eval/eval.log". The accuracy of the test dataset will be as follows:

@ -92,17 +92,28 @@ GoogleNet chains multiple inception modules together, which lets it go deeper. Dimensionality-reduced
- Running on Ascend
```yaml
# Add the dataset path, taking cifar10 training as an example
train_data_path:/home/DataSet/cifar10/
val_data_path:/home/DataSet/cifar10/
# Add the checkpoint path parameter before inference
checkpoint_path: /home/model/googlenet/ckpt/train_googlenet_cifar10-125_390.ckpt
```
```python
# run training example
python train.py > train.log 2>&1 &
# run distributed training example
bash scripts/run_train.sh rank_table.json
bash scripts/run_train.sh [RANK_TABLE_FILE] [DATASET_NAME]
# example: bash scripts/run_train.sh /root/hccl_8p_01234567_10.155.170.71.json cifar10
# run evaluation example
python eval.py > eval.log 2>&1 &
bash run_eval.sh
bash run_eval.sh [DATASET_NAME]
# example: bash run_eval.sh cifar10
# run inference example
bash run_infer_310.sh [MINDIR_PATH] [DATASET] [DATA_PATH] [LABEL_FILE] [DEVICE_ID]
@ -360,7 +371,7 @@ GoogleNet chains multiple inception modules together, which lets it go deeper. Dimensionality-reduced
- Running on Ascend
```bash
bash scripts/run_train.sh rank_table.json
bash scripts/run_train.sh /root/hccl_8p_01234567_10.155.170.71.json cifar10
```
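The above shell script runs distributed training in the background. You can view the results in the file train_parallel[X]/log. The loss values are as follows:
@ -395,7 +406,7 @@ GoogleNet chains multiple inception modules together, which lets it go deeper. Dimensionality-reduced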
```bash
python eval.py > eval.log 2>&1 &
OR
sh scripts/run_eval.sh
bash run_eval.sh cifar10
```
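The above python command runs in the background. You can view the results in the file eval.log. The accuracy on the test dataset is as follows:
@ -430,7 +441,7 @@ GoogleNet chains multiple inception modules together, which lets it go deeper. Dimensionality-reduced
OR,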
```bash
bash scripts/run_eval_gpu.sh [CHECKPOINT_PATH]
bash run_eval_gpu.sh [CHECKPOINT_PATH]
```
The above python command runs in the background. You can view the results in the file eval/eval.log. The accuracy on the test dataset is as follows:

@ -81,7 +81,7 @@ For FP16 operators, if the input data type is FP32, the backend of MindSpore wil
- Running on [ModelArts](https://support.huaweicloud.com/modelarts/)
```bash
```inceptionv3
# Train 8p with Ascend
# (1) Perform a or b.
# a. Set "enable_modelarts=True" on default_config.yaml file.
@ -274,11 +274,21 @@ You can start training using python or shell scripts. The usage of shell scripts
- Ascend:
```yaml
ds_type:imagenet
or
ds_type:cifar10
Take training cifar10 as an example, the ds_type parameter is set to cifar10
```
```shell
# distribute training(8p)
bash scripts/run_distribute_train.sh RANK_TABLE_FILE DATA_PATH
bash run_distribute_train.sh [RANK_TABLE_FILE] [DATA_PATH] [CKPT_PATH]
# example: bash run_distribute_train.sh /root/hccl_8p_01234567_10.155.170.71.json /home/DataSet/cifar10/ ./ckpt/
# standalone training
bash scripts/run_standalone_train.sh DEVICE_ID DATA_PATH
bash scripts/run_standalone_train.sh [DEVICE_ID] [DATA_PATH] [CKPT_PATH]
# example: bash scripts/run_standalone_train.sh 0 /home/DataSet/cifar10/ ./ckpt/
```
- CPU:
@ -302,10 +312,12 @@ bash scripts/run_standalone_train_cpu.sh DATA_PATH
shell:
Ascend:
# distribute training example(8p)
bash scripts/run_distribute_train.sh RANK_TABLE_FILE DATA_PATH
bash run_distribute_train.sh [RANK_TABLE_FILE] [DATA_PATH] [CKPT_PATH]
# example: bash run_distribute_train.sh /root/hccl_8p_01234567_10.155.170.71.json /home/DataSet/cifar10/ ./ckpt/
# standalone training example
bash scripts/run_standalone_train.sh DEVICE_ID DATA_PATH
bash scripts/run_standalone_train.sh [DEVICE_ID] [DATA_PATH] [CKPT_PATH]
# example: bash scripts/run_standalone_train.sh 0 /home/DataSet/cifar10/ ./ckpt/
CPU:
bash script/run_standalone_train_cpu.sh DATA_PATH
@ -344,13 +356,14 @@ You can start training using python or shell scripts. The usage of shell scripts
- Ascend:
```python
bash scripts/run_eval.sh DEVICE_ID DATA_PATH PATH_CHECKPOINT
```shell
bash run_eval.sh [DEVICE_ID] [DATA_DIR] [PATH_CHECKPOINT]
# example: bash run_eval.sh 0 /home/DataSet/cifar10/ /home/model/inceptionv3/ckpt/inception_v3-rank0-2_1251.ckpt
```
- CPU:
```python
```shell
bash scripts/run_eval_cpu.sh DATA_PATH PATH_CHECKPOINT
```

@ -280,12 +280,22 @@ The main parameters in train.py and config.py are as follows
- Ascend
```shell
# distributed training example (8 devices)
bash scripts/run_distribute_train.sh RANK_TABLE_FILE DATA_PATH
# standalone training
bash scripts/run_standalone_train.sh DEVICE_ID DATA_PATH
```
```yaml
ds_type:imagenet
or
ds_type:cifar10
Taking cifar10 training as an example, set the ds_type parameter to cifar10
````
```shell
# distributed training example (8 devices)
bash run_distribute_train.sh [RANK_TABLE_FILE] [DATA_PATH] [CKPT_PATH]
# example: bash run_distribute_train.sh /root/hccl_8p_01234567_10.155.170.71.json /home/DataSet/cifar10/ ./ckpt/
# standalone training
bash scripts/run_standalone_train.sh [DEVICE_ID] [DATA_PATH] [CKPT_PATH]
# example: bash scripts/run_standalone_train.sh 0 /home/DataSet/cifar10/ ./ckpt/
```
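> Note: for RANK_TABLE_FILE, refer to this [link](https://www.mindspore.cn/docs/programming_guide/zh-CN/master/distributed_training_ascend.html). The device_ip can be obtained through this [link](https://gitee.com/mindspore/mindspore/tree/master/model_zoo/utils/hccl_tools).
> This is a processor-core binding operation based on device_num and the total number of processors. If you do not need it, remove the taskset operations in scripts/run_distribute_train.sh.
@ -301,9 +311,12 @@ The main parameters in train.py and config.py are as follows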
shell:
Ascend:
# distributed training example (8 devices)
bash scripts/run_distribute_train.sh RANK_TABLE_FILE DATA_PATH
bash run_distribute_train.sh [RANK_TABLE_FILE] [DATA_PATH] [CKPT_PATH]
# example: bash run_distribute_train.sh /root/hccl_8p_01234567_10.155.170.71.json /home/DataSet/cifar10/ ./ckpt/
# standalone training
bash scripts/run_standalone_train.sh DEVICE_ID DATA_PATH
bash scripts/run_standalone_train.sh [DEVICE_ID] [DATA_PATH] [CKPT_PATH]
# example: bash scripts/run_standalone_train.sh 0 /home/DataSet/cifar10/ ./ckpt/
CPU:
bash scripts/run_standalone_train_cpu.sh DATA_PATH
@ -343,7 +356,8 @@ epoch time: 6358482.104 ms, per step time: 16303.800 ms
- Ascend
```shell
bash scripts/run_eval.sh DEVICE_ID DATA_DIR PATH_CHECKPOINT
bash run_eval.sh [DEVICE_ID] [DATA_DIR] [PATH_CHECKPOINT]
# example: bash run_eval.sh 0 /home/DataSet/cifar10/ /home/model/inceptionv3/ckpt/inception_v3-rank0-2_1251.ckpt
```
- CPU:
@ -361,7 +375,7 @@ epoch time: 6358482.104 ms, per step time: 16303.800 ms
CPU: python eval.py --config_path CONFIG_FILE --dataset_path DATA_PATH --checkpoint PATH_CHECKPOINT --platform CPU
shell:
Ascend: bash scripts/run_eval.sh DEVICE_ID DATA_DIR PATH_CHECKPOINT
Ascend: bash run_eval.sh [DEVICE_ID] [DATA_DIR] [PATH_CHECKPOINT]
CPU: bash scripts/run_eval_cpu.sh DATA_PATH PATH_CHECKPOINT
```

View File

@ -245,11 +245,21 @@ You can start training using python or shell scripts. The usage of shell scripts
- Ascend:
```yaml
# ds_type selects the dataset: imagenet or cifar10.
# For example, to train on cifar10, set:
ds_type: cifar10
````
```bash
# distributed training example (8p)
bash scripts/run_distribute_train_ascend.sh RANK_TABLE_FILE DATA_PATH DATA_DIR
bash scripts/run_distribute_train_ascend.sh [RANK_TABLE_FILE] [DATA_DIR]
# example: bash scripts/run_distribute_train_ascend.sh /root/hccl_8p_01234567_10.155.170.71.json /home/DataSet/cifar10/
# standalone training
bash scripts/run_standalone_train_ascend.sh DEVICE_ID DATA_DIR
bash scripts/run_standalone_train_ascend.sh [DEVICE_ID] [DATA_DIR]
# example: bash scripts/run_standalone_train_ascend.sh 0 /home/DataSet/cifar10/
```
> Notes:
@ -278,9 +288,13 @@ bash scripts/run_standalone_train_cpu.sh DATA_PATH
shell:
Ascend:
# distributed training example (8p)
bash scripts/run_distribute_train_ascend.sh RANK_TABLE_FILE DATA_PATH DATA_DIR
bash scripts/run_distribute_train_ascend.sh [RANK_TABLE_FILE] [DATA_DIR]
# example: bash scripts/run_distribute_train_ascend.sh /root/hccl_8p_01234567_10.155.170.71.json /home/DataSet/cifar10/
# standalone training
bash scripts/run_standalone_train_ascend.sh DEVICE_ID DATA_DIR
bash scripts/run_standalone_train_ascend.sh [DEVICE_ID] [DATA_DIR]
# example: bash scripts/run_standalone_train_ascend.sh 0 /home/DataSet/cifar10/
GPU:
# distributed training example (8p)
bash scripts/run_distribute_train_gpu.sh DATA_PATH
@ -324,7 +338,8 @@ You can start training using python or shell scripts. The usage of shell scripts
- Ascend:
```bash
bash scripts/run_eval_ascend.sh DEVICE_ID DATA_DIR CHECKPOINT_PATH
bash scripts/run_eval_ascend.sh [DEVICE_ID] [DATA_DIR] [CHECKPOINT_PATH]
# example: bash scripts/run_eval_ascend.sh 0 /home/DataSet/cifar10/ /home/model/inceptionv4/ckpt/inceptionv4-train-250_1251
```
- GPU
@ -339,7 +354,7 @@ You can start training using python or shell scripts. The usage of shell scripts
# eval example
shell:
Ascend:
bash scripts/run_eval_ascend.sh DEVICE_ID DATA_DIR CHECKPOINT_PATH
bash scripts/run_eval_ascend.sh [DEVICE_ID] [DATA_DIR] [CHECKPOINT_PATH]
GPU:
bash scripts/run_eval_gpu.sh DATA_DIR CHECKPOINT_PATH
```

View File

@ -73,11 +73,14 @@ Dataset used: [MNIST](<http://yann.lecun.com/exdb/mnist/>)
After installing MindSpore via the official website, you can start training and evaluation as follows:
```python
```bash
# enter script dir, train LeNet
bash run_standalone_train_ascend.sh [DATA_PATH] [CKPT_SAVE_PATH]
bash run_standalone_train_ascend.sh [DATA_PATH] [CKPT_SAVE_PATH]
# example: bash run_standalone_train_ascend.sh /home/DataSet/MNIST/ ./ckpt/
# enter script dir, evaluate LeNet
bash run_standalone_eval_ascend.sh [DATA_PATH] [CKPT_NAME]
# example: bash run_standalone_eval_ascend.sh /home/DataSet/MNIST/ /home/model/lenet/ckpt/checkpoint_lenet-1_1875.ckpt
```
- Running on [ModelArts](https://support.huaweicloud.com/modelarts/)
@ -147,7 +150,7 @@ bash run_standalone_eval_ascend.sh [DATA_PATH] [CKPT_NAME]
- Export on ModelArts (If you want to run in modelarts, please check the official documentation of [modelarts](https://support.huaweicloud.com/modelarts/), and you can start evaluating as follows)
1. Export s8 multiscale and flip with voc val dataset on modelarts, evaluating steps are as follows:
1. The evaluation steps using ModelArts are as follows:
```python
# (1) Perform a or b.
@ -206,8 +209,8 @@ bash run_standalone_eval_ascend.sh [DATA_PATH] [CKPT_NAME]
## [Script Parameters](#contents)
```python
Major parameters in train.py and default_config.yaml as follows:
```default_config.yaml
Major parameters in default_config.yaml are as follows:
--data_path: The absolute full path to the train and evaluation datasets.
--epoch_size: Total training epochs.
@ -228,7 +231,8 @@ Major parameters in train.py and default_config.yaml as follows:
```bash
python train.py --data_path Data --ckpt_path ckpt > log.txt 2>&1 &
# or enter script dir, and run the script
bash run_standalone_train_ascend.sh Data ckpt
bash run_standalone_train_ascend.sh [DATA_PATH] [CKPT_SAVE_PATH]
# example: bash run_standalone_train_ascend.sh /home/DataSet/MNIST/ ./ckpt/
```
After training, the loss values will look like the following:
@ -254,7 +258,8 @@ Before running the command below, please check the checkpoint path used for eval
```bash
python eval.py --data_path Data --ckpt_path ckpt/checkpoint_lenet-1_1875.ckpt > log.txt 2>&1 &
# or enter script dir, and run the script
bash run_standalone_eval_ascend.sh Data ckpt/checkpoint_lenet-1_1875.ckpt
bash run_standalone_eval_ascend.sh [DATA_PATH] [CKPT_NAME]
# example: bash run_standalone_eval_ascend.sh /home/DataSet/MNIST/ /home/model/lenet/ckpt/checkpoint_lenet-1_1875.ckpt
```
You can view the results through the file "log.txt". The accuracy of the test dataset will be as follows:

View File

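@ -75,11 +75,14 @ LeNet is very simple: it contains 5 layers, made up of 2 convolutional layers and 3 fully connected layers.
After installing MindSpore via the official website, you can start training and evaluation as follows: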
```python
```bash
# enter script dir, train LeNet
bash run_standalone_train_ascend.sh [DATA_PATH] [CKPT_SAVE_PATH]
bash run_standalone_train_ascend.sh [DATA_PATH] [CKPT_SAVE_PATH]
# example: bash run_standalone_train_ascend.sh /home/DataSet/MNIST/ ./ckpt/
# enter script dir, evaluate LeNet
bash run_standalone_eval_ascend.sh [DATA_PATH] [CKPT_NAME]
# example: bash run_standalone_eval_ascend.sh /home/DataSet/MNIST/ /home/model/lenet/ckpt/checkpoint_lenet-1_1875.ckpt
```
- Running on ModelArts (if you want to run on ModelArts, refer to the [modelarts](https://support.huaweicloud.com/modelarts/) documentation)
@ -147,7 +150,7 @@ bash run_standalone_eval_ascend.sh [DATA_PATH] [CKPT_NAME]
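- Export on ModelArts (if you want to run on ModelArts, refer to the [modelarts](https://support.huaweicloud.com/modelarts/) documentation)
1. Evaluate s8 multiscale and flip with the voc val dataset. The evaluation steps are as follows:
1. The evaluation steps using ModelArts are as follows: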
```python
# (1) Perform a or b.
@ -206,8 +209,8 @@ bash run_standalone_eval_ascend.sh [DATA_PATH] [CKPT_NAME]
## Script Parameters
```python
Major parameters in train.py and default_config.yaml are as follows:
```default_config.yaml
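Major parameters in default_config.yaml are as follows:
--data_path: The absolute full path to the train and evaluation datasets.
--epoch_size: Total training epochs.
@ -226,7 +229,8 @ Major parameters in train.py and default_config.yaml are as follows: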
```bash
python train.py --data_path Data --ckpt_path ckpt > log.txt 2>&1 &
# or enter script dir, and run the script
bash run_standalone_train_ascend.sh Data ckpt
bash run_standalone_train_ascend.sh [DATA_PATH] [CKPT_SAVE_PATH]
# example: bash run_standalone_train_ascend.sh /home/DataSet/MNIST/ ./ckpt/
```
After training, the loss values will look like the following:
@ -252,7 +256,8 @@ epoch:1 step:1538, loss is 1.0221305
```bash
python eval.py --data_path Data --ckpt_path ckpt/checkpoint_lenet-1_1875.ckpt > log.txt 2>&1 &
# or enter script dir, and run the script
bash run_standalone_eval_ascend.sh Data ckpt/checkpoint_lenet-1_1875.ckpt
bash run_standalone_eval_ascend.sh [DATA_PATH] [CKPT_NAME]
# example: bash run_standalone_eval_ascend.sh /home/DataSet/MNIST/ /home/model/lenet/ckpt/checkpoint_lenet-1_1875.ckpt
```
You can view the results in the log.txt file. The accuracy on the test dataset will be as follows:

View File

@ -72,18 +72,23 @@ After installing MindSpore via the official website, you can start training and
```python
# enter ../lenet directory and train lenet network, then a '.ckpt' file will be generated.
bash run_standalone_train_ascend.sh [DATA_PATH]
# enter lenet dir, train LeNet-Quant
bash run_standalone_train_ascend.sh [DATA_PATH] [CKPT_PATH]
# example: bash run_standalone_train_ascend.sh /home/DataSet/MNIST/ ./ckpt/
# enter lenet_quant dir, train lenet_quant
python train.py --device_target=Ascend --data_path=[DATA_PATH] --ckpt_path=[CKPT_PATH] --dataset_sink_mode=True
# example: python train.py --device_target=Ascend --data_path=/home/DataSet/MNIST/ --ckpt_path=/home/model/lenet/checkpoint_lenet-10_1875.ckpt --dataset_sink_mode=True
# evaluate LeNet-Quant
python eval.py --device_target=Ascend --data_path=[DATA_PATH] --ckpt_path=[CKPT_PATH] --dataset_sink_mode=True
# example: python eval.py --device_target=Ascend --data_path=/home/DataSet/MNIST/ --ckpt_path=/home/model/lenet_quant/checkpoint_lenet-10_937.ckpt --dataset_sink_mode=True
```
## [Script Description](#contents)
## [Script and Sample Code](#contents)
```bash
```lenet_quant
├── model_zoo
├── README.md // descriptions about all the models
├── lenet_quant

View File

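@ -75,19 +75,24 @ LeNet is very simple: it contains 5 layers, made up of 2 convolutional layers and 3 fully connected layers.
After installing MindSpore via the official website, you can start training and evaluation as follows: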
```python
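# enter ../lenet directory and train lenet network, then a '.ckpt' file will be generated.
bash run_standalone_train_ascend.sh [DATA_PATH]
# enter lenet dir, train LeNet-Quant
# enter ../lenet directory and train lenet network; the generated '.ckpt' file serves as the pretrained checkpoint for lenet-quant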
bash run_standalone_train_ascend.sh [DATA_PATH] [CKPT_PATH]
# example: bash run_standalone_train_ascend.sh /home/DataSet/MNIST/ ./ckpt/
# enter lenet-quant dir, train lenet-quant
python train.py --device_target=Ascend --data_path=[DATA_PATH] --ckpt_path=[CKPT_PATH] --dataset_sink_mode=True
# evaluate LeNet-Quant
# example: python train.py --device_target=Ascend --data_path=/home/DataSet/MNIST/ --ckpt_path=/home/model/lenet/checkpoint_lenet-10_1875.ckpt --dataset_sink_mode=True
# evaluate lenet-quant
python eval.py --device_target=Ascend --data_path=[DATA_PATH] --ckpt_path=[CKPT_PATH] --dataset_sink_mode=True
# example: python eval.py --device_target=Ascend --data_path=/home/DataSet/MNIST/ --ckpt_path=/home/model/lenet_quant/checkpoint_lenet-10_937.ckpt --dataset_sink_mode=True
```
## Script Description
### Script and Sample Code
```bash
```lenet_quant
├── model_zoo
├── README.md // descriptions about all the models
├── lenet_quant

View File

@ -106,10 +106,12 @@ pip install mmcv=0.2.14
On Ascend:
# distributed training
bash run_distribute_train.sh [RANK_TABLE_FILE] [PRETRAINED_CKPT]
bash run_distribute_train.sh [RANK_TABLE_FILE] [DATA_PATH] [PRETRAINED_CKPT(optional)]
# example: bash run_distribute_train.sh /root/hccl_8p_01234567_10.155.170.71.json /home/DataSet/cocodataset/
# standalone training
bash run_standalone_train.sh [PRETRAINED_CKPT]
bash run_standalone_train.sh [DATA_PATH] [PRETRAINED_CKPT(optional)]
# example: bash run_standalone_train.sh /home/DataSet/cocodataset/
On CPU:
@ -128,7 +130,8 @@ pip install mmcv=0.2.14
```bash
# Evaluation on Ascend
bash run_eval.sh [VALIDATION_JSON_FILE] [CHECKPOINT_PATH]
bash run_eval.sh [ANN_FILE] [CHECKPOINT_PATH] [DATA_PATH]
# example: bash run_eval.sh /home/DataSet/cocodataset/annotations/instances_val2017.json /home/model/maskrcnn_mobilenetv1/ckpt/mask_rcnn-5_7393.ckpt /home/DataSet/cocodataset/
# Evaluation on CPU
bash run_eval_cpu.sh [ANN_FILE] [CHECKPOINT_PATH]
@ -347,10 +350,12 @@ pip install mmcv=0.2.14
On Ascend:
# distributed training
Usage: bash run_distribute_train.sh [RANK_TABLE_FILE] [PRETRAINED_MODEL]
Usage: bash run_distribute_train.sh [RANK_TABLE_FILE] [DATA_PATH] [PRETRAINED_CKPT(optional)]
# example: bash run_distribute_train.sh /root/hccl_8p_01234567_10.155.170.71.json /home/DataSet/cocodataset/
# standalone training
Usage: bash run_standalone_train.sh [PRETRAINED_MODEL]
Usage: bash run_standalone_train.sh [DATA_PATH] [PRETRAINED_CKPT(optional)]
# example: bash run_standalone_train.sh /home/DataSet/cocodataset/
On CPU:
@ -360,7 +365,7 @@ Usage: bash run_standalone_train_cpu.sh [PRETRAINED_MODEL](optional)
### [Parameters Configuration](#contents)
```bash
```default_config.yaml
"img_width": 1280, # width of the input images
"img_height": 768, # height of the input images
@ -510,7 +515,8 @@ Usage: bash run_standalone_train_cpu.sh [PRETRAINED_MODEL](optional)
```bash
# standalone training
bash run_standalone_train.sh [PRETRAINED_MODEL]
bash run_standalone_train.sh [DATA_PATH] [PRETRAINED_CKPT(optional)]
# example: bash run_standalone_train.sh /home/DataSet/cocodataset/
```
- Run `run_standalone_train_cpu.sh` for non-distributed training of maskrcnn_mobilenetv1 model on CPU.
@ -525,7 +531,8 @@ bash run_standalone_train_cpu.sh [PRETRAINED_MODEL](optional)
- Run `run_distribute_train.sh` for distributed training of Mask model on Ascend.
```bash
bash run_distribute_train.sh [RANK_TABLE_FILE] [PRETRAINED_MODEL]
bash run_distribute_train.sh [RANK_TABLE_FILE] [DATA_PATH] [PRETRAINED_MODEL(optional)]
# example: bash run_distribute_train.sh /root/hccl_8p_01234567_10.155.170.71.json /home/DataSet/cocodataset/
```
> The hccl.json file specified by RANK_TABLE_FILE is required when you run a distributed task. You can generate it with [hccl_tools](https://gitee.com/mindspore/mindspore/tree/master/model_zoo/utils/hccl_tools).
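The positional arguments of `run_distribute_train.sh` can also be assembled from Python. The following is only a minimal sketch: the helper name `distribute_train_args` is hypothetical and not part of the repository, and the argument order is taken from the usage line `[RANK_TABLE_FILE] [DATA_PATH] [PRETRAINED_CKPT(optional)]`.

```python
from typing import List, Optional


def distribute_train_args(rank_table_file: str,
                          data_path: str,
                          pretrained_ckpt: Optional[str] = None) -> List[str]:
    """Build the argument list for run_distribute_train.sh (hypothetical helper).

    Positional order follows the documented usage:
    [RANK_TABLE_FILE] [DATA_PATH] [PRETRAINED_CKPT(optional)].
    """
    args = ["bash", "run_distribute_train.sh", rank_table_file, data_path]
    if pretrained_ckpt:
        # The pretrained checkpoint is optional, so append it only when given.
        args.append(pretrained_ckpt)
    return args


if __name__ == "__main__":
    cmd = distribute_train_args(
        "/root/hccl_8p_01234567_10.155.170.71.json",
        "/home/DataSet/cocodataset/",
    )
    # Print the command instead of running it; launching would require an
    # Ascend environment and the script directory as working directory.
    print(" ".join(cmd))
```

Building the command as a list rather than a single string keeps paths with spaces intact if the list is later passed to `subprocess.run`.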
@ -536,7 +543,7 @@ bash run_distribute_train.sh [RANK_TABLE_FILE] [PRETRAINED_MODEL]
Training results will be stored in the example path, in a folder whose name begins with "train" or "train_parallel". You can find the checkpoint files there, together with results like the following in loss_rankid.log.
```bash
```log
# distributed training result (8p)
2123 epoch: 1 step: 7393 ,rpn_loss: 0.24854, rcnn_loss: 1.04492, rpn_cls_loss: 0.19238, rpn_reg_loss: 0.05603, rcnn_cls_loss: 0.47510, rcnn_reg_loss: 0.16919, rcnn_mask_loss: 0.39990, total_loss: 1.29346
3973 epoch: 2 step: 7393 ,rpn_loss: 0.02769, rcnn_loss: 0.51367, rpn_cls_loss: 0.01746, rpn_reg_loss: 0.01023, rcnn_cls_loss: 0.24255, rcnn_reg_loss: 0.05630, rcnn_mask_loss: 0.21484, total_loss: 0.54137
@ -557,7 +564,8 @@ Training result will be stored in the example path, whose folder name begins wit
```bash
# infer
bash run_eval.sh [VALIDATION_ANN_FILE_JSON] [CHECKPOINT_PATH]
bash run_eval.sh [VALIDATION_ANN_FILE_JSON] [CHECKPOINT_PATH] [DATA_PATH]
# example: bash run_eval.sh /home/DataSet/cocodataset/annotations/instances_val2017.json /home/model/maskrcnn_mobilenetv1/ckpt/mask_rcnn-5_7393.ckpt /home/DataSet/cocodataset/
```
> For the COCO2017 dataset, VALIDATION_ANN_FILE_JSON refers to annotations/instances_val2017.json in the dataset directory.
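Since the annotation file lives at a fixed relative location inside the COCO2017 dataset directory, its path can be derived from `DATA_PATH`. The sketch below assumes exactly that layout; `eval_args` is a hypothetical helper name, and the argument order follows the usage line `[VALIDATION_ANN_FILE_JSON] [CHECKPOINT_PATH] [DATA_PATH]`.

```python
import os
from typing import List


def eval_args(data_path: str, checkpoint_path: str) -> List[str]:
    """Build the argument list for run_eval.sh (hypothetical helper).

    Assumes the COCO2017 layout described in the note: the validation
    annotations are at annotations/instances_val2017.json under data_path.
    """
    ann_file = os.path.join(data_path, "annotations", "instances_val2017.json")
    return ["bash", "run_eval.sh", ann_file, checkpoint_path, data_path]


if __name__ == "__main__":
    cmd = eval_args(
        "/home/DataSet/cocodataset/",
        "/home/model/maskrcnn_mobilenetv1/ckpt/mask_rcnn-5_7393.ckpt",
    )
    # Only print the command; running it needs the Ascend environment.
    print(" ".join(cmd))
```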
@ -567,7 +575,7 @@ bash run_eval.sh [VALIDATION_ANN_FILE_JSON] [CHECKPOINT_PATH]
Inference results will be stored in the example path, in a folder named "eval". There you can find results like the following in the log.
```bash
```log
Evaluate annotation type *bbox*
Accumulating evaluation results...
Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.227
@ -625,7 +633,7 @@ bash run_infer_310.sh [MINDIR_PATH] [DATA_PATH] [ANN_FILE] [DEVICE_ID]
The inference result is saved in the current path; you can find results like the following in the acc.log file.
```bash
```log
Evaluate annotation type *bbox*
Accumulating evaluation results...
Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.227

View File

@ -46,14 +46,29 @@ Dataset used: [ImageNet2012](http://www.image-net.org/)
- Test: 50,000 images
- Data format: jpeg
- Note: Data will be processed in dataset.py
Dataset used: [CIFAR-10](http://www.cs.toronto.edu/~kriz/cifar.html)
- Dataset size: 175M, 60,000 32*32 colorful images in 10 classes
- Train: 146M, 50,000 images
- Test: 29M, 10,000 images
- Data format: binary files
- Note: Data will be processed in dataset.py
- Download the dataset, the directory structure is as follows:
```bash
└─dataset
├─ilsvrc # train dataset
```ImageNet2012
└─ImageNet_Original
├─train # train dataset
└─validation_preprocess # evaluate dataset
```
```cifar10
└─cifar10
├─cifar-10-batches-bin # train dataset
└─cifar-10-verify-bin # evaluate dataset
```
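Before launching a run, it can help to confirm that the dataset root matches the directory structure shown above. The following is a minimal sketch, not part of the repository: `EXPECTED_LAYOUT` and `missing_subdirs` are hypothetical names, and the subdirectory names are copied from the two trees above.

```python
import os
from typing import Dict, List

# Expected subdirectories per dataset, copied from the directory trees above.
# The keys mirror the [cifar10|imagenet2012] argument of the shell scripts.
EXPECTED_LAYOUT: Dict[str, List[str]] = {
    "imagenet2012": ["train", "validation_preprocess"],
    "cifar10": ["cifar-10-batches-bin", "cifar-10-verify-bin"],
}


def missing_subdirs(dataset: str, root: str) -> List[str]:
    """Return the expected subdirectories that do not exist under root."""
    return [name for name in EXPECTED_LAYOUT[dataset]
            if not os.path.isdir(os.path.join(root, name))]


if __name__ == "__main__":
    # Example path only; replace with the real dataset root.
    missing = missing_subdirs("cifar10", "/home/DataSet/cifar10/")
    if missing:
        print("missing directories:", ", ".join(missing))
    else:
        print("dataset layout looks complete")
```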
## Features
### Mixed Precision(Ascend)
@ -223,6 +238,9 @@ For FP16 operators, if the input data type is FP32, the backend of MindSpore wil
You can start training using python or shell scripts. The usage of shell scripts is as follows:
- Ascend: bash run_distribute_train.sh [cifar10|imagenet2012] [RANK_TABLE_FILE] [DATASET_PATH] [PRETRAINED_CKPT_PATH] (optional)
# example: bash run_distribute_train.sh cifar10 /root/hccl_8p_01234567_10.155.170.71.json /home/DataSet/cifar10/cifar-10-batches-bin/
# example: bash run_distribute_train.sh imagenet2012 /root/hccl_8p_01234567_10.155.170.71.json /home/DataSet/ImageNet_Original/
- CPU: bash run_train_CPU.sh [cifar10|imagenet2012] [DATASET_PATH] [PRETRAINED_CKPT_PATH] (optional)
- GPU(single device): bash run_standalone_train_gpu.sh [cifar10|imagenet2012] [DATASET_PATH] [PRETRAINED_CKPT_PATH](optional)
- GPU(distribute training): bash run_distribute_train_gpu.sh [cifar10|imagenet2012] [CONFIG_PATH] [DATASET_PATH] [PRETRAINED_CKPT_PATH](optional)
@ -246,6 +264,9 @@ Please follow the instructions in the link [hccn_tools](https://gitee.com/mindsp
shell:
Ascend: bash run_distribute_train.sh [cifar10|imagenet2012] [RANK_TABLE_FILE] [DATASET_PATH] [PRETRAINED_CKPT_PATH](optional)
# example: bash run_distribute_train.sh cifar10 /root/hccl_8p_01234567_10.155.170.71.json /home/DataSet/cifar10/cifar-10-batches-bin/
# example: bash run_distribute_train.sh imagenet2012 /root/hccl_8p_01234567_10.155.170.71.json /home/DataSet/ImageNet_Original/
CPU: bash run_train_CPU.sh [cifar10|imagenet2012] [DATASET_PATH] [PRETRAINED_CKPT_PATH](optional)
GPU(single device): bash run_standalone_train_gpu.sh [cifar10|imagenet2012] [DATASET_PATH] [PRETRAINED_CKPT_PATH](optional)
GPU(distribute training): bash run_distribute_train_gpu.sh [cifar10|imagenet2012] [CONFIG_PATH] [DATASET_PATH] [PRETRAINED_CKPT_PATH](optional)
@ -278,6 +299,9 @@ Epoch time: 320744.265, per step time: 256.390
You can start training using python or shell scripts. If the train method is train or fine tune, do not input `[CHECKPOINT_PATH]`. The usage of shell scripts is as follows:
- Ascend: bash run_eval.sh [cifar10|imagenet2012] [DATASET_PATH] [CHECKPOINT_PATH]
# example: bash run_eval.sh cifar10 /home/DataSet/cifar10/cifar-10-verify-bin/ /home/model/mobilenetv1/ckpt/cifar10/mobilenetv1-90_1562.ckpt
# example: bash run_eval.sh imagenet2012 /home/DataSet/ImageNet_Original/ /home/model/mobilenetv1/ckpt/imagenet2012/mobilenetv1-90_625.ckpt
- CPU: bash run_eval_CPU.sh [cifar10|imagenet2012] [DATASET_PATH] [CHECKPOINT_PATH]
### Launch
@ -291,6 +315,9 @@ You can start training using python or shell scripts.If the train method is trai
shell:
Ascend: bash run_eval.sh [cifar10|imagenet2012] [DATASET_PATH] [CHECKPOINT_PATH]
# example: bash run_eval.sh cifar10 /home/DataSet/cifar10/cifar-10-verify-bin/ /home/model/mobilenetv1/ckpt/cifar10/mobilenetv1-90_1562.ckpt
# example: bash run_eval.sh imagenet2012 /home/DataSet/ImageNet_Original/ /home/model/mobilenetv1/ckpt/imagenet2012/mobilenetv1-90_625.ckpt
CPU: bash run_eval_CPU.sh [cifar10|imagenet2012] [DATASET_PATH] [CHECKPOINT_PATH]
```

View File

@ -22,7 +22,7 @@ pretrain_epoch_size: 0
save_checkpoint: True
save_checkpoint_epochs: 5
keep_checkpoint_max: 10
save_checkpoint_path: "/cache/train"
save_checkpoint_path: "./"
warmup_epochs: 5
lr_decay_mode: "poly"
lr_init: 0.01

View File

@ -225,7 +225,7 @@ For FP16 operators, if the input data type is FP32, the backend of MindSpore wil
You can start training using python or shell scripts. The usage of shell scripts is as follows:
- Ascend: bash run_train.sh Ascend [DEVICE_NUM] [VISIABLE_DEVICES(0,1,2,3,4,5,6,7)] [RANK_TABLE_FILE] [DATASET_PATH] [CKPT_PATH] [FREEZE_LAYER] [FILTER_HEAD]
- Ascend: bash run_train.sh Ascend [CONFIG_PATH] [DEVICE_NUM] [VISIABLE_DEVICES(0,1,2,3,4,5,6,7)] [RANK_TABLE_FILE] [DATASET_PATH] [CKPT_PATH(optional)] [FREEZE_LAYER(optional)] [FILTER_HEAD(optional)]
- GPU: bash run_train.sh GPU [DEVICE_NUM] [VISIABLE_DEVICES(0,1,2,3,4,5,6,7)] [DATASET_PATH] [CKPT_PATH] [FREEZE_LAYER] [FILTER_HEAD]
- CPU: bash run_train.sh CPU [DATASET_PATH] [CKPT_PATH] [FREEZE_LAYER] [FILTER_HEAD]
@ -273,29 +273,35 @@ You can start training using python or shell scripts. The usage of shell scripts
CPU: python train.py --platform CPU --dataset_path [TRAIN_DATASET_PATH]
shell:
Ascend: bash run_train.sh Ascend default_config.yaml 8 0,1,2,3,4,5,6,7 hccl_config.json [TRAIN_DATASET_PATH]
Ascend: bash run_train.sh Ascend [CONFIG_PATH] [DEVICE_NUM] [VISIABLE_DEVICES(0,1,2,3,4,5,6,7)] [RANK_TABLE_FILE] [DATASET_PATH]
# example: bash run_train.sh Ascend default_config.yaml 8 0,1,2,3,4,5,6,7 /root/hccl_8p_01234567_10.155.170.71.json /home/DataSet/ImageNet_Original/
GPU: bash run_train.sh GPU 8 0,1,2,3,4,5,6,7 [TRAIN_DATASET_PATH]
CPU: bash run_train.sh CPU [TRAIN_DATASET_PATH]
# fine tune whole network example
# finetune whole network example
python:
Ascend: python train.py --platform Ascend --config_path [CONFIG_PATH] --dataset_path [TRAIN_DATASET_PATH] --pretrain_ckpt [CKPT_PATH] --freeze_layer none --filter_head True
GPU: python train.py --platform GPU --dataset_path [TRAIN_DATASET_PATH] --pretrain_ckpt [CKPT_PATH] --freeze_layer none --filter_head True
CPU: python train.py --platform CPU --dataset_path [TRAIN_DATASET_PATH] --pretrain_ckpt [CKPT_PATH] --freeze_layer none --filter_head True
shell:
Ascend: bash run_train.sh Ascend default_config.yaml 8 0,1,2,3,4,5,6,7 hccl_config.json [TRAIN_DATASET_PATH] [CKPT_PATH] none True
Ascend: bash run_train.sh Ascend [CONFIG_PATH] [DEVICE_NUM] [VISIABLE_DEVICES(0,1,2,3,4,5,6,7)] [RANK_TABLE_FILE] [DATASET_PATH] [CKPT_PATH] [FREEZE_LAYER] [FILTER_HEAD]
# example: bash run_train.sh Ascend default_config.yaml 8 0,1,2,3,4,5,6,7 /root/hccl_8p_01234567_10.155.170.71.json /home/DataSet/ImageNet_Original/ /home/model/mobilenetv2/predtrain/mobilenet-200_625.ckpt none True
GPU: bash run_train.sh GPU 8 0,1,2,3,4,5,6,7 [TRAIN_DATASET_PATH] [CKPT_PATH] none True
CPU: bash run_train.sh CPU [TRAIN_DATASET_PATH] [CKPT_PATH] none True
# fine tune full connected layers example
# finetune full connected layers example
python:
Ascend: python train.py --platform Ascend --config_path default_config.yaml --dataset_path [TRAIN_DATASET_PATH] --pretrain_ckpt [CKPT_PATH] --freeze_layer backbone
GPU: python train.py --platform GPU --dataset_path [TRAIN_DATASET_PATH] --pretrain_ckpt [CKPT_PATH] --freeze_layer backbone
CPU: python train.py --platform CPU --dataset_path [TRAIN_DATASET_PATH] --pretrain_ckpt [CKPT_PATH] --freeze_layer backbone
shell:
Ascend: bash run_train.sh Ascend default_config.yaml 8 0,1,2,3,4,5,6,7 hccl_config.json [TRAIN_DATASET_PATH] [CKPT_PATH] backbone
Ascend: bash run_train.sh Ascend [CONFIG_PATH] [DEVICE_NUM] [VISIABLE_DEVICES(0,1,2,3,4,5,6,7)] [RANK_TABLE_FILE] [DATASET_PATH] [CKPT_PATH] [FREEZE_LAYER]
# example: bash run_train.sh Ascend default_config.yaml 8 0,1,2,3,4,5,6,7 /root/hccl_8p_01234567_10.155.170.71.json /home/DataSet/ImageNet_Original/ /home/model/mobilenetv2/backbone/mobilenet-200_625.ckpt backbone
GPU: bash run_train.sh GPU 8 0,1,2,3,4,5,6,7 [TRAIN_DATASET_PATH] [CKPT_PATH] backbone
CPU: bash run_train.sh CPU [TRAIN_DATASET_PATH] [CKPT_PATH] backbone
```
@ -304,7 +310,7 @@ You can start training using python or shell scripts. The usage of shell scripts
Training results will be stored in the example path. Checkpoints will be stored at `./checkpoint` by default. The training log will be redirected to `./train.log` on the CPU and GPU platforms, and written to `./train/rank*/log*.log` on the Ascend platform, like the following:
```shell
```log
epoch: [ 0/200], step:[ 624/ 625], loss:[5.258/5.258], time:[140412.236], lr:[0.100]
epoch time: 140522.500, per step time: 224.836, avg loss: 5.258
epoch: [ 1/200], step:[ 624/ 625], loss:[3.917/3.917], time:[138221.250], lr:[0.200]
@ -331,7 +337,9 @@ You can start training using python or shell scripts.If the train method is trai
CPU: python eval.py --platform CPU --dataset_path [VAL_DATASET_PATH] --pretrain_ckpt ./ckpt_0/mobilenetv2_15.ckpt
shell:
Ascend: bash run_eval.sh Ascend [VAL_DATASET_PATH] ./checkpoint/mobilenetv2_head_15.ckpt
Ascend: bash run_eval.sh Ascend [DATASET_PATH] [CHECKPOINT_PATH]
# example: bash run_eval.sh Ascend /home/DataSet/ImageNet_Original/ /home/model/mobilenetV2/ckpt/mobilenet-200_625.ckpt
GPU: bash run_eval.sh GPU [VAL_DATASET_PATH] ./checkpoint/mobilenetv2_head_15.ckpt
CPU: bash run_eval.sh CPU [VAL_DATASET_PATH] ./checkpoint/mobilenetv2_head_15.ckpt
```
@ -342,7 +350,7 @@ You can start training using python or shell scripts.If the train method is trai
Inference results will be stored in the example path; you can find results like the following in `eval.log`.
```shell
```log
result: {'acc': 0.71976314102564111} ckpt=./ckpt_0/mobilenet-200_625.ckpt
```
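To collect the accuracy from `eval.log` programmatically, the result line can be parsed. This is a minimal sketch that assumes the log line has exactly the shape shown above (`result: {...} ckpt=...`); the function name `parse_result_line` is hypothetical and not part of the repository.

```python
import ast
import re
from typing import Dict, Tuple

# Matches lines of the form: result: {'acc': 0.71} ckpt=./path/to/file.ckpt
RESULT_RE = re.compile(r"^result:\s*(\{.*\})\s+ckpt=(\S+)\s*$")


def parse_result_line(line: str) -> Tuple[Dict[str, float], str]:
    """Split an eval.log result line into (metrics dict, checkpoint path)."""
    match = RESULT_RE.match(line.strip())
    if match is None:
        raise ValueError(f"not a result line: {line!r}")
    # The metrics part is a Python dict literal, so literal_eval parses it
    # safely without executing arbitrary code.
    metrics = ast.literal_eval(match.group(1))
    return metrics, match.group(2)


if __name__ == "__main__":
    sample = "result: {'acc': 0.71976314102564111} ckpt=./ckpt_0/mobilenet-200_625.ckpt"
    metrics, ckpt = parse_result_line(sample)
    print(f"{ckpt}: acc={metrics['acc']:.4f}")
```

`ast.literal_eval` is used instead of `eval` because the log content is untrusted text; it accepts only literals.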

View File

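@ -227,7 +227,7 @ The overall network architecture of MobileNetV2 is as follows
You can start training using python or shell scripts. The usage of shell scripts is as follows: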
- Ascend: sh run_train.sh Ascend [CONFIG_PATH] [DEVICE_NUM] [VISIABLE_DEVICES(0,1,2,3,4,5,6,7)] [RANK_TABLE_FILE] [DATASET_PATH] [CKPT_PATH] [FREEZE_LAYER] [FILTER_HEAD]
- Ascend: bash run_train.sh Ascend [CONFIG_PATH] [DEVICE_NUM] [VISIABLE_DEVICES(0,1,2,3,4,5,6,7)] [RANK_TABLE_FILE] [DATASET_PATH] [CKPT_PATH(optional)] [FREEZE_LAYER(optional)] [FILTER_HEAD(optional)]
- GPU: bash run_train.sh GPU [DEVICE_NUM] [VISIABLE_DEVICES(0,1,2,3,4,5,6,7)] [DATASET_PATH] [CKPT_PATH] [FREEZE_LAYER] [FILTER_HEAD]
- CPU: bash run_train.sh CPU [DATASET_PATH] [CKPT_PATH] [FREEZE_LAYER] [FILTER_HEAD]
@ -275,7 +275,9 @ The overall network architecture of MobileNetV2 is as follows
CPU: python train.py --platform CPU --dataset_path [TRAIN_DATASET_PATH]
shell:
Ascend: bash run_train.sh Ascend default_config.yaml 8 0,1,2,3,4,5,6,7 hccl_config.json [TRAIN_DATASET_PATH]
Ascend: bash run_train.sh Ascend [CONFIG_PATH] [DEVICE_NUM] [VISIABLE_DEVICES(0,1,2,3,4,5,6,7)] [RANK_TABLE_FILE] [DATASET_PATH]
# example: bash run_train.sh Ascend default_config.yaml 8 0,1,2,3,4,5,6,7 /root/hccl_8p_01234567_10.155.170.71.json /home/DataSet/ImageNet_Original/
GPU: bash run_train.sh GPU 8 0,1,2,3,4,5,6,7 [TRAIN_DATASET_PATH]
CPU: bash run_train.sh CPU [TRAIN_DATASET_PATH]
@ -286,7 +288,9 @ The overall network architecture of MobileNetV2 is as follows
CPU: python train.py --platform CPU --dataset_path [TRAIN_DATASET_PATH] --pretrain_ckpt [CKPT_PATH] --freeze_layer none --filter_head True
shell:
Ascend: bash run_train.sh Ascend default_config.yaml 8 0,1,2,3,4,5,6,7 hccl_config.json [TRAIN_DATASET_PATH] [CKPT_PATH] none True
Ascend: bash run_train.sh Ascend [CONFIG_PATH] [DEVICE_NUM] [VISIABLE_DEVICES(0,1,2,3,4,5,6,7)] [RANK_TABLE_FILE] [DATASET_PATH] [CKPT_PATH] [FREEZE_LAYER] [FILTER_HEAD]
# example: bash run_train.sh Ascend default_config.yaml 8 0,1,2,3,4,5,6,7 /root/hccl_8p_01234567_10.155.170.71.json /home/DataSet/ImageNet_Original/ /home/model/mobilenetv2/predtrain/mobilenet-200_625.ckpt none True
GPU: bash run_train.sh GPU 8 0,1,2,3,4,5,6,7 [TRAIN_DATASET_PATH] [CKPT_PATH] none True
CPU: bash run_train.sh CPU [TRAIN_DATASET_PATH] [CKPT_PATH] none True
@ -297,7 +301,8 @ The overall network architecture of MobileNetV2 is as follows
CPU: python train.py --platform CPU --dataset_path [TRAIN_DATASET_PATH] --pretrain_ckpt [CKPT_PATH] --freeze_layer backbone
shell:
Ascend: bash run_train.sh Ascend default_config.yaml 8 0,1,2,3,4,5,6,7 hccl_config.json [TRAIN_DATASET_PATH] [CKPT_PATH] backbone
Ascend: bash run_train.sh Ascend [CONFIG_PATH] [DEVICE_NUM] [VISIABLE_DEVICES(0,1,2,3,4,5,6,7)] [RANK_TABLE_FILE] [DATASET_PATH] [CKPT_PATH] [FREEZE_LAYER]
# example: bash run_train.sh Ascend default_config.yaml 8 0,1,2,3,4,5,6,7 /root/hccl_8p_01234567_10.155.170.71.json /home/DataSet/ImageNet_Original/ /home/model/mobilenetv2/backbone/mobilenet-200_625.ckpt backbone
GPU: bash run_train.sh GPU 8 0,1,2,3,4,5,6,7 [TRAIN_DATASET_PATH] [CKPT_PATH] backbone
CPU: bash run_train.sh CPU [TRAIN_DATASET_PATH] [CKPT_PATH] backbone
```
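@ -306,7 +311,7 @ The overall network architecture of MobileNetV2 is as follows
Training results are stored in the example path. Checkpoints are stored at `./checkpoint` by default. The training log is redirected to `./train.log` on CPU and GPU, and written to `./train/rank*/log*.log` on Ascend.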
```shell
```log
epoch:[ 0/200], step:[ 624/ 625], loss:[5.258/5.258], time:[140412.236], lr:[0.100]
epoch time:140522.500, per step time:224.836, avg loss:5.258
epoch:[ 1/200], step:[ 624/ 625], loss:[3.917/3.917], time:[138221.250], lr:[0.200]
@ -317,7 +322,7 @@ epoch time:138331.250, per step time:221.330, avg loss:3.917
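### Usage
You can start training using python or shell scripts. If the train method is train or fine tune, do not input `[CHECKPOINT_PATH]`. The usage of shell scripts is as follows:
You can start training using python or shell scripts. If the train method is train or finetune, do not input `[CHECKPOINT_PATH]`. The usage of shell scripts is as follows: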
- Ascend: bash run_eval.sh Ascend [DATASET_PATH] [CHECKPOINT_PATH]
- GPU: bash run_eval.sh GPU [DATASET_PATH] [CHECKPOINT_PATH]
@ -333,7 +338,9 @@ epoch time:138331.250, per step time:221.330, avg loss:3.917
CPU: python eval.py --platform CPU --dataset_path [VAL_DATASET_PATH] --pretrain_ckpt ./ckpt_0/mobilenetv2_15.ckpt
shell:
Ascend: bash run_eval.sh Ascend [VAL_DATASET_PATH] ./checkpoint/mobilenetv2_head_15.ckpt
Ascend: bash run_eval.sh Ascend [DATASET_PATH] [CHECKPOINT_PATH]
# example: bash run_eval.sh Ascend /home/DataSet/ImageNet_Original/ /home/model/mobilenetV2/ckpt/mobilenet-200_625.ckpt
GPU: bash run_eval.sh GPU [VAL_DATASET_PATH] ./checkpoint/mobilenetv2_head_15.ckpt
CPU: bash run_eval.sh CPU [VAL_DATASET_PATH] ./checkpoint/mobilenetv2_head_15.ckpt
```
@ -344,7 +351,7 @@ epoch time:138331.250, per step time:221.330, avg loss:3.917
Inference results are stored in the example path; you can find results like the following in `eval.log`.
```shell
```log
result:{'acc':0.71976314102564111} ckpt=./ckpt_0/mobilenet-200_625.ckpt
```