Tiny-DarkNet Description

Tiny-DarkNet is a 16-layer image classification network for the classic ImageNet dataset, proposed by Joseph Chet Redmon and others. It is a simplified version of Darknet, designed to minimize model size for users who need smaller models. It achieves better image classification capability than AlexNet and SqueezeNet while using fewer parameters than either. To keep the model small, Tiny-DarkNet uses no fully connected layers; it consists only of convolutional layers, max pooling layers, and an average pooling layer.

For more detailed information on Tiny-DarkNet, please refer to the official introduction.

Model Architecture

Specifically, the Tiny-DarkNet network consists of 1×1 convolutions, 3×3 convolutions, 2×2 max pooling layers, and a global average pooling layer. These layers are stacked to convert the input image into a 1×1000 vector.
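
To illustrate the last step, the sketch below applies global average pooling in plain Python: each channel map is reduced to its mean, so a feature map with 1000 channels becomes a 1×1000 vector. The function name and the 7×7 toy input are illustrative assumptions, not code taken from src/tinydarknet.py.

```python
from typing import List


def global_average_pool(feature_map: List[List[List[float]]]) -> List[float]:
    """Reduce a [channels][height][width] feature map to one mean per channel."""
    pooled = []
    for channel in feature_map:
        values = [v for row in channel for v in row]
        pooled.append(sum(values) / len(values))
    return pooled


# Toy input: 1000 channels of 7x7 values (the 7x7 spatial size is an assumption).
toy = [[[float(c)] * 7 for _ in range(7)] for c in range(1000)]
vector = global_average_pool(toy)
print(len(vector))  # one value per class
```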

Dataset

In the following sections, we introduce how to run the scripts using the dataset below:

For the dataset used, refer to the paper.

Dataset size: 125G, 1250k color images in 1000 classes
- Train: 120G, 1200k images
- Test: 5G, 50k images
Data format: RGB images
- Note: Data will be processed in src/dataset.py

Environment Requirements

Hardware (Ascend/CPU)
- Prepare a hardware environment with an Ascend or CPU processor.
Framework
- MindSpore
For more information, please check the resources below:
- MindSpore Tutorials
- MindSpore Python API

Quick Start

After installing MindSpore via the official website, you can start training and evaluation as follows:

Running on Ascend:
```
# run training example
bash ./scripts/run_standalone_train.sh 0

# run distributed training example
bash ./scripts/run_distribute_train.sh /{path}/*.json

# run evaluation example
python eval.py > eval.log 2>&1 &
# OR
bash ./scripts/run_eval.sh
```
For distributed training, an HCCL configuration file [RANK_TABLE_FILE] in JSON format must be created in advance.

Please follow the instructions in the link below:

https://gitee.com/mindspore/mindspore/tree/master/model_zoo/utils/hccl_tools.
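
Before launching distributed training, it can help to confirm that the rank table file exists and parses as JSON. The helper below is a minimal sketch; the function name `load_rank_table` is hypothetical, and the only structural assumption is that the file holds a JSON object at the top level. It does not validate the HCCL-specific fields.

```python
import json
from pathlib import Path


def load_rank_table(path: str) -> dict:
    """Read a rank table file and fail early if it is missing or not valid JSON."""
    table_path = Path(path)
    if not table_path.is_file():
        raise FileNotFoundError(f"rank table not found: {table_path}")
    with table_path.open(encoding="utf-8") as f:
        table = json.load(f)  # raises json.JSONDecodeError on malformed input
    if not isinstance(table, dict):
        # Assumption: the rank table is a JSON object at the top level.
        raise ValueError("expected the rank table to be a JSON object")
    return table
```

Calling `load_rank_table` with the path you intend to pass to run_distribute_train.sh surfaces a missing or malformed file before any device is started.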

Running on ModelArts

If you want to run on ModelArts, please check the official ModelArts documentation. You can then start training as follows.

Training with 8 cards on ModelArts

# (1) Upload the code folder to the S3 bucket.
# (2) Click "create training task" on the website UI.
# (3) Set the code directory to "/{path}/tinydarknet" on the website UI.
# (4) Set the startup file to "/{path}/tinydarknet/train.py" on the website UI.
# (5) Perform a or b.
#     a. Set parameters in /{path}/tinydarknet/imagenet_config.yaml.
#         1. Set "batch_size: 64" (optional)
#         2. Set "enable_modelarts: True"
#         3. Set "modelarts_dataset_unzip_name: {filename}" if the data is uploaded as a zip package.
#     b. Add parameters on the website UI.
#         1. Add "batch_size=64" (optional)
#         2. Add "enable_modelarts=True"
#         3. Add "modelarts_dataset_unzip_name={filename}" if the data is uploaded as a zip package.
# (6) Upload the dataset or the zip package of the dataset to the S3 bucket.
# (7) Check the "data storage location" on the website UI and set the "Dataset path" (this path must contain only the data or the zip package).
# (8) Set the "Output file path" and "Job log path" to your paths on the website UI.
# (9) Under "resource pool selection", select the 8-card specification.
# (10) Create your job.

Evaluating with a single card on ModelArts

# (1) Upload the code folder to the S3 bucket.
# (2) Click "create training task" on the website UI.
# (3) Set the code directory to "/{path}/tinydarknet" on the website UI.
# (4) Set the startup file to "/{path}/tinydarknet/eval.py" on the website UI.
# (5) Perform a or b.
#     a. Set parameters in /{path}/tinydarknet/imagenet_config.yaml.
#         1. Set "enable_modelarts: True"
#         2. Set "checkpoint_path: {checkpoint_path}" ({checkpoint_path} is the path of the weight file to be evaluated, relative to the file 'eval.py'; the weight file must be included in the code directory.)
#         3. Set "modelarts_dataset_unzip_name: {filename}" if the data is uploaded as a zip package.
#     b. Add parameters on the website UI.
#         1. Add "enable_modelarts=True"
#         2. Add "checkpoint_path={checkpoint_path}" ({checkpoint_path} is the path of the weight file to be evaluated, relative to the file 'eval.py'; the weight file must be included in the code directory.)
#         3. Add "modelarts_dataset_unzip_name={filename}" if the data is uploaded as a zip package.
# (6) Upload the dataset or the zip package of the dataset to the S3 bucket.
# (7) Check the "data storage location" on the website UI and set the "Dataset path" (this path must contain only the data or the zip package).
# (8) Set the "Output file path" and "Job log path" to your paths on the website UI.
# (9) Under "resource pool selection", select the single-card specification.
# (10) Create your job.

For more details, please refer to the specific scripts.
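
Step (5) offers two equivalent ways to set a parameter: editing the YAML file (option a) or adding a key=value pair on the UI (option b). The sketch below only illustrates that idea by merging key=value strings over a dictionary of defaults. The function `apply_overrides` is hypothetical; it is not how ModelArts or src/model_utils/config.py actually processes parameters.

```python
import ast


def apply_overrides(config: dict, overrides: list) -> dict:
    """Return a copy of config with 'key=value' overrides applied."""
    merged = dict(config)
    for item in overrides:
        key, sep, raw = item.partition("=")
        if not sep:
            raise ValueError(f"expected key=value, got: {item}")
        try:
            value = ast.literal_eval(raw)  # handles True, 64, 0.1, ...
        except (ValueError, SyntaxError):
            value = raw  # keep plain strings such as file names
        merged[key.strip()] = value
    return merged


# Values as they might appear in the YAML file (option a) ...
yaml_defaults = {"batch_size": 128, "enable_modelarts": False}
# ... and as key=value pairs added on the UI (option b).
ui_parameters = ["batch_size=64", "enable_modelarts=True"]
print(apply_overrides(yaml_defaults, ui_parameters))
```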

Script Description

Script and Sample Code


├── tinydarknet
    ├── README.md                       // descriptions about Tiny-Darknet in English
    ├── README_CN.md                    // descriptions about Tiny-Darknet in Chinese
    ├── ascend310_infer                 // application for 310 inference
    ├── scripts
        ├── run_standalone_train.sh     // shell script for standalone training on Ascend
        ├── run_distribute_train.sh     // shell script for distributed training on Ascend
        ├── run_train_cpu.sh            // shell script for training on CPU
        ├── run_eval.sh                 // shell script for evaluation on Ascend
        ├── run_eval_cpu.sh             // shell script for evaluation on CPU
        ├── run_infer_310.sh            // shell script for inference on Ascend310
    ├── src
        ├── lr_scheduler                //learning rate scheduler
            ├── __init__.py             // init
            ├── linear_warmup.py        // linear_warmup
            ├── warmup_cosine_annealing_lr.py    // warmup_cosine_annealing_lr
            ├── warmup_step_lr.py       // warmup_step_lr
        ├── model_utils
            ├── config.py               // parsing parameter configuration file of "*.yaml"
            ├── device_adapter.py       // local or ModelArts training
            ├── local_adapter.py        // get related environment variables in local training
            └── moxing_adapter.py       // get related environment variables in ModelArts training
        ├── dataset.py                  // creating dataset
        ├── CrossEntropySmooth.py       // loss function
        ├── tinydarknet.py              // Tiny-Darknet architecture
    ├── train.py                        // training script
    ├── eval.py                         //  evaluation script
    ├── export.py                       // export checkpoint file into air/onnx
    ├── imagenet_config.yaml            // imagenet parameter configuration
    ├── cifar10_config.yaml             // cifar10 parameter configuration
    ├── mindspore_hub_conf.py           // hub config
    ├── postprocess.py                  // postprocess script

Script Parameters

Parameters for both training and evaluation can be set in imagenet_config.yaml.

Config for Tiny-Darknet (only some parameters are listed):

pre_trained: False      # whether training based on the pre-trained model
num_classes: 1000       # the number of classes in the dataset
lr_init: 0.1            # initial learning rate
batch_size: 128         # training batch_size
epoch_size: 500         # total training epoch
momentum: 0.9           # momentum
weight_decay: 1e-4      # weight decay value
image_height: 224       # image height used as input to the model
image_width: 224        # image width used as input to the model
train_data_dir: './ImageNet_Original/train/'  # absolute full path to the train datasets
val_data_dir: './ImageNet_Original/val/'  # absolute full path to the evaluation datasets
device_target: 'Ascend' # device running the program
keep_checkpoint_max: 10 # only keep the last keep_checkpoint_max checkpoint
checkpoint_path: '/train_tinydarknet.ckpt'  # the absolute full path to save the checkpoint file
onnx_filename: 'tinydarknet.onnx' # file name of the onnx model used in export.py
air_filename: 'tinydarknet.air'   # file name of the air model used in export.py
lr_scheduler: 'exponential'     # learning rate scheduler
lr_epochs: [70, 140, 210, 280]  # epoch of lr changing
lr_gamma: 0.3            # factor by which lr is decreased in the exponential lr_scheduler
eta_min: 0.0             # eta_min in cosine_annealing scheduler
T_max: 150               # T-max in cosine_annealing scheduler
warmup_epochs: 0         # warmup epoch
is_dynamic_loss_scale: 0 # dynamic loss scale
loss_scale: 1024         # loss scale
label_smooth_factor: 0.1 # label_smooth_factor
use_label_smooth: True   # label smooth
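
As an illustration of what use_label_smooth and label_smooth_factor control, the sketch below builds a smoothed target vector for one label. It uses one common formulation (the true class gets 1 - factor, and the remaining mass is spread evenly over the other classes); whether src/CrossEntropySmooth.py uses exactly this formula is an assumption, and the function name is illustrative.

```python
def smooth_labels(label: int, num_classes: int, factor: float) -> list:
    """Return a smoothed one-hot target for a single class index."""
    on_value = 1.0 - factor                  # weight of the true class
    off_value = factor / (num_classes - 1)   # weight of every other class
    return [on_value if i == label else off_value for i in range(num_classes)]


# Values from the listing above: num_classes=1000, label_smooth_factor=0.1.
target = smooth_labels(label=3, num_classes=1000, factor=0.1)
print(target[3], sum(target))
```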

For more configuration details, please refer to the file imagenet_config.yaml.
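
To show how lr_init, lr_epochs, and lr_gamma can interact, the sketch below computes a multi-step decayed learning rate per epoch. It assumes the learning rate is multiplied by lr_gamma each time an epoch listed in lr_epochs is reached and ignores warmup (warmup_epochs is 0 in the listing). The schedulers in src/lr_scheduler may differ in detail, for example by working per step rather than per epoch.

```python
def step_decay_lr(lr_init: float, lr_epochs: list, lr_gamma: float, epoch: int) -> float:
    """Learning rate for a 0-based epoch under multi-step decay."""
    # Count how many decay boundaries have been reached so far.
    num_decays = sum(1 for boundary in lr_epochs if epoch >= boundary)
    return lr_init * (lr_gamma ** num_decays)


# Values from the configuration listing.
for epoch in (0, 70, 140, 210, 280):
    print(epoch, step_decay_lr(0.1, [70, 140, 210, 280], 0.3, epoch))
```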

Training Process

Training

Running on Ascend:

bash ./scripts/run_standalone_train.sh [DEVICE_ID]

The command above runs in the background. You can view the results in the file train.log.

After training, you will find checkpoint files under the script folder by default. The loss values will look like the following:

# grep "loss is " train.log
epoch: 498 step: 1251, loss is 2.7798953
Epoch time: 130690.544, per step time: 104.469
epoch: 499 step: 1251, loss is 2.9261637
Epoch time: 130511.081, per step time: 104.325
epoch: 500 step: 1251, loss is 2.69412
Epoch time: 127067.548, per step time: 101.573
...

The model checkpoint file will be saved in the current folder.
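
For further analysis of the log, the loss lines can be parsed with a regular expression. The sketch below assumes the exact line format shown in the sample output above; the function name `parse_losses` is illustrative.

```python
import re

# Matches lines such as "epoch: 498 step: 1251, loss is 2.7798953".
LOSS_PATTERN = re.compile(
    r"epoch:\s*(\d+)\s+step:\s*(\d+),\s*loss is\s*([0-9.eE+-]+)"
)


def parse_losses(lines):
    """Return (epoch, step, loss) tuples for every loss line found."""
    records = []
    for line in lines:
        match = LOSS_PATTERN.search(line)
        if match:
            epoch, step, loss = match.groups()
            records.append((int(epoch), int(step), float(loss)))
    return records


sample = [
    "epoch: 498 step: 1251, loss is 2.7798953",
    "Epoch time: 130690.544, per step time: 104.469",
    "epoch: 500 step: 1251, loss is 2.69412",
]
print(parse_losses(sample))
```

To process a real run, pass the lines of train.log instead of the sample list.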

Running on CPU:

bash scripts/run_train_cpu.sh [TRAIN_DATA_DIR] [cifar10|imagenet]

Distributed Training

Running on Ascend:

bash ./scripts/run_distribute_train.sh [RANK_TABLE_FILE]

The above shell script runs distributed training in the background. You can view the results in the files train_parallel[X]/log. The loss values will look like the following:

# grep "loss is " train_parallel*/log
epoch: 498 step: 1251, loss is 2.7798953
Epoch time: 130690.544, per step time: 104.469
epoch: 499 step: 1251, loss is 2.9261637
Epoch time: 130511.081, per step time: 104.325
epoch: 500 step: 1251, loss is 2.69412
Epoch time: 127067.548, per step time: 101.573
...

Evaluation Process

Evaluation

Evaluation on the ImageNet dataset when running on Ascend:

Before running the command below, please check the checkpoint path used for evaluation. Please set the checkpoint path to be the absolute full path, e.g., "/username/tinydarknet/train_tinydarknet.ckpt".
```
python eval.py > eval.log 2>&1 &
# OR
bash scripts/run_eval.sh
```
The above python command will run in the background. You can view the results through the file "eval.log". The accuracy of the test dataset will be as follows:
```
# grep "accuracy: " eval.log
accuracy:  {'top_1_accuracy': 0.5871979166666667, 'top_5_accuracy': 0.8175280448717949}
```
Note that for evaluation after distributed training, please set the checkpoint_path to be the last saved checkpoint file. The accuracy of the test dataset will be as follows:
```
# grep "accuracy: " eval.log
accuracy:  {'top_1_accuracy': 0.5871979166666667, 'top_5_accuracy': 0.8175280448717949}
```
Evaluation on the cifar-10 dataset when running on CPU:

Before running the command below, please check the checkpoint path used for evaluation. Please set the checkpoint path to be the absolute full path, e.g., "/username/tinydarknet/train_tinydarknet.ckpt".
```
bash scripts/run_eval_cpu.sh [VAL_DATA_DIR] [imagenet|cifar10] [CHECKPOINT_PATH]
```
You can view the results through the file "eval.log". The accuracy of the test dataset will be as follows:
```
# grep "accuracy: " eval.log
accuracy:  {'top_5_accuracy': 1.0, 'top_1_accuracy': 0.9829727564102564}
```
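
The accuracy line printed to eval.log contains a Python dictionary literal, so it can be read back safely with `ast.literal_eval`. The sketch below assumes the line format shown above; the function name `parse_accuracy` is illustrative.

```python
import ast


def parse_accuracy(line: str) -> dict:
    """Extract the metrics dictionary from an 'accuracy: {...}' log line."""
    prefix = "accuracy:"
    if not line.startswith(prefix):
        raise ValueError("not an accuracy line")
    # literal_eval only accepts Python literals, so no code is executed.
    return ast.literal_eval(line[len(prefix):].strip())


line = "accuracy:  {'top_1_accuracy': 0.5871979166666667, 'top_5_accuracy': 0.8175280448717949}"
metrics = parse_accuracy(line)
print(f"top-1: {metrics['top_1_accuracy']:.2%}, top-5: {metrics['top_5_accuracy']:.2%}")
```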

Inference process

Export MindIR

# Ascend310 inference
python export.py --dataset [DATASET] --file_name [FILE_NAME] --file_format [EXPORT_FORMAT]

export.py does not have a ckpt_file option. Place the ckpt file at the path given by the checkpoint_path parameter in imagenet_config.yaml. EXPORT_FORMAT must be one of ["AIR", "MINDIR"].

Infer on Ascend310

Before performing inference, the MINDIR file must be exported with the export.py script. Only an example of inference with a MINDIR model is provided. Currently, batch_size can only be set to 1.

# Ascend310 inference
bash run_infer_310.sh [MINDIR_PATH] [DATA_PATH] [LABEL_PATH] [DVPP] [DEVICE_ID]

LABEL_PATH is the path to label.txt. Write a Python script that sorts the categories under the dataset, maps each file name under a category to that category's sort index (such as [file name : sort value]), and writes the mapping to the label.txt file.
DVPP is mandatory and must be one of ["DVPP", "CPU"]; it is case-insensitive. Tiny-DarkNet performs inference on images of size [224, 224]. The DVPP hardware requires the width to be divisible by 16 and the height to be divisible by 2; the network input meets this requirement, so images can be pre-processed through DVPP.
DEVICE_ID is optional; the default value is 0.
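
The label file described for LABEL_PATH can be generated with a short script. The sketch below assumes the dataset directory holds one sub-folder per category and writes one `file_name:index` line per image, where the index is the position of the category in sorted order. The exact line format is an assumption based on the "[file name : sort value]" description; check it against what postprocess.py expects before use. The function name is illustrative.

```python
import os


def build_label_file(dataset_dir: str, label_path: str) -> int:
    """Write one 'file_name:class_index' line per image; return the line count."""
    # Sort the category folders so that each one gets a stable index.
    classes = sorted(
        d for d in os.listdir(dataset_dir)
        if os.path.isdir(os.path.join(dataset_dir, d))
    )
    count = 0
    with open(label_path, "w", encoding="utf-8") as out:
        for index, class_name in enumerate(classes):
            class_dir = os.path.join(dataset_dir, class_name)
            for file_name in sorted(os.listdir(class_dir)):
                out.write(f"{file_name}:{index}\n")
                count += 1
    return count
```

Call `build_label_file` with the validation dataset directory and the desired output path, then pass that output path as LABEL_PATH.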

Result

The inference result is saved in the current path. You can find results like the following in the acc.log file.

'top_1_accuracy': 59.07%, 'top_5_accuracy': 81.73%

Model Description

Performance

Training Performance

| Parameters | Ascend |
| --- | --- |
| Model Version | V1 |
| Resource | Ascend 910; CPU 2.60GHz, 56 cores; Memory 314G; OS Euler2.8 |
| Uploaded Date | 2020/12/22 |
| MindSpore Version | 1.1.0 |
| Dataset | 1200k images |
| Training Parameters | epoch=500, steps=1251, batch_size=128, lr=0.1 |
| Optimizer | Momentum |
| Loss Function | Softmax Cross Entropy |
| Speed | 8 pc: 104 ms/step |
| Total Time | 8 pc: 17.8 hours |
| Parameters (M) | 4.0M |
| Scripts | Tiny-Darknet Scripts |
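
The figures in this table can be cross-checked with a little arithmetic. The sketch below assumes that batch_size=128 is the per-device batch size, that 8 devices are used, that the last partial batch is dropped, and that the training set is the standard ILSVRC-2012 split of 1,281,167 images (which this README rounds to 1200k). Under those assumptions the step count comes out at 1251, and the estimated total time is about 18 hours, in line with the reported 17.8 hours.

```python
def steps_per_epoch(num_images: int, batch_size: int, num_devices: int) -> int:
    """Number of full global batches in one epoch (partial batch dropped)."""
    return num_images // (batch_size * num_devices)


def total_hours(epochs: int, steps: int, ms_per_step: float) -> float:
    """Estimated wall-clock training time in hours."""
    return epochs * steps * ms_per_step / 1000 / 3600


steps = steps_per_epoch(num_images=1_281_167, batch_size=128, num_devices=8)
hours = total_hours(epochs=500, steps=steps, ms_per_step=104)
print(steps, round(hours, 1))
```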

Evaluation Performance

| Parameters | Ascend |
| --- | --- |
| Model Version | V1 |
| Resource | Ascend 910; OS Euler2.8 |
| Uploaded Date | 2020/12/22 |
| MindSpore Version | 1.1.0 |
| Dataset | 200k images |
| batch_size | 128 |
| Outputs | probability |
| Accuracy | 8 pc Top-1: 58.7%; Top-5: 81.7% |
| Model for inference | 11.6M (.ckpt file) |
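
For readers unfamiliar with the Top-1 and Top-5 metrics reported here, the sketch below computes top-k accuracy in plain Python: a sample counts as correct when its true label is among the k highest-scoring classes. The function name and the toy scores are illustrative; this is not the metric implementation used by eval.py.

```python
def top_k_accuracy(scores, labels, k):
    """Fraction of samples whose true label is among the k highest scores."""
    correct = 0
    for row, label in zip(scores, labels):
        # Indices of the k largest scores for this sample.
        top_k = sorted(range(len(row)), key=lambda i: row[i], reverse=True)[:k]
        if label in top_k:
            correct += 1
    return correct / len(labels)


# Toy example with 3 classes and 2 samples (values are made up).
toy_scores = [[0.1, 0.7, 0.2], [0.5, 0.3, 0.2]]
toy_labels = [1, 1]
print(top_k_accuracy(toy_scores, toy_labels, k=1))
print(top_k_accuracy(toy_scores, toy_labels, k=2))
```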

Inference Performance

| Parameters | Ascend |
| --- | --- |
| Model Version | TinyDarknet |
| Resource | Ascend 310; Euler2.8 |
| Uploaded Date | 05/29/2021 (month/day/year) |
| MindSpore Version | 1.2.0 |
| Dataset | ImageNet |
| batch_size | 1 |
| Outputs | Accuracy |
| Accuracy | Top-1: 59.07%; Top-5: 81.73% |
| Model for inference | 10.3M (.ckpt file) |

ModelZoo Homepage

Please check the official homepage.
