!4917 Modify readme for inceptionv3

Merge pull request !4917 from zhouyaqiang0/master
2020-08-21 17:41:28 +08:00 · 2020-08-21 17:41:28 +08:00 · 72efb63fd8
parent 95f89e710e aba52a459e
commit 72efb63fd8
2 changed files with 154 additions and 42 deletions
--- a/model_zoo/official/cv/inceptionv3/README.md
+++ b/model_zoo/official/cv/inceptionv3/README.md
@ -1,23 +1,77 @@
-# Inception-v3 Example
+# Contents

-## Description
+- [InceptionV3 Description](#InceptionV3-description)
+- [Model Architecture](#model-architecture)
+- [Dataset](#dataset)
+- [Features](#features)
+    - [Mixed Precision](#mixed-precision)
+- [Environment Requirements](#environment-requirements)
+- [Script Description](#script-description)
+    - [Script and Sample Code](#script-and-sample-code)
+    - [Training Process](#training-process)
+    - [Evaluation Process](#evaluation-process)
+        - [Evaluation](#evaluation)
+- [Model Description](#model-description)
+    - [Performance](#performance)  
+        - [Training Performance](#evaluation-performance)
+        - [Inference Performance](#evaluation-performance)
+- [Description of Random Situation](#description-of-random-situation)
+- [ModelZoo Homepage](#modelzoo-homepage)

-This is an example of training Inception-v3 in MindSpore.
+# [InceptionV3 Description](#contents)

-## Requirements
+InceptionV3 by Google is the 3rd version in a series of Deep Learning Convolutional Architectures.

- Install [Mindspore](http://www.mindspore.cn/install/en).
- Downlaod the dataset.
+[Paper](https://arxiv.org/pdf/1512.00567.pdf) Min Sun, Ali Farhadi, Steve Seitz. Ranking Domain-Specific Highlights by Analyzing Edited Videos[J]. 2014.

-## Structure
+# [Model architecture](#contents)
+
+The overall network architecture of InceptionV3 is show below:
+
+[Link](https://arxiv.org/pdf/1905.02244)
+
+
+# [Dataset](#contents)
+
+Dataset used can refer to paper.
+
+- Dataset size: ~125G, 1.2W colorful images in 1000 classes
+	- Train: 120G, 1.2W images
+	- Test: 5G, 50000 images
+- Data format: RGB images.
+	- Note: Data will be processed in src/dataset.py 
+
+# [Features](#contents)
+
+## [Mixed Precision(Ascend)](#contents)
+
+The [mixed precision](https://www.mindspore.cn/tutorial/zh-CN/master/advanced_use/mixed_precision.html) training method accelerates the deep learning neural network training process by using both the single-precision and half-precision data formats, and maintains the network precision achieved by the single-precision training at the same time. Mixed precision training can accelerate the computation process, reduce memory usage, and enable a larger model or batch size to be trained on specific hardware. 
+For FP16 operators, if the input data type is FP32, the backend of MindSpore will automatically handle it with reduced precision. Users could check the reduced-precision operators by enabling INFO log and then searching ‘reduce precision’.
+
+# [Environment Requirements](#contents)
+
+- Hardware（Ascend/GPU）
+  - Prepare hardware environment with Ascend or GPU processor. If you want to try Ascend  , please send the [application form](https://obs-9be7.obs.cn-east-2.myhuaweicloud.com/file/other/Ascend%20Model%20Zoo%E4%BD%93%E9%AA%8C%E8%B5%84%E6%BA%90%E7%94%B3%E8%AF%B7%E8%A1%A8.docx) to ascend@huawei.com. Once approved, you can get the resources. 
+- Framework
+  - [MindSpore](http://10.90.67.50/mindspore/archive/20200506/OpenSource/me_vm_x86/)
+- For more information, please check the resources below：
+  - [MindSpore tutorials](https://www.mindspore.cn/tutorial/zh-CN/master/index.html) 
+  - [MindSpore API](https://www.mindspore.cn/api/zh-CN/master/index.html)
+
+# [Script description](#contents)
+
+## [Script and sample code](#contents)

 ```shell
 .
 └─Inception-v3      
  ├─README.md
  ├─scripts      
+	├─run_standalone_train.sh         		  # launch standalone training with ascend platform(1p)
    ├─run_standalone_train_for_gpu.sh         # launch standalone training with gpu platform(1p)
+	├─run_distribute_train.sh         		  # launch distributed training with ascend platform(8p)
    ├─run_distribute_train_for_gpu.sh         # launch distributed training with gpu platform(8p)
+	├─run_eval.sh                     		  # launch evaluating with ascend platform
    └─run_eval_for_gpu.sh                     # launch evaluating with gpu platform
  ├─src
    ├─config.py                       # parameter configuration
@ -30,12 +84,10 @@ This is an example of training Inception-v3 in MindSpore.
  └─train.py                          # train net
  
 ```
+## [Script Parameters](#contents)

-## Parameter Configuration
-
-Parameters for both training and evaluating can be set in config.py
-
-```       
+```python
+Major parameters in train.py and config.py are:   
 'random_seed': 1,                # fix random seed
 'rank': 0,                       # local rank of distributed
 'group_size': 1,                 # world size of distributed
@ -59,14 +111,22 @@ Parameters for both training and evaluating can be set in config.py
 'is_save_on_master': 1           # save checkpoint on rank0, distributed parameters
 ```

+## [Training process](#contents)
+
+### Usage


-## Running the example
+You can start training using python or shell scripts. The usage of shell scripts as follows:

-### Train
-
-#### Usage
+- Ascend: 
+```
+# distribute training example(8p)
+sh run_distribute_train.sh RANK_TABLE_FILE DATA_PATH
+# standalone training
+sh run_standalone_train.sh DEVICE_ID DATA_PATH
+```

+- GPU:
 ```
 # distribute training example(8p)
 sh run_distribute_train_for_gpu.sh DATA_DIR 
@ -74,42 +134,94 @@ sh run_distribute_train_for_gpu.sh DATA_DIR
 sh run_standalone_train_for_gpu.sh DEVICE_ID DATA_DIR
 ```

-#### Launch
+### Launch

-```bash
-# distributed training example(8p) for GPU
-sh scripts/run_distribute_train_for_gpu.sh /dataset/train
-# standalone training example for GPU
-sh scripts/run_standalone_train_for_gpu.sh 0 /dataset/train
+``` 
+# training example
+  python:
+      Ascend: python train.py --dataset_path /dataset/train --platform Ascend
+      GPU: python train.py --dataset_path /dataset/train --platform GPU
+
+  shell:
+      # distributed training example(8p) for GPU
+	  sh scripts/run_distribute_train_for_gpu.sh /dataset/train
+	  # standalone training example for GPU
+      sh scripts/run_standalone_train_for_gpu.sh 0 /dataset/train
 ```

-#### Result
+### Result

-You can find checkpoint file together with result in log.
-
-### Evaluation
-
-#### Usage
+Training result will be stored in the example path. Checkpoints will be stored at `. /checkpoint` by default, and training log  will be redirected to `./log.txt` like followings. 

+``` 
+epoch: 0 step: 1251, loss is 5.7787247
+Epoch time: 360760.985, per step time: 288.378
+epoch: 1 step: 1251, loss is 4.392868
+Epoch time: 160917.911, per step time: 128.631
 ```
-# Evaluation
-sh run_eval_for_gpu.sh DEVICE_ID DATA_DIR PATH_CHECKPOINT
+## [Eval process](#contents)
+
+### Usage
+
+You can start training using python or shell scripts. The usage of shell scripts as follows:
+
+- Ascend: sh run_eval.sh DEVICE_ID DATA_DIR PATH_CHECKPOINT
+- GPU: sh run_eval_for_gpu.sh DEVICE_ID DATA_DIR PATH_CHECKPOINT
+
+### Launch
+
+``` 
+# eval example
+  python:
+      Ascend: python eval.py --dataset_path DATA_DIR --checkpoint PATH_CHECKPOINT --platform Ascend
+      GPU: python eval.py --dataset_path DATA_DIR --checkpoint PATH_CHECKPOINT --platform GPU
+
+  shell:
+      Ascend: sh run_eval.sh DEVICE_ID DATA_DIR PATH_CHECKPOINT
+      GPU: sh run_eval_for_gpu.sh DEVICE_ID DATA_DIR PATH_CHECKPOINT
 ```

-#### Launch
+> checkpoint can be produced in training process. 

-```bash
-# Evaluation with checkpoint
-sh scripts/run_eval_for_gpu.sh 0 /dataset/val ./checkpoint/inceptionv3-rank3-247_1251.ckpt
+### Result
+
+Evaluation result will be stored in the example path, you can find result like the followings in `log.txt`. 
+
+``` 
+metric: {'Loss': 1.778, 'Top1-Acc':0.788, 'Top5-Acc':0.942}
 ```

-> checkpoint can be produced in training process.
+# [Model description](#contents)

-#### Result
+## [Performance](#contents)

-Evaluation result will be stored in the scripts path. Under this, you can find result like the followings in log.
+### Training Performance
+
+| Parameters                 | InceptionV3                                                |                           |
+| -------------------------- | ---------------------------------------------------------- | ------------------------- |
+| Model Version              |                                                            |                           |
+| Resource                   | Ascend 910, cpu:2.60GHz 56cores, memory:314G               | NV SMX2 V100-32G          |
+| uploaded Date              | 08/21/2020                                                 | 08/21/2020                |
+| MindSpore Version          | 0.6.0-beta                                                 | 0.6.0-beta                     |
+| Training Parameters        | src/config.py                                              | src/config.py             |
+| Optimizer                  | RMSProp                                                    | RMSProp                   |
+| Loss Function              | SoftmaxCrossEntropy                                        | SoftmaxCrossEntropy       |
+| outputs                    | probability                                                | probability               |
+| Loss                       | 1.98                                                       | 1.98                      |
+| Accuracy                   | ACC1[78.8%] ACC5[94.2%]                                    | ACC1[78.7%] ACC5[94.1%]   |
+| Total time                 | 11h                                                        | 72h                       |
+| Params (M)                 | 103M                                                       | 103M                      |
+| Checkpoint for Fine tuning | 313M                                                       | 312.41                    |
+| Model for inference        |                                                            |                           |
+
+#### Inference Performance
+
+To be added.
+
+# [Description of Random Situation](#contents)
+
+In dataset.py, we set the seed inside “create_dataset" function. We also use random seed in train.py.
+
+# [ModelZoo Homepage](#contents)
 
-```
-acc=78.75%(TOP1)
-acc=94.07%(TOP5)
-```
+Please check the official [homepage](https://gitee.com/mindspore/mindspore/tree/master/model_zoo). 
--- a/model_zoo/official/cv/inceptionv3/src/config.py
+++ b/model_zoo/official/cv/inceptionv3/src/config.py
@ -64,7 +64,7 @@ config_ascend = edict({
    'weight_decay': 0.00004,
    'momentum': 0.9,
    'opt_eps': 1.0,
-    'keep_checkpoint_max': 100,
+    'keep_checkpoint_max': 10,
    'ckpt_path': './checkpoint/',
    'is_save_on_master': 0,
    'dropout_keep_prob': 0.8,