model_zoo README.md format change for googlenet

This commit is contained in:
CaoJian 2020-08-24 16:19:16 +08:00
parent 3162b12552
commit 983d6f16a7
1 changed file with 308 additions and 153 deletions


@@ -48,8 +48,7 @@ Dataset used: [CIFAR-10](<http://www.cs.toronto.edu/~kriz/cifar.html>)
- Train: 146M, 50,000 images
- Test: 29.3M, 10,000 images
- Data format: binary files
- Note: Data will be processed in src/dataset.py
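The processing pipeline in src/dataset.py is not reproduced in this README; the sketch below only illustrates what CIFAR-10 loading for this model typically looks like in MindSpore. The function signature and argument names are assumptions, and the transform module paths vary across MindSpore versions:

```python
import mindspore.dataset as ds
import mindspore.dataset.vision.c_transforms as CV
import mindspore.dataset.transforms.c_transforms as C
import mindspore.common.dtype as mstype

def create_dataset(data_path, repeat_num=1, training=True, batch_size=128):
    """Hedged sketch: read the CIFAR-10 binaries and resize to 224x224."""
    cifar_ds = ds.Cifar10Dataset(data_path, usage="train" if training else "test")
    transforms = [
        CV.Resize((224, 224)),         # GoogleNet input size
        CV.Rescale(1.0 / 255.0, 0.0),  # scale pixels to [0, 1]
        CV.HWC2CHW(),                  # channels-first layout for MindSpore
    ]
    cifar_ds = cifar_ds.map(operations=transforms, input_columns="image")
    cifar_ds = cifar_ds.map(operations=C.TypeCast(mstype.int32), input_columns="label")
    cifar_ds = cifar_ds.batch(batch_size, drop_remainder=True)
    return cifar_ds.repeat(repeat_num)
```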
# [Features](#contents)
@@ -66,7 +65,7 @@ For FP16 operators, if the input data type is FP32, the backend of MindSpore wil
- Hardware (Ascend/GPU)
- Prepare hardware environment with Ascend or GPU processor. If you want to try Ascend, please send the [application form](https://obs-9be7.obs.cn-east-2.myhuaweicloud.com/file/other/Ascend%20Model%20Zoo%E4%BD%93%E9%AA%8C%E8%B5%84%E6%BA%90%E7%94%B3%E8%AF%B7%E8%A1%A8.docx) to ascend@huawei.com. Once approved, you can get the resources.
- Framework
- [MindSpore](https://www.mindspore.cn/install/en)
- For more information, please check the resources below
- [MindSpore tutorials](https://www.mindspore.cn/tutorial/zh-CN/master/index.html)
- [MindSpore API](https://www.mindspore.cn/api/zh-CN/master/index.html)
@@ -77,16 +76,45 @@ For FP16 operators, if the input data type is FP32, the backend of MindSpore wil
After installing MindSpore via the official website, you can start training and evaluation as follows:
- running on Ascend

```python
# run training example
python train.py > train.log 2>&1 &

# run distributed training example
sh scripts/run_train.sh rank_table.json

# run evaluation example
python eval.py > eval.log 2>&1 &
OR
sh run_eval.sh
```
For distributed training, an HCCL configuration file in JSON format needs to be created in advance.
Please follow the instructions in the link below:
https://gitee.com/mindspore/mindspore/tree/master/model_zoo/utils/hccl_tools.
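The linked hccl_tools utility generates this file for you; the sketch below only illustrates the rough shape of a single-server rank table so you know what to expect. The field names and values are assumptions based on the v1.0 schema, and the file should be generated with the tool rather than written by hand:

```python
import json

# Illustrative only -- generate the real file with model_zoo/utils/hccl_tools.
rank_table = {
    "version": "1.0",
    "server_count": "1",
    "server_list": [{
        "server_id": "10.0.0.1",  # hypothetical host IP
        "device": [
            {"device_id": str(i), "device_ip": f"192.98.92.{i}", "rank_id": str(i)}
            for i in range(8)
        ],
    }],
    "status": "completed",
}

with open("rank_table.json", "w") as f:
    json.dump(rank_table, f, indent=4)
```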
- running on GPU
For running on GPU, please change `device_target` from `Ascend` to `GPU` in the configuration file src/config.py.
```python
# run training example
export CUDA_VISIBLE_DEVICES=0
python train.py > train.log 2>&1 &
# run distributed training example
sh scripts/run_train_gpu.sh 8 0,1,2,3,4,5,6,7
# run evaluation example
python eval.py --checkpoint_path=[CHECKPOINT_PATH] > eval.log 2>&1 &
OR
sh run_eval_gpu.sh [CHECKPOINT_PATH]
```
@@ -100,8 +128,10 @@ python eval.py > eval.log 2>&1 & OR Ascend: sh run_eval.sh OR GPU: sh run_eval
├── googlenet
├── README.md // descriptions about googlenet
├── scripts
│ ├──run_train.sh // shell script for distributed on Ascend
│ ├──run_train_gpu.sh // shell script for distributed on GPU
│ ├──run_eval.sh // shell script for evaluation on Ascend
│ ├──run_eval_gpu.sh // shell script for evaluation on GPU
├── src
│ ├──dataset.py // creating dataset
│ ├──googlenet.py // googlenet architecture
@@ -113,98 +143,153 @@ python eval.py > eval.log 2>&1 & OR Ascend: sh run_eval.sh OR GPU: sh run_eval
## [Script Parameters](#contents)
Parameters for both training and evaluation can be set in config.py.
- config for GoogleNet, CIFAR-10 dataset
```python
'pre_trained': 'False' # whether training based on the pre-trained model
'num_classes': 10 # the number of classes in the dataset
'lr_init': 0.1 # initial learning rate
'batch_size': 128 # training batch size
'epoch_size': 125 # total training epochs
'momentum': 0.9 # momentum
'weight_decay': 5e-4 # weight decay value
'buffer_size': 10 # buffer size
'image_height': 224 # image height used as input to the model
'image_width': 224 # image width used as input to the model
'data_path': './cifar10' # absolute full path to the train and evaluation datasets
'device_target': 'Ascend' # device running the program
'device_id': 4 # device ID used to train or evaluate the dataset. Ignore it when you use run_train.sh for distributed training
'keep_checkpoint_max': 10 # only keep the last keep_checkpoint_max checkpoint
'checkpoint_path': './train_googlenet_cifar10-125_390.ckpt' # the absolute full path to save the checkpoint file
'onnx_filename': 'googlenet.onnx' # file name of the onnx model used in export.py
'geir_filename': 'googlenet.geir' # file name of the geir model used in export.py
```
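As a point of reference, config files in model_zoo are commonly plain attribute dictionaries, so the values above can be read as `cfg.batch_size`, `cfg.lr_init`, and so on. A minimal sketch of that pattern, assuming the easydict package (the real src/config.py may differ):

```python
from easydict import EasyDict as edict

# Hedged sketch of a src/config.py-style attribute dictionary.
cifar_cfg = edict({
    'pre_trained': False,
    'num_classes': 10,
    'lr_init': 0.1,
    'batch_size': 128,
    'epoch_size': 125,
    'momentum': 0.9,
    'weight_decay': 5e-4,
    'image_height': 224,
    'image_width': 224,
    'data_path': './cifar10',
    'device_target': 'Ascend',
    'device_id': 4,
    'keep_checkpoint_max': 10,
    'checkpoint_path': './train_googlenet_cifar10-125_390.ckpt',
})

# Usage: values become attributes.
assert cifar_cfg.batch_size == 128
```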
## [Training Process](#contents)
### Training
- running on Ascend

```
python train.py > train.log 2>&1 &
```

The python command above will run in the background; you can view the results through the file `train.log`.

After training, you'll get some checkpoint files under the script folder by default. The loss value will be reported as follows:

```
# grep "loss is " train.log
epoch: 1 step: 390, loss is 1.4842823
epoch: 2 step: 390, loss is 1.0897788
...
```

The model checkpoint will be saved in the current directory.

- running on GPU

```
export CUDA_VISIBLE_DEVICES=0
python train.py > train.log 2>&1 &
```

The python command above will run in the background; you can view the results through the file `train.log`.

After training, you'll get some checkpoint files under the folder `./ckpt_0/` by default.
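To sanity-check convergence without reading the log by hand, a small helper like the following can pull the per-step loss values out of `train.log`. This script is illustrative and not part of the repository:

```python
import re

def read_losses(log_path="train.log"):
    """Hypothetical helper: extract (epoch, step, loss) tuples from train.log."""
    pattern = re.compile(r"epoch:\s*(\d+)\s*step:\s*(\d+),\s*loss is\s*([\d.]+)")
    losses = []
    with open(log_path) as f:
        for line in f:
            m = pattern.search(line)
            if m:
                losses.append((int(m.group(1)), int(m.group(2)), float(m.group(3))))
    return losses

if __name__ == "__main__":
    # Print the last few loss values to check the training trend.
    for epoch, step, loss in read_losses()[-5:]:
        print(f"epoch {epoch} step {step}: loss {loss}")
```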
### Distributed Training
- running on Ascend

```
sh scripts/run_train.sh rank_table.json
```

The above shell script will run distributed training in the background. You can view the results through the file `train_parallel[X]/log`. The loss value will be reported as follows:

```
# grep "result: " train_parallel*/log
train_parallel0/log:epoch: 1 step: 48, loss is 1.4302931
train_parallel0/log:epoch: 2 step: 48, loss is 1.4023874
...
train_parallel1/log:epoch: 1 step: 48, loss is 1.3458025
train_parallel1/log:epoch: 2 step: 48, loss is 1.3729336
...
...
```

- running on GPU

```
sh scripts/run_train_gpu.sh 8 0,1,2,3,4,5,6,7
```

The above shell script will run distributed training in the background. You can view the results through the file `train/train.log`.
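Both launchers ultimately rely on train.py enabling data-parallel mode before building the model. A hedged sketch of that initialization using MindSpore's communication API follows; exact module paths and flag names vary between MindSpore releases (e.g., `gradients_mean` was `mirror_mean` in early versions):

```python
from mindspore import context
from mindspore.context import ParallelMode
from mindspore.communication.management import init, get_rank, get_group_size

# Hedged sketch: enable data parallelism before constructing the network.
context.set_context(mode=context.GRAPH_MODE, device_target="GPU")
init()  # reads the HCCL/NCCL environment set up by the launch script
context.set_auto_parallel_context(
    parallel_mode=ParallelMode.DATA_PARALLEL,
    gradients_mean=True,          # average gradients across devices
    device_num=get_group_size())
print("this is rank", get_rank())
```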
## [Evaluation Process](#contents)
### Evaluation
- evaluation on CIFAR-10 dataset when running on Ascend

Before running the command below, please check the checkpoint path used for evaluation. Please set the checkpoint path to be the absolute full path, e.g., "username/googlenet/train_googlenet_cifar10-125_390.ckpt".

```
python eval.py > eval.log 2>&1 &
OR
sh scripts/run_eval.sh
```

The above python command will run in the background. You can view the results through the file "eval.log". The accuracy of the test dataset will be as follows:

```
# grep "accuracy: " eval.log
accuracy: {'acc': 0.934}
```

Note that for evaluation after distributed training, please set the checkpoint_path to be the last saved checkpoint file such as "username/googlenet/train_parallel0/train_googlenet_cifar10-125_48.ckpt". The accuracy of the test dataset will be as follows:

```
# grep "accuracy: " dist.eval.log
accuracy: {'acc': 0.9217}
```

- evaluation on CIFAR-10 dataset when running on GPU

Before running the command below, please check the checkpoint path used for evaluation. Please set the checkpoint path to be the absolute full path, e.g., "username/googlenet/train/ckpt_0/train_googlenet_cifar10-125_390.ckpt".

```
python eval.py --checkpoint_path=[CHECKPOINT_PATH] > eval.log 2>&1 &
```

The above python command will run in the background. You can view the results through the file "eval.log". The accuracy of the test dataset will be as follows:

```
# grep "accuracy: " eval.log
accuracy: {'acc': 0.930}
```

OR,

```
sh scripts/run_eval_gpu.sh [CHECKPOINT_PATH]
```

The above shell command will run in the background. You can view the results through the file "eval/eval.log". The accuracy of the test dataset will be as follows:

```
# grep "accuracy: " eval/eval.log
accuracy: {'acc': 0.930}
```
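The `--checkpoint_path` flag above suggests eval.py parses the checkpoint location from the command line. A minimal sketch of how that is plausibly wired up; everything beyond the `--checkpoint_path` name itself is an assumption:

```python
import argparse

# Hedged sketch of eval.py's command-line handling.
parser = argparse.ArgumentParser(description='GoogleNet CIFAR-10 evaluation')
parser.add_argument('--checkpoint_path', type=str, default=None,
                    help='absolute path to the checkpoint; falls back to config.py if omitted')
args_opt = parser.parse_args()

# e.g. "username/googlenet/train/ckpt_0/train_googlenet_cifar10-125_390.ckpt"
ckpt = args_opt.checkpoint_path
```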
# [Model Description](#contents)
@@ -212,100 +297,170 @@ accuracy: {'acc': 0.9217}
### Evaluation Performance
| Parameters                 | Ascend                                                       | GPU                                          |
| -------------------------- | ------------------------------------------------------------ | -------------------------------------------- |
| Model Version              | Inception V1                                                 | Inception V1                                 |
| Resource                   | Ascend 910; CPU 2.60GHz, 56 cores; Memory 314G               | NV SMX2 V100-32G                             |
| uploaded Date              | 06/09/2020 (month/day/year)                                  | 08/20/2020 (month/day/year)                  |
| MindSpore Version          | 0.2.0-alpha                                                  | 0.6.0-alpha                                  |
| Dataset                    | CIFAR-10                                                     | CIFAR-10                                     |
| Training Parameters        | epoch=125, steps=390, batch_size=128, lr=0.1                 | epoch=125, steps=390, batch_size=128, lr=0.1 |
| Optimizer                  | SGD                                                          | SGD                                          |
| Loss Function              | Softmax Cross Entropy                                        | Softmax Cross Entropy                        |
| outputs                    | probability                                                  | probability                                  |
| Loss                       | 0.0016                                                       | 0.0016                                       |
| Speed                      | 1pc: 79 ms/step; 8pcs: 82 ms/step                            | 1pc: 150 ms/step; 8pcs: 164 ms/step          |
| Total time                 | 1pc: 63.85 mins; 8pcs: 11.28 mins                            | 1pc: 126.87 mins; 8pcs: 21.65 mins           |
| Parameters (M)             | 13.0                                                         | 13.0                                         |
| Checkpoint for Fine tuning | 43.07M (.ckpt file)                                          | 43.07M (.ckpt file)                          |
| Model for inference        | 21.50M (.onnx file), 21.60M (.air file)                      |                                              |
| Scripts                    | [googlenet script](https://gitee.com/mindspore/mindspore/tree/master/model_zoo/official/cv/googlenet) | [googlenet script](https://gitee.com/mindspore/mindspore/tree/master/model_zoo/official/cv/googlenet) |
### Inference Performance
| Parameters | Ascend | GPU |
| ------------------- | --------------------------- | --------------------------- |
| Model Version | Inception V1 | Inception V1 |
| Resource | Ascend 910 | GPU |
| Uploaded Date | 06/09/2020 (month/day/year) | 08/20/2020 (month/day/year) |
| MindSpore Version | 0.2.0-alpha | 0.6.0-alpha |
| Dataset | CIFAR-10, 10,000 images | CIFAR-10, 10,000 images |
| batch_size | 128 | 128 |
| outputs | probability | probability |
| Accuracy            | 1pc: 93.4%; 8pcs: 92.17%    | 1pc: 93%; 8pcs: 92.89%      |
| Model for inference | 21.50M (.onnx file) | |
## [How to use](#contents)
### Inference
If you need to use the trained model to perform inference on multiple hardware platforms, such as GPU, Ascend 910 or Ascend 310, you can refer to this [Link](https://www.mindspore.cn/tutorial/zh-CN/master/advanced_use/network_migration.html). Below is a simple example of the required steps:
- Running on Ascend

```
# Set context
context.set_context(mode=context.GRAPH_MODE, device_target=cfg.device_target)
context.set_context(device_id=cfg.device_id)

# Load unseen dataset for inference
dataset = dataset.create_dataset(cfg.data_path, 1, False)

# Define model
net = GoogleNet(num_classes=cfg.num_classes)
opt = Momentum(filter(lambda x: x.requires_grad, net.get_parameters()), 0.01,
               cfg.momentum, weight_decay=cfg.weight_decay)
loss = nn.SoftmaxCrossEntropyWithLogits(sparse=True, reduction='mean',
                                        is_grad=False)
model = Model(net, loss_fn=loss, optimizer=opt, metrics={'acc'})

# Load pre-trained model
param_dict = load_checkpoint(cfg.checkpoint_path)
load_param_into_net(net, param_dict)
net.set_train(False)

# Make predictions on the unseen dataset
acc = model.eval(dataset)
print("accuracy: ", acc)
```

- Running on GPU

```
# Set context
context.set_context(mode=context.GRAPH_MODE, device_target="GPU")

# Load unseen dataset for inference
dataset = dataset.create_dataset(cfg.data_path, 1, False)

# Define model
net = GoogleNet(num_classes=cfg.num_classes)
opt = Momentum(filter(lambda x: x.requires_grad, net.get_parameters()), 0.01,
               cfg.momentum, weight_decay=cfg.weight_decay)
loss = nn.SoftmaxCrossEntropyWithLogits(sparse=True, reduction='mean',
                                        is_grad=False)
model = Model(net, loss_fn=loss, optimizer=opt, metrics={'acc'})

# Load pre-trained model
param_dict = load_checkpoint(args_opt.checkpoint_path)
load_param_into_net(net, param_dict)
net.set_train(False)

# Make predictions on the unseen dataset
acc = model.eval(dataset)
print("accuracy: ", acc)
```
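The parameter list earlier mentions `onnx_filename` and a GEIR/AIR file name used in export.py. A hedged sketch of such an export step with MindSpore's `export` API is shown below; `GoogleNet` and `cfg` follow the snippets above, and the supported `file_format` strings depend on the MindSpore version:

```python
import numpy as np
from mindspore import Tensor
from mindspore.train.serialization import export, load_checkpoint, load_param_into_net

# Hedged sketch of export.py: restore weights, then export for inference.
net = GoogleNet(num_classes=cfg.num_classes)
param_dict = load_checkpoint(cfg.checkpoint_path)
load_param_into_net(net, param_dict)
net.set_train(False)

# Dummy input with the model's expected shape (NCHW).
input_arr = Tensor(np.zeros([1, 3, cfg.image_height, cfg.image_width], np.float32))
export(net, input_arr, file_name=cfg.onnx_filename, file_format="ONNX")
```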
### Continue Training on the Pretrained Model
- running on Ascend

```
# Load dataset
dataset = create_dataset(cfg.data_path, 1)
batch_num = dataset.get_dataset_size()

# Define model
net = GoogleNet(num_classes=cfg.num_classes)
# Continue training if set pre_trained to be True
if cfg.pre_trained:
    param_dict = load_checkpoint(cfg.checkpoint_path)
    load_param_into_net(net, param_dict)
lr = lr_steps(0, lr_max=cfg.lr_init, total_epochs=cfg.epoch_size,
              steps_per_epoch=batch_num)
opt = Momentum(filter(lambda x: x.requires_grad, net.get_parameters()),
               Tensor(lr), cfg.momentum, weight_decay=cfg.weight_decay)
loss = nn.SoftmaxCrossEntropyWithLogits(sparse=True, reduction='mean', is_grad=False)
model = Model(net, loss_fn=loss, optimizer=opt, metrics={'acc'},
              amp_level="O2", keep_batchnorm_fp32=False, loss_scale_manager=None)

# Set callbacks
config_ck = CheckpointConfig(save_checkpoint_steps=batch_num * 5,
                             keep_checkpoint_max=cfg.keep_checkpoint_max)
time_cb = TimeMonitor(data_size=batch_num)
ckpoint_cb = ModelCheckpoint(prefix="train_googlenet_cifar10", directory="./",
                             config=config_ck)
loss_cb = LossMonitor()

# Start training
model.train(cfg.epoch_size, dataset, callbacks=[time_cb, ckpoint_cb, loss_cb])
print("train success")
```

- running on GPU

```
# Load dataset
dataset = create_dataset(cfg.data_path, 1)
batch_num = dataset.get_dataset_size()

# Define model
net = GoogleNet(num_classes=cfg.num_classes)
# Continue training if set pre_trained to be True
if cfg.pre_trained:
    param_dict = load_checkpoint(cfg.checkpoint_path)
    load_param_into_net(net, param_dict)
lr = lr_steps(0, lr_max=cfg.lr_init, total_epochs=cfg.epoch_size,
              steps_per_epoch=batch_num)
opt = Momentum(filter(lambda x: x.requires_grad, net.get_parameters()),
               Tensor(lr), cfg.momentum, weight_decay=cfg.weight_decay)
loss = nn.SoftmaxCrossEntropyWithLogits(sparse=True, reduction='mean', is_grad=False)
model = Model(net, loss_fn=loss, optimizer=opt, metrics={'acc'},
              amp_level="O2", keep_batchnorm_fp32=True, loss_scale_manager=None)

# Set callbacks
config_ck = CheckpointConfig(save_checkpoint_steps=batch_num * 5,
                             keep_checkpoint_max=cfg.keep_checkpoint_max)
time_cb = TimeMonitor(data_size=batch_num)
ckpoint_cb = ModelCheckpoint(prefix="train_googlenet_cifar10",
                             directory="./ckpt_" + str(get_rank()) + "/",
                             config=config_ck)
loss_cb = LossMonitor()

# Start training
model.train(cfg.epoch_size, dataset, callbacks=[time_cb, ckpoint_cb, loss_cb])
print("train success")
```
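The snippets above call an `lr_steps` helper without showing it. A plausible sketch of such a piecewise step-decay schedule follows; the breakpoints and decay factors here are assumptions, not necessarily the repository's exact schedule:

```python
import numpy as np

def lr_steps(global_step, lr_max=0.1, total_epochs=125, steps_per_epoch=390):
    """Hedged sketch: piecewise-constant decay of the learning rate."""
    total_steps = steps_per_epoch * total_epochs
    decay_points = [int(total_steps * p) for p in (0.3, 0.6, 0.8)]
    lr_each_step = []
    for i in range(total_steps):
        if i < decay_points[0]:
            lr = lr_max
        elif i < decay_points[1]:
            lr = lr_max * 0.1
        elif i < decay_points[2]:
            lr = lr_max * 0.01
        else:
            lr = lr_max * 0.001
        lr_each_step.append(lr)
    # Return the remaining schedule from the current step onward.
    return np.array(lr_each_step[global_step:], dtype=np.float32)
```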
### Transfer Learning
To be added.