update vit

This commit is contained in:
gengdongjie 2021-08-26 19:43:31 +08:00
parent 8c6d4a05fc
commit 33e6608ba7
40 changed files with 4662 additions and 0 deletions

View File

@ -0,0 +1,526 @@
# Contents
[View Chinese version](./README_CN.md)
- [Vit Description](#vit-description)
- [Model Architecture](#model-architecture)
- [Dataset](#dataset)
- [Features](#features)
- [Mixed Precision](#mixed-precision)
- [Environment Requirements](#environment-requirements)
- [Quick Start](#quick-start)
- [Script Description](#script-description)
- [Script and Sample Code](#script-and-sample-code)
- [Script Parameters](#script-parameters)
- [Training Process](#training-process)
- [Training](#training)
- [Distributed Training](#distributed-training)
- [Evaluation Process](#evaluation-process)
- [Evaluation](#evaluation)
- [Export Process](#export-process)
- [Export](#export)
- [Inference Process](#inference-process)
- [Inference](#inference)
- [Model Description](#model-description)
- [Performance](#performance)
- [Evaluation Performance](#evaluation-performance)
- [Inference Performance](#inference-performance)
- [How to use](#how-to-use)
- [Inference](#inference)
- [Continue Training on the Pretrained Model](#continue-training-on-the-pretrained-model)
- [Description of Random Situation](#description-of-random-situation)
- [ModelZoo Homepage](#modelzoo-homepage)
# [Vit Description](#contents)
While the Transformer architecture has become the de-facto standard for natural language processing tasks, its applications to computer vision remain limited. In vision, attention is either applied in conjunction with convolutional networks, or used to replace certain components of convolutional networks while keeping their overall structure in place. We show that this reliance on CNNs is not necessary and a pure transformer applied directly to sequences of image patches can perform very well on image classification tasks. When pre-trained on large amounts of data and transferred to multiple mid-sized or small image recognition benchmarks (ImageNet, CIFAR-100, VTAB, etc.), Vision Transformer (ViT) attains excellent results compared to state-of-the-art convolutional networks while requiring substantially fewer computational resources to train.
[Paper](https://arxiv.org/abs/2010.11929): Alexey Dosovitskiy, Lucas Beyer, Alexander Kolesnikov, Dirk Weissenborn, Xiaohua Zhai, Thomas Unterthiner, Mostafa Dehghani, Matthias Minderer, Georg Heigold, Sylvain Gelly, Jakob Uszkoreit, Neil Houlsby. 2021.
# [Model Architecture](#contents)
Specifically, ViT consists of a transformer encoder. The structure is patch_embedding + n transformer layers + head (an FC layer for classification).
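As a rough illustration of this data flow, the following minimal numpy sketch walks one image through patch embedding to the classification head (shapes assume the vit_base_patch32 defaults of a 224x224 input, 32x32 patches, hidden size 768 and 1001 classes; this is only a sketch, not the repo's implementation in src/vit.py):

```python
import numpy as np

# One 224x224 RGB image split into 32x32 patches: (224/32)^2 = 49 patches.
img = np.random.rand(224, 224, 3).astype(np.float32)
patches = img.reshape(7, 32, 7, 32, 3).transpose(0, 2, 1, 3, 4).reshape(49, 32 * 32 * 3)

# Patch embedding: a linear projection of each flattened patch to the hidden size (768).
w_embed = np.random.rand(32 * 32 * 3, 768).astype(np.float32)
tokens = patches @ w_embed                       # (49, 768)

# Prepend a [CLS] token and add position embeddings -> encoder input of shape (50, 768).
cls_token = np.zeros((1, 768), np.float32)
pos_embed = np.zeros((50, 768), np.float32)
x = np.concatenate([cls_token, tokens]) + pos_embed

# ... n transformer encoder layers operate on x here ...
# The head is an FC layer applied to the [CLS] token.
w_head = np.random.rand(768, 1001).astype(np.float32)
logits = x[0] @ w_head                           # (1001,) class scores
print(logits.shape)
```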
# [Dataset](#contents)
Dataset used: [ImageNet2012](http://www.image-net.org/)
- Dataset size: 224*224 color images in 1,000 classes
    - Train: 1,281,167 images
    - Test: 50,000 images
- Data format: JPEG
- Note: Data will be processed in dataset.py
- Download the dataset, the directory structure is as follows:
```bash
└─dataset
    ├─train  # train dataset; should be packed as a .tar file when running on the cloud
    └─val    # evaluation dataset
```
- Data format: RGB images.
- Note: Data will be processed in src/dataset.py; a minimal loading sketch is shown below.
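The following is a minimal sketch of how such a directory can be read with MindSpore's dataset API; the real pipeline (interpolation choice, autoaugment, mixup, etc.) lives in src/dataset.py, and the transform parameters below are illustrative assumptions:

```python
import mindspore.dataset as ds
import mindspore.dataset.vision.c_transforms as C

def create_dataset_sketch(data_dir, image_size=224, batch_size=256, training=True):
    # Each class sits in its own sub-folder of train/ or val/, as in the tree above.
    data_set = ds.ImageFolderDataset(data_dir, num_parallel_workers=8, shuffle=training)
    mean = [0.485 * 255, 0.456 * 255, 0.406 * 255]
    std = [0.229 * 255, 0.224 * 255, 0.225 * 255]
    if training:
        trans = [C.RandomCropDecodeResize(image_size, scale=(0.05, 1.0)),  # crop_min=0.05 in the config
                 C.RandomHorizontalFlip(),
                 C.Normalize(mean=mean, std=std),
                 C.HWC2CHW()]
    else:
        trans = [C.Decode(), C.Resize(256), C.CenterCrop(image_size),
                 C.Normalize(mean=mean, std=std), C.HWC2CHW()]
    data_set = data_set.map(operations=trans, input_columns="image", num_parallel_workers=8)
    return data_set.batch(batch_size, drop_remainder=training)
```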
# [Features](#contents)
## Mixed Precision
The [mixed precision](https://www.mindspore.cn/tutorial/training/en/master/advanced_use/enable_mixed_precision.html) training method accelerates the deep learning neural network training process by using both the single-precision and half-precision data formats, and maintains the network precision achieved by the single-precision training at the same time. Mixed precision training can accelerate the computation process, reduce memory usage, and enable a larger model or batch size to be trained on specific hardware.
For FP16 operators, if the input data type is FP32, the MindSpore backend will automatically handle it with reduced precision. Users can check the reduced-precision operators by enabling the INFO log and then searching for "reduce precision".
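For reference, eval.py in this repository builds its Model with amp_level="O3". Below is a minimal, hedged sketch of enabling mixed precision at the Model level; the toy network and hyper-parameter values are placeholders, not the repo's real ViT:

```python
import mindspore.nn as nn
from mindspore.train.model import Model
from mindspore.train.loss_scale_manager import FixedLossScaleManager

# A toy network stands in for the real ViT (built in src/vit.py) just to show the API surface.
net = nn.Dense(3072, 1001)
loss = nn.SoftmaxCrossEntropyWithLogits(sparse=True, reduction='mean')
opt = nn.AdamWeightDecay(net.trainable_params(), learning_rate=0.00355, weight_decay=0.05)

# amp_level="O3" runs the network in float16 (see the MindSpore amp documentation for the exact policy);
# the fixed loss scale mirrors the loss_scale: 1024 entry in the config shown later in this README.
loss_scale_manager = FixedLossScaleManager(1024, drop_overflow_update=False)
model = Model(net, loss_fn=loss, optimizer=opt, metrics={'acc'},
              amp_level="O3", loss_scale_manager=loss_scale_manager)
```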
# [Environment Requirements](#contents)
- Hardware: Ascend/GPU/CPU
- Prepare hardware environment with Ascend/GPU/CPU processor.
- Framework
- [MindSpore](https://www.mindspore.cn/install/en)
- For more information, please check the resources below
- [MindSpore Tutorials](https://www.mindspore.cn/tutorial/training/en/master/index.html)
- [MindSpore Python API](https://www.mindspore.cn/doc/api_python/en/master/index.html)
# [Quick Start](#contents)
After installing MindSpore via the official website, you can start training and evaluation as follows:
- running on Ascend
```python
# run training example; CONFIG_PATH is one of the files under ./config/ (*.yml or *.yaml)
python train.py --config_path=[CONFIG_PATH] > train.log 2>&1 &
# run distributed training example
cd scripts;
bash run_train_distribute.sh [RANK_TABLE_FILE] [CONFIG_PATH]
# run evaluation example
cd scripts;
bash run_eval.sh [RANK_TABLE_FILE] [CONFIG_PATH]
# run inference example
cd scripts;
bash run_infer_310.sh [MINDIR_PATH] [NET_TYPE] [DATASET] [DATA_PATH] [DEVICE_ID]
```
For distributed training, an HCCL configuration file (RANK_TABLE_FILE) in JSON format needs to be created in advance.
Please follow the instructions in the link below:
<https://gitee.com/mindspore/mindspore/tree/master/model_zoo/utils/hccl_tools>.
- ModelArts (if you want to run on ModelArts, please check the official documentation of [modelarts](https://support.huaweicloud.com/modelarts/); you can then start training as follows)
- Train imagenet 8p on ModelArts
```python
# (1) Add "config_path='/path_to_code/config/vit_patch32_imagenet2012_config_cloud.yml'" on the website UI interface.
# (2) Perform a or b.
# a. Set "enable_modelarts=1" on yml file.
# Set "output_path" on yml file.
# Set "data_path='/cache/data/ImageNet/'" on yml file.
# Set other parameters on yml file you need.
# b. Add "enable_modelarts=1" on the website UI interface.
# Set "output_path" on yml file.
# Set "data_path='/cache/data/ImageNet/'" on yml file.
# Add other parameters on the website UI interface.
# (3) Upload a zip dataset to the S3 bucket. (You could also upload the original dataset, but it may be slow.)
# (4) Set the code directory to "/path/vit" on the website UI interface.
# (5) Set the startup file to "train.py" on the website UI interface.
# (6) Set the "Dataset path" and "Output file path" and "Job log path" to your path on the website UI interface.
# (7) Create your job.
```
- Eval imagenet on ModelArts
```python
# (1) Add "config_path='/path_to_code/config/vit_eval.yml'" on the website UI interface.
# (2) Perform a or b.
# a. Set "enable_modelarts=1" on yml file.
# Set "output_path" on yml file.
# Set "data_path='/cache/data/ImageNet/'" on yml file.
# Set "checkpoint_url='s3://dir_to_trained_ckpt/'" on yml file.
# Set "load_path='/cache/checkpoint_path/model.ckpt'" on yml file.
# Set other parameters on yml file you need.
# b. Add "enable_modelarts=1" on the website UI interface.
# Add "dataset_name=imagenet" on the website UI interface.
# Add "val_data_path=/cache/data/ImageNet/val/" on the website UI interface.
# Add "checkpoint_url='s3://dir_to_trained_ckpt/'" on the website UI interface.
# Add "load_path='/cache/checkpoint_path/model.ckpt'" on the website UI interface.
# Add other parameters on the website UI interface.
# (3) Upload or copy your pretrained model to S3 bucket.
# (4) Upload a zip dataset to the S3 bucket. (You could also upload the original dataset, but it may be slow.)
# (5) Set the code directory to "/path/vit" on the website UI interface.
# (6) Set the startup file to "eval.py" on the website UI interface.
# (7) Set the "Dataset path" and "Output file path" and "Job log path" to your path on the website UI interface.
# (8) Create your job.
```
- Export on ModelArts
```python
# (1) Add "config_path='/path_to_code/config/vit_export.yml'" on the website UI interface.
# (2) Perform a or b.
# a. Set "enable_modelarts=1" on yml file.
# Set "checkpoint_url='s3://dir_to_trained_ckpt/'" on yml file.
# Set "load_path='/cache/checkpoint_path/model.ckpt'" on yml file.
# Set other parameters on yml file you need.
# b. Add "enable_modelarts=1" on the website UI interface.
# Add "checkpoint_url=s3://dir_to_trained_ckpt/" on the website UI interface.
# Add "load_path=/cache/checkpoint_path/model.ckpt" on the website UI interface.
# Add other parameters on the website UI interface.
# (3) Upload or copy your trained model to S3 bucket.
# (4) Set the code directory to "/path/vit" on the website UI interface.
# (5) Set the startup file to "export.py" on the website UI interface.
# (6) Set the "Dataset path" and "Output file path" and "Job log path" to your path on the website UI interface.
# (7) Create your job.
```
# [Script Description](#contents)
## [Script and Sample Code](#contents)
```text
├── model_zoo
├── README.md // descriptions about all the models
├── vit
├── README.md // descriptions about vit
├── ascend310_infer // application for 310 inference
├── scripts
│ ├──run_train_distribute.sh // shell script for distributed on Ascend
│ ├──run_train_standalone.sh // shell script for single node on Ascend
│ ├──run_eval.sh // shell script for evaluation on Ascend
│ ├──run_infer_310.sh // shell script for 310 inference
├── src
│ ├──autoaugment.py // autoaugment for data processing
│ ├──callback.py // logging callback
│ ├──cross_entropy.py // ce loss
│ ├──dataset.py // creating dataset
│ ├──eval_engine.py // eval code
│ ├──logging.py // logging engine
│ ├──lr_generator.py // lr schedule
│ ├──metric.py // metric for eval
│ ├──optimizer.py // user defined optimizer
│ ├──vit.py // model architecture
│ ├──model_utils // cloud-related helper files shared by all model zoo models; users are not recommended to change them
├── config
│ ├──vit_eval.yml // parameter configuration for eval
│ ├──vit_export.yml // parameter configuration for export
│ ├──vit_patch32_imagenet2012_config.yml // parameter configuration for 8P training
│ ├──vit_patch32_imagenet2012_config_cloud.yml // parameter configuration for 8P training on cloud
│ ├──vit_patch32_imagenet2012_config_standalone.yml // parameter configuration for 1P training
├── train.py // training script
├── eval.py // evaluation script
├── postprogress.py // post process for 310 inference
├── export.py // export checkpoint files into air/mindir
├── create_imagenet2012_label.py // create label for 310 inference
├── requirements.txt // requirements pip list
├── mindspore_hub_conf.py // hub configuration file required by MindSpore Hub
```
## [Script Parameters](#contents)
Parameters for both training and evaluation can be set in the .yml files under ./config/.
- config for vit, ImageNet dataset
```python
enable_modelarts: 1 # train on cloud or not
# Url for modelarts
data_url: "" # S3 dataset path
train_url: "" # S3 output path
checkpoint_url: "" # S3 pretrain model path
output_path: "/cache/train" # output cache, copy to train_url
data_path: "/cache/datasets/imagenet" # dataset cache(real path on cloud), copy from data_url
load_path: "/cache/model/vit_base_patch32.ckpt" # model cache, copy from checkpoint_url
# train datasets
dataset_path: '/cache/datasets/imagenet/train' # training dataset
train_image_size: 224 # image height and width used as input to the model
interpolation: 'BILINEAR' # dataset interpolation
crop_min: 0.05 # random crop min value
batch_size: 256 # batch size for train
train_num_workers: 14 # number of parallel workers
# eval datasets
eval_path: '/cache/datasets/imagenet/val' # eval dataset
eval_image_size: 224 # image height and width used as input to the model
eval_batch_size: 256 # batch size for eval
eval_interval: 1 # eval interval
eval_offset: -1 # eval offset
eval_num_workers: 12 # number of parallel workers
# network
backbone: 'vit_base_patch32' # backbone type
class_num: 1001 # class number, imagenet is 1000+1
vit_config_path: 'src.vit.VitConfig' # vit config path; advanced users can follow this class to design new transformer-based architectures
pretrained: '' # pre-trained model path, '' means not use pre-trained model
# lr
lr_decay_mode: 'cosine' # lr decay type; supports cosine, exp, etc., see lr_generator.py for details
lr_init: 0.0 # start lr(epoch 0)
lr_max: 0.00355 # max lr
lr_min: 0.0 # min lr (max epoch)
max_epoch: 300 # max epoch
warmup_epochs: 40 # warmup epoch
# optimizer
opt: 'adamw' # optimizer type
beta1: 0.9 # adam beta
beta2: 0.999 # adam beta
weight_decay: 0.05 # weight decay
no_weight_decay_filter: "beta,bias" # parameter name keywords excluded from weight decay
gc_flag: 0 # whether to use gc; not supported for user-defined optimizers, only for the built-in ones
# loss, some parameter also used in datasets
loss_scale: 1024 # amp loss scale
use_label_smooth: 1 # use label smooth or not
label_smooth_factor: 0.1 #label smooth factor
mixup: 0.2 # use mixup or not
autoaugment: 1 # use autoaugment or not
loss_name: "ce_smooth_mixup" #loss type, detail see cross_entropy.py
# ckpt
save_checkpoint: 1 # save .ckpt(training result) or not
save_checkpoint_epochs: 8 # when to save .ckpt
keep_checkpoint_max: 3 # max keep ckpt
save_checkpoint_path: './outputs' # save path
# profiler
open_profiler: 0 # whether to run the profiler; if enabled, use a small training dataset and set max_epoch=1
```
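The lr_init / lr_max / lr_min / warmup_epochs / max_epoch fields above describe a linear warmup followed by cosine decay. A minimal sketch of such a schedule is shown below; the exact curve used for training is generated in src/lr_generator.py, so treat this as an approximation:

```python
import math

def warmup_cosine_lr(lr_init=0.0, lr_max=0.00355, lr_min=0.0,
                     warmup_epochs=40, max_epoch=300, steps_per_epoch=625):
    """Per-step learning rates: linear warmup, then cosine decay (approximation of src/lr_generator.py)."""
    total_steps = max_epoch * steps_per_epoch
    warmup_steps = warmup_epochs * steps_per_epoch
    lrs = []
    for step in range(total_steps):
        if step < warmup_steps:
            lr = lr_init + (lr_max - lr_init) * step / warmup_steps
        else:
            progress = (step - warmup_steps) / (total_steps - warmup_steps)
            lr = lr_min + 0.5 * (lr_max - lr_min) * (1 + math.cos(math.pi * progress))
        lrs.append(lr)
    return lrs

lrs = warmup_cosine_lr()
print(lrs[0], lrs[40 * 625], lrs[-1])   # ~lr_init, ~lr_max, ~lr_min
```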
For more configuration details, please refer to the scripts `train.py`, `eval.py`, `export.py` and the files under `config/*.yml`.
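The --config_path argument points at one of these .yml files. The repository's own parser lives in src/model_utils/config.py; conceptually it behaves roughly like the hedged sketch below (the yaml dependency and attribute-style access are illustrative assumptions):

```python
import argparse
from types import SimpleNamespace
import yaml  # assumed available; the repo's requirements.txt lists its own dependencies

def load_config():
    parser = argparse.ArgumentParser()
    parser.add_argument("--config_path", type=str, required=True)
    path = parser.parse_known_args()[0].config_path
    with open(path, "r") as f:
        cfg = yaml.safe_load(f)
    # expose keys as attributes, e.g. config.batch_size, config.lr_max
    return SimpleNamespace(**cfg)

# config = load_config()
```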
## [Training Process](#contents)
### Training
- running on Ascend
```bash
python train.py --config_path=[CONFIG_PATH] > train.log 2>&1 &
```
The python command above will run in the background; you can view the results through the file `train.log`.
After training, you will get some checkpoint files under the script folder by default. Loss values similar to the following will be printed:
```bash
# vim log
2021-08-05 15:17:12:INFO:compile time used=143.16s
2021-08-05 15:34:41:INFO:epoch[0], epoch time: 1048.72s, per step time: 0.2096s, loss=6.738676, lr=0.000011, fps=1221.51
2021-08-05 15:52:03:INFO:epoch[1], epoch time: 1041.90s, per step time: 0.2082s, loss=6.381927, lr=0.000022, fps=1229.51
...
```
The model checkpoint will be saved in the train directory.
### Distributed Training
- running on Ascend
```bash
cd scripts
bash run_train_distribute.sh [RANK_TABLE_FILE] [CONFIG_PATH]
```
The above shell script will run distributed training in the background. You can view the results through the file `train_parallel[X]/log`; loss values similar to the following will be printed (a sketch of the per-device parallel setup performed by the script is shown after the log):
```bash
# vim train_parallel0/log
# fps depends on CPU processing ability; data processing takes time
2021-08-05 20:15:16:INFO:compile time used=191.77s
2021-08-05 20:17:46:INFO:epoch[0], epoch time: 149.10s, per step time: 0.2386s, loss=6.729037, lr=0.000089, fps=8584.97, accuracy=0.014940, eval_cost=1.58
2021-08-05 20:20:11:INFO:epoch[1], epoch time: 143.44s, per step time: 0.2295s, loss=6.786729, lr=0.000177, fps=8923.72, accuracy=0.047000, eval_cost=1.27
...
2021-08-06 08:18:18:INFO:epoch[299], epoch time: 143.19s, per step time: 0.2291s, loss=2.718115, lr=0.000000, fps=8939.29, accuracy=0.741800, eval_cost=1.28
2021-08-06 08:18:20:INFO:training time used=43384.70s
2021-08-06 08:18:20:INFO:last_metric[0.74206]
2021-08-06 08:18:20:INFO:ip[*.*.*.*], mean_fps[8930.40]
```
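Under the hood, each of the processes launched by run_train_distribute.sh configures data-parallel training roughly as follows (a sketch following the pattern used by eval.py in this repository; the RANK_TABLE_FILE wiring itself is done by the launch script):

```python
import os
from mindspore import context
from mindspore.context import ParallelMode
from mindspore.communication.management import init

device_id = int(os.getenv('DEVICE_ID', '0'))
device_num = int(os.getenv('RANK_SIZE', '1'))

context.set_context(mode=context.GRAPH_MODE, device_target="Ascend", device_id=device_id)
if device_num > 1:
    context.set_auto_parallel_context(device_num=device_num,
                                      parallel_mode=ParallelMode.DATA_PARALLEL,
                                      gradients_mean=True)
    init()   # initialize HCCL; requires RANK_TABLE_FILE to be set by the launch script
```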
## [Evaluation Process](#contents)
### Evaluation
- evaluation on imagenet dataset when running on Ascend
Before running the command below, please check the checkpoint path used for evaluation. Please set the checkpoint path to be the absolute full path, e.g., "username/vit/vit_base_patch32.ckpt".
```bash
cd scripts
bash run_eval.sh [RANK_TABLE_FILE] [CONFIG_PATH]
```
The above command will run in the background. You can view the results through the file "eval.log". The accuracy on the test dataset will be as follows:
```bash
# grep "accuracy=" eval0/log
accuracy=0.741260
```
Note that for evaluation after distributed training, please set checkpoint_path to the saved checkpoint file, such as "username/vit/train_parallel0/outputs/vit_base_patch32-288_625.ckpt". The accuracy on the test dataset will be as follows:
```bash
# grep "accuracy=" eval0/log
accuracy=0.741260
```
## [Export Process](#contents)
### [Export](#content)
Before exporting the model, you must modify the config file, config/vit_export.yml.
The config items you should modify are batch_size and pretrained (the checkpoint to export); a minimal Python sketch of the export step follows the command below.
```bash
python export.py --config_path=[CONFIG_PATH]
```
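The command above essentially performs the following steps (abridged from export.py in this repository; the 224 input size is assumed here, export.py takes it from the config):

```python
import numpy as np
from mindspore import Tensor, load_checkpoint, load_param_into_net, export, context
from src.vit import get_network
from src.model_utils.config import config

context.set_context(mode=context.GRAPH_MODE, device_target=config.device_target)
net = get_network(backbone_name=config.backbone, args=config)   # e.g. vit_base_patch32
param_dict = load_checkpoint(config.pretrained)                  # checkpoint set in config/vit_export.yml
load_param_into_net(net, param_dict)
input_arr = Tensor(np.zeros([config.batch_size, 3, 224, 224], np.float32))
export(net, input_arr, file_name=config.file_name, file_format=config.file_format)  # AIR or MINDIR
```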
## [Inference Process](#contents)
### [Inference](#content)
Before performing inference, we need to export the model first. The AIR model can only be exported in the Ascend 910 environment, while the MindIR model can be exported in any environment.
Currently batch_size can only be set to 1.
- inference on the ImageNet dataset when running on Ascend
Before running the command below, you should modify the config file; the items to modify are batch_size and val_data_path.
Inference results will be stored under the script path, and you can find results like the following in acc.log (a hedged post-processing sketch is shown after the command block below).
```shell
# Ascend310 inference
cd scripts
bash run_infer_310.sh [MINDIR_PATH] [NET_TYPE] [DATASET] [DATA_PATH] [DEVICE_ID]
Total data: 50000, top1 accuracy: 0.74084, top5 accuracy: 0.91026
```
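After the script finishes, each image has a raw output file under result_Files/ named `<image>_0.bin` (see WriteResult in ascend310_infer), and create_imagenet2012_label.py produces imagenet_label.json. The authoritative accuracy computation is in postprogress.py; the following is only a hedged sketch of the idea, and the file-name matching and possible class-index offset for the 1001-class head are assumptions:

```python
import os
import json
import numpy as np

def top1_from_bins(result_dir="./result_Files", label_file="imagenet_label.json", num_classes=1001):
    with open(label_file) as f:
        labels = json.load(f)                      # original image file name -> class index
    correct, total = 0, 0
    for name, gt in labels.items():
        bin_path = os.path.join(result_dir, os.path.splitext(name)[0] + "_0.bin")
        if not os.path.exists(bin_path):
            continue
        logits = np.fromfile(bin_path, dtype=np.float32).reshape(-1, num_classes)
        # note: with class_num=1001 the repo may offset predictions by one background class;
        # see postprogress.py for the exact handling.
        correct += int(np.argmax(logits[0]) == gt)
        total += 1
    print("Total data: {}, top1 accuracy: {:.5f}".format(total, correct / total))
```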
# [Model Description](#contents)
## [Performance](#contents)
### Evaluation Performance
#### Vit on ImageNet (1.2 million images)
| Parameters | Ascend |
| -------------------------- | ----------------------------------------------------------- |
| Model Version | Vit |
| Resource | Ascend 910; CPU 2.60GHz, 56cores; Memory 314G; OS Euler2.8 |
| Uploaded Date | 08/30/2021 (month/day/year) |
| MindSpore Version | 1.3.0 |
| Dataset | 1200k images |
| Training Parameters | epoch=300, steps=625*300, batch_size=256, lr=0.00355 |
| Optimizer | Adamw |
| Loss Function | Softmax Cross Entropy |
| outputs | probability |
| Loss | 1.0 |
| Speed | 1pc: 180 ms/step; 8pcs: 185 ms/step |
| Total time | 8pcs: 11 hours |
| Parameters (M) | 86.0 |
| Checkpoint for Fine tuning | 1000M (.ckpt file) |
| Scripts | [vit script](https://gitee.com/mindspore/mindspore/blob/master/model_zoo/official/cv/vit) |
### Inference Performance
#### Vit on 1.2 million images
| Parameters | Ascend |
| ------------------- | --------------------------- |
| Model Version | Vit |
| Resource | Ascend 910; OS Euler2.8 |
| Uploaded Date | 08/30/2021 (month/day/year) |
| MindSpore Version | 1.3.0 |
| Dataset | 1200k images |
| batch_size | 256 |
| outputs | probability |
| Accuracy | 8pcs: 73.5%-74.6% |
## [How to use](#contents)
### Inference
If you need to use the trained model to perform inference on multiple hardware platforms, such as GPU, Ascend 910 or Ascend 310, you can refer to this [link](https://www.mindspore.cn/tutorial/training/en/master/advanced_use/migrate_3rd_scripts.html). The following is a simple example:
- Running on Ascend
```python
# get args from cfg and get parameter by args
args.loss_scale = ...
lrs = ...
...
# Set context
context.set_context(mode=context.GRAPH_MODE, device_target=args.device_target)
context.set_context(device_id=args.device_id)
# Load unseen dataset for inference
dataset = dataset.create_dataset(args.data_path, 1, False)
# Define model
net = ViT(args.vit_config)
opt = AdamW(filter(lambda x: x.requires_grad, net.get_parameters()), lrs, args.beta1, args.beta2, loss_scale=args.loss_scale, weight_decay=cfg.weight_decay)
loss = CrossEntropySmoothMixup(smooth_factor=args.label_smooth_factor, num_classes=args.class_num)
model = Model(net, loss_fn=loss, optimizer=opt, metrics={'acc'})
# Load pre-trained model
param_dict = load_checkpoint(args.pretrained)
load_param_into_net(net, param_dict)
net.set_train(False)
# Make predictions on the unseen dataset
acc = model.eval(dataset)
print("accuracy: ", acc)
```
### Continue Training on the Pretrained Model
- running on Ascend
```python
# get args from cfg and get parameter by args
args.loss_scale = ...
lrs = ...
...
# Load dataset
dataset = create_dataset(cfg.data_path, 1)
batch_num = dataset.get_dataset_size()
# Define model
net = ViT(args.vit_config)
# Continue training if 'pretrained' points to a checkpoint
if cfg.pretrained != '':
    param_dict = load_checkpoint(cfg.pretrained)
    load_param_into_net(net, param_dict)
# Define model
opt = AdamW(filter(lambda x: x.requires_grad, net.get_parameters()), lrs, args.beta1, args.beta2, loss_scale=args.loss_scale, weight_decay=cfg.weight_decay)
loss = CrossEntropySmoothMixup(smooth_factor=args.label_smooth_factor, num_classes=args.class_num)
model = Model(net, loss_fn=loss, optimizer=opt, metrics={'acc'})
# Start training
epoch_size = args.max_epoch
step_size = dataset.get_dataset_size()
# Set callbacks
state_cb = StateMonitor(data_size=step_size,
tot_batch_size=args.batch_size * device_num,
lrs=lrs,
eval_interval=args.eval_interval,
eval_offset=args.eval_offset,
eval_engine=eval_engine,
logger=args.logger.info)
cb = [state_cb, ]
model.train(epoch_size, dataset, callbacks=cb, sink_size=step_size)
print("train success")
```
# [Description of Random Situation](#contents)
In dataset.py, we set the seed inside the "create_dataset" function. A random seed is also used in train.py.
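A minimal sketch of the kind of seeding involved is shown below; the exact calls live in src/dataset.py and train.py, and the specific APIs used there may differ:

```python
import numpy as np
import mindspore.dataset as ds
from mindspore.common import set_seed

set_seed(1)            # global MindSpore seed (weight init, dropout, ...)
ds.config.set_seed(1)  # dataset pipeline seed (shuffle, random augmentations)
np.random.seed(1)      # numpy-side randomness (e.g. augmentation coefficients)
```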
# [ModelZoo Homepage](#contents)
Please check the official [homepage](https://gitee.com/mindspore/mindspore/tree/master/model_zoo).

View File

@ -0,0 +1,532 @@
# Contents
[View English](./README.md)
<!-- TOC -->
- [Contents](#contents)
- [Vit Description](#vit-description)
- [Model Architecture](#model-architecture)
- [Dataset](#dataset)
- [Features](#features)
- [Mixed Precision](#mixed-precision)
- [Environment Requirements](#environment-requirements)
- [Quick Start](#quick-start)
- [Script Description](#script-description)
- [Script and Sample Code](#script-and-sample-code)
- [Script Parameters](#script-parameters)
- [Training Process](#training-process)
- [Training](#training)
- [Distributed Training](#distributed-training)
- [Evaluation Process](#evaluation-process)
- [Evaluation](#evaluation)
- [Export Process](#export-process)
- [Export](#export)
- [Inference Process](#inference-process)
- [Inference](#inference)
- [Model Description](#model-description)
- [Performance](#performance)
- [Evaluation Performance](#evaluation-performance)
- [Vit on ImageNet (1.2 million images)](#vit-on-imagenet-12-million-images)
- [Inference Performance](#inference-performance)
- [Vit on 1.2 million images](#vit-on-12-million-images)
- [How to use](#how-to-use)
- [Inference](#inference)
- [Continue Training on the Pretrained Model](#continue-training-on-the-pretrained-model)
- [Description of Random Situation](#description-of-random-situation)
- [ModelZoo Homepage](#modelzoo-homepage)
<!-- /TOC -->
# Vit Description
ViT (Vision Transformer) differs from traditional CNN-based networks in that it is a computer-vision network built on the transformer architecture. Published by Google Research in 2021, it shows very strong generalization when trained on large datasets, and large-data tasks such as CLIP achieve good results on top of this structure.
[论文](https://arxiv.org/abs/2010.11929): Alexey Dosovitskiy, Lucas Beyer, Alexander Kolesnikov, Dirk Weissenborn, Xiaohua Zhai, Thomas Unterthiner, Mostafa Dehghani, Matthias Minderer, Georg Heigold, Sylvain Gelly, Jakob Uszkoreit, Neil Houlsby. 2021.
# Model Architecture
Vit is built by stacking multiple transformer encoder blocks. The basic structure is patch_embedding + n transformer layers + head (an FC layer in the classification network).
# Dataset
Dataset used: [ImageNet2012](http://www.image-net.org/)
- Dataset size: 125G, 1.25 million color images in 1,000 classes
    - Training set: 120G, 1.2 million images
    - Test set: 5G, 50,000 images
- Data format: RGB
- Note: Data will be processed in src/dataset.py.
```bash
└─dataset
    ├─train  # training set; must be packed as a .tar file when training on the cloud
    └─val    # evaluation set
```
# Features
## Mixed Precision
The [mixed precision](https://www.mindspore.cn/tutorial/training/zh-CN/master/advanced_use/enable_mixed_precision.html) training method uses both single-precision and half-precision data to speed up the training of deep neural networks while preserving the accuracy achievable with single-precision training. Mixed precision training speeds up computation and reduces memory usage, and makes it possible to train larger models or use larger batch sizes on specific hardware.
Taking FP16 operators as an example, if the input data type is FP32, the MindSpore backend automatically reduces the precision to process the data. Users can enable the INFO log and search for "reduce precision" to view operators whose precision was reduced.
# Environment Requirements
- Hardware: Ascend/GPU/CPU
    - Prepare a hardware environment with Ascend/GPU/CPU processors.
- Framework
    - [MindSpore](https://www.mindspore.cn/install/en)
- For more information, please check the resources below:
    - [MindSpore Tutorials](https://www.mindspore.cn/tutorial/training/zh-CN/master/index.html)
    - [MindSpore Python API](https://www.mindspore.cn/doc/api_python/en/master/index.html)
# Quick Start
After installing MindSpore via the official website, you can start training and evaluation as follows:
- running on Ascend
```python
# run training example; CONFIG_PATH is one of the files under ./config/
python train.py --config_path=[CONFIG_PATH] > train.log 2>&1 &
# run distributed training example
cd scripts;
bash run_train_distribute.sh [RANK_TABLE_FILE] [CONFIG_PATH]
# run evaluation example
cd scripts;
bash run_eval.sh [RANK_TABLE_FILE] [CONFIG_PATH]
# run inference example
cd scripts;
bash run_infer_310.sh [MINDIR_PATH] [NET_TYPE] [DATASET] [DATA_PATH] [DEVICE_ID]
```
For distributed training, an HCCL configuration file (RANK_TABLE_FILE) in JSON format needs to be created in advance.
Please follow the instructions in the link below:
<https://gitee.com/mindspore/mindspore/tree/master/model_zoo/utils/hccl_tools>.
- Training on ModelArts (if you want to run on ModelArts, please refer to the official documentation of [modelarts](https://support.huaweicloud.com/modelarts/))
- Train ImageNet with 8 cards on ModelArts
```python
# (1) Set "config_path='/path_to_code/config/vit_patch32_imagenet2012_config_cloud.yml'" on the website UI interface.
# (2) Perform a or b.
# a. Set "enable_modelarts=True" in the .yml file.
# Set "output_path" in the .yml file.
# Set "data_path='/cache/data/ImageNet/'" in the .yml file.
# Set other required parameters in the .yml file.
# b. Set "enable_modelarts=True" on the website UI interface.
# Set "output_path" on the website UI interface.
# Set "data_path='/cache/data/ImageNet/'" on the website UI interface.
# Set other parameters on the website UI interface.
# (3) Upload your zip dataset to the S3 bucket. (You could also upload the original dataset, but it may be slow.)
# (4) Set the code directory to "/path/vit" on the website UI interface.
# (5) Set the startup file to "train.py" on the website UI interface.
# (6) Set the "Dataset path", "Output file path" and "Job log path" on the website UI interface.
# (7) Create the training job.
```
- Evaluate ImageNet with a single card on ModelArts
```python
# (1) Set "config_path='/path_to_code/config/vit_eval.yml'" on the website UI interface.
# (2) Perform a or b.
# a. Set "enable_modelarts=True" in the .yml file.
# Set "dataset_name='imagenet'" in the .yml file.
# Set "val_data_path='/cache/data/ImageNet/val/'" in the .yml file.
# Set "checkpoint_url='s3://dir_to_trained_ckpt/'" in the .yml file.
# Set "checkpoint_path='/cache/checkpoint_path/model.ckpt'" in the .yml file.
# Set other required parameters in the .yml file.
# b. Set "enable_modelarts=True" on the website UI interface.
# Set "dataset_name=imagenet" on the website UI interface.
# Set "val_data_path=/cache/data/ImageNet/val/" on the website UI interface.
# Set "checkpoint_url='s3://dir_to_trained_ckpt/'" on the website UI interface.
# Set "checkpoint_path='/cache/checkpoint_path/model.ckpt'" on the website UI interface.
# Set other parameters on the website UI interface.
# (3) Upload your pretrained model to the S3 bucket.
# (4) Upload your zip dataset to the S3 bucket. (You could also upload the original dataset, but it may be slow.)
# (5) Set the code directory to "/path/vit" on the website UI interface.
# (6) Set the startup file to "eval.py" on the website UI interface.
# (7) Set the "Dataset path", "Output file path" and "Job log path" on the website UI interface.
# (8) Create the training job.
```
- Export the model on ModelArts
```python
# (1) Set "config_path='/path_to_code/config/vit_export.yml'" on the website UI interface.
# (2) Perform a or b.
# a. Set "enable_modelarts=True" in the .yml file.
# Set "checkpoint_url='s3://dir_to_trained_ckpt/'" in the .yml file.
# Set "load_path='/cache/checkpoint_path/model.ckpt'" in the .yml file.
# Set other required parameters in the .yml file.
# b. Set "enable_modelarts=True" on the website UI interface.
# Set "checkpoint_url=s3://dir_to_trained_ckpt/" on the website UI interface.
# Set "load_path=/cache/checkpoint_path/model.ckpt" on the website UI interface.
# Set other parameters on the website UI interface.
# (3) Upload your trained model to the S3 bucket.
# (4) Upload your zip dataset to the S3 bucket. (You could also upload the original dataset, but it may be slow.)
# (5) Set the code directory to "/path/vit" on the website UI interface.
# (6) Set the startup file to "export.py" on the website UI interface.
# (7) Set the "Dataset path", "Output file path" and "Job log path" on the website UI interface.
# (8) Create the training job.
```
# Script Description
## Script and Sample Code
```text
├── model_zoo
├── README.md // descriptions about all the models
├── vit
├── README.md // descriptions about vit
├── ascend310_infer // application for 310 inference
├── scripts
│ ├──run_train_distribute.sh // shell script for distributed training on Ascend
│ ├──run_train_standalone.sh // shell script for single-card training on Ascend
│ ├──run_eval.sh // shell script for evaluation on Ascend
│ ├──run_infer_310.sh // shell script for 310 inference
├── src
│ ├──autoaugment.py // autoaugment data-augmentation policy
│ ├──callback.py // callback for printing results
│ ├──cross_entropy.py // ce loss functions
│ ├──dataset.py // dataset creation
│ ├──eval_engine.py // evaluation engine
│ ├──logging.py // custom logging
│ ├──lr_generator.py // lr schedule
│ ├──metric.py // metric computation for evaluation
│ ├──optimizer.py // optimizer
│ ├──vit.py // model architecture
│ ├──model_utils // cloud training dependencies
├── config
│ ├──vit_eval.yml // evaluation configuration
│ ├──vit_export.yml // export configuration
│ ├──vit_patch32_imagenet2012_config.yml // 8P training configuration
│ ├──vit_patch32_imagenet2012_config_cloud.yml // 8P cloud training configuration
│ ├──vit_patch32_imagenet2012_config_standalone.yml // 1P training configuration
├── train.py // training script
├── eval.py // evaluation script
├── postprogress.py // post processing for 310 inference
├── export.py // export checkpoint files into air/mindir
├── create_imagenet2012_label.py // create labels for 310 inference
├── requirements.txt // required pip packages
├── mindspore_hub_conf.py // mindspore_hub_conf file required by MindSpore Hub
```
## Script Parameters
Parameters for both training and evaluation can be configured in the .yml files under ./config/.
- config for vit and the ImageNet dataset
```python
enable_modelarts: 1 # whether to train on the cloud
# modelarts (cloud) parameters
data_url: "" # S3 dataset path
train_url: "" # S3 output path
checkpoint_url: "" # S3 pretrained model path
output_path: "/cache/train" # real path on the cloud machine; copied to train_url
data_path: "/cache/datasets/imagenet" # real path on the cloud machine; copied from data_url
load_path: "/cache/model/vit_base_patch32.ckpt" # real path on the cloud machine; copied from checkpoint_url
# train datasets
dataset_path: '/cache/datasets/imagenet/train' # training dataset path
train_image_size: 224 # input image height and width
interpolation: 'BILINEAR' # interpolation used in image preprocessing
crop_min: 0.05 # random crop min value
batch_size: 256 # training batch size
train_num_workers: 14 # number of parallel workers
# eval datasets
eval_path: '/cache/datasets/imagenet/val' # eval dataset
eval_image_size: 224 # input image height and width
eval_batch_size: 256 # eval batch size
eval_interval: 1 # eval interval
eval_offset: -1 # eval offset
eval_num_workers: 12 # number of parallel workers
# network
backbone: 'vit_base_patch32' # backbone; vit_base_patch32 and vit_base_patch16 are currently supported, more can be added in vit.py
class_num: 1001 # number of classes in the training dataset
vit_config_path: 'src.vit.VitConfig' # vit config path; advanced users can follow this class to customize transformer-based cv networks
pretrained: '' # pretrained model path; '' means training from scratch
# lr
lr_decay_mode: 'cosine' # lr decay type; supports cosine, exp, etc., see lr_generator.py for details
lr_init: 0.0 # initial lr (epoch 0)
lr_max: 0.00355 # max lr
lr_min: 0.0 # lr of the last step
max_epoch: 300 # total epochs
warmup_epochs: 40 # warmup epochs
# optimizer
opt: 'adamw' # optimizer type
beta1: 0.9 # adam beta parameter
beta2: 0.999 # adam beta parameter
weight_decay: 0.05 # weight decay value
no_weight_decay_filter: "beta,bias" # which weights are excluded from weight decay
gc_flag: 0 # whether to use gc
# loss, some parameters are also used in dataset preprocessing
loss_scale: 1024 # static loss scale value for amp
use_label_smooth: 1 # whether to use label smoothing
label_smooth_factor: 0.1 # label smoothing factor
mixup: 0.2 # whether to use mixup
autoaugment: 1 # whether to use autoaugment
loss_name: "ce_smooth_mixup" # loss type; see cross_entropy.py for details
# ckpt
save_checkpoint: 1 # whether to save the training results
save_checkpoint_epochs: 8 # save a checkpoint every N epochs
keep_checkpoint_max: 3 # maximum number of checkpoints to keep
save_checkpoint_path: './outputs' # directory to save training results
# profiler
open_profiler: 0 # whether to enable profiling; if enabled, use a small dataset and set max_epoch=1
```
For more configuration details, please refer to the scripts `train.py`, `eval.py`, `export.py` and the files under `config/*.yml`.
## Training Process
### Training
- running on Ascend
```bash
python train.py --config_path=[CONFIG_PATH] > train.log 2>&1 &
```
The above python command runs in the background; you can view the results through the file train.log.
After training, you can find the checkpoint files under the default script folder. Loss values similar to the following will be printed:
```bash
# vim log
2021-08-05 15:17:12:INFO:compile time used=143.16s
2021-08-05 15:34:41:INFO:epoch[0], epoch time: 1048.72s, per step time: 0.2096s, loss=6.738676, lr=0.000011, fps=1221.51
2021-08-05 15:52:03:INFO:epoch[1], epoch time: 1041.90s, per step time: 0.2082s, loss=6.381927, lr=0.000022, fps=1229.51
...
```
The model checkpoint is saved in the current directory.
### Distributed Training
- running on Ascend
```bash
cd scripts;
bash run_train_distribute.sh [RANK_TABLE_FILE] [CONFIG_PATH]
```
The above shell script runs distributed training in the background. You can view the results through the file train_parallel[X]/log. Loss values similar to the following will be printed:
```bash
# vim train_parallel0/log
# fps depends on CPU processing ability; since autoaugment is used, data processing is the speed bottleneck for the patch32 vit
2021-08-05 20:15:16:INFO:compile time used=191.77s
2021-08-05 20:17:46:INFO:epoch[0], epoch time: 149.10s, per step time: 0.2386s, loss=6.729037, lr=0.000089, fps=8584.97, accuracy=0.014940, eval_cost=1.58
2021-08-05 20:20:11:INFO:epoch[1], epoch time: 143.44s, per step time: 0.2295s, loss=6.786729, lr=0.000177, fps=8923.72, accuracy=0.047000, eval_cost=1.27
...
2021-08-06 08:18:18:INFO:epoch[299], epoch time: 143.19s, per step time: 0.2291s, loss=2.718115, lr=0.000000, fps=8939.29, accuracy=0.741800, eval_cost=1.28
2021-08-06 08:18:20:INFO:training time used=43384.70s
2021-08-06 08:18:20:INFO:last_metric[0.74206]
2021-08-06 08:18:20:INFO:ip[*.*.*.*], mean_fps[8930.40]
```
## Evaluation Process
### Evaluation
- evaluating the ImageNet dataset on Ascend
Before running the command below, please check the checkpoint path used for evaluation. Please set the checkpoint path to an absolute path, e.g., "username/vit/vit_base_patch32.ckpt".
```bash
cd scripts;
bash run_eval.sh [RANK_TABLE_FILE] [CONFIG_PATH]
```
The above command runs in the background; you can view the results through the file eval.log. The accuracy on the test dataset is as follows:
```bash
# grep "accuracy=" eval0/log
accuracy=0.741260
```
For evaluation after distributed training, please set checkpoint_path to the saved checkpoint file, e.g., "username/vit/train_parallel0/outputs/vit_base_patch32-288_625.ckpt". The accuracy on the test dataset is as follows:
```bash
# grep "accuracy=" eval0/log
accuracy=0.741260
```
## Export Process
### Export
Before exporting, you need to modify the corresponding config file, config/vit_export.yml. The config items to modify are batch_size and pretrained (the checkpoint to export).
```shell
python export.py --config_path=[CONFIG_PATH]
```
## Inference Process
### Inference
Before performing inference, we need to export the model first. The AIR model can only be exported in the Ascend 910 environment, while the MindIR model can be exported in any environment. batch_size only supports 1.
- inference with the ImageNet dataset on Ascend 310
Before running the command below, we need to modify the config file; the items to modify are batch_size and val_data_path.
Inference results are stored in the current directory, and results like the following can be found in the acc.log file.
```bash
# Ascend310 inference
cd scripts;
bash run_infer_310.sh [MINDIR_PATH] [NET_TYPE] [DATASET] [DATA_PATH] [DEVICE_ID]
Total data: 50000, top1 accuracy: 0.74084, top5 accuracy: 0.91026
```
- `NET_TYPE` can be chosen from: [vit].
- `DATASET` can be chosen from: [imagenet].
- `DEVICE_ID` is optional; the default value is 0.
# Model Description
## Performance
### Evaluation Performance
#### Vit on ImageNet (1.2 million images)
| Parameters | Ascend |
| -------------------------- | ----------------------------------------------------------- |
| Model Version | Vit |
| Resource | Ascend 910; CPU 2.60GHz, 56 cores; Memory 314G; OS Euler2.8 |
| Uploaded Date | 08/30/2021 |
| MindSpore Version | 1.3.0 |
| Dataset | 1.2 million images |
| Training Parameters | epoch=300, steps=625*300, batch_size=256, lr=0.00355 |
| Optimizer | Adamw |
| Loss Function | Softmax Cross Entropy |
| Outputs | probability |
| Loss | 1.0 |
| Speed | 1 card: 180 ms/step; 8 cards: 185 ms/step |
| Total time | 8 cards: 11 hours |
| Parameters (M) | 86.0 |
| Checkpoint for Fine tuning | 1000M (.ckpt file) |
| Scripts | [vit script](https://gitee.com/mindspore/mindspore/blob/master/model_zoo/official/cv/vit) |
### Inference Performance
#### Vit on 1.2 million images
| Parameters | Ascend |
| ------------------- | --------------------------- |
| Model Version | Vit |
| Resource | Ascend 910; OS Euler2.8 |
| Uploaded Date | 08/30/2021 |
| MindSpore Version | 1.3.0 |
| Dataset | 1.2 million images |
| batch_size | 256 |
| Outputs | probability |
| Accuracy | 8 cards: 73.5%-74.6% |
## How to use
### Inference
If you need to use the trained model for inference on multiple hardware platforms such as GPU, Ascend 910 or Ascend 310, you can refer to this [link](https://www.mindspore.cn/tutorial/training/en/master/advanced_use/migrate_3rd_scripts.html). The following is a simple example:
- running on Ascend
```python
# read the config file and build the parameters needed for training from it
args.loss_scale = ...
lrs = ...
...
# set context
context.set_context(mode=context.GRAPH_MODE, device_target=args.device_target)
context.set_context(device_id=args.device_id)
# load unseen dataset for inference
dataset = dataset.create_dataset(args.data_path, 1, False)
# define model
net = ViT(args.vit_config)
opt = AdamW(filter(lambda x: x.requires_grad, net.get_parameters()), lrs, args.beta1, args.beta2, loss_scale=args.loss_scale, weight_decay=cfg.weight_decay)
loss = CrossEntropySmoothMixup(smooth_factor=args.label_smooth_factor, num_classes=args.class_num)
model = Model(net, loss_fn=loss, optimizer=opt, metrics={'acc'})
# load the pretrained model
param_dict = load_checkpoint(args.pretrained)
load_param_into_net(net, param_dict)
net.set_train(False)
# run evaluation
acc = model.eval(dataset)
print("accuracy: ", acc)
```
### Continue Training on the Pretrained Model
- running on Ascend
```python
# read the config file and build the parameters needed for training from it
args.loss_scale = ...
lrs = ...
...
# load dataset
dataset = create_dataset(cfg.data_path, 1)
batch_num = dataset.get_dataset_size()
# define model
net = ViT(args.vit_config)
# continue training if 'pretrained' points to a checkpoint
if cfg.pretrained != '':
    param_dict = load_checkpoint(cfg.pretrained)
    load_param_into_net(net, param_dict)
# define the training model
opt = AdamW(filter(lambda x: x.requires_grad, net.get_parameters()), lrs, args.beta1, args.beta2, loss_scale=args.loss_scale, weight_decay=cfg.weight_decay)
loss = CrossEntropySmoothMixup(smooth_factor=args.label_smooth_factor, num_classes=args.class_num)
model = Model(net, loss_fn=loss, optimizer=opt, metrics={'acc'})
# start training
epoch_size = args.max_epoch
step_size = dataset.get_dataset_size()
# set callbacks
state_cb = StateMonitor(data_size=step_size,
                        tot_batch_size=args.batch_size * device_num,
                        lrs=lrs,
                        eval_interval=args.eval_interval,
                        eval_offset=args.eval_offset,
                        eval_engine=eval_engine,
                        logger=args.logger.info)
cb = [state_cb, ]
model.train(epoch_size, dataset, callbacks=cb, sink_size=step_size)
print("train success")
```
# Description of Random Situation
In dataset.py, we set the seed inside the "create_dataset" function; a random seed is also used in train.py.
# ModelZoo Homepage
Please check the official [homepage](https://gitee.com/mindspore/mindspore/tree/master/model_zoo).

View File

@ -0,0 +1,33 @@
/**
* Copyright 2021 Huawei Technologies Co., Ltd
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
#ifndef MINDSPORE_INFERENCE_UTILS_H_
#define MINDSPORE_INFERENCE_UTILS_H_
#include <sys/stat.h>
#include <dirent.h>
#include <vector>
#include <string>
#include <memory>
#include "include/api/types.h"
DIR *OpenDir(std::string_view dirName);
std::string RealPath(std::string_view path);
mindspore::MSTensor ReadFileToTensor(const std::string &file);
int WriteResult(const std::string& imageFile, const std::vector<mindspore::MSTensor> &outputs);
std::vector<std::string> GetAllFiles(std::string dir_name);
#endif

View File

@ -0,0 +1,14 @@
cmake_minimum_required(VERSION 3.14.1)
project(MindSporeCxxTestcase[CXX])
add_compile_definitions(_GLIBCXX_USE_CXX11_ABI=0)
set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -O0 -g -std=c++17 -Werror -Wall -fPIE -Wl,--allow-shlib-undefined")
set(PROJECT_SRC_ROOT ${CMAKE_CURRENT_LIST_DIR}/)
option(MINDSPORE_PATH "mindspore install path" "")
include_directories(${MINDSPORE_PATH})
include_directories(${MINDSPORE_PATH}/include)
include_directories(${PROJECT_SRC_ROOT}/../)
find_library(MS_LIB libmindspore.so ${MINDSPORE_PATH}/lib)
file(GLOB_RECURSE MD_LIB ${MINDSPORE_PATH}/_c_dataengine*)
add_executable(main main.cc utils.cc)
target_link_libraries(main ${MS_LIB} ${MD_LIB} gflags)

View File

@ -0,0 +1,18 @@
#!/bin/bash
# Copyright 2021 Huawei Technologies Co., Ltd
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ============================================================================
cmake . -DMINDSPORE_PATH="`pip3.7 show mindspore-ascend | grep Location | awk '{print $2"/mindspore"}' | xargs realpath`"
make

View File

@ -0,0 +1,161 @@
/**
* Copyright 2021 Huawei Technologies Co., Ltd
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
#include <sys/time.h>
#include <gflags/gflags.h>
#include <dirent.h>
#include <iostream>
#include <string>
#include <algorithm>
#include <iosfwd>
#include <vector>
#include <fstream>
#include <sstream>
#include <map>
#include "include/api/model.h"
#include "include/api/context.h"
#include "include/api/types.h"
#include "include/api/serialization.h"
#include "include/dataset/vision_ascend.h"
#include "include/dataset/execute.h"
#include "include/dataset/transforms.h"
#include "include/dataset/vision.h"
#include "inc/utils.h"
using mindspore::dataset::vision::Decode;
using mindspore::dataset::vision::Resize;
using mindspore::dataset::vision::CenterCrop;
using mindspore::dataset::vision::Normalize;
using mindspore::dataset::vision::HWC2CHW;
using mindspore::dataset::TensorTransform;
using mindspore::Context;
using mindspore::Serialization;
using mindspore::Model;
using mindspore::Status;
using mindspore::ModelType;
using mindspore::GraphCell;
using mindspore::kSuccess;
using mindspore::MSTensor;
using mindspore::dataset::Execute;
DEFINE_string(mindir_path, "", "mindir path");
DEFINE_string(dataset_path, ".", "dataset path");
DEFINE_string(network, "vit", "networktype");
DEFINE_string(dataset, "imagenet", "dataset");
DEFINE_int32(device_id, 0, "device id");
int main(int argc, char **argv) {
gflags::ParseCommandLineFlags(&argc, &argv, true);
if (RealPath(FLAGS_mindir_path).empty()) {
std::cout << "Invalid mindir" << std::endl;
return 1;
}
auto context = std::make_shared<Context>();
auto ascend310 = std::make_shared<mindspore::Ascend310DeviceInfo>();
ascend310->SetDeviceID(FLAGS_device_id);
context->MutableDeviceInfo().push_back(ascend310);
mindspore::Graph graph;
Serialization::Load(FLAGS_mindir_path, ModelType::kMindIR, &graph);
Model model;
Status ret = model.Build(GraphCell(graph), context);
if (ret != kSuccess) {
std::cout << "ERROR: Build failed." << std::endl;
return 1;
}
auto all_files = GetAllFiles(FLAGS_dataset_path);
if (all_files.empty()) {
std::cout << "ERROR: no input data." << std::endl;
return 1;
}
std::vector<MSTensor> modelInputs = model.GetInputs();
std::map<double, double> costTime_map;
size_t size = all_files.size();
std::shared_ptr<TensorTransform> decode = std::make_shared<Decode>();
std::shared_ptr<TensorTransform> hwc2chw = std::make_shared<HWC2CHW>();
std::shared_ptr<TensorTransform> resize = std::make_shared<Resize>(std::vector<int>{256});
std::shared_ptr<TensorTransform> centercrop = std::make_shared<CenterCrop>(std::vector<int>{224});
std::shared_ptr<TensorTransform> normalize = std::make_shared<Normalize>(
std::vector<float>{123.675, 116.28, 103.53}, std::vector<float>{58.395, 57.12, 57.375});
std::shared_ptr<TensorTransform> normalizeResnet101 = std::make_shared<Normalize>(
std::vector<float>{121.125, 115.005, 99.96}, std::vector<float>{70.125, 68.085, 70.89});
std::shared_ptr<TensorTransform> sr_resize = std::make_shared<Resize>(std::vector<int>{292});
std::shared_ptr<TensorTransform> sr_centercrop = std::make_shared<CenterCrop>(std::vector<int>{256});
std::shared_ptr<TensorTransform> sr_normalize = std::make_shared<Normalize>(
std::vector<float>{123.68, 116.78, 103.94}, std::vector<float>{1.0, 1.0, 1.0});
std::vector<std::shared_ptr<TensorTransform>> trans_list;
if (FLAGS_network == "se-resnet50") {
trans_list = {decode, sr_resize, sr_centercrop, sr_normalize, hwc2chw};
} else if (FLAGS_network == "resnet101") {
trans_list = {decode, resize, centercrop, normalizeResnet101, hwc2chw};
} else {
trans_list = {decode, resize, centercrop, normalize, hwc2chw};
}
mindspore::dataset::Execute SingleOp(trans_list);
for (size_t i = 0; i < size; ++i) {
struct timeval start = {0};
struct timeval end = {0};
double startTimeMs;
double endTimeMs;
std::vector<MSTensor> inputs;
std::vector<MSTensor> outputs;
std::cout << "Start predict input files:" << all_files[i] <<std::endl;
MSTensor image = ReadFileToTensor(all_files[i]);
if (FLAGS_dataset == "imagenet") {
SingleOp(image, &image);
}
inputs.emplace_back(modelInputs[0].Name(), modelInputs[0].DataType(), modelInputs[0].Shape(),
image.Data().get(), image.DataSize());
gettimeofday(&start, nullptr);
ret = model.Predict(inputs, &outputs);
gettimeofday(&end, nullptr);
if (ret != kSuccess) {
std::cout << "Predict " << all_files[i] << " failed." << std::endl;
return 1;
}
startTimeMs = (1.0 * start.tv_sec * 1000000 + start.tv_usec) / 1000;
endTimeMs = (1.0 * end.tv_sec * 1000000 + end.tv_usec) / 1000;
costTime_map.insert(std::pair<double, double>(startTimeMs, endTimeMs));
WriteResult(all_files[i], outputs);
}
double average = 0.0;
int inferCount = 0;
for (auto iter = costTime_map.begin(); iter != costTime_map.end(); iter++) {
average += iter->second - iter->first;
inferCount++;
}
average = average / inferCount;
std::stringstream timeCost;
timeCost << "NN inference cost average time: "<< average << " ms of infer_count " << inferCount << std::endl;
std::cout << "NN inference cost average time: "<< average << "ms of infer_count " << inferCount << std::endl;
std::string fileName = "./time_Result" + std::string("/test_perform_static.txt");
std::ofstream fileStream(fileName.c_str(), std::ios::trunc);
fileStream << timeCost.str();
fileStream.close();
costTime_map.clear();
return 0;
}

View File

@ -0,0 +1,145 @@
/**
* Copyright 2021 Huawei Technologies Co., Ltd
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
#include <fstream>
#include <algorithm>
#include <iostream>
#include <climits>
#include <cstdlib>
#include "inc/utils.h"
using mindspore::MSTensor;
using mindspore::DataType;
std::vector<std::string> GetAllFiles(std::string dirName) {
struct dirent *filename;
DIR *dir = OpenDir(dirName);
if (dir == nullptr) {
return {};
}
std::vector<std::string> dirs;
std::vector<std::string> files;
while ((filename = readdir(dir)) != nullptr) {
std::string dName = std::string(filename->d_name);
if (dName == "." || dName == "..") {
continue;
} else if (filename->d_type == DT_DIR) {
dirs.emplace_back(std::string(dirName) + "/" + filename->d_name);
} else if (filename->d_type == DT_REG) {
files.emplace_back(std::string(dirName) + "/" + filename->d_name);
} else {
continue;
}
}
for (auto d : dirs) {
dir = OpenDir(d);
while ((filename = readdir(dir)) != nullptr) {
std::string dName = std::string(filename->d_name);
if (dName == "." || dName == ".." || filename->d_type != DT_REG) {
continue;
}
files.emplace_back(std::string(d) + "/" + filename->d_name);
}
}
std::sort(files.begin(), files.end());
for (auto &f : files) {
std::cout << "image file: " << f << std::endl;
}
return files;
}
int WriteResult(const std::string& imageFile, const std::vector<MSTensor> &outputs) {
std::string homePath = "./result_Files";
for (size_t i = 0; i < outputs.size(); ++i) {
size_t outputSize;
std::shared_ptr<const void> netOutput;
netOutput = outputs[i].Data();
outputSize = outputs[i].DataSize();
int pos = imageFile.rfind('/');
std::string fileName(imageFile, pos + 1);
fileName.replace(fileName.find('.'), fileName.size() - fileName.find('.'), '_' + std::to_string(i) + ".bin");
std::string outFileName = homePath + "/" + fileName;
FILE *outputFile = fopen(outFileName.c_str(), "wb");
fwrite(netOutput.get(), outputSize, sizeof(char), outputFile);
fclose(outputFile);
outputFile = nullptr;
}
return 0;
}
mindspore::MSTensor ReadFileToTensor(const std::string &file) {
if (file.empty()) {
std::cout << "Pointer file is nullptr" << std::endl;
return mindspore::MSTensor();
}
std::ifstream ifs(file);
if (!ifs.good()) {
std::cout << "File: " << file << " is not exist" << std::endl;
return mindspore::MSTensor();
}
if (!ifs.is_open()) {
std::cout << "File: " << file << "open failed" << std::endl;
return mindspore::MSTensor();
}
ifs.seekg(0, std::ios::end);
size_t size = ifs.tellg();
mindspore::MSTensor buffer(file, mindspore::DataType::kNumberTypeUInt8, {static_cast<int64_t>(size)}, nullptr, size);
ifs.seekg(0, std::ios::beg);
ifs.read(reinterpret_cast<char *>(buffer.MutableData()), size);
ifs.close();
return buffer;
}
DIR *OpenDir(std::string_view dirName) {
if (dirName.empty()) {
std::cout << " dirName is null ! " << std::endl;
return nullptr;
}
std::string realPath = RealPath(dirName);
struct stat s;
lstat(realPath.c_str(), &s);
if (!S_ISDIR(s.st_mode)) {
std::cout << "dirName is not a valid directory !" << std::endl;
return nullptr;
}
DIR *dir;
dir = opendir(realPath.c_str());
if (dir == nullptr) {
std::cout << "Can not open dir " << dirName << std::endl;
return nullptr;
}
std::cout << "Successfully opened the dir " << dirName << std::endl;
return dir;
}
std::string RealPath(std::string_view path) {
char realPathMem[PATH_MAX] = {0};
char *realPathRet = nullptr;
realPathRet = realpath(path.data(), realPathMem);
if (realPathRet == nullptr) {
std::cout << "File: " << path << " is not exist.";
return "";
}
std::string realPath(realPathMem);
std::cout << path << " realpath is: " << realPath << std::endl;
return realPath;
}

View File

@ -0,0 +1,20 @@
enable_modelarts: 0
# eval datasets
interpolation: 'BILINEAR'
eval_path: '/opt/npu/datasets/imagenet/val'
eval_image_size: 224
eval_batch_size: 256
eval_interval: 1
eval_offset: -1
eval_num_workers: 12
# load model
pretrained: '../vit_base_patch32.ckpt'
# network
backbone: 'vit_base_patch32'
class_num: 1001
vit_config_path: 'src.vit.VitConfig'
open_profiler: 0

View File

@ -0,0 +1,22 @@
enable_modelarts: 0
device_target: 'Ascend'
device_id: 0
# Url for modelarts
data_url: ""
train_url: ""
checkpoint_url: ""
output_path: "/cache/train"
file_name: 'vit_base_patch32.mindir'
file_format: 'MINDIR'
backbone: 'vit_base_patch32'
train_image_size: 224
class_num: 1001
batch_size: 1
vit_config_path: 'src.vit.VitConfig'
# load model
pretrained: './vit_base_patch32.ckpt'

View File

@ -0,0 +1,62 @@
enable_modelarts: 0
# Url for modelarts
data_url: ""
train_url: ""
checkpoint_url: ""
output_path: "/cache/train"
# train datasets
dataset_path: '/opt/npu/datasets/imagenet/train'
train_image_size: 224
interpolation: 'BILINEAR'
crop_min: 0.05
batch_size: 256
train_num_workers: 14
# eval datasets
eval_path: '/opt/npu/datasets/imagenet/val'
eval_image_size: 224
eval_batch_size: 256
eval_interval: 1
eval_offset: -1
eval_num_workers: 12
# network
backbone: 'vit_base_patch32'
class_num: 1001
vit_config_path: 'src.vit.VitConfig'
pretrained: ''
# lr
lr_decay_mode: 'cosine'
lr_init: 0.0
lr_max: 0.00355
lr_min: 0.0
max_epoch: 300
warmup_epochs: 40
# optimizer
opt: 'adamw'
beta1: 0.9
beta2: 0.999
weight_decay: 0.05
no_weight_decay_filter: "beta,bias"
gc_flag: 0
# loss
loss_scale: 1024
use_label_smooth: 1
label_smooth_factor: 0.1
mixup: 0.2
autoaugment: 1
loss_name: "ce_smooth_mixup"
# ckpt
save_checkpoint: 1
save_checkpoint_epochs: 8
keep_checkpoint_max: 3
save_checkpoint_path: './outputs'
# profiler
open_profiler: 0

View File

@ -0,0 +1,64 @@
enable_modelarts: 1
# Url for modelarts
data_url: "s3://bucket-d/datasets/imagenet"
train_url: "s3://bucket-d/train"
checkpoint_url: "s3://bucket-d/model/vit_base_patch32.ckpt"
output_path: "/cache/train"
data_path: "/cache/datasets/imagenet"
load_path: "/cache/model/vit_base_patch32.ckpt"
# train datasets
dataset_path: '/cache/datasets/imagenet/train'
train_image_size: 224
interpolation: 'BILINEAR'
crop_min: 0.05
batch_size: 256
train_num_workers: 14
# eval datasets
eval_path: '/cache/datasets/imagenet/val'
eval_image_size: 224
eval_batch_size: 256
eval_interval: 1
eval_offset: -1
eval_num_workers: 12
# network
backbone: 'vit_base_patch32'
class_num: 1001
vit_config_path: 'src.vit.VitConfig'
pretrained: ''
# lr
lr_decay_mode: 'cosine'
lr_init: 0.0
lr_max: 0.00355
lr_min: 0.0
max_epoch: 300
warmup_epochs: 40
# optimizer
opt: 'adamw'
beta1: 0.9
beta2: 0.999
weight_decay: 0.05
no_weight_decay_filter: "beta,bias"
gc_flag: 0
# loss
loss_scale: 1024
use_label_smooth: 1
label_smooth_factor: 0.1
mixup: 0.2
autoaugment: 1
loss_name: "ce_smooth_mixup"
# ckpt
save_checkpoint: 1
save_checkpoint_epochs: 8
keep_checkpoint_max: 3
save_checkpoint_path: './outputs'
# profiler
open_profiler: 0

View File

@ -0,0 +1,56 @@
enable_modelarts: 0
# train datasets
dataset_path: '/opt/npu/datasets/imagenet/train'
train_image_size: 224
interpolation: 'BILINEAR'
crop_min: 0.05
batch_size: 256
train_num_workers: 14
# eval datasets
eval_path: '/opt/npu/datasets/imagenet/val'
eval_image_size: 224
eval_batch_size: 256
eval_interval: 1
eval_offset: -1
eval_num_workers: 12
# network
backbone: 'vit_base_patch32'
class_num: 1001
vit_config_path: 'src.vit.VitConfig'
pretrained: ''
# lr
lr_decay_mode: 'cosine'
lr_init: 0.0
lr_max: 0.00044375
lr_min: 0.0
max_epoch: 300
warmup_epochs: 40
# optimizer
opt: 'adamw'
beta1: 0.9
beta2: 0.999
weight_decay: 0.05
no_weight_decay_filter: "beta,bias"
gc_flag: 0
# loss
loss_scale: 1024
use_label_smooth: 1
label_smooth_factor: 0.1
mixup: 0.2
autoaugment: 1
loss_name: "ce_smooth_mixup"
# ckpt
save_checkpoint: 1
save_checkpoint_epochs: 8
keep_checkpoint_max: 3
save_checkpoint_path: './outputs'
# profiler
open_profiler: 0

View File

@ -0,0 +1,49 @@
# Copyright 2021 Huawei Technologies Co., Ltd
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ============================================================================
"""create_imagenet2012_label"""
import os
import json
import argparse
parser = argparse.ArgumentParser(description="resnet imagenet2012 label")
parser.add_argument("--img_path", type=str, required=True, help="imagenet2012 file path.")
args = parser.parse_args()
def create_label(file_path):
"""create_imagenet2012_label"""
print("[WARNING] Create imagenet label. Currently only use for Imagenet2012!")
dirs = os.listdir(file_path)
file_list = []
for file in dirs:
file_list.append(file)
file_list = sorted(file_list)
total = 0
img_label = {}
for i, file_dir in enumerate(file_list):
files = os.listdir(os.path.join(file_path, file_dir))
for f in files:
img_label[f] = i
total += len(files)
with open("imagenet_label.json", "w+") as label:
json.dump(img_label, label)
print("[INFO] Completed! Total {} data.".format(total))
if __name__ == '__main__':
create_label(args.img_path)

View File

@ -0,0 +1,137 @@
# Copyright 2021 Huawei Technologies Co., Ltd
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ============================================================================
"""eval script"""
import os
import numpy as np
from mindspore import context
from mindspore.train.model import Model, ParallelMode
from mindspore.communication.management import init
from mindspore.profiler.profiling import Profiler
from mindspore.train.serialization import load_checkpoint
from src.vit import get_network
from src.dataset import get_dataset
from src.optimizer import get_optimizer
from src.eval_engine import get_eval_engine
from src.logging import get_logger
from src.model_utils.config import config
from src.model_utils.moxing_adapter import moxing_wrapper
try:
os.environ['MINDSPORE_HCCL_CONFIG_PATH'] = os.getenv('RANK_TABLE_FILE')
device_id = int(os.getenv('DEVICE_ID')) # 0 ~ 7
local_rank = int(os.getenv('RANK_ID')) # local_rank
device_num = int(os.getenv('RANK_SIZE')) # world_size
print("distribute")
except TypeError:
device_id = 0 # 0 ~ 7
local_rank = 0 # local_rank
device_num = 1 # world_size
print("standalone")
def add_static_args(args):
"""add_static_args"""
args.train_image_size = args.eval_image_size
args.weight_decay = 0.05
args.no_weight_decay_filter = ""
args.gc_flag = 0
args.beta1 = 0.9
args.beta2 = 0.999
args.loss_scale = 1024
args.dataset_name = 'imagenet'
args.save_checkpoint_path = './outputs'
args.eval_engine = 'imagenet'
args.auto_tune = 0
args.seed = 1
args.device_id = device_id
args.local_rank = local_rank
args.device_num = device_num
return args
@moxing_wrapper()
def eval_net():
"""eval_net"""
args = add_static_args(config)
np.random.seed(args.seed)
args.logger = get_logger(args.save_checkpoint_path, rank=local_rank)
context.set_context(device_id=device_id,
mode=context.GRAPH_MODE,
device_target="Ascend",
save_graphs=False)
if args.auto_tune:
context.set_context(auto_tune_mode='GA')
elif args.device_num == 1:
pass
else:
context.set_auto_parallel_context(device_num=device_num,
parallel_mode=ParallelMode.DATA_PARALLEL,
gradients_mean=True)
if args.open_profiler:
profiler = Profiler(output_path="data_{}".format(local_rank))
# init the distribute env
if not args.auto_tune and args.device_num > 1:
init()
# network
net = get_network(backbone_name=args.backbone, args=args)
if os.path.isfile(args.pretrained):
load_checkpoint(args.pretrained, net, strict_load=False)
# evaluation dataset
eval_dataset = get_dataset(dataset_name=args.dataset_name,
do_train=False,
dataset_path=args.eval_path,
args=args)
opt, _ = get_optimizer(optimizer_name='adamw',
network=net,
lrs=1.0,
args=args)
# evaluation engine
if args.auto_tune or args.open_profiler or eval_dataset is None:
args.eval_engine = ''
eval_engine = get_eval_engine(args.eval_engine, net, eval_dataset, args)
# model
model = Model(net, loss_fn=None, optimizer=opt,
metrics=eval_engine.metric, eval_network=eval_engine.eval_network,
loss_scale_manager=None, amp_level="O3")
eval_engine.set_model(model)
args.logger.save_args(args)
eval_engine.compile(sink_size=625) #step_size
eval_engine.eval()
output = eval_engine.get_result()
print_str = 'accuracy={:.6f}'.format(float(output))
print(print_str)
if args.open_profiler:
profiler.analyse()
if __name__ == '__main__':
eval_net()

View File

@ -0,0 +1,52 @@
# Copyright 2020-2021 Huawei Technologies Co., Ltd
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ============================================================================
"""
##############export checkpoint file into air, mindir and onnx models#################
python export.py
"""
import os
import numpy as np
from mindspore import Tensor, load_checkpoint, load_param_into_net, export, context
from src.model_utils.config import config
from src.model_utils.moxing_adapter import moxing_wrapper
from src.vit import get_network
context.set_context(mode=context.GRAPH_MODE, device_target=config.device_target)
if config.device_target == "Ascend":
context.set_context(device_id=config.device_id)
def modelarts_pre_process():
'''modelarts pre process function.'''
config.file_name = os.path.join(config.output_path, config.file_name)
@moxing_wrapper(pre_process=modelarts_pre_process)
def run_export():
"""run export."""
net = get_network(backbone_name=config.backbone, args=config)
    assert config.pretrained is not None, "config.pretrained is None."
param_dict = load_checkpoint(config.pretrained)
load_param_into_net(net, param_dict)
config.height = config.train_image_size
config.width = config.train_image_size
input_arr = Tensor(np.zeros([config.batch_size, 3, config.height, config.width], np.float32))
export(net, input_arr, file_name=config.file_name, file_format=config.file_format)
if __name__ == '__main__':
run_export()

@@ -0,0 +1,24 @@
# Copyright 2021 Huawei Technologies Co., Ltd
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ============================================================================
"""hub config."""
from src.vit import vit_base_patch16, vit_base_patch32
def create_network(name, *args, **kwargs):
"""create_network about resnet"""
if name == 'vit_base_patch16':
return vit_base_patch16(*args, **kwargs)
if name == 'vit_base_patch32':
return vit_base_patch32(*args, **kwargs)
raise NotImplementedError(f"{name} is not implemented in the repo")
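For reference, a minimal hedged sketch of how the hub entry point above behaves; the import path is assumed from the usual ModelZoo layout, and whether the ViT constructors need extra configuration arguments depends on `src/vit.py`, which is not part of this file.

```python
from mindspore_hub_conf import create_network  # assumed import path for this hub config

# Known names ('vit_base_patch16', 'vit_base_patch32') are forwarded, together with
# any extra positional/keyword arguments, to the constructors in src/vit.py.
# Unknown names raise NotImplementedError, as in the dispatch above:
try:
    create_network("resnet50")
except NotImplementedError as err:
    print(err)  # "resnet50 is not implemented in the repo"
```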

@@ -0,0 +1,52 @@
# Copyright 2021 Huawei Technologies Co., Ltd
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ============================================================================
"""post process for 310 inference"""
import os
import json
import argparse
import numpy as np
batch_size = 1
parser = argparse.ArgumentParser(description="vit inference")
parser.add_argument("--dataset", type=str, required=True, help="dataset type.")
parser.add_argument("--result_path", type=str, required=True, help="result files path.")
parser.add_argument("--label_path", type=str, required=True, help="image file path.")
args = parser.parse_args()
def cal_acc_imagenet(result_path, label_path):
"""cal_acc_imagenet"""
files = os.listdir(result_path)
with open(label_path, "r") as label:
labels = json.load(label)
result_shape = (1, 1001)
top1 = 0
top5 = 0
total_data = len(files)
for file in files:
img_ids_name = file.split('_0.')[0]
data_path = os.path.join(result_path, img_ids_name + "_0.bin")
result = np.fromfile(data_path, dtype=np.float32).reshape(result_shape)
for batch in range(batch_size):
predict = np.argsort(-result[batch], axis=-1)
if labels[img_ids_name+".JPEG"] == predict[0]:
top1 += 1
if labels[img_ids_name+".JPEG"] in predict[:5]:
top5 += 1
print(f"Total data: {total_data}, top1 accuracy: {top1/total_data}, top5 accuracy: {top5/total_data}.")
if __name__ == '__main__':
cal_acc_imagenet(args.result_path, args.label_path)
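For clarity, a self-contained sketch of the top-1/top-5 check performed in `cal_acc_imagenet` above, using a synthetic logits array in place of a `_0.bin` result file; the label value is made up for illustration.

```python
import numpy as np

logits = np.random.randn(1, 1001).astype(np.float32)  # same shape as result_shape above
label = 42                                             # hypothetical ground-truth class index

pred = np.argsort(-logits[0], axis=-1)                 # class ids sorted by descending score
top1_hit = label == pred[0]
top5_hit = label in pred[:5]
print(top1_hit, top5_hit)
```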

@@ -0,0 +1 @@
easydict

@@ -0,0 +1,81 @@
#!/bin/bash
# Copyright 2020-2021 Huawei Technologies Co., Ltd
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ============================================================================
if [ $# != 2 ]
then
echo "Usage: bash run_eval.sh [RANK_TABLE_FILE] [CONFIG_PATH]"
exit 1
fi
get_real_path(){
if [ "${1:0:1}" == "/" ]; then
echo "$1"
else
echo "$(realpath -m $PWD/$1)"
fi
}
PATH1=$(get_real_path $1)
CONFIG_FILE=$(get_real_path $2)
if [ ! -f $PATH1 ]
then
echo "error: RANK_TABLE_FILE=$PATH1 is not a directory"
exit 1
fi
if [ ! -f $CONFIG_FILE ]
then
echo "error: config_path=$CONFIG_PATH is not a file"
exit 1
fi
ulimit -u unlimited
export DEVICE_NUM=8
export RANK_SIZE=8
export RANK_TABLE_FILE=$PATH1
export SERVER_ID=0
rank_start=$((DEVICE_NUM * SERVER_ID))
cpus=`cat /proc/cpuinfo| grep "processor"| wc -l`
avg=`expr $cpus \/ $DEVICE_NUM`
gap=`expr $avg \- 1`
for((i=0; i<${DEVICE_NUM}; i++))
do
start=`expr $i \* $avg`
end=`expr $start \+ $gap`
cmdopt=$start"-"$end
export DEVICE_ID=${i}
export RANK_ID=$((rank_start + i))
rm -rf ./eval$i
mkdir ./eval$i
cp ../*.py ./eval$i
cp *.sh ./eval$i
cp -r ../config/*.yml ./eval$i
cp -r ../src ./eval$i
cd ./eval$i || exit
echo "start training for rank $RANK_ID, device $DEVICE_ID"
env > env.log
if [ $# == 2 ]
then
taskset -c $cmdopt python eval.py --config_path=$CONFIG_FILE &> log &
fi
cd ..
done

@@ -0,0 +1,115 @@
#!/bin/bash
# Copyright 2021 Huawei Technologies Co., Ltd
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ============================================================================
if [[ $# -lt 4 || $# -gt 5 ]]; then
echo "Usage: bash run_infer_310.sh [MINDIR_PATH] [NET_TYPE] [DATASET] [DATA_PATH] [DEVICE_ID]
NET_TYPE can choose from [vit]
DEVICE_ID is optional, it can be set by environment variable device_id, otherwise the value is zero"
exit 1
fi
get_real_path(){
if [ "${1:0:1}" == "/" ]; then
echo "$1"
else
echo "$(realpath -m $PWD/$1)"
fi
}
model=$(get_real_path $1)
if [ $2 == 'vit' ]; then
network=$2
else
echo "NET_TYPE can choose from [vit]"
exit 1
fi
dataset=$3
data_path=$(get_real_path $4)
device_id=0
if [ $# == 5 ]; then
device_id=$5
fi
echo "mindir name: "$model
echo "dataset path: "$data_path
echo "network: "$network
echo "dataset: "$dataset
echo "device id: "$device_id
export ASCEND_HOME=/usr/local/Ascend/
if [ -d ${ASCEND_HOME}/ascend-toolkit ]; then
export PATH=$ASCEND_HOME/fwkacllib/bin:$ASCEND_HOME/fwkacllib/ccec_compiler/bin:$ASCEND_HOME/ascend-toolkit/latest/fwkacllib/ccec_compiler/bin:$ASCEND_HOME/ascend-toolkit/latest/atc/bin:$PATH
export LD_LIBRARY_PATH=$ASCEND_HOME/fwkacllib/lib64:/usr/local/lib:$ASCEND_HOME/ascend-toolkit/latest/atc/lib64:$ASCEND_HOME/ascend-toolkit/latest/fwkacllib/lib64:$ASCEND_HOME/driver/lib64:$ASCEND_HOME/add-ons:$LD_LIBRARY_PATH
export TBE_IMPL_PATH=$ASCEND_HOME/ascend-toolkit/latest/opp/op_impl/built-in/ai_core/tbe
export PYTHONPATH=$ASCEND_HOME/fwkacllib/python/site-packages:${TBE_IMPL_PATH}:$ASCEND_HOME/ascend-toolkit/latest/fwkacllib/python/site-packages:$PYTHONPATH
export ASCEND_OPP_PATH=$ASCEND_HOME/ascend-toolkit/latest/opp
else
export PATH=$ASCEND_HOME/fwkacllib/bin:$ASCEND_HOME/fwkacllib/ccec_compiler/bin:$ASCEND_HOME/atc/ccec_compiler/bin:$ASCEND_HOME/atc/bin:$PATH
export LD_LIBRARY_PATH=$ASCEND_HOME/fwkacllib/lib64:/usr/local/lib:$ASCEND_HOME/atc/lib64:$ASCEND_HOME/acllib/lib64:$ASCEND_HOME/driver/lib64:$ASCEND_HOME/add-ons:$LD_LIBRARY_PATH
export PYTHONPATH=$ASCEND_HOME/fwkacllib/python/site-packages:$ASCEND_HOME/atc/python/site-packages:$PYTHONPATH
export ASCEND_OPP_PATH=$ASCEND_HOME/opp
fi
function compile_app()
{
cd ../ascend310_infer/src/ || exit
if [ -f "Makefile" ]; then
make clean
fi
bash build.sh &> build.log
}
function infer()
{
cd - || exit
if [ -d result_Files ]; then
rm -rf ./result_Files
fi
if [ -d time_Result ]; then
rm -rf ./time_Result
fi
mkdir result_Files
mkdir time_Result
../ascend310_infer/src/main --mindir_path=$model --dataset_path=$data_path --network=$network --dataset=$dataset --device_id=$device_id &> infer.log
}
function cal_acc()
{
python3.7 ../create_imagenet2012_label.py --img_path=$data_path
python3.7 ../postprocess.py --dataset=$dataset --result_path=./result_Files --label_path=./imagenet_label.json &> acc.log
if [ $? -ne 0 ]; then
echo "calculate accuracy failed"
exit 1
fi
}
compile_app
if [ $? -ne 0 ]; then
echo "compile app code failed"
exit 1
fi
infer
if [ $? -ne 0 ]; then
echo " execute inference failed"
exit 1
fi
cal_acc
if [ $? -ne 0 ]; then
echo "calculate accuracy failed"
exit 1
fi

@@ -0,0 +1,81 @@
#!/bin/bash
# Copyright 2020-2021 Huawei Technologies Co., Ltd
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ============================================================================
if [ $# != 2 ]
then
echo "Usage: bash run_distribute_train.sh [RANK_TABLE_FILE] [CONFIG_PATH]"
exit 1
fi
get_real_path(){
if [ "${1:0:1}" == "/" ]; then
echo "$1"
else
echo "$(realpath -m $PWD/$1)"
fi
}
PATH1=$(get_real_path $1)
CONFIG_FILE=$(get_real_path $2)
if [ ! -f $PATH1 ]
then
echo "error: RANK_TABLE_FILE=$PATH1 is not a file"
exit 1
fi
if [ ! -f $CONFIG_FILE ]
then
echo "error: config_path=$CONFIG_FILE is not a file"
exit 1
fi
ulimit -u unlimited
export DEVICE_NUM=8
export RANK_SIZE=8
export RANK_TABLE_FILE=$PATH1
export SERVER_ID=0
rank_start=$((DEVICE_NUM * SERVER_ID))
cpus=`cat /proc/cpuinfo| grep "processor"| wc -l`
avg=`expr $cpus \/ $DEVICE_NUM`
gap=`expr $avg \- 1`
for((i=0; i<${DEVICE_NUM}; i++))
do
start=`expr $i \* $avg`
end=`expr $start \+ $gap`
cmdopt=$start"-"$end
export DEVICE_ID=${i}
export RANK_ID=$((rank_start + i))
rm -rf ./train_parallel$i
mkdir ./train_parallel$i
cp ../*.py ./train_parallel$i
cp *.sh ./train_parallel$i
cp -r ../config/*.yml ./train_parallel$i
cp -r ../src ./train_parallel$i
cd ./train_parallel$i || exit
echo "start training for rank $RANK_ID, device $DEVICE_ID"
env > env.log
if [ $# == 2 ]
then
taskset -c $cmdopt python train.py --config_path=$CONFIG_FILE &> log &
fi
cd ..
done

@@ -0,0 +1,61 @@
#!/bin/bash
# Copyright 2020-2021 Huawei Technologies Co., Ltd
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ============================================================================
if [ $# != 1 ]
then
echo "Usage: bash run_standalone_train.sh [CONFIG_PATH] "
exit 1
fi
get_real_path(){
if [ "${1:0:1}" == "/" ]; then
echo "$1"
else
echo "$(realpath -m $PWD/$1)"
fi
}
CONFIG_FILE=$(get_real_path $1)
if [ ! -f $CONFIG_FILE ]
then
echo "error: config_path=$CONFIG_FILE is not a file"
exit 1
fi
ulimit -u unlimited
export DEVICE_NUM=1
export RANK_ID=0
export RANK_SIZE=1
if [ -d "train" ];
then
rm -rf ./train
fi
mkdir ./train
cp ../config/*.yml ./train
cp ../*.py ./train
cp *.sh ./train
cp -r ../src ./train
cd ./train || exit
echo "start training for device $DEVICE_ID"
env > env.log
if [ $# == 1 ]
then
python train.py --config_path=$CONFIG_FILE &> log &
fi
cd ..

@@ -0,0 +1,264 @@
# MIT License
# Copyright (c) 2018 Philip Popien
# Permission is hereby granted, free of charge, to any person obtaining a copy
# of this software and associated documentation files (the "Software"), to deal
# in the Software without restriction, including without limitation the rights
# to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
# copies of the Software, and to permit persons to whom the Software is
# furnished to do so, subject to the following conditions:
# The above copyright notice and this permission notice shall be included in all
# copies or substantial portions of the Software.
# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
# AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
# SOFTWARE.
# ============================================================================
"""
This code is based on https://github.com/DeepVoltaire/AutoAugment/blob/master/autoaugment.py
"""
import random
from PIL import Image, ImageEnhance, ImageOps
import numpy as np
class ImageNetPolicy():
""" Randomly choose one of the best 24 Sub-policies on ImageNet.
Example:
>>> policy = ImageNetPolicy()
>>> transformed = policy(image)
>>> transform=transforms.Compose([
>>> transforms.Resize(256),
>>> ImageNetPolicy(),
>>> transforms.ToTensor()])
"""
def __init__(self, fillcolor=(128, 128, 128)):
self.policies = [
SubPolicy(0.4, "posterize", 8, 0.6, "rotate", 9, fillcolor),
SubPolicy(0.6, "solarize", 5, 0.6, "autocontrast", 5, fillcolor),
SubPolicy(0.8, "equalize", 8, 0.6, "equalize", 3, fillcolor),
SubPolicy(0.6, "posterize", 7, 0.6, "posterize", 6, fillcolor),
SubPolicy(0.4, "equalize", 7, 0.2, "solarize", 4, fillcolor),
SubPolicy(0.4, "equalize", 4, 0.8, "rotate", 8, fillcolor),
SubPolicy(0.6, "solarize", 3, 0.6, "equalize", 7, fillcolor),
SubPolicy(0.8, "posterize", 5, 1.0, "equalize", 2, fillcolor),
SubPolicy(0.2, "rotate", 3, 0.6, "solarize", 8, fillcolor),
SubPolicy(0.6, "equalize", 8, 0.4, "posterize", 6, fillcolor),
SubPolicy(0.8, "rotate", 8, 0.4, "color", 0, fillcolor),
SubPolicy(0.4, "rotate", 9, 0.6, "equalize", 2, fillcolor),
SubPolicy(0.0, "equalize", 7, 0.8, "equalize", 8, fillcolor),
SubPolicy(0.6, "invert", 4, 1.0, "equalize", 8, fillcolor),
SubPolicy(0.6, "color", 4, 1.0, "contrast", 8, fillcolor),
SubPolicy(0.8, "rotate", 8, 1.0, "color", 2, fillcolor),
SubPolicy(0.8, "color", 8, 0.8, "solarize", 7, fillcolor),
SubPolicy(0.4, "sharpness", 7, 0.6, "invert", 8, fillcolor),
SubPolicy(0.6, "shearX", 5, 1.0, "equalize", 9, fillcolor),
SubPolicy(0.4, "color", 0, 0.6, "equalize", 3, fillcolor),
SubPolicy(0.4, "equalize", 7, 0.2, "solarize", 4, fillcolor),
SubPolicy(0.6, "solarize", 5, 0.6, "autocontrast", 5, fillcolor),
SubPolicy(0.6, "invert", 4, 1.0, "equalize", 8, fillcolor),
SubPolicy(0.6, "color", 4, 1.0, "contrast", 8, fillcolor),
SubPolicy(0.8, "equalize", 8, 0.6, "equalize", 3, fillcolor)
]
def __call__(self, img, policy_idx=None):
if policy_idx is None or not isinstance(policy_idx, int):
policy_idx = random.randint(0, len(self.policies) - 1)
else:
policy_idx = policy_idx % len(self.policies)
return self.policies[policy_idx](img)
def __repr__(self):
return "AutoAugment ImageNet Policy"
class CIFAR10Policy():
""" Randomly choose one of the best 25 Sub-policies on CIFAR10.
Example:
>>> policy = CIFAR10Policy()
>>> transformed = policy(image)
Example as a PyTorch Transform:
>>> transform=transforms.Compose([
>>> transforms.Resize(256),
>>> CIFAR10Policy(),
>>> transforms.ToTensor()])
"""
def __init__(self, fillcolor=(128, 128, 128)):
self.policies = [
SubPolicy(0.1, "invert", 7, 0.2, "contrast", 6, fillcolor),
SubPolicy(0.7, "rotate", 2, 0.3, "translateX", 9, fillcolor),
SubPolicy(0.8, "sharpness", 1, 0.9, "sharpness", 3, fillcolor),
SubPolicy(0.5, "shearY", 8, 0.7, "translateY", 9, fillcolor),
SubPolicy(0.5, "autocontrast", 8, 0.9, "equalize", 2, fillcolor),
SubPolicy(0.2, "shearY", 7, 0.3, "posterize", 7, fillcolor),
SubPolicy(0.4, "color", 3, 0.6, "brightness", 7, fillcolor),
SubPolicy(0.3, "sharpness", 9, 0.7, "brightness", 9, fillcolor),
SubPolicy(0.6, "equalize", 5, 0.5, "equalize", 1, fillcolor),
SubPolicy(0.6, "contrast", 7, 0.6, "sharpness", 5, fillcolor),
SubPolicy(0.7, "color", 7, 0.5, "translateX", 8, fillcolor),
SubPolicy(0.3, "equalize", 7, 0.4, "autocontrast", 8, fillcolor),
SubPolicy(0.4, "translateY", 3, 0.2, "sharpness", 6, fillcolor),
SubPolicy(0.9, "brightness", 6, 0.2, "color", 8, fillcolor),
SubPolicy(0.5, "solarize", 2, 0.0, "invert", 3, fillcolor),
SubPolicy(0.2, "equalize", 0, 0.6, "autocontrast", 0, fillcolor),
SubPolicy(0.2, "equalize", 8, 0.8, "equalize", 4, fillcolor),
SubPolicy(0.9, "color", 9, 0.6, "equalize", 6, fillcolor),
SubPolicy(0.8, "autocontrast", 4, 0.2, "solarize", 8, fillcolor),
SubPolicy(0.1, "brightness", 3, 0.7, "color", 0, fillcolor),
SubPolicy(0.4, "solarize", 5, 0.9, "autocontrast", 3, fillcolor),
SubPolicy(0.9, "translateY", 9, 0.7, "translateY", 9, fillcolor),
SubPolicy(0.9, "autocontrast", 2, 0.8, "solarize", 3, fillcolor),
SubPolicy(0.8, "equalize", 8, 0.1, "invert", 3, fillcolor),
SubPolicy(0.7, "translateY", 9, 0.9, "autocontrast", 1, fillcolor)
]
def __call__(self, img, policy_idx=None):
if policy_idx is None or not isinstance(policy_idx, int):
policy_idx = random.randint(0, len(self.policies) - 1)
else:
policy_idx = policy_idx % len(self.policies)
return self.policies[policy_idx](img)
def __repr__(self):
return "AutoAugment CIFAR10 Policy"
class SVHNPolicy():
""" Randomly choose one of the best 25 Sub-policies on SVHN.
Example:
>>> policy = SVHNPolicy()
>>> transformed = policy(image)
Example as a PyTorch Transform:
>>> transform=transforms.Compose([
>>> transforms.Resize(256),
>>> SVHNPolicy(),
>>> transforms.ToTensor()])
"""
def __init__(self, fillcolor=(128, 128, 128)):
self.policies = [
SubPolicy(0.9, "shearX", 4, 0.2, "invert", 3, fillcolor),
SubPolicy(0.9, "shearY", 8, 0.7, "invert", 5, fillcolor),
SubPolicy(0.6, "equalize", 5, 0.6, "solarize", 6, fillcolor),
SubPolicy(0.9, "invert", 3, 0.6, "equalize", 3, fillcolor),
SubPolicy(0.6, "equalize", 1, 0.9, "rotate", 3, fillcolor),
SubPolicy(0.9, "shearX", 4, 0.8, "autocontrast", 3, fillcolor),
SubPolicy(0.9, "shearY", 8, 0.4, "invert", 5, fillcolor),
SubPolicy(0.9, "shearY", 5, 0.2, "solarize", 6, fillcolor),
SubPolicy(0.9, "invert", 6, 0.8, "autocontrast", 1, fillcolor),
SubPolicy(0.6, "equalize", 3, 0.9, "rotate", 3, fillcolor),
SubPolicy(0.9, "shearX", 4, 0.3, "solarize", 3, fillcolor),
SubPolicy(0.8, "shearY", 8, 0.7, "invert", 4, fillcolor),
SubPolicy(0.9, "equalize", 5, 0.6, "translateY", 6, fillcolor),
SubPolicy(0.9, "invert", 4, 0.6, "equalize", 7, fillcolor),
SubPolicy(0.3, "contrast", 3, 0.8, "rotate", 4, fillcolor),
SubPolicy(0.8, "invert", 5, 0.0, "translateY", 2, fillcolor),
SubPolicy(0.7, "shearY", 6, 0.4, "solarize", 8, fillcolor),
SubPolicy(0.6, "invert", 4, 0.8, "rotate", 4, fillcolor),
SubPolicy(0.3, "shearY", 7, 0.9, "translateX", 3, fillcolor),
SubPolicy(0.1, "shearX", 6, 0.6, "invert", 5, fillcolor),
SubPolicy(0.7, "solarize", 2, 0.6, "translateY", 7, fillcolor),
SubPolicy(0.8, "shearY", 4, 0.8, "invert", 8, fillcolor),
SubPolicy(0.7, "shearX", 9, 0.8, "translateY", 3, fillcolor),
SubPolicy(0.8, "shearY", 5, 0.7, "autocontrast", 3, fillcolor),
SubPolicy(0.7, "shearX", 2, 0.1, "invert", 5, fillcolor)
]
def __call__(self, img, policy_idx=None):
if policy_idx is None or not isinstance(policy_idx, int):
policy_idx = random.randint(0, len(self.policies) - 1)
else:
policy_idx = policy_idx % len(self.policies)
return self.policies[policy_idx](img)
def __repr__(self):
return "AutoAugment SVHN Policy"
class SubPolicy():
"""
    A sub-policy that applies up to two image operations, each with its own probability and magnitude.
"""
def __init__(self, p1, operation1, magnitude_idx1, p2, operation2, magnitude_idx2, fillcolor=(128, 128, 128)):
ranges = {
"shearX": np.linspace(0, 0.3, 10),
"shearY": np.linspace(0, 0.3, 10),
"translateX": np.linspace(0, 150 / 331, 10),
"translateY": np.linspace(0, 150 / 331, 10),
"rotate": np.linspace(0, 30, 10),
"color": np.linspace(0.0, 0.9, 10),
"posterize": np.round(np.linspace(8, 4, 10), 0).astype(np.int),
"solarize": np.linspace(256, 0, 10),
"contrast": np.linspace(0.0, 0.9, 10),
"sharpness": np.linspace(0.0, 0.9, 10),
"brightness": np.linspace(0.0, 0.9, 10),
"autocontrast": [0] * 10,
"equalize": [0] * 10,
"invert": [0] * 10
}
# from https://stackoverflow.com/questions/5252170/specify-image-filling-color-when-rotating-in-python-with-pil-and-setting-expand
def rotate_with_fill(img, magnitude):
rot = img.convert("RGBA").rotate(magnitude)
return Image.composite(rot, Image.new("RGBA", rot.size, (128,) * 4), rot).convert(img.mode)
# pylint: disable = unnecessary-lambda
func = {
"shearX": lambda img, magnitude: img.transform(
img.size, Image.AFFINE, (1, magnitude * random.choice([-1, 1]), 0, 0, 1, 0),
Image.BICUBIC, fillcolor=fillcolor),
"shearY": lambda img, magnitude: img.transform(
img.size, Image.AFFINE, (1, 0, 0, magnitude * random.choice([-1, 1]), 1, 0),
Image.BICUBIC, fillcolor=fillcolor),
"translateX": lambda img, magnitude: img.transform(
img.size, Image.AFFINE, (1, 0, magnitude * img.size[0] * random.choice([-1, 1]), 0, 1, 0),
fillcolor=fillcolor),
"translateY": lambda img, magnitude: img.transform(
img.size, Image.AFFINE, (1, 0, 0, 0, 1, magnitude * img.size[1] * random.choice([-1, 1])),
fillcolor=fillcolor),
"rotate": lambda img, magnitude: rotate_with_fill(img, magnitude),
"color": lambda img, magnitude: ImageEnhance.Color(img).enhance(1 + magnitude * random.choice([-1, 1])),
"posterize": lambda img, magnitude: ImageOps.posterize(img, magnitude),
"solarize": lambda img, magnitude: ImageOps.solarize(img, magnitude),
"contrast": lambda img, magnitude: ImageEnhance.Contrast(img).enhance(
1 + magnitude * random.choice([-1, 1])),
"sharpness": lambda img, magnitude: ImageEnhance.Sharpness(img).enhance(
1 + magnitude * random.choice([-1, 1])),
"brightness": lambda img, magnitude: ImageEnhance.Brightness(img).enhance(
1 + magnitude * random.choice([-1, 1])),
"autocontrast": lambda img, magnitude: ImageOps.autocontrast(img),
"equalize": lambda img, magnitude: ImageOps.equalize(img),
"invert": lambda img, magnitude: ImageOps.invert(img)
}
self.p1 = p1
self.operation1 = func[operation1]
self.magnitude1 = ranges[operation1][magnitude_idx1]
self.p2 = p2
self.operation2 = func[operation2]
self.magnitude2 = ranges[operation2][magnitude_idx2]
def __call__(self, img):
if random.random() < self.p1: img = self.operation1(img, self.magnitude1)
if random.random() < self.p2: img = self.operation2(img, self.magnitude2)
return img
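A small sketch of applying the policy above to a PIL image, mirroring the `ToPIL()`/`ToNumpy()` wrapping used in `src/dataset.py`; the input image here is synthetic.

```python
import numpy as np
from PIL import Image
from src.autoaugment import ImageNetPolicy

policy = ImageNetPolicy()
img = Image.fromarray(np.random.randint(0, 256, (224, 224, 3), dtype=np.uint8))

augmented = policy(img)       # randomly picks one of the 25 sub-policies
augmented_3 = policy(img, 3)  # or force a specific sub-policy index
print(np.asarray(augmented).shape)
```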

@@ -0,0 +1,108 @@
# Copyright 2021 Huawei Technologies Co., Ltd
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ============================================================================
"""callbacks"""
import time
import numpy as np
from mindspore.train.callback import Callback
from mindspore.common.tensor import Tensor
class StateMonitor(Callback):
"""StateMonitor"""
def __init__(self, data_size, tot_batch_size=None, lrs=None,
eval_interval=None, eval_offset=None, eval_engine=None, logger=None):
super(StateMonitor, self).__init__()
self.data_size = data_size
self.tot_batch_size = tot_batch_size
self.lrs = lrs
self.epoch_num = 0
self.loss = 0
self.eval_interval = eval_interval
self.eval_offset = eval_offset
self.eval_engine = eval_engine
self.best_acc = -1
self.best_acc_top5 = -1
self.best_i2t_recall = -1
self.best_t2i_recall = -1
self.mean_fps = 0.0
self.print = print
if logger is not None:
self.print = logger
def step_end(self, run_context):
cb_params = run_context.original_args()
loss = cb_params.net_outputs
if isinstance(loss, (tuple, list)):
if isinstance(loss[0], Tensor) and isinstance(loss[0].asnumpy(), np.ndarray):
loss = loss[0]
if isinstance(loss, Tensor) and isinstance(loss.asnumpy(), np.ndarray):
loss = np.mean(loss.asnumpy())
self.loss = loss
def epoch_begin(self, run_context):
self.epoch_time = time.time()
def epoch_end(self, run_context):
epoch_seconds = (time.time() - self.epoch_time)
per_step_seconds = epoch_seconds / self.data_size
print_str = "epoch[{}]".format(self.epoch_num)
print_str += ', epoch time: {:.2f}s'.format(epoch_seconds)
print_str += ', per step time: {:.4f}s'.format(per_step_seconds)
print_str += ', loss={:.6f}'.format(self.loss)
if self.lrs is not None:
lr = self.lrs[(self.epoch_num + 1) * self.data_size - 1]
print_str += ', lr={:.6f}'.format(lr)
if self.tot_batch_size is not None:
fps = self.tot_batch_size * self.data_size / epoch_seconds
self.mean_fps = (self.mean_fps * self.epoch_num + fps) / (self.epoch_num + 1)
print_str += ', fps={:.2f}'.format(fps)
if (self.epoch_num + 1) % self.eval_interval == self.eval_offset:
eval_start = time.time()
self.eval_engine.eval()
output = self.eval_engine.get_result()
eval_seconds = time.time() - eval_start
if output is not None:
if isinstance(output, list):
print_str += ', top1 accuracy={:.6f}'.format(float(output[0]))
print_str += ', top5 accuracy={:.6f}'.format(float(output[1]))
print_str += ', i2t_recall={:.6f}'.format(float(output[2]))
print_str += ', t2i_recall={:.6f}'.format(float(output[3]))
print_str += ', eval_cost={:.2f}'.format(eval_seconds)
if float(output[0]) > self.best_acc:
self.best_acc = float(output[0])
if float(output[1]) > self.best_acc_top5:
self.best_acc_top5 = float(output[1])
if float(output[2]) > self.best_i2t_recall:
self.best_i2t_recall = float(output[2])
if float(output[3]) > self.best_t2i_recall:
self.best_t2i_recall = float(output[3])
else:
print_str += ', accuracy={:.6f}'.format(float(output))
print_str += ', eval_cost={:.2f}'.format(eval_seconds)
if float(output) > self.best_acc:
self.best_acc = float(output)
self.print(print_str)
self.epoch_num += 1
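A hedged sketch of how a monitor like the one above is typically attached to `Model.train`; the model, dataset, learning-rate array and eval engine names below are placeholders for objects built in `train.py`, not definitions from this file.

```python
from src.callback import StateMonitor  # assumed module path

# All names below (model, train_dataset, lr_array, eval_engine, args) are placeholders.
state_cb = StateMonitor(data_size=train_dataset.get_dataset_size(),
                        tot_batch_size=args.batch_size * args.device_num,
                        lrs=lr_array,
                        eval_interval=args.eval_interval,
                        eval_offset=args.eval_offset,
                        eval_engine=eval_engine,
                        logger=args.logger.info)

model.train(args.max_epoch, train_dataset, callbacks=[state_cb], dataset_sink_mode=True)
```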

@@ -0,0 +1,124 @@
# Copyright 2021 Huawei Technologies Co., Ltd
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ============================================================================
"""loss functions"""
from mindspore import nn
from mindspore import Tensor
from mindspore.common import dtype as mstype
try:
from mindspore.nn.loss.loss import Loss
except ImportError:
try:
from mindspore.nn.loss.loss import LossBase as Loss
except ImportError:
from mindspore.nn.loss.loss import _Loss as Loss
from mindspore.ops import functional as F
from mindspore.ops import operations as P
class CrossEntropySmooth(Loss):
"""CrossEntropy"""
def __init__(self, sparse=True, reduction='mean', smooth_factor=0., num_classes=1000, aux_factor=0.4):
super().__init__()
self.aux_factor = aux_factor
self.onehot = P.OneHot()
self.sparse = sparse
self.on_value = Tensor(1.0 - smooth_factor, mstype.float32)
self.off_value = Tensor(1.0 * smooth_factor / (num_classes - 1), mstype.float32)
self.ce = nn.SoftmaxCrossEntropyWithLogits(reduction=reduction)
def construct(self, logits, label):
if isinstance(logits, tuple):
logit, aux_logit = logits
else:
logit, aux_logit = logits, None
if self.sparse:
label = self.onehot(label, F.shape(logit)[1], self.on_value, self.off_value)
loss = self.ce(logit, label)
if aux_logit is not None:
loss = loss + self.aux_factor * self.ce(aux_logit, label)
return loss
class CrossEntropySmoothMixup(Loss):
"""CrossEntropy"""
def __init__(self, reduction='mean', smooth_factor=0., num_classes=1000):
super().__init__()
self.on_value = Tensor(1.0 - smooth_factor, mstype.float32)
self.off_value = 1.0 * smooth_factor / (num_classes - 2)
self.cross_entropy = nn.SoftmaxCrossEntropyWithLogits(reduction=reduction)
def construct(self, logit, label):
off_label = P.Select()(P.Equal()(label, 0.0), \
P.Fill()(mstype.float32, P.Shape()(label), self.off_value), \
P.Fill()(mstype.float32, P.Shape()(label), 0.0))
label = self.on_value * label + off_label
loss = self.cross_entropy(logit, label)
return loss
class CrossEntropyIgnore(Loss):
"""CrossEntropyIgnore"""
def __init__(self, num_classes=21, ignore_label=255):
super().__init__()
self.one_hot = P.OneHot(axis=-1)
self.on_value = Tensor(1.0, mstype.float32)
self.off_value = Tensor(0.0, mstype.float32)
self.cast = P.Cast()
self.ce = nn.SoftmaxCrossEntropyWithLogits()
self.not_equal = P.NotEqual()
self.num_cls = num_classes
self.ignore_label = ignore_label
self.mul = P.Mul()
self.sum = P.ReduceSum(False)
self.div = P.RealDiv()
self.transpose = P.Transpose()
self.reshape = P.Reshape()
def construct(self, logits, labels):
labels_int = self.cast(labels, mstype.int32)
labels_int = self.reshape(labels_int, (-1,))
logits_ = self.transpose(logits, (0, 2, 3, 1))
logits_ = self.reshape(logits_, (-1, self.num_cls))
weights = self.not_equal(labels_int, self.ignore_label)
weights = self.cast(weights, mstype.float32)
one_hot_labels = self.one_hot(labels_int, self.num_cls, self.on_value, self.off_value)
loss = self.ce(logits_, one_hot_labels)
loss = self.mul(weights, loss)
loss = self.div(self.sum(loss), self.sum(weights))
return loss
def get_loss(loss_name, args):
"""get_loss"""
loss = None
if loss_name == 'ce_smooth':
loss = CrossEntropySmooth(smooth_factor=args.label_smooth_factor,
num_classes=args.class_num,
aux_factor=args.aux_factor)
elif loss_name == 'ce_smooth_mixup':
loss = CrossEntropySmoothMixup(smooth_factor=args.label_smooth_factor,
num_classes=args.class_num)
elif loss_name == 'ce_ignore':
loss = CrossEntropyIgnore(num_classes=args.class_num,
ignore_label=args.ignore_label)
else:
raise NotImplementedError
return loss
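As a quick numeric check of the label smoothing used above: with `smooth_factor=0.1` and `num_classes=1000`, the true class gets probability 0.9 and every other class 0.1/999, so the target still sums to 1. A NumPy-only sketch of the vector `CrossEntropySmooth` builds through `OneHot`:

```python
import numpy as np

smooth_factor, num_classes = 0.1, 1000
on_value = 1.0 - smooth_factor                  # 0.9 for the true class
off_value = smooth_factor / (num_classes - 1)   # ~0.0001 for every other class

label = 7                                       # hypothetical sparse label
target = np.full(num_classes, off_value, dtype=np.float32)
target[label] = on_value
print(target.sum())                             # ~1.0, a valid distribution
```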

@@ -0,0 +1,171 @@
# Copyright 2021 Huawei Technologies Co., Ltd
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ============================================================================
"""create train or eval dataset."""
import os
import warnings
from io import BytesIO
from PIL import Image
import numpy as np
import mindspore.common.dtype as mstype
import mindspore.dataset.engine as de
import mindspore.dataset.vision.c_transforms as C
import mindspore.dataset.transforms.c_transforms as C2
import mindspore.dataset.vision.py_transforms as P
from mindspore.dataset.vision.utils import Inter
from .autoaugment import ImageNetPolicy
warnings.filterwarnings("ignore", "(Possibly )?corrupt EXIF data", UserWarning)
class ToNumpy:
def __init__(self):
pass
def __call__(self, img):
return np.asarray(img)
def create_dataset(dataset_path,
do_train,
image_size=224,
interpolation='BILINEAR',
crop_min=0.05,
repeat_num=1,
batch_size=32,
num_workers=12,
autoaugment=False,
mixup=0.0,
num_classes=1001):
"""create_dataset"""
if hasattr(Inter, interpolation):
interpolation = getattr(Inter, interpolation)
else:
        print('cannot find interpolation_type: {}, use {} instead'.format(interpolation, 'BILINEAR'))
        interpolation = Inter.BILINEAR
device_num = int(os.getenv("RANK_SIZE", '1'))
rank_id = int(os.getenv('RANK_ID', '0'))
if do_train:
ds = de.ImageFolderDataset(dataset_path, num_parallel_workers=num_workers, shuffle=True,
num_shards=device_num, shard_id=rank_id)
else:
batch_per_step = batch_size * device_num
print("eval batch per step: {}".format(batch_per_step))
if batch_per_step < 50000:
if 50000 % batch_per_step == 0:
num_padded = 0
else:
num_padded = batch_per_step - (50000 % batch_per_step)
else:
num_padded = batch_per_step - 50000
print("eval dataset num_padded: {}".format(num_padded))
if num_padded != 0:
# padded_with_decode
white_io = BytesIO()
Image.new('RGB', (image_size, image_size), (255, 255, 255)).save(white_io, 'JPEG')
padded_sample = {
'image': np.array(bytearray(white_io.getvalue()), dtype='uint8'),
'label': np.array(-1, np.int32)
}
sample = [padded_sample for x in range(num_padded)]
ds_pad = de.PaddedDataset(sample)
ds_imagefolder = de.ImageFolderDataset(dataset_path, num_parallel_workers=num_workers)
ds = ds_pad + ds_imagefolder
distribute_sampler = de.DistributedSampler(num_shards=device_num, shard_id=rank_id, \
shuffle=False, num_samples=None)
ds.use_sampler(distribute_sampler)
else:
ds = de.ImageFolderDataset(dataset_path, num_parallel_workers=num_workers, \
shuffle=False, num_shards=device_num, shard_id=rank_id)
print("eval dataset size: {}".format(ds.get_dataset_size()))
mean = [0.485*255, 0.456*255, 0.406*255]
std = [0.229*255, 0.224*255, 0.225*255]
# define map operations
if do_train:
trans = [
C.RandomCropDecodeResize(image_size, scale=(crop_min, 1.0), \
ratio=(0.75, 1.333), interpolation=interpolation),
C.RandomHorizontalFlip(prob=0.5),
]
if autoaugment:
trans += [
P.ToPIL(),
ImageNetPolicy(),
ToNumpy(),
]
trans += [
C.Normalize(mean=mean, std=std),
C.HWC2CHW(),
]
else:
resize = int(int(image_size / 0.875 / 16 + 0.5) * 16)
print('eval, resize:{}'.format(resize))
trans = [
C.Decode(),
C.Resize(resize, interpolation=interpolation),
C.CenterCrop(image_size),
C.Normalize(mean=mean, std=std),
C.HWC2CHW()
]
type_cast_op = C2.TypeCast(mstype.int32)
ds = ds.repeat(repeat_num)
ds = ds.map(input_columns="image", num_parallel_workers=num_workers, operations=trans, python_multiprocessing=True)
ds = ds.map(input_columns="label", num_parallel_workers=num_workers, operations=type_cast_op)
if do_train and mixup > 0:
one_hot_encode = C2.OneHot(num_classes)
ds = ds.map(operations=one_hot_encode, input_columns=["label"])
ds = ds.batch(batch_size, drop_remainder=True)
if do_train and mixup > 0:
trans_mixup = C.MixUpBatch(alpha=mixup)
ds = ds.map(input_columns=["image", "label"], num_parallel_workers=num_workers, operations=trans_mixup)
return ds
def get_dataset(dataset_name, do_train, dataset_path, args):
"""get_dataset"""
if dataset_name == "imagenet":
if do_train:
data = create_dataset(dataset_path=dataset_path,
do_train=True,
image_size=args.train_image_size,
interpolation=args.interpolation,
autoaugment=args.autoaugment,
mixup=args.mixup,
crop_min=args.crop_min,
batch_size=args.batch_size,
num_workers=args.train_num_workers)
else:
data = create_dataset(dataset_path=dataset_path,
do_train=False,
image_size=args.eval_image_size,
interpolation=args.interpolation,
batch_size=args.eval_batch_size,
num_workers=args.eval_num_workers)
else:
raise NotImplementedError
return data
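A hedged sketch of building the evaluation pipeline with `create_dataset` above; the dataset path is a placeholder and the keyword values only illustrate typical settings, not the shipped yml configs.

```python
from src.dataset import create_dataset

eval_ds = create_dataset(dataset_path="/path/to/imagenet/val",  # placeholder path
                         do_train=False,
                         image_size=224,
                         interpolation="BICUBIC",
                         batch_size=32)
print(eval_ds.get_dataset_size())

for batch in eval_ds.create_dict_iterator(output_numpy=True, num_epochs=1):
    print(batch["image"].shape, batch["label"].shape)  # (32, 3, 224, 224), (32,)
    break
```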

@@ -0,0 +1,105 @@
# Copyright 2021 Huawei Technologies Co., Ltd
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ============================================================================
"""eval engine"""
from mindspore import Tensor
import mindspore.common.dtype as mstype
from src.metric import ClassifyCorrectWithCache, ClassifyCorrectCell, DistAccuracy
class BasicEvalEngine():
"""BasicEvalEngine"""
def __init__(self):
pass
@property
def metric(self):
return None
@property
def eval_network(self):
return None
def compile(self, sink_size=-1):
pass
def eval(self):
pass
def set_model(self, model):
self.model = model
def get_result(self):
return None
class ImageNetCacheEvelEngine(BasicEvalEngine):
"""ImageNetCacheEvelEngine"""
def __init__(self, net, eval_dataset, args):
super().__init__()
self.dist_eval_network = ClassifyCorrectWithCache(net, eval_dataset)
self.outputs = None
self.args = args
def compile(self, sink_size=-1):
index = Tensor(0, mstype.int32)
self.dist_eval_network.set_train(False)
self.dist_eval_network.compile(index)
def eval(self):
index = Tensor(0, mstype.int32)
output = self.dist_eval_network(index)
output = output.asnumpy() / 50000
self.outputs = {"acc": output}
def get_result(self):
return self.outputs["acc"]
class ImageNetEvelEngine(BasicEvalEngine):
"""ImageNetEvelEngine"""
def __init__(self, net, eval_dataset, args):
super().__init__()
self.eval_dataset = eval_dataset
self.dist_eval_network = ClassifyCorrectCell(net)
self.args = args
self.outputs = None
self.model = None
@property
def metric(self):
return {'acc': DistAccuracy(batch_size=self.args.eval_batch_size, device_num=self.args.device_num)}
@property
def eval_network(self):
return self.dist_eval_network
def eval(self):
self.outputs = self.model.eval(self.eval_dataset)
def get_result(self):
return self.outputs["acc"]
def get_eval_engine(engine_name, net, eval_dataset, args):
"""get_eval_engine"""
if engine_name == '':
eval_engine = BasicEvalEngine()
elif engine_name == "imagenet":
eval_engine = ImageNetEvelEngine(net, eval_dataset, args)
elif engine_name == "imagenet_cache":
eval_engine = ImageNetCacheEvelEngine(net, eval_dataset, args)
else:
raise NotImplementedError
return eval_engine

@@ -0,0 +1,80 @@
# Copyright 2021 Huawei Technologies Co., Ltd
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ============================================================================
"""logging"""
import logging
import os
import sys
from datetime import datetime
logger_name = 'mindspore-benchmark'
class LOGGER(logging.Logger):
"""
LOGGER
"""
def __init__(self, logger_name_local, rank=0):
super().__init__(logger_name_local)
self.log_fn = None
if rank % 8 == 0:
console = logging.StreamHandler(sys.stdout)
console.setLevel(logging.INFO)
formatter = logging.Formatter('%(asctime)s:%(levelname)s:%(message)s', "%Y-%m-%d %H:%M:%S")
console.setFormatter(formatter)
self.addHandler(console)
def setup_logging_file(self, log_dir, rank=0):
"""setup_logging_file"""
self.rank = rank
if not os.path.exists(log_dir):
os.makedirs(log_dir, exist_ok=True)
log_name = datetime.now().strftime('%Y-%m-%d_time_%H_%M_%S') + '_rank_{}.log'.format(rank)
log_fn = os.path.join(log_dir, log_name)
fh = logging.FileHandler(log_fn)
fh.setLevel(logging.INFO)
formatter = logging.Formatter('%(asctime)s:%(levelname)s:%(message)s')
fh.setFormatter(formatter)
self.addHandler(fh)
self.log_fn = log_fn
def info(self, msg, *args, **kwargs):
"""info"""
if self.isEnabledFor(logging.INFO):
self._log(logging.INFO, msg, args, **kwargs)
def save_args(self, args):
"""save_args"""
self.info('Args:')
if isinstance(args, (list, tuple)):
for value in args:
message = '--> {}'.format(value)
self.info(message)
else:
if isinstance(args, dict):
args_dict = args
else:
args_dict = vars(args)
for key in args_dict.keys():
message = '--> {}: {}'.format(key, args_dict[key])
self.info(message)
self.info('')
def get_logger(path, rank=0):
"""get_logger"""
logger = LOGGER(logger_name, rank)
logger.setup_logging_file(path, rank)
return logger

@@ -0,0 +1,93 @@
# Copyright 2021 Huawei Technologies Co., Ltd
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ============================================================================
"""learning rate generator"""
import math
import numpy as np
def linear_warmup_lr(current_step, warmup_steps, base_lr, init_lr):
lr_inc = (float(base_lr) - float(init_lr)) / float(warmup_steps)
lr = float(init_lr) + lr_inc * current_step
return lr
def get_lr(global_step, lr_init, lr_end, lr_max, warmup_epochs, \
total_epochs, steps_per_epoch, lr_decay_mode, poly_power=2.0):
"""
generate learning rate array
Args:
        global_step(int): current step at which training (re)starts; the returned lr array is sliced from this step
lr_init(float): init learning rate
lr_end(float): end learning rate
lr_max(float): max learning rate
warmup_epochs(int): number of warmup epochs
total_epochs(int): total epoch of training
steps_per_epoch(int): steps of one epoch
        lr_decay_mode(string): learning rate decay mode, including steps, poly, cosine or default
        poly_power(float): exponent used when lr_decay_mode is 'poly'
Returns:
np.array, learning rate array
"""
lr_each_step = []
total_steps = steps_per_epoch * total_epochs
warmup_steps = int(steps_per_epoch * warmup_epochs)
if lr_decay_mode == 'steps':
decay_epoch_index = [0.3 * total_steps, 0.6 * total_steps, 0.8 * total_steps]
for i in range(total_steps):
if i < decay_epoch_index[0]:
lr = lr_max
elif i < decay_epoch_index[1]:
lr = lr_max * 0.1
elif i < decay_epoch_index[2]:
lr = lr_max * 0.01
else:
lr = lr_max * 0.001
lr_each_step.append(lr)
elif lr_decay_mode == 'poly':
if warmup_steps != 0:
inc_each_step = (float(lr_max) - float(lr_init)) / float(warmup_steps)
else:
inc_each_step = 0
for i in range(total_steps):
if i < warmup_steps:
lr = float(lr_init) + inc_each_step * float(i)
else:
base = (1.0 - (float(i) - float(warmup_steps)) / (float(total_steps) - float(warmup_steps)))
lr = float(lr_max - lr_end) * base ** poly_power + lr_end
lr = max(lr, 0.0)
lr_each_step.append(lr)
elif lr_decay_mode == 'cosine':
decay_steps = total_steps - warmup_steps
for i in range(total_steps):
if i < warmup_steps:
lr_inc = (float(lr_max) - float(lr_init)) / float(warmup_steps)
lr = float(lr_init) + lr_inc * (i + 1)
else:
cur_step = i + 1 - warmup_steps
lr = lr_max * (1 + math.cos(math.pi * cur_step / decay_steps)) / 2
lr_each_step.append(lr)
else:
for i in range(total_steps):
if i < warmup_steps:
lr = lr_init + (lr_max - lr_init) * i / warmup_steps
else:
lr = lr_max - (lr_max - lr_end) * (i - warmup_steps) / (total_steps - warmup_steps)
lr_each_step.append(lr)
current_step = global_step
lr_each_step = np.array(lr_each_step).astype(np.float32)
learning_rate = lr_each_step[current_step:]
return learning_rate
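A small usage sketch of `get_lr` above with cosine decay; the schedule values are illustrative, not the ones used in the shipped configs, and the import path is assumed.

```python
from src.lr_generator import get_lr  # assumed module path

# Hypothetical schedule: 5 warmup epochs, then cosine decay, 300 epochs total,
# 1251 steps per epoch (ImageNet with a global batch size of 1024 gives ~1251 steps).
lr = get_lr(global_step=0, lr_init=0.0, lr_end=0.0, lr_max=0.00355,
            warmup_epochs=5, total_epochs=300, steps_per_epoch=1251,
            lr_decay_mode='cosine')
print(lr.shape)         # (375300,) = 300 * 1251
print(lr[:3], lr[-3:])  # ramps up from ~0, decays back toward 0
```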

@@ -0,0 +1,115 @@
# Copyright 2021 Huawei Technologies Co., Ltd
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ============================================================================
"""metric"""
import numpy as np
from mindspore.communication.management import GlobalComm
from mindspore.ops import operations as P
import mindspore.nn as nn
import mindspore.common.dtype as mstype
from mindspore.common.tensor import Tensor
from mindspore.common.parameter import Parameter
class ClassifyCorrectWithCache(nn.Cell):
"""ClassifyCorrectWithCache"""
def __init__(self, network, eval_dataset):
super(ClassifyCorrectWithCache, self).__init__(auto_prefix=False)
self._network = network
self.argmax = P.Argmax()
self.equal = P.Equal()
self.cast = P.Cast()
self.reduce_sum = P.ReduceSum()
self.allreduce = P.AllReduce(P.ReduceOp.SUM, GlobalComm.WORLD_COMM_GROUP)
self.assign_add = P.AssignAdd()
self.assign = P.Assign()
self._correct_num = Parameter(Tensor(0.0, mstype.float32), name="correct_num", requires_grad=False)
# save data to parameter
pdata = []
plabel = []
step_num = 0
for batch in eval_dataset.create_dict_iterator(output_numpy=True, num_epochs=1):
pdata.append(batch["image"])
plabel.append(batch["label"])
step_num = step_num + 1
pdata = Tensor(np.array(pdata), mstype.float32)
plabel = Tensor(np.array(plabel), mstype.int32)
self._data = Parameter(pdata, name="pdata", requires_grad=False)
self._label = Parameter(plabel, name="plabel", requires_grad=False)
self._step_num = Tensor(step_num, mstype.int32)
def construct(self, index):
self._correct_num = 0
while index < self._step_num:
data = self._data[index]
label = self._label[index]
outputs = self._network(data)
y_pred = self.argmax(outputs)
y_pred = self.cast(y_pred, mstype.int32)
y_correct = self.equal(y_pred, label)
y_correct = self.cast(y_correct, mstype.float32)
y_correct_sum = self.reduce_sum(y_correct)
self._correct_num += y_correct_sum #self.assign(self._correct_num, y_correct_sum)
index = index + 1
total_correct = self.allreduce(self._correct_num)
return total_correct
class ClassifyCorrectCell(nn.Cell):
"""ClassifyCorrectCell"""
def __init__(self, network):
super(ClassifyCorrectCell, self).__init__(auto_prefix=False)
self._network = network
self.argmax = P.Argmax()
self.equal = P.Equal()
self.cast = P.Cast()
self.reduce_sum = P.ReduceSum()
self.allreduce = P.AllReduce(P.ReduceOp.SUM, GlobalComm.WORLD_COMM_GROUP)
def construct(self, data, label):
outputs = self._network(data)
y_pred = self.argmax(outputs)
y_pred = self.cast(y_pred, mstype.int32)
y_correct = self.equal(y_pred, label)
y_correct = self.cast(y_correct, mstype.float32)
y_correct = self.reduce_sum(y_correct)
total_correct = self.allreduce(y_correct)
return (total_correct,)
class DistAccuracy(nn.Metric):
"""DistAccuracy"""
def __init__(self, batch_size, device_num):
super(DistAccuracy, self).__init__()
self.clear()
self.batch_size = batch_size
self.device_num = device_num
def clear(self):
self._correct_num = 0
self._total_num = 0
def update(self, *inputs):
if len(inputs) != 1:
raise ValueError('Distribute accuracy needs 1 input (y_correct), but got {}'.format(len(inputs)))
y_correct = self._convert_data(inputs[0])
self._correct_num += y_correct
self._total_num += self.batch_size * self.device_num
def eval(self):
if self._total_num == 0:
raise RuntimeError('Accuracy can not be calculated, because the number of samples is 0.')
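        # Note: the denominator is fixed to the 50,000 ImageNet val images rather than
        # self._total_num, so samples padded for even sharding (label -1, never predicted)
        # do not distort the accuracy.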
return self._correct_num / 50000

@@ -0,0 +1,129 @@
# Copyright 2021 Huawei Technologies Co., Ltd
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ============================================================================
"""Parse arguments"""
import os
import ast
import argparse
from pprint import pprint, pformat
import yaml
class Config:
"""
Configuration namespace. Convert dictionary to members.
"""
def __init__(self, cfg_dict):
for k, v in cfg_dict.items():
if isinstance(v, (list, tuple)):
setattr(self, k, [Config(x) if isinstance(x, dict) else x for x in v])
else:
setattr(self, k, Config(v) if isinstance(v, dict) else v)
def __str__(self):
return pformat(self.__dict__)
def __repr__(self):
return self.__str__()
def parse_cli_to_yaml(parser, cfg, helper=None, choices=None, cfg_path="default_config.yaml"):
"""
Parse command line arguments to the configuration according to the default yaml.
Args:
parser: Parent parser.
cfg: Base configuration.
helper: Helper description.
cfg_path: Path to the default yaml config.
"""
parser = argparse.ArgumentParser(description="[REPLACE THIS at config.py]",
parents=[parser])
helper = {} if helper is None else helper
choices = {} if choices is None else choices
for item in cfg:
if not isinstance(cfg[item], list) and not isinstance(cfg[item], dict):
help_description = helper[item] if item in helper else "Please reference to {}".format(cfg_path)
choice = choices[item] if item in choices else None
if isinstance(cfg[item], bool):
parser.add_argument("--" + item, type=ast.literal_eval, default=cfg[item], choices=choice,
help=help_description)
else:
parser.add_argument("--" + item, type=type(cfg[item]), default=cfg[item], choices=choice,
help=help_description)
args = parser.parse_args()
return args
def parse_yaml(yaml_path):
"""
Parse the yaml config file.
Args:
yaml_path: Path to the yaml config.
"""
with open(yaml_path, 'r') as fin:
try:
cfgs = yaml.load_all(fin.read(), Loader=yaml.FullLoader)
cfgs = [x for x in cfgs]
if len(cfgs) == 1:
cfg_helper = {}
cfg = cfgs[0]
cfg_choices = {}
elif len(cfgs) == 2:
cfg, cfg_helper = cfgs
cfg_choices = {}
elif len(cfgs) == 3:
cfg, cfg_helper, cfg_choices = cfgs
else:
raise ValueError("At most 3 docs (config, description for help, choices) are supported in config yaml")
print(cfg_helper)
except:
raise ValueError("Failed to parse yaml")
return cfg, cfg_helper, cfg_choices
def merge(args, cfg):
"""
Merge the base config from yaml file and command line arguments.
Args:
args: Command line arguments.
cfg: Base configuration.
"""
args_var = vars(args)
for item in args_var:
cfg[item] = args_var[item]
return cfg
def get_config():
"""
Get Config according to the yaml file and cli arguments.
"""
parser = argparse.ArgumentParser(description="default name", add_help=False)
current_dir = os.path.dirname(os.path.abspath(__file__))
parser.add_argument("--config_path", type=str,
default=os.path.join(current_dir, "../../vit_patch32_imagenet2012_config.yml"),
help="Config file path")
path_args, _ = parser.parse_known_args()
default, helper, choices = parse_yaml(path_args.config_path)
pprint(default)
config_path = path_args.config_path
args = parse_cli_to_yaml(parser=parser, cfg=default, helper=helper, choices=choices, cfg_path=config_path)
final_config = merge(args, default)
return Config(final_config)
config = get_config()
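For reference, a sketch of the multi-document yml layout that `parse_yaml` above accepts (base config, per-option help text, per-option choices); the option names are placeholders.

```python
import yaml

example_yaml = """
batch_size: 256
device_target: "Ascend"
---
batch_size: "batch size per device"
device_target: "target device to run on"
---
device_target: ["Ascend", "GPU"]
"""

cfg, cfg_helper, cfg_choices = list(yaml.load_all(example_yaml, Loader=yaml.FullLoader))
print(cfg["batch_size"], cfg_choices["device_target"])
```

Each scalar option in the first document becomes a command-line flag via `parse_cli_to_yaml`, so a value such as `batch_size` can be overridden with `--batch_size=128` at launch.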

@@ -0,0 +1,27 @@
# Copyright 2021 Huawei Technologies Co., Ltd
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ============================================================================
"""Device adapter for ModelArts"""
from .config import config
if config.enable_modelarts:
from .moxing_adapter import get_device_id, get_device_num, get_rank_id, get_job_id
else:
from .local_adapter import get_device_id, get_device_num, get_rank_id, get_job_id
__all__ = [
"get_device_id", "get_device_num", "get_rank_id", "get_job_id"
]

@@ -0,0 +1,36 @@
# Copyright 2021 Huawei Technologies Co., Ltd
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ============================================================================
"""Local adapter"""
import os
def get_device_id():
device_id = os.getenv('DEVICE_ID', '0')
return int(device_id)
def get_device_num():
device_num = os.getenv('RANK_SIZE', '1')
return int(device_num)
def get_rank_id():
global_rank_id = os.getenv('RANK_ID', '0')
return int(global_rank_id)
def get_job_id():
return "Local Job"

@@ -0,0 +1,115 @@
# Copyright 2021 Huawei Technologies Co., Ltd
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ============================================================================
"""Moxing adapter for ModelArts"""
import os
import functools
from mindspore import context
from .config import config
_global_sync_count = 0
def get_device_id():
device_id = os.getenv('DEVICE_ID', '0')
return int(device_id)
def get_device_num():
device_num = os.getenv('RANK_SIZE', '1')
return int(device_num)
def get_rank_id():
global_rank_id = os.getenv('RANK_ID', '0')
return int(global_rank_id)
def get_job_id():
job_id = os.getenv('JOB_ID')
job_id = job_id if job_id != "" else "default"
return job_id
def sync_data(from_path, to_path):
"""
Download data from remote obs to local directory if the first url is remote url and the second one is local path
Upload data from local directory to remote obs in contrast.
"""
import moxing as mox
import time
global _global_sync_count
sync_lock = "/tmp/copy_sync.lock" + str(_global_sync_count)
_global_sync_count += 1
    # Each server contains 8 devices at most.
if get_device_id() % min(get_device_num(), 8) == 0 and not os.path.exists(sync_lock):
print("from path: ", from_path)
print("to path: ", to_path)
mox.file.copy_parallel(from_path, to_path)
print("===finish data synchronization===")
try:
os.mknod(sync_lock)
except IOError:
pass
print("===save flag===")
while True:
if os.path.exists(sync_lock):
break
time.sleep(1)
print("Finish sync data from {} to {}.".format(from_path, to_path))
def moxing_wrapper(pre_process=None, post_process=None):
"""
Moxing wrapper to download dataset and upload outputs.
"""
def wrapper(run_func):
@functools.wraps(run_func)
def wrapped_func(*args, **kwargs):
# Download data from data_url
if config.enable_modelarts:
if config.data_url:
sync_data(config.data_url, config.data_path)
print("Dataset downloaded: ", os.listdir(config.data_path))
if config.checkpoint_url:
sync_data(config.checkpoint_url, config.load_path)
print("Preload downloaded: ", os.listdir(config.load_path))
if config.train_url:
sync_data(config.train_url, config.output_path)
print("Workspace downloaded: ", os.listdir(config.output_path))
context.set_context(save_graphs_path=os.path.join(config.output_path, str(get_rank_id())))
config.device_num = get_device_num()
config.device_id = get_device_id()
if not os.path.exists(config.output_path):
os.makedirs(config.output_path)
if pre_process:
pre_process()
run_func(*args, **kwargs)
# Upload data to train_url
if config.enable_modelarts:
if post_process:
post_process()
if config.train_url:
print("Start to copy output directory")
sync_data(config.output_path, config.train_url)
return wrapped_func
return wrapper

View File

@ -0,0 +1,214 @@
# Copyright 2021 Huawei Technologies Co., Ltd
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ============================================================================
"""Gradient clipping wrapper for optimizers."""
import numpy as np
from mindspore._checkparam import Validator as validator
from mindspore.ops import functional as F
from mindspore.ops import operations as P
from mindspore.ops import composite as C
from mindspore.common import dtype as mstype
from mindspore.common.initializer import initializer
from mindspore.common.parameter import Parameter
from mindspore.common.tensor import Tensor
from mindspore._checkparam import Rel
from mindspore.nn.optim import Optimizer
from mindspore.nn.optim.optimizer import opt_init_args_register
def _check_param_value(beta1, beta2, eps, prim_name):
"""Check the type of inputs."""
validator.check_value_type("beta1", beta1, [float], prim_name)
validator.check_value_type("beta2", beta2, [float], prim_name)
validator.check_value_type("eps", eps, [float], prim_name)
validator.check_float_range(beta1, 0.0, 1.0, Rel.INC_NEITHER, "beta1", prim_name)
validator.check_float_range(beta2, 0.0, 1.0, Rel.INC_NEITHER, "beta2", prim_name)
validator.check_positive_float(eps, "eps", prim_name)
_grad_scale = C.MultitypeFuncGraph("grad_scale")
op_mul = P.Mul()
map_ = C.Map()
@_grad_scale.register("Number", "Tensor")
def tensor_grad_scale(scale, grad):
"""Get grad with scale."""
if scale == 1.0:
return grad
return op_mul(grad, F.cast(scale, F.dtype(grad)))
@_grad_scale.register("Tensor", "Tensor")
def tensor_grad_scale_with_tensor(scale, grad):
"""Get grad with scale."""
return op_mul(grad, F.cast(scale, F.dtype(grad)))
def scale_grad(gradients, reciprocal_scale):
gradients = map_(F.partial(_grad_scale, reciprocal_scale), gradients)
return gradients
_adam_opt = C.MultitypeFuncGraph("adam_opt")
_scaler_one = Tensor(1, mstype.int32)
@_adam_opt.register("Tensor", "Tensor", "Tensor", "Tensor", "Tensor", "Tensor", "Number", "Tensor", "Tensor", "Tensor",
"Tensor", "Bool", "Bool")
def _update_run_op(beta1_power, beta2_power, beta1, beta2, eps, lr, weight_decay, param, \
m, v, gradient, decay_flag, optim_filter):
"""
Update parameters.
Args:
beta1 (Tensor): The exponential decay rate for the 1st moment estimations. Should be in range (0.0, 1.0).
beta2 (Tensor): The exponential decay rate for the 2nd moment estimations. Should be in range (0.0, 1.0).
eps (Tensor): Term added to the denominator to improve numerical stability. Should be greater than 0.
lr (Tensor): Learning rate.
weight_decay (Number): Weight decay. Should be equal to or greater than 0.
param (Tensor): Parameters.
m (Tensor): m value of parameters.
v (Tensor): v value of parameters.
gradient (Tensor): Gradient of parameters.
decay_flag (bool): Applies weight decay or not.
optim_filter (bool): Applies parameter update or not.
Returns:
Tensor, the new value of v after updating.
"""
if optim_filter:
# op_mul = P.Mul() is already defined at module level
op_square = P.Square()
op_sqrt = P.Sqrt()
op_cast = P.Cast()
op_reshape = P.Reshape()
op_shape = P.Shape()
param_fp32 = op_cast(param, mstype.float32)
m_fp32 = op_cast(m, mstype.float32)
v_fp32 = op_cast(v, mstype.float32)
gradient_fp32 = op_cast(gradient, mstype.float32)
next_m = op_mul(beta1, m_fp32) + op_mul(op_cast(F.tuple_to_array((1.0,)), mstype.float32)
- beta1, gradient_fp32)
next_v = op_mul(beta2, v_fp32) + op_mul(op_cast(F.tuple_to_array((1.0,)), mstype.float32)
- beta2, op_square(gradient_fp32))
regulate_m = next_m / (_scaler_one - beta1_power)
regulate_v = next_v / (_scaler_one - beta2_power)
update = regulate_m / (eps + op_sqrt(regulate_v))
if decay_flag:
update = op_mul(weight_decay, param_fp32) + update
update_with_lr = op_mul(lr, update)
next_param = param_fp32 - op_reshape(update_with_lr, op_shape(param_fp32))
next_param = F.depend(next_param, F.assign(param, op_cast(next_param, F.dtype(param))))
next_param = F.depend(next_param, F.assign(m, op_cast(next_m, F.dtype(m))))
next_param = F.depend(next_param, F.assign(v, op_cast(next_v, F.dtype(v))))
return op_cast(next_param, F.dtype(param))
return gradient
class AdamW(Optimizer):
"""
Implements the AdamWeightDecay optimizer with optional gradient clipping by global norm.
"""
@opt_init_args_register
def __init__(self, params, learning_rate=1e-3, beta1=0.9, beta2=0.999, eps=1e-8, \
weight_decay=0.0, loss_scale=1.0, clip=False):
super(AdamW, self).__init__(learning_rate, params, weight_decay)
_check_param_value(beta1, beta2, eps, self.cls_name)
self.beta1 = Tensor(np.array([beta1]).astype(np.float32))
self.beta2 = Tensor(np.array([beta2]).astype(np.float32))
self.eps = Tensor(np.array([eps]).astype(np.float32))
self.moments1 = self.parameters.clone(prefix="adam_m", init='zeros')
self.moments2 = self.parameters.clone(prefix="adam_v", init='zeros')
self.hyper_map = C.HyperMap()
self.beta1_power = Parameter(initializer(1, [1], mstype.float32), name="beta1_power")
self.beta2_power = Parameter(initializer(1, [1], mstype.float32), name="beta2_power")
self.reciprocal_scale = Tensor(1.0 / loss_scale, mstype.float32)
self.clip = clip
def construct(self, gradients):
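# One optimizer step: unscale the gradients, optionally clip them by global norm,
# advance the running beta powers, then apply the fused AdamW update to every
# parameter group (or to all parameters when no grouping is used).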
lr = self.get_lr()
gradients = scale_grad(gradients, self.reciprocal_scale)
if self.clip:
gradients = C.clip_by_global_norm(gradients, 5.0, None)
beta1_power = self.beta1_power * self.beta1
self.beta1_power = beta1_power
beta2_power = self.beta2_power * self.beta2
self.beta2_power = beta2_power
if self.is_group:
if self.is_group_lr:
optim_result = self.hyper_map(F.partial(_adam_opt, beta1_power, beta2_power, \
self.beta1, self.beta2, self.eps),
lr, self.weight_decay, self.parameters, self.moments1, self.moments2,
gradients, self.decay_flags, self.optim_filter)
else:
optim_result = self.hyper_map(F.partial(_adam_opt, beta1_power, beta2_power, \
self.beta1, self.beta2, self.eps, lr),
self.weight_decay, self.parameters, self.moments1, self.moments2,
gradients, self.decay_flags, self.optim_filter)
else:
optim_result = self.hyper_map(F.partial(_adam_opt, beta1_power, beta2_power, self.beta1, self.beta2, \
self.eps, lr, self.weight_decay),
self.parameters, self.moments1, self.moments2,
gradients, self.decay_flags, self.optim_filter)
if self.use_parallel:
self.broadcast_params(optim_result)
return optim_result
def parameter_group(network, weight_decay, no_weight_decay_filter, gc_flag):
"""Group trainable parameters by weight-decay policy."""
filter_len = len(no_weight_decay_filter)
if filter_len > 0:
decayed_params = []
no_decayed_params = []
for param in network.trainable_params():
if all([key not in param.name for key in no_weight_decay_filter]):
decayed_params.append(param)
else:
no_decayed_params.append(param)
group_params = [{'params': decayed_params, 'weight_decay': weight_decay, 'grad_centralization': gc_flag},
{'params': no_decayed_params},
{'order_params': network.trainable_params()}]
else:
group_params = [{'params': network.trainable_params(), \
'weight_decay': weight_decay, 'grad_centralization': gc_flag},
{'order_params': network.trainable_params()}]
return group_params
def get_optimizer(optimizer_name, network, lrs, args):
no_weight_decay_filter = [x for x in args.no_weight_decay_filter.split(",") if len(x) > 0]
group_params = parameter_group(network, args.weight_decay, no_weight_decay_filter, bool(args.gc_flag))
if optimizer_name == 'adamw':
opt = AdamW(group_params, lrs, args.beta1, args.beta2, loss_scale=args.loss_scale)
else:
raise NotImplementedError
return opt, group_params

View File

@ -0,0 +1,26 @@
# Copyright 2021 Huawei Technologies Co., Ltd
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ============================================================================
"""set_loglevel"""
import os
def set_loglevel(level='info'):
print('set device global log level to {}'.format(level))
os.system('/usr/local/Ascend/driver/tools/msnpureport -g {}'.format(level))
os.system('/usr/local/Ascend/driver/tools/msnpureport -g {} -d 4'.format(level))
event_log_level = 'enable' if level in ['info', 'debug'] else 'disable'
print('set device event log level to {}'.format(event_log_level))
os.system('/usr/local/Ascend/driver/tools/msnpureport -e {}'.format(event_log_level))
os.system('/usr/local/Ascend/driver/tools/msnpureport -e {} -d 4'.format(event_log_level))

View File

@ -0,0 +1,507 @@
# Copyright 2021 Huawei Technologies Co., Ltd
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ============================================================================
"""Vision Transformer implementation."""
from importlib import import_module
from easydict import EasyDict as edict
import numpy as np
import mindspore
from mindspore.common.initializer import initializer
from mindspore.common.parameter import Parameter
from mindspore.nn import Cell, Dense, Dropout, SequentialCell
from mindspore.ops import operations as P
import mindspore.common.dtype as mstype
from mindspore import Tensor
MIN_NUM_PATCHES = 4
class VitConfig:
"""
VitConfig
"""
def __init__(self, configs):
self.configs = configs
# network init
self.network_norm = mindspore.nn.LayerNorm((configs.normalized_shape,))
self.network_init = mindspore.common.initializer.Normal(sigma=1.0)
self.network_dropout_rate = 0.1
self.network_pool = 'cls'
self.network = ViT
# stem
self.stem_init = mindspore.common.initializer.XavierUniform()
self.stem = VitStem
# body
self.body_norm = mindspore.nn.LayerNorm
self.body_drop_path_rate = 0.1
self.body = Transformer
# body attention
self.attention_init = mindspore.common.initializer.XavierUniform()
self.attention_activation = mindspore.nn.Softmax()
self.attention_dropout_rate = 0.1
self.attention = Attention
# body feedforward
self.feedforward_init = mindspore.common.initializer.XavierUniform()
self.feedforward_activation = mindspore.nn.GELU()
self.feedforward_dropout_rate = 0.1
self.feedforward = FeedForward
# head
self.head = origin_head
self.head_init = mindspore.common.initializer.XavierUniform()
self.head_dropout_rate = 0.1
self.head_norm = mindspore.nn.LayerNorm((configs.normalized_shape,))
self.head_activation = mindspore.nn.GELU()
class DropPath(Cell):
"""Drop paths (Stochastic Depth) per sample (when applied in main path of residual blocks).
"""
def __init__(self, drop_prob=None, seed=0):
super(DropPath, self).__init__()
self.keep_prob = 1 - drop_prob
seed = min(seed, 0)  # force the seed to 0
self.rand = P.UniformReal(seed=seed)  # seed must be 0; any other value makes the op repeat the same values across calls
self.shape = P.Shape()
self.floor = P.Floor()
self.print = P.Print()
def construct(self, x):
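# Stochastic depth: during training keep each sample's residual branch with
# probability keep_prob (floor(uniform + keep_prob) equals 1 with that probability)
# and rescale kept activations by 1 / keep_prob so the expected value is unchanged.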
if self.training:
x_shape = self.shape(x) # B N C
random_tensor = self.rand((x_shape[0], 1, 1))
random_tensor = random_tensor + self.keep_prob
random_tensor = self.floor(random_tensor)
x = x / self.keep_prob
x = x * random_tensor
return x
class BatchDense(Cell):
"""BatchDense module."""
def __init__(self, in_features, out_features, initialization, has_bias=True):
super().__init__()
self.out_features = out_features
self.dense = Dense(in_features, out_features, has_bias=has_bias)
self.dense.weight.set_data(initializer(initialization, [out_features, in_features]))
self.reshape = P.Reshape()
def construct(self, x):
bs, seq_len, d_model = x.shape
out = self.reshape(x, (bs * seq_len, d_model))
out = self.dense(out)
out = self.reshape(out, (bs, seq_len, self.out_features))
return out
class ResidualCell(Cell):
"""Cell which implements x + f(x) function."""
def __init__(self, cell):
super().__init__()
self.cell = cell
def construct(self, x, **kwargs):
return self.cell(x, **kwargs) + x
def pretrain_head(vit_config):
"""Head for ViT pretraining."""
d_model = vit_config.configs.d_model
mlp_dim = vit_config.configs.mlp_dim
num_classes = vit_config.configs.num_classes
dropout_rate = vit_config.head_dropout_rate
initialization = vit_config.head_init
normalization = vit_config.head_norm
activation = vit_config.head_activation
dense1 = Dense(d_model, mlp_dim)
dense1.weight.set_data(initializer(initialization, [mlp_dim, d_model]))
dense2 = Dense(mlp_dim, num_classes)
dense2.weight.set_data(initializer(initialization, [num_classes, mlp_dim]))
return SequentialCell([
normalization,
dense1,
activation,
Dropout(keep_prob=(1. - dropout_rate)),
dense2])
def origin_head(vit_config):
"""Head for ViT pretraining."""
d_model = vit_config.configs.d_model
num_classes = vit_config.configs.num_classes
initialization = vit_config.head_init
dense = Dense(d_model, num_classes)
dense.weight.set_data(initializer(initialization, [num_classes, d_model]))
return SequentialCell([dense])
class VitStem(Cell):
"""Stem layer for ViT."""
def __init__(self, vit_config):
super().__init__()
d_model = vit_config.configs.d_model
patch_size = vit_config.configs.patch_size
image_size = vit_config.configs.image_size
initialization = vit_config.stem_init
channels = 3
assert image_size % patch_size == 0, 'Image dimensions must be divisible by the patch size.'
num_patches = (image_size // patch_size) ** 2
assert num_patches > MIN_NUM_PATCHES, f'your number of patches {num_patches} is too small'
patch_dim = channels * patch_size ** 2
self.patch_size = patch_size
self.reshape = P.Reshape()
self.transpose = P.Transpose()
self.patch_to_embedding = BatchDense(patch_dim, d_model, initialization, has_bias=True)
def construct(self, img):
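# Rearrange the image into non-overlapping p x p patches:
# (bs, c, h, w) -> (bs, (h/p)*(w/p), c*p*p), then project every flattened patch
# to d_model with a shared Dense layer.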
p = self.patch_size
bs, channels, h, w = img.shape
x = self.reshape(img, (bs, channels, h // p, p, w // p, p))
x = self.transpose(x, (0, 2, 4, 1, 3, 5))
x = self.reshape(x, (bs, (h//p)*(w//p), channels*p*p))
x = self.patch_to_embedding(x)
return x
class ViT(Cell):
"""Vision Transformer implementation."""
def __init__(self, vit_config):
super().__init__()
d_model = vit_config.configs.d_model
patch_size = vit_config.configs.patch_size
image_size = vit_config.configs.image_size
initialization = vit_config.network_init
pool = vit_config.network_pool
dropout_rate = vit_config.network_dropout_rate
norm = vit_config.network_norm
stem = vit_config.stem(vit_config)
body = vit_config.body(vit_config)
head = vit_config.head(vit_config)
assert pool in {'cls', 'mean'}, 'pool type must be either cls or mean'
num_patches = (image_size // patch_size) ** 2
if pool == "cls":
self.cls_token = Parameter(initializer(initialization, (1, 1, d_model)),
name='cls', requires_grad=True)
self.pos_embedding = Parameter(initializer(initialization, (1, num_patches + 1, d_model)),
name='pos_embedding', requires_grad=True)
self.tile = P.Tile()
self.cat_1 = P.Concat(axis=1)
else:
self.pos_embedding = Parameter(initializer(initialization, (1, num_patches, d_model)),
name='pos_embedding', requires_grad=True)
self.mean = P.ReduceMean(keep_dims=False)
self.pool = pool
self.cast = P.Cast()
self.dropout = Dropout(keep_prob=(1. - dropout_rate))
self.stem = stem
self.body = body
self.head = head
self.norm = norm
def construct(self, img):
x = self.stem(img)
bs, seq_len, _ = x.shape
if self.pool == "cls":
cls_tokens = self.tile(self.cls_token, (bs, 1, 1))
x = self.cat_1((cls_tokens, x)) # now x has shape = (bs, seq_len+1, d)
x += self.pos_embedding[:, :(seq_len + 1)]
else:
x += self.pos_embedding[:, :seq_len]
y = self.cast(x, mstype.float32)
y = self.dropout(y)
x = self.cast(y, x.dtype)
x = self.body(x)
if self.norm is not None:
x = self.norm(x)
if self.pool == "cls":
x = x[:, 0]
else:
x = self.mean(x, (-2,))
return self.head(x)
class Attention(Cell):
"""Attention layer implementation."""
def __init__(self, vit_config):
super().__init__()
d_model = vit_config.configs.d_model
dim_head = vit_config.configs.dim_head
heads = vit_config.configs.heads
initialization = vit_config.attention_init
activation = vit_config.attention_activation
dropout_rate = vit_config.attention_dropout_rate
inner_dim = heads * dim_head
self.dim_head = dim_head
self.heads = heads
self.scale = Tensor([dim_head ** -0.5])
self.to_q = Dense(d_model, inner_dim, has_bias=True)
self.to_q.weight.set_data(initializer(initialization, [inner_dim, d_model]))
self.to_k = Dense(d_model, inner_dim, has_bias=True)
self.to_k.weight.set_data(initializer(initialization, [inner_dim, d_model]))
self.to_v = Dense(d_model, inner_dim, has_bias=True)
self.to_v.weight.set_data(initializer(initialization, [inner_dim, d_model]))
self.to_out = Dense(inner_dim, d_model, has_bias=True)
self.to_out.weight.set_data(initializer(initialization, [d_model, inner_dim]))  # Dense weight shape is (out_features, in_features)
self.dropout = Dropout(1 - dropout_rate)
self.activation = activation
#auxiliary functions
self.reshape = P.Reshape()
self.transpose = P.Transpose()
self.cast = P.Cast()
self.mul = P.Mul()
self.q_matmul_k = P.BatchMatMul(transpose_b=True)
self.attn_matmul_v = P.BatchMatMul()
self.softmax_nz = True
def construct(self, x):
'''x shape: (bs, seq_len, d_model)'''
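# Multi-head scaled dot-product attention: softmax(q @ k^T * dim_head**-0.5) @ v
# per head. In the softmax_nz branch q is scaled (in float32) before the matmul;
# otherwise the attention scores are scaled after it.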
bs, seq_len, d_model, h, d = x.shape[0], x.shape[1], x.shape[2], self.heads, self.dim_head
x_2d = self.reshape(x, (-1, d_model))
q, k, v = self.to_q(x_2d), self.to_k(x_2d), self.to_v(x_2d)
if self.softmax_nz:
q = self.reshape(q, (bs, seq_len, h, d))
q = self.transpose(q, (0, 2, 1, 3))
q = self.cast(q, mstype.float32)
q = self.mul(q, self.scale)
k = self.reshape(k, (bs, seq_len, h, d))
k = self.transpose(k, (0, 2, 1, 3))
v = self.reshape(v, (bs, seq_len, h, d))
v = self.transpose(v, (0, 2, 1, 3))
q = self.cast(q, k.dtype)
attn_scores = self.q_matmul_k(q, k) #bs x h x seq_len x seq_len
attn_scores = self.cast(attn_scores, x.dtype)
attn_scores = self.activation(attn_scores)
else:
q = self.reshape(q, (bs, seq_len, h, d))
q = self.transpose(q, (0, 2, 1, 3))
k = self.reshape(k, (bs, seq_len, h, d))
k = self.transpose(k, (0, 2, 1, 3))
v = self.reshape(v, (bs, seq_len, h, d))
v = self.transpose(v, (0, 2, 1, 3))
attn_scores = self.q_matmul_k(q, k) #bs x h x seq_len x seq_len
attn_scores = self.cast(attn_scores, mstype.float32)
attn_scores = self.mul(attn_scores, self.scale)
attn_scores = self.cast(attn_scores, x.dtype)
attn_scores = self.activation(attn_scores)
out = self.attn_matmul_v(attn_scores, v) #bs x h x seq_len x dim_head
out = self.transpose(out, (0, 2, 1, 3))
out = self.reshape(out, (bs*seq_len, h*d))
out = self.to_out(out)
out = self.reshape(out, (bs, seq_len, d_model))
#out = self.dropout(out)
y = self.cast(out, mstype.float32)
y = self.dropout(y)
out = self.cast(y, out.dtype)
#out = self.reshape(out, (bs, seq_len, d_model))
return out
class FeedForward(Cell):
"""FeedForward layer implementation."""
def __init__(self, vit_config):
super().__init__()
d_model = vit_config.configs.d_model
hidden_dim = vit_config.configs.mlp_dim
initialization = vit_config.feedforward_init
activation = vit_config.feedforward_activation
dropout_rate = vit_config.feedforward_dropout_rate
self.ff1 = BatchDense(d_model, hidden_dim, initialization)
self.activation = activation
self.dropout = Dropout(keep_prob=1.-dropout_rate)
self.ff2 = BatchDense(hidden_dim, d_model, initialization)
self.cast = P.Cast()
def construct(self, x):
y = self.ff1(x)
y = self.cast(y, mstype.float32)
y = self.activation(y)
y = self.dropout(y)
y = self.cast(y, x.dtype)
y = self.ff2(y)
y = self.cast(y, mstype.float32)
y = self.dropout(y)
y = self.cast(y, x.dtype)
return y
class Transformer(Cell):
"""Transformer implementation."""
def __init__(self, vit_config):
super().__init__()
depth = vit_config.configs.depth
drop_path_rate = vit_config.body_drop_path_rate
dpr = [x.item() for x in np.linspace(0, drop_path_rate, depth)]
att_seeds = [np.random.randint(1024) for _ in range(depth)]
mlp_seeds = [np.random.randint(1024) for _ in range(depth)]
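# Drop-path probability grows linearly from 0 in the first block to
# body_drop_path_rate in the last one; each DropPath cell gets its own random seed.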
layers = []
for i in range(depth):
normalization = vit_config.body_norm((vit_config.configs.normalized_shape,))
normalization2 = vit_config.body_norm((vit_config.configs.normalized_shape,))
attention = vit_config.attention(vit_config)
feedforward = vit_config.feedforward(vit_config)
if drop_path_rate > 0:
layers.append(
SequentialCell([
ResidualCell(SequentialCell([normalization,
attention,
DropPath(dpr[i], att_seeds[i])])),
ResidualCell(SequentialCell([normalization2,
feedforward,
DropPath(dpr[i], mlp_seeds[i])]))
])
)
else:
layers.append(
SequentialCell([
ResidualCell(SequentialCell([normalization,
attention])),
ResidualCell(SequentialCell([normalization2,
feedforward]))
])
)
self.layers = SequentialCell(layers)
def construct(self, x):
return self.layers(x)
def load_function(func_name):
"""Load function using its name."""
modules = func_name.split(".")
if len(modules) > 1:
module_path = ".".join(modules[:-1])
name = modules[-1]
module = import_module(module_path)
return getattr(module, name)
return func_name
vit_cfg = edict({
'd_model': 768,
'depth': 12,
'heads': 12,
'mlp_dim': 3072,
'dim_head': 64,
'patch_size': 32,
'normalized_shape': 768,
'image_size': 224,
'num_classes': 1001,
})
def vit_base_patch16(args):
"""vit_base_patch16"""
vit_cfg.d_model = 768
vit_cfg.depth = 12
vit_cfg.heads = 12
vit_cfg.mlp_dim = 3072
vit_cfg.dim_head = vit_cfg.d_model // vit_cfg.heads
vit_cfg.patch_size = 16
vit_cfg.normalized_shape = vit_cfg.d_model
vit_cfg.image_size = args.train_image_size
vit_cfg.num_classes = args.class_num
if args.vit_config_path != '':
print("get vit_config_path")
vit_config = load_function(args.vit_config_path)(vit_cfg)
else:
print("get default_vit_cfg")
vit_config = VitConfig(vit_cfg)
model = vit_config.network(vit_config)
return model
def vit_base_patch32(args):
"""vit_base_patch32"""
vit_cfg.d_model = 768
vit_cfg.depth = 12
vit_cfg.heads = 12
vit_cfg.mlp_dim = 3072
vit_cfg.dim_head = vit_cfg.d_model // vit_cfg.heads
vit_cfg.patch_size = 32
vit_cfg.normalized_shape = vit_cfg.d_model
vit_cfg.image_size = args.train_image_size
vit_cfg.num_classes = args.class_num
if args.vit_config_path != '':
print("get vit_config_path")
vit_config = load_function(args.vit_config_path)(vit_cfg)
else:
print("get default_vit_cfg")
vit_config = VitConfig(vit_cfg)
model = vit_config.network(vit_config)
return model
def get_network(backbone_name, args):
"""get_network"""
if backbone_name == 'vit_base_patch32':
backbone = vit_base_patch32(args=args)
elif backbone_name == 'vit_base_patch16':
backbone = vit_base_patch16(args=args)
else:
raise NotImplementedError
return backbone

View File

@ -0,0 +1,242 @@
# Copyright 2021 Huawei Technologies Co., Ltd
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ============================================================================
"""training script"""
import os
import time
import socket
import numpy as np
from mindspore import context
from mindspore import Tensor
from mindspore.train.model import Model, ParallelMode
from mindspore.train.callback import ModelCheckpoint, CheckpointConfig
from mindspore.train.loss_scale_manager import FixedLossScaleManager
from mindspore.communication.management import init
from mindspore.profiler.profiling import Profiler
from mindspore.train.serialization import load_checkpoint
import mindspore.dataset as ds
from src.vit import get_network
from src.dataset import get_dataset
from src.cross_entropy import get_loss
from src.optimizer import get_optimizer
from src.lr_generator import get_lr
from src.eval_engine import get_eval_engine
from src.callback import StateMonitor
from src.logging import get_logger
from src.model_utils.config import config
from src.model_utils.moxing_adapter import moxing_wrapper
try:
os.environ['MINDSPORE_HCCL_CONFIG_PATH'] = os.getenv('RANK_TABLE_FILE')
device_id = int(os.getenv('DEVICE_ID')) # 0 ~ 7
local_rank = int(os.getenv('RANK_ID')) # local_rank
device_num = int(os.getenv('RANK_SIZE')) # world_size
print("distribute training")
except TypeError:
device_id = 0 # 0 ~ 7
local_rank = 0 # local_rank
device_num = 1 # world_size
print("standalone training")
def add_static_args(args):
"""add_static_args"""
args.weight_decay = float(args.weight_decay)
args.eval_engine = 'imagenet'
args.split_point = 0.4
args.poly_power = 2
args.aux_factor = 0.4
args.seed = 1
args.auto_tune = 0
if args.eval_offset < 0:
args.eval_offset = args.max_epoch % args.eval_interval
args.device_id = device_id
args.local_rank = local_rank
args.device_num = device_num
args.dataset_name = 'imagenet'
return args
def modelarts_pre_process():
'''modelarts pre process function.'''
start_t = time.time()
val_file = os.path.join(config.data_path, 'val/imagenet_val.tar')
train_file = os.path.join(config.data_path, 'train/imagenet_train.tar')
tar_files = [val_file, train_file]
print('tar_files:{}'.format(tar_files))
for tar_file in tar_files:
if os.path.exists(tar_file):
t1 = time.time()
tar_dir = os.path.dirname(tar_file)
print('cd {}; tar -xvf {} > /dev/null 2>&1'.format(tar_dir, tar_file))
os.system('cd {}; tar -xvf {} > /dev/null 2>&1'.format(tar_dir, tar_file))
t2 = time.time()
print('uncompress, time used={:.2f}s'.format(t2 - t1))
os.system('cd {}; rm -rf {}'.format(tar_dir, tar_file))
else:
print('file does not exist:', tar_file)
end_t = time.time()
print('tar cost time {:.2f} sec'.format(end_t-start_t))
@moxing_wrapper(pre_process=modelarts_pre_process)
def train_net():
"""train_net"""
args = add_static_args(config)
np.random.seed(args.seed)
args.logger = get_logger(args.save_checkpoint_path, rank=local_rank)
context.set_context(device_id=device_id,
mode=context.GRAPH_MODE,
device_target="Ascend",
save_graphs=False)
if args.auto_tune:
context.set_context(auto_tune_mode='GA')
elif args.device_num == 1:
pass
else:
context.set_auto_parallel_context(device_num=device_num,
parallel_mode=ParallelMode.DATA_PARALLEL,
gradients_mean=True)
if args.open_profiler:
profiler = Profiler(output_path="data_{}".format(local_rank))
# init the distribute env
if not args.auto_tune and args.device_num > 1:
init()
# network
net = get_network(backbone_name=args.backbone, args=args)
# set grad allreduce split point
parameters = [param for param in net.trainable_params()]
parameter_len = len(parameters)
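# Parameters before the split point go to communication-fusion group 1 and the rest
# to group 2, so their gradient all-reduce operations are fused in two separate groups.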
if args.split_point > 0:
print("split_point={}".format(args.split_point))
split_parameter_index = [int(args.split_point*parameter_len),]
parameter_indices = 1
for i in range(parameter_len):
if i in split_parameter_index:
parameter_indices += 1
parameters[i].comm_fusion = parameter_indices
else:
print("warning!!!, no split point")
if os.path.isfile(args.pretrained):
load_checkpoint(args.pretrained, net, strict_load=False)
# loss
if not args.use_label_smooth:
args.label_smooth_factor = 0.0
loss = get_loss(loss_name=args.loss_name, args=args)
# train dataset
epoch_size = args.max_epoch
dataset = get_dataset(dataset_name=args.dataset_name,
do_train=True,
dataset_path=args.dataset_path,
args=args)
ds.config.set_seed(args.seed)
step_size = dataset.get_dataset_size()
args.steps_per_epoch = step_size
# evaluation dataset
eval_dataset = get_dataset(dataset_name=args.dataset_name,
do_train=False,
dataset_path=args.eval_path,
args=args)
# evaluation engine
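# Online evaluation is disabled (empty engine name) when auto-tuning, profiling,
# running on a single device, or when the eval dataset is unavailable.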
if args.auto_tune or args.open_profiler or eval_dataset is None or args.device_num == 1:
args.eval_engine = ''
eval_engine = get_eval_engine(args.eval_engine, net, eval_dataset, args)
# loss scale
loss_scale = FixedLossScaleManager(args.loss_scale, drop_overflow_update=False)
# learning rate
lr_array = get_lr(global_step=0, lr_init=args.lr_init, lr_end=args.lr_min, lr_max=args.lr_max,
warmup_epochs=args.warmup_epochs, total_epochs=epoch_size, steps_per_epoch=step_size,
lr_decay_mode=args.lr_decay_mode, poly_power=args.poly_power)
lr = Tensor(lr_array)
# optimizer, group_params used in grad freeze
opt, _ = get_optimizer(optimizer_name=args.opt,
network=net,
lrs=lr,
args=args)
# model
model = Model(net, loss_fn=loss, optimizer=opt,
metrics=eval_engine.metric, eval_network=eval_engine.eval_network,
loss_scale_manager=loss_scale, amp_level="O3")
eval_engine.set_model(model)
args.logger.save_args(args)
t0 = time.time()
# equal to model._init(dataset, sink_size=step_size)
eval_engine.compile(sink_size=step_size)
t1 = time.time()
args.logger.info('compile time used={:.2f}s'.format(t1 - t0))
# callbacks
state_cb = StateMonitor(data_size=step_size,
tot_batch_size=args.batch_size * device_num,
lrs=lr_array,
eval_interval=args.eval_interval,
eval_offset=args.eval_offset,
eval_engine=eval_engine,
logger=args.logger.info)
cb = [state_cb,]
if args.save_checkpoint and local_rank == 0:
config_ck = CheckpointConfig(save_checkpoint_steps=args.save_checkpoint_epochs*step_size,
keep_checkpoint_max=args.keep_checkpoint_max,
async_save=True)
ckpt_cb = ModelCheckpoint(prefix=args.backbone, directory=args.save_checkpoint_path, config=config_ck)
cb += [ckpt_cb]
t0 = time.time()
model.train(epoch_size, dataset, callbacks=cb, sink_size=step_size)
t1 = time.time()
args.logger.info('training time used={:.2f}s'.format(t1 - t0))
last_metric = 'last_metric[{}]'.format(state_cb.best_acc)
args.logger.info(last_metric)
is_cloud = args.enable_modelarts
if is_cloud:
ip = os.getenv("BATCH_TASK_CURRENT_HOST_IP")
else:
ip = socket.gethostbyname(socket.gethostname())
args.logger.info('ip[{}], mean_fps[{:.2f}]'.format(ip, state_cb.mean_fps))
if args.open_profiler:
profiler.analyse()
if __name__ == '__main__':
train_net()