diff --git a/configs/README.md b/configs/README.md
new file mode 100644
index 0000000..014c4a1
--- /dev/null
+++ b/configs/README.md
@@ -0,0 +1,68 @@
+### File Structure and Naming
+This folder contains the training recipes and model readme files for each model. The folder structure and the naming rules for model configurations are as follows.
+
+
+```
+ ├── configs
+     ├── model_a                          // model name in lowercase with "_" as the separator
+     │   ├── model_a_small_ascend.yaml    // training recipe denoted as {model_name}_{specification}_{hardware}.yaml
+     │   ├── model_a_large_gpu.yaml
+     │   ├── README.md                    // readme file containing performance results and pretrained weight URLs
+     │   └── README_CN.md                 // readme file in Chinese
+     ├── model_b
+     │   ├── model_b_32_ascend.yaml
+     │   ├── model_l_16_ascend.yaml
+     │   ├── README.md
+     │   └── README_CN.md
+     ├── README.md                        // this file
+```
+
+### Model Readme Writing Guideline
+The model readme file in each sub-folder provides the introduction, reproduced results, and running guideline for each model.
+
+Please follow the outline structure and **table format** shown in [densenet/README.md](https://github.com/mindspore-lab/mindcv/blob/main/configs/densenet/README.md) when contributing your models :)
+
+#### Table Format
+
+| Model | Context | Top-1 (%) | Top-5 (%) | Params (M) | Recipe | Download |
+|-------|---------|-----------|-----------|------------|--------|----------|
+
+Illustration:
+- Model: model name in lowercase with "_" as the separator.
+- Context: Training context denoted as {device}x{pieces}-{MS mode}, where mindspore mode can be G - graph mode or F - pynative mode with ms function. For example, D910x8-G is for training on 8 pieces of Ascend 910 NPU using graph mode.
+- Top-1 and Top-5: Accuracy reported on the validation set of ImageNet-1K. Keep 2 digits after the decimal point.
+- Params (M): Number of model parameters in millions (10^6). Keep **2 digits** after the decimal point.
+- Recipe: Training recipe/configuration linked to a yaml config file.
+- Download: URL of the pretrained model weights.
+
+### Model Checkpoint Format
+ The checkpoint (i.e., model weight) name should follow this format: **{model_name}_{specification}-{sha256sum}.ckpt**, e.g., `poolformer_s12-5be5c4e4.ckpt`.
+
+ You can run the following command and take the first 8 characters of the output hash as the sha256sum value in the checkpoint name.
+
+ ```shell
+ sha256sum your_model.ckpt
+ ```
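+
+ For example, a minimal sketch along these lines (illustrative file name; it assumes a standard Unix shell with `sha256sum` and `cut` available) renames a checkpoint into the required format:
+
+ ```shell
+ # compute the sha256 digest of the checkpoint and keep its first 8 characters
+ ckpt="poolformer_s12.ckpt"                 # your trained checkpoint (example name)
+ hash=$(sha256sum "${ckpt}" | cut -c1-8)
+ # rename to {model_name}_{specification}-{sha256sum}.ckpt, e.g. poolformer_s12-5be5c4e4.ckpt
+ mv "${ckpt}" "${ckpt%.ckpt}-${hash}.ckpt"
+ ```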
+
+
+### Training Script Format
+
+For consistency, it is recommended to provide distributed training commands based on `mpirun -n {num_devices} python train.py`, instead of using a shell script such as `distributed_train.sh`.
+
+ ```shell
+ # standalone training on a GPU or Ascend device
+ python train.py --config configs/densenet/densenet_121_gpu.yaml --data_dir /path/to/dataset --distribute False
+
+ # distributed training on GPU or Ascend devices
+ mpirun -n 8 python train.py --config configs/densenet/densenet_121_ascend.yaml --data_dir /path/to/imagenet
+ ```
+ > If the script is executed by the root user, the `--allow-run-as-root` parameter must be added to `mpirun`.
+
+### URL and Hyperlink Format
+Please use **absolute paths** in hyperlinks and URLs when linking to target resources in the readme files and tables.
diff --git a/configs/bit/README.md b/configs/bit/README.md
new file mode 100644
index 0000000..e0283d0
--- /dev/null
+++ b/configs/bit/README.md
@@ -0,0 +1,91 @@
+# BigTransfer
+
+> [Big Transfer (BiT): General Visual Representation Learning](https://arxiv.org/abs/1912.11370)
+
+## Introduction
+
+Transfer of pre-trained representations improves sample efficiency and simplifies hyperparameter tuning when training deep neural networks for vision.
+Big Transfer (BiT) achieves strong performance on more than 20 datasets by combining a few carefully selected components with a simple heuristic
+for transfer. The components that BiT distills for training models that transfer well are: 1) Big datasets: as the size of the dataset increases,
+the optimal performance of the BiT model also increases. 2) Big architectures: in order to make full use of large datasets, a large enough architecture
+is required. 3) Long pre-training time: pre-training on a larger dataset requires more training epochs and training time. 4) GroupNorm and Weight Standardisation:
+BiT uses GroupNorm combined with Weight Standardisation instead of BatchNorm, since BatchNorm performs worse when the number of images on each accelerator is
+too low. 5) Few-shot transfer: with BiT fine-tuning, good performance can be achieved even if there are only a few examples of each class on natural images.[[1, 2](#references)]
+
+
+## Results
+
+Our reproduced model performance on ImageNet-1K is reported as follows.
+
+
+
+#### Notes
+
+- Context: Training context denoted as {device}x{pieces}-{MS mode}, where mindspore mode can be G - graph mode or F - pynative mode with ms function. For example, D910x8-G is for training on 8 pieces of Ascend 910 NPU using graph mode.
+- Top-1 and Top-5: Accuracy reported on the validation set of ImageNet-1K.
+
+## Quick Start
+
+### Preparation
+
+#### Installation
+
+Please refer to the [installation instruction](https://github.com/mindspore-lab/mindcv#installation) in MindCV.
+
+#### Dataset Preparation
+
+Please download the [ImageNet-1K](https://www.image-net.org/challenges/LSVRC/2012/index.php) dataset for model training and validation.
+
+### Training
+
+* Distributed Training
+
+It is easy to reproduce the reported results with the pre-defined training recipe. For distributed training on multiple Ascend 910 devices, please run
+
+```shell
+# distributed training on multiple GPU/Ascend devices
+mpirun -n 8 python train.py --config configs/bit/bit_resnet50_ascend.yaml --data_dir /path/to/imagenet
+```
+> If the script is executed by the root user, the `--allow-run-as-root` parameter must be added to `mpirun`.
+
+Similarly, you can train the model on multiple GPU devices with the above `mpirun` command.
+
+For detailed illustration of all hyper-parameters, please refer to [config.py](https://github.com/mindspore-lab/mindcv/blob/main/config.py).
+
+**Note:** As the global batch size (batch_size x num_devices) is an important hyper-parameter, it is recommended to keep the global batch size unchanged for reproduction or adjust the learning rate linearly to a new global batch size.
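+
+For example, the `bit_resnet50_ascend.yaml` recipe above uses a per-device batch size of 32, so the global batch size on 8 devices is 32 x 8 = 256 with `lr: 0.06`. A rough sketch of the linear scaling rule when training on 4 devices instead (assuming the yaml keys can also be overridden as command-line flags via `config.py`):
+
+```shell
+# global batch size drops from 32 x 8 = 256 to 32 x 4 = 128,
+# so scale the learning rate by 128 / 256 = 0.5: 0.06 -> 0.03
+mpirun -n 4 python train.py --config configs/bit/bit_resnet50_ascend.yaml \
+    --data_dir /path/to/imagenet --batch_size 32 --lr 0.03
+```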
+
+* Standalone Training
+
+If you want to train or finetune the model on a smaller dataset without distributed training, please run:
+
+```shell
+# standalone training on a CPU/GPU/Ascend device
+python train.py --config configs/bit/bit_resnet50_ascend.yaml --data_dir /path/to/dataset --distribute False
+```
+
+### Validation
+
+To validate the accuracy of the trained model, you can use `validate.py` and parse the checkpoint path with `--ckpt_path`.
+
+```shell
+python validate.py -c configs/bit/bit_resnet50_ascend.yaml --data_dir /path/to/imagenet --ckpt_path /path/to/ckpt
+```
+
+### Deployment
+
+Please refer to the [deployment tutorial](https://github.com/mindspore-lab/mindcv/blob/main/tutorials/deployment.md) in MindCV.
+
+## References
+
+
+[1] Kolesnikov A, Beyer L, Zhai X, et al. Big transfer (bit): General visual representation learning[C]//European conference on computer vision. Springer, Cham, 2020: 491-507.
+
+[2] BigTransfer (BiT): State-of-the-art transfer learning for computer vision, https://blog.tensorflow.org/2020/05/bigtransfer-bit-state-of-art-transfer-learning-computer-vision.html
diff --git a/configs/bit/bit_resnet101_ascend.yaml b/configs/bit/bit_resnet101_ascend.yaml
new file mode 100644
index 0000000..3a6abf9
--- /dev/null
+++ b/configs/bit/bit_resnet101_ascend.yaml
@@ -0,0 +1,47 @@
+# system
+mode: 0
+distribute: True
+num_parallel_workers: 8
+val_while_train: True
+
+# dataset
+dataset: 'imagenet'
+data_dir: '/path/to/imagenet'
+shuffle: True
+dataset_download: False
+batch_size: 16
+drop_remainder: True
+
+# augmentation
+image_resize: 224
+hflip: 0.5
+crop_pct: 0.875
+
+# model
+model: 'BiTresnet101'
+num_classes: 1000
+pretrained: False
+ckpt_path: ''
+keep_checkpoint_max: 10
+ckpt_save_dir: './ckpt'
+epoch_size: 90
+dataset_sink_mode: True
+amp_level: 'O0'
+
+# loss
+loss: 'CE'
+label_smoothing: 0.1
+
+# lr scheduler
+scheduler: 'multi_step_decay'
+lr: 0.06
+decay_rate: 0.5
+multi_step_decay_milestones: [30, 40, 50, 60, 70, 80, 85]
+
+
+# optimizer
+opt: 'sgd'
+filter_bias_and_bn: False
+momentum: 0.9
+weight_decay: 0.0001
+loss_scale: 1024
diff --git a/configs/bit/bit_resnet50_ascend.yaml b/configs/bit/bit_resnet50_ascend.yaml
new file mode 100644
index 0000000..75a61ca
--- /dev/null
+++ b/configs/bit/bit_resnet50_ascend.yaml
@@ -0,0 +1,46 @@
+# system
+mode: 0
+distribute: True
+num_parallel_workers: 8
+val_while_train: True
+
+# dataset
+dataset: 'imagenet'
+data_dir: '/path/to/imagenet'
+shuffle: True
+dataset_download: False
+batch_size: 32
+drop_remainder: True
+
+# augmentation
+image_resize: 224
+hflip: 0.5
+crop_pct: 0.875
+
+# model
+model: 'BiTresnet50'
+num_classes: 1000
+pretrained: False
+ckpt_path: ''
+keep_checkpoint_max: 10
+ckpt_save_dir: './ckpt'
+epoch_size: 90
+dataset_sink_mode: True
+amp_level: 'O0'
+
+# loss
+loss: 'CE'
+label_smoothing: 0.1
+
+# lr scheduler
+scheduler: 'multi_step_decay'
+lr: 0.06
+decay_rate: 0.5
+multi_step_decay_milestones: [30, 40, 50, 60, 70, 80, 85]
+
+# optimizer
+opt: 'sgd'
+filter_bias_and_bn: False
+momentum: 0.9
+weight_decay: 0.0001
+loss_scale: 1024
diff --git a/configs/bit/bit_resnet50x3_ascend.yaml b/configs/bit/bit_resnet50x3_ascend.yaml
new file mode 100644
index 0000000..487c8ee
--- /dev/null
+++ b/configs/bit/bit_resnet50x3_ascend.yaml
@@ -0,0 +1,49 @@
+# system
+mode: 0
+distribute: True
+num_parallel_workers: 8
+val_while_train: True
+
+# dataset
+dataset: 'imagenet'
+data_dir: '/path/to/imagenet'
+shuffle: True
+dataset_download: False
+batch_size: 32
+drop_remainder: True
+
+# augmentation
+image_resize: 224
+hflip: 0.5
+mixup: 0.2
+crop_pct: 0.875
+auto_augment: "randaug-m7-mstd0.5"
+
+# model
+model: 'BiTresnet50x3'
+num_classes: 1000
+pretrained: False
+ckpt_path: ''
+keep_checkpoint_max: 30
+ckpt_save_dir: './ckpt'
+epoch_size: 90
+dataset_sink_mode: True
+amp_level: 'O0'
+
+# loss
+loss: 'CE'
+label_smoothing: 0.1
+
+# lr scheduler config
+warmup_epochs: 1
+scheduler: 'multi_step_decay'
+lr: 0.09
+decay_rate: 0.4
+multi_step_decay_milestones: [30, 40, 50, 60, 70, 80, 85]
+
+# optimizer
+opt: 'sgd'
+filter_bias_and_bn: False
+momentum: 0.9
+weight_decay: 0.0001
+loss_scale: 1024
diff --git a/configs/coat/README.md b/configs/coat/README.md
new file mode 100644
index 0000000..76adf0a
--- /dev/null
+++ b/configs/coat/README.md
@@ -0,0 +1,80 @@
+# CoaT
+
+> [Co-Scale Conv-Attentional Image Transformers](https://arxiv.org/abs/2104.06399v2)
+
+## Introduction
+
+Co-Scale Conv-Attentional Image Transformer (CoaT) is a Transformer-based image classifier equipped with co-scale and conv-attentional mechanisms. First, the co-scale mechanism maintains the integrity of Transformers' encoder branches at individual scales, while allowing representations learned at different scales to effectively communicate with each other. Second, the conv-attentional mechanism is designed by realizing a relative position embedding formulation in the factorized attention module with an efficient convolution-like implementation. CoaT empowers image Transformers with enriched multi-scale and contextual modeling capabilities.
+
+## Results
+
+Our reproduced model performance on ImageNet-1K is reported as follows.
+
+
+
+#### Notes
+
+- Context: Training context denoted as {device}x{pieces}-{MS mode}, where mindspore mode can be G - graph mode or F - pynative mode with ms function. For example, D910x8-G is for training on 8 pieces of Ascend 910 NPU using graph mode.
+- Top-1 and Top-5: Accuracy reported on the validation set of ImageNet-1K.
+
+## Quick Start
+
+### Preparation
+
+#### Installation
+Please refer to the [installation instruction](https://github.com/mindspore-lab/mindcv#installation) in MindCV.
+
+#### Dataset Preparation
+Please download the [ImageNet-1K](https://www.image-net.org/challenges/LSVRC/2012/index.php) dataset for model training and validation.
+
+### Training
+
+* Distributed Training
+
+It is easy to reproduce the reported results with the pre-defined training recipe. For distributed training on multiple Ascend 910 devices, please run
+
+```shell
+# distributed training on multiple GPU/Ascend devices
+mpirun -n 8 python train.py --config configs/convnext/convnext_tiny_ascend.yaml --data_dir /path/to/imagenet
+```
+
+> If the script is executed by the root user, the `--allow-run-as-root` parameter must be added to `mpirun`.
+
+Similarly, you can train the model on multiple GPU devices with the above `mpirun` command.
+
+For detailed illustration of all hyper-parameters, please refer to [config.py](https://github.com/mindspore-lab/mindcv/blob/main/config.py).
+
+**Note:** As the global batch size (batch_size x num_devices) is an important hyper-parameter, it is recommended to keep the global batch size unchanged for reproduction or adjust the learning rate linearly to a new global batch size.
+
+* Standalone Training
+
+If you want to train or finetune the model on a smaller dataset without distributed training, please run:
+
+```shell
+# standalone training on a CPU/GPU/Ascend device
+python train.py --config configs/convnext/convnext_tiny_ascend.yaml --data_dir /path/to/dataset --distribute False
+```
+
+### Validation
+
+To validate the accuracy of the trained model, you can use `validate.py` and parse the checkpoint path with `--ckpt_path`.
+
+```shell
+python validate.py -c configs/convnext/convnext_tiny_ascend.yaml --data_dir /path/to/imagenet --ckpt_path /path/to/ckpt
+```
+
+### Deployment
+
+Please refer to the [deployment tutorial](https://github.com/mindspore-lab/mindcv/blob/main/tutorials/deployment.md) in MindCV.
+
+## References
+
+[1] Xu W, Xu Y, Chang T, et al. Co-scale conv-attentional image transformers[C]//Proceedings of the IEEE/CVF International Conference on Computer Vision. 2021: 9981-9990.
diff --git a/configs/convnext/convnext_base_ascend.yaml b/configs/convnext/convnext_base_ascend.yaml
new file mode 100644
index 0000000..48b4065
--- /dev/null
+++ b/configs/convnext/convnext_base_ascend.yaml
@@ -0,0 +1,58 @@
+# system
+mode: 0
+distribute: True
+num_parallel_workers: 8
+val_while_train: True
+
+# dataset
+dataset: 'imagenet'
+data_dir: '/path/to/imagenet'
+shuffle: True
+dataset_download: False
+batch_size: 16
+drop_remainder: True
+
+# augmentation
+image_resize: 224
+scale: [0.08, 1.0]
+ratio: [0.75, 1.333]
+re_value: 'random'
+hflip: 0.5
+interpolation: 'bicubic'
+auto_augment: 'randaug-m9-mstd0.5-inc1'
+re_prob: 0.25
+crop_pct: 0.95
+mixup: 0.8
+cutmix: 1.0
+
+# model
+model: 'convnext_base'
+num_classes: 1000
+pretrained: False
+ckpt_path: ''
+keep_checkpoint_max: 10
+ckpt_save_dir: './ckpt'
+epoch_size: 450
+drop_path_rate: 0.5
+dataset_sink_mode: True
+amp_level: 'O2'
+
+# loss
+loss: 'ce'
+label_smoothing: 0.1
+
+# lr scheduler
+scheduler: 'cosine_decay'
+lr: 0.002
+min_lr: 0.0000003
+decay_epochs: 430
+warmup_factor: 0.0000175
+warmup_epochs: 20
+
+# optimizer
+opt: 'adamw'
+filter_bias_and_bn: True
+momentum: 0.9
+weight_decay: 0.05
+loss_scale_type: 'auto'
+use_nesterov: False
diff --git a/configs/convnext/convnext_small_ascend.yaml b/configs/convnext/convnext_small_ascend.yaml
new file mode 100644
index 0000000..09c4f4c
--- /dev/null
+++ b/configs/convnext/convnext_small_ascend.yaml
@@ -0,0 +1,58 @@
+# system
+mode: 0
+distribute: True
+num_parallel_workers: 8
+val_while_train: True
+
+# dataset
+dataset: 'imagenet'
+data_dir: '/path/to/imagenet'
+shuffle: True
+dataset_download: False
+batch_size: 16
+drop_remainder: True
+
+# augmentation
+image_resize: 224
+scale: [0.08, 1.0]
+ratio: [0.75, 1.333]
+re_value: 'random'
+hflip: 0.5
+interpolation: 'bicubic'
+auto_augment: 'randaug-m9-mstd0.5-inc1'
+re_prob: 0.25
+crop_pct: 0.95
+mixup: 0.8
+cutmix: 1.0
+
+# model
+model: 'convnext_small'
+num_classes: 1000
+pretrained: False
+ckpt_path: ''
+keep_checkpoint_max: 10
+ckpt_save_dir: './ckpt'
+epoch_size: 450
+drop_path_rate: 0.4
+dataset_sink_mode: True
+amp_level: 'O2'
+
+# loss
+loss: 'ce'
+label_smoothing: 0.1
+
+# lr scheduler
+scheduler: 'cosine_decay'
+lr: 0.002
+min_lr: 0.0000003
+decay_epochs: 430
+warmup_factor: 0.0000175
+warmup_epochs: 20
+
+# optimizer
+opt: 'adamw'
+filter_bias_and_bn: True
+momentum: 0.9
+weight_decay: 0.05
+loss_scale_type: 'auto'
+use_nesterov: False
diff --git a/configs/convnext/convnext_tiny_ascend.yaml b/configs/convnext/convnext_tiny_ascend.yaml
new file mode 100644
index 0000000..359cbcb
--- /dev/null
+++ b/configs/convnext/convnext_tiny_ascend.yaml
@@ -0,0 +1,59 @@
+# system
+mode: 0
+distribute: True
+num_parallel_workers: 16
+val_while_train: True
+
+# dataset
+dataset: 'imagenet'
+data_dir: '/path/to/imagenet'
+shuffle: True
+dataset_download: False
+batch_size: 16
+drop_remainder: True
+
+# augmentation
+image_resize: 224
+scale: [0.08, 1.0]
+ratio: [0.75, 1.333]
+re_value: 'random'
+hflip: 0.5
+interpolation: 'bicubic'
+auto_augment: 'randaug-m9-mstd0.5-inc1'
+re_prob: 0.25
+crop_pct: 0.95
+mixup: 0.8
+cutmix: 1.0
+
+# model
+model: 'convnext_tiny'
+num_classes: 1000
+pretrained: False
+ckpt_path: ''
+keep_checkpoint_max: 10
+ckpt_save_dir: './ckpt'
+epoch_size: 450
+drop_path_rate: 0.1
+dataset_sink_mode: True
+amp_level: 'O2'
+
+# loss
+loss: 'ce'
+label_smoothing: 0.1
+
+# lr scheduler
+scheduler: 'cosine_decay'
+lr: 0.002
+min_lr: 0.0000003
+decay_epochs: 430
+warmup_factor: 0.0000175
+warmup_epochs: 20
+
+# optimizer
+opt: 'adamw'
+filter_bias_and_bn: True
+momentum: 0.9
+weight_decay: 0.05
+loss_scale_type: 'dynamic'
+drop_overflow_update: True
+use_nesterov: False
diff --git a/configs/crossvit/README.md b/configs/crossvit/README.md
new file mode 100644
index 0000000..3708d5c
--- /dev/null
+++ b/configs/crossvit/README.md
@@ -0,0 +1,90 @@
+# CrossViT
+> [CrossViT: Cross-Attention Multi-Scale Vision Transformer for Image Classification](https://arxiv.org/abs/2103.14899)
+
+## Introduction
+
+CrossViT is a type of vision transformer that uses a dual-branch architecture to extract multi-scale feature representations for image classification. The architecture combines image patches (i.e. tokens in a transformer) of different sizes to produce stronger visual features for image classification. It processes small and large patch tokens with two separate branches of different computational complexities and these tokens are fused together multiple times to complement each other.
+
+Fusion is achieved by an efficient cross-attention module, in which each transformer branch creates a non-patch token as an agent to exchange information with the other branch by attention. This allows for linear-time generation of the attention map in fusion instead of quadratic time otherwise.[[1](#references)]
+
+## Results
+
+Our reproduced model performance on ImageNet-1K is reported as follows.
+
+#### Notes
+
+- Context: Training context denoted as {device}x{pieces}-{MS mode}, where mindspore mode can be G - graph mode or F - pynative mode with ms function. For example, D910x8-G is for training on 8 pieces of Ascend 910 NPU using graph mode.
+- Top-1 and Top-5: Accuracy reported on the validation set of ImageNet-1K.
+
+## Quick Start
+
+### Preparation
+
+#### Installation
+Please refer to the [installation instruction](https://github.com/mindspore-lab/mindcv#installation) in MindCV.
+
+#### Dataset Preparation
+Please download the [ImageNet-1K](https://www.image-net.org/challenges/LSVRC/2012/index.php) dataset for model training and validation.
+
+### Training
+
+* Distributed Training
+
+It is easy to reproduce the reported results with the pre-defined training recipe. For distributed training on multiple Ascend 910 devices, please run
+
+```shell
+# distributed training on multiple GPU/Ascend devices
+mpirun -n 8 python train.py --config configs/crossvit/crossvit_15_ascend.yaml --data_dir /path/to/imagenet
+```
+> If the script is executed by the root user, the `--allow-run-as-root` parameter must be added to `mpirun`.
+
+Similarly, you can train the model on multiple GPU devices with the above `mpirun` command.
+
+For detailed illustration of all hyper-parameters, please refer to [config.py](https://github.com/mindspore-lab/mindcv/blob/main/config.py).
+
+**Note:** As the global batch size (batch_size x num_devices) is an important hyper-parameter, it is recommended to keep the global batch size unchanged for reproduction or adjust the learning rate linearly to a new global batch size.
+
+* Standalone Training
+
+If you want to train or finetune the model on a smaller dataset without distributed training, please run:
+
+```shell
+# standalone training on a CPU/GPU/Ascend device
+python train.py --config configs/crossvit/crossvit_15_ascend.yaml --data_dir /path/to/dataset --distribute False
+```
+
+### Validation
+
+To validate the accuracy of the trained model, you can use `validate.py` and parse the checkpoint path with `--ckpt_path`.
+
+```shell
+python validate.py -c configs/crossvit/crossvit_15_ascend.yaml --data_dir /path/to/imagenet --ckpt_path /path/to/ckpt
+```
+
+### Deployment
+
+Please refer to the [deployment tutorial](https://github.com/mindspore-lab/mindcv/blob/main/tutorials/deployment.md) in MindCV.
+
+## References
+
+
+[1] Chun-Fu Chen, Quanfu Fan, Rameswar Panda. CrossViT: Cross-Attention Multi-Scale Vision Transformer for Image Classification
diff --git a/configs/crossvit/crossvit_15_ascend.yaml b/configs/crossvit/crossvit_15_ascend.yaml
new file mode 100644
index 0000000..739be06
--- /dev/null
+++ b/configs/crossvit/crossvit_15_ascend.yaml
@@ -0,0 +1,65 @@
+# system
+mode: 0
+distribute: True
+num_parallel_workers: 8
+val_while_train: True
+
+# dataset
+dataset: 'imagenet'
+data_dir: '/path/to/imagenet'
+shuffle: True
+dataset_download: False
+batch_size: 320
+drop_remainder: True
+
+# augmentation
+image_resize: 224
+scale: [ 0.08, 1.0 ]
+ratio: [ 0.75, 1.333 ]
+hflip: 0.5
+vflip: 0.
+interpolation: 'bicubic'
+auto_augment: randaug-m9-mstd0.5-inc1
+re_prob: 0.25
+mixup: 0.8
+cutmix: 1.0
+color_jitter: 0.4
+crop_pct: 0.935
+ema: True
+
+# model
+model: 'crossvit15'
+num_classes: 1000
+pretrained: False
+ckpt_path: ''
+keep_checkpoint_max: 30
+ckpt_save_dir: './ckpt'
+epoch_size: 600
+dataset_sink_mode: True
+amp_level: 'O3'
+drop_path_rate: 0.1
+
+# loss
+loss: 'CE'
+label_smoothing: 0.1
+
+# lr scheduler
+scheduler: 'warmup_cosine_decay'
+lr: 0.0009
+min_lr: 0.00001
+warmup_epochs: 30
+decay_epochs: 270
+decay_rate: 0.1
+num_cycles: 2
+cycle_decay: 1
+
+# optimizer
+opt: 'adamw'
+weight_decay: 0.05
+filter_bias_and_bn: True
+loss_scale: 512
+use_nesterov: False
+eps: 1e-8
+
+# Scheduler parameters
+lr_epoch_stair: True
diff --git a/configs/crossvit/crossvit_18_ascend.yaml b/configs/crossvit/crossvit_18_ascend.yaml
new file mode 100644
index 0000000..ba874d5
--- /dev/null
+++ b/configs/crossvit/crossvit_18_ascend.yaml
@@ -0,0 +1,63 @@
+# system
+mode: 0
+distribute: True
+num_parallel_workers: 8
+val_while_train: True
+
+# dataset
+dataset: 'imagenet'
+data_dir: '/path/to/imagenet'
+shuffle: True
+dataset_download: False
+batch_size: 256
+drop_remainder: True
+
+# augmentation
+image_resize: 224
+scale: [ 0.08, 1.0 ]
+ratio: [ 0.75, 1.333 ]
+hflip: 0.5
+vflip: 0.
+interpolation: 'bicubic'
+auto_augment: randaug-m9-mstd0.5-inc1
+re_prob: 0.25
+mixup: 0.8
+cutmix: 1.0
+color_jitter: 0.4
+crop_pct: 0.935
+ema: True
+
+# model
+model: 'crossvit18'
+num_classes: 1000
+pretrained: False
+ckpt_path: ''
+keep_checkpoint_max: 10
+ckpt_save_dir: './ckpt'
+epoch_size: 300
+dataset_sink_mode: True
+amp_level: 'O3'
+drop_path_rate: 0.1
+
+# loss
+loss: 'CE'
+label_smoothing: 0.1
+
+# lr scheduler
+scheduler: 'warmup_cosine_decay'
+lr: 0.004
+min_lr: 0.00001
+warmup_epochs: 30
+decay_epochs: 270
+decay_rate: 0.1
+
+# optimizer
+opt: 'adamw'
+weight_decay: 0.05
+filter_bias_and_bn: True
+loss_scale: 1024
+use_nesterov: False
+eps: 1e-8
+
+# Scheduler parameters
+lr_epoch_stair: True
diff --git a/configs/crossvit/crossvit_9_ascend.yaml b/configs/crossvit/crossvit_9_ascend.yaml
new file mode 100644
index 0000000..612dc5b
--- /dev/null
+++ b/configs/crossvit/crossvit_9_ascend.yaml
@@ -0,0 +1,63 @@
+# system
+mode: 0
+distribute: True
+num_parallel_workers: 8
+val_while_train: True
+
+# dataset
+dataset: 'imagenet'
+data_dir: '/path/to/imagenet'
+shuffle: True
+dataset_download: False
+batch_size: 256
+drop_remainder: True
+
+# augmentation
+image_resize: 240
+scale: [0.08, 1.0]
+ratio: [0.75, 1.333]
+hflip: 0.5
+vflip: 0.
+interpolation: 'bicubic'
+auto_augment: 'randaug-m9-mstd0.5-inc1'
+re_prob: 0.25
+mixup: 0.8
+cutmix: 1.0
+color_jitter: 0.4
+crop_pct: 0.935
+
+# model
+model: 'crossvit9'
+num_classes: 1000
+pretrained: False
+ckpt_path: ''
+keep_checkpoint_max: 10
+ckpt_save_dir: './ckpt'
+epoch_size: 300
+dataset_sink_mode: True
+amp_level: 'O2'
+drop_path_rate: 0.1
+
+# loss
+loss: 'CE'
+label_smoothing: 0.1
+
+# lr scheduler
+scheduler: 'cosine_decay'
+lr: 0.0011
+min_lr: 0.00001
+warmup_epochs: 30
+decay_epochs: 270
+decay_rate: 0.1
+
+# optimizer
+opt: 'adamw'
+weight_decay: 0.05
+filter_bias_and_bn: True
+loss_scale_type: 'dynamic'
+drop_overflow_update: True
+use_nesterov: False
+eps: 1e-8
+
+# Scheduler parameters
+lr_epoch_stair: True
diff --git a/configs/densenet/README.md b/configs/densenet/README.md
new file mode 100644
index 0000000..5a7666a
--- /dev/null
+++ b/configs/densenet/README.md
@@ -0,0 +1,107 @@
+# DenseNet
+
+> [Densely Connected Convolutional Networks](https://arxiv.org/abs/1608.06993)
+
+## Introduction
+
+
+
+Recent work has shown that convolutional networks can be substantially deeper, more accurate, and more efficient to train if
+they contain shorter connections between layers close to the input and those close to the output. Dense Convolutional
+Network (DenseNet) is introduced based on this observation, which connects each layer to every other layer in a
+feed-forward fashion. Whereas traditional convolutional networks with $L$ layers have $L$ connections-one between each
+layer and its subsequent layer, DenseNet has $\frac{L(L+1)}{2}$ direct connections. For each layer, the feature maps
+of all preceding layers are used as inputs, and their feature maps are used as inputs into all subsequent layers.
+DenseNets have several compelling advantages: they alleviate the vanishing-gradient problem, strengthen feature
+propagation, encourage feature reuse, and substantially reduce the number of parameters.[[1](#references)]
+
+## Results
+
+Our reproduced model performance on ImageNet-1K is reported as follows.
+
+#### Notes
+
+- Context: Training context denoted as {device}x{pieces}-{MS mode}, where mindspore mode can be G - graph mode or F - pynative mode with ms function. For example, D910x8-G is for training on 8 pieces of Ascend 910 NPU using graph mode.
+- Top-1 and Top-5: Accuracy reported on the validation set of ImageNet-1K.
+
+## Quick Start
+
+### Preparation
+
+#### Installation
+Please refer to the [installation instruction](https://github.com/mindspore-lab/mindcv#installation) in MindCV.
+
+#### Dataset Preparation
+Please download the [ImageNet-1K](https://www.image-net.org/challenges/LSVRC/2012/index.php) dataset for model training and validation.
+
+### Training
+
+
+* Distributed Training
+
+It is easy to reproduce the reported results with the pre-defined training recipe. For distributed training on multiple Ascend 910 devices, please run
+
+```shell
+# distributed training on multiple GPU/Ascend devices
+mpirun -n 8 python train.py --config configs/densenet/densenet_121_ascend.yaml --data_dir /path/to/imagenet
+```
+> If the script is executed by the root user, the `--allow-run-as-root` parameter must be added to `mpirun`.
+
+Similarly, you can train the model on multiple GPU devices with the above `mpirun` command.
+
+For detailed illustration of all hyper-parameters, please refer to [config.py](https://github.com/mindspore-lab/mindcv/blob/main/config.py).
+
+**Note:** As the global batch size (batch_size x num_devices) is an important hyper-parameter, it is recommended to keep the global batch size unchanged for reproduction or adjust the learning rate linearly to a new global batch size.
+
+* Standalone Training
+
+If you want to train or finetune the model on a smaller dataset without distributed training, please run:
+
+```shell
+# standalone training on a CPU/GPU/Ascend device
+python train.py --config configs/densenet/densenet_121_ascend.yaml --data_dir /path/to/dataset --distribute False
+```
+
+### Validation
+
+To validate the accuracy of the trained model, you can use `validate.py` and parse the checkpoint path with `--ckpt_path`.
+
+```shell
+python validate.py -c configs/densenet/densenet_121_ascend.yaml --data_dir /path/to/imagenet --ckpt_path /path/to/ckpt
+```
+
+### Deployment
+
+Please refer to the [deployment tutorial](https://github.com/mindspore-lab/mindcv/blob/main/tutorials/deployment.md) in MindCV.
+
+## References
+
+
+[1] Huang G, Liu Z, Van Der Maaten L, et al. Densely connected convolutional networks[C]//Proceedings of the IEEE conference on computer vision and pattern recognition. 2017: 4700-4708.
diff --git a/configs/efficientnet/efficientnet_b0_ascend.yaml b/configs/efficientnet/efficientnet_b0_ascend.yaml
new file mode 100644
index 0000000..d941073
--- /dev/null
+++ b/configs/efficientnet/efficientnet_b0_ascend.yaml
@@ -0,0 +1,55 @@
+# system
+mode: 0
+distribute: True
+num_parallel_workers: 8
+val_while_train: True
+
+# dataset
+dataset: 'imagenet'
+data_dir: '/path/to/imagenet'
+shuffle: True
+dataset_download: False
+batch_size: 128
+drop_remainder: True
+
+# augmentation
+image_resize: 224
+scale: [0.08, 1.0]
+ratio: [0.75, 1.333]
+hflip: 0.5
+interpolation: 'bicubic'
+crop_pct: 0.875
+color_jitter: [0.4, 0.4, 0.4]
+auto_augment: 'autoaug'
+
+# model
+model: 'efficientnet_b0'
+num_classes: 1000
+pretrained: False
+ckpt_path: ''
+keep_checkpoint_max: 10
+ckpt_save_dir: './ckpt'
+epoch_size: 450
+dataset_sink_mode: True
+amp_level: 'O2'
+
+# loss
+loss: 'CE'
+label_smoothing: 0.1
+
+# lr scheduler
+scheduler: 'cosine_decay'
+min_lr: 1e-10
+lr: 0.128
+warmup_epochs: 5
+decay_epochs: 445
+
+# optimizer
+opt: 'rmsprop'
+filter_bias_and_bn: True
+momentum: 0.9
+weight_decay: 1e-5
+loss_scale_type: 'dynamic'
+drop_overflow_update: True
+use_nesterov: False
+eps: 1e-3
diff --git a/configs/googlenet/README.md b/configs/googlenet/README.md
new file mode 100644
index 0000000..133585a
--- /dev/null
+++ b/configs/googlenet/README.md
@@ -0,0 +1,89 @@
+# GoogLeNet
+> [GoogLeNet: Going Deeper with Convolutions](https://arxiv.org/abs/1409.4842)
+
+## Introduction
+
+GoogLeNet is a deep learning architecture proposed by Christian Szegedy et al. in 2014. Prior to this, AlexNet, VGG and other
+architectures achieved better training results by increasing the depth (number of layers) of the network, but increasing the
+number of layers brings many negative effects, such as overfitting, vanishing gradients, and exploding gradients. The Inception
+module improves training results from another perspective: it uses computing resources more efficiently and extracts more
+features under the same amount of computation, thereby improving the training results.[[1](#references)]
+
+## Results
+
+Our reproduced model performance on ImageNet-1K is reported as follows.
+
+#### Notes
+
+- Context: Training context denoted as {device}x{pieces}-{MS mode}, where mindspore mode can be G - graph mode or F - pynative mode with ms function. For example, D910x8-G is for training on 8 pieces of Ascend 910 NPU using graph mode.
+- Top-1 and Top-5: Accuracy reported on the validation set of ImageNet-1K.
+
+## Quick Start
+
+### Preparation
+
+#### Installation
+Please refer to the [installation instruction](https://github.com/mindspore-lab/mindcv#installation) in MindCV.
+
+#### Dataset Preparation
+Please download the [ImageNet-1K](https://www.image-net.org/challenges/LSVRC/2012/index.php) dataset for model training and validation.
+
+### Training
+
+* Distributed Training
+
+It is easy to reproduce the reported results with the pre-defined training recipe. For distributed training on multiple Ascend 910 devices, please run
+
+```shell
+# distributed training on multiple GPU/Ascend devices
+mpirun -n 8 python train.py --config configs/googlenet/googlenet_ascend.yaml --data_dir /path/to/imagenet
+```
+
+> If the script is executed by the root user, the `--allow-run-as-root` parameter must be added to `mpirun`.
+
+Similarly, you can train the model on multiple GPU devices with the above `mpirun` command.
+
+For detailed illustration of all hyper-parameters, please refer to [config.py](https://github.com/mindspore-lab/mindcv/blob/main/config.py).
+
+**Note:** As the global batch size (batch_size x num_devices) is an important hyper-parameter, it is recommended to keep the global batch size unchanged for reproduction or adjust the learning rate linearly to a new global batch size.
+
+* Standalone Training
+
+If you want to train or finetune the model on a smaller dataset without distributed training, please run:
+
+```shell
+# standalone training on a CPU/GPU/Ascend device
+python train.py --config configs/googlenet/googlenet_ascend.yaml --data_dir /path/to/dataset --distribute False
+```
+
+### Validation
+
+To validate the accuracy of the trained model, you can use `validate.py` and parse the checkpoint path with `--ckpt_path`.
+
+```shell
+python validate.py -c configs/googlenet/googlenet_ascend.yaml --data_dir /path/to/imagenet --ckpt_path /path/to/ckpt
+```
+
+### Deployment
+
+Please refer to the [deployment tutorial](https://github.com/mindspore-lab/mindcv/blob/main/tutorials/deployment.md) in MindCV.
+
+## References
+
+[1] Szegedy C, Liu W, Jia Y, et al. Going deeper with convolutions[C]//Proceedings of the IEEE conference on computer vision and pattern recognition. 2015: 1-9.
diff --git a/configs/googlenet/googlenet_ascend.yaml b/configs/googlenet/googlenet_ascend.yaml
new file mode 100644
index 0000000..ed7416a
--- /dev/null
+++ b/configs/googlenet/googlenet_ascend.yaml
@@ -0,0 +1,51 @@
+# system
+mode: 0
+distribute: True
+num_parallel_workers: 8
+val_while_train: True
+
+# dataset
+dataset: 'imagenet'
+data_dir: '/path/to/imagenet'
+shuffle: True
+dataset_download: False
+batch_size: 32
+drop_remainder: True
+
+# augmentation
+image_resize: 224
+scale: [0.08, 1.0]
+ratio: [0.75, 1.333]
+hflip: 0.5
+interpolation: 'bilinear'
+crop_pct: 0.875
+
+# model
+model: 'googlenet'
+num_classes: 1000
+pretrained: False
+ckpt_path: ''
+keep_checkpoint_max: 10
+ckpt_save_dir: './ckpt'
+epoch_size: 150
+dataset_sink_mode: True
+amp_level: 'O0'
+
+# loss
+loss: 'CE'
+label_smoothing: 0.1
+
+# lr scheduler
+scheduler: 'cosine_decay'
+lr: 0.045
+min_lr: 0.0
+decay_epochs: 145
+warmup_epochs: 5
+
+# optimizer
+opt: 'momentum'
+filter_bias_and_bn: True
+momentum: 0.9
+weight_decay: 0.00004
+loss_scale: 1024
+use_nesterov: False
diff --git a/configs/hrnet/README.md b/configs/hrnet/README.md
new file mode 100644
index 0000000..d6d2cee
--- /dev/null
+++ b/configs/hrnet/README.md
@@ -0,0 +1,100 @@
+# HRNet
+
+
+> [Deep High-Resolution Representation Learning for Visual Recognition](https://arxiv.org/abs/1908.07919)
+
+## Introduction
+
+
+High-resolution representations are essential for position-sensitive vision problems, such as human pose estimation, semantic segmentation, and object detection. Existing state-of-the-art frameworks first encode the input image as a low-resolution representation through a subnetwork that is formed by connecting high-to-low resolution convolutions (e.g., ResNet, VGGNet), and then recover the high-resolution representation from the encoded low-resolution representation. Instead, the proposed network, named High-Resolution Network (HRNet), maintains high-resolution representations through the whole process. There are two key characteristics: (i) connect the high-to-low resolution convolution streams in parallel; (ii) repeatedly exchange the information across resolutions. The benefit is that the resulting representation is semantically richer and spatially more precise. Experiments show the superiority of the proposed HRNet in a wide range of applications, including human pose estimation, semantic segmentation, and object detection, suggesting that HRNet is a stronger backbone for computer vision problems.[[1](#references)]
+
+
+## Results
+
+Our reproduced model performance on ImageNet-1K is reported as follows.
+
+#### Notes
+- Context: Training context denoted as {device}x{pieces}-{MS mode}, where mindspore mode can be G - graph mode or F - pynative mode with ms function. For example, D910x8-G is for training on 8 pieces of Ascend 910 NPU using graph mode.
+- Top-1 and Top-5: Accuracy reported on the validation set of ImageNet-1K.
+
+
+## Quick Start
+### Preparation
+
+#### Installation
+Please refer to the [installation instruction](https://github.com/mindspore-lab/mindcv#installation) in MindCV.
+
+#### Dataset Preparation
+Please download the [ImageNet-1K](https://www.image-net.org/challenges/LSVRC/2012/index.php) dataset for model training and validation.
+
+### Training
+
+
+* Distributed Training
+
+It is easy to reproduce the reported results with the pre-defined training recipe. For distributed training on multiple Ascend 910 devices, please run
+
+```shell
+# distributed training on multiple GPU/Ascend devices
+mpirun -n 8 python train.py --config configs/hrnet/hrnet_w32_ascend.yaml --data_dir /path/to/imagenet
+```
+> If the script is executed by the root user, the `--allow-run-as-root` parameter must be added to `mpirun`.
+
+Similarly, you can train the model on multiple GPU devices with the above `mpirun` command.
+
+For detailed illustration of all hyper-parameters, please refer to [config.py](https://github.com/mindspore-lab/mindcv/blob/main/config.py).
+
+**Note:** As the global batch size (batch_size x num_devices) is an important hyper-parameter, it is recommended to keep the global batch size unchanged for reproduction or adjust the learning rate linearly to a new global batch size.
+
+* Standalone Training
+
+If you want to train or finetune the model on a smaller dataset without distributed training, please run:
+
+```shell
+# standalone training on a CPU/GPU/Ascend device
+python train.py --config configs/hrnet/hrnet_w32_ascend.yaml --data_dir /path/to/dataset --distribute False
+```
+
+### Validation
+
+To validate the accuracy of the trained model, you can use `validate.py` and parse the checkpoint path with `--ckpt_path`.
+
+```shell
+python validate.py -c configs/hrnet/hrnet_w32_ascend.yaml --data_dir /path/to/imagenet --ckpt_path /path/to/ckpt
+```
+
+### Deployment
+
+To deploy online inference services with the trained model efficiently, please refer to the [deployment tutorial](https://github.com/mindspore-lab/mindcv/blob/main/tutorials/deployment.md).
+
+## References
+
+
+[1] Jingdong Wang, Ke Sun, Tianheng Cheng, et al. Deep High-Resolution Representation Learning for Visual Recognition[J]. arXiv preprint arXiv:1908.07919, 2019.
diff --git a/configs/hrnet/hrnet_w32_ascend.yaml b/configs/hrnet/hrnet_w32_ascend.yaml
new file mode 100644
index 0000000..2e7f3f6
--- /dev/null
+++ b/configs/hrnet/hrnet_w32_ascend.yaml
@@ -0,0 +1,55 @@
+# system
+mode: 0
+distribute: True
+num_parallel_workers: 8
+val_while_train: True
+val_interval: 1
+
+# dataset
+dataset: "imagenet"
+data_dir: "/path/to/imagenet"
+shuffle: True
+dataset_download: False
+batch_size: 128
+drop_remainder: True
+
+# augmentation
+image_resize: 224
+scale: [0.08, 1.0]
+ratio: [0.75, 1.333]
+hflip: 0.5
+interpolation: "bilinear"
+auto_augment: "randaug-m7-mstd0.5"
+re_prob: 0.1
+mixup: 0.2
+cutmix: 1.0
+cutmix_prob: 1.0
+
+# model
+model: "hrnet_w32"
+num_classes: 1000
+pretrained: False
+ckpt_path: ""
+keep_checkpoint_max: 5
+ckpt_save_policy: "top_k"
+ckpt_save_dir: "./ckpt"
+epoch_size: 300
+dataset_sink_mode: True
+amp_level: "O2"
+
+# loss
+loss: "CE"
+label_smoothing: 0.1
+
+# lr scheduler
+scheduler: "cosine_decay"
+min_lr: 0.00001
+lr: 0.001
+warmup_epochs: 20
+decay_epochs: 280
+
+# optimizer
+opt: 'adamw'
+weight_decay: 0.05
+loss_scale: 1024
+filter_bias_and_bn: True
diff --git a/configs/hrnet/hrnet_w48_ascend.yaml b/configs/hrnet/hrnet_w48_ascend.yaml
new file mode 100644
index 0000000..69486cb
--- /dev/null
+++ b/configs/hrnet/hrnet_w48_ascend.yaml
@@ -0,0 +1,55 @@
+# system
+mode: 0
+distribute: True
+num_parallel_workers: 8
+val_while_train: True
+val_interval: 1
+
+# dataset
+dataset: "imagenet"
+data_dir: "/path/to/imagenet"
+shuffle: True
+dataset_download: False
+batch_size: 128
+drop_remainder: True
+
+# augmentation
+image_resize: 224
+scale: [0.08, 1.0]
+ratio: [0.75, 1.333]
+hflip: 0.5
+interpolation: "bilinear"
+auto_augment: "randaug-m7-mstd0.5"
+re_prob: 0.1
+mixup: 0.2
+cutmix: 1.0
+cutmix_prob: 1.0
+
+# model
+model: "hrnet_w48"
+num_classes: 1000
+pretrained: False
+ckpt_path: ""
+keep_checkpoint_max: 5
+ckpt_save_policy: "top_k"
+ckpt_save_dir: "./ckpt"
+epoch_size: 300
+dataset_sink_mode: True
+amp_level: "O2"
+
+# loss
+loss: "CE"
+label_smoothing: 0.1
+
+# lr scheduler
+scheduler: "cosine_decay"
+min_lr: 0.00001
+lr: 0.001
+warmup_epochs: 20
+decay_epochs: 280
+
+# optimizer
+opt: 'adamw'
+weight_decay: 0.05
+loss_scale: 1024
+filter_bias_and_bn: True
diff --git a/configs/inception_v3/README.md b/configs/inception_v3/README.md
new file mode 100644
index 0000000..1b0ad22
--- /dev/null
+++ b/configs/inception_v3/README.md
@@ -0,0 +1,90 @@
+# InceptionV3
+> [InceptionV3: Rethinking the Inception Architecture for Computer Vision](https://arxiv.org/pdf/1512.00567.pdf)
+
+## Introduction
+
+InceptionV3 is an upgraded version of GoogLeNet. One of the most important improvements of V3 is factorization, which
+decomposes a 7x7 convolution into two one-dimensional convolutions (1x7, 7x1), and likewise a 3x3 convolution into (1x3, 3x1).
+This both accelerates the computation (the spare computing power can be used to deepen the network) and splits one conv into
+two convs, which further increases the network depth and the nonlinearity of the network. It is also worth noting that the
+network input size changed from 224x224 to 299x299, and the 35x35/17x17/8x8 modules are designed more precisely. In addition,
+V3 adds batch normalization, which makes the model converge more quickly, plays a partial regularization role, and effectively
+reduces overfitting.[[1](#references)]
+
+## Results
+
+Our reproduced model performance on ImageNet-1K is reported as follows.
+
+#### Notes
+
+- Context: Training context denoted as {device}x{pieces}-{MS mode}, where mindspore mode can be G - graph mode or F - pynative mode with ms function. For example, D910x8-G is for training on 8 pieces of Ascend 910 NPU using graph mode.
+- Top-1 and Top-5: Accuracy reported on the validation set of ImageNet-1K.
+
+## Quick Start
+
+### Preparation
+
+#### Installation
+Please refer to the [installation instruction](https://github.com/mindspore-lab/mindcv#installation) in MindCV.
+
+#### Dataset Preparation
+Please download the [ImageNet-1K](https://www.image-net.org/challenges/LSVRC/2012/index.php) dataset for model training and validation.
+
+### Training
+
+* Distributed Training
+
+It is easy to reproduce the reported results with the pre-defined training recipe. For distributed training on multiple Ascend 910 devices, please run
+
+```shell
+# distributed training on multiple GPU/Ascend devices
+mpirun -n 8 python train.py --config configs/inception_v3/inception_v3_ascend.yaml --data_dir /path/to/imagenet
+```
+
+> If the script is executed by the root user, the `--allow-run-as-root` parameter must be added to `mpirun`.
+
+Similarly, you can train the model on multiple GPU devices with the above `mpirun` command.
+
+For detailed illustration of all hyper-parameters, please refer to [config.py](https://github.com/mindspore-lab/mindcv/blob/main/config.py).
+
+**Note:** As the global batch size (batch_size x num_devices) is an important hyper-parameter, it is recommended to keep the global batch size unchanged for reproduction or adjust the learning rate linearly to a new global batch size.
+
+* Standalone Training
+
+If you want to train or finetune the model on a smaller dataset without distributed training, please run:
+
+```shell
+# standalone training on a CPU/GPU/Ascend device
+python train.py --config configs/inception_v3/inception_v3_ascend.yaml --data_dir /path/to/dataset --distribute False
+```
+
+### Validation
+
+To validate the accuracy of the trained model, you can use `validate.py` and parse the checkpoint path with `--ckpt_path`.
+
+```shell
+python validate.py -c configs/inception_v3/inception_v3_ascend.yaml --data_dir /path/to/imagenet --ckpt_path /path/to/ckpt
+```
+
+### Deployment
+
+Please refer to the [deployment tutorial](https://github.com/mindspore-lab/mindcv/blob/main/tutorials/deployment.md) in MindCV.
+
+## References
+
+[1] Szegedy C, Vanhoucke V, Ioffe S, et al. Rethinking the inception architecture for computer vision[C]//Proceedings of the IEEE conference on computer vision and pattern recognition. 2016: 2818-2826.
diff --git a/configs/inception_v3/inception_v3_ascend.yaml b/configs/inception_v3/inception_v3_ascend.yaml
new file mode 100644
index 0000000..ec5da46
--- /dev/null
+++ b/configs/inception_v3/inception_v3_ascend.yaml
@@ -0,0 +1,54 @@
+# system
+mode: 0
+distribute: True
+num_parallel_workers: 8
+val_while_train: True
+
+# dataset
+dataset: 'imagenet'
+data_dir: '/path/to/imagenet'
+shuffle: True
+dataset_download: False
+batch_size: 32
+drop_remainder: True
+
+# augmentation
+image_resize: 299
+scale: [0.08, 1.0]
+ratio: [0.75, 1.333]
+hflip: 0.5
+interpolation: 'bilinear'
+auto_augment: 'autoaug'
+re_prob: 0.25
+crop_pct: 0.875
+
+# model
+model: 'inception_v3'
+num_classes: 1000
+pretrained: False
+ckpt_path: ''
+keep_checkpoint_max: 10
+ckpt_save_dir: './ckpt'
+epoch_size: 200
+dataset_sink_mode: True
+amp_level: 'O0'
+aux_factor: 0.1
+
+# loss config
+loss: 'CE'
+label_smoothing: 0.1
+
+# lr scheduler
+scheduler: 'warmup_cosine_decay'
+lr: 0.045
+min_lr: 0.0
+decay_epochs: 195
+warmup_epochs: 5
+
+# optimizer
+opt: 'momentum'
+filter_bias_and_bn: True
+momentum: 0.9
+weight_decay: 0.00004
+loss_scale: 1024
+use_nesterov: False
diff --git a/configs/inception_v4/README.md b/configs/inception_v4/README.md
new file mode 100644
index 0000000..32e2b88
--- /dev/null
+++ b/configs/inception_v4/README.md
@@ -0,0 +1,87 @@
+# InceptionV4
+> [InceptionV4: Inception-v4, Inception-ResNet and the Impact of Residual Connections on Learning](https://arxiv.org/pdf/1602.07261.pdf)
+
+## Introduction
+
+InceptionV4 studies whether combining the Inception module with residual connections can bring improvements. It is found that the
+residual structure of ResNet greatly accelerates training and also improves performance. Based on this, an Inception-ResNet v2
+network is obtained, and a deeper and more optimized Inception v4 model is also designed, which achieves performance comparable
+to Inception-ResNet v2.[[1](#references)]
+
+## Results
+
+Our reproduced model performance on ImageNet-1K is reported as follows.
+
+#### Notes
+
+- Context: Training context denoted as {device}x{pieces}-{MS mode}, where mindspore mode can be G - graph mode or F - pynative mode with ms function. For example, D910x8-G is for training on 8 pieces of Ascend 910 NPU using graph mode.
+- Top-1 and Top-5: Accuracy reported on the validation set of ImageNet-1K.
+
+## Quick Start
+
+### Preparation
+
+#### Installation
+Please refer to the [installation instruction](https://github.com/mindspore-lab/mindcv#installation) in MindCV.
+
+#### Dataset Preparation
+Please download the [ImageNet-1K](https://www.image-net.org/challenges/LSVRC/2012/index.php) dataset for model training and validation.
+
+### Training
+
+* Distributed Training
+
+It is easy to reproduce the reported results with the pre-defined training recipe. For distributed training on multiple Ascend 910 devices, please run
+
+```shell
+# distributed training on multiple GPU/Ascend devices
+mpirun -n 8 python train.py --config configs/inception_v4/inception_v4_ascend.yaml --data_dir /path/to/imagenet
+```
+
+> If the script is executed by the root user, the `--allow-run-as-root` parameter must be added to `mpirun`.
+
+Similarly, you can train the model on multiple GPU devices with the above `mpirun` command.
+
+For detailed illustration of all hyper-parameters, please refer to [config.py](https://github.com/mindspore-lab/mindcv/blob/main/config.py).
+
+**Note:** As the global batch size (batch_size x num_devices) is an important hyper-parameter, it is recommended to keep the global batch size unchanged for reproduction or adjust the learning rate linearly to a new global batch size.
+
+* Standalone Training
+
+If you want to train or finetune the model on a smaller dataset without distributed training, please run:
+
+```shell
+# standalone training on a CPU/GPU/Ascend device
+python train.py --config configs/inception_v4/inception_v4_ascend.yaml --data_dir /path/to/dataset --distribute False
+```
+
+### Validation
+
+To validate the accuracy of the trained model, you can use `validate.py` and parse the checkpoint path with `--ckpt_path`.
+
+```shell
+python validate.py -c configs/inception_v4/inception_v4_ascend.yaml --data_dir /path/to/imagenet --ckpt_path /path/to/ckpt
+```
+
+### Deployment
+
+Please refer to the [deployment tutorial](https://github.com/mindspore-lab/mindcv/blob/main/tutorials/deployment.md) in MindCV.
+
+## References
+
+[1] Szegedy C, Ioffe S, Vanhoucke V, et al. Inception-v4, inception-resnet and the impact of residual connections on learning[C]//Thirty-first AAAI conference on artificial intelligence. 2017.
diff --git a/configs/inception_v4/inception_v4_ascend.yaml b/configs/inception_v4/inception_v4_ascend.yaml
new file mode 100644
index 0000000..758a939
--- /dev/null
+++ b/configs/inception_v4/inception_v4_ascend.yaml
@@ -0,0 +1,53 @@
+# system
+mode: 0
+distribute: True
+num_parallel_workers: 8
+val_while_train: True
+
+# dataset
+dataset: 'imagenet'
+data_dir: '/path/to/imagenet'
+shuffle: True
+dataset_download: False
+batch_size: 32
+drop_remainder: True
+
+# augmentation
+image_resize: 299
+scale: [0.08, 1.0]
+ratio: [0.75, 1.333]
+hflip: 0.5
+interpolation: 'bilinear'
+auto_augment: 'autoaug'
+re_prob: 0.25
+crop_pct: 0.875
+
+# model
+model: 'inception_v4'
+num_classes: 1000
+pretrained: False
+ckpt_path: ''
+keep_checkpoint_max: 10
+ckpt_save_dir: './ckpt'
+epoch_size: 200
+dataset_sink_mode: True
+amp_level: 'O2'
+
+# loss
+loss: 'CE'
+label_smoothing: 0.1
+
+# lr scheduler
+scheduler: 'warmup_cosine_decay'
+lr: 0.045
+min_lr: 0.0
+decay_epochs: 195
+warmup_epochs: 5
+
+# optimizer
+opt: 'momentum'
+filter_bias_and_bn: True
+momentum: 0.9
+weight_decay: 0.00004
+loss_scale: 1024
+use_nesterov: False
diff --git a/configs/mixnet/README.md b/configs/mixnet/README.md
new file mode 100644
index 0000000..bd9b005
--- /dev/null
+++ b/configs/mixnet/README.md
@@ -0,0 +1,91 @@
+# MixNet
+> [MixConv: Mixed Depthwise Convolutional Kernels](https://arxiv.org/abs/1907.09595)
+
+## Introduction
+
+Depthwise convolution is becoming increasingly popular in modern efficient ConvNets, but its kernel size is often
+overlooked. In this paper, the authors systematically study the impact of different kernel sizes, and observe that
+combining the benefits of multiple kernel sizes can lead to better accuracy and efficiency. Based on this observation,
+the authors propose a new mixed depthwise convolution (MixConv), which naturally mixes up multiple kernel sizes in a
+single convolution. As a simple drop-in replacement for vanilla depthwise convolution, MixConv improves the accuracy
+and efficiency of existing MobileNets on both ImageNet classification and COCO object detection.[[1](#references)]
+
+## Results
+
+Our reproduced model performance on ImageNet-1K is reported as follows.
+
+#### Notes
+
+- Context: Training context denoted as {device}x{pieces}-{MS mode}, where mindspore mode can be G - graph mode or F - pynative mode with ms function. For example, D910x8-G is for training on 8 pieces of Ascend 910 NPU using graph mode.
+- Top-1 and Top-5: Accuracy reported on the validation set of ImageNet-1K.
+
+## Quick Start
+
+### Preparation
+
+#### Installation
+Please refer to the [installation instruction](https://github.com/mindspore-lab/mindcv#installation) in MindCV.
+
+#### Dataset Preparation
+Please download the [ImageNet-1K](https://www.image-net.org/challenges/LSVRC/2012/index.php) dataset for model training and validation.
+
+### Training
+
+* Distributed Training
+
+It is easy to reproduce the reported results with the pre-defined training recipe. For distributed training on multiple Ascend 910 devices, please run
+
+```shell
+# distributed training on multiple GPU/Ascend devices
+mpirun -n 8 python train.py --config configs/mixnet/mixnet_s_ascend.yaml --data_dir /path/to/imagenet
+```
+
+> If the script is executed by the root user, the `--allow-run-as-root` parameter must be added to `mpirun`.
+
+Similarly, you can train the model on multiple GPU devices with the above `mpirun` command.
+
+For detailed illustration of all hyper-parameters, please refer to [config.py](https://github.com/mindspore-lab/mindcv/blob/main/config.py).
+
+**Note:** As the global batch size (batch_size x num_devices) is an important hyper-parameter, it is recommended to keep the global batch size unchanged for reproduction or adjust the learning rate linearly to a new global batch size.
+
+* Standalone Training
+
+If you want to train or finetune the model on a smaller dataset without distributed training, please run:
+
+```shell
+# standalone training on a CPU/GPU/Ascend device
+python train.py --config configs/mixnet/mixnet_s_ascend.yaml --data_dir /path/to/dataset --distribute False
+```
+
+### Validation
+
+To validate the accuracy of the trained model, you can use `validate.py` and parse the checkpoint path with `--ckpt_path`.
+
+```shell
+python validate.py -c configs/mixnet/mixnet_s_ascend.yaml --data_dir /path/to/imagenet --ckpt_path /path/to/ckpt
+```
+
+### Deployment
+
+Please refer to the [deployment tutorial](https://github.com/mindspore-lab/mindcv/blob/main/tutorials/deployment.md) in MindCV.
+
+## References
+
+[1] Tan M, Le Q V. Mixconv: Mixed depthwise convolutional kernels[J]. arXiv preprint arXiv:1907.09595, 2019.
diff --git a/configs/mixnet/mixnet_l_ascend.yaml b/configs/mixnet/mixnet_l_ascend.yaml
new file mode 100644
index 0000000..e1ee07d
--- /dev/null
+++ b/configs/mixnet/mixnet_l_ascend.yaml
@@ -0,0 +1,57 @@
+# system
+mode: 0
+distribute: True
+num_parallel_workers: 16
+val_while_train: True
+
+# dataset
+dataset: "imagenet"
+data_dir: "/path/to/imagenet"
+shuffle: True
+dataset_download: False
+batch_size: 128
+drop_remainder: True
+
+# augmentation
+image_resize: 224
+scale: [0.08, 1.0]
+ratio: [0.75, 1.333]
+hflip: 0.5
+interpolation: "bicubic"
+auto_augment: "randaug-m9-mstd0.5-inc1"
+re_prob: 0.25
+crop_pct: 0.875
+mixup: 0.2
+cutmix: 1.0
+
+# model
+model: "mixnet_l"
+num_classes: 1000
+pretrained: False
+ckpt_path: ""
+keep_checkpoint_max: 10
+ckpt_save_dir: "./ckpt"
+epoch_size: 600
+dataset_sink_mode: True
+amp_level: "O3"
+
+# loss
+loss: "CE"
+label_smoothing: 0.1
+
+# lr scheduler
+scheduler: "cosine_decay"
+lr: 0.25
+min_lr: 0.00001
+decay_epochs: 580
+warmup_epochs: 20
+
+# optimizer
+opt: "momentum"
+filter_bias_and_bn: True
+momentum: 0.9
+weight_decay: 0.00002
+loss_scale_type: "dynamic"
+drop_overflow_update: True
+loss_scale: 16777216
+use_nesterov: False
diff --git a/configs/mixnet/mixnet_m_ascend.yaml b/configs/mixnet/mixnet_m_ascend.yaml
new file mode 100644
index 0000000..cb6d28a
--- /dev/null
+++ b/configs/mixnet/mixnet_m_ascend.yaml
@@ -0,0 +1,55 @@
+# system
+mode: 0
+distribute: True
+num_parallel_workers: 8
+val_while_train: True
+
+# dataset
+dataset: "imagenet"
+data_dir: "/path/to/imagenet"
+shuffle: True
+dataset_download: False
+batch_size: 128
+drop_remainder: True
+
+# augmentation
+image_resize: 224
+scale: [0.08, 1.0]
+ratio: [0.75, 1.333]
+hflip: 0.5
+interpolation: "bicubic"
+auto_augment: "randaug-m9-mstd0.5"
+re_prob: 0.25
+crop_pct: 0.875
+mixup: 0.2
+cutmix: 1.0
+
+# model
+model: "mixnet_m"
+num_classes: 1000
+pretrained: False
+ckpt_path: ""
+keep_checkpoint_max: 10
+ckpt_save_dir: "./ckpt"
+epoch_size: 600
+dataset_sink_mode: True
+amp_level: "O3"
+
+# loss
+loss: "CE"
+label_smoothing: 0.1
+
+# lr scheduler
+scheduler: "cosine_decay"
+lr: 0.2
+min_lr: 0.00001
+decay_epochs: 585
+warmup_epochs: 15
+
+# optimizer
+opt: "momentum"
+filter_bias_and_bn: True
+momentum: 0.9
+weight_decay: 0.00002
+loss_scale: 1024
+use_nesterov: False
diff --git a/configs/mixnet/mixnet_s_ascend.yaml b/configs/mixnet/mixnet_s_ascend.yaml
new file mode 100644
index 0000000..7034d23
--- /dev/null
+++ b/configs/mixnet/mixnet_s_ascend.yaml
@@ -0,0 +1,55 @@
+# system
+mode: 0
+distribute: True
+num_parallel_workers: 8
+val_while_train: True
+
+# dataset
+dataset: "imagenet"
+data_dir: "/path/to/imagenet"
+shuffle: True
+dataset_download: False
+batch_size: 128
+drop_remainder: True
+
+# augmentation
+image_resize: 224
+scale: [0.08, 1.0]
+ratio: [0.75, 1.333]
+hflip: 0.5
+interpolation: "bicubic"
+auto_augment: "randaug-m9-mstd0.5"
+re_prob: 0.25
+crop_pct: 0.875
+mixup: 0.2
+cutmix: 1.0
+
+# model
+model: "mixnet_s"
+num_classes: 1000
+pretrained: False
+ckpt_path: ""
+keep_checkpoint_max: 10
+ckpt_save_dir: "./ckpt"
+epoch_size: 600
+dataset_sink_mode: True
+amp_level: "O3"
+
+# loss
+loss: "CE"
+label_smoothing: 0.1
+
+# lr scheduler
+scheduler: "cosine_decay"
+lr: 0.2
+min_lr: 0.00001
+decay_epochs: 585
+warmup_epochs: 15
+
+# optimizer
+opt: "momentum"
+filter_bias_and_bn: True
+momentum: 0.9
+weight_decay: 0.00002
+loss_scale: 256
+use_nesterov: False
diff --git a/configs/mnasnet/README.md b/configs/mnasnet/README.md
new file mode 100644
index 0000000..cf9d9d8
--- /dev/null
+++ b/configs/mnasnet/README.md
@@ -0,0 +1,86 @@
+# MnasNet
+> [MnasNet: Platform-Aware Neural Architecture Search for Mobile](https://arxiv.org/abs/1807.11626)
+
+## Introduction
+
+Designing convolutional neural networks (CNNs) for mobile devices is challenging because mobile models need to be small and fast, yet still accurate. Although significant efforts have been dedicated to designing and improving mobile CNNs on all dimensions, it is very difficult to manually balance these trade-offs when there are so many architectural possibilities to consider. In this paper, the authors propose an automated mobile neural architecture search (MNAS) approach, which explicitly incorporates model latency into the main objective so that the search can identify a model that achieves a good trade-off between accuracy and latency. Unlike previous work, where latency is considered via another, often inaccurate proxy (e.g., FLOPs), their approach directly measures real-world inference latency by executing the model on mobile phones. To further strike the right balance between flexibility and search space size, the authors propose a novel factorized hierarchical search space that encourages layer diversity throughout the network.[[1](#references)]
+
+
+
+#### Notes
+
+- Context: Training context denoted as {device}x{pieces}-{MS mode}, where mindspore mode can be G - graph mode or F - pynative mode with ms function. For example, D910x8-G is for training on 8 pieces of Ascend 910 NPU using graph mode.
+- Top-1 and Top-5: Accuracy reported on the validation set of ImageNet-1K.
+
+## Quick Start
+
+### Preparation
+
+#### Installation
+Please refer to the [installation instruction](https://github.com/mindspore-ecosystem/mindcv#installation) in MindCV.
+
+#### Dataset Preparation
+Please download the [ImageNet-1K](https://www.image-net.org/challenges/LSVRC/2012/index.php) dataset for model training and validation.
+
+### Training
+
+* Distributed Training
+
+It is easy to reproduce the reported results with the pre-defined training recipe. For distributed training on multiple Ascend 910 devices, please run
+
+```shell
+# distributed training on multiple GPU/Ascend devices
+mpirun -n 8 python train.py --config configs/mobilenetv2/mobilenet_v2_0.75_ascend.yaml --data_dir /path/to/imagenet
+```
+
+> If the script is executed by the root user, the `--allow-run-as-root` parameter must be added to `mpirun`.
+
+Similarly, you can train the model on multiple GPU devices with the above `mpirun` command.
+
+For detailed illustration of all hyper-parameters, please refer to [config.py](https://github.com/mindspore-lab/mindcv/blob/main/config.py).
+
+**Note:** As the global batch size (batch_size x num_devices) is an important hyper-parameter, it is recommended to keep the global batch size unchanged for reproduction or adjust the learning rate linearly to a new global batch size.
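+
+For illustration, a minimal sketch of this rule using the values from the recipe referenced above (`batch_size: 256` per device, `lr: 0.3` for 8 devices); the yaml file is assumed to be edited by hand before launching:
+
+```shell
+# 8 devices: global batch size = 256 x 8 = 2048 with lr: 0.3, as in the recipe.
+# On 4 devices, either set batch_size: 512 in the yaml to keep the global batch size at 2048,
+# or keep batch_size: 256 and scale the learning rate linearly to lr: 0.15, then launch:
+mpirun -n 4 python train.py --config configs/mobilenetv2/mobilenet_v2_0.75_ascend.yaml --data_dir /path/to/imagenet
+```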
+
+* Standalone Training
+
+If you want to train or finetune the model on a smaller dataset without distributed training, please run:
+
+```shell
+# standalone training on a CPU/GPU/Ascend device
+python train.py --config configs/mobilenetv2/mobilenet_v2_0.75_ascend.yaml --data_dir /path/to/dataset --distribute False
+```
+
+### Validation
+
+To validate the accuracy of the trained model, you can use `validate.py` and pass the checkpoint path with `--ckpt_path`.
+
+```shell
+python validate.py -c configs/mobilenetv2/mobilenet_v2_0.75_ascend.yaml --data_dir /path/to/imagenet --ckpt_path /path/to/ckpt
+```
+
+### Deployment
+
+Please refer to the [deployment tutorial](https://github.com/mindspore-lab/mindcv/blob/main/tutorials/deployment.md) in MindCV.
+
+## References
+
+[1] Sandler M, Howard A, Zhu M, et al. Mobilenetv2: Inverted residuals and linear bottlenecks[C]//Proceedings of the IEEE conference on computer vision and pattern recognition. 2018: 4510-4520.
diff --git a/configs/mobilenetv2/mobilenet_v2_0.75_ascend.yaml b/configs/mobilenetv2/mobilenet_v2_0.75_ascend.yaml
new file mode 100644
index 0000000..32fb95a
--- /dev/null
+++ b/configs/mobilenetv2/mobilenet_v2_0.75_ascend.yaml
@@ -0,0 +1,51 @@
+# system
+mode: 0
+distribute: True
+num_parallel_workers: 32
+
+# dataset
+dataset: 'imagenet'
+data_dir: '/path/to/imagenet'
+shuffle: True
+dataset_download: False
+batch_size: 256
+drop_remainder: True
+train_split: 'train'
+
+# augmentation
+image_resize: 224
+scale: [0.08, 1.0]
+ratio: [0.75, 1.333]
+hflip: 0.5
+interpolation: 'bilinear'
+crop_pct: 0.875
+
+# model
+model: 'mobilenet_v2_075_224'
+num_classes: 1000
+pretrained: False
+ckpt_path: ''
+keep_checkpoint_max: 50
+ckpt_save_dir: './ckpt'
+epoch_size: 400
+dataset_sink_mode: True
+amp_level: 'O0'
+
+# loss
+loss: 'CE'
+label_smoothing: 0.1
+
+# lr scheduler
+scheduler: 'cosine_decay'
+min_lr: 0.0
+lr: 0.3
+warmup_epochs: 4
+decay_epochs: 396
+
+# optimizer
+opt: 'momentum'
+filter_bias_and_bn: True
+momentum: 0.9
+weight_decay: 0.00003
+loss_scale: 1024
+use_nesterov: False
diff --git a/configs/mobilenetv2/mobilenet_v2_1.0_ascend.yaml b/configs/mobilenetv2/mobilenet_v2_1.0_ascend.yaml
new file mode 100644
index 0000000..4e2f613
--- /dev/null
+++ b/configs/mobilenetv2/mobilenet_v2_1.0_ascend.yaml
@@ -0,0 +1,51 @@
+# system
+mode: 0
+distribute: True
+num_parallel_workers: 32
+
+# dataset
+dataset: 'imagenet'
+data_dir: '/path/to/imagenet'
+shuffle: True
+dataset_download: False
+batch_size: 256
+drop_remainder: True
+train_split: 'train'
+
+# augmentation
+image_resize: 224
+scale: [0.08, 1.0]
+ratio: [0.75, 1.333]
+hflip: 0.5
+interpolation: 'bilinear'
+crop_pct: 0.875
+
+# model
+model: 'mobilenet_v2_100_224'
+num_classes: 1000
+pretrained: False
+ckpt_path: ''
+keep_checkpoint_max: 100
+ckpt_save_dir: './ckpt'
+epoch_size: 300
+dataset_sink_mode: True
+amp_level: 'O0'
+
+# loss
+loss: 'CE'
+label_smoothing: 0.1
+
+# lr scheduler
+scheduler: 'cosine_decay'
+min_lr: 0.0
+lr: 0.4
+warmup_epochs: 4
+decay_epochs: 296
+
+# optimizer
+opt: 'momentum'
+filter_bias_and_bn: True
+momentum: 0.9
+weight_decay: 0.00004
+loss_scale: 1024
+use_nesterov: False
diff --git a/configs/mobilenetv2/mobilenet_v2_1.4_ascend.yaml b/configs/mobilenetv2/mobilenet_v2_1.4_ascend.yaml
new file mode 100644
index 0000000..f31e462
--- /dev/null
+++ b/configs/mobilenetv2/mobilenet_v2_1.4_ascend.yaml
@@ -0,0 +1,51 @@
+# system
+mode: 0
+distribute: True
+num_parallel_workers: 32
+
+# dataset
+dataset: 'imagenet'
+data_dir: '/path/to/imagenet'
+shuffle: True
+dataset_download: False
+batch_size: 256
+drop_remainder: True
+train_split: 'train'
+
+# augmentation
+image_resize: 224
+scale: [0.08, 1.0]
+ratio: [0.75, 1.333]
+hflip: 0.5
+interpolation: 'bilinear'
+crop_pct: 0.875
+
+# model
+model: 'mobilenet_v2_140_224'
+num_classes: 1000
+pretrained: False
+ckpt_path: ''
+keep_checkpoint_max: 50
+ckpt_save_dir: './ckpt'
+epoch_size: 300
+dataset_sink_mode: True
+amp_level: 'O0'
+
+# loss
+loss: 'CE'
+label_smoothing: 0.1
+
+# lr scheduler
+scheduler: 'cosine_decay'
+min_lr: 0.0
+lr: 0.4
+warmup_epochs: 4
+decay_epochs: 296
+
+# optimizer
+opt: 'momentum'
+filter_bias_and_bn: False
+momentum: 0.9
+weight_decay: 0.00004
+loss_scale: 1024
+use_nesterov: False
diff --git a/configs/mobilenetv3/README.md b/configs/mobilenetv3/README.md
new file mode 100644
index 0000000..ff07af2
--- /dev/null
+++ b/configs/mobilenetv3/README.md
@@ -0,0 +1,87 @@
+# MobileNetV3
+> [Searching for MobileNetV3](https://arxiv.org/abs/1905.02244)
+
+## Introduction
+
+MobileNetV3, published in 2019, combines the depthwise separable convolutions of MobileNetV1, the inverted residuals and linear bottlenecks of MobileNetV2, and the SE (squeeze-and-excitation) module, and uses NAS (Neural Architecture Search) to determine the configuration and parameters of the network. MobileNetV3 first uses MnasNet to perform a coarse structure search, with reinforcement learning selecting the optimal configuration from a set of discrete choices. It then fine-tunes the architecture with NetAdapt, which complements the search by trimming under-utilized activation channels with only a small accuracy drop.
+
+MobileNetV3 offers two versions, MobileNetV3-Large and MobileNetV3-Small, for situations with different resource requirements. The paper reports that, on the ImageNet classification task, MobileNetV3-Small achieves about 3.2% higher accuracy with about 15% less latency than MobileNetV2, while MobileNetV3-Large achieves about 4.6% higher accuracy with about 5% less latency. On COCO, MobileNetV3-Large reaches the same accuracy as MobileNetV2 while being about 25% faster, and a similar improvement is also observed for segmentation.[[1](#references)]
+
+
+
+#### Notes
+
+- Context: Training context denoted as {device}x{pieces}-{MS mode}, where mindspore mode can be G - graph mode or F - pynative mode with ms function. For example, D910x8-G is for training on 8 pieces of Ascend 910 NPU using graph mode.
+- Top-1 and Top-5: Accuracy reported on the validation set of ImageNet-1K.
+
+## Quick Start
+
+### Preparation
+
+#### Installation
+Please refer to the [installation instruction](https://github.com/mindspore-ecosystem/mindcv#installation) in MindCV.
+
+#### Dataset Preparation
+Please download the [ImageNet-1K](https://www.image-net.org/challenges/LSVRC/2012/index.php) dataset for model training and validation.
+
+### Training
+
+* Distributed Training
+
+It is easy to reproduce the reported results with the pre-defined training recipe. For distributed training on multiple Ascend 910 devices, please run
+
+```shell
+# distributed training on multiple GPU/Ascend devices
+mpirun -n 8 python train.py --config configs/mobilenetv3/mobilenet_v3_small_ascend.yaml --data_dir /path/to/imagenet
+```
+
+> If the script is executed by the root user, the `--allow-run-as-root` parameter must be added to `mpirun`.
+
+Similarly, you can train the model on multiple GPU devices with the above `mpirun` command.
+
+For detailed illustration of all hyper-parameters, please refer to [config.py](https://github.com/mindspore-lab/mindcv/blob/main/config.py).
+
+**Note:** As the global batch size (batch_size x num_devices) is an important hyper-parameter, it is recommended to keep the global batch size unchanged for reproduction or adjust the learning rate linearly to a new global batch size.
+
+* Standalone Training
+
+If you want to train or finetune the model on a smaller dataset without distributed training, please run:
+
+```shell
+# standalone training on a CPU/GPU/Ascend device
+python train.py --config configs/mobilenetv3/mobilenet_v3_small_ascend.yaml --data_dir /path/to/dataset --distribute False
+```
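+
+For the finetuning case, a hedged sketch: copy the recipe, adjust the keys that already appear in the provided yaml files (e.g., `num_classes`, `pretrained`, `epoch_size`) to match your task, and point `--data_dir` at your data. The config file name below is hypothetical.
+
+```shell
+# finetuning sketch: my_mobilenet_v3_finetune.yaml is assumed to be a hand-edited copy of the recipe above,
+# with num_classes set for the target dataset and pretrained set to True (keys taken from the yaml recipe).
+python train.py --config configs/mobilenetv3/my_mobilenet_v3_finetune.yaml --data_dir /path/to/your_dataset --distribute False
+```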
+
+### Validation
+
+To validate the accuracy of the trained model, you can use `validate.py` and pass the checkpoint path with `--ckpt_path`.
+
+```shell
+python validate.py -c configs/mobilenetv3/mobilenet_v3_small_ascend.yaml --data_dir /path/to/imagenet --ckpt_path /path/to/ckpt
+```
+
+### Deployment
+
+Please refer to the [deployment tutorial](https://github.com/mindspore-lab/mindcv/blob/main/tutorials/deployment.md) in MindCV.
+
+## References
+
+[1] Howard A, Sandler M, Chu G, et al. Searching for mobilenetv3[C]//Proceedings of the IEEE/CVF international conference on computer vision. 2019: 1314-1324.
diff --git a/configs/mobilenetv3/mobilenet_v3_large_ascend.yaml b/configs/mobilenetv3/mobilenet_v3_large_ascend.yaml
new file mode 100644
index 0000000..cc99885
--- /dev/null
+++ b/configs/mobilenetv3/mobilenet_v3_large_ascend.yaml
@@ -0,0 +1,51 @@
+# system
+mode: 0
+distribute: True
+num_parallel_workers: 8
+
+# dataset
+dataset: 'imagenet'
+data_dir: '/path/to/imagenet'
+shuffle: True
+dataset_download: False
+batch_size: 75
+drop_remainder: True
+train_split: 'train'
+
+# augmentation
+image_resize: 224
+scale: [0.08, 1.0]
+ratio: [0.75, 1.333]
+hflip: 0.5
+interpolation: 'bilinear'
+crop_pct: 0.875
+
+# model
+model: 'mobilenet_v3_large_100'
+num_classes: 1000
+pretrained: False
+ckpt_path: ''
+keep_checkpoint_max: 30
+ckpt_save_dir: './ckpt'
+epoch_size: 420
+dataset_sink_mode: True
+amp_level: 'O0'
+
+# loss config
+loss: 'CE'
+label_smoothing: 0.1
+
+# lr scheduler
+scheduler: 'cosine_decay'
+min_lr: 0.0
+lr: 1.08
+warmup_epochs: 4
+decay_epochs: 416
+
+# optimizer
+opt: 'momentum'
+filter_bias_and_bn: False
+momentum: 0.9
+weight_decay: 0.00004
+loss_scale: 1024
+use_nesterov: False
diff --git a/configs/mobilenetv3/mobilenet_v3_small_ascend.yaml b/configs/mobilenetv3/mobilenet_v3_small_ascend.yaml
new file mode 100644
index 0000000..75d802a
--- /dev/null
+++ b/configs/mobilenetv3/mobilenet_v3_small_ascend.yaml
@@ -0,0 +1,52 @@
+# system
+mode: 0
+distribute: True
+num_parallel_workers: 8
+
+# dataset
+dataset: 'imagenet'
+data_dir: '/path/to/imagenet'
+shuffle: True
+dataset_download: False
+batch_size: 75
+drop_remainder: True
+train_split: 'train'
+
+# augmentation
+image_resize: 224
+scale: [0.08, 1.0]
+ratio: [0.75, 1.333]
+hflip: 0.5
+color_jitter: 0.4
+interpolation: 'bilinear'
+crop_pct: 0.875
+
+# model
+model: 'mobilenet_v3_small_100'
+num_classes: 1000
+pretrained: False
+ckpt_path: ""
+keep_checkpoint_max: 30
+ckpt_save_dir: './ckpt'
+epoch_size: 470
+dataset_sink_mode: True
+amp_level: 'O0'
+
+# loss
+loss: 'CE'
+label_smoothing: 0.1
+
+# lr scheduler
+scheduler: 'cosine_decay'
+min_lr: 0.0
+lr: 0.77
+warmup_epochs: 4
+decay_epochs: 466
+
+# optimizer
+opt: 'momentum'
+filter_bias_and_bn: False
+momentum: 0.9
+weight_decay: 0.00004
+loss_scale: 1024
+use_nesterov: False
diff --git a/configs/mobilevit/README.md b/configs/mobilevit/README.md
new file mode 100644
index 0000000..2cf586d
--- /dev/null
+++ b/configs/mobilevit/README.md
@@ -0,0 +1,81 @@
+# MobileViT
+> [MobileViT:Light-weight, General-purpose, and Mobile-friendly Vision Transformer](https://arxiv.org/pdf/2110.02178.pdf)
+
+## Introduction
+
+MobileViT is a light-weight, general-purpose vision transformer for mobile devices. MobileViT presents a different perspective for the global processing of information with transformers, i.e., transformers as convolutions. MobileViT significantly outperforms CNN- and ViT-based networks across different tasks and datasets. On the ImageNet-1K dataset, MobileViT achieves a top-1 accuracy of 78.4% with about 6 million parameters, which is 3.2% and 6.2% more accurate than MobileNetV3 (CNN-based) and DeiT (ViT-based), respectively, for a similar number of parameters. On the MS-COCO object detection task, MobileViT is 5.7% more accurate than MobileNetV3 for a similar number of parameters.
+
+
+
+#### Notes
+
+- Context: Training context denoted as {device}x{pieces}-{MS mode}, where mindspore mode can be G - graph mode or F - pynative mode with ms function. For example, D910x8-G is for training on 8 pieces of Ascend 910 NPU using graph mode.
+- Top-1 and Top-5: Accuracy reported on the validation set of ImageNet-1K.
+
+## Quick Start
+
+### Preparation
+
+#### Installation
+Please refer to the [installation instruction](https://github.com/mindspore-lab/mindcv#installation) in MindCV.
+
+#### Dataset Preparation
+Please download the [ImageNet-1K](https://www.image-net.org/challenges/LSVRC/2012/index.php) dataset for model training and validation.
+
+### Training
+
+
+* Distributed Training
+
+It is easy to reproduce the reported results with the pre-defined training recipe. For distributed training on multiple Ascend 910 devices, please run
+
+```shell
+# distributed training on multiple GPU/Ascend devices
+mpirun -n 8 python train.py --config configs/nasnet/nasnet_a_4x1056_ascend.yaml --data_dir /path/to/imagenet
+```
+> If the script is executed by the root user, the `--allow-run-as-root` parameter must be added to `mpirun`.
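+
+For instance, when launching as the root user, the same command becomes:
+
+```shell
+# distributed training launched by the root user
+mpirun --allow-run-as-root -n 8 python train.py --config configs/nasnet/nasnet_a_4x1056_ascend.yaml --data_dir /path/to/imagenet
+```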
+
+Similarly, you can train the model on multiple GPU devices with the above `mpirun` command.
+
+For detailed illustration of all hyper-parameters, please refer to [config.py](https://github.com/mindspore-lab/mindcv/blob/main/config.py).
+
+**Note:** As the global batch size (batch_size x num_devices) is an important hyper-parameter, it is recommended to keep the global batch size unchanged for reproduction or adjust the learning rate linearly to a new global batch size.
+
+* Standalone Training
+
+If you want to train or finetune the model on a smaller dataset without distributed training, please run:
+
+```shell
+# standalone training on a CPU/GPU/Ascend device
+python train.py --config configs/nasnet/nasnet_a_4x1056_ascend.yaml --data_dir /path/to/dataset --distribute False
+```
+
+### Validation
+
+To validate the accuracy of the trained model, you can use `validate.py` and pass the checkpoint path with `--ckpt_path`.
+
+```shell
+python validate.py -c configs/nasnet/nasnet_a_4x1056_ascend.yaml --data_dir /path/to/imagenet --ckpt_path /path/to/ckpt
+```
+
+### Deployment
+
+Please refer to the [deployment tutorial](https://github.com/mindspore-lab/mindcv/blob/main/tutorials/deployment.md) in MindCV.
+
+## References
+
+
+[1] Zoph B, Vasudevan V, Shlens J, et al. Learning transferable architectures for scalable image recognition[C]//Proceedings of the IEEE conference on computer vision and pattern recognition. 2018: 8697-8710.
diff --git a/configs/nasnet/nasnet_a_4x1056_ascend.yaml b/configs/nasnet/nasnet_a_4x1056_ascend.yaml
new file mode 100644
index 0000000..829b73d
--- /dev/null
+++ b/configs/nasnet/nasnet_a_4x1056_ascend.yaml
@@ -0,0 +1,53 @@
+# system
+mode: 0
+distribute: True
+num_parallel_workers: 8
+val_while_train: True
+
+# dataset
+dataset: 'imagenet'
+data_dir: '/path/to/imagenet'
+shuffle: True
+dataset_download: False
+batch_size: 256
+drop_remainder: True
+
+# augmentation
+image_resize: 224
+scale: [0.08, 1.0]
+ratio: [0.75, 1.333]
+hflip: 0.5
+interpolation: 'bilinear'
+crop_pct: 0.875
+
+# model
+model: 'nasnet_a_4x1056'
+num_classes: 1000
+pretrained: False
+ckpt_path: ''
+keep_checkpoint_max: 10
+ckpt_save_dir: './ckpt'
+epoch_size: 450
+dataset_sink_mode: True
+amp_level: 'O0'
+
+# loss
+loss: 'CE'
+label_smoothing: 0.1
+
+# lr scheduler
+scheduler: 'cosine_decay'
+min_lr: 1e-10
+lr: 0.016
+warmup_epochs: 5
+decay_epochs: 445
+
+# optimizer
+opt: 'rmsprop'
+filter_bias_and_bn: True
+momentum: 0.9
+weight_decay: 1e-5
+loss_scale_type: 'dynamic'
+drop_overflow_update: True
+use_nesterov: False
+eps: 1e-3
diff --git a/configs/pit/README.md b/configs/pit/README.md
new file mode 100644
index 0000000..3726ebb
--- /dev/null
+++ b/configs/pit/README.md
@@ -0,0 +1,89 @@
+# PiT
+> [PiT: Rethinking Spatial Dimensions of Vision Transformers](https://arxiv.org/abs/2103.16302v2)
+
+## Introduction
+
+PiT (Pooling-based Vision Transformer), proposed by Byeongho Heo et al. in 2021, is an improvement over the Vision Transformer (ViT). PiT adds pooling layers on top of the ViT design, so that the spatial dimension of each layer is reduced progressively as in CNNs, instead of using the same spatial dimension for all layers as ViT does. PiT achieves improved model capability and generalization performance compared to ViT. [[1](#references)]
+
+
+
+
+#### Notes
+
+- Context: Training context denoted as {device}x{pieces}-{MS mode}, where mindspore mode can be G - graph mode or F - pynative mode with ms function. For example, D910x8-G is for training on 8 pieces of Ascend 910 NPU using graph mode.
+- Top-1 and Top-5: Accuracy reported on the validation set of ImageNet-1K.
+
+## Quick Start
+
+### Preparation
+
+#### Installation
+
+Please refer to the [installation instruction](https://github.com/mindspore-lab/mindcv#installation) in MindCV.
+
+#### Dataset Preparation
+
+Please download the [ImageNet-1K](https://www.image-net.org/challenges/LSVRC/2012/index.php) dataset for model training and validation.
+
+### Training
+
+- Distributed Training
+
+It is easy to reproduce the reported results with the pre-defined training recipe. For distributed training on multiple Ascend 910 devices, please run
+
+```shell
+# distributed training on multiple GPU/Ascend devices
+mpirun -n 8 python train.py --config configs/poolformer/poolformer_s12_ascend.yaml --data_dir /path/to/imagenet
+```
+> If the script is executed by the root user, the `--allow-run-as-root` parameter must be added to `mpirun`.
+
+Similarly, you can train the model on multiple GPU devices with the above `mpirun` command.
+
+For detailed illustration of all hyper-parameters, please refer to [config.py](https://github.com/mindspore-lab/mindcv/blob/main/config.py).
+
+**Note:** As the global batch size (batch_size x num_devices) is an important hyper-parameter, it is recommended to keep the global batch size unchanged for reproduction or adjust the learning rate linearly to a new global batch size.
+
+* Standalone Training
+
+If you want to train or finetune the model on a smaller dataset without distributed training, please run:
+
+```shell
+# standalone training on a CPU/GPU/Ascend device
+python train.py --config configs/poolformer/poolformer_s12_ascend.yaml --data_dir /path/to/imagenet --distribute False
+```
+
+### Validation
+
+> Validation of PoolFormer has to be done in amp O3 mode, which is not yet supported. Coming soon...
+
+### Deployment
+
+To deploy online inference services with the trained model efficiently, please refer to the [deployment tutorial](https://github.com/mindspore-lab/mindcv/blob/main/tutorials/deployment.md).
+
+## References
+
+[1] Yu W, Luo M, Zhou P, et al. Metaformer is actually what you need for vision[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2022: 10819-10829.
diff --git a/configs/poolformer/poolformer_s12_ascend.yaml b/configs/poolformer/poolformer_s12_ascend.yaml
new file mode 100644
index 0000000..470ce8b
--- /dev/null
+++ b/configs/poolformer/poolformer_s12_ascend.yaml
@@ -0,0 +1,61 @@
+# system
+mode: 0
+distribute: True
+num_parallel_workers: 8
+val_while_train: True
+
+# dataset
+dataset: 'imagenet'
+data_dir: '/path/to/imagenet'
+shuffle: True
+dataset_download: False
+batch_size: 128
+drop_remainder: True
+
+# augmentation
+image_resize: 224
+scale: [0.08, 1.0]
+ratio: [0.75, 1.333]
+hflip: 0.5
+vflip: 0.0
+interpolation: 'bilinear'
+crop_pct: 0.9
+color_jitter: [0.4, 0.4, 0.4]
+re_prob: 0.25
+mixup: 0.8
+cutmix: 1.0
+cutmix_prob: 1.0
+auto_augment: 'randaug-m9-mstd0.5-inc1'
+
+# model
+model: 'poolformer_s12'
+drop_rate: 0.0
+drop_path_rate: 0.1
+num_classes: 1000
+pretrained: False
+ckpt_path: ''
+keep_checkpoint_max: 10
+ckpt_save_dir: './ckpt'
+epoch_size: 600
+dataset_sink_mode: True
+amp_level: 'O3'
+
+# loss
+loss: 'CE'
+label_smoothing: 0.1
+
+# lr scheduler
+scheduler: 'cosine_decay'
+lr: 0.0005
+min_lr: 1e-06
+warmup_epochs: 30
+decay_epochs: 570
+decay_rate: 0.1
+
+# optimizer
+opt: 'AdamW'
+filter_bias_and_bn: True
+momentum: 0.9
+weight_decay: 0.05
+loss_scale: 1024
+use_nesterov: False
diff --git a/configs/pvt/README.md b/configs/pvt/README.md
new file mode 100644
index 0000000..fa3c1fc
--- /dev/null
+++ b/configs/pvt/README.md
@@ -0,0 +1,84 @@
+# Pyramid Vision Transformer
+
+> [Pyramid Vision Transformer: A Versatile Backbone for Dense Prediction without Convolutions](https://arxiv.org/abs/2102.12122)
+
+## Introduction
+
+PVT is a general backbone network for dense prediction that does not rely on convolution operations. PVT introduces a pyramid structure into the Transformer to generate multi-scale feature maps for dense prediction tasks. It uses a gradual reduction strategy to control the size of the feature maps through the patch embedding layer, and proposes a spatial-reduction attention (SRA) layer to replace the traditional multi-head attention layer in the encoder, which greatly reduces the computation/memory overhead.[[1](#References)]
+
+![PVT](https://user-images.githubusercontent.com/74176172/210046926-2322161b-a963-4603-b3cb-86ecdca41262.png)
+
+## Results
+
+Our reproduced model performance on ImageNet-1K is reported as follows.
+
+
+
+#### Notes
+
+- Context: Training context denoted as {device}x{pieces}-{MS mode}, where mindspore mode can be G - graph mode or F - pynative mode with ms function. For example, D910x8-G is for training on 8 pieces of Ascend 910 NPU using graph mode.
+- Top-1 and Top-5: Accuracy reported on the validation set of ImageNet-1K.
+
+## Quick Start
+
+### Preparation
+
+#### Installation
+Please refer to the [installation instruction](https://github.com/mindspore-ecosystem/mindcv#installation) in MindCV.
+
+#### Dataset Preparation
+Please download the [ImageNet-1K](https://www.image-net.org/challenges/LSVRC/2012/index.php) dataset for model training and validation.
+
+### Training
+
+* Distributed Training
+
+It is easy to reproduce the reported results with the pre-defined training recipe. For distributed training on multiple Ascend 910 devices, please run
+
+```shell
+# distributed training on multiple GPU/Ascend devices
+mpirun -n 8 python train.py --config configs/pvt_v2/pvt_v2_b0_ascend.yaml --data_dir /path/to/imagenet
+```
+
+> If the script is executed by the root user, the `--allow-run-as-root` parameter must be added to `mpirun`.
+
+Similarly, you can train the model on multiple GPU devices with the above `mpirun` command.
+
+For detailed illustration of all hyper-parameters, please refer to [config.py](https://github.com/mindspore-lab/mindcv/blob/main/config.py).
+
+**Note:** As the global batch size (batch_size x num_devices) is an important hyper-parameter, it is recommended to keep the global batch size unchanged for reproduction or adjust the learning rate linearly to a new global batch size.
+
+* Standalone Training
+
+If you want to train or finetune the model on a smaller dataset without distributed training, please run:
+
+```shell
+# standalone training on a CPU/GPU/Ascend device
+python train.py --config configs/pvt_v2/pvt_v2_b0_ascend.yaml --data_dir /path/to/dataset --distribute False
+```
+
+### Validation
+
+To validate the accuracy of the trained model, you can use `validate.py` and pass the checkpoint path with `--ckpt_path`.
+
+```shell
+python validate.py -c configs/pvt_v2/pvt_v2_b0_ascend.yaml --data_dir /path/to/imagenet --ckpt_path /path/to/ckpt
+```
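+
+Checkpoints produced during training are written to `ckpt_save_dir` (`./ckpt` in this recipe), so an intermediate or final checkpoint can be evaluated in the same way; the file name below is only a placeholder for whichever checkpoint you want to check:
+
+```shell
+# evaluate a checkpoint saved during training (placeholder file name)
+python validate.py -c configs/pvt_v2/pvt_v2_b0_ascend.yaml --data_dir /path/to/imagenet --ckpt_path ./ckpt/<saved_checkpoint>.ckpt
+```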
+
+### Deployment
+
+Please refer to the [deployment tutorial](https://github.com/mindspore-lab/mindcv/blob/main/tutorials/deployment.md) in MindCV.
+
+## References
+
+[1] Wang W, Xie E, Li X, et al. Pvt v2: Improved baselines with pyramid vision transformer[J]. Computational Visual Media, 2022, 8(3): 415-424.
diff --git a/configs/pvt_v2/pvt_v2_b0_ascend.yaml b/configs/pvt_v2/pvt_v2_b0_ascend.yaml
new file mode 100644
index 0000000..3926992
--- /dev/null
+++ b/configs/pvt_v2/pvt_v2_b0_ascend.yaml
@@ -0,0 +1,59 @@
+# system
+mode: 0
+distribute: True
+num_parallel_workers: 8
+val_while_train: True
+
+# dataset
+dataset: "imagenet"
+data_dir: "/path/to/imagenet"
+shuffle: True
+dataset_download: False
+batch_size: 128
+drop_remainder: True
+
+# augmentation
+image_resize: 224
+scale: [0.08, 1.0]
+ratio: [0.75, 1.333]
+re_value: "random"
+hflip: 0.5
+interpolation: "bicubic"
+auto_augment: randaug-m9-mstd0.5-inc1
+re_prob: 0.25
+crop_pct: 0.9
+mixup: 0.8
+cutmix: 1.0
+
+# model
+model: "pvt_v2_b0"
+num_classes: 1000
+pretrained: False
+ckpt_path: ""
+keep_checkpoint_max: 10
+ckpt_save_dir: "./ckpt"
+epoch_size: 500
+drop_path_rate: 0.1
+dataset_sink_mode: True
+amp_level: "O2"
+
+# loss
+loss: "ce"
+label_smoothing: 0.1
+
+# lr scheduler
+scheduler: "cosine_decay"
+lr: 0.001
+min_lr: 0.00001
+lr_epoch_stair: True
+decay_epochs: 490
+warmup_epochs: 10
+
+# optimizer
+opt: "adamw"
+filter_bias_and_bn: True
+momentum: 0.9
+weight_decay: 0.05
+loss_scale_type: "dynamic"
+drop_overflow_update: True
+use_nesterov: False
diff --git a/configs/pvt_v2/pvt_v2_b1_ascend.yaml b/configs/pvt_v2/pvt_v2_b1_ascend.yaml
new file mode 100644
index 0000000..559c18c
--- /dev/null
+++ b/configs/pvt_v2/pvt_v2_b1_ascend.yaml
@@ -0,0 +1,59 @@
+# system
+mode: 0
+distribute: True
+num_parallel_workers: 8
+val_while_train: True
+
+# dataset
+dataset: "imagenet"
+data_dir: "/path/to/imagenet"
+shuffle: True
+dataset_download: False
+batch_size: 128
+drop_remainder: True
+
+# augmentation
+image_resize: 224
+scale: [0.08, 1.0]
+ratio: [0.75, 1.333]
+re_value: "random"
+hflip: 0.5
+interpolation: "bicubic"
+auto_augment: "randaug-m9-mstd0.5-inc1"
+re_prob: 0.25
+crop_pct: 0.9
+mixup: 0.8
+cutmix: 1.0
+
+# model
+model: "pvt_v2_b1"
+num_classes: 1000
+pretrained: False
+ckpt_path: ""
+keep_checkpoint_max: 10
+ckpt_save_dir: "./ckpt"
+epoch_size: 500
+drop_path_rate: 0.1
+dataset_sink_mode: True
+amp_level: "O2"
+
+# loss
+loss: "ce"
+label_smoothing: 0.1
+
+# lr scheduler
+scheduler: "cosine_decay"
+lr: 0.001
+min_lr: 0.00001
+lr_epoch_stair: True
+decay_epochs: 490
+warmup_epochs: 10
+
+# optimizer
+opt: "adamw"
+filter_bias_and_bn: True
+momentum: 0.9
+weight_decay: 0.05
+loss_scale_type: "dynamic"
+drop_overflow_update: True
+use_nesterov: False
diff --git a/configs/pvt_v2/pvt_v2_b2_ascend.yaml b/configs/pvt_v2/pvt_v2_b2_ascend.yaml
new file mode 100644
index 0000000..3945eb0
--- /dev/null
+++ b/configs/pvt_v2/pvt_v2_b2_ascend.yaml
@@ -0,0 +1,58 @@
+# system
+mode: 0
+distribute: True
+num_parallel_workers: 8
+val_while_train: True
+
+# dataset
+dataset: "imagenet"
+data_dir: "/path/to/imagenet"
+shuffle: True
+dataset_download: False
+batch_size: 128
+drop_remainder: True
+
+# augmentation
+image_resize: 224
+scale: [0.08, 1.0]
+ratio: [0.75, 1.333]
+re_value: "random"
+hflip: 0.5
+interpolation: "bicubic"
+auto_augment: "randaug-m9-mstd0.5-inc1"
+re_prob: 0.25
+crop_pct: 0.9
+mixup: 0.8
+cutmix: 1.0
+
+# model
+model: "pvt_v2_b2"
+num_classes: 1000
+pretrained: False
+ckpt_path: ""
+keep_checkpoint_max: 10
+ckpt_save_dir: "./ckpt"
+epoch_size: 550
+drop_path_rate: 0.2
+dataset_sink_mode: True
+amp_level: "O2"
+
+# loss
+loss: "ce"
+label_smoothing: 0.1
+
+# lr scheduler
+scheduler: "cosine_decay"
+lr: 0.001
+min_lr: 0.000001
+decay_epochs: 530
+warmup_epochs: 20
+
+# optimizer
+opt: "adamw"
+filter_bias_and_bn: True
+momentum: 0.9
+weight_decay: 0.05
+loss_scale_type: "dynamic"
+drop_overflow_update: True
+use_nesterov: False
diff --git a/configs/regnet/README.md b/configs/regnet/README.md
new file mode 100644
index 0000000..f8dc9ac
--- /dev/null
+++ b/configs/regnet/README.md
@@ -0,0 +1,81 @@
+# RegNet
+
+> [Designing Network Design Spaces](https://arxiv.org/abs/2003.13678)
+
+## Introduction
+
+In this work, the authors present a new network design paradigm that combines the advantages of manual design and NAS. Instead of focusing on designing individual network instances, they design design spaces that parametrize populations of networks. As in manual design, the authors aim for interpretability and for discovering general design principles that describe networks that are simple, work well, and generalize across settings. As in NAS, they aim to take advantage of semi-automated procedures to help achieve these goals. The general strategy they adopt is to progressively design simplified versions of an initial, relatively unconstrained design space while maintaining or improving its quality. The overall process is analogous to manual design, elevated to the population level and guided via distribution estimates of network design spaces. As a testbed for this paradigm, they focus on exploring network structure (e.g., width, depth, groups, etc.) assuming standard model families including VGG, ResNet, and ResNeXt. The authors start with a relatively unconstrained design space they call AnyNet (e.g., widths and depths vary freely across stages) and apply their human-in-the-loop methodology to arrive at a low-dimensional design space consisting of simple “regular” networks, which they call RegNet. The core of the RegNet design space is simple: stage widths and depths are determined by a quantized linear function. Compared to AnyNet, the RegNet design space has simpler models, is easier to interpret, and has a higher concentration of good models.[[1](#References)]
+
+![RegNet](https://user-images.githubusercontent.com/74176172/210046899-4e83bb56-f7f6-49b2-9dde-dce200428e92.png)
+
+## Results
+
+Our reproduced model performance on ImageNet-1K is reported as follows.
+
+
+
+#### Notes
+
+- Context: Training context denoted as {device}x{pieces}-{MS mode}, where mindspore mode can be G - graph mode or F - pynative mode with ms function. For example, D910x8-G is for training on 8 pieces of Ascend 910 NPU using graph mode.
+- Top-1 and Top-5: Accuracy reported on the validation set of ImageNet-1K.
+
+## Quick Start
+
+### Preparation
+
+#### Installation
+
+Please refer to the [installation instruction](https://github.com/mindspore-lab/mindcv#installation) in MindCV.
+
+#### Dataset Preparation
+
+Please download the [ImageNet-1K](https://www.image-net.org/challenges/LSVRC/2012/index.php) dataset for model training and validation.
+
+### Training
+
+- Distributed Training
+
+It is easy to reproduce the reported results with the pre-defined training recipe. For distributed training on multiple Ascend 910 devices, please run
+
+```shell
+# distributed training on multiple GPU/Ascend devices
+mpirun -n 8 python train.py --config configs/regnet/regnet_x_800mf_ascend.yaml --data_dir /path/to/imagenet
+```
+> If the script is executed by the root user, the `--allow-run-as-root` parameter must be added to `mpirun`.
+
+Similarly, you can train the model on multiple GPU devices with the above `mpirun` command.
+
+For detailed illustration of all hyper-parameters, please refer to [config.py](https://github.com/mindspore-lab/mindcv/blob/main/config.py).
+
+**Note:** As the global batch size (batch_size x num_devices) is an important hyper-parameter, it is recommended to keep the global batch size unchanged for reproduction or adjust the learning rate linearly to a new global batch size.
+
+* Standalone Training
+
+If you want to train or finetune the model on a smaller dataset without distributed training, please run:
+
+```shell
+# standalone training on a CPU/GPU/Ascend device
+python train.py --config configs/regnet/regnet_x_800mf_ascend.yaml --data_dir /path/to/imagenet --distribute False
+```
+
+### Validation
+
+To validate the accuracy of the trained model, you can use `validate.py` and pass the checkpoint path with `--ckpt_path`.
+
+```shell
+python validate.py --model=regnet_x_800mf --data_dir /path/to/imagenet --ckpt_path /path/to/ckpt
+```
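+
+Alternatively, assuming `validate.py` resolves the model from a yaml recipe in the same way as in the other model folders of this repository, the same evaluation can be launched from the config file:
+
+```shell
+# equivalent validation launched from the training recipe (assumes -c/--config is accepted, as in the other readme files)
+python validate.py -c configs/regnet/regnet_x_800mf_ascend.yaml --data_dir /path/to/imagenet --ckpt_path /path/to/ckpt
+```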
+
+### Deployment
+
+To deploy online inference services with the trained model efficiently, please refer to the [deployment tutorial](https://github.com/mindspore-lab/mindcv/blob/main/tutorials/deployment.md).
+
+## References
+
+[1] Radosavovic I, Kosaraju R P, Girshick R, et al. Designing network design spaces[C]//Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. 2020: 10428-10436.
diff --git a/configs/regnet/regnet_x_800mf_ascend.yaml b/configs/regnet/regnet_x_800mf_ascend.yaml
new file mode 100644
index 0000000..da384bf
--- /dev/null
+++ b/configs/regnet/regnet_x_800mf_ascend.yaml
@@ -0,0 +1,55 @@
+# system
+mode: 0
+distribute: True
+val_while_train: True
+val_interval: 1
+log_interval: 100
+
+# dataset
+dataset: 'imagenet'
+data_dir: '/path/to/imagenet'
+shuffle: True
+num_parallel_workers: 8
+batch_size: 64
+
+# augmentation
+image_resize: 224
+scale: [0.08, 1.0]
+ratio: [0.75, 1.333]
+hflip: 0.5
+vflip: 0.0
+interpolation: 'bilinear'
+color_jitter: 0.4
+re_prob: 0.1
+
+# model
+model: 'regnet_x_800mf'
+num_classes: 1000
+keep_checkpoint_max: 10
+ckpt_save_dir: './ckpt'
+ckpt_save_interval: 1
+ckpt_save_policy: 'latest_k'
+epoch_size: 200
+dataset_sink_mode: True
+amp_level: 'O3'
+
+# loss
+loss: 'CE'
+label_smoothing: 0.1
+
+# lr scheduler
+scheduler: 'cosine_decay'
+min_lr: 0.0
+lr: 0.1
+warmup_epochs: 5
+warmup_factor: 0.01
+decay_epochs: 195
+lr_epoch_stair: True
+
+# optimizer
+opt: 'momentum'
+momentum: 0.9
+weight_decay: 0.0001
+loss_scale: 128
+use_nesterov: False
+filter_bias_and_bn: True
diff --git a/configs/repmlp/README.md b/configs/repmlp/README.md
new file mode 100644
index 0000000..9663833
--- /dev/null
+++ b/configs/repmlp/README.md
@@ -0,0 +1,91 @@
+# RepMLPNet
+
+> [RepMLPNet: Hierarchical Vision MLP with Re-parameterized Locality](https://arxiv.org/abs/2112.11081)
+
+## Introduction
+
+Compared to convolutional layers, fully-connected (FC) layers are better at modeling the long-range dependencies
+but worse at capturing the local patterns, hence usually less favored for image recognition. In this paper, the authors propose a
+methodology, Locality Injection, to incorporate local priors into an FC layer via merging the trained parameters of a
+parallel conv kernel into the FC kernel. Locality Injection can be viewed as a novel Structural Re-parameterization
+method since it equivalently converts the structures via transforming the parameters. Based on that, the authors propose a
+multi-layer-perceptron (MLP) block named RepMLP Block, which uses three FC layers to extract features, and a novel
+architecture named RepMLPNet. The hierarchical design distinguishes RepMLPNet from the other concurrently proposed vision MLPs.
+As it produces feature maps of different levels, it qualifies as a backbone model for downstream tasks like semantic segmentation.
+Their results reveal that 1) Locality Injection is a general methodology for MLP models; 2) RepMLPNet has a favorable accuracy-efficiency
+trade-off compared to the other MLPs; and 3) RepMLPNet is the first MLP that seamlessly transfers to Cityscapes semantic segmentation.
+
+![RepMLP](https://user-images.githubusercontent.com/74176172/210046952-c4f05321-76e9-4d7a-b419-df91aac64cdf.png)
+Figure 1. RepMLP Block.[[1](#References)]
+
+## Results
+
+Our reproduced model performance on ImageNet-1K is reported as follows.
+
+
+
+#### Notes
+
+- Context: Training context denoted as {device}x{pieces}-{MS mode}, where mindspore mode can be G - graph mode or F - pynative mode with ms function. For example, D910x8-G is for training on 8 pieces of Ascend 910 NPU using graph mode.
+- Top-1 and Top-5: Accuracy reported on the validation set of ImageNet-1K.
+
+## Quick Start
+
+### Preparation
+
+#### Installation
+
+Please refer to the [installation instruction](https://github.com/mindspore-lab/mindcv#installation) in MindCV.
+
+#### Dataset Preparation
+
+Please download the [ImageNet-1K](https://www.image-net.org/challenges/LSVRC/2012/index.php) dataset for model training and validation.
+
+### Training
+
+- Distributed Training
+
+It is easy to reproduce the reported results with the pre-defined training recipe. For distributed training on multiple Ascend 910 devices, please run
+
+```shell
+# distributed training on multiple GPU/Ascend devices
+mpirun -n 8 python train.py --config configs/repmlp/repmlp_t224_ascend.yaml --data_dir /path/to/imagenet
+```
+> If the script is executed by the root user, the `--allow-run-as-root` parameter must be added to `mpirun`.
+
+Similarly, you can train the model on multiple GPU devices with the above `mpirun` command.
+
+For detailed illustration of all hyper-parameters, please refer to [config.py](https://github.com/mindspore-lab/mindcv/blob/main/config.py).
+
+**Note:** As the global batch size (batch_size x num_devices) is an important hyper-parameter, it is recommended to keep the global batch size unchanged for reproduction or adjust the learning rate linearly to a new global batch size.
+
+* Standalone Training
+
+If you want to train or finetune the model on a smaller dataset without distributed training, please run:
+
+```shell
+# standalone training on a CPU/GPU/Ascend device
+python train.py --config configs/repmlp/repmlp_t224_ascend.yaml --data_dir /path/to/imagenet --distribute False
+```
+
+### Validation
+
+To validate the accuracy of the trained model, you can use `validate.py` and pass the checkpoint path with `--ckpt_path`.
+
+```shell
+python validate.py --model=RepMLPNet_T224 --data_dir /path/to/imagenet --ckpt_path /path/to/ckpt
+```
+
+### Deployment
+
+To deploy online inference services with the trained model efficiently, please refer to the [deployment tutorial](https://github.com/mindspore-lab/mindcv/blob/main/tutorials/deployment.md).
+
+## References
+
+[1] Ding X, Chen H, Zhang X, et al. Repmlpnet: Hierarchical vision mlp with re-parameterized locality[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2022: 578-587.
diff --git a/configs/repmlp/repmlp_t224_ascend.yaml b/configs/repmlp/repmlp_t224_ascend.yaml
new file mode 100644
index 0000000..17c8d2a
--- /dev/null
+++ b/configs/repmlp/repmlp_t224_ascend.yaml
@@ -0,0 +1,62 @@
+# system
+mode: 0
+distribute: True
+num_parallel_workers: 8
+val_while_train: True
+
+# dataset
+dataset: 'imagenet'
+data_dir: '/path/to/imagenet'
+shuffle: True
+dataset_download: False
+batch_size: 128
+drop_remainder: True
+
+# augmentation
+image_resize: 224
+scale: [0.08, 1.0]
+ratio: [0.75, 1.333]
+hflip: 0.5
+vflip: 0.0
+interpolation: 'bilinear'
+crop_pct: 0.875
+color_jitter: 0.4
+re_prob: 0.25
+re_ratio: [0.3, 3.333]
+mixup: 0.2
+cutmix: 1.0
+cutmix_prob: 1.0
+auto_augment: 'randaug-m9-mstd0.5-inc1'
+
+
+# model
+model: 'RepMLPNet_T224'
+num_classes: 1000
+in_channels: 3
+pretrained: False
+ckpt_path: ''
+keep_checkpoint_max: 10
+ckpt_save_dir: './ckpt'
+epoch_size: 300
+dataset_sink_mode: True
+amp_level: 'O2'
+
+# loss
+loss: 'ce'
+label_smoothing: 0.1
+
+# lr scheduler
+scheduler: 'cosine_decay'
+lr: 0.005
+min_lr: 1e-5
+warmup_epochs: 10
+decay_epochs: 290
+decay_rate: 0.01
+
+# optimizer
+opt: 'adamw'
+filter_bias_and_bn: True
+momentum: 0.9
+weight_decay: 2e-05
+loss_scale: 1024
+use_nesterov: False
diff --git a/configs/repvgg/README.md b/configs/repvgg/README.md
new file mode 100644
index 0000000..b5cd384
--- /dev/null
+++ b/configs/repvgg/README.md
@@ -0,0 +1,105 @@
+# RepVGG
+
+> [RepVGG: Making VGG-style ConvNets Great Again](https://arxiv.org/abs/2101.03697)
+
+## Introduction
+
+
+
+The key idea of RepVGG is that, through structural re-parameterization, the model can be trained with a multi-branch architecture and then converted into an equivalent single-branch (plain) architecture for validation and inference.
+Figure 1 shows the basic model architecture of RepVGG. By choosing different values for the width multipliers a and b, various RepVGG models can be obtained.
+Compared with previous methods, RepVGG achieves better performance with fewer parameters on the ImageNet-1K dataset.[[1](#references)]
+
+
+
+#### Notes
+
+- Context: Training context denoted as {device}x{pieces}-{MS mode}, where mindspore mode can be G - graph mode or F - pynative mode with ms function. For example, D910x8-G is for training on 8 pieces of Ascend 910 NPU using graph mode.
+- Top-1 and Top-5: Accuracy reported on the validation set of ImageNet-1K.
+
+## Quick Start
+
+### Preparation
+
+#### Installation
+Please refer to the [installation instruction](https://github.com/mindspore-ecosystem/mindcv#installation) in MindCV.
+
+#### Dataset Preparation
+Please download the [ImageNet-1K](https://www.image-net.org/challenges/LSVRC/2012/index.php) dataset for model training and validation.
+
+### Training
+
+* Distributed Training
+
+It is easy to reproduce the reported results with the pre-defined training recipe. For distributed training on multiple Ascend 910 devices, please run
+
+```shell
+# distributed training on multiple GPU/Ascend devices
+mpirun -n 8 python train.py --config configs/resnest/resnest50_ascend.yaml --data_dir /path/to/imagenet
+```
+
+> If the script is executed by the root user, the `--allow-run-as-root` parameter must be added to `mpirun`.
+
+Similarly, you can train the model on multiple GPU devices with the above `mpirun` command.
+
+For detailed illustration of all hyper-parameters, please refer to [config.py](https://github.com/mindspore-lab/mindcv/blob/main/config.py).
+
+**Note:** As the global batch size (batch_size x num_devices) is an important hyper-parameter, it is recommended to keep the global batch size unchanged for reproduction or adjust the learning rate linearly to a new global batch size.
+
+* Standalone Training
+
+If you want to train or finetune the model on a smaller dataset without distributed training, please run:
+
+```shell
+# standalone training on a CPU/GPU/Ascend device
+python train.py --config configs/resnest/resnest50_ascend.yaml --data_dir /path/to/dataset --distribute False
+```
+
+### Validation
+
+To validate the accuracy of the trained model, you can use `validate.py` and pass the checkpoint path with `--ckpt_path`.
+
+```shell
+python validate.py -c configs/resnest/resnest50_ascend.yaml --data_dir /path/to/imagenet --ckpt_path /path/to/ckpt
+```
+
+### Deployment
+
+Please refer to the [deployment tutorial](https://github.com/mindspore-lab/mindcv/blob/main/tutorials/deployment.md) in MindCV.
+
+## References
+
+[1] Zhang H, Wu C, Zhang Z, et al. Resnest: Split-attention networks[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2022: 2736-2746.
diff --git a/configs/resnest/resnest50_ascend.yaml b/configs/resnest/resnest50_ascend.yaml
new file mode 100644
index 0000000..e3fc11e
--- /dev/null
+++ b/configs/resnest/resnest50_ascend.yaml
@@ -0,0 +1,57 @@
+# system
+mode: 0
+distribute: True
+num_parallel_workers: 8
+val_while_train: True
+
+# dataset
+dataset: "imagenet"
+data_dir: "/path/to/imagenet"
+shuffle: True
+dataset_download: False
+batch_size: 128
+drop_remainder: True
+
+# augmentation
+image_resize: 224
+scale: [0.08, 1.0]
+ratio: [0.75, 1.333]
+hflip: 0.5
+interpolation: "bicubic"
+auto_augment: "randaug-m9-mstd0.5-inc1"
+re_prob: 0.25
+crop_pct: 0.875
+mixup: 0.8
+cutmix: 1.0
+
+# model
+model: "resnest50"
+num_classes: 1000
+pretrained: False
+ckpt_path: ""
+keep_checkpoint_max: 30
+ckpt_save_dir: "./ckpt"
+epoch_size: 350
+dataset_sink_mode: True
+amp_level: "O2"
+drop_rate: 0.2
+
+# loss
+loss: "CE"
+label_smoothing: 0.1
+
+# lr scheduler
+scheduler: "cosine_decay"
+min_lr: 0.0
+lr: 0.06
+warmup_epochs: 5
+decay_epochs: 345
+
+# optimizer
+opt: "momentum"
+filter_bias_and_bn: True
+momentum: 0.9
+weight_decay: 0.0001
+loss_scale_type: "dynamic"
+drop_overflow_update: True
+use_nesterov: False
diff --git a/configs/resnet/README.md b/configs/resnet/README.md
new file mode 100644
index 0000000..7edc008
--- /dev/null
+++ b/configs/resnet/README.md
@@ -0,0 +1,89 @@
+# ResNet
+
+> [Deep Residual Learning for Image Recognition](https://arxiv.org/abs/1512.03385)
+
+## Introduction
+
+ResNet is a residual learning framework that eases the training of networks substantially deeper than those used previously. The layers are explicitly reformulated to learn residual functions with reference to the layer inputs, instead of learning unreferenced functions. Comprehensive empirical evidence shows that these residual networks are easier to optimize and can gain accuracy from considerably increased depth.
+
+
+
+#### Notes
+- Context: Training context denoted as {device}x{pieces}-{MS mode}, where mindspore mode can be G - graph mode or F - pynative mode with ms function. For example, D910x8-G is for training on 8 pieces of Ascend 910 NPU using graph mode.
+- Top-1 and Top-5: Accuracy reported on the validation set of ImageNet-1K.
+
+
+## Quick Start
+
+### Preparation
+
+#### Installation
+Please refer to the [installation instruction](https://github.com/mindspore-lab/mindcv#installation) in MindCV.
+
+#### Dataset Preparation
+Please download the [ImageNet-1K](https://www.image-net.org/challenges/LSVRC/2012/index.php) dataset for model training and validation.
+
+### Training
+
+* Distributed Training
+
+It is easy to reproduce the reported results with the pre-defined training recipe. For distributed training on multiple Ascend 910 devices, please run
+
+```shell
+# distributed training on multiple GPU/Ascend devices
+mpirun -n 8 python train.py --config configs/resnetv2/resnetv2_50_ascend.yaml --data_dir /path/to/imagenet
+```
+
+> If the script is executed by the root user, the `--allow-run-as-root` parameter must be added to `mpirun`.
+
+Similarly, you can train the model on multiple GPU devices with the above `mpirun` command.
+
+For detailed illustration of all hyper-parameters, please refer to [config.py](https://github.com/mindspore-lab/mindcv/blob/main/config.py).
+
+**Note:** As the global batch size (batch_size x num_devices) is an important hyper-parameter, it is recommended to keep the global batch size unchanged for reproduction or adjust the learning rate linearly to a new global batch size.
+
+* Standalone Training
+
+If you want to train or finetune the model on a smaller dataset without distributed training, please run:
+
+```shell
+# standalone training on a CPU/GPU/Ascend device
+python train.py --config configs/resnetv2/resnetv2_50_ascend.yaml --data_dir /path/to/dataset --distribute False
+```
+
+### Validation
+
+To validate the accuracy of the trained model, you can use `validate.py` and pass the checkpoint path with `--ckpt_path`.
+
+```shell
+python validate.py -c configs/resnetv2/resnetv2_50_ascend.yaml --data_dir /path/to/imagenet --ckpt_path /path/to/ckpt
+```
+
+### Deployment
+
+To deploy online inference services with the trained model efficiently, please refer to the [deployment tutorial](https://github.com/mindspore-lab/mindcv/blob/main/tutorials/deployment.md).
+
+## References
+
+[1] He K, Zhang X, Ren S, et al. Identity mappings in deep residual networks[C]//Computer Vision–ECCV 2016: 14th European Conference, Amsterdam, The Netherlands, October 11–14, 2016, Proceedings, Part IV 14. Springer International Publishing, 2016: 630-645.
diff --git a/configs/resnetv2/resnetv2_101_ascend.yaml b/configs/resnetv2/resnetv2_101_ascend.yaml
new file mode 100644
index 0000000..28c3fb8
--- /dev/null
+++ b/configs/resnetv2/resnetv2_101_ascend.yaml
@@ -0,0 +1,51 @@
+# system
+mode: 0
+distribute: True
+num_parallel_workers: 8
+val_while_train: True
+
+# dataset
+dataset: "imagenet"
+data_dir: "/path/to/imagenet"
+shuffle: True
+dataset_download: False
+batch_size: 32
+drop_remainder: True
+
+# augmentation
+image_resize: 224
+scale: [0.08, 1.0]
+ratio: [0.75, 1.333]
+hflip: 0.5
+interpolation: "bilinear"
+crop_pct: 0.875
+
+# model
+model: "resnetv2_101"
+num_classes: 1000
+pretrained: False
+ckpt_path: ""
+keep_checkpoint_max: 10
+ckpt_save_dir: "./ckpt"
+epoch_size: 120
+dataset_sink_mode: True
+amp_level: "O2"
+
+# loss
+loss: "CE"
+label_smoothing: 0.1
+
+# lr scheduler
+scheduler: "cosine_decay"
+min_lr: 0.0
+lr: 0.1
+warmup_epochs: 0
+decay_epochs: 120
+
+# optimizer
+opt: "momentum"
+filter_bias_and_bn: True
+momentum: 0.9
+weight_decay: 0.0001
+loss_scale: 1024
+use_nesterov: False
diff --git a/configs/resnetv2/resnetv2_50_ascend.yaml b/configs/resnetv2/resnetv2_50_ascend.yaml
new file mode 100644
index 0000000..411034e
--- /dev/null
+++ b/configs/resnetv2/resnetv2_50_ascend.yaml
@@ -0,0 +1,51 @@
+# system
+mode: 0
+distribute: True
+num_parallel_workers: 8
+val_while_train: True
+
+# dataset
+dataset: "imagenet"
+data_dir: "/path/to/imagenet"
+shuffle: True
+dataset_download: False
+batch_size: 32
+drop_remainder: True
+
+# augmentation
+image_resize: 224
+scale: [0.08, 1.0]
+ratio: [0.75, 1.333]
+hflip: 0.5
+interpolation: "bilinear"
+crop_pct: 0.875
+
+# model
+model: "resnetv2_50"
+num_classes: 1000
+pretrained: False
+ckpt_path: ""
+keep_checkpoint_max: 30
+ckpt_save_dir: "./ckpt"
+epoch_size: 120
+dataset_sink_mode: True
+amp_level: "O2"
+
+# loss
+loss: "CE"
+label_smoothing: 0.1
+
+# lr scheduler
+scheduler: "cosine_decay"
+min_lr: 0.0
+lr: 0.1
+warmup_epochs: 0
+decay_epochs: 120
+
+# optimizer
+opt: "momentum"
+filter_bias_and_bn: True
+momentum: 0.9
+weight_decay: 0.0001
+loss_scale: 1024
+use_nesterov: False
diff --git a/configs/resnext/README.md b/configs/resnext/README.md
new file mode 100644
index 0000000..07cf74a
--- /dev/null
+++ b/configs/resnext/README.md
@@ -0,0 +1,93 @@
+# ResNeXt
+> [Aggregated Residual Transformations for Deep Neural Networks](https://arxiv.org/abs/1611.05431)
+
+## Introduction
+
+The authors present a simple, highly modularized network architecture for image classification. The network is
+constructed by repeating a building block that aggregates a set of transformations with the same topology. The simple
+design results in a homogeneous, multi-branch architecture that has only a few hyper-parameters to set. This strategy
+exposes a new dimension, which the authors call "cardinality" (the size of the set of transformations), as an essential
+factor in addition to the dimensions of depth and width. On the ImageNet-1K dataset, the authors empirically show that
+even under the restricted condition of maintaining complexity, increasing cardinality is able to improve classification
+accuracy.[[1](#references)]
+
+
+
+#### Notes
+- Context: Training context denoted as {device}x{pieces}-{MS mode}, where mindspore mode can be G - graph mode or F - pynative mode with ms function. For example, D910x8-G is for training on 8 pieces of Ascend 910 NPU using graph mode.
+- Top-1 and Top-5: Accuracy reported on the validation set of ImageNet-1K.
+
+
+## Quick Start
+
+### Preparation
+
+#### Installation
+Please refer to the [installation instruction](https://github.com/mindspore-lab/mindcv#installation) in MindCV.
+
+#### Dataset Preparation
+Please download the [ImageNet-1K](https://www.image-net.org/challenges/LSVRC/2012/index.php) dataset for model training and validation.
+
+### Training
+
+* Distributed Training
+
+It is easy to reproduce the reported results with the pre-defined training recipe. For distributed training on multiple Ascend 910 devices, please run
+
+```shell
+# distributed training on multiple GPU/Ascend devices
+mpirun -n 8 python train.py --config configs/shufflenet_v1/shufflenet_v1_0.5_ascend.yaml --data_dir /path/to/imagenet
+```
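+
+Both width variants of this recipe family provided in the repository are launched the same way; for example, to use the 1.0x recipe included in this diff instead:
+
+```shell
+# distributed training with the 1.0x recipe
+mpirun -n 8 python train.py --config configs/shufflenet_v1/shufflenet_v1_1.0_ascend.yaml --data_dir /path/to/imagenet
+```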
+
+> If the script is executed by the root user, the `--allow-run-as-root` parameter must be added to `mpirun`.
+
+Similarly, you can train the model on multiple GPU devices with the above `mpirun` command.
+
+For detailed illustration of all hyper-parameters, please refer to [config.py](https://github.com/mindspore-lab/mindcv/blob/main/config.py).
+
+**Note:** As the global batch size (batch_size x num_devices) is an important hyper-parameter, it is recommended to keep the global batch size unchanged for reproduction or adjust the learning rate linearly to a new global batch size.
+
+* Standalone Training
+
+If you want to train or finetune the model on a smaller dataset without distributed training, please run:
+
+```shell
+# standalone training on a CPU/GPU/Ascend device
+python train.py --config configs/shufflenet_v1/shufflenet_v1_0.5_ascend.yaml --data_dir /path/to/dataset --distribute False
+```
+
+### Validation
+
+To validate the accuracy of the trained model, you can use `validate.py` and pass the checkpoint path with `--ckpt_path`.
+
+```shell
+python validate.py -c configs/shufflenet_v1/shufflenet_v1_0.5_ascend.yaml --data_dir /path/to/imagenet --ckpt_path /path/to/ckpt
+```
+
+### Deployment
+
+To deploy online inference services with the trained model efficiently, please refer to the [deployment tutorial](https://github.com/mindspore-lab/mindcv/blob/main/tutorials/deployment.md).
+
+## References
+
+[1] Zhang X, Zhou X, Lin M, et al. Shufflenet: An extremely efficient convolutional neural network for mobile devices[C]//Proceedings of the IEEE conference on computer vision and pattern recognition. 2018: 6848-6856.
diff --git a/configs/shufflenet_v1/shufflenet_v1_0.5_ascend.yaml b/configs/shufflenet_v1/shufflenet_v1_0.5_ascend.yaml
new file mode 100644
index 0000000..68d5482
--- /dev/null
+++ b/configs/shufflenet_v1/shufflenet_v1_0.5_ascend.yaml
@@ -0,0 +1,49 @@
+# system
+mode: 0
+distribute: True
+num_parallel_workers: 8
+
+# dataset
+dataset: "imagenet"
+data_dir: "/path/to/imagenet"
+shuffle: True
+dataset_download: False
+batch_size: 64
+drop_remainder: True
+
+# augmentation
+image_resize: 224
+hflip: 0.5
+color_jitter: 0.4
+interpolation: "bilinear"
+crop_pct: 0.875
+
+# model
+model: "shufflenet_v1_g3_x0_5"
+num_classes: 1000
+pretrained: False
+ckpt_path: ""
+keep_checkpoint_max: 30
+ckpt_save_dir: "./ckpt"
+epoch_size: 250
+dataset_sink_mode: True
+amp_level: "O0"
+
+# loss
+loss: "CE"
+label_smoothing: 0.1
+
+# lr scheduler
+scheduler: "cosine_decay"
+min_lr: 0.0
+lr: 0.35
+warmup_epochs: 4
+decay_epochs: 246
+
+# optimizer
+opt: "momentum"
+filter_bias_and_bn: False
+momentum: 0.9
+weight_decay: 0.00004
+loss_scale: 1024
+use_nesterov: False
diff --git a/configs/shufflenet_v1/shufflenet_v1_1.0_ascend.yaml b/configs/shufflenet_v1/shufflenet_v1_1.0_ascend.yaml
new file mode 100644
index 0000000..08fc85e
--- /dev/null
+++ b/configs/shufflenet_v1/shufflenet_v1_1.0_ascend.yaml
@@ -0,0 +1,49 @@
+# system
+mode: 0
+distribute: True
+num_parallel_workers: 8
+
+# dataset
+dataset: "imagenet"
+data_dir: "/path/to/imagenet"
+shuffle: True
+dataset_download: False
+batch_size: 64
+drop_remainder: True
+
+# augmentation
+image_resize: 224
+hflip: 0.5
+color_jitter: 0.4
+interpolation: "bilinear"
+crop_pct: 0.875
+
+# model
+model: "shufflenet_v1_g3_x1_0"
+num_classes: 1000
+pretrained: False
+ckpt_path: ""
+keep_checkpoint_max: 30
+ckpt_save_dir: "./ckpt"
+epoch_size: 250
+dataset_sink_mode: True
+amp_level: "O0"
+
+# loss
+loss: "CE"
+label_smoothing: 0.1
+
+# lr scheduler
+scheduler: "cosine_decay"
+min_lr: 0.0
+lr: 0.4
+warmup_epochs: 4
+decay_epochs: 246
+
+# optimizer
+opt: "momentum"
+filter_bias_and_bn: False
+momentum: 0.9
+weight_decay: 0.00004
+loss_scale: 1024
+use_nesterov: False
diff --git a/configs/shufflenet_v2/README.md b/configs/shufflenet_v2/README.md
new file mode 100644
index 0000000..5797a21
--- /dev/null
+++ b/configs/shufflenet_v2/README.md
@@ -0,0 +1,98 @@
+# ShuffleNetV2
+> [ShuffleNet V2: Practical Guidelines for Efficient CNN Architecture Design](https://arxiv.org/abs/1807.11164)
+
+## Introduction
+
+A key observation in ShuffleNetV2 is that previous lightweight networks were guided by an indirect measure of network complexity, namely FLOPs: their speed was characterized by counting floating-point operations, but the actual running speed was never measured directly. The running speed on mobile devices depends not only on FLOPs but also on other factors such as memory access cost and platform characteristics.
+
+Based on these observations, ShuffleNetV2 proposes four practical network design guidelines (the first one is illustrated with a small numeric sketch after the list).
+
+- MAC is minimized when the number of input channels of a convolutional layer equals the number of output channels (with FLOPs kept constant).
+- MAC increases when the number of groups in group convolution (GConv) increases (with FLOPs kept constant).
+- The more fragmented the network design, the slower the speed.
+- The impact of element-wise operations is not negligible.
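+
+The following is a rough numeric sketch of the first guideline, not code from the paper or this repository: for a 1x1 convolution with a fixed FLOPs budget, the memory access cost (MAC) is smallest when the input and output channel counts are equal.
+
+```python
+def mac_1x1(h, w, c_in, c_out):
+    # feature-map reads/writes plus weight reads for a 1x1 convolution
+    return h * w * (c_in + c_out) + c_in * c_out
+
+h, w = 56, 56
+flops = h * w * 128 * 128          # keep FLOPs constant across settings
+for c_in in (32, 64, 128, 256, 512):
+    c_out = flops // (h * w * c_in)
+    print(f"c_in={c_in:4d}  c_out={c_out:4d}  MAC={mac_1x1(h, w, c_in, c_out):,}")
+# the balanced setting (128, 128) gives the smallest MAC
+```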
+
+
+
+
+
+ Figure 1. Architecture Design in ShuffleNetV2 [1]
+
+
+
+## Results
+
+Our reproduced model performance on ImageNet-1K is reported as follows.
+
+
+
+#### Notes
+- Context: Training context denoted as {device}x{pieces}-{MS mode}, where mindspore mode can be G - graph mode or F - pynative mode with ms function. For example, D910x8-G is for training on 8 pieces of Ascend 910 NPU using graph mode.
+- Top-1 and Top-5: Accuracy reported on the validation set of ImageNet-1K.
+
+
+## Quick Start
+### Preparation
+
+#### Installation
+Please refer to the [installation instruction](https://github.com/mindspore-lab/mindcv#installation) in MindCV.
+
+#### Dataset Preparation
+Please download the [ImageNet-1K](https://www.image-net.org/challenges/LSVRC/2012/index.php) dataset for model training and validation.
+
+### Training
+
+
+* Distributed Training
+
+It is easy to reproduce the reported results with the pre-defined training recipe. For distributed training on multiple Ascend 910 devices, please run
+
+```shell
+# distributed training on multiple GPU/Ascend devices
+mpirun -n 8 python train.py --config configs/shufflenet_v2/shufflenet_v2_0.5_ascend.yaml --data_dir /path/to/imagenet
+```
+
+Similarly, you can train the model on multiple GPU devices with the above `mpirun` command.
+
+For detailed illustration of all hyper-parameters, please refer to [config.py](https://github.com/mindspore-lab/mindcv/blob/main/config.py).
+
+**Note:** As the global batch size (batch_size x num_devices) is an important hyper-parameter, it is recommended to keep the global batch size unchanged for reproduction or adjust the learning rate linearly to a new global batch size.
+
+* Standalone Training
+
+If you want to train or finetune the model on a smaller dataset without distributed training, please run:
+
+```shell
+# standalone training on a CPU/GPU/Ascend device
+python train.py --config configs/shufflenet_v2/shufflenet_v2_0.5_ascend.yaml --data_dir /path/to/dataset --distribute False
+```
+
+### Validation
+
+To validate the accuracy of the trained model, you can use `validate.py` and pass the checkpoint path with `--ckpt_path`.
+
+```shell
+python validate.py -c configs/shufflenet_v2/shufflenet_v2_0.5_ascend.yaml --data_dir /path/to/imagenet --ckpt_path /path/to/ckpt
+```
+
+### Deployment
+
+To deploy online inference services with the trained model efficiently, please refer to the [deployment tutorial](https://github.com/mindspore-lab/mindcv/blob/main/tutorials/deployment.md).
+
+## References
+
+[1] Ma N, Zhang X, Zheng H T, et al. Shufflenet v2: Practical guidelines for efficient cnn architecture design[C]//Proceedings of the European conference on computer vision (ECCV). 2018: 116-131.
diff --git a/configs/shufflenet_v2/shufflenet_v2_0.5_ascend.yaml b/configs/shufflenet_v2/shufflenet_v2_0.5_ascend.yaml
new file mode 100644
index 0000000..dbe7a90
--- /dev/null
+++ b/configs/shufflenet_v2/shufflenet_v2_0.5_ascend.yaml
@@ -0,0 +1,50 @@
+# system
+mode: 0
+distribute: True
+num_parallel_workers: 8
+
+# dataset
+dataset: 'imagenet'
+data_dir: '/path/to/imagenet'
+shuffle: True
+dataset_download: False
+batch_size: 64
+drop_remainder: True
+
+# augmentation
+image_resize: 224
+hflip: 0.5
+color_jitter: 0.4
+interpolation: 'bilinear'
+crop_pct: 0.875
+# re_prob: 0.5
+
+# model
+model: 'shufflenet_v2_x0_5'
+num_classes: 1000
+pretrained: False
+ckpt_path: ""
+keep_checkpoint_max: 30
+ckpt_save_dir: './ckpt'
+epoch_size: 250
+dataset_sink_mode: True
+amp_level: 'O0'
+
+# loss
+loss: 'CE'
+label_smoothing: 0.1
+
+# lr scheduler
+scheduler: 'cosine_decay'
+min_lr: 0.0
+lr: 0.35
+warmup_epochs: 4
+decay_epochs: 246
+
+# optimizer
+opt: 'momentum'
+filter_bias_and_bn: False
+momentum: 0.9
+weight_decay: 0.00004
+loss_scale: 1
+use_nesterov: False
diff --git a/configs/shufflenet_v2/shufflenet_v2_1.0_ascend.yaml b/configs/shufflenet_v2/shufflenet_v2_1.0_ascend.yaml
new file mode 100644
index 0000000..c82e141
--- /dev/null
+++ b/configs/shufflenet_v2/shufflenet_v2_1.0_ascend.yaml
@@ -0,0 +1,50 @@
+# system
+mode: 0
+distribute: True
+num_parallel_workers: 8
+
+# dataset
+dataset: 'imagenet'
+data_dir: '/data/imagenet'
+shuffle: True
+dataset_download: False
+batch_size: 64
+drop_remainder: True
+
+# augmentation
+image_resize: 224
+hflip: 0.5
+color_jitter: 0.4
+interpolation: 'bilinear'
+crop_pct: 0.875
+re_prob: 0.5
+
+# model
+model: 'shufflenet_v2_x1_0'
+num_classes: 1000
+pretrained: False
+ckpt_path: ""
+keep_checkpoint_max: 30
+ckpt_save_dir: './ckpt'
+epoch_size: 300
+dataset_sink_mode: True
+amp_level: 'O0'
+
+# loss
+loss: 'CE'
+label_smoothing: 0.1
+
+# lr scheduler
+scheduler: 'cosine_decay'
+min_lr: 0.0
+lr: 0.4
+warmup_epochs: 5
+decay_epochs: 295
+
+# optimizer
+opt: 'momentum'
+filter_bias_and_bn: False
+momentum: 0.9
+weight_decay: 0.00004
+loss_scale: 1
+use_nesterov: False
diff --git a/configs/shufflenet_v2/shufflenet_v2_1.5_ascend.yaml b/configs/shufflenet_v2/shufflenet_v2_1.5_ascend.yaml
new file mode 100644
index 0000000..e3fedcc
--- /dev/null
+++ b/configs/shufflenet_v2/shufflenet_v2_1.5_ascend.yaml
@@ -0,0 +1,50 @@
+# system
+mode: 0
+distribute: True
+num_parallel_workers: 8
+
+# dataset
+dataset: 'imagenet'
+data_dir: '/data/imagenet'
+shuffle: True
+dataset_download: False
+batch_size: 64
+drop_remainder: True
+
+# augmentation
+image_resize: 224
+hflip: 0.5
+color_jitter: 0.4
+interpolation: 'bilinear'
+crop_pct: 0.875
+re_prob: 0.5
+
+# model
+model: 'shufflenet_v2_x1_5'
+num_classes: 1000
+pretrained: False
+ckpt_path: ""
+keep_checkpoint_max: 30
+ckpt_save_dir: './ckpt'
+epoch_size: 300
+dataset_sink_mode: True
+amp_level: 'O0'
+
+# loss
+loss: 'CE'
+label_smoothing: 0.1
+
+# lr scheduler
+scheduler: 'cosine_decay'
+min_lr: 0.0
+lr: 0.5
+warmup_epochs: 4
+decay_epochs: 246
+
+# optimizer
+opt: 'momentum'
+filter_bias_and_bn: False
+momentum: 0.9
+weight_decay: 0.00004
+loss_scale: 1
+use_nesterov: False
diff --git a/configs/shufflenet_v2/shufflenet_v2_2.0_ascend.yaml b/configs/shufflenet_v2/shufflenet_v2_2.0_ascend.yaml
new file mode 100644
index 0000000..10b4598
--- /dev/null
+++ b/configs/shufflenet_v2/shufflenet_v2_2.0_ascend.yaml
@@ -0,0 +1,50 @@
+# system
+mode: 0
+distribute: True
+num_parallel_workers: 8
+
+# dataset
+dataset: 'imagenet'
+data_dir: '/data/imagenet'
+shuffle: True
+dataset_download: False
+batch_size: 64
+drop_remainder: True
+
+# augmentation
+image_resize: 224
+hflip: 0.5
+color_jitter: 0.4
+interpolation: 'bilinear'
+crop_pct: 0.875
+re_prob: 0.5
+
+# model
+model: 'shufflenet_v2_x2_0'
+num_classes: 1000
+pretrained: False
+ckpt_path: ""
+keep_checkpoint_max: 30
+ckpt_save_dir: './ckpt'
+epoch_size: 300
+dataset_sink_mode: True
+amp_level: 'O0'
+
+# loss
+loss: 'CE'
+label_smoothing: 0.1
+
+# lr scheduler
+scheduler: 'cosine_decay'
+min_lr: 0.0
+lr: 0.5
+warmup_epochs: 5
+decay_epochs: 295
+
+# optimizer
+opt: 'momentum'
+filter_bias_and_bn: False
+momentum: 0.9
+weight_decay: 0.00004
+loss_scale: 1
+use_nesterov: False
diff --git a/configs/sknet/README.md b/configs/sknet/README.md
new file mode 100644
index 0000000..01da4aa
--- /dev/null
+++ b/configs/sknet/README.md
@@ -0,0 +1,106 @@
+# SKNet
+
+> [Selective Kernel Networks](https://arxiv.org/pdf/1903.06586)
+
+## Introduction
+
+The local receptive fields (RFs) of neurons in the primary visual cortex (V1) of cats [[1](#references)] have inspired the
+construction of Convolutional Neural Networks (CNNs) [[2](#references)] in the last century, and they continue to inspire modern CNN
+architecture design. For instance, it is well known that in the visual cortex, the RF sizes of neurons in the
+same area (e.g., the V1 region) are different, which enables the neurons to collect multi-scale spatial information in the
+same processing stage. This mechanism has been widely adopted in recent Convolutional Neural Networks (CNNs).
+A typical example is InceptionNets [[3](#references), [4](#references), [5](#references), [6](#references)], in which a simple concatenation is designed to aggregate
+multi-scale information from, e.g., 3×3, 5×5, 7×7 convolutional kernels inside the “inception” building block.
+
+
+
+#### Notes
+- Context: Training context denoted as {device}x{pieces}-{MS mode}, where mindspore mode can be G - graph mode or F - pynative mode with ms function. For example, D910x8-G is for training on 8 pieces of Ascend 910 NPU using graph mode.
+- Top-1 and Top-5: Accuracy reported on the validation set of ImageNet-1K.
+
+
+## Quick Start
+### Preparation
+
+#### Installation
+Please refer to the [installation instruction](https://github.com/mindspore-lab/mindcv#installation) in MindCV.
+
+#### Dataset Preparation
+Please download the [ImageNet-1K](https://www.image-net.org/challenges/LSVRC/2012/index.php) dataset for model training and validation.
+
+### Training
+
+
+* Distributed Training
+
+It is easy to reproduce the reported results with the pre-defined training recipe. For distributed training on multiple Ascend 910 devices, please run
+
+```shell
+# distributed training on multiple GPU/Ascend devices
+mpirun -n 8 python train.py --config configs/sknet/skresnext50_32x4d_ascend.yaml --data_dir /path/to/imagenet
+```
+
+Similarly, you can train the model on multiple GPU devices with the above `mpirun` command.
+
+For detailed illustration of all hyper-parameters, please refer to [config.py](https://github.com/mindspore-lab/mindcv/blob/main/config.py).
+
+**Note:** As the global batch size (batch_size x num_devices) is an important hyper-parameter, it is recommended to keep the global batch size unchanged for reproduction or adjust the learning rate linearly to a new global batch size.
+
+* Standalone Training
+
+If you want to train or finetune the model on a smaller dataset without distributed training, please run:
+
+```shell
+# standalone training on a CPU/GPU/Ascend device
+python train.py --config configs/sknet/skresnext50_32x4d_ascend.yaml --data_dir /path/to/dataset --distribute False
+```
+
+### Validation
+
+To validate the accuracy of the trained model, you can use `validate.py` and pass the checkpoint path with `--ckpt_path`.
+
+```shell
+python validate.py -c configs/sknet/skresnext50_32x4d_ascend.yaml --data_dir /path/to/imagenet --ckpt_path /path/to/ckpt
+```
+
+### Deployment
+
+To deploy online inference services with the trained model efficiently, please refer to the [deployment tutorial](https://github.com/mindspore-lab/mindcv/blob/main/tutorials/deployment.md).
+
+
+## References
+
+[1] D. H. Hubel and T. N. Wiesel. Receptive fields, binocular interaction and functional architecture in the cat’s visual
+cortex. The Journal of Physiology, 1962.
+
+[2] Y. LeCun, B. Boser, J. S. Denker, D. Henderson, R. E. Howard, W. Hubbard, and L. D. Jackel. Backpropagation
+applied to handwritten zip code recognition. Neural Computation, 1989.
+
+[3] C. Szegedy, V. Vanhoucke, S. Ioffe, J. Shlens, and Z. Wojna. Rethinking the inception architecture for computer vision. In
+CVPR, 2016.
+
+[4] S. Ioffe and C. Szegedy. Batch normalization: Accelerating deep network training by reducing internal covariate shift.
+arXiv preprint arXiv:1502.03167, 2015.
+
+[5] C. Szegedy, V. Vanhoucke, S. Ioffe, J. Shlens, and Z. Wojna. Rethinking the inception architecture for computer vision. In
+CVPR, 2016.
+
+[6] C. Szegedy, S. Ioffe, V. Vanhoucke, and A. A. Alemi. Inception-v4, inception-resnet and the impact of residual
+connections on learning. In AAAI, 2017.
diff --git a/configs/sknet/skresnet18_ascend.yaml b/configs/sknet/skresnet18_ascend.yaml
new file mode 100644
index 0000000..b43bb46
--- /dev/null
+++ b/configs/sknet/skresnet18_ascend.yaml
@@ -0,0 +1,51 @@
+# system
+mode: 0
+distribute: True
+num_parallel_workers: 8
+val_interval: 1
+
+# dataset
+dataset: "imagenet"
+data_dir: "/path/to/imagenet"
+shuffle: True
+dataset_download: False
+batch_size: 64
+drop_remainder: True
+
+# augmentation
+image_resize: 224
+scale: [0.08, 1.0]
+ratio: [0.75, 1.333]
+hflip: 0.5
+interpolation: "bicubic"
+crop_pct: 0.875
+
+# model
+model: "skresnet18"
+num_classes: 1000
+pretrained: False
+ckpt_path: ""
+keep_checkpoint_max: 30
+ckpt_save_dir: "./ckpt"
+epoch_size: 200
+dataset_sink_mode: True
+amp_level: "O2"
+
+# loss
+loss: "CE"
+label_smoothing: 0.1
+
+# lr scheduler
+scheduler: "cosine_decay"
+min_lr: 0.0
+lr: 0.1
+warmup_epochs: 5
+decay_epochs: 195
+
+# optimizer
+opt: "momentum"
+filter_bias_and_bn: True
+momentum: 0.9
+weight_decay: 0.00004
+loss_scale: 1024
+use_nesterov: False
diff --git a/configs/sknet/skresnet34_ascend.yaml b/configs/sknet/skresnet34_ascend.yaml
new file mode 100644
index 0000000..e287151
--- /dev/null
+++ b/configs/sknet/skresnet34_ascend.yaml
@@ -0,0 +1,54 @@
+# system
+mode: 0
+distribute: True
+num_parallel_workers: 8
+val_while_train: True
+val_interval: 1
+
+# dataset
+dataset: "imagenet"
+data_dir: "/path/to/imagenet"
+shuffle: True
+dataset_download: False
+batch_size: 32
+drop_remainder: True
+
+# augmentation
+image_resize: 224
+scale: [0.08, 1.0]
+ratio: [0.75, 1.333]
+hflip: 0.5
+interpolation: "bilinear"
+crop_pct: 0.875
+re_prob: 0.25
+
+# model
+model: "skresnet34"
+num_classes: 1000
+pretrained: False
+ckpt_path: ""
+keep_checkpoint_max: 30
+ckpt_save_dir: "./ckpt"
+epoch_size: 250
+dataset_sink_mode: True
+amp_level: "O2"
+
+# loss
+loss: "CE"
+label_smoothing: 0.1
+
+# lr scheduler
+scheduler: "cosine_decay"
+min_lr: 0.0
+lr: 0.1
+warmup_epochs: 5
+decay_epochs: 245
+warmup_factor: 0.01
+
+# optimizer
+opt: "momentum"
+filter_bias_and_bn: True
+momentum: 0.9
+weight_decay: 0.00004
+loss_scale: 128
+use_nesterov: False
diff --git a/configs/sknet/skresnext50_32x4d_ascend.yaml b/configs/sknet/skresnext50_32x4d_ascend.yaml
new file mode 100644
index 0000000..5e991f2
--- /dev/null
+++ b/configs/sknet/skresnext50_32x4d_ascend.yaml
@@ -0,0 +1,51 @@
+# system
+mode: 0
+distribute: True
+num_parallel_workers: 8
+val_interval: 1
+
+# dataset
+dataset: "imagenet"
+data_dir: "/path/to/imagenet"
+shuffle: True
+dataset_download: False
+batch_size: 64
+
+# augmentation
+image_resize: 224
+scale: [0.08, 1.0]
+ratio: [0.75, 1.333]
+hflip: 0.5
+interpolation: "bicubic"
+crop_pct: 0.875
+re_prob: 0.1
+
+# model
+model: "skresnext50_32x4d"
+num_classes: 1000
+pretrained: False
+ckpt_path: ""
+keep_checkpoint_max: 30
+ckpt_save_dir: "./ckpt"
+epoch_size: 200
+dataset_sink_mode: True
+amp_level: "O2"
+
+# loss
+loss: "CE"
+label_smoothing: 0.1
+
+# lr scheduler
+scheduler: "cosine_decay"
+min_lr: 0.0
+lr: 0.1
+warmup_epochs: 5
+decay_epochs: 195
+
+# optimizer
+opt: "momentum"
+filter_bias_and_bn: True
+momentum: 0.9
+weight_decay: 0.0001
+loss_scale: 1024
+use_nesterov: False
diff --git a/configs/squeezenet/README.md b/configs/squeezenet/README.md
new file mode 100644
index 0000000..5c702d4
--- /dev/null
+++ b/configs/squeezenet/README.md
@@ -0,0 +1,91 @@
+# SqueezeNet
+
+> [SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and <0.5MB model size](https://arxiv.org/abs/1602.07360)
+
+## Introduction
+
+SqueezeNet is a smaller CNN architecture which is comprised mainly of Fire modules, and it achieves AlexNet-level
+accuracy on ImageNet with 50x fewer parameters. SqueezeNet can offer at least three advantages: (1) Smaller CNNs require
+less communication across servers during distributed training. (2) Smaller CNNs require less bandwidth to export a new
+model from the cloud to an autonomous car. (3) Smaller CNNs are more feasible to deploy on FPGAs and other hardware with
+limited memory. Additionally, with model compression techniques, SqueezeNet can be compressed to less than
+0.5MB (510× smaller than AlexNet). Below is a macroarchitectural view of the SqueezeNet architecture. Left: SqueezeNet;
+Middle: SqueezeNet with simple bypass; Right: SqueezeNet with complex bypass.
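+
+As a rough, illustrative calculation (not the actual SqueezeNet code), the weight count of a single Fire module can be compared with a plain 3x3 convolution producing the same number of output channels; the squeeze/expand sizes below are assumptions chosen only for illustration:
+
+```python
+def fire_params(c_in, s1x1, e1x1, e3x3):
+    squeeze = 1 * 1 * c_in * s1x1                        # squeeze layer: 1x1 convolutions
+    expand = 1 * 1 * s1x1 * e1x1 + 3 * 3 * s1x1 * e3x3   # expand layer: 1x1 and 3x3 convolutions
+    return squeeze + expand
+
+c_in, c_out = 128, 128
+plain_3x3 = 3 * 3 * c_in * c_out                         # 147456 weights
+fire = fire_params(c_in, s1x1=16, e1x1=64, e3x3=64)      # 12288 weights (e1x1 + e3x3 = c_out)
+print(plain_3x3, fire)
+```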
+
+
+
+#### Notes
+
+- Context: Training context denoted as {device}x{pieces}-{MS mode}, where mindspore mode can be G - graph mode or F - pynative mode with ms function. For example, D910x8-G is for training on 8 pieces of Ascend 910 NPU using graph mode.
+- Top-1 and Top-5: Accuracy reported on the validation set of ImageNet-1K.
+
+## Quick Start
+
+### Preparation
+
+#### Installation
+Please refer to the [installation instruction](https://github.com/mindspore-lab/mindcv#installation) in MindCV.
+
+#### Dataset Preparation
+Please download the [ImageNet-1K](https://www.image-net.org/challenges/LSVRC/2012/index.php) dataset for model training and validation.
+
+### Training
+
+
+* Distributed Training
+
+It is easy to reproduce the reported results with the pre-defined training recipe. For distributed training on multiple Ascend 910 devices, please run
+
+```shell
+# distributed training on multiple GPU/Ascend devices
+mpirun -n 8 python train.py --config configs/swintransformer/swin_tiny_ascend.yaml --data_dir /path/to/imagenet
+```
+> If the script is executed by the root user, the `--allow-run-as-root` parameter must be added to `mpirun`.
+
+Similarly, you can train the model on multiple GPU devices with the above `mpirun` command.
+
+For detailed illustration of all hyper-parameters, please refer to [config.py](https://github.com/mindspore-lab/mindcv/blob/main/config.py).
+
+**Note:** As the global batch size (batch_size x num_devices) is an important hyper-parameter, it is recommended to keep the global batch size unchanged for reproduction or adjust the learning rate linearly to a new global batch size.
+
+* Standalone Training
+
+If you want to train or finetune the model on a smaller dataset without distributed training, please run:
+
+```shell
+# standalone training on a CPU/GPU/Ascend device
+python train.py --config configs/swintransformer/swin_tiny_ascend.yaml --data_dir /path/to/dataset --distribute False
+```
+
+### Validation
+
+To validate the accuracy of the trained model, you can use `validate.py` and pass the checkpoint path with `--ckpt_path`.
+
+```shell
+python validate.py -c configs/swintransformer/swin_tiny_ascend.yaml --data_dir /path/to/imagenet --ckpt_path /path/to/ckpt
+```
+
+### Deployment
+
+Please refer to the [deployment tutorial](https://github.com/mindspore-lab/mindcv/blob/main/tutorials/deployment.md) in MindCV.
+
+## References
+
+
+[1] Liu Z, Lin Y, Cao Y, et al. Swin transformer: Hierarchical vision transformer using shifted windows[C]//Proceedings of the IEEE/CVF international conference on computer vision. 2021: 10012-10022.
diff --git a/configs/swintransformer/swin_tiny_ascend.yaml b/configs/swintransformer/swin_tiny_ascend.yaml
new file mode 100644
index 0000000..4d2c2ae
--- /dev/null
+++ b/configs/swintransformer/swin_tiny_ascend.yaml
@@ -0,0 +1,59 @@
+# system
+mode: 0
+distribute: True
+num_parallel_workers: 8
+val_while_train: True
+val_interval: 1
+
+# dataset
+dataset: "imagenet"
+data_dir: "/path/to/imagenet"
+shuffle: True
+dataset_download: False
+batch_size: 256
+drop_remainder: True
+
+# augmentation
+image_resize: 224
+scale: [0.08, 1.0]
+ratio: [0.75, 1.333]
+hflip: 0.5
+interpolation: "bicubic"
+re_prob: 0.1
+mixup: 0.2
+cutmix: 1.0
+cutmix_prob: 1.0
+crop_pct: 0.875
+color_jitter: [0.4, 0.4, 0.4]
+auto_augment: "randaug-m7-mstd0.5"
+
+# model
+model: "swin_tiny"
+num_classes: 1000
+pretrained: False
+ckpt_path: ""
+keep_checkpoint_max: 10
+ckpt_save_policy: "top_k"
+ckpt_save_dir: "./ckpt"
+epoch_size: 600
+dataset_sink_mode: True
+amp_level: "O2"
+
+# loss
+loss: "CE"
+loss_scale: 1024.0
+label_smoothing: 0.1
+
+# lr scheduler
+scheduler: "cosine_decay"
+lr: 0.003
+min_lr: 1e-6
+warmup_epochs: 32
+decay_epochs: 568
+lr_epoch_stair: False
+
+# optimizer
+opt: "adamw"
+weight_decay: 0.025
+filter_bias_and_bn: True
+use_nesterov: False
diff --git a/configs/vgg/README.md b/configs/vgg/README.md
new file mode 100644
index 0000000..e0aa3e5
--- /dev/null
+++ b/configs/vgg/README.md
@@ -0,0 +1,102 @@
+# VGGNet
+
+> [Very Deep Convolutional Networks for Large-Scale Image Recognition](https://arxiv.org/abs/1409.1556)
+
+## Introduction
+
+
+
+Figure 1 shows the model architecture of VGGNet. VGGNet is a key milestone for the image classification task and was among the first to extend network depth to 16-19 layers. The key insight of this model is
+that stacking 3x3 kernels is efficient: a stack of 3x3 kernels covers the same receptive field as a single 5x5 or 7x7 kernel while using fewer parameters. VGGNet achieves better model performance compared with previous
+methods such as GoogLeNet and AlexNet on the ImageNet-1K dataset. [[1](#references)]
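+
+A quick back-of-the-envelope check of this claim (an illustrative sketch only): two stacked 3x3 convolutions cover the same 5x5 receptive field as a single 5x5 convolution while using fewer weights.
+
+```python
+C = 256                           # input and output channels
+one_5x5 = 5 * 5 * C * C           # a single 5x5 convolution
+two_3x3 = 2 * (3 * 3 * C * C)     # two stacked 3x3 convolutions, same receptive field
+print(one_5x5, two_3x3)           # 1638400 vs 1179648, roughly 28% fewer weights
+```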
+
+
+
+#### Notes
+- Context: Training context denoted as {device}x{pieces}-{MS mode}, where mindspore mode can be G - graph mode or F - pynative mode with ms function. For example, D910x8-G is for training on 8 pieces of Ascend 910 NPU using graph mode.
+- Top-1 and Top-5: Accuracy reported on the validation set of ImageNet-1K.
+
+## Quick Start
+### Preparation
+
+#### Installation
+Please refer to the [installation instruction](https://github.com/mindspore-lab/mindcv#installation) in MindCV.
+
+#### Dataset Preparation
+Please download the [ImageNet-1K](https://www.image-net.org/challenges/LSVRC/2012/index.php) dataset for model training and validation.
+
+### Training
+
+* Distributed Training
+
+It is easy to reproduce the reported results with the pre-defined training recipe. For distributed training on multiple Ascend 910 devices, please run
+
+```shell
+# distributed training on multiple GPU/Ascend devices
+mpirun -n 8 python train.py --config configs/visformer/visformer_tiny_ascend.yaml --data_dir /path/to/imagenet
+```
+
+Similarly, you can train the model on multiple GPU devices with the above `mpirun` command.
+
+For detailed illustration of all hyper-parameters, please refer to [config.py](https://github.com/mindspore-lab/mindcv/blob/main/config.py).
+
+**Note:** As the global batch size (batch_size x num_devices) is an important hyper-parameter, it is recommended to keep the global batch size unchanged for reproduction or adjust the learning rate linearly to a new global batch size.
+
+* Standalone Training
+
+If you want to train or finetune the model on a smaller dataset without distributed training, please run:
+
+```shell
+# standalone training on a CPU/GPU/Ascend device
+python train.py --config configs/visformer/visformer_tiny_ascend.yaml --data_dir /path/to/dataset --distribute False
+```
+
+### Validation
+
+To validate the accuracy of the trained model, you can use `validate.py` and pass the checkpoint path with `--ckpt_path`.
+
+```shell
+python validate.py -c configs/visformer/visformer_tiny_ascend.yaml --data_dir /path/to/imagenet --ckpt_path /path/to/ckpt
+```
+
+### Deployment
+
+To deploy online inference services with the trained model efficiently, please refer to the [deployment tutorial](https://github.com/mindspore-lab/mindcv/blob/main/tutorials/deployment.md).
+
+## References
+[1] Chen Z, Xie L, Niu J, et al. Visformer: The vision-friendly transformer. Proceedings of the IEEE/CVF International Conference on Computer Vision. 2021: 589-598.
+
+[2] Visformer, https://paperswithcode.com/method/visformer
diff --git a/configs/visformer/visformer_small_ascend.yaml b/configs/visformer/visformer_small_ascend.yaml
new file mode 100644
index 0000000..85aa930
--- /dev/null
+++ b/configs/visformer/visformer_small_ascend.yaml
@@ -0,0 +1,51 @@
+# system
+mode: 0
+distribute: True
+num_parallel_workers: 8
+val_while_train: True
+
+# dataset
+dataset: 'imagenet'
+data_dir: '/path/to/imagenet'
+shuffle: True
+dataset_download: False
+batch_size: 64
+drop_remainder: True
+
+# augmentation
+interpolation: BICUBIC
+re_prob: 0.25
+mixup: 0.8
+cutmix: 1.0
+cutmix_prob: 1.0
+auto_augment: autoaug
+
+# model
+model: 'visformer_small'
+num_classes: 1000
+pretrained: False
+ckpt_path: ''
+keep_checkpoint_max: 10
+ckpt_save_dir: './ckpt'
+epoch_size: 300
+dataset_sink_mode: True
+amp_level: 'O3'
+
+# loss
+loss: 'CE'
+label_smoothing: 0.1
+
+# lr scheduler
+scheduler: 'cosine_decay'
+lr_epoch_stair: True
+lr: 0.0005
+min_lr: 0.00001
+warmup_factor: 0.002
+warmup_epochs: 5
+decay_epochs: 295
+
+# optimizer
+opt: 'adamw'
+momentum: 0.9
+weight_decay: 0.05
+loss_scale: 1024
diff --git a/configs/visformer/visformer_small_v2_ascend.yaml b/configs/visformer/visformer_small_v2_ascend.yaml
new file mode 100644
index 0000000..1b3f2fe
--- /dev/null
+++ b/configs/visformer/visformer_small_v2_ascend.yaml
@@ -0,0 +1,51 @@
+# system
+mode: 0
+distribute: True
+num_parallel_workers: 8
+val_while_train: True
+
+# dataset
+dataset: 'imagenet'
+data_dir: '/path/to/imagenet'
+shuffle: True
+dataset_download: False
+batch_size: 64
+drop_remainder: True
+
+# augmentation
+interpolation: BICUBIC
+re_prob: 0.25
+mixup: 0.8
+cutmix: 1.0
+cutmix_prob: 1.0
+auto_augment: autoaug
+
+# model
+model: 'visformer_small_v2'
+num_classes: 1000
+pretrained: False
+ckpt_path: ''
+keep_checkpoint_max: 10
+ckpt_save_dir: './ckpt'
+epoch_size: 300
+dataset_sink_mode: True
+amp_level: 'O3'
+
+# loss
+loss: 'CE'
+label_smoothing: 0.1
+
+# lr scheduler
+scheduler: 'cosine_decay'
+lr_epoch_stair: True
+lr: 0.0005
+min_lr: 0.00001
+warmup_factor: 0.002
+warmup_epochs: 5
+decay_epochs: 295
+
+# optimizer
+opt: 'adamw'
+momentum: 0.9
+weight_decay: 0.05
+loss_scale: 1024
diff --git a/configs/visformer/visformer_tiny_ascend.yaml b/configs/visformer/visformer_tiny_ascend.yaml
new file mode 100644
index 0000000..722d847
--- /dev/null
+++ b/configs/visformer/visformer_tiny_ascend.yaml
@@ -0,0 +1,51 @@
+# system
+mode: 0
+distribute: True
+num_parallel_workers: 8
+val_while_train: True
+
+# dataset
+dataset: 'imagenet'
+data_dir: '/path/to/imagenet'
+shuffle: True
+dataset_download: False
+batch_size: 128
+drop_remainder: True
+
+# augmentation
+interpolation: BICUBIC
+re_prob: 0.25
+mixup: 0.8
+cutmix: 1.0
+cutmix_prob: 1.0
+auto_augment: autoaug
+
+# model
+model: 'visformer_tiny'
+num_classes: 1000
+pretrained: False
+ckpt_path: ''
+keep_checkpoint_max: 10
+ckpt_save_dir: './ckpt'
+epoch_size: 300
+dataset_sink_mode: True
+amp_level: 'O3'
+
+# loss
+loss: 'CE'
+label_smoothing: 0.1
+
+# lr scheduler
+scheduler: 'cosine_decay'
+lr_epoch_stair: True
+lr: 0.0005
+min_lr: 0.00001
+warmup_factor: 0.002
+warmup_epochs: 5
+decay_epochs: 295
+
+# optimizer
+opt: 'adamw'
+momentum: 0.9
+weight_decay: 0.05
+loss_scale: 1024
diff --git a/configs/visformer/visformer_tiny_v2_ascend.yaml b/configs/visformer/visformer_tiny_v2_ascend.yaml
new file mode 100644
index 0000000..1a46f3f
--- /dev/null
+++ b/configs/visformer/visformer_tiny_v2_ascend.yaml
@@ -0,0 +1,51 @@
+# system
+mode: 0
+distribute: True
+num_parallel_workers: 8
+val_while_train: True
+
+# dataset
+dataset: 'imagenet'
+data_dir: '/path/to/imagenet'
+shuffle: True
+dataset_download: False
+batch_size: 256
+drop_remainder: True
+
+# augmentation
+interpolation: BICUBIC
+re_prob: 0.25
+mixup: 0.8
+cutmix: 1.0
+cutmix_prob: 1.0
+auto_augment: autoaug
+
+# model
+model: 'visformer_tiny_v2'
+num_classes: 1000
+pretrained: False
+ckpt_path: ''
+keep_checkpoint_max: 10
+ckpt_save_dir: './ckpt'
+epoch_size: 400
+dataset_sink_mode: True
+amp_level: 'O3'
+
+# loss
+loss: 'CE'
+label_smoothing: 0.1
+
+# lr scheduler
+scheduler: 'cosine_decay'
+lr_epoch_stair: True
+lr: 0.001
+min_lr: 0.00002
+warmup_factor: 0.002
+warmup_epochs: 5
+decay_epochs: 295
+
+# optimizer
+opt: 'adamw'
+momentum: 0.9
+weight_decay: 0.05
+loss_scale: 1024
diff --git a/configs/vit/README.md b/configs/vit/README.md
new file mode 100644
index 0000000..b44ea1a
--- /dev/null
+++ b/configs/vit/README.md
@@ -0,0 +1,104 @@
+# ViT
+
+
+> [An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale](https://arxiv.org/abs/2010.11929)
+
+## Introduction
+
+
+Vision Transformer (ViT) achieves remarkable results compared to convolutional neural networks (CNNs) while requiring fewer computational resources for pre-training. Compared with CNNs, ViT shows a generally weaker inductive bias, resulting in an increased reliance on model regularization or data augmentation (AugReg) when training on smaller datasets.
+
+ViT is a visual model based on the transformer architecture originally designed for text-based tasks, as shown in the figure below. The model represents an input image as a sequence of image patches, analogous to the sequence of word embeddings used when applying transformers to text, and directly predicts class labels for the image. ViT exhibits extraordinary performance when trained on enough data, surpassing a comparable state-of-the-art CNN while using 4x fewer computational resources. [[2](#references)]
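+
+The patch construction itself is just a reshape; below is a minimal NumPy sketch (illustrative only, not the MindCV implementation) of turning a 224x224 image into a sequence of flattened 16x16 patches:
+
+```python
+import numpy as np
+
+image = np.random.rand(224, 224, 3)                  # H x W x C
+p = 16                                               # patch size
+patches = image.reshape(224 // p, p, 224 // p, p, 3)
+patches = patches.transpose(0, 2, 1, 3, 4).reshape(-1, p * p * 3)
+print(patches.shape)                                 # (196, 768): 14x14 patches, each flattened
+```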
+
+
+
+
+
+#### Notes
+- Context: Training context denoted as {device}x{pieces}-{MS mode}, where mindspore mode can be G - graph mode or F - pynative mode with ms function. For example, D910x8-G is for training on 8 pieces of Ascend 910 NPU using graph mode.
+- Top-1 and Top-5: Accuracy reported on the validation set of ImageNet-1K.
+
+
+## Quick Start
+### Preparation
+
+#### Installation
+Please refer to the [installation instruction](https://github.com/mindspore-lab/mindcv#installation) in MindCV.
+
+#### Dataset Preparation
+Please download the [ImageNet-1K](https://www.image-net.org/challenges/LSVRC/2012/index.php) dataset for model training and validation.
+
+### Training
+
+
+* Distributed Training
+
+It is easy to reproduce the reported results with the pre-defined training recipe. For distributed training on multiple Ascend 910 devices, please run
+
+```shell
+# distributed training on multiple GPU/Ascend devices
+mpirun -n 8 python train.py --config configs/vit/vit_b32_224_ascend.yaml --data_dir /path/to/imagenet
+```
+> If the script is executed by the root user, the `--allow-run-as-root` parameter must be added to `mpirun`.
+
+Similarly, you can train the model on multiple GPU devices with the above `mpirun` command.
+
+For detailed illustration of all hyper-parameters, please refer to [config.py](https://github.com/mindspore-lab/mindcv/blob/main/config.py).
+
+**Note:** As the global batch size (batch_size x num_devices) is an important hyper-parameter, it is recommended to keep the global batch size unchanged for reproduction or adjust the learning rate linearly to a new global batch size.
+
+* Standalone Training
+
+If you want to train or finetune the model on a smaller dataset without distributed training, please run:
+
+```shell
+# standalone training on a CPU/GPU/Ascend device
+python train.py --config configs/vit/vit_b32_224_ascend.yaml --data_dir /path/to/dataset --distribute False
+```
+
+### Validation
+
+To validate the accuracy of the trained model, you can use `validate.py` and pass the checkpoint path with `--ckpt_path`.
+
+```shell
+python validate.py -c configs/vit/vit_b32_224_ascend.yaml --data_dir /path/to/imagenet --ckpt_path /path/to/ckpt
+```
+
+### Deployment
+
+To deploy online inference services with the trained model efficiently, please refer to the [deployment tutorial](https://github.com/mindspore-lab/mindcv/blob/main/tutorials/deployment.md).
+
+## References
+
+
+[1] Dosovitskiy A, Beyer L, Kolesnikov A, et al. An image is worth 16x16 words: Transformers for image recognition at scale[J]. arXiv preprint arXiv:2010.11929, 2020.
+
+[2] "Vision Transformers (ViT) in Image Recognition – 2022 Guide", https://viso.ai/deep-learning/vision-transformer-vit/
diff --git a/configs/vit/vit_b32_224_ascend.yaml b/configs/vit/vit_b32_224_ascend.yaml
new file mode 100644
index 0000000..68b5514
--- /dev/null
+++ b/configs/vit/vit_b32_224_ascend.yaml
@@ -0,0 +1,61 @@
+# system
+mode: 0
+distribute: True
+num_parallel_workers: 8
+val_while_train: True
+val_interval: 1
+
+# dataset
+dataset: "imagenet"
+data_dir: "/path/to/imagenet"
+shuffle: True
+dataset_download: False
+batch_size: 256
+drop_remainder: True
+
+# augmentation
+image_resize: 224
+scale: [0.08, 1.0]
+ratio: [0.75, 1.333]
+hflip: 0.5
+interpolation: "bicubic"
+re_prob: 0.1
+mixup: 0.2
+cutmix: 1.0
+cutmix_prob: 1.0
+crop_pct: 0.875
+color_jitter: [0.4, 0.4, 0.4]
+auto_augment: "randaug-m7-mstd0.5"
+
+# model
+model: "vit_b_32_224"
+drop_rate: 0.1
+drop_path_rate: 0.1
+num_classes: 1000
+pretrained: False
+ckpt_path: ""
+keep_checkpoint_max: 10
+ckpt_save_policy: "top_k"
+ckpt_save_dir: "./ckpt"
+epoch_size: 600
+dataset_sink_mode: True
+amp_level: "O2"
+
+# loss
+loss: "CE"
+loss_scale: 1024.0
+label_smoothing: 0.1
+
+# lr scheduler
+scheduler: "warmup_cosine_decay"
+lr: 0.003
+min_lr: 1e-6
+warmup_epochs: 32
+decay_epochs: 568
+lr_epoch_stair: False
+
+# optimizer
+opt: "adamw"
+weight_decay: 0.025
+filter_bias_and_bn: True
+use_nesterov: False
diff --git a/configs/vit/vit_l16_224_ascend.yaml b/configs/vit/vit_l16_224_ascend.yaml
new file mode 100644
index 0000000..8ad2780
--- /dev/null
+++ b/configs/vit/vit_l16_224_ascend.yaml
@@ -0,0 +1,61 @@
+# system
+mode: 0
+distribute: True
+num_parallel_workers: 8
+val_while_train: True
+val_interval: 1
+
+# dataset
+dataset: "imagenet"
+data_dir: "/path/to/imagenet"
+shuffle: True
+dataset_download: False
+batch_size: 48
+drop_remainder: True
+
+# augmentation
+image_resize: 224
+scale: [0.08, 1.0]
+ratio: [0.75, 1.333]
+hflip: 0.5
+interpolation: "bicubic"
+re_prob: 0.15
+mixup: 0.2
+cutmix: 1.0
+cutmix_prob: 1.0
+crop_pct: 0.875
+color_jitter: [0.4, 0.4, 0.4]
+auto_augment: "randaug-m9-mstd0.5"
+
+# model
+model: "vit_l_16_224"
+drop_rate: 0.12
+drop_path_rate: 0.1
+num_classes: 1000
+pretrained: False
+ckpt_path: ""
+keep_checkpoint_max: 10
+ckpt_save_policy: "top_k"
+ckpt_save_dir: "./ckpt"
+epoch_size: 300
+dataset_sink_mode: True
+amp_level: "O2"
+
+# loss
+loss: "CE"
+loss_scale: 1024.0
+label_smoothing: 0.1
+
+# lr scheduler
+scheduler: "warmup_cosine_decay"
+lr: 0.0005
+min_lr: 1e-5
+warmup_epochs: 32
+decay_epochs: 268
+lr_epoch_stair: False
+
+# optimizer
+opt: "adamw"
+weight_decay: 0.05
+filter_bias_and_bn: True
+use_nesterov: False
diff --git a/configs/vit/vit_l32_224_ascend.yaml b/configs/vit/vit_l32_224_ascend.yaml
new file mode 100644
index 0000000..7aa88cd
--- /dev/null
+++ b/configs/vit/vit_l32_224_ascend.yaml
@@ -0,0 +1,61 @@
+# system
+mode: 0
+distribute: True
+num_parallel_workers: 8
+val_while_train: True
+val_interval: 1
+
+# dataset
+dataset: "imagenet"
+data_dir: "/path/to/imagenet"
+shuffle: True
+dataset_download: False
+batch_size: 128
+drop_remainder: True
+
+# augmentation
+image_resize: 224
+scale: [0.08, 1.0]
+ratio: [0.75, 1.333]
+hflip: 0.5
+interpolation: "bicubic"
+re_prob: 0.1
+mixup: 0.2
+cutmix: 1.0
+cutmix_prob: 1.0
+crop_pct: 0.875
+color_jitter: [0.4, 0.4, 0.4]
+auto_augment: "randaug-m7-mstd0.5"
+
+# model
+model: "vit_l_32_224"
+drop_rate: 0.1
+drop_path_rate: 0.1
+num_classes: 1000
+pretrained: False
+ckpt_path: ""
+keep_checkpoint_max: 10
+ckpt_save_policy: "top_k"
+ckpt_save_dir: "./ckpt"
+epoch_size: 300
+dataset_sink_mode: True
+amp_level: "O2"
+
+# loss
+loss: "CE"
+loss_scale: 1024.0
+label_smoothing: 0.1
+
+# lr scheduler
+scheduler: "warmup_cosine_decay"
+lr: 0.0015
+min_lr: 1e-6
+warmup_epochs: 32
+decay_epochs: 268
+lr_epoch_stair: False
+
+# optimizer
+opt: "adamw"
+weight_decay: 0.025
+filter_bias_and_bn: True
+use_nesterov: False
diff --git a/configs/xception/README.md b/configs/xception/README.md
new file mode 100644
index 0000000..fb73cba
--- /dev/null
+++ b/configs/xception/README.md
@@ -0,0 +1,90 @@
+# Xception
+> [Xception: Deep Learning with Depthwise Separable Convolutions](https://arxiv.org/pdf/1610.02357.pdf)
+
+## Introduction
+
+Xception, like InceptionV4, is an improved network based on InceptionV3. It uses a deep convolutional neural
+network architecture built on depthwise separable convolutions and was developed by Google researchers. Google
+interprets the Inception module in convolutional neural networks as an intermediate step between regular convolution and
+depthwise separable convolution operations. From this point of view, a depthwise separable convolution can be
+understood as an Inception module with a maximally large number of branches, which is the extreme case proposed in the paper. Combining this
+with the idea of residual networks, Google proposed a new type of deep convolutional neural network inspired by the Inception
+architecture, in which the Inception modules have been replaced by depthwise separable convolution modules. [[1](#references)]
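+
+To see why depthwise separable convolutions are attractive, a rough weight-count comparison against a regular convolution is sketched below (illustrative only, not code from this repository):
+
+```python
+def regular_conv_params(k, c_in, c_out):
+    return k * k * c_in * c_out
+
+def depthwise_separable_params(k, c_in, c_out):
+    depthwise = k * k * c_in        # one k x k filter per input channel
+    pointwise = c_in * c_out        # 1x1 convolution that mixes channels
+    return depthwise + pointwise
+
+k, c_in, c_out = 3, 256, 256
+print(regular_conv_params(k, c_in, c_out))         # 589824
+print(depthwise_separable_params(k, c_in, c_out))  # 67840, roughly 8.7x fewer weights
+```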
+
+
+
+#### Notes
+
+- Context: Training context denoted as {device}x{pieces}-{MS mode}, where mindspore mode can be G - graph mode or F - pynative mode with ms function. For example, D910x8-G is for training on 8 pieces of Ascend 910 NPU using graph mode.
+- Top-1 and Top-5: Accuracy reported on the validation set of ImageNet-1K.
+
+## Quick Start
+
+### Preparation
+
+#### Installation
+Please refer to the [installation instruction](https://github.com/mindspore-lab/mindcv#installation) in MindCV.
+
+#### Dataset Preparation
+Please download the [ImageNet-1K](https://www.image-net.org/challenges/LSVRC/2012/index.php) dataset for model training and validation.
+
+### Training
+
+* Distributed Training
+
+It is easy to reproduce the reported results with the pre-defined training recipe. For distributed training on multiple Ascend 910 devices, please run
+
+```shell
+# distributed training on multiple GPU/Ascend devices
+mpirun -n 8 python train.py --config configs/xception/xception_ascend.yaml --data_dir /path/to/imagenet
+```
+
+> If the script is executed by the root user, the `--allow-run-as-root` parameter must be added to `mpirun`.
+
+Similarly, you can train the model on multiple GPU devices with the above `mpirun` command.
+
+For detailed illustration of all hyper-parameters, please refer to [config.py](https://github.com/mindspore-lab/mindcv/blob/main/config.py).
+
+**Note:** As the global batch size (batch_size x num_devices) is an important hyper-parameter, it is recommended to keep the global batch size unchanged for reproduction or adjust the learning rate linearly to a new global batch size.
+
+* Standalone Training
+
+If you want to train or finetune the model on a smaller dataset without distributed training, please run:
+
+```shell
+# standalone training on a CPU/GPU/Ascend device
+python train.py --config configs/xception/xception_ascend.yaml --data_dir /path/to/dataset --distribute False
+```
+
+### Validation
+
+To validate the accuracy of the trained model, you can use `validate.py` and pass the checkpoint path with `--ckpt_path`.
+
+```shell
+python validate.py -c configs/xception/xception_ascend.yaml --data_dir /path/to/imagenet --ckpt_path /path/to/ckpt
+```
+
+### Deployment
+
+Please refer to the [deployment tutorial](https://github.com/mindspore-lab/mindcv/blob/main/tutorials/deployment.md) in MindCV.
+
+## References
+
+[1] Chollet F. Xception: Deep learning with depthwise separable convolutions[C]//Proceedings of the IEEE conference on computer vision and pattern recognition. 2017: 1251-1258.
diff --git a/configs/xception/xception_ascend.yaml b/configs/xception/xception_ascend.yaml
new file mode 100644
index 0000000..6fc32da
--- /dev/null
+++ b/configs/xception/xception_ascend.yaml
@@ -0,0 +1,53 @@
+# system
+mode: 0
+distribute: True
+num_parallel_workers: 8
+val_while_train: True
+
+# dataset
+dataset: 'imagenet'
+data_dir: '/path/to/imagenet'
+shuffle: True
+dataset_download: False
+batch_size: 32
+drop_remainder: True
+
+# augmentation
+image_resize: 299
+scale: [0.08, 1.0]
+ratio: [0.75, 1.333]
+hflip: 0.5
+interpolation: 'bilinear'
+auto_augment: 'autoaug'
+re_prob: 0.25
+crop_pct: 0.875
+
+# model
+model: 'xception'
+num_classes: 1000
+pretrained: False
+ckpt_path: ''
+keep_checkpoint_max: 10
+ckpt_save_dir: './ckpt'
+epoch_size: 200
+dataset_sink_mode: True
+amp_level: 'O2'
+
+# loss
+loss: 'CE'
+label_smoothing: 0.1
+
+# lr scheduler
+scheduler: 'warmup_cosine_decay'
+lr: 0.045
+min_lr: 0.0
+decay_epochs: 195
+warmup_epochs: 5
+
+# optimizer
+opt: 'sgd'
+filter_bias_and_bn: True
+momentum: 0.9
+weight_decay: 0.00001
+loss_scale: 1024
+use_nesterov: False
diff --git a/configs/xcit/README.md b/configs/xcit/README.md
new file mode 100644
index 0000000..6bb7343
--- /dev/null
+++ b/configs/xcit/README.md
@@ -0,0 +1,86 @@
+# XCiT: Cross-Covariance Image Transformers
+
+> [XCiT: Cross-Covariance Image Transformers](https://arxiv.org/abs/2106.09681)
+## Introduction
+
+XCiT proposes a “transposed” version of self-attention that operates across feature channels rather than tokens, where the interactions are based on the cross-covariance matrix between keys and queries. The resulting cross-covariance attention (XCA) has linear complexity in the number of tokens and allows efficient processing of high-resolution images. The cross-covariance image transformer (XCiT), built upon XCA, combines the accuracy of conventional transformers with the scalability of convolutional architectures.
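+
+A rough NumPy sketch of the idea follows (illustrative only, single head, not the reference implementation): the attention map is d x d over feature channels, so the cost grows linearly with the number of tokens N.
+
+```python
+import numpy as np
+
+def xca(q, k, v, tau=1.0):
+    # q, k, v: (N, d) token features for one head
+    q_hat = q / np.linalg.norm(q, axis=0, keepdims=True)   # L2-normalize each feature column
+    k_hat = k / np.linalg.norm(k, axis=0, keepdims=True)
+    attn = k_hat.T @ q_hat / tau                           # (d, d) cross-covariance attention map
+    attn = np.exp(attn) / np.exp(attn).sum(axis=0, keepdims=True)  # softmax over channels
+    return v @ attn                                        # (N, d) output
+
+n, d = 196, 64
+out = xca(np.random.rand(n, d), np.random.rand(n, d), np.random.rand(n, d))
+print(out.shape)  # (196, 64)
+```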
+
+