update instructions of the challenge

Xunhui Zhang 2023-04-18 15:43:37 +08:00
parent 3fa370b210
commit 6567b93c30
284 changed files with 20462 additions and 0 deletions

configs/README.md Normal file
@@ -0,0 +1,68 @@
### File Structure and Naming
This folder contains training recipes and model readme files for each model. The folder structure and naming rule of model configurations are as follows.
```
├── configs
├── model_a // model name in lower case with _ separator
│ ├─ model_a_small_ascend.yaml // training recipe denoted as {model_name}_{specification}_{hardware}.yaml
| ├─ model_a_large_gpu.yaml
│ ├─ README.md //readme file containing performance results and pretrained weight urls
│ └─ README_CN.md //readme file in Chinese
├── model_b
│ ├─ model_b_32_ascend.yaml
| ├─ model_l_16_ascend.yaml
│ ├─ README.md
│ └─ README_CN.md
├── README.md //this file
```
### Model Readme Writing Guideline
The model readme file in each sub-folder provides the introduction, reproduced results, and running guideline for each model.
Please follow the outline structure and **table format** shown in [densenet/README.md](https://github.com/mindspore-lab/mindcv/blob/main/configs/densenet/README.md) when contributing your models :)
#### Table Format
<div align="center">
| Model | Context | Top-1 (%) | Top-5 (%) | Params (M) | Recipe | Download |
|--------------|----------|-----------|-----------|------------|-----------------------------------------------------------------------------------------------------|----------------------------------------------------------------------------------------------------|
| densenet_121 | D910x8-G | 75.64 | 92.84 | 8.06 | [yaml](https://github.com/mindspore-lab/mindcv/blob/main/configs/densenet/densenet_121_ascend.yaml) | [weights](https://download.mindspore.cn/toolkits/mindcv/densenet/densenet121-120_5004_Ascend.ckpt) |
</div>
Illustration:
- Model: model name in lower case with _ separator.
- Context: Training context denoted as {device}x{pieces}-{MS mode}, where the MindSpore mode can be G (graph mode) or F (pynative mode with ms_function). For example, D910x8-G denotes training on 8 Ascend 910 NPUs in graph mode.
- Top-1 and Top-5: Accuracy reported on the validation set of ImageNet-1K. Keep 2 digits after the decimal point.
- Params (M): # of model parameters in millions (10^6). Keep **2 digits** after the decimal point.
- Recipe: Training recipe/configuration linked to a yaml config file.
- Download: URL of the pretrained model weights.
### Model Checkpoint Format
The checkpoint (i.e., model weight) name should follow this format: **{model_name}_{specification}-{sha256sum}.ckpt**, e.g., `poolformer_s12-5be5c4e4.ckpt`.
You can run the following command and take the first 8 characters of the computed hash as the sha256sum value in the checkpoint name.
```shell
sha256sum your_model.ckpt
```
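For example, the 8-character prefix can be extracted directly (a small sketch assuming `cut` from GNU coreutils is available):
```shell
# print only the first 8 characters of the sha256 digest, e.g. 5be5c4e4,
# which is then used in the checkpoint name {model_name}_{specification}-5be5c4e4.ckpt
sha256sum your_model.ckpt | cut -c1-8
```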
#### Training Script Format
For consistency, it is recommended to provide distributed training commands based on `mpirun -n {num_devices} python train.py`, instead of using a shell script such as `distributed_train.sh`.
```shell
# standalone training on a GPU or Ascend device
python train.py --config configs/densenet/densenet_121_gpu.yaml --data_dir /path/to/dataset --distribute False
# distributed training on multiple GPU or Ascend devices
mpirun -n 8 python train.py --config configs/densenet/densenet_121_ascend.yaml --data_dir /path/to/imagenet
```
> If the script is executed by the root user, the `--allow-run-as-root` parameter must be added to `mpirun`.
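For example, the distributed command above becomes the following when run as root (the only change is the extra flag):
```shell
# distributed training as the root user
mpirun --allow-run-as-root -n 8 python train.py --config configs/densenet/densenet_121_ascend.yaml --data_dir /path/to/imagenet
```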
#### URL and Hyperlink Format
Please use an **absolute path** in hyperlinks and URLs when linking to target resources in the readme files and tables, e.g., `https://github.com/mindspore-lab/mindcv/blob/main/configs/densenet/densenet_121_ascend.yaml` rather than a relative path such as `../densenet/densenet_121_ascend.yaml`.

configs/bit/README.md Normal file
@@ -0,0 +1,91 @@
# BigTransfer
> [Big Transfer (BiT): General Visual Representation Learning](https://arxiv.org/abs/1912.11370)
## Introduction
Transfer of pre-trained representations improves sample efficiency and simplifies hyperparameter tuning when training deep neural networks for vision.
Big Transfer (BiT) achieves strong performance on more than 20 datasets by combining a few carefully selected components and transferring with a
simple heuristic. The components BiT identifies for training models that transfer well are: 1) Big datasets: as the size of the dataset increases,
the optimal performance of the BiT model also increases. 2) Big architectures: a sufficiently large architecture is required to make full use of
large datasets. 3) Long pre-training time: pre-training on a larger dataset requires more training epochs and a longer schedule. 4) GroupNorm and
Weight Standardisation: BiT uses GroupNorm combined with Weight Standardisation instead of BatchNorm, since BatchNorm performs worse when the
number of images per accelerator is too low. 5) Few-shot transfer: with BiT fine-tuning, good performance can be achieved even with only a few
examples per class on natural images.[[1, 2](#references)]
## Results
Our reproduced model performance on ImageNet-1K is reported as follows.
<div align="center">
| Model | Context | Top-1 (%) | Top-5 (%) | Params(M) | Recipe | Download |
|----------------| -------- |-----------|-----------|-----------|--------------------------------------------------------------------------------------------------| -------------------------------------------------------------------------- |
| bit_resnet50 | D910x8-G | 76.81 | 93.17 | 25.55 | [yaml](https://github.com/mindspore-lab/mindcv/blob/main/configs/bit/bit_resnet50_ascend.yaml) | [weights](https://download.mindspore.cn/toolkits/mindcv/bit/BiT_resnet50-1e4795a4.ckpt) |
| bit_resnet50x3 | D910x8-G | 80.63 | 95.12 | 217.31 | [yaml](https://github.com/mindspore-lab/mindcv/blob/main/configs/bit/bit_resnet50x3_ascend.yaml) | [weights](https://download.mindspore.cn/toolkits/mindcv/bit/BiT_resnet50x3-a960f91f.ckpt) |
| bit_resnet101 | D910x8-G | 77.93 | 93.75 | 44.54 | [yaml](https://github.com/mindspore-lab/mindcv/blob/main/configs/bit/bit_resnet101_ascend.yaml) | [weights](https://download.mindspore.cn/toolkits/mindcv/bit/BiT_resnet101-2efa9106.ckpt) |
</div>
#### Notes
- Context: Training context denoted as {device}x{pieces}-{MS mode}, where the MindSpore mode can be G (graph mode) or F (pynative mode with ms_function). For example, D910x8-G denotes training on 8 Ascend 910 NPUs in graph mode.
- Top-1 and Top-5: Accuracy reported on the validation set of ImageNet-1K.
## Quick Start
### Preparation
#### Installation
Please refer to the [installation instruction](https://github.com/mindspore-lab/mindcv#installation) in MindCV.
#### Dataset Preparation
Please download the [ImageNet-1K](https://www.image-net.org/challenges/LSVRC/2012/index.php) dataset for model training and validation.
### Training
* Distributed Training
It is easy to reproduce the reported results with the pre-defined training recipe. For distributed training on multiple Ascend 910 devices, please run
```shell
# distributed training on multiple GPU/Ascend devices
mpirun -n 8 python train.py --config configs/bit/bit_resnet50_ascend.yaml --data_dir /path/to/imagenet
```
> If the script is executed by the root user, the `--allow-run-as-root` parameter must be added to `mpirun`.
Similarly, you can train the model on multiple GPU devices with the above `mpirun` command.
For detailed illustration of all hyper-parameters, please refer to [config.py](https://github.com/mindspore-lab/mindcv/blob/main/config.py).
**Note:** As the global batch size (batch_size x num_devices) is an important hyper-parameter, it is recommended to keep the global batch size unchanged for reproduction or adjust the learning rate linearly to a new global batch size.
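For instance, the `bit_resnet50` recipe in this folder uses `batch_size: 32` on 8 devices (global batch size 256) with `lr: 0.06`. A sketch of training on 4 devices instead, which halves the global batch size to 128 and therefore scales the learning rate linearly to 0.06 * 128 / 256 = 0.03 (this assumes `train.py` accepts `--batch_size` and `--lr` as command-line overrides of the corresponding config keys):
```shell
# distributed training on 4 devices: keep the per-device batch size,
# scale the learning rate linearly with the global batch size (0.06 -> 0.03)
mpirun -n 4 python train.py --config configs/bit/bit_resnet50_ascend.yaml \
    --data_dir /path/to/imagenet --batch_size 32 --lr 0.03
```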
* Standalone Training
If you want to train or finetune the model on a smaller dataset without distributed training, please run:
```shell
# standalone training on a CPU/GPU/Ascend device
python train.py --config configs/bit/bit_resnet50_ascend.yaml --data_dir /path/to/dataset --distribute False
```
### Validation
To validate the accuracy of the trained model, you can use `validate.py` and parse the checkpoint path with `--ckpt_path`.
```shell
python validate.py -c configs/bit/bit_resnet50_ascend.yaml --data_dir /path/to/imagenet --ckpt_path /path/to/ckpt
```
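You can also evaluate the released checkpoint from the table above in the same way (a sketch assuming `BiT_resnet50-1e4795a4.ckpt` has been downloaded to the current directory):
```shell
# validate the released bit_resnet50 weight on the ImageNet-1K validation set
python validate.py -c configs/bit/bit_resnet50_ascend.yaml --data_dir /path/to/imagenet \
    --ckpt_path ./BiT_resnet50-1e4795a4.ckpt
```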
### Deployment
Please refer to the [deployment tutorial](https://github.com/mindspore-lab/mindcv/blob/main/tutorials/deployment.md) in MindCV.
## References
<!--- Guideline: Citation format should follow GB/T 7714. -->
[1] Kolesnikov A, Beyer L, Zhai X, et al. Big transfer (bit): General visual representation learning[C]//European conference on computer vision. Springer, Cham, 2020: 491-507.
[2] BigTransfer (BiT): State-of-the-art transfer learning for computer vision, https://blog.tensorflow.org/2020/05/bigtransfer-bit-state-of-art-transfer-learning-computer-vision.html

@@ -0,0 +1,47 @@
# system
mode: 0
distribute: True
num_parallel_workers: 8
val_while_train: True
# dataset
dataset: 'imagenet'
data_dir: '/path/to/imagenet'
shuffle: True
dataset_download: False
batch_size: 16
drop_remainder: True
# augmentation
image_resize: 224
hflip: 0.5
crop_pct: 0.875
# model
model: 'BiTresnet101'
num_classes: 1000
pretrained: False
ckpt_path: ''
keep_checkpoint_max: 10
ckpt_save_dir: './ckpt'
epoch_size: 90
dataset_sink_mode: True
amp_level: 'O0'
# loss
loss: 'CE'
label_smoothing: 0.1
# lr scheduler
scheduler: 'multi_step_decay'
lr: 0.06
decay_rate: 0.5
multi_step_decay_milestones: [30, 40, 50, 60, 70, 80, 85]
# optimizer
opt: 'sgd'
filter_bias_and_bn: False
momentum: 0.9
weight_decay: 0.0001
loss_scale: 1024

@@ -0,0 +1,46 @@
# system
mode: 0
distribute: True
num_parallel_workers: 8
val_while_train: True
# dataset
dataset: 'imagenet'
data_dir: '/path/to/imagenet'
shuffle: True
dataset_download: False
batch_size: 32
drop_remainder: True
# augmentation
image_resize: 224
hflip: 0.5
crop_pct: 0.875
# model
model: 'BiTresnet50'
num_classes: 1000
pretrained: False
ckpt_path: ''
keep_checkpoint_max: 10
ckpt_save_dir: './ckpt'
epoch_size: 90
dataset_sink_mode: True
amp_level: 'O0'
# loss
loss: 'CE'
label_smoothing: 0.1
# lr scheduler
scheduler: 'multi_step_decay'
lr: 0.06
decay_rate: 0.5
multi_step_decay_milestones: [30, 40, 50, 60, 70, 80, 85]
# optimizer
opt: 'sgd'
filter_bias_and_bn: False
momentum: 0.9
weight_decay: 0.0001
loss_scale: 1024

@@ -0,0 +1,49 @@
# system
mode: 0
distribute: True
num_parallel_workers: 8
val_while_train: True
# dataset
dataset: 'imagenet'
data_dir: '/path/to/imagenet'
shuffle: True
dataset_download: False
batch_size: 32
drop_remainder: True
# augmentation
image_resize: 224
hflip: 0.5
mixup: 0.2
crop_pct: 0.875
auto_augment: "randaug-m7-mstd0.5"
# model
model: 'BiTresnet50x3'
num_classes: 1000
pretrained: False
ckpt_path: ''
keep_checkpoint_max: 30
ckpt_save_dir: './ckpt'
epoch_size: 90
dataset_sink_mode: True
amp_level: 'O0'
# loss
loss: 'CE'
label_smoothing: 0.1
# lr scheduler config
warmup_epochs: 1
scheduler: 'multi_step_decay'
lr: 0.09
decay_rate: 0.4
multi_step_decay_milestones: [30, 40, 50, 60, 70, 80, 85]
# optimizer
opt: 'sgd'
filter_bias_and_bn: False
momentum: 0.9
weight_decay: 0.0001
loss_scale: 1024

configs/coat/README.md Normal file
@@ -0,0 +1,80 @@
# CoaT
> [Co-Scale Conv-Attentional Image Transformers](https://arxiv.org/abs/2104.06399v2)
## Introduction
Co-Scale Conv-Attentional Image Transformer (CoaT) is a Transformer-based image classifier equipped with co-scale and conv-attentional mechanisms. First, the co-scale mechanism maintains the integrity of Transformers' encoder branches at individual scales, while allowing representations learned at different scales to effectively communicate with each other. Second, the conv-attentional mechanism is designed by realizing a relative position embedding formulation in the factorized attention module with an efficient convolution-like implementation. CoaT empowers image Transformers with enriched multi-scale and contextual modeling capabilities.
## Results
Our reproduced model performance on ImageNet-1K is reported as follows.
<div align="center">
| Model | Context | Top-1 (%) | Top-5 (%) | Params (M) | Recipe | Download |
|-----------------|-----------|-------|------------|------------------------------------------------------------------------------------------|----------------------------------------------------------------------------------|----------------------------------------------------------------------------------|
| coat_lite_tiny | D910x8-G | 77.35 | 93.43 | 5.72 | [yaml](https://github.com/mindspore-lab/mindcv/blob/main/configs/coat/coat_lite_tiny_ascend.yaml) | [weights](https://download.mindspore.cn/toolkits/mindcv/coat/coat_lite_tiny-fa7bf894.ckpt) |
| coat_lite_mini | D910x8-G | 78.51 | 93.84 | 11.01 | [yaml](https://github.com/mindspore-lab/mindcv/blob/main/configs/coat/coat_lite_mini_ascend.yaml) | [weights](https://download.mindspore.cn/toolkits/mindcv/coat/coat_lite_mini-55a52f05.ckpt) |
| coat_tiny | D910x8-G | 79.67 | 94.88 | 5.50 | [yaml](https://github.com/mindspore-lab/mindcv/blob/main/configs/coat/coat_tiny_ascend.yaml) | [weights](https://download.mindspore.cn/toolkits/mindcv/coat/coat_tiny-071cb792.ckpt) |
</div>
#### Notes
- Context: Training context denoted as {device}x{pieces}-{MS mode}, where the MindSpore mode can be G (graph mode) or F (pynative mode with ms_function). For example, D910x8-G denotes training on 8 Ascend 910 NPUs in graph mode.
- Top-1 and Top-5: Accuracy reported on the validation set of ImageNet-1K.
## Quick Start
### Preparation
#### Installation
Please refer to the [installation instruction](https://github.com/mindspore-lab/mindcv#installation) in MindCV.
#### Dataset Preparation
Please download the [ImageNet-1K](https://www.image-net.org/challenges/LSVRC/2012/index.php) dataset for model training and validation.
### Training
* Distributed Training
It is easy to reproduce the reported results with the pre-defined training recipe. For distributed training on multiple Ascend 910 devices, please run
```shell
# distributed training on multiple GPU/Ascend devices
mpirun -n 8 python train.py --config configs/coat/coat_lite_tiny_ascend.yaml --data_dir /path/to/imagenet
```
> If the script is executed by the root user, the `--allow-run-as-root` parameter must be added to `mpirun`
Similarly, you can train the model on multiple GPU devices with the above `mpirun` command.
For detailed illustration of all hyper-parameters, please refer to [config.py](https://github.com/mindspore-lab/mindcv/blob/main/config.py).
**Note:** As the global batch size (batch_size x num_devices) is an important hyper-parameter, it is recommended to keep the global batch size unchanged for reproduction or adjust the learning rate linearly to a new global batch size.
* Standalone Training
If you want to train or finetune the model on a smaller dataset without distributed training, please run:
```shell
# standalone training on a CPU/GPU/Ascend device
python train.py --config configs/coat/coat_lite_tiny_ascend.yaml --data_dir /path/to/dataset --distribute False
```
### Validation
To validate the accuracy of the trained model, you can use `validate.py` and parse the checkpoint path with `--ckpt_path`.
```shell
python validate.py -c configs/coat/coat_lite_tiny_ascend.yaml --data_dir /path/to/imagenet --ckpt_path /path/to/ckpt
```
### Deployment
To deploy online inference services with the trained model efficiently, please refer to the [deployment tutorial](https://github.com/mindspore-lab/mindcv/blob/main/tutorials/deployment.md).
## References
[1] Xu W, Xu Y, Chang T, et al. Co-scale conv-attentional image transformers[C]//Proceedings of the IEEE/CVF International Conference on Computer Vision. 2021.

@@ -0,0 +1,59 @@
# system
mode: 0
distribute: True
num_parallel_workers: 8
val_while_train: True
# dataset
dataset: 'imagenet'
data_dir: '/path/to/imagenet'
shuffle: True
dataset_download: False
batch_size: 64
drop_remainder: True
# augmentation
image_resize: 224
scale: [0.08, 1.0]
ratio: [0.75, 1.333]
hflip: 0.5
interpolation: 'bicubic'
auto_augment: 'randaug-m7-mstd0.5-inc1'
re_prob: 0.1
mixup: 0.2
cutmix: 1.0
cutmix_prob: 1.0
crop_pct: 0.9
color_jitter: 0.4
# model
model: 'coat_lite_mini'
num_classes: 1000
pretrained: False
ckpt_path: ''
ckpt_save_policy: 'top_k'
keep_checkpoint_max: 10
ckpt_save_dir: './ckpt/'
dataset_sink_mode: True
amp_level: 'O2'
# loss
loss: 'CE'
label_smoothing: 0.1
# lr scheduler
scheduler: 'cosine_decay'
lr: 0.0017
min_lr: 0.000005
warmup_epochs: 20
decay_epochs: 280
epoch_size: 1200
num_cycles: 4
cycle_decay: 1.0
# optimizer
opt: 'adamw'
weight_decay: 0.025
filter_bias_and_bn: True
loss_scale: 1024
use_nesterov: False

@@ -0,0 +1,59 @@
# system
mode: 0
distribute: True
num_parallel_workers: 8
val_while_train: True
# dataset
dataset: 'imagenet'
data_dir: '/path/to/imagenet/'
shuffle: True
dataset_download: False
batch_size: 64
drop_remainder: True
# augmentation
image_resize: 224
scale: [0.08, 1.0]
ratio: [0.75, 1.333]
hflip: 0.5
interpolation: 'bicubic'
auto_augment: 'randaug-m7-mstd0.5-inc1'
re_prob: 0.1
mixup: 0.2
cutmix: 1.0
cutmix_prob: 1.0
crop_pct: 0.9
color_jitter: 0.4
# model
model: 'coat_lite_tiny'
num_classes: 1000
pretrained: False
ckpt_path: ''
ckpt_save_policy: 'top_k'
keep_checkpoint_max: 10
ckpt_save_dir: './ckpt/'
dataset_sink_mode: True
amp_level: 'O2'
# loss
loss: 'CE'
label_smoothing: 0.1
# lr scheduler
scheduler: 'cosine_decay'
lr: 0.0008
min_lr: 0.000005
warmup_epochs: 20
decay_epochs: 280
epoch_size: 900
num_cycles: 3
cycle_decay: 1.0
# optimizer
opt: 'adamw'
weight_decay: 0.025
filter_bias_and_bn: True
loss_scale: 1024
use_nesterov: False

@@ -0,0 +1,63 @@
# system
mode: 0
distribute: True
num_parallel_workers: 8
val_while_train: True
val_interval: 1
# dataset
dataset: 'imagenet'
data_dir: '/path/to/imagenet/'
shuffle: True
dataset_download: False
batch_size: 32
drop_remainder: True
# augmentation
image_resize: 224
scale: [0.08, 1.0]
ratio: [0.75, 1.333]
hflip: 0.5
interpolation: 'bicubic'
auto_augment: 'randaug-m7-mstd0.5-inc1'
re_prob: 0.1
mixup: 0.2
cutmix: 1.0
cutmix_prob: 1.0
crop_pct: 0.9
color_jitter: 0.4
# model config
model: 'coat_tiny'
drop_rate: 0.0
drop_path_rate: 0.0
num_classes: 1000
pretrained: False
ckpt_path: ''
keep_checkpoint_max: 10
ckpt_save_policy: 'top_k'
ckpt_save_dir: './ckpt/'
dataset_sink_mode: True
amp_level: 'O2'
ema: True
ema_decay: 0.9995
# loss config
loss: 'CE'
label_smoothing: 0.1
# lr scheduler config
scheduler: 'warmup_cosine_decay'
lr: 0.00025
min_lr: 0.000001
warmup_epochs: 20
decay_epochs: 280
epoch_size: 300
# optimizer config
opt: 'lion'
weight_decay: 0.15
filter_bias_and_bn: True
loss_scale: 4096
use_nesterov: False
loss_scale_type: dynamic

configs/convit/README.md Normal file
@@ -0,0 +1,97 @@
# ConViT
> [ConViT: Improving Vision Transformers with Soft Convolutional Inductive Biases](https://arxiv.org/abs/2103.10697)
## Introduction
ConViT combines the strengths of convolutional architectures and Vision Transformers (ViTs).
ConViT introduces gated positional self-attention (GPSA), a form of positional self-attention
that can be equipped with a “soft” convolutional inductive bias.
ConViT initializes the GPSA layers to mimic the locality of convolutional layers,
then gives each attention head the freedom to escape locality by adjusting a gating parameter
regulating the attention paid to position versus content information.
ConViT outperforms DeiT (Touvron et al., 2020) on ImageNet,
while offering much improved sample efficiency.[[1](#references)]
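Concretely, the gated positional self-attention of head $h$ combines content-based and positional attention roughly as follows (a paraphrase of the formulation in [1], up to a final normalization; $\sigma$ is the sigmoid, $\lambda_h$ the learned gating parameter, $r_{ij}$ the relative position encoding of patches $i$ and $j$, and $v_{\mathrm{pos}}$ a learned positional projection):

$$A^h_{ij} = (1-\sigma(\lambda_h))\,\mathrm{softmax}\big(Q^h_i K^{h\top}_j\big) + \sigma(\lambda_h)\,\mathrm{softmax}\big(v^{h\top}_{\mathrm{pos}}\, r_{ij}\big)$$

so that $\sigma(\lambda_h) \to 0$ recovers standard content-based attention, while $\sigma(\lambda_h) \to 1$ keeps the convolution-like positional attention used at initialization.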
<p align="center">
<img src="https://user-images.githubusercontent.com/52945530/210045403-721c9697-fe7e-429a-bd38-ba244fc8bd1b.png" width=400 />
</p>
<p align="center">
<em>Figure 1. Architecture of ConViT [<a href="#references">1</a>] </em>
</p>
## Results
Our reproduced model performance on ImageNet-1K is reported as follows.
<div align="center">
| Model | Context | Top-1 (%) | Top-5 (%) | Params (M) | Recipe | Download |
|-------------------|----------|-----------|-----------|------------|--------------------------------------------------------------------------------------------------------|----------------------------------------------------------------------------------------|
| convit_tiny | D910x8-G | 73.66 | 91.72 | 5.71 | [yaml](https://github.com/mindspore-lab/mindcv/blob/main/configs/convit/convit_tiny_ascend.yaml) | [weights](https://download.mindspore.cn/toolkits/mindcv/convit/convit_tiny-e31023f2.ckpt) |
| convit_tiny_plus | D910x8-G | 77.00 | 93.60 | 9.97 | [yaml](https://github.com/mindspore-lab/mindcv/blob/main/configs/convit/convit_tiny_plus_ascend.yaml) | [weights](https://download.mindspore.cn/toolkits/mindcv/convit/convit_tiny_plus-e9d7fb92.ckpt) |
| convit_small | D910x8-G | 81.63 | 95.59 | 27.78 | [yaml](https://github.com/mindspore-lab/mindcv/blob/main/configs/convit/convit_small_ascend.yaml) | [weights](https://download.mindspore.cn/toolkits/mindcv/convit/convit_small-ba858604.ckpt) |
| convit_small_plus | D910x8-G | 81.80 | 95.42 | 48.98 | [yaml](https://github.com/mindspore-lab/mindcv/blob/main/configs/convit/convit_small_plus_ascend.yaml) | [weights](https://download.mindspore.cn/toolkits/mindcv/convit/convit_small_plus-2352b9f7.ckpt) |
| convit_base | D910x8-G | 82.10 | 95.52 | 86.54 | [yaml](https://github.com/mindspore-lab/mindcv/blob/main/configs/convit/convit_base_ascend.yaml) | [weights](https://download.mindspore.cn/toolkits/mindcv/convit/convit_base-c61b808c.ckpt) |
| convit_base_plus | D910x8-G | 81.96 | 95.04 | 153.13 | [yaml](https://github.com/mindspore-lab/mindcv/blob/main/configs/convit/convit_base_plus_ascend.yaml) | [weights](https://download.mindspore.cn/toolkits/mindcv/convit/convit_base_plus-5c61c9ce.ckpt) |
</div>
#### Notes
- Context: Training context denoted as {device}x{pieces}-{MS mode}, where the MindSpore mode can be G (graph mode) or F (pynative mode with ms_function). For example, D910x8-G denotes training on 8 Ascend 910 NPUs in graph mode.
- Top-1 and Top-5: Accuracy reported on the validation set of ImageNet-1K.
## Quick Start
### Preparation
#### Installation
Please refer to the [installation instruction](https://github.com/mindspore-ecosystem/mindcv#installation) in MindCV.
#### Dataset Preparation
Please download the [ImageNet-1K](https://www.image-net.org/challenges/LSVRC/2012/index.php) dataset for model training and validation.
### Training
* Distributed Training
It is easy to reproduce the reported results with the pre-defined training recipe. For distributed training on multiple Ascend 910 devices, please run
```shell
# distributed training on multiple GPU/Ascend devices
mpirun -n 8 python train.py --config configs/convit/convit_tiny_ascend.yaml --data_dir /path/to/imagenet
```
> If the script is executed by the root user, the `--allow-run-as-root` parameter must be added to `mpirun`.
Similarly, you can train the model on multiple GPU devices with the above `mpirun` command.
For detailed illustration of all hyper-parameters, please refer to [config.py](https://github.com/mindspore-lab/mindcv/blob/main/config.py).
**Note:** As the global batch size (batch_size x num_devices) is an important hyper-parameter, it is recommended to keep the global batch size unchanged for reproduction or adjust the learning rate linearly to a new global batch size.
* Standalone Training
If you want to train or finetune the model on a smaller dataset without distributed training, please run:
```shell
# standalone training on a CPU/GPU/Ascend device
python train.py --config configs/convit/convit_tiny_ascend.yaml --data_dir /path/to/dataset --distribute False
```
### Validation
To validate the accuracy of the trained model, you can use `validate.py` and parse the checkpoint path with `--ckpt_path`.
```shell
python validate.py -c configs/convit/convit_tiny_ascend.yaml --data_dir /path/to/imagenet --ckpt_path /path/to/ckpt
```
### Deployment
Please refer to the [deployment tutorial](https://github.com/mindspore-lab/mindcv/blob/main/tutorials/deployment.md) in MindCV.
## References
<!--- Guideline: Citation format should follow GB/T 7714. -->
[1] d'Ascoli S, Touvron H, Leavitt M L, et al. Convit: Improving vision transformers with soft convolutional inductive biases[C]//International Conference on Machine Learning. PMLR, 2021: 2286-2296.

@@ -0,0 +1,55 @@
# system
mode: 0
distribute: True
num_parallel_workers: 8
# dataset
dataset: 'imagenet'
data_dir: '/path/to/imagenet'
shuffle: True
dataset_download: False
batch_size: 128
drop_remainder: True
# augmentation
image_resize: 224
scale: [0.08, 1.0]
ratio: [0.75, 1.333]
hflip: 0.5
interpolation: 'bicubic'
auto_augment: 'autoaug-mstd0.5'
re_prob: 0.1
mixup: 0.2
cutmix: 1.0
cutmix_prob: 1.0
crop_pct: 0.93
color_jitter: 0.4
# model
model: 'convit_base'
num_classes: 1000
pretrained: False
ckpt_path: ''
keep_checkpoint_max: 10
ckpt_save_dir: './ckpt'
epoch_size: 300
dataset_sink_mode: True
amp_level: 'O2'
# loss
loss: 'CE'
label_smoothing: 0.1
# lr scheduler
scheduler: 'warmup_cosine_decay'
lr: 0.0007
min_lr: 0.000001
warmup_epochs: 40
decay_epochs: 260
# optimizer
opt: 'adamw'
weight_decay: 0.1
loss_scale: 1024
filter_bias_and_bn: True
use_nesterov: False

@@ -0,0 +1,55 @@
# system
mode: 0
distribute: True
num_parallel_workers: 8
# dataset
dataset: 'imagenet'
data_dir: '/path/to/imagenet'
shuffle: True
dataset_download: False
batch_size: 128
drop_remainder: True
# augmentation
image_resize: 224
scale: [0.08, 1.0]
ratio: [0.75, 1.333]
hflip: 0.5
interpolation: 'bicubic'
auto_augment: 'autoaug-mstd0.5'
re_prob: 0.1
mixup: 0.2
cutmix: 1.0
cutmix_prob: 1.0
crop_pct: 0.925
color_jitter: 0.4
# model
model: 'convit_base_plus'
num_classes: 1000
pretrained: False
ckpt_path: ''
keep_checkpoint_max: 10
ckpt_save_dir: './ckpt'
epoch_size: 300
dataset_sink_mode: True
amp_level: 'O2'
# loss
loss: 'CE'
label_smoothing: 0.1
# lr scheduler
scheduler: 'warmup_cosine_decay'
lr: 0.0007
min_lr: 0.000001
warmup_epochs: 40
decay_epochs: 260
# optimizer
opt: 'adamw'
weight_decay: 0.1
loss_scale: 1024
filter_bias_and_bn: True
use_nesterov: False

@@ -0,0 +1,55 @@
# system
mode: 0
distribute: True
num_parallel_workers: 8
# dataset
dataset: 'imagenet'
data_dir: '/path/to/imagenet'
shuffle: True
dataset_download: False
batch_size: 192
drop_remainder: True
# augmentation
image_resize: 224
scale: [0.08, 1.0]
ratio: [0.75, 1.333]
hflip: 0.5
interpolation: 'bicubic'
auto_augment: 'autoaug-mstd0.5'
re_prob: 0.1
mixup: 0.2
cutmix: 1.0
cutmix_prob: 1.0
crop_pct: 0.915
color_jitter: 0.4
# model
model: 'convit_small'
num_classes: 1000
pretrained: False
ckpt_path: ''
keep_checkpoint_max: 10
ckpt_save_dir: './ckpt'
epoch_size: 300
dataset_sink_mode: True
amp_level: 'O2'
# loss
loss: 'CE'
label_smoothing: 0.1
# lr scheduler
scheduler: 'warmup_cosine_decay'
lr: 0.0007
min_lr: 0.000001
warmup_epochs: 40
decay_epochs: 260
# optimizer
opt: 'adamw'
weight_decay: 0.05
loss_scale: 1024
filter_bias_and_bn: True
use_nesterov: False

@@ -0,0 +1,55 @@
# system
mode: 0
distribute: True
num_parallel_workers: 8
# dataset
dataset: 'imagenet'
data_dir: '/path/to/imagenet'
shuffle: True
dataset_download: False
batch_size: 192
drop_remainder: True
# augmentation
image_resize: 224
scale: [0.08, 1.0]
ratio: [0.75, 1.333]
hflip: 0.5
interpolation: 'bicubic'
auto_augment: 'autoaug-mstd0.5'
re_prob: 0.1
mixup: 0.2
cutmix: 1.0
cutmix_prob: 1.0
crop_pct: 0.93
color_jitter: 0.4
# model
model: 'convit_small_plus'
num_classes: 1000
pretrained: False
ckpt_path: ''
keep_checkpoint_max: 10
ckpt_save_dir: './ckpt'
epoch_size: 300
dataset_sink_mode: True
amp_level: 'O2'
# loss
loss: 'CE'
label_smoothing: 0.1
# lr scheduler
scheduler: 'warmup_cosine_decay'
lr: 0.0007
min_lr: 0.000001
warmup_epochs: 40
decay_epochs: 260
# optimizer
opt: 'adamw'
weight_decay: 0.05
loss_scale: 1024
filter_bias_and_bn: True
use_nesterov: False

@@ -0,0 +1,54 @@
# system
mode: 0
distribute: True
num_parallel_workers: 8
# dataset
dataset: 'imagenet'
data_dir: '/path/to/imagenet'
shuffle: True
dataset_download: False
batch_size: 256
drop_remainder: True
# augmentation
image_resize: 224
scale: [0.08, 1.0]
ratio: [0.75, 1.333]
hflip: 0.5
interpolation: 'bicubic'
re_prob: 0.25
mixup: 0.2
cutmix: 1.0
cutmix_prob: 1.0
crop_pct: 0.875
color_jitter: 0.4
# model
model: 'convit_tiny'
num_classes: 1000
pretrained: False
ckpt_path: ''
keep_checkpoint_max: 10
ckpt_save_dir: './ckpt'
epoch_size: 300
dataset_sink_mode: True
amp_level: 'O2'
# loss
loss: 'CE'
label_smoothing: 0.1
# lr scheduler
scheduler: 'cosine_decay'
lr: 0.00072
min_lr: 0.000001
warmup_epochs: 5
decay_epochs: 295
# optimizer
opt: 'adamw'
weight_decay: 0.0001
loss_scale: 1024
filter_bias_and_bn: True
use_nesterov: False

@@ -0,0 +1,51 @@
# system
mode: 0
distribute: True
num_parallel_workers: 8
# dataset
dataset: 'imagenet'
data_dir: '/path/to/imagenet'
shuffle: True
dataset_download: False
batch_size: 128
drop_remainder: True
# augmentation
image_resize: 224
scale: [0.08, 1.0]
ratio: [0.75, 1.333]
hflip: 0.5
interpolation: 'bicubic'
re_prob: 0.25
mixup: 0.8
crop_pct: 0.875
color_jitter: [0.4, 0.4, 0.4]
# model
model: 'convit_tiny'
num_classes: 1000
pretrained: False
ckpt_path: ''
keep_checkpoint_max: 10
ckpt_save_dir: './ckpt'
epoch_size: 300
dataset_sink_mode: True
amp_level: 'O2'
# loss
loss: 'CE'
label_smoothing: 0.1
# lr scheduler
scheduler: 'cosine_decay'
lr: 0.0005
min_lr: 0.00001
warmup_epochs: 10
decay_epochs: 200
# optimizer
opt: 'adamw'
weight_decay: 0.025
filter_bias_and_bn: True
use_nesterov: False

@@ -0,0 +1,54 @@
# system
mode: 0
distribute: True
num_parallel_workers: 8
# dataset
dataset: 'imagenet'
data_dir: '/path/to/imagenet'
shuffle: True
dataset_download: False
batch_size: 256
drop_remainder: True
# augmentation
image_resize: 224
scale: [0.08, 1.0]
ratio: [0.75, 1.333]
hflip: 0.5
interpolation: 'bicubic'
re_prob: 0.25
mixup: 0.2
cutmix: 1.0
cutmix_prob: 1.0
crop_pct: 0.87
color_jitter: 0.4
# model
model: 'convit_tiny_plus'
num_classes: 1000
pretrained: False
ckpt_path: ''
keep_checkpoint_max: 10
ckpt_save_dir: './ckpt'
epoch_size: 300
dataset_sink_mode: True
amp_level: 'O2'
# loss
loss: 'CE'
label_smoothing: 0.1
# lr scheduler
scheduler: 'cosine_decay'
lr: 0.00072
min_lr: 0.000001
warmup_epochs: 40
decay_epochs: 260
# optimizer
opt: 'adamw'
weight_decay: 0.0001
loss_scale: 1024
filter_bias_and_bn: True
use_nesterov: False

@@ -0,0 +1,91 @@
# ConvNeXt
> [A ConvNet for the 2020s](https://arxiv.org/abs/2201.03545)
## Introduction
In this work, the authors reexamine the design spaces and test the limits of what a pure ConvNet can achieve.
The authors gradually "modernize" a standard ResNet toward the design of a vision Transformer, and discover several key
components that contribute to the performance difference along the way. The outcome of this exploration is a family of
pure ConvNet models dubbed ConvNeXt. Constructed entirely from standard ConvNet modules, ConvNeXts compete favorably
with Transformers in terms of accuracy and scalability, achieving 87.8% ImageNet top-1 accuracy, while maintaining the
simplicity and efficiency of standard ConvNets.[[1](#references)]
<p align="center">
<img src="https://user-images.githubusercontent.com/53842165/223907142-3bf6acfb-080a-49f5-b021-233e003318c3.png" width=250 />
</p>
<p align="center">
<em>Figure 1. Architecture of ConvNeXt [<a href="#references">1</a>] </em>
</p>
## Results
Our reproduced model performance on ImageNet-1K is reported as follows.
<div align="center">
| Model | Context | Top-1 (%) | Top-5 (%) | Params (M) | Recipe | Download |
|----------------|-----------|-----------|-----------|------------|-------------------------------------------------------------------------------------------------------|------------------------------------------------------------------------------------------------|
| convnext_tiny  | D910x64-G | 81.91     | 95.79     | 28.59      | [yaml](https://github.com/mindspore-lab/mindcv/blob/main/configs/convnext/convnext_tiny_ascend.yaml)   | [weights](https://download.mindspore.cn/toolkits/mindcv/convnext/convnext_tiny-ae5ff8d7.ckpt)   |
| convnext_small | D910x64-G | 83.40     | 96.36     | 50.22      | [yaml](https://github.com/mindspore-lab/mindcv/blob/main/configs/convnext/convnext_small_ascend.yaml)  | [weights](https://download.mindspore.cn/toolkits/mindcv/convnext/convnext_small-e23008f3.ckpt)  |
| convnext_base  | D910x64-G | 83.32     | 96.24     | 88.59      | [yaml](https://github.com/mindspore-lab/mindcv/blob/main/configs/convnext/convnext_base_ascend.yaml)   | [weights](https://download.mindspore.cn/toolkits/mindcv/convnext/convnext_base-ee3544b8.ckpt)   |
</div>
#### Notes
- Context: Training context denoted as {device}x{pieces}-{MS mode}, where the MindSpore mode can be G (graph mode) or F (pynative mode with ms_function). For example, D910x8-G denotes training on 8 Ascend 910 NPUs in graph mode.
- Top-1 and Top-5: Accuracy reported on the validation set of ImageNet-1K.
## Quick Start
### Preparation
#### Installation
Please refer to the [installation instruction](https://github.com/mindspore-ecosystem/mindcv#installation) in MindCV.
#### Dataset Preparation
Please download the [ImageNet-1K](https://www.image-net.org/challenges/LSVRC/2012/index.php) dataset for model training and validation.
### Training
* Distributed Training
It is easy to reproduce the reported results with the pre-defined training recipe. For distributed training on multiple Ascend 910 devices, please run
```shell
# distributed training on multiple GPU/Ascend devices
mpirun -n 8 python train.py --config configs/convnext/convnext_tiny_ascend.yaml --data_dir /path/to/imagenet
```
> If the script is executed by the root user, the `--allow-run-as-root` parameter must be added to `mpirun`.
Similarly, you can train the model on multiple GPU devices with the above `mpirun` command.
For detailed illustration of all hyper-parameters, please refer to [config.py](https://github.com/mindspore-lab/mindcv/blob/main/config.py).
**Note:** As the global batch size (batch_size x num_devices) is an important hyper-parameter, it is recommended to keep the global batch size unchanged for reproduction or adjust the learning rate linearly to a new global batch size.
* Standalone Training
If you want to train or finetune the model on a smaller dataset without distributed training, please run:
```shell
# standalone training on a CPU/GPU/Ascend device
python train.py --config configs/convnext/convnext_tiny_ascend.yaml --data_dir /path/to/dataset --distribute False
```
### Validation
To validate the accuracy of the trained model, you can use `validate.py` and parse the checkpoint path with `--ckpt_path`.
```shell
python validate.py -c configs/convnext/convnext_tiny_ascend.yaml --data_dir /path/to/imagenet --ckpt_path /path/to/ckpt
```
### Deployment
Please refer to the [deployment tutorial](https://github.com/mindspore-lab/mindcv/blob/main/tutorials/deployment.md) in MindCV.
## References
[1] Liu Z, Mao H, Wu C Y, et al. A convnet for the 2020s[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2022: 11976-11986.

@@ -0,0 +1,58 @@
# system
mode: 0
distribute: True
num_parallel_workers: 8
val_while_train: True
# dataset
dataset: 'imagenet'
data_dir: '/path/to/imagenet'
shuffle: True
dataset_download: False
batch_size: 16
drop_remainder: True
# augmentation
image_resize: 224
scale: [0.08, 1.0]
ratio: [0.75, 1.333]
re_value: 'random'
hflip: 0.5
interpolation: 'bicubic'
auto_augment: 'randaug-m9-mstd0.5-inc1'
re_prob: 0.25
crop_pct: 0.95
mixup: 0.8
cutmix: 1.0
# model
model: 'convnext_base'
num_classes: 1000
pretrained: False
ckpt_path: ''
keep_checkpoint_max: 10
ckpt_save_dir: './ckpt'
epoch_size: 450
drop_path_rate: 0.5
dataset_sink_mode: True
amp_level: 'O2'
# loss
loss: 'ce'
label_smoothing: 0.1
# lr scheduler
scheduler: 'cosine_decay'
lr: 0.002
min_lr: 0.0000003
decay_epochs: 430
warmup_factor: 0.0000175
warmup_epochs: 20
# optimizer
opt: 'adamw'
filter_bias_and_bn: True
momentum: 0.9
weight_decay: 0.05
loss_scale_type: 'auto'
use_nesterov: False

@@ -0,0 +1,58 @@
# system
mode: 0
distribute: True
num_parallel_workers: 8
val_while_train: True
# dataset
dataset: 'imagenet'
data_dir: '/path/to/imagenet'
shuffle: True
dataset_download: False
batch_size: 16
drop_remainder: True
# augmentation
image_resize: 224
scale: [0.08, 1.0]
ratio: [0.75, 1.333]
re_value: 'random'
hflip: 0.5
interpolation: 'bicubic'
auto_augment: 'randaug-m9-mstd0.5-inc1'
re_prob: 0.25
crop_pct: 0.95
mixup: 0.8
cutmix: 1.0
# model
model: 'convnext_small'
num_classes: 1000
pretrained: False
ckpt_path: ''
keep_checkpoint_max: 10
ckpt_save_dir: './ckpt'
epoch_size: 450
drop_path_rate: 0.4
dataset_sink_mode: True
amp_level: 'O2'
# loss
loss: 'ce'
label_smoothing: 0.1
# lr scheduler
scheduler: 'cosine_decay'
lr: 0.002
min_lr: 0.0000003
decay_epochs: 430
warmup_factor: 0.0000175
warmup_epochs: 20
# optimizer
opt: 'adamw'
filter_bias_and_bn: True
momentum: 0.9
weight_decay: 0.05
loss_scale_type: 'auto'
use_nesterov: False

@@ -0,0 +1,59 @@
# system
mode: 0
distribute: True
num_parallel_workers: 16
val_while_train: True
# dataset
dataset: 'imagenet'
data_dir: '/path/to/imagenet'
shuffle: True
dataset_download: False
batch_size: 16
drop_remainder: True
# augmentation
image_resize: 224
scale: [0.08, 1.0]
ratio: [0.75, 1.333]
re_value: 'random'
hflip: 0.5
interpolation: 'bicubic'
auto_augment: 'randaug-m9-mstd0.5-inc1'
re_prob: 0.25
crop_pct: 0.95
mixup: 0.8
cutmix: 1.0
# model
model: 'convnext_tiny'
num_classes: 1000
pretrained: False
ckpt_path: ''
keep_checkpoint_max: 10
ckpt_save_dir: './ckpt'
epoch_size: 450
drop_path_rate: 0.1
dataset_sink_mode: True
amp_level: 'O2'
# loss
loss: 'ce'
label_smoothing: 0.1
# lr scheduler
scheduler: 'cosine_decay'
lr: 0.002
min_lr: 0.0000003
decay_epochs: 430
warmup_factor: 0.0000175
warmup_epochs: 20
# optimizer
opt: 'adamw'
filter_bias_and_bn: True
momentum: 0.9
weight_decay: 0.05
loss_scale_type: 'dynamic'
drop_overflow_update: True
use_nesterov: False

@@ -0,0 +1,90 @@
# CrossViT
> [CrossViT: Cross-Attention Multi-Scale Vision Transformer for Image Classification](https://arxiv.org/abs/2103.14899)
## Introduction
CrossViT is a type of vision transformer that uses a dual-branch architecture to extract multi-scale feature representations for image classification. The architecture combines image patches (i.e. tokens in a transformer) of different sizes to produce stronger visual features for image classification. It processes small and large patch tokens with two separate branches of different computational complexities and these tokens are fused together multiple times to complement each other.
Fusion is achieved by an efficient cross-attention module, in which each transformer branch creates a non-patch token as an agent to exchange information with the other branch by attention. This allows for linear-time generation of the attention map in fusion instead of quadratic time otherwise.[[1](#references)]
<p align="center">
<img src="https://user-images.githubusercontent.com/52945530/223635248-5871596d-43f2-44ee-b8be-1e7927ade243.jpg" width=400 />
</p>
<p align="center">
<em>Figure 1. Architecture of CrossViT [<a href="#references">1</a>] </em>
</p>
## Results
Our reproduced model performance on ImageNet-1K is reported as follows.
<div align="center">
| Model | Context | Top-1 (%) | Top-5 (%) | Params (M) | Recipe | Download |
|-------------|----------|-----------|-----------|------------|-----------------------------------------------------------------------------------------|---------------------------------------------------------------------------------------|
| crossvit_9 | D910x8-G | 73.56 | 91.79 | 8.55 | [yaml](https://github.com/mindspore-lab/mindcv/blob/main/configs/crossvit/crossvit_9_ascend.yaml) | [weights](https://download.mindspore.cn/toolkits/mindcv/crossvit/crossvit_9-e74c8e18.ckpt) |
| crossvit_15 | D910x8-G | 81.08 | 95.33 | 27.27 | [yaml](https://github.com/mindspore-lab/mindcv/blob/main/configs/crossvit/crossvit_15_ascend.yaml) | [weights](https://download.mindspore.cn/toolkits/mindcv/crossvit/crossvit_15-eaa43c02.ckpt) |
| crossvit_18 | D910x8-G | 81.93 | 95.75 | 43.27 | [yaml](https://github.com/mindspore-lab/mindcv/blob/main/configs/crossvit/crossvit_18_ascend.yaml) | [weights](https://download.mindspore.cn/toolkits/mindcv/crossvit/crossvit_18-ca0a2e43.ckpt) |
</div>
#### Notes
- Context: Training context denoted as {device}x{pieces}-{MS mode}, where the MindSpore mode can be G (graph mode) or F (pynative mode with ms_function). For example, D910x8-G denotes training on 8 Ascend 910 NPUs in graph mode.
- Top-1 and Top-5: Accuracy reported on the validation set of ImageNet-1K.
## Quick Start
### Preparation
#### Installation
Please refer to the [installation instruction](https://github.com/mindspore-ecosystem/mindcv#installation) in MindCV.
#### Dataset Preparation
Please download the [ImageNet-1K](https://www.image-net.org/challenges/LSVRC/2012/index.php) dataset for model training and validation.
### Training
* Distributed Training
It is easy to reproduce the reported results with the pre-defined training recipe. For distributed training on multiple Ascend 910 devices, please run
```shell
# distributed training on multiple GPU/Ascend devices
mpirun -n 8 python train.py --config configs/crossvit/crossvit_15_ascend.yaml --data_dir /path/to/imagenet
```
> If the script is executed by the root user, the `--allow-run-as-root` parameter must be added to `mpirun`.
Similarly, you can train the model on multiple GPU devices with the above `mpirun` command.
For detailed illustration of all hyper-parameters, please refer to [config.py](https://github.com/mindspore-lab/mindcv/blob/main/config.py).
**Note:** As the global batch size (batch_size x num_devices) is an important hyper-parameter, it is recommended to keep the global batch size unchanged for reproduction or adjust the learning rate linearly to a new global batch size.
* Standalone Training
If you want to train or finetune the model on a smaller dataset without distributed training, please run:
```shell
# standalone training on a CPU/GPU/Ascend device
python train.py --config configs/crossvit/crossvit_15_ascend.yaml --data_dir /path/to/dataset --distribute False
```
### Validation
To validate the accuracy of the trained model, you can use `validate.py` and parse the checkpoint path with `--ckpt_path`.
```shell
python validate.py -c configs/crossvit/crossvit_15_ascend.yaml --data_dir /path/to/imagenet --ckpt_path /path/to/ckpt
```
### Deployment
Please refer to the [deployment tutorial](https://github.com/mindspore-lab/mindcv/blob/main/tutorials/deployment.md) in MindCV.
## References
<!--- Guideline: Citation format should follow GB/T 7714. -->
[1] Chen C F, Fan Q, Panda R. CrossViT: Cross-attention multi-scale vision transformer for image classification[C]//Proceedings of the IEEE/CVF International Conference on Computer Vision. 2021.

@@ -0,0 +1,65 @@
# system
mode: 0
distribute: True
num_parallel_workers: 8
val_while_train: True
# dataset
dataset: 'imagenet'
data_dir: '/path/to/imagenet'
shuffle: True
dataset_download: False
batch_size: 320
drop_remainder: True
# augmentation
image_resize: 224
scale: [ 0.08, 1.0 ]
ratio: [ 0.75, 1.333 ]
hflip: 0.5
vflip: 0.
interpolation: 'bicubic'
auto_augment: randaug-m9-mstd0.5-inc1
re_prob: 0.25
mixup: 0.8
cutmix: 1.0
color_jitter: 0.4
crop_pct: 0.935
ema: True
# model
model: 'crossvit15'
num_classes: 1000
pretrained: False
ckpt_path: ''
keep_checkpoint_max: 30
ckpt_save_dir: './ckpt'
epoch_size: 600
dataset_sink_mode: True
amp_level: 'O3'
drop_path_rate: 0.1
# loss
loss: 'CE'
label_smoothing: 0.1
# lr scheduler
scheduler: 'warmup_cosine_decay'
lr: 0.0009
min_lr: 0.00001
warmup_epochs: 30
decay_epochs: 270
decay_rate: 0.1
num_cycles: 2
cycle_decay: 1
# optimizer
opt: 'adamw'
weight_decay: 0.05
filter_bias_and_bn: True
loss_scale: 512
use_nesterov: False
eps: 1e-8
# Scheduler parameters
lr_epoch_stair: True

@@ -0,0 +1,63 @@
# system
mode: 0
distribute: True
num_parallel_workers: 8
val_while_train: True
# dataset
dataset: 'imagenet'
data_dir: '/path/to/imagenet'
shuffle: True
dataset_download: False
batch_size: 256
drop_remainder: True
# augmentation
image_resize: 224
scale: [ 0.08, 1.0 ]
ratio: [ 0.75, 1.333 ]
hflip: 0.5
vflip: 0.
interpolation: 'bicubic'
auto_augment: randaug-m9-mstd0.5-inc1
re_prob: 0.25
mixup: 0.8
cutmix: 1.0
color_jitter: 0.4
crop_pct: 0.935
ema: True
# model
model: 'crossvit18'
num_classes: 1000
pretrained: False
ckpt_path: ''
keep_checkpoint_max: 10
ckpt_save_dir: './ckpt'
epoch_size: 300
dataset_sink_mode: True
amp_level: 'O3'
drop_path_rate: 0.1
# loss
loss: 'CE'
label_smoothing: 0.1
# lr scheduler
scheduler: 'warmup_cosine_decay'
lr: 0.004
min_lr: 0.00001
warmup_epochs: 30
decay_epochs: 270
decay_rate: 0.1
# optimizer
opt: 'adamw'
weight_decay: 0.05
filter_bias_and_bn: True
loss_scale: 1024
use_nesterov: False
eps: 1e-8
# Scheduler parameters
lr_epoch_stair: True

@@ -0,0 +1,63 @@
# system
mode: 0
distribute: True
num_parallel_workers: 8
val_while_train: True
# dataset
dataset: 'imagenet'
data_dir: '/path/to/imagenet'
shuffle: True
dataset_download: False
batch_size: 256
drop_remainder: True
# augmentation
image_resize: 240
scale: [0.08, 1.0]
ratio: [0.75, 1.333]
hflip: 0.5
vflip: 0.
interpolation: 'bicubic'
auto_augment: 'randaug-m9-mstd0.5-inc1'
re_prob: 0.25
mixup: 0.8
cutmix: 1.0
color_jitter: 0.4
crop_pct: 0.935
# model
model: 'crossvit9'
num_classes: 1000
pretrained: False
ckpt_path: ''
keep_checkpoint_max: 10
ckpt_save_dir: './ckpt'
epoch_size: 300
dataset_sink_mode: True
amp_level: 'O2'
drop_path_rate: 0.1
# loss
loss: 'CE'
label_smoothing: 0.1
# lr scheduler
scheduler: 'cosine_decay'
lr: 0.0011
min_lr: 0.00001
warmup_epochs: 30
decay_epochs: 270
decay_rate: 0.1
# optimizer
opt: 'adamw'
weight_decay: 0.05
filter_bias_and_bn: True
loss_scale_type: 'dynamic'
drop_overflow_update: True
use_nesterov: False
eps: 1e-8
# Scheduler parameters
lr_epoch_stair: True

configs/densenet/README.md Normal file
@@ -0,0 +1,107 @@
# DenseNet
<!--- Guideline: please use url linked to the paper abstract in ArXiv instead of PDF for fast loading. -->
> [Densely Connected Convolutional Networks](https://arxiv.org/abs/1608.06993)
## Introduction
<!--- Guideline: Introduce the model and architectures. Please cite if you use/adopt paper explanation from others. -->
<!--- Guideline: If an architecture table/figure is available in the paper, please put one here and cite for intuitive illustration. -->
Recent work has shown that convolutional networks can be substantially deeper, more accurate, and more efficient to train if
they contain shorter connections between layers close to the input and those close to the output. Dense Convolutional
Network (DenseNet) is introduced based on this observation, which connects each layer to every other layer in a
feed-forward fashion. Whereas traditional convolutional networks with $L$ layers have $L$ connections (one between each
layer and its subsequent layer), DenseNet has $\frac{L(L+1)}{2}$ direct connections. For each layer, the feature maps
of all preceding layers are used as inputs, and their feature maps are used as inputs into all subsequent layers.
DenseNets have several compelling advantages: they alleviate the vanishing-gradient problem, strengthen feature
propagation, encourage feature reuse, and substantially reduce the number of parameters.[[1](#references)]
<p align="center">
<img src="https://user-images.githubusercontent.com/52945530/210045537-7eda82c7-4575-4820-ba94-8fcab11c6482.png" width=800 />
</p>
<p align="center">
<em>Figure 1. Architecture of DenseNet [<a href="#references">1</a>] </em>
</p>
## Results
<!--- Guideline:
Table Format:
- Model: model name in lower case with _ separator.
- Context: Training context denoted as {device}x{pieces}-{MS mode}, where the MindSpore mode can be G (graph mode) or F (pynative mode with ms_function). For example, D910x8-G denotes training on 8 Ascend 910 NPUs in graph mode.
- Top-1 and Top-5: Keep 2 digits after the decimal point.
- Params (M): # of model parameters in millions (10^6). Keep 2 digits after the decimal point
- Recipe: Training recipe/configuration linked to a yaml config file. Use absolute url path.
- Download: url of the pretrained model weights. Use absolute url path.
-->
Our reproduced model performance on ImageNet-1K is reported as follows.
<div align="center">
| Model | Context | Top-1 (%) | Top-5 (%) | Params (M) | Recipe | Download |
|--------------|----------|-----------|-----------|------------|-----------------------------------------------------------------------------------------------------|----------------------------------------------------------------------------------------------------|
| densenet_121 | D910x8-G | 75.64 | 92.84 | 8.06 | [yaml](https://github.com/mindspore-lab/mindcv/blob/main/configs/densenet/densenet_121_ascend.yaml) | [weights](https://download.mindspore.cn/toolkits/mindcv/densenet/densenet121-120_5004_Ascend.ckpt) |
| densenet_161 | D910x8-G | 79.09 | 94.66 | 28.90 | [yaml](https://github.com/mindspore-lab/mindcv/blob/main/configs/densenet/densenet_161_ascend.yaml) | [weights](https://download.mindspore.cn/toolkits/mindcv/densenet/densenet161-120_5004_Ascend.ckpt) |
| densenet_169 | D910x8-G | 77.26 | 93.71 | 14.31 | [yaml](https://github.com/mindspore-lab/mindcv/blob/main/configs/densenet/densenet_169_ascend.yaml) | [weights](https://download.mindspore.cn/toolkits/mindcv/densenet/densenet169-120_5004_Ascend.ckpt) |
| densenet_201 | D910x8-G | 78.14 | 94.08 | 20.24 | [yaml](https://github.com/mindspore-lab/mindcv/blob/main/configs/densenet/densenet_201_ascend.yaml) | [weights](https://download.mindspore.cn/toolkits/mindcv/densenet/densenet201-120_5004_Ascend.ckpt) |
</div>
#### Notes
- Context: Training context denoted as {device}x{pieces}-{MS mode}, where the MindSpore mode can be G (graph mode) or F (pynative mode with ms_function). For example, D910x8-G denotes training on 8 Ascend 910 NPUs in graph mode.
- Top-1 and Top-5: Accuracy reported on the validation set of ImageNet-1K.
## Quick Start
### Preparation
#### Installation
Please refer to the [installation instruction](https://github.com/mindspore-lab/mindcv#installation) in MindCV.
#### Dataset Preparation
Please download the [ImageNet-1K](https://www.image-net.org/challenges/LSVRC/2012/index.php) dataset for model training and validation.
### Training
<!--- Guideline: Please avoid using shell scripts in the command line. Python scripts preferred. -->
* Distributed Training
It is easy to reproduce the reported results with the pre-defined training recipe. For distributed training on multiple Ascend 910 devices, please run
```shell
# distributed training on multiple GPU/Ascend devices
mpirun -n 8 python train.py --config configs/densenet/densenet_121_ascend.yaml --data_dir /path/to/imagenet
```
> If the script is executed by the root user, the `--allow-run-as-root` parameter must be added to `mpirun`.
Similarly, you can train the model on multiple GPU devices with the above `mpirun` command.
For detailed illustration of all hyper-parameters, please refer to [config.py](https://github.com/mindspore-lab/mindcv/blob/main/config.py).
**Note:** As the global batch size (batch_size x num_devices) is an important hyper-parameter, it is recommended to keep the global batch size unchanged for reproduction or adjust the learning rate linearly to a new global batch size.
* Standalone Training
If you want to train or finetune the model on a smaller dataset without distributed training, please run:
```shell
# standalone training on a CPU/GPU/Ascend device
python train.py --config configs/densenet/densenet_121_ascend.yaml --data_dir /path/to/dataset --distribute False
```
### Validation
To validate the accuracy of the trained model, you can use `validate.py` and parse the checkpoint path with `--ckpt_path`.
```shell
python validate.py -c configs/densenet/densenet_121_ascend.yaml --data_dir /path/to/imagenet --ckpt_path /path/to/ckpt
```
### Deployment
Please refer to the [deployment tutorial](https://github.com/mindspore-lab/mindcv/blob/main/tutorials/deployment.md) in MindCV.
## References
<!--- Guideline: Citation format GB/T 7714 is suggested. -->
[1] Huang G, Liu Z, Van Der Maaten L, et al. Densely connected convolutional networks[C]//Proceedings of the IEEE conference on computer vision and pattern recognition. 2017: 4700-4708.

@@ -0,0 +1,51 @@
# system
mode: 0
distribute: True
num_parallel_workers: 8
# dataset
dataset: 'imagenet'
data_dir: '/path/to/imagenet'
shuffle: True
dataset_download: False
batch_size: 32
drop_remainder: True
val_split: val
# augmentation
image_resize: 224
scale: [0.08, 1.0]
ratio: [0.75, 1.333]
hflip: 0.5
interpolation: 'bilinear'
crop_pct: 0.875
# model
model: 'densenet121'
num_classes: 1000
pretrained: False
ckpt_path: ''
keep_checkpoint_max: 10
ckpt_save_dir: './ckpt'
epoch_size: 120
dataset_sink_mode: True
amp_level: 'O2'
# loss
loss: 'CE'
label_smoothing: 0.1
# lr scheduler
scheduler: 'cosine_decay'
min_lr: 0.0
lr: 0.1
warmup_epochs: 0
decay_epochs: 120
# optimizer
opt: 'momentum'
filter_bias_and_bn: True
momentum: 0.9
weight_decay: 0.0001
loss_scale: 1024
use_nesterov: False

@@ -0,0 +1,50 @@
# system
distribute: True
num_parallel_workers: 8
# dataset
dataset: 'imagenet'
data_dir: '/path/to/imagenet'
shuffle: True
dataset_download: False
batch_size: 32
drop_remainder: True
val_split: val
# augmentation
image_resize: 224
scale: [0.08, 1.0]
ratio: [0.75, 1.333]
hflip: 0.5
interpolation: 'bilinear'
crop_pct: 0.875
# model
model: 'densenet121'
num_classes: 1000
pretrained: False
ckpt_path: ''
keep_checkpoint_max: 10
ckpt_save_dir: './ckpt'
epoch_size: 120
dataset_sink_mode: True
amp_level: 'O2'
# loss
loss: 'CE'
label_smoothing: 0.1
# lr scheduler
scheduler: 'cosine_decay'
min_lr: 0.0
lr: 0.1
warmup_epochs: 0
decay_epochs: 120
# optimizer
opt: 'momentum'
filter_bias_and_bn: True
momentum: 0.9
weight_decay: 0.0001
loss_scale: 1024
use_nesterov: False

@@ -0,0 +1,51 @@
# system
mode: 0
distribute: True
num_parallel_workers: 8
# dataset
dataset: 'imagenet'
data_dir: '/path/to/imagenet'
shuffle: True
dataset_download: False
batch_size: 32
drop_remainder: True
val_split: val
# augmentation
image_resize: 224
scale: [0.08, 1.0]
ratio: [0.75, 1.333]
hflip: 0.5
interpolation: 'bilinear'
crop_pct: 0.875
# model
model: 'densenet161'
num_classes: 1000
pretrained: False
ckpt_path: ''
keep_checkpoint_max: 10
ckpt_save_dir: './ckpt'
epoch_size: 120
dataset_sink_mode: True
amp_level: 'O2'
# loss
loss: 'CE'
label_smoothing: 0.1
# lr scheduler
scheduler: 'cosine_decay'
min_lr: 0.0
lr: 0.1
warmup_epochs: 0
decay_epochs: 120
# optimizer
opt: 'momentum'
filter_bias_and_bn: True
momentum: 0.9
weight_decay: 0.0001
loss_scale: 1024
use_nesterov: False

@@ -0,0 +1,51 @@
# system
mode: 0
distribute: True
num_parallel_workers: 8
# dataset
dataset: 'imagenet'
data_dir: '/path/to/imagenet'
shuffle: True
dataset_download: False
batch_size: 32
drop_remainder: True
val_split: val
# augmentation
image_resize: 224
scale: [0.08, 1.0]
ratio: [0.75, 1.333]
hflip: 0.5
interpolation: 'bilinear'
crop_pct: 0.875
# model
model: 'densenet161'
num_classes: 1000
pretrained: False
ckpt_path: ''
keep_checkpoint_max: 10
ckpt_save_dir: './ckpt'
epoch_size: 120
dataset_sink_mode: True
amp_level: 'O2'
# loss
loss: 'CE'
label_smoothing: 0.1
# lr scheduler
scheduler: 'cosine_decay'
min_lr: 0.0
lr: 0.1
warmup_epochs: 0
decay_epochs: 120
# optimizer
opt: 'momentum'
filter_bias_and_bn: True
momentum: 0.9
weight_decay: 0.0001
loss_scale: 1024
use_nesterov: False

@@ -0,0 +1,51 @@
# system
mode: 0
distribute: True
num_parallel_workers: 8
# dataset
dataset: 'imagenet'
data_dir: '/path/to/imagenet'
shuffle: True
dataset_download: False
batch_size: 32
drop_remainder: True
val_split: val
# augmentation
image_resize: 224
scale: [0.08, 1.0]
ratio: [0.75, 1.333]
hflip: 0.5
interpolation: 'bilinear'
crop_pct: 0.875
# model
model: 'densenet169'
num_classes: 1000
pretrained: False
ckpt_path: ''
keep_checkpoint_max: 10
ckpt_save_dir: './ckpt'
epoch_size: 120
dataset_sink_mode: True
amp_level: 'O2'
# loss
loss: 'CE'
label_smoothing: 0.1
# lr scheduler
scheduler: 'cosine_decay'
min_lr: 0.0
lr: 0.1
warmup_epochs: 0
decay_epochs: 120
# optimizer
opt: 'momentum'
filter_bias_and_bn: True
momentum: 0.9
weight_decay: 0.0001
loss_scale: 1024
use_nesterov: False

@@ -0,0 +1,51 @@
# system
mode: 0
distribute: True
num_parallel_workers: 8
# dataset
dataset: 'imagenet'
data_dir: '/path/to/imagenet'
shuffle: True
dataset_download: False
batch_size: 32
drop_remainder: True
val_split: val
# augmentation
image_resize: 224
scale: [0.08, 1.0]
ratio: [0.75, 1.333]
hflip: 0.5
interpolation: 'bilinear'
crop_pct: 0.875
# model
model: 'densenet169'
num_classes: 1000
pretrained: False
ckpt_path: ''
keep_checkpoint_max: 10
ckpt_save_dir: './ckpt'
epoch_size: 120
dataset_sink_mode: True
amp_level: 'O2'
# loss
loss: 'CE'
label_smoothing: 0.1
# lr scheduler
scheduler: 'cosine_decay'
min_lr: 0.0
lr: 0.1
warmup_epochs: 0
decay_epochs: 120
# optimizer
opt: 'momentum'
filter_bias_and_bn: True
momentum: 0.9
weight_decay: 0.0001
loss_scale: 1024
use_nesterov: False

View File

@ -0,0 +1,51 @@
# system
mode: 0
distribute: True
num_parallel_workers: 8
# dataset
dataset: 'imagenet'
data_dir: '/path/to/imagenet'
shuffle: True
dataset_download: False
batch_size: 32
drop_remainder: True
val_split: val
# augmentation
image_resize: 224
scale: [0.08, 1.0]
ratio: [0.75, 1.333]
hflip: 0.5
interpolation: 'bilinear'
crop_pct: 0.875
# model
model: 'densenet201'
num_classes: 1000
pretrained: False
ckpt_path: ''
keep_checkpoint_max: 10
ckpt_save_dir: './ckpt'
epoch_size: 120
dataset_sink_mode: True
amp_level: 'O2'
# loss
loss: 'CE'
label_smoothing: 0.1
# lr scheduler
scheduler: 'cosine_decay'
min_lr: 0.0
lr: 0.1
warmup_epochs: 0
decay_epochs: 120
# optimizer
opt: 'momentum'
filter_bias_and_bn: True
momentum: 0.9
weight_decay: 0.0001
loss_scale: 1024
use_nesterov: False

View File

@ -0,0 +1,51 @@
# system
mode: 0
distribute: True
num_parallel_workers: 8
# dataset
dataset: 'imagenet'
data_dir: '/path/to/imagenet'
shuffle: True
dataset_download: False
batch_size: 32
drop_remainder: True
val_split: val
# augmentation
image_resize: 224
scale: [0.08, 1.0]
ratio: [0.75, 1.333]
hflip: 0.5
interpolation: 'bilinear'
crop_pct: 0.875
# model
model: 'densenet201'
num_classes: 1000
pretrained: False
ckpt_path: ''
keep_checkpoint_max: 10
ckpt_save_dir: './ckpt'
epoch_size: 120
dataset_sink_mode: True
amp_level: 'O2'
# loss
loss: 'CE'
label_smoothing: 0.1
# lr scheduler
scheduler: 'cosine_decay'
min_lr: 0.0
lr: 0.1
warmup_epochs: 0
decay_epochs: 120
# optimizer
opt: 'momentum'
filter_bias_and_bn: True
momentum: 0.9
weight_decay: 0.0001
loss_scale: 1024
use_nesterov: False

102
configs/dpn/README.md Normal file
View File

@ -0,0 +1,102 @@
# Dual Path Networks (DPN)
<!--- Guideline: please use url linked to the paper abstract in ArXiv instead of PDF for fast loading. -->
> [Dual Path Networks](https://arxiv.org/abs/1707.01629v2)
## Introduction
<!--- Guideline: Introduce the model and architectures. Please cite if you use/adopt paper explanation from others. -->
<!--- Guideline: If an architecture table/figure is available in the paper, please put one here and cite for intuitive illustration. -->
Figure 1 shows the model architectures of ResNet, DenseNet and Dual Path Networks. By combining the feature reuse of ResNet with the new-feature exploration of DenseNet,
DPN enjoys both benefits: it shares common features while keeping the flexibility to explore new ones. As a result, DPN achieves better performance at
lower computational cost than ResNet and DenseNet on the ImageNet-1K dataset.[[1](#references)]
<p align="center">
<img src="https://user-images.githubusercontent.com/77485245/219323700-62029af1-e034-4bf4-8c87-d0c48a5e04b9.jpeg" width=800 />
</p>
<p align="center">
<em>Figure 1. Architecture of DPN [<a href="#references">1</a>] </em>
</p>
## Results
<!--- Guideline:
Table Format:
- Model: model name in lower case with _ separator.
- Context: Training context denoted as {device}x{pieces}-{MS mode}, where mindspore mode can be G - graph mode or F - pynative mode with ms function. For example, D910x8-G is for training on 8 pieces of Ascend 910 NPU using graph mode.
- Top-1 and Top-5: Keep 2 digits after the decimal point.
- Params (M): # of model parameters in millions (10^6). Keep 2 digits after the decimal point
- Recipe: Training recipe/configuration linked to a yaml config file. Use absolute url path.
- Download: url of the pretrained model weights. Use absolute url path.
-->
Our reproduced model performance on ImageNet-1K is reported as follows.
<div align="center">
| Model | Context | Top-1 (%) | Top-5 (%) | Params (M) | Recipe | Download |
|-------|----------|-----------|-----------|------------|------------------------------------------------------------------------------------------|-----------------------------------------------------------------------------------|
| dpn92 | D910x8-G | 79.46 | 94.49 | 37.79 | [yaml](https://github.com/mindspore-lab/mindcv/blob/main/configs/dpn/dpn92_ascend.yaml) | [weights](https://download.mindspore.cn/toolkits/mindcv/dpn/dpn92-e3e0fca.ckpt) |
| dpn98 | D910x8-G | 79.94 | 94.57 | 61.74 | [yaml](https://github.com/mindspore-lab/mindcv/blob/main/configs/dpn/dpn98_ascend.yaml) | [weights](https://download.mindspore.cn/toolkits/mindcv/dpn/dpn98-119a8207.ckpt) |
| dpn107 | D910x8-G | 80.05 | 94.74 | 87.13 | [yaml](https://github.com/mindspore-lab/mindcv/blob/main/configs/dpn/dpn107_ascend.yaml) | [weights](https://download.mindspore.cn/toolkits/mindcv/dpn/dpn107-7d7df07b.ckpt) |
| dpn131 | D910x8-G | 80.07 | 94.72 | 79.48 | [yaml](https://github.com/mindspore-lab/mindcv/blob/main/configs/dpn/dpn131_ascend.yaml) | [weights](https://download.mindspore.cn/toolkits/mindcv/dpn/dpn131-47f084b3.ckpt) |
</div>
#### Notes
- Context: Training context denoted as {device}x{pieces}-{MS mode}, where mindspore mode can be G - graph mode or F - pynative mode with ms function. For example, D910x8-G is for training on 8 pieces of Ascend 910 NPU using graph mode.
- Top-1 and Top-5: Accuracy reported on the validation set of ImageNet-1K.
## Quick Start
### Preparation
#### Installation
Please refer to the [installation instruction](https://github.com/mindspore-lab/mindcv#installation) in MindCV.
#### Dataset Preparation
Please download the [ImageNet-1K](https://www.image-net.org/challenges/LSVRC/2012/index.php) dataset for model training and validation.
### Training
<!--- Guideline: Please avoid using shell scripts in the command line. Python scripts preferred. -->
* Distributed Training
It is easy to reproduce the reported results with the pre-defined training recipe. For distributed training on multiple Ascend 910 devices, please run
```shell
# distributed training on multiple GPU/Ascend devices
mpirun -n 8 python train.py --config configs/dpn/dpn92_ascend.yaml --data_dir /path/to/imagenet
```
> If the script is executed by the root user, the `--allow-run-as-root` parameter must be added to `mpirun`.
Similarly, you can train the model on multiple GPU devices with the above `mpirun` command.
For detailed illustration of all hyper-parameters, please refer to [config.py](https://github.com/mindspore-lab/mindcv/blob/main/config.py).
**Note:** As the global batch size (batch_size x num_devices) is an important hyper-parameter, it is recommended to keep the global batch size unchanged for reproduction or adjust the learning rate linearly to a new global batch size.
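For instance, running the same recipe on 4 devices instead of 8 halves the global batch size (32 x 4 = 128 instead of 32 x 8 = 256), so the linear scaling rule suggests halving the base learning rate as well. A minimal sketch, assuming `train.py` accepts command-line overrides of the yaml fields (the flag names are assumed to follow the config keys):

```shell
# hypothetical run on 4 devices: base lr in dpn92_ascend.yaml is 0.05, scaled linearly to 0.025
mpirun -n 4 python train.py --config configs/dpn/dpn92_ascend.yaml --data_dir /path/to/imagenet --lr 0.025
```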
* Standalone Training
If you want to train or finetune the model on a smaller dataset without distributed training, please run:
```shell
# standalone training on a CPU/GPU/Ascend device
python train.py --config configs/dpn/dpn92_ascend.yaml --data_dir /path/to/dataset --distribute False
```
### Validation
To validate the accuracy of the trained model, you can use `validate.py` and parse the checkpoint path with `--ckpt_path`.
```
python validate.py -c configs/dpn/dpn92_ascend.yaml --data_dir /path/to/imagenet --ckpt_path /path/to/ckpt
```
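For example, to sanity-check a released checkpoint from the table above, you could first download the weights (assuming `wget` is available) and then point `--ckpt_path` at the local file:

```shell
# download the released dpn92 weights and evaluate them on the ImageNet-1K validation set
wget https://download.mindspore.cn/toolkits/mindcv/dpn/dpn92-e3e0fca.ckpt
python validate.py -c configs/dpn/dpn92_ascend.yaml --data_dir /path/to/imagenet --ckpt_path ./dpn92-e3e0fca.ckpt
```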
### Deployment
Please refer to the [deployment tutorial](https://github.com/mindspore-lab/mindcv/blob/main/tutorials/deployment.md) in MindCV.
## References
<!--- Guideline: Citation format GB/T 7714 is suggested. -->
[1] Chen Y, Li J, Xiao H, et al. Dual path networks[J]. Advances in neural information processing systems, 2017, 30.

View File

@ -0,0 +1,52 @@
# system
mode: 0
distribute: True
num_parallel_workers: 8
val_while_train: True
# dataset
dataset: 'imagenet'
data_dir: 'path/to/imagenet'
shuffle: True
dataset_download: False
batch_size: 32
drop_remainder: True
# augmentation
image_resize: 224
scale: [0.08, 1.0]
ratio: [0.75, 1.333]
hflip: 0.5
vflip: 0.0
interpolation: 'bilinear'
crop_pct: 0.875
# model
model: 'dpn107'
num_classes: 1000
pretrained: False
ckpt_path: ''
keep_checkpoint_max: 10
ckpt_save_dir: './ckpt'
epoch_size: 200
dataset_sink_mode: True
amp_level: 'O2'
# loss
loss: 'CE'
label_smoothing: 0.1
# lr scheduler
scheduler: 'cosine_decay'
min_lr: 1e-6
lr: 0.05
warmup_epochs: 0
decay_epochs: 200
# optimizer
opt: 'SGD'
filter_bias_and_bn: True
momentum: 0.9
weight_decay: 0.0001
loss_scale: 1024
use_nesterov: False

View File

@ -0,0 +1,52 @@
# system
mode: 0
distribute: True
num_parallel_workers: 8
val_while_train: True
# dataset
dataset: 'imagenet'
data_dir: 'path/to/imagenet'
shuffle: True
dataset_download: False
batch_size: 32
drop_remainder: True
# augmentation
image_resize: 224
scale: [0.08, 1.0]
ratio: [0.75, 1.333]
hflip: 0.5
vflip: 0.0
interpolation: 'bilinear'
crop_pct: 0.875
# model
model: 'dpn131'
num_classes: 1000
pretrained: False
ckpt_path: ''
keep_checkpoint_max: 10
ckpt_save_dir: './ckpt'
epoch_size: 200
dataset_sink_mode: True
amp_level: 'O2'
# loss
loss: 'CE'
label_smoothing: 0.1
# lr scheduler
scheduler: 'cosine_decay'
min_lr: 1e-6
lr: 0.05
warmup_epochs: 0
decay_epochs: 200
# optimizer
opt: 'SGD'
filter_bias_and_bn: True
momentum: 0.9
weight_decay: 0.0001
loss_scale: 1024
use_nesterov: False

View File

@ -0,0 +1,52 @@
# system
mode: 0
distribute: True
num_parallel_workers: 8
val_while_train: True
# dataset
dataset: 'imagenet'
data_dir: 'path/to/imagenet'
shuffle: True
dataset_download: False
batch_size: 32
drop_remainder: True
# augmentation
image_resize: 224
scale: [0.08, 1.0]
ratio: [0.75, 1.333]
hflip: 0.5
vflip: 0.0
interpolation: 'bilinear'
crop_pct: 0.875
# model
model: 'dpn92'
num_classes: 1000
pretrained: False
ckpt_path: ''
keep_checkpoint_max: 30
ckpt_save_dir: './ckpt'
epoch_size: 200
dataset_sink_mode: True
amp_level: 'O2'
# loss
loss: 'CE'
label_smoothing: 0.1
# lr scheduler
scheduler: 'cosine_decay'
min_lr: 1e-6
lr: 0.05
warmup_epochs: 0
decay_epochs: 200
# optimizer
opt: 'SGD'
filter_bias_and_bn: True
momentum: 0.9
weight_decay: 0.0001
loss_scale: 1024
use_nesterov: False

View File

@ -0,0 +1,52 @@
# system
mode: 0
distribute: True
num_parallel_workers: 8
val_while_train: True
# dataset
dataset: 'imagenet'
data_dir: 'path/to/imagenet'
shuffle: True
dataset_download: False
batch_size: 32
drop_remainder: True
# augmentation
image_resize: 224
scale: [0.08, 1.0]
ratio: [0.75, 1.333]
hflip: 0.5
vflip: 0.0
interpolation: 'bilinear'
crop_pct: 0.875
# model
model: 'dpn98'
num_classes: 1000
pretrained: False
ckpt_path: ''
keep_checkpoint_max: 10
ckpt_save_dir: './ckpt'
epoch_size: 200
dataset_sink_mode: True
amp_level: 'O2'
# loss
loss: 'CE'
label_smoothing: 0.1
# lr scheduler
scheduler: 'cosine_decay'
min_lr: 1e-6
lr: 0.05
warmup_epochs: 0
decay_epochs: 200
# optimizer
opt: 'SGD'
filter_bias_and_bn: True
momentum: 0.9
weight_decay: 0.0001
loss_scale: 1024
use_nesterov: False

View File

@ -0,0 +1,94 @@
# EdgeNeXt
> [EdgeNeXt: Efficiently Amalgamated CNN-Transformer Architecture for Mobile Vision Applications](https://arxiv.org/abs/2206.10589)
## Introduction
EdgeNeXt is a new efficient hybrid architecture that effectively combines the strengths of CNN and Transformer models. It introduces a split depth-wise transpose
attention (SDTA) encoder that splits input tensors into multiple channel groups and
utilizes depth-wise convolution along with self-attention across channel dimensions
to implicitly increase the receptive field and encode multi-scale features.[[1](#references)]
<p align="center">
<img src="https://user-images.githubusercontent.com/52945530/210045582-d31f832d-22e0-47bd-927f-74cf2daed91a.png" width=800 />
</p>
<p align="center">
<em>Figure 1. Architecture of EdgeNeXt [<a href="#references">1</a>] </em>
</p>
## Results
Our reproduced model performance on ImageNet-1K is reported as follows.
<div align="center">
| Model | Context | Top-1 (%) | Top-5 (%) | Params (M) | Recipe | Download |
|----------------|----------|-----------|-----------|------------|-------------------------------------------------------------------------------------------------------|---------------------------------------------------------------------------------------|
| edgenext_xx_small | D910x8-G | 71.02 | 89.99 | 1.33 | [yaml](https://github.com/mindspore-lab/mindcv/blob/main/configs/edgenext/edgenext_xx_small_ascend.yaml) | [weights](https://download.mindspore.cn/toolkits/mindcv/edgenext/edgenext_xx_small-afc971fb.ckpt) |
| edgenext_x_small | D910x8-G | 75.14 | 92.50 | 2.34 | [yaml](https://github.com/mindspore-lab/mindcv/blob/main/configs/edgenext/edgenext_x_small_ascend.yaml) | [weights](https://download.mindspore.cn/toolkits/mindcv/edgenext/edgenext_x_small-a200c6fc.ckpt) |
| edgenext_small | D910x8-G | 79.15 | 94.39 | 5.59 | [yaml](https://github.com/mindspore-lab/mindcv/blob/main/configs/edgenext/edgenext_small_ascend.yaml) | [weights](https://download.mindspore.cn/toolkits/mindcv/edgenext/edgenext_small-f530c372.ckpt) |
| edgenext_base | D910x8-G | 82.24 | 95.94 | 18.51 | [yaml](https://github.com/mindspore-lab/mindcv/blob/main/configs/edgenext/edgenext_base_ascend.yaml) | [weights](https://download.mindspore.cn/toolkits/mindcv/edgenext/edgenext_base-4335e9dc.ckpt) |
</div>
#### Notes
- Context: Training context denoted as {device}x{pieces}-{MS mode}, where mindspore mode can be G - graph mode or F - pynative mode with ms function. For example, D910x8-G is for training on 8 pieces of Ascend 910 NPU using graph mode.
- Top-1 and Top-5: Accuracy reported on the validation set of ImageNet-1K.
## Quick Start
### Preparation
#### Installation
Please refer to the [installation instruction](https://github.com/mindspore-lab/mindcv#installation) in MindCV.
#### Dataset Preparation
Please download the [ImageNet-1K](https://www.image-net.org/challenges/LSVRC/2012/index.php) dataset for model training and validation.
### Training
* Distributed Training
It is easy to reproduce the reported results with the pre-defined training recipe. For distributed training on multiple Ascend 910 devices, please run
```shell
# distributed training on multiple GPU/Ascend devices
mpirun -n 8 python train.py --config configs/edgenext/edgenext_small_ascend.yaml --data_dir /path/to/imagenet
```
> If the script is executed by the root user, the `--allow-run-as-root` parameter must be added to `mpirun`.
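In that case the command above becomes, for example:

```shell
# running as root: pass --allow-run-as-root to mpirun
mpirun --allow-run-as-root -n 8 python train.py --config configs/edgenext/edgenext_small_ascend.yaml --data_dir /path/to/imagenet
```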
Similarly, you can train the model on multiple GPU devices with the above `mpirun` command.
For detailed illustration of all hyper-parameters, please refer to [config.py](https://github.com/mindspore-lab/mindcv/blob/main/config.py).
**Note:** As the global batch size (batch_size x num_devices) is an important hyper-parameter, it is recommended to keep the global batch size unchanged for reproduction or adjust the learning rate linearly to a new global batch size.
* Standalone Training
If you want to train or finetune the model on a smaller dataset without distributed training, please run:
```shell
# standalone training on a CPU/GPU/Ascend device
python train.py --config configs/edgenext/edgenext_small_ascend.yaml --data_dir /path/to/dataset --distribute False
```
### Validation
To validate the accuracy of the trained model, you can use `validate.py` and parse the checkpoint path with `--ckpt_path`.
```
python validate.py -c configs/edgenext/edgenext_small_ascend.yaml --data_dir /path/to/imagenet --ckpt_path /path/to/ckpt
```
### Deployment
Please refer to the [deployment tutorial](https://github.com/mindspore-lab/mindcv/blob/main/tutorials/deployment.md) in MindCV.
## References
<!--- Guideline: Citation format should follow GB/T 7714. -->
[1] Maaz M, Shaker A, Cholakkal H, et al. EdgeNeXt: efficiently amalgamated CNN-transformer architecture for Mobile vision applications[J]. arXiv preprint arXiv:2206.10589, 2022.

View File

@ -0,0 +1,63 @@
# system
mode: 0
distribute: True
num_parallel_workers: 8
val_while_train: True
# dataset
dataset: 'imagenet'
data_dir: '/path/to/imagenet'
shuffle: True
dataset_download: False
batch_size: 256
drop_remainder: True
val_split: val
# augmentation
image_resize: 256
scale: [0.08, 1.0]
ratio: [0.75, 1.333]
hflip: 0.5
interpolation: 'bicubic'
crop_pct: 0.875
color_jitter: 0.4
re_prob: 0.0
mixup: 0.2
cutmix: 1.0
cutmix_prob: 1.0
auto_augment: 'randaug-m9-mstd0.5-inc1'
ema: True
ema_decay: 0.9995
# model
model: 'edgenext_base'
num_classes: 1000
pretrained: False
ckpt_path: ''
keep_checkpoint_max: 10
val_interval: 2
ckpt_save_dir: './ckpt'
epoch_size: 350
dataset_sink_mode: True
amp_level: 'O2'
drop_path_rate: 0.1
# loss
loss: 'CE'
label_smoothing: 0.1
# lr scheduler
scheduler: 'cosine_decay'
min_lr: 1e-6
lr: 4.5e-3
warmup_epochs: 20
decay_rate: 0.1
decay_epochs: 330
# optimizer
opt: 'adamw'
filter_bias_and_bn: True
momentum: 0.9
weight_decay: 0.05
loss_scale: 1024
use_nesterov: False

View File

@ -0,0 +1,62 @@
# system
mode: 0
distribute: True
num_parallel_workers: 8
val_while_train: True
# dataset
dataset: 'imagenet'
data_dir: '/path/to/imagenet'
shuffle: True
dataset_download: False
batch_size: 256
drop_remainder: True
val_split: val
# augmentation
image_resize: 256
scale: [0.08, 1.0]
ratio: [0.75, 1.333]
hflip: 0.5
interpolation: 'bicubic'
crop_pct: 0.875
color_jitter: 0.4
re_prob: 0.0
cutmix: 1.0
cutmix_prob: 0.0
auto_augment: 'randaug-m9-mstd0.5-inc1'
ema: True
ema_decay: 0.9995
# model
model: 'edgenext_small'
num_classes: 1000
pretrained: False
ckpt_path: ''
keep_checkpoint_max: 10
val_interval: 2
ckpt_save_dir: './ckpt'
epoch_size: 350
dataset_sink_mode: True
amp_level: 'O3'
drop_path_rate: 0.1
# loss
loss: 'CE'
label_smoothing: 0.1
# lr scheduler
scheduler: 'cosine_decay'
min_lr: 1e-6
lr: 4e-3
warmup_epochs: 20
decay_rate: 0.1
decay_epochs: 330
# optimizer
opt: 'adamw'
filter_bias_and_bn: True
momentum: 0.9
weight_decay: 0.05
loss_scale: 1024
use_nesterov: False

View File

@ -0,0 +1,62 @@
# system
mode: 0
distribute: True
num_parallel_workers: 8
val_while_train: True
# dataset
dataset: 'imagenet'
data_dir: '/path/to/imagenet'
shuffle: True
dataset_download: False
batch_size: 256
drop_remainder: True
val_split: val
# augmentation
image_resize: 256
scale: [0.08, 1.0]
ratio: [0.75, 1.333]
hflip: 0.5
interpolation: 'bicubic'
crop_pct: 0.875
color_jitter: 0.4
re_prob: 0.0
cutmix: 1.0
cutmix_prob: 0.0
auto_augment: 'randaug-m9-mstd0.5-inc1'
ema: True
ema_decay: 0.9995
# model
model: 'edgenext_x_small'
num_classes: 1000
pretrained: False
ckpt_path: ''
keep_checkpoint_max: 10
val_interval: 2
ckpt_save_dir: './ckpt'
epoch_size: 350
dataset_sink_mode: True
amp_level: 'O3'
drop_path_rate: 0.1
# loss
loss: 'CE'
label_smoothing: 0.1
# lr scheduler
scheduler: 'cosine_decay'
min_lr: 1e-6
lr: 4e-3
warmup_epochs: 20
decay_rate: 0.1
decay_epochs: 330
# optimizer
opt: 'adamw'
filter_bias_and_bn: True
momentum: 0.9
weight_decay: 0.05
loss_scale: 1024
use_nesterov: False

View File

@ -0,0 +1,61 @@
# system
mode: 0
distribute: True
num_parallel_workers: 8
val_while_train: True
# dataset
dataset: 'imagenet'
data_dir: '/path/to/imagenet'
shuffle: True
dataset_download: False
batch_size: 256
drop_remainder: True
val_split: val
# augmentation
image_resize: 256
scale: [0.08, 1.0]
ratio: [0.75, 1.333]
hflip: 0.5
interpolation: 'bicubic'
crop_pct: 0.875
color_jitter: 0.4
re_prob: 0.0
cutmix: 1.0
cutmix_prob: 0.0
ema: True
ema_decay: 0.9995
# model
model: 'edgenext_xx_small'
num_classes: 1000
pretrained: False
ckpt_path: ''
keep_checkpoint_max: 10
val_interval: 2
ckpt_save_dir: './ckpt'
epoch_size: 350
dataset_sink_mode: True
amp_level: 'O2'
drop_path_rate: 0.0
# loss
loss: 'CE'
label_smoothing: 0.1
# lr scheduler
scheduler: 'cosine_decay'
min_lr: 1e-6
lr: 4e-3
warmup_epochs: 20
decay_rate: 0.1
decay_epochs: 330
# optimizer
opt: 'adamw'
filter_bias_and_bn: True
momentum: 0.9
weight_decay: 0.05
loss_scale: 1024
use_nesterov: False

View File

@ -0,0 +1,100 @@
# EfficientNet
<!--- Guideline: please use url linked to the paper abstract in ArXiv instead of PDF for fast loading. -->
> [EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks](https://arxiv.org/abs/1905.11946)
## Introduction
<!--- Guideline: Introduce the model and architectures. Please cite if you use/adopt paper explanation from others. -->
<!--- Guideline: If an architecture table/figure is available in the paper, please put one here and cite for intuitive illustration. -->
Figure 1 shows four ways to scale up a model: width scaling, depth scaling, resolution scaling, and compound scaling, which combines all three. Scaling only one
dimension tends to push the model towards a sub-optimal solution, whereas applying the three methods together is more likely to reach an optimal one. Using
neural architecture search, the best configurations for width, depth and resolution scaling can be found (the compound scaling rule is sketched after the figure).
With this recipe, EfficientNet achieves better performance on the ImageNet-1K dataset than previous methods.[[1](#references)]
<p align="center">
<img src="https://user-images.githubusercontent.com/77485245/225044036-d0344404-e86c-483c-971f-863ebe6decc6.jpeg" width=800 />
</p>
<p align="center">
<em>Figure 1. Architecture of EfficientNet [<a href="#references">1</a>] </em>
</p>
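The compound scaling rule mentioned above can be summarized as follows (a sketch of the formulation in the original paper): a user-chosen coefficient φ controls how much extra compute is available, while the constants α, β, γ are found by a small grid search.

$$
\text{depth: } d = \alpha^{\phi}, \qquad
\text{width: } w = \beta^{\phi}, \qquad
\text{resolution: } r = \gamma^{\phi}, \qquad
\text{s.t. } \alpha \cdot \beta^{2} \cdot \gamma^{2} \approx 2, \; \alpha, \beta, \gamma \geq 1
$$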
## Results
<!--- Guideline:
Table Format:
- Model: model name in lower case with _ separator.
- Context: Training context denoted as {device}x{pieces}-{MS mode}, where mindspore mode can be G - graph mode or F - pynative mode with ms function. For example, D910x8-G is for training on 8 pieces of Ascend 910 NPU using graph mode.
- Top-1 and Top-5: Keep 2 digits after the decimal point.
- Params (M): # of model parameters in millions (10^6). Keep 2 digits after the decimal point
- Recipe: Training recipe/configuration linked to a yaml config file. Use absolute url path.
- Download: url of the pretrained model weights. Use absolute url path.
-->
Our reproduced model performance on ImageNet-1K is reported as follows.
<div align="center">
| Model | Context | Top-1 (%) | Top-5 (%) | Params (M) | Recipe | Download |
|-----------------|-----------|-----------|-----------|------------|------------------------------------------------------------------------------------------------------------|----------------------------------------------------------------------------------------------------|
| efficientnet_b0 | D910x64-G | 76.95 | 93.16 | 5.33 | [yaml](https://github.com/mindspore-lab/mindcv/blob/main/configs/efficientnet/efficientnet_b0_ascend.yaml) | [weights](https://download.mindspore.cn/toolkits/mindcv/efficientnet/efficientnet_b0-103ec70c.ckpt) |
</div>
#### Notes
- Context: Training context denoted as {device}x{pieces}-{MS mode}, where mindspore mode can be G - graph mode or F - pynative mode with ms function. For example, D910x8-G is for training on 8 pieces of Ascend 910 NPU using graph mode.
- Top-1 and Top-5: Accuracy reported on the validation set of ImageNet-1K.
## Quick Start
### Preparation
#### Installation
Please refer to the [installation instruction](https://github.com/mindspore-lab/mindcv#installation) in MindCV.
#### Dataset Preparation
Please download the [ImageNet-1K](https://www.image-net.org/challenges/LSVRC/2012/index.php) dataset for model training and validation.
### Training
<!--- Guideline: Please avoid using shell scripts in the command line. Python scripts preferred. -->
* Distributed Training
It is easy to reproduce the reported results with the pre-defined training recipe. For distributed training on multiple Ascend 910 devices, please run
```shell
# distributed training on multiple GPU/Ascend devices
mpirun -n 64 python train.py --config configs/efficientnet/efficientnet_b0_ascend.yaml --data_dir /path/to/imagenet
```
> If the script is executed by the root user, the `--allow-run-as-root` parameter must be added to `mpirun`.
Similarly, you can train the model on multiple GPU devices with the above `mpirun` command.
For detailed illustration of all hyper-parameters, please refer to [config.py](https://github.com/mindspore-lab/mindcv/blob/main/config.py).
**Note:** As the global batch size (batch_size x num_devices) is an important hyper-parameter, it is recommended to keep the global batch size unchanged for reproduction or adjust the learning rate linearly to a new global batch size.
* Standalone Training
If you want to train or finetune the model on a smaller dataset without distributed training, please run:
```shell
# standalone training on a CPU/GPU/Ascend device
python train.py --config configs/efficientnet/efficientnet_b0_ascend.yaml --data_dir /path/to/dataset --distribute False
```
### Validation
To validate the accuracy of the trained model, you can use `validate.py` and parse the checkpoint path with `--ckpt_path`.
```
python validate.py -c configs/efficientnet/efficientnet_b0_ascend.yaml --data_dir /path/to/imagenet --ckpt_path /path/to/ckpt
```
### Deployment
Please refer to the [deployment tutorial](https://github.com/mindspore-lab/mindcv/blob/main/tutorials/deployment.md) in MindCV.
## References
<!--- Guideline: Citation format GB/T 7714 is suggested. -->
[1] Tan M, Le Q. Efficientnet: Rethinking model scaling for convolutional neural networks[C]//International conference on machine learning. PMLR, 2019: 6105-6114.

View File

@ -0,0 +1,55 @@
# system
mode: 0
distribute: True
num_parallel_workers: 8
val_while_train: True
# dataset
dataset: 'imagenet'
data_dir: '/path/to/imagenet'
shuffle: True
dataset_download: False
batch_size: 128
drop_remainder: True
# augmentation
image_resize: 224
scale: [0.08, 1.0]
ratio: [0.75, 1.333]
hflip: 0.5
interpolation: 'bicubic'
crop_pct: 0.875
color_jitter: [0.4, 0.4, 0.4]
auto_augment: 'autoaug'
# model
model: 'efficientnet_b0'
num_classes: 1000
pretrained: False
ckpt_path: ''
keep_checkpoint_max: 10
ckpt_save_dir: './ckpt'
epoch_size: 450
dataset_sink_mode: True
amp_level: 'O2'
# loss
loss: 'CE'
label_smoothing: 0.1
# lr scheduler
scheduler: 'cosine_decay'
min_lr: 1e-10
lr: 0.128
warmup_epochs: 5
decay_epochs: 445
# optimizer
opt: 'rmsprop'
filter_bias_and_bn: True
momentum: 0.9
weight_decay: 1e-5
loss_scale_type: 'dynamic'
drop_overflow_update: True
use_nesterov: False
eps: 1e-3

View File

@ -0,0 +1,89 @@
# GoogLeNet
> [GoogLeNet: Going Deeper with Convolutions](https://arxiv.org/abs/1409.4842)
## Introduction
GoogLeNet is a deep learning architecture proposed by Christian Szegedy in 2014. Before it, AlexNet, VGG and other networks improved training results
mainly by increasing the network depth (number of layers), but stacking more layers brings negative effects such as overfitting, vanishing gradients and
exploding gradients. The Inception module improves training results from another perspective: it uses computing resources more efficiently and extracts
more features under the same amount of computation, thereby improving the training results.[[1](#references)]
<p align="center">
<img src="https://user-images.githubusercontent.com/53842165/210749903-5ff23c0e-547f-487d-bb64-70b6e99031ea.jpg" width=180 />
</p>
<p align="center">
<em>Figure 1. Architecture of GoogLeNet [<a href="#references">1</a>] </em>
</p>
## Results
Our reproduced model performance on ImageNet-1K is reported as follows.
<div align="center">
| Model | Context | Top-1 (%) | Top-5 (%) | Params (M) | Recipe | Download |
|-----------|----------|-----------|-----------|------------|---------------------------------------------------------------------------------------------------|--------------------------------------------------------------------------------------------|
| GoogLeNet | D910x8-G | 72.68 | 90.89 | 6.99 | [yaml](https://github.com/mindspore-lab/mindcv/blob/main/configs/googlenet/googlenet_ascend.yaml) | [weights](https://download.mindspore.cn/toolkits/mindcv/googlenet/googlenet-5552fcd3.ckpt) |
</div>
#### Notes
- Context: Training context denoted as {device}x{pieces}-{MS mode}, where mindspore mode can be G - graph mode or F - pynative mode with ms function. For example, D910x8-G is for training on 8 pieces of Ascend 910 NPU using graph mode.
- Top-1 and Top-5: Accuracy reported on the validation set of ImageNet-1K.
## Quick Start
### Preparation
#### Installation
Please refer to the [installation instruction](https://github.com/mindspore-ecosystem/mindcv#installation) in MindCV.
#### Dataset Preparation
Please download the [ImageNet-1K](https://www.image-net.org/challenges/LSVRC/2012/index.php) dataset for model training and validation.
### Training
* Distributed Training
It is easy to reproduce the reported results with the pre-defined training recipe. For distributed training on multiple Ascend 910 devices, please run
```shell
# distributed training on multiple GPU/Ascend devices
mpirun -n 8 python train.py --config configs/googlenet/googlenet_ascend.yaml --data_dir /path/to/imagenet
```
> If the script is executed by the root user, the `--allow-run-as-root` parameter must be added to `mpirun`.
Similarly, you can train the model on multiple GPU devices with the above `mpirun` command.
For detailed illustration of all hyper-parameters, please refer to [config.py](https://github.com/mindspore-lab/mindcv/blob/main/config.py).
**Note:** As the global batch size (batch_size x num_devices) is an important hyper-parameter, it is recommended to keep the global batch size unchanged for reproduction or adjust the learning rate linearly to a new global batch size.
* Standalone Training
If you want to train or finetune the model on a smaller dataset without distributed training, please run:
```shell
# standalone training on a CPU/GPU/Ascend device
python train.py --config configs/googlenet/googlenet_ascend.yaml --data_dir /path/to/dataset --distribute False
```
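When finetuning on a smaller custom dataset, the number of classes usually differs from ImageNet-1K. A minimal sketch, assuming the dataset follows the same folder layout and that `train.py` accepts command-line overrides of the yaml fields such as `--num_classes` and `--pretrained` (flag names assumed from the config keys):

```shell
# hypothetical finetuning run on a 10-class dataset
python train.py --config configs/googlenet/googlenet_ascend.yaml --data_dir /path/to/custom_dataset \
    --num_classes 10 --pretrained True --distribute False
```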
### Validation
To validate the accuracy of the trained model, you can use `validate.py` and parse the checkpoint path with `--ckpt_path`.
```shell
python validate.py -c configs/googlenet/googlenet_ascend.yaml --data_dir /path/to/imagenet --ckpt_path /path/to/ckpt
```
### Deployment
Please refer to the [deployment tutorial](https://github.com/mindspore-lab/mindcv/blob/main/tutorials/deployment.md) in MindCV.
## References
[1] Szegedy C, Liu W, Jia Y, et al. Going deeper with convolutions[C]//Proceedings of the IEEE conference on computer vision and pattern recognition. 2015: 1-9.

View File

@ -0,0 +1,51 @@
# system
mode: 0
distribute: True
num_parallel_workers: 8
val_while_train: True
# dataset
dataset: 'imagenet'
data_dir: '/path/to/imagenet'
shuffle: True
dataset_download: False
batch_size: 32
drop_remainder: True
# augmentation
image_resize: 224
scale: [0.08, 1.0]
ratio: [0.75, 1.333]
hflip: 0.5
interpolation: 'bilinear'
crop_pct: 0.875
# model
model: 'googlenet'
num_classes: 1000
pretrained: False
ckpt_path: ''
keep_checkpoint_max: 10
ckpt_save_dir: './ckpt'
epoch_size: 150
dataset_sink_mode: True
amp_level: 'O0'
# loss
loss: 'CE'
label_smoothing: 0.1
# lr scheduler
scheduler: 'cosine_decay'
lr: 0.045
min_lr: 0.0
decay_epochs: 145
warmup_epochs: 5
# optimizer
opt: 'momentum'
filter_bias_and_bn: True
momentum: 0.9
weight_decay: 0.00004
loss_scale: 1024
use_nesterov: False

100
configs/hrnet/README.md Normal file
View File

@ -0,0 +1,100 @@
# HRNet
<!--- Guideline: use url linked to abstract in ArXiv instead of PDF for fast loading. -->
> [Deep High-Resolution Representation Learning for Visual Recognition](https://arxiv.org/abs/1908.07919)
## Introduction
<!--- Guideline: Introduce the model and architectures. Cite if you use/adopt paper explanation from others. -->
High-resolution representations are essential for position-sensitive vision problems, such as human pose estimation, semantic segmentation, and object detection. Existing state-of-the-art frameworks first encode the input image as a low-resolution representation through a subnetwork formed by connecting high-to-low resolution convolutions (e.g., ResNet, VGGNet), and then recover the high-resolution representation from the encoded low-resolution one. Instead, the proposed network, named the High-Resolution Network (HRNet), maintains high-resolution representations through the whole process. It has two key characteristics: (i) it connects the high-to-low resolution convolution streams in parallel; (ii) it repeatedly exchanges information across resolutions. The benefit is that the resulting representation is semantically richer and spatially more precise. Experiments show the superiority of HRNet in a wide range of applications, including human pose estimation, semantic segmentation, and object detection, suggesting that HRNet is a stronger backbone for computer vision problems.[[1](#references)]
<!--- Guideline: If an architecture table/figure is available in the paper, put one here and cite for intuitive illustration. -->
<p align="center">
<img src="https://user-images.githubusercontent.com/8342575/218354682-4256e17e-bb69-4e51-8bb9-a08fc29087c4.png" width=800 />
</p>
<p align="center">
<em> Figure 1. Architecture of HRNet [<a href="#references">1</a>] </em>
</p>
## Results
<!--- Guideline:
Table Format:
- Model: model name in lower case with _ separator.
- Context: Training context denoted as {device}x{pieces}-{MS mode}, where mindspore mode can be G - graph mode or F - pynative mode with ms function. For example, D910x8-G is for training on 8 pieces of Ascend 910 NPU using graph mode.
- Top-1 and Top-5: Keep 2 digits after the decimal point.
- Params (M): # of model parameters in millions (10^6). Keep 2 digits after the decimal point
- Recipe: Training recipe/configuration linked to a yaml config file. Use absolute url path.
- Download: url of the pretrained model weights. Use absolute url path.
-->
Our reproduced model performance on ImageNet-1K is reported as follows.
<div align="center">
| Model | Context | Top-1 (%) | Top-5 (%) | Params (M) | Recipe | Download |
|-----------|----------|-----------|-----------|------------|-----------------------------------------------------------------------------------------------|-----------------------------------------------------------------------------------|
| hrnet_w32 | D910x8-G | 80.64 | 95.44 | 41.30 | [yaml](https://github.com/mindspore-lab/mindcv/blob/main/configs/hrnet/hrnet_w32_ascend.yaml) | [weights](https://download.mindspore.cn/toolkits/mindcv/hrnet/hrnet_w32-cc4fbd91.ckpt) |
| hrnet_w48 | D910x8-G | 81.19 | 95.69 | 77.57 | [yaml](https://github.com/mindspore-lab/mindcv/blob/main/configs/hrnet/hrnet_w48_ascend.yaml) | [weights](https://download.mindspore.cn/toolkits/mindcv/hrnet/hrnet_w48-2e3399cd.ckpt) |
</div>
#### Notes
- Context: Training context denoted as {device}x{pieces}-{MS mode}, where mindspore mode can be G - graph mode or F - pynative mode with ms function. For example, D910x8-G is for training on 8 pieces of Ascend 910 NPU using graph mode.
- Top-1 and Top-5: Accuracy reported on the validation set of ImageNet-1K.
## Quick Start
### Preparation
#### Installation
Please refer to the [installation instruction](https://github.com/mindspore-lab/mindcv#installation) in MindCV.
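For a quick start, a typical install might look like the snippet below; the exact MindSpore build depends on your hardware (CPU/GPU/Ascend), so treat this as a sketch and follow the linked instructions for the supported versions.

```shell
# install mindcv from PyPI; install a MindSpore build matching your device separately
pip install mindcv
```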
#### Dataset Preparation
Please download the [ImageNet-1K](https://www.image-net.org/challenges/LSVRC/2012/index.php) dataset for model training and validation.
### Training
<!--- Guideline: Avoid using shell script in the command line. Python script preferred. -->
* Distributed Training
It is easy to reproduce the reported results with the pre-defined training recipe. For distributed training on multiple Ascend 910 devices, please run
```shell
# distributed training on multiple GPU/Ascend devices
mpirun -n 8 python train.py --config configs/hrnet/hrnet_w32_ascend.yaml --data_dir /path/to/imagenet
```
> If the script is executed by the root user, the `--allow-run-as-root` parameter must be added to `mpirun`.
Similarly, you can train the model on multiple GPU devices with the above `mpirun` command.
For detailed illustration of all hyper-parameters, please refer to [config.py](https://github.com/mindspore-lab/mindcv/blob/main/config.py).
**Note:** As the global batch size (batch_size x num_devices) is an important hyper-parameter, it is recommended to keep the global batch size unchanged for reproduction or adjust the learning rate linearly to a new global batch size.
* Standalone Training
If you want to train or finetune the model on a smaller dataset without distributed training, please run:
```shell
# standalone training on a CPU/GPU/Ascend device
python train.py --config configs/hrnet/hrnet_w32_ascend.yaml --data_dir /path/to/dataset --distribute False
```
### Validation
To validate the accuracy of the trained model, you can use `validate.py` and parse the checkpoint path with `--ckpt_path`.
```
python validate.py -c configs/hrnet/hrnet_w32_ascend.yaml --data_dir /path/to/imagenet --ckpt_path /path/to/ckpt
```
### Deployment
To deploy online inference services with the trained model efficiently, please refer to the [deployment tutorial](https://github.com/mindspore-lab/mindcv/blob/main/tutorials/deployment.md).
## References
<!--- Guideline: Citation format GB/T 7714 is suggested. -->
[1] Jingdong Wang, Ke Sun, Tianheng Cheng, et al. Deep High-Resolution Representation Learning for Visual Recognition[J]. arXiv preprint arXiv:1908.07919, 2019.

View File

@ -0,0 +1,55 @@
# system
mode: 0
distribute: True
num_parallel_workers: 8
val_while_train: True
val_interval: 1
# dataset
dataset: "imagenet"
data_dir: "/path/to/imagenet"
shuffle: True
dataset_download: False
batch_size: 128
drop_remainder: True
# augmentation
image_resize: 224
scale: [0.08, 1.0]
ratio: [0.75, 1.333]
hflip: 0.5
interpolation: "bilinear"
auto_augment: "randaug-m7-mstd0.5"
re_prob: 0.1
mixup: 0.2
cutmix: 1.0
cutmix_prob: 1.0
# model
model: "hrnet_w32"
num_classes: 1000
pretrained: False
ckpt_path: ""
keep_checkpoint_max: 5
ckpt_save_policy: "top_k"
ckpt_save_dir: "./ckpt"
epoch_size: 300
dataset_sink_mode: True
amp_level: "O2"
# loss
loss: "CE"
label_smoothing: 0.1
# lr scheduler
scheduler: "cosine_decay"
min_lr: 0.00001
lr: 0.001
warmup_epochs: 20
decay_epochs: 280
# optimizer
opt: 'adamw'
weight_decay: 0.05
loss_scale: 1024
filter_bias_and_bn: True

View File

@ -0,0 +1,55 @@
# system
mode: 0
distribute: True
num_parallel_workers: 8
val_while_train: True
val_interval: 1
# dataset
dataset: "imagenet"
data_dir: "/path/to/imagenet"
shuffle: True
dataset_download: False
batch_size: 128
drop_remainder: True
# augmentation
image_resize: 224
scale: [0.08, 1.0]
ratio: [0.75, 1.333]
hflip: 0.5
interpolation: "bilinear"
auto_augment: "randaug-m7-mstd0.5"
re_prob: 0.1
mixup: 0.2
cutmix: 1.0
cutmix_prob: 1.0
# model
model: "hrnet_w48"
num_classes: 1000
pretrained: False
ckpt_path: ""
keep_checkpoint_max: 5
ckpt_save_policy: "top_k"
ckpt_save_dir: "./ckpt"
epoch_size: 300
dataset_sink_mode: True
amp_level: "O2"
# loss
loss: "CE"
label_smoothing: 0.1
# lr scheduler
scheduler: "cosine_decay"
min_lr: 0.00001
lr: 0.001
warmup_epochs: 20
decay_epochs: 280
# optimizer
opt: 'adamw'
weight_decay: 0.05
loss_scale: 1024
filter_bias_and_bn: True

View File

@ -0,0 +1,90 @@
# InceptionV3
> [InceptionV3: Rethinking the Inception Architecture for Computer Vision](https://arxiv.org/pdf/1512.00567.pdf)
## Introduction
InceptionV3 is an upgraded version of GoogLeNet. One of its most important improvements is factorization, which decomposes a 7x7 convolution into two
one-dimensional convolutions (1x7 and 7x1), and likewise a 3x3 convolution into 1x3 and 3x1. This both accelerates computation (the saved compute can be
used to deepen the network) and splits one convolution into two, which further increases the network depth and its nonlinearity. It is also worth noting
that the input resolution grows from 224x224 to 299x299, and the 35x35/17x17/8x8 modules are designed more carefully. In addition, V3 adds batch
normalization, which makes the model converge faster, acts as a partial regularizer and effectively reduces overfitting.[[1](#references)]
<p align="center">
<img src="https://user-images.githubusercontent.com/53842165/210745725-736bd456-4d31-4f48-b958-75a53cf30e99.jpg" width=800 />
</p>
<p align="center">
<em>Figure 1. Architecture of InceptionV3 [<a href="#references">1</a>] </em>
</p>
## Results
Our reproduced model performance on ImageNet-1K is reported as follows.
<div align="center">
| Model | Context | Top-1 (%) | Top-5 (%) | Params (M) | Recipe | Download |
|--------------|----------|-----------|-----------|------------|---------------------------------------------------------------------------------------------------------|---------------------------------------------------------------------------------------------|
| Inception_v3 | D910x8-G | 79.11 | 94.40 | 27.20 | [yaml](https://github.com/mindspore-lab/mindcv/blob/main/configs/inception_v3/inception_v3_ascend.yaml) | [weights](https://download.mindspore.cn/toolkits/mindcv/inception_v3/inception_v3-38f67890.ckpt) |
</div>
#### Notes
- Context: Training context denoted as {device}x{pieces}-{MS mode}, where mindspore mode can be G - graph mode or F - pynative mode with ms function. For example, D910x8-G is for training on 8 pieces of Ascend 910 NPU using graph mode.
- Top-1 and Top-5: Accuracy reported on the validation set of ImageNet-1K.
## Quick Start
### Preparation
#### Installation
Please refer to the [installation instruction](https://github.com/mindspore-ecosystem/mindcv#installation) in MindCV.
#### Dataset Preparation
Please download the [ImageNet-1K](https://www.image-net.org/challenges/LSVRC/2012/index.php) dataset for model training and validation.
### Training
* Distributed Training
It is easy to reproduce the reported results with the pre-defined training recipe. For distributed training on multiple Ascend 910 devices, please run
```shell
# distributed training on multiple GPU/Ascend devices
mpirun -n 8 python train.py --config configs/inception_v3/inception_v3_ascend.yaml --data_dir /path/to/imagenet
```
> If the script is executed by the root user, the `--allow-run-as-root` parameter must be added to `mpirun`.
Similarly, you can train the model on multiple GPU devices with the above `mpirun` command.
For detailed illustration of all hyper-parameters, please refer to [config.py](https://github.com/mindspore-lab/mindcv/blob/main/config.py).
**Note:** As the global batch size (batch_size x num_devices) is an important hyper-parameter, it is recommended to keep the global batch size unchanged for reproduction or adjust the learning rate linearly to a new global batch size.
* Standalone Training
If you want to train or finetune the model on a smaller dataset without distributed training, please run:
```shell
# standalone training on a CPU/GPU/Ascend device
python train.py --config configs/inception_v3/inception_v3_ascend.yaml --data_dir /path/to/dataset --distribute False
```
### Validation
To validate the accuracy of the trained model, you can use `validate.py` and parse the checkpoint path with `--ckpt_path`.
```shell
python validate.py -c configs/inception_v3/inception_v3_ascend.yaml --data_dir /path/to/imagenet --ckpt_path /path/to/ckpt
```
### Deployment
Please refer to the [deployment tutorial](https://github.com/mindspore-lab/mindcv/blob/main/tutorials/deployment.md) in MindCV.
## References
[1] Szegedy C, Vanhoucke V, Ioffe S, et al. Rethinking the inception architecture for computer vision[C]//Proceedings of the IEEE conference on computer vision and pattern recognition. 2016: 2818-2826.

View File

@ -0,0 +1,54 @@
# system
mode: 0
distribute: True
num_parallel_workers: 8
val_while_train: True
# dataset
dataset: 'imagenet'
data_dir: '/path/to/imagenet'
shuffle: True
dataset_download: False
batch_size: 32
drop_remainder: True
# augmentation
image_resize: 299
scale: [0.08, 1.0]
ratio: [0.75, 1.333]
hflip: 0.5
interpolation: 'bilinear'
auto_augment: 'autoaug'
re_prob: 0.25
crop_pct: 0.875
# model
model: 'inception_v3'
num_classes: 1000
pretrained: False
ckpt_path: ''
keep_checkpoint_max: 10
ckpt_save_dir: './ckpt'
epoch_size: 200
dataset_sink_mode: True
amp_level: 'O0'
aux_factor: 0.1
# loss config
loss: 'CE'
label_smoothing: 0.1
# lr scheduler
scheduler: 'warmup_cosine_decay'
lr: 0.045
min_lr: 0.0
decay_epochs: 195
warmup_epochs: 5
# optimizer
opt: 'momentum'
filter_bias_and_bn: True
momentum: 0.9
weight_decay: 0.00004
loss_scale: 1024
use_nesterov: False

View File

@ -0,0 +1,87 @@
# InceptionV4
> [InceptionV4: Inception-v4, Inception-ResNet and the Impact of Residual Connections on Learning](https://arxiv.org/pdf/1602.07261.pdf)
## Introduction
InceptionV4 studies whether combining the Inception module with residual connections brings further improvements. The authors find that residual
connections greatly accelerate training and also improve performance, yielding the Inception-ResNet v2 network; they additionally design a deeper and
more streamlined Inception v4 model that achieves performance comparable to Inception-ResNet v2.[[1](#references)]
<p align="center">
<img src="https://user-images.githubusercontent.com/53842165/210749903-5ff23c0e-547f-487d-bb64-70b6e99031ea.jpg" width=500 />
</p>
<p align="center">
<em>Figure 1. Architecture of InceptionV4 [<a href="#references">1</a>] </em>
</p>
## Results
Our reproduced model performance on ImageNet-1K is reported as follows.
<div align="center">
| Model | Context | Top-1 (%) | Top-5 (%) | Params (M) | Recipe | Download |
|--------------|----------|-----------|-----------|------------|---------------------------------------------------------------------------------------------------------|---------------------------------------------------------------------------------------------|
| Inception_v4 | D910x8-G | 80.88 | 95.34 | 42.74 | [yaml](https://github.com/mindspore-lab/mindcv/blob/main/configs/inception_v4/inception_v4_ascend.yaml) | [weights](https://download.mindspore.cn/toolkits/mindcv/inception_v4/inception_v4-db9c45b3.ckpt) |
</div>
#### Notes
- Context: Training context denoted as {device}x{pieces}-{MS mode}, where mindspore mode can be G - graph mode or F - pynative mode with ms function. For example, D910x8-G is for training on 8 pieces of Ascend 910 NPU using graph mode.
- Top-1 and Top-5: Accuracy reported on the validation set of ImageNet-1K.
## Quick Start
### Preparation
#### Installation
Please refer to the [installation instruction](https://github.com/mindspore-ecosystem/mindcv#installation) in MindCV.
#### Dataset Preparation
Please download the [ImageNet-1K](https://www.image-net.org/challenges/LSVRC/2012/index.php) dataset for model training and validation.
### Training
* Distributed Training
It is easy to reproduce the reported results with the pre-defined training recipe. For distributed training on multiple Ascend 910 devices, please run
```shell
# distributed training on multiple GPU/Ascend devices
mpirun -n 8 python train.py --config configs/inception_v4/inception_v4_ascend.yaml --data_dir /path/to/imagenet
```
> If the script is executed by the root user, the `--allow-run-as-root` parameter must be added to `mpirun`.
Similarly, you can train the model on multiple GPU devices with the above `mpirun` command.
For detailed illustration of all hyper-parameters, please refer to [config.py](https://github.com/mindspore-lab/mindcv/blob/main/config.py).
**Note:** As the global batch size (batch_size x num_devices) is an important hyper-parameter, it is recommended to keep the global batch size unchanged for reproduction or adjust the learning rate linearly to a new global batch size.
* Standalone Training
If you want to train or finetune the model on a smaller dataset without distributed training, please run:
```shell
# standalone training on a CPU/GPU/Ascend device
python train.py --config configs/inception_v4/inception_v4_ascend.yaml --data_dir /path/to/dataset --distribute False
```
### Validation
To validate the accuracy of the trained model, you can use `validate.py` and parse the checkpoint path with `--ckpt_path`.
```shell
python validate.py -c configs/inception_v4/inception_v4_ascend.yaml --data_dir /path/to/imagenet --ckpt_path /path/to/ckpt
```
### Deployment
Please refer to the [deployment tutorial](https://github.com/mindspore-lab/mindcv/blob/main/tutorials/deployment.md) in MindCV.
## References
[1] Szegedy C, Ioffe S, Vanhoucke V, et al. Inception-v4, inception-resnet and the impact of residual connections on learning[C]//Thirty-first AAAI conference on artificial intelligence. 2017.

View File

@ -0,0 +1,53 @@
# system
mode: 0
distribute: True
num_parallel_workers: 8
val_while_train: True
# dataset
dataset: 'imagenet'
data_dir: '/path/to/imagenet'
shuffle: True
dataset_download: False
batch_size: 32
drop_remainder: True
# augmentation
image_resize: 299
scale: [0.08, 1.0]
ratio: [0.75, 1.333]
hflip: 0.5
interpolation: 'bilinear'
auto_augment: 'autoaug'
re_prob: 0.25
crop_pct: 0.875
# model
model: 'inception_v4'
num_classes: 1000
pretrained: False
ckpt_path: ''
keep_checkpoint_max: 10
ckpt_save_dir: './ckpt'
epoch_size: 200
dataset_sink_mode: True
amp_level: 'O2'
# loss
loss: 'CE'
label_smoothing: 0.1
# lr scheduler
scheduler: 'warmup_cosine_decay'
lr: 0.045
min_lr: 0.0
decay_epochs: 195
warmup_epochs: 5
# optimizer
opt: 'momentum'
filter_bias_and_bn: True
momentum: 0.9
weight_decay: 0.00004
loss_scale: 1024
use_nesterov: False

91
configs/mixnet/README.md Normal file
View File

@ -0,0 +1,91 @@
# MixNet
> [MixConv: Mixed Depthwise Convolutional Kernels](https://arxiv.org/abs/1907.09595)
## Introduction
Depthwise convolution is becoming increasingly popular in modern efficient ConvNets, but its kernel size is often
overlooked. In this paper, the authors systematically study the impact of different kernel sizes, and observe that
combining the benefits of multiple kernel sizes can lead to better accuracy and efficiency. Based on this observation,
the authors propose a new mixed depthwise convolution (MixConv), which naturally mixes up multiple kernel sizes in a
single convolution. As a simple drop-in replacement for vanilla depthwise convolution, MixConv improves the accuracy
and efficiency of existing MobileNets on both ImageNet classification and COCO object detection.[[1](#references)]
<p align="center">
<img src="https://user-images.githubusercontent.com/53842165/219263295-75de649e-d38b-4b05-bd26-1c96896f7e83.png" width=800 />
</p>
<p align="center">
<em>Figure 1. Architecture of MixNet [<a href="#references">1</a>] </em>
</p>
## Results
Our reproduced model performance on ImageNet-1K is reported as follows.
<div align="center">
| Model | Context | Top-1 (%) | Top-5 (%) | Params (M) | Recipe | Download |
|----------|----------|-----------|-----------|------------|-----------------------------------------------------------------------------------------------|----------------------------------------------------------------------------------------|
| MixNet_s | D910x8-G | 75.52 | 92.52 | 4.17 | [yaml](https://github.com/mindspore-lab/mindcv/blob/main/configs/mixnet/mixnet_s_ascend.yaml) | [weights](https://download.mindspore.cn/toolkits/mindcv/mixnet/mixnet_s-2a5ef3a3.ckpt) |
| MixNet_m | D910x8-G | 76.64 | 93.05 | 5.06 | [yaml](https://github.com/mindspore-lab/mindcv/blob/main/configs/mixnet/mixnet_m_ascend.yaml) | [weights](https://download.mindspore.cn/toolkits/mindcv/mixnet/mixnet_m-74cc4cb1.ckpt) |
| MixNet_l | D910x8-G | 78.73 | 94.31 | 7.38 | [yaml](https://github.com/mindspore-lab/mindcv/blob/main/configs/mixnet/mixnet_l_ascend.yaml) | [weights](https://download.mindspore.cn/toolkits/mindcv/mixnet/mixnet_l-978edf2b.ckpt) |
</div>
#### Notes
- Context: Training context denoted as {device}x{pieces}-{MS mode}, where mindspore mode can be G - graph mode or F - pynative mode with ms function. For example, D910x8-G is for training on 8 pieces of Ascend 910 NPU using graph mode.
- Top-1 and Top-5: Accuracy reported on the validation set of ImageNet-1K.
## Quick Start
### Preparation
#### Installation
Please refer to the [installation instruction](https://github.com/mindspore-ecosystem/mindcv#installation) in MindCV.
#### Dataset Preparation
Please download the [ImageNet-1K](https://www.image-net.org/challenges/LSVRC/2012/index.php) dataset for model training and validation.
### Training
* Distributed Training
It is easy to reproduce the reported results with the pre-defined training recipe. For distributed training on multiple Ascend 910 devices, please run
```shell
# distributed training on multiple GPU/Ascend devices
mpirun -n 8 python train.py --config configs/mixnet/mixnet_s_ascend.yaml --data_dir /path/to/imagenet
```
> If the script is executed by the root user, the `--allow-run-as-root` parameter must be added to `mpirun`.
Similarly, you can train the model on multiple GPU devices with the above `mpirun` command.
For detailed illustration of all hyper-parameters, please refer to [config.py](https://github.com/mindspore-lab/mindcv/blob/main/config.py).
**Note:** As the global batch size (batch_size x num_devices) is an important hyper-parameter, it is recommended to keep the global batch size unchanged for reproduction or adjust the learning rate linearly to a new global batch size.
* Standalone Training
If you want to train or finetune the model on a smaller dataset without distributed training, please run:
```shell
# standalone training on a CPU/GPU/Ascend device
python train.py --config configs/mixnet/mixnet_s_ascend.yaml --data_dir /path/to/dataset --distribute False
```
### Validation
To validate the accuracy of the trained model, you can use `validate.py` and parse the checkpoint path with `--ckpt_path`.
```shell
python validate.py -c configs/mixnet/mixnet_s_ascend.yaml --data_dir /path/to/imagenet --ckpt_path /path/to/ckpt
```
### Deployment
Please refer to the [deployment tutorial](https://github.com/mindspore-lab/mindcv/blob/main/tutorials/deployment.md) in MindCV.
## References
[1] Tan M, Le Q V. Mixconv: Mixed depthwise convolutional kernels[J]. arXiv preprint arXiv:1907.09595, 2019.

View File

@ -0,0 +1,57 @@
# system
mode: 0
distribute: True
num_parallel_workers: 16
val_while_train: True
# dataset
dataset: "imagenet"
data_dir: "/path/to/imagenet"
shuffle: True
dataset_download: False
batch_size: 128
drop_remainder: True
# augmentation
image_resize: 224
scale: [0.08, 1.0]
ratio: [0.75, 1.333]
hflip: 0.5
interpolation: "bicubic"
auto_augment: "randaug-m9-mstd0.5-inc1"
re_prob: 0.25
crop_pct: 0.875
mixup: 0.2
cutmix: 1.0
# model
model: "mixnet_l"
num_classes: 1000
pretrained: False
ckpt_path: ""
keep_checkpoint_max: 10
ckpt_save_dir: "./ckpt"
epoch_size: 600
dataset_sink_mode: True
amp_level: "O3"
# loss
loss: "CE"
label_smoothing: 0.1
# lr scheduler
scheduler: "cosine_decay"
lr: 0.25
min_lr: 0.00001
decay_epochs: 580
warmup_epochs: 20
# optimizer
opt: "momentum"
filter_bias_and_bn: True
momentum: 0.9
weight_decay: 0.00002
loss_scale_type: "dynamic"
drop_overflow_update: True
loss_scale: 16777216
use_nesterov: False

View File

@ -0,0 +1,55 @@
# system
mode: 0
distribute: True
num_parallel_workers: 8
val_while_train: True
# dataset
dataset: "imagenet"
data_dir: "/path/to/imagenet"
shuffle: True
dataset_download: False
batch_size: 128
drop_remainder: True
# augmentation
image_resize: 224
scale: [0.08, 1.0]
ratio: [0.75, 1.333]
hflip: 0.5
interpolation: "bicubic"
auto_augment: "randaug-m9-mstd0.5"
re_prob: 0.25
crop_pct: 0.875
mixup: 0.2
cutmix: 1.0
# model
model: "mixnet_m"
num_classes: 1000
pretrained: False
ckpt_path: ""
keep_checkpoint_max: 10
ckpt_save_dir: "./ckpt"
epoch_size: 600
dataset_sink_mode: True
amp_level: "O3"
# loss
loss: "CE"
label_smoothing: 0.1
# lr scheduler
scheduler: "cosine_decay"
lr: 0.2
min_lr: 0.00001
decay_epochs: 585
warmup_epochs: 15
# optimizer
opt: "momentum"
filter_bias_and_bn: True
momentum: 0.9
weight_decay: 0.00002
loss_scale: 1024
use_nesterov: False

View File

@ -0,0 +1,55 @@
# system
mode: 0
distribute: True
num_parallel_workers: 8
val_while_train: True
# dataset
dataset: "imagenet"
data_dir: "/path/to/imagenet"
shuffle: True
dataset_download: False
batch_size: 128
drop_remainder: True
# augmentation
image_resize: 224
scale: [0.08, 1.0]
ratio: [0.75, 1.333]
hflip: 0.5
interpolation: "bicubic"
auto_augment: "randaug-m9-mstd0.5"
re_prob: 0.25
crop_pct: 0.875
mixup: 0.2
cutmix: 1.0
# model
model: "mixnet_s"
num_classes: 1000
pretrained: False
ckpt_path: ""
keep_checkpoint_max: 10
ckpt_save_dir: "./ckpt"
epoch_size: 600
dataset_sink_mode: True
amp_level: "O3"
# loss
loss: "CE"
label_smoothing: 0.1
# lr scheduler
scheduler: "cosine_decay"
lr: 0.2
min_lr: 0.00001
decay_epochs: 585
warmup_epochs: 15
# optimizer
opt: "momentum"
filter_bias_and_bn: True
momentum: 0.9
weight_decay: 0.00002
loss_scale: 256
use_nesterov: False

86
configs/mnasnet/README.md Normal file
View File

@ -0,0 +1,86 @@
# MnasNet
> [MnasNet: Platform-Aware Neural Architecture Search for Mobile](https://arxiv.org/abs/1807.11626)
## Introduction
Designing convolutional neural networks (CNNs) for mobile devices is challenging because mobile models need to be small and fast, yet still accurate. Although significant effort has been dedicated to designing and improving mobile CNNs on all dimensions, it is very difficult to manually balance these trade-offs when there are so many architectural possibilities to consider. In this paper, the authors propose an automated mobile neural architecture search (MNAS) approach, which explicitly incorporates model latency into the main objective so that the search can identify a model that achieves a good trade-off between accuracy and latency. Unlike previous work, where latency is considered via another, often inaccurate proxy (e.g., FLOPS), this approach directly measures real-world inference latency by executing the model on mobile phones. To further strike the right balance between flexibility and search space size, the authors propose a novel factorized hierarchical search space that encourages layer diversity throughout the network.[[1](#references)]
<p align="center">
<img src="https://user-images.githubusercontent.com/53842165/210044057-35febc60-8d24-434a-a4f2-db8db3859e7a.png" width=800 />
</p>
<p align="center">
<em>Figure 1. Architecture of MnasNet [<a href="#references">1</a>] </em>
</p>
## Results
Our reproduced model performance on ImageNet-1K is reported as follows.
<div align="center">
| Model | Context | Top-1 (%) | Top-5 (%) | Params (M) | Recipe | Download |
|-----------------|----------|-----------|-----------|------------|----------------------------------------------------------------------------------------------------|------------------------------------------------------------------------------------------|
| MnasNet-B1-0_75 | D910x8-G | 71.81 | 90.53 | 3.20 | [yaml](https://github.com/mindspore-lab/mindcv/blob/main/configs/mnasnet/mnasnet_0.75_ascend.yaml) | [weights](https://download.mindspore.cn/toolkits/mindcv/mnasnet/mnasnet_075-465d366d.ckpt) |
| MnasNet-B1-1_0 | D910x8-G | 74.28 | 91.70 | 4.42 | [yaml](https://github.com/mindspore-lab/mindcv/blob/main/configs/mnasnet/mnasnet_1.0_ascend.yaml) | [weights](https://download.mindspore.cn/toolkits/mindcv/mnasnet/mnasnet_100-1bcf43f8.ckpt) |
| MnasNet-B1-1_4 | D910x8-G | 76.01 | 92.83 | 7.16 | [yaml](https://github.com/mindspore-lab/mindcv/blob/main/configs/mnasnet/mnasnet_1.4_ascend.yaml) | [weights](https://download.mindspore.cn/toolkits/mindcv/mnasnet/mnasnet_140-7e20bb30.ckpt) |
</div>
#### Notes
- Context: Training context denoted as {device}x{pieces}-{MS mode}, where mindspore mode can be G - graph mode or F - pynative mode with ms function. For example, D910x8-G is for training on 8 pieces of Ascend 910 NPU using graph mode.
- Top-1 and Top-5: Accuracy reported on the validation set of ImageNet-1K.
## Quick Start
### Preparation
#### Installation
Please refer to the [installation instruction](https://github.com/mindspore-ecosystem/mindcv#installation) in MindCV.
#### Dataset Preparation
Please download the [ImageNet-1K](https://www.image-net.org/challenges/LSVRC/2012/index.php) dataset for model training and validation.
### Training
* Distributed Training
It is easy to reproduce the reported results with the pre-defined training recipe. For distributed training on multiple Ascend 910 devices, please run
```shell
# distributed training on multiple GPU/Ascend devices
mpirun -n 8 python train.py --config configs/mnasnet/mnasnet_0.75_ascend.yaml --data_dir /path/to/imagenet
```
> If the script is executed by the root user, the `--allow-run-as-root` parameter must be added to `mpirun`.
Similarly, you can train the model on multiple GPU devices with the above `mpirun` command.
For detailed illustration of all hyper-parameters, please refer to [config.py](https://github.com/mindspore-lab/mindcv/blob/main/config.py).
**Note:** As the global batch size (batch_size x num_devices) is an important hyper-parameter, it is recommended to keep the global batch size unchanged for reproduction or adjust the learning rate linearly to a new global batch size.
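As a concrete illustration of the linear scaling rule above, the hedged sketch below assumes the recipe was tuned for 8 devices with a per-device batch size of 256 and a learning rate of 0.016 (illustrative values matching one of the recipes in this folder), and that `train.py` exposes the yaml keys `batch_size` and `lr` as command-line flags.
```shell
# hypothetical sketch of linear learning-rate scaling (values are illustrative):
# reference recipe: 8 devices x batch 256 = global batch 2048, lr 0.016
# single device:    1 device  x batch 256 = global batch 256  -> lr = 0.016 * 256 / 2048 = 0.002
# (--batch_size and --lr are assumed to mirror the yaml keys of the recipe)
python train.py --config configs/mnasnet/mnasnet_0.75_ascend.yaml \
    --data_dir /path/to/imagenet \
    --distribute False \
    --batch_size 256 \
    --lr 0.002
```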
* Standalone Training
If you want to train or finetune the model on a smaller dataset without distributed training, please run:
```shell
# standalone training on a CPU/GPU/Ascend device
python train.py --config configs/mnasnet/mnasnet_0.75_ascend.yaml --data_dir /path/to/dataset --distribute False
```
### Validation
To validate the accuracy of the trained model, you can use `validate.py` and parse the checkpoint path with `--ckpt_path`.
```shell
python validate.py -c configs/mnasnet/mnasnet_0.75_ascend.yaml --data_dir /path/to/imagenet --ckpt_path /path/to/ckpt
```
### Deployment
Please refer to the [deployment tutorial](https://github.com/mindspore-lab/mindcv/blob/main/tutorials/deployment.md) in MindCV.
## References
[1] Tan M, Chen B, Pang R, et al. Mnasnet: Platform-aware neural architecture search for mobile[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2019: 2820-2828.

View File

@ -0,0 +1,51 @@
# system
mode: 0
distribute: True
num_parallel_workers: 8
# dataset
dataset: 'imagenet'
data_dir: '/path/to/imagenet'
shuffle: True
dataset_download: False
batch_size: 256
drop_remainder: True
# augmentation
image_resize: 224
scale: [0.08, 1.0]
ratio: [0.75, 1.333]
hflip: 0.5
interpolation: 'bilinear'
crop_pct: 0.875
# model
model: 'mnasnet0_75'
num_classes: 1000
pretrained: False
ckpt_path: ''
keep_checkpoint_max: 10
ckpt_save_dir: './ckpt'
epoch_size: 350
dataset_sink_mode: True
amp_level: 'O0'
# loss
loss: 'CE'
label_smoothing: 0.1
# lr scheduler
scheduler: 'cosine_decay'
min_lr: 1e-10
lr: 0.016
warmup_epochs: 5
decay_epochs: 345
# optimizer
opt: 'rmsprop'
filter_bias_and_bn: True
momentum: 0.9
weight_decay: 1e-5
loss_scale: 256
use_nesterov: False
eps: 1e-3

View File

@ -0,0 +1,51 @@
# system
mode: 0
distribute: True
num_parallel_workers: 8
# dataset
dataset: 'imagenet'
data_dir: '/path/to/imagenet'
shuffle: True
dataset_download: False
batch_size: 128
drop_remainder: True
# augmentation
image_resize: 224
scale: [0.08, 1.0]
ratio: [0.75, 1.333]
hflip: 0.5
interpolation: 'bilinear'
crop_pct: 0.875
# model
model: 'mnasnet0_75'
num_classes: 1000
pretrained: False
ckpt_path: ''
keep_checkpoint_max: 10
ckpt_save_dir: './ckpt'
epoch_size: 350
dataset_sink_mode: True
amp_level: 'O0'
# loss
loss: 'CE'
label_smoothing: 0.1
# lr scheduler
scheduler: 'cosine_decay'
min_lr: 1e-10
lr: 0.012
warmup_epochs: 5
decay_epochs: 345
# optimizer
opt: 'rmsprop'
filter_bias_and_bn: True
momentum: 0.9
weight_decay: 1e-5
loss_scale: 256
use_nesterov: False
eps: 1e-3

View File

@ -0,0 +1,51 @@
# system
mode: 0
distribute: True
num_parallel_workers: 8
# dataset
dataset: 'imagenet'
data_dir: '/path/to/imagenet'
shuffle: True
dataset_download: False
batch_size: 256
drop_remainder: True
# augmentation
image_resize: 224
scale: [0.08, 1.0]
ratio: [0.75, 1.333]
hflip: 0.5
interpolation: 'bilinear'
crop_pct: 0.875
# model
model: 'mnasnet1_0'
num_classes: 1000
pretrained: False
ckpt_path: ''
keep_checkpoint_max: 10
ckpt_save_dir: './ckpt'
epoch_size: 450
dataset_sink_mode: True
amp_level: 'O0'
# loss
loss: 'CE'
label_smoothing: 0.1
# lr scheduler
scheduler: 'cosine_decay'
min_lr: 1e-10
lr: 0.016
warmup_epochs: 5
decay_epochs: 445
# optimizer
opt: 'rmsprop'
filter_bias_and_bn: True
momentum: 0.9
weight_decay: 1e-5
loss_scale: 256
use_nesterov: False
eps: 1e-3

View File

@ -0,0 +1,51 @@
# system
mode: 0
distribute: True
num_parallel_workers: 8
# dataset
dataset: 'imagenet'
data_dir: '/path/to/imagenet'
shuffle: True
dataset_download: False
batch_size: 128
drop_remainder: True
# augmentation
image_resize: 224
scale: [0.08, 1.0]
ratio: [0.75, 1.333]
hflip: 0.5
interpolation: 'bilinear'
crop_pct: 0.875
# model
model: 'mnasnet1_0'
num_classes: 1000
pretrained: False
ckpt_path: ''
keep_checkpoint_max: 10
ckpt_save_dir: './ckpt'
epoch_size: 450
dataset_sink_mode: True
amp_level: 'O0'
# loss
loss: 'CE'
label_smoothing: 0.1
# lr scheduler
scheduler: 'cosine_decay'
min_lr: 1e-10
lr: 0.012
warmup_epochs: 5
decay_epochs: 445
# optimizer
opt: 'rmsprop'
filter_bias_and_bn: True
momentum: 0.9
weight_decay: 1e-5
loss_scale: 256
use_nesterov: False
eps: 1e-3

View File

@ -0,0 +1,51 @@
# system
mode: 0
distribute: True
num_parallel_workers: 8
# dataset
dataset: 'imagenet'
data_dir: '/path/to/imagenet'
shuffle: True
dataset_download: False
batch_size: 256
drop_remainder: True
# augmentation
image_resize: 224
scale: [0.08, 1.0]
ratio: [0.75, 1.333]
hflip: 0.5
interpolation: 'bilinear'
crop_pct: 0.875
# model
model: 'mnasnet1_4'
num_classes: 1000
pretrained: False
ckpt_path: ''
keep_checkpoint_max: 10
ckpt_save_dir: './ckpt'
epoch_size: 400
dataset_sink_mode: True
amp_level: 'O0'
# loss
loss: 'CE'
label_smoothing: 0.1
# lr scheduler
scheduler: 'cosine_decay'
min_lr: 1e-10
lr: 0.016
warmup_epochs: 5
decay_epochs: 395
# optimizer
opt: 'rmsprop'
filter_bias_and_bn: True
momentum: 0.9
weight_decay: 1e-5
loss_scale: 256
use_nesterov: False
eps: 1e-3

View File

@ -0,0 +1,51 @@
# system
mode: 0
distribute: True
num_parallel_workers: 8
# dataset
dataset: 'imagenet'
data_dir: '/path/to/imagenet'
shuffle: True
dataset_download: False
batch_size: 64
drop_remainder: True
# augmentation
image_resize: 224
scale: [0.08, 1.0]
ratio: [0.75, 1.333]
hflip: 0.5
interpolation: 'bilinear'
crop_pct: 0.875
# model
model: 'mnasnet1_4'
num_classes: 1000
pretrained: False
ckpt_path: ''
keep_checkpoint_max: 10
ckpt_save_dir: './ckpt'
epoch_size: 400
dataset_sink_mode: True
amp_level: 'O0'
# loss
loss: 'CE'
label_smoothing: 0.1
# lr scheduler
scheduler: 'cosine_decay'
min_lr: 1e-10
lr: 0.008
warmup_epochs: 5
decay_epochs: 395
# optimizer
opt: 'rmsprop'
filter_bias_and_bn: True
momentum: 0.9
weight_decay: 1e-5
loss_scale: 256
use_nesterov: False
eps: 1e-3

View File

@ -0,0 +1,87 @@
# MobileNetV1
> [MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications](https://arxiv.org/abs/1704.04861)
## Introduction
Compared with traditional convolutional neural networks, MobileNetV1 greatly reduces the number of parameters and the amount of computation at the cost of a slight drop in accuracy (compared to VGG16, accuracy drops by 0.9%, while the model has only 1/32 of VGG's parameters). The model is based on a streamlined architecture that uses depthwise separable convolutions to build lightweight deep neural networks. At the same time, two simple global hyperparameters are introduced, which can effectively trade off latency and accuracy.[[1](#references)]
<p align="center">
<img src="https://user-images.githubusercontent.com/53842165/210044118-20dcc78a-96cd-4a3f-9cd1-fefa00c12227.png" width=400 />
</p>
<p align="center">
<em>Figure 1. Architecture of MobileNetV1 [<a href="#references">1</a>] </em>
</p>
## Results
Our reproduced model performance on ImageNet-1K is reported as follows.
<div align="center">
| Model | Context | Top-1 (%) | Top-5 (%) | Params (M) | Recipe | Download |
|------------------|----------|-----------|-----------|------------|-------------------------------------------------------------------------------------------------------------|-------------------------------------------------------------------------------------------------------------------|
| MobileNet_v1_025 | D910x8-G | 53.87 | 77.66 | 0.47 | [yaml](https://github.com/mindspore-lab/mindcv/blob/main/configs/mobilenetv1/mobilenet_v1_0.25_ascend.yaml) | [weights](https://download.mindspore.cn/toolkits/mindcv/mobilenet/mobilenetv1/mobilenet_v1_025-d3377fba.ckpt) |
| MobileNet_v1_050 | D910x8-G | 65.94 | 86.51 | 1.34 | [yaml](https://github.com/mindspore-lab/mindcv/blob/main/configs/mobilenetv1/mobilenet_v1_0.5_ascend.yaml) | [weights](https://download.mindspore.cn/toolkits/mindcv/mobilenet/mobilenetv1/mobilenet_v1_050-23e9ddbe.ckpt) |
| MobileNet_v1_075 | D910x8-G | 70.44 | 89.49 | 2.60 | [yaml](https://github.com/mindspore-lab/mindcv/blob/main/configs/mobilenetv1/mobilenet_v1_0.75_ascend.yaml) | [weights](https://download.mindspore.cn/toolkits/mindcv/mobilenet/mobilenetv1/mobilenet_v1_075-5bed0c73.ckpt) |
| MobileNet_v1_100 | D910x8-G | 72.95 | 91.01 | 4.25 | [yaml](https://github.com/mindspore-lab/mindcv/blob/main/configs/mobilenetv1/mobilenet_v1_1.0_ascend.yaml) | [weights](https://download.mindspore.cn/toolkits/mindcv/mobilenet/mobilenetv1/mobilenet_v1_100-91c7b206.ckpt) |
</div>
#### Notes
- Context: Training context denoted as {device}x{pieces}-{MS mode}, where mindspore mode can be G - graph mode or F - pynative mode with ms function. For example, D910x8-G is for training on 8 pieces of Ascend 910 NPU using graph mode.
- Top-1 and Top-5: Accuracy reported on the validation set of ImageNet-1K.
## Quick Start
### Preparation
#### Installation
Please refer to the [installation instruction](https://github.com/mindspore-ecosystem/mindcv#installation) in MindCV.
#### Dataset Preparation
Please download the [ImageNet-1K](https://www.image-net.org/challenges/LSVRC/2012/index.php) dataset for model training and validation.
### Training
* Distributed Training
It is easy to reproduce the reported results with the pre-defined training recipe. For distributed training on multiple Ascend 910 devices, please run
```shell
# distributed training on multiple GPU/Ascend devices
mpirun -n 8 python train.py --config configs/mobilenetv1/mobilenet_v1_0.25_ascend.yaml --data_dir /path/to/imagenet
```
> If the script is executed by the root user, the `--allow-run-as-root` parameter must be added to `mpirun`.
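For instance, when launching the distributed job as the root user, the command above becomes:
```shell
# same distributed training command, with the flag required when running as root
mpirun --allow-run-as-root -n 8 python train.py \
    --config configs/mobilenetv1/mobilenet_v1_0.25_ascend.yaml \
    --data_dir /path/to/imagenet
```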
Similarly, you can train the model on multiple GPU devices with the above `mpirun` command.
For detailed illustration of all hyper-parameters, please refer to [config.py](https://github.com/mindspore-lab/mindcv/blob/main/config.py).
**Note:** As the global batch size (batch_size x num_devices) is an important hyper-parameter, it is recommended to keep the global batch size unchanged for reproduction or adjust the learning rate linearly to a new global batch size.
* Standalone Training
If you want to train or finetune the model on a smaller dataset without distributed training, please run:
```shell
# standalone training on a CPU/GPU/Ascend device
python train.py --config configs/mobilenetv1/mobilenet_v1_0.25_ascend.yaml --data_dir /path/to/dataset --distribute False
```
### Validation
To validate the accuracy of the trained model, you can use `validate.py` and parse the checkpoint path with `--ckpt_path`.
```shell
python validate.py -c configs/mobilenetv1/mobilenet_v1_0.25_ascend.yaml --data_dir /path/to/imagenet --ckpt_path /path/to/ckpt
```
### Deployment
Please refer to the [deployment tutorial](https://github.com/mindspore-lab/mindcv/blob/main/tutorials/deployment.md) in MindCV.
## References
[1] Howard A G, Zhu M, Chen B, et al. Mobilenets: Efficient convolutional neural networks for mobile vision applications[J]. arXiv preprint arXiv:1704.04861, 2017.

View File

@ -0,0 +1,50 @@
# system
mode: 0
distribute: True
num_parallel_workers: 8
# dataset
dataset: 'imagenet'
data_dir: '/path/to/imagenet'
shuffle: True
dataset_download: False
batch_size: 64
drop_remainder: True
# augmentation
image_resize: 224
scale: [0.08, 1.0]
ratio: [0.75, 1.333]
hflip: 0.5
interpolation: 'bilinear'
crop_pct: 0.875
# model
model: 'mobilenet_v1_025_224'
num_classes: 1001
pretrained: False
ckpt_path: ''
keep_checkpoint_max: 80
ckpt_save_dir: './ckpt'
epoch_size: 200
dataset_sink_mode: True
amp_level: 'O2'
# loss
loss: 'CE'
label_smoothing: 0.1
# lr scheduler
scheduler: 'cosine_decay'
min_lr: 0.0
lr: 0.4
warmup_epochs: 2
decay_epochs: 198
# optimizer
opt: 'momentum'
filter_bias_and_bn: True
momentum: 0.9
weight_decay: 0.00003
loss_scale: 1024
use_nesterov: False

View File

@ -0,0 +1,50 @@
# system
mode: 0
distribute: True
num_parallel_workers: 8
# dataset
dataset: 'imagenet'
data_dir: '/path/to/imagenet'
shuffle: True
dataset_download: False
batch_size: 64
drop_remainder: True
# augmentation
image_resize: 224
scale: [0.08, 1.0]
ratio: [0.75, 1.333]
hflip: 0.5
interpolation: 'bilinear'
crop_pct: 0.875
# model
model: 'mobilenet_v1_025_224'
num_classes: 1001
pretrained: False
ckpt_path: ''
keep_checkpoint_max: 80
ckpt_save_dir: './ckpt'
epoch_size: 200
dataset_sink_mode: True
amp_level: 'O2'
# loss
loss: 'CE'
label_smoothing: 0.1
# lr scheduler
scheduler: 'cosine_decay'
min_lr: 0.0
lr: 0.4
warmup_epochs: 2
decay_epochs: 198
# optimizer
opt: 'momentum'
filter_bias_and_bn: True
momentum: 0.9
weight_decay: 0.00003
loss_scale: 1024
use_nesterov: False

View File

@ -0,0 +1,50 @@
# system
mode: 0
distribute: True
num_parallel_workers: 8
# dataset
dataset: 'imagenet'
data_dir: '/path/to/imagenet'
shuffle: True
dataset_download: False
batch_size: 64
drop_remainder: True
# augmentation
image_resize: 224
scale: [0.08, 1.0]
ratio: [0.75, 1.333]
hflip: 0.5
interpolation: 'bilinear'
crop_pct: 0.875
# model
model: 'mobilenet_v1_050_224'
num_classes: 1001
pretrained: False
ckpt_path: ''
keep_checkpoint_max: 80
ckpt_save_dir: './ckpt'
epoch_size: 200
dataset_sink_mode: True
amp_level: 'O2'
# loss
loss: 'CE'
label_smoothing: 0.1
# lr scheduler
scheduler: 'cosine_decay'
min_lr: 0.0
lr: 0.4
warmup_epochs: 2
decay_epochs: 198
# optimizer
opt: 'momentum'
filter_bias_and_bn: True
momentum: 0.9
weight_decay: 0.00003
loss_scale: 1024
use_nesterov: False

View File

@ -0,0 +1,50 @@
# system
mode: 0
distribute: True
num_parallel_workers: 8
# dataset
dataset: 'imagenet'
data_dir: '/path/to/imagenet'
shuffle: True
dataset_download: False
batch_size: 64
drop_remainder: True
# augmentation
image_resize: 224
scale: [0.08, 1.0]
ratio: [0.75, 1.333]
hflip: 0.5
interpolation: 'bilinear'
crop_pct: 0.875
# model
model: 'mobilenet_v1_050_224'
num_classes: 1001
pretrained: False
ckpt_path: ''
keep_checkpoint_max: 80
ckpt_save_dir: './ckpt'
epoch_size: 200
dataset_sink_mode: True
amp_level: 'O2'
# loss
loss: 'CE'
label_smoothing: 0.1
# lr scheduler
scheduler: 'cosine_decay'
min_lr: 0.0
lr: 0.4
warmup_epochs: 2
decay_epochs: 198
# optimizer
opt: 'momentum'
filter_bias_and_bn: True
momentum: 0.9
weight_decay: 0.00003
loss_scale: 1024
use_nesterov: False

View File

@ -0,0 +1,50 @@
# system
mode: 0
distribute: True
num_parallel_workers: 8
# dataset
dataset: 'imagenet'
data_dir: '/path/to/imagenet'
shuffle: True
dataset_download: False
batch_size: 64
drop_remainder: True
# augmentation
image_resize: 224
scale: [0.08, 1.0]
ratio: [0.75, 1.333]
hflip: 0.5
interpolation: 'bilinear'
crop_pct: 0.875
# model
model: 'mobilenet_v1_075_224'
num_classes: 1001
pretrained: False
ckpt_path: ''
keep_checkpoint_max: 80
ckpt_save_dir: './ckpt'
epoch_size: 200
dataset_sink_mode: True
amp_level: 'O2'
# loss
loss: 'CE'
label_smoothing: 0.1
# lr scheduler
scheduler: 'cosine_decay'
min_lr: 0.0
lr: 0.4
warmup_epochs: 2
decay_epochs: 198
# optimizer
opt: 'momentum'
filter_bias_and_bn: True
momentum: 0.9
weight_decay: 0.00003
loss_scale: 1024
use_nesterov: False

View File

@ -0,0 +1,50 @@
# system
mode: 0
distribute: True
num_parallel_workers: 8
# dataset
dataset: 'imagenet'
data_dir: '/path/to/imagenet'
shuffle: True
dataset_download: False
batch_size: 64
drop_remainder: True
# augmentation
image_resize: 224
scale: [0.08, 1.0]
ratio: [0.75, 1.333]
hflip: 0.5
interpolation: 'bilinear'
crop_pct: 0.875
# model
model: 'mobilenet_v1_075_224'
num_classes: 1001
pretrained: False
ckpt_path: ''
keep_checkpoint_max: 80
ckpt_save_dir: './ckpt'
epoch_size: 200
dataset_sink_mode: True
amp_level: 'O2'
# loss
loss: 'CE'
label_smoothing: 0.1
# lr scheduler
scheduler: 'cosine_decay'
min_lr: 0.0
lr: 0.4
warmup_epochs: 2
decay_epochs: 198
# optimizer
opt: 'momentum'
filter_bias_and_bn: True
momentum: 0.9
weight_decay: 0.00003
loss_scale: 1024
use_nesterov: False

View File

@ -0,0 +1,50 @@
# system
mode: 0
distribute: True
num_parallel_workers: 8
# dataset
dataset: 'imagenet'
data_dir: '/path/to/imagenet'
shuffle: True
dataset_download: False
batch_size: 64
drop_remainder: True
# augmentation
image_resize: 224
scale: [0.08, 1.0]
ratio: [0.75, 1.333]
hflip: 0.5
interpolation: 'bilinear'
crop_pct: 0.875
# model
model: 'mobilenet_v1_100_224'
num_classes: 1001
pretrained: False
ckpt_path: ''
keep_checkpoint_max: 80
ckpt_save_dir: './ckpt'
epoch_size: 200
dataset_sink_mode: True
amp_level: 'O2'
# loss
loss: 'CE'
label_smoothing: 0.1
# lr scheduler
scheduler: 'cosine_decay'
min_lr: 0.0
lr: 0.4
warmup_epochs: 2
decay_epochs: 198
# optimizer
opt: 'momentum'
filter_bias_and_bn: True
momentum: 0.9
weight_decay: 0.00003
loss_scale: 1024
use_nesterov: False

View File

@ -0,0 +1,51 @@
# system
mode: 0
distribute: True
num_parallel_workers: 8
# dataset
dataset: 'imagenet'
data_dir: '/path/to/imagenet'
shuffle: True
dataset_download: False
batch_size: 64
drop_remainder: True
train_split: 'train'
# augmentation
image_resize: 224
scale: [0.08, 1.0]
ratio: [0.75, 1.333]
hflip: 0.5
interpolation: 'bilinear'
crop_pct: 0.875
# model
model: 'mobilenet_v1_100_224'
num_classes: 1001
pretrained: False
ckpt_path: ''
keep_checkpoint_max: 80
ckpt_save_dir: './ckpt'
epoch_size: 200
dataset_sink_mode: True
amp_level: 'O2'
# loss
loss: 'CE'
label_smoothing: 0.1
# lr scheduler
scheduler: 'cosine_decay'
min_lr: 0.0
lr: 0.4
warmup_epochs: 2
decay_epochs: 198
# optimizer
opt: 'momentum'
filter_bias_and_bn: True
momentum: 0.9
weight_decay: 0.00003
loss_scale: 1024
use_nesterov: False

View File

@ -0,0 +1,88 @@
# MobileNetV2
> [MobileNetV2: Inverted Residuals and Linear Bottlenecks](https://arxiv.org/abs/1801.04381)
## Introduction
The model is a new neural network architecture that is specifically tailored for mobile and resource-constrained environments. It pushes the state of the art for mobile computer vision models, significantly reducing the number of operations and the amount of memory required while maintaining the same accuracy.
The main innovation of the model is a new layer module: the inverted residual with linear bottleneck. The module takes as input a low-dimensional compressed representation, which is first expanded to a high dimension and then filtered with a lightweight depthwise convolution. A linear convolution is then used to project the features back to a low-dimensional representation.[[1](#references)]
<p align="center">
<img src="https://user-images.githubusercontent.com/53842165/210044190-8b5aab08-75fe-4e2c-87cc-d3529d9c60cd.png" width=800 />
</p>
<p align="center">
<em>Figure 1. Architecture of MobileNetV2 [<a href="#references">1</a>] </em>
</p>
## Results
Our reproduced model performance on ImageNet-1K is reported as follows.
<div align="center">
| Model | Context | Top-1 (%) | Top-5 (%) | Params (M) | Recipe | Download |
|------------------|----------|-----------|-----------|------------|-------------------------------------------------------------------------------------------------------------|------------------------------------------------------------------------------------------------------------------|
| MobileNet_v2_075 | D910x8-G | 69.76 | 89.28 | 2.66 | [yaml](https://github.com/mindspore-lab/mindcv/blob/main/configs/mobilenetv2/mobilenet_v2_0.75_ascend.yaml) | [weights](https://download.mindspore.cn/toolkits/mindcv/mobilenet/mobilenetv2/mobilenet_v2_075-243f9404.ckpt) |
| MobileNet_v2_100 | D910x8-G | 72.02 | 90.92 | 3.54 | [yaml](https://github.com/mindspore-lab/mindcv/blob/main/configs/mobilenetv2/mobilenet_v2_1.0_ascend.yaml) | [weights](https://download.mindspore.cn/toolkits/mindcv/mobilenet/mobilenetv2/mobilenet_v2_100-52122156.ckpt) |
| MobileNet_v2_140 | D910x8-G | 74.97 | 92.32 | 6.15 | [yaml](https://github.com/mindspore-lab/mindcv/blob/main/configs/mobilenetv2/mobilenet_v2_1.4_ascend.yaml) | [weights](https://download.mindspore.cn/toolkits/mindcv/mobilenet/mobilenetv2/mobilenet_v2_140-015cfb04.ckpt) |
</div>
#### Notes
- Context: Training context denoted as {device}x{pieces}-{MS mode}, where mindspore mode can be G - graph mode or F - pynative mode with ms function. For example, D910x8-G is for training on 8 pieces of Ascend 910 NPU using graph mode.
- Top-1 and Top-5: Accuracy reported on the validation set of ImageNet-1K.
## Quick Start
### Preparation
#### Installation
Please refer to the [installation instruction](https://github.com/mindspore-ecosystem/mindcv#installation) in MindCV.
#### Dataset Preparation
Please download the [ImageNet-1K](https://www.image-net.org/challenges/LSVRC/2012/index.php) dataset for model training and validation.
### Training
* Distributed Training
It is easy to reproduce the reported results with the pre-defined training recipe. For distributed training on multiple Ascend 910 devices, please run
```shell
# distributed training on multiple GPU/Ascend devices
mpirun -n 8 python train.py --config configs/mobilenetv2/mobilenet_v2_0.75_ascend.yaml --data_dir /path/to/imagenet
```
> If the script is executed by the root user, the `--allow-run-as-root` parameter must be added to `mpirun`.
Similarly, you can train the model on multiple GPU devices with the above `mpirun` command.
For detailed illustration of all hyper-parameters, please refer to [config.py](https://github.com/mindspore-lab/mindcv/blob/main/config.py).
**Note:** As the global batch size (batch_size x num_devices) is an important hyper-parameter, it is recommended to keep the global batch size unchanged for reproduction or adjust the learning rate linearly to a new global batch size.
* Standalone Training
If you want to train or finetune the model on a smaller dataset without distributed training, please run:
```shell
# standalone training on a CPU/GPU/Ascend device
python train.py --config configs/mobilenetv2/mobilenet_v2_0.75_ascend.yaml --data_dir /path/to/dataset --distribute False
```
### Validation
To validate the accuracy of the trained model, you can use `validate.py` and parse the checkpoint path with `--ckpt_path`.
```shell
python validate.py -c configs/mobilenetv2/mobilenet_v2_0.75_ascend.yaml --data_dir /path/to/imagenet --ckpt_path /path/to/ckpt
```
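You can also evaluate the released weights directly. The sketch below downloads the MobileNet_v2_075 checkpoint listed in the table above and validates it; the local file path is just an example.
```shell
# download the pretrained checkpoint from the table above and evaluate it
wget https://download.mindspore.cn/toolkits/mindcv/mobilenet/mobilenetv2/mobilenet_v2_075-243f9404.ckpt
python validate.py -c configs/mobilenetv2/mobilenet_v2_0.75_ascend.yaml \
    --data_dir /path/to/imagenet \
    --ckpt_path ./mobilenet_v2_075-243f9404.ckpt
```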
### Deployment
Please refer to the [deployment tutorial](https://github.com/mindspore-lab/mindcv/blob/main/tutorials/deployment.md) in MindCV.
## References
[1] Sandler M, Howard A, Zhu M, et al. Mobilenetv2: Inverted residuals and linear bottlenecks[C]//Proceedings of the IEEE conference on computer vision and pattern recognition. 2018: 4510-4520.

View File

@ -0,0 +1,51 @@
# system
mode: 0
distribute: True
num_parallel_workers: 32
# dataset
dataset: 'imagenet'
data_dir: '/path/to/imagenet'
shuffle: True
dataset_download: False
batch_size: 256
drop_remainder: True
train_split: 'train'
# augmentation
image_resize: 224
scale: [0.08, 1.0]
ratio: [0.75, 1.333]
hflip: 0.5
interpolation: 'bilinear'
crop_pct: 0.875
# model
model: 'mobilenet_v2_075_224'
num_classes: 1000
pretrained: False
ckpt_path: ''
keep_checkpoint_max: 50
ckpt_save_dir: './ckpt'
epoch_size: 400
dataset_sink_mode: True
amp_level: 'O0'
# loss
loss: 'CE'
label_smoothing: 0.1
# lr scheduler
scheduler: 'cosine_decay'
min_lr: 0.0
lr: 0.3
warmup_epochs: 4
decay_epochs: 396
# optimizer
opt: 'momentum'
filter_bias_and_bn: True
momentum: 0.9
weight_decay: 0.00003
loss_scale: 1024
use_nesterov: False

View File

@ -0,0 +1,51 @@
# system
mode: 0
distribute: True
num_parallel_workers: 32
# dataset
dataset: 'imagenet'
data_dir: '/path/to/imagenet'
shuffle: True
dataset_download: False
batch_size: 256
drop_remainder: True
train_split: 'train'
# augmentation
image_resize: 224
scale: [0.08, 1.0]
ratio: [0.75, 1.333]
hflip: 0.5
interpolation: 'bilinear'
crop_pct: 0.875
# model
model: 'mobilenet_v2_100_224'
num_classes: 1000
pretrained: False
ckpt_path: ''
keep_checkpoint_max: 100
ckpt_save_dir: './ckpt'
epoch_size: 300
dataset_sink_mode: True
amp_level: 'O0'
# loss
loss: 'CE'
label_smoothing: 0.1
# lr scheduler
scheduler: 'cosine_decay'
min_lr: 0.0
lr: 0.4
warmup_epochs: 4
decay_epochs: 296
# optimizer
opt: 'momentum'
filter_bias_and_bn: True
momentum: 0.9
weight_decay: 0.00004
loss_scale: 1024
use_nesterov: False

View File

@ -0,0 +1,51 @@
# system
mode: 0
distribute: True
num_parallel_workers: 32
# dataset
dataset: 'imagenet'
data_dir: '/path/to/imagenet'
shuffle: True
dataset_download: False
batch_size: 256
drop_remainder: True
train_split: 'train'
# augmentation
image_resize: 224
scale: [0.08, 1.0]
ratio: [0.75, 1.333]
hflip: 0.5
interpolation: 'bilinear'
crop_pct: 0.875
# model
model: 'mobilenet_v2_140_224'
num_classes: 1000
pretrained: False
ckpt_path: ''
keep_checkpoint_max: 50
ckpt_save_dir: './ckpt'
epoch_size: 300
dataset_sink_mode: True
amp_level: 'O0'
# loss
loss: 'CE'
label_smoothing: 0.1
# lr scheduler
scheduler: 'cosine_decay'
min_lr: 0.0
lr: 0.4
warmup_epochs: 4
decay_epochs: 296
# optimizer
opt: 'momentum'
filter_bias_and_bn: False
momentum: 0.9
weight_decay: 0.00004
loss_scale: 1024
use_nesterov: False

View File

@ -0,0 +1,87 @@
# MobileNetV3
> [Searching for MobileNetV3](https://arxiv.org/abs/1905.02244)
## Introduction
MobileNetV3 was published in 2019. It combines the depthwise separable convolutions of v1, the inverted residuals and linear bottlenecks of v2, and the SE (squeeze-and-excitation) module, and uses NAS (neural architecture search) to find the network configuration and parameters. MobileNetV3 first uses MnasNet to perform a coarse structure search, and then uses reinforcement learning to select the optimal configuration from a set of discrete choices. Afterwards, MobileNetV3 fine-tunes the architecture using NetAdapt, which complements the search by trimming underutilized activation channels with only a small accuracy drop.
MobileNetV3 offers two versions, MobileNetV3-Large and MobileNetV3-Small, for situations with different resource requirements. The paper reports that, on the ImageNet classification task, MobileNetV3-Small is about 3.2% more accurate and 15% faster than MobileNetV2, while MobileNetV3-Large is about 4.6% more accurate and 5% faster. On COCO, MobileNetV3-Large reaches roughly the same accuracy as v2 while being about 25% faster, and a similar improvement is also observed for segmentation.[[1](#references)]
<p align="center">
<img src="https://user-images.githubusercontent.com/53842165/210044297-d658ca54-e6ff-4c0f-8080-88072814d8e6.png" width=800 />
</p>
<p align="center">
<em>Figure 1. Architecture of MobileNetV3 [<a href="#references">1</a>] </em>
</p>
## Results
Our reproduced model performance on ImageNet-1K is reported as follows.
<div align="center">
| Model | Context | Top-1 (%) | Top-5 (%) | Params (M) | Recipe | Download |
|-----------------------|----------|-----------|-----------|------------|--------------------------------------------------------------------------------------------------------------|-------------------------------------------------------------------------------------------------------------------|
| MobileNetV3_small_100 | D910x8-G | 67.81 | 87.82 | 2.55 | [yaml](https://github.com/mindspore-lab/mindcv/blob/main/configs/mobilenetv3/mobilenet_v3_small_ascend.yaml) | [weights](https://download.mindspore.cn/toolkits/mindcv/mobilenet/mobilenetv3/mobilenet_v3_small_100-c884b105.ckpt) |
| MobileNetV3_large_100 | D910x8-G | 75.14 | 92.33 | 5.51 | [yaml](https://github.com/mindspore-lab/mindcv/blob/main/configs/mobilenetv3/mobilenet_v3_large_ascend.yaml) | [weights](https://download.mindspore.cn/toolkits/mindcv/mobilenet/mobilenetv3/mobilenet_v3_large_100-6f5bf961.ckpt) |
</div>
#### Notes
- Context: Training context denoted as {device}x{pieces}-{MS mode}, where mindspore mode can be G - graph mode or F - pynative mode with ms function. For example, D910x8-G is for training on 8 pieces of Ascend 910 NPU using graph mode.
- Top-1 and Top-5: Accuracy reported on the validation set of ImageNet-1K.
## Quick Start
### Preparation
#### Installation
Please refer to the [installation instruction](https://github.com/mindspore-ecosystem/mindcv#installation) in MindCV.
#### Dataset Preparation
Please download the [ImageNet-1K](https://www.image-net.org/challenges/LSVRC/2012/index.php) dataset for model training and validation.
### Training
* Distributed Training
It is easy to reproduce the reported results with the pre-defined training recipe. For distributed training on multiple Ascend 910 devices, please run
```shell
# distributed training on multiple GPU/Ascend devices
mpirun -n 8 python train.py --config configs/mobilenetv3/mobilenet_v3_small_ascend.yaml --data_dir /path/to/imagenet
```
> If the script is executed by the root user, the `--allow-run-as-root` parameter must be added to `mpirun`.
Similarly, you can train the model on multiple GPU devices with the above `mpirun` command.
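Likewise, the large variant can be trained by pointing to its pre-defined recipe from the table above:
```shell
# distributed training of the large variant with its pre-defined recipe
mpirun -n 8 python train.py --config configs/mobilenetv3/mobilenet_v3_large_ascend.yaml --data_dir /path/to/imagenet
```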
For detailed illustration of all hyper-parameters, please refer to [config.py](https://github.com/mindspore-lab/mindcv/blob/main/config.py).
**Note:** As the global batch size (batch_size x num_devices) is an important hyper-parameter, it is recommended to keep the global batch size unchanged for reproduction or adjust the learning rate linearly to a new global batch size.
* Standalone Training
If you want to train or finetune the model on a smaller dataset without distributed training, please run:
```shell
# standalone training on a CPU/GPU/Ascend device
python train.py --config configs/mobilenetv3/mobilenet_v3_small_ascend.yaml --data_dir /path/to/dataset --distribute False
```
### Validation
To validate the accuracy of the trained model, you can use `validate.py` and parse the checkpoint path with `--ckpt_path`.
```shell
python validate.py -c configs/mobilenetv3/mobilenet_v3_small_ascend.yaml --data_dir /path/to/imagenet --ckpt_path /path/to/ckpt
```
### Deployment
Please refer to the [deployment tutorial](https://github.com/mindspore-lab/mindcv/blob/main/tutorials/deployment.md) in MindCV.
## References
[1] Howard A, Sandler M, Chu G, et al. Searching for mobilenetv3[C]//Proceedings of the IEEE/CVF international conference on computer vision. 2019: 1314-1324.

View File

@ -0,0 +1,51 @@
# system
mode: 0
distribute: True
num_parallel_workers: 8
# dataset
dataset: 'imagenet'
data_dir: '/path/to/imagenet'
shuffle: True
dataset_download: False
batch_size: 75
drop_remainder: True
train_split: 'train'
# augmentation
image_resize: 224
scale: [0.08, 1.0]
ratio: [0.75, 1.333]
hflip: 0.5
interpolation: 'bilinear'
crop_pct: 0.875
# model
model: 'mobilenet_v3_large_100'
num_classes: 1000
pretrained: False
ckpt_path: ''
keep_checkpoint_max: 30
ckpt_save_dir: './ckpt'
epoch_size: 420
dataset_sink_mode: True
amp_level: 'O0'
# loss config
loss: 'CE'
label_smoothing: 0.1
# lr scheduler
scheduler: 'cosine_decay'
min_lr: 0.0
lr: 1.08
warmup_epochs: 4
decay_epochs: 416
# optimizer
opt: 'momentum'
filter_bias_and_bn: False
momentum: 0.9
weight_decay: 0.00004
loss_scale: 1024
use_nesterov: False

View File

@ -0,0 +1,52 @@
# system
mode: 0
distribute: True
num_parallel_workers: 8
# dataset
dataset: 'imagenet'
data_dir: '/path/to/imagenet'
shuffle: True
dataset_download: False
batch_size: 75
drop_remainder: True
train_split: 'train'
# augmentation
image_resize: 224
scale: [0.08, 1.0]
ratio: [0.75, 1.333]
hflip: 0.5
color_jitter: 0.4
interpolation: 'bilinear'
crop_pct: 0.875
# model
model: 'mobilenet_v3_small_100'
num_classes: 1000
pretrained: False
ckpt_path: ""
keep_checkpoint_max: 30
ckpt_save_dir: './ckpt'
epoch_size: 470
dataset_sink_mode: True
amp_level: 'O0'
# loss
loss: 'CE'
label_smoothing: 0.1
# lr scheduler
scheduler: 'cosine_decay'
min_lr: 0.0
lr: 0.77
warmup_epochs: 4
decay_epochs: 466
# optimizer
opt: 'momentum'
filter_bias_and_bn: False
momentum: 0.9
weight_decay: 0.00004
loss_scale: 1024
use_nesterov: False

View File

@ -0,0 +1,81 @@
# MobileViT
> [MobileViT: Light-weight, General-purpose, and Mobile-friendly Vision Transformer](https://arxiv.org/pdf/2110.02178.pdf)
## Introduction
MobileViT is a light-weight and general-purpose vision transformer for mobile devices. MobileViT presents a different perspective for the global processing of information with transformers, i.e., transformers as convolutions. MobileViT significantly outperforms CNN- and ViT-based networks across different tasks and datasets. On the ImageNet-1K dataset, MobileViT achieves a top-1 accuracy of 78.4% with about 6 million parameters, which is 3.2% and 6.2% more accurate than MobileNetV3 (CNN-based) and DeiT (ViT-based), respectively, for a similar number of parameters. On the MS-COCO object detection task, MobileViT is 5.7% more accurate than MobileNetV3 for a similar number of parameters.
<p align="center">
<img src="https://user-images.githubusercontent.com/64628185/229476902-1b97496a-4a38-40ca-9e50-a88c52defcbb.png" width=800 />
</p>
<p align="center">
<em>Figure 1. Architecture of MobileViT [<a href="#references">1</a>] </em>
</p>
## Results
Our reproduced model performance on ImageNet-1K is reported as follows.
<div align="center">
| Model | Context | Top-1 (%) | Top-5 (%) | Params (M) | Recipe | Download |
|-------------|----------|-----------|-----------|------------|-----------------------------------------------------------------------------------------|---------------------------------------------------------------------------------------|
| mobilevit_xx_small | D910x8-G | 68.90 | 88.92 | 1.27 | [yaml](https://github.com/mindspore-lab/mindcv/blob/main/configs/mobilevit/mobilevit_xx_small_ascend.yaml) | [weights](https://download.mindspore.cn/toolkits/mindcv/mobilevit/mobilevit_xx_small-af9da8a0.ckpt) |
| mobilevit_x_small | D910x8-G | 74.98 | 92.33 | 2.32 | [yaml](https://github.com/mindspore-lab/mindcv/blob/main/configs/mobilevit/mobilevit_x_small_ascend.yaml) | [weights](https://download.mindspore.cn/toolkits/mindcv/mobilevit/mobilevit_x_small-673fc6f2.ckpt) |
| mobilevit_small | D910x8-G | 78.48 | 94.18 | 5.59 | [yaml](https://github.com/mindspore-lab/mindcv/blob/main/configs/mobilevit/mobilevit_small_ascend.yaml) | [weights](https://download.mindspore.cn/toolkits/mindcv/mobilevit/mobilevit_small-caf79638.ckpt) |
</div>
#### Notes
- Context: Training context denoted as {device}x{pieces}-{MS mode}, where mindspore mode can be G - graph mode or F - pynative mode with ms function. For example, D910x8-G is for training on 8 pieces of Ascend 910 NPU using graph mode.
- Top-1 and Top-5: Accuracy reported on the validation set of ImageNet-1K.
## Quick Start
### Preparation
#### Installation
Please refer to the [installation instruction](https://github.com/mindspore-ecosystem/mindcv#installation) in MindCV.
#### Dataset Preparation
Please download the [ImageNet-1K](https://www.image-net.org/challenges/LSVRC/2012/index.php) dataset for model training and validation.
### Training
* Distributed Training
It is easy to reproduce the reported results with the pre-defined training recipe. For distributed training on multiple Ascend 910 devices, please run
```shell
# distributed training on multiple GPU/Ascend devices
mpirun -n 8 python train.py --config configs/mobilevit/mobilevit_xx_small_ascend.yaml --data_dir /path/to/imagenet
```
> If the script is executed by the root user, the `--allow-run-as-root` parameter must be added to `mpirun`.
Similarly, you can train the model on multiple GPU devices with the above `mpirun` command.
For detailed illustration of all hyper-parameters, please refer to [config.py](https://github.com/mindspore-lab/mindcv/blob/main/config.py).
**Note:** As the global batch size (batch_size x num_devices) is an important hyper-parameter, it is recommended to keep the global batch size unchanged for reproduction or adjust the learning rate linearly to a new global batch size.
* Standalone Training
If you want to train or finetune the model on a smaller dataset without distributed training, please run:
```shell
# standalone training on a CPU/GPU/Ascend device
python train.py --config configs/mobilevit/mobilevit_xx_small_ascend.yaml --data_dir /path/to/dataset --distribute False
```
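If a training run is interrupted, you can initialize the weights from a previously saved checkpoint in `./ckpt` before continuing. This is a hedged sketch: it assumes the recipe's `ckpt_path` key is exposed as a command-line flag of `train.py` and is used to load the initial weights, and the checkpoint file name is hypothetical.
```shell
# hypothetical sketch: restart standalone training from a previously saved checkpoint
# (the checkpoint file name below is a placeholder)
python train.py --config configs/mobilevit/mobilevit_xx_small_ascend.yaml \
    --data_dir /path/to/dataset \
    --distribute False \
    --ckpt_path ./ckpt/mobilevit_xx_small-100.ckpt
```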
### Validation
To validate the accuracy of the trained model, you can use `validate.py` and parse the checkpoint path with `--ckpt_path`.
```shell
python validate.py -c configs/mobilevit/mobilevit_xx_small_ascend.yaml --data_dir /path/to/imagenet --ckpt_path /path/to/ckpt
```
### Deployment
Please refer to the [deployment tutorial](https://github.com/mindspore-lab/mindcv/blob/main/tutorials/deployment.md) in MindCV.

View File

@ -0,0 +1,55 @@
# system
mode: 0
distribute: True
num_parallel_workers: 8
val_while_train: True
# dataset
dataset: 'imagenet'
data_dir: '/path/to/imagenet'
shuffle: True
dataset_download: False
batch_size: 64
drop_remainder: True
# augmentation
image_resize: 256
scale: [0.08, 1.0]
ratio: [0.75, 1.333]
hflip: 0.5
interpolation: 'bilinear'
crop_pct: 0.875
color_jitter: [0.4, 0.4, 0.4]
re_prob: 1.0
# model
model: 'mobilevit_small'
num_classes: 1000
pretrained: False
ckpt_path: ''
keep_checkpoint_max: 10
ckpt_save_dir: './ckpt'
epoch_size: 450
dataset_sink_mode: True
amp_level: 'O3'
# loss
loss: 'ce'
label_smoothing: 0.1
# lr scheduler
scheduler: 'cosine_decay'
min_lr: 0.000002
lr: 0.002
warmup_epochs: 20
decay_epochs: 430
# optimizer
opt: 'adamw'
filter_bias_and_bn: True
momentum: 0.9
weight_decay: 0.01
use_nesterov: False
loss_scale_type: 'dynamic'
drop_overflow_update: True
loss_scale: 1024

View File

@ -0,0 +1,55 @@
# system
mode: 0
distribute: True
num_parallel_workers: 8
val_while_train: True
# dataset
dataset: 'imagenet'
data_dir: '/path/to/imagenet'
shuffle: True
dataset_download: False
batch_size: 64
drop_remainder: True
# augmentation
image_resize: 256
scale: [0.08, 1.0]
ratio: [0.75, 1.333]
hflip: 0.5
interpolation: 'bilinear'
crop_pct: 0.875
color_jitter: [0.4, 0.4, 0.4]
re_prob: 1.0
# model
model: 'mobilevit_x_small'
num_classes: 1000
pretrained: False
ckpt_path: ''
keep_checkpoint_max: 10
ckpt_save_dir: './ckpt'
epoch_size: 450
dataset_sink_mode: True
amp_level: 'O3'
# loss
loss: 'ce'
label_smoothing: 0.1
# lr scheduler
scheduler: 'cosine_decay'
min_lr: 0.000002
lr: 0.002
warmup_epochs: 20
decay_epochs: 430
# optimizer
opt: 'adamw'
filter_bias_and_bn: True
momentum: 0.9
weight_decay: 0.01
use_nesterov: False
loss_scale_type: 'dynamic'
drop_overflow_update: True
loss_scale: 1024

View File

@ -0,0 +1,55 @@
# system
mode: 0
distribute: True
num_parallel_workers: 8
val_while_train: True
# dataset
dataset: 'imagenet'
data_dir: '/path/to/imagenet'
shuffle: True
dataset_download: False
batch_size: 64
drop_remainder: True
# augmentation
image_resize: 256
scale: [0.08, 1.0]
ratio: [0.75, 1.333]
hflip: 0.5
interpolation: 'bilinear'
crop_pct: 0.875
color_jitter: [0.4, 0.4, 0.4]
re_prob: 1.0
# model
model: 'mobilevit_xx_small'
num_classes: 1000
pretrained: False
ckpt_path: ''
keep_checkpoint_max: 10
ckpt_save_dir: './ckpt'
epoch_size: 450
dataset_sink_mode: True
amp_level: 'O3'
# loss
loss: 'ce'
label_smoothing: 0.1
# lr scheduler
scheduler: 'cosine_decay'
min_lr: 0.000002
lr: 0.002
warmup_epochs: 40
decay_epochs: 410
# optimizer
opt: 'adamw'
filter_bias_and_bn: True
momentum: 0.9
weight_decay: 0.01
use_nesterov: False
loss_scale_type: 'dynamic'
drop_overflow_update: True
loss_scale: 1024

100
configs/nasnet/README.md Normal file
View File

@ -0,0 +1,100 @@
# NasNet
<!--- Guideline: please use url linked to the paper abstract in ArXiv instead of PDF for fast loading. -->
> [Learning Transferable Architectures for Scalable Image Recognition](https://arxiv.org/abs/1707.07012)
## Introduction
<!--- Guideline: Introduce the model and architectures. Please cite if you use/adopt paper explanation from others. -->
<!--- Guideline: If an architecture table/figure is available in the paper, please put one here and cite for intuitive illustration. -->
Neural architecture search (NAS) offers great flexibility in model configuration. By searching over a pool of candidate operations, including convolution layers, max pooling, and average pooling, a normal cell and a reduction cell are selected to form NasNet. Figure 1 shows the NasNet architecture for ImageNet, which is built by stacking the reduction cell and the normal cell.
In conclusion, NasNet achieves better model performance with fewer model parameters and lower computation cost on image classification compared with previous state-of-the-art methods on the ImageNet-1K dataset.[[1](#references)]
<p align="center">
<img src="https://user-images.githubusercontent.com/77485245/224208085-0d6e6b91-873d-49cb-ad54-23ea12483d8f.jpeg" width=200 height=400/>
</p>
<p align="center">
<em>Figure 1. Architecture of Nasnet [<a href="#references">1</a>] </em>
</p>
## Results
<!--- Guideline:
Table Format:
- Model: model name in lower case with _ seperator.
- Context: Training context denoted as {device}x{pieces}-{MS mode}, where mindspore mode can be G - graph mode or F - pynative mode with ms function. For example, D910x8-G is for training on 8 pieces of Ascend 910 NPU using graph mode.
- Top-1 and Top-5: Keep 2 digits after the decimal point.
- Params (M): # of model parameters in millions (10^6). Keep 2 digits after the decimal point
- Recipe: Training recipe/configuration linked to a yaml config file. Use absolute url path.
- Download: url of the pretrained model weights. Use absolute url path.
-->
Our reproduced model performance on ImageNet-1K is reported as follows.
<div align="center">
| Model | Context | Top-1 (%) | Top-5 (%) | Params (M) | Recipe | Download |
|-----------------|----------|-----------|-----------|------------|------------------------------------------------------------------------------------------------------|-----------------------------------------------------------------------------------------------|
| nasnet_a_4x1056 | D910x8-G | 73.65 | 91.25 | 5.33 | [yaml](https://github.com/mindspore-lab/mindcv/blob/main/configs/nasnet/nasnet_a_4x1056_ascend.yaml) | [weights](https://download.mindspore.cn/toolkits/mindcv/nasnet/nasnet_a_4x1056-0fbb5cdd.ckpt) |
</div>
#### Notes
- Context: Training context denoted as {device}x{pieces}-{MS mode}, where mindspore mode can be G - graph mode or F - pynative mode with ms function. For example, D910x8-G is for training on 8 pieces of Ascend 910 NPU using graph mode.
- Top-1 and Top-5: Accuracy reported on the validation set of ImageNet-1K.
## Quick Start
### Preparation
#### Installation
Please refer to the [installation instruction](https://github.com/mindspore-lab/mindcv#installation) in MindCV.
#### Dataset Preparation
Please download the [ImageNet-1K](https://www.image-net.org/challenges/LSVRC/2012/index.php) dataset for model training and validation.
### Training
<!--- Guideline: Please avoid using shell scripts in the command line. Python scripts preferred. -->
* Distributed Training
It is easy to reproduce the reported results with the pre-defined training recipe. For distributed training on multiple Ascend 910 devices, please run
```shell
# distributed training on multiple GPU/Ascend devices
mpirun -n 8 python train.py --config configs/nasnet/nasnet_a_4x1056_ascend.yaml --data_dir /path/to/imagenet
```
> If the script is executed by the root user, the `--allow-run-as-root` parameter must be added to `mpirun`.
Similarly, you can train the model on multiple GPU devices with the above `mpirun` command.
For detailed illustration of all hyper-parameters, please refer to [config.py](https://github.com/mindspore-lab/mindcv/blob/main/config.py).
**Note:** As the global batch size (batch_size x num_devices) is an important hyper-parameter, it is recommended to keep the global batch size unchanged for reproduction or adjust the learning rate linearly to a new global batch size.
* Standalone Training
If you want to train or finetune the model on a smaller dataset without distributed training, please run:
```shell
# standalone training on a CPU/GPU/Ascend device
python train.py --config configs/nasnet/nasnet_a_4x1056_ascend.yaml --data_dir /path/to/dataset --distribute False
```
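The recipes above run in graph mode (`mode: 0` in the yaml, the `G` in the Context column). For step-by-step debugging, pynative mode can be more convenient. The sketch below assumes `mode` is exposed as a command-line flag and that `1` selects pynative mode, following the usual MindSpore context convention.
```shell
# hypothetical sketch: run standalone training in pynative mode for easier debugging
# (assumes --mode mirrors the yaml key, with 0 = graph mode and 1 = pynative mode)
python train.py --config configs/nasnet/nasnet_a_4x1056_ascend.yaml \
    --data_dir /path/to/dataset \
    --distribute False \
    --mode 1
```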
### Validation
To validate the accuracy of the trained model, you can use `validate.py` and parse the checkpoint path with `--ckpt_path`.
```shell
python validate.py -c configs/nasnet/nasnet_a_4x1056_ascend.yaml --data_dir /path/to/imagenet --ckpt_path /path/to/ckpt
```
### Deployment
Please refer to the [deployment tutorial](https://github.com/mindspore-lab/mindcv/blob/main/tutorials/deployment.md) in MindCV.
## References
<!--- Guideline: Citation format GB/T 7714 is suggested. -->
[1] Zoph B, Vasudevan V, Shlens J, et al. Learning transferable architectures for scalable image recognition[C]//Proceedings of the IEEE conference on computer vision and pattern recognition. 2018: 8697-8710.

View File

@ -0,0 +1,53 @@
# system
mode: 0
distribute: True
num_parallel_workers: 8
val_while_train: True
# dataset
dataset: 'imagenet'
data_dir: '/path/to/imagenet'
shuffle: True
dataset_download: False
batch_size: 256
drop_remainder: True
# augmentation
image_resize: 224
scale: [0.08, 1.0]
ratio: [0.75, 1.333]
hflip: 0.5
interpolation: 'bilinear'
crop_pct: 0.875
# model
model: 'nasnet_a_4x1056'
num_classes: 1000
pretrained: False
ckpt_path: ''
keep_checkpoint_max: 10
ckpt_save_dir: './ckpt'
epoch_size: 450
dataset_sink_mode: True
amp_level: 'O0'
# loss
loss: 'CE'
label_smoothing: 0.1
# lr scheduler
scheduler: 'cosine_decay'
min_lr: 1e-10
lr: 0.016
warmup_epochs: 5
decay_epochs: 445
# optimizer
opt: 'rmsprop'
filter_bias_and_bn: True
momentum: 0.9
weight_decay: 1e-5
loss_scale_type: 'dynamic'
drop_overflow_update: True
use_nesterov: False
eps: 1e-3

89
configs/pit/README.md Normal file
View File

@ -0,0 +1,89 @@
# PiT
> [PiT: Rethinking Spatial Dimensions of Vision Transformers](https://arxiv.org/abs/2103.16302v2)
## Introduction
PiT (Pooling-based Vision Transformer) is an improvement on the Vision Transformer (ViT) model, proposed by Byeongho Heo et al. in 2021. PiT adds pooling layers on top of the ViT model so that the spatial dimension of each layer is reduced as in CNNs, instead of using the same spatial dimension for all layers as ViT does. PiT achieves improved model capability and generalization performance compared with ViT. [[1](#references)]
<p align="center">
<img src="https://user-images.githubusercontent.com/37565353/215304821-efaf99ad-12ba-4020-90a3-5897247f9368.png" width=400 />
</p>
<p align="center">
<em>Figure 1. Architecture of PiT [<a href="#references">1</a>] </em>
</p>
## Results
Our reproduced model performance on ImageNet-1K is reported as follows.
<div align="center">
| Model | Context | Top-1 (%) | Top-5 (%) | Params (M) | Recipe | Download |
|--------|----------|-----------|-----------|------------|------------------------------------------------------------------------------------------|-----------------------------------------------------------------------------------|
| PiT_ti | D910x8-G | 72.96 | 91.33 | 4.85 | [yaml](https://github.com/mindspore-lab/mindcv/blob/main/configs/pit/pit_ti_ascend.yaml) | [weights](https://download.mindspore.cn/toolkits/mindcv/pit/pit_ti-e647a593.ckpt) |
| PiT_xs | D910x8-G | 78.41 | 94.06 | 10.61 | [yaml](https://github.com/mindspore-lab/mindcv/blob/main/configs/pit/pit_xs_ascend.yaml) | [weights](https://download.mindspore.cn/toolkits/mindcv/pit/pit_xs-fea0d37e.ckpt) |
| PiT_s | D910x8-G | 80.56 | 94.80 | 23.46 | [yaml](https://github.com/mindspore-lab/mindcv/blob/main/configs/pit/pit_s_ascend.yaml) | [weights](https://download.mindspore.cn/toolkits/mindcv/pit/pit_s-3c1ba36f.ckpt) |
| PiT_b | D910x8-G | 81.87 | 95.04 | 73.76 | [yaml](https://github.com/mindspore-lab/mindcv/blob/main/configs/pit/pit_b_ascend.yaml) | [weights](https://download.mindspore.cn/toolkits/mindcv/pit/pit_b-2411c9b6.ckpt) |
</div>
#### Notes
- Context: Training context denoted as {device}x{pieces}-{MS mode}, where mindspore mode can be G - graph mode or F - pynative mode with ms function. For example, D910x8-G is for training on 8 pieces of Ascend 910 NPU using graph mode.
- Top-1 and Top-5: Accuracy reported on the validation set of ImageNet-1K.
## Quick Start
### Preparation
#### Installation
Please refer to the [installation instruction](https://github.com/mindspore-ecosystem/mindcv#installation) in MindCV.
#### Dataset Preparation
Please download the [ImageNet-1K](https://www.image-net.org/challenges/LSVRC/2012/index.php) dataset for model training and validation.
### Training
* Distributed Training
It is easy to reproduce the reported results with the pre-defined training recipe. For distributed training on multiple Ascend 910 devices, please run
```shell
# distributed training on multiple GPU/Ascend devices
mpirun -n 8 python train.py --config configs/pit/pit_xs_ascend.yaml --data_dir /path/to/imagenet
```
> If the script is executed by the root user, the `--allow-run-as-root` parameter must be added to `mpirun`.
Similarly, you can train the model on multiple GPU devices with the above `mpirun` command.
For detailed illustration of all hyper-parameters, please refer to [config.py](https://github.com/mindspore-lab/mindcv/blob/main/config.py).
**Note:** As the global batch size (batch_size x num_devices) is an important hyper-parameter, it is recommended to keep the global batch size unchanged for reproduction or adjust the learning rate linearly to a new global batch size.
* Standalone Training
If you want to train or finetune the model on a smaller dataset without distributed training, please run:
```shell
# standalone training on a CPU/GPU/Ascend device
python train.py --config configs/pit/pit_xs_ascend.yaml --data_dir /path/to/dataset --distribute False
```
### Validation
To validate the accuracy of the trained model, you can use `validate.py` and parse the checkpoint path with `--ckpt_path`.
```shell
python validate.py -c configs/pit/pit_xs_ascend.yaml --data_dir /path/to/imagenet --ckpt_path /path/to/ckpt
```
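To evaluate several PiT variants in one go, you can loop over the recipe/checkpoint pairs from the table above; the checkpoint paths below are placeholders for wherever you saved the downloaded weights.
```shell
# validate multiple PiT variants in a loop (checkpoint paths are placeholders)
for variant in ti xs s b; do
    python validate.py -c configs/pit/pit_${variant}_ascend.yaml \
        --data_dir /path/to/imagenet \
        --ckpt_path /path/to/pit_${variant}.ckpt
done
```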
### Deployment
Please refer to the [deployment tutorial](https://github.com/mindspore-lab/mindcv/blob/main/tutorials/deployment.md) in MindCV.
## References
[1] Heo B, Yun S, Han D, et al. Rethinking spatial dimensions of vision transformers[C]//Proceedings of the IEEE/CVF International Conference on Computer Vision. 2021: 11936-11945.

View File

@ -0,0 +1,60 @@
# system
mode: 0
distribute: True
num_parallel_workers: 8
val_while_train: True
# dataset
dataset: "imagenet"
data_dir: "/path/to/imagenet"
shuffle: True
dataset_download: False
batch_size: 128
drop_remainder: True
# augmentation
image_resize: 224
scale: [0.08, 1.0]
ratio: [0.75, 1.333]
re_value: "random"
hflip: 0.5
interpolation: "bicubic"
auto_augment: "randaug-m9-mstd0.5-inc1"
re_prob: 0.25
crop_pct: 0.9
mixup: 0.8
cutmix: 1.0
aug_repeats: 3
# model
model: "pit_b"
num_classes: 1000
pretrained: False
ckpt_path: ""
keep_checkpoint_max: 10
ckpt_save_dir: "./ckpt"
epoch_size: 600
drop_path_rate: 0.2
dataset_sink_mode: True
amp_level: "O2"
# loss
loss: "ce"
label_smoothing: 0.1
# lr scheduler
scheduler: "cosine_decay"
lr: 0.001
min_lr: 0.00001
lr_epoch_stair: True
decay_epochs: 590
warmup_epochs: 10
warmup_factor: 0.002
# optimizer
opt: "adamw"
filter_bias_and_bn: True
momentum: 0.9
weight_decay: 0.05
loss_scale_type: "auto"
use_nesterov: False

View File

@ -0,0 +1,61 @@
# system
mode: 0
distribute: True
num_parallel_workers: 8
val_while_train: True
# dataset
dataset: "imagenet"
data_dir: "/path/to/imagenet"
shuffle: True
dataset_download: False
batch_size: 128
drop_remainder: True
# augmentation
image_resize: 224
scale: [0.08, 1.0]
ratio: [0.75, 1.333]
re_value: "random"
hflip: 0.5
interpolation: "bicubic"
color_jitter: 0.3
auto_augment: "randaug-m9-mstd0.5-inc1"
re_prob: 0.25
crop_pct: 0.875
mixup: 0.8
cutmix: 1.0
# model
model: "pit_s"
num_classes: 1000
pretrained: False
ckpt_path: ""
keep_checkpoint_max: 10
ckpt_save_dir: "./ckpt"
epoch_size: 600
drop_path_rate: 0.1
dataset_sink_mode: True
amp_level: "O2"
# loss
loss: "ce"
label_smoothing: 0.1
# lr scheduler
scheduler: "cosine_decay"
lr: 0.002
min_lr: 0.00001
lr_epoch_stair: True
decay_epochs: 590
warmup_epochs: 10
warmup_factor: 0.002
# optimizer
opt: "adamw"
filter_bias_and_bn: True
momentum: 0.9
weight_decay: 0.05
loss_scale_type: "dynamic"
drop_overflow_update: True
use_nesterov: False

View File

@ -0,0 +1,57 @@
# system
mode: 0
distribute: True
num_parallel_workers: 16
val_while_train: True
# dataset
dataset: "imagenet"
data_dir: "/path/to/imagenet"
shuffle: True
dataset_download: False
batch_size: 128
drop_remainder: True
# augmentation
image_resize: 224
scale: [0.08, 1.0]
ratio: [0.75, 1.333]
re_value: "random"
hflip: 0.5
interpolation: "bicubic"
auto_augment: "randaug-m9-mstd0.5-inc1"
re_prob: 0.25
crop_pct: 0.875
mixup: 0.8
cutmix: 1.0
# model
model: "pit_ti"
num_classes: 1000
pretrained: False
ckpt_path: ""
keep_checkpoint_max: 10
ckpt_save_dir: "./ckpt"
epoch_size: 500
drop_path_rate: 0.1
dataset_sink_mode: True
amp_level: "O2"
# loss
loss: "ce"
label_smoothing: 0.1
# lr scheduler
scheduler: "cosine_decay"
lr: 0.002
min_lr: 0.00001
decay_epochs: 490
warmup_epochs: 10
# optimizer
opt: "adamw"
filter_bias_and_bn: True
momentum: 0.9
weight_decay: 0.05
loss_scale_type: "auto"
use_nesterov: False

View File

@ -0,0 +1,59 @@
# system
mode: 0
distribute: True
num_parallel_workers: 8
val_while_train: True
# dataset
dataset: "imagenet"
data_dir: "/path/to/imagenet"
shuffle: True
dataset_download: False
batch_size: 128
drop_remainder: True
# augmentation
image_resize: 224
scale: [0.08, 1.0]
ratio: [0.75, 1.333]
re_value: "random"
hflip: 0.5
interpolation: "bicubic"
color_jitter: [0.3, 0.3, 0.3]
auto_augment: "randaug-m9-mstd0.5-inc1"
re_prob: 0.25
crop_pct: 0.875
mixup: 0.8
cutmix: 1.0
# model
model: "pit_xs"
num_classes: 1000
pretrained: False
ckpt_path: ""
keep_checkpoint_max: 10
ckpt_save_dir: "./ckpt"
epoch_size: 600
drop_path_rate: 0.1
dataset_sink_mode: True
amp_level: "O2"
# loss
loss: "ce"
label_smoothing: 0.1
# lr scheduler
scheduler: "cosine_decay"
lr: 0.001
min_lr: 0.00001
decay_epochs: 590
warmup_epochs: 10
# optimizer
opt: "adamw"
filter_bias_and_bn: True
momentum: 0.9
weight_decay: 0.05
loss_scale_type: "dynamic"
drop_overflow_update: True
use_nesterov: False

View File

@ -0,0 +1,83 @@
# PoolFormer
> [MetaFormer Is Actually What You Need for Vision](https://arxiv.org/abs/2111.11418)
## Introduction
Instead of designing complicated token mixers to achieve SOTA performance, this work aims to demonstrate that the competence of Transformer models largely stems from the general MetaFormer architecture. Pooling/PoolFormer are merely the tools used to support the authors' claim.
![MetaFormer](https://user-images.githubusercontent.com/74176172/210046827-c218f5d3-1ee8-47bf-a78a-482d821ece89.png)
Figure 1. MetaFormer and the performance of MetaFormer-based models on the ImageNet-1K validation set. The authors argue that the competence of Transformer/MLP-like models primarily stems from the general MetaFormer architecture rather than the specific token mixers they are equipped with. To demonstrate this, they exploit an embarrassingly simple non-parametric operator, pooling, to conduct extremely basic token mixing. Surprisingly, the resulting model, PoolFormer, consistently outperforms DeiT and ResMLP as shown in (b), which supports the claim that MetaFormer is actually what we need to achieve competitive performance. RSB-ResNet in (b) means the results are taken from “ResNet Strikes Back”, where ResNet is trained with an improved training procedure for 300 epochs.
![PoolFormer](https://user-images.githubusercontent.com/74176172/210046845-6caa1574-b6a4-47f3-8298-c8ca3b4f8fa4.png)
Figure 2. (a) The overall framework of PoolFormer. (b) The architecture of PoolFormer block. Compared with Transformer block, it replaces attention with an extremely simple non-parametric operator, pooling, to conduct only basic token mixing.[[1](#References)]
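For intuition, the pooling token mixer is simple enough to write down in a few lines. The sketch below is a minimal NumPy illustration, not the MindCV implementation; the padding handling in the actual `AvgPool2d`-based block differs slightly.
```python
import numpy as np

def pooling_token_mixer(x, pool_size=3):
    """Token mixing via average pooling, as in the PoolFormer block.

    x: feature map of shape (H, W, C), float dtype.
    Returns AvgPool(x) - x; the subtraction cancels the identity that the
    residual connection around the token mixer adds back.
    """
    h, w, c = x.shape
    pad = pool_size // 2
    xp = np.pad(x, ((pad, pad), (pad, pad), (0, 0)), mode="edge")  # simplified padding
    out = np.zeros((h, w, c), dtype=float)
    for i in range(h):
        for j in range(w):
            out[i, j] = xp[i:i + pool_size, j:j + pool_size].mean(axis=(0, 1))
    return out - x
```
Everything else in the block (normalization, channel MLP, residual connections) follows the standard Transformer layout.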
## Results
Our reproduced model performance on ImageNet-1K is reported as follows.
<div align="center">
| Model | Context | Top-1 (%) | Top-5 (%) | Params (M) | Recipe | Download |
|:--------------:|:--------:|:---------:|:---------:|:----------:|---------------------------------------------------------------------------------------------------------|--------------------------------------------------------------------------------------------------|
| poolformer_s12 | D910x8-G | 77.33 | 93.34 | 11.92 | [yaml](https://github.com/mindspore-lab/mindcv/blob/main/configs/poolformer/poolformer_s12_ascend.yaml) | [weights](https://download.mindspore.cn/toolkits/mindcv/poolformer/poolformer_s12-5be5c4e4.ckpt) |
</div>
#### Notes
- Context: Training context denoted as {device}x{pieces}-{MS mode}, where mindspore mode can be G - graph mode or F - pynative mode with ms function. For example, D910x8-G is for training on 8 pieces of Ascend 910 NPU using graph mode.
- Top-1 and Top-5: Accuracy reported on the validation set of ImageNet-1K.
## Quick Start
### Preparation
#### Installation
Please refer to the [installation instruction](https://github.com/mindspore-lab/mindcv#installation) in MindCV.
#### Dataset Preparation
Please download the [ImageNet-1K](https://www.image-net.org/challenges/LSVRC/2012/index.php) dataset for model training and validation.
### Training
- Distributed Training
It is easy to reproduce the reported results with the pre-defined training recipe. For distributed training on multiple Ascend 910 devices, please run
```shell
# distributed training on multiple GPU/Ascend devices
mpirun -n 8 python train.py --config configs/poolformer/poolformer_s12_ascend.yaml --data_dir /path/to/imagenet
```
> If the script is executed by the root user, the `--allow-run-as-root` parameter must be added to `mpirun`.
Similarly, you can train the model on multiple GPU devices with the above `mpirun` command.
For detailed illustration of all hyper-parameters, please refer to [config.py](https://github.com/mindspore-lab/mindcv/blob/main/config.py).
**Note:** As the global batch size (batch_size x num_devices) is an important hyper-parameter, it is recommended to keep the global batch size unchanged for reproduction or adjust the learning rate linearly to a new global batch size.
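For example, here is a minimal sketch of the linear scaling rule, assuming the reference values from `poolformer_s12_ascend.yaml` (lr 0.0005, batch size 128 per device, 8 devices) and a hypothetical run on 4 devices:
```python
# Reference recipe: lr = 5e-4 with per-device batch_size = 128 on 8 devices.
base_lr, base_batch_size, base_num_devices = 5e-4, 128, 8
# Hypothetical new setup: same per-device batch size, only 4 devices.
new_batch_size, new_num_devices = 128, 4

scale = (new_batch_size * new_num_devices) / (base_batch_size * base_num_devices)
new_lr = base_lr * scale
print(new_lr)  # 0.00025 -> set this as `lr` in your copy of the yaml config
```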
- Standalone Training
If you want to train or finetune the model on a smaller dataset without distributed training, please run:
```shell
# standalone training on a CPU/GPU/Ascend device
python train.py --config configs/poolformer/poolformer_s12_ascend.yaml --data_dir /path/to/imagenet --distribute False
```
### Validation
> Validation of PoolFormer has to be done in AMP O3 mode, which is not supported yet. The validation guideline is coming soon.
### Deployment
To deploy online inference services with the trained model efficiently, please refer to the [deployment tutorial](https://github.com/mindspore-lab/mindcv/blob/main/tutorials/deployment.md).
## References
[1] Yu W, Luo M, Zhou P, et al. MetaFormer is actually what you need for vision[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2022: 10819-10829.

View File

@ -0,0 +1,61 @@
# system
mode: 0
distribute: True
num_parallel_workers: 8
val_while_train: True
# dataset
dataset: 'imagenet'
data_dir: '/path/to/imagenet'
shuffle: True
dataset_download: False
batch_size: 128
drop_remainder: True
# augmentation
image_resize: 224
scale: [0.08, 1.0]
ratio: [0.75, 1.333]
hflip: 0.5
vflip: 0.0
interpolation: 'bilinear'
crop_pct: 0.9
color_jitter: [0.4, 0.4, 0.4]
re_prob: 0.25
mixup: 0.8
cutmix: 1.0
cutmix_prob: 1.0
auto_augment: 'randaug-m9-mstd0.5-inc1'
# model
model: 'poolformer_s12'
drop_rate: 0.0
drop_path_rate: 0.1
num_classes: 1000
pretrained: False
ckpt_path: ''
keep_checkpoint_max: 10
ckpt_save_dir: './ckpt'
epoch_size: 600
dataset_sink_mode: True
amp_level: 'O3'
# loss
loss: 'CE'
label_smoothing: 0.1
# lr scheduler
scheduler: 'cosine_decay'
lr: 0.0005
min_lr: 1e-06
warmup_epochs: 30
decay_epochs: 570
decay_rate: 0.1
# optimizer
opt: 'AdamW'
filter_bias_and_bn: True
momentum: 0.9
weight_decay: 0.05
loss_scale: 1024
use_nesterov: False

84
configs/pvt/README.md Normal file
View File

@ -0,0 +1,84 @@
# Pyramid Vision Transformer
> [Pyramid Vision Transformer: A Versatile Backbone for Dense Prediction without Convolutions](https://arxiv.org/abs/2102.12122)
## Introduction
PVT is a general backbone network for dense prediction that requires no convolution operations. PVT introduces a pyramid structure into the Transformer to generate multi-scale feature maps for dense prediction tasks. It uses a progressive shrinking strategy to control the size of the feature maps through the patch embedding layers, and proposes a spatial-reduction attention (SRA) layer to replace the traditional multi-head attention layer in the encoder, which greatly reduces the compute/memory overhead.[[1](#References)]
![PVT](https://user-images.githubusercontent.com/74176172/210046926-2322161b-a963-4603-b3cb-86ecdca41262.png)
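To make the SRA idea concrete, here is a minimal single-head NumPy sketch (not the MindCV implementation). It assumes the H x W token grid is divisible by the reduction ratio and uses average pooling as a simplified stand-in for the paper's strided linear spatial reduction; the point is that keys and values are computed on N/R^2 tokens, so the attention matrix shrinks from N x N to N x (N/R^2).
```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def spatial_reduction_attention(x, h, w, wq, wk, wv, reduction=2):
    """Single-head SRA sketch.

    x: (N, C) token embeddings with N = h * w; wq/wk/wv: (C, C) projections.
    Queries use all N tokens, while keys/values come from a spatially
    reduced grid, so the attention map has shape (N, N // reduction**2).
    """
    n, c = x.shape
    q = x @ wq                                                   # (N, C)
    # Spatial reduction: average-pool reduction x reduction patches of the grid.
    grid = x.reshape(h, w, c)
    pooled = grid.reshape(h // reduction, reduction,
                          w // reduction, reduction, c).mean(axis=(1, 3))
    xr = pooled.reshape(-1, c)                                   # (N / R^2, C)
    k, v = xr @ wk, xr @ wv
    attn = softmax(q @ k.T / np.sqrt(c), axis=-1)                # (N, N / R^2)
    return attn @ v                                              # (N, C)
```
With H = W = 56 and reduction = 8 (the first PVT stage), for instance, the key/value length drops from 3136 to 49 tokens.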
## Results
Our reproduced model performance on ImageNet-1K is reported as follows.
<div align="center">
| Model | Context | Top-1 (%) | Top-5 (%) | Params (M) | Recipe | Download |
|:----------:|:--------:|:---------:|:---------:|:----------:|----------------------------------------------------------------------------------------------|----------------------------------------------------------------------------------|
| pvt_tiny   | D910x8-G | 74.81     | 92.18     | 13.23      | [yaml](https://github.com/mindspore-lab/mindcv/blob/main/configs/pvt/pvt_tiny_ascend.yaml)   | [weights](https://download.mindspore.cn/toolkits/mindcv/pvt/pvt_tiny-6abb953d.ckpt)   |
| pvt_small  | D910x8-G | 79.66     | 94.71     | 24.49      | [yaml](https://github.com/mindspore-lab/mindcv/blob/main/configs/pvt/pvt_small_ascend.yaml)  | [weights](https://download.mindspore.cn/toolkits/mindcv/pvt/pvt_small-213c2ed1.ckpt)  |
| pvt_medium | D910x8-G | 81.82     | 95.81     | 44.21      | [yaml](https://github.com/mindspore-lab/mindcv/blob/main/configs/pvt/pvt_medium_ascend.yaml) | [weights](https://download.mindspore.cn/toolkits/mindcv/pvt/pvt_medium-469e6802.ckpt) |
| pvt_large  | D910x8-G | 81.75     | 95.70     | 61.36      | [yaml](https://github.com/mindspore-lab/mindcv/blob/main/configs/pvt/pvt_large_ascend.yaml)  | [weights](https://download.mindspore.cn/toolkits/mindcv/pvt/pvt_large-bb6895d7.ckpt)  |
</div>
#### Notes
- Context: Training context denoted as {device}x{pieces}-{MS mode}, where mindspore mode can be G - graph mode or F - pynative mode with ms function. For example, D910x8-G is for training on 8 pieces of Ascend 910 NPU using graph mode.
- Top-1 and Top-5: Accuracy reported on the validation set of ImageNet-1K.
## Quick Start
### Preparation
#### Installation
Please refer to the [installation instruction](https://github.com/mindspore-lab/mindcv#installation) in MindCV.
#### Dataset Preparation
Please download the [ImageNet-1K](https://www.image-net.org/challenges/LSVRC/2012/index.php) dataset for model training and validation.
### Training
- Distributed Training
It is easy to reproduce the reported results with the pre-defined training recipe. For distributed training on multiple Ascend 910 devices, please run
```shell
# distributed training on multiple GPU/Ascend devices
mpirun -n 8 python train.py --config configs/pvt/pvt_tiny_ascend.yaml --data_dir /path/to/imagenet
```
> If the script is executed by the root user, the `--allow-run-as-root` parameter must be added to `mpirun`.
Similarly, you can train the model on multiple GPU devices with the above `mpirun` command.
For detailed illustration of all hyper-parameters, please refer to [config.py](https://github.com/mindspore-lab/mindcv/blob/main/config.py).
**Note:** As the global batch size (batch_size x num_devices) is an important hyper-parameter, it is recommended to keep the global batch size unchanged for reproduction or adjust the learning rate linearly to a new global batch size.
- Standalone Training
If you want to train or finetune the model on a smaller dataset without distributed training, please run:
```shell
# standalone training on a CPU/GPU/Ascend device
python train.py --config configs/pvt/pvt_tiny_ascend.yaml --data_dir /path/to/imagenet --distribute False
```
### Validation
To validate the accuracy of the trained model, you can use `validate.py` and pass the checkpoint path via `--ckpt_path`.
```shell
python validate.py --model=pvt_tiny --data_dir /path/to/imagenet --ckpt_path /path/to/ckpt
```
### Deployment
To deploy online inference services with the trained model efficiently, please refer to the [deployment tutorial](https://github.com/mindspore-lab/mindcv/blob/main/tutorials/deployment.md).
## References
[1] Wang W, Xie E, Li X, et al. Pyramid vision transformer: A versatile backbone for dense prediction without convolutions[C]//Proceedings of the IEEE/CVF International Conference on Computer Vision. 2021: 568-578.

View File

@ -0,0 +1,56 @@
# system
mode: 0
distribute: True
num_parallel_workers: 8
# dataset
dataset: 'imagenet'
data_dir: '/path/to/imagenet'
shuffle: True
dataset_download: False
batch_size: 128
drop_remainder: True
# augmentation
image_resize: 224
scale: [0.08, 1.0]
ratio: [0.75, 1.333]
hflip: 0.5
interpolation: 'bicubic'
auto_augment: 'randaug-m9-mstd0.5-inc1'
re_prob: 0.25
mixup: 0.8
cutmix: 1.0
cutmix_prob: 1.0
crop_pct: 0.9
color_jitter: 0.0
# model
model: 'pvt_large'
num_classes: 1000
pretrained: False
ckpt_path: ''
keep_checkpoint_max: 10
ckpt_save_dir: './ckpt'
epoch_size: 400
dataset_sink_mode: True
amp_level: 'O2'
drop_path_rate: 0.3
# loss
loss: 'CE'
label_smoothing: 0.1
# lr scheduler
scheduler: 'cosine_decay'
lr: 0.001
min_lr: 0.000001
warmup_epochs: 10
decay_epochs: 390
# optimizer
opt: 'adamw'
weight_decay: 0.05
loss_scale: 300
filter_bias_and_bn: True
use_nesterov: False

View File

@ -0,0 +1,56 @@
# system
mode: 0
distribute: True
num_parallel_workers: 8
# dataset
dataset: 'imagenet'
data_dir: '/path/to/imagenet'
shuffle: True
dataset_download: False
batch_size: 128
drop_remainder: True
# augmentation
image_resize: 224
scale: [0.08, 1.0]
ratio: [0.75, 1.333]
hflip: 0.5
interpolation: 'bicubic'
auto_augment: 'randaug-m9-mstd0.5-inc1'
re_prob: 0.25
mixup: 0.8
cutmix: 1.0
cutmix_prob: 1.0
crop_pct: 0.9
color_jitter: 0.0
# model
model: 'pvt_medium'
num_classes: 1000
pretrained: False
ckpt_path: ''
keep_checkpoint_max: 10
ckpt_save_dir: './ckpt'
epoch_size: 400
dataset_sink_mode: True
amp_level: 'O2'
drop_path_rate: 0.3
# loss
loss: 'CE'
label_smoothing: 0.1
# lr scheduler
scheduler: 'cosine_decay'
lr: 0.001
min_lr: 0.000001
warmup_epochs: 10
decay_epochs: 390
# optimizer
opt: 'adamw'
weight_decay: 0.05
loss_scale: 1024
filter_bias_and_bn: True
use_nesterov: False

View File

@ -0,0 +1,56 @@
# system
mode: 0
distribute: True
num_parallel_workers: 8
# dataset
dataset: 'imagenet'
data_dir: '/path/to/imagenet'
shuffle: True
dataset_download: False
batch_size: 128
drop_remainder: True
# augmentation
image_resize: 224
scale: [0.08, 1.0]
ratio: [0.75, 1.333]
hflip: 0.5
interpolation: 'bicubic'
auto_augment: 'randaug-m9-mstd0.5-inc1'
re_prob: 0.25
mixup: 0.8
cutmix: 1.0
cutmix_prob: 1.0
crop_pct: 0.9
color_jitter: 0.0
# model
model: 'pvt_small'
num_classes: 1000
pretrained: False
ckpt_path: ''
keep_checkpoint_max: 10
ckpt_save_dir: './ckpt'
epoch_size: 500
dataset_sink_mode: True
amp_level: 'O2'
drop_path_rate: 0.1
# loss
loss: 'CE'
label_smoothing: 0.1
# lr scheduler
scheduler: 'cosine_decay'
lr: 0.0005
min_lr: 0.000001
warmup_epochs: 10
decay_epochs: 390
# optimizer
opt: 'adamw'
weight_decay: 0.05
loss_scale: 1024
filter_bias_and_bn: True
use_nesterov: False

View File

@ -0,0 +1,56 @@
# system
mode: 0
distribute: True
num_parallel_workers: 8
# dataset
dataset: 'imagenet'
data_dir: '/path/to/imagenet'
shuffle: True
dataset_download: False
batch_size: 128
drop_remainder: True
# augmentation
image_resize: 224
scale: [0.08, 1.0]
ratio: [0.75, 1.333]
hflip: 0.5
interpolation: 'bicubic'
auto_augment: 'randaug-m9-mstd0.5-inc1'
re_prob: 0.25
mixup: 0.8
cutmix: 1.0
cutmix_prob: 1.0
crop_pct: 0.9
color_jitter: 0.0
# model
model: 'pvt_tiny'
num_classes: 1000
pretrained: False
ckpt_path: ''
keep_checkpoint_max: 10
ckpt_save_dir: './ckpt'
epoch_size: 450
dataset_sink_mode: True
amp_level: 'O2'
drop_path_rate: 0.1
# loss
loss: 'CE'
label_smoothing: 0.1
# lr scheduler
scheduler: 'cosine_decay'
lr: 0.0005
min_lr: 0.000001
warmup_epochs: 10
decay_epochs: 440
# optimizer
opt: 'adamw'
weight_decay: 0.05
loss_scale: 1024
filter_bias_and_bn: True
use_nesterov: False

Some files were not shown because too many files have changed in this diff