update instructions of the challenge
This commit is contained in:
parent 3fa370b210
commit 6567b93c30

@@ -0,0 +1,68 @@
### File Structure and Naming

This folder contains training recipes and model readme files for each model. The folder structure and naming rule of model configurations are as follows.

```
├── configs
    ├── model_a                       // model name in lower case with _ separator
    │   ├─ model_a_small_ascend.yaml  // training recipe denoted as {model_name}_{specification}_{hardware}.yaml
    │   ├─ model_a_large_gpu.yaml
    │   ├─ README.md                  // readme file containing performance results and pretrained weight urls
    │   └─ README_CN.md               // readme file in Chinese
    ├── model_b
    │   ├─ model_b_32_ascend.yaml
    │   ├─ model_b_16_ascend.yaml
    │   ├─ README.md
    │   └─ README_CN.md
    ├── README.md                     // this file
```

### Model Readme Writing Guideline

The model readme file in each sub-folder provides the introduction, reproduced results, and running guideline for each model.

Please follow the outline structure and **table format** shown in [densenet/README.md](https://github.com/mindspore-lab/mindcv/blob/main/configs/densenet/README.md) when contributing your models :)

#### Table Format

<div align="center">

| Model        | Context  | Top-1 (%) | Top-5 (%) | Params (M) | Recipe | Download |
|--------------|----------|-----------|-----------|------------|--------|----------|
| densenet_121 | D910x8-G | 75.64     | 92.84     | 8.06       | [yaml](https://github.com/mindspore-lab/mindcv/blob/main/configs/densenet/densenet_121_ascend.yaml) | [weights](https://download.mindspore.cn/toolkits/mindcv/densenet/densenet121-120_5004_Ascend.ckpt) |

</div>

Illustration:

- Model: model name in lower case with _ separator.
- Context: Training context denoted as {device}x{pieces}-{MS mode}, where mindspore mode can be G - graph mode or F - pynative mode with ms function. For example, D910x8-G is for training on 8 pieces of Ascend 910 NPU using graph mode.
- Top-1 and Top-5: Accuracy reported on the validation set of ImageNet-1K. Keep 2 digits after the decimal point.
- Params (M): Number of model parameters in millions (10^6). Keep **2 digits** after the decimal point (a counting sketch is shown after this list).
- Recipe: Training recipe/configuration linked to a yaml config file.
- Download: URL of the pretrained model weights.
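Params (M) can be computed directly from the saved weights. Below is a minimal counting sketch in plain Python/NumPy; `params_in_millions` is a hypothetical helper (not a MindCV API) and assumes the weights are available as an iterable of NumPy arrays.

```python
import numpy as np

def params_in_millions(weights) -> float:
    """Count parameters in millions, rounded to 2 decimal places.

    `weights` is assumed to be an iterable of numpy.ndarray objects
    (hypothetical helper, not a MindCV API).
    """
    total = sum(int(np.asarray(w).size) for w in weights)
    return round(total / 1e6, 2)

# Example: two dummy weight tensors -> 0.01 M parameters
print(params_in_millions([np.zeros((100, 100)), np.zeros((64,))]))
```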
### Model Checkpoint Format

The checkpoint (i.e., model weight) file name should follow this format: **{model_name}_{specification}-{sha256sum}.ckpt**, e.g., `poolformer_s12-5be5c4e4.ckpt`.

You can run the following command and take the first 8 characters of the computed hash as the sha256sum value in the checkpoint name.

```shell
sha256sum your_model.ckpt
```
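If `sha256sum` is unavailable, the same 8-character value can be obtained with the Python standard library. This is a small equivalent sketch; the file name is a placeholder.

```python
import hashlib

def short_sha256(path: str, n: int = 8) -> str:
    """Return the first n hex characters of the file's SHA-256 digest."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):  # read in 1 MiB chunks
            h.update(chunk)
    return h.hexdigest()[:n]

print(short_sha256("your_model.ckpt"))  # e.g. '5be5c4e4' -> poolformer_s12-5be5c4e4.ckpt
```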
#### Training Script Format

For consistency, it is recommended to provide distributed training commands based on `mpirun -n {num_devices} python train.py`, instead of using a shell script such as `distributed_train.sh`.

```shell
# standalone training on a GPU or Ascend device
python train.py --config configs/densenet/densenet_121_gpu.yaml --data_dir /path/to/dataset --distribute False

# distributed training on GPU or Ascend devices
mpirun -n 8 python train.py --config configs/densenet/densenet_121_ascend.yaml --data_dir /path/to/imagenet
```

> If the script is executed by the root user, the `--allow-run-as-root` parameter must be added to `mpirun`.

#### URL and Hyperlink Format

Please use an **absolute path** in the hyperlink or URL when linking to the target resource in the readme file and table.

@@ -0,0 +1,91 @@
# BigTransfer

> [Big Transfer (BiT): General Visual Representation Learning](https://arxiv.org/abs/1912.11370)

## Introduction

Transfer of pre-trained representations improves sample efficiency and simplifies hyperparameter tuning when training deep neural networks for vision. Big Transfer (BiT) achieves strong performance on more than 20 datasets by combining a few carefully selected components with a simple heuristic for transfer. The components distilled by BiT for training models that transfer well are: 1) Big datasets: as the size of the dataset increases, the optimal performance of the BiT model also increases. 2) Big architectures: in order to make full use of large datasets, a large enough architecture is required. 3) Long pre-training time: pre-training on a larger dataset requires more training epochs and training time. 4) GroupNorm and Weight Standardisation: BiT uses GroupNorm combined with Weight Standardisation instead of BatchNorm, since BatchNorm performs worse when the number of images on each accelerator is too low. 5) With BiT fine-tuning, good performance can be achieved even if there are only a few examples of each class on natural images.[[1, 2](#References)]
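To illustrate point 4, Weight Standardisation re-normalizes every convolution filter over its own fan-in before the convolution is applied, and is typically paired with GroupNorm on the activations. The NumPy sketch below shows only the weight transformation; it is a simplified illustration under these assumptions, not the MindCV implementation.

```python
import numpy as np

def standardize_weight(w: np.ndarray, eps: float = 1e-5) -> np.ndarray:
    """Weight Standardisation for a conv kernel of shape (out_ch, in_ch, kh, kw).

    Each output filter is normalized to zero mean and unit variance over its
    (in_ch, kh, kw) fan-in; GroupNorm is then applied to the activations
    instead of BatchNorm.
    """
    mean = w.mean(axis=(1, 2, 3), keepdims=True)
    std = w.std(axis=(1, 2, 3), keepdims=True)
    return (w - mean) / (std + eps)

w = np.random.randn(64, 3, 3, 3).astype(np.float32)
w_hat = standardize_weight(w)
print(w_hat.mean(axis=(1, 2, 3))[:3])  # approximately 0 for every filter
```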
## Results

Our reproduced model performance on ImageNet-1K is reported as follows.

<div align="center">

| Model          | Context  | Top-1 (%) | Top-5 (%) | Params (M) | Recipe | Download |
|----------------|----------|-----------|-----------|------------|--------|----------|
| bit_resnet50   | D910x8-G | 76.81     | 93.17     | 25.55      | [yaml](https://github.com/mindspore-lab/mindcv/blob/main/configs/bit/bit_resnet50_ascend.yaml) | [weights](https://download.mindspore.cn/toolkits/mindcv/bit/BiT_resnet50-1e4795a4.ckpt) |
| bit_resnet50x3 | D910x8-G | 80.63     | 95.12     | 217.31     | [yaml](https://github.com/mindspore-lab/mindcv/blob/main/configs/bit/bit_resnet50x3_ascend.yaml) | [weights](https://download.mindspore.cn/toolkits/mindcv/bit/BiT_resnet50x3-a960f91f.ckpt) |
| bit_resnet101  | D910x8-G | 77.93     | 93.75     | 44.54      | [yaml](https://github.com/mindspore-lab/mindcv/blob/main/configs/bit/bit_resnet101_ascend.yaml) | [weights](https://download.mindspore.cn/toolkits/mindcv/bit/BiT_resnet101-2efa9106.ckpt) |

</div>

#### Notes

- Context: Training context denoted as {device}x{pieces}-{MS mode}, where mindspore mode can be G - graph mode or F - pynative mode with ms function. For example, D910x8-G is for training on 8 pieces of Ascend 910 NPU using graph mode.
- Top-1 and Top-5: Accuracy reported on the validation set of ImageNet-1K.

## Quick Start

### Preparation

#### Installation

Please refer to the [installation instruction](https://github.com/mindspore-lab/mindcv#installation) in MindCV.

#### Dataset Preparation

Please download the [ImageNet-1K](https://www.image-net.org/challenges/LSVRC/2012/index.php) dataset for model training and validation.

### Training

* Distributed Training

It is easy to reproduce the reported results with the pre-defined training recipe. For distributed training on multiple Ascend 910 devices, please run

```shell
# distributed training on multiple GPU/Ascend devices
mpirun -n 8 python train.py --config configs/bit/bit_resnet50_ascend.yaml --data_dir /path/to/imagenet
```

> If the script is executed by the root user, the `--allow-run-as-root` parameter must be added to `mpirun`.

Similarly, you can train the model on multiple GPU devices with the above `mpirun` command.

For detailed illustration of all hyper-parameters, please refer to [config.py](https://github.com/mindspore-lab/mindcv/blob/main/config.py).

**Note:** As the global batch size (batch_size x num_devices) is an important hyper-parameter, it is recommended to keep the global batch size unchanged for reproduction, or to adjust the learning rate linearly to the new global batch size, as sketched below.
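A minimal sketch of that linear scaling rule in plain Python; the numbers in the example are placeholders taken from the bit_resnet101 recipe in this commit (lr 0.06, per-device batch size 16), assuming 8 devices as in the reported context.

```python
def scale_lr(base_lr: float, base_global_batch: int, new_global_batch: int) -> float:
    """Linear learning-rate scaling for a new global batch size."""
    return base_lr * new_global_batch / base_global_batch

# Recipe: lr=0.06 at global batch 16 x 8 = 128; training on 4 devices instead:
print(scale_lr(0.06, 16 * 8, 16 * 4))  # -> 0.03
```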
* Standalone Training

If you want to train or finetune the model on a smaller dataset without distributed training, please run:

```shell
# standalone training on a CPU/GPU/Ascend device
python train.py --config configs/bit/bit_resnet50_ascend.yaml --data_dir /path/to/dataset --distribute False
```

### Validation

To validate the accuracy of the trained model, you can use `validate.py` and parse the checkpoint path with `--ckpt_path`.

```shell
python validate.py -c configs/bit/bit_resnet50_ascend.yaml --data_dir /path/to/imagenet --ckpt_path /path/to/ckpt
```

### Deployment

Please refer to the [deployment tutorial](https://github.com/mindspore-lab/mindcv/blob/main/tutorials/deployment.md) in MindCV.

## References

<!--- Guideline: Citation format should follow GB/T 7714. -->
[1] Kolesnikov A, Beyer L, Zhai X, et al. Big transfer (bit): General visual representation learning[C]//European conference on computer vision. Springer, Cham, 2020: 491-507.

[2] BigTransfer (BiT): State-of-the-art transfer learning for computer vision, https://blog.tensorflow.org/2020/05/bigtransfer-bit-state-of-art-transfer-learning-computer-vision.html

@@ -0,0 +1,47 @@
# system
mode: 0
distribute: True
num_parallel_workers: 8
val_while_train: True

# dataset
dataset: 'imagenet'
data_dir: '/path/to/imagenet'
shuffle: True
dataset_download: False
batch_size: 16
drop_remainder: True

# augmentation
image_resize: 224
hflip: 0.5
crop_pct: 0.875

# model
model: 'BiTresnet101'
num_classes: 1000
pretrained: False
ckpt_path: ''
keep_checkpoint_max: 10
ckpt_save_dir: './ckpt'
epoch_size: 90
dataset_sink_mode: True
amp_level: 'O0'

# loss
loss: 'CE'
label_smoothing: 0.1

# lr scheduler
scheduler: 'multi_step_decay'
lr: 0.06
decay_rate: 0.5
multi_step_decay_milestones: [30, 40, 50, 60, 70, 80, 85]

# optimizer
opt: 'sgd'
filter_bias_and_bn: False
momentum: 0.9
weight_decay: 0.0001
loss_scale: 1024

@@ -0,0 +1,46 @@
# system
mode: 0
distribute: True
num_parallel_workers: 8
val_while_train: True

# dataset
dataset: 'imagenet'
data_dir: '/path/to/imagenet'
shuffle: True
dataset_download: False
batch_size: 32
drop_remainder: True

# augmentation
image_resize: 224
hflip: 0.5
crop_pct: 0.875

# model
model: 'BiTresnet50'
num_classes: 1000
pretrained: False
ckpt_path: ''
keep_checkpoint_max: 10
ckpt_save_dir: './ckpt'
epoch_size: 90
dataset_sink_mode: True
amp_level: 'O0'

# loss
loss: 'CE'
label_smoothing: 0.1

# lr scheduler
scheduler: 'multi_step_decay'
lr: 0.06
decay_rate: 0.5
multi_step_decay_milestones: [30, 40, 50, 60, 70, 80, 85]

# optimizer
opt: 'sgd'
filter_bias_and_bn: False
momentum: 0.9
weight_decay: 0.0001
loss_scale: 1024

@@ -0,0 +1,49 @@
# system
mode: 0
distribute: True
num_parallel_workers: 8
val_while_train: True

# dataset
dataset: 'imagenet'
data_dir: '/path/to/imagenet'
shuffle: True
dataset_download: False
batch_size: 32
drop_remainder: True

# augmentation
image_resize: 224
hflip: 0.5
mixup: 0.2
crop_pct: 0.875
auto_augment: "randaug-m7-mstd0.5"

# model
model: 'BiTresnet50x3'
num_classes: 1000
pretrained: False
ckpt_path: ''
keep_checkpoint_max: 30
ckpt_save_dir: './ckpt'
epoch_size: 90
dataset_sink_mode: True
amp_level: 'O0'

# loss
loss: 'CE'
label_smoothing: 0.1

# lr scheduler config
warmup_epochs: 1
scheduler: 'multi_step_decay'
lr: 0.09
decay_rate: 0.4
multi_step_decay_milestones: [30, 40, 50, 60, 70, 80, 85]

# optimizer
opt: 'sgd'
filter_bias_and_bn: False
momentum: 0.9
weight_decay: 0.0001
loss_scale: 1024

@@ -0,0 +1,80 @@
# CoaT

> [Co-Scale Conv-Attentional Image Transformers](https://arxiv.org/abs/2104.06399v2)

## Introduction

Co-Scale Conv-Attentional Image Transformer (CoaT) is a Transformer-based image classifier equipped with co-scale and conv-attentional mechanisms. First, the co-scale mechanism maintains the integrity of Transformers' encoder branches at individual scales, while allowing representations learned at different scales to effectively communicate with each other. Second, the conv-attentional mechanism is designed by realizing a relative position embedding formulation in the factorized attention module with an efficient convolution-like implementation. CoaT empowers image Transformers with enriched multi-scale and contextual modeling capabilities.
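The factorized attention mentioned above can be summarized, ignoring CoaT's convolutional relative position term, as Q (softmax(K)^T V) / sqrt(d): the softmax is applied over the tokens of K, so only a dim-by-dim context matrix is formed instead of a token-by-token attention map. A hedged NumPy sketch of this simplification (not the MindCV implementation):

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def factorized_attention(q, k, v):
    """Simplified factorized attention; q, k, v have shape (tokens, dim).

    softmax is taken over the token axis of k, so a (dim x dim) matrix
    k_softmax.T @ v is formed instead of a (tokens x tokens) attention map.
    CoaT's convolutional relative position embedding is omitted here.
    """
    d = q.shape[-1]
    k_softmax = softmax(k, axis=0)      # normalize over tokens
    context = k_softmax.T @ v           # (dim, dim)
    return (q @ context) / np.sqrt(d)   # (tokens, dim)

q, k, v = np.random.randn(3, 196, 64)
print(factorized_attention(q, k, v).shape)  # (196, 64)
```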
## Results

Our reproduced model performance on ImageNet-1K is reported as follows.

<div align="center">

| Model          | Context  | Top-1 (%) | Top-5 (%) | Params (M) | Recipe | Weight |
|----------------|----------|-----------|-----------|------------|--------|--------|
| coat_lite_tiny | D910x8-G | 77.35     | 93.43     | 5.72       | [yaml](https://github.com/mindspore-lab/mindcv/blob/main/configs/coat/coat_lite_tiny_ascend.yaml) | [weights](https://download.mindspore.cn/toolkits/mindcv/coat/coat_lite_tiny-fa7bf894.ckpt) |
| coat_lite_mini | D910x8-G | 78.51     | 93.84     | 11.01      | [yaml](https://github.com/mindspore-lab/mindcv/blob/main/configs/coat/coat_lite_mini_ascend.yaml) | [weights](https://download.mindspore.cn/toolkits/mindcv/coat/coat_lite_mini-55a52f05.ckpt) |
| coat_tiny      | D910x8-G | 79.67     | 94.88     | 5.50       | [yaml](https://github.com/mindspore-lab/mindcv/blob/main/configs/coat/coat_tiny_ascend.yaml) | [weights](https://download.mindspore.cn/toolkits/mindcv/coat/coat_tiny-071cb792.ckpt) |

</div>

#### Notes

- Context: Training context denoted as {device}x{pieces}-{MS mode}, where mindspore mode can be G - graph mode or F - pynative mode with ms function. For example, D910x8-G is for training on 8 pieces of Ascend 910 NPU using graph mode.
- Top-1 and Top-5: Accuracy reported on the validation set of ImageNet-1K.

## Quick Start

### Preparation

#### Installation

Please refer to the [installation instruction](https://github.com/mindspore-lab/mindcv#installation) in MindCV.

#### Dataset Preparation

Please download the [ImageNet-1K](https://www.image-net.org/challenges/LSVRC/2012/index.php) dataset for model training and validation.

### Training

* Distributed Training

It is easy to reproduce the reported results with the pre-defined training recipe. For distributed training on multiple Ascend 910 devices, please run

```shell
# distributed training on multiple GPU/Ascend devices
mpirun -n 8 python train.py --config configs/coat/coat_lite_tiny_ascend.yaml --data_dir /path/to/imagenet
```

> If the script is executed by the root user, the `--allow-run-as-root` parameter must be added to `mpirun`.

Similarly, you can train the model on multiple GPU devices with the above `mpirun` command.

For detailed illustration of all hyper-parameters, please refer to [config.py](https://github.com/mindspore-lab/mindcv/blob/main/config.py).

**Note:** As the global batch size (batch_size x num_devices) is an important hyper-parameter, it is recommended to keep the global batch size unchanged for reproduction or adjust the learning rate linearly to a new global batch size.

* Standalone Training

If you want to train or finetune the model on a smaller dataset without distributed training, please run:

```shell
# standalone training on a CPU/GPU/Ascend device
python train.py --config configs/coat/coat_lite_tiny_ascend.yaml --data_dir /path/to/dataset --distribute False
```

### Validation

To validate the accuracy of the trained model, you can use `validate.py` and parse the checkpoint path with `--ckpt_path`.

```shell
python validate.py -c configs/coat/coat_lite_tiny_ascend.yaml --data_dir /path/to/imagenet --ckpt_path /path/to/ckpt
```

### Deployment

To deploy online inference services with the trained model efficiently, please refer to the [deployment tutorial](https://github.com/mindspore-lab/mindcv/blob/main/tutorials/deployment.md).

## References
[1] Xu W, Xu Y, Chang T, et al. Co-scale conv-attentional image transformers[C]//Proceedings of the IEEE/CVF International Conference on Computer Vision. 2021.

@@ -0,0 +1,59 @@
# system
mode: 0
distribute: True
num_parallel_workers: 8
val_while_train: True

# dataset
dataset: 'imagenet'
data_dir: '/path/to/imagenet'
shuffle: True
dataset_download: False
batch_size: 64
drop_remainder: True

# augmentation
image_resize: 224
scale: [0.08, 1.0]
ratio: [0.75, 1.333]
hflip: 0.5
interpolation: 'bicubic'
auto_augment: 'randaug-m7-mstd0.5-inc1'
re_prob: 0.1
mixup: 0.2
cutmix: 1.0
cutmix_prob: 1.0
crop_pct: 0.9
color_jitter: 0.4

# model
model: 'coat_lite_mini'
num_classes: 1000
pretrained: False
ckpt_path: ''
ckpt_save_policy: 'top_k'
keep_checkpoint_max: 10
ckpt_save_dir: './ckpt/'
dataset_sink_mode: True
amp_level: 'O2'

# loss
loss: 'CE'
label_smoothing: 0.1

# lr scheduler
scheduler: 'cosine_decay'
lr: 0.0017
min_lr: 0.000005
warmup_epochs: 20
decay_epochs: 280
epoch_size: 1200
num_cycles: 4
cycle_decay: 1.0

# optimizer
opt: 'adamw'
weight_decay: 0.025
filter_bias_and_bn: True
loss_scale: 1024
use_nesterov: False

@@ -0,0 +1,59 @@
# system
mode: 0
distribute: True
num_parallel_workers: 8
val_while_train: True

# dataset
dataset: 'imagenet'
data_dir: '/path/to/imagenet/'
shuffle: True
dataset_download: False
batch_size: 64
drop_remainder: True

# augmentation
image_resize: 224
scale: [0.08, 1.0]
ratio: [0.75, 1.333]
hflip: 0.5
interpolation: 'bicubic'
auto_augment: 'randaug-m7-mstd0.5-inc1'
re_prob: 0.1
mixup: 0.2
cutmix: 1.0
cutmix_prob: 1.0
crop_pct: 0.9
color_jitter: 0.4

# model
model: 'coat_lite_tiny'
num_classes: 1000
pretrained: False
ckpt_path: ''
ckpt_save_policy: 'top_k'
keep_checkpoint_max: 10
ckpt_save_dir: './ckpt/'
dataset_sink_mode: True
amp_level: 'O2'

# loss
loss: 'CE'
label_smoothing: 0.1

# lr scheduler
scheduler: 'cosine_decay'
lr: 0.0008
min_lr: 0.000005
warmup_epochs: 20
decay_epochs: 280
epoch_size: 900
num_cycles: 3
cycle_decay: 1.0

# optimizer
opt: 'adamw'
weight_decay: 0.025
filter_bias_and_bn: True
loss_scale: 1024
use_nesterov: False

@@ -0,0 +1,63 @@
# system
mode: 0
distribute: True
num_parallel_workers: 8
val_while_train: True
val_interval: 1

# dataset
dataset: 'imagenet'
data_dir: '/path/to/imagenet/'
shuffle: True
dataset_download: False
batch_size: 32
drop_remainder: True

# augmentation
image_resize: 224
scale: [0.08, 1.0]
ratio: [0.75, 1.333]
hflip: 0.5
interpolation: 'bicubic'
auto_augment: 'randaug-m7-mstd0.5-inc1'
re_prob: 0.1
mixup: 0.2
cutmix: 1.0
cutmix_prob: 1.0
crop_pct: 0.9
color_jitter: 0.4

# model config
model: 'coat_tiny'
drop_rate: 0.0
drop_path_rate: 0.0
num_classes: 1000
pretrained: False
ckpt_path: ''
keep_checkpoint_max: 10
ckpt_save_policy: 'top_k'
ckpt_save_dir: './ckpt/'
dataset_sink_mode: True
amp_level: 'O2'
ema: True
ema_decay: 0.9995

# loss config
loss: 'CE'
label_smoothing: 0.1

# lr scheduler config
scheduler: 'warmup_cosine_decay'
lr: 0.00025
min_lr: 0.000001
warmup_epochs: 20
decay_epochs: 280
epoch_size: 300

# optimizer config
opt: 'lion'
weight_decay: 0.15
filter_bias_and_bn: True
loss_scale: 4096
use_nesterov: False
loss_scale_type: dynamic

@@ -0,0 +1,97 @@
# ConViT

> [ConViT: Improving Vision Transformers with Soft Convolutional Inductive Biases](https://arxiv.org/abs/2103.10697)

## Introduction

ConViT combines the strengths of convolutional architectures and Vision Transformers (ViTs). ConViT introduces gated positional self-attention (GPSA), a form of positional self-attention that can be equipped with a “soft” convolutional inductive bias. ConViT initializes the GPSA layers to mimic the locality of convolutional layers, then gives each attention head the freedom to escape locality by adjusting a gating parameter regulating the attention paid to position versus content information. ConViT outperforms DeiT (Touvron et al., 2020) on ImageNet, while offering a much improved sample efficiency.[[1](#references)]

<p align="center">
  <img src="https://user-images.githubusercontent.com/52945530/210045403-721c9697-fe7e-429a-bd38-ba244fc8bd1b.png" width=400 />
</p>
<p align="center">
  <em>Figure 1. Architecture of ConViT [<a href="#references">1</a>] </em>
</p>
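The gating described above amounts to a per-head convex combination of content-based and position-based attention. The NumPy sketch below illustrates that combination for a single head; the positional score term and the exact gating convention of the paper are simplified, so treat it as an assumption-labelled illustration rather than the ConViT implementation.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def gpsa_attention(q, k, pos_scores, gate_lambda):
    """Gated positional self-attention weights for one head.

    q, k: (tokens, dim); pos_scores: (tokens, tokens) positional scores;
    gate_lambda: scalar learnable gate. When sigmoid(lambda) is near 1 the
    positional term dominates (local, convolution-like behaviour); lowering
    lambda lets content attention take over. This gating convention is an
    illustrative assumption.
    """
    d = q.shape[-1]
    content = softmax(q @ k.T / np.sqrt(d), axis=-1)
    positional = softmax(pos_scores, axis=-1)
    gate = 1.0 / (1.0 + np.exp(-gate_lambda))   # sigmoid
    return (1.0 - gate) * content + gate * positional

attn = gpsa_attention(np.random.randn(196, 64), np.random.randn(196, 64),
                      np.random.randn(196, 196), gate_lambda=1.0)
print(attn.shape, attn.sum(axis=-1)[:3])  # rows still sum to 1
```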
## Results

Our reproduced model performance on ImageNet-1K is reported as follows.

<div align="center">

| Model             | Context  | Top-1 (%) | Top-5 (%) | Params (M) | Recipe | Download |
|-------------------|----------|-----------|-----------|------------|--------|----------|
| convit_tiny       | D910x8-G | 73.66     | 91.72     | 5.71       | [yaml](https://github.com/mindspore-lab/mindcv/blob/main/configs/convit/convit_tiny_ascend.yaml) | [weights](https://download.mindspore.cn/toolkits/mindcv/convit/convit_tiny-e31023f2.ckpt) |
| convit_tiny_plus  | D910x8-G | 77.00     | 93.60     | 9.97       | [yaml](https://github.com/mindspore-lab/mindcv/blob/main/configs/convit/convit_tiny_plus_ascend.yaml) | [weights](https://download.mindspore.cn/toolkits/mindcv/convit/convit_tiny_plus-e9d7fb92.ckpt) |
| convit_small      | D910x8-G | 81.63     | 95.59     | 27.78      | [yaml](https://github.com/mindspore-lab/mindcv/blob/main/configs/convit/convit_small_ascend.yaml) | [weights](https://download.mindspore.cn/toolkits/mindcv/convit/convit_small-ba858604.ckpt) |
| convit_small_plus | D910x8-G | 81.80     | 95.42     | 48.98      | [yaml](https://github.com/mindspore-lab/mindcv/blob/main/configs/convit/convit_small_plus_ascend.yaml) | [weights](https://download.mindspore.cn/toolkits/mindcv/convit/convit_small_plus-2352b9f7.ckpt) |
| convit_base       | D910x8-G | 82.10     | 95.52     | 86.54      | [yaml](https://github.com/mindspore-lab/mindcv/blob/main/configs/convit/convit_base_ascend.yaml) | [weights](https://download.mindspore.cn/toolkits/mindcv/convit/convit_base-c61b808c.ckpt) |
| convit_base_plus  | D910x8-G | 81.96     | 95.04     | 153.13     | [yaml](https://github.com/mindspore-lab/mindcv/blob/main/configs/convit/convit_base_plus_ascend.yaml) | [weights](https://download.mindspore.cn/toolkits/mindcv/convit/convit_base_plus-5c61c9ce.ckpt) |

</div>

#### Notes

- Context: Training context denoted as {device}x{pieces}-{MS mode}, where mindspore mode can be G - graph mode or F - pynative mode with ms function. For example, D910x8-G is for training on 8 pieces of Ascend 910 NPU using graph mode.
- Top-1 and Top-5: Accuracy reported on the validation set of ImageNet-1K.

## Quick Start

### Preparation

#### Installation

Please refer to the [installation instruction](https://github.com/mindspore-ecosystem/mindcv#installation) in MindCV.

#### Dataset Preparation

Please download the [ImageNet-1K](https://www.image-net.org/challenges/LSVRC/2012/index.php) dataset for model training and validation.

### Training

* Distributed Training

It is easy to reproduce the reported results with the pre-defined training recipe. For distributed training on multiple Ascend 910 devices, please run

```shell
# distributed training on multiple GPU/Ascend devices
mpirun -n 8 python train.py --config configs/convit/convit_tiny_ascend.yaml --data_dir /path/to/imagenet
```

> If the script is executed by the root user, the `--allow-run-as-root` parameter must be added to `mpirun`.

Similarly, you can train the model on multiple GPU devices with the above `mpirun` command.

For detailed illustration of all hyper-parameters, please refer to [config.py](https://github.com/mindspore-lab/mindcv/blob/main/config.py).

**Note:** As the global batch size (batch_size x num_devices) is an important hyper-parameter, it is recommended to keep the global batch size unchanged for reproduction or adjust the learning rate linearly to a new global batch size.

* Standalone Training

If you want to train or finetune the model on a smaller dataset without distributed training, please run:

```shell
# standalone training on a CPU/GPU/Ascend device
python train.py --config configs/convit/convit_tiny_ascend.yaml --data_dir /path/to/dataset --distribute False
```

### Validation

To validate the accuracy of the trained model, you can use `validate.py` and parse the checkpoint path with `--ckpt_path`.

```shell
python validate.py -c configs/convit/convit_tiny_ascend.yaml --data_dir /path/to/imagenet --ckpt_path /path/to/ckpt
```

### Deployment

Please refer to the [deployment tutorial](https://github.com/mindspore-lab/mindcv/blob/main/tutorials/deployment.md) in MindCV.

## References

<!--- Guideline: Citation format should follow GB/T 7714. -->
[1] d’Ascoli S, Touvron H, Leavitt M L, et al. Convit: Improving vision transformers with soft convolutional inductive biases[C]//International Conference on Machine Learning. PMLR, 2021: 2286-2296.

@@ -0,0 +1,55 @@
# system
mode: 0
distribute: True
num_parallel_workers: 8

# dataset
dataset: 'imagenet'
data_dir: '/path/to/imagenet'
shuffle: True
dataset_download: False
batch_size: 128
drop_remainder: True

# augmentation
image_resize: 224
scale: [0.08, 1.0]
ratio: [0.75, 1.333]
hflip: 0.5
interpolation: 'bicubic'
auto_augment: 'autoaug-mstd0.5'
re_prob: 0.1
mixup: 0.2
cutmix: 1.0
cutmix_prob: 1.0
crop_pct: 0.93
color_jitter: 0.4

# model
model: 'convit_base'
num_classes: 1000
pretrained: False
ckpt_path: ''
keep_checkpoint_max: 10
ckpt_save_dir: './ckpt'
epoch_size: 300
dataset_sink_mode: True
amp_level: 'O2'

# loss
loss: 'CE'
label_smoothing: 0.1

# lr scheduler
scheduler: 'warmup_cosine_decay'
lr: 0.0007
min_lr: 0.000001
warmup_epochs: 40
decay_epochs: 260

# optimizer
opt: 'adamw'
weight_decay: 0.1
loss_scale: 1024
filter_bias_and_bn: True
use_nesterov: False

@@ -0,0 +1,55 @@
# system
mode: 0
distribute: True
num_parallel_workers: 8

# dataset
dataset: 'imagenet'
data_dir: '/path/to/imagenet'
shuffle: True
dataset_download: False
batch_size: 128
drop_remainder: True

# augmentation
image_resize: 224
scale: [0.08, 1.0]
ratio: [0.75, 1.333]
hflip: 0.5
interpolation: 'bicubic'
auto_augment: 'autoaug-mstd0.5'
re_prob: 0.1
mixup: 0.2
cutmix: 1.0
cutmix_prob: 1.0
crop_pct: 0.925
color_jitter: 0.4

# model
model: 'convit_base_plus'
num_classes: 1000
pretrained: False
ckpt_path: ''
keep_checkpoint_max: 10
ckpt_save_dir: './ckpt'
epoch_size: 300
dataset_sink_mode: True
amp_level: 'O2'

# loss
loss: 'CE'
label_smoothing: 0.1

# lr scheduler
scheduler: 'warmup_cosine_decay'
lr: 0.0007
min_lr: 0.000001
warmup_epochs: 40
decay_epochs: 260

# optimizer
opt: 'adamw'
weight_decay: 0.1
loss_scale: 1024
filter_bias_and_bn: True
use_nesterov: False

@@ -0,0 +1,55 @@
# system
mode: 0
distribute: True
num_parallel_workers: 8

# dataset
dataset: 'imagenet'
data_dir: '/path/to/imagenet'
shuffle: True
dataset_download: False
batch_size: 192
drop_remainder: True

# augmentation
image_resize: 224
scale: [0.08, 1.0]
ratio: [0.75, 1.333]
hflip: 0.5
interpolation: 'bicubic'
auto_augment: 'autoaug-mstd0.5'
re_prob: 0.1
mixup: 0.2
cutmix: 1.0
cutmix_prob: 1.0
crop_pct: 0.915
color_jitter: 0.4

# model
model: 'convit_small'
num_classes: 1000
pretrained: False
ckpt_path: ''
keep_checkpoint_max: 10
ckpt_save_dir: './ckpt'
epoch_size: 300
dataset_sink_mode: True
amp_level: 'O2'

# loss
loss: 'CE'
label_smoothing: 0.1

# lr scheduler
scheduler: 'warmup_cosine_decay'
lr: 0.0007
min_lr: 0.000001
warmup_epochs: 40
decay_epochs: 260

# optimizer
opt: 'adamw'
weight_decay: 0.05
loss_scale: 1024
filter_bias_and_bn: True
use_nesterov: False

@@ -0,0 +1,55 @@
# system
mode: 0
distribute: True
num_parallel_workers: 8

# dataset
dataset: 'imagenet'
data_dir: '/path/to/imagenet'
shuffle: True
dataset_download: False
batch_size: 192
drop_remainder: True

# augmentation
image_resize: 224
scale: [0.08, 1.0]
ratio: [0.75, 1.333]
hflip: 0.5
interpolation: 'bicubic'
auto_augment: 'autoaug-mstd0.5'
re_prob: 0.1
mixup: 0.2
cutmix: 1.0
cutmix_prob: 1.0
crop_pct: 0.93
color_jitter: 0.4

# model
model: 'convit_small_plus'
num_classes: 1000
pretrained: False
ckpt_path: ''
keep_checkpoint_max: 10
ckpt_save_dir: './ckpt'
epoch_size: 300
dataset_sink_mode: True
amp_level: 'O2'

# loss
loss: 'CE'
label_smoothing: 0.1

# lr scheduler
scheduler: 'warmup_cosine_decay'
lr: 0.0007
min_lr: 0.000001
warmup_epochs: 40
decay_epochs: 260

# optimizer
opt: 'adamw'
weight_decay: 0.05
loss_scale: 1024
filter_bias_and_bn: True
use_nesterov: False

@@ -0,0 +1,54 @@
# system
mode: 0
distribute: True
num_parallel_workers: 8

# dataset
dataset: 'imagenet'
data_dir: '/path/to/imagenet'
shuffle: True
dataset_download: False
batch_size: 256
drop_remainder: True

# augmentation
image_resize: 224
scale: [0.08, 1.0]
ratio: [0.75, 1.333]
hflip: 0.5
interpolation: 'bicubic'
re_prob: 0.25
mixup: 0.2
cutmix: 1.0
cutmix_prob: 1.0
crop_pct: 0.875
color_jitter: 0.4

# model
model: 'convit_tiny'
num_classes: 1000
pretrained: False
ckpt_path: ''
keep_checkpoint_max: 10
ckpt_save_dir: './ckpt'
epoch_size: 300
dataset_sink_mode: True
amp_level: 'O2'

# loss
loss: 'CE'
label_smoothing: 0.1

# lr scheduler
scheduler: 'cosine_decay'
lr: 0.00072
min_lr: 0.000001
warmup_epochs: 5
decay_epochs: 295

# optimizer
opt: 'adamw'
weight_decay: 0.0001
loss_scale: 1024
filter_bias_and_bn: True
use_nesterov: False

@@ -0,0 +1,51 @@
# system
mode: 0
distribute: True
num_parallel_workers: 8

# dataset
dataset: 'imagenet'
data_dir: '/path/to/imagenet'
shuffle: True
dataset_download: False
batch_size: 128
drop_remainder: True

# augmentation
image_resize: 224
scale: [0.08, 1.0]
ratio: [0.75, 1.333]
hflip: 0.5
interpolation: 'bicubic'
re_prob: 0.25
mixup: 0.8
crop_pct: 0.875
color_jitter: [0.4, 0.4, 0.4]

# model
model: 'convit_tiny'
num_classes: 1000
pretrained: False
ckpt_path: ''
keep_checkpoint_max: 10
ckpt_save_dir: './ckpt'
epoch_size: 300
dataset_sink_mode: True
amp_level: 'O2'

# loss
loss: 'CE'
label_smoothing: 0.1

# lr scheduler
scheduler: 'cosine_decay'
lr: 0.0005
min_lr: 0.00001
warmup_epochs: 10
decay_epochs: 200

# optimizer
opt: 'adamw'
weight_decay: 0.025
filter_bias_and_bn: True
use_nesterov: False

@@ -0,0 +1,54 @@
# system
mode: 0
distribute: True
num_parallel_workers: 8

# dataset
dataset: 'imagenet'
data_dir: '/path/to/imagenet'
shuffle: True
dataset_download: False
batch_size: 256
drop_remainder: True

# augmentation
image_resize: 224
scale: [0.08, 1.0]
ratio: [0.75, 1.333]
hflip: 0.5
interpolation: 'bicubic'
re_prob: 0.25
mixup: 0.2
cutmix: 1.0
cutmix_prob: 1.0
crop_pct: 0.87
color_jitter: 0.4

# model
model: 'convit_tiny_plus'
num_classes: 1000
pretrained: False
ckpt_path: ''
keep_checkpoint_max: 10
ckpt_save_dir: './ckpt'
epoch_size: 300
dataset_sink_mode: True
amp_level: 'O2'

# loss
loss: 'CE'
label_smoothing: 0.1

# lr scheduler
scheduler: 'cosine_decay'
lr: 0.00072
min_lr: 0.000001
warmup_epochs: 40
decay_epochs: 260

# optimizer
opt: 'adamw'
weight_decay: 0.0001
loss_scale: 1024
filter_bias_and_bn: True
use_nesterov: False

@@ -0,0 +1,91 @@
# ConvNeXt

> [A ConvNet for the 2020s](https://arxiv.org/abs/2201.03545)

## Introduction

In this work, the authors reexamine the design spaces and test the limits of what a pure ConvNet can achieve. The authors gradually "modernize" a standard ResNet toward the design of a vision Transformer, and discover several key components that contribute to the performance difference along the way. The outcome of this exploration is a family of pure ConvNet models dubbed ConvNeXt. Constructed entirely from standard ConvNet modules, ConvNeXts compete favorably with Transformers in terms of accuracy and scalability, achieving 87.8% ImageNet top-1 accuracy, while maintaining the simplicity and efficiency of standard ConvNets.[[1](#references)]

<p align="center">
  <img src="https://user-images.githubusercontent.com/53842165/223907142-3bf6acfb-080a-49f5-b021-233e003318c3.png" width=250 />
</p>
<p align="center">
  <em>Figure 1. Architecture of ConvNeXt [<a href="#references">1</a>] </em>
</p>

## Results

Our reproduced model performance on ImageNet-1K is reported as follows.

<div align="center">

| Model           | Context   | Top-1 (%) | Top-5 (%) | Params (M) | Recipe | Download |
|-----------------|-----------|-----------|-----------|------------|--------|----------|
| ConvNeXt_tiny   | D910x64-G | 81.91     | 95.79     | 28.59      | [yaml](https://github.com/mindspore-lab/mindcv/blob/main/configs/convnext/convnext_tiny_ascend.yaml) | [weights](https://download.mindspore.cn/toolkits/mindcv/convnext/convnext_tiny-ae5ff8d7.ckpt) |
| ConvNeXt_small  | D910x64-G | 83.40     | 96.36     | 50.22      | [yaml](https://github.com/mindspore-lab/mindcv/blob/main/configs/convnext/convnext_small_ascend.yaml) | [weights](https://download.mindspore.cn/toolkits/mindcv/convnext/convnext_small-e23008f3.ckpt) |
| ConvNeXt_base   | D910x64-G | 83.32     | 96.24     | 88.59      | [yaml](https://github.com/mindspore-lab/mindcv/blob/main/configs/convnext/convnext_base_ascend.yaml) | [weights](https://download.mindspore.cn/toolkits/mindcv/convnext/convnext_base-ee3544b8.ckpt) |

</div>

#### Notes

- Context: Training context denoted as {device}x{pieces}-{MS mode}, where mindspore mode can be G - graph mode or F - pynative mode with ms function. For example, D910x8-G is for training on 8 pieces of Ascend 910 NPU using graph mode.
- Top-1 and Top-5: Accuracy reported on the validation set of ImageNet-1K.

## Quick Start

### Preparation

#### Installation

Please refer to the [installation instruction](https://github.com/mindspore-ecosystem/mindcv#installation) in MindCV.

#### Dataset Preparation

Please download the [ImageNet-1K](https://www.image-net.org/challenges/LSVRC/2012/index.php) dataset for model training and validation.

### Training

* Distributed Training

It is easy to reproduce the reported results with the pre-defined training recipe. For distributed training on multiple Ascend 910 devices, please run

```shell
# distributed training on multiple GPU/Ascend devices
mpirun -n 8 python train.py --config configs/convnext/convnext_tiny_ascend.yaml --data_dir /path/to/imagenet
```

> If the script is executed by the root user, the `--allow-run-as-root` parameter must be added to `mpirun`.

Similarly, you can train the model on multiple GPU devices with the above `mpirun` command.

For detailed illustration of all hyper-parameters, please refer to [config.py](https://github.com/mindspore-lab/mindcv/blob/main/config.py).

**Note:** As the global batch size (batch_size x num_devices) is an important hyper-parameter, it is recommended to keep the global batch size unchanged for reproduction or adjust the learning rate linearly to a new global batch size.

* Standalone Training

If you want to train or finetune the model on a smaller dataset without distributed training, please run:

```shell
# standalone training on a CPU/GPU/Ascend device
python train.py --config configs/convnext/convnext_tiny_ascend.yaml --data_dir /path/to/dataset --distribute False
```

### Validation

To validate the accuracy of the trained model, you can use `validate.py` and parse the checkpoint path with `--ckpt_path`.

```shell
python validate.py -c configs/convnext/convnext_tiny_ascend.yaml --data_dir /path/to/imagenet --ckpt_path /path/to/ckpt
```

### Deployment

Please refer to the [deployment tutorial](https://github.com/mindspore-lab/mindcv/blob/main/tutorials/deployment.md) in MindCV.

## References

[1] Liu Z, Mao H, Wu C Y, et al. A convnet for the 2020s[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2022: 11976-11986.

@@ -0,0 +1,58 @@
# system
mode: 0
distribute: True
num_parallel_workers: 8
val_while_train: True

# dataset
dataset: 'imagenet'
data_dir: '/path/to/imagenet'
shuffle: True
dataset_download: False
batch_size: 16
drop_remainder: True

# augmentation
image_resize: 224
scale: [0.08, 1.0]
ratio: [0.75, 1.333]
re_value: 'random'
hflip: 0.5
interpolation: 'bicubic'
auto_augment: 'randaug-m9-mstd0.5-inc1'
re_prob: 0.25
crop_pct: 0.95
mixup: 0.8
cutmix: 1.0

# model
model: 'convnext_base'
num_classes: 1000
pretrained: False
ckpt_path: ''
keep_checkpoint_max: 10
ckpt_save_dir: './ckpt'
epoch_size: 450
drop_path_rate: 0.5
dataset_sink_mode: True
amp_level: 'O2'

# loss
loss: 'ce'
label_smoothing: 0.1

# lr scheduler
scheduler: 'cosine_decay'
lr: 0.002
min_lr: 0.0000003
decay_epochs: 430
warmup_factor: 0.0000175
warmup_epochs: 20

# optimizer
opt: 'adamw'
filter_bias_and_bn: True
momentum: 0.9
weight_decay: 0.05
loss_scale_type: 'auto'
use_nesterov: False

@@ -0,0 +1,58 @@
# system
mode: 0
distribute: True
num_parallel_workers: 8
val_while_train: True

# dataset
dataset: 'imagenet'
data_dir: '/path/to/imagenet'
shuffle: True
dataset_download: False
batch_size: 16
drop_remainder: True

# augmentation
image_resize: 224
scale: [0.08, 1.0]
ratio: [0.75, 1.333]
re_value: 'random'
hflip: 0.5
interpolation: 'bicubic'
auto_augment: 'randaug-m9-mstd0.5-inc1'
re_prob: 0.25
crop_pct: 0.95
mixup: 0.8
cutmix: 1.0

# model
model: 'convnext_small'
num_classes: 1000
pretrained: False
ckpt_path: ''
keep_checkpoint_max: 10
ckpt_save_dir: './ckpt'
epoch_size: 450
drop_path_rate: 0.4
dataset_sink_mode: True
amp_level: 'O2'

# loss
loss: 'ce'
label_smoothing: 0.1

# lr scheduler
scheduler: 'cosine_decay'
lr: 0.002
min_lr: 0.0000003
decay_epochs: 430
warmup_factor: 0.0000175
warmup_epochs: 20

# optimizer
opt: 'adamw'
filter_bias_and_bn: True
momentum: 0.9
weight_decay: 0.05
loss_scale_type: 'auto'
use_nesterov: False

@@ -0,0 +1,59 @@
# system
mode: 0
distribute: True
num_parallel_workers: 16
val_while_train: True

# dataset
dataset: 'imagenet'
data_dir: '/path/to/imagenet'
shuffle: True
dataset_download: False
batch_size: 16
drop_remainder: True

# augmentation
image_resize: 224
scale: [0.08, 1.0]
ratio: [0.75, 1.333]
re_value: 'random'
hflip: 0.5
interpolation: 'bicubic'
auto_augment: 'randaug-m9-mstd0.5-inc1'
re_prob: 0.25
crop_pct: 0.95
mixup: 0.8
cutmix: 1.0

# model
model: 'convnext_tiny'
num_classes: 1000
pretrained: False
ckpt_path: ''
keep_checkpoint_max: 10
ckpt_save_dir: './ckpt'
epoch_size: 450
drop_path_rate: 0.1
dataset_sink_mode: True
amp_level: 'O2'

# loss
loss: 'ce'
label_smoothing: 0.1

# lr scheduler
scheduler: 'cosine_decay'
lr: 0.002
min_lr: 0.0000003
decay_epochs: 430
warmup_factor: 0.0000175
warmup_epochs: 20

# optimizer
opt: 'adamw'
filter_bias_and_bn: True
momentum: 0.9
weight_decay: 0.05
loss_scale_type: 'dynamic'
drop_overflow_update: True
use_nesterov: False

@@ -0,0 +1,90 @@
# CrossViT

> [CrossViT: Cross-Attention Multi-Scale Vision Transformer for Image Classification](https://arxiv.org/abs/2103.14899)

## Introduction

CrossViT is a type of vision transformer that uses a dual-branch architecture to extract multi-scale feature representations for image classification. The architecture combines image patches (i.e. tokens in a transformer) of different sizes to produce stronger visual features for image classification. It processes small and large patch tokens with two separate branches of different computational complexities, and these tokens are fused together multiple times to complement each other.

Fusion is achieved by an efficient cross-attention module, in which each transformer branch creates a non-patch token as an agent to exchange information with the other branch by attention. This allows for linear-time generation of the attention map in fusion instead of quadratic time otherwise.[[1](#references)]

<p align="center">
  <img src="https://user-images.githubusercontent.com/52945530/223635248-5871596d-43f2-44ee-b8be-1e7927ade243.jpg" width=400 />
</p>
<p align="center">
  <em>Figure 1. Architecture of CrossViT [<a href="#references">1</a>] </em>
</p>
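In the cross-attention fusion described above, the CLS token of one branch is the only query against the patch tokens of the other branch, so the attention map is a single row rather than a full token-by-token matrix. A hedged NumPy sketch of this step (branch-to-branch projections and multi-head splitting are omitted):

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention_cls(cls_a, tokens_b):
    """Fuse branch A's CLS token with branch B's patch tokens.

    cls_a: (dim,) CLS token acting as the query (agent of branch A).
    tokens_b: (num_tokens, dim) patch tokens of branch B (keys and values).
    Linear projections between branch dimensions are omitted for brevity.
    """
    d = cls_a.shape[-1]
    scores = tokens_b @ cls_a / np.sqrt(d)   # (num_tokens,) - a single attention row
    weights = softmax(scores, axis=-1)
    return weights @ tokens_b                # updated CLS summarizing branch B

new_cls = cross_attention_cls(np.random.randn(128), np.random.randn(196, 128))
print(new_cls.shape)  # (128,)
```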
## Results
|
||||||
|
|
||||||
|
Our reproduced model performance on ImageNet-1K is reported as follows.
|
||||||
|
|
||||||
|
<div align="center">
|
||||||
|
|
||||||
|
| Model | Context | Top-1 (%) | Top-5 (%) | Params (M) | Recipe | Download |
|
||||||
|
|-------------|----------|-----------|-----------|------------|-----------------------------------------------------------------------------------------|---------------------------------------------------------------------------------------|
|
||||||
|
| crossvit_9 | D910x8-G | 73.56 | 91.79 | 8.55 | [yaml](https://github.com/mindspore-lab/mindcv/blob/main/configs/crossvit/crossvit_9_ascend.yaml) | [weights](https://download.mindspore.cn/toolkits/mindcv/crossvit/crossvit_9-e74c8e18.ckpt) |
|
||||||
|
| crossvit_15 | D910x8-G | 81.08 | 95.33 | 27.27 | [yaml](https://github.com/mindspore-lab/mindcv/blob/main/configs/crossvit/crossvit_15_ascend.yaml) | [weights](https://download.mindspore.cn/toolkits/mindcv/crossvit/crossvit_15-eaa43c02.ckpt) |
|
||||||
|
| crossvit_18 | D910x8-G | 81.93 | 95.75 | 43.27 | [yaml](https://github.com/mindspore-lab/mindcv/blob/main/configs/crossvit/crossvit_18_ascend.yaml) | [weights](https://download.mindspore.cn/toolkits/mindcv/crossvit/crossvit_18-ca0a2e43.ckpt) |
|
||||||
|
|
||||||
|
|
||||||
|
</div>
|
||||||
|
|
||||||
|
#### Notes
|
||||||
|
|
||||||
|
- Context: Training context denoted as {device}x{pieces}-{MS mode}, where mindspore mode can be G - graph mode or F - pynative mode with ms function. For example, D910x8-G is for training on 8 pieces of Ascend 910 NPU using graph mode.
|
||||||
|
- Top-1 and Top-5: Accuracy reported on the validation set of ImageNet-1K.
|
||||||
|
|
||||||
|
## Quick Start
|
||||||
|
|
||||||
|
### Preparation
|
||||||
|
|
||||||
|
#### Installation
|
||||||
|
Please refer to the [installation instruction](https://github.com/mindspore-ecosystem/mindcv#installation) in MindCV.
|
||||||
|
|
||||||
|
#### Dataset Preparation
|
||||||
|
Please download the [ImageNet-1K](https://www.image-net.org/challenges/LSVRC/2012/index.php) dataset for model training and validation.
|
||||||
|
|
||||||
|
### Training
|
||||||
|
|
||||||
|
* Distributed Training
|
||||||
|
|
||||||
|
It is easy to reproduce the reported results with the pre-defined training recipe. For distributed training on multiple Ascend 910 devices, please run
|
||||||
|
|
||||||
|
```shell
|
||||||
|
# distributed training on multiple GPU/Ascend devices
|
||||||
|
mpirun -n 8 python train.py --config configs/crossvit/crossvit_15_ascend.yaml --data_dir /path/to/imagenet
|
||||||
|
```
|
||||||
|
> If the script is executed by the root user, the `--allow-run-as-root` parameter must be added to `mpirun`.
|
||||||
|
|
||||||
|
Similarly, you can train the model on multiple GPU devices with the above `mpirun` command.
|
||||||
|
|
||||||
|
For detailed illustration of all hyper-parameters, please refer to [config.py](https://github.com/mindspore-lab/mindcv/blob/main/config.py).
|
||||||
|
|
||||||
|
**Note:** As the global batch size (batch_size x num_devices) is an important hyper-parameter, it is recommended to keep the global batch size unchanged for reproduction or adjust the learning rate linearly to a new global batch size.
|
||||||
|
|
||||||
|
* Standalone Training
|
||||||
|
|
||||||
|
If you want to train or finetune the model on a smaller dataset without distributed training, please run:
|
||||||
|
|
||||||
|
```shell
|
||||||
|
# standalone training on a CPU/GPU/Ascend device
|
||||||
|
python train.py --config configs/crossvit/crossvit_15_ascend.yaml --data_dir /path/to/dataset --distribute False
|
||||||
|
```
|
||||||
|
|
||||||
|
### Validation
|
||||||
|
|
||||||
|
To validate the accuracy of the trained model, you can use `validate.py` and parse the checkpoint path with `--ckpt_path`.
|
||||||
|
|
||||||
|
```
|
||||||
|
python validate.py -c configs/crossvit/crossvit_15_ascend.yaml --data_dir /path/to/imagenet --ckpt_path /path/to/ckpt
|
||||||
|
```
|
||||||
|
|
||||||
|
### Deployment
|
||||||
|
|
||||||
|
Please refer to the [deployment tutorial](https://github.com/mindspore-lab/mindcv/blob/main/tutorials/deployment.md) in MindCV.
|
||||||
|
|
||||||
|
## References
|
||||||
|
|
||||||
|
<!--- Guideline: Citation format should follow GB/T 7714. -->
|
||||||
|
[1] Chen C F, Fan Q, Panda R. CrossViT: Cross-attention multi-scale vision transformer for image classification[C]//Proceedings of the IEEE/CVF International Conference on Computer Vision. 2021.
|
|
@ -0,0 +1,65 @@
|
||||||
|
# system
|
||||||
|
mode: 0
|
||||||
|
distribute: True
|
||||||
|
num_parallel_workers: 8
|
||||||
|
val_while_train: True
|
||||||
|
|
||||||
|
# dataset
|
||||||
|
dataset: 'imagenet'
|
||||||
|
data_dir: '/path/to/imagenet'
|
||||||
|
shuffle: True
|
||||||
|
dataset_download: False
|
||||||
|
batch_size: 320
|
||||||
|
drop_remainder: True
|
||||||
|
|
||||||
|
# augmentation
|
||||||
|
image_resize: 224
|
||||||
|
scale: [ 0.08, 1.0 ]
|
||||||
|
ratio: [ 0.75, 1.333 ]
|
||||||
|
hflip: 0.5
|
||||||
|
vflip: 0.
|
||||||
|
interpolation: 'bicubic'
|
||||||
|
auto_augment: randaug-m9-mstd0.5-inc1
|
||||||
|
re_prob: 0.25
|
||||||
|
mixup: 0.8
|
||||||
|
cutmix: 1.0
|
||||||
|
color_jitter: 0.4
|
||||||
|
crop_pct: 0.935
|
||||||
|
ema: True
|
||||||
|
|
||||||
|
# model
|
||||||
|
model: 'crossvit15'
|
||||||
|
num_classes: 1000
|
||||||
|
pretrained: False
|
||||||
|
ckpt_path: ''
|
||||||
|
keep_checkpoint_max: 30
|
||||||
|
ckpt_save_dir: './ckpt'
|
||||||
|
epoch_size: 600
|
||||||
|
dataset_sink_mode: True
|
||||||
|
amp_level: 'O3'
|
||||||
|
drop_path_rate: 0.1
|
||||||
|
|
||||||
|
# loss
|
||||||
|
loss: 'CE'
|
||||||
|
label_smoothing: 0.1
|
||||||
|
|
||||||
|
# lr scheduler
|
||||||
|
scheduler: 'warmup_cosine_decay'
|
||||||
|
lr: 0.0009
|
||||||
|
min_lr: 0.00001
|
||||||
|
warmup_epochs: 30
|
||||||
|
decay_epochs: 270
|
||||||
|
decay_rate: 0.1
|
||||||
|
num_cycles: 2
|
||||||
|
cycle_decay: 1
|
||||||
|
|
||||||
|
# optimizer
|
||||||
|
opt: 'adamw'
|
||||||
|
weight_decay: 0.05
|
||||||
|
filter_bias_and_bn: True
|
||||||
|
loss_scale: 512
|
||||||
|
use_nesterov: False
|
||||||
|
eps: 1e-8
|
||||||
|
|
||||||
|
# Scheduler parameters
|
||||||
|
lr_epoch_stair: True
|
|
@ -0,0 +1,63 @@
|
||||||
|
# system
|
||||||
|
mode: 0
|
||||||
|
distribute: True
|
||||||
|
num_parallel_workers: 8
|
||||||
|
val_while_train: True
|
||||||
|
|
||||||
|
# dataset
|
||||||
|
dataset: 'imagenet'
|
||||||
|
data_dir: '/path/to/imagenet'
|
||||||
|
shuffle: True
|
||||||
|
dataset_download: False
|
||||||
|
batch_size: 256
|
||||||
|
drop_remainder: True
|
||||||
|
|
||||||
|
# augmentation
|
||||||
|
image_resize: 224
|
||||||
|
scale: [ 0.08, 1.0 ]
|
||||||
|
ratio: [ 0.75, 1.333 ]
|
||||||
|
hflip: 0.5
|
||||||
|
vflip: 0.
|
||||||
|
interpolation: 'bicubic'
|
||||||
|
auto_augment: randaug-m9-mstd0.5-inc1
|
||||||
|
re_prob: 0.25
|
||||||
|
mixup: 0.8
|
||||||
|
cutmix: 1.0
|
||||||
|
color_jitter: 0.4
|
||||||
|
crop_pct: 0.935
|
||||||
|
ema: True
|
||||||
|
|
||||||
|
# model
|
||||||
|
model: 'crossvit18'
|
||||||
|
num_classes: 1000
|
||||||
|
pretrained: False
|
||||||
|
ckpt_path: ''
|
||||||
|
keep_checkpoint_max: 10
|
||||||
|
ckpt_save_dir: './ckpt'
|
||||||
|
epoch_size: 300
|
||||||
|
dataset_sink_mode: True
|
||||||
|
amp_level: 'O3'
|
||||||
|
drop_path_rate: 0.1
|
||||||
|
|
||||||
|
# loss
|
||||||
|
loss: 'CE'
|
||||||
|
label_smoothing: 0.1
|
||||||
|
|
||||||
|
# lr scheduler
|
||||||
|
scheduler: 'warmup_cosine_decay'
|
||||||
|
lr: 0.004
|
||||||
|
min_lr: 0.00001
|
||||||
|
warmup_epochs: 30
|
||||||
|
decay_epochs: 270
|
||||||
|
decay_rate: 0.1
|
||||||
|
|
||||||
|
# optimizer
|
||||||
|
opt: 'adamw'
|
||||||
|
weight_decay: 0.05
|
||||||
|
filter_bias_and_bn: True
|
||||||
|
loss_scale: 1024
|
||||||
|
use_nesterov: False
|
||||||
|
eps: 1e-8
|
||||||
|
|
||||||
|
# Scheduler parameters
|
||||||
|
lr_epoch_stair: True
|
|
@ -0,0 +1,63 @@
|
||||||
|
# system
|
||||||
|
mode: 0
|
||||||
|
distribute: True
|
||||||
|
num_parallel_workers: 8
|
||||||
|
val_while_train: True
|
||||||
|
|
||||||
|
# dataset
|
||||||
|
dataset: 'imagenet'
|
||||||
|
data_dir: '/path/to/imagenet'
|
||||||
|
shuffle: True
|
||||||
|
dataset_download: False
|
||||||
|
batch_size: 256
|
||||||
|
drop_remainder: True
|
||||||
|
|
||||||
|
# augmentation
|
||||||
|
image_resize: 240
|
||||||
|
scale: [0.08, 1.0]
|
||||||
|
ratio: [0.75, 1.333]
|
||||||
|
hflip: 0.5
|
||||||
|
vflip: 0.
|
||||||
|
interpolation: 'bicubic'
|
||||||
|
auto_augment: 'randaug-m9-mstd0.5-inc1'
|
||||||
|
re_prob: 0.25
|
||||||
|
mixup: 0.8
|
||||||
|
cutmix: 1.0
|
||||||
|
color_jitter: 0.4
|
||||||
|
crop_pct: 0.935
|
||||||
|
|
||||||
|
# model
|
||||||
|
model: 'crossvit9'
|
||||||
|
num_classes: 1000
|
||||||
|
pretrained: False
|
||||||
|
ckpt_path: ''
|
||||||
|
keep_checkpoint_max: 10
|
||||||
|
ckpt_save_dir: './ckpt'
|
||||||
|
epoch_size: 300
|
||||||
|
dataset_sink_mode: True
|
||||||
|
amp_level: 'O2'
|
||||||
|
drop_path_rate: 0.1
|
||||||
|
|
||||||
|
# loss
|
||||||
|
loss: 'CE'
|
||||||
|
label_smoothing: 0.1
|
||||||
|
|
||||||
|
# lr scheduler
|
||||||
|
scheduler: 'cosine_decay'
|
||||||
|
lr: 0.0011
|
||||||
|
min_lr: 0.00001
|
||||||
|
warmup_epochs: 30
|
||||||
|
decay_epochs: 270
|
||||||
|
decay_rate: 0.1
|
||||||
|
|
||||||
|
# optimizer
|
||||||
|
opt: 'adamw'
|
||||||
|
weight_decay: 0.05
|
||||||
|
filter_bias_and_bn: True
|
||||||
|
loss_scale_type: 'dynamic'
|
||||||
|
drop_overflow_update: True
|
||||||
|
use_nesterov: False
|
||||||
|
eps: 1e-8
|
||||||
|
|
||||||
|
# Scheduler parameters
|
||||||
|
lr_epoch_stair: True
|
|
@ -0,0 +1,107 @@
|
||||||
|
# DenseNet
|
||||||
|
<!--- Guideline: please use url linked to the paper abstract in ArXiv instead of PDF for fast loading. -->
|
||||||
|
> [Densely Connected Convolutional Networks](https://arxiv.org/abs/1608.06993)
|
||||||
|
|
||||||
|
## Introduction
|
||||||
|
<!--- Guideline: Introduce the model and architectures. Please cite if you use/adopt paper explanation from others. -->
|
||||||
|
<!--- Guideline: If an architecture table/figure is available in the paper, please put one here and cite for intuitive illustration. -->
|
||||||
|
|
||||||
|
Recent work has shown that convolutional networks can be substantially deeper, more accurate, and more efficient to train if
|
||||||
|
they contain shorter connections between layers close to the input and those close to the output. Dense Convolutional
|
||||||
|
Network (DenseNet) is introduced based on this observation, which connects each layer to every other layer in a
|
||||||
|
feed-forward fashion. Whereas traditional convolutional networks with $L$ layers have $L$ connections, one between each
|
||||||
|
layer and its subsequent layer, DenseNet has $\frac{L(L+1)}{2}$ direct connections. For each layer, the feature maps
|
||||||
|
of all preceding layers are used as inputs, and their feature maps are used as inputs into all subsequent layers.
|
||||||
|
DenseNets have several compelling advantages: they alleviate the vanishing-gradient problem, strengthen feature
|
||||||
|
propagation, encourage feature reuse, and substantially reduce the number of parameters.[[1](#references)]
|
||||||
|
|
||||||
|
<p align="center">
|
||||||
|
<img src="https://user-images.githubusercontent.com/52945530/210045537-7eda82c7-4575-4820-ba94-8fcab11c6482.png" width=800 />
|
||||||
|
</p>
|
||||||
|
<p align="center">
|
||||||
|
<em>Figure 1. Architecture of DenseNet [<a href="#references">1</a>] </em>
|
||||||
|
</p>
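
The connectivity pattern described above can be illustrated with a toy sketch (made-up channel sizes and a trivial stand-in for the convolution, not the MindCV implementation): each layer consumes the concatenation of all preceding feature maps, giving $\frac{L(L+1)}{2}$ direct connections.

```python
import numpy as np

def num_dense_connections(num_layers: int) -> int:
    # An L-layer dense block has L*(L+1)/2 direct connections.
    return num_layers * (num_layers + 1) // 2

print(num_dense_connections(4))  # 10

# Toy dense block: every layer consumes the concatenation of all earlier feature maps.
growth_rate = 4                                # channels added by each layer (made-up value)
features = [np.random.rand(8, 1, 1)]           # input feature map with 8 channels, H = W = 1
for _ in range(4):
    inp = np.concatenate(features, axis=0)     # concatenate along the channel axis
    # stand-in for BN-ReLU-Conv producing `growth_rate` new channels
    new = inp.mean(axis=0, keepdims=True).repeat(growth_rate, axis=0)
    features.append(new)
print(np.concatenate(features, axis=0).shape[0])  # 8 + 4 * 4 = 24 channels
```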
|
||||||
|
|
||||||
|
## Results
|
||||||
|
<!--- Guideline:
|
||||||
|
Table Format:
|
||||||
|
- Model: model name in lower case with _ seperator.
|
||||||
|
- Context: Training context denoted as {device}x{pieces}-{MS mode}, where mindspore mode can be G - graph mode or F - pynative mode with ms function. For example, D910x8-G is for training on 8 pieces of Ascend 910 NPU using graph mode.
|
||||||
|
- Top-1 and Top-5: Keep 2 digits after the decimal point.
|
||||||
|
- Params (M): # of model parameters in millions (10^6). Keep 2 digits after the decimal point
|
||||||
|
- Recipe: Training recipe/configuration linked to a yaml config file. Use absolute url path.
|
||||||
|
- Download: url of the pretrained model weights. Use absolute url path.
|
||||||
|
-->
|
||||||
|
|
||||||
|
Our reproduced model performance on ImageNet-1K is reported as follows.
|
||||||
|
|
||||||
|
<div align="center">
|
||||||
|
|
||||||
|
| Model | Context | Top-1 (%) | Top-5 (%) | Params (M) | Recipe | Download |
|
||||||
|
|--------------|----------|-----------|-----------|------------|-----------------------------------------------------------------------------------------------------|----------------------------------------------------------------------------------------------------|
|
||||||
|
| densenet_121 | D910x8-G | 75.64 | 92.84 | 8.06 | [yaml](https://github.com/mindspore-lab/mindcv/blob/main/configs/densenet/densenet_121_ascend.yaml) | [weights](https://download.mindspore.cn/toolkits/mindcv/densenet/densenet121-120_5004_Ascend.ckpt) |
|
||||||
|
| densenet_161 | D910x8-G | 79.09 | 94.66 | 28.90 | [yaml](https://github.com/mindspore-lab/mindcv/blob/main/configs/densenet/densenet_161_ascend.yaml) | [weights](https://download.mindspore.cn/toolkits/mindcv/densenet/densenet161-120_5004_Ascend.ckpt) |
|
||||||
|
| densenet_169 | D910x8-G | 77.26 | 93.71 | 14.31 | [yaml](https://github.com/mindspore-lab/mindcv/blob/main/configs/densenet/densenet_169_ascend.yaml) | [weights](https://download.mindspore.cn/toolkits/mindcv/densenet/densenet169-120_5004_Ascend.ckpt) |
|
||||||
|
| densenet_201 | D910x8-G | 78.14 | 94.08 | 20.24 | [yaml](https://github.com/mindspore-lab/mindcv/blob/main/configs/densenet/densenet_201_ascend.yaml) | [weights](https://download.mindspore.cn/toolkits/mindcv/densenet/densenet201-120_5004_Ascend.ckpt) |
|
||||||
|
|
||||||
|
</div>
|
||||||
|
|
||||||
|
#### Notes
|
||||||
|
|
||||||
|
- Context: Training context denoted as {device}x{pieces}-{MS mode}, where mindspore mode can be G - graph mode or F - pynative mode with ms function. For example, D910x8-G is for training on 8 pieces of Ascend 910 NPU using graph mode.
|
||||||
|
- Top-1 and Top-5: Accuracy reported on the validation set of ImageNet-1K.
|
||||||
|
|
||||||
|
## Quick Start
|
||||||
|
|
||||||
|
### Preparation
|
||||||
|
|
||||||
|
#### Installation
|
||||||
|
Please refer to the [installation instruction](https://github.com/mindspore-lab/mindcv#installation) in MindCV.
|
||||||
|
|
||||||
|
#### Dataset Preparation
|
||||||
|
Please download the [ImageNet-1K](https://www.image-net.org/challenges/LSVRC/2012/index.php) dataset for model training and validation.
|
||||||
|
|
||||||
|
### Training
|
||||||
|
<!--- Guideline: Please avoid using shell scripts in the command line. Python scripts preferred. -->
|
||||||
|
|
||||||
|
* Distributed Training
|
||||||
|
|
||||||
|
It is easy to reproduce the reported results with the pre-defined training recipe. For distributed training on multiple Ascend 910 devices, please run
|
||||||
|
|
||||||
|
```shell
|
||||||
|
# distributed training on multiple GPU/Ascend devices
|
||||||
|
mpirun -n 8 python train.py --config configs/densenet/densenet_121_ascend.yaml --data_dir /path/to/imagenet
|
||||||
|
```
|
||||||
|
> If the script is executed by the root user, the `--allow-run-as-root` parameter must be added to `mpirun`.
|
||||||
|
|
||||||
|
Similarly, you can train the model on multiple GPU devices with the above `mpirun` command.
|
||||||
|
|
||||||
|
For detailed illustration of all hyper-parameters, please refer to [config.py](https://github.com/mindspore-lab/mindcv/blob/main/config.py).
|
||||||
|
|
||||||
|
**Note:** As the global batch size (batch_size x num_devices) is an important hyper-parameter, it is recommended to keep the global batch size unchanged for reproduction or adjust the learning rate linearly to a new global batch size.
|
||||||
|
|
||||||
|
* Standalone Training
|
||||||
|
|
||||||
|
If you want to train or finetune the model on a smaller dataset without distributed training, please run:
|
||||||
|
|
||||||
|
```shell
|
||||||
|
# standalone training on a CPU/GPU/Ascend device
|
||||||
|
python train.py --config configs/densenet/densenet_121_ascend.yaml --data_dir /path/to/dataset --distribute False
|
||||||
|
```
|
||||||
|
|
||||||
|
### Validation
|
||||||
|
|
||||||
|
To validate the accuracy of the trained model, you can use `validate.py` and parse the checkpoint path with `--ckpt_path`.
|
||||||
|
|
||||||
|
```
|
||||||
|
python validate.py -c configs/densenet/densenet_121_ascend.yaml --data_dir /path/to/imagenet --ckpt_path /path/to/ckpt
|
||||||
|
```
|
||||||
|
|
||||||
|
### Deployment
|
||||||
|
|
||||||
|
Please refer to the [deployment tutorial](https://github.com/mindspore-lab/mindcv/blob/main/tutorials/deployment.md) in MindCV.
|
||||||
|
|
||||||
|
## References
|
||||||
|
<!--- Guideline: Citation format GB/T 7714 is suggested. -->
|
||||||
|
|
||||||
|
[1] Huang G, Liu Z, Van Der Maaten L, et al. Densely connected convolutional networks[C]//Proceedings of the IEEE conference on computer vision and pattern recognition. 2017: 4700-4708.
|
|
@ -0,0 +1,51 @@
|
||||||
|
# system
|
||||||
|
mode: 0
|
||||||
|
distribute: True
|
||||||
|
num_parallel_workers: 8
|
||||||
|
|
||||||
|
# dataset
|
||||||
|
dataset: 'imagenet'
|
||||||
|
data_dir: '/path/to/imagenet'
|
||||||
|
shuffle: True
|
||||||
|
dataset_download: False
|
||||||
|
batch_size: 32
|
||||||
|
drop_remainder: True
|
||||||
|
val_split: val
|
||||||
|
|
||||||
|
# augmentation
|
||||||
|
image_resize: 224
|
||||||
|
scale: [0.08, 1.0]
|
||||||
|
ratio: [0.75, 1.333]
|
||||||
|
hflip: 0.5
|
||||||
|
interpolation: 'bilinear'
|
||||||
|
crop_pct: 0.875
|
||||||
|
|
||||||
|
# model
|
||||||
|
model: 'densenet121'
|
||||||
|
num_classes: 1000
|
||||||
|
pretrained: False
|
||||||
|
ckpt_path: ''
|
||||||
|
keep_checkpoint_max: 10
|
||||||
|
ckpt_save_dir: './ckpt'
|
||||||
|
epoch_size: 120
|
||||||
|
dataset_sink_mode: True
|
||||||
|
amp_level: 'O2'
|
||||||
|
|
||||||
|
# loss
|
||||||
|
loss: 'CE'
|
||||||
|
label_smoothing: 0.1
|
||||||
|
|
||||||
|
# lr scheduler
|
||||||
|
scheduler: 'cosine_decay'
|
||||||
|
min_lr: 0.0
|
||||||
|
lr: 0.1
|
||||||
|
warmup_epochs: 0
|
||||||
|
decay_epochs: 120
|
||||||
|
|
||||||
|
# optimizer
|
||||||
|
opt: 'momentum'
|
||||||
|
filter_bias_and_bn: True
|
||||||
|
momentum: 0.9
|
||||||
|
weight_decay: 0.0001
|
||||||
|
loss_scale: 1024
|
||||||
|
use_nesterov: False
|
|
@ -0,0 +1,50 @@
|
||||||
|
# system
|
||||||
|
distribute: True
|
||||||
|
num_parallel_workers: 8
|
||||||
|
|
||||||
|
# dataset
|
||||||
|
dataset: 'imagenet'
|
||||||
|
data_dir: '/path/to/imagenet'
|
||||||
|
shuffle: True
|
||||||
|
dataset_download: False
|
||||||
|
batch_size: 32
|
||||||
|
drop_remainder: True
|
||||||
|
val_split: val
|
||||||
|
|
||||||
|
# augmentation
|
||||||
|
image_resize: 224
|
||||||
|
scale: [0.08, 1.0]
|
||||||
|
ratio: [0.75, 1.333]
|
||||||
|
hflip: 0.5
|
||||||
|
interpolation: 'bilinear'
|
||||||
|
crop_pct: 0.875
|
||||||
|
|
||||||
|
# model
|
||||||
|
model: 'densenet121'
|
||||||
|
num_classes: 1000
|
||||||
|
pretrained: False
|
||||||
|
ckpt_path: ''
|
||||||
|
keep_checkpoint_max: 10
|
||||||
|
ckpt_save_dir: './ckpt'
|
||||||
|
epoch_size: 120
|
||||||
|
dataset_sink_mode: True
|
||||||
|
amp_level: 'O2'
|
||||||
|
|
||||||
|
# loss
|
||||||
|
loss: 'CE'
|
||||||
|
label_smoothing: 0.1
|
||||||
|
|
||||||
|
# lr scheduler
|
||||||
|
scheduler: 'cosine_decay'
|
||||||
|
min_lr: 0.0
|
||||||
|
lr: 0.1
|
||||||
|
warmup_epochs: 0
|
||||||
|
decay_epochs: 120
|
||||||
|
|
||||||
|
# optimizer
|
||||||
|
opt: 'momentum'
|
||||||
|
filter_bias_and_bn: True
|
||||||
|
momentum: 0.9
|
||||||
|
weight_decay: 0.0001
|
||||||
|
loss_scale: 1024
|
||||||
|
use_nesterov: False
|
|
@ -0,0 +1,51 @@
|
||||||
|
# system
|
||||||
|
mode: 0
|
||||||
|
distribute: True
|
||||||
|
num_parallel_workers: 8
|
||||||
|
|
||||||
|
# dataset
|
||||||
|
dataset: 'imagenet'
|
||||||
|
data_dir: '/path/to/imagenet'
|
||||||
|
shuffle: True
|
||||||
|
dataset_download: False
|
||||||
|
batch_size: 32
|
||||||
|
drop_remainder: True
|
||||||
|
val_split: val
|
||||||
|
|
||||||
|
# augmentation
|
||||||
|
image_resize: 224
|
||||||
|
scale: [0.08, 1.0]
|
||||||
|
ratio: [0.75, 1.333]
|
||||||
|
hflip: 0.5
|
||||||
|
interpolation: 'bilinear'
|
||||||
|
crop_pct: 0.875
|
||||||
|
|
||||||
|
# model
|
||||||
|
model: 'densenet161'
|
||||||
|
num_classes: 1000
|
||||||
|
pretrained: False
|
||||||
|
ckpt_path: ''
|
||||||
|
keep_checkpoint_max: 10
|
||||||
|
ckpt_save_dir: './ckpt'
|
||||||
|
epoch_size: 120
|
||||||
|
dataset_sink_mode: True
|
||||||
|
amp_level: 'O2'
|
||||||
|
|
||||||
|
# loss
|
||||||
|
loss: 'CE'
|
||||||
|
label_smoothing: 0.1
|
||||||
|
|
||||||
|
# lr scheduler
|
||||||
|
scheduler: 'cosine_decay'
|
||||||
|
min_lr: 0.0
|
||||||
|
lr: 0.1
|
||||||
|
warmup_epochs: 0
|
||||||
|
decay_epochs: 120
|
||||||
|
|
||||||
|
# optimizer
|
||||||
|
opt: 'momentum'
|
||||||
|
filter_bias_and_bn: True
|
||||||
|
momentum: 0.9
|
||||||
|
weight_decay: 0.0001
|
||||||
|
loss_scale: 1024
|
||||||
|
use_nesterov: False
|
|
@ -0,0 +1,51 @@
|
||||||
|
# system
|
||||||
|
mode: 0
|
||||||
|
distribute: True
|
||||||
|
num_parallel_workers: 8
|
||||||
|
|
||||||
|
# dataset
|
||||||
|
dataset: 'imagenet'
|
||||||
|
data_dir: '/path/to/imagenet'
|
||||||
|
shuffle: True
|
||||||
|
dataset_download: False
|
||||||
|
batch_size: 32
|
||||||
|
drop_remainder: True
|
||||||
|
val_split: val
|
||||||
|
|
||||||
|
# augmentation
|
||||||
|
image_resize: 224
|
||||||
|
scale: [0.08, 1.0]
|
||||||
|
ratio: [0.75, 1.333]
|
||||||
|
hflip: 0.5
|
||||||
|
interpolation: 'bilinear'
|
||||||
|
crop_pct: 0.875
|
||||||
|
|
||||||
|
# model
|
||||||
|
model: 'densenet161'
|
||||||
|
num_classes: 1000
|
||||||
|
pretrained: False
|
||||||
|
ckpt_path: ''
|
||||||
|
keep_checkpoint_max: 10
|
||||||
|
ckpt_save_dir: './ckpt'
|
||||||
|
epoch_size: 120
|
||||||
|
dataset_sink_mode: True
|
||||||
|
amp_level: 'O2'
|
||||||
|
|
||||||
|
# loss
|
||||||
|
loss: 'CE'
|
||||||
|
label_smoothing: 0.1
|
||||||
|
|
||||||
|
# lr scheduler
|
||||||
|
scheduler: 'cosine_decay'
|
||||||
|
min_lr: 0.0
|
||||||
|
lr: 0.1
|
||||||
|
warmup_epochs: 0
|
||||||
|
decay_epochs: 120
|
||||||
|
|
||||||
|
# optimizer
|
||||||
|
opt: 'momentum'
|
||||||
|
filter_bias_and_bn: True
|
||||||
|
momentum: 0.9
|
||||||
|
weight_decay: 0.0001
|
||||||
|
loss_scale: 1024
|
||||||
|
use_nesterov: False
|
|
@ -0,0 +1,51 @@
|
||||||
|
# system
|
||||||
|
mode: 0
|
||||||
|
distribute: True
|
||||||
|
num_parallel_workers: 8
|
||||||
|
|
||||||
|
# dataset
|
||||||
|
dataset: 'imagenet'
|
||||||
|
data_dir: '/path/to/imagenet'
|
||||||
|
shuffle: True
|
||||||
|
dataset_download: False
|
||||||
|
batch_size: 32
|
||||||
|
drop_remainder: True
|
||||||
|
val_split: val
|
||||||
|
|
||||||
|
# augmentation
|
||||||
|
image_resize: 224
|
||||||
|
scale: [0.08, 1.0]
|
||||||
|
ratio: [0.75, 1.333]
|
||||||
|
hflip: 0.5
|
||||||
|
interpolation: 'bilinear'
|
||||||
|
crop_pct: 0.875
|
||||||
|
|
||||||
|
# model
|
||||||
|
model: 'densenet169'
|
||||||
|
num_classes: 1000
|
||||||
|
pretrained: False
|
||||||
|
ckpt_path: ''
|
||||||
|
keep_checkpoint_max: 10
|
||||||
|
ckpt_save_dir: './ckpt'
|
||||||
|
epoch_size: 120
|
||||||
|
dataset_sink_mode: True
|
||||||
|
amp_level: 'O2'
|
||||||
|
|
||||||
|
# loss
|
||||||
|
loss: 'CE'
|
||||||
|
label_smoothing: 0.1
|
||||||
|
|
||||||
|
# lr scheduler
|
||||||
|
scheduler: 'cosine_decay'
|
||||||
|
min_lr: 0.0
|
||||||
|
lr: 0.1
|
||||||
|
warmup_epochs: 0
|
||||||
|
decay_epochs: 120
|
||||||
|
|
||||||
|
# optimizer
|
||||||
|
opt: 'momentum'
|
||||||
|
filter_bias_and_bn: True
|
||||||
|
momentum: 0.9
|
||||||
|
weight_decay: 0.0001
|
||||||
|
loss_scale: 1024
|
||||||
|
use_nesterov: False
|
|
@ -0,0 +1,51 @@
|
||||||
|
# system
|
||||||
|
mode: 0
|
||||||
|
distribute: True
|
||||||
|
num_parallel_workers: 8
|
||||||
|
|
||||||
|
# dataset
|
||||||
|
dataset: 'imagenet'
|
||||||
|
data_dir: '/path/to/imagenet'
|
||||||
|
shuffle: True
|
||||||
|
dataset_download: False
|
||||||
|
batch_size: 32
|
||||||
|
drop_remainder: True
|
||||||
|
val_split: val
|
||||||
|
|
||||||
|
# augmentation
|
||||||
|
image_resize: 224
|
||||||
|
scale: [0.08, 1.0]
|
||||||
|
ratio: [0.75, 1.333]
|
||||||
|
hflip: 0.5
|
||||||
|
interpolation: 'bilinear'
|
||||||
|
crop_pct: 0.875
|
||||||
|
|
||||||
|
# model
|
||||||
|
model: 'densenet169'
|
||||||
|
num_classes: 1000
|
||||||
|
pretrained: False
|
||||||
|
ckpt_path: ''
|
||||||
|
keep_checkpoint_max: 10
|
||||||
|
ckpt_save_dir: './ckpt'
|
||||||
|
epoch_size: 120
|
||||||
|
dataset_sink_mode: True
|
||||||
|
amp_level: 'O2'
|
||||||
|
|
||||||
|
# loss
|
||||||
|
loss: 'CE'
|
||||||
|
label_smoothing: 0.1
|
||||||
|
|
||||||
|
# lr scheduler
|
||||||
|
scheduler: 'cosine_decay'
|
||||||
|
min_lr: 0.0
|
||||||
|
lr: 0.1
|
||||||
|
warmup_epochs: 0
|
||||||
|
decay_epochs: 120
|
||||||
|
|
||||||
|
# optimizer
|
||||||
|
opt: 'momentum'
|
||||||
|
filter_bias_and_bn: True
|
||||||
|
momentum: 0.9
|
||||||
|
weight_decay: 0.0001
|
||||||
|
loss_scale: 1024
|
||||||
|
use_nesterov: False
|
|
@ -0,0 +1,51 @@
|
||||||
|
# system
|
||||||
|
mode: 0
|
||||||
|
distribute: True
|
||||||
|
num_parallel_workers: 8
|
||||||
|
|
||||||
|
# dataset
|
||||||
|
dataset: 'imagenet'
|
||||||
|
data_dir: '/path/to/imagenet'
|
||||||
|
shuffle: True
|
||||||
|
dataset_download: False
|
||||||
|
batch_size: 32
|
||||||
|
drop_remainder: True
|
||||||
|
val_split: val
|
||||||
|
|
||||||
|
# augmentation
|
||||||
|
image_resize: 224
|
||||||
|
scale: [0.08, 1.0]
|
||||||
|
ratio: [0.75, 1.333]
|
||||||
|
hflip: 0.5
|
||||||
|
interpolation: 'bilinear'
|
||||||
|
crop_pct: 0.875
|
||||||
|
|
||||||
|
# model
|
||||||
|
model: 'densenet201'
|
||||||
|
num_classes: 1000
|
||||||
|
pretrained: False
|
||||||
|
ckpt_path: ''
|
||||||
|
keep_checkpoint_max: 10
|
||||||
|
ckpt_save_dir: './ckpt'
|
||||||
|
epoch_size: 120
|
||||||
|
dataset_sink_mode: True
|
||||||
|
amp_level: 'O2'
|
||||||
|
|
||||||
|
# loss
|
||||||
|
loss: 'CE'
|
||||||
|
label_smoothing: 0.1
|
||||||
|
|
||||||
|
# lr scheduler
|
||||||
|
scheduler: 'cosine_decay'
|
||||||
|
min_lr: 0.0
|
||||||
|
lr: 0.1
|
||||||
|
warmup_epochs: 0
|
||||||
|
decay_epochs: 120
|
||||||
|
|
||||||
|
# optimizer
|
||||||
|
opt: 'momentum'
|
||||||
|
filter_bias_and_bn: True
|
||||||
|
momentum: 0.9
|
||||||
|
weight_decay: 0.0001
|
||||||
|
loss_scale: 1024
|
||||||
|
use_nesterov: False
|
|
@ -0,0 +1,51 @@
|
||||||
|
# system
|
||||||
|
mode: 0
|
||||||
|
distribute: True
|
||||||
|
num_parallel_workers: 8
|
||||||
|
|
||||||
|
# dataset
|
||||||
|
dataset: 'imagenet'
|
||||||
|
data_dir: '/path/to/imagenet'
|
||||||
|
shuffle: True
|
||||||
|
dataset_download: False
|
||||||
|
batch_size: 32
|
||||||
|
drop_remainder: True
|
||||||
|
val_split: val
|
||||||
|
|
||||||
|
# augmentation
|
||||||
|
image_resize: 224
|
||||||
|
scale: [0.08, 1.0]
|
||||||
|
ratio: [0.75, 1.333]
|
||||||
|
hflip: 0.5
|
||||||
|
interpolation: 'bilinear'
|
||||||
|
crop_pct: 0.875
|
||||||
|
|
||||||
|
# model
|
||||||
|
model: 'densenet201'
|
||||||
|
num_classes: 1000
|
||||||
|
pretrained: False
|
||||||
|
ckpt_path: ''
|
||||||
|
keep_checkpoint_max: 10
|
||||||
|
ckpt_save_dir: './ckpt'
|
||||||
|
epoch_size: 120
|
||||||
|
dataset_sink_mode: True
|
||||||
|
amp_level: 'O2'
|
||||||
|
|
||||||
|
# loss
|
||||||
|
loss: 'CE'
|
||||||
|
label_smoothing: 0.1
|
||||||
|
|
||||||
|
# lr scheduler
|
||||||
|
scheduler: 'cosine_decay'
|
||||||
|
min_lr: 0.0
|
||||||
|
lr: 0.1
|
||||||
|
warmup_epochs: 0
|
||||||
|
decay_epochs: 120
|
||||||
|
|
||||||
|
# optimizer
|
||||||
|
opt: 'momentum'
|
||||||
|
filter_bias_and_bn: True
|
||||||
|
momentum: 0.9
|
||||||
|
weight_decay: 0.0001
|
||||||
|
loss_scale: 1024
|
||||||
|
use_nesterov: False
|
|
@ -0,0 +1,102 @@
|
||||||
|
# Dual Path Networks (DPN)
|
||||||
|
<!--- Guideline: please use url linked to the paper abstract in ArXiv instead of PDF for fast loading. -->
|
||||||
|
> [Dual Path Networks](https://arxiv.org/abs/1707.01629v2)
|
||||||
|
|
||||||
|
## Introduction
|
||||||
|
<!--- Guideline: Introduce the model and architectures. Please cite if you use/adopt paper explanation from others. -->
|
||||||
|
<!--- Guideline: If an architecture table/figure is available in the paper, please put one here and cite for intuitive illustration. -->
|
||||||
|
|
||||||
|
Figure 1 shows the model architectures of ResNet, DenseNet, and Dual Path Networks (DPN). By combining the feature reuse of ResNet with the new-feature exploration of DenseNet,
|
||||||
|
DPN enjoys both benefits: it shares common features while keeping the flexibility to explore new ones. As a result, DPN achieves better performance with
|
||||||
|
a lower computational cost than ResNet and DenseNet on the ImageNet-1K dataset.[[1](#references)]
|
||||||
|
|
||||||
|
<p align="center">
|
||||||
|
<img src="https://user-images.githubusercontent.com/77485245/219323700-62029af1-e034-4bf4-8c87-d0c48a5e04b9.jpeg" width=800 />
|
||||||
|
</p>
|
||||||
|
<p align="center">
|
||||||
|
<em>Figure 1. Architecture of DPN [<a href="#references">1</a>] </em>
|
||||||
|
</p>
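
The dual-path idea can be sketched in a few lines of toy code (made-up channel sizes and a trivial stand-in for the block's convolutions; this is not the actual DPN block): part of each block's output is added to a residual path, while the rest is concatenated onto a densely growing path.

```python
import numpy as np

res_channels, dense_inc = 8, 4                      # made-up sizes for illustration
residual = np.random.rand(res_channels, 1, 1)       # ResNet-style path
dense = np.random.rand(dense_inc, 1, 1)             # DenseNet-style path

for _ in range(3):                                  # three toy blocks
    x = np.concatenate([residual, dense], axis=0)   # both paths feed the block
    # stand-in for the block's convolutions, producing res_channels + dense_inc channels
    out = x.mean(axis=0, keepdims=True).repeat(res_channels + dense_inc, axis=0)
    residual = residual + out[:res_channels]                     # element-wise addition (feature reuse)
    dense = np.concatenate([dense, out[res_channels:]], axis=0)  # concatenation (new features)

print(residual.shape[0], dense.shape[0])            # 8 and 4 + 3 * 4 = 16 channels
```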
|
||||||
|
|
||||||
|
## Results
|
||||||
|
<!--- Guideline:
|
||||||
|
Table Format:
|
||||||
|
- Model: model name in lower case with _ seperator.
|
||||||
|
- Context: Training context denoted as {device}x{pieces}-{MS mode}, where mindspore mode can be G - graph mode or F - pynative mode with ms function. For example, D910x8-G is for training on 8 pieces of Ascend 910 NPU using graph mode.
|
||||||
|
- Top-1 and Top-5: Keep 2 digits after the decimal point.
|
||||||
|
- Params (M): # of model parameters in millions (10^6). Keep 2 digits after the decimal point
|
||||||
|
- Recipe: Training recipe/configuration linked to a yaml config file. Use absolute url path.
|
||||||
|
- Download: url of the pretrained model weights. Use absolute url path.
|
||||||
|
-->
|
||||||
|
|
||||||
|
Our reproduced model performance on ImageNet-1K is reported as follows.
|
||||||
|
|
||||||
|
<div align="center">
|
||||||
|
|
||||||
|
| Model | Context | Top-1 (%) | Top-5 (%) | Params (M) | Recipe | Download |
|
||||||
|
|-------|----------|-----------|-----------|------------|------------------------------------------------------------------------------------------|-----------------------------------------------------------------------------------|
|
||||||
|
| dpn92 | D910x8-G | 79.46 | 94.49 | 37.79 | [yaml](https://github.com/mindspore-lab/mindcv/blob/main/configs/dpn/dpn92_ascend.yaml) | [weights](https://download.mindspore.cn/toolkits/mindcv/dpn/dpn92-e3e0fca.ckpt) |
|
||||||
|
| dpn98 | D910x8-G | 79.94 | 94.57 | 61.74 | [yaml](https://github.com/mindspore-lab/mindcv/blob/main/configs/dpn/dpn98_ascend.yaml) | [weights](https://download.mindspore.cn/toolkits/mindcv/dpn/dpn98-119a8207.ckpt) |
|
||||||
|
| dpn107 | D910x8-G | 80.05 | 94.74 | 87.13 | [yaml](https://github.com/mindspore-lab/mindcv/blob/main/configs/dpn/dpn107_ascend.yaml) | [weights](https://download.mindspore.cn/toolkits/mindcv/dpn/dpn107-7d7df07b.ckpt) |
|
||||||
|
| dpn131 | D910x8-G | 80.07 | 94.72 | 79.48 | [yaml](https://github.com/mindspore-lab/mindcv/blob/main/configs/dpn/dpn131_ascend.yaml) | [weights](https://download.mindspore.cn/toolkits/mindcv/dpn/dpn131-47f084b3.ckpt) |
|
||||||
|
|
||||||
|
</div>
|
||||||
|
|
||||||
|
#### Notes
|
||||||
|
|
||||||
|
- Context: Training context denoted as {device}x{pieces}-{MS mode}, where mindspore mode can be G - graph mode or F - pynative mode with ms function. For example, D910x8-G is for training on 8 pieces of Ascend 910 NPU using graph mode.
|
||||||
|
- Top-1 and Top-5: Accuracy reported on the validation set of ImageNet-1K.
|
||||||
|
|
||||||
|
## Quick Start
|
||||||
|
|
||||||
|
### Preparation
|
||||||
|
|
||||||
|
#### Installation
|
||||||
|
Please refer to the [installation instruction](https://github.com/mindspore-lab/mindcv#installation) in MindCV.
|
||||||
|
|
||||||
|
#### Dataset Preparation
|
||||||
|
Please download the [ImageNet-1K](https://www.image-net.org/challenges/LSVRC/2012/index.php) dataset for model training and validation.
|
||||||
|
|
||||||
|
### Training
|
||||||
|
<!--- Guideline: Please avoid using shell scripts in the command line. Python scripts preferred. -->
|
||||||
|
|
||||||
|
* Distributed Training
|
||||||
|
|
||||||
|
It is easy to reproduce the reported results with the pre-defined training recipe. For distributed training on multiple Ascend 910 devices, please run
|
||||||
|
|
||||||
|
```shell
|
||||||
|
# distributed training on multiple GPU/Ascend devices
|
||||||
|
mpirun -n 8 python train.py --config configs/dpn/dpn92_ascend.yaml --data_dir /path/to/imagenet
|
||||||
|
```
|
||||||
|
> If the script is executed by the root user, the `--allow-run-as-root` parameter must be added to `mpirun`.
|
||||||
|
|
||||||
|
Similarly, you can train the model on multiple GPU devices with the above `mpirun` command.
|
||||||
|
|
||||||
|
For detailed illustration of all hyper-parameters, please refer to [config.py](https://github.com/mindspore-lab/mindcv/blob/main/config.py).
|
||||||
|
|
||||||
|
**Note:** As the global batch size (batch_size x num_devices) is an important hyper-parameter, it is recommended to keep the global batch size unchanged for reproduction or adjust the learning rate linearly to a new global batch size.
|
||||||
|
|
||||||
|
* Standalone Training
|
||||||
|
|
||||||
|
If you want to train or finetune the model on a smaller dataset without distributed training, please run:
|
||||||
|
|
||||||
|
```shell
|
||||||
|
# standalone training on a CPU/GPU/Ascend device
|
||||||
|
python train.py --config configs/dpn/dpn92_ascend.yaml --data_dir /path/to/dataset --distribute False
|
||||||
|
```
|
||||||
|
|
||||||
|
### Validation
|
||||||
|
|
||||||
|
To validate the accuracy of the trained model, you can use `validate.py` and parse the checkpoint path with `--ckpt_path`.
|
||||||
|
|
||||||
|
```
|
||||||
|
python validate.py -c configs/dpn/dpn92_ascend.yaml --data_dir /path/to/imagenet --ckpt_path /path/to/ckpt
|
||||||
|
```
|
||||||
|
|
||||||
|
### Deployment
|
||||||
|
|
||||||
|
Please refer to the [deployment tutorial](https://github.com/mindspore-lab/mindcv/blob/main/tutorials/deployment.md) in MindCV.
|
||||||
|
|
||||||
|
## References
|
||||||
|
<!--- Guideline: Citation format GB/T 7714 is suggested. -->
|
||||||
|
|
||||||
|
[1] Chen Y, Li J, Xiao H, et al. Dual path networks[J]. Advances in neural information processing systems, 2017, 30.
|
|
@ -0,0 +1,52 @@
|
||||||
|
# system
|
||||||
|
mode: 0
|
||||||
|
distribute: True
|
||||||
|
num_parallel_workers: 8
|
||||||
|
val_while_train: True
|
||||||
|
|
||||||
|
# dataset
|
||||||
|
dataset: 'imagenet'
|
||||||
|
data_dir: 'path/to/imagenet'
|
||||||
|
shuffle: True
|
||||||
|
dataset_download: False
|
||||||
|
batch_size: 32
|
||||||
|
drop_remainder: True
|
||||||
|
|
||||||
|
# augmentation
|
||||||
|
image_resize: 224
|
||||||
|
scale: [0.08, 1.0]
|
||||||
|
ratio: [0.75, 1.333]
|
||||||
|
hflip: 0.5
|
||||||
|
vflip: 0.0
|
||||||
|
interpolation: 'bilinear'
|
||||||
|
crop_pct: 0.875
|
||||||
|
|
||||||
|
# model
|
||||||
|
model: 'dpn107'
|
||||||
|
num_classes: 1000
|
||||||
|
pretrained: False
|
||||||
|
ckpt_path: ''
|
||||||
|
keep_checkpoint_max: 10
|
||||||
|
ckpt_save_dir: './ckpt'
|
||||||
|
epoch_size: 200
|
||||||
|
dataset_sink_mode: True
|
||||||
|
amp_level: 'O2'
|
||||||
|
|
||||||
|
# loss
|
||||||
|
loss: 'CE'
|
||||||
|
label_smoothing: 0.1
|
||||||
|
|
||||||
|
# lr scheduler
|
||||||
|
scheduler: 'cosine_decay'
|
||||||
|
min_lr: 1e-6
|
||||||
|
lr: 0.05
|
||||||
|
warmup_epochs: 0
|
||||||
|
decay_epochs: 200
|
||||||
|
|
||||||
|
# optimizer
|
||||||
|
opt: 'SGD'
|
||||||
|
filter_bias_and_bn: True
|
||||||
|
momentum: 0.9
|
||||||
|
weight_decay: 0.0001
|
||||||
|
loss_scale: 1024
|
||||||
|
use_nesterov: False
|
|
@ -0,0 +1,52 @@
|
||||||
|
# system
|
||||||
|
mode: 0
|
||||||
|
distribute: True
|
||||||
|
num_parallel_workers: 8
|
||||||
|
val_while_train: True
|
||||||
|
|
||||||
|
# dataset
|
||||||
|
dataset: 'imagenet'
|
||||||
|
data_dir: 'path/to/imagenet'
|
||||||
|
shuffle: True
|
||||||
|
dataset_download: False
|
||||||
|
batch_size: 32
|
||||||
|
drop_remainder: True
|
||||||
|
|
||||||
|
# augmentation
|
||||||
|
image_resize: 224
|
||||||
|
scale: [0.08, 1.0]
|
||||||
|
ratio: [0.75, 1.333]
|
||||||
|
hflip: 0.5
|
||||||
|
vflip: 0.0
|
||||||
|
interpolation: 'bilinear'
|
||||||
|
crop_pct: 0.875
|
||||||
|
|
||||||
|
# model
|
||||||
|
model: 'dpn131'
|
||||||
|
num_classes: 1000
|
||||||
|
pretrained: False
|
||||||
|
ckpt_path: ''
|
||||||
|
keep_checkpoint_max: 10
|
||||||
|
ckpt_save_dir: './ckpt'
|
||||||
|
epoch_size: 200
|
||||||
|
dataset_sink_mode: True
|
||||||
|
amp_level: 'O2'
|
||||||
|
|
||||||
|
# loss
|
||||||
|
loss: 'CE'
|
||||||
|
label_smoothing: 0.1
|
||||||
|
|
||||||
|
# lr scheduler
|
||||||
|
scheduler: 'cosine_decay'
|
||||||
|
min_lr: 1e-6
|
||||||
|
lr: 0.05
|
||||||
|
warmup_epochs: 0
|
||||||
|
decay_epochs: 200
|
||||||
|
|
||||||
|
# optimizer
|
||||||
|
opt: 'SGD'
|
||||||
|
filter_bias_and_bn: True
|
||||||
|
momentum: 0.9
|
||||||
|
weight_decay: 0.0001
|
||||||
|
loss_scale: 1024
|
||||||
|
use_nesterov: False
|
|
@ -0,0 +1,52 @@
|
||||||
|
# system
|
||||||
|
mode: 0
|
||||||
|
distribute: True
|
||||||
|
num_parallel_workers: 8
|
||||||
|
val_while_train: True
|
||||||
|
|
||||||
|
# dataset
|
||||||
|
dataset: 'imagenet'
|
||||||
|
data_dir: 'path/to/imagenet'
|
||||||
|
shuffle: True
|
||||||
|
dataset_download: False
|
||||||
|
batch_size: 32
|
||||||
|
drop_remainder: True
|
||||||
|
|
||||||
|
# augmentation
|
||||||
|
image_resize: 224
|
||||||
|
scale: [0.08, 1.0]
|
||||||
|
ratio: [0.75, 1.333]
|
||||||
|
hflip: 0.5
|
||||||
|
vflip: 0.0
|
||||||
|
interpolation: 'bilinear'
|
||||||
|
crop_pct: 0.875
|
||||||
|
|
||||||
|
# model
|
||||||
|
model: 'dpn92'
|
||||||
|
num_classes: 1000
|
||||||
|
pretrained: False
|
||||||
|
ckpt_path: ''
|
||||||
|
keep_checkpoint_max: 30
|
||||||
|
ckpt_save_dir: './ckpt'
|
||||||
|
epoch_size: 200
|
||||||
|
dataset_sink_mode: True
|
||||||
|
amp_level: 'O2'
|
||||||
|
|
||||||
|
# loss
|
||||||
|
loss: 'CE'
|
||||||
|
label_smoothing: 0.1
|
||||||
|
|
||||||
|
# lr scheduler
|
||||||
|
scheduler: 'cosine_decay'
|
||||||
|
min_lr: 1e-6
|
||||||
|
lr: 0.05
|
||||||
|
warmup_epochs: 0
|
||||||
|
decay_epochs: 200
|
||||||
|
|
||||||
|
# optimizer
|
||||||
|
opt: 'SGD'
|
||||||
|
filter_bias_and_bn: True
|
||||||
|
momentum: 0.9
|
||||||
|
weight_decay: 0.0001
|
||||||
|
loss_scale: 1024
|
||||||
|
use_nesterov: False
|
|
@ -0,0 +1,52 @@
|
||||||
|
# system
|
||||||
|
mode: 0
|
||||||
|
distribute: True
|
||||||
|
num_parallel_workers: 8
|
||||||
|
val_while_train: True
|
||||||
|
|
||||||
|
# dataset
|
||||||
|
dataset: 'imagenet'
|
||||||
|
data_dir: 'path/to/imagenet'
|
||||||
|
shuffle: True
|
||||||
|
dataset_download: False
|
||||||
|
batch_size: 32
|
||||||
|
drop_remainder: True
|
||||||
|
|
||||||
|
# augmentation
|
||||||
|
image_resize: 224
|
||||||
|
scale: [0.08, 1.0]
|
||||||
|
ratio: [0.75, 1.333]
|
||||||
|
hflip: 0.5
|
||||||
|
vflip: 0.0
|
||||||
|
interpolation: 'bilinear'
|
||||||
|
crop_pct: 0.875
|
||||||
|
|
||||||
|
# model
|
||||||
|
model: 'dpn98'
|
||||||
|
num_classes: 1000
|
||||||
|
pretrained: False
|
||||||
|
ckpt_path: ''
|
||||||
|
keep_checkpoint_max: 10
|
||||||
|
ckpt_save_dir: './ckpt'
|
||||||
|
epoch_size: 200
|
||||||
|
dataset_sink_mode: True
|
||||||
|
amp_level: 'O2'
|
||||||
|
|
||||||
|
# loss
|
||||||
|
loss: 'CE'
|
||||||
|
label_smoothing: 0.1
|
||||||
|
|
||||||
|
# lr scheduler
|
||||||
|
scheduler: 'cosine_decay'
|
||||||
|
min_lr: 1e-6
|
||||||
|
lr: 0.05
|
||||||
|
warmup_epochs: 0
|
||||||
|
decay_epochs: 200
|
||||||
|
|
||||||
|
# optimizer
|
||||||
|
opt: 'SGD'
|
||||||
|
filter_bias_and_bn: True
|
||||||
|
momentum: 0.9
|
||||||
|
weight_decay: 0.0001
|
||||||
|
loss_scale: 1024
|
||||||
|
use_nesterov: False
|
|
@ -0,0 +1,94 @@
|
||||||
|
# EdgeNeXt
|
||||||
|
|
||||||
|
> [EdgeNeXt: Efficiently Amalgamated CNN-Transformer Architecture for Mobile Vision Applications](https://arxiv.org/abs/2206.10589)
|
||||||
|
|
||||||
|
## Introduction
|
||||||
|
|
||||||
|
EdgeNeXt is a new efficient hybrid architecture that effectively combines the strengths of both CNN and Transformer models.
|
||||||
|
EdgeNeXt introduces a split depth-wise transpose
|
||||||
|
attention (SDTA) encoder that splits input tensors into multiple channel groups and
|
||||||
|
utilizes depth-wise convolution along with self-attention across channel dimensions
|
||||||
|
to implicitly increase the receptive field and encode multi-scale features.[[1](#references)]
|
||||||
|
|
||||||
|
<p align="center">
|
||||||
|
<img src="https://user-images.githubusercontent.com/52945530/210045582-d31f832d-22e0-47bd-927f-74cf2daed91a.png" width=800 />
|
||||||
|
</p>
|
||||||
|
<p align="center">
|
||||||
|
<em>Figure 1. Architecture of EdgeNeXt [<a href="#references">1</a>] </em>
|
||||||
|
</p>
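
The "transposed" attention idea can be sketched with plain numpy (made-up shapes and random projections; this is only a rough sketch of channel-wise attention, not the actual SDTA encoder): the attention map is computed across the channel dimension, so its size is C x C instead of N x N for N spatial positions.

```python
import numpy as np

def softmax(a, axis=-1):
    e = np.exp(a - a.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

C, N = 16, 14 * 14                        # channels and flattened spatial positions (made-up)
x = np.random.rand(C, N)

# Stand-ins for learned query/key/value projections.
Wq, Wk, Wv = (np.random.rand(C, C) for _ in range(3))
q, k, v = Wq @ x, Wk @ x, Wv @ x

# Attention across channels: a C x C map, so each output channel mixes
# information gathered over all spatial positions.
attn = softmax((q @ k.T) / np.sqrt(N), axis=-1)     # (C, C)
out = attn @ v                                      # (C, N)
print(out.shape)                                    # (16, 196)
```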
|
||||||
|
|
||||||
|
## Results
|
||||||
|
|
||||||
|
Our reproduced model performance on ImageNet-1K is reported as follows.
|
||||||
|
|
||||||
|
<div align="center">
|
||||||
|
|
||||||
|
| Model | Context | Top-1 (%) | Top-5 (%) | Params (M) | Recipe | Download |
|
||||||
|
|----------------|----------|-----------|-----------|------------|-------------------------------------------------------------------------------------------------------|---------------------------------------------------------------------------------------|
|
||||||
|
| edgenext_xx_small | D910x8-G | 71.02 | 89.99 | 1.33 | [yaml](https://github.com/mindspore-lab/mindcv/blob/main/configs/edgenext/edgenext_xx_small_ascend.yaml) | [weights](https://download.mindspore.cn/toolkits/mindcv/edgenext/edgenext_xx_small-afc971fb.ckpt) |
|
||||||
|
| edgenext_x_small | D910x8-G | 75.14 | 92.50 | 2.34 | [yaml](https://github.com/mindspore-lab/mindcv/blob/main/configs/edgenext/edgenext_x_small_ascend.yaml) | [weights](https://download.mindspore.cn/toolkits/mindcv/edgenext/edgenext_x_small-a200c6fc.ckpt) |
|
||||||
|
| edgenext_small | D910x8-G | 79.15 | 94.39 | 5.59 | [yaml](https://github.com/mindspore-lab/mindcv/blob/main/configs/edgenext/edgenext_small_ascend.yaml) | [weights](https://download.mindspore.cn/toolkits/mindcv/edgenext/edgenext_small-f530c372.ckpt) |
|
||||||
|
| edgenext_base | D910x8-G | 82.24 | 95.94 | 18.51 | [yaml](https://github.com/mindspore-lab/mindcv/blob/main/configs/edgenext/edgenext_base_ascend.yaml) | [weights](https://download.mindspore.cn/toolkits/mindcv/edgenext/edgenext_base-4335e9dc.ckpt) |
|
||||||
|
|
||||||
|
</div>
|
||||||
|
|
||||||
|
#### Notes
|
||||||
|
|
||||||
|
- Context: Training context denoted as {device}x{pieces}-{MS mode}, where mindspore mode can be G - graph mode or F - pynative mode with ms function. For example, D910x8-G is for training on 8 pieces of Ascend 910 NPU using graph mode.
|
||||||
|
- Top-1 and Top-5: Accuracy reported on the validation set of ImageNet-1K.
|
||||||
|
|
||||||
|
## Quick Start
|
||||||
|
|
||||||
|
### Preparation
|
||||||
|
|
||||||
|
#### Installation
|
||||||
|
|
||||||
|
Please refer to the [installation instruction](https://github.com/mindspore-lab/mindcv#installation) in MindCV.
|
||||||
|
|
||||||
|
#### Dataset Preparation
|
||||||
|
|
||||||
|
Please download the [ImageNet-1K](https://www.image-net.org/challenges/LSVRC/2012/index.php) dataset for model training and validation.
|
||||||
|
|
||||||
|
### Training
|
||||||
|
|
||||||
|
* Distributed Training
|
||||||
|
|
||||||
|
It is easy to reproduce the reported results with the pre-defined training recipe. For distributed training on multiple Ascend 910 devices, please run
|
||||||
|
|
||||||
|
```shell
|
||||||
|
# distributed training on multiple GPU/Ascend devices
|
||||||
|
mpirun -n 8 python train.py --config configs/edgenext/edgenext_small_ascend.yaml --data_dir /path/to/imagenet
|
||||||
|
```
|
||||||
|
> If the script is executed by the root user, the `--allow-run-as-root` parameter must be added to `mpirun`.
|
||||||
|
|
||||||
|
Similarly, you can train the model on multiple GPU devices with the above `mpirun` command.
|
||||||
|
|
||||||
|
For detailed illustration of all hyper-parameters, please refer to [config.py](https://github.com/mindspore-lab/mindcv/blob/main/config.py).
|
||||||
|
|
||||||
|
**Note:** As the global batch size (batch_size x num_devices) is an important hyper-parameter, it is recommended to keep the global batch size unchanged for reproduction or adjust the learning rate linearly to a new global batch size.
|
||||||
|
|
||||||
|
* Standalone Training
|
||||||
|
|
||||||
|
If you want to train or finetune the model on a smaller dataset without distributed training, please run:
|
||||||
|
|
||||||
|
```shell
|
||||||
|
# standalone training on a CPU/GPU/Ascend device
|
||||||
|
python train.py --config configs/edgenext/edgenext_small_ascend.yaml --data_dir /path/to/dataset --distribute False
|
||||||
|
```
|
||||||
|
|
||||||
|
### Validation
|
||||||
|
|
||||||
|
To validate the accuracy of the trained model, you can use `validate.py` and parse the checkpoint path with `--ckpt_path`.
|
||||||
|
|
||||||
|
```
|
||||||
|
python validate.py -c configs/edgenext/edgenext_small_ascend.yaml --data_dir /path/to/imagenet --ckpt_path /path/to/ckpt
|
||||||
|
```
|
||||||
|
|
||||||
|
### Deployment
|
||||||
|
|
||||||
|
Please refer to the [deployment tutorial](https://github.com/mindspore-lab/mindcv/blob/main/tutorials/deployment.md) in MindCV.
|
||||||
|
|
||||||
|
## References
|
||||||
|
|
||||||
|
<!--- Guideline: Citation format should follow GB/T 7714. -->
|
||||||
|
[1] Maaz M, Shaker A, Cholakkal H, et al. EdgeNeXt: efficiently amalgamated CNN-transformer architecture for mobile vision applications[J]. arXiv preprint arXiv:2206.10589, 2022.
|
|
@ -0,0 +1,63 @@
|
||||||
|
# system
|
||||||
|
mode: 0
|
||||||
|
distribute: True
|
||||||
|
num_parallel_workers: 8
|
||||||
|
val_while_train: True
|
||||||
|
|
||||||
|
# dataset
|
||||||
|
dataset: 'imagenet'
|
||||||
|
data_dir: '/path/to/imagenet'
|
||||||
|
shuffle: True
|
||||||
|
dataset_download: False
|
||||||
|
batch_size: 256
|
||||||
|
drop_remainder: True
|
||||||
|
val_split: val
|
||||||
|
|
||||||
|
# augmentation
|
||||||
|
image_resize: 256
|
||||||
|
scale: [0.08, 1.0]
|
||||||
|
ratio: [0.75, 1.333]
|
||||||
|
hflip: 0.5
|
||||||
|
interpolation: 'bicubic'
|
||||||
|
crop_pct: 0.875
|
||||||
|
color_jitter: 0.4
|
||||||
|
re_prob: 0.0
|
||||||
|
mixup: 0.2
|
||||||
|
cutmix: 1.0
|
||||||
|
cutmix_prob: 1.0
|
||||||
|
auto_augment: 'randaug-m9-mstd0.5-inc1'
|
||||||
|
ema: True
|
||||||
|
ema_decay: 0.9995
|
||||||
|
|
||||||
|
# model
|
||||||
|
model: 'edgenext_base'
|
||||||
|
num_classes: 1000
|
||||||
|
pretrained: False
|
||||||
|
ckpt_path: ''
|
||||||
|
keep_checkpoint_max: 10
|
||||||
|
val_interval: 2
|
||||||
|
ckpt_save_dir: './ckpt'
|
||||||
|
epoch_size: 350
|
||||||
|
dataset_sink_mode: True
|
||||||
|
amp_level: 'O2'
|
||||||
|
drop_path_rate: 0.1
|
||||||
|
|
||||||
|
# loss
|
||||||
|
loss: 'CE'
|
||||||
|
label_smoothing: 0.1
|
||||||
|
|
||||||
|
# lr scheduler
|
||||||
|
scheduler: 'cosine_decay'
|
||||||
|
min_lr: 1e-6
|
||||||
|
lr: 4.5e-3
|
||||||
|
warmup_epochs: 20
|
||||||
|
decay_rate: 0.1
|
||||||
|
decay_epochs: 330
|
||||||
|
|
||||||
|
# optimizer
|
||||||
|
opt: 'adamw'
|
||||||
|
filter_bias_and_bn: True
|
||||||
|
momentum: 0.9
|
||||||
|
weight_decay: 0.05
|
||||||
|
loss_scale: 1024
|
||||||
|
use_nesterov: False
|
|
@ -0,0 +1,62 @@
|
||||||
|
# system
|
||||||
|
mode: 0
|
||||||
|
distribute: True
|
||||||
|
num_parallel_workers: 8
|
||||||
|
val_while_train: True
|
||||||
|
|
||||||
|
# dataset
|
||||||
|
dataset: 'imagenet'
|
||||||
|
data_dir: '/path/to/imagenet'
|
||||||
|
shuffle: True
|
||||||
|
dataset_download: False
|
||||||
|
batch_size: 256
|
||||||
|
drop_remainder: True
|
||||||
|
val_split: val
|
||||||
|
|
||||||
|
# augmentation
|
||||||
|
image_resize: 256
|
||||||
|
scale: [0.08, 1.0]
|
||||||
|
ratio: [0.75, 1.333]
|
||||||
|
hflip: 0.5
|
||||||
|
interpolation: 'bicubic'
|
||||||
|
crop_pct: 0.875
|
||||||
|
color_jitter: 0.4
|
||||||
|
re_prob: 0.0
|
||||||
|
cutmix: 1.0
|
||||||
|
cutmix_prob: 0.0
|
||||||
|
auto_augment: 'randaug-m9-mstd0.5-inc1'
|
||||||
|
ema: True
|
||||||
|
ema_decay: 0.9995
|
||||||
|
|
||||||
|
# model
|
||||||
|
model: 'edgenext_small'
|
||||||
|
num_classes: 1000
|
||||||
|
pretrained: False
|
||||||
|
ckpt_path: ''
|
||||||
|
keep_checkpoint_max: 10
|
||||||
|
val_interval: 2
|
||||||
|
ckpt_save_dir: './ckpt'
|
||||||
|
epoch_size: 350
|
||||||
|
dataset_sink_mode: True
|
||||||
|
amp_level: 'O3'
|
||||||
|
drop_path_rate: 0.1
|
||||||
|
|
||||||
|
# loss
|
||||||
|
loss: 'CE'
|
||||||
|
label_smoothing: 0.1
|
||||||
|
|
||||||
|
# lr scheduler
|
||||||
|
scheduler: 'cosine_decay'
|
||||||
|
min_lr: 1e-6
|
||||||
|
lr: 4e-3
|
||||||
|
warmup_epochs: 20
|
||||||
|
decay_rate: 0.1
|
||||||
|
decay_epochs: 330
|
||||||
|
|
||||||
|
# optimizer
|
||||||
|
opt: 'adamw'
|
||||||
|
filter_bias_and_bn: True
|
||||||
|
momentum: 0.9
|
||||||
|
weight_decay: 0.05
|
||||||
|
loss_scale: 1024
|
||||||
|
use_nesterov: False
|
|
@ -0,0 +1,62 @@
|
||||||
|
# system
|
||||||
|
mode: 0
|
||||||
|
distribute: True
|
||||||
|
num_parallel_workers: 8
|
||||||
|
val_while_train: True
|
||||||
|
|
||||||
|
# dataset
|
||||||
|
dataset: 'imagenet'
|
||||||
|
data_dir: '/path/to/imagenet'
|
||||||
|
shuffle: True
|
||||||
|
dataset_download: False
|
||||||
|
batch_size: 256
|
||||||
|
drop_remainder: True
|
||||||
|
val_split: val
|
||||||
|
|
||||||
|
# augmentation
|
||||||
|
image_resize: 256
|
||||||
|
scale: [0.08, 1.0]
|
||||||
|
ratio: [0.75, 1.333]
|
||||||
|
hflip: 0.5
|
||||||
|
interpolation: 'bicubic'
|
||||||
|
crop_pct: 0.875
|
||||||
|
color_jitter: 0.4
|
||||||
|
re_prob: 0.0
|
||||||
|
cutmix: 1.0
|
||||||
|
cutmix_prob: 0.0
|
||||||
|
auto_augment: 'randaug-m9-mstd0.5-inc1'
|
||||||
|
ema: True
|
||||||
|
ema_decay: 0.9995
|
||||||
|
|
||||||
|
# model
|
||||||
|
model: 'edgenext_x_small'
|
||||||
|
num_classes: 1000
|
||||||
|
pretrained: False
|
||||||
|
ckpt_path: ''
|
||||||
|
keep_checkpoint_max: 10
|
||||||
|
val_interval: 2
|
||||||
|
ckpt_save_dir: './ckpt'
|
||||||
|
epoch_size: 350
|
||||||
|
dataset_sink_mode: True
|
||||||
|
amp_level: 'O3'
|
||||||
|
drop_path_rate: 0.1
|
||||||
|
|
||||||
|
# loss
|
||||||
|
loss: 'CE'
|
||||||
|
label_smoothing: 0.1
|
||||||
|
|
||||||
|
# lr scheduler
|
||||||
|
scheduler: 'cosine_decay'
|
||||||
|
min_lr: 1e-6
|
||||||
|
lr: 4e-3
|
||||||
|
warmup_epochs: 20
|
||||||
|
decay_rate: 0.1
|
||||||
|
decay_epochs: 330
|
||||||
|
|
||||||
|
# optimizer
|
||||||
|
opt: 'adamw'
|
||||||
|
filter_bias_and_bn: True
|
||||||
|
momentum: 0.9
|
||||||
|
weight_decay: 0.05
|
||||||
|
loss_scale: 1024
|
||||||
|
use_nesterov: False
|
|
@ -0,0 +1,61 @@
|
||||||
|
# system
|
||||||
|
mode: 0
|
||||||
|
distribute: True
|
||||||
|
num_parallel_workers: 8
|
||||||
|
val_while_train: True
|
||||||
|
|
||||||
|
# dataset
|
||||||
|
dataset: 'imagenet'
|
||||||
|
data_dir: '/path/to/imagenet'
|
||||||
|
shuffle: True
|
||||||
|
dataset_download: False
|
||||||
|
batch_size: 256
|
||||||
|
drop_remainder: True
|
||||||
|
val_split: val
|
||||||
|
|
||||||
|
# augmentation
|
||||||
|
image_resize: 256
|
||||||
|
scale: [0.08, 1.0]
|
||||||
|
ratio: [0.75, 1.333]
|
||||||
|
hflip: 0.5
|
||||||
|
interpolation: 'bicubic'
|
||||||
|
crop_pct: 0.875
|
||||||
|
color_jitter: 0.4
|
||||||
|
re_prob: 0.0
|
||||||
|
cutmix: 1.0
|
||||||
|
cutmix_prob: 0.0
|
||||||
|
ema: True
|
||||||
|
ema_decay: 0.9995
|
||||||
|
|
||||||
|
# model
|
||||||
|
model: 'edgenext_xx_small'
|
||||||
|
num_classes: 1000
|
||||||
|
pretrained: False
|
||||||
|
ckpt_path: ''
|
||||||
|
keep_checkpoint_max: 10
|
||||||
|
val_interval: 2
|
||||||
|
ckpt_save_dir: './ckpt'
|
||||||
|
epoch_size: 350
|
||||||
|
dataset_sink_mode: True
|
||||||
|
amp_level: 'O2'
|
||||||
|
drop_path_rate: 0.0
|
||||||
|
|
||||||
|
# loss
|
||||||
|
loss: 'CE'
|
||||||
|
label_smoothing: 0.1
|
||||||
|
|
||||||
|
# lr scheduler
|
||||||
|
scheduler: 'cosine_decay'
|
||||||
|
min_lr: 1e-6
|
||||||
|
lr: 4e-3
|
||||||
|
warmup_epochs: 20
|
||||||
|
decay_rate: 0.1
|
||||||
|
decay_epochs: 330
|
||||||
|
|
||||||
|
# optimizer
|
||||||
|
opt: 'adamw'
|
||||||
|
filter_bias_and_bn: True
|
||||||
|
momentum: 0.9
|
||||||
|
weight_decay: 0.05
|
||||||
|
loss_scale: 1024
|
||||||
|
use_nesterov: False
|
|
@ -0,0 +1,100 @@
|
||||||
|
# EfficientNet
|
||||||
|
<!--- Guideline: please use url linked to the paper abstract in ArXiv instead of PDF for fast loading. -->
|
||||||
|
> [EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks](https://arxiv.org/abs/1905.11946)
|
||||||
|
|
||||||
|
## Introduction
|
||||||
|
<!--- Guideline: Introduce the model and architectures. Please cite if you use/adopt paper explanation from others. -->
|
||||||
|
<!--- Guideline: If an architecture table/figure is available in the paper, please put one here and cite for intuitive illustration. -->
|
||||||
|
|
||||||
|
Figure 1 shows four ways to scale up a model: along the width, depth, and resolution dimensions, or with a compound of the three. Scaling the model
|
||||||
|
along a single dimension tends to give sub-optimal performance. However, if the three methods are applied together, the model
|
||||||
|
is more likely to reach an optimal solution. Using neural architecture search, the best configurations for width scaling, depth scaling,
|
||||||
|
and resolution scaling can be found. EfficientNet achieves better performance on the ImageNet-1K dataset than previous methods.[[1](#references)]
|
||||||
|
|
||||||
|
<p align="center">
|
||||||
|
<img src="https://user-images.githubusercontent.com/77485245/225044036-d0344404-e86c-483c-971f-863ebe6decc6.jpeg" width=800 />
|
||||||
|
</p>
|
||||||
|
<p align="center">
|
||||||
|
<em>Figure 1. Model scaling methods of EfficientNet [<a href="#references">1</a>] </em>
|
||||||
|
</p>
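
The compound scaling rule can be written down in a few lines (a sketch only; the base coefficients below are the ones the paper reports from its small grid search and are shown here purely for illustration):

```python
# EfficientNet-style compound scaling: depth, width and input resolution are
# scaled together by a single coefficient phi.
alpha, beta, gamma = 1.2, 1.1, 1.15   # depth, width, resolution base multipliers (from the paper)

def compound_scale(phi: int):
    return alpha ** phi, beta ** phi, gamma ** phi

for phi in range(4):                   # phi = 0 corresponds to the B0 baseline
    d, w, r = compound_scale(phi)
    print(f"phi={phi}: depth x{d:.2f}, width x{w:.2f}, resolution x{r:.2f}")
```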
|
||||||
|
|
||||||
|
## Results
|
||||||
|
<!--- Guideline:
|
||||||
|
Table Format:
|
||||||
|
- Model: model name in lower case with _ seperator.
|
||||||
|
- Context: Training context denoted as {device}x{pieces}-{MS mode}, where mindspore mode can be G - graph mode or F - pynative mode with ms function. For example, D910x8-G is for training on 8 pieces of Ascend 910 NPU using graph mode.
|
||||||
|
- Top-1 and Top-5: Keep 2 digits after the decimal point.
|
||||||
|
- Params (M): # of model parameters in millions (10^6). Keep 2 digits after the decimal point
|
||||||
|
- Recipe: Training recipe/configuration linked to a yaml config file. Use absolute url path.
|
||||||
|
- Download: url of the pretrained model weights. Use absolute url path.
|
||||||
|
-->
|
||||||
|
|
||||||
|
Our reproduced model performance on ImageNet-1K is reported as follows.
|
||||||
|
|
||||||
|
<div align="center">
|
||||||
|
|
||||||
|
| Model | Context | Top-1 (%) | Top-5 (%) | Params (M) | Recipe | Download |
|
||||||
|
|-----------------|-----------|-----------|-----------|------------|------------------------------------------------------------------------------------------------------------|----------------------------------------------------------------------------------------------------|
|
||||||
|
| efficientnet_b0 | D910x64-G | 76.95 | 93.16 | 5.33 | [yaml](https://github.com/mindspore-lab/mindcv/blob/main/configs/efficientnet/efficientnet_b0_ascend.yaml) | [weights](https://download.mindspore.cn/toolkits/mindcv/efficientnet/efficientnet_b0-103ec70c.ckpt) |
|
||||||
|
|
||||||
|
</div>
|
||||||
|
|
||||||
|
#### Notes
|
||||||
|
|
||||||
|
- Context: Training context denoted as {device}x{pieces}-{MS mode}, where mindspore mode can be G - graph mode or F - pynative mode with ms function. For example, D910x8-G is for training on 8 pieces of Ascend 910 NPU using graph mode.
|
||||||
|
- Top-1 and Top-5: Accuracy reported on the validation set of ImageNet-1K.
|
||||||
|
|
||||||
|
## Quick Start
|
||||||
|
|
||||||
|
### Preparation
|
||||||
|
|
||||||
|
#### Installation
|
||||||
|
Please refer to the [installation instruction](https://github.com/mindspore-lab/mindcv#installation) in MindCV.
|
||||||
|
|
||||||
|
#### Dataset Preparation
|
||||||
|
Please download the [ImageNet-1K](https://www.image-net.org/challenges/LSVRC/2012/index.php) dataset for model training and validation.
|
||||||
|
|
||||||
|
### Training
|
||||||
|
<!--- Guideline: Please avoid using shell scripts in the command line. Python scripts preferred. -->
|
||||||
|
|
||||||
|
* Distributed Training
|
||||||
|
|
||||||
|
It is easy to reproduce the reported results with the pre-defined training recipe. For distributed training on multiple Ascend 910 devices, please run
|
||||||
|
|
||||||
|
```shell
|
||||||
|
# distributed training on multiple GPU/Ascend devices
|
||||||
|
mpirun -n 64 python train.py --config configs/efficientnet/efficientnet_b0_ascend.yaml --data_dir /path/to/imagenet
|
||||||
|
```
|
||||||
|
> If the script is executed by the root user, the `--allow-run-as-root` parameter must be added to `mpirun`.
|
||||||
|
|
||||||
|
Similarly, you can train the model on multiple GPU devices with the above `mpirun` command.
|
||||||
|
|
||||||
|
For detailed illustration of all hyper-parameters, please refer to [config.py](https://github.com/mindspore-lab/mindcv/blob/main/config.py).
|
||||||
|
|
||||||
|
**Note:** As the global batch size (batch_size x num_devices) is an important hyper-parameter, it is recommended to keep the global batch size unchanged for reproduction or adjust the learning rate linearly to a new global batch size.
|
||||||
|
|
||||||
|
* Standalone Training
|
||||||
|
|
||||||
|
If you want to train or finetune the model on a smaller dataset without distributed training, please run:
|
||||||
|
|
||||||
|
```shell
|
||||||
|
# standalone training on a CPU/GPU/Ascend device
|
||||||
|
python train.py --config configs/efficientnet/efficientnet_b0_ascend.yaml --data_dir /path/to/dataset --distribute False
|
||||||
|
```
|
||||||
|
|
||||||
|
### Validation
|
||||||
|
|
||||||
|
To validate the accuracy of the trained model, you can use `validate.py` and parse the checkpoint path with `--ckpt_path`.
|
||||||
|
|
||||||
|
```
|
||||||
|
python validate.py -c configs/efficientnet/efficientnet_b0_ascend.yaml --data_dir /path/to/imagenet --ckpt_path /path/to/ckpt
|
||||||
|
```
|
||||||
|
|
||||||
|
### Deployment
|
||||||
|
|
||||||
|
Please refer to the [deployment tutorial](https://github.com/mindspore-lab/mindcv/blob/main/tutorials/deployment.md) in MindCV.
|
||||||
|
|
||||||
|
## References
|
||||||
|
<!--- Guideline: Citation format GB/T 7714 is suggested. -->
|
||||||
|
|
||||||
|
[1] Tan M, Le Q. Efficientnet: Rethinking model scaling for convolutional neural networks[C]//International conference on machine learning. PMLR, 2019: 6105-6114.
|
|
@ -0,0 +1,55 @@
|
||||||
|
# system
|
||||||
|
mode: 0
|
||||||
|
distribute: True
|
||||||
|
num_parallel_workers: 8
|
||||||
|
val_while_train: True
|
||||||
|
|
||||||
|
# dataset
|
||||||
|
dataset: 'imagenet'
|
||||||
|
data_dir: '/path/to/imagenet'
|
||||||
|
shuffle: True
|
||||||
|
dataset_download: False
|
||||||
|
batch_size: 128
|
||||||
|
drop_remainder: True
|
||||||
|
|
||||||
|
# augmentation
|
||||||
|
image_resize: 224
|
||||||
|
scale: [0.08, 1.0]
|
||||||
|
ratio: [0.75, 1.333]
|
||||||
|
hflip: 0.5
|
||||||
|
interpolation: 'bicubic'
|
||||||
|
crop_pct: 0.875
|
||||||
|
color_jitter: [0.4, 0.4, 0.4]
|
||||||
|
auto_augment: 'autoaug'
|
||||||
|
|
||||||
|
# model
|
||||||
|
model: 'efficientnet_b0'
|
||||||
|
num_classes: 1000
|
||||||
|
pretrained: False
|
||||||
|
ckpt_path: ''
|
||||||
|
keep_checkpoint_max: 10
|
||||||
|
ckpt_save_dir: './ckpt'
|
||||||
|
epoch_size: 450
|
||||||
|
dataset_sink_mode: True
|
||||||
|
amp_level: 'O2'
|
||||||
|
|
||||||
|
# loss
|
||||||
|
loss: 'CE'
|
||||||
|
label_smoothing: 0.1
|
||||||
|
|
||||||
|
# lr scheduler
|
||||||
|
scheduler: 'cosine_decay'
|
||||||
|
min_lr: 1e-10
|
||||||
|
lr: 0.128
|
||||||
|
warmup_epochs: 5
|
||||||
|
decay_epochs: 445
|
||||||
|
|
||||||
|
# optimizer
|
||||||
|
opt: 'rmsprop'
|
||||||
|
filter_bias_and_bn: True
|
||||||
|
momentum: 0.9
|
||||||
|
weight_decay: 1e-5
|
||||||
|
loss_scale_type: 'dynamic'
|
||||||
|
drop_overflow_update: True
|
||||||
|
use_nesterov: False
|
||||||
|
eps: 1e-3
|
|
@ -0,0 +1,89 @@
|
||||||
|
# GoogLeNet
|
||||||
|
> [GoogLeNet: Going Deeper with Convolutions](https://arxiv.org/abs/1409.4842)
|
||||||
|
|
||||||
|
## Introduction
|
||||||
|
|
||||||
|
GoogLeNet is a deep learning architecture proposed by Christian Szegedy in 2014. Before it, AlexNet, VGG and other
architectures improved training results mainly by increasing network depth (the number of layers), but greater depth
also brings negative effects such as overfitting, vanishing gradients, and exploding gradients. The Inception module
improves training results from another angle: it uses computing resources more efficiently and extracts more features
for the same amount of computation, thereby improving the training results.[[1](#references)]
|
||||||
|
|
||||||
|
<p align="center">
|
||||||
|
<img src="https://user-images.githubusercontent.com/53842165/210749903-5ff23c0e-547f-487d-bb64-70b6e99031ea.jpg" width=180 />
|
||||||
|
</p>
|
||||||
|
<p align="center">
|
||||||
|
<em>Figure 1. Architecture of GoogLeNet [<a href="#references">1</a>] </em>
|
||||||
|
</p>
|
||||||
|
|
||||||
|
## Results
|
||||||
|
|
||||||
|
Our reproduced model performance on ImageNet-1K is reported as follows.
|
||||||
|
|
||||||
|
<div align="center">
|
||||||
|
|
||||||
|
| Model | Context | Top-1 (%) | Top-5 (%) | Params (M) | Recipe | Download |
|
||||||
|
|-----------|----------|-----------|-----------|------------|---------------------------------------------------------------------------------------------------|--------------------------------------------------------------------------------------------|
|
||||||
|
| GoogLeNet | D910x8-G | 72.68 | 90.89 | 6.99 | [yaml](https://github.com/mindspore-lab/mindcv/blob/main/configs/googlenet/googlenet_ascend.yaml) | [weights](https://download.mindspore.cn/toolkits/mindcv/googlenet/googlenet-5552fcd3.ckpt) |
|
||||||
|
|
||||||
|
</div>
|
||||||
|
|
||||||
|
#### Notes
|
||||||
|
|
||||||
|
- Context: Training context denoted as {device}x{pieces}-{MS mode}, where mindspore mode can be G - graph mode or F - pynative mode with ms function. For example, D910x8-G is for training on 8 pieces of Ascend 910 NPU using graph mode.
|
||||||
|
- Top-1 and Top-5: Accuracy reported on the validation set of ImageNet-1K.
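The weights listed in the Download column can also be loaded programmatically. A minimal sketch, assuming MindCV's `create_model` API (adjust the model name to the table entry you need):

```python
from mindcv.models import create_model

# Build GoogLeNet and load the pretrained ImageNet-1K weights listed in the table above.
# `pretrained=True` is assumed to download the checkpoint registered for this model.
network = create_model("googlenet", num_classes=1000, pretrained=True)
network.set_train(False)  # switch to evaluation mode before running inference
```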
|
||||||
|
|
||||||
|
## Quick Start
|
||||||
|
|
||||||
|
### Preparation
|
||||||
|
|
||||||
|
#### Installation
|
||||||
|
Please refer to the [installation instruction](https://github.com/mindspore-ecosystem/mindcv#installation) in MindCV.
|
||||||
|
|
||||||
|
#### Dataset Preparation
|
||||||
|
Please download the [ImageNet-1K](https://www.image-net.org/challenges/LSVRC/2012/index.php) dataset for model training and validation.
|
||||||
|
|
||||||
|
### Training
|
||||||
|
|
||||||
|
* Distributed Training
|
||||||
|
|
||||||
|
It is easy to reproduce the reported results with the pre-defined training recipe. For distributed training on multiple Ascend 910 devices, please run
|
||||||
|
|
||||||
|
```shell
|
||||||
|
# distributed training on multiple GPU/Ascend devices
|
||||||
|
mpirun -n 8 python train.py --config configs/googlenet/googlenet_ascend.yaml --data_dir /path/to/imagenet
|
||||||
|
```
|
||||||
|
|
||||||
|
> If the script is executed by the root user, the `--allow-run-as-root` parameter must be added to `mpirun`.
|
||||||
|
|
||||||
|
Similarly, you can train the model on multiple GPU devices with the above `mpirun` command.
|
||||||
|
|
||||||
|
For detailed illustration of all hyper-parameters, please refer to [config.py](https://github.com/mindspore-lab/mindcv/blob/main/config.py).
|
||||||
|
|
||||||
|
**Note:** As the global batch size (batch_size x num_devices) is an important hyper-parameter, it is recommended to keep the global batch size unchanged for reproduction or adjust the learning rate linearly to a new global batch size.
|
||||||
|
|
||||||
|
* Standalone Training
|
||||||
|
|
||||||
|
If you want to train or finetune the model on a smaller dataset without distributed training, please run:
|
||||||
|
|
||||||
|
```shell
|
||||||
|
# standalone training on a CPU/GPU/Ascend device
|
||||||
|
python train.py --config configs/googlenet/googlenet_ascend.yaml --data_dir /path/to/dataset --distribute False
|
||||||
|
```
|
||||||
|
|
||||||
|
### Validation
|
||||||
|
|
||||||
|
To validate the accuracy of the trained model, you can use `validate.py` and parse the checkpoint path with `--ckpt_path`.
|
||||||
|
|
||||||
|
```shell
|
||||||
|
python validate.py -c configs/googlenet/googlenet_ascend.yaml --data_dir /path/to/imagenet --ckpt_path /path/to/ckpt
|
||||||
|
```
|
||||||
|
|
||||||
|
### Deployment
|
||||||
|
|
||||||
|
Please refer to the [deployment tutorial](https://github.com/mindspore-lab/mindcv/blob/main/tutorials/deployment.md) in MindCV.
|
||||||
|
|
||||||
|
## References
|
||||||
|
|
||||||
|
[1] Szegedy C, Liu W, Jia Y, et al. Going deeper with convolutions[C]//Proceedings of the IEEE conference on computer vision and pattern recognition. 2015: 1-9.
|
|
@ -0,0 +1,51 @@
|
||||||
|
# system
|
||||||
|
mode: 0
|
||||||
|
distribute: True
|
||||||
|
num_parallel_workers: 8
|
||||||
|
val_while_train: True
|
||||||
|
|
||||||
|
# dataset
|
||||||
|
dataset: 'imagenet'
|
||||||
|
data_dir: '/path/to/imagenet'
|
||||||
|
shuffle: True
|
||||||
|
dataset_download: False
|
||||||
|
batch_size: 32
|
||||||
|
drop_remainder: True
|
||||||
|
|
||||||
|
# augmentation
|
||||||
|
image_resize: 224
|
||||||
|
scale: [0.08, 1.0]
|
||||||
|
ratio: [0.75, 1.333]
|
||||||
|
hflip: 0.5
|
||||||
|
interpolation: 'bilinear'
|
||||||
|
crop_pct: 0.875
|
||||||
|
|
||||||
|
# model
|
||||||
|
model: 'googlenet'
|
||||||
|
num_classes: 1000
|
||||||
|
pretrained: False
|
||||||
|
ckpt_path: ''
|
||||||
|
keep_checkpoint_max: 10
|
||||||
|
ckpt_save_dir: './ckpt'
|
||||||
|
epoch_size: 150
|
||||||
|
dataset_sink_mode: True
|
||||||
|
amp_level: 'O0'
|
||||||
|
|
||||||
|
# loss
|
||||||
|
loss: 'CE'
|
||||||
|
label_smoothing: 0.1
|
||||||
|
|
||||||
|
# lr scheduler
|
||||||
|
scheduler: 'cosine_decay'
|
||||||
|
lr: 0.045
|
||||||
|
min_lr: 0.0
|
||||||
|
decay_epochs: 145
|
||||||
|
warmup_epochs: 5
|
||||||
|
|
||||||
|
# optimizer
|
||||||
|
opt: 'momentum'
|
||||||
|
filter_bias_and_bn: True
|
||||||
|
momentum: 0.9
|
||||||
|
weight_decay: 0.00004
|
||||||
|
loss_scale: 1024
|
||||||
|
use_nesterov: False
|
|
@ -0,0 +1,100 @@
|
||||||
|
# HRNet
|
||||||
|
<!--- Guideline: use url linked to abstract in ArXiv instead of PDF for fast loading. -->
|
||||||
|
|
||||||
|
> [Deep High-Resolution Representation Learning for Visual Recognition](https://arxiv.org/abs/1908.07919)
|
||||||
|
|
||||||
|
## Introduction
|
||||||
|
<!--- Guideline: Introduce the model and architectures. Cite if you use/adopt paper explanation from others. -->
|
||||||
|
|
||||||
|
High-resolution representations are essential for position-sensitive vision problems, such as human pose estimation, semantic segmentation, and object detection. Existing state-of-the-art frameworks first encode the input image as a low-resolution representation through a subnetwork formed by connecting high-to-low resolution convolutions (e.g., ResNet, VGGNet), and then recover the high-resolution representation from the encoded low-resolution representation. Instead, the proposed network, named High-Resolution Network (HRNet), maintains high-resolution representations throughout the whole process. It has two key characteristics: (i) it connects the high-to-low resolution convolution streams in parallel; (ii) it repeatedly exchanges information across resolutions. The benefit is that the resulting representation is semantically richer and spatially more precise. HRNet proves superior in a wide range of applications, including human pose estimation, semantic segmentation, and object detection, suggesting that it is a stronger backbone for computer vision problems.
|
||||||
|
|
||||||
|
<!--- Guideline: If an architecture table/figure is available in the paper, put one here and cite for intuitive illustration. -->
|
||||||
|
|
||||||
|
<p align="center">
|
||||||
|
<img src="https://user-images.githubusercontent.com/8342575/218354682-4256e17e-bb69-4e51-8bb9-a08fc29087c4.png" width=800 />
|
||||||
|
</p>
|
||||||
|
<p align="center">
|
||||||
|
<em> Figure 1. Architecture of HRNet [<a href="#references">1</a>] </em>
|
||||||
|
</p>
|
||||||
|
|
||||||
|
## Results
|
||||||
|
<!--- Guideline:
|
||||||
|
Table Format:
|
||||||
|
- Model: model name in lower case with _ seperator.
|
||||||
|
- Context: Training context denoted as {device}x{pieces}-{MS mode}, where mindspore mode can be G - graph mode or F - pynative mode with ms function. For example, D910x8-G is for training on 8 pieces of Ascend 910 NPU using graph mode.
|
||||||
|
- Top-1 and Top-5: Keep 2 digits after the decimal point.
|
||||||
|
- Params (M): # of model parameters in millions (10^6). Keep 2 digits after the decimal point
|
||||||
|
- Recipe: Training recipe/configuration linked to a yaml config file. Use absolute url path.
|
||||||
|
- Download: url of the pretrained model weights. Use absolute url path.
|
||||||
|
-->
|
||||||
|
|
||||||
|
Our reproduced model performance on ImageNet-1K is reported as follows.
|
||||||
|
|
||||||
|
<div align="center">
|
||||||
|
|
||||||
|
| Model | Context | Top-1 (%) | Top-5 (%) | Params (M) | Recipe | Download |
|
||||||
|
|-----------|----------|-----------|-----------|------------|-----------------------------------------------------------------------------------------------|-----------------------------------------------------------------------------------|
|
||||||
|
| hrnet_w32 | D910x8-G | 80.64 | 95.44 | 41.30 | [yaml](https://github.com/mindspore-lab/mindcv/blob/main/configs/hrnet/hrnet_w32_ascend.yaml) | [weights](https://download.mindspore.cn/toolkits/mindcv/hrnet/hrnet_w32-cc4fbd91.ckpt) |
|
||||||
|
| hrnet_w48 | D910x8-G | 81.19 | 95.69 | 77.57 | [yaml](https://github.com/mindspore-lab/mindcv/blob/main/configs/hrnet/hrnet_w48_ascend.yaml) | [weights](https://download.mindspore.cn/toolkits/mindcv/hrnet/hrnet_w48-2e3399cd.ckpt) |
|
||||||
|
|
||||||
|
|
||||||
|
</div>
|
||||||
|
|
||||||
|
#### Notes
|
||||||
|
- Context: Training context denoted as {device}x{pieces}-{MS mode}, where mindspore mode can be G - graph mode or F - pynative mode with ms function. For example, D910x8-G is for training on 8 pieces of Ascend 910 NPU using graph mode.
|
||||||
|
- Top-1 and Top-5: Accuracy reported on the validation set of ImageNet-1K.
|
||||||
|
|
||||||
|
|
||||||
|
## Quick Start
|
||||||
|
### Preparation
|
||||||
|
|
||||||
|
#### Installation
|
||||||
|
Please refer to the [installation instruction](https://github.com/mindspore-lab/mindcv#installation) in MindCV.
|
||||||
|
|
||||||
|
#### Dataset Preparation
|
||||||
|
Please download the [ImageNet-1K](https://www.image-net.org/challenges/LSVRC/2012/index.php) dataset for model training and validation.
|
||||||
|
|
||||||
|
### Training
|
||||||
|
<!--- Guideline: Avoid using shell script in the command line. Python script preferred. -->
|
||||||
|
|
||||||
|
* Distributed Training
|
||||||
|
|
||||||
|
It is easy to reproduce the reported results with the pre-defined training recipe. For distributed training on multiple Ascend 910 devices, please run
|
||||||
|
|
||||||
|
```shell
|
||||||
|
# distributed training on multiple GPU/Ascend devices
|
||||||
|
mpirun -n 8 python train.py --config configs/hrnet/hrnet_w32_ascend.yaml --data_dir /path/to/imagenet
|
||||||
|
```
|
||||||
|
> If the script is executed by the root user, the `--allow-run-as-root` parameter must be added to `mpirun`.
|
||||||
|
|
||||||
|
Similarly, you can train the model on multiple GPU devices with the above `mpirun` command.
|
||||||
|
|
||||||
|
For detailed illustration of all hyper-parameters, please refer to [config.py](https://github.com/mindspore-lab/mindcv/blob/main/config.py).
|
||||||
|
|
||||||
|
**Note:** As the global batch size (batch_size x num_devices) is an important hyper-parameter, it is recommended to keep the global batch size unchanged for reproduction or adjust the learning rate linearly to a new global batch size.
|
||||||
|
|
||||||
|
* Standalone Training
|
||||||
|
|
||||||
|
If you want to train or finetune the model on a smaller dataset without distributed training, please run:
|
||||||
|
|
||||||
|
```shell
|
||||||
|
# standalone training on a CPU/GPU/Ascend device
|
||||||
|
python train.py --config configs/hrnet/hrnet_w32_ascend.yaml --data_dir /path/to/dataset --distribute False
|
||||||
|
```
|
||||||
|
|
||||||
|
### Validation
|
||||||
|
|
||||||
|
To validate the accuracy of the trained model, you can use `validate.py` and parse the checkpoint path with `--ckpt_path`.
|
||||||
|
|
||||||
|
```shell
|
||||||
|
python validate.py -c configs/hrnet/hrnet_w32_ascend.yaml --data_dir /path/to/imagenet --ckpt_path /path/to/ckpt
|
||||||
|
```
|
||||||
|
|
||||||
|
### Deployment
|
||||||
|
|
||||||
|
To deploy online inference services with the trained model efficiently, please refer to the [deployment tutorial](https://github.com/mindspore-lab/mindcv/blob/main/tutorials/deployment.md).
|
||||||
|
|
||||||
|
## References
|
||||||
|
<!--- Guideline: Citation format GB/T 7714 is suggested. -->
|
||||||
|
|
||||||
|
[1] Jingdong Wang, Ke Sun, Tianheng Cheng, et al. Deep High-Resolution Representation Learning for Visual Recognition[J]. arXiv preprint arXiv:1908.07919, 2019.
|
|
@ -0,0 +1,55 @@
|
||||||
|
# system
|
||||||
|
mode: 0
|
||||||
|
distribute: True
|
||||||
|
num_parallel_workers: 8
|
||||||
|
val_while_train: True
|
||||||
|
val_interval: 1
|
||||||
|
|
||||||
|
# dataset
|
||||||
|
dataset: "imagenet"
|
||||||
|
data_dir: "/path/to/imagenet"
|
||||||
|
shuffle: True
|
||||||
|
dataset_download: False
|
||||||
|
batch_size: 128
|
||||||
|
drop_remainder: True
|
||||||
|
|
||||||
|
# augmentation
|
||||||
|
image_resize: 224
|
||||||
|
scale: [0.08, 1.0]
|
||||||
|
ratio: [0.75, 1.333]
|
||||||
|
hflip: 0.5
|
||||||
|
interpolation: "bilinear"
|
||||||
|
auto_augment: "randaug-m7-mstd0.5"
|
||||||
|
re_prob: 0.1
|
||||||
|
mixup: 0.2
|
||||||
|
cutmix: 1.0
|
||||||
|
cutmix_prob: 1.0
|
||||||
|
|
||||||
|
# model
|
||||||
|
model: "hrnet_w32"
|
||||||
|
num_classes: 1000
|
||||||
|
pretrained: False
|
||||||
|
ckpt_path: ""
|
||||||
|
keep_checkpoint_max: 5
|
||||||
|
ckpt_save_policy: "top_k"
|
||||||
|
ckpt_save_dir: "./ckpt"
|
||||||
|
epoch_size: 300
|
||||||
|
dataset_sink_mode: True
|
||||||
|
amp_level: "O2"
|
||||||
|
|
||||||
|
# loss
|
||||||
|
loss: "CE"
|
||||||
|
label_smoothing: 0.1
|
||||||
|
|
||||||
|
# lr scheduler
|
||||||
|
scheduler: "cosine_decay"
|
||||||
|
min_lr: 0.00001
|
||||||
|
lr: 0.001
|
||||||
|
warmup_epochs: 20
|
||||||
|
decay_epochs: 280
|
||||||
|
|
||||||
|
# optimizer
|
||||||
|
opt: 'adamw'
|
||||||
|
weight_decay: 0.05
|
||||||
|
loss_scale: 1024
|
||||||
|
filter_bias_and_bn: True
|
|
@ -0,0 +1,55 @@
|
||||||
|
# system
|
||||||
|
mode: 0
|
||||||
|
distribute: True
|
||||||
|
num_parallel_workers: 8
|
||||||
|
val_while_train: True
|
||||||
|
val_interval: 1
|
||||||
|
|
||||||
|
# dataset
|
||||||
|
dataset: "imagenet"
|
||||||
|
data_dir: "/path/to/imagenet"
|
||||||
|
shuffle: True
|
||||||
|
dataset_download: False
|
||||||
|
batch_size: 128
|
||||||
|
drop_remainder: True
|
||||||
|
|
||||||
|
# augmentation
|
||||||
|
image_resize: 224
|
||||||
|
scale: [0.08, 1.0]
|
||||||
|
ratio: [0.75, 1.333]
|
||||||
|
hflip: 0.5
|
||||||
|
interpolation: "bilinear"
|
||||||
|
auto_augment: "randaug-m7-mstd0.5"
|
||||||
|
re_prob: 0.1
|
||||||
|
mixup: 0.2
|
||||||
|
cutmix: 1.0
|
||||||
|
cutmix_prob: 1.0
|
||||||
|
|
||||||
|
# model
|
||||||
|
model: "hrnet_w48"
|
||||||
|
num_classes: 1000
|
||||||
|
pretrained: False
|
||||||
|
ckpt_path: ""
|
||||||
|
keep_checkpoint_max: 5
|
||||||
|
ckpt_save_policy: "top_k"
|
||||||
|
ckpt_save_dir: "./ckpt"
|
||||||
|
epoch_size: 300
|
||||||
|
dataset_sink_mode: True
|
||||||
|
amp_level: "O2"
|
||||||
|
|
||||||
|
# loss
|
||||||
|
loss: "CE"
|
||||||
|
label_smoothing: 0.1
|
||||||
|
|
||||||
|
# lr scheduler
|
||||||
|
scheduler: "cosine_decay"
|
||||||
|
min_lr: 0.00001
|
||||||
|
lr: 0.001
|
||||||
|
warmup_epochs: 20
|
||||||
|
decay_epochs: 280
|
||||||
|
|
||||||
|
# optimizer
|
||||||
|
opt: 'adamw'
|
||||||
|
weight_decay: 0.05
|
||||||
|
loss_scale: 1024
|
||||||
|
filter_bias_and_bn: True
|
|
@ -0,0 +1,90 @@
|
||||||
|
# InceptionV3
|
||||||
|
> [InceptionV3: Rethinking the Inception Architecture for Computer Vision](https://arxiv.org/pdf/1512.00567.pdf)
|
||||||
|
|
||||||
|
## Introduction
|
||||||
|
|
||||||
|
InceptionV3 is an upgraded version of GoogLeNet. One of its most important improvements is factorization, which
decomposes a 7x7 convolution into two one-dimensional convolutions (1x7 and 7x1), and likewise a 3x3 convolution into
1x3 and 3x1. This both accelerates computation (the saved compute can be used to deepen the network) and splits one
convolution into two, further increasing the network depth and its nonlinearity. It is also worth noting that the
network input grows from 224x224 to 299x299, and the 35x35/17x17/8x8 modules are designed more carefully. In addition,
V3 adds batch normalization, which makes the model converge faster, acts as a partial regularizer, and effectively
reduces overfitting.[[1](#references)]
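As a rough illustration of the factorization idea (a sketch only, not the repository's implementation), a 7x7 convolution can be replaced by a 1x7 convolution followed by a 7x1 convolution:

```python
import mindspore.nn as nn

channels = 192  # illustrative channel count

# Standard 7x7 convolution: 7 * 7 * C * C weights.
conv7x7 = nn.Conv2d(channels, channels, kernel_size=7, pad_mode="same")

# Factorized version with the same receptive field: 1x7 then 7x1, i.e. 2 * 7 * C * C weights,
# roughly 3.5x fewer parameters and multiply-adds, plus an extra nonlinearity in between.
factorized = nn.SequentialCell(
    nn.Conv2d(channels, channels, kernel_size=(1, 7), pad_mode="same"),
    nn.ReLU(),
    nn.Conv2d(channels, channels, kernel_size=(7, 1), pad_mode="same"),
)
```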
|
||||||
|
|
||||||
|
<p align="center">
|
||||||
|
<img src="https://user-images.githubusercontent.com/53842165/210745725-736bd456-4d31-4f48-b958-75a53cf30e99.jpg" width=800 />
|
||||||
|
</p>
|
||||||
|
<p align="center">
|
||||||
|
<em>Figure 1. Architecture of InceptionV3 [<a href="#references">1</a>] </em>
|
||||||
|
</p>
|
||||||
|
|
||||||
|
## Results
|
||||||
|
|
||||||
|
Our reproduced model performance on ImageNet-1K is reported as follows.
|
||||||
|
|
||||||
|
<div align="center">
|
||||||
|
|
||||||
|
| Model | Context | Top-1 (%) | Top-5 (%) | Params (M) | Recipe | Download |
|
||||||
|
|--------------|----------|-----------|-----------|------------|---------------------------------------------------------------------------------------------------------|---------------------------------------------------------------------------------------------|
|
||||||
|
| Inception_v3 | D910x8-G | 79.11 | 94.40 | 27.20 | [yaml](https://github.com/mindspore-lab/mindcv/blob/main/configs/inception_v3/inception_v3_ascend.yaml) | [weights](https://download.mindspore.cn/toolkits/mindcv/inception_v3/inception_v3-38f67890.ckpt) |
|
||||||
|
|
||||||
|
</div>
|
||||||
|
|
||||||
|
#### Notes
|
||||||
|
|
||||||
|
- Context: Training context denoted as {device}x{pieces}-{MS mode}, where mindspore mode can be G - graph mode or F - pynative mode with ms function. For example, D910x8-G is for training on 8 pieces of Ascend 910 NPU using graph mode.
|
||||||
|
- Top-1 and Top-5: Accuracy reported on the validation set of ImageNet-1K.
|
||||||
|
|
||||||
|
## Quick Start
|
||||||
|
|
||||||
|
### Preparation
|
||||||
|
|
||||||
|
#### Installation
|
||||||
|
Please refer to the [installation instruction](https://github.com/mindspore-ecosystem/mindcv#installation) in MindCV.
|
||||||
|
|
||||||
|
#### Dataset Preparation
|
||||||
|
Please download the [ImageNet-1K](https://www.image-net.org/challenges/LSVRC/2012/index.php) dataset for model training and validation.
|
||||||
|
|
||||||
|
### Training
|
||||||
|
|
||||||
|
* Distributed Training
|
||||||
|
|
||||||
|
It is easy to reproduce the reported results with the pre-defined training recipe. For distributed training on multiple Ascend 910 devices, please run
|
||||||
|
|
||||||
|
```shell
|
||||||
|
# distributed training on multiple GPU/Ascend devices
|
||||||
|
mpirun -n 8 python train.py --config configs/inception_v3/inception_v3_ascend.yaml --data_dir /path/to/imagenet
|
||||||
|
```
|
||||||
|
|
||||||
|
> If the script is executed by the root user, the `--allow-run-as-root` parameter must be added to `mpirun`.
|
||||||
|
|
||||||
|
Similarly, you can train the model on multiple GPU devices with the above `mpirun` command.
|
||||||
|
|
||||||
|
For detailed illustration of all hyper-parameters, please refer to [config.py](https://github.com/mindspore-lab/mindcv/blob/main/config.py).
|
||||||
|
|
||||||
|
**Note:** As the global batch size (batch_size x num_devices) is an important hyper-parameter, it is recommended to keep the global batch size unchanged for reproduction or adjust the learning rate linearly to a new global batch size.
|
||||||
|
|
||||||
|
* Standalone Training
|
||||||
|
|
||||||
|
If you want to train or finetune the model on a smaller dataset without distributed training, please run:
|
||||||
|
|
||||||
|
```shell
|
||||||
|
# standalone training on a CPU/GPU/Ascend device
|
||||||
|
python train.py --config configs/inception_v3/inception_v3_ascend.yaml --data_dir /path/to/dataset --distribute False
|
||||||
|
```
|
||||||
|
|
||||||
|
### Validation
|
||||||
|
|
||||||
|
To validate the accuracy of the trained model, you can use `validate.py` and parse the checkpoint path with `--ckpt_path`.
|
||||||
|
|
||||||
|
```shell
|
||||||
|
python validate.py -c configs/inception_v3/inception_v3_ascend.yaml --data_dir /path/to/imagenet --ckpt_path /path/to/ckpt
|
||||||
|
```
|
||||||
|
|
||||||
|
### Deployment
|
||||||
|
|
||||||
|
Please refer to the [deployment tutorial](https://github.com/mindspore-lab/mindcv/blob/main/tutorials/deployment.md) in MindCV.
|
||||||
|
|
||||||
|
## References
|
||||||
|
|
||||||
|
[1] Szegedy C, Vanhoucke V, Ioffe S, et al. Rethinking the inception architecture for computer vision[C]//Proceedings of the IEEE conference on computer vision and pattern recognition. 2016: 2818-2826.
|
|
@ -0,0 +1,54 @@
|
||||||
|
# system
|
||||||
|
mode: 0
|
||||||
|
distribute: True
|
||||||
|
num_parallel_workers: 8
|
||||||
|
val_while_train: True
|
||||||
|
|
||||||
|
# dataset
|
||||||
|
dataset: 'imagenet'
|
||||||
|
data_dir: '/path/to/imagenet'
|
||||||
|
shuffle: True
|
||||||
|
dataset_download: False
|
||||||
|
batch_size: 32
|
||||||
|
drop_remainder: True
|
||||||
|
|
||||||
|
# augmentation
|
||||||
|
image_resize: 299
|
||||||
|
scale: [0.08, 1.0]
|
||||||
|
ratio: [0.75, 1.333]
|
||||||
|
hflip: 0.5
|
||||||
|
interpolation: 'bilinear'
|
||||||
|
auto_augment: 'autoaug'
|
||||||
|
re_prob: 0.25
|
||||||
|
crop_pct: 0.875
|
||||||
|
|
||||||
|
# model
|
||||||
|
model: 'inception_v3'
|
||||||
|
num_classes: 1000
|
||||||
|
pretrained: False
|
||||||
|
ckpt_path: ''
|
||||||
|
keep_checkpoint_max: 10
|
||||||
|
ckpt_save_dir: './ckpt'
|
||||||
|
epoch_size: 200
|
||||||
|
dataset_sink_mode: True
|
||||||
|
amp_level: 'O0'
|
||||||
|
aux_factor: 0.1
|
||||||
|
|
||||||
|
# loss config
|
||||||
|
loss: 'CE'
|
||||||
|
label_smoothing: 0.1
|
||||||
|
|
||||||
|
# lr scheduler
|
||||||
|
scheduler: 'warmup_cosine_decay'
|
||||||
|
lr: 0.045
|
||||||
|
min_lr: 0.0
|
||||||
|
decay_epochs: 195
|
||||||
|
warmup_epochs: 5
|
||||||
|
|
||||||
|
# optimizer
|
||||||
|
opt: 'momentum'
|
||||||
|
filter_bias_and_bn: True
|
||||||
|
momentum: 0.9
|
||||||
|
weight_decay: 0.00004
|
||||||
|
loss_scale: 1024
|
||||||
|
use_nesterov: False
|
|
@ -0,0 +1,87 @@
|
||||||
|
# InceptionV4
|
||||||
|
> [InceptionV4: Inception-v4, Inception-ResNet and the Impact of Residual Connections on Learning](https://arxiv.org/pdf/1602.07261.pdf)
|
||||||
|
|
||||||
|
## Introduction
|
||||||
|
|
||||||
|
InceptionV4 studies whether combining the Inception module with residual connections brings further improvement. The
authors find that residual connections greatly accelerate training and also improve performance, yielding the
Inception-ResNet v2 network. They additionally design a deeper and more optimized Inception v4 model that achieves
performance comparable to Inception-ResNet v2.[[1](#references)]
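A toy sketch of the residual-connection idea applied to an Inception-style block (illustrative only; the 0.2 residual scaling is an assumption borrowed from the Inception-ResNet paper, not this recipe):

```python
import mindspore.nn as nn

class ResidualInception(nn.Cell):
    """Wrap any inception-style block with a scaled shortcut connection."""
    def __init__(self, block: nn.Cell, scale: float = 0.2):
        super().__init__()
        self.block = block
        self.scale = scale  # scaling the residual branch stabilizes training of very deep nets

    def construct(self, x):
        return x + self.scale * self.block(x)
```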
|
||||||
|
|
||||||
|
<p align="center">
|
||||||
|
<img src="https://user-images.githubusercontent.com/53842165/210749903-5ff23c0e-547f-487d-bb64-70b6e99031ea.jpg" width=500 />
|
||||||
|
</p>
|
||||||
|
<p align="center">
|
||||||
|
<em>Figure 1. Architecture of InceptionV4 [<a href="#references">1</a>] </em>
|
||||||
|
</p>
|
||||||
|
|
||||||
|
## Results
|
||||||
|
|
||||||
|
Our reproduced model performance on ImageNet-1K is reported as follows.
|
||||||
|
|
||||||
|
<div align="center">
|
||||||
|
|
||||||
|
| Model | Context | Top-1 (%) | Top-5 (%) | Params (M) | Recipe | Download |
|
||||||
|
|--------------|----------|-----------|-----------|------------|---------------------------------------------------------------------------------------------------------|---------------------------------------------------------------------------------------------|
|
||||||
|
| Inception_v4 | D910x8-G | 80.88 | 95.34 | 42.74 | [yaml](https://github.com/mindspore-lab/mindcv/blob/main/configs/inception_v4/inception_v4_ascend.yaml) | [weights](https://download.mindspore.cn/toolkits/mindcv/inception_v4/inception_v4-db9c45b3.ckpt) |
|
||||||
|
|
||||||
|
</div>
|
||||||
|
|
||||||
|
#### Notes
|
||||||
|
|
||||||
|
- Context: Training context denoted as {device}x{pieces}-{MS mode}, where mindspore mode can be G - graph mode or F - pynative mode with ms function. For example, D910x8-G is for training on 8 pieces of Ascend 910 NPU using graph mode.
|
||||||
|
- Top-1 and Top-5: Accuracy reported on the validation set of ImageNet-1K.
|
||||||
|
|
||||||
|
## Quick Start
|
||||||
|
|
||||||
|
### Preparation
|
||||||
|
|
||||||
|
#### Installation
|
||||||
|
Please refer to the [installation instruction](https://github.com/mindspore-ecosystem/mindcv#installation) in MindCV.
|
||||||
|
|
||||||
|
#### Dataset Preparation
|
||||||
|
Please download the [ImageNet-1K](https://www.image-net.org/challenges/LSVRC/2012/index.php) dataset for model training and validation.
|
||||||
|
|
||||||
|
### Training
|
||||||
|
|
||||||
|
* Distributed Training
|
||||||
|
|
||||||
|
It is easy to reproduce the reported results with the pre-defined training recipe. For distributed training on multiple Ascend 910 devices, please run
|
||||||
|
|
||||||
|
```shell
|
||||||
|
# distributed training on multiple GPU/Ascend devices
|
||||||
|
mpirun -n 8 python train.py --config configs/inception_v4/inception_v4_ascend.yaml --data_dir /path/to/imagenet
|
||||||
|
```
|
||||||
|
|
||||||
|
> If the script is executed by the root user, the `--allow-run-as-root` parameter must be added to `mpirun`.
|
||||||
|
|
||||||
|
Similarly, you can train the model on multiple GPU devices with the above `mpirun` command.
|
||||||
|
|
||||||
|
For detailed illustration of all hyper-parameters, please refer to [config.py](https://github.com/mindspore-lab/mindcv/blob/main/config.py).
|
||||||
|
|
||||||
|
**Note:** As the global batch size (batch_size x num_devices) is an important hyper-parameter, it is recommended to keep the global batch size unchanged for reproduction or adjust the learning rate linearly to a new global batch size.
|
||||||
|
|
||||||
|
* Standalone Training
|
||||||
|
|
||||||
|
If you want to train or finetune the model on a smaller dataset without distributed training, please run:
|
||||||
|
|
||||||
|
```shell
|
||||||
|
# standalone training on a CPU/GPU/Ascend device
|
||||||
|
python train.py --config configs/inception_v4/inception_v4_ascend.yaml --data_dir /path/to/dataset --distribute False
|
||||||
|
```
|
||||||
|
|
||||||
|
### Validation
|
||||||
|
|
||||||
|
To validate the accuracy of the trained model, you can use `validate.py` and parse the checkpoint path with `--ckpt_path`.
|
||||||
|
|
||||||
|
```shell
|
||||||
|
python validate.py -c configs/inception_v4/inception_v4_ascend.yaml --data_dir /path/to/imagenet --ckpt_path /path/to/ckpt
|
||||||
|
```
|
||||||
|
|
||||||
|
### Deployment
|
||||||
|
|
||||||
|
Please refer to the [deployment tutorial](https://github.com/mindspore-lab/mindcv/blob/main/tutorials/deployment.md) in MindCV.
|
||||||
|
|
||||||
|
## References
|
||||||
|
|
||||||
|
[1] Szegedy C, Ioffe S, Vanhoucke V, et al. Inception-v4, inception-resnet and the impact of residual connections on learning[C]//Thirty-first AAAI conference on artificial intelligence. 2017.
|
|
@ -0,0 +1,53 @@
|
||||||
|
# system
|
||||||
|
mode: 0
|
||||||
|
distribute: True
|
||||||
|
num_parallel_workers: 8
|
||||||
|
val_while_train: True
|
||||||
|
|
||||||
|
# dataset
|
||||||
|
dataset: 'imagenet'
|
||||||
|
data_dir: '/path/to/imagenet'
|
||||||
|
shuffle: True
|
||||||
|
dataset_download: False
|
||||||
|
batch_size: 32
|
||||||
|
drop_remainder: True
|
||||||
|
|
||||||
|
# augmentation
|
||||||
|
image_resize: 299
|
||||||
|
scale: [0.08, 1.0]
|
||||||
|
ratio: [0.75, 1.333]
|
||||||
|
hflip: 0.5
|
||||||
|
interpolation: 'bilinear'
|
||||||
|
auto_augment: 'autoaug'
|
||||||
|
re_prob: 0.25
|
||||||
|
crop_pct: 0.875
|
||||||
|
|
||||||
|
# model
|
||||||
|
model: 'inception_v4'
|
||||||
|
num_classes: 1000
|
||||||
|
pretrained: False
|
||||||
|
ckpt_path: ''
|
||||||
|
keep_checkpoint_max: 10
|
||||||
|
ckpt_save_dir: './ckpt'
|
||||||
|
epoch_size: 200
|
||||||
|
dataset_sink_mode: True
|
||||||
|
amp_level: 'O2'
|
||||||
|
|
||||||
|
# loss
|
||||||
|
loss: 'CE'
|
||||||
|
label_smoothing: 0.1
|
||||||
|
|
||||||
|
# lr scheduler
|
||||||
|
scheduler: 'warmup_cosine_decay'
|
||||||
|
lr: 0.045
|
||||||
|
min_lr: 0.0
|
||||||
|
decay_epochs: 195
|
||||||
|
warmup_epochs: 5
|
||||||
|
|
||||||
|
# optimizer
|
||||||
|
opt: 'momentum'
|
||||||
|
filter_bias_and_bn: True
|
||||||
|
momentum: 0.9
|
||||||
|
weight_decay: 0.00004
|
||||||
|
loss_scale: 1024
|
||||||
|
use_nesterov: False
|
|
@ -0,0 +1,91 @@
|
||||||
|
# MixNet
|
||||||
|
> [MixConv: Mixed Depthwise Convolutional Kernels](https://arxiv.org/abs/1907.09595)
|
||||||
|
|
||||||
|
## Introduction
|
||||||
|
|
||||||
|
Depthwise convolution is becoming increasingly popular in modern efficient ConvNets, but its kernel size is often
|
||||||
|
overlooked. In this paper, the authors systematically study the impact of different kernel sizes, and observe that
|
||||||
|
combining the benefits of multiple kernel sizes can lead to better accuracy and efficiency. Based on this observation,
|
||||||
|
the authors propose a new mixed depthwise convolution (MixConv), which naturally mixes up multiple kernel sizes in a
|
||||||
|
single convolution. As a simple drop-in replacement for vanilla depthwise convolution, MixConv improves the accuracy
and efficiency of existing MobileNets on both ImageNet classification and COCO object detection.[[1](#references)]
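A rough sketch of the MixConv idea (illustrative only, not MindCV's implementation): split the channels into groups and run a depthwise convolution with a different kernel size on each group.

```python
import mindspore.nn as nn
import mindspore.ops as ops

class MixConv(nn.Cell):
    """Mixed depthwise convolution: each channel group uses its own kernel size."""
    def __init__(self, channels: int, kernel_sizes=(3, 5, 7)):
        super().__init__()
        split = channels // len(kernel_sizes)
        self.splits = [split] * (len(kernel_sizes) - 1) + [channels - split * (len(kernel_sizes) - 1)]
        self.convs = nn.CellList([
            nn.Conv2d(c, c, kernel_size=k, pad_mode="same", group=c)  # depthwise: group == channels
            for c, k in zip(self.splits, kernel_sizes)
        ])

    def construct(self, x):
        outs = []
        start = 0
        for c, conv in zip(self.splits, self.convs):
            outs.append(conv(x[:, start:start + c]))  # slice this group's channels
            start += c
        return ops.concat(outs, axis=1)  # stitch the groups back together
```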
|
||||||
|
|
||||||
|
<p align="center">
|
||||||
|
<img src="https://user-images.githubusercontent.com/53842165/219263295-75de649e-d38b-4b05-bd26-1c96896f7e83.png" width=800 />
|
||||||
|
</p>
|
||||||
|
<p align="center">
|
||||||
|
<em>Figure 1. Architecture of MixNet [<a href="#references">1</a>] </em>
|
||||||
|
</p>
|
||||||
|
|
||||||
|
## Results
|
||||||
|
|
||||||
|
Our reproduced model performance on ImageNet-1K is reported as follows.
|
||||||
|
|
||||||
|
<div align="center">
|
||||||
|
|
||||||
|
| Model | Context | Top-1 (%) | Top-5 (%) | Params (M) | Recipe | Download |
|
||||||
|
|----------|----------|-----------|-----------|------------|-----------------------------------------------------------------------------------------------|----------------------------------------------------------------------------------------|
|
||||||
|
| MixNet_s | D910x8-G | 75.52 | 92.52 | 4.17 | [yaml](https://github.com/mindspore-lab/mindcv/blob/main/configs/mixnet/mixnet_s_ascend.yaml) | [weights](https://download.mindspore.cn/toolkits/mindcv/mixnet/mixnet_s-2a5ef3a3.ckpt) |
|
||||||
|
| MixNet_m | D910x8-G | 76.64 | 93.05 | 5.06 | [yaml](https://github.com/mindspore-lab/mindcv/blob/main/configs/mixnet/mixnet_m_ascend.yaml) | [weights](https://download.mindspore.cn/toolkits/mindcv/mixnet/mixnet_m-74cc4cb1.ckpt) |
|
||||||
|
| MixNet_l | D910x8-G | 78.73 | 94.31 | 7.38 | [yaml](https://github.com/mindspore-lab/mindcv/blob/main/configs/mixnet/mixnet_l_ascend.yaml) | [weights](https://download.mindspore.cn/toolkits/mindcv/mixnet/mixnet_l-978edf2b.ckpt) |
|
||||||
|
|
||||||
|
</div>
|
||||||
|
|
||||||
|
#### Notes
|
||||||
|
|
||||||
|
- Context: Training context denoted as {device}x{pieces}-{MS mode}, where mindspore mode can be G - graph mode or F - pynative mode with ms function. For example, D910x8-G is for training on 8 pieces of Ascend 910 NPU using graph mode.
|
||||||
|
- Top-1 and Top-5: Accuracy reported on the validation set of ImageNet-1K.
|
||||||
|
|
||||||
|
## Quick Start
|
||||||
|
|
||||||
|
### Preparation
|
||||||
|
|
||||||
|
#### Installation
|
||||||
|
Please refer to the [installation instruction](https://github.com/mindspore-ecosystem/mindcv#installation) in MindCV.
|
||||||
|
|
||||||
|
#### Dataset Preparation
|
||||||
|
Please download the [ImageNet-1K](https://www.image-net.org/challenges/LSVRC/2012/index.php) dataset for model training and validation.
|
||||||
|
|
||||||
|
### Training
|
||||||
|
|
||||||
|
* Distributed Training
|
||||||
|
|
||||||
|
It is easy to reproduce the reported results with the pre-defined training recipe. For distributed training on multiple Ascend 910 devices, please run
|
||||||
|
|
||||||
|
```shell
|
||||||
|
# distributed training on multiple GPU/Ascend devices
|
||||||
|
mpirun -n 8 python train.py --config configs/mixnet/mixnet_s_ascend.yaml --data_dir /path/to/imagenet
|
||||||
|
```
|
||||||
|
|
||||||
|
> If the script is executed by the root user, the `--allow-run-as-root` parameter must be added to `mpirun`.
|
||||||
|
|
||||||
|
Similarly, you can train the model on multiple GPU devices with the above `mpirun` command.
|
||||||
|
|
||||||
|
For detailed illustration of all hyper-parameters, please refer to [config.py](https://github.com/mindspore-lab/mindcv/blob/main/config.py).
|
||||||
|
|
||||||
|
**Note:** As the global batch size (batch_size x num_devices) is an important hyper-parameter, it is recommended to keep the global batch size unchanged for reproduction or adjust the learning rate linearly to a new global batch size.
|
||||||
|
|
||||||
|
* Standalone Training
|
||||||
|
|
||||||
|
If you want to train or finetune the model on a smaller dataset without distributed training, please run:
|
||||||
|
|
||||||
|
```shell
|
||||||
|
# standalone training on a CPU/GPU/Ascend device
|
||||||
|
python train.py --config configs/mixnet/mixnet_s_ascend.yaml --data_dir /path/to/dataset --distribute False
|
||||||
|
```
|
||||||
|
|
||||||
|
### Validation
|
||||||
|
|
||||||
|
To validate the accuracy of the trained model, you can use `validate.py` and parse the checkpoint path with `--ckpt_path`.
|
||||||
|
|
||||||
|
```shell
|
||||||
|
python validate.py -c configs/mixnet/mixnet_s_ascend.yaml --data_dir /path/to/imagenet --ckpt_path /path/to/ckpt
|
||||||
|
```
|
||||||
|
|
||||||
|
### Deployment
|
||||||
|
|
||||||
|
Please refer to the [deployment tutorial](https://github.com/mindspore-lab/mindcv/blob/main/tutorials/deployment.md) in MindCV.
|
||||||
|
|
||||||
|
## References
|
||||||
|
|
||||||
|
[1] Tan M, Le Q V. Mixconv: Mixed depthwise convolutional kernels[J]. arXiv preprint arXiv:1907.09595, 2019.
|
|
@ -0,0 +1,57 @@
|
||||||
|
# system
|
||||||
|
mode: 0
|
||||||
|
distribute: True
|
||||||
|
num_parallel_workers: 16
|
||||||
|
val_while_train: True
|
||||||
|
|
||||||
|
# dataset
|
||||||
|
dataset: "imagenet"
|
||||||
|
data_dir: "/path/to/imagenet"
|
||||||
|
shuffle: True
|
||||||
|
dataset_download: False
|
||||||
|
batch_size: 128
|
||||||
|
drop_remainder: True
|
||||||
|
|
||||||
|
# augmentation
|
||||||
|
image_resize: 224
|
||||||
|
scale: [0.08, 1.0]
|
||||||
|
ratio: [0.75, 1.333]
|
||||||
|
hflip: 0.5
|
||||||
|
interpolation: "bicubic"
|
||||||
|
auto_augment: "randaug-m9-mstd0.5-inc1"
|
||||||
|
re_prob: 0.25
|
||||||
|
crop_pct: 0.875
|
||||||
|
mixup: 0.2
|
||||||
|
cutmix: 1.0
|
||||||
|
|
||||||
|
# model
|
||||||
|
model: "mixnet_l"
|
||||||
|
num_classes: 1000
|
||||||
|
pretrained: False
|
||||||
|
ckpt_path: ""
|
||||||
|
keep_checkpoint_max: 10
|
||||||
|
ckpt_save_dir: "./ckpt"
|
||||||
|
epoch_size: 600
|
||||||
|
dataset_sink_mode: True
|
||||||
|
amp_level: "O3"
|
||||||
|
|
||||||
|
# loss
|
||||||
|
loss: "CE"
|
||||||
|
label_smoothing: 0.1
|
||||||
|
|
||||||
|
# lr scheduler
|
||||||
|
scheduler: "cosine_decay"
|
||||||
|
lr: 0.25
|
||||||
|
min_lr: 0.00001
|
||||||
|
decay_epochs: 580
|
||||||
|
warmup_epochs: 20
|
||||||
|
|
||||||
|
# optimizer
|
||||||
|
opt: "momentum"
|
||||||
|
filter_bias_and_bn: True
|
||||||
|
momentum: 0.9
|
||||||
|
weight_decay: 0.00002
|
||||||
|
loss_scale_type: "dynamic"
|
||||||
|
drop_overflow_update: True
|
||||||
|
loss_scale: 16777216
|
||||||
|
use_nesterov: False
|
|
@ -0,0 +1,55 @@
|
||||||
|
# system
|
||||||
|
mode: 0
|
||||||
|
distribute: True
|
||||||
|
num_parallel_workers: 8
|
||||||
|
val_while_train: True
|
||||||
|
|
||||||
|
# dataset
|
||||||
|
dataset: "imagenet"
|
||||||
|
data_dir: "/path/to/imagenet"
|
||||||
|
shuffle: True
|
||||||
|
dataset_download: False
|
||||||
|
batch_size: 128
|
||||||
|
drop_remainder: True
|
||||||
|
|
||||||
|
# augmentation
|
||||||
|
image_resize: 224
|
||||||
|
scale: [0.08, 1.0]
|
||||||
|
ratio: [0.75, 1.333]
|
||||||
|
hflip: 0.5
|
||||||
|
interpolation: "bicubic"
|
||||||
|
auto_augment: "randaug-m9-mstd0.5"
|
||||||
|
re_prob: 0.25
|
||||||
|
crop_pct: 0.875
|
||||||
|
mixup: 0.2
|
||||||
|
cutmix: 1.0
|
||||||
|
|
||||||
|
# model
|
||||||
|
model: "mixnet_m"
|
||||||
|
num_classes: 1000
|
||||||
|
pretrained: False
|
||||||
|
ckpt_path: ""
|
||||||
|
keep_checkpoint_max: 10
|
||||||
|
ckpt_save_dir: "./ckpt"
|
||||||
|
epoch_size: 600
|
||||||
|
dataset_sink_mode: True
|
||||||
|
amp_level: "O3"
|
||||||
|
|
||||||
|
# loss
|
||||||
|
loss: "CE"
|
||||||
|
label_smoothing: 0.1
|
||||||
|
|
||||||
|
# lr scheduler
|
||||||
|
scheduler: "cosine_decay"
|
||||||
|
lr: 0.2
|
||||||
|
min_lr: 0.00001
|
||||||
|
decay_epochs: 585
|
||||||
|
warmup_epochs: 15
|
||||||
|
|
||||||
|
# optimizer
|
||||||
|
opt: "momentum"
|
||||||
|
filter_bias_and_bn: True
|
||||||
|
momentum: 0.9
|
||||||
|
weight_decay: 0.00002
|
||||||
|
loss_scale: 1024
|
||||||
|
use_nesterov: False
|
|
@ -0,0 +1,55 @@
|
||||||
|
# system
|
||||||
|
mode: 0
|
||||||
|
distribute: True
|
||||||
|
num_parallel_workers: 8
|
||||||
|
val_while_train: True
|
||||||
|
|
||||||
|
# dataset
|
||||||
|
dataset: "imagenet"
|
||||||
|
data_dir: "/path/to/imagenet"
|
||||||
|
shuffle: True
|
||||||
|
dataset_download: False
|
||||||
|
batch_size: 128
|
||||||
|
drop_remainder: True
|
||||||
|
|
||||||
|
# augmentation
|
||||||
|
image_resize: 224
|
||||||
|
scale: [0.08, 1.0]
|
||||||
|
ratio: [0.75, 1.333]
|
||||||
|
hflip: 0.5
|
||||||
|
interpolation: "bicubic"
|
||||||
|
auto_augment: "randaug-m9-mstd0.5"
|
||||||
|
re_prob: 0.25
|
||||||
|
crop_pct: 0.875
|
||||||
|
mixup: 0.2
|
||||||
|
cutmix: 1.0
|
||||||
|
|
||||||
|
# model
|
||||||
|
model: "mixnet_s"
|
||||||
|
num_classes: 1000
|
||||||
|
pretrained: False
|
||||||
|
ckpt_path: ""
|
||||||
|
keep_checkpoint_max: 10
|
||||||
|
ckpt_save_dir: "./ckpt"
|
||||||
|
epoch_size: 600
|
||||||
|
dataset_sink_mode: True
|
||||||
|
amp_level: "O3"
|
||||||
|
|
||||||
|
# loss
|
||||||
|
loss: "CE"
|
||||||
|
label_smoothing: 0.1
|
||||||
|
|
||||||
|
# lr scheduler
|
||||||
|
scheduler: "cosine_decay"
|
||||||
|
lr: 0.2
|
||||||
|
min_lr: 0.00001
|
||||||
|
decay_epochs: 585
|
||||||
|
warmup_epochs: 15
|
||||||
|
|
||||||
|
# optimizer
|
||||||
|
opt: "momentum"
|
||||||
|
filter_bias_and_bn: True
|
||||||
|
momentum: 0.9
|
||||||
|
weight_decay: 0.00002
|
||||||
|
loss_scale: 256
|
||||||
|
use_nesterov: False
|
|
@ -0,0 +1,86 @@
|
||||||
|
# MnasNet
|
||||||
|
> [MnasNet: Platform-Aware Neural Architecture Search for Mobile](https://arxiv.org/abs/1807.11626)
|
||||||
|
|
||||||
|
## Introduction
|
||||||
|
|
||||||
|
Designing convolutional neural networks (CNN) for mobile devices is challenging because mobile models need to be small and fast, yet still accurate. Although significant efforts have been dedicated to designing and improving mobile CNNs on all dimensions, it is very difficult to manually balance these trade-offs when there are so many architectural possibilities to consider. In this paper, the authors propose an automated mobile neural architecture search (MNAS) approach, which explicitly incorporates model latency into the main objective so that the search can identify a model that achieves a good trade-off between accuracy and latency. Unlike previous work, where latency is considered via another, often inaccurate proxy (e.g., FLOPS), this approach directly measures real-world inference latency by executing the model on mobile phones. To further strike the right balance between flexibility and search space size, the authors propose a novel factorized hierarchical search space that encourages layer diversity throughout the network.[[1](#references)]
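The latency-aware objective can be written down compactly. A sketch of the paper's soft latency constraint (the target latency and exponent below are illustrative defaults taken from the paper, not from this recipe):

```python
def mnas_reward(accuracy: float, latency_ms: float, target_ms: float = 75.0, w: float = -0.07) -> float:
    """ACC(m) * (LAT(m) / T) ** w: the soft-constraint search objective from the MnasNet paper."""
    return accuracy * (latency_ms / target_ms) ** w

# A model slightly over the latency target is penalized rather than rejected outright.
print(mnas_reward(0.75, 80.0))  # slightly below 0.75
print(mnas_reward(0.75, 70.0))  # slightly above 0.75
```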
|
||||||
|
|
||||||
|
<p align="center">
|
||||||
|
<img src="https://user-images.githubusercontent.com/53842165/210044057-35febc60-8d24-434a-a4f2-db8db3859e7a.png" width=800 />
|
||||||
|
</p>
|
||||||
|
<p align="center">
|
||||||
|
<em>Figure 1. Architecture of MnasNet [<a href="#references">1</a>] </em>
|
||||||
|
</p>
|
||||||
|
|
||||||
|
## Results
|
||||||
|
|
||||||
|
Our reproduced model performance on ImageNet-1K is reported as follows.
|
||||||
|
|
||||||
|
<div align="center">
|
||||||
|
|
||||||
|
| Model | Context | Top-1 (%) | Top-5 (%) | Params (M) | Recipe | Download |
|
||||||
|
|-----------------|----------|-----------|-----------|------------|----------------------------------------------------------------------------------------------------|------------------------------------------------------------------------------------------|
|
||||||
|
| MnasNet-B1-0_75 | D910x8-G | 71.81 | 90.53 | 3.20 | [yaml](https://github.com/mindspore-lab/mindcv/blob/main/configs/mnasnet/mnasnet_0.75_ascend.yaml) | [weights](https://download.mindspore.cn/toolkits/mindcv/mnasnet/mnasnet_075-465d366d.ckpt) |
|
||||||
|
| MnasNet-B1-1_0 | D910x8-G | 74.28 | 91.70 | 4.42 | [yaml](https://github.com/mindspore-lab/mindcv/blob/main/configs/mnasnet/mnasnet_1.0_ascend.yaml) | [weights](https://download.mindspore.cn/toolkits/mindcv/mnasnet/mnasnet_100-1bcf43f8.ckpt) |
|
||||||
|
| MnasNet-B1-1_4 | D910x8-G | 76.01 | 92.83 | 7.16 | [yaml](https://github.com/mindspore-lab/mindcv/blob/main/configs/mnasnet/mnasnet_1.4_ascend.yaml) | [weights](https://download.mindspore.cn/toolkits/mindcv/mnasnet/mnasnet_140-7e20bb30.ckpt) |
|
||||||
|
|
||||||
|
</div>
|
||||||
|
|
||||||
|
#### Notes
|
||||||
|
|
||||||
|
- Context: Training context denoted as {device}x{pieces}-{MS mode}, where mindspore mode can be G - graph mode or F - pynative mode with ms function. For example, D910x8-G is for training on 8 pieces of Ascend 910 NPU using graph mode.
|
||||||
|
- Top-1 and Top-5: Accuracy reported on the validation set of ImageNet-1K.
|
||||||
|
|
||||||
|
## Quick Start
|
||||||
|
|
||||||
|
### Preparation
|
||||||
|
|
||||||
|
#### Installation
|
||||||
|
Please refer to the [installation instruction](https://github.com/mindspore-ecosystem/mindcv#installation) in MindCV.
|
||||||
|
|
||||||
|
#### Dataset Preparation
|
||||||
|
Please download the [ImageNet-1K](https://www.image-net.org/challenges/LSVRC/2012/index.php) dataset for model training and validation.
|
||||||
|
|
||||||
|
### Training
|
||||||
|
|
||||||
|
* Distributed Training
|
||||||
|
|
||||||
|
It is easy to reproduce the reported results with the pre-defined training recipe. For distributed training on multiple Ascend 910 devices, please run
|
||||||
|
|
||||||
|
```shell
|
||||||
|
# distributed training on multiple GPU/Ascend devices
|
||||||
|
mpirun -n 8 python train.py --config configs/mnasnet/mnasnet_0.75_ascend.yaml --data_dir /path/to/imagenet
|
||||||
|
```
|
||||||
|
|
||||||
|
> If the script is executed by the root user, the `--allow-run-as-root` parameter must be added to `mpirun`.
|
||||||
|
|
||||||
|
Similarly, you can train the model on multiple GPU devices with the above `mpirun` command.
|
||||||
|
|
||||||
|
For detailed illustration of all hyper-parameters, please refer to [config.py](https://github.com/mindspore-lab/mindcv/blob/main/config.py).
|
||||||
|
|
||||||
|
**Note:** As the global batch size (batch_size x num_devices) is an important hyper-parameter, it is recommended to keep the global batch size unchanged for reproduction or adjust the learning rate linearly to a new global batch size.
|
||||||
|
|
||||||
|
* Standalone Training
|
||||||
|
|
||||||
|
If you want to train or finetune the model on a smaller dataset without distributed training, please run:
|
||||||
|
|
||||||
|
```shell
|
||||||
|
# standalone training on a CPU/GPU/Ascend device
|
||||||
|
python train.py --config configs/mnasnet/mnasnet_0.75_ascend.yaml --data_dir /path/to/dataset --distribute False
|
||||||
|
```
|
||||||
|
|
||||||
|
### Validation
|
||||||
|
|
||||||
|
To validate the accuracy of the trained model, you can use `validate.py` and parse the checkpoint path with `--ckpt_path`.
|
||||||
|
|
||||||
|
```shell
|
||||||
|
python validate.py -c configs/mnasnet/mnasnet_0.75_ascend.yaml --data_dir /path/to/imagenet --ckpt_path /path/to/ckpt
|
||||||
|
```
|
||||||
|
|
||||||
|
### Deployment
|
||||||
|
|
||||||
|
Please refer to the [deployment tutorial](https://github.com/mindspore-lab/mindcv/blob/main/tutorials/deployment.md) in MindCV.
|
||||||
|
|
||||||
|
## References
|
||||||
|
|
||||||
|
[1] Tan M, Chen B, Pang R, et al. Mnasnet: Platform-aware neural architecture search for mobile[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2019: 2820-2828.
|
|
@ -0,0 +1,51 @@
|
||||||
|
# system
|
||||||
|
mode: 0
|
||||||
|
distribute: True
|
||||||
|
num_parallel_workers: 8
|
||||||
|
|
||||||
|
# dataset
|
||||||
|
dataset: 'imagenet'
|
||||||
|
data_dir: '/path/to/imagenet'
|
||||||
|
shuffle: True
|
||||||
|
dataset_download: False
|
||||||
|
batch_size: 256
|
||||||
|
drop_remainder: True
|
||||||
|
|
||||||
|
# augmentation
|
||||||
|
image_resize: 224
|
||||||
|
scale: [0.08, 1.0]
|
||||||
|
ratio: [0.75, 1.333]
|
||||||
|
hflip: 0.5
|
||||||
|
interpolation: 'bilinear'
|
||||||
|
crop_pct: 0.875
|
||||||
|
|
||||||
|
# model
|
||||||
|
model: 'mnasnet0_75'
|
||||||
|
num_classes: 1000
|
||||||
|
pretrained: False
|
||||||
|
ckpt_path: ''
|
||||||
|
keep_checkpoint_max: 10
|
||||||
|
ckpt_save_dir: './ckpt'
|
||||||
|
epoch_size: 350
|
||||||
|
dataset_sink_mode: True
|
||||||
|
amp_level: 'O0'
|
||||||
|
|
||||||
|
# loss
|
||||||
|
loss: 'CE'
|
||||||
|
label_smoothing: 0.1
|
||||||
|
|
||||||
|
# lr scheduler
|
||||||
|
scheduler: 'cosine_decay'
|
||||||
|
min_lr: 1e-10
|
||||||
|
lr: 0.016
|
||||||
|
warmup_epochs: 5
|
||||||
|
decay_epochs: 345
|
||||||
|
|
||||||
|
# optimizer
|
||||||
|
opt: 'rmsprop'
|
||||||
|
filter_bias_and_bn: True
|
||||||
|
momentum: 0.9
|
||||||
|
weight_decay: 1e-5
|
||||||
|
loss_scale: 256
|
||||||
|
use_nesterov: False
|
||||||
|
eps: 1e-3
|
|
@ -0,0 +1,51 @@
|
||||||
|
# system
|
||||||
|
mode: 0
|
||||||
|
distribute: True
|
||||||
|
num_parallel_workers: 8
|
||||||
|
|
||||||
|
# dataset
|
||||||
|
dataset: 'imagenet'
|
||||||
|
data_dir: '/path/to/imagenet'
|
||||||
|
shuffle: True
|
||||||
|
dataset_download: False
|
||||||
|
batch_size: 128
|
||||||
|
drop_remainder: True
|
||||||
|
|
||||||
|
# augmentation
|
||||||
|
image_resize: 224
|
||||||
|
scale: [0.08, 1.0]
|
||||||
|
ratio: [0.75, 1.333]
|
||||||
|
hflip: 0.5
|
||||||
|
interpolation: 'bilinear'
|
||||||
|
crop_pct: 0.875
|
||||||
|
|
||||||
|
# model
|
||||||
|
model: 'mnasnet0_75'
|
||||||
|
num_classes: 1000
|
||||||
|
pretrained: False
|
||||||
|
ckpt_path: ''
|
||||||
|
keep_checkpoint_max: 10
|
||||||
|
ckpt_save_dir: './ckpt'
|
||||||
|
epoch_size: 350
|
||||||
|
dataset_sink_mode: True
|
||||||
|
amp_level: 'O0'
|
||||||
|
|
||||||
|
# loss
|
||||||
|
loss: 'CE'
|
||||||
|
label_smoothing: 0.1
|
||||||
|
|
||||||
|
# lr scheduler
|
||||||
|
scheduler: 'cosine_decay'
|
||||||
|
min_lr: 1e-10
|
||||||
|
lr: 0.012
|
||||||
|
warmup_epochs: 5
|
||||||
|
decay_epochs: 345
|
||||||
|
|
||||||
|
# optimizer
|
||||||
|
opt: 'rmsprop'
|
||||||
|
filter_bias_and_bn: True
|
||||||
|
momentum: 0.9
|
||||||
|
weight_decay: 1e-5
|
||||||
|
loss_scale: 256
|
||||||
|
use_nesterov: False
|
||||||
|
eps: 1e-3
|
|
@ -0,0 +1,51 @@
|
||||||
|
# system
|
||||||
|
mode: 0
|
||||||
|
distribute: True
|
||||||
|
num_parallel_workers: 8
|
||||||
|
|
||||||
|
# dataset
|
||||||
|
dataset: 'imagenet'
|
||||||
|
data_dir: '/path/to/imagenet'
|
||||||
|
shuffle: True
|
||||||
|
dataset_download: False
|
||||||
|
batch_size: 256
|
||||||
|
drop_remainder: True
|
||||||
|
|
||||||
|
# augmentation
|
||||||
|
image_resize: 224
|
||||||
|
scale: [0.08, 1.0]
|
||||||
|
ratio: [0.75, 1.333]
|
||||||
|
hflip: 0.5
|
||||||
|
interpolation: 'bilinear'
|
||||||
|
crop_pct: 0.875
|
||||||
|
|
||||||
|
# model
|
||||||
|
model: 'mnasnet1_0'
|
||||||
|
num_classes: 1000
|
||||||
|
pretrained: False
|
||||||
|
ckpt_path: ''
|
||||||
|
keep_checkpoint_max: 10
|
||||||
|
ckpt_save_dir: './ckpt'
|
||||||
|
epoch_size: 450
|
||||||
|
dataset_sink_mode: True
|
||||||
|
amp_level: 'O0'
|
||||||
|
|
||||||
|
# loss
|
||||||
|
loss: 'CE'
|
||||||
|
label_smoothing: 0.1
|
||||||
|
|
||||||
|
# lr scheduler
|
||||||
|
scheduler: 'cosine_decay'
|
||||||
|
min_lr: 1e-10
|
||||||
|
lr: 0.016
|
||||||
|
warmup_epochs: 5
|
||||||
|
decay_epochs: 445
|
||||||
|
|
||||||
|
# optimizer
|
||||||
|
opt: 'rmsprop'
|
||||||
|
filter_bias_and_bn: True
|
||||||
|
momentum: 0.9
|
||||||
|
weight_decay: 1e-5
|
||||||
|
loss_scale: 256
|
||||||
|
use_nesterov: False
|
||||||
|
eps: 1e-3
|
|
@ -0,0 +1,51 @@
|
||||||
|
# system
|
||||||
|
mode: 0
|
||||||
|
distribute: True
|
||||||
|
num_parallel_workers: 8
|
||||||
|
|
||||||
|
# dataset
|
||||||
|
dataset: 'imagenet'
|
||||||
|
data_dir: '/path/to/imagenet'
|
||||||
|
shuffle: True
|
||||||
|
dataset_download: False
|
||||||
|
batch_size: 128
|
||||||
|
drop_remainder: True
|
||||||
|
|
||||||
|
# augmentation
|
||||||
|
image_resize: 224
|
||||||
|
scale: [0.08, 1.0]
|
||||||
|
ratio: [0.75, 1.333]
|
||||||
|
hflip: 0.5
|
||||||
|
interpolation: 'bilinear'
|
||||||
|
crop_pct: 0.875
|
||||||
|
|
||||||
|
# model
|
||||||
|
model: 'mnasnet1_0'
|
||||||
|
num_classes: 1000
|
||||||
|
pretrained: False
|
||||||
|
ckpt_path: ''
|
||||||
|
keep_checkpoint_max: 10
|
||||||
|
ckpt_save_dir: './ckpt'
|
||||||
|
epoch_size: 450
|
||||||
|
dataset_sink_mode: True
|
||||||
|
amp_level: 'O0'
|
||||||
|
|
||||||
|
# loss
|
||||||
|
loss: 'CE'
|
||||||
|
label_smoothing: 0.1
|
||||||
|
|
||||||
|
# lr scheduler
|
||||||
|
scheduler: 'cosine_decay'
|
||||||
|
min_lr: 1e-10
|
||||||
|
lr: 0.012
|
||||||
|
warmup_epochs: 5
|
||||||
|
decay_epochs: 445
|
||||||
|
|
||||||
|
# optimizer
|
||||||
|
opt: 'rmsprop'
|
||||||
|
filter_bias_and_bn: True
|
||||||
|
momentum: 0.9
|
||||||
|
weight_decay: 1e-5
|
||||||
|
loss_scale: 256
|
||||||
|
use_nesterov: False
|
||||||
|
eps: 1e-3
|
|
@ -0,0 +1,51 @@
|
||||||
|
# system
|
||||||
|
mode: 0
|
||||||
|
distribute: True
|
||||||
|
num_parallel_workers: 8
|
||||||
|
|
||||||
|
# dataset
|
||||||
|
dataset: 'imagenet'
|
||||||
|
data_dir: '/path/to/imagenet'
|
||||||
|
shuffle: True
|
||||||
|
dataset_download: False
|
||||||
|
batch_size: 256
|
||||||
|
drop_remainder: True
|
||||||
|
|
||||||
|
# augmentation
|
||||||
|
image_resize: 224
|
||||||
|
scale: [0.08, 1.0]
|
||||||
|
ratio: [0.75, 1.333]
|
||||||
|
hflip: 0.5
|
||||||
|
interpolation: 'bilinear'
|
||||||
|
crop_pct: 0.875
|
||||||
|
|
||||||
|
# model
|
||||||
|
model: 'mnasnet1_4'
|
||||||
|
num_classes: 1000
|
||||||
|
pretrained: False
|
||||||
|
ckpt_path: ''
|
||||||
|
keep_checkpoint_max: 10
|
||||||
|
ckpt_save_dir: './ckpt'
|
||||||
|
epoch_size: 400
|
||||||
|
dataset_sink_mode: True
|
||||||
|
amp_level: 'O0'
|
||||||
|
|
||||||
|
# loss
|
||||||
|
loss: 'CE'
|
||||||
|
label_smoothing: 0.1
|
||||||
|
|
||||||
|
# lr scheduler
|
||||||
|
scheduler: 'cosine_decay'
|
||||||
|
min_lr: 1e-10
|
||||||
|
lr: 0.016
|
||||||
|
warmup_epochs: 5
|
||||||
|
decay_epochs: 395
|
||||||
|
|
||||||
|
# optimizer
|
||||||
|
opt: 'rmsprop'
|
||||||
|
filter_bias_and_bn: True
|
||||||
|
momentum: 0.9
|
||||||
|
weight_decay: 1e-5
|
||||||
|
loss_scale: 256
|
||||||
|
use_nesterov: False
|
||||||
|
eps: 1e-3
|
|
@ -0,0 +1,51 @@
|
||||||
|
# system
|
||||||
|
mode: 0
|
||||||
|
distribute: True
|
||||||
|
num_parallel_workers: 8
|
||||||
|
|
||||||
|
# dataset
|
||||||
|
dataset: 'imagenet'
|
||||||
|
data_dir: '/path/to/imagenet'
|
||||||
|
shuffle: True
|
||||||
|
dataset_download: False
|
||||||
|
batch_size: 64
|
||||||
|
drop_remainder: True
|
||||||
|
|
||||||
|
# augmentation
|
||||||
|
image_resize: 224
|
||||||
|
scale: [0.08, 1.0]
|
||||||
|
ratio: [0.75, 1.333]
|
||||||
|
hflip: 0.5
|
||||||
|
interpolation: 'bilinear'
|
||||||
|
crop_pct: 0.875
|
||||||
|
|
||||||
|
# model
|
||||||
|
model: 'mnasnet1_4'
|
||||||
|
num_classes: 1000
|
||||||
|
pretrained: False
|
||||||
|
ckpt_path: ''
|
||||||
|
keep_checkpoint_max: 10
|
||||||
|
ckpt_save_dir: './ckpt'
|
||||||
|
epoch_size: 400
|
||||||
|
dataset_sink_mode: True
|
||||||
|
amp_level: 'O0'
|
||||||
|
|
||||||
|
# loss
|
||||||
|
loss: 'CE'
|
||||||
|
label_smoothing: 0.1
|
||||||
|
|
||||||
|
# lr scheduler
|
||||||
|
scheduler: 'cosine_decay'
|
||||||
|
min_lr: 1e-10
|
||||||
|
lr: 0.008
|
||||||
|
warmup_epochs: 5
|
||||||
|
decay_epochs: 395
|
||||||
|
|
||||||
|
# optimizer
|
||||||
|
opt: 'rmsprop'
|
||||||
|
filter_bias_and_bn: True
|
||||||
|
momentum: 0.9
|
||||||
|
weight_decay: 1e-5
|
||||||
|
loss_scale: 256
|
||||||
|
use_nesterov: False
|
||||||
|
eps: 1e-3
|
|
@ -0,0 +1,87 @@
|
||||||
|
# MobileNetV1
|
||||||
|
> [MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications](https://arxiv.org/abs/1704.04861)
|
||||||
|
|
||||||
|
## Introduction
|
||||||
|
|
||||||
|
Compared with traditional convolutional neural networks, MobileNetV1 greatly reduces the number of parameters and the amount of computation at the cost of a slight drop in accuracy (compared to VGG16, accuracy drops by 0.9% while the model has only 1/32 of VGG's parameters). The model is based on a streamlined architecture that uses depthwise separable convolutions to build lightweight deep neural networks. At the same time, two simple global hyper-parameters are introduced, which can effectively trade off latency and accuracy.[[1](#references)]
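The depthwise separable convolution mentioned above factors a standard convolution into a depthwise step and a pointwise step. A minimal sketch (illustrative only, not the repository's code):

```python
import mindspore.nn as nn

def depthwise_separable_conv(in_channels: int, out_channels: int, stride: int = 1) -> nn.SequentialCell:
    """3x3 depthwise convolution followed by a 1x1 pointwise convolution, as in MobileNetV1."""
    return nn.SequentialCell(
        # depthwise: one 3x3 filter per input channel (group == in_channels)
        nn.Conv2d(in_channels, in_channels, kernel_size=3, stride=stride, pad_mode="same", group=in_channels),
        nn.BatchNorm2d(in_channels),
        nn.ReLU(),
        # pointwise: 1x1 convolution mixes channels and sets the output width
        nn.Conv2d(in_channels, out_channels, kernel_size=1, pad_mode="same"),
        nn.BatchNorm2d(out_channels),
        nn.ReLU(),
    )
```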
|
||||||
|
|
||||||
|
<p align="center">
|
||||||
|
<img src="https://user-images.githubusercontent.com/53842165/210044118-20dcc78a-96cd-4a3f-9cd1-fefa00c12227.png" width=400 />
|
||||||
|
</p>
|
||||||
|
<p align="center">
|
||||||
|
<em>Figure 1. Architecture of MobileNetV1 [<a href="#references">1</a>] </em>
|
||||||
|
</p>
|
||||||
|
|
||||||
|
## Results
|
||||||
|
|
||||||
|
Our reproduced model performance on ImageNet-1K is reported as follows.
|
||||||
|
|
||||||
|
<div align="center">
|
||||||
|
|
||||||
|
| Model | Context | Top-1 (%) | Top-5 (%) | Params (M) | Recipe | Download |
|
||||||
|
|------------------|----------|-----------|-----------|------------|-------------------------------------------------------------------------------------------------------------|-------------------------------------------------------------------------------------------------------------------|
|
||||||
|
| MobileNet_v1_025 | D910x8-G | 53.87 | 77.66 | 0.47 | [yaml](https://github.com/mindspore-lab/mindcv/blob/main/configs/mobilenetv1/mobilenet_v1_0.25_ascend.yaml) | [weights](https://download.mindspore.cn/toolkits/mindcv/mobilenet/mobilenetv1/mobilenet_v1_025-d3377fba.ckpt) |
|
||||||
|
| MobileNet_v1_050 | D910x8-G | 65.94 | 86.51 | 1.34 | [yaml](https://github.com/mindspore-lab/mindcv/blob/main/configs/mobilenetv1/mobilenet_v1_0.5_ascend.yaml) | [weights](https://download.mindspore.cn/toolkits/mindcv/mobilenet/mobilenetv1/mobilenet_v1_050-23e9ddbe.ckpt) |
|
||||||
|
| MobileNet_v1_075 | D910x8-G | 70.44 | 89.49 | 2.60 | [yaml](https://github.com/mindspore-lab/mindcv/blob/main/configs/mobilenetv1/mobilenet_v1_0.75_ascend.yaml) | [weights](https://download.mindspore.cn/toolkits/mindcv/mobilenet/mobilenetv1/mobilenet_v1_075-5bed0c73.ckpt) |
|
||||||
|
| MobileNet_v1_100 | D910x8-G | 72.95 | 91.01 | 4.25 | [yaml](https://github.com/mindspore-lab/mindcv/blob/main/configs/mobilenetv1/mobilenet_v1_1.0_ascend.yaml) | [weights](https://download.mindspore.cn/toolkits/mindcv/mobilenet/mobilenetv1/mobilenet_v1_100-91c7b206.ckpt) |
|
||||||
|
|
||||||
|
</div>
|
||||||
|
|
||||||
|
#### Notes
|
||||||
|
|
||||||
|
- Context: Training context denoted as {device}x{pieces}-{MS mode}, where mindspore mode can be G - graph mode or F - pynative mode with ms function. For example, D910x8-G is for training on 8 pieces of Ascend 910 NPU using graph mode.
|
||||||
|
- Top-1 and Top-5: Accuracy reported on the validation set of ImageNet-1K.
|
||||||
|
|
||||||
|
## Quick Start
|
||||||
|
|
||||||
|
### Preparation
|
||||||
|
|
||||||
|
#### Installation
|
||||||
|
Please refer to the [installation instruction](https://github.com/mindspore-ecosystem/mindcv#installation) in MindCV.
|
||||||
|
|
||||||
|
#### Dataset Preparation
|
||||||
|
Please download the [ImageNet-1K](https://www.image-net.org/challenges/LSVRC/2012/index.php) dataset for model training and validation.
|
||||||
|
|
||||||
|
### Training
|
||||||
|
|
||||||
|
* Distributed Training
|
||||||
|
|
||||||
|
It is easy to reproduce the reported results with the pre-defined training recipe. For distributed training on multiple Ascend 910 devices, please run
|
||||||
|
|
||||||
|
```shell
|
||||||
|
# distributed training on multiple GPU/Ascend devices
|
||||||
|
mpirun -n 8 python train.py --config configs/mobilenetv1/mobilenet_v1_0.25_ascend.yaml --data_dir /path/to/imagenet
|
||||||
|
```
|
||||||
|
|
||||||
|
> If the script is executed by the root user, the `--allow-run-as-root` parameter must be added to `mpirun`.
|
||||||
|
|
||||||
|
Similarly, you can train the model on multiple GPU devices with the above `mpirun` command.
|
||||||
|
|
||||||
|
For detailed illustration of all hyper-parameters, please refer to [config.py](https://github.com/mindspore-lab/mindcv/blob/main/config.py).
|
||||||
|
|
||||||
|
**Note:** As the global batch size (batch_size x num_devices) is an important hyper-parameter, it is recommended to keep the global batch size unchanged for reproduction or adjust the learning rate linearly to a new global batch size.
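The following is a minimal sketch of that linear scaling rule in plain Python; the reference values are read from the mobilenet_v1 recipes in this folder (lr 0.4 at batch_size 64 on 8 devices), and the helper function is ours, not a MindCV API.

```python
def scale_lr(base_lr: float, base_global_batch: int, new_global_batch: int) -> float:
    """Scale the learning rate in proportion to the new global batch size."""
    return base_lr * new_global_batch / base_global_batch

# Recipe reference point: lr = 0.4 with batch_size = 64 on 8 devices (global batch 512).
base_lr, base_global_batch = 0.4, 64 * 8

# Example: the same recipe run on 4 devices with batch_size = 64 (global batch 256).
print(scale_lr(base_lr, base_global_batch, 64 * 4))  # 0.2
```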
|
||||||
|
|
||||||
|
* Standalone Training
|
||||||
|
|
||||||
|
If you want to train or finetune the model on a smaller dataset without distributed training, please run:
|
||||||
|
|
||||||
|
```shell
|
||||||
|
# standalone training on a CPU/GPU/Ascend device
|
||||||
|
python train.py --config configs/mobilenetv1/mobilenet_v1_0.25_ascend.yaml --data_dir /path/to/dataset --distribute False
|
||||||
|
```
|
||||||
|
|
||||||
|
### Validation
|
||||||
|
|
||||||
|
To validate the accuracy of the trained model, you can use `validate.py` and pass the checkpoint path via `--ckpt_path`.
|
||||||
|
|
||||||
|
```shell
|
||||||
|
python validate.py -c configs/mobilenetv1/mobilenet_v1_0.25_ascend.yaml --data_dir /path/to/imagenet --ckpt_path /path/to/ckpt
|
||||||
|
```
|
||||||
|
|
||||||
|
### Deployment
|
||||||
|
|
||||||
|
Please refer to the [deployment tutorial](https://github.com/mindspore-lab/mindcv/blob/main/tutorials/deployment.md) in MindCV.
|
||||||
|
|
||||||
|
## References
|
||||||
|
|
||||||
|
[1] Howard A G, Zhu M, Chen B, et al. Mobilenets: Efficient convolutional neural networks for mobile vision applications[J]. arXiv preprint arXiv:1704.04861, 2017.
|
|
@ -0,0 +1,50 @@
|
||||||
|
# system
|
||||||
|
mode: 0
|
||||||
|
distribute: True
|
||||||
|
num_parallel_workers: 8
|
||||||
|
|
||||||
|
# dataset
|
||||||
|
dataset: 'imagenet'
|
||||||
|
data_dir: '/path/to/imagenet'
|
||||||
|
shuffle: True
|
||||||
|
dataset_download: False
|
||||||
|
batch_size: 64
|
||||||
|
drop_remainder: True
|
||||||
|
|
||||||
|
# augmentation
|
||||||
|
image_resize: 224
|
||||||
|
scale: [0.08, 1.0]
|
||||||
|
ratio: [0.75, 1.333]
|
||||||
|
hflip: 0.5
|
||||||
|
interpolation: 'bilinear'
|
||||||
|
crop_pct: 0.875
|
||||||
|
|
||||||
|
# model
|
||||||
|
model: 'mobilenet_v1_025_224'
|
||||||
|
num_classes: 1001
|
||||||
|
pretrained: False
|
||||||
|
ckpt_path: ''
|
||||||
|
keep_checkpoint_max: 80
|
||||||
|
ckpt_save_dir: './ckpt'
|
||||||
|
epoch_size: 200
|
||||||
|
dataset_sink_mode: True
|
||||||
|
amp_level: 'O2'
|
||||||
|
|
||||||
|
# loss
|
||||||
|
loss: 'CE'
|
||||||
|
label_smoothing: 0.1
|
||||||
|
|
||||||
|
# lr scheduler
|
||||||
|
scheduler: 'cosine_decay'
|
||||||
|
min_lr: 0.0
|
||||||
|
lr: 0.4
|
||||||
|
warmup_epochs: 2
|
||||||
|
decay_epochs: 198
|
||||||
|
|
||||||
|
# optimizer
|
||||||
|
opt: 'momentum'
|
||||||
|
filter_bias_and_bn: True
|
||||||
|
momentum: 0.9
|
||||||
|
weight_decay: 0.00003
|
||||||
|
loss_scale: 1024
|
||||||
|
use_nesterov: False
|
|
@ -0,0 +1,50 @@
|
||||||
|
# system
|
||||||
|
mode: 0
|
||||||
|
distribute: True
|
||||||
|
num_parallel_workers: 8
|
||||||
|
|
||||||
|
# dataset
|
||||||
|
dataset: 'imagenet'
|
||||||
|
data_dir: '/path/to/imagenet'
|
||||||
|
shuffle: True
|
||||||
|
dataset_download: False
|
||||||
|
batch_size: 64
|
||||||
|
drop_remainder: True
|
||||||
|
|
||||||
|
# augmentation
|
||||||
|
image_resize: 224
|
||||||
|
scale: [0.08, 1.0]
|
||||||
|
ratio: [0.75, 1.333]
|
||||||
|
hflip: 0.5
|
||||||
|
interpolation: 'bilinear'
|
||||||
|
crop_pct: 0.875
|
||||||
|
|
||||||
|
# model
|
||||||
|
model: 'mobilenet_v1_025_224'
|
||||||
|
num_classes: 1001
|
||||||
|
pretrained: False
|
||||||
|
ckpt_path: ''
|
||||||
|
keep_checkpoint_max: 80
|
||||||
|
ckpt_save_dir: './ckpt'
|
||||||
|
epoch_size: 200
|
||||||
|
dataset_sink_mode: True
|
||||||
|
amp_level: 'O2'
|
||||||
|
|
||||||
|
# loss
|
||||||
|
loss: 'CE'
|
||||||
|
label_smoothing: 0.1
|
||||||
|
|
||||||
|
# lr scheduler
|
||||||
|
scheduler: 'cosine_decay'
|
||||||
|
min_lr: 0.0
|
||||||
|
lr: 0.4
|
||||||
|
warmup_epochs: 2
|
||||||
|
decay_epochs: 198
|
||||||
|
|
||||||
|
# optimizer
|
||||||
|
opt: 'momentum'
|
||||||
|
filter_bias_and_bn: True
|
||||||
|
momentum: 0.9
|
||||||
|
weight_decay: 0.00003
|
||||||
|
loss_scale: 1024
|
||||||
|
use_nesterov: False
|
|
@ -0,0 +1,50 @@
|
||||||
|
# system
|
||||||
|
mode: 0
|
||||||
|
distribute: True
|
||||||
|
num_parallel_workers: 8
|
||||||
|
|
||||||
|
# dataset
|
||||||
|
dataset: 'imagenet'
|
||||||
|
data_dir: '/path/to/imagenet'
|
||||||
|
shuffle: True
|
||||||
|
dataset_download: False
|
||||||
|
batch_size: 64
|
||||||
|
drop_remainder: True
|
||||||
|
|
||||||
|
# augmentation
|
||||||
|
image_resize: 224
|
||||||
|
scale: [0.08, 1.0]
|
||||||
|
ratio: [0.75, 1.333]
|
||||||
|
hflip: 0.5
|
||||||
|
interpolation: 'bilinear'
|
||||||
|
crop_pct: 0.875
|
||||||
|
|
||||||
|
# model
|
||||||
|
model: 'mobilenet_v1_050_224'
|
||||||
|
num_classes: 1001
|
||||||
|
pretrained: False
|
||||||
|
ckpt_path: ''
|
||||||
|
keep_checkpoint_max: 80
|
||||||
|
ckpt_save_dir: './ckpt'
|
||||||
|
epoch_size: 200
|
||||||
|
dataset_sink_mode: True
|
||||||
|
amp_level: 'O2'
|
||||||
|
|
||||||
|
# loss
|
||||||
|
loss: 'CE'
|
||||||
|
label_smoothing: 0.1
|
||||||
|
|
||||||
|
# lr scheduler
|
||||||
|
scheduler: 'cosine_decay'
|
||||||
|
min_lr: 0.0
|
||||||
|
lr: 0.4
|
||||||
|
warmup_epochs: 2
|
||||||
|
decay_epochs: 198
|
||||||
|
|
||||||
|
# optimizer
|
||||||
|
opt: 'momentum'
|
||||||
|
filter_bias_and_bn: True
|
||||||
|
momentum: 0.9
|
||||||
|
weight_decay: 0.00003
|
||||||
|
loss_scale: 1024
|
||||||
|
use_nesterov: False
|
|
@ -0,0 +1,50 @@
|
||||||
|
# system
|
||||||
|
mode: 0
|
||||||
|
distribute: True
|
||||||
|
num_parallel_workers: 8
|
||||||
|
|
||||||
|
# dataset
|
||||||
|
dataset: 'imagenet'
|
||||||
|
data_dir: '/path/to/imagenet'
|
||||||
|
shuffle: True
|
||||||
|
dataset_download: False
|
||||||
|
batch_size: 64
|
||||||
|
drop_remainder: True
|
||||||
|
|
||||||
|
# augmentation
|
||||||
|
image_resize: 224
|
||||||
|
scale: [0.08, 1.0]
|
||||||
|
ratio: [0.75, 1.333]
|
||||||
|
hflip: 0.5
|
||||||
|
interpolation: 'bilinear'
|
||||||
|
crop_pct: 0.875
|
||||||
|
|
||||||
|
# model
|
||||||
|
model: 'mobilenet_v1_050_224'
|
||||||
|
num_classes: 1001
|
||||||
|
pretrained: False
|
||||||
|
ckpt_path: ''
|
||||||
|
keep_checkpoint_max: 80
|
||||||
|
ckpt_save_dir: './ckpt'
|
||||||
|
epoch_size: 200
|
||||||
|
dataset_sink_mode: True
|
||||||
|
amp_level: 'O2'
|
||||||
|
|
||||||
|
# loss
|
||||||
|
loss: 'CE'
|
||||||
|
label_smoothing: 0.1
|
||||||
|
|
||||||
|
# lr scheduler
|
||||||
|
scheduler: 'cosine_decay'
|
||||||
|
min_lr: 0.0
|
||||||
|
lr: 0.4
|
||||||
|
warmup_epochs: 2
|
||||||
|
decay_epochs: 198
|
||||||
|
|
||||||
|
# optimizer
|
||||||
|
opt: 'momentum'
|
||||||
|
filter_bias_and_bn: True
|
||||||
|
momentum: 0.9
|
||||||
|
weight_decay: 0.00003
|
||||||
|
loss_scale: 1024
|
||||||
|
use_nesterov: False
|
|
@ -0,0 +1,50 @@
|
||||||
|
# system
|
||||||
|
mode: 0
|
||||||
|
distribute: True
|
||||||
|
num_parallel_workers: 8
|
||||||
|
|
||||||
|
# dataset
|
||||||
|
dataset: 'imagenet'
|
||||||
|
data_dir: '/path/to/imagenet'
|
||||||
|
shuffle: True
|
||||||
|
dataset_download: False
|
||||||
|
batch_size: 64
|
||||||
|
drop_remainder: True
|
||||||
|
|
||||||
|
# augmentation
|
||||||
|
image_resize: 224
|
||||||
|
scale: [0.08, 1.0]
|
||||||
|
ratio: [0.75, 1.333]
|
||||||
|
hflip: 0.5
|
||||||
|
interpolation: 'bilinear'
|
||||||
|
crop_pct: 0.875
|
||||||
|
|
||||||
|
# model
|
||||||
|
model: 'mobilenet_v1_075_224'
|
||||||
|
num_classes: 1001
|
||||||
|
pretrained: False
|
||||||
|
ckpt_path: ''
|
||||||
|
keep_checkpoint_max: 80
|
||||||
|
ckpt_save_dir: './ckpt'
|
||||||
|
epoch_size: 200
|
||||||
|
dataset_sink_mode: True
|
||||||
|
amp_level: 'O2'
|
||||||
|
|
||||||
|
# loss
|
||||||
|
loss: 'CE'
|
||||||
|
label_smoothing: 0.1
|
||||||
|
|
||||||
|
# lr scheduler
|
||||||
|
scheduler: 'cosine_decay'
|
||||||
|
min_lr: 0.0
|
||||||
|
lr: 0.4
|
||||||
|
warmup_epochs: 2
|
||||||
|
decay_epochs: 198
|
||||||
|
|
||||||
|
# optimizer
|
||||||
|
opt: 'momentum'
|
||||||
|
filter_bias_and_bn: True
|
||||||
|
momentum: 0.9
|
||||||
|
weight_decay: 0.00003
|
||||||
|
loss_scale: 1024
|
||||||
|
use_nesterov: False
|
|
@ -0,0 +1,50 @@
|
||||||
|
# system
|
||||||
|
mode: 0
|
||||||
|
distribute: True
|
||||||
|
num_parallel_workers: 8
|
||||||
|
|
||||||
|
# dataset
|
||||||
|
dataset: 'imagenet'
|
||||||
|
data_dir: '/path/to/imagenet'
|
||||||
|
shuffle: True
|
||||||
|
dataset_download: False
|
||||||
|
batch_size: 64
|
||||||
|
drop_remainder: True
|
||||||
|
|
||||||
|
# augmentation
|
||||||
|
image_resize: 224
|
||||||
|
scale: [0.08, 1.0]
|
||||||
|
ratio: [0.75, 1.333]
|
||||||
|
hflip: 0.5
|
||||||
|
interpolation: 'bilinear'
|
||||||
|
crop_pct: 0.875
|
||||||
|
|
||||||
|
# model
|
||||||
|
model: 'mobilenet_v1_075_224'
|
||||||
|
num_classes: 1001
|
||||||
|
pretrained: False
|
||||||
|
ckpt_path: ''
|
||||||
|
keep_checkpoint_max: 80
|
||||||
|
ckpt_save_dir: './ckpt'
|
||||||
|
epoch_size: 200
|
||||||
|
dataset_sink_mode: True
|
||||||
|
amp_level: 'O2'
|
||||||
|
|
||||||
|
# loss
|
||||||
|
loss: 'CE'
|
||||||
|
label_smoothing: 0.1
|
||||||
|
|
||||||
|
# lr scheduler
|
||||||
|
scheduler: 'cosine_decay'
|
||||||
|
min_lr: 0.0
|
||||||
|
lr: 0.4
|
||||||
|
warmup_epochs: 2
|
||||||
|
decay_epochs: 198
|
||||||
|
|
||||||
|
# optimizer
|
||||||
|
opt: 'momentum'
|
||||||
|
filter_bias_and_bn: True
|
||||||
|
momentum: 0.9
|
||||||
|
weight_decay: 0.00003
|
||||||
|
loss_scale: 1024
|
||||||
|
use_nesterov: False
|
|
@ -0,0 +1,50 @@
|
||||||
|
# system
|
||||||
|
mode: 0
|
||||||
|
distribute: True
|
||||||
|
num_parallel_workers: 8
|
||||||
|
|
||||||
|
# dataset
|
||||||
|
dataset: 'imagenet'
|
||||||
|
data_dir: '/path/to/imagenet'
|
||||||
|
shuffle: True
|
||||||
|
dataset_download: False
|
||||||
|
batch_size: 64
|
||||||
|
drop_remainder: True
|
||||||
|
|
||||||
|
# augmentation
|
||||||
|
image_resize: 224
|
||||||
|
scale: [0.08, 1.0]
|
||||||
|
ratio: [0.75, 1.333]
|
||||||
|
hflip: 0.5
|
||||||
|
interpolation: 'bilinear'
|
||||||
|
crop_pct: 0.875
|
||||||
|
|
||||||
|
# model
|
||||||
|
model: 'mobilenet_v1_100_224'
|
||||||
|
num_classes: 1001
|
||||||
|
pretrained: False
|
||||||
|
ckpt_path: ''
|
||||||
|
keep_checkpoint_max: 80
|
||||||
|
ckpt_save_dir: './ckpt'
|
||||||
|
epoch_size: 200
|
||||||
|
dataset_sink_mode: True
|
||||||
|
amp_level: 'O2'
|
||||||
|
|
||||||
|
# loss
|
||||||
|
loss: 'CE'
|
||||||
|
label_smoothing: 0.1
|
||||||
|
|
||||||
|
# lr scheduler
|
||||||
|
scheduler: 'cosine_decay'
|
||||||
|
min_lr: 0.0
|
||||||
|
lr: 0.4
|
||||||
|
warmup_epochs: 2
|
||||||
|
decay_epochs: 198
|
||||||
|
|
||||||
|
# optimizer
|
||||||
|
opt: 'momentum'
|
||||||
|
filter_bias_and_bn: True
|
||||||
|
momentum: 0.9
|
||||||
|
weight_decay: 0.00003
|
||||||
|
loss_scale: 1024
|
||||||
|
use_nesterov: False
|
|
@ -0,0 +1,51 @@
|
||||||
|
# system
|
||||||
|
mode: 0
|
||||||
|
distribute: True
|
||||||
|
num_parallel_workers: 8
|
||||||
|
|
||||||
|
# dataset
|
||||||
|
dataset: 'imagenet'
|
||||||
|
data_dir: '/path/to/imagenet'
|
||||||
|
shuffle: True
|
||||||
|
dataset_download: False
|
||||||
|
batch_size: 64
|
||||||
|
drop_remainder: True
|
||||||
|
train_split: 'train'
|
||||||
|
|
||||||
|
# augmentation
|
||||||
|
image_resize: 224
|
||||||
|
scale: [0.08, 1.0]
|
||||||
|
ratio: [0.75, 1.333]
|
||||||
|
hflip: 0.5
|
||||||
|
interpolation: 'bilinear'
|
||||||
|
crop_pct: 0.875
|
||||||
|
|
||||||
|
# model
|
||||||
|
model: 'mobilenet_v1_100_224'
|
||||||
|
num_classes: 1001
|
||||||
|
pretrained: False
|
||||||
|
ckpt_path: ''
|
||||||
|
keep_checkpoint_max: 80
|
||||||
|
ckpt_save_dir: './ckpt'
|
||||||
|
epoch_size: 200
|
||||||
|
dataset_sink_mode: True
|
||||||
|
amp_level: 'O2'
|
||||||
|
|
||||||
|
# loss
|
||||||
|
loss: 'CE'
|
||||||
|
label_smoothing: 0.1
|
||||||
|
|
||||||
|
# lr scheduler
|
||||||
|
scheduler: 'cosine_decay'
|
||||||
|
min_lr: 0.0
|
||||||
|
lr: 0.4
|
||||||
|
warmup_epochs: 2
|
||||||
|
decay_epochs: 198
|
||||||
|
|
||||||
|
# optimizer
|
||||||
|
opt: 'momentum'
|
||||||
|
filter_bias_and_bn: True
|
||||||
|
momentum: 0.9
|
||||||
|
weight_decay: 0.00003
|
||||||
|
loss_scale: 1024
|
||||||
|
use_nesterov: False
|
|
@ -0,0 +1,88 @@
|
||||||
|
# MobileNetV2
|
||||||
|
> [MobileNetV2: Inverted Residuals and Linear Bottlenecks](https://arxiv.org/abs/1801.04381)
|
||||||
|
|
||||||
|
## Introduction
|
||||||
|
|
||||||
|
MobileNetV2 is a neural network architecture specifically tailored for mobile and resource-constrained environments. It pushes the state of the art for mobile computer vision models, significantly reducing the number of operations and the amount of memory required while maintaining the same accuracy.
|
||||||
|
|
||||||
|
The main innovation of the model is a new layer module: the inverted residual with linear bottleneck. The module takes a low-dimensional compressed representation as input, first expands it to a high dimensionality, and filters it with a lightweight depthwise convolution. A linear 1x1 convolution then projects the features back to the low-dimensional representation.[[1](#references)]
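The channel flow through one such block can be sketched in a few lines of plain Python; the expansion factor and channel counts below are illustrative assumptions rather than the exact MobileNetV2 configuration.

```python
# Channel sizes through one inverted residual block with a linear bottleneck.
c_in, expansion, c_out = 24, 6, 24   # illustrative values

expanded = c_in * expansion
print(f"1x1 expansion:         {c_in} -> {expanded} channels")
print(f"3x3 depthwise conv:    {expanded} -> {expanded} channels (per-channel filtering)")
print(f"1x1 linear projection: {expanded} -> {c_out} channels (no activation)")
print("residual connection applies when stride is 1 and input/output widths match")
```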
|
||||||
|
|
||||||
|
<p align="center">
|
||||||
|
<img src="https://user-images.githubusercontent.com/53842165/210044190-8b5aab08-75fe-4e2c-87cc-d3529d9c60cd.png" width=800 />
|
||||||
|
</p>
|
||||||
|
<p align="center">
|
||||||
|
<em>Figure 1. Architecture of MobileNetV2 [<a href="#references">1</a>] </em>
|
||||||
|
</p>
|
||||||
|
|
||||||
|
## Results
|
||||||
|
|
||||||
|
Our reproduced model performance on ImageNet-1K is reported as follows.
|
||||||
|
|
||||||
|
<div align="center">
|
||||||
|
|
||||||
|
| Model | Context | Top-1 (%) | Top-5 (%) | Params (M) | Recipe | Download |
|
||||||
|
|------------------|----------|-----------|-----------|------------|-------------------------------------------------------------------------------------------------------------|------------------------------------------------------------------------------------------------------------------|
|
||||||
|
| MobileNet_v2_075 | D910x8-G | 69.76 | 89.28 | 2.66 | [yaml](https://github.com/mindspore-lab/mindcv/blob/main/configs/mobilenetv2/mobilenet_v2_0.75_ascend.yaml) | [weights](https://download.mindspore.cn/toolkits/mindcv/mobilenet/mobilenetv2/mobilenet_v2_075-243f9404.ckpt) |
|
||||||
|
| MobileNet_v2_100 | D910x8-G | 72.02 | 90.92 | 3.54 | [yaml](https://github.com/mindspore-lab/mindcv/blob/main/configs/mobilenetv2/mobilenet_v2_1.0_ascend.yaml) | [weights](https://download.mindspore.cn/toolkits/mindcv/mobilenet/mobilenetv2/mobilenet_v2_100-52122156.ckpt) |
|
||||||
|
| MobileNet_v2_140 | D910x8-G | 74.97 | 92.32 | 6.15 | [yaml](https://github.com/mindspore-lab/mindcv/blob/main/configs/mobilenetv2/mobilenet_v2_1.4_ascend.yaml) | [weights](https://download.mindspore.cn/toolkits/mindcv/mobilenet/mobilenetv2/mobilenet_v2_140-015cfb04.ckpt) |
|
||||||
|
|
||||||
|
</div>
|
||||||
|
|
||||||
|
#### Notes
|
||||||
|
|
||||||
|
- Context: Training context denoted as {device}x{pieces}-{MS mode}, where mindspore mode can be G - graph mode or F - pynative mode with ms function. For example, D910x8-G is for training on 8 pieces of Ascend 910 NPU using graph mode.
|
||||||
|
- Top-1 and Top-5: Accuracy reported on the validation set of ImageNet-1K.
|
||||||
|
|
||||||
|
## Quick Start
|
||||||
|
|
||||||
|
### Preparation
|
||||||
|
|
||||||
|
#### Installation
|
||||||
|
Please refer to the [installation instruction](https://github.com/mindspore-ecosystem/mindcv#installation) in MindCV.
|
||||||
|
|
||||||
|
#### Dataset Preparation
|
||||||
|
Please download the [ImageNet-1K](https://www.image-net.org/challenges/LSVRC/2012/index.php) dataset for model training and validation.
|
||||||
|
|
||||||
|
### Training
|
||||||
|
|
||||||
|
* Distributed Training
|
||||||
|
|
||||||
|
It is easy to reproduce the reported results with the pre-defined training recipe. For distributed training on multiple Ascend 910 devices, please run
|
||||||
|
|
||||||
|
```shell
|
||||||
|
# distributed training on multiple GPU/Ascend devices
|
||||||
|
mpirun -n 8 python train.py --config configs/mobilenetv2/mobilenet_v2_0.75_ascend.yaml --data_dir /path/to/imagenet
|
||||||
|
```
|
||||||
|
|
||||||
|
> If the script is executed by the root user, the `--allow-run-as-root` parameter must be added to `mpirun`.
|
||||||
|
|
||||||
|
Similarly, you can train the model on multiple GPU devices with the above `mpirun` command.
|
||||||
|
|
||||||
|
For detailed illustration of all hyper-parameters, please refer to [config.py](https://github.com/mindspore-lab/mindcv/blob/main/config.py).
|
||||||
|
|
||||||
|
**Note:** As the global batch size (batch_size x num_devices) is an important hyper-parameter, it is recommended to keep the global batch size unchanged for reproduction or adjust the learning rate linearly to a new global batch size.
|
||||||
|
|
||||||
|
* Standalone Training
|
||||||
|
|
||||||
|
If you want to train or finetune the model on a smaller dataset without distributed training, please run:
|
||||||
|
|
||||||
|
```shell
|
||||||
|
# standalone training on a CPU/GPU/Ascend device
|
||||||
|
python train.py --config configs/mobilenetv2/mobilenet_v2_0.75_ascend.yaml --data_dir /path/to/dataset --distribute False
|
||||||
|
```
|
||||||
|
|
||||||
|
### Validation
|
||||||
|
|
||||||
|
To validate the accuracy of the trained model, you can use `validate.py` and pass the checkpoint path via `--ckpt_path`.
|
||||||
|
|
||||||
|
```shell
|
||||||
|
python validate.py -c configs/mobilenetv2/mobilenet_v2_0.75_ascend.yaml --data_dir /path/to/imagenet --ckpt_path /path/to/ckpt
|
||||||
|
```
|
||||||
|
|
||||||
|
### Deployment
|
||||||
|
|
||||||
|
Please refer to the [deployment tutorial](https://github.com/mindspore-lab/mindcv/blob/main/tutorials/deployment.md) in MindCV.
|
||||||
|
|
||||||
|
## References
|
||||||
|
|
||||||
|
[1] Sandler M, Howard A, Zhu M, et al. Mobilenetv2: Inverted residuals and linear bottlenecks[C]//Proceedings of the IEEE conference on computer vision and pattern recognition. 2018: 4510-4520.
|
|
@ -0,0 +1,51 @@
|
||||||
|
# system
|
||||||
|
mode: 0
|
||||||
|
distribute: True
|
||||||
|
num_parallel_workers: 32
|
||||||
|
|
||||||
|
# dataset
|
||||||
|
dataset: 'imagenet'
|
||||||
|
data_dir: '/path/to/imagenet'
|
||||||
|
shuffle: True
|
||||||
|
dataset_download: False
|
||||||
|
batch_size: 256
|
||||||
|
drop_remainder: True
|
||||||
|
train_split: 'train'
|
||||||
|
|
||||||
|
# augmentation
|
||||||
|
image_resize: 224
|
||||||
|
scale: [0.08, 1.0]
|
||||||
|
ratio: [0.75, 1.333]
|
||||||
|
hflip: 0.5
|
||||||
|
interpolation: 'bilinear'
|
||||||
|
crop_pct: 0.875
|
||||||
|
|
||||||
|
# model
|
||||||
|
model: 'mobilenet_v2_075_224'
|
||||||
|
num_classes: 1000
|
||||||
|
pretrained: False
|
||||||
|
ckpt_path: ''
|
||||||
|
keep_checkpoint_max: 50
|
||||||
|
ckpt_save_dir: './ckpt'
|
||||||
|
epoch_size: 400
|
||||||
|
dataset_sink_mode: True
|
||||||
|
amp_level: 'O0'
|
||||||
|
|
||||||
|
# loss
|
||||||
|
loss: 'CE'
|
||||||
|
label_smoothing: 0.1
|
||||||
|
|
||||||
|
# lr scheduler
|
||||||
|
scheduler: 'cosine_decay'
|
||||||
|
min_lr: 0.0
|
||||||
|
lr: 0.3
|
||||||
|
warmup_epochs: 4
|
||||||
|
decay_epochs: 396
|
||||||
|
|
||||||
|
# optimizer
|
||||||
|
opt: 'momentum'
|
||||||
|
filter_bias_and_bn: True
|
||||||
|
momentum: 0.9
|
||||||
|
weight_decay: 0.00003
|
||||||
|
loss_scale: 1024
|
||||||
|
use_nesterov: False
|
|
@ -0,0 +1,51 @@
|
||||||
|
# system
|
||||||
|
mode: 0
|
||||||
|
distribute: True
|
||||||
|
num_parallel_workers: 32
|
||||||
|
|
||||||
|
# dataset
|
||||||
|
dataset: 'imagenet'
|
||||||
|
data_dir: '/path/to/imagenet'
|
||||||
|
shuffle: True
|
||||||
|
dataset_download: False
|
||||||
|
batch_size: 256
|
||||||
|
drop_remainder: True
|
||||||
|
train_split: 'train'
|
||||||
|
|
||||||
|
# augmentation
|
||||||
|
image_resize: 224
|
||||||
|
scale: [0.08, 1.0]
|
||||||
|
ratio: [0.75, 1.333]
|
||||||
|
hflip: 0.5
|
||||||
|
interpolation: 'bilinear'
|
||||||
|
crop_pct: 0.875
|
||||||
|
|
||||||
|
# model
|
||||||
|
model: 'mobilenet_v2_100_224'
|
||||||
|
num_classes: 1000
|
||||||
|
pretrained: False
|
||||||
|
ckpt_path: ''
|
||||||
|
keep_checkpoint_max: 100
|
||||||
|
ckpt_save_dir: './ckpt'
|
||||||
|
epoch_size: 300
|
||||||
|
dataset_sink_mode: True
|
||||||
|
amp_level: 'O0'
|
||||||
|
|
||||||
|
# loss
|
||||||
|
loss: 'CE'
|
||||||
|
label_smoothing: 0.1
|
||||||
|
|
||||||
|
# lr scheduler
|
||||||
|
scheduler: 'cosine_decay'
|
||||||
|
min_lr: 0.0
|
||||||
|
lr: 0.4
|
||||||
|
warmup_epochs: 4
|
||||||
|
decay_epochs: 296
|
||||||
|
|
||||||
|
# optimizer
|
||||||
|
opt: 'momentum'
|
||||||
|
filter_bias_and_bn: True
|
||||||
|
momentum: 0.9
|
||||||
|
weight_decay: 0.00004
|
||||||
|
loss_scale: 1024
|
||||||
|
use_nesterov: False
|
|
@ -0,0 +1,51 @@
|
||||||
|
# system
|
||||||
|
mode: 0
|
||||||
|
distribute: True
|
||||||
|
num_parallel_workers: 32
|
||||||
|
|
||||||
|
# dataset
|
||||||
|
dataset: 'imagenet'
|
||||||
|
data_dir: '/path/to/imagenet'
|
||||||
|
shuffle: True
|
||||||
|
dataset_download: False
|
||||||
|
batch_size: 256
|
||||||
|
drop_remainder: True
|
||||||
|
train_split: 'train'
|
||||||
|
|
||||||
|
# augmentation
|
||||||
|
image_resize: 224
|
||||||
|
scale: [0.08, 1.0]
|
||||||
|
ratio: [0.75, 1.333]
|
||||||
|
hflip: 0.5
|
||||||
|
interpolation: 'bilinear'
|
||||||
|
crop_pct: 0.875
|
||||||
|
|
||||||
|
# model
|
||||||
|
model: 'mobilenet_v2_140_224'
|
||||||
|
num_classes: 1000
|
||||||
|
pretrained: False
|
||||||
|
ckpt_path: ''
|
||||||
|
keep_checkpoint_max: 50
|
||||||
|
ckpt_save_dir: './ckpt'
|
||||||
|
epoch_size: 300
|
||||||
|
dataset_sink_mode: True
|
||||||
|
amp_level: 'O0'
|
||||||
|
|
||||||
|
# loss
|
||||||
|
loss: 'CE'
|
||||||
|
label_smoothing: 0.1
|
||||||
|
|
||||||
|
# lr scheduler
|
||||||
|
scheduler: 'cosine_decay'
|
||||||
|
min_lr: 0.0
|
||||||
|
lr: 0.4
|
||||||
|
warmup_epochs: 4
|
||||||
|
decay_epochs: 296
|
||||||
|
|
||||||
|
# optimizer
|
||||||
|
opt: 'momentum'
|
||||||
|
filter_bias_and_bn: False
|
||||||
|
momentum: 0.9
|
||||||
|
weight_decay: 0.00004
|
||||||
|
loss_scale: 1024
|
||||||
|
use_nesterov: False
|
|
@ -0,0 +1,87 @@
|
||||||
|
# MobileNetV3
|
||||||
|
> [Searching for MobileNetV3](https://arxiv.org/abs/1905.02244)
|
||||||
|
|
||||||
|
## Introduction
|
||||||
|
|
||||||
|
MobileNetV3 was published in 2019. It combines the depthwise separable convolutions of v1, the inverted residuals and linear bottlenecks of v2, and the squeeze-and-excitation (SE) module, and it searches the network configuration and parameters with NAS (Neural Architecture Search). MobileNetV3 first performs a coarse structure search with MnasNet, then uses reinforcement learning to select the optimal configuration from a set of discrete choices. Afterwards, it fine-tunes the architecture with NetAdapt, a complementary step that trims under-utilized activation channels in small decrements.
|
||||||
|
|
||||||
|
MobileNetV3 comes in two versions, MobileNetV3-Large and MobileNetV3-Small, targeting situations with different resource budgets. The paper reports that, on the ImageNet classification task, MobileNetV3-Small is about 3.2% more accurate than MobileNetV2 while reducing runtime by about 15%, and MobileNetV3-Large is about 4.6% more accurate while reducing runtime by about 5%. On COCO, MobileNetV3-Large reaches the same accuracy as v2 while being about 25% faster, and a similar improvement is also observed for the segmentation task.[[1](#references)]
|
||||||
|
|
||||||
|
<p align="center">
|
||||||
|
<img src="https://user-images.githubusercontent.com/53842165/210044297-d658ca54-e6ff-4c0f-8080-88072814d8e6.png" width=800 />
|
||||||
|
</p>
|
||||||
|
<p align="center">
|
||||||
|
<em>Figure 1. Architecture of MobileNetV3 [<a href="#references">1</a>] </em>
|
||||||
|
</p>
|
||||||
|
|
||||||
|
## Results
|
||||||
|
|
||||||
|
Our reproduced model performance on ImageNet-1K is reported as follows.
|
||||||
|
|
||||||
|
<div align="center">
|
||||||
|
|
||||||
|
| Model | Context | Top-1 (%) | Top-5 (%) | Params (M) | Recipe | Download |
|
||||||
|
|-----------------------|----------|-----------|-----------|------------|--------------------------------------------------------------------------------------------------------------|-------------------------------------------------------------------------------------------------------------------|
|
||||||
|
| MobileNetV3_small_100 | D910x8-G | 67.81 | 87.82 | 2.55 | [yaml](https://github.com/mindspore-lab/mindcv/blob/main/configs/mobilenetv3/mobilenet_v3_small_ascend.yaml) | [weights](https://download.mindspore.cn/toolkits/mindcv/mobilenet/mobilenetv3/mobilenet_v3_small_100-c884b105.ckpt) |
|
||||||
|
| MobileNetV3_large_100 | D910x8-G | 75.14 | 92.33 | 5.51 | [yaml](https://github.com/mindspore-lab/mindcv/blob/main/configs/mobilenetv3/mobilenet_v3_large_ascend.yaml) | [weights](https://download.mindspore.cn/toolkits/mindcv/mobilenet/mobilenetv3/mobilenet_v3_large_100-6f5bf961.ckpt) |
|
||||||
|
|
||||||
|
</div>
|
||||||
|
|
||||||
|
#### Notes
|
||||||
|
|
||||||
|
- Context: Training context denoted as {device}x{pieces}-{MS mode}, where mindspore mode can be G - graph mode or F - pynative mode with ms function. For example, D910x8-G is for training on 8 pieces of Ascend 910 NPU using graph mode.
|
||||||
|
- Top-1 and Top-5: Accuracy reported on the validation set of ImageNet-1K.
|
||||||
|
|
||||||
|
## Quick Start
|
||||||
|
|
||||||
|
### Preparation
|
||||||
|
|
||||||
|
#### Installation
|
||||||
|
Please refer to the [installation instruction](https://github.com/mindspore-ecosystem/mindcv#installation) in MindCV.
|
||||||
|
|
||||||
|
#### Dataset Preparation
|
||||||
|
Please download the [ImageNet-1K](https://www.image-net.org/challenges/LSVRC/2012/index.php) dataset for model training and validation.
|
||||||
|
|
||||||
|
### Training
|
||||||
|
|
||||||
|
* Distributed Training
|
||||||
|
|
||||||
|
It is easy to reproduce the reported results with the pre-defined training recipe. For distributed training on multiple Ascend 910 devices, please run
|
||||||
|
|
||||||
|
```shell
|
||||||
|
# distributed training on multiple GPU/Ascend devices
|
||||||
|
mpirun -n 8 python train.py --config configs/mobilenetv3/mobilenet_v3_small_ascend.yaml --data_dir /path/to/imagenet
|
||||||
|
```
|
||||||
|
|
||||||
|
> If the script is executed by the root user, the `--allow-run-as-root` parameter must be added to `mpirun`.
|
||||||
|
|
||||||
|
Similarly, you can train the model on multiple GPU devices with the above `mpirun` command.
|
||||||
|
|
||||||
|
For detailed illustration of all hyper-parameters, please refer to [config.py](https://github.com/mindspore-lab/mindcv/blob/main/config.py).
|
||||||
|
|
||||||
|
**Note:** As the global batch size (batch_size x num_devices) is an important hyper-parameter, it is recommended to keep the global batch size unchanged for reproduction or adjust the learning rate linearly to a new global batch size.
|
||||||
|
|
||||||
|
* Standalone Training
|
||||||
|
|
||||||
|
If you want to train or finetune the model on a smaller dataset without distributed training, please run:
|
||||||
|
|
||||||
|
```shell
|
||||||
|
# standalone training on a CPU/GPU/Ascend device
|
||||||
|
python train.py --config configs/mobilenetv3/mobilenet_v3_small_ascend.yaml --data_dir /path/to/dataset --distribute False
|
||||||
|
```
|
||||||
|
|
||||||
|
### Validation
|
||||||
|
|
||||||
|
To validate the accuracy of the trained model, you can use `validate.py` and pass the checkpoint path via `--ckpt_path`.
|
||||||
|
|
||||||
|
```shell
|
||||||
|
python validate.py -c configs/mobilenetv3/mobilenet_v3_small_ascend.yaml --data_dir /path/to/imagenet --ckpt_path /path/to/ckpt
|
||||||
|
```
|
||||||
|
|
||||||
|
### Deployment
|
||||||
|
|
||||||
|
Please refer to the [deployment tutorial](https://github.com/mindspore-lab/mindcv/blob/main/tutorials/deployment.md) in MindCV.
|
||||||
|
|
||||||
|
## References
|
||||||
|
|
||||||
|
[1] Howard A, Sandler M, Chu G, et al. Searching for mobilenetv3[C]//Proceedings of the IEEE/CVF international conference on computer vision. 2019: 1314-1324.
|
|
@ -0,0 +1,51 @@
|
||||||
|
# system
|
||||||
|
mode: 0
|
||||||
|
distribute: True
|
||||||
|
num_parallel_workers: 8
|
||||||
|
|
||||||
|
# dataset
|
||||||
|
dataset: 'imagenet'
|
||||||
|
data_dir: '/path/to/imagenet'
|
||||||
|
shuffle: True
|
||||||
|
dataset_download: False
|
||||||
|
batch_size: 75
|
||||||
|
drop_remainder: True
|
||||||
|
train_split: 'train'
|
||||||
|
|
||||||
|
# augmentation
|
||||||
|
image_resize: 224
|
||||||
|
scale: [0.08, 1.0]
|
||||||
|
ratio: [0.75, 1.333]
|
||||||
|
hflip: 0.5
|
||||||
|
interpolation: 'bilinear'
|
||||||
|
crop_pct: 0.875
|
||||||
|
|
||||||
|
# model
|
||||||
|
model: 'mobilenet_v3_large_100'
|
||||||
|
num_classes: 1000
|
||||||
|
pretrained: False
|
||||||
|
ckpt_path: ''
|
||||||
|
keep_checkpoint_max: 30
|
||||||
|
ckpt_save_dir: './ckpt'
|
||||||
|
epoch_size: 420
|
||||||
|
dataset_sink_mode: True
|
||||||
|
amp_level: 'O0'
|
||||||
|
|
||||||
|
# loss config
|
||||||
|
loss: 'CE'
|
||||||
|
label_smoothing: 0.1
|
||||||
|
|
||||||
|
# lr scheduler
|
||||||
|
scheduler: 'cosine_decay'
|
||||||
|
min_lr: 0.0
|
||||||
|
lr: 1.08
|
||||||
|
warmup_epochs: 4
|
||||||
|
decay_epochs: 416
|
||||||
|
|
||||||
|
# optimizer
|
||||||
|
opt: 'momentum'
|
||||||
|
filter_bias_and_bn: False
|
||||||
|
momentum: 0.9
|
||||||
|
weight_decay: 0.00004
|
||||||
|
loss_scale: 1024
|
||||||
|
use_nesterov: False
|
|
@ -0,0 +1,52 @@
|
||||||
|
# system
|
||||||
|
mode: 0
|
||||||
|
distribute: True
|
||||||
|
num_parallel_workers: 8
|
||||||
|
|
||||||
|
# dataset
|
||||||
|
dataset: 'imagenet'
|
||||||
|
data_dir: '/path/to/imagenet'
|
||||||
|
shuffle: True
|
||||||
|
dataset_download: False
|
||||||
|
batch_size: 75
|
||||||
|
drop_remainder: True
|
||||||
|
train_split: 'train'
|
||||||
|
|
||||||
|
# augmentation
|
||||||
|
image_resize: 224
|
||||||
|
scale: [0.08, 1.0]
|
||||||
|
ratio: [0.75, 1.333]
|
||||||
|
hflip: 0.5
|
||||||
|
color_jitter: 0.4
|
||||||
|
interpolation: 'bilinear'
|
||||||
|
crop_pct: 0.875
|
||||||
|
|
||||||
|
# model
|
||||||
|
model: 'mobilenet_v3_small_100'
|
||||||
|
num_classes: 1000
|
||||||
|
pretrained: False
|
||||||
|
ckpt_path: ""
|
||||||
|
keep_checkpoint_max: 30
|
||||||
|
ckpt_save_dir: './ckpt'
|
||||||
|
epoch_size: 470
|
||||||
|
dataset_sink_mode: True
|
||||||
|
amp_level: 'O0'
|
||||||
|
|
||||||
|
# loss
|
||||||
|
loss: 'CE'
|
||||||
|
label_smoothing: 0.1
|
||||||
|
|
||||||
|
# lr scheduler
|
||||||
|
scheduler: 'cosine_decay'
|
||||||
|
min_lr: 0.0
|
||||||
|
lr: 0.77
|
||||||
|
warmup_epochs: 4
|
||||||
|
decay_epochs: 466
|
||||||
|
|
||||||
|
# optimizer
|
||||||
|
opt: 'momentum'
|
||||||
|
filter_bias_and_bn: False
|
||||||
|
momentum: 0.9
|
||||||
|
weight_decay: 0.00004
|
||||||
|
loss_scale: 1024
|
||||||
|
use_nesterov: False
|
|
@ -0,0 +1,81 @@
|
||||||
|
# MobileViT
|
||||||
|
> [MobileViT: Light-weight, General-purpose, and Mobile-friendly Vision Transformer](https://arxiv.org/pdf/2110.02178.pdf)
|
||||||
|
|
||||||
|
## Introduction
|
||||||
|
|
||||||
|
MobileViT is a light-weight and general-purpose vision transformer for mobile devices. MobileViT presents a different perspective for the global processing of information with transformers, i.e., transformers as convolutions. MobileViT significantly outperforms CNN- and ViT-based networks across different tasks and datasets. On the ImageNet-1K dataset, MobileViT achieves a top-1 accuracy of 78.4% with about 6 million parameters, which is 3.2% and 6.2% more accurate than MobileNetV3 (CNN-based) and DeiT (ViT-based) for a similar number of parameters. On the MS-COCO object detection task, MobileViT is 5.7% more accurate than MobileNetV3 for a similar number of parameters.
|
||||||
|
|
||||||
|
<p align="center">
|
||||||
|
<img src="https://user-images.githubusercontent.com/64628185/229476902-1b97496a-4a38-40ca-9e50-a88c52defcbb.png" width=800 />
|
||||||
|
</p>
|
||||||
|
<p align="center">
|
||||||
|
<em>Figure 1. Architecture of MobileViT [<a href="#references">1</a>] </em>
|
||||||
|
</p>
|
||||||
|
|
||||||
|
## Results
|
||||||
|
|
||||||
|
Our reproduced model performance on ImageNet-1K is reported as follows.
|
||||||
|
|
||||||
|
<div align="center">
|
||||||
|
|
||||||
|
| Model | Context | Top-1 (%) | Top-5 (%) | Params (M) | Recipe | Download |
|
||||||
|
|-------------|----------|-----------|-----------|------------|-----------------------------------------------------------------------------------------|---------------------------------------------------------------------------------------|
|
||||||
|
| mobilevit_xx_small | D910x8-G | 68.90 | 88.92 | 1.27 | [yaml](https://github.com/mindspore-lab/mindcv/blob/main/configs/mobilevit/mobilevit_xx_small_ascend.yaml) | [weights](https://download.mindspore.cn/toolkits/mindcv/mobilevit/mobilevit_xx_small-af9da8a0.ckpt) |
|
||||||
|
| mobilevit_x_small | D910x8-G | 74.98 | 92.33 | 2.32 | [yaml](https://github.com/mindspore-lab/mindcv/blob/main/configs/mobilevit/mobilevit_x_small_ascend.yaml) | [weights](https://download.mindspore.cn/toolkits/mindcv/mobilevit/mobilevit_x_small-673fc6f2.ckpt) |
|
||||||
|
| mobilevit_small | D910x8-G | 78.48 | 94.18 | 5.59 | [yaml](https://github.com/mindspore-lab/mindcv/blob/main/configs/mobilevit/mobilevit_small_ascend.yaml) | [weights](https://download.mindspore.cn/toolkits/mindcv/mobilevit/mobilevit_small-caf79638.ckpt) |
|
||||||
|
|
||||||
|
</div>
|
||||||
|
|
||||||
|
#### Notes
|
||||||
|
|
||||||
|
- Context: Training context denoted as {device}x{pieces}-{MS mode}, where mindspore mode can be G - graph mode or F - pynative mode with ms function. For example, D910x8-G is for training on 8 pieces of Ascend 910 NPU using graph mode.
|
||||||
|
- Top-1 and Top-5: Accuracy reported on the validation set of ImageNet-1K.
|
||||||
|
|
||||||
|
## Quick Start
|
||||||
|
|
||||||
|
### Preparation
|
||||||
|
|
||||||
|
#### Installation
|
||||||
|
Please refer to the [installation instruction](https://github.com/mindspore-ecosystem/mindcv#installation) in MindCV.
|
||||||
|
|
||||||
|
#### Dataset Preparation
|
||||||
|
Please download the [ImageNet-1K](https://www.image-net.org/challenges/LSVRC/2012/index.php) dataset for model training and validation.
|
||||||
|
|
||||||
|
### Training
|
||||||
|
|
||||||
|
* Distributed Training
|
||||||
|
|
||||||
|
It is easy to reproduce the reported results with the pre-defined training recipe. For distributed training on multiple Ascend 910 devices, please run
|
||||||
|
|
||||||
|
```shell
|
||||||
|
# distributed training on multiple GPU/Ascend devices
|
||||||
|
mpirun -n 8 python train.py --config configs/mobilevit/mobilevit_xx_small_ascend.yaml --data_dir /path/to/imagenet
|
||||||
|
```
|
||||||
|
> If the script is executed by the root user, the `--allow-run-as-root` parameter must be added to `mpirun`.
|
||||||
|
|
||||||
|
Similarly, you can train the model on multiple GPU devices with the above `mpirun` command.
|
||||||
|
|
||||||
|
For detailed illustration of all hyper-parameters, please refer to [config.py](https://github.com/mindspore-lab/mindcv/blob/main/config.py).
|
||||||
|
|
||||||
|
**Note:** As the global batch size (batch_size x num_devices) is an important hyper-parameter, it is recommended to keep the global batch size unchanged for reproduction or adjust the learning rate linearly to a new global batch size.
|
||||||
|
|
||||||
|
* Standalone Training
|
||||||
|
|
||||||
|
If you want to train or finetune the model on a smaller dataset without distributed training, please run:
|
||||||
|
|
||||||
|
```shell
|
||||||
|
# standalone training on a CPU/GPU/Ascend device
|
||||||
|
python train.py --config configs/mobilevit/mobilevit_xx_small_ascend.yaml --data_dir /path/to/dataset --distribute False
|
||||||
|
```
|
||||||
|
|
||||||
|
### Validation
|
||||||
|
|
||||||
|
To validate the accuracy of the trained model, you can use `validate.py` and pass the checkpoint path via `--ckpt_path`.
|
||||||
|
|
||||||
|
```shell
|
||||||
|
python validate.py -c configs/mobilevit/mobilevit_xx_small_ascend.yaml --data_dir /path/to/imagenet --ckpt_path /path/to/ckpt
|
||||||
|
```
|
||||||
|
|
||||||
|
### Deployment
|
||||||
|
|
||||||
|
Please refer to the [deployment tutorial](https://github.com/mindspore-lab/mindcv/blob/main/tutorials/deployment.md) in MindCV.
|
|
@ -0,0 +1,55 @@
|
||||||
|
# system
|
||||||
|
mode: 0
|
||||||
|
distribute: True
|
||||||
|
num_parallel_workers: 8
|
||||||
|
val_while_train: True
|
||||||
|
|
||||||
|
# dataset
|
||||||
|
dataset: 'imagenet'
|
||||||
|
data_dir: '/path/to/imagenet'
|
||||||
|
shuffle: True
|
||||||
|
dataset_download: False
|
||||||
|
batch_size: 64
|
||||||
|
drop_remainder: True
|
||||||
|
|
||||||
|
# augmentation
|
||||||
|
image_resize: 256
|
||||||
|
scale: [0.08, 1.0]
|
||||||
|
ratio: [0.75, 1.333]
|
||||||
|
hflip: 0.5
|
||||||
|
interpolation: 'bilinear'
|
||||||
|
crop_pct: 0.875
|
||||||
|
color_jitter: [0.4, 0.4, 0.4]
|
||||||
|
re_prob: 1.0
|
||||||
|
|
||||||
|
# model
|
||||||
|
model: 'mobilevit_small'
|
||||||
|
num_classes: 1000
|
||||||
|
pretrained: False
|
||||||
|
ckpt_path: ''
|
||||||
|
keep_checkpoint_max: 10
|
||||||
|
ckpt_save_dir: './ckpt'
|
||||||
|
epoch_size: 450
|
||||||
|
dataset_sink_mode: True
|
||||||
|
amp_level: 'O3'
|
||||||
|
|
||||||
|
# loss
|
||||||
|
loss: 'ce'
|
||||||
|
label_smoothing: 0.1
|
||||||
|
|
||||||
|
# lr scheduler
|
||||||
|
scheduler: 'cosine_decay'
|
||||||
|
min_lr: 0.000002
|
||||||
|
lr: 0.002
|
||||||
|
warmup_epochs: 20
|
||||||
|
decay_epochs: 430
|
||||||
|
|
||||||
|
# optimizer
|
||||||
|
opt: 'adamw'
|
||||||
|
filter_bias_and_bn: True
|
||||||
|
momentum: 0.9
|
||||||
|
weight_decay: 0.01
|
||||||
|
use_nesterov: False
|
||||||
|
loss_scale_type: 'dynamic'
|
||||||
|
drop_overflow_update: True
|
||||||
|
loss_scale: 1024
|
|
@ -0,0 +1,55 @@
|
||||||
|
# system
|
||||||
|
mode: 0
|
||||||
|
distribute: True
|
||||||
|
num_parallel_workers: 8
|
||||||
|
val_while_train: True
|
||||||
|
|
||||||
|
# dataset
|
||||||
|
dataset: 'imagenet'
|
||||||
|
data_dir: '/path/to/imagenet'
|
||||||
|
shuffle: True
|
||||||
|
dataset_download: False
|
||||||
|
batch_size: 64
|
||||||
|
drop_remainder: True
|
||||||
|
|
||||||
|
# augmentation
|
||||||
|
image_resize: 256
|
||||||
|
scale: [0.08, 1.0]
|
||||||
|
ratio: [0.75, 1.333]
|
||||||
|
hflip: 0.5
|
||||||
|
interpolation: 'bilinear'
|
||||||
|
crop_pct: 0.875
|
||||||
|
color_jitter: [0.4, 0.4, 0.4]
|
||||||
|
re_prob: 1.0
|
||||||
|
|
||||||
|
# model
|
||||||
|
model: 'mobilevit_x_small'
|
||||||
|
num_classes: 1000
|
||||||
|
pretrained: False
|
||||||
|
ckpt_path: ''
|
||||||
|
keep_checkpoint_max: 10
|
||||||
|
ckpt_save_dir: './ckpt'
|
||||||
|
epoch_size: 450
|
||||||
|
dataset_sink_mode: True
|
||||||
|
amp_level: 'O3'
|
||||||
|
|
||||||
|
# loss
|
||||||
|
loss: 'ce'
|
||||||
|
label_smoothing: 0.1
|
||||||
|
|
||||||
|
# lr scheduler
|
||||||
|
scheduler: 'cosine_decay'
|
||||||
|
min_lr: 0.000002
|
||||||
|
lr: 0.002
|
||||||
|
warmup_epochs: 20
|
||||||
|
decay_epochs: 430
|
||||||
|
|
||||||
|
# optimizer
|
||||||
|
opt: 'adamw'
|
||||||
|
filter_bias_and_bn: True
|
||||||
|
momentum: 0.9
|
||||||
|
weight_decay: 0.01
|
||||||
|
use_nesterov: False
|
||||||
|
loss_scale_type: 'dynamic'
|
||||||
|
drop_overflow_update: True
|
||||||
|
loss_scale: 1024
|
|
@ -0,0 +1,55 @@
|
||||||
|
# system
|
||||||
|
mode: 0
|
||||||
|
distribute: True
|
||||||
|
num_parallel_workers: 8
|
||||||
|
val_while_train: True
|
||||||
|
|
||||||
|
# dataset
|
||||||
|
dataset: 'imagenet'
|
||||||
|
data_dir: '/path/to/imagenet'
|
||||||
|
shuffle: True
|
||||||
|
dataset_download: False
|
||||||
|
batch_size: 64
|
||||||
|
drop_remainder: True
|
||||||
|
|
||||||
|
# augmentation
|
||||||
|
image_resize: 256
|
||||||
|
scale: [0.08, 1.0]
|
||||||
|
ratio: [0.75, 1.333]
|
||||||
|
hflip: 0.5
|
||||||
|
interpolation: 'bilinear'
|
||||||
|
crop_pct: 0.875
|
||||||
|
color_jitter: [0.4, 0.4, 0.4]
|
||||||
|
re_prob: 1.0
|
||||||
|
|
||||||
|
# model
|
||||||
|
model: 'mobilevit_xx_small'
|
||||||
|
num_classes: 1000
|
||||||
|
pretrained: False
|
||||||
|
ckpt_path: ''
|
||||||
|
keep_checkpoint_max: 10
|
||||||
|
ckpt_save_dir: './ckpt'
|
||||||
|
epoch_size: 450
|
||||||
|
dataset_sink_mode: True
|
||||||
|
amp_level: 'O3'
|
||||||
|
|
||||||
|
# loss
|
||||||
|
loss: 'ce'
|
||||||
|
label_smoothing: 0.1
|
||||||
|
|
||||||
|
# lr scheduler
|
||||||
|
scheduler: 'cosine_decay'
|
||||||
|
min_lr: 0.000002
|
||||||
|
lr: 0.002
|
||||||
|
warmup_epochs: 40
|
||||||
|
decay_epochs: 410
|
||||||
|
|
||||||
|
# optimizer
|
||||||
|
opt: 'adamw'
|
||||||
|
filter_bias_and_bn: True
|
||||||
|
momentum: 0.9
|
||||||
|
weight_decay: 0.01
|
||||||
|
use_nesterov: False
|
||||||
|
loss_scale_type: 'dynamic'
|
||||||
|
drop_overflow_update: True
|
||||||
|
loss_scale: 1024
|
|
@ -0,0 +1,100 @@
|
||||||
|
# NasNet
|
||||||
|
<!--- Guideline: please use url linked to the paper abstract in ArXiv instead of PDF for fast loading. -->
|
||||||
|
> [Learning Transferable Architectures for Scalable Image Recognition](https://arxiv.org/abs/1707.07012)
|
||||||
|
|
||||||
|
## Introduction
|
||||||
|
<!--- Guideline: Introduce the model and architectures. Please cite if you use/adopt paper explanation from others. -->
|
||||||
|
<!--- Guideline: If an architecture table/figure is available in the paper, please put one here and cite for intuitive illustration. -->
|
||||||
|
|
||||||
|
Neural architecture search (NAS) brings great flexibility to model configuration. By searching over a pool of candidate operations that includes convolution, max pooling, and average pooling layers, a normal cell and a reduction cell are selected as the building blocks of NasNet. Figure 1 shows the NasNet architecture for ImageNet, which is built by stacking reduction cells and normal cells. As a result, NasNet achieves better performance on image classification with fewer model parameters and lower computation cost than previous state-of-the-art methods on the ImageNet-1K dataset.[[1](#references)]
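As a rough schematic of how the two cell types are stacked (plain Python; the cell sequence and feature-map size are illustrative, not the exact NasNet-A configuration), normal cells preserve the feature-map resolution while reduction cells halve it:

```python
# Feature-map side length through a stack of normal (N) and reduction (R) cells.
stack = ["N", "N", "R", "N", "N", "R", "N", "N"]  # illustrative cell sequence
size = 56                                         # illustrative input feature-map side

for cell in stack:
    if cell == "R":
        size //= 2  # a reduction cell halves the spatial resolution
    print(f"{cell} cell -> {size}x{size} feature map")
```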
|
||||||
|
|
||||||
|
<p align="center">
|
||||||
|
<img src="https://user-images.githubusercontent.com/77485245/224208085-0d6e6b91-873d-49cb-ad54-23ea12483d8f.jpeg" width=200 height=400/>
|
||||||
|
</p>
|
||||||
|
<p align="center">
|
||||||
|
<em>Figure 1. Architecture of Nasnet [<a href="#references">1</a>] </em>
|
||||||
|
</p>
|
||||||
|
|
||||||
|
## Results
|
||||||
|
<!--- Guideline:
|
||||||
|
Table Format:
|
||||||
|
- Model: model name in lower case with _ seperator.
|
||||||
|
- Context: Training context denoted as {device}x{pieces}-{MS mode}, where mindspore mode can be G - graph mode or F - pynative mode with ms function. For example, D910x8-G is for training on 8 pieces of Ascend 910 NPU using graph mode.
|
||||||
|
- Top-1 and Top-5: Keep 2 digits after the decimal point.
|
||||||
|
- Params (M): # of model parameters in millions (10^6). Keep 2 digits after the decimal point
|
||||||
|
- Recipe: Training recipe/configuration linked to a yaml config file. Use absolute url path.
|
||||||
|
- Download: url of the pretrained model weights. Use absolute url path.
|
||||||
|
-->
|
||||||
|
|
||||||
|
Our reproduced model performance on ImageNet-1K is reported as follows.
|
||||||
|
|
||||||
|
<div align="center">
|
||||||
|
|
||||||
|
| Model | Context | Top-1 (%) | Top-5 (%) | Params (M) | Recipe | Download |
|
||||||
|
|-----------------|----------|-----------|-----------|------------|------------------------------------------------------------------------------------------------------|-----------------------------------------------------------------------------------------------|
|
||||||
|
| nasnet_a_4x1056 | D910x8-G | 73.65 | 91.25 | 5.33 | [yaml](https://github.com/mindspore-lab/mindcv/blob/main/configs/nasnet/nasnet_a_4x1056_ascend.yaml) | [weights](https://download.mindspore.cn/toolkits/mindcv/nasnet/nasnet_a_4x1056-0fbb5cdd.ckpt) |
|
||||||
|
|
||||||
|
</div>
|
||||||
|
|
||||||
|
#### Notes
|
||||||
|
|
||||||
|
- Context: Training context denoted as {device}x{pieces}-{MS mode}, where mindspore mode can be G - graph mode or F - pynative mode with ms function. For example, D910x8-G is for training on 8 pieces of Ascend 910 NPU using graph mode.
|
||||||
|
- Top-1 and Top-5: Accuracy reported on the validation set of ImageNet-1K.
|
||||||
|
|
||||||
|
## Quick Start
|
||||||
|
|
||||||
|
### Preparation
|
||||||
|
|
||||||
|
#### Installation
|
||||||
|
Please refer to the [installation instruction](https://github.com/mindspore-lab/mindcv#installation) in MindCV.
|
||||||
|
|
||||||
|
#### Dataset Preparation
|
||||||
|
Please download the [ImageNet-1K](https://www.image-net.org/challenges/LSVRC/2012/index.php) dataset for model training and validation.
|
||||||
|
|
||||||
|
### Training
|
||||||
|
<!--- Guideline: Please avoid using shell scripts in the command line. Python scripts preferred. -->
|
||||||
|
|
||||||
|
* Distributed Training
|
||||||
|
|
||||||
|
It is easy to reproduce the reported results with the pre-defined training recipe. For distributed training on multiple Ascend 910 devices, please run
|
||||||
|
|
||||||
|
```shell
|
||||||
|
# distributed training on multiple GPU/Ascend devices
|
||||||
|
mpirun -n 8 python train.py --config configs/nasnet/nasnet_a_4x1056_ascend.yaml --data_dir /path/to/imagenet
|
||||||
|
```
|
||||||
|
> If the script is executed by the root user, the `--allow-run-as-root` parameter must be added to `mpirun`.
|
||||||
|
|
||||||
|
Similarly, you can train the model on multiple GPU devices with the above `mpirun` command.
|
||||||
|
|
||||||
|
For detailed illustration of all hyper-parameters, please refer to [config.py](https://github.com/mindspore-lab/mindcv/blob/main/config.py).
|
||||||
|
|
||||||
|
**Note:** As the global batch size (batch_size x num_devices) is an important hyper-parameter, it is recommended to keep the global batch size unchanged for reproduction or adjust the learning rate linearly to a new global batch size.
|
||||||
|
|
||||||
|
* Standalone Training
|
||||||
|
|
||||||
|
If you want to train or finetune the model on a smaller dataset without distributed training, please run:
|
||||||
|
|
||||||
|
```shell
|
||||||
|
# standalone training on a CPU/GPU/Ascend device
|
||||||
|
python train.py --config configs/nasnet/nasnet_a_4x1056_ascend.yaml --data_dir /path/to/dataset --distribute False
|
||||||
|
```
|
||||||
|
|
||||||
|
### Validation
|
||||||
|
|
||||||
|
To validate the accuracy of the trained model, you can use `validate.py` and pass the checkpoint path via `--ckpt_path`.
|
||||||
|
|
||||||
|
```shell
|
||||||
|
python validate.py -c configs/nasnet/nasnet_a_4x1056_ascend.yaml --data_dir /path/to/imagenet --ckpt_path /path/to/ckpt
|
||||||
|
```
|
||||||
|
|
||||||
|
### Deployment
|
||||||
|
|
||||||
|
Please refer to the [deployment tutorial](https://github.com/mindspore-lab/mindcv/blob/main/tutorials/deployment.md) in MindCV.
|
||||||
|
|
||||||
|
## References
|
||||||
|
<!--- Guideline: Citation format GB/T 7714 is suggested. -->
|
||||||
|
|
||||||
|
[1] Zoph B, Vasudevan V, Shlens J, et al. Learning transferable architectures for scalable image recognition[C]//Proceedings of the IEEE conference on computer vision and pattern recognition. 2018: 8697-8710.
|
|
@ -0,0 +1,53 @@
|
||||||
|
# system
|
||||||
|
mode: 0
|
||||||
|
distribute: True
|
||||||
|
num_parallel_workers: 8
|
||||||
|
val_while_train: True
|
||||||
|
|
||||||
|
# dataset
|
||||||
|
dataset: 'imagenet'
|
||||||
|
data_dir: '/path/to/imagenet'
|
||||||
|
shuffle: True
|
||||||
|
dataset_download: False
|
||||||
|
batch_size: 256
|
||||||
|
drop_remainder: True
|
||||||
|
|
||||||
|
# augmentation
|
||||||
|
image_resize: 224
|
||||||
|
scale: [0.08, 1.0]
|
||||||
|
ratio: [0.75, 1.333]
|
||||||
|
hflip: 0.5
|
||||||
|
interpolation: 'bilinear'
|
||||||
|
crop_pct: 0.875
|
||||||
|
|
||||||
|
# model
|
||||||
|
model: 'nasnet_a_4x1056'
|
||||||
|
num_classes: 1000
|
||||||
|
pretrained: False
|
||||||
|
ckpt_path: ''
|
||||||
|
keep_checkpoint_max: 10
|
||||||
|
ckpt_save_dir: './ckpt'
|
||||||
|
epoch_size: 450
|
||||||
|
dataset_sink_mode: True
|
||||||
|
amp_level: 'O0'
|
||||||
|
|
||||||
|
# loss
|
||||||
|
loss: 'CE'
|
||||||
|
label_smoothing: 0.1
|
||||||
|
|
||||||
|
# lr scheduler
|
||||||
|
scheduler: 'cosine_decay'
|
||||||
|
min_lr: 1e-10
|
||||||
|
lr: 0.016
|
||||||
|
warmup_epochs: 5
|
||||||
|
decay_epochs: 445
|
||||||
|
|
||||||
|
# optimizer
|
||||||
|
opt: 'rmsprop'
|
||||||
|
filter_bias_and_bn: True
|
||||||
|
momentum: 0.9
|
||||||
|
weight_decay: 1e-5
|
||||||
|
loss_scale_type: 'dynamic'
|
||||||
|
drop_overflow_update: True
|
||||||
|
use_nesterov: False
|
||||||
|
eps: 1e-3
|
|
@ -0,0 +1,89 @@
|
||||||
|
# PiT
|
||||||
|
> [PiT: Rethinking Spatial Dimensions of Vision Transformers](https://arxiv.org/abs/2103.16302v2)
|
||||||
|
|
||||||
|
## Introduction
|
||||||
|
|
||||||
|
PiT (Pooling-based Vision Transformer) is an improvement of the Vision Transformer (ViT) proposed by Byeongho Heo et al. in 2021. PiT adds pooling layers to the ViT backbone so that the spatial dimension is reduced at each stage, as in CNNs, instead of keeping the same spatial dimension across all layers as ViT does. PiT achieves improved model capability and generalization performance compared with ViT. [[1](#references)]
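As a small illustration of that difference, the sketch below (plain Python) compares the token-grid size per stage for a ViT-style stack and a PiT-style stack; the patch size, stage count, and pooling factor are assumptions for the example, not the exact PiT settings.

```python
# Token-grid side length per stage: ViT keeps one resolution, PiT pools it down.
image_size, patch_size, num_stages = 224, 16, 3
grid = image_size // patch_size  # 14x14 tokens after patch embedding

vit_grids = [grid for _ in range(num_stages)]                      # 14 -> 14 -> 14
pit_grids = [grid // (2 ** stage) for stage in range(num_stages)]  # 14 -> 7 -> 3

print("ViT token grid per stage:", vit_grids)
print("PiT token grid per stage:", pit_grids)
```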
|
||||||
|
|
||||||
|
|
||||||
|
<p align="center">
|
||||||
|
<img src="https://user-images.githubusercontent.com/37565353/215304821-efaf99ad-12ba-4020-90a3-5897247f9368.png" width=400 />
|
||||||
|
|
||||||
|
</p>
|
||||||
|
<p align="center">
|
||||||
|
<em>Figure 1. Architecture of PiT [<a href="#references">1</a>] </em>
|
||||||
|
</p>
|
||||||
|
|
||||||
|
## Results
|
||||||
|
|
||||||
|
Our reproduced model performance on ImageNet-1K is reported as follows.
|
||||||
|
|
||||||
|
<div align="center">
|
||||||
|
|
||||||
|
| Model | Context | Top-1 (%) | Top-5 (%) | Params (M) | Recipe | Download |
|
||||||
|
|--------|----------|-----------|-----------|------------|------------------------------------------------------------------------------------------|-----------------------------------------------------------------------------------|
|
||||||
|
| PiT_ti | D910x8-G | 72.96 | 91.33 | 4.85 | [yaml](https://github.com/mindspore-lab/mindcv/blob/main/configs/pit/pit_ti_ascend.yaml) | [weights](https://download.mindspore.cn/toolkits/mindcv/pit/pit_ti-e647a593.ckpt) |
|
||||||
|
| PiT_xs | D910x8-G | 78.41 | 94.06 | 10.61 | [yaml](https://github.com/mindspore-lab/mindcv/blob/main/configs/pit/pit_xs_ascend.yaml) | [weights](https://download.mindspore.cn/toolkits/mindcv/pit/pit_xs-fea0d37e.ckpt) |
|
||||||
|
| PiT_s | D910x8-G | 80.56 | 94.80 | 23.46 | [yaml](https://github.com/mindspore-lab/mindcv/blob/main/configs/pit/pit_s_ascend.yaml) | [weights](https://download.mindspore.cn/toolkits/mindcv/pit/pit_s-3c1ba36f.ckpt) |
|
||||||
|
| PiT_b | D910x8-G | 81.87 | 95.04 | 73.76 | [yaml](https://github.com/mindspore-lab/mindcv/blob/main/configs/pit/pit_b_ascend.yaml) | [weights](https://download.mindspore.cn/toolkits/mindcv/pit/pit_b-2411c9b6.ckpt) |
|
||||||
|
|
||||||
|
</div>
|
||||||
|
|
||||||
|
#### Notes
|
||||||
|
|
||||||
|
- Context: Training context denoted as {device}x{pieces}-{MS mode}, where mindspore mode can be G - graph mode or F - pynative mode with ms function. For example, D910x8-G is for training on 8 pieces of Ascend 910 NPU using graph mode.
|
||||||
|
- Top-1 and Top-5: Accuracy reported on the validation set of ImageNet-1K.
|
||||||
|
|
||||||
|
## Quick Start
|
||||||
|
|
||||||
|
### Preparation
|
||||||
|
|
||||||
|
#### Installation
|
||||||
|
Please refer to the [installation instruction](https://github.com/mindspore-ecosystem/mindcv#installation) in MindCV.
|
||||||
|
|
||||||
|
#### Dataset Preparation
|
||||||
|
Please download the [ImageNet-1K](https://www.image-net.org/challenges/LSVRC/2012/index.php) dataset for model training and validation.
|
||||||
|
|
||||||
|
### Training
|
||||||
|
|
||||||
|
* Distributed Training
|
||||||
|
|
||||||
|
It is easy to reproduce the reported results with the pre-defined training recipe. For distributed training on multiple Ascend 910 devices, please run
|
||||||
|
|
||||||
|
```shell
|
||||||
|
# distributed training on multiple GPU/Ascend devices
|
||||||
|
mpirun -n 8 python train.py --config configs/pit/pit_xs_ascend.yaml --data_dir /path/to/imagenet
|
||||||
|
```
|
||||||
|
|
||||||
|
> If the script is executed by the root user, the `--allow-run-as-root` parameter must be added to `mpirun`.
|
||||||
|
|
||||||
|
Similarly, you can train the model on multiple GPU devices with the above `mpirun` command.
|
||||||
|
|
||||||
|
For detailed illustration of all hyper-parameters, please refer to [config.py](https://github.com/mindspore-lab/mindcv/blob/main/config.py).
|
||||||
|
|
||||||
|
**Note:** As the global batch size (batch_size x num_devices) is an important hyper-parameter, it is recommended to keep the global batch size unchanged for reproduction or adjust the learning rate linearly to a new global batch size.
|
||||||
|
|
||||||
|
* Standalone Training
|
||||||
|
|
||||||
|
If you want to train or finetune the model on a smaller dataset without distributed training, please run:
|
||||||
|
|
||||||
|
```shell
|
||||||
|
# standalone training on a CPU/GPU/Ascend device
|
||||||
|
python train.py --config configs/pit/pit_xs_ascend.yaml --data_dir /path/to/dataset --distribute False
|
||||||
|
```
|
||||||
|
|
||||||
|
### Validation
|
||||||
|
|
||||||
|
To validate the accuracy of the trained model, you can use `validate.py` and pass the checkpoint path via `--ckpt_path`.
|
||||||
|
|
||||||
|
```shell
|
||||||
|
python validate.py -c configs/pit/pit_xs_ascend.yaml --data_dir /path/to/imagenet --ckpt_path /path/to/ckpt
|
||||||
|
```
|
||||||
|
|
||||||
|
### Deployment
|
||||||
|
|
||||||
|
Please refer to the [deployment tutorial](https://github.com/mindspore-lab/mindcv/blob/main/tutorials/deployment.md) in MindCV.
|
||||||
|
|
||||||
|
## References
|
||||||
|
|
||||||
|
[1] Heo B, Yun S, Han D, et al. Rethinking spatial dimensions of vision transformers[C]//Proceedings of the IEEE/CVF International Conference on Computer Vision. 2021: 11936-11945.
|
|
@ -0,0 +1,60 @@
# system
mode: 0
distribute: True
num_parallel_workers: 8
val_while_train: True

# dataset
dataset: "imagenet"
data_dir: "/path/to/imagenet"
shuffle: True
dataset_download: False
batch_size: 128
drop_remainder: True

# augmentation
image_resize: 224
scale: [0.08, 1.0]
ratio: [0.75, 1.333]
re_value: "random"
hflip: 0.5
interpolation: "bicubic"
auto_augment: "randaug-m9-mstd0.5-inc1"
re_prob: 0.25
crop_pct: 0.9
mixup: 0.8
cutmix: 1.0
aug_repeats: 3

# model
model: "pit_b"
num_classes: 1000
pretrained: False
ckpt_path: ""
keep_checkpoint_max: 10
ckpt_save_dir: "./ckpt"
epoch_size: 600
drop_path_rate: 0.2
dataset_sink_mode: True
amp_level: "O2"

# loss
loss: "ce"
label_smoothing: 0.1

# lr scheduler
scheduler: "cosine_decay"
lr: 0.001
min_lr: 0.00001
lr_epoch_stair: True
decay_epochs: 590
warmup_epochs: 10
warmup_factor: 0.002

# optimizer
opt: "adamw"
filter_bias_and_bn: True
momentum: 0.9
weight_decay: 0.05
loss_scale_type: "auto"
use_nesterov: False
@ -0,0 +1,61 @@
# system
mode: 0
distribute: True
num_parallel_workers: 8
val_while_train: True

# dataset
dataset: "imagenet"
data_dir: "/path/to/imagenet"
shuffle: True
dataset_download: False
batch_size: 128
drop_remainder: True

# augmentation
image_resize: 224
scale: [0.08, 1.0]
ratio: [0.75, 1.333]
re_value: "random"
hflip: 0.5
interpolation: "bicubic"
color_jitter: 0.3
auto_augment: "randaug-m9-mstd0.5-inc1"
re_prob: 0.25
crop_pct: 0.875
mixup: 0.8
cutmix: 1.0

# model
model: "pit_s"
num_classes: 1000
pretrained: False
ckpt_path: ""
keep_checkpoint_max: 10
ckpt_save_dir: "./ckpt"
epoch_size: 600
drop_path_rate: 0.1
dataset_sink_mode: True
amp_level: "O2"

# loss
loss: "ce"
label_smoothing: 0.1

# lr scheduler
scheduler: "cosine_decay"
lr: 0.002
min_lr: 0.00001
lr_epoch_stair: True
decay_epochs: 590
warmup_epochs: 10
warmup_factor: 0.002

# optimizer
opt: "adamw"
filter_bias_and_bn: True
momentum: 0.9
weight_decay: 0.05
loss_scale_type: "dynamic"
drop_overflow_update: True
use_nesterov: False
@ -0,0 +1,57 @@
# system
mode: 0
distribute: True
num_parallel_workers: 16
val_while_train: True

# dataset
dataset: "imagenet"
data_dir: "/path/to/imagenet"
shuffle: True
dataset_download: False
batch_size: 128
drop_remainder: True

# augmentation
image_resize: 224
scale: [0.08, 1.0]
ratio: [0.75, 1.333]
re_value: "random"
hflip: 0.5
interpolation: "bicubic"
auto_augment: "randaug-m9-mstd0.5-inc1"
re_prob: 0.25
crop_pct: 0.875
mixup: 0.8
cutmix: 1.0

# model
model: "pit_ti"
num_classes: 1000
pretrained: False
ckpt_path: ""
keep_checkpoint_max: 10
ckpt_save_dir: "./ckpt"
epoch_size: 500
drop_path_rate: 0.1
dataset_sink_mode: True
amp_level: "O2"

# loss
loss: "ce"
label_smoothing: 0.1

# lr scheduler
scheduler: "cosine_decay"
lr: 0.002
min_lr: 0.00001
decay_epochs: 490
warmup_epochs: 10

# optimizer
opt: "adamw"
filter_bias_and_bn: True
momentum: 0.9
weight_decay: 0.05
loss_scale_type: "auto"
use_nesterov: False
@ -0,0 +1,59 @@
# system
mode: 0
distribute: True
num_parallel_workers: 8
val_while_train: True

# dataset
dataset: "imagenet"
data_dir: "/path/to/imagenet"
shuffle: True
dataset_download: False
batch_size: 128
drop_remainder: True

# augmentation
image_resize: 224
scale: [0.08, 1.0]
ratio: [0.75, 1.333]
re_value: "random"
hflip: 0.5
interpolation: "bicubic"
color_jitter: [0.3, 0.3, 0.3]
auto_augment: "randaug-m9-mstd0.5-inc1"
re_prob: 0.25
crop_pct: 0.875
mixup: 0.8
cutmix: 1.0

# model
model: "pit_xs"
num_classes: 1000
pretrained: False
ckpt_path: ""
keep_checkpoint_max: 10
ckpt_save_dir: "./ckpt"
epoch_size: 600
drop_path_rate: 0.1
dataset_sink_mode: True
amp_level: "O2"

# loss
loss: "ce"
label_smoothing: 0.1

# lr scheduler
scheduler: "cosine_decay"
lr: 0.001
min_lr: 0.00001
decay_epochs: 590
warmup_epochs: 10

# optimizer
opt: "adamw"
filter_bias_and_bn: True
momentum: 0.9
weight_decay: 0.05
loss_scale_type: "dynamic"
drop_overflow_update: True
use_nesterov: False
@ -0,0 +1,83 @@
# PoolFormer

> [MetaFormer Is Actually What You Need for Vision](https://arxiv.org/abs/2111.11418)

## Introduction

Instead of designing a complicated token mixer to achieve SOTA performance, the goal of this work is to demonstrate that the competence of Transformer models largely stems from the general architecture MetaFormer. Pooling/PoolFormer are just the tools to support the authors' claim.

![MetaFormer](https://user-images.githubusercontent.com/74176172/210046827-c218f5d3-1ee8-47bf-a78a-482d821ece89.png)
Figure 1. MetaFormer and performance of MetaFormer-based models on the ImageNet-1K validation set. The authors argue that the competence of Transformer/MLP-like models primarily stems from the general architecture MetaFormer rather than from the specific token mixers they are equipped with. To demonstrate this, they exploit an embarrassingly simple non-parametric operator, pooling, to conduct extremely basic token mixing. Surprisingly, the resulting model PoolFormer consistently outperforms DeiT and ResMLP, as shown in (b), which strongly supports that MetaFormer is actually what we need to achieve competitive performance. RSB-ResNet in (b) means the results are from "ResNet Strikes Back", where ResNet is trained with an improved training procedure for 300 epochs.

![PoolFormer](https://user-images.githubusercontent.com/74176172/210046845-6caa1574-b6a4-47f3-8298-c8ca3b4f8fa4.png)
Figure 2. (a) The overall framework of PoolFormer. (b) The architecture of the PoolFormer block. Compared with the Transformer block, it replaces attention with an extremely simple non-parametric operator, pooling, to conduct only basic token mixing.[[1](#References)]

## Results

Our reproduced model performance on ImageNet-1K is reported as follows.

<div align="center">

| Model | Context | Top-1 (%) | Top-5 (%) | Params (M) | Recipe | Download |
|:--------------:|:--------:|:---------:|:---------:|:----------:|---------------------------------------------------------------------------------------------------------|--------------------------------------------------------------------------------------------------|
| poolformer_s12 | D910x8-G | 77.33 | 93.34 | 11.92 | [yaml](https://github.com/mindspore-lab/mindcv/blob/main/configs/poolformer/poolformer_s12_ascend.yaml) | [weights](https://download.mindspore.cn/toolkits/mindcv/poolformer/poolformer_s12-5be5c4e4.ckpt) |

</div>

#### Notes

- Context: Training context denoted as {device}x{pieces}-{MS mode}, where mindspore mode can be G - graph mode or F - pynative mode with ms function. For example, D910x8-G is for training on 8 pieces of Ascend 910 NPU using graph mode.
- Top-1 and Top-5: Accuracy reported on the validation set of ImageNet-1K.

## Quick Start

### Preparation

#### Installation

Please refer to the [installation instruction](https://github.com/mindspore-lab/mindcv#installation) in MindCV.

#### Dataset Preparation

Please download the [ImageNet-1K](https://www.image-net.org/challenges/LSVRC/2012/index.php) dataset for model training and validation.

### Training

- Distributed Training

It is easy to reproduce the reported results with the pre-defined training recipe. For distributed training on multiple Ascend 910 devices, please run

```shell
# distributed training on multiple GPU/Ascend devices
mpirun -n 8 python train.py --config configs/poolformer/poolformer_s12_ascend.yaml --data_dir /path/to/imagenet
```

> If the script is executed by the root user, the `--allow-run-as-root` parameter must be added to `mpirun`.

Similarly, you can train the model on multiple GPU devices with the above `mpirun` command.

For detailed illustration of all hyper-parameters, please refer to [config.py](https://github.com/mindspore-lab/mindcv/blob/main/config.py).

**Note:** As the global batch size (batch_size x num_devices) is an important hyper-parameter, it is recommended to keep the global batch size unchanged for reproduction or adjust the learning rate linearly to a new global batch size.

- Standalone Training

If you want to train or finetune the model on a smaller dataset without distributed training, please run:

```shell
# standalone training on a CPU/GPU/Ascend device
python train.py --config configs/poolformer/poolformer_s12_ascend.yaml --data_dir /path/to/imagenet --distribute False
```

### Validation

Validation of PoolFormer has to be done in amp O3 mode, which is not yet supported. Coming soon.
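
Once supported, the validation command would presumably mirror that of the other models; the following is only a sketch of the expected usage and is not yet functional according to the note above.

```shell
# expected validation command once amp O3 validation is supported (hypothetical)
python validate.py -c configs/poolformer/poolformer_s12_ascend.yaml --data_dir /path/to/imagenet --ckpt_path /path/to/ckpt
```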

### Deployment

To deploy online inference services with the trained model efficiently, please refer to the [deployment tutorial](https://github.com/mindspore-lab/mindcv/blob/main/tutorials/deployment.md).

## References

[1] Yu W, Luo M, Zhou P, et al. Metaformer is actually what you need for vision[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2022: 10819-10829.
@ -0,0 +1,61 @@
# system
mode: 0
distribute: True
num_parallel_workers: 8
val_while_train: True

# dataset
dataset: 'imagenet'
data_dir: '/path/to/imagenet'
shuffle: True
dataset_download: False
batch_size: 128
drop_remainder: True

# augmentation
image_resize: 224
scale: [0.08, 1.0]
ratio: [0.75, 1.333]
hflip: 0.5
vflip: 0.0
interpolation: 'bilinear'
crop_pct: 0.9
color_jitter: [0.4, 0.4, 0.4]
re_prob: 0.25
mixup: 0.8
cutmix: 1.0
cutmix_prob: 1.0
auto_augment: 'randaug-m9-mstd0.5-inc1'

# model
model: 'poolformer_s12'
drop_rate: 0.0
drop_path_rate: 0.1
num_classes: 1000
pretrained: False
ckpt_path: ''
keep_checkpoint_max: 10
ckpt_save_dir: './ckpt'
epoch_size: 600
dataset_sink_mode: True
amp_level: 'O3'

# loss
loss: 'CE'
label_smoothing: 0.1

# lr scheduler
scheduler: 'cosine_decay'
lr: 0.0005
min_lr: 1e-06
warmup_epochs: 30
decay_epochs: 570
decay_rate: 0.1

# optimizer
opt: 'AdamW'
filter_bias_and_bn: True
momentum: 0.9
weight_decay: 0.05
loss_scale: 1024
use_nesterov: False
@ -0,0 +1,84 @@
# Pyramid Vision Transformer

> [Pyramid Vision Transformer: A Versatile Backbone for Dense Prediction without Convolutions](https://arxiv.org/abs/2102.12122)

## Introduction

PVT is a general backbone network for dense prediction without convolution operations. PVT introduces a pyramid structure into the Transformer to generate multi-scale feature maps for dense prediction tasks. PVT uses a progressive shrinking strategy to control the size of the feature maps through the patch embedding layers, and proposes a spatial-reduction attention (SRA) layer to replace the traditional multi-head attention layer in the encoder, which greatly reduces the computational/memory overhead.[[1](#References)]

![PVT](https://user-images.githubusercontent.com/74176172/210046926-2322161b-a963-4603-b3cb-86ecdca41262.png)

## Results

Our reproduced model performance on ImageNet-1K is reported as follows.

<div align="center">

| Model | Context | Top-1 (%) | Top-5 (%) | Params (M) | Recipe | Download |
|:----------:|:--------:|:---------:|:---------:|:----------:|----------------------------------------------------------------------------------------------|----------------------------------------------------------------------------------|
| PVT_tiny | D910x8-G | 74.81 | 92.18 | 13.23 | [yaml](https://github.com/mindspore-lab/mindcv/blob/main/configs/pvt/pvt_tiny_ascend.yaml) | [weights](https://download.mindspore.cn/toolkits/mindcv/pvt/pvt_tiny-6abb953d.ckpt) |
| PVT_small | D910x8-G | 79.66 | 94.71 | 24.49 | [yaml](https://github.com/mindspore-lab/mindcv/blob/main/configs/pvt/pvt_small_ascend.yaml) | [weights](https://download.mindspore.cn/toolkits/mindcv/pvt/pvt_small-213c2ed1.ckpt) |
| PVT_medium | D910x8-G | 81.82 | 95.81 | 44.21 | [yaml](https://github.com/mindspore-lab/mindcv/blob/main/configs/pvt/pvt_medium_ascend.yaml) | [weights](https://download.mindspore.cn/toolkits/mindcv/pvt/pvt_medium-469e6802.ckpt) |
| PVT_large | D910x8-G | 81.75 | 95.70 | 61.36 | [yaml](https://github.com/mindspore-lab/mindcv/blob/main/configs/pvt/pvt_large_ascend.yaml) | [weights](https://download.mindspore.cn/toolkits/mindcv/pvt/pvt_large-bb6895d7.ckpt) |

</div>

#### Notes

- Context: Training context denoted as {device}x{pieces}-{MS mode}, where mindspore mode can be G - graph mode or F - pynative mode with ms function. For example, D910x8-G is for training on 8 pieces of Ascend 910 NPU using graph mode.
- Top-1 and Top-5: Accuracy reported on the validation set of ImageNet-1K.

## Quick Start

### Preparation

#### Installation

Please refer to the [installation instruction](https://github.com/mindspore-lab/mindcv#installation) in MindCV.

#### Dataset Preparation

Please download the [ImageNet-1K](https://www.image-net.org/challenges/LSVRC/2012/index.php) dataset for model training and validation.

### Training

- Distributed Training

It is easy to reproduce the reported results with the pre-defined training recipe. For distributed training on multiple Ascend 910 devices, please run

```shell
# distributed training on multiple GPU/Ascend devices
mpirun -n 8 python train.py --config configs/pvt/pvt_tiny_ascend.yaml --data_dir /path/to/imagenet
```

> If the script is executed by the root user, the `--allow-run-as-root` parameter must be added to `mpirun`.

Similarly, you can train the model on multiple GPU devices with the above `mpirun` command.

For detailed illustration of all hyper-parameters, please refer to [config.py](https://github.com/mindspore-lab/mindcv/blob/main/config.py).

**Note:** As the global batch size (batch_size x num_devices) is an important hyper-parameter, it is recommended to keep the global batch size unchanged for reproduction or adjust the learning rate linearly to a new global batch size.

- Standalone Training

If you want to train or finetune the model on a smaller dataset without distributed training, please run:

```shell
# standalone training on a CPU/GPU/Ascend device
python train.py --config configs/pvt/pvt_tiny_ascend.yaml --data_dir /path/to/imagenet --distribute False
```

### Validation

To validate the accuracy of the trained model, you can use `validate.py` and pass the checkpoint path with `--ckpt_path`.

```shell
python validate.py --model=pvt_tiny --data_dir /path/to/imagenet --ckpt_path /path/to/ckpt
```

### Deployment

To deploy online inference services with the trained model efficiently, please refer to the [deployment tutorial](https://github.com/mindspore-lab/mindcv/blob/main/tutorials/deployment.md).

## References

[1] Wang W, Xie E, Li X, et al. Pyramid vision transformer: A versatile backbone for dense prediction without convolutions[C]//Proceedings of the IEEE/CVF International Conference on Computer Vision. 2021: 568-578.
@ -0,0 +1,56 @@
# system
mode: 0
distribute: True
num_parallel_workers: 8

# dataset
dataset: 'imagenet'
data_dir: '/path/to/imagenet'
shuffle: True
dataset_download: False
batch_size: 128
drop_remainder: True

# augmentation
image_resize: 224
scale: [0.08, 1.0]
ratio: [0.75, 1.333]
hflip: 0.5
interpolation: 'bicubic'
auto_augment: 'randaug-m9-mstd0.5-inc1'
re_prob: 0.25
mixup: 0.8
cutmix: 1.0
cutmix_prob: 1.0
crop_pct: 0.9
color_jitter: 0.0

# model
model: 'pvt_large'
num_classes: 1000
pretrained: False
ckpt_path: ''
keep_checkpoint_max: 10
ckpt_save_dir: './ckpt'
epoch_size: 400
dataset_sink_mode: True
amp_level: 'O2'
drop_path_rate: 0.3

# loss
loss: 'CE'
label_smoothing: 0.1

# lr scheduler
scheduler: 'cosine_decay'
lr: 0.001
min_lr: 0.000001
warmup_epochs: 10
decay_epochs: 390

# optimizer
opt: 'adamw'
weight_decay: 0.05
loss_scale: 300
filter_bias_and_bn: True
use_nesterov: False
@ -0,0 +1,56 @@
# system
mode: 0
distribute: True
num_parallel_workers: 8

# dataset
dataset: 'imagenet'
data_dir: '/path/to/imagenet'
shuffle: True
dataset_download: False
batch_size: 128
drop_remainder: True

# augmentation
image_resize: 224
scale: [0.08, 1.0]
ratio: [0.75, 1.333]
hflip: 0.5
interpolation: 'bicubic'
auto_augment: 'randaug-m9-mstd0.5-inc1'
re_prob: 0.25
mixup: 0.8
cutmix: 1.0
cutmix_prob: 1.0
crop_pct: 0.9
color_jitter: 0.0

# model
model: 'pvt_medium'
num_classes: 1000
pretrained: False
ckpt_path: ''
keep_checkpoint_max: 10
ckpt_save_dir: './ckpt'
epoch_size: 400
dataset_sink_mode: True
amp_level: 'O2'
drop_path_rate: 0.3

# loss
loss: 'CE'
label_smoothing: 0.1

# lr scheduler
scheduler: 'cosine_decay'
lr: 0.001
min_lr: 0.000001
warmup_epochs: 10
decay_epochs: 390

# optimizer
opt: 'adamw'
weight_decay: 0.05
loss_scale: 1024
filter_bias_and_bn: True
use_nesterov: False
@ -0,0 +1,56 @@
# system
mode: 0
distribute: True
num_parallel_workers: 8

# dataset
dataset: 'imagenet'
data_dir: '/path/to/imagenet'
shuffle: True
dataset_download: False
batch_size: 128
drop_remainder: True

# augmentation
image_resize: 224
scale: [0.08, 1.0]
ratio: [0.75, 1.333]
hflip: 0.5
interpolation: 'bicubic'
auto_augment: 'randaug-m9-mstd0.5-inc1'
re_prob: 0.25
mixup: 0.8
cutmix: 1.0
cutmix_prob: 1.0
crop_pct: 0.9
color_jitter: 0.0

# model
model: 'pvt_small'
num_classes: 1000
pretrained: False
ckpt_path: ''
keep_checkpoint_max: 10
ckpt_save_dir: './ckpt'
epoch_size: 500
dataset_sink_mode: True
amp_level: 'O2'
drop_path_rate: 0.1

# loss
loss: 'CE'
label_smoothing: 0.1

# lr scheduler
scheduler: 'cosine_decay'
lr: 0.0005
min_lr: 0.000001
warmup_epochs: 10
decay_epochs: 390

# optimizer
opt: 'adamw'
weight_decay: 0.05
loss_scale: 1024
filter_bias_and_bn: True
use_nesterov: False
@ -0,0 +1,56 @@
# system
mode: 0
distribute: True
num_parallel_workers: 8

# dataset
dataset: 'imagenet'
data_dir: '/path/to/imagenet'
shuffle: True
dataset_download: False
batch_size: 128
drop_remainder: True

# augmentation
image_resize: 224
scale: [0.08, 1.0]
ratio: [0.75, 1.333]
hflip: 0.5
interpolation: 'bicubic'
auto_augment: 'randaug-m9-mstd0.5-inc1'
re_prob: 0.25
mixup: 0.8
cutmix: 1.0
cutmix_prob: 1.0
crop_pct: 0.9
color_jitter: 0.0

# model
model: 'pvt_tiny'
num_classes: 1000
pretrained: False
ckpt_path: ''
keep_checkpoint_max: 10
ckpt_save_dir: './ckpt'
epoch_size: 450
dataset_sink_mode: True
amp_level: 'O2'
drop_path_rate: 0.1

# loss
loss: 'CE'
label_smoothing: 0.1

# lr scheduler
scheduler: 'cosine_decay'
lr: 0.0005
min_lr: 0.000001
warmup_epochs: 10
decay_epochs: 440

# optimizer
opt: 'adamw'
weight_decay: 0.05
loss_scale: 1024
filter_bias_and_bn: True
use_nesterov: False