diff --git a/configs/README.md b/configs/README.md
new file mode 100644
index 0000000..014c4a1
--- /dev/null
+++ b/configs/README.md
@@ -0,0 +1,68 @@
+### File Structure and Naming
+This folder contains training recipes and model readme files for each model. The folder structure and naming rules of the model configurations are as follows.
+
+
+```
+ ├── configs
+     ├── model_a                       // model name in lower case with _ separator
+     │    ├─ model_a_small_ascend.yaml // training recipe denoted as {model_name}_{specification}_{hardware}.yaml
+     │    ├─ model_a_large_gpu.yaml
+     │    ├─ README.md                 // readme file containing performance results and pretrained weight URLs
+     │    └─ README_CN.md              // readme file in Chinese
+     ├── model_b
+     │    ├─ model_b_32_ascend.yaml
+     │    ├─ model_l_16_ascend.yaml
+     │    ├─ README.md
+     │    └─ README_CN.md
+     ├── README.md                     // this file
+```
+
+### Model Readme Writing Guideline
+The model readme file in each sub-folder provides the introduction, reproduced results, and running guideline for each model.
+
+Please follow the outline structure and **table format** shown in [densenet/README.md](https://github.com/mindspore-lab/mindcv/blob/main/configs/densenet/README.md) when contributing your models :)
+
+#### Table Format
+
+ +| Model | Context | Top-1 (%) | Top-5 (%) | Params (M) | Recipe | Download | +|--------------|----------|-----------|-----------|------------|-----------------------------------------------------------------------------------------------------|----------------------------------------------------------------------------------------------------| +| densenet_121 | D910x8-G | 75.64 | 92.84 | 8.06 | [yaml](https://github.com/mindspore-lab/mindcv/blob/main/configs/densenet/densenet_121_ascend.yaml) | [weights](https://download.mindspore.cn/toolkits/mindcv/densenet/densenet121-120_5004_Ascend.ckpt) | + +
+
+Illustration:
+- Model: model name in lower case with _ separator.
+- Context: Training context denoted as {device}x{pieces}-{MS mode}, where mindspore mode can be G - graph mode or F - pynative mode with ms function. For example, D910x8-G is for training on 8 pieces of Ascend 910 NPU using graph mode.
+- Top-1 and Top-5: Accuracy reported on the validation set of ImageNet-1K. Keep 2 digits after the decimal point.
+- Params (M): # of model parameters in millions (10^6). Keep **2 digits** after the decimal point.
+- Recipe: Training recipe/configuration linked to a yaml config file.
+- Download: URL of the pretrained model weights.
+
+### Model Checkpoint Format
+ The checkpoint (i.e., model weight) name should follow this format: **{model_name}_{specification}-{sha256sum}.ckpt**, e.g., `poolformer_s12-5be5c4e4.ckpt`.
+
+ You can run the following command and take the first 8 characters of the computed hash as the sha256sum value in the checkpoint name.
+
+ ```shell
+ sha256sum your_model.ckpt
+ ```
+
+
+#### Training Script Format
+
+For consistency, it is recommended to provide distributed training commands based on `mpirun -n {num_devices} python train.py`, instead of using a shell script such as `distributed_train.sh`.
+
+ ```shell
+ # standalone training on a GPU or Ascend device
+ python train.py --config configs/densenet/densenet_121_gpu.yaml --data_dir /path/to/dataset --distribute False
+
+ # distributed training on GPU or Ascend devices
+ mpirun -n 8 python train.py --config configs/densenet/densenet_121_ascend.yaml --data_dir /path/to/imagenet
+
+ ```
+ > If the script is executed by the root user, the `--allow-run-as-root` parameter must be added to `mpirun`.
+
+#### URL and Hyperlink Format
+Please use an **absolute path** in hyperlinks and URLs when linking to the target resource in the readme files and tables.
diff --git a/configs/bit/README.md b/configs/bit/README.md
new file mode 100644
index 0000000..e0283d0
--- /dev/null
+++ b/configs/bit/README.md
@@ -0,0 +1,91 @@
+# BigTransfer
+
+> [Big Transfer (BiT): General Visual Representation Learning](https://arxiv.org/abs/1912.11370)
+
+## Introduction
+
+Transfer of pre-trained representations improves sample efficiency and simplifies hyperparameter tuning when training deep neural networks for vision.
+Big Transfer (BiT) can achieve strong performance on more than 20 datasets by combining a few carefully selected components with a simple heuristic
+for transfer. The components distilled by BiT for training models that transfer well are: 1) Big datasets: as the size of the dataset increases,
+the optimal performance of the BiT model also increases. 2) Big architectures: in order to make full use of large datasets, a sufficiently large architecture
+is required. 3) Long pre-training time: pre-training on a larger dataset requires more training epochs and a longer training time. 4) GroupNorm and Weight Standardisation:
+BiT uses GroupNorm combined with Weight Standardisation instead of BatchNorm, since BatchNorm performs worse when the number of images on each accelerator
+is too low. 5) With BiT fine-tuning, good performance can be achieved even if there are only a few examples per class on natural images.[[1, 2](#References)]
+
+
+## Results
+
+Our reproduced model performance on ImageNet-1K is reported as follows.
+
+ +| Model | Context | Top-1 (%) | Top-5 (%) | Params(M) | Recipe | Download | +|----------------| -------- |-----------|-----------|-----------|--------------------------------------------------------------------------------------------------| -------------------------------------------------------------------------- | +| bit_resnet50 | D910x8-G | 76.81 | 93.17 | 25.55 | [yaml](https://github.com/mindspore-lab/mindcv/blob/main/configs/bit/bit_resnet50_ascend.yaml) | [weights](https://download.mindspore.cn/toolkits/mindcv/bit/BiT_resnet50-1e4795a4.ckpt) | +| bit_resnet50x3 | D910x8-G | 80.63 | 95.12 | 217.31 | [yaml](https://github.com/mindspore-lab/mindcv/blob/main/configs/bit/bit_resnet50x3_ascend.yaml) | [weights](https://download.mindspore.cn/toolkits/mindcv/bit/BiT_resnet50x3-a960f91f.ckpt) | +| bit_resnet101 | D910x8-G | 77.93 | 93.75 | 44.54 | [yaml](https://github.com/mindspore-lab/mindcv/blob/main/configs/bit/bit_resnet101_ascend.yaml) | [weights](https://download.mindspore.cn/toolkits/mindcv/bit/BiT_resnet101-2efa9106.ckpt) | + +
+
+#### Notes
+
+- Context: Training context denoted as {device}x{pieces}-{MS mode}, where mindspore mode can be G - graph mode or F - pynative mode with ms function. For example, D910x8-G is for training on 8 pieces of Ascend 910 NPU using graph mode.
+- Top-1 and Top-5: Accuracy reported on the validation set of ImageNet-1K.
+
+## Quick Start
+
+### Preparation
+
+#### Installation
+
+Please refer to the [installation instruction](https://github.com/mindspore-lab/mindcv#installation) in MindCV.
+
+#### Dataset Preparation
+
+Please download the [ImageNet-1K](https://www.image-net.org/challenges/LSVRC/2012/index.php) dataset for model training and validation.
+
+### Training
+
+* Distributed Training
+
+It is easy to reproduce the reported results with the pre-defined training recipe. For distributed training on multiple Ascend 910 devices, please run
+
+```shell
+# distributed training on multiple GPU/Ascend devices
+mpirun -n 8 python train.py --config configs/bit/bit_resnet50_ascend.yaml --data_dir /path/to/imagenet
+```
+> If the script is executed by the root user, the `--allow-run-as-root` parameter must be added to `mpirun`.
+
+Similarly, you can train the model on multiple GPU devices with the above `mpirun` command.
+
+For detailed illustration of all hyper-parameters, please refer to [config.py](https://github.com/mindspore-lab/mindcv/blob/main/config.py).
+
+**Note:** As the global batch size (batch_size x num_devices) is an important hyper-parameter, it is recommended to keep the global batch size unchanged for reproduction or adjust the learning rate linearly to a new global batch size.
+
+* Standalone Training
+
+If you want to train or finetune the model on a smaller dataset without distributed training, please run:
+
+```shell
+# standalone training on a CPU/GPU/Ascend device
+python train.py --config configs/bit/bit_resnet50_ascend.yaml --data_dir /path/to/dataset --distribute False
+```
+
+### Validation
+
+To validate the accuracy of the trained model, you can use `validate.py` and pass the checkpoint path with `--ckpt_path`.
+
+```shell
+python validate.py -c configs/bit/bit_resnet50_ascend.yaml --data_dir /path/to/imagenet --ckpt_path /path/to/ckpt
+```
+
+### Deployment
+
+Please refer to the [deployment tutorial](https://github.com/mindspore-lab/mindcv/blob/main/tutorials/deployment.md) in MindCV.
+
+## References
+
+
+[1] Kolesnikov A, Beyer L, Zhai X, et al. Big transfer (bit): General visual representation learning[C]//European conference on computer vision. Springer, Cham, 2020: 491-507.
+ +[2] BigTransfer (BiT): State-of-the-art transfer learning for computer vision, https://blog.tensorflow.org/2020/05/bigtransfer-bit-state-of-art-transfer-learning-computer-vision.html diff --git a/configs/bit/bit_resnet101_ascend.yaml b/configs/bit/bit_resnet101_ascend.yaml new file mode 100644 index 0000000..3a6abf9 --- /dev/null +++ b/configs/bit/bit_resnet101_ascend.yaml @@ -0,0 +1,47 @@ +# system +mode: 0 +distribute: True +num_parallel_workers: 8 +val_while_train: True + +# dataset +dataset: 'imagenet' +data_dir: '/path/to/imagenet' +shuffle: True +dataset_download: False +batch_size: 16 +drop_remainder: True + +# augmentation +image_resize: 224 +hflip: 0.5 +crop_pct: 0.875 + +# model +model: 'BiTresnet101' +num_classes: 1000 +pretrained: False +ckpt_path: '' +keep_checkpoint_max: 10 +ckpt_save_dir: './ckpt' +epoch_size: 90 +dataset_sink_mode: True +amp_level: 'O0' + +# loss +loss: 'CE' +label_smoothing: 0.1 + +# lr scheduler +scheduler: 'multi_step_decay' +lr: 0.06 +decay_rate: 0.5 +multi_step_decay_milestones: [30, 40, 50, 60, 70, 80, 85] + + +# optimizer +opt: 'sgd' +filter_bias_and_bn: False +momentum: 0.9 +weight_decay: 0.0001 +loss_scale: 1024 diff --git a/configs/bit/bit_resnet50_ascend.yaml b/configs/bit/bit_resnet50_ascend.yaml new file mode 100644 index 0000000..75a61ca --- /dev/null +++ b/configs/bit/bit_resnet50_ascend.yaml @@ -0,0 +1,46 @@ +# system +mode: 0 +distribute: True +num_parallel_workers: 8 +val_while_train: True + +# dataset +dataset: 'imagenet' +data_dir: '/path/to/imagenet' +shuffle: True +dataset_download: False +batch_size: 32 +drop_remainder: True + +# augmentation +image_resize: 224 +hflip: 0.5 +crop_pct: 0.875 + +# model +model: 'BiTresnet50' +num_classes: 1000 +pretrained: False +ckpt_path: '' +keep_checkpoint_max: 10 +ckpt_save_dir: './ckpt' +epoch_size: 90 +dataset_sink_mode: True +amp_level: 'O0' + +# loss +loss: 'CE' +label_smoothing: 0.1 + +# lr scheduler +scheduler: 'multi_step_decay' +lr: 0.06 +decay_rate: 0.5 +multi_step_decay_milestones: [30, 40, 50, 60, 70, 80, 85] + +# optimizer +opt: 'sgd' +filter_bias_and_bn: False +momentum: 0.9 +weight_decay: 0.0001 +loss_scale: 1024 diff --git a/configs/bit/bit_resnet50x3_ascend.yaml b/configs/bit/bit_resnet50x3_ascend.yaml new file mode 100644 index 0000000..487c8ee --- /dev/null +++ b/configs/bit/bit_resnet50x3_ascend.yaml @@ -0,0 +1,49 @@ +# system +mode: 0 +distribute: True +num_parallel_workers: 8 +val_while_train: True + +# dataset +dataset: 'imagenet' +data_dir: '/path/to/imagenet' +shuffle: True +dataset_download: False +batch_size: 32 +drop_remainder: True + +# augmentation +image_resize: 224 +hflip: 0.5 +mixup: 0.2 +crop_pct: 0.875 +auto_augment: "randaug-m7-mstd0.5" + +# model +model: 'BiTresnet50x3' +num_classes: 1000 +pretrained: False +ckpt_path: '' +keep_checkpoint_max: 30 +ckpt_save_dir: './ckpt' +epoch_size: 90 +dataset_sink_mode: True +amp_level: 'O0' + +# loss +loss: 'CE' +label_smoothing: 0.1 + +# lr scheduler config +warmup_epochs: 1 +scheduler: 'multi_step_decay' +lr: 0.09 +decay_rate: 0.4 +multi_step_decay_milestones: [30, 40, 50, 60, 70, 80, 85] + +# optimizer +opt: 'sgd' +filter_bias_and_bn: False +momentum: 0.9 +weight_decay: 0.0001 +loss_scale: 1024 diff --git a/configs/coat/README.md b/configs/coat/README.md new file mode 100644 index 0000000..76adf0a --- /dev/null +++ b/configs/coat/README.md @@ -0,0 +1,80 @@ +# CoaT + +> [Co-Scale Conv-Attentional Image Transformers](https://arxiv.org/abs/2104.06399v2) + +## Introduction + +Co-Scale Conv-Attentional Image Transformer 
(CoaT) is a Transformer-based image classifier equipped with co-scale and conv-attentional mechanisms. First, the co-scale mechanism maintains the integrity of Transformers' encoder branches at individual scales, while allowing representations learned at different scales to effectively communicate with each other. Second, the conv-attentional mechanism is designed by realizing a relative position embedding formulation in the factorized attention module with an efficient convolution-like implementation. CoaT empowers image Transformers with enriched multi-scale and contextual modeling capabilities. + +## Results + +Our reproduced model performance on ImageNet-1K is reported as follows. + +
+ +| Model | Context | Top-1 (%) | Top-5 (%) | Params (M) | Recipe | Weight | +|-----------------|-----------|-------|------------|------------------------------------------------------------------------------------------|----------------------------------------------------------------------------------|----------------------------------------------------------------------------------| +| coat_lite_tiny | D910x8-G | 77.35 | 93.43 | 5.72 | [yaml](https://github.com/mindspore-lab/mindcv/blob/main/configs/coat/coat_lite_tiny_ascend.yaml) | [weights](https://download.mindspore.cn/toolkits/mindcv/coat/coat_lite_tiny-fa7bf894.ckpt) | +| coat_lite_mini | D910x8-G | 78.51 | 93.84 | 11.01 | [yaml](https://github.com/mindspore-lab/mindcv/blob/main/configs/coat/coat_lite_mini_ascend.yaml) | [weights](https://download.mindspore.cn/toolkits/mindcv/coat/coat_lite_mini-55a52f05.ckpt) | +| coat_tiny | D910x8-G | 79.67 | 94.88 | 5.50 | [yaml](https://github.com/mindspore-lab/mindcv/blob/main/configs/coat/coat_tiny_ascend.yaml) | [weights](https://download.mindspore.cn/toolkits/mindcv/coat/coat_tiny-071cb792.ckpt) | + +
+
+#### Notes
+- Context: Training context denoted as {device}x{pieces}-{MS mode}, where mindspore mode can be G - graph mode or F - pynative mode with ms function. For example, D910x8-G is for training on 8 pieces of Ascend 910 NPU using graph mode.
+- Top-1 and Top-5: Accuracy reported on the validation set of ImageNet-1K.
+
+
+## Quick Start
+
+### Preparation
+
+#### Installation
+Please refer to the [installation instruction](https://github.com/mindspore-lab/mindcv#installation) in MindCV.
+
+#### Dataset Preparation
+Please download the [ImageNet-1K](https://www.image-net.org/challenges/LSVRC/2012/index.php) dataset for model training and validation.
+
+### Training
+
+* Distributed Training
+
+It is easy to reproduce the reported results with the pre-defined training recipe. For distributed training on multiple Ascend 910 devices, please run
+
+```shell
+# distributed training on multiple GPU/Ascend devices
+mpirun -n 8 python train.py --config configs/coat/coat_lite_tiny_ascend.yaml --data_dir /path/to/imagenet
+```
+
+> If the script is executed by the root user, the `--allow-run-as-root` parameter must be added to `mpirun`.
+
+Similarly, you can train the model on multiple GPU devices with the above `mpirun` command.
+
+For detailed illustration of all hyper-parameters, please refer to [config.py](https://github.com/mindspore-lab/mindcv/blob/main/config.py).
+
+**Note:** As the global batch size (batch_size x num_devices) is an important hyper-parameter, it is recommended to keep the global batch size unchanged for reproduction or adjust the learning rate linearly to a new global batch size.
+
+* Standalone Training
+
+If you want to train or finetune the model on a smaller dataset without distributed training, please run:
+
+```shell
+# standalone training on a CPU/GPU/Ascend device
+python train.py --config configs/coat/coat_lite_tiny_ascend.yaml --data_dir /path/to/dataset --distribute False
+```
+
+### Validation
+
+To validate the accuracy of the trained model, you can use `validate.py` and pass the checkpoint path with `--ckpt_path`.
+
+```shell
+python validate.py -c configs/coat/coat_lite_tiny_ascend.yaml --data_dir /path/to/imagenet --ckpt_path /path/to/ckpt
+```
+
+### Deployment
+
+To deploy online inference services with the trained model efficiently, please refer to the [deployment tutorial](https://github.com/mindspore-lab/mindcv/blob/main/tutorials/deployment.md).
+
+## References
+
+[1] Xu W, Xu Y, Chang T, et al. Co-scale conv-attentional image transformers[C]//Proceedings of the IEEE/CVF International Conference on Computer Vision. 2021.
diff --git a/configs/coat/coat_lite_mini_ascend.yaml b/configs/coat/coat_lite_mini_ascend.yaml new file mode 100644 index 0000000..7b82e66 --- /dev/null +++ b/configs/coat/coat_lite_mini_ascend.yaml @@ -0,0 +1,59 @@ +# system +mode: 0 +distribute: True +num_parallel_workers: 8 +val_while_train: True + +# dataset +dataset: 'imagenet' +data_dir: '/path/to/imagenet' +shuffle: True +dataset_download: False +batch_size: 64 +drop_remainder: True + +# augmentation +image_resize: 224 +scale: [0.08, 1.0] +ratio: [0.75, 1.333] +hflip: 0.5 +interpolation: 'bicubic' +auto_augment: 'randaug-m7-mstd0.5-inc1' +re_prob: 0.1 +mixup: 0.2 +cutmix: 1.0 +cutmix_prob: 1.0 +crop_pct: 0.9 +color_jitter: 0.4 + +# model +model: 'coat_lite_mini' +num_classes: 1000 +pretrained: False +ckpt_path: '' +ckpt_save_policy: 'top_k' +keep_checkpoint_max: 10 +ckpt_save_dir: './ckpt/' +dataset_sink_mode: True +amp_level: 'O2' + +# loss +loss: 'CE' +label_smoothing: 0.1 + +# lr scheduler +scheduler: 'cosine_decay' +lr: 0.0017 +min_lr: 0.000005 +warmup_epochs: 20 +decay_epochs: 280 +epoch_size: 1200 +num_cycles: 4 +cycle_decay: 1.0 + +# optimizer +opt: 'adamw' +weight_decay: 0.025 +filter_bias_and_bn: True +loss_scale: 1024 +use_nesterov: False diff --git a/configs/coat/coat_lite_tiny_ascend.yaml b/configs/coat/coat_lite_tiny_ascend.yaml new file mode 100644 index 0000000..0b10674 --- /dev/null +++ b/configs/coat/coat_lite_tiny_ascend.yaml @@ -0,0 +1,59 @@ +# system +mode: 0 +distribute: True +num_parallel_workers: 8 +val_while_train: True + +# dataset +dataset: 'imagenet' +data_dir: '/path/to/imagenet/' +shuffle: True +dataset_download: False +batch_size: 64 +drop_remainder: True + +# augmentation +image_resize: 224 +scale: [0.08, 1.0] +ratio: [0.75, 1.333] +hflip: 0.5 +interpolation: 'bicubic' +auto_augment: 'randaug-m7-mstd0.5-inc1' +re_prob: 0.1 +mixup: 0.2 +cutmix: 1.0 +cutmix_prob: 1.0 +crop_pct: 0.9 +color_jitter: 0.4 + +# model +model: 'coat_lite_tiny' +num_classes: 1000 +pretrained: False +ckpt_path: '' +ckpt_save_policy: 'top_k' +keep_checkpoint_max: 10 +ckpt_save_dir: './ckpt/' +dataset_sink_mode: True +amp_level: 'O2' + +# loss +loss: 'CE' +label_smoothing: 0.1 + +# lr scheduler +scheduler: 'cosine_decay' +lr: 0.0008 +min_lr: 0.000005 +warmup_epochs: 20 +decay_epochs: 280 +epoch_size: 900 +num_cycles: 3 +cycle_decay: 1.0 + +# optimizer +opt: 'adamw' +weight_decay: 0.025 +filter_bias_and_bn: True +loss_scale: 1024 +use_nesterov: False diff --git a/configs/coat/coat_tiny_ascend.yaml b/configs/coat/coat_tiny_ascend.yaml new file mode 100644 index 0000000..ce05f94 --- /dev/null +++ b/configs/coat/coat_tiny_ascend.yaml @@ -0,0 +1,63 @@ +# system +mode: 0 +distribute: True +num_parallel_workers: 8 +val_while_train: True +val_interval: 1 + +# dataset +dataset: 'imagenet' +data_dir: '/path/to/imagenet/' +shuffle: True +dataset_download: False +batch_size: 32 +drop_remainder: True + +# augmentation +image_resize: 224 +scale: [0.08, 1.0] +ratio: [0.75, 1.333] +hflip: 0.5 +interpolation: 'bicubic' +auto_augment: 'randaug-m7-mstd0.5-inc1' +re_prob: 0.1 +mixup: 0.2 +cutmix: 1.0 +cutmix_prob: 1.0 +crop_pct: 0.9 +color_jitter: 0.4 + +# model config +model: 'coat_tiny' +drop_rate: 0.0 +drop_path_rate: 0.0 +num_classes: 1000 +pretrained: False +ckpt_path: '' +keep_checkpoint_max: 10 +ckpt_save_policy: 'top_k' +ckpt_save_dir: './ckpt/' +dataset_sink_mode: True +amp_level: 'O2' +ema: True +ema_decay: 0.9995 + +# loss config +loss: 'CE' +label_smoothing: 0.1 + +# lr scheduler config +scheduler: 'warmup_cosine_decay' +lr: 0.00025 
+min_lr: 0.000001 +warmup_epochs: 20 +decay_epochs: 280 +epoch_size: 300 + +# optimizer config +opt: 'lion' +weight_decay: 0.15 +filter_bias_and_bn: True +loss_scale: 4096 +use_nesterov: False +loss_scale_type: dynamic diff --git a/configs/convit/README.md b/configs/convit/README.md new file mode 100644 index 0000000..0bc0304 --- /dev/null +++ b/configs/convit/README.md @@ -0,0 +1,97 @@ +# ConViT +> [ConViT: Improving Vision Transformers with Soft Convolutional Inductive Biases](https://arxiv.org/abs/2103.10697) + +## Introduction + +ConViT combines the strengths of convolutional architectures and Vision Transformers (ViTs). +ConViT introduces gated positional self-attention (GPSA), a form of positional self-attention +that can be equipped with a “soft” convolutional inductive bias. +ConViT initializes the GPSA layers to mimic the locality of convolutional layers, +then gives each attention head the freedom to escape locality by adjusting a gating parameter +regulating the attention paid to position versus content information. +ConViT, outperforms the DeiT (Touvron et al., 2020) on ImageNet, +while offering a much improved sample efficiency.[[1](#references)] + +
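+
+The gating described above can be summarized in a few lines. The snippet below is a minimal NumPy sketch of how a per-head gating parameter blends positional and content attention; it only illustrates the idea from the paper and is not the MindCV implementation, and all array names and shapes are made up for the example.
+
+```python
+import numpy as np
+
+def softmax(x, axis=-1):
+    x = x - x.max(axis=axis, keepdims=True)
+    e = np.exp(x)
+    return e / e.sum(axis=axis, keepdims=True)
+
+def gated_positional_attention(content_scores, position_scores, gate):
+    """Blend content-based and position-based attention with a learnable gate.
+
+    content_scores:  (num_tokens, num_tokens) dot-product scores from queries and keys
+    position_scores: (num_tokens, num_tokens) scores derived from relative positions
+    gate:            scalar gating parameter (one per attention head in the paper)
+    """
+    lam = 1.0 / (1.0 + np.exp(-gate))  # sigmoid(gate), in [0, 1]
+    # A large gate pushes attention toward the positional (local) term, mimicking a
+    # convolution; a small gate lets content-based attention take over.
+    return (1.0 - lam) * softmax(content_scores) + lam * softmax(position_scores)
+
+# Toy usage: 4 tokens, gate chosen so that positional (local) attention dominates.
+rng = np.random.default_rng(0)
+attn = gated_positional_attention(rng.normal(size=(4, 4)), rng.normal(size=(4, 4)), gate=2.0)
+print(attn.sum(axis=-1))  # each row still sums to 1
+```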

+
+*Figure 1. Architecture of ConViT [1]*
+
+ + +## Results + +Our reproduced model performance on ImageNet-1K is reported as follows. + +
+ +| Model | Context | Top-1 (%) | Top-5 (%) | Params (M) | Recipe | Download | +|-------------------|----------|-----------|-----------|------------|--------------------------------------------------------------------------------------------------------|----------------------------------------------------------------------------------------| +| convit_tiny | D910x8-G | 73.66 | 91.72 | 5.71 | [yaml](https://github.com/mindspore-lab/mindcv/blob/main/configs/convit/convit_tiny_ascend.yaml) | [weights](https://download.mindspore.cn/toolkits/mindcv/convit/convit_tiny-e31023f2.ckpt) | +| convit_tiny_plus | D910x8-G | 77.00 | 93.60 | 9.97 | [yaml](https://github.com/mindspore-lab/mindcv/blob/main/configs/convit/convit_tiny_plus_ascend.yaml) | [weights](https://download.mindspore.cn/toolkits/mindcv/convit/convit_tiny_plus-e9d7fb92.ckpt) | +| convit_small | D910x8-G | 81.63 | 95.59 | 27.78 | [yaml](https://github.com/mindspore-lab/mindcv/blob/main/configs/convit/convit_small_ascend.yaml) | [weights](https://download.mindspore.cn/toolkits/mindcv/convit/convit_small-ba858604.ckpt) | +| convit_small_plus | D910x8-G | 81.80 | 95.42 | 48.98 | [yaml](https://github.com/mindspore-lab/mindcv/blob/main/configs/convit/convit_small_plus_ascend.yaml) | [weights](https://download.mindspore.cn/toolkits/mindcv/convit/convit_small_plus-2352b9f7.ckpt) | +| convit_base | D910x8-G | 82.10 | 95.52 | 86.54 | [yaml](https://github.com/mindspore-lab/mindcv/blob/main/configs/convit/convit_base_ascend.yaml) | [weights](https://download.mindspore.cn/toolkits/mindcv/convit/convit_base-c61b808c.ckpt) | +| convit_base_plus | D910x8-G | 81.96 | 95.04 | 153.13 | [yaml](https://github.com/mindspore-lab/mindcv/blob/main/configs/convit/convit_base_plus_ascend.yaml) | [weights](https://download.mindspore.cn/toolkits/mindcv/convit/convit_base_plus-5c61c9ce.ckpt) | + +
+
+#### Notes
+
+- Context: Training context denoted as {device}x{pieces}-{MS mode}, where mindspore mode can be G - graph mode or F - pynative mode with ms function. For example, D910x8-G is for training on 8 pieces of Ascend 910 NPU using graph mode.
+- Top-1 and Top-5: Accuracy reported on the validation set of ImageNet-1K.
+
+## Quick Start
+
+### Preparation
+
+#### Installation
+Please refer to the [installation instruction](https://github.com/mindspore-ecosystem/mindcv#installation) in MindCV.
+
+#### Dataset Preparation
+Please download the [ImageNet-1K](https://www.image-net.org/challenges/LSVRC/2012/index.php) dataset for model training and validation.
+
+### Training
+
+* Distributed Training
+
+It is easy to reproduce the reported results with the pre-defined training recipe. For distributed training on multiple Ascend 910 devices, please run
+
+```shell
+# distributed training on multiple GPU/Ascend devices
+mpirun -n 8 python train.py --config configs/convit/convit_tiny_ascend.yaml --data_dir /path/to/imagenet
+```
+> If the script is executed by the root user, the `--allow-run-as-root` parameter must be added to `mpirun`.
+
+Similarly, you can train the model on multiple GPU devices with the above `mpirun` command.
+
+For detailed illustration of all hyper-parameters, please refer to [config.py](https://github.com/mindspore-lab/mindcv/blob/main/config.py).
+
+**Note:** As the global batch size (batch_size x num_devices) is an important hyper-parameter, it is recommended to keep the global batch size unchanged for reproduction or adjust the learning rate linearly to a new global batch size.
+
+* Standalone Training
+
+If you want to train or finetune the model on a smaller dataset without distributed training, please run:
+
+```shell
+# standalone training on a CPU/GPU/Ascend device
+python train.py --config configs/convit/convit_tiny_ascend.yaml --data_dir /path/to/dataset --distribute False
+```
+
+### Validation
+
+To validate the accuracy of the trained model, you can use `validate.py` and pass the checkpoint path with `--ckpt_path`.
+
+```shell
+python validate.py -c configs/convit/convit_tiny_ascend.yaml --data_dir /path/to/imagenet --ckpt_path /path/to/ckpt
+```
+
+### Deployment
+
+Please refer to the [deployment tutorial](https://github.com/mindspore-lab/mindcv/blob/main/tutorials/deployment.md) in MindCV.
+
+## References
+
+
+[1] d’Ascoli S, Touvron H, Leavitt M L, et al. Convit: Improving vision transformers with soft convolutional inductive biases[C]//International Conference on Machine Learning. PMLR, 2021: 2286-2296.
diff --git a/configs/convit/convit_base_ascend.yaml b/configs/convit/convit_base_ascend.yaml new file mode 100644 index 0000000..df15b47 --- /dev/null +++ b/configs/convit/convit_base_ascend.yaml @@ -0,0 +1,55 @@ +# system +mode: 0 +distribute: True +num_parallel_workers: 8 + +# dataset +dataset: 'imagenet' +data_dir: '/path/to/imagenet' +shuffle: True +dataset_download: False +batch_size: 128 +drop_remainder: True + +# augmentation +image_resize: 224 +scale: [0.08, 1.0] +ratio: [0.75, 1.333] +hflip: 0.5 +interpolation: 'bicubic' +auto_augment: 'autoaug-mstd0.5' +re_prob: 0.1 +mixup: 0.2 +cutmix: 1.0 +cutmix_prob: 1.0 +crop_pct: 0.93 +color_jitter: 0.4 + +# model +model: 'convit_base' +num_classes: 1000 +pretrained: False +ckpt_path: '' +keep_checkpoint_max: 10 +ckpt_save_dir: './ckpt' +epoch_size: 300 +dataset_sink_mode: True +amp_level: 'O2' + +# loss +loss: 'CE' +label_smoothing: 0.1 + +# lr scheduler +scheduler: 'warmup_cosine_decay' +lr: 0.0007 +min_lr: 0.000001 +warmup_epochs: 40 +decay_epochs: 260 + +# optimizer +opt: 'adamw' +weight_decay: 0.1 +loss_scale: 1024 +filter_bias_and_bn: True +use_nesterov: False diff --git a/configs/convit/convit_base_plus_ascend.yaml b/configs/convit/convit_base_plus_ascend.yaml new file mode 100644 index 0000000..d4a0ff4 --- /dev/null +++ b/configs/convit/convit_base_plus_ascend.yaml @@ -0,0 +1,55 @@ +# system +mode: 0 +distribute: True +num_parallel_workers: 8 + +# dataset +dataset: 'imagenet' +data_dir: '/path/to/imagenet' +shuffle: True +dataset_download: False +batch_size: 128 +drop_remainder: True + +# augmentation +image_resize: 224 +scale: [0.08, 1.0] +ratio: [0.75, 1.333] +hflip: 0.5 +interpolation: 'bicubic' +auto_augment: 'autoaug-mstd0.5' +re_prob: 0.1 +mixup: 0.2 +cutmix: 1.0 +cutmix_prob: 1.0 +crop_pct: 0.925 +color_jitter: 0.4 + +# model +model: 'convit_base_plus' +num_classes: 1000 +pretrained: False +ckpt_path: '' +keep_checkpoint_max: 10 +ckpt_save_dir: './ckpt' +epoch_size: 300 +dataset_sink_mode: True +amp_level: 'O2' + +# loss +loss: 'CE' +label_smoothing: 0.1 + +# lr scheduler +scheduler: 'warmup_cosine_decay' +lr: 0.0007 +min_lr: 0.000001 +warmup_epochs: 40 +decay_epochs: 260 + +# optimizer +opt: 'adamw' +weight_decay: 0.1 +loss_scale: 1024 +filter_bias_and_bn: True +use_nesterov: False diff --git a/configs/convit/convit_small_ascend.yaml b/configs/convit/convit_small_ascend.yaml new file mode 100644 index 0000000..20feb9a --- /dev/null +++ b/configs/convit/convit_small_ascend.yaml @@ -0,0 +1,55 @@ +# system +mode: 0 +distribute: True +num_parallel_workers: 8 + +# dataset +dataset: 'imagenet' +data_dir: '/path/to/imagenet' +shuffle: True +dataset_download: False +batch_size: 192 +drop_remainder: True + +# augmentation +image_resize: 224 +scale: [0.08, 1.0] +ratio: [0.75, 1.333] +hflip: 0.5 +interpolation: 'bicubic' +auto_augment: 'autoaug-mstd0.5' +re_prob: 0.1 +mixup: 0.2 +cutmix: 1.0 +cutmix_prob: 1.0 +crop_pct: 0.915 +color_jitter: 0.4 + +# model +model: 'convit_small' +num_classes: 1000 +pretrained: False +ckpt_path: '' +keep_checkpoint_max: 10 +ckpt_save_dir: './ckpt' +epoch_size: 300 +dataset_sink_mode: True +amp_level: 'O2' + +# loss +loss: 'CE' +label_smoothing: 0.1 + +# lr scheduler +scheduler: 'warmup_cosine_decay' +lr: 0.0007 +min_lr: 0.000001 +warmup_epochs: 40 +decay_epochs: 260 + +# optimizer +opt: 'adamw' +weight_decay: 0.05 +loss_scale: 1024 +filter_bias_and_bn: True +use_nesterov: False diff --git a/configs/convit/convit_small_plus_ascend.yaml b/configs/convit/convit_small_plus_ascend.yaml new file mode 100644 
index 0000000..ca6aa5a --- /dev/null +++ b/configs/convit/convit_small_plus_ascend.yaml @@ -0,0 +1,55 @@ +# system +mode: 0 +distribute: True +num_parallel_workers: 8 + +# dataset +dataset: 'imagenet' +data_dir: '/path/to/imagenet' +shuffle: True +dataset_download: False +batch_size: 192 +drop_remainder: True + +# augmentation +image_resize: 224 +scale: [0.08, 1.0] +ratio: [0.75, 1.333] +hflip: 0.5 +interpolation: 'bicubic' +auto_augment: 'autoaug-mstd0.5' +re_prob: 0.1 +mixup: 0.2 +cutmix: 1.0 +cutmix_prob: 1.0 +crop_pct: 0.93 +color_jitter: 0.4 + +# model +model: 'convit_small_plus' +num_classes: 1000 +pretrained: False +ckpt_path: '' +keep_checkpoint_max: 10 +ckpt_save_dir: './ckpt' +epoch_size: 300 +dataset_sink_mode: True +amp_level: 'O2' + +# loss +loss: 'CE' +label_smoothing: 0.1 + +# lr scheduler +scheduler: 'warmup_cosine_decay' +lr: 0.0007 +min_lr: 0.000001 +warmup_epochs: 40 +decay_epochs: 260 + +# optimizer +opt: 'adamw' +weight_decay: 0.05 +loss_scale: 1024 +filter_bias_and_bn: True +use_nesterov: False diff --git a/configs/convit/convit_tiny_ascend.yaml b/configs/convit/convit_tiny_ascend.yaml new file mode 100644 index 0000000..3259047 --- /dev/null +++ b/configs/convit/convit_tiny_ascend.yaml @@ -0,0 +1,54 @@ +# system +mode: 0 +distribute: True +num_parallel_workers: 8 + +# dataset +dataset: 'imagenet' +data_dir: '/path/to/imagenet' +shuffle: True +dataset_download: False +batch_size: 256 +drop_remainder: True + +# augmentation +image_resize: 224 +scale: [0.08, 1.0] +ratio: [0.75, 1.333] +hflip: 0.5 +interpolation: 'bicubic' +re_prob: 0.25 +mixup: 0.2 +cutmix: 1.0 +cutmix_prob: 1.0 +crop_pct: 0.875 +color_jitter: 0.4 + +# model +model: 'convit_tiny' +num_classes: 1000 +pretrained: False +ckpt_path: '' +keep_checkpoint_max: 10 +ckpt_save_dir: './ckpt' +epoch_size: 300 +dataset_sink_mode: True +amp_level: 'O2' + +# loss +loss: 'CE' +label_smoothing: 0.1 + +# lr scheduler +scheduler: 'cosine_decay' +lr: 0.00072 +min_lr: 0.000001 +warmup_epochs: 5 +decay_epochs: 295 + +# optimizer +opt: 'adamw' +weight_decay: 0.0001 +loss_scale: 1024 +filter_bias_and_bn: True +use_nesterov: False diff --git a/configs/convit/convit_tiny_gpu.yaml b/configs/convit/convit_tiny_gpu.yaml new file mode 100644 index 0000000..642fd7c --- /dev/null +++ b/configs/convit/convit_tiny_gpu.yaml @@ -0,0 +1,51 @@ +# system +mode: 0 +distribute: True +num_parallel_workers: 8 + +# dataset +dataset: 'imagenet' +data_dir: '/path/to/imagenet' +shuffle: True +dataset_download: False +batch_size: 128 +drop_remainder: True + +# augmentation +image_resize: 224 +scale: [0.08, 1.0] +ratio: [0.75, 1.333] +hflip: 0.5 +interpolation: 'bicubic' +re_prob: 0.25 +mixup: 0.8 +crop_pct: 0.875 +color_jitter: [0.4, 0.4, 0.4] + +# model +model: 'convit_tiny' +num_classes: 1000 +pretrained: False +ckpt_path: '' +keep_checkpoint_max: 10 +ckpt_save_dir: './ckpt' +epoch_size: 300 +dataset_sink_mode: True +amp_level: 'O2' + +# loss +loss: 'CE' +label_smoothing: 0.1 + +# lr scheduler +scheduler: 'cosine_decay' +lr: 0.0005 +min_lr: 0.00001 +warmup_epochs: 10 +decay_epochs: 200 + +# optimizer +opt: 'adamw' +weight_decay: 0.025 +filter_bias_and_bn: True +use_nesterov: False diff --git a/configs/convit/convit_tiny_plus_ascend.yaml b/configs/convit/convit_tiny_plus_ascend.yaml new file mode 100644 index 0000000..77fa42c --- /dev/null +++ b/configs/convit/convit_tiny_plus_ascend.yaml @@ -0,0 +1,54 @@ +# system +mode: 0 +distribute: True +num_parallel_workers: 8 + +# dataset +dataset: 'imagenet' +data_dir: '/path/to/imagenet' +shuffle: True 
+dataset_download: False +batch_size: 256 +drop_remainder: True + +# augmentation +image_resize: 224 +scale: [0.08, 1.0] +ratio: [0.75, 1.333] +hflip: 0.5 +interpolation: 'bicubic' +re_prob: 0.25 +mixup: 0.2 +cutmix: 1.0 +cutmix_prob: 1.0 +crop_pct: 0.87 +color_jitter: 0.4 + +# model +model: 'convit_tiny_plus' +num_classes: 1000 +pretrained: False +ckpt_path: '' +keep_checkpoint_max: 10 +ckpt_save_dir: './ckpt' +epoch_size: 300 +dataset_sink_mode: True +amp_level: 'O2' + +# loss +loss: 'CE' +label_smoothing: 0.1 + +# lr scheduler +scheduler: 'cosine_decay' +lr: 0.00072 +min_lr: 0.000001 +warmup_epochs: 40 +decay_epochs: 260 + +# optimizer +opt: 'adamw' +weight_decay: 0.0001 +loss_scale: 1024 +filter_bias_and_bn: True +use_nesterov: False diff --git a/configs/convnext/README.md b/configs/convnext/README.md new file mode 100644 index 0000000..49d93c4 --- /dev/null +++ b/configs/convnext/README.md @@ -0,0 +1,91 @@ +# ConvNeXt +> [A ConvNet for the 2020s](https://arxiv.org/abs/2201.03545) + +## Introduction + +In this work, the authors reexamine the design spaces and test the limits of what a pure ConvNet can achieve. +The authors gradually "modernize" a standard ResNet toward the design of a vision Transformer, and discover several key +components that contribute to the performance difference along the way. The outcome of this exploration is a family of +pure ConvNet models dubbed ConvNeXt. Constructed entirely from standard ConvNet modules, ConvNeXts compete favorably +with Transformers in terms of accuracy and scalability, achieving 87.8% ImageNet top-1 accuracy, while maintaining the +simplicity and efficiency of standard ConvNets.[[1](#references)] + +

+
+*Figure 1. Architecture of ConvNeXt [1]*
+
+ +## Results + +Our reproduced model performance on ImageNet-1K is reported as follows. + +
+ +| Model | Context | Top-1 (%) | Top-5 (%) | Params (M) | Recipe | Download | +|----------------|-----------|-----------|-----------|------------|-------------------------------------------------------------------------------------------------------|------------------------------------------------------------------------------------------------| +| ConvNeXt_tiny | D910x64-G | 81.91 | 95.79 | 28.59 | [yaml](https://github.com/mindspore-lab/mindcv/blob/main/configs/convnext/convnext_tiny_ascend.yaml) | [weights](https://download.mindspore.cn/toolkits/mindcv/convnext/convnext_tiny-ae5ff8d7.ckpt) | +| ConvNeXt_small | D910x64-G | 83.40 | 96.36 | 50.22 | [yaml](https://github.com/mindspore-lab/mindcv/blob/main/configs/convnext/convnext_small_ascend.yaml) | [weights](https://download.mindspore.cn/toolkits/mindcv/convnext/convnext_small-e23008f3.ckpt) | +| ConvNeXt_base | D910x64-G | 83.32 | 96.24 | 88.59 | [yaml](https://github.com/mindspore-lab/mindcv/blob/main/configs/convnext/convnext_base_ascend.yaml) | [weights](https://download.mindspore.cn/toolkits/mindcv/convnext/convnext_base-ee3544b8.ckpt) | + +
+
+#### Notes
+
+- Context: Training context denoted as {device}x{pieces}-{MS mode}, where mindspore mode can be G - graph mode or F - pynative mode with ms function. For example, D910x8-G is for training on 8 pieces of Ascend 910 NPU using graph mode.
+- Top-1 and Top-5: Accuracy reported on the validation set of ImageNet-1K.
+
+## Quick Start
+
+### Preparation
+
+#### Installation
+Please refer to the [installation instruction](https://github.com/mindspore-ecosystem/mindcv#installation) in MindCV.
+
+#### Dataset Preparation
+Please download the [ImageNet-1K](https://www.image-net.org/challenges/LSVRC/2012/index.php) dataset for model training and validation.
+
+### Training
+
+* Distributed Training
+
+It is easy to reproduce the reported results with the pre-defined training recipe. For distributed training on multiple Ascend 910 devices, please run
+
+```shell
+# distributed training on multiple GPU/Ascend devices
+mpirun -n 8 python train.py --config configs/convnext/convnext_tiny_ascend.yaml --data_dir /path/to/imagenet
+```
+
+> If the script is executed by the root user, the `--allow-run-as-root` parameter must be added to `mpirun`.
+
+Similarly, you can train the model on multiple GPU devices with the above `mpirun` command.
+
+For detailed illustration of all hyper-parameters, please refer to [config.py](https://github.com/mindspore-lab/mindcv/blob/main/config.py).
+
+**Note:** As the global batch size (batch_size x num_devices) is an important hyper-parameter, it is recommended to keep the global batch size unchanged for reproduction or adjust the learning rate linearly to a new global batch size.
+
+* Standalone Training
+
+If you want to train or finetune the model on a smaller dataset without distributed training, please run:
+
+```shell
+# standalone training on a CPU/GPU/Ascend device
+python train.py --config configs/convnext/convnext_tiny_ascend.yaml --data_dir /path/to/dataset --distribute False
+```
+
+### Validation
+
+To validate the accuracy of the trained model, you can use `validate.py` and pass the checkpoint path with `--ckpt_path`.
+
+```shell
+python validate.py -c configs/convnext/convnext_tiny_ascend.yaml --data_dir /path/to/imagenet --ckpt_path /path/to/ckpt
+```
+
+### Deployment
+
+Please refer to the [deployment tutorial](https://github.com/mindspore-lab/mindcv/blob/main/tutorials/deployment.md) in MindCV.
+
+## References
+
+[1] Liu Z, Mao H, Wu C Y, et al. A convnet for the 2020s[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2022: 11976-11986.
diff --git a/configs/convnext/convnext_base_ascend.yaml b/configs/convnext/convnext_base_ascend.yaml new file mode 100644 index 0000000..48b4065 --- /dev/null +++ b/configs/convnext/convnext_base_ascend.yaml @@ -0,0 +1,58 @@ +# system +mode: 0 +distribute: True +num_parallel_workers: 8 +val_while_train: True + +# dataset +dataset: 'imagenet' +data_dir: '/path/to/imagenet' +shuffle: True +dataset_download: False +batch_size: 16 +drop_remainder: True + +# augmentation +image_resize: 224 +scale: [0.08, 1.0] +ratio: [0.75, 1.333] +re_value: 'random' +hflip: 0.5 +interpolation: 'bicubic' +auto_augment: 'randaug-m9-mstd0.5-inc1' +re_prob: 0.25 +crop_pct: 0.95 +mixup: 0.8 +cutmix: 1.0 + +# model +model: 'convnext_base' +num_classes: 1000 +pretrained: False +ckpt_path: '' +keep_checkpoint_max: 10 +ckpt_save_dir: './ckpt' +epoch_size: 450 +drop_path_rate: 0.5 +dataset_sink_mode: True +amp_level: 'O2' + +# loss +loss: 'ce' +label_smoothing: 0.1 + +# lr scheduler +scheduler: 'cosine_decay' +lr: 0.002 +min_lr: 0.0000003 +decay_epochs: 430 +warmup_factor: 0.0000175 +warmup_epochs: 20 + +# optimizer +opt: 'adamw' +filter_bias_and_bn: True +momentum: 0.9 +weight_decay: 0.05 +loss_scale_type: 'auto' +use_nesterov: False diff --git a/configs/convnext/convnext_small_ascend.yaml b/configs/convnext/convnext_small_ascend.yaml new file mode 100644 index 0000000..09c4f4c --- /dev/null +++ b/configs/convnext/convnext_small_ascend.yaml @@ -0,0 +1,58 @@ +# system +mode: 0 +distribute: True +num_parallel_workers: 8 +val_while_train: True + +# dataset +dataset: 'imagenet' +data_dir: '/path/to/imagenet' +shuffle: True +dataset_download: False +batch_size: 16 +drop_remainder: True + +# augmentation +image_resize: 224 +scale: [0.08, 1.0] +ratio: [0.75, 1.333] +re_value: 'random' +hflip: 0.5 +interpolation: 'bicubic' +auto_augment: 'randaug-m9-mstd0.5-inc1' +re_prob: 0.25 +crop_pct: 0.95 +mixup: 0.8 +cutmix: 1.0 + +# model +model: 'convnext_small' +num_classes: 1000 +pretrained: False +ckpt_path: '' +keep_checkpoint_max: 10 +ckpt_save_dir: './ckpt' +epoch_size: 450 +drop_path_rate: 0.4 +dataset_sink_mode: True +amp_level: 'O2' + +# loss +loss: 'ce' +label_smoothing: 0.1 + +# lr scheduler +scheduler: 'cosine_decay' +lr: 0.002 +min_lr: 0.0000003 +decay_epochs: 430 +warmup_factor: 0.0000175 +warmup_epochs: 20 + +# optimizer +opt: 'adamw' +filter_bias_and_bn: True +momentum: 0.9 +weight_decay: 0.05 +loss_scale_type: 'auto' +use_nesterov: False diff --git a/configs/convnext/convnext_tiny_ascend.yaml b/configs/convnext/convnext_tiny_ascend.yaml new file mode 100644 index 0000000..359cbcb --- /dev/null +++ b/configs/convnext/convnext_tiny_ascend.yaml @@ -0,0 +1,59 @@ +# system +mode: 0 +distribute: True +num_parallel_workers: 16 +val_while_train: True + +# dataset +dataset: 'imagenet' +data_dir: '/path/to/imagenet' +shuffle: True +dataset_download: False +batch_size: 16 +drop_remainder: True + +# augmentation +image_resize: 224 +scale: [0.08, 1.0] +ratio: [0.75, 1.333] +re_value: 'random' +hflip: 0.5 +interpolation: 'bicubic' +auto_augment: 'randaug-m9-mstd0.5-inc1' +re_prob: 0.25 +crop_pct: 0.95 +mixup: 0.8 +cutmix: 1.0 + +# model +model: 'convnext_tiny' +num_classes: 1000 +pretrained: False +ckpt_path: '' +keep_checkpoint_max: 10 +ckpt_save_dir: './ckpt' +epoch_size: 450 +drop_path_rate: 0.1 +dataset_sink_mode: True +amp_level: 'O2' + +# loss +loss: 'ce' +label_smoothing: 0.1 + +# lr scheduler +scheduler: 'cosine_decay' +lr: 0.002 +min_lr: 0.0000003 +decay_epochs: 430 +warmup_factor: 0.0000175 +warmup_epochs: 20 + +# 
optimizer +opt: 'adamw' +filter_bias_and_bn: True +momentum: 0.9 +weight_decay: 0.05 +loss_scale_type: 'dynamic' +drop_overflow_update: True +use_nesterov: False diff --git a/configs/crossvit/README.md b/configs/crossvit/README.md new file mode 100644 index 0000000..3708d5c --- /dev/null +++ b/configs/crossvit/README.md @@ -0,0 +1,90 @@ +# Crossvit +> [CrossViT: Cross-Attention Multi-Scale Vision Transformer for Image Classification](https://arxiv.org/abs/2103.14899) + +## Introduction + +CrossViT is a type of vision transformer that uses a dual-branch architecture to extract multi-scale feature representations for image classification. The architecture combines image patches (i.e. tokens in a transformer) of different sizes to produce stronger visual features for image classification. It processes small and large patch tokens with two separate branches of different computational complexities and these tokens are fused together multiple times to complement each other. + +Fusion is achieved by an efficient cross-attention module, in which each transformer branch creates a non-patch token as an agent to exchange information with the other branch by attention. This allows for linear-time generation of the attention map in fusion instead of quadratic time otherwise.[[1](#references)] + +
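+
+As a rough illustration of the cross-attention fusion described above, the sketch below lets the class token of one branch attend over the patch tokens of the other branch; because the class token is the only query, the cost grows linearly with the number of patches. This is a simplified, single-head NumPy example with made-up shapes and no projection matrices, not the MindCV implementation.
+
+```python
+import numpy as np
+
+def softmax(x, axis=-1):
+    x = x - x.max(axis=axis, keepdims=True)
+    e = np.exp(x)
+    return e / e.sum(axis=axis, keepdims=True)
+
+def cross_attention_fuse(cls_token, other_patches):
+    """Update `cls_token` (the agent of its branch) by attending over the other branch.
+
+    cls_token:     (dim,)             class token of branch A
+    other_patches: (num_patches, dim) patch tokens of branch B
+    """
+    keys_values = np.vstack([cls_token[None, :], other_patches])    # (1 + num_patches, dim)
+    scores = keys_values @ cls_token / np.sqrt(cls_token.shape[0])  # one query -> linear cost
+    weights = softmax(scores)
+    return weights @ keys_values                                     # fused class token, (dim,)
+
+# Toy usage: a 64-d class token fuses information from 16 patch tokens of the other branch.
+rng = np.random.default_rng(0)
+fused = cross_attention_fuse(rng.normal(size=64), rng.normal(size=(16, 64)))
+print(fused.shape)  # (64,)
+```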

+
+*Figure 1. Architecture of CrossViT [1]*
+
+ + +## Results + +Our reproduced model performance on ImageNet-1K is reported as follows. + +
+ +| Model | Context | Top-1 (%) | Top-5 (%) | Params (M) | Recipe | Download | +|-------------|----------|-----------|-----------|------------|-----------------------------------------------------------------------------------------|---------------------------------------------------------------------------------------| +| crossvit_9 | D910x8-G | 73.56 | 91.79 | 8.55 | [yaml](https://github.com/mindspore-lab/mindcv/blob/main/configs/crossvit/crossvit_9_ascend.yaml) | [weights](https://download.mindspore.cn/toolkits/mindcv/crossvit/crossvit_9-e74c8e18.ckpt) | +| crossvit_15 | D910x8-G | 81.08 | 95.33 | 27.27 | [yaml](https://github.com/mindspore-lab/mindcv/blob/main/configs/crossvit/crossvit_15_ascend.yaml) | [weights](https://download.mindspore.cn/toolkits/mindcv/crossvit/crossvit_15-eaa43c02.ckpt) | +| crossvit_18 | D910x8-G | 81.93 | 95.75 | 43.27 | [yaml](https://github.com/mindspore-lab/mindcv/blob/main/configs/crossvit/crossvit_18_ascend.yaml) | [weights](https://download.mindspore.cn/toolkits/mindcv/crossvit/crossvit_18-ca0a2e43.ckpt) | + + +
+
+#### Notes
+
+- Context: Training context denoted as {device}x{pieces}-{MS mode}, where mindspore mode can be G - graph mode or F - pynative mode with ms function. For example, D910x8-G is for training on 8 pieces of Ascend 910 NPU using graph mode.
+- Top-1 and Top-5: Accuracy reported on the validation set of ImageNet-1K.
+
+## Quick Start
+
+### Preparation
+
+#### Installation
+Please refer to the [installation instruction](https://github.com/mindspore-ecosystem/mindcv#installation) in MindCV.
+
+#### Dataset Preparation
+Please download the [ImageNet-1K](https://www.image-net.org/challenges/LSVRC/2012/index.php) dataset for model training and validation.
+
+### Training
+
+* Distributed Training
+
+It is easy to reproduce the reported results with the pre-defined training recipe. For distributed training on multiple Ascend 910 devices, please run
+
+```shell
+# distributed training on multiple GPU/Ascend devices
+mpirun -n 8 python train.py --config configs/crossvit/crossvit_15_ascend.yaml --data_dir /path/to/imagenet
+```
+> If the script is executed by the root user, the `--allow-run-as-root` parameter must be added to `mpirun`.
+
+Similarly, you can train the model on multiple GPU devices with the above `mpirun` command.
+
+For detailed illustration of all hyper-parameters, please refer to [config.py](https://github.com/mindspore-lab/mindcv/blob/main/config.py).
+
+**Note:** As the global batch size (batch_size x num_devices) is an important hyper-parameter, it is recommended to keep the global batch size unchanged for reproduction or adjust the learning rate linearly to a new global batch size.
+
+* Standalone Training
+
+If you want to train or finetune the model on a smaller dataset without distributed training, please run:
+
+```shell
+# standalone training on a CPU/GPU/Ascend device
+python train.py --config configs/crossvit/crossvit_15_ascend.yaml --data_dir /path/to/dataset --distribute False
+```
+
+### Validation
+
+To validate the accuracy of the trained model, you can use `validate.py` and pass the checkpoint path with `--ckpt_path`.
+
+```shell
+python validate.py -c configs/crossvit/crossvit_15_ascend.yaml --data_dir /path/to/imagenet --ckpt_path /path/to/ckpt
+```
+
+### Deployment
+
+Please refer to the [deployment tutorial](https://github.com/mindspore-lab/mindcv/blob/main/tutorials/deployment.md) in MindCV.
+
+## References
+
+
+[1] Chen C F, Fan Q, Panda R. CrossViT: Cross-attention multi-scale vision transformer for image classification[C]//Proceedings of the IEEE/CVF International Conference on Computer Vision. 2021.
diff --git a/configs/crossvit/crossvit_15_ascend.yaml b/configs/crossvit/crossvit_15_ascend.yaml
new file mode 100644
index 0000000..739be06
--- /dev/null
+++ b/configs/crossvit/crossvit_15_ascend.yaml
@@ -0,0 +1,65 @@
+# system
+mode: 0
+distribute: True
+num_parallel_workers: 8
+val_while_train: True
+
+# dataset
+dataset: 'imagenet'
+data_dir: '/path/to/imagenet'
+shuffle: True
+dataset_download: False
+batch_size: 320
+drop_remainder: True
+
+# augmentation
+image_resize: 224
+scale: [ 0.08, 1.0 ]
+ratio: [ 0.75, 1.333 ]
+hflip: 0.5
+vflip: 0.
+interpolation: 'bicubic' +auto_augment: randaug-m9-mstd0.5-inc1 +re_prob: 0.25 +mixup: 0.8 +cutmix: 1.0 +color_jitter: 0.4 +crop_pct: 0.935 +ema: True + +# model +model: 'crossvit15' +num_classes: 1000 +pretrained: False +ckpt_path: '' +keep_checkpoint_max: 30 +ckpt_save_dir: './ckpt' +epoch_size: 600 +dataset_sink_mode: True +amp_level: 'O3' +drop_path_rate: 0.1 + +# loss +loss: 'CE' +label_smoothing: 0.1 + +# lr scheduler +scheduler: 'warmup_cosine_decay' +lr: 0.0009 +min_lr: 0.00001 +warmup_epochs: 30 +decay_epochs: 270 +decay_rate: 0.1 +num_cycles: 2 +cycle_decay: 1 + +# optimizer +opt: 'adamw' +weight_decay: 0.05 +filter_bias_and_bn: True +loss_scale: 512 +use_nesterov: False +eps: 1e-8 + +# Scheduler parameters +lr_epoch_stair: True diff --git a/configs/crossvit/crossvit_18_ascend.yaml b/configs/crossvit/crossvit_18_ascend.yaml new file mode 100644 index 0000000..ba874d5 --- /dev/null +++ b/configs/crossvit/crossvit_18_ascend.yaml @@ -0,0 +1,63 @@ +# system +mode: 0 +distribute: True +num_parallel_workers: 8 +val_while_train: True + +# dataset +dataset: 'imagenet' +data_dir: '/path/to/imagenet' +shuffle: True +dataset_download: False +batch_size: 256 +drop_remainder: True + +# augmentation +image_resize: 224 +scale: [ 0.08, 1.0 ] +ratio: [ 0.75, 1.333 ] +hflip: 0.5 +vflip: 0. +interpolation: 'bicubic' +auto_augment: randaug-m9-mstd0.5-inc1 +re_prob: 0.25 +mixup: 0.8 +cutmix: 1.0 +color_jitter: 0.4 +crop_pct: 0.935 +ema: True + +# model +model: 'crossvit18' +num_classes: 1000 +pretrained: False +ckpt_path: '' +keep_checkpoint_max: 10 +ckpt_save_dir: './ckpt' +epoch_size: 300 +dataset_sink_mode: True +amp_level: 'O3' +drop_path_rate: 0.1 + +# loss +loss: 'CE' +label_smoothing: 0.1 + +# lr scheduler +scheduler: 'warmup_cosine_decay' +lr: 0.004 +min_lr: 0.00001 +warmup_epochs: 30 +decay_epochs: 270 +decay_rate: 0.1 + +# optimizer +opt: 'adamw' +weight_decay: 0.05 +filter_bias_and_bn: True +loss_scale: 1024 +use_nesterov: False +eps: 1e-8 + +# Scheduler parameters +lr_epoch_stair: True diff --git a/configs/crossvit/crossvit_9_ascend.yaml b/configs/crossvit/crossvit_9_ascend.yaml new file mode 100644 index 0000000..612dc5b --- /dev/null +++ b/configs/crossvit/crossvit_9_ascend.yaml @@ -0,0 +1,63 @@ +# system +mode: 0 +distribute: True +num_parallel_workers: 8 +val_while_train: True + +# dataset +dataset: 'imagenet' +data_dir: '/path/to/imagenet' +shuffle: True +dataset_download: False +batch_size: 256 +drop_remainder: True + +# augmentation +image_resize: 240 +scale: [0.08, 1.0] +ratio: [0.75, 1.333] +hflip: 0.5 +vflip: 0. 
+interpolation: 'bicubic' +auto_augment: 'randaug-m9-mstd0.5-inc1' +re_prob: 0.25 +mixup: 0.8 +cutmix: 1.0 +color_jitter: 0.4 +crop_pct: 0.935 + +# model +model: 'crossvit9' +num_classes: 1000 +pretrained: False +ckpt_path: '' +keep_checkpoint_max: 10 +ckpt_save_dir: './ckpt' +epoch_size: 300 +dataset_sink_mode: True +amp_level: 'O2' +drop_path_rate: 0.1 + +# loss +loss: 'CE' +label_smoothing: 0.1 + +# lr scheduler +scheduler: 'cosine_decay' +lr: 0.0011 +min_lr: 0.00001 +warmup_epochs: 30 +decay_epochs: 270 +decay_rate: 0.1 + +# optimizer +opt: 'adamw' +weight_decay: 0.05 +filter_bias_and_bn: True +loss_scale_type: 'dynamic' +drop_overflow_update: True +use_nesterov: False +eps: 1e-8 + +# Scheduler parameters +lr_epoch_stair: True diff --git a/configs/densenet/README.md b/configs/densenet/README.md new file mode 100644 index 0000000..5a7666a --- /dev/null +++ b/configs/densenet/README.md @@ -0,0 +1,107 @@ +# DenseNet + +> [Densely Connected Convolutional Networks](https://arxiv.org/abs/1608.06993) + +## Introduction + + + +Recent work has shown that convolutional networks can be substantially deeper, more accurate, and more efficient to train if +they contain shorter connections between layers close to the input and those close to the output. Dense Convolutional +Network (DenseNet) is introduced based on this observation, which connects each layer to every other layer in a +feed-forward fashion. Whereas traditional convolutional networks with $L$ layers have $L$ connections-one between each +layer and its subsequent layer, DenseNet has $\frac{L(L+1)}{2}$ direct connections. For each layer, the feature maps +of all preceding layers are used as inputs, and their feature maps are used as inputs into all subsequent layers. +DenseNets have several compelling advantages: they alleviate the vanishing-gradient problem, strengthen feature +propagation, encourage feature reuse, and substantially reduce the number of parameters.[[1](#references)] + +
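+
+The dense connectivity described above can be made concrete with a small, self-contained sketch: every layer receives the concatenation of all preceding feature maps, which is exactly where the $\frac{L(L+1)}{2}$ connection count comes from. The "layer" below is just a random 1x1 convolution standing in for the BN-ReLU-Conv composite function, so this is a shape-level illustration rather than the MindCV DenseNet implementation.
+
+```python
+import numpy as np
+
+def dense_block(x, num_layers, growth_rate, rng):
+    """Toy dense block: layer i consumes the concatenation of the input and layers 0..i-1."""
+    features = [x]                                                # x: (channels, height, width)
+    for _ in range(num_layers):
+        inputs = np.concatenate(features, axis=0)                 # all preceding feature maps
+        weight = rng.normal(size=(growth_rate, inputs.shape[0]))  # toy 1x1 conv weights
+        out = np.einsum('oc,chw->ohw', weight, inputs)            # new growth_rate-channel features
+        features.append(out)
+    return np.concatenate(features, axis=0)
+
+rng = np.random.default_rng(0)
+y = dense_block(np.zeros((64, 8, 8)), num_layers=4, growth_rate=32, rng=rng)
+print(y.shape)  # (64 + 4 * 32, 8, 8) -> (192, 8, 8)
+```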

+
+*Figure 1. Architecture of DenseNet [1]*
+
+ +## Results + + +Our reproduced model performance on ImageNet-1K is reported as follows. + +
+ +| Model | Context | Top-1 (%) | Top-5 (%) | Params (M) | Recipe | Download | +|--------------|----------|-----------|-----------|------------|-----------------------------------------------------------------------------------------------------|----------------------------------------------------------------------------------------------------| +| densenet_121 | D910x8-G | 75.64 | 92.84 | 8.06 | [yaml](https://github.com/mindspore-lab/mindcv/blob/main/configs/densenet/densenet_121_ascend.yaml) | [weights](https://download.mindspore.cn/toolkits/mindcv/densenet/densenet121-120_5004_Ascend.ckpt) | +| densenet_161 | D910x8-G | 79.09 | 94.66 | 28.90 | [yaml](https://github.com/mindspore-lab/mindcv/blob/main/configs/densenet/densenet_161_ascend.yaml) | [weights](https://download.mindspore.cn/toolkits/mindcv/densenet/densenet161-120_5004_Ascend.ckpt) | +| densenet_169 | D910x8-G | 77.26 | 93.71 | 14.31 | [yaml](https://github.com/mindspore-lab/mindcv/blob/main/configs/densenet/densenet_169_ascend.yaml) | [weights](https://download.mindspore.cn/toolkits/mindcv/densenet/densenet169-120_5004_Ascend.ckpt) | +| densenet_201 | D910x8-G | 78.14 | 94.08 | 20.24 | [yaml](https://github.com/mindspore-lab/mindcv/blob/main/configs/densenet/densenet_201_ascend.yaml) | [weights](https://download.mindspore.cn/toolkits/mindcv/densenet/densenet201-120_5004_Ascend.ckpt) | + +
+
+#### Notes
+
+- Context: Training context denoted as {device}x{pieces}-{MS mode}, where mindspore mode can be G - graph mode or F - pynative mode with ms function. For example, D910x8-G is for training on 8 pieces of Ascend 910 NPU using graph mode.
+- Top-1 and Top-5: Accuracy reported on the validation set of ImageNet-1K.
+
+## Quick Start
+
+### Preparation
+
+#### Installation
+Please refer to the [installation instruction](https://github.com/mindspore-lab/mindcv#installation) in MindCV.
+
+#### Dataset Preparation
+Please download the [ImageNet-1K](https://www.image-net.org/challenges/LSVRC/2012/index.php) dataset for model training and validation.
+
+### Training
+
+
+* Distributed Training
+
+It is easy to reproduce the reported results with the pre-defined training recipe. For distributed training on multiple Ascend 910 devices, please run
+
+```shell
+# distributed training on multiple GPU/Ascend devices
+mpirun -n 8 python train.py --config configs/densenet/densenet_121_ascend.yaml --data_dir /path/to/imagenet
+```
+> If the script is executed by the root user, the `--allow-run-as-root` parameter must be added to `mpirun`.
+
+Similarly, you can train the model on multiple GPU devices with the above `mpirun` command.
+
+For detailed illustration of all hyper-parameters, please refer to [config.py](https://github.com/mindspore-lab/mindcv/blob/main/config.py).
+
+**Note:** As the global batch size (batch_size x num_devices) is an important hyper-parameter, it is recommended to keep the global batch size unchanged for reproduction or adjust the learning rate linearly to a new global batch size.
+
+* Standalone Training
+
+If you want to train or finetune the model on a smaller dataset without distributed training, please run:
+
+```shell
+# standalone training on a CPU/GPU/Ascend device
+python train.py --config configs/densenet/densenet_121_ascend.yaml --data_dir /path/to/dataset --distribute False
+```
+
+### Validation
+
+To validate the accuracy of the trained model, you can use `validate.py` and pass the checkpoint path with `--ckpt_path`.
+
+```shell
+python validate.py -c configs/densenet/densenet_121_ascend.yaml --data_dir /path/to/imagenet --ckpt_path /path/to/ckpt
+```
+
+### Deployment
+
+Please refer to the [deployment tutorial](https://github.com/mindspore-lab/mindcv/blob/main/tutorials/deployment.md) in MindCV.
+
+## References
+
+
+[1] Huang G, Liu Z, Van Der Maaten L, et al. Densely connected convolutional networks[C]//Proceedings of the IEEE conference on computer vision and pattern recognition. 2017: 4700-4708.
diff --git a/configs/densenet/densenet_121_ascend.yaml b/configs/densenet/densenet_121_ascend.yaml new file mode 100644 index 0000000..1720e12 --- /dev/null +++ b/configs/densenet/densenet_121_ascend.yaml @@ -0,0 +1,51 @@ +# system +mode: 0 +distribute: True +num_parallel_workers: 8 + +# dataset +dataset: 'imagenet' +data_dir: '/path/to/imagenet' +shuffle: True +dataset_download: False +batch_size: 32 +drop_remainder: True +val_split: val + +# augmentation +image_resize: 224 +scale: [0.08, 1.0] +ratio: [0.75, 1.333] +hflip: 0.5 +interpolation: 'bilinear' +crop_pct: 0.875 + +# model +model: 'densenet121' +num_classes: 1000 +pretrained: False +ckpt_path: '' +keep_checkpoint_max: 10 +ckpt_save_dir: './ckpt' +epoch_size: 120 +dataset_sink_mode: True +amp_level: 'O2' + +# loss +loss: 'CE' +label_smoothing: 0.1 + +# lr scheduler +scheduler: 'cosine_decay' +min_lr: 0.0 +lr: 0.1 +warmup_epochs: 0 +decay_epochs: 120 + +# optimizer +opt: 'momentum' +filter_bias_and_bn: True +momentum: 0.9 +weight_decay: 0.0001 +loss_scale: 1024 +use_nesterov: False diff --git a/configs/densenet/densenet_121_gpu.yaml b/configs/densenet/densenet_121_gpu.yaml new file mode 100644 index 0000000..6348516 --- /dev/null +++ b/configs/densenet/densenet_121_gpu.yaml @@ -0,0 +1,50 @@ +# system +distribute: True +num_parallel_workers: 8 + +# dataset +dataset: 'imagenet' +data_dir: '/path/to/imagenet' +shuffle: True +dataset_download: False +batch_size: 32 +drop_remainder: True +val_split: val + +# augmentation +image_resize: 224 +scale: [0.08, 1.0] +ratio: [0.75, 1.333] +hflip: 0.5 +interpolation: 'bilinear' +crop_pct: 0.875 + +# model +model: 'densenet121' +num_classes: 1000 +pretrained: False +ckpt_path: '' +keep_checkpoint_max: 10 +ckpt_save_dir: './ckpt' +epoch_size: 120 +dataset_sink_mode: True +amp_level: 'O2' + +# loss +loss: 'CE' +label_smoothing: 0.1 + +# lr scheduler +scheduler: 'cosine_decay' +min_lr: 0.0 +lr: 0.1 +warmup_epochs: 0 +decay_epochs: 120 + +# optimizer +opt: 'momentum' +filter_bias_and_bn: True +momentum: 0.9 +weight_decay: 0.0001 +loss_scale: 1024 +use_nesterov: False diff --git a/configs/densenet/densenet_161_ascend.yaml b/configs/densenet/densenet_161_ascend.yaml new file mode 100644 index 0000000..016d825 --- /dev/null +++ b/configs/densenet/densenet_161_ascend.yaml @@ -0,0 +1,51 @@ +# system +mode: 0 +distribute: True +num_parallel_workers: 8 + +# dataset +dataset: 'imagenet' +data_dir: '/path/to/imagenet' +shuffle: True +dataset_download: False +batch_size: 32 +drop_remainder: True +val_split: val + +# augmentation +image_resize: 224 +scale: [0.08, 1.0] +ratio: [0.75, 1.333] +hflip: 0.5 +interpolation: 'bilinear' +crop_pct: 0.875 + +# model +model: 'densenet161' +num_classes: 1000 +pretrained: False +ckpt_path: '' +keep_checkpoint_max: 10 +ckpt_save_dir: './ckpt' +epoch_size: 120 +dataset_sink_mode: True +amp_level: 'O2' + +# loss +loss: 'CE' +label_smoothing: 0.1 + +# lr scheduler +scheduler: 'cosine_decay' +min_lr: 0.0 +lr: 0.1 +warmup_epochs: 0 +decay_epochs: 120 + +# optimizer +opt: 'momentum' +filter_bias_and_bn: True +momentum: 0.9 +weight_decay: 0.0001 +loss_scale: 1024 +use_nesterov: False diff --git a/configs/densenet/densenet_161_gpu.yaml b/configs/densenet/densenet_161_gpu.yaml new file mode 100644 index 0000000..016d825 --- /dev/null +++ b/configs/densenet/densenet_161_gpu.yaml @@ -0,0 +1,51 @@ +# system +mode: 0 +distribute: True +num_parallel_workers: 8 + +# dataset +dataset: 'imagenet' +data_dir: '/path/to/imagenet' +shuffle: True +dataset_download: False +batch_size: 32 
+drop_remainder: True +val_split: val + +# augmentation +image_resize: 224 +scale: [0.08, 1.0] +ratio: [0.75, 1.333] +hflip: 0.5 +interpolation: 'bilinear' +crop_pct: 0.875 + +# model +model: 'densenet161' +num_classes: 1000 +pretrained: False +ckpt_path: '' +keep_checkpoint_max: 10 +ckpt_save_dir: './ckpt' +epoch_size: 120 +dataset_sink_mode: True +amp_level: 'O2' + +# loss +loss: 'CE' +label_smoothing: 0.1 + +# lr scheduler +scheduler: 'cosine_decay' +min_lr: 0.0 +lr: 0.1 +warmup_epochs: 0 +decay_epochs: 120 + +# optimizer +opt: 'momentum' +filter_bias_and_bn: True +momentum: 0.9 +weight_decay: 0.0001 +loss_scale: 1024 +use_nesterov: False diff --git a/configs/densenet/densenet_169_ascend.yaml b/configs/densenet/densenet_169_ascend.yaml new file mode 100644 index 0000000..7ee1ff0 --- /dev/null +++ b/configs/densenet/densenet_169_ascend.yaml @@ -0,0 +1,51 @@ +# system +mode: 0 +distribute: True +num_parallel_workers: 8 + +# dataset +dataset: 'imagenet' +data_dir: '/path/to/imagenet' +shuffle: True +dataset_download: False +batch_size: 32 +drop_remainder: True +val_split: val + +# augmentation +image_resize: 224 +scale: [0.08, 1.0] +ratio: [0.75, 1.333] +hflip: 0.5 +interpolation: 'bilinear' +crop_pct: 0.875 + +# model +model: 'densenet169' +num_classes: 1000 +pretrained: False +ckpt_path: '' +keep_checkpoint_max: 10 +ckpt_save_dir: './ckpt' +epoch_size: 120 +dataset_sink_mode: True +amp_level: 'O2' + +# loss +loss: 'CE' +label_smoothing: 0.1 + +# lr scheduler +scheduler: 'cosine_decay' +min_lr: 0.0 +lr: 0.1 +warmup_epochs: 0 +decay_epochs: 120 + +# optimizer +opt: 'momentum' +filter_bias_and_bn: True +momentum: 0.9 +weight_decay: 0.0001 +loss_scale: 1024 +use_nesterov: False diff --git a/configs/densenet/densenet_169_gpu.yaml b/configs/densenet/densenet_169_gpu.yaml new file mode 100644 index 0000000..7ee1ff0 --- /dev/null +++ b/configs/densenet/densenet_169_gpu.yaml @@ -0,0 +1,51 @@ +# system +mode: 0 +distribute: True +num_parallel_workers: 8 + +# dataset +dataset: 'imagenet' +data_dir: '/path/to/imagenet' +shuffle: True +dataset_download: False +batch_size: 32 +drop_remainder: True +val_split: val + +# augmentation +image_resize: 224 +scale: [0.08, 1.0] +ratio: [0.75, 1.333] +hflip: 0.5 +interpolation: 'bilinear' +crop_pct: 0.875 + +# model +model: 'densenet169' +num_classes: 1000 +pretrained: False +ckpt_path: '' +keep_checkpoint_max: 10 +ckpt_save_dir: './ckpt' +epoch_size: 120 +dataset_sink_mode: True +amp_level: 'O2' + +# loss +loss: 'CE' +label_smoothing: 0.1 + +# lr scheduler +scheduler: 'cosine_decay' +min_lr: 0.0 +lr: 0.1 +warmup_epochs: 0 +decay_epochs: 120 + +# optimizer +opt: 'momentum' +filter_bias_and_bn: True +momentum: 0.9 +weight_decay: 0.0001 +loss_scale: 1024 +use_nesterov: False diff --git a/configs/densenet/densenet_201_ascend.yaml b/configs/densenet/densenet_201_ascend.yaml new file mode 100644 index 0000000..62ff557 --- /dev/null +++ b/configs/densenet/densenet_201_ascend.yaml @@ -0,0 +1,51 @@ +# system +mode: 0 +distribute: True +num_parallel_workers: 8 + +# dataset +dataset: 'imagenet' +data_dir: '/path/to/imagenet' +shuffle: True +dataset_download: False +batch_size: 32 +drop_remainder: True +val_split: val + +# augmentation +image_resize: 224 +scale: [0.08, 1.0] +ratio: [0.75, 1.333] +hflip: 0.5 +interpolation: 'bilinear' +crop_pct: 0.875 + +# model +model: 'densenet201' +num_classes: 1000 +pretrained: False +ckpt_path: '' +keep_checkpoint_max: 10 +ckpt_save_dir: './ckpt' +epoch_size: 120 +dataset_sink_mode: True +amp_level: 'O2' + +# loss +loss: 'CE' 
+label_smoothing: 0.1 + +# lr scheduler +scheduler: 'cosine_decay' +min_lr: 0.0 +lr: 0.1 +warmup_epochs: 0 +decay_epochs: 120 + +# optimizer +opt: 'momentum' +filter_bias_and_bn: True +momentum: 0.9 +weight_decay: 0.0001 +loss_scale: 1024 +use_nesterov: False diff --git a/configs/densenet/densenet_201_gpu.yaml b/configs/densenet/densenet_201_gpu.yaml new file mode 100644 index 0000000..62ff557 --- /dev/null +++ b/configs/densenet/densenet_201_gpu.yaml @@ -0,0 +1,51 @@ +# system +mode: 0 +distribute: True +num_parallel_workers: 8 + +# dataset +dataset: 'imagenet' +data_dir: '/path/to/imagenet' +shuffle: True +dataset_download: False +batch_size: 32 +drop_remainder: True +val_split: val + +# augmentation +image_resize: 224 +scale: [0.08, 1.0] +ratio: [0.75, 1.333] +hflip: 0.5 +interpolation: 'bilinear' +crop_pct: 0.875 + +# model +model: 'densenet201' +num_classes: 1000 +pretrained: False +ckpt_path: '' +keep_checkpoint_max: 10 +ckpt_save_dir: './ckpt' +epoch_size: 120 +dataset_sink_mode: True +amp_level: 'O2' + +# loss +loss: 'CE' +label_smoothing: 0.1 + +# lr scheduler +scheduler: 'cosine_decay' +min_lr: 0.0 +lr: 0.1 +warmup_epochs: 0 +decay_epochs: 120 + +# optimizer +opt: 'momentum' +filter_bias_and_bn: True +momentum: 0.9 +weight_decay: 0.0001 +loss_scale: 1024 +use_nesterov: False diff --git a/configs/dpn/README.md b/configs/dpn/README.md new file mode 100644 index 0000000..2c1b546 --- /dev/null +++ b/configs/dpn/README.md @@ -0,0 +1,102 @@ +# Dual Path Networks (DPN) + +> [Dual Path Networks](https://arxiv.org/abs/1707.01629v2) + +## Introduction + + + +Figure 1 shows the model architecture of ResNet, DenseNet and Dual Path Networks. By combining the feature reusage of ResNet and new feature introduction of DenseNet, +DPN could enjoy both benefits so that it could share common features and maintain the flexibility to explore new features. As a result, DPN could achieve better performance with +fewer computation cost compared with ResNet and DenseNet on ImageNet-1K dataset.[[1](#references)] + +

+  Figure 1. Architecture of DPN [1]
+ +## Results + + +Our reproduced model performance on ImageNet-1K is reported as follows. + +
+ +| Model | Context | Top-1 (%) | Top-5 (%) | Params (M) | Recipe | Download | +|-------|----------|-----------|-----------|------------|------------------------------------------------------------------------------------------|-----------------------------------------------------------------------------------| +| dpn92 | D910x8-G | 79.46 | 94.49 | 37.79 | [yaml](https://github.com/mindspore-lab/mindcv/blob/main/configs/dpn/dpn92_ascend.yaml) | [weights](https://download.mindspore.cn/toolkits/mindcv/dpn/dpn92-e3e0fca.ckpt) | +| dpn98 | D910x8-G | 79.94 | 94.57 | 61.74 | [yaml](https://github.com/mindspore-lab/mindcv/blob/main/configs/dpn/dpn98_ascend.yaml) | [weights](https://download.mindspore.cn/toolkits/mindcv/dpn/dpn98-119a8207.ckpt) | +| dpn107 | D910x8-G | 80.05 | 94.74 | 87.13 | [yaml](https://github.com/mindspore-lab/mindcv/blob/main/configs/dpn/dpn107_ascend.yaml) | [weights](https://download.mindspore.cn/toolkits/mindcv/dpn/dpn107-7d7df07b.ckpt) | +| dpn131 | D910x8-G | 80.07 | 94.72 | 79.48 | [yaml](https://github.com/mindspore-lab/mindcv/blob/main/configs/dpn/dpn131_ascend.yaml) | [weights](https://download.mindspore.cn/toolkits/mindcv/dpn/dpn131-47f084b3.ckpt) | + +
+ +#### Notes + +- Context: Training context denoted as {device}x{pieces}-{MS mode}, where mindspore mode can be G - graph mode or F - pynative mode with ms function. For example, D910x8-G is for training on 8 pieces of Ascend 910 NPU using graph mode. +- Top-1 and Top-5: Accuracy reported on the validation set of ImageNet-1K. + +## Quick Start + +### Preparation + +#### Installation +Please refer to the [installation instruction](https://github.com/mindspore-lab/mindcv#installation) in MindCV. + +#### Dataset Preparation +Please download the [ImageNet-1K](https://www.image-net.org/challenges/LSVRC/2012/index.php) dataset for model training and validation. + +### Training + + +* Distributed Training + +It is easy to reproduce the reported results with the pre-defined training recipe. For distributed training on multiple Ascend 910 devices, please run + +```shell +# distrubted training on multiple GPU/Ascend devices +mpirun -n 8 python train.py --config configs/dpn/dpn92_ascend.yaml --data_dir /path/to/imagenet +``` +> If the script is executed by the root user, the `--allow-run-as-root` parameter must be added to `mpirun`. + +Similarly, you can train the model on multiple GPU devices with the above `mpirun` command. + +For detailed illustration of all hyper-parameters, please refer to [config.py](https://github.com/mindspore-lab/mindcv/blob/main/config.py). + +**Note:** As the global batch size (batch_size x num_devices) is an important hyper-parameter, it is recommended to keep the global batch size unchanged for reproduction or adjust the learning rate linearly to a new global batch size. + +* Standalone Training + +If you want to train or finetune the model on a smaller dataset without distributed training, please run: + +```shell +# standalone training on a CPU/GPU/Ascend device +python train.py --config configs/dpn/dpn92_ascend.yaml --data_dir /path/to/dataset --distribute False +``` + +### Validation + +To validate the accuracy of the trained model, you can use `validate.py` and parse the checkpoint path with `--ckpt_path`. + +``` +python validate.py -c configs/dpn/dpn92_ascend.yaml --data_dir /path/to/imagenet --ckpt_path /path/to/ckpt +``` + +### Deployment + +Please refer to the [deployment tutorial](https://github.com/mindspore-lab/mindcv/blob/main/tutorials/deployment.md) in MindCV. + +## References + + +[1] Chen Y, Li J, Xiao H, et al. Dual path networks[J]. Advances in neural information processing systems, 2017, 30. 
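The yaml files below are flat key-value recipes, and the training commands above work because recipe keys such as `data_dir` and `distribute` can also be passed as command-line flags. The snippet below is a minimal, hypothetical sketch of that load-then-override pattern (PyYAML plus argparse); it is not MindCV's actual `config.py`, which remains the authoritative list of hyper-parameters.

```python
# Hypothetical sketch of "yaml recipe + command-line override"; not MindCV's config.py.
import argparse
import yaml  # requires PyYAML


def load_config(config_path, overrides):
    with open(config_path) as f:
        cfg = yaml.safe_load(f)  # e.g. {'model': 'dpn92', 'lr': 0.05, 'epoch_size': 200, ...}
    # Any key supplied on the command line takes precedence over the recipe value.
    for key, value in overrides.items():
        if value is not None:
            cfg[key] = value
    return cfg


if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument("--config", required=True)
    parser.add_argument("--data_dir")
    parser.add_argument("--distribute", type=lambda s: s.lower() == "true")
    args = vars(parser.parse_args())
    cfg = load_config(args.pop("config"), args)
    print(cfg["model"], cfg["lr"], cfg.get("data_dir"))
```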
diff --git a/configs/dpn/dpn107_ascend.yaml b/configs/dpn/dpn107_ascend.yaml new file mode 100644 index 0000000..87428e3 --- /dev/null +++ b/configs/dpn/dpn107_ascend.yaml @@ -0,0 +1,52 @@ +# system +mode: 0 +distribute: True +num_parallel_workers: 8 +val_while_train: True + +# dataset +dataset: 'imagenet' +data_dir: 'path/to/imagenet' +shuffle: True +dataset_download: False +batch_size: 32 +drop_remainder: True + +# augmentation +image_resize: 224 +scale: [0.08, 1.0] +ratio: [0.75, 1.333] +hflip: 0.5 +vflip: 0.0 +interpolation: 'bilinear' +crop_pct: 0.875 + +# model +model: 'dpn107' +num_classes: 1000 +pretrained: False +ckpt_path: '' +keep_checkpoint_max: 10 +ckpt_save_dir: './ckpt' +epoch_size: 200 +dataset_sink_mode: True +amp_level: 'O2' + +# loss +loss: 'CE' +label_smoothing: 0.1 + +# lr scheduler +scheduler: 'cosine_decay' +min_lr: 1e-6 +lr: 0.05 +warmup_epochs: 0 +decay_epochs: 200 + +# optimizer +opt: 'SGD' +filter_bias_and_bn: True +momentum: 0.9 +weight_decay: 0.0001 +loss_scale: 1024 +use_nesterov: False diff --git a/configs/dpn/dpn131_ascend.yaml b/configs/dpn/dpn131_ascend.yaml new file mode 100644 index 0000000..f9b4843 --- /dev/null +++ b/configs/dpn/dpn131_ascend.yaml @@ -0,0 +1,52 @@ +# system +mode: 0 +distribute: True +num_parallel_workers: 8 +val_while_train: True + +# dataset +dataset: 'imagenet' +data_dir: 'path/to/imagenet' +shuffle: True +dataset_download: False +batch_size: 32 +drop_remainder: True + +# augmentation +image_resize: 224 +scale: [0.08, 1.0] +ratio: [0.75, 1.333] +hflip: 0.5 +vflip: 0.0 +interpolation: 'bilinear' +crop_pct: 0.875 + +# model +model: 'dpn131' +num_classes: 1000 +pretrained: False +ckpt_path: '' +keep_checkpoint_max: 10 +ckpt_save_dir: './ckpt' +epoch_size: 200 +dataset_sink_mode: True +amp_level: 'O2' + +# loss +loss: 'CE' +label_smoothing: 0.1 + +# lr scheduler +scheduler: 'cosine_decay' +min_lr: 1e-6 +lr: 0.05 +warmup_epochs: 0 +decay_epochs: 200 + +# optimizer +opt: 'SGD' +filter_bias_and_bn: True +momentum: 0.9 +weight_decay: 0.0001 +loss_scale: 1024 +use_nesterov: False diff --git a/configs/dpn/dpn92_ascend.yaml b/configs/dpn/dpn92_ascend.yaml new file mode 100644 index 0000000..493276c --- /dev/null +++ b/configs/dpn/dpn92_ascend.yaml @@ -0,0 +1,52 @@ +# system +mode: 0 +distribute: True +num_parallel_workers: 8 +val_while_train: True + +# dataset +dataset: 'imagenet' +data_dir: 'path/to/imagenet' +shuffle: True +dataset_download: False +batch_size: 32 +drop_remainder: True + +# augmentation +image_resize: 224 +scale: [0.08, 1.0] +ratio: [0.75, 1.333] +hflip: 0.5 +vflip: 0.0 +interpolation: 'bilinear' +crop_pct: 0.875 + +# model +model: 'dpn92' +num_classes: 1000 +pretrained: False +ckpt_path: '' +keep_checkpoint_max: 30 +ckpt_save_dir: './ckpt' +epoch_size: 200 +dataset_sink_mode: True +amp_level: 'O2' + +# loss +loss: 'CE' +label_smoothing: 0.1 + +# lr scheduler +scheduler: 'cosine_decay' +min_lr: 1e-6 +lr: 0.05 +warmup_epochs: 0 +decay_epochs: 200 + +# optimizer +opt: 'SGD' +filter_bias_and_bn: True +momentum: 0.9 +weight_decay: 0.0001 +loss_scale: 1024 +use_nesterov: False diff --git a/configs/dpn/dpn98_ascend.yaml b/configs/dpn/dpn98_ascend.yaml new file mode 100644 index 0000000..2e31ac0 --- /dev/null +++ b/configs/dpn/dpn98_ascend.yaml @@ -0,0 +1,52 @@ +# system +mode: 0 +distribute: True +num_parallel_workers: 8 +val_while_train: True + +# dataset +dataset: 'imagenet' +data_dir: 'path/to/imagenet' +shuffle: True +dataset_download: False +batch_size: 32 +drop_remainder: True + +# augmentation +image_resize: 224 +scale: 
[0.08, 1.0] +ratio: [0.75, 1.333] +hflip: 0.5 +vflip: 0.0 +interpolation: 'bilinear' +crop_pct: 0.875 + +# model +model: 'dpn98' +num_classes: 1000 +pretrained: False +ckpt_path: '' +keep_checkpoint_max: 10 +ckpt_save_dir: './ckpt' +epoch_size: 200 +dataset_sink_mode: True +amp_level: 'O2' + +# loss +loss: 'CE' +label_smoothing: 0.1 + +# lr scheduler +scheduler: 'cosine_decay' +min_lr: 1e-6 +lr: 0.05 +warmup_epochs: 0 +decay_epochs: 200 + +# optimizer +opt: 'SGD' +filter_bias_and_bn: True +momentum: 0.9 +weight_decay: 0.0001 +loss_scale: 1024 +use_nesterov: False diff --git a/configs/edgenext/README.md b/configs/edgenext/README.md new file mode 100644 index 0000000..061dee3 --- /dev/null +++ b/configs/edgenext/README.md @@ -0,0 +1,94 @@ +# EdgeNeXt + +> [EdgeNeXt: Efficiently Amalgamated CNN-Transformer Architecture for Mobile Vision Applications](https://arxiv.org/abs/2206.10589) + +## Introduction + +EdgeNeXt effectively combines the strengths of both CNN and Transformer models and is a +new efficient hybrid architecture. EdgeNeXt introduces a split depth-wise transpose +attention (SDTA) encoder that splits input tensors into multiple channel groups and +utilizes depth-wise convolution along with self-attention across channel dimensions +to implicitly increase the receptive field and encode multi-scale features.[[1](#references)] + +

+  Figure 1. Architecture of EdgeNeXt [1]
+ +## Results + +Our reproduced model performance on ImageNet-1K is reported as follows. + +
+ +| Model | Context | Top-1 (%) | Top-5 (%) | Params (M) | Recipe | Download | +|----------------|----------|-----------|-----------|------------|-------------------------------------------------------------------------------------------------------|---------------------------------------------------------------------------------------| +| edgenext_xx_small | D910x8-G | 71.02 | 89.99 | 1.33 | [yaml](https://github.com/mindspore-lab/mindcv/blob/main/configs/edgenext/edgenext_xx_small_ascend.yaml) | [weights](https://download.mindspore.cn/toolkits/mindcv/edgenext/edgenext_xx_small-afc971fb.ckpt) | +| edgenext_x_small | D910x8-G | 75.14 | 92.50 | 2.34 | [yaml](https://github.com/mindspore-lab/mindcv/blob/main/configs/edgenext/edgenext_x_small_ascend.yaml) | [weights](https://download.mindspore.cn/toolkits/mindcv/edgenext/edgenext_x_small-a200c6fc.ckpt) | +| edgenext_small | D910x8-G | 79.15 | 94.39 | 5.59 | [yaml](https://github.com/mindspore-lab/mindcv/blob/main/configs/edgenext/edgenext_small_ascend.yaml) | [weights](https://download.mindspore.cn/toolkits/mindcv/edgenext/edgenext_small-f530c372.ckpt) | +| edgenext_base | D910x8-G | 82.24 | 95.94 | 18.51 | [yaml](https://github.com/mindspore-lab/mindcv/blob/main/configs/edgenext/edgenext_base_ascend.yaml) | [weights](https://download.mindspore.cn/toolkits/mindcv/edgenext/edgenext_base-4335e9dc.ckpt) | + +
+ +#### Notes + +- Context: Training context denoted as {device}x{pieces}-{MS mode}, where mindspore mode can be G - graph mode or F - pynative mode with ms function. For example, D910x8-G is for training on 8 pieces of Ascend 910 NPU using graph mode. +- Top-1 and Top-5: Accuracy reported on the validation set of ImageNet-1K. + +## Quick Start + +### Preparation + +#### Installation + +Please refer to the [installation instruction](https://github.com/mindspore-lab/mindcv#installation) in MindCV. + +#### Dataset Preparation + +Please download the [ImageNet-1K](https://www.image-net.org/challenges/LSVRC/2012/index.php) dataset for model training and validation. + +### Training + +* Distributed Training + +It is easy to reproduce the reported results with the pre-defined training recipe. For distributed training on multiple Ascend 910 devices, please run + +```shell +# distributed training on multiple GPU/Ascend devices +mpirun -n 8 python train.py --config configs/edgenext/edgenext_small_ascend.yaml --data_dir /path/to/imagenet +``` +> If the script is executed by the root user, the `--allow-run-as-root` parameter must be added to `mpirun`. + +Similarly, you can train the model on multiple GPU devices with the above `mpirun` command. + +For detailed illustration of all hyper-parameters, please refer to [config.py](https://github.com/mindspore-lab/mindcv/blob/main/config.py). + +**Note:** As the global batch size (batch_size x num_devices) is an important hyper-parameter, it is recommended to keep the global batch size unchanged for reproduction or adjust the learning rate linearly to a new global batch size. + +* Standalone Training + +If you want to train or finetune the model on a smaller dataset without distributed training, please run: + +```shell +# standalone training on a CPU/GPU/Ascend device +python train.py --config configs/edgenext/edgenext_small_ascend.yaml --data_dir /path/to/dataset --distribute False +``` + +### Validation + +To validate the accuracy of the trained model, you can use `validate.py` and parse the checkpoint path with `--ckpt_path`. + +``` +python validate.py -c configs/edgenext/edgenext_small_ascend.yaml --data_dir /path/to/imagenet --ckpt_path /path/to/ckpt +``` + +### Deployment + +Please refer to the [deployment tutorial](https://github.com/mindspore-lab/mindcv/blob/main/tutorials/deployment.md) in MindCV. + +## References + + +[1] Maaz M, Shaker A, Cholakkal H, et al. EdgeNeXt: efficiently amalgamated CNN-transformer architecture for Mobile vision applications[J]. arXiv preprint arXiv:2206.10589, 2022. 
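All of the recipes below configure the learning rate through `scheduler: 'cosine_decay'` together with `lr`, `min_lr`, `warmup_epochs` and `decay_epochs` (for example, `edgenext_small_ascend.yaml` uses `lr: 4e-3`, `min_lr: 1e-6`, `warmup_epochs: 20`, `decay_epochs: 330`). The sketch below shows the usual meaning of those fields, i.e. a linear warmup followed by a cosine decay; MindCV's scheduler may differ in details such as per-step updates or the warmup start value, so treat it as an illustration only.

```python
import math


def lr_at_epoch(epoch, lr=4e-3, min_lr=1e-6, warmup_epochs=20, decay_epochs=330):
    """Illustrative warmup + cosine decay, using the edgenext_small_ascend.yaml values."""
    if epoch < warmup_epochs:
        return lr * (epoch + 1) / warmup_epochs          # linear warmup up to the peak lr
    progress = min((epoch - warmup_epochs) / decay_epochs, 1.0)
    return min_lr + 0.5 * (lr - min_lr) * (1.0 + math.cos(math.pi * progress))


for epoch in (0, 19, 20, 175, 349):
    print(epoch, f"{lr_at_epoch(epoch):.2e}")            # rises to 4e-3, then decays toward 1e-6
```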
diff --git a/configs/edgenext/edgenext_base_ascend.yaml b/configs/edgenext/edgenext_base_ascend.yaml new file mode 100644 index 0000000..ad990ff --- /dev/null +++ b/configs/edgenext/edgenext_base_ascend.yaml @@ -0,0 +1,63 @@ +# system +mode: 0 +distribute: True +num_parallel_workers: 8 +val_while_train: True + +# dataset +dataset: 'imagenet' +data_dir: '/path/to/imagenet' +shuffle: True +dataset_download: False +batch_size: 256 +drop_remainder: True +val_split: val + +# augmentation +image_resize: 256 +scale: [0.08, 1.0] +ratio: [0.75, 1.333] +hflip: 0.5 +interpolation: 'bicubic' +crop_pct: 0.875 +color_jitter: 0.4 +re_prob: 0.0 +mixup: 0.2 +cutmix: 1.0 +cutmix_prob: 1.0 +auto_augment: 'randaug-m9-mstd0.5-inc1' +ema: True +ema_decay: 0.9995 + +# model +model: 'edgenext_base' +num_classes: 1000 +pretrained: False +ckpt_path: '' +keep_checkpoint_max: 10 +val_interval: 2 +ckpt_save_dir: './ckpt' +epoch_size: 350 +dataset_sink_mode: True +amp_level: 'O2' +drop_path_rate: 0.1 + +# loss +loss: 'CE' +label_smoothing: 0.1 + +# lr scheduler +scheduler: 'cosine_decay' +min_lr: 1e-6 +lr: 4.5e-3 +warmup_epochs: 20 +decay_rate: 0.1 +decay_epochs: 330 + +# optimizer +opt: 'adamw' +filter_bias_and_bn: True +momentum: 0.9 +weight_decay: 0.05 +loss_scale: 1024 +use_nesterov: False diff --git a/configs/edgenext/edgenext_small_ascend.yaml b/configs/edgenext/edgenext_small_ascend.yaml new file mode 100644 index 0000000..5a7f755 --- /dev/null +++ b/configs/edgenext/edgenext_small_ascend.yaml @@ -0,0 +1,62 @@ +# system +mode: 0 +distribute: True +num_parallel_workers: 8 +val_while_train: True + +# dataset +dataset: 'imagenet' +data_dir: '/path/to/imagenet' +shuffle: True +dataset_download: False +batch_size: 256 +drop_remainder: True +val_split: val + +# augmentation +image_resize: 256 +scale: [0.08, 1.0] +ratio: [0.75, 1.333] +hflip: 0.5 +interpolation: 'bicubic' +crop_pct: 0.875 +color_jitter: 0.4 +re_prob: 0.0 +cutmix: 1.0 +cutmix_prob: 0.0 +auto_augment: 'randaug-m9-mstd0.5-inc1' +ema: True +ema_decay: 0.9995 + +# model +model: 'edgenext_small' +num_classes: 1000 +pretrained: False +ckpt_path: '' +keep_checkpoint_max: 10 +val_interval: 2 +ckpt_save_dir: './ckpt' +epoch_size: 350 +dataset_sink_mode: True +amp_level: 'O3' +drop_path_rate: 0.1 + +# loss +loss: 'CE' +label_smoothing: 0.1 + +# lr scheduler +scheduler: 'cosine_decay' +min_lr: 1e-6 +lr: 4e-3 +warmup_epochs: 20 +decay_rate: 0.1 +decay_epochs: 330 + +# optimizer +opt: 'adamw' +filter_bias_and_bn: True +momentum: 0.9 +weight_decay: 0.05 +loss_scale: 1024 +use_nesterov: False diff --git a/configs/edgenext/edgenext_x_small_ascend.yaml b/configs/edgenext/edgenext_x_small_ascend.yaml new file mode 100644 index 0000000..9d6eab7 --- /dev/null +++ b/configs/edgenext/edgenext_x_small_ascend.yaml @@ -0,0 +1,62 @@ +# system +mode: 0 +distribute: True +num_parallel_workers: 8 +val_while_train: True + +# dataset +dataset: 'imagenet' +data_dir: '/path/to/imagenet' +shuffle: True +dataset_download: False +batch_size: 256 +drop_remainder: True +val_split: val + +# augmentation +image_resize: 256 +scale: [0.08, 1.0] +ratio: [0.75, 1.333] +hflip: 0.5 +interpolation: 'bicubic' +crop_pct: 0.875 +color_jitter: 0.4 +re_prob: 0.0 +cutmix: 1.0 +cutmix_prob: 0.0 +auto_augment: 'randaug-m9-mstd0.5-inc1' +ema: True +ema_decay: 0.9995 + +# model +model: 'edgenext_x_small' +num_classes: 1000 +pretrained: False +ckpt_path: '' +keep_checkpoint_max: 10 +val_interval: 2 +ckpt_save_dir: './ckpt' +epoch_size: 350 +dataset_sink_mode: True +amp_level: 'O3' +drop_path_rate: 0.1 + +# 
loss +loss: 'CE' +label_smoothing: 0.1 + +# lr scheduler +scheduler: 'cosine_decay' +min_lr: 1e-6 +lr: 4e-3 +warmup_epochs: 20 +decay_rate: 0.1 +decay_epochs: 330 + +# optimizer +opt: 'adamw' +filter_bias_and_bn: True +momentum: 0.9 +weight_decay: 0.05 +loss_scale: 1024 +use_nesterov: False diff --git a/configs/edgenext/edgenext_xx_small_ascend.yaml b/configs/edgenext/edgenext_xx_small_ascend.yaml new file mode 100644 index 0000000..b8c3f7c --- /dev/null +++ b/configs/edgenext/edgenext_xx_small_ascend.yaml @@ -0,0 +1,61 @@ +# system +mode: 0 +distribute: True +num_parallel_workers: 8 +val_while_train: True + +# dataset +dataset: 'imagenet' +data_dir: '/path/to/imagenet' +shuffle: True +dataset_download: False +batch_size: 256 +drop_remainder: True +val_split: val + +# augmentation +image_resize: 256 +scale: [0.08, 1.0] +ratio: [0.75, 1.333] +hflip: 0.5 +interpolation: 'bicubic' +crop_pct: 0.875 +color_jitter: 0.4 +re_prob: 0.0 +cutmix: 1.0 +cutmix_prob: 0.0 +ema: True +ema_decay: 0.9995 + +# model +model: 'edgenext_xx_small' +num_classes: 1000 +pretrained: False +ckpt_path: '' +keep_checkpoint_max: 10 +val_interval: 2 +ckpt_save_dir: './ckpt' +epoch_size: 350 +dataset_sink_mode: True +amp_level: 'O2' +drop_path_rate: 0.0 + +# loss +loss: 'CE' +label_smoothing: 0.1 + +# lr scheduler +scheduler: 'cosine_decay' +min_lr: 1e-6 +lr: 4e-3 +warmup_epochs: 20 +decay_rate: 0.1 +decay_epochs: 330 + +# optimizer +opt: 'adamw' +filter_bias_and_bn: True +momentum: 0.9 +weight_decay: 0.05 +loss_scale: 1024 +use_nesterov: False diff --git a/configs/efficientnet/README.md b/configs/efficientnet/README.md new file mode 100644 index 0000000..b3ce5c5 --- /dev/null +++ b/configs/efficientnet/README.md @@ -0,0 +1,100 @@ +# EfficientNet + +> [EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks](https://arxiv.org/abs/1905.11946) + +## Introduction + + + +Figure 1 shows the methods from three dimensions -- width, depth, resolution and compound to expand the model. Increasing the model +size solely would cause the model performance to sub-optimal solution. Howerver, if three methods could be applied together into the model +, it is more likely to achieve optimal solution. By using neural architecture search, the best configurations for width scaling, depth scaling +and resolution scaling could be found. EfficientNet could achieve better model performance on ImageNet-1K dataset compared with previous methods.[[1](#references)] + +

+  Figure 1. Architecture of EfficientNet [1]
+ +## Results + + +Our reproduced model performance on ImageNet-1K is reported as follows. + +
+ +| Model | Context | Top-1 (%) | Top-5 (%) | Params (M) | Recipe | Download | +|-----------------|-----------|-----------|-----------|------------|------------------------------------------------------------------------------------------------------------|----------------------------------------------------------------------------------------------------| +| efficientnet_b0 | D910x64-G | 76.95 | 93.16 | 5.33 | [yaml](https://github.com/mindspore-lab/mindcv/blob/main/configs/efficientnet/efficientnet_b0_ascend.yaml) | [weights](https://download.mindspore.cn/toolkits/mindcv/efficientnet/efficientnet_b0-103ec70c.ckpt) | + +
+ +#### Notes + +- Context: Training context denoted as {device}x{pieces}-{MS mode}, where mindspore mode can be G - graph mode or F - pynative mode with ms function. For example, D910x8-G is for training on 8 pieces of Ascend 910 NPU using graph mode. +- Top-1 and Top-5: Accuracy reported on the validation set of ImageNet-1K. + +## Quick Start + +### Preparation + +#### Installation +Please refer to the [installation instruction](https://github.com/mindspore-lab/mindcv#installation) in MindCV. + +#### Dataset Preparation +Please download the [ImageNet-1K](https://www.image-net.org/challenges/LSVRC/2012/index.php) dataset for model training and validation. + +### Training + + +* Distributed Training + +It is easy to reproduce the reported results with the pre-defined training recipe. For distributed training on multiple Ascend 910 devices, please run + +```shell +# distributed training on multiple GPU/Ascend devices +mpirun -n 64 python train.py --config configs/efficientnet/efficientnet_b0_ascend.yaml --data_dir /path/to/imagenet +``` +> If the script is executed by the root user, the `--allow-run-as-root` parameter must be added to `mpirun`. + +Similarly, you can train the model on multiple GPU devices with the above `mpirun` command. + +For detailed illustration of all hyper-parameters, please refer to [config.py](https://github.com/mindspore-lab/mindcv/blob/main/config.py). + +**Note:** As the global batch size (batch_size x num_devices) is an important hyper-parameter, it is recommended to keep the global batch size unchanged for reproduction or adjust the learning rate linearly to a new global batch size. + +* Standalone Training + +If you want to train or finetune the model on a smaller dataset without distributed training, please run: + +```shell +# standalone training on a CPU/GPU/Ascend device +python train.py --config configs/efficientnet/efficientnet_b0_ascend.yaml --data_dir /path/to/dataset --distribute False +``` + +### Validation + +To validate the accuracy of the trained model, you can use `validate.py` and parse the checkpoint path with `--ckpt_path`. + +``` +python validate.py -c configs/efficientnet/efficientnet_b0_ascend.yaml --data_dir /path/to/imagenet --ckpt_path /path/to/ckpt +``` + +### Deployment + +Please refer to the [deployment tutorial](https://github.com/mindspore-lab/mindcv/blob/main/tutorials/deployment.md) in MindCV. + +## References + + +[1] Tan M, Le Q. Efficientnet: Rethinking model scaling for convolutional neural networks[C]//International conference on machine learning. PMLR, 2019: 6105-6114. 
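Unlike most recipes in this folder, `efficientnet_b0_ascend.yaml` below sets `loss_scale_type: 'dynamic'` with `drop_overflow_update: True` rather than a fixed `loss_scale`. The snippet below is a generic sketch of the dynamic loss-scaling loop those options refer to: the loss is scaled for mixed precision, an overflowing step is skipped and the scale reduced, and the scale is grown again after a run of clean steps. The growth/backoff factors and interval here are illustrative, not MindSpore's defaults.

```python
# Generic dynamic loss-scaling sketch (illustrative; not MindSpore's implementation).
def dynamic_loss_scale_demo(overflow_per_step, init_scale=2.0 ** 16, growth_interval=2000):
    scale, clean_steps, applied_updates = init_scale, 0, 0
    for overflow in overflow_per_step:
        # Real training: loss * scale -> backward -> check gradients for inf/nan.
        if overflow:
            scale = max(scale / 2.0, 1.0)  # back off the scale
            clean_steps = 0                # drop_overflow_update: skip this optimizer step
            continue
        applied_updates += 1               # unscale gradients by 1/scale, then update weights
        clean_steps += 1
        if clean_steps % growth_interval == 0:
            scale *= 2.0                   # try a larger scale again
    return scale, applied_updates


print(dynamic_loss_scale_demo([False, False, True, False, False], growth_interval=2))
```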
diff --git a/configs/efficientnet/efficientnet_b0_ascend.yaml b/configs/efficientnet/efficientnet_b0_ascend.yaml new file mode 100644 index 0000000..d941073 --- /dev/null +++ b/configs/efficientnet/efficientnet_b0_ascend.yaml @@ -0,0 +1,55 @@ +# system +mode: 0 +distribute: True +num_parallel_workers: 8 +val_while_train: True + +# dataset +dataset: 'imagenet' +data_dir: '/path/to/imagenet' +shuffle: True +dataset_download: False +batch_size: 128 +drop_remainder: True + +# augmentation +image_resize: 224 +scale: [0.08, 1.0] +ratio: [0.75, 1.333] +hflip: 0.5 +interpolation: 'bicubic' +crop_pct: 0.875 +color_jitter: [0.4, 0.4, 0.4] +auto_augment: 'autoaug' + +# model +model: 'efficientnet_b0' +num_classes: 1000 +pretrained: False +ckpt_path: '' +keep_checkpoint_max: 10 +ckpt_save_dir: './ckpt' +epoch_size: 450 +dataset_sink_mode: True +amp_level: 'O2' + +# loss +loss: 'CE' +label_smoothing: 0.1 + +# lr scheduler +scheduler: 'cosine_decay' +min_lr: 1e-10 +lr: 0.128 +warmup_epochs: 5 +decay_epochs: 445 + +# optimizer +opt: 'rmsprop' +filter_bias_and_bn: True +momentum: 0.9 +weight_decay: 1e-5 +loss_scale_type: 'dynamic' +drop_overflow_update: True +use_nesterov: False +eps: 1e-3 diff --git a/configs/googlenet/README.md b/configs/googlenet/README.md new file mode 100644 index 0000000..133585a --- /dev/null +++ b/configs/googlenet/README.md @@ -0,0 +1,89 @@ +# GoogLeNet +> [GoogLeNet: Going Deeper with Convolutions](https://arxiv.org/abs/1409.4842) + +## Introduction + +GoogLeNet is a new deep learning structure proposed by Christian Szegedy in 2014. Prior to this, AlexNet, VGG and other +structures achieved better training effects by increasing the depth (number of layers) of the network, but the increase +in the number of layers It will bring many negative effects, such as overfit, gradient disappearance, gradient +explosion, etc. The proposal of inception improves the training results from another perspective: it can use computing +resources more efficiently, and can extract more features under the same amount of computing, thereby improving the +training results.[[1](#references)] + +

+  Figure 1. Architecture of GoogLeNet [1]
+ +## Results + +Our reproduced model performance on ImageNet-1K is reported as follows. + +
+ +| Model | Context | Top-1 (%) | Top-5 (%) | Params (M) | Recipe | Download | +|-----------|----------|-----------|-----------|------------|---------------------------------------------------------------------------------------------------|--------------------------------------------------------------------------------------------| +| GoogLeNet | D910x8-G | 72.68 | 90.89 | 6.99 | [yaml](https://github.com/mindspore-lab/mindcv/blob/main/configs/googlenet/googlenet_ascend.yaml) | [weights](https://download.mindspore.cn/toolkits/mindcv/googlenet/googlenet-5552fcd3.ckpt) | + +
+ +#### Notes + +- Context: Training context denoted as {device}x{pieces}-{MS mode}, where mindspore mode can be G - graph mode or F - pynative mode with ms function. For example, D910x8-G is for training on 8 pieces of Ascend 910 NPU using graph mode. +- Top-1 and Top-5: Accuracy reported on the validation set of ImageNet-1K. + +## Quick Start + +### Preparation + +#### Installation +Please refer to the [installation instruction](https://github.com/mindspore-ecosystem/mindcv#installation) in MindCV. + +#### Dataset Preparation +Please download the [ImageNet-1K](https://www.image-net.org/challenges/LSVRC/2012/index.php) dataset for model training and validation. + +### Training + +* Distributed Training + +It is easy to reproduce the reported results with the pre-defined training recipe. For distributed training on multiple Ascend 910 devices, please run + +```shell +# distributed training on multiple GPU/Ascend devices +mpirun -n 8 python train.py --config configs/googlenet/googlenet_ascend.yaml --data_dir /path/to/imagenet +``` + +> If the script is executed by the root user, the `--allow-run-as-root` parameter must be added to `mpirun`. + +Similarly, you can train the model on multiple GPU devices with the above `mpirun` command. + +For detailed illustration of all hyper-parameters, please refer to [config.py](https://github.com/mindspore-lab/mindcv/blob/main/config.py). + +**Note:** As the global batch size (batch_size x num_devices) is an important hyper-parameter, it is recommended to keep the global batch size unchanged for reproduction or adjust the learning rate linearly to a new global batch size. + +* Standalone Training + +If you want to train or finetune the model on a smaller dataset without distributed training, please run: + +```shell +# standalone training on a CPU/GPU/Ascend device +python train.py --config configs/googlenet/googlenet_ascend.yaml --data_dir /path/to/dataset --distribute False +``` + +### Validation + +To validate the accuracy of the trained model, you can use `validate.py` and parse the checkpoint path with `--ckpt_path`. + +```shell +python validate.py -c configs/googlenet/googlenet_ascend.yaml --data_dir /path/to/imagenet --ckpt_path /path/to/ckpt +``` + +### Deployment + +Please refer to the [deployment tutorial](https://github.com/mindspore-lab/mindcv/blob/main/tutorials/deployment.md) in MindCV. + +## References + +[1] Szegedy C, Liu W, Jia Y, et al. Going deeper with convolutions[C]//Proceedings of the IEEE conference on computer vision and pattern recognition. 2015: 1-9. 
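Every recipe here, including `googlenet_ascend.yaml` below, trains with `loss: 'CE'` and `label_smoothing: 0.1`. As a reminder of what that option means, the NumPy snippet below computes a label-smoothed cross-entropy for a single example: the one-hot target is mixed with a uniform distribution over the classes. It illustrates the common convention only; MindCV's loss implementation is the reference.

```python
import numpy as np


def smoothed_cross_entropy(logits, target, smoothing=0.1):
    """Cross-entropy against (1 - smoothing) * one_hot + smoothing / num_classes."""
    num_classes = logits.shape[-1]
    shifted = logits - logits.max()
    log_probs = shifted - np.log(np.exp(shifted).sum())       # numerically stable log-softmax
    soft_target = np.full(num_classes, smoothing / num_classes)
    soft_target[target] += 1.0 - smoothing
    return float(-(soft_target * log_probs).sum())


logits = np.array([2.0, 0.5, -1.0])
print(smoothed_cross_entropy(logits, target=0))                 # smoothed loss
print(smoothed_cross_entropy(logits, target=0, smoothing=0.0))  # plain cross-entropy, for comparison
```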
diff --git a/configs/googlenet/googlenet_ascend.yaml b/configs/googlenet/googlenet_ascend.yaml new file mode 100644 index 0000000..ed7416a --- /dev/null +++ b/configs/googlenet/googlenet_ascend.yaml @@ -0,0 +1,51 @@ +# system +mode: 0 +distribute: True +num_parallel_workers: 8 +val_while_train: True + +# dataset +dataset: 'imagenet' +data_dir: '/path/to/imagenet' +shuffle: True +dataset_download: False +batch_size: 32 +drop_remainder: True + +# augmentation +image_resize: 224 +scale: [0.08, 1.0] +ratio: [0.75, 1.333] +hflip: 0.5 +interpolation: 'bilinear' +crop_pct: 0.875 + +# model +model: 'googlenet' +num_classes: 1000 +pretrained: False +ckpt_path: '' +keep_checkpoint_max: 10 +ckpt_save_dir: './ckpt' +epoch_size: 150 +dataset_sink_mode: True +amp_level: 'O0' + +# loss +loss: 'CE' +label_smoothing: 0.1 + +# lr scheduler +scheduler: 'cosine_decay' +lr: 0.045 +min_lr: 0.0 +decay_epochs: 145 +warmup_epochs: 5 + +# optimizer +opt: 'momentum' +filter_bias_and_bn: True +momentum: 0.9 +weight_decay: 0.00004 +loss_scale: 1024 +use_nesterov: False diff --git a/configs/hrnet/README.md b/configs/hrnet/README.md new file mode 100644 index 0000000..d6d2cee --- /dev/null +++ b/configs/hrnet/README.md @@ -0,0 +1,100 @@ +# HRNet + + +> [Deep High-Resolution Representation Learning for Visual Recognition](https://arxiv.org/abs/1908.07919) + +## Introduction + + +High-resolution representations are essential for position-sensitive vision problems, such as human pose estimation, semantic segmentation, and object detection. Existing state-of-the-art frameworks first encode the input image as a low-resolution representation through a subnetwork that is formed by connecting high-to-low resolution convolutions (e.g., ResNet, VGGNet), and then recover the high-resolution representation from the encoded low-resolution representation. Instead, the proposed network, named as High-Resolution Network (HRNet), maintains high-resolution representations through the whole process. There are two key characteristics: (i) Connect the high-to-low resolution convolution streams in parallel; (ii) Repeatedly exchange the information across resolutions. The benefit is that the resulting representation is semantically richer and spatially more precise. It shows the superiority of the proposed HRNet in a wide range of applications, including human pose estimation, semantic segmentation, and object detection, suggesting that the HRNet is a stronger backbone for computer vision problems. + + + +

+  Figure 1. Architecture of HRNet [1]
+ +## Results + + +Our reproduced model performance on ImageNet-1K is reported as follows. + +
+ +| Model | Context | Top-1 (%) | Top-5 (%) | Params (M) | Recipe | Download | +|-----------|----------|-----------|-----------|------------|-----------------------------------------------------------------------------------------------|-----------------------------------------------------------------------------------| +| hrnet_w32 | D910x8-G | 80.64 | 95.44 | 41.30 | [yaml](https://github.com/mindspore-lab/mindcv/blob/main/configs/hrnet/hrnet_w32_ascend.yaml) | [weights](https://download.mindspore.cn/toolkits/mindcv/hrnet/hrnet_w32-cc4fbd91.ckpt) | +| hrnet_w48 | D910x8-G | 81.19 | 95.69 | 77.57 | [yaml](https://github.com/mindspore-lab/mindcv/blob/main/configs/hrnet/hrnet_w48_ascend.yaml) | [weights](https://download.mindspore.cn/toolkits/mindcv/hrnet/hrnet_w48-2e3399cd.ckpt) | + + +
+ +#### Notes +- Context: Training context denoted as {device}x{pieces}-{MS mode}, where mindspore mode can be G - graph mode or F - pynative mode with ms function. For example, D910x8-G is for training on 8 pieces of Ascend 910 NPU using graph mode. +- Top-1 and Top-5: Accuracy reported on the validation set of ImageNet-1K. + + +## Quick Start +### Preparation + +#### Installation +Please refer to the [installation instruction](https://github.com/mindspore-lab/mindcv#installation) in MindCV. + +#### Dataset Preparation +Please download the [ImageNet-1K](https://www.image-net.org/challenges/LSVRC/2012/index.php) dataset for model training and validation. + +### Training + + +* Distributed Training + +It is easy to reproduce the reported results with the pre-defined training recipe. For distributed training on multiple Ascend 910 devices, please run + +```shell +# distributed training on multiple GPU/Ascend devices +mpirun -n 8 python train.py --config configs/hrnet/hrnet_w32_ascend.yaml --data_dir /path/to/imagenet +``` +> If the script is executed by the root user, the `--allow-run-as-root` parameter must be added to `mpirun`. + +Similarly, you can train the model on multiple GPU devices with the above `mpirun` command. + +For detailed illustration of all hyper-parameters, please refer to [config.py](https://github.com/mindspore-lab/mindcv/blob/main/config.py). + +**Note:** As the global batch size (batch_size x num_devices) is an important hyper-parameter, it is recommended to keep the global batch size unchanged for reproduction or adjust the learning rate linearly to a new global batch size. + +* Standalone Training + +If you want to train or finetune the model on a smaller dataset without distributed training, please run: + +```shell +# standalone training on a CPU/GPU/Ascend device +python train.py --config configs/hrnet/hrnet_w32_ascend.yaml --data_dir /path/to/dataset --distribute False +``` + +### Validation + +To validate the accuracy of the trained model, you can use `validate.py` and parse the checkpoint path with `--ckpt_path`. + +``` +python validate.py -c configs/hrnet/hrnet_w32_ascend.yaml --data_dir /path/to/imagenet --ckpt_path /path/to/ckpt +``` + +### Deployment + +To deploy online inference services with the trained model efficiently, please refer to the [deployment tutorial](https://github.com/mindspore-lab/mindcv/blob/main/tutorials/deployment.md). + +## References + + +[1] Jingdong Wang, Ke Sun, Tianheng Cheng, et al. Deep High-Resolution Representation Learning for Visual Recognition[J]. arXiv preprint arXiv:1908.07919, 2019. 
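The HRNet recipes below turn on batch-level augmentations with `mixup: 0.2`, `cutmix: 1.0` and `cutmix_prob: 1.0`. The NumPy sketch below shows the mixup half of that, assuming (as is conventional) that the yaml value is the Beta-distribution parameter alpha; images and one-hot labels are blended with the same coefficient. MindCV's transform pipeline is the reference for the exact behaviour, including how mixup and cutmix are combined.

```python
import numpy as np


def mixup_batch(images, one_hot_labels, alpha=0.2, rng=np.random.default_rng(0)):
    """Blend each sample with a random partner: x = lam * x_i + (1 - lam) * x_j."""
    lam = rng.beta(alpha, alpha)
    perm = rng.permutation(len(images))
    mixed_images = lam * images + (1.0 - lam) * images[perm]
    mixed_labels = lam * one_hot_labels + (1.0 - lam) * one_hot_labels[perm]
    return mixed_images, mixed_labels, lam


images = np.random.default_rng(1).random((4, 3, 224, 224), dtype=np.float32)
labels = np.eye(1000, dtype=np.float32)[[7, 42, 7, 512]]
mixed_x, mixed_y, lam = mixup_batch(images, labels)
print(mixed_x.shape, mixed_y.shape, round(float(lam), 3))
```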
diff --git a/configs/hrnet/hrnet_w32_ascend.yaml b/configs/hrnet/hrnet_w32_ascend.yaml new file mode 100644 index 0000000..2e7f3f6 --- /dev/null +++ b/configs/hrnet/hrnet_w32_ascend.yaml @@ -0,0 +1,55 @@ +# system +mode: 0 +distribute: True +num_parallel_workers: 8 +val_while_train: True +val_interval: 1 + +# dataset +dataset: "imagenet" +data_dir: "/path/to/imagenet" +shuffle: True +dataset_download: False +batch_size: 128 +drop_remainder: True + +# augmentation +image_resize: 224 +scale: [0.08, 1.0] +ratio: [0.75, 1.333] +hflip: 0.5 +interpolation: "bilinear" +auto_augment: "randaug-m7-mstd0.5" +re_prob: 0.1 +mixup: 0.2 +cutmix: 1.0 +cutmix_prob: 1.0 + +# model +model: "hrnet_w32" +num_classes: 1000 +pretrained: False +ckpt_path: "" +keep_checkpoint_max: 5 +ckpt_save_policy: "top_k" +ckpt_save_dir: "./ckpt" +epoch_size: 300 +dataset_sink_mode: True +amp_level: "O2" + +# loss +loss: "CE" +label_smoothing: 0.1 + +# lr scheduler +scheduler: "cosine_decay" +min_lr: 0.00001 +lr: 0.001 +warmup_epochs: 20 +decay_epochs: 280 + +# optimizer +opt: 'adamw' +weight_decay: 0.05 +loss_scale: 1024 +filter_bias_and_bn: True diff --git a/configs/hrnet/hrnet_w48_ascend.yaml b/configs/hrnet/hrnet_w48_ascend.yaml new file mode 100644 index 0000000..69486cb --- /dev/null +++ b/configs/hrnet/hrnet_w48_ascend.yaml @@ -0,0 +1,55 @@ +# system +mode: 0 +distribute: True +num_parallel_workers: 8 +val_while_train: True +val_interval: 1 + +# dataset +dataset: "imagenet" +data_dir: "/path/to/imagenet" +shuffle: True +dataset_download: False +batch_size: 128 +drop_remainder: True + +# augmentation +image_resize: 224 +scale: [0.08, 1.0] +ratio: [0.75, 1.333] +hflip: 0.5 +interpolation: "bilinear" +auto_augment: "randaug-m7-mstd0.5" +re_prob: 0.1 +mixup: 0.2 +cutmix: 1.0 +cutmix_prob: 1.0 + +# model +model: "hrnet_w48" +num_classes: 1000 +pretrained: False +ckpt_path: "" +keep_checkpoint_max: 5 +ckpt_save_policy: "top_k" +ckpt_save_dir: "./ckpt" +epoch_size: 300 +dataset_sink_mode: True +amp_level: "O2" + +# loss +loss: "CE" +label_smoothing: 0.1 + +# lr scheduler +scheduler: "cosine_decay" +min_lr: 0.00001 +lr: 0.001 +warmup_epochs: 20 +decay_epochs: 280 + +# optimizer +opt: 'adamw' +weight_decay: 0.05 +loss_scale: 1024 +filter_bias_and_bn: True diff --git a/configs/inception_v3/README.md b/configs/inception_v3/README.md new file mode 100644 index 0000000..1b0ad22 --- /dev/null +++ b/configs/inception_v3/README.md @@ -0,0 +1,90 @@ +# InceptionV3 +> [InceptionV3: Rethinking the Inception Architecture for Computer Vision](https://arxiv.org/pdf/1512.00567.pdf) + +## Introduction + +InceptionV3 is an upgraded version of GoogleNet. One of the most important improvements of V3 is Factorization, which +decomposes 7x7 into two one-dimensional convolutions (1x7, 7x1), and 3x3 is the same (1x3, 3x1), such benefits, both It +can accelerate the calculation (excess computing power can be used to deepen the network), and can split 1 conv into 2 +convs, which further increases the network depth and increases the nonlinearity of the network. It is also worth noting +that the network input from 224x224 has become 299x299, and 35x35/17x17/8x8 modules are designed more precisely. In +addition, V3 also adds batch normalization, which makes the model converge more quickly, which plays a role in partial +regularization and effectively reduces overfitting.[[1](#references)] + +

+  Figure 1. Architecture of InceptionV3 [1]
+ +## Results + +Our reproduced model performance on ImageNet-1K is reported as follows. + +
+ +| Model | Context | Top-1 (%) | Top-5 (%) | Params (M) | Recipe | Download | +|--------------|----------|-----------|-----------|------------|---------------------------------------------------------------------------------------------------------|---------------------------------------------------------------------------------------------| +| Inception_v3 | D910x8-G | 79.11 | 94.40 | 27.20 | [yaml](https://github.com/mindspore-lab/mindcv/blob/main/configs/inception_v3/inception_v3_ascend.yaml) | [weights](https://download.mindspore.cn/toolkits/mindcv/inception_v3/inception_v3-38f67890.ckpt) | + +
+ +#### Notes + +- Context: Training context denoted as {device}x{pieces}-{MS mode}, where mindspore mode can be G - graph mode or F - pynative mode with ms function. For example, D910x8-G is for training on 8 pieces of Ascend 910 NPU using graph mode. +- Top-1 and Top-5: Accuracy reported on the validation set of ImageNet-1K. + +## Quick Start + +### Preparation + +#### Installation +Please refer to the [installation instruction](https://github.com/mindspore-ecosystem/mindcv#installation) in MindCV. + +#### Dataset Preparation +Please download the [ImageNet-1K](https://www.image-net.org/challenges/LSVRC/2012/index.php) dataset for model training and validation. + +### Training + +* Distributed Training + +It is easy to reproduce the reported results with the pre-defined training recipe. For distributed training on multiple Ascend 910 devices, please run + +```shell +# distributed training on multiple GPU/Ascend devices +mpirun -n 8 python train.py --config configs/inception_v3/inception_v3_ascend.yaml --data_dir /path/to/imagenet +``` + +> If the script is executed by the root user, the `--allow-run-as-root` parameter must be added to `mpirun`. + +Similarly, you can train the model on multiple GPU devices with the above `mpirun` command. + +For detailed illustration of all hyper-parameters, please refer to [config.py](https://github.com/mindspore-lab/mindcv/blob/main/config.py). + +**Note:** As the global batch size (batch_size x num_devices) is an important hyper-parameter, it is recommended to keep the global batch size unchanged for reproduction or adjust the learning rate linearly to a new global batch size. + +* Standalone Training + +If you want to train or finetune the model on a smaller dataset without distributed training, please run: + +```shell +# standalone training on a CPU/GPU/Ascend device +python train.py --config configs/inception_v3/inception_v3_ascend.yaml --data_dir /path/to/dataset --distribute False +``` + +### Validation + +To validate the accuracy of the trained model, you can use `validate.py` and parse the checkpoint path with `--ckpt_path`. + +```shell +python validate.py -c configs/inception_v3/inception_v3_ascend.yaml --data_dir /path/to/imagenet --ckpt_path /path/to/ckpt +``` + +### Deployment + +Please refer to the [deployment tutorial](https://github.com/mindspore-lab/mindcv/blob/main/tutorials/deployment.md) in MindCV. + +## References + +[1] Szegedy C, Vanhoucke V, Ioffe S, et al. Rethinking the inception architecture for computer vision[C]//Proceedings of the IEEE conference on computer vision and pattern recognition. 2016: 2818-2826. 
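`inception_v3_ascend.yaml` below sets `aux_factor: 0.1`, which weights the auxiliary classifier that InceptionV3 uses during training. The lines below spell out the combination this corresponds to; the actual wiring lives in MindCV's model and loss code, so take it as an illustration of the yaml field.

```python
def training_loss(main_loss, aux_loss, aux_factor=0.1):
    # Total objective: main classifier loss plus a down-weighted auxiliary classifier loss.
    return main_loss + aux_factor * aux_loss


print(training_loss(main_loss=2.31, aux_loss=2.90))  # 2.31 + 0.1 * 2.90 = 2.60
```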
diff --git a/configs/inception_v3/inception_v3_ascend.yaml b/configs/inception_v3/inception_v3_ascend.yaml new file mode 100644 index 0000000..ec5da46 --- /dev/null +++ b/configs/inception_v3/inception_v3_ascend.yaml @@ -0,0 +1,54 @@ +# system +mode: 0 +distribute: True +num_parallel_workers: 8 +val_while_train: True + +# dataset +dataset: 'imagenet' +data_dir: '/path/to/imagenet' +shuffle: True +dataset_download: False +batch_size: 32 +drop_remainder: True + +# augmentation +image_resize: 299 +scale: [0.08, 1.0] +ratio: [0.75, 1.333] +hflip: 0.5 +interpolation: 'bilinear' +auto_augment: 'autoaug' +re_prob: 0.25 +crop_pct: 0.875 + +# model +model: 'inception_v3' +num_classes: 1000 +pretrained: False +ckpt_path: '' +keep_checkpoint_max: 10 +ckpt_save_dir: './ckpt' +epoch_size: 200 +dataset_sink_mode: True +amp_level: 'O0' +aux_factor: 0.1 + +# loss config +loss: 'CE' +label_smoothing: 0.1 + +# lr scheduler +scheduler: 'warmup_cosine_decay' +lr: 0.045 +min_lr: 0.0 +decay_epochs: 195 +warmup_epochs: 5 + +# optimizer +opt: 'momentum' +filter_bias_and_bn: True +momentum: 0.9 +weight_decay: 0.00004 +loss_scale: 1024 +use_nesterov: False diff --git a/configs/inception_v4/README.md b/configs/inception_v4/README.md new file mode 100644 index 0000000..32e2b88 --- /dev/null +++ b/configs/inception_v4/README.md @@ -0,0 +1,87 @@ +# InceptionV4 +> [InceptionV4: Inception-v4, Inception-ResNet and the Impact of Residual Connections on Learning](https://arxiv.org/pdf/1602.07261.pdf) + +## Introduction + +InceptionV4 studies whether the Inception module combined with Residual Connection can be improved. It is found that the +structure of ResNet can greatly accelerate the training, and the performance is also improved. An Inception-ResNet v2 +network is obtained, and a deeper and more optimized Inception v4 model is also designed, which can achieve comparable +performance with Inception-ResNet v2.[[1](#references)] + +

+  Figure 1. Architecture of InceptionV4 [1]
+ +## Results + +Our reproduced model performance on ImageNet-1K is reported as follows. + +
+ +| Model | Context | Top-1 (%) | Top-5 (%) | Params (M) | Recipe | Download | +|--------------|----------|-----------|-----------|------------|---------------------------------------------------------------------------------------------------------|---------------------------------------------------------------------------------------------| +| Inception_v4 | D910x8-G | 80.88 | 95.34 | 42.74 | [yaml](https://github.com/mindspore-lab/mindcv/blob/main/configs/inception_v4/inception_v4_ascend.yaml) | [weights](https://download.mindspore.cn/toolkits/mindcv/inception_v4/inception_v4-db9c45b3.ckpt) | + +
+ +#### Notes + +- Context: Training context denoted as {device}x{pieces}-{MS mode}, where mindspore mode can be G - graph mode or F - pynative mode with ms function. For example, D910x8-G is for training on 8 pieces of Ascend 910 NPU using graph mode. +- Top-1 and Top-5: Accuracy reported on the validation set of ImageNet-1K. + +## Quick Start + +### Preparation + +#### Installation +Please refer to the [installation instruction](https://github.com/mindspore-ecosystem/mindcv#installation) in MindCV. + +#### Dataset Preparation +Please download the [ImageNet-1K](https://www.image-net.org/challenges/LSVRC/2012/index.php) dataset for model training and validation. + +### Training + +* Distributed Training + +It is easy to reproduce the reported results with the pre-defined training recipe. For distributed training on multiple Ascend 910 devices, please run + +```shell +# distributed training on multiple GPU/Ascend devices +mpirun -n 8 python train.py --config configs/inception_v4/inception_v4_ascend.yaml --data_dir /path/to/imagenet +``` + +> If the script is executed by the root user, the `--allow-run-as-root` parameter must be added to `mpirun`. + +Similarly, you can train the model on multiple GPU devices with the above `mpirun` command. + +For detailed illustration of all hyper-parameters, please refer to [config.py](https://github.com/mindspore-lab/mindcv/blob/main/config.py). + +**Note:** As the global batch size (batch_size x num_devices) is an important hyper-parameter, it is recommended to keep the global batch size unchanged for reproduction or adjust the learning rate linearly to a new global batch size. + +* Standalone Training + +If you want to train or finetune the model on a smaller dataset without distributed training, please run: + +```shell +# standalone training on a CPU/GPU/Ascend device +python train.py --config configs/inception_v4/inception_v4_ascend.yaml --data_dir /path/to/dataset --distribute False +``` + +### Validation + +To validate the accuracy of the trained model, you can use `validate.py` and parse the checkpoint path with `--ckpt_path`. + +```shell +python validate.py -c configs/inception_v4/inception_v4_ascend.yaml --data_dir /path/to/imagenet --ckpt_path /path/to/ckpt +``` + +### Deployment + +Please refer to the [deployment tutorial](https://github.com/mindspore-lab/mindcv/blob/main/tutorials/deployment.md) in MindCV. + +## References + +[1] Szegedy C, Ioffe S, Vanhoucke V, et al. Inception-v4, inception-resnet and the impact of residual connections on learning[C]//Thirty-first AAAI conference on artificial intelligence. 2017. 
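The Top-1 and Top-5 columns in these tables are standard top-k accuracies on the ImageNet-1K validation set, the same metrics `validate.py` reports. For reference, the NumPy snippet below shows how they are defined (the toy logits and labels are made up).

```python
import numpy as np


def topk_accuracy(logits, labels, k=1):
    """Fraction of samples whose true label is among the k highest-scoring classes."""
    topk = np.argsort(logits, axis=1)[:, -k:]        # indices of the k largest logits per row
    hits = (topk == labels[:, None]).any(axis=1)
    return float(hits.mean())


logits = np.array([[0.1, 0.7, 0.2],
                   [0.5, 0.3, 0.2],
                   [0.2, 0.3, 0.5]])
labels = np.array([1, 1, 0])
print(topk_accuracy(logits, labels, k=1))  # 0.333... (only the first sample is right)
print(topk_accuracy(logits, labels, k=2))  # 0.666... (the second sample's label is in its top 2)
```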
diff --git a/configs/inception_v4/inception_v4_ascend.yaml b/configs/inception_v4/inception_v4_ascend.yaml new file mode 100644 index 0000000..758a939 --- /dev/null +++ b/configs/inception_v4/inception_v4_ascend.yaml @@ -0,0 +1,53 @@ +# system +mode: 0 +distribute: True +num_parallel_workers: 8 +val_while_train: True + +# dataset +dataset: 'imagenet' +data_dir: '/path/to/imagenet' +shuffle: True +dataset_download: False +batch_size: 32 +drop_remainder: True + +# augmentation +image_resize: 299 +scale: [0.08, 1.0] +ratio: [0.75, 1.333] +hflip: 0.5 +interpolation: 'bilinear' +auto_augment: 'autoaug' +re_prob: 0.25 +crop_pct: 0.875 + +# model +model: 'inception_v4' +num_classes: 1000 +pretrained: False +ckpt_path: '' +keep_checkpoint_max: 10 +ckpt_save_dir: './ckpt' +epoch_size: 200 +dataset_sink_mode: True +amp_level: 'O2' + +# loss +loss: 'CE' +label_smoothing: 0.1 + +# lr scheduler +scheduler: 'warmup_cosine_decay' +lr: 0.045 +min_lr: 0.0 +decay_epochs: 195 +warmup_epochs: 5 + +# optimizer +opt: 'momentum' +filter_bias_and_bn: True +momentum: 0.9 +weight_decay: 0.00004 +loss_scale: 1024 +use_nesterov: False diff --git a/configs/mixnet/README.md b/configs/mixnet/README.md new file mode 100644 index 0000000..bd9b005 --- /dev/null +++ b/configs/mixnet/README.md @@ -0,0 +1,91 @@ +# MixNet +> [MixConv: Mixed Depthwise Convolutional Kernels](https://arxiv.org/abs/1907.09595) + +## Introduction + +Depthwise convolution is becoming increasingly popular in modern efficient ConvNets, but its kernel size is often +overlooked. In this paper, the authors systematically study the impact of different kernel sizes, and observe that +combining the benefits of multiple kernel sizes can lead to better accuracy and efficiency. Based on this observation, +the authors propose a new mixed depthwise convolution (MixConv), which naturally mixes up multiple kernel sizes in a +single convolution. As a simple drop-in replacement of vanilla depthwise convolution, our MixConv improves the accuracy +and efficiency for existing MobileNets on both ImageNet classification and COCO object detection.[[1](#references)] + +

+  Figure 1. Architecture of MixNet [1]
+ +## Results + +Our reproduced model performance on ImageNet-1K is reported as follows. + +
+ +| Model | Context | Top-1 (%) | Top-5 (%) | Params (M) | Recipe | Download | +|----------|----------|-----------|-----------|------------|-----------------------------------------------------------------------------------------------|----------------------------------------------------------------------------------------| +| MixNet_s | D910x8-G | 75.52 | 92.52 | 4.17 | [yaml](https://github.com/mindspore-lab/mindcv/blob/main/configs/mixnet/mixnet_s_ascend.yaml) | [weights](https://download.mindspore.cn/toolkits/mindcv/mixnet/mixnet_s-2a5ef3a3.ckpt) | +| MixNet_m | D910x8-G | 76.64 | 93.05 | 5.06 | [yaml](https://github.com/mindspore-lab/mindcv/blob/main/configs/mixnet/mixnet_m_ascend.yaml) | [weights](https://download.mindspore.cn/toolkits/mindcv/mixnet/mixnet_m-74cc4cb1.ckpt) | +| MixNet_l | D910x8-G | 78.73 | 94.31 | 7.38 | [yaml](https://github.com/mindspore-lab/mindcv/blob/main/configs/mixnet/mixnet_l_ascend.yaml) | [weights](https://download.mindspore.cn/toolkits/mindcv/mixnet/mixnet_l-978edf2b.ckpt) | + +
+ +#### Notes + +- Context: Training context denoted as {device}x{pieces}-{MS mode}, where mindspore mode can be G - graph mode or F - pynative mode with ms function. For example, D910x8-G is for training on 8 pieces of Ascend 910 NPU using graph mode. +- Top-1 and Top-5: Accuracy reported on the validation set of ImageNet-1K. + +## Quick Start + +### Preparation + +#### Installation +Please refer to the [installation instruction](https://github.com/mindspore-ecosystem/mindcv#installation) in MindCV. + +#### Dataset Preparation +Please download the [ImageNet-1K](https://www.image-net.org/challenges/LSVRC/2012/index.php) dataset for model training and validation. + +### Training + +* Distributed Training + +It is easy to reproduce the reported results with the pre-defined training recipe. For distributed training on multiple Ascend 910 devices, please run + +```shell +# distrubted training on multiple GPU/Ascend devices +mpirun -n 8 python train.py --config configs/mixnet/mixnet_s_ascend.yaml --data_dir /path/to/imagenet +``` + +> If the script is executed by the root user, the `--allow-run-as-root` parameter must be added to `mpirun`. + +Similarly, you can train the model on multiple GPU devices with the above `mpirun` command. + +For detailed illustration of all hyper-parameters, please refer to [config.py](https://github.com/mindspore-lab/mindcv/blob/main/config.py). + +**Note:** As the global batch size (batch_size x num_devices) is an important hyper-parameter, it is recommended to keep the global batch size unchanged for reproduction or adjust the learning rate linearly to a new global batch size. + +* Standalone Training + +If you want to train or finetune the model on a smaller dataset without distributed training, please run: + +```shell +# standalone training on a CPU/GPU/Ascend device +python train.py --config configs/mixnet/mixnet_s_ascend.yaml --data_dir /path/to/dataset --distribute False +``` + +### Validation + +To validate the accuracy of the trained model, you can use `validate.py` and parse the checkpoint path with `--ckpt_path`. + +```shell +python validate.py -c configs/mixnet/mixnet_s_ascend.yaml --data_dir /path/to/imagenet --ckpt_path /path/to/ckpt +``` + +### Deployment + +Please refer to the [deployment tutorial](https://github.com/mindspore-lab/mindcv/blob/main/tutorials/deployment.md) in MindCV. + +## References + +[1] Tan M, Le Q V. Mixconv: Mixed depthwise convolutional kernels[J]. arXiv preprint arXiv:1907.09595, 2019. 
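MixConv, as described in the introduction above, splits the input channels into groups, applies a depthwise convolution with a different kernel size to each group, and concatenates the results. The NumPy sketch below implements that idea naively (explicit loops, zero "same" padding, random kernels) purely to make the channel bookkeeping concrete; the kernel sizes and the even split are assumptions for illustration, and MindCV's MixNet code is the reference implementation.

```python
import numpy as np


def depthwise_conv2d(x, kernels):
    """Naive depthwise conv with 'same' zero padding. x: (C, H, W); kernels: (C, k, k)."""
    c, h, w = x.shape
    k = kernels.shape[-1]
    pad = k // 2
    xp = np.pad(x, ((0, 0), (pad, pad), (pad, pad)))
    out = np.zeros_like(x)
    for ci in range(c):
        for i in range(h):
            for j in range(w):
                out[ci, i, j] = (xp[ci, i:i + k, j:j + k] * kernels[ci]).sum()
    return out


def mixconv(x, kernel_sizes=(3, 5, 7), rng=np.random.default_rng(0)):
    """Split channels evenly across kernel sizes, run a depthwise conv per group, concatenate."""
    groups = np.array_split(x, len(kernel_sizes), axis=0)
    outs = []
    for group, k in zip(groups, kernel_sizes):
        kernels = rng.standard_normal((group.shape[0], k, k))
        outs.append(depthwise_conv2d(group, kernels))
    return np.concatenate(outs, axis=0)


x = np.random.default_rng(1).standard_normal((6, 16, 16))
print(mixconv(x).shape)  # (6, 16, 16): same spatial size, but each channel group used a different kernel
```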
diff --git a/configs/mixnet/mixnet_l_ascend.yaml b/configs/mixnet/mixnet_l_ascend.yaml new file mode 100644 index 0000000..e1ee07d --- /dev/null +++ b/configs/mixnet/mixnet_l_ascend.yaml @@ -0,0 +1,57 @@ +# system +mode: 0 +distribute: True +num_parallel_workers: 16 +val_while_train: True + +# dataset +dataset: "imagenet" +data_dir: "/path/to/imagenet" +shuffle: True +dataset_download: False +batch_size: 128 +drop_remainder: True + +# augmentation +image_resize: 224 +scale: [0.08, 1.0] +ratio: [0.75, 1.333] +hflip: 0.5 +interpolation: "bicubic" +auto_augment: "randaug-m9-mstd0.5-inc1" +re_prob: 0.25 +crop_pct: 0.875 +mixup: 0.2 +cutmix: 1.0 + +# model +model: "mixnet_l" +num_classes: 1000 +pretrained: False +ckpt_path: "" +keep_checkpoint_max: 10 +ckpt_save_dir: "./ckpt" +epoch_size: 600 +dataset_sink_mode: True +amp_level: "O3" + +# loss +loss: "CE" +label_smoothing: 0.1 + +# lr scheduler +scheduler: "cosine_decay" +lr: 0.25 +min_lr: 0.00001 +decay_epochs: 580 +warmup_epochs: 20 + +# optimizer +opt: "momentum" +filter_bias_and_bn: True +momentum: 0.9 +weight_decay: 0.00002 +loss_scale_type: "dynamic" +drop_overflow_update: True +loss_scale: 16777216 +use_nesterov: False diff --git a/configs/mixnet/mixnet_m_ascend.yaml b/configs/mixnet/mixnet_m_ascend.yaml new file mode 100644 index 0000000..cb6d28a --- /dev/null +++ b/configs/mixnet/mixnet_m_ascend.yaml @@ -0,0 +1,55 @@ +# system +mode: 0 +distribute: True +num_parallel_workers: 8 +val_while_train: True + +# dataset +dataset: "imagenet" +data_dir: "/path/to/imagenet" +shuffle: True +dataset_download: False +batch_size: 128 +drop_remainder: True + +# augmentation +image_resize: 224 +scale: [0.08, 1.0] +ratio: [0.75, 1.333] +hflip: 0.5 +interpolation: "bicubic" +auto_augment: "randaug-m9-mstd0.5" +re_prob: 0.25 +crop_pct: 0.875 +mixup: 0.2 +cutmix: 1.0 + +# model +model: "mixnet_m" +num_classes: 1000 +pretrained: False +ckpt_path: "" +keep_checkpoint_max: 10 +ckpt_save_dir: "./ckpt" +epoch_size: 600 +dataset_sink_mode: True +amp_level: "O3" + +# loss +loss: "CE" +label_smoothing: 0.1 + +# lr scheduler +scheduler: "cosine_decay" +lr: 0.2 +min_lr: 0.00001 +decay_epochs: 585 +warmup_epochs: 15 + +# optimizer +opt: "momentum" +filter_bias_and_bn: True +momentum: 0.9 +weight_decay: 0.00002 +loss_scale: 1024 +use_nesterov: False diff --git a/configs/mixnet/mixnet_s_ascend.yaml b/configs/mixnet/mixnet_s_ascend.yaml new file mode 100644 index 0000000..7034d23 --- /dev/null +++ b/configs/mixnet/mixnet_s_ascend.yaml @@ -0,0 +1,55 @@ +# system +mode: 0 +distribute: True +num_parallel_workers: 8 +val_while_train: True + +# dataset +dataset: "imagenet" +data_dir: "/path/to/imagenet" +shuffle: True +dataset_download: False +batch_size: 128 +drop_remainder: True + +# augmentation +image_resize: 224 +scale: [0.08, 1.0] +ratio: [0.75, 1.333] +hflip: 0.5 +interpolation: "bicubic" +auto_augment: "randaug-m9-mstd0.5" +re_prob: 0.25 +crop_pct: 0.875 +mixup: 0.2 +cutmix: 1.0 + +# model +model: "mixnet_s" +num_classes: 1000 +pretrained: False +ckpt_path: "" +keep_checkpoint_max: 10 +ckpt_save_dir: "./ckpt" +epoch_size: 600 +dataset_sink_mode: True +amp_level: "O3" + +# loss +loss: "CE" +label_smoothing: 0.1 + +# lr scheduler +scheduler: "cosine_decay" +lr: 0.2 +min_lr: 0.00001 +decay_epochs: 585 +warmup_epochs: 15 + +# optimizer +opt: "momentum" +filter_bias_and_bn: True +momentum: 0.9 +weight_decay: 0.00002 +loss_scale: 256 +use_nesterov: False diff --git a/configs/mnasnet/README.md b/configs/mnasnet/README.md new file mode 100644 index 0000000..cf9d9d8 --- 
/dev/null
+++ b/configs/mnasnet/README.md
@@ -0,0 +1,86 @@
+# MnasNet
+> [MnasNet: Platform-Aware Neural Architecture Search for Mobile](https://arxiv.org/abs/1807.11626)
+
+## Introduction
+
+Designing convolutional neural networks (CNN) for mobile devices is challenging because mobile models need to be small and fast, yet still accurate. Although significant efforts have been dedicated to designing and improving mobile CNNs on all dimensions, it is very difficult to manually balance these trade-offs when there are so many architectural possibilities to consider. In this paper, the authors propose an automated mobile neural architecture search (MNAS) approach, which explicitly incorporates model latency into the main objective so that the search can identify a model that achieves a good trade-off between accuracy and latency. Unlike previous work, where latency is considered via another, often inaccurate proxy (e.g., FLOPS), their approach directly measures real-world inference latency by executing the model on mobile phones. To further strike the right balance between flexibility and search space size, the authors also propose a novel factorized hierarchical search space that encourages layer diversity throughout the network.[[1](#references)]
+
+
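+As a quick reference for the paragraph above, the latency-aware objective optimized by the search can be written as below. This is only a restatement of the formulation in the paper, where T is the target latency and w is the exponent that controls the accuracy/latency trade-off:
+
+```math
+\max_{m} \; ACC(m) \times \left[\frac{LAT(m)}{T}\right]^{w}
+```
+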

+ +

+

+ Figure 1. Architecture of MnasNet [1] +

+ +## Results + +Our reproduced model performance on ImageNet-1K is reported as follows. + +
+ +| Model | Context | Top-1 (%) | Top-5 (%) | Params (M) | Recipe | Download | +|-----------------|----------|-----------|-----------|------------|----------------------------------------------------------------------------------------------------|------------------------------------------------------------------------------------------| +| MnasNet-B1-0_75 | D910x8-G | 71.81 | 90.53 | 3.20 | [yaml](https://github.com/mindspore-lab/mindcv/blob/main/configs/mnasnet/mnasnet_0.75_ascend.yaml) | [weights](https://download.mindspore.cn/toolkits/mindcv/mnasnet/mnasnet_075-465d366d.ckpt) | +| MnasNet-B1-1_0 | D910x8-G | 74.28 | 91.70 | 4.42 | [yaml](https://github.com/mindspore-lab/mindcv/blob/main/configs/mnasnet/mnasnet_1.0_ascend.yaml) | [weights](https://download.mindspore.cn/toolkits/mindcv/mnasnet/mnasnet_100-1bcf43f8.ckpt) | +| MnasNet-B1-1_4 | D910x8-G | 76.01 | 92.83 | 7.16 | [yaml](https://github.com/mindspore-lab/mindcv/blob/main/configs/mnasnet/mnasnet_1.4_ascend.yaml) | [weights](https://download.mindspore.cn/toolkits/mindcv/mnasnet/mnasnet_140-7e20bb30.ckpt) | + +
+ +#### Notes + +- Context: Training context denoted as {device}x{pieces}-{MS mode}, where mindspore mode can be G - graph mode or F - pynative mode with ms function. For example, D910x8-G is for training on 8 pieces of Ascend 910 NPU using graph mode. +- Top-1 and Top-5: Accuracy reported on the validation set of ImageNet-1K. + +## Quick Start + +### Preparation + +#### Installation +Please refer to the [installation instruction](https://github.com/mindspore-ecosystem/mindcv#installation) in MindCV. + +#### Dataset Preparation +Please download the [ImageNet-1K](https://www.image-net.org/challenges/LSVRC/2012/index.php) dataset for model training and validation. + +### Training + +* Distributed Training + +It is easy to reproduce the reported results with the pre-defined training recipe. For distributed training on multiple Ascend 910 devices, please run + +```shell +# distributed training on multiple GPU/Ascend devices +mpirun -n 8 python train.py --config configs/mnasnet/mnasnet_0.75_ascend.yaml --data_dir /path/to/imagenet +``` + +> If the script is executed by the root user, the `--allow-run-as-root` parameter must be added to `mpirun`. + +Similarly, you can train the model on multiple GPU devices with the above `mpirun` command. + +For detailed illustration of all hyper-parameters, please refer to [config.py](https://github.com/mindspore-lab/mindcv/blob/main/config.py). + +**Note:** As the global batch size (batch_size x num_devices) is an important hyper-parameter, it is recommended to keep the global batch size unchanged for reproduction or adjust the learning rate linearly to a new global batch size. + +* Standalone Training + +If you want to train or finetune the model on a smaller dataset without distributed training, please run: + +```shell +# standalone training on a CPU/GPU/Ascend device +python train.py --config configs/mnasnet/mnasnet_0.75_ascend.yaml --data_dir /path/to/dataset --distribute False +``` + +### Validation + +To validate the accuracy of the trained model, you can use `validate.py` and parse the checkpoint path with `--ckpt_path`. + +```shell +python validate.py -c configs/mnasnet/mnasnet_0.75_ascend.yaml --data_dir /path/to/imagenet --ckpt_path /path/to/ckpt +``` + +### Deployment + +Please refer to the [deployment tutorial](https://github.com/mindspore-lab/mindcv/blob/main/tutorials/deployment.md) in MindCV. + +## References + +[1] Tan M, Chen B, Pang R, et al. Mnasnet: Platform-aware neural architecture search for mobile[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2019: 2820-2828. 
diff --git a/configs/mnasnet/mnasnet_0.75_ascend.yaml b/configs/mnasnet/mnasnet_0.75_ascend.yaml new file mode 100644 index 0000000..032de12 --- /dev/null +++ b/configs/mnasnet/mnasnet_0.75_ascend.yaml @@ -0,0 +1,51 @@ +# system +mode: 0 +distribute: True +num_parallel_workers: 8 + +# dataset +dataset: 'imagenet' +data_dir: '/path/to/imagenet' +shuffle: True +dataset_download: False +batch_size: 256 +drop_remainder: True + +# augmentation +image_resize: 224 +scale: [0.08, 1.0] +ratio: [0.75, 1.333] +hflip: 0.5 +interpolation: 'bilinear' +crop_pct: 0.875 + +# model +model: 'mnasnet0_75' +num_classes: 1000 +pretrained: False +ckpt_path: '' +keep_checkpoint_max: 10 +ckpt_save_dir: './ckpt' +epoch_size: 350 +dataset_sink_mode: True +amp_level: 'O0' + +# loss +loss: 'CE' +label_smoothing: 0.1 + +# lr scheduler +scheduler: 'cosine_decay' +min_lr: 1e-10 +lr: 0.016 +warmup_epochs: 5 +decay_epochs: 345 + +# optimizer +opt: 'rmsprop' +filter_bias_and_bn: True +momentum: 0.9 +weight_decay: 1e-5 +loss_scale: 256 +use_nesterov: False +eps: 1e-3 diff --git a/configs/mnasnet/mnasnet_0.75_gpu.yaml b/configs/mnasnet/mnasnet_0.75_gpu.yaml new file mode 100644 index 0000000..11e5d38 --- /dev/null +++ b/configs/mnasnet/mnasnet_0.75_gpu.yaml @@ -0,0 +1,51 @@ +# system +mode: 0 +distribute: True +num_parallel_workers: 8 + +# dataset +dataset: 'imagenet' +data_dir: '/path/to/imagenet' +shuffle: True +dataset_download: False +batch_size: 128 +drop_remainder: True + +# augmentation +image_resize: 224 +scale: [0.08, 1.0] +ratio: [0.75, 1.333] +hflip: 0.5 +interpolation: 'bilinear' +crop_pct: 0.875 + +# model +model: 'mnasnet0_75' +num_classes: 1000 +pretrained: False +ckpt_path: '' +keep_checkpoint_max: 10 +ckpt_save_dir: './ckpt' +epoch_size: 350 +dataset_sink_mode: True +amp_level: 'O0' + +# loss +loss: 'CE' +label_smoothing: 0.1 + +# lr scheduler +scheduler: 'cosine_decay' +min_lr: 1e-10 +lr: 0.012 +warmup_epochs: 5 +decay_epochs: 345 + +# optimizer +opt: 'rmsprop' +filter_bias_and_bn: True +momentum: 0.9 +weight_decay: 1e-5 +loss_scale: 256 +use_nesterov: False +eps: 1e-3 diff --git a/configs/mnasnet/mnasnet_1.0_ascend.yaml b/configs/mnasnet/mnasnet_1.0_ascend.yaml new file mode 100644 index 0000000..436921a --- /dev/null +++ b/configs/mnasnet/mnasnet_1.0_ascend.yaml @@ -0,0 +1,51 @@ +# system +mode: 0 +distribute: True +num_parallel_workers: 8 + +# dataset +dataset: 'imagenet' +data_dir: '/path/to/imagenet' +shuffle: True +dataset_download: False +batch_size: 256 +drop_remainder: True + +# augmentation +image_resize: 224 +scale: [0.08, 1.0] +ratio: [0.75, 1.333] +hflip: 0.5 +interpolation: 'bilinear' +crop_pct: 0.875 + +# model +model: 'mnasnet1_0' +num_classes: 1000 +pretrained: False +ckpt_path: '' +keep_checkpoint_max: 10 +ckpt_save_dir: './ckpt' +epoch_size: 450 +dataset_sink_mode: True +amp_level: 'O0' + +# loss +loss: 'CE' +label_smoothing: 0.1 + +# lr scheduler +scheduler: 'cosine_decay' +min_lr: 1e-10 +lr: 0.016 +warmup_epochs: 5 +decay_epochs: 445 + +# optimizer +opt: 'rmsprop' +filter_bias_and_bn: True +momentum: 0.9 +weight_decay: 1e-5 +loss_scale: 256 +use_nesterov: False +eps: 1e-3 diff --git a/configs/mnasnet/mnasnet_1.0_gpu.yaml b/configs/mnasnet/mnasnet_1.0_gpu.yaml new file mode 100644 index 0000000..78ef46a --- /dev/null +++ b/configs/mnasnet/mnasnet_1.0_gpu.yaml @@ -0,0 +1,51 @@ +# system +mode: 0 +distribute: True +num_parallel_workers: 8 + +# dataset +dataset: 'imagenet' +data_dir: '/path/to/imagenet' +shuffle: True +dataset_download: False +batch_size: 128 +drop_remainder: True + +# 
augmentation +image_resize: 224 +scale: [0.08, 1.0] +ratio: [0.75, 1.333] +hflip: 0.5 +interpolation: 'bilinear' +crop_pct: 0.875 + +# model +model: 'mnasnet1_0' +num_classes: 1000 +pretrained: False +ckpt_path: '' +keep_checkpoint_max: 10 +ckpt_save_dir: './ckpt' +epoch_size: 450 +dataset_sink_mode: True +amp_level: 'O0' + +# loss +loss: 'CE' +label_smoothing: 0.1 + +# lr scheduler +scheduler: 'cosine_decay' +min_lr: 1e-10 +lr: 0.012 +warmup_epochs: 5 +decay_epochs: 445 + +# optimizer +opt: 'rmsprop' +filter_bias_and_bn: True +momentum: 0.9 +weight_decay: 1e-5 +loss_scale: 256 +use_nesterov: False +eps: 1e-3 diff --git a/configs/mnasnet/mnasnet_1.4_ascend.yaml b/configs/mnasnet/mnasnet_1.4_ascend.yaml new file mode 100644 index 0000000..9e0ceef --- /dev/null +++ b/configs/mnasnet/mnasnet_1.4_ascend.yaml @@ -0,0 +1,51 @@ +# system +mode: 0 +distribute: True +num_parallel_workers: 8 + +# dataset +dataset: 'imagenet' +data_dir: '/path/to/imagenet' +shuffle: True +dataset_download: False +batch_size: 256 +drop_remainder: True + +# augmentation +image_resize: 224 +scale: [0.08, 1.0] +ratio: [0.75, 1.333] +hflip: 0.5 +interpolation: 'bilinear' +crop_pct: 0.875 + +# model +model: 'mnasnet1_4' +num_classes: 1000 +pretrained: False +ckpt_path: '' +keep_checkpoint_max: 10 +ckpt_save_dir: './ckpt' +epoch_size: 400 +dataset_sink_mode: True +amp_level: 'O0' + +# loss +loss: 'CE' +label_smoothing: 0.1 + +# lr scheduler +scheduler: 'cosine_decay' +min_lr: 1e-10 +lr: 0.016 +warmup_epochs: 5 +decay_epochs: 395 + +# optimizer +opt: 'rmsprop' +filter_bias_and_bn: True +momentum: 0.9 +weight_decay: 1e-5 +loss_scale: 256 +use_nesterov: False +eps: 1e-3 diff --git a/configs/mnasnet/mnasnet_1.4_gpu.yaml b/configs/mnasnet/mnasnet_1.4_gpu.yaml new file mode 100644 index 0000000..942af47 --- /dev/null +++ b/configs/mnasnet/mnasnet_1.4_gpu.yaml @@ -0,0 +1,51 @@ +# system +mode: 0 +distribute: True +num_parallel_workers: 8 + +# dataset +dataset: 'imagenet' +data_dir: '/path/to/imagenet' +shuffle: True +dataset_download: False +batch_size: 64 +drop_remainder: True + +# augmentation +image_resize: 224 +scale: [0.08, 1.0] +ratio: [0.75, 1.333] +hflip: 0.5 +interpolation: 'bilinear' +crop_pct: 0.875 + +# model +model: 'mnasnet1_4' +num_classes: 1000 +pretrained: False +ckpt_path: '' +keep_checkpoint_max: 10 +ckpt_save_dir: './ckpt' +epoch_size: 400 +dataset_sink_mode: True +amp_level: 'O0' + +# loss +loss: 'CE' +label_smoothing: 0.1 + +# lr scheduler +scheduler: 'cosine_decay' +min_lr: 1e-10 +lr: 0.008 +warmup_epochs: 5 +decay_epochs: 395 + +# optimizer +opt: 'rmsprop' +filter_bias_and_bn: True +momentum: 0.9 +weight_decay: 1e-5 +loss_scale: 256 +use_nesterov: False +eps: 1e-3 diff --git a/configs/mobilenetv1/README.md b/configs/mobilenetv1/README.md new file mode 100644 index 0000000..e612651 --- /dev/null +++ b/configs/mobilenetv1/README.md @@ -0,0 +1,87 @@ +# MobileNetV1 +> [MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications](https://arxiv.org/abs/1704.04861) + +## Introduction + +Compared with the traditional convolutional neural network, MobileNetV1's parameters and the amount of computation are greatly reduced on the premise that the accuracy rate is slightly reduced. (Compared to VGG16, the accuracy rate is reduced by 0.9%, but the model parameters are only 1/32 of VGG). The model is based on a streamlined architecture that uses depthwise separable convolutions to build lightweight deep neural networks. 
At the same time, two simple global hyperparameters are introduced, which can effectively trade off latency and accuracy.[[1](#references)] + +
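+To make the computation saving mentioned above concrete, the paper derives the cost ratio of a depthwise separable convolution to a standard convolution, with D_K the kernel size, M and N the numbers of input and output channels, and D_F the feature map size; this block is only a restatement of that derivation:
+
+```math
+\frac{D_K \cdot D_K \cdot M \cdot D_F \cdot D_F + M \cdot N \cdot D_F \cdot D_F}{D_K \cdot D_K \cdot M \cdot N \cdot D_F \cdot D_F} = \frac{1}{N} + \frac{1}{D_K^{2}}
+```
+
+With 3x3 kernels this amounts to roughly an 8x to 9x reduction in multiply-adds.
+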

+ +

+

+ Figure 1. Architecture of MobileNetV1 [1] +

+ +## Results + +Our reproduced model performance on ImageNet-1K is reported as follows. + +
+ +| Model | Context | Top-1 (%) | Top-5 (%) | Params (M) | Recipe | Download | +|------------------|----------|-----------|-----------|------------|-------------------------------------------------------------------------------------------------------------|-------------------------------------------------------------------------------------------------------------------| +| MobileNet_v1_025 | D910x8-G | 53.87 | 77.66 | 0.47 | [yaml](https://github.com/mindspore-lab/mindcv/blob/main/configs/mobilenetv1/mobilenet_v1_0.25_ascend.yaml) | [weights](https://download.mindspore.cn/toolkits/mindcv/mobilenet/mobilenetv1/mobilenet_v1_025-d3377fba.ckpt) | +| MobileNet_v1_050 | D910x8-G | 65.94 | 86.51 | 1.34 | [yaml](https://github.com/mindspore-lab/mindcv/blob/main/configs/mobilenetv1/mobilenet_v1_0.5_ascend.yaml) | [weights](https://download.mindspore.cn/toolkits/mindcv/mobilenet/mobilenetv1/mobilenet_v1_050-23e9ddbe.ckpt) | +| MobileNet_v1_075 | D910x8-G | 70.44 | 89.49 | 2.60 | [yaml](https://github.com/mindspore-lab/mindcv/blob/main/configs/mobilenetv1/mobilenet_v1_0.75_ascend.yaml) | [weights](https://download.mindspore.cn/toolkits/mindcv/mobilenet/mobilenetv1/mobilenet_v1_075-5bed0c73.ckpt) | +| MobileNet_v1_100 | D910x8-G | 72.95 | 91.01 | 4.25 | [yaml](https://github.com/mindspore-lab/mindcv/blob/main/configs/mobilenetv1/mobilenet_v1_1.0_ascend.yaml) | [weights](https://download.mindspore.cn/toolkits/mindcv/mobilenet/mobilenetv1/mobilenet_v1_100-91c7b206.ckpt) | + +
+ +#### Notes + +- Context: Training context denoted as {device}x{pieces}-{MS mode}, where mindspore mode can be G - graph mode or F - pynative mode with ms function. For example, D910x8-G is for training on 8 pieces of Ascend 910 NPU using graph mode. +- Top-1 and Top-5: Accuracy reported on the validation set of ImageNet-1K. + +## Quick Start + +### Preparation + +#### Installation +Please refer to the [installation instruction](https://github.com/mindspore-ecosystem/mindcv#installation) in MindCV. + +#### Dataset Preparation +Please download the [ImageNet-1K](https://www.image-net.org/challenges/LSVRC/2012/index.php) dataset for model training and validation. + +### Training + +* Distributed Training + +It is easy to reproduce the reported results with the pre-defined training recipe. For distributed training on multiple Ascend 910 devices, please run + +```shell +# distributed training on multiple GPU/Ascend devices +mpirun -n 8 python train.py --config configs/mobilenetv1/mobilenet_v1_0.25_ascend.yaml --data_dir /path/to/imagenet +``` + +> If the script is executed by the root user, the `--allow-run-as-root` parameter must be added to `mpirun`. + +Similarly, you can train the model on multiple GPU devices with the above `mpirun` command. + +For detailed illustration of all hyper-parameters, please refer to [config.py](https://github.com/mindspore-lab/mindcv/blob/main/config.py). + +**Note:** As the global batch size (batch_size x num_devices) is an important hyper-parameter, it is recommended to keep the global batch size unchanged for reproduction or adjust the learning rate linearly to a new global batch size. + +* Standalone Training + +If you want to train or finetune the model on a smaller dataset without distributed training, please run: + +```shell +# standalone training on a CPU/GPU/Ascend device +python train.py --config configs/mobilenetv1/mobilenet_v1_0.25_ascend.yaml --data_dir /path/to/dataset --distribute False +``` + +### Validation + +To validate the accuracy of the trained model, you can use `validate.py` and parse the checkpoint path with `--ckpt_path`. + +```shell +python validate.py -c configs/mobilenetv1/mobilenet_v1_0.25_ascend.yaml --data_dir /path/to/imagenet --ckpt_path /path/to/ckpt +``` + +### Deployment + +Please refer to the [deployment tutorial](https://github.com/mindspore-lab/mindcv/blob/main/tutorials/deployment.md) in MindCV. + +## References + +[1] Howard A G, Zhu M, Chen B, et al. Mobilenets: Efficient convolutional neural networks for mobile vision applications[J]. arXiv preprint arXiv:1704.04861, 2017. 
diff --git a/configs/mobilenetv1/mobilenet_v1_0.25_ascend.yaml b/configs/mobilenetv1/mobilenet_v1_0.25_ascend.yaml new file mode 100644 index 0000000..c900255 --- /dev/null +++ b/configs/mobilenetv1/mobilenet_v1_0.25_ascend.yaml @@ -0,0 +1,50 @@ +# system +mode: 0 +distribute: True +num_parallel_workers: 8 + +# dataset +dataset: 'imagenet' +data_dir: '/path/to/imagenet' +shuffle: True +dataset_download: False +batch_size: 64 +drop_remainder: True + +# augmentation +image_resize: 224 +scale: [0.08, 1.0] +ratio: [0.75, 1.333] +hflip: 0.5 +interpolation: 'bilinear' +crop_pct: 0.875 + +# model +model: 'mobilenet_v1_025_224' +num_classes: 1001 +pretrained: False +ckpt_path: '' +keep_checkpoint_max: 80 +ckpt_save_dir: './ckpt' +epoch_size: 200 +dataset_sink_mode: True +amp_level: 'O2' + +# loss +loss: 'CE' +label_smoothing: 0.1 + +# lr scheduler +scheduler: 'cosine_decay' +min_lr: 0.0 +lr: 0.4 +warmup_epochs: 2 +decay_epochs: 198 + +# optimizer +opt: 'momentum' +filter_bias_and_bn: True +momentum: 0.9 +weight_decay: 0.00003 +loss_scale: 1024 +use_nesterov: False diff --git a/configs/mobilenetv1/mobilenet_v1_0.25_gpu.yaml b/configs/mobilenetv1/mobilenet_v1_0.25_gpu.yaml new file mode 100644 index 0000000..c900255 --- /dev/null +++ b/configs/mobilenetv1/mobilenet_v1_0.25_gpu.yaml @@ -0,0 +1,50 @@ +# system +mode: 0 +distribute: True +num_parallel_workers: 8 + +# dataset +dataset: 'imagenet' +data_dir: '/path/to/imagenet' +shuffle: True +dataset_download: False +batch_size: 64 +drop_remainder: True + +# augmentation +image_resize: 224 +scale: [0.08, 1.0] +ratio: [0.75, 1.333] +hflip: 0.5 +interpolation: 'bilinear' +crop_pct: 0.875 + +# model +model: 'mobilenet_v1_025_224' +num_classes: 1001 +pretrained: False +ckpt_path: '' +keep_checkpoint_max: 80 +ckpt_save_dir: './ckpt' +epoch_size: 200 +dataset_sink_mode: True +amp_level: 'O2' + +# loss +loss: 'CE' +label_smoothing: 0.1 + +# lr scheduler +scheduler: 'cosine_decay' +min_lr: 0.0 +lr: 0.4 +warmup_epochs: 2 +decay_epochs: 198 + +# optimizer +opt: 'momentum' +filter_bias_and_bn: True +momentum: 0.9 +weight_decay: 0.00003 +loss_scale: 1024 +use_nesterov: False diff --git a/configs/mobilenetv1/mobilenet_v1_0.5_ascend.yaml b/configs/mobilenetv1/mobilenet_v1_0.5_ascend.yaml new file mode 100644 index 0000000..69c0417 --- /dev/null +++ b/configs/mobilenetv1/mobilenet_v1_0.5_ascend.yaml @@ -0,0 +1,50 @@ +# system +mode: 0 +distribute: True +num_parallel_workers: 8 + +# dataset +dataset: 'imagenet' +data_dir: '/path/to/imagenet' +shuffle: True +dataset_download: False +batch_size: 64 +drop_remainder: True + +# augmentation +image_resize: 224 +scale: [0.08, 1.0] +ratio: [0.75, 1.333] +hflip: 0.5 +interpolation: 'bilinear' +crop_pct: 0.875 + +# model +model: 'mobilenet_v1_050_224' +num_classes: 1001 +pretrained: False +ckpt_path: '' +keep_checkpoint_max: 80 +ckpt_save_dir: './ckpt' +epoch_size: 200 +dataset_sink_mode: True +amp_level: 'O2' + +# loss +loss: 'CE' +label_smoothing: 0.1 + +# lr scheduler +scheduler: 'cosine_decay' +min_lr: 0.0 +lr: 0.4 +warmup_epochs: 2 +decay_epochs: 198 + +# optimizer +opt: 'momentum' +filter_bias_and_bn: True +momentum: 0.9 +weight_decay: 0.00003 +loss_scale: 1024 +use_nesterov: False diff --git a/configs/mobilenetv1/mobilenet_v1_0.5_gpu.yaml b/configs/mobilenetv1/mobilenet_v1_0.5_gpu.yaml new file mode 100644 index 0000000..69c0417 --- /dev/null +++ b/configs/mobilenetv1/mobilenet_v1_0.5_gpu.yaml @@ -0,0 +1,50 @@ +# system +mode: 0 +distribute: True +num_parallel_workers: 8 + +# dataset +dataset: 'imagenet' +data_dir: 
'/path/to/imagenet' +shuffle: True +dataset_download: False +batch_size: 64 +drop_remainder: True + +# augmentation +image_resize: 224 +scale: [0.08, 1.0] +ratio: [0.75, 1.333] +hflip: 0.5 +interpolation: 'bilinear' +crop_pct: 0.875 + +# model +model: 'mobilenet_v1_050_224' +num_classes: 1001 +pretrained: False +ckpt_path: '' +keep_checkpoint_max: 80 +ckpt_save_dir: './ckpt' +epoch_size: 200 +dataset_sink_mode: True +amp_level: 'O2' + +# loss +loss: 'CE' +label_smoothing: 0.1 + +# lr scheduler +scheduler: 'cosine_decay' +min_lr: 0.0 +lr: 0.4 +warmup_epochs: 2 +decay_epochs: 198 + +# optimizer +opt: 'momentum' +filter_bias_and_bn: True +momentum: 0.9 +weight_decay: 0.00003 +loss_scale: 1024 +use_nesterov: False diff --git a/configs/mobilenetv1/mobilenet_v1_0.75_ascend.yaml b/configs/mobilenetv1/mobilenet_v1_0.75_ascend.yaml new file mode 100644 index 0000000..aeaf56c --- /dev/null +++ b/configs/mobilenetv1/mobilenet_v1_0.75_ascend.yaml @@ -0,0 +1,50 @@ +# system +mode: 0 +distribute: True +num_parallel_workers: 8 + +# dataset +dataset: 'imagenet' +data_dir: '/path/to/imagenet' +shuffle: True +dataset_download: False +batch_size: 64 +drop_remainder: True + +# augmentation +image_resize: 224 +scale: [0.08, 1.0] +ratio: [0.75, 1.333] +hflip: 0.5 +interpolation: 'bilinear' +crop_pct: 0.875 + +# model +model: 'mobilenet_v1_075_224' +num_classes: 1001 +pretrained: False +ckpt_path: '' +keep_checkpoint_max: 80 +ckpt_save_dir: './ckpt' +epoch_size: 200 +dataset_sink_mode: True +amp_level: 'O2' + +# loss +loss: 'CE' +label_smoothing: 0.1 + +# lr scheduler +scheduler: 'cosine_decay' +min_lr: 0.0 +lr: 0.4 +warmup_epochs: 2 +decay_epochs: 198 + +# optimizer +opt: 'momentum' +filter_bias_and_bn: True +momentum: 0.9 +weight_decay: 0.00003 +loss_scale: 1024 +use_nesterov: False diff --git a/configs/mobilenetv1/mobilenet_v1_0.75_gpu.yaml b/configs/mobilenetv1/mobilenet_v1_0.75_gpu.yaml new file mode 100644 index 0000000..aeaf56c --- /dev/null +++ b/configs/mobilenetv1/mobilenet_v1_0.75_gpu.yaml @@ -0,0 +1,50 @@ +# system +mode: 0 +distribute: True +num_parallel_workers: 8 + +# dataset +dataset: 'imagenet' +data_dir: '/path/to/imagenet' +shuffle: True +dataset_download: False +batch_size: 64 +drop_remainder: True + +# augmentation +image_resize: 224 +scale: [0.08, 1.0] +ratio: [0.75, 1.333] +hflip: 0.5 +interpolation: 'bilinear' +crop_pct: 0.875 + +# model +model: 'mobilenet_v1_075_224' +num_classes: 1001 +pretrained: False +ckpt_path: '' +keep_checkpoint_max: 80 +ckpt_save_dir: './ckpt' +epoch_size: 200 +dataset_sink_mode: True +amp_level: 'O2' + +# loss +loss: 'CE' +label_smoothing: 0.1 + +# lr scheduler +scheduler: 'cosine_decay' +min_lr: 0.0 +lr: 0.4 +warmup_epochs: 2 +decay_epochs: 198 + +# optimizer +opt: 'momentum' +filter_bias_and_bn: True +momentum: 0.9 +weight_decay: 0.00003 +loss_scale: 1024 +use_nesterov: False diff --git a/configs/mobilenetv1/mobilenet_v1_1.0_ascend.yaml b/configs/mobilenetv1/mobilenet_v1_1.0_ascend.yaml new file mode 100644 index 0000000..5349cc1 --- /dev/null +++ b/configs/mobilenetv1/mobilenet_v1_1.0_ascend.yaml @@ -0,0 +1,50 @@ +# system +mode: 0 +distribute: True +num_parallel_workers: 8 + +# dataset +dataset: 'imagenet' +data_dir: '/path/to/imagenet' +shuffle: True +dataset_download: False +batch_size: 64 +drop_remainder: True + +# augmentation +image_resize: 224 +scale: [0.08, 1.0] +ratio: [0.75, 1.333] +hflip: 0.5 +interpolation: 'bilinear' +crop_pct: 0.875 + +# model +model: 'mobilenet_v1_100_224' +num_classes: 1001 +pretrained: False +ckpt_path: '' 
+keep_checkpoint_max: 80 +ckpt_save_dir: './ckpt' +epoch_size: 200 +dataset_sink_mode: True +amp_level: 'O2' + +# loss +loss: 'CE' +label_smoothing: 0.1 + +# lr scheduler +scheduler: 'cosine_decay' +min_lr: 0.0 +lr: 0.4 +warmup_epochs: 2 +decay_epochs: 198 + +# optimizer +opt: 'momentum' +filter_bias_and_bn: True +momentum: 0.9 +weight_decay: 0.00003 +loss_scale: 1024 +use_nesterov: False diff --git a/configs/mobilenetv1/mobilenet_v1_1.0_gpu.yaml b/configs/mobilenetv1/mobilenet_v1_1.0_gpu.yaml new file mode 100644 index 0000000..4a98ec2 --- /dev/null +++ b/configs/mobilenetv1/mobilenet_v1_1.0_gpu.yaml @@ -0,0 +1,51 @@ +# system +mode: 0 +distribute: True +num_parallel_workers: 8 + +# dataset +dataset: 'imagenet' +data_dir: '/path/to/imagenet' +shuffle: True +dataset_download: False +batch_size: 64 +drop_remainder: True +train_split: 'train' + +# augmentation +image_resize: 224 +scale: [0.08, 1.0] +ratio: [0.75, 1.333] +hflip: 0.5 +interpolation: 'bilinear' +crop_pct: 0.875 + +# model +model: 'mobilenet_v1_100_224' +num_classes: 1001 +pretrained: False +ckpt_path: '' +keep_checkpoint_max: 80 +ckpt_save_dir: './ckpt' +epoch_size: 200 +dataset_sink_mode: True +amp_level: 'O2' + +# loss +loss: 'CE' +label_smoothing: 0.1 + +# lr scheduler +scheduler: 'cosine_decay' +min_lr: 0.0 +lr: 0.4 +warmup_epochs: 2 +decay_epochs: 198 + +# optimizer +opt: 'momentum' +filter_bias_and_bn: True +momentum: 0.9 +weight_decay: 0.00003 +loss_scale: 1024 +use_nesterov: False diff --git a/configs/mobilenetv2/README.md b/configs/mobilenetv2/README.md new file mode 100644 index 0000000..cf4ea50 --- /dev/null +++ b/configs/mobilenetv2/README.md @@ -0,0 +1,88 @@ +# MobileNetV2 +> [MobileNetV2: Inverted Residuals and Linear Bottlenecks](https://arxiv.org/abs/1801.04381) + +## Introduction + +The model is a new neural network architecture that is specifically tailored for mobile and resource-constrained environments. This network pushes the state of the art for mobile custom computer vision models, significantly reducing the amount of operations and memory required while maintaining the same accuracy. + +The main innovation of the model is the proposal of a new layer module: The Inverted Residual with Linear Bottleneck. The module takes as input a low-dimensional compressed representation that is first extended to high-dimensionality and then filtered with lightweight depth convolution. Linear convolution is then used to project the features back to the low-dimensional representation.[[1](#references)] + +

+ +

+

+ Figure 1. Architecture of MobileNetV2 [1] +

+ +## Results + +Our reproduced model performance on ImageNet-1K is reported as follows. + +
+ +| Model | Context | Top-1 (%) | Top-5 (%) | Params (M) | Recipe | Download | +|------------------|----------|-----------|-----------|------------|-------------------------------------------------------------------------------------------------------------|------------------------------------------------------------------------------------------------------------------| +| MobileNet_v2_075 | D910x8-G | 69.76 | 89.28 | 2.66 | [yaml](https://github.com/mindspore-lab/mindcv/blob/main/configs/mobilenetv2/mobilenet_v2_0.75_ascend.yaml) | [weights](https://download.mindspore.cn/toolkits/mindcv/mobilenet/mobilenetv2/mobilenet_v2_075-243f9404.ckpt) | +| MobileNet_v2_100 | D910x8-G | 72.02 | 90.92 | 3.54 | [yaml](https://github.com/mindspore-lab/mindcv/blob/main/configs/mobilenetv2/mobilenet_v2_1.0_ascend.yaml) | [weights](https://download.mindspore.cn/toolkits/mindcv/mobilenet/mobilenetv2/mobilenet_v2_100-52122156.ckpt) | +| MobileNet_v2_140 | D910x8-G | 74.97 | 92.32 | 6.15 | [yaml](https://github.com/mindspore-lab/mindcv/blob/main/configs/mobilenetv2/mobilenet_v2_1.4_ascend.yaml) | [weights](https://download.mindspore.cn/toolkits/mindcv/mobilenet/mobilenetv2/mobilenet_v2_140-015cfb04.ckpt) | + +
+ +#### Notes + +- Context: Training context denoted as {device}x{pieces}-{MS mode}, where mindspore mode can be G - graph mode or F - pynative mode with ms function. For example, D910x8-G is for training on 8 pieces of Ascend 910 NPU using graph mode. +- Top-1 and Top-5: Accuracy reported on the validation set of ImageNet-1K. + +## Quick Start + +### Preparation + +#### Installation +Please refer to the [installation instruction](https://github.com/mindspore-ecosystem/mindcv#installation) in MindCV. + +#### Dataset Preparation +Please download the [ImageNet-1K](https://www.image-net.org/challenges/LSVRC/2012/index.php) dataset for model training and validation. + +### Training + +* Distributed Training + +It is easy to reproduce the reported results with the pre-defined training recipe. For distributed training on multiple Ascend 910 devices, please run + +```shell +# distributed training on multiple GPU/Ascend devices +mpirun -n 8 python train.py --config configs/mobilenetv2/mobilenet_v2_0.75_ascend.yaml --data_dir /path/to/imagenet +``` + +> If the script is executed by the root user, the `--allow-run-as-root` parameter must be added to `mpirun`. + +Similarly, you can train the model on multiple GPU devices with the above `mpirun` command. + +For detailed illustration of all hyper-parameters, please refer to [config.py](https://github.com/mindspore-lab/mindcv/blob/main/config.py). + +**Note:** As the global batch size (batch_size x num_devices) is an important hyper-parameter, it is recommended to keep the global batch size unchanged for reproduction or adjust the learning rate linearly to a new global batch size. + +* Standalone Training + +If you want to train or finetune the model on a smaller dataset without distributed training, please run: + +```shell +# standalone training on a CPU/GPU/Ascend device +python train.py --config configs/mobilenetv2/mobilenet_v2_0.75_ascend.yaml --data_dir /path/to/dataset --distribute False +``` + +### Validation + +To validate the accuracy of the trained model, you can use `validate.py` and parse the checkpoint path with `--ckpt_path`. + +```shell +python validate.py -c configs/mobilenetv2/mobilenet_v2_0.75_ascend.yaml --data_dir /path/to/imagenet --ckpt_path /path/to/ckpt +``` + +### Deployment + +Please refer to the [deployment tutorial](https://github.com/mindspore-lab/mindcv/blob/main/tutorials/deployment.md) in MindCV. + +## References + +[1] Sandler M, Howard A, Zhu M, et al. Mobilenetv2: Inverted residuals and linear bottlenecks[C]//Proceedings of the IEEE conference on computer vision and pattern recognition. 2018: 4510-4520. 
diff --git a/configs/mobilenetv2/mobilenet_v2_0.75_ascend.yaml b/configs/mobilenetv2/mobilenet_v2_0.75_ascend.yaml new file mode 100644 index 0000000..32fb95a --- /dev/null +++ b/configs/mobilenetv2/mobilenet_v2_0.75_ascend.yaml @@ -0,0 +1,51 @@ +# system +mode: 0 +distribute: True +num_parallel_workers: 32 + +# dataset +dataset: 'imagenet' +data_dir: '/path/to/imagenet' +shuffle: True +dataset_download: False +batch_size: 256 +drop_remainder: True +train_split: 'train' + +# augmentation +image_resize: 224 +scale: [0.08, 1.0] +ratio: [0.75, 1.333] +hflip: 0.5 +interpolation: 'bilinear' +crop_pct: 0.875 + +# model +model: 'mobilenet_v2_075_224' +num_classes: 1000 +pretrained: False +ckpt_path: '' +keep_checkpoint_max: 50 +ckpt_save_dir: './ckpt' +epoch_size: 400 +dataset_sink_mode: True +amp_level: 'O0' + +# loss +loss: 'CE' +label_smoothing: 0.1 + +# lr scheduler +scheduler: 'cosine_decay' +min_lr: 0.0 +lr: 0.3 +warmup_epochs: 4 +decay_epochs: 396 + +# optimizer +opt: 'momentum' +filter_bias_and_bn: True +momentum: 0.9 +weight_decay: 0.00003 +loss_scale: 1024 +use_nesterov: False diff --git a/configs/mobilenetv2/mobilenet_v2_1.0_ascend.yaml b/configs/mobilenetv2/mobilenet_v2_1.0_ascend.yaml new file mode 100644 index 0000000..4e2f613 --- /dev/null +++ b/configs/mobilenetv2/mobilenet_v2_1.0_ascend.yaml @@ -0,0 +1,51 @@ +# system +mode: 0 +distribute: True +num_parallel_workers: 32 + +# dataset +dataset: 'imagenet' +data_dir: '/path/to/imagenet' +shuffle: True +dataset_download: False +batch_size: 256 +drop_remainder: True +train_split: 'train' + +# augmentation +image_resize: 224 +scale: [0.08, 1.0] +ratio: [0.75, 1.333] +hflip: 0.5 +interpolation: 'bilinear' +crop_pct: 0.875 + +# model +model: 'mobilenet_v2_100_224' +num_classes: 1000 +pretrained: False +ckpt_path: '' +keep_checkpoint_max: 100 +ckpt_save_dir: './ckpt' +epoch_size: 300 +dataset_sink_mode: True +amp_level: 'O0' + +# loss +loss: 'CE' +label_smoothing: 0.1 + +# lr scheduler +scheduler: 'cosine_decay' +min_lr: 0.0 +lr: 0.4 +warmup_epochs: 4 +decay_epochs: 296 + +# optimizer +opt: 'momentum' +filter_bias_and_bn: True +momentum: 0.9 +weight_decay: 0.00004 +loss_scale: 1024 +use_nesterov: False diff --git a/configs/mobilenetv2/mobilenet_v2_1.4_ascend.yaml b/configs/mobilenetv2/mobilenet_v2_1.4_ascend.yaml new file mode 100644 index 0000000..f31e462 --- /dev/null +++ b/configs/mobilenetv2/mobilenet_v2_1.4_ascend.yaml @@ -0,0 +1,51 @@ +# system +mode: 0 +distribute: True +num_parallel_workers: 32 + +# dataset +dataset: 'imagenet' +data_dir: '/path/to/imagenet' +shuffle: True +dataset_download: False +batch_size: 256 +drop_remainder: True +train_split: 'train' + +# augmentation +image_resize: 224 +scale: [0.08, 1.0] +ratio: [0.75, 1.333] +hflip: 0.5 +interpolation: 'bilinear' +crop_pct: 0.875 + +# model +model: 'mobilenet_v2_140_224' +num_classes: 1000 +pretrained: False +ckpt_path: '' +keep_checkpoint_max: 50 +ckpt_save_dir: './ckpt' +epoch_size: 300 +dataset_sink_mode: True +amp_level: 'O0' + +# loss +loss: 'CE' +label_smoothing: 0.1 + +# lr scheduler +scheduler: 'cosine_decay' +min_lr: 0.0 +lr: 0.4 +warmup_epochs: 4 +decay_epochs: 296 + +# optimizer +opt: 'momentum' +filter_bias_and_bn: False +momentum: 0.9 +weight_decay: 0.00004 +loss_scale: 1024 +use_nesterov: False diff --git a/configs/mobilenetv3/README.md b/configs/mobilenetv3/README.md new file mode 100644 index 0000000..ff07af2 --- /dev/null +++ b/configs/mobilenetv3/README.md @@ -0,0 +1,87 @@ +# MobileNetV3 +> [Searching for MobileNetV3](https://arxiv.org/abs/1905.02244) + 
+## Introduction
+
+MobileNetV3, published in 2019, combines the depthwise separable convolutions of v1, the inverted residuals and linear bottlenecks of v2, and squeeze-and-excitation (SE) modules, and uses NAS (Neural Architecture Search) to search the configuration and parameters of the network. MobileNetV3 first uses MnasNet to perform a coarse structure search, and then uses reinforcement learning to select the optimal configuration from a set of discrete choices. Afterwards, MobileNetV3 fine-tunes the architecture using NetAdapt, which complements the coarse search by trimming underutilized activation channels with only a small accuracy drop.
+
+MobileNetV3 comes in two versions, MobileNetV3-Large and MobileNetV3-Small, targeting different resource budgets. The paper reports that, on the ImageNet classification task, MobileNetV3-Small is about 3.2% more accurate than MobileNetV2 while reducing latency by about 15%, and MobileNetV3-Large is about 4.6% more accurate while reducing latency by about 5%. On COCO, MobileNetV3-Large reaches roughly the same accuracy as v2 while being about 25% faster, and improvements are also observed for segmentation.[[1](#references)]
+
+

+ +

+

+ Figure 1. Architecture of MobileNetV3 [1] +

+ +## Results + +Our reproduced model performance on ImageNet-1K is reported as follows. + +
+ +| Model | Context | Top-1 (%) | Top-5 (%) | Params (M) | Recipe | Download | +|-----------------------|----------|-----------|-----------|------------|--------------------------------------------------------------------------------------------------------------|-------------------------------------------------------------------------------------------------------------------| +| MobileNetV3_small_100 | D910x8-G | 67.81 | 87.82 | 2.55 | [yaml](https://github.com/mindspore-lab/mindcv/blob/main/configs/mobilenetv3/mobilenet_v3_small_ascend.yaml) | [weights](https://download.mindspore.cn/toolkits/mindcv/mobilenet/mobilenetv3/mobilenet_v3_small_100-c884b105.ckpt) | +| MobileNetV3_large_100 | D910x8-G | 75.14 | 92.33 | 5.51 | [yaml](https://github.com/mindspore-lab/mindcv/blob/main/configs/mobilenetv3/mobilenet_v3_large_ascend.yaml) | [weights](https://download.mindspore.cn/toolkits/mindcv/mobilenet/mobilenetv3/mobilenet_v3_large_100-6f5bf961.ckpt) | + +
+ +#### Notes + +- Context: Training context denoted as {device}x{pieces}-{MS mode}, where mindspore mode can be G - graph mode or F - pynative mode with ms function. For example, D910x8-G is for training on 8 pieces of Ascend 910 NPU using graph mode. +- Top-1 and Top-5: Accuracy reported on the validation set of ImageNet-1K. + +## Quick Start + +### Preparation + +#### Installation +Please refer to the [installation instruction](https://github.com/mindspore-ecosystem/mindcv#installation) in MindCV. + +#### Dataset Preparation +Please download the [ImageNet-1K](https://www.image-net.org/challenges/LSVRC/2012/index.php) dataset for model training and validation. + +### Training + +* Distributed Training + +It is easy to reproduce the reported results with the pre-defined training recipe. For distributed training on multiple Ascend 910 devices, please run + +```shell +# distributed training on multiple GPU/Ascend devices +mpirun -n 8 python train.py --config configs/mobilenetv3/mobilenet_v3_small_ascend.yaml --data_dir /path/to/imagenet +``` + +> If the script is executed by the root user, the `--allow-run-as-root` parameter must be added to `mpirun`. + +Similarly, you can train the model on multiple GPU devices with the above `mpirun` command. + +For detailed illustration of all hyper-parameters, please refer to [config.py](https://github.com/mindspore-lab/mindcv/blob/main/config.py). + +**Note:** As the global batch size (batch_size x num_devices) is an important hyper-parameter, it is recommended to keep the global batch size unchanged for reproduction or adjust the learning rate linearly to a new global batch size. + +* Standalone Training + +If you want to train or finetune the model on a smaller dataset without distributed training, please run: + +```shell +# standalone training on a CPU/GPU/Ascend device +python train.py --config configs/mobilenetv3/mobilenet_v3_small_ascend.yaml --data_dir /path/to/dataset --distribute False +``` + +### Validation + +To validate the accuracy of the trained model, you can use `validate.py` and parse the checkpoint path with `--ckpt_path`. + +```shell +python validate.py -c configs/mobilenetv3/mobilenet_v3_small_ascend.yaml --data_dir /path/to/imagenet --ckpt_path /path/to/ckpt +``` + +### Deployment + +Please refer to the [deployment tutorial](https://github.com/mindspore-lab/mindcv/blob/main/tutorials/deployment.md) in MindCV. + +## References + +[1] Howard A, Sandler M, Chu G, et al. Searching for mobilenetv3[C]//Proceedings of the IEEE/CVF international conference on computer vision. 2019: 1314-1324. 
diff --git a/configs/mobilenetv3/mobilenet_v3_large_ascend.yaml b/configs/mobilenetv3/mobilenet_v3_large_ascend.yaml new file mode 100644 index 0000000..cc99885 --- /dev/null +++ b/configs/mobilenetv3/mobilenet_v3_large_ascend.yaml @@ -0,0 +1,51 @@ +# system +mode: 0 +distribute: True +num_parallel_workers: 8 + +# dataset +dataset: 'imagenet' +data_dir: '/path/to/imagenet' +shuffle: True +dataset_download: False +batch_size: 75 +drop_remainder: True +train_split: 'train' + +# augmentation +image_resize: 224 +scale: [0.08, 1.0] +ratio: [0.75, 1.333] +hflip: 0.5 +interpolation: 'bilinear' +crop_pct: 0.875 + +# model +model: 'mobilenet_v3_large_100' +num_classes: 1000 +pretrained: False +ckpt_path: '' +keep_checkpoint_max: 30 +ckpt_save_dir: './ckpt' +epoch_size: 420 +dataset_sink_mode: True +amp_level: 'O0' + +# loss config +loss: 'CE' +label_smoothing: 0.1 + +# lr scheduler +scheduler: 'cosine_decay' +min_lr: 0.0 +lr: 1.08 +warmup_epochs: 4 +decay_epochs: 416 + +# optimizer +opt: 'momentum' +filter_bias_and_bn: False +momentum: 0.9 +weight_decay: 0.00004 +loss_scale: 1024 +use_nesterov: False diff --git a/configs/mobilenetv3/mobilenet_v3_small_ascend.yaml b/configs/mobilenetv3/mobilenet_v3_small_ascend.yaml new file mode 100644 index 0000000..75d802a --- /dev/null +++ b/configs/mobilenetv3/mobilenet_v3_small_ascend.yaml @@ -0,0 +1,52 @@ +# system +mode: 0 +distribute: True +num_parallel_workers: 8 + +# dataset +dataset: 'imagenet' +data_dir: '/path/to/imagenet' +shuffle: True +dataset_download: False +batch_size: 75 +drop_remainder: True +train_split: 'train' + +# augmentation +image_resize: 224 +scale: [0.08, 1.0] +ratio: [0.75, 1.333] +hflip: 0.5 +color_jitter: 0.4 +interpolation: 'bilinear' +crop_pct: 0.875 + +# model +model: 'mobilenet_v3_small_100' +num_classes: 1000 +pretrained: False +ckpt_path: "" +keep_checkpoint_max: 30 +ckpt_save_dir: './ckpt' +epoch_size: 470 +dataset_sink_mode: True +amp_level: 'O0' + +# loss +loss: 'CE' +label_smoothing: 0.1 + +# lr scheduler +scheduler: 'cosine_decay' +min_lr: 0.0 +lr: 0.77 +warmup_epochs: 4 +decay_epochs: 466 + +# optimizer +opt: 'momentum' +filter_bias_and_bn: False +momentum: 0.9 +weight_decay: 0.00004 +loss_scale: 1024 +use_nesterov: False diff --git a/configs/mobilevit/README.md b/configs/mobilevit/README.md new file mode 100644 index 0000000..2cf586d --- /dev/null +++ b/configs/mobilevit/README.md @@ -0,0 +1,81 @@ +# MobileViT +> [MobileViT:Light-weight, General-purpose, and Mobile-friendly Vision Transformer](https://arxiv.org/pdf/2110.02178.pdf) + +## Introduction + +MobileViT, a light-weight and general-purpose vision transformer for mobile devices. MobileViT presents a different perspective for the global processing of information with transformers, i.e., transformers as convolutions. MobileViT significantly outperforms CNN- and ViT-based networks across different tasks and datasets. On the ImageNet-1k dataset, MobileViT achieves top-1 accuracy of 78.4% with about 6 million parameters, which is 3.2% and 6.2% more accurate than MobileNetv3 (CNN-based) and DeIT (ViT-based) for a similar number of parameters. On the MS-COCO object detection task, MobileViT is 5.7% more accurate than MobileNetv3 for a similar number of parameters. + +

+ +

+

+ Figure 1. Architecture of MobileViT [1] +

+ +## Results + +Our reproduced model performance on ImageNet-1K is reported as follows. + +
+ +| Model | Context | Top-1 (%) | Top-5 (%) | Params (M) | Recipe | Download | +|-------------|----------|-----------|-----------|------------|-----------------------------------------------------------------------------------------|---------------------------------------------------------------------------------------| +| mobilevit_xx_small | D910x8-G | 68.90 | 88.92 | 1.27 | [yaml](https://github.com/mindspore-lab/mindcv/blob/main/configs/mobilevit/mobilevit_xx_small_ascend.yaml) | [weights](https://download.mindspore.cn/toolkits/mindcv/mobilevit/mobilevit_xx_small-af9da8a0.ckpt) | +| mobilevit_x_small | D910x8-G | 74.98 | 92.33 | 2.32 | [yaml](https://github.com/mindspore-lab/mindcv/blob/main/configs/mobilevit/mobilevit_x_small_ascend.yaml) | [weights](https://download.mindspore.cn/toolkits/mindcv/mobilevit/mobilevit_x_small-673fc6f2.ckpt) | +| mobilevit_small | D910x8-G | 78.48 | 94.18 | 5.59 | [yaml](https://github.com/mindspore-lab/mindcv/blob/main/configs/mobilevit/mobilevit_small_ascend.yaml) | [weights](https://download.mindspore.cn/toolkits/mindcv/mobilevit/mobilevit_small-caf79638.ckpt) | + +
+ +#### Notes + +- Context: Training context denoted as {device}x{pieces}-{MS mode}, where mindspore mode can be G - graph mode or F - pynative mode with ms function. For example, D910x8-G is for training on 8 pieces of Ascend 910 NPU using graph mode. +- Top-1 and Top-5: Accuracy reported on the validation set of ImageNet-1K. + +## Quick Start + +### Preparation + +#### Installation +Please refer to the [installation instruction](https://github.com/mindspore-ecosystem/mindcv#installation) in MindCV. + +#### Dataset Preparation +Please download the [ImageNet-1K](https://www.image-net.org/challenges/LSVRC/2012/index.php) dataset for model training and validation. + +### Training + +* Distributed Training + +It is easy to reproduce the reported results with the pre-defined training recipe. For distributed training on multiple Ascend 910 devices, please run + +```shell +# distributed training on multiple GPU/Ascend devices +mpirun -n 8 python train.py --config configs/mobilevit/mobilevit_xx_small_ascend.yaml --data_dir /path/to/imagenet +``` +> If the script is executed by the root user, the `--allow-run-as-root` parameter must be added to `mpirun`. + +Similarly, you can train the model on multiple GPU devices with the above `mpirun` command. + +For detailed illustration of all hyper-parameters, please refer to [config.py](https://github.com/mindspore-lab/mindcv/blob/main/config.py). + +**Note:** As the global batch size (batch_size x num_devices) is an important hyper-parameter, it is recommended to keep the global batch size unchanged for reproduction or adjust the learning rate linearly to a new global batch size. + +* Standalone Training + +If you want to train or finetune the model on a smaller dataset without distributed training, please run: + +```shell +# standalone training on a CPU/GPU/Ascend device +python train.py --config configs/mobilevit/mobilevit_xx_small_ascend.yaml --data_dir /path/to/dataset --distribute False +``` + +### Validation + +To validate the accuracy of the trained model, you can use `validate.py` and parse the checkpoint path with `--ckpt_path`. + +``` +python validate.py -c configs/mobilevit/mobilevit_xx_small_ascend.yaml --data_dir /path/to/imagenet --ckpt_path /path/to/ckpt +``` + +### Deployment + +Please refer to the [deployment tutorial](https://github.com/mindspore-lab/mindcv/blob/main/tutorials/deployment.md) in MindCV. 
diff --git a/configs/mobilevit/mobilevit_small_ascend.yaml b/configs/mobilevit/mobilevit_small_ascend.yaml new file mode 100644 index 0000000..97bd0de --- /dev/null +++ b/configs/mobilevit/mobilevit_small_ascend.yaml @@ -0,0 +1,55 @@ +# system +mode: 0 +distribute: True +num_parallel_workers: 8 +val_while_train: True + +# dataset +dataset: 'imagenet' +data_dir: '/path/to/imagenet' +shuffle: True +dataset_download: False +batch_size: 64 +drop_remainder: True + +# augmentation +image_resize: 256 +scale: [0.08, 1.0] +ratio: [0.75, 1.333] +hflip: 0.5 +interpolation: 'bilinear' +crop_pct: 0.875 +color_jitter: [0.4, 0.4, 0.4] +re_prob: 1.0 + +# model +model: 'mobilevit_small' +num_classes: 1000 +pretrained: False +ckpt_path: '' +keep_checkpoint_max: 10 +ckpt_save_dir: './ckpt' +epoch_size: 450 +dataset_sink_mode: True +amp_level: 'O3' + +# loss +loss: 'ce' +label_smoothing: 0.1 + +# lr scheduler +scheduler: 'cosine_decay' +min_lr: 0.000002 +lr: 0.002 +warmup_epochs: 20 +decay_epochs: 430 + +# optimizer +opt: 'adamw' +filter_bias_and_bn: True +momentum: 0.9 +weight_decay: 0.01 +use_nesterov: False +loss_scale_type: 'dynamic' +drop_overflow_update: True +loss_scale: 1024 diff --git a/configs/mobilevit/mobilevit_x_small_ascend.yaml b/configs/mobilevit/mobilevit_x_small_ascend.yaml new file mode 100644 index 0000000..388c93d --- /dev/null +++ b/configs/mobilevit/mobilevit_x_small_ascend.yaml @@ -0,0 +1,55 @@ +# system +mode: 0 +distribute: True +num_parallel_workers: 8 +val_while_train: True + +# dataset +dataset: 'imagenet' +data_dir: '/path/to/imagenet' +shuffle: True +dataset_download: False +batch_size: 64 +drop_remainder: True + +# augmentation +image_resize: 256 +scale: [0.08, 1.0] +ratio: [0.75, 1.333] +hflip: 0.5 +interpolation: 'bilinear' +crop_pct: 0.875 +color_jitter: [0.4, 0.4, 0.4] +re_prob: 1.0 + +# model +model: 'mobilevit_x_small' +num_classes: 1000 +pretrained: False +ckpt_path: '' +keep_checkpoint_max: 10 +ckpt_save_dir: './ckpt' +epoch_size: 450 +dataset_sink_mode: True +amp_level: 'O3' + +# loss +loss: 'ce' +label_smoothing: 0.1 + +# lr scheduler +scheduler: 'cosine_decay' +min_lr: 0.000002 +lr: 0.002 +warmup_epochs: 20 +decay_epochs: 430 + +# optimizer +opt: 'adamw' +filter_bias_and_bn: True +momentum: 0.9 +weight_decay: 0.01 +use_nesterov: False +loss_scale_type: 'dynamic' +drop_overflow_update: True +loss_scale: 1024 diff --git a/configs/mobilevit/mobilevit_xx_small_ascend.yaml b/configs/mobilevit/mobilevit_xx_small_ascend.yaml new file mode 100644 index 0000000..bdbc410 --- /dev/null +++ b/configs/mobilevit/mobilevit_xx_small_ascend.yaml @@ -0,0 +1,55 @@ +# system +mode: 0 +distribute: True +num_parallel_workers: 8 +val_while_train: True + +# dataset +dataset: 'imagenet' +data_dir: '/path/to/imagenet' +shuffle: True +dataset_download: False +batch_size: 64 +drop_remainder: True + +# augmentation +image_resize: 256 +scale: [0.08, 1.0] +ratio: [0.75, 1.333] +hflip: 0.5 +interpolation: 'bilinear' +crop_pct: 0.875 +color_jitter: [0.4, 0.4, 0.4] +re_prob: 1.0 + +# model +model: 'mobilevit_xx_small' +num_classes: 1000 +pretrained: False +ckpt_path: '' +keep_checkpoint_max: 10 +ckpt_save_dir: './ckpt' +epoch_size: 450 +dataset_sink_mode: True +amp_level: 'O3' + +# loss +loss: 'ce' +label_smoothing: 0.1 + +# lr scheduler +scheduler: 'cosine_decay' +min_lr: 0.000002 +lr: 0.002 +warmup_epochs: 40 +decay_epochs: 410 + +# optimizer +opt: 'adamw' +filter_bias_and_bn: True +momentum: 0.9 +weight_decay: 0.01 +use_nesterov: False +loss_scale_type: 'dynamic' +drop_overflow_update: True 
+loss_scale: 1024
diff --git a/configs/nasnet/README.md b/configs/nasnet/README.md
new file mode 100644
index 0000000..f648da2
--- /dev/null
+++ b/configs/nasnet/README.md
@@ -0,0 +1,100 @@
+# NasNet
+
+> [Learning Transferable Architectures for Scalable Image Recognition](https://arxiv.org/abs/1707.07012)
+
+## Introduction
+
+
+
+Neural architecture search (NAS) provides flexibility in model configuration. By searching over a pool of candidate operations (convolution, max pooling, and average pooling layers),
+a normal cell and a reduction cell are selected as the building blocks of NasNet. Figure 1 shows the NasNet architecture for ImageNet, which is built by stacking reduction cells and normal cells.
+Overall, NasNet achieves better performance with fewer model parameters and less computation on image classification
+compared with previous state-of-the-art methods on the ImageNet-1K dataset.[[1](#references)]
+
+

+ +

+

+ Figure 1. Architecture of Nasnet [1] +

+ +## Results + + +Our reproduced model performance on ImageNet-1K is reported as follows. + +
+ +| Model | Context | Top-1 (%) | Top-5 (%) | Params (M) | Recipe | Download | +|-----------------|----------|-----------|-----------|------------|------------------------------------------------------------------------------------------------------|-----------------------------------------------------------------------------------------------| +| nasnet_a_4x1056 | D910x8-G | 73.65 | 91.25 | 5.33 | [yaml](https://github.com/mindspore-lab/mindcv/blob/main/configs/nasnet/nasnet_a_4x1056_ascend.yaml) | [weights](https://download.mindspore.cn/toolkits/mindcv/nasnet/nasnet_a_4x1056-0fbb5cdd.ckpt) | + +
+ +#### Notes + +- Context: Training context denoted as {device}x{pieces}-{MS mode}, where mindspore mode can be G - graph mode or F - pynative mode with ms function. For example, D910x8-G is for training on 8 pieces of Ascend 910 NPU using graph mode. +- Top-1 and Top-5: Accuracy reported on the validation set of ImageNet-1K. + +## Quick Start + +### Preparation + +#### Installation +Please refer to the [installation instruction](https://github.com/mindspore-lab/mindcv#installation) in MindCV. + +#### Dataset Preparation +Please download the [ImageNet-1K](https://www.image-net.org/challenges/LSVRC/2012/index.php) dataset for model training and validation. + +### Training + + +* Distributed Training + +It is easy to reproduce the reported results with the pre-defined training recipe. For distributed training on multiple Ascend 910 devices, please run + +```shell +# distributed training on multiple GPU/Ascend devices +mpirun -n 8 python train.py --config configs/nasnet/nasnet_a_4x1056_ascend.yaml --data_dir /path/to/imagenet +``` +> If the script is executed by the root user, the `--allow-run-as-root` parameter must be added to `mpirun`. + +Similarly, you can train the model on multiple GPU devices with the above `mpirun` command. + +For detailed illustration of all hyper-parameters, please refer to [config.py](https://github.com/mindspore-lab/mindcv/blob/main/config.py). + +**Note:** As the global batch size (batch_size x num_devices) is an important hyper-parameter, it is recommended to keep the global batch size unchanged for reproduction or adjust the learning rate linearly to a new global batch size. + +* Standalone Training + +If you want to train or finetune the model on a smaller dataset without distributed training, please run: + +```shell +# standalone training on a CPU/GPU/Ascend device +python train.py --config configs/nasnet/nasnet_a_4x1056_ascend.yaml --data_dir /path/to/dataset --distribute False +``` + +### Validation + +To validate the accuracy of the trained model, you can use `validate.py` and parse the checkpoint path with `--ckpt_path`. + +``` +python validate.py -c configs/nasnet/nasnet_a_4x1056_ascend.yaml --data_dir /path/to/imagenet --ckpt_path /path/to/ckpt +``` + +### Deployment + +Please refer to the [deployment tutorial](https://github.com/mindspore-lab/mindcv/blob/main/tutorials/deployment.md) in MindCV. + +## References + + +[1] Zoph B, Vasudevan V, Shlens J, et al. Learning transferable architectures for scalable image recognition[C]//Proceedings of the IEEE conference on computer vision and pattern recognition. 2018: 8697-8710. 
diff --git a/configs/nasnet/nasnet_a_4x1056_ascend.yaml b/configs/nasnet/nasnet_a_4x1056_ascend.yaml new file mode 100644 index 0000000..829b73d --- /dev/null +++ b/configs/nasnet/nasnet_a_4x1056_ascend.yaml @@ -0,0 +1,53 @@ +# system +mode: 0 +distribute: True +num_parallel_workers: 8 +val_while_train: True + +# dataset +dataset: 'imagenet' +data_dir: '/path/to/imagenet' +shuffle: True +dataset_download: False +batch_size: 256 +drop_remainder: True + +# augmentation +image_resize: 224 +scale: [0.08, 1.0] +ratio: [0.75, 1.333] +hflip: 0.5 +interpolation: 'bilinear' +crop_pct: 0.875 + +# model +model: 'nasnet_a_4x1056' +num_classes: 1000 +pretrained: False +ckpt_path: '' +keep_checkpoint_max: 10 +ckpt_save_dir: './ckpt' +epoch_size: 450 +dataset_sink_mode: True +amp_level: 'O0' + +# loss +loss: 'CE' +label_smoothing: 0.1 + +# lr scheduler +scheduler: 'cosine_decay' +min_lr: 1e-10 +lr: 0.016 +warmup_epochs: 5 +decay_epochs: 445 + +# optimizer +opt: 'rmsprop' +filter_bias_and_bn: True +momentum: 0.9 +weight_decay: 1e-5 +loss_scale_type: 'dynamic' +drop_overflow_update: True +use_nesterov: False +eps: 1e-3 diff --git a/configs/pit/README.md b/configs/pit/README.md new file mode 100644 index 0000000..3726ebb --- /dev/null +++ b/configs/pit/README.md @@ -0,0 +1,89 @@ +# PiT +> [PiT: Rethinking Spatial Dimensions of Vision Transformers](https://arxiv.org/abs/2103.16302v2) + +## Introduction + +PiT (Pooling-based Vision Transformer) is an improvement of Vision Transformer (ViT) model proposed by Byeongho Heo in 2021. PiT adds pooling layer on the basis of ViT model, so that the spatial dimension of each layer is reduced like CNN, instead of ViT using the same spatial dimension for all layers. PiT achieves the improved model capability and generalization performance against ViT. [[1](#references)] + + +
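+
+To make the pooling idea above concrete, the sketch below reduces the spatial size of a token sequence by reshaping it back onto a 2-D grid and average-pooling 2x2 neighbourhoods. This is only a NumPy illustration of the general mechanism; the actual PiT pooling layer uses a strided depthwise convolution and also widens the channel dimension.
+
+```python
+import numpy as np
+
+def pool_tokens(x: np.ndarray, h: int, w: int) -> np.ndarray:
+    """Average-pool a (N, H*W, C) token sequence down to (N, (H/2)*(W/2), C)."""
+    n, _, c = x.shape
+    grid = x.reshape(n, h, w, c)                      # tokens back onto a 2-D grid
+    grid = grid.reshape(n, h // 2, 2, w // 2, 2, c)   # group 2x2 neighbourhoods
+    pooled = grid.mean(axis=(2, 4))                   # average each neighbourhood
+    return pooled.reshape(n, (h // 2) * (w // 2), c)  # back to a token sequence
+
+tokens = np.random.rand(1, 14 * 14, 64)    # e.g. a 14x14 patch grid with 64 channels
+print(pool_tokens(tokens, 14, 14).shape)   # (1, 49, 64)
+```
+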

+ Figure 1. Architecture of PiT [1]

+ +## Results + +Our reproduced model performance on ImageNet-1K is reported as follows. + +
+ +| Model | Context | Top-1 (%) | Top-5 (%) | Params (M) | Recipe | Download | +|--------|----------|-----------|-----------|------------|------------------------------------------------------------------------------------------|-----------------------------------------------------------------------------------| +| PiT_ti | D910x8-G | 72.96 | 91.33 | 4.85 | [yaml](https://github.com/mindspore-lab/mindcv/blob/main/configs/pit/pit_ti_ascend.yaml) | [weights](https://download.mindspore.cn/toolkits/mindcv/pit/pit_ti-e647a593.ckpt) | +| PiT_xs | D910x8-G | 78.41 | 94.06 | 10.61 | [yaml](https://github.com/mindspore-lab/mindcv/blob/main/configs/pit/pit_xs_ascend.yaml) | [weights](https://download.mindspore.cn/toolkits/mindcv/pit/pit_xs-fea0d37e.ckpt) | +| PiT_s | D910x8-G | 80.56 | 94.80 | 23.46 | [yaml](https://github.com/mindspore-lab/mindcv/blob/main/configs/pit/pit_s_ascend.yaml) | [weights](https://download.mindspore.cn/toolkits/mindcv/pit/pit_s-3c1ba36f.ckpt) | +| PiT_b | D910x8-G | 81.87 | 95.04 | 73.76 | [yaml](https://github.com/mindspore-lab/mindcv/blob/main/configs/pit/pit_b_ascend.yaml) | [weights](https://download.mindspore.cn/toolkits/mindcv/pit/pit_b-2411c9b6.ckpt) | + +
+ +#### Notes + +- Context: Training context denoted as {device}x{pieces}-{MS mode}, where mindspore mode can be G - graph mode or F - pynative mode with ms function. For example, D910x8-G is for training on 8 pieces of Ascend 910 NPU using graph mode. +- Top-1 and Top-5: Accuracy reported on the validation set of ImageNet-1K. + +## Quick Start + +### Preparation + +#### Installation +Please refer to the [installation instruction](https://github.com/mindspore-ecosystem/mindcv#installation) in MindCV. + +#### Dataset Preparation +Please download the [ImageNet-1K](https://www.image-net.org/challenges/LSVRC/2012/index.php) dataset for model training and validation. + +### Training + +* Distributed Training + +It is easy to reproduce the reported results with the pre-defined training recipe. For distributed training on multiple Ascend 910 devices, please run + +```shell +# distributed training on multiple GPU/Ascend devices +mpirun -n 8 python train.py --config configs/pit/pit_xs_ascend.yaml --data_dir /path/to/imagenet +``` + +> If the script is executed by the root user, the `--allow-run-as-root` parameter must be added to `mpirun`. + +Similarly, you can train the model on multiple GPU devices with the above `mpirun` command. + +For detailed illustration of all hyper-parameters, please refer to [config.py](https://github.com/mindspore-lab/mindcv/blob/main/config.py). + +**Note:** As the global batch size (batch_size x num_devices) is an important hyper-parameter, it is recommended to keep the global batch size unchanged for reproduction or adjust the learning rate linearly to a new global batch size. + +* Standalone Training + +If you want to train or finetune the model on a smaller dataset without distributed training, please run: + +```shell +# standalone training on a CPU/GPU/Ascend device +python train.py --config configs/pit/pit_xs_ascend.yaml --data_dir /path/to/dataset --distribute False +``` + +### Validation + +To validate the accuracy of the trained model, you can use `validate.py` and parse the checkpoint path with `--ckpt_path`. + +```shell +python validate.py -c configs/pit/pit_xs_ascend.yaml --data_dir /path/to/imagenet --ckpt_path /path/to/ckpt +``` + +### Deployment + +Please refer to the [deployment tutorial](https://github.com/mindspore-lab/mindcv/blob/main/tutorials/deployment.md) in MindCV. + +## References + +[1] Heo B, Yun S, Han D, et al. Rethinking spatial dimensions of vision transformers[C]//Proceedings of the IEEE/CVF International Conference on Computer Vision. 2021: 11936-11945. 
diff --git a/configs/pit/pit_b_ascend.yaml b/configs/pit/pit_b_ascend.yaml new file mode 100644 index 0000000..07d93da --- /dev/null +++ b/configs/pit/pit_b_ascend.yaml @@ -0,0 +1,60 @@ +# system +mode: 0 +distribute: True +num_parallel_workers: 8 +val_while_train: True + +# dataset +dataset: "imagenet" +data_dir: "/path/to/imagenet" +shuffle: True +dataset_download: False +batch_size: 128 +drop_remainder: True + +# augmentation +image_resize: 224 +scale: [0.08, 1.0] +ratio: [0.75, 1.333] +re_value: "random" +hflip: 0.5 +interpolation: "bicubic" +auto_augment: "randaug-m9-mstd0.5-inc1" +re_prob: 0.25 +crop_pct: 0.9 +mixup: 0.8 +cutmix: 1.0 +aug_repeats: 3 + +# model +model: "pit_b" +num_classes: 1000 +pretrained: False +ckpt_path: "" +keep_checkpoint_max: 10 +ckpt_save_dir: "./ckpt" +epoch_size: 600 +drop_path_rate: 0.2 +dataset_sink_mode: True +amp_level: "O2" + +# loss +loss: "ce" +label_smoothing: 0.1 + +# lr scheduler +scheduler: "cosine_decay" +lr: 0.001 +min_lr: 0.00001 +lr_epoch_stair: True +decay_epochs: 590 +warmup_epochs: 10 +warmup_factor: 0.002 + +# optimizer +opt: "adamw" +filter_bias_and_bn: True +momentum: 0.9 +weight_decay: 0.05 +loss_scale_type: "auto" +use_nesterov: False diff --git a/configs/pit/pit_s_ascend.yaml b/configs/pit/pit_s_ascend.yaml new file mode 100644 index 0000000..00dccd6 --- /dev/null +++ b/configs/pit/pit_s_ascend.yaml @@ -0,0 +1,61 @@ +# system +mode: 0 +distribute: True +num_parallel_workers: 8 +val_while_train: True + +# dataset +dataset: "imagenet" +data_dir: "/path/to/imagenet" +shuffle: True +dataset_download: False +batch_size: 128 +drop_remainder: True + +# augmentation +image_resize: 224 +scale: [0.08, 1.0] +ratio: [0.75, 1.333] +re_value: "random" +hflip: 0.5 +interpolation: "bicubic" +color_jitter: 0.3 +auto_augment: "randaug-m9-mstd0.5-inc1" +re_prob: 0.25 +crop_pct: 0.875 +mixup: 0.8 +cutmix: 1.0 + +# model +model: "pit_s" +num_classes: 1000 +pretrained: False +ckpt_path: "" +keep_checkpoint_max: 10 +ckpt_save_dir: "./ckpt" +epoch_size: 600 +drop_path_rate: 0.1 +dataset_sink_mode: True +amp_level: "O2" + +# loss +loss: "ce" +label_smoothing: 0.1 + +# lr scheduler +scheduler: "cosine_decay" +lr: 0.002 +min_lr: 0.00001 +lr_epoch_stair: True +decay_epochs: 590 +warmup_epochs: 10 +warmup_factor: 0.002 + +# optimizer +opt: "adamw" +filter_bias_and_bn: True +momentum: 0.9 +weight_decay: 0.05 +loss_scale_type: "dynamic" +drop_overflow_update: True +use_nesterov: False diff --git a/configs/pit/pit_ti_ascend.yaml b/configs/pit/pit_ti_ascend.yaml new file mode 100644 index 0000000..2a889e1 --- /dev/null +++ b/configs/pit/pit_ti_ascend.yaml @@ -0,0 +1,57 @@ +# system +mode: 0 +distribute: True +num_parallel_workers: 16 +val_while_train: True + +# dataset +dataset: "imagenet" +data_dir: "/path/to/imagenet" +shuffle: True +dataset_download: False +batch_size: 128 +drop_remainder: True + +# augmentation +image_resize: 224 +scale: [0.08, 1.0] +ratio: [0.75, 1.333] +re_value: "random" +hflip: 0.5 +interpolation: "bicubic" +auto_augment: "randaug-m9-mstd0.5-inc1" +re_prob: 0.25 +crop_pct: 0.875 +mixup: 0.8 +cutmix: 1.0 + +# model +model: "pit_ti" +num_classes: 1000 +pretrained: False +ckpt_path: "" +keep_checkpoint_max: 10 +ckpt_save_dir: "./ckpt" +epoch_size: 500 +drop_path_rate: 0.1 +dataset_sink_mode: True +amp_level: "O2" + +# loss +loss: "ce" +label_smoothing: 0.1 + +# lr scheduler +scheduler: "cosine_decay" +lr: 0.002 +min_lr: 0.00001 +decay_epochs: 490 +warmup_epochs: 10 + +# optimizer +opt: "adamw" +filter_bias_and_bn: True +momentum: 0.9 
+weight_decay: 0.05 +loss_scale_type: "auto" +use_nesterov: False diff --git a/configs/pit/pit_xs_ascend.yaml b/configs/pit/pit_xs_ascend.yaml new file mode 100644 index 0000000..465ce9d --- /dev/null +++ b/configs/pit/pit_xs_ascend.yaml @@ -0,0 +1,59 @@ +# system +mode: 0 +distribute: True +num_parallel_workers: 8 +val_while_train: True + +# dataset +dataset: "imagenet" +data_dir: "/path/to/imagenet" +shuffle: True +dataset_download: False +batch_size: 128 +drop_remainder: True + +# augmentation +image_resize: 224 +scale: [0.08, 1.0] +ratio: [0.75, 1.333] +re_value: "random" +hflip: 0.5 +interpolation: "bicubic" +color_jitter: [0.3, 0.3, 0.3] +auto_augment: "randaug-m9-mstd0.5-inc1" +re_prob: 0.25 +crop_pct: 0.875 +mixup: 0.8 +cutmix: 1.0 + +# model +model: "pit_xs" +num_classes: 1000 +pretrained: False +ckpt_path: "" +keep_checkpoint_max: 10 +ckpt_save_dir: "./ckpt" +epoch_size: 600 +drop_path_rate: 0.1 +dataset_sink_mode: True +amp_level: "O2" + +# loss +loss: "ce" +label_smoothing: 0.1 + +# lr scheduler +scheduler: "cosine_decay" +lr: 0.001 +min_lr: 0.00001 +decay_epochs: 590 +warmup_epochs: 10 + +# optimizer +opt: "adamw" +filter_bias_and_bn: True +momentum: 0.9 +weight_decay: 0.05 +loss_scale_type: "dynamic" +drop_overflow_update: True +use_nesterov: False diff --git a/configs/poolformer/README.md b/configs/poolformer/README.md new file mode 100644 index 0000000..ab0a233 --- /dev/null +++ b/configs/poolformer/README.md @@ -0,0 +1,83 @@ +# PoolFormer + +> [MetaFormer Is Actually What You Need for Vision](https://arxiv.org/abs/2111.11418) + +## Introduction + +Instead of designing complicated token mixer to achieve SOTA performance, the target of this work is to demonstrate the competence of Transformer models largely stem from the general architecture MetaFormer. Pooling/PoolFormer are just the tools to support the authors' claim. + +![MetaFormer](https://user-images.githubusercontent.com/74176172/210046827-c218f5d3-1ee8-47bf-a78a-482d821ece89.png) +Figure 1. MetaFormer and performance of MetaFormer-based models on ImageNet-1K validation set. The authors argue that the competence of Transformer/MLP-like models primarily stem from the general architecture MetaFormer instead of the equipped specific token mixers. To demonstrate this, they exploit an embarrassingly simple non-parametric operator, pooling, to conduct extremely basic token mixing. Surprisingly, the resulted model PoolFormer consistently outperforms the DeiT and ResMLP as shown in (b), which well supports that MetaFormer is actually what we need to achieve competitive performance. RSB-ResNet in (b) means the results are from “ResNet Strikes Back” where ResNet is trained with improved training procedure for 300 epochs. + +![PoolFormer](https://user-images.githubusercontent.com/74176172/210046845-6caa1574-b6a4-47f3-8298-c8ca3b4f8fa4.png) +Figure 2. (a) The overall framework of PoolFormer. (b) The architecture of PoolFormer block. Compared with Transformer block, it replaces attention with an extremely simple non-parametric operator, pooling, to conduct only basic token mixing.[[1](#References)] + +## Results + +Our reproduced model performance on ImageNet-1K is reported as follows. + +
+ +| Model | Context | Top-1 (%) | Top-5 (%) | Params (M) | Recipe | Download | +|:--------------:|:--------:|:---------:|:---------:|:----------:|---------------------------------------------------------------------------------------------------------|--------------------------------------------------------------------------------------------------| +| poolformer_s12 | D910x8-G | 77.33 | 93.34 | 11.92 | [yaml](https://github.com/mindspore-lab/mindcv/blob/main/configs/poolformer/poolformer_s12_ascend.yaml) | [weights](https://download.mindspore.cn/toolkits/mindcv/poolformer/poolformer_s12-5be5c4e4.ckpt) | + +
+ +#### Notes + +- Context: Training context denoted as {device}x{pieces}-{MS mode}, where mindspore mode can be G - graph mode or F - pynative mode with ms function. For example, D910x8-G is for training on 8 pieces of Ascend 910 NPU using graph mode. +- Top-1 and Top-5: Accuracy reported on the validation set of ImageNet-1K. + +## Quick Start + +### Preparation + +#### Installation + +Please refer to the [installation instruction](https://github.com/mindspore-lab/mindcv#installation) in MindCV. + +#### Dataset Preparation + +Please download the [ImageNet-1K](https://www.image-net.org/challenges/LSVRC/2012/index.php) dataset for model training and validation. + +### Training + +- Distributed Training + +It is easy to reproduce the reported results with the pre-defined training recipe. For distributed training on multiple Ascend 910 devices, please run + +```shell +# distributed training on multiple GPU/Ascend devices +mpirun -n 8 python train.py --config configs/poolformer/poolformer_s12_ascend.yaml --data_dir /path/to/imagenet +``` +> If the script is executed by the root user, the `--allow-run-as-root` parameter must be added to `mpirun`. + +Similarly, you can train the model on multiple GPU devices with the above `mpirun` command. + +For detailed illustration of all hyper-parameters, please refer to [config.py](https://github.com/mindspore-lab/mindcv/blob/main/config.py). + +**Note:** As the global batch size (batch_size x num_devices) is an important hyper-parameter, it is recommended to keep the global batch size unchanged for reproduction or adjust the learning rate linearly to a new global batch size. + +* Standalone Training + +If you want to train or finetune the model on a smaller dataset without distributed training, please run: + +```shell +# standalone training on a CPU/GPU/Ascend device +python train.py --config configs/poolformer/poolformer_s12_ascend.yaml --data_dir /path/to/imagenet --distribute False +``` + +### Validation + +``` +validation of poolformer has to be done in amp O3 mode which is not supported, coming soon... +``` + +### Deployment + +To deploy online inference services with the trained model efficiently, please refer to the [deployment tutorial](https://github.com/mindspore-lab/mindcv/blob/main/tutorials/deployment.md). + +## References + +[1]. Yu W, Luo M, Zhou P, et al. Metaformer is actually what you need for vision[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2022: 10819-10829. 
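+
+To tie this back to the introduction of this model: the PoolFormer token mixer is just average pooling with the identity subtracted. Below is a minimal NumPy sketch of that operator on a (C, H, W) feature map, using edge padding and a plain loop for brevity; it is an illustration of the idea, not the MindCV implementation.
+
+```python
+import numpy as np
+
+def pool_mixer(x: np.ndarray) -> np.ndarray:
+    """PoolFormer-style token mixer: 3x3 average pooling minus identity."""
+    c, h, w = x.shape
+    padded = np.pad(x, ((0, 0), (1, 1), (1, 1)), mode="edge")
+    pooled = np.zeros_like(x)
+    for dy in range(3):              # sum the 9 shifted views of each 3x3 window
+        for dx in range(3):
+            pooled += padded[:, dy:dy + h, dx:dx + w]
+    return pooled / 9.0 - x          # subtracting x leaves only the "mixing" part
+
+x = np.random.rand(64, 56, 56)
+print(pool_mixer(x).shape)  # (64, 56, 56)
+```
+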
diff --git a/configs/poolformer/poolformer_s12_ascend.yaml b/configs/poolformer/poolformer_s12_ascend.yaml new file mode 100644 index 0000000..470ce8b --- /dev/null +++ b/configs/poolformer/poolformer_s12_ascend.yaml @@ -0,0 +1,61 @@ +# system +mode: 0 +distribute: True +num_parallel_workers: 8 +val_while_train: True + +# dataset +dataset: 'imagenet' +data_dir: '/path/to/imagenet' +shuffle: True +dataset_download: False +batch_size: 128 +drop_remainder: True + +# augmentation +image_resize: 224 +scale: [0.08, 1.0] +ratio: [0.75, 1.333] +hflip: 0.5 +vflip: 0.0 +interpolation: 'bilinear' +crop_pct: 0.9 +color_jitter: [0.4, 0.4, 0.4] +re_prob: 0.25 +mixup: 0.8 +cutmix: 1.0 +cutmix_prob: 1.0 +auto_augment: 'randaug-m9-mstd0.5-inc1' + +# model +model: 'poolformer_s12' +drop_rate: 0.0 +drop_path_rate: 0.1 +num_classes: 1000 +pretrained: False +ckpt_path: '' +keep_checkpoint_max: 10 +ckpt_save_dir: './ckpt' +epoch_size: 600 +dataset_sink_mode: True +amp_level: 'O3' + +# loss +loss: 'CE' +label_smoothing: 0.1 + +# lr scheduler +scheduler: 'cosine_decay' +lr: 0.0005 +min_lr: 1e-06 +warmup_epochs: 30 +decay_epochs: 570 +decay_rate: 0.1 + +# optimizer +opt: 'AdamW' +filter_bias_and_bn: True +momentum: 0.9 +weight_decay: 0.05 +loss_scale: 1024 +use_nesterov: False diff --git a/configs/pvt/README.md b/configs/pvt/README.md new file mode 100644 index 0000000..fa3c1fc --- /dev/null +++ b/configs/pvt/README.md @@ -0,0 +1,84 @@ +# Pyramid Vision Transformer + +> [Pyramid Vision Transformer: A Versatile Backbone for Dense Prediction without Convolutions](https://arxiv.org/abs/2102.12122) + +## Introduction + +PVT is a general backbone network for dense prediction without convolution operation. PVT introduces a pyramid structure in Transformer to generate multi-scale feature maps for dense prediction tasks. PVT uses a gradual reduction strategy to control the size of the feature maps through the patch embedding layer, and proposes a spatial reduction attention (SRA) layer to replace the traditional multi head attention layer in the encoder, which greatly reduces the computing/memory overhead.[[1](#References)] + +![PVT](https://user-images.githubusercontent.com/74176172/210046926-2322161b-a963-4603-b3cb-86ecdca41262.png) + +## Results + +Our reproduced model performance on ImageNet-1K is reported as follows. + +
+ +| Model | Context | Top-1 (%) | Top-5 (%) | Params (M) | Recipe | Download | +|:----------:|:--------:|:---------:|:---------:|:----------:|----------------------------------------------------------------------------------------------|----------------------------------------------------------------------------------| +| PVT_tiny | D910x8-G | 74.81 | 92.18 | 13.23 | [yaml](https://github.com/mindspore-lab/mindcv/blob/main/configs/pvt/pvt_tiny_ascend.yaml) | [weights](https://download.mindspore.cn/toolkits/mindcv/pvt/pvt_tiny-6abb953d.ckpt) | +| PVT_small | D910x8-G | 79.66 | 94.71 | 24.49 | [yaml](https://github.com/mindspore-lab/mindcv/blob/main/configs/pvt/pvt_small_ascend.yaml) | [weights](https://download.mindspore.cn/toolkits/mindcv/pvt/pvt_small-213c2ed1.ckpt) | +| PVT_medium | D910x8-G | 81.82 | 95.81 | 44.21 | [yaml](https://github.com/mindspore-lab/mindcv/blob/main/configs/pvt/pvt_medium_ascend.yaml) | [weights](https://download.mindspore.cn/toolkits/mindcv/pvt/pvt_medium-469e6802.ckpt) | +| PVT_large | D910x8-G | 81.75 | 95.70 | 61.36 | [yaml](https://github.com/mindspore-lab/mindcv/blob/main/configs/pvt/pvt_large_ascend.yaml) | [weights](https://download.mindspore.cn/toolkits/mindcv/pvt/pvt_large-bb6895d7.ckpt) | + +
+ +#### Notes + +- Context: Training context denoted as {device}x{pieces}-{MS mode}, where mindspore mode can be G - graph mode or F - pynative mode with ms function. For example, D910x8-G is for training on 8 pieces of Ascend 910 NPU using graph mode. +- Top-1 and Top-5: Accuracy reported on the validation set of ImageNet-1K. + +## Quick Start + +### Preparation + +#### Installation + +Please refer to the [installation instruction](https://github.com/mindspore-lab/mindcv#installation) in MindCV. + +#### Dataset Preparation + +Please download the [ImageNet-1K](https://www.image-net.org/challenges/LSVRC/2012/index.php) dataset for model training and validation. + +### Training + +- Distributed Training + +It is easy to reproduce the reported results with the pre-defined training recipe. For distributed training on multiple Ascend 910 devices, please run + +```shell +# distributed training on multiple GPU/Ascend devices +mpirun -n 8 python train.py --config configs/pvt/pvt_tiny_ascend.yaml --data_dir /path/to/imagenet +``` +> If the script is executed by the root user, the `--allow-run-as-root` parameter must be added to `mpirun`. + +Similarly, you can train the model on multiple GPU devices with the above `mpirun` command. + +For detailed illustration of all hyper-parameters, please refer to [config.py](https://github.com/mindspore-lab/mindcv/blob/main/config.py). + +**Note:** As the global batch size (batch_size x num_devices) is an important hyper-parameter, it is recommended to keep the global batch size unchanged for reproduction or adjust the learning rate linearly to a new global batch size. + +* Standalone Training + +If you want to train or finetune the model on a smaller dataset without distributed training, please run: + +```shell +# standalone training on a CPU/GPU/Ascend device +python train.py --config configs/pvt/pvt_tiny_ascend.yaml --data_dir /path/to/imagenet --distribute False +``` + +### Validation + +To validate the accuracy of the trained model, you can use `validate.py` and parse the checkpoint path with `--ckpt_path`. + +```shell +python validate.py --model=pvt_tiny --data_dir /path/to/imagenet --ckpt_path /path/to/ckpt +``` + +### Deployment + +To deploy online inference services with the trained model efficiently, please refer to the [deployment tutorial](https://github.com/mindspore-lab/mindcv/blob/main/tutorials/deployment.md). + +## References + +[1]. Wang W, Xie E, Li X, et al. Pyramid vision transformer: A versatile backbone for dense prediction without convolutions[C]//Proceedings of the IEEE/CVF International Conference on Computer Vision. 2021: 568-578. 
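+
+As a rough illustration of the spatial-reduction attention (SRA) described in the introduction of this model: the key/value tokens are shrunk by a factor R^2 before standard attention is computed, which is where the compute and memory savings come from. The NumPy sketch below uses average pooling for the reduction and a made-up helper name; the real SRA layer uses a strided convolution followed by layer normalization, with a per-stage reduction ratio.
+
+```python
+import numpy as np
+
+def sra(q: np.ndarray, kv: np.ndarray, h: int, w: int, r: int) -> np.ndarray:
+    """Single-head attention with spatially reduced keys/values.
+
+    q, kv: (H*W, C) token matrices laid out on an h x w grid; r: reduction ratio.
+    """
+    hw, c = kv.shape
+    grid = kv.reshape(h // r, r, w // r, r, c).mean(axis=(1, 3))  # pool the K/V grid
+    kv_red = grid.reshape(-1, c)                                  # (H*W / r^2, C)
+    attn = q @ kv_red.T / np.sqrt(c)                              # (H*W, H*W / r^2)
+    attn = np.exp(attn - attn.max(axis=-1, keepdims=True))
+    attn /= attn.sum(axis=-1, keepdims=True)                      # softmax over the reduced tokens
+    return attn @ kv_red                                          # (H*W, C)
+
+tokens = np.random.rand(56 * 56, 64)
+print(sra(tokens, tokens, 56, 56, r=8).shape)  # (3136, 64)
+```
+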
diff --git a/configs/pvt/pvt_large_ascend.yaml b/configs/pvt/pvt_large_ascend.yaml new file mode 100644 index 0000000..0f9f669 --- /dev/null +++ b/configs/pvt/pvt_large_ascend.yaml @@ -0,0 +1,56 @@ +# system +mode: 0 +distribute: True +num_parallel_workers: 8 + +# dataset +dataset: 'imagenet' +data_dir: '/path/to/imagenet' +shuffle: True +dataset_download: False +batch_size: 128 +drop_remainder: True + +# augmentation +image_resize: 224 +scale: [0.08, 1.0] +ratio: [0.75, 1.333] +hflip: 0.5 +interpolation: 'bicubic' +auto_augment: 'randaug-m9-mstd0.5-inc1' +re_prob: 0.25 +mixup: 0.8 +cutmix: 1.0 +cutmix_prob: 1.0 +crop_pct: 0.9 +color_jitter: 0.0 + +# model +model: 'pvt_large' +num_classes: 1000 +pretrained: False +ckpt_path: '' +keep_checkpoint_max: 10 +ckpt_save_dir: './ckpt' +epoch_size: 400 +dataset_sink_mode: True +amp_level: 'O2' +drop_path_rate: 0.3 + +# loss +loss: 'CE' +label_smoothing: 0.1 + +# lr scheduler +scheduler: 'cosine_decay' +lr: 0.001 +min_lr: 0.000001 +warmup_epochs: 10 +decay_epochs: 390 + +# optimizer +opt: 'adamw' +weight_decay: 0.05 +loss_scale: 300 +filter_bias_and_bn: True +use_nesterov: False diff --git a/configs/pvt/pvt_medium_ascend.yaml b/configs/pvt/pvt_medium_ascend.yaml new file mode 100644 index 0000000..a14856a --- /dev/null +++ b/configs/pvt/pvt_medium_ascend.yaml @@ -0,0 +1,56 @@ +# system +mode: 0 +distribute: True +num_parallel_workers: 8 + +# dataset +dataset: 'imagenet' +data_dir: '/path/to/imagenet' +shuffle: True +dataset_download: False +batch_size: 128 +drop_remainder: True + +# augmentation +image_resize: 224 +scale: [0.08, 1.0] +ratio: [0.75, 1.333] +hflip: 0.5 +interpolation: 'bicubic' +auto_augment: 'randaug-m9-mstd0.5-inc1' +re_prob: 0.25 +mixup: 0.8 +cutmix: 1.0 +cutmix_prob: 1.0 +crop_pct: 0.9 +color_jitter: 0.0 + +# model +model: 'pvt_medium' +num_classes: 1000 +pretrained: False +ckpt_path: '' +keep_checkpoint_max: 10 +ckpt_save_dir: './ckpt' +epoch_size: 400 +dataset_sink_mode: True +amp_level: 'O2' +drop_path_rate: 0.3 + +# loss +loss: 'CE' +label_smoothing: 0.1 + +# lr scheduler +scheduler: 'cosine_decay' +lr: 0.001 +min_lr: 0.000001 +warmup_epochs: 10 +decay_epochs: 390 + +# optimizer +opt: 'adamw' +weight_decay: 0.05 +loss_scale: 1024 +filter_bias_and_bn: True +use_nesterov: False diff --git a/configs/pvt/pvt_small_ascend.yaml b/configs/pvt/pvt_small_ascend.yaml new file mode 100644 index 0000000..db14b59 --- /dev/null +++ b/configs/pvt/pvt_small_ascend.yaml @@ -0,0 +1,56 @@ +# system +mode: 0 +distribute: True +num_parallel_workers: 8 + +# dataset +dataset: 'imagenet' +data_dir: '/path/to/imagenet' +shuffle: True +dataset_download: False +batch_size: 128 +drop_remainder: True + +# augmentation +image_resize: 224 +scale: [0.08, 1.0] +ratio: [0.75, 1.333] +hflip: 0.5 +interpolation: 'bicubic' +auto_augment: 'randaug-m9-mstd0.5-inc1' +re_prob: 0.25 +mixup: 0.8 +cutmix: 1.0 +cutmix_prob: 1.0 +crop_pct: 0.9 +color_jitter: 0.0 + +# model +model: 'pvt_small' +num_classes: 1000 +pretrained: False +ckpt_path: '' +keep_checkpoint_max: 10 +ckpt_save_dir: './ckpt' +epoch_size: 500 +dataset_sink_mode: True +amp_level: 'O2' +drop_path_rate: 0.1 + +# loss +loss: 'CE' +label_smoothing: 0.1 + +# lr scheduler +scheduler: 'cosine_decay' +lr: 0.0005 +min_lr: 0.000001 +warmup_epochs: 10 +decay_epochs: 390 + +# optimizer +opt: 'adamw' +weight_decay: 0.05 +loss_scale: 1024 +filter_bias_and_bn: True +use_nesterov: False diff --git a/configs/pvt/pvt_tiny_ascend.yaml b/configs/pvt/pvt_tiny_ascend.yaml new file mode 100644 index 0000000..929e8d5 --- 
/dev/null +++ b/configs/pvt/pvt_tiny_ascend.yaml @@ -0,0 +1,56 @@ +# system +mode: 0 +distribute: True +num_parallel_workers: 8 + +# dataset +dataset: 'imagenet' +data_dir: '/path/to/imagenet' +shuffle: True +dataset_download: False +batch_size: 128 +drop_remainder: True + +# augmentation +image_resize: 224 +scale: [0.08, 1.0] +ratio: [0.75, 1.333] +hflip: 0.5 +interpolation: 'bicubic' +auto_augment: 'randaug-m9-mstd0.5-inc1' +re_prob: 0.25 +mixup: 0.8 +cutmix: 1.0 +cutmix_prob: 1.0 +crop_pct: 0.9 +color_jitter: 0.0 + +# model +model: 'pvt_tiny' +num_classes: 1000 +pretrained: False +ckpt_path: '' +keep_checkpoint_max: 10 +ckpt_save_dir: './ckpt' +epoch_size: 450 +dataset_sink_mode: True +amp_level: 'O2' +drop_path_rate: 0.1 + +# loss +loss: 'CE' +label_smoothing: 0.1 + +# lr scheduler +scheduler: 'cosine_decay' +lr: 0.0005 +min_lr: 0.000001 +warmup_epochs: 10 +decay_epochs: 440 + +# optimizer +opt: 'adamw' +weight_decay: 0.05 +loss_scale: 1024 +filter_bias_and_bn: True +use_nesterov: False diff --git a/configs/pvt_v2/README.md b/configs/pvt_v2/README.md new file mode 100644 index 0000000..e52f477 --- /dev/null +++ b/configs/pvt_v2/README.md @@ -0,0 +1,90 @@ +# PVTV2 +> [PVT v2: Improved Baselines with Pyramid Vision Transformer](https://arxiv.org/abs/2106.13797) + +## Introduction + +In this work, the authors present new baselines by improving the original Pyramid Vision Transformer (PVT v1) by adding +three designs, including (1) linear complexity attention layer, (2) overlapping patch embedding, and (3) convolutional +feed-forward network. With these modifications, PVT v2 reduces the computational complexity of PVT v1 to linear and +achieves significant improvements on fundamental vision tasks such as classification, detection, and +segmentation.[[1](#references)] + +
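+
+Of the three additions listed above, overlapping patch embedding is the simplest to picture: patches are cut with a stride smaller than the window so that neighbouring patches share pixels. The NumPy sketch below uses a 7x7 window with stride 4 as described in the paper; the actual layer is a strided convolution followed by layer normalization, so treat this as an illustration of the idea only.
+
+```python
+import numpy as np
+
+def overlapping_patches(img: np.ndarray, patch: int = 7, stride: int = 4) -> np.ndarray:
+    """Cut overlapping patches from a (C, H, W) image into (num_patches, C*patch*patch)."""
+    c, h, w = img.shape
+    pad = patch // 2
+    x = np.pad(img, ((0, 0), (pad, pad), (pad, pad)))   # zero padding keeps an (H/stride) x (W/stride) grid
+    rows = []
+    for y in range(0, h, stride):
+        for xx in range(0, w, stride):
+            rows.append(x[:, y:y + patch, xx:xx + patch].ravel())  # adjacent windows overlap by patch - stride pixels
+    return np.stack(rows)
+
+img = np.random.rand(3, 224, 224)
+print(overlapping_patches(img).shape)  # (56 * 56, 3 * 7 * 7) = (3136, 147)
+```
+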

+ Figure 1. Architecture of PVTv2 [1]

+ +## Results + +Our reproduced model performance on ImageNet-1K is reported as follows. + +
+ +| Model | Context | Top-1 (%) | Top-5 (%) | Params (M) | Recipe | Download | +|----------|----------|-----------|-----------|------------|------------------------------------------------------------------------------------------------|-----------------------------------------------------------------------------------------| +| PVTV2_b0 | D910x8-G | 71.50 | 90.60 | 3.67 | [yaml](https://github.com/mindspore-lab/mindcv/blob/main/configs/pvt_v2/pvt_v2_b0_ascend.yaml) | [weights](https://download.mindspore.cn/toolkits/mindcv/pvt_v2/pvt_v2_b0-1c4f6683.ckpt) | +| PVTV2_b1 | D910x8-G | 78.91 | 94.49 | 14.01 | [yaml](https://github.com/mindspore-lab/mindcv/blob/main/configs/pvt_v2/pvt_v2_b1_ascend.yaml) | [weights](https://download.mindspore.cn/toolkits/mindcv/pvt_v2/pvt_v2_b1-3ceb171a.ckpt) | +| PVTV2_b2 | D910x8-G | 81.99 | 95.74 | 25.35 | [yaml](https://github.com/mindspore-lab/mindcv/blob/main/configs/pvt_v2/pvt_v2_b2_ascend.yaml) | [weights](https://download.mindspore.cn/toolkits/mindcv/pvt_v2/pvt_v2_b2-0565d18e.ckpt) | + +
+ +#### Notes + +- Context: Training context denoted as {device}x{pieces}-{MS mode}, where mindspore mode can be G - graph mode or F - pynative mode with ms function. For example, D910x8-G is for training on 8 pieces of Ascend 910 NPU using graph mode. +- Top-1 and Top-5: Accuracy reported on the validation set of ImageNet-1K. + +## Quick Start + +### Preparation + +#### Installation +Please refer to the [installation instruction](https://github.com/mindspore-ecosystem/mindcv#installation) in MindCV. + +#### Dataset Preparation +Please download the [ImageNet-1K](https://www.image-net.org/challenges/LSVRC/2012/index.php) dataset for model training and validation. + +### Training + +* Distributed Training + +It is easy to reproduce the reported results with the pre-defined training recipe. For distributed training on multiple Ascend 910 devices, please run + +```shell +# distrubted training on multiple GPU/Ascend devices +mpirun -n 8 python train.py --config configs/pvt_v2/pvt_v2_b0_ascend.yaml --data_dir /path/to/imagenet +``` + +> If the script is executed by the root user, the `--allow-run-as-root` parameter must be added to `mpirun`. + +Similarly, you can train the model on multiple GPU devices with the above `mpirun` command. + +For detailed illustration of all hyper-parameters, please refer to [config.py](https://github.com/mindspore-lab/mindcv/blob/main/config.py). + +**Note:** As the global batch size (batch_size x num_devices) is an important hyper-parameter, it is recommended to keep the global batch size unchanged for reproduction or adjust the learning rate linearly to a new global batch size. + +* Standalone Training + +If you want to train or finetune the model on a smaller dataset without distributed training, please run: + +```shell +# standalone training on a CPU/GPU/Ascend device +python train.py --config configs/pvt_v2/pvt_v2_b0_ascend.yaml --data_dir /path/to/dataset --distribute False +``` + +### Validation + +To validate the accuracy of the trained model, you can use `validate.py` and parse the checkpoint path with `--ckpt_path`. + +```shell +python validate.py -c configs/pvt_v2/pvt_v2_b0_ascend.yaml --data_dir /path/to/imagenet --ckpt_path /path/to/ckpt +``` + +### Deployment + +Please refer to the [deployment tutorial](https://github.com/mindspore-lab/mindcv/blob/main/tutorials/deployment.md) in MindCV. + +## References + +[1] Wang W, Xie E, Li X, et al. Pvt v2: Improved baselines with pyramid vision transformer[J]. Computational Visual Media, 2022, 8(3): 415-424. 
diff --git a/configs/pvt_v2/pvt_v2_b0_ascend.yaml b/configs/pvt_v2/pvt_v2_b0_ascend.yaml new file mode 100644 index 0000000..3926992 --- /dev/null +++ b/configs/pvt_v2/pvt_v2_b0_ascend.yaml @@ -0,0 +1,59 @@ +# system +mode: 0 +distribute: True +num_parallel_workers: 8 +val_while_train: True + +# dataset +dataset: "imagenet" +data_dir: "/path/to/imagenet" +shuffle: True +dataset_download: False +batch_size: 128 +drop_remainder: True + +# augmentation +image_resize: 224 +scale: [0.08, 1.0] +ratio: [0.75, 1.333] +re_value: "random" +hflip: 0.5 +interpolation: "bicubic" +auto_augment: randaug-m9-mstd0.5-inc1 +re_prob: 0.25 +crop_pct: 0.9 +mixup: 0.8 +cutmix: 1.0 + +# model +model: "pvt_v2_b0" +num_classes: 1000 +pretrained: False +ckpt_path: "" +keep_checkpoint_max: 10 +ckpt_save_dir: "./ckpt" +epoch_size: 500 +drop_path_rate: 0.1 +dataset_sink_mode: True +amp_level: "O2" + +# loss +loss: "ce" +label_smoothing: 0.1 + +# lr scheduler +scheduler: "cosine_decay" +lr: 0.001 +min_lr: 0.00001 +lr_epoch_stair: True +decay_epochs: 490 +warmup_epochs: 10 + +# optimizer +opt: "adamw" +filter_bias_and_bn: True +momentum: 0.9 +weight_decay: 0.05 +loss_scale_type: "dynamic" +drop_overflow_update: True +use_nesterov: False diff --git a/configs/pvt_v2/pvt_v2_b1_ascend.yaml b/configs/pvt_v2/pvt_v2_b1_ascend.yaml new file mode 100644 index 0000000..559c18c --- /dev/null +++ b/configs/pvt_v2/pvt_v2_b1_ascend.yaml @@ -0,0 +1,59 @@ +# system +mode: 0 +distribute: True +num_parallel_workers: 8 +val_while_train: True + +# dataset +dataset: "imagenet" +data_dir: "/path/to/imagenet" +shuffle: True +dataset_download: False +batch_size: 128 +drop_remainder: True + +# augmentation +image_resize: 224 +scale: [0.08, 1.0] +ratio: [0.75, 1.333] +re_value: "random" +hflip: 0.5 +interpolation: "bicubic" +auto_augment: "randaug-m9-mstd0.5-inc1" +re_prob: 0.25 +crop_pct: 0.9 +mixup: 0.8 +cutmix: 1.0 + +# model +model: "pvt_v2_b1" +num_classes: 1000 +pretrained: False +ckpt_path: "" +keep_checkpoint_max: 10 +ckpt_save_dir: "./ckpt" +epoch_size: 500 +drop_path_rate: 0.1 +dataset_sink_mode: True +amp_level: "O2" + +# loss +loss: "ce" +label_smoothing: 0.1 + +# lr scheduler +scheduler: "cosine_decay" +lr: 0.001 +min_lr: 0.00001 +lr_epoch_stair: True +decay_epochs: 490 +warmup_epochs: 10 + +# optimizer +opt: "adamw" +filter_bias_and_bn: True +momentum: 0.9 +weight_decay: 0.05 +loss_scale_type: "dynamic" +drop_overflow_update: True +use_nesterov: False diff --git a/configs/pvt_v2/pvt_v2_b2_ascend.yaml b/configs/pvt_v2/pvt_v2_b2_ascend.yaml new file mode 100644 index 0000000..3945eb0 --- /dev/null +++ b/configs/pvt_v2/pvt_v2_b2_ascend.yaml @@ -0,0 +1,58 @@ +# system +mode: 0 +distribute: True +num_parallel_workers: 8 +val_while_train: True + +# dataset +dataset: "imagenet" +data_dir: "/path/to/imagenet" +shuffle: True +dataset_download: False +batch_size: 128 +drop_remainder: True + +# augmentation +image_resize: 224 +scale: [0.08, 1.0] +ratio: [0.75, 1.333] +re_value: "random" +hflip: 0.5 +interpolation: "bicubic" +auto_augment: "randaug-m9-mstd0.5-inc1" +re_prob: 0.25 +crop_pct: 0.9 +mixup: 0.8 +cutmix: 1.0 + +# model +model: "pvt_v2_b2" +num_classes: 1000 +pretrained: False +ckpt_path: "" +keep_checkpoint_max: 10 +ckpt_save_dir: "./ckpt" +epoch_size: 550 +drop_path_rate: 0.2 +dataset_sink_mode: True +amp_level: "O2" + +# loss +loss: "ce" +label_smoothing: 0.1 + +# lr scheduler +scheduler: "cosine_decay" +lr: 0.001 +min_lr: 0.000001 +decay_epochs: 530 +warmup_epochs: 20 + +# optimizer +opt: "adamw" +filter_bias_and_bn: True 
+momentum: 0.9 +weight_decay: 0.05 +loss_scale_type: "dynamic" +drop_overflow_update: True +use_nesterov: False diff --git a/configs/regnet/README.md b/configs/regnet/README.md new file mode 100644 index 0000000..f8dc9ac --- /dev/null +++ b/configs/regnet/README.md @@ -0,0 +1,81 @@ +# RegNet + +> [Designing Network Design Spaces](https://arxiv.org/abs/2003.13678) + +## Introduction + +In this work, the authors present a new network design paradigm that combines the advantages of manual design and NAS. Instead of focusing on designing individual network instances, they design design spaces that parametrize populations of networks. Like in manual design, the authors aim for interpretability and to discover general design principles that describe networks that are simple, work well, and generalize across settings. Like in NAS, the authors aim to take advantage of semi-automated procedures to help achieve these goals The general strategy they adopt is to progressively design simplified versions of an initial, relatively unconstrained, design space while maintaining or improving its quality. The overall process is analogous to manual design, elevated to the population level and guided via distribution estimates of network design spaces. As a testbed for this paradigm, their focus is on exploring network structure (e.g., width, depth, groups, etc.) assuming standard model families including VGG, ResNet, and ResNeXt. The authors start with a relatively unconstrained design space they call AnyNet (e.g., widths and depths vary freely across stages) and apply their humanin-the-loop methodology to arrive at a low-dimensional design space consisting of simple “regular” networks, that they call RegNet. The core of the RegNet design space is simple: stage widths and depths are determined by a quantized linear function. Compared to AnyNet, the RegNet design space has simpler models, is easier to interpret, and has a higher concentration of good models.[[1](#References)] + +![RegNet](https://user-images.githubusercontent.com/74176172/210046899-4e83bb56-f7f6-49b2-9dde-dce200428e92.png) + +## Results + +Our reproduced model performance on ImageNet-1K is reported as follows. + +
+ +| Model | Context | Top-1 (%) | Top-5 (%) | Params (M) | Recipe | Download | +|:--------------:|:--------:|:---------:|:---------:|:----------:|-----------------------------------------------------------------------------------------------------|----------------------------------------------------------------------------------------------| +| regnet_x_800mf | D910x8-G | 76.04 | 92.97 | 7.26 | [yaml](https://github.com/mindspore-lab/mindcv/blob/main/configs/regnet/regnet_x_800mf_ascend.yaml) | [weights](https://download.mindspore.cn/toolkits/mindcv/regnet/regnet_x_800mf-617227f4.ckpt) | + +
+ +#### Notes + +- Context: Training context denoted as {device}x{pieces}-{MS mode}, where mindspore mode can be G - graph mode or F - pynative mode with ms function. For example, D910x8-G is for training on 8 pieces of Ascend 910 NPU using graph mode. +- Top-1 and Top-5: Accuracy reported on the validation set of ImageNet-1K. + +## Quick Start + +### Preparation + +#### Installation + +Please refer to the [installation instruction](https://github.com/mindspore-lab/mindcv#installation) in MindCV. + +#### Dataset Preparation + +Please download the [ImageNet-1K](https://www.image-net.org/challenges/LSVRC/2012/index.php) dataset for model training and validation. + +### Training + +- Distributed Training + +It is easy to reproduce the reported results with the pre-defined training recipe. For distributed training on multiple Ascend 910 devices, please run + +```shell +# distributed training on multiple GPU/Ascend devices +mpirun -n 8 python train.py --config configs/regnet/regnet_x_800mf_ascend.yaml --data_dir /path/to/imagenet +``` +> If the script is executed by the root user, the `--allow-run-as-root` parameter must be added to `mpirun`. + +Similarly, you can train the model on multiple GPU devices with the above `mpirun` command. + +For detailed illustration of all hyper-parameters, please refer to [config.py](https://github.com/mindspore-lab/mindcv/blob/main/config.py). + +**Note:** As the global batch size (batch_size x num_devices) is an important hyper-parameter, it is recommended to keep the global batch size unchanged for reproduction or adjust the learning rate linearly to a new global batch size. + +* Standalone Training + +If you want to train or finetune the model on a smaller dataset without distributed training, please run: + +```shell +# standalone training on a CPU/GPU/Ascend device +python train.py --config configs/regnet/regnet_x_800mf_ascend.yaml --data_dir /path/to/imagenet --distribute False +``` + +### Validation + +To validate the accuracy of the trained model, you can use `validate.py` and parse the checkpoint path with `--ckpt_path`. + +```shell +python validate.py --model=regnet_x_800mf --data_dir /path/to/imagenet --ckpt_path /path/to/ckpt +``` + +### Deployment + +To deploy online inference services with the trained model efficiently, please refer to the [deployment tutorial](https://github.com/mindspore-lab/mindcv/blob/main/tutorials/deployment.md). + +## References + +[1]. Radosavovic I, Kosaraju R P, Girshick R, et al. Designing network design spaces[C]//Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. 2020: 10428-10436. 
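+
+The "quantized linear function" mentioned in the introduction of this model can be written down directly. The sketch below follows the parameterization from the paper: per-block widths u_j = w_0 + w_a * j are snapped to the nearest power of w_m times w_0, then rounded to a multiple of 8. The parameter values shown are illustrative only and are not claimed to be the ones behind regnet_x_800mf.
+
+```python
+import numpy as np
+
+def regnet_widths(w_0: float, w_a: float, w_m: float, depth: int) -> np.ndarray:
+    """Per-block widths from RegNet's quantized linear rule."""
+    u = w_0 + w_a * np.arange(depth)               # linear widths u_j = w_0 + w_a * j
+    s = np.round(np.log(u / w_0) / np.log(w_m))    # quantize to powers of w_m
+    widths = w_0 * np.power(w_m, s)
+    return (np.round(widths / 8) * 8).astype(int)  # round to a multiple of 8
+
+# Illustrative parameters: widths stay constant within a stage and jump between stages.
+print(regnet_widths(w_0=56, w_a=36, w_m=2.5, depth=16))
+```
+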
diff --git a/configs/regnet/regnet_x_800mf_ascend.yaml b/configs/regnet/regnet_x_800mf_ascend.yaml new file mode 100644 index 0000000..da384bf --- /dev/null +++ b/configs/regnet/regnet_x_800mf_ascend.yaml @@ -0,0 +1,55 @@ +# system +mode: 0 +distribute: True +val_while_train: True +val_interval: 1 +log_interval: 100 + +# dataset +dataset: 'imagenet' +data_dir: '/path/to/imagenet' +shuffle: True +num_parallel_workers: 8 +batch_size: 64 + +# augmentation +image_resize: 224 +scale: [0.08, 1.0] +ratio: [0.75, 1.333] +hflip: 0.5 +vflip: 0.0 +interpolation: 'bilinear' +color_jitter: 0.4 +re_prob: 0.1 + +# model +model: 'regnet_x_800mf' +num_classes: 1000 +keep_checkpoint_max: 10 +ckpt_save_dir: './ckpt' +ckpt_save_interval: 1 +ckpt_save_policy: 'latest_k' +epoch_size: 200 +dataset_sink_mode: True +amp_level: 'O3' + +# loss +loss: 'CE' +label_smoothing: 0.1 + +# lr scheduler +scheduler: 'cosine_decay' +min_lr: 0.0 +lr: 0.1 +warmup_epochs: 5 +warmup_factor: 0.01 +decay_epochs: 195 +lr_epoch_stair: True + +# optimizer +opt: 'momentum' +momentum: 0.9 +weight_decay: 0.0001 +loss_scale: 128 +use_nesterov: False +filter_bias_and_bn: True diff --git a/configs/repmlp/README.md b/configs/repmlp/README.md new file mode 100644 index 0000000..9663833 --- /dev/null +++ b/configs/repmlp/README.md @@ -0,0 +1,91 @@ +# RepMLPNet + +> [RepMLPNet: Hierarchical Vision MLP with Re-parameterized Locality](https://arxiv.org/abs/2112.11081) + +## Introduction + +Compared to convolutional layers, fully-connected (FC) layers are better at modeling the long-range dependencies +but worse at capturing the local patterns, hence usually less favored for image recognition. In this paper, the authors propose a +methodology, Locality Injection, to incorporate local priors into an FC layer via merging the trained parameters of a +parallel conv kernel into the FC kernel. Locality Injection can be viewed as a novel Structural Re-parameterization +method since it equivalently converts the structures via transforming the parameters. Based on that, the authors propose a +multi-layer-perceptron (MLP) block named RepMLP Block, which uses three FC layers to extract features, and a novel +architecture named RepMLPNet. The hierarchical design distinguishes RepMLPNet from the other concurrently proposed vision MLPs. +As it produces feature maps of different levels, it qualifies as a backbone model for downstream tasks like semantic segmentation. +Their results reveal that 1) Locality Injection is a general methodology for MLP models; 2) RepMLPNet has favorable accuracy-efficiency +trade-off compared to the other MLPs; 3) RepMLPNet is the first MLP that seamlessly transfer to Cityscapes semantic segmentation. + +![RepMLP](https://user-images.githubusercontent.com/74176172/210046952-c4f05321-76e9-4d7a-b419-df91aac64cdf.png) +Figure 1. RepMLP Block.[[1](#References)] + +## Results + +Our reproduced model performance on ImageNet-1K is reported as follows. + +
+ +| Model | Context | Top-1 (%) | Top-5 (%) | Params (M) | Recipe | Download | +|:-----------:|:--------:|:---------:|:---------:|:----------:|--------------------------------------------------------------------------------------------------|----------------------------------------------------------------------------------------------| +| repmlp_t224 | D910x8-G | 76.68 | 93.30 | 38.30 | [yaml](https://github.com/mindspore-lab/mindcv/blob/main/configs/repmlp/repmlp_t224_ascend.yaml) | [weights](https://download.mindspore.cn/toolkits/mindcv/repmlp/repmlp_t224-8dbedd00.ckpt) | + +
+ +#### Notes + +- Context: Training context denoted as {device}x{pieces}-{MS mode}, where mindspore mode can be G - graph mode or F - pynative mode with ms function. For example, D910x8-G is for training on 8 pieces of Ascend 910 NPU using graph mode. +- Top-1 and Top-5: Accuracy reported on the validation set of ImageNet-1K. + +## Quick Start + +### Preparation + +#### Installation + +Please refer to the [installation instruction](https://github.com/mindspore-lab/mindcv#installation) in MindCV. + +#### Dataset Preparation + +Please download the [ImageNet-1K](https://www.image-net.org/challenges/LSVRC/2012/index.php) dataset for model training and validation. + +### Training + +- Distributed Training + +It is easy to reproduce the reported results with the pre-defined training recipe. For distributed training on multiple Ascend 910 devices, please run + +```shell +# distributed training on multiple GPU/Ascend devices +mpirun -n 8 python train.py --config configs/repmlp/repmlp_t224_ascend.yaml --data_dir /path/to/imagenet +``` +> If the script is executed by the root user, the `--allow-run-as-root` parameter must be added to `mpirun`. + +Similarly, you can train the model on multiple GPU devices with the above `mpirun` command. + +For detailed illustration of all hyper-parameters, please refer to [config.py](https://github.com/mindspore-lab/mindcv/blob/main/config.py). + +**Note:** As the global batch size (batch_size x num_devices) is an important hyper-parameter, it is recommended to keep the global batch size unchanged for reproduction or adjust the learning rate linearly to a new global batch size. + +* Standalone Training + +If you want to train or finetune the model on a smaller dataset without distributed training, please run: + +```shell +# standalone training on a CPU/GPU/Ascend device +python train.py --config configs/repmlp/repmlp_t224_ascend.yaml --data_dir /path/to/imagenet --distribute False +``` + +### Validation + +To validate the accuracy of the trained model, you can use `validate.py` and parse the checkpoint path with `--ckpt_path`. + +```shell +python validate.py --model=RepMLPNet_T224 --data_dir /path/to/imagenet --ckpt_path /path/to/ckpt +``` + +### Deployment + +To deploy online inference services with the trained model efficiently, please refer to the [deployment tutorial](https://github.com/mindspore-lab/mindcv/blob/main/tutorials/deployment.md). + +## References + +[1]. Ding X, Chen H, Zhang X, et al. Repmlpnet: Hierarchical vision mlp with re-parameterized locality[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2022: 578-587. 
diff --git a/configs/repmlp/repmlp_t224_ascend.yaml b/configs/repmlp/repmlp_t224_ascend.yaml new file mode 100644 index 0000000..17c8d2a --- /dev/null +++ b/configs/repmlp/repmlp_t224_ascend.yaml @@ -0,0 +1,62 @@ +# system +mode: 0 +distribute: True +num_parallel_workers: 8 +val_while_train: True + +# dataset +dataset: 'imagenet' +data_dir: '/path/to/imagenet' +shuffle: True +dataset_download: False +batch_size: 128 +drop_remainder: True + +# augmentation +image_resize: 224 +scale: [0.08, 1.0] +ratio: [0.75, 1.333] +hflip: 0.5 +vflip: 0.0 +interpolation: 'bilinear' +crop_pct: 0.875 +color_jitter: 0.4 +re_prob: 0.25 +re_ratio: [0.3, 3.333] +mixup: 0.2 +cutmix: 1.0 +cutmix_prob: 1.0 +auto_augment: 'randaug-m9-mstd0.5-inc1' + + +# model +model: 'RepMLPNet_T224' +num_classes: 1000 +in_channels: 3 +pretrained: False +ckpt_path: '' +keep_checkpoint_max: 10 +ckpt_save_dir: './ckpt' +epoch_size: 300 +dataset_sink_mode: True +amp_level: 'O2' + +# loss +loss: 'ce' +label_smoothing: 0.1 + +# lr scheduler +scheduler: 'cosine_decay' +lr: 0.005 +min_lr: 1e-5 +warmup_epochs: 10 +decay_epochs: 290 +decay_rate: 0.01 + +# optimizer +opt: 'adamw' +filter_bias_and_bn: True +momentum: 0.9 +weight_decay: 2e-05 +loss_scale: 1024 +use_nesterov: False diff --git a/configs/repvgg/README.md b/configs/repvgg/README.md new file mode 100644 index 0000000..b5cd384 --- /dev/null +++ b/configs/repvgg/README.md @@ -0,0 +1,105 @@ +# RepVGG + +> [RepVGG: Making VGG-style ConvNets Great Again](https://arxiv.org/abs/2101.03697) + +## Introduction + + + +The key idea of Repvgg is that by using re-parameterization, the model architecture could be trained in multi-branch but validated in single branch. +Figure 1 shows the basic model architecture of Repvgg. By utilizing different values for a and b, we could get various repvgg models. +Repvgg could achieve better model performance with smaller model parameters on ImageNet-1K dataset compared with previous methods.[[1](#references)] + +
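+
+The re-parameterization mentioned above boils down to kernel arithmetic: a RepVGG block is trained with three parallel branches (3x3 conv, 1x1 conv, identity) and deployed as a single 3x3 conv whose kernel is the sum of the three. The NumPy sketch below shows that fusion while ignoring the per-branch batch norm, so it is a simplified illustration of the idea rather than the MindCV conversion code.
+
+```python
+import numpy as np
+
+def fuse_repvgg_branches(k3: np.ndarray, k1: np.ndarray, channels: int) -> np.ndarray:
+    """Merge 3x3, 1x1 and identity branches into one 3x3 kernel of shape (out, in, 3, 3)."""
+    fused = k3.copy()
+    fused[:, :, 1:2, 1:2] += k1     # a 1x1 conv is a 3x3 conv that is zero away from the centre
+    for c in range(channels):       # the identity branch is a 3x3 kernel with 1 at the centre of channel c
+        fused[c, c, 1, 1] += 1.0
+    return fused
+
+c = 4
+k3 = np.random.rand(c, c, 3, 3)
+k1 = np.random.rand(c, c, 1, 1)
+print(fuse_repvgg_branches(k3, k1, c).shape)  # (4, 4, 3, 3): one conv replaces three branches at inference time
+```
+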

+ Figure 1. Architecture of RepVGG [1]

+ +## Results + + +Our reproduced model performance on ImageNet-1K is reported as follows. + +
+ +| Model | Context | Top-1 (%) | Top-5 (%) | Params (M) | Recipe | Download | +|-----------|-----------|-----------|-----------|------------|------------------------------------------------------------------------------------------------|--------------------------------------------------------------------| +| repvgg_a0 | D910x8-G | 72.19 | 90.75 | 9.13 | [yaml](https://github.com/mindspore-lab/mindcv/blob/main/configs/repvgg/repvgg_a0_ascend.yaml) | [weights](https://download.mindspore.cn/toolkits/mindcv/repvgg/repvgg_a0-6e71139d.ckpt) | +| repvgg_a1 | D910x8-G | 74.19 | 91.89 | 14.12 | [yaml](https://github.com/mindspore-lab/mindcv/blob/main/configs/repvgg/repvgg_a1_ascend.yaml) | [weights](https://download.mindspore.cn/toolkits/mindcv/repvgg/repvgg_a1-539513ac.ckpt) | +| repvgg_a2 | D910x8-G | 76.63 | 93.42 | 28.25 | [yaml](https://github.com/mindspore-lab/mindcv/blob/main/configs/repvgg/repvgg_a2_ascend.yaml) | [weights](https://download.mindspore.cn/toolkits/mindcv/repvgg/repvgg_a2-cdc90b11.ckpt) | +| repvgg_b0 | D910x8-G | 74.99 | 92.40 | 15.85 | [yaml](https://github.com/mindspore-lab/mindcv/blob/main/configs/repvgg/repvgg_b0_ascend.yaml) | [weights](https://download.mindspore.cn/toolkits/mindcv/repvgg/repvgg_b0-54d5862c.ckpt) | +| repvgg_b1 | D910x8-G | 78.81 | 94.37 | 57.48 | [yaml](https://github.com/mindspore-lab/mindcv/blob/main/configs/repvgg/repvgg_b1_ascend.yaml) | [weights](https://download.mindspore.cn/toolkits/mindcv/repvgg/repvgg_b1-4673797.ckpt) | +| repvgg_b2 | D910x64-G | 79.29 | 94.66 | 89.11 | [yaml](https://github.com/mindspore-lab/mindcv/blob/main/configs/repvgg/repvgg_b2_ascend.yaml) | [weights](https://download.mindspore.cn/toolkits/mindcv/repvgg/repvgg_b2-7c91ccd4.ckpt) | +| repvgg_b3 | D910x64-G | 80.46 | 95.34 | 123.19 | [yaml](https://github.com/mindspore-lab/mindcv/blob/main/configs/repvgg/repvgg_b3_ascend.yaml) | [weights](https://download.mindspore.cn/toolkits/mindcv/repvgg/repvgg_b3-30b35f52.ckpt) | + +
+ +#### Notes + +- Context: Training context denoted as {device}x{pieces}-{MS mode}, where mindspore mode can be G - graph mode or F - pynative mode with ms function. For example, D910x8-G is for training on 8 pieces of Ascend 910 NPU using graph mode. +- Top-1 and Top-5: Accuracy reported on the validation set of ImageNet-1K. + +## Quick Start + +### Preparation + +#### Installation +Please refer to the [installation instruction](https://github.com/mindspore-lab/mindcv#installation) in MindCV. + +#### Dataset Preparation +Please download the [ImageNet-1K](https://www.image-net.org/challenges/LSVRC/2012/index.php) dataset for model training and validation. + +### Training + + +* Distributed Training + +It is easy to reproduce the reported results with the pre-defined training recipe. For distributed training on multiple Ascend 910 devices, please run + +```shell +# distributed training on multiple GPU/Ascend devices +mpirun -n 8 python train.py --config configs/repvgg/repvgg_a1_ascend.yaml --data_dir /path/to/imagenet +``` +> If the script is executed by the root user, the `--allow-run-as-root` parameter must be added to `mpirun`. + +Similarly, you can train the model on multiple GPU devices with the above `mpirun` command. + +For detailed illustration of all hyper-parameters, please refer to [config.py](https://github.com/mindspore-lab/mindcv/blob/main/config.py). + +**Note:** As the global batch size (batch_size x num_devices) is an important hyper-parameter, it is recommended to keep the global batch size unchanged for reproduction or adjust the learning rate linearly to a new global batch size. + +* Standalone Training + +If you want to train or finetune the model on a smaller dataset without distributed training, please run: + +```shell +# standalone training on a CPU/GPU/Ascend device +python train.py --config configs/repvgg/repvgg_a1_ascend.yaml --data_dir /path/to/dataset --distribute False +``` + +### Validation + +To validate the accuracy of the trained model, you can use `validate.py` and parse the checkpoint path with `--ckpt_path`. + +``` +python validate.py -c configs/repvgg/repvgg_a1_ascend.yaml --data_dir /path/to/imagenet --ckpt_path /path/to/ckpt +``` + +### Deployment + +Please refer to the [deployment tutorial](https://github.com/mindspore-lab/mindcv/blob/main/tutorials/deployment.md) in MindCV. + +## References + + +[1] Ding X, Zhang X, Ma N, et al. Repvgg: Making vgg-style convnets great again[C]//Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. 2021: 13733-13742. 
diff --git a/configs/repvgg/repvgg_a0_ascend.yaml b/configs/repvgg/repvgg_a0_ascend.yaml new file mode 100644 index 0000000..a9bb828 --- /dev/null +++ b/configs/repvgg/repvgg_a0_ascend.yaml @@ -0,0 +1,52 @@ +# system +mode: 0 +distribute: True +num_parallel_workers: 8 +val_while_train: True + +# dataset +dataset: "imagenet" +data_dir: "/path/to/imagenet" +shuffle: True +dataset_download: False +batch_size: 32 +drop_remainder: True + +# augmentation +image_resize: 224 +scale: [0.08, 1.0] +ratio: [0.75, 1.333] +hflip: 0.5 +interpolation: "bicubic" +crop_pct: 0.875 + +# model +model: "repvgg_a0" +num_classes: 1000 +pretrained: False +ckpt_path: "" +keep_checkpoint_max: 10 +ckpt_save_dir: "./ckpt" +epoch_size: 400 +dataset_sink_mode: True +amp_level: "O0" + +# loss +loss: "CE" +label_smoothing: 0.1 + +# lr scheduler +scheduler: "cosine_decay" +lr: 0.05 +warmup_epochs: 10 +decay_epochs: 390 + +# optimizer +opt: "momentum" +filter_bias_and_bn: True +momentum: 0.9 +weight_decay: 0.0001 +loss_scale_type: "dynamic" +drop_overflow_update: True +use_nesterov: False +eps: 0.00000001 diff --git a/configs/repvgg/repvgg_a1_ascend.yaml b/configs/repvgg/repvgg_a1_ascend.yaml new file mode 100644 index 0000000..7032d26 --- /dev/null +++ b/configs/repvgg/repvgg_a1_ascend.yaml @@ -0,0 +1,52 @@ +# system +mode: 0 +distribute: True +num_parallel_workers: 8 +val_while_train: True + +# dataset +dataset: "imagenet" +data_dir: "/path/to/imagenet" +shuffle: True +dataset_download: False +batch_size: 32 +drop_remainder: True + +# augmentation +image_resize: 224 +scale: [0.08, 1.0] +ratio: [0.75, 1.333] +hflip: 0.5 +interpolation: "bilinear" +crop_pct: 0.875 + +# model +model: "repvgg_a1" +num_classes: 1000 +pretrained: False +ckpt_path: "" +keep_checkpoint_max: 10 +ckpt_save_dir: "./ckpt" +epoch_size: 240 +dataset_sink_mode: True +amp_level: "O0" + +# loss +loss: "CE" +label_smoothing: 0.1 + +# lr scheduler +scheduler: "cosine_decay" +lr: 0.05 +warmup_epochs: 5 +decay_epochs: 235 + +# optimizer +opt: "momentum" +filter_bias_and_bn: True +momentum: 0.9 +weight_decay: 0.0001 +loss_scale_type: "dynamic" +drop_overflow_update: True +use_nesterov: False +eps: 0.00000001 diff --git a/configs/repvgg/repvgg_a2_ascend.yaml b/configs/repvgg/repvgg_a2_ascend.yaml new file mode 100644 index 0000000..4304dad --- /dev/null +++ b/configs/repvgg/repvgg_a2_ascend.yaml @@ -0,0 +1,52 @@ +# system +mode: 0 +distribute: True +num_parallel_workers: 8 +val_while_train: True + +# dataset +dataset: "imagenet" +data_dir: "/path/to/imagenet" +shuffle: True +dataset_download: False +batch_size: 32 +drop_remainder: True + +# augmentation +image_resize: 224 +scale: [0.08, 1.0] +ratio: [0.75, 1.333] +hflip: 0.5 +interpolation: "bilinear" +crop_pct: 0.875 + +# model +model: "repvgg_a2" +num_classes: 1000 +pretrained: False +ckpt_path: "" +keep_checkpoint_max: 10 +ckpt_save_dir: "./ckpt" +epoch_size: 240 +dataset_sink_mode: True +amp_level: "O0" + +# loss +loss: "CE" +label_smoothing: 0.1 + +# lr scheduler +scheduler: "cosine_decay" +lr: 0.05 +warmup_epochs: 5 +decay_epochs: 235 + +# optimizer +opt: "momentum" +filter_bias_and_bn: True +momentum: 0.9 +weight_decay: 0.0001 +loss_scale_type: "dynamic" +drop_overflow_update: True +use_nesterov: False +eps: 0.00000001 diff --git a/configs/repvgg/repvgg_b0_ascend.yaml b/configs/repvgg/repvgg_b0_ascend.yaml new file mode 100644 index 0000000..47a563e --- /dev/null +++ b/configs/repvgg/repvgg_b0_ascend.yaml @@ -0,0 +1,52 @@ +# system +mode: 0 +distribute: True +num_parallel_workers: 8 +val_while_train: 
True + +# dataset +dataset: "imagenet" +data_dir: "/path/to/imagenet" +shuffle: True +dataset_download: False +batch_size: 32 +drop_remainder: True + +# augmentation +image_resize: 224 +scale: [0.08, 1.0] +ratio: [0.75, 1.333] +hflip: 0.5 +interpolation: "bilinear" +crop_pct: 0.875 + +# model +model: "repvgg_b0" +num_classes: 1000 +pretrained: False +ckpt_path: "" +keep_checkpoint_max: 10 +ckpt_save_dir: "./ckpt" +epoch_size: 240 +dataset_sink_mode: True +amp_level: "O0" + +# loss +loss: "CE" +label_smoothing: 0.1 + +# lr scheduler +scheduler: "cosine_decay" +lr: 0.05 +warmup_epochs: 5 +decay_epochs: 235 + +# optimizer +opt: "momentum" +filter_bias_and_bn: True +momentum: 0.9 +weight_decay: 0.0001 +loss_scale_type: "dynamic" +drop_overflow_update: True +use_nesterov: False +eps: 0.00000001 diff --git a/configs/repvgg/repvgg_b1_ascend.yaml b/configs/repvgg/repvgg_b1_ascend.yaml new file mode 100644 index 0000000..e3aa59d --- /dev/null +++ b/configs/repvgg/repvgg_b1_ascend.yaml @@ -0,0 +1,52 @@ +# system +mode: 0 +distribute: True +num_parallel_workers: 8 +val_while_train: True + +# dataset +dataset: "imagenet" +data_dir: "/path/to/imagenet" +shuffle: True +dataset_download: False +batch_size: 32 +drop_remainder: True + +# augmentation +image_resize: 224 +scale: [0.08, 1.0] +ratio: [0.75, 1.333] +hflip: 0.5 +interpolation: "bilinear" +crop_pct: 0.875 + +# model +model: "repvgg_b1" +num_classes: 1000 +pretrained: False +ckpt_path: "" +keep_checkpoint_max: 10 +ckpt_save_dir: "./ckpt" +epoch_size: 240 +dataset_sink_mode: True +amp_level: "O0" + +# loss +loss: "CE" +label_smoothing: 0.1 + +# lr scheduler +scheduler: "cosine_decay" +lr: 0.05 +warmup_epochs: 5 +decay_epochs: 235 + +# optimizer +opt: "momentum" +filter_bias_and_bn: True +momentum: 0.9 +weight_decay: 0.0001 +loss_scale_type: "dynamic" +drop_overflow_update: True +use_nesterov: False +eps: 0.00000001 diff --git a/configs/repvgg/repvgg_b2_ascend.yaml b/configs/repvgg/repvgg_b2_ascend.yaml new file mode 100644 index 0000000..45d89f7 --- /dev/null +++ b/configs/repvgg/repvgg_b2_ascend.yaml @@ -0,0 +1,52 @@ +# system +mode: 0 +distribute: True +num_parallel_workers: 8 +val_while_train: True + +# dataset +dataset: "imagenet" +data_dir: "/path/to/imagenet" +shuffle: True +dataset_download: False +batch_size: 32 +drop_remainder: True + +# augmentation +image_resize: 224 +scale: [0.08, 1.0] +ratio: [0.75, 1.333] +hflip: 0.5 +interpolation: "bilinear" +crop_pct: 0.875 + +# model +model: "repvgg_b2" +num_classes: 1000 +pretrained: False +ckpt_path: "" +keep_checkpoint_max: 10 +ckpt_save_dir: "./ckpt" +epoch_size: 240 +dataset_sink_mode: True +amp_level: "O0" + +# loss +loss: "CE" +label_smoothing: 0.1 + +# lr scheduler +scheduler: "cosine_decay" +lr: 0.4 +warmup_epochs: 5 +decay_epochs: 235 + +# optimizer +opt: "momentum" +filter_bias_and_bn: True +momentum: 0.9 +weight_decay: 0.0001 +loss_scale_type: "dynamic" +drop_overflow_update: True +use_nesterov: False +eps: 0.00000001 diff --git a/configs/repvgg/repvgg_b3_ascend.yaml b/configs/repvgg/repvgg_b3_ascend.yaml new file mode 100644 index 0000000..7b2c937 --- /dev/null +++ b/configs/repvgg/repvgg_b3_ascend.yaml @@ -0,0 +1,55 @@ +# system +mode: 0 +distribute: True +num_parallel_workers: 8 +val_while_train: True + +# dataset +dataset: "imagenet" +data_dir: "/path/to/imagenet" +shuffle: True +dataset_download: False +batch_size: 32 +drop_remainder: True + +# augmentation +image_resize: 224 +scale: [0.08, 1.0] +ratio: [0.75, 1.333] +hflip: 0.5 +interpolation: "bilinear" +crop_pct: 0.875 
+auto_augment: "autoaug" +mixup: 0.8 +cutmix: 1.0 + +# model +model: "repvgg_b3" +num_classes: 1000 +pretrained: False +ckpt_path: "" +keep_checkpoint_max: 10 +ckpt_save_dir: "./ckpt" +epoch_size: 240 +dataset_sink_mode: True +amp_level: "O0" + +# loss +loss: "CE" +label_smoothing: 0.1 + +# lr scheduler +scheduler: "cosine_decay" +lr: 0.4 +warmup_epochs: 5 +decay_epochs: 235 + +# optimizer +opt: "momentum" +filter_bias_and_bn: True +momentum: 0.9 +weight_decay: 0.0001 +loss_scale_type: "dynamic" +drop_overflow_update: True +use_nesterov: False +eps: 0.00000001 diff --git a/configs/res2net/README.md b/configs/res2net/README.md new file mode 100644 index 0000000..5ee9666 --- /dev/null +++ b/configs/res2net/README.md @@ -0,0 +1,88 @@ +# Res2Net + +> [Res2Net: A New Multi-scale Backbone Architecture](https://arxiv.org/abs/1904.01169) + +## Introduction + +Res2Net is a novel building block for CNNs proposed by constructing hierarchical residual-like connections within one single residual block. The Res2Net represents multi-scale features at a granular level and increases the range of receptive fields for each network layer. Res2Net block can be plugged into the state-of-the-art backbone CNN models, e.g., ResNet, ResNeXt, and DLA. Ablation studies and experimental results on representative computer vision tasks, i.e., object detection, class activation mapping, and salient object detection, verify the superiority of the Res2Net over the state-of-the-art baseline methods such as ResNet-50, DLA-60 and etc. + +

+ Figure 1. Architecture of Res2Net [1] +
+ +## Results + +Our reproduced model performance on ImageNet-1K is reported as follows. + +
+ +| Model | Context | Top-1 (%) | Top-5 (%) | Params (M) | Recipe | Download | +|-----------------|-----------|-------|-------|------------|-------------------------------------------------------------------------------------------------------|---| +| Res2Net50 | D910x8-G | 79.35 | 94.64 | 25.76 | [yaml](https://github.com/mindspore-lab/mindcv/blob/main/configs/res2net/res2net_50_ascend.yaml) | [weights](https://download.mindspore.cn/toolkits/mindcv/res2net/res2net50-f42cf71b.ckpt) | +| Res2Net101 | D910x8-G | 79.56 | 94.70 | 45.33 | [yaml](https://github.com/mindspore-lab/mindcv/blob/main/configs/res2net/res2net_101_ascend.yaml) | [weights](https://download.mindspore.cn/toolkits/mindcv/res2net/res2net101-8ae60132.ckpt) | +| Res2Net50-v1b | D910x8-G | 80.32 | 95.09 | 25.77 | [yaml](https://github.com/mindspore-lab/mindcv/blob/main/configs/res2net/res2net_50_v1b_ascend.yaml) | [weights](https://download.mindspore.cn/toolkits/mindcv/res2net/res2net50_v1b-99304e92.ckpt) | +| Res2Net101-v1b | D910x8-G | 81.26 | 95.41 | 45.35 | [yaml](https://github.com/mindspore-lab/mindcv/blob/main/configs/res2net/res2net_101_v1b_ascend.yaml) | [weights](https://download.mindspore.cn/toolkits/mindcv/res2net/res2net101_v1b-7e6db001.ckpt) | + +
+ +#### Notes +- Context: Training context denoted as {device}x{pieces}-{MS mode}, where mindspore mode can be G - graph mode or F - pynative mode with ms function. For example, D910x8-G is for training on 8 pieces of Ascend 910 NPU using graph mode. +- Top-1 and Top-5: Accuracy reported on the validation set of ImageNet-1K. + + +## Quick Start + +### Preparation + +#### Installation +Please refer to the [installation instruction](https://github.com/mindspore-lab/mindcv#installation) in MindCV. + +#### Dataset Preparation +Please download the [ImageNet-1K](https://www.image-net.org/challenges/LSVRC/2012/index.php) dataset for model training and validation. + +### Training + +* Distributed Training + +It is easy to reproduce the reported results with the pre-defined training recipe. For distributed training on multiple Ascend 910 devices, please run + +```shell +# distributed training on multiple GPU/Ascend devices +mpirun -n 8 python train.py --config configs/res2net/res2net_50_ascend.yaml --data_dir /path/to/imagenet +``` + +> If the script is executed by the root user, the `--allow-run-as-root` parameter must be added to `mpirun`. + +Similarly, you can train the model on multiple GPU devices with the above `mpirun` command. + +For detailed illustration of all hyper-parameters, please refer to [config.py](https://github.com/mindspore-lab/mindcv/blob/main/config.py). + +**Note:** As the global batch size (batch_size x num_devices) is an important hyper-parameter, it is recommended to keep the global batch size unchanged for reproduction or adjust the learning rate linearly to a new global batch size. + +* Standalone Training + +If you want to train or finetune the model on a smaller dataset without distributed training, please run: + +```shell +# standalone training on a CPU/GPU/Ascend device +python train.py --config configs/res2net/res2net_50_ascend.yaml --data_dir /path/to/imagenet --distribute False +``` + +### Validation + +To validate the accuracy of the trained model, you can use `validate.py` and parse the checkpoint path with `--ckpt_path`. + +```shell +python validate.py -c configs/res2net/res2net_50_ascend.yaml --data_dir /path/to/imagenet --ckpt_path /path/to/ckpt +``` + +### Deployment + +To deploy online inference services with the trained model efficiently, please refer to the [deployment tutorial](https://github.com/mindspore-lab/mindcv/blob/main/tutorials/deployment.md). + +## References + +[1] Gao S H, Cheng M M, Zhao K, et al. Res2net: A new multi-scale backbone architecture[J]. IEEE transactions on pattern analysis and machine intelligence, 2019, 43(2): 652-662. 
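To make the learning-rate note above concrete: the res2net_50_ascend.yaml recipe uses batch_size 32 on 8 devices, i.e. a global batch size of 256, with lr 0.1. A minimal sketch of the linear scaling rule for a different device count (the 4-device setup is only an example):

```shell
# Sketch of linear LR scaling; base values taken from res2net_50_ascend.yaml.
BASE_LR=0.1
BASE_GLOBAL_BS=$((32 * 8))   # recipe global batch size: 256
NEW_GLOBAL_BS=$((32 * 4))    # e.g. the same per-device batch size on 4 devices
python -c "print($BASE_LR * $NEW_GLOBAL_BS / $BASE_GLOBAL_BS)"   # -> 0.05
```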
diff --git a/configs/res2net/res2net_101_ascend.yaml b/configs/res2net/res2net_101_ascend.yaml new file mode 100644 index 0000000..9a2a2de --- /dev/null +++ b/configs/res2net/res2net_101_ascend.yaml @@ -0,0 +1,56 @@ +# system +mode: 0 +distribute: True +num_parallel_workers: 8 +val_while_train: True + +# dataset +dataset: "imagenet" +data_dir: "/path/to/imagenet" +shuffle: True +dataset_download: False +batch_size: 32 +drop_remainder: True +val_split: val + +# augmentation +image_resize: 224 +scale: [0.08, 1.0] +ratio: [0.75, 1.333] +hflip: 0.5 +interpolation: "bilinear" +crop_pct: 0.875 +color_jitter: 0.4 +re_prob: 0.5 + +# model +model: "res2net101" +num_classes: 1000 +pretrained: False +ckpt_path: "" +keep_checkpoint_max: 10 +val_interval: 5 +ckpt_save_dir: "./ckpt" +epoch_size: 140 +dataset_sink_mode: True +amp_level: "O3" + +# loss +loss: "CE" +label_smoothing: 0.1 + +# lr scheduler +scheduler: "step_decay" +min_lr: 0.0 +lr: 0.1 +warmup_epochs: 0 +decay_rate: 0.1 +decay_epochs: 30 + +# optimizer +opt: "momentum" +filter_bias_and_bn: True +momentum: 0.9 +weight_decay: 0.0001 +loss_scale: 1024 +use_nesterov: False diff --git a/configs/res2net/res2net_101_gpu.yaml b/configs/res2net/res2net_101_gpu.yaml new file mode 100644 index 0000000..af5de52 --- /dev/null +++ b/configs/res2net/res2net_101_gpu.yaml @@ -0,0 +1,56 @@ +# system +mode: 0 +distribute: True +num_parallel_workers: 8 +val_while_train: True + +# dataset +dataset: "imagenet" +data_dir: "/path/to/imagenet" +shuffle: True +dataset_download: False +batch_size: 32 +drop_remainder: True +val_split: val + +# augmentation +image_resize: 224 +scale: [0.08, 1.0] +ratio: [0.75, 1.333] +hflip: 0.5 +interpolation: "bilinear" +crop_pct: 0.875 +color_jitter: 0.4 +re_prob: 0.5 +mixup: 0.2 + +# model +model: "res2net101" +num_classes: 1000 +pretrained: False +ckpt_path: "" +keep_checkpoint_max: 10 +val_interval: 5 +ckpt_save_dir: "./ckpt" +epoch_size: 300 +dataset_sink_mode: True +amp_level: "O2" + +# loss +loss: "CE" +label_smoothing: 0.1 + +# lr scheduler +scheduler: "cosine_decay" +min_lr: 0.0 +lr: 0.1 +warmup_epochs: 5 +decay_epochs: 295 + +# optimizer +opt: "momentum" +filter_bias_and_bn: True +momentum: 0.9 +weight_decay: 0.0001 +loss_scale: 1024 +use_nesterov: False diff --git a/configs/res2net/res2net_101_v1b_ascend.yaml b/configs/res2net/res2net_101_v1b_ascend.yaml new file mode 100644 index 0000000..53a5d26 --- /dev/null +++ b/configs/res2net/res2net_101_v1b_ascend.yaml @@ -0,0 +1,55 @@ +# system +mode: 0 +distribute: True +num_parallel_workers: 8 +val_while_train: True + +# dataset +dataset: "imagenet" +data_dir: "/path/to/imagenet" +shuffle: True +dataset_download: False +batch_size: 32 +drop_remainder: True +val_split: val + +# augmentation +image_resize: 224 +scale: [0.08, 1.0] +ratio: [0.75, 1.333] +hflip: 0.5 +interpolation: "bilinear" +crop_pct: 0.875 +color_jitter: 0.4 +re_prob: 0.5 + +# model +model: "res2net101_v1b" +num_classes: 1000 +pretrained: False +ckpt_path: "" +keep_checkpoint_max: 10 +val_interval: 5 +ckpt_save_dir: "./ckpt" +epoch_size: 200 +dataset_sink_mode: True +amp_level: "O3" + +# loss +loss: "CE" +label_smoothing: 0.1 + +# lr scheduler +scheduler: "cosine_decay" +min_lr: 0.0 +lr: 0.1 +warmup_epochs: 4 +decay_epochs: 196 + +# optimizer +opt: "momentum" +filter_bias_and_bn: True +momentum: 0.9 +weight_decay: 0.0001 +loss_scale: 1024 +use_nesterov: False diff --git a/configs/res2net/res2net_101_v1b_gpu.yaml b/configs/res2net/res2net_101_v1b_gpu.yaml new file mode 100644 index 0000000..703e27d --- /dev/null +++ 
b/configs/res2net/res2net_101_v1b_gpu.yaml @@ -0,0 +1,56 @@ +# system +mode: 0 +distribute: True +num_parallel_workers: 8 +val_while_train: True + +# dataset +dataset: "imagenet" +data_dir: "/path/to/imagenet" +shuffle: True +dataset_download: False +batch_size: 32 +drop_remainder: True +val_split: val + +# augmentation +image_resize: 224 +scale: [0.08, 1.0] +ratio: [0.75, 1.333] +hflip: 0.5 +interpolation: "bilinear" +crop_pct: 0.875 +color_jitter: 0.4 +re_prob: 0.5 +mixup: 0.2 + +# model +model: "res2net101_v1b" +num_classes: 1000 +pretrained: False +ckpt_path: "" +keep_checkpoint_max: 10 +val_interval: 5 +ckpt_save_dir: "./ckpt" +epoch_size: 300 +dataset_sink_mode: True +amp_level: "O2" + +# loss +loss: "CE" +label_smoothing: 0.1 + +# lr scheduler +scheduler: "cosine_decay" +min_lr: 0.0 +lr: 0.1 +warmup_epochs: 5 +decay_epochs: 295 + +# optimizer +opt: "momentum" +filter_bias_and_bn: True +momentum: 0.9 +weight_decay: 0.0001 +loss_scale: 1024 +use_nesterov: False diff --git a/configs/res2net/res2net_50_ascend.yaml b/configs/res2net/res2net_50_ascend.yaml new file mode 100644 index 0000000..f5ce50f --- /dev/null +++ b/configs/res2net/res2net_50_ascend.yaml @@ -0,0 +1,55 @@ +# system +mode: 0 +distribute: True +num_parallel_workers: 8 +val_while_train: True + +# dataset +dataset: "imagenet" +data_dir: "/path/to/imagenet" +shuffle: True +dataset_download: False +batch_size: 32 +drop_remainder: True +val_split: val + +# augmentation +image_resize: 224 +scale: [0.08, 1.0] +ratio: [0.75, 1.333] +hflip: 0.5 +interpolation: "bilinear" +crop_pct: 0.875 +color_jitter: 0.4 +re_prob: 0.5 + +# model +model: "res2net50" +num_classes: 1000 +pretrained: False +ckpt_path: "" +keep_checkpoint_max: 10 +val_interval: 5 +ckpt_save_dir: "./ckpt" +epoch_size: 200 +dataset_sink_mode: True +amp_level: "O3" + +# loss +loss: "CE" +label_smoothing: 0.1 + +# lr scheduler +scheduler: "cosine_decay" +min_lr: 0.0 +lr: 0.1 +warmup_epochs: 4 +decay_epochs: 196 + +# optimizer +opt: "momentum" +filter_bias_and_bn: True +momentum: 0.9 +weight_decay: 0.0001 +loss_scale: 1024 +use_nesterov: False diff --git a/configs/res2net/res2net_50_gpu.yaml b/configs/res2net/res2net_50_gpu.yaml new file mode 100644 index 0000000..30c78ae --- /dev/null +++ b/configs/res2net/res2net_50_gpu.yaml @@ -0,0 +1,56 @@ +# system +mode: 0 +distribute: True +num_parallel_workers: 8 +val_while_train: True + +# dataset +dataset: "imagenet" +data_dir: "/path/to/imagenet" +shuffle: True +dataset_download: False +batch_size: 32 +drop_remainder: True +val_split: val + +# augmentation +image_resize: 224 +scale: [0.08, 1.0] +ratio: [0.75, 1.333] +hflip: 0.5 +interpolation: "bilinear" +crop_pct: 0.875 +color_jitter: 0.4 +re_prob: 0.5 +mixup: 0.2 + +# model +model: "res2net50" +num_classes: 1000 +pretrained: False +ckpt_path: "" +keep_checkpoint_max: 10 +val_interval: 5 +ckpt_save_dir: "./ckpt" +epoch_size: 300 +dataset_sink_mode: True +amp_level: "O2" + +# loss +loss: "CE" +label_smoothing: 0.1 + +# lr scheduler +scheduler: "cosine_decay" +min_lr: 0.0 +lr: 0.1 +warmup_epochs: 5 +decay_epochs: 295 + +# optimizer +opt: "momentum" +filter_bias_and_bn: True +momentum: 0.9 +weight_decay: 0.0001 +loss_scale: 1024 +use_nesterov: False diff --git a/configs/res2net/res2net_50_v1b_ascend.yaml b/configs/res2net/res2net_50_v1b_ascend.yaml new file mode 100644 index 0000000..5c86e3d --- /dev/null +++ b/configs/res2net/res2net_50_v1b_ascend.yaml @@ -0,0 +1,56 @@ +# system +mode: 0 +distribute: True +num_parallel_workers: 8 +val_while_train: True + +# dataset +dataset: 
"imagenet" +data_dir: "/path/to/imagenet" +shuffle: True +dataset_download: False +batch_size: 32 +drop_remainder: True +val_split: val + +# augmentation +image_resize: 224 +scale: [0.08, 1.0] +ratio: [0.75, 1.333] +hflip: 0.5 +interpolation: "bilinear" +crop_pct: 0.875 +color_jitter: 0.4 +re_prob: 0.5 +mixup: 0.2 + +# model +model: "res2net50_v1b" +num_classes: 1000 +pretrained: False +ckpt_path: "" +keep_checkpoint_max: 10 +val_interval: 5 +ckpt_save_dir: "./ckpt" +epoch_size: 300 +dataset_sink_mode: True +amp_level: "O3" + +# loss +loss: "CE" +label_smoothing: 0.1 + +# lr scheduler +scheduler: "cosine_decay" +min_lr: 0.0 +lr: 0.1 +warmup_epochs: 5 +decay_epochs: 295 + +# optimizer +opt: "momentum" +filter_bias_and_bn: True +momentum: 0.9 +weight_decay: 0.0001 +loss_scale: 1024 +use_nesterov: False diff --git a/configs/res2net/res2net_50_v1b_gpu.yaml b/configs/res2net/res2net_50_v1b_gpu.yaml new file mode 100644 index 0000000..276c825 --- /dev/null +++ b/configs/res2net/res2net_50_v1b_gpu.yaml @@ -0,0 +1,56 @@ +# system +mode: 0 +distribute: True +num_parallel_workers: 8 +val_while_train: True + +# dataset +dataset: "imagenet" +data_dir: "/path/to/imagenet" +shuffle: True +dataset_download: False +batch_size: 32 +drop_remainder: True +val_split: val + +# augmentation +image_resize: 224 +scale: [0.08, 1.0] +ratio: [0.75, 1.333] +hflip: 0.5 +interpolation: "bilinear" +crop_pct: 0.875 +color_jitter: 0.4 +re_prob: 0.5 +mixup: 0.2 + +# model +model: "res2net50_v1b" +num_classes: 1000 +pretrained: False +ckpt_path: "" +keep_checkpoint_max: 10 +val_interval: 5 +ckpt_save_dir: "./ckpt" +epoch_size: 300 +dataset_sink_mode: True +amp_level: "O2" + +# loss +loss: "CE" +label_smoothing: 0.1 + +# lr scheduler +scheduler: "cosine_decay" +min_lr: 0.0 +lr: 0.1 +warmup_epochs: 5 +decay_epochs: 295 + +# optimizer +opt: "momentum" +filter_bias_and_bn: True +momentum: 0.9 +weight_decay: 0.0001 +loss_scale: 1024 +use_nesterov: False diff --git a/configs/resnest/README.md b/configs/resnest/README.md new file mode 100644 index 0000000..88295de --- /dev/null +++ b/configs/resnest/README.md @@ -0,0 +1,88 @@ +# ResNeSt +> [ResNeSt: Split-Attention Networks](https://arxiv.org/abs/2004.08955) + +## Introduction + +In this paper, the authors present a modularized architecture, which applies the channel-wise attention on different +network branches to leverage their success in capturing cross-feature interactions and learning diverse representations. +The network design results in a simple and unified computation block, which can be parameterized using only a few +variables. As a result, ResNeSt outperforms EfficientNet in accuracy and latency trade-off on image +classification.[[1](#references)] + +

+ Figure 1. Architecture of ResNeSt [1] +
+ +## Results + +Our reproduced model performance on ImageNet-1K is reported as follows. + +
+ +| Model | Context | Top-1 (%) | Top-5 (%) | Params (M) | Recipe | Download | +|-----------|----------|-----------|-----------|------------|-------------------------------------------------------------------------------------------------|------------------------------------------------------------------------------------------| +| ResNeSt50 | D910x8-G | 80.81 | 95.16 | 27.55 | [yaml](https://github.com/mindspore-lab/mindcv/blob/main/configs/resnest/resnest50_ascend.yaml) | [weights](https://download.mindspore.cn/toolkits/mindcv/resnest/resnest50-f2e7fc9c.ckpt) | + +
+ +#### Notes + +- Context: Training context denoted as {device}x{pieces}-{MS mode}, where mindspore mode can be G - graph mode or F - pynative mode with ms function. For example, D910x8-G is for training on 8 pieces of Ascend 910 NPU using graph mode. +- Top-1 and Top-5: Accuracy reported on the validation set of ImageNet-1K. + +## Quick Start + +### Preparation + +#### Installation +Please refer to the [installation instruction](https://github.com/mindspore-ecosystem/mindcv#installation) in MindCV. + +#### Dataset Preparation +Please download the [ImageNet-1K](https://www.image-net.org/challenges/LSVRC/2012/index.php) dataset for model training and validation. + +### Training + +* Distributed Training + +It is easy to reproduce the reported results with the pre-defined training recipe. For distributed training on multiple Ascend 910 devices, please run + +```shell +# distributed training on multiple GPU/Ascend devices +mpirun -n 8 python train.py --config configs/resnest/resnest50_ascend.yaml --data_dir /path/to/imagenet +``` + +> If the script is executed by the root user, the `--allow-run-as-root` parameter must be added to `mpirun`. + +Similarly, you can train the model on multiple GPU devices with the above `mpirun` command. + +For detailed illustration of all hyper-parameters, please refer to [config.py](https://github.com/mindspore-lab/mindcv/blob/main/config.py). + +**Note:** As the global batch size (batch_size x num_devices) is an important hyper-parameter, it is recommended to keep the global batch size unchanged for reproduction or adjust the learning rate linearly to a new global batch size. + +* Standalone Training + +If you want to train or finetune the model on a smaller dataset without distributed training, please run: + +```shell +# standalone training on a CPU/GPU/Ascend device +python train.py --config configs/resnest/resnest50_ascend.yaml --data_dir /path/to/dataset --distribute False +``` + +### Validation + +To validate the accuracy of the trained model, you can use `validate.py` and parse the checkpoint path with `--ckpt_path`. + +```shell +python validate.py -c configs/resnest/resnest50_ascend.yaml --data_dir /path/to/imagenet --ckpt_path /path/to/ckpt +``` + +### Deployment + +Please refer to the [deployment tutorial](https://github.com/mindspore-lab/mindcv/blob/main/tutorials/deployment.md) in MindCV. + +## References + +[1] Zhang H, Wu C, Zhang Z, et al. Resnest: Split-attention networks[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2022: 2736-2746. 
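If you only want to reproduce the reported accuracy, a minimal sketch is to download the pretrained ResNeSt50 weights linked in the table above and run the validation command against them (the ./pretrained directory is an arbitrary local choice):

```shell
# Sketch: fetch the released ResNeSt50 checkpoint and evaluate it on ImageNet-1K.
wget -P ./pretrained https://download.mindspore.cn/toolkits/mindcv/resnest/resnest50-f2e7fc9c.ckpt
python validate.py -c configs/resnest/resnest50_ascend.yaml \
    --data_dir /path/to/imagenet \
    --ckpt_path ./pretrained/resnest50-f2e7fc9c.ckpt
```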
diff --git a/configs/resnest/resnest50_ascend.yaml b/configs/resnest/resnest50_ascend.yaml new file mode 100644 index 0000000..e3fc11e --- /dev/null +++ b/configs/resnest/resnest50_ascend.yaml @@ -0,0 +1,57 @@ +# system +mode: 0 +distribute: True +num_parallel_workers: 8 +val_while_train: True + +# dataset +dataset: "imagenet" +data_dir: "/path/to/imagenet" +shuffle: True +dataset_download: False +batch_size: 128 +drop_remainder: True + +# augmentation +image_resize: 224 +scale: [0.08, 1.0] +ratio: [0.75, 1.333] +hflip: 0.5 +interpolation: "bicubic" +auto_augment: "randaug-m9-mstd0.5-inc1" +re_prob: 0.25 +crop_pct: 0.875 +mixup: 0.8 +cutmix: 1.0 + +# model +model: "resnest50" +num_classes: 1000 +pretrained: False +ckpt_path: "" +keep_checkpoint_max: 30 +ckpt_save_dir: "./ckpt" +epoch_size: 350 +dataset_sink_mode: True +amp_level: "O2" +drop_rate: 0.2 + +# loss +loss: "CE" +label_smoothing: 0.1 + +# lr scheduler +scheduler: "cosine_decay" +min_lr: 0.0 +lr: 0.06 +warmup_epochs: 5 +decay_epochs: 345 + +# optimizer +opt: "momentum" +filter_bias_and_bn: True +momentum: 0.9 +weight_decay: 0.0001 +loss_scale_type: "dynamic" +drop_overflow_update: True +use_nesterov: False diff --git a/configs/resnet/README.md b/configs/resnet/README.md new file mode 100644 index 0000000..7edc008 --- /dev/null +++ b/configs/resnet/README.md @@ -0,0 +1,89 @@ +# ResNet + +> [Deep Residual Learning for Image Recognition](https://arxiv.org/abs/1512.03385) + +## Introduction + +Resnet is a residual learning framework to ease the training of networks that are substantially deeper than those used previously, which is explicitly reformulated that the layers are learning residual functions with reference to the layer inputs, instead of learning unreferenced functions. Lots of comprehensive empirical evidence showing that these residual networks are easier to optimize, and can gain accuracy from considerably increased depth. + +

+ Figure 1. Architecture of ResNet [1] +
+ +## Results + +Our reproduced model performance on ImageNet-1K is reported as follows. + +
+ +| Model | Context | Top-1 (%) | Top-5 (%) | Params (M) | Recipe | Download | +|-----------------|-----------|-----------|-----------|-------|-------------------------------------------------------------------------------------------------|------------------------------------------------------------------------------------| +| ResNet18 | D910x8-G | 70.31 | 89.62 | 11.70 | [yaml](https://github.com/mindspore-lab/mindcv/blob/main/configs/resnet/resnet_18_ascend.yaml) | [weights](https://download.mindspore.cn/toolkits/mindcv/resnet/resnet18-1e65cd21.ckpt) | +| ResNet34 | D910x8-G | 74.15 | 91.98 | 21.81 | [yaml](https://github.com/mindspore-lab/mindcv/blob/main/configs/resnet/resnet_34_ascend.yaml) | [weights](https://download.mindspore.cn/toolkits/mindcv/resnet/resnet34-f297d27e.ckpt) | +| ResNet50 | D910x8-G | 76.69 | 93.50 | 25.61 | [yaml](https://github.com/mindspore-lab/mindcv/blob/main/configs/resnet/resnet_50_ascend.yaml) | [weights](https://download.mindspore.cn/toolkits/mindcv/resnet/resnet50-e0733ab8.ckpt) | +| ResNet101 | D910x8-G | 78.24 | 94.09 |44.65 | [yaml](https://github.com/mindspore-lab/mindcv/blob/main/configs/resnet/resnet_101_ascend.yaml) | [weights](https://download.mindspore.cn/toolkits/mindcv/resnet/resnet101-689c5e77.ckpt) | +| ResNet152 | D910x8-G | 78.72 | 94.45 | 60.34| [yaml](https://github.com/mindspore-lab/mindcv/blob/main/configs/resnet/resnet_152_ascend.yaml) | [weights](https://download.mindspore.cn/toolkits/mindcv/resnet/resnet152-beb689d8.ckpt) | + +
+ +#### Notes +- Context: Training context denoted as {device}x{pieces}-{MS mode}, where mindspore mode can be G - graph mode or F - pynative mode with ms function. For example, D910x8-G is for training on 8 pieces of Ascend 910 NPU using graph mode. +- Top-1 and Top-5: Accuracy reported on the validation set of ImageNet-1K. + + +## Quick Start + +### Preparation + +#### Installation +Please refer to the [installation instruction](https://github.com/mindspore-lab/mindcv#installation) in MindCV. + +#### Dataset Preparation +Please download the [ImageNet-1K](https://www.image-net.org/challenges/LSVRC/2012/index.php) dataset for model training and validation. + +### Training + +* Distributed Training + +It is easy to reproduce the reported results with the pre-defined training recipe. For distributed training on multiple Ascend 910 devices, please run + +```shell +# distributed training on multiple GPU/Ascend devices +mpirun -n 8 python train.py --config configs/resnet/resnet_18_ascend.yaml --data_dir /path/to/imagenet +``` + +> If the script is executed by the root user, the `--allow-run-as-root` parameter must be added to `mpirun`. + +Similarly, you can train the model on multiple GPU devices with the above `mpirun` command. + +For detailed illustration of all hyper-parameters, please refer to [config.py](https://github.com/mindspore-lab/mindcv/blob/main/config.py). + +**Note:** As the global batch size (batch_size x num_devices) is an important hyper-parameter, it is recommended to keep the global batch size unchanged for reproduction or adjust the learning rate linearly to a new global batch size. + +* Standalone Training + +If you want to train or finetune the model on a smaller dataset without distributed training, please run: + +```shell +# standalone training on a CPU/GPU/Ascend device +python train.py --config configs/resnet/resnet_18_ascend.yaml --data_dir /path/to/dataset --distribute False +``` + +### Validation + +To validate the accuracy of the trained model, you can use `validate.py` and parse the checkpoint path with `--ckpt_path`. + +```shell +python validate.py -c configs/resnet/resnet_18_ascend.yaml --data_dir /path/to/imagenet --ckpt_path /path/to/ckpt +``` + +### Deployment + +To deploy online inference services with the trained model efficiently, please refer to the [deployment tutorial](https://github.com/mindspore-lab/mindcv/blob/main/tutorials/deployment.md). + +## References + +[1] He K, Zhang X, Ren S, et al. Deep residual learning for image recognition[C]//Proceedings of the IEEE conference on computer vision and pattern recognition. 2016: 770-778. 
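Since the five ResNet recipes share the same validation command, here is a small sketch for evaluating all of them in one pass; the ./ckpt/<model>.ckpt layout is an assumption, so adjust it to wherever your checkpoints were saved:

```shell
# Sketch: loop validate.py over every ResNet Ascend recipe in this folder.
for cfg in configs/resnet/resnet_*_ascend.yaml; do
    model=$(basename "${cfg}" "_ascend.yaml")      # e.g. resnet_50
    python validate.py -c "${cfg}" \
        --data_dir /path/to/imagenet \
        --ckpt_path "./ckpt/${model}.ckpt"         # assumed checkpoint layout
done
```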
diff --git a/configs/resnet/resnet_101_ascend.yaml b/configs/resnet/resnet_101_ascend.yaml new file mode 100644 index 0000000..e52d7b4 --- /dev/null +++ b/configs/resnet/resnet_101_ascend.yaml @@ -0,0 +1,50 @@ +# system +mode: 0 +distribute: True +num_parallel_workers: 8 + +# dataset +dataset: "imagenet" +data_dir: "/path/to/imagenet" +shuffle: True +dataset_download: False +batch_size: 32 +drop_remainder: True + +# augmentation +image_resize: 224 +scale: [0.08, 1.0] +ratio: [0.75, 1.333] +hflip: 0.5 +interpolation: "bilinear" +crop_pct: 0.875 + +# model +model: "resnet101" +num_classes: 1000 +pretrained: False +ckpt_path: "" +keep_checkpoint_max: 30 +ckpt_save_dir: "./ckpt" +epoch_size: 120 +dataset_sink_mode: True +amp_level: "O2" + +# loss +loss: "CE" +label_smoothing: 0.1 + +# lr scheduler +scheduler: "cosine_decay" +min_lr: 0.0 +lr: 0.1 +warmup_epochs: 0 +decay_epochs: 120 + +# optimizer +opt: "momentum" +filter_bias_and_bn: True +momentum: 0.9 +weight_decay: 0.0001 +loss_scale: 1024 +use_nesterov: False diff --git a/configs/resnet/resnet_101_gpu.yaml b/configs/resnet/resnet_101_gpu.yaml new file mode 100644 index 0000000..e52d7b4 --- /dev/null +++ b/configs/resnet/resnet_101_gpu.yaml @@ -0,0 +1,50 @@ +# system +mode: 0 +distribute: True +num_parallel_workers: 8 + +# dataset +dataset: "imagenet" +data_dir: "/path/to/imagenet" +shuffle: True +dataset_download: False +batch_size: 32 +drop_remainder: True + +# augmentation +image_resize: 224 +scale: [0.08, 1.0] +ratio: [0.75, 1.333] +hflip: 0.5 +interpolation: "bilinear" +crop_pct: 0.875 + +# model +model: "resnet101" +num_classes: 1000 +pretrained: False +ckpt_path: "" +keep_checkpoint_max: 30 +ckpt_save_dir: "./ckpt" +epoch_size: 120 +dataset_sink_mode: True +amp_level: "O2" + +# loss +loss: "CE" +label_smoothing: 0.1 + +# lr scheduler +scheduler: "cosine_decay" +min_lr: 0.0 +lr: 0.1 +warmup_epochs: 0 +decay_epochs: 120 + +# optimizer +opt: "momentum" +filter_bias_and_bn: True +momentum: 0.9 +weight_decay: 0.0001 +loss_scale: 1024 +use_nesterov: False diff --git a/configs/resnet/resnet_152_ascend.yaml b/configs/resnet/resnet_152_ascend.yaml new file mode 100644 index 0000000..ce09304 --- /dev/null +++ b/configs/resnet/resnet_152_ascend.yaml @@ -0,0 +1,50 @@ +# system +mode: 0 +distribute: True +num_parallel_workers: 8 + +# dataset +dataset: "imagenet" +data_dir: "/path/to/imagenet" +shuffle: True +dataset_download: False +batch_size: 32 +drop_remainder: True + +# augmentation +image_resize: 224 +scale: [0.08, 1.0] +ratio: [0.75, 1.333] +hflip: 0.5 +interpolation: "bilinear" +crop_pct: 0.875 + +# model +model: "resnet152" +num_classes: 1000 +pretrained: False +ckpt_path: "" +keep_checkpoint_max: 30 +ckpt_save_dir: "./ckpt" +epoch_size: 120 +dataset_sink_mode: True +amp_level: "O2" + +# loss +loss: "CE" +label_smoothing: 0.1 + +# lr scheduler +scheduler: "cosine_decay" +min_lr: 0.0 +lr: 0.1 +warmup_epochs: 0 +decay_epochs: 120 + +# optimizer +opt: "momentum" +filter_bias_and_bn: True +momentum: 0.9 +weight_decay: 0.0001 +loss_scale: 1024 +use_nesterov: False diff --git a/configs/resnet/resnet_152_gpu.yaml b/configs/resnet/resnet_152_gpu.yaml new file mode 100644 index 0000000..ce09304 --- /dev/null +++ b/configs/resnet/resnet_152_gpu.yaml @@ -0,0 +1,50 @@ +# system +mode: 0 +distribute: True +num_parallel_workers: 8 + +# dataset +dataset: "imagenet" +data_dir: "/path/to/imagenet" +shuffle: True +dataset_download: False +batch_size: 32 +drop_remainder: True + +# augmentation +image_resize: 224 +scale: [0.08, 1.0] +ratio: [0.75, 1.333] 
+hflip: 0.5 +interpolation: "bilinear" +crop_pct: 0.875 + +# model +model: "resnet152" +num_classes: 1000 +pretrained: False +ckpt_path: "" +keep_checkpoint_max: 30 +ckpt_save_dir: "./ckpt" +epoch_size: 120 +dataset_sink_mode: True +amp_level: "O2" + +# loss +loss: "CE" +label_smoothing: 0.1 + +# lr scheduler +scheduler: "cosine_decay" +min_lr: 0.0 +lr: 0.1 +warmup_epochs: 0 +decay_epochs: 120 + +# optimizer +opt: "momentum" +filter_bias_and_bn: True +momentum: 0.9 +weight_decay: 0.0001 +loss_scale: 1024 +use_nesterov: False diff --git a/configs/resnet/resnet_18_ascend.yaml b/configs/resnet/resnet_18_ascend.yaml new file mode 100644 index 0000000..c94a392 --- /dev/null +++ b/configs/resnet/resnet_18_ascend.yaml @@ -0,0 +1,50 @@ +# system +mode: 0 +distribute: True +num_parallel_workers: 8 + +# dataset +dataset: "imagenet" +data_dir: "/path/to/imagenet" +shuffle: True +dataset_download: False +batch_size: 32 +drop_remainder: True + +# augmentation +image_resize: 224 +scale: [0.08, 1.0] +ratio: [0.75, 1.333] +hflip: 0.5 +interpolation: "bilinear" +crop_pct: 0.875 + +# model +model: "resnet18" +num_classes: 1000 +pretrained: False +ckpt_path: "" +keep_checkpoint_max: 30 +ckpt_save_dir: "./ckpt" +epoch_size: 120 +dataset_sink_mode: True +amp_level: "O2" + +# loss +loss: "CE" +label_smoothing: 0.1 + +# lr scheduler +scheduler: "cosine_decay" +min_lr: 0.0 +lr: 0.1 +warmup_epochs: 0 +decay_epochs: 120 + +# optimizer +opt: "momentum" +filter_bias_and_bn: True +momentum: 0.9 +weight_decay: 0.0001 +loss_scale: 1024 +use_nesterov: False diff --git a/configs/resnet/resnet_18_gpu.yaml b/configs/resnet/resnet_18_gpu.yaml new file mode 100644 index 0000000..c94a392 --- /dev/null +++ b/configs/resnet/resnet_18_gpu.yaml @@ -0,0 +1,50 @@ +# system +mode: 0 +distribute: True +num_parallel_workers: 8 + +# dataset +dataset: "imagenet" +data_dir: "/path/to/imagenet" +shuffle: True +dataset_download: False +batch_size: 32 +drop_remainder: True + +# augmentation +image_resize: 224 +scale: [0.08, 1.0] +ratio: [0.75, 1.333] +hflip: 0.5 +interpolation: "bilinear" +crop_pct: 0.875 + +# model +model: "resnet18" +num_classes: 1000 +pretrained: False +ckpt_path: "" +keep_checkpoint_max: 30 +ckpt_save_dir: "./ckpt" +epoch_size: 120 +dataset_sink_mode: True +amp_level: "O2" + +# loss +loss: "CE" +label_smoothing: 0.1 + +# lr scheduler +scheduler: "cosine_decay" +min_lr: 0.0 +lr: 0.1 +warmup_epochs: 0 +decay_epochs: 120 + +# optimizer +opt: "momentum" +filter_bias_and_bn: True +momentum: 0.9 +weight_decay: 0.0001 +loss_scale: 1024 +use_nesterov: False diff --git a/configs/resnet/resnet_34_ascend.yaml b/configs/resnet/resnet_34_ascend.yaml new file mode 100644 index 0000000..9986d12 --- /dev/null +++ b/configs/resnet/resnet_34_ascend.yaml @@ -0,0 +1,50 @@ +# system +mode: 0 +distribute: True +num_parallel_workers: 8 + +# dataset +dataset: "imagenet" +data_dir: "/path/to/imagenet" +shuffle: True +dataset_download: False +batch_size: 32 +drop_remainder: True + +# augmentation +image_resize: 224 +scale: [0.08, 1.0] +ratio: [0.75, 1.333] +hflip: 0.5 +interpolation: "bilinear" +crop_pct: 0.875 + +# model +model: "resnet34" +num_classes: 1000 +pretrained: False +ckpt_path: "" +keep_checkpoint_max: 30 +ckpt_save_dir: "./ckpt" +epoch_size: 120 +dataset_sink_mode: True +amp_level: "O2" + +# loss +loss: "CE" +label_smoothing: 0.1 + +# lr scheduler +scheduler: "cosine_decay" +min_lr: 0.0 +lr: 0.1 +warmup_epochs: 0 +decay_epochs: 120 + +# optimizer +opt: "momentum" +filter_bias_and_bn: True +momentum: 0.9 +weight_decay: 0.0001 
+loss_scale: 1024 +use_nesterov: False diff --git a/configs/resnet/resnet_34_gpu.yaml b/configs/resnet/resnet_34_gpu.yaml new file mode 100644 index 0000000..9986d12 --- /dev/null +++ b/configs/resnet/resnet_34_gpu.yaml @@ -0,0 +1,50 @@ +# system +mode: 0 +distribute: True +num_parallel_workers: 8 + +# dataset +dataset: "imagenet" +data_dir: "/path/to/imagenet" +shuffle: True +dataset_download: False +batch_size: 32 +drop_remainder: True + +# augmentation +image_resize: 224 +scale: [0.08, 1.0] +ratio: [0.75, 1.333] +hflip: 0.5 +interpolation: "bilinear" +crop_pct: 0.875 + +# model +model: "resnet34" +num_classes: 1000 +pretrained: False +ckpt_path: "" +keep_checkpoint_max: 30 +ckpt_save_dir: "./ckpt" +epoch_size: 120 +dataset_sink_mode: True +amp_level: "O2" + +# loss +loss: "CE" +label_smoothing: 0.1 + +# lr scheduler +scheduler: "cosine_decay" +min_lr: 0.0 +lr: 0.1 +warmup_epochs: 0 +decay_epochs: 120 + +# optimizer +opt: "momentum" +filter_bias_and_bn: True +momentum: 0.9 +weight_decay: 0.0001 +loss_scale: 1024 +use_nesterov: False diff --git a/configs/resnet/resnet_50_ascend.yaml b/configs/resnet/resnet_50_ascend.yaml new file mode 100644 index 0000000..df97da0 --- /dev/null +++ b/configs/resnet/resnet_50_ascend.yaml @@ -0,0 +1,50 @@ +# system +mode: 0 +distribute: True +num_parallel_workers: 8 + +# dataset +dataset: "imagenet" +data_dir: "/path/to/imagenet" +shuffle: True +dataset_download: False +batch_size: 32 +drop_remainder: True + +# augmentation +image_resize: 224 +scale: [0.08, 1.0] +ratio: [0.75, 1.333] +hflip: 0.5 +interpolation: "bilinear" +crop_pct: 0.875 + +# model +model: "resnet50" +num_classes: 1000 +pretrained: False +ckpt_path: "" +keep_checkpoint_max: 30 +ckpt_save_dir: "./ckpt" +epoch_size: 120 +dataset_sink_mode: True +amp_level: "O2" + +# loss +loss: "CE" +label_smoothing: 0.1 + +# lr scheduler +scheduler: "cosine_decay" +min_lr: 0.0 +lr: 0.1 +warmup_epochs: 0 +decay_epochs: 120 + +# optimizer +opt: "momentum" +filter_bias_and_bn: True +momentum: 0.9 +weight_decay: 0.0001 +loss_scale: 1024 +use_nesterov: False diff --git a/configs/resnet/resnet_50_gpu.yaml b/configs/resnet/resnet_50_gpu.yaml new file mode 100644 index 0000000..df97da0 --- /dev/null +++ b/configs/resnet/resnet_50_gpu.yaml @@ -0,0 +1,50 @@ +# system +mode: 0 +distribute: True +num_parallel_workers: 8 + +# dataset +dataset: "imagenet" +data_dir: "/path/to/imagenet" +shuffle: True +dataset_download: False +batch_size: 32 +drop_remainder: True + +# augmentation +image_resize: 224 +scale: [0.08, 1.0] +ratio: [0.75, 1.333] +hflip: 0.5 +interpolation: "bilinear" +crop_pct: 0.875 + +# model +model: "resnet50" +num_classes: 1000 +pretrained: False +ckpt_path: "" +keep_checkpoint_max: 30 +ckpt_save_dir: "./ckpt" +epoch_size: 120 +dataset_sink_mode: True +amp_level: "O2" + +# loss +loss: "CE" +label_smoothing: 0.1 + +# lr scheduler +scheduler: "cosine_decay" +min_lr: 0.0 +lr: 0.1 +warmup_epochs: 0 +decay_epochs: 120 + +# optimizer +opt: "momentum" +filter_bias_and_bn: True +momentum: 0.9 +weight_decay: 0.0001 +loss_scale: 1024 +use_nesterov: False diff --git a/configs/resnetv2/README.md b/configs/resnetv2/README.md new file mode 100644 index 0000000..699ec89 --- /dev/null +++ b/configs/resnetv2/README.md @@ -0,0 +1,87 @@ +# ResNetV2 + +> [Identity Mappings in Deep Residual Networks](https://arxiv.org/abs/1603.05027) + +## Introduction + +Author analyzes the propagation formulations behind the residual building blocks, which suggest that the forward and backward signals can be directly propagated from one 
block +to any other block, when using identity mappings as the skip connections and after-addition activation. + +

+ Figure 1. Architecture of ResNetV2 [1] +
+ +## Results + +Our reproduced model performance on ImageNet-1K is reported as follows. + +
+ +| Model | Context | Top-1 (%) | Top-5 (%) | Params (M) | Recipe | Download | +|-----------------|-----------|-----------|-----------|-------|-------------------------------------------------------------------------------------------------|------------------------------------------------------------------------------------| +| ResNetv2_50 | D910x8-G | 76.90 | 93.37 | 25.60 | [yaml](https://github.com/mindspore-lab/mindcv/blob/main/configs/resnetv2/resnetv2_50_ascend.yaml) | [weights](https://download.mindspore.cn/toolkits/mindcv/resnetv2/resnetv2_50-3c2f143b.ckpt) | +| ResNetv2_101 | D910x8-G | 78.48 | 94.23 | 44.55 | [yaml](https://github.com/mindspore-lab/mindcv/blob/main/configs/resnetv2/resnetv2_101_ascend.yaml) | [weights](https://download.mindspore.cn/toolkits/mindcv/resnetv2/resnetv2_101-5d4c49a1.ckpt) | + +
+ +#### Notes +- Context: Training context denoted as {device}x{pieces}-{MS mode}, where mindspore mode can be G - graph mode or F - pynative mode with ms function. For example, D910x8-G is for training on 8 pieces of Ascend 910 NPU using graph mode. +- Top-1 and Top-5: Accuracy reported on the validation set of ImageNet-1K. + + +## Quick Start + +### Preparation + +#### Installation +Please refer to the [installation instruction](https://github.com/mindspore-lab/mindcv#installation) in MindCV. + +#### Dataset Preparation +Please download the [ImageNet-1K](https://www.image-net.org/challenges/LSVRC/2012/index.php) dataset for model training and validation. + +### Training + +* Distributed Training + +It is easy to reproduce the reported results with the pre-defined training recipe. For distributed training on multiple Ascend 910 devices, please run + +```shell +# distributed training on multiple GPU/Ascend devices +mpirun -n 8 python train.py --config configs/resnetv2/resnetv2_50_ascend.yaml --data_dir /path/to/imagenet +``` + +> If the script is executed by the root user, the `--allow-run-as-root` parameter must be added to `mpirun`. + +Similarly, you can train the model on multiple GPU devices with the above `mpirun` command. + +For detailed illustration of all hyper-parameters, please refer to [config.py](https://github.com/mindspore-lab/mindcv/blob/main/config.py). + +**Note:** As the global batch size (batch_size x num_devices) is an important hyper-parameter, it is recommended to keep the global batch size unchanged for reproduction or adjust the learning rate linearly to a new global batch size. + +* Standalone Training + +If you want to train or finetune the model on a smaller dataset without distributed training, please run: + +```shell +# standalone training on a CPU/GPU/Ascend device +python train.py --config configs/resnetv2/resnetv2_50_ascend.yaml --data_dir /path/to/dataset --distribute False +``` + +### Validation + +To validate the accuracy of the trained model, you can use `validate.py` and parse the checkpoint path with `--ckpt_path`. + +```shell +python validate.py -c configs/resnetv2/resnetv2_50_ascend.yaml --data_dir /path/to/imagenet --ckpt_path /path/to/ckpt +``` + +### Deployment + +To deploy online inference services with the trained model efficiently, please refer to the [deployment tutorial](https://github.com/mindspore-lab/mindcv/blob/main/tutorials/deployment.md). + +## References + +[1] He K, Zhang X, Ren S, et al. Identity mappings in deep residual networks[C]//Computer Vision–ECCV 2016: 14th European Conference, Amsterdam, The Netherlands, October 11–14, 2016, Proceedings, Part IV 14. Springer International Publishing, 2016: 630-645. 
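As the note above says, mpirun only launches under the root user when --allow-run-as-root is added; a sketch of the resulting command, e.g. when training inside a container as root:

```shell
# Same distributed command as above, plus the flag required when running as root.
mpirun --allow-run-as-root -n 8 python train.py \
    --config configs/resnetv2/resnetv2_50_ascend.yaml \
    --data_dir /path/to/imagenet
```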
diff --git a/configs/resnetv2/resnetv2_101_ascend.yaml b/configs/resnetv2/resnetv2_101_ascend.yaml new file mode 100644 index 0000000..28c3fb8 --- /dev/null +++ b/configs/resnetv2/resnetv2_101_ascend.yaml @@ -0,0 +1,51 @@ +# system +mode: 0 +distribute: True +num_parallel_workers: 8 +val_while_train: True + +# dataset +dataset: "imagenet" +data_dir: "/path/to/imagenet" +shuffle: True +dataset_download: False +batch_size: 32 +drop_remainder: True + +# augmentation +image_resize: 224 +scale: [0.08, 1.0] +ratio: [0.75, 1.333] +hflip: 0.5 +interpolation: "bilinear" +crop_pct: 0.875 + +# model +model: "resnetv2_101" +num_classes: 1000 +pretrained: False +ckpt_path: "" +keep_checkpoint_max: 10 +ckpt_save_dir: "./ckpt" +epoch_size: 120 +dataset_sink_mode: True +amp_level: "O2" + +# loss +loss: "CE" +label_smoothing: 0.1 + +# lr scheduler +scheduler: "cosine_decay" +min_lr: 0.0 +lr: 0.1 +warmup_epochs: 0 +decay_epochs: 120 + +# optimizer +opt: "momentum" +filter_bias_and_bn: True +momentum: 0.9 +weight_decay: 0.0001 +loss_scale: 1024 +use_nesterov: False diff --git a/configs/resnetv2/resnetv2_50_ascend.yaml b/configs/resnetv2/resnetv2_50_ascend.yaml new file mode 100644 index 0000000..411034e --- /dev/null +++ b/configs/resnetv2/resnetv2_50_ascend.yaml @@ -0,0 +1,51 @@ +# system +mode: 0 +distribute: True +num_parallel_workers: 8 +val_while_train: True + +# dataset +dataset: "imagenet" +data_dir: "/path/to/imagenet" +shuffle: True +dataset_download: False +batch_size: 32 +drop_remainder: True + +# augmentation +image_resize: 224 +scale: [0.08, 1.0] +ratio: [0.75, 1.333] +hflip: 0.5 +interpolation: "bilinear" +crop_pct: 0.875 + +# model +model: "resnetv2_50" +num_classes: 1000 +pretrained: False +ckpt_path: "" +keep_checkpoint_max: 30 +ckpt_save_dir: "./ckpt" +epoch_size: 120 +dataset_sink_mode: True +amp_level: "O2" + +# loss +loss: "CE" +label_smoothing: 0.1 + +# lr scheduler +scheduler: "cosine_decay" +min_lr: 0.0 +lr: 0.1 +warmup_epochs: 0 +decay_epochs: 120 + +# optimizer +opt: "momentum" +filter_bias_and_bn: True +momentum: 0.9 +weight_decay: 0.0001 +loss_scale: 1024 +use_nesterov: False diff --git a/configs/resnext/README.md b/configs/resnext/README.md new file mode 100644 index 0000000..07cf74a --- /dev/null +++ b/configs/resnext/README.md @@ -0,0 +1,93 @@ +# ResNeXt +> [Aggregated Residual Transformations for Deep Neural Networks](https://arxiv.org/abs/1611.05431) + +## Introduction + +The authors present a simple, highly modularized network architecture for image classification. The network is +constructed by repeating a building block that aggregates a set of transformations with the same topology. The simple +design results in a homogeneous, multi-branch architecture that has only a few hyper-parameters to set. This strategy +exposes a new dimension, which the authors call "cardinality" (the size of the set of transformations), as an essential +factor in addition to the dimensions of depth and width. On the ImageNet-1K dataset, the authors empirically show that +even under the restricted condition of maintaining complexity, increasing cardinality is able to improve classification +accuracy.[[1](#references)] + +

+ Figure 1. Architecture of ResNeXt [1] +
+ +## Results + +Our reproduced model performance on ImageNet-1K is reported as follows. + +
+ +| Model | Context | Top-1 (%) | Top-5 (%) | Params (M) | Recipe | Download | +|------------------|----------|-----------|-----------|------------|--------------------------------------------------------------------------------------------------------|-------------------------------------------------------------------------------------------------| +| ResNeXt50_32x4d | D910x8-G | 78.53 | 94.10 | 25.10 | [yaml](https://github.com/mindspore-lab/mindcv/blob/main/configs/resnext/resnext50_32x4d_ascend.yaml) | [weights](https://download.mindspore.cn/toolkits/mindcv/resnext/resnext50_32x4d-af8aba16.ckpt) | +| ResNeXt101_32x4d | D910x8-G | 79.83 | 94.80 | 44.32 | [yaml](https://github.com/mindspore-lab/mindcv/blob/main/configs/resnext/resnext101_32x4d_ascend.yaml) | [weights](https://download.mindspore.cn/toolkits/mindcv/resnext/resnext101_32x4d-3c1e9c51.ckpt) | +| ResNeXt101_64x4d | D910x8-G | 80.30 | 94.82 | 83.66 | [yaml](https://github.com/mindspore-lab/mindcv/blob/main/configs/resnext/resnext101_64x4d_ascend.yaml) | [weights](https://download.mindspore.cn/toolkits/mindcv/resnext/resnext101_64x4d-8929255b.ckpt) | +| ResNeXt152_64x4d | D910x8-G | 80.52 | 95.00 | 115.27 | [yaml](https://github.com/mindspore-lab/mindcv/blob/main/configs/resnext/resnext152_64x4d_ascend.yaml) | [weights](https://download.mindspore.cn/toolkits/mindcv/resnext/resnext152_64x4d-3aba275c.ckpt) | + +
+ +#### Notes + +- Context: Training context denoted as {device}x{pieces}-{MS mode}, where mindspore mode can be G - graph mode or F - pynative mode with ms function. For example, D910x8-G is for training on 8 pieces of Ascend 910 NPU using graph mode. +- Top-1 and Top-5: Accuracy reported on the validation set of ImageNet-1K. + +## Quick Start + +### Preparation + +#### Installation +Please refer to the [installation instruction](https://github.com/mindspore-ecosystem/mindcv#installation) in MindCV. + +#### Dataset Preparation +Please download the [ImageNet-1K](https://www.image-net.org/challenges/LSVRC/2012/index.php) dataset for model training and validation. + +### Training + +* Distributed Training + +It is easy to reproduce the reported results with the pre-defined training recipe. For distributed training on multiple Ascend 910 devices, please run + +```shell +# distributed training on multiple GPU/Ascend devices +mpirun -n 8 python train.py --config configs/resnext/resnext50_32x4d_ascend.yaml --data_dir /path/to/imagenet +``` + +> If the script is executed by the root user, the `--allow-run-as-root` parameter must be added to `mpirun`. + +Similarly, you can train the model on multiple GPU devices with the above `mpirun` command. + +For detailed illustration of all hyper-parameters, please refer to [config.py](https://github.com/mindspore-lab/mindcv/blob/main/config.py). + +**Note:** As the global batch size (batch_size x num_devices) is an important hyper-parameter, it is recommended to keep the global batch size unchanged for reproduction or adjust the learning rate linearly to a new global batch size. + +* Standalone Training + +If you want to train or finetune the model on a smaller dataset without distributed training, please run: + +```shell +# standalone training on a CPU/GPU/Ascend device +python train.py --config configs/resnext/resnext50_32x4d_ascend.yaml --data_dir /path/to/dataset --distribute False +``` + +### Validation + +To validate the accuracy of the trained model, you can use `validate.py` and parse the checkpoint path with `--ckpt_path`. + +```shell +python validate.py -c configs/resnext/resnext50_32x4d_ascend.yaml --data_dir /path/to/imagenet --ckpt_path /path/to/ckpt +``` + +### Deployment + +Please refer to the [deployment tutorial](https://github.com/mindspore-lab/mindcv/blob/main/tutorials/deployment.md) in MindCV. + +## References + +[1] Xie S, Girshick R, Dollár P, et al. Aggregated residual transformations for deep neural networks[C]//Proceedings of the IEEE conference on computer vision and pattern recognition. 2017: 1492-1500. 
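The standalone command above is also the entry point for fine-tuning on a custom dataset. A sketch for adapting ResNeXt50_32x4d to a 10-class dataset, assuming the yaml keys ckpt_path and num_classes can be overridden on the command line (please check config.py for the exact flag names):

```shell
# Fine-tuning sketch; the --num_classes and --ckpt_path overrides are assumptions.
python train.py --config configs/resnext/resnext50_32x4d_ascend.yaml \
    --data_dir /path/to/your_dataset \
    --distribute False \
    --num_classes 10 \
    --ckpt_path /path/to/resnext50_32x4d.ckpt
```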
diff --git a/configs/resnext/resnext101_32x4d_ascend.yaml b/configs/resnext/resnext101_32x4d_ascend.yaml new file mode 100644 index 0000000..16ae3dd --- /dev/null +++ b/configs/resnext/resnext101_32x4d_ascend.yaml @@ -0,0 +1,51 @@ +# system +mode: 0 +distribute: True +num_parallel_workers: 8 +val_while_train: True + +# dataset +dataset: "imagenet" +data_dir: "/path/to/imagenet" +shuffle: True +dataset_download: False +batch_size: 32 +drop_remainder: True + +# augmentation +image_resize: 224 +scale: [0.08, 1.0] +ratio: [0.75, 1.333] +hflip: 0.5 +interpolation: "bilinear" +crop_pct: 0.875 + +# model +model: "resnext101_32x4d" +num_classes: 1000 +pretrained: False +ckpt_path: "" +keep_checkpoint_max: 30 +ckpt_save_dir: "./ckpt" +epoch_size: 150 +dataset_sink_mode: True +amp_level: "O2" + +# loss +loss: "CE" +label_smoothing: 0.1 + +# lr scheduler +scheduler: "cosine_decay" +min_lr: 0.0 +lr: 0.1 +warmup_epochs: 0 +decay_epochs: 150 + +# optimizer +opt: "momentum" +filter_bias_and_bn: True +momentum: 0.9 +weight_decay: 0.0001 +loss_scale: 1024 +use_nesterov: False diff --git a/configs/resnext/resnext101_64x4d_ascend.yaml b/configs/resnext/resnext101_64x4d_ascend.yaml new file mode 100644 index 0000000..888212a --- /dev/null +++ b/configs/resnext/resnext101_64x4d_ascend.yaml @@ -0,0 +1,51 @@ +# system +mode: 0 +distribute: True +num_parallel_workers: 8 +val_while_train: True + +# dataset +dataset: "imagenet" +data_dir: "/path/to/imagenet" +shuffle: True +dataset_download: False +batch_size: 32 +drop_remainder: True + +# augmentation +image_resize: 224 +scale: [0.08, 1.0] +ratio: [0.75, 1.333] +hflip: 0.5 +interpolation: "bilinear" +crop_pct: 0.875 + +# model +model: "resnext101_64x4d" +num_classes: 1000 +pretrained: False +ckpt_path: "" +keep_checkpoint_max: 30 +ckpt_save_dir: "./ckpt" +epoch_size: 150 +dataset_sink_mode: True +amp_level: "O2" + +# loss +loss: "CE" +label_smoothing: 0.1 + +# lr scheduler +scheduler: "cosine_decay" +min_lr: 0.0 +lr: 0.1 +warmup_epochs: 0 +decay_epochs: 150 + +# optimizer +opt: "momentum" +filter_bias_and_bn: True +momentum: 0.9 +weight_decay: 0.0001 +loss_scale: 1024 +use_nesterov: False diff --git a/configs/resnext/resnext152_64x4d_ascend.yaml b/configs/resnext/resnext152_64x4d_ascend.yaml new file mode 100644 index 0000000..7ae0063 --- /dev/null +++ b/configs/resnext/resnext152_64x4d_ascend.yaml @@ -0,0 +1,51 @@ +# system +mode: 0 +distribute: True +num_parallel_workers: 8 +val_while_train: True + +# dataset +dataset: "imagenet" +data_dir: "/path/to/imagenet" +shuffle: True +dataset_download: False +batch_size: 32 +drop_remainder: True + +# augmentation +image_resize: 224 +scale: [0.08, 1.0] +ratio: [0.75, 1.333] +hflip: 0.5 +interpolation: "bilinear" +crop_pct: 0.875 + +# model +model: "resnext152_64x4d" +num_classes: 1000 +pretrained: False +ckpt_path: "" +keep_checkpoint_max: 30 +ckpt_save_dir: "./ckpt" +epoch_size: 150 +dataset_sink_mode: True +amp_level: "O2" + +# loss +loss: "CE" +label_smoothing: 0.1 + +# lr scheduler +scheduler: "cosine_decay" +min_lr: 0.0 +lr: 0.1 +warmup_epochs: 0 +decay_epochs: 150 + +# optimizer +opt: "momentum" +filter_bias_and_bn: True +momentum: 0.9 +weight_decay: 0.0001 +loss_scale: 1024 +use_nesterov: False diff --git a/configs/resnext/resnext50_32x4d_ascend.yaml b/configs/resnext/resnext50_32x4d_ascend.yaml new file mode 100644 index 0000000..c4d4669 --- /dev/null +++ b/configs/resnext/resnext50_32x4d_ascend.yaml @@ -0,0 +1,51 @@ +# system +mode: 0 +distribute: True +num_parallel_workers: 8 +val_while_train: True + +# dataset 
+dataset: "imagenet" +data_dir: "/path/to/imagenet" +shuffle: True +dataset_download: False +batch_size: 32 +drop_remainder: True + +# augmentation +image_resize: 224 +scale: [0.08, 1.0] +ratio: [0.75, 1.333] +hflip: 0.5 +interpolation: "bilinear" +crop_pct: 0.875 + +# model +model: "resnext50_32x4d" +num_classes: 1000 +pretrained: False +ckpt_path: "" +keep_checkpoint_max: 30 +ckpt_save_dir: "./ckpt" +epoch_size: 150 +dataset_sink_mode: True +amp_level: "O2" + +# loss +loss: "CE" +label_smoothing: 0.1 + +# lr scheduler +scheduler: "cosine_decay" +min_lr: 0.0 +lr: 0.1 +warmup_epochs: 0 +decay_epochs: 150 + +# optimizer +opt: "momentum" +filter_bias_and_bn: True +momentum: 0.9 +weight_decay: 0.0001 +loss_scale: 1024 +use_nesterov: False diff --git a/configs/rexnet/README.md b/configs/rexnet/README.md new file mode 100644 index 0000000..2e5ccda --- /dev/null +++ b/configs/rexnet/README.md @@ -0,0 +1,82 @@ +# ReXNet + +> [ReXNet: Rethinking Channel Dimensions for Efficient Model Design](https://arxiv.org/abs/2007.00992) + +## Introduction + +ReXNets is a new model achieved based on parameterization. It utilizes a new search method for a channel configuration via piece-wise linear functions of block index. The search space contains the conventions, and an effective channel configuration that can be parameterized by a linear function of the block index is used. ReXNets outperforms the recent lightweight models including NAS-based models and further showed remarkable fine-tuning performances on COCO object detection, instance segmentation, and fine-grained classifications. + +## Results + +Our reproduced model performance on ImageNet-1K is reported as follows. + +
+ +| Model | Context | Top-1 (%) | Top-5 (%) | Params (M) | Recipe | Download | +|-----------------|-----------|-------|-------|------------|------------------------------------------------------------------------------------------|----------------------------------------------------------------------------------| +| rexnet_x09 | D910x8-G | 77.07 | 93.41 | 4.13 | [yaml](https://github.com/mindspore-lab/mindcv/blob/main/configs/rexnet/rexnet_x09_ascend.yaml) | [weights](https://download.mindspore.cn/toolkits/mindcv/rexnet/rexnet_09-da498331.ckpt) | +| rexnet_x10 | D910x8-G | 77.38 | 93.60 | 4.84 | [yaml](https://github.com/mindspore-lab/mindcv/blob/main/configs/rexnet/rexnet_x10_ascend.yaml) | [weights](https://download.mindspore.cn/toolkits/mindcv/rexnet/rexnet_10-c5fb2dc7.ckpt) | +| rexnet_x13 | D910x8-G | 79.06 | 94.28 | 7.61 | [yaml](https://github.com/mindspore-lab/mindcv/blob/main/configs/rexnet/rexnet_x13_ascend.yaml) | [weights](https://download.mindspore.cn/toolkits/mindcv/rexnet/rexnet_13-a49c41e5.ckpt) | +| rexnet_x15 | D910x8-G | 79.94 | 94.74 | 9.79 | [yaml](https://github.com/mindspore-lab/mindcv/blob/main/configs/rexnet/rexnet_x15_ascend.yaml) | [weights](https://download.mindspore.cn/toolkits/mindcv/rexnet/rexnet_15-37a931d3.ckpt) | +| rexnet_x20 | D910x8-G | 80.64 | 94.99 | 16.45 | [yaml](https://github.com/mindspore-lab/mindcv/blob/main/configs/rexnet/rexnet_x20_ascend.yaml) | [weights](https://download.mindspore.cn/toolkits/mindcv/rexnet/rexnet_20-c5810914.ckpt) | + +
+ +#### Notes +- Context: Training context denoted as {device}x{pieces}-{MS mode}, where mindspore mode can be G - graph mode or F - pynative mode with ms function. For example, D910x8-G is for training on 8 pieces of Ascend 910 NPU using graph mode. +- Top-1 and Top-5: Accuracy reported on the validation set of ImageNet-1K. + + +## Quick Start + +### Preparation + +#### Installation +Please refer to the [installation instruction](https://github.com/mindspore-lab/mindcv#installation) in MindCV. + +#### Dataset Preparation +Please download the [ImageNet-1K](https://www.image-net.org/challenges/LSVRC/2012/index.php) dataset for model training and validation. + +### Training + +* Distributed Training + +It is easy to reproduce the reported results with the pre-defined training recipe. For distributed training on multiple Ascend 910 devices, please run + +```shell +# distributed training on multiple GPU/Ascend devices +mpirun -n 8 python train.py --config configs/rexnet/rexnet_x09_ascend.yaml --data_dir /path/to/imagenet +``` + +> If the script is executed by the root user, the `--allow-run-as-root` parameter must be added to `mpirun`. + +Similarly, you can train the model on multiple GPU devices with the above `mpirun` command. + +For detailed illustration of all hyper-parameters, please refer to [config.py](https://github.com/mindspore-lab/mindcv/blob/main/config.py). + +**Note:** As the global batch size (batch_size x num_devices) is an important hyper-parameter, it is recommended to keep the global batch size unchanged for reproduction or adjust the learning rate linearly to a new global batch size. + +* Standalone Training + +If you want to train or finetune the model on a smaller dataset without distributed training, please run: + +```shell +# standalone training on a CPU/GPU/Ascend device +python train.py --config configs/rexnet/rexnet_x09_ascend.yaml --data_dir /path/to/dataset --distribute False +``` + +### Validation + +To validate the accuracy of the trained model, you can use `validate.py` and parse the checkpoint path with `--ckpt_path`. + +```shell +python validate.py -c configs/rexnet/rexnet_x09_ascend.yaml --data_dir /path/to/imagenet --ckpt_path /path/to/ckpt +``` + +### Deployment + +To deploy online inference services with the trained model efficiently, please refer to the [deployment tutorial](https://github.com/mindspore-lab/mindcv/blob/main/tutorials/deployment.md). + +## References + +[1] Han D, Yun S, Heo B, et al. Rethinking channel dimensions for efficient model design[C]//Proceedings of the IEEE/CVF conference on Computer Vision and Pattern Recognition. 2021: 732-741. 
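Putting training and evaluation together for ReXNet-x09: the checkpoint file name written under ckpt_save_dir depends on ckpt_save_policy and the training progress, so the path used below is an assumption to be replaced with the file actually produced in ./ckpt:

```shell
# Sketch: distributed training followed by evaluation of a saved checkpoint.
mpirun -n 8 python train.py \
    --config configs/rexnet/rexnet_x09_ascend.yaml \
    --data_dir /path/to/imagenet
python validate.py -c configs/rexnet/rexnet_x09_ascend.yaml \
    --data_dir /path/to/imagenet \
    --ckpt_path ./ckpt/rexnet_x09-best.ckpt   # assumed file name
```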
diff --git a/configs/rexnet/rexnet_x09_ascend.yaml b/configs/rexnet/rexnet_x09_ascend.yaml new file mode 100644 index 0000000..d6deb82 --- /dev/null +++ b/configs/rexnet/rexnet_x09_ascend.yaml @@ -0,0 +1,52 @@ +# system +mode: 0 +distribute: True +num_parallel_workers: 16 +val_while_train: True +val_interval: 1 + +# dataset +dataset: "imagenet" +data_dir: "/path/to/imagenet" +shuffle: True +dataset_download: False +batch_size: 64 +drop_remainder: True + +# augmentation +interpolation: "bicubic" +re_prob: 0.2 +auto_augment: "randaug-m9-mstd0.5" +re_value: "random" + +# model +model: "rexnet_x09" +num_classes: 1000 +pretrained: False +ckpt_path: "" +keep_checkpoint_max: 10 +ckpt_save_dir: "./ckpt" +ckpt_save_policy: "top_k" +epoch_size: 400 +dataset_sink_mode: True +amp_level: "O2" +drop_rate: 0.2 + +# loss +loss: "CE" +label_smoothing: 0.1 + +# lr scheduler +scheduler: "cosine_decay" +min_lr: 1.0e-6 +lr: 0.5 +warmup_epochs: 5 +decay_epochs: 395 + +# optimizer +opt: "sgd" +filter_bias_and_bn: True +momentum: 0.9 +weight_decay: 1.0e-5 +loss_scale: 1024 +use_nesterov: True diff --git a/configs/rexnet/rexnet_x10_ascend.yaml b/configs/rexnet/rexnet_x10_ascend.yaml new file mode 100644 index 0000000..7d16fda --- /dev/null +++ b/configs/rexnet/rexnet_x10_ascend.yaml @@ -0,0 +1,52 @@ +# system +mode: 0 +distribute: True +num_parallel_workers: 16 +val_while_train: True +val_interval: 1 + +# dataset +dataset: "imagenet" +data_dir: "/path/to/imagenet" +shuffle: True +dataset_download: False +batch_size: 64 +drop_remainder: True + +# augmentation +interpolation: "bicubic" +re_prob: 0.2 +auto_augment: "randaug-m9-mstd0.5" +re_value: "random" + +# model +model: "rexnet_x10" +num_classes: 1000 +pretrained: False +ckpt_path: "" +keep_checkpoint_max: 10 +ckpt_save_dir: "./ckpt" +ckpt_save_policy: "top_k" +epoch_size: 400 +dataset_sink_mode: True +amp_level: "O2" +drop_rate: 0.2 + +# loss +loss: "CE" +label_smoothing: 0.1 + +# lr scheduler +scheduler: "cosine_decay" +min_lr: 1.0e-6 +lr: 0.5 +warmup_epochs: 5 +decay_epochs: 395 + +# optimizer +opt: "sgd" +filter_bias_and_bn: True +momentum: 0.9 +weight_decay: 1.0e-5 +loss_scale: 1024 +use_nesterov: True diff --git a/configs/rexnet/rexnet_x13_ascend.yaml b/configs/rexnet/rexnet_x13_ascend.yaml new file mode 100644 index 0000000..f657e89 --- /dev/null +++ b/configs/rexnet/rexnet_x13_ascend.yaml @@ -0,0 +1,52 @@ +# system +mode: 0 +distribute: True +num_parallel_workers: 16 +val_while_train: True +val_interval: 1 + +# dataset +dataset: "imagenet" +data_dir: "/path/to/imagenet" +shuffle: True +dataset_download: False +batch_size: 64 +drop_remainder: True + +# augmentation +interpolation: "bicubic" +re_prob: 0.2 +auto_augment: "randaug-m9-mstd0.5" +re_value: "random" + +# model +model: "rexnet_x13" +num_classes: 1000 +pretrained: False +ckpt_path: "" +keep_checkpoint_max: 10 +ckpt_save_dir: "./ckpt" +ckpt_save_policy: "top_k" +epoch_size: 400 +dataset_sink_mode: True +amp_level: "O2" +drop_rate: 0.2 + +# loss +loss: "CE" +label_smoothing: 0.1 + +# lr scheduler +scheduler: "cosine_decay" +min_lr: 1.0e-6 +lr: 0.5 +warmup_epochs: 5 +decay_epochs: 395 + +# optimizer +opt: "sgd" +filter_bias_and_bn: True +momentum: 0.9 +weight_decay: 1.0e-5 +loss_scale: 1024 +use_nesterov: True diff --git a/configs/rexnet/rexnet_x15_ascend.yaml b/configs/rexnet/rexnet_x15_ascend.yaml new file mode 100644 index 0000000..22aa096 --- /dev/null +++ b/configs/rexnet/rexnet_x15_ascend.yaml @@ -0,0 +1,53 @@ +# system +mode: 0 +distribute: True +num_parallel_workers: 16 +val_while_train: True 
+val_interval: 1 + +# dataset +dataset: "imagenet" +data_dir: "/path/to/imagenet" +shuffle: True +dataset_download: False +batch_size: 64 +drop_remainder: True + +# augmentation +interpolation: "bicubic" +re_prob: 0.2 +auto_augment: "randaug-m9-mstd0.5" +re_value: "random" + +# model +model: "rexnet_x15" +num_classes: 1000 +pretrained: False +ckpt_path: "" +keep_checkpoint_max: 10 +ckpt_save_dir: "./ckpt" +ckpt_save_policy: "top_k" +epoch_size: 400 +dataset_sink_mode: True +amp_level: "O2" +drop_rate: 0.2 +drop_path_rate: 0.1 + +# loss +loss: "CE" +label_smoothing: 0.1 + +# lr scheduler +scheduler: "cosine_decay" +min_lr: 1.0e-6 +lr: 0.5 +warmup_epochs: 30 +decay_epochs: 370 + +# optimizer +opt: "sgd" +filter_bias_and_bn: True +momentum: 0.9 +weight_decay: 1.0e-5 +loss_scale: 1024 +use_nesterov: True diff --git a/configs/rexnet/rexnet_x20_ascend.yaml b/configs/rexnet/rexnet_x20_ascend.yaml new file mode 100644 index 0000000..17835dd --- /dev/null +++ b/configs/rexnet/rexnet_x20_ascend.yaml @@ -0,0 +1,53 @@ +# system +mode: 0 +distribute: True +num_parallel_workers: 16 +val_while_train: True +val_interval: 1 + +# dataset +dataset: "imagenet" +data_dir: "/path/to/imagenet" +shuffle: True +dataset_download: False +batch_size: 64 +drop_remainder: True + +# augmentation +interpolation: "bicubic" +re_prob: 0.2 +auto_augment: "randaug-m9-mstd0.5" +re_value: "random" + +# model +model: "rexnet_x20" +num_classes: 1000 +pretrained: False +ckpt_path: "" +keep_checkpoint_max: 10 +ckpt_save_dir: "./ckpt" +ckpt_save_policy: "top_k" +epoch_size: 400 +dataset_sink_mode: True +amp_level: "O2" +drop_rate: 0.2 +drop_path_rate: 0.1 + +# loss +loss: "CE" +label_smoothing: 0.1 + +# lr scheduler +scheduler: "cosine_decay" +min_lr: 1.0e-6 +lr: 0.5 +warmup_epochs: 30 +decay_epochs: 370 + +# optimizer +opt: "sgd" +filter_bias_and_bn: True +momentum: 0.9 +weight_decay: 1.0e-5 +loss_scale: 1024 +use_nesterov: True diff --git a/configs/senet/README.md b/configs/senet/README.md new file mode 100644 index 0000000..084a3e3 --- /dev/null +++ b/configs/senet/README.md @@ -0,0 +1,93 @@ +# SENet +> [Squeeze-and-Excitation Networks](https://arxiv.org/abs/1709.01507) + +## Introduction + +In this work, the authors focus instead on the channel relationship and propose a novel architectural unit, which the +authors term the "Squeeze-and-Excitation" (SE) block, that adaptively recalibrates channel-wise feature responses by +explicitly modelling interdependencies between channels. The results show that these blocks can be stacked together to +form SENet architectures that generalise extremely effectively across different datasets. The authors further +demonstrate that SE blocks bring significant improvements in performance for existing state-of-the-art CNNs at slight +additional computational cost.[[1](#references)] + +
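+
+The recalibration described above can be written in a few lines. Below is a minimal NumPy sketch of an SE block, not the MindCV implementation; the tensor shape, reduction ratio and weight matrices are illustrative assumptions.
+
+```python
+import numpy as np
+
+def se_block(x: np.ndarray, w1: np.ndarray, w2: np.ndarray) -> np.ndarray:
+    """Recalibrate the channel responses of x with shape (N, C, H, W)."""
+    # Squeeze: global average pooling yields one descriptor per channel.
+    s = x.mean(axis=(2, 3))                      # (N, C)
+    # Excitation: bottleneck FC -> ReLU -> FC -> sigmoid gives channel gates.
+    z = np.maximum(s @ w1, 0.0)                  # (N, C // r)
+    gates = 1.0 / (1.0 + np.exp(-(z @ w2)))      # (N, C), values in (0, 1)
+    # Scale: reweight each channel of the input feature map.
+    return x * gates[:, :, None, None]
+
+# Toy usage with C = 8 channels and reduction ratio r = 4.
+rng = np.random.default_rng(0)
+x = rng.standard_normal((2, 8, 16, 16))
+w1 = 0.1 * rng.standard_normal((8, 2))
+w2 = 0.1 * rng.standard_normal((2, 8))
+print(se_block(x, w1, w2).shape)                 # (2, 8, 16, 16)
+```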

+
+Figure 1. Architecture of SENet [1]
+
+## Results
+
+Our reproduced model performance on ImageNet-1K is reported as follows.
+
+ +| Model | Context | Top-1 (%) | Top-5 (%) | Params (M) | Recipe | Download | +|-------------------|----------|-----------|-----------|------------|-------------------------------------------------------------------------------------------------------|------------------------------------------------------------------------------------------------| +| SEResNet18 | D910x8-G | 71.81 | 90.49 | 11.80 | [yaml](https://github.com/mindspore-lab/mindcv/blob/main/configs/senet/seresnet18_ascend.yaml) | [weights](https://download.mindspore.cn/toolkits/mindcv/senet/seresnet18-7880643b.ckpt) | +| SEResNet34 | D910x8-G | 75.38 | 92.50 | 21.98 | [yaml](https://github.com/mindspore-lab/mindcv/blob/main/configs/senet/seresnet34_ascend.yaml) | [weights](https://download.mindspore.cn/toolkits/mindcv/senet/seresnet34-8179d3c9.ckpt) | +| SEResNet50 | D910x8-G | 78.32 | 94.07 | 28.14 | [yaml](https://github.com/mindspore-lab/mindcv/blob/main/configs/senet/seresnet50_ascend.yaml) | [weights](https://download.mindspore.cn/toolkits/mindcv/senet/seresnet50-ff9cd214.ckpt) | +| SEResNeXt26_32x4d | D910x8-G | 77.17 | 93.42 | 16.83 | [yaml](https://github.com/mindspore-lab/mindcv/blob/main/configs/senet/seresnext26_32x4d_ascend.yaml) | [weights](https://download.mindspore.cn/toolkits/mindcv/senet/seresnext26_32x4d-5361f5b6.ckpt) | +| SEResNeXt50_32x4d | D910x8-G | 78.71 | 94.36 | 27.63 | [yaml](https://github.com/mindspore-lab/mindcv/blob/main/configs/senet/seresnext50_32x4d_ascend.yaml) | [weights](https://download.mindspore.cn/toolkits/mindcv/senet/seresnext50_32x4d-fdc35aca.ckpt) | + +
+ +#### Notes + +- Context: Training context denoted as {device}x{pieces}-{MS mode}, where mindspore mode can be G - graph mode or F - pynative mode with ms function. For example, D910x8-G is for training on 8 pieces of Ascend 910 NPU using graph mode. +- Top-1 and Top-5: Accuracy reported on the validation set of ImageNet-1K. + +## Quick Start + +### Preparation + +#### Installation +Please refer to the [installation instruction](https://github.com/mindspore-ecosystem/mindcv#installation) in MindCV. + +#### Dataset Preparation +Please download the [ImageNet-1K](https://www.image-net.org/challenges/LSVRC/2012/index.php) dataset for model training and validation. + +### Training + +* Distributed Training + +It is easy to reproduce the reported results with the pre-defined training recipe. For distributed training on multiple Ascend 910 devices, please run + +```shell +# distributed training on multiple GPU/Ascend devices +mpirun -n 8 python train.py --config configs/senet/seresnet50_ascend.yaml --data_dir /path/to/imagenet +``` + +> If the script is executed by the root user, the `--allow-run-as-root` parameter must be added to `mpirun`. + +Similarly, you can train the model on multiple GPU devices with the above `mpirun` command. + +For detailed illustration of all hyper-parameters, please refer to [config.py](https://github.com/mindspore-lab/mindcv/blob/main/config.py). + +**Note:** As the global batch size (batch_size x num_devices) is an important hyper-parameter, it is recommended to keep the global batch size unchanged for reproduction or adjust the learning rate linearly to a new global batch size. + +* Standalone Training + +If you want to train or finetune the model on a smaller dataset without distributed training, please run: + +```shell +# standalone training on a CPU/GPU/Ascend device +python train.py --config configs/senet/seresnet50_ascend.yaml --data_dir /path/to/dataset --distribute False +``` + +### Validation + +To validate the accuracy of the trained model, you can use `validate.py` and parse the checkpoint path with `--ckpt_path`. + +```shell +python validate.py -c configs/senet/seresnet50_ascend.yaml --data_dir /path/to/imagenet --ckpt_path /path/to/ckpt +``` + +### Deployment + +Please refer to the [deployment tutorial](https://github.com/mindspore-lab/mindcv/blob/main/tutorials/deployment.md) in MindCV. + +## References + +[1] Hu J, Shen L, Sun G. Squeeze-and-excitation networks[C]//Proceedings of the IEEE conference on computer vision and pattern recognition. 2018: 7132-7141. 
diff --git a/configs/senet/seresnet18_ascend.yaml b/configs/senet/seresnet18_ascend.yaml new file mode 100644 index 0000000..c620d78 --- /dev/null +++ b/configs/senet/seresnet18_ascend.yaml @@ -0,0 +1,51 @@ +# system +mode: 0 +distribute: True +num_parallel_workers: 8 +val_while_train: True + +# dataset +dataset: "imagenet" +data_dir: "/path/to/imagenet" +shuffle: True +dataset_download: False +batch_size: 64 +drop_remainder: True + +# augmentation +image_resize: 224 +scale: [0.08, 1.0] +ratio: [0.75, 1.333] +hflip: 0.5 +interpolation: "bicubic" +crop_pct: 0.875 + +# model +model: "seresnet18" +num_classes: 1000 +pretrained: False +ckpt_path: "" +keep_checkpoint_max: 30 +ckpt_save_dir: "./ckpt" +epoch_size: 150 +dataset_sink_mode: True +amp_level: "O2" + +# loss +loss: "CE" +label_smoothing: 0.1 + +# lr scheduler +scheduler: "warmup_cosine_decay" +min_lr: 0.0 +lr: 0.075 +warmup_epochs: 0 +decay_epochs: 150 + +# optimizer +opt: "momentum" +filter_bias_and_bn: True +momentum: 0.9 +weight_decay: 0.0001 +loss_scale: 1024 +use_nesterov: False diff --git a/configs/senet/seresnet34_ascend.yaml b/configs/senet/seresnet34_ascend.yaml new file mode 100644 index 0000000..7a4233d --- /dev/null +++ b/configs/senet/seresnet34_ascend.yaml @@ -0,0 +1,51 @@ +# system +mode: 0 +distribute: True +num_parallel_workers: 8 +val_while_train: True + +# dataset +dataset: "imagenet" +data_dir: "path/to/imagenet" +shuffle: True +dataset_download: False +batch_size: 64 +drop_remainder: True + +# augmentation +image_resize: 224 +scale: [0.08, 1.0] +ratio: [0.75, 1.333] +hflip: 0.5 +interpolation: "bicubic" +crop_pct: 0.875 + +# model +model: "seresnet34" +num_classes: 1000 +pretrained: False +ckpt_path: "" +keep_checkpoint_max: 30 +ckpt_save_dir: "./ckpt" +epoch_size: 150 +dataset_sink_mode: True +amp_level: "O2" + +# loss +loss: "CE" +label_smoothing: 0.1 + +# lr scheduler +scheduler: "warmup_cosine_decay" +min_lr: 0.0 +lr: 0.075 +warmup_epochs: 0 +decay_epochs: 150 + +# optimizer +opt: "momentum" +filter_bias_and_bn: True +momentum: 0.9 +weight_decay: 0.0001 +loss_scale: 1024 +use_nesterov: False diff --git a/configs/senet/seresnet50_ascend.yaml b/configs/senet/seresnet50_ascend.yaml new file mode 100644 index 0000000..cc2ae04 --- /dev/null +++ b/configs/senet/seresnet50_ascend.yaml @@ -0,0 +1,51 @@ +# system +mode: 0 +distribute: True +num_parallel_workers: 8 +val_while_train: True + +# dataset +dataset: "imagenet" +data_dir: "path/to/imagenet" +shuffle: True +dataset_download: False +batch_size: 64 +drop_remainder: True + +# augmentation +image_resize: 224 +scale: [0.08, 1.0] +ratio: [0.75, 1.333] +hflip: 0.5 +interpolation: "bicubic" +crop_pct: 0.875 + +# model +model: "seresnet50" +num_classes: 1000 +pretrained: False +ckpt_path: "" +keep_checkpoint_max: 30 +ckpt_save_dir: "./ckpt" +epoch_size: 150 +dataset_sink_mode: True +amp_level: "O2" + +# loss +loss: "CE" +label_smoothing: 0.1 + +# lr scheduler +scheduler: "warmup_cosine_decay" +min_lr: 0.0 +lr: 0.075 +warmup_epochs: 0 +decay_epochs: 150 + +# optimizer +opt: "momentum" +filter_bias_and_bn: True +momentum: 0.9 +weight_decay: 0.0001 +loss_scale: 1024 +use_nesterov: False diff --git a/configs/senet/seresnext26_32x4d_ascend.yaml b/configs/senet/seresnext26_32x4d_ascend.yaml new file mode 100644 index 0000000..117f80f --- /dev/null +++ b/configs/senet/seresnext26_32x4d_ascend.yaml @@ -0,0 +1,51 @@ +# system +mode: 0 +distribute: True +num_parallel_workers: 8 +val_while_train: True + +# dataset +dataset: "imagenet" +data_dir: "path/to/imagenet" +shuffle: True 
+dataset_download: False +batch_size: 64 +drop_remainder: True + +# augmentation +image_resize: 224 +scale: [0.08, 1.0] +ratio: [0.75, 1.333] +hflip: 0.5 +interpolation: "bicubic" +crop_pct: 0.875 + +# model +model: "seresnext26_32x4d" +num_classes: 1000 +pretrained: False +ckpt_path: "" +keep_checkpoint_max: 30 +ckpt_save_dir: "./ckpt" +epoch_size: 150 +dataset_sink_mode: True +amp_level: "O2" + +# loss +loss: "CE" +label_smoothing: 0.1 + +# lr scheduler +scheduler: "warmup_cosine_decay" +min_lr: 0.0 +lr: 0.075 +warmup_epochs: 0 +decay_epochs: 150 + +# optimizer +opt: "momentum" +filter_bias_and_bn: True +momentum: 0.9 +weight_decay: 0.0001 +loss_scale: 1024 +use_nesterov: False diff --git a/configs/senet/seresnext50_32x4d_ascend.yaml b/configs/senet/seresnext50_32x4d_ascend.yaml new file mode 100644 index 0000000..95c17c4 --- /dev/null +++ b/configs/senet/seresnext50_32x4d_ascend.yaml @@ -0,0 +1,51 @@ +# system +mode: 0 +distribute: True +num_parallel_workers: 8 +val_while_train: True + +# dataset +dataset: "imagenet" +data_dir: "path/to/imagenet" +shuffle: True +dataset_download: False +batch_size: 64 +drop_remainder: True + +# augmentation +image_resize: 224 +scale: [0.08, 1.0] +ratio: [0.75, 1.333] +hflip: 0.5 +interpolation: "bicubic" +crop_pct: 0.875 + +# model +model: "seresnext50_32x4d" +num_classes: 1000 +pretrained: False +ckpt_path: "" +keep_checkpoint_max: 30 +ckpt_save_dir: "./ckpt" +epoch_size: 150 +dataset_sink_mode: True +amp_level: "O2" + +# loss +loss: "CE" +label_smoothing: 0.1 + +# lr scheduler +scheduler: "warmup_cosine_decay" +min_lr: 0.0 +lr: 0.075 +warmup_epochs: 0 +decay_epochs: 150 + +# optimizer +opt: "momentum" +filter_bias_and_bn: True +momentum: 0.9 +weight_decay: 0.0001 +loss_scale: 1024 +use_nesterov: False diff --git a/configs/shufflenet_v1/README.md b/configs/shufflenet_v1/README.md new file mode 100644 index 0000000..afb7670 --- /dev/null +++ b/configs/shufflenet_v1/README.md @@ -0,0 +1,87 @@ +# ShuffleNetV1 + +> [ShuffleNet: An Extremely Efficient Convolutional Neural Network for Mobile Devices](https://arxiv.org/abs/1707.01083) + +## Introduction + +ShuffleNet is a computationally efficient CNN model proposed by KuangShi Technology in 2017, which, like MobileNet and SqueezeNet, etc., is mainly intended to be applied to mobile. ShuffleNet uses two operations at its core: pointwise group convolution and channel shuffle, which greatly reduces the model computation while maintaining accuracy. ShuffleNet designs more efficient network structures to achieve smaller and faster models, instead of compressing or migrating a large trained model. + +
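+
+The channel shuffle operation mentioned above is just a reshape-transpose-reshape of the channel axis. A minimal NumPy sketch follows; the group count and tensor shape are illustrative assumptions.
+
+```python
+import numpy as np
+
+def channel_shuffle(x: np.ndarray, groups: int) -> np.ndarray:
+    """Shuffle the channels of x (N, C, H, W) so that information can flow
+    across the groups used by the pointwise group convolutions."""
+    n, c, h, w = x.shape
+    assert c % groups == 0
+    x = x.reshape(n, groups, c // groups, h, w)   # split channels into groups
+    x = x.transpose(0, 2, 1, 3, 4)                # interleave the groups
+    return x.reshape(n, c, h, w)
+
+x = np.arange(2 * 6 * 1 * 1).reshape(2, 6, 1, 1)
+# With 3 groups, channel order 0,1,2,3,4,5 becomes 0,2,4,1,3,5.
+print(channel_shuffle(x, 3)[0, :, 0, 0])
+```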

+
+Figure 1. Architecture of ShuffleNetV1 [1]
+
+## Results
+
+Our reproduced model performance on ImageNet-1K is reported as follows.
+
+ +| Model | Context | Top-1 (%) | Top-5 (%) | Params (M) | Recipe | Download | +|-----------------------|----------|-----------|-----------|------------|---------------------------------------------------------------------------------------------------------------|----------------------------------------------------------------------------------------------------------------------| +| shufflenet_v1_g3_x0_5 | D910x8-G | 57.05 | 79.73 | 0.73 | [yaml](https://github.com/mindspore-lab/mindcv/blob/main/configs/shufflenet_v1/shufflenet_v1_0.5_ascend.yaml) | [weights](https://download.mindspore.cn/toolkits/mindcv/shufflenet/shufflenetv1/shufflenet_v1_g3_05-42cfe109.ckpt) | +| shufflenet_v1_g3_x1_0 | D910x8-G | 67.77 | 87.73 | 1.89 | [yaml](https://github.com/mindspore-lab/mindcv/blob/main/configs/shufflenet_v1/shufflenet_v1_1.0_ascend.yaml) | [weights](https://download.mindspore.cn/toolkits/mindcv/shufflenet/shufflenetv1/shufflenet_v1_g3_10-245f0ccf.ckpt) | + +
+ +#### Notes +- Context: Training context denoted as {device}x{pieces}-{MS mode}, where mindspore mode can be G - graph mode or F - pynative mode with ms function. For example, D910x8-G is for training on 8 pieces of Ascend 910 NPU using graph mode. +- Top-1 and Top-5: Accuracy reported on the validation set of ImageNet-1K. + + +## Quick Start + +### Preparation + +#### Installation +Please refer to the [installation instruction](https://github.com/mindspore-lab/mindcv#installation) in MindCV. + +#### Dataset Preparation +Please download the [ImageNet-1K](https://www.image-net.org/challenges/LSVRC/2012/index.php) dataset for model training and validation. + +### Training + +* Distributed Training + +It is easy to reproduce the reported results with the pre-defined training recipe. For distributed training on multiple Ascend 910 devices, please run + +```shell +# distributed training on multiple GPU/Ascend devices +mpirun -n 8 python train.py --config configs/shufflenet_v1/shufflenet_v1_0.5_ascend.yaml --data_dir /path/to/imagenet +``` + +> If the script is executed by the root user, the `--allow-run-as-root` parameter must be added to `mpirun`. + +Similarly, you can train the model on multiple GPU devices with the above `mpirun` command. + +For detailed illustration of all hyper-parameters, please refer to [config.py](https://github.com/mindspore-lab/mindcv/blob/main/config.py). + +**Note:** As the global batch size (batch_size x num_devices) is an important hyper-parameter, it is recommended to keep the global batch size unchanged for reproduction or adjust the learning rate linearly to a new global batch size. + +* Standalone Training + +If you want to train or finetune the model on a smaller dataset without distributed training, please run: + +```shell +# standalone training on a CPU/GPU/Ascend device +python train.py --config configs/shufflenet_v1/shufflenet_v1_0.5_ascend.yaml --data_dir /path/to/dataset --distribute False +``` + +### Validation + +To validate the accuracy of the trained model, you can use `validate.py` and parse the checkpoint path with `--ckpt_path`. + +```shell +python validate.py -c configs/shufflenet_v1/shufflenet_v1_0.5_ascend.yaml --data_dir /path/to/imagenet --ckpt_path /path/to/ckpt +``` + +### Deployment + +To deploy online inference services with the trained model efficiently, please refer to the [deployment tutorial](https://github.com/mindspore-lab/mindcv/blob/main/tutorials/deployment.md). + +## References + +[1] Zhang X, Zhou X, Lin M, et al. Shufflenet: An extremely efficient convolutional neural network for mobile devices[C]//Proceedings of the IEEE conference on computer vision and pattern recognition. 2018: 6848-6856. 
diff --git a/configs/shufflenet_v1/shufflenet_v1_0.5_ascend.yaml b/configs/shufflenet_v1/shufflenet_v1_0.5_ascend.yaml new file mode 100644 index 0000000..68d5482 --- /dev/null +++ b/configs/shufflenet_v1/shufflenet_v1_0.5_ascend.yaml @@ -0,0 +1,49 @@ +# system +mode: 0 +distribute: True +num_parallel_workers: 8 + +# dataset +dataset: "imagenet" +data_dir: "/path/to/imagenet" +shuffle: True +dataset_download: False +batch_size: 64 +drop_remainder: True + +# augmentation +image_resize: 224 +hflip: 0.5 +color_jitter: 0.4 +interpolation: "bilinear" +crop_pct: 0.875 + +# model +model: "shufflenet_v1_g3_x0_5" +num_classes: 1000 +pretrained: False +ckpt_path: "" +keep_checkpoint_max: 30 +ckpt_save_dir: "./ckpt" +epoch_size: 250 +dataset_sink_mode: True +amp_level: "O0" + +# loss +loss: "CE" +label_smoothing: 0.1 + +# lr scheduler +scheduler: "cosine_decay" +min_lr: 0.0 +lr: 0.35 +warmup_epochs: 4 +decay_epochs: 246 + +# optimizer +opt: "momentum" +filter_bias_and_bn: False +momentum: 0.9 +weight_decay: 0.00004 +loss_scale: 1024 +use_nesterov: False diff --git a/configs/shufflenet_v1/shufflenet_v1_1.0_ascend.yaml b/configs/shufflenet_v1/shufflenet_v1_1.0_ascend.yaml new file mode 100644 index 0000000..08fc85e --- /dev/null +++ b/configs/shufflenet_v1/shufflenet_v1_1.0_ascend.yaml @@ -0,0 +1,49 @@ +# system +mode: 0 +distribute: True +num_parallel_workers: 8 + +# dataset +dataset: "imagenet" +data_dir: "/path/to/imagenet" +shuffle: True +dataset_download: False +batch_size: 64 +drop_remainder: True + +# augmentation +image_resize: 224 +hflip: 0.5 +color_jitter: 0.4 +interpolation: "bilinear" +crop_pct: 0.875 + +# model +model: "shufflenet_v1_g3_x1_0" +num_classes: 1000 +pretrained: False +ckpt_path: "" +keep_checkpoint_max: 30 +ckpt_save_dir: "./ckpt" +epoch_size: 250 +dataset_sink_mode: True +amp_level: "O0" + +# loss +loss: "CE" +label_smoothing: 0.1 + +# lr scheduler +scheduler: "cosine_decay" +min_lr: 0.0 +lr: 0.4 +warmup_epochs: 4 +decay_epochs: 246 + +# optimizer +opt: "momentum" +filter_bias_and_bn: False +momentum: 0.9 +weight_decay: 0.00004 +loss_scale: 1024 +use_nesterov: False diff --git a/configs/shufflenet_v2/README.md b/configs/shufflenet_v2/README.md new file mode 100644 index 0000000..5797a21 --- /dev/null +++ b/configs/shufflenet_v2/README.md @@ -0,0 +1,98 @@ +# ShuffleNetV2 +> [ShuffleNet V2: Practical Guidelines for Efficient CNN Architecture Design](https://arxiv.org/abs/1807.11164) + +## Introduction + +A key point was raised in ShuffleNetV2, where previous lightweight networks were guided by computing an indirect measure of network complexity, namely FLOPs. The speed of lightweight networks is described by calculating the amount of floating point operations. But the speed of operation was never considered directly. The running speed in mobile devices needs to consider not only FLOPs, but also other factors such as memory accesscost and platform characterics. + +Therefore, based on these two principles, ShuffleNetV2 proposes four effective network design principles. + +- MAC is minimized when the input feature matrix of the convolutional layer is equal to the output feature matrixchannel (when FLOPs are kept constant). +- MAC increases when the groups of GConv increase (while keeping FLOPs constant). +- the higher the fragmentation of the network design, the slower the speed. +- The impact of Element-wise operation is not negligible. + +
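+
+To make the first guideline above concrete, here is a small arithmetic sketch of the memory access cost (MAC) of a 1x1 convolution under fixed FLOPs; the spatial size and channel splits are illustrative assumptions.
+
+```python
+def flops_1x1(h, w, c1, c2):
+    return h * w * c1 * c2
+
+def mac_1x1(h, w, c1, c2):
+    # feature maps read and written: h*w*(c1 + c2); kernel weights: c1*c2
+    return h * w * (c1 + c2) + c1 * c2
+
+h = w = 56
+# Three channel splits with the same c1*c2 product, i.e. the same FLOPs.
+for c1, c2 in [(32, 512), (64, 256), (128, 128)]:
+    print(c1, c2, flops_1x1(h, w, c1, c2), mac_1x1(h, w, c1, c2))
+# MAC is smallest for the balanced split c1 == c2 == 128.
+```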

+
+Figure 1. Architecture Design in ShuffleNetV2 [1]
+
+## Results
+
+Our reproduced model performance on ImageNet-1K is reported as follows.
+
+ +| Model | Context | Top-1 (%) | Top-5 (%) | Params (M) | Recipe | Download | +|-----------------|-----------|------------|------------|-------------|---------|----------| +| shufflenet_v2_x0_5 | D910x8-G | 60.68 | 82.44 | 1.37 | [yaml](https://github.com/mindspore-lab/mindcv/blob/main/configs/shufflenet_v2/shufflenet_v2_0.5_ascend.yaml) | [weights](https://download.mindspore.cn/toolkits/mindcv/shufflenet/shufflenetv2/shufflenet_v2_05-a53c62b9.ckpt) | +| shufflenet_v2_x1_0 | D910x8-G | 69.51 | 88.67 | 2.29 | [yaml](https://github.com/mindspore-lab/mindcv/blob/main/configs/shufflenet_v2/shufflenet_v2_1.0_ascend.yaml) | [weights](https://download.mindspore.cn/toolkits/mindcv/shufflenet/shufflenetv2/shufflenet_v2_10-e6b8c4fe.ckpt) | +| shufflenet_v2_x1_5 | D910x8-G | 72.59 | 90.79 | 3.53 | [yaml](https://github.com/mindspore-lab/mindcv/blob/main/configs/shufflenet_v2/shufflenet_v2_1.5_ascend.yaml) | [weights](https://download.mindspore.cn/toolkits/mindcv/shufflenet/shufflenetv2/shufflenet_v2_15-e717dd88.ckpt) | +| shufflenet_v2_x2_0 | D910x8-G | 75.14 | 92.13 | 7.44 | [yaml](https://github.com/mindspore-lab/mindcv/blob/main/configs/shufflenet_v2/shufflenet_v2_2.0_ascend.yaml) | [weights](https://download.mindspore.cn/toolkits/mindcv/shufflenet/shufflenetv2/shufflenet_v2_20-ada6a359.ckpt) | + +
+ +#### Notes +- Context: Training context denoted as {device}x{pieces}-{MS mode}, where mindspore mode can be G - graph mode or F - pynative mode with ms function. For example, D910x8-G is for training on 8 pieces of Ascend 910 NPU using graph mode. +- Top-1 and Top-5: Accuracy reported on the validation set of ImageNet-1K. + +#### Notes + +- All models are trained on ImageNet-1K training set and the top-1 accuracy is reported on the validatoin set. +- Context: GPU_TYPE x pieces - G/F, G - graph mode, F - pynative mode with ms function. + + +## Quick Start +### Preparation + +#### Installation +Please refer to the [installation instruction](https://github.com/mindspore-lab/mindcv#installation) in MindCV. + +#### Dataset Preparation +Please download the [ImageNet-1K](https://www.image-net.org/challenges/LSVRC/2012/index.php) dataset for model training and validation. + +### Training + + +* Distributed Training + +It is easy to reproduce the reported results with the pre-defined training recipe. For distributed training on multiple Ascend 910 devices, please run + +```shell +# distributed training on multiple GPU/Ascend devices +mpirun -n 8 python train.py --config configs/shufflenet_v2/shufflenet_v2_0.5_ascend.yaml --data_dir /path/to/imagenet +``` + +Similarly, you can train the model on multiple GPU devices with the above `mpirun` command. + +For detailed illustration of all hyper-parameters, please refer to [config.py](https://github.com/mindspore-lab/mindcv/blob/main/config.py). + +**Note:** As the global batch size (batch_size x num_devices) is an important hyper-parameter, it is recommended to keep the global batch size unchanged for reproduction or adjust the learning rate linearly to a new global batch size. + +* Standalone Training + +If you want to train or finetune the model on a smaller dataset without distributed training, please run: + +```shell +# standalone training on a CPU/GPU/Ascend device +python train.py --config configs/shufflenet_v2/shufflenet_v2_0.5_ascend.yaml --data_dir /path/to/dataset --distribute False +``` + +### Validation + +To validate the accuracy of the trained model, you can use `validate.py` and parse the checkpoint path with `--ckpt_path`. + +``` +python validate.py -c configs/shufflenet_v2/shufflenet_v2_0.5_ascend.yaml --data_dir /path/to/imagenet --ckpt_path /path/to/ckpt +``` + +### Deployment + +To deploy online inference services with the trained model efficiently, please refer to the [deployment tutorial](https://github.com/mindspore-lab/mindcv/blob/main/tutorials/deployment.md). + +## References + +[1] Ma N, Zhang X, Zheng H T, et al. Shufflenet v2: Practical guidelines for efficient cnn architecture design[C]//Proceedings of the European conference on computer vision (ECCV). 2018: 116-131. 
diff --git a/configs/shufflenet_v2/shufflenet_v2_0.5_ascend.yaml b/configs/shufflenet_v2/shufflenet_v2_0.5_ascend.yaml new file mode 100644 index 0000000..dbe7a90 --- /dev/null +++ b/configs/shufflenet_v2/shufflenet_v2_0.5_ascend.yaml @@ -0,0 +1,50 @@ +# system +mode: 0 +distribute: True +num_parallel_workers: 8 + +# dataset +dataset: 'imagenet' +data_dir: '/path/to/imagenet' +shuffle: True +dataset_download: False +batch_size: 64 +drop_remainder: True + +# augmentation +image_resize: 224 +hflip: 0.5 +color_jitter: 0.4 +interpolation: 'bilinear' +crop_pct: 0.875 +# re_prob: 0.5 + +# model +model: 'shufflenet_v2_x0_5' +num_classes: 1000 +pretrained: False +ckpt_path: "" +keep_checkpoint_max: 30 +ckpt_save_dir: './ckpt' +epoch_size: 250 +dataset_sink_mode: True +amp_level: 'O0' + +# loss +loss: 'CE' +label_smoothing: 0.1 + +# lr scheduler +scheduler: 'cosine_decay' +min_lr: 0.0 +lr: 0.35 +warmup_epochs: 4 +decay_epochs: 246 + +# optimizer +opt: 'momentum' +filter_bias_and_bn: False +momentum: 0.9 +weight_decay: 0.00004 +loss_scale: 1 +use_nesterov: False diff --git a/configs/shufflenet_v2/shufflenet_v2_1.0_ascend.yaml b/configs/shufflenet_v2/shufflenet_v2_1.0_ascend.yaml new file mode 100644 index 0000000..c82e141 --- /dev/null +++ b/configs/shufflenet_v2/shufflenet_v2_1.0_ascend.yaml @@ -0,0 +1,50 @@ +# system +mode: 0 +distribute: True +num_parallel_workers: 8 + +# dataset +dataset: 'imagenet' +data_dir: '/data/imagenet' +shuffle: True +dataset_download: False +batch_size: 64 +drop_remainder: True + +# augmentation +image_resize: 224 +hflip: 0.5 +color_jitter: 0.4 +interpolation: 'bilinear' +crop_pct: 0.875 +re_prob: 0.5 + +# model +model: 'shufflenet_v2_x1_0' +num_classes: 1000 +pretrained: False +ckpt_path: "" +keep_checkpoint_max: 30 +ckpt_save_dir: './ckpt' +epoch_size: 300 +dataset_sink_mode: True +amp_level: 'O0' + +# loss +loss: 'CE' +label_smoothing: 0.1 + +# lr scheduler +scheduler: 'cosine_decay' +min_lr: 0.0 +lr: 0.4 +warmup_epochs: 5 +decay_epochs: 295 + +# optimizer +opt: 'momentum' +filter_bias_and_bn: False +momentum: 0.9 +weight_decay: 0.00004 +loss_scale: 1 +use_nesterov: False diff --git a/configs/shufflenet_v2/shufflenet_v2_1.5_ascend.yaml b/configs/shufflenet_v2/shufflenet_v2_1.5_ascend.yaml new file mode 100644 index 0000000..e3fedcc --- /dev/null +++ b/configs/shufflenet_v2/shufflenet_v2_1.5_ascend.yaml @@ -0,0 +1,50 @@ +# system +mode: 0 +distribute: True +num_parallel_workers: 8 + +# dataset +dataset: 'imagenet' +data_dir: '/data/imagenet' +shuffle: True +dataset_download: False +batch_size: 64 +drop_remainder: True + +# augmentation +image_resize: 224 +hflip: 0.5 +color_jitter: 0.4 +interpolation: 'bilinear' +crop_pct: 0.875 +re_prob: 0.5 + +# model +model: 'shufflenet_v2_x1_5' +num_classes: 1000 +pretrained: False +ckpt_path: "" +keep_checkpoint_max: 30 +ckpt_save_dir: './ckpt' +epoch_size: 300 +dataset_sink_mode: True +amp_level: 'O0' + +# loss +loss: 'CE' +label_smoothing: 0.1 + +# lr scheduler +scheduler: 'cosine_decay' +min_lr: 0.0 +lr: 0.5 +warmup_epochs: 4 +decay_epochs: 246 + +# optimizer +opt: 'momentum' +filter_bias_and_bn: False +momentum: 0.9 +weight_decay: 0.00004 +loss_scale: 1 +use_nesterov: False diff --git a/configs/shufflenet_v2/shufflenet_v2_2.0_ascend.yaml b/configs/shufflenet_v2/shufflenet_v2_2.0_ascend.yaml new file mode 100644 index 0000000..10b4598 --- /dev/null +++ b/configs/shufflenet_v2/shufflenet_v2_2.0_ascend.yaml @@ -0,0 +1,50 @@ +# system +mode: 0 +distribute: True +num_parallel_workers: 8 + +# dataset +dataset: 'imagenet' +data_dir: 
'/data/imagenet' +shuffle: True +dataset_download: False +batch_size: 64 +drop_remainder: True + +# augmentation +image_resize: 224 +hflip: 0.5 +color_jitter: 0.4 +interpolation: 'bilinear' +crop_pct: 0.875 +re_prob: 0.5 + +# model +model: 'shufflenet_v2_x2_0' +num_classes: 1000 +pretrained: False +ckpt_path: "" +keep_checkpoint_max: 30 +ckpt_save_dir: './ckpt' +epoch_size: 300 +dataset_sink_mode: True +amp_level: 'O0' + +# loss +loss: 'CE' +label_smoothing: 0.1 + +# lr scheduler +scheduler: 'cosine_decay' +min_lr: 0.0 +lr: 0.5 +warmup_epochs: 5 +decay_epochs: 295 + +# optimizer +opt: 'momentum' +filter_bias_and_bn: False +momentum: 0.9 +weight_decay: 0.00004 +loss_scale: 1 +use_nesterov: False diff --git a/configs/sknet/README.md b/configs/sknet/README.md new file mode 100644 index 0000000..01da4aa --- /dev/null +++ b/configs/sknet/README.md @@ -0,0 +1,106 @@ +# SKNet + +> [Selective Kernel Networks](https://arxiv.org/pdf/1903.06586) + +## Introduction + +The local receptive fields (RFs) of neurons in the primary visual cortex (V1) of cats [[1](#references)] have inspired the +construction of Convolutional Neural Networks (CNNs) [[2](#references)] in the last century, and it continues to inspire mordern CNN +structure construction. For instance, it is well-known that in the visual cortex, the RF sizes of neurons in the +same area (e.g.,V1 region) are different, which enables the neurons to collect multi-scale spatial information in the +same processing stage. This mechanism has been widely adopted in recent Convolutional Neural Networks (CNNs). +A typical example is InceptionNets [[3](#references), [4](#references), [5](#references), [6](#references)], in which a simple concatenation is designed to aggregate +multi-scale information from, e.g., 3×3, 5×5, 7×7 convolutional kernels inside the “inception” building block. + +
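+
+As a companion to Figure 1 below, here is a minimal NumPy sketch of the fuse-and-select step of a selective kernel unit. The two inputs stand in for the outputs of 3x3 and 5x5 convolution branches, and all shapes and projection matrices are illustrative assumptions rather than the MindCV implementation.
+
+```python
+import numpy as np
+
+def sk_select(u3, u5, w_fc, w_a, w_b):
+    """Fuse two branch outputs of shape (N, C, H, W) and reweight them with
+    channel-wise softmax attention across the branches (the 'select' step)."""
+    u = u3 + u5                                    # fuse by summation
+    s = u.mean(axis=(2, 3))                        # squeeze: (N, C)
+    z = np.maximum(s @ w_fc, 0.0)                  # compact descriptor: (N, d)
+    a, b = z @ w_a, z @ w_b                        # per-branch logits: (N, C)
+    ea, eb = np.exp(a), np.exp(b)
+    att_a, att_b = ea / (ea + eb), eb / (ea + eb)  # softmax over the two branches
+    return u3 * att_a[:, :, None, None] + u5 * att_b[:, :, None, None]
+
+rng = np.random.default_rng(0)
+n, c, d = 2, 8, 4
+u3 = rng.standard_normal((n, c, 7, 7))
+u5 = rng.standard_normal((n, c, 7, 7))
+out = sk_select(u3, u5, rng.standard_normal((c, d)),
+                rng.standard_normal((d, c)), rng.standard_normal((d, c)))
+print(out.shape)                                   # (2, 8, 7, 7)
+```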

+
+Figure 1. Selective Kernel Convolution.
+
+## Results
+
+| Model             | Context  | Top-1 (%) | Top-5 (%) | Params (M) | Recipe | Download |
+|-------------------|----------|-----------|-----------|------------|--------|----------|
+| skresnet18        | D910x8-G | 73.09     | 91.20     | 11.97      | [yaml](https://github.com/mindspore-lab/mindcv/blob/main/configs/sknet/skresnet18_ascend.yaml) | [weights](https://download.mindspore.cn/toolkits/mindcv/sknet/sknet18-868228e5.ckpt) |
+| skresnet34        | D910x8-G | 76.80     | 93.10     | 22.31      | [yaml](https://github.com/mindspore-lab/mindcv/blob/main/configs/sknet/skresnet34_ascend.yaml) | [weights](https://download.mindspore.cn/toolkits/mindcv/sknet/skresnet34-d668b629.ckpt) |
+| skresnext50_32x4d | D910x8-G | 79.08     | 94.60     | 37.31      | [yaml](https://github.com/mindspore-lab/mindcv/blob/main/configs/sknet/skresnext50_32x4d_ascend.yaml) | [weights](https://download.mindspore.cn/toolkits/mindcv/sknet/skresnext50_32x4d-395413a2.ckpt) |
+
+ +#### Notes +- Context: Training context denoted as {device}x{pieces}-{MS mode}, where mindspore mode can be G - graph mode or F - pynative mode with ms function. For example, D910x8-G is for training on 8 pieces of Ascend 910 NPU using graph mode. +- Top-1 and Top-5: Accuracy reported on the validation set of ImageNet-1K. + + +## Quick Start +### Preparation + +#### Installation +Please refer to the [installation instruction](https://github.com/mindspore-lab/mindcv#installation) in MindCV. + +#### Dataset Preparation +Please download the [ImageNet-1K](https://www.image-net.org/challenges/LSVRC/2012/index.php) dataset for model training and validation. + +### Training + + +* Distributed Training + +It is easy to reproduce the reported results with the pre-defined training recipe. For distributed training on multiple Ascend 910 devices, please run + +```shell +# distributed training on multiple GPU/Ascend devices +mpirun -n 8 python train.py --config configs/sknet/skresnext50_32x4d_ascend.yaml --data_dir /path/to/imagenet +``` + +Similarly, you can train the model on multiple GPU devices with the above `mpirun` command. + +For detailed illustration of all hyper-parameters, please refer to [config.py](https://github.com/mindspore-lab/mindcv/blob/main/config.py). + +**Note:** As the global batch size (batch_size x num_devices) is an important hyper-parameter, it is recommended to keep the global batch size unchanged for reproduction or adjust the learning rate linearly to a new global batch size. + +* Standalone Training + +If you want to train or finetune the model on a smaller dataset without distributed training, please run: + +```shell +# standalone training on a CPU/GPU/Ascend device +python train.py --config configs/sknet/skresnext50_32x4d_ascend.yaml --data_dir /path/to/dataset --distribute False +``` + +### Validation + +To validate the accuracy of the trained model, you can use `validate.py` and parse the checkpoint path with `--ckpt_path`. + +``` +python validate.py -c configs/sknet/skresnext50_32x4d_ascend.yaml --data_dir /path/to/imagenet --ckpt_path /path/to/ckpt +``` + +### Deployment + +To deploy online inference services with the trained model efficiently, please refer to the [deployment tutorial](https://github.com/mindspore-lab/mindcv/blob/main/tutorials/deployment.md). + + +## References + +[1] D. H. Hubel and T. N. Wiesel. Receptive fields, binocular interaction and functional architecture in the cat’s visual +cortex. The Journal of Physiology, 1962. + +[2] Y . LeCun, B. Boser, J. S. Denker, D. Henderson, R. E. Howard, W. Hubbard, and L. D. Jackel. Backpropagation +applied to handwritten zip code recognition. Neural Computation, 1989. + +[3] C. Szegedy, V . V anhoucke, S. Ioffe, J. Shlens, and Z. Wojna. Rethinking the inception architecture for computer vision. In +CVPR, 2016. + +[4] S. Ioffe and C. Szegedy. Batch normalization: Accelerating deep network training by reducing internal covariate shift. +arXiv preprint arXiv:1502.03167, 2015. + +[5] C. Szegedy, V . V anhoucke, S. Ioffe, J. Shlens, and Z. Wojna. Rethinking the inception architecture for computer vision. In +CVPR, 2016. + +[6] C. Szegedy, S. Ioffe, V . V anhoucke, and A. A. Alemi. Inception-v4, inception-resnet and the impact of residual +connections on learning. In AAAI, 2017. 
diff --git a/configs/sknet/skresnet18_ascend.yaml b/configs/sknet/skresnet18_ascend.yaml new file mode 100644 index 0000000..b43bb46 --- /dev/null +++ b/configs/sknet/skresnet18_ascend.yaml @@ -0,0 +1,51 @@ +# system +mode: 0 +distribute: True +num_parallel_workers: 8 +val_interval: 1 + +# dataset +dataset: "imagenet" +data_dir: "/path/to/imagenet" +shuffle: True +dataset_download: False +batch_size: 64 +drop_remainder: True + +# augmentation +image_resize: 224 +scale: [0.08, 1.0] +ratio: [0.75, 1.333] +hflip: 0.5 +interpolation: "bicubic" +crop_pct: 0.875 + +# model +model: "skresnet18" +num_classes: 1000 +pretrained: False +ckpt_path: "" +keep_checkpoint_max: 30 +ckpt_save_dir: "./ckpt" +epoch_size: 200 +dataset_sink_mode: True +amp_level: "O2" + +# loss +loss: "CE" +label_smoothing: 0.1 + +# lr scheduler +scheduler: "cosine_decay" +min_lr: 0.0 +lr: 0.1 +warmup_epochs: 5 +decay_epochs: 195 + +# optimizer +opt: "momentum" +filter_bias_and_bn: True +momentum: 0.9 +weight_decay: 0.00004 +loss_scale: 1024 +use_nesterov: False diff --git a/configs/sknet/skresnet34_ascend.yaml b/configs/sknet/skresnet34_ascend.yaml new file mode 100644 index 0000000..e287151 --- /dev/null +++ b/configs/sknet/skresnet34_ascend.yaml @@ -0,0 +1,54 @@ +# system +mode: 0 +distribute: True +num_parallel_workers: 8 +val_while_train: True +val_interval: 1 + +# dataset +dataset: "imagenet" +data_dir: "/path/to/imagenet" +shuffle: True +dataset_download: False +batch_size: 32 +drop_remainder: True + +# augmentation +image_resize: 224 +scale: [0.08, 1.0] +ratio: [0.75, 1.333] +hflip: 0.5 +interpolation: "bilinear" +crop_pct: 0.875 +re_prob: 0.25 + +# model +model: "skresnet34" +num_classes: 1000 +pretrained: False +ckpt_path: "" +keep_checkpoint_max: 30 +ckpt_save_dir: "./ckpt" +epoch_size: 250 +dataset_sink_mode: True +amp_level: "O2" + +# loss +loss: "CE" +label_smoothing: 0.1 + +# lr scheduler +scheduler: "cosine_decay" +min_lr: 0.0 +lr: 0.1 +warmup_epochs: 5 +decay_epochs: 245 +warmup_factor: 0.01 + +# optimizer +opt: "momentum" +filter_bias_and_bn: True +momentum: 0.9 +weight_decay: 0.00004 +loss_scale: 128 +use_nesterov: False diff --git a/configs/sknet/skresnext50_32x4d_ascend.yaml b/configs/sknet/skresnext50_32x4d_ascend.yaml new file mode 100644 index 0000000..5e991f2 --- /dev/null +++ b/configs/sknet/skresnext50_32x4d_ascend.yaml @@ -0,0 +1,51 @@ +# system +mode: 0 +distribute: True +num_parallel_workers: 8 +val_interval: 1 + +# dataset +dataset: "imagenet" +data_dir: "/path/to/imagenet" +shuffle: True +dataset_download: False +batch_size: 64 + +# augmentation +image_resize: 224 +scale: [0.08, 1.0] +ratio: [0.75, 1.333] +hflip: 0.5 +interpolation: "bicubic" +crop_pct: 0.875 +re_prob: 0.1 + +# model +model: "skresnext50_32x4d" +num_classes: 1000 +pretrained: False +ckpt_path: "" +keep_checkpoint_max: 30 +ckpt_save_dir: "./ckpt" +epoch_size: 200 +dataset_sink_mode: True +amp_level: "O2" + +# loss +loss: "CE" +label_smoothing: 0.1 + +# lr scheduler +scheduler: "cosine_decay" +min_lr: 0.0 +lr: 0.1 +warmup_epochs: 5 +decay_epochs: 195 + +# optimizer +opt: "momentum" +filter_bias_and_bn: True +momentum: 0.9 +weight_decay: 0.0001 +loss_scale: 1024 +use_nesterov: False diff --git a/configs/squeezenet/README.md b/configs/squeezenet/README.md new file mode 100644 index 0000000..5c702d4 --- /dev/null +++ b/configs/squeezenet/README.md @@ -0,0 +1,91 @@ +# SqueezeNet + +> [SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and <0.5MB model size](https://arxiv.org/abs/1602.07360) + +## Introduction + +SqueezeNet is 
a small CNN architecture, comprised mainly of Fire modules, which achieves AlexNet-level +accuracy on ImageNet with 50x fewer parameters. SqueezeNet offers at least three advantages: (1) Smaller CNNs require +less communication across servers during distributed training. (2) Smaller CNNs require less bandwidth to export a new +model from the cloud to an autonomous car. (3) Smaller CNNs are more feasible to deploy on FPGAs and other hardware with +limited memory. Additionally, with model compression techniques, SqueezeNet can be compressed to less than +0.5MB (510× smaller than AlexNet). Below is a macroarchitectural view of the SqueezeNet architecture. Left: SqueezeNet; +Middle: SqueezeNet with simple bypass; Right: SqueezeNet with complex bypass. +
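+
+A quick parameter-count sketch of why the Fire module is so compact: channels are squeezed with 1x1 convolutions before the expensive 3x3 expand path. The channel numbers follow the fire2 configuration from the paper, and the plain 3x3 convolution is an illustrative baseline.
+
+```python
+def conv_params(c_in, c_out, k):
+    # weight parameters of a k x k convolution, biases ignored
+    return c_in * c_out * k * k
+
+def fire_params(c_in, squeeze, expand1x1, expand3x3):
+    return (conv_params(c_in, squeeze, 1)          # squeeze 1x1
+            + conv_params(squeeze, expand1x1, 1)   # expand 1x1
+            + conv_params(squeeze, expand3x3, 3))  # expand 3x3
+
+# fire2 of SqueezeNet 1.0: 96 -> 16 (squeeze) -> 64 + 64 = 128 output channels
+print(fire_params(96, 16, 64, 64))   # 11776
+print(conv_params(96, 128, 3))       # 110592 for a plain 3x3 conv, ~9x more
+```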

+
+Figure 1. Architecture of SqueezeNet [1]
+
+## Results
+
+ +| Model | Context | Top-1 (%) | Top-5 (%) | Params (M) | Recipe | Download | +|----------------|---------|-----------|-----------|------------|------------------------------------------------------------------------------------------------------|---------------------------------------------------------------------------------------------| +| squeezenet_1.0 | D910x8-G | 59.01 | 81.01| 1.25 | [yaml](https://github.com/mindspore-lab/mindcv/blob/main/configs/squeezenet/squeezenet_1.0_ascend.yaml) | [weights](https://download.mindspore.cn/toolkits/mindcv/squeezenet/squeezenet1_0-e2d78c4a.ckpt) | +| squeezenet_1.0 | GPUx8-G | 59.49 | 81.22 | 1.25 | [yaml](https://github.com/mindspore-lab/mindcv/blob/main/configs/squeezenet/squeezenet_1.0_gpu.yaml) | [weights](https://download.mindspore.cn/toolkits/mindcv/squeezenet/squeezenet_1.0_224.ckpt) | +| squeezenet_1.1 | D910x8-G | 58.44 | 80.84 | 1.24 | [yaml](https://github.com/mindspore-lab/mindcv/blob/main/configs/squeezenet/squeezenet_1.1_ascend.yaml) | [weights](https://download.mindspore.cn/toolkits/mindcv/squeezenet/squeezenet1_1-da256d3a.ckpt) | +| squeezenet_1.1 | GPUx8-G | 58.99 | 80.99 | 1.24 | [yaml](https://github.com/mindspore-lab/mindcv/blob/main/configs/squeezenet/squeezenet_1.1_gpu.yaml) | [weights](https://download.mindspore.cn/toolkits/mindcv/squeezenet/squeezenet_1.1_224.ckpt) | + +
+ +#### Notes +- Context: Training context denoted as {device}x{pieces}-{MS mode}, where mindspore mode can be G - graph mode or F - pynative mode with ms function. For example, D910x8-G is for training on 8 pieces of Ascend 910 NPU using graph mode. +- Top-1 and Top-5: Accuracy reported on the validation set of ImageNet-1K. + + +## Quick Start +### Preparation + +#### Installation +Please refer to the [installation instruction](https://github.com/mindspore-lab/mindcv#installation) in MindCV. + +#### Dataset Preparation +Please download the [ImageNet-1K](https://www.image-net.org/challenges/LSVRC/2012/index.php) dataset for model training and validation. + +### Training + + +* Distributed Training + +It is easy to reproduce the reported results with the pre-defined training recipe. For distributed training on multiple Ascend 910 devices, please run + +```shell +# distributed training on multiple GPU/Ascend devices +mpirun -n 8 python train.py --config configs/squeezenet/squeezenet_1.0_ascend.yaml --data_dir /path/to/imagenet +``` + +Similarly, you can train the model on multiple GPU devices with the above `mpirun` command. + +For detailed illustration of all hyper-parameters, please refer to [config.py](https://github.com/mindspore-lab/mindcv/blob/main/config.py). + +**Note:** As the global batch size (batch_size x num_devices) is an important hyper-parameter, it is recommended to keep the global batch size unchanged for reproduction or adjust the learning rate linearly to a new global batch size. + +* Standalone Training + +If you want to train or finetune the model on a smaller dataset without distributed training, please run: + +```shell +# standalone training on a CPU/GPU/Ascend device +python train.py --config configs/squeezenet/squeezenet_1.0_ascend.yaml --data_dir /path/to/dataset --distribute False +``` + +### Validation + +To validate the accuracy of the trained model, you can use `validate.py` and parse the checkpoint path with `--ckpt_path`. + +``` +python validate.py -c configs/squeezenet/squeezenet_1.0_ascend.yaml --data_dir /path/to/imagenet --ckpt_path /path/to/ckpt +``` + +### Deployment + +To deploy online inference services with the trained model efficiently, please refer to the [deployment tutorial](https://github.com/mindspore-lab/mindcv/blob/main/tutorials/deployment.md). + + +## References + +[1] Iandola F N, Han S, Moskewicz M W, et al. SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and< 0.5 MB model size[J]. arXiv preprint arXiv:1602.07360, 2016. 
diff --git a/configs/squeezenet/squeezenet_1.0_ascend.yaml b/configs/squeezenet/squeezenet_1.0_ascend.yaml new file mode 100644 index 0000000..701fad9 --- /dev/null +++ b/configs/squeezenet/squeezenet_1.0_ascend.yaml @@ -0,0 +1,52 @@ +# system +mode: 0 +distribute: True +num_parallel_workers: 8 +val_while_train: True + +# dataset +dataset: 'imagenet' +data_dir: '/path/to/imagenet' +shuffle: True +dataset_download: False +batch_size: 32 +drop_remainder: True + +# augmentation +image_resize: 224 +scale: [0.08, 1.0] +ratio: [0.75, 1.333] +hflip: 0.5 +interpolation: 'bilinear' +crop_pct: 0.875 +re_prob: 0.25 + +# model +model: 'squeezenet1_0' +num_classes: 1000 +pretrained: False +ckpt_path: '' +keep_checkpoint_max: 10 +ckpt_save_dir: './ckpt' +epoch_size: 200 +dataset_sink_mode: True +amp_level: 'O2' + +# loss +loss: 'CE' +label_smoothing: 0.1 + +# lr scheduler +scheduler: 'warmup_cosine_decay' +min_lr: 0.0 +lr: 0.1 +warmup_epochs: 0 +decay_epochs: 200 + +# optimizer +opt: 'momentum' +filter_bias_and_bn: True +momentum: 0.9 +weight_decay: 0.0001 +loss_scale: 1024 +use_nesterov: False diff --git a/configs/squeezenet/squeezenet_1.0_gpu.yaml b/configs/squeezenet/squeezenet_1.0_gpu.yaml new file mode 100644 index 0000000..8588ec0 --- /dev/null +++ b/configs/squeezenet/squeezenet_1.0_gpu.yaml @@ -0,0 +1,52 @@ +# system +mode: 0 +distribute: True +num_parallel_workers: 8 + +# dataset +dataset: 'imagenet' +data_dir: '/path/to/imagenet' +shuffle: True +dataset_download: False +batch_size: 32 +drop_remainder: True + +# augmentation +image_resize: 224 +scale: [0.08, 1.0] +ratio: [0.75, 1.333] +hflip: 0.5 +interpolation: 'bilinear' +crop_pct: 0.875 +color_jitter: [0.4, 0.4, 0.4] +re_prob: 0.5 + +# model +model: 'squeezenet1_0' +num_classes: 1000 +pretrained: False +ckpt_path: '' +keep_checkpoint_max: 10 +ckpt_save_dir: './ckpt' +epoch_size: 200 +dataset_sink_mode: True +amp_level: 'O0' + +# loss +loss: 'CE' +label_smoothing: 0.1 + +# lr scheduler +scheduler: 'cosine_decay' +min_lr: 0.0 +lr: 0.01 +warmup_epochs: 0 +decay_epochs: 200 + +# optimizer +opt: 'momentum' +filter_bias_and_bn: True +momentum: 0.9 +weight_decay: 0.00007 +loss_scale: 1024 +use_nesterov: False diff --git a/configs/squeezenet/squeezenet_1.1_ascend.yaml b/configs/squeezenet/squeezenet_1.1_ascend.yaml new file mode 100644 index 0000000..4edb336 --- /dev/null +++ b/configs/squeezenet/squeezenet_1.1_ascend.yaml @@ -0,0 +1,52 @@ +# system +mode: 0 +distribute: True +num_parallel_workers: 8 +val_while_train: True + +# dataset +dataset: 'imagenet' +data_dir: '/path/to/imagenet' +shuffle: True +dataset_download: False +batch_size: 32 +drop_remainder: True + +# augmentation +image_resize: 224 +scale: [0.08, 1.0] +ratio: [0.75, 1.333] +hflip: 0.5 +interpolation: 'bilinear' +crop_pct: 0.875 +re_prob: 0.25 + +# model +model: 'squeezenet1_1' +num_classes: 1000 +pretrained: False +ckpt_path: '' +keep_checkpoint_max: 10 +ckpt_save_dir: './ckpt' +epoch_size: 200 +dataset_sink_mode: True +amp_level: 'O2' + +# loss +loss: 'CE' +label_smoothing: 0.1 + +# lr scheduler +scheduler: 'warmup_cosine_decay' +min_lr: 0.0 +lr: 0.1 +warmup_epochs: 0 +decay_epochs: 200 + +# optimizer +opt: 'momentum' +filter_bias_and_bn: True +momentum: 0.9 +weight_decay: 0.0002 +loss_scale: 1024 +use_nesterov: False diff --git a/configs/squeezenet/squeezenet_1.1_gpu.yaml b/configs/squeezenet/squeezenet_1.1_gpu.yaml new file mode 100644 index 0000000..4161733 --- /dev/null +++ b/configs/squeezenet/squeezenet_1.1_gpu.yaml @@ -0,0 +1,52 @@ +# system +mode: 0 +distribute: True 
+num_parallel_workers: 8 + +# dataset +dataset: 'imagenet' +data_dir: '/path/to/imagenet' +shuffle: True +dataset_download: False +batch_size: 32 +drop_remainder: True + +# augmentation +image_resize: 224 +scale: [0.08, 1.0] +ratio: [0.75, 1.333] +hflip: 0.5 +interpolation: 'bilinear' +crop_pct: 0.875 +color_jitter: [0.4, 0.4, 0.4] +re_prob: 0.5 + +# model +model: 'squeezenet1_1' +num_classes: 1000 +pretrained: False +ckpt_path: '' +keep_checkpoint_max: 10 +ckpt_save_dir: './ckpt' +epoch_size: 200 +dataset_sink_mode: True +amp_level: 'O0' + +# loss +loss: 'CE' +label_smoothing: 0.1 + +# lr scheduler +scheduler: 'cosine_decay' +min_lr: 0.0 +lr: 0.01 +warmup_epochs: 0 +decay_epochs: 200 + +# optimizer +opt: 'momentum' +filter_bias_and_bn: True +momentum: 0.9 +weight_decay: 0.00007 +loss_scale: 1024 +use_nesterov: False diff --git a/configs/swintransformer/README.md b/configs/swintransformer/README.md new file mode 100644 index 0000000..dce2cea --- /dev/null +++ b/configs/swintransformer/README.md @@ -0,0 +1,101 @@ +# Swin Transformer + +> [Swin Transformer: Hierarchical Vision Transformer using Shifted Windows](https://arxiv.org/abs/2103.14030) + +## Introduction + + + +The key idea of Swin transformer is that the features in shifted window go through transformer module rather than the whole feature map. +Besides that, Swin transformer extracts features of different levels. Additionally, compared with Vision Transformer (ViT), the resolution +of Swin Transformer in different stages varies so that features with different sizes could be learned. Figure 1 shows the model architecture +of Swin transformer. Swin transformer could achieve better model performance with smaller model parameters and less computation cost +on ImageNet-1K dataset compared with ViT and ResNet.[[1](#references)] + +
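+
+A minimal NumPy sketch of the non-overlapping window partition that shifted-window attention operates on; the 4x4 window size and feature shape are illustrative assumptions.
+
+```python
+import numpy as np
+
+def window_partition(x: np.ndarray, ws: int) -> np.ndarray:
+    """Split feature maps (N, H, W, C) into (num_windows * N, ws, ws, C) so that
+    self-attention is computed inside each local window instead of globally."""
+    n, h, w, c = x.shape
+    assert h % ws == 0 and w % ws == 0
+    x = x.reshape(n, h // ws, ws, w // ws, ws, c)
+    return x.transpose(0, 1, 3, 2, 4, 5).reshape(-1, ws, ws, c)
+
+x = np.arange(1 * 8 * 8 * 1).reshape(1, 8, 8, 1)
+print(window_partition(x, 4).shape)   # (4, 4, 4, 1): four 4x4 windows
+# A shifted partition is obtained by rolling the feature map before partitioning.
+print(window_partition(np.roll(x, shift=(-2, -2), axis=(1, 2)), 4).shape)
+```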

+
+Figure 1. Architecture of Swin Transformer [1]
+
+## Results
+
+Our reproduced model performance on ImageNet-1K is reported as follows.
+
+ +| Model | Context | Top-1 (%) | Top-5 (%) | Params (M) | Recipe | Download | +|-----------|----------|-----------|-----------|------------|---------------------------------------------------------------------------------------------------------|-------------------------------------------------------------------------------------| +| swin_tiny | D910x8-G | 80.82 | 94.80 | 33.38 | [yaml](https://github.com/mindspore-lab/mindcv/blob/main/configs/swintransformer/swin_tiny_ascend.yaml) | [weights](https://download.mindspore.cn/toolkits/mindcv/swin/swin_tiny-0ff2f96d.ckpt) | + +
+ +#### Notes + +- Context: Training context denoted as {device}x{pieces}-{MS mode}, where mindspore mode can be G - graph mode or F - pynative mode with ms function. For example, D910x8-G is for training on 8 pieces of Ascend 910 NPU using graph mode. +- Top-1 and Top-5: Accuracy reported on the validation set of ImageNet-1K. + +## Quick Start + +### Preparation + +#### Installation +Please refer to the [installation instruction](https://github.com/mindspore-lab/mindcv#installation) in MindCV. + +#### Dataset Preparation +Please download the [ImageNet-1K](https://www.image-net.org/challenges/LSVRC/2012/index.php) dataset for model training and validation. + +### Training + + +* Distributed Training + +It is easy to reproduce the reported results with the pre-defined training recipe. For distributed training on multiple Ascend 910 devices, please run + +```shell +# distributed training on multiple GPU/Ascend devices +mpirun -n 8 python train.py --config configs/swintransformer/swin_tiny_ascend.yaml --data_dir /path/to/imagenet +``` +> If the script is executed by the root user, the `--allow-run-as-root` parameter must be added to `mpirun`. + +Similarly, you can train the model on multiple GPU devices with the above `mpirun` command. + +For detailed illustration of all hyper-parameters, please refer to [config.py](https://github.com/mindspore-lab/mindcv/blob/main/config.py). + +**Note:** As the global batch size (batch_size x num_devices) is an important hyper-parameter, it is recommended to keep the global batch size unchanged for reproduction or adjust the learning rate linearly to a new global batch size. + +* Standalone Training + +If you want to train or finetune the model on a smaller dataset without distributed training, please run: + +```shell +# standalone training on a CPU/GPU/Ascend device +python train.py --config configs/swintransformer/swin_tiny_ascend.yaml --data_dir /path/to/dataset --distribute False +``` + +### Validation + +To validate the accuracy of the trained model, you can use `validate.py` and parse the checkpoint path with `--ckpt_path`. + +``` +python validate.py -c configs/swintransformer/swin_tiny_ascend.yaml --data_dir /path/to/imagenet --ckpt_path /path/to/ckpt +``` + +### Deployment + +Please refer to the [deployment tutorial](https://github.com/mindspore-lab/mindcv/blob/main/tutorials/deployment.md) in MindCV. + +## References + + +[1] Liu Z, Lin Y, Cao Y, et al. Swin transformer: Hierarchical vision transformer using shifted windows[C]//Proceedings of the IEEE/CVF international conference on computer vision. 2021: 10012-10022. 
diff --git a/configs/swintransformer/swin_tiny_ascend.yaml b/configs/swintransformer/swin_tiny_ascend.yaml new file mode 100644 index 0000000..4d2c2ae --- /dev/null +++ b/configs/swintransformer/swin_tiny_ascend.yaml @@ -0,0 +1,59 @@ +# system +mode: 0 +distribute: True +num_parallel_workers: 8 +val_while_train: True +val_interval: 1 + +# dataset +dataset: "imagenet" +data_dir: "/path/to/imagenet" +shuffle: True +dataset_download: False +batch_size: 256 +drop_remainder: True + +# augmentation +image_resize: 224 +scale: [0.08, 1.0] +ratio: [0.75, 1.333] +hflip: 0.5 +interpolation: "bicubic" +re_prob: 0.1 +mixup: 0.2 +cutmix: 1.0 +cutmix_prob: 1.0 +crop_pct: 0.875 +color_jitter: [0.4, 0.4, 0.4] +auto_augment: "randaug-m7-mstd0.5" + +# model +model: "swin_tiny" +num_classes: 1000 +pretrained: False +ckpt_path: "" +keep_checkpoint_max: 10 +ckpt_save_policy: "top_k" +ckpt_save_dir: "./ckpt" +epoch_size: 600 +dataset_sink_mode: True +amp_level: "O2" + +# loss +loss: "CE" +loss_scale: 1024.0 +label_smoothing: 0.1 + +# lr scheduler +scheduler: "cosine_decay" +lr: 0.003 +min_lr: 1e-6 +warmup_epochs: 32 +decay_epochs: 568 +lr_epoch_stair: False + +# optimizer +opt: "adamw" +weight_decay: 0.025 +filter_bias_and_bn: True +use_nesterov: False diff --git a/configs/vgg/README.md b/configs/vgg/README.md new file mode 100644 index 0000000..e0aa3e5 --- /dev/null +++ b/configs/vgg/README.md @@ -0,0 +1,102 @@ +# VGGNet + +> [Very Deep Convolutional Networks for Large-Scale Image Recognition](https://arxiv.org/abs/1409.1556) + +## Introduction + + + +Figure 1 shows the model architecture of VGGNet. VGGNet is a key milestone on image classification task. It expands the model to 16-19 layers for the first time. The key motivation of this model is +that it shows usage of 3x3 kernels is efficient and by adding 3x3 kernels, it could have the same effect as 5x5 or 7x7 kernels. VGGNet could achieve better model performance compared with previous +methods such as GoogleLeNet and AlexNet on ImageNet-1K dataset.[[1](#references)] + +
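+
+A quick arithmetic sketch of the 3x3 claim above: two stacked 3x3 convolutions cover the same 5x5 receptive field as a single 5x5 convolution (three cover 7x7) while using fewer weights. The channel count is an illustrative assumption.
+
+```python
+def conv_weights(c, k, layers=1):
+    # weight parameters of `layers` stacked k x k convolutions with c channels
+    return layers * c * c * k * k
+
+C = 256
+print(conv_weights(C, 5), "vs", conv_weights(C, 3, layers=2))  # 1638400 vs 1179648
+print(conv_weights(C, 7), "vs", conv_weights(C, 3, layers=3))  # 3211264 vs 1769472
+# Receptive field of n stacked 3x3 convolutions with stride 1: 2*n + 1
+print([2 * n + 1 for n in (1, 2, 3)])                          # [3, 5, 7]
+```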

+
+Figure 1. Architecture of VGG [1]
+
+## Results
+
+Our reproduced model performance on ImageNet-1K is reported as follows.
+
+ +| Model | Context | Top-1 (%) | Top-5 (%) | Params (M) | Recipe | Download | +|-------|----------|-----------|-----------|------------|------------------------------------------------------------------------------------------|-----------------------------------------------------------------------------------| +| vgg11 | D910x8-G | 72.00 | 90.50 | 132.86 | [yaml](https://github.com/mindspore-lab/mindcv/blob/main/configs/vgg/vgg11_ascend.yaml) | [weights](https://download.mindspore.cn/toolkits/mindcv/vgg/vgg11-59a09738.ckpt)| +| vgg13 | D910x8-G | 72.75 | 91.03 | 133.04 | [yaml](https://github.com/mindspore-lab/mindcv/blob/main/configs/vgg/vgg13_ascend.yaml) | [weights](https://download.mindspore.cn/toolkits/mindcv/vgg/vgg13-d30c46b7.ckpt)| +| vgg16 | D910x8-G | 74.53 | 92.05 | 138.35 | [yaml](https://github.com/mindspore-lab/mindcv/blob/main/configs/vgg/vgg16_ascend.yaml) | [weights](https://download.mindspore.cn/toolkits/mindcv/vgg/vgg16-22d7d708.ckpt) | +| vgg19 | D910x8-G | 75.20 | 92.52 | 143.66 | [yaml](https://github.com/mindspore-lab/mindcv/blob/main/configs/vgg/vgg19_ascend.yaml) | [weights](https://download.mindspore.cn/toolkits/mindcv/vgg/vgg19-0c442461.ckpt) | + +
+ +#### Notes + +- Context: Training context denoted as {device}x{pieces}-{MS mode}, where mindspore mode can be G - graph mode or F - pynative mode with ms function. For example, D910x8-G is for training on 8 pieces of Ascend 910 NPU using graph mode. +- Top-1 and Top-5: Accuracy reported on the validation set of ImageNet-1K. + +## Quick Start + +### Preparation + +#### Installation +Please refer to the [installation instruction](https://github.com/mindspore-lab/mindcv#installation) in MindCV. + +#### Dataset Preparation +Please download the [ImageNet-1K](https://www.image-net.org/challenges/LSVRC/2012/index.php) dataset for model training and validation. + +### Training + + +* Distributed Training + +It is easy to reproduce the reported results with the pre-defined training recipe. For distributed training on multiple Ascend 910 devices, please run + +```shell +# distrubted training on multiple GPU/Ascend devices +mpirun -n 8 python train.py --config configs/vgg/vgg16_ascend.yaml --data_dir /path/to/imagenet +``` +> If the script is executed by the root user, the `--allow-run-as-root` parameter must be added to `mpirun`. + +Similarly, you can train the model on multiple GPU devices with the above `mpirun` command. + +For detailed illustration of all hyper-parameters, please refer to [config.py](https://github.com/mindspore-lab/mindcv/blob/main/config.py). + +**Note:** As the global batch size (batch_size x num_devices) is an important hyper-parameter, it is recommended to keep the global batch size unchanged for reproduction or adjust the learning rate linearly to a new global batch size. + +* Standalone Training + +If you want to train or finetune the model on a smaller dataset without distributed training, please run: + +```shell +# standalone training on a CPU/GPU/Ascend device +python train.py --config configs/vgg/vgg16_ascend.yaml --data_dir /path/to/dataset --distribute False +``` + +### Validation + +To validate the accuracy of the trained model, you can use `validate.py` and parse the checkpoint path with `--ckpt_path`. + +``` +python validate.py -c configs/vgg/vgg16_ascend.yaml --data_dir /path/to/imagenet --ckpt_path /path/to/ckpt +``` + +### Deployment + +Please refer to the [deployment tutorial](https://github.com/mindspore-lab/mindcv/blob/main/tutorials/deployment.md) in MindCV. + +## References + + +[1] Simonyan K, Zisserman A. Very deep convolutional networks for large-scale image recognition[J]. arXiv preprint arXiv:1409.1556, 2014. 
diff --git a/configs/vgg/vgg11_ascend.yaml b/configs/vgg/vgg11_ascend.yaml new file mode 100644 index 0000000..06ef0e6 --- /dev/null +++ b/configs/vgg/vgg11_ascend.yaml @@ -0,0 +1,51 @@ +# system +mode: 0 +distribute: True +num_parallel_workers: 8 +val_while_train: True + +# dataset +dataset: 'imagenet' +data_dir: 'path/to/imagenet' +shuffle: True +dataset_download: False +batch_size: 32 +drop_remainder: True + +# augmentation +image_resize: 224 +scale: [0.08, 1.0] +ratio: [0.75, 1.333] +hflip: 0.5 +interpolation: 'bilinear' +crop_pct: 0.875 + +# model +model: 'vgg11' +num_classes: 1000 +pretrained: False +ckpt_path: '' +keep_checkpoint_max: 10 +ckpt_save_dir: './ckpt' +epoch_size: 200 +dataset_sink_mode: True +amp_level: 'O0' + +# loss +loss: 'CE' +label_smoothing: 0.1 + +# lr scheduler +scheduler: 'warmup_cosine_decay' +lr: 0.01 +min_lr: 0.0001 +decay_epochs: 198 +warmup_epochs: 2 + +# optimizer +opt: 'momentum' +filter_bias_and_bn: True +momentum: 0.9 +weight_decay: 0.00004 +loss_scale: 1024 +use_nesterov: False diff --git a/configs/vgg/vgg13_ascend.yaml b/configs/vgg/vgg13_ascend.yaml new file mode 100644 index 0000000..b40346c --- /dev/null +++ b/configs/vgg/vgg13_ascend.yaml @@ -0,0 +1,51 @@ +# system +mode: 0 +distribute: True +num_parallel_workers: 8 +val_while_train: True + +# dataset +dataset: 'imagenet' +data_dir: 'path/to/imagenet' +shuffle: True +dataset_download: False +batch_size: 32 +drop_remainder: True + +# augmentation +image_resize: 224 +scale: [0.08, 1.0] +ratio: [0.75, 1.333] +hflip: 0.5 +interpolation: 'bilinear' +crop_pct: 0.875 + +# model +model: 'vgg13' +num_classes: 1000 +pretrained: False +ckpt_path: '' +keep_checkpoint_max: 10 +ckpt_save_dir: './ckpt' +epoch_size: 200 +dataset_sink_mode: True +amp_level: 'O0' + +# loss +loss: 'CE' +label_smoothing: 0.1 + +# lr scheduler +scheduler: 'warmup_cosine_decay' +lr: 0.01 +min_lr: 0.0001 +decay_epochs: 198 +warmup_epochs: 2 + +# optimizer +opt: 'momentum' +filter_bias_and_bn: True +momentum: 0.9 +weight_decay: 0.00004 +loss_scale: 1024 +use_nesterov: False diff --git a/configs/vgg/vgg16_ascend.yaml b/configs/vgg/vgg16_ascend.yaml new file mode 100644 index 0000000..372357f --- /dev/null +++ b/configs/vgg/vgg16_ascend.yaml @@ -0,0 +1,51 @@ +# system +mode: 0 +distribute: True +num_parallel_workers: 8 +val_while_train: True + +# dataset +dataset: 'imagenet' +data_dir: 'path/to/imagenet' +shuffle: True +dataset_download: False +batch_size: 32 +drop_remainder: True + +# augmentation +image_resize: 224 +scale: [0.08, 1.0] +ratio: [0.75, 1.333] +hflip: 0.5 +interpolation: 'bilinear' +crop_pct: 0.875 + +# model +model: 'vgg16' +num_classes: 1000 +pretrained: False +ckpt_path: '' +keep_checkpoint_max: 10 +ckpt_save_dir: './ckpt' +epoch_size: 200 +dataset_sink_mode: True +amp_level: 'O0' + +# loss +loss: 'CE' +label_smoothing: 0.1 + +# lr scheduler +scheduler: 'warmup_cosine_decay' +lr: 0.01 +min_lr: 0.0001 +decay_epochs: 198 +warmup_epochs: 2 + +# optimizer +opt: 'momentum' +filter_bias_and_bn: True +momentum: 0.9 +weight_decay: 0.00004 +loss_scale: 1024 +use_nesterov: False diff --git a/configs/vgg/vgg19_ascend.yaml b/configs/vgg/vgg19_ascend.yaml new file mode 100644 index 0000000..c2b3651 --- /dev/null +++ b/configs/vgg/vgg19_ascend.yaml @@ -0,0 +1,51 @@ +# system +mode: 0 +distribute: True +num_parallel_workers: 8 +val_while_train: True + +# dataset +dataset: 'imagenet' +data_dir: 'path/to/imagenet' +shuffle: True +dataset_download: False +batch_size: 32 +drop_remainder: True + +# augmentation +image_resize: 224 +scale: 
[0.08, 1.0] +ratio: [0.75, 1.333] +hflip: 0.5 +interpolation: 'bilinear' +crop_pct: 0.875 + +# model +model: 'vgg19' +num_classes: 1000 +pretrained: False +ckpt_path: '' +keep_checkpoint_max: 10 +ckpt_save_dir: './ckpt' +epoch_size: 200 +dataset_sink_mode: True +amp_level: 'O0' + +# loss +loss: 'CE' +label_smoothing: 0.1 + +# lr scheduler +scheduler: 'warmup_cosine_decay' +lr: 0.01 +min_lr: 0.0001 +decay_epochs: 198 +warmup_epochs: 2 + +# optimizer +opt: 'momentum' +filter_bias_and_bn: True +momentum: 0.9 +weight_decay: 0.00004 +loss_scale: 1024 +use_nesterov: False diff --git a/configs/visformer/README.md b/configs/visformer/README.md new file mode 100644 index 0000000..8cb1eb8 --- /dev/null +++ b/configs/visformer/README.md @@ -0,0 +1,86 @@ +# Visformer +> [Visformer: The Vision-friendly Transformer](https://arxiv.org/abs/2104.12533) + +## Introduction + +Visformer, or Vision-friendly Transformer, is an architecture that combines Transformer-based architectural features with those from convolutional neural network architectures. Visformer adopts the stage-wise design for higher base performance. But self-attentions are only utilized in the last two stages, considering that self-attention in the high-resolution stage is relatively inefficient even when the FLOPs are balanced. Visformer employs bottleneck blocks in the first stage and utilizes group 3 × 3 convolutions in bottleneck blocks inspired by ResNeXt. It also introduces BatchNorm to patch embedding modules as in CNNs. [[2](#references)] + +

+  Figure 1. Network Configuration of Visformer [1]
+ +## Results + +## ImageNet-1k + +Our reproduced model performance on ImageNet-1K is reported as follows. + +
+ +| Model | Context | Top-1 (%) | Top-5 (%) | Params (M) | Recipe | Download | +|--------------------|----------|-----------|-----------|------------|------------------------------------------------------------------------------------------------------------|--------------------------------------------------------------------------------------------| +| visformer_tiny | D910x8-G | 78.28 | 94.15 | 10.33 | [yaml](https://github.com/mindspore-lab/mindcv/blob/main/configs/visformer/visformer_tiny_ascend.yaml) | [weights](https://download.mindspore.cn/toolkits/mindcv/visformer/visformer_tiny-daee0322.ckpt) | +| visformer_tiny_v2 | D910x8-G | 78.82 | 94.41 | 9.38 | [yaml](https://github.com/mindspore-lab/mindcv/blob/main/configs/visformer/visformer_tiny_v2_ascend.yaml) | [weights](https://download.mindspore.cn/toolkits/mindcv/visformer/visformer_tiny_v2-6711a758.ckpt) | +| visformer_small | D910x8-G | 81.73 | 95.88 | 40.25 | [yaml](https://github.com/mindspore-lab/mindcv/blob/main/configs/visformer/visformer_small_ascend.yaml) | [weights](https://download.mindspore.cn/toolkits/mindcv/visformer/visformer_small-6c83b6db.ckpt) | +| visformer_small_v2 | D910x8-G | 82.17 | 95.90 | 23.52 | [yaml](https://github.com/mindspore-lab/mindcv/blob/main/configs/visformer/visformer_small_v2_ascend.yaml) | [weights](https://download.mindspore.cn/toolkits/mindcv/visformer/visformer_small_v2-63674ade.ckpt) | + +
+ +#### Notes +- Context: Training context denoted as {device}x{pieces}-{MS mode}, where mindspore mode can be G - graph mode or F - pynative mode with ms function. For example, D910x8-G is for training on 8 pieces of Ascend 910 NPU using graph mode. +- Top-1 and Top-5: Accuracy reported on the validation set of ImageNet-1K. + +## Quick Start +### Preparation + +#### Installation +Please refer to the [installation instruction](https://github.com/mindspore-lab/mindcv#installation) in MindCV. + +#### Dataset Preparation +Please download the [ImageNet-1K](https://www.image-net.org/challenges/LSVRC/2012/index.php) dataset for model training and validation. + +### Training + +* Distributed Training + +It is easy to reproduce the reported results with the pre-defined training recipe. For distributed training on multiple Ascend 910 devices, please run + +```shell +# distributed training on multiple GPU/Ascend devices +mpirun -n 8 python train.py --config configs/visformer/visformer_tiny_ascend.yaml --data_dir /path/to/imagenet +``` + +Similarly, you can train the model on multiple GPU devices with the above `mpirun` command. + +For detailed illustration of all hyper-parameters, please refer to [config.py](https://github.com/mindspore-lab/mindcv/blob/main/config.py). + +**Note:** As the global batch size (batch_size x num_devices) is an important hyper-parameter, it is recommended to keep the global batch size unchanged for reproduction or adjust the learning rate linearly to a new global batch size. + +* Standalone Training + +If you want to train or finetune the model on a smaller dataset without distributed training, please run: + +```shell +# standalone training on a CPU/GPU/Ascend device +python train.py --config configs/visformer/visformer_tiny_ascend.yaml --data_dir /path/to/dataset --distribute False +``` + +### Validation + +To validate the accuracy of the trained model, you can use `validate.py` and parse the checkpoint path with `--ckpt_path`. + +``` +python validate.py -c configs/visformer/visformer_tiny_ascend.yaml --data_dir /path/to/imagenet --ckpt_path /path/to/ckpt +``` + +### Deployment + +To deploy online inference services with the trained model efficiently, please refer to the [deployment tutorial](https://github.com/mindspore-lab/mindcv/blob/main/tutorials/deployment.md). + +## References +[1] Chen Z, Xie L, Niu J, et al. Visformer: The vision-friendly transformer. Proceedings of the IEEE/CVF International Conference on Computer Vision. 2021: 589-598. 
+ +[2] Visformer, https://paperswithcode.com/method/visformer diff --git a/configs/visformer/visformer_small_ascend.yaml b/configs/visformer/visformer_small_ascend.yaml new file mode 100644 index 0000000..85aa930 --- /dev/null +++ b/configs/visformer/visformer_small_ascend.yaml @@ -0,0 +1,51 @@ +# system +mode: 0 +distribute: True +num_parallel_workers: 8 +val_while_train: True + +# dataset +dataset: 'imagenet' +data_dir: '/path/to/imagenet' +shuffle: True +dataset_download: False +batch_size: 64 +drop_remainder: True + +# augmentation +interpolation: BICUBIC +re_prob: 0.25 +mixup: 0.8 +cutmix: 1.0 +cutmix_prob: 1.0 +auto_augment: autoaug + +# model +model: 'visformer_small' +num_classes: 1000 +pretrained: False +ckpt_path: '' +keep_checkpoint_max: 10 +ckpt_save_dir: './ckpt' +epoch_size: 300 +dataset_sink_mode: True +amp_level: 'O3' + +# loss +loss: 'CE' +label_smoothing: 0.1 + +# lr scheduler +scheduler: 'cosine_decay' +lr_epoch_stair: True +lr: 0.0005 +min_lr: 0.00001 +warmup_factor: 0.002 +warmup_epochs: 5 +decay_epochs: 295 + +# optimizer +opt: 'adamw' +momentum: 0.9 +weight_decay: 0.05 +loss_scale: 1024 diff --git a/configs/visformer/visformer_small_v2_ascend.yaml b/configs/visformer/visformer_small_v2_ascend.yaml new file mode 100644 index 0000000..1b3f2fe --- /dev/null +++ b/configs/visformer/visformer_small_v2_ascend.yaml @@ -0,0 +1,51 @@ +# system +mode: 0 +distribute: True +num_parallel_workers: 8 +val_while_train: True + +# dataset +dataset: 'imagenet' +data_dir: '/path/to/imagenet' +shuffle: True +dataset_download: False +batch_size: 64 +drop_remainder: True + +# augmentation +interpolation: BICUBIC +re_prob: 0.25 +mixup: 0.8 +cutmix: 1.0 +cutmix_prob: 1.0 +auto_augment: autoaug + +# model +model: 'visformer_small_v2' +num_classes: 1000 +pretrained: False +ckpt_path: '' +keep_checkpoint_max: 10 +ckpt_save_dir: './ckpt' +epoch_size: 300 +dataset_sink_mode: True +amp_level: 'O3' + +# loss +loss: 'CE' +label_smoothing: 0.1 + +# lr scheduler +scheduler: 'cosine_decay' +lr_epoch_stair: True +lr: 0.0005 +min_lr: 0.00001 +warmup_factor: 0.002 +warmup_epochs: 5 +decay_epochs: 295 + +# optimizer +opt: 'adamw' +momentum: 0.9 +weight_decay: 0.05 +loss_scale: 1024 diff --git a/configs/visformer/visformer_tiny_ascend.yaml b/configs/visformer/visformer_tiny_ascend.yaml new file mode 100644 index 0000000..722d847 --- /dev/null +++ b/configs/visformer/visformer_tiny_ascend.yaml @@ -0,0 +1,51 @@ +# system +mode: 0 +distribute: True +num_parallel_workers: 8 +val_while_train: True + +# dataset +dataset: 'imagenet' +data_dir: '/path/to/imagenet' +shuffle: True +dataset_download: False +batch_size: 128 +drop_remainder: True + +# augmentation +interpolation: BICUBIC +re_prob: 0.25 +mixup: 0.8 +cutmix: 1.0 +cutmix_prob: 1.0 +auto_augment: autoaug + +# model +model: 'visformer_tiny' +num_classes: 1000 +pretrained: False +ckpt_path: '' +keep_checkpoint_max: 10 +ckpt_save_dir: './ckpt' +epoch_size: 300 +dataset_sink_mode: True +amp_level: 'O3' + +# loss +loss: 'CE' +label_smoothing: 0.1 + +# lr scheduler +scheduler: 'cosine_decay' +lr_epoch_stair: True +lr: 0.0005 +min_lr: 0.00001 +warmup_factor: 0.002 +warmup_epochs: 5 +decay_epochs: 295 + +# optimizer +opt: 'adamw' +momentum: 0.9 +weight_decay: 0.05 +loss_scale: 1024 diff --git a/configs/visformer/visformer_tiny_v2_ascend.yaml b/configs/visformer/visformer_tiny_v2_ascend.yaml new file mode 100644 index 0000000..1a46f3f --- /dev/null +++ b/configs/visformer/visformer_tiny_v2_ascend.yaml @@ -0,0 +1,51 @@ +# system +mode: 0 +distribute: True 
+num_parallel_workers: 8 +val_while_train: True + +# dataset +dataset: 'imagenet' +data_dir: '/path/to/imagenet' +shuffle: True +dataset_download: False +batch_size: 256 +drop_remainder: True + +# augmentation +interpolation: BICUBIC +re_prob: 0.25 +mixup: 0.8 +cutmix: 1.0 +cutmix_prob: 1.0 +auto_augment: autoaug + +# model +model: 'visformer_tiny_v2' +num_classes: 1000 +pretrained: False +ckpt_path: '' +keep_checkpoint_max: 10 +ckpt_save_dir: './ckpt' +epoch_size: 400 +dataset_sink_mode: True +amp_level: 'O3' + +# loss +loss: 'CE' +label_smoothing: 0.1 + +# lr scheduler +scheduler: 'cosine_decay' +lr_epoch_stair: True +lr: 0.001 +min_lr: 0.00002 +warmup_factor: 0.002 +warmup_epochs: 5 +decay_epochs: 295 + +# optimizer +opt: 'adamw' +momentum: 0.9 +weight_decay: 0.05 +loss_scale: 1024 diff --git a/configs/vit/README.md b/configs/vit/README.md new file mode 100644 index 0000000..b44ea1a --- /dev/null +++ b/configs/vit/README.md @@ -0,0 +1,104 @@ +# ViT + + +> [ An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale](https://arxiv.org/abs/2010.11929) + +## Introduction + + +Vision Transformer (ViT) achieves remarkable results compared to convolutional neural networks (CNN) while obtaining fewer computational resources for pre-training. In comparison to convolutional neural networks (CNN), Vision Transformer (ViT) shows a generally weaker inductive bias resulting in increased reliance on model regularization or data augmentation (AugReg) when training on smaller datasets. + +The ViT is a visual model based on the architecture of a transformer originally designed for text-based tasks, as shown in the below figure. The ViT model represents an input image as a series of image patches, like the series of word embeddings used when using transformers to text, and directly predicts class labels for the image. ViT exhibits an extraordinary performance when trained on enough data, breaking the performance of a similar state-of-art CNN with 4x fewer computational resources. [[2](#references)] + + + +
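+
+As a toy illustration of the patch-sequence idea described above, the snippet below splits an image into 16x16 patches and flattens each patch into a token with plain NumPy. It is only a sketch of the concept, not the patch-embedding code used by the model.
+
+```python
+import numpy as np
+
+def image_to_patches(img, patch=16):
+    """Split an (H, W, C) image into a sequence of flattened patch tokens."""
+    h, w, c = img.shape
+    assert h % patch == 0 and w % patch == 0
+    grid = img.reshape(h // patch, patch, w // patch, patch, c)
+    # reorder to (patch_row, patch_col, pixels_in_patch) and flatten each patch
+    return grid.transpose(0, 2, 1, 3, 4).reshape(-1, patch * patch * c)
+
+tokens = image_to_patches(np.zeros((224, 224, 3)), patch=16)
+print(tokens.shape)  # (196, 768): 14x14 patches, each holding 16*16*3 values
+```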

+  Figure 1. Architecture of ViT [1]
+ +## Results + + +Our reproduced model performance on ImageNet-1K is reported as follows. + +
+
+| Model        | Context  | Top-1 (%) | Top-5 (%) | Params (M) | Recipe                                                                                          | Download                                                                                 |
+|--------------|----------|-----------|-----------|------------|-------------------------------------------------------------------------------------------------|--------------------------------------------------------------------------------------------|
+| vit_b_32_224 | D910x8-G | 75.86     | 92.08     | 87.46      | [yaml](https://github.com/mindspore-lab/mindcv/blob/main/configs/vit/vit_b32_224_ascend.yaml)   | [weights](https://download.mindspore.cn/toolkits/mindcv/vit/vit_b_32_224-7553218f.ckpt) |
+| vit_l_16_224 | D910x8-G | 76.34     | 92.79     | 303.31     | [yaml](https://github.com/mindspore-lab/mindcv/blob/main/configs/vit/vit_l16_224_ascend.yaml)   | [weights](https://download.mindspore.cn/toolkits/mindcv/vit/vit_l_16_224-f02b2487.ckpt) |
+| vit_l_32_224 | D910x8-G | 73.71     | 90.92     | 305.52     | [yaml](https://github.com/mindspore-lab/mindcv/blob/main/configs/vit/vit_l32_224_ascend.yaml)   | [weights](https://download.mindspore.cn/toolkits/mindcv/vit/vit_l_32_224-3a961018.ckpt) |
+
+ +#### Notes +- Context: Training context denoted as {device}x{pieces}-{MS mode}, where mindspore mode can be G - graph mode or F - pynative mode with ms function. For example, D910x8-G is for training on 8 pieces of Ascend 910 NPU using graph mode. +- Top-1 and Top-5: Accuracy reported on the validation set of ImageNet-1K. + + +## Quick Start +### Preparation + +#### Installation +Please refer to the [installation instruction](https://github.com/mindspore-lab/mindcv#installation) in MindCV. + +#### Dataset Preparation +Please download the [ImageNet-1K](https://www.image-net.org/challenges/LSVRC/2012/index.php) dataset for model training and validation. + +### Training + + +* Distributed Training + +It is easy to reproduce the reported results with the pre-defined training recipe. For distributed training on multiple Ascend 910 devices, please run + +```shell +# distributed training on multiple GPU/Ascend devices +mpirun -n 8 python train.py --config configs/vit/vit_b32_224_ascend.yaml --data_dir /path/to/imagenet +``` +> If the script is executed by the root user, the `--allow-run-as-root` parameter must be added to `mpirun`. + +Similarly, you can train the model on multiple GPU devices with the above `mpirun` command. + +For detailed illustration of all hyper-parameters, please refer to [config.py](https://github.com/mindspore-lab/mindcv/blob/main/config.py). + +**Note:** As the global batch size (batch_size x num_devices) is an important hyper-parameter, it is recommended to keep the global batch size unchanged for reproduction or adjust the learning rate linearly to a new global batch size. + +* Standalone Training + +If you want to train or finetune the model on a smaller dataset without distributed training, please run: + +```shell +# standalone training on a CPU/GPU/Ascend device +python train.py --config configs/vit/vit_b32_224_ascend.yaml --data_dir /path/to/dataset --distribute False +``` + +### Validation + +To validate the accuracy of the trained model, you can use `validate.py` and parse the checkpoint path with `--ckpt_path`. + +``` +python validate.py -c configs/vit/vit_b32_224_ascend.yaml --data_dir /path/to/imagenet --ckpt_path /path/to/ckpt +``` + +### Deployment + +To deploy online inference services with the trained model efficiently, please refer to the [deployment tutorial](https://github.com/mindspore-lab/mindcv/blob/main/tutorials/deployment.md). + +## References + + +[1] Dosovitskiy A, Beyer L, Kolesnikov A, et al. An image is worth 16x16 words: Transformers for image recognition at scale[J]. arXiv preprint arXiv:2010.11929, 2020. 
+ +[2] "Vision Transformers (ViT) in Image Recognition – 2022 Guide", https://viso.ai/deep-learning/vision-transformer-vit/ diff --git a/configs/vit/vit_b32_224_ascend.yaml b/configs/vit/vit_b32_224_ascend.yaml new file mode 100644 index 0000000..68b5514 --- /dev/null +++ b/configs/vit/vit_b32_224_ascend.yaml @@ -0,0 +1,61 @@ +# system +mode: 0 +distribute: True +num_parallel_workers: 8 +val_while_train: True +val_interval: 1 + +# dataset +dataset: "imagenet" +data_dir: "/path/to/imagenet" +shuffle: True +dataset_download: False +batch_size: 256 +drop_remainder: True + +# augmentation +image_resize: 224 +scale: [0.08, 1.0] +ratio: [0.75, 1.333] +hflip: 0.5 +interpolation: "bicubic" +re_prob: 0.1 +mixup: 0.2 +cutmix: 1.0 +cutmix_prob: 1.0 +crop_pct: 0.875 +color_jitter: [0.4, 0.4, 0.4] +auto_augment: "randaug-m7-mstd0.5" + +# model +model: "vit_b_32_224" +drop_rate: 0.1 +drop_path_rate: 0.1 +num_classes: 1000 +pretrained: False +ckpt_path: "" +keep_checkpoint_max: 10 +ckpt_save_policy: "top_k" +ckpt_save_dir: "./ckpt" +epoch_size: 600 +dataset_sink_mode: True +amp_level: "O2" + +# loss +loss: "CE" +loss_scale: 1024.0 +label_smoothing: 0.1 + +# lr scheduler +scheduler: "warmup_cosine_decay" +lr: 0.003 +min_lr: 1e-6 +warmup_epochs: 32 +decay_epochs: 568 +lr_epoch_stair: False + +# optimizer +opt: "adamw" +weight_decay: 0.025 +filter_bias_and_bn: True +use_nesterov: False diff --git a/configs/vit/vit_l16_224_ascend.yaml b/configs/vit/vit_l16_224_ascend.yaml new file mode 100644 index 0000000..8ad2780 --- /dev/null +++ b/configs/vit/vit_l16_224_ascend.yaml @@ -0,0 +1,61 @@ +# system +mode: 0 +distribute: True +num_parallel_workers: 8 +val_while_train: True +val_interval: 1 + +# dataset +dataset: "imagenet" +data_dir: "/path/to/imagenet" +shuffle: True +dataset_download: False +batch_size: 48 +drop_remainder: True + +# augmentation +image_resize: 224 +scale: [0.08, 1.0] +ratio: [0.75, 1.333] +hflip: 0.5 +interpolation: "bicubic" +re_prob: 0.15 +mixup: 0.2 +cutmix: 1.0 +cutmix_prob: 1.0 +crop_pct: 0.875 +color_jitter: [0.4, 0.4, 0.4] +auto_augment: "randaug-m9-mstd0.5" + +# model +model: "vit_l_16_224" +drop_rate: 0.12 +drop_path_rate: 0.1 +num_classes: 1000 +pretrained: False +ckpt_path: "" +keep_checkpoint_max: 10 +ckpt_save_policy: "top_k" +ckpt_save_dir: "./ckpt" +epoch_size: 300 +dataset_sink_mode: True +amp_level: "O2" + +# loss +loss: "CE" +loss_scale: 1024.0 +label_smoothing: 0.1 + +# lr scheduler +scheduler: "warmup_cosine_decay" +lr: 0.0005 +min_lr: 1e-5 +warmup_epochs: 32 +decay_epochs: 268 +lr_epoch_stair: False + +# optimizer +opt: "adamw" +weight_decay: 0.05 +filter_bias_and_bn: True +use_nesterov: False diff --git a/configs/vit/vit_l32_224_ascend.yaml b/configs/vit/vit_l32_224_ascend.yaml new file mode 100644 index 0000000..7aa88cd --- /dev/null +++ b/configs/vit/vit_l32_224_ascend.yaml @@ -0,0 +1,61 @@ +# system +mode: 0 +distribute: True +num_parallel_workers: 8 +val_while_train: True +val_interval: 1 + +# dataset +dataset: "imagenet" +data_dir: "/path/to/imagenet" +shuffle: True +dataset_download: False +batch_size: 128 +drop_remainder: True + +# augmentation +image_resize: 224 +scale: [0.08, 1.0] +ratio: [0.75, 1.333] +hflip: 0.5 +interpolation: "bicubic" +re_prob: 0.1 +mixup: 0.2 +cutmix: 1.0 +cutmix_prob: 1.0 +crop_pct: 0.875 +color_jitter: [0.4, 0.4, 0.4] +auto_augment: "randaug-m7-mstd0.5" + +# model +model: "vit_l_32_224" +drop_rate: 0.1 +drop_path_rate: 0.1 +num_classes: 1000 +pretrained: False +ckpt_path: "" +keep_checkpoint_max: 10 +ckpt_save_policy: "top_k" 
+ckpt_save_dir: "./ckpt" +epoch_size: 300 +dataset_sink_mode: True +amp_level: "O2" + +# loss +loss: "CE" +loss_scale: 1024.0 +label_smoothing: 0.1 + +# lr scheduler +scheduler: "warmup_cosine_decay" +lr: 0.0015 +min_lr: 1e-6 +warmup_epochs: 32 +decay_epochs: 268 +lr_epoch_stair: False + +# optimizer +opt: "adamw" +weight_decay: 0.025 +filter_bias_and_bn: True +use_nesterov: False diff --git a/configs/xception/README.md b/configs/xception/README.md new file mode 100644 index 0000000..fb73cba --- /dev/null +++ b/configs/xception/README.md @@ -0,0 +1,90 @@ +# Xception +> [Xception: Deep Learning with Depthwise Separable Convolutions](https://arxiv.org/pdf/1610.02357.pdf) + +## Introduction + +Xception is another improved network of InceptionV3 in addition to inceptionV4, using a deep convolutional neural +network architecture involving depthwise separable convolution, which was developed by Google researchers. Google +interprets the Inception module in convolutional neural networks as an intermediate step between regular convolution and +depthwise separable convolution operations. From this point of view, the depthwise separable convolution can be +understood as having the largest number of Inception modules, that is, the extreme idea proposed in the paper, combined +with the idea of residual network, Google proposed a new type of deep convolutional neural network inspired by Inception +Network architecture where the Inception module has been replaced by a depthwise separable convolution module.[[1](#references)] + +
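+
+To make the idea of a depthwise separable convolution concrete, a minimal sketch is given below. It assumes the standard MindSpore `nn.Conv2d`/`nn.Cell` interfaces and is not the actual Xception block of this repository, which is more involved.
+
+```python
+import mindspore.nn as nn
+
+class SeparableConv2d(nn.Cell):
+    """Depthwise separable convolution: per-channel 3x3 depthwise conv + 1x1 pointwise conv."""
+    def __init__(self, in_channels, out_channels):
+        super().__init__()
+        # group == in_channels -> one 3x3 filter per input channel (depthwise step)
+        self.depthwise = nn.Conv2d(in_channels, in_channels, kernel_size=3, group=in_channels)
+        # 1x1 convolution recombines the per-channel outputs (pointwise step)
+        self.pointwise = nn.Conv2d(in_channels, out_channels, kernel_size=1)
+
+    def construct(self, x):
+        return self.pointwise(self.depthwise(x))
+```
+
+For example, `SeparableConv2d(128, 256)` uses roughly 128*3*3 + 128*256 weights, compared with 128*256*3*3 for a full 3x3 convolution with the same input and output channels.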

+  Figure 1. Architecture of Xception [1]
+ +## Results + +Our reproduced model performance on ImageNet-1K is reported as follows. + +
+ +| Model | Context | Top-1 (%) | Top-5 (%) | Params (M) | Recipe | Download | +|----------|----------|-----------|-----------|------------|-------------------------------------------------------------------------------------------------|-------------------------------------------------------------------------------------| +| Xception | D910x8-G | 79.01 | 94.25 | 22.91 | [yaml](https://github.com/mindspore-lab/mindcv/blob/main/configs/xception/xception_ascend.yaml) | [weights](https://download.mindspore.cn/toolkits/mindcv/xception/xception-2c1e711df.ckpt) | + +
+ +#### Notes + +- Context: Training context denoted as {device}x{pieces}-{MS mode}, where mindspore mode can be G - graph mode or F - pynative mode with ms function. For example, D910x8-G is for training on 8 pieces of Ascend 910 NPU using graph mode. +- Top-1 and Top-5: Accuracy reported on the validation set of ImageNet-1K. + +## Quick Start + +### Preparation + +#### Installation +Please refer to the [installation instruction](https://github.com/mindspore-ecosystem/mindcv#installation) in MindCV. + +#### Dataset Preparation +Please download the [ImageNet-1K](https://www.image-net.org/challenges/LSVRC/2012/index.php) dataset for model training and validation. + +### Training + +* Distributed Training + +It is easy to reproduce the reported results with the pre-defined training recipe. For distributed training on multiple Ascend 910 devices, please run + +```shell +# distributed training on multiple GPU/Ascend devices +mpirun -n 8 python train.py --config configs/xception/xception_ascend.yaml --data_dir /path/to/imagenet +``` + +> If the script is executed by the root user, the `--allow-run-as-root` parameter must be added to `mpirun`. + +Similarly, you can train the model on multiple GPU devices with the above `mpirun` command. + +For detailed illustration of all hyper-parameters, please refer to [config.py](https://github.com/mindspore-lab/mindcv/blob/main/config.py). + +**Note:** As the global batch size (batch_size x num_devices) is an important hyper-parameter, it is recommended to keep the global batch size unchanged for reproduction or adjust the learning rate linearly to a new global batch size. + +* Standalone Training + +If you want to train or finetune the model on a smaller dataset without distributed training, please run: + +```shell +# standalone training on a CPU/GPU/Ascend device +python train.py --config configs/xception/xception_ascend.yaml --data_dir /path/to/dataset --distribute False +``` + +### Validation + +To validate the accuracy of the trained model, you can use `validate.py` and parse the checkpoint path with `--ckpt_path`. + +```shell +python validate.py -c configs/xception/xception_ascend.yaml --data_dir /path/to/imagenet --ckpt_path /path/to/ckpt +``` + +### Deployment + +Please refer to the [deployment tutorial](https://github.com/mindspore-lab/mindcv/blob/main/tutorials/deployment.md) in MindCV. + +## References + +[1] Chollet F. Xception: Deep learning with depthwise separable convolutions[C]//Proceedings of the IEEE conference on computer vision and pattern recognition. 2017: 1251-1258. 
diff --git a/configs/xception/xception_ascend.yaml b/configs/xception/xception_ascend.yaml new file mode 100644 index 0000000..6fc32da --- /dev/null +++ b/configs/xception/xception_ascend.yaml @@ -0,0 +1,53 @@ +# system +mode: 0 +distribute: True +num_parallel_workers: 8 +val_while_train: True + +# dataset +dataset: 'imagenet' +data_dir: '/path/to/imagenet' +shuffle: True +dataset_download: False +batch_size: 32 +drop_remainder: True + +# augmentation +image_resize: 299 +scale: [0.08, 1.0] +ratio: [0.75, 1.333] +hflip: 0.5 +interpolation: 'bilinear' +auto_augment: 'autoaug' +re_prob: 0.25 +crop_pct: 0.875 + +# model +model: 'xception' +num_classes: 1000 +pretrained: False +ckpt_path: '' +keep_checkpoint_max: 10 +ckpt_save_dir: './ckpt' +epoch_size: 200 +dataset_sink_mode: True +amp_level: 'O2' + +# loss +loss: 'CE' +label_smoothing: 0.1 + +# lr scheduler +scheduler: 'warmup_cosine_decay' +lr: 0.045 +min_lr: 0.0 +decay_epochs: 195 +warmup_epochs: 5 + +# optimizer +opt: 'sgd' +filter_bias_and_bn: True +momentum: 0.9 +weight_decay: 0.00001 +loss_scale: 1024 +use_nesterov: False diff --git a/configs/xcit/README.md b/configs/xcit/README.md new file mode 100644 index 0000000..6bb7343 --- /dev/null +++ b/configs/xcit/README.md @@ -0,0 +1,86 @@ +# XCiT: Cross-Covariance Image Transformers + +> [XCiT: Cross-Covariance Image Transformers](https://arxiv.org/abs/2106.09681) +## Introduction + +XCiT models propose a “transposed” version of self-attention that operates across feature channels rather than tokens, where the interactions are based on the cross-covariance matrix between keys and queries. The resulting cross-covariance attention (XCA) has linear complexity in the number of tokens, and allows efficient processing of high-resolution images. Our cross-covariance image transformer (XCiT) – built upon XCA – combines the accuracy of conventional transformers with the scalability of convolutional architectures. + +
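+
+The cross-covariance attention (XCA) described above can be sketched in a few lines of NumPy. This is only an illustration of the idea on a single head; details such as the temperature, the exact normalization and the multi-head split follow the paper and the MindCV implementation rather than this sketch.
+
+```python
+import numpy as np
+
+def xca(x, w_q, w_k, w_v, tau=1.0):
+    """Cross-covariance attention sketch. x: (N, d) tokens; w_*: (d, d) projections."""
+    q, k, v = x @ w_q, x @ w_k, x @ w_v                        # each (N, d)
+    # l2-normalize along the token axis so that k.T @ q is a d x d cross-covariance map
+    q = q / (np.linalg.norm(q, axis=0, keepdims=True) + 1e-6)
+    k = k / (np.linalg.norm(k, axis=0, keepdims=True) + 1e-6)
+    attn = np.exp((k.T @ q) / tau)                             # (d, d): channel-to-channel weights
+    attn = attn / attn.sum(axis=-1, keepdims=True)             # softmax over channels
+    return v @ attn                                            # (N, d): cost grows linearly with N
+
+tokens = np.random.randn(196, 64)                              # e.g. 196 patch tokens, 64 channels
+out = xca(tokens, *[np.random.randn(64, 64) * 0.1 for _ in range(3)])
+print(out.shape)                                               # (196, 64)
+```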

+  Figure 1. Architecture of XCiT [1]
+ +## Results + +Our reproduced model performance on ImageNet-1K is reported as follows. + +
+ +| Model | Context | Top-1 (%) | Top-5 (%) | Params (M) | Recipe | Download | +|--------------|----------|-----------|-----------|------------|-----------------------------------------------------------------------------------------------|--------------------------------------------------------------------------------| +| xcit_tiny_12_p16 | D910x8-G | 77.67 | 93.79 | 7.00 | [yaml](https://github.com/mindspore-lab/mindcv/blob/main/configs/xcit/xcit_tiny_12_p16_ascend.yaml) | [weights](https://download.mindspore.cn/toolkits/mindcv/xcit/xcit_tiny_12_p16_224-1b1c9301.ckpt) | + +
+ + +#### Notes + +- Context: Training context denoted as {device}x{pieces}-{MS mode}, where mindspore mode can be G - graph mode or F - pynative mode with ms function. For example, D910x8-G is for training on 8 pieces of Ascend 910 NPU using graph mode. +- Top-1 and Top-5: Accuracy reported on the validation set of ImageNet-1K. + +## Quick Start + +### Preparation + +#### Installation + +Please refer to the [installation instruction](https://github.com/mindspore-lab/mindcv#installation) in MindCV. + +#### Dataset Preparation + +Please download the [ImageNet-1K](https://www.image-net.org/challenges/LSVRC/2012/index.php) dataset for model training and validation. + +### Training + +* Distributed Training + +It is easy to reproduce the reported results with the pre-defined training recipe. For distributed training on multiple Ascend 910 devices, please run + +```shell +# distributed training on multiple GPU/Ascend devices +mpirun -n 8 python train.py --config configs/xcit/xcit_tiny_12_p16_ascend.yaml --data_dir /path/to/imagenet +``` +> If the script is executed by the root user, the `--allow-run-as-root` parameter must be added to `mpirun`. +Similarly, you can train the model on multiple GPU devices with the above `mpirun` command. + +For detailed illustration of all hyper-parameters, please refer to [config.py](https://github.com/mindspore-lab/mindcv/blob/main/config.py). + +**Note:** As the global batch size (batch_size x num_devices) is an important hyper-parameter, it is recommended to keep the global batch size unchanged for reproduction or adjust the learning rate linearly to a new global batch size. + +* Standalone Training + +If you want to train or finetune the model on a smaller dataset without distributed training, please run: + +```shell +# standalone training on a CPU/GPU/Ascend device +python train.py --config configs/xcit/xcit_tiny_12_p16_ascend.yaml --data_dir /path/to/dataset --distribute False +``` + +### Validation + +To validate the accuracy of the trained model, you can use `validate.py` and parse the checkpoint path with `--ckpt_path`. + +``` +python validate.py -c configs/xcit/xcit_tiny_12_p16_ascend.yaml --data_dir /path/to/imagenet --ckpt_path /path/to/ckpt +``` + +### Deployment + +Please refer to the [deployment tutorial](https://github.com/mindspore-lab/mindcv/blob/main/tutorials/deployment.md) in MindCV. + +## References + + +[1] Ali A, Touvron H, Caron M, et al. Xcit: Cross-covariance image transformers[J]. Advances in neural information processing systems, 2021, 34: 20014-20027. 
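+
+Besides the `train.py` / `validate.py` entry points above, MindCV models can usually also be created programmatically through the model registry. A minimal sketch, assuming that `xcit_tiny_12_p16` (the `model` name used by the recipe above) is registered and that `pretrained=True` downloads the released checkpoint:
+
+```python
+import mindcv
+
+network = mindcv.create_model("xcit_tiny_12_p16", pretrained=True)
+network.set_train(False)  # switch the network to evaluation behaviour before inference
+```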
diff --git a/configs/xcit/xcit_tiny_12_p16_ascend.yaml b/configs/xcit/xcit_tiny_12_p16_ascend.yaml new file mode 100644 index 0000000..363d444 --- /dev/null +++ b/configs/xcit/xcit_tiny_12_p16_ascend.yaml @@ -0,0 +1,58 @@ +# system +mode: 0 +distribute: True +num_parallel_workers: 16 +val_while_train: True + +# dataset +dataset: 'imagenet' +data_dir: '/path/to/imagenet' +shuffle: True +dataset_download: False +batch_size: 128 +drop_remainder: True + +# augmentation +image_resize: 224 +hflip: 0.5 +color_jitter: 0.4 +interpolation: 'bicubic' +crop_pct: 0.875 +re_prob: 0.25 +mixup: 0.8 +cutmix: 1.0 +auto_augment: 'randaug-m9-mstd0.5-inc1' +ema: True +ema_decay: 0.99996 + +# model +model: 'xcit_tiny_12_p16' +num_classes: 1000 +pretrained: False +ckpt_path: "" +keep_checkpoint_max: 30 +ckpt_save_dir: './ckpt' +epoch_size: 500 +dataset_sink_mode: True +amp_level: 'O2' +drop_rate: 0.0 +drop_path_rate: 0.0 + +# loss +loss: 'CE' +label_smoothing: 0.1 + +# lr scheduler +scheduler: 'cosine_decay' +min_lr: 0.00001 +lr: 0.0005 +warmup_epochs: 40 +decay_epochs: 460 +decay_rate: 0.1 + +# optimizer +opt: 'adamw' +filter_bias_and_bn: True +weight_decay: 0.05 +loss_scale: 1024 +use_nesterov: False diff --git a/docs/README.md b/docs/README.md new file mode 100644 index 0000000..88044e7 --- /dev/null +++ b/docs/README.md @@ -0,0 +1,28 @@ +## Build Documentation + +1. Clone mindcv + + ```bash + git clone https://github.com/mindspore-lab/mindcv.git + cd mindcv + ``` + +2. Install the building dependencies of documentation + + ```bash + pip install -r docs/requirements.txt + ``` + +3. Change directory to `docs/en` or `docs/zh_cn` + + ```bash + cd docs/en # or docs/zh_cn + ``` + +4. Build documentation + + ```bash + make html + ``` + +5. Open `_build/html/index.html` with browser diff --git a/docs/en/Makefile b/docs/en/Makefile new file mode 100644 index 0000000..73a28c7 --- /dev/null +++ b/docs/en/Makefile @@ -0,0 +1,20 @@ +# Minimal makefile for Sphinx documentation +# + +# You can set these variables from the command line, and also +# from the environment for the first two. +SPHINXOPTS ?= +SPHINXBUILD ?= sphinx-build +SOURCEDIR = . +BUILDDIR = _build + +# Put it first so that "make" without argument is like "make help". +help: + @$(SPHINXBUILD) -M help "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O) + +.PHONY: help Makefile + +# Catch-all target: route all unknown targets to Sphinx using the new +# "make mode" option. $(O) is meant as a shortcut for $(SPHINXOPTS). +%: Makefile + @$(SPHINXBUILD) -M $@ "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O) diff --git a/docs/en/_static/css/readthedocs.css b/docs/en/_static/css/readthedocs.css new file mode 100644 index 0000000..c019c1f --- /dev/null +++ b/docs/en/_static/css/readthedocs.css @@ -0,0 +1,3 @@ +table.colwidths-auto td { + width: 50% +} diff --git a/docs/en/_static/image/cat_and_dog.png b/docs/en/_static/image/cat_and_dog.png new file mode 100644 index 0000000..61bdcae Binary files /dev/null and b/docs/en/_static/image/cat_and_dog.png differ diff --git a/docs/en/_templates/classtemplate.rst b/docs/en/_templates/classtemplate.rst new file mode 100644 index 0000000..3601668 --- /dev/null +++ b/docs/en/_templates/classtemplate.rst @@ -0,0 +1,14 @@ +.. role:: hidden + :class: hidden-section +.. currentmodule:: {{ module }} + + +{{ name | underline}} + +.. autoclass:: {{ name }} + :members: + + +.. 
+ autogenerated from source/_templates/classtemplate.rst + note it does not have :inherited-members: diff --git a/docs/en/api/mindcv.data.rst b/docs/en/api/mindcv.data.rst new file mode 100644 index 0000000..4cb1f70 --- /dev/null +++ b/docs/en/api/mindcv.data.rst @@ -0,0 +1,7 @@ +mindcv.data +=================================== + +.. automodule:: mindcv.data + :members: + :undoc-members: + :show-inheritance: diff --git a/docs/en/api/mindcv.loss.rst b/docs/en/api/mindcv.loss.rst new file mode 100644 index 0000000..19c03d6 --- /dev/null +++ b/docs/en/api/mindcv.loss.rst @@ -0,0 +1,7 @@ +mindcv.loss +=================================== + +.. automodule:: mindcv.loss + :members: + :undoc-members: + :show-inheritance: diff --git a/docs/en/api/mindcv.models.layers.rst b/docs/en/api/mindcv.models.layers.rst new file mode 100644 index 0000000..6845f11 --- /dev/null +++ b/docs/en/api/mindcv.models.layers.rst @@ -0,0 +1,7 @@ +mindcv.models.layers +=================================== + +.. automodule:: mindcv.models.layers + :members: + :undoc-members: + :show-inheritance: diff --git a/docs/en/api/mindcv.models.rst b/docs/en/api/mindcv.models.rst new file mode 100644 index 0000000..11a2b79 --- /dev/null +++ b/docs/en/api/mindcv.models.rst @@ -0,0 +1,84 @@ +mindcv.models +=================================== + +.. autoclass:: mindcv.models.create_model + +.. autoclass:: mindcv.models.list_models + +.. autoclass:: mindcv.models.is_model + +.. autoclass:: mindcv.models.model_entrypoint + +.. autoclass:: mindcv.models.list_modules + +.. autoclass:: mindcv.models.is_model_in_modules + +.. autoclass:: mindcv.models.is_model_pretrained + +.. autoclass:: mindcv.models.BiT_ResNet + +.. autoclass:: mindcv.models.BiTresnet50 + +.. autoclass:: mindcv.models.ConViT + +.. autoclass:: mindcv.models.ConvNeXt + +.. autoclass:: mindcv.models.DenseNet + +.. autoclass:: mindcv.models.DPN + +.. autoclass:: mindcv.models.EdgeNeXt + +.. autoclass:: mindcv.models.EfficientNet + +.. autoclass:: mindcv.models.GhostNet + +.. autoclass:: mindcv.models.GoogLeNet + +.. autoclass:: mindcv.models.InceptionV3 + +.. autoclass:: mindcv.models.InceptionV4 + +.. autoclass:: mindcv.models.Mnasnet + +.. autoclass:: mindcv.models.MobileNetV1 + +.. autoclass:: mindcv.models.MobileNetV2 + +.. autoclass:: mindcv.models.MobileNetV3 + +.. autoclass:: mindcv.models.NASNetAMobile + +.. autoclass:: mindcv.models.Pnasnet + +.. autoclass:: mindcv.models.PoolFormer + +.. autoclass:: mindcv.models.PyramidVisionTransformer + +.. autoclass:: mindcv.models.PyramidVisionTransformerV2 + +.. autoclass:: mindcv.models.RepMLPNet + +.. autoclass:: mindcv.models.RepVGG + +.. autoclass:: mindcv.models.Res2Net + +.. autoclass:: mindcv.models.ResNet + +.. autoclass:: mindcv.models.ReXNetV1 + +.. autoclass:: mindcv.models.ShuffleNetV1 + +.. autoclass:: mindcv.models.ShuffleNetV2 + +.. autoclass:: mindcv.models.SKNet + +.. autoclass:: mindcv.models.SqueezeNet + +.. autoclass:: mindcv.models.SwinTransformer + +.. autoclass:: mindcv.models.VGG + +.. autoclass:: mindcv.models.ViT + +.. autoclass:: mindcv.models.Xception diff --git a/docs/en/api/mindcv.optim.rst b/docs/en/api/mindcv.optim.rst new file mode 100644 index 0000000..58a9837 --- /dev/null +++ b/docs/en/api/mindcv.optim.rst @@ -0,0 +1,7 @@ +mindcv.optim +=================================== + +.. 
automodule:: mindcv.optim + :members: + :undoc-members: + :show-inheritance: diff --git a/docs/en/api/mindcv.scheduler.rst b/docs/en/api/mindcv.scheduler.rst new file mode 100644 index 0000000..b5f9db7 --- /dev/null +++ b/docs/en/api/mindcv.scheduler.rst @@ -0,0 +1,7 @@ +mindcv.scheduler +=================================== + +.. automodule:: mindcv.scheduler + :members: + :undoc-members: + :show-inheritance: diff --git a/docs/en/api/mindcv.utils.rst b/docs/en/api/mindcv.utils.rst new file mode 100644 index 0000000..db8c14e --- /dev/null +++ b/docs/en/api/mindcv.utils.rst @@ -0,0 +1,7 @@ +mindcv.utils +=================================== + +.. automodule:: mindcv.utils + :members: + :undoc-members: + :show-inheritance: diff --git a/docs/en/classification/definition.md b/docs/en/classification/definition.md new file mode 100644 index 0000000..9b287b9 --- /dev/null +++ b/docs/en/classification/definition.md @@ -0,0 +1,3 @@ +# What is MindCV + +MindCV is an open source toolbox for computer vision research and development based on [MindSpore](https://www.mindspore.cn/en). It collects a series of classic and SoTA vision models, such as ResNet and SwinTransformer, along with their pretrained weights. SoTA methods such as AutoAugment are also provided for performance improvement. With the decoupled module design, it is easy to apply or adapt MindCV to your own CV tasks. diff --git a/docs/en/classification/feature.md b/docs/en/classification/feature.md new file mode 100644 index 0000000..7882893 --- /dev/null +++ b/docs/en/classification/feature.md @@ -0,0 +1,22 @@ +# Features + +- **Easy-to-Use.** MindCV decomposes the vision framework into various configurable components. It is easy to customize your data pipeline, models, and learning pipeline with MindCV: + +```python +>>> import mindcv +# create a dataset +>>> dataset = mindcv.create_dataset('cifar10', download=True) +# create a model +>>> network = mindcv.create_model('resnet50', pretrained=True) +``` + +Users can customize and launch their transfer learning or training task in one command line. + +``` python +# transfer learning in one command line +>>> !python train.py --model=swin_tiny --pretrained --opt=adamw --lr=0.001 --data_dir={data_dir} +``` + +- **State-of-The-Art.** MindCV provides various CNN-based and Transformer-based vision models including SwinTransformer. Their pretrained weights and performance reports are provided to help users select and reuse the right model. + +- **Flexibility and efficiency.** MindCV is bulit on MindSpore which is an efficent DL framework that can run on different hardward platforms (GPU/CPU/Ascend). It supports both graph mode for high efficiency and pynative mode for flexibity. diff --git a/docs/en/conf.py b/docs/en/conf.py new file mode 100644 index 0000000..bd6bf20 --- /dev/null +++ b/docs/en/conf.py @@ -0,0 +1,108 @@ +# Configuration file for the Sphinx documentation builder. +# +# This file only contains a selection of the most common options. For a full +# list see the documentation: +# https://www.sphinx-doc.org/en/master/usage/configuration.html + +# -- Path setup -------------------------------------------------------------- + +# If extensions (or modules to document with autodoc) are in another directory, +# add these directories to sys.path here. If the directory is relative to the +# documentation root, use os.path.abspath to make it absolute, like shown here. 
+# +import os +import shutil +import sys + +import sphinx_rtd_theme + +import mindcv + +sys.path.insert(0, os.path.abspath("../../")) + +# -- Project information ----------------------------------------------------- + +project = "mindcv" +copyright = "2022, mindcv contributors" +author = "mindcv contributors" + +version_file = "../../mindcv/version.py" +with open(version_file) as f: + exec(compile(f.read(), version_file, "exec")) +__version__ = locals()["__version__"] +# The short X.Y version +version = __version__ +# The full version, including alpha/beta/rc tags +release = __version__ + +# -- General configuration --------------------------------------------------- + +# Add any Sphinx extension module names here, as strings. They can be +# extensions coming with Sphinx (named 'sphinx.ext.*') or your custom +# ones. + +extensions = [ + "sphinx.ext.autodoc", + "sphinx.ext.autosummary", + "sphinx.ext.intersphinx", + "sphinx.ext.napoleon", + "sphinx.ext.viewcode", + "sphinx.ext.autosectionlabel", + "sphinx_markdown_tables", + "myst_parser", + "sphinx_copybutton", + "sphinx.ext.autodoc.typehints", +] # yapf: disable +autodoc_typehints = "description" +myst_heading_anchors = 4 + +source_suffix = { + ".rst": "restructuredtext", + ".md": "markdown", +} + +# copy markdown files from outer directory +if not os.path.exists("./tutorials"): + os.makedirs("./tutorials") +shutil.copy("../../tutorials/deployment.md", "./tutorials/deployment.md") +shutil.copy("../../tutorials/finetune.md", "./tutorials/finetune.md") +shutil.copy("../../tutorials/inference.md", "./tutorials/inference.md") +shutil.copy("../../tutorials/learn_about_config.md", "./tutorials/learn_about_config.md") +shutil.copy("../../tutorials/output_8_0.png", "./tutorials/output_8_0.png") +shutil.copy("../../tutorials/output_11_0.png", "./tutorials/output_11_0.png") +shutil.copy("../../tutorials/output_23_0.png", "./tutorials/output_23_0.png") +shutil.copy("../../tutorials/output_30_0.png", "./tutorials/output_30_0.png") +if not os.path.exists("./quick_start"): + os.makedirs("./quick_start") +shutil.copy("../../quick_start.md", "./quick_start/quick_start.md") + +shutil.copy("../../benchmark_results.md", "./classification/benchmark_results.md") + +os.system("cp -R %s %s" % ("../../configs", "./")) + +# Add any paths that contain templates here, relative to this directory. +templates_path = ["_templates"] + +# List of patterns, relative to source directory, that match files and +# directories to ignore when looking for source files. +# This pattern also affects html_static_path and html_extra_path. +exclude_patterns = ["_build", "Thumbs.db", ".DS_Store"] + +# -- Options for HTML output ------------------------------------------------- + +# The theme to use for HTML and HTML Help pages. See the documentation for +# a list of builtin themes. +# +html_theme = "sphinx_rtd_theme" +html_theme_path = [sphinx_rtd_theme.get_html_theme_path()] + +# Add any paths that contain custom static files (such as style sheets) here, +# relative to this directory. They are copied after the builtin static files, +# so a file named "default.css" will overwrite the builtin "default.css". +html_static_path = ["_static"] +html_css_files = ["css/readthedocs.css"] + +# -- Extension configuration ------------------------------------------------- +# Ignore >>> when copying code +copybutton_prompt_text = r">>> |\.\.\. 
" +copybutton_prompt_is_regexp = True diff --git a/docs/en/docutils.conf b/docs/en/docutils.conf new file mode 100644 index 0000000..ddd79c3 --- /dev/null +++ b/docs/en/docutils.conf @@ -0,0 +1,2 @@ +[html writers] +table_style: colwidths-auto diff --git a/docs/en/index.rst b/docs/en/index.rst new file mode 100644 index 0000000..e295a1b --- /dev/null +++ b/docs/en/index.rst @@ -0,0 +1,88 @@ +Welcome to MindSpore CV's documentation! +============================================================================================= +(You can switch between Chinese and English documents in the lower-left corner of the layout.) + +.. toctree:: + :maxdepth: 1 + :caption: Introduction + + classification/definition.md + classification/feature.md + classification/benchmark_results.md + +.. toctree:: + :maxdepth: 1 + :caption: Quick Start + + quick_start/quick_start.md + +.. toctree:: + :maxdepth: 1 + :caption: Tutorials + + tutorials/learn_about_config.md + tutorials/inference.md + tutorials/finetune.md + tutorials/deployment.md + +.. toctree:: + :maxdepth: 1 + :caption: APIs + + api/mindcv.data + api/mindcv.loss + api/mindcv.optim + api/mindcv.models + api/mindcv.models.layers + api/mindcv.scheduler + +.. toctree:: + :maxdepth: 1 + :caption: Notes + + notes/changelog.md + notes/contribute.md + notes/faq.md + +.. toctree:: + :maxdepth: 1 + :caption: Examples + + configs/BigTransfer/README.md + configs/convit/README.md + configs/densenet/README.md + configs/edgenext/README.md + configs/googlenet/README.md + configs/inception_v3/README.md + configs/inception_v4/README.md + configs/mnasnet/README.md + configs/mobilenetv1/README.md + configs/mobilenetv2/README.md + configs/mobilenetv3/README.md + configs/poolformer/README.md + configs/pvt/README.md + configs/regnet/README.md + configs/repmlp/README.md + configs/repvgg/README.md + configs/res2net/README.md + configs/resnet/README.md + configs/rexnet/README.md + configs/shufflenet_v1/README.md + configs/shufflenet_v2/README.md + configs/sknet/README.md + configs/squeezenet/README.md + configs/visformer/README.md + configs/vit/README.md + configs/xception/README.md + +.. toctree:: + :caption: Switch Language + + switch_language.md + +Indices and tables +================== + +* :ref:`genindex` +* :ref:`modindex` +* :ref:`search` diff --git a/docs/en/make.bat b/docs/en/make.bat new file mode 100644 index 0000000..2119f51 --- /dev/null +++ b/docs/en/make.bat @@ -0,0 +1,35 @@ +@ECHO OFF + +pushd %~dp0 + +REM Command file for Sphinx documentation + +if "%SPHINXBUILD%" == "" ( + set SPHINXBUILD=sphinx-build +) +set SOURCEDIR=. +set BUILDDIR=_build + +if "%1" == "" goto help + +%SPHINXBUILD% >NUL 2>NUL +if errorlevel 9009 ( + echo. + echo.The 'sphinx-build' command was not found. Make sure you have Sphinx + echo.installed, then set the SPHINXBUILD environment variable to point + echo.to the full path of the 'sphinx-build' executable. Alternatively you + echo.may add the Sphinx directory to PATH. + echo. + echo.If you don't have Sphinx installed, grab it from + echo.http://sphinx-doc.org/ + exit /b 1 +) + +%SPHINXBUILD% -M %1 %SOURCEDIR% %BUILDDIR% %SPHINXOPTS% %O% +goto end + +:help +%SPHINXBUILD% -M help %SOURCEDIR% %BUILDDIR% %SPHINXOPTS% %O% + +:end +popd diff --git a/docs/en/notes/changelog.md b/docs/en/notes/changelog.md new file mode 100644 index 0000000..72bb073 --- /dev/null +++ b/docs/en/notes/changelog.md @@ -0,0 +1,3 @@ +# Changelog + +Coming soon. 
diff --git a/docs/en/notes/contribute.md b/docs/en/notes/contribute.md new file mode 100644 index 0000000..a8980a3 --- /dev/null +++ b/docs/en/notes/contribute.md @@ -0,0 +1,3 @@ +# How to contribute + +Coming soon. diff --git a/docs/en/notes/faq.md b/docs/en/notes/faq.md new file mode 100644 index 0000000..36f8bd2 --- /dev/null +++ b/docs/en/notes/faq.md @@ -0,0 +1,3 @@ +# FAQs + +Coming soon. diff --git a/docs/en/switch_language.md b/docs/en/switch_language.md new file mode 100644 index 0000000..bae5bed --- /dev/null +++ b/docs/en/switch_language.md @@ -0,0 +1,3 @@ +## English + +## 简体中文 diff --git a/docs/requirements.txt b/docs/requirements.txt new file mode 100644 index 0000000..aed315b --- /dev/null +++ b/docs/requirements.txt @@ -0,0 +1,13 @@ +docutils==0.16.0 +myst-parser +opencv-python +sphinx-rtd-theme +sphinx==4.0.2 +sphinx-copybutton +sphinx_markdown_tables +torch +torchvision +numpy >= 1.17.0 +pyyaml >= 5.3 +tqdm +-e git+https://github.com/mindlab-ai/mindcv.git#egg=mindcv diff --git a/docs/resource/_static/layout.html b/docs/resource/_static/layout.html new file mode 100644 index 0000000..7e7e82a --- /dev/null +++ b/docs/resource/_static/layout.html @@ -0,0 +1,373 @@ +{# TEMPLATE VAR SETTINGS #} +{%- set url_root = pathto('', 1) %} +{%- if url_root == '#' %}{% set url_root = '' %}{% endif %} +{%- if not embedded and docstitle %} +{%- set titlesuffix = " — "|safe + docstitle|e %} +{%- else %} +{%- set titlesuffix = "" %} +{%- endif %} +{%- set lang_attr = 'en' if language == None else (language | replace('_', '-')) %} +{% import 'theme_variables.jinja' as theme_variables %} + +{%- if theme_menu_lang %} +{%- if theme_menu_lang not in theme_variables.lang_menu_mapping %} +{%- set shared_menu = theme_variables.lang_menu_mapping['default'] %} +{%- else %} +{%- set shared_menu = theme_variables.lang_menu_mapping[theme_menu_lang] %} +{%- endif %} +{%- endif %} + + + + + + + + + + {{ metatags }} + + {% block htmltitle %} + {{ title|striptags|e }}{{ titlesuffix }} + {% endblock %} + + + {# CANONICAL URL #} + {% if theme_canonical_url %} + + {% endif %} + + {# CSS #} + + {# OPENSEARCH #} + {% if not embedded %} + {% if use_opensearch %} + + {% endif %} + + {% endif %} + + + + {%- for css in css_files %} + {%- if css|attr("rel") %} + + {%- else %} + + {%- endif %} + {%- endfor %} + {%- for cssfile in extra_css_files %} + + {%- endfor %} + + {%- block linktags %} + {%- if hasdoc('about') %} + + {%- endif %} + {%- if hasdoc('genindex') %} + + {%- endif %} + {%- if hasdoc('search') %} + + {%- endif %} + {%- if hasdoc('copyright') %} + + {%- endif %} + {%- if next %} + + {%- endif %} + {%- if prev %} + + {%- endif %} + {%- endblock %} + + {%- block extrahead %} + + + {% if theme_analytics_id %} + + + {% endif %} + + {% endblock %} + + {# Keep modernizr in head - http://modernizr.com/docs/#installing #} + + {% include "mathjax_config.html" %} + + {% include "fonts.html" %} + + + +
+
+
+ + + + + +
+
+
+ + + + {% block extrabody %} {% endblock %} + + {# SIDE NAV, TOGGLES ON MOBILE #} + + + + + +
+
+
+ {% include "breadcrumbs.html" %} +
+ +
+ Shortcuts +
+
+ +
+
+ + {%- block content %} + {% if theme_style_external_links|tobool %} + + +
+
+
+ {{ toc }} +
+
+
+
+
+ + {% include "versions.html" %} + + {% if not embedded %} + + {% if sphinx_version >= "1.8.0" %} + + {%- for scriptfile in script_files %} + {{ js_tag(scriptfile) }} + {%- endfor %} + {% else %} + + {%- for scriptfile in script_files %} + + {%- endfor %} + {% endif %} + + {% endif %} + + + + + + + + + {%- block footer %} {% endblock %} + + + +
+
+ + + + + +
+
+
+
+ + +
+
+
+ + +
+ + + + + + + + + diff --git a/docs/resource/_static/logo.png b/docs/resource/_static/logo.png new file mode 100644 index 0000000..5f8edf7 Binary files /dev/null and b/docs/resource/_static/logo.png differ diff --git a/docs/resource/_static/theme_variables.jinja b/docs/resource/_static/theme_variables.jinja new file mode 100644 index 0000000..71bc3e4 --- /dev/null +++ b/docs/resource/_static/theme_variables.jinja @@ -0,0 +1,105 @@ +{%- set shared_menu_en = [ + { + 'name': + 'Docs', + 'children': [ + { + 'name': 'mindcv', + 'url': 'https://mindcv.readthedocs.io/en/latest/', + }, + { + 'name': 'mindnlp', + 'url': 'https://mindnlp.readthedocs.io/en/latest/', + }, + { + 'name': 'mindface', + 'url': 'https://mindface.readthedocs.io/en/latest/', + }, + { + 'name': 'mindaudio', + 'url': 'https://mindaudio.readthedocs.io/en/latest/', + }, + ] + }, + { + 'name': + 'mindspore-lab', + 'children': [ + { + 'name': 'Homepage', + 'url': '#' + }, + { + 'name': 'Open Platform', + 'url': '#' + }, + { + 'name': 'GitHub', + 'url': 'https://github.com/mindspore-lab/' + }, + { + 'name': 'Twitter', + 'url': '#' + }, + { + 'name': 'Zhihu', + 'url': '#' + }, + ] + }, + ] +-%} + +{%- set shared_menu_cn = [ + { + 'name': + '文档', + 'children': [ + { + 'name': 'mindcv', + 'url': 'https://mindcv.readthedocs.io/zh_CN/latest/', + }, + { + 'name': 'mindnlp', + 'url': 'https://mindnlp.readthedocs.io/zh_CN/latest/', + }, + { + 'name': 'mindface', + 'url': 'https://mindface.readthedocs.io/zh_CN/latest/', + }, + { + 'name': 'mindaudio', + 'url': 'https://mindaudio.readthedocs.io/zh_CN/latest/', + }, + ] + }, + { + 'name': + 'mindspore-lab', + 'children': [ + { + 'name': '官网', + 'url': '#' + }, + { + 'name': '开放平台', + 'url': '#' + }, + { + 'name': 'GitHub', + 'url': 'https://github.com/mindspore-lab/' + }, + { + 'name': '推特', + 'url': '#' + }, + { + 'name': '知乎', + 'url': '#' + }, + ] + }, + ] +-%} + +{%- set lang_menu_mapping = {'default': shared_menu_en, 'cn': shared_menu_cn, 'en': shared_menu_en} -%} diff --git a/docs/zh_cn/Makefile b/docs/zh_cn/Makefile new file mode 100644 index 0000000..73a28c7 --- /dev/null +++ b/docs/zh_cn/Makefile @@ -0,0 +1,20 @@ +# Minimal makefile for Sphinx documentation +# + +# You can set these variables from the command line, and also +# from the environment for the first two. +SPHINXOPTS ?= +SPHINXBUILD ?= sphinx-build +SOURCEDIR = . +BUILDDIR = _build + +# Put it first so that "make" without argument is like "make help". +help: + @$(SPHINXBUILD) -M help "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O) + +.PHONY: help Makefile + +# Catch-all target: route all unknown targets to Sphinx using the new +# "make mode" option. $(O) is meant as a shortcut for $(SPHINXOPTS). +%: Makefile + @$(SPHINXBUILD) -M $@ "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O) diff --git a/docs/zh_cn/_static/css/readthedocs.css b/docs/zh_cn/_static/css/readthedocs.css new file mode 100644 index 0000000..c019c1f --- /dev/null +++ b/docs/zh_cn/_static/css/readthedocs.css @@ -0,0 +1,3 @@ +table.colwidths-auto td { + width: 50% +} diff --git a/docs/zh_cn/_templates/classtemplate.rst b/docs/zh_cn/_templates/classtemplate.rst new file mode 100644 index 0000000..3601668 --- /dev/null +++ b/docs/zh_cn/_templates/classtemplate.rst @@ -0,0 +1,14 @@ +.. role:: hidden + :class: hidden-section +.. currentmodule:: {{ module }} + + +{{ name | underline}} + +.. autoclass:: {{ name }} + :members: + + +.. 
+ autogenerated from source/_templates/classtemplate.rst + note it does not have :inherited-members: diff --git a/docs/zh_cn/api/mindcv.data.rst b/docs/zh_cn/api/mindcv.data.rst new file mode 100644 index 0000000..4cb1f70 --- /dev/null +++ b/docs/zh_cn/api/mindcv.data.rst @@ -0,0 +1,7 @@ +mindcv.data +=================================== + +.. automodule:: mindcv.data + :members: + :undoc-members: + :show-inheritance: diff --git a/docs/zh_cn/api/mindcv.loss.rst b/docs/zh_cn/api/mindcv.loss.rst new file mode 100644 index 0000000..19c03d6 --- /dev/null +++ b/docs/zh_cn/api/mindcv.loss.rst @@ -0,0 +1,7 @@ +mindcv.loss +=================================== + +.. automodule:: mindcv.loss + :members: + :undoc-members: + :show-inheritance: diff --git a/docs/zh_cn/api/mindcv.models.layers.rst b/docs/zh_cn/api/mindcv.models.layers.rst new file mode 100644 index 0000000..6845f11 --- /dev/null +++ b/docs/zh_cn/api/mindcv.models.layers.rst @@ -0,0 +1,7 @@ +mindcv.models.layers +=================================== + +.. automodule:: mindcv.models.layers + :members: + :undoc-members: + :show-inheritance: diff --git a/docs/zh_cn/api/mindcv.models.rst b/docs/zh_cn/api/mindcv.models.rst new file mode 100644 index 0000000..11a2b79 --- /dev/null +++ b/docs/zh_cn/api/mindcv.models.rst @@ -0,0 +1,84 @@ +mindcv.models +=================================== + +.. autoclass:: mindcv.models.create_model + +.. autoclass:: mindcv.models.list_models + +.. autoclass:: mindcv.models.is_model + +.. autoclass:: mindcv.models.model_entrypoint + +.. autoclass:: mindcv.models.list_modules + +.. autoclass:: mindcv.models.is_model_in_modules + +.. autoclass:: mindcv.models.is_model_pretrained + +.. autoclass:: mindcv.models.BiT_ResNet + +.. autoclass:: mindcv.models.BiTresnet50 + +.. autoclass:: mindcv.models.ConViT + +.. autoclass:: mindcv.models.ConvNeXt + +.. autoclass:: mindcv.models.DenseNet + +.. autoclass:: mindcv.models.DPN + +.. autoclass:: mindcv.models.EdgeNeXt + +.. autoclass:: mindcv.models.EfficientNet + +.. autoclass:: mindcv.models.GhostNet + +.. autoclass:: mindcv.models.GoogLeNet + +.. autoclass:: mindcv.models.InceptionV3 + +.. autoclass:: mindcv.models.InceptionV4 + +.. autoclass:: mindcv.models.Mnasnet + +.. autoclass:: mindcv.models.MobileNetV1 + +.. autoclass:: mindcv.models.MobileNetV2 + +.. autoclass:: mindcv.models.MobileNetV3 + +.. autoclass:: mindcv.models.NASNetAMobile + +.. autoclass:: mindcv.models.Pnasnet + +.. autoclass:: mindcv.models.PoolFormer + +.. autoclass:: mindcv.models.PyramidVisionTransformer + +.. autoclass:: mindcv.models.PyramidVisionTransformerV2 + +.. autoclass:: mindcv.models.RepMLPNet + +.. autoclass:: mindcv.models.RepVGG + +.. autoclass:: mindcv.models.Res2Net + +.. autoclass:: mindcv.models.ResNet + +.. autoclass:: mindcv.models.ReXNetV1 + +.. autoclass:: mindcv.models.ShuffleNetV1 + +.. autoclass:: mindcv.models.ShuffleNetV2 + +.. autoclass:: mindcv.models.SKNet + +.. autoclass:: mindcv.models.SqueezeNet + +.. autoclass:: mindcv.models.SwinTransformer + +.. autoclass:: mindcv.models.VGG + +.. autoclass:: mindcv.models.ViT + +.. autoclass:: mindcv.models.Xception diff --git a/docs/zh_cn/api/mindcv.optim.rst b/docs/zh_cn/api/mindcv.optim.rst new file mode 100644 index 0000000..58a9837 --- /dev/null +++ b/docs/zh_cn/api/mindcv.optim.rst @@ -0,0 +1,7 @@ +mindcv.optim +=================================== + +.. 
automodule:: mindcv.optim + :members: + :undoc-members: + :show-inheritance: diff --git a/docs/zh_cn/api/mindcv.scheduler.rst b/docs/zh_cn/api/mindcv.scheduler.rst new file mode 100644 index 0000000..b5f9db7 --- /dev/null +++ b/docs/zh_cn/api/mindcv.scheduler.rst @@ -0,0 +1,7 @@ +mindcv.scheduler +=================================== + +.. automodule:: mindcv.scheduler + :members: + :undoc-members: + :show-inheritance: diff --git a/docs/zh_cn/api/mindcv.utils.rst b/docs/zh_cn/api/mindcv.utils.rst new file mode 100644 index 0000000..db8c14e --- /dev/null +++ b/docs/zh_cn/api/mindcv.utils.rst @@ -0,0 +1,7 @@ +mindcv.utils +=================================== + +.. automodule:: mindcv.utils + :members: + :undoc-members: + :show-inheritance: diff --git a/docs/zh_cn/classification/definition_CN.md b/docs/zh_cn/classification/definition_CN.md new file mode 100644 index 0000000..3cf61d7 --- /dev/null +++ b/docs/zh_cn/classification/definition_CN.md @@ -0,0 +1,3 @@ +# MindCV是什么 + +MindCV是一个基于 [MindSpore](https://www.mindspore.cn/)开发的,致力于计算机视觉相关技术研发的开源工具箱。它提供大量的计算机视觉领域的经典模型和SoTA模型以及它们的预训练权重。同时,还提供了AutoAugment等SoTA算法来提高性能。通过解耦的模块设计,您可以轻松地将MindCV应用到您自己的CV任务中。 diff --git a/docs/zh_cn/classification/feature_CN.md b/docs/zh_cn/classification/feature_CN.md new file mode 100644 index 0000000..90e4f98 --- /dev/null +++ b/docs/zh_cn/classification/feature_CN.md @@ -0,0 +1,22 @@ +# 特性 + +- **高易用性** MindCV将视觉框架分解为各种可配置组件,方便您使用MindCV定制您的数据管道、模型和学习管道。 + +```python +>>> import mindcv +# 创建一个数据集 +>>> dataset = mindcv.create_dataset('cifar10', download=True) +# 创建一个模型 +>>> network = mindcv.create_model('resnet50', pretrained=True) +``` + +用户可以在一个命令行中自定义和启动他们的迁移学习或训练任务。 + +```shell +# 仅使用一个命令行即可启动迁移学习任务 +python train.py --model swin_tiny --pretrained --opt adamw --lr 0.001 --data_dir = {data_dir} +``` + +- **业内最佳** MindCV提供了大量包括SwinTransformer在内的基于CNN和基于Transformer结构的视觉模型。同时,还提供了它们的预训练权重以及性能测试报告,帮助用户正确地选择和使用他们所需要的模型。 + +- **灵活高效** MindCV是基于新一代高效的深度学习框架MindSpore编写的,可以运行在多种硬件平台上(CPU/GPU/Ascend),还同时支持高效的图模式和灵活的调试模式。 diff --git a/docs/zh_cn/conf.py b/docs/zh_cn/conf.py new file mode 100644 index 0000000..0238979 --- /dev/null +++ b/docs/zh_cn/conf.py @@ -0,0 +1,108 @@ +# Configuration file for the Sphinx documentation builder. +# +# This file only contains a selection of the most common options. For a full +# list see the documentation: +# https://www.sphinx-doc.org/en/master/usage/configuration.html + +# -- Path setup -------------------------------------------------------------- + +# If extensions (or modules to document with autodoc) are in another directory, +# add these directories to sys.path here. If the directory is relative to the +# documentation root, use os.path.abspath to make it absolute, like shown here. +# +import os +import shutil +import sys + +import sphinx_rtd_theme + +import mindcv + +sys.path.insert(0, os.path.abspath("../../")) + +# -- Project information ----------------------------------------------------- + +project = "mindcv" +copyright = "2022, mindcv contributors" +author = "mindcv contributors" + +version_file = "../../mindcv/version.py" +with open(version_file) as f: + exec(compile(f.read(), version_file, "exec")) +__version__ = locals()["__version__"] +# The short X.Y version +version = __version__ +# The full version, including alpha/beta/rc tags +release = __version__ + +# -- General configuration --------------------------------------------------- + +# Add any Sphinx extension module names here, as strings. 
They can be +# extensions coming with Sphinx (named 'sphinx.ext.*') or your custom +# ones. + +extensions = [ + "sphinx.ext.autodoc", + "sphinx.ext.autosummary", + "sphinx.ext.intersphinx", + "sphinx.ext.napoleon", + "sphinx.ext.viewcode", + "sphinx.ext.autosectionlabel", + "sphinx_markdown_tables", + "myst_parser", + "sphinx_copybutton", + "sphinx.ext.autodoc.typehints", +] # yapf: disable +autodoc_typehints = "description" +myst_heading_anchors = 4 + +source_suffix = { + ".rst": "restructuredtext", + ".md": "markdown", +} + +# copy markdown files from outer directory +if not os.path.exists("./tutorials"): + os.makedirs("./tutorials") +shutil.copy("../../tutorials/deployment_CN.md", "./tutorials/deployment_CN.md") +shutil.copy("../../tutorials/finetune_CN.md", "./tutorials/finetune_CN.md") +shutil.copy("../../tutorials/inference_CN.md", "./tutorials/inference_CN.md") +shutil.copy("../../tutorials/learn_about_config_CN.md", "./tutorials/learn_about_config_CN.md") +shutil.copy("../../tutorials/output_8_0.png", "./tutorials/output_8_0.png") +shutil.copy("../../tutorials/output_11_0.png", "./tutorials/output_11_0.png") +shutil.copy("../../tutorials/output_23_0.png", "./tutorials/output_23_0.png") +shutil.copy("../../tutorials/output_30_0.png", "./tutorials/output_30_0.png") +if not os.path.exists("./quick_start"): + os.makedirs("./quick_start") +shutil.copy("../../quick_start_CN.md", "./quick_start/quick_start_CN.md") + +shutil.copy("../../benchmark_results.md", "./classification/benchmark_results_CN.md") + +os.system("cp -R %s %s" % ("../../configs", "./")) + +# Add any paths that contain templates here, relative to this directory. +templates_path = ["_templates"] + +# List of patterns, relative to source directory, that match files and +# directories to ignore when looking for source files. +# This pattern also affects html_static_path and html_extra_path. +exclude_patterns = ["_build", "Thumbs.db", ".DS_Store"] + +# -- Options for HTML output ------------------------------------------------- + +# The theme to use for HTML and HTML Help pages. See the documentation for +# a list of builtin themes. +# +html_theme = "sphinx_rtd_theme" +html_theme_path = [sphinx_rtd_theme.get_html_theme_path()] + +# Add any paths that contain custom static files (such as style sheets) here, +# relative to this directory. They are copied after the builtin static files, +# so a file named "default.css" will overwrite the builtin "default.css". +html_static_path = ["_static"] +html_css_files = ["css/readthedocs.css"] + +# -- Extension configuration ------------------------------------------------- +# Ignore >>> when copying code +copybutton_prompt_text = r">>> |\.\.\. " +copybutton_prompt_is_regexp = True diff --git a/docs/zh_cn/docutils.conf b/docs/zh_cn/docutils.conf new file mode 100644 index 0000000..ddd79c3 --- /dev/null +++ b/docs/zh_cn/docutils.conf @@ -0,0 +1,2 @@ +[html writers] +table_style: colwidths-auto diff --git a/docs/zh_cn/index.rst b/docs/zh_cn/index.rst new file mode 100644 index 0000000..295c971 --- /dev/null +++ b/docs/zh_cn/index.rst @@ -0,0 +1,88 @@ +欢迎来到 MindSpore计算机视觉 的中文文档! +========================================= +(您可以在页面左下角切换中英文文档。) + +.. toctree:: + :maxdepth: 1 + :caption: 介绍 + + classification/definition_CN.md + classification/feature_CN.md + classification/benchmark_results_CN.md + +.. toctree:: + :maxdepth: 1 + :caption: 快速开始 + + quick_start/quick_start_CN.md + +.. 
toctree:: + :maxdepth: 1 + :caption: 教程 + + tutorials/learn_about_config_CN.md + tutorials/inference_CN.md + tutorials/finetune_CN.md + tutorials/deployment_CN.md + +.. toctree:: + :maxdepth: 1 + :caption: 接口文档 + + api/mindcv.data + api/mindcv.loss + api/mindcv.optim + api/mindcv.models + api/mindcv.models.layers + api/mindcv.scheduler + +.. toctree:: + :maxdepth: 1 + :caption: 记录 + + notes/changelog_CN.md + notes/contribute_CN.md + notes/faq_CN.md + +.. toctree:: + :maxdepth: 1 + :caption: 示例 + + configs/BigTransfer/README_CN.md + configs/convit/README_CN.md + configs/densenet/README_CN.md + configs/edgenext/README_CN.md + configs/googlenet/README_CN.md + configs/inception_v3/README_CN.md + configs/inception_v4/README_CN.md + configs/mnasnet/README_CN.md + configs/mobilenetv1/README_CN.md + configs/mobilenetv2/README_CN.md + configs/mobilenetv3/README_CN.md + configs/poolformer/README_CN.md + configs/pvt/README_CN.md + configs/regnet/README_CN.md + configs/repmlp/README_CN.md + configs/repvgg/README_CN.md + configs/res2net/README_CN.md + configs/resnet/README_CN.md + configs/rexnet/README_CN.md + configs/shufflenet_v1/README_CN.md + configs/shufflenet_v2/README_CN.md + configs/sknet/README_CN.md + configs/squeezenet/README_CN.md + configs/visformer/README_CN.md + configs/vit/README_CN.md + configs/xception/README_CN.md + +.. toctree:: + :caption: 切换语言 + + switch_language.md + +索引和表格 +================== + +* :ref:`索引` +* :ref:`模块索引` +* :ref:`搜索页面` diff --git a/docs/zh_cn/make.bat b/docs/zh_cn/make.bat new file mode 100644 index 0000000..2119f51 --- /dev/null +++ b/docs/zh_cn/make.bat @@ -0,0 +1,35 @@ +@ECHO OFF + +pushd %~dp0 + +REM Command file for Sphinx documentation + +if "%SPHINXBUILD%" == "" ( + set SPHINXBUILD=sphinx-build +) +set SOURCEDIR=. +set BUILDDIR=_build + +if "%1" == "" goto help + +%SPHINXBUILD% >NUL 2>NUL +if errorlevel 9009 ( + echo. + echo.The 'sphinx-build' command was not found. Make sure you have Sphinx + echo.installed, then set the SPHINXBUILD environment variable to point + echo.to the full path of the 'sphinx-build' executable. Alternatively you + echo.may add the Sphinx directory to PATH. + echo. + echo.If you don't have Sphinx installed, grab it from + echo.http://sphinx-doc.org/ + exit /b 1 +) + +%SPHINXBUILD% -M %1 %SOURCEDIR% %BUILDDIR% %SPHINXOPTS% %O% +goto end + +:help +%SPHINXBUILD% -M help %SOURCEDIR% %BUILDDIR% %SPHINXOPTS% %O% + +:end +popd diff --git a/docs/zh_cn/notes/changelog_CN.md b/docs/zh_cn/notes/changelog_CN.md new file mode 100644 index 0000000..7a2d903 --- /dev/null +++ b/docs/zh_cn/notes/changelog_CN.md @@ -0,0 +1,3 @@ +# 更改日志 + +即将到来。 diff --git a/docs/zh_cn/notes/contribute_CN.md b/docs/zh_cn/notes/contribute_CN.md new file mode 100644 index 0000000..707a704 --- /dev/null +++ b/docs/zh_cn/notes/contribute_CN.md @@ -0,0 +1,3 @@ +# 如何贡献 + +即将到来。 diff --git a/docs/zh_cn/notes/faq_CN.md b/docs/zh_cn/notes/faq_CN.md new file mode 100644 index 0000000..af25c98 --- /dev/null +++ b/docs/zh_cn/notes/faq_CN.md @@ -0,0 +1,3 @@ +# 常见问题解答 + +即将到来 diff --git a/docs/zh_cn/switch_language.md b/docs/zh_cn/switch_language.md new file mode 100644 index 0000000..bae5bed --- /dev/null +++ b/docs/zh_cn/switch_language.md @@ -0,0 +1,3 @@ +## English + +## 简体中文 diff --git a/examples/README.md b/examples/README.md new file mode 100644 index 0000000..ab11007 --- /dev/null +++ b/examples/README.md @@ -0,0 +1,22 @@ +This folder contains examples for various tasks, which users can run easily. 
+
+### Finetune
+```
+python examples/finetune.py
+
+```
+This example shows how to finetune a pretrained model on your own dataset. You can also set the `freeze_backbone` flag in the script to choose whether to freeze the backbone and finetune only the classifier head.
+
+
+### Single-process model training and evaluation
+```
+python examples/train_with_func_example.py
+```
+This example shows how to train and evaluate a model on your own dataset.
+
+### Multi-process model training and evaluation
+```
+export CUDA_VISIBLE_DEVICES=0,1,2,3 # suppose there are 4 GPUs
+mpirun --allow-run-as-root -n 4 python examples/train_parallel_with_func_example.py
+```
+This example shows how to train and evaluate a model with multiple processes on your own dataset on GPUs.
diff --git a/examples/finetune.py b/examples/finetune.py
new file mode 100644
index 0000000..a0cc603
--- /dev/null
+++ b/examples/finetune.py
@@ -0,0 +1,108 @@
+import os
+import sys
+
+sys.path.append(".")
+
+import matplotlib.pyplot as plt
+import numpy as np
+
+from mindspore import LossMonitor, Model, TimeMonitor
+
+from mindcv.data import create_dataset, create_loader, create_transforms, get_dataset_download_root
+from mindcv.loss import create_loss
+from mindcv.models import create_model
+from mindcv.optim import create_optimizer
+from mindcv.utils.download import DownLoad
+
+freeze_backbone = False
+visualize = False
+
+dataset_url = (
+    "https://mindspore-website.obs.cn-north-4.myhuaweicloud.com/notebook/datasets/intermediate/Canidae_data.zip"
+)
+root_dir = os.path.join(get_dataset_download_root(), "Canidae")
+data_dir = os.path.join(root_dir, "data", "Canidae")  # Canidae has prefix path "data/Canidae" in unzipped file.
+if not os.path.exists(data_dir):
+    DownLoad().download_and_extract_archive(dataset_url, root_dir)
+
+num_workers = 8
+
+# load the custom dataset
+dataset_train = create_dataset(root=data_dir, split="train", num_parallel_workers=num_workers)
+dataset_val = create_dataset(root=data_dir, split="val", num_parallel_workers=num_workers)
+
+# define and get the data processing and augmentation operations
+trans_train = create_transforms(dataset_name="ImageNet", is_training=True)
+trans_val = create_transforms(dataset_name="ImageNet", is_training=False)
+
+loader_train = create_loader(
+    dataset=dataset_train,
+    batch_size=16,
+    is_training=True,
+    num_classes=2,
+    transform=trans_train,
+    num_parallel_workers=num_workers,
+)
+
+
+loader_val = create_loader(
+    dataset=dataset_val,
+    batch_size=5,
+    is_training=True,
+    num_classes=2,
+    transform=trans_val,
+    num_parallel_workers=num_workers,
+)
+
+images, labels = next(loader_train.create_tuple_iterator())
+# images = data["image"]
+# labels = data["label"]
+
+print("Tensor of image", images.shape)
+print("Labels:", labels)
+
+# class_name maps class index to label name; labels are assigned in ascending order of the folder names
+class_name = {0: "dogs", 1: "wolves"}
+
+if visualize:
+    plt.figure(figsize=(15, 7))
+    for i in range(len(labels)):
+        # get the image and its corresponding label
+        data_image = images[i].asnumpy()
+        data_label = labels[i]
+        # process the image for display
+        data_image = np.transpose(data_image, (1, 2, 0))
+        mean = np.array([0.485, 0.456, 0.406])
+        std = np.array([0.229, 0.224, 0.225])
+        data_image = std * data_image + mean
+        data_image = np.clip(data_image, 0, 1)
+        # show the image
+        plt.subplot(3, 6, i + 1)
+        plt.imshow(data_image)
+        plt.title(class_name[int(labels[i].asnumpy())])
+        plt.axis("off")
+
+    plt.show()
+
+network = create_model(model_name="densenet121", num_classes=2, pretrained=True)
+
+
+# define the optimizer and loss function
+lr = 1e-3 if freeze_backbone else 1e-4
+opt = create_optimizer(network.trainable_params(), opt="adam", lr=lr)
+loss =
create_loss(name="CE") + +if freeze_backbone: + # freeze backbone + for param in network.get_parameters(): + if param.name not in ["classifier.weight", "classifier.bias"]: + param.requires_grad = False + + +# 实例化模型 +model = Model(network, loss_fn=loss, optimizer=opt, metrics={"accuracy"}) +print("Training...") +model.train(10, loader_train, callbacks=[LossMonitor(5), TimeMonitor(5)], dataset_sink_mode=False) +print("Evaluating...") +res = model.eval(loader_val) +print(res) diff --git a/examples/scripts/train_densenet_multigpus.sh b/examples/scripts/train_densenet_multigpus.sh new file mode 100644 index 0000000..7a8d608 --- /dev/null +++ b/examples/scripts/train_densenet_multigpus.sh @@ -0,0 +1,3 @@ +export CUDA_VISIBLE_DEVICES=0,1,2,3 +mpirun --allow-run-as-root -n 4 python train.py --distribute --model=densenet121 --pretrained --epoch_size=5 --dataset=cifar10 --dataset_download +#mpirun --allow-run-as-root -n 4 python train.py --distribute --model=densenet121 --pretrained --epoch_size=5 --dataset=cifar10 --data_dir=./datasets/cifar/cifar-10-batches-bin diff --git a/examples/scripts/train_densenet_standalone.sh b/examples/scripts/train_densenet_standalone.sh new file mode 100644 index 0000000..3513d2d --- /dev/null +++ b/examples/scripts/train_densenet_standalone.sh @@ -0,0 +1,2 @@ +python train.py --model=densenet121 --optimizer=adam --lr=0.001 --dataset=cifar10 --num_classes=10 --dataset_download +#python train.py --model=densenet121 --opt=adam --lr=0.001 --dataset=cifar10 --num_classes=10 --data_dir=./datasets/cifar/cifar-10-batches-bin diff --git a/examples/train_parallel_with_func_example.py b/examples/train_parallel_with_func_example.py new file mode 100644 index 0000000..aed0c49 --- /dev/null +++ b/examples/train_parallel_with_func_example.py @@ -0,0 +1,156 @@ +import os +import sys +from time import time + +sys.path.append(".") + +import mindspore as ms +import mindspore.nn as nn +from mindspore import Tensor, ops +from mindspore.communication import get_group_size, get_rank, init +from mindspore.parallel._utils import _get_device_num, _get_gradients_mean + +from mindcv.data import create_dataset, create_loader, create_transforms +from mindcv.loss import create_loss +from mindcv.models import create_model +from mindcv.optim import create_optimizer +from mindcv.utils import Allreduce + + +def main(): + ms.set_seed(1) + ms.set_context(mode=ms.PYNATIVE_MODE) + + # --------------------------- Prepare data -------------------------# + # create dataset for train and val + init() + device_num = get_group_size() + rank_id = get_rank() + ms.set_auto_parallel_context( + device_num=device_num, + parallel_mode="data_parallel", + gradients_mean=True, + ) + num_classes = 10 + num_workers = 8 + data_dir = "/data/cifar-10-batches-bin" + download = False if os.path.exists(data_dir) else True + + dataset_train = create_dataset( + name="cifar10", + root=data_dir, + split="train", + shuffle=True, + download=download, + num_shards=device_num, + shard_id=rank_id, + num_parallel_workers=num_workers, + ) + dataset_test = create_dataset( + name="cifar10", + root=data_dir, + split="test", + shuffle=False, + download=False, + num_shards=device_num, + shard_id=rank_id, + num_parallel_workers=num_workers, + ) + + # create transform and get trans list + trans_train = create_transforms(dataset_name="cifar10", is_training=True) + trans_test = create_transforms(dataset_name="cifar10", is_training=False) + + # get data loader + loader_train = create_loader( + dataset=dataset_train, + batch_size=64, + 
is_training=True, + num_classes=num_classes, + transform=trans_train, + num_parallel_workers=num_workers, + drop_remainder=True, + ) + loader_test = create_loader( + dataset=dataset_test, + batch_size=32, + is_training=False, + num_classes=num_classes, + transform=trans_test, + ) + + num_batches = loader_train.get_dataset_size() + print("Num batches: ", num_batches) + + # --------------------------- Build model -------------------------# + network = create_model(model_name="resnet18", num_classes=num_classes, pretrained=False) + + loss = create_loss(name="CE") + + opt = create_optimizer(network.trainable_params(), opt="adam", lr=1e-3) + + # --------------------------- Training and monitoring -------------------------# + epochs = 10 + for t in range(epochs): + print(f"Epoch {t + 1}\n-------------------------------") + save_path = f"./resnet18-{t + 1}_{num_batches}.ckpt" + b = time() + train_epoch(network, loader_train, loss, opt) + print("Epoch time cost: ", time() - b) + test_epoch(network, loader_test) + if rank_id in [None, 0]: + ms.save_checkpoint(network, save_path, async_save=True) + print("Done!") + + +def train_epoch(network, dataset, loss_fn, optimizer): + # Define forward function + def forward_fn(data, label): + logits = network(data) + loss = loss_fn(logits, label) + return loss, logits + + # Get gradient function + grad_fn = ops.value_and_grad(forward_fn, None, optimizer.parameters, has_aux=True) + mean = _get_gradients_mean() + degree = _get_device_num() + grad_reducer = nn.DistributedGradReducer(optimizer.parameters, mean, degree) + + # Define function of one-step training, + @ms.ms_function + def train_step_parallel(data, label): + (loss, _), grads = grad_fn(data, label) + grads = grad_reducer(grads) + loss = ops.depend(loss, optimizer(grads)) + return loss + + network.set_train() + size = dataset.get_dataset_size() + for batch, (data, label) in enumerate(dataset.create_tuple_iterator()): + loss = train_step_parallel(data, label) + if batch % 100 == 0: + loss, current = loss.asnumpy(), batch + print(f"loss: {loss:>7f} [{current:>3d}/{size:>3d}]") + + +def test_epoch(network, dataset): + network.set_train(False) + total, correct = 0, 0 + for data, label in dataset.create_tuple_iterator(): + pred = network(data) + total += len(data) + if len(label.shape) == 1: + correct += (pred.argmax(1) == label).asnumpy().sum() + else: # one-hot or soft label + correct += (pred.argmax(1) == label.argmax(1)).asnumpy().sum() + all_reduce = Allreduce() + correct = all_reduce(Tensor(correct, ms.float32)) + total = all_reduce(Tensor(total, ms.float32)) + correct /= total + acc = 100 * correct.asnumpy() + print(f"Test Accuracy: {acc:>0.2f}% \n") + return acc + + +if __name__ == "__main__": + main() diff --git a/examples/train_with_func_example.py b/examples/train_with_func_example.py new file mode 100644 index 0000000..a53d5df --- /dev/null +++ b/examples/train_with_func_example.py @@ -0,0 +1,131 @@ +import os +import sys +from time import time + +sys.path.append(".") + +import mindspore as ms +from mindspore import ops + +from mindcv.data import create_dataset, create_loader, create_transforms +from mindcv.loss import create_loss +from mindcv.models import create_model +from mindcv.optim import create_optimizer + + +def main(): + ms.set_seed(1) + ms.set_context(mode=ms.PYNATIVE_MODE) + + # --------------------------- Prepare data -------------------------# + # create dataset for train and val + num_classes = 10 + num_workers = 8 + data_dir = "/data/cifar-10-batches-bin" + download = False if 
os.path.exists(data_dir) else True + + dataset_train = create_dataset( + name="cifar10", + root=data_dir, + split="train", + shuffle=True, + download=download, + num_parallel_workers=num_workers, + ) + dataset_test = create_dataset( + name="cifar10", + root=data_dir, + split="test", + shuffle=False, + download=False, + ) + + # create transform and get trans list + trans_train = create_transforms(dataset_name="cifar10", is_training=True) + trans_test = create_transforms(dataset_name="cifar10", is_training=False) + + # get data loader + loader_train = create_loader( + dataset=dataset_train, + batch_size=64, + is_training=True, + num_classes=num_classes, + transform=trans_train, + num_parallel_workers=num_workers, + drop_remainder=True, + ) + loader_test = create_loader( + dataset=dataset_test, + batch_size=32, + is_training=False, + num_classes=num_classes, + transform=trans_test, + ) + + num_batches = loader_train.get_dataset_size() + print("Num batches: ", num_batches) + + # --------------------------- Build model -------------------------# + network = create_model(model_name="resnet18", num_classes=num_classes, pretrained=False) + + loss = create_loss(name="CE") + + opt = create_optimizer(network.trainable_params(), opt="adam", lr=1e-3) + + # --------------------------- Training and monitoring -------------------------# + epochs = 10 + for t in range(epochs): + print(f"Epoch {t + 1}\n-------------------------------") + save_path = f"./ckpt/resnet18-{t + 1}_{num_batches}.ckpt" + b = time() + train_epoch(network, loader_train, loss, opt) + print("Epoch time cost: ", time() - b) + test_epoch(network, loader_test) + ms.save_checkpoint(network, save_path, async_save=True) + print("Done!") + + +def train_epoch(network, dataset, loss_fn, optimizer): + # Define forward function + def forward_fn(data, label): + logits = network(data) + loss = loss_fn(logits, label) + return loss, logits + + # Get gradient function + grad_fn = ops.value_and_grad(forward_fn, None, optimizer.parameters, has_aux=True) + + # Define function of one-step training, + @ms.ms_function + def train_step(data, label): + (loss, _), grads = grad_fn(data, label) + loss = ops.depend(loss, optimizer(grads)) + return loss + + network.set_train() + size = dataset.get_dataset_size() + for batch, (data, label) in enumerate(dataset.create_tuple_iterator()): + loss = train_step(data, label) + if batch % 100 == 0: + loss, current = loss.asnumpy(), batch + print(f"loss: {loss:>7f} [{current:>3d}/{size:>3d}]") + + +def test_epoch(network, dataset): + network.set_train(False) + total, correct = 0, 0 + for data, label in dataset.create_tuple_iterator(): + pred = network(data) + total += len(data) + if len(label.shape) == 1: + correct += (pred.argmax(1) == label).asnumpy().sum() + else: # one-hot or soft label + correct += (pred.argmax(1) == label.argmax(1)).asnumpy().sum() + correct /= total + acc = 100 * correct + print(f"Test Accuracy: {acc:>0.2f}% \n") + return acc + + +if __name__ == "__main__": + main() diff --git a/requirements/dev.txt b/requirements/dev.txt new file mode 100644 index 0000000..ae0a57d --- /dev/null +++ b/requirements/dev.txt @@ -0,0 +1,3 @@ +-r ../requirements.txt +pytest +pre-commit diff --git a/requirements/docs.txt b/requirements/docs.txt new file mode 100644 index 0000000..2c3cd78 --- /dev/null +++ b/requirements/docs.txt @@ -0,0 +1,7 @@ +mindspore +numpy >= 1.17.0 +pyyaml >= 5.3 +tqdm +sphinx_markdown_tables +myst_parser +sphinx_copybutton diff --git a/scripts/gen_benchmark.py b/scripts/gen_benchmark.py new file 
mode 100644 index 0000000..9d42718 --- /dev/null +++ b/scripts/gen_benchmark.py @@ -0,0 +1,174 @@ +""" +Usage: + $ python scripts/gen_benchmark.py +""" + +import glob +import os + +clear_empty_urls = True +rm_columns = [] # ['Infer T.', 'Log'] + +# get all training recips +recipes = sorted(glob.glob("configs/*/*.yaml")) +print("Total number of training recipes: ", len(recipes)) +ar = glob.glob("configs/*/*_ascend.yaml") +print("Ascend training recipes: ", len(ar)) +gr = glob.glob("configs/*/*_gpu.yaml") +print("GPU training recipes: ", len(gr)) +for item in set(recipes) - set(ar) - set(gr): + print(item) + +models_with_train_rec = [] +for r in recipes: + mn = r.split("/")[-2] + if mn not in models_with_train_rec: + models_with_train_rec.append(mn) +models_with_train_rec = sorted(models_with_train_rec) + +print("\n==> Models with training recipes: ", len(models_with_train_rec)) +print(models_with_train_rec) + +# get readme file list +config_dirs = sorted([d for d in os.listdir("./configs") if os.path.isdir("configs/" + d)]) +print("\nTotal number of config folders: ", len(config_dirs)) +print("==> Configs w/o training rec: ", set(config_dirs) - set(models_with_train_rec)) +readmes = [f"configs/{d}/README.md" for d in config_dirs] + +for readme in readmes: + if not os.path.exists(readme): + print("Missing readme: ", readme) + +# check yaml and reported performance + +# merge readme reported results +print("\r\n ") +output_path = "./benchmark_results.md" +fout = open(output_path, "w") + +kw = ["Model", "Top", "Download", "Config"] + +# process table head +head = ( + "| Model | Context | Top-1 (%) | Top-5 (%) | Params(M) " + "| Recipe " + "| Download |" +) +fout.write(head + "\n") + +fout.write( + "| -------------- | -------- | --------- | --------- | --------- " + "| ------------------------------------------------------------------------------------------------------- " + "| ---------------------------------------------------------------------------- |\n" +) + +attrs = head.replace(" ", "")[1:-1].split("|") +print("table attrs: ", attrs) + +result_kw = ["Results", "Benchmark", "Result"] # TODO: unify this name +head_detect_kw = ["Model", "Top"] + +# process each model readme +parsed_models = [] +parsed_model_specs = [] +for r in readmes: + state = 0 + print("parsing ", r) + results = [] + with open(r) as fp: + for line in fp: + if state == 0: + for kw in result_kw: + if f"##{kw}" in line.strip().replace(" ", ""): + state = 1 + # detect head + elif state == 1: + if "|Model|Context" in line.replace(" ", ""): + if len(line.split("|")) == len(head.split("|")): + state = 2 + else: + print("Detect head, but format is incorrect:") + # print(line) + + # get table values + elif state == 2: + if len(line.split("|")) == len(head.split("|")): + # clear empty model + if "--" not in line: + results.append(line) + # print(line) + fout.write(line) + parsed_model_specs.append(line.split("|")[0]) + else: + parsed_models.append(r.split("/")[-2]) + state = 3 + + if state == 0: + print("Fail to get Results") + elif state == 1: + print("Fail to get table head") + elif state == 2: + print("Fail to get table values") + +print("Parsed models in benchmark: ", len(parsed_models)) +print("Parsed model specs in benchmark: ", len(parsed_model_specs)) +print("Readme using inconsistent result table format: \r\n", set(config_dirs) - set(parsed_models)) + +""" +fout.close() +def md_to_pd(md_fp, md_has_col_name=True, save_csv=False): + # Convert the Markdown table to a list of lists + with open(md_fp) as f: + rows = [] + for 
row in f.readlines(): + if len(row.split('|')) >= 2: + # Get rid of leading and trailing '|' + tmp = row[1:-2] + + # Split line and ignore column whitespace + clean_line = [col.strip() for col in tmp.split('|')] + + # Append clean row data to rows variable + rows.append(clean_line) + + # Get rid of syntactical sugar to indicate header (2nd row) + rows = rows[:1] + rows[2:] + print(rows) + if md_has_col_name: + df = pd.DataFrame(data=rows[1:], columns=rows[0]) + else: + df = pd.DataFrame(rows) + + if save_csv: + df.to_csv(md_fp.replace('.md', '.csv'), index=False, header=False) + return df + +df = md_to_pd(output_path, save_csv=True) +print(df) + +for cn in rm_columns: + df = df.drop(cn, axis=1) + +print(df) + +md_doc = df.to_markdown(mode='w', index=False, tablefmt='pipe') + +fout = open(output_path, 'w') +fout.write(md_doc) +""" + + +# write notes +fout.write("\n#### Notes\n") + +fout.write( + "- Context: Training context denoted as {device}x{pieces}-{MS mode}, " + "where mindspore mode can be G - graph mode or F - pynative mode with ms function. " + "For example, D910x8-G is for training on 8 pieces of Ascend 910 NPU using graph mode.\n" + "- Top-1 and Top-5: Accuracy reported on the validation set of ImageNet-1K." +) + +fout.close() + + +print(f"\n ===> Done! Benchmark generated in {output_path}") diff --git a/tests/README.md b/tests/README.md new file mode 100644 index 0000000..6b5ba3f --- /dev/null +++ b/tests/README.md @@ -0,0 +1,13 @@ +- `modules` for unit test (UT): test the main modules including dataset, transform, loader, model, loss, optimizer, and scheduler. + +To test all modules: +```shell +pytest tests/modules/*.py +``` + +- `tasks` for system test (ST): test the training and validation pipeline. + +To test the training process (in graph mode and pynative+ms_function mode) and the validation process, run +```shell +pytest tests/tasks/test_train_val_imagenet_subset.py +``` diff --git a/tests/modules/non_cpu/test_utils.py b/tests/modules/non_cpu/test_utils.py new file mode 100644 index 0000000..4de2bde --- /dev/null +++ b/tests/modules/non_cpu/test_utils.py @@ -0,0 +1,72 @@ +"""Test utils""" +import sys + +sys.path.append(".") + +import numpy as np +import pytest + +import mindspore as ms +from mindspore import Tensor, nn +from mindspore.common.initializer import Normal +from mindspore.nn import WithLossCell + +from mindcv.optim import create_optimizer +from mindcv.utils import TrainStep + +ms.set_seed(1) +np.random.seed(1) + + +class SimpleCNN(nn.Cell): + def __init__(self, num_classes=10, in_channels=1, include_top=True): + super(SimpleCNN, self).__init__() + self.include_top = include_top + self.conv1 = nn.Conv2d(in_channels, 6, 5, pad_mode="valid") + self.conv2 = nn.Conv2d(6, 16, 5, pad_mode="valid") + self.relu = nn.ReLU() + self.max_pool2d = nn.MaxPool2d(kernel_size=2, stride=2) + + if self.include_top: + self.flatten = nn.Flatten() + self.fc = nn.Dense(16 * 5 * 5, num_classes, weight_init=Normal(0.02)) + + def construct(self, x): + x = self.conv1(x) + x = self.relu(x) + x = self.max_pool2d(x) + x = self.conv2(x) + x = self.relu(x) + x = self.max_pool2d(x) + ret = x + if self.include_top: + x_flatten = self.flatten(x) + x = self.fc(x_flatten) + ret = x + return ret + + +@pytest.mark.parametrize("ema", [True, False]) +@pytest.mark.parametrize("ema_decay", [0.9997, 0.5]) +def test_ema(ema, ema_decay): + network = SimpleCNN(in_channels=1, num_classes=10) + net_loss = nn.SoftmaxCrossEntropyWithLogits(sparse=True, reduction="mean") + + net_opt = 
create_optimizer(network.trainable_params(), "adam", lr=0.001, weight_decay=1e-7) + + bs = 8 + input_data = Tensor(np.ones([bs, 1, 32, 32]).astype(np.float32) * 0.01) + label = Tensor(np.ones([bs]).astype(np.int32)) + + net_with_loss = WithLossCell(network, net_loss) + loss_scale_manager = Tensor(1, ms.float32) + train_network = TrainStep(net_with_loss, net_opt, scale_sense=loss_scale_manager, ema=ema, ema_decay=ema_decay) + train_network.set_train() + + begin_loss = train_network(input_data, label) + for i in range(10): + cur_loss = train_network(input_data, label) + print(f"{net_opt}, begin loss: {begin_loss}, end loss: {cur_loss}") + + # check output correctness + assert cur_loss < begin_loss, "Loss does NOT decrease" diff --git a/tests/modules/parallel/test_parallel_dataset.py b/tests/modules/parallel/test_parallel_dataset.py new file mode 100644 index 0000000..93cef6f --- /dev/null +++ b/tests/modules/parallel/test_parallel_dataset.py @@ -0,0 +1,110 @@ +import sys + +sys.path.append(".") + +import pytest + +import mindspore as ms +from mindspore.communication import get_group_size, get_rank, init + +from mindcv.data import create_dataset + + +@pytest.mark.parametrize("mode", [0, 1]) +@pytest.mark.parametrize("name", ["ImageNet"]) +@pytest.mark.parametrize("split", ["train", "val"]) +@pytest.mark.parametrize("shuffle", [True, False]) +@pytest.mark.parametrize("num_parallel_workers", [2, 4, 8, 16]) +def test_create_dataset_distribute_imagenet(mode, name, split, shuffle, num_parallel_workers): + """ + test create_dataset API(distribute) + command: mpirun -n 8 pytest -s test_dataset.py::test_create_dataset_distribute_imagenet + + API Args: + name: str = '', + root: str = './', + split: str = 'train', + shuffle: bool = True, + num_samples: Optional[bool] = None, + num_shards: Optional[int] = None, + shard_id: Optional[int] = None, + num_parallel_workers: Optional[int] = None, + download: bool = False, + **kwargs + """ + ms.set_context(mode=mode) + + init("nccl") + device_num = get_group_size() + rank_id = get_rank() + ms.set_auto_parallel_context( + device_num=device_num, + parallel_mode="data_parallel", + gradients_mean=True, + ) + + root = "/data0/dataset/imagenet2012/imagenet_original/" + + dataset = create_dataset( + name=name, + root=root, + split=split, + shuffle=shuffle, + num_shards=device_num, + shard_id=rank_id, + num_parallel_workers=num_parallel_workers, + download=False, + ) + + assert type(dataset) == ms.dataset.engine.datasets_vision.ImageFolderDataset + assert dataset is not None + print(dataset.output_types()) + + +@pytest.mark.parametrize("mode", [0, 1]) +@pytest.mark.parametrize("name", ["MNIST", "CIFAR10"]) +@pytest.mark.parametrize("split", ["train", "val"]) +@pytest.mark.parametrize("shuffle", [True, False]) +@pytest.mark.parametrize("num_parallel_workers", [2, 4, 8, 16]) +@pytest.mark.parametrize("download", [True, False]) +def test_create_dataset_distribute_mc(mode, name, split, shuffle, num_parallel_workers, download): + """ + test create_dataset API(distribute) + command: mpirun -n 8 pytest -s test_dataset.py::test_create_dataset_distribute_mc + + API Args: + name: str = '', + root: str = './', + split: str = 'train', + shuffle: bool = True, + num_samples: Optional[bool] = None, + num_shards: Optional[int] = None, + shard_id: Optional[int] = None, + num_parallel_workers: Optional[int] = None, + download: bool = False, + **kwargs + """ + ms.set_context(mode=mode) + + init("nccl") + device_num = get_group_size() + rank_id = get_rank() + ms.set_auto_parallel_context( 
+ device_num=device_num, + parallel_mode="data_parallel", + gradients_mean=True, + ) + + dataset = create_dataset( + name=name, + split=split, + shuffle=shuffle, + num_shards=device_num, + shard_id=rank_id, + num_parallel_workers=num_parallel_workers, + download=download, + ) + + assert type(dataset) == ms.dataset.engine.datasets_vision.ImageFolderDataset + assert dataset is not None + print(dataset.output_types()) diff --git a/tests/modules/parallel/test_parallel_loader.py b/tests/modules/parallel/test_parallel_loader.py new file mode 100644 index 0000000..2c96392 --- /dev/null +++ b/tests/modules/parallel/test_parallel_loader.py @@ -0,0 +1,97 @@ +import sys + +sys.path.append(".") + +import pytest + +import mindspore as ms +from mindspore.communication import get_group_size, get_rank, init +from mindspore.dataset.transforms import OneHot + +from mindcv.data import create_dataset, create_loader + +MAX = 10 + + +@pytest.mark.parametrize("mode", [0, 1]) +@pytest.mark.parametrize("split", ["train", "val"]) +@pytest.mark.parametrize("batch_size", [1, MAX]) +@pytest.mark.parametrize("drop_remainder", [True, False]) +@pytest.mark.parametrize("is_training", [True, False]) +@pytest.mark.parametrize("transform", [None]) +@pytest.mark.parametrize("target_transform", [None, [OneHot(MAX)]]) +@pytest.mark.parametrize("mixup", [0, 1]) +@pytest.mark.parametrize("num_classes", [1, MAX]) +@pytest.mark.parametrize("num_parallel_workers", [2, 4, 8, 16]) +@pytest.mark.parametrize("python_multiprocessing", [True, False]) +def test_create_dataset_distribute( + mode, + split, + batch_size, + drop_remainder, + is_training, + transform, + target_transform, + mixup, + num_classes, + num_parallel_workers, + python_multiprocessing, +): + """ + test create_dataset API(distribute) + command: mpirun -n 8 pytest -s test_dataset.py::test_create_dataset_distribute + API Args: + name: str = '', + root: str = './', + split: str = 'train', + shuffle: bool = True, + num_samples: Optional[bool] = None, + num_shards: Optional[int] = None, + shard_id: Optional[int] = None, + num_parallel_workers: Optional[int] = None, + download: bool = False, + **kwargs + """ + + ms.set_context(mode=mode) + + init("nccl") + device_num = get_group_size() + rank_id = get_rank() + ms.set_auto_parallel_context( + device_num=device_num, + parallel_mode="data_parallel", + gradients_mean=True, + ) + name = "ImageNet" + if name == "ImageNet": + root = "/home/mindspore/dataset/imagenet2012/imagenet/imagenet_original" + download = False + + dataset = create_dataset( + name=name, + root=root, + split=split, + shuffle=False, + num_shards=device_num, + shard_id=rank_id, + num_parallel_workers=num_parallel_workers, + download=download, + ) + + # load dataset + loader_train = create_loader( + dataset=dataset, + batch_size=batch_size, + drop_remainder=drop_remainder, + is_training=is_training, + transform=transform, + target_transform=target_transform, + mixup=mixup, + num_classes=num_classes, + num_parallel_workers=num_parallel_workers, + python_multiprocessing=python_multiprocessing, + ) + steps_per_epoch = loader_train.get_dataset_size() + print(loader_train) + print(steps_per_epoch) diff --git a/tests/modules/parallel/test_parallel_model_factory.py b/tests/modules/parallel/test_parallel_model_factory.py new file mode 100644 index 0000000..cf80b16 --- /dev/null +++ b/tests/modules/parallel/test_parallel_model_factory.py @@ -0,0 +1,95 @@ +import sys + +sys.path.append(".") + +import pytest + +import mindspore as ms +import mindspore.nn as nn +from mindspore 
import Model +from mindspore.communication import get_group_size, get_rank, init + +from mindcv.data import create_dataset, create_loader, create_transforms +from mindcv.loss import create_loss +from mindcv.models.model_factory import create_model +from mindcv.models.registry import list_models + +MAX = 6250 + + +@pytest.mark.parametrize("mode", [0, 1]) +@pytest.mark.parametrize("in_channels", [3]) +@pytest.mark.parametrize("pretrained", [True, False]) +@pytest.mark.parametrize("num_classes", [1, 100, MAX]) +@pytest.mark.parametrize("checkpoint_path", [None]) +def test_model_factory_parallel(mode, num_classes, in_channels, pretrained, checkpoint_path): + ms.set_context(mode=mode) + + init("nccl") + device_num = get_group_size() + rank_id = get_rank() # noqa: F841 + ms.set_auto_parallel_context( + device_num=device_num, + parallel_mode="data_parallel", + gradients_mean=True, + ) + + batch_size = 1 + model_names = list_models() + for model_name in model_names: + if checkpoint_path != "": + pretrained = False + network = create_model( + model_name=model_name, + num_classes=num_classes, + in_channels=in_channels, + pretrained=pretrained, + checkpoint_path=checkpoint_path, + ) + + # create dataset + dataset_eval = create_dataset( + name="ImageNet", + root="/home/mindspore/dataset/imagenet2012/imagenet/imagenet_original", + split="val", + num_samples=1, + num_parallel_workers=1, + download=False, + ) + + # create transform + transform_list = create_transforms( + dataset_name="ImageNet", + is_training=False, + image_resize=224, + interpolation="bilinear", + ) + + # load dataset + loader_eval = create_loader( + dataset=dataset_eval, + batch_size=batch_size, + drop_remainder=False, + is_training=False, + transform=transform_list, + num_parallel_workers=1, + ) + + # create model + network = network + network.set_train(False) + + # create loss + loss = create_loss(name="CE", label_smoothing=0.1, aux_factor=0) + + # Define eval metrics. 
+ eval_metrics = {"Top_1_Accuracy": nn.Top1CategoricalAccuracy(), "Top_5_Accuracy": nn.Top5CategoricalAccuracy()} + + # init model + model = Model(network, loss_fn=loss, metrics=eval_metrics) + iterator = loader_eval.create_dict_iterator() + data = iterator.__next__() + result = model.predict(data["image"]) + print(result.shape) + assert result.shape[0] == batch_size + assert result.shape[1] == num_classes diff --git a/tests/modules/parallel/test_parallel_optim.py b/tests/modules/parallel/test_parallel_optim.py new file mode 100644 index 0000000..c9601b5 --- /dev/null +++ b/tests/modules/parallel/test_parallel_optim.py @@ -0,0 +1,702 @@ +import sys + +sys.path.append(".") + +import numpy as np +import pytest + +import mindspore as ms +import mindspore.nn as nn +from mindspore import Tensor +from mindspore.common.initializer import Normal +from mindspore.communication import get_group_size, get_rank, init +from mindspore.nn import TrainOneStepCell, WithLossCell + +from mindcv.optim import create_optimizer + + +class SimpleCNN(nn.Cell): + def __init__(self, num_classes=10, in_channels=1, include_top=True): + super(SimpleCNN, self).__init__() + self.include_top = include_top + + self.conv1 = nn.Conv2d(in_channels, 6, 5, pad_mode="valid") + self.conv2 = nn.Conv2d(6, 16, 5, pad_mode="valid") + self.relu = nn.ReLU() + self.max_pool2d = nn.MaxPool2d(kernel_size=2, stride=2) + + if self.include_top: + self.flatten = nn.Flatten() + self.fc = nn.Dense(16 * 5 * 5, num_classes, weight_init=Normal(0.02)) + + def construct(self, x): + x = self.conv1(x) + x = self.relu(x) + x = self.max_pool2d(x) + x = self.conv2(x) + x = self.relu(x) + x = self.max_pool2d(x) + if self.include_top: + x = self.flatten(x) + x = self.fc(x) + return x + + +@pytest.mark.parametrize("opt", ["sgd", "momentum"]) +@pytest.mark.parametrize("nesterov", [True, False]) +@pytest.mark.parametrize("filter_bias_and_bn", [True, False]) +def test_sgd_optimizer(opt, nesterov, filter_bias_and_bn): + init("nccl") + device_num = get_group_size() + rank_id = get_rank() # noqa: F841 + ms.set_auto_parallel_context( + device_num=device_num, + parallel_mode="data_parallel", + gradients_mean=True, + ) + network = SimpleCNN(in_channels=1, num_classes=10) + net_loss = nn.SoftmaxCrossEntropyWithLogits(sparse=True, reduction="mean") + + net_opt = create_optimizer( + network.trainable_params(), + opt, + lr=0.01, + weight_decay=1e-5, + momentum=0.9, + nesterov=nesterov, + filter_bias_and_bn=filter_bias_and_bn, + ) + + bs = 8 + input_data = Tensor(np.ones([bs, 1, 32, 32]).astype(np.float32) * 0.01) + label = Tensor(np.ones([bs]).astype(np.int32)) + + net_with_loss = WithLossCell(network, net_loss) + train_network = TrainOneStepCell(net_with_loss, net_opt) + + train_network.set_train() + + begin_loss = train_network(input_data, label) + for i in range(10): + cur_loss = train_network(input_data, label) + print(f"{opt}, begin loss: {begin_loss}, end loss: {cur_loss}") + + # check output correctness + assert cur_loss < begin_loss, "Loss does NOT decrease" + + +@pytest.mark.parametrize("bs", [1, 2, 4, 8, 16]) +@pytest.mark.parametrize("opt", ["adam", "adamW", "rmsprop", "adagrad"]) +def test_bs_adam_optimizer(opt, bs): + init("nccl") + device_num = get_group_size() + rank_id = get_rank() # noqa: F841 + ms.set_auto_parallel_context( + device_num=device_num, + parallel_mode="data_parallel", + gradients_mean=True, + ) + network = SimpleCNN(num_classes=10) + net_loss = nn.SoftmaxCrossEntropyWithLogits(sparse=True, reduction="mean") + + net_opt = 
create_optimizer(network.trainable_params(), opt, lr=0.01, weight_decay=1e-5) + + bs = bs + input_data = Tensor(np.ones([bs, 1, 32, 32]).astype(np.float32) * 0.01) + label = Tensor(np.ones([bs]).astype(np.int32)) + + net_with_loss = WithLossCell(network, net_loss) + train_network = TrainOneStepCell(net_with_loss, net_opt) + + train_network.set_train() + + begin_loss = train_network(input_data, label) + for i in range(10): + cur_loss = train_network(input_data, label) + + print(f"{opt}, begin loss: {begin_loss}, end loss: {cur_loss}") + + # check output correctness + assert cur_loss < begin_loss, "Loss does NOT decrease" + + +@pytest.mark.parametrize("loss_scale", [0.1, 0.2, 0.3, 0.5, 0.9, 1.0]) +@pytest.mark.parametrize("weight_decay", [0.00001, 0.0001, 0.001, 0.005, 0.01, 0.05]) +@pytest.mark.parametrize("lr", [0.0001, 0.001, 0.005, 0.05, 0.1, 0.2]) +def test_lr_weight_decay_loss_scale_optimizer(lr, weight_decay, loss_scale): + init("nccl") + device_num = get_group_size() + rank_id = get_rank() # noqa: F841 + ms.set_auto_parallel_context( + device_num=device_num, + parallel_mode="data_parallel", + gradients_mean=True, + ) + network = SimpleCNN(num_classes=10) + net_loss = nn.SoftmaxCrossEntropyWithLogits(sparse=True, reduction="mean") + + net_opt = create_optimizer( + network.trainable_params(), + "adamW", + lr=lr, + weight_decay=weight_decay, + loss_scale=loss_scale, + ) + + bs = 8 + input_data = Tensor(np.ones([bs, 1, 32, 32]).astype(np.float32) * 0.01) + label = Tensor(np.ones([bs]).astype(np.int32)) + + net_with_loss = WithLossCell(network, net_loss) + train_network = TrainOneStepCell(net_with_loss, net_opt) + + train_network.set_train() + + begin_loss = train_network(input_data, label) + for i in range(10): + cur_loss = train_network(input_data, label) + + print(f"{lr}, {weight_decay}, {loss_scale}, begin loss: {begin_loss}, end loss: {cur_loss}") + + # check output correctness + assert cur_loss < begin_loss, "Loss does NOT decrease" + + +@pytest.mark.parametrize("momentum", [0.1, 0.2, 0.5, 0.9, 0.99]) +def test_momentum_optimizer(momentum): + init("nccl") + device_num = get_group_size() + rank_id = get_rank() # noqa: F841 + ms.set_auto_parallel_context( + device_num=device_num, + parallel_mode="data_parallel", + gradients_mean=True, + ) + network = SimpleCNN(in_channels=1, num_classes=10) + net_loss = nn.SoftmaxCrossEntropyWithLogits(sparse=True, reduction="mean") + + net_opt = create_optimizer( + network.trainable_params(), + "momentum", + lr=0.01, + weight_decay=1e-5, + momentum=momentum, + nesterov=False, + ) + + bs = 8 + input_data = Tensor(np.ones([bs, 1, 32, 32]).astype(np.float32) * 0.01) + label = Tensor(np.ones([bs]).astype(np.int32)) + + net_with_loss = WithLossCell(network, net_loss) + train_network = TrainOneStepCell(net_with_loss, net_opt) + + train_network.set_train() + + begin_loss = train_network(input_data, label) + for i in range(10): + cur_loss = train_network(input_data, label) + print(f"{momentum}, begin loss: {begin_loss}, end loss: {cur_loss}") + + # check output correctness + assert cur_loss < begin_loss, "Loss does NOT decrease" + + +def test_param_lr_001_filter_bias_and_bn_optimizer(): + init("nccl") + device_num = get_group_size() + rank_id = get_rank() # noqa: F841 + ms.set_auto_parallel_context( + device_num=device_num, + parallel_mode="data_parallel", + gradients_mean=True, + ) + network = SimpleCNN(in_channels=1, num_classes=10) + conv_params = list(filter(lambda x: "conv" in x.name, network.trainable_params())) + no_conv_params = list(filter(lambda x: 
"conv" not in x.name, network.trainable_params())) + group_params = [ + {"params": conv_params, "weight_decay": 0.01, "grad_centralization": True}, + {"params": no_conv_params, "lr": 0.01}, + {"order_params": network.trainable_params()}, + ] + net_loss = nn.SoftmaxCrossEntropyWithLogits(sparse=True, reduction="mean") + net_opt = create_optimizer( + group_params, + "adamW", + lr=0.01, + weight_decay=1e-5, + momentum=0.9, + nesterov=False, + filter_bias_and_bn=False, + ) + + bs = 8 + input_data = Tensor(np.ones([bs, 1, 32, 32]).astype(np.float32) * 0.01) + label = Tensor(np.ones([bs]).astype(np.int32)) + + net_with_loss = WithLossCell(network, net_loss) + train_network = TrainOneStepCell(net_with_loss, net_opt) + + train_network.set_train() + + begin_loss = train_network(input_data, label) + for i in range(10): + cur_loss = train_network(input_data, label) + print(f" begin loss: {begin_loss}, end loss: {cur_loss}") + + # check output correctness + assert cur_loss < begin_loss, "Loss does NOT decrease" + + +def test_param_lr_0001_filter_bias_and_bn_optimizer(): + init("nccl") + device_num = get_group_size() + rank_id = get_rank() # noqa: F841 + ms.set_auto_parallel_context( + device_num=device_num, + parallel_mode="data_parallel", + gradients_mean=True, + ) + network = SimpleCNN(in_channels=1, num_classes=10) + conv_params = list(filter(lambda x: "conv" in x.name, network.trainable_params())) + no_conv_params = list(filter(lambda x: "conv" not in x.name, network.trainable_params())) + group_params = [ + {"params": conv_params, "weight_decay": 0.01, "grad_centralization": True}, + {"params": no_conv_params, "lr": 0.001}, + {"order_params": network.trainable_params()}, + ] + net_loss = nn.SoftmaxCrossEntropyWithLogits(sparse=True, reduction="mean") + net_opt = create_optimizer( + group_params, + "adamW", + lr=0.01, + weight_decay=1e-5, + momentum=0.9, + nesterov=False, + filter_bias_and_bn=False, + ) + + bs = 8 + input_data = Tensor(np.ones([bs, 1, 32, 32]).astype(np.float32) * 0.01) + label = Tensor(np.ones([bs]).astype(np.int32)) + + net_with_loss = WithLossCell(network, net_loss) + train_network = TrainOneStepCell(net_with_loss, net_opt) + + train_network.set_train() + + begin_loss = train_network(input_data, label) + for i in range(10): + cur_loss = train_network(input_data, label) + print(f" begin loss: {begin_loss}, end loss: {cur_loss}") + + # check output correctness + assert cur_loss < begin_loss, "Loss does NOT decrease" + + +@pytest.mark.parametrize("momentum", [-0.1, -1.0, -2]) +def test_wrong_momentum_optimizer(momentum): + init("nccl") + device_num = get_group_size() + rank_id = get_rank() # noqa: F841 + ms.set_auto_parallel_context( + device_num=device_num, + parallel_mode="data_parallel", + gradients_mean=True, + ) + with pytest.raises((RuntimeError, TypeError, ValueError)): + network = SimpleCNN(in_channels=1, num_classes=10) + net_loss = nn.SoftmaxCrossEntropyWithLogits(sparse=True, reduction="mean") + net_opt = create_optimizer( + network.trainable_params(), + "momentum", + lr=0.01, + weight_decay=0.0001, + momentum=momentum, + loss_scale=1.0, + nesterov=False, + filter_bias_and_bn=True, + ) + + bs = 8 + input_data = Tensor(np.ones([bs, 1, 32, 32]).astype(np.float32) * 0.01) + label = Tensor(np.ones([bs]).astype(np.int32)) + + net_with_loss = WithLossCell(network, net_loss) + train_network = TrainOneStepCell(net_with_loss, net_opt) + + train_network.set_train() + + begin_loss = train_network(input_data, label) + for i in range(10): + cur_loss = train_network(input_data, 
label) + print(f"{momentum}, begin loss: {begin_loss}, end loss: {cur_loss}") + + # check output correctness + assert cur_loss < begin_loss, "Loss does NOT decrease" + + +@pytest.mark.parametrize("loss_scale", [-0.1, -1.0]) +def test_wrong_loss_scale_optimizer(loss_scale): + init("nccl") + device_num = get_group_size() + rank_id = get_rank() # noqa: F841 + ms.set_auto_parallel_context( + device_num=device_num, + parallel_mode="data_parallel", + gradients_mean=True, + ) + with pytest.raises((RuntimeError, TypeError, ValueError)): + network = SimpleCNN(in_channels=1, num_classes=10) + net_loss = nn.SoftmaxCrossEntropyWithLogits(sparse=True, reduction="mean") + net_opt = create_optimizer( + network.trainable_params(), + "momentum", + lr=0.01, + weight_decay=0.0001, + momentum=0.9, + loss_scale=loss_scale, + nesterov=False, + filter_bias_and_bn=True, + ) + + bs = 8 + input_data = Tensor(np.ones([bs, 1, 32, 32]).astype(np.float32) * 0.01) + label = Tensor(np.ones([bs]).astype(np.int32)) + + net_with_loss = WithLossCell(network, net_loss) + train_network = TrainOneStepCell(net_with_loss, net_opt) + + train_network.set_train() + + begin_loss = train_network(input_data, label) + for i in range(10): + cur_loss = train_network(input_data, label) + print(f"{loss_scale}, begin loss: {begin_loss}, end loss: {cur_loss}") + + # check output correctness + if cur_loss < begin_loss: + raise ValueError + + +@pytest.mark.parametrize("weight_decay", [-0.1, -1.0]) +def test_wrong_weight_decay_optimizer(weight_decay): + init("nccl") + device_num = get_group_size() + rank_id = get_rank() # noqa: F841 + ms.set_auto_parallel_context( + device_num=device_num, + parallel_mode="data_parallel", + gradients_mean=True, + ) + with pytest.raises((RuntimeError, TypeError, ValueError)): + network = SimpleCNN(in_channels=1, num_classes=10) + net_loss = nn.SoftmaxCrossEntropyWithLogits(sparse=True, reduction="mean") + net_opt = create_optimizer( + network.trainable_params(), + "adamW", + lr=0.01, + weight_decay=weight_decay, + momentum=0.9, + loss_scale=1.0, + nesterov=False, + filter_bias_and_bn=True, + ) + + bs = 8 + input_data = Tensor(np.ones([bs, 1, 32, 32]).astype(np.float32) * 0.01) + label = Tensor(np.ones([bs]).astype(np.int32)) + + net_with_loss = WithLossCell(network, net_loss) + train_network = TrainOneStepCell(net_with_loss, net_opt) + + train_network.set_train() + + begin_loss = train_network(input_data, label) + for i in range(10): + cur_loss = train_network(input_data, label) + print(f"{weight_decay}, begin loss: {begin_loss}, end loss: {cur_loss}") + + # check output correctness + assert cur_loss < begin_loss, "Loss does NOT decrease" + + +@pytest.mark.parametrize("lr", [-1.0, -0.1]) +def test_wrong_lr_optimizer(lr): + init("nccl") + device_num = get_group_size() + rank_id = get_rank() # noqa: F841 + ms.set_auto_parallel_context( + device_num=device_num, + parallel_mode="data_parallel", + gradients_mean=True, + ) + with pytest.raises((RuntimeError, TypeError, ValueError)): + network = SimpleCNN(in_channels=1, num_classes=10) + net_loss = nn.SoftmaxCrossEntropyWithLogits(sparse=True, reduction="mean") + net_opt = create_optimizer( + network.trainable_params(), + "adamW", + lr=lr, + weight_decay=1e-5, + momentum=0.9, + loss_scale=1.0, + nesterov=False, + filter_bias_and_bn=True, + ) + + bs = 8 + input_data = Tensor(np.ones([bs, 1, 32, 32]).astype(np.float32) * 0.01) + label = Tensor(np.ones([bs]).astype(np.int32)) + + net_with_loss = WithLossCell(network, net_loss) + train_network = 
TrainOneStepCell(net_with_loss, net_opt) + + train_network.set_train() + + begin_loss = train_network(input_data, label) + for i in range(10): + cur_loss = train_network(input_data, label) + print(f"{lr}, begin loss: {begin_loss}, end loss: {cur_loss}") + + # check output correctness + assert cur_loss < begin_loss, "Loss does NOT decrease" + + +def test_param_lr_01_filter_bias_and_bn_optimizer(): + init("nccl") + device_num = get_group_size() + rank_id = get_rank() # noqa: F841 + ms.set_auto_parallel_context( + device_num=device_num, + parallel_mode="data_parallel", + gradients_mean=True, + ) + network = SimpleCNN(in_channels=1, num_classes=10) + conv_params = list(filter(lambda x: "conv" in x.name, network.trainable_params())) + no_conv_params = list(filter(lambda x: "conv" not in x.name, network.trainable_params())) + group_params = [ + {"params": conv_params, "weight_decay": 0.01, "grad_centralization": True}, + {"params": no_conv_params, "lr": 0.1}, + {"order_params": network.trainable_params()}, + ] + net_loss = nn.SoftmaxCrossEntropyWithLogits(sparse=True, reduction="mean") + net_opt = create_optimizer( + group_params, + "momentum", + lr=0.01, + weight_decay=1e-5, + momentum=0.9, + nesterov=False, + filter_bias_and_bn=False, + ) + + bs = 8 + input_data = Tensor(np.ones([bs, 1, 32, 32]).astype(np.float32) * 0.01) + label = Tensor(np.ones([bs]).astype(np.int32)) + + net_with_loss = WithLossCell(network, net_loss) + train_network = TrainOneStepCell(net_with_loss, net_opt) + + train_network.set_train() + + begin_loss = train_network(input_data, label) + for i in range(10): + cur_loss = train_network(input_data, label) + print(f" begin loss: {begin_loss}, end loss: {cur_loss}") + + # check output correctness + assert cur_loss < begin_loss, "Loss does NOT decrease" + + +@pytest.mark.parametrize("opt", ["test", "bdam", "mindspore"]) +def test_wrong_opt_optimizer(opt): + init("nccl") + device_num = get_group_size() + rank_id = get_rank() # noqa: F841 + ms.set_auto_parallel_context( + device_num=device_num, + parallel_mode="data_parallel", + gradients_mean=True, + ) + with pytest.raises((RuntimeError, TypeError, ValueError)): + network = SimpleCNN(in_channels=1, num_classes=10) + net_loss = nn.SoftmaxCrossEntropyWithLogits(sparse=True, reduction="mean") + net_opt = create_optimizer( + network.trainable_params(), + opt, + lr=0.01, + weight_decay=1e-5, + momentum=0.9, + loss_scale=1.0, + nesterov=False, + filter_bias_and_bn=True, + ) + + bs = 8 + input_data = Tensor(np.ones([bs, 1, 32, 32]).astype(np.float32) * 0.01) + label = Tensor(np.ones([bs]).astype(np.int32)) + + net_with_loss = WithLossCell(network, net_loss) + train_network = TrainOneStepCell(net_with_loss, net_opt) + + train_network.set_train() + + begin_loss = train_network(input_data, label) + for i in range(10): + cur_loss = train_network(input_data, label) + print(f"{opt}, begin loss: {begin_loss}, end loss: {cur_loss}") + + # check output correctness + assert cur_loss < begin_loss, "Loss does NOT decrease" + + +def test_wrong_params_more_optimizer(): + init("nccl") + device_num = get_group_size() + rank_id = get_rank() # noqa: F841 + ms.set_auto_parallel_context( + device_num=device_num, + parallel_mode="data_parallel", + gradients_mean=True, + ) + with pytest.raises((RuntimeError, TypeError, ValueError)): + network = SimpleCNN(in_channels=1, num_classes=10) + conv_params = list(filter(lambda x: "conv" in x.name, network.trainable_params())) + conv_params.append("test") + no_conv_params = list(filter(lambda x: "conv" not in 
x.name, network.trainable_params())) + group_params = [ + {"params": conv_params, "weight_decay": 0.01, "grad_centralization": True}, + {"params": no_conv_params, "lr": 0.0}, + {"order_params": network.trainable_params()}, + ] + net_loss = nn.SoftmaxCrossEntropyWithLogits(sparse=True, reduction="mean") + net_opt = create_optimizer( + group_params, + "momentum", + lr=0.01, + weight_decay=1e-5, + momentum=0.9, + loss_scale=1.0, + nesterov=False, + filter_bias_and_bn=False, + ) + + bs = 8 + input_data = Tensor(np.ones([bs, 1, 32, 32]).astype(np.float32) * 0.01) + label = Tensor(np.ones([bs]).astype(np.int32)) + + net_with_loss = WithLossCell(network, net_loss) + train_network = TrainOneStepCell(net_with_loss, net_opt) + + train_network.set_train() + + begin_loss = train_network(input_data, label) + for i in range(10): + cur_loss = train_network(input_data, label) + print(f" begin loss: {begin_loss}, end loss: {cur_loss}") + + # check output correctness + assert cur_loss < begin_loss, "Loss does NOT decrease" + + +def test_wrong_params_input_optimizer(): + init("nccl") + device_num = get_group_size() + rank_id = get_rank() # noqa: F841 + ms.set_auto_parallel_context( + device_num=device_num, + parallel_mode="data_parallel", + gradients_mean=True, + ) + with pytest.raises((RuntimeError, TypeError, ValueError)): + network = SimpleCNN(in_channels=1, num_classes=10) + conv_params = [1, 2, 3, 4] + no_conv_params = list(filter(lambda x: "conv" not in x.name, network.trainable_params())) + group_params = [ + {"params": conv_params, "weight_decay": 0.01, "grad_centralization": True}, + {"params": no_conv_params, "lr": 0.0}, + {"order_params": network.trainable_params()}, + ] + net_loss = nn.SoftmaxCrossEntropyWithLogits(sparse=True, reduction="mean") + net_opt = create_optimizer( + group_params, + "momentum", + lr=0.01, + weight_decay=1e-5, + momentum=0.9, + loss_scale=1.0, + nesterov=False, + filter_bias_and_bn=False, + ) + + bs = 8 + input_data = Tensor(np.ones([bs, 1, 32, 32]).astype(np.float32) * 0.01) + label = Tensor(np.ones([bs]).astype(np.int32)) + + net_with_loss = WithLossCell(network, net_loss) + train_network = TrainOneStepCell(net_with_loss, net_opt) + + train_network.set_train() + + begin_loss = train_network(input_data, label) + for i in range(10): + cur_loss = train_network(input_data, label) + print(f" begin loss: {begin_loss}, end loss: {cur_loss}") + + # check output correctness + assert cur_loss < begin_loss, "Loss does NOT decrease" + + +@pytest.mark.parametrize( + "mode", + [ + ms.GRAPH_MODE, + ms.PYNATIVE_MODE, + ], +) +def test_mode_mult_single_optimizer(mode): + init("nccl") + device_num = get_group_size() + rank_id = get_rank() # noqa: F841 + ms.set_auto_parallel_context( + device_num=device_num, + parallel_mode="data_parallel", + gradients_mean=True, + ) + ms.set_context(mode=mode) + network = SimpleCNN(in_channels=1, num_classes=10) + conv_params = list(filter(lambda x: "conv" in x.name, network.trainable_params())) + no_conv_params = list(filter(lambda x: "conv" not in x.name, network.trainable_params())) + group_params = [ + {"params": conv_params, "weight_decay": 0.01, "grad_centralization": True}, + {"params": no_conv_params, "lr": 0.1}, + {"order_params": network.trainable_params()}, + ] + net_loss = nn.SoftmaxCrossEntropyWithLogits(sparse=True, reduction="mean") + net_opt = create_optimizer( + group_params, + "momentum", + lr=0.01, + weight_decay=1e-5, + momentum=0.9, + nesterov=False, + filter_bias_and_bn=False, + ) + + bs = 8 + input_data = Tensor(np.ones([bs, 1, 
32, 32]).astype(np.float32) * 0.01) + label = Tensor(np.ones([bs]).astype(np.int32)) + + net_with_loss = WithLossCell(network, net_loss) + train_network = TrainOneStepCell(net_with_loss, net_opt) + + train_network.set_train() + + begin_loss = train_network(input_data, label) + for i in range(10): + cur_loss = train_network(input_data, label) + print(f" begin loss: {begin_loss}, end loss: {cur_loss}") + + # check output correctness + assert cur_loss < begin_loss, "Loss does NOT decrease" diff --git a/tests/modules/parallel/test_parallel_transforms.py b/tests/modules/parallel/test_parallel_transforms.py new file mode 100644 index 0000000..3cb2c61 --- /dev/null +++ b/tests/modules/parallel/test_parallel_transforms.py @@ -0,0 +1,129 @@ +import sys + +sys.path.append("../..") + +import pytest + +import mindspore as ms +from mindspore.communication import get_group_size, get_rank, init + +from mindcv.data import create_dataset, create_loader, create_transforms + + +@pytest.mark.parametrize("mode", [0, 1]) +@pytest.mark.parametrize("name", ["ImageNet"]) +@pytest.mark.parametrize("image_resize", [224, 256, 320]) +@pytest.mark.parametrize("is_training", [True, False]) +def test_transforms_distribute_imagenet(mode, name, image_resize, is_training): + """ + test transform_list API(distribute) + command: mpirun -n 8 pytest -s test_transforms.py::test_transforms_distribute_imagenet + + API Args: + dataset_name='', + image_resize=224, + is_training=False, + **kwargs + """ + ms.set_context(mode=mode) + + init("nccl") + device_num = get_group_size() + rank_id = get_rank() + ms.set_auto_parallel_context( + device_num=device_num, + parallel_mode="data_parallel", + gradients_mean=True, + ) + + root = "/data0/dataset/imagenet2012/imagenet_original/" + dataset = create_dataset( + name=name, + root=root, + split="train", + num_shards=device_num, + shard_id=rank_id, + num_parallel_workers=8, + download=False, + ) + + # create transforms + transform_list = create_transforms( + dataset_name=name, + image_resize=image_resize, + is_training=is_training, + ) + + # load dataset + loader = create_loader( + dataset=dataset, + batch_size=32, + drop_remainder=True, + is_training=is_training, + transform=transform_list, + num_parallel_workers=8, + ) + + print(loader) + print(loader.output_shapes()) + + assert loader.output_shapes()[0][2] == image_resize, "image_resize error !" 
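+# Note: create_loader batches follow the NCHW layout seen in test_loader.py
+# (output_shapes()[0] == [batch_size, 3, 224, 224]), so index 2 of the first
+# output shape is the image height and should equal `image_resize`.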
+ + +@pytest.mark.parametrize("mode", [0, 1]) +@pytest.mark.parametrize("name", ["MNIST", "CIFAR10"]) +@pytest.mark.parametrize("image_resize", [224, 256, 320]) +@pytest.mark.parametrize("is_training", [True, False]) +@pytest.mark.parametrize("download", [True, False]) +def test_transforms_distribute_imagenet_mc(mode, name, image_resize, is_training, download): + """ + test transform_list API(distribute) + command: mpirun -n 8 pytest -s test_transforms.py::test_transforms_distribute_imagenet_mc + + API Args: + dataset_name='', + image_resize=224, + is_training=False, + **kwargs + """ + ms.set_context(mode=mode) + + init("nccl") + device_num = get_group_size() + rank_id = get_rank() + ms.set_auto_parallel_context( + device_num=device_num, + parallel_mode="data_parallel", + gradients_mean=True, + ) + + dataset = create_dataset( + name=name, + split="train", + num_shards=device_num, + shard_id=rank_id, + num_parallel_workers=8, + download=download, + ) + + # create transforms + transform_list = create_transforms( + dataset_name=name, + image_resize=image_resize, + is_training=is_training, + ) + + # load dataset + loader = create_loader( + dataset=dataset, + batch_size=32, + drop_remainder=True, + is_training=is_training, + transform=transform_list, + num_parallel_workers=8, + ) + + print(loader) + print(loader.output_shapes()) + + assert loader.output_shapes()[0][2] == image_resize, "image_resize error !" diff --git a/tests/modules/test_config.py b/tests/modules/test_config.py new file mode 100644 index 0000000..bfc6037 --- /dev/null +++ b/tests/modules/test_config.py @@ -0,0 +1,75 @@ +import os +import sys + +sys.path.append(".") + +import pytest +import yaml + +from config import _check_cfgs_in_parser, create_parser, parse_args + + +def test_checker_valid(): + cfgs = yaml.safe_load( + """ + mode: 1 + dataset: imagenet + """ + ) + _, parser = create_parser() + _check_cfgs_in_parser(cfgs, parser) + + +def test_checker_invalid(): + cfgs = yaml.safe_load( + """ + mode: 1 + dataset: imagenet + valid: False + """ + ) + _, parser = create_parser() + with pytest.raises(KeyError) as exc_info: + _check_cfgs_in_parser(cfgs, parser) + assert exc_info.type is KeyError + assert exc_info.value.args[0] == "valid does not exist in ArgumentParser!" 
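+# A minimal sketch of how the parser is exercised by the tests in this file:
+#   parse_args(["--mode=0", "--dataset=imagenet"])       # CLI flags only
+#   parse_args(["--config=<recipe>.yaml", "--mode=1"])   # YAML recipe + extra CLI flags
+# Unknown YAML keys are rejected by _check_cfgs_in_parser with a KeyError, as checked
+# above; the remaining tests cover CLI-only parsing, YAML + CLI parsing, and parsing of
+# every yaml recipe under configs/.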
+ + +@pytest.mark.parametrize("mode", [0, 1]) +@pytest.mark.parametrize("dataset", ["mnist", "imagenet"]) +def test_parse_args_without_yaml(mode, dataset): + args = parse_args([f"--mode={mode}", f"--dataset={dataset}"]) + assert args.mode == mode + assert args.dataset == dataset + assert args.amp_level == "O0" # default value + + +@pytest.mark.parametrize("cfg_yaml", ["configs/resnet/resnet_18_ascend.yaml"]) +@pytest.mark.parametrize("mode", [1]) +@pytest.mark.parametrize("dataset", ["mnist"]) +def test_parse_args_with_yaml(cfg_yaml, mode, dataset): + args = parse_args([f"--config={cfg_yaml}", f"--mode={mode}", f"--dataset={dataset}"]) + assert args.mode == mode + assert args.dataset == dataset + with open(cfg_yaml, "r") as f: + cfg = yaml.safe_load(f) + model = cfg["model"] + assert args.model == model # from cfg.yaml + + +def test_parse_args_from_all_yaml(): + cfgs_root = "configs" + cfg_paths = [] + for dirpath, dirnames, filenames in os.walk(cfgs_root): + for filename in filenames: + if filename.endswith((".yaml", "yml")): + cfg_paths.append(os.path.join(dirpath, filename)) + for cfg_yaml in cfg_paths: + try: + args = parse_args([f"--config={cfg_yaml}"]) + with open(cfg_yaml, "r") as f: + cfg = yaml.safe_load(f) + model = cfg["model"] + assert args.model == model + except KeyError as e: + raise AssertionError(f"{cfg_yaml} has some invalid options: {e}") diff --git a/tests/modules/test_dataset.py b/tests/modules/test_dataset.py new file mode 100644 index 0000000..e2561f6 --- /dev/null +++ b/tests/modules/test_dataset.py @@ -0,0 +1,102 @@ +import os +import sys + +sys.path.append(".") + +import pytest + +import mindspore as ms + +from mindcv.data import create_dataset, get_dataset_download_root +from mindcv.utils.download import DownLoad + + +# test imagenet +@pytest.mark.parametrize("mode", [0, 1]) +@pytest.mark.parametrize("name", ["ImageNet"]) +@pytest.mark.parametrize("split", ["train", "val"]) +@pytest.mark.parametrize("shuffle", [True, False]) +@pytest.mark.parametrize("num_samples", [2, None]) +@pytest.mark.parametrize("num_parallel_workers", [2]) +def test_create_dataset_standalone_imagenet(mode, name, split, shuffle, num_samples, num_parallel_workers): + """ + test create_dataset API(standalone) + command: pytest -s test_dataset.py::test_create_dataset_standalone_imagenet + + API Args: + name: str = '', + root: str = './', + split: str = 'train', + shuffle: bool = True, + num_samples: Optional[bool] = None, + num_shards: Optional[int] = None, + shard_id: Optional[int] = None, + num_parallel_workers: Optional[int] = None, + download: bool = False, + **kwargs + """ + + ms.set_context(mode=mode) + dataset_url = ( + "https://mindspore-website.obs.cn-north-4.myhuaweicloud.com/notebook/datasets/intermediate/Canidae_data.zip" + ) + root_dir = os.path.join(get_dataset_download_root(), "Canidae") + data_dir = os.path.join(root_dir, "data", "Canidae") # Canidae has prefix path "data/Canidae" in unzipped file. 
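+    # The small Canidae dataset serves as a stand-in ImageFolder-style dataset for
+    # the ImageNet code path; it is downloaded and extracted only on the first run.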
+ if not os.path.exists(data_dir): + DownLoad().download_and_extract_archive(dataset_url, root_dir) + dataset = create_dataset( + name=name, + root=data_dir, + split=split, + shuffle=shuffle, + num_samples=num_samples, + num_parallel_workers=num_parallel_workers, + download=False, + ) + + assert type(dataset) == ms.dataset.engine.datasets_vision.ImageFolderDataset + assert dataset is not None + + +# test MNIST CIFAR10 +@pytest.mark.parametrize("mode", [0, 1]) +@pytest.mark.parametrize("name", ["MNIST", "CIFAR10"]) +@pytest.mark.parametrize("split", ["train", "test"]) +@pytest.mark.parametrize("shuffle", [True, False]) +@pytest.mark.parametrize("num_samples", [2, None]) +@pytest.mark.parametrize("num_parallel_workers", [2]) +@pytest.mark.parametrize("download", [True]) +def test_create_dataset_standalone_mc(mode, name, split, shuffle, num_samples, num_parallel_workers, download): + """ + test create_dataset API(standalone) + command: pytest -s test_dataset.py::test_create_dataset_standalone_mc + + API Args: + name: str = '', + root: str = './', + split: str = 'train', + shuffle: bool = True, + num_samples: Optional[bool] = None, + num_shards: Optional[int] = None, + shard_id: Optional[int] = None, + num_parallel_workers: Optional[int] = None, + download: bool = False, + **kwargs + """ + + ms.set_context(mode=mode) + + dataset = create_dataset( + name=name, + split=split, + shuffle=shuffle, + num_samples=num_samples, + num_parallel_workers=num_parallel_workers, + download=download, + ) + + assert ( + type(dataset) == ms.dataset.engine.datasets_vision.MnistDataset + or type(dataset) == ms.dataset.engine.datasets_vision.Cifar10Dataset + ) + assert dataset is not None diff --git a/tests/modules/test_loader.py b/tests/modules/test_loader.py new file mode 100644 index 0000000..343c913 --- /dev/null +++ b/tests/modules/test_loader.py @@ -0,0 +1,92 @@ +import os +import sys + +sys.path.append(".") + +import pytest + +import mindspore as ms +from mindspore.dataset.transforms import OneHot + +from mindcv.data import create_dataset, create_loader, get_dataset_download_root +from mindcv.utils.download import DownLoad + +num_classes = 1 + + +@pytest.mark.parametrize("mode", [0, 1]) +@pytest.mark.parametrize("split", ["train"]) +@pytest.mark.parametrize("batch_size", [1, 16]) +@pytest.mark.parametrize("drop_remainder", [True, False]) +@pytest.mark.parametrize("is_training", [True, False]) +@pytest.mark.parametrize("transform", [None]) +@pytest.mark.parametrize("target_transform", [None, [OneHot(num_classes)]]) +@pytest.mark.parametrize("mixup", [0, 1]) +@pytest.mark.parametrize("num_classes", [2]) +@pytest.mark.parametrize("num_parallel_workers", [None, 2]) +@pytest.mark.parametrize("python_multiprocessing", [True, False]) +def test_dataset_loader_standalone( + mode, + split, + batch_size, + drop_remainder, + is_training, + transform, + target_transform, + mixup, + num_classes, + num_parallel_workers, + python_multiprocessing, +): + """ + test create_dataset API(standalone) + command: pytest -s test_dataset.py::test_create_dataset_standalone + API Args: + name: str = '', + root: str = './', + split: str = 'train', + shuffle: bool = True, + num_samples: Optional[bool] = None, + num_shards: Optional[int] = None, + shard_id: Optional[int] = None, + num_parallel_workers: Optional[int] = None, + download: bool = False, + **kwargs + """ + + ms.set_context(mode=mode) + name = "ImageNet" + dataset_url = ( + 
"https://mindspore-website.obs.cn-north-4.myhuaweicloud.com/notebook/datasets/intermediate/Canidae_data.zip" + ) + root_dir = os.path.join(get_dataset_download_root(), "Canidae") + data_dir = os.path.join(root_dir, "data", "Canidae") # Canidae has prefix path "data/Canidae" in unzipped file. + if not os.path.exists(data_dir): + DownLoad().download_and_extract_archive(dataset_url, root_dir) + + dataset = create_dataset( + name=name, + root=data_dir, + split=split, + shuffle=False, + num_parallel_workers=num_parallel_workers, + download=False, + ) + + # load dataset + loader_train = create_loader( + dataset=dataset, + batch_size=batch_size, + drop_remainder=drop_remainder, + is_training=is_training, + transform=transform, + target_transform=target_transform, + mixup=mixup, + num_classes=num_classes, + num_parallel_workers=num_parallel_workers, + python_multiprocessing=python_multiprocessing, + ) + out_batch_size = loader_train.get_batch_size() + out_shapes = loader_train.output_shapes()[0] + assert out_batch_size == batch_size + assert out_shapes == [batch_size, 3, 224, 224] diff --git a/tests/modules/test_loss.py b/tests/modules/test_loss.py new file mode 100644 index 0000000..8995d0b --- /dev/null +++ b/tests/modules/test_loss.py @@ -0,0 +1,124 @@ +import sys + +sys.path.append(".") + +import numpy as np +import pytest + +import mindspore as ms +from mindspore import nn +from mindspore.common.initializer import Normal +from mindspore.nn import TrainOneStepCell, WithLossCell + +from mindcv.loss import create_loss +from mindcv.optim import create_optimizer + +ms.set_seed(1) +np.random.seed(1) + + +class SimpleCNN(nn.Cell): + def __init__(self, num_classes=10, in_channels=1, include_top=True, aux_head=False, aux_head2=False): + super(SimpleCNN, self).__init__() + self.include_top = include_top + self.aux_head = aux_head + self.aux_head2 = aux_head2 + + self.conv1 = nn.Conv2d(in_channels, 6, 5, pad_mode="valid") + self.conv2 = nn.Conv2d(6, 16, 5, pad_mode="valid") + self.relu = nn.ReLU() + self.max_pool2d = nn.MaxPool2d(kernel_size=2, stride=2) + + if self.include_top: + self.flatten = nn.Flatten() + self.fc = nn.Dense(16 * 5 * 5, num_classes, weight_init=Normal(0.02)) + + if self.aux_head: + self.fc_aux = nn.Dense(16 * 5 * 5, num_classes, weight_init=Normal(0.02)) + if self.aux_head: + self.fc_aux2 = nn.Dense(16 * 5 * 5, num_classes, weight_init=Normal(0.02)) + + def construct(self, x): + x = self.conv1(x) + x = self.relu(x) + x = self.max_pool2d(x) + x = self.conv2(x) + x = self.relu(x) + x = self.max_pool2d(x) + ret = x + if self.include_top: + x_flatten = self.flatten(x) + x = self.fc(x_flatten) + ret = x + if self.aux_head: + x_aux = self.fc_aux(x_flatten) + ret = (x, x_aux) + if self.aux_head2: + x_aux2 = self.fc_aux2(x_flatten) + ret = (x, x_aux, x_aux2) + return ret + + +@pytest.mark.parametrize("mode", [0, 1]) +@pytest.mark.parametrize("name", ["CE", "BCE"]) +@pytest.mark.parametrize("reduction", ["mean", "sum"]) +@pytest.mark.parametrize("label_smoothing", [0.0, 0.1]) +@pytest.mark.parametrize("aux_factor", [0.0, 0.2]) +@pytest.mark.parametrize("weight", [None, 1]) +@pytest.mark.parametrize("double_aux", [False, True]) # TODO: decouple as one test case +def test_loss(mode, name, reduction, label_smoothing, aux_factor, weight, double_aux): + weight = None + print( + f"mode={mode}; loss_name={name}; has_weight=False; reduction={reduction};\ + label_smoothing={label_smoothing}; aux_factor={aux_factor}" + ) + ms.set_context(mode=mode) + + bs = 8 + num_classes = c = 10 + # create data + x 
= ms.Tensor(np.random.randn(bs, 1, 32, 32), ms.float32) + # logits = ms.Tensor(np.random.rand(bs, c), ms.float32) + y = np.random.randint(0, c, size=(bs)) + y_onehot = np.eye(c)[y] + y = ms.Tensor(y, ms.int32) # C + y_onehot = ms.Tensor(y_onehot, ms.float32) # N, C + if name == "BCE": + label = y_onehot + else: + label = y + + if weight is not None: + weight = np.random.randn(c) + weight = weight / weight.sum() # normalize + weight = ms.Tensor(weight, ms.float32) + + # set network + aux_head = aux_factor > 0.0 + aux_head2 = aux_head and double_aux + network = SimpleCNN(in_channels=1, num_classes=num_classes, aux_head=aux_head, aux_head2=aux_head2) + + # set loss + net_loss = create_loss( + name=name, weight=weight, reduction=reduction, label_smoothing=label_smoothing, aux_factor=aux_factor + ) + + # optimize + net_with_loss = WithLossCell(network, net_loss) + + net_opt = create_optimizer(network.trainable_params(), "adam", lr=0.001, weight_decay=1e-7) + train_network = TrainOneStepCell(net_with_loss, net_opt) + + train_network.set_train() + + begin_loss = train_network(x, label) + for _ in range(10): + cur_loss = train_network(x, label) + + print("begin loss: {}, end loss: {}".format(begin_loss, cur_loss)) + + assert cur_loss < begin_loss, "Loss does NOT decrease" + + +if __name__ == "__main__": + test_loss(0, "BCE", "mean", 0.1, 0.1, None, True) diff --git a/tests/modules/test_models.py b/tests/modules/test_models.py new file mode 100644 index 0000000..3ddd9df --- /dev/null +++ b/tests/modules/test_models.py @@ -0,0 +1,163 @@ +import sys + +sys.path.append(".") + +import numpy as np +import pytest + +import mindspore as ms +from mindspore import Tensor + +from mindcv import list_models, list_modules +from mindcv.models import ( + create_model, + get_pretrained_cfg_value, + is_model_in_modules, + is_model_pretrained, + model_entrypoint, +) + +# TODO: the global avg pooling op used in EfficientNet is not supported for CPU. +# TODO: memory resource is limited on free github action runner, ask the PM for self-hosted runners! 
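+# NOTE: model_name_list below appears to be a hand-picked subset of the models returned
+# by list_models(), chosen so that the forward-pass test stays within the CPU-operator
+# and memory limits mentioned in the TODOs above; it is not an exhaustive registry list.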
+model_name_list = [ + "BiTresnet50", + "RepMLPNet_T224", + "convit_tiny", + "convnext_tiny", + "crossvit9", + "densenet121", + "dpn92", + "edgenext_small", + "ghostnet_1x", + "googlenet", + "hrnet_w32", + "inception_v3", + "inception_v4", + "mixnet_s", + "mnasnet0_5", + "mobilenet_v1_025_224", + "mobilenet_v2_035_128", + "mobilenet_v3_small_075", + "nasnet_a_4x1056", + "pnasnet", + "poolformer_s12", + "pvt_tiny", + "pvt_v2_b0", + "regnet_x_200mf", + "repvgg_a0", + "res2net50", + "resnet18", + "resnext50_32x4d", + "rexnet_x09", + "seresnet18", + "shufflenet_v1_g3_x0_5", + "shufflenet_v2_x0_5", + "skresnet18", + "squeezenet1_0", + "swin_tiny", + "visformer_tiny", + "vit_b_32_224", + "xception", +] + +check_loss_decrease = False + + +# @pytest.mark.parametrize('mode', [ms.PYNATIVE_MODE, ms.GRAPH_MODE]) +@pytest.mark.parametrize("name", model_name_list) +def test_model_forward(name): + # ms.set_context(mode=ms.PYNATIVE_MODE) + bs = 2 + c = 10 + model = create_model(model_name=name, num_classes=c) + input_size = get_pretrained_cfg_value(model_name=name, cfg_key="input_size") + if input_size: + input_size = (bs,) + tuple(input_size) + else: + input_size = (bs, 3, 224, 224) + dummy_input = Tensor(np.random.rand(*input_size), dtype=ms.float32) + y = model(dummy_input) + assert y.shape == (bs, 10), "output shape not match" + + +""" +@pytest.mark.parametrize('name', model_name_list) +def test_model_backward(name): + # TODO: check number of gradient == number of parameters + bs = 8 + c = 2 + input_data = Tensor(np.random.rand(bs, 3, 224, 224), dtype=ms.float32) + label = Tensor(np.random.randint(0, high=c, size=(bs)), dtype=ms.int32) + + model= create_model(model_name=name, num_classes=c) + + net_loss = create_loss(name='CE') + net_opt = create_optimizer(model.trainable_params(), 'adam', lr=0.0001) + net_with_loss = WithLossCell(model, net_loss) + + train_network = TrainOneStepCell(net_with_loss, net_opt) + + begin_loss = train_network(input_data, label) + for _ in range(2): + cur_loss = train_network(input_data, label) + print("begin loss: {}, end loss: {}".format(begin_loss, cur_loss)) + + assert not math.isnan(cur_loss), 'loss NaN when training {name}' + if check_loss_decrease: + assert cur_loss < begin_loss, 'Loss does NOT decrease' +""" + + +def test_list_models(): + model_name_list = list_models() + for model_name in model_name_list: + print(model_name) + + +def test_model_entrypoint(): + model_name_list = list_models() + for model_name in model_name_list: + print(model_entrypoint(model_name)) + + +def test_list_modules(): + module_name_list = list_modules() + for module_name in module_name_list: + print(module_name) + + +def test_is_model_in_modules(): + model_name_list = list_models() + module_names = list_modules() + ouptput_false_list = [] + for model_name in model_name_list: + if not is_model_in_modules(model_name, module_names): + ouptput_false_list.append(model_name) + assert ouptput_false_list == [], "{}\n, Above mentioned models do not exist within a subset of modules.".format( + ouptput_false_list + ) + + +def test_is_model_pretrained(): + model_name_list = list_models() + ouptput_false_list = [] + num_pretrained = 0 + for model_name in model_name_list: + if not is_model_pretrained(model_name): + ouptput_false_list.append(model_name) + else: + num_pretrained += 1 + # assert ouptput_false_list == [], \ + # '{}\n, Above mentioned models do not have pretrained models.'.format(ouptput_false_list) + + assert num_pretrained > 0, "No pretrained models" + + +if __name__ == "__main__": + 
test_model_forward("pnasnet") + """ + for model in model_name_list: + if '384' in model: + print(model) + test_model_forward(model) + """ diff --git a/tests/modules/test_optim.py b/tests/modules/test_optim.py new file mode 100644 index 0000000..ac898ef --- /dev/null +++ b/tests/modules/test_optim.py @@ -0,0 +1,548 @@ +import sys + +sys.path.append(".") + +import numpy as np +import pytest + +import mindspore as ms +import mindspore.nn as nn +from mindspore import Tensor +from mindspore.common.initializer import Normal +from mindspore.nn import TrainOneStepCell, WithLossCell + +from mindcv.optim import create_optimizer + + +class SimpleCNN(nn.Cell): + def __init__(self, num_classes=10, in_channels=1, include_top=True): + super(SimpleCNN, self).__init__() + self.include_top = include_top + + self.conv1 = nn.Conv2d(in_channels, 6, 5, pad_mode="valid") + self.conv2 = nn.Conv2d(6, 16, 5, pad_mode="valid") + self.relu = nn.ReLU() + self.max_pool2d = nn.MaxPool2d(kernel_size=2, stride=2) + + if self.include_top: + self.flatten = nn.Flatten() + self.fc = nn.Dense(16 * 5 * 5, num_classes, weight_init=Normal(0.02)) + + def construct(self, x): + x = self.conv1(x) + x = self.relu(x) + x = self.max_pool2d(x) + x = self.conv2(x) + x = self.relu(x) + x = self.max_pool2d(x) + if self.include_top: + x = self.flatten(x) + x = self.fc(x) + return x + + +@pytest.mark.parametrize("opt", ["sgd", "momentum"]) +@pytest.mark.parametrize("nesterov", [True, False]) +@pytest.mark.parametrize("filter_bias_and_bn", [True, False]) +def test_sgd_optimizer(opt, nesterov, filter_bias_and_bn): + network = SimpleCNN(in_channels=1, num_classes=10) + net_loss = nn.SoftmaxCrossEntropyWithLogits(sparse=True, reduction="mean") + + net_opt = create_optimizer( + network.trainable_params(), + opt, + lr=0.01, + weight_decay=1e-5, + momentum=0.9, + nesterov=nesterov, + filter_bias_and_bn=filter_bias_and_bn, + ) + + bs = 8 + input_data = Tensor(np.ones([bs, 1, 32, 32]).astype(np.float32) * 0.01) + label = Tensor(np.ones([bs]).astype(np.int32)) + + net_with_loss = WithLossCell(network, net_loss) + train_network = TrainOneStepCell(net_with_loss, net_opt) + + train_network.set_train() + + begin_loss = train_network(input_data, label) + for i in range(10): + cur_loss = train_network(input_data, label) + print(f"{opt}, begin loss: {begin_loss}, end loss: {cur_loss}") + + # check output correctness + assert cur_loss < begin_loss, "Loss does NOT decrease" + + +@pytest.mark.parametrize("bs", [1, 2, 4, 8, 16]) +@pytest.mark.parametrize("opt", ["adam", "adamW", "rmsprop", "adagrad"]) +def test_bs_adam_optimizer(opt, bs): + network = SimpleCNN(num_classes=10) + net_loss = nn.SoftmaxCrossEntropyWithLogits(sparse=True, reduction="mean") + + net_opt = create_optimizer(network.trainable_params(), opt, lr=0.01, weight_decay=1e-5) + + bs = bs + input_data = Tensor(np.ones([bs, 1, 32, 32]).astype(np.float32) * 0.01) + label = Tensor(np.ones([bs]).astype(np.int32)) + + net_with_loss = WithLossCell(network, net_loss) + train_network = TrainOneStepCell(net_with_loss, net_opt) + + train_network.set_train() + + begin_loss = train_network(input_data, label) + for i in range(10): + cur_loss = train_network(input_data, label) + + print(f"{opt}, begin loss: {begin_loss}, end loss: {cur_loss}") + + # check output correctness + assert cur_loss < begin_loss, "Loss does NOT decrease" + + +@pytest.mark.parametrize("loss_scale", [0.1, 0.2, 0.3, 0.5, 0.9, 1.0]) +@pytest.mark.parametrize("weight_decay", [0.00001, 0.0001, 0.001, 0.005, 0.01, 0.05]) 
+@pytest.mark.parametrize("lr", [0.0001, 0.001, 0.005, 0.05, 0.1, 0.2]) +def test_lr_weight_decay_loss_scale_optimizer(lr, weight_decay, loss_scale): + network = SimpleCNN(num_classes=10) + net_loss = nn.SoftmaxCrossEntropyWithLogits(sparse=True, reduction="mean") + + net_opt = create_optimizer( + network.trainable_params(), "adamW", lr=lr, weight_decay=weight_decay, loss_scale=loss_scale + ) + + bs = 8 + input_data = Tensor(np.ones([bs, 1, 32, 32]).astype(np.float32) * 0.01) + label = Tensor(np.ones([bs]).astype(np.int32)) + + net_with_loss = WithLossCell(network, net_loss) + train_network = TrainOneStepCell(net_with_loss, net_opt) + + train_network.set_train() + + begin_loss = train_network(input_data, label) + for i in range(10): + cur_loss = train_network(input_data, label) + + print(f"{lr}, {weight_decay}, {loss_scale}, begin loss: {begin_loss}, end loss: {cur_loss}") + + # check output correctness + assert cur_loss < begin_loss, "Loss does NOT decrease" + + +@pytest.mark.parametrize("momentum", [0.1, 0.2, 0.5, 0.9, 0.99]) +def test_momentum_optimizer(momentum): + network = SimpleCNN(in_channels=1, num_classes=10) + net_loss = nn.SoftmaxCrossEntropyWithLogits(sparse=True, reduction="mean") + + net_opt = create_optimizer( + network.trainable_params(), "momentum", lr=0.01, weight_decay=1e-5, momentum=momentum, nesterov=False + ) + + bs = 8 + input_data = Tensor(np.ones([bs, 1, 32, 32]).astype(np.float32) * 0.01) + label = Tensor(np.ones([bs]).astype(np.int32)) + + net_with_loss = WithLossCell(network, net_loss) + train_network = TrainOneStepCell(net_with_loss, net_opt) + + train_network.set_train() + + begin_loss = train_network(input_data, label) + for i in range(10): + cur_loss = train_network(input_data, label) + print(f"{momentum}, begin loss: {begin_loss}, end loss: {cur_loss}") + + # check output correctness + assert cur_loss < begin_loss, "Loss does NOT decrease" + + +def test_param_lr_001_filter_bias_and_bn_optimizer(): + network = SimpleCNN(in_channels=1, num_classes=10) + conv_params = list(filter(lambda x: "conv" in x.name, network.trainable_params())) + no_conv_params = list(filter(lambda x: "conv" not in x.name, network.trainable_params())) + group_params = [ + {"params": conv_params, "weight_decay": 0.01, "grad_centralization": True}, + {"params": no_conv_params, "lr": 0.01}, + {"order_params": network.trainable_params()}, + ] + net_loss = nn.SoftmaxCrossEntropyWithLogits(sparse=True, reduction="mean") + net_opt = create_optimizer( + group_params, "adamW", lr=0.01, weight_decay=1e-5, momentum=0.9, nesterov=False, filter_bias_and_bn=False + ) + + bs = 8 + input_data = Tensor(np.ones([bs, 1, 32, 32]).astype(np.float32) * 0.01) + label = Tensor(np.ones([bs]).astype(np.int32)) + + net_with_loss = WithLossCell(network, net_loss) + train_network = TrainOneStepCell(net_with_loss, net_opt) + + train_network.set_train() + + begin_loss = train_network(input_data, label) + for i in range(10): + cur_loss = train_network(input_data, label) + print(f" begin loss: {begin_loss}, end loss: {cur_loss}") + + # check output correctness + assert cur_loss < begin_loss, "Loss does NOT decrease" + + +def test_param_lr_0001_filter_bias_and_bn_optimizer(): + network = SimpleCNN(in_channels=1, num_classes=10) + conv_params = list(filter(lambda x: "conv" in x.name, network.trainable_params())) + no_conv_params = list(filter(lambda x: "conv" not in x.name, network.trainable_params())) + group_params = [ + {"params": conv_params, "weight_decay": 0.01, "grad_centralization": True}, + {"params": 
no_conv_params, "lr": 0.001}, + {"order_params": network.trainable_params()}, + ] + net_loss = nn.SoftmaxCrossEntropyWithLogits(sparse=True, reduction="mean") + net_opt = create_optimizer( + group_params, "adamW", lr=0.01, weight_decay=1e-5, momentum=0.9, nesterov=False, filter_bias_and_bn=False + ) + + bs = 8 + input_data = Tensor(np.ones([bs, 1, 32, 32]).astype(np.float32) * 0.01) + label = Tensor(np.ones([bs]).astype(np.int32)) + + net_with_loss = WithLossCell(network, net_loss) + train_network = TrainOneStepCell(net_with_loss, net_opt) + + train_network.set_train() + + begin_loss = train_network(input_data, label) + for i in range(10): + cur_loss = train_network(input_data, label) + print(f" begin loss: {begin_loss}, end loss: {cur_loss}") + + # check output correctness + assert cur_loss < begin_loss, "Loss does NOT decrease" + + +@pytest.mark.parametrize("momentum", [-0.1, -1.0, -2]) +def test_wrong_momentum_optimizer(momentum): + with pytest.raises((RuntimeError, TypeError, ValueError)): + network = SimpleCNN(in_channels=1, num_classes=10) + net_loss = nn.SoftmaxCrossEntropyWithLogits(sparse=True, reduction="mean") + net_opt = create_optimizer( + network.trainable_params(), + "momentum", + lr=0.01, + weight_decay=0.0001, + momentum=momentum, + loss_scale=1.0, + nesterov=False, + filter_bias_and_bn=True, + ) + + bs = 8 + input_data = Tensor(np.ones([bs, 1, 32, 32]).astype(np.float32) * 0.01) + label = Tensor(np.ones([bs]).astype(np.int32)) + + net_with_loss = WithLossCell(network, net_loss) + train_network = TrainOneStepCell(net_with_loss, net_opt) + + train_network.set_train() + + begin_loss = train_network(input_data, label) + for i in range(10): + cur_loss = train_network(input_data, label) + print(f"{momentum}, begin loss: {begin_loss}, end loss: {cur_loss}") + + # check output correctness + assert cur_loss < begin_loss, "Loss does NOT decrease" + + +@pytest.mark.parametrize("loss_scale", [-0.1, -1.0]) +def test_wrong_loss_scale_optimizer(loss_scale): + with pytest.raises((RuntimeError, TypeError, ValueError)): + network = SimpleCNN(in_channels=1, num_classes=10) + net_loss = nn.SoftmaxCrossEntropyWithLogits(sparse=True, reduction="mean") + net_opt = create_optimizer( + network.trainable_params(), + "momentum", + lr=0.01, + weight_decay=0.0001, + momentum=0.9, + loss_scale=loss_scale, + nesterov=False, + filter_bias_and_bn=True, + ) + + bs = 8 + input_data = Tensor(np.ones([bs, 1, 32, 32]).astype(np.float32) * 0.01) + label = Tensor(np.ones([bs]).astype(np.int32)) + + net_with_loss = WithLossCell(network, net_loss) + train_network = TrainOneStepCell(net_with_loss, net_opt) + + train_network.set_train() + + begin_loss = train_network(input_data, label) + for i in range(10): + cur_loss = train_network(input_data, label) + print(f"{loss_scale}, begin loss: {begin_loss}, end loss: {cur_loss}") + + # check output correctness + if cur_loss < begin_loss: + raise ValueError + + +@pytest.mark.parametrize("weight_decay", [-0.1, -1.0]) +def test_wrong_weight_decay_optimizer(weight_decay): + with pytest.raises((RuntimeError, TypeError, ValueError)): + network = SimpleCNN(in_channels=1, num_classes=10) + net_loss = nn.SoftmaxCrossEntropyWithLogits(sparse=True, reduction="mean") + net_opt = create_optimizer( + network.trainable_params(), + "adamW", + lr=0.01, + weight_decay=weight_decay, + momentum=0.9, + loss_scale=1.0, + nesterov=False, + filter_bias_and_bn=True, + ) + + bs = 8 + input_data = Tensor(np.ones([bs, 1, 32, 32]).astype(np.float32) * 0.01) + label = 
Tensor(np.ones([bs]).astype(np.int32)) + + net_with_loss = WithLossCell(network, net_loss) + train_network = TrainOneStepCell(net_with_loss, net_opt) + + train_network.set_train() + + begin_loss = train_network(input_data, label) + for i in range(10): + cur_loss = train_network(input_data, label) + print(f"{weight_decay}, begin loss: {begin_loss}, end loss: {cur_loss}") + + # check output correctness + assert cur_loss < begin_loss, "Loss does NOT decrease" + + +@pytest.mark.parametrize("lr", [-1.0, -0.1]) +def test_wrong_lr_optimizer(lr): + with pytest.raises((RuntimeError, TypeError, ValueError)): + network = SimpleCNN(in_channels=1, num_classes=10) + net_loss = nn.SoftmaxCrossEntropyWithLogits(sparse=True, reduction="mean") + net_opt = create_optimizer( + network.trainable_params(), + "adamW", + lr=lr, + weight_decay=1e-5, + momentum=0.9, + loss_scale=1.0, + nesterov=False, + filter_bias_and_bn=True, + ) + + bs = 8 + input_data = Tensor(np.ones([bs, 1, 32, 32]).astype(np.float32) * 0.01) + label = Tensor(np.ones([bs]).astype(np.int32)) + + net_with_loss = WithLossCell(network, net_loss) + train_network = TrainOneStepCell(net_with_loss, net_opt) + + train_network.set_train() + + begin_loss = train_network(input_data, label) + for i in range(10): + cur_loss = train_network(input_data, label) + print(f"{lr}, begin loss: {begin_loss}, end loss: {cur_loss}") + + # check output correctness + assert cur_loss < begin_loss, "Loss does NOT decrease" + + +def test_param_lr_01_filter_bias_and_bn_optimizer(): + network = SimpleCNN(in_channels=1, num_classes=10) + conv_params = list(filter(lambda x: "conv" in x.name, network.trainable_params())) + no_conv_params = list(filter(lambda x: "conv" not in x.name, network.trainable_params())) + group_params = [ + {"params": conv_params, "weight_decay": 0.01, "grad_centralization": True}, + {"params": no_conv_params, "lr": 0.1}, + {"order_params": network.trainable_params()}, + ] + net_loss = nn.SoftmaxCrossEntropyWithLogits(sparse=True, reduction="mean") + net_opt = create_optimizer( + group_params, "momentum", lr=0.01, weight_decay=1e-5, momentum=0.9, nesterov=False, filter_bias_and_bn=False + ) + + bs = 8 + input_data = Tensor(np.ones([bs, 1, 32, 32]).astype(np.float32) * 0.01) + label = Tensor(np.ones([bs]).astype(np.int32)) + + net_with_loss = WithLossCell(network, net_loss) + train_network = TrainOneStepCell(net_with_loss, net_opt) + + train_network.set_train() + + begin_loss = train_network(input_data, label) + for i in range(10): + cur_loss = train_network(input_data, label) + print(f" begin loss: {begin_loss}, end loss: {cur_loss}") + + # check output correctness + assert cur_loss < begin_loss, "Loss does NOT decrease" + + +@pytest.mark.parametrize("opt", ["test", "bdam", "mindspore"]) +def test_wrong_opt_optimizer(opt): + with pytest.raises((RuntimeError, TypeError, ValueError)): + network = SimpleCNN(in_channels=1, num_classes=10) + net_loss = nn.SoftmaxCrossEntropyWithLogits(sparse=True, reduction="mean") + net_opt = create_optimizer( + network.trainable_params(), + opt, + lr=0.01, + weight_decay=1e-5, + momentum=0.9, + loss_scale=1.0, + nesterov=False, + filter_bias_and_bn=True, + ) + + bs = 8 + input_data = Tensor(np.ones([bs, 1, 32, 32]).astype(np.float32) * 0.01) + label = Tensor(np.ones([bs]).astype(np.int32)) + + net_with_loss = WithLossCell(network, net_loss) + train_network = TrainOneStepCell(net_with_loss, net_opt) + + train_network.set_train() + + begin_loss = train_network(input_data, label) + for i in range(10): + cur_loss = 
train_network(input_data, label) + print(f"{opt}, begin loss: {begin_loss}, end loss: {cur_loss}") + + # check output correctness + assert cur_loss < begin_loss, "Loss does NOT decrease" + + +def test_wrong_params_more_optimizer(): + with pytest.raises((RuntimeError, TypeError, ValueError)): + network = SimpleCNN(in_channels=1, num_classes=10) + conv_params = list(filter(lambda x: "conv" in x.name, network.trainable_params())) + conv_params.append("test") + no_conv_params = list(filter(lambda x: "conv" not in x.name, network.trainable_params())) + group_params = [ + {"params": conv_params, "weight_decay": 0.01, "grad_centralization": True}, + {"params": no_conv_params, "lr": 0.0}, + {"order_params": network.trainable_params()}, + ] + net_loss = nn.SoftmaxCrossEntropyWithLogits(sparse=True, reduction="mean") + net_opt = create_optimizer( + group_params, + "momentum", + lr=0.01, + weight_decay=1e-5, + momentum=0.9, + loss_scale=1.0, + nesterov=False, + filter_bias_and_bn=False, + ) + + bs = 8 + input_data = Tensor(np.ones([bs, 1, 32, 32]).astype(np.float32) * 0.01) + label = Tensor(np.ones([bs]).astype(np.int32)) + + net_with_loss = WithLossCell(network, net_loss) + train_network = TrainOneStepCell(net_with_loss, net_opt) + + train_network.set_train() + + begin_loss = train_network(input_data, label) + for i in range(10): + cur_loss = train_network(input_data, label) + print(f" begin loss: {begin_loss}, end loss: {cur_loss}") + + # check output correctness + assert cur_loss < begin_loss, "Loss does NOT decrease" + + +def test_wrong_params_input_optimizer(): + with pytest.raises((RuntimeError, TypeError, ValueError)): + network = SimpleCNN(in_channels=1, num_classes=10) + conv_params = [1, 2, 3, 4] + no_conv_params = list(filter(lambda x: "conv" not in x.name, network.trainable_params())) + group_params = [ + {"params": conv_params, "weight_decay": 0.01, "grad_centralization": True}, + {"params": no_conv_params, "lr": 0.0}, + {"order_params": network.trainable_params()}, + ] + net_loss = nn.SoftmaxCrossEntropyWithLogits(sparse=True, reduction="mean") + net_opt = create_optimizer( + group_params, + "momentum", + lr=0.01, + weight_decay=1e-5, + momentum=0.9, + loss_scale=1.0, + nesterov=False, + filter_bias_and_bn=False, + ) + + bs = 8 + input_data = Tensor(np.ones([bs, 1, 32, 32]).astype(np.float32) * 0.01) + label = Tensor(np.ones([bs]).astype(np.int32)) + + net_with_loss = WithLossCell(network, net_loss) + train_network = TrainOneStepCell(net_with_loss, net_opt) + + train_network.set_train() + + begin_loss = train_network(input_data, label) + for i in range(10): + cur_loss = train_network(input_data, label) + print(f" begin loss: {begin_loss}, end loss: {cur_loss}") + + # check output correctness + assert cur_loss < begin_loss, "Loss does NOT decrease" + + +@pytest.mark.parametrize( + "mode", + [ + ms.GRAPH_MODE, + ms.PYNATIVE_MODE, + ], +) +def test_mode_mult_single_optimizer(mode): + ms.set_context(mode=mode) + network = SimpleCNN(in_channels=1, num_classes=10) + conv_params = list(filter(lambda x: "conv" in x.name, network.trainable_params())) + no_conv_params = list(filter(lambda x: "conv" not in x.name, network.trainable_params())) + group_params = [ + {"params": conv_params, "weight_decay": 0.01, "grad_centralization": True}, + {"params": no_conv_params, "lr": 0.1}, + {"order_params": network.trainable_params()}, + ] + net_loss = nn.SoftmaxCrossEntropyWithLogits(sparse=True, reduction="mean") + net_opt = create_optimizer( + group_params, "momentum", lr=0.01, weight_decay=1e-5, 
momentum=0.9, nesterov=False, filter_bias_and_bn=False + ) + + bs = 8 + input_data = Tensor(np.ones([bs, 1, 32, 32]).astype(np.float32) * 0.01) + label = Tensor(np.ones([bs]).astype(np.int32)) + + net_with_loss = WithLossCell(network, net_loss) + train_network = TrainOneStepCell(net_with_loss, net_opt) + + train_network.set_train() + + begin_loss = train_network(input_data, label) + for i in range(10): + cur_loss = train_network(input_data, label) + print(f" begin loss: {begin_loss}, end loss: {cur_loss}") + + # check output correctness + assert cur_loss < begin_loss, "Loss does NOT decrease" diff --git a/tests/modules/test_scheduler.py b/tests/modules/test_scheduler.py new file mode 100644 index 0000000..1830ae1 --- /dev/null +++ b/tests/modules/test_scheduler.py @@ -0,0 +1,92 @@ +import sys + +sys.path.append(".") + +import numpy as np + +from mindcv.scheduler import dynamic_lr + + +# fmt: off +def test_scheduler_dynamic(): + # constant_lr + lrs_manually = [0.025, 0.025, 0.025, 0.025, 0.025, 0.025, 0.025, 0.025, + 0.05, 0.05, 0.05, 0.05, 0.05, 0.05, 0.05, 0.05, 0.05, 0.05, 0.05, 0.05] + lrs_ms = dynamic_lr.constant_lr(0.5, 4, lr=0.05, steps_per_epoch=2, epochs=10) + assert np.allclose(lrs_ms, lrs_manually) + + # linear_lr + lrs_manually = [0.025, 0.025, 0.03125, 0.03125, 0.0375, 0.0375, 0.04375, 0.04375, + 0.05, 0.05, 0.05, 0.05, 0.05, 0.05, 0.05, 0.05, 0.05, 0.05, 0.05, 0.05] + lrs_ms = dynamic_lr.linear_lr(0.5, 1.0, 4, lr=0.05, steps_per_epoch=2, epochs=10) + assert np.allclose(lrs_ms, lrs_manually) + + # linear_refined_lr + lrs_manually = [0.025, 0.028125, 0.03125, 0.034375, 0.0375, 0.040625, 0.04375, 0.046875, + 0.05, 0.05, 0.05, 0.05, 0.05, 0.05, 0.05, 0.05, 0.05, 0.05, 0.05, 0.05] + lrs_ms = dynamic_lr.linear_refined_lr(0.5, 1.0, 4, lr=0.05, steps_per_epoch=2, epochs=10) + assert np.allclose(lrs_ms, lrs_manually) + + # polynomial_lr + lrs_manually = [0.05, 0.05, 0.0375, 0.0375, 0.025, 0.025, 0.0125, 0.0125, + 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0] + lrs_ms = dynamic_lr.polynomial_lr(4, 1.0, lr=0.05, steps_per_epoch=2, epochs=10) + assert np.allclose(lrs_ms, lrs_manually) + + # polynomial_refined_lr + lrs_manually = [0.05, 0.04375, 0.0375, 0.03125, 0.025, 0.01875, 0.0125, 0.00625, + 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0] + lrs_ms = dynamic_lr.polynomial_refined_lr(4, 1.0, lr=0.05, steps_per_epoch=2, epochs=10) + assert np.allclose(lrs_ms, lrs_manually) + + # exponential_lr + lrs_manually = [0.05, 0.05, 0.045, 0.045, 0.0405, 0.0405, + 0.03645, 0.03645, 0.032805, 0.032805, 0.0295245, 0.0295245, + 0.02657205, 0.02657205, 0.023914845, 0.023914845, + 0.0215233605, 0.0215233605, 0.01937102445, 0.01937102445] + lrs_ms = dynamic_lr.exponential_lr(0.9, lr=0.05, steps_per_epoch=2, epochs=10) + assert np.allclose(lrs_ms, lrs_manually) + + # exponential_refined_lr + lrs_manually = [0.05, 0.047434164902525694, 0.045, 0.042690748412273126, 0.0405, 0.03842167357104581, + 0.03645, 0.03457950621394123, 0.032805, 0.031121555592547107, 0.0295245, 0.0280094000332924, + 0.02657205, 0.02520846002996316, 0.023914845, 0.022687614026966844, + 0.0215233605, 0.02041885262427016, 0.01937102445, 0.018376967361843143] + lrs_ms = dynamic_lr.exponential_refined_lr(0.9, lr=0.05, steps_per_epoch=2, epochs=10) + assert np.allclose(lrs_ms, lrs_manually) + + # step_lr + lrs_manually = [0.05, 0.05, 0.05, 0.05, 0.05, 0.05, + 0.025, 0.025, 0.025, 0.025, 0.025, 0.025, + 0.0125, 0.0125, 0.0125, 0.0125, 0.0125, 0.0125, 0.00625, 0.00625] + lrs_ms = dynamic_lr.step_lr(3, 0.5, lr=0.05, 
steps_per_epoch=2, epochs=10) + assert np.allclose(lrs_ms, lrs_manually) + + # multi_step_lr + lrs_manually = [0.05, 0.05, 0.05, 0.05, 0.05, 0.05, + 0.025, 0.025, 0.025, 0.025, 0.025, 0.025, + 0.0125, 0.0125, 0.0125, 0.0125, 0.0125, 0.0125, 0.0125, 0.0125] + lrs_ms = dynamic_lr.multi_step_lr([3, 6], 0.5, lr=0.05, steps_per_epoch=2, epochs=10) + assert np.allclose(lrs_ms, lrs_manually) + + # cosine_annealing_lr + lrs_manually = [1.0, 1.0, 0.9045084971874737, 0.9045084971874737, 0.6545084971874737, 0.6545084971874737, + 0.34549150281252633, 0.34549150281252633, 0.09549150281252633, 0.09549150281252633, + 0.0, 0.0, 0.09549150281252622, 0.09549150281252622, 0.3454915028125262, 0.3454915028125262, + 0.6545084971874736, 0.6545084971874736, 0.9045084971874737, 0.9045084971874737, + 1.0, 1.0, 0.9045084971874741, 0.9045084971874741, 0.6545084971874738, 0.6545084971874738, + 0.34549150281252644, 0.34549150281252644, 0.09549150281252639, 0.09549150281252639] + lrs_ms = dynamic_lr.cosine_annealing_lr(5, 0.0, eta_max=1.0, steps_per_epoch=2, epochs=15) + assert np.allclose(lrs_ms, lrs_manually) + + # cosine_annealing_warm_restarts_lr + lrs_manually = [1.0, 0.9755282581475768, 0.9045084971874737, 0.7938926261462366, 0.6545084971874737, 0.5, + 0.34549150281252633, 0.2061073738537635, 0.09549150281252633, 0.024471741852423234, + 1.0, 0.9938441702975689, 0.9755282581475768, 0.9455032620941839, 0.9045084971874737, + 0.8535533905932737, 0.7938926261462366, 0.7269952498697734, 0.6545084971874737, 0.5782172325201155, + 0.5, 0.4217827674798846, 0.34549150281252633, 0.2730047501302266, 0.2061073738537635, + 0.14644660940672627, 0.09549150281252633, 0.054496737905816106, 0.024471741852423234, + 0.00615582970243117] + lrs_ms = dynamic_lr.cosine_annealing_warm_restarts_lr(5, 2, 0.0, eta_max=1.0, steps_per_epoch=2, epochs=15) + assert np.allclose(lrs_ms, lrs_manually) +# fmt: on diff --git a/tests/modules/test_transforms.py b/tests/modules/test_transforms.py new file mode 100644 index 0000000..8016ae4 --- /dev/null +++ b/tests/modules/test_transforms.py @@ -0,0 +1,220 @@ +import collections +import os +import sys + +sys.path.append(".") + +import pytest + +import mindspore as ms + +from mindcv.data import create_dataset, create_loader, create_transforms, get_dataset_download_root +from mindcv.utils.download import DownLoad + + +# test imagenet +@pytest.mark.parametrize("mode", [0, 1]) +@pytest.mark.parametrize("name", ["ImageNet"]) +@pytest.mark.parametrize("image_resize", [224, 256]) +@pytest.mark.parametrize("is_training", [True, False]) +def test_transforms_standalone_imagenet(mode, name, image_resize, is_training): + """ + test transform_list API(distribute) + command: pytest -s test_transforms.py::test_transforms_standalone_imagenet + + API Args: + dataset_name='', + image_resize=224, + is_training=False, + **kwargs + """ + ms.set_context(mode=mode) + + dataset_url = ( + "https://mindspore-website.obs.cn-north-4.myhuaweicloud.com/notebook/datasets/intermediate/Canidae_data.zip" + ) + root_dir = os.path.join(get_dataset_download_root(), "Canidae") + data_dir = os.path.join(root_dir, "data", "Canidae") # Canidae has prefix path "data/Canidae" in unzipped file. 
+ if not os.path.exists(data_dir): + DownLoad().download_and_extract_archive(dataset_url, root_dir) + + dataset = create_dataset( + name=name, + root=data_dir, + split="train", + shuffle=True, + num_samples=None, + num_parallel_workers=2, + download=False, + ) + + # create transforms + transform_list = create_transforms( + dataset_name=name, + image_resize=image_resize, + is_training=is_training, + ) + + # load dataset + loader = create_loader( + dataset=dataset, + batch_size=32, + drop_remainder=True, + is_training=is_training, + transform=transform_list, + num_parallel_workers=2, + ) + + assert loader.output_shapes()[0][2] == image_resize, "image_resize error !" + + +# test mnist cifar10 +@pytest.mark.parametrize("mode", [0, 1]) +@pytest.mark.parametrize("name", ["MNIST", "CIFAR10"]) +@pytest.mark.parametrize("image_resize", [224, 256]) +@pytest.mark.parametrize("is_training", [True, False]) +@pytest.mark.parametrize("download", [True]) +def test_transforms_standalone_dataset_mc(mode, name, image_resize, is_training, download): + """ + test transform_list API(distribute) + command: pytest -s test_transforms.py::test_transforms_standalone_imagenet_mc + + API Args: + dataset_name='', + image_resize=224, + is_training=False, + **kwargs + """ + ms.set_context(mode=mode) + + dataset = create_dataset( + name=name, + split="train", + shuffle=True, + num_samples=None, + num_parallel_workers=2, + download=download, + ) + + # create transforms + transform_list = create_transforms( + dataset_name=name, + image_resize=image_resize, + is_training=is_training, + ) + + # load dataset + loader = create_loader( + dataset=dataset, + batch_size=32, + drop_remainder=True, + is_training=is_training, + transform=transform_list, + num_parallel_workers=2, + ) + + assert loader.output_shapes()[0][2] == image_resize, "image_resize error !" + + +# test is_training +@pytest.mark.parametrize("mode", [0, 1]) +@pytest.mark.parametrize("name", ["ImageNet"]) +@pytest.mark.parametrize("image_resize", [224, 256]) +def test_transforms_standalone_imagenet_is_training(mode, name, image_resize): + """ + test transform_list API(distribute) + command: pytest -s test_transforms.py::test_transforms_standalone_imagenet_is_training + + API Args: + dataset_name='', + image_resize=224, + is_training=False, + **kwargs + """ + ms.set_context(mode=mode) + + # create transforms + transform_list_train = create_transforms( + dataset_name=name, + image_resize=image_resize, + is_training=True, + ) + transform_list_val = create_transforms( + dataset_name=name, + image_resize=image_resize, + is_training=False, + ) + + assert type(transform_list_train) == list + assert type(transform_list_val) == list + assert transform_list_train != transform_list_val + + +def test_repeated_aug(): + distribute = False + # ms.set_context(mode=ms.PYNATIVE_MODE) + if distribute: + from mindspore.communication import get_group_size, get_rank, init + + ms.set_context(mode=ms.GRAPH_MODE) + init() + device_num = get_group_size() + rank_id = get_rank() + ms.set_auto_parallel_context( + device_num=device_num, + parallel_mode="data_parallel", + gradients_mean=True, + ) + else: + device_num = 1 + rank_id = 0 + + name = "imagenet" + dataset_url = ( + "https://mindspore-website.obs.cn-north-4.myhuaweicloud.com/notebook/datasets/intermediate/Canidae_data.zip" + ) + root_dir = os.path.join(get_dataset_download_root(), "Canidae") + data_dir = os.path.join(root_dir, "data", "Canidae") # Canidae has prefix path "data/Canidae" in unzipped file. 
+ if not os.path.exists(data_dir): + DownLoad().download_and_extract_archive(dataset_url, root_dir) + num_classes = 2 + num_aug_repeats = 3 + + dataset = create_dataset( + name=name, + root=data_dir, + split="val", + shuffle=True, + num_samples=None, + num_parallel_workers=8, + num_shards=device_num, + shard_id=rank_id, + download=False, + num_aug_repeats=num_aug_repeats, + ) + + # load dataset + loader = create_loader( + dataset=dataset, + batch_size=32, + drop_remainder=True, + is_training=False, + transform=None, + num_classes=num_classes, + num_parallel_workers=2, + ) + for epoch in range(1): + # cnt = 1 + for batch, (data, label) in enumerate(loader.create_tuple_iterator()): + mean_vals = data.mean(axis=[1, 2, 3]) + # print(mean_vals, mean_vals.shape) + rounded = [int(val * 10e8) for val in mean_vals] + rep_ele = [item for item, count in collections.Counter(rounded).items() if count > 1] + # print('repeated instance indices: ', len(rep_ele)) #, rep_ele) + assert len(rep_ele) > 0, "Not replicated instances found in the batch" + if batch == 0: + print("Epoch: ", epoch, "Batch: ", batch, "Rank: ", rank_id, "Label: ", label[:4]) + + +if __name__ == "__main__": + test_repeated_aug() diff --git a/tests/modules/test_utils.py b/tests/modules/test_utils.py new file mode 100644 index 0000000..9aecd7a --- /dev/null +++ b/tests/modules/test_utils.py @@ -0,0 +1,89 @@ +"""Test utils""" +import os +import sys + +sys.path.append(".") + +import numpy as np +import pytest + +import mindspore as ms +from mindspore import nn +from mindspore.common.initializer import Normal +from mindspore.nn import TrainOneStepCell, WithLossCell + +from mindcv.loss import create_loss +from mindcv.optim import create_optimizer +from mindcv.utils import CheckpointManager + +ms.set_seed(1) +np.random.seed(1) + + +class SimpleCNN(nn.Cell): + def __init__(self, num_classes=10, in_channels=1, include_top=True): + super(SimpleCNN, self).__init__() + self.include_top = include_top + self.conv1 = nn.Conv2d(in_channels, 6, 5, pad_mode="valid") + self.conv2 = nn.Conv2d(6, 16, 5, pad_mode="valid") + self.relu = nn.ReLU() + self.max_pool2d = nn.MaxPool2d(kernel_size=2, stride=2) + + if self.include_top: + self.flatten = nn.Flatten() + self.fc = nn.Dense(16 * 5 * 5, num_classes, weight_init=Normal(0.02)) + + def construct(self, x): + x = self.conv1(x) + x = self.relu(x) + x = self.max_pool2d(x) + x = self.conv2(x) + x = self.relu(x) + x = self.max_pool2d(x) + ret = x + if self.include_top: + x_flatten = self.flatten(x) + x = self.fc(x_flatten) + ret = x + return ret + + +def validate(model, data, label): + model.set_train(False) + pred = model(data) + total = len(data) + acc = (pred.argmax(1) == label).sum() + acc /= total + return acc + + +@pytest.mark.parametrize("mode", [0, 1]) +@pytest.mark.parametrize("ckpt_save_policy", ["top_k", "latest_k"]) +def test_checkpoint_manager(mode, ckpt_save_policy): + ms.set_context(mode=mode) + + bs = 8 + num_classes = c = 10 + # create data + x = ms.Tensor(np.random.randn(bs, 1, 32, 32), ms.float32) + test_data = ms.Tensor(np.random.randn(bs, 1, 32, 32), ms.float32) + test_label = ms.Tensor(np.random.randint(0, c, size=(bs)), ms.int32) + y = np.random.randint(0, c, size=(bs)) + y = ms.Tensor(y, ms.int32) + label = y + + network = SimpleCNN(in_channels=1, num_classes=num_classes) + net_loss = create_loss() + + net_with_loss = WithLossCell(network, net_loss) + net_opt = create_optimizer(network.trainable_params(), "adam", lr=0.001, weight_decay=1e-7) + train_network = TrainOneStepCell(net_with_loss, 
net_opt) + train_network.set_train() + manager = CheckpointManager(ckpt_save_policy=ckpt_save_policy) + for t in range(3): + train_network(x, label) + acc = validate(network, test_data, test_label) + save_path = os.path.join("./" + f"network_{t + 1}.ckpt") + ckpoint_filelist = manager.save_ckpoint(network, num_ckpt=2, metric=acc, save_path=save_path) + + assert len(ckpoint_filelist) == 2, "num of checkpoints is NOT correct" diff --git a/tests/tasks/non_cpu/test_train_val_imagenet_subset.py b/tests/tasks/non_cpu/test_train_val_imagenet_subset.py new file mode 100644 index 0000000..f6517e7 --- /dev/null +++ b/tests/tasks/non_cpu/test_train_val_imagenet_subset.py @@ -0,0 +1,78 @@ +""" +Test train and validate pipelines. +For training, both graph mode and pynative mode with ms_function will be tested. +""" +import os +import subprocess +import sys + +sys.path.append(".") + +import pytest + +from mindcv.data import get_dataset_download_root +from mindcv.utils.download import DownLoad + +check_acc = True + + +@pytest.mark.parametrize("ema", [True, False]) +@pytest.mark.parametrize("val_while_train", [True, False]) +def test_train_ema(ema, val_while_train, model="resnet18"): + """train on a imagenet subset dataset""" + # prepare data + dataset_url = ( + "https://mindspore-website.obs.cn-north-4.myhuaweicloud.com/notebook/datasets/intermediate/Canidae_data.zip" + ) + root_dir = os.path.join(get_dataset_download_root(), "Canidae") + data_dir = os.path.join(root_dir, "data", "Canidae") # Canidae has prefix path "data/Canidae" in unzipped file. + if not os.path.exists(data_dir): + DownLoad().download_and_extract_archive(dataset_url, root_dir) + + # ---------------- test running train.py using the toy data --------- + dataset = "imagenet" + num_classes = 2 + ckpt_dir = "./tests/ckpt_tmp" + num_samples = 160 + num_epochs = 5 + batch_size = 20 + if os.path.exists(ckpt_dir): + os.system(f"rm {ckpt_dir} -rf") + if os.path.exists(data_dir): + download_str = f"--data_dir {data_dir}" + else: + download_str = "--download" + train_file = "train.py" + + cmd = ( + f"python {train_file} --dataset={dataset} --num_classes={num_classes} --model={model} " + f"--epoch_size={num_epochs} --ckpt_save_interval=2 --lr=0.0001 --num_samples={num_samples} " + f"--loss=CE --weight_decay=1e-6 --ckpt_save_dir={ckpt_dir} {download_str} --train_split=train " + f"--batch_size={batch_size} --pretrained --num_parallel_workers=2 --val_while_train={val_while_train} " + f"--val_split=val --val_interval=1 --ema" + ) + + print(f"Running command: \n{cmd}") + ret = subprocess.call(cmd.split(), stdout=sys.stdout, stderr=sys.stderr) + assert ret == 0, "Training fails" + + # --------- Test running validate.py using the trained model ------------- # + # begin_ckpt = os.path.join(ckpt_dir, f'{model}-1_1.ckpt') + end_ckpt = os.path.join(ckpt_dir, f"{model}-{num_epochs}_{num_samples // batch_size}.ckpt") + cmd = ( + f"python validate.py --model={model} --dataset={dataset} --val_split=val --data_dir={data_dir} " + f"--num_classes={num_classes} --ckpt_path={end_ckpt} --batch_size=40 --num_parallel_workers=2" + ) + # ret = subprocess.call(cmd.split(), stdout=sys.stdout, stderr=sys.stderr) + print(f"Running command: \n{cmd}") + p = subprocess.Popen(cmd.split(), stdout=subprocess.PIPE) + out, err = p.communicate() + # assert ret==0, 'Validation fails' + print(out) + + if check_acc: + res = out.decode() + idx = res.find("Accuracy") + acc = res[idx:].split(",")[0].split(":")[1] + print("Val acc: ", acc) + assert float(acc) > 0.5, "Acc is too low" diff 
--git a/tests/tasks/test_train_mnist.py b/tests/tasks/test_train_mnist.py new file mode 100644 index 0000000..73fbcd8 --- /dev/null +++ b/tests/tasks/test_train_mnist.py @@ -0,0 +1,105 @@ +import sys + +sys.path.append(".") + +import pytest + +import mindspore as ms +from mindspore import nn + +from mindcv.data import create_dataset, create_loader, create_transforms +from mindcv.loss import create_loss +from mindcv.models import create_model +from mindcv.optim import create_optimizer +from mindcv.scheduler import create_scheduler + + +@pytest.mark.parametrize("mode", [ms.GRAPH_MODE, ms.PYNATIVE_MODE]) +def test_train_mnist(mode): + """ + test mobilenet_v1_train_gpu(single) + """ + num_workers = 2 + num_classes = 10 + batch_size = 16 + num_epochs = 1 # noqa: F841 + + set_sink_mode = True # noqa: F841 + + dataset_name = "mnist" + model_name = "resnet18" + scheduler_name = "constant" + lr = 1e-3 + loss_name = "CE" + opt_name = "adam" + + ms.set_seed(1) + ms.set_context(mode=mode) + + device_num = None + rank_id = None + + dataset_train = create_dataset( + name=dataset_name, + num_samples=100, + num_shards=device_num, + shard_id=rank_id, + download=True, + ) + + transform_train = create_transforms(dataset_name=dataset_name) + + loader_train = create_loader( + dataset=dataset_train, + batch_size=batch_size, + is_training=True, + num_classes=num_classes, + transform=transform_train, + num_parallel_workers=num_workers, + drop_remainder=True, + ) + + network = create_model( + model_name=model_name, + in_channels=1, + num_classes=num_classes, + ) + + loss = create_loss(name=loss_name) + + net_with_criterion = nn.WithLossCell(network, loss) + + steps_per_epoch = loader_train.get_dataset_size() + print("Steps per epoch: ", steps_per_epoch) + + lr_scheduler = create_scheduler( + steps_per_epoch=steps_per_epoch, + scheduler=scheduler_name, + lr=lr, + ) + + opt = create_optimizer( + network.trainable_params(), + opt=opt_name, + lr=lr_scheduler, + ) + + train_network = nn.TrainOneStepCell(network=net_with_criterion, optimizer=opt) + train_network.set_train() + losses = [] + + num_steps = 0 + max_steps = 10 + while num_steps < max_steps: + for batch, (data, label) in enumerate(loader_train.create_tuple_iterator()): + loss = train_network(data, label) + losses.append(loss) + print(loss) + + num_steps += 1 + + assert losses[num_steps - 1] < losses[0], "Loss does NOT decrease" + + +if __name__ == "__main__": + test_train_mnist(ms.GRAPH_MODE) diff --git a/tests/tasks/test_train_val_imagenet_subset.py b/tests/tasks/test_train_val_imagenet_subset.py new file mode 100644 index 0000000..60682d2 --- /dev/null +++ b/tests/tasks/test_train_val_imagenet_subset.py @@ -0,0 +1,79 @@ +""" +Test train and validate pipelines. +For training, both graph mode and pynative mode with ms_function will be tested. 
+""" +import os +import subprocess +import sys + +sys.path.append(".") + +import pytest + +from mindcv.data import get_dataset_download_root +from mindcv.utils.download import DownLoad + +check_acc = True + + +@pytest.mark.parametrize("mode", ["GRAPH", "PYNATIVE_FUNC"]) +@pytest.mark.parametrize("val_while_train", [True, False]) +def test_train(mode, val_while_train, model="resnet18"): + """train on a imagenet subset dataset""" + # prepare data + dataset_url = ( + "https://mindspore-website.obs.cn-north-4.myhuaweicloud.com/notebook/datasets/intermediate/Canidae_data.zip" + ) + root_dir = os.path.join(get_dataset_download_root(), "Canidae") + data_dir = os.path.join(root_dir, "data", "Canidae") # Canidae has prefix path "data/Canidae" in unzipped file. + if not os.path.exists(data_dir): + DownLoad().download_and_extract_archive(dataset_url, root_dir) + + # ---------------- test running train.py using the toy data --------- + dataset = "imagenet" + num_classes = 2 + ckpt_dir = "./tests/ckpt_tmp" + num_samples = 160 + num_epochs = 5 + batch_size = 20 + if os.path.exists(ckpt_dir): + os.system(f"rm {ckpt_dir} -rf") + if os.path.exists(data_dir): + download_str = f"--data_dir {data_dir}" + else: + download_str = "--download" + train_file = "train.py" if mode == "GRAPH" else "train_with_func.py" + + cmd = ( + f"python {train_file} --dataset={dataset} --num_classes={num_classes} --model={model} " + f"--epoch_size={num_epochs} --ckpt_save_interval=2 --lr=0.0001 --num_samples={num_samples} --loss=CE " + f"--weight_decay=1e-6 --ckpt_save_dir={ckpt_dir} {download_str} --train_split=train --batch_size={batch_size} " + f"--pretrained --num_parallel_workers=2 --val_while_train={val_while_train} --val_split=val --val_interval=1" + ) + + print(f"Running command: \n{cmd}") + ret = subprocess.call(cmd.split(), stdout=sys.stdout, stderr=sys.stderr) + assert ret == 0, "Training fails" + + # --------- Test running validate.py using the trained model ------------- # + # begin_ckpt = os.path.join(ckpt_dir, f'{model}-1_1.ckpt') + end_ckpt = os.path.join(ckpt_dir, f"{model}-{num_epochs}_{num_samples//batch_size}.ckpt") + cmd = ( + f"python validate.py --model={model} --dataset={dataset} --val_split=val --data_dir={data_dir} " + f"--num_classes={num_classes} --ckpt_path={end_ckpt} --batch_size=40 --num_parallel_workers=2" + ) + # ret = subprocess.call(cmd.split(), stdout=sys.stdout, stderr=sys.stderr) + print(f"Running command: \n{cmd}") + p = subprocess.Popen(cmd.split(), stdout=subprocess.PIPE) + out, err = p.communicate() + # assert ret==0, 'Validation fails' + print(out) + + if check_acc: + res = out.decode() + idx = res.find("Accuracy") + acc = res[idx:].split(",")[0].split(":")[1] + print("Val acc: ", acc) + assert float(acc) > 0.5, "Acc is too low" + + p.kill() diff --git a/tutorials/README.md b/tutorials/README.md new file mode 100644 index 0000000..cbaba76 --- /dev/null +++ b/tutorials/README.md @@ -0,0 +1 @@ +This folder contains jupyter notebooks used in the tutorial. 
diff --git a/tutorials/data/test/dog/dog.jpg b/tutorials/data/test/dog/dog.jpg new file mode 100644 index 0000000..81c91b1 Binary files /dev/null and b/tutorials/data/test/dog/dog.jpg differ diff --git a/tutorials/deployment.md b/tutorials/deployment.md new file mode 100644 index 0000000..02a4d5f --- /dev/null +++ b/tutorials/deployment.md @@ -0,0 +1,151 @@ +# Inference Service Deployment + +MindSpore Serving is a lightweight and high-performance service module that helps MindSpore developers efficiently deploy online inference services in the production environment. After completing model training on MindSpore, you can export the MindSpore model and use MindSpore Serving to create an inference service for the model. + +This tutorial uses mobilenet_v3_small_100 network as an example to describe how to deploy the Inference Service based on MindSpore Serving. + +## Environment Preparation +Before deploying, ensure that MindSpore Serving has been properly installed and the environment variables are configured. To install and configure MindSpore Serving on your PC, go to the [MindSpore Serving installation page](https://www.mindspore.cn/serving/docs/en/master/serving_install.html). + + +## Exporting the Model +To implement cross-platform or hardware inference (e.g., Ascend AI processor, MindSpore device side, GPU, etc.), the model file of MindIR format should be generated by network definition and CheckPoint. In MindSpore, the function of exporting the network model is `export`, and the main parameters are as follows: + +- `net`: MindSpore network structure. +- `inputs`: Network input, and the supported input type is `Tensor`. If multiple values are input, the values should be input at the same time, for example, `ms.export(network, ms.Tensor(input1), ms.Tensor(input2), file_name='network', file_format='MINDIR')`. +- `file_name`: Name of the exported model file. If `file_name` doesn't contain the corresponding suffix (for example, .mindir), the system will automatically add one after `file_format` is set. +- `file_format`: MindSpore currently supports ‘AIR’, ‘ONNX’ and ‘MINDIR’ format for exported model. + +The following code uses mobilenet_v3_small_100 as an example to export the pretrained network model of MindCV and obtain the model file in MindIR format. + + +```python +from mindcv.models import create_model +import numpy as np +import mindspore as ms + +model = create_model(model_name='mobilenet_v3_small_100', num_classes=1000, pretrained=True) + +input_np = np.random.uniform(0.0, 1.0, size=[1, 3, 224, 224]).astype(np.float32) + +# Export mobilenet_v3_small_100.mindir to current folder. +ms.export(model, ms.Tensor(input_np), file_name='mobilenet_v3_small_100', file_format='MINDIR') +``` + +## Deploying the Serving Inference Service + +### Configuring the Service +Start Serving with the following files: +```Text +demo +├── mobilenet_v3_small_100 +│ ├── 1 +│ │ └── mobilenet_v3_small_100.mindir +│ └── servable_config.py +│── serving_server.py +├── serving_client.py +└── test_image + ├─ dog + │ ├─ dog.jpg + │ └─ …… + └─ …… +``` + + +- `mobilenet_v3_small_100`: Model folder. The folder name is the model name. +- `mobilenet_v3_small_100.mindir`: Model file generated by the network in the previous step, which is stored in folder 1 (the number indicates the version number). Different versions are stored in different folders. The version number must be a string of digits. By default, the latest model file is started. +- `servable_config.py`: Model configuration script. 
Declares the model and specifies its input and output parameters.
+- `serving_server.py`: Script to start the Serving server.
+- `serving_client.py`: Script to start the Python client.
+- `test_image`: Test images.
+
+Content of the configuration file `servable_config.py`:
+```python
+from mindspore_serving.server import register
+
+# Declare the model. The parameter model_file indicates the name of the model file, and model_format indicates the model type.
+model = register.declare_model(model_file="mobilenet_v3_small_100.mindir", model_format="MindIR")
+
+# The input parameters of the Servable method are given by the input parameters of the Python method. The output parameters are given by the output_names of register_method.
+@register.register_method(output_names=["score"])
+def predict(image):
+    x = register.add_stage(model, image, outputs_count=1)
+    return x
+```
+
+### Starting the Service
+
+The MindSpore Serving `server` module supports two deployment modes, gRPC and RESTful; the following uses gRPC as an example. The startup script `serving_server.py` deploys the `mobilenet_v3_small_100` servable in the local directory to device 0 and starts a gRPC server listening on 127.0.0.1:5500. Content of the script:
+```python
+import os
+import sys
+from mindspore_serving import server
+
+def start():
+    servable_dir = os.path.dirname(os.path.realpath(sys.argv[0]))
+
+    servable_config = server.ServableStartConfig(servable_directory=servable_dir, servable_name="mobilenet_v3_small_100",
+                                                 device_ids=0)
+    server.start_servables(servable_configs=servable_config)
+    server.start_grpc_server(address="127.0.0.1:5500")
+
+if __name__ == "__main__":
+    start()
+```
+
+If the following log information is displayed on the server, the gRPC service has started successfully.
+
+```text
+Serving gRPC server start success, listening on 127.0.0.1:5500
+```
+
+### Inference Execution
+Start the Python client with `serving_client.py`. The client script uses the `create_transforms`, `create_dataset` and `create_loader` functions of `mindcv.data` to preprocess the image and sends it to the Serving server. It then post-processes the result returned by the server and prints the predicted label of the image.
+```python +import os +from mindspore_serving.client import Client +import numpy as np +from mindcv.data import create_transforms, create_dataset, create_loader + +num_workers = 1 + +# Dataset directory path +data_dir = "./test_image/" + +dataset = create_dataset(root=data_dir, split='', num_parallel_workers=num_workers) +transforms_list = create_transforms(dataset_name='ImageNet', is_training=False) +data_loader = create_loader( + dataset=dataset, + batch_size=1, + is_training=False, + num_classes=1000, + transform=transforms_list, + num_parallel_workers=num_workers + ) +with open("imagenet1000_clsidx_to_labels.txt") as f: + idx2label = eval(f.read()) + +def postprocess(score): + max_idx = np.argmax(score) + return idx2label[max_idx] + +def predict(): + client = Client("127.0.0.1:5500", "mobilenet_v3_small_100", "predict") + instances = [] + images, _ = next(data_loader.create_tuple_iterator()) + image_np = images.asnumpy().squeeze() + instances.append({"image": image_np}) + result = client.infer(instances) + + for instance in result: + label = postprocess(instance["score"]) + print(label) + +if __name__ == '__main__': + predict() +``` + +If the following information is displayed, Serving service has correctly executed the inference of the mobilenet_v3_small_100 model: +```text +Labrador retriever +``` diff --git a/tutorials/deployment_CN.md b/tutorials/deployment_CN.md new file mode 100644 index 0000000..335c653 --- /dev/null +++ b/tutorials/deployment_CN.md @@ -0,0 +1,152 @@ +# 部署推理服务 + +MindSpore Serving是一个轻量级、高性能的推理服务模块,旨在帮助MindSpore开发者在生产环境中高效部署在线推理服务。当用户使用MindSpore完成模型训练后,导出MindSpore模型,即可使用MindSpore Serving创建该模型的推理服务。 + +本文以mobilenet_v3_small_100网络为例,演示基于MindSpore Serving进行部署推理服务的方法。 + + +## 环境准备 +进行部署前,需确保已经正确安装了MindSpore Serving,并配置了环境变量。MindSpore Serving安装和配置可以参考[MindSpore 安装页面](https://www.mindspore.cn/serving/docs/zh-CN/master/serving_install.html) 。 + +## 模型导出 + +实现跨平台或硬件执行推理(如昇腾AI处理器、MindSpore端侧、GPU等),需要通过网络定义和CheckPoint生成MindIR格式模型文件。在MindSpore中,网络模型导出的函数为`export`,主要参数如下所示: + +- `net`:MindSpore网络结构。 +- `inputs`:网络的输入,支持输入类型为Tensor。当输入有多个时,需要一起传入,如`ms.export(network, ms.Tensor(input1), ms.Tensor(input2), file_name='network', file_format='MINDIR')`。 +- `file_name`:导出模型的文件名称,如果`file_name`没有包含对应的后缀名(如.mindir),设置`file_format`后系统会为文件名自动添加后缀。 +- `file_format`:MindSpore目前支持导出”AIR”,”ONNX”和”MINDIR”格式的模型。 + +下面代码以mobilenet_v3_small_100为例,导出MindCV的预训练网络模型,获得MindIR格式模型文件。 + + +```python +from mindcv.models import create_model +import numpy as np +import mindspore as ms + +model = create_model(model_name='mobilenet_v3_small_100', num_classes=1000, pretrained=True) + +input_np = np.random.uniform(0.0, 1.0, size=[1, 3, 224, 224]).astype(np.float32) + +# 导出文件mobilenet_v3_small_100.mindir到当前文件夹 +ms.export(model, ms.Tensor(input_np), file_name='mobilenet_v3_small_100', file_format='MINDIR') +``` + +## 部署Serving推理服务 + +### 配置服务 +启动Serving服务,执行本教程需要如下文件列表: +```Text +demo +├── mobilenet_v3_small_100 +│ ├── 1 +│ │ └── mobilenet_v3_small_100.mindir +│ └── servable_config.py +│── serving_server.py +├── serving_client.py +└── test_image + ├─ dog + │ ├─ dog.jpg + │ └─ …… + └─ …… +``` + + +- `mobilenet_v3_small_100`为模型文件夹,文件夹名即为模型名。 +- `mobilenet_v3_small_100.mindir`为上一步网络生成的模型文件,放置在文件夹1下,1为版本号,不同的版本放置在不同的文件夹下,版本号需以纯数字串命名,默认配置下启动最大数值的版本号的模型文件。 +- `servable_config.py`为模型配置脚本,对模型进行声明、入参和出参定义。 +- `serving_server.py`为启动服务脚本文件。 +- `serving_client.py`为启动客户端脚本文件。 +- `test_image`中为测试图片。 + +其中,模型配置文件`servable_config.py`内容如下: +```python +from mindspore_serving.server import register + +# 
进行模型声明,其中declare_model入参model_file指示模型的文件名称,model_format指示模型的模型类别 +model = register.declare_model(model_file="mobilenet_v3_small_100.mindir", model_format="MindIR") + +# Servable方法的入参由Python方法的入参指定,Servable方法的出参由register_method的output_names指定 +@register.register_method(output_names=["score"]) +def predict(image): + x = register.add_stage(model, image, outputs_count=1) + return x +``` + +### 启动服务 + +MindSpore的`server`函数提供两种服务部署,一种是gRPC方式,一种是通过RESTful方式,本教程以gRPC方式为例。服务启动脚本`serving_server.py`把本地目录下的`mobilenet_v3_small_100`部署到设备0,并启动地址为127.0.0.1:5500的gRPC服务器。脚本文件内容如下: +```python +import os +import sys +from mindspore_serving import server + +def start(): + servable_dir = os.path.dirname(os.path.realpath(sys.argv[0])) + + servable_config = server.ServableStartConfig(servable_directory=servable_dir, servable_name="mobilenet_v3_small_100", + device_ids=0) + server.start_servables(servable_configs=servable_config) + server.start_grpc_server(address="127.0.0.1:5500") + +if __name__ == "__main__": + start() +``` + +当服务端打印如下日志时,表示Serving gRPC服务启动成功。 + +```text +Serving gRPC server start success, listening on 127.0.0.1:5500 +``` + +### 执行推理 +使用`serving_client.py`,启动Python客户端。客户端脚本使用`mindcv.data`的`create_transforms`, `create_dataset`和`create_loader`函数,进行图片预处理,再传送给Serving服务器。对服务器返回的结果进行后处理,打印图片的预测标签。 +```python +import os +from mindspore_serving.client import Client +import numpy as np +from mindcv.data import create_transforms, create_dataset, create_loader + +num_workers = 1 + +# 数据集目录路径 +data_dir = "./test_image/" + +dataset = create_dataset(root=data_dir, split='', num_parallel_workers=num_workers) +transforms_list = create_transforms(dataset_name='ImageNet', is_training=False) +data_loader = create_loader( + dataset=dataset, + batch_size=1, + is_training=False, + num_classes=1000, + transform=transforms_list, + num_parallel_workers=num_workers + ) +with open("imagenet1000_clsidx_to_labels.txt") as f: + idx2label = eval(f.read()) + +def postprocess(score): + max_idx = np.argmax(score) + return idx2label[max_idx] + +def predict(): + client = Client("127.0.0.1:5500", "mobilenet_v3_small_100", "predict") + instances = [] + images, _ = next(data_loader.create_tuple_iterator()) + image_np = images.asnumpy().squeeze() + instances.append({"image": image_np}) + result = client.infer(instances) + + for instance in result: + label = postprocess(instance["score"]) + print(label) + +if __name__ == '__main__': + predict() +``` + +执行后显示如下返回值,说明Serving服务已正确执行mobilenet_v3_small_100网络模型的推理。 +```text +Labrador retriever +``` diff --git a/tutorials/finetune.md b/tutorials/finetune.md new file mode 100644 index 0000000..0406858 --- /dev/null +++ b/tutorials/finetune.md @@ -0,0 +1,451 @@ +# Model Fine-Tuning Training + +[![下载Notebook](https://mindspore-website.obs.cn-north-4.myhuaweicloud.com/website-images/r1.8/resource/_static/logo_notebook.png)](https://download.mindspore.cn/toolkits/mindcv/tutorials/finetune.ipynb)  + + +In this tutorial, you will learn how to use MindCV for transfer Learning to solve the problem of image classification on custom datasets. In the deep learning task, we often encounter the problem of insufficient training data. At this time, it is difficult to train the entire network directly to achieve the desired accuracy. A better approach is to use a pretrained model on a large dataset (close to the task data), and then use the model to initialize the network's weight parameters or apply it to specific tasks as a fixed feature extractor. 
+ +This tutorial will use the DenseNet model pretrained on ImageNet as an example to introduce two different fine-tuning strategies to solve the image classification problem of wolves and dogs in the case of small samples: + +1. Overall model fine-tuning. +2. Freeze backbone and only fine tune the classifier. + +> For details of transfer learning, see [Stanford University CS231n](https://cs231n.github.io/transfer-learning/#tf) + +## Data Preparation + +### Download Dataset + +Download the [dog and wolf classification dataset](https://mindspore-website.obs.cn-north-4.myhuaweicloud.com/notebook/datasets/intermediate/Canidae_data.zip) used in the case. Each category has 120 training images and 30 verification images. Use the `mindcv.utils.download` interface to download the dataset, and automatically unzip the downloaded dataset to the current directory. + + +```python +import sys +sys.path.append('../') + +from mindcv.utils.download import DownLoad +import os + +dataset_url = "https://mindspore-website.obs.cn-north-4.myhuaweicloud.com/notebook/datasets/intermediate/Canidae_data.zip" +root_dir = "./" + +if not os.path.exists(os.path.join(root_dir, 'data/Canidae')): + DownLoad().download_and_extract_archive(dataset_url, root_dir) +``` + +The directory structure of the dataset is as follows: + +```Text +data/ +└── Canidae + ├── train + │   ├── dogs + │   └── wolves + └── val + ├── dogs + └── wolves +``` + +## Dataset Loading and Processing + + +### Loading Custom Datasets + +By calling the `create_dataset` function in `mindcv.data`, we can easily load preset and customized datasets. + +- When the parameter `name` is set to null, it is specified as a user-defined dataset. (Default) +- When the parameter `name` is set to `MNIST`, `CIFAR10` and other standard data set names, it is specified as the preset data set. + +At the same time, we need to set the path `data_dir` of the dataset and the name `split` of the data segmentation (such as train, val) to load the corresponding training set or validation set. + + +```python +from mindcv.data import create_dataset, create_transforms, create_loader + +num_workers = 8 + +# path of dataset +data_dir = "./data/Canidae/" + +# load datset +dataset_train = create_dataset(root=data_dir, split='train', num_parallel_workers=num_workers) +dataset_val = create_dataset(root=data_dir, split='val', num_parallel_workers=num_workers) +``` + +> Note: The directory structure of the custom dataset should be the same as ImageNet, that is, the hierarchy of root ->split ->class ->image + +```Text +DATASET_NAME + ├── split1(e.g. train)/ + │ ├── class1/ + │ │ ├── 000001.jpg + │ │ ├── 000002.jpg + │ │ └── .... + │ └── class2/ + │ ├── 000001.jpg + │ ├── 000002.jpg + │ └── .... + └── split2/ + ├── class1/ + │ ├── 000001.jpg + │ ├── 000002.jpg + │ └── .... + └── class2/ + ├── 000001.jpg + ├── 000002.jpg + └── .... +``` + +### Data Processing and Augmentation + +First, we call the `create_transforms` function to obtain the preset data processing and augmentation strategy (transform list). In this task, because the wolf dog image and ImageNet data are consistent (that is, the domain is consistent), we specify the parameter `dataset_name` as ImageNet, and directly use the preset ImageNet data processing and image augmentation strategy. `create_transforms` also supports a variety of customized processing and enhancement operations, as well as automatic enhancement policies (AutoAug). See API description for details. 
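+
+For reference, the `transform` argument of `create_loader` takes a list of transform operations, so a custom list built directly from `mindspore.dataset.vision` ops can also be used instead of the preset one. The sketch below is only a minimal illustration of such a validation pipeline, assuming the same ImageNet normalization statistics used later in this tutorial; this tutorial itself keeps the preset list returned by `create_transforms`.
+
+```python
+import mindspore.dataset.vision as vision
+
+# ImageNet channel statistics, scaled to the 0-255 pixel range used by vision.Normalize
+imagenet_mean = [0.485 * 255, 0.456 * 255, 0.406 * 255]
+imagenet_std = [0.229 * 255, 0.224 * 255, 0.225 * 255]
+
+# A hand-written validation transform list (illustrative only)
+custom_trans_val = [
+    vision.Decode(),                # decode raw image bytes
+    vision.Resize(256),             # resize the shorter side to 256
+    vision.CenterCrop(224),         # crop the central 224 x 224 region
+    vision.Normalize(mean=imagenet_mean, std=imagenet_std),  # normalize with ImageNet statistics
+    vision.HWC2CHW(),               # convert to the channel-first layout used by the models
+]
+```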
+ +We will transfer the obtained transform list to the `create_loader()`, and specify `batch_size` and other parameters to complete the preparation of training and validation data, and return the `Dataset` Object as the input of the model. + + +```python +# Define and acquire data processing and augment operations +trans_train = create_transforms(dataset_name='ImageNet', is_training=True) +trans_val = create_transforms(dataset_name='ImageNet',is_training=False) + +# +loader_train = create_loader( + dataset=dataset_train, + batch_size=16, + is_training=True, + num_classes=2, + transform=trans_train, + num_parallel_workers=num_workers, + ) + + +loader_val = create_loader( + dataset=dataset_val, + batch_size=5, + is_training=True, + num_classes=2, + transform=trans_val, + num_parallel_workers=num_workers, + ) +``` + +### Dataset Visualization + +For the Dataset object returned by the `create_loader` interface to complete data loading, we can create a data iterator through the `create_tuple_iterator` interface, access the dataset using the `next` iteration, and read a batch of data. + + +```python +images, labels= next(loader_train.create_tuple_iterator()) +#images = data["image"] +#labels = data["label"] + +print("Tensor of image", images.shape) +print("Labels:", labels) +``` + + Tensor of image (16, 3, 224, 224) + Labels: [0 1 1 1 1 0 0 0 0 0 1 0 1 0 1 1] + + +Visualize the acquired image and label data, and the title is the label name corresponding to the image. + + +```python +import matplotlib.pyplot as plt +import numpy as np + +# class_name corresponds to label, and labels are marked in the order of folder string from small to large +class_name = {0: "dogs", 1: "wolves"} + +plt.figure(figsize=(15, 7)) +for i in range(len(labels)): + # Get the image and its corresponding label + data_image = images[i].asnumpy() + data_label = labels[i] + # Process images for display + data_image = np.transpose(data_image, (1, 2, 0)) + mean = np.array([0.485, 0.456, 0.406]) + std = np.array([0.229, 0.224, 0.225]) + data_image = std * data_image + mean + data_image = np.clip(data_image, 0, 1) + # Show Image + plt.subplot(3, 6, i + 1) + plt.imshow(data_image) + plt.title(class_name[int(labels[i].asnumpy())]) + plt.axis("off") + +plt.show() +``` + + +![png](output_11_0.png) + + +## Model Fine-Tuning + +### 1. Overall Model Fine-Tuning + +#### Pretraining Model Loading +We use `mindcv.models.densenet` to define the DenseNet121 network. When the `pretrained` parameter in the interface is set to True, the network weight can be automatically downloaded. Since the pretraining model is used to classify 1000 categories in the ImageNet dataset, we set `num_classes=2`, and the output of DenseNet's classifier (the last FC layer) is adjusted to two dimensions. At this time, only the pre training weights of backbone are loaded, while the classifier uses the initial value. + + +```python +from mindcv.models import create_model + +network = create_model(model_name='densenet121', num_classes=2, pretrained=True) +``` + + [WARNING] ME(116613:140051982694208,MainProcess):2022-09-20-03:44:58.786.568 [mindspore/train/serialization.py:709] For 'load_param_into_net', 2 parameters in the 'net' are not loaded, because they are not in the 'parameter_dict', please check whether the network structure is consistent when training and loading checkpoint. + [WARNING] ME(116613:140051982694208,MainProcess):2022-09-20-03:44:58.787.703 [mindspore/train/serialization.py:714] classifier.weight is not loaded. 
+ [WARNING] ME(116613:140051982694208,MainProcess):2022-09-20-03:44:58.788.408 [mindspore/train/serialization.py:714] classifier.bias is not loaded. + + + +>For the specific structure of DenseNet, see the [DenseNet paper](https://arxiv.org/pdf/1608.06993.pdf). + +#### Model Training +Use the loaded and processed wolf and dog images with tags to fine-tune the DenseNet network. Note that smaller learning rates should be used when fine-tuning the overall model. + + + +```python +from mindcv.loss import create_loss +from mindcv.optim import create_optimizer +from mindcv.scheduler import create_scheduler +from mindspore import Model, LossMonitor, TimeMonitor #, CheckpointConfig, ModelCheckpoint + + +# Define optimizer and loss function +opt = create_optimizer(network.trainable_params(), opt='adam', lr=1e-4) +loss = create_loss(name='CE') + +# Instantiated model +model = Model(network, loss_fn=loss, optimizer=opt, metrics={'accuracy'}) +``` + + +```python +model.train(10, loader_train, callbacks=[LossMonitor(5), TimeMonitor(5)], dataset_sink_mode=False) +``` + + epoch: 1 step: 5, loss is 0.5195528864860535 + epoch: 1 step: 10, loss is 0.2654373049736023 + epoch: 1 step: 15, loss is 0.28758567571640015 + Train epoch time: 17270.144 ms, per step time: 1151.343 ms + epoch: 2 step: 5, loss is 0.1807008981704712 + epoch: 2 step: 10, loss is 0.1700802594423294 + epoch: 2 step: 15, loss is 0.09752683341503143 + Train epoch time: 1372.549 ms, per step time: 91.503 ms + epoch: 3 step: 5, loss is 0.13594701886177063 + epoch: 3 step: 10, loss is 0.03628234937787056 + epoch: 3 step: 15, loss is 0.039737217128276825 + Train epoch time: 1453.237 ms, per step time: 96.882 ms + epoch: 4 step: 5, loss is 0.014213413000106812 + epoch: 4 step: 10, loss is 0.030747078359127045 + epoch: 4 step: 15, loss is 0.0798817127943039 + Train epoch time: 1331.237 ms, per step time: 88.749 ms + epoch: 5 step: 5, loss is 0.009510636329650879 + epoch: 5 step: 10, loss is 0.02603740245103836 + epoch: 5 step: 15, loss is 0.051846928894519806 + Train epoch time: 1312.737 ms, per step time: 87.516 ms + epoch: 6 step: 5, loss is 0.1163717582821846 + epoch: 6 step: 10, loss is 0.02439398318529129 + epoch: 6 step: 15, loss is 0.02564268559217453 + Train epoch time: 1434.704 ms, per step time: 95.647 ms + epoch: 7 step: 5, loss is 0.013310655951499939 + epoch: 7 step: 10, loss is 0.02289542555809021 + epoch: 7 step: 15, loss is 0.1992517113685608 + Train epoch time: 1275.935 ms, per step time: 85.062 ms + epoch: 8 step: 5, loss is 0.015928998589515686 + epoch: 8 step: 10, loss is 0.011409260332584381 + epoch: 8 step: 15, loss is 0.008141174912452698 + Train epoch time: 1323.102 ms, per step time: 88.207 ms + epoch: 9 step: 5, loss is 0.10395607352256775 + epoch: 9 step: 10, loss is 0.23055407404899597 + epoch: 9 step: 15, loss is 0.04896317049860954 + Train epoch time: 1261.067 ms, per step time: 84.071 ms + epoch: 10 step: 5, loss is 0.03162381425499916 + epoch: 10 step: 10, loss is 0.13094250857830048 + epoch: 10 step: 15, loss is 0.020028553903102875 + Train epoch time: 1217.958 ms, per step time: 81.197 ms + + +#### Model Evaluation + +After the training, we evaluate the accuracy of the model on the validation set. + + +```python +res = model.eval(loader_val) +print(res) +``` + + {'accuracy': 1.0} + + +##### Visual Model Inference Results +Define `visualize_mode` function and visualize model prediction. 
+
+
+```python
+import matplotlib.pyplot as plt
+import mindspore as ms
+
+def visualize_model(model, val_dl, num_classes=2):
+    # Load a batch of data from the validation set
+    images, labels = next(val_dl.create_tuple_iterator())
+    # Predict the image classes
+    output = model.predict(images)
+    pred = np.argmax(output.asnumpy(), axis=1)
+    # Display the images together with their predicted labels
+    images = images.asnumpy()
+    labels = labels.asnumpy()
+    class_name = {0: "dogs", 1: "wolves"}
+    plt.figure(figsize=(15, 7))
+    for i in range(len(labels)):
+        plt.subplot(3, 6, i + 1)
+        # Correct predictions are shown in blue; wrong predictions are shown in red
+        color = 'blue' if pred[i] == labels[i] else 'red'
+        plt.title('predict:{}'.format(class_name[pred[i]]), color=color)
+        picture_show = np.transpose(images[i], (1, 2, 0))
+        mean = np.array([0.485, 0.456, 0.406])
+        std = np.array([0.229, 0.224, 0.225])
+        picture_show = std * picture_show + mean
+        picture_show = np.clip(picture_show, 0, 1)
+        plt.imshow(picture_show)
+        plt.axis('off')
+
+    plt.show()
+```
+
+Use the fine-tuned model to predict the wolf and dog images in the validation set. A blue title indicates a correct prediction, and a red title indicates a wrong one.
+
+
+```python
+visualize_model(model, loader_val)
+```
+
+
+![png](output_23_0.png)
+
+
+### 2. Freeze Backbone and Fine-Tune the Classifier
+
+#### Freezing Backbone Parameters
+
+First, we freeze all network layers except the final classifier layer, i.e., set the `requires_grad` attribute of the corresponding parameters to False so that their gradients are not computed and they are not updated during back-propagation.
+
+Because all models in `mindcv.models` name their classifier (i.e., the final Dense layer) `classifier`, we can filter out all parameters other than `classifier.weight` and `classifier.bias` and set their `requires_grad` attribute to False.
+
+
+```python
+# freeze backbone
+for param in network.get_parameters():
+    if param.name not in ["classifier.weight", "classifier.bias"]:
+        param.requires_grad = False
+```
+
+#### Fine-Tuning the Classifier
+
+Because the feature network is now fixed, we do not have to worry about distorting the pretrained features during training. Therefore, compared with the first strategy, we can use a larger learning rate.
+
+This also saves more than half of the training time, because gradients no longer need to be computed for the frozen part of the network.
+ + +```python +# dataset load +dataset_train = create_dataset(root=data_dir, split='train', num_parallel_workers=num_workers) +loader_train = create_loader( + dataset=dataset_train, + batch_size=16, + is_training=True, + num_classes=2, + transform=trans_train, + num_parallel_workers=num_workers, + ) + +# Define optimizer and loss function +opt = create_optimizer(network.trainable_params(), opt='adam', lr=1e-3) +loss = create_loss(name='CE') + +# Instantiated model +model = Model(network, loss_fn=loss, optimizer=opt, metrics={'accuracy'}) + +model.train(10, loader_train, callbacks=[LossMonitor(5), TimeMonitor(5)], dataset_sink_mode=False) +``` + + epoch: 1 step: 5, loss is 0.051333948969841 + epoch: 1 step: 10, loss is 0.02043312042951584 + epoch: 1 step: 15, loss is 0.16161368787288666 + Train epoch time: 10228.601 ms, per step time: 681.907 ms + epoch: 2 step: 5, loss is 0.002121545374393463 + epoch: 2 step: 10, loss is 0.0009798109531402588 + epoch: 2 step: 15, loss is 0.015776708722114563 + Train epoch time: 562.543 ms, per step time: 37.503 ms + epoch: 3 step: 5, loss is 0.008056879043579102 + epoch: 3 step: 10, loss is 0.0009347647428512573 + epoch: 3 step: 15, loss is 0.028648357838392258 + Train epoch time: 523.249 ms, per step time: 34.883 ms + epoch: 4 step: 5, loss is 0.001014217734336853 + epoch: 4 step: 10, loss is 0.0003159046173095703 + epoch: 4 step: 15, loss is 0.0007699579000473022 + Train epoch time: 508.886 ms, per step time: 33.926 ms + epoch: 5 step: 5, loss is 0.0015687644481658936 + epoch: 5 step: 10, loss is 0.012090332806110382 + epoch: 5 step: 15, loss is 0.004598274827003479 + Train epoch time: 507.243 ms, per step time: 33.816 ms + epoch: 6 step: 5, loss is 0.010022152215242386 + epoch: 6 step: 10, loss is 0.0066385045647621155 + epoch: 6 step: 15, loss is 0.0036080628633499146 + Train epoch time: 517.646 ms, per step time: 34.510 ms + epoch: 7 step: 5, loss is 0.01344013586640358 + epoch: 7 step: 10, loss is 0.0008538365364074707 + epoch: 7 step: 15, loss is 0.14135593175888062 + Train epoch time: 511.513 ms, per step time: 34.101 ms + epoch: 8 step: 5, loss is 0.01626245677471161 + epoch: 8 step: 10, loss is 0.02871556021273136 + epoch: 8 step: 15, loss is 0.010110966861248016 + Train epoch time: 545.678 ms, per step time: 36.379 ms + epoch: 9 step: 5, loss is 0.008498094975948334 + epoch: 9 step: 10, loss is 0.2588501274585724 + epoch: 9 step: 15, loss is 0.0014278888702392578 + Train epoch time: 499.243 ms, per step time: 33.283 ms + epoch: 10 step: 5, loss is 0.021337147802114487 + epoch: 10 step: 10, loss is 0.00829876959323883 + epoch: 10 step: 15, loss is 0.008352771401405334 + Train epoch time: 465.600 ms, per step time: 31.040 ms + + + +```python +dataset_val = create_dataset(root=data_dir, split='val', num_parallel_workers=num_workers) +loader_val = create_loader( + dataset=dataset_val, + batch_size=5, + is_training=True, + num_classes=2, + transform=trans_val, + num_parallel_workers=num_workers, + ) + +res = model.eval(loader_val) +print(res) +``` + + {'accuracy': 1.0} + + +##### Visual Model Prediction + +Use the finely tuned model piece to predict the wolf and dog image data of the verification set. If the prediction font is blue, the prediction is correct; if the prediction font is red, the prediction is wrong. + + +```python +visualize_model(model, loader_val) +``` + + +![png](output_30_0.png) + + +The prediction results of wolf dog after fine-tuning are correct. 
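+
+As an optional last step, the fine-tuned weights can be saved and reloaded later for inference using the standard MindSpore checkpoint APIs. The snippet below is a minimal sketch; the checkpoint file name is illustrative and is not produced by the steps above.
+
+```python
+import mindspore as ms
+from mindcv.models import create_model
+
+# Save the fine-tuned parameters (the file name is only an example)
+ms.save_checkpoint(network, "./densenet121_canidae.ckpt")
+
+# Later: rebuild the two-class model and load the saved parameters back
+new_network = create_model(model_name='densenet121', num_classes=2, pretrained=False)
+param_dict = ms.load_checkpoint("./densenet121_canidae.ckpt")
+ms.load_param_into_net(new_network, param_dict)
+new_network.set_train(False)
+```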
diff --git a/tutorials/finetune_CN.md b/tutorials/finetune_CN.md new file mode 100644 index 0000000..c9fec5c --- /dev/null +++ b/tutorials/finetune_CN.md @@ -0,0 +1,451 @@ +# 自定义数据集上的模型微调训练 + +[![下载Notebook](https://mindspore-website.obs.cn-north-4.myhuaweicloud.com/website-images/r1.8/resource/_static/logo_notebook.png)](https://download.mindspore.cn/toolkits/mindcv/tutorials/finetune_CN.ipynb)  + + +在此教程中,您将学会如何使用MindCV套件进行迁移学习,以解决自定义数据集上的图像分类问题。在深度学习任务中,常见遇到训练数据不足的问题,此时直接训练整个网络往往难以达到理想的精度。一个比较好的做法是,使用一个在大规模数据集上(与任务数据较为接近)预训练好的模型,然后使用该模型来初始化网络的权重参数或作为固定特征提取器应用于特定的任务中。 + +此教程将以使用ImageNet上预训练的DenseNet模型为例,介绍两种不同的微调策略,解决小样本情况下狼和狗的图像分类问题: + +1. 整体模型微调。 +2. 冻结特征网络(freeze backbone),只微调分类器。 + +> 迁移学习详细内容见[Stanford University CS231n](https://cs231n.github.io/transfer-learning/#tf) + +## 数据准备 + +### 下载数据集 + +下载案例所用到的[狗与狼分类数据集](https://mindspore-website.obs.cn-north-4.myhuaweicloud.com/notebook/datasets/intermediate/Canidae_data.zip),每个类别各有120张训练图像与30张验证图像。使用`mindcv.utils.download`接口下载数据集,并将下载后的数据集自动解压到当前目录下。 + + +```python +import sys +sys.path.append('../') + +from mindcv.utils.download import DownLoad +import os + +dataset_url = "https://mindspore-website.obs.cn-north-4.myhuaweicloud.com/notebook/datasets/intermediate/Canidae_data.zip" +root_dir = "./" + +if not os.path.exists(os.path.join(root_dir, 'data/Canidae')): + DownLoad().download_and_extract_archive(dataset_url, root_dir) +``` + +数据集的目录结构如下: + +```Text +data/ +└── Canidae + ├── train + │   ├── dogs + │   └── wolves + └── val + ├── dogs + └── wolves +``` + +## 数据集加载及处理 + + +### 自定义数据集的加载 +通过调用`mindcv.data`中的`create_dataset`函数,我们可轻松地加载预设的数据集以及自定义的数据集。 +- 当参数`name`设为空时,指定为自定义数据集。(默认值) +- 当参数`name`设为`MNIST`, `CIFAR10`等标准数据集名称时,指定为预设数据集。 + +同时,我们需要设定数据集的路经`data_dir`和数据切分的名称`split` (如train, val),以加载对应的训练集或者验证集。 + + +```python +from mindcv.data import create_dataset, create_transforms, create_loader + +num_workers = 8 + +# 数据集目录路径 +data_dir = "./data/Canidae/" + +# 加载自定义数据集 +dataset_train = create_dataset(root=data_dir, split='train', num_parallel_workers=num_workers) +dataset_val = create_dataset(root=data_dir, split='val', num_parallel_workers=num_workers) +``` + +> 注意: 自定义数据集的目录结构应与ImageNet一样,即root -> split -> class -> image 的层次结构 + +```Text +DATASET_NAME + ├── split1(e.g. train)/ + │ ├── class1/ + │ │ ├── 000001.jpg + │ │ ├── 000002.jpg + │ │ └── .... + │ └── class2/ + │ ├── 000001.jpg + │ ├── 000002.jpg + │ └── .... + └── split2/ + ├── class1/ + │ ├── 000001.jpg + │ ├── 000002.jpg + │ └── .... + └── class2/ + ├── 000001.jpg + ├── 000002.jpg + └── .... 
+``` + +### 数据处理及增强 + +首先我们通过调用`create_transforms`函数, 获得预设的数据处理和增强策略(transform list),此任务中,因狼狗图像和ImageNet数据一致(即domain一致),我们指定参数`dataset_name`为ImageNet,直接用预设好的ImageNet的数据处理和图像增强策略。`create_transforms` 同样支持多种自定义的处理和增强操作,以及自动增强策略(AutoAug)。详见API说明。 + +我们将得到的transform list传入`create_loader()`,并指定`batch_size`和其他参数,即可完成训练和验证数据的准备,返回`Dataset` Object,作为模型的输入。 + + +```python +# 定义和获取数据处理及增强操作 +trans_train = create_transforms(dataset_name='ImageNet', is_training=True) +trans_val = create_transforms(dataset_name='ImageNet',is_training=False) + +# +loader_train = create_loader( + dataset=dataset_train, + batch_size=16, + is_training=True, + num_classes=2, + transform=trans_train, + num_parallel_workers=num_workers, + ) + + +loader_val = create_loader( + dataset=dataset_val, + batch_size=5, + is_training=True, + num_classes=2, + transform=trans_val, + num_parallel_workers=num_workers, + ) +``` + +### 数据集可视化 + +对于`create_loader`接口返回的完成数据加载的Dataset object,我们可以通过 `create_tuple_iterator` 接口创建数据迭代器,使用 `next` 迭代访问数据集,读取到一个batch的数据。 + + +```python +images, labels= next(loader_train.create_tuple_iterator()) +#images = data["image"] +#labels = data["label"] + +print("Tensor of image", images.shape) +print("Labels:", labels) +``` + + Tensor of image (16, 3, 224, 224) + Labels: [0 1 1 1 1 0 0 0 0 0 1 0 1 0 1 1] + + +对获取到的图像及标签数据进行可视化,标题为图像对应的label名称。 + + +```python +import matplotlib.pyplot as plt +import numpy as np + +# class_name对应label,按文件夹字符串从小到大的顺序标记label +class_name = {0: "dogs", 1: "wolves"} + +plt.figure(figsize=(15, 7)) +for i in range(len(labels)): + # 获取图像及其对应的label + data_image = images[i].asnumpy() + data_label = labels[i] + # 处理图像供展示使用 + data_image = np.transpose(data_image, (1, 2, 0)) + mean = np.array([0.485, 0.456, 0.406]) + std = np.array([0.229, 0.224, 0.225]) + data_image = std * data_image + mean + data_image = np.clip(data_image, 0, 1) + # 显示图像 + plt.subplot(3, 6, i + 1) + plt.imshow(data_image) + plt.title(class_name[int(labels[i].asnumpy())]) + plt.axis("off") + +plt.show() +``` + + +![png](output_11_0.png) + + +## 模型微调 + +### 1. 整体模型微调 + +#### 预训练模型加载 +我们使用`mindcv.models.densenet`中定义DenseNet121网络,当接口中的`pretrained`参数设置为True时,可以自动下载网络权重。 +由于该预训练模型是针对ImageNet数据集中的1000个类别进行分类的,这里我们设定`num_classes=2`, DenseNet的classifier(即最后的FC层)输出调整为两维,此时只加载backbone的预训练权重,而classifier则使用初始值。 + + +```python +from mindcv.models import create_model + +network = create_model(model_name='densenet121', num_classes=2, pretrained=True) +``` + + [WARNING] ME(116613:140051982694208,MainProcess):2022-09-20-03:44:58.786.568 [mindspore/train/serialization.py:709] For 'load_param_into_net', 2 parameters in the 'net' are not loaded, because they are not in the 'parameter_dict', please check whether the network structure is consistent when training and loading checkpoint. + [WARNING] ME(116613:140051982694208,MainProcess):2022-09-20-03:44:58.787.703 [mindspore/train/serialization.py:714] classifier.weight is not loaded. + [WARNING] ME(116613:140051982694208,MainProcess):2022-09-20-03:44:58.788.408 [mindspore/train/serialization.py:714] classifier.bias is not loaded. 
+ + + +> DenseNet的具体结构可参见[DenseNet论文](https://arxiv.org/pdf/1608.06993.pdf)。 + +#### 模型训练 + +使用已加载处理好的带标签的狼和狗图像,对DenseNet进行微调网络。 +注意,对整体模型做微调时,应使用较小的learning rate。 + + +```python +from mindcv.loss import create_loss +from mindcv.optim import create_optimizer +from mindcv.scheduler import create_scheduler +from mindspore import Model, LossMonitor, TimeMonitor #, CheckpointConfig, ModelCheckpoint + + +# 定义优化器和损失函数 +opt = create_optimizer(network.trainable_params(), opt='adam', lr=1e-4) +loss = create_loss(name='CE') + +# 实例化模型 +model = Model(network, loss_fn=loss, optimizer=opt, metrics={'accuracy'}) +``` + + +```python +model.train(10, loader_train, callbacks=[LossMonitor(5), TimeMonitor(5)], dataset_sink_mode=False) +``` + + epoch: 1 step: 5, loss is 0.5195528864860535 + epoch: 1 step: 10, loss is 0.2654373049736023 + epoch: 1 step: 15, loss is 0.28758567571640015 + Train epoch time: 17270.144 ms, per step time: 1151.343 ms + epoch: 2 step: 5, loss is 0.1807008981704712 + epoch: 2 step: 10, loss is 0.1700802594423294 + epoch: 2 step: 15, loss is 0.09752683341503143 + Train epoch time: 1372.549 ms, per step time: 91.503 ms + epoch: 3 step: 5, loss is 0.13594701886177063 + epoch: 3 step: 10, loss is 0.03628234937787056 + epoch: 3 step: 15, loss is 0.039737217128276825 + Train epoch time: 1453.237 ms, per step time: 96.882 ms + epoch: 4 step: 5, loss is 0.014213413000106812 + epoch: 4 step: 10, loss is 0.030747078359127045 + epoch: 4 step: 15, loss is 0.0798817127943039 + Train epoch time: 1331.237 ms, per step time: 88.749 ms + epoch: 5 step: 5, loss is 0.009510636329650879 + epoch: 5 step: 10, loss is 0.02603740245103836 + epoch: 5 step: 15, loss is 0.051846928894519806 + Train epoch time: 1312.737 ms, per step time: 87.516 ms + epoch: 6 step: 5, loss is 0.1163717582821846 + epoch: 6 step: 10, loss is 0.02439398318529129 + epoch: 6 step: 15, loss is 0.02564268559217453 + Train epoch time: 1434.704 ms, per step time: 95.647 ms + epoch: 7 step: 5, loss is 0.013310655951499939 + epoch: 7 step: 10, loss is 0.02289542555809021 + epoch: 7 step: 15, loss is 0.1992517113685608 + Train epoch time: 1275.935 ms, per step time: 85.062 ms + epoch: 8 step: 5, loss is 0.015928998589515686 + epoch: 8 step: 10, loss is 0.011409260332584381 + epoch: 8 step: 15, loss is 0.008141174912452698 + Train epoch time: 1323.102 ms, per step time: 88.207 ms + epoch: 9 step: 5, loss is 0.10395607352256775 + epoch: 9 step: 10, loss is 0.23055407404899597 + epoch: 9 step: 15, loss is 0.04896317049860954 + Train epoch time: 1261.067 ms, per step time: 84.071 ms + epoch: 10 step: 5, loss is 0.03162381425499916 + epoch: 10 step: 10, loss is 0.13094250857830048 + epoch: 10 step: 15, loss is 0.020028553903102875 + Train epoch time: 1217.958 ms, per step time: 81.197 ms + + +#### 模型评估 + +在训练完成后,我们在验证集上评估模型的精度。 + + +```python +res = model.eval(loader_val) +print(res) +``` + + {'accuracy': 1.0} + + +##### 可视化模型推理结果 + +定义 `visualize_mode` 函数,可视化模型预测。 + + +```python +import matplotlib.pyplot as plt +import mindspore as ms + +def visualize_model(model, val_dl, num_classes=2): + # 加载验证集的数据进行验证 + images, labels= next(val_dl.create_tuple_iterator()) + # 预测图像类别 + output = model.predict(images) + pred = np.argmax(output.asnumpy(), axis=1) + # 显示图像及图像的预测值 + images = images.asnumpy() + labels = labels.asnumpy() + class_name = {0: "dogs", 1: "wolves"} + plt.figure(figsize=(15, 7)) + for i in range(len(labels)): + plt.subplot(3, 6, i + 1) + # 若预测正确,显示为蓝色;若预测错误,显示为红色 + color = 'blue' if pred[i] == labels[i] else 'red' + 
plt.title('predict:{}'.format(class_name[pred[i]]), color=color) + picture_show = np.transpose(images[i], (1, 2, 0)) + mean = np.array([0.485, 0.456, 0.406]) + std = np.array([0.229, 0.224, 0.225]) + picture_show = std * picture_show + mean + picture_show = np.clip(picture_show, 0, 1) + plt.imshow(picture_show) + plt.axis('off') + + plt.show() +``` + +使用微调过后的模型件对验证集的狼和狗图像数据进行预测。若预测字体为蓝色表示预测正确,若预测字体为红色表示预测错误。 + + +```python +visualize_model(model, loader_val) +``` + + +![png](output_23_0.png) + + +### 2. 冻结特征网络, 微调分类器 + +#### 冻结特征网络的参数 + +首先,我们要冻结除最后一层分类器之外的所有网络层,即将相应的层参数的`requires_grad`属性设置为`False`,使其不在反向传播中计算梯度及更新参数。 + +因为`mindcv.models` 中所有的模型均以`classifier` 来标识和命名模型的分类器(即Dense层),所以通过 `classifier.weight` 和 `classifier.bias` 即可筛选出分类器外的各层参数,将其`requires_grad`属性设置为`False`. + + +```python +# freeze backbone +for param in network.get_parameters(): + if param.name not in ["classifier.weight", "classifier.bias"]: + param.requires_grad = False +``` + +#### 微调分类器 +因为特征网络已经固定,我们不必担心训练过程会distort pratrained features,因此,相比于第一种方法,我们可以将learning rate调大一些。 + +与没有预训练模型相比,将节约一大半时间,因为此时可以不用计算部分梯度。 + + +```python +# 加载数据集 +dataset_train = create_dataset(root=data_dir, split='train', num_parallel_workers=num_workers) +loader_train = create_loader( + dataset=dataset_train, + batch_size=16, + is_training=True, + num_classes=2, + transform=trans_train, + num_parallel_workers=num_workers, + ) + +# 定义优化器和损失函数 +opt = create_optimizer(network.trainable_params(), opt='adam', lr=1e-3) +loss = create_loss(name='CE') + +# 实例化模型 +model = Model(network, loss_fn=loss, optimizer=opt, metrics={'accuracy'}) + +model.train(10, loader_train, callbacks=[LossMonitor(5), TimeMonitor(5)], dataset_sink_mode=False) +``` + + epoch: 1 step: 5, loss is 0.051333948969841 + epoch: 1 step: 10, loss is 0.02043312042951584 + epoch: 1 step: 15, loss is 0.16161368787288666 + Train epoch time: 10228.601 ms, per step time: 681.907 ms + epoch: 2 step: 5, loss is 0.002121545374393463 + epoch: 2 step: 10, loss is 0.0009798109531402588 + epoch: 2 step: 15, loss is 0.015776708722114563 + Train epoch time: 562.543 ms, per step time: 37.503 ms + epoch: 3 step: 5, loss is 0.008056879043579102 + epoch: 3 step: 10, loss is 0.0009347647428512573 + epoch: 3 step: 15, loss is 0.028648357838392258 + Train epoch time: 523.249 ms, per step time: 34.883 ms + epoch: 4 step: 5, loss is 0.001014217734336853 + epoch: 4 step: 10, loss is 0.0003159046173095703 + epoch: 4 step: 15, loss is 0.0007699579000473022 + Train epoch time: 508.886 ms, per step time: 33.926 ms + epoch: 5 step: 5, loss is 0.0015687644481658936 + epoch: 5 step: 10, loss is 0.012090332806110382 + epoch: 5 step: 15, loss is 0.004598274827003479 + Train epoch time: 507.243 ms, per step time: 33.816 ms + epoch: 6 step: 5, loss is 0.010022152215242386 + epoch: 6 step: 10, loss is 0.0066385045647621155 + epoch: 6 step: 15, loss is 0.0036080628633499146 + Train epoch time: 517.646 ms, per step time: 34.510 ms + epoch: 7 step: 5, loss is 0.01344013586640358 + epoch: 7 step: 10, loss is 0.0008538365364074707 + epoch: 7 step: 15, loss is 0.14135593175888062 + Train epoch time: 511.513 ms, per step time: 34.101 ms + epoch: 8 step: 5, loss is 0.01626245677471161 + epoch: 8 step: 10, loss is 0.02871556021273136 + epoch: 8 step: 15, loss is 0.010110966861248016 + Train epoch time: 545.678 ms, per step time: 36.379 ms + epoch: 9 step: 5, loss is 0.008498094975948334 + epoch: 9 step: 10, loss is 0.2588501274585724 + epoch: 9 step: 15, loss is 0.0014278888702392578 + Train epoch time: 499.243 ms, per step time: 
33.283 ms + epoch: 10 step: 5, loss is 0.021337147802114487 + epoch: 10 step: 10, loss is 0.00829876959323883 + epoch: 10 step: 15, loss is 0.008352771401405334 + Train epoch time: 465.600 ms, per step time: 31.040 ms + + + +```python +dataset_val = create_dataset(root=data_dir, split='val', num_parallel_workers=num_workers) +loader_val = create_loader( + dataset=dataset_val, + batch_size=5, + is_training=True, + num_classes=2, + transform=trans_val, + num_parallel_workers=num_workers, + ) + +res = model.eval(loader_val) +print(res) +``` + + {'accuracy': 1.0} + + +##### 可视化模型预测 + +使用微调过后的模型件对验证集的狼和狗图像数据进行预测。若预测字体为蓝色表示预测正确,若预测字体为红色表示预测错误。 + + +```python +visualize_model(model, loader_val) +``` + + +![png](output_30_0.png) + + +微调后的狼狗预测结果均正确 diff --git a/tutorials/imagenet1000_clsidx_to_labels.txt b/tutorials/imagenet1000_clsidx_to_labels.txt new file mode 100644 index 0000000..5a51e6f --- /dev/null +++ b/tutorials/imagenet1000_clsidx_to_labels.txt @@ -0,0 +1,1000 @@ +{0: 'tench, Tinca tinca', + 1: 'goldfish, Carassius auratus', + 2: 'great white shark, white shark, man-eater, man-eating shark, Carcharodon carcharias', + 3: 'tiger shark, Galeocerdo cuvieri', + 4: 'hammerhead, hammerhead shark', + 5: 'electric ray, crampfish, numbfish, torpedo', + 6: 'stingray', + 7: 'cock', + 8: 'hen', + 9: 'ostrich, Struthio camelus', + 10: 'brambling, Fringilla montifringilla', + 11: 'goldfinch, Carduelis carduelis', + 12: 'house finch, linnet, Carpodacus mexicanus', + 13: 'junco, snowbird', + 14: 'indigo bunting, indigo finch, indigo bird, Passerina cyanea', + 15: 'robin, American robin, Turdus migratorius', + 16: 'bulbul', + 17: 'jay', + 18: 'magpie', + 19: 'chickadee', + 20: 'water ouzel, dipper', + 21: 'kite', + 22: 'bald eagle, American eagle, Haliaeetus leucocephalus', + 23: 'vulture', + 24: 'great grey owl, great gray owl, Strix nebulosa', + 25: 'European fire salamander, Salamandra salamandra', + 26: 'common newt, Triturus vulgaris', + 27: 'eft', + 28: 'spotted salamander, Ambystoma maculatum', + 29: 'axolotl, mud puppy, Ambystoma mexicanum', + 30: 'bullfrog, Rana catesbeiana', + 31: 'tree frog, tree-frog', + 32: 'tailed frog, bell toad, ribbed toad, tailed toad, Ascaphus trui', + 33: 'loggerhead, loggerhead turtle, Caretta caretta', + 34: 'leatherback turtle, leatherback, leathery turtle, Dermochelys coriacea', + 35: 'mud turtle', + 36: 'terrapin', + 37: 'box turtle, box tortoise', + 38: 'banded gecko', + 39: 'common iguana, iguana, Iguana iguana', + 40: 'American chameleon, anole, Anolis carolinensis', + 41: 'whiptail, whiptail lizard', + 42: 'agama', + 43: 'frilled lizard, Chlamydosaurus kingi', + 44: 'alligator lizard', + 45: 'Gila monster, Heloderma suspectum', + 46: 'green lizard, Lacerta viridis', + 47: 'African chameleon, Chamaeleo chamaeleon', + 48: 'Komodo dragon, Komodo lizard, dragon lizard, giant lizard, Varanus komodoensis', + 49: 'African crocodile, Nile crocodile, Crocodylus niloticus', + 50: 'American alligator, Alligator mississipiensis', + 51: 'triceratops', + 52: 'thunder snake, worm snake, Carphophis amoenus', + 53: 'ringneck snake, ring-necked snake, ring snake', + 54: 'hognose snake, puff adder, sand viper', + 55: 'green snake, grass snake', + 56: 'king snake, kingsnake', + 57: 'garter snake, grass snake', + 58: 'water snake', + 59: 'vine snake', + 60: 'night snake, Hypsiglena torquata', + 61: 'boa constrictor, Constrictor constrictor', + 62: 'rock python, rock snake, Python sebae', + 63: 'Indian cobra, Naja naja', + 64: 'green mamba', + 65: 'sea snake', + 66: 'horned viper, 
cerastes, sand viper, horned asp, Cerastes cornutus', + 67: 'diamondback, diamondback rattlesnake, Crotalus adamanteus', + 68: 'sidewinder, horned rattlesnake, Crotalus cerastes', + 69: 'trilobite', + 70: 'harvestman, daddy longlegs, Phalangium opilio', + 71: 'scorpion', + 72: 'black and gold garden spider, Argiope aurantia', + 73: 'barn spider, Araneus cavaticus', + 74: 'garden spider, Aranea diademata', + 75: 'black widow, Latrodectus mactans', + 76: 'tarantula', + 77: 'wolf spider, hunting spider', + 78: 'tick', + 79: 'centipede', + 80: 'black grouse', + 81: 'ptarmigan', + 82: 'ruffed grouse, partridge, Bonasa umbellus', + 83: 'prairie chicken, prairie grouse, prairie fowl', + 84: 'peacock', + 85: 'quail', + 86: 'partridge', + 87: 'African grey, African gray, Psittacus erithacus', + 88: 'macaw', + 89: 'sulphur-crested cockatoo, Kakatoe galerita, Cacatua galerita', + 90: 'lorikeet', + 91: 'coucal', + 92: 'bee eater', + 93: 'hornbill', + 94: 'hummingbird', + 95: 'jacamar', + 96: 'toucan', + 97: 'drake', + 98: 'red-breasted merganser, Mergus serrator', + 99: 'goose', + 100: 'black swan, Cygnus atratus', + 101: 'tusker', + 102: 'echidna, spiny anteater, anteater', + 103: 'platypus, duckbill, duckbilled platypus, duck-billed platypus, Ornithorhynchus anatinus', + 104: 'wallaby, brush kangaroo', + 105: 'koala, koala bear, kangaroo bear, native bear, Phascolarctos cinereus', + 106: 'wombat', + 107: 'jellyfish', + 108: 'sea anemone, anemone', + 109: 'brain coral', + 110: 'flatworm, platyhelminth', + 111: 'nematode, nematode worm, roundworm', + 112: 'conch', + 113: 'snail', + 114: 'slug', + 115: 'sea slug, nudibranch', + 116: 'chiton, coat-of-mail shell, sea cradle, polyplacophore', + 117: 'chambered nautilus, pearly nautilus, nautilus', + 118: 'Dungeness crab, Cancer magister', + 119: 'rock crab, Cancer irroratus', + 120: 'fiddler crab', + 121: 'king crab, Alaska crab, Alaskan king crab, Alaska king crab, Paralithodes camtschatica', + 122: 'American lobster, Northern lobster, Maine lobster, Homarus americanus', + 123: 'spiny lobster, langouste, rock lobster, crawfish, crayfish, sea crawfish', + 124: 'crayfish, crawfish, crawdad, crawdaddy', + 125: 'hermit crab', + 126: 'isopod', + 127: 'white stork, Ciconia ciconia', + 128: 'black stork, Ciconia nigra', + 129: 'spoonbill', + 130: 'flamingo', + 131: 'little blue heron, Egretta caerulea', + 132: 'American egret, great white heron, Egretta albus', + 133: 'bittern', + 134: 'crane', + 135: 'limpkin, Aramus pictus', + 136: 'European gallinule, Porphyrio porphyrio', + 137: 'American coot, marsh hen, mud hen, water hen, Fulica americana', + 138: 'bustard', + 139: 'ruddy turnstone, Arenaria interpres', + 140: 'red-backed sandpiper, dunlin, Erolia alpina', + 141: 'redshank, Tringa totanus', + 142: 'dowitcher', + 143: 'oystercatcher, oyster catcher', + 144: 'pelican', + 145: 'king penguin, Aptenodytes patagonica', + 146: 'albatross, mollymawk', + 147: 'grey whale, gray whale, devilfish, Eschrichtius gibbosus, Eschrichtius robustus', + 148: 'killer whale, killer, orca, grampus, sea wolf, Orcinus orca', + 149: 'dugong, Dugong dugon', + 150: 'sea lion', + 151: 'Chihuahua', + 152: 'Japanese spaniel', + 153: 'Maltese dog, Maltese terrier, Maltese', + 154: 'Pekinese, Pekingese, Peke', + 155: 'Shih-Tzu', + 156: 'Blenheim spaniel', + 157: 'papillon', + 158: 'toy terrier', + 159: 'Rhodesian ridgeback', + 160: 'Afghan hound, Afghan', + 161: 'basset, basset hound', + 162: 'beagle', + 163: 'bloodhound, sleuthhound', + 164: 'bluetick', + 165: 'black-and-tan 
coonhound', + 166: 'Walker hound, Walker foxhound', + 167: 'English foxhound', + 168: 'redbone', + 169: 'borzoi, Russian wolfhound', + 170: 'Irish wolfhound', + 171: 'Italian greyhound', + 172: 'whippet', + 173: 'Ibizan hound, Ibizan Podenco', + 174: 'Norwegian elkhound, elkhound', + 175: 'otterhound, otter hound', + 176: 'Saluki, gazelle hound', + 177: 'Scottish deerhound, deerhound', + 178: 'Weimaraner', + 179: 'Staffordshire bullterrier, Staffordshire bull terrier', + 180: 'American Staffordshire terrier, Staffordshire terrier, American pit bull terrier, pit bull terrier', + 181: 'Bedlington terrier', + 182: 'Border terrier', + 183: 'Kerry blue terrier', + 184: 'Irish terrier', + 185: 'Norfolk terrier', + 186: 'Norwich terrier', + 187: 'Yorkshire terrier', + 188: 'wire-haired fox terrier', + 189: 'Lakeland terrier', + 190: 'Sealyham terrier, Sealyham', + 191: 'Airedale, Airedale terrier', + 192: 'cairn, cairn terrier', + 193: 'Australian terrier', + 194: 'Dandie Dinmont, Dandie Dinmont terrier', + 195: 'Boston bull, Boston terrier', + 196: 'miniature schnauzer', + 197: 'giant schnauzer', + 198: 'standard schnauzer', + 199: 'Scotch terrier, Scottish terrier, Scottie', + 200: 'Tibetan terrier, chrysanthemum dog', + 201: 'silky terrier, Sydney silky', + 202: 'soft-coated wheaten terrier', + 203: 'West Highland white terrier', + 204: 'Lhasa, Lhasa apso', + 205: 'flat-coated retriever', + 206: 'curly-coated retriever', + 207: 'golden retriever', + 208: 'Labrador retriever', + 209: 'Chesapeake Bay retriever', + 210: 'German short-haired pointer', + 211: 'vizsla, Hungarian pointer', + 212: 'English setter', + 213: 'Irish setter, red setter', + 214: 'Gordon setter', + 215: 'Brittany spaniel', + 216: 'clumber, clumber spaniel', + 217: 'English springer, English springer spaniel', + 218: 'Welsh springer spaniel', + 219: 'cocker spaniel, English cocker spaniel, cocker', + 220: 'Sussex spaniel', + 221: 'Irish water spaniel', + 222: 'kuvasz', + 223: 'schipperke', + 224: 'groenendael', + 225: 'malinois', + 226: 'briard', + 227: 'kelpie', + 228: 'komondor', + 229: 'Old English sheepdog, bobtail', + 230: 'Shetland sheepdog, Shetland sheep dog, Shetland', + 231: 'collie', + 232: 'Border collie', + 233: 'Bouvier des Flandres, Bouviers des Flandres', + 234: 'Rottweiler', + 235: 'German shepherd, German shepherd dog, German police dog, alsatian', + 236: 'Doberman, Doberman pinscher', + 237: 'miniature pinscher', + 238: 'Greater Swiss Mountain dog', + 239: 'Bernese mountain dog', + 240: 'Appenzeller', + 241: 'EntleBucher', + 242: 'boxer', + 243: 'bull mastiff', + 244: 'Tibetan mastiff', + 245: 'French bulldog', + 246: 'Great Dane', + 247: 'Saint Bernard, St Bernard', + 248: 'Eskimo dog, husky', + 249: 'malamute, malemute, Alaskan malamute', + 250: 'Siberian husky', + 251: 'dalmatian, coach dog, carriage dog', + 252: 'affenpinscher, monkey pinscher, monkey dog', + 253: 'basenji', + 254: 'pug, pug-dog', + 255: 'Leonberg', + 256: 'Newfoundland, Newfoundland dog', + 257: 'Great Pyrenees', + 258: 'Samoyed, Samoyede', + 259: 'Pomeranian', + 260: 'chow, chow chow', + 261: 'keeshond', + 262: 'Brabancon griffon', + 263: 'Pembroke, Pembroke Welsh corgi', + 264: 'Cardigan, Cardigan Welsh corgi', + 265: 'toy poodle', + 266: 'miniature poodle', + 267: 'standard poodle', + 268: 'Mexican hairless', + 269: 'timber wolf, grey wolf, gray wolf, Canis lupus', + 270: 'white wolf, Arctic wolf, Canis lupus tundrarum', + 271: 'red wolf, maned wolf, Canis rufus, Canis niger', + 272: 'coyote, prairie wolf, brush wolf, Canis 
latrans', + 273: 'dingo, warrigal, warragal, Canis dingo', + 274: 'dhole, Cuon alpinus', + 275: 'African hunting dog, hyena dog, Cape hunting dog, Lycaon pictus', + 276: 'hyena, hyaena', + 277: 'red fox, Vulpes vulpes', + 278: 'kit fox, Vulpes macrotis', + 279: 'Arctic fox, white fox, Alopex lagopus', + 280: 'grey fox, gray fox, Urocyon cinereoargenteus', + 281: 'tabby, tabby cat', + 282: 'tiger cat', + 283: 'Persian cat', + 284: 'Siamese cat, Siamese', + 285: 'Egyptian cat', + 286: 'cougar, puma, catamount, mountain lion, painter, panther, Felis concolor', + 287: 'lynx, catamount', + 288: 'leopard, Panthera pardus', + 289: 'snow leopard, ounce, Panthera uncia', + 290: 'jaguar, panther, Panthera onca, Felis onca', + 291: 'lion, king of beasts, Panthera leo', + 292: 'tiger, Panthera tigris', + 293: 'cheetah, chetah, Acinonyx jubatus', + 294: 'brown bear, bruin, Ursus arctos', + 295: 'American black bear, black bear, Ursus americanus, Euarctos americanus', + 296: 'ice bear, polar bear, Ursus Maritimus, Thalarctos maritimus', + 297: 'sloth bear, Melursus ursinus, Ursus ursinus', + 298: 'mongoose', + 299: 'meerkat, mierkat', + 300: 'tiger beetle', + 301: 'ladybug, ladybeetle, lady beetle, ladybird, ladybird beetle', + 302: 'ground beetle, carabid beetle', + 303: 'long-horned beetle, longicorn, longicorn beetle', + 304: 'leaf beetle, chrysomelid', + 305: 'dung beetle', + 306: 'rhinoceros beetle', + 307: 'weevil', + 308: 'fly', + 309: 'bee', + 310: 'ant, emmet, pismire', + 311: 'grasshopper, hopper', + 312: 'cricket', + 313: 'walking stick, walkingstick, stick insect', + 314: 'cockroach, roach', + 315: 'mantis, mantid', + 316: 'cicada, cicala', + 317: 'leafhopper', + 318: 'lacewing, lacewing fly', + 319: "dragonfly, darning needle, devil's darning needle, sewing needle, snake feeder, snake doctor, mosquito hawk, skeeter hawk", + 320: 'damselfly', + 321: 'admiral', + 322: 'ringlet, ringlet butterfly', + 323: 'monarch, monarch butterfly, milkweed butterfly, Danaus plexippus', + 324: 'cabbage butterfly', + 325: 'sulphur butterfly, sulfur butterfly', + 326: 'lycaenid, lycaenid butterfly', + 327: 'starfish, sea star', + 328: 'sea urchin', + 329: 'sea cucumber, holothurian', + 330: 'wood rabbit, cottontail, cottontail rabbit', + 331: 'hare', + 332: 'Angora, Angora rabbit', + 333: 'hamster', + 334: 'porcupine, hedgehog', + 335: 'fox squirrel, eastern fox squirrel, Sciurus niger', + 336: 'marmot', + 337: 'beaver', + 338: 'guinea pig, Cavia cobaya', + 339: 'sorrel', + 340: 'zebra', + 341: 'hog, pig, grunter, squealer, Sus scrofa', + 342: 'wild boar, boar, Sus scrofa', + 343: 'warthog', + 344: 'hippopotamus, hippo, river horse, Hippopotamus amphibius', + 345: 'ox', + 346: 'water buffalo, water ox, Asiatic buffalo, Bubalus bubalis', + 347: 'bison', + 348: 'ram, tup', + 349: 'bighorn, bighorn sheep, cimarron, Rocky Mountain bighorn, Rocky Mountain sheep, Ovis canadensis', + 350: 'ibex, Capra ibex', + 351: 'hartebeest', + 352: 'impala, Aepyceros melampus', + 353: 'gazelle', + 354: 'Arabian camel, dromedary, Camelus dromedarius', + 355: 'llama', + 356: 'weasel', + 357: 'mink', + 358: 'polecat, fitch, foulmart, foumart, Mustela putorius', + 359: 'black-footed ferret, ferret, Mustela nigripes', + 360: 'otter', + 361: 'skunk, polecat, wood pussy', + 362: 'badger', + 363: 'armadillo', + 364: 'three-toed sloth, ai, Bradypus tridactylus', + 365: 'orangutan, orang, orangutang, Pongo pygmaeus', + 366: 'gorilla, Gorilla gorilla', + 367: 'chimpanzee, chimp, Pan troglodytes', + 368: 'gibbon, Hylobates lar', + 369: 
'siamang, Hylobates syndactylus, Symphalangus syndactylus', + 370: 'guenon, guenon monkey', + 371: 'patas, hussar monkey, Erythrocebus patas', + 372: 'baboon', + 373: 'macaque', + 374: 'langur', + 375: 'colobus, colobus monkey', + 376: 'proboscis monkey, Nasalis larvatus', + 377: 'marmoset', + 378: 'capuchin, ringtail, Cebus capucinus', + 379: 'howler monkey, howler', + 380: 'titi, titi monkey', + 381: 'spider monkey, Ateles geoffroyi', + 382: 'squirrel monkey, Saimiri sciureus', + 383: 'Madagascar cat, ring-tailed lemur, Lemur catta', + 384: 'indri, indris, Indri indri, Indri brevicaudatus', + 385: 'Indian elephant, Elephas maximus', + 386: 'African elephant, Loxodonta africana', + 387: 'lesser panda, red panda, panda, bear cat, cat bear, Ailurus fulgens', + 388: 'giant panda, panda, panda bear, coon bear, Ailuropoda melanoleuca', + 389: 'barracouta, snoek', + 390: 'eel', + 391: 'coho, cohoe, coho salmon, blue jack, silver salmon, Oncorhynchus kisutch', + 392: 'rock beauty, Holocanthus tricolor', + 393: 'anemone fish', + 394: 'sturgeon', + 395: 'gar, garfish, garpike, billfish, Lepisosteus osseus', + 396: 'lionfish', + 397: 'puffer, pufferfish, blowfish, globefish', + 398: 'abacus', + 399: 'abaya', + 400: "academic gown, academic robe, judge's robe", + 401: 'accordion, piano accordion, squeeze box', + 402: 'acoustic guitar', + 403: 'aircraft carrier, carrier, flattop, attack aircraft carrier', + 404: 'airliner', + 405: 'airship, dirigible', + 406: 'altar', + 407: 'ambulance', + 408: 'amphibian, amphibious vehicle', + 409: 'analog clock', + 410: 'apiary, bee house', + 411: 'apron', + 412: 'ashcan, trash can, garbage can, wastebin, ash bin, ash-bin, ashbin, dustbin, trash barrel, trash bin', + 413: 'assault rifle, assault gun', + 414: 'backpack, back pack, knapsack, packsack, rucksack, haversack', + 415: 'bakery, bakeshop, bakehouse', + 416: 'balance beam, beam', + 417: 'balloon', + 418: 'ballpoint, ballpoint pen, ballpen, Biro', + 419: 'Band Aid', + 420: 'banjo', + 421: 'bannister, banister, balustrade, balusters, handrail', + 422: 'barbell', + 423: 'barber chair', + 424: 'barbershop', + 425: 'barn', + 426: 'barometer', + 427: 'barrel, cask', + 428: 'barrow, garden cart, lawn cart, wheelbarrow', + 429: 'baseball', + 430: 'basketball', + 431: 'bassinet', + 432: 'bassoon', + 433: 'bathing cap, swimming cap', + 434: 'bath towel', + 435: 'bathtub, bathing tub, bath, tub', + 436: 'beach wagon, station wagon, wagon, estate car, beach waggon, station waggon, waggon', + 437: 'beacon, lighthouse, beacon light, pharos', + 438: 'beaker', + 439: 'bearskin, busby, shako', + 440: 'beer bottle', + 441: 'beer glass', + 442: 'bell cote, bell cot', + 443: 'bib', + 444: 'bicycle-built-for-two, tandem bicycle, tandem', + 445: 'bikini, two-piece', + 446: 'binder, ring-binder', + 447: 'binoculars, field glasses, opera glasses', + 448: 'birdhouse', + 449: 'boathouse', + 450: 'bobsled, bobsleigh, bob', + 451: 'bolo tie, bolo, bola tie, bola', + 452: 'bonnet, poke bonnet', + 453: 'bookcase', + 454: 'bookshop, bookstore, bookstall', + 455: 'bottlecap', + 456: 'bow', + 457: 'bow tie, bow-tie, bowtie', + 458: 'brass, memorial tablet, plaque', + 459: 'brassiere, bra, bandeau', + 460: 'breakwater, groin, groyne, mole, bulwark, seawall, jetty', + 461: 'breastplate, aegis, egis', + 462: 'broom', + 463: 'bucket, pail', + 464: 'buckle', + 465: 'bulletproof vest', + 466: 'bullet train, bullet', + 467: 'butcher shop, meat market', + 468: 'cab, hack, taxi, taxicab', + 469: 'caldron, cauldron', + 470: 'candle, taper, wax 
light', + 471: 'cannon', + 472: 'canoe', + 473: 'can opener, tin opener', + 474: 'cardigan', + 475: 'car mirror', + 476: 'carousel, carrousel, merry-go-round, roundabout, whirligig', + 477: "carpenter's kit, tool kit", + 478: 'carton', + 479: 'car wheel', + 480: 'cash machine, cash dispenser, automated teller machine, automatic teller machine, automated teller, automatic teller, ATM', + 481: 'cassette', + 482: 'cassette player', + 483: 'castle', + 484: 'catamaran', + 485: 'CD player', + 486: 'cello, violoncello', + 487: 'cellular telephone, cellular phone, cellphone, cell, mobile phone', + 488: 'chain', + 489: 'chainlink fence', + 490: 'chain mail, ring mail, mail, chain armor, chain armour, ring armor, ring armour', + 491: 'chain saw, chainsaw', + 492: 'chest', + 493: 'chiffonier, commode', + 494: 'chime, bell, gong', + 495: 'china cabinet, china closet', + 496: 'Christmas stocking', + 497: 'church, church building', + 498: 'cinema, movie theater, movie theatre, movie house, picture palace', + 499: 'cleaver, meat cleaver, chopper', + 500: 'cliff dwelling', + 501: 'cloak', + 502: 'clog, geta, patten, sabot', + 503: 'cocktail shaker', + 504: 'coffee mug', + 505: 'coffeepot', + 506: 'coil, spiral, volute, whorl, helix', + 507: 'combination lock', + 508: 'computer keyboard, keypad', + 509: 'confectionery, confectionary, candy store', + 510: 'container ship, containership, container vessel', + 511: 'convertible', + 512: 'corkscrew, bottle screw', + 513: 'cornet, horn, trumpet, trump', + 514: 'cowboy boot', + 515: 'cowboy hat, ten-gallon hat', + 516: 'cradle', + 517: 'crane', + 518: 'crash helmet', + 519: 'crate', + 520: 'crib, cot', + 521: 'Crock Pot', + 522: 'croquet ball', + 523: 'crutch', + 524: 'cuirass', + 525: 'dam, dike, dyke', + 526: 'desk', + 527: 'desktop computer', + 528: 'dial telephone, dial phone', + 529: 'diaper, nappy, napkin', + 530: 'digital clock', + 531: 'digital watch', + 532: 'dining table, board', + 533: 'dishrag, dishcloth', + 534: 'dishwasher, dish washer, dishwashing machine', + 535: 'disk brake, disc brake', + 536: 'dock, dockage, docking facility', + 537: 'dogsled, dog sled, dog sleigh', + 538: 'dome', + 539: 'doormat, welcome mat', + 540: 'drilling platform, offshore rig', + 541: 'drum, membranophone, tympan', + 542: 'drumstick', + 543: 'dumbbell', + 544: 'Dutch oven', + 545: 'electric fan, blower', + 546: 'electric guitar', + 547: 'electric locomotive', + 548: 'entertainment center', + 549: 'envelope', + 550: 'espresso maker', + 551: 'face powder', + 552: 'feather boa, boa', + 553: 'file, file cabinet, filing cabinet', + 554: 'fireboat', + 555: 'fire engine, fire truck', + 556: 'fire screen, fireguard', + 557: 'flagpole, flagstaff', + 558: 'flute, transverse flute', + 559: 'folding chair', + 560: 'football helmet', + 561: 'forklift', + 562: 'fountain', + 563: 'fountain pen', + 564: 'four-poster', + 565: 'freight car', + 566: 'French horn, horn', + 567: 'frying pan, frypan, skillet', + 568: 'fur coat', + 569: 'garbage truck, dustcart', + 570: 'gasmask, respirator, gas helmet', + 571: 'gas pump, gasoline pump, petrol pump, island dispenser', + 572: 'goblet', + 573: 'go-kart', + 574: 'golf ball', + 575: 'golfcart, golf cart', + 576: 'gondola', + 577: 'gong, tam-tam', + 578: 'gown', + 579: 'grand piano, grand', + 580: 'greenhouse, nursery, glasshouse', + 581: 'grille, radiator grille', + 582: 'grocery store, grocery, food market, market', + 583: 'guillotine', + 584: 'hair slide', + 585: 'hair spray', + 586: 'half track', + 587: 'hammer', + 588: 'hamper', + 589: 'hand 
blower, blow dryer, blow drier, hair dryer, hair drier', + 590: 'hand-held computer, hand-held microcomputer', + 591: 'handkerchief, hankie, hanky, hankey', + 592: 'hard disc, hard disk, fixed disk', + 593: 'harmonica, mouth organ, harp, mouth harp', + 594: 'harp', + 595: 'harvester, reaper', + 596: 'hatchet', + 597: 'holster', + 598: 'home theater, home theatre', + 599: 'honeycomb', + 600: 'hook, claw', + 601: 'hoopskirt, crinoline', + 602: 'horizontal bar, high bar', + 603: 'horse cart, horse-cart', + 604: 'hourglass', + 605: 'iPod', + 606: 'iron, smoothing iron', + 607: "jack-o'-lantern", + 608: 'jean, blue jean, denim', + 609: 'jeep, landrover', + 610: 'jersey, T-shirt, tee shirt', + 611: 'jigsaw puzzle', + 612: 'jinrikisha, ricksha, rickshaw', + 613: 'joystick', + 614: 'kimono', + 615: 'knee pad', + 616: 'knot', + 617: 'lab coat, laboratory coat', + 618: 'ladle', + 619: 'lampshade, lamp shade', + 620: 'laptop, laptop computer', + 621: 'lawn mower, mower', + 622: 'lens cap, lens cover', + 623: 'letter opener, paper knife, paperknife', + 624: 'library', + 625: 'lifeboat', + 626: 'lighter, light, igniter, ignitor', + 627: 'limousine, limo', + 628: 'liner, ocean liner', + 629: 'lipstick, lip rouge', + 630: 'Loafer', + 631: 'lotion', + 632: 'loudspeaker, speaker, speaker unit, loudspeaker system, speaker system', + 633: "loupe, jeweler's loupe", + 634: 'lumbermill, sawmill', + 635: 'magnetic compass', + 636: 'mailbag, postbag', + 637: 'mailbox, letter box', + 638: 'maillot', + 639: 'maillot, tank suit', + 640: 'manhole cover', + 641: 'maraca', + 642: 'marimba, xylophone', + 643: 'mask', + 644: 'matchstick', + 645: 'maypole', + 646: 'maze, labyrinth', + 647: 'measuring cup', + 648: 'medicine chest, medicine cabinet', + 649: 'megalith, megalithic structure', + 650: 'microphone, mike', + 651: 'microwave, microwave oven', + 652: 'military uniform', + 653: 'milk can', + 654: 'minibus', + 655: 'miniskirt, mini', + 656: 'minivan', + 657: 'missile', + 658: 'mitten', + 659: 'mixing bowl', + 660: 'mobile home, manufactured home', + 661: 'Model T', + 662: 'modem', + 663: 'monastery', + 664: 'monitor', + 665: 'moped', + 666: 'mortar', + 667: 'mortarboard', + 668: 'mosque', + 669: 'mosquito net', + 670: 'motor scooter, scooter', + 671: 'mountain bike, all-terrain bike, off-roader', + 672: 'mountain tent', + 673: 'mouse, computer mouse', + 674: 'mousetrap', + 675: 'moving van', + 676: 'muzzle', + 677: 'nail', + 678: 'neck brace', + 679: 'necklace', + 680: 'nipple', + 681: 'notebook, notebook computer', + 682: 'obelisk', + 683: 'oboe, hautboy, hautbois', + 684: 'ocarina, sweet potato', + 685: 'odometer, hodometer, mileometer, milometer', + 686: 'oil filter', + 687: 'organ, pipe organ', + 688: 'oscilloscope, scope, cathode-ray oscilloscope, CRO', + 689: 'overskirt', + 690: 'oxcart', + 691: 'oxygen mask', + 692: 'packet', + 693: 'paddle, boat paddle', + 694: 'paddlewheel, paddle wheel', + 695: 'padlock', + 696: 'paintbrush', + 697: "pajama, pyjama, pj's, jammies", + 698: 'palace', + 699: 'panpipe, pandean pipe, syrinx', + 700: 'paper towel', + 701: 'parachute, chute', + 702: 'parallel bars, bars', + 703: 'park bench', + 704: 'parking meter', + 705: 'passenger car, coach, carriage', + 706: 'patio, terrace', + 707: 'pay-phone, pay-station', + 708: 'pedestal, plinth, footstall', + 709: 'pencil box, pencil case', + 710: 'pencil sharpener', + 711: 'perfume, essence', + 712: 'Petri dish', + 713: 'photocopier', + 714: 'pick, plectrum, plectron', + 715: 'pickelhaube', + 716: 'picket fence, paling', + 717: 'pickup, 
pickup truck', + 718: 'pier', + 719: 'piggy bank, penny bank', + 720: 'pill bottle', + 721: 'pillow', + 722: 'ping-pong ball', + 723: 'pinwheel', + 724: 'pirate, pirate ship', + 725: 'pitcher, ewer', + 726: "plane, carpenter's plane, woodworking plane", + 727: 'planetarium', + 728: 'plastic bag', + 729: 'plate rack', + 730: 'plow, plough', + 731: "plunger, plumber's helper", + 732: 'Polaroid camera, Polaroid Land camera', + 733: 'pole', + 734: 'police van, police wagon, paddy wagon, patrol wagon, wagon, black Maria', + 735: 'poncho', + 736: 'pool table, billiard table, snooker table', + 737: 'pop bottle, soda bottle', + 738: 'pot, flowerpot', + 739: "potter's wheel", + 740: 'power drill', + 741: 'prayer rug, prayer mat', + 742: 'printer', + 743: 'prison, prison house', + 744: 'projectile, missile', + 745: 'projector', + 746: 'puck, hockey puck', + 747: 'punching bag, punch bag, punching ball, punchball', + 748: 'purse', + 749: 'quill, quill pen', + 750: 'quilt, comforter, comfort, puff', + 751: 'racer, race car, racing car', + 752: 'racket, racquet', + 753: 'radiator', + 754: 'radio, wireless', + 755: 'radio telescope, radio reflector', + 756: 'rain barrel', + 757: 'recreational vehicle, RV, R.V.', + 758: 'reel', + 759: 'reflex camera', + 760: 'refrigerator, icebox', + 761: 'remote control, remote', + 762: 'restaurant, eating house, eating place, eatery', + 763: 'revolver, six-gun, six-shooter', + 764: 'rifle', + 765: 'rocking chair, rocker', + 766: 'rotisserie', + 767: 'rubber eraser, rubber, pencil eraser', + 768: 'rugby ball', + 769: 'rule, ruler', + 770: 'running shoe', + 771: 'safe', + 772: 'safety pin', + 773: 'saltshaker, salt shaker', + 774: 'sandal', + 775: 'sarong', + 776: 'sax, saxophone', + 777: 'scabbard', + 778: 'scale, weighing machine', + 779: 'school bus', + 780: 'schooner', + 781: 'scoreboard', + 782: 'screen, CRT screen', + 783: 'screw', + 784: 'screwdriver', + 785: 'seat belt, seatbelt', + 786: 'sewing machine', + 787: 'shield, buckler', + 788: 'shoe shop, shoe-shop, shoe store', + 789: 'shoji', + 790: 'shopping basket', + 791: 'shopping cart', + 792: 'shovel', + 793: 'shower cap', + 794: 'shower curtain', + 795: 'ski', + 796: 'ski mask', + 797: 'sleeping bag', + 798: 'slide rule, slipstick', + 799: 'sliding door', + 800: 'slot, one-armed bandit', + 801: 'snorkel', + 802: 'snowmobile', + 803: 'snowplow, snowplough', + 804: 'soap dispenser', + 805: 'soccer ball', + 806: 'sock', + 807: 'solar dish, solar collector, solar furnace', + 808: 'sombrero', + 809: 'soup bowl', + 810: 'space bar', + 811: 'space heater', + 812: 'space shuttle', + 813: 'spatula', + 814: 'speedboat', + 815: "spider web, spider's web", + 816: 'spindle', + 817: 'sports car, sport car', + 818: 'spotlight, spot', + 819: 'stage', + 820: 'steam locomotive', + 821: 'steel arch bridge', + 822: 'steel drum', + 823: 'stethoscope', + 824: 'stole', + 825: 'stone wall', + 826: 'stopwatch, stop watch', + 827: 'stove', + 828: 'strainer', + 829: 'streetcar, tram, tramcar, trolley, trolley car', + 830: 'stretcher', + 831: 'studio couch, day bed', + 832: 'stupa, tope', + 833: 'submarine, pigboat, sub, U-boat', + 834: 'suit, suit of clothes', + 835: 'sundial', + 836: 'sunglass', + 837: 'sunglasses, dark glasses, shades', + 838: 'sunscreen, sunblock, sun blocker', + 839: 'suspension bridge', + 840: 'swab, swob, mop', + 841: 'sweatshirt', + 842: 'swimming trunks, bathing trunks', + 843: 'swing', + 844: 'switch, electric switch, electrical switch', + 845: 'syringe', + 846: 'table lamp', + 847: 'tank, army tank, armored 
combat vehicle, armoured combat vehicle', + 848: 'tape player', + 849: 'teapot', + 850: 'teddy, teddy bear', + 851: 'television, television system', + 852: 'tennis ball', + 853: 'thatch, thatched roof', + 854: 'theater curtain, theatre curtain', + 855: 'thimble', + 856: 'thresher, thrasher, threshing machine', + 857: 'throne', + 858: 'tile roof', + 859: 'toaster', + 860: 'tobacco shop, tobacconist shop, tobacconist', + 861: 'toilet seat', + 862: 'torch', + 863: 'totem pole', + 864: 'tow truck, tow car, wrecker', + 865: 'toyshop', + 866: 'tractor', + 867: 'trailer truck, tractor trailer, trucking rig, rig, articulated lorry, semi', + 868: 'tray', + 869: 'trench coat', + 870: 'tricycle, trike, velocipede', + 871: 'trimaran', + 872: 'tripod', + 873: 'triumphal arch', + 874: 'trolleybus, trolley coach, trackless trolley', + 875: 'trombone', + 876: 'tub, vat', + 877: 'turnstile', + 878: 'typewriter keyboard', + 879: 'umbrella', + 880: 'unicycle, monocycle', + 881: 'upright, upright piano', + 882: 'vacuum, vacuum cleaner', + 883: 'vase', + 884: 'vault', + 885: 'velvet', + 886: 'vending machine', + 887: 'vestment', + 888: 'viaduct', + 889: 'violin, fiddle', + 890: 'volleyball', + 891: 'waffle iron', + 892: 'wall clock', + 893: 'wallet, billfold, notecase, pocketbook', + 894: 'wardrobe, closet, press', + 895: 'warplane, military plane', + 896: 'washbasin, handbasin, washbowl, lavabo, wash-hand basin', + 897: 'washer, automatic washer, washing machine', + 898: 'water bottle', + 899: 'water jug', + 900: 'water tower', + 901: 'whiskey jug', + 902: 'whistle', + 903: 'wig', + 904: 'window screen', + 905: 'window shade', + 906: 'Windsor tie', + 907: 'wine bottle', + 908: 'wing', + 909: 'wok', + 910: 'wooden spoon', + 911: 'wool, woolen, woollen', + 912: 'worm fence, snake fence, snake-rail fence, Virginia fence', + 913: 'wreck', + 914: 'yawl', + 915: 'yurt', + 916: 'web site, website, internet site, site', + 917: 'comic book', + 918: 'crossword puzzle, crossword', + 919: 'street sign', + 920: 'traffic light, traffic signal, stoplight', + 921: 'book jacket, dust cover, dust jacket, dust wrapper', + 922: 'menu', + 923: 'plate', + 924: 'guacamole', + 925: 'consomme', + 926: 'hot pot, hotpot', + 927: 'trifle', + 928: 'ice cream, icecream', + 929: 'ice lolly, lolly, lollipop, popsicle', + 930: 'French loaf', + 931: 'bagel, beigel', + 932: 'pretzel', + 933: 'cheeseburger', + 934: 'hotdog, hot dog, red hot', + 935: 'mashed potato', + 936: 'head cabbage', + 937: 'broccoli', + 938: 'cauliflower', + 939: 'zucchini, courgette', + 940: 'spaghetti squash', + 941: 'acorn squash', + 942: 'butternut squash', + 943: 'cucumber, cuke', + 944: 'artichoke, globe artichoke', + 945: 'bell pepper', + 946: 'cardoon', + 947: 'mushroom', + 948: 'Granny Smith', + 949: 'strawberry', + 950: 'orange', + 951: 'lemon', + 952: 'fig', + 953: 'pineapple, ananas', + 954: 'banana', + 955: 'jackfruit, jak, jack', + 956: 'custard apple', + 957: 'pomegranate', + 958: 'hay', + 959: 'carbonara', + 960: 'chocolate sauce, chocolate syrup', + 961: 'dough', + 962: 'meat loaf, meatloaf', + 963: 'pizza, pizza pie', + 964: 'potpie', + 965: 'burrito', + 966: 'red wine', + 967: 'espresso', + 968: 'cup', + 969: 'eggnog', + 970: 'alp', + 971: 'bubble', + 972: 'cliff, drop, drop-off', + 973: 'coral reef', + 974: 'geyser', + 975: 'lakeside, lakeshore', + 976: 'promontory, headland, head, foreland', + 977: 'sandbar, sand bar', + 978: 'seashore, coast, seacoast, sea-coast', + 979: 'valley, vale', + 980: 'volcano', + 981: 'ballplayer, baseball player', + 982: 
'groom, bridegroom',
+ 983: 'scuba diver',
+ 984: 'rapeseed',
+ 985: 'daisy',
+ 986: "yellow lady's slipper, yellow lady-slipper, Cypripedium calceolus, Cypripedium parviflorum",
+ 987: 'corn',
+ 988: 'acorn',
+ 989: 'hip, rose hip, rosehip',
+ 990: 'buckeye, horse chestnut, conker',
+ 991: 'coral fungus',
+ 992: 'agaric',
+ 993: 'gyromitra',
+ 994: 'stinkhorn, carrion fungus',
+ 995: 'earthstar',
+ 996: 'hen-of-the-woods, hen of the woods, Polyporus frondosus, Grifola frondosa',
+ 997: 'bolete',
+ 998: 'ear, spike, capitulum',
+ 999: 'toilet tissue, toilet paper, bathroom tissue'}
diff --git a/tutorials/inference.md b/tutorials/inference.md
new file mode 100644
index 0000000..28f8176
--- /dev/null
+++ b/tutorials/inference.md
@@ -0,0 +1,148 @@
+# Image Classification Prediction
+
+[![Download Notebook](https://mindspore-website.obs.cn-north-4.myhuaweicloud.com/website-images/r1.8/resource/_static/logo_notebook.png)](https://download.mindspore.cn/toolkits/mindcv/tutorials/inference.ipynb)&emsp;
+
+
+This tutorial introduces how to use a pretrained model in MindCV to classify a test image.
+
+## Model Loading
+
+### View All Available Models
+
+Calling the `registry.list_models` function in `mindcv.models` prints the names of all available network models. Variants of a network with different parameter configurations are also listed, such as resnet18 / resnet34 / resnet50 / resnet101 / resnet152.
+
+
+```python
+import sys
+sys.path.append("..")
+from mindcv.models import registry
+registry.list_models()
+```
+
+    ['BiTresnet50',
+     'RepMLPNet_B224',
+     'RepMLPNet_B256',
+     'RepMLPNet_D256',
+     'RepMLPNet_L256',
+     'RepMLPNet_T224',
+     'RepMLPNet_T256',
+     'convit_base',
+     'convit_base_plus',
+     'convit_small',
+     ...
+     'visformer_small',
+     'visformer_small_v2',
+     'visformer_tiny',
+     'visformer_tiny_v2',
+     'vit_b_16_224',
+     'vit_b_16_384',
+     'vit_b_32_224',
+     'vit_b_32_384',
+     'vit_l_16_224',
+     'vit_l_16_384',
+     'vit_l_32_224',
+     'xception']
+
+
+
+### Load a Pretrained Model
+
+Taking the resnet50 model as an example, we introduce two ways to load a model checkpoint with the `create_model` function in `mindcv.models`.
+
+1). When the `pretrained` parameter of the interface is set to `True`, the network weights are downloaded automatically.
+
+
+```python
+from mindcv.models import create_model
+model = create_model(model_name='resnet50', num_classes=1000, pretrained=True)
+# Switch the execution logic of the network to the inference scenario
+model.set_train(False)
+```
+
+    102453248B [00:16, 6092186.31B/s]
+
+    ResNet<
+      (conv1): Conv2d
+      (bn1): BatchNorm2d
+      (relu): ReLU<>
+      (max_pool): MaxPool2d
+      ...
+      (pool): GlobalAvgPooling<>
+      (classifier): Dense
+      >
+
+
+
+2). When the `checkpoint_path` parameter of the interface is set to a file path, a local model parameter file with the `.ckpt` suffix can be loaded.
+
+
+```python
+from mindcv.models import create_model
+model = create_model(model_name='resnet50', num_classes=1000, checkpoint_path='./resnet50_224.ckpt')
+# Switch the execution logic of the network to the inference scenario
+model.set_train(False)
+```
+
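+As an optional sanity check, you can count the trainable parameters of the loaded network and run a dummy forward pass to confirm that the classifier head produces 1000 ImageNet logits. The snippet below is a minimal sketch that assumes the `model` object created above and the 224x224 input resolution used by the resnet50 recipe.
+
+```python
+import numpy as np
+import mindspore as ms
+
+# Total number of trainable parameters, in millions
+num_params = sum(np.prod(p.shape) for p in model.trainable_params())
+print('trainable parameters: {:.2f} M'.format(num_params / 1e6))
+
+# Dummy forward pass with a random 224x224 RGB image (batch size 1)
+dummy = ms.Tensor(np.random.rand(1, 3, 224, 224), ms.float32)
+logits = model(dummy)
+print(logits.shape)  # expected: (1, 1000)
+```
+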
+
+## Data Preparation
+
+### Create Dataset
+
+Here, we download a Wikipedia image as a test image, and use the `create_dataset` function in `mindcv.data` to construct a custom dataset for a single image.
+
+
+```python
+from mindcv.data import create_dataset
+num_workers = 1
+# path of the dataset
+data_dir = "./data/"
+dataset = create_dataset(root=data_dir, split='test', num_parallel_workers=num_workers)
+# Image visualization
+from PIL import Image
+Image.open("./data/test/dog/dog.jpg")
+```
+
+![png](output_8_0.png)
+
+
+### Data Preprocessing
+
+Call the `create_transforms` function to obtain the data processing strategy (transform list) of the ImageNet dataset used by the pretrained model.
+
+We pass the obtained transform list to the `create_loader` function and specify `batch_size=1` and other parameters to complete the preparation of the test data. The returned `Dataset` object serves as the input of the model.
+
+
+```python
+from mindcv.data import create_transforms, create_loader
+transforms_list = create_transforms(dataset_name='imagenet', is_training=False)
+data_loader = create_loader(
+        dataset=dataset,
+        batch_size=1,
+        is_training=False,
+        num_classes=1000,
+        transform=transforms_list,
+        num_parallel_workers=num_workers
+    )
+```
+
+## Model Inference
+
+Pass the image from the custom dataset to the model to obtain the prediction. Here, the `Squeeze` operator of `mindspore.ops` is used to remove the batch dimension.
+
+
+```python
+import mindspore.ops as P
+import numpy as np
+images, _ = next(data_loader.create_tuple_iterator())
+output = P.Squeeze()(model(images))
+pred = np.argmax(output.asnumpy())
+```
+
+
+```python
+# The label file stores a Python dict literal that maps each class index to its label name,
+# so eval() turns it back into a dict for the lookup below.
+with open("imagenet1000_clsidx_to_labels.txt") as f:
+    idx2label = eval(f.read())
+print('predict: {}'.format(idx2label[pred]))
+```
+
+    predict: Labrador retriever
diff --git a/tutorials/inference_CN.md b/tutorials/inference_CN.md
new file mode 100644
index 0000000..4096ca2
--- /dev/null
+++ b/tutorials/inference_CN.md
@@ -0,0 +1,159 @@
+# 图像分类预测
+
+[![下载Notebook](https://mindspore-website.obs.cn-north-4.myhuaweicloud.com/website-images/r1.8/resource/_static/logo_notebook.png)](https://download.mindspore.cn/toolkits/mindcv/tutorials/inference_CN.ipynb)&emsp;
+
+
+本教程介绍如何在MindCV中调用预训练模型,在测试图像上进行分类预测。
+
+## 模型加载
+
+### 查看全部可用的网络模型
+通过调用`mindcv.models`中的`registry.list_models`函数,可以打印出全部网络模型的名字,一个网络在不同参数配置下的模型也会分别打印出来,例如resnet18 / resnet34 / resnet50 / resnet101 / resnet152。
+
+
+```python
+import sys
+sys.path.append("..")
+from mindcv.models import registry
+registry.list_models()
+```
+
+    ['BiTresnet50',
+     'RepMLPNet_B224',
+     'RepMLPNet_B256',
+     'RepMLPNet_D256',
+     'RepMLPNet_L256',
+     'RepMLPNet_T224',
+     'RepMLPNet_T256',
+     'convit_base',
+     'convit_base_plus',
+     'convit_small',
+     ...
+     'visformer_small',
+     'visformer_small_v2',
+     'visformer_tiny',
+     'visformer_tiny_v2',
+     'vit_b_16_224',
+     'vit_b_16_384',
+     'vit_b_32_224',
+     'vit_b_32_384',
+     'vit_l_16_224',
+     'vit_l_16_384',
+     'vit_l_32_224',
+     'xception']
+
+
+
+### 加载预训练模型
+我们以resnet50模型为例,介绍两种使用`mindcv.models`中`create_model`函数进行模型checkpoint加载的方法。
+1). 当接口中的`pretrained`参数设置为True时,可以自动下载网络权重。
+
+
+```python
+from mindcv.models import create_model
+model = create_model(model_name='resnet50', num_classes=1000, pretrained=True)
+# 切换网络的执行逻辑为推理场景
+model.set_train(False)
+```
+
+    102453248B [00:16, 6092186.31B/s]
+
+    ResNet<
+      (conv1): Conv2d
+      (bn1): BatchNorm2d
+      (relu): ReLU<>
+      (max_pool): MaxPool2d
+      ...
+      (pool): GlobalAvgPooling<>
+      (classifier): Dense
+      >
+
+
+
+2).
当接口中的`checkpoint_path`参数设置为文件路径时,可以从本地加载后缀为`.ckpt`的模型参数文件。 + + +```python +from mindcv.models import create_model +model = create_model(model_name='resnet50', num_classes=1000, checkpoint_path='./resnet50_224.ckpt') +# 切换网络的执行逻辑为推理场景 +model.set_train(False) +``` + + +## 数据准备 + +### 构造数据集 +这里,我们下载一张Wikipedia的图片作为测试图片,使用`mindcv.data`中的`create_dataset`函数,为单张图片构造自定义数据集。 + + +```python +from mindcv.data import create_dataset +num_workers = 1 +# 数据集目录路径 +data_dir = "./data/" +dataset = create_dataset(root=data_dir, split='test', num_parallel_workers=num_workers) +# 图像可视 +from PIL import Image +Image.open("./data/test/dog/dog.jpg") +``` + + + + +![png](output_8_0.png) + + + +数据集的目录结构如下: + +```Text +data/ +└─ test + ├─ dog + │   ├─ dog.jpg + │   └─ …… + └─ …… +``` + +### 数据预处理 +通过调用`create_transforms`函数,获得预训练模型使用的ImageNet数据集的数据处理策略(transform list)。 + +我们将得到的transform list传入`create_loader`函数,指定`batch_size=1`和其他参数,即可完成测试数据的准备,返回`Dataset` Object,作为模型的输入。 + + +```python +from mindcv.data import create_transforms, create_loader +transforms_list = create_transforms(dataset_name='imagenet', is_training=False) +data_loader = create_loader( + dataset=dataset, + batch_size=1, + is_training=False, + num_classes=1000, + transform=transforms_list, + num_parallel_workers=num_workers + ) +``` + +## 模型推理 +将自定义数据集的图片传入模型,获得推理的结果。这里使用`mindspore.ops`的`Squeeze`函数去除batch维度。 + + +```python +import mindspore.ops as P +import numpy as np +images, _ = next(data_loader.create_tuple_iterator()) +output = P.Squeeze()(model(images)) +pred = np.argmax(output.asnumpy()) +``` + + +```python +with open("imagenet1000_clsidx_to_labels.txt") as f: + idx2label = eval(f.read()) +print('predict: {}'.format(idx2label[pred])) +``` + + predict: Labrador retriever diff --git a/tutorials/learn_about_config.md b/tutorials/learn_about_config.md new file mode 100644 index 0000000..2049516 --- /dev/null +++ b/tutorials/learn_about_config.md @@ -0,0 +1,420 @@ +# Understanding Model Configuration + +[![下载Notebook](https://mindspore-website.obs.cn-north-4.myhuaweicloud.com/website-images/r1.8/resource/_static/logo_notebook.png)](https://download.mindspore.cn/toolkits/mindcv/tutorials/learn_about_config.ipynb)  + + +`mindcv` can parse the yaml file of the model through the `argparse` library and `pyyaml` library to configure parameters. Let's use squeezenet_1.0 model as an example to explain how to configure the corresponding parameters. + + +## Basic Environment + +1. Parameter description + +- mode: Use graph mode (0) or pynative mode (1). + +- distribute: Whether to use distributed. + + +2. Sample yaml file + +```text +mode: 0 +distribute: True +... +``` + +3. Parse parameter setting + +```text +python train.py --mode 0 --distribute False ... +``` + +4. Corresponding code example + +> `args.model` represents the parameter `mode`, `args.distribute` represents the parameter `distribute`。 + +```python +def train(args): + ms.set_context(mode=args.mode) + + if args.distribute: + init() + device_num = get_group_size() + rank_id = get_rank() + ms.set_auto_parallel_context(device_num=device_num, + parallel_mode='data_parallel', + gradients_mean=True) + else: + device_num = None + rank_id = None + ... +``` + + +## Dataset + +1. Parameter description + +- dataset: dataset name. + +- data_dir: Path of dataset file. + +- shuffle: whether to shuffle the dataset. + +- dataset_download: whether to download the dataset. + +- batch_size: The number of rows each batch. 
+ +- drop_remainder: Determines whether to drop the last block whose data row number is less than batch size. + +- num_parallel_workers: Number of workers(threads) to process the dataset in parallel. + + +2. Sample yaml file + +```text +dataset: 'imagenet' +data_dir: './imagenet2012' +shuffle: True +dataset_download: False +batch_size: 32 +drop_remainder: True +num_parallel_workers: 8 +... +``` + +3. Parse parameter setting + +```text +python train.py ... --dataset imagenet --data_dir ./imagenet2012 --shuffle True \ + --dataset_download False --batch_size 32 --drop_remainder True \ + --num_parallel_workers 8 ... +``` + +4. Corresponding code example + +```python +def train(args): + ... + dataset_train = create_dataset( + name=args.dataset, + root=args.data_dir, + split='train', + shuffle=args.shuffle, + num_samples=args.num_samples, + num_shards=device_num, + shard_id=rank_id, + num_parallel_workers=args.num_parallel_workers, + download=args.dataset_download, + num_aug_repeats=args.aug_repeats) + + ... + target_transform = transforms.OneHot(num_classes) if args.loss == 'BCE' else None + + loader_train = create_loader( + dataset=dataset_train, + batch_size=args.batch_size, + drop_remainder=args.drop_remainder, + is_training=True, + mixup=args.mixup, + cutmix=args.cutmix, + cutmix_prob=args.cutmix_prob, + num_classes=args.num_classes, + transform=transform_list, + target_transform=target_transform, + num_parallel_workers=args.num_parallel_workers, + ) + + ... +``` + +## Data Augmentation + +1. Parameter description + +- image_resize: the image size after resize for adapting to network. + +- scale: random resize scale. + +- ratio: random resize aspect ratio. + +- hfilp: horizontal flip training aug probability. + +- interpolation: image interpolation mode for resize operator. + +- crop_pct: input image center crop percent. + +- color_jitter: color jitter factor. + +- re_prob: probability of performing erasing. + +2. Sample yaml file + +```text +image_resize: 224 +scale: [0.08, 1.0] +ratio: [0.75, 1.333] +hflip: 0.5 +interpolation: 'bilinear' +crop_pct: 0.875 +color_jitter: [0.4, 0.4, 0.4] +re_prob: 0.5 +... +``` + +3. Parse parameter setting + +```text +python train.py ... --image_resize 224 --scale [0.08, 1.0] --ratio [0.75, 1.333] \ + --hflip 0.5 --interpolation "bilinear" --crop_pct 0.875 \ + --color_jitter [0.4, 0.4, 0.4] --re_prob 0.5 ... +``` + +4. Corresponding code example + +```python +def train(args): + ... + transform_list = create_transforms( + dataset_name=args.dataset, + is_training=True, + image_resize=args.image_resize, + scale=args.scale, + ratio=args.ratio, + hflip=args.hflip, + vflip=args.vflip, + color_jitter=args.color_jitter, + interpolation=args.interpolation, + auto_augment=args.auto_augment, + mean=args.mean, + std=args.std, + re_prob=args.re_prob, + re_scale=args.re_scale, + re_ratio=args.re_ratio, + re_value=args.re_value, + re_max_attempts=args.re_max_attempts + ) + ... +``` + +## Model + +1. Parameter description + +- model: model name。 + +- num_classes: number of label classes.。 + +- pretrained: whether load pretrained model。 + +- ckpt_path: initialize model from this checkpoint.。 + +- keep_checkpoint_max: max number of checkpoint files。 + +- ckpt_save_dir: path of checkpoint. + +- epoch_size: train epoch size. + +- dataset_sink_mode: the dataset sink mode。 + +- amp_level: auto mixed precision level for saving memory and acceleration. + +2. 
Sample yaml file + +```text +model: 'squeezenet1_0' +num_classes: 1000 +pretrained: False +ckpt_path: './squeezenet1_0_gpu.ckpt' +keep_checkpoint_max: 10 +ckpt_save_dir: './ckpt/' +epoch_size: 200 +dataset_sink_mode: True +amp_level: 'O0' +... +``` + +3. Parse parameter setting + +```text +python train.py ... --model squeezenet1_0 --num_classes 1000 --pretrained False \ + --ckpt_path ./squeezenet1_0_gpu.ckpt --keep_checkpoint_max 10 \ + --ckpt_save_path ./ckpt/ --epoch_size 200 --dataset_sink_mode True \ + --amp_level O0 ... +``` + +4. Corresponding code example + +```python +def train(args): + ... + network = create_model(model_name=args.model, + num_classes=args.num_classes, + in_channels=args.in_channels, + drop_rate=args.drop_rate, + drop_path_rate=args.drop_path_rate, + pretrained=args.pretrained, + checkpoint_path=args.ckpt_path, + ema=args.ema) + ... +``` + +## Loss Function + +1. Parameter description + +- loss: name of loss function, BCE (BinaryCrossEntropy) or CE (CrossEntropy). + +- label_smoothing: use label smoothing. + +2. Sample yaml file + +```text +loss: 'CE' +label_smoothing: 0.1 +... +``` + +3. Parse parameter setting + +```text +python train.py ... --loss CE --label_smoothing 0.1 ... +``` + +4. Corresponding code example + +```python +def train(args): + ... + loss = create_loss(name=args.loss, + reduction=args.reduction, + label_smoothing=args.label_smoothing, + aux_factor=args.aux_factor) + ... +``` + +## Learning Rate Scheduler + +1. Parameter description + +- scheduler: name of scheduler. + +- min_lr: the minimum value of learning rate if scheduler supports. + +- lr: learning rate. + +- warmup_epochs: warmup epochs. + +- decay_epochs: decay epochs. + +2. Sample yaml file + +```text +scheduler: 'cosine_decay' +min_lr: 0.0 +lr: 0.01 +warmup_epochs: 0 +decay_epochs: 200 +... +``` + +3. Parse parameter setting + +```text +python train.py ... --scheduler cosine_decay --min_lr 0.0 --lr 0.01 \ + --warmup_epochs 0 --decay_epochs 200 ... +``` + +4. Corresponding code example + +```python +def train(args): + ... + lr_scheduler = create_scheduler(num_batches, + scheduler=args.scheduler, + lr=args.lr, + min_lr=args.min_lr, + warmup_epochs=args.warmup_epochs, + warmup_factor=args.warmup_factor, + decay_epochs=args.decay_epochs, + decay_rate=args.decay_rate, + milestones=args.multi_step_decay_milestones, + num_epochs=args.epoch_size, + lr_epoch_stair=args.lr_epoch_stair) + ... +``` + + +## optimizer + +1. Parameter description + +- opt: name of optimizer。 + +- filter_bias_and_bn: filter Bias and BatchNorm. + +- momentum: Hyperparameter of type float, means momentum for the moving average. + +- weight_decay: weight decay(L2 penalty)。 + +- loss_scale: gradient scaling factor + +- use_nesterov: whether enables the Nesterov momentum + +2. Sample yaml file + +```text +opt: 'momentum' +filter_bias_and_bn: True +momentum: 0.9 +weight_decay: 0.00007 +loss_scale: 1024 +use_nesterov: False +... +``` + +3. Parse parameter setting + +```text +python train.py ... --opt momentum --filter_bias_and_bn True --weight_decay 0.00007 \ + --loss_scale 1024 --use_nesterov False ... +``` + +4. Corresponding code example + +```python +def train(args): + ... 
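+    # Build the optimizer from the parsed arguments.
+    # Note that when EMA is enabled, loss_scale is additionally passed to create_optimizer.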
+ if args.ema: + optimizer = create_optimizer(network.trainable_params(), + opt=args.opt, + lr=lr_scheduler, + weight_decay=args.weight_decay, + momentum=args.momentum, + nesterov=args.use_nesterov, + filter_bias_and_bn=args.filter_bias_and_bn, + loss_scale=args.loss_scale, + checkpoint_path=opt_ckpt_path, + eps=args.eps) + else: + optimizer = create_optimizer(network.trainable_params(), + opt=args.opt, + lr=lr_scheduler, + weight_decay=args.weight_decay, + momentum=args.momentum, + nesterov=args.use_nesterov, + filter_bias_and_bn=args.filter_bias_and_bn, + checkpoint_path=opt_ckpt_path, + eps=args.eps) + ... +``` + + +## Combination of Yaml and Parse + +You can override the parameter settings in the yaml file by using parse to set parameters. Take the following shell command as an example, + +```shell +python train.py -c ./configs/squeezenet/squeezenet_1.0_gpu.yaml --data_dir ./data +``` +The above command overwrites the value of `args.data_dir` parameter from ./imaget2012 in yaml file to ./data. diff --git a/tutorials/learn_about_config_CN.md b/tutorials/learn_about_config_CN.md new file mode 100644 index 0000000..f8596cf --- /dev/null +++ b/tutorials/learn_about_config_CN.md @@ -0,0 +1,420 @@ +# 了解模型配置 + +[![下载Notebook](https://mindspore-website.obs.cn-north-4.myhuaweicloud.com/website-images/r1.8/resource/_static/logo_notebook.png)](https://download.mindspore.cn/toolkits/mindcv/tutorials/learn_about_config_CN.ipynb)  + + +`mindcv`套件可以通过python的`argparse`库和`pyyaml`库解析模型的yaml文件来进行参数的配置,下面我们以squeezenet_1.0模型为例,解释如何配置相应的参数。 + + +## 基础环境 + +1. 参数说明 + +- mode:使用静态图模式(0)或动态图模式(1)。 + +- distribute:是否使用分布式。 + + +2. yaml文件样例 + +```text +mode: 0 +distribute: True +... +``` + +3. parse参数设置 + +```text +python train.py --mode 0 --distribute False ... +``` + +4. 对应的代码示例 + +> `args.model`代表参数`mode`, `args.distribute`代表参数`distribute`。 + +```python +def train(args): + ms.set_context(mode=args.mode) + + if args.distribute: + init() + device_num = get_group_size() + rank_id = get_rank() + ms.set_auto_parallel_context(device_num=device_num, + parallel_mode='data_parallel', + gradients_mean=True) + else: + device_num = None + rank_id = None + ... +``` + + +## 数据集 + +1. 参数说明 + +- dataset:数据集名称。 + +- data_dir:数据集文件所在路径。 + +- shuffle:是否进行数据混洗。 + +- dataset_download:是否下载数据集。 + +- batch_size:每个批处理数据包含的数据条目。 + +- drop_remainder:当最后一个批处理数据包含的数据条目小于 batch_size 时,是否将该批处理丢弃。 + +- num_parallel_workers:读取数据的工作线程数。 + + +2. yaml文件样例 + +```text +dataset: 'imagenet' +data_dir: './imagenet2012' +shuffle: True +dataset_download: False +batch_size: 32 +drop_remainder: True +num_parallel_workers: 8 +... +``` + +3. parse参数设置 + +```text +python train.py ... --dataset imagenet --data_dir ./imagenet2012 --shuffle True \ + --dataset_download False --batch_size 32 --drop_remainder True \ + --num_parallel_workers 8 ... +``` + +4. 对应的代码示例 + +```python +def train(args): + ... + dataset_train = create_dataset( + name=args.dataset, + root=args.data_dir, + split='train', + shuffle=args.shuffle, + num_samples=args.num_samples, + num_shards=device_num, + shard_id=rank_id, + num_parallel_workers=args.num_parallel_workers, + download=args.dataset_download, + num_aug_repeats=args.aug_repeats) + + ... 
+ target_transform = transforms.OneHot(num_classes) if args.loss == 'BCE' else None + + loader_train = create_loader( + dataset=dataset_train, + batch_size=args.batch_size, + drop_remainder=args.drop_remainder, + is_training=True, + mixup=args.mixup, + cutmix=args.cutmix, + cutmix_prob=args.cutmix_prob, + num_classes=args.num_classes, + transform=transform_list, + target_transform=target_transform, + num_parallel_workers=args.num_parallel_workers, + ) + + ... +``` + +## 数据增强 + +1. 参数说明 + +- image_resize:图像的输出尺寸大小。 + +- scale:要裁剪的原始尺寸大小的各个尺寸的范围。 + +- ratio:裁剪宽高比的范围。 + +- hfilp:图像被翻转的概率。 + +- interpolation:图像插值方式。 + +- crop_pct:输入图像中心裁剪百分比。 + +- color_jitter:颜色抖动因子(亮度调整因子,对比度调整因子,饱和度调整因子)。 + +- re_prob:执行随机擦除的概率。 + +2. yaml文件样例 + +```text +image_resize: 224 +scale: [0.08, 1.0] +ratio: [0.75, 1.333] +hflip: 0.5 +interpolation: 'bilinear' +crop_pct: 0.875 +color_jitter: [0.4, 0.4, 0.4] +re_prob: 0.5 +... +``` + +3. parse参数设置 + +```text +python train.py ... --image_resize 224 --scale [0.08, 1.0] --ratio [0.75, 1.333] \ + --hflip 0.5 --interpolation "bilinear" --crop_pct 0.875 \ + --color_jitter [0.4, 0.4, 0.4] --re_prob 0.5 ... +``` + +4. 对应的代码示例 + +```python +def train(args): + ... + transform_list = create_transforms( + dataset_name=args.dataset, + is_training=True, + image_resize=args.image_resize, + scale=args.scale, + ratio=args.ratio, + hflip=args.hflip, + vflip=args.vflip, + color_jitter=args.color_jitter, + interpolation=args.interpolation, + auto_augment=args.auto_augment, + mean=args.mean, + std=args.std, + re_prob=args.re_prob, + re_scale=args.re_scale, + re_ratio=args.re_ratio, + re_value=args.re_value, + re_max_attempts=args.re_max_attempts + ) + ... +``` + +## 模型 + +1. 参数说明 + +- model:模型名称。 + +- num_classes:分类的类别数。 + +- pretrained:是否加载预训练模型。 + +- ckpt_path:参数文件所在的路径。 + +- keep_checkpoint_max:最多保存多少个checkpoint文件。 + +- ckpt_save_dir:保存参数文件的路径。 + +- epoch_size:训练执行轮次。 + +- dataset_sink_mode:数据是否直接下沉至处理器进行处理。 + +- amp_level:混合精度等级。 + +2. yaml文件样例 + +```text +model: 'squeezenet1_0' +num_classes: 1000 +pretrained: False +ckpt_path: './squeezenet1_0_gpu.ckpt' +keep_checkpoint_max: 10 +ckpt_save_dir: './ckpt/' +epoch_size: 200 +dataset_sink_mode: True +amp_level: 'O0' +... +``` + +3. parse参数设置 + +```text +python train.py ... --model squeezenet1_0 --num_classes 1000 --pretrained False \ + --ckpt_path ./squeezenet1_0_gpu.ckpt --keep_checkpoint_max 10 \ + --ckpt_save_path ./ckpt/ --epoch_size 200 --dataset_sink_mode True \ + --amp_level O0 ... +``` + +4. 对应的代码示例 + +```python +def train(args): + ... + network = create_model(model_name=args.model, + num_classes=args.num_classes, + in_channels=args.in_channels, + drop_rate=args.drop_rate, + drop_path_rate=args.drop_path_rate, + pretrained=args.pretrained, + checkpoint_path=args.ckpt_path, + ema=args.ema) + ... +``` + +## 损失函数 + +1. 参数说明 + +- loss:损失函数的简称。 + +- label_smoothing:标签平滑值,用于计算Loss时防止模型过拟合的正则化手段。 + +2. yaml文件样例 + +```text +loss: 'CE' +label_smoothing: 0.1 +... +``` + +3. parse参数设置 + +```text +python train.py ... --loss CE --label_smoothing 0.1 ... +``` + +4. 对应的代码示例 + +```python +def train(args): + ... + loss = create_loss(name=args.loss, + reduction=args.reduction, + label_smoothing=args.label_smoothing, + aux_factor=args.aux_factor) + ... +``` + +## 学习率策略 + +1. 参数说明 + +- scheduler:学习率策略的名称。 + +- min_lr:学习率的最小值。 + +- lr:学习率的最大值。 + +- warmup_epochs:学习率warmup的轮次。 + +- decay_epochs:进行衰减的step数。 + +2. yaml文件样例 + +```text +scheduler: 'cosine_decay' +min_lr: 0.0 +lr: 0.01 +warmup_epochs: 0 +decay_epochs: 200 +... +``` + +3. 
parse参数设置 + +```text +python train.py ... --scheduler cosine_decay --min_lr 0.0 --lr 0.01 \ + --warmup_epochs 0 --decay_epochs 200 ... +``` + +4. 对应的代码示例 + +```python +def train(args): + ... + lr_scheduler = create_scheduler(num_batches, + scheduler=args.scheduler, + lr=args.lr, + min_lr=args.min_lr, + warmup_epochs=args.warmup_epochs, + warmup_factor=args.warmup_factor, + decay_epochs=args.decay_epochs, + decay_rate=args.decay_rate, + milestones=args.multi_step_decay_milestones, + num_epochs=args.epoch_size, + lr_epoch_stair=args.lr_epoch_stair) + ... +``` + + +## 优化器 + +1. 参数说明 + +- opt:优化器名称。 + +- filter_bias_and_bn:参数中是否包含bias,gamma或者beta。 + +- momentum:移动平均的动量。 + +- weight_decay:权重衰减(L2 penalty)。 + +- loss_scale:梯度缩放系数 + +- use_nesterov:是否使用Nesterov Accelerated Gradient (NAG)算法更新梯度。 + +2. yaml文件样例 + +```text +opt: 'momentum' +filter_bias_and_bn: True +momentum: 0.9 +weight_decay: 0.00007 +loss_scale: 1024 +use_nesterov: False +... +``` + +3. parse参数设置 + +```text +python train.py ... --opt momentum --filter_bias_and_bn True --weight_decay 0.00007 \ + --loss_scale 1024 --use_nesterov False ... +``` + +4. 对应的代码示例 + +```python +def train(args): + ... + if args.ema: + optimizer = create_optimizer(network.trainable_params(), + opt=args.opt, + lr=lr_scheduler, + weight_decay=args.weight_decay, + momentum=args.momentum, + nesterov=args.use_nesterov, + filter_bias_and_bn=args.filter_bias_and_bn, + loss_scale=args.loss_scale, + checkpoint_path=opt_ckpt_path, + eps=args.eps) + else: + optimizer = create_optimizer(network.trainable_params(), + opt=args.opt, + lr=lr_scheduler, + weight_decay=args.weight_decay, + momentum=args.momentum, + nesterov=args.use_nesterov, + filter_bias_and_bn=args.filter_bias_and_bn, + checkpoint_path=opt_ckpt_path, + eps=args.eps) + ... +``` + + +## Yaml和Parse组合使用 + +使用parse设置参数可以覆盖yaml文件中的参数设置。以下面的shell命令为例, + +```shell +python train.py -c ./configs/squeezenet/squeezenet_1.0_gpu.yaml --data_dir ./data +``` +上面的命令将`args.data_dir`参数的值由yaml文件中的 ./imagenet2012 覆盖为 ./data。 diff --git a/tutorials/output_11_0.png b/tutorials/output_11_0.png new file mode 100644 index 0000000..178a490 Binary files /dev/null and b/tutorials/output_11_0.png differ diff --git a/tutorials/output_23_0.png b/tutorials/output_23_0.png new file mode 100644 index 0000000..ad46ec5 Binary files /dev/null and b/tutorials/output_23_0.png differ diff --git a/tutorials/output_30_0.png b/tutorials/output_30_0.png new file mode 100644 index 0000000..17d482c Binary files /dev/null and b/tutorials/output_30_0.png differ diff --git a/tutorials/output_8_0.png b/tutorials/output_8_0.png new file mode 100644 index 0000000..038c4df Binary files /dev/null and b/tutorials/output_8_0.png differ
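The override behaviour described in the "Combination of Yaml and Parse" section (and its Chinese counterpart) can be sketched as follows: the yaml recipe replaces the parser defaults, and explicit command-line flags override both. This is a minimal, hypothetical example; the option names `--data_dir` and `--batch_size` are taken from the samples above, and the actual parsing code in mindcv may differ in detail.

```python
import argparse
import yaml


def parse_args():
    # First pass: read only the -c/--config option.
    config_parser = argparse.ArgumentParser(add_help=False)
    config_parser.add_argument('-c', '--config', type=str, default='')
    cfg_args, remaining = config_parser.parse_known_args()

    # Full parser with built-in defaults.
    parser = argparse.ArgumentParser(parents=[config_parser])
    parser.add_argument('--data_dir', type=str, default='./imagenet2012')
    parser.add_argument('--batch_size', type=int, default=32)

    # Values from the yaml file replace the built-in defaults ...
    if cfg_args.config:
        with open(cfg_args.config) as f:
            parser.set_defaults(**yaml.safe_load(f))

    # ... and explicit command-line flags override both.
    return parser.parse_args(remaining)


if __name__ == '__main__':
    args = parse_args()
    print(args.data_dir)  # e.g. './data' when --data_dir ./data is given on the command line
```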