forked from mindspore-Ecosystem/mindspore

Add model_zoo net Densenet121

parent 034453e48f
commit f6a7916ca5

@ -0,0 +1,272 @@

# Contents

- [DenseNet121 Description](#densenet121-description)
- [Model Architecture](#model-architecture)
- [Dataset](#dataset)
- [Features](#features)
    - [Mixed Precision](#mixed-precision)
- [Environment Requirements](#environment-requirements)
- [Quick Start](#quick-start)
- [Script Description](#script-description)
    - [Script and Sample Code](#script-and-sample-code)
    - [Script Parameters](#script-parameters)
    - [Training Process](#training-process)
        - [Training](#training)
        - [Distributed Training](#distributed-training)
    - [Evaluation Process](#evaluation-process)
        - [Evaluation](#evaluation)
- [Model Description](#model-description)
    - [Performance](#performance)
        - [Training accuracy results](#training-accuracy-results)
        - [Training performance results](#training-performance-results)
- [Description of Random Situation](#description-of-random-situation)
- [ModelZoo Homepage](#modelzoo-homepage)

# [DenseNet121 Description](#contents)

DenseNet121 is a convolution-based neural network for image classification. The paper describing the model can be found [here](https://arxiv.org/abs/1608.06993). Huawei's DenseNet121 is an implementation on [MindSpore](https://www.mindspore.cn/).

The repository also contains scripts to launch training and inference routines.

# [Model Architecture](#contents)

DenseNet121 is built from 4 densely connected blocks. In every dense block, each layer obtains additional inputs from all preceding layers and passes its own feature maps on to all subsequent layers. The feature maps are combined by concatenation, so each layer receives the "collective knowledge" of all preceding layers.
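
To make the dense connectivity concrete, here is a small sketch of how channel counts grow through DenseNet121. It assumes the 121-layer configuration reported in the DenseNet paper (growth rate 32, 64 stem channels, dense blocks of 6, 12, 24 and 16 layers, transition layers with compression 0.5); these values are not read from this repository's `src/network/densenet.py`, and the toy block uses random arrays in place of the real BN-ReLU-Conv layers.

```python
# Channel bookkeeping for DenseNet-121. ASSUMPTION: paper configuration
# (k = 32, 64 stem channels, blocks of (6, 12, 24, 16) layers, compression 0.5),
# not values read from this repository's network definition.
import numpy as np

GROWTH_RATE = 32
STEM_CHANNELS = 64
BLOCK_LAYERS = (6, 12, 24, 16)
COMPRESSION = 0.5


def dense_block_channels(in_channels, num_layers, growth_rate=GROWTH_RATE):
    """Each layer adds `growth_rate` channels because its output is
    concatenated onto everything that came before it."""
    return in_channels + num_layers * growth_rate


def densenet121_channels():
    """Return the channel count at the end of each dense block."""
    channels = STEM_CHANNELS
    per_block = []
    for i, num_layers in enumerate(BLOCK_LAYERS):
        channels = dense_block_channels(channels, num_layers)
        per_block.append(channels)
        if i < len(BLOCK_LAYERS) - 1:  # transition layer between blocks only
            channels = int(channels * COMPRESSION)
    return per_block


def toy_dense_block(x, num_layers, growth_rate):
    """Toy NCHW dense block: each 'layer' emits `growth_rate` new feature maps
    (random numbers standing in for BN-ReLU-Conv) and concatenates them."""
    rng = np.random.default_rng(0)
    features = x
    for _ in range(num_layers):
        n, _, h, w = features.shape
        new_maps = rng.standard_normal((n, growth_rate, h, w))
        features = np.concatenate([features, new_maps], axis=1)
    return features


print(densenet121_channels())  # [256, 512, 1024, 1024]
out = toy_dense_block(np.zeros((2, 64, 8, 8)), num_layers=6, growth_rate=32)
print(out.shape)  # (2, 256, 8, 8)
```

Concatenation is why the channel count grows linearly inside a block, and the transition layers are what keep it from growing without bound across blocks.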

# [Dataset](#contents)

Dataset used: ImageNet

The default configuration of the dataset is as follows:

- Training dataset preprocessing:
    - Input size of images is 224\*224
    - Range (min, max) of the crop area relative to the original image size is (0.08, 1.0)
    - Range (min, max) of the crop aspect ratio is (0.75, 1.333)
    - Probability of the image being flipped is 0.5
    - Randomly adjust the brightness, contrast, and saturation with strengths (0.4, 0.4, 0.4)
    - Normalize the input image with respect to mean and standard deviation

- Test dataset preprocessing:
    - Input size of images is 224\*224 (images are resized to 256\*256, then center-cropped)
    - Normalize the input image with respect to mean and standard deviation
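
The crop parameters above can be illustrated with a plain-Python sketch of the widely used "random resized crop" sampling (area fraction drawn uniformly from the scale range, aspect ratio drawn log-uniformly from the ratio range) and of the test-time center crop. This is an assumption about the general recipe, not the source of the MindSpore operator used in `src/datasets`; the function names are hypothetical, and in the real pipeline the selected crop is afterwards resized to 224\*224.

```python
# Hypothetical helpers illustrating the preprocessing geometry described above.
# ASSUMPTION: the common "random resized crop" recipe; the MindSpore operator
# used by src/datasets may differ in its details.
import math
import random


def random_resized_crop_box(width, height, scale=(0.08, 1.0),
                            ratio=(0.75, 1.333), attempts=10, rng=random):
    """Return (left, top, crop_w, crop_h) of a random training crop.

    The crop covers a uniformly sampled fraction `scale` of the image area and
    has an aspect ratio (w / h) sampled log-uniformly from `ratio`.
    """
    area = width * height
    log_lo, log_hi = math.log(ratio[0]), math.log(ratio[1])
    for _ in range(attempts):
        target_area = area * rng.uniform(scale[0], scale[1])
        aspect = math.exp(rng.uniform(log_lo, log_hi))
        crop_w = int(round(math.sqrt(target_area * aspect)))
        crop_h = int(round(math.sqrt(target_area / aspect)))
        if 0 < crop_w <= width and 0 < crop_h <= height:
            left = rng.randint(0, width - crop_w)
            top = rng.randint(0, height - crop_h)
            return left, top, crop_w, crop_h
    # Fallback after repeated failures: a centered square crop.
    side = min(width, height)
    return (width - side) // 2, (height - side) // 2, side, side


def center_crop_box(width, height, crop_size=224):
    """Return (left, top, crop_w, crop_h) of the test-time center crop."""
    return (width - crop_size) // 2, (height - crop_size) // 2, crop_size, crop_size


# Test pipeline: resize to 256*256, then keep the central 224*224 window.
print(center_crop_box(256, 256))  # (16, 16, 224, 224)
print(random_resized_crop_box(500, 375, rng=random.Random(0)))
```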

# [Features](#contents)

## Mixed Precision

The [mixed precision](https://www.mindspore.cn/tutorial/zh-CN/master/advanced_use/mixed_precision.html) training method accelerates the deep learning neural network training process by using both the single-precision and half-precision data formats, while maintaining the network precision achieved by single-precision training. Mixed precision training can accelerate computation, reduce memory usage, and enable a larger model or batch size to be trained on specific hardware.

For FP16 operators, if the input data type is FP32, the MindSpore backend automatically handles it with reduced precision. Users can check which operators run with reduced precision by enabling the INFO log level and searching for 'reduce precision'.
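
Half precision has a narrow range, so mixed precision is usually paired with loss scaling; `src/config.py` in this commit sets `loss_scale: 1024` with `is_dynamic_loss_scale: 0`, which we read as a static scale. The NumPy sketch below only illustrates why that helps: a tiny gradient underflows to zero when stored in FP16, but survives if the loss (and therefore the gradient) is multiplied by the scale first and divided back out in FP32. It is a generic illustration, not MindSpore's loss-scale implementation.

```python
# Generic illustration of static loss scaling (not MindSpore's implementation).
import numpy as np

LOSS_SCALE = 1024.0  # value of "loss_scale" in src/config.py


def fp16_gradient(true_grad, loss_scale=1.0):
    """Simulate a gradient stored in FP16 after the loss was multiplied by
    `loss_scale`, then unscaled in FP32 before the weight update."""
    stored = np.float16(true_grad * loss_scale)  # what survives in half precision
    return np.float32(stored) / np.float32(loss_scale)


tiny_grad = 1e-8
print(fp16_gradient(tiny_grad))              # 0.0 (underflows in FP16)
print(fp16_gradient(tiny_grad, LOSS_SCALE))  # about 1e-08 (recovered)
```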

# [Environment Requirements](#contents)

- Hardware (Ascend)
    - Prepare a hardware environment with an Ascend AI processor. If you want to try Ascend, please send the [application form](https://obs-9be7.obs.cn-east-2.myhuaweicloud.com/file/other/Ascend%20Model%20Zoo%E4%BD%93%E9%AA%8C%E8%B5%84%E6%BA%90%E7%94%B3%E8%AF%B7%E8%A1%A8.docx) to ascend@huawei.com. Once approved, you can get the resources.
- Framework
    - [MindSpore](https://www.mindspore.cn/install/en)
- For more information, please check the resources below:
    - [MindSpore tutorials](https://www.mindspore.cn/tutorial/zh-CN/master/index.html)
    - [MindSpore API](https://www.mindspore.cn/api/zh-CN/master/index.html)

# [Quick Start](#contents)

After installing MindSpore via the official website, you can start training and evaluation as follows:

```bash
# run training example
python train.py --data_dir /PATH/TO/DATASET --is_distributed 0 > train.log 2>&1 &

# run distributed training example
sh scripts/run_distribute_train.sh 8 rank_table.json /PATH/TO/DATASET

# run evaluation example
python eval.py --data_dir /PATH/TO/DATASET --pretrained /PATH/TO/CHECKPOINT --is_distributed 0 > eval.log 2>&1 &
# or
sh scripts/run_distribute_eval.sh 8 rank_table.json /PATH/TO/DATASET /PATH/TO/CHECKPOINT
```

For distributed training, an HCCL configuration file in JSON format needs to be created in advance.

Please follow the instructions in the link below:

https://gitee.com/mindspore/mindspore/tree/master/model_zoo/utils/hccl_tools

# [Script Description](#contents)

## [Script and Sample Code](#contents)

```
├── model_zoo
    ├── README.md                            // descriptions about all the models
    ├── densenet121
        ├── README.md                        // descriptions about densenet121
        ├── scripts
        │   ├── run_distribute_train.sh      // shell script for distributed training on Ascend
        │   ├── run_distribute_eval.sh       // shell script for distributed evaluation on Ascend
        ├── src
        │   ├── datasets                     // dataset processing functions
        │   ├── losses
        │   │   ├── crossentropy.py          // densenet loss function
        │   ├── lr_scheduler
        │   │   ├── lr_scheduler.py          // densenet learning rate schedule function
        │   ├── network
        │   │   ├── densenet.py              // densenet architecture
        │   ├── optimizers                   // densenet optimizer functions
        │   ├── utils
        │   │   ├── logging.py               // logging function
        │   │   ├── var_init.py              // densenet variable init function
        │   ├── config.py                    // network config
        ├── train.py                         // training script
        ├── eval.py                          // evaluation script
```

## [Script Parameters](#contents)

You can modify the training behaviour through the various flags in the `train.py` script. The flags in `train.py` are as follows:

```
--data_dir              train data dir
--num_classes           num of classes in dataset (default: 1000)
--image_size            image size of the dataset
--per_batch_size        mini-batch size (default: 256) per device
--pretrained            path of pretrained model
--lr_scheduler          type of LR schedule: exponential, cosine_annealing
--lr                    initial learning rate
--lr_epochs             epoch milestones at which the lr changes
--lr_gamma              factor by which the exponential lr_scheduler decreases the lr
--eta_min               eta_min in cosine_annealing scheduler
--T_max                 T_max in cosine_annealing scheduler
--max_epoch             max epoch num to train the model
--warmup_epochs         warmup epochs (when the batch size is large)
--weight_decay          weight decay (default: 1e-4)
--momentum              momentum (default: 0.9)
--label_smooth          whether to use label smoothing in the cross-entropy loss
--label_smooth_factor   smoothing strength applied to the original one-hot labels
--log_interval          logging interval (default: 100)
--ckpt_path             path to save checkpoint
--ckpt_interval         the interval to save checkpoint
--is_save_on_master     save checkpoint on master only or on all ranks
--is_distributed        if multi device (default: 1)
--rank                  local rank of distributed (default: 0)
--group_size            world size of distributed (default: 1)
```
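
The two `--lr_scheduler` types can be sketched with the textbook formulas and the defaults from `src/config.py` (lr 0.1, lr_epochs 30/60/90/120, lr_gamma 0.1, eta_min 0, T_max 120). The reading of `exponential` as "multiply by `lr_gamma` at each `lr_epochs` milestone" is an assumption inferred from the flag descriptions, and the linear warmup is likewise a common choice rather than something this commit shows; `src/lr_scheduler/lr_scheduler.py` may compute the rate per iteration or handle warmup differently.

```python
# Textbook lr schedules with the defaults from src/config.py.
# ASSUMPTION: "exponential" = step decay by lr_gamma at each lr_epochs milestone;
# the repository's own lr_scheduler.py is not shown here and may differ.
import math

LR = 0.1
LR_EPOCHS = [30, 60, 90, 120]  # "lr_epochs": '30,60,90,120'
LR_GAMMA = 0.1
ETA_MIN = 0.0
T_MAX = 120


def step_decay_lr(epoch, base_lr=LR, milestones=LR_EPOCHS, gamma=LR_GAMMA):
    """Multiply the lr by `gamma` once for every milestone already reached."""
    passed = sum(1 for m in milestones if epoch >= m)
    return base_lr * gamma ** passed


def cosine_annealing_lr(epoch, base_lr=LR, eta_min=ETA_MIN, t_max=T_MAX):
    """Anneal from base_lr down to eta_min over t_max epochs along a half cosine."""
    return eta_min + (base_lr - eta_min) * (1 + math.cos(math.pi * epoch / t_max)) / 2


def linear_warmup(lr, epoch, warmup_epochs):
    """Scale the lr linearly during the first `warmup_epochs` (0 disables it)."""
    if warmup_epochs > 0 and epoch < warmup_epochs:
        return lr * (epoch + 1) / warmup_epochs
    return lr


for e in (0, 30, 60, 119):
    print(e, round(step_decay_lr(e), 6), round(cosine_annealing_lr(e), 6))
```

With the default `warmup_epochs: 0` in `src/config.py`, the warmup helper returns the scheduled rate unchanged.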

## [Training Process](#contents)

### Training

- running on Ascend

```
python train.py --data_dir /PATH/TO/DATASET --is_distributed 0 > train.log 2>&1 &
```

The python command above will run in the background. The log and model checkpoint will be generated in `output/202x-xx-xx_time_xx_xx_xx/`. The loss values will look like the following:

```
2020-08-22 16:58:56,617:INFO:epoch[0], iter[5003], loss:4.367, mean_fps:0.00 imgs/sec
2020-08-22 16:58:56,619:INFO:local passed
2020-08-22 17:02:19,920:INFO:epoch[1], iter[10007], loss:3.193, mean_fps:6301.11 imgs/sec
2020-08-22 17:02:19,921:INFO:local passed
2020-08-22 17:05:43,112:INFO:epoch[2], iter[15011], loss:3.096, mean_fps:6304.53 imgs/sec
2020-08-22 17:05:43,113:INFO:local passed
...
```
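
To plot a loss curve from such a log, the epoch lines can be parsed with a regular expression. The helper below is hypothetical (it is not part of this repository) and its pattern is written only against the sample lines shown above, so other log formats may need a different pattern.

```python
# Hypothetical log parser (not part of this repo), written against the
# sample "epoch[...], iter[...], loss:..., mean_fps:..." lines above.
import re

LOG_PATTERN = re.compile(
    r"epoch\[(?P<epoch>\d+)\], iter\[(?P<iter>\d+)\], "
    r"loss:(?P<loss>[\d.]+), mean_fps:(?P<fps>[\d.]+) imgs/sec"
)


def parse_train_log(lines):
    """Return one dict per matching epoch line; other lines are skipped."""
    records = []
    for line in lines:
        match = LOG_PATTERN.search(line)
        if match:  # e.g. "...:INFO:local passed" does not match
            records.append({
                "epoch": int(match["epoch"]),
                "iter": int(match["iter"]),
                "loss": float(match["loss"]),
                "fps": float(match["fps"]),
            })
    return records


sample = [
    "2020-08-22 16:58:56,617:INFO:epoch[0], iter[5003], loss:4.367, mean_fps:0.00 imgs/sec",
    "2020-08-22 16:58:56,619:INFO:local passed",
    "2020-08-22 17:02:19,920:INFO:epoch[1], iter[10007], loss:3.193, mean_fps:6301.11 imgs/sec",
]
for rec in parse_train_log(sample):
    print(rec["epoch"], rec["iter"], rec["loss"])
# 0 5003 4.367
# 1 10007 3.193
```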

### Distributed Training

- running on Ascend

```
sh scripts/run_distribute_train.sh 8 rank_table.json /PATH/TO/DATASET
```

The above shell script will run distributed training in the background. You can view the results log and model checkpoint in `LOG[X]/output/202x-xx-xx_time_xx_xx_xx/`. The loss values will look like the following:

```
2020-08-22 16:58:54,556:INFO:epoch[0], iter[5003], loss:3.857, mean_fps:0.00 imgs/sec
2020-08-22 17:02:19,188:INFO:epoch[1], iter[10007], loss:3.18, mean_fps:6260.18 imgs/sec
2020-08-22 17:05:42,490:INFO:epoch[2], iter[15011], loss:2.621, mean_fps:6301.11 imgs/sec
2020-08-22 17:09:05,686:INFO:epoch[3], iter[20015], loss:3.113, mean_fps:6304.37 imgs/sec
2020-08-22 17:12:28,925:INFO:epoch[4], iter[25019], loss:3.29, mean_fps:6303.07 imgs/sec
2020-08-22 17:15:52,167:INFO:epoch[5], iter[30023], loss:2.865, mean_fps:6302.98 imgs/sec
...
```
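
As a sanity check on `mean_fps`, the epoch-1 throughput in the distributed log can be recomputed from the timestamps. This is a worked illustration, not a repository tool, and it assumes 8 devices x `per_batch_size` 32 = 256 images per global step, with 5004 steps per epoch taken from the iteration counter.

```python
# Worked check (illustration only): recompute epoch-1 throughput from the
# distributed log. ASSUMPTION: 8 devices x per_batch_size 32 = 256 images/step.
from datetime import datetime

FMT = "%Y-%m-%d %H:%M:%S,%f"
start = datetime.strptime("2020-08-22 16:58:54,556", FMT)
end = datetime.strptime("2020-08-22 17:02:19,188", FMT)

steps = 10007 - 5003   # iterations in epoch 1
global_batch = 8 * 32  # devices x per_batch_size
elapsed = (end - start).total_seconds()
fps = steps * global_batch / elapsed

print(round(elapsed, 3))  # 204.632
print(round(fps))         # 6260, close to the logged 6260.18

# 5004 steps x 256 images also matches ImageNet-1k's 1,281,167 training
# images with the last partial global batch dropped.
print(1281167 // 256)     # 5004
```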

## [Evaluation Process](#contents)

### Evaluation

- evaluation on Ascend

Run the command below for evaluation.

```
python eval.py --data_dir /PATH/TO/DATASET --pretrained /PATH/TO/CHECKPOINT --is_distributed 0 > eval.log 2>&1 &
# or
sh scripts/run_distribute_eval.sh 8 rank_table.json /PATH/TO/DATASET /PATH/TO/CHECKPOINT
```

The above command will run in the background. You can view the results in the file "outputs/202x-xx-xx_time_xx_xx_xx/202x_xxxx.log". `--pretrained` may also point to a directory, in which case every `*.ckpt` file in it is evaluated. The accuracy on the test dataset will be as follows:

```
2020-08-24 09:21:50,551:INFO:after allreduce eval: top1_correct=37657, tot=49920, acc=75.43%
2020-08-24 09:21:50,551:INFO:after allreduce eval: top5_correct=46224, tot=49920, acc=92.60%
```
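
The `tot=49920` in the evaluation log can be reproduced with a little arithmetic. Assumed here, and not stated in the README: the ImageNet-1k validation set has 50,000 images, the 8-device run splits them evenly, and `eval.py` uses `per_batch_size` 32 from `src/config.py` with `classification_dataset`'s default `drop_remainder=True`, so each device drops its last partial batch.

```python
# Worked arithmetic behind "tot=49920". ASSUMPTIONS: 50,000 validation images,
# an even split over 8 devices, per_batch_size 32, drop_remainder=True.
VAL_IMAGES = 50000
DEVICES = 8
PER_BATCH = 32

per_device = VAL_IMAGES // DEVICES               # 6250 images per device
full_batches = per_device // PER_BATCH           # 195 full batches
evaluated = full_batches * PER_BATCH * DEVICES   # images actually scored

top1_correct, top5_correct = 37657, 46224        # from the log above
print(evaluated)                                  # 49920
print(f"{100.0 * top1_correct / evaluated:.2f}%") # 75.43%
print(f"{100.0 * top5_correct / evaluated:.2f}%") # 92.60%
```

Under these assumptions, 80 validation images are never scored, so the reported accuracy is over 49,920 images rather than the full 50,000.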

# [Model Description](#contents)

## [Performance](#contents)

### Training accuracy results

| Parameters          | DenseNet121                 |
| ------------------- | --------------------------- |
| Model Version       | DenseNet121                 |
| Resource            | Ascend 910                  |
| Uploaded Date       | 08/28/2020 (month/day/year) |
| MindSpore Version   | 0.5.0-alpha                 |
| Dataset             | ImageNet                    |
| epochs              | 120                         |
| outputs             | probability                 |
| accuracy            | Top1: 75.13%; Top5: 92.57%  |

### Training performance results

| Parameters          | DenseNet121                      |
| ------------------- | -------------------------------- |
| Model Version       | DenseNet121                      |
| Resource            | Ascend 910                       |
| Uploaded Date       | 08/28/2020 (month/day/year)      |
| MindSpore Version   | 0.5.0-alpha                      |
| Dataset             | ImageNet                         |
| batch_size          | 32                               |
| outputs             | probability                      |
| speed               | 1pc: 760 img/s; 8pc: 6000 img/s  |
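
The speed row implies how well training scales across devices. A worked check (an illustration, not a repository tool): linear-scaling efficiency is the multi-device throughput divided by the device count times the single-device throughput.

```python
# Linear-scaling efficiency from the "speed" row above (illustration only).
single_device_fps = 760
multi_device_fps = 6000
devices = 8

efficiency = multi_device_fps / (devices * single_device_fps)
print(f"{efficiency:.1%}")  # 98.7%
```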

# [Description of Random Situation](#contents)

In dataset.py, we set the seed inside the `create_dataset` function. We also use a random seed in train.py.
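
The point of fixing these seeds is reproducibility. A generic NumPy illustration (not this repository's code; the MindSpore-side seeding calls are not visible in this commit view) is that two generators created with the same seed produce the same shuffle order of sample indices.

```python
# Generic illustration (not this repo's code): a fixed seed gives the same
# "random" shuffle order on every run.
import numpy as np


def shuffled_indices(num_samples, seed):
    """Return a permutation of sample indices drawn from a seeded generator."""
    rng = np.random.RandomState(seed)
    return rng.permutation(num_samples)


run_a = shuffled_indices(10, seed=1)
run_b = shuffled_indices(10, seed=1)
print((run_a == run_b).all())  # True
```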

# [ModelZoo Homepage](#contents)

Please check the official [homepage](https://gitee.com/mindspore/mindspore/tree/master/model_zoo).
|
@ -0,0 +1,247 @@
# Copyright 2020 Huawei Technologies Co., Ltd
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ============================================================================

"""
##############test densenet example#################
python eval.py --data_dir /PATH/TO/DATASET --pretrained /PATH/TO/CHECKPOINT
"""

import os
import argparse
import datetime
import glob
import numpy as np
from mindspore import context

import mindspore.nn as nn
from mindspore import Tensor
from mindspore.communication.management import init, get_rank, get_group_size, release
from mindspore.train.serialization import load_checkpoint, load_param_into_net
from mindspore.ops import operations as P
from mindspore.ops import functional as F
from mindspore.common import dtype as mstype

from src.utils.logging import get_logger
from src.datasets import classification_dataset
from src.network import DenseNet121
from src.config import config

# DEVICE_ID is exported by the launch scripts; default to device 0 when it is not set
devid = int(os.getenv('DEVICE_ID', '0'))
context.set_context(mode=context.GRAPH_MODE, device_target="Ascend",
                    save_graphs=True, device_id=devid)


class ParameterReduce(nn.Cell):
    """
    reduce parameter
    """
    def __init__(self):
        super(ParameterReduce, self).__init__()
        self.cast = P.Cast()
        self.reduce = P.AllReduce()

    def construct(self, x):
        one = self.cast(F.scalar_to_array(1.0), mstype.float32)
        out = x * one
        ret = self.reduce(out)
        return ret


def parse_args(cloud_args=None):
    """
    parse args
    """
    parser = argparse.ArgumentParser('mindspore classification test')

    # dataset related
    parser.add_argument('--data_dir', type=str, default='', help='eval data dir')
    parser.add_argument('--num_classes', type=int, default=1000, help='num of classes in dataset')
    parser.add_argument('--image_size', type=str, default='224,224', help='image size of the dataset')
    # network related
    parser.add_argument('--backbone', default='resnet50', help='backbone')
    parser.add_argument('--pretrained', default='', type=str, help='full path of the pretrained model to load. '
                                                                    'If it is a directory, all ckpt files in it are tested')

    # logging related
    parser.add_argument('--log_path', type=str, default='outputs/', help='path to save log')
    parser.add_argument('--is_distributed', type=int, default=1, help='if multi device')
    parser.add_argument('--rank', type=int, default=0, help='local rank of distributed')
    parser.add_argument('--group_size', type=int, default=1, help='world size of distributed')

    # roma obs
    parser.add_argument('--train_url', type=str, default="", help='train url')

    args, _ = parser.parse_known_args()
    args = merge_args(args, cloud_args)

    args.per_batch_size = config.per_batch_size
    args.image_size = list(map(int, args.image_size.split(',')))

    return args


def get_top5_acc(top5_arg, gt_class):
    sub_count = 0
    for top5, gt in zip(top5_arg, gt_class):
        if gt in top5:
            sub_count += 1
    return sub_count


def merge_args(args, cloud_args):
    """
    merge args and cloud_args
    """
    args_dict = vars(args)
    if isinstance(cloud_args, dict):
        for key in cloud_args.keys():
            val = cloud_args[key]
            if key in args_dict and val:
                arg_type = type(args_dict[key])
                if arg_type is not type(None):
                    val = arg_type(val)
                args_dict[key] = val
    return args


def test(cloud_args=None):
    """
    network eval function. Get top1 and top5 ACC from classification.
    The result will be saved at [./outputs] by default.
    """
    args = parse_args(cloud_args)

    # init distributed
    if args.is_distributed:
        init()
        args.rank = get_rank()
        args.group_size = get_group_size()

    args.outputs_dir = os.path.join(args.log_path,
                                    datetime.datetime.now().strftime('%Y-%m-%d_time_%H_%M_%S'))

    args.logger = get_logger(args.outputs_dir, args.rank)
    args.logger.save_args(args)

    # network
    args.logger.important_info('start create network')
    if os.path.isdir(args.pretrained):
        models = list(glob.glob(os.path.join(args.pretrained, '*.ckpt')))
        print(models)

        # sort by the number between the last '-' and the following '_' of the file name,
        # largest first (the epoch, for names like 'xxx-<epoch>_<step>.ckpt')
        f = lambda x: -1 * int(os.path.splitext(os.path.split(x)[-1])[0].split('-')[-1].split('_')[0])

        args.models = sorted(models, key=f)
    else:
        args.models = [args.pretrained,]

    for model in args.models:
        de_dataset = classification_dataset(args.data_dir, image_size=args.image_size,
                                            per_batch_size=args.per_batch_size,
                                            max_epoch=1, rank=args.rank, group_size=args.group_size,
                                            mode='eval')
        eval_dataloader = de_dataset.create_tuple_iterator()
        network = DenseNet121(args.num_classes)

        param_dict = load_checkpoint(model)
        param_dict_new = {}
        for key, values in param_dict.items():
            if key.startswith('moments.'):
                # skip optimizer momentum state saved alongside the weights
                continue
            elif key.startswith('network.'):
                # strip the 'network.' prefix of wrapped-cell parameter names
                param_dict_new[key[8:]] = values
            else:
                param_dict_new[key] = values
        load_param_into_net(network, param_dict_new)
        args.logger.info('load model {} success'.format(model))

        # must add
        network.add_flags_recursive(fp16=True)

        img_tot = 0
        top1_correct = 0
        top5_correct = 0
        network.set_train(False)
        for data, gt_classes in eval_dataloader:
            output = network(Tensor(data, mstype.float32))
            output = output.asnumpy()

            top1_output = np.argmax(output, (-1))
            top5_output = np.argsort(output)[:, -5:]

            t1_correct = np.equal(top1_output, gt_classes).sum()
            top1_correct += t1_correct
            top5_correct += get_top5_acc(top5_output, gt_classes)
            img_tot += args.per_batch_size

        results = [[top1_correct], [top5_correct], [img_tot]]
        args.logger.info('before results={}'.format(results))
        if args.is_distributed:
            # gather the per-rank counts through .npy files in /cache
            model_md5 = model.replace('/', '')
            tmp_dir = '/cache'
            if not os.path.exists(tmp_dir):
                os.mkdir(tmp_dir)
            top1_correct_npy = '/cache/top1_rank_{}_{}.npy'.format(args.rank, model_md5)
            top5_correct_npy = '/cache/top5_rank_{}_{}.npy'.format(args.rank, model_md5)
            img_tot_npy = '/cache/img_tot_rank_{}_{}.npy'.format(args.rank, model_md5)
            np.save(top1_correct_npy, top1_correct)
            np.save(top5_correct_npy, top5_correct)
            np.save(img_tot_npy, img_tot)
            # wait until every rank has written its counts
            while True:
                rank_ok = True
                for other_rank in range(args.group_size):
                    top1_correct_npy = '/cache/top1_rank_{}_{}.npy'.format(other_rank, model_md5)
                    top5_correct_npy = '/cache/top5_rank_{}_{}.npy'.format(other_rank, model_md5)
                    img_tot_npy = '/cache/img_tot_rank_{}_{}.npy'.format(other_rank, model_md5)
                    if not os.path.exists(top1_correct_npy) or not os.path.exists(top5_correct_npy) \
                            or not os.path.exists(img_tot_npy):
                        rank_ok = False
                if rank_ok:
                    break

            top1_correct_all = 0
            top5_correct_all = 0
            img_tot_all = 0
            for other_rank in range(args.group_size):
                top1_correct_npy = '/cache/top1_rank_{}_{}.npy'.format(other_rank, model_md5)
                top5_correct_npy = '/cache/top5_rank_{}_{}.npy'.format(other_rank, model_md5)
                img_tot_npy = '/cache/img_tot_rank_{}_{}.npy'.format(other_rank, model_md5)
                top1_correct_all += np.load(top1_correct_npy)
                top5_correct_all += np.load(top5_correct_npy)
                img_tot_all += np.load(img_tot_npy)
            results = [[top1_correct_all], [top5_correct_all], [img_tot_all]]
            results = np.array(results)

        else:
            results = np.array(results)

        args.logger.info('after results={}'.format(results))
        top1_correct = results[0, 0]
        top5_correct = results[1, 0]
        img_tot = results[2, 0]
        acc1 = 100.0 * top1_correct / img_tot
        acc5 = 100.0 * top5_correct / img_tot
        args.logger.info('after allreduce eval: top1_correct={}, tot={}, acc={:.2f}%'.format(top1_correct,
                                                                                            img_tot,
                                                                                            acc1))
        args.logger.info('after allreduce eval: top5_correct={}, tot={}, acc={:.2f}%'.format(top5_correct,
                                                                                            img_tot,
                                                                                            acc5))
    if args.is_distributed:
        release()


if __name__ == "__main__":
    test()
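
`eval.py` counts top-1 hits with `np.argmax` and top-5 hits by checking whether the label appears in the last five columns of `np.argsort`. The vectorized NumPy sketch below performs the same counting in isolation; the helper name `topk_correct` and the toy logits are hypothetical, chosen without ties so that the sort order is unambiguous.

```python
# Standalone sketch of the top-k counting done in eval.py's loop.
# The helper name and the toy logits/labels are hypothetical.
import numpy as np


def topk_correct(logits, labels, k):
    """Count samples whose label is among the k highest-scoring classes."""
    topk = np.argsort(logits, axis=-1)[:, -k:]  # indices of the k largest logits
    return int((topk == labels[:, None]).any(axis=1).sum())


logits = np.array([
    [0.10, 0.60, 0.20, 0.05, 0.03, 0.02],  # argmax 1
    [0.30, 0.10, 0.20, 0.35, 0.04, 0.01],  # argmax 3
    [0.50, 0.05, 0.02, 0.01, 0.30, 0.12],  # argmax 0
    [0.40, 0.25, 0.15, 0.10, 0.06, 0.04],  # argmax 0
])
labels = np.array([1, 2, 5, 5])

print(topk_correct(logits, labels, 1))  # 1
print(topk_correct(logits, labels, 5))  # 3
```

The last sample's label has the lowest score, so it is the only one missed by top-5.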
|
@ -0,0 +1,48 @@
#!/bin/bash
# Copyright 2020 Huawei Technologies Co., Ltd
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ============================================================================

echo "=============================================================================================================="
echo "Please run the script as: "
echo "sh run_distribute_eval.sh DEVICE_NUM RANK_TABLE_FILE DATASET CKPT_PATH"
echo "for example: sh run_distribute_eval.sh 8 /data/hccl.json /path/to/dataset /path/to/ckpt"
echo "It is better to use absolute path."
echo "================================================================================================================="

echo "After running the script, the network runs in the background. The log will be generated in LOGx/log.txt"

export RANK_SIZE=$1
export RANK_TABLE_FILE=$2
DATASET=$3
CKPT_PATH=$4

for((i=0;i<RANK_SIZE;i++))
do
    export DEVICE_ID=$i
    rm -rf LOG$i
    mkdir ./LOG$i
    cp ./*.py ./LOG$i
    cp -r ./src ./LOG$i
    cd ./LOG$i || exit
    export RANK_ID=$i
    echo "start evaluation for rank $i, device $DEVICE_ID"
    env > env.log
    python eval.py \
    --data_dir=$DATASET \
    --pretrained=$CKPT_PATH > log.txt 2>&1 &

    cd ../
done
|
|
@ -0,0 +1,45 @@
#!/bin/bash
# Copyright 2020 Huawei Technologies Co., Ltd
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ============================================================================

echo "=============================================================================================================="
echo "Please run the script as: "
echo "sh run_distribute_train.sh DEVICE_NUM RANK_TABLE_FILE DATASET"
echo "for example: sh run_distribute_train.sh 8 /data/hccl.json /path/to/dataset"
echo "It is better to use absolute path."
echo "================================================================================================================="

echo "After running the script, the network runs in the background. The log will be generated in LOGx/log.txt"

export RANK_SIZE=$1
export RANK_TABLE_FILE=$2
DATASET=$3

for((i=0;i<RANK_SIZE;i++))
do
    export DEVICE_ID=$i
    rm -rf LOG$i
    mkdir ./LOG$i
    cp ./*.py ./LOG$i
    cp -r ./src ./LOG$i
    cd ./LOG$i || exit
    export RANK_ID=$i
    echo "start training for rank $i, device $DEVICE_ID"
    env > env.log
    python train.py \
    --data_dir=$DATASET > log.txt 2>&1 &

    cd ../
done
|
@ -0,0 +1,46 @@
# Copyright 2020 Huawei Technologies Co., Ltd
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ============================================================================
"""config"""
from easydict import EasyDict as ed

config = ed({
    "image_size": '224,224',
    "num_classes": 1000,

    "lr": 0.1,
    "lr_scheduler": 'cosine_annealing',
    "lr_epochs": '30,60,90,120',
    "lr_gamma": 0.1,
    "eta_min": 0,
    "T_max": 120,
    "max_epoch": 120,
    "per_batch_size": 32,
    "warmup_epochs": 0,

    "weight_decay": 0.0001,
    "momentum": 0.9,
    "is_dynamic_loss_scale": 0,
    "loss_scale": 1024,
    "label_smooth": 0,
    "label_smooth_factor": 0.1,

    "log_interval": 100,
    "ckpt_interval": 2000,
    "ckpt_path": 'outputs/',
    "is_save_on_master": 1,

    "rank": 0,
    "group_size": 1
})
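
The `label_smooth` / `label_smooth_factor` keys above (smoothing is off by default, since `label_smooth` is 0) can be illustrated with the common formulation of label smoothing: keep `1 - eps` on the true class and spread `eps / num_classes` over every class. Whether `src/losses/crossentropy.py` spreads the factor over all classes or only over the non-target classes is not visible in this commit view, so treat this as an assumption.

```python
# Common label-smoothing formulation (ASSUMPTION: may differ in detail from
# src/losses/crossentropy.py, which is not shown in this commit view).
import numpy as np


def smooth_one_hot(labels, num_classes, smooth_factor=0.1):
    """(1 - eps) on the true class plus eps / num_classes on every class."""
    one_hot = np.eye(num_classes)[labels]
    return one_hot * (1.0 - smooth_factor) + smooth_factor / num_classes


def cross_entropy(logits, targets):
    """Mean cross-entropy between soft targets and softmax(logits)."""
    shifted = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return float(-(targets * log_probs).sum(axis=1).mean())


targets = smooth_one_hot(np.array([2]), num_classes=5, smooth_factor=0.1)
print(targets)  # [[0.02 0.02 0.92 0.02 0.02]]
print(round(cross_entropy(np.zeros((1, 5)), targets), 4))  # 1.6094
```

With uniform logits the loss equals ln(5) regardless of smoothing, because the soft targets still sum to 1.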
|
@ -0,0 +1,22 @@
# Copyright 2020 Huawei Technologies Co., Ltd
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ============================================================================

"""
read dataset for classification
"""

from .classification import classification_dataset

__all__ = ["classification_dataset"]
|
@ -0,0 +1,155 @@
|
||||||
|
# Copyright 2020 Huawei Technologies Co., Ltd
|
||||||
|
#
|
||||||
|
# Licensed under the Apache License, Version 2.0 (the "License");
|
||||||
|
# you may not use this file except in compliance with the License.
|
||||||
|
# You may obtain a copy of the License at
|
||||||
|
#
|
||||||
|
# http://www.apache.org/licenses/LICENSE-2.0
|
||||||
|
#
|
||||||
|
# Unless required by applicable law or agreed to in writing, software
|
||||||
|
# distributed under the License is distributed on an "AS IS" BASIS,
|
||||||
|
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||||
|
# See the License for the specific language governing permissions and
|
||||||
|
# limitations under the License.
|
||||||
|
# ============================================================================
|
||||||
|
|
||||||
|
"""
|
||||||
|
A function that returns a dataset for classification.
|
||||||
|
"""
|
||||||
|
|
||||||
|
import os
|
||||||
|
from PIL import Image, ImageFile
|
||||||
|
from mindspore import dtype as mstype
|
||||||
|
import mindspore.dataset as de
|
||||||
|
import mindspore.dataset.transforms.vision.c_transforms as vision_C
|
||||||
|
import mindspore.dataset.transforms.c_transforms as normal_C
|
||||||
|
from src.datasets.sampler import DistributedSampler
|
||||||
|
|
||||||
|
ImageFile.LOAD_TRUNCATED_IMAGES = True
|
||||||
|
|
||||||
|
class TxtDataset():
|
||||||
|
"""
|
||||||
|
Dataset that reads image paths and labels from a text file.
|
||||||
|
"""
|
||||||
|
def __init__(self, root, txt_name):
|
||||||
|
super(TxtDataset, self).__init__()
|
||||||
|
self.imgs = []
|
||||||
|
self.labels = []
|
||||||
|
fin = open(txt_name, "r")
|
||||||
|
for line in fin:
|
||||||
|
img_name, label = line.strip().split(' ')
|
||||||
|
self.imgs.append(os.path.join(root, img_name))
|
||||||
|
self.labels.append(int(label))
|
||||||
|
fin.close()
|
||||||
|
|
||||||
|
def __getitem__(self, index):
|
||||||
|
img = Image.open(self.imgs[index]).convert('RGB')
|
||||||
|
return img, self.labels[index]
|
||||||
|
|
||||||
|
def __len__(self):
|
||||||
|
return len(self.imgs)
|
||||||
|
|
||||||
|
|
||||||
|
def classification_dataset(data_dir, image_size, per_batch_size, max_epoch, rank, group_size,
|
||||||
|
mode='train',
|
||||||
|
input_mode='folder',
|
||||||
|
root='',
|
||||||
|
num_parallel_workers=None,
|
||||||
|
shuffle=None,
|
||||||
|
sampler=None,
|
||||||
|
class_indexing=None,
|
||||||
|
drop_remainder=True,
|
||||||
|
transform=None,
|
||||||
|
target_transform=None):
|
||||||
|
"""
|
||||||
|
A function that returns a dataset for classification. The mode of input dataset could be "folder" or "txt".
|
||||||
|
If it is "folder", all images within one folder share the same label. If it is "txt", the path and label of every image are listed in a text file.
|
||||||
|
|
||||||
|
Args:
|
||||||
|
data_dir (str): Path to the root directory of the dataset when input_mode="folder", or path of the text file that lists every image and its label when input_mode="txt".
|
||||||
|
image_size (Union[int, list]): Output size of the cropped images, as an int or [height, width].
|
||||||
|
per_batch_size (int): The batch size of every step.
|
||||||
|
max_epoch (int): The number of epochs; the dataset is repeated this many times.
|
||||||
|
rank (int): The shard ID of this device, with 0 <= rank < group_size.
|
||||||
|
group_size (int): Number of shards that the dataset should be divided into.
|
||||||
|
mode (str): "train" applies random crop, flip and color augmentation; any other value resizes to 256x256 and center-crops. Default: "train".
|
||||||
|
input_mode (str): The form of the input dataset. "folder" or "txt". Default: "folder".
|
||||||
|
root (str): Directory prepended to every image path in the text file when input_mode="txt". Default: "".
|
||||||
|
num_parallel_workers (int): Number of workers to read the data. Default: None.
|
||||||
|
shuffle (bool): Whether or not to perform shuffle on the dataset
|
||||||
|
(default=None, performs shuffle).
|
||||||
|
sampler (Sampler): Object used to choose samples from the dataset. Default: None.
|
||||||
|
class_indexing (dict): A str-to-int mapping from folder name to index
|
||||||
|
(default=None, the folder names will be sorted
|
||||||
|
alphabetically and each class will be given a
|
||||||
|
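unique index starting from 0).
drop_remainder (bool): Whether to drop the last batch if it has fewer than per_batch_size samples. Default: True.
transform (list): Image operations that replace the default pipeline. Default: None.
target_transform (list): Label operations that replace the default cast to int32. Default: None.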
|
||||||
|
|
||||||
|
Examples:
|
||||||
|
>>> from src.datasets.classification import classification_dataset
|
||||||
|
>>> # path to imagefolder directory. This directory needs to contain sub-directories which contain the images
|
||||||
|
>>> dataset_dir = "/path/to/imagefolder_directory"
|
||||||
|
>>> de_dataset = classification_dataset(dataset_dir, image_size=[224, 224],
...                                     per_batch_size=64, max_epoch=100,
...                                     rank=0, group_size=4)
|
||||||
|
>>> # Path of the textfile that contains every image's path of the dataset.
|
||||||
|
>>> dataset_dir = "/path/to/dataset/images/train.txt"
|
||||||
|
>>> images_dir = "/path/to/dataset/images"
|
||||||
|
>>> de_dataset = classification_dataset(dataset_dir, image_size=[224, 224],
...                                     per_batch_size=64, max_epoch=100,
...                                     rank=0, group_size=4,
...                                     input_mode="txt", root=images_dir)
|
||||||
|
"""
|
||||||
|
|
||||||
|
mean = [0.485 * 255, 0.456 * 255, 0.406 * 255]
|
||||||
|
std = [0.229 * 255, 0.224 * 255, 0.225 * 255]
|
||||||
|
|
||||||
|
if transform is None:
|
||||||
|
if mode == 'train':
|
||||||
|
transform_img = [
|
||||||
|
vision_C.RandomCropDecodeResize(image_size, scale=(0.08, 1.0), ratio=(0.75, 1.333)),
|
||||||
|
vision_C.RandomHorizontalFlip(prob=0.5),
|
||||||
|
vision_C.RandomColorAdjust(brightness=0.4, contrast=0.4, saturation=0.4),
|
||||||
|
vision_C.Normalize(mean=mean, std=std),
|
||||||
|
vision_C.HWC2CHW()
|
||||||
|
]
|
||||||
|
else:
|
||||||
|
transform_img = [
|
||||||
|
vision_C.Decode(),
|
||||||
|
vision_C.Resize((256, 256)),
|
||||||
|
vision_C.CenterCrop(image_size),
|
||||||
|
vision_C.Normalize(mean=mean, std=std),
|
||||||
|
vision_C.HWC2CHW()
|
||||||
|
]
|
||||||
|
else:
|
||||||
|
transform_img = transform
|
||||||
|
|
||||||
|
if target_transform is None:
|
||||||
|
transform_label = [
|
||||||
|
normal_C.TypeCast(mstype.int32)
|
||||||
|
]
|
||||||
|
else:
|
||||||
|
transform_label = target_transform
|
||||||
|
|
||||||
|
if input_mode == 'folder':
|
||||||
|
de_dataset = de.ImageFolderDatasetV2(data_dir, num_parallel_workers=num_parallel_workers,
|
||||||
|
shuffle=shuffle, sampler=sampler, class_indexing=class_indexing,
|
||||||
|
num_shards=group_size, shard_id=rank)
|
||||||
|
else:
|
||||||
|
dataset = TxtDataset(root, data_dir)
|
||||||
|
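# shuffle=None means "shuffle", as documented above and as in the "folder" branch.
sampler = DistributedSampler(dataset, rank, group_size, shuffle=(shuffle is None or shuffle))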
|
||||||
|
de_dataset = de.GeneratorDataset(dataset, ["image", "label"], sampler=sampler)
|
||||||
|
de_dataset.set_dataset_size(len(sampler))
|
||||||
|
|
||||||
|
de_dataset = de_dataset.map(input_columns="image", num_parallel_workers=8, operations=transform_img)
|
||||||
|
de_dataset = de_dataset.map(input_columns="label", num_parallel_workers=8, operations=transform_label)
|
||||||
|
|
||||||
|
columns_to_project = ["image", "label"]
|
||||||
|
de_dataset = de_dataset.project(columns=columns_to_project)
|
||||||
|
|
||||||
|
de_dataset = de_dataset.batch(per_batch_size, drop_remainder=drop_remainder)
|
||||||
|
de_dataset = de_dataset.repeat(max_epoch)
|
||||||
|
|
||||||
|
return de_dataset
|
|
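Both branches of `classification_dataset` end with `Normalize` and `HWC2CHW`. Because decoded pixels are 0-255 integers, the ImageNet mean and std are multiplied by 255 before use. The NumPy sketch below reproduces those two steps plus a center crop on an already-decoded array. `center_crop` and `normalize_hwc_to_chw` are hypothetical helpers for illustration, not MindSpore APIs, and the sketch assumes `Normalize` computes `(x - mean) / std` per channel.

```python
import numpy as np

# ImageNet channel statistics scaled to the 0-255 pixel range, as in the dataset code.
MEAN = np.array([0.485, 0.456, 0.406]) * 255
STD = np.array([0.229, 0.224, 0.225]) * 255


def center_crop(img, size):
    """Crop a (H, W, C) array to (size, size, C) around its center."""
    h, w = img.shape[:2]
    top = (h - size) // 2
    left = (w - size) // 2
    return img[top:top + size, left:left + size]


def normalize_hwc_to_chw(img):
    """Apply per-channel (x - mean) / std, then move channels first (HWC -> CHW)."""
    out = (img.astype(np.float32) - MEAN) / STD
    return out.transpose(2, 0, 1).astype(np.float32)


# A fake "decoded and resized" 256x256 RGB image filled with the (rounded) mean color.
resized = np.tile(MEAN.astype(np.uint8), (256, 256, 1))
chw = normalize_hwc_to_chw(center_crop(resized, 224))
print(chw.shape)  # (3, 224, 224)
```

An image filled with the mean color normalizes to values close to zero, which is a quick sanity check on the channel order of the constants.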
@@ -0,0 +1,51 @@
|
||||||
|
# Copyright 2020 Huawei Technologies Co., Ltd
|
||||||
|
#
|
||||||
|
# Licensed under the Apache License, Version 2.0 (the "License");
|
||||||
|
# you may not use this file except in compliance with the License.
|
||||||
|
# You may obtain a copy of the License at
|
||||||
|
#
|
||||||
|
# http://www.apache.org/licenses/LICENSE-2.0
|
||||||
|
#
|
||||||
|
# Unless required by applicable law or agreed to in writing, software
|
||||||
|
# distributed under the License is distributed on an "AS IS" BASIS,
|
||||||
|
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||||
|
# See the License for the specific language governing permissions and
|
||||||
|
# limitations under the License.
|
||||||
|
# ============================================================================
|
||||||
|
|
||||||
|
"""
|
||||||
|
Shuffle and distribute samples across devices.
|
||||||
|
"""
|
||||||
|
|
||||||
|
import math
|
||||||
|
import numpy as np
|
||||||
|
|
||||||
|
|
||||||
|
class DistributedSampler():
|
||||||
|
"""
|
||||||
|
Sampler that optionally reshuffles the indices every epoch and gives each rank an equal-sized, strided shard.
|
||||||
|
"""
|
||||||
|
def __init__(self, dataset, rank, group_size, shuffle=True, seed=0):
|
||||||
|
self.dataset = dataset
|
||||||
|
self.rank = rank
|
||||||
|
self.group_size = group_size
|
||||||
|
self.dataset_length = len(self.dataset)
|
||||||
|
self.num_samples = int(math.ceil(self.dataset_length * 1.0 / self.group_size))
|
||||||
|
self.total_size = self.num_samples * self.group_size
|
||||||
|
self.shuffle = shuffle
|
||||||
|
self.seed = seed
|
||||||
|
|
||||||
|
def __iter__(self):
|
||||||
|
if self.shuffle:
|
||||||
|
self.seed = (self.seed + 1) & 0xffffffff
|
||||||
|
np.random.seed(self.seed)
|
||||||
|
indices = np.random.permutation(self.dataset_length).tolist()
|
||||||
|
else:
|
||||||
|
indices = list(range(self.dataset_length))
|
||||||
|
|
||||||
|
indices += indices[:(self.total_size - len(indices))]
|
||||||
|
indices = indices[self.rank::self.group_size]
|
||||||
|
return iter(indices)
|
||||||
|
|
||||||
|
def __len__(self):
|
||||||
|
return self.num_samples
|
|
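`DistributedSampler` makes every rank draw the same number of samples: it pads the index list by wrapping around to its first entries until the length is a multiple of `group_size`, then rank `r` takes every `group_size`-th index starting at `r`. A minimal pure-Python sketch of that rule with shuffling left out (`shard_indices` is a hypothetical name, not part of the commit):

```python
import math


def shard_indices(dataset_length, rank, group_size):
    """Mirror the padding-and-striding rule of DistributedSampler (no shuffle)."""
    num_samples = int(math.ceil(dataset_length / group_size))
    total_size = num_samples * group_size
    indices = list(range(dataset_length))
    # Wrap around: repeat the first few indices so every rank gets num_samples.
    indices += indices[:total_size - len(indices)]
    # Rank r takes positions r, r + group_size, r + 2 * group_size, ...
    return indices[rank::group_size]


for r in range(4):
    print(r, shard_indices(10, r, 4))
# 0 [0, 4, 8]
# 1 [1, 5, 9]
# 2 [2, 6, 0]
# 3 [3, 7, 1]
```

With 10 samples on 4 ranks, each rank receives 3 indices, so samples 0 and 1 are read twice per epoch. The per-rank count matches `len(sampler)`, which `classification_dataset` passes to `set_dataset_size`.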
@@ -0,0 +1,19 @@
|
||||||
|
# Copyright 2020 Huawei Technologies Co., Ltd
|
||||||
|
#
|
||||||
|
# Licensed under the Apache License, Version 2.0 (the "License");
|
||||||
|
# you may not use this file except in compliance with the License.
|
||||||
|
# You may obtain a copy of the License at
|
||||||
|
#
|
||||||
|
# http://www.apache.org/licenses/LICENSE-2.0
|
||||||
|
#
|
||||||
|
# Unless required by applicable law or agreed to in writing, software
|
||||||
|
# distributed under the License is distributed on an "AS IS" BASIS,
|
||||||
|
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||||
|
# See the License for the specific language governing permissions and
|
||||||
|
# limitations under the License.
|
||||||
|
# ============================================================================
|
||||||
|
"""
|
||||||
|
loss function
|
||||||
|
"""
|
||||||
|
|
||||||
|
from .crossentropy import *
|
|
@@ -0,0 +1,44 @@
|
||||||
|
# Copyright 2020 Huawei Technologies Co., Ltd
|
||||||
|
#
|
||||||
|
# Licensed under the Apache License, Version 2.0 (the "License");
|
||||||
|
# you may not use this file except in compliance with the License.
|
||||||
|
# You may obtain a copy of the License at
|
||||||
|
#
|
||||||
|
# http://www.apache.org/licenses/LICENSE-2.0
|
||||||
|
#
|
||||||
|
# Unless required by applicable law or agreed to in writing, software
|
||||||
|
# distributed under the License is distributed on an "AS IS" BASIS,
|
||||||
|
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||||
|
# See the License for the specific language governing permissions and
|
||||||
|
# limitations under the License.
|
||||||
|
# ============================================================================
|
||||||
|
"""
|
||||||
|
loss function CrossEntropy
|
||||||
|
"""
|
||||||
|
|
||||||
|
from mindspore.nn.loss.loss import _Loss
|
||||||
|
from mindspore.ops import operations as P
|
||||||
|
from mindspore.ops import functional as F
|
||||||
|
from mindspore import Tensor
|
||||||
|
from mindspore.common import dtype as mstype
|
||||||
|
import mindspore.nn as nn
|
||||||
|
|
||||||
|
|
||||||
|
class CrossEntropy(_Loss):
|
||||||
|
"""
|
||||||
|
Softmax cross-entropy loss with optional label smoothing, averaged over the batch.
|
||||||
|
"""
|
||||||
|
def __init__(self, smooth_factor=0., num_classes=1000):
|
||||||
|
super(CrossEntropy, self).__init__()
|
||||||
|
self.onehot = P.OneHot()
|
||||||
|
self.on_value = Tensor(1.0 - smooth_factor, mstype.float32)
|
||||||
|
self.off_value = Tensor(1.0 * smooth_factor / (num_classes - 1), mstype.float32)
|
||||||
|
self.ce = nn.SoftmaxCrossEntropyWithLogits()
|
||||||
|
self.mean = P.ReduceMean(False)
|
||||||
|
|
||||||
|
def construct(self, logit, label):
|
||||||
|
one_hot_label = self.onehot(label,
|
||||||
|
F.shape(logit)[1], self.on_value, self.off_value)
|
||||||
|
loss = self.ce(logit, one_hot_label)
|
||||||
|
loss = self.mean(loss, 0)
|
||||||
|
return loss
|
|
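`CrossEntropy` implements label smoothing by building one-hot targets with `on_value = 1 - smooth_factor` and `off_value = smooth_factor / (num_classes - 1)`, then averaging softmax cross-entropy over the batch. The same computation in plain NumPy, as a sketch for checking values by hand. `smoothed_cross_entropy` is hypothetical, and the sketch assumes `SoftmaxCrossEntropyWithLogits` returns the per-sample loss `-sum(target * log_softmax(logits))`.

```python
import numpy as np


def smoothed_cross_entropy(logits, labels, smooth_factor=0.1):
    """Mean softmax cross-entropy against label-smoothed one-hot targets."""
    num_classes = logits.shape[1]
    on_value = 1.0 - smooth_factor
    off_value = smooth_factor / (num_classes - 1)
    # Smoothed one-hot targets: on_value at the label, off_value elsewhere.
    targets = np.full(logits.shape, off_value)
    targets[np.arange(len(labels)), labels] = on_value
    # Numerically stable log-softmax over the class axis.
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    per_sample = -(targets * log_probs).sum(axis=1)
    return per_sample.mean()


logits = np.array([[2.0, 0.5, -1.0], [0.1, 0.2, 3.0]])
labels = np.array([0, 2])
print(smoothed_cross_entropy(logits, labels))
```

With `smooth_factor=0.0` the targets are ordinary one-hot vectors and the result reduces to plain cross-entropy; a positive factor moves probability mass onto the wrong classes, so the loss of a confident correct prediction no longer goes to zero.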
@@ -0,0 +1,19 @@
|
||||||
|
# Copyright 2020 Huawei Technologies Co., Ltd
|
||||||
|
#
|
||||||
|
# Licensed under the Apache License, Version 2.0 (the "License");
|
||||||
|
# you may not use this file except in compliance with the License.
|
||||||
|
# You may obtain a copy of the License at
|
||||||
|
#
|
||||||
|
# http://www.apache.org/licenses/LICENSE-2.0
|
||||||
|
#
|
||||||
|
# Unless required by applicable law or agreed to in writing, software
|
||||||
|
# distributed under the License is distributed on an "AS IS" BASIS,
|
||||||
|
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||||
|
# See the License for the specific language governing permissions and
|
||||||
|
# limitations under the License.
|
||||||
|
# ============================================================================
|
||||||
|
|
||||||
|
"""
|
||||||
|
learning rate scheduler
|
||||||
|
"""
|
||||||
|
from .lr_scheduler import *
|
|
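Every scheduler in the `lr_scheduler.py` module added next precomputes a per-step learning-rate array: steps below `warmup_steps` follow `_LinearWarmUp`, later steps follow the decay rule. As a self-contained NumPy sketch, here is a closed-form version of the `MultiStepLR` schedule. `warmup_multistep_lr` is a hypothetical helper; it matches the class only while every milestone falls after the warmup period, because the class skips milestones reached during warmup.

```python
import numpy as np


def warmup_multistep_lr(lr, milestones, gamma, steps_per_epoch, max_epoch,
                        warmup_epochs=0, warmup_init_lr=0.0):
    """Per-step lr: linear warmup, then multiply by gamma at each milestone epoch."""
    warmup_steps = int(warmup_epochs * steps_per_epoch)
    total_steps = int(max_epoch * steps_per_epoch)
    lr_each_step = []
    for i in range(total_steps):
        if i < warmup_steps:
            # Linear ramp; step i uses (i + 1) so the last warmup step reaches lr.
            lr_i = warmup_init_lr + (lr - warmup_init_lr) * (i + 1) / warmup_steps
        else:
            epoch = i // steps_per_epoch
            # Count distinct milestones already reached (decay starts at that epoch).
            lr_i = lr * gamma ** sum(1 for m in set(milestones) if epoch >= m)
        lr_each_step.append(lr_i)
    return np.array(lr_each_step, dtype=np.float32)


lrs = warmup_multistep_lr(0.1, milestones=[3, 6], gamma=0.1,
                          steps_per_epoch=2, max_epoch=8, warmup_epochs=1)
print(lrs)
```

Counting passed milestones instead of updating a running value makes any single step easy to inspect without replaying the whole schedule.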
@@ -0,0 +1,656 @@
|
||||||
|
# Copyright 2020 Huawei Technologies Co., Ltd
|
||||||
|
#
|
||||||
|
# Licensed under the Apache License, Version 2.0 (the "License");
|
||||||
|
# you may not use this file except in compliance with the License.
|
||||||
|
# You may obtain a copy of the License at
|
||||||
|
#
|
||||||
|
# http://www.apache.org/licenses/LICENSE-2.0
|
||||||
|
#
|
||||||
|
# Unless required by applicable law or agreed to in writing, software
|
||||||
|
# distributed under the License is distributed on an "AS IS" BASIS,
|
||||||
|
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||||
|
# See the License for the specific language governing permissions and
|
||||||
|
# limitations under the License.
|
||||||
|
# ============================================================================
|
||||||
|
"""
|
||||||
|
learning rate scheduler
|
||||||
|
"""
|
||||||
|
|
||||||
|
import math
|
||||||
|
from collections import Counter
|
||||||
|
import numpy as np
|
||||||
|
|
||||||
|
__all__ = ["LambdaLR", "MultiplicativeLR", "StepLR", "MultiStepLR", "ExponentialLR",
|
||||||
|
"CosineAnnealingLR", "CyclicLR", "CosineAnnealingWarmRestarts", "OneCycleLR"]
|
||||||
|
|
||||||
|
class _WarmUp():
|
||||||
|
def __init__(self, warmup_init_lr):
|
||||||
|
self.warmup_init_lr = warmup_init_lr
|
||||||
|
|
||||||
|
def get_lr(self):
|
||||||
|
# Get learning rate during warmup
|
||||||
|
raise NotImplementedError
|
||||||
|
|
||||||
|
class _LinearWarmUp(_WarmUp):
|
||||||
|
"""
|
||||||
|
linear warmup function
|
||||||
|
"""
|
||||||
|
def __init__(self, lr, warmup_epochs, steps_per_epoch, warmup_init_lr=0):
|
||||||
|
self.base_lr = lr
|
||||||
|
self.warmup_init_lr = warmup_init_lr
|
||||||
|
self.warmup_steps = int(warmup_epochs * steps_per_epoch)
|
||||||
|
|
||||||
|
super(_LinearWarmUp, self).__init__(warmup_init_lr)
|
||||||
|
|
||||||
|
def get_warmup_steps(self):
|
||||||
|
return self.warmup_steps
|
||||||
|
|
||||||
|
def get_lr(self, current_step):
|
||||||
|
lr_inc = (float(self.base_lr) - float(self.warmup_init_lr)) / float(self.warmup_steps)
|
||||||
|
lr = float(self.warmup_init_lr) + lr_inc * current_step
|
||||||
|
return lr
|
||||||
|
|
||||||
|
class _ConstWarmUp(_WarmUp):
|
||||||
|
|
||||||
|
def get_lr(self):
|
||||||
|
return self.warmup_init_lr
|
||||||
|
|
||||||
|
class _LRScheduler():
|
||||||
|
|
||||||
|
def __init__(self, lr, max_epoch, steps_per_epoch):
|
||||||
|
self.base_lr = lr
|
||||||
|
self.steps_per_epoch = steps_per_epoch
|
||||||
|
self.total_steps = int(max_epoch * steps_per_epoch)
|
||||||
|
|
||||||
|
def get_lr(self):
|
||||||
|
# Return the learning rate of every training step as an array
|
||||||
|
raise NotImplementedError
|
||||||
|
|
||||||
|
|
||||||
|
class LambdaLR(_LRScheduler):
|
||||||
|
"""Sets the learning rate to the initial lr times a given function.
|
||||||
|
|
||||||
|
Args:
|
||||||
|
lr (float): Initial (base) learning rate.
|
||||||
|
steps_per_epoch (int): The number of steps per epoch to train for. This is
|
||||||
|
used along with epochs in order to infer the total number of steps in the cycle.
|
||||||
|
max_epoch (int): The number of epochs to train for. This is used along
|
||||||
|
with steps_per_epoch in order to infer the total number of steps in the cycle.
|
||||||
|
lr_lambda (function): A function which computes a multiplicative
|
||||||
|
factor given an integer parameter epoch.
|
||||||
|
warmup_epochs (int): The number of linear warmup epochs.
|
||||||
|
Default: 0
|
||||||
|
Example:
|
||||||
|
>>> # Scale the base lr by 0.95 ** epoch.
>>> lambda1 = lambda epoch: 0.95 ** epoch
|
||||||
|
>>> scheduler = LambdaLR(lr=0.1, lr_lambda=lambda1, steps_per_epoch=5000,
|
||||||
|
...                      max_epoch=90, warmup_epochs=0)
|
||||||
|
>>> lr = scheduler.get_lr()
|
||||||
|
"""
|
||||||
|
|
||||||
|
def __init__(self, lr, lr_lambda, steps_per_epoch, max_epoch, warmup_epochs=0):
|
||||||
|
self.lr_lambda = lr_lambda
|
||||||
|
self.warmup = _LinearWarmUp(lr, warmup_epochs, steps_per_epoch)
|
||||||
|
super(LambdaLR, self).__init__(lr, max_epoch, steps_per_epoch)
|
||||||
|
|
||||||
|
def get_lr(self):
|
||||||
|
warmup_steps = self.warmup.get_warmup_steps()
|
||||||
|
|
||||||
|
lr_each_step = []
|
||||||
|
for i in range(self.total_steps):
|
||||||
|
if i < warmup_steps:
|
||||||
|
lr = self.warmup.get_lr(i+1)
|
||||||
|
else:
|
||||||
|
cur_ep = i // self.steps_per_epoch
|
||||||
|
lr = self.base_lr * self.lr_lambda(cur_ep)
|
||||||
|
lr_each_step.append(lr)
|
||||||
|
|
||||||
|
return np.array(lr_each_step).astype(np.float32)
|
||||||
|
|
||||||
|
|
||||||
|
class MultiplicativeLR(_LRScheduler):
|
||||||
|
"""Multiply the learning rate by the factor given
|
||||||
|
in the specified function.
|
||||||
|
|
||||||
|
Args:
|
||||||
|
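lr (float): Initial (base) learning rate.
lr_lambda (function): A function which computes a multiplicative
factor given an integer parameter epoch.
steps_per_epoch (int): The number of steps per epoch to train for.
max_epoch (int): The number of epochs to train for.
warmup_epochs (int): The number of linear warmup epochs. Default: 0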
|
||||||
|
|
||||||
|
Example:
|
||||||
|
>>> lmbda = lambda epoch: 0.95
|
||||||
|
>>> scheduler = MultiplicativeLR(lr=0.1, lr_lambda=lmbda, steps_per_epoch=5000,
...                              max_epoch=90, warmup_epochs=0)
|
||||||
|
>>> lr = scheduler.get_lr()
|
||||||
|
"""
|
||||||
|
def __init__(self, lr, lr_lambda, steps_per_epoch, max_epoch, warmup_epochs=0):
|
||||||
|
self.lr_lambda = lr_lambda
|
||||||
|
self.warmup = _LinearWarmUp(lr, warmup_epochs, steps_per_epoch)
|
||||||
|
super(MultiplicativeLR, self).__init__(lr, max_epoch, steps_per_epoch)
|
||||||
|
|
||||||
|
def get_lr(self):
|
||||||
|
warmup_steps = self.warmup.get_warmup_steps()
|
||||||
|
|
||||||
|
lr_each_step = []
|
||||||
|
current_lr = self.base_lr
|
||||||
|
for i in range(self.total_steps):
|
||||||
|
if i < warmup_steps:
|
||||||
|
lr = self.warmup.get_lr(i+1)
|
||||||
|
else:
|
||||||
|
cur_ep = i // self.steps_per_epoch
|
||||||
|
if i % self.steps_per_epoch == 0 and cur_ep > 0:
|
||||||
|
current_lr = current_lr * self.lr_lambda(cur_ep)
|
||||||
|
|
||||||
|
lr = current_lr
|
||||||
|
|
||||||
|
lr_each_step.append(lr)
|
||||||
|
|
||||||
|
return np.array(lr_each_step).astype(np.float32)
|
||||||
|
|
||||||
|
|
||||||
|
class StepLR(_LRScheduler):
|
||||||
|
"""Decays the learning rate by gamma every epoch_size epochs.
|
||||||
|
|
||||||
|
Args:
|
||||||
|
lr (float): Initial (base) learning rate.
|
||||||
|
steps_per_epoch (int): The number of steps per epoch to train for. This is
|
||||||
|
used along with epochs in order to infer the total number of steps in the cycle.
|
||||||
|
max_epoch (int): The number of epochs to train for. This is used along
|
||||||
|
with steps_per_epoch in order to infer the total number of steps in the cycle.
|
||||||
|
epoch_size (int): Period of learning rate decay.
|
||||||
|
gamma (float): Multiplicative factor of learning rate decay.
|
||||||
|
warmup_epochs (int): The number of linear warmup epochs.
|
||||||
|
Default: 0
|
||||||
|
|
||||||
|
Example:
|
||||||
|
>>> # lr = 0.1 if epoch < 30
>>> # lr = 0.01 if 30 <= epoch < 60
>>> # lr = 0.001 if 60 <= epoch < 90
|
||||||
|
>>> scheduler = StepLR(lr=0.1, epoch_size=30, gamma=0.1, steps_per_epoch=5000,
|
||||||
|
...                    max_epoch=90, warmup_epochs=0)
|
||||||
|
>>> lr = scheduler.get_lr()
|
||||||
|
"""
|
||||||
|
|
||||||
|
def __init__(self, lr, epoch_size, gamma, steps_per_epoch, max_epoch, warmup_epochs=0):
|
||||||
|
self.epoch_size = epoch_size
|
||||||
|
self.gamma = gamma
|
||||||
|
self.warmup = _LinearWarmUp(lr, warmup_epochs, steps_per_epoch)
|
||||||
|
super(StepLR, self).__init__(lr, max_epoch, steps_per_epoch)
|
||||||
|
|
||||||
|
def get_lr(self):
|
||||||
|
warmup_steps = self.warmup.get_warmup_steps()
|
||||||
|
|
||||||
|
lr_each_step = []
|
||||||
|
for i in range(self.total_steps):
|
||||||
|
if i < warmup_steps:
|
||||||
|
lr = self.warmup.get_lr(i+1)
|
||||||
|
else:
|
||||||
|
cur_ep = i // self.steps_per_epoch
|
||||||
|
lr = self.base_lr * self.gamma**(cur_ep // self.epoch_size)
|
||||||
|
|
||||||
|
lr_each_step.append(lr)
|
||||||
|
|
||||||
|
return np.array(lr_each_step).astype(np.float32)
|
||||||
|
|
||||||
|
|
||||||
|
class MultiStepLR(_LRScheduler):
|
||||||
|
"""Decays the learning rate by gamma once the number of epoch reaches one
|
||||||
|
of the milestones.
|
||||||
|
|
||||||
|
Args:
|
||||||
|
lr (float): Initial (base) learning rate.
|
||||||
|
steps_per_epoch (int): The number of steps per epoch to train for. This is
|
||||||
|
used along with epochs in order to infer the total number of steps in the cycle.
|
||||||
|
max_epoch (int): The number of epochs to train for. This is used along
|
||||||
|
with steps_per_epoch in order to infer the total number of steps in the cycle.
|
||||||
|
milestones (list): List of epoch indices. Must be increasing.
|
||||||
|
gamma (float): Multiplicative factor of learning rate decay.
|
||||||
|
warmup_epochs (int): The number of linear warmup epochs.
|
||||||
|
Default: 0
|
||||||
|
|
||||||
|
Example:
|
||||||
|
>>> # lr = 0.1 if epoch < 30
>>> # lr = 0.01 if 30 <= epoch < 80
>>> # lr = 0.001 if epoch >= 80
|
||||||
|
>>> scheduler = MultiStepLR(lr=0.1, milestones=[30, 80], gamma=0.1, steps_per_epoch=5000,
...                         max_epoch=90, warmup_epochs=0)
|
||||||
|
>>> lr = scheduler.get_lr()
|
||||||
|
"""
|
||||||
|
|
||||||
|
def __init__(self, lr, milestones, gamma, steps_per_epoch, max_epoch, warmup_epochs=0):
|
||||||
|
self.milestones = Counter(milestones)
|
||||||
|
self.gamma = gamma
|
||||||
|
self.warmup = _LinearWarmUp(lr, warmup_epochs, steps_per_epoch)
|
||||||
|
super(MultiStepLR, self).__init__(lr, max_epoch, steps_per_epoch)
|
||||||
|
|
||||||
|
def get_lr(self):
|
||||||
|
warmup_steps = self.warmup.get_warmup_steps()
|
||||||
|
|
||||||
|
lr_each_step = []
|
||||||
|
current_lr = self.base_lr
|
||||||
|
for i in range(self.total_steps):
|
||||||
|
if i < warmup_steps:
|
||||||
|
lr = self.warmup.get_lr(i+1)
|
||||||
|
else:
|
||||||
|
cur_ep = i // self.steps_per_epoch
|
||||||
|
if i % self.steps_per_epoch == 0 and cur_ep in self.milestones:
|
||||||
|
current_lr = current_lr * self.gamma
|
||||||
|
lr = current_lr
|
||||||
|
|
||||||
|
lr_each_step.append(lr)
|
||||||
|
|
||||||
|
return np.array(lr_each_step).astype(np.float32)
|
||||||
|
|
||||||
|
|
||||||
|
class ExponentialLR(_LRScheduler):
|
||||||
|
"""Decays the learning rate by gamma every epoch.
|
||||||
|
|
||||||
|
Args:
|
||||||
|
lr (float): Initial (base) learning rate.
|
||||||
|
gamma (float): Multiplicative factor of learning rate decay.
|
||||||
|
steps_per_epoch (int): The number of steps per epoch to train for. This is
|
||||||
|
used along with epochs in order to infer the total number of steps in the cycle.
|
||||||
|
max_epoch (int): The number of epochs to train for. This is used along
|
||||||
|
with steps_per_epoch in order to infer the total number of steps in the cycle.
|
||||||
|
warmup_epochs (int): The number of linear warmup epochs.
|
||||||
|
Default: 0
|
||||||
|
"""
|
||||||
|
|
||||||
|
def __init__(self, lr, gamma, steps_per_epoch, max_epoch, warmup_epochs=0):
|
||||||
|
self.gamma = gamma
|
||||||
|
self.warmup = _LinearWarmUp(lr, warmup_epochs, steps_per_epoch)
|
||||||
|
super(ExponentialLR, self).__init__(lr, max_epoch, steps_per_epoch)
|
||||||
|
|
||||||
|
def get_lr(self):
|
||||||
|
warmup_steps = self.warmup.get_warmup_steps()
|
||||||
|
|
||||||
|
lr_each_step = []
|
||||||
|
current_lr = self.base_lr
|
||||||
|
for i in range(self.total_steps):
|
||||||
|
if i < warmup_steps:
|
||||||
|
lr = self.warmup.get_lr(i+1)
|
||||||
|
else:
|
||||||
|
if i % self.steps_per_epoch == 0 and i > 0:
|
||||||
|
current_lr = current_lr * self.gamma
|
||||||
|
lr = current_lr
|
||||||
|
|
||||||
|
lr_each_step.append(lr)
|
||||||
|
|
||||||
|
return np.array(lr_each_step).astype(np.float32)
|
||||||
|
|
||||||
|
|
||||||
|
class CosineAnnealingLR(_LRScheduler):
|
||||||
|
r"""Set the learning rate using a cosine annealing schedule, where
|
||||||
|
:math:`\eta_{max}` is set to the initial lr and :math:`T_{cur}` is the
|
||||||
|
number of epochs since the last restart in SGDR:
|
||||||
|
|
||||||
|
.. math::
|
||||||
|
\begin{aligned}
|
||||||
|
\eta_t & = \eta_{min} + \frac{1}{2}(\eta_{max} - \eta_{min})\left(1
|
||||||
|
+ \cos\left(\frac{T_{cur}}{T_{max}}\pi\right)\right),
|
||||||
|
& T_{cur} \neq (2k+1)T_{max}; \\
|
||||||
|
\eta_{t+1} & = \eta_{t} + \frac{1}{2}(\eta_{max} - \eta_{min})
|
||||||
|
\left(1 - \cos\left(\frac{1}{T_{max}}\pi\right)\right),
|
||||||
|
& T_{cur} = (2k+1)T_{max}.
|
||||||
|
\end{aligned}
|
||||||
|
|
||||||
|
It has been proposed in
|
||||||
|
`SGDR: Stochastic Gradient Descent with Warm Restarts`_. Note that this only
|
||||||
|
implements the cosine annealing part of SGDR, and not the restarts.
|
||||||
|
|
||||||
|
Args:
|
||||||
|
lr (float): Initial learning rate, i.e. :math:`\eta_{max}`.
|
||||||
|
T_max (int): Number of epochs over which the learning rate anneals from lr to eta_min.
|
||||||
|
eta_min (float): Minimum learning rate. Default: 0.
|
||||||
|
steps_per_epoch (int): The number of steps per epoch to train for. This is
|
||||||
|
used along with epochs in order to infer the total number of steps in the cycle.
|
||||||
|
max_epoch (int): The number of epochs to train for. This is used along
|
||||||
|
with steps_per_epoch in order to infer the total number of steps in the cycle.
|
||||||
|
warmup_epochs (int): The number of linear warmup epochs.
|
||||||
|
Default: 0
|
||||||
|
|
||||||
|
.. _SGDR\: Stochastic Gradient Descent with Warm Restarts:
|
||||||
|
https://arxiv.org/abs/1608.03983
|
||||||
|
"""
|
||||||
|
|
||||||
|
def __init__(self, lr, T_max, steps_per_epoch, max_epoch, warmup_epochs=0, eta_min=0):
|
||||||
|
self.T_max = T_max
|
||||||
|
self.eta_min = eta_min
|
||||||
|
self.warmup = _LinearWarmUp(lr, warmup_epochs, steps_per_epoch)
|
||||||
|
super(CosineAnnealingLR, self).__init__(lr, max_epoch, steps_per_epoch)
|
||||||
|
|
||||||
|
def get_lr(self):
|
||||||
|
warmup_steps = self.warmup.get_warmup_steps()
|
||||||
|
|
||||||
|
lr_each_step = []
|
||||||
|
current_lr = self.base_lr
|
||||||
|
for i in range(self.total_steps):
|
||||||
|
if i < warmup_steps:
|
||||||
|
lr = self.warmup.get_lr(i+1)
|
||||||
|
else:
|
||||||
|
cur_ep = i // self.steps_per_epoch
|
||||||
|
if i % self.steps_per_epoch == 0 and i > 0:
|
||||||
|
current_lr = self.eta_min + \
|
||||||
|
(self.base_lr - self.eta_min) * (1. + math.cos(math.pi*cur_ep / self.T_max)) / 2
|
||||||
|
|
||||||
|
lr = current_lr
|
||||||
|
|
||||||
|
lr_each_step.append(lr)
|
||||||
|
|
||||||
|
return np.array(lr_each_step).astype(np.float32)
|
||||||
|
|
||||||
|
|
||||||
|
class CyclicLR(_LRScheduler):
|
||||||
|
r"""Sets the learning rate according to cyclical learning rate policy (CLR).
|
||||||
|
The policy cycles the learning rate between two boundaries with a constant
|
||||||
|
frequency, as detailed in the paper `Cyclical Learning Rates for Training
|
||||||
|
Neural Networks`_. The distance between the two boundaries can be scaled on
|
||||||
|
a per-iteration or per-cycle basis.
|
||||||
|
|
||||||
|
Cyclical learning rate policy changes the learning rate after every batch.
|
||||||
|
|
||||||
|
This class has three built-in policies, as put forth in the paper:
|
||||||
|
|
||||||
|
* "triangular": A basic triangular cycle without amplitude scaling.
|
||||||
|
* "triangular2": A basic triangular cycle that scales initial amplitude by half each cycle.
|
||||||
|
* "exp_range": A cycle that scales initial amplitude by :math:`\text{gamma}^{\text{cycle iterations}}`
|
||||||
|
at each cycle iteration.
|
||||||
|
|
||||||
|
This implementation was adapted from the github repo: `bckenstler/CLR`_
|
||||||
|
|
||||||
|
Args:
|
||||||
|
lr (float): Initial learning rate which is the
|
||||||
|
lower boundary in the cycle.
|
||||||
|
max_lr (float): Upper learning rate boundary in the cycle.
|
||||||
|
Functionally, it defines the cycle amplitude (max_lr - base_lr).
|
||||||
|
The lr at any cycle is the sum of base_lr and some scaling
|
||||||
|
of the amplitude; therefore max_lr may not actually be reached
|
||||||
|
depending on scaling function.
|
||||||
|
steps_per_epoch (int): The number of steps per epoch to train for. This is
|
||||||
|
used along with epochs in order to infer the total number of steps in the cycle.
|
||||||
|
max_epoch (int): The number of epochs to train for. This is used along
|
||||||
|
with steps_per_epoch in order to infer the total number of steps in the cycle.
|
||||||
|
step_size_up (int): Number of training iterations in the
|
||||||
|
increasing half of a cycle. Default: 2000
|
||||||
|
step_size_down (int): Number of training iterations in the
|
||||||
|
decreasing half of a cycle. If step_size_down is None,
|
||||||
|
it is set to step_size_up. Default: None
|
||||||
|
mode (str): One of {triangular, triangular2, exp_range}.
|
||||||
|
Values correspond to policies detailed above.
|
||||||
|
If scale_fn is not None, this argument is ignored.
|
||||||
|
Default: 'triangular'
|
||||||
|
gamma (float): Constant in 'exp_range' scaling function:
|
||||||
|
gamma**(cycle iterations)
|
||||||
|
Default: 1.0
|
||||||
|
scale_fn (function): Custom scaling policy defined by a single
|
||||||
|
argument lambda function, where
|
||||||
|
0 <= scale_fn(x) <= 1 for all x >= 0.
|
||||||
|
If specified, then 'mode' is ignored.
|
||||||
|
Default: None
|
||||||
|
scale_mode (str): {'cycle', 'iterations'}.
|
||||||
|
Defines whether scale_fn is evaluated on
|
||||||
|
cycle number or cycle iterations (training
|
||||||
|
iterations since start of cycle).
|
||||||
|
Default: 'cycle'
|
||||||
|
warmup_epochs (int): The number of linear warmup epochs.
|
||||||
|
Default: 0
|
||||||
|
|
||||||
|
.. _Cyclical Learning Rates for Training Neural Networks: https://arxiv.org/abs/1506.01186
|
||||||
|
.. _bckenstler/CLR: https://github.com/bckenstler/CLR
|
||||||
|
"""
|
||||||
|
|
||||||
|
def __init__(self,
|
||||||
|
lr,
|
||||||
|
max_lr,
|
||||||
|
steps_per_epoch,
|
||||||
|
max_epoch,
|
||||||
|
step_size_up=2000,
|
||||||
|
step_size_down=None,
|
||||||
|
mode='triangular',
|
||||||
|
gamma=1.,
|
||||||
|
scale_fn=None,
|
||||||
|
scale_mode='cycle',
|
||||||
|
warmup_epochs=0):
|
||||||
|
|
||||||
|
self.max_lr = max_lr
|
||||||
|
|
||||||
|
step_size_up = float(step_size_up)
|
||||||
|
step_size_down = float(step_size_down) if step_size_down is not None else step_size_up
|
||||||
|
self.total_size = step_size_up + step_size_down
|
||||||
|
self.step_ratio = step_size_up / self.total_size
|
||||||
|
|
||||||
|
if mode not in ['triangular', 'triangular2', 'exp_range'] \
|
||||||
|
and scale_fn is None:
|
||||||
|
raise ValueError('mode is invalid and scale_fn is None')
|
||||||
|
|
||||||
|
self.mode = mode
|
||||||
|
self.gamma = gamma
|
||||||
|
|
||||||
|
if scale_fn is None:
|
||||||
|
if self.mode == 'triangular':
|
||||||
|
self.scale_fn = self._triangular_scale_fn
|
||||||
|
self.scale_mode = 'cycle'
|
||||||
|
elif self.mode == 'triangular2':
|
||||||
|
self.scale_fn = self._triangular2_scale_fn
|
||||||
|
self.scale_mode = 'cycle'
|
||||||
|
elif self.mode == 'exp_range':
|
||||||
|
self.scale_fn = self._exp_range_scale_fn
|
||||||
|
self.scale_mode = 'iterations'
|
||||||
|
else:
|
||||||
|
self.scale_fn = scale_fn
|
||||||
|
self.scale_mode = scale_mode
|
||||||
|
|
||||||
|
self.warmup = _LinearWarmUp(lr, warmup_epochs, steps_per_epoch)
|
||||||
|
super(CyclicLR, self).__init__(lr, max_epoch, steps_per_epoch)
|
||||||
|
|
||||||
|
def _triangular_scale_fn(self, x):
|
||||||
|
return 1.
|
||||||
|
|
||||||
|
def _triangular2_scale_fn(self, x):
|
||||||
|
return 1 / (2. ** (x - 1))
|
||||||
|
|
||||||
|
def _exp_range_scale_fn(self, x):
|
||||||
|
return self.gamma**(x)
|
||||||
|
|
||||||
|
def get_lr(self):
|
||||||
|
warmup_steps = self.warmup.get_warmup_steps()
|
||||||
|
|
||||||
|
lr_each_step = []
|
||||||
|
for i in range(self.total_steps):
|
||||||
|
if i < warmup_steps:
|
||||||
|
lr = self.warmup.get_lr(i+1)
|
||||||
|
else:
|
||||||
|
# Calculates the learning rate at batch index.
|
||||||
|
cycle = math.floor(1 + i / self.total_size)
|
||||||
|
x = 1. + i / self.total_size - cycle
|
||||||
|
if x <= self.step_ratio:
|
||||||
|
scale_factor = x / self.step_ratio
|
||||||
|
else:
|
||||||
|
scale_factor = (x - 1) / (self.step_ratio - 1)
|
||||||
|
|
||||||
|
base_height = (self.max_lr - self.base_lr) * scale_factor
|
||||||
|
if self.scale_mode == 'cycle':
|
||||||
|
lr = self.base_lr + base_height * self.scale_fn(cycle)
|
||||||
|
else:
|
||||||
|
lr = self.base_lr + base_height * self.scale_fn(i)
|
||||||
|
|
||||||
|
lr_each_step.append(lr)
|
||||||
|
|
||||||
|
return np.array(lr_each_step).astype(np.float32)
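The per-step formula in `CyclicLR.get_lr` can be checked in isolation. The sketch below re-implements only its post-warmup branch for the 'triangular' and 'triangular2' modes as a plain function; `cyclic_lr` and its argument names are hypothetical and not part of this codebase.

```python
import math


def cyclic_lr(i, base_lr, max_lr, step_size_up, step_size_down=None, mode='triangular'):
    """Learning rate at step i, mirroring the non-warmup branch of CyclicLR.get_lr."""
    step_size_down = step_size_down if step_size_down is not None else step_size_up
    total_size = float(step_size_up + step_size_down)
    step_ratio = step_size_up / total_size

    cycle = math.floor(1 + i / total_size)  # 1-based cycle index
    x = 1. + i / total_size - cycle         # position inside the current cycle, in [0, 1)
    if x <= step_ratio:
        scale_factor = x / step_ratio                # rising edge
    else:
        scale_factor = (x - 1) / (step_ratio - 1)    # falling edge

    # 'triangular' keeps the full amplitude; 'triangular2' halves it every cycle
    amplitude = 1. if mode == 'triangular' else 1 / (2. ** (cycle - 1))
    return base_lr + (max_lr - base_lr) * scale_factor * amplitude


if __name__ == '__main__':
    for step in range(0, 13, 2):
        print(step, round(cyclic_lr(step, 0.1, 0.5, step_size_up=4), 4))
```

With `step_size_up=4`, the rate climbs from 0.1 to 0.5 over four steps, falls back to 0.1 by step 8, and then repeats; in 'triangular2' mode the second peak only reaches 0.3.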


class CosineAnnealingWarmRestarts(_LRScheduler):
    r"""Set the learning rate using a cosine annealing schedule, where
    :math:`\eta_{max}` is set to the initial lr, :math:`T_{cur}` is the
    number of epochs since the last restart and :math:`T_{i}` is the number
    of epochs between two warm restarts in SGDR:

    .. math::
        \eta_t = \eta_{min} + \frac{1}{2}(\eta_{max} - \eta_{min})\left(1 +
        \cos\left(\frac{T_{cur}}{T_{i}}\pi\right)\right)

    When :math:`T_{cur}=T_{i}`, set :math:`\eta_t = \eta_{min}`.
    When :math:`T_{cur}=0` after restart, set :math:`\eta_t=\eta_{max}`.

    It has been proposed in
    `SGDR: Stochastic Gradient Descent with Warm Restarts`_.

    Args:
        lr (float): Initial learning rate.
        steps_per_epoch (int): The number of steps per epoch to train for. This is
            used along with epochs in order to infer the total number of steps in the cycle.
        max_epoch (int): The number of epochs to train for. This is used along
            with steps_per_epoch in order to infer the total number of steps in the cycle.
        T_0 (int): Number of epochs before the first restart.
        T_mult (int, optional): Factor by which :math:`T_{i}` is multiplied after each restart. Default: 1.
        eta_min (float, optional): Minimum learning rate. Default: 0.
        warmup_epochs (int): The number of epochs of linear warmup.
            Default: 0

    .. _SGDR\: Stochastic Gradient Descent with Warm Restarts:
        https://arxiv.org/abs/1608.03983
    """

    def __init__(self, lr, steps_per_epoch, max_epoch, T_0, T_mult=1, eta_min=0, warmup_epochs=0):
        if T_0 <= 0 or not isinstance(T_0, int):
            raise ValueError("Expected positive integer T_0, but got {}".format(T_0))
        if T_mult < 1 or not isinstance(T_mult, int):
            raise ValueError("Expected integer T_mult >= 1, but got {}".format(T_mult))
        self.T_0 = T_0
        self.T_i = T_0
        self.T_mult = T_mult
        self.eta_min = eta_min
        self.T_cur = 0

        self.warmup = _LinearWarmUp(lr, warmup_epochs, steps_per_epoch)
        super(CosineAnnealingWarmRestarts, self).__init__(lr, max_epoch, steps_per_epoch)

    def get_lr(self):
        warmup_steps = self.warmup.get_warmup_steps()

        # Work on local copies so that calling get_lr() more than once returns the same schedule.
        t_cur = self.T_cur
        t_i = self.T_i
        lr_each_step = []
        for i in range(self.total_steps):
            if i < warmup_steps:
                lr = self.warmup.get_lr(i+1)
            else:
                # T_cur advances once per epoch; a restart resets it and stretches T_i by T_mult.
                if i % self.steps_per_epoch == 0 and i > 0:
                    t_cur += 1
                    if t_cur >= t_i:
                        t_cur = t_cur - t_i
                        t_i = t_i * self.T_mult

                lr = self.eta_min + (self.base_lr - self.eta_min) * \
                     (1 + math.cos(math.pi * t_cur / t_i)) / 2

            lr_each_step.append(lr)

        return np.array(lr_each_step).astype(np.float32)
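Because `T_cur` advances once per epoch in this implementation, the restart points can be read off at epoch granularity. The sketch below (hypothetical helper `sgdr_epoch_lrs`, no warmup) replays the `T_cur`/`T_i` bookkeeping of `get_lr` to show how `T_mult` stretches each period.

```python
import math


def sgdr_epoch_lrs(base_lr, num_epochs, T_0, T_mult=1, eta_min=0.0):
    """Learning rate used during each epoch, following the T_cur/T_i updates in get_lr."""
    t_cur, t_i = 0, T_0
    lrs = []
    for epoch in range(num_epochs):
        if epoch > 0:
            t_cur += 1
            if t_cur >= t_i:      # warm restart: reset T_cur and lengthen the period
                t_cur -= t_i
                t_i *= T_mult
        lrs.append(eta_min + (base_lr - eta_min) * (1 + math.cos(math.pi * t_cur / t_i)) / 2)
    return lrs


# With T_0=2 and T_mult=2 the periods last 2, 4, 8, ... epochs,
# so the learning rate jumps back to base_lr at epochs 2 and 6.
lrs = sgdr_epoch_lrs(0.1, 8, T_0=2, T_mult=2)
```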


class OneCycleLR(_LRScheduler):
    r"""Computes a per-step learning rate following the
    1cycle learning rate policy. The 1cycle policy anneals the learning
    rate from an initial learning rate to some maximum learning rate and then
    from that maximum learning rate to some minimum learning rate much lower
    than the initial learning rate.
    This policy was initially described in the paper `Super-Convergence:
    Very Fast Training of Neural Networks Using Large Learning Rates`_.

    The 1cycle learning rate policy changes the learning rate after every batch.
    This scheduler is not chainable.

    Args:
        lr (float): Initial learning rate.
        steps_per_epoch (int): The number of steps per epoch to train for. This is
            used along with epochs in order to infer the total number of steps in the cycle.
        max_epoch (int): The number of epochs to train for. This is used along
            with steps_per_epoch in order to infer the total number of steps in the cycle.
        pct_start (float): The fraction of the cycle (in number of steps) spent
            increasing the learning rate.
            Default: 0.3
        anneal_strategy (str): {'cos', 'linear'}
            Specifies the annealing strategy: "cos" for cosine annealing, "linear" for
            linear annealing.
            Default: 'cos'
        div_factor (float): Determines the max learning rate via
            max_lr = lr * div_factor
            Default: 25
        final_div_factor (float): Determines the minimum learning rate via
            min_lr = lr / final_div_factor
            Default: 1e4
        warmup_epochs (int): The number of epochs of linear warmup.
            Default: 0

    .. _Super-Convergence\: Very Fast Training of Neural Networks Using Large Learning Rates:
        https://arxiv.org/abs/1708.07120
    """
    def __init__(self,
                 lr,
                 steps_per_epoch,
                 max_epoch,
                 pct_start=0.3,
                 anneal_strategy='cos',
                 div_factor=25.,
                 final_div_factor=1e4,
                 warmup_epochs=0):

        self.warmup = _LinearWarmUp(lr, warmup_epochs, steps_per_epoch)
        super(OneCycleLR, self).__init__(lr, max_epoch, steps_per_epoch)

        self.step_size_up = float(pct_start * self.total_steps) - 1
        self.step_size_down = float(self.total_steps - self.step_size_up) - 1

        # Validate pct_start
        if pct_start < 0 or pct_start > 1 or not isinstance(pct_start, float):
            raise ValueError("Expected float between 0 and 1 pct_start, but got {}".format(pct_start))

        # Validate anneal_strategy
        if anneal_strategy not in ['cos', 'linear']:
            raise ValueError("anneal_strategy must be one of 'cos' or 'linear', instead got {}".format(anneal_strategy))
        if anneal_strategy == 'cos':
            self.anneal_func = self._annealing_cos
        elif anneal_strategy == 'linear':
            self.anneal_func = self._annealing_linear

        # Initialize learning rate variables
        self.max_lr = lr * div_factor
        self.min_lr = lr / final_div_factor

    def _annealing_cos(self, start, end, pct):
        "Cosine anneal from `start` to `end` as pct goes from 0.0 to 1.0."
        cos_out = math.cos(math.pi * pct) + 1
        return end + (start - end) / 2.0 * cos_out

    def _annealing_linear(self, start, end, pct):
        "Linearly anneal from `start` to `end` as pct goes from 0.0 to 1.0."
        return (end - start) * pct + start

    def get_lr(self):
        warmup_steps = self.warmup.get_warmup_steps()

        lr_each_step = []
        for i in range(self.total_steps):
            if i < warmup_steps:
                lr = self.warmup.get_lr(i+1)
            else:
                if i <= self.step_size_up:
                    lr = self.anneal_func(self.base_lr, self.max_lr, i / self.step_size_up)

                else:
                    down_step_num = i - self.step_size_up
                    lr = self.anneal_func(self.max_lr, self.min_lr, down_step_num / self.step_size_down)

            lr_each_step.append(lr)

        return np.array(lr_each_step).astype(np.float32)
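To see where the 1cycle peak lands, the sketch below recomputes `step_size_up`, `step_size_down`, `max_lr`, and `min_lr` the way `OneCycleLR.__init__` does and evaluates the cosine branch of `get_lr` without warmup. The function name `one_cycle_lrs` is hypothetical, and the initial rate passed in plays the role of `base_lr`.

```python
import math


def _annealing_cos(start, end, pct):
    """Cosine anneal from start to end as pct goes from 0.0 to 1.0."""
    return end + (start - end) / 2.0 * (math.cos(math.pi * pct) + 1)


def one_cycle_lrs(lr, total_steps, pct_start=0.3, div_factor=25., final_div_factor=1e4):
    step_size_up = float(pct_start * total_steps) - 1
    step_size_down = float(total_steps - step_size_up) - 1
    max_lr = lr * div_factor          # peak learning rate
    min_lr = lr / final_div_factor    # learning rate at the last step
    lrs = []
    for i in range(total_steps):
        if i <= step_size_up:
            lrs.append(_annealing_cos(lr, max_lr, i / step_size_up))
        else:
            lrs.append(_annealing_cos(max_lr, min_lr, (i - step_size_up) / step_size_down))
    return lrs


lrs = one_cycle_lrs(0.004, total_steps=100)
```

With 100 steps and `pct_start=0.3`, `step_size_up` is 29, so the schedule peaks at `lr * div_factor` on step 29 and reaches `lr / final_div_factor` on the final step.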

@@ -0,0 +1,18 @@
# Copyright 2020 Huawei Technologies Co., Ltd
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ============================================================================
"""
densenet network
"""
from .densenet import DenseNet121

@@ -0,0 +1,230 @@
# Copyright 2020 Huawei Technologies Co., Ltd
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ============================================================================

"""
model architecture of densenet
"""

import math
from collections import OrderedDict

import mindspore.nn as nn
from mindspore.ops import operations as P
from mindspore.common import initializer as init
from src.utils.var_init import default_recurisive_init, KaimingNormal

__all__ = ["DenseNet121"]


class GlobalAvgPooling(nn.Cell):
    """
    Global average pooling over the spatial axes: (N, C, H, W) -> (N, C).
    """
    def __init__(self):
        super(GlobalAvgPooling, self).__init__()
        self.mean = P.ReduceMean(True)
        self.shape = P.Shape()
        self.reshape = P.Reshape()

    def construct(self, x):
        x = self.mean(x, (2, 3))
        b, c, _, _ = self.shape(x)
        x = self.reshape(x, (b, c))
        return x


class CommonHead(nn.Cell):
    """
    Classification head: global average pooling followed by a fully connected layer.
    """
    def __init__(self, num_classes, out_channels):
        super(CommonHead, self).__init__()
        self.avgpool = GlobalAvgPooling()
        self.fc = nn.Dense(out_channels, num_classes, has_bias=True)

    def construct(self, x):
        x = self.avgpool(x)
        x = self.fc(x)
        return x


def conv7x7(in_channels, out_channels, stride=1, padding=3, has_bias=False):
    return nn.Conv2d(in_channels, out_channels, kernel_size=7, stride=stride, has_bias=has_bias,
                     padding=padding, pad_mode="pad")


def conv3x3(in_channels, out_channels, stride=1, padding=1, has_bias=False):
    return nn.Conv2d(in_channels, out_channels, kernel_size=3, stride=stride, has_bias=has_bias,
                     padding=padding, pad_mode="pad")


def conv1x1(in_channels, out_channels, stride=1, padding=0, has_bias=False):
    return nn.Conv2d(in_channels, out_channels, kernel_size=1, stride=stride, has_bias=has_bias,
                     padding=padding, pad_mode="pad")


class _DenseLayer(nn.Cell):
    """
    The dense layer: a BN-ReLU-Conv(1x1) bottleneck with bn_size * growth_rate channels,
    followed by BN-ReLU-Conv(3x3) that produces growth_rate new feature maps.
    """
    def __init__(self, num_input_features, growth_rate, bn_size, drop_rate):
        super(_DenseLayer, self).__init__()
        self.norm1 = nn.BatchNorm2d(num_input_features)
        self.relu1 = nn.ReLU()
        self.conv1 = conv1x1(num_input_features, bn_size*growth_rate)

        self.norm2 = nn.BatchNorm2d(bn_size*growth_rate)
        self.relu2 = nn.ReLU()
        self.conv2 = conv3x3(bn_size*growth_rate, growth_rate)

        # nn.Dropout in MindSpore takes keep_prob, unlike PyTorch, which takes the drop probability
        self.keep_prob = 1 - drop_rate
        self.dropout = nn.Dropout(keep_prob=self.keep_prob)

    def construct(self, features):
        bottleneck = self.conv1(self.relu1(self.norm1(features)))
        new_features = self.conv2(self.relu2(self.norm2(bottleneck)))
        if self.keep_prob < 1:
            new_features = self.dropout(new_features)
        return new_features


class _DenseBlock(nn.Cell):
    """
    The dense block: each layer receives the concatenation of the block input
    and the outputs of all preceding layers in the block.
    """
    def __init__(self, num_layers, num_input_features, bn_size, growth_rate, drop_rate):
        super(_DenseBlock, self).__init__()
        self.cell_list = nn.CellList()
        for i in range(num_layers):
            layer = _DenseLayer(
                num_input_features + i * growth_rate,
                growth_rate=growth_rate,
                bn_size=bn_size,
                drop_rate=drop_rate
            )
            self.cell_list.append(layer)

        self.concate = P.Concat(axis=1)

    def construct(self, init_features):
        features = init_features
        for layer in self.cell_list:
            new_features = layer(features)
            features = self.concate((features, new_features))
        return features


class _Transition(nn.Cell):
    """
    The transition layer: BN-ReLU-Conv(1x1), then 2x2 pooling with stride 2.
    """
    def __init__(self, num_input_features, num_output_features):
        super(_Transition, self).__init__()
        self.features = nn.SequentialCell(OrderedDict([
            ('norm', nn.BatchNorm2d(num_input_features)),
            ('relu', nn.ReLU()),
            ('conv', conv1x1(num_input_features, num_output_features)),
            # The DenseNet paper uses 2x2 average pooling here; this implementation uses max pooling.
            ('pool', nn.MaxPool2d(kernel_size=2, stride=2))
        ]))

    def construct(self, x):
        x = self.features(x)
        return x
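Each `_DenseLayer` emits only `growth_rate` new maps, and `_DenseBlock` concatenates them onto its running input, so layer i of a block sees `num_input_features + i * growth_rate` channels while its 1x1 bottleneck is always `bn_size * growth_rate` wide; `_Transition` then halves the channel count. The framework-free sketch below tabulates those widths; `dense_block_widths` is a hypothetical helper.

```python
def dense_block_widths(num_input_features, num_layers, growth_rate=32, bn_size=4):
    """(input, bottleneck, output) channel counts for each _DenseLayer in one block."""
    widths = []
    for i in range(num_layers):
        in_ch = num_input_features + i * growth_rate  # block input + all earlier layer outputs
        widths.append((in_ch, bn_size * growth_rate, growth_rate))
    block_out = num_input_features + num_layers * growth_rate
    return widths, block_out


# First block of DenseNet121: 64 input channels, 6 layers
widths, block_out = dense_block_widths(64, 6)
transition_out = block_out // 2  # the following _Transition halves the channel count
```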


class Densenet(nn.Cell):
    """
    The DenseNet backbone: a 7x7 stem, dense blocks joined by transition layers,
    and a final BN-ReLU. It has no classifier; get_out_channels() reports the output width.
    """
    __constants__ = ['features']

    def __init__(self, growth_rate, block_config, num_init_features, bn_size=4, drop_rate=0):
        super(Densenet, self).__init__()

        layers = OrderedDict()
        layers['conv0'] = conv7x7(3, num_init_features, stride=2, padding=3)
        layers['norm0'] = nn.BatchNorm2d(num_init_features)
        layers['relu0'] = nn.ReLU()
        layers['pool0'] = nn.MaxPool2d(kernel_size=3, stride=2, pad_mode='same')

        # Each denseblock
        num_features = num_init_features
        for i, num_layers in enumerate(block_config):
            block = _DenseBlock(
                num_layers=num_layers,
                num_input_features=num_features,
                bn_size=bn_size,
                growth_rate=growth_rate,
                drop_rate=drop_rate
            )
            layers['denseblock%d'%(i+1)] = block
            num_features = num_features + num_layers*growth_rate

            if i != len(block_config)-1:
                trans = _Transition(num_input_features=num_features,
                                    num_output_features=num_features // 2)
                layers['transition%d'%(i+1)] = trans
                num_features = num_features // 2

        # Final batch norm
        layers['norm5'] = nn.BatchNorm2d(num_features)
        layers['relu5'] = nn.ReLU()

        self.features = nn.SequentialCell(layers)
        self.out_channels = num_features

    def construct(self, x):
        x = self.features(x)
        return x

    def get_out_channels(self):
        return self.out_channels


def _densenet121(**kwargs):
    return Densenet(growth_rate=32, block_config=(6, 12, 24, 16), num_init_features=64, **kwargs)


def _densenet161(**kwargs):
    return Densenet(growth_rate=48, block_config=(6, 12, 36, 24), num_init_features=96, **kwargs)


def _densenet169(**kwargs):
    return Densenet(growth_rate=32, block_config=(6, 12, 32, 32), num_init_features=64, **kwargs)


def _densenet201(**kwargs):
    return Densenet(growth_rate=32, block_config=(6, 12, 48, 32), num_init_features=64, **kwargs)
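`Densenet.__init__` tracks `num_features` through every block and transition; the final value is what `get_out_channels()` returns and what `CommonHead` uses as the input width of its `nn.Dense` layer. The plain-Python sketch below replays that arithmetic for the four factory configurations above; the `CONFIGS` dict and `backbone_out_channels` helper are introduced here for illustration only.

```python
CONFIGS = {
    # name: (growth_rate, block_config, num_init_features), as passed to Densenet above
    'densenet121': (32, (6, 12, 24, 16), 64),
    'densenet161': (48, (6, 12, 36, 24), 96),
    'densenet169': (32, (6, 12, 32, 32), 64),
    'densenet201': (32, (6, 12, 48, 32), 64),
}


def backbone_out_channels(growth_rate, block_config, num_init_features):
    """Replays the num_features bookkeeping of Densenet.__init__."""
    num_features = num_init_features
    for i, num_layers in enumerate(block_config):
        num_features += num_layers * growth_rate
        if i != len(block_config) - 1:  # no transition after the last block
            num_features //= 2
    return num_features


out = {name: backbone_out_channels(*cfg) for name, cfg in CONFIGS.items()}
```

For DenseNet121 this yields 1024 channels, which is therefore the input width of the classifier in `DenseNet121`.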


class DenseNet121(nn.Cell):
    """
    The DenseNet121 architecture: the DenseNet121 backbone followed by CommonHead
    (global average pooling and a fully connected layer).
    """
    def __init__(self, num_classes):
        super(DenseNet121, self).__init__()
        self.backbone = _densenet121()
        out_channels = self.backbone.get_out_channels()
        self.head = CommonHead(num_classes, out_channels)

        # Start from the generic initialization, then override: Conv2d weights use Kaiming normal
        # (fan_out, ReLU gain), BatchNorm uses gamma=1 and beta=0, and the Dense bias is zeroed.
        default_recurisive_init(self)
        for _, cell in self.cells_and_names():
            if isinstance(cell, nn.Conv2d):
                cell.weight.default_input = init.initializer(KaimingNormal(a=math.sqrt(5), mode='fan_out',
                                                                           nonlinearity='relu'),
                                                             cell.weight.default_input.shape,
                                                             cell.weight.default_input.dtype).to_tensor()
            elif isinstance(cell, nn.BatchNorm2d):
                cell.gamma.default_input = init.initializer('ones', cell.gamma.default_input.shape).to_tensor()
                cell.beta.default_input = init.initializer('zeros', cell.beta.default_input.shape).to_tensor()
            elif isinstance(cell, nn.Dense):
                cell.bias.default_input = init.initializer('zeros', cell.bias.default_input.shape).to_tensor()

    def construct(self, x):
        x = self.backbone(x)
        x = self.head(x)
        return x
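The spatial resolution is reduced by the stride-2 `conv0`, the stride-2 `pool0`, and the 2x2 stride-2 pooling in each of the three `_Transition` layers. Assuming a 224x224 input (a common ImageNet crop size; the input size is not stated in this file), the sketch below traces the feature-map size with the standard output-size formulas for explicit padding, 'same' padding, and the default 'valid' padding of `nn.MaxPool2d`. The helper names are hypothetical.

```python
import math


def conv_out(size, kernel, stride, padding):
    """Output size of a convolution with explicit padding (pad_mode='pad')."""
    return (size + 2 * padding - kernel) // stride + 1


def trace_spatial(size=224, num_transitions=3):
    sizes = {'input': size}
    size = conv_out(size, kernel=7, stride=2, padding=3)  # conv0
    sizes['conv0'] = size
    size = math.ceil(size / 2)                            # pool0: 3x3, stride 2, pad_mode='same'
    sizes['pool0'] = size
    for t in range(1, num_transitions + 1):
        size = (size - 2) // 2 + 1                        # transition pool: 2x2, stride 2, 'valid'
        sizes['transition%d' % t] = size
    return sizes


sizes = trace_spatial(224)
```

Under that assumption the last dense block works on 7x7 maps, which `GlobalAvgPooling` then reduces to a 1024-dimensional vector per image.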

@@ -0,0 +1,41 @@
# Copyright 2020 Huawei Technologies Co., Ltd
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ============================================================================
"""
get parameter function
"""
def get_param_groups(network):
    """
    Split the trainable parameters into two groups: biases and BatchNorm gamma/beta
    (weight decay 0.0) and all remaining parameters (optimizer's default weight decay).
    """
    decay_params = []
    no_decay_params = []
    for x in network.trainable_params():
        parameter_name = x.name
        if parameter_name.endswith('.bias'):
            # biases are not weight-decayed
            no_decay_params.append(x)
        elif parameter_name.endswith('.gamma'):
            # BatchNorm scale is not weight-decayed; note that this matches any parameter named '*.gamma'
            no_decay_params.append(x)
        elif parameter_name.endswith('.beta'):
            # BatchNorm shift is not weight-decayed; note that this matches any parameter named '*.beta'
            no_decay_params.append(x)
        else:
            decay_params.append(x)

    return [{'params': no_decay_params, 'weight_decay': 0.0}, {'params': decay_params}]
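The grouping rule in `get_param_groups` depends only on parameter-name suffixes, so it can be exercised without MindSpore. The sketch below applies the same suffix test to plain strings; `split_names` is a hypothetical helper, and the sample names only imitate the dotted naming that the suffix checks assume.

```python
NO_DECAY_SUFFIXES = ('.bias', '.gamma', '.beta')


def split_names(param_names):
    """Apply get_param_groups' suffix rule to a list of parameter names."""
    decay, no_decay = [], []
    for name in param_names:
        (no_decay if name.endswith(NO_DECAY_SUFFIXES) else decay).append(name)
    return [{'params': no_decay, 'weight_decay': 0.0}, {'params': decay}]


groups = split_names([
    'backbone.features.conv0.weight',
    'backbone.features.norm0.gamma',
    'backbone.features.norm0.beta',
    'head.fc.weight',
    'head.fc.bias',
])
```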

@@ -0,0 +1,82 @@
# Copyright 2020 Huawei Technologies Co., Ltd
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ============================================================================
"""
get logger.
"""
import logging
import os
import sys
from datetime import datetime


class LOGGER(logging.Logger):
    """
    Logger that writes INFO messages to stdout (only when rank % 8 == 0) and,
    once setup_logging_file() is called, to a per-rank log file.

    Args:
        logger_name (string): logger name.
        rank (int): rank of the current process. Default: 0.
    """
    def __init__(self, logger_name, rank=0):
        super(LOGGER, self).__init__(logger_name)
        if rank % 8 == 0:
            console = logging.StreamHandler(sys.stdout)
            console.setLevel(logging.INFO)
            formatter = logging.Formatter('%(asctime)s:%(levelname)s:%(message)s')
            console.setFormatter(formatter)
            self.addHandler(console)

    def setup_logging_file(self, log_dir, rank=0):
        """set up log file"""
        self.rank = rank
        if not os.path.exists(log_dir):
            os.makedirs(log_dir, exist_ok=True)
        log_name = datetime.now().strftime('%Y-%m-%d_time_%H_%M_%S') + '_rank_{}.log'.format(rank)
        self.log_fn = os.path.join(log_dir, log_name)
        fh = logging.FileHandler(self.log_fn)
        fh.setLevel(logging.INFO)
        formatter = logging.Formatter('%(asctime)s:%(levelname)s:%(message)s')
        fh.setFormatter(formatter)
        self.addHandler(fh)

    def info(self, msg, *args, **kwargs):
        if self.isEnabledFor(logging.INFO):
            self._log(logging.INFO, msg, args, **kwargs)

    def save_args(self, args):
        self.info('Args:')
        args_dict = vars(args)
        for key in args_dict.keys():
            self.info('--> %s: %s', key, args_dict[key])
        self.info('')

    def important_info(self, msg, *args, **kwargs):
        # self.rank is set by setup_logging_file()
        if self.isEnabledFor(logging.INFO) and self.rank == 0:
            line_width = 2
            important_msg = '\n'
            important_msg += ('*'*70 + '\n')*line_width
            important_msg += ('*'*line_width + '\n')*2
            important_msg += '*'*line_width + ' '*8 + msg + '\n'
            important_msg += ('*'*line_width + '\n')*2
            important_msg += ('*'*70 + '\n')*line_width
            self.info(important_msg, *args, **kwargs)


def get_logger(path, rank):
    logger = LOGGER("mindversion", rank)
    logger.setup_logging_file(path, rank)
    return logger
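`LOGGER` attaches a console handler only when `rank % 8 == 0`, while every rank gets its own file handler from `setup_logging_file`. The stdlib-only sketch below reproduces just the console rule so it can be checked in isolation; `make_console_logger` is a hypothetical name, and reading the modulus as "one console per group of eight ranks" is an interpretation rather than something the source states.

```python
import logging
import sys


def make_console_logger(name, rank):
    """Standalone Logger that echoes INFO messages to stdout only if rank % 8 == 0."""
    logger = logging.Logger(name)
    if rank % 8 == 0:
        console = logging.StreamHandler(sys.stdout)
        console.setLevel(logging.INFO)
        console.setFormatter(logging.Formatter('%(asctime)s:%(levelname)s:%(message)s'))
        logger.addHandler(console)
    return logger


# Number of console handlers per rank for a hypothetical 16-process job
consoles = {rank: len(make_console_logger('demo', rank).handlers) for rank in range(16)}
```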

@@ -0,0 +1,211 @@
# Copyright 2020 Huawei Technologies Co., Ltd
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ============================================================================
"""
Initialize.
"""
import math
from functools import reduce
import numpy as np
import mindspore.nn as nn
from mindspore import Tensor
from mindspore.common import initializer as init


def _calculate_gain(nonlinearity, param=None):
    r"""
    Return the recommended gain value for the given nonlinearity function.

    The values are as follows:

    ================= ====================================================
    nonlinearity      gain
    ================= ====================================================
    Linear / Identity :math:`1`
    Conv{1,2,3}D      :math:`1`
    Sigmoid           :math:`1`
    Tanh              :math:`\frac{5}{3}`
    ReLU              :math:`\sqrt{2}`
    Leaky Relu        :math:`\sqrt{\frac{2}{1 + \text{negative\_slope}^2}}`
    ================= ====================================================

    Args:
        nonlinearity: the non-linear function
        param: optional parameter for the non-linear function

    Examples:
        >>> gain = _calculate_gain('leaky_relu', 0.2)  # leaky_relu with negative_slope=0.2
    """
    linear_fns = ['linear', 'conv1d', 'conv2d', 'conv3d', 'conv_transpose1d', 'conv_transpose2d', 'conv_transpose3d']
    if nonlinearity in linear_fns or nonlinearity == 'sigmoid':
        return 1
    if nonlinearity == 'tanh':
        return 5.0 / 3
    if nonlinearity == 'relu':
        return math.sqrt(2.0)
    if nonlinearity == 'leaky_relu':
        if param is None:
            negative_slope = 0.01
        elif not isinstance(param, bool) and isinstance(param, int) or isinstance(param, float):
            negative_slope = param
        else:
            raise ValueError("negative_slope {} not a valid number".format(param))
        return math.sqrt(2.0 / (1 + negative_slope ** 2))

    raise ValueError("Unsupported nonlinearity {}".format(nonlinearity))
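The leaky-ReLU row of the gain table is worth checking numerically, because `default_recurisive_init` below uses `a=math.sqrt(5)` with the default 'leaky_relu' nonlinearity, which makes the gain 1/sqrt(3). The sketch restates only that formula; `leaky_relu_gain` is a hypothetical helper.

```python
import math


def leaky_relu_gain(negative_slope=0.01):
    """sqrt(2 / (1 + negative_slope^2)), the 'Leaky Relu' row of the gain table."""
    return math.sqrt(2.0 / (1 + negative_slope ** 2))


relu_gain = leaky_relu_gain(0.0)            # negative_slope = 0 reduces to the ReLU gain sqrt(2)
sqrt5_gain = leaky_relu_gain(math.sqrt(5))  # a = sqrt(5): sqrt(2 / 6) = 1 / sqrt(3)
```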


def _assignment(arr, num):
    """Assign the value of `num` to `arr`."""
    if arr.shape == ():
        arr = arr.reshape((1))
        arr[:] = num
        arr = arr.reshape(())
    else:
        if isinstance(num, np.ndarray):
            arr[:] = num[:]
        else:
            arr[:] = num
    return arr


def _calculate_in_and_out(arr):
    """
    Calculate n_in (fan_in) and n_out (fan_out) for a weight of shape (out, in, *kernel).

    Args:
        arr (Array): Input array.

    Returns:
        Tuple, a tuple with two elements, the first element is `n_in` and the second element is `n_out`.
    """
    dim = len(arr.shape)
    if dim < 2:
        raise ValueError("The dimension of the array to initialize must be greater than 1.")

    n_in = arr.shape[1]
    n_out = arr.shape[0]

    if dim > 2:
        counter = reduce(lambda x, y: x * y, arr.shape[2:])
        n_in *= counter
        n_out *= counter
    return n_in, n_out


def _select_fan(array, mode):
    mode = mode.lower()
    valid_modes = ['fan_in', 'fan_out']
    if mode not in valid_modes:
        raise ValueError("Mode {} not supported, please use one of {}".format(mode, valid_modes))

    fan_in, fan_out = _calculate_in_and_out(array)
    return fan_in if mode == 'fan_in' else fan_out
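For a convolution weight of shape (out_channels, in_channels, kH, kW), `_calculate_in_and_out` multiplies both fans by the receptive-field size kH * kW. The sketch below repeats that arithmetic on shape tuples (hypothetical helper `fans`), using the 7x7 `conv0` of DenseNet121 (64 output channels, 3 input channels) and a classifier weight that assumes 1000 classes.

```python
from functools import reduce


def fans(shape):
    """Return (fan_in, fan_out) for a weight shape, as _calculate_in_and_out does."""
    if len(shape) < 2:
        raise ValueError("fan_in/fan_out need an array with at least 2 dimensions")
    fan_in, fan_out = shape[1], shape[0]
    if len(shape) > 2:
        receptive_field = reduce(lambda x, y: x * y, shape[2:])
        fan_in *= receptive_field
        fan_out *= receptive_field
    return fan_in, fan_out


conv0_fans = fans((64, 3, 7, 7))  # conv0 weight: 3 * 49 in, 64 * 49 out
dense_fans = fans((1000, 1024))   # nn.Dense weight is (out_channels, in_channels); 1000 classes assumed
```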


class KaimingInit(init.Initializer):
    r"""
    Base class. Initialize the array with the He (Kaiming) algorithm.

    Args:
        a: the negative slope of the rectifier used after this layer (only
            used with ``'leaky_relu'``)
        mode: either ``'fan_in'`` (default) or ``'fan_out'``. Choosing ``'fan_in'``
            preserves the magnitude of the variance of the weights in the
            forward pass. Choosing ``'fan_out'`` preserves the magnitudes in the
            backwards pass.
        nonlinearity: the non-linear function, recommended to use only with
            ``'relu'`` or ``'leaky_relu'`` (default).
    """
    def __init__(self, a=0, mode='fan_in', nonlinearity='leaky_relu'):
        super(KaimingInit, self).__init__()
        self.mode = mode
        self.gain = _calculate_gain(nonlinearity, a)


class KaimingUniform(KaimingInit):
    r"""
    Initialize the array with the He (Kaiming) uniform algorithm. The resulting tensor will
    have values sampled from :math:`\mathcal{U}(-\text{bound}, \text{bound})` where

    .. math::
        \text{bound} = \text{gain} \times \sqrt{\frac{3}{\text{fan\_mode}}}

    Input:
        arr (Array): The array to be assigned.

    Returns:
        Array, assigned array.

    Examples:
        >>> w = np.empty((3, 5))
        >>> KaimingUniform(mode='fan_in', nonlinearity='relu')(w)
    """

    def _initialize(self, arr):
        fan = _select_fan(arr, self.mode)
        bound = math.sqrt(3.0) * self.gain / math.sqrt(fan)
        # The fixed seed makes initialization reproducible, but it also gives
        # identically shaped arrays identical initial values.
        np.random.seed(1)
        data = np.random.uniform(-bound, bound, arr.shape)

        _assignment(arr, data)


class KaimingNormal(KaimingInit):
    r"""
    Initialize the array with the He (Kaiming) normal algorithm. The resulting tensor will
    have values sampled from :math:`\mathcal{N}(0, \text{std}^2)` where

    .. math::
        \text{std} = \frac{\text{gain}}{\sqrt{\text{fan\_mode}}}

    Input:
        arr (Array): The array to be assigned.

    Returns:
        Array, assigned array.

    Examples:
        >>> w = np.empty((3, 5))
        >>> KaimingNormal(mode='fan_out', nonlinearity='relu')(w)
    """

    def _initialize(self, arr):
        fan = _select_fan(arr, self.mode)
        std = self.gain / math.sqrt(fan)
        # Same fixed-seed caveat as KaimingUniform._initialize.
        np.random.seed(1)
        data = np.random.normal(0, std, arr.shape)

        _assignment(arr, data)
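`DenseNet121.__init__` re-initializes every `nn.Conv2d` with `KaimingNormal(mode='fan_out', nonlinearity='relu')`, so the standard deviation is sqrt(2) / sqrt(out_channels * kH * kW). The NumPy sketch below computes that value for the 7x7 `conv0` and draws a sample to check it. It uses a local `np.random.default_rng` generator instead of the global `np.random.seed(1)` in `_initialize`, and `kaiming_normal` is a hypothetical helper.

```python
import math

import numpy as np


def kaiming_normal(shape, mode='fan_out', gain=math.sqrt(2.0), seed=1):
    """Sample N(0, std^2) with std = gain / sqrt(fan), where fan is picked by mode."""
    receptive_field = int(np.prod(shape[2:])) if len(shape) > 2 else 1
    fan = (shape[0] if mode == 'fan_out' else shape[1]) * receptive_field
    std = gain / math.sqrt(fan)
    rng = np.random.default_rng(seed)
    return rng.normal(0.0, std, size=shape), std


weights, std = kaiming_normal((64, 3, 7, 7))  # conv0 of DenseNet121: fan_out = 64 * 49 = 3136
```

Here the expected standard deviation is sqrt(2) / 56, roughly 0.025, and the sample of 9408 values should match it to within a few percent.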
|
||||||
|
|
||||||
|
|
||||||
|


def default_recurisive_init(custom_cell):
    """Recursively re-initialize the weights and biases of every Conv2d and Dense cell in ``custom_cell``."""
    for _, cell in custom_cell.cells_and_names():
        if isinstance(cell, nn.Conv2d):
            cell.weight.default_input = init.initializer(KaimingUniform(a=math.sqrt(5)),
                                                         cell.weight.default_input.shape,
                                                         cell.weight.default_input.dtype).to_tensor()
            if cell.bias is not None:
                fan_in, _ = _calculate_in_and_out(cell.weight.default_input.asnumpy())
                bound = 1 / math.sqrt(fan_in)
                np.random.seed(1)
                cell.bias.default_input = Tensor(np.random.uniform(-bound, bound, cell.bias.default_input.shape),
                                                 cell.bias.default_input.dtype)
        elif isinstance(cell, nn.Dense):
            cell.weight.default_input = init.initializer(KaimingUniform(a=math.sqrt(5)),
                                                         cell.weight.default_input.shape,
                                                         cell.weight.default_input.dtype).to_tensor()
            if cell.bias is not None:
                fan_in, _ = _calculate_in_and_out(cell.weight.default_input.asnumpy())
                bound = 1 / math.sqrt(fan_in)
                np.random.seed(1)
                cell.bias.default_input = Tensor(np.random.uniform(-bound, bound, cell.bias.default_input.shape),
                                                 cell.bias.default_input.dtype)
        elif isinstance(cell, (nn.BatchNorm2d, nn.BatchNorm1d)):
            # BatchNorm layers keep their default initialization.
            pass
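The bound and standard-deviation formulas in the `KaimingUniform` and `KaimingNormal` docstrings can be checked numerically. The sketch below recomputes the fans and the Kaiming bound/std in plain NumPy. It assumes that `_select_fan` and `_calculate_in_and_out` follow the PyTorch convention (weights shaped `(out_channels, in_channels, kh, kw)`, so `fan_in = in_channels * kh * kw`) and that `_calculate_gain('leaky_relu', a)` returns `sqrt(2 / (1 + a**2))`; neither helper is shown in this diff, and `fans`, `leaky_relu_gain`, `kaiming_bound` and `kaiming_std` are hypothetical names.

```python
import math

import numpy as np


def fans(shape):
    """Hypothetical helper: (fan_in, fan_out) for a dense (out, in) or conv (out, in, kh, kw) weight."""
    receptive_field = int(np.prod(shape[2:])) if len(shape) > 2 else 1
    return shape[1] * receptive_field, shape[0] * receptive_field


def leaky_relu_gain(a=0.0):
    """Assumed to match _calculate_gain('leaky_relu', a); a=0 gives sqrt(2), the ReLU gain."""
    return math.sqrt(2.0 / (1 + a ** 2))


def kaiming_bound(shape, a=0.0, mode='fan_in'):
    """bound = gain * sqrt(3 / fan_mode), the KaimingUniform formula."""
    fan_in, fan_out = fans(shape)
    fan = fan_in if mode == 'fan_in' else fan_out
    return leaky_relu_gain(a) * math.sqrt(3.0 / fan)


def kaiming_std(shape, a=0.0, mode='fan_in'):
    """std = gain / sqrt(fan_mode), the KaimingNormal formula."""
    fan_in, fan_out = fans(shape)
    fan = fan_in if mode == 'fan_in' else fan_out
    return leaky_relu_gain(a) / math.sqrt(fan)


# A 7x7 convolution from 3 to 64 channels: fan_in = 3 * 7 * 7 = 147.
shape = (64, 3, 7, 7)
bound = kaiming_bound(shape, a=math.sqrt(5))
std = kaiming_std(shape, mode='fan_out')
print(fans(shape), bound, std)
```

Under these assumptions, `a=math.sqrt(5)` (the value passed in `default_recurisive_init`) gives a gain of `sqrt(1/3)`, so the weight bound reduces to `1/sqrt(fan_in)`, which is the same bound the function uses for biases.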
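Both `_initialize` methods, and the bias branches of `default_recurisive_init`, call `np.random.seed(1)` immediately before sampling, so every array of a given shape receives exactly the same values. The sketch below reproduces that pattern with a hypothetical `seeded_uniform` helper and contrasts it with drawing from one shared `numpy.random.Generator`. It only illustrates the consequence of the pattern; it is not code from this repository.

```python
import numpy as np


def seeded_uniform(shape, bound):
    """Hypothetical helper mirroring the pattern above: re-seed, then sample."""
    np.random.seed(1)
    return np.random.uniform(-bound, bound, shape)


# Two different layers that happen to share a weight shape.
w1 = seeded_uniform((4, 4), 0.5)
w2 = seeded_uniform((4, 4), 0.5)
print(np.array_equal(w1, w2))  # True: the seed is reset before each draw

# Seeding a single generator once keeps the run reproducible,
# while consecutive layers still get different values.
rng = np.random.default_rng(1)
v1 = rng.uniform(-0.5, 0.5, (4, 4))
v2 = rng.uniform(-0.5, 0.5, (4, 4))
print(np.array_equal(v1, v2))
```

Seeding once per process gives the same reproducibility without making same-shaped layers share their initial weights.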
@@ -0,0 +1,286 @@
# Copyright 2020 Huawei Technologies Co., Ltd
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ============================================================================
"""train launch."""
import os
import time
import argparse
import datetime

import mindspore.nn as nn
from mindspore import Tensor, ParallelMode
from mindspore.nn.optim import Momentum
from mindspore.communication.management import init, get_rank, get_group_size
from mindspore.train.callback import ModelCheckpoint
from mindspore.train.callback import CheckpointConfig, Callback
from mindspore.train.serialization import load_checkpoint, load_param_into_net
from mindspore.train.model import Model
from mindspore.train.loss_scale_manager import DynamicLossScaleManager, FixedLossScaleManager
from mindspore import context

from src.optimizers import get_param_groups
from src.network import DenseNet121
from src.datasets import classification_dataset
from src.losses.crossentropy import CrossEntropy
from src.lr_scheduler import MultiStepLR, CosineAnnealingLR
from src.utils.logging import get_logger
from src.config import config

# DEVICE_ID selects the device for this process; fall back to device 0 when it is
# not set, since int(None) would raise a TypeError.
devid = int(os.getenv('DEVICE_ID', '0'))
context.set_context(mode=context.GRAPH_MODE, enable_auto_mixed_precision=True,
                    device_target="Davinci", save_graphs=False, device_id=devid)


class BuildTrainNetwork(nn.Cell):
    """Wrap a network and a loss function into one cell that returns the loss."""
    def __init__(self, network, criterion):
        super(BuildTrainNetwork, self).__init__()
        self.network = network
        self.criterion = criterion

    def construct(self, input_data, label):
        output = self.network(input_data)
        loss = self.criterion(output, label)
        return loss


class ProgressMonitor(Callback):
    """Log the loss and throughput at the end of each epoch, plus any newly saved checkpoints of this rank."""
    def __init__(self, args):
        super(ProgressMonitor, self).__init__()
        self.me_epoch_start_time = 0
        self.me_epoch_start_step_num = 0
        self.args = args
        self.ckpt_history = []

    def begin(self, run_context):
        self.args.logger.info('start network train...')
        # Start the timer here; otherwise the first epoch's fps is measured from time 0.
        self.me_epoch_start_time = time.time()

    def epoch_begin(self, run_context):
        pass

    def epoch_end(self, run_context, *me_args):
        """process epoch end"""
        cb_params = run_context.original_args()
        me_step = cb_params.cur_step_num - 1

        real_epoch = me_step // self.args.steps_per_epoch
        time_used = time.time() - self.me_epoch_start_time
        fps_mean = self.args.per_batch_size * (me_step-self.me_epoch_start_step_num) * self.args.group_size / time_used
        self.args.logger.info('epoch[{}], iter[{}], loss:{}, '
                              'mean_fps:{:.2f} imgs/sec'.format(real_epoch, me_step, cb_params.net_outputs, fps_mean))
        if self.args.rank_save_ckpt_flag:
            import glob
            ckpts = glob.glob(os.path.join(self.args.outputs_dir, '*.ckpt'))
            for ckpt in ckpts:
                ckpt_fn = os.path.basename(ckpt)
                if not ckpt_fn.startswith('{}-'.format(self.args.rank)):
                    continue
                if ckpt in self.ckpt_history:
                    continue
                self.ckpt_history.append(ckpt)
                self.args.logger.info('epoch[{}], iter[{}], loss:{}, ckpt:{}, '
                                      'ckpt_fn:{}'.format(real_epoch, me_step, cb_params.net_outputs, ckpt, ckpt_fn))

        self.me_epoch_start_step_num = me_step
        self.me_epoch_start_time = time.time()

    def step_begin(self, run_context):
        pass

    def step_end(self, run_context, *me_args):
        pass

    def end(self, run_context):
        self.args.logger.info('end network train...')


def parse_args(cloud_args=None):
    """Parse command-line arguments, merge ``cloud_args`` into them, and copy the remaining settings from ``config``."""
    parser = argparse.ArgumentParser('mindspore classification training')

    # dataset related
    parser.add_argument('--data_dir', type=str, default='', help='train data dir')

    # network related
    parser.add_argument('--pretrained', default='', type=str, help='model_path, local pretrained model to load')

    # distributed related
    parser.add_argument('--is_distributed', type=int, default=1,
                        help='whether to run distributed (multi-device) training: 1 for yes, 0 for no')

    # roma obs
    parser.add_argument('--train_url', type=str, default="", help='train url')

    args, _ = parser.parse_known_args()
    args = merge_args(args, cloud_args)
    args.image_size = config.image_size
    args.num_classes = config.num_classes
    args.lr = config.lr
    args.lr_scheduler = config.lr_scheduler
    args.lr_epochs = config.lr_epochs
    args.lr_gamma = config.lr_gamma
    args.eta_min = config.eta_min
    args.T_max = config.T_max
    args.max_epoch = config.max_epoch
    args.warmup_epochs = config.warmup_epochs
    args.weight_decay = config.weight_decay
    args.momentum = config.momentum
    args.is_dynamic_loss_scale = config.is_dynamic_loss_scale
    args.loss_scale = config.loss_scale
    args.label_smooth = config.label_smooth
    args.label_smooth_factor = config.label_smooth_factor
    args.ckpt_interval = config.ckpt_interval
    args.ckpt_path = config.ckpt_path
    args.is_save_on_master = config.is_save_on_master
    args.rank = config.rank
    args.group_size = config.group_size
    args.log_interval = config.log_interval
    args.per_batch_size = config.per_batch_size

    args.lr_epochs = list(map(int, args.lr_epochs.split(',')))
    args.image_size = list(map(int, args.image_size.split(',')))

    return args


def merge_args(args, cloud_args):
    """Override attributes of ``args`` with truthy values from the ``cloud_args`` dict, cast to each attribute's type."""
    args_dict = vars(args)
    if isinstance(cloud_args, dict):
        for key in cloud_args.keys():
            val = cloud_args[key]
            if key in args_dict and val:
                arg_type = type(args_dict[key])
                if arg_type is not type(None):
                    val = arg_type(val)
                args_dict[key] = val
    return args


def train(cloud_args=None):
    """training process"""
    args = parse_args(cloud_args)

    # init distributed
    if args.is_distributed:
        init()
        args.rank = get_rank()
        args.group_size = get_group_size()

    if args.is_dynamic_loss_scale == 1:
        args.loss_scale = 1  # with dynamic loss scaling, the Momentum optimizer's own loss_scale must stay at 1

    # save checkpoints on the master rank only, or on every rank
    # (saving on every rank is compatible with model parallelism)
    args.rank_save_ckpt_flag = 0
    if args.is_save_on_master:
        if args.rank == 0:
            args.rank_save_ckpt_flag = 1
    else:
        args.rank_save_ckpt_flag = 1

    # logger
    args.outputs_dir = os.path.join(args.ckpt_path,
                                    datetime.datetime.now().strftime('%Y-%m-%d_time_%H_%M_%S'))
    args.logger = get_logger(args.outputs_dir, args.rank)

    # dataloader
    de_dataset = classification_dataset(args.data_dir, args.image_size,
                                        args.per_batch_size, args.max_epoch,
                                        args.rank, args.group_size)
    de_dataset.map_model = 4  # !!!important
    args.steps_per_epoch = de_dataset.get_dataset_size()

    args.logger.save_args(args)

    # network
    args.logger.important_info('start create network')
    # get network and init
    network = DenseNet121(args.num_classes)
    # loss
    if not args.label_smooth:
        args.label_smooth_factor = 0.0
    criterion = CrossEntropy(smooth_factor=args.label_smooth_factor,
                             num_classes=args.num_classes)

    # load pretrained model: skip optimizer state ('moments.*') and strip the 'network.' prefix
    if os.path.isfile(args.pretrained):
        param_dict = load_checkpoint(args.pretrained)
        param_dict_new = {}
        for key, values in param_dict.items():
            if key.startswith('moments.'):
                continue
            elif key.startswith('network.'):
                param_dict_new[key[len('network.'):]] = values
            else:
                param_dict_new[key] = values
        load_param_into_net(network, param_dict_new)
        args.logger.info('load model {} success'.format(args.pretrained))

    # lr scheduler
    if args.lr_scheduler == 'exponential':
        lr_scheduler = MultiStepLR(args.lr,
                                   args.lr_epochs,
                                   args.lr_gamma,
                                   args.steps_per_epoch,
                                   args.max_epoch,
                                   warmup_epochs=args.warmup_epochs)
    elif args.lr_scheduler == 'cosine_annealing':
        lr_scheduler = CosineAnnealingLR(args.lr,
                                         args.T_max,
                                         args.steps_per_epoch,
                                         args.max_epoch,
                                         warmup_epochs=args.warmup_epochs,
                                         eta_min=args.eta_min)
    else:
        raise NotImplementedError(args.lr_scheduler)
    lr_schedule = lr_scheduler.get_lr()

    # optimizer
    opt = Momentum(params=get_param_groups(network),
                   learning_rate=Tensor(lr_schedule),
                   momentum=args.momentum,
                   weight_decay=args.weight_decay,
                   loss_scale=args.loss_scale)

    # mixed precision training: keep the loss computation in float32
    criterion.add_flags_recursive(fp32=True)

    # package training process, adjust lr + forward + backward + optimizer
    train_net = BuildTrainNetwork(network, criterion)
    if args.is_distributed:
        parallel_mode = ParallelMode.DATA_PARALLEL
    else:
        parallel_mode = ParallelMode.STAND_ALONE
    if args.is_dynamic_loss_scale == 1:
        loss_scale_manager = DynamicLossScaleManager(init_loss_scale=65536, scale_factor=2, scale_window=2000)
    else:
        loss_scale_manager = FixedLossScaleManager(args.loss_scale, drop_overflow_update=False)

    context.set_auto_parallel_context(parallel_mode=parallel_mode, device_num=args.group_size,
                                      parameter_broadcast=True, mirror_mean=True)
    model = Model(train_net, optimizer=opt, metrics=None, loss_scale_manager=loss_scale_manager, amp_level="O3")

    # checkpoint save
    progress_cb = ProgressMonitor(args)
    callbacks = [progress_cb,]
    if args.rank_save_ckpt_flag:
        ckpt_max_num = args.max_epoch * args.steps_per_epoch // args.ckpt_interval
        ckpt_config = CheckpointConfig(save_checkpoint_steps=args.ckpt_interval,
                                       keep_checkpoint_max=ckpt_max_num)
        ckpt_cb = ModelCheckpoint(config=ckpt_config,
                                  directory=args.outputs_dir,
                                  prefix='{}'.format(args.rank))
        callbacks.append(ckpt_cb)

    model.train(args.max_epoch, de_dataset, callbacks=callbacks)


if __name__ == "__main__":
    train()
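`merge_args` only overrides attributes that already exist on `args`, skips falsy values (an integer `0` or an empty string never overrides anything, whereas the string `'0'` is truthy and is cast with `int`), and casts each value to the type of the existing attribute. The sketch below copies the function unchanged and runs it on a small hypothetical parser; the `cloud_args` dicts are made-up inputs of the kind a cloud launcher might pass to `train()`.

```python
import argparse


def merge_args(args, cloud_args):
    """Copied from train.py: override args with truthy cloud_args values, cast to the existing type."""
    args_dict = vars(args)  # the Namespace's own __dict__, so writes update args
    if isinstance(cloud_args, dict):
        for key in cloud_args.keys():
            val = cloud_args[key]
            if key in args_dict and val:
                arg_type = type(args_dict[key])
                if arg_type is not type(None):
                    val = arg_type(val)
                args_dict[key] = val
    return args


# Hypothetical parser with a subset of the script's options.
parser = argparse.ArgumentParser('merge_args demo')
parser.add_argument('--data_dir', type=str, default='')
parser.add_argument('--is_distributed', type=int, default=1)
parser.add_argument('--train_url', type=str, default='')

args, _ = parser.parse_known_args([])
cloud_args = {'data_dir': '/cache/data', 'is_distributed': '0', 'train_url': '', 'unknown_key': 'x'}
args = merge_args(args, cloud_args)
print(args.data_dir, args.is_distributed)  # /cache/data 0

# An integer 0 is falsy, so it leaves the default of 1 in place.
fresh = merge_args(parser.parse_known_args([])[0], {'is_distributed': 0})
```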
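The pretrained-model branch of `train()` assumes that checkpoint parameter names may start with `network.` (the attribute under which `BuildTrainNetwork` stores the model) and that optimizer state is stored under `moments.`. It drops the latter and strips the prefix from the former before calling `load_param_into_net`. The sketch below isolates that loop as a hypothetical `remap_pretrained_keys` function over a plain dict; the parameter names are made-up examples, and strings stand in for MindSpore `Parameter` values.

```python
def remap_pretrained_keys(param_dict):
    """Hypothetical helper: the key-renaming loop from train() as a standalone function."""
    prefix = 'network.'
    param_dict_new = {}
    for key, value in param_dict.items():
        if key.startswith('moments.'):
            # Momentum optimizer state is not needed to initialize the bare network.
            continue
        if key.startswith(prefix):
            param_dict_new[key[len(prefix):]] = value
        else:
            param_dict_new[key] = value
    return param_dict_new


# Made-up checkpoint contents; a real checkpoint maps names to Parameter objects.
ckpt = {
    'network.features.conv0.weight': 'w0',
    'moments.network.features.conv0.weight': 'm0',
    'learning_rate': 'lr',
}
print(remap_pretrained_keys(ckpt))  # {'features.conv0.weight': 'w0', 'learning_rate': 'lr'}
```

Only one leading `network.` is removed, matching the original `key[8:]` slice.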
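When `is_dynamic_loss_scale == 1`, the script delegates loss scaling to `DynamicLossScaleManager(init_loss_scale=65536, scale_factor=2, scale_window=2000)`. The sketch below shows the general dynamic loss-scaling rule that these three parameters describe: divide the scale by `scale_factor` when the gradients overflow, and multiply it after `scale_window` consecutive overflow-free steps. `DynamicScale` is a hypothetical, simplified class for illustration. It is not MindSpore's implementation, whose exact bookkeeping may differ.

```python
class DynamicScale:
    """Hypothetical, simplified dynamic loss scaler (not MindSpore's class)."""

    def __init__(self, init_loss_scale=65536.0, scale_factor=2, scale_window=2000):
        self.loss_scale = float(init_loss_scale)
        self.scale_factor = scale_factor
        self.scale_window = scale_window
        self.good_steps = 0  # consecutive steps without overflow

    def update(self, overflow):
        """Adjust the scale; return True if this step's parameter update should be applied."""
        if overflow:
            # Gradients overflowed: shrink the scale (never below 1) and skip the update.
            self.loss_scale = max(self.loss_scale / self.scale_factor, 1.0)
            self.good_steps = 0
            return False
        self.good_steps += 1
        if self.good_steps >= self.scale_window:
            # A long run of clean steps: try a larger scale.
            self.loss_scale *= self.scale_factor
            self.good_steps = 0
        return True


scaler = DynamicScale(scale_window=3)
history = []
for overflow in [True, False, False, False, True]:
    scaler.update(overflow)
    history.append(scaler.loss_scale)
print(history)  # [32768.0, 32768.0, 32768.0, 65536.0, 32768.0]
```

By contrast, the fixed branch of the script passes `drop_overflow_update=False` to `FixedLossScaleManager`, so the scale never changes and the optimizer still runs on steps where an overflow is detected.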