!21946 [Ascend Crowd Intelligence] RetinaFace networks: crowd-intelligence RetinaFace_ResNet50 and GPU-campaign RetinaFace_MobileNet0.25

Merge pull request !21946 from yexijoe/RetinaFace_ResNet50
This commit is contained in:
i-robot 2021-09-06 13:15:55 +00:00 committed by Gitee
commit 1b4f0bd033
28 changed files with 4917 additions and 0 deletions


@ -0,0 +1,451 @@
# Contents

<!-- TOC -->

- [Contents](#contents)
- [RetinaFace Description](#retinaface-description)
- [Pretrained Model](#pretrained-model)
- [Model Architecture](#model-architecture)
- [Dataset](#dataset)
- [Environment Requirements](#environment-requirements)
- [Quick Start](#quick-start)
- [Script Description](#script-description)
    - [Script and Sample Code](#script-and-sample-code)
    - [Script Parameters](#script-parameters)
    - [Training Process](#training-process)
        - [Usage](#usage)
        - [Distributed Training](#distributed-training)
    - [Evaluation Process](#evaluation-process)
        - [Evaluation](#evaluation)
    - [Export Process](#export-process)
        - [Export](#export)
    - [Inference Process](#inference-process)
        - [Inference](#inference)
- [Model Description](#model-description)
    - [Performance](#performance)
        - [Evaluation Performance](#evaluation-performance)
            - [RetinaFace on WIDERFACE](#retinaface-on-widerface)
        - [Inference Performance](#inference-performance)
            - [RetinaFace on WIDERFACE](#retinaface-on-widerface-1)
- [Description of Random Situation](#description-of-random-situation)
- [ModelZoo Homepage](#modelzoo-homepage)

<!-- /TOC -->
# RetinaFace Description

RetinaFace is a face detection model proposed in 2019 in the paper "RetinaFace: Single-stage Dense Face Localisation in the Wild"; at the time, it achieved the best results on the WIDER FACE dataset. Compared with S3FD and MTCNN, RetinaFace significantly improves the recall of small faces. To handle multi-scale face detection, it adopts a feature pyramid structure that fuses features across scales and adds SSH modules.

[Paper](https://arxiv.org/abs/1905.00641v2): Jiankang Deng, Jia Guo, Yuxiang Zhou, Jinke Yu, Irene Kotsia, Stefanos Zafeiriou. "RetinaFace: Single-stage Dense Face Localisation in the Wild". 2019.
# Pretrained Model

RetinaFace can use a ResNet50 or MobileNet0.25 backbone to extract image features for detection. When ResNet50 serves as the backbone, use ./src/resnet.py as the model file, obtain the ResNet50 training script from ModelZoo, and train on ImageNet2012 with the default configuration to produce the ResNet50 pretrained model.
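A minimal sketch of how such a pretrained checkpoint is typically loaded into the backbone, assuming src/resnet.py exposes a `resnet50` constructor like the ones this repo's eval.py imports; the checkpoint name matches the directory layout under [Script and Sample Code](#script-and-sample-code):

```python
# Sketch only: load the ImageNet2012-pretrained ResNet50 weights into the
# backbone, mirroring the load_checkpoint/load_param_into_net pattern in eval.py.
from mindspore.train.serialization import load_checkpoint, load_param_into_net
from src.resnet import resnet50  # assumed constructor, analogous to eval.py's imports

backbone = resnet50(1001)  # 1001 classes matches the ImageNet2012 pretraining head
param_dict = load_checkpoint('../data/resnet-90_625.ckpt')
load_param_into_net(backbone, param_dict)
```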
# Model Architecture

Specifically, RetinaFace is based on RetinaNet: it adopts RetinaNet's feature pyramid structure and adds SSH modules. Besides the conventional detection branch, the network adds a landmark prediction branch and a self-supervision branch; the results show that these two branches improve model performance. The self-supervision branch is not covered here.
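For illustration, here is a hedged MindSpore sketch of an SSH context module of the kind described above; the actual definition lives in src/network_with_resnet.py and may differ in channel splits and normalization:

```python
# Sketch of an SSH module: three parallel context branches whose stacked 3x3
# convolutions emulate 3x3/5x5/7x7 receptive fields, concatenated channel-wise.
import mindspore.nn as nn
import mindspore.ops as ops

class SSH(nn.Cell):
    def __init__(self, in_channel, out_channel):
        super(SSH, self).__init__()
        self.conv3x3 = nn.Conv2d(in_channel, out_channel // 2, 3, pad_mode='same')
        self.conv5x5_1 = nn.Conv2d(in_channel, out_channel // 4, 3, pad_mode='same')
        self.conv5x5_2 = nn.Conv2d(out_channel // 4, out_channel // 4, 3, pad_mode='same')
        self.conv7x7_2 = nn.Conv2d(out_channel // 4, out_channel // 4, 3, pad_mode='same')
        self.conv7x7_3 = nn.Conv2d(out_channel // 4, out_channel // 4, 3, pad_mode='same')
        self.relu = nn.ReLU()
        self.concat = ops.Concat(axis=1)  # concatenate along the channel dim (NCHW)

    def construct(self, x):
        branch3x3 = self.conv3x3(x)
        mid = self.relu(self.conv5x5_1(x))
        branch5x5 = self.conv5x5_2(mid)
        branch7x7 = self.conv7x7_3(self.relu(self.conv7x7_2(mid)))
        return self.relu(self.concat((branch3x3, branch5x5, branch7x7)))
```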
# Dataset

Dataset used: [WIDERFACE](<http://mmlab.ie.cuhk.edu.hk/projects/WIDERFace/WiderFace_Results.html>)

To obtain the dataset:

1. Click [here](<https://github.com/peteryuX/retinaface-tf2>) to get the dataset and annotations.
2. Click [here](<https://github.com/peteryuX/retinaface-tf2/tree/master/widerface_evaluate/ground_truth>) to get the evaluation ground-truth labels.

- Dataset size: 3.42 GB, 32,203 color images
    - Training set: 1.36 GB, 12,800 images
    - Validation set: 345.95 MB, 3,226 images
    - Test set: 1.72 GB, 16,177 images
- The dataset directory structure is as follows (a sketch of how label.txt is parsed follows the tree):
```bash
├── data/
├── widerface/
├── ground_truth/
│ ├──wider_easy_val.mat
│ ├──wider_face_val.mat
│ ├──wider_hard_val.mat
│ ├──wider_medium_val.mat
├── train/
│ ├──images/
│ │ ├──0--Parade/
│ │ │ ├──0_Parade_marchingband_1_5.jpg
│ │ │ ├──...
│ │ ├──.../
│ ├──label.txt
├── val/
│ ├──images/
│ │ ├──0--Parade/
│ │ │ ├──0_Parade_marchingband_1_20.jpg
│ │ │ ├──...
│ │ ├──.../
│ ├──label.txt
```
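As referenced above, a small sketch of how the validation label.txt is parsed; this mirrors the parsing loop in eval.py, where lines beginning with "# " carry relative image paths:

```python
# Sketch: collect relative image paths such as
# "0--Parade/0_Parade_marchingband_1_20.jpg" from val/label.txt.
with open('data/widerface/val/label.txt', 'r') as f:
    lines = f.readlines()
image_paths = [line[2:-1] for line in lines if line.startswith('# ')]  # strip "# " and trailing '\n'
```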
# Environment Requirements

- Hardware (Ascend, GPU)
    - With ResNet50 as the backbone, prepare the hardware environment with Ascend processors.
    - With MobileNet0.25 as the backbone, prepare the hardware environment with GPUs.
- Framework
    - [MindSpore](https://www.mindspore.cn/install)
- For more information, please check the resources below:
    - [MindSpore Tutorials](https://www.mindspore.cn/tutorials/zh-CN/master/index.html)
    - [MindSpore Python API](https://www.mindspore.cn/docs/api/zh-CN/master/index.html)
# Quick Start

After installing MindSpore via the official website, you can follow the steps below for training and evaluation:

- Running on Ascend (with ResNet50 as the backbone)

```bash
# training example
python train.py --backbone_name 'ResNet50' > train.log 2>&1 &
OR
bash ./scripts/run_standalone_train_ascend.sh
# distributed training example
bash ./scripts/run_distribution_train_ascend.sh [RANK_TABLE_FILE]
# evaluation example
python eval.py --backbone_name 'ResNet50' --val_model [CKPT_FILE] > ./eval.log 2>&1 &
OR
bash ./scripts/run_standalone_eval_ascend.sh './train_parallel3/checkpoint/ckpt_3/RetinaFace-56_201.ckpt'
# inference example
bash run_infer_310.sh ../retinaface.mindir /home/dataset/widerface/val/ 0
```
- Running on GPU (with MobileNet0.25 as the backbone)

```bash
# training example
export CUDA_VISIBLE_DEVICES=0
python train.py --backbone_name 'MobileNet025' > train.log 2>&1 &
# distributed training example
bash scripts/run_distribution_train_gpu.sh 2 0,1
# evaluation example
export CUDA_VISIBLE_DEVICES=0
python eval.py --backbone_name 'MobileNet025' --val_model [CKPT_FILE] > eval.log 2>&1 &
OR
bash scripts/run_standalone_eval_gpu.sh 0 './checkpoint/ckpt_0/RetinaFace-117_804.ckpt'
```
# Script Description

## Script and Sample Code

```bash
├── model_zoo
    ├── README.md                               // description of all models
    ├── retinaface
        ├── README_CN.md                        // RetinaFace description
        ├── ascend310_infer                     // source code for Ascend 310 inference
        ├── scripts
        │   ├──run_distribution_train_ascend.sh // shell script for distributed training on Ascend
        │   ├──run_distribution_train_gpu.sh    // shell script for distributed training on GPU
        │   ├──run_infer_310.sh                 // shell script for Ascend inference (with ResNet50 as the backbone)
        │   ├──run_standalone_eval_ascend.sh    // shell script for evaluation on Ascend
        │   ├──run_standalone_eval_gpu.sh       // shell script for evaluation on GPU
        │   ├──run_standalone_train_ascend.sh   // shell script for single-device training on Ascend
        ├── src
        │   ├──augmentation.py                  // data augmentation methods
        │   ├──config.py                        // parameter configuration
        │   ├──dataset.py                       // dataset creation
        │   ├──loss.py                          // loss function
        │   ├──lr_schedule.py                   // learning-rate decay policy
        │   ├──network_with_mobilenet.py        // RetinaFace architecture with MobileNet0.25 as the backbone
        │   ├──network_with_resnet.py           // RetinaFace architecture with ResNet50 as the backbone
        │   ├──resnet.py                        // ResNet50 architecture used for pretraining when ResNet50 is the backbone
        │   ├──utils.py                         // data preprocessing
        ├── data
        │   ├──widerface                        // dataset
        │   ├──resnet-90_625.ckpt               // ResNet50 checkpoint pretrained on ImageNet
        │   ├──ground_truth                     // evaluation labels
        ├── eval.py                             // evaluation script
        ├── export.py                           // export checkpoint to AIR/MINDIR (with ResNet50 as the backbone)
        ├── postprocess.py                      // postprocessing script for Ascend 310 inference
        ├── preprocess.py                       // preprocessing script for Ascend 310 inference
        ├── train.py                            // training script
```
## Script Parameters

Both training and evaluation parameters can be configured in config.py.

- Configuration for RetinaFace with the ResNet50 backbone on the WIDER FACE dataset

```python
'variance': [0.1, 0.2],                                   # variance
'clip': False,                                            # clip
'loc_weight': 2.0,                                        # bbox regression loss weight
'class_weight': 1.0,                                      # confidence/class regression loss weight
'landm_weight': 1.0,                                      # landmark regression loss weight
'batch_size': 8,                                          # training batch size
'num_workers': 16,                                        # number of dataset-loading threads
'num_anchor': 29126,                                      # number of anchor boxes, depends on image size
'nnpu': 8,                                                # number of NPUs for training
'image_size': 840,                                        # training image size
'match_thresh': 0.35,                                     # box matching threshold
'optim': 'sgd',                                           # optimizer type
'momentum': 0.9,                                          # optimizer momentum
'weight_decay': 1e-4,                                     # optimizer weight decay
'epoch': 60,                                              # number of training epochs
'decay1': 20,                                             # epoch of the first learning-rate decay
'decay2': 40,                                             # epoch of the second learning-rate decay
'initial_lr': 0.04,                                       # initial learning rate (set to 0.04 for 8-device parallel training)
'warmup_epoch': -1,                                       # warmup epochs, -1 means no warmup
'gamma': 0.1,                                             # learning-rate decay ratio
'ckpt_path': './checkpoint/',                             # checkpoint save path
'keep_checkpoint_max': 8,                                 # maximum number of checkpoints to keep
'resume_net': None,                                       # resume network, None by default
'training_dataset': '../data/widerface/train/label.txt',  # training dataset label path
'pretrain': True,                                         # whether to train from a pretrained backbone
'pretrain_path': '../data/resnet-90_625.ckpt',            # pretrained backbone checkpoint path
# validation
'val_model': './train_parallel3/checkpoint/ckpt_3/RetinaFace-56_201.ckpt',  # validation model path
'val_dataset_folder': './data/widerface/val/',            # validation dataset path
'val_origin_size': True,                                  # whether to evaluate at original image size
'val_confidence_threshold': 0.02,                         # validation confidence threshold
'val_nms_threshold': 0.4,                                 # validation NMS threshold
'val_iou_threshold': 0.5,                                 # validation IoU threshold
'val_save_result': False,                                 # whether to save results
'val_predict_save_folder': './widerface_result',          # result save path
'val_gt_dir': './data/ground_truth/',                     # validation ground-truth path
# inference
'infer_dataset_folder': '/home/dataset/widerface/val/',   # validation dataset path for Ascend 310 inference
'infer_gt_dir': '/home/dataset/widerface/ground_truth/',  # validation ground-truth path for Ascend 310 inference
```
- Configuration for RetinaFace with the MobileNet0.25 backbone on the WIDER FACE dataset

```python
'name': 'MobileNet025',                                   # backbone name
'variance': [0.1, 0.2],                                   # variance
'clip': False,                                            # clip
'loc_weight': 2.0,                                        # bbox regression loss weight
'class_weight': 1.0,                                      # confidence/class regression loss weight
'landm_weight': 1.0,                                      # landmark regression loss weight
'batch_size': 8,                                          # training batch size
'num_workers': 12,                                        # number of dataset-loading threads
'num_anchor': 16800,                                      # number of anchor boxes, depends on image size
'ngpu': 2,                                                # number of GPUs for training
'epoch': 120,                                             # number of training epochs
'decay1': 70,                                             # epoch of the first learning-rate decay
'decay2': 90,                                             # epoch of the second learning-rate decay
'image_size': 640,                                        # training image size
'match_thresh': 0.35,                                     # box matching threshold
'optim': 'sgd',                                           # optimizer type
'momentum': 0.9,                                          # optimizer momentum
'weight_decay': 5e-4,                                     # optimizer weight decay
'initial_lr': 0.02,                                       # learning rate
'warmup_epoch': 5,                                        # warmup epochs, -1 means no warmup
'gamma': 0.1,                                             # learning-rate decay ratio
'ckpt_path': './checkpoint/',                             # checkpoint save path
'save_checkpoint_steps': 2000,                            # steps between checkpoint saves
'keep_checkpoint_max': 3,                                 # maximum number of checkpoints to keep
'resume_net': None,                                       # resume network, None by default
'training_dataset': '',                                   # training dataset label path, e.g. data/widerface/train/label.txt
'pretrain': False,                                        # whether to train from a pretrained backbone
'pretrain_path': './data/mobilenetv1-90_5004.ckpt',       # pretrained backbone checkpoint path
# validation
'val_model': './checkpoint/ckpt_0/RetinaFace-117_804.ckpt',  # validation model path
'val_dataset_folder': './data/widerface/val/',            # validation dataset path
'val_origin_size': False,                                 # whether to evaluate at original image size
'val_confidence_threshold': 0.02,                         # validation confidence threshold
'val_nms_threshold': 0.4,                                 # validation NMS threshold
'val_iou_threshold': 0.5,                                 # validation IoU threshold
'val_save_result': False,                                 # whether to save results
'val_predict_save_folder': './widerface_result',          # result save path
'val_gt_dir': './data/ground_truth/',                     # validation ground-truth path
```
## Training Process

### Usage

- Running on Ascend (with ResNet50 as the backbone)

```bash
python train.py --backbone_name 'ResNet50' > train.log 2>&1 &
OR
bash ./scripts/run_standalone_train_ascend.sh
```

The python command above runs in the background; you can view the results through the `train.log` file.
After training, you can get the loss values:
```bash
epoch: 7 step: 1609, loss is 5.327434
epoch time: 466281.709 ms, per step time: 289.796 ms
epoch: 8 step: 1609, loss is 4.7512465
epoch time: 466995.237 ms, per step time: 290.239 ms
```
- Running on GPU (with MobileNet0.25 as the backbone)

```bash
export CUDA_VISIBLE_DEVICES=0
python train.py --backbone_name 'MobileNet025' > train.log 2>&1 &
```

The python command above runs in the background; you can view the results through the `train.log` file.
After training, you can find the checkpoint files in the default folder `./checkpoint/`.
### Distributed Training

- Running on Ascend (with ResNet50 as the backbone)

```bash
bash ./scripts/run_distribution_train_ascend.sh [RANK_TABLE_FILE]
```

The shell script above runs distributed training in the background; you can view the results through the `train_parallel0/log` file.
After training, you can get the loss values:
```bash
epoch: 4 step: 201, loss is 4.870843
epoch time: 60460.177 ms, per step time: 300.797 ms
epoch: 5 step: 201, loss is 4.649786
epoch time: 60527.898 ms, per step time: 301.134 ms
```
- Running on GPU (with MobileNet0.25 as the backbone)

```bash
bash scripts/run_distribution_train_gpu.sh 2 0,1
```

The shell script above runs distributed training in the background; you can view the results through the `train/train.log` file.
After training, you can find the checkpoint files in the default folder `./checkpoint/ckpt_0/`.
## Evaluation Process

### Evaluation

- Evaluation on the WIDER FACE dataset in the Ascend environment (with ResNet50 as the backbone)

  CKPT_FILE is the checkpoint path used for evaluation, e.g. './train_parallel3/checkpoint/ckpt_3/RetinaFace-56_201.ckpt'.

```bash
python eval.py --backbone_name 'ResNet50' --val_model [CKPT_FILE] > ./eval.log 2>&1 &
OR
bash run_standalone_eval_ascend.sh [CKPT_FILE]
```

The python command above runs in the background; you can view the results through the "eval.log" file. The accuracy on the test dataset is as follows:

```bash
# grep "Val AP" eval.log
Easy Val AP : 0.9516
Medium Val AP : 0.9381
Hard Val AP : 0.8403
```
- Evaluation on the WIDER FACE dataset in the GPU environment (with MobileNet0.25 as the backbone)

  CKPT_FILE is the checkpoint path used for evaluation, e.g. './checkpoint/ckpt_0/RetinaFace-117_804.ckpt'.

```bash
export CUDA_VISIBLE_DEVICES=0
python eval.py --backbone_name 'MobileNet025' --val_model [CKPT_FILE] > eval.log 2>&1 &
```

The python command above runs in the background; you can view the results through the "eval.log" file. The accuracy on the test dataset is as follows:

```bash
# grep "Val AP" eval.log
Easy Val AP : 0.8877
Medium Val AP : 0.8698
Hard Val AP : 0.8005
```
## Export Process

### Export

Export the checkpoint file to a MINDIR model (with ResNet50 as the backbone):
```shell
python export.py --ckpt_file [CKPT_FILE]
```
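export.py also exposes optional flags (defined in its argparse setup later in this diff); a usage sketch with an explicit output name and format, reusing the example checkpoint path from the evaluation section:

```shell
# usage sketch; the checkpoint path is the example one from the evaluation section
python export.py --ckpt_file './train_parallel3/checkpoint/ckpt_3/RetinaFace-56_201.ckpt' \
                 --file_name retinaface --file_format MINDIR --device_id 0
```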
## Inference Process

### Inference

Before running inference, the model must be exported. MINDIR models can be exported in any environment; AIR models can only be exported on Ascend 910. The following shows an example of running inference with a MINDIR model.

- Inference on the WIDER FACE dataset on Ascend 310 (with ResNet50 as the backbone)

  The inference command is shown below, where 'MINDIR_PATH' is the MINDIR file path; 'DATASET_PATH' is the path of the inference dataset, e.g. '/home/dataset/widerface/val/'; and 'DEVICE_ID' is optional, with a default value of 0.

```shell
# Ascend310 inference
bash run_infer_310.sh [MINDIR_PATH] [DATASET_PATH] [DEVICE_ID]
```

The inference accuracy results are saved under the scripts directory; accuracy results like the following can be found in the acc.log file. The inference performance results are saved under the scripts/time_Result directory; performance results like the following can be found in the test_perform_static.txt file.
```bash
Easy Val AP : 0.9498
Medium Val AP : 0.9351
Hard Val AP : 0.8306
NN inference cost average time: 365.584 ms of infer_count 3226
```
# Model Description

## Performance

### Evaluation Performance

#### RetinaFace on WIDERFACE

| Parameters          | Ascend                                                       | GPU                                                  |
| ------------------- | ------------------------------------------------------------ | ---------------------------------------------------- |
| Model version       | RetinaFace + ResNet50                                         | RetinaFace + MobileNet0.25                           |
| Resource            | Ascend 910                                                    | Tesla V100-32G                                       |
| Upload date         | 2021-08-17                                                    | 2021-08-16                                           |
| MindSpore version   | 1.2.0                                                         | 1.4.0                                                |
| Dataset             | WIDERFACE                                                     | WIDERFACE                                            |
| Training parameters | epoch=60, steps=201, batch_size=8, lr=0.04 (0.04 for 8-device training; 0.01 recommended for a single device) | epoch=120, steps=804, batch_size=8, initial_lr=0.02 |
| Optimizer           | SGD                                                           | SGD                                                  |
| Loss function       | MultiBoxLoss + softmax cross entropy                          | MultiBoxLoss + softmax cross entropy                 |
| Output              | bounding boxes + confidence + landmarks                       | bounding boxes + confidence + landmarks              |
| Accuracy            | Easy: 0.9516; Medium: 0.9381; Hard: 0.8403                    | Easy: 0.8877; Medium: 0.8698; Hard: 0.8005           |
| Speed               | 1 device: 290 ms/step; 8 devices: 301 ms/step                 | 2 devices: 435 ms/step                               |
| Total time          | 8 devices: 1.05 hours                                         | 2 devices: 11.74 hours                               |
### Inference Performance

#### RetinaFace on WIDERFACE

| Parameters        | Ascend                                                         |
| ----------------- | -------------------------------------------------------------- |
| Model version     | RetinaFace + ResNet50                                           |
| Resource          | Ascend 310                                                      |
| Upload date       | 2021-08-17                                                      |
| MindSpore version | 1.4.0.20210805                                                  |
| Dataset           | WIDERFACE                                                       |
| Accuracy          | Easy: 0.9498; Medium: 0.9351; Hard: 0.8306                      |
| Speed             | NN inference cost average time: 365.584 ms of infer_count 3226  |
# Description of Random Situation

The random seed is set in train.py via the mindspore.common.seed.set_seed() function.
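For reference, a minimal sketch of that call; the concrete seed value lives in train.py and is an assumption here:

```python
# Sketch: fix global randomness (weight init, dataset shuffling) for reproducibility.
from mindspore.common.seed import set_seed

set_seed(1)  # assumed value; see train.py for the seed actually used
```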
# ModelZoo Homepage

Please check the official [homepage](https://gitee.com/mindspore/mindspore/tree/master/model_zoo).


@ -0,0 +1,14 @@
cmake_minimum_required(VERSION 3.14.1)
project(Ascend310Infer)
add_compile_definitions(_GLIBCXX_USE_CXX11_ABI=0)
set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -O0 -g -std=c++17 -Werror -Wall -fPIE -Wl,--allow-shlib-undefined")
set(PROJECT_SRC_ROOT ${CMAKE_CURRENT_LIST_DIR}/)
option(MINDSPORE_PATH "mindspore install path" "")
include_directories(${MINDSPORE_PATH})
include_directories(${MINDSPORE_PATH}/include)
include_directories(${PROJECT_SRC_ROOT})
find_library(MS_LIB libmindspore.so ${MINDSPORE_PATH}/lib)
file(GLOB_RECURSE MD_LIB ${MINDSPORE_PATH}/_c_dataengine*)
add_executable(main src/main.cc src/utils.cc)
target_link_libraries(main ${MS_LIB} ${MD_LIB} gflags)


@ -0,0 +1,29 @@
#!/bin/bash
# Copyright 2021 Huawei Technologies Co., Ltd
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ============================================================================
if [ -d out ]; then
rm -rf out
fi
mkdir out
cd out || exit
if [ -f "Makefile" ]; then
make clean
fi
cmake .. \
-DMINDSPORE_PATH="`pip3.7 show mindspore-ascend | grep Location | awk '{print $2"/mindspore"}' | xargs realpath`"
make


@ -0,0 +1,35 @@
/**
* Copyright 2021 Huawei Technologies Co., Ltd
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
#ifndef MINDSPORE_INFERENCE_UTILS_H_
#define MINDSPORE_INFERENCE_UTILS_H_
#include <sys/stat.h>
#include <dirent.h>
#include <vector>
#include <string>
#include <memory>
#include "include/api/types.h"
std::vector<std::string> GetAllFiles(std::string_view dirName);
DIR *OpenDir(std::string_view dirName);
std::string RealPath(std::string_view path);
mindspore::MSTensor ReadFileToTensor(const std::string &file);
int WriteResult(const std::string& imageFile, const std::vector<mindspore::MSTensor> &outputs);
std::vector<std::string> GetAllFiles(std::string dir_name);
std::vector<std::vector<std::string>> GetAllInputData(std::string dir_name);
#endif


@ -0,0 +1,190 @@
/**
* Copyright 2021 Huawei Technologies Co., Ltd
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
#include <sys/time.h>
#include <gflags/gflags.h>
#include <dirent.h>
#include <iostream>
#include <string>
#include <algorithm>
#include <iosfwd>
#include <vector>
#include <fstream>
#include <sstream>
#include "include/api/model.h"
#include "include/api/context.h"
#include "include/api/types.h"
#include "include/api/serialization.h"
#include "include/dataset/vision_ascend.h"
#include "include/dataset/execute.h"
#include "include/dataset/transforms.h"
#include "include/dataset/vision.h"
#include "inc/utils.h"
using mindspore::Context;
using mindspore::Serialization;
using mindspore::Model;
using mindspore::Status;
using mindspore::ModelType;
using mindspore::GraphCell;
using mindspore::kSuccess;
using mindspore::MSTensor;
using mindspore::dataset::Execute;
using mindspore::dataset::vision::Decode;
using mindspore::dataset::vision::Resize;
using mindspore::dataset::vision::CenterCrop;
using mindspore::dataset::vision::Normalize;
using mindspore::dataset::vision::HWC2CHW;
DEFINE_string(mindir_path, "", "mindir path");
DEFINE_string(input0_path, ".", "input0 path");
DEFINE_string(dataset_name, "widerface", "dataset name");
DEFINE_int32(device_id, 0, "device id");
int load_model(Model *model, std::vector<MSTensor> *model_inputs, std::string mindir_path, int device_id) {
if (RealPath(mindir_path).empty()) {
std::cout << "Invalid mindir" << std::endl;
return 1;
}
auto context = std::make_shared<Context>();
auto ascend310 = std::make_shared<mindspore::Ascend310DeviceInfo>();
ascend310->SetDeviceID(device_id);
context->MutableDeviceInfo().push_back(ascend310);
mindspore::Graph graph;
Serialization::Load(mindir_path, ModelType::kMindIR, &graph);
Status ret = model->Build(GraphCell(graph), context);
if (ret != kSuccess) {
std::cout << "ERROR: Build failed." << std::endl;
return 1;
}
*model_inputs = model->GetInputs();
if (model_inputs->empty()) {
std::cout << "Invalid model, inputs is empty." << std::endl;
return 1;
}
return 0;
}
int main(int argc, char **argv) {
gflags::ParseCommandLineFlags(&argc, &argv, true);
Model model;
std::vector<MSTensor> model_inputs;
// abort if the model fails to load or build, instead of predicting with an invalid model
if (load_model(&model, &model_inputs, FLAGS_mindir_path, FLAGS_device_id) != 0) {
    std::cout << "ERROR: load model failed." << std::endl;
    return 1;
}
std::map<double, double> costTime_map;
struct timeval start = {0};
struct timeval end = {0};
double startTimeMs;
double endTimeMs;
if (FLAGS_dataset_name == "widerface") {
auto input0_files = GetAllFiles(FLAGS_input0_path);
if (input0_files.empty()) {
std::cout << "ERROR: no input data." << std::endl;
return 1;
}
size_t size = input0_files.size();
for (size_t i = 0; i < size; ++i) {
std::vector<MSTensor> inputs;
std::vector<MSTensor> outputs;
std::cout << "Start predict input files:" << input0_files[i] <<std::endl;
auto input0 = ReadFileToTensor(input0_files[i]);
inputs.emplace_back(model_inputs[0].Name(), model_inputs[0].DataType(), model_inputs[0].Shape(),
input0.Data().get(), input0.DataSize());
gettimeofday(&start, nullptr);
Status ret = model.Predict(inputs, &outputs);
gettimeofday(&end, nullptr);
if (ret != kSuccess) {
std::cout << "Predict " << input0_files[i] << " failed." << std::endl;
return 1;
}
startTimeMs = (1.0 * start.tv_sec * 1000000 + start.tv_usec) / 1000;
endTimeMs = (1.0 * end.tv_sec * 1000000 + end.tv_usec) / 1000;
costTime_map.insert(std::pair<double, double>(startTimeMs, endTimeMs));
int rst = WriteResult(input0_files[i], outputs);
if (rst != 0) {
std::cout << "write result failed." << std::endl;
return rst;
}
}
} else {
auto input0_files = GetAllInputData(FLAGS_input0_path);
if (input0_files.empty()) {
std::cout << "ERROR: no input data." << std::endl;
return 1;
}
size_t size = input0_files.size();
for (size_t i = 0; i < size; ++i) {
for (size_t j = 0; j < input0_files[i].size(); ++j) {
std::vector<MSTensor> inputs;
std::vector<MSTensor> outputs;
std::cout << "Start predict input files:" << input0_files[i][j] <<std::endl;
auto decode = Decode();
auto resize = Resize({256, 256});
auto centercrop = CenterCrop({224, 224});
auto normalize = Normalize({123.675, 116.28, 103.53}, {58.395, 57.12, 57.375});
auto hwc2chw = HWC2CHW();
Execute SingleOp({decode, resize, centercrop, normalize, hwc2chw});
auto imgDvpp = std::make_shared<MSTensor>();
SingleOp(ReadFileToTensor(input0_files[i][j]), imgDvpp.get());
inputs.emplace_back(model_inputs[0].Name(), model_inputs[0].DataType(), model_inputs[0].Shape(),
imgDvpp->Data().get(), imgDvpp->DataSize());
gettimeofday(&start, nullptr);
Status ret = model.Predict(inputs, &outputs);
gettimeofday(&end, nullptr);
if (ret != kSuccess) {
std::cout << "Predict " << input0_files[i][j] << " failed." << std::endl;
return 1;
}
startTimeMs = (1.0 * start.tv_sec * 1000000 + start.tv_usec) / 1000;
endTimeMs = (1.0 * end.tv_sec * 1000000 + end.tv_usec) / 1000;
costTime_map.insert(std::pair<double, double>(startTimeMs, endTimeMs));
int rst = WriteResult(input0_files[i][j], outputs);
if (rst != 0) {
std::cout << "write result failed." << std::endl;
return rst;
}
}
}
}
double average = 0.0;
int inferCount = 0;
for (auto iter = costTime_map.begin(); iter != costTime_map.end(); iter++) {
double diff = 0.0;
diff = iter->second - iter->first;
average += diff;
inferCount++;
}
average = average / inferCount;
std::stringstream timeCost;
timeCost << "NN inference cost average time: "<< average << " ms of infer_count " << inferCount << std::endl;
std::cout << "NN inference cost average time: "<< average << "ms of infer_count " << inferCount << std::endl;
std::string fileName = "./time_Result" + std::string("/test_perform_static.txt");
std::ofstream fileStream(fileName.c_str(), std::ios::trunc);
fileStream << timeCost.str();
fileStream.close();
costTime_map.clear();
return 0;
}


@ -0,0 +1,197 @@
/**
* Copyright 2021 Huawei Technologies Co., Ltd
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
#include <fstream>
#include <algorithm>
#include <iostream>
#include "inc/utils.h"
using mindspore::MSTensor;
using mindspore::DataType;
std::vector<std::vector<std::string>> GetAllInputData(std::string dir_name) {
std::vector<std::vector<std::string>> ret;
DIR *dir = OpenDir(dir_name);
if (dir == nullptr) {
return {};
}
struct dirent *filename;
/* read all the files in the dir ~ */
std::vector<std::string> sub_dirs;
while ((filename = readdir(dir)) != nullptr) {
std::string d_name = std::string(filename->d_name);
// get rid of "." and ".."
if (d_name == "." || d_name == ".." || d_name.empty()) {
continue;
}
std::string dir_path = RealPath(std::string(dir_name) + "/" + filename->d_name);
struct stat s;
lstat(dir_path.c_str(), &s);
if (!S_ISDIR(s.st_mode)) {
continue;
}
sub_dirs.emplace_back(dir_path);
}
std::sort(sub_dirs.begin(), sub_dirs.end());
(void)std::transform(sub_dirs.begin(), sub_dirs.end(), std::back_inserter(ret),
[](const std::string &d) { return GetAllFiles(d); });
return ret;
}
std::vector<std::string> GetAllFiles(std::string dir_name) {
struct dirent *filename;
DIR *dir = OpenDir(dir_name);
if (dir == nullptr) {
return {};
}
std::vector<std::string> res;
while ((filename = readdir(dir)) != nullptr) {
std::string d_name = std::string(filename->d_name);
if (d_name == "." || d_name == ".." || d_name.size() <= 3) {
continue;
}
res.emplace_back(std::string(dir_name) + "/" + filename->d_name);
}
std::sort(res.begin(), res.end());
return res;
}
std::vector<std::string> GetAllFiles(std::string_view dirName) {
struct dirent *filename;
DIR *dir = OpenDir(dirName);
if (dir == nullptr) {
return {};
}
std::vector<std::string> res;
while ((filename = readdir(dir)) != nullptr) {
std::string dName = std::string(filename->d_name);
if (dName == "." || dName == ".." || filename->d_type != DT_REG) {
continue;
}
res.emplace_back(std::string(dirName) + "/" + filename->d_name);
}
std::sort(res.begin(), res.end());
for (auto &f : res) {
std::cout << "image file: " << f << std::endl;
}
return res;
}
int WriteResult(const std::string& imageFile, const std::vector<MSTensor> &outputs) {
std::string homePath = "./result_Files";
const int INVALID_POINTER = -1;
const int ERROR = -2;
for (size_t i = 0; i < outputs.size(); ++i) {
size_t outputSize;
std::shared_ptr<const void> netOutput;
netOutput = outputs[i].Data();
outputSize = outputs[i].DataSize();
int pos = imageFile.rfind('/');
std::string fileName(imageFile, pos + 1);
fileName.replace(fileName.find('.'), fileName.size() - fileName.find('.'), '_' + std::to_string(i) + ".bin");
std::string outFileName = homePath + "/" + fileName;
FILE *outputFile = fopen(outFileName.c_str(), "wb");
if (outputFile == nullptr) {
std::cout << "open result file " << outFileName << " failed" << std::endl;
return INVALID_POINTER;
}
size_t size = fwrite(netOutput.get(), sizeof(char), outputSize, outputFile);
if (size != outputSize) {
fclose(outputFile);
outputFile = nullptr;
std::cout << "write result file " << outFileName << " failed, write size[" << size <<
"] is smaller than output size[" << outputSize << "], maybe the disk is full." << std::endl;
return ERROR;
}
fclose(outputFile);
outputFile = nullptr;
}
return 0;
}
mindspore::MSTensor ReadFileToTensor(const std::string &file) {
if (file.empty()) {
std::cout << "Pointer file is nullptr" << std::endl;
return mindspore::MSTensor();
}
std::ifstream ifs(file);
if (!ifs.good()) {
std::cout << "File: " << file << " is not exist" << std::endl;
return mindspore::MSTensor();
}
if (!ifs.is_open()) {
std::cout << "File: " << file << "open failed" << std::endl;
return mindspore::MSTensor();
}
ifs.seekg(0, std::ios::end);
size_t size = ifs.tellg();
mindspore::MSTensor buffer(file, mindspore::DataType::kNumberTypeUInt8, {static_cast<int64_t>(size)}, nullptr, size);
ifs.seekg(0, std::ios::beg);
ifs.read(reinterpret_cast<char *>(buffer.MutableData()), size);
ifs.close();
return buffer;
}
DIR *OpenDir(std::string_view dirName) {
if (dirName.empty()) {
std::cout << " dirName is null ! " << std::endl;
return nullptr;
}
std::string realPath = RealPath(dirName);
struct stat s;
lstat(realPath.c_str(), &s);
if (!S_ISDIR(s.st_mode)) {
std::cout << "dirName is not a valid directory !" << std::endl;
return nullptr;
}
DIR *dir;
dir = opendir(realPath.c_str());
if (dir == nullptr) {
std::cout << "Can not open dir " << dirName << std::endl;
return nullptr;
}
std::cout << "Successfully opened the dir " << dirName << std::endl;
return dir;
}
std::string RealPath(std::string_view path) {
char realPathMem[PATH_MAX] = {0};
char *realPathRet = nullptr;
realPathRet = realpath(path.data(), realPathMem);
if (realPathRet == nullptr) {
std::cout << "File: " << path << " is not exist.";
return "";
}
std::string realPath(realPathMem);
std::cout << path << " realpath is: " << realPath << std::endl;
return realPath;
}


@ -0,0 +1,558 @@
# Copyright 2021 Huawei Technologies Co., Ltd
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ============================================================================
"""Eval Retinaface_resnet50_or_mobilenet0.25."""
import argparse
import os
import time
import datetime
import numpy as np
import cv2
from mindspore import Tensor, context
from mindspore.train.serialization import load_checkpoint, load_param_into_net
from src.config import cfg_res50, cfg_mobile025
from src.utils import decode_bbox, prior_box
class Timer():
def __init__(self):
self.start_time = 0.
self.diff = 0.
def start(self):
self.start_time = time.time()
def end(self):
self.diff = time.time() - self.start_time
class DetectionEngine:
"""DetectionEngine"""
def __init__(self, cfg):
self.results = {}
self.nms_thresh = cfg['val_nms_threshold']
self.conf_thresh = cfg['val_confidence_threshold']
self.iou_thresh = cfg['val_iou_threshold']
self.var = cfg['variance']
self.save_prefix = cfg['val_predict_save_folder']
self.gt_dir = cfg['val_gt_dir']
def _iou(self, a, b):
"""_iou"""
A = a.shape[0]
B = b.shape[0]
max_xy = np.minimum(
np.broadcast_to(np.expand_dims(a[:, 2:4], 1), [A, B, 2]),
np.broadcast_to(np.expand_dims(b[:, 2:4], 0), [A, B, 2]))
min_xy = np.maximum(
np.broadcast_to(np.expand_dims(a[:, 0:2], 1), [A, B, 2]),
np.broadcast_to(np.expand_dims(b[:, 0:2], 0), [A, B, 2]))
inter = np.maximum((max_xy - min_xy + 1), np.zeros_like(max_xy - min_xy))
inter = inter[:, :, 0] * inter[:, :, 1]
area_a = np.broadcast_to(
np.expand_dims(
(a[:, 2] - a[:, 0] + 1) * (a[:, 3] - a[:, 1] + 1), 1),
np.shape(inter))
area_b = np.broadcast_to(
np.expand_dims(
(b[:, 2] - b[:, 0] + 1) * (b[:, 3] - b[:, 1] + 1), 0),
np.shape(inter))
union = area_a + area_b - inter
return inter / union
def _nms(self, boxes, threshold=0.5):
"""_nms"""
x1 = boxes[:, 0]
y1 = boxes[:, 1]
x2 = boxes[:, 2]
y2 = boxes[:, 3]
scores = boxes[:, 4]
areas = (x2 - x1 + 1) * (y2 - y1 + 1)
order = scores.argsort()[::-1]
reserved_boxes = []
while order.size > 0:
i = order[0]
reserved_boxes.append(i)
max_x1 = np.maximum(x1[i], x1[order[1:]])
max_y1 = np.maximum(y1[i], y1[order[1:]])
min_x2 = np.minimum(x2[i], x2[order[1:]])
min_y2 = np.minimum(y2[i], y2[order[1:]])
intersect_w = np.maximum(0.0, min_x2 - max_x1 + 1)
intersect_h = np.maximum(0.0, min_y2 - max_y1 + 1)
intersect_area = intersect_w * intersect_h
ovr = intersect_area / (areas[i] + areas[order[1:]] - intersect_area)
indices = np.where(ovr <= threshold)[0]
order = order[indices + 1]
return reserved_boxes
def write_result(self):
"""write_result"""
# save result to file.
import json
t = datetime.datetime.now().strftime('_%Y_%m_%d_%H_%M_%S')
try:
if not os.path.isdir(self.save_prefix):
os.makedirs(self.save_prefix)
self.file_path = self.save_prefix + '/predict' + t + '.json'
f = open(self.file_path, 'w')
json.dump(self.results, f)
except IOError as e:
raise RuntimeError("Unable to open json file to dump. What(): {}".format(str(e)))
else:
f.close()
return self.file_path
def detect(self, boxes, confs, resize, scale, image_path, priors):
"""detect"""
if boxes.shape[0] == 0:
# add to result
event_name, img_name = image_path.split('/')
self.results[event_name][img_name[:-4]] = {'img_path': image_path,
'bboxes': []}
return
boxes = decode_bbox(np.squeeze(boxes.asnumpy(), 0), priors, self.var)
boxes = boxes * scale / resize
scores = np.squeeze(confs.asnumpy(), 0)[:, 1]
# ignore low scores
inds = np.where(scores > self.conf_thresh)[0]
boxes = boxes[inds]
scores = scores[inds]
# keep top-K before NMS
order = scores.argsort()[::-1]
boxes = boxes[order]
scores = scores[order]
# do NMS
dets = np.hstack((boxes, scores[:, np.newaxis])).astype(np.float32, copy=False)
keep = self._nms(dets, self.nms_thresh)
dets = dets[keep, :]
dets[:, 2:4] = (dets[:, 2:4].astype(np.int) - dets[:, 0:2].astype(np.int)).astype(np.float) # int
dets[:, 0:4] = dets[:, 0:4].astype(np.int).astype(np.float) # int
# add to result
event_name, img_name = image_path.split('/')
if event_name not in self.results.keys():
self.results[event_name] = {}
self.results[event_name][img_name[:-4]] = {'img_path': image_path,
'bboxes': dets[:, :5].astype(np.float).tolist()}
def _get_gt_boxes(self):
"""_get_gt_boxes"""
from scipy.io import loadmat
gt = loadmat(os.path.join(self.gt_dir, 'wider_face_val.mat'))
hard = loadmat(os.path.join(self.gt_dir, 'wider_hard_val.mat'))
medium = loadmat(os.path.join(self.gt_dir, 'wider_medium_val.mat'))
easy = loadmat(os.path.join(self.gt_dir, 'wider_easy_val.mat'))
faceboxes = gt['face_bbx_list']
events = gt['event_list']
files = gt['file_list']
hard_gt_list = hard['gt_list']
medium_gt_list = medium['gt_list']
easy_gt_list = easy['gt_list']
return faceboxes, events, files, hard_gt_list, medium_gt_list, easy_gt_list
def _norm_pre_score(self):
"""_norm_pre_score"""
max_score = 0
min_score = 1
for event in self.results:
for name in self.results[event].keys():
bbox = np.array(self.results[event][name]['bboxes']).astype(np.float)
if bbox.shape[0] <= 0:
continue
max_score = max(max_score, np.max(bbox[:, -1]))
min_score = min(min_score, np.min(bbox[:, -1]))
length = max_score - min_score
for event in self.results:
for name in self.results[event].keys():
bbox = np.array(self.results[event][name]['bboxes']).astype(np.float)
if bbox.shape[0] <= 0:
continue
bbox[:, -1] -= min_score
bbox[:, -1] /= length
self.results[event][name]['bboxes'] = bbox.tolist()
def _image_eval(self, predict, gt, keep, iou_thresh, section_num):
"""_image_eval"""
_predict = predict.copy()
_gt = gt.copy()
image_p_right = np.zeros(_predict.shape[0])
image_gt_right = np.zeros(_gt.shape[0])
proposal = np.ones(_predict.shape[0])
# x1y1wh -> x1y1x2y2
_predict[:, 2:4] = _predict[:, 0:2] + _predict[:, 2:4]
_gt[:, 2:4] = _gt[:, 0:2] + _gt[:, 2:4]
ious = self._iou(_predict[:, 0:4], _gt[:, 0:4])
for i in range(_predict.shape[0]):
gt_ious = ious[i, :]
max_iou, max_index = gt_ious.max(), gt_ious.argmax()
if max_iou >= iou_thresh:
if keep[max_index] == 0:
image_gt_right[max_index] = -1
proposal[i] = -1
elif image_gt_right[max_index] == 0:
image_gt_right[max_index] = 1
right_index = np.where(image_gt_right == 1)[0]
image_p_right[i] = len(right_index)
image_pr = np.zeros((section_num, 2), dtype=np.float)
for section in range(section_num):
_thresh = 1 - (section + 1)/section_num
over_score_index = np.where(predict[:, 4] >= _thresh)[0]
if over_score_index.shape[0] <= 0:
image_pr[section, 0] = 0
image_pr[section, 1] = 0
else:
index = over_score_index[-1]
p_num = len(np.where(proposal[0:(index+1)] == 1)[0])
image_pr[section, 0] = p_num
image_pr[section, 1] = image_p_right[index]
return image_pr
def get_eval_result(self):
"""get_eval_result"""
self._norm_pre_score()
facebox_list, event_list, file_list, hard_gt_list, medium_gt_list, easy_gt_list = self._get_gt_boxes()
section_num = 1000
sets = ['easy', 'medium', 'hard']
set_gts = [easy_gt_list, medium_gt_list, hard_gt_list]
ap_key_dict = {0: "Easy Val AP : ", 1: "Medium Val AP : ", 2: "Hard Val AP : ",}
ap_dict = {}
for _set in range(len(sets)):
gt_list = set_gts[_set]
count_gt = 0
pr_curve = np.zeros((section_num, 2), dtype=np.float)
for i, _ in enumerate(event_list):
event = str(event_list[i][0][0])
image_list = file_list[i][0]
event_predict_dict = self.results[event]
event_gt_index_list = gt_list[i][0]
event_gt_box_list = facebox_list[i][0]
for j, _ in enumerate(image_list):
predict = np.array(event_predict_dict[str(image_list[j][0][0])]['bboxes']).astype(np.float)
gt_boxes = event_gt_box_list[j][0].astype('float')
keep_index = event_gt_index_list[j][0]
count_gt += len(keep_index)
if gt_boxes.shape[0] <= 0 or predict.shape[0] <= 0:
continue
keep = np.zeros(gt_boxes.shape[0])
if keep_index.shape[0] > 0:
keep[keep_index-1] = 1
image_pr = self._image_eval(predict, gt_boxes, keep,
iou_thresh=self.iou_thresh,
section_num=section_num)
pr_curve += image_pr
precision = pr_curve[:, 1] / pr_curve[:, 0]
recall = pr_curve[:, 1] / count_gt
precision = np.concatenate((np.array([0.]), precision, np.array([0.])))
recall = np.concatenate((np.array([0.]), recall, np.array([1.])))
for i in range(precision.shape[0]-1, 0, -1):
precision[i-1] = np.maximum(precision[i-1], precision[i])
index = np.where(recall[1:] != recall[:-1])[0]
ap = np.sum((recall[index + 1] - recall[index]) * precision[index + 1])
print(ap_key_dict[_set] + '{:.4f}'.format(ap))
return ap_dict
def val_with_resnet(args_opt):
"""val_with_resnet"""
from src.network_with_resnet import RetinaFace, resnet50
cfg = cfg_res50
context.set_context(mode=context.GRAPH_MODE, device_target='Ascend', device_id=cfg['device_id'], save_graphs=False)
backbone = resnet50(1001)
network = RetinaFace(phase='predict', backbone=backbone)
backbone.set_train(False)
network.set_train(False)
# load checkpoint
assert args_opt.val_model is not None, 'val_model is None.'
param_dict = load_checkpoint(args_opt.val_model)
print('Load trained model done. {}'.format(args_opt.val_model))
network.init_parameters_data()
load_param_into_net(network, param_dict)
# testing dataset
testset_folder = cfg['val_dataset_folder']
testset_label_path = cfg['val_dataset_folder'] + "label.txt"
with open(testset_label_path, 'r') as f:
_test_dataset = f.readlines()
test_dataset = []
for im_path in _test_dataset:
if im_path.startswith('# '):
test_dataset.append(im_path[2:-1]) # delete '# ...\n'
num_images = len(test_dataset)
timers = {'forward_time': Timer(), 'misc': Timer()}
if cfg['val_origin_size']:
h_max, w_max = 0, 0
for img_name in test_dataset:
image_path = os.path.join(testset_folder, 'images', img_name)
_img = cv2.imread(image_path, cv2.IMREAD_COLOR)
if _img.shape[0] > h_max:
h_max = _img.shape[0]
if _img.shape[1] > w_max:
w_max = _img.shape[1]
h_max = (int(h_max / 32) + 1) * 32
w_max = (int(w_max / 32) + 1) * 32
priors = prior_box(image_sizes=(h_max, w_max),
min_sizes=[[16, 32], [64, 128], [256, 512]],
steps=[8, 16, 32],
clip=False)
else:
target_size = 1600
max_size = 2176
priors = prior_box(image_sizes=(max_size, max_size),
min_sizes=[[16, 32], [64, 128], [256, 512]],
steps=[8, 16, 32],
clip=False)
# init detection engine
detection = DetectionEngine(cfg)
# testing begin
print('Predict box starting')
for i, img_name in enumerate(test_dataset):
image_path = os.path.join(testset_folder, 'images', img_name)
img_raw = cv2.imread(image_path, cv2.IMREAD_COLOR)
img = np.float32(img_raw)
# testing scale
if cfg['val_origin_size']:
resize = 1
assert img.shape[0] <= h_max and img.shape[1] <= w_max
image_t = np.empty((h_max, w_max, 3), dtype=img.dtype)
image_t[:, :] = (104.0, 117.0, 123.0)
image_t[0:img.shape[0], 0:img.shape[1]] = img
img = image_t
else:
im_size_min = np.min(img.shape[0:2])
im_size_max = np.max(img.shape[0:2])
resize = float(target_size) / float(im_size_min)
# prevent bigger axis from being more than max_size:
if np.round(resize * im_size_max) > max_size:
resize = float(max_size) / float(im_size_max)
img = cv2.resize(img, None, None, fx=resize, fy=resize, interpolation=cv2.INTER_LINEAR)
assert img.shape[0] <= max_size and img.shape[1] <= max_size
image_t = np.empty((max_size, max_size, 3), dtype=img.dtype)
image_t[:, :] = (104.0, 117.0, 123.0)
image_t[0:img.shape[0], 0:img.shape[1]] = img
img = image_t
scale = np.array([img.shape[1], img.shape[0], img.shape[1], img.shape[0]], dtype=img.dtype)
img -= (104, 117, 123)
img = img.transpose(2, 0, 1)
img = np.expand_dims(img, 0)
img = Tensor(img)
timers['forward_time'].start()
boxes, confs, _ = network(img)
timers['forward_time'].end()
timers['misc'].start()
detection.detect(boxes, confs, resize, scale, img_name, priors)
timers['misc'].end()
print('im_detect: {:d}/{:d} forward_pass_time: {:.4f}s misc: {:.4f}s'.format(i + 1, num_images,
timers['forward_time'].diff,
timers['misc'].diff))
print('Predict box done.')
print('Eval starting')
if cfg['val_save_result']:
# Save the predict result if you want.
predict_result_path = detection.write_result()
print('predict result path is {}'.format(predict_result_path))
detection.get_eval_result()
print(args_opt.val_model)
print('Eval done.')
def val_with_mobilenet(args_opt):
"""val_with_mobilenet"""
from src.network_with_mobilenet import RetinaFace, resnet50, mobilenet025
context.set_context(mode=context.GRAPH_MODE, device_target='GPU', save_graphs=False)
cfg = cfg_mobile025
if cfg['name'] == 'ResNet50':
backbone = resnet50(1001)
elif cfg['name'] == 'MobileNet025':
backbone = mobilenet025(1000)
network = RetinaFace(phase='predict', backbone=backbone, cfg=cfg)
backbone.set_train(False)
network.set_train(False)
# load checkpoint
assert args_opt.val_model is not None, 'val_model is None.'
param_dict = load_checkpoint(args_opt.val_model)
print('Load trained model done. {}'.format(args_opt.val_model))
network.init_parameters_data()
load_param_into_net(network, param_dict)
# testing dataset
testset_folder = cfg['val_dataset_folder']
testset_label_path = cfg['val_dataset_folder'] + "label.txt"
with open(testset_label_path, 'r') as f:
_test_dataset = f.readlines()
test_dataset = []
for im_path in _test_dataset:
if im_path.startswith('# '):
test_dataset.append(im_path[2:-1]) # delete '# ...\n'
num_images = len(test_dataset)
timers = {'forward_time': Timer(), 'misc': Timer()}
if cfg['val_origin_size']:
h_max, w_max = 0, 0
for img_name in test_dataset:
image_path = os.path.join(testset_folder, 'images', img_name)
_img = cv2.imread(image_path, cv2.IMREAD_COLOR)
if _img.shape[0] > h_max:
h_max = _img.shape[0]
if _img.shape[1] > w_max:
w_max = _img.shape[1]
h_max = (int(h_max / 32) + 1) * 32
w_max = (int(w_max / 32) + 1) * 32
priors = prior_box(image_sizes=(h_max, w_max),
min_sizes=[[16, 32], [64, 128], [256, 512]],
steps=[8, 16, 32],
clip=False)
else:
target_size = 1600
max_size = 2176
priors = prior_box(image_sizes=(max_size, max_size),
min_sizes=[[16, 32], [64, 128], [256, 512]],
steps=[8, 16, 32],
clip=False)
# init detection engine
detection = DetectionEngine(cfg)
# testing begin
print('Predict box starting')
for i, img_name in enumerate(test_dataset):
image_path = os.path.join(testset_folder, 'images', img_name)
img_raw = cv2.imread(image_path, cv2.IMREAD_COLOR)
img = np.float32(img_raw)
# testing scale
if cfg['val_origin_size']:
resize = 1
assert img.shape[0] <= h_max and img.shape[1] <= w_max
image_t = np.empty((h_max, w_max, 3), dtype=img.dtype)
image_t[:, :] = (104.0, 117.0, 123.0)
image_t[0:img.shape[0], 0:img.shape[1]] = img
img = image_t
else:
im_size_min = np.min(img.shape[0:2])
im_size_max = np.max(img.shape[0:2])
resize = float(target_size) / float(im_size_min)
# prevent bigger axis from being more than max_size:
if np.round(resize * im_size_max) > max_size:
resize = float(max_size) / float(im_size_max)
img = cv2.resize(img, None, None, fx=resize, fy=resize, interpolation=cv2.INTER_LINEAR)
assert img.shape[0] <= max_size and img.shape[1] <= max_size
image_t = np.empty((max_size, max_size, 3), dtype=img.dtype)
image_t[:, :] = (104.0, 117.0, 123.0)
image_t[0:img.shape[0], 0:img.shape[1]] = img
img = image_t
scale = np.array([img.shape[1], img.shape[0], img.shape[1], img.shape[0]], dtype=img.dtype)
img -= (104, 117, 123)
img = img.transpose(2, 0, 1)
img = np.expand_dims(img, 0)
img = Tensor(img)
timers['forward_time'].start()
boxes, confs, _ = network(img)
timers['forward_time'].end()
timers['misc'].start()
detection.detect(boxes, confs, resize, scale, img_name, priors)
timers['misc'].end()
print('im_detect: {:d}/{:d} forward_pass_time: {:.4f}s misc: {:.4f}s'.format(i + 1, num_images,
timers['forward_time'].diff,
timers['misc'].diff))
print('Predict box done.')
print('Eval starting')
if cfg['val_save_result']:
# Save the predict result if you want.
predict_result_path = detection.write_result()
print('predict result path is {}'.format(predict_result_path))
detection.get_eval_result()
print('Eval done.')
if __name__ == '__main__':
parser = argparse.ArgumentParser(description='val')
parser.add_argument('--backbone_name', type=str, default='ResNet50',
help='backbone name')
parser.add_argument('--val_model', type=str, default='./train_parallel3/checkpoint/ckpt_3/RetinaFace-56_201.ckpt',
help='val_model location')
args = parser.parse_args()
if args.backbone_name == 'ResNet50':
    val_with_resnet(args_opt=args)
elif args.backbone_name == 'MobileNet025':
    val_with_mobilenet(args_opt=args)


@ -0,0 +1,62 @@
# Copyright 2021 Huawei Technologies Co., Ltd
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ============================================================================
"""
##############export checkpoint file into air, onnx or mindir model#################
python export.py
"""
import argparse
import numpy as np
from mindspore import Tensor, load_checkpoint, load_param_into_net, export, context
from src.network_with_resnet import RetinaFace, resnet50
from src.config import cfg_res50
parser = argparse.ArgumentParser(description='retinaface export')
parser.add_argument("--device_id", type=int, default=0, help="Device id")
parser.add_argument("--batch_size", type=int, default=1, help="batch size")
parser.add_argument("--ckpt_file", type=str, required=True, help="Checkpoint file path.")
parser.add_argument("--file_name", type=str, default="retinaface", help="output file name.")
parser.add_argument("--file_format", type=str, choices=["AIR", "ONNX", "MINDIR"], default="MINDIR", help="file format")
parser.add_argument("--device_target", type=str, default="Ascend",
choices=["Ascend", "GPU", "CPU"], help="device target(default: Ascend)")
args = parser.parse_args()
context.set_context(mode=context.GRAPH_MODE, device_target=args.device_target)
if args.device_target == "Ascend":
context.set_context(device_id=args.device_id)
def export_net():
"""export net"""
if cfg_res50['val_origin_size']:
height, width = 5568, 1056
else:
height, width = 2176, 2176
backbone = resnet50(1001)
net = RetinaFace(phase='predict', backbone=backbone)
backbone.set_train(False)
net.set_train(False)
assert args.ckpt_file is not None, "checkpoint_path is None."
param_dict = load_checkpoint(args.ckpt_file)
net.init_parameters_data()
load_param_into_net(net, param_dict)
input_arr = Tensor(np.zeros([args.batch_size, 3, height, width], np.float32))
export(net, input_arr, file_name=args.file_name, file_format=args.file_format)
if __name__ == '__main__':
export_net()


@ -0,0 +1,423 @@
# Copyright 2021 Huawei Technologies Co., Ltd
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ============================================================================
"""Infer Retinaface_resnet50."""
from __future__ import print_function
import argparse
import os
import time
import datetime
import numpy as np
import cv2
from mindspore import context
from src.config import cfg_res50
from src.utils import decode_bbox, prior_box
class Timer():
def __init__(self):
self.start_time = 0.
self.diff = 0.
def start(self):
self.start_time = time.time()
def end(self):
self.diff = time.time() - self.start_time
class DetectionEngine:
"""DetectionEngine"""
def __init__(self, cfg):
self.results = {}
self.nms_thresh = cfg['val_nms_threshold']
self.conf_thresh = cfg['val_confidence_threshold']
self.iou_thresh = cfg['val_iou_threshold']
self.var = cfg['variance']
self.save_prefix = cfg['val_predict_save_folder']
self.gt_dir = cfg['infer_gt_dir']
def _iou(self, a, b):
"""_iou"""
A = a.shape[0]
B = b.shape[0]
max_xy = np.minimum(
np.broadcast_to(np.expand_dims(a[:, 2:4], 1), [A, B, 2]),
np.broadcast_to(np.expand_dims(b[:, 2:4], 0), [A, B, 2]))
min_xy = np.maximum(
np.broadcast_to(np.expand_dims(a[:, 0:2], 1), [A, B, 2]),
np.broadcast_to(np.expand_dims(b[:, 0:2], 0), [A, B, 2]))
inter = np.maximum((max_xy - min_xy + 1), np.zeros_like(max_xy - min_xy))
inter = inter[:, :, 0] * inter[:, :, 1]
area_a = np.broadcast_to(
np.expand_dims(
(a[:, 2] - a[:, 0] + 1) * (a[:, 3] - a[:, 1] + 1), 1),
np.shape(inter))
area_b = np.broadcast_to(
np.expand_dims(
(b[:, 2] - b[:, 0] + 1) * (b[:, 3] - b[:, 1] + 1), 0),
np.shape(inter))
union = area_a + area_b - inter
return inter / union
def _nms(self, boxes, threshold=0.5):
"""_nms"""
x1 = boxes[:, 0]
y1 = boxes[:, 1]
x2 = boxes[:, 2]
y2 = boxes[:, 3]
scores = boxes[:, 4]
areas = (x2 - x1 + 1) * (y2 - y1 + 1)
order = scores.argsort()[::-1]
reserved_boxes = []
while order.size > 0:
i = order[0]
reserved_boxes.append(i)
max_x1 = np.maximum(x1[i], x1[order[1:]])
max_y1 = np.maximum(y1[i], y1[order[1:]])
min_x2 = np.minimum(x2[i], x2[order[1:]])
min_y2 = np.minimum(y2[i], y2[order[1:]])
intersect_w = np.maximum(0.0, min_x2 - max_x1 + 1)
intersect_h = np.maximum(0.0, min_y2 - max_y1 + 1)
intersect_area = intersect_w * intersect_h
ovr = intersect_area / (areas[i] + areas[order[1:]] - intersect_area)
indices = np.where(ovr <= threshold)[0]
order = order[indices + 1]
return reserved_boxes
def write_result(self):
"""write_result"""
# save result to file.
import json
t = datetime.datetime.now().strftime('_%Y_%m_%d_%H_%M_%S')
try:
if not os.path.isdir(self.save_prefix):
os.makedirs(self.save_prefix)
self.file_path = self.save_prefix + '/predict' + t + '.json'
f = open(self.file_path, 'w')
json.dump(self.results, f)
except IOError as e:
raise RuntimeError("Unable to open json file to dump. What(): {}".format(str(e)))
else:
f.close()
return self.file_path
def detect(self, boxes, confs, resize, scale, image_path, priors):
"""detect"""
if boxes.shape[0] == 0:
# add to result
event_name, img_name = image_path.split('/')
self.results[event_name][img_name[:-4]] = {'img_path': image_path,
'bboxes': []}
return
boxes = decode_bbox(np.squeeze(boxes, 0), priors, self.var)
boxes = boxes * scale / resize
scores = np.squeeze(confs, 0)[:, 1]
# ignore low scores
inds = np.where(scores > self.conf_thresh)[0]
boxes = boxes[inds]
scores = scores[inds]
# keep top-K before NMS
order = scores.argsort()[::-1]
boxes = boxes[order]
scores = scores[order]
# do NMS
dets = np.hstack((boxes, scores[:, np.newaxis])).astype(np.float32, copy=False)
keep = self._nms(dets, self.nms_thresh)
dets = dets[keep, :]
dets[:, 2:4] = (dets[:, 2:4].astype(np.int) - dets[:, 0:2].astype(np.int)).astype(np.float) # int
dets[:, 0:4] = dets[:, 0:4].astype(np.int).astype(np.float) # int
# add to result
event_name, img_name = image_path.split('/')
if event_name not in self.results.keys():
self.results[event_name] = {}
self.results[event_name][img_name[:-4]] = {'img_path': image_path,
'bboxes': dets[:, :5].astype(np.float).tolist()}
def _get_gt_boxes(self):
"""_get_gt_boxes"""
from scipy.io import loadmat
gt = loadmat(os.path.join(self.gt_dir, 'wider_face_val.mat'))
hard = loadmat(os.path.join(self.gt_dir, 'wider_hard_val.mat'))
medium = loadmat(os.path.join(self.gt_dir, 'wider_medium_val.mat'))
easy = loadmat(os.path.join(self.gt_dir, 'wider_easy_val.mat'))
faceboxes = gt['face_bbx_list']
events = gt['event_list']
files = gt['file_list']
hard_gt_list = hard['gt_list']
medium_gt_list = medium['gt_list']
easy_gt_list = easy['gt_list']
return faceboxes, events, files, hard_gt_list, medium_gt_list, easy_gt_list
def _norm_pre_score(self):
"""_norm_pre_score"""
max_score = 0
min_score = 1
for event in self.results:
for name in self.results[event].keys():
bbox = np.array(self.results[event][name]['bboxes']).astype(np.float)
if bbox.shape[0] <= 0:
continue
max_score = max(max_score, np.max(bbox[:, -1]))
min_score = min(min_score, np.min(bbox[:, -1]))
length = max_score - min_score
for event in self.results:
for name in self.results[event].keys():
bbox = np.array(self.results[event][name]['bboxes']).astype(np.float)
if bbox.shape[0] <= 0:
continue
bbox[:, -1] -= min_score
bbox[:, -1] /= length
self.results[event][name]['bboxes'] = bbox.tolist()
def _image_eval(self, predict, gt, keep, iou_thresh, section_num):
"""_image_eval"""
_predict = predict.copy()
_gt = gt.copy()
image_p_right = np.zeros(_predict.shape[0])
image_gt_right = np.zeros(_gt.shape[0])
proposal = np.ones(_predict.shape[0])
# x1y1wh -> x1y1x2y2
_predict[:, 2:4] = _predict[:, 0:2] + _predict[:, 2:4]
_gt[:, 2:4] = _gt[:, 0:2] + _gt[:, 2:4]
ious = self._iou(_predict[:, 0:4], _gt[:, 0:4])
for i in range(_predict.shape[0]):
gt_ious = ious[i, :]
max_iou, max_index = gt_ious.max(), gt_ious.argmax()
if max_iou >= iou_thresh:
if keep[max_index] == 0:
image_gt_right[max_index] = -1
proposal[i] = -1
elif image_gt_right[max_index] == 0:
image_gt_right[max_index] = 1
right_index = np.where(image_gt_right == 1)[0]
image_p_right[i] = len(right_index)
image_pr = np.zeros((section_num, 2), dtype=np.float)
for section in range(section_num):
_thresh = 1 - (section + 1)/section_num
over_score_index = np.where(predict[:, 4] >= _thresh)[0]
if over_score_index.shape[0] <= 0:
image_pr[section, 0] = 0
image_pr[section, 1] = 0
else:
index = over_score_index[-1]
p_num = len(np.where(proposal[0:(index+1)] == 1)[0])
image_pr[section, 0] = p_num
image_pr[section, 1] = image_p_right[index]
return image_pr
def get_eval_result(self):
"""get_eval_result"""
self._norm_pre_score()
facebox_list, event_list, file_list, hard_gt_list, medium_gt_list, easy_gt_list = self._get_gt_boxes()
section_num = 1000
sets = ['easy', 'medium', 'hard']
set_gts = [easy_gt_list, medium_gt_list, hard_gt_list]
ap_key_dict = {0: "Easy Val AP : ", 1: "Medium Val AP : ", 2: "Hard Val AP : ",}
ap_dict = {}
for _set in range(len(sets)):
gt_list = set_gts[_set]
count_gt = 0
pr_curve = np.zeros((section_num, 2), dtype=np.float)
for i, _ in enumerate(event_list):
event = str(event_list[i][0][0])
image_list = file_list[i][0]
event_predict_dict = self.results[event]
event_gt_index_list = gt_list[i][0]
event_gt_box_list = facebox_list[i][0]
for j, _ in enumerate(image_list):
predict = np.array(event_predict_dict[str(image_list[j][0][0])]['bboxes']).astype(np.float)
gt_boxes = event_gt_box_list[j][0].astype('float')
keep_index = event_gt_index_list[j][0]
count_gt += len(keep_index)
if gt_boxes.shape[0] <= 0 or predict.shape[0] <= 0:
continue
keep = np.zeros(gt_boxes.shape[0])
if keep_index.shape[0] > 0:
keep[keep_index-1] = 1
image_pr = self._image_eval(predict, gt_boxes, keep,
iou_thresh=self.iou_thresh,
section_num=section_num)
pr_curve += image_pr
precision = pr_curve[:, 1] / pr_curve[:, 0]
recall = pr_curve[:, 1] / count_gt
precision = np.concatenate((np.array([0.]), precision, np.array([0.])))
recall = np.concatenate((np.array([0.]), recall, np.array([1.])))
for i in range(precision.shape[0]-1, 0, -1):
precision[i-1] = np.maximum(precision[i-1], precision[i])
index = np.where(recall[1:] != recall[:-1])[0]
ap = np.sum((recall[index + 1] - recall[index]) * precision[index + 1])
print(ap_key_dict[_set] + '{:.4f}'.format(ap))
return ap_dict
def val():
"""val"""
parser = argparse.ArgumentParser(description='Postprocess file')
parser.add_argument('--device_id', type=int, default=0, help='device id.')
args_opt = parser.parse_args()
cfg = cfg_res50
context.set_context(mode=context.GRAPH_MODE, device_target='Ascend',
device_id=args_opt.device_id, save_graphs=False)
# testing dataset
testset_folder = cfg['infer_dataset_folder']
testset_label_path = cfg['infer_dataset_folder'] + "label.txt"
with open(testset_label_path, 'r') as f:
_test_dataset = f.readlines()
test_dataset = [] # such as "0--Parade/0_Parade_marchingband_1_465.jpg"
for im_path in _test_dataset:
if im_path.startswith('# '):
test_dataset.append(im_path[2:-1]) # delete '# ...\n'
num_images = len(test_dataset)
timers = {'forward_time': Timer(), 'misc': Timer()}
if cfg['val_origin_size']:
h_max, w_max = 0, 0
for img_name in test_dataset:
image_path = os.path.join(testset_folder, 'images', img_name) # .jpg's location
_img = cv2.imread(image_path, cv2.IMREAD_COLOR)
if _img.shape[0] > h_max:
h_max = _img.shape[0]
if _img.shape[1] > w_max:
w_max = _img.shape[1]
h_max = (int(h_max / 32) + 1) * 32
w_max = (int(w_max / 32) + 1) * 32
priors = prior_box(image_sizes=(h_max, w_max),
min_sizes=[[16, 32], [64, 128], [256, 512]],
steps=[8, 16, 32],
clip=False)
else:
target_size = 1600
max_size = 2176
priors = prior_box(image_sizes=(max_size, max_size),
min_sizes=[[16, 32], [64, 128], [256, 512]],
steps=[8, 16, 32],
clip=False)
# init detection engine
detection = DetectionEngine(cfg)
# testing begin
print('Predict box starting')
for i, img_name in enumerate(test_dataset):
image_path = os.path.join(testset_folder, 'images', img_name) # .jpg's location
img_raw = cv2.imread(image_path, cv2.IMREAD_COLOR)
img = np.float32(img_raw)
# testing scale
if cfg['val_origin_size']:
resize = 1
assert img.shape[0] <= h_max and img.shape[1] <= w_max
image_t = np.empty((h_max, w_max, 3), dtype=img.dtype)
image_t[:, :] = (104.0, 117.0, 123.0)
image_t[0:img.shape[0], 0:img.shape[1]] = img
img = image_t
else:
im_size_min = np.min(img.shape[0:2])
im_size_max = np.max(img.shape[0:2])
resize = float(target_size) / float(im_size_min)
# prevent bigger axis from being more than max_size:
if np.round(resize * im_size_max) > max_size:
resize = float(max_size) / float(im_size_max)
img = cv2.resize(img, None, None, fx=resize, fy=resize, interpolation=cv2.INTER_LINEAR)
assert img.shape[0] <= max_size and img.shape[1] <= max_size
image_t = np.empty((max_size, max_size, 3), dtype=img.dtype)
image_t[:, :] = (104.0, 117.0, 123.0)
image_t[0:img.shape[0], 0:img.shape[1]] = img
img = image_t
scale = np.array([img.shape[1], img.shape[0], img.shape[1], img.shape[0]], dtype=img.dtype)
timers['forward_time'].start()
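        # read back the exported network outputs; the flat buffers reshape to
        # (1, num_anchors, 4) and (1, num_anchors, 2): prior_box on the padded
        # (5568, 1056) input yields 241164 anchors, on (2176, 2176) it yields 194208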
boxes_name = os.path.join("./result_Files", "widerface_test" + "_" + str(i) + "_0.bin")
boxes = np.fromfile(boxes_name, np.float32)
if cfg['val_origin_size']:
boxes = boxes.reshape(1, 241164, 4)
else:
boxes = boxes.reshape(1, 194208, 4)
confs_name = os.path.join("./result_Files", "widerface_test" + "_" + str(i) + "_1.bin")
confs = np.fromfile(confs_name, np.float32)
if cfg['val_origin_size']:
confs = confs.reshape(1, 241164, 2)
else:
confs = confs.reshape(1, 194208, 2)
timers['forward_time'].end()
timers['misc'].start()
detection.detect(boxes, confs, resize, scale, img_name, priors)
timers['misc'].end()
print('im_detect: {:d}/{:d} forward_pass_time: {:.4f}s misc: {:.4f}s'.format(i + 1, num_images,
timers['forward_time'].diff,
timers['misc'].diff))
print('Predict box done.')
print('Eval starting')
if cfg['val_save_result']:
# Save the predict result if you want.
predict_result_path = detection.write_result()
print('predict result path is {}'.format(predict_result_path))
detection.get_eval_result()
print(cfg['val_model'])
print('Eval done.')
if __name__ == '__main__':
val()


@ -0,0 +1,88 @@
# Copyright 2021 Huawei Technologies Co., Ltd
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ============================================================================
"""preprocess"""
from __future__ import print_function
import argparse
import os
import numpy as np
import cv2
from src.config import cfg_res50
cfg = cfg_res50
if __name__ == "__main__":
parser = argparse.ArgumentParser(description='Process file')
parser.add_argument('--val_dataset_folder', type=str, default='/home/dataset/widerface/val',
help='val dataset folder.')
args_opt = parser.parse_args()
# testing dataset
testset_folder = args_opt.val_dataset_folder
testset_label_path = os.path.join(args_opt.val_dataset_folder, "label.txt")
with open(testset_label_path, 'r') as f:
_test_dataset = f.readlines()
test_dataset = []
for im_path in _test_dataset:
if im_path.startswith('# '):
test_dataset.append(im_path[2:-1]) # delete '# ...\n'
# transform data to bin_file
print('Transform starting')
img_path = "./bin_file"
    os.makedirs(img_path, exist_ok=True)
h_max, w_max = 5568, 1056
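    # 5568 x 1056 mirrors what postprocess val() computes dynamically: the
    # maximum H/W over the WIDER val images, rounded up to multiples of 32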
for i, img_name in enumerate(test_dataset):
image_path = os.path.join(testset_folder, 'images', img_name)
img_raw = cv2.imread(image_path, cv2.IMREAD_COLOR)
img = np.float32(img_raw)
# testing scale
if cfg['val_origin_size']:
resize = 1
assert img.shape[0] <= h_max and img.shape[1] <= w_max
image_t = np.empty((h_max, w_max, 3), dtype=img.dtype)
image_t[:, :] = (104.0, 117.0, 123.0)
image_t[0:img.shape[0], 0:img.shape[1]] = img
img = image_t
        else:
            target_size, max_size = 1600, 2176  # same eval scales as in postprocess val()
            im_size_min = np.min(img.shape[0:2])
im_size_max = np.max(img.shape[0:2])
resize = float(target_size) / float(im_size_min)
# prevent bigger axis from being more than max_size:
if np.round(resize * im_size_max) > max_size:
resize = float(max_size) / float(im_size_max)
img = cv2.resize(img, None, None, fx=resize, fy=resize, interpolation=cv2.INTER_LINEAR)
assert img.shape[0] <= max_size and img.shape[1] <= max_size
image_t = np.empty((max_size, max_size, 3), dtype=img.dtype)
image_t[:, :] = (104.0, 117.0, 123.0)
image_t[0:img.shape[0], 0:img.shape[1]] = img
img = image_t
scale = np.array([img.shape[1], img.shape[0], img.shape[1], img.shape[0]], dtype=img.dtype)
img -= (104, 117, 123)
img = img.transpose(2, 0, 1)
img = np.expand_dims(img, 0) # [1, c, h, w] (1, 3, 2176, 2176)
# save bin file
file_name = "widerface_test" + "_" + str(i) + ".bin"
file_path = os.path.join(img_path, file_name)
img.tofile(file_path)
if i % 50 == 0:
print("Finish {} files".format(i))
print("=" * 20, "export bin files finished", "=" * 20)


@ -0,0 +1,3 @@
numpy
opencv-python
scipy


@ -0,0 +1,40 @@
#!/bin/bash
# Copyright 2021 Huawei Technologies Co., Ltd
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ============================================================================
ulimit -u unlimited
export DEVICE_NUM=8
export RANK_SIZE=8
RANK_TABLE_FILE=$(realpath $1)
export RANK_TABLE_FILE
echo "RANK_TABLE_FILE=${RANK_TABLE_FILE}"
export SERVER_ID=0
rank_start=$((DEVICE_NUM * SERVER_ID))
for((i=0; i<${DEVICE_NUM}; i++))
do
export DEVICE_ID=$i
export RANK_ID=$((rank_start + i))
rm -rf ./train_parallel$i
mkdir ./train_parallel$i
cp -r ./src ./train_parallel$i
cp ./train.py ./train_parallel$i
echo "start training for rank $RANK_ID, device $DEVICE_ID"
cd ./train_parallel$i ||exit
env > env.log
python train.py --backbone_name 'ResNet50' --device_id=$i > log 2>&1 &
cd ..
done


@ -0,0 +1,27 @@
#!/bin/bash
# Copyright 2021 Huawei Technologies Co., Ltd
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ============================================================================
echo "=============================================================================================================="
echo "Please run the script as: "
echo "bash run_distribute_gpu_train.sh DEVICE_NUM CUDA_VISIBLE_DEVICES"
echo "for example: bash run_distribute_gpu_train.sh 4 0,1,2,3"
echo "=============================================================================================================="
RANK_SIZE=$1
export CUDA_VISIBLE_DEVICES="$2"
mpirun --allow-run-as-root -n $RANK_SIZE --output-filename log_output --merge-stderr-to-stdout \
python train.py --backbone_name 'MobileNet025' > train.log 2>&1 &


@ -0,0 +1,129 @@
#!/bin/bash
# Copyright 2021 Huawei Technologies Co., Ltd
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ============================================================================
if [[ $# -lt 2 || $# -gt 3 ]]; then
echo "Usage: bash run_infer_310.sh [MINDIR_PATH] [DATASET_PATH] [DEVICE_ID]
DEVICE_ID is optional, it can be set by environment variable device_id, otherwise the value is zero"
exit 1
fi
get_real_path(){
if [ "${1:0:1}" == "/" ]; then
echo "$1"
else
echo "$(realpath -m $PWD/$1)"
fi
}
model=$(get_real_path $1)
dataset_path=$(get_real_path $2)
device_id=0
if [ $# == 3 ]; then
device_id=$3
fi
echo "mindir name: "$model
echo "dataset path: "$dataset_path
echo "device id: "$device_id
export ASCEND_HOME=/usr/local/Ascend/
if [ -d ${ASCEND_HOME}/ascend-toolkit ]; then
export PATH=$ASCEND_HOME/ascend-toolkit/latest/fwkacllib/ccec_compiler/bin:$ASCEND_HOME/ascend-toolkit/latest/atc/bin:$PATH
export LD_LIBRARY_PATH=/usr/local/lib:$ASCEND_HOME/ascend-toolkit/latest/atc/lib64:$ASCEND_HOME/ascend-toolkit/latest/fwkacllib/lib64:$ASCEND_HOME/driver/lib64:$ASCEND_HOME/add-ons:$LD_LIBRARY_PATH
export TBE_IMPL_PATH=$ASCEND_HOME/ascend-toolkit/latest/opp/op_impl/built-in/ai_core/tbe
export PYTHONPATH=${TBE_IMPL_PATH}:$ASCEND_HOME/ascend-toolkit/latest/fwkacllib/python/site-packages:$PYTHONPATH
export ASCEND_OPP_PATH=$ASCEND_HOME/ascend-toolkit/latest/opp
else
export PATH=$ASCEND_HOME/atc/ccec_compiler/bin:$ASCEND_HOME/atc/bin:$PATH
export LD_LIBRARY_PATH=/usr/local/lib:$ASCEND_HOME/atc/lib64:$ASCEND_HOME/acllib/lib64:$ASCEND_HOME/driver/lib64:$ASCEND_HOME/add-ons:$LD_LIBRARY_PATH
export PYTHONPATH=$ASCEND_HOME/atc/python/site-packages:$PYTHONPATH
export ASCEND_OPP_PATH=$ASCEND_HOME/opp
fi
export ASCEND_HOME=/usr/local/Ascend
export PATH=$ASCEND_HOME/fwkacllib/ccec_compiler/bin:$ASCEND_HOME/fwkacllib/bin:$ASCEND_HOME/toolkit/bin:$PATH
export LD_LIBRARY_PATH=/usr/local/lib/:/usr/local/fwkacllib/lib64:$ASCEND_HOME/driver/lib64:$ASCEND_HOME/add-ons:/usr/local/Ascend/toolkit/lib64:$LD_LIBRARY_PATH
export PYTHONPATH=$ASCEND_HOME/fwkacllib/python/site-packages
export PATH=/usr/local/python375/bin:$PATH
export NPU_HOST_LIB=/usr/local/Ascend/acllib/lib64/stub
export ASCEND_OPP_PATH=/usr/local/Ascend/opp
export ASCEND_AICPU_PATH=/usr/local/Ascend
export LD_LIBRARY_PATH=/usr/local/lib64/:$LD_LIBRARY_PATH
function preprocess_data()
{
if [ -d preprocess_Result ]; then
rm -rf ./preprocess_Result
fi
mkdir preprocess_Result
python3.7 ../preprocess.py --val_dataset_folder=$dataset_path
}
function compile_app()
{
cd ../ascend310_infer/ || exit
bash build.sh &> build.log
}
function infer()
{
cd - || exit
if [ -d result_Files ]; then
rm -rf ./result_Files
fi
if [ -d time_Result ]; then
rm -rf ./time_Result
fi
mkdir result_Files
mkdir time_Result
../ascend310_infer/out/main --mindir_path=$model --input0_path=./bin_file --device_id=$device_id &> infer.log
}
function cal_acc()
{
python3.7 ../postprocess.py --device_id=$device_id &> acc.log
}
preprocess_data
if [ $? -ne 0 ]; then
echo "preprocess dataset failed"
exit 1
fi
compile_app
if [ $? -ne 0 ]; then
echo "compile app code failed"
exit 1
fi
infer
if [ $? -ne 0 ]; then
echo " execute inference failed"
exit 1
fi
cal_acc
if [ $? -ne 0 ]; then
echo "calculate accuracy failed"
exit 1
fi


@ -0,0 +1,22 @@
#!/bin/bash
# Copyright 2021 Huawei Technologies Co., Ltd
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ============================================================================
echo "=============================================================================================================="
echo "Please run the script as: "
echo "for example: bash run_standalone_eval_ascend.sh [CKPT_FILE]"
echo "=============================================================================================================="
python eval.py --backbone_name 'ResNet50' --val_model $1 > ./eval.log 2>&1 &


@ -0,0 +1,24 @@
#!/bin/bash
# Copyright 2021 Huawei Technologies Co., Ltd
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ============================================================================
echo "=============================================================================================================="
echo "Please run the script as: "
echo "bash run_standalone_gpu_eval.sh CUDA_VISIBLE_DEVICES"
echo "for example: bash run_standalone_gpu_eval.sh 0 [CKPT_FILE]"
echo "=============================================================================================================="
export CUDA_VISIBLE_DEVICES="$1"
python eval.py --backbone_name 'MobileNet025' --val_model $2 > eval.log 2>&1 &


@ -0,0 +1,19 @@
#!/bin/bash
# Copyright 2021 Huawei Technologies Co., Ltd
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ============================================================================
echo "Usage: bash ./scripts/run_standalone_train_ascend.sh"
python train.py --backbone_name 'ResNet50' > train.log 2>&1 &


@ -0,0 +1,315 @@
# Copyright 2021 Huawei Technologies Co., Ltd
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ============================================================================
"""Augmentation."""
import random
import copy
import cv2
import numpy as np
def _rand(a=0., b=1.):
return np.random.rand() * (b - a) + a
def bbox_iof(bbox_a, bbox_b, offset=0):
"""bbox_iof"""
if bbox_a.shape[1] < 4 or bbox_b.shape[1] < 4:
raise IndexError("Bounding boxes axis 1 must have at least length 4")
tl = np.maximum(bbox_a[:, None, 0:2], bbox_b[:, 0:2])
br = np.minimum(bbox_a[:, None, 2:4], bbox_b[:, 2:4])
area_i = np.prod(br - tl + offset, axis=2) * (tl < br).all(axis=2)
area_a = np.prod(bbox_a[:, 2:4] - bbox_a[:, :2] + offset, axis=1)
return area_i / np.maximum(area_a[:, None], 1)
def _is_iof_satisfied_constraint(box, crop_box):
iof = bbox_iof(box, crop_box)
satisfied = np.any((iof >= 1.0))
return satisfied
def _choose_candidate(max_trial, image_w, image_h, boxes):
"""_choose_candidate"""
# add default candidate
candidates = [(0, 0, image_w, image_h)]
for _ in range(max_trial):
# box_data should have at least one box
if _rand() > 0.2:
scale = _rand(0.3, 1.0)
else:
scale = 1.0
nh = int(scale * min(image_w, image_h))
nw = nh
dx = int(_rand(0, image_w - nw))
dy = int(_rand(0, image_h - nh))
if boxes.shape[0] > 0:
crop_box = np.array((dx, dy, dx + nw, dy + nh))
if not _is_iof_satisfied_constraint(boxes, crop_box[np.newaxis]):
continue
else:
candidates.append((dx, dy, nw, nh))
else:
raise Exception("!!! annotation box is less than 1")
if len(candidates) >= 3:
break
return candidates
def _correct_bbox_by_candidates(candidates, input_w, input_h, flip, boxes, labels, landms, allow_outside_center):
"""Calculate correct boxes."""
while candidates:
if len(candidates) > 1:
# ignore default candidate which do not crop
candidate = candidates.pop(np.random.randint(1, len(candidates)))
else:
candidate = candidates.pop(np.random.randint(0, len(candidates)))
dx, dy, nw, nh = candidate
boxes_t = copy.deepcopy(boxes)
landms_t = copy.deepcopy(landms)
labels_t = copy.deepcopy(labels)
landms_t = landms_t.reshape([-1, 5, 2])
if nw == nh:
scale = float(input_w) / float(nw)
else:
scale = float(input_w) / float(max(nh, nw))
boxes_t[:, [0, 2]] = (boxes_t[:, [0, 2]] - dx) * scale
boxes_t[:, [1, 3]] = (boxes_t[:, [1, 3]] - dy) * scale
landms_t[:, :, 0] = (landms_t[:, :, 0] - dx) * scale
landms_t[:, :, 1] = (landms_t[:, :, 1] - dy) * scale
if flip:
boxes_t[:, [0, 2]] = input_w - boxes_t[:, [2, 0]]
landms_t[:, :, 0] = input_w - landms_t[:, :, 0]
# flip landms
landms_t_1 = landms_t[:, 1, :].copy()
landms_t[:, 1, :] = landms_t[:, 0, :]
landms_t[:, 0, :] = landms_t_1
landms_t_4 = landms_t[:, 4, :].copy()
landms_t[:, 4, :] = landms_t[:, 3, :]
landms_t[:, 3, :] = landms_t_4
        if not allow_outside_center:
mask1 = np.logical_and((boxes_t[:, 0] + boxes_t[:, 2])/2. >= 0., (boxes_t[:, 1] + boxes_t[:, 3])/2. >= 0.)
boxes_t = boxes_t[mask1]
landms_t = landms_t[mask1]
labels_t = labels_t[mask1]
mask2 = np.logical_and((boxes_t[:, 0] + boxes_t[:, 2]) / 2. <= input_w,
(boxes_t[:, 1] + boxes_t[:, 3]) / 2. <= input_h)
boxes_t = boxes_t[mask2]
landms_t = landms_t[mask2]
labels_t = labels_t[mask2]
# recorrect x, y for case x,y < 0 reset to zero, after dx and dy, some box can smaller than zero
boxes_t[:, 0:2][boxes_t[:, 0:2] < 0] = 0
# recorrect w,h not higher than input size
boxes_t[:, 2][boxes_t[:, 2] > input_w] = input_w
boxes_t[:, 3][boxes_t[:, 3] > input_h] = input_h
box_w = boxes_t[:, 2] - boxes_t[:, 0]
box_h = boxes_t[:, 3] - boxes_t[:, 1]
# discard invalid box: w or h smaller than 1 pixel
mask3 = np.logical_and(box_w > 1, box_h > 1)
boxes_t = boxes_t[mask3]
landms_t = landms_t[mask3]
labels_t = labels_t[mask3]
# normal
boxes_t[:, [0, 2]] /= input_w
boxes_t[:, [1, 3]] /= input_h
landms_t[:, :, 0] /= input_w
landms_t[:, :, 1] /= input_h
landms_t = landms_t.reshape([-1, 10])
labels_t = np.expand_dims(labels_t, 1)
targets_t = np.hstack((boxes_t, landms_t, labels_t))
if boxes_t.shape[0] > 0:
return targets_t, candidate
    raise Exception('no crop candidate produced a valid corrected bbox')
def get_interp_method(interp, sizes=()):
"""Get the interpolation method for resize functions.
The major purpose of this function is to wrap a random interp method selection
and a auto-estimation method.
Parameters
----------
interp : int
interpolation method for all resizing operations
Possible values:
0: Nearest Neighbors Interpolation.
1: Bilinear interpolation.
2: Bicubic interpolation over 4x4 pixel neighborhood.
        3: Nearest Neighbors. [Originally this should be Area-based; as
           Area-based is unavailable, Nearest Neighbors is used instead.
           Area-based (resampling using pixel area relation) may be preferred
           for image decimation, as it gives moire-free results, but when the
           image is zoomed it behaves like Nearest Neighbors. Used by default.]
4: Lanczos interpolation over 8x8 pixel neighborhood.
9: Cubic for enlarge, area for shrink, bilinear for others
10: Random select from interpolation method mentioned above.
Note:
When shrinking an image, it will generally look best with AREA-based
interpolation, whereas, when enlarging an image, it will generally look best
with Bicubic (slow) or Bilinear (faster but still looks OK).
More details can be found in the documentation of OpenCV, please refer to
http://docs.opencv.org/master/da/d54/group__imgproc__transform.html.
sizes : tuple of int
(old_height, old_width, new_height, new_width), if None provided, auto(9)
will return Area(2) anyway.
Returns
-------
int
interp method from 0 to 4
"""
if interp == 9:
if sizes:
assert len(sizes) == 4
oh, ow, nh, nw = sizes
if nh > oh and nw > ow:
return 2
if nh < oh and nw < ow:
return 0
return 1
return 2
if interp == 10:
return random.randint(0, 4)
if interp not in (0, 1, 2, 3, 4):
raise ValueError('Unknown interp method %d' % interp)
return interp
def cv_image_reshape(interp):
"""Reshape pil image."""
reshape_type = {
0: cv2.INTER_LINEAR,
1: cv2.INTER_CUBIC,
2: cv2.INTER_AREA,
3: cv2.INTER_NEAREST,
4: cv2.INTER_LANCZOS4,
}
return reshape_type[interp]
def color_convert(image, a=1, b=0):
c_image = image.astype(float) * a + b
c_image[c_image < 0] = 0
c_image[c_image > 255] = 255
image[:] = c_image
def color_distortion(image):
"""color_distortion"""
image = copy.deepcopy(image)
if _rand() > 0.5:
if _rand() > 0.5:
color_convert(image, b=_rand(-32, 32))
if _rand() > 0.5:
color_convert(image, a=_rand(0.5, 1.5))
image = cv2.cvtColor(image, cv2.COLOR_BGR2HSV)
if _rand() > 0.5:
color_convert(image[:, :, 1], a=_rand(0.5, 1.5))
if _rand() > 0.5:
h_img = image[:, :, 0].astype(int) + random.randint(-18, 18)
h_img %= 180
image[:, :, 0] = h_img
image = cv2.cvtColor(image, cv2.COLOR_HSV2BGR)
else:
if _rand() > 0.5:
color_convert(image, b=random.uniform(-32, 32))
image = cv2.cvtColor(image, cv2.COLOR_BGR2HSV)
if _rand() > 0.5:
color_convert(image[:, :, 1], a=random.uniform(0.5, 1.5))
if _rand() > 0.5:
tmp = image[:, :, 0].astype(int) + random.randint(-18, 18)
tmp %= 180
image[:, :, 0] = tmp
image = cv2.cvtColor(image, cv2.COLOR_HSV2BGR)
if _rand() > 0.5:
color_convert(image, a=random.uniform(0.5, 1.5))
return image
class preproc():
"""preproc"""
def __init__(self, image_dim):
self.image_input_size = image_dim
def __call__(self, image, target):
assert target.shape[0] > 0, "target without ground truth."
_target = copy.deepcopy(target)
boxes = _target[:, :4]
landms = _target[:, 4:-1]
labels = _target[:, -1]
aug_image, aug_target = self._data_aug(image, boxes, labels, landms, self.image_input_size)
return aug_image, aug_target
def _data_aug(self, image, boxes, labels, landms, image_input_size, max_trial=250):
"""_data_aug"""
image_h, image_w, _ = image.shape
input_h, input_w = image_input_size, image_input_size
flip = _rand() < .5
candidates = _choose_candidate(max_trial=max_trial,
image_w=image_w,
image_h=image_h,
boxes=boxes)
targets, candidate = _correct_bbox_by_candidates(candidates=candidates,
input_w=input_w,
input_h=input_h,
flip=flip,
boxes=boxes,
labels=labels,
landms=landms,
allow_outside_center=False)
# crop image
dx, dy, nw, nh = candidate
image = image[dy:(dy + nh), dx:(dx + nw)]
if nw != nh:
assert nw == image_w and nh == image_h
# pad ori image to square
l = max(nw, nh)
t_image = np.empty((l, l, 3), dtype=image.dtype)
t_image[:, :] = (104, 117, 123)
t_image[:nh, :nw] = image
image = t_image
interp = get_interp_method(interp=10)
image = cv2.resize(image, (input_w, input_h), interpolation=cv_image_reshape(interp))
if flip:
image = image[:, ::-1]
image = image.astype(np.float32)
image -= (104, 117, 123)
image = image.transpose(2, 0, 1)
return image, targets


@ -0,0 +1,134 @@
# Copyright 2021 Huawei Technologies Co., Ltd
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ============================================================================
"""Config for train and eval."""
cfg_res50 = {
'name': 'ResNet50',
'device_target': "Ascend",
'device_id': 0,
'variance': [0.1, 0.2],
'clip': False,
'loc_weight': 2.0,
'class_weight': 1.0,
'landm_weight': 1.0,
'batch_size': 8,
'num_workers': 16,
'num_anchor': 29126,
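    # 29126 anchors = 2 * (ceil(840/8)^2 + ceil(840/16)^2 + ceil(840/32)^2)
    #               = 2 * (105^2 + 53^2 + 27^2), two priors per feature-map cell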
'nnpu': 8,
'image_size': 840,
'in_channel': 256,
'out_channel': 256,
'match_thresh': 0.35,
# opt
'optim': 'sgd', # 'sgd' or 'momentum'
'momentum': 0.9,
'weight_decay': 1e-4,
'loss_scale': 1,
# seed
'seed': 1,
# lr
'epoch': 60,
'T_max': 50, # cosine_annealing
'eta_min': 0.0, # cosine_annealing
'decay1': 20,
'decay2': 40,
'lr_type': 'dynamic_lr', # 'dynamic_lr' or cosine_annealing
'initial_lr': 0.04,
'warmup_epoch': -1, # dynamic_lr: -1, cosine_annealing:0
'gamma': 0.1,
# checkpoint
'ckpt_path': './checkpoint/',
'keep_checkpoint_max': 8,
'resume_net': None,
# dataset
'training_dataset': '../data/widerface/train/label.txt',
'pretrain': True,
'pretrain_path': '../data/resnet-90_625.ckpt',
# val
'val_model': './train_parallel3/checkpoint/ckpt_3/RetinaFace-56_201.ckpt',
'val_dataset_folder': './data/widerface/val/',
'val_origin_size': True,
'val_confidence_threshold': 0.02,
'val_nms_threshold': 0.4,
'val_iou_threshold': 0.5,
'val_save_result': False,
'val_predict_save_folder': './widerface_result',
'val_gt_dir': './data/ground_truth/',
# infer
'infer_dataset_folder': '/home/dataset/widerface/val/',
'infer_gt_dir': '/home/dataset/widerface/ground_truth/',
}
cfg_mobile025 = {
'name': 'MobileNet025',
'variance': [0.1, 0.2],
'clip': False,
'loc_weight': 2.0,
'class_weight': 1.0,
'landm_weight': 1.0,
'batch_size': 8,
'num_workers': 12,
'num_anchor': 16800,
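    # 16800 anchors = 2 * ((640/8)^2 + (640/16)^2 + (640/32)^2)
    #               = 2 * (80^2 + 40^2 + 20^2)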
'ngpu': 2,
'image_size': 640,
'in_channel': 32,
'out_channel': 64,
'match_thresh': 0.35,
# opt
'optim': 'sgd',
'momentum': 0.9,
'weight_decay': 5e-4,
# seed
'seed': 1,
# lr
'epoch': 120,
'decay1': 70,
'decay2': 90,
'lr_type': 'dynamic_lr',
'initial_lr': 0.02,
'warmup_epoch': 5,
'gamma': 0.1,
# checkpoint
'ckpt_path': './checkpoint/',
'save_checkpoint_steps': 2000,
'keep_checkpoint_max': 3,
'resume_net': None,
# dataset
'training_dataset': '../data/widerface/train/label.txt',
'pretrain': False,
'pretrain_path': '../data/mobilenetv1-90_5004.ckpt',
# val
'val_model': './checkpoint/ckpt_0/RetinaFace-117_804.ckpt',
'val_dataset_folder': './data/widerface/val/',
'val_origin_size': False,
'val_confidence_threshold': 0.02,
'val_nms_threshold': 0.4,
'val_iou_threshold': 0.5,
'val_save_result': False,
'val_predict_save_folder': './widerface_result',
'val_gt_dir': './data/ground_truth/',
}


@ -0,0 +1,171 @@
# Copyright 2021 Huawei Technologies Co., Ltd
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ============================================================================
"""Dataset for train and eval."""
import os
import copy
import cv2
import numpy as np
import mindspore.dataset as de
from mindspore.communication.management import init, get_rank, get_group_size
from .augmemtation import preproc
from .utils import bbox_encode
class WiderFace():
"""WiderFace"""
def __init__(self, label_path):
self.images_list = []
self.labels_list = []
f = open(label_path, 'r')
lines = f.readlines()
First = True
labels = []
for line in lines:
line = line.rstrip()
if line.startswith('#'):
if First is True:
First = False
else:
c_labels = copy.deepcopy(labels)
self.labels_list.append(c_labels)
labels.clear()
# remove '# '
path = line[2:]
path = label_path.replace('label.txt', 'images/') + path
                assert os.path.exists(path), 'image path does not exist.'
self.images_list.append(path)
else:
line = line.split(' ')
label = [float(x) for x in line]
labels.append(label)
# add the last label
self.labels_list.append(labels)
# del bbox which width is zero or height is zero
for i in range(len(self.labels_list) - 1, -1, -1):
labels = self.labels_list[i]
for j in range(len(labels) - 1, -1, -1):
label = labels[j]
if label[2] <= 0 or label[3] <= 0:
labels.pop(j)
if not labels:
self.images_list.pop(i)
self.labels_list.pop(i)
else:
self.labels_list[i] = labels
def __len__(self):
return len(self.images_list)
def __getitem__(self, item):
return self.images_list[item], self.labels_list[item]
def read_dataset(img_path, annotation):
"""read_dataset"""
cv2.setNumThreads(2)
if isinstance(img_path, str):
img = cv2.imread(img_path)
else:
img = cv2.imread(img_path.tostring().decode("utf-8"))
labels = annotation
anns = np.zeros((0, 15))
if labels.shape[0] <= 0:
return anns
for _, label in enumerate(labels):
ann = np.zeros((1, 15))
# get bbox
ann[0, 0:2] = label[0:2] # x1, y1
ann[0, 2:4] = label[0:2] + label[2:4] # x2, y2
# get landmarks
ann[0, 4:14] = label[[4, 5, 7, 8, 10, 11, 13, 14, 16, 17]]
# set flag
if (ann[0, 4] < 0):
ann[0, 14] = -1
else:
ann[0, 14] = 1
anns = np.append(anns, ann, axis=0)
target = np.array(anns).astype(np.float32)
return img, target
def create_dataset(data_dir, cfg, batch_size=32, repeat_num=1, shuffle=True, multiprocessing=True, num_worker=16):
"""create_dataset"""
dataset = WiderFace(data_dir)
if cfg['name'] == 'ResNet50':
device_num, rank_id = _get_rank_info()
elif cfg['name'] == 'MobileNet025':
init("nccl")
rank_id = get_rank()
device_num = get_group_size()
if device_num == 1:
de_dataset = de.GeneratorDataset(dataset, ["image", "annotation"],
shuffle=shuffle,
num_parallel_workers=num_worker)
else:
de_dataset = de.GeneratorDataset(dataset, ["image", "annotation"],
shuffle=shuffle,
num_parallel_workers=num_worker,
num_shards=device_num,
shard_id=rank_id)
aug = preproc(cfg['image_size'])
encode = bbox_encode(cfg)
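    # per-sample pipeline: decode -> crop/flip/color augment -> encode boxes
    # and landmarks against the priors; map() fans the result into four columns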
def union_data(image, annot):
i, a = read_dataset(image, annot)
i, a = aug(i, a)
out = encode(i, a)
return out
de_dataset = de_dataset.map(input_columns=["image", "annotation"],
output_columns=["image", "truths", "conf", "landm"],
column_order=["image", "truths", "conf", "landm"],
operations=union_data,
python_multiprocessing=multiprocessing,
num_parallel_workers=num_worker)
de_dataset = de_dataset.batch(batch_size, drop_remainder=True)
de_dataset = de_dataset.repeat(repeat_num)
return de_dataset
def _get_rank_info():
"""
get rank size and rank id
"""
rank_size = int(os.environ.get("RANK_SIZE", 1))
if rank_size > 1:
rank_size = get_group_size()
rank_id = get_rank()
else:
rank_size = rank_id = None
return rank_size, rank_id


@ -0,0 +1,121 @@
# Copyright 2021 Huawei Technologies Co., Ltd
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ============================================================================
"""Loss."""
import mindspore.common.dtype as mstype
import mindspore.nn as nn
from mindspore.ops import operations as P
from mindspore.ops import functional as F
from mindspore import Tensor
class SoftmaxCrossEntropyWithLogits(nn.Cell):
"""SoftmaxCrossEntropyWithLogits"""
def __init__(self):
super(SoftmaxCrossEntropyWithLogits, self).__init__()
self.log_softmax = P.LogSoftmax()
self.neg = P.Neg()
self.one_hot = P.OneHot()
self.on_value = Tensor(1.0, mstype.float32)
self.off_value = Tensor(0.0, mstype.float32)
self.reduce_sum = P.ReduceSum()
def construct(self, logits, labels):
"""construct"""
prob = self.log_softmax(logits)
labels = self.one_hot(labels, F.shape(logits)[-1], self.on_value, self.off_value)
return self.neg(self.reduce_sum(prob * labels, 1))
class MultiBoxLoss(nn.Cell):
"""MultiBoxLoss"""
def __init__(self, num_classes, num_boxes, neg_pre_positive, batch_size):
super(MultiBoxLoss, self).__init__()
self.num_classes = num_classes
self.num_boxes = num_boxes
self.neg_pre_positive = neg_pre_positive
self.notequal = P.NotEqual()
self.less = P.Less()
self.tile = P.Tile()
self.reduce_sum = P.ReduceSum()
self.reduce_mean = P.ReduceMean()
self.expand_dims = P.ExpandDims()
self.smooth_l1_loss = P.SmoothL1Loss()
self.cross_entropy = SoftmaxCrossEntropyWithLogits()
self.maximum = P.Maximum()
self.minimum = P.Minimum()
self.sort_descend = P.TopK(True)
self.sort = P.TopK(True)
self.max = P.ReduceMax()
self.log = P.Log()
self.exp = P.Exp()
self.concat = P.Concat(axis=1)
self.reduce_sum2 = P.ReduceSum(keep_dims=True)
self.mul = P.Mul()
self.reduce_sum_new = P.ReduceSum(keep_dims=True)
def construct(self, loc_data, loc_t, conf_data, conf_t, landm_data, landm_t):
"""construct"""
# landm loss
mask_pos1 = F.cast(self.less(0.0, F.cast(conf_t, mstype.float32)), mstype.float32)
N1 = self.maximum(self.reduce_sum(mask_pos1), 1)
mask_pos_idx1 = self.tile(self.expand_dims(mask_pos1, -1), (1, 1, 10))
loss_landm = self.reduce_sum(self.smooth_l1_loss(landm_data, landm_t) * mask_pos_idx1)
loss_landm = loss_landm / N1
# Localization Loss
mask_pos = F.cast(self.notequal(0, conf_t), mstype.float32)
conf_t = F.cast(mask_pos, mstype.int32)
N = self.maximum(self.reduce_sum(mask_pos), 1)
mask_pos_idx = self.tile(self.expand_dims(mask_pos, -1), (1, 1, 4))
loss_l = self.reduce_sum(self.smooth_l1_loss(loc_data, loc_t) * mask_pos_idx)
loss_l = loss_l / N
# Conf Loss
conf_t_shape = F.shape(conf_t)
conf_t = F.reshape(conf_t, (-1,))
indices = self.concat((1 - F.reshape(conf_t, (-1, 1)), F.reshape(conf_t, (-1, 1))))
batch_conf = F.reshape(conf_data, (-1, self.num_classes))
x_max = self.max(batch_conf)
loss_c = self.log(self.reduce_sum2(self.exp(batch_conf - x_max), 1)) + x_max
mul_tensor = self.mul(indices, batch_conf)
loss_c = loss_c - self.reduce_sum_new(mul_tensor, 1)
loss_c = F.reshape(loss_c, conf_t_shape)
# hard example mining
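        # rank negatives by classification loss via a double TopK: the first
        # sorts the losses, the second recovers each anchor's rank in that
        # ordering; anchors ranked within neg_pre_positive * num_pos are kept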
num_matched_boxes = F.reshape(self.reduce_sum(mask_pos, 1), (-1,))
neg_masked_cross_entropy = F.cast(loss_c * (1 - mask_pos), mstype.float32)
_, loss_idx = self.sort_descend(neg_masked_cross_entropy, self.num_boxes)
_, relative_position = self.sort(F.cast(loss_idx, mstype.float32), self.num_boxes)
relative_position = F.cast(relative_position, mstype.float32)
relative_position = relative_position[:, ::-1]
relative_position = F.cast(relative_position, mstype.int32)
num_neg_boxes = self.minimum(num_matched_boxes * self.neg_pre_positive, self.num_boxes - 1)
tile_num_neg_boxes = self.tile(self.expand_dims(num_neg_boxes, -1), (1, self.num_boxes))
top_k_neg_mask = F.cast(self.less(relative_position, tile_num_neg_boxes), mstype.float32)
cross_entropy = self.cross_entropy(batch_conf, conf_t)
cross_entropy = F.reshape(cross_entropy, conf_t_shape)
loss_c = self.reduce_sum(cross_entropy * self.minimum(mask_pos + top_k_neg_mask, 1))
loss_c = loss_c / N
return loss_l, loss_c, loss_landm


@ -0,0 +1,81 @@
# Copyright 2021 Huawei Technologies Co., Ltd
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ============================================================================
"""learning rate schedule."""
import math
import numpy as np
def warmup_cosine_annealing_lr(lr5, steps_per_epoch, warmup_epochs, max_epoch, T_max, eta_min=0):
""" warmup cosine annealing lr"""
base_lr = lr5
warmup_init_lr = 0
total_steps = int(max_epoch * steps_per_epoch)
warmup_steps = int(warmup_epochs * steps_per_epoch)
lr_each_step = []
for i in range(total_steps):
last_epoch = i // steps_per_epoch
if i < warmup_steps:
lr5 = linear_warmup_lr(i + 1, warmup_steps, base_lr, warmup_init_lr)
else:
lr5 = eta_min + (base_lr - eta_min) * (1. + math.cos(math.pi * last_epoch / T_max)) / 2
lr_each_step.append(lr5)
return np.array(lr_each_step).astype(np.float32)
def _linear_warmup_learning_rate(current_step, warmup_steps, base_lr, init_lr):
lr_inc = (float(base_lr) - float(init_lr)) / float(warmup_steps)
learning_rate = float(init_lr) + lr_inc * current_step
return learning_rate
def _a_cosine_learning_rate(current_step, base_lr, warmup_steps, decay_steps):
base = float(current_step - warmup_steps) / float(decay_steps)
learning_rate = (1 + math.cos(base * math.pi)) / 2 * base_lr
return learning_rate
def _dynamic_lr(base_lr, total_steps, warmup_steps, warmup_ratio=1 / 3):
lr = []
for i in range(total_steps):
if i < warmup_steps:
lr.append(_linear_warmup_learning_rate(i, warmup_steps, base_lr, base_lr * warmup_ratio))
else:
lr.append(_a_cosine_learning_rate(i, base_lr, warmup_steps, total_steps))
return lr
def adjust_learning_rate(initial_lr, gamma, stepvalues, steps_pre_epoch, total_epochs, warmup_epoch=5, lr_type1=None):
"""adjust_learning_rate"""
if lr_type1 == 'dynamic_lr':
return _dynamic_lr(initial_lr, total_epochs * steps_pre_epoch, warmup_epoch * steps_pre_epoch,
warmup_ratio=1 / 3)
lr_each_step = []
for epoch in range(1, total_epochs + 1):
for _ in range(steps_pre_epoch):
if epoch <= warmup_epoch:
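                # geometric warmup: 1.5849 ~= 10^(1/5), so the lr ramps from
                # 0.1 * initial_lr towards initial_lr over the warmup epochs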
lr = 0.1 * initial_lr * (1.5849 ** (epoch - 1))
else:
if stepvalues[0] <= epoch <= stepvalues[1]:
lr = initial_lr * (gamma ** (1))
elif epoch > stepvalues[1]:
lr = initial_lr * (gamma ** (2))
else:
lr = initial_lr
lr_each_step.append(lr)
return lr_each_step


@ -0,0 +1,610 @@
# Copyright 2021 Huawei Technologies Co., Ltd
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ============================================================================
"""Network."""
import math
from functools import reduce
import numpy as np
import mindspore
import mindspore.nn as nn
from mindspore.ops import functional as F
from mindspore.ops import operations as P
from mindspore.ops import composite as C
from mindspore import context, Tensor
from mindspore.parallel._auto_parallel_context import auto_parallel_context
from mindspore.communication.management import get_group_size
# ResNet
def _weight_variable(shape, factor=0.01):
init_value = np.random.randn(*shape).astype(np.float32) * factor
return Tensor(init_value)
def _conv3x3(in_channel, out_channel, stride=1):
weight_shape = (out_channel, in_channel, 3, 3)
weight = _weight_variable(weight_shape)
return nn.Conv2d(in_channel, out_channel,
kernel_size=3, stride=stride, padding=1, pad_mode='pad', weight_init=weight)
def _conv1x1(in_channel, out_channel, stride=1):
weight_shape = (out_channel, in_channel, 1, 1)
weight = _weight_variable(weight_shape)
return nn.Conv2d(in_channel, out_channel,
kernel_size=1, stride=stride, padding=0, pad_mode='pad', weight_init=weight)
def _conv7x7(in_channel, out_channel, stride=1):
weight_shape = (out_channel, in_channel, 7, 7)
weight = _weight_variable(weight_shape)
return nn.Conv2d(in_channel, out_channel,
kernel_size=7, stride=stride, padding=3, pad_mode='pad', weight_init=weight)
def _bn(channel):
return nn.BatchNorm2d(channel)
def _bn_last(channel):
return nn.BatchNorm2d(channel)
def _fc(in_channel, out_channel):
weight_shape = (out_channel, in_channel)
weight = _weight_variable(weight_shape)
return nn.Dense(in_channel, out_channel, has_bias=True, weight_init=weight, bias_init=0)
class ResidualBlock(nn.Cell):
"""ResidualBlock"""
expansion = 4
def __init__(self,
in_channel,
out_channel,
stride=1):
super(ResidualBlock, self).__init__()
channel = out_channel // self.expansion
self.conv1 = _conv1x1(in_channel, channel, stride=1)
self.bn1 = _bn(channel)
self.conv2 = _conv3x3(channel, channel, stride=stride)
self.bn2 = _bn(channel)
self.conv3 = _conv1x1(channel, out_channel, stride=1)
self.bn3 = _bn_last(out_channel)
self.relu = nn.ReLU()
self.down_sample = False
if stride != 1 or in_channel != out_channel:
self.down_sample = True
self.down_sample_layer = None
if self.down_sample:
self.down_sample_layer = nn.SequentialCell([_conv1x1(in_channel, out_channel, stride),
_bn(out_channel)])
self.add = P.Add()
def construct(self, x):
"""construct"""
identity = x
out = self.conv1(x)
out = self.bn1(out)
out = self.relu(out)
out = self.conv2(out)
out = self.bn2(out)
out = self.relu(out)
out = self.conv3(out)
out = self.bn3(out)
if self.down_sample:
identity = self.down_sample_layer(identity)
out = self.add(out, identity)
out = self.relu(out)
return out
class ResNet(nn.Cell):
"""ResNet"""
def __init__(self,
block,
layer_nums,
in_channels,
out_channels,
strides,
num_classes):
super(ResNet, self).__init__()
if not len(layer_nums) == len(in_channels) == len(out_channels) == 4:
raise ValueError("the length of layer_num, in_channels, out_channels list must be 4!")
self.conv1 = _conv7x7(3, 64, stride=2)
self.bn1 = _bn(64)
self.relu = P.ReLU()
self.pad = P.Pad(((0, 0), (0, 0), (1, 0), (1, 0)))
self.maxpool = nn.MaxPool2d(kernel_size=3, stride=2, pad_mode="valid")
self.layer1 = self._make_layer(block,
layer_nums[0],
in_channel=in_channels[0],
out_channel=out_channels[0],
stride=strides[0])
self.layer2 = self._make_layer(block,
layer_nums[1],
in_channel=in_channels[1],
out_channel=out_channels[1],
stride=strides[1])
self.layer3 = self._make_layer(block,
layer_nums[2],
in_channel=in_channels[2],
out_channel=out_channels[2],
stride=strides[2])
self.layer4 = self._make_layer(block,
layer_nums[3],
in_channel=in_channels[3],
out_channel=out_channels[3],
stride=strides[3])
self.mean = P.ReduceMean(keep_dims=True)
self.flatten = nn.Flatten()
self.end_point = _fc(out_channels[3], num_classes)
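        # mean/flatten/end_point form the ImageNet classification head; as a
        # detection backbone, construct() returns the c3/c4/c5 feature maps instead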
def _make_layer(self, block, layer_num, in_channel, out_channel, stride):
"""_make_layer"""
layers = []
resnet_block = block(in_channel, out_channel, stride=stride)
layers.append(resnet_block)
for _ in range(1, layer_num):
resnet_block = block(out_channel, out_channel, stride=1)
layers.append(resnet_block)
return nn.SequentialCell(layers)
def construct(self, x):
"""construct"""
x = self.conv1(x)
x = self.bn1(x)
x = self.relu(x)
x = self.pad(x)
c1 = self.maxpool(x)
c2 = self.layer1(c1)
c3 = self.layer2(c2)
c4 = self.layer3(c3)
c5 = self.layer4(c4)
out = self.mean(c5, (2, 3))
out = self.flatten(out)
out = self.end_point(out)
return c3, c4, c5
def resnet50(class_num=10):
return ResNet(ResidualBlock,
[3, 4, 6, 3],
[64, 256, 512, 1024],
[256, 512, 1024, 2048],
[1, 2, 2, 2],
class_num)
# MobileNet0.25
def conv_bn(inp, oup, stride=1, leaky=0):
return nn.SequentialCell([
nn.Conv2d(in_channels=inp, out_channels=oup, kernel_size=3, stride=stride,
pad_mode='pad', padding=1, has_bias=False),
nn.BatchNorm2d(num_features=oup, momentum=0.9),
nn.LeakyReLU(alpha=leaky) # ms official: nn.get_activation('relu6')
])
def conv_dw(inp, oup, stride, leaky=0.1):
return nn.SequentialCell([
nn.Conv2d(in_channels=inp, out_channels=inp, kernel_size=3, stride=stride,
pad_mode='pad', padding=1, group=inp, has_bias=False),
nn.BatchNorm2d(num_features=inp, momentum=0.9),
nn.LeakyReLU(alpha=leaky), # ms official: nn.get_activation('relu6')
nn.Conv2d(in_channels=inp, out_channels=oup, kernel_size=1, stride=1,
pad_mode='pad', padding=0, has_bias=False),
nn.BatchNorm2d(num_features=oup, momentum=0.9),
nn.LeakyReLU(alpha=leaky), # ms official: nn.get_activation('relu6')
])
class MobileNetV1(nn.Cell):
"""MobileNetV1"""
def __init__(self, num_classes):
super(MobileNetV1, self).__init__()
self.stage1 = nn.SequentialCell([
conv_bn(3, 8, 2, leaky=0.1), # 3
conv_dw(8, 16, 1), # 7
conv_dw(16, 32, 2), # 11
conv_dw(32, 32, 1), # 19
conv_dw(32, 64, 2), # 27
conv_dw(64, 64, 1), # 43
])
self.stage2 = nn.SequentialCell([
conv_dw(64, 128, 2), # 43 + 16 = 59
conv_dw(128, 128, 1), # 59 + 32 = 91
conv_dw(128, 128, 1), # 91 + 32 = 123
conv_dw(128, 128, 1), # 123 + 32 = 155
conv_dw(128, 128, 1), # 155 + 32 = 187
conv_dw(128, 128, 1), # 187 + 32 = 219
])
self.stage3 = nn.SequentialCell([
            conv_dw(128, 256, 2),  # 219 + 32 = 251
            conv_dw(256, 256, 1),  # 251 + 64 = 315
])
self.avg = P.ReduceMean()
self.fc = nn.Dense(in_channels=256, out_channels=num_classes)
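        # avg/fc form the classification head used for pre-training; as a
        # detection backbone, construct() returns the three stage outputs instead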
def construct(self, x):
x1 = self.stage1(x)
x2 = self.stage2(x1)
x3 = self.stage3(x2)
out = self.avg(x3, (2, 3))
out = self.fc(out)
return x1, x2, x3
def mobilenet025(class_num=1000):
return MobileNetV1(class_num)
# RetinaFace
def Init_KaimingUniform(arr_shape, a=0, nonlinearity='leaky_relu', has_bias=False):
"""Init_KaimingUniform"""
def _calculate_in_and_out(arr_shape):
dim = len(arr_shape)
if dim < 2:
raise ValueError("If initialize data with xavier uniform, the dimension of data must greater than 1.")
n_in = arr_shape[1]
n_out = arr_shape[0]
if dim > 2:
counter = reduce(lambda x, y: x * y, arr_shape[2:])
n_in *= counter
n_out *= counter
return n_in, n_out
def calculate_gain(nonlinearity, a=None):
linear_fans = ['linear', 'conv1d', 'conv2d', 'conv3d',
'conv_transpose1d', 'conv_transpose2d', 'conv_transpose3d']
if nonlinearity in linear_fans or nonlinearity == 'sigmoid':
return 1
if nonlinearity == 'tanh':
return 5.0 / 3
if nonlinearity == 'relu':
return math.sqrt(2.0)
if nonlinearity == 'leaky_relu':
if a is None:
negative_slope = 0.01
elif not isinstance(a, bool) and isinstance(a, int) or isinstance(a, float):
negative_slope = a
else:
raise ValueError("negative_slope {} not a valid number".format(a))
return math.sqrt(2.0 / (1 + negative_slope ** 2))
raise ValueError("Unsupported nonlinearity {}".format(nonlinearity))
fan_in, _ = _calculate_in_and_out(arr_shape)
gain = calculate_gain(nonlinearity, a)
std = gain / math.sqrt(fan_in)
bound = math.sqrt(3.0) * std
weight = np.random.uniform(-bound, bound, arr_shape).astype(np.float32)
bias = None
if has_bias:
bound_bias = 1 / math.sqrt(fan_in)
bias = np.random.uniform(-bound_bias, bound_bias, arr_shape[0:1]).astype(np.float32)
bias = Tensor(bias)
return Tensor(weight), bias
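# with a = sqrt(5), as passed below, the gain evaluates to
# sqrt(2 / (1 + 5)) = sqrt(1/3), matching PyTorch's default Conv2d initialisation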
class ConvBNReLU(nn.SequentialCell):
def __init__(self, in_planes, out_planes, kernel_size, stride, padding, groups, norm_layer, leaky=0):
weight_shape = (out_planes, in_planes, kernel_size, kernel_size)
kaiming_weight, _ = Init_KaimingUniform(weight_shape, a=math.sqrt(5))
super(ConvBNReLU, self).__init__(
nn.Conv2d(in_planes, out_planes, kernel_size, stride, pad_mode='pad', padding=padding, group=groups,
has_bias=False, weight_init=kaiming_weight),
norm_layer(out_planes),
nn.LeakyReLU(alpha=leaky)
)
class ConvBN(nn.SequentialCell):
def __init__(self, in_planes, out_planes, kernel_size, stride, padding, groups, norm_layer):
weight_shape = (out_planes, in_planes, kernel_size, kernel_size)
kaiming_weight, _ = Init_KaimingUniform(weight_shape, a=math.sqrt(5))
super(ConvBN, self).__init__(
nn.Conv2d(in_planes, out_planes, kernel_size, stride, pad_mode='pad', padding=padding, group=groups,
has_bias=False, weight_init=kaiming_weight),
norm_layer(out_planes),
)
class SSH(nn.Cell):
"""SSH"""
def __init__(self, in_channel, out_channel):
super(SSH, self).__init__()
assert out_channel % 4 == 0
leaky = 0
if out_channel <= 64:
leaky = 0.1
norm_layer = nn.BatchNorm2d
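        # the 5x5 and 7x7 branches stack 3x3 convolutions: two stacked 3x3 give
        # an effective 5x5 receptive field, three give 7x7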
self.conv3X3 = ConvBN(in_channel, out_channel // 2, kernel_size=3, stride=1, padding=1, groups=1,
norm_layer=norm_layer)
self.conv5X5_1 = ConvBNReLU(in_channel, out_channel // 4, kernel_size=3, stride=1, padding=1, groups=1,
norm_layer=norm_layer, leaky=leaky)
self.conv5X5_2 = ConvBN(out_channel // 4, out_channel // 4, kernel_size=3, stride=1, padding=1, groups=1,
norm_layer=norm_layer)
self.conv7X7_2 = ConvBNReLU(out_channel // 4, out_channel // 4, kernel_size=3, stride=1, padding=1, groups=1,
norm_layer=norm_layer, leaky=leaky)
self.conv7X7_3 = ConvBN(out_channel // 4, out_channel // 4, kernel_size=3, stride=1, padding=1, groups=1,
norm_layer=norm_layer)
self.cat = P.Concat(axis=1)
self.relu = nn.ReLU()
def construct(self, x):
"""construct"""
conv3X3 = self.conv3X3(x)
conv5X5_1 = self.conv5X5_1(x)
conv5X5 = self.conv5X5_2(conv5X5_1)
conv7X7_2 = self.conv7X7_2(conv5X5_1)
conv7X7 = self.conv7X7_3(conv7X7_2)
out = self.cat((conv3X3, conv5X5, conv7X7))
out = self.relu(out)
return out
class FPN(nn.Cell):
"""FPN"""
def __init__(self, cfg):
super(FPN, self).__init__()
out_channels = cfg['out_channel']
leaky = 0
if out_channels <= 64:
leaky = 0.1
norm_layer = nn.BatchNorm2d
self.output1 = ConvBNReLU(cfg['in_channel'] * 2, cfg['out_channel'], kernel_size=1, stride=1,
padding=0, groups=1, norm_layer=norm_layer, leaky=leaky)
self.output2 = ConvBNReLU(cfg['in_channel'] * 4, cfg['out_channel'], kernel_size=1, stride=1,
padding=0, groups=1, norm_layer=norm_layer, leaky=leaky)
self.output3 = ConvBNReLU(cfg['in_channel'] * 8, cfg['out_channel'], kernel_size=1, stride=1,
padding=0, groups=1, norm_layer=norm_layer, leaky=leaky)
self.merge1 = ConvBNReLU(cfg['out_channel'], cfg['out_channel'], kernel_size=3, stride=1, padding=1, groups=1,
norm_layer=norm_layer, leaky=leaky)
self.merge2 = ConvBNReLU(cfg['out_channel'], cfg['out_channel'], kernel_size=3, stride=1, padding=1, groups=1,
norm_layer=norm_layer, leaky=leaky)
def construct(self, input1, input2, input3):
"""construct"""
output1 = self.output1(input1)
output2 = self.output2(input2)
output3 = self.output3(input3)
up3 = P.ResizeNearestNeighbor([P.Shape()(output2)[2], P.Shape()(output2)[3]])(output3)
output2 = up3 + output2
output2 = self.merge2(output2)
up2 = P.ResizeNearestNeighbor([P.Shape()(output1)[2], P.Shape()(output1)[3]])(output2)
output1 = up2 + output1
output1 = self.merge1(output1)
return output1, output2, output3
class ClassHead(nn.Cell):
"""ClassHead"""
def __init__(self, inchannels=512, num_anchors=3):
super(ClassHead, self).__init__()
self.num_anchors = num_anchors
weight_shape = (self.num_anchors * 2, inchannels, 1, 1)
kaiming_weight, kaiming_bias = Init_KaimingUniform(weight_shape, a=math.sqrt(5), has_bias=True)
self.conv1x1 = nn.Conv2d(inchannels, self.num_anchors * 2, kernel_size=(1, 1), stride=1, padding=0,
has_bias=True, weight_init=kaiming_weight, bias_init=kaiming_bias)
self.permute = P.Transpose()
self.reshape = P.Reshape()
def construct(self, x):
out = self.conv1x1(x)
out = self.permute(out, (0, 2, 3, 1))
return self.reshape(out, (P.Shape()(out)[0], -1, 2))
class BboxHead(nn.Cell):
"""BboxHead"""
def __init__(self, inchannels=512, num_anchors=3):
super(BboxHead, self).__init__()
weight_shape = (num_anchors * 4, inchannels, 1, 1)
kaiming_weight, kaiming_bias = Init_KaimingUniform(weight_shape, a=math.sqrt(5), has_bias=True)
self.conv1x1 = nn.Conv2d(inchannels, num_anchors * 4, kernel_size=(1, 1), stride=1, padding=0, has_bias=True,
weight_init=kaiming_weight, bias_init=kaiming_bias)
self.permute = P.Transpose()
self.reshape = P.Reshape()
def construct(self, x):
out = self.conv1x1(x)
out = self.permute(out, (0, 2, 3, 1))
return self.reshape(out, (P.Shape()(out)[0], -1, 4))
class LandmarkHead(nn.Cell):
"""LandmarkHead"""
def __init__(self, inchannels=512, num_anchors=3):
super(LandmarkHead, self).__init__()
weight_shape = (num_anchors * 10, inchannels, 1, 1)
kaiming_weight, kaiming_bias = Init_KaimingUniform(weight_shape, a=math.sqrt(5), has_bias=True)
self.conv1x1 = nn.Conv2d(inchannels, num_anchors * 10, kernel_size=(1, 1), stride=1, padding=0, has_bias=True,
weight_init=kaiming_weight, bias_init=kaiming_bias)
self.permute = P.Transpose()
self.reshape = P.Reshape()
def construct(self, x):
out = self.conv1x1(x)
out = self.permute(out, (0, 2, 3, 1))
return self.reshape(out, (P.Shape()(out)[0], -1, 10))
class RetinaFace(nn.Cell):
"""RetinaFace"""
def __init__(self, phase='train', backbone=None, cfg=None):
super(RetinaFace, self).__init__()
self.phase = phase
self.base = backbone
self.fpn = FPN(cfg)
self.ssh1 = SSH(cfg['out_channel'], cfg['out_channel'])
self.ssh2 = SSH(cfg['out_channel'], cfg['out_channel'])
self.ssh3 = SSH(cfg['out_channel'], cfg['out_channel'])
self.ClassHead = self._make_class_head(fpn_num=3, inchannels=[cfg['out_channel'], cfg['out_channel'],
cfg['out_channel']], anchor_num=[2, 2, 2])
self.BboxHead = self._make_bbox_head(fpn_num=3, inchannels=[cfg['out_channel'], cfg['out_channel'],
cfg['out_channel']], anchor_num=[2, 2, 2])
self.LandmarkHead = self._make_landmark_head(fpn_num=3, inchannels=[cfg['out_channel'],
cfg['out_channel'],
cfg['out_channel']],
anchor_num=[2, 2, 2])
self.cat = P.Concat(axis=1)
def _make_class_head(self, fpn_num, inchannels, anchor_num):
classhead = nn.CellList()
for i in range(fpn_num):
classhead.append(ClassHead(inchannels[i], anchor_num[i]))
return classhead
def _make_bbox_head(self, fpn_num, inchannels, anchor_num):
bboxhead = nn.CellList()
for i in range(fpn_num):
bboxhead.append(BboxHead(inchannels[i], anchor_num[i]))
return bboxhead
def _make_landmark_head(self, fpn_num, inchannels, anchor_num):
landmarkhead = nn.CellList()
for i in range(fpn_num):
landmarkhead.append(LandmarkHead(inchannels[i], anchor_num[i]))
return landmarkhead
def construct(self, inputs):
"""construct"""
f1, f2, f3 = self.base(inputs)
f1, f2, f3 = self.fpn(f1, f2, f3)
# SSH
f1 = self.ssh1(f1)
f2 = self.ssh2(f2)
f3 = self.ssh3(f3)
features = [f1, f2, f3]
bbox = ()
for i, feature in enumerate(features):
bbox = bbox + (self.BboxHead[i](feature),)
bbox_regressions = self.cat(bbox)
cls = ()
for i, feature in enumerate(features):
cls = cls + (self.ClassHead[i](feature),)
classifications = self.cat(cls)
landm = ()
for i, feature in enumerate(features):
landm = landm + (self.LandmarkHead[i](feature),)
ldm_regressions = self.cat(landm)
if self.phase == 'train':
output = (bbox_regressions, classifications, ldm_regressions)
else:
output = (bbox_regressions, P.Softmax(-1)(classifications), ldm_regressions)
return output
class RetinaFaceWithLossCell(nn.Cell):
"""RetinaFaceWithLossCell"""
def __init__(self, network, multibox_loss, config):
super(RetinaFaceWithLossCell, self).__init__()
self.network = network
self.loc_weight = config['loc_weight']
self.class_weight = config['class_weight']
self.landm_weight = config['landm_weight']
self.multibox_loss = multibox_loss
def construct(self, img, loc_t, conf_t, landm_t):
pred_loc, pre_conf, pre_landm = self.network(img)
loss_loc, loss_conf, loss_landm = self.multibox_loss(pred_loc, loc_t, pre_conf, conf_t, pre_landm, landm_t)
return loss_loc * self.loc_weight + loss_conf * self.class_weight + loss_landm * self.landm_weight
class TrainingWrapper(nn.Cell):
"""TrainingWrapper"""
def __init__(self, network, optimizer, sens=1.0):
super(TrainingWrapper, self).__init__(auto_prefix=False)
self.network = network
self.weights = mindspore.ParameterTuple(network.trainable_params())
self.optimizer = optimizer
self.grad = C.GradOperation(get_by_list=True, sens_param=True)
self.sens = sens
self.reducer_flag = False
self.grad_reducer = None
self.parallel_mode = context.get_auto_parallel_context("parallel_mode")
class_list = [mindspore.context.ParallelMode.DATA_PARALLEL, mindspore.context.ParallelMode.HYBRID_PARALLEL]
if self.parallel_mode in class_list:
self.reducer_flag = True
if self.reducer_flag:
mean = context.get_auto_parallel_context("gradients_mean")
if auto_parallel_context().get_device_num_is_set():
degree = context.get_auto_parallel_context("device_num")
else:
degree = get_group_size()
self.grad_reducer = nn.DistributedGradReducer(optimizer.parameters, mean, degree)
def construct(self, *args):
weights = self.weights
loss = self.network(*args)
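        # seed backprop with a constant sensitivity equal to the loss scale,
        # then take gradients w.r.t. the trainable weights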
sens = P.Fill()(P.DType()(loss), P.Shape()(loss), self.sens)
grads = self.grad(self.network, weights)(*args, sens)
if self.reducer_flag:
# apply grad reducer on grads
grads = self.grad_reducer(grads)
return F.depend(loss, self.optimizer(grads))


@ -0,0 +1,578 @@
# Copyright 2021 Huawei Technologies Co., Ltd
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ============================================================================
"""Network."""
import math
from functools import reduce
import numpy as np
import mindspore
import mindspore.nn as nn
from mindspore.ops import functional as F
from mindspore.ops import operations as P
from mindspore.ops import composite as C
from mindspore import context, Tensor
from mindspore.parallel._auto_parallel_context import auto_parallel_context
from mindspore.communication.management import get_group_size
conv_weight_init = 'HeUniform'
# ResNet
def _weight_variable(shape, factor=0.01):
init_value = np.random.randn(*shape).astype(np.float32) * factor
return Tensor(init_value)
def _conv3x3(in_channel, out_channel, stride=1):
weight_shape = (out_channel, in_channel, 3, 3)
weight = _weight_variable(weight_shape)
return nn.Conv2d(in_channel, out_channel,
kernel_size=3, stride=stride, padding=1, pad_mode='pad', weight_init=weight)
def _conv1x1(in_channel, out_channel, stride=1):
weight_shape = (out_channel, in_channel, 1, 1)
weight = _weight_variable(weight_shape)
return nn.Conv2d(in_channel, out_channel,
kernel_size=1, stride=stride, padding=0, pad_mode='pad', weight_init=weight)
def _conv7x7(in_channel, out_channel, stride=1):
weight_shape = (out_channel, in_channel, 7, 7)
weight = _weight_variable(weight_shape)
return nn.Conv2d(in_channel, out_channel,
kernel_size=7, stride=stride, padding=3, pad_mode='pad', weight_init=weight)
def _bn(channel):
return nn.BatchNorm2d(channel)
def _bn_last(channel):
return nn.BatchNorm2d(channel)
def _fc(in_channel, out_channel):
weight_shape = (out_channel, in_channel)
weight = _weight_variable(weight_shape)
return nn.Dense(in_channel, out_channel, has_bias=True, weight_init=weight, bias_init=0)
class ResidualBlock(nn.Cell):
"""ResidualBlock"""
expansion = 4
def __init__(self,
in_channel,
out_channel,
stride=1):
super(ResidualBlock, self).__init__()
channel = out_channel // self.expansion
self.conv1 = _conv1x1(in_channel, channel, stride=1)
self.bn1 = _bn(channel)
self.conv2 = _conv3x3(channel, channel, stride=stride)
self.bn2 = _bn(channel)
self.conv3 = _conv1x1(channel, out_channel, stride=1)
self.bn3 = _bn_last(out_channel)
self.relu = nn.ReLU()
self.down_sample = False
if stride != 1 or in_channel != out_channel:
self.down_sample = True
self.down_sample_layer = None
if self.down_sample:
self.down_sample_layer = nn.SequentialCell([_conv1x1(in_channel, out_channel, stride),
_bn(out_channel)])
self.add = P.Add()
def construct(self, x):
"""construct"""
identity = x
out = self.conv1(x)
out = self.bn1(out)
out = self.relu(out)
out = self.conv2(out)
out = self.bn2(out)
out = self.relu(out)
out = self.conv3(out)
out = self.bn3(out)
if self.down_sample:
identity = self.down_sample_layer(identity)
out = self.add(out, identity)
out = self.relu(out)
return out
class ResNet(nn.Cell):
"""ResNet"""
def __init__(self,
block,
layer_nums,
in_channels,
out_channels,
strides,
num_classes):
super(ResNet, self).__init__()
if not len(layer_nums) == len(in_channels) == len(out_channels) == 4:
raise ValueError("the length of layer_num, in_channels, out_channels list must be 4!")
self.conv1 = _conv7x7(3, 64, stride=2)
self.bn1 = _bn(64)
self.relu = P.ReLU()
self.zeros1 = P.Zeros()
self.zeros2 = P.Zeros()
self.concat1 = P.Concat(axis=2)
self.concat2 = P.Concat(axis=3)
self.maxpool = nn.MaxPool2d(kernel_size=3, stride=2, pad_mode="valid")
self.layer1 = self._make_layer(block,
layer_nums[0],
in_channel=in_channels[0],
out_channel=out_channels[0],
stride=strides[0])
self.layer2 = self._make_layer(block,
layer_nums[1],
in_channel=in_channels[1],
out_channel=out_channels[1],
stride=strides[1])
self.layer3 = self._make_layer(block,
layer_nums[2],
in_channel=in_channels[2],
out_channel=out_channels[2],
stride=strides[2])
self.layer4 = self._make_layer(block,
layer_nums[3],
in_channel=in_channels[3],
out_channel=out_channels[3],
stride=strides[3])
self.mean = P.ReduceMean(keep_dims=True)
self.flatten = nn.Flatten()
self.end_point = _fc(out_channels[3], num_classes)
def _make_layer(self, block, layer_num, in_channel, out_channel, stride):
"""_make_layer"""
layers = []
resnet_block = block(in_channel, out_channel, stride=stride)
layers.append(resnet_block)
for _ in range(1, layer_num):
resnet_block = block(out_channel, out_channel, stride=1)
layers.append(resnet_block)
return nn.SequentialCell(layers)
def construct(self, x):
"""construct"""
x = self.conv1(x)
x = self.bn1(x)
x = self.relu(x)
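        # pad one row on top and one column on the left so the valid-mode
        # 3x3/stride-2 max-pool below produces the expected spatial size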
zeros1 = self.zeros1((x.shape[0], x.shape[1], 1, x.shape[3]), mindspore.float32)
x = self.concat1((zeros1, x))
zeros2 = self.zeros2((x.shape[0], x.shape[1], x.shape[2], 1), mindspore.float32)
x = self.concat2((zeros2, x))
c1 = self.maxpool(x)
c2 = self.layer1(c1)
c3 = self.layer2(c2)
c4 = self.layer3(c3)
c5 = self.layer4(c4)
out = self.mean(c5, (2, 3))
out = self.flatten(out)
out = self.end_point(out)
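        # `out` is dead code here: the fc head is kept so ImageNet-pretrained
        # checkpoints load cleanly, but this backbone returns c3/c4/c5 for the FPN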
return c3, c4, c5
def resnet50(class_num=10):
return ResNet(ResidualBlock,
[3, 4, 6, 3],
[64, 256, 512, 1024],
[256, 512, 1024, 2048],
[1, 2, 2, 2],
class_num)
# RetinaFace
def Init_KaimingUniform(arr_shape, a=0, nonlinearity='leaky_relu', has_bias=False):
"""Init_KaimingUniform"""
def _calculate_in_and_out(arr_shape):
dim = len(arr_shape)
if dim < 2:
raise ValueError("If initialize data with xavier uniform, the dimension of data must greater than 1.")
n_in = arr_shape[1]
n_out = arr_shape[0]
if dim > 2:
counter = reduce(lambda x, y: x * y, arr_shape[2:])
n_in *= counter
n_out *= counter
return n_in, n_out
def calculate_gain(nonlinearity, a=None):
linear_fans = ['linear', 'conv1d', 'conv2d', 'conv3d',
'conv_transpose1d', 'conv_transpose2d', 'conv_transpose3d']
if nonlinearity in linear_fans or nonlinearity == 'sigmoid':
return 1
if nonlinearity == 'tanh':
return 5.0 / 3
if nonlinearity == 'relu':
return math.sqrt(2.0)
if nonlinearity == 'leaky_relu':
if a is None:
negative_slope = 0.01
            elif isinstance(a, (int, float)) and not isinstance(a, bool):
negative_slope = a
else:
raise ValueError("negative_slope {} not a valid number".format(a))
return math.sqrt(2.0 / (1 + negative_slope ** 2))
raise ValueError("Unsupported nonlinearity {}".format(nonlinearity))
fan_in, _ = _calculate_in_and_out(arr_shape)
gain = calculate_gain(nonlinearity, a)
std = gain / math.sqrt(fan_in)
bound = math.sqrt(3.0) * std
weight = np.random.uniform(-bound, bound, arr_shape).astype(np.float32)
bias = None
if has_bias:
bound_bias = 1 / math.sqrt(fan_in)
bias = np.random.uniform(-bound_bias, bound_bias, arr_shape[0:1]).astype(np.float32)
bias = Tensor(bias)
return Tensor(weight), bias
class ConvBNReLU(nn.SequentialCell):
def __init__(self, in_planes, out_planes, kernel_size, stride, padding, groups, norm_layer, leaky=0):
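        # `leaky` is accepted for interface parity with the MobileNet0.25 variant; this block always uses ReLU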
weight_shape = (out_planes, in_planes, kernel_size, kernel_size)
kaiming_weight, _ = Init_KaimingUniform(weight_shape, a=math.sqrt(5))
super(ConvBNReLU, self).__init__(
nn.Conv2d(in_planes, out_planes, kernel_size, stride, pad_mode='pad', padding=padding, group=groups,
has_bias=False, weight_init=kaiming_weight),
norm_layer(out_planes),
nn.ReLU()
)
class ConvBN(nn.SequentialCell):
def __init__(self, in_planes, out_planes, kernel_size, stride, padding, groups, norm_layer):
weight_shape = (out_planes, in_planes, kernel_size, kernel_size)
kaiming_weight, _ = Init_KaimingUniform(weight_shape, a=math.sqrt(5))
super(ConvBN, self).__init__(
nn.Conv2d(in_planes, out_planes, kernel_size, stride, pad_mode='pad', padding=padding, group=groups,
has_bias=False, weight_init=kaiming_weight),
norm_layer(out_planes),
)
class SSH(nn.Cell):
"""SSH"""
def __init__(self, in_channel, out_channel):
super(SSH, self).__init__()
assert out_channel % 4 == 0
leaky = 0
if out_channel <= 64:
leaky = 0.1
norm_layer = nn.BatchNorm2d
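        # SSH emulates 5x5 and 7x7 receptive fields with stacks of 3x3 convs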
self.conv3X3 = ConvBN(in_channel, out_channel // 2, kernel_size=3, stride=1, padding=1, groups=1,
norm_layer=norm_layer)
self.conv5X5_1 = ConvBNReLU(in_channel, out_channel // 4, kernel_size=3, stride=1, padding=1, groups=1,
norm_layer=norm_layer, leaky=leaky)
self.conv5X5_2 = ConvBN(out_channel // 4, out_channel // 4, kernel_size=3, stride=1, padding=1, groups=1,
norm_layer=norm_layer)
self.conv7X7_2 = ConvBNReLU(out_channel // 4, out_channel // 4, kernel_size=3, stride=1, padding=1, groups=1,
norm_layer=norm_layer, leaky=leaky)
self.conv7X7_3 = ConvBN(out_channel // 4, out_channel // 4, kernel_size=3, stride=1, padding=1, groups=1,
norm_layer=norm_layer)
self.cat = P.Concat(axis=1)
self.relu = nn.ReLU()
def construct(self, x):
"""construct"""
conv3X3 = self.conv3X3(x)
conv5X5_1 = self.conv5X5_1(x)
conv5X5 = self.conv5X5_2(conv5X5_1)
conv7X7_2 = self.conv7X7_2(conv5X5_1)
conv7X7 = self.conv7X7_3(conv7X7_2)
out = self.cat((conv3X3, conv5X5, conv7X7))
out = self.relu(out)
return out
class FPN(nn.Cell):
"""FPN"""
def __init__(self):
super(FPN, self).__init__()
out_channels = 256
leaky = 0
if out_channels <= 64:
leaky = 0.1
norm_layer = nn.BatchNorm2d
self.output1 = ConvBNReLU(512, 256, kernel_size=1, stride=1, padding=0, groups=1,
norm_layer=norm_layer, leaky=leaky)
self.output2 = ConvBNReLU(1024, 256, kernel_size=1, stride=1, padding=0, groups=1,
norm_layer=norm_layer, leaky=leaky)
self.output3 = ConvBNReLU(2048, 256, kernel_size=1, stride=1, padding=0, groups=1,
norm_layer=norm_layer, leaky=leaky)
self.merge1 = ConvBNReLU(256, 256, kernel_size=3, stride=1, padding=1, groups=1,
norm_layer=norm_layer, leaky=leaky)
self.merge2 = ConvBNReLU(256, 256, kernel_size=3, stride=1, padding=1, groups=1,
norm_layer=norm_layer, leaky=leaky)
def construct(self, input1, input2, input3):
"""construct"""
output1 = self.output1(input1)
output2 = self.output2(input2)
output3 = self.output3(input3)
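        # top-down pathway: upsample the coarser level to the finer level's
        # spatial size and fuse by element-wise addition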
up3 = P.ResizeNearestNeighbor([P.Shape()(output2)[2], P.Shape()(output2)[3]])(output3)
output2 = up3 + output2
output2 = self.merge2(output2)
up2 = P.ResizeNearestNeighbor([P.Shape()(output1)[2], P.Shape()(output1)[3]])(output2)
output1 = up2 + output1
output1 = self.merge1(output1)
return output1, output2, output3
class ClassHead(nn.Cell):
"""ClassHead"""
def __init__(self, inchannels=512, num_anchors=3):
super(ClassHead, self).__init__()
self.num_anchors = num_anchors
weight_shape = (self.num_anchors * 2, inchannels, 1, 1)
kaiming_weight, kaiming_bias = Init_KaimingUniform(weight_shape, a=math.sqrt(5), has_bias=True)
self.conv1x1 = nn.Conv2d(inchannels, self.num_anchors * 2, kernel_size=(1, 1), stride=1, padding=0,
has_bias=True, weight_init=kaiming_weight, bias_init=kaiming_bias)
self.permute = P.Transpose()
self.reshape = P.Reshape()
def construct(self, x):
"""construct"""
out = self.conv1x1(x)
out = self.permute(out, (0, 2, 3, 1))
return self.reshape(out, (P.Shape()(out)[0], -1, 2))
class BboxHead(nn.Cell):
"""BboxHead"""
def __init__(self, inchannels=512, num_anchors=3):
super(BboxHead, self).__init__()
weight_shape = (num_anchors * 4, inchannels, 1, 1)
kaiming_weight, kaiming_bias = Init_KaimingUniform(weight_shape, a=math.sqrt(5), has_bias=True)
self.conv1x1 = nn.Conv2d(inchannels, num_anchors * 4, kernel_size=(1, 1), stride=1, padding=0, has_bias=True,
weight_init=kaiming_weight, bias_init=kaiming_bias)
self.permute = P.Transpose()
self.reshape = P.Reshape()
def construct(self, x):
"""construct"""
out = self.conv1x1(x)
out = self.permute(out, (0, 2, 3, 1))
return self.reshape(out, (P.Shape()(out)[0], -1, 4))
class LandmarkHead(nn.Cell):
"""LandmarkHead"""
def __init__(self, inchannels=512, num_anchors=3):
super(LandmarkHead, self).__init__()
weight_shape = (num_anchors * 10, inchannels, 1, 1)
kaiming_weight, kaiming_bias = Init_KaimingUniform(weight_shape, a=math.sqrt(5), has_bias=True)
self.conv1x1 = nn.Conv2d(inchannels, num_anchors * 10, kernel_size=(1, 1), stride=1, padding=0, has_bias=True,
weight_init=kaiming_weight, bias_init=kaiming_bias)
self.permute = P.Transpose()
self.reshape = P.Reshape()
def construct(self, x):
"""construct"""
out = self.conv1x1(x)
out = self.permute(out, (0, 2, 3, 1))
return self.reshape(out, (P.Shape()(out)[0], -1, 10))
class RetinaFace(nn.Cell):
"""RetinaFace"""
def __init__(self, phase='train', backbone=None):
super(RetinaFace, self).__init__()
self.phase = phase
self.base = backbone
self.fpn = FPN()
self.ssh1 = SSH(256, 256)
self.ssh2 = SSH(256, 256)
self.ssh3 = SSH(256, 256)
self.ClassHead = self._make_class_head(fpn_num=3, inchannels=[256, 256, 256], anchor_num=[2, 2, 2])
self.BboxHead = self._make_bbox_head(fpn_num=3, inchannels=[256, 256, 256], anchor_num=[2, 2, 2])
self.LandmarkHead = self._make_landmark_head(fpn_num=3, inchannels=[256, 256, 256], anchor_num=[2, 2, 2])
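        # each head reshapes its level to (batch, priors_at_level, k) with k = 2/4/10,
        # so P.Concat(axis=1) below stitches the three levels in prior order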
self.cat = P.Concat(axis=1)
def _make_class_head(self, fpn_num, inchannels, anchor_num):
classhead = nn.CellList()
for i in range(fpn_num):
classhead.append(ClassHead(inchannels[i], anchor_num[i]))
return classhead
def _make_bbox_head(self, fpn_num, inchannels, anchor_num):
bboxhead = nn.CellList()
for i in range(fpn_num):
bboxhead.append(BboxHead(inchannels[i], anchor_num[i]))
return bboxhead
def _make_landmark_head(self, fpn_num, inchannels, anchor_num):
landmarkhead = nn.CellList()
for i in range(fpn_num):
landmarkhead.append(LandmarkHead(inchannels[i], anchor_num[i]))
return landmarkhead
def construct(self, inputs):
"""construct"""
f1, f2, f3 = self.base(inputs)
f1, f2, f3 = self.fpn(f1, f2, f3)
# SSH
f1 = self.ssh1(f1)
f2 = self.ssh2(f2)
f3 = self.ssh3(f3)
features = [f1, f2, f3]
bbox = ()
for i, feature in enumerate(features):
bbox = bbox + (self.BboxHead[i](feature),)
bbox_regressions = self.cat(bbox)
cls = ()
for i, feature in enumerate(features):
cls = cls + (self.ClassHead[i](feature),)
classifications = self.cat(cls)
landm = ()
for i, feature in enumerate(features):
landm = landm + (self.LandmarkHead[i](feature),)
ldm_regressions = self.cat(landm)
if self.phase == 'train':
output = (bbox_regressions, classifications, ldm_regressions)
else:
output = (bbox_regressions, P.Softmax(-1)(classifications), ldm_regressions)
return output
class RetinaFaceWithLossCell(nn.Cell):
"""RetinaFaceWithLossCell"""
def __init__(self, network, multibox_loss, config):
super(RetinaFaceWithLossCell, self).__init__()
self.network = network
self.loc_weight = config['loc_weight']
self.class_weight = config['class_weight']
self.landm_weight = config['landm_weight']
self.multibox_loss = multibox_loss
def construct(self, img, loc_t, conf_t, landm_t):
"""construct"""
pred_loc, pre_conf, pre_landm = self.network(img)
        loss_loc, loss_conf, loss_landm = self.multibox_loss(pred_loc, loc_t, pre_conf, conf_t, pre_landm, landm_t)  # delegates to MultiBoxLoss.construct in loss.py
return loss_loc * self.loc_weight + loss_conf * self.class_weight + loss_landm * self.landm_weight
# from dsj
GRADIENT_CLIP_TYPE = 1
GRADIENT_CLIP_VALUE = 1.0
clip_grad = C.MultitypeFuncGraph("clip_grad")
@clip_grad.register("Number", "Number", "Tensor")
def _clip_grad(clip_type, clip_value, grad):
"""_clip_grad"""
if clip_type not in (0, 1):
return grad
dt = F.dtype(grad)
if clip_type == 0:
new_grad = C.clip_by_value(grad, F.cast(F.tuple_to_array((-clip_value,)), dt),
F.cast(F.tuple_to_array((clip_value,)), dt))
else:
new_grad = nn.ClipByNorm()(grad, F.cast(F.tuple_to_array((clip_value,)), dt))
return new_grad
class TrainingWrapper(nn.Cell):
"""TrainingWrapper"""
def __init__(self, network, optimizer, sens=1.0):
super(TrainingWrapper, self).__init__(auto_prefix=False)
self.network = network
self.weights = mindspore.ParameterTuple(network.trainable_params())
self.optimizer = optimizer
self.grad = C.GradOperation(get_by_list=True, sens_param=True)
self.sens = sens
self.reducer_flag = False
self.grad_reducer = None
self.parallel_mode = context.get_auto_parallel_context("parallel_mode")
class_list = [mindspore.context.ParallelMode.DATA_PARALLEL, mindspore.context.ParallelMode.HYBRID_PARALLEL]
if self.parallel_mode in class_list:
self.reducer_flag = True
if self.reducer_flag:
mean = context.get_auto_parallel_context("gradients_mean")
if auto_parallel_context().get_device_num_is_set():
degree = context.get_auto_parallel_context("device_num")
else:
degree = get_group_size()
self.grad_reducer = nn.DistributedGradReducer(optimizer.parameters, mean, degree)
# from dsj
self.hyper_map = mindspore.ops.HyperMap()
def construct(self, *args):
"""construct"""
weights = self.weights
loss = self.network(*args)
sens = P.Fill()(P.DType()(loss), P.Shape()(loss), self.sens)
grads = self.grad(self.network, weights)(*args, sens)
# from dsj
grads = self.hyper_map(F.partial(clip_grad, GRADIENT_CLIP_TYPE, GRADIENT_CLIP_VALUE), grads)
if self.reducer_flag:
# apply grad reducer on grads
grads = self.grad_reducer(grads)
return F.depend(loss, self.optimizer(grads))
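
Putting the pieces of this file together mirrors what `train_with_resnet()` does further below. A hedged sketch; the module path, `src.loss.MultiBoxLoss`, and all numeric values are illustrative assumptions, not fixed by this commit:

```python
import mindspore.nn as nn
from src.loss import MultiBoxLoss                       # signature as used in train.py
from src.network_with_resnet import (RetinaFace, RetinaFaceWithLossCell,
                                     TrainingWrapper, resnet50)

cfg = {'loc_weight': 2.0, 'class_weight': 1.0, 'landm_weight': 1.0}   # illustrative
backbone = resnet50(1001)                               # fc head unused; yields c3/c4/c5
loss = MultiBoxLoss(2, 16800, 7, 8)                     # classes, anchors, neg ratio, batch
net = RetinaFaceWithLossCell(RetinaFace(phase='train', backbone=backbone), loss, cfg)
opt = nn.SGD(net.trainable_params(), learning_rate=1e-2, momentum=0.9)
train_net = TrainingWrapper(net, opt)                   # clip-by-norm(1.0) + optimizer step
```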


@@ -0,0 +1,204 @@
# Copyright 2021 Huawei Technologies Co., Ltd
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ============================================================================
"""Network."""
import numpy as np
import mindspore.nn as nn
from mindspore.ops import operations as P
from mindspore import Tensor
# ResNet
def _weight_variable(shape, factor=0.01):
init_value = np.random.randn(*shape).astype(np.float32) * factor
return Tensor(init_value)
def _conv3x3(in_channel, out_channel, stride=1):
weight_shape = (out_channel, in_channel, 3, 3)
weight = _weight_variable(weight_shape)
return nn.Conv2d(in_channel, out_channel,
kernel_size=3, stride=stride, padding=1, pad_mode='pad', weight_init=weight)
def _conv1x1(in_channel, out_channel, stride=1):
weight_shape = (out_channel, in_channel, 1, 1)
weight = _weight_variable(weight_shape)
return nn.Conv2d(in_channel, out_channel,
kernel_size=1, stride=stride, padding=0, pad_mode='pad', weight_init=weight)
def _conv7x7(in_channel, out_channel, stride=1):
weight_shape = (out_channel, in_channel, 7, 7)
weight = _weight_variable(weight_shape)
return nn.Conv2d(in_channel, out_channel,
kernel_size=7, stride=stride, padding=3, pad_mode='pad', weight_init=weight)
def _bn(channel):
return nn.BatchNorm2d(channel)
def _bn_last(channel):
return nn.BatchNorm2d(channel)
def _fc(in_channel, out_channel):
weight_shape = (out_channel, in_channel)
weight = _weight_variable(weight_shape)
return nn.Dense(in_channel, out_channel, has_bias=True, weight_init=weight, bias_init=0)
class ResidualBlock(nn.Cell):
"""ResidualBlock"""
expansion = 4
def __init__(self,
in_channel,
out_channel,
stride=1):
super(ResidualBlock, self).__init__()
channel = out_channel // self.expansion
self.conv1 = _conv1x1(in_channel, channel, stride=1)
self.bn1 = _bn(channel)
self.conv2 = _conv3x3(channel, channel, stride=stride)
self.bn2 = _bn(channel)
self.conv3 = _conv1x1(channel, out_channel, stride=1)
self.bn3 = _bn_last(out_channel)
self.relu = nn.ReLU()
self.down_sample = False
if stride != 1 or in_channel != out_channel:
self.down_sample = True
self.down_sample_layer = None
if self.down_sample:
self.down_sample_layer = nn.SequentialCell([_conv1x1(in_channel, out_channel, stride),
_bn(out_channel)])
self.add = P.Add()
def construct(self, x):
"""construct"""
identity = x
out = self.conv1(x)
out = self.bn1(out)
out = self.relu(out)
out = self.conv2(out)
out = self.bn2(out)
out = self.relu(out)
out = self.conv3(out)
out = self.bn3(out)
if self.down_sample:
identity = self.down_sample_layer(identity)
out = self.add(out, identity)
out = self.relu(out)
return out
class ResNet(nn.Cell):
"""ResNet"""
def __init__(self,
block,
layer_nums,
in_channels,
out_channels,
strides,
num_classes):
super(ResNet, self).__init__()
if not len(layer_nums) == len(in_channels) == len(out_channels) == 4:
raise ValueError("the length of layer_num, in_channels, out_channels list must be 4!")
self.conv1 = _conv7x7(3, 64, stride=2)
self.bn1 = _bn(64)
self.relu = P.ReLU()
self.pad = P.Pad(((0, 0), (0, 0), (1, 0), (1, 0)))
self.maxpool = nn.MaxPool2d(kernel_size=3, stride=2, pad_mode="valid")
self.layer1 = self._make_layer(block,
layer_nums[0],
in_channel=in_channels[0],
out_channel=out_channels[0],
stride=strides[0])
self.layer2 = self._make_layer(block,
layer_nums[1],
in_channel=in_channels[1],
out_channel=out_channels[1],
stride=strides[1])
self.layer3 = self._make_layer(block,
layer_nums[2],
in_channel=in_channels[2],
out_channel=out_channels[2],
stride=strides[2])
self.layer4 = self._make_layer(block,
layer_nums[3],
in_channel=in_channels[3],
out_channel=out_channels[3],
stride=strides[3])
self.mean = P.ReduceMean(keep_dims=True)
self.flatten = nn.Flatten()
self.end_point = _fc(out_channels[3], num_classes)
def _make_layer(self, block, layer_num, in_channel, out_channel, stride):
"""_make_layer"""
layers = []
resnet_block = block(in_channel, out_channel, stride=stride)
layers.append(resnet_block)
for _ in range(1, layer_num):
resnet_block = block(out_channel, out_channel, stride=1)
layers.append(resnet_block)
return nn.SequentialCell(layers)
def construct(self, x):
"""construct"""
x = self.conv1(x)
x = self.bn1(x)
x = self.relu(x)
x = self.pad(x)
c1 = self.maxpool(x)
c2 = self.layer1(c1)
c3 = self.layer2(c2)
c4 = self.layer3(c3)
c5 = self.layer4(c4)
out = self.mean(c5, (2, 3))
out = self.flatten(out)
out = self.end_point(out)
return out
def resnet50(class_num=1000):
return ResNet(ResidualBlock,
[3, 4, 6, 3],
[64, 256, 512, 1024],
[256, 512, 1024, 2048],
[1, 2, 2, 2],
class_num)


@@ -0,0 +1,165 @@
# Copyright 2021 Huawei Technologies Co., Ltd
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ============================================================================
"""Utils."""
from itertools import product
import math
import numpy as np
def prior_box(image_sizes, min_sizes, steps, clip=False):
"""prior box"""
feature_maps = [
[math.ceil(image_sizes[0] / step), math.ceil(image_sizes[1] / step)]
for step in steps]
anchors = []
for k, f in enumerate(feature_maps):
for i, j in product(range(f[0]), range(f[1])):
for min_size in min_sizes[k]:
s_kx = min_size / image_sizes[1]
s_ky = min_size / image_sizes[0]
cx = (j + 0.5) * steps[k] / image_sizes[1]
cy = (i + 0.5) * steps[k] / image_sizes[0]
anchors += [cx, cy, s_kx, s_ky]
output = np.asarray(anchors).reshape([-1, 4]).astype(np.float32)
if clip:
output = np.clip(output, 0, 1)
return output
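# e.g. a hypothetical 640x640 input with steps (8, 16, 32) gives 80x80, 40x40 and
# 20x20 grids; with 2 anchor sizes per cell that is (6400 + 1600 + 400) * 2 = 16800 priors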
def center_point_2_box(boxes):
return np.concatenate((boxes[:, 0:2] - boxes[:, 2:4] / 2,
boxes[:, 0:2] + boxes[:, 2:4] / 2), axis=1)
def compute_intersect(a, b):
"""compute_intersect"""
A = a.shape[0]
B = b.shape[0]
max_xy = np.minimum(
np.broadcast_to(np.expand_dims(a[:, 2:4], 1), [A, B, 2]),
np.broadcast_to(np.expand_dims(b[:, 2:4], 0), [A, B, 2]))
min_xy = np.maximum(
np.broadcast_to(np.expand_dims(a[:, 0:2], 1), [A, B, 2]),
np.broadcast_to(np.expand_dims(b[:, 0:2], 0), [A, B, 2]))
inter = np.maximum((max_xy - min_xy), np.zeros_like(max_xy - min_xy))
return inter[:, :, 0] * inter[:, :, 1]
def compute_overlaps(a, b):
"""compute_overlaps"""
inter = compute_intersect(a, b)
area_a = np.broadcast_to(
np.expand_dims(
(a[:, 2] - a[:, 0]) * (a[:, 3] - a[:, 1]), 1),
np.shape(inter))
area_b = np.broadcast_to(
np.expand_dims(
(b[:, 2] - b[:, 0]) * (b[:, 3] - b[:, 1]), 0),
np.shape(inter))
union = area_a + area_b - inter
return inter / union
def match(threshold, boxes, priors, var, labels, landms):
"""match"""
overlaps = compute_overlaps(boxes, center_point_2_box(priors))
best_prior_overlap = overlaps.max(1, keepdims=True)
best_prior_idx = np.argsort(-overlaps, axis=1)[:, 0:1]
valid_gt_idx = best_prior_overlap[:, 0] >= 0.2
best_prior_idx_filter = best_prior_idx[valid_gt_idx, :]
if best_prior_idx_filter.shape[0] <= 0:
loc = np.zeros((priors.shape[0], 4), dtype=np.float32)
conf = np.zeros((priors.shape[0],), dtype=np.int32)
landm = np.zeros((priors.shape[0], 10), dtype=np.float32)
return loc, conf, landm
best_truth_overlap = overlaps.max(0, keepdims=True)
best_truth_idx = np.argsort(-overlaps, axis=0)[:1, :]
best_truth_idx = best_truth_idx.squeeze(0)
best_truth_overlap = best_truth_overlap.squeeze(0)
best_prior_idx = best_prior_idx.squeeze(1)
best_prior_idx_filter = best_prior_idx_filter.squeeze(1)
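    # guarantee every kept ground-truth box claims its best prior: 2 exceeds any IoU threshold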
best_truth_overlap[best_prior_idx_filter] = 2
for j in range(best_prior_idx.shape[0]):
best_truth_idx[best_prior_idx[j]] = j
matches = boxes[best_truth_idx]
# encode boxes
offset_cxcy = (matches[:, 0:2] + matches[:, 2:4]) / 2 - priors[:, 0:2]
offset_cxcy /= (var[0] * priors[:, 2:4])
wh = (matches[:, 2:4] - matches[:, 0:2]) / priors[:, 2:4]
wh[wh == 0] = 1e-12
wh = np.log(wh) / var[1]
loc = np.concatenate([offset_cxcy, wh], axis=1)
conf = labels[best_truth_idx]
conf[best_truth_overlap < threshold] = 0
matches_landm = landms[best_truth_idx]
# encode landms
matched = np.reshape(matches_landm, [-1, 5, 2])
priors = np.broadcast_to(np.expand_dims(priors, 1), [priors.shape[0], 5, 4])
offset_cxcy = matched[:, :, 0:2] - priors[:, :, 0:2]
offset_cxcy /= (priors[:, :, 2:4] * var[0])
landm = np.reshape(offset_cxcy, [-1, 10])
return loc, np.array(conf, dtype=np.int32), landm
class bbox_encode():
"""bbox_encode"""
def __init__(self, cfg):
self.match_thresh = cfg['match_thresh']
self.variances = cfg['variance']
self.priors = prior_box((cfg['image_size'], cfg['image_size']),
[[16, 32], [64, 128], [256, 512]],
[8, 16, 32],
cfg['clip'])
def __call__(self, image, targets):
boxes = targets[:, :4]
labels = targets[:, -1]
landms = targets[:, 4:14]
priors = self.priors
loc_t, conf_t, landm_t = match(self.match_thresh, boxes, priors, self.variances, labels, landms)
return image, loc_t, conf_t, landm_t
def decode_bbox(bbox, priors, var):
boxes = np.concatenate((
priors[:, 0:2] + bbox[:, 0:2] * var[0] * priors[:, 2:4],
priors[:, 2:4] * np.exp(bbox[:, 2:4] * var[1])), axis=1) # (xc, yc, w, h)
boxes[:, :2] -= boxes[:, 2:] / 2 # (x0, y0, w, h)
boxes[:, 2:] += boxes[:, :2] # (x0, y0, x1, y1)
return boxes
def decode_landm(landm, priors, var):
return np.concatenate((priors[:, 0:2] + landm[:, 0:2] * var[0] * priors[:, 2:4],
priors[:, 0:2] + landm[:, 2:4] * var[0] * priors[:, 2:4],
priors[:, 0:2] + landm[:, 4:6] * var[0] * priors[:, 2:4],
priors[:, 0:2] + landm[:, 6:8] * var[0] * priors[:, 2:4],
priors[:, 0:2] + landm[:, 8:10] * var[0] * priors[:, 2:4],
), axis=1)
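
A quick sanity check for the encode/decode pair, assuming this file is importable as `src.utils`: decoding an all-zero regression output must give back the priors themselves, converted from center form to corner form.

```python
import numpy as np
from src.utils import prior_box, decode_bbox   # assumed module path

priors = prior_box((640, 640), [[16, 32], [64, 128], [256, 512]], [8, 16, 32])
variances = [0.1, 0.2]                          # the cfg['variance'] convention
boxes = decode_bbox(np.zeros_like(priors), priors, variances)
# (cx, cy, w, h) priors re-emerge as (x0, y0, x1, y1) corner boxes
assert np.allclose(boxes[:, 2:] - boxes[:, :2], priors[:, 2:])
print(priors.shape)                             # (16800, 4) for this input size
```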


@@ -0,0 +1,227 @@
# Copyright 2021 Huawei Technologies Co., Ltd
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ============================================================================
"""Train Retinaface_resnet50ormobilenet0.25."""
import argparse
import math
import mindspore
from mindspore import context
from mindspore.context import ParallelMode
from mindspore.train import Model
from mindspore.train.callback import ModelCheckpoint, CheckpointConfig, LossMonitor, TimeMonitor
from mindspore.communication.management import init, get_rank, get_group_size
from mindspore.train.serialization import load_checkpoint, load_param_into_net
from src.config import cfg_res50, cfg_mobile025
from src.loss import MultiBoxLoss
from src.dataset import create_dataset
from src.lr_schedule import adjust_learning_rate, warmup_cosine_annealing_lr
def train_with_resnet(cfg):
"""train_with_resnet"""
mindspore.common.seed.set_seed(cfg['seed'])
from src.network_with_resnet import RetinaFace, RetinaFaceWithLossCell, TrainingWrapper, resnet50
context.set_context(mode=context.GRAPH_MODE, device_target=cfg['device_target'])
device_num = cfg['nnpu']
rank = 0
if cfg['device_target'] == "Ascend":
if device_num > 1:
context.reset_auto_parallel_context()
context.set_auto_parallel_context(device_num=device_num, parallel_mode=ParallelMode.DATA_PARALLEL,
gradients_mean=True)
init()
rank = get_rank()
else:
context.set_context(device_id=cfg['device_id'])
elif cfg['device_target'] == "GPU":
if cfg['ngpu'] > 1:
init("nccl")
context.set_auto_parallel_context(device_num=get_group_size(), parallel_mode=ParallelMode.DATA_PARALLEL,
gradients_mean=True)
rank = get_rank()
batch_size = cfg['batch_size']
max_epoch = cfg['epoch']
momentum = cfg['momentum']
lr_type = cfg['lr_type']
weight_decay = cfg['weight_decay']
loss_scale = cfg['loss_scale']
initial_lr = cfg['initial_lr']
gamma = cfg['gamma']
T_max = cfg['T_max']
eta_min = cfg['eta_min']
training_dataset = cfg['training_dataset']
num_classes = 2
negative_ratio = 7
stepvalues = (cfg['decay1'], cfg['decay2'])
ds_train = create_dataset(training_dataset, cfg, batch_size, multiprocessing=True, num_worker=cfg['num_workers'])
    print('dataset size is:', ds_train.get_dataset_size())
steps_per_epoch = math.ceil(ds_train.get_dataset_size())
multibox_loss = MultiBoxLoss(num_classes, cfg['num_anchor'], negative_ratio, cfg['batch_size'])
backbone = resnet50(1001)
backbone.set_train(True)
if cfg['pretrain'] and cfg['resume_net'] is None:
pretrained_res50 = cfg['pretrain_path']
param_dict_res50 = load_checkpoint(pretrained_res50)
load_param_into_net(backbone, param_dict_res50)
print('Load resnet50 from [{}] done.'.format(pretrained_res50))
net = RetinaFace(phase='train', backbone=backbone)
net.set_train(True)
if cfg['resume_net'] is not None:
pretrain_model_path = cfg['resume_net']
param_dict_retinaface = load_checkpoint(pretrain_model_path)
load_param_into_net(net, param_dict_retinaface)
print('Resume Model from [{}] Done.'.format(cfg['resume_net']))
net = RetinaFaceWithLossCell(net, multibox_loss, cfg)
if lr_type == 'dynamic_lr':
lr = adjust_learning_rate(initial_lr, gamma, stepvalues, steps_per_epoch, max_epoch,
warmup_epoch=cfg['warmup_epoch'], lr_type1=lr_type)
elif lr_type == 'cosine_annealing':
        lr = warmup_cosine_annealing_lr(initial_lr, steps_per_epoch, cfg['warmup_epoch'], max_epoch, T_max, eta_min)
    else:
        raise ValueError('lr_type is not defined.')
if cfg['optim'] == 'momentum':
opt = mindspore.nn.Momentum(net.trainable_params(), lr, momentum, weight_decay, loss_scale)
elif cfg['optim'] == 'sgd':
opt = mindspore.nn.SGD(params=net.trainable_params(), learning_rate=lr, momentum=momentum,
weight_decay=weight_decay, loss_scale=loss_scale)
else:
        raise ValueError('optim is not defined.')
net = TrainingWrapper(net, opt)
model = Model(net)
config_ck = CheckpointConfig(save_checkpoint_steps=ds_train.get_dataset_size() * 1,
keep_checkpoint_max=cfg['keep_checkpoint_max'])
cfg['ckpt_path'] = cfg['ckpt_path'] + "ckpt_" + str(rank) + "/"
ckpoint_cb = ModelCheckpoint(prefix="RetinaFace", directory=cfg['ckpt_path'], config=config_ck)
time_cb = TimeMonitor(data_size=ds_train.get_dataset_size())
callback_list = [LossMonitor(), time_cb, ckpoint_cb]
print("============== Starting Training ==============")
model.train(max_epoch, ds_train, callbacks=callback_list)
def train_with_mobilenet(cfg):
"""train_with_mobilenet"""
mindspore.common.seed.set_seed(cfg['seed'])
from src.network_with_mobilenet import RetinaFace, RetinaFaceWithLossCell, TrainingWrapper, resnet50, mobilenet025
context.set_context(mode=context.GRAPH_MODE, device_target='GPU', save_graphs=False)
if context.get_context("device_target") == "GPU":
# Enable graph kernel
context.set_context(enable_graph_kernel=True, graph_kernel_flags="--enable_parallel_fusion")
if cfg['ngpu'] > 1:
init("nccl")
context.set_auto_parallel_context(device_num=get_group_size(), parallel_mode=ParallelMode.DATA_PARALLEL,
gradients_mean=True)
cfg['ckpt_path'] = cfg['ckpt_path'] + "ckpt_" + str(get_rank()) + "/"
batch_size = cfg['batch_size']
max_epoch = cfg['epoch']
momentum = cfg['momentum']
lr_type = cfg['lr_type']
weight_decay = cfg['weight_decay']
initial_lr = cfg['initial_lr']
gamma = cfg['gamma']
training_dataset = cfg['training_dataset']
num_classes = 2
negative_ratio = 7
stepvalues = (cfg['decay1'], cfg['decay2'])
ds_train = create_dataset(training_dataset, cfg, batch_size, multiprocessing=True, num_worker=cfg['num_workers'])
    print('dataset size is:', ds_train.get_dataset_size())
steps_per_epoch = math.ceil(ds_train.get_dataset_size())
multibox_loss = MultiBoxLoss(num_classes, cfg['num_anchor'], negative_ratio, cfg['batch_size'])
if cfg['name'] == 'ResNet50':
backbone = resnet50(1001)
elif cfg['name'] == 'MobileNet025':
backbone = mobilenet025(1000)
backbone.set_train(True)
if cfg['name'] == 'ResNet50' and cfg['pretrain'] and cfg['resume_net'] is None:
pretrained_res50 = cfg['pretrain_path']
param_dict_res50 = load_checkpoint(pretrained_res50)
load_param_into_net(backbone, param_dict_res50)
print('Load resnet50 from [{}] done.'.format(pretrained_res50))
elif cfg['name'] == 'MobileNet025' and cfg['pretrain'] and cfg['resume_net'] is None:
pretrained_mobile025 = cfg['pretrain_path']
param_dict_mobile025 = load_checkpoint(pretrained_mobile025)
load_param_into_net(backbone, param_dict_mobile025)
print('Load mobilenet0.25 from [{}] done.'.format(pretrained_mobile025))
net = RetinaFace(phase='train', backbone=backbone, cfg=cfg)
net.set_train(True)
if cfg['resume_net'] is not None:
pretrain_model_path = cfg['resume_net']
param_dict_retinaface = load_checkpoint(pretrain_model_path)
load_param_into_net(net, param_dict_retinaface)
print('Resume Model from [{}] Done.'.format(cfg['resume_net']))
net = RetinaFaceWithLossCell(net, multibox_loss, cfg)
lr = adjust_learning_rate(initial_lr, gamma, stepvalues, steps_per_epoch, max_epoch,
warmup_epoch=cfg['warmup_epoch'], lr_type1=lr_type)
if cfg['optim'] == 'momentum':
opt = mindspore.nn.Momentum(net.trainable_params(), lr, momentum)
elif cfg['optim'] == 'sgd':
opt = mindspore.nn.SGD(params=net.trainable_params(), learning_rate=lr, momentum=momentum,
weight_decay=weight_decay, loss_scale=1)
else:
        raise ValueError('optim is not defined.')
net = TrainingWrapper(net, opt)
model = Model(net)
config_ck = CheckpointConfig(save_checkpoint_steps=cfg['save_checkpoint_steps'],
keep_checkpoint_max=cfg['keep_checkpoint_max'])
ckpoint_cb = ModelCheckpoint(prefix="RetinaFace", directory=cfg['ckpt_path'], config=config_ck)
time_cb = TimeMonitor(data_size=ds_train.get_dataset_size())
callback_list = [LossMonitor(), time_cb, ckpoint_cb]
print("============== Starting Training ==============")
model.train(max_epoch, ds_train, callbacks=callback_list, dataset_sink_mode=True)
if __name__ == '__main__':
parser = argparse.ArgumentParser(description='train')
parser.add_argument('--backbone_name', type=str, default='ResNet50',
help='backbone name')
args_opt = parser.parse_args()
if args_opt.backbone_name == 'ResNet50':
config = cfg_res50
train_with_resnet(cfg=config)
elif args_opt.backbone_name == 'MobileNet025':
config = cfg_mobile025
train_with_mobilenet(cfg=config)
    else:
        raise ValueError('backbone_name must be ResNet50 or MobileNet025.')
    print('train config:\n', config)
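
For completeness, the two entry points can also be driven programmatically instead of via argparse; a hedged sketch assuming the config dicts imported above behave like plain dictionaries:

```python
from train import train_with_resnet, train_with_mobilenet   # this file
from src.config import cfg_res50, cfg_mobile025

config = dict(cfg_res50)          # copy: train_with_resnet mutates cfg['ckpt_path']
train_with_resnet(cfg=config)     # Ascend / ResNet50 branch

config = dict(cfg_mobile025)
train_with_mobilenet(cfg=config)  # GPU / MobileNet0.25 branch
```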