!21946 [Ascend Crowd Intelligence] RetinaFace networks: crowd-intelligence RetinaFace_ResNet50 and GPU-campaign RetinaFace_MobileNet0.25

Merge pull request !21946 from yexijoe/RetinaFace_ResNet50
This commit is contained in:
i-robot 2021-09-06 13:15:55 +00:00 committed by Gitee
commit 1b4f0bd033
28 changed files with 4917 additions and 0 deletions


@ -0,0 +1,451 @@
# Contents

<!-- TOC -->

- [Contents](#contents)
- [RetinaFace Description](#retinaface-description)
- [Pretrained Model](#pretrained-model)
- [Model Architecture](#model-architecture)
- [Dataset](#dataset)
- [Environment Requirements](#environment-requirements)
- [Quick Start](#quick-start)
- [Script Description](#script-description)
    - [Script and Sample Code](#script-and-sample-code)
    - [Script Parameters](#script-parameters)
    - [Training Process](#training-process)
        - [Usage](#usage)
        - [Distributed Training](#distributed-training)
    - [Evaluation Process](#evaluation-process)
        - [Evaluation](#evaluation)
    - [Export Process](#export-process)
        - [Export](#export)
    - [Inference Process](#inference-process)
        - [Inference](#inference)
- [Model Description](#model-description)
    - [Performance](#performance)
        - [Evaluation Performance](#evaluation-performance)
            - [RetinaFace on WIDERFACE](#retinaface-on-widerface)
        - [Inference Performance](#inference-performance)
            - [RetinaFace on WIDERFACE](#retinaface-on-widerface-1)
- [Description of Random Situation](#description-of-random-situation)
- [ModelZoo Homepage](#modelzoo-homepage)

<!-- /TOC -->
# RetinaFace Description

RetinaFace is a face detection model proposed in 2019 in the paper "RetinaFace: Single-stage Dense Face Localisation in the Wild"; at the time, it achieved the best results on the WIDER FACE dataset. Compared with S3FD and MTCNN, RetinaFace significantly improves the recall of small faces. To handle multi-scale face detection, it adopts a feature pyramid structure that fuses features across scales and adds SSH modules.

[Paper](https://arxiv.org/abs/1905.00641v2): Jiankang Deng, Jia Guo, Yuxiang Zhou, Jinke Yu, Irene Kotsia, Stefanos Zafeiriou. "RetinaFace: Single-stage Dense Face Localisation in the Wild". 2019.
# Pretrained Model

RetinaFace can use a ResNet50 or MobileNet0.25 backbone to extract image features for detection. When ResNet50 serves as the backbone, use ./src/resnet.py as the model file, obtain the ResNet50 training script from ModelZoo, and train on ImageNet2012 with the default configuration to produce the ResNet50 pretrained model.
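A minimal sketch of how such a pretrained checkpoint is typically loaded into the backbone, assuming src/resnet.py exposes a `resnet50` constructor like the ones this repo's eval.py imports; the checkpoint name matches the directory layout under [Script and Sample Code](#script-and-sample-code):

```python
# Sketch only: load the ImageNet2012-pretrained ResNet50 weights into the
# backbone, mirroring the load_checkpoint/load_param_into_net pattern in eval.py.
from mindspore.train.serialization import load_checkpoint, load_param_into_net
from src.resnet import resnet50  # assumed constructor, analogous to eval.py's imports

backbone = resnet50(1001)  # 1001 classes matches the ImageNet2012 pretraining head
param_dict = load_checkpoint('../data/resnet-90_625.ckpt')
load_param_into_net(backbone, param_dict)
```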
# Model Architecture

Specifically, RetinaFace is based on RetinaNet: it adopts RetinaNet's feature pyramid structure and adds SSH modules. Besides the conventional detection branch, the network adds a landmark prediction branch and a self-supervision branch; the results show that these two branches improve model performance. The self-supervision branch is not covered here.
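For illustration, here is a hedged MindSpore sketch of an SSH context module of the kind described above; the actual definition lives in src/network_with_resnet.py and may differ in channel splits and normalization:

```python
# Sketch of an SSH module: three parallel context branches whose stacked 3x3
# convolutions emulate 3x3/5x5/7x7 receptive fields, concatenated channel-wise.
import mindspore.nn as nn
import mindspore.ops as ops

class SSH(nn.Cell):
    def __init__(self, in_channel, out_channel):
        super(SSH, self).__init__()
        self.conv3x3 = nn.Conv2d(in_channel, out_channel // 2, 3, pad_mode='same')
        self.conv5x5_1 = nn.Conv2d(in_channel, out_channel // 4, 3, pad_mode='same')
        self.conv5x5_2 = nn.Conv2d(out_channel // 4, out_channel // 4, 3, pad_mode='same')
        self.conv7x7_2 = nn.Conv2d(out_channel // 4, out_channel // 4, 3, pad_mode='same')
        self.conv7x7_3 = nn.Conv2d(out_channel // 4, out_channel // 4, 3, pad_mode='same')
        self.relu = nn.ReLU()
        self.concat = ops.Concat(axis=1)  # concatenate along the channel dim (NCHW)

    def construct(self, x):
        branch3x3 = self.conv3x3(x)
        mid = self.relu(self.conv5x5_1(x))
        branch5x5 = self.conv5x5_2(mid)
        branch7x7 = self.conv7x7_3(self.relu(self.conv7x7_2(mid)))
        return self.relu(self.concat((branch3x3, branch5x5, branch7x7)))
```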
# Dataset

Dataset used: [WIDERFACE](<http://mmlab.ie.cuhk.edu.hk/projects/WIDERFace/WiderFace_Results.html>)

To obtain the dataset:

1. Click [here](<https://github.com/peteryuX/retinaface-tf2>) to get the dataset and annotations.
2. Click [here](<https://github.com/peteryuX/retinaface-tf2/tree/master/widerface_evaluate/ground_truth>) to get the evaluation ground-truth labels.

- Dataset size: 3.42 GB, 32,203 color images
    - Training set: 1.36 GB, 12,800 images
    - Validation set: 345.95 MB, 3,226 images
    - Test set: 1.72 GB, 16,177 images
- The dataset directory structure is as follows (a sketch of how label.txt is parsed follows the tree):
```bash
├── data/
├── widerface/
├── ground_truth/
│ ├──wider_easy_val.mat
│ ├──wider_face_val.mat
│ ├──wider_hard_val.mat
│ ├──wider_medium_val.mat
├── train/
│ ├──images/
│ │ ├──0--Parade/
│ │ │ ├──0_Parade_marchingband_1_5.jpg
│ │ │ ├──...
│ │ ├──.../
│ ├──label.txt
├── val/
│ ├──images/
│ │ ├──0--Parade/
│ │ │ ├──0_Parade_marchingband_1_20.jpg
│ │ │ ├──...
│ │ ├──.../
│ ├──label.txt
```
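As referenced above, a small sketch of how the validation label.txt is parsed; this mirrors the parsing loop in eval.py, where lines beginning with "# " carry relative image paths:

```python
# Sketch: collect relative image paths such as
# "0--Parade/0_Parade_marchingband_1_20.jpg" from val/label.txt.
with open('data/widerface/val/label.txt', 'r') as f:
    lines = f.readlines()
image_paths = [line[2:-1] for line in lines if line.startswith('# ')]  # strip "# " and trailing '\n'
```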
# Environment Requirements

- Hardware (Ascend, GPU)
    - With ResNet50 as the backbone, prepare the hardware environment with Ascend processors.
    - With MobileNet0.25 as the backbone, prepare the hardware environment with GPUs.
- Framework
    - [MindSpore](https://www.mindspore.cn/install)
- For more information, please check the resources below:
    - [MindSpore Tutorials](https://www.mindspore.cn/tutorials/zh-CN/master/index.html)
    - [MindSpore Python API](https://www.mindspore.cn/docs/api/zh-CN/master/index.html)
# Quick Start

After installing MindSpore via the official website, you can follow the steps below for training and evaluation:

- Running on Ascend (with ResNet50 as the backbone)

```bash
# training example
python train.py --backbone_name 'ResNet50' > train.log 2>&1 &
OR
bash ./scripts/run_standalone_train_ascend.sh
# distributed training example
bash ./scripts/run_distribution_train_ascend.sh [RANK_TABLE_FILE]
# evaluation example
python eval.py --backbone_name 'ResNet50' --val_model [CKPT_FILE] > ./eval.log 2>&1 &
OR
bash ./scripts/run_standalone_eval_ascend.sh './train_parallel3/checkpoint/ckpt_3/RetinaFace-56_201.ckpt'
# inference example
bash run_infer_310.sh ../retinaface.mindir /home/dataset/widerface/val/ 0
```
- Running on GPU (with MobileNet0.25 as the backbone)

```bash
# training example
export CUDA_VISIBLE_DEVICES=0
python train.py --backbone_name 'MobileNet025' > train.log 2>&1 &
# distributed training example
bash scripts/run_distribution_train_gpu.sh 2 0,1
# evaluation example
export CUDA_VISIBLE_DEVICES=0
python eval.py --backbone_name 'MobileNet025' --val_model [CKPT_FILE] > eval.log 2>&1 &
OR
bash scripts/run_standalone_eval_gpu.sh 0 './checkpoint/ckpt_0/RetinaFace-117_804.ckpt'
```
# Script Description

## Script and Sample Code

```bash
├── model_zoo
    ├── README.md                               // description of all models
    ├── retinaface
        ├── README_CN.md                        // RetinaFace description
        ├── ascend310_infer                     // source code for Ascend 310 inference
        ├── scripts
        │   ├──run_distribution_train_ascend.sh // shell script for distributed training on Ascend
        │   ├──run_distribution_train_gpu.sh    // shell script for distributed training on GPU
        │   ├──run_infer_310.sh                 // shell script for Ascend inference (with ResNet50 as the backbone)
        │   ├──run_standalone_eval_ascend.sh    // shell script for evaluation on Ascend
        │   ├──run_standalone_eval_gpu.sh       // shell script for evaluation on GPU
        │   ├──run_standalone_train_ascend.sh   // shell script for single-device training on Ascend
        ├── src
        │   ├──augmentation.py                  // data augmentation methods
        │   ├──config.py                        // parameter configuration
        │   ├──dataset.py                       // dataset creation
        │   ├──loss.py                          // loss function
        │   ├──lr_schedule.py                   // learning-rate decay policy
        │   ├──network_with_mobilenet.py        // RetinaFace architecture with MobileNet0.25 as the backbone
        │   ├──network_with_resnet.py           // RetinaFace architecture with ResNet50 as the backbone
        │   ├──resnet.py                        // ResNet50 architecture used for pretraining when ResNet50 is the backbone
        │   ├──utils.py                         // data preprocessing
        ├── data
        │   ├──widerface                        // dataset
        │   ├──resnet-90_625.ckpt               // ResNet50 checkpoint pretrained on ImageNet
        │   ├──ground_truth                     // evaluation labels
        ├── eval.py                             // evaluation script
        ├── export.py                           // export checkpoint to AIR/MINDIR (with ResNet50 as the backbone)
        ├── postprocess.py                      // postprocessing script for Ascend 310 inference
        ├── preprocess.py                       // preprocessing script for Ascend 310 inference
        ├── train.py                            // training script
```
## Script Parameters

Both training and evaluation parameters can be configured in config.py.

- Configuration for RetinaFace with the ResNet50 backbone on the WIDER FACE dataset

```python
'variance': [0.1, 0.2],                                   # variance
'clip': False,                                            # clip
'loc_weight': 2.0,                                        # bbox regression loss weight
'class_weight': 1.0,                                      # confidence/class regression loss weight
'landm_weight': 1.0,                                      # landmark regression loss weight
'batch_size': 8,                                          # training batch size
'num_workers': 16,                                        # number of dataset-loading threads
'num_anchor': 29126,                                      # number of anchor boxes, depends on image size
'nnpu': 8,                                                # number of NPUs for training
'image_size': 840,                                        # training image size
'match_thresh': 0.35,                                     # box matching threshold
'optim': 'sgd',                                           # optimizer type
'momentum': 0.9,                                          # optimizer momentum
'weight_decay': 1e-4,                                     # optimizer weight decay
'epoch': 60,                                              # number of training epochs
'decay1': 20,                                             # epoch of the first learning-rate decay
'decay2': 40,                                             # epoch of the second learning-rate decay
'initial_lr': 0.04,                                       # initial learning rate (set to 0.04 for 8-device parallel training)
'warmup_epoch': -1,                                       # warmup epochs, -1 means no warmup
'gamma': 0.1,                                             # learning-rate decay ratio
'ckpt_path': './checkpoint/',                             # checkpoint save path
'keep_checkpoint_max': 8,                                 # maximum number of checkpoints to keep
'resume_net': None,                                       # resume network, None by default
'training_dataset': '../data/widerface/train/label.txt',  # training dataset label path
'pretrain': True,                                         # whether to train from a pretrained backbone
'pretrain_path': '../data/resnet-90_625.ckpt',            # pretrained backbone checkpoint path
# validation
'val_model': './train_parallel3/checkpoint/ckpt_3/RetinaFace-56_201.ckpt',  # validation model path
'val_dataset_folder': './data/widerface/val/',            # validation dataset path
'val_origin_size': True,                                  # whether to evaluate at original image size
'val_confidence_threshold': 0.02,                         # validation confidence threshold
'val_nms_threshold': 0.4,                                 # validation NMS threshold
'val_iou_threshold': 0.5,                                 # validation IoU threshold
'val_save_result': False,                                 # whether to save results
'val_predict_save_folder': './widerface_result',          # result save path
'val_gt_dir': './data/ground_truth/',                     # validation ground-truth path
# inference
'infer_dataset_folder': '/home/dataset/widerface/val/',   # validation dataset path for Ascend 310 inference
'infer_gt_dir': '/home/dataset/widerface/ground_truth/',  # validation ground-truth path for Ascend 310 inference
```
- Configuration for RetinaFace with the MobileNet0.25 backbone on the WIDER FACE dataset

```python
'name': 'MobileNet025',                                   # backbone name
'variance': [0.1, 0.2],                                   # variance
'clip': False,                                            # clip
'loc_weight': 2.0,                                        # bbox regression loss weight
'class_weight': 1.0,                                      # confidence/class regression loss weight
'landm_weight': 1.0,                                      # landmark regression loss weight
'batch_size': 8,                                          # training batch size
'num_workers': 12,                                        # number of dataset-loading threads
'num_anchor': 16800,                                      # number of anchor boxes, depends on image size
'ngpu': 2,                                                # number of GPUs for training
'epoch': 120,                                             # number of training epochs
'decay1': 70,                                             # epoch of the first learning-rate decay
'decay2': 90,                                             # epoch of the second learning-rate decay
'image_size': 640,                                        # training image size
'match_thresh': 0.35,                                     # box matching threshold
'optim': 'sgd',                                           # optimizer type
'momentum': 0.9,                                          # optimizer momentum
'weight_decay': 5e-4,                                     # optimizer weight decay
'initial_lr': 0.02,                                       # learning rate
'warmup_epoch': 5,                                        # warmup epochs, -1 means no warmup
'gamma': 0.1,                                             # learning-rate decay ratio
'ckpt_path': './checkpoint/',                             # checkpoint save path
'save_checkpoint_steps': 2000,                            # steps between checkpoint saves
'keep_checkpoint_max': 3,                                 # maximum number of checkpoints to keep
'resume_net': None,                                       # resume network, None by default
'training_dataset': '',                                   # training dataset label path, e.g. data/widerface/train/label.txt
'pretrain': False,                                        # whether to train from a pretrained backbone
'pretrain_path': './data/mobilenetv1-90_5004.ckpt',       # pretrained backbone checkpoint path
# validation
'val_model': './checkpoint/ckpt_0/RetinaFace-117_804.ckpt',  # validation model path
'val_dataset_folder': './data/widerface/val/',            # validation dataset path
'val_origin_size': False,                                 # whether to evaluate at original image size
'val_confidence_threshold': 0.02,                         # validation confidence threshold
'val_nms_threshold': 0.4,                                 # validation NMS threshold
'val_iou_threshold': 0.5,                                 # validation IoU threshold
'val_save_result': False,                                 # whether to save results
'val_predict_save_folder': './widerface_result',          # result save path
'val_gt_dir': './data/ground_truth/',                     # validation ground-truth path
```
## Training Process

### Usage

- Running on Ascend (with ResNet50 as the backbone)

```bash
python train.py --backbone_name 'ResNet50' > train.log 2>&1 &
OR
bash ./scripts/run_standalone_train_ascend.sh
```

The python command above runs in the background; you can view the results through the `train.log` file.
After training, you can get the loss values:
```bash
epoch: 7 step: 1609, loss is 5.327434
epoch time: 466281.709 ms, per step time: 289.796 ms
epoch: 8 step: 1609, loss is 4.7512465
epoch time: 466995.237 ms, per step time: 290.239 ms
```
- Running on GPU (with MobileNet0.25 as the backbone)

```bash
export CUDA_VISIBLE_DEVICES=0
python train.py --backbone_name 'MobileNet025' > train.log 2>&1 &
```

The python command above runs in the background; you can view the results through the `train.log` file.
After training, you can find the checkpoint files in the default folder `./checkpoint/`.
### Distributed Training

- Running on Ascend (with ResNet50 as the backbone)

```bash
bash ./scripts/run_distribution_train_ascend.sh [RANK_TABLE_FILE]
```

The shell script above runs distributed training in the background; you can view the results through the `train_parallel0/log` file.
After training, you can get the loss values:
```bash
epoch: 4 step: 201, loss is 4.870843
epoch time: 60460.177 ms, per step time: 300.797 ms
epoch: 5 step: 201, loss is 4.649786
epoch time: 60527.898 ms, per step time: 301.134 ms
```
- Running on GPU (with MobileNet0.25 as the backbone)

```bash
bash scripts/run_distribution_train_gpu.sh 2 0,1
```

The shell script above runs distributed training in the background; you can view the results through the `train/train.log` file.
After training, you can find the checkpoint files in the default folder `./checkpoint/ckpt_0/`.
## Evaluation Process

### Evaluation

- Evaluation on the WIDER FACE dataset in the Ascend environment (with ResNet50 as the backbone)

  CKPT_FILE is the checkpoint path used for evaluation, e.g. './train_parallel3/checkpoint/ckpt_3/RetinaFace-56_201.ckpt'.

```bash
python eval.py --backbone_name 'ResNet50' --val_model [CKPT_FILE] > ./eval.log 2>&1 &
OR
bash run_standalone_eval_ascend.sh [CKPT_FILE]
```

The python command above runs in the background; you can view the results through the "eval.log" file. The accuracy on the test dataset is as follows:

```bash
# grep "Val AP" eval.log
Easy Val AP : 0.9516
Medium Val AP : 0.9381
Hard Val AP : 0.8403
```
- Evaluation on the WIDER FACE dataset in the GPU environment (with MobileNet0.25 as the backbone)

  CKPT_FILE is the checkpoint path used for evaluation, e.g. './checkpoint/ckpt_0/RetinaFace-117_804.ckpt'.

```bash
export CUDA_VISIBLE_DEVICES=0
python eval.py --backbone_name 'MobileNet025' --val_model [CKPT_FILE] > eval.log 2>&1 &
```

The python command above runs in the background; you can view the results through the "eval.log" file. The accuracy on the test dataset is as follows:

```bash
# grep "Val AP" eval.log
Easy Val AP : 0.8877
Medium Val AP : 0.8698
Hard Val AP : 0.8005
```
## Export Process

### Export

Export the checkpoint file to a MINDIR model (with ResNet50 as the backbone):
```shell
python export.py --ckpt_file [CKPT_FILE]
```
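export.py also exposes optional flags (defined in its argparse setup later in this diff); a usage sketch with an explicit output name and format, reusing the example checkpoint path from the evaluation section:

```shell
# usage sketch; the checkpoint path is the example one from the evaluation section
python export.py --ckpt_file './train_parallel3/checkpoint/ckpt_3/RetinaFace-56_201.ckpt' \
                 --file_name retinaface --file_format MINDIR --device_id 0
```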
## Inference Process

### Inference

Before running inference, the model must be exported. MINDIR models can be exported in any environment; AIR models can only be exported on Ascend 910. The following shows an example of running inference with a MINDIR model.

- Inference on the WIDER FACE dataset on Ascend 310 (with ResNet50 as the backbone)

  The inference command is shown below, where 'MINDIR_PATH' is the MINDIR file path; 'DATASET_PATH' is the path of the inference dataset, e.g. '/home/dataset/widerface/val/'; and 'DEVICE_ID' is optional, with a default value of 0.

```shell
# Ascend310 inference
bash run_infer_310.sh [MINDIR_PATH] [DATASET_PATH] [DEVICE_ID]
```

The inference accuracy results are saved under the scripts directory; accuracy results like the following can be found in the acc.log file. The inference performance results are saved under the scripts/time_Result directory; performance results like the following can be found in the test_perform_static.txt file.
```bash
Easy Val AP : 0.9498
Medium Val AP : 0.9351
Hard Val AP : 0.8306
NN inference cost average time: 365.584 ms of infer_count 3226
```
# Model Description

## Performance

### Evaluation Performance

#### RetinaFace on WIDERFACE

| Parameters          | Ascend                                                       | GPU                                                  |
| ------------------- | ------------------------------------------------------------ | ---------------------------------------------------- |
| Model version       | RetinaFace + ResNet50                                         | RetinaFace + MobileNet0.25                           |
| Resource            | Ascend 910                                                    | Tesla V100-32G                                       |
| Upload date         | 2021-08-17                                                    | 2021-08-16                                           |
| MindSpore version   | 1.2.0                                                         | 1.4.0                                                |
| Dataset             | WIDERFACE                                                     | WIDERFACE                                            |
| Training parameters | epoch=60, steps=201, batch_size=8, lr=0.04 (0.04 for 8-device training; 0.01 recommended for a single device) | epoch=120, steps=804, batch_size=8, initial_lr=0.02 |
| Optimizer           | SGD                                                           | SGD                                                  |
| Loss function       | MultiBoxLoss + softmax cross entropy                          | MultiBoxLoss + softmax cross entropy                 |
| Output              | bounding boxes + confidence + landmarks                       | bounding boxes + confidence + landmarks              |
| Accuracy            | Easy: 0.9516; Medium: 0.9381; Hard: 0.8403                    | Easy: 0.8877; Medium: 0.8698; Hard: 0.8005           |
| Speed               | 1 device: 290 ms/step; 8 devices: 301 ms/step                 | 2 devices: 435 ms/step                               |
| Total time          | 8 devices: 1.05 hours                                         | 2 devices: 11.74 hours                               |
### Inference Performance

#### RetinaFace on WIDERFACE

| Parameters        | Ascend                                                         |
| ----------------- | -------------------------------------------------------------- |
| Model version     | RetinaFace + ResNet50                                           |
| Resource          | Ascend 310                                                      |
| Upload date       | 2021-08-17                                                      |
| MindSpore version | 1.4.0.20210805                                                  |
| Dataset           | WIDERFACE                                                       |
| Accuracy          | Easy: 0.9498; Medium: 0.9351; Hard: 0.8306                      |
| Speed             | NN inference cost average time: 365.584 ms of infer_count 3226  |
# Description of Random Situation

The random seed is set in train.py via the mindspore.common.seed.set_seed() function.
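For reference, a minimal sketch of that call; the concrete seed value lives in train.py and is an assumption here:

```python
# Sketch: fix global randomness (weight init, dataset shuffling) for reproducibility.
from mindspore.common.seed import set_seed

set_seed(1)  # assumed value; see train.py for the seed actually used
```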
# ModelZoo Homepage

Please check the official [homepage](https://gitee.com/mindspore/mindspore/tree/master/model_zoo).


@ -0,0 +1,14 @@
cmake_minimum_required(VERSION 3.14.1)
project(Ascend310Infer)
add_compile_definitions(_GLIBCXX_USE_CXX11_ABI=0)
set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -O0 -g -std=c++17 -Werror -Wall -fPIE -Wl,--allow-shlib-undefined")
set(PROJECT_SRC_ROOT ${CMAKE_CURRENT_LIST_DIR}/)
option(MINDSPORE_PATH "mindspore install path" "")
include_directories(${MINDSPORE_PATH})
include_directories(${MINDSPORE_PATH}/include)
include_directories(${PROJECT_SRC_ROOT})
find_library(MS_LIB libmindspore.so ${MINDSPORE_PATH}/lib)
file(GLOB_RECURSE MD_LIB ${MINDSPORE_PATH}/_c_dataengine*)
add_executable(main src/main.cc src/utils.cc)
target_link_libraries(main ${MS_LIB} ${MD_LIB} gflags)


@ -0,0 +1,29 @@
#!/bin/bash
# Copyright 2021 Huawei Technologies Co., Ltd
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ============================================================================
if [ -d out ]; then
rm -rf out
fi
mkdir out
cd out || exit
if [ -f "Makefile" ]; then
make clean
fi
cmake .. \
-DMINDSPORE_PATH="`pip3.7 show mindspore-ascend | grep Location | awk '{print $2"/mindspore"}' | xargs realpath`"
make


@ -0,0 +1,35 @@
/**
* Copyright 2021 Huawei Technologies Co., Ltd
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
#ifndef MINDSPORE_INFERENCE_UTILS_H_
#define MINDSPORE_INFERENCE_UTILS_H_
#include <sys/stat.h>
#include <dirent.h>
#include <vector>
#include <string>
#include <memory>
#include "include/api/types.h"
std::vector<std::string> GetAllFiles(std::string_view dirName);
DIR *OpenDir(std::string_view dirName);
std::string RealPath(std::string_view path);
mindspore::MSTensor ReadFileToTensor(const std::string &file);
int WriteResult(const std::string& imageFile, const std::vector<mindspore::MSTensor> &outputs);
std::vector<std::string> GetAllFiles(std::string dir_name);
std::vector<std::vector<std::string>> GetAllInputData(std::string dir_name);
#endif


@ -0,0 +1,190 @@
/**
* Copyright 2021 Huawei Technologies Co., Ltd
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
#include <sys/time.h>
#include <gflags/gflags.h>
#include <dirent.h>
#include <iostream>
#include <string>
#include <algorithm>
#include <iosfwd>
#include <vector>
#include <fstream>
#include <sstream>
#include "include/api/model.h"
#include "include/api/context.h"
#include "include/api/types.h"
#include "include/api/serialization.h"
#include "include/dataset/vision_ascend.h"
#include "include/dataset/execute.h"
#include "include/dataset/transforms.h"
#include "include/dataset/vision.h"
#include "inc/utils.h"
using mindspore::Context;
using mindspore::Serialization;
using mindspore::Model;
using mindspore::Status;
using mindspore::ModelType;
using mindspore::GraphCell;
using mindspore::kSuccess;
using mindspore::MSTensor;
using mindspore::dataset::Execute;
using mindspore::dataset::vision::Decode;
using mindspore::dataset::vision::Resize;
using mindspore::dataset::vision::CenterCrop;
using mindspore::dataset::vision::Normalize;
using mindspore::dataset::vision::HWC2CHW;
DEFINE_string(mindir_path, "", "mindir path");
DEFINE_string(input0_path, ".", "input0 path");
DEFINE_string(dataset_name, "widerface", "dataset name");
DEFINE_int32(device_id, 0, "device id");
int load_model(Model *model, std::vector<MSTensor> *model_inputs, std::string mindir_path, int device_id) {
if (RealPath(mindir_path).empty()) {
std::cout << "Invalid mindir" << std::endl;
return 1;
}
auto context = std::make_shared<Context>();
auto ascend310 = std::make_shared<mindspore::Ascend310DeviceInfo>();
ascend310->SetDeviceID(device_id);
context->MutableDeviceInfo().push_back(ascend310);
mindspore::Graph graph;
Serialization::Load(mindir_path, ModelType::kMindIR, &graph);
Status ret = model->Build(GraphCell(graph), context);
if (ret != kSuccess) {
std::cout << "ERROR: Build failed." << std::endl;
return 1;
}
*model_inputs = model->GetInputs();
if (model_inputs->empty()) {
std::cout << "Invalid model, inputs is empty." << std::endl;
return 1;
}
return 0;
}
int main(int argc, char **argv) {
gflags::ParseCommandLineFlags(&argc, &argv, true);
Model model;
std::vector<MSTensor> model_inputs;
// abort if the model fails to load or build, instead of predicting with an invalid model
if (load_model(&model, &model_inputs, FLAGS_mindir_path, FLAGS_device_id) != 0) {
    std::cout << "ERROR: load model failed." << std::endl;
    return 1;
}
std::map<double, double> costTime_map;
struct timeval start = {0};
struct timeval end = {0};
double startTimeMs;
double endTimeMs;
if (FLAGS_dataset_name == "widerface") {
auto input0_files = GetAllFiles(FLAGS_input0_path);
if (input0_files.empty()) {
std::cout << "ERROR: no input data." << std::endl;
return 1;
}
size_t size = input0_files.size();
for (size_t i = 0; i < size; ++i) {
std::vector<MSTensor> inputs;
std::vector<MSTensor> outputs;
std::cout << "Start predict input files:" << input0_files[i] <<std::endl;
auto input0 = ReadFileToTensor(input0_files[i]);
inputs.emplace_back(model_inputs[0].Name(), model_inputs[0].DataType(), model_inputs[0].Shape(),
input0.Data().get(), input0.DataSize());
gettimeofday(&start, nullptr);
Status ret = model.Predict(inputs, &outputs);
gettimeofday(&end, nullptr);
if (ret != kSuccess) {
std::cout << "Predict " << input0_files[i] << " failed." << std::endl;
return 1;
}
startTimeMs = (1.0 * start.tv_sec * 1000000 + start.tv_usec) / 1000;
endTimeMs = (1.0 * end.tv_sec * 1000000 + end.tv_usec) / 1000;
costTime_map.insert(std::pair<double, double>(startTimeMs, endTimeMs));
int rst = WriteResult(input0_files[i], outputs);
if (rst != 0) {
std::cout << "write result failed." << std::endl;
return rst;
}
}
} else {
auto input0_files = GetAllInputData(FLAGS_input0_path);
if (input0_files.empty()) {
std::cout << "ERROR: no input data." << std::endl;
return 1;
}
size_t size = input0_files.size();
for (size_t i = 0; i < size; ++i) {
for (size_t j = 0; j < input0_files[i].size(); ++j) {
std::vector<MSTensor> inputs;
std::vector<MSTensor> outputs;
std::cout << "Start predict input files:" << input0_files[i][j] <<std::endl;
auto decode = Decode();
auto resize = Resize({256, 256});
auto centercrop = CenterCrop({224, 224});
auto normalize = Normalize({123.675, 116.28, 103.53}, {58.395, 57.12, 57.375});
auto hwc2chw = HWC2CHW();
Execute SingleOp({decode, resize, centercrop, normalize, hwc2chw});
auto imgDvpp = std::make_shared<MSTensor>();
SingleOp(ReadFileToTensor(input0_files[i][j]), imgDvpp.get());
inputs.emplace_back(model_inputs[0].Name(), model_inputs[0].DataType(), model_inputs[0].Shape(),
imgDvpp->Data().get(), imgDvpp->DataSize());
gettimeofday(&start, nullptr);
Status ret = model.Predict(inputs, &outputs);
gettimeofday(&end, nullptr);
if (ret != kSuccess) {
std::cout << "Predict " << input0_files[i][j] << " failed." << std::endl;
return 1;
}
startTimeMs = (1.0 * start.tv_sec * 1000000 + start.tv_usec) / 1000;
endTimeMs = (1.0 * end.tv_sec * 1000000 + end.tv_usec) / 1000;
costTime_map.insert(std::pair<double, double>(startTimeMs, endTimeMs));
int rst = WriteResult(input0_files[i][j], outputs);
if (rst != 0) {
std::cout << "write result failed." << std::endl;
return rst;
}
}
}
}
double average = 0.0;
int inferCount = 0;
for (auto iter = costTime_map.begin(); iter != costTime_map.end(); iter++) {
double diff = 0.0;
diff = iter->second - iter->first;
average += diff;
inferCount++;
}
average = average / inferCount;
std::stringstream timeCost;
timeCost << "NN inference cost average time: "<< average << " ms of infer_count " << inferCount << std::endl;
std::cout << "NN inference cost average time: "<< average << "ms of infer_count " << inferCount << std::endl;
std::string fileName = "./time_Result" + std::string("/test_perform_static.txt");
std::ofstream fileStream(fileName.c_str(), std::ios::trunc);
fileStream << timeCost.str();
fileStream.close();
costTime_map.clear();
return 0;
}


@ -0,0 +1,197 @@
/**
* Copyright 2021 Huawei Technologies Co., Ltd
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
#include <fstream>
#include <algorithm>
#include <iostream>
#include "inc/utils.h"
using mindspore::MSTensor;
using mindspore::DataType;
std::vector<std::vector<std::string>> GetAllInputData(std::string dir_name) {
std::vector<std::vector<std::string>> ret;
DIR *dir = OpenDir(dir_name);
if (dir == nullptr) {
return {};
}
struct dirent *filename;
/* read all the files in the dir ~ */
std::vector<std::string> sub_dirs;
while ((filename = readdir(dir)) != nullptr) {
std::string d_name = std::string(filename->d_name);
// get rid of "." and ".."
if (d_name == "." || d_name == ".." || d_name.empty()) {
continue;
}
std::string dir_path = RealPath(std::string(dir_name) + "/" + filename->d_name);
struct stat s;
lstat(dir_path.c_str(), &s);
if (!S_ISDIR(s.st_mode)) {
continue;
}
sub_dirs.emplace_back(dir_path);
}
std::sort(sub_dirs.begin(), sub_dirs.end());
(void)std::transform(sub_dirs.begin(), sub_dirs.end(), std::back_inserter(ret),
[](const std::string &d) { return GetAllFiles(d); });
return ret;
}
std::vector<std::string> GetAllFiles(std::string dir_name) {
struct dirent *filename;
DIR *dir = OpenDir(dir_name);
if (dir == nullptr) {
return {};
}
std::vector<std::string> res;
while ((filename = readdir(dir)) != nullptr) {
std::string d_name = std::string(filename->d_name);
if (d_name == "." || d_name == ".." || d_name.size() <= 3) {
continue;
}
res.emplace_back(std::string(dir_name) + "/" + filename->d_name);
}
std::sort(res.begin(), res.end());
return res;
}
std::vector<std::string> GetAllFiles(std::string_view dirName) {
struct dirent *filename;
DIR *dir = OpenDir(dirName);
if (dir == nullptr) {
return {};
}
std::vector<std::string> res;
while ((filename = readdir(dir)) != nullptr) {
std::string dName = std::string(filename->d_name);
if (dName == "." || dName == ".." || filename->d_type != DT_REG) {
continue;
}
res.emplace_back(std::string(dirName) + "/" + filename->d_name);
}
std::sort(res.begin(), res.end());
for (auto &f : res) {
std::cout << "image file: " << f << std::endl;
}
return res;
}
int WriteResult(const std::string& imageFile, const std::vector<MSTensor> &outputs) {
std::string homePath = "./result_Files";
const int INVALID_POINTER = -1;
const int ERROR = -2;
for (size_t i = 0; i < outputs.size(); ++i) {
size_t outputSize;
std::shared_ptr<const void> netOutput;
netOutput = outputs[i].Data();
outputSize = outputs[i].DataSize();
int pos = imageFile.rfind('/');
std::string fileName(imageFile, pos + 1);
fileName.replace(fileName.find('.'), fileName.size() - fileName.find('.'), '_' + std::to_string(i) + ".bin");
std::string outFileName = homePath + "/" + fileName;
FILE *outputFile = fopen(outFileName.c_str(), "wb");
if (outputFile == nullptr) {
std::cout << "open result file " << outFileName << " failed" << std::endl;
return INVALID_POINTER;
}
size_t size = fwrite(netOutput.get(), sizeof(char), outputSize, outputFile);
if (size != outputSize) {
fclose(outputFile);
outputFile = nullptr;
std::cout << "write result file " << outFileName << " failed, write size[" << size <<
"] is smaller than output size[" << outputSize << "], maybe the disk is full." << std::endl;
return ERROR;
}
fclose(outputFile);
outputFile = nullptr;
}
return 0;
}
mindspore::MSTensor ReadFileToTensor(const std::string &file) {
if (file.empty()) {
std::cout << "Pointer file is nullptr" << std::endl;
return mindspore::MSTensor();
}
std::ifstream ifs(file);
if (!ifs.good()) {
std::cout << "File: " << file << " is not exist" << std::endl;
return mindspore::MSTensor();
}
if (!ifs.is_open()) {
std::cout << "File: " << file << "open failed" << std::endl;
return mindspore::MSTensor();
}
ifs.seekg(0, std::ios::end);
size_t size = ifs.tellg();
mindspore::MSTensor buffer(file, mindspore::DataType::kNumberTypeUInt8, {static_cast<int64_t>(size)}, nullptr, size);
ifs.seekg(0, std::ios::beg);
ifs.read(reinterpret_cast<char *>(buffer.MutableData()), size);
ifs.close();
return buffer;
}
DIR *OpenDir(std::string_view dirName) {
if (dirName.empty()) {
std::cout << " dirName is null ! " << std::endl;
return nullptr;
}
std::string realPath = RealPath(dirName);
struct stat s;
lstat(realPath.c_str(), &s);
if (!S_ISDIR(s.st_mode)) {
std::cout << "dirName is not a valid directory !" << std::endl;
return nullptr;
}
DIR *dir;
dir = opendir(realPath.c_str());
if (dir == nullptr) {
std::cout << "Can not open dir " << dirName << std::endl;
return nullptr;
}
std::cout << "Successfully opened the dir " << dirName << std::endl;
return dir;
}
std::string RealPath(std::string_view path) {
char realPathMem[PATH_MAX] = {0};
char *realPathRet = nullptr;
realPathRet = realpath(path.data(), realPathMem);
if (realPathRet == nullptr) {
std::cout << "File: " << path << " is not exist.";
return "";
}
std::string realPath(realPathMem);
std::cout << path << " realpath is: " << realPath << std::endl;
return realPath;
}


@ -0,0 +1,558 @@
# Copyright 2021 Huawei Technologies Co., Ltd
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ============================================================================
"""Eval Retinaface_resnet50_or_mobilenet0.25."""
import argparse
import os
import time
import datetime
import numpy as np
import cv2
from mindspore import Tensor, context
from mindspore.train.serialization import load_checkpoint, load_param_into_net
from src.config import cfg_res50, cfg_mobile025
from src.utils import decode_bbox, prior_box
class Timer():
def __init__(self):
self.start_time = 0.
self.diff = 0.
def start(self):
self.start_time = time.time()
def end(self):
self.diff = time.time() - self.start_time
class DetectionEngine:
"""DetectionEngine"""
def __init__(self, cfg):
self.results = {}
self.nms_thresh = cfg['val_nms_threshold']
self.conf_thresh = cfg['val_confidence_threshold']
self.iou_thresh = cfg['val_iou_threshold']
self.var = cfg['variance']
self.save_prefix = cfg['val_predict_save_folder']
self.gt_dir = cfg['val_gt_dir']
def _iou(self, a, b):
"""_iou"""
A = a.shape[0]
B = b.shape[0]
max_xy = np.minimum(
np.broadcast_to(np.expand_dims(a[:, 2:4], 1), [A, B, 2]),
np.broadcast_to(np.expand_dims(b[:, 2:4], 0), [A, B, 2]))
min_xy = np.maximum(
np.broadcast_to(np.expand_dims(a[:, 0:2], 1), [A, B, 2]),
np.broadcast_to(np.expand_dims(b[:, 0:2], 0), [A, B, 2]))
inter = np.maximum((max_xy - min_xy + 1), np.zeros_like(max_xy - min_xy))
inter = inter[:, :, 0] * inter[:, :, 1]
area_a = np.broadcast_to(
np.expand_dims(
(a[:, 2] - a[:, 0] + 1) * (a[:, 3] - a[:, 1] + 1), 1),
np.shape(inter))
area_b = np.broadcast_to(
np.expand_dims(
(b[:, 2] - b[:, 0] + 1) * (b[:, 3] - b[:, 1] + 1), 0),
np.shape(inter))
union = area_a + area_b - inter
return inter / union
def _nms(self, boxes, threshold=0.5):
"""_nms"""
x1 = boxes[:, 0]
y1 = boxes[:, 1]
x2 = boxes[:, 2]
y2 = boxes[:, 3]
scores = boxes[:, 4]
areas = (x2 - x1 + 1) * (y2 - y1 + 1)
order = scores.argsort()[::-1]
reserved_boxes = []
while order.size > 0:
i = order[0]
reserved_boxes.append(i)
max_x1 = np.maximum(x1[i], x1[order[1:]])
max_y1 = np.maximum(y1[i], y1[order[1:]])
min_x2 = np.minimum(x2[i], x2[order[1:]])
min_y2 = np.minimum(y2[i], y2[order[1:]])
intersect_w = np.maximum(0.0, min_x2 - max_x1 + 1)
intersect_h = np.maximum(0.0, min_y2 - max_y1 + 1)
intersect_area = intersect_w * intersect_h
ovr = intersect_area / (areas[i] + areas[order[1:]] - intersect_area)
indices = np.where(ovr <= threshold)[0]
order = order[indices + 1]
return reserved_boxes
def write_result(self):
"""write_result"""
# save result to file.
import json
t = datetime.datetime.now().strftime('_%Y_%m_%d_%H_%M_%S')
try:
if not os.path.isdir(self.save_prefix):
os.makedirs(self.save_prefix)
self.file_path = self.save_prefix + '/predict' + t + '.json'
f = open(self.file_path, 'w')
json.dump(self.results, f)
except IOError as e:
raise RuntimeError("Unable to open json file to dump. What(): {}".format(str(e)))
else:
f.close()
return self.file_path
def detect(self, boxes, confs, resize, scale, image_path, priors):
"""detect"""
if boxes.shape[0] == 0:
# add to result
event_name, img_name = image_path.split('/')
self.results[event_name][img_name[:-4]] = {'img_path': image_path,
'bboxes': []}
return
boxes = decode_bbox(np.squeeze(boxes.asnumpy(), 0), priors, self.var)
boxes = boxes * scale / resize
scores = np.squeeze(confs.asnumpy(), 0)[:, 1]
# ignore low scores
inds = np.where(scores > self.conf_thresh)[0]
boxes = boxes[inds]
scores = scores[inds]
# keep top-K before NMS
order = scores.argsort()[::-1]
boxes = boxes[order]
scores = scores[order]
# do NMS
dets = np.hstack((boxes, scores[:, np.newaxis])).astype(np.float32, copy=False)
keep = self._nms(dets, self.nms_thresh)
dets = dets[keep, :]
dets[:, 2:4] = (dets[:, 2:4].astype(np.int) - dets[:, 0:2].astype(np.int)).astype(np.float) # int
dets[:, 0:4] = dets[:, 0:4].astype(np.int).astype(np.float) # int
# add to result
event_name, img_name = image_path.split('/')
if event_name not in self.results.keys():
self.results[event_name] = {}
self.results[event_name][img_name[:-4]] = {'img_path': image_path,
'bboxes': dets[:, :5].astype(np.float).tolist()}
def _get_gt_boxes(self):
"""_get_gt_boxes"""
from scipy.io import loadmat
gt = loadmat(os.path.join(self.gt_dir, 'wider_face_val.mat'))
hard = loadmat(os.path.join(self.gt_dir, 'wider_hard_val.mat'))
medium = loadmat(os.path.join(self.gt_dir, 'wider_medium_val.mat'))
easy = loadmat(os.path.join(self.gt_dir, 'wider_easy_val.mat'))
faceboxes = gt['face_bbx_list']
events = gt['event_list']
files = gt['file_list']
hard_gt_list = hard['gt_list']
medium_gt_list = medium['gt_list']
easy_gt_list = easy['gt_list']
return faceboxes, events, files, hard_gt_list, medium_gt_list, easy_gt_list
def _norm_pre_score(self):
"""_norm_pre_score"""
max_score = 0
min_score = 1
for event in self.results:
for name in self.results[event].keys():
bbox = np.array(self.results[event][name]['bboxes']).astype(np.float)
if bbox.shape[0] <= 0:
continue
max_score = max(max_score, np.max(bbox[:, -1]))
min_score = min(min_score, np.min(bbox[:, -1]))
length = max_score - min_score
for event in self.results:
for name in self.results[event].keys():
bbox = np.array(self.results[event][name]['bboxes']).astype(np.float)
if bbox.shape[0] <= 0:
continue
bbox[:, -1] -= min_score
bbox[:, -1] /= length
self.results[event][name]['bboxes'] = bbox.tolist()
def _image_eval(self, predict, gt, keep, iou_thresh, section_num):
"""_image_eval"""
_predict = predict.copy()
_gt = gt.copy()
image_p_right = np.zeros(_predict.shape[0])
image_gt_right = np.zeros(_gt.shape[0])
proposal = np.ones(_predict.shape[0])
# x1y1wh -> x1y1x2y2
_predict[:, 2:4] = _predict[:, 0:2] + _predict[:, 2:4]
_gt[:, 2:4] = _gt[:, 0:2] + _gt[:, 2:4]
ious = self._iou(_predict[:, 0:4], _gt[:, 0:4])
for i in range(_predict.shape[0]):
gt_ious = ious[i, :]
max_iou, max_index = gt_ious.max(), gt_ious.argmax()
if max_iou >= iou_thresh:
if keep[max_index] == 0:
image_gt_right[max_index] = -1
proposal[i] = -1
elif image_gt_right[max_index] == 0:
image_gt_right[max_index] = 1
right_index = np.where(image_gt_right == 1)[0]
image_p_right[i] = len(right_index)
image_pr = np.zeros((section_num, 2), dtype=np.float)
for section in range(section_num):
_thresh = 1 - (section + 1)/section_num
over_score_index = np.where(predict[:, 4] >= _thresh)[0]
if over_score_index.shape[0] <= 0:
image_pr[section, 0] = 0
image_pr[section, 1] = 0
else:
index = over_score_index[-1]
p_num = len(np.where(proposal[0:(index+1)] == 1)[0])
image_pr[section, 0] = p_num
image_pr[section, 1] = image_p_right[index]
return image_pr
def get_eval_result(self):
"""get_eval_result"""
self._norm_pre_score()
facebox_list, event_list, file_list, hard_gt_list, medium_gt_list, easy_gt_list = self._get_gt_boxes()
section_num = 1000
sets = ['easy', 'medium', 'hard']
set_gts = [easy_gt_list, medium_gt_list, hard_gt_list]
ap_key_dict = {0: "Easy Val AP : ", 1: "Medium Val AP : ", 2: "Hard Val AP : ",}
ap_dict = {}
for _set in range(len(sets)):
gt_list = set_gts[_set]
count_gt = 0
pr_curve = np.zeros((section_num, 2), dtype=np.float)
for i, _ in enumerate(event_list):
event = str(event_list[i][0][0])
image_list = file_list[i][0]
event_predict_dict = self.results[event]
event_gt_index_list = gt_list[i][0]
event_gt_box_list = facebox_list[i][0]
for j, _ in enumerate(image_list):
predict = np.array(event_predict_dict[str(image_list[j][0][0])]['bboxes']).astype(np.float)
gt_boxes = event_gt_box_list[j][0].astype('float')
keep_index = event_gt_index_list[j][0]
count_gt += len(keep_index)
if gt_boxes.shape[0] <= 0 or predict.shape[0] <= 0:
continue
keep = np.zeros(gt_boxes.shape[0])
if keep_index.shape[0] > 0:
keep[keep_index-1] = 1
image_pr = self._image_eval(predict, gt_boxes, keep,
iou_thresh=self.iou_thresh,
section_num=section_num)
pr_curve += image_pr
precision = pr_curve[:, 1] / pr_curve[:, 0]
recall = pr_curve[:, 1] / count_gt
precision = np.concatenate((np.array([0.]), precision, np.array([0.])))
recall = np.concatenate((np.array([0.]), recall, np.array([1.])))
for i in range(precision.shape[0]-1, 0, -1):
precision[i-1] = np.maximum(precision[i-1], precision[i])
index = np.where(recall[1:] != recall[:-1])[0]
ap = np.sum((recall[index + 1] - recall[index]) * precision[index + 1])
print(ap_key_dict[_set] + '{:.4f}'.format(ap))
return ap_dict
def val_with_resnet(args_opt):
"""val_with_resnet"""
from src.network_with_resnet import RetinaFace, resnet50
cfg = cfg_res50
context.set_context(mode=context.GRAPH_MODE, device_target='Ascend', device_id=cfg['device_id'], save_graphs=False)
backbone = resnet50(1001)
network = RetinaFace(phase='predict', backbone=backbone)
backbone.set_train(False)
network.set_train(False)
# load checkpoint
assert args_opt.val_model is not None, 'val_model is None.'
param_dict = load_checkpoint(args_opt.val_model)
print('Load trained model done. {}'.format(args_opt.val_model))
network.init_parameters_data()
load_param_into_net(network, param_dict)
# testing dataset
testset_folder = cfg['val_dataset_folder']
testset_label_path = cfg['val_dataset_folder'] + "label.txt"
with open(testset_label_path, 'r') as f:
_test_dataset = f.readlines()
test_dataset = []
for im_path in _test_dataset:
if im_path.startswith('# '):
test_dataset.append(im_path[2:-1]) # delete '# ...\n'
num_images = len(test_dataset)
timers = {'forward_time': Timer(), 'misc': Timer()}
if cfg['val_origin_size']:
h_max, w_max = 0, 0
for img_name in test_dataset:
image_path = os.path.join(testset_folder, 'images', img_name)
_img = cv2.imread(image_path, cv2.IMREAD_COLOR)
if _img.shape[0] > h_max:
h_max = _img.shape[0]
if _img.shape[1] > w_max:
w_max = _img.shape[1]
h_max = (int(h_max / 32) + 1) * 32
w_max = (int(w_max / 32) + 1) * 32
priors = prior_box(image_sizes=(h_max, w_max),
min_sizes=[[16, 32], [64, 128], [256, 512]],
steps=[8, 16, 32],
clip=False)
else:
target_size = 1600
max_size = 2176
priors = prior_box(image_sizes=(max_size, max_size),
min_sizes=[[16, 32], [64, 128], [256, 512]],
steps=[8, 16, 32],
clip=False)
# init detection engine
detection = DetectionEngine(cfg)
# testing begin
print('Predict box starting')
for i, img_name in enumerate(test_dataset):
image_path = os.path.join(testset_folder, 'images', img_name)
img_raw = cv2.imread(image_path, cv2.IMREAD_COLOR)
img = np.float32(img_raw)
# testing scale
if cfg['val_origin_size']:
resize = 1
assert img.shape[0] <= h_max and img.shape[1] <= w_max
image_t = np.empty((h_max, w_max, 3), dtype=img.dtype)
image_t[:, :] = (104.0, 117.0, 123.0)
image_t[0:img.shape[0], 0:img.shape[1]] = img
img = image_t
else:
im_size_min = np.min(img.shape[0:2])
im_size_max = np.max(img.shape[0:2])
resize = float(target_size) / float(im_size_min)
# prevent bigger axis from being more than max_size:
if np.round(resize * im_size_max) > max_size:
resize = float(max_size) / float(im_size_max)
img = cv2.resize(img, None, None, fx=resize, fy=resize, interpolation=cv2.INTER_LINEAR)
assert img.shape[0] <= max_size and img.shape[1] <= max_size
image_t = np.empty((max_size, max_size, 3), dtype=img.dtype)
image_t[:, :] = (104.0, 117.0, 123.0)
image_t[0:img.shape[0], 0:img.shape[1]] = img
img = image_t
scale = np.array([img.shape[1], img.shape[0], img.shape[1], img.shape[0]], dtype=img.dtype)
img -= (104, 117, 123)
img = img.transpose(2, 0, 1)
img = np.expand_dims(img, 0)
img = Tensor(img)
timers['forward_time'].start()
boxes, confs, _ = network(img)
timers['forward_time'].end()
timers['misc'].start()
detection.detect(boxes, confs, resize, scale, img_name, priors)
timers['misc'].end()
print('im_detect: {:d}/{:d} forward_pass_time: {:.4f}s misc: {:.4f}s'.format(i + 1, num_images,
timers['forward_time'].diff,
timers['misc'].diff))
print('Predict box done.')
print('Eval starting')
if cfg['val_save_result']:
# Save the predict result if you want.
predict_result_path = detection.write_result()
print('predict result path is {}'.format(predict_result_path))
detection.get_eval_result()
print(args_opt.val_model)
print('Eval done.')
def val_with_mobilenet(args_opt):
"""val_with_mobilenet"""
from src.network_with_mobilenet import RetinaFace, resnet50, mobilenet025
context.set_context(mode=context.GRAPH_MODE, device_target='GPU', save_graphs=False)
cfg = cfg_mobile025
if cfg['name'] == 'ResNet50':
backbone = resnet50(1001)
elif cfg['name'] == 'MobileNet025':
backbone = mobilenet025(1000)
network = RetinaFace(phase='predict', backbone=backbone, cfg=cfg)
backbone.set_train(False)
network.set_train(False)
# load checkpoint
assert args_opt.val_model is not None, 'val_model is None.'
param_dict = load_checkpoint(args_opt.val_model)
print('Load trained model done. {}'.format(args_opt.val_model))
network.init_parameters_data()
load_param_into_net(network, param_dict)
# testing dataset
testset_folder = cfg['val_dataset_folder']
testset_label_path = cfg['val_dataset_folder'] + "label.txt"
with open(testset_label_path, 'r') as f:
_test_dataset = f.readlines()
test_dataset = []
for im_path in _test_dataset:
if im_path.startswith('# '):
test_dataset.append(im_path[2:-1]) # delete '# ...\n'
num_images = len(test_dataset)
timers = {'forward_time': Timer(), 'misc': Timer()}
if cfg['val_origin_size']:
h_max, w_max = 0, 0
for img_name in test_dataset:
image_path = os.path.join(testset_folder, 'images', img_name)
_img = cv2.imread(image_path, cv2.IMREAD_COLOR)
if _img.shape[0] > h_max:
h_max = _img.shape[0]
if _img.shape[1] > w_max:
w_max = _img.shape[1]
h_max = (int(h_max / 32) + 1) * 32
w_max = (int(w_max / 32) + 1) * 32
priors = prior_box(image_sizes=(h_max, w_max),
min_sizes=[[16, 32], [64, 128], [256, 512]],
steps=[8, 16, 32],
clip=False)
else:
target_size = 1600
max_size = 2176
priors = prior_box(image_sizes=(max_size, max_size),
min_sizes=[[16, 32], [64, 128], [256, 512]],
steps=[8, 16, 32],
clip=False)
# init detection engine
detection = DetectionEngine(cfg)
# testing begin
print('Predict box starting')
for i, img_name in enumerate(test_dataset):
image_path = os.path.join(testset_folder, 'images', img_name)
img_raw = cv2.imread(image_path, cv2.IMREAD_COLOR)
img = np.float32(img_raw)
# testing scale
if cfg['val_origin_size']:
resize = 1
assert img.shape[0] <= h_max and img.shape[1] <= w_max
image_t = np.empty((h_max, w_max, 3), dtype=img.dtype)
image_t[:, :] = (104.0, 117.0, 123.0)
image_t[0:img.shape[0], 0:img.shape[1]] = img
img = image_t
else:
im_size_min = np.min(img.shape[0:2])
im_size_max = np.max(img.shape[0:2])
resize = float(target_size) / float(im_size_min)
# prevent bigger axis from being more than max_size:
if np.round(resize * im_size_max) > max_size:
resize = float(max_size) / float(im_size_max)
img = cv2.resize(img, None, None, fx=resize, fy=resize, interpolation=cv2.INTER_LINEAR)
assert img.shape[0] <= max_size and img.shape[1] <= max_size
image_t = np.empty((max_size, max_size, 3), dtype=img.dtype)
image_t[:, :] = (104.0, 117.0, 123.0)
image_t[0:img.shape[0], 0:img.shape[1]] = img
img = image_t
scale = np.array([img.shape[1], img.shape[0], img.shape[1], img.shape[0]], dtype=img.dtype)
img -= (104, 117, 123)
img = img.transpose(2, 0, 1)
img = np.expand_dims(img, 0)
img = Tensor(img)
timers['forward_time'].start()
boxes, confs, _ = network(img)
timers['forward_time'].end()
timers['misc'].start()
detection.detect(boxes, confs, resize, scale, img_name, priors)
timers['misc'].end()
print('im_detect: {:d}/{:d} forward_pass_time: {:.4f}s misc: {:.4f}s'.format(i + 1, num_images,
timers['forward_time'].diff,
timers['misc'].diff))
print('Predict box done.')
print('Eval starting')
if cfg['val_save_result']:
# Save the predict result if you want.
predict_result_path = detection.write_result()
print('predict result path is {}'.format(predict_result_path))
detection.get_eval_result()
print('Eval done.')
if __name__ == '__main__':
parser = argparse.ArgumentParser(description='val')
parser.add_argument('--backbone_name', type=str, default='ResNet50',
help='backbone name')
parser.add_argument('--val_model', type=str, default='./train_parallel3/checkpoint/ckpt_3/RetinaFace-56_201.ckpt',
help='val_model location')
args = parser.parse_args()
if args.backbone_name == 'ResNet50':
    val_with_resnet(args_opt=args)
elif args.backbone_name == 'MobileNet025':
    val_with_mobilenet(args_opt=args)


@ -0,0 +1,62 @@
# Copyright 2021 Huawei Technologies Co., Ltd
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ============================================================================
"""
##############export checkpoint file into air, onnx or mindir model#################
python export.py
"""
import argparse
import numpy as np
from mindspore import Tensor, load_checkpoint, load_param_into_net, export, context
from src.network_with_resnet import RetinaFace, resnet50
from src.config import cfg_res50
parser = argparse.ArgumentParser(description='retinaface export')
parser.add_argument("--device_id", type=int, default=0, help="Device id")
parser.add_argument("--batch_size", type=int, default=1, help="batch size")
parser.add_argument("--ckpt_file", type=str, required=True, help="Checkpoint file path.")
parser.add_argument("--file_name", type=str, default="retinaface", help="output file name.")
parser.add_argument("--file_format", type=str, choices=["AIR", "ONNX", "MINDIR"], default="MINDIR", help="file format")
parser.add_argument("--device_target", type=str, default="Ascend",
choices=["Ascend", "GPU", "CPU"], help="device target(default: Ascend)")
args = parser.parse_args()
context.set_context(mode=context.GRAPH_MODE, device_target=args.device_target)
if args.device_target == "Ascend":
context.set_context(device_id=args.device_id)
def export_net():
"""export net"""
if cfg_res50['val_origin_size']:
height, width = 5568, 1056
else:
height, width = 2176, 2176
backbone = resnet50(1001)
net = RetinaFace(phase='predict', backbone=backbone)
backbone.set_train(False)
net.set_train(False)
assert args.ckpt_file is not None, "checkpoint_path is None."
param_dict = load_checkpoint(args.ckpt_file)
net.init_parameters_data()
load_param_into_net(net, param_dict)
input_arr = Tensor(np.zeros([args.batch_size, 3, height, width], np.float32))
export(net, input_arr, file_name=args.file_name, file_format=args.file_format)
if __name__ == '__main__':
export_net()


@ -0,0 +1,423 @@
# Copyright 2021 Huawei Technologies Co., Ltd
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ============================================================================
"""Infer Retinaface_resnet50."""
from __future__ import print_function
import argparse
import os
import time
import datetime
import numpy as np
import cv2
from mindspore import context
from src.config import cfg_res50
from src.utils import decode_bbox, prior_box
class Timer():
def __init__(self):
self.start_time = 0.
self.diff = 0.
def start(self):
self.start_time = time.time()
def end(self):
self.diff = time.time() - self.start_time
class DetectionEngine:
"""DetectionEngine"""
def __init__(self, cfg):
self.results = {}
self.nms_thresh = cfg['val_nms_threshold']
self.conf_thresh = cfg['val_confidence_threshold']
self.iou_thresh = cfg['val_iou_threshold']
self.var = cfg['variance']
self.save_prefix = cfg['val_predict_save_folder']
self.gt_dir = cfg['infer_gt_dir']
def _iou(self, a, b):
"""_iou"""
A = a.shape[0]
B = b.shape[0]
max_xy = np.minimum(
np.broadcast_to(np.expand_dims(a[:, 2:4], 1), [A, B, 2]),
np.broadcast_to(np.expand_dims(b[:, 2:4], 0), [A, B, 2]))
min_xy = np.maximum(
np.broadcast_to(np.expand_dims(a[:, 0:2], 1), [A, B, 2]),
np.broadcast_to(np.expand_dims(b[:, 0:2], 0), [A, B, 2]))
inter = np.maximum((max_xy - min_xy + 1), np.zeros_like(max_xy - min_xy))
inter = inter[:, :, 0] * inter[:, :, 1]
area_a = np.broadcast_to(
np.expand_dims(
(a[:, 2] - a[:, 0] + 1) * (a[:, 3] - a[:, 1] + 1), 1),
np.shape(inter))
area_b = np.broadcast_to(
np.expand_dims(
(b[:, 2] - b[:, 0] + 1) * (b[:, 3] - b[:, 1] + 1), 0),
np.shape(inter))
union = area_a + area_b - inter
return inter / union
def _nms(self, boxes, threshold=0.5):
"""_nms"""
x1 = boxes[:, 0]
y1 = boxes[:, 1]
x2 = boxes[:, 2]
y2 = boxes[:, 3]
scores = boxes[:, 4]
areas = (x2 - x1 + 1) * (y2 - y1 + 1)
order = scores.argsort()[::-1]
reserved_boxes = []
while order.size > 0:
i = order[0]
reserved_boxes.append(i)
max_x1 = np.maximum(x1[i], x1[order[1:]])
max_y1 = np.maximum(y1[i], y1[order[1:]])
min_x2 = np.minimum(x2[i], x2[order[1:]])
min_y2 = np.minimum(y2[i], y2[order[1:]])
intersect_w = np.maximum(0.0, min_x2 - max_x1 + 1)
intersect_h = np.maximum(0.0, min_y2 - max_y1 + 1)
intersect_area = intersect_w * intersect_h
ovr = intersect_area / (areas[i] + areas[order[1:]] - intersect_area)
indices = np.where(ovr <= threshold)[0]
order = order[indices + 1]
return reserved_boxes
def write_result(self):
"""write_result"""
# save result to file.
import json
t = datetime.datetime.now().strftime('_%Y_%m_%d_%H_%M_%S')
try:
if not os.path.isdir(self.save_prefix):
os.makedirs(self.save_prefix)
self.file_path = self.save_prefix + '/predict' + t + '.json'
f = open(self.file_path, 'w')
json.dump(self.results, f)
except IOError as e:
raise RuntimeError("Unable to open json file to dump. What(): {}".format(str(e)))
else:
f.close()
return self.file_path
def detect(self, boxes, confs, resize, scale, image_path, priors):
"""detect"""
if boxes.shape[0] == 0:
# add to result
event_name, img_name = image_path.split('/')
self.results[event_name][img_name[:-4]] = {'img_path': image_path,
'bboxes': []}
return
boxes = decode_bbox(np.squeeze(boxes, 0), priors, self.var)
boxes = boxes * scale / resize
scores = np.squeeze(confs, 0)[:, 1]
# ignore low scores
inds = np.where(scores > self.conf_thresh)[0]
boxes = boxes[inds]
scores = scores[inds]
# keep top-K before NMS
order = scores.argsort()[::-1]
boxes = boxes[order]
scores = scores[order]
# do NMS
dets = np.hstack((boxes, scores[:, np.newaxis])).astype(np.float32, copy=False)
keep = self._nms(dets, self.nms_thresh)
dets = dets[keep, :]
dets[:, 2:4] = (dets[:, 2:4].astype(np.int) - dets[:, 0:2].astype(np.int)).astype(np.float) # int
dets[:, 0:4] = dets[:, 0:4].astype(np.int).astype(np.float) # int
# add to result
event_name, img_name = image_path.split('/')
if event_name not in self.results.keys():
self.results[event_name] = {}
self.results[event_name][img_name[:-4]] = {'img_path': image_path,
'bboxes': dets[:, :5].astype(np.float).tolist()}
def _get_gt_boxes(self):
"""_get_gt_boxes"""
from scipy.io import loadmat
gt = loadmat(os.path.join(self.gt_dir, 'wider_face_val.mat'))
hard = loadmat(os.path.join(self.gt_dir, 'wider_hard_val.mat'))
medium = loadmat(os.path.join(self.gt_dir, 'wider_medium_val.mat'))
easy = loadmat(os.path.join(self.gt_dir, 'wider_easy_val.mat'))
faceboxes = gt['face_bbx_list']
events = gt['event_list']
files = gt['file_list']
hard_gt_list = hard['gt_list']
medium_gt_list = medium['gt_list']
easy_gt_list = easy['gt_list']
return faceboxes, events, files, hard_gt_list, medium_gt_list, easy_gt_list
def _norm_pre_score(self):
"""_norm_pre_score"""
max_score = 0
min_score = 1
for event in self.results:
for name in self.results[event].keys():
bbox = np.array(self.results[event][name]['bboxes']).astype(np.float)
if bbox.shape[0] <= 0:
continue
max_score = max(max_score, np.max(bbox[:, -1]))
min_score = min(min_score, np.min(bbox[:, -1]))
length = max_score - min_score
for event in self.results:
for name in self.results[event].keys():
bbox = np.array(self.results[event][name]['bboxes']).astype(np.float)
if bbox.shape[0] <= 0:
continue
bbox[:, -1] -= min_score
bbox[:, -1] /= length
self.results[event][name]['bboxes'] = bbox.tolist()
def _image_eval(self, predict, gt, keep, iou_thresh, section_num):
"""_image_eval"""
_predict = predict.copy()
_gt = gt.copy()
image_p_right = np.zeros(_predict.shape[0])
image_gt_right = np.zeros(_gt.shape[0])
proposal = np.ones(_predict.shape[0])
# x1y1wh -> x1y1x2y2
_predict[:, 2:4] = _predict[:, 0:2] + _predict[:, 2:4]
_gt[:, 2:4] = _gt[:, 0:2] + _gt[:, 2:4]
ious = self._iou(_predict[:, 0:4], _gt[:, 0:4])
for i in range(_predict.shape[0]):
gt_ious = ious[i, :]
max_iou, max_index = gt_ious.max(), gt_ious.argmax()
if max_iou >= iou_thresh:
if keep[max_index] == 0:
image_gt_right[max_index] = -1
proposal[i] = -1
elif image_gt_right[max_index] == 0:
image_gt_right[max_index] = 1
right_index = np.where(image_gt_right == 1)[0]
image_p_right[i] = len(right_index)
image_pr = np.zeros((section_num, 2), dtype=np.float)
for section in range(section_num):
_thresh = 1 - (section + 1)/section_num
over_score_index = np.where(predict[:, 4] >= _thresh)[0]
if over_score_index.shape[0] <= 0:
image_pr[section, 0] = 0
image_pr[section, 1] = 0
else:
index = over_score_index[-1]
p_num = len(np.where(proposal[0:(index+1)] == 1)[0])
image_pr[section, 0] = p_num
image_pr[section, 1] = image_p_right[index]
return image_pr
def get_eval_result(self):
"""get_eval_result"""
self._norm_pre_score()
facebox_list, event_list, file_list, hard_gt_list, medium_gt_list, easy_gt_list = self._get_gt_boxes()
section_num = 1000
sets = ['easy', 'medium', 'hard']
set_gts = [easy_gt_list, medium_gt_list, hard_gt_list]
ap_key_dict = {0: "Easy Val AP : ", 1: "Medium Val AP : ", 2: "Hard Val AP : ",}
ap_dict = {}
for _set in range(len(sets)):
gt_list = set_gts[_set]
count_gt = 0
pr_curve = np.zeros((section_num, 2), dtype=np.float)
for i, _ in enumerate(event_list):
event = str(event_list[i][0][0])
image_list = file_list[i][0]
event_predict_dict = self.results[event]
event_gt_index_list = gt_list[i][0]
event_gt_box_list = facebox_list[i][0]
for j, _ in enumerate(image_list):
predict = np.array(event_predict_dict[str(image_list[j][0][0])]['bboxes']).astype(np.float)
gt_boxes = event_gt_box_list[j][0].astype('float')
keep_index = event_gt_index_list[j][0]
count_gt += len(keep_index)
if gt_boxes.shape[0] <= 0 or predict.shape[0] <= 0:
continue
keep = np.zeros(gt_boxes.shape[0])
if keep_index.shape[0] > 0:
keep[keep_index-1] = 1
image_pr = self._image_eval(predict, gt_boxes, keep,
iou_thresh=self.iou_thresh,
section_num=section_num)
pr_curve += image_pr
precision = pr_curve[:, 1] / pr_curve[:, 0]
recall = pr_curve[:, 1] / count_gt
precision = np.concatenate((np.array([0.]), precision, np.array([0.])))
recall = np.concatenate((np.array([0.]), recall, np.array([1.])))
for i in range(precision.shape[0]-1, 0, -1):
precision[i-1] = np.maximum(precision[i-1], precision[i])
index = np.where(recall[1:] != recall[:-1])[0]
ap = np.sum((recall[index + 1] - recall[index]) * precision[index + 1])
print(ap_key_dict[_set] + '{:.4f}'.format(ap))
return ap_dict
def val():
"""val"""
parser = argparse.ArgumentParser(description='Postprocess file')
parser.add_argument('--device_id', type=int, default=0, help='device id.')
args_opt = parser.parse_args()
cfg = cfg_res50
context.set_context(mode=context.GRAPH_MODE, device_target='Ascend',
device_id=args_opt.device_id, save_graphs=False)
# testing dataset
testset_folder = cfg['infer_dataset_folder']
testset_label_path = cfg['infer_dataset_folder'] + "label.txt"
with open(testset_label_path, 'r') as f:
_test_dataset = f.readlines()
test_dataset = [] # such as "0--Parade/0_Parade_marchingband_1_465.jpg"
for im_path in _test_dataset:
if im_path.startswith('# '):
test_dataset.append(im_path[2:-1]) # delete '# ...\n'
num_images = len(test_dataset)
timers = {'forward_time': Timer(), 'misc': Timer()}
if cfg['val_origin_size']:
h_max, w_max = 0, 0
for img_name in test_dataset:
image_path = os.path.join(testset_folder, 'images', img_name) # .jpg's location
_img = cv2.imread(image_path, cv2.IMREAD_COLOR)
if _img.shape[0] > h_max:
h_max = _img.shape[0]
if _img.shape[1] > w_max:
w_max = _img.shape[1]
h_max = (int(h_max / 32) + 1) * 32
w_max = (int(w_max / 32) + 1) * 32
priors = prior_box(image_sizes=(h_max, w_max),
min_sizes=[[16, 32], [64, 128], [256, 512]],
steps=[8, 16, 32],
clip=False)
else:
target_size = 1600
max_size = 2176
priors = prior_box(image_sizes=(max_size, max_size),
min_sizes=[[16, 32], [64, 128], [256, 512]],
steps=[8, 16, 32],
clip=False)
# init detection engine
detection = DetectionEngine(cfg)
# testing begin
print('Predict box starting')
for i, img_name in enumerate(test_dataset):
image_path = os.path.join(testset_folder, 'images', img_name) # .jpg's location
img_raw = cv2.imread(image_path, cv2.IMREAD_COLOR)
img = np.float32(img_raw)
# testing scale
if cfg['val_origin_size']:
resize = 1
assert img.shape[0] <= h_max and img.shape[1] <= w_max
image_t = np.empty((h_max, w_max, 3), dtype=img.dtype)
image_t[:, :] = (104.0, 117.0, 123.0)
image_t[0:img.shape[0], 0:img.shape[1]] = img
img = image_t
else:
im_size_min = np.min(img.shape[0:2])
im_size_max = np.max(img.shape[0:2])
resize = float(target_size) / float(im_size_min)
# prevent bigger axis from being more than max_size:
if np.round(resize * im_size_max) > max_size:
resize = float(max_size) / float(im_size_max)
img = cv2.resize(img, None, None, fx=resize, fy=resize, interpolation=cv2.INTER_LINEAR)
assert img.shape[0] <= max_size and img.shape[1] <= max_size
image_t = np.empty((max_size, max_size, 3), dtype=img.dtype)
image_t[:, :] = (104.0, 117.0, 123.0)
image_t[0:img.shape[0], 0:img.shape[1]] = img
img = image_t
scale = np.array([img.shape[1], img.shape[0], img.shape[1], img.shape[0]], dtype=img.dtype)
timers['forward_time'].start()
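        # read back the exported network outputs; the flat buffers reshape to
        # (1, num_anchors, 4) and (1, num_anchors, 2): prior_box on the padded
        # (5568, 1056) input yields 241164 anchors, on (2176, 2176) it yields 194208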
boxes_name = os.path.join("./result_Files", "widerface_test" + "_" + str(i) + "_0.bin")
boxes = np.fromfile(boxes_name, np.float32)
if cfg['val_origin_size']:
boxes = boxes.reshape(1, 241164, 4)
else:
boxes = boxes.reshape(1, 194208, 4)
confs_name = os.path.join("./result_Files", "widerface_test" + "_" + str(i) + "_1.bin")
confs = np.fromfile(confs_name, np.float32)
if cfg['val_origin_size']:
confs = confs.reshape(1, 241164, 2)
else:
confs = confs.reshape(1, 194208, 2)
timers['forward_time'].end()
timers['misc'].start()
detection.detect(boxes, confs, resize, scale, img_name, priors)
timers['misc'].end()
print('im_detect: {:d}/{:d} forward_pass_time: {:.4f}s misc: {:.4f}s'.format(i + 1, num_images,
timers['forward_time'].diff,
timers['misc'].diff))
print('Predict box done.')
print('Eval starting')
if cfg['val_save_result']:
# Save the predict result if you want.
predict_result_path = detection.write_result()
print('predict result path is {}'.format(predict_result_path))
detection.get_eval_result()
print(cfg['val_model'])
print('Eval done.')
if __name__ == '__main__':
val()


@ -0,0 +1,88 @@
# Copyright 2021 Huawei Technologies Co., Ltd
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ============================================================================
"""preprocess"""
from __future__ import print_function
import argparse
import os
import numpy as np
import cv2
from src.config import cfg_res50
cfg = cfg_res50
if __name__ == "__main__":
parser = argparse.ArgumentParser(description='Process file')
parser.add_argument('--val_dataset_folder', type=str, default='/home/dataset/widerface/val',
help='val dataset folder.')
args_opt = parser.parse_args()
# testing dataset
testset_folder = args_opt.val_dataset_folder
testset_label_path = os.path.join(args_opt.val_dataset_folder, "label.txt")
with open(testset_label_path, 'r') as f:
_test_dataset = f.readlines()
test_dataset = []
for im_path in _test_dataset:
if im_path.startswith('# '):
test_dataset.append(im_path[2:-1]) # delete '# ...\n'
# transform data to bin_file
print('Transform starting')
img_path = "./bin_file"
    os.makedirs(img_path, exist_ok=True)
h_max, w_max = 5568, 1056
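    # 5568 x 1056 mirrors what postprocess val() computes dynamically: the
    # maximum H/W over the WIDER val images, rounded up to multiples of 32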
for i, img_name in enumerate(test_dataset):
image_path = os.path.join(testset_folder, 'images', img_name)
img_raw = cv2.imread(image_path, cv2.IMREAD_COLOR)
img = np.float32(img_raw)
# testing scale
if cfg['val_origin_size']:
resize = 1
assert img.shape[0] <= h_max and img.shape[1] <= w_max
image_t = np.empty((h_max, w_max, 3), dtype=img.dtype)
image_t[:, :] = (104.0, 117.0, 123.0)
image_t[0:img.shape[0], 0:img.shape[1]] = img
img = image_t
        else:
            target_size, max_size = 1600, 2176  # same eval scales as in postprocess val()
            im_size_min = np.min(img.shape[0:2])
im_size_max = np.max(img.shape[0:2])
resize = float(target_size) / float(im_size_min)
# prevent bigger axis from being more than max_size:
if np.round(resize * im_size_max) > max_size:
resize = float(max_size) / float(im_size_max)
img = cv2.resize(img, None, None, fx=resize, fy=resize, interpolation=cv2.INTER_LINEAR)
assert img.shape[0] <= max_size and img.shape[1] <= max_size
image_t = np.empty((max_size, max_size, 3), dtype=img.dtype)
image_t[:, :] = (104.0, 117.0, 123.0)
image_t[0:img.shape[0], 0:img.shape[1]] = img
img = image_t
scale = np.array([img.shape[1], img.shape[0], img.shape[1], img.shape[0]], dtype=img.dtype)
img -= (104, 117, 123)
img = img.transpose(2, 0, 1)
img = np.expand_dims(img, 0) # [1, c, h, w] (1, 3, 2176, 2176)
# save bin file
file_name = "widerface_test" + "_" + str(i) + ".bin"
file_path = os.path.join(img_path, file_name)
img.tofile(file_path)
if i % 50 == 0:
print("Finish {} files".format(i))
print("=" * 20, "export bin files finished", "=" * 20)


@ -0,0 +1,3 @@
numpy
opencv-python
scipy


@ -0,0 +1,40 @@
#!/bin/bash
# Copyright 2021 Huawei Technologies Co., Ltd
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ============================================================================
ulimit -u unlimited
export DEVICE_NUM=8
export RANK_SIZE=8
RANK_TABLE_FILE=$(realpath $1)
export RANK_TABLE_FILE
echo "RANK_TABLE_FILE=${RANK_TABLE_FILE}"
export SERVER_ID=0
rank_start=$((DEVICE_NUM * SERVER_ID))
for((i=0; i<${DEVICE_NUM}; i++))
do
export DEVICE_ID=$i
export RANK_ID=$((rank_start + i))
rm -rf ./train_parallel$i
mkdir ./train_parallel$i
cp -r ./src ./train_parallel$i
cp ./train.py ./train_parallel$i
echo "start training for rank $RANK_ID, device $DEVICE_ID"
cd ./train_parallel$i ||exit
env > env.log
python train.py --backbone_name 'ResNet50' --device_id=$i > log 2>&1 &
cd ..
done


@ -0,0 +1,27 @@
#!/bin/bash
# Copyright 2021 Huawei Technologies Co., Ltd
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ============================================================================
echo "=============================================================================================================="
echo "Please run the script as: "
echo "bash run_distribute_gpu_train.sh DEVICE_NUM CUDA_VISIBLE_DEVICES"
echo "for example: bash run_distribute_gpu_train.sh 4 0,1,2,3"
echo "=============================================================================================================="
RANK_SIZE=$1
export CUDA_VISIBLE_DEVICES="$2"
mpirun --allow-run-as-root -n $RANK_SIZE --output-filename log_output --merge-stderr-to-stdout \
python train.py --backbone_name 'MobileNet025' > train.log 2>&1 &


@ -0,0 +1,129 @@
#!/bin/bash
# Copyright 2021 Huawei Technologies Co., Ltd
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ============================================================================
if [[ $# -lt 2 || $# -gt 3 ]]; then
echo "Usage: bash run_infer_310.sh [MINDIR_PATH] [DATASET_PATH] [DEVICE_ID]
DEVICE_ID is optional, it can be set by environment variable device_id, otherwise the value is zero"
exit 1
fi
get_real_path(){
if [ "${1:0:1}" == "/" ]; then
echo "$1"
else
echo "$(realpath -m $PWD/$1)"
fi
}
model=$(get_real_path $1)
dataset_path=$(get_real_path $2)
device_id=0
if [ $# == 3 ]; then
device_id=$3
fi
echo "mindir name: "$model
echo "dataset path: "$dataset_path
echo "device id: "$device_id
export ASCEND_HOME=/usr/local/Ascend/
if [ -d ${ASCEND_HOME}/ascend-toolkit ]; then
export PATH=$ASCEND_HOME/ascend-toolkit/latest/fwkacllib/ccec_compiler/bin:$ASCEND_HOME/ascend-toolkit/latest/atc/bin:$PATH
export LD_LIBRARY_PATH=/usr/local/lib:$ASCEND_HOME/ascend-toolkit/latest/atc/lib64:$ASCEND_HOME/ascend-toolkit/latest/fwkacllib/lib64:$ASCEND_HOME/driver/lib64:$ASCEND_HOME/add-ons:$LD_LIBRARY_PATH
export TBE_IMPL_PATH=$ASCEND_HOME/ascend-toolkit/latest/opp/op_impl/built-in/ai_core/tbe
export PYTHONPATH=${TBE_IMPL_PATH}:$ASCEND_HOME/ascend-toolkit/latest/fwkacllib/python/site-packages:$PYTHONPATH
export ASCEND_OPP_PATH=$ASCEND_HOME/ascend-toolkit/latest/opp
else
export PATH=$ASCEND_HOME/atc/ccec_compiler/bin:$ASCEND_HOME/atc/bin:$PATH
export LD_LIBRARY_PATH=/usr/local/lib:$ASCEND_HOME/atc/lib64:$ASCEND_HOME/acllib/lib64:$ASCEND_HOME/driver/lib64:$ASCEND_HOME/add-ons:$LD_LIBRARY_PATH
export PYTHONPATH=$ASCEND_HOME/atc/python/site-packages:$PYTHONPATH
export ASCEND_OPP_PATH=$ASCEND_HOME/opp
fi
export ASCEND_HOME=/usr/local/Ascend
export PATH=$ASCEND_HOME/fwkacllib/ccec_compiler/bin:$ASCEND_HOME/fwkacllib/bin:$ASCEND_HOME/toolkit/bin:$PATH
export LD_LIBRARY_PATH=/usr/local/lib/:/usr/local/fwkacllib/lib64:$ASCEND_HOME/driver/lib64:$ASCEND_HOME/add-ons:/usr/local/Ascend/toolkit/lib64:$LD_LIBRARY_PATH
export PYTHONPATH=$ASCEND_HOME/fwkacllib/python/site-packages
export PATH=/usr/local/python375/bin:$PATH
export NPU_HOST_LIB=/usr/local/Ascend/acllib/lib64/stub
export ASCEND_OPP_PATH=/usr/local/Ascend/opp
export ASCEND_AICPU_PATH=/usr/local/Ascend
export LD_LIBRARY_PATH=/usr/local/lib64/:$LD_LIBRARY_PATH
function preprocess_data()
{
if [ -d preprocess_Result ]; then
rm -rf ./preprocess_Result
fi
mkdir preprocess_Result
python3.7 ../preprocess.py --val_dataset_folder=$dataset_path
}
function compile_app()
{
cd ../ascend310_infer/ || exit
bash build.sh &> build.log
}
function infer()
{
cd - || exit
if [ -d result_Files ]; then
rm -rf ./result_Files
fi
if [ -d time_Result ]; then
rm -rf ./time_Result
fi
mkdir result_Files
mkdir time_Result
../ascend310_infer/out/main --mindir_path=$model --input0_path=./bin_file --device_id=$device_id &> infer.log
}
function cal_acc()
{
python3.7 ../postprocess.py --device_id=$device_id &> acc.log
}
preprocess_data
if [ $? -ne 0 ]; then
echo "preprocess dataset failed"
exit 1
fi
compile_app
if [ $? -ne 0 ]; then
echo "compile app code failed"
exit 1
fi
infer
if [ $? -ne 0 ]; then
echo " execute inference failed"
exit 1
fi
cal_acc
if [ $? -ne 0 ]; then
echo "calculate accuracy failed"
exit 1
fi


@ -0,0 +1,22 @@
#!/bin/bash
# Copyright 2021 Huawei Technologies Co., Ltd
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ============================================================================
echo "=============================================================================================================="
echo "Please run the script as: "
echo "for example: bash run_standalone_eval_ascend.sh [CKPT_FILE]"
echo "=============================================================================================================="
python eval.py --backbone_name 'ResNet50' --val_model $1 > ./eval.log 2>&1 &


@ -0,0 +1,24 @@
#!/bin/bash
# Copyright 2021 Huawei Technologies Co., Ltd
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ============================================================================
echo "=============================================================================================================="
echo "Please run the script as: "
echo "bash run_standalone_gpu_eval.sh CUDA_VISIBLE_DEVICES"
echo "for example: bash run_standalone_gpu_eval.sh 0 [CKPT_FILE]"
echo "=============================================================================================================="
export CUDA_VISIBLE_DEVICES="$1"
python eval.py --backbone_name 'MobileNet025' --val_model $2 > eval.log 2>&1 &


@ -0,0 +1,19 @@
#!/bin/bash
# Copyright 2021 Huawei Technologies Co., Ltd
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ============================================================================
echo "Usage: bash ./scripts/run_standalone_train_ascend.sh"
python train.py --backbone_name 'ResNet50' > train.log 2>&1 &


@ -0,0 +1,315 @@
# Copyright 2021 Huawei Technologies Co., Ltd
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ============================================================================
"""Augmentation."""
import random
import copy
import cv2
import numpy as np
def _rand(a=0., b=1.):
return np.random.rand() * (b - a) + a
def bbox_iof(bbox_a, bbox_b, offset=0):
"""bbox_iof"""
if bbox_a.shape[1] < 4 or bbox_b.shape[1] < 4:
raise IndexError("Bounding boxes axis 1 must have at least length 4")
tl = np.maximum(bbox_a[:, None, 0:2], bbox_b[:, 0:2])
br = np.minimum(bbox_a[:, None, 2:4], bbox_b[:, 2:4])
area_i = np.prod(br - tl + offset, axis=2) * (tl < br).all(axis=2)
area_a = np.prod(bbox_a[:, 2:4] - bbox_a[:, :2] + offset, axis=1)
return area_i / np.maximum(area_a[:, None], 1)
def _is_iof_satisfied_constraint(box, crop_box):
iof = bbox_iof(box, crop_box)
satisfied = np.any((iof >= 1.0))
return satisfied
def _choose_candidate(max_trial, image_w, image_h, boxes):
"""_choose_candidate"""
# add default candidate
candidates = [(0, 0, image_w, image_h)]
for _ in range(max_trial):
# box_data should have at least one box
if _rand() > 0.2:
scale = _rand(0.3, 1.0)
else:
scale = 1.0
nh = int(scale * min(image_w, image_h))
nw = nh
dx = int(_rand(0, image_w - nw))
dy = int(_rand(0, image_h - nh))
if boxes.shape[0] > 0:
crop_box = np.array((dx, dy, dx + nw, dy + nh))
if not _is_iof_satisfied_constraint(boxes, crop_box[np.newaxis]):
continue
else:
candidates.append((dx, dy, nw, nh))
else:
raise Exception("!!! annotation box is less than 1")
if len(candidates) >= 3:
break
return candidates
def _correct_bbox_by_candidates(candidates, input_w, input_h, flip, boxes, labels, landms, allow_outside_center):
"""Calculate correct boxes."""
while candidates:
if len(candidates) > 1:
# ignore default candidate which do not crop
candidate = candidates.pop(np.random.randint(1, len(candidates)))
else:
candidate = candidates.pop(np.random.randint(0, len(candidates)))
dx, dy, nw, nh = candidate
boxes_t = copy.deepcopy(boxes)
landms_t = copy.deepcopy(landms)
labels_t = copy.deepcopy(labels)
landms_t = landms_t.reshape([-1, 5, 2])
if nw == nh:
scale = float(input_w) / float(nw)
else:
scale = float(input_w) / float(max(nh, nw))
boxes_t[:, [0, 2]] = (boxes_t[:, [0, 2]] - dx) * scale
boxes_t[:, [1, 3]] = (boxes_t[:, [1, 3]] - dy) * scale
landms_t[:, :, 0] = (landms_t[:, :, 0] - dx) * scale
landms_t[:, :, 1] = (landms_t[:, :, 1] - dy) * scale
if flip:
boxes_t[:, [0, 2]] = input_w - boxes_t[:, [2, 0]]
landms_t[:, :, 0] = input_w - landms_t[:, :, 0]
# flip landms
landms_t_1 = landms_t[:, 1, :].copy()
landms_t[:, 1, :] = landms_t[:, 0, :]
landms_t[:, 0, :] = landms_t_1
landms_t_4 = landms_t[:, 4, :].copy()
landms_t[:, 4, :] = landms_t[:, 3, :]
landms_t[:, 3, :] = landms_t_4
        if not allow_outside_center:
mask1 = np.logical_and((boxes_t[:, 0] + boxes_t[:, 2])/2. >= 0., (boxes_t[:, 1] + boxes_t[:, 3])/2. >= 0.)
boxes_t = boxes_t[mask1]
landms_t = landms_t[mask1]
labels_t = labels_t[mask1]
mask2 = np.logical_and((boxes_t[:, 0] + boxes_t[:, 2]) / 2. <= input_w,
(boxes_t[:, 1] + boxes_t[:, 3]) / 2. <= input_h)
boxes_t = boxes_t[mask2]
landms_t = landms_t[mask2]
labels_t = labels_t[mask2]
# recorrect x, y for case x,y < 0 reset to zero, after dx and dy, some box can smaller than zero
boxes_t[:, 0:2][boxes_t[:, 0:2] < 0] = 0
# recorrect w,h not higher than input size
boxes_t[:, 2][boxes_t[:, 2] > input_w] = input_w
boxes_t[:, 3][boxes_t[:, 3] > input_h] = input_h
box_w = boxes_t[:, 2] - boxes_t[:, 0]
box_h = boxes_t[:, 3] - boxes_t[:, 1]
# discard invalid box: w or h smaller than 1 pixel
mask3 = np.logical_and(box_w > 1, box_h > 1)
boxes_t = boxes_t[mask3]
landms_t = landms_t[mask3]
labels_t = labels_t[mask3]
# normal
boxes_t[:, [0, 2]] /= input_w
boxes_t[:, [1, 3]] /= input_h
landms_t[:, :, 0] /= input_w
landms_t[:, :, 1] /= input_h
landms_t = landms_t.reshape([-1, 10])
labels_t = np.expand_dims(labels_t, 1)
targets_t = np.hstack((boxes_t, landms_t, labels_t))
if boxes_t.shape[0] > 0:
return targets_t, candidate
    raise Exception('no crop candidate produced a valid corrected bbox')
def get_interp_method(interp, sizes=()):
"""Get the interpolation method for resize functions.
The major purpose of this function is to wrap a random interp method selection
and a auto-estimation method.
Parameters
----------
interp : int
interpolation method for all resizing operations
Possible values:
0: Nearest Neighbors Interpolation.
1: Bilinear interpolation.
2: Bicubic interpolation over 4x4 pixel neighborhood.
        3: Nearest Neighbors. [Originally this should be Area-based; as
           Area-based is unavailable, Nearest Neighbors is used instead.
           Area-based (resampling using pixel area relation) may be preferred
           for image decimation, as it gives moire-free results, but when the
           image is zoomed it behaves like Nearest Neighbors. Used by default.]
4: Lanczos interpolation over 8x8 pixel neighborhood.
9: Cubic for enlarge, area for shrink, bilinear for others
10: Random select from interpolation method mentioned above.
Note:
When shrinking an image, it will generally look best with AREA-based
interpolation, whereas, when enlarging an image, it will generally look best
with Bicubic (slow) or Bilinear (faster but still looks OK).
More details can be found in the documentation of OpenCV, please refer to
http://docs.opencv.org/master/da/d54/group__imgproc__transform.html.
sizes : tuple of int
(old_height, old_width, new_height, new_width), if None provided, auto(9)
will return Area(2) anyway.
Returns
-------
int
interp method from 0 to 4
"""
if interp == 9:
if sizes:
assert len(sizes) == 4
oh, ow, nh, nw = sizes
if nh > oh and nw > ow:
return 2
if nh < oh and nw < ow:
return 0
return 1
return 2
if interp == 10:
return random.randint(0, 4)
if interp not in (0, 1, 2, 3, 4):
raise ValueError('Unknown interp method %d' % interp)
return interp
def cv_image_reshape(interp):
"""Reshape pil image."""
reshape_type = {
0: cv2.INTER_LINEAR,
1: cv2.INTER_CUBIC,
2: cv2.INTER_AREA,
3: cv2.INTER_NEAREST,
4: cv2.INTER_LANCZOS4,
}
return reshape_type[interp]
def color_convert(image, a=1, b=0):
c_image = image.astype(float) * a + b
c_image[c_image < 0] = 0
c_image[c_image > 255] = 255
image[:] = c_image
def color_distortion(image):
"""color_distortion"""
image = copy.deepcopy(image)
if _rand() > 0.5:
if _rand() > 0.5:
color_convert(image, b=_rand(-32, 32))
if _rand() > 0.5:
color_convert(image, a=_rand(0.5, 1.5))
image = cv2.cvtColor(image, cv2.COLOR_BGR2HSV)
if _rand() > 0.5:
color_convert(image[:, :, 1], a=_rand(0.5, 1.5))
if _rand() > 0.5:
h_img = image[:, :, 0].astype(int) + random.randint(-18, 18)
h_img %= 180
image[:, :, 0] = h_img
image = cv2.cvtColor(image, cv2.COLOR_HSV2BGR)
else:
if _rand() > 0.5:
color_convert(image, b=random.uniform(-32, 32))
image = cv2.cvtColor(image, cv2.COLOR_BGR2HSV)
if _rand() > 0.5:
color_convert(image[:, :, 1], a=random.uniform(0.5, 1.5))
if _rand() > 0.5:
tmp = image[:, :, 0].astype(int) + random.randint(-18, 18)
tmp %= 180
image[:, :, 0] = tmp
image = cv2.cvtColor(image, cv2.COLOR_HSV2BGR)
if _rand() > 0.5:
color_convert(image, a=random.uniform(0.5, 1.5))
return image
class preproc():
"""preproc"""
def __init__(self, image_dim):
self.image_input_size = image_dim
def __call__(self, image, target):
assert target.shape[0] > 0, "target without ground truth."
_target = copy.deepcopy(target)
boxes = _target[:, :4]
landms = _target[:, 4:-1]
labels = _target[:, -1]
aug_image, aug_target = self._data_aug(image, boxes, labels, landms, self.image_input_size)
return aug_image, aug_target
def _data_aug(self, image, boxes, labels, landms, image_input_size, max_trial=250):
"""_data_aug"""
image_h, image_w, _ = image.shape
input_h, input_w = image_input_size, image_input_size
flip = _rand() < .5
candidates = _choose_candidate(max_trial=max_trial,
image_w=image_w,
image_h=image_h,
boxes=boxes)
targets, candidate = _correct_bbox_by_candidates(candidates=candidates,
input_w=input_w,
input_h=input_h,
flip=flip,
boxes=boxes,
labels=labels,
landms=landms,
allow_outside_center=False)
# crop image
dx, dy, nw, nh = candidate
image = image[dy:(dy + nh), dx:(dx + nw)]
if nw != nh:
assert nw == image_w and nh == image_h
# pad ori image to square
l = max(nw, nh)
t_image = np.empty((l, l, 3), dtype=image.dtype)
t_image[:, :] = (104, 117, 123)
t_image[:nh, :nw] = image
image = t_image
interp = get_interp_method(interp=10)
image = cv2.resize(image, (input_w, input_h), interpolation=cv_image_reshape(interp))
if flip:
image = image[:, ::-1]
image = image.astype(np.float32)
image -= (104, 117, 123)
image = image.transpose(2, 0, 1)
return image, targets


@ -0,0 +1,134 @@
# Copyright 2021 Huawei Technologies Co., Ltd
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ============================================================================
"""Config for train and eval."""
cfg_res50 = {
'name': 'ResNet50',
'device_target': "Ascend",
'device_id': 0,
'variance': [0.1, 0.2],
'clip': False,
'loc_weight': 2.0,
'class_weight': 1.0,
'landm_weight': 1.0,
'batch_size': 8,
'num_workers': 16,
'num_anchor': 29126,
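    # 29126 anchors = 2 * (ceil(840/8)^2 + ceil(840/16)^2 + ceil(840/32)^2)
    #               = 2 * (105^2 + 53^2 + 27^2), two priors per feature-map cell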
'nnpu': 8,
'image_size': 840,
'in_channel': 256,
'out_channel': 256,
'match_thresh': 0.35,
# opt
'optim': 'sgd', # 'sgd' or 'momentum'
'momentum': 0.9,
'weight_decay': 1e-4,
'loss_scale': 1,
# seed
'seed': 1,
# lr
'epoch': 60,
'T_max': 50, # cosine_annealing
'eta_min': 0.0, # cosine_annealing
'decay1': 20,
'decay2': 40,
'lr_type': 'dynamic_lr', # 'dynamic_lr' or cosine_annealing
'initial_lr': 0.04,
'warmup_epoch': -1, # dynamic_lr: -1, cosine_annealing:0
'gamma': 0.1,
# checkpoint
'ckpt_path': './checkpoint/',
'keep_checkpoint_max': 8,
'resume_net': None,
# dataset
'training_dataset': '../data/widerface/train/label.txt',
'pretrain': True,
'pretrain_path': '../data/resnet-90_625.ckpt',
# val
'val_model': './train_parallel3/checkpoint/ckpt_3/RetinaFace-56_201.ckpt',
'val_dataset_folder': './data/widerface/val/',
'val_origin_size': True,
'val_confidence_threshold': 0.02,
'val_nms_threshold': 0.4,
'val_iou_threshold': 0.5,
'val_save_result': False,
'val_predict_save_folder': './widerface_result',
'val_gt_dir': './data/ground_truth/',
# infer
'infer_dataset_folder': '/home/dataset/widerface/val/',
'infer_gt_dir': '/home/dataset/widerface/ground_truth/',
}
cfg_mobile025 = {
'name': 'MobileNet025',
'variance': [0.1, 0.2],
'clip': False,
'loc_weight': 2.0,
'class_weight': 1.0,
'landm_weight': 1.0,
'batch_size': 8,
'num_workers': 12,
'num_anchor': 16800,
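    # 16800 anchors = 2 * ((640/8)^2 + (640/16)^2 + (640/32)^2)
    #               = 2 * (80^2 + 40^2 + 20^2)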
'ngpu': 2,
'image_size': 640,
'in_channel': 32,
'out_channel': 64,
'match_thresh': 0.35,
# opt
'optim': 'sgd',
'momentum': 0.9,
'weight_decay': 5e-4,
# seed
'seed': 1,
# lr
'epoch': 120,
'decay1': 70,
'decay2': 90,
'lr_type': 'dynamic_lr',
'initial_lr': 0.02,
'warmup_epoch': 5,
'gamma': 0.1,
# checkpoint
'ckpt_path': './checkpoint/',
'save_checkpoint_steps': 2000,
'keep_checkpoint_max': 3,
'resume_net': None,
# dataset
'training_dataset': '../data/widerface/train/label.txt',
'pretrain': False,
'pretrain_path': '../data/mobilenetv1-90_5004.ckpt',
# val
'val_model': './checkpoint/ckpt_0/RetinaFace-117_804.ckpt',
'val_dataset_folder': './data/widerface/val/',
'val_origin_size': False,
'val_confidence_threshold': 0.02,
'val_nms_threshold': 0.4,
'val_iou_threshold': 0.5,
'val_save_result': False,
'val_predict_save_folder': './widerface_result',
'val_gt_dir': './data/ground_truth/',
}


@ -0,0 +1,171 @@
# Copyright 2021 Huawei Technologies Co., Ltd
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ============================================================================
"""Dataset for train and eval."""
import os
import copy
import cv2
import numpy as np
import mindspore.dataset as de
from mindspore.communication.management import init, get_rank, get_group_size
from .augmemtation import preproc
from .utils import bbox_encode
class WiderFace():
"""WiderFace"""
def __init__(self, label_path):
self.images_list = []
self.labels_list = []
f = open(label_path, 'r')
lines = f.readlines()
First = True
labels = []
for line in lines:
line = line.rstrip()
if line.startswith('#'):
if First is True:
First = False
else:
c_labels = copy.deepcopy(labels)
self.labels_list.append(c_labels)
labels.clear()
# remove '# '
path = line[2:]
path = label_path.replace('label.txt', 'images/') + path
                assert os.path.exists(path), 'image path does not exist.'
self.images_list.append(path)
else:
line = line.split(' ')
label = [float(x) for x in line]
labels.append(label)
# add the last label
self.labels_list.append(labels)
# del bbox which width is zero or height is zero
for i in range(len(self.labels_list) - 1, -1, -1):
labels = self.labels_list[i]
for j in range(len(labels) - 1, -1, -1):
label = labels[j]
if label[2] <= 0 or label[3] <= 0:
labels.pop(j)
if not labels:
self.images_list.pop(i)
self.labels_list.pop(i)
else:
self.labels_list[i] = labels
def __len__(self):
return len(self.images_list)
def __getitem__(self, item):
return self.images_list[item], self.labels_list[item]
def read_dataset(img_path, annotation):
"""read_dataset"""
cv2.setNumThreads(2)
if isinstance(img_path, str):
img = cv2.imread(img_path)
else:
img = cv2.imread(img_path.tostring().decode("utf-8"))
labels = annotation
anns = np.zeros((0, 15))
if labels.shape[0] <= 0:
return anns
for _, label in enumerate(labels):
ann = np.zeros((1, 15))
# get bbox
ann[0, 0:2] = label[0:2] # x1, y1
ann[0, 2:4] = label[0:2] + label[2:4] # x2, y2
# get landmarks
ann[0, 4:14] = label[[4, 5, 7, 8, 10, 11, 13, 14, 16, 17]]
# set flag
if (ann[0, 4] < 0):
ann[0, 14] = -1
else:
ann[0, 14] = 1
anns = np.append(anns, ann, axis=0)
target = np.array(anns).astype(np.float32)
return img, target
def create_dataset(data_dir, cfg, batch_size=32, repeat_num=1, shuffle=True, multiprocessing=True, num_worker=16):
"""create_dataset"""
dataset = WiderFace(data_dir)
if cfg['name'] == 'ResNet50':
device_num, rank_id = _get_rank_info()
elif cfg['name'] == 'MobileNet025':
init("nccl")
rank_id = get_rank()
device_num = get_group_size()
if device_num == 1:
de_dataset = de.GeneratorDataset(dataset, ["image", "annotation"],
shuffle=shuffle,
num_parallel_workers=num_worker)
else:
de_dataset = de.GeneratorDataset(dataset, ["image", "annotation"],
shuffle=shuffle,
num_parallel_workers=num_worker,
num_shards=device_num,
shard_id=rank_id)
aug = preproc(cfg['image_size'])
encode = bbox_encode(cfg)
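    # per-sample pipeline: decode -> crop/flip/color augment -> encode boxes
    # and landmarks against the priors; map() fans the result into four columns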
def union_data(image, annot):
i, a = read_dataset(image, annot)
i, a = aug(i, a)
out = encode(i, a)
return out
de_dataset = de_dataset.map(input_columns=["image", "annotation"],
output_columns=["image", "truths", "conf", "landm"],
column_order=["image", "truths", "conf", "landm"],
operations=union_data,
python_multiprocessing=multiprocessing,
num_parallel_workers=num_worker)
de_dataset = de_dataset.batch(batch_size, drop_remainder=True)
de_dataset = de_dataset.repeat(repeat_num)
return de_dataset
def _get_rank_info():
"""
get rank size and rank id
"""
rank_size = int(os.environ.get("RANK_SIZE", 1))
if rank_size > 1:
rank_size = get_group_size()
rank_id = get_rank()
else:
rank_size = rank_id = None
return rank_size, rank_id


@ -0,0 +1,121 @@
# Copyright 2021 Huawei Technologies Co., Ltd
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ============================================================================
"""Loss."""
import mindspore.common.dtype as mstype
import mindspore.nn as nn
from mindspore.ops import operations as P
from mindspore.ops import functional as F
from mindspore import Tensor
class SoftmaxCrossEntropyWithLogits(nn.Cell):
"""SoftmaxCrossEntropyWithLogits"""
def __init__(self):
super(SoftmaxCrossEntropyWithLogits, self).__init__()
self.log_softmax = P.LogSoftmax()
self.neg = P.Neg()
self.one_hot = P.OneHot()
self.on_value = Tensor(1.0, mstype.float32)
self.off_value = Tensor(0.0, mstype.float32)
self.reduce_sum = P.ReduceSum()
def construct(self, logits, labels):
"""construct"""
prob = self.log_softmax(logits)
labels = self.one_hot(labels, F.shape(logits)[-1], self.on_value, self.off_value)
return self.neg(self.reduce_sum(prob * labels, 1))
class MultiBoxLoss(nn.Cell):
"""MultiBoxLoss"""
def __init__(self, num_classes, num_boxes, neg_pre_positive, batch_size):
super(MultiBoxLoss, self).__init__()
self.num_classes = num_classes
self.num_boxes = num_boxes
self.neg_pre_positive = neg_pre_positive
self.notequal = P.NotEqual()
self.less = P.Less()
self.tile = P.Tile()
self.reduce_sum = P.ReduceSum()
self.reduce_mean = P.ReduceMean()
self.expand_dims = P.ExpandDims()
self.smooth_l1_loss = P.SmoothL1Loss()
self.cross_entropy = SoftmaxCrossEntropyWithLogits()
self.maximum = P.Maximum()
self.minimum = P.Minimum()
self.sort_descend = P.TopK(True)
self.sort = P.TopK(True)
self.max = P.ReduceMax()
self.log = P.Log()
self.exp = P.Exp()
self.concat = P.Concat(axis=1)
self.reduce_sum2 = P.ReduceSum(keep_dims=True)
self.mul = P.Mul()
self.reduce_sum_new = P.ReduceSum(keep_dims=True)
def construct(self, loc_data, loc_t, conf_data, conf_t, landm_data, landm_t):
"""construct"""
# landm loss
mask_pos1 = F.cast(self.less(0.0, F.cast(conf_t, mstype.float32)), mstype.float32)
N1 = self.maximum(self.reduce_sum(mask_pos1), 1)
mask_pos_idx1 = self.tile(self.expand_dims(mask_pos1, -1), (1, 1, 10))
loss_landm = self.reduce_sum(self.smooth_l1_loss(landm_data, landm_t) * mask_pos_idx1)
loss_landm = loss_landm / N1
# Localization Loss
mask_pos = F.cast(self.notequal(0, conf_t), mstype.float32)
conf_t = F.cast(mask_pos, mstype.int32)
N = self.maximum(self.reduce_sum(mask_pos), 1)
mask_pos_idx = self.tile(self.expand_dims(mask_pos, -1), (1, 1, 4))
loss_l = self.reduce_sum(self.smooth_l1_loss(loc_data, loc_t) * mask_pos_idx)
loss_l = loss_l / N
# Conf Loss
conf_t_shape = F.shape(conf_t)
conf_t = F.reshape(conf_t, (-1,))
indices = self.concat((1 - F.reshape(conf_t, (-1, 1)), F.reshape(conf_t, (-1, 1))))
batch_conf = F.reshape(conf_data, (-1, self.num_classes))
x_max = self.max(batch_conf)
loss_c = self.log(self.reduce_sum2(self.exp(batch_conf - x_max), 1)) + x_max
mul_tensor = self.mul(indices, batch_conf)
loss_c = loss_c - self.reduce_sum_new(mul_tensor, 1)
loss_c = F.reshape(loss_c, conf_t_shape)
# hard example mining
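        # rank negatives by classification loss via a double TopK: the first
        # sorts the losses, the second recovers each anchor's rank in that
        # ordering; anchors ranked within neg_pre_positive * num_pos are kept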
num_matched_boxes = F.reshape(self.reduce_sum(mask_pos, 1), (-1,))
neg_masked_cross_entropy = F.cast(loss_c * (1 - mask_pos), mstype.float32)
_, loss_idx = self.sort_descend(neg_masked_cross_entropy, self.num_boxes)
_, relative_position = self.sort(F.cast(loss_idx, mstype.float32), self.num_boxes)
relative_position = F.cast(relative_position, mstype.float32)
relative_position = relative_position[:, ::-1]
relative_position = F.cast(relative_position, mstype.int32)
num_neg_boxes = self.minimum(num_matched_boxes * self.neg_pre_positive, self.num_boxes - 1)
tile_num_neg_boxes = self.tile(self.expand_dims(num_neg_boxes, -1), (1, self.num_boxes))
top_k_neg_mask = F.cast(self.less(relative_position, tile_num_neg_boxes), mstype.float32)
cross_entropy = self.cross_entropy(batch_conf, conf_t)
cross_entropy = F.reshape(cross_entropy, conf_t_shape)
loss_c = self.reduce_sum(cross_entropy * self.minimum(mask_pos + top_k_neg_mask, 1))
loss_c = loss_c / N
return loss_l, loss_c, loss_landm


@ -0,0 +1,81 @@
# Copyright 2021 Huawei Technologies Co., Ltd
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ============================================================================
"""learning rate schedule."""
import math
import numpy as np
def warmup_cosine_annealing_lr(lr5, steps_per_epoch, warmup_epochs, max_epoch, T_max, eta_min=0):
""" warmup cosine annealing lr"""
base_lr = lr5
warmup_init_lr = 0
total_steps = int(max_epoch * steps_per_epoch)
warmup_steps = int(warmup_epochs * steps_per_epoch)
lr_each_step = []
for i in range(total_steps):
last_epoch = i // steps_per_epoch
if i < warmup_steps:
lr5 = linear_warmup_lr(i + 1, warmup_steps, base_lr, warmup_init_lr)
else:
lr5 = eta_min + (base_lr - eta_min) * (1. + math.cos(math.pi * last_epoch / T_max)) / 2
lr_each_step.append(lr5)
return np.array(lr_each_step).astype(np.float32)
def _linear_warmup_learning_rate(current_step, warmup_steps, base_lr, init_lr):
lr_inc = (float(base_lr) - float(init_lr)) / float(warmup_steps)
learning_rate = float(init_lr) + lr_inc * current_step
return learning_rate
def _a_cosine_learning_rate(current_step, base_lr, warmup_steps, decay_steps):
base = float(current_step - warmup_steps) / float(decay_steps)
learning_rate = (1 + math.cos(base * math.pi)) / 2 * base_lr
return learning_rate
def _dynamic_lr(base_lr, total_steps, warmup_steps, warmup_ratio=1 / 3):
lr = []
for i in range(total_steps):
if i < warmup_steps:
lr.append(_linear_warmup_learning_rate(i, warmup_steps, base_lr, base_lr * warmup_ratio))
else:
lr.append(_a_cosine_learning_rate(i, base_lr, warmup_steps, total_steps))
return lr
def adjust_learning_rate(initial_lr, gamma, stepvalues, steps_pre_epoch, total_epochs, warmup_epoch=5, lr_type1=None):
"""adjust_learning_rate"""
if lr_type1 == 'dynamic_lr':
return _dynamic_lr(initial_lr, total_epochs * steps_pre_epoch, warmup_epoch * steps_pre_epoch,
warmup_ratio=1 / 3)
lr_each_step = []
for epoch in range(1, total_epochs + 1):
for _ in range(steps_pre_epoch):
if epoch <= warmup_epoch:
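                # geometric warmup: 1.5849 ~= 10^(1/5), so the lr ramps from
                # 0.1 * initial_lr towards initial_lr over the warmup epochs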
lr = 0.1 * initial_lr * (1.5849 ** (epoch - 1))
else:
if stepvalues[0] <= epoch <= stepvalues[1]:
lr = initial_lr * (gamma ** (1))
elif epoch > stepvalues[1]:
lr = initial_lr * (gamma ** (2))
else:
lr = initial_lr
lr_each_step.append(lr)
return lr_each_step


@ -0,0 +1,610 @@
# Copyright 2021 Huawei Technologies Co., Ltd
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ============================================================================
"""Network."""
import math
from functools import reduce
import numpy as np
import mindspore
import mindspore.nn as nn
from mindspore.ops import functional as F
from mindspore.ops import operations as P
from mindspore.ops import composite as C
from mindspore import context, Tensor
from mindspore.parallel._auto_parallel_context import auto_parallel_context
from mindspore.communication.management import get_group_size
# ResNet
def _weight_variable(shape, factor=0.01):
init_value = np.random.randn(*shape).astype(np.float32) * factor
return Tensor(init_value)
def _conv3x3(in_channel, out_channel, stride=1):
weight_shape = (out_channel, in_channel, 3, 3)
weight = _weight_variable(weight_shape)
return nn.Conv2d(in_channel, out_channel,
kernel_size=3, stride=stride, padding=1, pad_mode='pad', weight_init=weight)
def _conv1x1(in_channel, out_channel, stride=1):
weight_shape = (out_channel, in_channel, 1, 1)
weight = _weight_variable(weight_shape)
return nn.Conv2d(in_channel, out_channel,
kernel_size=1, stride=stride, padding=0, pad_mode='pad', weight_init=weight)
def _conv7x7(in_channel, out_channel, stride=1):
weight_shape = (out_channel, in_channel, 7, 7)
weight = _weight_variable(weight_shape)
return nn.Conv2d(in_channel, out_channel,
kernel_size=7, stride=stride, padding=3, pad_mode='pad', weight_init=weight)
def _bn(channel):
return nn.BatchNorm2d(channel)
def _bn_last(channel):
return nn.BatchNorm2d(channel)
def _fc(in_channel, out_channel):
weight_shape = (out_channel, in_channel)
weight = _weight_variable(weight_shape)
return nn.Dense(in_channel, out_channel, has_bias=True, weight_init=weight, bias_init=0)
class ResidualBlock(nn.Cell):
"""ResidualBlock"""
expansion = 4
def __init__(self,
in_channel,
out_channel,
stride=1):
super(ResidualBlock, self).__init__()
channel = out_channel // self.expansion
self.conv1 = _conv1x1(in_channel, channel, stride=1)
self.bn1 = _bn(channel)
self.conv2 = _conv3x3(channel, channel, stride=stride)
self.bn2 = _bn(channel)
self.conv3 = _conv1x1(channel, out_channel, stride=1)
self.bn3 = _bn_last(out_channel)
self.relu = nn.ReLU()
self.down_sample = False
if stride != 1 or in_channel != out_channel:
self.down_sample = True
self.down_sample_layer = None
if self.down_sample:
self.down_sample_layer = nn.SequentialCell([_conv1x1(in_channel, out_channel, stride),
_bn(out_channel)])
self.add = P.Add()
def construct(self, x):
"""construct"""
identity = x
out = self.conv1(x)
out = self.bn1(out)
out = self.relu(out)
out = self.conv2(out)
out = self.bn2(out)
out = self.relu(out)
out = self.conv3(out)
out = self.bn3(out)
if self.down_sample:
identity = self.down_sample_layer(identity)
out = self.add(out, identity)
out = self.relu(out)
return out
class ResNet(nn.Cell):
"""ResNet"""
def __init__(self,
block,
layer_nums,
in_channels,
out_channels,
strides,
num_classes):
super(ResNet, self).__init__()
if not len(layer_nums) == len(in_channels) == len(out_channels) == 4:
raise ValueError("the length of layer_num, in_channels, out_channels list must be 4!")
self.conv1 = _conv7x7(3, 64, stride=2)
self.bn1 = _bn(64)
self.relu = P.ReLU()
self.pad = P.Pad(((0, 0), (0, 0), (1, 0), (1, 0)))
self.maxpool = nn.MaxPool2d(kernel_size=3, stride=2, pad_mode="valid")
self.layer1 = self._make_layer(block,
layer_nums[0],
in_channel=in_channels[0],
out_channel=out_channels[0],
stride=strides[0])
self.layer2 = self._make_layer(block,
layer_nums[1],
in_channel=in_channels[1],
out_channel=out_channels[1],
stride=strides[1])
self.layer3 = self._make_layer(block,
layer_nums[2],
in_channel=in_channels[2],
out_channel=out_channels[2],
stride=strides[2])
self.layer4 = self._make_layer(block,
layer_nums[3],
in_channel=in_channels[3],
out_channel=out_channels[3],
stride=strides[3])
self.mean = P.ReduceMean(keep_dims=True)
self.flatten = nn.Flatten()
self.end_point = _fc(out_channels[3], num_classes)
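        # mean/flatten/end_point form the ImageNet classification head; as a
        # detection backbone, construct() returns the c3/c4/c5 feature maps instead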
def _make_layer(self, block, layer_num, in_channel, out_channel, stride):
"""_make_layer"""
layers = []
resnet_block = block(in_channel, out_channel, stride=stride)
layers.append(resnet_block)
for _ in range(1, layer_num):
resnet_block = block(out_channel, out_channel, stride=1)
layers.append(resnet_block)
return nn.SequentialCell(layers)
def construct(self, x):
"""construct"""
x = self.conv1(x)
x = self.bn1(x)
x = self.relu(x)
x = self.pad(x)
c1 = self.maxpool(x)
c2 = self.layer1(c1)
c3 = self.layer2(c2)
c4 = self.layer3(c3)
c5 = self.layer4(c4)
out = self.mean(c5, (2, 3))
out = self.flatten(out)
out = self.end_point(out)
return c3, c4, c5
def resnet50(class_num=10):
return ResNet(ResidualBlock,
[3, 4, 6, 3],
[64, 256, 512, 1024],
[256, 512, 1024, 2048],
[1, 2, 2, 2],
class_num)
# MobileNet0.25
def conv_bn(inp, oup, stride=1, leaky=0):
return nn.SequentialCell([
nn.Conv2d(in_channels=inp, out_channels=oup, kernel_size=3, stride=stride,
pad_mode='pad', padding=1, has_bias=False),
nn.BatchNorm2d(num_features=oup, momentum=0.9),
nn.LeakyReLU(alpha=leaky) # ms official: nn.get_activation('relu6')
])
def conv_dw(inp, oup, stride, leaky=0.1):
return nn.SequentialCell([
nn.Conv2d(in_channels=inp, out_channels=inp, kernel_size=3, stride=stride,
pad_mode='pad', padding=1, group=inp, has_bias=False),
nn.BatchNorm2d(num_features=inp, momentum=0.9),
nn.LeakyReLU(alpha=leaky), # ms official: nn.get_activation('relu6')
nn.Conv2d(in_channels=inp, out_channels=oup, kernel_size=1, stride=1,
pad_mode='pad', padding=0, has_bias=False),
nn.BatchNorm2d(num_features=oup, momentum=0.9),
nn.LeakyReLU(alpha=leaky), # ms official: nn.get_activation('relu6')
])
class MobileNetV1(nn.Cell):
"""MobileNetV1"""
def __init__(self, num_classes):
super(MobileNetV1, self).__init__()
self.stage1 = nn.SequentialCell([
conv_bn(3, 8, 2, leaky=0.1), # 3
conv_dw(8, 16, 1), # 7
conv_dw(16, 32, 2), # 11
conv_dw(32, 32, 1), # 19
conv_dw(32, 64, 2), # 27
conv_dw(64, 64, 1), # 43
])
self.stage2 = nn.SequentialCell([
conv_dw(64, 128, 2), # 43 + 16 = 59
conv_dw(128, 128, 1), # 59 + 32 = 91
conv_dw(128, 128, 1), # 91 + 32 = 123
conv_dw(128, 128, 1), # 123 + 32 = 155
conv_dw(128, 128, 1), # 155 + 32 = 187
conv_dw(128, 128, 1), # 187 + 32 = 219
])
self.stage3 = nn.SequentialCell([
            conv_dw(128, 256, 2),  # 219 + 32 = 251
            conv_dw(256, 256, 1),  # 251 + 64 = 315
])
self.avg = P.ReduceMean()
self.fc = nn.Dense(in_channels=256, out_channels=num_classes)
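        # avg/fc form the classification head used for pre-training; as a
        # detection backbone, construct() returns the three stage outputs instead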
def construct(self, x):
x1 = self.stage1(x)
x2 = self.stage2(x1)
x3 = self.stage3(x2)
out = self.avg(x3, (2, 3))
out = self.fc(out)
return x1, x2, x3
def mobilenet025(class_num=1000):
return MobileNetV1(class_num)
# RetinaFace
def Init_KaimingUniform(arr_shape, a=0, nonlinearity='leaky_relu', has_bias=False):
"""Init_KaimingUniform"""
def _calculate_in_and_out(arr_shape):
dim = len(arr_shape)
if dim < 2:
raise ValueError("If initialize data with xavier uniform, the dimension of data must greater than 1.")
n_in = arr_shape[1]
n_out = arr_shape[0]
if dim > 2:
counter = reduce(lambda x, y: x * y, arr_shape[2:])
n_in *= counter
n_out *= counter
return n_in, n_out
def calculate_gain(nonlinearity, a=None):
linear_fans = ['linear', 'conv1d', 'conv2d', 'conv3d',
'conv_transpose1d', 'conv_transpose2d', 'conv_transpose3d']
if nonlinearity in linear_fans or nonlinearity == 'sigmoid':
return 1
if nonlinearity == 'tanh':
return 5.0 / 3
if nonlinearity == 'relu':
return math.sqrt(2.0)
if nonlinearity == 'leaky_relu':
if a is None:
negative_slope = 0.01
elif not isinstance(a, bool) and isinstance(a, int) or isinstance(a, float):
negative_slope = a
else:
raise ValueError("negative_slope {} not a valid number".format(a))
return math.sqrt(2.0 / (1 + negative_slope ** 2))
raise ValueError("Unsupported nonlinearity {}".format(nonlinearity))
fan_in, _ = _calculate_in_and_out(arr_shape)
gain = calculate_gain(nonlinearity, a)
std = gain / math.sqrt(fan_in)
bound = math.sqrt(3.0) * std
weight = np.random.uniform(-bound, bound, arr_shape).astype(np.float32)
bias = None
if has_bias:
bound_bias = 1 / math.sqrt(fan_in)
bias = np.random.uniform(-bound_bias, bound_bias, arr_shape[0:1]).astype(np.float32)
bias = Tensor(bias)
return Tensor(weight), bias
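# with a = sqrt(5), as passed below, the gain evaluates to
# sqrt(2 / (1 + 5)) = sqrt(1/3), matching PyTorch's default Conv2d initialisation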
class ConvBNReLU(nn.SequentialCell):
def __init__(self, in_planes, out_planes, kernel_size, stride, padding, groups, norm_layer, leaky=0):
weight_shape = (out_planes, in_planes, kernel_size, kernel_size)
kaiming_weight, _ = Init_KaimingUniform(weight_shape, a=math.sqrt(5))
super(ConvBNReLU, self).__init__(
nn.Conv2d(in_planes, out_planes, kernel_size, stride, pad_mode='pad', padding=padding, group=groups,
has_bias=False, weight_init=kaiming_weight),
norm_layer(out_planes),
nn.LeakyReLU(alpha=leaky)
)
class ConvBN(nn.SequentialCell):
def __init__(self, in_planes, out_planes, kernel_size, stride, padding, groups, norm_layer):
weight_shape = (out_planes, in_planes, kernel_size, kernel_size)
kaiming_weight, _ = Init_KaimingUniform(weight_shape, a=math.sqrt(5))
super(ConvBN, self).__init__(
nn.Conv2d(in_planes, out_planes, kernel_size, stride, pad_mode='pad', padding=padding, group=groups,
has_bias=False, weight_init=kaiming_weight),
norm_layer(out_planes),
)
class SSH(nn.Cell):
"""SSH"""
def __init__(self, in_channel, out_channel):
super(SSH, self).__init__()
assert out_channel % 4 == 0
leaky = 0
if out_channel <= 64:
leaky = 0.1
norm_layer = nn.BatchNorm2d
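        # the 5x5 and 7x7 branches stack 3x3 convolutions: two stacked 3x3 give
        # an effective 5x5 receptive field, three give 7x7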
self.conv3X3 = ConvBN(in_channel, out_channel // 2, kernel_size=3, stride=1, padding=1, groups=1,
norm_layer=norm_layer)
self.conv5X5_1 = ConvBNReLU(in_channel, out_channel // 4, kernel_size=3, stride=1, padding=1, groups=1,
norm_layer=norm_layer, leaky=leaky)
self.conv5X5_2 = ConvBN(out_channel // 4, out_channel // 4, kernel_size=3, stride=1, padding=1, groups=1,
norm_layer=norm_layer)
self.conv7X7_2 = ConvBNReLU(out_channel // 4, out_channel // 4, kernel_size=3, stride=1, padding=1, groups=1,
norm_layer=norm_layer, leaky=leaky)
self.conv7X7_3 = ConvBN(out_channel // 4, out_channel // 4, kernel_size=3, stride=1, padding=1, groups=1,
norm_layer=norm_layer)
self.cat = P.Concat(axis=1)
self.relu = nn.ReLU()
def construct(self, x):
"""construct"""
conv3X3 = self.conv3X3(x)
conv5X5_1 = self.conv5X5_1(x)
conv5X5 = self.conv5X5_2(conv5X5_1)
conv7X7_2 = self.conv7X7_2(conv5X5_1)
conv7X7 = self.conv7X7_3(conv7X7_2)
out = self.cat((conv3X3, conv5X5, conv7X7))
out = self.relu(out)
return out
class FPN(nn.Cell):
"""FPN"""
def __init__(self, cfg):
super(FPN, self).__init__()
out_channels = cfg['out_channel']
leaky = 0
if out_channels <= 64:
leaky = 0.1
norm_layer = nn.BatchNorm2d
self.output1 = ConvBNReLU(cfg['in_channel'] * 2, cfg['out_channel'], kernel_size=1, stride=1,
padding=0, groups=1, norm_layer=norm_layer, leaky=leaky)
self.output2 = ConvBNReLU(cfg['in_channel'] * 4, cfg['out_channel'], kernel_size=1, stride=1,
padding=0, groups=1, norm_layer=norm_layer, leaky=leaky)
self.output3 = ConvBNReLU(cfg['in_channel'] * 8, cfg['out_channel'], kernel_size=1, stride=1,
padding=0, groups=1, norm_layer=norm_layer, leaky=leaky)
self.merge1 = ConvBNReLU(cfg['out_channel'], cfg['out_channel'], kernel_size=3, stride=1, padding=1, groups=1,
norm_layer=norm_layer, leaky=leaky)
self.merge2 = ConvBNReLU(cfg['out_channel'], cfg['out_channel'], kernel_size=3, stride=1, padding=1, groups=1,
norm_layer=norm_layer, leaky=leaky)
def construct(self, input1, input2, input3):
"""construct"""
output1 = self.output1(input1)
output2 = self.output2(input2)
output3 = self.output3(input3)
up3 = P.ResizeNearestNeighbor([P.Shape()(output2)[2], P.Shape()(output2)[3]])(output3)
output2 = up3 + output2
output2 = self.merge2(output2)
up2 = P.ResizeNearestNeighbor([P.Shape()(output1)[2], P.Shape()(output1)[3]])(output2)
output1 = up2 + output1
output1 = self.merge1(output1)
return output1, output2, output3
class ClassHead(nn.Cell):
"""ClassHead"""
def __init__(self, inchannels=512, num_anchors=3):
super(ClassHead, self).__init__()
self.num_anchors = num_anchors
weight_shape = (self.num_anchors * 2, inchannels, 1, 1)
kaiming_weight, kaiming_bias = Init_KaimingUniform(weight_shape, a=math.sqrt(5), has_bias=True)
self.conv1x1 = nn.Conv2d(inchannels, self.num_anchors * 2, kernel_size=(1, 1), stride=1, padding=0,
has_bias=True, weight_init=kaiming_weight, bias_init=kaiming_bias)
self.permute = P.Transpose()
self.reshape = P.Reshape()
def construct(self, x):
out = self.conv1x1(x)
out = self.permute(out, (0, 2, 3, 1))
return self.reshape(out, (P.Shape()(out)[0], -1, 2))
class BboxHead(nn.Cell):
"""BboxHead"""
def __init__(self, inchannels=512, num_anchors=3):
super(BboxHead, self).__init__()
weight_shape = (num_anchors * 4, inchannels, 1, 1)
kaiming_weight, kaiming_bias = Init_KaimingUniform(weight_shape, a=math.sqrt(5), has_bias=True)
self.conv1x1 = nn.Conv2d(inchannels, num_anchors * 4, kernel_size=(1, 1), stride=1, padding=0, has_bias=True,
weight_init=kaiming_weight, bias_init=kaiming_bias)
self.permute = P.Transpose()
self.reshape = P.Reshape()
def construct(self, x):
out = self.conv1x1(x)
out = self.permute(out, (0, 2, 3, 1))
return self.reshape(out, (P.Shape()(out)[0], -1, 4))
class LandmarkHead(nn.Cell):
"""LandmarkHead"""
def __init__(self, inchannels=512, num_anchors=3):
super(LandmarkHead, self).__init__()
weight_shape = (num_anchors * 10, inchannels, 1, 1)
kaiming_weight, kaiming_bias = Init_KaimingUniform(weight_shape, a=math.sqrt(5), has_bias=True)
self.conv1x1 = nn.Conv2d(inchannels, num_anchors * 10, kernel_size=(1, 1), stride=1, padding=0, has_bias=True,
weight_init=kaiming_weight, bias_init=kaiming_bias)
self.permute = P.Transpose()
self.reshape = P.Reshape()
def construct(self, x):
out = self.conv1x1(x)
out = self.permute(out, (0, 2, 3, 1))
return self.reshape(out, (P.Shape()(out)[0], -1, 10))
class RetinaFace(nn.Cell):
"""RetinaFace"""
def __init__(self, phase='train', backbone=None, cfg=None):
super(RetinaFace, self).__init__()
self.phase = phase
self.base = backbone
self.fpn = FPN(cfg)
self.ssh1 = SSH(cfg['out_channel'], cfg['out_channel'])
self.ssh2 = SSH(cfg['out_channel'], cfg['out_channel'])
self.ssh3 = SSH(cfg['out_channel'], cfg['out_channel'])
self.ClassHead = self._make_class_head(fpn_num=3, inchannels=[cfg['out_channel'], cfg['out_channel'],
cfg['out_channel']], anchor_num=[2, 2, 2])
self.BboxHead = self._make_bbox_head(fpn_num=3, inchannels=[cfg['out_channel'], cfg['out_channel'],
cfg['out_channel']], anchor_num=[2, 2, 2])
self.LandmarkHead = self._make_landmark_head(fpn_num=3, inchannels=[cfg['out_channel'],
cfg['out_channel'],
cfg['out_channel']],
anchor_num=[2, 2, 2])
self.cat = P.Concat(axis=1)
def _make_class_head(self, fpn_num, inchannels, anchor_num):
classhead = nn.CellList()
for i in range(fpn_num):
classhead.append(ClassHead(inchannels[i], anchor_num[i]))
return classhead
def _make_bbox_head(self, fpn_num, inchannels, anchor_num):
bboxhead = nn.CellList()
for i in range(fpn_num):
bboxhead.append(BboxHead(inchannels[i], anchor_num[i]))
return bboxhead
def _make_landmark_head(self, fpn_num, inchannels, anchor_num):
landmarkhead = nn.CellList()
for i in range(fpn_num):
landmarkhead.append(LandmarkHead(inchannels[i], anchor_num[i]))
return landmarkhead
def construct(self, inputs):
"""construct"""
f1, f2, f3 = self.base(inputs)
f1, f2, f3 = self.fpn(f1, f2, f3)
# SSH
f1 = self.ssh1(f1)
f2 = self.ssh2(f2)
f3 = self.ssh3(f3)
features = [f1, f2, f3]
bbox = ()
for i, feature in enumerate(features):
bbox = bbox + (self.BboxHead[i](feature),)
bbox_regressions = self.cat(bbox)
cls = ()
for i, feature in enumerate(features):
cls = cls + (self.ClassHead[i](feature),)
classifications = self.cat(cls)
landm = ()
for i, feature in enumerate(features):
landm = landm + (self.LandmarkHead[i](feature),)
ldm_regressions = self.cat(landm)
if self.phase == 'train':
output = (bbox_regressions, classifications, ldm_regressions)
else:
output = (bbox_regressions, P.Softmax(-1)(classifications), ldm_regressions)
return output
class RetinaFaceWithLossCell(nn.Cell):
"""RetinaFaceWithLossCell"""
def __init__(self, network, multibox_loss, config):
super(RetinaFaceWithLossCell, self).__init__()
self.network = network
self.loc_weight = config['loc_weight']
self.class_weight = config['class_weight']
self.landm_weight = config['landm_weight']
self.multibox_loss = multibox_loss
def construct(self, img, loc_t, conf_t, landm_t):
pred_loc, pre_conf, pre_landm = self.network(img)
loss_loc, loss_conf, loss_landm = self.multibox_loss(pred_loc, loc_t, pre_conf, conf_t, pre_landm, landm_t)
return loss_loc * self.loc_weight + loss_conf * self.class_weight + loss_landm * self.landm_weight
class TrainingWrapper(nn.Cell):
"""TrainingWrapper"""
def __init__(self, network, optimizer, sens=1.0):
super(TrainingWrapper, self).__init__(auto_prefix=False)
self.network = network
self.weights = mindspore.ParameterTuple(network.trainable_params())
self.optimizer = optimizer
self.grad = C.GradOperation(get_by_list=True, sens_param=True)
self.sens = sens
self.reducer_flag = False
self.grad_reducer = None
self.parallel_mode = context.get_auto_parallel_context("parallel_mode")
class_list = [mindspore.context.ParallelMode.DATA_PARALLEL, mindspore.context.ParallelMode.HYBRID_PARALLEL]
if self.parallel_mode in class_list:
self.reducer_flag = True
if self.reducer_flag:
mean = context.get_auto_parallel_context("gradients_mean")
if auto_parallel_context().get_device_num_is_set():
degree = context.get_auto_parallel_context("device_num")
else:
degree = get_group_size()
self.grad_reducer = nn.DistributedGradReducer(optimizer.parameters, mean, degree)
def construct(self, *args):
weights = self.weights
loss = self.network(*args)
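        # seed backprop with a constant sensitivity equal to the loss scale,
        # then take gradients w.r.t. the trainable weights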
sens = P.Fill()(P.DType()(loss), P.Shape()(loss), self.sens)
grads = self.grad(self.network, weights)(*args, sens)
if self.reducer_flag:
# apply grad reducer on grads
grads = self.grad_reducer(grads)
return F.depend(loss, self.optimizer(grads))


@ -0,0 +1,578 @@
# Copyright 2021 Huawei Technologies Co., Ltd
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ============================================================================
"""Network."""
import math
from functools import reduce
import numpy as np
import mindspore
import mindspore.nn as nn
from mindspore.ops import functional as F
from mindspore.ops import operations as P
from mindspore.ops import composite as C
from mindspore import context, Tensor
from mindspore.parallel._auto_parallel_context import auto_parallel_context
from mindspore.communication.management import get_group_size
conv_weight_init = 'HeUniform'
# ResNet
def _weight_variable(shape, factor=0.01):
init_value = np.random.randn(*shape).astype(np.float32) * factor
return Tensor(init_value)
def _conv3x3(in_channel, out_channel, stride=1):
weight_shape = (out_channel, in_channel, 3, 3)
weight = _weight_variable(weight_shape)
return nn.Conv2d(in_channel, out_channel,
kernel_size=3, stride=stride, padding=1, pad_mode='pad', weight_init=weight)
def _conv1x1(in_channel, out_channel, stride=1):
weight_shape = (out_channel, in_channel, 1, 1)
weight = _weight_variable(weight_shape)
return nn.Conv2d(in_channel, out_channel,
kernel_size=1, stride=stride, padding=0, pad_mode='pad', weight_init=weight)
def _conv7x7(in_channel, out_channel, stride=1):
weight_shape = (out_channel, in_channel, 7, 7)
weight = _weight_variable(weight_shape)
return nn.Conv2d(in_channel, out_channel,
kernel_size=7, stride=stride, padding=3, pad_mode='pad', weight_init=weight)
def _bn(channel):
return nn.BatchNorm2d(channel)
def _bn_last(channel):
return nn.BatchNorm2d(channel)
def _fc(in_channel, out_channel):
weight_shape = (out_channel, in_channel)
weight = _weight_variable(weight_shape)
return nn.Dense(in_channel, out_channel, has_bias=True, weight_init=weight, bias_init=0)
class ResidualBlock(nn.Cell):
"""ResidualBlock"""
expansion = 4
def __init__(self,
in_channel,
out_channel,
stride=1):
super(ResidualBlock, self).__init__()
channel = out_channel // self.expansion
self.conv1 = _conv1x1(in_channel, channel, stride=1)
self.bn1 = _bn(channel)
self.conv2 = _conv3x3(channel, channel, stride=stride)
self.bn2 = _bn(channel)
self.conv3 = _conv1x1(channel, out_channel, stride=1)
self.bn3 = _bn_last(out_channel)
self.relu = nn.ReLU()
self.down_sample = False
if stride != 1 or in_channel != out_channel:
self.down_sample = True
self.down_sample_layer = None
if self.down_sample:
self.down_sample_layer = nn.SequentialCell([_conv1x1(in_channel, out_channel, stride),
_bn(out_channel)])
self.add = P.Add()
def construct(self, x):
"""construct"""
identity = x
out = self.conv1(x)
out = self.bn1(out)
out = self.relu(out)
out = self.conv2(out)
out = self.bn2(out)
out = self.relu(out)
out = self.conv3(out)
out = self.bn3(out)
if self.down_sample:
identity = self.down_sample_layer(identity)
out = self.add(out, identity)
out = self.relu(out)
return out
class ResNet(nn.Cell):
"""ResNet"""
def __init__(self,
block,
layer_nums,
in_channels,
out_channels,
strides,
num_classes):
super(ResNet, self).__init__()
if not len(layer_nums) == len(in_channels) == len(out_channels) == 4:
raise ValueError("the length of layer_num, in_channels, out_channels list must be 4!")
self.conv1 = _conv7x7(3, 64, stride=2)
self.bn1 = _bn(64)
self.relu = P.ReLU()
self.zeros1 = P.Zeros()
self.zeros2 = P.Zeros()
self.concat1 = P.Concat(axis=2)
self.concat2 = P.Concat(axis=3)
self.maxpool = nn.MaxPool2d(kernel_size=3, stride=2, pad_mode="valid")
self.layer1 = self._make_layer(block,
layer_nums[0],
in_channel=in_channels[0],
out_channel=out_channels[0],
stride=strides[0])
self.layer2 = self._make_layer(block,
layer_nums[1],
in_channel=in_channels[1],
out_channel=out_channels[1],
stride=strides[1])
self.layer3 = self._make_layer(block,
layer_nums[2],
in_channel=in_channels[2],
out_channel=out_channels[2],
stride=strides[2])
self.layer4 = self._make_layer(block,
layer_nums[3],
in_channel=in_channels[3],
out_channel=out_channels[3],
stride=strides[3])
self.mean = P.ReduceMean(keep_dims=True)
self.flatten = nn.Flatten()
self.end_point = _fc(out_channels[3], num_classes)
def _make_layer(self, block, layer_num, in_channel, out_channel, stride):
"""_make_layer"""
layers = []
resnet_block = block(in_channel, out_channel, stride=stride)
layers.append(resnet_block)
for _ in range(1, layer_num):
resnet_block = block(out_channel, out_channel, stride=1)
layers.append(resnet_block)
return nn.SequentialCell(layers)
def construct(self, x):
"""construct"""
x = self.conv1(x)
x = self.bn1(x)
x = self.relu(x)
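        # pad one row on top and one column on the left so the valid-mode
        # 3x3/stride-2 max-pool below produces the expected spatial size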
zeros1 = self.zeros1((x.shape[0], x.shape[1], 1, x.shape[3]), mindspore.float32)
x = self.concat1((zeros1, x))
zeros2 = self.zeros2((x.shape[0], x.shape[1], x.shape[2], 1), mindspore.float32)
x = self.concat2((zeros2, x))
c1 = self.maxpool(x)
c2 = self.layer1(c1)
c3 = self.layer2(c2)
c4 = self.layer3(c3)
c5 = self.layer4(c4)
out = self.mean(c5, (2, 3))
out = self.flatten(out)
out = self.end_point(out)
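        # `out` is dead code here: the fc head is kept so ImageNet-pretrained
        # checkpoints load cleanly, but this backbone returns c3/c4/c5 for the FPN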
return c3, c4, c5
def resnet50(class_num=10):
return ResNet(ResidualBlock,
[3, 4, 6, 3],
[64, 256, 512, 1024],
[256, 512, 1024, 2048],
[1, 2, 2, 2],
class_num)
# RetinaFace
def Init_KaimingUniform(arr_shape, a=0, nonlinearity='leaky_relu', has_bias=False):
"""Init_KaimingUniform"""
def _calculate_in_and_out(arr_shape):
dim = len(arr_shape)
if dim < 2:
raise ValueError("If initialize data with xavier uniform, the dimension of data must greater than 1.")
n_in = arr_shape[1]
n_out = arr_shape[0]
if dim > 2:
counter = reduce(lambda x, y: x * y, arr_shape[2:])
n_in *= counter
n_out *= counter
return n_in, n_out
def calculate_gain(nonlinearity, a=None):
linear_fans = ['linear', 'conv1d', 'conv2d', 'conv3d',
'conv_transpose1d', 'conv_transpose2d', 'conv_transpose3d']
if nonlinearity in linear_fans or nonlinearity == 'sigmoid':
return 1
if nonlinearity == 'tanh':
return 5.0 / 3
if nonlinearity == 'relu':
return math.sqrt(2.0)
if nonlinearity == 'leaky_relu':
if a is None:
negative_slope = 0.01
            elif isinstance(a, (int, float)) and not isinstance(a, bool):
negative_slope = a
else:
raise ValueError("negative_slope {} not a valid number".format(a))
return math.sqrt(2.0 / (1 + negative_slope ** 2))
raise ValueError("Unsupported nonlinearity {}".format(nonlinearity))
fan_in, _ = _calculate_in_and_out(arr_shape)
gain = calculate_gain(nonlinearity, a)
std = gain / math.sqrt(fan_in)
bound = math.sqrt(3.0) * std
weight = np.random.uniform(-bound, bound, arr_shape).astype(np.float32)
bias = None
if has_bias:
bound_bias = 1 / math.sqrt(fan_in)
bias = np.random.uniform(-bound_bias, bound_bias, arr_shape[0:1]).astype(np.float32)
bias = Tensor(bias)
return Tensor(weight), bias
class ConvBNReLU(nn.SequentialCell):
def __init__(self, in_planes, out_planes, kernel_size, stride, padding, groups, norm_layer, leaky=0):
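        # `leaky` is accepted for interface parity with the MobileNet0.25 variant; this block always uses ReLU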
weight_shape = (out_planes, in_planes, kernel_size, kernel_size)
kaiming_weight, _ = Init_KaimingUniform(weight_shape, a=math.sqrt(5))
super(ConvBNReLU, self).__init__(
nn.Conv2d(in_planes, out_planes, kernel_size, stride, pad_mode='pad', padding=padding, group=groups,
has_bias=False, weight_init=kaiming_weight),
norm_layer(out_planes),
nn.ReLU()
)
class ConvBN(nn.SequentialCell):
def __init__(self, in_planes, out_planes, kernel_size, stride, padding, groups, norm_layer):
weight_shape = (out_planes, in_planes, kernel_size, kernel_size)
kaiming_weight, _ = Init_KaimingUniform(weight_shape, a=math.sqrt(5))
super(ConvBN, self).__init__(
nn.Conv2d(in_planes, out_planes, kernel_size, stride, pad_mode='pad', padding=padding, group=groups,
has_bias=False, weight_init=kaiming_weight),
norm_layer(out_planes),
)
class SSH(nn.Cell):
"""SSH"""
def __init__(self, in_channel, out_channel):
super(SSH, self).__init__()
assert out_channel % 4 == 0
leaky = 0
if out_channel <= 64:
leaky = 0.1
norm_layer = nn.BatchNorm2d
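        # SSH emulates 5x5 and 7x7 receptive fields with stacks of 3x3 convs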
self.conv3X3 = ConvBN(in_channel, out_channel // 2, kernel_size=3, stride=1, padding=1, groups=1,
norm_layer=norm_layer)
self.conv5X5_1 = ConvBNReLU(in_channel, out_channel // 4, kernel_size=3, stride=1, padding=1, groups=1,
norm_layer=norm_layer, leaky=leaky)
self.conv5X5_2 = ConvBN(out_channel // 4, out_channel // 4, kernel_size=3, stride=1, padding=1, groups=1,
norm_layer=norm_layer)
self.conv7X7_2 = ConvBNReLU(out_channel // 4, out_channel // 4, kernel_size=3, stride=1, padding=1, groups=1,
norm_layer=norm_layer, leaky=leaky)
self.conv7X7_3 = ConvBN(out_channel // 4, out_channel // 4, kernel_size=3, stride=1, padding=1, groups=1,
norm_layer=norm_layer)
self.cat = P.Concat(axis=1)
self.relu = nn.ReLU()
def construct(self, x):
"""construct"""
conv3X3 = self.conv3X3(x)
conv5X5_1 = self.conv5X5_1(x)
conv5X5 = self.conv5X5_2(conv5X5_1)
conv7X7_2 = self.conv7X7_2(conv5X5_1)
conv7X7 = self.conv7X7_3(conv7X7_2)
out = self.cat((conv3X3, conv5X5, conv7X7))
out = self.relu(out)
return out
class FPN(nn.Cell):
"""FPN"""
def __init__(self):
super(FPN, self).__init__()
out_channels = 256
leaky = 0
if out_channels <= 64:
leaky = 0.1
norm_layer = nn.BatchNorm2d
self.output1 = ConvBNReLU(512, 256, kernel_size=1, stride=1, padding=0, groups=1,
norm_layer=norm_layer, leaky=leaky)
self.output2 = ConvBNReLU(1024, 256, kernel_size=1, stride=1, padding=0, groups=1,
norm_layer=norm_layer, leaky=leaky)
self.output3 = ConvBNReLU(2048, 256, kernel_size=1, stride=1, padding=0, groups=1,
norm_layer=norm_layer, leaky=leaky)
self.merge1 = ConvBNReLU(256, 256, kernel_size=3, stride=1, padding=1, groups=1,
norm_layer=norm_layer, leaky=leaky)
self.merge2 = ConvBNReLU(256, 256, kernel_size=3, stride=1, padding=1, groups=1,
norm_layer=norm_layer, leaky=leaky)
def construct(self, input1, input2, input3):
"""construct"""
output1 = self.output1(input1)
output2 = self.output2(input2)
output3 = self.output3(input3)
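        # top-down pathway: upsample the coarser level to the finer level's
        # spatial size and fuse by element-wise addition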
up3 = P.ResizeNearestNeighbor([P.Shape()(output2)[2], P.Shape()(output2)[3]])(output3)
output2 = up3 + output2
output2 = self.merge2(output2)
up2 = P.ResizeNearestNeighbor([P.Shape()(output1)[2], P.Shape()(output1)[3]])(output2)
output1 = up2 + output1
output1 = self.merge1(output1)
return output1, output2, output3
class ClassHead(nn.Cell):
"""ClassHead"""
def __init__(self, inchannels=512, num_anchors=3):
super(ClassHead, self).__init__()
self.num_anchors = num_anchors
weight_shape = (self.num_anchors * 2, inchannels, 1, 1)
kaiming_weight, kaiming_bias = Init_KaimingUniform(weight_shape, a=math.sqrt(5), has_bias=True)
self.conv1x1 = nn.Conv2d(inchannels, self.num_anchors * 2, kernel_size=(1, 1), stride=1, padding=0,
has_bias=True, weight_init=kaiming_weight, bias_init=kaiming_bias)
self.permute = P.Transpose()
self.reshape = P.Reshape()
def construct(self, x):
"""construct"""
out = self.conv1x1(x)
out = self.permute(out, (0, 2, 3, 1))
return self.reshape(out, (P.Shape()(out)[0], -1, 2))
class BboxHead(nn.Cell):
"""BboxHead"""
def __init__(self, inchannels=512, num_anchors=3):
super(BboxHead, self).__init__()
weight_shape = (num_anchors * 4, inchannels, 1, 1)
kaiming_weight, kaiming_bias = Init_KaimingUniform(weight_shape, a=math.sqrt(5), has_bias=True)
self.conv1x1 = nn.Conv2d(inchannels, num_anchors * 4, kernel_size=(1, 1), stride=1, padding=0, has_bias=True,
weight_init=kaiming_weight, bias_init=kaiming_bias)
self.permute = P.Transpose()
self.reshape = P.Reshape()
def construct(self, x):
"""construct"""
out = self.conv1x1(x)
out = self.permute(out, (0, 2, 3, 1))
return self.reshape(out, (P.Shape()(out)[0], -1, 4))
class LandmarkHead(nn.Cell):
"""LandmarkHead"""
def __init__(self, inchannels=512, num_anchors=3):
super(LandmarkHead, self).__init__()
weight_shape = (num_anchors * 10, inchannels, 1, 1)
kaiming_weight, kaiming_bias = Init_KaimingUniform(weight_shape, a=math.sqrt(5), has_bias=True)
self.conv1x1 = nn.Conv2d(inchannels, num_anchors * 10, kernel_size=(1, 1), stride=1, padding=0, has_bias=True,
weight_init=kaiming_weight, bias_init=kaiming_bias)
self.permute = P.Transpose()
self.reshape = P.Reshape()
def construct(self, x):
"""construct"""
out = self.conv1x1(x)
out = self.permute(out, (0, 2, 3, 1))
return self.reshape(out, (P.Shape()(out)[0], -1, 10))
class RetinaFace(nn.Cell):
"""RetinaFace"""
def __init__(self, phase='train', backbone=None):
super(RetinaFace, self).__init__()
self.phase = phase
self.base = backbone
self.fpn = FPN()
self.ssh1 = SSH(256, 256)
self.ssh2 = SSH(256, 256)
self.ssh3 = SSH(256, 256)
self.ClassHead = self._make_class_head(fpn_num=3, inchannels=[256, 256, 256], anchor_num=[2, 2, 2])
self.BboxHead = self._make_bbox_head(fpn_num=3, inchannels=[256, 256, 256], anchor_num=[2, 2, 2])
self.LandmarkHead = self._make_landmark_head(fpn_num=3, inchannels=[256, 256, 256], anchor_num=[2, 2, 2])
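        # each head reshapes its level to (batch, priors_at_level, k) with k = 2/4/10,
        # so P.Concat(axis=1) below stitches the three levels in prior order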
self.cat = P.Concat(axis=1)
def _make_class_head(self, fpn_num, inchannels, anchor_num):
classhead = nn.CellList()
for i in range(fpn_num):
classhead.append(ClassHead(inchannels[i], anchor_num[i]))
return classhead
def _make_bbox_head(self, fpn_num, inchannels, anchor_num):
bboxhead = nn.CellList()
for i in range(fpn_num):
bboxhead.append(BboxHead(inchannels[i], anchor_num[i]))
return bboxhead
def _make_landmark_head(self, fpn_num, inchannels, anchor_num):
landmarkhead = nn.CellList()
for i in range(fpn_num):
landmarkhead.append(LandmarkHead(inchannels[i], anchor_num[i]))
return landmarkhead
def construct(self, inputs):
"""construct"""
f1, f2, f3 = self.base(inputs)
f1, f2, f3 = self.fpn(f1, f2, f3)
# SSH
f1 = self.ssh1(f1)
f2 = self.ssh2(f2)
f3 = self.ssh3(f3)
features = [f1, f2, f3]
bbox = ()
for i, feature in enumerate(features):
bbox = bbox + (self.BboxHead[i](feature),)
bbox_regressions = self.cat(bbox)
cls = ()
for i, feature in enumerate(features):
cls = cls + (self.ClassHead[i](feature),)
classifications = self.cat(cls)
landm = ()
for i, feature in enumerate(features):
landm = landm + (self.LandmarkHead[i](feature),)
ldm_regressions = self.cat(landm)
if self.phase == 'train':
output = (bbox_regressions, classifications, ldm_regressions)
else:
output = (bbox_regressions, P.Softmax(-1)(classifications), ldm_regressions)
return output
class RetinaFaceWithLossCell(nn.Cell):
"""RetinaFaceWithLossCell"""
def __init__(self, network, multibox_loss, config):
super(RetinaFaceWithLossCell, self).__init__()
self.network = network
self.loc_weight = config['loc_weight']
self.class_weight = config['class_weight']
self.landm_weight = config['landm_weight']
self.multibox_loss = multibox_loss
def construct(self, img, loc_t, conf_t, landm_t):
"""construct"""
pred_loc, pre_conf, pre_landm = self.network(img)
        loss_loc, loss_conf, loss_landm = self.multibox_loss(pred_loc, loc_t, pre_conf, conf_t, pre_landm, landm_t)  # delegates to MultiBoxLoss.construct in loss.py
return loss_loc * self.loc_weight + loss_conf * self.class_weight + loss_landm * self.landm_weight
# from dsj
GRADIENT_CLIP_TYPE = 1
GRADIENT_CLIP_VALUE = 1.0
clip_grad = C.MultitypeFuncGraph("clip_grad")
@clip_grad.register("Number", "Number", "Tensor")
def _clip_grad(clip_type, clip_value, grad):
"""_clip_grad"""
if clip_type not in (0, 1):
return grad
dt = F.dtype(grad)
if clip_type == 0:
new_grad = C.clip_by_value(grad, F.cast(F.tuple_to_array((-clip_value,)), dt),
F.cast(F.tuple_to_array((clip_value,)), dt))
else:
new_grad = nn.ClipByNorm()(grad, F.cast(F.tuple_to_array((clip_value,)), dt))
return new_grad
class TrainingWrapper(nn.Cell):
"""TrainingWrapper"""
def __init__(self, network, optimizer, sens=1.0):
super(TrainingWrapper, self).__init__(auto_prefix=False)
self.network = network
self.weights = mindspore.ParameterTuple(network.trainable_params())
self.optimizer = optimizer
self.grad = C.GradOperation(get_by_list=True, sens_param=True)
self.sens = sens
self.reducer_flag = False
self.grad_reducer = None
self.parallel_mode = context.get_auto_parallel_context("parallel_mode")
class_list = [mindspore.context.ParallelMode.DATA_PARALLEL, mindspore.context.ParallelMode.HYBRID_PARALLEL]
if self.parallel_mode in class_list:
self.reducer_flag = True
if self.reducer_flag:
mean = context.get_auto_parallel_context("gradients_mean")
if auto_parallel_context().get_device_num_is_set():
degree = context.get_auto_parallel_context("device_num")
else:
degree = get_group_size()
self.grad_reducer = nn.DistributedGradReducer(optimizer.parameters, mean, degree)
# from dsj
self.hyper_map = mindspore.ops.HyperMap()
def construct(self, *args):
"""construct"""
weights = self.weights
loss = self.network(*args)
sens = P.Fill()(P.DType()(loss), P.Shape()(loss), self.sens)
grads = self.grad(self.network, weights)(*args, sens)
# from dsj
grads = self.hyper_map(F.partial(clip_grad, GRADIENT_CLIP_TYPE, GRADIENT_CLIP_VALUE), grads)
if self.reducer_flag:
# apply grad reducer on grads
grads = self.grad_reducer(grads)
return F.depend(loss, self.optimizer(grads))
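
Putting the pieces of this file together mirrors what `train_with_resnet()` does further below. A hedged sketch; the module path, `src.loss.MultiBoxLoss`, and all numeric values are illustrative assumptions, not fixed by this commit:

```python
import mindspore.nn as nn
from src.loss import MultiBoxLoss                       # signature as used in train.py
from src.network_with_resnet import (RetinaFace, RetinaFaceWithLossCell,
                                     TrainingWrapper, resnet50)

cfg = {'loc_weight': 2.0, 'class_weight': 1.0, 'landm_weight': 1.0}   # illustrative
backbone = resnet50(1001)                               # fc head unused; yields c3/c4/c5
loss = MultiBoxLoss(2, 16800, 7, 8)                     # classes, anchors, neg ratio, batch
net = RetinaFaceWithLossCell(RetinaFace(phase='train', backbone=backbone), loss, cfg)
opt = nn.SGD(net.trainable_params(), learning_rate=1e-2, momentum=0.9)
train_net = TrainingWrapper(net, opt)                   # clip-by-norm(1.0) + optimizer step
```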


@@ -0,0 +1,204 @@
# Copyright 2021 Huawei Technologies Co., Ltd
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ============================================================================
"""Network."""
import numpy as np
import mindspore.nn as nn
from mindspore.ops import operations as P
from mindspore import Tensor
# ResNet
def _weight_variable(shape, factor=0.01):
init_value = np.random.randn(*shape).astype(np.float32) * factor
return Tensor(init_value)
def _conv3x3(in_channel, out_channel, stride=1):
weight_shape = (out_channel, in_channel, 3, 3)
weight = _weight_variable(weight_shape)
return nn.Conv2d(in_channel, out_channel,
kernel_size=3, stride=stride, padding=1, pad_mode='pad', weight_init=weight)
def _conv1x1(in_channel, out_channel, stride=1):
weight_shape = (out_channel, in_channel, 1, 1)
weight = _weight_variable(weight_shape)
return nn.Conv2d(in_channel, out_channel,
kernel_size=1, stride=stride, padding=0, pad_mode='pad', weight_init=weight)
def _conv7x7(in_channel, out_channel, stride=1):
weight_shape = (out_channel, in_channel, 7, 7)
weight = _weight_variable(weight_shape)
return nn.Conv2d(in_channel, out_channel,
kernel_size=7, stride=stride, padding=3, pad_mode='pad', weight_init=weight)
def _bn(channel):
return nn.BatchNorm2d(channel)
def _bn_last(channel):
return nn.BatchNorm2d(channel)
def _fc(in_channel, out_channel):
weight_shape = (out_channel, in_channel)
weight = _weight_variable(weight_shape)
return nn.Dense(in_channel, out_channel, has_bias=True, weight_init=weight, bias_init=0)
class ResidualBlock(nn.Cell):
"""ResidualBlock"""
expansion = 4
def __init__(self,
in_channel,
out_channel,
stride=1):
super(ResidualBlock, self).__init__()
channel = out_channel // self.expansion
self.conv1 = _conv1x1(in_channel, channel, stride=1)
self.bn1 = _bn(channel)
self.conv2 = _conv3x3(channel, channel, stride=stride)
self.bn2 = _bn(channel)
self.conv3 = _conv1x1(channel, out_channel, stride=1)
self.bn3 = _bn_last(out_channel)
self.relu = nn.ReLU()
self.down_sample = False
if stride != 1 or in_channel != out_channel:
self.down_sample = True
self.down_sample_layer = None
if self.down_sample:
self.down_sample_layer = nn.SequentialCell([_conv1x1(in_channel, out_channel, stride),
_bn(out_channel)])
self.add = P.Add()
def construct(self, x):
"""construct"""
identity = x
out = self.conv1(x)
out = self.bn1(out)
out = self.relu(out)
out = self.conv2(out)
out = self.bn2(out)
out = self.relu(out)
out = self.conv3(out)
out = self.bn3(out)
if self.down_sample:
identity = self.down_sample_layer(identity)
out = self.add(out, identity)
out = self.relu(out)
return out
class ResNet(nn.Cell):
"""ResNet"""
def __init__(self,
block,
layer_nums,
in_channels,
out_channels,
strides,
num_classes):
super(ResNet, self).__init__()
if not len(layer_nums) == len(in_channels) == len(out_channels) == 4:
raise ValueError("the length of layer_num, in_channels, out_channels list must be 4!")
self.conv1 = _conv7x7(3, 64, stride=2)
self.bn1 = _bn(64)
self.relu = P.ReLU()
self.pad = P.Pad(((0, 0), (0, 0), (1, 0), (1, 0)))
self.maxpool = nn.MaxPool2d(kernel_size=3, stride=2, pad_mode="valid")
self.layer1 = self._make_layer(block,
layer_nums[0],
in_channel=in_channels[0],
out_channel=out_channels[0],
stride=strides[0])
self.layer2 = self._make_layer(block,
layer_nums[1],
in_channel=in_channels[1],
out_channel=out_channels[1],
stride=strides[1])
self.layer3 = self._make_layer(block,
layer_nums[2],
in_channel=in_channels[2],
out_channel=out_channels[2],
stride=strides[2])
self.layer4 = self._make_layer(block,
layer_nums[3],
in_channel=in_channels[3],
out_channel=out_channels[3],
stride=strides[3])
self.mean = P.ReduceMean(keep_dims=True)
self.flatten = nn.Flatten()
self.end_point = _fc(out_channels[3], num_classes)
def _make_layer(self, block, layer_num, in_channel, out_channel, stride):
"""_make_layer"""
layers = []
resnet_block = block(in_channel, out_channel, stride=stride)
layers.append(resnet_block)
for _ in range(1, layer_num):
resnet_block = block(out_channel, out_channel, stride=1)
layers.append(resnet_block)
return nn.SequentialCell(layers)
def construct(self, x):
"""construct"""
x = self.conv1(x)
x = self.bn1(x)
x = self.relu(x)
x = self.pad(x)
c1 = self.maxpool(x)
c2 = self.layer1(c1)
c3 = self.layer2(c2)
c4 = self.layer3(c3)
c5 = self.layer4(c4)
out = self.mean(c5, (2, 3))
out = self.flatten(out)
out = self.end_point(out)
return out
def resnet50(class_num=1000):
return ResNet(ResidualBlock,
[3, 4, 6, 3],
[64, 256, 512, 1024],
[256, 512, 1024, 2048],
[1, 2, 2, 2],
class_num)


@@ -0,0 +1,165 @@
# Copyright 2021 Huawei Technologies Co., Ltd
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ============================================================================
"""Utils."""
from itertools import product
import math
import numpy as np
def prior_box(image_sizes, min_sizes, steps, clip=False):
"""prior box"""
feature_maps = [
[math.ceil(image_sizes[0] / step), math.ceil(image_sizes[1] / step)]
for step in steps]
anchors = []
for k, f in enumerate(feature_maps):
for i, j in product(range(f[0]), range(f[1])):
for min_size in min_sizes[k]:
s_kx = min_size / image_sizes[1]
s_ky = min_size / image_sizes[0]
cx = (j + 0.5) * steps[k] / image_sizes[1]
cy = (i + 0.5) * steps[k] / image_sizes[0]
anchors += [cx, cy, s_kx, s_ky]
output = np.asarray(anchors).reshape([-1, 4]).astype(np.float32)
if clip:
output = np.clip(output, 0, 1)
return output
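# e.g. a hypothetical 640x640 input with steps (8, 16, 32) gives 80x80, 40x40 and
# 20x20 grids; with 2 anchor sizes per cell that is (6400 + 1600 + 400) * 2 = 16800 priors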
def center_point_2_box(boxes):
return np.concatenate((boxes[:, 0:2] - boxes[:, 2:4] / 2,
boxes[:, 0:2] + boxes[:, 2:4] / 2), axis=1)
def compute_intersect(a, b):
"""compute_intersect"""
A = a.shape[0]
B = b.shape[0]
max_xy = np.minimum(
np.broadcast_to(np.expand_dims(a[:, 2:4], 1), [A, B, 2]),
np.broadcast_to(np.expand_dims(b[:, 2:4], 0), [A, B, 2]))
min_xy = np.maximum(
np.broadcast_to(np.expand_dims(a[:, 0:2], 1), [A, B, 2]),
np.broadcast_to(np.expand_dims(b[:, 0:2], 0), [A, B, 2]))
inter = np.maximum((max_xy - min_xy), np.zeros_like(max_xy - min_xy))
return inter[:, :, 0] * inter[:, :, 1]
def compute_overlaps(a, b):
"""compute_overlaps"""
inter = compute_intersect(a, b)
area_a = np.broadcast_to(
np.expand_dims(
(a[:, 2] - a[:, 0]) * (a[:, 3] - a[:, 1]), 1),
np.shape(inter))
area_b = np.broadcast_to(
np.expand_dims(
(b[:, 2] - b[:, 0]) * (b[:, 3] - b[:, 1]), 0),
np.shape(inter))
union = area_a + area_b - inter
return inter / union
def match(threshold, boxes, priors, var, labels, landms):
"""match"""
overlaps = compute_overlaps(boxes, center_point_2_box(priors))
best_prior_overlap = overlaps.max(1, keepdims=True)
best_prior_idx = np.argsort(-overlaps, axis=1)[:, 0:1]
valid_gt_idx = best_prior_overlap[:, 0] >= 0.2
best_prior_idx_filter = best_prior_idx[valid_gt_idx, :]
if best_prior_idx_filter.shape[0] <= 0:
loc = np.zeros((priors.shape[0], 4), dtype=np.float32)
conf = np.zeros((priors.shape[0],), dtype=np.int32)
landm = np.zeros((priors.shape[0], 10), dtype=np.float32)
return loc, conf, landm
best_truth_overlap = overlaps.max(0, keepdims=True)
best_truth_idx = np.argsort(-overlaps, axis=0)[:1, :]
best_truth_idx = best_truth_idx.squeeze(0)
best_truth_overlap = best_truth_overlap.squeeze(0)
best_prior_idx = best_prior_idx.squeeze(1)
best_prior_idx_filter = best_prior_idx_filter.squeeze(1)
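    # guarantee every kept ground-truth box claims its best prior: 2 exceeds any IoU threshold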
best_truth_overlap[best_prior_idx_filter] = 2
for j in range(best_prior_idx.shape[0]):
best_truth_idx[best_prior_idx[j]] = j
matches = boxes[best_truth_idx]
# encode boxes
offset_cxcy = (matches[:, 0:2] + matches[:, 2:4]) / 2 - priors[:, 0:2]
offset_cxcy /= (var[0] * priors[:, 2:4])
wh = (matches[:, 2:4] - matches[:, 0:2]) / priors[:, 2:4]
wh[wh == 0] = 1e-12
wh = np.log(wh) / var[1]
loc = np.concatenate([offset_cxcy, wh], axis=1)
conf = labels[best_truth_idx]
conf[best_truth_overlap < threshold] = 0
matches_landm = landms[best_truth_idx]
# encode landms
matched = np.reshape(matches_landm, [-1, 5, 2])
priors = np.broadcast_to(np.expand_dims(priors, 1), [priors.shape[0], 5, 4])
offset_cxcy = matched[:, :, 0:2] - priors[:, :, 0:2]
offset_cxcy /= (priors[:, :, 2:4] * var[0])
landm = np.reshape(offset_cxcy, [-1, 10])
return loc, np.array(conf, dtype=np.int32), landm
class bbox_encode():
"""bbox_encode"""
def __init__(self, cfg):
self.match_thresh = cfg['match_thresh']
self.variances = cfg['variance']
self.priors = prior_box((cfg['image_size'], cfg['image_size']),
[[16, 32], [64, 128], [256, 512]],
[8, 16, 32],
cfg['clip'])
def __call__(self, image, targets):
boxes = targets[:, :4]
labels = targets[:, -1]
landms = targets[:, 4:14]
priors = self.priors
loc_t, conf_t, landm_t = match(self.match_thresh, boxes, priors, self.variances, labels, landms)
return image, loc_t, conf_t, landm_t
def decode_bbox(bbox, priors, var):
boxes = np.concatenate((
priors[:, 0:2] + bbox[:, 0:2] * var[0] * priors[:, 2:4],
priors[:, 2:4] * np.exp(bbox[:, 2:4] * var[1])), axis=1) # (xc, yc, w, h)
boxes[:, :2] -= boxes[:, 2:] / 2 # (x0, y0, w, h)
boxes[:, 2:] += boxes[:, :2] # (x0, y0, x1, y1)
return boxes
def decode_landm(landm, priors, var):
return np.concatenate((priors[:, 0:2] + landm[:, 0:2] * var[0] * priors[:, 2:4],
priors[:, 0:2] + landm[:, 2:4] * var[0] * priors[:, 2:4],
priors[:, 0:2] + landm[:, 4:6] * var[0] * priors[:, 2:4],
priors[:, 0:2] + landm[:, 6:8] * var[0] * priors[:, 2:4],
priors[:, 0:2] + landm[:, 8:10] * var[0] * priors[:, 2:4],
), axis=1)
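
A quick sanity check for the encode/decode pair, assuming this file is importable as `src.utils`: decoding an all-zero regression output must give back the priors themselves, converted from center form to corner form.

```python
import numpy as np
from src.utils import prior_box, decode_bbox   # assumed module path

priors = prior_box((640, 640), [[16, 32], [64, 128], [256, 512]], [8, 16, 32])
variances = [0.1, 0.2]                          # the cfg['variance'] convention
boxes = decode_bbox(np.zeros_like(priors), priors, variances)
# (cx, cy, w, h) priors re-emerge as (x0, y0, x1, y1) corner boxes
assert np.allclose(boxes[:, 2:] - boxes[:, :2], priors[:, 2:])
print(priors.shape)                             # (16800, 4) for this input size
```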


@@ -0,0 +1,227 @@
# Copyright 2021 Huawei Technologies Co., Ltd
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ============================================================================
"""Train Retinaface_resnet50ormobilenet0.25."""
import argparse
import math
import mindspore
from mindspore import context
from mindspore.context import ParallelMode
from mindspore.train import Model
from mindspore.train.callback import ModelCheckpoint, CheckpointConfig, LossMonitor, TimeMonitor
from mindspore.communication.management import init, get_rank, get_group_size
from mindspore.train.serialization import load_checkpoint, load_param_into_net
from src.config import cfg_res50, cfg_mobile025
from src.loss import MultiBoxLoss
from src.dataset import create_dataset
from src.lr_schedule import adjust_learning_rate, warmup_cosine_annealing_lr
def train_with_resnet(cfg):
"""train_with_resnet"""
mindspore.common.seed.set_seed(cfg['seed'])
from src.network_with_resnet import RetinaFace, RetinaFaceWithLossCell, TrainingWrapper, resnet50
context.set_context(mode=context.GRAPH_MODE, device_target=cfg['device_target'])
device_num = cfg['nnpu']
rank = 0
if cfg['device_target'] == "Ascend":
if device_num > 1:
context.reset_auto_parallel_context()
context.set_auto_parallel_context(device_num=device_num, parallel_mode=ParallelMode.DATA_PARALLEL,
gradients_mean=True)
init()
rank = get_rank()
else:
context.set_context(device_id=cfg['device_id'])
elif cfg['device_target'] == "GPU":
if cfg['ngpu'] > 1:
init("nccl")
context.set_auto_parallel_context(device_num=get_group_size(), parallel_mode=ParallelMode.DATA_PARALLEL,
gradients_mean=True)
rank = get_rank()
batch_size = cfg['batch_size']
max_epoch = cfg['epoch']
momentum = cfg['momentum']
lr_type = cfg['lr_type']
weight_decay = cfg['weight_decay']
loss_scale = cfg['loss_scale']
initial_lr = cfg['initial_lr']
gamma = cfg['gamma']
T_max = cfg['T_max']
eta_min = cfg['eta_min']
training_dataset = cfg['training_dataset']
num_classes = 2
negative_ratio = 7
stepvalues = (cfg['decay1'], cfg['decay2'])
ds_train = create_dataset(training_dataset, cfg, batch_size, multiprocessing=True, num_worker=cfg['num_workers'])
    print('dataset size is:', ds_train.get_dataset_size())
steps_per_epoch = math.ceil(ds_train.get_dataset_size())
multibox_loss = MultiBoxLoss(num_classes, cfg['num_anchor'], negative_ratio, cfg['batch_size'])
backbone = resnet50(1001)
backbone.set_train(True)
if cfg['pretrain'] and cfg['resume_net'] is None:
pretrained_res50 = cfg['pretrain_path']
param_dict_res50 = load_checkpoint(pretrained_res50)
load_param_into_net(backbone, param_dict_res50)
print('Load resnet50 from [{}] done.'.format(pretrained_res50))
net = RetinaFace(phase='train', backbone=backbone)
net.set_train(True)
if cfg['resume_net'] is not None:
pretrain_model_path = cfg['resume_net']
param_dict_retinaface = load_checkpoint(pretrain_model_path)
load_param_into_net(net, param_dict_retinaface)
print('Resume Model from [{}] Done.'.format(cfg['resume_net']))
net = RetinaFaceWithLossCell(net, multibox_loss, cfg)
if lr_type == 'dynamic_lr':
lr = adjust_learning_rate(initial_lr, gamma, stepvalues, steps_per_epoch, max_epoch,
warmup_epoch=cfg['warmup_epoch'], lr_type1=lr_type)
elif lr_type == 'cosine_annealing':
        lr = warmup_cosine_annealing_lr(initial_lr, steps_per_epoch, cfg['warmup_epoch'], max_epoch, T_max, eta_min)
    else:
        raise ValueError('lr_type is not defined.')
if cfg['optim'] == 'momentum':
opt = mindspore.nn.Momentum(net.trainable_params(), lr, momentum, weight_decay, loss_scale)
elif cfg['optim'] == 'sgd':
opt = mindspore.nn.SGD(params=net.trainable_params(), learning_rate=lr, momentum=momentum,
weight_decay=weight_decay, loss_scale=loss_scale)
else:
        raise ValueError('optim is not defined.')
net = TrainingWrapper(net, opt)
model = Model(net)
config_ck = CheckpointConfig(save_checkpoint_steps=ds_train.get_dataset_size() * 1,
keep_checkpoint_max=cfg['keep_checkpoint_max'])
cfg['ckpt_path'] = cfg['ckpt_path'] + "ckpt_" + str(rank) + "/"
ckpoint_cb = ModelCheckpoint(prefix="RetinaFace", directory=cfg['ckpt_path'], config=config_ck)
time_cb = TimeMonitor(data_size=ds_train.get_dataset_size())
callback_list = [LossMonitor(), time_cb, ckpoint_cb]
print("============== Starting Training ==============")
model.train(max_epoch, ds_train, callbacks=callback_list)
def train_with_mobilenet(cfg):
"""train_with_mobilenet"""
mindspore.common.seed.set_seed(cfg['seed'])
from src.network_with_mobilenet import RetinaFace, RetinaFaceWithLossCell, TrainingWrapper, resnet50, mobilenet025
context.set_context(mode=context.GRAPH_MODE, device_target='GPU', save_graphs=False)
if context.get_context("device_target") == "GPU":
# Enable graph kernel
context.set_context(enable_graph_kernel=True, graph_kernel_flags="--enable_parallel_fusion")
if cfg['ngpu'] > 1:
init("nccl")
context.set_auto_parallel_context(device_num=get_group_size(), parallel_mode=ParallelMode.DATA_PARALLEL,
gradients_mean=True)
cfg['ckpt_path'] = cfg['ckpt_path'] + "ckpt_" + str(get_rank()) + "/"
batch_size = cfg['batch_size']
max_epoch = cfg['epoch']
momentum = cfg['momentum']
lr_type = cfg['lr_type']
weight_decay = cfg['weight_decay']
initial_lr = cfg['initial_lr']
gamma = cfg['gamma']
training_dataset = cfg['training_dataset']
num_classes = 2
negative_ratio = 7
stepvalues = (cfg['decay1'], cfg['decay2'])
ds_train = create_dataset(training_dataset, cfg, batch_size, multiprocessing=True, num_worker=cfg['num_workers'])
    print('dataset size is:', ds_train.get_dataset_size())
steps_per_epoch = math.ceil(ds_train.get_dataset_size())
multibox_loss = MultiBoxLoss(num_classes, cfg['num_anchor'], negative_ratio, cfg['batch_size'])
if cfg['name'] == 'ResNet50':
backbone = resnet50(1001)
elif cfg['name'] == 'MobileNet025':
backbone = mobilenet025(1000)
backbone.set_train(True)
if cfg['name'] == 'ResNet50' and cfg['pretrain'] and cfg['resume_net'] is None:
pretrained_res50 = cfg['pretrain_path']
param_dict_res50 = load_checkpoint(pretrained_res50)
load_param_into_net(backbone, param_dict_res50)
print('Load resnet50 from [{}] done.'.format(pretrained_res50))
elif cfg['name'] == 'MobileNet025' and cfg['pretrain'] and cfg['resume_net'] is None:
pretrained_mobile025 = cfg['pretrain_path']
param_dict_mobile025 = load_checkpoint(pretrained_mobile025)
load_param_into_net(backbone, param_dict_mobile025)
print('Load mobilenet0.25 from [{}] done.'.format(pretrained_mobile025))
net = RetinaFace(phase='train', backbone=backbone, cfg=cfg)
net.set_train(True)
if cfg['resume_net'] is not None:
pretrain_model_path = cfg['resume_net']
param_dict_retinaface = load_checkpoint(pretrain_model_path)
load_param_into_net(net, param_dict_retinaface)
print('Resume Model from [{}] Done.'.format(cfg['resume_net']))
net = RetinaFaceWithLossCell(net, multibox_loss, cfg)
lr = adjust_learning_rate(initial_lr, gamma, stepvalues, steps_per_epoch, max_epoch,
warmup_epoch=cfg['warmup_epoch'], lr_type1=lr_type)
if cfg['optim'] == 'momentum':
opt = mindspore.nn.Momentum(net.trainable_params(), lr, momentum)
elif cfg['optim'] == 'sgd':
opt = mindspore.nn.SGD(params=net.trainable_params(), learning_rate=lr, momentum=momentum,
weight_decay=weight_decay, loss_scale=1)
else:
        raise ValueError('optim is not defined.')
net = TrainingWrapper(net, opt)
model = Model(net)
config_ck = CheckpointConfig(save_checkpoint_steps=cfg['save_checkpoint_steps'],
keep_checkpoint_max=cfg['keep_checkpoint_max'])
ckpoint_cb = ModelCheckpoint(prefix="RetinaFace", directory=cfg['ckpt_path'], config=config_ck)
time_cb = TimeMonitor(data_size=ds_train.get_dataset_size())
callback_list = [LossMonitor(), time_cb, ckpoint_cb]
print("============== Starting Training ==============")
model.train(max_epoch, ds_train, callbacks=callback_list, dataset_sink_mode=True)
if __name__ == '__main__':
parser = argparse.ArgumentParser(description='train')
parser.add_argument('--backbone_name', type=str, default='ResNet50',
help='backbone name')
args_opt = parser.parse_args()
if args_opt.backbone_name == 'ResNet50':
config = cfg_res50
train_with_resnet(cfg=config)
elif args_opt.backbone_name == 'MobileNet025':
config = cfg_mobile025
train_with_mobilenet(cfg=config)
    else:
        raise ValueError('backbone_name must be ResNet50 or MobileNet025.')
    print('train config:\n', config)
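
For completeness, the two entry points can also be driven programmatically instead of via argparse; a hedged sketch assuming the config dicts imported above behave like plain dictionaries:

```python
from train import train_with_resnet, train_with_mobilenet   # this file
from src.config import cfg_res50, cfg_mobile025

config = dict(cfg_res50)          # copy: train_with_resnet mutates cfg['ckpt_path']
train_with_resnet(cfg=config)     # Ascend / ResNet50 branch

config = dict(cfg_mobile025)
train_with_mobilenet(cfg=config)  # GPU / MobileNet0.25 branch
```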