adding GPU mode and CPU mode

2021-04-16 16:04:16 +08:00 · 2021-04-16 16:04:16 +08:00 · 2732d7333a
parent 17e42347fc
commit 2732d7333a
14 changed files with 656 additions and 392 deletions
--- a/model_zoo/research/cv/FaceRecognitionForTracking/README.md
+++ b/model_zoo/research/cv/FaceRecognitionForTracking/README.md
@@ -13,7 +13,7 @@

 # [Face Recognition For Tracking Description](#contents)

-This is a face recognition for tracking network based on Resnet, with support for training and evaluation on Ascend910.
+This is a face recognition for tracking network based on Resnet, with support for training and evaluation on Ascend910, GPU and CPU.

 ResNet (residual neural network) was proposed by Kaiming He and three colleagues at Microsoft Research. Using residual units, they successfully trained a 152-layer neural network and won ILSVRC 2015 with a top-5 error rate of 3.57%, while using fewer parameters than VGGNet. Traditional convolutional or fully connected networks lose some information as it passes through the layers, and they are prone to vanishing or exploding gradients, which makes very deep networks fail to train. ResNet mitigates this problem by passing the input directly to the output, which preserves the integrity of the information. The network then only needs to learn the difference between input and output, which simplifies the learning objective. The ResNet structure speeds up training considerably and also improves model accuracy. ResNet also generalizes well and can even be used directly in Inception networks.

@@ -55,7 +55,7 @@ The directory structure is as follows:

 # [Environment Requirements](#contents)

- Hardware(Ascend)
+- Hardware(Ascend/GPU/CPU)
    - Prepare hardware environment with Ascend processor.
 - Framework
    - [MindSpore](https://www.mindspore.cn/install/en)
@@ -77,19 +77,25 @@ The entire code structure is as following:
    ├─ run_standalone_train.sh              # launch standalone training(1p) in ascend
    ├─ run_distribute_train.sh              # launch distributed training(8p) in ascend
    ├─ run_eval.sh                          # launch evaluating in ascend
-    └─ run_export.sh                        # launch exporting air model
+    ├─ run_export.sh                        # launch exporting air/mindir model
+    ├─ run_standalone_train_gpu.sh          # launch standalone training(1p) in gpu
+    ├─ run_distributed_train_gpu.sh         # launch distributed training(8p) in gpu
+    ├─ run_eval_gpu.sh                      # launch evaluating in gpu
+    ├─ run_export_gpu.sh                    # launch exporting mindir model in gpu
+    ├─ run_train_cpu.sh                     # launch standalone training in cpu
+    ├─ run_eval_cpu.sh                      # launch evaluating in cpu
+    └─ run_export_cpu.sh                    # launch exporting mindir model in cpu
  ├─ src
    ├─ config.py                            # parameter configuration
    ├─ dataset.py                           # dataset loading and preprocessing for training
    ├─ reid.py                              # network backbone
-    ├─ reid_for_export.py                   # network backbone for export
    ├─ log.py                               # log function
    ├─ loss.py                              # loss function
    ├─ lr_generator.py                      # generate learning rate
    └─ me_init.py                           # network initialization
  ├─ train.py                               # training scripts
  ├─ eval.py                                # evaluation scripts
-  └─ export.py                              # export air model
+  └─ export.py                              # export air/mindir model
 ```

 ## [Running Example](#contents)
@@ -99,18 +105,50 @@ The entire code structure is as following:
 - Stand alone mode

    ```bash
+    Ascend:
+
    cd ./scripts
    sh run_standalone_train.sh [DATA_DIR] [USE_DEVICE_ID]
    ```

+    ```bash
+    GPU:
+
+    cd ./scripts
+    sh run_standalone_train_gpu.sh [DATA_DIR]
+    ```
+
+    ```bash
+    CPU:
+
+    cd ./scripts
+    sh run_train_cpu.sh [DATA_DIR]
+    ```
+
    or (fine-tune)

    ```bash
+    Ascend:
+
    cd ./scripts
    sh run_standalone_train.sh [DATA_DIR] [USE_DEVICE_ID] [PRETRAINED_BACKBONE]
    ```

-    for example:
+    ```bash
+    GPU:
+
+    cd ./scripts
+    sh run_standalone_train_gpu.sh [DATA_DIR] [PRETRAINED_BACKBONE]
+    ```
+
+    ```bash
+    CPU:
+
+    cd ./scripts
+    sh run_train_cpu.sh [DATA_DIR] [PRETRAINED_BACKBONE]
+    ```
+
+    for example, on Ascend:

    ```bash
    cd ./scripts
@@ -120,17 +158,35 @@ The entire code structure is as following:
 - Distribute mode (recommended)

    ```bash
+    Ascend:
+
    cd ./scripts
    sh run_distribute_train.sh [DATA_DIR] [RANK_TABLE]
    ```

+    ```bash
+    GPU:
+
+    cd ./scripts
+    sh run_distributed_train_gpu.sh [DEVICE_NUM] [VISIBLE_DEVICES(0,1,2,3,4,5,6,7)] [DATASET_PATH]
+    ```
+
    or (fine-tune)

    ```bash
+    Ascend:
+
    cd ./scripts
    sh run_distribute_train.sh [DATA_DIR] [RANK_TABLE] [PRETRAINED_BACKBONE]
    ```

+    ```bash
+    GPU:
+
+    cd ./scripts
+    sh run_distributed_train_gpu.sh [DEVICE_NUM] [VISIBLE_DEVICES(0,1,2,3,4,5,6,7)] [DATASET_PATH] [PRE_TRAINED]
+    ```
+
    for example:

    ```bash
@@ -156,11 +212,27 @@ epoch[179], iter[14930], loss:1.694281, 13417.38 imgs/sec, lr=0.0250000003725290
 ### Evaluation

 ```bash
+Ascend:
+
 cd ./scripts
 sh run_eval.sh [EVAL_DIR] [USE_DEVICE_ID] [PRETRAINED_BACKBONE]
 ```

-for example:
+```bash
+GPU:
+
+cd ./scripts
+sh run_eval_gpu.sh [EVAL_DIR] [PRETRAINED_BACKBONE]
+```
+
+```bash
+CPU:
+
+cd ./scripts
+sh run_eval_cpu.sh [EVAL_DIR] [PRETRAINED_BACKBONE]
+```
+
+for example, on Ascend:

 ```bash
 cd ./scripts
@@ -184,44 +256,62 @@ You will get the result as following in "./scripts/device0/eval.log" or txt file
 If you want to infer the network on Ascend 310, you should convert the model to AIR:

 ```bash
+Ascend:
+
 cd ./scripts
 sh run_export.sh [BATCH_SIZE] [USE_DEVICE_ID] [PRETRAINED_BACKBONE]
 ```

+Alternatively, to convert the model to a MINDIR file on GPU or CPU:
+
+```bash
+GPU:
+
+cd ./scripts
+sh run_export_gpu.sh [PRETRAINED_BACKBONE] [BATCH_SIZE] [FILE_NAME](optional)
+```
+
+```bash
+CPU:
+
+cd ./scripts
+sh run_export_cpu.sh [PRETRAINED_BACKBONE] [BATCH_SIZE] [FILE_NAME](optional)
+```
+
 # [Model Description](#contents)

 ## [Performance](#contents)

 ### Training Performance

-| Parameters                 | Face Recognition For Tracking                               |
-| -------------------------- | ----------------------------------------------------------- |
-| Model Version              | V1                                                          |
-| Resource                   | Ascend 910; CPU 2.60GHz, 192cores; Memory 755G; OS Euler2.8            |
-| uploaded Date              | 09/30/2020 (month/day/year)                                 |
-| MindSpore Version          | 1.0.0                                                       |
-| Dataset                    | 10K images                                                  |
-| Training Parameters        | epoch=180, batch_size=16, momentum=0.9                      |
-| Optimizer                  | Momentum                                                    |
-| Loss Function              | Softmax Cross Entropy                                       |
-| outputs                    | probability                                                 |
-| Speed                      | 1pc: 8~10 ms/step; 8pcs: 9~11 ms/step                       |
-| Total time                 | 1pc: 1 hours; 8pcs: 0.1 hours                               |
-| Checkpoint for Fine tuning | 17M (.ckpt file)                                            |
+| Parameters | Ascend | GPU | CPU |
+| --- | --- | --- | --- |
+| Model Version | V1 | V1 | V1 |
+| Resource | Ascend 910; CPU 2.60GHz, 192 cores; Memory 755G; OS Euler2.8 | Tesla V100-PCIE | Intel(R) Xeon(R) CPU E5-2690 v4 |
+| Uploaded Date | 09/30/2020 (month/day/year) | 04/17/2021 (month/day/year) | 04/17/2021 (month/day/year) |
+| MindSpore Version | 1.0.0 | 1.2.0 | 1.2.0 |
+| Dataset | 10K images | 10K images | 10K images |
+| Training Parameters | epoch=180, batch_size=16, momentum=0.9 | epoch=40, batch_size=128 (1p) or 16 (8p), momentum=0.9 | epoch=40, batch_size=128, momentum=0.9 |
+| Optimizer | SGD | SGD | SGD |
+| Loss Function | Softmax Cross Entropy | Softmax Cross Entropy | Softmax Cross Entropy |
+| Outputs | probability | probability | probability |
+| Speed | 1pc: 8-10 ms/step; 8pcs: 9-11 ms/step | 1pc: 30 ms/step; 8pcs: 20 ms/step | 1pc: 2.5 s/step |
+| Total time | 1pc: 1 hour; 8pcs: 0.1 hours | 1pc: 2 minutes; 8pcs: 1.5 minutes | 1pc: 2 hours |
+| Checkpoint for Fine tuning | 17M (.ckpt file) | 17M (.ckpt file) | 17M (.ckpt file) |

 ### Evaluation Performance

-| Parameters          |Face Recognition For Tracking|
-| ------------------- | --------------------------- |
-| Model Version       | V1                          |
-| Resource            | Ascend 910; OS Euler2.8                  |
-| Uploaded Date       | 09/30/2020 (month/day/year) |
-| MindSpore Version   | 1.0.0                       |
-| Dataset             | 2K images                   |
-| batch_size          | 128                         |
-| outputs             | recall                      |
-| Recall(8pcs)        | 0.62(FAR=0.1)               |
-| Model for inference | 17M (.ckpt file)            |
+| Parameters | Ascend | GPU | CPU |
+| --- | --- | --- | --- |
+| Model Version | V1 | V1 | V1 |
+| Resource | Ascend 910; OS Euler2.8 | Tesla V100-PCIE | Intel(R) Xeon(R) CPU E5-2690 v4 |
+| Uploaded Date | 09/30/2020 (month/day/year) | 04/17/2021 (month/day/year) | 04/17/2021 (month/day/year) |
+| MindSpore Version | 1.0.0 | 1.2.0 | 1.2.0 |
+| Dataset | 2K images | 2K images | 2K images |
+| batch_size | 128 | 128 | 128 |
+| Outputs | recall | recall | recall |
+| Recall | 0.62 (FAR=0.1) | 0.62 (FAR=0.1) | 0.62 (FAR=0.1) |
+| Model for inference | 17M (.ckpt file) | 17M (.ckpt file) | 17M (.ckpt file) |

 # [ModelZoo Homepage](#contents)
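
The Speed row of the Training Performance table can be turned into approximate throughput figures. The sketch below is only a back-of-the-envelope check: the helper name `images_per_second` is hypothetical, and it assumes that the listed batch sizes are per device and that each step time covers one step on all devices at once. Neither assumption is stated in the README.

```python
def images_per_second(per_device_batch, devices, seconds_per_step):
    """Approximate throughput from batch size, device count and step time."""
    return per_device_batch * devices / seconds_per_step


# (per-device batch size, number of devices, seconds per step), read from the table.
runs = {
    'GPU 1p': (128, 1, 0.030),
    'GPU 8p': (16, 8, 0.020),
    'CPU 1p': (128, 1, 2.5),
}

if __name__ == '__main__':
    for name, (batch, devices, step_time) in runs.items():
        print('%s: about %.0f imgs/sec' % (name, images_per_second(batch, devices, step_time)))
```

These figures are derived from rounded table entries, so they indicate orders of magnitude rather than measured results.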

--- a/model_zoo/research/cv/FaceRecognitionForTracking/eval.py
+++ b/model_zoo/research/cv/FaceRecognitionForTracking/eval.py
@@ -1,4 +1,4 @@
-# Copyright 2020 Huawei Technologies Co., Ltd
+# Copyright 2020-2021 Huawei Technologies Co., Ltd
 #
 # Licensed under the Apache License, Version 2.0 (the "License");
 # you may not use this file except in compliance with the License.
@@ -29,8 +29,6 @@ from mindspore.train.serialization import load_checkpoint, load_param_into_net
 from src.reid import SphereNet

 warnings.filterwarnings('ignore')
-devid = int(os.getenv('DEVICE_ID'))
-context.set_context(mode=context.GRAPH_MODE, device_target="Ascend", save_graphs=True, device_id=devid)


 def inclass_likehood(ims_info, types='cos'):
@@ -135,7 +133,10 @@ def main(args):
        else:
            print('-----------------------load model failed -----------------------')

-        network.add_flags_recursive(fp16=True)
+        if args.device_target == 'CPU':
+            network.add_flags_recursive(fp32=True)
+        else:
+            network.add_flags_recursive(fp16=True)
        network.set_train(False)

        root_path = args.eval_dir
@@ -178,8 +179,15 @@ if __name__ == "__main__":
    parser = argparse.ArgumentParser(description='reid test')
    parser.add_argument('--pretrained', type=str, default='', help='pretrained model to load')
    parser.add_argument('--eval_dir', type=str, default='', help='eval image dir, e.g. /home/test')
+    parser.add_argument('--device_target', type=str, choices=['Ascend', 'GPU', 'CPU'], default='Ascend',
+                        help='device_target')

    arg = parser.parse_args()
+    context.set_context(mode=context.GRAPH_MODE, device_target=arg.device_target, save_graphs=False)
+
+    if arg.device_target == 'Ascend':
+        devid = int(os.getenv('DEVICE_ID'))
+        context.set_context(device_id=devid)
    print(arg)

    main(arg)
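
The device handling that this change adds to `eval.py` can be summarised in a small, framework-free sketch. The helper names `precision_flag` and `resolve_device_id` are hypothetical and not part of the commit. The fallback to device 0 when `DEVICE_ID` is unset is also an assumption: the committed code calls `int(os.getenv('DEVICE_ID'))`, which raises a `TypeError` if the variable is missing.

```python
import os


def precision_flag(device_target):
    """Return the keyword used with add_flags_recursive for a device target.

    Mirrors the branch in eval.py: CPU runs in fp32, Ascend and GPU in fp16.
    """
    if device_target not in ('Ascend', 'GPU', 'CPU'):
        raise ValueError('unsupported device_target: ' + str(device_target))
    return {'fp32': True} if device_target == 'CPU' else {'fp16': True}


def resolve_device_id(environ=None, default=0):
    """Read DEVICE_ID from the environment, falling back to `default`.

    The fallback is an assumption of this sketch, not behaviour of the commit.
    """
    environ = os.environ if environ is None else environ
    return int(environ.get('DEVICE_ID', default))


if __name__ == '__main__':
    for target in ('Ascend', 'GPU', 'CPU'):
        print(target, precision_flag(target))
    print('device id:', resolve_device_id({'DEVICE_ID': '3'}))
```

In the script itself, the returned dictionary could be applied as `network.add_flags_recursive(**precision_flag(args.device_target))`.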
--- a/model_zoo/research/cv/FaceRecognitionForTracking/export.py
+++ b/model_zoo/research/cv/FaceRecognitionForTracking/export.py
@@ -1,4 +1,4 @@
-# Copyright 2020 Huawei Technologies Co., Ltd
+# Copyright 2020-2021 Huawei Technologies Co., Ltd
 #
 # Licensed under the Apache License, Version 2.0 (the "License");
 # you may not use this file except in compliance with the License.
@@ -12,7 +12,7 @@
 # See the License for the specific language governing permissions and
 # limitations under the License.
 # ============================================================================
-"""Convert ckpt to air."""
+"""Convert ckpt to air/mindir."""
 import os
 import argparse
 import numpy as np
@@ -21,14 +21,11 @@ from mindspore import context
 from mindspore import Tensor
 from mindspore.train.serialization import export, load_checkpoint, load_param_into_net

-from src.reid_for_export import SphereNet
-
-devid = int(os.getenv('DEVICE_ID'))
-context.set_context(mode=context.GRAPH_MODE, device_target="Ascend", save_graphs=True, device_id=devid)
+from src.reid import SphereNet_float32


 def main(args):
-    network = SphereNet(num_layers=12, feature_dim=128, shape=(96, 64))
+    network = SphereNet_float32(num_layers=12, feature_dim=128, shape=(96, 64))
    ckpt_path = args.pretrained
    if os.path.isfile(ckpt_path):
        param_dict = load_checkpoint(ckpt_path)
@@ -45,23 +42,35 @@ def main(args):
    else:
        print('-----------------------load model failed -----------------------')

-    network.add_flags_recursive(fp16=True)
+    if args.device_target == 'CPU':
+        network.add_flags_recursive(fp32=True)
+    else:
+        network.add_flags_recursive(fp16=True)
    network.set_train(False)

    input_data = np.random.uniform(low=0, high=1.0, size=(args.batch_size, 3, 96, 64)).astype(np.float32)
    tensor_input_data = Tensor(input_data)

-    export(network, tensor_input_data, file_name=ckpt_path.replace('.ckpt', '_' + str(args.batch_size) + 'b.air'),
-           file_format='AIR')
+    export(network, tensor_input_data, file_name=args.file_name, file_format=args.file_format)
    print('-----------------------export model success-----------------------')


 if __name__ == "__main__":

-    parser = argparse.ArgumentParser(description='Convert ckpt to air')
+    parser = argparse.ArgumentParser(description='Convert ckpt to air/mindir')
    parser.add_argument('--pretrained', type=str, default='', help='pretrained model to load')
    parser.add_argument('--batch_size', type=int, default=8, help='batch size')
+    parser.add_argument('--device_target', type=str, choices=['Ascend', 'GPU', 'CPU'], default='Ascend',
+                        help='device_target')
+    parser.add_argument('--file_name', type=str, default='FaceRecognitionForTracking', help='output file name')
+    parser.add_argument('--file_format', type=str, choices=['AIR', 'ONNX', 'MINDIR'], default='AIR', help='file format')

    arg = parser.parse_args()

+    if arg.device_target == 'Ascend':
+        devid = int(os.getenv('DEVICE_ID'))
+        context.set_context(device_id=devid)
+
+    context.set_context(mode=context.GRAPH_MODE, device_target=arg.device_target)
+
    main(arg)
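
One consequence of the new `export.py` arguments is that `--file_format` still defaults to `AIR`, so running the script with `--device_target=GPU` or `CPU` and no explicit format requests an AIR export. The sketch below rebuilds the parser from the diff using only the standard library and adds a hypothetical guard, `check_format`, which is not part of the commit. It assumes that AIR is intended for Ascend only, which is how the README and the new shell scripts use it.

```python
import argparse


def build_parser():
    """Recreate the command-line interface shown in the export.py diff."""
    parser = argparse.ArgumentParser(description='Convert ckpt to air/mindir')
    parser.add_argument('--pretrained', type=str, default='', help='pretrained model to load')
    parser.add_argument('--batch_size', type=int, default=8, help='batch size')
    parser.add_argument('--device_target', type=str, choices=['Ascend', 'GPU', 'CPU'],
                        default='Ascend', help='device_target')
    parser.add_argument('--file_name', type=str, default='FaceRecognitionForTracking',
                        help='output file name')
    parser.add_argument('--file_format', type=str, choices=['AIR', 'ONNX', 'MINDIR'],
                        default='AIR', help='file format')
    return parser


def check_format(args):
    """Hypothetical guard (not in the commit): reject AIR outside Ascend."""
    if args.file_format == 'AIR' and args.device_target != 'Ascend':
        raise ValueError('file_format=AIR is only paired with Ascend in this repository; '
                         'pass --file_format=MINDIR for ' + args.device_target)
    return args


if __name__ == '__main__':
    parsed = build_parser().parse_args(['--device_target', 'GPU', '--file_format', 'MINDIR'])
    checked = check_format(parsed)
    print(checked.device_target, checked.file_format, checked.file_name)
```

The `run_export_gpu.sh` and `run_export_cpu.sh` scripts avoid the problem by always passing `--file_format=MINDIR`; the guard only matters when `export.py` is called directly.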
--- a/model_zoo/research/cv/FaceRecognitionForTracking/scripts/run_distributed_train_gpu.sh
+++ b/model_zoo/research/cv/FaceRecognitionForTracking/scripts/run_distributed_train_gpu.sh
@@ -0,0 +1,59 @@
+#!/bin/bash
+# Copyright 2021 Huawei Technologies Co., Ltd
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+# ============================================================================
+
+if [ $# -lt 3 ]
+then
+  echo "Usage: sh run_distributed_train_gpu.sh [DEVICE_NUM] [VISIBLE_DEVICES(0,1,2,3,4,5,6,7)] [DATASET_PATH]
+       [PRE_TRAINED](optional)"
+  exit 1
+fi
+
+if [ $1 -lt 1 ] || [ $1 -gt 8 ]
+then
+  echo "error: DEVICE_NUM=$1 is not in (1-8)"
+  exit 1
+fi
+
+export DEVICE_NUM=$1
+export RANK_SIZE=$1
+
+BASEPATH=$(cd "`dirname $0`" || exit; pwd)
+export PYTHONPATH=${BASEPATH}:$PYTHONPATH
+if [ -d "../train" ]
+then
+  rm -rf ../train
+fi
+
+mkdir ../train
+cd ../train || exit
+
+export CUDA_VISIBLE_DEVICES="$2"
+
+PRETRAINED_ARG=""
+if [ -n "$4" ] # optional pretrained ckpt
+then
+  PRETRAINED_ARG="--pretrained=$4"
+fi
+if [ $1 -gt 1 ]
+then
+  mpirun -n $1 --allow-run-as-root python3 ${BASEPATH}/../train.py \
+                                                --data_dir=$3 \
+                                                --is_distributed=1 \
+                                                --device_target='GPU' $PRETRAINED_ARG
+else
+  python3 ${BASEPATH}/../train.py --data_dir=$3 --is_distributed=0 \
+                                  --device_target='GPU' $PRETRAINED_ARG
+fi
--- a/model_zoo/research/cv/FaceRecognitionForTracking/scripts/run_eval_cpu.sh
+++ b/model_zoo/research/cv/FaceRecognitionForTracking/scripts/run_eval_cpu.sh
@@ -0,0 +1,29 @@
+#!/bin/bash
+# Copyright 2021 Huawei Technologies Co., Ltd
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+# ============================================================================
+
+if [ $# -lt 2 ]
+then
+  echo "Usage: sh run_eval_cpu.sh [EVALDATA_PATH] [PRE_TRAINED]"
+  exit 1
+fi
+
+BASEPATH=$(cd "`dirname $0`" || exit; pwd)
+export PYTHONPATH=${BASEPATH}:$PYTHONPATH
+
+python3 ${BASEPATH}/../eval.py \
+          --eval_dir=$1 \
+          --device_target='CPU' \
+          --pretrained=$2
--- a/model_zoo/research/cv/FaceRecognitionForTracking/scripts/run_eval_gpu.sh
+++ b/model_zoo/research/cv/FaceRecognitionForTracking/scripts/run_eval_gpu.sh
@@ -0,0 +1,29 @@
+#!/bin/bash
+# Copyright 2021 Huawei Technologies Co., Ltd
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+# ============================================================================
+
+if [ $# -lt 2 ]
+then
+  echo "Usage: sh run_eval_gpu.sh [EVALDATA_PATH] [PRE_TRAINED]"
+  exit 1
+fi
+
+BASEPATH=$(cd "`dirname $0`" || exit; pwd)
+export PYTHONPATH=${BASEPATH}:$PYTHONPATH
+
+python3 ${BASEPATH}/../eval.py \
+          --eval_dir=$1 \
+          --device_target='GPU' \
+          --pretrained=$2
--- a/model_zoo/research/cv/FaceRecognitionForTracking/scripts/run_export_cpu.sh
+++ b/model_zoo/research/cv/FaceRecognitionForTracking/scripts/run_export_cpu.sh
@@ -0,0 +1,42 @@
+#!/bin/bash
+# Copyright 2021 Huawei Technologies Co., Ltd
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+# ============================================================================
+
+if [ $# -lt 2 ]
+then
+  echo "Usage: sh run_export_cpu.sh [PRE_TRAINED] [BATCH_SIZE] [FILE_NAME](optional)"
+  exit 1
+fi
+
+BASEPATH=$(cd "`dirname $0`" || exit; pwd)
+export PYTHONPATH=${BASEPATH}:$PYTHONPATH
+
+cd ..
+
+if [ $3 ] #file name
+then
+  python3 ${BASEPATH}/../export.py \
+            --pretrained=$1 \
+            --device_target='CPU' \
+            --batch_size=$2 \
+            --file_format=MINDIR \
+            --file_name=$3
+else
+  python3 ${BASEPATH}/../export.py \
+            --pretrained=$1 \
+            --device_target='CPU' \
+            --batch_size=$2 \
+            --file_format=MINDIR
+fi
--- a/model_zoo/research/cv/FaceRecognitionForTracking/scripts/run_export_gpu.sh
+++ b/model_zoo/research/cv/FaceRecognitionForTracking/scripts/run_export_gpu.sh
@@ -0,0 +1,42 @@
+#!/bin/bash
+# Copyright 2021 Huawei Technologies Co., Ltd
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+# ============================================================================
+
+if [ $# -lt 2 ]
+then
+  echo "Usage: sh run_export_gpu.sh [PRE_TRAINED] [BATCH_SIZE] [FILE_NAME](optional)"
+  exit 1
+fi
+
+BASEPATH=$(cd "`dirname $0`" || exit; pwd)
+export PYTHONPATH=${BASEPATH}:$PYTHONPATH
+
+cd ..
+
+if [ $3 ] #file name
+then
+  python3 ${BASEPATH}/../export.py \
+            --pretrained=$1 \
+            --device_target='GPU' \
+            --batch_size=$2 \
+            --file_format=MINDIR \
+            --file_name=$3
+else
+  python3 ${BASEPATH}/../export.py \
+            --pretrained=$1 \
+            --device_target='GPU' \
+            --batch_size=$2 \
+            --file_format=MINDIR
+fi
--- a/model_zoo/research/cv/FaceRecognitionForTracking/scripts/run_standalone_train_gpu.sh
+++ b/model_zoo/research/cv/FaceRecognitionForTracking/scripts/run_standalone_train_gpu.sh
@@ -0,0 +1,43 @@
+#!/bin/bash
+# Copyright 2021 Huawei Technologies Co., Ltd
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+# ============================================================================
+
+if [ $# -lt 1 ]
+then
+  echo "Usage: sh run_standalone_train_gpu.sh [DATASET_PATH] [PRE_TRAINED] (optional)"
+  exit 1
+fi
+
+BASEPATH=$(cd "`dirname $0`" || exit; pwd)
+export PYTHONPATH=${BASEPATH}:$PYTHONPATH
+if [ -d "../train" ]
+then
+  rm -rf ../train
+fi
+
+mkdir ../train
+cd ../train || exit
+
+if [ $2 ] #pretrained ckpt
+then
+  python3 ${BASEPATH}/../train.py \
+            --data_dir=$1 \
+            --device_target='GPU' \
+            --pretrained=$2
+else
+  python3 ${BASEPATH}/../train.py \
+            --data_dir=$1 \
+            --device_target='GPU'
+fi
--- a/model_zoo/research/cv/FaceRecognitionForTracking/scripts/run_train_cpu.sh
+++ b/model_zoo/research/cv/FaceRecognitionForTracking/scripts/run_train_cpu.sh
@@ -0,0 +1,43 @@
+#!/bin/bash
+# Copyright 2021 Huawei Technologies Co., Ltd
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+# ============================================================================
+
+if [ $# -lt 1 ]
+then
+  echo "Usage: sh run_train_cpu.sh [DATASET_PATH] [PRE_TRAINED](optional)"
+  exit 1
+fi
+
+BASEPATH=$(cd "`dirname $0`" || exit; pwd)
+export PYTHONPATH=${BASEPATH}:$PYTHONPATH
+if [ -d "../train" ]
+then
+  rm -rf ../train
+fi
+
+mkdir ../train
+cd ../train || exit
+
+if [ $2 ] #pretrained ckpt
+then
+  python3 ${BASEPATH}/../train.py \
+            --data_dir=$1 \
+            --device_target='CPU' \
+            --pretrained=$2
+else
+  python3 ${BASEPATH}/../train.py \
+            --data_dir=$1 \
+            --device_target='CPU'
+fi
--- a/model_zoo/research/cv/FaceRecognitionForTracking/src/config.py
+++ b/model_zoo/research/cv/FaceRecognitionForTracking/src/config.py
@@ -1,4 +1,4 @@
-# Copyright 2020 Huawei Technologies Co., Ltd
+# Copyright 2020-2021 Huawei Technologies Co., Ltd
 #
 # Licensed under the Apache License, Version 2.0 (the "License");
 # you may not use this file except in compliance with the License.
@@ -15,8 +15,8 @@
 """Network config setting, will be used in train.py and eval.py"""
 from easydict import EasyDict as edict

-reid_1p_cfg = edict({
-    'task': 'REID_1p',
+reid_1p_cfg_ascend = edict({
+    'task': 'REID_1p_ascend',

    # dataset related
    'per_batch_size': 128,
@@ -52,8 +52,8 @@ reid_1p_cfg = edict({
 })


-reid_8p_cfg = edict({
-    'task': 'REID_8p',
+reid_8p_cfg_ascend = edict({
+    'task': 'REID_8p_ascend',

    # dataset related
    'per_batch_size': 16,
@@ -87,3 +87,76 @@ reid_8p_cfg = edict({
    'ckpt_path': '../../output',
    'ckpt_interval': 200,
 })
+
+reid_1p_cfg = edict({
+    'task': 'REID_1p',
+
+    # dataset related
+    'per_batch_size': 128,
+
+    # network structure related
+    'fp16': 1,
+    'loss_scale': 2048.0,
+    'input_size': (96, 64),
+    'net_depth': 12,
+    'embedding_size': 128,
+
+    # optimizer related
+    'lr': 0.1,
+    'lr_scale': 1,
+    'lr_gamma': 1,
+    'lr_epochs': '30,60,120,150',
+    'epoch_size': 30,
+    'warmup_epochs': 0,
+    'steps_per_epoch': 0,
+    'max_epoch': 40,
+    'weight_decay': 0.0005,
+    'momentum': 0.9,
+
+    # distributed parameter
+    'is_distributed': 0,
+    'local_rank': 0,
+    'world_size': 1,
+
+    # logging related
+    'log_interval': 10,
+    'ckpt_path': '../output',
+    'ckpt_interval': 200,
+})
+
+
+reid_8p_cfg_gpu = edict({
+    'task': 'REID_8p_gpu',
+
+    # dataset related
+    'per_batch_size': 16,
+
+    # network structure related
+    'fp16': 1,
+    'loss_scale': 2048.0,
+    'input_size': (96, 64),
+    'net_depth': 12,
+    'embedding_size': 128,
+
+    # optimizer related
+    'lr': 0.1,
+    'lr_scale': 1,
+    'lr_gamma': 1,
+    'lr_epochs': '30,60,120,150',
+    'epoch_size': 30,
+    'warmup_epochs': 0,
+    'steps_per_epoch': 0,
+    'max_epoch': 40,
+    'weight_decay': 0.0005,
+    'momentum': 0.9,
+
+    # distributed parameter
+    'is_distributed': 1,
+    'local_rank': 0,
+    'world_size': 8,
+
+    # logging related
+    'log_interval': 10,
+    'ckpt_path': '../output',
+    'ckpt_interval': 200,
+})
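
The two configurations added here, `reid_1p_cfg` and `reid_8p_cfg_gpu`, differ in only four keys: `task`, `per_batch_size`, `is_distributed` and `world_size`. A possible way to make that visible is to derive both from a shared base, as in the sketch below. This is not how the commit is written: `BASE_CFG` and `make_cfg` are hypothetical names, and plain dictionaries stand in for `EasyDict` so that the example needs no third-party package.

```python
import copy

# Settings shared by reid_1p_cfg and reid_8p_cfg_gpu in the diff above.
BASE_CFG = {
    # network structure related
    'fp16': 1,
    'loss_scale': 2048.0,
    'input_size': (96, 64),
    'net_depth': 12,
    'embedding_size': 128,
    # optimizer related
    'lr': 0.1,
    'lr_scale': 1,
    'lr_gamma': 1,
    'lr_epochs': '30,60,120,150',
    'epoch_size': 30,
    'warmup_epochs': 0,
    'steps_per_epoch': 0,
    'max_epoch': 40,
    'weight_decay': 0.0005,
    'momentum': 0.9,
    # distributed parameter
    'local_rank': 0,
    # logging related
    'log_interval': 10,
    'ckpt_path': '../output',
    'ckpt_interval': 200,
}


def make_cfg(**overrides):
    """Return a copy of BASE_CFG with the given keys added or replaced."""
    cfg = copy.deepcopy(BASE_CFG)
    cfg.update(overrides)
    return cfg


reid_1p_cfg = make_cfg(task='REID_1p', per_batch_size=128, is_distributed=0, world_size=1)
reid_8p_cfg_gpu = make_cfg(task='REID_8p_gpu', per_batch_size=16, is_distributed=1, world_size=8)

if __name__ == '__main__':
    differing = sorted(k for k in reid_1p_cfg if reid_1p_cfg[k] != reid_8p_cfg_gpu[k])
    print('keys that differ:', differing)
```

Keeping the shared values in one place would reduce the risk of the single-device and 8-device settings drifting apart when a hyperparameter is tuned.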
--- a/model_zoo/research/cv/FaceRecognitionForTracking/src/reid.py
+++ b/model_zoo/research/cv/FaceRecognitionForTracking/src/reid.py
@@ -1,4 +1,4 @@
-# Copyright 2020 Huawei Technologies Co., Ltd
+# Copyright 2020-2021 Huawei Technologies Co., Ltd
 #
 # Licensed under the Apache License, Version 2.0 (the "License");
 # you may not use this file except in compliance with the License.
@@ -23,18 +23,11 @@ from mindspore.nn import Dense, Cell
 from mindspore.common import dtype as mstype
 from mindspore.common.initializer import initializer
 from mindspore import Tensor, Parameter
+from mindspore import context

 from src import me_init


-class Cut(nn.Cell):
-
-
-
-    def construct(self, x):
-        return x
-
-
 def bn_with_initialize(out_channels):
    bn = nn.BatchNorm2d(out_channels, momentum=0.9, eps=1e-5).add_flags_recursive(fp32=True)
    return bn
@@ -77,6 +70,7 @@ class BaseBlock(Cell):

        self.cast = P.Cast()
        self.add = Add()
+        self.device_target = context.get_context('device_target')

    def construct(self, x):
        '''Construct function.'''
@@ -88,8 +82,9 @@ class BaseBlock(Cell):
        out = self.bn2(out)
        out = self.relu2(out)
        # hand cast
-        identity = self.cast(identity, mstype.float16)
-        out = self.cast(out, mstype.float16)
+        if self.device_target != 'CPU':
+            identity = self.cast(identity, mstype.float16)
+            out = self.cast(out, mstype.float16)

        out = self.add(out, identity)
        return out
@@ -143,7 +138,6 @@ class SphereNet(Cell):
            raise ValueError('sphere' + str(num_layers) + " IS NOT SUPPORTED! (sphere20 or sphere64)")
        self.shape = P.Shape()
        self.reshape = P.Reshape()
-        self.arg_shape = shape
        block = BaseBlock

        self.layer1 = MakeLayer(block, filter_list[0], filter_list[1], layers[0], stride=2)
@@ -153,6 +147,7 @@ class SphereNet(Cell):

        self.fc = fc_with_initialize(fc_size, feature_dim)
        self.last_bn = nn.BatchNorm1d(feature_dim, momentum=0.9).add_flags_recursive(fp32=True)
+        self.last_bn_sub = nn.BatchNorm2d(feature_dim, momentum=0.9).add_flags_recursive(fp32=True)
        self.cast = P.Cast()
        self.l2norm = P.L2Normalize(axis=1)

@@ -164,6 +159,7 @@ class SphereNet(Cell):
                    cell.bias.set_data(initializer('zeros', cell.bias.shape))
                else:
                    cell.weight.set_data(initializer(me_init.ReidXavierUniform(), cell.weight.shape))
+        self.device_target = context.get_context('device_target')

    def construct(self, x):
        '''Construct function.'''
@ -175,13 +171,99 @@ class SphereNet(Cell):
        b, _, _, _ = self.shape(x)
        x = self.reshape(x, (b, -1))
        x = self.fc(x)
-        x = self.last_bn(x)
-        x = self.cast(x, mstype.float16)
+
+        if self.device_target == 'Ascend':
+            x = self.last_bn(x)
+        else:
+            old_shape = x.shape
+            x = self.reshape(x, (old_shape[0], old_shape[1], 1, 1))
+            x = self.last_bn_sub(x)
+            x = self.reshape(x, old_shape)
+
+        if self.device_target != 'CPU':
+            x = self.cast(x, mstype.float16)
+
        x = self.l2norm(x)

        return x


+class SphereNet_float32(Cell):
+    '''SphereNet variant whose construct casts the normalized embedding back to float32.'''
+    def __init__(self, num_layers=36, feature_dim=128, shape=(96, 64)):
+        super(SphereNet_float32, self).__init__()
+        assert num_layers in [12, 20, 36, 64], 'SphereNet num_layers should be 12, 20, 36 or 64'
+        if num_layers == 12:
+            layers = [1, 1, 1, 1]
+            filter_list = [3, 16, 32, 64, 128]
+            fc_size = 128 * 6 * 4
+        elif num_layers == 20:
+            layers = [1, 2, 4, 1]
+            filter_list = [3, 64, 128, 256, 512]
+            fc_size = 512 * 6 * 4
+        elif num_layers == 36:
+            layers = [2, 4, 4, 2]
+            filter_list = [3, 32, 64, 128, 256]
+            fc_size = 256 * 6 * 4
+        elif num_layers == 64:
+            layers = [3, 7, 16, 3]
+            filter_list = [3, 64, 128, 256, 512]
+            fc_size = 512 * 6 * 4
+        else:
+            raise ValueError('sphere' + str(num_layers) + " IS NOT SUPPORTED! (sphere12, sphere20, sphere36 or sphere64)")
+        self.shape = P.Shape()
+        self.reshape = P.Reshape()
+        block = BaseBlock
+
+        self.layer1 = MakeLayer(block, filter_list[0], filter_list[1], layers[0], stride=2)
+        self.layer2 = MakeLayer(block, filter_list[1], filter_list[2], layers[1], stride=2)
+        self.layer3 = MakeLayer(block, filter_list[2], filter_list[3], layers[2], stride=2)
+        self.layer4 = MakeLayer(block, filter_list[3], filter_list[4], layers[3], stride=2)
+
+        self.fc = fc_with_initialize(fc_size, feature_dim)
+        self.last_bn = nn.BatchNorm1d(feature_dim, momentum=0.9).add_flags_recursive(fp32=True)
+        self.last_bn_sub = nn.BatchNorm2d(feature_dim, momentum=0.9).add_flags_recursive(fp32=True)
+        self.cast = P.Cast()
+        self.l2norm = P.L2Normalize(axis=1)
+
+        for _, cell in self.cells_and_names():
+            if isinstance(cell, (nn.Conv2d, nn.Dense)):
+                if cell.bias is not None:
+                    cell.weight.set_data(initializer(me_init.ReidKaimingUniform(a=math.sqrt(5), mode='fan_out'),
+                                                     cell.weight.shape))
+                    cell.bias.set_data(initializer('zeros', cell.bias.shape))
+                else:
+                    cell.weight.set_data(initializer(me_init.ReidXavierUniform(), cell.weight.shape))
+        self.device_target = context.get_context('device_target')
+
+    def construct(self, x):
+        '''Construct function.'''
+        x = self.layer1(x)
+        x = self.layer2(x)
+        x = self.layer3(x)
+        x = self.layer4(x)
+
+        b, _, _, _ = self.shape(x)
+        x = self.reshape(x, (b, -1))
+        x = self.fc(x)
+
+        if self.device_target == 'Ascend':
+            x = self.last_bn(x)
+        else:
+            old_shape = x.shape
+            x = self.reshape(x, (old_shape[0], old_shape[1], 1, 1))
+            x = self.last_bn_sub(x)
+            x = self.reshape(x, old_shape)
+
+        if self.device_target != 'CPU':
+            x = self.cast(x, mstype.float16)
+
+        x = self.l2norm(x)
+        x = self.cast(x, mstype.float32)
+
+        return x
+
+
 class CombineMarginFC(nn.Cell):
    '''CombineMarginFC'''
    def __init__(self, embbeding_size=128, classnum=270762, s=32, a=1.0, m=0.3, b=0.2):
@ -208,12 +290,16 @@ class CombineMarginFC(nn.Cell):
        self.cast = P.Cast()
        self.on_value = Tensor(1.0, mstype.float32)
        self.off_value = Tensor(0.0, mstype.float32)
+        self.device_target = context.get_context('device_target')

    def construct(self, x, label):
        '''Construct function.'''
        w = self.normalize(self.weight)
-        cosine = self.fc(self.cast(x, mstype.float16), self.cast(w, mstype.float16))
-        cosine = self.cast(cosine, mstype.float32)
+        if self.device_target == 'CPU':
+            cosine = self.fc(x, w)
+        else:
+            cosine = self.fc(self.cast(x, mstype.float16), self.cast(w, mstype.float16))
+            cosine = self.cast(cosine, mstype.float32)
        cosine_shape = F.shape(cosine)

        one_hot_float = self.onehot(
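Review note: the device-dependent branches this diff adds to SphereNet.construct in src/reid.py can be summarized in plain Python. This is a minimal sketch, not code from the commit. The helper names `last_bn_layer`, `bn2d_view_shape` and `embedding_dtype` are hypothetical, and layer names and dtypes are returned as strings so the sketch runs without MindSpore.

```python
SUPPORTED_TARGETS = ('Ascend', 'GPU', 'CPU')


def _check(device_target):
    """Reject anything outside the three targets the diff handles."""
    if device_target not in SUPPORTED_TARGETS:
        raise ValueError('unsupported device_target: %r' % (device_target,))


def last_bn_layer(device_target):
    """Name of the batch-norm attribute applied after the fc layer."""
    _check(device_target)
    return 'last_bn' if device_target == 'Ascend' else 'last_bn_sub'


def bn2d_view_shape(shape):
    """4-D shape the (batch, features) tensor takes before last_bn_sub."""
    batch, features = shape
    return (batch, features, 1, 1)


def embedding_dtype(device_target):
    """dtype of the tensor handed to L2Normalize in SphereNet.construct."""
    _check(device_target)
    # On CPU the cast to float16 is skipped. 'float32' is an assumption:
    # it holds if the batch-norm output is float32, as its fp32 flag suggests.
    return 'float32' if device_target == 'CPU' else 'float16'


for target in SUPPORTED_TARGETS:
    print(target, last_bn_layer(target), embedding_dtype(target))
```

Reshaping (b, c) to (b, c, 1, 1) leaves each feature with the same b values to normalize over, so a 2-D batch norm on the 4-D view computes the same per-feature statistics as a 1-D batch norm on the original tensor.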
--- a/model_zoo/research/cv/FaceRecognitionForTracking/src/reid_for_export.py
+++ b/model_zoo/research/cv/FaceRecognitionForTracking/src/reid_for_export.py
@ -1,310 +0,0 @@
-# Copyright 2020 Huawei Technologies Co., Ltd
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-# http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-# ============================================================================
-"""Face Recognition backbone."""
-import math
-
-import mindspore.nn as nn
-from mindspore.ops.operations import Add
-from mindspore.ops import operations as P
-from mindspore.ops import functional as F
-from mindspore.nn import Dense, Cell
-from mindspore.common import dtype as mstype
-from mindspore.common.initializer import initializer
-from mindspore import Tensor, Parameter
-
-from src import me_init
-
-
-class Cut(nn.Cell):
-
-
-
-    def construct(self, x):
-        return x
-
-
-def bn_with_initialize(out_channels):
-    bn = nn.BatchNorm2d(out_channels, momentum=0.9, eps=1e-5).add_flags_recursive(fp32=True)
-    return bn
-
-
-def fc_with_initialize(input_channels, out_channels):
-    return Dense(input_channels, out_channels)
-
-
-def conv3x3(in_channels, out_channels, stride=1, groups=1, dilation=1, pad_mode="pad", padding=1, bias=True):
-    """3x3 convolution with padding"""
-
-    return nn.Conv2d(in_channels, out_channels, kernel_size=3, stride=stride,
-                     pad_mode=pad_mode, group=groups, has_bias=bias, dilation=dilation, padding=padding)
-
-
-def conv1x1(in_channels, out_channels, pad_mode="pad", stride=1, padding=0, bias=True):
-    """1x1 convolution"""
-    return nn.Conv2d(in_channels, out_channels, pad_mode=pad_mode, kernel_size=1, stride=stride, has_bias=bias,
-                     padding=padding)
-
-
-def conv4x4(in_channels, out_channels, stride=1, groups=1, dilation=1, pad_mode="pad", padding=1, bias=True):
-    """4x4 convolution with padding"""
-
-    return nn.Conv2d(in_channels, out_channels, kernel_size=4, stride=stride,
-                     pad_mode=pad_mode, group=groups, has_bias=bias, dilation=dilation, padding=padding)
-
-
-class BaseBlock(Cell):
-    '''BaseBlock'''
-    def __init__(self, channels):
-        super(BaseBlock, self).__init__()
-
-        self.conv1 = conv3x3(channels, channels, stride=1, padding=1, bias=False)
-        self.bn1 = bn_with_initialize(channels)
-        self.relu1 = P.ReLU()
-        self.conv2 = conv3x3(channels, channels, stride=1, padding=1, bias=False)
-        self.bn2 = bn_with_initialize(channels)
-        self.relu2 = P.ReLU()
-
-        self.cast = P.Cast()
-        self.add = Add()
-
-    def construct(self, x):
-        '''Construct function.'''
-        identity = x
-        out = self.conv1(x)
-        out = self.bn1(out)
-        out = self.relu1(out)
-        out = self.conv2(out)
-        out = self.bn2(out)
-        out = self.relu2(out)
-
-        # hand cast
-        identity = self.cast(identity, mstype.float16)
-        out = self.cast(out, mstype.float16)
-
-        out = self.add(out, identity)
-        return out
-
-
-class MakeLayer(Cell):
-    '''MakeLayer'''
-    def __init__(self, block, inplanes, planes, blocks, stride=2):
-        super(MakeLayer, self).__init__()
-
-        self.conv = conv3x3(inplanes, planes, stride=stride, padding=1, bias=True)
-        self.bn = bn_with_initialize(planes)
-        self.relu = P.ReLU()
-
-        self.layers = []
-
-        for _ in range(0, blocks):
-            self.layers.append(block(planes))
-        self.layers = nn.CellList(self.layers)
-
-    def construct(self, x):
-        x = self.conv(x)
-        x = self.bn(x)
-        x = self.relu(x)
-        for block in self.layers:
-            x = block(x)
-        return x
-
-class SphereNet(Cell):
-    '''SphereNet'''
-    def __init__(self, num_layers=36, feature_dim=128, shape=(96, 64)):
-        super(SphereNet, self).__init__()
-        assert num_layers in [12, 20, 36, 64], 'SphereNet num_layers should be 12, 20 or 64'
-        if num_layers == 12:
-            layers = [1, 1, 1, 1]
-            filter_list = [3, 16, 32, 64, 128]
-            fc_size = 128 * 6 * 4
-        elif num_layers == 20:
-            layers = [1, 2, 4, 1]
-            filter_list = [3, 64, 128, 256, 512]
-            fc_size = 512 * 6 * 4
-        elif num_layers == 36:
-            layers = [2, 4, 4, 2]
-            filter_list = [3, 32, 64, 128, 256]
-            fc_size = 256 * 6 * 4
-        elif num_layers == 64:
-            layers = [3, 7, 16, 3]
-            filter_list = [3, 64, 128, 256, 512]
-            fc_size = 512 * 6 * 4
-        else:
-            raise ValueError('sphere' + str(num_layers) + " IS NOT SUPPORTED! (sphere20 or sphere64)")
-        self.shape = P.Shape()
-        self.reshape = P.Reshape()
-        self.arg_shape = shape
-        block = BaseBlock
-
-        self.layer1 = MakeLayer(block, filter_list[0], filter_list[1], layers[0], stride=2)
-        self.layer2 = MakeLayer(block, filter_list[1], filter_list[2], layers[1], stride=2)
-        self.layer3 = MakeLayer(block, filter_list[2], filter_list[3], layers[2], stride=2)
-        self.layer4 = MakeLayer(block, filter_list[3], filter_list[4], layers[3], stride=2)
-
-        self.fc = fc_with_initialize(fc_size, feature_dim)
-        self.last_bn = nn.BatchNorm1d(feature_dim, momentum=0.9).add_flags_recursive(fp32=True)
-        self.cast = P.Cast()
-        self.l2norm = P.L2Normalize(axis=1)
-
-        for _, cell in self.cells_and_names():
-            if isinstance(cell, (nn.Conv2d, nn.Dense)):
-                if cell.bias is not None:
-                    cell.weight.set_data(initializer(me_init.ReidKaimingUniform(a=math.sqrt(5), mode='fan_out'),
-                                                     cell.weight.shape))
-                    cell.bias.set_data(initializer('zeros', cell.bias.shape))
-                else:
-                    cell.weight.set_data(initializer(me_init.ReidXavierUniform(), cell.weight.shape))
-
-    def construct(self, x):
-        '''Construct function.'''
-        x = self.layer1(x)
-        x = self.layer2(x)
-        x = self.layer3(x)
-        x = self.layer4(x)
-
-        b, _, _, _ = self.shape(x)
-        x = self.reshape(x, (b, -1))
-        x = self.fc(x)
-        x = self.last_bn(x)
-        x = self.cast(x, mstype.float16)
-        x = self.l2norm(x)
-        x = self.cast(x, mstype.float32)
-
-        return x
-
-
-class CombineMarginFC(nn.Cell):
-    '''CombineMarginFC'''
-    def __init__(self, embbeding_size=128, classnum=270762, s=32, a=1.0, m=0.3, b=0.2):
-        super(CombineMarginFC, self).__init__()
-        weight_shape = [classnum, embbeding_size]
-        weight_init = initializer(me_init.ReidXavierUniform(), weight_shape)
-        self.weight = Parameter(weight_init, name='weight')
-        self.m = m
-        self.s = s
-        self.a = a
-        self.b = b
-        self.m_const = Tensor(self.m, dtype=mstype.float32)
-        self.a_const = Tensor(self.a, dtype=mstype.float32)
-        self.b_const = Tensor(self.b, dtype=mstype.float32)
-        self.s_const = Tensor(self.s, dtype=mstype.float32)
-        self.m_const_zero = Tensor(0.0, dtype=mstype.float32)
-        self.a_const_one = Tensor(1.0, dtype=mstype.float32)
-        self.normalize = P.L2Normalize(axis=1)
-        self.fc = P.MatMul(transpose_b=True)
-        self.onehot = P.OneHot()
-        self.transpose = P.Transpose()
-        self.acos = P.ACos()
-        self.cos = P.Cos()
-        self.cast = P.Cast()
-        self.on_value = Tensor(1.0, mstype.float32)
-        self.off_value = Tensor(0.0, mstype.float32)
-
-    def construct(self, x, label):
-        '''Construct function.'''
-        w = self.normalize(self.weight)
-        cosine = self.fc(self.cast(x, mstype.float16), self.cast(w, mstype.float16))
-        cosine = self.cast(cosine, mstype.float32)
-        cosine_shape = F.shape(cosine)
-
-        one_hot_float = self.onehot(
-            self.cast(label, mstype.int32), cosine_shape[1], self.on_value, self.off_value)
-        theta = self.acos(cosine)
-        theta = self.a_const * theta
-        theta = self.m_const + theta
-        body = self.cos(theta)
-        body = body - self.b_const
-        cos_mask = F.scalar_to_array(1.0) - one_hot_float
-        output = body * one_hot_float + cosine * cos_mask
-        output = output * self.s_const
-        return output, cosine
-
-
-class CombineMarginFCFp16(nn.Cell):
-    '''CombineMarginFCFp16'''
-    def __init__(self, embbeding_size=128, classnum=270762, s=32, a=1.0, m=0.3, b=0.2):
-        super(CombineMarginFCFp16, self).__init__()
-        weight_shape = [classnum, embbeding_size]
-        weight_init = initializer(me_init.ReidXavierUniform(), weight_shape)
-        self.weight = Parameter(weight_init, name='weight')
-
-        self.m = m
-        self.s = s
-        self.a = a
-        self.b = b
-        self.m_const = Tensor(self.m, dtype=mstype.float16)
-        self.a_const = Tensor(self.a, dtype=mstype.float16)
-        self.b_const = Tensor(self.b, dtype=mstype.float16)
-        self.s_const = Tensor(self.s, dtype=mstype.float16)
-        self.m_const_zero = Tensor(0, dtype=mstype.float16)
-        self.a_const_one = Tensor(1, dtype=mstype.float16)
-        self.normalize = P.L2Normalize(axis=1)
-        self.fc = P.MatMul(transpose_b=True)
-
-        self.onehot = P.OneHot()
-        self.transpose = P.Transpose()
-        self.acos = P.ACos()
-        self.cos = P.Cos()
-        self.cast = P.Cast()
-        self.on_value = Tensor(1.0, mstype.float32)
-        self.off_value = Tensor(0.0, mstype.float32)
-
-    def construct(self, x, label):
-        '''Construct function.'''
-        w = self.normalize(self.weight)
-        cosine = self.fc(x, w)
-        cosine_shape = F.shape(cosine)
-
-        one_hot_float = self.onehot(
-            self.cast(label, mstype.int32), cosine_shape[1], self.on_value, self.off_value)
-        one_hot_float = self.cast(one_hot_float, mstype.float16)
-        theta = self.acos(cosine)
-        theta = self.a_const * theta
-        theta = self.m_const + theta
-        body = self.cos(theta)
-        body = body - self.b_const
-        cos_mask = self.cast(F.scalar_to_array(1.0), mstype.float16) - one_hot_float
-        output = body * one_hot_float + cosine * cos_mask
-        output = output * self.s_const
-
-        return output, cosine
-
-
-class BuildTrainNetwork(Cell):
-    def __init__(self, network, criterion):
-        super(BuildTrainNetwork, self).__init__()
-        self.network = network
-        self.criterion = criterion
-
-    def construct(self, input_data, label):
-        output = self.network(input_data)
-        loss = self.criterion(output, label)
-        return loss
-
-
-class BuildTrainNetworkWithHead(nn.Cell):
-    '''Build TrainNetwork With Head.'''
-    def __init__(self, model, head, criterion):
-        super(BuildTrainNetworkWithHead, self).__init__()
-        self.model = model
-        self.head = head
-        self.criterion = criterion
-
-    def construct(self, input_data, labels):
-        embeddings = self.model(input_data)
-        thetas, _ = self.head(embeddings, labels)
-        loss = self.criterion(thetas, labels)
-
-        return loss
--- a/model_zoo/research/cv/FaceRecognitionForTracking/train.py
+++ b/model_zoo/research/cv/FaceRecognitionForTracking/train.py
@ -1,4 +1,4 @@
-# Copyright 2020 Huawei Technologies Co., Ltd
+# Copyright 2020-2021 Huawei Technologies Co., Ltd
 #
 # Licensed under the Apache License, Version 2.0 (the "License");
 # you may not use this file except in compliance with the License.
@ -32,31 +32,45 @@ from mindspore.nn import TrainOneStepCell
 from mindspore.communication.management import get_group_size, init, get_rank

 from src.dataset import get_de_dataset
-from src.config import reid_1p_cfg, reid_8p_cfg
+from src.config import reid_1p_cfg_ascend, reid_1p_cfg, reid_8p_cfg_ascend, reid_8p_cfg_gpu
 from src.lr_generator import step_lr
 from src.log import get_logger, AverageMeter
-from src.reid import SphereNet, CombineMarginFCFp16, BuildTrainNetworkWithHead
+from src.reid import SphereNet, CombineMarginFCFp16, BuildTrainNetworkWithHead, CombineMarginFC
 from src.loss import CrossEntropy

 warnings.filterwarnings('ignore')
-devid = int(os.getenv('DEVICE_ID'))
-context.set_context(mode=context.GRAPH_MODE, device_target="Ascend", save_graphs=True, device_id=devid)
 random.seed(1)
 np.random.seed(1)

 def init_argument():
    """init config argument."""
-    parser = argparse.ArgumentParser(description='Cifar10 classification')
+    parser = argparse.ArgumentParser(description='Face Recognition For Tracking')
+    parser.add_argument('--device_target', type=str, choices=['Ascend', 'GPU', 'CPU'], default='Ascend',
+                        help='device to run on: Ascend, GPU or CPU')
    parser.add_argument('--is_distributed', type=int, default=0, help='if multi device')
-    parser.add_argument('--data_dir', type=str, default='', help='image label list file, e.g. /home/label.txt')
+    parser.add_argument('--data_dir', type=str, default='', help='image folders')
    parser.add_argument('--pretrained', type=str, default='', help='pretrained model to load')

    args = parser.parse_args()

+    graph_path = os.path.join('./graphs_graphmode', datetime.datetime.now().strftime('%Y-%m-%d_time_%H_%M_%S'))
+    context.set_context(mode=context.GRAPH_MODE, device_target=args.device_target, save_graphs=True,
+                        save_graphs_path=graph_path)
+
+    if args.device_target == 'Ascend':
+        devid = int(os.getenv('DEVICE_ID', '0'))  # fall back to device 0 when DEVICE_ID is unset
+        context.set_context(device_id=devid)
+
    if args.is_distributed == 0:
-        cfg = reid_1p_cfg
+        if args.device_target == 'Ascend':
+            cfg = reid_1p_cfg_ascend
+        else:
+            cfg = reid_1p_cfg
    else:
-        cfg = reid_8p_cfg
+        if args.device_target == 'Ascend':
+            cfg = reid_8p_cfg_ascend
+        else:
+            cfg = reid_8p_cfg_gpu
    cfg.pretrained = args.pretrained
    cfg.data_dir = args.data_dir
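Review note: the config selection in this hunk of init_argument amounts to a small lookup on two arguments. The sketch below restates it as a pure function for illustration only. `select_cfg_name` is a hypothetical helper that returns the name of the config object as a string rather than importing src.config.

```python
def select_cfg_name(device_target, is_distributed):
    """Return the name of the src.config object the hunk above would select."""
    on_ascend = device_target == 'Ascend'
    if is_distributed == 0:
        # Single-device run: Ascend has its own config, GPU and CPU share one.
        return 'reid_1p_cfg_ascend' if on_ascend else 'reid_1p_cfg'
    # Multi-device run: every non-Ascend target gets the GPU config.
    return 'reid_8p_cfg_ascend' if on_ascend else 'reid_8p_cfg_gpu'


for target in ('Ascend', 'GPU', 'CPU'):
    for distributed in (0, 1):
        print(target, distributed, select_cfg_name(target, distributed))
```

Note that `--device_target CPU` combined with a non-zero `--is_distributed` falls through to reid_8p_cfg_gpu. The diff does not say whether that combination is meant to be supported.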

@ -81,10 +95,10 @@ def init_argument():

    # Show cfg
    cfg.logger.save_args(cfg)
-    return cfg
+    return cfg, args

 def main():
-    cfg = init_argument()
+    cfg, args = init_argument()
    loss_meter = AverageMeter('loss')
    # dataloader
    cfg.logger.info('start create dataloader')
@ -104,7 +118,10 @@ def main():
    create_network_start = time.time()

    network = SphereNet(num_layers=cfg.net_depth, feature_dim=cfg.embedding_size, shape=cfg.input_size)
-    head = CombineMarginFCFp16(embbeding_size=cfg.embedding_size, classnum=cfg.class_num)
+    if args.device_target == 'CPU':
+        head = CombineMarginFC(embbeding_size=cfg.embedding_size, classnum=cfg.class_num)
+    else:
+        head = CombineMarginFCFp16(embbeding_size=cfg.embedding_size, classnum=cfg.class_num)
    criterion = CrossEntropy()

    # load the pretrained model
@ -122,8 +139,12 @@ def main():
        cfg.logger.info('load model %s success' % cfg.pretrained)

    # mixed precision training
-    network.add_flags_recursive(fp16=True)
-    head.add_flags_recursive(fp16=True)
+    if args.device_target == 'CPU':
+        network.add_flags_recursive(fp32=True)
+        head.add_flags_recursive(fp32=True)
+    else:
+        network.add_flags_recursive(fp16=True)
+        head.add_flags_recursive(fp16=True)
    criterion.add_flags_recursive(fp32=True)

    train_net = BuildTrainNetworkWithHead(network, head, criterion)