!10684 add README_CN.md for bert_thor

From: @sl_wang
Reviewed-by: @wang_zi_dong
mindspore-ci-bot 2020-12-28 18:49:34 +08:00 committed by Gitee
commit 92049c4012
2 changed files with 303 additions and 58 deletions

- [Script Parameters](#Script-Parameters)
- [Training Process](#Training-Process)
- [Evaluation Process](#Evaluation-Process)
- [Model Description](#Model-Description)
    - [Evaluation Performance](#Evaluation-Performance)
- [Description of Random Situation](#Description-of-Random-Situation)
- [ModelZoo Homepage](#ModelZoo-Homepage)

## Description

This is an example of training BERT with the MLPerf v0.7 dataset using the second-order optimizer THOR. THOR is a novel approximate second-order optimization method in MindSpore. With fewer iterations, THOR can finish BERT-Large training in 14 minutes to a masked LM accuracy of 71.3% using 8 Ascend 910 chips, which is much faster than SGD with Momentum.

## Model Architecture

The architecture of BERT contains three embedding layers that look up token embeddings, position embeddings, and segment embeddings. The core of BERT is a stack of Transformer encoder blocks. Finally, BERT is pre-trained on two tasks: Masked Language Model (MLM) and Next Sentence Prediction (NSP).
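
As a brief sketch of the pretraining setup described above (the standard BERT formulation, stated here for orientation rather than taken from this repository's code), the input representation sums the three embeddings and the total loss adds the two task losses:

$$
\mathbf{h}_0 = E_{\text{token}} + E_{\text{position}} + E_{\text{segment}}, \qquad
L(\theta) = L_{\text{MLM}}(\theta) + L_{\text{NSP}}(\theta)
$$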

## Dataset

Dataset used: MLPerf v0.7 dataset for BERT

- Dataset size: 9,600,000 samples
    - Train: 9,600,000 samples
    - Test: the first 10,000 consecutive samples of the training set
- Data format: tfrecord
- Download and preprocess datasets
    - Note: Data will be processed using the scripts in [pretraining data creation](https://github.com/mlperf/training/tree/master/language_model/tensorflow/bert); following that link, users can create the data files step by step.
- The generated tfrecord has 500 parts:

> ```shell
> ├── part-00000-of-00500.tfrecord # train dataset
> └── part-00001-of-00500.tfrecord # train dataset
> ```

## Features

Classical first-order optimization algorithms such as SGD are cheap per iteration, but they converge slowly and require many iterations. Second-order optimization algorithms use the second-order derivatives of the objective function to accelerate convergence, reaching the optimum faster with fewer iterations. However, second-order optimizers are rarely used in deep neural network training because of their high computational cost: the dominant cost is inverting the second-order information matrix (Hessian matrix, Fisher information matrix (FIM), etc.), which has a time complexity of roughly $O(n^3)$. Building on the existing natural gradient algorithm, we developed THOR, a practical second-order optimizer in MindSpore, which reduces the computational complexity of the matrix inversion by approximating and shearing the FIM. With eight Ascend 910 chips, THOR can complete BERT-Large training in 14 minutes.
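
As a sketch of the idea THOR approximates, a natural-gradient update preconditions the ordinary gradient step with the inverse Fisher information matrix $F$ (this is the generic formulation; the exact factored approximation used by THOR differs):

$$
\theta_{t+1} = \theta_t - \alpha F^{-1} \nabla_{\theta} L(\theta_t), \qquad
F = \mathbb{E}\left[\nabla_{\theta}\log p(y \mid x;\theta)\, \nabla_{\theta}\log p(y \mid x;\theta)^{\top}\right]
$$

Inverting $F$ directly costs $O(n^3)$ in the number of parameters, which is why THOR approximates and shears it instead of inverting it exactly.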

## Environment Requirements

- Hardware (Ascend)
    - Prepare a hardware environment with Ascend processors. If you want to try Ascend, please send the [application form](https://obs-9be7.obs.cn-east-2.myhuaweicloud.com/file/other/Ascend%20Model%20Zoo%E4%BD%93%E9%AA%8C%E8%B5%84%E6%BA%90%E7%94%B3%E8%AF%B7%E8%A1%A8.docx) to ascend@huawei.com. Once approved, you can get the resources.
- Framework
    - [MindSpore](https://www.mindspore.cn/install/en)
- For more information, please check the resources below:
    - [MindSpore Tutorials](https://www.mindspore.cn/tutorial/training/en/master/index.html)
    - [MindSpore Python API](https://www.mindspore.cn/doc/api_python/en/master/index.html)

## Quick Start

After installing MindSpore via the official website, you can start training and evaluation as follows:

- Running on Ascend

```shell
# run distributed training example
sh scripts/run_distribute_pretrain.sh [DEVICE_NUM] [EPOCH_SIZE] [DATA_DIR] [SCHEMA_DIR] [RANK_TABLE_FILE]

# run evaluation example
python pretrain_eval.py
```

> For distributed training, an HCCL configuration file in JSON format needs to be created in advance. For details about the configuration file, refer to [HCCL_TOOL](https://gitee.com/mindspore/mindspore/tree/master/model_zoo/utils/hccl_tools).
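
For reference, the rank table file is usually generated with the hccl_tools script linked above; a possible invocation is shown below. The flag name and the output file pattern are assumptions based on common usage, so check the script's `--help` for the exact interface.

```shell
# assumed usage of model_zoo/utils/hccl_tools/hccl_tools.py (verify flags with --help)
python hccl_tools.py --device_num "[0,8)"
# pass the generated JSON file (e.g. hccl_8p_*.json) as [RANK_TABLE_FILE]
```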

## Script Description

### Script Code Structure

```shell
├── model_zoo
    ├── official
        ├── nlp
            ├── bert_thor
                ├── README.md                         # description of bert_thor
                ├── scripts
                    ├── run_distribute_pretrain.sh    # launch distributed training for Ascend
                    └── run_standalone_pretrain.sh    # launch standalone training for Ascend
                ├── src
                    ├── bert_for_pre_training.py      # BERT for pretraining
                    ├── bert_model.py                 # BERT model
                    ├── bert_net_config.py            # network config settings
                    ├── config.py                     # config settings used in dataset.py
                    ├── dataset.py                    # data operations used in run_pretrain.py
                    ├── dataset_helper.py             # dataset helper for minddata datasets
                    ├── evaluation_config.py          # config settings, will be used in finetune.py
                    ├── fused_layer_norm.py           # fused layer norm
                    ├── grad_reducer_thor.py          # gradient reducer for THOR
                    ├── lr_generator.py               # learning rate generator
                    ├── model_thor.py                 # model
                    ├── thor_for_bert.py              # THOR second-order optimizer for BERT (standalone)
                    ├── thor_for_bert_arg.py          # THOR second-order optimizer for BERT (distributed)
                    ├── thor_layer.py                 # THOR layers
                    └── utils.py                      # utils
                ├── pretrain_eval.py                  # inference script
                └── run_pretrain.py                   # training script
```

### Script Parameters

Parameters for both training and inference can be set in config.py.

```shell
"device_target": 'Ascend',     # device where the code will be implemented
"distribute": "false",         # run distributed training
"epoch_size": "1",             # epoch size
"enable_save_ckpt": "true",    # enable saving checkpoints
"enable_lossscale": "false",   # whether to use loss scaling
"do_shuffle": "true",          # whether to shuffle the dataset
"save_checkpoint_path": "",    # path to save checkpoints
"load_checkpoint_path": "",    # path to load checkpoint files
"train_steps": -1,             # run all steps according to the epoch size
"device_id": 4,                # device id, default is 4
"enable_data_sink": "true",    # enable data sink mode, default is true
"data_sink_steps": "100",      # sink steps per epoch, default is 100
"save_checkpoint_steps": 1000, # save checkpoint steps
"save_checkpoint_num": 1,      # number of checkpoints to save, default is 1
```

### Training Process

#### Ascend 910

```shell
sh run_distribute_pretrain.sh [DEVICE_NUM] [EPOCH_SIZE] [DATA_DIR] [SCHEMA_DIR] [RANK_TABLE_FILE]
```

This script requires five parameters:

- `DEVICE_NUM`: the number of devices for distributed training.
- `EPOCH_SIZE`: epoch size used in the model.
- `DATA_DIR`: data path; an absolute path is recommended.
- `SCHEMA_DIR`: schema path; an absolute path is recommended.
- `RANK_TABLE_FILE`: rank table file in JSON format.

An example invocation is shown below.
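
For illustration only, an invocation with placeholder values (the paths and device count below are hypothetical, not taken from this repository):

```shell
# hypothetical example: 8 devices, 3 epochs, placeholder data, schema and rank-table paths
sh scripts/run_distribute_pretrain.sh 8 3 /path/to/mlperf_tfrecords /path/to/schema.json /path/to/hccl_8p.json
```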

Training results will be stored in the current path, in a folder whose name begins with the file name defined by the user. You can find the checkpoint files there, together with log output like the following:

```shell
...
epoch: 1, step: 1, outputs are [5.0842705], total_time_span is 795.4807660579681, step_time_span is 795.4807660579681
epoch: 1, step: 100, outputs are [4.4550357], total_time_span is 579.6836116313934, step_time_span is 5.855390016478721
epoch: 1, step: 101, outputs are [4.804837], total_time_span is 0.6697461605072021, step_time_span is 0.6697461605072021
epoch: 1, step: 200, outputs are [4.453913], total_time_span is 26.3735454082489, step_time_span is 0.2663994485681707
epoch: 1, step: 201, outputs are [4.6619444], total_time_span is 0.6340286731719971, step_time_span is 0.6340286731719971
epoch: 1, step: 300, outputs are [4.251204], total_time_span is 26.366267919540405, step_time_span is 0.2663259385812162
epoch: 1, step: 301, outputs are [4.1396527], total_time_span is 0.6269843578338623, step_time_span is 0.6269843578338623
epoch: 1, step: 400, outputs are [4.3717675], total_time_span is 26.37460947036743, step_time_span is 0.2664101966703781
epoch: 1, step: 401, outputs are [4.9887424], total_time_span is 0.6313872337341309, step_time_span is 0.6313872337341309
epoch: 1, step: 500, outputs are [4.7275505], total_time_span is 26.377585411071777, step_time_span is 0.2664402566774927
......
epoch: 3, step: 2001, outputs are [1.5040319], total_time_span is 0.6242287158966064, step_time_span is 0.6242287158966064
epoch: 3, step: 2100, outputs are [1.232682], total_time_span is 26.37802791595459, step_time_span is 0.26644472642378375
epoch: 3, step: 2101, outputs are [1.1442064], total_time_span is 0.6277685165405273, step_time_span is 0.6277685165405273
epoch: 3, step: 2200, outputs are [1.8860981], total_time_span is 26.378745555877686, step_time_span is 0.2664519753118958
epoch: 3, step: 2201, outputs are [1.4248213], total_time_span is 0.6273438930511475, step_time_span is 0.6273438930511475
epoch: 3, step: 2300, outputs are [1.2741681], total_time_span is 26.374130964279175, step_time_span is 0.2664053632755472
epoch: 3, step: 2301, outputs are [1.2470423], total_time_span is 0.6276984214782715, step_time_span is 0.6276984214782715
epoch: 3, step: 2400, outputs are [1.2646998], total_time_span is 26.37843370437622, step_time_span is 0.2664488252967295
epoch: 3, step: 2401, outputs are [1.2794371], total_time_span is 0.6266779899597168, step_time_span is 0.6266779899597168
epoch: 3, step: 2500, outputs are [1.265375], total_time_span is 26.374578714370728, step_time_span is 0.2664098860037447
...
```

### Evaluation Process

Before running the command below, please check the checkpoint path used for evaluation. Set the checkpoint path to an absolute path, e.g., "username/bert_thor/LOG0/checkpoint_bert-3_1000.ckpt".

#### Ascend 910

```shell
python pretrain_eval.py
```

This script requires two parameters, set in evaluation_config.py:

- `DATA_FILE`: the file of the evaluation dataset.
- `FINETUNE_CKPT`: the absolute path of the checkpoint file.

> Checkpoints can be produced during the training process.

Inference results will be stored in the example path; you can find results like the following in the log.

```shell
step: 1000 Accuracy: [0.27491578]
step: 2000 Accuracy: [0.69612586]
step: 3000 Accuracy: [0.71377236]
```

## Model Description

### Evaluation Performance

| Parameters                 | Ascend 910                             |
| -------------------------- | -------------------------------------- |
| Model Version              | BERT-LARGE                             |
| Resource                   | Ascend 910; CPU 2.60 GHz, 192 cores; memory 755 GB |
| Uploaded Date              | 08/20/2020 (month/day/year)            |
| MindSpore Version          | 0.6.0-alpha                            |
| Dataset                    | MLPerf v0.7 dataset                    |
| Training Parameters | total steps=3000, batch_size = 12 |
| Optimizer | THOR |
| Loss Function | Softmax Cross Entropy |
| Outputs                    | probability                            |
| Loss                       | 1.5654222                              |
| Speed                      | 275 ms/step                            |
| Total Time                 | 14 minutes                             |
| Parameters (M)             | 330                                    |
| Checkpoint for Fine-tuning | 4.5 GB (.ckpt file)                    |
| Scripts                    | https://gitee.com/mindspore/mindspore/tree/master/model_zoo/official/nlp/bert_thor |

## Description of Random Situation

In dataset.py, we set the seed inside the "create_dataset" function. We also use a random seed in run_pretrain.py.
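
As a minimal sketch of how such seeding typically looks with MindSpore's dataset pipeline (the exact seed values and call sites used in this repository may differ):

```python
import numpy as np
import mindspore.dataset as ds

# fix the dataset pipeline seed so shuffling is reproducible (seed value is an assumption)
ds.config.set_seed(1)
# fix numpy's seed for any numpy-based randomness (seed value is an assumption)
np.random.seed(1)
```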
## ModelZoo Homepage
Please check the official [homepage](https://gitee.com/mindspore/mindspore/tree/master/model_zoo).

New file: README_CN.md
# BERT-THOR Example

<!-- TOC -->

- [BERT-THOR Example](#bert-thor-example)
    - [Description](#description)
    - [Model Architecture](#model-architecture)
    - [Dataset](#dataset)
    - [Features](#features)
    - [Environment Requirements](#environment-requirements)
    - [Quick Start](#quick-start)
    - [Script Description](#script-description)
        - [Script Code Structure](#script-code-structure)
        - [Script Parameters](#script-parameters)
        - [Training Process](#training-process)
            - [Ascend 910](#ascend-910)
        - [Evaluation Process](#evaluation-process)
            - [Ascend 910](#ascend-910-1)
    - [Model Description](#model-description)
        - [Evaluation Performance](#evaluation-performance)
    - [Description of Random Situation](#description-of-random-situation)
    - [ModelZoo Homepage](#modelzoo-homepage)

<!-- /TOC -->

## Description

This is an example of training BERT with the MLPerf v0.7 dataset using the second-order optimizer THOR. THOR is a novel approximate second-order optimization method in MindSpore that needs fewer iterations. With 8 Ascend 910 chips, THOR can finish BERT-Large training in 14 minutes, reaching an accuracy of 71.3% on the masked language model task, which is much better than SGD with Momentum.

## Model Architecture

The overall architecture of BERT contains three embedding layers that look up token embeddings, position embeddings, and segment embeddings. BERT is essentially a stack of Transformer encoder blocks, and it is pre-trained on two tasks: Masked Language Model (MLM) and Next Sentence Prediction (NSP).

## Dataset

Dataset used: MLPerf v0.7 dataset for training BERT

- Dataset size: 9,600,000 samples
    - Train: 9,600,000 samples
    - Test: the first 10,000 consecutive samples of the training set
- Data format: TFRecord
- Download and preprocess the dataset
    - Note: Data is processed with the scripts in [pretraining data creation](https://github.com/mlperf/training/tree/master/language_model/tensorflow/bert); following that link, you can create the data files step by step.
- The generated TFRecord files are split into 500 parts:

> ```shell
> ├── part-00000-of-00500.tfrecord # train dataset
> └── part-00001-of-00500.tfrecord # train dataset
> ```
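
As a sketch of how the generated shards could be read with MindSpore's dataset API (the file and schema paths are placeholders, and the repository's own dataset.py may apply additional column settings and operations):

```python
import glob
import mindspore.dataset as ds

# hypothetical locations of the 500 tfrecord parts and the schema file
files = sorted(glob.glob("/path/to/data/part-*-of-00500.tfrecord"))
dataset = ds.TFRecordDataset(files, schema="/path/to/schema.json", shuffle=True)
dataset = dataset.batch(12, drop_remainder=True)  # batch size 12, as used in this example
```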

## Features

Classical first-order optimization algorithms such as SGD are cheap to compute, but they converge slowly and need many iterations. Second-order optimization algorithms use the second-order derivatives of the objective function to accelerate convergence, converging faster with fewer iterations. However, second-order optimizers are rarely used in deep neural network training because of their high computational cost: the main cost lies in inverting the second-order information matrix (Hessian matrix, Fisher information matrix, etc.), with a time complexity of about $O(n^3)$. Based on the existing natural gradient algorithm, THOR, a second-order optimizer available in MindSpore, was developed by approximating and clipping the Fisher information matrix to reduce the computational complexity of the matrix inversion. With 8 Ascend 910 chips, THOR can complete the training of BERT-Large in 14 minutes.

## Environment Requirements

- Hardware (Ascend)
    - Prepare a hardware environment with Ascend processors. If you want to try the Ascend processor, please send the [application form](https://obs-9be7.obs.cn-east-2.myhuaweicloud.com/file/other/Ascend%20Model%20Zoo%E4%BD%93%E9%AA%8C%E8%B5%84%E6%BA%90%E7%94%B3%E8%AF%B7%E8%A1%A8.docx) to ascend@huawei.com. Once approved, you can get the resources.
- Framework
    - [MindSpore](https://www.mindspore.cn/install)
- For more information about MindSpore, please check the resources below:
    - [MindSpore Tutorials](https://www.mindspore.cn/tutorial/training/zh-CN/master/index.html)
    - [MindSpore Python API](https://www.mindspore.cn/doc/api_python/zh-CN/master/index.html)

## Quick Start

After installing MindSpore from the official website, you can start training and evaluation as follows:

- Running on Ascend

```shell
# run distributed training example
sh scripts/run_distribute_pretrain.sh [DEVICE_NUM] [EPOCH_SIZE] [DATA_DIR] [SCHEMA_DIR] [RANK_TABLE_FILE]

# run evaluation example
python pretrain_eval.py
```

> For distributed training, an HCCL configuration file in JSON format needs to be created in advance. For details about the configuration file, refer to [HCCL_TOOL](https://gitee.com/mindspore/mindspore/tree/master/model_zoo/utils/hccl_tools).

## Script Description

### Script Code Structure

```shell
├── model_zoo
    ├── official
        ├── nlp
            ├── bert_thor
                ├── README.md                         # description of BERT-THOR
                ├── scripts
                    ├── run_distribute_pretrain.sh    # launch distributed training for Ascend
                    └── run_standalone_pretrain.sh    # launch standalone training for Ascend
                ├── src
                    ├── bert_for_pre_training.py      # BERT for pretraining
                    ├── bert_model.py                 # BERT model
                    ├── bert_net_config.py            # network config settings
                    ├── config.py                     # config settings used in dataset.py
                    ├── dataset.py                    # data operations used in run_pretrain.py
                    ├── dataset_helper.py             # dataset helpers for minddata datasets
                    ├── evaluation_config.py          # config settings used in finetune.py
                    ├── fused_layer_norm.py           # fused layer norm
                    ├── grad_reducer_thor.py          # gradient reducer for THOR
                    ├── lr_generator.py               # learning rate generator
                    ├── model_thor.py                 # model
                    ├── thor_for_bert.py              # THOR second-order optimizer for BERT (standalone)
                    ├── thor_for_bert_arg.py          # THOR second-order optimizer for BERT (distributed)
                    ├── thor_layer.py                 # THOR layers
                    └── utils.py                      # utils
                ├── pretrain_eval.py                  # inference script
                └── run_pretrain.py                   # training script
```

### Script Parameters

Training and inference parameters can be configured in config.py.

```shell
"device_target": 'Ascend',     # device where the code will run
"distribute": "false",         # run distributed training
"epoch_size": "1",             # epoch size
"enable_save_ckpt": "true",    # enable saving checkpoints
"enable_lossscale": "false",   # whether to use loss scaling
"do_shuffle": "true",          # whether to shuffle the dataset
"save_checkpoint_path": "",    # path to save checkpoints
"load_checkpoint_path": "",    # path to load checkpoint files
"train_steps": -1,             # run all steps according to the epoch size
"device_id": 4,                # device id, default is 4
"enable_data_sink": "true",    # enable data sink mode, default is true
"data_sink_steps": "100",      # sink steps per epoch, default is 100
"save_checkpoint_steps": 1000, # steps between checkpoint saves
"save_checkpoint_num": 1,      # number of checkpoints to save, default is 1
```
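
For illustration, a config.py in this style is usually a plain dictionary-like object; a minimal sketch (the actual file may use a different helper type and more fields) could be:

```python
from easydict import EasyDict as edict  # assumed dependency, commonly used for model_zoo configs

# hypothetical subset of the configuration listed above
cfg = edict({
    "device_target": "Ascend",
    "distribute": "false",
    "epoch_size": "1",
    "device_id": 4,
    "enable_data_sink": "true",
    "data_sink_steps": "100",
})

print(cfg.device_target)  # fields are then read as attributes, e.g. in dataset.py
```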

### Training Process

#### Ascend 910

```shell
sh run_distribute_pretrain.sh [DEVICE_NUM] [EPOCH_SIZE] [DATA_DIR] [SCHEMA_DIR] [RANK_TABLE_FILE]
```

This script requires the following parameters:

- `DEVICE_NUM`: the number of devices for distributed training.
- `EPOCH_SIZE`: epoch size used in the model.
- `DATA_DIR`: data path; an absolute path is recommended.
- `SCHEMA_DIR`: schema path; an absolute path is recommended.
- `RANK_TABLE_FILE`: rank table file in JSON format.

Training results are saved in the current path, in a folder whose name begins with the user-defined file name. You can find the checkpoint files there, together with log output like the following:

```shell
...
epoch: 1, step: 1, outputs are [5.0842705], total_time_span is 795.4807660579681, step_time_span is 795.4807660579681
epoch: 1, step: 100, outputs are [4.4550357], total_time_span is 579.6836116313934, step_time_span is 5.855390016478721
epoch: 1, step: 101, outputs are [4.804837], total_time_span is 0.6697461605072021, step_time_span is 0.6697461605072021
epoch: 1, step: 200, outputs are [4.453913], total_time_span is 26.3735454082489, step_time_span is 0.2663994485681707
epoch: 1, step: 201, outputs are [4.6619444], total_time_span is 0.6340286731719971, step_time_span is 0.6340286731719971
epoch: 1, step: 300, outputs are [4.251204], total_time_span is 26.366267919540405, step_time_span is 0.2663259385812162
epoch: 1, step: 301, outputs are [4.1396527], total_time_span is 0.6269843578338623, step_time_span is 0.6269843578338623
epoch: 1, step: 400, outputs are [4.3717675], total_time_span is 26.37460947036743, step_time_span is 0.2664101966703781
epoch: 1, step: 401, outputs are [4.9887424], total_time_span is 0.6313872337341309, step_time_span is 0.6313872337341309
epoch: 1, step: 500, outputs are [4.7275505], total_time_span is 26.377585411071777, step_time_span is 0.2664402566774927
......
epoch: 3, step: 2001, outputs are [1.5040319], total_time_span is 0.6242287158966064, step_time_span is 0.6242287158966064
epoch: 3, step: 2100, outputs are [1.232682], total_time_span is 26.37802791595459, step_time_span is 0.26644472642378375
epoch: 3, step: 2101, outputs are [1.1442064], total_time_span is 0.6277685165405273, step_time_span is 0.6277685165405273
epoch: 3, step: 2200, outputs are [1.8860981], total_time_span is 26.378745555877686, step_time_span is 0.2664519753118958
epoch: 3, step: 2201, outputs are [1.4248213], total_time_span is 0.6273438930511475, step_time_span is 0.6273438930511475
epoch: 3, step: 2300, outputs are [1.2741681], total_time_span is 26.374130964279175, step_time_span is 0.2664053632755472
epoch: 3, step: 2301, outputs are [1.2470423], total_time_span is 0.6276984214782715, step_time_span is 0.6276984214782715
epoch: 3, step: 2400, outputs are [1.2646998], total_time_span is 26.37843370437622, step_time_span is 0.2664488252967295
epoch: 3, step: 2401, outputs are [1.2794371], total_time_span is 0.6266779899597168, step_time_span is 0.6266779899597168
epoch: 3, step: 2500, outputs are [1.265375], total_time_span is 26.374578714370728, step_time_span is 0.2664098860037447
...
```

### Evaluation Process

Before running the command below, please check the checkpoint path used for evaluation. Set the checkpoint path to an absolute path, e.g., username/bert_thor/LOG0/checkpoint_bert-3_1000.ckpt.

#### Ascend 910

```shell
python pretrain_eval.py
```

This script requires two parameters, set in evaluation_config.py:

- `DATA_FILE`: the path of the evaluation dataset.
- `FINETUNE_CKPT`: the absolute path of the checkpoint file.

> Checkpoints can be produced during the training process.
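
For illustration, these two settings typically live in src/evaluation_config.py as plain configuration fields; the sketch below uses hypothetical key names and placeholder paths, so the actual names in this repository may differ.

```python
from easydict import EasyDict as edict  # assumed dependency, as in the config sketch above

# hypothetical evaluation settings; replace the placeholder paths with real ones
cfg = edict({
    "data_file": "/path/to/eval/part-00000-of-00500.tfrecord",                # DATA_FILE
    "finetune_ckpt": "/username/bert_thor/LOG0/checkpoint_bert-3_1000.ckpt",  # FINETUNE_CKPT
})
```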

Evaluation results are saved in the example path; you can find the following results in `./eval/infer.log`.
```shell
step: 1000 Accuracy: [0.27491578]
step: 2000 Accuracy: [0.69612586]
step: 3000 Accuracy: [0.71377236]
```

## Model Description

### Evaluation Performance

| Parameters                 | Ascend 910                             |
| -------------------------- | -------------------------------------- |
| Model Version              | BERT-LARGE                             |
| Resource                   | Ascend 910; CPU 2.60 GHz, 192 cores; memory 755 GB |
| Uploaded Date              | 2020-08-20                             |
| MindSpore Version          | 0.6.0-beta                             |
| Dataset                    | MLPerf v0.7                            |
| Training Parameters        | total steps = 3000, batch_size = 12    |
| Optimizer                  | THOR                                   |
| Loss Function              | Softmax Cross Entropy                  |
| Outputs                    | probability                            |
| Loss                       | 1.5654222                              |
| Speed                      | 275 ms/step                            |
| Total Time                 | 14 minutes                             |
| Parameters (M)             | 330                                    |
| Checkpoint for Fine-tuning | 4.5 GB (.ckpt file)                    |
| Scripts                    | https://gitee.com/mindspore/mindspore/tree/master/model_zoo/official/nlp/bert_thor |

## Description of Random Situation

In dataset.py, the seed is set inside the create_dataset function. A random seed is also used in run_pretrain.py.

## ModelZoo Homepage

Please check the official [homepage](https://gitee.com/mindspore/mindspore/tree/master/model_zoo).