bert can be used on ModelArts

2021-07-03 12:00:19 +08:00 · 2021-07-03 12:00:19 +08:00 · c9d5b13e37
parent 630defa9e7
commit c9d5b13e37
24 changed files with 1278 additions and 447 deletions
--- a/model_zoo/official/nlp/bert/README.md
+++ b/model_zoo/official/nlp/bert/README.md
@ -134,6 +134,56 @@ bash scripts/run_distributed_pretrain_for_gpu.sh 8 40 /path/cn-wiki-128
  bash scripts/run_squad.sh
 ```

+- Running on ModelArts
+
+If you want to run on ModelArts, please check the official [ModelArts documentation](https://support.huaweicloud.com/modelarts/). You can then start training as follows:
+
+- Pretraining with 8 cards on ModelArts
+
+  ```python
+  # (1) Upload the code folder to an S3 bucket.
+  # (2) Click "create training task" on the website UI.
+  # (3) Set the code directory to "/{path}/bert" on the website UI.
+  # (4) Set the startup file to "/{path}/bert/run_pretrain.py" on the website UI.
+  # (5) Perform a or b.
+  #     a. Set parameters in /{path}/bert/pretrain_config.yaml.
+  #         1. Set "enable_modelarts: True"
+  #         2. Set the other parameters; for their configuration, refer to `./scripts/run_distributed_pretrain_ascend.sh`
+  #     b. Add parameters on the website UI.
+  #         1. Add "enable_modelarts=True"
+  #         2. Add the other parameters; for their configuration, refer to `./scripts/run_distributed_pretrain_ascend.sh`
+  # (6) Upload the dataset to an S3 bucket.
+  # (7) Check "data storage location" on the website UI and set the "Dataset path" (this path must contain only the data or a zip package of the data).
+  # (8) Set the "Output file path" and "Job log path" to your own paths on the website UI.
+  # (9) Under "resource pool selection", select an 8-card specification.
+  # (10) Create your job.
+  # After training, the '*.ckpt' files are saved under the 'training output file path'.
+  ```
+
+- Running downstream tasks with a single card on ModelArts
+
+  ```python
+  # (1) Upload the code folder to an S3 bucket.
+  # (2) Click "create training task" on the website UI.
+  # (3) Set the code directory to "/{path}/bert" on the website UI.
+  # (4) Set the startup file to "/{path}/bert/run_ner.py" (or run_squad.py or run_classifier.py) on the website UI.
+  # (5) Perform a or b.
+  #     a. Set parameters in task_ner_config.yaml (or task_squad_config.yaml or task_classifier_config.yaml) under the folder `/{path}/bert/`.
+  #         1. Set "enable_modelarts: True"
+  #         2. Set the other parameters; for their configuration, refer to `run_ner.sh` (or run_squad.sh or run_classifier.sh) under the folder '{path}/bert/scripts/'.
+  #     b. Add parameters on the website UI.
+  #         1. Add "enable_modelarts=True"
+  #         2. Add the other parameters; for their configuration, refer to `run_ner.sh` (or run_squad.sh or run_classifier.sh) under the folder '{path}/bert/scripts/'.
+  #     Note that vocab_file_path, label_file_path, train_data_file_path, eval_data_file_path and schema_file_path must be given as paths relative to the dataset path selected in step 7.
+  #     Finally, "config_path=../../*.yaml" must be added on the web page (select the *.yaml configuration file according to the downstream task).
+  # (6) Upload the dataset to an S3 bucket.
+  # (7) Check "data storage location" on the website UI and set the "Dataset path" (this path must contain only the data or a zip package of the data).
+  # (8) Set the "Output file path" and "Job log path" to your own paths on the website UI.
+  # (9) Under "resource pool selection", select a single-card specification.
+  # (10) Create your job.
+  # After training, the '*.ckpt' files are saved under the 'training output file path'.
+  ```
+
 For distributed training on Ascend, an hccl configuration file with JSON format needs to be created in advance.

 For distributed training on single machine, [here](https://gitee.com/mindspore/mindspore/tree/master/config/hccl_single_machine_multi_rank.json) is an example hccl.json.
@ -205,7 +255,9 @@ For example, the schema file of cn-wiki-128 dataset for pretraining shows as fol
 ```shell
 .
 └─bert
+  ├─ascend310_infer
  ├─README.md
+  ├─README_CN.md
  ├─scripts
    ├─ascend_distributed_launcher
        ├─__init__.py
@ -220,6 +272,11 @@ For example, the schema file of cn-wiki-128 dataset for pretraining shows as fol
    ├─run_distributed_pretrain_gpu.sh         # shell script for distributed pretrain on gpu
    └─run_standaloned_pretrain_gpu.sh         # shell script for standalone pretrain on gpu
  ├─src
+    ├─model_utils
+      ├── config.py                           # parse *.yaml parameter configuration file
+      ├── device_adapter.py                   # distinguish local/ModelArts training
+      ├── local_adapter.py                    # get related environment variables in local training
+      └── moxing_adapter.py                   # get related environment variables in ModelArts training
    ├─__init__.py
    ├─assessment_method.py                    # assessment method for evaluation
    ├─bert_for_finetune.py                    # backbone code of network
@ -227,13 +284,15 @@ For example, the schema file of cn-wiki-128 dataset for pretraining shows as fol
    ├─bert_model.py                           # backbone code of network
    ├─finetune_data_preprocess.py             # data preprocessing
    ├─cluner_evaluation.py                    # evaluation for cluner
-    ├─config.py                               # parameter configuration for pretraining
    ├─CRF.py                                  # assessment method for clue dataset
    ├─dataset.py                              # data preprocessing
-    ├─finetune_eval_config.py                 # parameter configuration for finetuning
    ├─finetune_eval_model.py                  # backbone code of network
    ├─sample_process.py                       # sample processing
    ├─utils.py                                # util function
+  ├─pretrain_config.yaml                      # parameter configuration for pretrain
+  ├─task_ner_config.yaml                      # parameter configuration for downstream_task_ner
+  ├─task_classifier_config.yaml               # parameter configuration for downstream_task_classifier
+  ├─task_squad_config.yaml                    # parameter configuration for downstream_task_squad
  ├─pretrain_eval.py                          # train and eval net  
  ├─run_classifier.py                         # finetune and eval net for classifier task
  ├─run_ner.py                                # finetune and eval net for ner task
@ -591,8 +650,38 @@ The result will be as follows:

 ### [Export MindIR](#contents)

+- Export locally
+
 ```shell
-python export.py --ckpt_file [CKPT_PATH] --file_name [FILE_NAME] --file_format [FILE_FORMAT]
+python export.py --config_path [../../*.yaml] --ckpt_file [CKPT_PATH] --file_name [FILE_NAME] --file_format [FILE_FORMAT]
+```
+
+- Export on ModelArts (if you want to run on ModelArts, please check the official [ModelArts documentation](https://support.huaweicloud.com/modelarts/); you can then start as follows)
+
+```python
+# (1) Upload the code folder to an S3 bucket.
+# (2) Click "create training task" on the website UI.
+# (3) Set the code directory to "/{path}/bert" on the website UI.
+# (4) Set the startup file to "/{path}/bert/export.py" on the website UI.
+# (5) Perform a or b.
+#     a. Set parameters in task_ner_config.yaml (or task_squad_config.yaml or task_classifier_config.yaml) under the folder `/{path}/bert/`.
+#         1. Set "enable_modelarts: True"
+#         2. Set "export_ckpt_file: ./{path}/*.ckpt" ('export_ckpt_file' is the path of the weight file to be exported, relative to the file `export.py`; the weight file must be included in the code directory.)
+#         3. Set "export_file_name: bert_ner"
+#         4. Set "file_format: MINDIR"
+#         5. Set "label_file_path: {path}/*.txt" ('label_file_path' is a path relative to the folder selected in step 7.)
+#     b. Add parameters on the website UI.
+#         1. Add "enable_modelarts=True"
+#         2. Add "export_ckpt_file=./{path}/*.ckpt" ('export_ckpt_file' is the path of the weight file to be exported, relative to the file `export.py`; the weight file must be included in the code directory.)
+#         3. Add "export_file_name=bert_ner"
+#         4. Add "file_format=MINDIR"
+#         5. Add "label_file_path={path}/*.txt" ('label_file_path' is a path relative to the folder selected in step 7.)
+#     Finally, "config_path=../../*.yaml" must be added on the web page (select the *.yaml configuration file according to the downstream task).
+# (7) Check "data storage location" on the website UI and set the "Dataset path".
+# (8) Set the "Output file path" and "Job log path" to your own paths on the website UI.
+# (9) Under "resource pool selection", select a single-card specification.
+# (10) Create your job.
+# You will see bert_ner.mindir under {Output file path}.
 ```

 The ckpt_file parameter is required,
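Note on the relative-path rules in step 5 of the ModelArts export instructions: they can be illustrated with a small sketch. It assumes the joins mirror what `modelarts_pre_process` in `export.py` does with `os.path.join`. The helper name `resolve_export_paths`, the code directory `/home/work/bert`, and the checkpoint and label file names are hypothetical and only serve the illustration.

```python
import os
from types import SimpleNamespace


def resolve_export_paths(cfg, code_dir):
    """Return a copy of cfg with the relative paths resolved.

    export_ckpt_file and export_file_name are joined onto the code directory
    (where export.py lives); label_file_path is joined onto cfg.data_path.
    """
    resolved = SimpleNamespace(**vars(cfg))
    resolved.export_ckpt_file = os.path.join(code_dir, cfg.export_ckpt_file)
    resolved.export_file_name = os.path.join(code_dir, cfg.export_file_name)
    resolved.label_file_path = os.path.join(cfg.data_path, cfg.label_file_path)
    return resolved


cfg = SimpleNamespace(
    data_path="/cache/data",                  # default data_path in the YAML config
    export_ckpt_file="./ckpt/bert_ner.ckpt",  # hypothetical checkpoint inside the code folder
    export_file_name="bert_ner",
    label_file_path="labels/label.txt",       # hypothetical label file inside the dataset folder
)

resolved = resolve_export_paths(cfg, code_dir="/home/work/bert")  # hypothetical code dir
print(resolved.export_ckpt_file)
print(resolved.label_file_path)
```

One consequence of using `os.path.join`: if a configured value is already an absolute path, the base directory is discarded and the absolute path is used unchanged, so relative values are what the instructions expect.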
--- a/model_zoo/official/nlp/bert/README_CN.md
+++ b/model_zoo/official/nlp/bert/README_CN.md
@ -139,6 +139,54 @@ bash scripts/run_distributed_pretrain_for_gpu.sh 8 40 /path/cn-wiki-128
  bash scripts/run_squad.sh
 ```

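+- Running on ModelArts (if you want to run on ModelArts, see the [ModelArts documentation](https://support.huaweicloud.com/modelarts/))
+
+    - Pretraining with 8 cards on ModelArts
+
+    ```python
+    # (1) Upload your code to an S3 bucket
+    # (2) Create a training task on ModelArts
+    # (3) Select the code directory /{path}/bert
+    # (4) Select the startup file /{path}/bert/run_pretrain.py
+    # (5) Perform a or b
+    #     a. Set parameters in the file /{path}/bert/default_config.yaml
+    #         1. Set "enable_modelarts=True"
+    #         2. Set the other parameters; for their configuration, refer to `./scripts/run_distributed_pretrain_ascend.sh`
+    #     b. Set parameters on the web page
+    #         1. Add "run_distributed=True"
+    #         2. Add the other parameters; for their configuration, refer to `./scripts/run_distributed_pretrain_ascend.sh`
+    # (6) Upload your data to an S3 bucket
+    # (7) Check the data storage location on the web page and set the "training dataset" path
+    # (8) Set the "training output file path" and "job log path" on the web page
+    # (9) Under "resource pool selection" on the web page, select an 8-card specification
+    # (10) Create the training job
+    # After training, the trained weights are saved under the 'training output file path'
+    ```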
+
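+    - Running downstream tasks with a single card on ModelArts
+
+    ```python
+    # (1) Upload your code to an S3 bucket
+    # (2) Create a training task on ModelArts
+    # (3) Select the code directory /{path}/bert
+    # (4) Select the startup file /{path}/bert/run_ner.py (or run_squad.py or run_classifier.py)
+    # (5) Perform a or b
+    #     a. Set parameters in `task_ner_config.yaml` (or `task_squad_config.yaml` or `task_classifier_config.yaml`) under /path/bert
+    #         1. Set "enable_modelarts=True"
+    #         2. Set the other parameters; for their configuration, refer to `run_ner.sh`, `run_squad.sh` or `run_classifier.sh` under './scripts/'
+    #     b. Set parameters on the web page
+    #         1. Add "enable_modelarts=True"
+    #         2. Add the other parameters; for their configuration, refer to `run_ner.sh`, `run_squad.sh` or `run_classifier.sh` under './scripts/'
+    #     Note that vocab_file_path, label_file_path, train_data_file_path, eval_data_file_path and schema_file_path must be given as paths relative to the path selected in step 7.
+    #     Finally, "config_path=../../*.yaml" must be added on the web page (select the *.yaml configuration file according to the downstream task)
+    # (6) Upload your data to an S3 bucket
+    # (7) Check the data storage location on the web page and set the "training dataset" path (this path must contain only the data or a zip package of the data)
+    # (8) Set the "training output file path" and "job log path" on the web page
+    # (9) Under "resource pool selection" on the web page, select a single-card specification
+    # (10) Create the training job
+    # After training, the trained weights are saved under the 'training output file path'
+    ```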
+
 For distributed training on Ascend devices, create an HCCL configuration file in JSON format in advance.

 For single-machine distributed training on Ascend devices, refer to [here](https://gitee.com/mindspore/mindspore/tree/master/config/hccl_single_machine_multi_rank.json) to create the HCCL configuration file.
@ -207,12 +255,14 @@ For example, the schema file of cn-wiki-128 dataset for pretraining shows as fol
 ```shell
 .
 └─bert
+  ├─ascend310_infer
  ├─README.md
+  ├─README_CN.md
  ├─scripts
    ├─ascend_distributed_launcher
        ├─__init__.py
        ├─hyper_parameter_config.ini          # hyperparameters for distributed pretraining
-        ├─get_distribute_pretrain_cmd.py          # script for distributed pretraining
+        ├─get_distribute_pretrain_cmd.py      # script for distributed pretraining
        --README.md
    ├─run_classifier.sh                       # shell script for standalone classifier task on Ascend or GPU
    ├─run_ner.sh                              # shell script for standalone NER task on Ascend or GPU
@ -222,6 +272,11 @@ For example, the schema file of cn-wiki-128 dataset for pretraining shows as fol
    ├─run_distributed_pretrain_gpu.sh         # shell script for distributed pretraining on GPU
    └─run_standaloned_pretrain_gpu.sh         # shell script for standalone pretraining on GPU
  ├─src
+    ├─model_utils
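+      ├── config.py                           # parse *.yaml parameter configuration file
+      ├── device_adapter.py                   # distinguish local/ModelArts training
+      ├── local_adapter.py                    # get related environment variables in local training
+      └── moxing_adapter.py                   # get related environment variables and exchange data in ModelArts training
    ├─__init__.py
    ├─assessment_method.py                    # assessment method for evaluation
    ├─bert_for_finetune.py                    # backbone code of network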
@ -229,13 +284,15 @@ For example, the schema file of cn-wiki-128 dataset for pretraining shows as fol
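    ├─bert_model.py                           # backbone code of network
    ├─finetune_data_preprocess.py             # data preprocessing
    ├─cluner_evaluation.py                    # evaluation for cluner
-    ├─config.py                               # parameter configuration for pretraining
    ├─CRF.py                                  # assessment method for clue dataset
    ├─dataset.py                              # data preprocessing
-    ├─finetune_eval_config.py                 # parameter configuration for finetuning
    ├─finetune_eval_model.py                  # backbone code of network
    ├─sample_process.py                       # sample processing
    ├─utils.py                                # util function
+  ├─pretrain_config.yaml                      # parameter configuration for pretraining
+  ├─task_ner_config.yaml                      # parameter configuration for downstream task NER
+  ├─task_classifier_config.yaml               # parameter configuration for downstream task classifier
+  ├─task_squad_config.yaml                    # parameter configuration for downstream task SQuAD
  ├─pretrain_eval.py                          # train and eval net
  ├─run_classifier.py                         # finetune and eval net for classifier task
  ├─run_ner.py                                # finetune and eval net for NER task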
@ -556,8 +613,38 @@ bash scripts/squad.sh

 ## Export MindIR model

+- Export locally
+
 ```shell
-python export.py --ckpt_file [CKPT_PATH] --file_name [FILE_NAME] --file_format [FILE_FORMAT]
+python export.py --config_path [../../*.yaml] --ckpt_file [CKPT_PATH] --file_name [FILE_NAME] --file_format [FILE_FORMAT]
+```
+
+- Export on ModelArts
+
+```python
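+# (1) Upload your code to an S3 bucket
+# (2) Create a training task on ModelArts
+# (3) Select the code directory /{path}/bert
+# (4) Select the startup file /{path}/bert/export.py
+# (5) Perform a or b
+#     a. Set parameters in `task_ner_config.yaml` (or `task_squad_config.yaml` or `task_classifier_config.yaml`) under /path/bert
+#         1. Set "enable_modelarts: True"
+#         2. Set "export_ckpt_file: ./{path}/*.ckpt" ('export_ckpt_file' is the path of the '*.ckpt' weight file to be exported, relative to `export.py`; the weight file must be included in the code directory)
+#         3. Set "export_file_name: bert_ner"
+#         4. Set "file_format: MINDIR"
+#         5. Set "label_file_path: {path}/*.txt" ('label_file_path' is a path relative to the folder selected in step 7)
+#     b. Set parameters on the web page
+#         1. Add "enable_modelarts=True"
+#         2. Add "export_ckpt_file=./{path}/*.ckpt" ('export_ckpt_file' is the path of the '*.ckpt' weight file to be exported, relative to `export.py`; the weight file must be included in the code directory)
+#         3. Add "export_file_name=bert_ner"
+#         4. Add "file_format=MINDIR"
+#         5. Add "label_file_path={path}/*.txt" ('label_file_path' is a path relative to the folder selected in step 7)
+#     Finally, "config_path=../../*.yaml" must be added on the web page (select the *.yaml configuration file according to the downstream task)
+# (7) Check the data storage location on the web page and set the "training dataset" path
+# (8) Set the "training output file path" and "job log path" on the web page
+# (9) Under "resource pool selection" on the web page, select a single-card specification
+# (10) Create the training job
+# You will see the file 'bert_ner.mindir' under {Output file path}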
 ```

 The `ckpt_file` parameter is required, and `EXPORT_FORMAT` must be chosen from ["AIR", "MINDIR"].
--- a/model_zoo/official/nlp/bert/export.py
+++ b/model_zoo/official/nlp/bert/export.py
@ -13,74 +13,77 @@
 # limitations under the License.
 # ============================================================================
 """export checkpoint file into models"""
-import argparse
+import os
 import numpy as np

 import mindspore.common.dtype as mstype
 from mindspore import Tensor, context, load_checkpoint, export

 from src.finetune_eval_model import BertCLSModel, BertSquadModel, BertNERModel
-from src.finetune_eval_config import bert_net_cfg
 from src.bert_for_finetune import BertNER
 from src.utils import convert_labels_to_index
+from src.model_utils.config import config as args, bert_net_cfg
+from src.model_utils.moxing_adapter import moxing_wrapper
+from src.model_utils.device_adapter import get_device_id

-parser = argparse.ArgumentParser(description="Bert export")
-parser.add_argument("--device_id", type=int, default=0, help="Device id")
-parser.add_argument("--use_crf", type=str, default="false", help="Use cfg, default is false.")
-parser.add_argument("--downstream_task", type=str, choices=["NER", "CLS", "SQUAD"], default="NER",
-                    help="at present，support NER only")
-parser.add_argument("--batch_size", type=int, default=1, help="batch size")
-parser.add_argument("--label_file_path", type=str, default="", help="label file path, used in clue benchmark.")
-parser.add_argument("--ckpt_file", type=str, required=True, help="Bert ckpt file.")
-parser.add_argument("--file_name", type=str, default="Bert", help="bert output air name.")
-parser.add_argument("--file_format", type=str, choices=["AIR", "ONNX", "MINDIR"], default="AIR", help="file format")
-parser.add_argument("--device_target", type=str, default="Ascend",
-                    choices=["Ascend", "GPU", "CPU"], help="device target (default: Ascend)")
-args = parser.parse_args()

-context.set_context(mode=context.GRAPH_MODE, device_target=args.device_target)
-if args.device_target == "Ascend":
-    context.set_context(device_id=args.device_id)
+def modelarts_pre_process():
+    '''modelarts pre process function.'''
+    args.device_id = get_device_id()
+    _file_dir = os.path.dirname(os.path.abspath(__file__))
+    args.export_ckpt_file = os.path.join(_file_dir, args.export_ckpt_file)
+    args.label_file_path = os.path.join(args.data_path, args.label_file_path)
+    args.export_file_name = os.path.join(_file_dir, args.export_file_name)

-label_list = []
-with open(args.label_file_path) as f:
-    for label in f:
-        label_list.append(label.strip())

-tag_to_index = convert_labels_to_index(label_list)
+@moxing_wrapper(pre_process=modelarts_pre_process)
+def run_export():
+    '''export function'''
+    context.set_context(mode=context.GRAPH_MODE, device_target=args.device_target)
+    if args.device_target == "Ascend":
+        context.set_context(device_id=args.device_id)

-if args.use_crf.lower() == "true":
-    max_val = max(tag_to_index.values())
-    tag_to_index["<START>"] = max_val + 1
-    tag_to_index["<STOP>"] = max_val + 2
-    number_labels = len(tag_to_index)
-else:
-    number_labels = len(tag_to_index)
+    label_list = []
+    with open(args.label_file_path) as f:
+        for label in f:
+            label_list.append(label.strip())

-if __name__ == "__main__":
-    if args.downstream_task == "NER":
+    tag_to_index = convert_labels_to_index(label_list)
+
+    if args.use_crf.lower() == "true":
+        max_val = max(tag_to_index.values())
+        tag_to_index["<START>"] = max_val + 1
+        tag_to_index["<STOP>"] = max_val + 2
+        number_labels = len(tag_to_index)
+    else:
+        number_labels = len(tag_to_index)
+    if args.description == "run_ner":
        if args.use_crf.lower() == "true":
-            net = BertNER(bert_net_cfg, args.batch_size, False, num_labels=number_labels,
+            net = BertNER(bert_net_cfg, args.export_batch_size, False, num_labels=number_labels,
                          use_crf=True, tag_to_index=tag_to_index)
        else:
            net = BertNERModel(bert_net_cfg, False, number_labels, use_crf=(args.use_crf.lower() == "true"))
-    elif args.downstream_task == "CLS":
+    elif args.description == "run_classifier":
        net = BertCLSModel(bert_net_cfg, False, num_labels=number_labels)
-    elif args.downstream_task == "SQUAD":
+    elif args.description == "run_squad":
        net = BertSquadModel(bert_net_cfg, False)
    else:
        raise ValueError("unsupported downstream task")

-    load_checkpoint(args.ckpt_file, net=net)
+    load_checkpoint(args.export_ckpt_file, net=net)
    net.set_train(False)

-    input_ids = Tensor(np.zeros([args.batch_size, bert_net_cfg.seq_length]), mstype.int32)
-    input_mask = Tensor(np.zeros([args.batch_size, bert_net_cfg.seq_length]), mstype.int32)
-    token_type_id = Tensor(np.zeros([args.batch_size, bert_net_cfg.seq_length]), mstype.int32)
-    label_ids = Tensor(np.zeros([args.batch_size, bert_net_cfg.seq_length]), mstype.int32)
+    input_ids = Tensor(np.zeros([args.export_batch_size, bert_net_cfg.seq_length]), mstype.int32)
+    input_mask = Tensor(np.zeros([args.export_batch_size, bert_net_cfg.seq_length]), mstype.int32)
+    token_type_id = Tensor(np.zeros([args.export_batch_size, bert_net_cfg.seq_length]), mstype.int32)
+    label_ids = Tensor(np.zeros([args.export_batch_size, bert_net_cfg.seq_length]), mstype.int32)

-    if args.downstream_task == "NER" and args.use_crf.lower() == "true":
+    if args.description == "run_ner" and args.use_crf.lower() == "true":
        input_data = [input_ids, input_mask, token_type_id, label_ids]
    else:
        input_data = [input_ids, input_mask, token_type_id]
-    export(net, *input_data, file_name=args.file_name, file_format=args.file_format)
+    export(net, *input_data, file_name=args.export_file_name, file_format=args.file_format)
+
+
+if __name__ == "__main__":
+    run_export()
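Note on `@moxing_wrapper(pre_process=modelarts_pre_process)`: the decorator comes from `src/model_utils/moxing_adapter.py`, whose body is not part of this excerpt. The sketch below only illustrates the general pattern the call site suggests, namely running a hook before the wrapped function when `enable_modelarts` is set. The name `platform_wrapper`, the `post_process` argument, and the stand-in `config` object are assumptions, and whatever else the real wrapper does is deliberately left out.

```python
import functools
from types import SimpleNamespace

# Stand-in for the parsed YAML config; only the flag used here is modelled.
config = SimpleNamespace(enable_modelarts=False)


def platform_wrapper(pre_process=None, post_process=None):
    """Hypothetical stand-in for moxing_wrapper: run hooks only on ModelArts."""
    def decorator(run_func):
        @functools.wraps(run_func)
        def wrapped(*args, **kwargs):
            # The flag is read at call time, so changing the config later takes effect.
            if config.enable_modelarts and pre_process is not None:
                pre_process()
            result = run_func(*args, **kwargs)
            if config.enable_modelarts and post_process is not None:
                post_process()
            return result
        return wrapped
    return decorator


calls = []


def fake_pre_process():
    calls.append("pre_process")


@platform_wrapper(pre_process=fake_pre_process)
def run_export():
    calls.append("run_export")
    return "done"


run_export()                     # local run: the hook is skipped
config.enable_modelarts = True
run_export()                     # ModelArts run: the hook executes first
print(calls)                     # ['run_export', 'pre_process', 'run_export']
```

Keeping the platform check inside the wrapper means the same entry point (`run_export()` under `if __name__ == "__main__":`) serves both local and ModelArts runs, which matches how the diff restructures `export.py`.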
--- a/model_zoo/official/nlp/bert/postprocess.py
+++ b/model_zoo/official/nlp/bert/postprocess.py
@ -21,7 +21,7 @@ import os
 import argparse
 import numpy as np
 from mindspore import Tensor
-from src.finetune_eval_config import bert_net_cfg
+from src.model_utils.config import bert_net_cfg
 from src.assessment_method import Accuracy, F1, MCC, Spearman_Correlation
 from run_ner import eval_result_print

--- a/model_zoo/official/nlp/bert/pretrain_config.yaml
+++ b/model_zoo/official/nlp/bert/pretrain_config.yaml
@ -0,0 +1,174 @@
+# Builtin Configurations(DO NOT CHANGE THESE CONFIGURATIONS unless you know exactly what you are doing)
+enable_modelarts: False
+# Url for modelarts
+data_url: ""
+train_url: ""
+checkpoint_url: ""
+# Path for local
+data_path: "/cache/data"
+output_path: "/cache/train"
+load_path: "/cache/checkpoint_path"
+device_target: "Ascend"
+enable_profiling: False
+
+# ==============================================================================
+description: 'run_pretrain'
+distribute: 'false'
+epoch_size: 1
+device_id: 0
+device_num: 1
+enable_save_ckpt: 'true'
+enable_lossscale: 'true'
+do_shuffle: 'true'
+enable_data_sink: 'true'
+data_sink_steps: 1
+accumulation_steps: 1
+allreduce_post_accumulation: 'true'
+save_checkpoint_path: ''
+load_checkpoint_path: ''
+save_checkpoint_steps: 1000
+train_steps: -1
+save_checkpoint_num: 1
+data_dir: ''
+schema_dir: ''
+
+# ==============================================================================
+# pretrain related
+batch_size: 32
+bert_network: 'base'
+loss_scale_value: 65536
+scale_factor: 2
+scale_window: 1000
+optimizer: 'Lamb'
+enable_global_norm: False
+# pretrain_eval related
+data_file: ""
+schema_file: ""
+finetune_ckpt: ""
+# optimizer related
+AdamWeightDecay:
+    learning_rate: 0.00003  # 3e-5
+    end_learning_rate: 0.0
+    power: 5.0
+    weight_decay: 0.00001  # 1e-5
+    decay_filter: ['layernorm', 'bias']
+    eps: 0.000001  # 1e-6
+    warmup_steps: 10000
+
+Lamb:
+    learning_rate: 0.0003  # 3e-4
+    end_learning_rate: 0.0
+    power: 2.0
+    warmup_steps: 10000
+    weight_decay: 0.01
+    decay_filter: ['layernorm', 'bias']
+    eps: 0.00000001  # 1e-8
+
+Momentum:
+    learning_rate: 0.00002  # 2e-5
+    momentum: 0.9
+
+Thor:
+    lr_max: 0.0034
+    lr_min: 0.00003244  # 3.244e-5
+    lr_power: 1.0
+    lr_total_steps: 30000
+    damping_max: 0.05  # 5e-2
+    damping_min: 0.000001  # 1e-6
+    damping_power: 1.0
+    damping_total_steps: 30000
+    momentum: 0.9
+    weight_decay: 0.0005  # 5e-4
+    loss_scale: 1.0
+    frequency: 100
+# ==============================================================================
+# base
+base_batch_size: 256
+base_net_cfg:
+    seq_length: 128
+    vocab_size: 21128
+    hidden_size: 768
+    num_hidden_layers: 12
+    num_attention_heads: 12
+    intermediate_size: 3072
+    hidden_act: "gelu"
+    hidden_dropout_prob: 0.1
+    attention_probs_dropout_prob: 0.1
+    max_position_embeddings: 512
+    type_vocab_size: 2
+    initializer_range: 0.02
+    use_relative_positions: False
+    dtype: mstype.float32
+    compute_type: mstype.float16
+# nezha
+nezha_batch_size: 96
+nezha_net_cfg:
+    seq_length: 128
+    vocab_size: 21128
+    hidden_size: 1024
+    num_hidden_layers: 24
+    num_attention_heads: 16
+    intermediate_size: 4096
+    hidden_act: "gelu"
+    hidden_dropout_prob: 0.1
+    attention_probs_dropout_prob: 0.1
+    max_position_embeddings: 512
+    type_vocab_size: 2
+    initializer_range: 0.02
+    use_relative_positions: True
+    dtype: mstype.float32
+    compute_type: mstype.float16
+# large
+large_batch_size: 24
+large_net_cfg:
+    seq_length: 512
+    vocab_size: 30522
+    hidden_size: 1024
+    num_hidden_layers: 24
+    num_attention_heads: 16
+    intermediate_size: 4096
+    hidden_act: "gelu"
+    hidden_dropout_prob: 0.1
+    attention_probs_dropout_prob: 0.1
+    max_position_embeddings: 512
+    type_vocab_size: 2
+    initializer_range: 0.02
+    use_relative_positions: False
+    dtype: mstype.float32
+    compute_type: mstype.float16
+
+---
+# Help description for each configuration
+enable_modelarts: "Whether training on modelarts, default: False"
+data_url: "Url for modelarts"
+train_url: "Url for modelarts"
+data_path: "The location of the input data."
+output_path: "The location of the output file."
+device_target: "Running platform, choose from Ascend or GPU, and default is Ascend."
+enable_profiling: 'Whether enable profiling while training, default: False'
+
+distribute: "Run distribute, default is 'false'."
+epoch_size: "Epoch size, default is 1."
+enable_save_ckpt: "Enable save checkpoint, default is true."
+enable_lossscale: "Use lossscale or not, default is true."
+do_shuffle: "Enable shuffle for dataset, default is true."
+enable_data_sink: "Enable data sink, default is true."
+data_sink_steps: "Sink steps for each epoch, default is 1."
+accumulation_steps: "Accumulating gradients N times before weight update, default is 1."
+allreduce_post_accumulation: "Whether to allreduce after accumulation of N steps or after each step, default is true."
+save_checkpoint_path: "Save checkpoint path"
+load_checkpoint_path: "Load checkpoint file path"
+save_checkpoint_steps: "Save checkpoint steps, default is 1000"
+train_steps: "Training Steps, default is -1, meaning run all steps according to epoch number."
+save_checkpoint_num: "Save checkpoint numbers, default is 1."
+data_dir: "Data path, it is better to use absolute path"
+schema_dir: "Schema path, it is better to use absolute path"
+---
+# choices
+device_target: ['Ascend', 'GPU']
+distribute: ["true", "false"]
+enable_save_ckpt: ["true", "false"]
+enable_lossscale: ["true", "false"]
+do_shuffle: ["true", "false"]
+enable_data_sink: ["true", "false"]
+allreduce_post_accumulation: ["true", "false"]
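Note on the layout of `pretrain_config.yaml`: the file holds three `---`-separated sections (default values, help strings, allowed choices). `src/model_utils/config.py` is described as the parser for these files but is not shown here, so the following is only a sketch of how three such sections could be turned into command-line options with `argparse`. The dictionaries are hand-written stand-ins for the loaded YAML sections, and `build_parser` is a hypothetical helper name.

```python
import argparse

# Hand-written stand-ins for the three YAML sections (values, help, choices).
defaults = {"device_target": "Ascend", "epoch_size": 1, "distribute": "false"}
helps = {
    "device_target": "Running platform, default is Ascend.",
    "epoch_size": "Epoch size, default is 1.",
}
choices = {"device_target": ["Ascend", "GPU"], "distribute": ["true", "false"]}


def build_parser(defaults, helps, choices):
    """Create one --option per default value, typed after the default."""
    parser = argparse.ArgumentParser(description="config sketch")
    for key, value in defaults.items():
        # Note: type(value) works for str/int/float; bool defaults would need
        # a dedicated converter because bool("False") is True.
        parser.add_argument(
            "--" + key,
            type=type(value),
            default=value,
            choices=choices.get(key),
            help=helps.get(key, "Please refer to the YAML file."),
        )
    return parser


parser = build_parser(defaults, helps, choices)
args = parser.parse_args(["--device_target", "GPU", "--epoch_size", "3"])
print(args.device_target, args.epoch_size, args.distribute)  # GPU 3 false
```

With this shape, values given on the command line or on the ModelArts web page override the YAML defaults, and the choices section rejects unsupported values such as an unknown `device_target`.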
--- a/model_zoo/official/nlp/bert/pretrain_eval.py
+++ b/model_zoo/official/nlp/bert/pretrain_eval.py
@ -19,7 +19,7 @@ Bert evaluation script.

 import os
 from src import BertModel, GetMaskedLMOutput
-from src.config import cfg, bert_net_cfg
+from src.model_utils.config import config as cfg, bert_net_cfg
 import mindspore.common.dtype as mstype
 from mindspore import context
 from mindspore.common.tensor import Tensor
--- a/model_zoo/official/nlp/bert/run_classifier.py
+++ b/model_zoo/official/nlp/bert/run_classifier.py
@ -18,12 +18,6 @@ Bert finetune and evaluation script.
 '''

 import os
-import argparse
-from src.bert_for_finetune import BertFinetuneCell, BertCLS
-from src.finetune_eval_config import optimizer_cfg, bert_net_cfg
-from src.dataset import create_classification_dataset
-from src.assessment_method import Accuracy, F1, MCC, Spearman_Correlation
-from src.utils import make_directory, LossCallBack, LoadNewestCkpt, BertLearningRate
 import mindspore.common.dtype as mstype
 from mindspore import context
 from mindspore import log as logger
@ -33,8 +27,17 @@ from mindspore.train.model import Model
 from mindspore.train.callback import CheckpointConfig, ModelCheckpoint, TimeMonitor
 from mindspore.train.serialization import load_checkpoint, load_param_into_net

+from src.bert_for_finetune import BertFinetuneCell, BertCLS
+from src.dataset import create_classification_dataset
+from src.assessment_method import Accuracy, F1, MCC, Spearman_Correlation
+from src.utils import make_directory, LossCallBack, LoadNewestCkpt, BertLearningRate
+from src.model_utils.config import config as args_opt, optimizer_cfg, bert_net_cfg
+from src.model_utils.moxing_adapter import moxing_wrapper
+from src.model_utils.device_adapter import get_device_id
+
 _cur_dir = os.getcwd()

+
 def do_train(dataset=None, network=None, load_checkpoint_path="", save_checkpoint_path="", epoch_num=1):
    """ do train """
    if load_checkpoint_path == "":
@ -81,6 +84,7 @@ def do_train(dataset=None, network=None, load_checkpoint_path="", save_checkpoin
    callbacks = [TimeMonitor(dataset.get_dataset_size()), LossCallBack(dataset.get_dataset_size()), ckpoint_cb]
    model.train(epoch_num, dataset, callbacks=callbacks)

+
 def eval_result_print(assessment_method="accuracy", callback=None):
    """ print eval result """
    if assessment_method == "accuracy":
@ -97,6 +101,7 @@ def eval_result_print(assessment_method="accuracy", callback=None):
    else:
        raise ValueError("Assessment method not supported, support: [accuracy, f1, mcc, spearman_correlation]")

+
 def do_eval(dataset=None, network=None, num_class=2, assessment_method="accuracy", load_checkpoint_path=""):
    """ do eval """
    if load_checkpoint_path == "":
@ -130,51 +135,34 @@ def do_eval(dataset=None, network=None, num_class=2, assessment_method="accuracy
    eval_result_print(assessment_method, callback)
    print("==============================================================")

+
+def modelarts_pre_process():
+    '''modelarts pre process function.'''
+    args_opt.device_id = get_device_id()
+    _file_dir = os.path.dirname(os.path.abspath(__file__))
+    args_opt.load_pretrain_checkpoint_path = os.path.join(_file_dir, args_opt.load_pretrain_checkpoint_path)
+    args_opt.load_finetune_checkpoint_path = os.path.join(args_opt.output_path, args_opt.load_finetune_checkpoint_path)
+    args_opt.save_finetune_checkpoint_path = os.path.join(args_opt.output_path, args_opt.save_finetune_checkpoint_path)
+    if args_opt.schema_file_path:
+        args_opt.schema_file_path = os.path.join(args_opt.data_path, args_opt.schema_file_path)
+    args_opt.train_data_file_path = os.path.join(args_opt.data_path, args_opt.train_data_file_path)
+    args_opt.eval_data_file_path = os.path.join(args_opt.data_path, args_opt.eval_data_file_path)
+
+
+@moxing_wrapper(pre_process=modelarts_pre_process)
 def run_classifier():
    """run classifier task"""
-    parser = argparse.ArgumentParser(description="run classifier")
-    parser.add_argument("--device_target", type=str, default="Ascend", choices=["Ascend", "GPU"],
-                        help="Device type, default is Ascend")
-    parser.add_argument("--assessment_method", type=str, default="Accuracy",
-                        choices=["Mcc", "Spearman_correlation", "Accuracy", "F1"],
-                        help="assessment_method including [Mcc, Spearman_correlation, Accuracy, F1],\
-                             default is Accuracy")
-    parser.add_argument("--do_train", type=str, default="false", choices=["true", "false"],
-                        help="Enable train, default is false")
-    parser.add_argument("--do_eval", type=str, default="false", choices=["true", "false"],
-                        help="Enable eval, default is false")
-    parser.add_argument("--device_id", type=int, default=0, help="Device id, default is 0.")
-    parser.add_argument("--epoch_num", type=int, default=3, help="Epoch number, default is 3.")
-    parser.add_argument("--num_class", type=int, default=2, help="The number of class, default is 2.")
-    parser.add_argument("--train_data_shuffle", type=str, default="true", choices=["true", "false"],
-                        help="Enable train data shuffle, default is true")
-    parser.add_argument("--eval_data_shuffle", type=str, default="false", choices=["true", "false"],
-                        help="Enable eval data shuffle, default is false")
-    parser.add_argument("--train_batch_size", type=int, default=32, help="Train batch size, default is 32")
-    parser.add_argument("--eval_batch_size", type=int, default=1, help="Eval batch size, default is 1")
-    parser.add_argument("--save_finetune_checkpoint_path", type=str, default="", help="Save checkpoint path")
-    parser.add_argument("--load_pretrain_checkpoint_path", type=str, default="", help="Load checkpoint file path")
-    parser.add_argument("--load_finetune_checkpoint_path", type=str, default="", help="Load checkpoint file path")
-    parser.add_argument("--train_data_file_path", type=str, default="",
-                        help="Data path, it is better to use absolute path")
-    parser.add_argument("--eval_data_file_path", type=str, default="",
-                        help="Data path, it is better to use absolute path")
-    parser.add_argument("--schema_file_path", type=str, default="",
-                        help="Schema path, it is better to use absolute path")
-    args_opt = parser.parse_args()
-    epoch_num = args_opt.epoch_num
-    assessment_method = args_opt.assessment_method.lower()
-    load_pretrain_checkpoint_path = args_opt.load_pretrain_checkpoint_path
-    save_finetune_checkpoint_path = args_opt.save_finetune_checkpoint_path
-    load_finetune_checkpoint_path = args_opt.load_finetune_checkpoint_path
-
    if args_opt.do_train.lower() == "false" and args_opt.do_eval.lower() == "false":
        raise ValueError("At least one of 'do_train' or 'do_eval' must be true")
    if args_opt.do_train.lower() == "true" and args_opt.train_data_file_path == "":
        raise ValueError("'train_data_file_path' must be set when do finetune task")
    if args_opt.do_eval.lower() == "true" and args_opt.eval_data_file_path == "":
        raise ValueError("'eval_data_file_path' must be set when do evaluation task")
-
+    epoch_num = args_opt.epoch_num
+    assessment_method = args_opt.assessment_method.lower()
+    load_pretrain_checkpoint_path = args_opt.load_pretrain_checkpoint_path
+    save_finetune_checkpoint_path = args_opt.save_finetune_checkpoint_path
+    load_finetune_checkpoint_path = args_opt.load_finetune_checkpoint_path
    target = args_opt.device_target
    if target == "Ascend":
        context.set_context(mode=context.GRAPH_MODE, device_target="Ascend", device_id=args_opt.device_id)
@@ -214,5 +202,6 @@ def run_classifier():
                                           do_shuffle=(args_opt.eval_data_shuffle.lower() == "true"))
        do_eval(ds, BertCLS, args_opt.num_class, assessment_method, load_finetune_checkpoint_path)

+
 if __name__ == "__main__":
    run_classifier()
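
The `@moxing_wrapper(pre_process=modelarts_pre_process)` decorator added above is imported from `src/model_utils/moxing_adapter.py`, which is not shown in this part of the diff. The following is only a minimal sketch of how such a decorator could behave, assuming it runs the hook before the entry point when `enable_modelarts` is set. The names `moxing_wrapper_sketch`, `config.calls` and the `post_process` hook are hypothetical and exist only for illustration.

```python
import functools
from types import SimpleNamespace

# Hypothetical stand-in for the shared config object (`args_opt` in the scripts).
config = SimpleNamespace(enable_modelarts=True, calls=[])


def moxing_wrapper_sketch(pre_process=None, post_process=None):
    """Illustrative decorator: run optional hooks around the wrapped entry point."""
    def decorator(run_func):
        @functools.wraps(run_func)
        def wrapped(*args, **kwargs):
            # Assumption: the hooks only fire when the job runs on ModelArts.
            if config.enable_modelarts and pre_process is not None:
                pre_process()
            result = run_func(*args, **kwargs)
            if config.enable_modelarts and post_process is not None:
                post_process()
            return result
        return wrapped
    return decorator


def modelarts_pre_process():
    # The real hook rewrites paths on the config; here we only record the call.
    config.calls.append("pre_process")


@moxing_wrapper_sketch(pre_process=modelarts_pre_process)
def run_classifier():
    config.calls.append("run_classifier")
    return "done"
```

Under this assumption the hook runs first, so any path it rewrites on the config is already in place when the body of `run_classifier()` reads it.
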
--- a/model_zoo/official/nlp/bert/run_ner.py
+++ b/model_zoo/official/nlp/bert/run_ner.py
@@ -18,13 +18,7 @@ Bert finetune and evaluation script.
 '''

 import os
-import argparse
 import time
-from src.bert_for_finetune import BertFinetuneCell, BertNER
-from src.finetune_eval_config import optimizer_cfg, bert_net_cfg
-from src.dataset import create_ner_dataset
-from src.utils import make_directory, LossCallBack, LoadNewestCkpt, BertLearningRate, convert_labels_to_index
-from src.assessment_method import Accuracy, F1, MCC, Spearman_Correlation
 import mindspore.common.dtype as mstype
 from mindspore import context
 from mindspore import log as logger
@@ -34,6 +28,13 @@ from mindspore.train.model import Model
 from mindspore.train.callback import CheckpointConfig, ModelCheckpoint, TimeMonitor
 from mindspore.train.serialization import load_checkpoint, load_param_into_net

+from src.bert_for_finetune import BertFinetuneCell, BertNER
+from src.dataset import create_ner_dataset
+from src.utils import make_directory, LossCallBack, LoadNewestCkpt, BertLearningRate, convert_labels_to_index
+from src.assessment_method import Accuracy, F1, MCC, Spearman_Correlation
+from src.model_utils.config import config as args_opt, optimizer_cfg, bert_net_cfg
+from src.model_utils.moxing_adapter import moxing_wrapper
+from src.model_utils.device_adapter import get_device_id
 _cur_dir = os.getcwd()


@@ -85,6 +86,7 @@ def do_train(dataset=None, network=None, load_checkpoint_path="", save_checkpoin
    train_end = time.time()
    print("latency: {:.6f} s".format(train_end - train_begin))

+
 def eval_result_print(assessment_method="accuracy", callback=None):
    """print eval result"""
    if assessment_method == "accuracy":
@@ -103,6 +105,7 @@ def eval_result_print(assessment_method="accuracy", callback=None):
    else:
        raise ValueError("Assessment method not supported, support: [accuracy, f1, mcc, spearman_correlation]")

+
 def do_eval(dataset=None, network=None, use_crf="", num_class=41, assessment_method="accuracy", data_file="",
            load_checkpoint_path="", vocab_file="", label_file="", tag_to_index=None, batch_size=1):
    """ do eval """
@@ -146,41 +149,22 @@ def do_eval(dataset=None, network=None, use_crf="", num_class=41, assessment_met
        print("==============================================================")


-def parse_args():
-    """set and check parameters."""
-    parser = argparse.ArgumentParser(description="run ner")
-    parser.add_argument("--device_target", type=str, default="Ascend", choices=["Ascend", "GPU"],
-                        help="Device type, default is Ascend")
-    parser.add_argument("--assessment_method", type=str, default="BF1", choices=["BF1", "clue_benchmark", "MF1"],
-                        help="assessment_method include: [BF1, clue_benchmark, MF1], default is BF1")
-    parser.add_argument("--do_train", type=str, default="false", choices=["true", "false"],
-                        help="Eable train, default is false")
-    parser.add_argument("--do_eval", type=str, default="false", choices=["true", "false"],
-                        help="Eable eval, default is false")
-    parser.add_argument("--use_crf", type=str, default="false", choices=["true", "false"],
-                        help="Use crf, default is false")
-    parser.add_argument("--device_id", type=int, default=0, help="Device id, default is 0.")
-    parser.add_argument("--epoch_num", type=int, default=5, help="Epoch number, default is 5.")
-    parser.add_argument("--train_data_shuffle", type=str, default="true", choices=["true", "false"],
-                        help="Enable train data shuffle, default is true")
-    parser.add_argument("--eval_data_shuffle", type=str, default="false", choices=["true", "false"],
-                        help="Enable eval data shuffle, default is false")
-    parser.add_argument("--train_batch_size", type=int, default=32, help="Train batch size, default is 32")
-    parser.add_argument("--eval_batch_size", type=int, default=1, help="Eval batch size, default is 1")
-    parser.add_argument("--vocab_file_path", type=str, default="", help="Vocab file path, used in clue benchmark")
-    parser.add_argument("--label_file_path", type=str, default="", help="label file path, used in clue benchmark")
-    parser.add_argument("--save_finetune_checkpoint_path", type=str, default="", help="Save checkpoint path")
-    parser.add_argument("--load_pretrain_checkpoint_path", type=str, default="", help="Load checkpoint file path")
-    parser.add_argument("--load_finetune_checkpoint_path", type=str, default="", help="Load checkpoint file path")
-    parser.add_argument("--train_data_file_path", type=str, default="",
-                        help="Data path, it is better to use absolute path")
-    parser.add_argument("--eval_data_file_path", type=str, default="",
-                        help="Data path, it is better to use absolute path")
-    parser.add_argument("--dataset_format", type=str, default="mindrecord", choices=["mindrecord", "tfrecord"],
-                        help="Dataset format, support mindrecord or tfrecord")
-    parser.add_argument("--schema_file_path", type=str, default="",
-                        help="Schema path, it is better to use absolute path")
-    args_opt = parser.parse_args()
+def modelarts_pre_process():
+    '''Pre-processing hook for ModelArts: resolve the device id and the data/output paths.'''
+    args_opt.device_id = get_device_id()
+    _file_dir = os.path.dirname(os.path.abspath(__file__))
+    args_opt.load_pretrain_checkpoint_path = os.path.join(_file_dir, args_opt.load_pretrain_checkpoint_path)
+    args_opt.load_finetune_checkpoint_path = os.path.join(args_opt.output_path, args_opt.load_finetune_checkpoint_path)
+    args_opt.save_finetune_checkpoint_path = os.path.join(args_opt.output_path, args_opt.save_finetune_checkpoint_path)
+    if args_opt.schema_file_path:
+        args_opt.schema_file_path = os.path.join(args_opt.data_path, args_opt.schema_file_path)
+    args_opt.train_data_file_path = os.path.join(args_opt.data_path, args_opt.train_data_file_path)
+    args_opt.eval_data_file_path = os.path.join(args_opt.data_path, args_opt.eval_data_file_path)
+    args_opt.label_file_path = os.path.join(args_opt.data_path, args_opt.label_file_path)
+
+
+def determine_params():
+    """Determine whether the parameters are reasonable."""
    if args_opt.do_train.lower() == "false" and args_opt.do_eval.lower() == "false":
        raise ValueError("At least one of 'do_train' or 'do_eval' must be true")
    if args_opt.do_train.lower() == "true" and args_opt.train_data_file_path == "":
@@ -193,14 +177,14 @@ def parse_args():
        raise ValueError("'label_file_path' must be set to use crf")
    if args_opt.assessment_method.lower() == "clue_benchmark" and args_opt.label_file_path == "":
        raise ValueError("'label_file_path' must be set to do clue benchmark")
-    if args_opt.assessment_method.lower() == "clue_benchmark":
-        args_opt.eval_batch_size = 1
-    return args_opt


+@moxing_wrapper(pre_process=modelarts_pre_process)
 def run_ner():
    """run ner task"""
-    args_opt = parse_args()
+    determine_params()
+    if args_opt.assessment_method.lower() == "clue_benchmark":
+        args_opt.eval_batch_size = 1
    epoch_num = args_opt.epoch_num
    assessment_method = args_opt.assessment_method.lower()
    load_pretrain_checkpoint_path = args_opt.load_pretrain_checkpoint_path
@@ -262,5 +246,6 @@ def run_ner():
                args_opt.eval_data_file_path, load_finetune_checkpoint_path, args_opt.vocab_file_path,
                args_opt.label_file_path, tag_to_index, args_opt.eval_batch_size)

+
 if __name__ == "__main__":
    run_ner()
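
The `modelarts_pre_process()` hooks in this commit build their paths with `os.path.join`, and only `schema_file_path` is guarded by an emptiness check. The sketch below shows how `join` treats relative, absolute and empty second arguments. It uses `posixpath` (the POSIX flavour of `os.path`) so the results do not depend on the platform. The directory `/cache/data` and the helper `resolve_if_set` are hypothetical.

```python
import posixpath  # POSIX flavour of os.path, so results are identical on every platform

data_path = "/cache/data"  # hypothetical value of args_opt.data_path

# A relative value is appended to the base directory.
relative = posixpath.join(data_path, "ner/train.tf_record")

# An absolute value discards the base directory entirely.
absolute = posixpath.join(data_path, "/home/user/train.tf_record")

# An empty value yields the base directory with a trailing slash,
# so the result is no longer an empty string.
empty = posixpath.join(data_path, "")


def resolve_if_set(base, value):
    """Hypothetical helper: join only when a value was configured."""
    return posixpath.join(base, value) if value else value


print(relative)  # /cache/data/ner/train.tf_record
print(absolute)  # /home/user/train.tf_record
print(empty)     # /cache/data/
print(repr(resolve_if_set(data_path, "")))  # ''
```

One consequence, if the hook runs before the parameter checks as the decorator placement suggests: an unset `train_data_file_path` becomes a non-empty directory path, so a later `== ""` check would no longer catch it. Guarding each join the way `schema_file_path` is guarded is one way to keep those checks effective.
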
--- a/model_zoo/official/nlp/bert/run_pretrain.py
+++ b/model_zoo/official/nlp/bert/run_pretrain.py
@@ -16,9 +16,7 @@
 #################pre_train bert example on zh-wiki########################
 python run_pretrain.py
 """
-
 import os
-import argparse
 import mindspore.communication.management as D
 from mindspore.communication.management import get_rank
 import mindspore.common.dtype as mstype
@@ -38,8 +36,10 @@ from src import BertNetworkWithLoss, BertTrainOneStepCell, BertTrainOneStepWithL
                BertTrainOneStepWithLossScaleCellForAdam, \
                AdamWeightDecayForBert, AdamWeightDecayOp
 from src.dataset import create_bert_dataset
-from src.config import cfg, bert_net_cfg
 from src.utils import LossCallBack, BertLearningRate
+from src.model_utils.config import config as cfg, bert_net_cfg
+from src.model_utils.moxing_adapter import moxing_wrapper
+from src.model_utils.device_adapter import get_device_id, get_device_num
 _current_dir = os.path.dirname(os.path.realpath(__file__))


@@ -150,60 +150,31 @@ def _check_compute_type(args_opt):
        logger.warning(warning_message)


-def argparse_init():
-    """Argparse init."""
-    parser = argparse.ArgumentParser(description='bert pre_training')
-    parser.add_argument('--device_target', type=str, default='Ascend', choices=['Ascend', 'GPU'],
-                        help='device where the code will be implemented. (Default: Ascend)')
-    parser.add_argument("--distribute", type=str, default="false", choices=["true", "false"],
-                        help="Run distribute, default is false.")
-    parser.add_argument("--epoch_size", type=int, default="1", help="Epoch size, default is 1.")
-    parser.add_argument("--device_id", type=int, default=0, help="Device id, default is 0.")
-    parser.add_argument("--device_num", type=int, default=1, help="Use device nums, default is 1.")
-    parser.add_argument("--enable_save_ckpt", type=str, default="true", choices=["true", "false"],
-                        help="Enable save checkpoint, default is true.")
-    parser.add_argument("--enable_lossscale", type=str, default="true", choices=["true", "false"],
-                        help="Use lossscale or not, default is not.")
-    parser.add_argument("--do_shuffle", type=str, default="true", choices=["true", "false"],
-                        help="Enable shuffle for dataset, default is true.")
-    parser.add_argument("--enable_data_sink", type=str, default="true", choices=["true", "false"],
-                        help="Enable data sink, default is true.")
-    parser.add_argument("--data_sink_steps", type=int, default="1", help="Sink steps for each epoch, default is 1.")
-    parser.add_argument("--accumulation_steps", type=int, default="1",
-                        help="Accumulating gradients N times before weight update, default is 1.")
-    parser.add_argument("--allreduce_post_accumulation", type=str, default="true", choices=["true", "false"],
-                        help="Whether to allreduce after accumulation of N steps or after each step, default is true.")
-    parser.add_argument("--save_checkpoint_path", type=str, default="", help="Save checkpoint path")
-    parser.add_argument("--load_checkpoint_path", type=str, default="", help="Load checkpoint file path")
-    parser.add_argument("--save_checkpoint_steps", type=int, default=1000, help="Save checkpoint steps, "
-                                                                                "default is 1000.")
-    parser.add_argument("--train_steps", type=int, default=-1, help="Training Steps, default is -1, "
-                                                                    "meaning run all steps according to epoch number.")
-    parser.add_argument("--save_checkpoint_num", type=int, default=1, help="Save checkpoint numbers, default is 1.")
-    parser.add_argument("--data_dir", type=str, default="", help="Data path, it is better to use absolute path")
-    parser.add_argument("--schema_dir", type=str, default="", help="Schema path, it is better to use absolute path")
-
-    return parser
+def modelarts_pre_process():
+    '''Pre-processing hook for ModelArts: set the device id/num and the data/checkpoint paths.'''
+    cfg.device_id = get_device_id()
+    cfg.device_num = get_device_num()
+    cfg.data_dir = cfg.data_path
+    cfg.save_checkpoint_path = os.path.join(cfg.output_path, cfg.save_checkpoint_path)


+@moxing_wrapper(pre_process=modelarts_pre_process)
 def run_pretrain():
    """pre-train bert_clue"""
-    parser = argparse_init()
-    args_opt = parser.parse_args()
-    context.set_context(mode=context.GRAPH_MODE, device_target=args_opt.device_target, device_id=args_opt.device_id)
+    context.set_context(mode=context.GRAPH_MODE, device_target=cfg.device_target, device_id=cfg.device_id)
    context.set_context(reserve_class_name_in_scope=False)
-    _set_graph_kernel_context(args_opt.device_target)
-    ckpt_save_dir = args_opt.save_checkpoint_path
-    if args_opt.distribute == "true":
-        if args_opt.device_target == 'Ascend':
+    _set_graph_kernel_context(cfg.device_target)
+    ckpt_save_dir = cfg.save_checkpoint_path
+    if cfg.distribute == "true":
+        if cfg.device_target == 'Ascend':
            D.init()
-            device_num = args_opt.device_num
-            rank = args_opt.device_id % device_num
+            device_num = cfg.device_num
+            rank = cfg.device_id % device_num
        else:
            D.init()
            device_num = D.get_group_size()
            rank = D.get_rank()
-        ckpt_save_dir = args_opt.save_checkpoint_path + 'ckpt_' + str(get_rank()) + '/'
+        ckpt_save_dir = os.path.join(cfg.save_checkpoint_path, 'ckpt_' + str(get_rank()))

        context.reset_auto_parallel_context()
        context.set_auto_parallel_context(parallel_mode=ParallelMode.DATA_PARALLEL, gradients_mean=True,
@@ -213,57 +184,57 @@ def run_pretrain():
        rank = 0
        device_num = 1

-    _check_compute_type(args_opt)
+    _check_compute_type(cfg)

-    if args_opt.accumulation_steps > 1:
-        logger.info("accumulation steps: {}".format(args_opt.accumulation_steps))
-        logger.info("global batch size: {}".format(cfg.batch_size * args_opt.accumulation_steps))
-        if args_opt.enable_data_sink == "true":
-            args_opt.data_sink_steps *= args_opt.accumulation_steps
-            logger.info("data sink steps: {}".format(args_opt.data_sink_steps))
-        if args_opt.enable_save_ckpt == "true":
-            args_opt.save_checkpoint_steps *= args_opt.accumulation_steps
-            logger.info("save checkpoint steps: {}".format(args_opt.save_checkpoint_steps))
+    if cfg.accumulation_steps > 1:
+        logger.info("accumulation steps: {}".format(cfg.accumulation_steps))
+        logger.info("global batch size: {}".format(cfg.batch_size * cfg.accumulation_steps))
+        if cfg.enable_data_sink == "true":
+            cfg.data_sink_steps *= cfg.accumulation_steps
+            logger.info("data sink steps: {}".format(cfg.data_sink_steps))
+        if cfg.enable_save_ckpt == "true":
+            cfg.save_checkpoint_steps *= cfg.accumulation_steps
+            logger.info("save checkpoint steps: {}".format(cfg.save_checkpoint_steps))

-    ds = create_bert_dataset(device_num, rank, args_opt.do_shuffle, args_opt.data_dir, args_opt.schema_dir)
+    ds = create_bert_dataset(device_num, rank, cfg.do_shuffle, cfg.data_dir, cfg.schema_dir)
    net_with_loss = BertNetworkWithLoss(bert_net_cfg, True)

-    new_repeat_count = args_opt.epoch_size * ds.get_dataset_size() // args_opt.data_sink_steps
-    if args_opt.train_steps > 0:
-        train_steps = args_opt.train_steps * args_opt.accumulation_steps
-        new_repeat_count = min(new_repeat_count, train_steps // args_opt.data_sink_steps)
+    new_repeat_count = cfg.epoch_size * ds.get_dataset_size() // cfg.data_sink_steps
+    if cfg.train_steps > 0:
+        train_steps = cfg.train_steps * cfg.accumulation_steps
+        new_repeat_count = min(new_repeat_count, train_steps // cfg.data_sink_steps)
    else:
-        args_opt.train_steps = args_opt.epoch_size * ds.get_dataset_size() // args_opt.accumulation_steps
-        logger.info("train steps: {}".format(args_opt.train_steps))
+        cfg.train_steps = cfg.epoch_size * ds.get_dataset_size() // cfg.accumulation_steps
+        logger.info("train steps: {}".format(cfg.train_steps))

-    optimizer = _get_optimizer(args_opt, net_with_loss)
-    callback = [TimeMonitor(args_opt.data_sink_steps), LossCallBack(ds.get_dataset_size())]
-    if args_opt.enable_save_ckpt == "true" and args_opt.device_id % min(8, device_num) == 0:
-        config_ck = CheckpointConfig(save_checkpoint_steps=args_opt.save_checkpoint_steps,
-                                     keep_checkpoint_max=args_opt.save_checkpoint_num)
+    optimizer = _get_optimizer(cfg, net_with_loss)
+    callback = [TimeMonitor(cfg.data_sink_steps), LossCallBack(ds.get_dataset_size())]
+    if cfg.enable_save_ckpt == "true" and cfg.device_id % min(8, device_num) == 0:
+        config_ck = CheckpointConfig(save_checkpoint_steps=cfg.save_checkpoint_steps,
+                                     keep_checkpoint_max=cfg.save_checkpoint_num)
        ckpoint_cb = ModelCheckpoint(prefix='checkpoint_bert',
                                     directory=None if ckpt_save_dir == "" else ckpt_save_dir, config=config_ck)
        callback.append(ckpoint_cb)

-    if args_opt.load_checkpoint_path:
-        param_dict = load_checkpoint(args_opt.load_checkpoint_path)
+    if cfg.load_checkpoint_path:
+        param_dict = load_checkpoint(cfg.load_checkpoint_path)
        load_param_into_net(net_with_loss, param_dict)

-    if args_opt.enable_lossscale == "true":
+    if cfg.enable_lossscale == "true":
        update_cell = DynamicLossScaleUpdateCell(loss_scale_value=cfg.loss_scale_value,
                                                 scale_factor=cfg.scale_factor,
                                                 scale_window=cfg.scale_window)
-        accumulation_steps = args_opt.accumulation_steps
+        accumulation_steps = cfg.accumulation_steps
        enable_global_norm = cfg.enable_global_norm
        if accumulation_steps <= 1:
-            if cfg.optimizer == 'AdamWeightDecay' and args_opt.device_target == 'GPU':
+            if cfg.optimizer == 'AdamWeightDecay' and cfg.device_target == 'GPU':
                net_with_grads = BertTrainOneStepWithLossScaleCellForAdam(net_with_loss, optimizer=optimizer,
                                                                          scale_update_cell=update_cell)
            else:
                net_with_grads = BertTrainOneStepWithLossScaleCell(net_with_loss, optimizer=optimizer,
                                                                   scale_update_cell=update_cell)
        else:
-            allreduce_post = args_opt.distribute == "false" or args_opt.allreduce_post_accumulation == "true"
+            allreduce_post = cfg.distribute == "false" or cfg.allreduce_post_accumulation == "true"
            net_with_accumulation = (BertTrainAccumulationAllReducePostWithLossScaleCell if allreduce_post else
                                     BertTrainAccumulationAllReduceEachWithLossScaleCell)
            net_with_grads = net_with_accumulation(net_with_loss, optimizer=optimizer,
@@ -280,7 +251,7 @@ def run_pretrain():
    model = Model(net_with_grads)
    model = ConvertModelUtils().convert_to_thor_model(model, network=net_with_grads, optimizer=optimizer)
    model.train(new_repeat_count, ds, callbacks=callback,
-                dataset_sink_mode=(args_opt.enable_data_sink == "true"), sink_size=args_opt.data_sink_steps)
+                dataset_sink_mode=(cfg.enable_data_sink == "true"), sink_size=cfg.data_sink_steps)


 if __name__ == '__main__':
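
The step arithmetic in `run_pretrain()` (scaling `data_sink_steps` by `accumulation_steps`, then deriving `new_repeat_count`) is easier to follow with concrete numbers. The function below is a standalone sketch that mirrors the integer arithmetic shown in the diff. `plan_steps` and its argument names are hypothetical, and `dataset_size` stands in for `ds.get_dataset_size()`.

```python
def plan_steps(epoch_size, dataset_size, data_sink_steps,
               accumulation_steps=1, train_steps=-1, enable_data_sink=True):
    """Mirror the repeat-count arithmetic of run_pretrain() on plain integers."""
    # With gradient accumulation, each sink covers accumulation_steps times more steps.
    if accumulation_steps > 1 and enable_data_sink:
        data_sink_steps *= accumulation_steps

    new_repeat_count = epoch_size * dataset_size // data_sink_steps
    if train_steps > 0:
        # A user-supplied limit is expressed in weight updates, so scale it up.
        scaled_steps = train_steps * accumulation_steps
        new_repeat_count = min(new_repeat_count, scaled_steps // data_sink_steps)
    else:
        # No limit given: derive the number of weight updates from the data size.
        train_steps = epoch_size * dataset_size // accumulation_steps
    return new_repeat_count, train_steps, data_sink_steps


# One epoch over 1000 batches, sinking 100 steps at a time.
print(plan_steps(1, 1000, 100))  # (10, 1000, 100)

# Two epochs, 4 accumulation steps, capped at 300 weight updates.
print(plan_steps(2, 1000, 100, accumulation_steps=4, train_steps=300))  # (3, 300, 400)
```

In the second call the sink size grows to 400, the uncapped repeat count would be 2000 // 400 = 5, and the cap of 300 × 4 = 1200 steps reduces it to 1200 // 400 = 3.
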
--- a/model_zoo/official/nlp/bert/run_squad.py
+++ b/model_zoo/official/nlp/bert/run_squad.py
@@ -17,12 +17,7 @@
 Bert finetune and evaluation script.
 '''
 import os
-import argparse
 import collections
-from src.bert_for_finetune import BertSquadCell, BertSquad
-from src.finetune_eval_config import optimizer_cfg, bert_net_cfg
-from src.dataset import create_squad_dataset
-from src.utils import make_directory, LossCallBack, LoadNewestCkpt, BertLearningRate
 import mindspore.common.dtype as mstype
 from mindspore import context
 from mindspore import log as logger
@@ -33,8 +28,16 @@ from mindspore.train.model import Model
 from mindspore.train.callback import CheckpointConfig, ModelCheckpoint, TimeMonitor
 from mindspore.train.serialization import load_checkpoint, load_param_into_net

+from src.bert_for_finetune import BertSquadCell, BertSquad
+from src.dataset import create_squad_dataset
+from src.utils import make_directory, LossCallBack, LoadNewestCkpt, BertLearningRate
+from src.model_utils.config import config as args_opt, optimizer_cfg, bert_net_cfg
+from src.model_utils.moxing_adapter import moxing_wrapper
+from src.model_utils.device_adapter import get_device_id
+
 _cur_dir = os.getcwd()

+
 def do_train(dataset=None, network=None, load_checkpoint_path="", save_checkpoint_path="", epoch_num=1):
    """ do train """
    if load_checkpoint_path == "":
@@ -118,39 +121,24 @@ def do_eval(dataset=None, load_checkpoint_path="", eval_batch_size=1):
                end_logits=end_logits))
    return output

+
+def modelarts_pre_process():
+    '''Pre-processing hook for ModelArts: resolve the device id and the data/output paths.'''
+    args_opt.device_id = get_device_id()
+    _file_dir = os.path.dirname(os.path.abspath(__file__))
+    args_opt.load_pretrain_checkpoint_path = os.path.join(_file_dir, args_opt.load_pretrain_checkpoint_path)
+    args_opt.load_finetune_checkpoint_path = os.path.join(args_opt.output_path, args_opt.load_finetune_checkpoint_path)
+    args_opt.save_finetune_checkpoint_path = os.path.join(args_opt.output_path, args_opt.save_finetune_checkpoint_path)
+    args_opt.vocab_file_path = os.path.join(args_opt.data_path, args_opt.vocab_file_path)
+    if args_opt.schema_file_path:
+        args_opt.schema_file_path = os.path.join(args_opt.data_path, args_opt.schema_file_path)
+    args_opt.train_data_file_path = os.path.join(args_opt.data_path, args_opt.train_data_file_path)
+    args_opt.eval_json_path = os.path.join(args_opt.data_path, args_opt.eval_json_path)
+
+
+@moxing_wrapper(pre_process=modelarts_pre_process)
 def run_squad():
    """run squad task"""
-    parser = argparse.ArgumentParser(description="run squad")
-    parser.add_argument("--device_target", type=str, default="Ascend", choices=["Ascend", "GPU"],
-                        help="Device type, default is Ascend")
-    parser.add_argument("--do_train", type=str, default="false", choices=["true", "false"],
-                        help="Eable train, default is false")
-    parser.add_argument("--do_eval", type=str, default="false", choices=["true", "false"],
-                        help="Eable eval, default is false")
-    parser.add_argument("--device_id", type=int, default=0, help="Device id, default is 0.")
-    parser.add_argument("--epoch_num", type=int, default=3, help="Epoch number, default is 1.")
-    parser.add_argument("--num_class", type=int, default=2, help="The number of class, default is 2.")
-    parser.add_argument("--train_data_shuffle", type=str, default="true", choices=["true", "false"],
-                        help="Enable train data shuffle, default is true")
-    parser.add_argument("--eval_data_shuffle", type=str, default="false", choices=["true", "false"],
-                        help="Enable eval data shuffle, default is false")
-    parser.add_argument("--train_batch_size", type=int, default=32, help="Train batch size, default is 32")
-    parser.add_argument("--eval_batch_size", type=int, default=1, help="Eval batch size, default is 1")
-    parser.add_argument("--vocab_file_path", type=str, default="", help="Vocab file path")
-    parser.add_argument("--eval_json_path", type=str, default="", help="Evaluation json file path, can be eval.json")
-    parser.add_argument("--save_finetune_checkpoint_path", type=str, default="", help="Save checkpoint path")
-    parser.add_argument("--load_pretrain_checkpoint_path", type=str, default="", help="Load checkpoint file path")
-    parser.add_argument("--load_finetune_checkpoint_path", type=str, default="", help="Load checkpoint file path")
-    parser.add_argument("--train_data_file_path", type=str, default="",
-                        help="Data path, it is better to use absolute path")
-    parser.add_argument("--schema_file_path", type=str, default="",
-                        help="Schema path, it is better to use absolute path")
-    args_opt = parser.parse_args()
-    epoch_num = args_opt.epoch_num
-    load_pretrain_checkpoint_path = args_opt.load_pretrain_checkpoint_path
-    save_finetune_checkpoint_path = args_opt.save_finetune_checkpoint_path
-    load_finetune_checkpoint_path = args_opt.load_finetune_checkpoint_path
-
    if args_opt.do_train.lower() == "false" and args_opt.do_eval.lower() == "false":
        raise ValueError("At least one of 'do_train' or 'do_eval' must be true")
    if args_opt.do_train.lower() == "true" and args_opt.train_data_file_path == "":
@@ -160,8 +148,10 @@ def run_squad():
            raise ValueError("'vocab_file_path' must be set when do evaluation task")
        if args_opt.eval_json_path == "":
            raise ValueError("'eval_json_path' must be set when do evaluation task")
-
-
+    epoch_num = args_opt.epoch_num
+    load_pretrain_checkpoint_path = args_opt.load_pretrain_checkpoint_path
+    save_finetune_checkpoint_path = args_opt.save_finetune_checkpoint_path
+    load_finetune_checkpoint_path = args_opt.load_finetune_checkpoint_path
    target = args_opt.device_target
    if target == "Ascend":
        context.set_context(mode=context.GRAPH_MODE, device_target="Ascend", device_id=args_opt.device_id)
--- a/model_zoo/official/nlp/bert/scripts/run_classifier.sh
+++ b/model_zoo/official/nlp/bert/scripts/run_classifier.sh
@@ -27,6 +27,7 @@ PROJECT_DIR=$(cd "$(dirname "$0")" || exit; pwd)
 export GLOG_log_dir=${CUR_DIR}/ms_log
 export GLOG_logtostderr=0
 python ${PROJECT_DIR}/../run_classifier.py  \
+    --config_path="../../task_classifier_config.yaml" \
    --device_target="Ascend" \
    --do_train="true" \
    --do_eval="false" \
--- a/model_zoo/official/nlp/bert/scripts/run_ner.sh
+++ b/model_zoo/official/nlp/bert/scripts/run_ner.sh
@@ -27,6 +27,7 @@ PROJECT_DIR=$(cd "$(dirname "$0")" || exit; pwd)
 export GLOG_log_dir=${CUR_DIR}/ms_log
 export GLOG_logtostderr=0
 python ${PROJECT_DIR}/../run_ner.py  \
+    --config_path="../../task_ner_config.yaml" \
    --device_target="Ascend" \
    --do_train="true" \
    --do_eval="false" \
--- a/model_zoo/official/nlp/bert/scripts/run_squad.sh
+++ b/model_zoo/official/nlp/bert/scripts/run_squad.sh
@@ -27,6 +27,7 @@ PROJECT_DIR=$(cd "$(dirname "$0")" || exit; pwd)
 export GLOG_log_dir=${CUR_DIR}/ms_log
 export GLOG_logtostderr=0
 python ${PROJECT_DIR}/../run_squad.py  \
+    --config_path="../../task_squad_config.yaml" \
    --device_target="Ascend" \
    --do_train="true" \
    --do_eval="false" \
--- a/model_zoo/official/nlp/bert/src/cluener_evaluation.py
+++ b/model_zoo/official/nlp/bert/src/cluener_evaluation.py
@@ -22,7 +22,7 @@ from mindspore.common.tensor import Tensor
 from src import tokenization
 from src.sample_process import label_generation, process_one_example_p
 from src.CRF import postprocess
-from src.finetune_eval_config import bert_net_cfg
+from src.model_utils.config import bert_net_cfg
 from src.score import get_result

 def process(model=None, text="", tokenizer_=None, use_crf="", tag_to_index=None, vocab=""):
--- a/model_zoo/official/nlp/bert/src/config.py
+++ b/model_zoo/official/nlp/bert/src/config.py
@@ -1,129 +0,0 @@
-# Copyright 2020 Huawei Technologies Co., Ltd
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-# http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-# ============================================================================
-"""
-network config setting, will be used in dataset.py, run_pretrain.py
-"""
-from easydict import EasyDict as edict
-import mindspore.common.dtype as mstype
-from .bert_model import BertConfig
-cfg = edict({
-    'batch_size': 32,
-    'bert_network': 'base',
-    'loss_scale_value': 65536,
-    'scale_factor': 2,
-    'scale_window': 1000,
-    'optimizer': 'Lamb',
-    'enable_global_norm': False,
-    'AdamWeightDecay': edict({
-        'learning_rate': 3e-5,
-        'end_learning_rate': 0.0,
-        'power': 5.0,
-        'weight_decay': 1e-5,
-        'decay_filter': lambda x: 'layernorm' not in x.name.lower() and 'bias' not in x.name.lower(),
-        'eps': 1e-6,
-        'warmup_steps': 10000,
-    }),
-    'Lamb': edict({
-        'learning_rate': 3e-4,
-        'end_learning_rate': 0.0,
-        'power': 2.0,
-        'warmup_steps': 10000,
-        'weight_decay': 0.01,
-        'decay_filter': lambda x: 'layernorm' not in x.name.lower() and 'bias' not in x.name.lower(),
-        'eps': 1e-8,
-    }),
-    'Momentum': edict({
-        'learning_rate': 2e-5,
-        'momentum': 0.9,
-    }),
-    'Thor': edict({
-        'lr_max': 0.0034,
-        'lr_min': 3.244e-5,
-        'lr_power': 1.0,
-        'lr_total_steps': 30000,
-        'damping_max': 5e-2,
-        'damping_min': 1e-6,
-        'damping_power': 1.0,
-        'damping_total_steps': 30000,
-        'momentum': 0.9,
-        'weight_decay': 5e-4,
-        'loss_scale': 1.0,
-        'frequency': 100,
-    }),
-})
-
-'''
-Including two kinds of network: \
-base: Google BERT-base(the base version of BERT model).
-large: BERT-NEZHA(a Chinese pretrained language model developed by Huawei, which introduced a improvement of \
-       Functional Relative Posetional Encoding as an effective positional encoding scheme).
-'''
-if cfg.bert_network == 'base':
-    cfg.batch_size = 64
-    bert_net_cfg = BertConfig(
-        seq_length=128,
-        vocab_size=21128,
-        hidden_size=768,
-        num_hidden_layers=12,
-        num_attention_heads=12,
-        intermediate_size=3072,
-        hidden_act="gelu",
-        hidden_dropout_prob=0.1,
-        attention_probs_dropout_prob=0.1,
-        max_position_embeddings=512,
-        type_vocab_size=2,
-        initializer_range=0.02,
-        use_relative_positions=False,
-        dtype=mstype.float32,
-        compute_type=mstype.float16
-    )
-if cfg.bert_network == 'nezha':
-    cfg.batch_size = 96
-    bert_net_cfg = BertConfig(
-        seq_length=128,
-        vocab_size=21128,
-        hidden_size=1024,
-        num_hidden_layers=24,
-        num_attention_heads=16,
-        intermediate_size=4096,
-        hidden_act="gelu",
-        hidden_dropout_prob=0.1,
-        attention_probs_dropout_prob=0.1,
-        max_position_embeddings=512,
-        type_vocab_size=2,
-        initializer_range=0.02,
-        use_relative_positions=True,
-        dtype=mstype.float32,
-        compute_type=mstype.float16
-    )
-if cfg.bert_network == 'large':
-    cfg.batch_size = 24
-    bert_net_cfg = BertConfig(
-        seq_length=512,
-        vocab_size=30522,
-        hidden_size=1024,
-        num_hidden_layers=24,
-        num_attention_heads=16,
-        intermediate_size=4096,
-        hidden_act="gelu",
-        hidden_dropout_prob=0.1,
-        attention_probs_dropout_prob=0.1,
-        max_position_embeddings=512,
-        type_vocab_size=2,
-        initializer_range=0.02,
-        use_relative_positions=False,
-        dtype=mstype.float32,
-        compute_type=mstype.float16
-    )
--- a/model_zoo/official/nlp/bert/src/dataset.py
+++ b/model_zoo/official/nlp/bert/src/dataset.py
@ -20,7 +20,7 @@ import mindspore.common.dtype as mstype
 import mindspore.dataset as ds
 import mindspore.dataset.transforms.c_transforms as C
 from mindspore import log as logger
-from .config import cfg
+from .model_utils.config import config as cfg


 def create_bert_dataset(device_num=1, rank=0, do_shuffle="true", data_dir=None, schema_dir=None):
--- a/model_zoo/official/nlp/bert/src/finetune_eval_config.py
+++ b/model_zoo/official/nlp/bert/src/finetune_eval_config.py
@ -1,63 +0,0 @@
-# Copyright 2020 Huawei Technologies Co., Ltd
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-# http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-# ============================================================================
-
-"""
-config settings, will be used in finetune.py
-"""
-
-from easydict import EasyDict as edict
-import mindspore.common.dtype as mstype
-from .bert_model import BertConfig
-
-optimizer_cfg = edict({
-    'optimizer': 'Lamb',
-    'AdamWeightDecay': edict({
-        'learning_rate': 2e-5,
-        'end_learning_rate': 1e-7,
-        'power': 1.0,
-        'weight_decay': 1e-5,
-        'decay_filter': lambda x: 'layernorm' not in x.name.lower() and 'bias' not in x.name.lower(),
-        'eps': 1e-6,
-    }),
-    'Lamb': edict({
-        'learning_rate': 2e-5,
-        'end_learning_rate': 1e-7,
-        'power': 1.0,
-        'weight_decay': 0.01,
-        'decay_filter': lambda x: 'layernorm' not in x.name.lower() and 'bias' not in x.name.lower(),
-    }),
-    'Momentum': edict({
-        'learning_rate': 2e-5,
-        'momentum': 0.9,
-    }),
-})
-
-bert_net_cfg = BertConfig(
-    seq_length=128,
-    vocab_size=21128,
-    hidden_size=768,
-    num_hidden_layers=12,
-    num_attention_heads=12,
-    intermediate_size=3072,
-    hidden_act="gelu",
-    hidden_dropout_prob=0.1,
-    attention_probs_dropout_prob=0.1,
-    max_position_embeddings=512,
-    type_vocab_size=2,
-    initializer_range=0.02,
-    use_relative_positions=False,
-    dtype=mstype.float32,
-    compute_type=mstype.float16,
-)
--- a/model_zoo/official/nlp/bert/src/model_utils/config.py
+++ b/model_zoo/official/nlp/bert/src/model_utils/config.py
@ -0,0 +1,200 @@
+# Copyright 2021 Huawei Technologies Co., Ltd
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+# ============================================================================
+
+"""Parse arguments"""
+
+import os
+import ast
+import argparse
+from pprint import pformat
+import yaml
+import mindspore.common.dtype as mstype
+from src.bert_model import BertConfig
+
+
+class Config:
+    """
+    Configuration namespace. Convert dictionary to members.
+    """
+    def __init__(self, cfg_dict):
+        for k, v in cfg_dict.items():
+            if isinstance(v, (list, tuple)):
+                setattr(self, k, [Config(x) if isinstance(x, dict) else x for x in v])
+            else:
+                setattr(self, k, Config(v) if isinstance(v, dict) else v)
+
+    def __str__(self):
+        return pformat(self.__dict__)
+
+    def __repr__(self):
+        return self.__str__()
+
+
+def parse_cli_to_yaml(parser, cfg, helper=None, choices=None, cfg_path="pretrain_base_config.yaml"):
+    """
+    Parse command line arguments to the configuration according to the default yaml.
+
+    Args:
+        parser: Parent parser.
+        cfg: Base configuration.
+        helper: Helper description.
+        cfg_path: Path to the default yaml config.
+    """
+    parser = argparse.ArgumentParser(description="[REPLACE THIS at config.py]",
+                                     parents=[parser])
+    helper = {} if helper is None else helper
+    choices = {} if choices is None else choices
+    for item in cfg:
+        if not isinstance(cfg[item], list) and not isinstance(cfg[item], dict):
+            help_description = helper[item] if item in helper else "Please refer to {}".format(cfg_path)
+            choice = choices[item] if item in choices else None
+            if isinstance(cfg[item], bool):
+                parser.add_argument("--" + item, type=ast.literal_eval, default=cfg[item], choices=choice,
+                                    help=help_description)
+            else:
+                parser.add_argument("--" + item, type=type(cfg[item]), default=cfg[item], choices=choice,
+                                    help=help_description)
+    args = parser.parse_args()
+    return args
+
+
+def parse_yaml(yaml_path):
+    """
+    Parse the yaml config file.
+
+    Args:
+        yaml_path: Path to the yaml config.
+    """
+    with open(yaml_path, 'r') as fin:
+        try:
+            cfgs = yaml.load_all(fin.read(), Loader=yaml.FullLoader)
+            cfgs = [x for x in cfgs]
+            if len(cfgs) == 1:
+                cfg_helper = {}
+                cfg = cfgs[0]
+                cfg_choices = {}
+            elif len(cfgs) == 2:
+                cfg, cfg_helper = cfgs
+                cfg_choices = {}
+            elif len(cfgs) == 3:
+                cfg, cfg_helper, cfg_choices = cfgs
+            else:
+                raise ValueError("At most 3 docs (config, description for help, choices) are supported in config yaml")
+            # print(cfg_helper)
+        except yaml.YAMLError as err:
+            raise ValueError("Failed to parse yaml") from err
+    return cfg, cfg_helper, cfg_choices
+
+
+def merge(args, cfg):
+    """
+    Merge the base config from yaml file and command line arguments.
+
+    Args:
+        args: Command line arguments.
+        cfg: Base configuration.
+    """
+    args_var = vars(args)
+    for item in args_var:
+        cfg[item] = args_var[item]
+    return cfg
+
+
+def extra_operations(cfg):
+    """
+    Do extra work on config
+
+    Args:
+        cfg: Object after instantiation of class 'Config'.
+    """
+    def create_filter_fun(keywords):
+        return lambda x: not any(key in x.name.lower() for key in keywords)
+
+    if cfg.description == 'run_pretrain':
+        cfg.AdamWeightDecay.decay_filter = create_filter_fun(cfg.AdamWeightDecay.decay_filter)
+        cfg.Lamb.decay_filter = create_filter_fun(cfg.Lamb.decay_filter)
+        cfg.base_net_cfg.dtype = mstype.float32
+        cfg.base_net_cfg.compute_type = mstype.float16
+        cfg.nezha_net_cfg.dtype = mstype.float32
+        cfg.nezha_net_cfg.compute_type = mstype.float16
+        cfg.large_net_cfg.dtype = mstype.float32
+        cfg.large_net_cfg.compute_type = mstype.float16
+        if cfg.bert_network == 'base':
+            cfg.batch_size = cfg.base_batch_size
+            _bert_net_cfg = cfg.base_net_cfg
+        elif cfg.bert_network == 'nezha':
+            cfg.batch_size = cfg.nezha_batch_size
+            _bert_net_cfg = cfg.nezha_net_cfg
+        elif cfg.bert_network == 'large':
+            cfg.batch_size = cfg.large_batch_size
+            _bert_net_cfg = cfg.large_net_cfg
+        else:
+            raise ValueError("Unsupported bert_network: {}".format(cfg.bert_network))
+        cfg.bert_net_cfg = BertConfig(**_bert_net_cfg.__dict__)
+    elif cfg.description == 'run_ner':
+        cfg.optimizer_cfg.AdamWeightDecay.decay_filter = \
+            create_filter_fun(cfg.optimizer_cfg.AdamWeightDecay.decay_filter)
+        cfg.optimizer_cfg.Lamb.decay_filter = create_filter_fun(cfg.optimizer_cfg.Lamb.decay_filter)
+        cfg.bert_net_cfg.dtype = mstype.float32
+        cfg.bert_net_cfg.compute_type = mstype.float16
+        cfg.bert_net_cfg = BertConfig(**cfg.bert_net_cfg.__dict__)
+
+    elif cfg.description == 'run_squad':
+        cfg.optimizer_cfg.AdamWeightDecay.decay_filter = \
+            create_filter_fun(cfg.optimizer_cfg.AdamWeightDecay.decay_filter)
+        cfg.optimizer_cfg.Lamb.decay_filter = create_filter_fun(cfg.optimizer_cfg.Lamb.decay_filter)
+        cfg.bert_net_cfg.dtype = mstype.float32
+        cfg.bert_net_cfg.compute_type = mstype.float16
+        cfg.bert_net_cfg = BertConfig(**cfg.bert_net_cfg.__dict__)
+
+    elif cfg.description == 'run_classifier':
+        cfg.optimizer_cfg.AdamWeightDecay.decay_filter = \
+            create_filter_fun(cfg.optimizer_cfg.AdamWeightDecay.decay_filter)
+        cfg.optimizer_cfg.Lamb.decay_filter = create_filter_fun(cfg.optimizer_cfg.Lamb.decay_filter)
+        cfg.bert_net_cfg.dtype = mstype.float32
+        cfg.bert_net_cfg.compute_type = mstype.float16
+        cfg.bert_net_cfg = BertConfig(**cfg.bert_net_cfg.__dict__)
+    else:
+        pass
+
+
+def get_config():
+    """
+    Get Config according to the yaml file and cli arguments.
+    """
+    def get_abs_path(path_relative):
+        current_dir = os.path.dirname(os.path.abspath(__file__))
+        return os.path.join(current_dir, path_relative)
+    parser = argparse.ArgumentParser(description="default name", add_help=False)
+    parser.add_argument("--config_path", type=get_abs_path, default="../../pretrain_config.yaml",
+                        help="Config file path")
+    path_args, _ = parser.parse_known_args()
+    default, helper, choices = parse_yaml(path_args.config_path)
+    # pprint(default)
+    args = parse_cli_to_yaml(parser=parser, cfg=default, helper=helper, choices=choices, cfg_path=path_args.config_path)
+    final_config = merge(args, default)
+    config_obj = Config(final_config)
+    extra_operations(config_obj)
+    return config_obj
+
+
+config = get_config()
+bert_net_cfg = config.bert_net_cfg
+if config.description in ('run_classifier', 'run_ner', 'run_squad'):
+    optimizer_cfg = config.optimizer_cfg
+
+
+if __name__ == '__main__':
+    print(config)
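
The override flow in `config.py` above (defaults from YAML, one `--flag` per scalar key, booleans parsed with `ast.literal_eval`) can be sketched without PyYAML or MindSpore. This is a minimal illustration, not the repository's code: `parse_overrides` and the `defaults` dictionary are hypothetical stand-ins for `parse_cli_to_yaml`, `merge`, and a parsed YAML document.

```python
import argparse
import ast


class Config:
    """Minimal stand-in for the Config class in the diff: dict keys become attributes."""
    def __init__(self, cfg_dict):
        for key, value in cfg_dict.items():
            setattr(self, key, Config(value) if isinstance(value, dict) else value)


def parse_overrides(defaults, argv):
    """Expose every scalar default as a --flag and return the merged dict."""
    parser = argparse.ArgumentParser()
    for key, value in defaults.items():
        if isinstance(value, (list, dict)):
            continue  # nested sections are not exposed on the command line
        # bool("False") is True, so booleans are parsed as Python literals instead.
        arg_type = ast.literal_eval if isinstance(value, bool) else type(value)
        parser.add_argument("--" + key, type=arg_type, default=value)
    merged = dict(defaults)
    merged.update(vars(parser.parse_args(argv)))
    return merged


# Hypothetical defaults standing in for a parsed YAML document.
defaults = {
    "enable_modelarts": False,
    "epoch_num": 3,
    "optimizer_cfg": {"optimizer": "Lamb"},
}
merged = parse_overrides(defaults, ["--enable_modelarts=True", "--epoch_num", "5"])
config = Config(merged)
print(config.enable_modelarts, config.epoch_num, config.optimizer_cfg.optimizer)
```

Nested sections such as `optimizer_cfg` keep their YAML values because only scalar keys receive a command-line flag.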
--- a/model_zoo/official/nlp/bert/src/model_utils/device_adapter.py
+++ b/model_zoo/official/nlp/bert/src/model_utils/device_adapter.py
@ -0,0 +1,27 @@
+# Copyright 2021 Huawei Technologies Co., Ltd
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+# ============================================================================
+
+"""Device adapter for ModelArts"""
+
+from src.model_utils.config import config
+
+if config.enable_modelarts:
+    from src.model_utils.moxing_adapter import get_device_id, get_device_num, get_rank_id, get_job_id
+else:
+    from src.model_utils.local_adapter import get_device_id, get_device_num, get_rank_id, get_job_id
+
+__all__ = [
+    "get_device_id", "get_device_num", "get_rank_id", "get_job_id"
+]
--- a/model_zoo/official/nlp/bert/src/model_utils/local_adapter.py
+++ b/model_zoo/official/nlp/bert/src/model_utils/local_adapter.py
@ -0,0 +1,36 @@
+# Copyright 2021 Huawei Technologies Co., Ltd
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+# ============================================================================
+
+"""Local adapter"""
+
+import os
+
+def get_device_id():
+    device_id = os.getenv('DEVICE_ID', '0')
+    return int(device_id)
+
+
+def get_device_num():
+    device_num = os.getenv('RANK_SIZE', '1')
+    return int(device_num)
+
+
+def get_rank_id():
+    global_rank_id = os.getenv('RANK_ID', '0')
+    return int(global_rank_id)
+
+
+def get_job_id():
+    return "Local Job"
--- a/model_zoo/official/nlp/bert/src/model_utils/moxing_adapter.py
+++ b/model_zoo/official/nlp/bert/src/model_utils/moxing_adapter.py
@ -0,0 +1,123 @@
+# Copyright 2021 Huawei Technologies Co., Ltd
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+# ============================================================================
+
+"""Moxing adapter for ModelArts"""
+
+import os
+import functools
+from mindspore import context
+from mindspore.profiler import Profiler
+from src.model_utils.config import config
+
+_global_sync_count = 0
+
+def get_device_id():
+    device_id = os.getenv('DEVICE_ID', '0')
+    return int(device_id)
+
+
+def get_device_num():
+    device_num = os.getenv('RANK_SIZE', '1')
+    return int(device_num)
+
+
+def get_rank_id():
+    global_rank_id = os.getenv('RANK_ID', '0')
+    return int(global_rank_id)
+
+
+def get_job_id():
+    job_id = os.getenv('JOB_ID', '')
+    job_id = job_id if job_id != "" else "default"
+    return job_id
+
+def sync_data(from_path, to_path):
+    """
+    Download data from remote OBS to a local directory when from_path is a remote URL and to_path is a local path;
+    upload data from a local directory to remote OBS in the opposite case.
+    """
+    import moxing as mox
+    import time
+    global _global_sync_count
+    sync_lock = "/tmp/copy_sync.lock" + str(_global_sync_count)
+    _global_sync_count += 1
+
+    # Each server contains 8 devices at most.
+    if get_device_id() % min(get_device_num(), 8) == 0 and not os.path.exists(sync_lock):
+        print("from path: ", from_path)
+        print("to path: ", to_path)
+        mox.file.copy_parallel(from_path, to_path)
+        print("===finish data synchronization===")
+        try:
+            os.mknod(sync_lock)
+            # print("os.mknod({}) success".format(sync_lock))
+        except IOError:
+            pass
+        print("===save flag===")
+
+    while True:
+        if os.path.exists(sync_lock):
+            break
+        time.sleep(1)
+
+    print("Finish sync data from {} to {}.".format(from_path, to_path))
+
+
+def moxing_wrapper(pre_process=None, post_process=None):
+    """
+    Moxing wrapper to download dataset and upload outputs.
+    """
+    def wrapper(run_func):
+        @functools.wraps(run_func)
+        def wrapped_func(*args, **kwargs):
+            # Download data from data_url
+            if config.enable_modelarts:
+                if config.data_url:
+                    sync_data(config.data_url, config.data_path)
+                    print("Dataset downloaded: ", os.listdir(config.data_path))
+                if config.checkpoint_url:
+                    sync_data(config.checkpoint_url, config.load_path)
+                    print("Preload downloaded: ", os.listdir(config.load_path))
+                if config.train_url:
+                    sync_data(config.train_url, config.output_path)
+                    print("Workspace downloaded: ", os.listdir(config.output_path))
+
+                context.set_context(save_graphs_path=os.path.join(config.output_path, str(get_rank_id())))
+                config.device_num = get_device_num()
+                config.device_id = get_device_id()
+                if not os.path.exists(config.output_path):
+                    os.makedirs(config.output_path)
+
+                if pre_process:
+                    pre_process()
+
+            if config.enable_profiling:
+                profiler = Profiler()
+
+            run_func(*args, **kwargs)
+
+            if config.enable_profiling:
+                profiler.analyse()
+
+            # Upload data to train_url
+            if config.enable_modelarts:
+                if post_process:
+                    post_process()
+
+                if config.train_url:
+                    print("Start to copy output directory")
+                    sync_data(config.output_path, config.train_url)
+        return wrapped_func
+    return wrapper
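
The lock-file pattern used by `sync_data` above (one process per server copies, every process waits until the lock file exists) can be sketched locally. This is an illustration under stated assumptions: `shutil.copytree` stands in for `mox.file.copy_parallel`, and `sync_once`, `is_leader`, and the temporary paths are hypothetical names that do not appear in the commit.

```python
import os
import shutil
import tempfile
import time


def sync_once(from_path, to_path, lock_path, is_leader, copy_fn=shutil.copytree):
    """Copy from_path to to_path on the leader only; everyone waits for the lock file."""
    if is_leader and not os.path.exists(lock_path):
        copy_fn(from_path, to_path)
        # Creating the lock file signals that the copy has finished.
        with open(lock_path, "w"):
            pass
    while not os.path.exists(lock_path):
        time.sleep(0.1)


work_dir = tempfile.mkdtemp()
src = os.path.join(work_dir, "src")
dst = os.path.join(work_dir, "dst")
lock = os.path.join(work_dir, "copy_sync.lock0")
os.makedirs(src)
with open(os.path.join(src, "train.tfrecord"), "w") as fout:
    fout.write("dummy")

sync_once(src, dst, lock, is_leader=True)   # performs the copy and writes the lock
sync_once(src, dst, lock, is_leader=False)  # returns at once because the lock exists
copied = sorted(os.listdir(dst))
print(copied)
```

In the commit, the leader is the process whose device id is a multiple of the per-server device count, and the lock name carries a counter so that each call to `sync_data` uses a fresh lock file.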
--- a/model_zoo/official/nlp/bert/task_classifier_config.yaml
+++ b/model_zoo/official/nlp/bert/task_classifier_config.yaml
@ -0,0 +1,113 @@
+# Builtin Configurations(DO NOT CHANGE THESE CONFIGURATIONS unless you know exactly what you are doing)
+enable_modelarts: False
+# Url for modelarts
+data_url: ""
+train_url: ""
+checkpoint_url: ""
+# Path for local
+data_path: "/cache/data"
+output_path: "/cache/train"
+load_path: "/cache/checkpoint_path"
+device_target: "Ascend"
+enable_profiling: False
+
+# ==============================================================================
+description: "run_classifier"
+assessment_method: "Accuracy"
+do_train: "false"
+do_eval: "false"
+device_id: 0
+epoch_num: 3
+num_class: 2
+train_data_shuffle: "true"
+eval_data_shuffle: "false"
+train_batch_size: 32
+eval_batch_size: 1
+save_finetune_checkpoint_path: "./classifier_finetune/ckpt/"
+load_pretrain_checkpoint_path: ""
+load_finetune_checkpoint_path: ""
+train_data_file_path: ""
+eval_data_file_path: ""
+schema_file_path: ""
+# export related
+export_batch_size: 1
+export_ckpt_file: ''
+export_file_name: 'bert_classifier'
+file_format: 'AIR'
+
+optimizer_cfg:
+    optimizer: 'Lamb'
+    AdamWeightDecay:
+        learning_rate: 0.00002  # 2e-5
+        end_learning_rate: 0.0000000001  # 1e-10
+        power: 1.0
+        weight_decay: 0.00001  # 1e-5
+        decay_filter: ['layernorm', 'bias']
+        eps: 0.000001  # 1e-6
+    Lamb:
+        learning_rate: 0.00002  # 2e-5
+        end_learning_rate: 0.0000000001  # 1e-10
+        power: 1.0
+        weight_decay: 0.01
+        decay_filter: ['layernorm', 'bias']
+    Momentum:
+        learning_rate: 0.00002  # 2e-5
+        momentum: 0.9
+
+bert_net_cfg:
+    seq_length: 128
+    vocab_size: 21128
+    hidden_size: 768
+    num_hidden_layers: 12
+    num_attention_heads: 12
+    intermediate_size: 3072
+    hidden_act: "gelu"
+    hidden_dropout_prob: 0.1
+    attention_probs_dropout_prob: 0.1
+    max_position_embeddings: 512
+    type_vocab_size: 2
+    initializer_range: 0.02
+    use_relative_positions: False
+    dtype: mstype.float32
+    compute_type: mstype.float16
+
+---
+# Help description for each configuration
+enable_modelarts: "Whether training on modelarts, default: False"
+data_url: "Url for modelarts"
+train_url: "Url for modelarts"
+data_path: "The location of the input data."
+output_path: "The location of the output file."
+device_target: "Running platform, choose from Ascend or GPU, and default is Ascend."
+enable_profiling: 'Whether enable profiling while training, default: False'
+
+assessment_method: "assessment_method including [Mcc, Spearman_correlation, Accuracy, F1], default is Accuracy"
+do_train: "Enable train, default is false"
+do_eval: "Enable eval, default is false"
+device_id: "Device id, default is 0."
+epoch_num: "Epoch number, default is 3."
+num_class: "The number of classes, default is 2."
+train_data_shuffle: "Enable train data shuffle, default is true"
+eval_data_shuffle: "Enable eval data shuffle, default is false"
+train_batch_size: "Train batch size, default is 32"
+eval_batch_size: "Eval batch size, default is 1"
+save_finetune_checkpoint_path: "Save checkpoint path"
+load_pretrain_checkpoint_path: "Load checkpoint file path"
+load_finetune_checkpoint_path: "Load checkpoint file path"
+train_data_file_path: "Data path, it is better to use absolute path"
+eval_data_file_path: "Data path, it is better to use absolute path"
+schema_file_path: "Schema path, it is better to use absolute path"
+
+export_batch_size: "export batch size."
+export_ckpt_file: "Bert ckpt file."
+export_file_name: "bert output air name."
+file_format: "file format"
+---
+# choices
+device_target: ['Ascend', 'GPU']
+assessment_method: ["Mcc", "Spearman_correlation", "Accuracy", "F1"]
+do_train: ["true", "false"]
+do_eval: ["true", "false"]
+train_data_shuffle: ["true", "false"]
+eval_data_shuffle: ["true", "false"]
+file_format: ["AIR", "ONNX", "MINDIR"]
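
The `decay_filter: ['layernorm', 'bias']` entries in the YAML above are plain keyword lists; `extra_operations` in `config.py` turns them into a predicate over parameter names. A minimal sketch of that conversion follows, assuming a hypothetical `Param` tuple in place of a MindSpore parameter and made-up parameter names.

```python
from collections import namedtuple

Param = namedtuple("Param", ["name"])  # hypothetical stand-in for a network parameter


def create_filter_fun(keywords):
    """Return a predicate that is True when no keyword occurs in the parameter name."""
    return lambda param: not any(key in param.name.lower() for key in keywords)


decay_filter = create_filter_fun(["layernorm", "bias"])
params = [
    Param("bert.encoder.layer0.attention.query.weight"),
    Param("bert.encoder.layer0.output.LayerNorm.gamma"),
    Param("bert.encoder.layer0.output.dense.bias"),
]
# Only parameters that pass the filter would receive weight decay.
decayed = [p.name for p in params if decay_filter(p)]
print(decayed)
```

Keeping the keywords in YAML and building the lambda at load time avoids storing executable code in the config file, which the removed `config.py` did with inline lambdas.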
--- a/model_zoo/official/nlp/bert/task_ner_config.yaml
+++ b/model_zoo/official/nlp/bert/task_ner_config.yaml
@ -0,0 +1,121 @@
+# Builtin Configurations(DO NOT CHANGE THESE CONFIGURATIONS unless you know exactly what you are doing)
+enable_modelarts: False
+# Url for modelarts
+data_url: ""
+train_url: ""
+checkpoint_url: ""
+# Path for local
+data_path: "/cache/data"
+output_path: "/cache/train"
+load_path: "/cache/checkpoint_path"
+device_target: "Ascend"
+enable_profiling: False
+
+# ==============================================================================
+description: "run_ner"
+assessment_method: "BF1"
+do_train: "false"
+do_eval: "false"
+use_crf: "false"
+device_id: 0
+epoch_num: 5
+train_data_shuffle: "true"
+eval_data_shuffle: "false"
+train_batch_size: 32
+eval_batch_size: 1
+vocab_file_path: ""
+label_file_path: ""
+save_finetune_checkpoint_path: "./ner_finetune/ckpt/"
+load_pretrain_checkpoint_path: ""
+load_finetune_checkpoint_path: ""
+train_data_file_path: ""
+eval_data_file_path: ""
+dataset_format: "mindrecord"
+schema_file_path: ""
+# export related
+export_batch_size: 1
+export_ckpt_file: ''
+export_file_name: 'bert_ner'
+file_format: 'AIR'
+
+optimizer_cfg:
+    optimizer: 'Lamb'
+    AdamWeightDecay:
+        learning_rate: 0.00002  # 2e-5
+        end_learning_rate: 0.0000000001  # 1e-10
+        power: 1.0
+        weight_decay: 0.00001  # 1e-5
+        decay_filter: ['layernorm', 'bias']
+        eps: 0.000001  # 1e-6
+    Lamb:
+        learning_rate: 0.00002  # 2e-5
+        end_learning_rate: 0.0000000001  # 1e-10
+        power: 1.0
+        weight_decay: 0.01
+        decay_filter: ['layernorm', 'bias']
+    Momentum:
+        learning_rate: 0.00002  # 2e-5
+        momentum: 0.9
+
+bert_net_cfg:
+    seq_length: 128
+    vocab_size: 21128
+    hidden_size: 768
+    num_hidden_layers: 12
+    num_attention_heads: 12
+    intermediate_size: 3072
+    hidden_act: "gelu"
+    hidden_dropout_prob: 0.1
+    attention_probs_dropout_prob: 0.1
+    max_position_embeddings: 512
+    type_vocab_size: 2
+    initializer_range: 0.02
+    use_relative_positions: False
+    dtype: mstype.float32
+    compute_type: mstype.float16
+
+---
+# Help description for each configuration
+enable_modelarts: "Whether training on modelarts, default: False"
+data_url: "Url for modelarts"
+train_url: "Url for modelarts"
+data_path: "The location of the input data."
+output_path: "The location of the output file."
+device_target: "Running platform, choose from Ascend or GPU, and default is Ascend."
+enable_profiling: 'Whether enable profiling while training, default: False'
+
+assessment_method: "assessment_method include: [BF1, clue_benchmark, MF1], default is BF1"
+do_train: "Enable train, default is false"
+do_eval: "Enable eval, default is false"
+use_crf: "Use crf, default is false"
+device_id: "Device id, default is 0."
+epoch_num: "Epoch number, default is 5."
+train_data_shuffle: "Enable train data shuffle, default is true"
+eval_data_shuffle: "Enable eval data shuffle, default is false"
+train_batch_size: "Train batch size, default is 32"
+eval_batch_size: "Eval batch size, default is 1"
+vocab_file_path: "Vocab file path, used in clue benchmark"
+label_file_path: "label file path, used in clue benchmark"
+save_finetune_checkpoint_path: "Save checkpoint path"
+load_pretrain_checkpoint_path: "Load checkpoint file path"
+load_finetune_checkpoint_path: "Load checkpoint file path"
+train_data_file_path: "Data path, it is better to use absolute path"
+eval_data_file_path: "Data path, it is better to use absolute path"
+dataset_format: "Dataset format, support mindrecord or tfrecord"
+schema_file_path: "Schema path, it is better to use absolute path"
+
+export_batch_size: "export batch size."
+export_ckpt_file: "Bert ckpt file."
+export_file_name: "bert output air name."
+file_format: "file format"
+---
+# choices
+device_target: ['Ascend', 'GPU']
+assessment_method: ["BF1", "clue_benchmark", "MF1"]
+do_train: ["true", "false"]
+do_eval: ["true", "false"]
+use_crf: ["true", "false"]
+train_data_shuffle: ["true", "false"]
+eval_data_shuffle: ["true", "false"]
+dataset_format: ["mindrecord", "tfrecord"]
+file_format: ["AIR", "ONNX", "MINDIR"]
--- a/model_zoo/official/nlp/bert/task_squad_config.yaml
+++ b/model_zoo/official/nlp/bert/task_squad_config.yaml
@ -0,0 +1,112 @@
+# Builtin Configurations(DO NOT CHANGE THESE CONFIGURATIONS unless you know exactly what you are doing)
+enable_modelarts: False
+# Url for modelarts
+data_url: ""
+train_url: ""
+checkpoint_url: ""
+# Path for local
+data_path: "/cache/data"
+output_path: "/cache/train"
+load_path: "/cache/checkpoint_path"
+device_target: "Ascend"
+enable_profiling: False
+
+# ==============================================================================
+description: "run_squad"
+do_train: "false"
+do_eval: "false"
+device_id: 0
+epoch_num: 3
+num_class: 2
+train_data_shuffle: "true"
+eval_data_shuffle: "false"
+train_batch_size: 32
+eval_batch_size: 1
+vocab_file_path: ""
+eval_json_path: ""
+save_finetune_checkpoint_path: "./squad_finetune/ckpt/"
+load_pretrain_checkpoint_path: ""
+load_finetune_checkpoint_path: ""
+train_data_file_path: ""
+schema_file_path: ""
+# export related
+export_batch_size: 1
+export_ckpt_file: ''
+export_file_name: 'bert_squad'
+file_format: 'AIR'
+
+optimizer_cfg:
+    optimizer: 'Lamb'
+    AdamWeightDecay:
+        learning_rate: 0.0001  # 1e-4
+        end_learning_rate: 0.00000000001  # 1e-11
+        power: 5.0
+        weight_decay: 0.001  # 1e-3
+        decay_filter: ['layernorm', 'bias']
+        eps: 0.000001  # 1e-6
+    Lamb:
+        learning_rate: 0.0001  # 1e-4
+        end_learning_rate: 0.00000000001  # 1e-11
+        power: 5.0
+        weight_decay: 0.01
+        decay_filter: ['layernorm', 'bias']
+    Momentum:
+        learning_rate: 0.0001  # 1e-4
+        momentum: 0.9
+
+bert_net_cfg:
+    seq_length: 384
+    vocab_size: 30522
+    hidden_size: 768
+    num_hidden_layers: 12
+    num_attention_heads: 12
+    intermediate_size: 3072
+    hidden_act: "gelu"
+    hidden_dropout_prob: 0.1
+    attention_probs_dropout_prob: 0.1
+    max_position_embeddings: 512
+    type_vocab_size: 2
+    initializer_range: 0.02
+    use_relative_positions: False
+    dtype: mstype.float32
+    compute_type: mstype.float16
+
+---
+# Help description for each configuration
+enable_modelarts: "Whether training on modelarts, default: False"
+data_url: "Url for modelarts"
+train_url: "Url for modelarts"
+data_path: "The location of the input data."
+output_path: "The location of the output file."
+device_target: "Running platform, choose from Ascend or GPU, and default is Ascend."
+enable_profiling: 'Whether enable profiling while training, default: False'
+
+do_train: "Enable train, default is false"
+do_eval: "Enable eval, default is false"
+device_id: "Device id, default is 0."
+epoch_num: "Epoch number, default is 3."
+num_class: "The number of classes, default is 2."
+train_data_shuffle: "Enable train data shuffle, default is true"
+eval_data_shuffle: "Enable eval data shuffle, default is false"
+train_batch_size: "Train batch size, default is 32"
+eval_batch_size: "Eval batch size, default is 1"
+vocab_file_path: "Vocab file path"
+eval_json_path: "Evaluation json file path, can be eval.json"
+save_finetune_checkpoint_path: "Save checkpoint path"
+load_pretrain_checkpoint_path: "Load checkpoint file path"
+load_finetune_checkpoint_path: "Load checkpoint file path"
+train_data_file_path: "Data path, it is better to use absolute path"
+schema_file_path: "Schema path, it is better to use absolute path"
+
+export_batch_size: "export batch size."
+export_ckpt_file: "Bert ckpt file."
+export_file_name: "bert output air name."
+file_format: "file format"
+---
+# choices
+device_target: ['Ascend', 'GPU']
+do_train: ["true", "false"]
+do_eval: ["true", "false"]
+train_data_shuffle: ["true", "false"]
+eval_data_shuffle: ["true", "false"]
+file_format: ["AIR", "ONNX", "MINDIR"]