bert can be used on ModelArts

郑彬 2021-07-03 12:00:19 +08:00
parent 630defa9e7
commit c9d5b13e37
24 changed files with 1278 additions and 447 deletions


@ -134,6 +134,56 @@ bash scripts/run_distributed_pretrain_for_gpu.sh 8 40 /path/cn-wiki-128
bash scripts/run_squad.sh
```
- Running on ModelArts
If you want to run on ModelArts, please check the official [ModelArts documentation](https://support.huaweicloud.com/modelarts/). You can then start training as follows:
- Pretraining with 8 cards on ModelArts
```python
# (1) Upload the code folder to the S3 bucket.
# (2) Click "create training task" on the website UI.
# (3) Set the code directory to "/{path}/bert" on the website UI.
# (4) Set the startup file to "/{path}/bert/run_pretrain.py" on the website UI.
# (5) Perform a or b.
# a. Set parameters in /{path}/bert/pretrain_config.yaml.
# 1. Set "enable_modelarts: True"
# 2. Set other parameters; for their configuration refer to `./scripts/run_distributed_pretrain_ascend.sh`
# b. Add parameters on the website UI.
# 1. Add "enable_modelarts=True"
# 2. Add other parameters; for their configuration refer to `./scripts/run_distributed_pretrain_ascend.sh`
# (6) Upload the dataset to the S3 bucket.
# (7) Check "data storage location" on the website UI and set the "Dataset path" (this path should contain only the data or a zip package).
# (8) Set the "Output file path" and "Job log path" to your paths on the website UI.
# (9) Under "resource pool selection", select the 8-card specification.
# (10) Create your job.
# After training, the '*.ckpt' files will be saved under the 'training output file path'.
```
- Running downstream tasks with a single card on ModelArts
```python
# (1) Upload the code folder to the S3 bucket.
# (2) Click "create training task" on the website UI.
# (3) Set the code directory to "/{path}/bert" on the website UI.
# (4) Set the startup file to "/{path}/bert/run_ner.py" (or run_squad.py or run_classifier.py) on the website UI.
# (5) Perform a or b.
# a. Set parameters in task_ner_config.yaml (or task_squad_config.yaml or task_classifier_config.yaml) under the folder `/{path}/bert/`.
# 1. Set "enable_modelarts: True"
# 2. Set other parameters; for their configuration refer to `run_ner.sh` (or run_squad.sh or run_classifier.sh) under the folder '{path}/bert/scripts/'.
# b. Add parameters on the website UI.
# 1. Add "enable_modelarts=True"
# 2. Add other parameters; for their configuration refer to `run_ner.sh` (or run_squad.sh or run_classifier.sh) under the folder '{path}/bert/scripts/'.
# Note that vocab_file_path, label_file_path, train_data_file_path, eval_data_file_path and schema_file_path must be relative to the path selected in step 7.
# Finally, "config_path=../../*.yaml" must be added on the web page (choose the *.yaml configuration file for the downstream task).
# (6) Upload the dataset to the S3 bucket.
# (7) Check "data storage location" on the website UI and set the "Dataset path" (this path should contain only the data or a zip package).
# (8) Set the "Output file path" and "Job log path" to your paths on the website UI.
# (9) Under "resource pool selection", select the single-card specification.
# (10) Create your job.
# After training, the '*.ckpt' files will be saved under the 'training output file path'.
```
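Options (5a) and (5b) above supply the same parameters in two ways. The note in the downstream-task block also says data-file paths are relative to the dataset path chosen in step (7). The following is a minimal sketch of both ideas under three assumptions: the YAML defaults are already loaded into a dict, web-UI parameters reach the script as `--key=value` arguments, and the dataset is mounted at the YAML default `data_path` of `/cache/data`. `merge_config` and `resolve_data_paths` are hypothetical helpers, not the API of `src/model_utils/config.py`. The path joining mirrors `modelarts_pre_process()` in `run_classifier.py`, except that here every empty value is skipped, not only `schema_file_path`.

```python
import argparse
import os


def _str2bool(value):
    # Accept the string forms ("True", "true", "1") a web form or shell passes.
    return str(value).lower() in ("true", "1", "yes")


def merge_config(defaults, argv):
    """Overlay --key=value arguments (option b) on YAML defaults (option a).

    Only top-level scalar entries become options; nested sections such as
    the optimizer blocks are carried over unchanged.
    """
    parser = argparse.ArgumentParser(description="config override sketch")
    for key, value in defaults.items():
        if isinstance(value, bool):  # check bool first: bool is a subclass of int
            parser.add_argument("--" + key, type=_str2bool, default=value)
        elif isinstance(value, (int, float, str)):
            parser.add_argument("--" + key, type=type(value), default=value)
    merged = dict(defaults)
    merged.update(vars(parser.parse_args(argv)))
    return merged


# Keys the README says are relative to the dataset path chosen in step (7).
RELATIVE_KEYS = ("vocab_file_path", "label_file_path", "train_data_file_path",
                 "eval_data_file_path", "schema_file_path")


def resolve_data_paths(cfg, data_path="/cache/data"):
    """Return a copy of cfg with non-empty data-file paths joined onto data_path."""
    resolved = dict(cfg)
    for key in RELATIVE_KEYS:
        if resolved.get(key):
            resolved[key] = os.path.join(data_path, resolved[key])
    return resolved


if __name__ == "__main__":
    yaml_defaults = {"enable_modelarts": False, "epoch_num": 3,
                     "train_data_file_path": "", "schema_file_path": ""}
    cfg = merge_config(yaml_defaults, ["--enable_modelarts=True",
                                       "--train_data_file_path=cls/train.tf_record"])
    print(resolve_data_paths(cfg))
```

`os.path.join` discards `data_path` when the second argument is absolute. An absolute path in the config therefore silently bypasses the dataset directory, which is one reason to keep these paths relative as the README asks.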
For distributed training on Ascend, an HCCL configuration file in JSON format must be created in advance.
For distributed training on a single machine, [here](https://gitee.com/mindspore/mindspore/tree/master/config/hccl_single_machine_multi_rank.json) is an example hccl.json.
@ -205,7 +255,9 @@ For example, the schema file of cn-wiki-128 dataset for pretraining shows as fol
```shell
.
└─bert
├─ascend310_infer
├─README.md
├─README_CN.md
├─scripts
├─ascend_distributed_launcher
├─__init__.py
@ -220,6 +272,11 @@ For example, the schema file of cn-wiki-128 dataset for pretraining shows as fol
├─run_distributed_pretrain_gpu.sh # shell script for distributed pretrain on gpu
└─run_standaloned_pretrain_gpu.sh # shell script for standalone pretrain on gpu
├─src
├─model_utils
├── config.py # parse *.yaml parameter configuration file
├── device_adapter.py # distinguish local/ModelArts training
├── local_adapter.py # get related environment variables in local training
└── moxing_adapter.py # get related environment variables in ModelArts training
├─__init__.py
├─assessment_method.py # assessment method for evaluation
├─bert_for_finetune.py # backbone code of network
@ -227,13 +284,15 @@ For example, the schema file of cn-wiki-128 dataset for pretraining shows as fol
├─bert_model.py # backbone code of network
├─finetune_data_preprocess.py # data preprocessing
├─cluner_evaluation.py # evaluation for cluner
├─config.py # parameter configuration for pretraining
├─CRF.py # assessment method for clue dataset
├─dataset.py # data preprocessing
├─finetune_eval_config.py # parameter configuration for finetuning
├─finetune_eval_model.py # backbone code of network
├─sample_process.py # sample processing
├─utils.py # util function
├─pretrain_config.yaml # parameter configuration for pretrain
├─task_ner_config.yaml # parameter configuration for downstream_task_ner
├─task_classifier_config.yaml # parameter configuration for downstream_task_classifier
├─task_squad_config.yaml # parameter configuration for downstream_task_squad
├─pretrain_eval.py # train and eval net
├─run_classifier.py # finetune and eval net for classifier task
├─run_ner.py # finetune and eval net for ner task
@ -591,8 +650,38 @@ The result will be as follows:
### [Export MindIR](#contents)
- Export locally
```shell
python export.py --ckpt_file [CKPT_PATH] --file_name [FILE_NAME] --file_format [FILE_FORMAT]
python export.py --config_path [../../*.yaml] --ckpt_file [CKPT_PATH] --file_name [FILE_NAME] --file_format [FILE_FORMAT]
```
- Export on ModelArts (if you want to run on ModelArts, please check the official [ModelArts documentation](https://support.huaweicloud.com/modelarts/); you can then start as follows)
```python
# (1) Upload the code folder to the S3 bucket.
# (2) Click "create training task" on the website UI.
# (3) Set the code directory to "/{path}/bert" on the website UI.
# (4) Set the startup file to "/{path}/bert/export.py" on the website UI.
# (5) Perform a or b.
# a. Set parameters in task_ner_config.yaml (or task_squad_config.yaml or task_classifier_config.yaml) under the folder `/{path}/bert/`.
# 1. Set "enable_modelarts: True"
# 2. Set "export_ckpt_file: ./{path}/*.ckpt" ('export_ckpt_file' is the path of the weight file to export, relative to `export.py`; the weight file must be included in the code directory.)
# 3. Set "export_file_name: bert_ner"
# 4. Set "file_format: MINDIR"
# 5. Set "label_file_path: {path}/*.txt" ('label_file_path' is relative to the folder selected in step 7.)
# b. Add parameters on the website UI.
# 1. Add "enable_modelarts=True"
# 2. Add "export_ckpt_file=./{path}/*.ckpt" ('export_ckpt_file' is the path of the weight file to export, relative to `export.py`; the weight file must be included in the code directory.)
# 3. Add "export_file_name=bert_ner"
# 4. Add "file_format=MINDIR"
# 5. Add "label_file_path={path}/*.txt" ('label_file_path' is relative to the folder selected in step 7.)
# Finally, "config_path=../../*.yaml" must be added on the web page (choose the *.yaml configuration file for the downstream task).
# (7) Check "data storage location" on the website UI and set the "Dataset path".
# (8) Set the "Output file path" and "Job log path" to your paths on the website UI.
# (9) Under "resource pool selection", select the single-card specification.
# (10) Create your job.
# You will see bert_ner.mindir under {Output file path}.
```
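Whichever way `export.py` is launched, it traces the network with all-zero int32 placeholders of shape `[batch_size, seq_length]`. It passes a fourth `label_ids` input only for NER with CRF (see `run_export()` in the export.py diff below). The following is a NumPy-only sketch of that input list, assuming a `seq_length` of 128 as in `base_net_cfg`. The real script wraps each array in `mindspore.Tensor` before calling `export()`; that step is left out here so the example runs without MindSpore. `make_export_inputs` is a hypothetical name.

```python
import numpy as np


def make_export_inputs(description, use_crf, batch_size, seq_length):
    """Build placeholder inputs in the order export.py passes them to export()."""
    shape = (batch_size, seq_length)
    input_ids = np.zeros(shape, dtype=np.int32)
    input_mask = np.zeros(shape, dtype=np.int32)
    token_type_id = np.zeros(shape, dtype=np.int32)
    if description == "run_ner" and use_crf.lower() == "true":
        # The CRF variant of BertNER also takes label_ids as a graph input.
        label_ids = np.zeros(shape, dtype=np.int32)
        return [input_ids, input_mask, token_type_id, label_ids]
    return [input_ids, input_mask, token_type_id]


if __name__ == "__main__":
    inputs = make_export_inputs("run_ner", "true", batch_size=1, seq_length=128)
    print(len(inputs), inputs[0].shape, inputs[0].dtype)
```

Only the shapes and dtypes matter for tracing. The zero values are never interpreted, so any batch size set through `export_batch_size` becomes the fixed input shape of the exported model.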
The ckpt_file parameter is required,


@ -139,6 +139,54 @@ bash scripts/run_distributed_pretrain_for_gpu.sh 8 40 /path/cn-wiki-128
bash scripts/run_squad.sh
```
- Running on ModelArts (if you want to run on ModelArts, refer to the [modelarts](https://support.huaweicloud.com/modelarts/) documentation)
- Pretraining with 8 cards on ModelArts
```python
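# (1) Upload your code to the S3 bucket.
# (2) Create a training job on ModelArts.
# (3) Select the code directory /{path}/bert.
# (4) Select the startup file /{path}/bert/run_pretrain.py.
# (5) Perform a or b.
# a. Set parameters in /{path}/bert/pretrain_config.yaml.
# 1. Set "enable_modelarts: True"
# 2. Set other parameters; for their configuration refer to `./scripts/run_distributed_pretrain_ascend.sh`
# b. Set parameters on the web page.
# 1. Add "enable_modelarts=True"
# 2. Add other parameters; for their configuration refer to `./scripts/run_distributed_pretrain_ascend.sh`
# (6) Upload your data to the S3 bucket.
# (7) On the web page, check the data storage location and set the "training dataset" path.
# (8) On the web page, set the "training output file path" and "job log path".
# (9) On the web page, under "resource pool selection", select an 8-card resource specification.
# (10) Create the training job.
# After training, the trained weights are saved under the 'training output file path'.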
```
- Running downstream tasks with a single card on ModelArts
```python
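# (1) Upload your code to the S3 bucket.
# (2) Create a training job on ModelArts.
# (3) Select the code directory /{path}/bert.
# (4) Select the startup file /{path}/bert/run_ner.py (or run_squad.py or run_classifier.py).
# (5) Perform a or b.
# a. Set parameters in `task_ner_config.yaml` (or `task_squad_config.yaml` or `task_classifier_config.yaml`) under /{path}/bert.
# 1. Set "enable_modelarts: True"
# 2. Set other parameters; for their configuration refer to `run_ner.sh`, `run_squad.sh` or `run_classifier.sh` under './scripts/'.
# b. Set parameters on the web page.
# 1. Add "enable_modelarts=True"
# 2. Add other parameters; for their configuration refer to `run_ner.sh`, `run_squad.sh` or `run_classifier.sh` under './scripts/'.
# Note: vocab_file_path, label_file_path, train_data_file_path, eval_data_file_path and schema_file_path must be relative to the path selected in step 7.
# Finally, "config_path=../../*.yaml" must be added on the web page (choose the *.yaml configuration file for the downstream task).
# (6) Upload your data to the S3 bucket.
# (7) On the web page, check the data storage location and set the "training dataset" path (this path should contain only the data or a zip archive of the data).
# (8) On the web page, set the "training output file path" and "job log path".
# (9) On the web page, under "resource pool selection", select a single-card resource specification.
# (10) Create the training job.
# After training, the trained weights are saved under the 'training output file path'.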
```
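For distributed training on Ascend devices, create an HCCL configuration file in JSON format in advance.
For single-machine distributed training on Ascend devices, refer to [here](https://gitee.com/mindspore/mindspore/tree/master/config/hccl_single_machine_multi_rank.json) to create the HCCL configuration file.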
@ -207,12 +255,14 @@ For example, the schema file of cn-wiki-128 dataset for pretraining shows as fol
```shell
.
└─bert
├─ascend310_infer
├─README.md
├─README_CN.md
├─scripts
├─ascend_distributed_launcher
├─__init__.py
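├─hyper_parameter_config.ini # hyper-parameters for distributed pretraining
├─get_distribute_pretrain_cmd.py # distributed pretraining script
├─get_distribute_pretrain_cmd.py # distributed pretraining script
--README.md
├─run_classifier.sh # shell script for standalone classifier task on Ascend or GPU
├─run_ner.sh # shell script for standalone NER task on Ascend or GPU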
@ -222,6 +272,11 @@ For example, the schema file of cn-wiki-128 dataset for pretraining shows as fol
├─run_distributed_pretrain_gpu.sh # shell script for distributed pretraining on GPU
└─run_standaloned_pretrain_gpu.sh # shell script for standalone pretraining on GPU
├─src
├─model_utils
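├── config.py # parse *.yaml parameter configuration file
├── device_adapter.py # distinguish local/ModelArts training
├── local_adapter.py # get related environment variables in local training
└── moxing_adapter.py # get related environment variables and exchange data in ModelArts training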
├─__init__.py
├─assessment_method.py # assessment method for evaluation
├─bert_for_finetune.py # backbone code of network
@ -229,13 +284,15 @@ For example, the schema file of cn-wiki-128 dataset for pretraining shows as fol
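├─bert_model.py # backbone code of network
├─finetune_data_preprocess.py # data preprocessing
├─cluner_evaluation.py # evaluation for cluner
├─config.py # parameter configuration for pretraining
├─CRF.py # assessment method for clue dataset
├─dataset.py # data preprocessing
├─finetune_eval_config.py # parameter configuration for finetuning
├─finetune_eval_model.py # backbone code of network
├─sample_process.py # sample processing
├─utils.py # util function
├─pretrain_config.yaml # parameter configuration for pretraining
├─task_ner_config.yaml # parameter configuration for downstream_task_ner
├─task_classifier_config.yaml # parameter configuration for downstream_task_classifier
├─task_squad_config.yaml # parameter configuration for downstream_task_squad
├─pretrain_eval.py # train and eval net
├─run_classifier.py # finetune and eval net for classifier task
├─run_ner.py # finetune and eval net for NER task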
@ -556,8 +613,38 @@ bash scripts/squad.sh
## Export MindIR model
- Export locally
```shell
python export.py --ckpt_file [CKPT_PATH] --file_name [FILE_NAME] --file_format [FILE_FORMAT]
python export.py --config_path [../../*.yaml] --ckpt_file [CKPT_PATH] --file_name [FILE_NAME] --file_format [FILE_FORMAT]
```
- Export on ModelArts
```python
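# (1) Upload your code to the S3 bucket.
# (2) Create a training job on ModelArts.
# (3) Select the code directory /{path}/bert.
# (4) Select the startup file /{path}/bert/export.py.
# (5) Perform a or b.
# a. Set parameters in `task_ner_config.yaml` (or `task_squad_config.yaml` or `task_classifier_config.yaml`) under /{path}/bert.
# 1. Set "enable_modelarts: True"
# 2. Set "export_ckpt_file: ./{path}/*.ckpt" ('export_ckpt_file' is the path of the '*.ckpt' weight file to export, relative to `export.py`; the weight file must be included in the code directory.)
# 3. Set "export_file_name: bert_ner"
# 4. Set "file_format: MINDIR"
# 5. Set "label_file_path: {path}/*.txt" ('label_file_path' is relative to the folder selected in step 7.)
# b. Set parameters on the web page.
# 1. Add "enable_modelarts=True"
# 2. Add "export_ckpt_file=./{path}/*.ckpt" ('export_ckpt_file' is the path of the '*.ckpt' weight file to export, relative to `export.py`; the weight file must be included in the code directory.)
# 3. Add "export_file_name=bert_ner"
# 4. Add "file_format=MINDIR"
# 5. Add "label_file_path={path}/*.txt" ('label_file_path' is relative to the folder selected in step 7.)
# Finally, "config_path=../../*.yaml" must be added on the web page (choose the *.yaml configuration file for the downstream task).
# (7) On the web page, check the data storage location and set the "training dataset" path.
# (8) On the web page, set the "training output file path" and "job log path".
# (9) On the web page, under "resource pool selection", select a single-card resource specification.
# (10) Create the training job.
# You will see the 'bert_ner.mindir' file under {Output file path}.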
```
The `ckpt_file` parameter is required, and `EXPORT_FORMAT` must be chosen from ["AIR", "MINDIR"].


@ -13,74 +13,77 @@
# limitations under the License.
# ============================================================================
"""export checkpoint file into models"""
import argparse
import os
import numpy as np
import mindspore.common.dtype as mstype
from mindspore import Tensor, context, load_checkpoint, export
from src.finetune_eval_model import BertCLSModel, BertSquadModel, BertNERModel
from src.finetune_eval_config import bert_net_cfg
from src.bert_for_finetune import BertNER
from src.utils import convert_labels_to_index
from src.model_utils.config import config as args, bert_net_cfg
from src.model_utils.moxing_adapter import moxing_wrapper
from src.model_utils.device_adapter import get_device_id
parser = argparse.ArgumentParser(description="Bert export")
parser.add_argument("--device_id", type=int, default=0, help="Device id")
parser.add_argument("--use_crf", type=str, default="false", help="Use cfg, default is false.")
parser.add_argument("--downstream_task", type=str, choices=["NER", "CLS", "SQUAD"], default="NER",
help="at presentsupport NER only")
parser.add_argument("--batch_size", type=int, default=1, help="batch size")
parser.add_argument("--label_file_path", type=str, default="", help="label file path, used in clue benchmark.")
parser.add_argument("--ckpt_file", type=str, required=True, help="Bert ckpt file.")
parser.add_argument("--file_name", type=str, default="Bert", help="bert output air name.")
parser.add_argument("--file_format", type=str, choices=["AIR", "ONNX", "MINDIR"], default="AIR", help="file format")
parser.add_argument("--device_target", type=str, default="Ascend",
choices=["Ascend", "GPU", "CPU"], help="device target (default: Ascend)")
args = parser.parse_args()
context.set_context(mode=context.GRAPH_MODE, device_target=args.device_target)
if args.device_target == "Ascend":
context.set_context(device_id=args.device_id)
def modelarts_pre_process():
'''modelarts pre process function.'''
args.device_id = get_device_id()
_file_dir = os.path.dirname(os.path.abspath(__file__))
args.export_ckpt_file = os.path.join(_file_dir, args.export_ckpt_file)
args.label_file_path = os.path.join(args.data_path, args.label_file_path)
args.export_file_name = os.path.join(_file_dir, args.export_file_name)
label_list = []
with open(args.label_file_path) as f:
for label in f:
label_list.append(label.strip())
tag_to_index = convert_labels_to_index(label_list)
@moxing_wrapper(pre_process=modelarts_pre_process)
def run_export():
'''export function'''
context.set_context(mode=context.GRAPH_MODE, device_target=args.device_target)
if args.device_target == "Ascend":
context.set_context(device_id=args.device_id)
if args.use_crf.lower() == "true":
max_val = max(tag_to_index.values())
tag_to_index["<START>"] = max_val + 1
tag_to_index["<STOP>"] = max_val + 2
number_labels = len(tag_to_index)
else:
number_labels = len(tag_to_index)
label_list = []
with open(args.label_file_path) as f:
for label in f:
label_list.append(label.strip())
if __name__ == "__main__":
if args.downstream_task == "NER":
tag_to_index = convert_labels_to_index(label_list)
if args.use_crf.lower() == "true":
max_val = max(tag_to_index.values())
tag_to_index["<START>"] = max_val + 1
tag_to_index["<STOP>"] = max_val + 2
number_labels = len(tag_to_index)
else:
number_labels = len(tag_to_index)
if args.description == "run_ner":
if args.use_crf.lower() == "true":
net = BertNER(bert_net_cfg, args.batch_size, False, num_labels=number_labels,
net = BertNER(bert_net_cfg, args.export_batch_size, False, num_labels=number_labels,
use_crf=True, tag_to_index=tag_to_index)
else:
net = BertNERModel(bert_net_cfg, False, number_labels, use_crf=(args.use_crf.lower() == "true"))
elif args.downstream_task == "CLS":
elif args.description == "run_classifier":
net = BertCLSModel(bert_net_cfg, False, num_labels=number_labels)
elif args.downstream_task == "SQUAD":
elif args.description == "run_squad":
net = BertSquadModel(bert_net_cfg, False)
else:
raise ValueError("unsupported downstream task")
load_checkpoint(args.ckpt_file, net=net)
load_checkpoint(args.export_ckpt_file, net=net)
net.set_train(False)
input_ids = Tensor(np.zeros([args.batch_size, bert_net_cfg.seq_length]), mstype.int32)
input_mask = Tensor(np.zeros([args.batch_size, bert_net_cfg.seq_length]), mstype.int32)
token_type_id = Tensor(np.zeros([args.batch_size, bert_net_cfg.seq_length]), mstype.int32)
label_ids = Tensor(np.zeros([args.batch_size, bert_net_cfg.seq_length]), mstype.int32)
input_ids = Tensor(np.zeros([args.export_batch_size, bert_net_cfg.seq_length]), mstype.int32)
input_mask = Tensor(np.zeros([args.export_batch_size, bert_net_cfg.seq_length]), mstype.int32)
token_type_id = Tensor(np.zeros([args.export_batch_size, bert_net_cfg.seq_length]), mstype.int32)
label_ids = Tensor(np.zeros([args.export_batch_size, bert_net_cfg.seq_length]), mstype.int32)
if args.downstream_task == "NER" and args.use_crf.lower() == "true":
if args.description == "run_ner" and args.use_crf.lower() == "true":
input_data = [input_ids, input_mask, token_type_id, label_ids]
else:
input_data = [input_ids, input_mask, token_type_id]
export(net, *input_data, file_name=args.file_name, file_format=args.file_format)
export(net, *input_data, file_name=args.export_file_name, file_format=args.file_format)
if __name__ == "__main__":
run_export()
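When `use_crf` is true, `run_export()` appends two extra tags to the label mapping before sizing the output layer. The following is a standalone sketch of that step. It starts from an already-built `tag_to_index` dict, because `convert_labels_to_index` in `src/utils.py` is not shown in this diff. The helper name `add_crf_tags` is hypothetical, and unlike the original code it copies the mapping instead of mutating it in place.

```python
def add_crf_tags(tag_to_index, use_crf):
    """Return (mapping, number_labels) as computed in run_export().

    With CRF enabled, <START> and <STOP> get the next two indices after the
    largest existing one, so the label count grows by two.
    """
    mapping = dict(tag_to_index)  # copy so the caller's dict is not mutated
    if use_crf.lower() == "true":
        max_val = max(mapping.values())
        mapping["<START>"] = max_val + 1
        mapping["<STOP>"] = max_val + 2
    return mapping, len(mapping)


if __name__ == "__main__":
    tags = {"O": 0, "B-PER": 1, "I-PER": 2}
    print(add_crf_tags(tags, "true"))
```

The label count equals `max index + 1` only when the indices are contiguous from 0. That is why both the original code and the sketch use `len()` for the count and `max()` only to pick the new indices.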


@ -21,7 +21,7 @@ import os
import argparse
import numpy as np
from mindspore import Tensor
from src.finetune_eval_config import bert_net_cfg
from src.model_utils.config import bert_net_cfg
from src.assessment_method import Accuracy, F1, MCC, Spearman_Correlation
from run_ner import eval_result_print


@ -0,0 +1,174 @@
# Builtin Configurations (DO NOT CHANGE THESE CONFIGURATIONS unless you know exactly what you are doing)
enable_modelarts: False
# Url for modelarts
data_url: ""
train_url: ""
checkpoint_url: ""
# Path for local
data_path: "/cache/data"
output_path: "/cache/train"
load_path: "/cache/checkpoint_path"
device_target: "Ascend"
enable_profiling: False
# ==============================================================================
description: 'run_pretrain'
distribute: 'false'
epoch_size: 1
device_id: 0
device_num: 1
enable_save_ckpt: 'true'
enable_lossscale: 'true'
do_shuffle: 'true'
enable_data_sink: 'true'
data_sink_steps: 1
accumulation_steps: 1
allreduce_post_accumulation: 'true'
save_checkpoint_path: ''
load_checkpoint_path: ''
save_checkpoint_steps: 1000
train_steps: -1
save_checkpoint_num: 1
data_dir: ''
schema_dir: ''
# ==============================================================================
# pretrain related
batch_size: 32
bert_network: 'base'
loss_scale_value: 65536
scale_factor: 2
scale_window: 1000
optimizer: 'Lamb'
enable_global_norm: False
# pretrain_eval related
data_file: ""
schema_file: ""
finetune_ckpt: ""
# optimizer related
AdamWeightDecay:
  learning_rate: 0.00003 # 3e-5
  end_learning_rate: 0.0
  power: 5.0
  weight_decay: 0.00001 # 1e-5
  decay_filter: ['layernorm', 'bias']
  eps: 0.000001 # 1e-6
  warmup_steps: 10000
Lamb:
  learning_rate: 0.0003 # 3e-4
  end_learning_rate: 0.0
  power: 2.0
  warmup_steps: 10000
  weight_decay: 0.01
  decay_filter: ['layernorm', 'bias']
  eps: 0.00000001 # 1e-8
Momentum:
  learning_rate: 0.00002 # 2e-5
  momentum: 0.9
Thor:
  lr_max: 0.0034
  lr_min: 0.00003244 # 3.244e-5
  lr_power: 1.0
  lr_total_steps: 30000
  damping_max: 0.05 # 5e-2
  damping_min: 0.000001 # 1e-6
  damping_power: 1.0
  damping_total_steps: 30000
  momentum: 0.9
  weight_decay: 0.0005 # 5e-4
  loss_scale: 1.0
  frequency: 100
# ==============================================================================
# base
base_batch_size: 256
base_net_cfg:
  seq_length: 128
  vocab_size: 21128
  hidden_size: 768
  num_hidden_layers: 12
  num_attention_heads: 12
  intermediate_size: 3072
  hidden_act: "gelu"
  hidden_dropout_prob: 0.1
  attention_probs_dropout_prob: 0.1
  max_position_embeddings: 512
  type_vocab_size: 2
  initializer_range: 0.02
  use_relative_positions: False
  dtype: mstype.float32
  compute_type: mstype.float16
# nezha
nezha_batch_size: 96
nezha_net_cfg:
  seq_length: 128
  vocab_size: 21128
  hidden_size: 1024
  num_hidden_layers: 24
  num_attention_heads: 16
  intermediate_size: 4096
  hidden_act: "gelu"
  hidden_dropout_prob: 0.1
  attention_probs_dropout_prob: 0.1
  max_position_embeddings: 512
  type_vocab_size: 2
  initializer_range: 0.02
  use_relative_positions: True
  dtype: mstype.float32
  compute_type: mstype.float16
# large
large_batch_size: 24
large_net_cfg:
  seq_length: 512
  vocab_size: 30522
  hidden_size: 1024
  num_hidden_layers: 24
  num_attention_heads: 16
  intermediate_size: 4096
  hidden_act: "gelu"
  hidden_dropout_prob: 0.1
  attention_probs_dropout_prob: 0.1
  max_position_embeddings: 512
  type_vocab_size: 2
  initializer_range: 0.02
  use_relative_positions: False
  dtype: mstype.float32
  compute_type: mstype.float16
---
# Help description for each configuration
enable_modelarts: "Whether training on modelarts, default: False"
data_url: "Url for modelarts"
train_url: "Url for modelarts"
data_path: "The location of the input data."
output_path: "The location of the output file."
device_target: "Running platform, choose from Ascend or GPU, and default is Ascend."
enable_profiling: 'Whether to enable profiling while training, default: False'
distribute: "Run distribute, default is 'false'."
epoch_size: "Epoch size, default is 1."
enable_save_ckpt: "Enable save checkpoint, default is true."
enable_lossscale: "Use loss scale or not, default is true."
do_shuffle: "Enable shuffle for dataset, default is true."
enable_data_sink: "Enable data sink, default is true."
data_sink_steps: "Sink steps for each epoch, default is 1."
accumulation_steps: "Accumulating gradients N times before weight update, default is 1."
allreduce_post_accumulation: "Whether to allreduce after accumulation of N steps or after each step, default is true."
save_checkpoint_path: "Save checkpoint path"
load_checkpoint_path: "Load checkpoint file path"
save_checkpoint_steps: "Save checkpoint steps, default is 1000"
train_steps: "Training Steps, default is -1, meaning run all steps according to epoch number."
save_checkpoint_num: "Save checkpoint numbers, default is 1."
data_dir: "Data path, it is better to use absolute path"
schema_dir: "Schema path, it is better to use absolute path"
---
# choices
device_target: ['Ascend', 'GPU']
distribute: ["true", "false"]
enable_save_ckpt: ["true", "false"]
enable_lossscale: ["true", "false"]
do_shuffle: ["true", "false"]
enable_data_sink: ["true", "false"]
allreduce_post_accumulation: ["true", "false"]
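The file holds three YAML documents separated by `---`: default values, help strings, and allowed choices. The sketch below shows two checks those sections imply: validating values against the choices section, and picking the optimizer sub-section named by `optimizer`. It uses plain dicts in place of the parsed documents, so it needs no YAML library. `check_choices` and `optimizer_settings` are hypothetical helpers, not the `src/model_utils/config.py` implementation.

```python
def check_choices(config, choices):
    """Raise ValueError if any configured value is outside its allowed choices."""
    for key, allowed in choices.items():
        if key in config and config[key] not in allowed:
            raise ValueError("{}={!r} is not one of {}".format(key, config[key], allowed))


def optimizer_settings(config):
    """Return the hyper-parameter block for the optimizer selected by `optimizer`."""
    name = config["optimizer"]
    if name not in config:
        raise ValueError("no settings section for optimizer {!r}".format(name))
    return config[name]


if __name__ == "__main__":
    config = {
        "device_target": "Ascend",
        "distribute": "false",
        "optimizer": "Lamb",
        "Lamb": {"learning_rate": 0.0003, "warmup_steps": 10000},
    }
    choices = {"device_target": ["Ascend", "GPU"], "distribute": ["true", "false"]}
    check_choices(config, choices)
    print(optimizer_settings(config)["learning_rate"])
```

Because the choices section lists only `Ascend` and `GPU` for `device_target`, a check like this would reject `CPU` even though `export.py` previously accepted it on the command line.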


@ -19,7 +19,7 @@ Bert evaluation script.
import os
from src import BertModel, GetMaskedLMOutput
from src.config import cfg, bert_net_cfg
from src.model_utils.config import config as cfg, bert_net_cfg
import mindspore.common.dtype as mstype
from mindspore import context
from mindspore.common.tensor import Tensor


@ -18,12 +18,6 @@ Bert finetune and evaluation script.
'''
import os
import argparse
from src.bert_for_finetune import BertFinetuneCell, BertCLS
from src.finetune_eval_config import optimizer_cfg, bert_net_cfg
from src.dataset import create_classification_dataset
from src.assessment_method import Accuracy, F1, MCC, Spearman_Correlation
from src.utils import make_directory, LossCallBack, LoadNewestCkpt, BertLearningRate
import mindspore.common.dtype as mstype
from mindspore import context
from mindspore import log as logger
@ -33,8 +27,17 @@ from mindspore.train.model import Model
from mindspore.train.callback import CheckpointConfig, ModelCheckpoint, TimeMonitor
from mindspore.train.serialization import load_checkpoint, load_param_into_net
from src.bert_for_finetune import BertFinetuneCell, BertCLS
from src.dataset import create_classification_dataset
from src.assessment_method import Accuracy, F1, MCC, Spearman_Correlation
from src.utils import make_directory, LossCallBack, LoadNewestCkpt, BertLearningRate
from src.model_utils.config import config as args_opt, optimizer_cfg, bert_net_cfg
from src.model_utils.moxing_adapter import moxing_wrapper
from src.model_utils.device_adapter import get_device_id
_cur_dir = os.getcwd()
def do_train(dataset=None, network=None, load_checkpoint_path="", save_checkpoint_path="", epoch_num=1):
""" do train """
if load_checkpoint_path == "":
@ -81,6 +84,7 @@ def do_train(dataset=None, network=None, load_checkpoint_path="", save_checkpoin
callbacks = [TimeMonitor(dataset.get_dataset_size()), LossCallBack(dataset.get_dataset_size()), ckpoint_cb]
model.train(epoch_num, dataset, callbacks=callbacks)
def eval_result_print(assessment_method="accuracy", callback=None):
""" print eval result """
if assessment_method == "accuracy":
@ -97,6 +101,7 @@ def eval_result_print(assessment_method="accuracy", callback=None):
else:
raise ValueError("Assessment method not supported, support: [accuracy, f1, mcc, spearman_correlation]")
def do_eval(dataset=None, network=None, num_class=2, assessment_method="accuracy", load_checkpoint_path=""):
""" do eval """
if load_checkpoint_path == "":
@ -130,51 +135,34 @@ def do_eval(dataset=None, network=None, num_class=2, assessment_method="accuracy
eval_result_print(assessment_method, callback)
print("==============================================================")
def modelarts_pre_process():
'''modelarts pre process function.'''
args_opt.device_id = get_device_id()
_file_dir = os.path.dirname(os.path.abspath(__file__))
args_opt.load_pretrain_checkpoint_path = os.path.join(_file_dir, args_opt.load_pretrain_checkpoint_path)
args_opt.load_finetune_checkpoint_path = os.path.join(args_opt.output_path, args_opt.load_finetune_checkpoint_path)
args_opt.save_finetune_checkpoint_path = os.path.join(args_opt.output_path, args_opt.save_finetune_checkpoint_path)
if args_opt.schema_file_path:
args_opt.schema_file_path = os.path.join(args_opt.data_path, args_opt.schema_file_path)
args_opt.train_data_file_path = os.path.join(args_opt.data_path, args_opt.train_data_file_path)
args_opt.eval_data_file_path = os.path.join(args_opt.data_path, args_opt.eval_data_file_path)
@moxing_wrapper(pre_process=modelarts_pre_process)
def run_classifier():
"""run classifier task"""
parser = argparse.ArgumentParser(description="run classifier")
parser.add_argument("--device_target", type=str, default="Ascend", choices=["Ascend", "GPU"],
help="Device type, default is Ascend")
parser.add_argument("--assessment_method", type=str, default="Accuracy",
choices=["Mcc", "Spearman_correlation", "Accuracy", "F1"],
help="assessment_method including [Mcc, Spearman_correlation, Accuracy, F1],\
default is Accuracy")
parser.add_argument("--do_train", type=str, default="false", choices=["true", "false"],
help="Enable train, default is false")
parser.add_argument("--do_eval", type=str, default="false", choices=["true", "false"],
help="Enable eval, default is false")
parser.add_argument("--device_id", type=int, default=0, help="Device id, default is 0.")
parser.add_argument("--epoch_num", type=int, default=3, help="Epoch number, default is 3.")
parser.add_argument("--num_class", type=int, default=2, help="The number of class, default is 2.")
parser.add_argument("--train_data_shuffle", type=str, default="true", choices=["true", "false"],
help="Enable train data shuffle, default is true")
parser.add_argument("--eval_data_shuffle", type=str, default="false", choices=["true", "false"],
help="Enable eval data shuffle, default is false")
parser.add_argument("--train_batch_size", type=int, default=32, help="Train batch size, default is 32")
parser.add_argument("--eval_batch_size", type=int, default=1, help="Eval batch size, default is 1")
parser.add_argument("--save_finetune_checkpoint_path", type=str, default="", help="Save checkpoint path")
parser.add_argument("--load_pretrain_checkpoint_path", type=str, default="", help="Load checkpoint file path")
parser.add_argument("--load_finetune_checkpoint_path", type=str, default="", help="Load checkpoint file path")
parser.add_argument("--train_data_file_path", type=str, default="",
help="Data path, it is better to use absolute path")
parser.add_argument("--eval_data_file_path", type=str, default="",
help="Data path, it is better to use absolute path")
parser.add_argument("--schema_file_path", type=str, default="",
help="Schema path, it is better to use absolute path")
args_opt = parser.parse_args()
epoch_num = args_opt.epoch_num
assessment_method = args_opt.assessment_method.lower()
load_pretrain_checkpoint_path = args_opt.load_pretrain_checkpoint_path
save_finetune_checkpoint_path = args_opt.save_finetune_checkpoint_path
load_finetune_checkpoint_path = args_opt.load_finetune_checkpoint_path
if args_opt.do_train.lower() == "false" and args_opt.do_eval.lower() == "false":
raise ValueError("At least one of 'do_train' or 'do_eval' must be true")
if args_opt.do_train.lower() == "true" and args_opt.train_data_file_path == "":
raise ValueError("'train_data_file_path' must be set when do finetune task")
if args_opt.do_eval.lower() == "true" and args_opt.eval_data_file_path == "":
raise ValueError("'eval_data_file_path' must be set when do evaluation task")
epoch_num = args_opt.epoch_num
assessment_method = args_opt.assessment_method.lower()
load_pretrain_checkpoint_path = args_opt.load_pretrain_checkpoint_path
save_finetune_checkpoint_path = args_opt.save_finetune_checkpoint_path
load_finetune_checkpoint_path = args_opt.load_finetune_checkpoint_path
target = args_opt.device_target
if target == "Ascend":
context.set_context(mode=context.GRAPH_MODE, device_target="Ascend", device_id=args_opt.device_id)
@ -214,5 +202,6 @@ def run_classifier():
do_shuffle=(args_opt.eval_data_shuffle.lower() == "true"))
do_eval(ds, BertCLS, args_opt.num_class, assessment_method, load_finetune_checkpoint_path)
if __name__ == "__main__":
run_classifier()


@ -18,13 +18,7 @@ Bert finetune and evaluation script.
'''
import os
import argparse
import time
from src.bert_for_finetune import BertFinetuneCell, BertNER
from src.finetune_eval_config import optimizer_cfg, bert_net_cfg
from src.dataset import create_ner_dataset
from src.utils import make_directory, LossCallBack, LoadNewestCkpt, BertLearningRate, convert_labels_to_index
from src.assessment_method import Accuracy, F1, MCC, Spearman_Correlation
import mindspore.common.dtype as mstype
from mindspore import context
from mindspore import log as logger
@ -34,6 +28,13 @@ from mindspore.train.model import Model
from mindspore.train.callback import CheckpointConfig, ModelCheckpoint, TimeMonitor
from mindspore.train.serialization import load_checkpoint, load_param_into_net
from src.bert_for_finetune import BertFinetuneCell, BertNER
from src.dataset import create_ner_dataset
from src.utils import make_directory, LossCallBack, LoadNewestCkpt, BertLearningRate, convert_labels_to_index
from src.assessment_method import Accuracy, F1, MCC, Spearman_Correlation
from src.model_utils.config import config as args_opt, optimizer_cfg, bert_net_cfg
from src.model_utils.moxing_adapter import moxing_wrapper
from src.model_utils.device_adapter import get_device_id
_cur_dir = os.getcwd()
@ -85,6 +86,7 @@ def do_train(dataset=None, network=None, load_checkpoint_path="", save_checkpoin
train_end = time.time()
print("latency: {:.6f} s".format(train_end - train_begin))
def eval_result_print(assessment_method="accuracy", callback=None):
"""print eval result"""
if assessment_method == "accuracy":
@ -103,6 +105,7 @@ def eval_result_print(assessment_method="accuracy", callback=None):
else:
raise ValueError("Assessment method not supported, supported: [accuracy, f1, mcc, spearman_correlation]")
def do_eval(dataset=None, network=None, use_crf="", num_class=41, assessment_method="accuracy", data_file="",
load_checkpoint_path="", vocab_file="", label_file="", tag_to_index=None, batch_size=1):
""" do eval """
@ -146,41 +149,22 @@ def do_eval(dataset=None, network=None, use_crf="", num_class=41, assessment_met
print("==============================================================")
def parse_args():
"""set and check parameters."""
parser = argparse.ArgumentParser(description="run ner")
parser.add_argument("--device_target", type=str, default="Ascend", choices=["Ascend", "GPU"],
help="Device type, default is Ascend")
parser.add_argument("--assessment_method", type=str, default="BF1", choices=["BF1", "clue_benchmark", "MF1"],
help="assessment_method include: [BF1, clue_benchmark, MF1], default is BF1")
parser.add_argument("--do_train", type=str, default="false", choices=["true", "false"],
help="Enable train, default is false")
parser.add_argument("--do_eval", type=str, default="false", choices=["true", "false"],
help="Enable eval, default is false")
parser.add_argument("--use_crf", type=str, default="false", choices=["true", "false"],
help="Use crf, default is false")
parser.add_argument("--device_id", type=int, default=0, help="Device id, default is 0.")
parser.add_argument("--epoch_num", type=int, default=5, help="Epoch number, default is 5.")
parser.add_argument("--train_data_shuffle", type=str, default="true", choices=["true", "false"],
help="Enable train data shuffle, default is true")
parser.add_argument("--eval_data_shuffle", type=str, default="false", choices=["true", "false"],
help="Enable eval data shuffle, default is false")
parser.add_argument("--train_batch_size", type=int, default=32, help="Train batch size, default is 32")
parser.add_argument("--eval_batch_size", type=int, default=1, help="Eval batch size, default is 1")
parser.add_argument("--vocab_file_path", type=str, default="", help="Vocab file path, used in clue benchmark")
parser.add_argument("--label_file_path", type=str, default="", help="label file path, used in clue benchmark")
parser.add_argument("--save_finetune_checkpoint_path", type=str, default="", help="Save finetuned checkpoint path")
parser.add_argument("--load_pretrain_checkpoint_path", type=str, default="", help="Load pretrained checkpoint file path")
parser.add_argument("--load_finetune_checkpoint_path", type=str, default="", help="Load finetuned checkpoint file path")
parser.add_argument("--train_data_file_path", type=str, default="",
help="Train data path; an absolute path is recommended")
parser.add_argument("--eval_data_file_path", type=str, default="",
help="Eval data path; an absolute path is recommended")
parser.add_argument("--dataset_format", type=str, default="mindrecord", choices=["mindrecord", "tfrecord"],
help="Dataset format: mindrecord or tfrecord")
parser.add_argument("--schema_file_path", type=str, default="",
help="Schema path; an absolute path is recommended")
args_opt = parser.parse_args()
def modelarts_pre_process():
'''modelarts pre process function.'''
args_opt.device_id = get_device_id()
_file_dir = os.path.dirname(os.path.abspath(__file__))
args_opt.load_pretrain_checkpoint_path = os.path.join(_file_dir, args_opt.load_pretrain_checkpoint_path)
args_opt.load_finetune_checkpoint_path = os.path.join(args_opt.output_path, args_opt.load_finetune_checkpoint_path)
args_opt.save_finetune_checkpoint_path = os.path.join(args_opt.output_path, args_opt.save_finetune_checkpoint_path)
if args_opt.schema_file_path:
args_opt.schema_file_path = os.path.join(args_opt.data_path, args_opt.schema_file_path)
if args_opt.train_data_file_path:
args_opt.train_data_file_path = os.path.join(args_opt.data_path, args_opt.train_data_file_path)
if args_opt.eval_data_file_path:
args_opt.eval_data_file_path = os.path.join(args_opt.data_path, args_opt.eval_data_file_path)
if args_opt.label_file_path:
args_opt.label_file_path = os.path.join(args_opt.data_path, args_opt.label_file_path)
def determine_params():
"""Determine whether the parameters are reasonable."""
if args_opt.do_train.lower() == "false" and args_opt.do_eval.lower() == "false":
raise ValueError("At least one of 'do_train' or 'do_eval' must be true")
if args_opt.do_train.lower() == "true" and args_opt.train_data_file_path == "":
@ -193,14 +177,14 @@ def parse_args():
raise ValueError("'label_file_path' must be set to use crf")
if args_opt.assessment_method.lower() == "clue_benchmark" and args_opt.label_file_path == "":
raise ValueError("'label_file_path' must be set to do clue benchmark")
if args_opt.assessment_method.lower() == "clue_benchmark":
args_opt.eval_batch_size = 1
return args_opt
@moxing_wrapper(pre_process=modelarts_pre_process)
def run_ner():
"""run ner task"""
args_opt = parse_args()
determine_params()
if args_opt.assessment_method.lower() == "clue_benchmark":
args_opt.eval_batch_size = 1
epoch_num = args_opt.epoch_num
assessment_method = args_opt.assessment_method.lower()
load_pretrain_checkpoint_path = args_opt.load_pretrain_checkpoint_path
@ -262,5 +246,6 @@ def run_ner():
args_opt.eval_data_file_path, load_finetune_checkpoint_path, args_opt.vocab_file_path,
args_opt.label_file_path, tag_to_index, args_opt.eval_batch_size)
if __name__ == "__main__":
run_ner()

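Joining a path option onto `data_path` or `output_path` without checking it first is risky. `os.path.join` with an empty second argument still returns a non-empty string (the base directory plus a trailing separator on POSIX). An option the user left unset therefore no longer equals "", and emptiness checks such as the `== ""` tests in `determine_params()` can no longer catch it. The following is a minimal sketch of a guard. `join_if_set` and the `/cache/data` directory are illustrative assumptions, not names taken from the ModelArts adapter.

```python
import os


def join_if_set(base_dir, relative_path):
    """Prefix relative_path with base_dir only when relative_path is non-empty.

    os.path.join(base_dir, "") returns base_dir with a trailing separator,
    so an unset option would stop being "" and slip past emptiness checks.
    """
    if not relative_path:
        return relative_path
    return os.path.join(base_dir, relative_path)


data_path = "/cache/data"  # hypothetical local data directory on ModelArts
label_file_path = join_if_set(data_path, "")  # unset option stays ""
train_data_file_path = join_if_set(data_path, "train.tf_record")  # hypothetical file name
```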
@ -16,9 +16,7 @@
#################pre_train bert example on zh-wiki########################
python run_pretrain.py
"""
import os
import argparse
import mindspore.communication.management as D
from mindspore.communication.management import get_rank
import mindspore.common.dtype as mstype
@ -38,8 +36,10 @@ from src import BertNetworkWithLoss, BertTrainOneStepCell, BertTrainOneStepWithL
BertTrainOneStepWithLossScaleCellForAdam, \
AdamWeightDecayForBert, AdamWeightDecayOp
from src.dataset import create_bert_dataset
from src.config import cfg, bert_net_cfg
from src.utils import LossCallBack, BertLearningRate
from src.model_utils.config import config as cfg, bert_net_cfg
from src.model_utils.moxing_adapter import moxing_wrapper
from src.model_utils.device_adapter import get_device_id, get_device_num
_current_dir = os.path.dirname(os.path.realpath(__file__))
@ -150,60 +150,31 @@ def _check_compute_type(args_opt):
logger.warning(warning_message)
def argparse_init():
"""Argparse init."""
parser = argparse.ArgumentParser(description='bert pre_training')
parser.add_argument('--device_target', type=str, default='Ascend', choices=['Ascend', 'GPU'],
help='device where the code will be implemented. (Default: Ascend)')
parser.add_argument("--distribute", type=str, default="false", choices=["true", "false"],
help="Run distribute, default is false.")
parser.add_argument("--epoch_size", type=int, default=1, help="Epoch size, default is 1.")
parser.add_argument("--device_id", type=int, default=0, help="Device id, default is 0.")
parser.add_argument("--device_num", type=int, default=1, help="Use device nums, default is 1.")
parser.add_argument("--enable_save_ckpt", type=str, default="true", choices=["true", "false"],
help="Enable save checkpoint, default is true.")
parser.add_argument("--enable_lossscale", type=str, default="true", choices=["true", "false"],
help="Use lossscale or not, default is true.")
parser.add_argument("--do_shuffle", type=str, default="true", choices=["true", "false"],
help="Enable shuffle for dataset, default is true.")
parser.add_argument("--enable_data_sink", type=str, default="true", choices=["true", "false"],
help="Enable data sink, default is true.")
parser.add_argument("--data_sink_steps", type=int, default=1, help="Sink steps for each epoch, default is 1.")
parser.add_argument("--accumulation_steps", type=int, default=1,
help="Accumulating gradients N times before weight update, default is 1.")
parser.add_argument("--allreduce_post_accumulation", type=str, default="true", choices=["true", "false"],
help="Whether to allreduce after accumulation of N steps or after each step, default is true.")
parser.add_argument("--save_checkpoint_path", type=str, default="", help="Save checkpoint path")
parser.add_argument("--load_checkpoint_path", type=str, default="", help="Load checkpoint file path")
parser.add_argument("--save_checkpoint_steps", type=int, default=1000, help="Save checkpoint steps, "
"default is 1000.")
parser.add_argument("--train_steps", type=int, default=-1, help="Training Steps, default is -1, "
"meaning run all steps according to epoch number.")
parser.add_argument("--save_checkpoint_num", type=int, default=1, help="Save checkpoint numbers, default is 1.")
parser.add_argument("--data_dir", type=str, default="", help="Data path; an absolute path is recommended")
parser.add_argument("--schema_dir", type=str, default="", help="Schema path; an absolute path is recommended")
return parser
def modelarts_pre_process():
'''modelarts pre process function.'''
cfg.device_id = get_device_id()
cfg.device_num = get_device_num()
cfg.data_dir = cfg.data_path
cfg.save_checkpoint_path = os.path.join(cfg.output_path, cfg.save_checkpoint_path)
@moxing_wrapper(pre_process=modelarts_pre_process)
def run_pretrain():
"""pre-train bert_clue"""
parser = argparse_init()
args_opt = parser.parse_args()
context.set_context(mode=context.GRAPH_MODE, device_target=args_opt.device_target, device_id=args_opt.device_id)
context.set_context(mode=context.GRAPH_MODE, device_target=cfg.device_target, device_id=cfg.device_id)
context.set_context(reserve_class_name_in_scope=False)
_set_graph_kernel_context(args_opt.device_target)
ckpt_save_dir = args_opt.save_checkpoint_path
if args_opt.distribute == "true":
if args_opt.device_target == 'Ascend':
_set_graph_kernel_context(cfg.device_target)
ckpt_save_dir = cfg.save_checkpoint_path
if cfg.distribute == "true":
if cfg.device_target == 'Ascend':
D.init()
device_num = args_opt.device_num
rank = args_opt.device_id % device_num
device_num = cfg.device_num
rank = cfg.device_id % device_num
else:
D.init()
device_num = D.get_group_size()
rank = D.get_rank()
ckpt_save_dir = args_opt.save_checkpoint_path + 'ckpt_' + str(get_rank()) + '/'
ckpt_save_dir = os.path.join(cfg.save_checkpoint_path, 'ckpt_' + str(get_rank()))
context.reset_auto_parallel_context()
context.set_auto_parallel_context(parallel_mode=ParallelMode.DATA_PARALLEL, gradients_mean=True,
@ -213,57 +184,57 @@ def run_pretrain():
rank = 0
device_num = 1
_check_compute_type(args_opt)
_check_compute_type(cfg)
if args_opt.accumulation_steps > 1:
logger.info("accumulation steps: {}".format(args_opt.accumulation_steps))
logger.info("global batch size: {}".format(cfg.batch_size * args_opt.accumulation_steps))
if args_opt.enable_data_sink == "true":
args_opt.data_sink_steps *= args_opt.accumulation_steps
logger.info("data sink steps: {}".format(args_opt.data_sink_steps))
if args_opt.enable_save_ckpt == "true":
args_opt.save_checkpoint_steps *= args_opt.accumulation_steps
logger.info("save checkpoint steps: {}".format(args_opt.save_checkpoint_steps))
if cfg.accumulation_steps > 1:
logger.info("accumulation steps: {}".format(cfg.accumulation_steps))
logger.info("global batch size: {}".format(cfg.batch_size * cfg.accumulation_steps))
if cfg.enable_data_sink == "true":
cfg.data_sink_steps *= cfg.accumulation_steps
logger.info("data sink steps: {}".format(cfg.data_sink_steps))
if cfg.enable_save_ckpt == "true":
cfg.save_checkpoint_steps *= cfg.accumulation_steps
logger.info("save checkpoint steps: {}".format(cfg.save_checkpoint_steps))
ds = create_bert_dataset(device_num, rank, args_opt.do_shuffle, args_opt.data_dir, args_opt.schema_dir)
ds = create_bert_dataset(device_num, rank, cfg.do_shuffle, cfg.data_dir, cfg.schema_dir)
net_with_loss = BertNetworkWithLoss(bert_net_cfg, True)
new_repeat_count = args_opt.epoch_size * ds.get_dataset_size() // args_opt.data_sink_steps
if args_opt.train_steps > 0:
train_steps = args_opt.train_steps * args_opt.accumulation_steps
new_repeat_count = min(new_repeat_count, train_steps // args_opt.data_sink_steps)
new_repeat_count = cfg.epoch_size * ds.get_dataset_size() // cfg.data_sink_steps
if cfg.train_steps > 0:
train_steps = cfg.train_steps * cfg.accumulation_steps
new_repeat_count = min(new_repeat_count, train_steps // cfg.data_sink_steps)
else:
args_opt.train_steps = args_opt.epoch_size * ds.get_dataset_size() // args_opt.accumulation_steps
logger.info("train steps: {}".format(args_opt.train_steps))
cfg.train_steps = cfg.epoch_size * ds.get_dataset_size() // cfg.accumulation_steps
logger.info("train steps: {}".format(cfg.train_steps))
optimizer = _get_optimizer(args_opt, net_with_loss)
callback = [TimeMonitor(args_opt.data_sink_steps), LossCallBack(ds.get_dataset_size())]
if args_opt.enable_save_ckpt == "true" and args_opt.device_id % min(8, device_num) == 0:
config_ck = CheckpointConfig(save_checkpoint_steps=args_opt.save_checkpoint_steps,
keep_checkpoint_max=args_opt.save_checkpoint_num)
optimizer = _get_optimizer(cfg, net_with_loss)
callback = [TimeMonitor(cfg.data_sink_steps), LossCallBack(ds.get_dataset_size())]
if cfg.enable_save_ckpt == "true" and cfg.device_id % min(8, device_num) == 0:
config_ck = CheckpointConfig(save_checkpoint_steps=cfg.save_checkpoint_steps,
keep_checkpoint_max=cfg.save_checkpoint_num)
ckpoint_cb = ModelCheckpoint(prefix='checkpoint_bert',
directory=None if ckpt_save_dir == "" else ckpt_save_dir, config=config_ck)
callback.append(ckpoint_cb)
if args_opt.load_checkpoint_path:
param_dict = load_checkpoint(args_opt.load_checkpoint_path)
if cfg.load_checkpoint_path:
param_dict = load_checkpoint(cfg.load_checkpoint_path)
load_param_into_net(net_with_loss, param_dict)
if args_opt.enable_lossscale == "true":
if cfg.enable_lossscale == "true":
update_cell = DynamicLossScaleUpdateCell(loss_scale_value=cfg.loss_scale_value,
scale_factor=cfg.scale_factor,
scale_window=cfg.scale_window)
accumulation_steps = args_opt.accumulation_steps
accumulation_steps = cfg.accumulation_steps
enable_global_norm = cfg.enable_global_norm
if accumulation_steps <= 1:
if cfg.optimizer == 'AdamWeightDecay' and args_opt.device_target == 'GPU':
if cfg.optimizer == 'AdamWeightDecay' and cfg.device_target == 'GPU':
net_with_grads = BertTrainOneStepWithLossScaleCellForAdam(net_with_loss, optimizer=optimizer,
scale_update_cell=update_cell)
else:
net_with_grads = BertTrainOneStepWithLossScaleCell(net_with_loss, optimizer=optimizer,
scale_update_cell=update_cell)
else:
allreduce_post = args_opt.distribute == "false" or args_opt.allreduce_post_accumulation == "true"
allreduce_post = cfg.distribute == "false" or cfg.allreduce_post_accumulation == "true"
net_with_accumulation = (BertTrainAccumulationAllReducePostWithLossScaleCell if allreduce_post else
BertTrainAccumulationAllReduceEachWithLossScaleCell)
net_with_grads = net_with_accumulation(net_with_loss, optimizer=optimizer,
@ -280,7 +251,7 @@ def run_pretrain():
model = Model(net_with_grads)
model = ConvertModelUtils().convert_to_thor_model(model, network=net_with_grads, optimizer=optimizer)
model.train(new_repeat_count, ds, callbacks=callback,
dataset_sink_mode=(args_opt.enable_data_sink == "true"), sink_size=args_opt.data_sink_steps)
dataset_sink_mode=(cfg.enable_data_sink == "true"), sink_size=cfg.data_sink_steps)
if __name__ == '__main__':

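The gradient-accumulation bookkeeping in `run_pretrain()` is easy to misread. It scales the sink size, derives the repeat count passed to `Model.train()`, and then either caps that count by `train_steps` or derives `train_steps` from it. The sketch below restates those lines as plain arithmetic, assuming `enable_data_sink == "true"`. `pretrain_schedule` is a hypothetical helper and is not part of the repository.

```python
def pretrain_schedule(dataset_size, epoch_size, data_sink_steps, accumulation_steps, train_steps=-1):
    """Restate the step bookkeeping of run_pretrain() (data sink enabled)."""
    # One optimizer update consumes accumulation_steps micro-batches,
    # so the sink size is scaled by the same factor.
    sink_steps = data_sink_steps * accumulation_steps
    # Number of sink iterations Model.train() is asked to run.
    repeat_count = epoch_size * dataset_size // sink_steps
    if train_steps > 0:
        # train_steps counts optimizer updates; convert to micro-batches.
        micro_batches = train_steps * accumulation_steps
        repeat_count = min(repeat_count, micro_batches // sink_steps)
    else:
        # Otherwise report how many optimizer updates the epochs amount to.
        train_steps = epoch_size * dataset_size // accumulation_steps
    return sink_steps, repeat_count, train_steps


print(pretrain_schedule(10000, 1, 100, 4))  # (400, 25, 2500)
```

With `train_steps` set, the cap only takes effect when it is smaller than what `epoch_size` alone would run.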
@ -17,12 +17,7 @@
Bert finetune and evaluation script.
'''
import os
import argparse
import collections
from src.bert_for_finetune import BertSquadCell, BertSquad
from src.finetune_eval_config import optimizer_cfg, bert_net_cfg
from src.dataset import create_squad_dataset
from src.utils import make_directory, LossCallBack, LoadNewestCkpt, BertLearningRate
import mindspore.common.dtype as mstype
from mindspore import context
from mindspore import log as logger
@ -33,8 +28,16 @@ from mindspore.train.model import Model
from mindspore.train.callback import CheckpointConfig, ModelCheckpoint, TimeMonitor
from mindspore.train.serialization import load_checkpoint, load_param_into_net
from src.bert_for_finetune import BertSquadCell, BertSquad
from src.dataset import create_squad_dataset
from src.utils import make_directory, LossCallBack, LoadNewestCkpt, BertLearningRate
from src.model_utils.config import config as args_opt, optimizer_cfg, bert_net_cfg
from src.model_utils.moxing_adapter import moxing_wrapper
from src.model_utils.device_adapter import get_device_id
_cur_dir = os.getcwd()
def do_train(dataset=None, network=None, load_checkpoint_path="", save_checkpoint_path="", epoch_num=1):
""" do train """
if load_checkpoint_path == "":
@ -118,39 +121,24 @@ def do_eval(dataset=None, load_checkpoint_path="", eval_batch_size=1):
end_logits=end_logits))
return output
def modelarts_pre_process():
'''modelarts pre process function.'''
args_opt.device_id = get_device_id()
_file_dir = os.path.dirname(os.path.abspath(__file__))
args_opt.load_pretrain_checkpoint_path = os.path.join(_file_dir, args_opt.load_pretrain_checkpoint_path)
args_opt.load_finetune_checkpoint_path = os.path.join(args_opt.output_path, args_opt.load_finetune_checkpoint_path)
args_opt.save_finetune_checkpoint_path = os.path.join(args_opt.output_path, args_opt.save_finetune_checkpoint_path)
if args_opt.vocab_file_path:
args_opt.vocab_file_path = os.path.join(args_opt.data_path, args_opt.vocab_file_path)
if args_opt.schema_file_path:
args_opt.schema_file_path = os.path.join(args_opt.data_path, args_opt.schema_file_path)
if args_opt.train_data_file_path:
args_opt.train_data_file_path = os.path.join(args_opt.data_path, args_opt.train_data_file_path)
if args_opt.eval_json_path:
args_opt.eval_json_path = os.path.join(args_opt.data_path, args_opt.eval_json_path)
@moxing_wrapper(pre_process=modelarts_pre_process)
def run_squad():
"""run squad task"""
parser = argparse.ArgumentParser(description="run squad")
parser.add_argument("--device_target", type=str, default="Ascend", choices=["Ascend", "GPU"],
help="Device type, default is Ascend")
parser.add_argument("--do_train", type=str, default="false", choices=["true", "false"],
help="Enable train, default is false")
parser.add_argument("--do_eval", type=str, default="false", choices=["true", "false"],
help="Enable eval, default is false")
parser.add_argument("--device_id", type=int, default=0, help="Device id, default is 0.")
parser.add_argument("--epoch_num", type=int, default=3, help="Epoch number, default is 3.")
parser.add_argument("--num_class", type=int, default=2, help="The number of class, default is 2.")
parser.add_argument("--train_data_shuffle", type=str, default="true", choices=["true", "false"],
help="Enable train data shuffle, default is true")
parser.add_argument("--eval_data_shuffle", type=str, default="false", choices=["true", "false"],
help="Enable eval data shuffle, default is false")
parser.add_argument("--train_batch_size", type=int, default=32, help="Train batch size, default is 32")
parser.add_argument("--eval_batch_size", type=int, default=1, help="Eval batch size, default is 1")
parser.add_argument("--vocab_file_path", type=str, default="", help="Vocab file path")
parser.add_argument("--eval_json_path", type=str, default="", help="Evaluation json file path, can be eval.json")
parser.add_argument("--save_finetune_checkpoint_path", type=str, default="", help="Save finetuned checkpoint path")
parser.add_argument("--load_pretrain_checkpoint_path", type=str, default="", help="Load pretrained checkpoint file path")
parser.add_argument("--load_finetune_checkpoint_path", type=str, default="", help="Load finetuned checkpoint file path")
parser.add_argument("--train_data_file_path", type=str, default="",
help="Train data path; an absolute path is recommended")
parser.add_argument("--schema_file_path", type=str, default="",
help="Schema path; an absolute path is recommended")
args_opt = parser.parse_args()
epoch_num = args_opt.epoch_num
load_pretrain_checkpoint_path = args_opt.load_pretrain_checkpoint_path
save_finetune_checkpoint_path = args_opt.save_finetune_checkpoint_path
load_finetune_checkpoint_path = args_opt.load_finetune_checkpoint_path
if args_opt.do_train.lower() == "false" and args_opt.do_eval.lower() == "false":
raise ValueError("At least one of 'do_train' or 'do_eval' must be true")
if args_opt.do_train.lower() == "true" and args_opt.train_data_file_path == "":
@ -160,8 +148,10 @@ def run_squad():
raise ValueError("'vocab_file_path' must be set when running the evaluation task")
if args_opt.eval_json_path == "":
raise ValueError("'eval_json_path' must be set when running the evaluation task")
epoch_num = args_opt.epoch_num
load_pretrain_checkpoint_path = args_opt.load_pretrain_checkpoint_path
save_finetune_checkpoint_path = args_opt.save_finetune_checkpoint_path
load_finetune_checkpoint_path = args_opt.load_finetune_checkpoint_path
target = args_opt.device_target
if target == "Ascend":
context.set_context(mode=context.GRAPH_MODE, device_target="Ascend", device_id=args_opt.device_id)

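`run_ner()`, `run_pretrain()` and `run_squad()` are now decorated with `@moxing_wrapper(pre_process=modelarts_pre_process)`. The hook rewrites `device_id` and the path options before the entry point reads them. The body of `moxing_wrapper` is not part of this diff, and the real wrapper also handles ModelArts data transfer. The sketch below therefore shows only the decorator-with-hook pattern. `with_pre_process`, `fake_pre_process` and the dict-based `config` are hypothetical stand-ins.

```python
import functools


def with_pre_process(pre_process=None, enabled=True):
    """Hypothetical stand-in for a moxing_wrapper-style decorator.

    When enabled, the pre_process hook runs immediately before the wrapped
    entry point, so it can adjust config values before they are read.
    """
    def decorator(run_func):
        @functools.wraps(run_func)
        def wrapped(*args, **kwargs):
            if enabled and pre_process is not None:
                pre_process()
            return run_func(*args, **kwargs)
        return wrapped
    return decorator


config = {"device_id": 0, "data_path": "/cache/data"}  # illustrative values


def fake_pre_process():
    # Stands in for modelarts_pre_process(), e.g. device_id = get_device_id().
    config["device_id"] = 3


@with_pre_process(pre_process=fake_pre_process)
def run_task():
    """Entry point that only ever sees the adjusted config."""
    return config["device_id"]
```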
@ -27,6 +27,7 @@ PROJECT_DIR=$(cd "$(dirname "$0")" || exit; pwd)
export GLOG_log_dir=${CUR_DIR}/ms_log
export GLOG_logtostderr=0
python ${PROJECT_DIR}/../run_classifier.py \
--config_path="../../task_classifier_config.yaml" \
--device_target="Ascend" \
--do_train="true" \
--do_eval="false" \

@ -27,6 +27,7 @@ PROJECT_DIR=$(cd "$(dirname "$0")" || exit; pwd)
export GLOG_log_dir=${CUR_DIR}/ms_log
export GLOG_logtostderr=0
python ${PROJECT_DIR}/../run_ner.py \
--config_path="../../task_ner_config.yaml" \
--device_target="Ascend" \
--do_train="true" \
--do_eval="false" \

@ -27,6 +27,7 @@ PROJECT_DIR=$(cd "$(dirname "$0")" || exit; pwd)
export GLOG_log_dir=${CUR_DIR}/ms_log
export GLOG_logtostderr=0
python ${PROJECT_DIR}/../run_squad.py \
--config_path="../../task_squad_config.yaml" \
--device_target="Ascend" \
--do_train="true" \
--do_eval="false" \

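The updated scripts pass values such as `--config_path="../../task_ner_config.yaml"`. In `get_config()` in `src/model_utils/config.py`, that value is passed through `get_abs_path`, which joins it onto the directory containing `config.py`. The path therefore resolves to the repository root regardless of the shell's working directory. The following is a minimal sketch of that resolution, assuming a hypothetical checkout at `/work/bert`. The `os.path.normpath` call is added only to make the result readable; `get_abs_path` itself does not normalize.

```python
import os


def resolve_config_path(config_module_file, relative_path):
    """Resolve relative_path against the directory of the config module,
    not against the current working directory (as get_abs_path does)."""
    current_dir = os.path.dirname(os.path.abspath(config_module_file))
    return os.path.normpath(os.path.join(current_dir, relative_path))


# Hypothetical location of src/model_utils/config.py in a checkout.
config_py = "/work/bert/src/model_utils/config.py"
resolved = resolve_config_path(config_py, "../../task_ner_config.yaml")
```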
@ -22,7 +22,7 @@ from mindspore.common.tensor import Tensor
from src import tokenization
from src.sample_process import label_generation, process_one_example_p
from src.CRF import postprocess
from src.finetune_eval_config import bert_net_cfg
from src.model_utils.config import bert_net_cfg
from src.score import get_result
def process(model=None, text="", tokenizer_=None, use_crf="", tag_to_index=None, vocab=""):

@ -1,129 +0,0 @@
# Copyright 2020 Huawei Technologies Co., Ltd
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ============================================================================
"""
network config setting, will be used in dataset.py, run_pretrain.py
"""
from easydict import EasyDict as edict
import mindspore.common.dtype as mstype
from .bert_model import BertConfig
cfg = edict({
'batch_size': 32,
'bert_network': 'base',
'loss_scale_value': 65536,
'scale_factor': 2,
'scale_window': 1000,
'optimizer': 'Lamb',
'enable_global_norm': False,
'AdamWeightDecay': edict({
'learning_rate': 3e-5,
'end_learning_rate': 0.0,
'power': 5.0,
'weight_decay': 1e-5,
'decay_filter': lambda x: 'layernorm' not in x.name.lower() and 'bias' not in x.name.lower(),
'eps': 1e-6,
'warmup_steps': 10000,
}),
'Lamb': edict({
'learning_rate': 3e-4,
'end_learning_rate': 0.0,
'power': 2.0,
'warmup_steps': 10000,
'weight_decay': 0.01,
'decay_filter': lambda x: 'layernorm' not in x.name.lower() and 'bias' not in x.name.lower(),
'eps': 1e-8,
}),
'Momentum': edict({
'learning_rate': 2e-5,
'momentum': 0.9,
}),
'Thor': edict({
'lr_max': 0.0034,
'lr_min': 3.244e-5,
'lr_power': 1.0,
'lr_total_steps': 30000,
'damping_max': 5e-2,
'damping_min': 1e-6,
'damping_power': 1.0,
'damping_total_steps': 30000,
'momentum': 0.9,
'weight_decay': 5e-4,
'loss_scale': 1.0,
'frequency': 100,
}),
})
'''
Including three kinds of network: \
base: Google BERT-base (the base version of BERT model).
nezha: BERT-NEZHA (a Chinese pretrained language model developed by Huawei, which introduced an improvement of \
Functional Relative Positional Encoding as an effective positional encoding scheme).
large: BERT-large (the large version of BERT model).
'''
if cfg.bert_network == 'base':
cfg.batch_size = 64
bert_net_cfg = BertConfig(
seq_length=128,
vocab_size=21128,
hidden_size=768,
num_hidden_layers=12,
num_attention_heads=12,
intermediate_size=3072,
hidden_act="gelu",
hidden_dropout_prob=0.1,
attention_probs_dropout_prob=0.1,
max_position_embeddings=512,
type_vocab_size=2,
initializer_range=0.02,
use_relative_positions=False,
dtype=mstype.float32,
compute_type=mstype.float16
)
if cfg.bert_network == 'nezha':
cfg.batch_size = 96
bert_net_cfg = BertConfig(
seq_length=128,
vocab_size=21128,
hidden_size=1024,
num_hidden_layers=24,
num_attention_heads=16,
intermediate_size=4096,
hidden_act="gelu",
hidden_dropout_prob=0.1,
attention_probs_dropout_prob=0.1,
max_position_embeddings=512,
type_vocab_size=2,
initializer_range=0.02,
use_relative_positions=True,
dtype=mstype.float32,
compute_type=mstype.float16
)
if cfg.bert_network == 'large':
cfg.batch_size = 24
bert_net_cfg = BertConfig(
seq_length=512,
vocab_size=30522,
hidden_size=1024,
num_hidden_layers=24,
num_attention_heads=16,
intermediate_size=4096,
hidden_act="gelu",
hidden_dropout_prob=0.1,
attention_probs_dropout_prob=0.1,
max_position_embeddings=512,
type_vocab_size=2,
initializer_range=0.02,
use_relative_positions=False,
dtype=mstype.float32,
compute_type=mstype.float16
)

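The deleted `src/config.py` selected a batch size and a `BertConfig` for each `bert_network` value using three independent `if` blocks. An unknown name was silently left unhandled. The replacement `extra_operations()` reads `<name>_batch_size` and `<name>_net_cfg` entries instead. The following is a minimal sketch of that lookup as a single function that fails loudly on an unknown name. The dict layout is inferred from `extra_operations()`, the values are trimmed from the deleted config, and `select_network` is hypothetical. In the real code the kwargs would be passed on as `BertConfig(**net_kwargs)`.

```python
def select_network(cfg, network):
    """Return (batch_size, BertConfig kwargs) for one network variant."""
    if network not in ("base", "nezha", "large"):
        raise ValueError("Unsupported bert_network: {!r}".format(network))
    batch_size = cfg[network + "_batch_size"]
    net_kwargs = dict(cfg[network + "_net_cfg"])  # copy so callers can mutate it
    return batch_size, net_kwargs


cfg = {
    # Values taken from the deleted src/config.py, trimmed to a few fields.
    "base_batch_size": 64,
    "base_net_cfg": {"seq_length": 128, "hidden_size": 768, "num_hidden_layers": 12},
    "nezha_batch_size": 96,
    "nezha_net_cfg": {"seq_length": 128, "hidden_size": 1024, "num_hidden_layers": 24},
    "large_batch_size": 24,
    "large_net_cfg": {"seq_length": 512, "hidden_size": 1024, "num_hidden_layers": 24},
}
batch_size, net_kwargs = select_network(cfg, "nezha")
```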
@ -20,7 +20,7 @@ import mindspore.common.dtype as mstype
import mindspore.dataset as ds
import mindspore.dataset.transforms.c_transforms as C
from mindspore import log as logger
from .config import cfg
from .model_utils.config import config as cfg
def create_bert_dataset(device_num=1, rank=0, do_shuffle="true", data_dir=None, schema_dir=None):

@ -1,63 +0,0 @@
# Copyright 2020 Huawei Technologies Co., Ltd
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ============================================================================
"""
config settings, will be used in finetune.py
"""
from easydict import EasyDict as edict
import mindspore.common.dtype as mstype
from .bert_model import BertConfig
optimizer_cfg = edict({
'optimizer': 'Lamb',
'AdamWeightDecay': edict({
'learning_rate': 2e-5,
'end_learning_rate': 1e-7,
'power': 1.0,
'weight_decay': 1e-5,
'decay_filter': lambda x: 'layernorm' not in x.name.lower() and 'bias' not in x.name.lower(),
'eps': 1e-6,
}),
'Lamb': edict({
'learning_rate': 2e-5,
'end_learning_rate': 1e-7,
'power': 1.0,
'weight_decay': 0.01,
'decay_filter': lambda x: 'layernorm' not in x.name.lower() and 'bias' not in x.name.lower(),
}),
'Momentum': edict({
'learning_rate': 2e-5,
'momentum': 0.9,
}),
})
bert_net_cfg = BertConfig(
seq_length=128,
vocab_size=21128,
hidden_size=768,
num_hidden_layers=12,
num_attention_heads=12,
intermediate_size=3072,
hidden_act="gelu",
hidden_dropout_prob=0.1,
attention_probs_dropout_prob=0.1,
max_position_embeddings=512,
type_vocab_size=2,
initializer_range=0.02,
use_relative_positions=False,
dtype=mstype.float32,
compute_type=mstype.float16,
)

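The deleted `finetune_eval_config.py` hard-coded `decay_filter` as a lambda over the substrings `'layernorm'` and `'bias'`. The new `extra_operations()` builds the same kind of predicate from a keyword list read from yaml, using `create_filter_fun`. The sketch below checks that the two agree. It assumes the yaml lists `["layernorm", "bias"]`, since the yaml files are not shown in this diff. The parameter names are made up, and the namedtuple stands in for a MindSpore `Parameter`, of which only `.name` is used.

```python
from collections import namedtuple

# Minimal stand-in for a MindSpore Parameter; the filters only read .name.
Param = namedtuple("Param", ["name"])


def old_decay_filter(x):
    # Hard-coded rule from the deleted finetune_eval_config.py.
    return 'layernorm' not in x.name.lower() and 'bias' not in x.name.lower()


def create_filter_fun(keywords):
    """Apply weight decay only if the lower-cased name contains no keyword."""
    return lambda x: not any(key in x.name.lower() for key in keywords)


new_decay_filter = create_filter_fun(["layernorm", "bias"])  # assumed yaml value

params = [
    Param("bert.encoder.layer.0.attention.output.LayerNorm.gamma"),
    Param("bert.encoder.layer.0.attention.dense.bias"),
    Param("bert.encoder.layer.0.attention.dense.weight"),
]
decayed = [p.name for p in params if new_decay_filter(p)]
```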
@ -0,0 +1,200 @@
# Copyright 2021 Huawei Technologies Co., Ltd
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ============================================================================
"""Parse arguments"""
import os
import ast
import argparse
from pprint import pformat
import yaml
import mindspore.common.dtype as mstype
from src.bert_model import BertConfig
class Config:
"""
Configuration namespace. Convert dictionary to members.
"""
def __init__(self, cfg_dict):
for k, v in cfg_dict.items():
if isinstance(v, (list, tuple)):
setattr(self, k, [Config(x) if isinstance(x, dict) else x for x in v])
else:
setattr(self, k, Config(v) if isinstance(v, dict) else v)
def __str__(self):
return pformat(self.__dict__)
def __repr__(self):
return self.__str__()
def parse_cli_to_yaml(parser, cfg, helper=None, choices=None, cfg_path="pretrain_base_config.yaml"):
"""
Parse command line arguments to the configuration according to the default yaml.
Args:
parser: Parent parser.
cfg: Base configuration.
helper: Helper description.
choices: Allowed values for each argument.
cfg_path: Path to the default yaml config.
"""
parser = argparse.ArgumentParser(description="[REPLACE THIS at config.py]",
parents=[parser])
helper = {} if helper is None else helper
choices = {} if choices is None else choices
for item in cfg:
if not isinstance(cfg[item], list) and not isinstance(cfg[item], dict):
help_description = helper[item] if item in helper else "Please refer to {}".format(cfg_path)
choice = choices[item] if item in choices else None
if isinstance(cfg[item], bool):
parser.add_argument("--" + item, type=ast.literal_eval, default=cfg[item], choices=choice,
help=help_description)
else:
parser.add_argument("--" + item, type=type(cfg[item]), default=cfg[item], choices=choice,
help=help_description)
args = parser.parse_args()
return args
def parse_yaml(yaml_path):
"""
Parse the yaml config file.
Args:
yaml_path: Path to the yaml config.
"""
with open(yaml_path, 'r') as fin:
try:
cfgs = list(yaml.load_all(fin.read(), Loader=yaml.FullLoader))
if len(cfgs) == 1:
cfg_helper = {}
cfg = cfgs[0]
cfg_choices = {}
elif len(cfgs) == 2:
cfg, cfg_helper = cfgs
cfg_choices = {}
elif len(cfgs) == 3:
cfg, cfg_helper, cfg_choices = cfgs
else:
raise ValueError("At most 3 docs (config, description for help, choices) are supported in config yaml")
except yaml.YAMLError as err:
raise ValueError("Failed to parse yaml") from err
return cfg, cfg_helper, cfg_choices
def merge(args, cfg):
"""
Merge the base config from yaml file and command line arguments.
Args:
args: Command line arguments.
cfg: Base configuration.
"""
args_var = vars(args)
for item in args_var:
cfg[item] = args_var[item]
return cfg
def extra_operations(cfg):
"""
Do extra work on config
Args:
cfg: Object after instantiation of class 'Config'.
"""
def create_filter_fun(keywords):
return lambda x: not any(key in x.name.lower() for key in keywords)
if cfg.description == 'run_pretrain':
cfg.AdamWeightDecay.decay_filter = create_filter_fun(cfg.AdamWeightDecay.decay_filter)
cfg.Lamb.decay_filter = create_filter_fun(cfg.Lamb.decay_filter)
cfg.base_net_cfg.dtype = mstype.float32
cfg.base_net_cfg.compute_type = mstype.float16
cfg.nezha_net_cfg.dtype = mstype.float32
cfg.nezha_net_cfg.compute_type = mstype.float16
cfg.large_net_cfg.dtype = mstype.float32
cfg.large_net_cfg.compute_type = mstype.float16
if cfg.bert_network == 'base':
cfg.batch_size = cfg.base_batch_size
_bert_net_cfg = cfg.base_net_cfg
elif cfg.bert_network == 'nezha':
cfg.batch_size = cfg.nezha_batch_size
_bert_net_cfg = cfg.nezha_net_cfg
elif cfg.bert_network == 'large':
cfg.batch_size = cfg.large_batch_size
_bert_net_cfg = cfg.large_net_cfg
else:
pass
cfg.bert_net_cfg = BertConfig(**_bert_net_cfg.__dict__)
elif cfg.description == 'run_ner':
cfg.optimizer_cfg.AdamWeightDecay.decay_filter = \
create_filter_fun(cfg.optimizer_cfg.AdamWeightDecay.decay_filter)
cfg.optimizer_cfg.Lamb.decay_filter = create_filter_fun(cfg.optimizer_cfg.Lamb.decay_filter)
cfg.bert_net_cfg.dtype = mstype.float32
cfg.bert_net_cfg.compute_type = mstype.float16
cfg.bert_net_cfg = BertConfig(**cfg.bert_net_cfg.__dict__)
elif cfg.description == 'run_squad':
cfg.optimizer_cfg.AdamWeightDecay.decay_filter = \
create_filter_fun(cfg.optimizer_cfg.AdamWeightDecay.decay_filter)
cfg.optimizer_cfg.Lamb.decay_filter = create_filter_fun(cfg.optimizer_cfg.Lamb.decay_filter)
cfg.bert_net_cfg.dtype = mstype.float32
cfg.bert_net_cfg.compute_type = mstype.float16
cfg.bert_net_cfg = BertConfig(**cfg.bert_net_cfg.__dict__)
elif cfg.description == 'run_classifier':
cfg.optimizer_cfg.AdamWeightDecay.decay_filter = \
create_filter_fun(cfg.optimizer_cfg.AdamWeightDecay.decay_filter)
cfg.optimizer_cfg.Lamb.decay_filter = create_filter_fun(cfg.optimizer_cfg.Lamb.decay_filter)
cfg.bert_net_cfg.dtype = mstype.float32
cfg.bert_net_cfg.compute_type = mstype.float16
cfg.bert_net_cfg = BertConfig(**cfg.bert_net_cfg.__dict__)
else:
pass
def get_config():
    """
    Get Config according to the yaml file and cli arguments.
    """
    def get_abs_path(path_relative):
        current_dir = os.path.dirname(os.path.abspath(__file__))
        return os.path.join(current_dir, path_relative)
    parser = argparse.ArgumentParser(description="default name", add_help=False)
    parser.add_argument("--config_path", type=get_abs_path, default="../../pretrain_config.yaml",
                        help="Config file path")
    path_args, _ = parser.parse_known_args()
    default, helper, choices = parse_yaml(path_args.config_path)
    args = parse_cli_to_yaml(parser=parser, cfg=default, helper=helper, choices=choices, cfg_path=path_args.config_path)
    final_config = merge(args, default)
    config_obj = Config(final_config)
    extra_operations(config_obj)
    return config_obj
config = get_config()
bert_net_cfg = config.bert_net_cfg
if config.description in ('run_classifier', 'run_ner', 'run_squad'):
    optimizer_cfg = config.optimizer_cfg
if __name__ == '__main__':
    print(config)
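A minimal, stdlib-only sketch of how `parse_cli_to_yaml` and `merge` appear to combine YAML defaults with command-line overrides: each scalar key becomes a `--key` flag whose type is inferred from its default, and the parsed values then overwrite the defaults. The first `add_argument` branch above uses `ast.literal_eval`, presumably for boolean defaults (since `bool("False")` is `True`); skipping nested sections such as `optimizer_cfg` is an assumption about the part of `parse_cli_to_yaml` not shown here. `build_parser` and the sample keys are hypothetical.

```python
import argparse
import ast


def build_parser(defaults, helper=None, choices=None):
    """Create one --key flag per scalar default, inferring its type from the default value."""
    helper = helper or {}
    choices = choices or {}
    parser = argparse.ArgumentParser(description="config override sketch")
    for key, value in defaults.items():
        if isinstance(value, (list, dict)):
            continue  # nested sections such as optimizer_cfg are not exposed as flags here
        # bool("False") is True, so boolean defaults are parsed as Python literals instead.
        arg_type = ast.literal_eval if isinstance(value, bool) else type(value)
        parser.add_argument("--" + key, type=arg_type, default=value,
                            choices=choices.get(key), help=helper.get(key, ""))
    return parser


def merge(args, cfg):
    """Return a copy of cfg in which every parsed argument overrides the YAML default."""
    merged = dict(cfg)
    merged.update(vars(args))
    return merged


defaults = {"enable_modelarts": False, "epoch_num": 3, "do_train": "false",
            "optimizer_cfg": {"optimizer": "Lamb"}}
choices = {"do_train": ["true", "false"]}
parser = build_parser(defaults, choices=choices)
args = parser.parse_args(["--enable_modelarts=True", "--epoch_num", "5", "--do_train", "true"])
config = merge(args, defaults)
print(config)
```

Unlike the `merge` in config.py, which updates `cfg` in place, this version copies its input so `defaults` stays reusable; an invalid value such as `--do_train yes` makes `argparse` exit with a usage error.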

@@ -0,0 +1,27 @@
# Copyright 2021 Huawei Technologies Co., Ltd
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ============================================================================
"""Device adapter for ModelArts"""
from src.model_utils.config import config
if config.enable_modelarts:
    from src.model_utils.moxing_adapter import get_device_id, get_device_num, get_rank_id, get_job_id
else:
    from src.model_utils.local_adapter import get_device_id, get_device_num, get_rank_id, get_job_id
__all__ = [
    "get_device_id", "get_device_num", "get_rank_id", "get_job_id"
]

@@ -0,0 +1,36 @@
# Copyright 2021 Huawei Technologies Co., Ltd
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ============================================================================
"""Local adapter"""
import os
def get_device_id():
    device_id = os.getenv('DEVICE_ID', '0')
    return int(device_id)
def get_device_num():
    device_num = os.getenv('RANK_SIZE', '1')
    return int(device_num)
def get_rank_id():
    global_rank_id = os.getenv('RANK_ID', '0')
    return int(global_rank_id)
def get_job_id():
    return "Local Job"
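The local adapter derives everything from the `DEVICE_ID`, `RANK_SIZE` and `RANK_ID` environment variables and falls back to a single-device setup. The sketch below reproduces those fallbacks with an injectable mapping, so they can be checked without touching `os.environ`, and adds the per-server leader test that `sync_data` in the moxing adapter uses (`device_id % min(device_num, 8) == 0`). `resolve_ids` and `is_copy_leader` are hypothetical helpers, not part of the repository.

```python
import os
from typing import Dict, Mapping, Optional


def resolve_ids(env: Optional[Mapping[str, str]] = None) -> Dict[str, int]:
    """Mirror local_adapter: read ids from the environment, defaulting to one device."""
    env = os.environ if env is None else env
    return {
        "device_id": int(env.get("DEVICE_ID", "0")),
        "device_num": int(env.get("RANK_SIZE", "1")),
        "rank_id": int(env.get("RANK_ID", "0")),
    }


def is_copy_leader(ids: Dict[str, int]) -> bool:
    """Same test as sync_data: the first device on each server (at most 8 devices) copies."""
    return ids["device_id"] % min(ids["device_num"], 8) == 0


# Nothing exported: a single local device, which is also the copy leader.
single = resolve_ids({})
# Hypothetical 16-card job where this process is device 3 on its server, global rank 11.
multi = resolve_ids({"DEVICE_ID": "3", "RANK_SIZE": "16", "RANK_ID": "11"})
print(single, is_copy_leader(single))
print(multi, is_copy_leader(multi))
```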

@@ -0,0 +1,123 @@
# Copyright 2021 Huawei Technologies Co., Ltd
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ============================================================================
"""Moxing adapter for ModelArts"""
import os
import functools
from mindspore import context
from mindspore.profiler import Profiler
from src.model_utils.config import config
_global_sync_count = 0
def get_device_id():
    device_id = os.getenv('DEVICE_ID', '0')
    return int(device_id)
def get_device_num():
    device_num = os.getenv('RANK_SIZE', '1')
    return int(device_num)
def get_rank_id():
    global_rank_id = os.getenv('RANK_ID', '0')
    return int(global_rank_id)
def get_job_id():
    job_id = os.getenv('JOB_ID')
    # Fall back to "default" when JOB_ID is unset (None) or empty.
    return job_id if job_id else "default"
def sync_data(from_path, to_path):
    """
    Download data from remote OBS to a local directory if from_path is a remote URL and to_path is a local path.
    Upload data from a local directory to remote OBS in the opposite case.
    """
    import moxing as mox
    import time
    global _global_sync_count
    sync_lock = "/tmp/copy_sync.lock" + str(_global_sync_count)
    _global_sync_count += 1
    # Each server contains at most 8 devices; only the first device on each server performs the copy.
    if get_device_id() % min(get_device_num(), 8) == 0 and not os.path.exists(sync_lock):
        print("from path: ", from_path)
        print("to path: ", to_path)
        mox.file.copy_parallel(from_path, to_path)
        print("===finish data synchronization===")
        try:
            os.mknod(sync_lock)
        except IOError:
            pass
        print("===save flag===")
    # Every device waits until the lock file signals that the copy has finished.
    while True:
        if os.path.exists(sync_lock):
            break
        time.sleep(1)
    print("Finish sync data from {} to {}.".format(from_path, to_path))
def moxing_wrapper(pre_process=None, post_process=None):
    """
    Moxing wrapper to download dataset and upload outputs.
    """
    def wrapper(run_func):
        @functools.wraps(run_func)
        def wrapped_func(*args, **kwargs):
            # Download data from data_url
            if config.enable_modelarts:
                if config.data_url:
                    sync_data(config.data_url, config.data_path)
                    print("Dataset downloaded: ", os.listdir(config.data_path))
                if config.checkpoint_url:
                    sync_data(config.checkpoint_url, config.load_path)
                    print("Preload downloaded: ", os.listdir(config.load_path))
                if config.train_url:
                    sync_data(config.train_url, config.output_path)
                    print("Workspace downloaded: ", os.listdir(config.output_path))
                context.set_context(save_graphs_path=os.path.join(config.output_path, str(get_rank_id())))
                config.device_num = get_device_num()
                config.device_id = get_device_id()
                if not os.path.exists(config.output_path):
                    os.makedirs(config.output_path)
                if pre_process:
                    pre_process()
            if config.enable_profiling:
                profiler = Profiler()
            run_func(*args, **kwargs)
            if config.enable_profiling:
                profiler.analyse()
            # Upload data to train_url
            if config.enable_modelarts:
                if post_process:
                    post_process()
                if config.train_url:
                    print("Start to copy output directory")
                    sync_data(config.output_path, config.train_url)
        return wrapped_func
    return wrapper
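`sync_data` coordinates the devices on one server through a lock file: the leader device copies with `mox.file.copy_parallel` and then creates `/tmp/copy_sync.lock<N>`, while every device polls until that file exists. The stdlib sketch below imitates the pattern with eight threads standing in for eight devices and `shutil.copytree` standing in for `moxing`; the directory and file names are made up. It creates the flag with `open()` instead of `os.mknod` for portability, and only after the copy has finished so that followers never read a partial copy.

```python
import os
import shutil
import tempfile
import threading
import time


def sync_once(src, dst, lock_path, is_leader, poll_seconds=0.05):
    """Leader copies src -> dst and then creates lock_path; everyone waits for lock_path."""
    if is_leader and not os.path.exists(lock_path):
        shutil.copytree(src, dst, dirs_exist_ok=True)  # stand-in for mox.file.copy_parallel
        open(lock_path, "w").close()  # flag is written only after the copy completed
    while not os.path.exists(lock_path):
        time.sleep(poll_seconds)


with tempfile.TemporaryDirectory() as root:
    src = os.path.join(root, "obs_bucket")  # hypothetical remote dataset location
    dst = os.path.join(root, "cache_data")  # hypothetical local cache directory
    os.makedirs(src)
    with open(os.path.join(src, "train.mindrecord"), "w") as f:
        f.write("payload")
    lock = os.path.join(root, "copy_sync.lock0")

    seen = []

    def worker(device_id):
        sync_once(src, dst, lock, is_leader=(device_id % 8 == 0))
        # After sync_once returns, every "device" should see the copied file.
        seen.append(os.path.exists(os.path.join(dst, "train.mindrecord")))

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(8)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    print(seen)
```

Because the real lock path depends only on a per-process counter, a lock file left in `/tmp` by an earlier run on the same machine would make the leader skip the copy; the sketch avoids this by using a fresh temporary directory.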

@@ -0,0 +1,113 @@
# Builtin Configurations (DO NOT CHANGE THESE CONFIGURATIONS unless you know exactly what you are doing)
enable_modelarts: False
# URLs for ModelArts
data_url: ""
train_url: ""
checkpoint_url: ""
# Local paths
data_path: "/cache/data"
output_path: "/cache/train"
load_path: "/cache/checkpoint_path"
device_target: "Ascend"
enable_profiling: False
# ==============================================================================
description: "run_classifier"
assessment_method: "Accuracy"
do_train: "false"
do_eval: "false"
device_id: 0
epoch_num: 3
num_class: 2
train_data_shuffle: "true"
eval_data_shuffle: "false"
train_batch_size: 32
eval_batch_size: 1
save_finetune_checkpoint_path: "./classifier_finetune/ckpt/"
load_pretrain_checkpoint_path: ""
load_finetune_checkpoint_path: ""
train_data_file_path: ""
eval_data_file_path: ""
schema_file_path: ""
# export related
export_batch_size: 1
export_ckpt_file: ''
export_file_name: 'bert_classifier'
file_format: 'AIR'
optimizer_cfg:
    optimizer: 'Lamb'
    AdamWeightDecay:
        learning_rate: 0.00002 # 2e-5
        end_learning_rate: 0.0000000001 # 1e-10
        power: 1.0
        weight_decay: 0.00001 # 1e-5
        decay_filter: ['layernorm', 'bias']
        eps: 0.000001 # 1e-6
    Lamb:
        learning_rate: 0.00002 # 2e-5
        end_learning_rate: 0.0000000001 # 1e-10
        power: 1.0
        weight_decay: 0.01
        decay_filter: ['layernorm', 'bias']
    Momentum:
        learning_rate: 0.00002 # 2e-5
        momentum: 0.9
bert_net_cfg:
    seq_length: 128
    vocab_size: 21128
    hidden_size: 768
    num_hidden_layers: 12
    num_attention_heads: 12
    intermediate_size: 3072
    hidden_act: "gelu"
    hidden_dropout_prob: 0.1
    attention_probs_dropout_prob: 0.1
    max_position_embeddings: 512
    type_vocab_size: 2
    initializer_range: 0.02
    use_relative_positions: False
    dtype: mstype.float32
    compute_type: mstype.float16
---
# Help description for each configuration
enable_modelarts: "Whether to train on ModelArts, default: False"
data_url: "Dataset URL for ModelArts"
train_url: "Training output URL for ModelArts"
data_path: "The location of the input data."
output_path: "The location of the output file."
device_target: "Running platform, choose from Ascend or GPU, and default is Ascend."
enable_profiling: 'Whether to enable profiling while training, default: False'
assessment_method: "Assessment method, one of [Mcc, Spearman_correlation, Accuracy, F1], default is Accuracy"
do_train: "Enable train, default is false"
do_eval: "Enable eval, default is false"
device_id: "Device id, default is 0."
epoch_num: "Epoch number, default is 3."
num_class: "The number of classes, default is 2."
train_data_shuffle: "Enable train data shuffle, default is true"
eval_data_shuffle: "Enable eval data shuffle, default is false"
train_batch_size: "Train batch size, default is 32"
eval_batch_size: "Eval batch size, default is 1"
save_finetune_checkpoint_path: "Directory to save fine-tuned checkpoints"
load_pretrain_checkpoint_path: "Pretrained checkpoint file path to load"
load_finetune_checkpoint_path: "Fine-tuned checkpoint file path to load"
train_data_file_path: "Train data path; an absolute path is recommended"
eval_data_file_path: "Eval data path; an absolute path is recommended"
schema_file_path: "Schema path; an absolute path is recommended"
export_batch_size: "Export batch size."
export_ckpt_file: "Bert checkpoint file to export."
export_file_name: "Output file name of the exported model."
file_format: "Export file format"
---
# choices
device_target: ['Ascend', 'GPU']
assessment_method: ["Mcc", "Spearman_correlation", "Accuracy", "F1"]
do_train: ["true", "false"]
do_eval: ["true", "false"]
train_data_shuffle: ["true", "false"]
eval_data_shuffle: ["true", "false"]
file_format: ["AIR", "ONNX", "MINDIR"]
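The `decay_filter` keyword lists in this file are not used as-is: `extra_operations` in config.py replaces each list with a predicate built by `create_filter_fun`, which accepts a parameter only if its lower-cased name contains none of the keywords. Presumably the training scripts use that predicate to split parameters into a weight-decayed group and a group without decay (LayerNorm and bias terms). The `Param` stand-in and the parameter names below are hypothetical.

```python
from collections import namedtuple

# Hypothetical stand-in for a framework parameter; only the .name attribute matters here.
Param = namedtuple("Param", ["name"])


def create_filter_fun(keywords):
    """Return a predicate that is True for parameters which should receive weight decay."""
    return lambda param: not any(key in param.name.lower() for key in keywords)


decay_filter = create_filter_fun(["layernorm", "bias"])
params = [
    Param("bert.encoder.layer0.attention.dense.weight"),
    Param("bert.encoder.layer0.attention.dense.bias"),
    Param("bert.encoder.layer0.LayerNorm.gamma"),  # matched case-insensitively
]
decayed = [p.name for p in params if decay_filter(p)]
print(decayed)
```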

@@ -0,0 +1,121 @@
# Builtin Configurations (DO NOT CHANGE THESE CONFIGURATIONS unless you know exactly what you are doing)
enable_modelarts: False
# URLs for ModelArts
data_url: ""
train_url: ""
checkpoint_url: ""
# Local paths
data_path: "/cache/data"
output_path: "/cache/train"
load_path: "/cache/checkpoint_path"
device_target: "Ascend"
enable_profiling: False
# ==============================================================================
description: "run_ner"
assessment_method: "BF1"
do_train: "false"
do_eval: "false"
use_crf: "false"
device_id: 0
epoch_num: 5
train_data_shuffle: "true"
eval_data_shuffle: "false"
train_batch_size: 32
eval_batch_size: 1
vocab_file_path: ""
label_file_path: ""
save_finetune_checkpoint_path: "./ner_finetune/ckpt/"
load_pretrain_checkpoint_path: ""
load_finetune_checkpoint_path: ""
train_data_file_path: ""
eval_data_file_path: ""
dataset_format: "mindrecord"
schema_file_path: ""
# export related
export_batch_size: 1
export_ckpt_file: ''
export_file_name: 'bert_ner'
file_format: 'AIR'
optimizer_cfg:
    optimizer: 'Lamb'
    AdamWeightDecay:
        learning_rate: 0.00002 # 2e-5
        end_learning_rate: 0.0000000001 # 1e-10
        power: 1.0
        weight_decay: 0.00001 # 1e-5
        decay_filter: ['layernorm', 'bias']
        eps: 0.000001 # 1e-6
    Lamb:
        learning_rate: 0.00002 # 2e-5
        end_learning_rate: 0.0000000001 # 1e-10
        power: 1.0
        weight_decay: 0.01
        decay_filter: ['layernorm', 'bias']
    Momentum:
        learning_rate: 0.00002 # 2e-5
        momentum: 0.9
bert_net_cfg:
    seq_length: 128
    vocab_size: 21128
    hidden_size: 768
    num_hidden_layers: 12
    num_attention_heads: 12
    intermediate_size: 3072
    hidden_act: "gelu"
    hidden_dropout_prob: 0.1
    attention_probs_dropout_prob: 0.1
    max_position_embeddings: 512
    type_vocab_size: 2
    initializer_range: 0.02
    use_relative_positions: False
    dtype: mstype.float32
    compute_type: mstype.float16
---
# Help description for each configuration
enable_modelarts: "Whether to train on ModelArts, default: False"
data_url: "Dataset URL for ModelArts"
train_url: "Training output URL for ModelArts"
data_path: "The location of the input data."
output_path: "The location of the output file."
device_target: "Running platform, choose from Ascend or GPU, and default is Ascend."
enable_profiling: 'Whether to enable profiling while training, default: False'
assessment_method: "Assessment method, one of [BF1, clue_benchmark, MF1], default is BF1"
do_train: "Enable train, default is false"
do_eval: "Enable eval, default is false"
use_crf: "Use CRF, default is false"
device_id: "Device id, default is 0."
epoch_num: "Epoch number, default is 5."
train_data_shuffle: "Enable train data shuffle, default is true"
eval_data_shuffle: "Enable eval data shuffle, default is false"
train_batch_size: "Train batch size, default is 32"
eval_batch_size: "Eval batch size, default is 1"
vocab_file_path: "Vocab file path, used in clue benchmark"
label_file_path: "Label file path, used in clue benchmark"
save_finetune_checkpoint_path: "Directory to save fine-tuned checkpoints"
load_pretrain_checkpoint_path: "Pretrained checkpoint file path to load"
load_finetune_checkpoint_path: "Fine-tuned checkpoint file path to load"
train_data_file_path: "Train data path; an absolute path is recommended"
eval_data_file_path: "Eval data path; an absolute path is recommended"
dataset_format: "Dataset format, mindrecord or tfrecord"
schema_file_path: "Schema path; an absolute path is recommended"
export_batch_size: "Export batch size."
export_ckpt_file: "Bert checkpoint file to export."
export_file_name: "Output file name of the exported model."
file_format: "Export file format"
---
# choices
device_target: ['Ascend', 'GPU']
assessment_method: ["BF1", "clue_benchmark", "MF1"]
do_train: ["true", "false"]
do_eval: ["true", "false"]
use_crf: ["true", "false"]
train_data_shuffle: ["true", "false"]
eval_data_shuffle: ["true", "false"]
dataset_format: ["mindrecord", "tfrecord"]
file_format: ["AIR", "ONNX", "MINDIR"]
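Each of these YAML files holds three documents: the defaults, the help strings and the allowed choices. `argparse` only checks `choices` for values given on the command line, not for defaults, so a typo edited directly into the first document (for example `use_crf: "flase"`) would pass through unnoticed. A small check like the hypothetical `check_choices` below could catch that, and the hypothetical `flag` shows one way to turn the `"true"`/`"false"` strings into booleans; how the run scripts actually interpret them is not shown in this diff.

```python
from typing import Dict, List


def check_choices(cfg: Dict[str, object], choices: Dict[str, List[object]]) -> None:
    """Raise ValueError if a configured value is not listed in the choices document."""
    for key, allowed in choices.items():
        if key in cfg and cfg[key] not in allowed:
            raise ValueError("{}={!r} is not one of {}".format(key, cfg[key], allowed))


def flag(value: str) -> bool:
    """Interpret a "true"/"false" string setting, case-insensitively."""
    return value.lower() == "true"


# Values taken from the NER defaults and choices documents above, except do_train.
ner_cfg = {"do_train": "true", "do_eval": "false", "use_crf": "false",
           "dataset_format": "mindrecord", "assessment_method": "BF1"}
ner_choices = {"do_train": ["true", "false"], "do_eval": ["true", "false"],
               "use_crf": ["true", "false"], "dataset_format": ["mindrecord", "tfrecord"],
               "assessment_method": ["BF1", "clue_benchmark", "MF1"]}
check_choices(ner_cfg, ner_choices)  # passes silently
print(flag(ner_cfg["do_train"]), flag(ner_cfg["use_crf"]))

try:
    check_choices({"use_crf": "flase"}, ner_choices)
except ValueError as err:
    print("rejected:", err)
```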

@@ -0,0 +1,112 @@
# Builtin Configurations (DO NOT CHANGE THESE CONFIGURATIONS unless you know exactly what you are doing)
enable_modelarts: False
# URLs for ModelArts
data_url: ""
train_url: ""
checkpoint_url: ""
# Local paths
data_path: "/cache/data"
output_path: "/cache/train"
load_path: "/cache/checkpoint_path"
device_target: "Ascend"
enable_profiling: False
# ==============================================================================
description: "run_squad"
do_train: "false"
do_eval: "false"
device_id: 0
epoch_num: 3
num_class: 2
train_data_shuffle: "true"
eval_data_shuffle: "false"
train_batch_size: 32
eval_batch_size: 1
vocab_file_path: ""
eval_json_path: ""
save_finetune_checkpoint_path: "./squad_finetune/ckpt/"
load_pretrain_checkpoint_path: ""
load_finetune_checkpoint_path: ""
train_data_file_path: ""
schema_file_path: ""
# export related
export_batch_size: 1
export_ckpt_file: ''
export_file_name: 'bert_squad'
file_format: 'AIR'
optimizer_cfg:
    optimizer: 'Lamb'
    AdamWeightDecay:
        learning_rate: 0.0001 # 1e-4
        end_learning_rate: 0.00000000001 # 1e-11
        power: 5.0
        weight_decay: 0.001 # 1e-3
        decay_filter: ['layernorm', 'bias']
        eps: 0.000001 # 1e-6
    Lamb:
        learning_rate: 0.0001 # 1e-4
        end_learning_rate: 0.00000000001 # 1e-11
        power: 5.0
        weight_decay: 0.01
        decay_filter: ['layernorm', 'bias']
    Momentum:
        learning_rate: 0.0001 # 1e-4
        momentum: 0.9
bert_net_cfg:
    seq_length: 384
    vocab_size: 30522
    hidden_size: 768
    num_hidden_layers: 12
    num_attention_heads: 12
    intermediate_size: 3072
    hidden_act: "gelu"
    hidden_dropout_prob: 0.1
    attention_probs_dropout_prob: 0.1
    max_position_embeddings: 512
    type_vocab_size: 2
    initializer_range: 0.02
    use_relative_positions: False
    dtype: mstype.float32
    compute_type: mstype.float16
---
# Help description for each configuration
enable_modelarts: "Whether to train on ModelArts, default: False"
data_url: "Dataset URL for ModelArts"
train_url: "Training output URL for ModelArts"
data_path: "The location of the input data."
output_path: "The location of the output file."
device_target: "Running platform, choose from Ascend or GPU, and default is Ascend."
enable_profiling: 'Whether to enable profiling while training, default: False'
do_train: "Enable train, default is false"
do_eval: "Enable eval, default is false"
device_id: "Device id, default is 0."
epoch_num: "Epoch number, default is 3."
num_class: "The number of classes, default is 2."
train_data_shuffle: "Enable train data shuffle, default is true"
eval_data_shuffle: "Enable eval data shuffle, default is false"
train_batch_size: "Train batch size, default is 32"
eval_batch_size: "Eval batch size, default is 1"
vocab_file_path: "Vocab file path"
eval_json_path: "Evaluation json file path, can be eval.json"
save_finetune_checkpoint_path: "Directory to save fine-tuned checkpoints"
load_pretrain_checkpoint_path: "Pretrained checkpoint file path to load"
load_finetune_checkpoint_path: "Fine-tuned checkpoint file path to load"
train_data_file_path: "Train data path; an absolute path is recommended"
schema_file_path: "Schema path; an absolute path is recommended"
export_batch_size: "Export batch size."
export_ckpt_file: "Bert checkpoint file to export."
export_file_name: "Output file name of the exported model."
file_format: "Export file format"
---
# choices
device_target: ['Ascend', 'GPU']
do_train: ["true", "false"]
do_eval: ["true", "false"]
train_data_shuffle: ["true", "false"]
eval_data_shuffle: ["true", "false"]
file_format: ["AIR", "ONNX", "MINDIR"]
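The `learning_rate`, `end_learning_rate` and `power` fields look like the inputs of a polynomial-decay schedule such as MindSpore's `nn.PolynomialDecayLR`, i.e. `lr(t) = (lr0 - lr_end) * (1 - min(t, T) / T) ** power + lr_end`. Assuming that is how they are consumed (any warmup phase is ignored here), the sketch below compares the SQuAD setting `power: 5.0` with the `power: 1.0` used in the classifier and NER configs, keeping the SQuAD learning rates for both so only `power` changes. `DECAY_STEPS` is a made-up value; in practice it would depend on the dataset size and `epoch_num`.

```python
def polynomial_decay(step, decay_steps, learning_rate, end_learning_rate, power):
    """lr(t) = (lr0 - lr_end) * (1 - min(t, T) / T) ** power + lr_end"""
    progress = min(step, decay_steps) / decay_steps  # clamp so lr stays at lr_end after T
    return (learning_rate - end_learning_rate) * (1.0 - progress) ** power + end_learning_rate


DECAY_STEPS = 1000  # hypothetical; really determined by dataset size and epoch_num
for power in (1.0, 5.0):
    lr_mid = polynomial_decay(DECAY_STEPS // 2, DECAY_STEPS, 1e-4, 1e-11, power)
    print("power={}: lr at 50% of decay = {:.3e}".format(power, lr_mid))
```

With `power: 5.0` the rate falls much faster early on: halfway through, it is about 1/32 of the initial value instead of 1/2.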