BERT can be used on ModelArts
This commit is contained in:
parent 630defa9e7
commit c9d5b13e37
@ -134,6 +134,56 @@ bash scripts/run_distributed_pretrain_for_gpu.sh 8 40 /path/cn-wiki-128
bash scripts/run_squad.sh
```

- Running on ModelArts

If you want to run on ModelArts, please check the official documentation of [ModelArts](https://support.huaweicloud.com/modelarts/); you can then start training as follows.

- Pretraining with 8 cards on ModelArts

```python
# (1) Upload the code folder to the S3 bucket.
# (2) Click "Create training task" on the website UI.
# (3) Set the code directory to "/{path}/bert" on the website UI.
# (4) Set the startup file to "/{path}/bert/run_pretrain.py" on the website UI.
# (5) Perform a or b.
# a. Set parameters in /{path}/bert/pretrain_config.yaml:
# 1. Set "enable_modelarts=True"
# 2. Set other parameters; their configuration can refer to `./scripts/run_distributed_pretrain_ascend.sh`.
# b. Add parameters on the website UI:
# 1. Add "enable_modelarts=True"
# 2. Add other parameters; their configuration can refer to `./scripts/run_distributed_pretrain_ascend.sh`.
# (6) Upload the dataset to the S3 bucket.
# (7) Check the "data storage location" on the website UI and set the "Dataset path" (only the data or a zip package should be under this path).
# (8) Set the "Output file path" and "Job log path" to your paths on the website UI.
# (9) Under "resource pool selection", select the 8-card specification.
# (10) Create your job.
# After training, the '*.ckpt' files will be saved under the 'training output file path'.
```
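
The parameters added on the web UI in step (5) option b are not written into pretrain_config.yaml; ModelArts forwards them to the startup script together with the dataset and output locations. The snippet below is only a sketch of that hand-off — it is not code from this repository, and it assumes the platform passes `--key=value` pairs plus `--data_url`/`--train_url`, the two URL fields defined in the yaml added by this commit.

```python
# Hypothetical sketch: how UI hyperparameters might reach the startup script on ModelArts.
# Nothing here is taken from this repository; names follow the yaml keys in this commit.
import argparse

parser = argparse.ArgumentParser(description="ModelArts argument hand-off sketch")
parser.add_argument("--enable_modelarts", type=str, default="False")
parser.add_argument("--data_url", type=str, default="")   # OBS/S3 path chosen as "Dataset path" (step 7)
parser.add_argument("--train_url", type=str, default="")  # OBS/S3 path chosen as "Output file path" (step 8)
known, extra = parser.parse_known_args()

print("enable_modelarts:", known.enable_modelarts)
print("data_url:", known.data_url, "train_url:", known.train_url)
print("other UI overrides:", extra)  # e.g. parameters mirroring run_distributed_pretrain_ascend.sh
```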

- Running downstream tasks with a single card on ModelArts

```python
# (1) Upload the code folder to the S3 bucket.
# (2) Click "Create training task" on the website UI.
# (3) Set the code directory to "/{path}/bert" on the website UI.
# (4) Set the startup file to "/{path}/bert/run_ner.py" (or run_classifier.py or run_squad.py) on the website UI.
# (5) Perform a or b.
# a. Set parameters in task_ner_config.yaml (or task_squad_config.yaml or task_classifier_config.yaml) under the folder `/{path}/bert/`:
# 1. Set "enable_modelarts=True"
# 2. Set other parameters; their configuration can refer to `run_ner.sh` (or run_squad.sh or run_classifier.sh) under the folder '{path}/bert/scripts/'.
# b. Add parameters on the website UI:
# 1. Add "enable_modelarts=True"
# 2. Add other parameters; their configuration can refer to `run_ner.sh` (or run_squad.sh or run_classifier.sh) under the folder '{path}/bert/scripts/'.
# Note that vocab_file_path, label_file_path, train_data_file_path, eval_data_file_path and schema_file_path must be given as paths relative to the path selected in step 7.
# Finally, "config_path=../../*.yaml" must be added on the web page (select the *.yaml configuration file according to the downstream task).
# (6) Upload the dataset to the S3 bucket.
# (7) Check the "data storage location" on the website UI and set the "Dataset path" (only the data or a zip package should be under this path).
# (8) Set the "Output file path" and "Job log path" to your paths on the website UI.
# (9) Under "resource pool selection", select the single-card specification.
# (10) Create your job.
# After training, the '*.ckpt' files will be saved under the 'training output file path'.
```
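
The note about relative paths in step (5) corresponds to what the `modelarts_pre_process` functions added by this commit do: every dataset-related path from the config is joined onto the local directory that ModelArts syncs the "Dataset path" into. A condensed sketch, simplified from the `run_ner.py` change shown later in this diff (the helper name is illustrative only):

```python
import os

def resolve_dataset_paths(cfg):
    """Sketch of the path handling in modelarts_pre_process (see run_ner.py below):
    the relative paths entered on the UI are prefixed with cfg.data_path, the
    local cache directory that the "Dataset path" is synced into."""
    for name in ("vocab_file_path", "label_file_path", "train_data_file_path",
                 "eval_data_file_path", "schema_file_path"):
        value = getattr(cfg, name, "")
        if value:
            setattr(cfg, name, os.path.join(cfg.data_path, value))
    return cfg
```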

For distributed training on Ascend, an HCCL configuration file in JSON format needs to be created in advance.

For distributed training on a single machine, [here](https://gitee.com/mindspore/mindspore/tree/master/config/hccl_single_machine_multi_rank.json) is an example hccl.json.
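
The linked file shows the expected structure of that rank table. As a rough illustration only (the field names below follow that single-machine example and the usual Ascend rank-table v1.0 layout; verify them against the link, and replace the placeholder IPs):

```python
import json

# Rough sketch of a single-machine, 8-device HCCL rank table.
# Field names mirror hccl_single_machine_multi_rank.json; IP values are placeholders.
rank_table = {
    "version": "1.0",
    "server_count": "1",
    "server_list": [{
        "server_id": "10.0.0.1",
        "device": [{"device_id": str(i), "device_ip": f"192.168.100.{100 + i}", "rank_id": str(i)}
                   for i in range(8)],
        "host_nic_ip": "reserve",
    }],
    "status": "completed",
}

with open("hccl_8p.json", "w") as f:
    json.dump(rank_table, f, indent=4)
```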
@ -205,7 +255,9 @@ For example, the schema file of cn-wiki-128 dataset for pretraining shows as fol
|
|||
```shell
|
||||
.
|
||||
└─bert
|
||||
├─ascend310_infer
|
||||
├─README.md
|
||||
├─README_CN.md
|
||||
├─scripts
|
||||
├─ascend_distributed_launcher
|
||||
├─__init__.py
|
||||
|
@ -220,6 +272,11 @@ For example, the schema file of cn-wiki-128 dataset for pretraining shows as fol
|
|||
├─run_distributed_pretrain_gpu.sh # shell script for distributed pretrain on gpu
|
||||
└─run_standaloned_pretrain_gpu.sh # shell script for standalone pretrain on gpu
|
||||
├─src
|
||||
├─model_utils
|
||||
├── config.py # parse *.yaml parameter configuration file
|
||||
├── device_adapter.py # distinguish local/ModelArts training
|
||||
├── local_adapter.py # get related environment variables in local training
|
||||
└── moxing_adapter.py # get related environment variables in ModelArts training
|
||||
├─__init__.py
|
||||
├─assessment_method.py # assessment method for evaluation
|
||||
├─bert_for_finetune.py # backbone code of network
|
||||
|
@ -227,13 +284,15 @@ For example, the schema file of cn-wiki-128 dataset for pretraining shows as fol
|
|||
├─bert_model.py # backbone code of network
|
||||
├─finetune_data_preprocess.py # data preprocessing
|
||||
├─cluner_evaluation.py # evaluation for cluner
|
||||
├─config.py # parameter configuration for pretraining
|
||||
├─CRF.py # assessment method for clue dataset
|
||||
├─dataset.py # data preprocessing
|
||||
├─finetune_eval_config.py # parameter configuration for finetuning
|
||||
├─finetune_eval_model.py # backbone code of network
|
||||
├─sample_process.py # sample processing
|
||||
├─utils.py # util function
|
||||
├─pretrain_config.yaml # parameter configuration for pretrain
|
||||
├─task_ner_config.yaml # parameter configuration for downstream_task_ner
|
||||
├─task_classifier_config.yaml # parameter configuration for downstream_task_classifier
|
||||
├─task_squad_config.yaml # parameter configuration for downstream_task_squad
|
||||
├─pretrain_eval.py # train and eval net
|
||||
├─run_classifier.py # finetune and eval net for classifier task
|
||||
├─run_ner.py # finetune and eval net for ner task
|
||||
|
@ -591,8 +650,38 @@ The result will be as follows:
|
|||
|
||||
### [Export MindIR](#contents)
|
||||
|
||||
- Export locally
|
||||
|
||||
```shell
|
||||
python export.py --ckpt_file [CKPT_PATH] --file_name [FILE_NAME] --file_format [FILE_FORMAT]
|
||||
python export.py --config_path [../../*.yaml] --ckpt_file [CKPT_PATH] --file_name [FILE_NAME] --file_format [FILE_FORMAT]
|
||||
```
|
||||
|
||||
- Export on ModelArts (if you want to run on ModelArts, please check the official documentation of [ModelArts](https://support.huaweicloud.com/modelarts/); you can then start as follows)

```python
# (1) Upload the code folder to the S3 bucket.
# (2) Click "Create training task" on the website UI.
# (3) Set the code directory to "/{path}/bert" on the website UI.
# (4) Set the startup file to "/{path}/bert/export.py" on the website UI.
# (5) Perform a or b.
# a. Set parameters in task_ner_config.yaml (or task_squad_config.yaml or task_classifier_config.yaml) under the folder `/{path}/bert/`:
# 1. Set "enable_modelarts: True"
# 2. Set "export_ckpt_file: ./{path}/*.ckpt" ('export_ckpt_file' is the path of the weight file to be exported, relative to `export.py`; the weight file must be included in the code directory.)
# 3. Set "export_file_name: bert_ner"
# 4. Set "file_format: MINDIR"
# 5. Set "label_file_path: {path}/*.txt" ('label_file_path' is the path relative to the folder selected in step 7.)
# b. Add parameters on the website UI:
# 1. Add "enable_modelarts=True"
# 2. Add "export_ckpt_file=./{path}/*.ckpt" ('export_ckpt_file' is the path of the weight file to be exported, relative to `export.py`; the weight file must be included in the code directory.)
# 3. Add "export_file_name=bert_ner"
# 4. Add "file_format=MINDIR"
# 5. Add "label_file_path={path}/*.txt" ('label_file_path' is the path relative to the folder selected in step 7.)
# Finally, "config_path=../../*.yaml" must be added on the web page (select the *.yaml configuration file according to the downstream task).
# (6) Upload the dataset (including the label file) to the S3 bucket.
# (7) Check the "data storage location" on the website UI and set the "Dataset path".
# (8) Set the "Output file path" and "Job log path" to your paths on the website UI.
# (9) Under "resource pool selection", select the single-card specification.
# (10) Create your job.
# You will see bert_ner.mindir under {Output file path}.
```
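
After the job finishes, the exported file can be sanity-checked locally. A minimal sketch, assuming MindSpore is installed and the dummy inputs match the `[batch_size, seq_length]` int32 shapes used by `export.py` (three inputs for the non-CRF NER export):

```python
import numpy as np
from mindspore import Tensor, load, nn

# Sketch: load the exported MindIR and run one dummy batch through it.
# batch_size and seq_length must match the values used at export time.
batch_size, seq_length = 1, 128
graph = load("bert_ner.mindir")
net = nn.GraphCell(graph)

input_ids = Tensor(np.zeros((batch_size, seq_length), np.int32))
input_mask = Tensor(np.zeros((batch_size, seq_length), np.int32))
token_type_id = Tensor(np.zeros((batch_size, seq_length), np.int32))
print(net(input_ids, input_mask, token_type_id))
```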
|
||||
|
||||
The `ckpt_file` parameter is required, and `EXPORT_FORMAT` must be chosen from ["AIR", "MINDIR"].
|
||||
|
|
|
@ -139,6 +139,54 @@ bash scripts/run_distributed_pretrain_for_gpu.sh 8 40 /path/cn-wiki-128
bash scripts/run_squad.sh
```

- Running on ModelArts (if you want to run on ModelArts, refer to the [ModelArts](https://support.huaweicloud.com/modelarts/) documentation)

- Pretraining with 8 cards on ModelArts

```python
# (1) Upload your code to the S3 bucket.
# (2) Create a training task on ModelArts.
# (3) Select the code directory /{path}/bert.
# (4) Select the startup file /{path}/bert/run_pretrain.py.
# (5) Perform a or b.
# a. Set parameters in /{path}/bert/pretrain_config.yaml:
# 1. Set "enable_modelarts=True"
# 2. Set other parameters; their configuration can refer to `./scripts/run_distributed_pretrain_ascend.sh`.
# b. Set parameters on the web page:
# 1. Add "enable_modelarts=True"
# 2. Add other parameters; their configuration can refer to `./scripts/run_distributed_pretrain_ascend.sh`.
# (6) Upload your data to the S3 bucket.
# (7) Check the "data storage location" on the web page and set the "training dataset" path.
# (8) Set the "training output file path" and "job log path" on the web page.
# (9) Under "resource pool selection" on the web page, select the 8-card specification.
# (10) Create the training job.
# After training, the trained weights are saved under the 'training output file path'.
```

- Running downstream tasks with a single card on ModelArts

```python
# (1) Upload your code to the S3 bucket.
# (2) Create a training task on ModelArts.
# (3) Select the code directory /{path}/bert.
# (4) Select the startup file /{path}/bert/run_ner.py (or run_squad.py or run_classifier.py).
# (5) Perform a or b.
# a. Set parameters in `task_ner_config.yaml` (or `task_squad_config.yaml` or `task_classifier_config.yaml`) under /{path}/bert:
# 1. Set "enable_modelarts=True"
# 2. Set other parameters; their configuration can refer to `run_ner.sh`, `run_squad.sh` or `run_classifier.sh` under './scripts/'.
# b. Set parameters on the web page:
# 1. Add "enable_modelarts=True"
# 2. Add other parameters; their configuration can refer to `run_ner.sh`, `run_squad.sh` or `run_classifier.sh` under './scripts/'.
# Note that vocab_file_path, label_file_path, train_data_file_path, eval_data_file_path and schema_file_path must be given as paths relative to the path selected in step 7.
# Finally, "config_path=../../*.yaml" must be added on the web page (select the *.yaml configuration file according to the downstream task).
# (6) Upload your data to the S3 bucket.
# (7) Check the "data storage location" on the web page and set the "training dataset" path (only the data or a zip package should be under this path).
# (8) Set the "training output file path" and "job log path" on the web page.
# (9) Under "resource pool selection" on the web page, select the single-card specification.
# (10) Create the training job.
# After training, the trained weights are saved under the 'training output file path'.
```

For distributed training on Ascend devices, create an HCCL configuration file in JSON format in advance.

For single-machine distributed training on Ascend devices, refer to [here](https://gitee.com/mindspore/mindspore/tree/master/config/hccl_single_machine_multi_rank.json) to create the HCCL configuration file.
@ -207,12 +255,14 @@ For example, the schema file of cn-wiki-128 dataset for pretraining shows as fol
```shell
.
└─bert
├─ascend310_infer
├─README.md
├─README_CN.md
├─scripts
├─ascend_distributed_launcher
├─__init__.py
├─hyper_parameter_config.ini # hyperparameters for distributed pretraining
├─get_distribute_pretrain_cmd.py # script for distributed pretraining
├─README.md
├─run_classifier.sh # shell script for standalone classifier task on Ascend or GPU
├─run_ner.sh # shell script for standalone NER task on Ascend or GPU
@ -222,6 +272,11 @@ For example, the schema file of cn-wiki-128 dataset for pretraining shows as fol
├─run_distributed_pretrain_gpu.sh # shell script for distributed pretraining on GPU
└─run_standaloned_pretrain_gpu.sh # shell script for standalone pretraining on GPU
├─src
├─model_utils
├── config.py # parse the *.yaml parameter configuration file
├── device_adapter.py # distinguish local/ModelArts training
├── local_adapter.py # get related environment variables in local training
└── moxing_adapter.py # get related environment variables and transfer data in ModelArts training
├─__init__.py
├─assessment_method.py # assessment methods for evaluation
├─bert_for_finetune.py # backbone code of the network
@ -229,13 +284,15 @@ For example, the schema file of cn-wiki-128 dataset for pretraining shows as fol
├─bert_model.py # backbone code of the network
├─finetune_data_preprocess.py # data preprocessing
├─cluner_evaluation.py # evaluation for CLUENER
├─config.py # parameter configuration for pretraining
├─CRF.py # assessment method for the CLUE dataset
├─dataset.py # data preprocessing
├─finetune_eval_config.py # parameter configuration for finetuning
├─finetune_eval_model.py # backbone code of the network
├─sample_process.py # sample processing
├─utils.py # utility functions
├─pretrain_config.yaml # parameter configuration for pretraining
├─task_ner_config.yaml # parameter configuration for the downstream NER task
├─task_classifier_config.yaml # parameter configuration for the downstream classifier task
├─task_squad_config.yaml # parameter configuration for the downstream SQuAD task
├─pretrain_eval.py # train and eval net
├─run_classifier.py # finetune and eval net for the classifier task
├─run_ner.py # finetune and eval net for the NER task
@ -556,8 +613,38 @@ bash scripts/squad.sh

## Export MindIR Model

- Export locally

```shell
python export.py --ckpt_file [CKPT_PATH] --file_name [FILE_NAME] --file_format [FILE_FORMAT]
python export.py --config_path [../../*.yaml] --ckpt_file [CKPT_PATH] --file_name [FILE_NAME] --file_format [FILE_FORMAT]
```

- Export on ModelArts

```python
# (1) Upload your code to the S3 bucket.
# (2) Create a training task on ModelArts.
# (3) Select the code directory /{path}/bert.
# (4) Select the startup file /{path}/bert/export.py.
# (5) Perform a or b.
# a. Set parameters in `task_ner_config.yaml` (or `task_squad_config.yaml` or `task_classifier_config.yaml`) under /{path}/bert:
# 1. Set "enable_modelarts: True"
# 2. Set "export_ckpt_file: ./{path}/*.ckpt" ('export_ckpt_file' is the path of the '*.ckpt' weight file to be exported, relative to `export.py`; the weight file must be included in the code directory.)
# 3. Set "export_file_name: bert_ner"
# 4. Set "file_format: MINDIR"
# 5. Set "label_file_path: {path}/*.txt" ('label_file_path' is the path relative to the folder selected in step 7.)
# b. Set parameters on the web page:
# 1. Add "enable_modelarts=True"
# 2. Add "export_ckpt_file=./{path}/*.ckpt" ('export_ckpt_file' is the path of the '*.ckpt' weight file to be exported, relative to `export.py`; the weight file must be included in the code directory.)
# 3. Add "export_file_name=bert_ner"
# 4. Add "file_format=MINDIR"
# 5. Add "label_file_path={path}/*.txt" ('label_file_path' is the path relative to the folder selected in step 7.)
# Finally, "config_path=../../*.yaml" must be added on the web page (select the *.yaml configuration file according to the downstream task).
# (6) Upload your data (including the label file) to the S3 bucket.
# (7) Check the "data storage location" on the web page and set the "training dataset" path.
# (8) Set the "training output file path" and "job log path" on the web page.
# (9) Under "resource pool selection" on the web page, select the single-card specification.
# (10) Create the training job.
# You will see the 'bert_ner.mindir' file under {Output file path}.
```

The `ckpt_file` parameter is required, and `EXPORT_FORMAT` must be chosen from ["AIR", "MINDIR"].
|
|
|
@ -13,74 +13,77 @@
|
|||
# limitations under the License.
|
||||
# ============================================================================
|
||||
"""export checkpoint file into models"""
|
||||
import argparse
|
||||
import os
|
||||
import numpy as np
|
||||
|
||||
import mindspore.common.dtype as mstype
|
||||
from mindspore import Tensor, context, load_checkpoint, export
|
||||
|
||||
from src.finetune_eval_model import BertCLSModel, BertSquadModel, BertNERModel
|
||||
from src.finetune_eval_config import bert_net_cfg
|
||||
from src.bert_for_finetune import BertNER
|
||||
from src.utils import convert_labels_to_index
|
||||
from src.model_utils.config import config as args, bert_net_cfg
|
||||
from src.model_utils.moxing_adapter import moxing_wrapper
|
||||
from src.model_utils.device_adapter import get_device_id
|
||||
|
||||
parser = argparse.ArgumentParser(description="Bert export")
|
||||
parser.add_argument("--device_id", type=int, default=0, help="Device id")
|
||||
parser.add_argument("--use_crf", type=str, default="false", help="Use cfg, default is false.")
|
||||
parser.add_argument("--downstream_task", type=str, choices=["NER", "CLS", "SQUAD"], default="NER",
|
||||
help="at present,support NER only")
|
||||
parser.add_argument("--batch_size", type=int, default=1, help="batch size")
|
||||
parser.add_argument("--label_file_path", type=str, default="", help="label file path, used in clue benchmark.")
|
||||
parser.add_argument("--ckpt_file", type=str, required=True, help="Bert ckpt file.")
|
||||
parser.add_argument("--file_name", type=str, default="Bert", help="bert output air name.")
|
||||
parser.add_argument("--file_format", type=str, choices=["AIR", "ONNX", "MINDIR"], default="AIR", help="file format")
|
||||
parser.add_argument("--device_target", type=str, default="Ascend",
|
||||
choices=["Ascend", "GPU", "CPU"], help="device target (default: Ascend)")
|
||||
args = parser.parse_args()
|
||||
|
||||
context.set_context(mode=context.GRAPH_MODE, device_target=args.device_target)
|
||||
if args.device_target == "Ascend":
|
||||
context.set_context(device_id=args.device_id)
|
||||
def modelarts_pre_process():
|
||||
'''modelarts pre process function.'''
|
||||
args.device_id = get_device_id()
|
||||
_file_dir = os.path.dirname(os.path.abspath(__file__))
|
||||
args.export_ckpt_file = os.path.join(_file_dir, args.export_ckpt_file)
|
||||
args.label_file_path = os.path.join(args.data_path, args.label_file_path)
|
||||
args.export_file_name = os.path.join(_file_dir, args.export_file_name)
|
||||
|
||||
label_list = []
|
||||
with open(args.label_file_path) as f:
|
||||
for label in f:
|
||||
label_list.append(label.strip())
|
||||
|
||||
tag_to_index = convert_labels_to_index(label_list)
|
||||
@moxing_wrapper(pre_process=modelarts_pre_process)
|
||||
def run_export():
|
||||
'''export function'''
|
||||
context.set_context(mode=context.GRAPH_MODE, device_target=args.device_target)
|
||||
if args.device_target == "Ascend":
|
||||
context.set_context(device_id=args.device_id)
|
||||
|
||||
if args.use_crf.lower() == "true":
|
||||
max_val = max(tag_to_index.values())
|
||||
tag_to_index["<START>"] = max_val + 1
|
||||
tag_to_index["<STOP>"] = max_val + 2
|
||||
number_labels = len(tag_to_index)
|
||||
else:
|
||||
number_labels = len(tag_to_index)
|
||||
label_list = []
|
||||
with open(args.label_file_path) as f:
|
||||
for label in f:
|
||||
label_list.append(label.strip())
|
||||
|
||||
if __name__ == "__main__":
|
||||
if args.downstream_task == "NER":
|
||||
tag_to_index = convert_labels_to_index(label_list)
|
||||
|
||||
if args.use_crf.lower() == "true":
|
||||
max_val = max(tag_to_index.values())
|
||||
tag_to_index["<START>"] = max_val + 1
|
||||
tag_to_index["<STOP>"] = max_val + 2
|
||||
number_labels = len(tag_to_index)
|
||||
else:
|
||||
number_labels = len(tag_to_index)
|
||||
if args.description == "run_ner":
|
||||
if args.use_crf.lower() == "true":
|
||||
net = BertNER(bert_net_cfg, args.batch_size, False, num_labels=number_labels,
|
||||
net = BertNER(bert_net_cfg, args.export_batch_size, False, num_labels=number_labels,
|
||||
use_crf=True, tag_to_index=tag_to_index)
|
||||
else:
|
||||
net = BertNERModel(bert_net_cfg, False, number_labels, use_crf=(args.use_crf.lower() == "true"))
|
||||
elif args.downstream_task == "CLS":
|
||||
elif args.description == "run_classifier":
|
||||
net = BertCLSModel(bert_net_cfg, False, num_labels=number_labels)
|
||||
elif args.downstream_task == "SQUAD":
|
||||
elif args.description == "run_squad":
|
||||
net = BertSquadModel(bert_net_cfg, False)
|
||||
else:
|
||||
raise ValueError("unsupported downstream task")
|
||||
|
||||
load_checkpoint(args.ckpt_file, net=net)
|
||||
load_checkpoint(args.export_ckpt_file, net=net)
|
||||
net.set_train(False)
|
||||
|
||||
input_ids = Tensor(np.zeros([args.batch_size, bert_net_cfg.seq_length]), mstype.int32)
|
||||
input_mask = Tensor(np.zeros([args.batch_size, bert_net_cfg.seq_length]), mstype.int32)
|
||||
token_type_id = Tensor(np.zeros([args.batch_size, bert_net_cfg.seq_length]), mstype.int32)
|
||||
label_ids = Tensor(np.zeros([args.batch_size, bert_net_cfg.seq_length]), mstype.int32)
|
||||
input_ids = Tensor(np.zeros([args.export_batch_size, bert_net_cfg.seq_length]), mstype.int32)
|
||||
input_mask = Tensor(np.zeros([args.export_batch_size, bert_net_cfg.seq_length]), mstype.int32)
|
||||
token_type_id = Tensor(np.zeros([args.export_batch_size, bert_net_cfg.seq_length]), mstype.int32)
|
||||
label_ids = Tensor(np.zeros([args.export_batch_size, bert_net_cfg.seq_length]), mstype.int32)
|
||||
|
||||
if args.downstream_task == "NER" and args.use_crf.lower() == "true":
|
||||
if args.description == "run_ner" and args.use_crf.lower() == "true":
|
||||
input_data = [input_ids, input_mask, token_type_id, label_ids]
|
||||
else:
|
||||
input_data = [input_ids, input_mask, token_type_id]
|
||||
export(net, *input_data, file_name=args.file_name, file_format=args.file_format)
|
||||
export(net, *input_data, file_name=args.export_file_name, file_format=args.file_format)
|
||||
|
||||
|
||||
if __name__ == "__main__":
|
||||
run_export()
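
For readers following the diff: `moxing_wrapper` comes from `src/model_utils/moxing_adapter.py`, which this commit adds but which is not shown here. The sketch below is an assumption about its general shape rather than the actual implementation — it copies the OBS inputs into the local cache, runs the optional `pre_process` hook (such as `modelarts_pre_process` above), executes the wrapped function, and copies the outputs back.

```python
# Hypothetical sketch of a moxing_wrapper-style decorator; the real implementation in
# src/model_utils/moxing_adapter.py may differ. Stubs stand in for the shared config
# object and for the OBS<->local copy helper (mox.file.copy_parallel on ModelArts).
import functools
from types import SimpleNamespace

config = SimpleNamespace(enable_modelarts=True,
                         data_url="obs://bucket/data", data_path="/cache/data",
                         output_path="/cache/train", train_url="obs://bucket/output")

def sync_data(src, dst):
    print(f"sync {src} -> {dst}")  # placeholder for the real parallel copy

def moxing_wrapper(pre_process=None):
    def decorator(run_func):
        @functools.wraps(run_func)
        def wrapper(*args, **kwargs):
            if config.enable_modelarts:
                sync_data(config.data_url, config.data_path)      # pull inputs from OBS
                if pre_process:
                    pre_process()                                  # e.g. rewrite relative paths
            run_func(*args, **kwargs)
            if config.enable_modelarts:
                sync_data(config.output_path, config.train_url)   # push outputs back to OBS
        return wrapper
    return decorator

@moxing_wrapper(pre_process=lambda: print("pre-process: adjust paths and device id"))
def demo_job():
    print("run the wrapped entry point here")

demo_job()
```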
|
||||
|
|
|
@ -21,7 +21,7 @@ import os
|
|||
import argparse
|
||||
import numpy as np
|
||||
from mindspore import Tensor
|
||||
from src.finetune_eval_config import bert_net_cfg
|
||||
from src.model_utils.config import bert_net_cfg
|
||||
from src.assessment_method import Accuracy, F1, MCC, Spearman_Correlation
|
||||
from run_ner import eval_result_print
|
||||
|
||||
|
|
|
@ -0,0 +1,174 @@
|
|||
# Builtin Configurations(DO NOT CHANGE THESE CONFIGURATIONS unless you know exactly what you are doing)
|
||||
enable_modelarts: False
|
||||
# Url for modelarts
|
||||
data_url: ""
|
||||
train_url: ""
|
||||
checkpoint_url: ""
|
||||
# Path for local
|
||||
data_path: "/cache/data"
|
||||
output_path: "/cache/train"
|
||||
load_path: "/cache/checkpoint_path"
|
||||
device_target: "Ascend"
|
||||
enable_profiling: False
|
||||
|
||||
# ==============================================================================
|
||||
description: 'run_pretrain'
|
||||
distribute: 'false'
|
||||
epoch_size: 1
|
||||
device_id: 0
|
||||
device_num: 1
|
||||
enable_save_ckpt: 'true'
|
||||
enable_lossscale: 'true'
|
||||
do_shuffle: 'true'
|
||||
enable_data_sink: 'true'
|
||||
data_sink_steps: 1
|
||||
accumulation_steps: 1
|
||||
allreduce_post_accumulation: 'true'
|
||||
save_checkpoint_path: ''
|
||||
load_checkpoint_path: ''
|
||||
save_checkpoint_steps: 1000
|
||||
train_steps: -1
|
||||
save_checkpoint_num: 1
|
||||
data_dir: ''
|
||||
schema_dir: ''
|
||||
|
||||
# ==============================================================================
|
||||
# pretrain related
|
||||
batch_size: 32
|
||||
bert_network: 'base'
|
||||
loss_scale_value: 65536
|
||||
scale_factor: 2
|
||||
scale_window: 1000
|
||||
optimizer: 'Lamb'
|
||||
enable_global_norm: False
|
||||
# pretrain_eval related
|
||||
data_file: ""
|
||||
schema_file: ""
|
||||
finetune_ckpt: ""
|
||||
# optimizer related
|
||||
AdamWeightDecay:
|
||||
learning_rate: 0.00003 # 3e-5
|
||||
end_learning_rate: 0.0
|
||||
power: 5.0
|
||||
weight_decay: 0.00001 # 1e-5
|
||||
decay_filter: ['layernorm', 'bias']
|
||||
eps: 0.000001 # 1e-6
|
||||
warmup_steps: 10000
|
||||
|
||||
Lamb:
|
||||
learning_rate: 0.0003 # 3e-4
|
||||
end_learning_rate: 0.0
|
||||
power: 2.0
|
||||
warmup_steps: 10000
|
||||
weight_decay: 0.01
|
||||
decay_filter: ['layernorm', 'bias']
|
||||
eps: 0.00000001 # 1e-8,
|
||||
|
||||
Momentum:
|
||||
learning_rate: 0.00002 # 2e-5
|
||||
momentum: 0.9
|
||||
|
||||
Thor:
|
||||
lr_max: 0.0034
|
||||
lr_min: 0.00003244 # 3.244e-5
|
||||
lr_power: 1.0
|
||||
lr_total_steps: 30000
|
||||
damping_max: 0.05 # 5e-2
|
||||
damping_min: 0.000001 # 1e-6
|
||||
damping_power: 1.0
|
||||
damping_total_steps: 30000
|
||||
momentum: 0.9
|
||||
weight_decay: 0.0005 # 5e-4,
|
||||
loss_scale: 1.0
|
||||
frequency: 100
|
||||
# ==============================================================================
|
||||
# base
|
||||
base_batch_size: 256
|
||||
base_net_cfg:
|
||||
seq_length: 128
|
||||
vocab_size: 21128
|
||||
hidden_size: 768
|
||||
num_hidden_layers: 12
|
||||
num_attention_heads: 12
|
||||
intermediate_size: 3072
|
||||
hidden_act: "gelu"
|
||||
hidden_dropout_prob: 0.1
|
||||
attention_probs_dropout_prob: 0.1
|
||||
max_position_embeddings: 512
|
||||
type_vocab_size: 2
|
||||
initializer_range: 0.02
|
||||
use_relative_positions: False
|
||||
dtype: mstype.float32
|
||||
compute_type: mstype.float16
|
||||
# nezha
|
||||
nezha_batch_size: 96
|
||||
nezha_net_cfg:
|
||||
seq_length: 128
|
||||
vocab_size: 21128
|
||||
hidden_size: 1024
|
||||
num_hidden_layers: 24
|
||||
num_attention_heads: 16
|
||||
intermediate_size: 4096
|
||||
hidden_act: "gelu"
|
||||
hidden_dropout_prob: 0.1
|
||||
attention_probs_dropout_prob: 0.1
|
||||
max_position_embeddings: 512
|
||||
type_vocab_size: 2
|
||||
initializer_range: 0.02
|
||||
use_relative_positions: True
|
||||
dtype: mstype.float32
|
||||
compute_type: mstype.float16
|
||||
# large
|
||||
large_batch_size: 24
|
||||
large_net_cfg:
|
||||
seq_length: 512
|
||||
vocab_size: 30522
|
||||
hidden_size: 1024
|
||||
num_hidden_layers: 24
|
||||
num_attention_heads: 16
|
||||
intermediate_size: 4096
|
||||
hidden_act: "gelu"
|
||||
hidden_dropout_prob: 0.1
|
||||
attention_probs_dropout_prob: 0.1
|
||||
max_position_embeddings: 512
|
||||
type_vocab_size: 2
|
||||
initializer_range: 0.02
|
||||
use_relative_positions: False
|
||||
dtype: mstype.float32
|
||||
compute_type: mstype.float16
|
||||
|
||||
---
|
||||
# Help description for each configuration
|
||||
enable_modelarts: "Whether training on modelarts, default: False"
|
||||
data_url: "Url for modelarts"
|
||||
train_url: "Url for modelarts"
|
||||
data_path: "The location of the input data."
|
||||
output_path: "The location of the output file."
|
||||
device_target: "Running platform, choose from Ascend or GPU, and default is Ascend."
|
||||
enable_profiling: 'Whether enable profiling while training, default: False'
|
||||
|
||||
distribute: "Run distribute, default is 'false'."
|
||||
epoch_size: "Epoch size, default is 1."
|
||||
enable_save_ckpt: "Enable save checkpoint, default is true."
|
||||
enable_lossscale: "Use lossscale or not, default is not."
|
||||
do_shuffle: "Enable shuffle for dataset, default is true."
|
||||
enable_data_sink: "Enable data sink, default is true."
|
||||
data_sink_steps: "Sink steps for each epoch, default is 1."
|
||||
accumulation_steps: "Accumulating gradients N times before weight update, default is 1."
|
||||
allreduce_post_accumulation: "Whether to allreduce after accumulation of N steps or after each step, default is true."
|
||||
save_checkpoint_path: "Save checkpoint path"
|
||||
load_checkpoint_path: "Load checkpoint file path"
|
||||
save_checkpoint_steps: "Save checkpoint steps, default is 1000"
|
||||
train_steps: "Training Steps, default is -1, meaning run all steps according to epoch number."
|
||||
save_checkpoint_num: "Save checkpoint numbers, default is 1."
|
||||
data_dir: "Data path, it is better to use absolute path"
|
||||
schema_dir: "Schema path, it is better to use absolute path"
|
||||
---
|
||||
# choices
|
||||
device_target: ['Ascend', 'GPU']
|
||||
distribute: ["true", "false"]
|
||||
enable_save_ckpt: ["true", "false"]
|
||||
enable_lossscale: ["true", "false"]
|
||||
do_shuffle: ["true", "false"]
|
||||
enable_data_sink: ["true", "false"]
|
||||
allreduce_post_accumulation: ["true", "false"]
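
This file holds three YAML documents separated by `---`: default values, help strings, and allowed choices. They are consumed by `src/model_utils/config.py` (added by this commit but not shown in this diff). The parser below is only a sketch of how such a layered file can be turned into a config object with command-line overrides; it is not the repository's implementation.

```python
# Hypothetical sketch: merge CLI --key value overrides into the first YAML document,
# using the second document for help text and the third for allowed choices.
import argparse
from types import SimpleNamespace

import yaml

def parse_yaml_config(path):
    with open(path) as f:
        docs = list(yaml.safe_load_all(f)) + [{}, {}, {}]
    defaults, helper, choices = docs[0] or {}, docs[1] or {}, docs[2] or {}
    parser = argparse.ArgumentParser(description="layered yaml config sketch")
    for key, value in defaults.items():
        if isinstance(value, (dict, list)):
            continue  # nested sections such as 'Lamb:' stay YAML-only in this sketch
        arg_type = (lambda v: str(v).lower() == "true") if isinstance(value, bool) \
            else (str if value is None else type(value))
        parser.add_argument(f"--{key}", type=arg_type, default=value,
                            choices=choices.get(key), help=helper.get(key, ""))
    args, _ = parser.parse_known_args()
    defaults.update(vars(args))
    return SimpleNamespace(**defaults)

# Usage sketch:
#   cfg = parse_yaml_config("pretrain_config.yaml")
#   print(cfg.batch_size, cfg.enable_modelarts)
```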
|
|
@ -19,7 +19,7 @@ Bert evaluation script.
|
|||
|
||||
import os
|
||||
from src import BertModel, GetMaskedLMOutput
|
||||
from src.config import cfg, bert_net_cfg
|
||||
from src.model_utils.config import config as cfg, bert_net_cfg
|
||||
import mindspore.common.dtype as mstype
|
||||
from mindspore import context
|
||||
from mindspore.common.tensor import Tensor
|
||||
|
|
|
@ -18,12 +18,6 @@ Bert finetune and evaluation script.
|
|||
'''
|
||||
|
||||
import os
|
||||
import argparse
|
||||
from src.bert_for_finetune import BertFinetuneCell, BertCLS
|
||||
from src.finetune_eval_config import optimizer_cfg, bert_net_cfg
|
||||
from src.dataset import create_classification_dataset
|
||||
from src.assessment_method import Accuracy, F1, MCC, Spearman_Correlation
|
||||
from src.utils import make_directory, LossCallBack, LoadNewestCkpt, BertLearningRate
|
||||
import mindspore.common.dtype as mstype
|
||||
from mindspore import context
|
||||
from mindspore import log as logger
|
||||
|
@ -33,8 +27,17 @@ from mindspore.train.model import Model
|
|||
from mindspore.train.callback import CheckpointConfig, ModelCheckpoint, TimeMonitor
|
||||
from mindspore.train.serialization import load_checkpoint, load_param_into_net
|
||||
|
||||
from src.bert_for_finetune import BertFinetuneCell, BertCLS
|
||||
from src.dataset import create_classification_dataset
|
||||
from src.assessment_method import Accuracy, F1, MCC, Spearman_Correlation
|
||||
from src.utils import make_directory, LossCallBack, LoadNewestCkpt, BertLearningRate
|
||||
from src.model_utils.config import config as args_opt, optimizer_cfg, bert_net_cfg
|
||||
from src.model_utils.moxing_adapter import moxing_wrapper
|
||||
from src.model_utils.device_adapter import get_device_id
|
||||
|
||||
_cur_dir = os.getcwd()
|
||||
|
||||
|
||||
def do_train(dataset=None, network=None, load_checkpoint_path="", save_checkpoint_path="", epoch_num=1):
|
||||
""" do train """
|
||||
if load_checkpoint_path == "":
|
||||
|
@ -81,6 +84,7 @@ def do_train(dataset=None, network=None, load_checkpoint_path="", save_checkpoin
|
|||
callbacks = [TimeMonitor(dataset.get_dataset_size()), LossCallBack(dataset.get_dataset_size()), ckpoint_cb]
|
||||
model.train(epoch_num, dataset, callbacks=callbacks)
|
||||
|
||||
|
||||
def eval_result_print(assessment_method="accuracy", callback=None):
|
||||
""" print eval result """
|
||||
if assessment_method == "accuracy":
|
||||
|
@ -97,6 +101,7 @@ def eval_result_print(assessment_method="accuracy", callback=None):
|
|||
else:
|
||||
raise ValueError("Assessment method not supported, support: [accuracy, f1, mcc, spearman_correlation]")
|
||||
|
||||
|
||||
def do_eval(dataset=None, network=None, num_class=2, assessment_method="accuracy", load_checkpoint_path=""):
|
||||
""" do eval """
|
||||
if load_checkpoint_path == "":
|
||||
|
@ -130,51 +135,34 @@ def do_eval(dataset=None, network=None, num_class=2, assessment_method="accuracy
|
|||
eval_result_print(assessment_method, callback)
|
||||
print("==============================================================")
|
||||
|
||||
|
||||
def modelarts_pre_process():
|
||||
'''modelarts pre process function.'''
|
||||
args_opt.device_id = get_device_id()
|
||||
_file_dir = os.path.dirname(os.path.abspath(__file__))
|
||||
args_opt.load_pretrain_checkpoint_path = os.path.join(_file_dir, args_opt.load_pretrain_checkpoint_path)
|
||||
args_opt.load_finetune_checkpoint_path = os.path.join(args_opt.output_path, args_opt.load_finetune_checkpoint_path)
|
||||
args_opt.save_finetune_checkpoint_path = os.path.join(args_opt.output_path, args_opt.save_finetune_checkpoint_path)
|
||||
if args_opt.schema_file_path:
|
||||
args_opt.schema_file_path = os.path.join(args_opt.data_path, args_opt.schema_file_path)
|
||||
args_opt.train_data_file_path = os.path.join(args_opt.data_path, args_opt.train_data_file_path)
|
||||
args_opt.eval_data_file_path = os.path.join(args_opt.data_path, args_opt.eval_data_file_path)
|
||||
|
||||
|
||||
@moxing_wrapper(pre_process=modelarts_pre_process)
|
||||
def run_classifier():
|
||||
"""run classifier task"""
|
||||
parser = argparse.ArgumentParser(description="run classifier")
|
||||
parser.add_argument("--device_target", type=str, default="Ascend", choices=["Ascend", "GPU"],
|
||||
help="Device type, default is Ascend")
|
||||
parser.add_argument("--assessment_method", type=str, default="Accuracy",
|
||||
choices=["Mcc", "Spearman_correlation", "Accuracy", "F1"],
|
||||
help="assessment_method including [Mcc, Spearman_correlation, Accuracy, F1],\
|
||||
default is Accuracy")
|
||||
parser.add_argument("--do_train", type=str, default="false", choices=["true", "false"],
|
||||
help="Enable train, default is false")
|
||||
parser.add_argument("--do_eval", type=str, default="false", choices=["true", "false"],
|
||||
help="Enable eval, default is false")
|
||||
parser.add_argument("--device_id", type=int, default=0, help="Device id, default is 0.")
|
||||
parser.add_argument("--epoch_num", type=int, default=3, help="Epoch number, default is 3.")
|
||||
parser.add_argument("--num_class", type=int, default=2, help="The number of class, default is 2.")
|
||||
parser.add_argument("--train_data_shuffle", type=str, default="true", choices=["true", "false"],
|
||||
help="Enable train data shuffle, default is true")
|
||||
parser.add_argument("--eval_data_shuffle", type=str, default="false", choices=["true", "false"],
|
||||
help="Enable eval data shuffle, default is false")
|
||||
parser.add_argument("--train_batch_size", type=int, default=32, help="Train batch size, default is 32")
|
||||
parser.add_argument("--eval_batch_size", type=int, default=1, help="Eval batch size, default is 1")
|
||||
parser.add_argument("--save_finetune_checkpoint_path", type=str, default="", help="Save checkpoint path")
|
||||
parser.add_argument("--load_pretrain_checkpoint_path", type=str, default="", help="Load checkpoint file path")
|
||||
parser.add_argument("--load_finetune_checkpoint_path", type=str, default="", help="Load checkpoint file path")
|
||||
parser.add_argument("--train_data_file_path", type=str, default="",
|
||||
help="Data path, it is better to use absolute path")
|
||||
parser.add_argument("--eval_data_file_path", type=str, default="",
|
||||
help="Data path, it is better to use absolute path")
|
||||
parser.add_argument("--schema_file_path", type=str, default="",
|
||||
help="Schema path, it is better to use absolute path")
|
||||
args_opt = parser.parse_args()
|
||||
epoch_num = args_opt.epoch_num
|
||||
assessment_method = args_opt.assessment_method.lower()
|
||||
load_pretrain_checkpoint_path = args_opt.load_pretrain_checkpoint_path
|
||||
save_finetune_checkpoint_path = args_opt.save_finetune_checkpoint_path
|
||||
load_finetune_checkpoint_path = args_opt.load_finetune_checkpoint_path
|
||||
|
||||
if args_opt.do_train.lower() == "false" and args_opt.do_eval.lower() == "false":
|
||||
raise ValueError("At least one of 'do_train' or 'do_eval' must be true")
|
||||
if args_opt.do_train.lower() == "true" and args_opt.train_data_file_path == "":
|
||||
raise ValueError("'train_data_file_path' must be set when do finetune task")
|
||||
if args_opt.do_eval.lower() == "true" and args_opt.eval_data_file_path == "":
|
||||
raise ValueError("'eval_data_file_path' must be set when do evaluation task")
|
||||
|
||||
epoch_num = args_opt.epoch_num
|
||||
assessment_method = args_opt.assessment_method.lower()
|
||||
load_pretrain_checkpoint_path = args_opt.load_pretrain_checkpoint_path
|
||||
save_finetune_checkpoint_path = args_opt.save_finetune_checkpoint_path
|
||||
load_finetune_checkpoint_path = args_opt.load_finetune_checkpoint_path
|
||||
target = args_opt.device_target
|
||||
if target == "Ascend":
|
||||
context.set_context(mode=context.GRAPH_MODE, device_target="Ascend", device_id=args_opt.device_id)
|
||||
|
@ -214,5 +202,6 @@ def run_classifier():
|
|||
do_shuffle=(args_opt.eval_data_shuffle.lower() == "true"))
|
||||
do_eval(ds, BertCLS, args_opt.num_class, assessment_method, load_finetune_checkpoint_path)
|
||||
|
||||
|
||||
if __name__ == "__main__":
|
||||
run_classifier()
|
||||
|
|
|
@ -18,13 +18,7 @@ Bert finetune and evaluation script.
|
|||
'''
|
||||
|
||||
import os
|
||||
import argparse
|
||||
import time
|
||||
from src.bert_for_finetune import BertFinetuneCell, BertNER
|
||||
from src.finetune_eval_config import optimizer_cfg, bert_net_cfg
|
||||
from src.dataset import create_ner_dataset
|
||||
from src.utils import make_directory, LossCallBack, LoadNewestCkpt, BertLearningRate, convert_labels_to_index
|
||||
from src.assessment_method import Accuracy, F1, MCC, Spearman_Correlation
|
||||
import mindspore.common.dtype as mstype
|
||||
from mindspore import context
|
||||
from mindspore import log as logger
|
||||
|
@ -34,6 +28,13 @@ from mindspore.train.model import Model
|
|||
from mindspore.train.callback import CheckpointConfig, ModelCheckpoint, TimeMonitor
|
||||
from mindspore.train.serialization import load_checkpoint, load_param_into_net
|
||||
|
||||
from src.bert_for_finetune import BertFinetuneCell, BertNER
|
||||
from src.dataset import create_ner_dataset
|
||||
from src.utils import make_directory, LossCallBack, LoadNewestCkpt, BertLearningRate, convert_labels_to_index
|
||||
from src.assessment_method import Accuracy, F1, MCC, Spearman_Correlation
|
||||
from src.model_utils.config import config as args_opt, optimizer_cfg, bert_net_cfg
|
||||
from src.model_utils.moxing_adapter import moxing_wrapper
|
||||
from src.model_utils.device_adapter import get_device_id
|
||||
_cur_dir = os.getcwd()
|
||||
|
||||
|
||||
|
@ -85,6 +86,7 @@ def do_train(dataset=None, network=None, load_checkpoint_path="", save_checkpoin
|
|||
train_end = time.time()
|
||||
print("latency: {:.6f} s".format(train_end - train_begin))
|
||||
|
||||
|
||||
def eval_result_print(assessment_method="accuracy", callback=None):
|
||||
"""print eval result"""
|
||||
if assessment_method == "accuracy":
|
||||
|
@ -103,6 +105,7 @@ def eval_result_print(assessment_method="accuracy", callback=None):
|
|||
else:
|
||||
raise ValueError("Assessment method not supported, support: [accuracy, f1, mcc, spearman_correlation]")
|
||||
|
||||
|
||||
def do_eval(dataset=None, network=None, use_crf="", num_class=41, assessment_method="accuracy", data_file="",
|
||||
load_checkpoint_path="", vocab_file="", label_file="", tag_to_index=None, batch_size=1):
|
||||
""" do eval """
|
||||
|
@ -146,41 +149,22 @@ def do_eval(dataset=None, network=None, use_crf="", num_class=41, assessment_met
|
|||
print("==============================================================")
|
||||
|
||||
|
||||
def parse_args():
|
||||
"""set and check parameters."""
|
||||
parser = argparse.ArgumentParser(description="run ner")
|
||||
parser.add_argument("--device_target", type=str, default="Ascend", choices=["Ascend", "GPU"],
|
||||
help="Device type, default is Ascend")
|
||||
parser.add_argument("--assessment_method", type=str, default="BF1", choices=["BF1", "clue_benchmark", "MF1"],
|
||||
help="assessment_method include: [BF1, clue_benchmark, MF1], default is BF1")
|
||||
parser.add_argument("--do_train", type=str, default="false", choices=["true", "false"],
|
||||
help="Eable train, default is false")
|
||||
parser.add_argument("--do_eval", type=str, default="false", choices=["true", "false"],
|
||||
help="Eable eval, default is false")
|
||||
parser.add_argument("--use_crf", type=str, default="false", choices=["true", "false"],
|
||||
help="Use crf, default is false")
|
||||
parser.add_argument("--device_id", type=int, default=0, help="Device id, default is 0.")
|
||||
parser.add_argument("--epoch_num", type=int, default=5, help="Epoch number, default is 5.")
|
||||
parser.add_argument("--train_data_shuffle", type=str, default="true", choices=["true", "false"],
|
||||
help="Enable train data shuffle, default is true")
|
||||
parser.add_argument("--eval_data_shuffle", type=str, default="false", choices=["true", "false"],
|
||||
help="Enable eval data shuffle, default is false")
|
||||
parser.add_argument("--train_batch_size", type=int, default=32, help="Train batch size, default is 32")
|
||||
parser.add_argument("--eval_batch_size", type=int, default=1, help="Eval batch size, default is 1")
|
||||
parser.add_argument("--vocab_file_path", type=str, default="", help="Vocab file path, used in clue benchmark")
|
||||
parser.add_argument("--label_file_path", type=str, default="", help="label file path, used in clue benchmark")
|
||||
parser.add_argument("--save_finetune_checkpoint_path", type=str, default="", help="Save checkpoint path")
|
||||
parser.add_argument("--load_pretrain_checkpoint_path", type=str, default="", help="Load checkpoint file path")
|
||||
parser.add_argument("--load_finetune_checkpoint_path", type=str, default="", help="Load checkpoint file path")
|
||||
parser.add_argument("--train_data_file_path", type=str, default="",
|
||||
help="Data path, it is better to use absolute path")
|
||||
parser.add_argument("--eval_data_file_path", type=str, default="",
|
||||
help="Data path, it is better to use absolute path")
|
||||
parser.add_argument("--dataset_format", type=str, default="mindrecord", choices=["mindrecord", "tfrecord"],
|
||||
help="Dataset format, support mindrecord or tfrecord")
|
||||
parser.add_argument("--schema_file_path", type=str, default="",
|
||||
help="Schema path, it is better to use absolute path")
|
||||
args_opt = parser.parse_args()
|
||||
def modelarts_pre_process():
|
||||
'''modelarts pre process function.'''
|
||||
args_opt.device_id = get_device_id()
|
||||
_file_dir = os.path.dirname(os.path.abspath(__file__))
|
||||
args_opt.load_pretrain_checkpoint_path = os.path.join(_file_dir, args_opt.load_pretrain_checkpoint_path)
|
||||
args_opt.load_finetune_checkpoint_path = os.path.join(args_opt.output_path, args_opt.load_finetune_checkpoint_path)
|
||||
args_opt.save_finetune_checkpoint_path = os.path.join(args_opt.output_path, args_opt.save_finetune_checkpoint_path)
|
||||
if args_opt.schema_file_path:
|
||||
args_opt.schema_file_path = os.path.join(args_opt.data_path, args_opt.schema_file_path)
|
||||
args_opt.train_data_file_path = os.path.join(args_opt.data_path, args_opt.train_data_file_path)
|
||||
args_opt.eval_data_file_path = os.path.join(args_opt.data_path, args_opt.eval_data_file_path)
|
||||
args_opt.label_file_path = os.path.join(args_opt.data_path, args_opt.label_file_path)
|
||||
|
||||
|
||||
def determine_params():
|
||||
"""Determine whether the parameters are reasonable."""
|
||||
if args_opt.do_train.lower() == "false" and args_opt.do_eval.lower() == "false":
|
||||
raise ValueError("At least one of 'do_train' or 'do_eval' must be true")
|
||||
if args_opt.do_train.lower() == "true" and args_opt.train_data_file_path == "":
|
||||
|
@ -193,14 +177,14 @@ def parse_args():
|
|||
raise ValueError("'label_file_path' must be set to use crf")
|
||||
if args_opt.assessment_method.lower() == "clue_benchmark" and args_opt.label_file_path == "":
|
||||
raise ValueError("'label_file_path' must be set to do clue benchmark")
|
||||
if args_opt.assessment_method.lower() == "clue_benchmark":
|
||||
args_opt.eval_batch_size = 1
|
||||
return args_opt
|
||||
|
||||
|
||||
@moxing_wrapper(pre_process=modelarts_pre_process)
|
||||
def run_ner():
|
||||
"""run ner task"""
|
||||
args_opt = parse_args()
|
||||
determine_params()
|
||||
if args_opt.assessment_method.lower() == "clue_benchmark":
|
||||
args_opt.eval_batch_size = 1
|
||||
epoch_num = args_opt.epoch_num
|
||||
assessment_method = args_opt.assessment_method.lower()
|
||||
load_pretrain_checkpoint_path = args_opt.load_pretrain_checkpoint_path
|
||||
|
@ -262,5 +246,6 @@ def run_ner():
|
|||
args_opt.eval_data_file_path, load_finetune_checkpoint_path, args_opt.vocab_file_path,
|
||||
args_opt.label_file_path, tag_to_index, args_opt.eval_batch_size)
|
||||
|
||||
|
||||
if __name__ == "__main__":
|
||||
run_ner()
|
||||
|
|
|
@ -16,9 +16,7 @@
|
|||
#################pre_train bert example on zh-wiki########################
|
||||
python run_pretrain.py
|
||||
"""
|
||||
|
||||
import os
|
||||
import argparse
|
||||
import mindspore.communication.management as D
|
||||
from mindspore.communication.management import get_rank
|
||||
import mindspore.common.dtype as mstype
|
||||
|
@ -38,8 +36,10 @@ from src import BertNetworkWithLoss, BertTrainOneStepCell, BertTrainOneStepWithL
|
|||
BertTrainOneStepWithLossScaleCellForAdam, \
|
||||
AdamWeightDecayForBert, AdamWeightDecayOp
|
||||
from src.dataset import create_bert_dataset
|
||||
from src.config import cfg, bert_net_cfg
|
||||
from src.utils import LossCallBack, BertLearningRate
|
||||
from src.model_utils.config import config as cfg, bert_net_cfg
|
||||
from src.model_utils.moxing_adapter import moxing_wrapper
|
||||
from src.model_utils.device_adapter import get_device_id, get_device_num
|
||||
_current_dir = os.path.dirname(os.path.realpath(__file__))
|
||||
|
||||
|
||||
|
@ -150,60 +150,31 @@ def _check_compute_type(args_opt):
|
|||
logger.warning(warning_message)
|
||||
|
||||
|
||||
def argparse_init():
|
||||
"""Argparse init."""
|
||||
parser = argparse.ArgumentParser(description='bert pre_training')
|
||||
parser.add_argument('--device_target', type=str, default='Ascend', choices=['Ascend', 'GPU'],
|
||||
help='device where the code will be implemented. (Default: Ascend)')
|
||||
parser.add_argument("--distribute", type=str, default="false", choices=["true", "false"],
|
||||
help="Run distribute, default is false.")
|
||||
parser.add_argument("--epoch_size", type=int, default="1", help="Epoch size, default is 1.")
|
||||
parser.add_argument("--device_id", type=int, default=0, help="Device id, default is 0.")
|
||||
parser.add_argument("--device_num", type=int, default=1, help="Use device nums, default is 1.")
|
||||
parser.add_argument("--enable_save_ckpt", type=str, default="true", choices=["true", "false"],
|
||||
help="Enable save checkpoint, default is true.")
|
||||
parser.add_argument("--enable_lossscale", type=str, default="true", choices=["true", "false"],
|
||||
help="Use lossscale or not, default is not.")
|
||||
parser.add_argument("--do_shuffle", type=str, default="true", choices=["true", "false"],
|
||||
help="Enable shuffle for dataset, default is true.")
|
||||
parser.add_argument("--enable_data_sink", type=str, default="true", choices=["true", "false"],
|
||||
help="Enable data sink, default is true.")
|
||||
parser.add_argument("--data_sink_steps", type=int, default="1", help="Sink steps for each epoch, default is 1.")
|
||||
parser.add_argument("--accumulation_steps", type=int, default="1",
|
||||
help="Accumulating gradients N times before weight update, default is 1.")
|
||||
parser.add_argument("--allreduce_post_accumulation", type=str, default="true", choices=["true", "false"],
|
||||
help="Whether to allreduce after accumulation of N steps or after each step, default is true.")
|
||||
parser.add_argument("--save_checkpoint_path", type=str, default="", help="Save checkpoint path")
|
||||
parser.add_argument("--load_checkpoint_path", type=str, default="", help="Load checkpoint file path")
|
||||
parser.add_argument("--save_checkpoint_steps", type=int, default=1000, help="Save checkpoint steps, "
|
||||
"default is 1000.")
|
||||
parser.add_argument("--train_steps", type=int, default=-1, help="Training Steps, default is -1, "
|
||||
"meaning run all steps according to epoch number.")
|
||||
parser.add_argument("--save_checkpoint_num", type=int, default=1, help="Save checkpoint numbers, default is 1.")
|
||||
parser.add_argument("--data_dir", type=str, default="", help="Data path, it is better to use absolute path")
|
||||
parser.add_argument("--schema_dir", type=str, default="", help="Schema path, it is better to use absolute path")
|
||||
|
||||
return parser
|
||||
def modelarts_pre_process():
|
||||
'''modelarts pre process function.'''
|
||||
cfg.device_id = get_device_id()
|
||||
cfg.device_num = get_device_num()
|
||||
cfg.data_dir = cfg.data_path
|
||||
cfg.save_checkpoint_path = os.path.join(cfg.output_path, cfg.save_checkpoint_path)
|
||||
|
||||
|
||||
@moxing_wrapper(pre_process=modelarts_pre_process)
|
||||
def run_pretrain():
|
||||
"""pre-train bert_clue"""
|
||||
parser = argparse_init()
|
||||
args_opt = parser.parse_args()
|
||||
context.set_context(mode=context.GRAPH_MODE, device_target=args_opt.device_target, device_id=args_opt.device_id)
|
||||
context.set_context(mode=context.GRAPH_MODE, device_target=cfg.device_target, device_id=cfg.device_id)
|
||||
context.set_context(reserve_class_name_in_scope=False)
|
||||
_set_graph_kernel_context(args_opt.device_target)
|
||||
ckpt_save_dir = args_opt.save_checkpoint_path
|
||||
if args_opt.distribute == "true":
|
||||
if args_opt.device_target == 'Ascend':
|
||||
_set_graph_kernel_context(cfg.device_target)
|
||||
ckpt_save_dir = cfg.save_checkpoint_path
|
||||
if cfg.distribute == "true":
|
||||
if cfg.device_target == 'Ascend':
|
||||
D.init()
|
||||
device_num = args_opt.device_num
|
||||
rank = args_opt.device_id % device_num
|
||||
device_num = cfg.device_num
|
||||
rank = cfg.device_id % device_num
|
||||
else:
|
||||
D.init()
|
||||
device_num = D.get_group_size()
|
||||
rank = D.get_rank()
|
||||
ckpt_save_dir = args_opt.save_checkpoint_path + 'ckpt_' + str(get_rank()) + '/'
|
||||
ckpt_save_dir = os.path.join(cfg.save_checkpoint_path, 'ckpt_' + str(get_rank()))
|
||||
|
||||
context.reset_auto_parallel_context()
|
||||
context.set_auto_parallel_context(parallel_mode=ParallelMode.DATA_PARALLEL, gradients_mean=True,
|
||||
|
@ -213,57 +184,57 @@ def run_pretrain():
|
|||
rank = 0
|
||||
device_num = 1
|
||||
|
||||
_check_compute_type(args_opt)
|
||||
_check_compute_type(cfg)
|
||||
|
||||
if args_opt.accumulation_steps > 1:
|
||||
logger.info("accumulation steps: {}".format(args_opt.accumulation_steps))
|
||||
logger.info("global batch size: {}".format(cfg.batch_size * args_opt.accumulation_steps))
|
||||
if args_opt.enable_data_sink == "true":
|
||||
args_opt.data_sink_steps *= args_opt.accumulation_steps
|
||||
logger.info("data sink steps: {}".format(args_opt.data_sink_steps))
|
||||
if args_opt.enable_save_ckpt == "true":
|
||||
args_opt.save_checkpoint_steps *= args_opt.accumulation_steps
|
||||
logger.info("save checkpoint steps: {}".format(args_opt.save_checkpoint_steps))
|
||||
if cfg.accumulation_steps > 1:
|
||||
logger.info("accumulation steps: {}".format(cfg.accumulation_steps))
|
||||
logger.info("global batch size: {}".format(cfg.batch_size * cfg.accumulation_steps))
|
||||
if cfg.enable_data_sink == "true":
|
||||
cfg.data_sink_steps *= cfg.accumulation_steps
|
||||
logger.info("data sink steps: {}".format(cfg.data_sink_steps))
|
||||
if cfg.enable_save_ckpt == "true":
|
||||
cfg.save_checkpoint_steps *= cfg.accumulation_steps
|
||||
logger.info("save checkpoint steps: {}".format(cfg.save_checkpoint_steps))
|
||||
|
||||
ds = create_bert_dataset(device_num, rank, args_opt.do_shuffle, args_opt.data_dir, args_opt.schema_dir)
|
||||
ds = create_bert_dataset(device_num, rank, cfg.do_shuffle, cfg.data_dir, cfg.schema_dir)
|
||||
net_with_loss = BertNetworkWithLoss(bert_net_cfg, True)
|
||||
|
||||
new_repeat_count = args_opt.epoch_size * ds.get_dataset_size() // args_opt.data_sink_steps
|
||||
if args_opt.train_steps > 0:
|
||||
train_steps = args_opt.train_steps * args_opt.accumulation_steps
|
||||
new_repeat_count = min(new_repeat_count, train_steps // args_opt.data_sink_steps)
|
||||
new_repeat_count = cfg.epoch_size * ds.get_dataset_size() // cfg.data_sink_steps
|
||||
if cfg.train_steps > 0:
|
||||
train_steps = cfg.train_steps * cfg.accumulation_steps
|
||||
new_repeat_count = min(new_repeat_count, train_steps // cfg.data_sink_steps)
|
||||
else:
|
||||
args_opt.train_steps = args_opt.epoch_size * ds.get_dataset_size() // args_opt.accumulation_steps
|
||||
logger.info("train steps: {}".format(args_opt.train_steps))
|
||||
cfg.train_steps = cfg.epoch_size * ds.get_dataset_size() // cfg.accumulation_steps
|
||||
logger.info("train steps: {}".format(cfg.train_steps))
|
||||
|
||||
optimizer = _get_optimizer(args_opt, net_with_loss)
|
||||
callback = [TimeMonitor(args_opt.data_sink_steps), LossCallBack(ds.get_dataset_size())]
|
||||
if args_opt.enable_save_ckpt == "true" and args_opt.device_id % min(8, device_num) == 0:
|
||||
config_ck = CheckpointConfig(save_checkpoint_steps=args_opt.save_checkpoint_steps,
|
||||
keep_checkpoint_max=args_opt.save_checkpoint_num)
|
||||
optimizer = _get_optimizer(cfg, net_with_loss)
|
||||
callback = [TimeMonitor(cfg.data_sink_steps), LossCallBack(ds.get_dataset_size())]
|
||||
if cfg.enable_save_ckpt == "true" and cfg.device_id % min(8, device_num) == 0:
|
||||
config_ck = CheckpointConfig(save_checkpoint_steps=cfg.save_checkpoint_steps,
|
||||
keep_checkpoint_max=cfg.save_checkpoint_num)
|
||||
ckpoint_cb = ModelCheckpoint(prefix='checkpoint_bert',
|
||||
directory=None if ckpt_save_dir == "" else ckpt_save_dir, config=config_ck)
|
||||
callback.append(ckpoint_cb)
|
||||
|
||||
if args_opt.load_checkpoint_path:
|
||||
param_dict = load_checkpoint(args_opt.load_checkpoint_path)
|
||||
if cfg.load_checkpoint_path:
|
||||
param_dict = load_checkpoint(cfg.load_checkpoint_path)
|
||||
load_param_into_net(net_with_loss, param_dict)
|
||||
|
||||
if args_opt.enable_lossscale == "true":
|
||||
if cfg.enable_lossscale == "true":
|
||||
update_cell = DynamicLossScaleUpdateCell(loss_scale_value=cfg.loss_scale_value,
|
||||
scale_factor=cfg.scale_factor,
|
||||
scale_window=cfg.scale_window)
|
||||
accumulation_steps = args_opt.accumulation_steps
|
||||
accumulation_steps = cfg.accumulation_steps
|
||||
enable_global_norm = cfg.enable_global_norm
|
||||
if accumulation_steps <= 1:
|
||||
if cfg.optimizer == 'AdamWeightDecay' and args_opt.device_target == 'GPU':
|
||||
if cfg.optimizer == 'AdamWeightDecay' and cfg.device_target == 'GPU':
|
||||
net_with_grads = BertTrainOneStepWithLossScaleCellForAdam(net_with_loss, optimizer=optimizer,
|
||||
scale_update_cell=update_cell)
|
||||
else:
|
||||
net_with_grads = BertTrainOneStepWithLossScaleCell(net_with_loss, optimizer=optimizer,
|
||||
scale_update_cell=update_cell)
|
||||
else:
|
||||
allreduce_post = args_opt.distribute == "false" or args_opt.allreduce_post_accumulation == "true"
|
||||
allreduce_post = cfg.distribute == "false" or cfg.allreduce_post_accumulation == "true"
|
||||
net_with_accumulation = (BertTrainAccumulationAllReducePostWithLossScaleCell if allreduce_post else
|
||||
BertTrainAccumulationAllReduceEachWithLossScaleCell)
|
||||
net_with_grads = net_with_accumulation(net_with_loss, optimizer=optimizer,
|
||||
|
@ -280,7 +251,7 @@ def run_pretrain():
|
|||
model = Model(net_with_grads)
|
||||
model = ConvertModelUtils().convert_to_thor_model(model, network=net_with_grads, optimizer=optimizer)
|
||||
model.train(new_repeat_count, ds, callbacks=callback,
|
||||
dataset_sink_mode=(args_opt.enable_data_sink == "true"), sink_size=args_opt.data_sink_steps)
|
||||
dataset_sink_mode=(cfg.enable_data_sink == "true"), sink_size=cfg.data_sink_steps)
|
||||
|
||||
|
||||
if __name__ == '__main__':
|
||||
|
|
|
@ -17,12 +17,7 @@
|
|||
Bert finetune and evaluation script.
|
||||
'''
|
||||
import os
|
||||
import argparse
|
||||
import collections
|
||||
from src.bert_for_finetune import BertSquadCell, BertSquad
|
||||
from src.finetune_eval_config import optimizer_cfg, bert_net_cfg
|
||||
from src.dataset import create_squad_dataset
|
||||
from src.utils import make_directory, LossCallBack, LoadNewestCkpt, BertLearningRate
|
||||
import mindspore.common.dtype as mstype
|
||||
from mindspore import context
|
||||
from mindspore import log as logger
|
||||
|
@ -33,8 +28,16 @@ from mindspore.train.model import Model
|
|||
from mindspore.train.callback import CheckpointConfig, ModelCheckpoint, TimeMonitor
|
||||
from mindspore.train.serialization import load_checkpoint, load_param_into_net
|
||||
|
||||
from src.bert_for_finetune import BertSquadCell, BertSquad
|
||||
from src.dataset import create_squad_dataset
|
||||
from src.utils import make_directory, LossCallBack, LoadNewestCkpt, BertLearningRate
|
||||
from src.model_utils.config import config as args_opt, optimizer_cfg, bert_net_cfg
|
||||
from src.model_utils.moxing_adapter import moxing_wrapper
|
||||
from src.model_utils.device_adapter import get_device_id
|
||||
|
||||
_cur_dir = os.getcwd()
|
||||
|
||||
|
||||
def do_train(dataset=None, network=None, load_checkpoint_path="", save_checkpoint_path="", epoch_num=1):
|
||||
""" do train """
|
||||
if load_checkpoint_path == "":
|
||||
|
@@ -118,39 +121,24 @@ def do_eval(dataset=None, load_checkpoint_path="", eval_batch_size=1):
                                end_logits=end_logits))
    return output


def modelarts_pre_process():
    '''modelarts pre process function.'''
    args_opt.device_id = get_device_id()
    _file_dir = os.path.dirname(os.path.abspath(__file__))
    args_opt.load_pretrain_checkpoint_path = os.path.join(_file_dir, args_opt.load_pretrain_checkpoint_path)
    args_opt.load_finetune_checkpoint_path = os.path.join(args_opt.output_path, args_opt.load_finetune_checkpoint_path)
    args_opt.save_finetune_checkpoint_path = os.path.join(args_opt.output_path, args_opt.save_finetune_checkpoint_path)
    args_opt.vocab_file_path = os.path.join(args_opt.data_path, args_opt.vocab_file_path)
    if args_opt.schema_file_path:
        args_opt.schema_file_path = os.path.join(args_opt.data_path, args_opt.schema_file_path)
    args_opt.train_data_file_path = os.path.join(args_opt.data_path, args_opt.train_data_file_path)
    args_opt.eval_json_path = os.path.join(args_opt.data_path, args_opt.eval_json_path)


@moxing_wrapper(pre_process=modelarts_pre_process)
def run_squad():
    """run squad task"""
    parser = argparse.ArgumentParser(description="run squad")
    parser.add_argument("--device_target", type=str, default="Ascend", choices=["Ascend", "GPU"],
                        help="Device type, default is Ascend")
    parser.add_argument("--do_train", type=str, default="false", choices=["true", "false"],
                        help="Eable train, default is false")
    parser.add_argument("--do_eval", type=str, default="false", choices=["true", "false"],
                        help="Eable eval, default is false")
    parser.add_argument("--device_id", type=int, default=0, help="Device id, default is 0.")
    parser.add_argument("--epoch_num", type=int, default=3, help="Epoch number, default is 1.")
    parser.add_argument("--num_class", type=int, default=2, help="The number of class, default is 2.")
    parser.add_argument("--train_data_shuffle", type=str, default="true", choices=["true", "false"],
                        help="Enable train data shuffle, default is true")
    parser.add_argument("--eval_data_shuffle", type=str, default="false", choices=["true", "false"],
                        help="Enable eval data shuffle, default is false")
    parser.add_argument("--train_batch_size", type=int, default=32, help="Train batch size, default is 32")
    parser.add_argument("--eval_batch_size", type=int, default=1, help="Eval batch size, default is 1")
    parser.add_argument("--vocab_file_path", type=str, default="", help="Vocab file path")
    parser.add_argument("--eval_json_path", type=str, default="", help="Evaluation json file path, can be eval.json")
    parser.add_argument("--save_finetune_checkpoint_path", type=str, default="", help="Save checkpoint path")
    parser.add_argument("--load_pretrain_checkpoint_path", type=str, default="", help="Load checkpoint file path")
    parser.add_argument("--load_finetune_checkpoint_path", type=str, default="", help="Load checkpoint file path")
    parser.add_argument("--train_data_file_path", type=str, default="",
                        help="Data path, it is better to use absolute path")
    parser.add_argument("--schema_file_path", type=str, default="",
                        help="Schema path, it is better to use absolute path")
    args_opt = parser.parse_args()
    epoch_num = args_opt.epoch_num
    load_pretrain_checkpoint_path = args_opt.load_pretrain_checkpoint_path
    save_finetune_checkpoint_path = args_opt.save_finetune_checkpoint_path
    load_finetune_checkpoint_path = args_opt.load_finetune_checkpoint_path

    if args_opt.do_train.lower() == "false" and args_opt.do_eval.lower() == "false":
        raise ValueError("At least one of 'do_train' or 'do_eval' must be true")
    if args_opt.do_train.lower() == "true" and args_opt.train_data_file_path == "":

@@ -160,8 +148,10 @@ def run_squad():
            raise ValueError("'vocab_file_path' must be set when do evaluation task")
        if args_opt.eval_json_path == "":
            raise ValueError("'tokenization_file_path' must be set when do evaluation task")

    epoch_num = args_opt.epoch_num
    load_pretrain_checkpoint_path = args_opt.load_pretrain_checkpoint_path
    save_finetune_checkpoint_path = args_opt.save_finetune_checkpoint_path
    load_finetune_checkpoint_path = args_opt.load_finetune_checkpoint_path
    target = args_opt.device_target
    if target == "Ascend":
        context.set_context(mode=context.GRAPH_MODE, device_target="Ascend", device_id=args_opt.device_id)
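The `modelarts_pre_process` hook above only rebases the relative paths supplied through the yaml/web UI onto the ModelArts cache directories before `run_squad` starts. A minimal sketch of that rebasing with hypothetical sample values (the real values come from `config` at runtime):

```python
import os

# Hypothetical values; on ModelArts, data_path and output_path default to the
# /cache directories declared in the task yaml files below.
data_path = "/cache/data"
output_path = "/cache/train"
vocab_file_path = "vocab.txt"                           # relative path, as filled in on the web UI
save_finetune_checkpoint_path = "squad_finetune/ckpt/"

print(os.path.join(data_path, vocab_file_path))                   # /cache/data/vocab.txt
print(os.path.join(output_path, save_finetune_checkpoint_path))   # /cache/train/squad_finetune/ckpt/
```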
@@ -27,6 +27,7 @@ PROJECT_DIR=$(cd "$(dirname "$0")" || exit; pwd)
export GLOG_log_dir=${CUR_DIR}/ms_log
export GLOG_logtostderr=0
python ${PROJECT_DIR}/../run_classifier.py \
    --config_path="../../task_classifier_config.yaml" \
    --device_target="Ascend" \
    --do_train="true" \
    --do_eval="false" \
@@ -27,6 +27,7 @@ PROJECT_DIR=$(cd "$(dirname "$0")" || exit; pwd)
export GLOG_log_dir=${CUR_DIR}/ms_log
export GLOG_logtostderr=0
python ${PROJECT_DIR}/../run_ner.py \
    --config_path="../../task_ner_config.yaml" \
    --device_target="Ascend" \
    --do_train="true" \
    --do_eval="false" \
@@ -27,6 +27,7 @@ PROJECT_DIR=$(cd "$(dirname "$0")" || exit; pwd)
export GLOG_log_dir=${CUR_DIR}/ms_log
export GLOG_logtostderr=0
python ${PROJECT_DIR}/../run_squad.py \
    --config_path="../../task_squad_config.yaml" \
    --device_target="Ascend" \
    --do_train="true" \
    --do_eval="false" \
@@ -22,7 +22,7 @@ from mindspore.common.tensor import Tensor
from src import tokenization
from src.sample_process import label_generation, process_one_example_p
from src.CRF import postprocess
from src.finetune_eval_config import bert_net_cfg
from src.model_utils.config import bert_net_cfg
from src.score import get_result

def process(model=None, text="", tokenizer_=None, use_crf="", tag_to_index=None, vocab=""):
@@ -1,129 +0,0 @@
# Copyright 2020 Huawei Technologies Co., Ltd
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ============================================================================
"""
network config setting, will be used in dataset.py, run_pretrain.py
"""
from easydict import EasyDict as edict
import mindspore.common.dtype as mstype
from .bert_model import BertConfig
cfg = edict({
    'batch_size': 32,
    'bert_network': 'base',
    'loss_scale_value': 65536,
    'scale_factor': 2,
    'scale_window': 1000,
    'optimizer': 'Lamb',
    'enable_global_norm': False,
    'AdamWeightDecay': edict({
        'learning_rate': 3e-5,
        'end_learning_rate': 0.0,
        'power': 5.0,
        'weight_decay': 1e-5,
        'decay_filter': lambda x: 'layernorm' not in x.name.lower() and 'bias' not in x.name.lower(),
        'eps': 1e-6,
        'warmup_steps': 10000,
    }),
    'Lamb': edict({
        'learning_rate': 3e-4,
        'end_learning_rate': 0.0,
        'power': 2.0,
        'warmup_steps': 10000,
        'weight_decay': 0.01,
        'decay_filter': lambda x: 'layernorm' not in x.name.lower() and 'bias' not in x.name.lower(),
        'eps': 1e-8,
    }),
    'Momentum': edict({
        'learning_rate': 2e-5,
        'momentum': 0.9,
    }),
    'Thor': edict({
        'lr_max': 0.0034,
        'lr_min': 3.244e-5,
        'lr_power': 1.0,
        'lr_total_steps': 30000,
        'damping_max': 5e-2,
        'damping_min': 1e-6,
        'damping_power': 1.0,
        'damping_total_steps': 30000,
        'momentum': 0.9,
        'weight_decay': 5e-4,
        'loss_scale': 1.0,
        'frequency': 100,
    }),
})

'''
Including two kinds of network: \
base: Google BERT-base(the base version of BERT model).
large: BERT-NEZHA(a Chinese pretrained language model developed by Huawei, which introduced a improvement of \
Functional Relative Posetional Encoding as an effective positional encoding scheme).
'''
if cfg.bert_network == 'base':
    cfg.batch_size = 64
    bert_net_cfg = BertConfig(
        seq_length=128,
        vocab_size=21128,
        hidden_size=768,
        num_hidden_layers=12,
        num_attention_heads=12,
        intermediate_size=3072,
        hidden_act="gelu",
        hidden_dropout_prob=0.1,
        attention_probs_dropout_prob=0.1,
        max_position_embeddings=512,
        type_vocab_size=2,
        initializer_range=0.02,
        use_relative_positions=False,
        dtype=mstype.float32,
        compute_type=mstype.float16
    )
if cfg.bert_network == 'nezha':
    cfg.batch_size = 96
    bert_net_cfg = BertConfig(
        seq_length=128,
        vocab_size=21128,
        hidden_size=1024,
        num_hidden_layers=24,
        num_attention_heads=16,
        intermediate_size=4096,
        hidden_act="gelu",
        hidden_dropout_prob=0.1,
        attention_probs_dropout_prob=0.1,
        max_position_embeddings=512,
        type_vocab_size=2,
        initializer_range=0.02,
        use_relative_positions=True,
        dtype=mstype.float32,
        compute_type=mstype.float16
    )
if cfg.bert_network == 'large':
    cfg.batch_size = 24
    bert_net_cfg = BertConfig(
        seq_length=512,
        vocab_size=30522,
        hidden_size=1024,
        num_hidden_layers=24,
        num_attention_heads=16,
        intermediate_size=4096,
        hidden_act="gelu",
        hidden_dropout_prob=0.1,
        attention_probs_dropout_prob=0.1,
        max_position_embeddings=512,
        type_vocab_size=2,
        initializer_range=0.02,
        use_relative_positions=False,
        dtype=mstype.float32,
        compute_type=mstype.float16
    )
@@ -20,7 +20,7 @@ import mindspore.common.dtype as mstype
import mindspore.dataset as ds
import mindspore.dataset.transforms.c_transforms as C
from mindspore import log as logger
from .config import cfg
from .model_utils.config import config as cfg


def create_bert_dataset(device_num=1, rank=0, do_shuffle="true", data_dir=None, schema_dir=None):
@@ -1,63 +0,0 @@
# Copyright 2020 Huawei Technologies Co., Ltd
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ============================================================================

"""
config settings, will be used in finetune.py
"""

from easydict import EasyDict as edict
import mindspore.common.dtype as mstype
from .bert_model import BertConfig

optimizer_cfg = edict({
    'optimizer': 'Lamb',
    'AdamWeightDecay': edict({
        'learning_rate': 2e-5,
        'end_learning_rate': 1e-7,
        'power': 1.0,
        'weight_decay': 1e-5,
        'decay_filter': lambda x: 'layernorm' not in x.name.lower() and 'bias' not in x.name.lower(),
        'eps': 1e-6,
    }),
    'Lamb': edict({
        'learning_rate': 2e-5,
        'end_learning_rate': 1e-7,
        'power': 1.0,
        'weight_decay': 0.01,
        'decay_filter': lambda x: 'layernorm' not in x.name.lower() and 'bias' not in x.name.lower(),
    }),
    'Momentum': edict({
        'learning_rate': 2e-5,
        'momentum': 0.9,
    }),
})

bert_net_cfg = BertConfig(
    seq_length=128,
    vocab_size=21128,
    hidden_size=768,
    num_hidden_layers=12,
    num_attention_heads=12,
    intermediate_size=3072,
    hidden_act="gelu",
    hidden_dropout_prob=0.1,
    attention_probs_dropout_prob=0.1,
    max_position_embeddings=512,
    type_vocab_size=2,
    initializer_range=0.02,
    use_relative_positions=False,
    dtype=mstype.float32,
    compute_type=mstype.float16,
)
@@ -0,0 +1,200 @@
# Copyright 2021 Huawei Technologies Co., Ltd
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ============================================================================

"""Parse arguments"""

import os
import ast
import argparse
from pprint import pformat
import yaml
import mindspore.common.dtype as mstype
from src.bert_model import BertConfig


class Config:
    """
    Configuration namespace. Convert dictionary to members.
    """
    def __init__(self, cfg_dict):
        for k, v in cfg_dict.items():
            if isinstance(v, (list, tuple)):
                setattr(self, k, [Config(x) if isinstance(x, dict) else x for x in v])
            else:
                setattr(self, k, Config(v) if isinstance(v, dict) else v)

    def __str__(self):
        return pformat(self.__dict__)

    def __repr__(self):
        return self.__str__()


def parse_cli_to_yaml(parser, cfg, helper=None, choices=None, cfg_path="pretrain_base_config.yaml"):
    """
    Parse command line arguments to the configuration according to the default yaml.

    Args:
        parser: Parent parser.
        cfg: Base configuration.
        helper: Helper description.
        cfg_path: Path to the default yaml config.
    """
    parser = argparse.ArgumentParser(description="[REPLACE THIS at config.py]",
                                     parents=[parser])
    helper = {} if helper is None else helper
    choices = {} if choices is None else choices
    for item in cfg:
        if not isinstance(cfg[item], list) and not isinstance(cfg[item], dict):
            help_description = helper[item] if item in helper else "Please reference to {}".format(cfg_path)
            choice = choices[item] if item in choices else None
            if isinstance(cfg[item], bool):
                parser.add_argument("--" + item, type=ast.literal_eval, default=cfg[item], choices=choice,
                                    help=help_description)
            else:
                parser.add_argument("--" + item, type=type(cfg[item]), default=cfg[item], choices=choice,
                                    help=help_description)
    args = parser.parse_args()
    return args


def parse_yaml(yaml_path):
    """
    Parse the yaml config file.

    Args:
        yaml_path: Path to the yaml config.
    """
    with open(yaml_path, 'r') as fin:
        try:
            cfgs = yaml.load_all(fin.read(), Loader=yaml.FullLoader)
            cfgs = [x for x in cfgs]
            if len(cfgs) == 1:
                cfg_helper = {}
                cfg = cfgs[0]
                cfg_choices = {}
            elif len(cfgs) == 2:
                cfg, cfg_helper = cfgs
                cfg_choices = {}
            elif len(cfgs) == 3:
                cfg, cfg_helper, cfg_choices = cfgs
            else:
                raise ValueError("At most 3 docs (config, description for help, choices) are supported in config yaml")
            # print(cfg_helper)
        except:
            raise ValueError("Failed to parse yaml")
    return cfg, cfg_helper, cfg_choices


def merge(args, cfg):
    """
    Merge the base config from yaml file and command line arguments.

    Args:
        args: Command line arguments.
        cfg: Base configuration.
    """
    args_var = vars(args)
    for item in args_var:
        cfg[item] = args_var[item]
    return cfg


def extra_operations(cfg):
    """
    Do extra work on config

    Args:
        config: Object after instantiation of class 'Config'.
    """
    def create_filter_fun(keywords):
        return lambda x: not (True in [key in x.name.lower() for key in keywords])

    if cfg.description == 'run_pretrain':
        cfg.AdamWeightDecay.decay_filter = create_filter_fun(cfg.AdamWeightDecay.decay_filter)
        cfg.Lamb.decay_filter = create_filter_fun(cfg.Lamb.decay_filter)
        cfg.base_net_cfg.dtype = mstype.float32
        cfg.base_net_cfg.compute_type = mstype.float16
        cfg.nezha_net_cfg.dtype = mstype.float32
        cfg.nezha_net_cfg.compute_type = mstype.float16
        cfg.large_net_cfg.dtype = mstype.float32
        cfg.large_net_cfg.compute_type = mstype.float16
        if cfg.bert_network == 'base':
            cfg.batch_size = cfg.base_batch_size
            _bert_net_cfg = cfg.base_net_cfg
        elif cfg.bert_network == 'nezha':
            cfg.batch_size = cfg.nezha_batch_size
            _bert_net_cfg = cfg.nezha_net_cfg
        elif cfg.bert_network == 'large':
            cfg.batch_size = cfg.large_batch_size
            _bert_net_cfg = cfg.large_net_cfg
        else:
            pass
        cfg.bert_net_cfg = BertConfig(**_bert_net_cfg.__dict__)
    elif cfg.description == 'run_ner':
        cfg.optimizer_cfg.AdamWeightDecay.decay_filter = \
            create_filter_fun(cfg.optimizer_cfg.AdamWeightDecay.decay_filter)
        cfg.optimizer_cfg.Lamb.decay_filter = create_filter_fun(cfg.optimizer_cfg.Lamb.decay_filter)
        cfg.bert_net_cfg.dtype = mstype.float32
        cfg.bert_net_cfg.compute_type = mstype.float16
        cfg.bert_net_cfg = BertConfig(**cfg.bert_net_cfg.__dict__)

    elif cfg.description == 'run_squad':
        cfg.optimizer_cfg.AdamWeightDecay.decay_filter = \
            create_filter_fun(cfg.optimizer_cfg.AdamWeightDecay.decay_filter)
        cfg.optimizer_cfg.Lamb.decay_filter = create_filter_fun(cfg.optimizer_cfg.Lamb.decay_filter)
        cfg.bert_net_cfg.dtype = mstype.float32
        cfg.bert_net_cfg.compute_type = mstype.float16
        cfg.bert_net_cfg = BertConfig(**cfg.bert_net_cfg.__dict__)

    elif cfg.description == 'run_classifier':
        cfg.optimizer_cfg.AdamWeightDecay.decay_filter = \
            create_filter_fun(cfg.optimizer_cfg.AdamWeightDecay.decay_filter)
        cfg.optimizer_cfg.Lamb.decay_filter = create_filter_fun(cfg.optimizer_cfg.Lamb.decay_filter)
        cfg.bert_net_cfg.dtype = mstype.float32
        cfg.bert_net_cfg.compute_type = mstype.float16
        cfg.bert_net_cfg = BertConfig(**cfg.bert_net_cfg.__dict__)
    else:
        pass


def get_config():
    """
    Get Config according to the yaml file and cli arguments.
    """
    def get_abs_path(path_relative):
        current_dir = os.path.dirname(os.path.abspath(__file__))
        return os.path.join(current_dir, path_relative)
    parser = argparse.ArgumentParser(description="default name", add_help=False)
    parser.add_argument("--config_path", type=get_abs_path, default="../../pretrain_config.yaml",
                        help="Config file path")
    path_args, _ = parser.parse_known_args()
    default, helper, choices = parse_yaml(path_args.config_path)
    # pprint(default)
    args = parse_cli_to_yaml(parser=parser, cfg=default, helper=helper, choices=choices, cfg_path=path_args.config_path)
    final_config = merge(args, default)
    config_obj = Config(final_config)
    extra_operations(config_obj)
    return config_obj


config = get_config()
bert_net_cfg = config.bert_net_cfg
if config.description in ('run_classifier', 'run_ner', 'run_squad'):
    optimizer_cfg = config.optimizer_cfg


if __name__ == '__main__':
    print(config)
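`src/model_utils/config.py` builds the global `config` object once at import time: it reads the yaml pointed to by `--config_path`, overlays any other command-line flags via `merge()`, and runs `extra_operations()` to turn the raw dictionaries into `BertConfig` and optimizer settings. A minimal consumption sketch (not part of the commit); the attribute names follow the task yaml files added below:

```python
# Run as e.g.: python this_script.py --config_path=../../task_squad_config.yaml --do_train=true
from src.model_utils.config import config, bert_net_cfg

print(config.description)        # "run_squad", taken from the selected yaml
print(config.do_train)           # "true": CLI flags override the yaml defaults via merge()
print(bert_net_cfg.hidden_size)  # 768: BertConfig instance produced by extra_operations()
```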
@@ -0,0 +1,27 @@
# Copyright 2021 Huawei Technologies Co., Ltd
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ============================================================================

"""Device adapter for ModelArts"""

from src.model_utils.config import config

if config.enable_modelarts:
    from src.model_utils.moxing_adapter import get_device_id, get_device_num, get_rank_id, get_job_id
else:
    from src.model_utils.local_adapter import get_device_id, get_device_num, get_rank_id, get_job_id

__all__ = [
    "get_device_id", "get_device_num", "get_rank_id", "get_job_id"
]
@@ -0,0 +1,36 @@
# Copyright 2021 Huawei Technologies Co., Ltd
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ============================================================================

"""Local adapter"""

import os

def get_device_id():
    device_id = os.getenv('DEVICE_ID', '0')
    return int(device_id)


def get_device_num():
    device_num = os.getenv('RANK_SIZE', '1')
    return int(device_num)


def get_rank_id():
    global_rank_id = os.getenv('RANK_ID', '0')
    return int(global_rank_id)


def get_job_id():
    return "Local Job"
@@ -0,0 +1,123 @@
# Copyright 2021 Huawei Technologies Co., Ltd
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ============================================================================

"""Moxing adapter for ModelArts"""

import os
import functools
from mindspore import context
from mindspore.profiler import Profiler
from src.model_utils.config import config

_global_sync_count = 0

def get_device_id():
    device_id = os.getenv('DEVICE_ID', '0')
    return int(device_id)


def get_device_num():
    device_num = os.getenv('RANK_SIZE', '1')
    return int(device_num)


def get_rank_id():
    global_rank_id = os.getenv('RANK_ID', '0')
    return int(global_rank_id)


def get_job_id():
    job_id = os.getenv('JOB_ID')
    job_id = job_id if job_id != "" else "default"
    return job_id

def sync_data(from_path, to_path):
    """
    Download data from remote obs to the local directory if the first url is a remote url and the second one is a
    local path; otherwise upload data from the local directory to remote obs.
    """
    import moxing as mox
    import time
    global _global_sync_count
    sync_lock = "/tmp/copy_sync.lock" + str(_global_sync_count)
    _global_sync_count += 1

    # Each server contains 8 devices at most.
    if get_device_id() % min(get_device_num(), 8) == 0 and not os.path.exists(sync_lock):
        print("from path: ", from_path)
        print("to path: ", to_path)
        mox.file.copy_parallel(from_path, to_path)
        print("===finish data synchronization===")
        try:
            os.mknod(sync_lock)
            # print("os.mknod({}) success".format(sync_lock))
        except IOError:
            pass
        print("===save flag===")

    while True:
        if os.path.exists(sync_lock):
            break
        time.sleep(1)

    print("Finish sync data from {} to {}.".format(from_path, to_path))


def moxing_wrapper(pre_process=None, post_process=None):
    """
    Moxing wrapper to download dataset and upload outputs.
    """
    def wrapper(run_func):
        @functools.wraps(run_func)
        def wrapped_func(*args, **kwargs):
            # Download data from data_url
            if config.enable_modelarts:
                if config.data_url:
                    sync_data(config.data_url, config.data_path)
                    print("Dataset downloaded: ", os.listdir(config.data_path))
                if config.checkpoint_url:
                    sync_data(config.checkpoint_url, config.load_path)
                    print("Preload downloaded: ", os.listdir(config.load_path))
                if config.train_url:
                    sync_data(config.train_url, config.output_path)
                    print("Workspace downloaded: ", os.listdir(config.output_path))

                context.set_context(save_graphs_path=os.path.join(config.output_path, str(get_rank_id())))
                config.device_num = get_device_num()
                config.device_id = get_device_id()
                if not os.path.exists(config.output_path):
                    os.makedirs(config.output_path)

                if pre_process:
                    pre_process()

            if config.enable_profiling:
                profiler = Profiler()

            run_func(*args, **kwargs)

            if config.enable_profiling:
                profiler.analyse()

            # Upload data to train_url
            if config.enable_modelarts:
                if post_process:
                    post_process()

                if config.train_url:
                    print("Start to copy output directory")
                    sync_data(config.output_path, config.train_url)
        return wrapped_func
    return wrapper
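The decorator is meant to wrap a task's entry point, exactly as `run_squad.py` does with `@moxing_wrapper(pre_process=modelarts_pre_process)`. A minimal sketch of that wiring (not part of the commit; `prepare_paths` and `run_task` are hypothetical names):

```python
from src.model_utils.config import config
from src.model_utils.moxing_adapter import moxing_wrapper


def prepare_paths():
    # rebase relative paths onto config.data_path / config.output_path here
    pass


@moxing_wrapper(pre_process=prepare_paths)
def run_task():
    # when enable_modelarts is True, sync_data() has already pulled the dataset
    # into config.data_path before this body runs
    print("running task:", config.description, "on device", config.device_id)


if __name__ == "__main__":
    run_task()
```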
@@ -0,0 +1,113 @@
# Builtin Configurations(DO NOT CHANGE THESE CONFIGURATIONS unless you know exactly what you are doing)
enable_modelarts: False
# Url for modelarts
data_url: ""
train_url: ""
checkpoint_url: ""
# Path for local
data_path: "/cache/data"
output_path: "/cache/train"
load_path: "/cache/checkpoint_path"
device_target: "Ascend"
enable_profiling: False

# ==============================================================================
description: "run_classifier"
assessment_method: "Accuracy"
do_train: "false"
do_eval: "false"
device_id: 0
epoch_num: 3
num_class: 2
train_data_shuffle: "true"
eval_data_shuffle: "false"
train_batch_size: 32
eval_batch_size: 1
save_finetune_checkpoint_path: "./classifier_finetune/ckpt/"
load_pretrain_checkpoint_path: ""
load_finetune_checkpoint_path: ""
train_data_file_path: ""
eval_data_file_path: ""
schema_file_path: ""
# export related
export_batch_size: 1
export_ckpt_file: ''
export_file_name: 'bert_classifier'
file_format: 'AIR'

optimizer_cfg:
    optimizer: 'Lamb'
    AdamWeightDecay:
        learning_rate: 0.00002 # 2e-5
        end_learning_rate: 0.0000000001 # 1e-10
        power: 1.0
        weight_decay: 0.00001 # 1e-5
        decay_filter: ['layernorm', 'bias']
        eps: 0.000001 # 1e-6
    Lamb:
        learning_rate: 0.00002 # 2e-5
        end_learning_rate: 0.0000000001 # 1e-10
        power: 1.0
        weight_decay: 0.01
        decay_filter: ['layernorm', 'bias']
    Momentum:
        learning_rate: 0.00002 # 2e-5
        momentum: 0.9

bert_net_cfg:
    seq_length: 128
    vocab_size: 21128
    hidden_size: 768
    num_hidden_layers: 12
    num_attention_heads: 12
    intermediate_size: 3072
    hidden_act: "gelu"
    hidden_dropout_prob: 0.1
    attention_probs_dropout_prob: 0.1
    max_position_embeddings: 512
    type_vocab_size: 2
    initializer_range: 0.02
    use_relative_positions: False
    dtype: mstype.float32
    compute_type: mstype.float16

---
# Help description for each configuration
enable_modelarts: "Whether training on modelarts, default: False"
data_url: "Url for modelarts"
train_url: "Url for modelarts"
data_path: "The location of the input data."
output_path: "The location of the output file."
device_target: "Running platform, choose from Ascend or GPU, and default is Ascend."
enable_profiling: 'Whether enable profiling while training, default: False'

assessment_method: "assessment_method including [Mcc, Spearman_correlation, Accuracy, F1], default is Accuracy"
do_train: "Enable train, default is false"
do_eval: "Enable eval, default is false"
device_id: "Device id, default is 0."
epoch_num: "Epoch number, default is 3."
num_class: "The number of class, default is 2."
train_data_shuffle: "Enable train data shuffle, default is true"
eval_data_shuffle: "Enable eval data shuffle, default is false"
train_batch_size: "Train batch size, default is 32"
eval_batch_size: "Eval batch size, default is 1"
save_finetune_checkpoint_path: "Save checkpoint path"
load_pretrain_checkpoint_path: "Load checkpoint file path"
load_finetune_checkpoint_path: "Load checkpoint file path"
train_data_file_path: "Data path, it is better to use absolute path"
eval_data_file_path: "Data path, it is better to use absolute path"
schema_file_path: "Schema path, it is better to use absolute path"

export_batch_size: "export batch size."
export_ckpt_file: "Bert ckpt file."
export_file_name: "bert output air name."
file_format: "file format"
---
# choices
device_target: ['Ascend', 'GPU']
assessment_method: ["Mcc", "Spearman_correlation", "Accuracy", "F1"]
do_train: ["true", "false"]
do_eval: ["true", "false"]
train_data_shuffle: ["true", "false"]
eval_data_shuffle: ["true", "false"]
file_format: ["AIR", "ONNX", "MINDIR"]
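The file above is really three yaml documents separated by `---`: the default values, the help text, and the allowed choices, which is the layout `parse_yaml()` in `src/model_utils/config.py` expects. A minimal sketch of reading them directly (not part of the commit; the file path is assumed to be in the current directory):

```python
import yaml

with open("task_classifier_config.yaml", "r") as fin:
    cfg, cfg_helper, cfg_choices = list(yaml.load_all(fin.read(), Loader=yaml.FullLoader))

print(cfg["description"])          # "run_classifier"
print(cfg_helper["do_train"])      # help string attached to the --do_train flag
print(cfg_choices["file_format"])  # ["AIR", "ONNX", "MINDIR"]
```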
@@ -0,0 +1,121 @@
# Builtin Configurations(DO NOT CHANGE THESE CONFIGURATIONS unless you know exactly what you are doing)
enable_modelarts: False
# Url for modelarts
data_url: ""
train_url: ""
checkpoint_url: ""
# Path for local
data_path: "/cache/data"
output_path: "/cache/train"
load_path: "/cache/checkpoint_path"
device_target: "Ascend"
enable_profiling: False

# ==============================================================================
description: "run_ner"
assessment_method: "BF1"
do_train: "false"
do_eval: "false"
use_crf: "false"
device_id: 0
epoch_num: 5
train_data_shuffle: "true"
eval_data_shuffle: "false"
train_batch_size: 32
eval_batch_size: 1
vocab_file_path: ""
label_file_path: ""
save_finetune_checkpoint_path: "./ner_finetune/ckpt/"
load_pretrain_checkpoint_path: ""
load_finetune_checkpoint_path: ""
train_data_file_path: ""
eval_data_file_path: ""
dataset_format: "mindrecord"
schema_file_path: ""
# export related
export_batch_size: 1
export_ckpt_file: ''
export_file_name: 'bert_ner'
file_format: 'AIR'

optimizer_cfg:
    optimizer: 'Lamb'
    AdamWeightDecay:
        learning_rate: 0.00002 # 2e-5
        end_learning_rate: 0.0000000001 # 1e-10
        power: 1.0
        weight_decay: 0.00001 # 1e-5
        decay_filter: ['layernorm', 'bias']
        eps: 0.000001 # 1e-6
    Lamb:
        learning_rate: 0.00002 # 2e-5
        end_learning_rate: 0.0000000001 # 1e-10
        power: 1.0
        weight_decay: 0.01
        decay_filter: ['layernorm', 'bias']
    Momentum:
        learning_rate: 0.00002 # 2e-5
        momentum: 0.9

bert_net_cfg:
    seq_length: 128
    vocab_size: 21128
    hidden_size: 768
    num_hidden_layers: 12
    num_attention_heads: 12
    intermediate_size: 3072
    hidden_act: "gelu"
    hidden_dropout_prob: 0.1
    attention_probs_dropout_prob: 0.1
    max_position_embeddings: 512
    type_vocab_size: 2
    initializer_range: 0.02
    use_relative_positions: False
    dtype: mstype.float32
    compute_type: mstype.float16

---
# Help description for each configuration
enable_modelarts: "Whether training on modelarts, default: False"
data_url: "Url for modelarts"
train_url: "Url for modelarts"
data_path: "The location of the input data."
output_path: "The location of the output file."
device_target: "Running platform, choose from Ascend or GPU, and default is Ascend."
enable_profiling: 'Whether enable profiling while training, default: False'

assessment_method: "assessment_method include: [BF1, clue_benchmark, MF1], default is BF1"
do_train: "Enable train, default is false"
do_eval: "Enable eval, default is false"
use_crf: "Use crf, default is false"
device_id: "Device id, default is 0."
epoch_num: "Epoch number, default is 5."
train_data_shuffle: "Enable train data shuffle, default is true"
eval_data_shuffle: "Enable eval data shuffle, default is false"
train_batch_size: "Train batch size, default is 32"
eval_batch_size: "Eval batch size, default is 1"
vocab_file_path: "Vocab file path, used in clue benchmark"
label_file_path: "label file path, used in clue benchmark"
save_finetune_checkpoint_path: "Save checkpoint path"
load_pretrain_checkpoint_path: "Load checkpoint file path"
load_finetune_checkpoint_path: "Load checkpoint file path"
train_data_file_path: "Data path, it is better to use absolute path"
eval_data_file_path: "Data path, it is better to use absolute path"
dataset_format: "Dataset format, support mindrecord or tfrecord"
schema_file_path: "Schema path, it is better to use absolute path"

export_batch_size: "export batch size."
export_ckpt_file: "Bert ckpt file."
export_file_name: "bert output air name."
file_format: "file format"
---
# choices
device_target: ['Ascend', 'GPU']
assessment_method: ["BF1", "clue_benchmark", "MF1"]
do_train: ["true", "false"]
do_eval: ["true", "false"]
use_crf: ["true", "false"]
train_data_shuffle: ["true", "false"]
eval_data_shuffle: ["true", "false"]
dataset_format: ["mindrecord", "tfrecord"]
file_format: ["AIR", "ONNX", "MINDIR"]
@@ -0,0 +1,112 @@
# Builtin Configurations(DO NOT CHANGE THESE CONFIGURATIONS unless you know exactly what you are doing)
enable_modelarts: False
# Url for modelarts
data_url: ""
train_url: ""
checkpoint_url: ""
# Path for local
data_path: "/cache/data"
output_path: "/cache/train"
load_path: "/cache/checkpoint_path"
device_target: "Ascend"
enable_profiling: False

# ==============================================================================
description: "run_squad"
do_train: "false"
do_eval: "false"
device_id: 0
epoch_num: 3
num_class: 2
train_data_shuffle: "true"
eval_data_shuffle: "false"
train_batch_size: 32
eval_batch_size: 1
vocab_file_path: ""
eval_json_path: ""
save_finetune_checkpoint_path: "./squad_finetune/ckpt/"
load_pretrain_checkpoint_path: ""
load_finetune_checkpoint_path: ""
train_data_file_path: ""
schema_file_path: ""
# export related
export_batch_size: 1
export_ckpt_file: ''
export_file_name: 'bert_squad'
file_format: 'AIR'

optimizer_cfg:
    optimizer: 'Lamb'
    AdamWeightDecay:
        learning_rate: 0.0001 # 1e-4
        end_learning_rate: 0.00000000001 # 1e-11
        power: 5.0
        weight_decay: 0.001 # 1e-3
        decay_filter: ['layernorm', 'bias']
        eps: 0.000001 # 1e-6
    Lamb:
        learning_rate: 0.0001 # 1e-4
        end_learning_rate: 0.00000000001 # 1e-11
        power: 5.0
        weight_decay: 0.01
        decay_filter: ['layernorm', 'bias']
    Momentum:
        learning_rate: 0.0001 # 1e-4
        momentum: 0.9

bert_net_cfg:
    seq_length: 384
    vocab_size: 30522
    hidden_size: 768
    num_hidden_layers: 12
    num_attention_heads: 12
    intermediate_size: 3072
    hidden_act: "gelu"
    hidden_dropout_prob: 0.1
    attention_probs_dropout_prob: 0.1
    max_position_embeddings: 512
    type_vocab_size: 2
    initializer_range: 0.02
    use_relative_positions: False
    dtype: mstype.float32
    compute_type: mstype.float16

---
# Help description for each configuration
enable_modelarts: "Whether training on modelarts, default: False"
data_url: "Url for modelarts"
train_url: "Url for modelarts"
data_path: "The location of the input data."
output_path: "The location of the output file."
device_target: "Running platform, choose from Ascend or GPU, and default is Ascend."
enable_profiling: 'Whether enable profiling while training, default: False'

do_train: "Enable train, default is false"
do_eval: "Enable eval, default is false"
device_id: "Device id, default is 0."
epoch_num: "Epoch number, default is 3."
num_class: "The number of class, default is 2."
train_data_shuffle: "Enable train data shuffle, default is true"
eval_data_shuffle: "Enable eval data shuffle, default is false"
train_batch_size: "Train batch size, default is 32"
eval_batch_size: "Eval batch size, default is 1"
vocab_file_path: "Vocab file path"
eval_json_path: "Evaluation json file path, can be eval.json"
save_finetune_checkpoint_path: "Save checkpoint path"
load_pretrain_checkpoint_path: "Load checkpoint file path"
load_finetune_checkpoint_path: "Load checkpoint file path"
train_data_file_path: "Data path, it is better to use absolute path"
schema_file_path: "Schema path, it is better to use absolute path"

export_batch_size: "export batch size."
export_ckpt_file: "Bert ckpt file."
export_file_name: "bert output air name."
file_format: "file format"
---
# choices
device_target: ['Ascend', 'GPU']
do_train: ["true", "false"]
do_eval: ["true", "false"]
train_data_shuffle: ["true", "false"]
eval_data_shuffle: ["true", "false"]
file_format: ["AIR", "ONNX", "MINDIR"]
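In all three task yaml files, `decay_filter` is written as a plain list of keywords; `extra_operations()` in `src/model_utils/config.py` converts it into the callable the AdamWeightDecay/Lamb optimizers expect, excluding any parameter whose name contains one of the keywords from weight decay. A minimal, self-contained sketch of that conversion (the `_Param` class is a hypothetical stand-in for a MindSpore `Parameter`):

```python
def create_filter_fun(keywords):
    # same helper as in src/model_utils/config.py: True means "apply weight decay"
    return lambda x: not (True in [key in x.name.lower() for key in keywords])


class _Param:
    def __init__(self, name):
        self.name = name


decay_filter = create_filter_fun(['layernorm', 'bias'])
print(decay_filter(_Param("bert.encoder.layer.0.output.dense.weight")))     # True  -> decayed
print(decay_filter(_Param("bert.encoder.layer.0.output.layernorm.gamma")))  # False -> excluded
```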