add emotect model for gpu and ascend
parent e9d2684cdb
commit 1a21b82bd0
@ -0,0 +1,196 @@

# Contents

- [Contents](#contents)
- [Overview](#overview)
- [Model Architecture](#model-architecture)
- [Dataset](#dataset)
- [Environment Requirements](#environment-requirements)
- [Quick Start](#quick-start)
- [Script and Code Structure](#script-and-code-structure)
- [Script Parameters](#script-parameters)
- [Fine-tuning and Evaluation](#fine-tuning-and-evaluation)
- [Training Process](#training-process)
- [Usage](#usage)
- [Evaluation Process](#evaluation-process)
- [Usage](#usage-1)
- [ModelZoo Homepage](#modelzoo-homepage)

# Overview

Emotion Detection (EmoTect) focuses on identifying the user's emotion in intelligent dialogue scenarios. Given a piece of user text from such a dialogue, it automatically predicts the emotion category of the text and outputs the corresponding confidence. The emotion categories are positive, negative, and neutral.

# Model Architecture

The backbone of ERNIE is the Transformer. For ERNIE_base, the Transformer consists of 12 encoder modules, each of which contains a self-attention block.

# Dataset

## **Baidu public dataset**

Here we use a labeled, word-segmented chatbot dialogue dataset provided by Baidu. Its directory structure is as follows:

```text
.
├── train.tsv   # training set
├── dev.tsv     # validation set
├── test.tsv    # test set
├── infer.tsv   # data to be predicted
├── vocab.txt   # vocabulary
```

## **Custom data**

The data consists of two columns separated by a tab ('\t'): the first column is the emotion category (0 for negative, 1 for neutral, 2 for positive) and the second column is whitespace-tokenized Chinese text, as in the example below. The file is UTF-8 encoded.

```text
label   text_a
0   谁 骂人 了 ? 我 从来 不 骂人 , 我 骂 的 都 不是 人 , 你 是 人 吗 ?
1   我 有事 等会儿 就 回来 和 你 聊
2   我 见到 你 很高兴 谢谢 你 帮 我
```
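
For reference, the following is a minimal Python sketch of how a file in this format can be loaded for inspection. The helper name and the `data/train.tsv` path are illustrative assumptions and are not part of the released scripts (which use `src/reader.py`).

```python
def load_emotect_tsv(path):
    """Read an EmoTect-style TSV file into (label, tokens) pairs."""
    samples = []
    with open(path, encoding="utf-8") as f:
        f.readline()  # skip the "label\ttext_a" header line
        for line in f:
            line = line.rstrip("\n")
            if not line:
                continue
            label, text = line.split("\t", 1)
            samples.append((int(label), text.split()))  # tokens are space-separated
    return samples

samples = load_emotect_tsv("data/train.tsv")
print(len(samples), samples[0])
```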

# Environment Requirements

- Hardware (Ascend/GPU)
    - Set up the hardware environment with Ascend or GPU processors.
- Framework
    - [MindSpore](https://www.mindspore.cn/install/en)
- For more information, please check the resources below:
    - [MindSpore Tutorials](https://www.mindspore.cn/tutorial/training/zh-CN/master/index.html)
    - [MindSpore Python API](https://www.mindspore.cn/doc/api_python/en/master/index.html)

# Quick Start

## Script Description

## Script and Code Structure

```shell
.
└─emotect
  ├─README.md
  ├─scripts
  │ ├─download_data.sh                     # shell script for downloading the dataset
  │ ├─download_model.sh                    # shell script for downloading the pretrained model weights
  │ ├─paddle_to_mindspore.sh               # shell script for converting the Paddle model to MindSpore weights
  │ ├─run_classifier_finetune_ascend.sh    # shell script for standalone finetuning on Ascend
  │ ├─run_classifier_eval_ascend.sh        # shell script for standalone evaluation on Ascend
  │ ├─run_classifier_finetune_gpu.sh       # shell script for standalone finetuning on GPU
  │ ├─run_classifier_eval_gpu.sh           # shell script for standalone evaluation on GPU
  │ └─convert_dataset.sh                   # shell script for dataset preprocessing
  ├─src
  │ ├─__init__.py
  │ ├─assessment_method.py                 # assessment method used during evaluation
  │ ├─ernie_for_finetune.py                # backbone network with loss and finetune wrapper
  │ ├─ernie_model.py                       # backbone network definition
  │ ├─dataset.py                           # data preprocessing
  │ ├─convert.py                           # model weight conversion
  │ ├─finetune_eval_config.py              # fine-tuning parameter configuration
  │ ├─finetune_eval_model.py               # downstream classification model
  │ ├─reader.py                            # data reading
  │ ├─tokenizer.py                         # tokenizer functions
  │ └─utils.py                             # utility functions
  └─run_ernie_classifier.py                # fine-tuning and evaluation entry for the Emotect model
```

## Script Parameters

### Fine-tuning and Evaluation

```shell
usage: run_ernie_classifier.py  [--device_target DEVICE_TARGET] [--do_train DO_TRAIN] [--do_eval DO_EVAL]
                                [--device_id N] [--num_class N]
                                [--train_data_shuffle TRAIN_DATA_SHUFFLE]
                                [--eval_data_shuffle EVAL_DATA_SHUFFLE]
                                [--eval_batch_size N]
                                [--save_finetune_checkpoint_path SAVE_FINETUNE_CHECKPOINT_PATH]
                                [--load_pretrain_checkpoint_path LOAD_PRETRAIN_CHECKPOINT_PATH]
                                [--train_data_file_path TRAIN_DATA_FILE_PATH]
                                [--eval_data_file_path EVAL_DATA_FILE_PATH]
                                [--schema_file_path SCHEMA_FILE_PATH]

options:
    --device_target                    target device to run on, either Ascend or GPU
    --do_train                         whether to run training on the training set, true or false
    --do_eval                          whether to run evaluation on the dev set, true or false
    --device_id                        ID of the device to run on
    --epoch_num                        total number of training epochs
    --num_class                        number of label classes
    --train_data_shuffle               whether to shuffle the training dataset, default is true
    --eval_data_shuffle                whether to shuffle the evaluation dataset, default is false
    --eval_batch_size                  batch size for evaluation
    --save_finetune_checkpoint_path    path for saving the finetuned checkpoint
    --load_finetune_checkpoint_path    path of the finetuned checkpoint to load when only evaluation is run
    --train_data_file_path             MindRecord file holding the training data, e.g. train.mindrecord
    --eval_data_file_path              MindRecord file holding the evaluation data, e.g. dev.mindrecord
    --schema_file_path                 path of the schema file
```

## Fine-tuning Based on ERNIE

ERNIE is a general-purpose text semantic representation model developed by Baidu and trained on massive data and prior knowledge. Fine-tuning on top of ERNIE improves the accuracy of dialogue emotion detection.

## Training Process

### Usage

#### Dataset download and preprocessing

Download the dataset with the following command:

```bash
sh scripts/download_data.sh
```

After downloading the data, run the format conversion script to convert the dataset to MindRecord format:

```bash
sh scripts/convert_dataset.sh
# `convert_dataset.sh` depends on the ERNIE vocabulary,
# so download the ERNIE model first with:
# sh scripts/download_model.sh
```
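
Optionally, the converted files can be sanity-checked with MindSpore's dataset API. The path and column names below mirror `src/dataset.py` and the default `./data` output directory; treat this as a sketch rather than part of the conversion script.

```python
# Sketch: verify a converted MindRecord file by counting its samples.
import mindspore.dataset as ds

data_set = ds.MindDataset(["data/train.mindrecord"],
                          columns_list=["input_ids", "input_mask", "segment_ids", "label_ids"])
print("number of samples:", data_set.get_dataset_size())
```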

#### Running on Ascend or GPU

EmoTect provides a dialogue emotion detection model already trained on massive data (based on models such as TextCNN and ERNIE) that can be used directly. It can be downloaded as follows:

```shell
sh scripts/download_model.sh
```

After the pretrained ERNIE model has been downloaded, convert it into weights that MindSpore can load:

```shell
#--input_dir ./pretrain_models/ernie
sh scripts/paddle_to_mindspore.sh
# only supports the x86 platform since Paddle does not support ARM
```
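
To confirm the conversion succeeded, the resulting checkpoint can be loaded back in MindSpore. The sketch below assumes the default output path used by `paddle_to_mindspore.sh`.

```python
# Sketch: load the converted ERNIE checkpoint and list a few parameter names.
from mindspore import load_checkpoint

param_dict = load_checkpoint("pretrain_models/converted/ernie.ckpt")
for name in list(param_dict)[:5]:
    print(name, param_dict[name].shape)
```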

After ERNIE has been converted to MindSpore, run the training script:

```bash
sh scripts/run_classifier_finetune_{platform}.sh
# platform: gpu or ascend
```

The model is saved under `./save_models`.

## Evaluation Process

### Usage

#### Evaluation after running on Ascend or GPU

Based on the training results, you can pick the checkpoint of the best step for evaluation. Modify the `load_finetune_checkpoint_path` parameter in the `scripts/run_classifier_eval_{platform}.sh` script, then run

```shell
sh scripts/run_classifier_eval_{platform}.sh
# platform: gpu or ascend
```

# ModelZoo Homepage

Please check the official [homepage](https://gitee.com/mindspore/mindspore/tree/master/model_zoo).

@ -0,0 +1,52 @@
|
|||
# Copyright 2021 Huawei Technologies Co., Ltd
|
||||
#
|
||||
# Licensed under the Apache License, Version 2.0 (the "License");
|
||||
# you may not use this file except in compliance with the License.
|
||||
# You may obtain a copy of the License at
|
||||
#
|
||||
# http://www.apache.org/licenses/LICENSE-2.0
|
||||
#
|
||||
# Unless required by applicable law or agreed to in writing, software
|
||||
# distributed under the License is distributed on an "AS IS" BASIS,
|
||||
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||
# See the License for the specific language governing permissions and
|
||||
# limitations under the License.
|
||||
# ============================================================================
|
||||
"""export checkpoint file into models"""
|
||||
import argparse
|
||||
import numpy as np
|
||||
|
||||
import mindspore.common.dtype as mstype
|
||||
from mindspore import Tensor, context, load_checkpoint, export
|
||||
|
||||
from src.finetune_eval_config import ernie_net_cfg
|
||||
from src.finetune_eval_model import ErnieCLSModel
|
||||
parser = argparse.ArgumentParser(description="Emotect export")
|
||||
parser.add_argument("--device_id", type=int, default=0, help="Device id")
|
||||
parser.add_argument("--batch_size", type=int, default=32, help="batch size")
|
||||
parser.add_argument("--number_labels", type=int, default=3, help="batch size")
|
||||
parser.add_argument("--ckpt_file", type=str, required=True, help="Bert ckpt file.")
|
||||
parser.add_argument("--file_name", type=str, default="emotect", help="bert output air name.")
|
||||
parser.add_argument("--file_format", type=str, choices=["AIR", "ONNX", "MINDIR"],
|
||||
default="AIR", help="file format")
|
||||
parser.add_argument("--device_target", type=str, default="Ascend",
|
||||
choices=["Ascend", "GPU", "CPU"], help="device target (default: Ascend)")
|
||||
args = parser.parse_args()
|
||||
|
||||
context.set_context(mode=context.GRAPH_MODE, device_target=args.device_target)
|
||||
if args.device_target == "Ascend":
|
||||
context.set_context(device_id=args.device_id)
|
||||
|
||||
if __name__ == "__main__":
|
||||
net = ErnieCLSModel(ernie_net_cfg, False, num_labels=args.number_labels)
|
||||
|
||||
load_checkpoint(args.ckpt_file, net=net)
|
||||
net.set_train(False)
|
||||
|
||||
input_ids = Tensor(np.zeros([args.batch_size, ernie_net_cfg.seq_length]), mstype.int32)
|
||||
input_mask = Tensor(np.zeros([args.batch_size, ernie_net_cfg.seq_length]), mstype.int32)
|
||||
token_type_id = Tensor(np.zeros([args.batch_size, ernie_net_cfg.seq_length]), mstype.int32)
|
||||
label_ids = Tensor(np.zeros([args.batch_size, ernie_net_cfg.seq_length]), mstype.int32)
|
||||
|
||||
input_data = [input_ids, input_mask, token_type_id]
|
||||
export(net, *input_data, file_name=args.file_name, file_format=args.file_format)
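
# Usage sketch (mirrors the export shell script under scripts/; paths and values are illustrative):
#
#   python export.py --device_id=0 --batch_size=32 --number_labels=3 \
#       --ckpt_file="save_models/classifier-3_302.ckpt" \
#       --file_name="save_models/emotect.mindir" \
#       --file_format="MINDIR" --device_target="Ascend"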
|
|
@ -0,0 +1,203 @@
|
|||
# Copyright 2021 Huawei Technologies Co., Ltd
|
||||
#
|
||||
# Licensed under the Apache License, Version 2.0 (the "License");
|
||||
# you may not use this file except in compliance with the License.
|
||||
# You may obtain a copy of the License at
|
||||
#
|
||||
# http://www.apache.org/licenses/LICENSE-2.0
|
||||
#
|
||||
# Unless required by applicable law or agreed to in writing, software
|
||||
# distributed under the License is distributed on an "AS IS" BASIS,
|
||||
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||
# See the License for the specific language governing permissions and
|
||||
# limitations under the License.
|
||||
# ============================================================================
|
||||
|
||||
'''
|
||||
Ernie finetune and evaluation script.
|
||||
'''
|
||||
|
||||
import os
|
||||
import time
|
||||
import argparse
|
||||
from src.ernie_for_finetune import ErnieFinetuneCell, ErnieCLS
|
||||
from src.finetune_eval_config import optimizer_cfg, ernie_net_cfg
|
||||
from src.dataset import create_classification_dataset
|
||||
from src.assessment_method import Accuracy
|
||||
from src.utils import make_directory, LossCallBack, LoadNewestCkpt, ErnieLearningRate
|
||||
import mindspore.common.dtype as mstype
|
||||
from mindspore import context
|
||||
from mindspore import log as logger
|
||||
from mindspore.nn.wrap.loss_scale import DynamicLossScaleUpdateCell
|
||||
from mindspore.nn.optim import Adam, AdamWeightDecay, Adagrad
|
||||
from mindspore.train.model import Model
|
||||
from mindspore.train.callback import CheckpointConfig, ModelCheckpoint, TimeMonitor
|
||||
from mindspore.train.serialization import load_checkpoint, load_param_into_net
|
||||
|
||||
_cur_dir = os.getcwd()
|
||||
|
||||
def do_train(dataset=None, network=None, load_checkpoint_path="", save_checkpoint_path="", epoch_num=1):
|
||||
""" do train """
|
||||
if load_checkpoint_path == "":
|
||||
raise ValueError("Pretrain model missed, finetune task must load pretrain model!")
|
||||
steps_per_epoch = 500
|
||||
# optimizer
|
||||
if optimizer_cfg.optimizer == 'AdamWeightDecay':
|
||||
lr_schedule = ErnieLearningRate(learning_rate=optimizer_cfg.AdamWeightDecay.learning_rate,
|
||||
end_learning_rate=optimizer_cfg.AdamWeightDecay.end_learning_rate,
|
||||
warmup_steps=int(steps_per_epoch * epoch_num * 0.1),
|
||||
decay_steps=steps_per_epoch * epoch_num,
|
||||
power=optimizer_cfg.AdamWeightDecay.power)
|
||||
params = network.trainable_params()
|
||||
decay_params = list(filter(optimizer_cfg.AdamWeightDecay.decay_filter, params))
|
||||
other_params = list(filter(lambda x: not optimizer_cfg.AdamWeightDecay.decay_filter(x), params))
|
||||
group_params = [{'params': decay_params, 'weight_decay': optimizer_cfg.AdamWeightDecay.weight_decay},
|
||||
{'params': other_params, 'weight_decay': 0.0}]
|
||||
|
||||
optimizer = AdamWeightDecay(group_params, lr_schedule, eps=optimizer_cfg.AdamWeightDecay.eps)
|
||||
elif optimizer_cfg.optimizer == 'Adam':
|
||||
optimizer = Adam(network.trainable_params(), learning_rate=optimizer_cfg.Adam.learning_rate)
|
||||
elif optimizer_cfg.optimizer == 'Adagrad':
|
||||
optimizer = Adagrad(network.trainable_params(), learning_rate=optimizer_cfg.Adagrad.learning_rate)
|
||||
# load checkpoint into network
|
||||
ckpt_config = CheckpointConfig(save_checkpoint_steps=steps_per_epoch, keep_checkpoint_max=10)
|
||||
ckpoint_cb = ModelCheckpoint(prefix="classifier",
|
||||
directory=None if save_checkpoint_path == "" else save_checkpoint_path,
|
||||
config=ckpt_config)
|
||||
param_dict = load_checkpoint(load_checkpoint_path)
|
||||
unloaded_params = load_param_into_net(network, param_dict)
|
||||
if len(unloaded_params) > 2:
|
||||
print(unloaded_params)
|
||||
logger.warning('Loading ernie model failed, please check the checkpoint file.')
|
||||
|
||||
update_cell = DynamicLossScaleUpdateCell(loss_scale_value=2**32, scale_factor=2, scale_window=1000)
|
||||
netwithgrads = ErnieFinetuneCell(network, optimizer=optimizer, scale_update_cell=update_cell)
|
||||
model = Model(netwithgrads)
|
||||
callbacks = [TimeMonitor(dataset.get_dataset_size()), LossCallBack(dataset.get_dataset_size()), ckpoint_cb]
|
||||
model.train(epoch_num, dataset, callbacks=callbacks)
|
||||
|
||||
def do_eval(dataset=None, network=None, num_class=2, load_checkpoint_path=""):
|
||||
""" do eval """
|
||||
if load_checkpoint_path == "":
|
||||
raise ValueError("Finetune model missed, evaluation task must load finetune model!")
|
||||
net_for_pretraining = network(ernie_net_cfg, False, num_class)
|
||||
net_for_pretraining.set_train(False)
|
||||
param_dict = load_checkpoint(load_checkpoint_path)
|
||||
load_param_into_net(net_for_pretraining, param_dict)
|
||||
|
||||
callback = Accuracy()
|
||||
|
||||
evaluate_times = []
|
||||
columns_list = ["input_ids", "input_mask", "segment_ids", "label_ids"]
|
||||
for data in dataset.create_dict_iterator(num_epochs=1):
|
||||
input_data = []
|
||||
for i in columns_list:
|
||||
input_data.append(data[i])
|
||||
input_ids, input_mask, token_type_id, label_ids = input_data
|
||||
time_begin = time.time()
|
||||
logits = net_for_pretraining(input_ids, input_mask, token_type_id, label_ids)
|
||||
time_end = time.time()
|
||||
evaluate_times.append(time_end - time_begin)
|
||||
callback.update(logits, label_ids)
|
||||
print("==============================================================")
|
||||
print("acc_num {} , total_num {}, accuracy {:.6f}".format(callback.acc_num, callback.total_num,
|
||||
callback.acc_num / callback.total_num))
|
||||
print("(w/o first and last) elapsed time: {}, per step time : {}".format(
|
||||
sum(evaluate_times[1:-1]), sum(evaluate_times[1:-1])/(len(evaluate_times) - 2)))
|
||||
print("==============================================================")
|
||||
|
||||
def run_classifier():
|
||||
"""run classifier task"""
|
||||
parser = argparse.ArgumentParser(description="run classifier")
|
||||
parser.add_argument("--device_target", type=str, default="Ascend", choices=["Ascend", "GPU"],
|
||||
help="Device type, default is Ascend")
|
||||
parser.add_argument("--do_train", type=str, default="false", choices=["true", "false"],
|
||||
help="Enable train, default is false")
|
||||
parser.add_argument("--do_eval", type=str, default="false", choices=["true", "false"],
|
||||
help="Enable eval, default is false")
|
||||
parser.add_argument("--device_id", type=int, default=0, help="Device id, default is 0.")
|
||||
parser.add_argument("--epoch_num", type=int, default=3, help="Epoch number, default is 3.")
|
||||
parser.add_argument("--num_class", type=int, default=3, help="The number of class, default is 3.")
|
||||
parser.add_argument("--train_data_shuffle", type=str, default="true", choices=["true", "false"],
|
||||
help="Enable train data shuffle, default is true")
|
||||
parser.add_argument("--eval_data_shuffle", type=str, default="false", choices=["true", "false"],
|
||||
help="Enable eval data shuffle, default is false")
|
||||
parser.add_argument("--train_batch_size", type=int, default=32, help="Train batch size, default is 32")
|
||||
parser.add_argument("--eval_batch_size", type=int, default=1, help="Eval batch size, default is 1")
|
||||
parser.add_argument("--save_finetune_checkpoint_path", type=str, default="", help="Save checkpoint path")
|
||||
parser.add_argument("--load_pretrain_checkpoint_path", type=str, default="", help="Load checkpoint file path")
|
||||
parser.add_argument("--local_pretrain_checkpoint_path", type=str, default="",
|
||||
help="Local pretrain checkpoint file path")
|
||||
parser.add_argument("--load_finetune_checkpoint_path", type=str, default="", help="Load checkpoint file path")
|
||||
parser.add_argument("--train_data_file_path", type=str, default="",
|
||||
help="Data path, it is better to use absolute path")
|
||||
parser.add_argument("--eval_data_file_path", type=str, default="",
|
||||
help="Data path, it is better to use absolute path")
|
||||
parser.add_argument("--schema_file_path", type=str, default="",
|
||||
help="Schema path, it is better to use absolute path")
|
||||
parser.add_argument('--data_url', type=str, default=None, help='Dataset path for ModelArts')
|
||||
parser.add_argument('--train_url', type=str, default=None, help='Train output path for ModelArts')
|
||||
parser.add_argument('--modelarts', type=str, default='false',
|
||||
help='train on modelarts or not, default is false')
|
||||
args_opt = parser.parse_args()
|
||||
|
||||
epoch_num = args_opt.epoch_num
|
||||
load_pretrain_checkpoint_path = args_opt.load_pretrain_checkpoint_path
|
||||
save_finetune_checkpoint_path = args_opt.save_finetune_checkpoint_path
|
||||
load_finetune_checkpoint_path = args_opt.load_finetune_checkpoint_path
|
||||
|
||||
if args_opt.modelarts.lower() == 'true':
|
||||
import moxing as mox
|
||||
mox.file.copy_parallel(args_opt.data_url, '/cache/data')
|
||||
mox.file.copy_parallel(args_opt.load_pretrain_checkpoint_path, args_opt.local_pretrain_checkpoint_path)
|
||||
load_pretrain_checkpoint_path = args_opt.local_pretrain_checkpoint_path
|
||||
if args_opt.do_train.lower() == "false" and args_opt.do_eval.lower() == "true":
|
||||
mox.file.copy_parallel(args_opt.save_finetune_checkpoint_path, args_opt.load_finetune_checkpoint_path)
|
||||
|
||||
if args_opt.do_train.lower() == "false" and args_opt.do_eval.lower() == "false":
|
||||
raise ValueError("At least one of 'do_train' or 'do_eval' must be true")
|
||||
if args_opt.do_train.lower() == "true" and args_opt.train_data_file_path == "":
|
||||
raise ValueError("'train_data_file_path' must be set when do finetune task")
|
||||
if args_opt.do_eval.lower() == "true" and args_opt.eval_data_file_path == "":
|
||||
raise ValueError("'eval_data_file_path' must be set when do evaluation task")
|
||||
|
||||
target = args_opt.device_target
|
||||
if target == "Ascend":
|
||||
context.set_context(mode=context.GRAPH_MODE, device_target="Ascend", device_id=args_opt.device_id)
|
||||
elif target == "GPU":
|
||||
context.set_context(mode=context.GRAPH_MODE, device_target="GPU")
|
||||
if ernie_net_cfg.compute_type != mstype.float32:
|
||||
logger.warning('GPU only support fp32 temporarily, run with fp32.')
|
||||
ernie_net_cfg.compute_type = mstype.float32
|
||||
else:
|
||||
raise Exception("Target error, GPU or Ascend is supported.")
|
||||
|
||||
netwithloss = ErnieCLS(ernie_net_cfg, True, num_labels=args_opt.num_class, dropout_prob=0.1)
|
||||
|
||||
if args_opt.do_train.lower() == "true":
|
||||
ds = create_classification_dataset(batch_size=args_opt.train_batch_size, repeat_count=1,
|
||||
data_file_path=args_opt.train_data_file_path,
|
||||
schema_file_path=args_opt.schema_file_path,
|
||||
do_shuffle=(args_opt.train_data_shuffle.lower() == "true"))
|
||||
do_train(ds, netwithloss, load_pretrain_checkpoint_path, save_finetune_checkpoint_path, epoch_num)
|
||||
|
||||
if args_opt.do_eval.lower() == "true":
|
||||
if save_finetune_checkpoint_path == "":
|
||||
load_finetune_checkpoint_dir = _cur_dir
|
||||
else:
|
||||
load_finetune_checkpoint_dir = make_directory(save_finetune_checkpoint_path)
|
||||
load_finetune_checkpoint_path = LoadNewestCkpt(load_finetune_checkpoint_dir,
|
||||
ds.get_dataset_size(), epoch_num, "classifier")
|
||||
|
||||
if args_opt.do_eval.lower() == "true":
|
||||
ds = create_classification_dataset(batch_size=args_opt.eval_batch_size, repeat_count=1,
|
||||
data_file_path=args_opt.eval_data_file_path,
|
||||
schema_file_path=args_opt.schema_file_path,
|
||||
do_shuffle=(args_opt.eval_data_shuffle.lower() == "true"),
|
||||
drop_remainder=False)
|
||||
do_eval(ds, ErnieCLS, args_opt.num_class, load_finetune_checkpoint_path)
|
||||
|
||||
if args_opt.modelarts.lower() == 'true' and args_opt.do_train.lower() == "true":
|
||||
mox.file.copy_parallel(save_finetune_checkpoint_path, args_opt.train_url)
|
||||
if __name__ == "__main__":
|
||||
run_classifier()
|
|
@ -0,0 +1,46 @@
|
|||
#!/bin/bash
|
||||
# Copyright 2021 Huawei Technologies Co., Ltd
|
||||
#
|
||||
# Licensed under the Apache License, Version 2.0 (the "License");
|
||||
# you may not use this file except in compliance with the License.
|
||||
# You may obtain a copy of the License at
|
||||
#
|
||||
# http://www.apache.org/licenses/LICENSE-2.0
|
||||
#
|
||||
# Unless required by applicable law or agreed to in writing, software
|
||||
# distributed under the License is distributed on an "AS IS" BASIS,
|
||||
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||
# See the License for the specific language governing permissions and
|
||||
# limitations under the License.
|
||||
# ============================================================================
|
||||
|
||||
CUR_DIR=`pwd`
|
||||
DATA_PATH=${CUR_DIR}/data
|
||||
MODEL_PATH=${CUR_DIR}/pretrain_models/ernie
|
||||
|
||||
# train dataset
|
||||
python ${CUR_DIR}/src/reader.py \
|
||||
--vocab_path="${MODEL_PATH}/vocab.txt" \
|
||||
--max_seq_len=64 \
|
||||
--do_lower_case="true" \
|
||||
--random_seed=1 \
|
||||
--input_file="${DATA_PATH}/train.tsv" \
|
||||
--output_file="${DATA_PATH}/train.mindrecord"
|
||||
|
||||
# dev dataset
|
||||
python ${CUR_DIR}/src/reader.py \
|
||||
--vocab_path="${MODEL_PATH}/vocab.txt" \
|
||||
--max_seq_len=64 \
|
||||
--do_lower_case="true" \
|
||||
--random_seed=1 \
|
||||
--input_file="${DATA_PATH}/dev.tsv" \
|
||||
--output_file="${DATA_PATH}/dev.mindrecord"
|
||||
|
||||
# test dataset
|
||||
python ${CUR_DIR}/src/reader.py \
|
||||
--vocab_path="${MODEL_PATH}/vocab.txt" \
|
||||
--max_seq_len=64 \
|
||||
--do_lower_case="true" \
|
||||
--random_seed=1 \
|
||||
--input_file="${DATA_PATH}/test.tsv" \
|
||||
--output_file="${DATA_PATH}/test.mindrecord"
|
|
@ -0,0 +1,22 @@
|
|||
#!/bin/bash
|
||||
# Copyright 2021 Huawei Technologies Co., Ltd
|
||||
#
|
||||
# Licensed under the Apache License, Version 2.0 (the "License");
|
||||
# you may not use this file except in compliance with the License.
|
||||
# You may obtain a copy of the License at
|
||||
#
|
||||
# http://www.apache.org/licenses/LICENSE-2.0
|
||||
#
|
||||
# Unless required by applicable law or agreed to in writing, software
|
||||
# distributed under the License is distributed on an "AS IS" BASIS,
|
||||
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||
# See the License for the specific language governing permissions and
|
||||
# limitations under the License.
|
||||
# ============================================================================
|
||||
|
||||
# download dataset file to ./data/
|
||||
DATA_URL=https://baidu-nlp.bj.bcebos.com/emotion_detection-dataset-1.0.0.tar.gz
|
||||
wget --no-check-certificate ${DATA_URL}
|
||||
|
||||
tar xvf emotion_detection-dataset-1.0.0.tar.gz
|
||||
/bin/rm emotion_detection-dataset-1.0.0.tar.gz
|
|
@ -0,0 +1,24 @@
|
|||
#!/bin/bash
|
||||
# Copyright 2021 Huawei Technologies Co., Ltd
|
||||
#
|
||||
# Licensed under the Apache License, Version 2.0 (the "License");
|
||||
# you may not use this file except in compliance with the License.
|
||||
# You may obtain a copy of the License at
|
||||
#
|
||||
# http://www.apache.org/licenses/LICENSE-2.0
|
||||
#
|
||||
# Unless required by applicable law or agreed to in writing, software
|
||||
# distributed under the License is distributed on an "AS IS" BASIS,
|
||||
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||
# See the License for the specific language governing permissions and
|
||||
# limitations under the License.
|
||||
# ============================================================================
|
||||
mkdir -p pretrain_models/ernie
|
||||
cd pretrain_models/ernie
|
||||
|
||||
# download pretrain model file to ./pretrain_models/
|
||||
wget --no-check-certificate https://baidu-nlp.bj.bcebos.com/ERNIE_stable-1.0.1.tar.gz -O ERNIE_stable-1.0.1.tar.gz
|
||||
|
||||
tar -zxvf ERNIE_stable-1.0.1.tar.gz
|
||||
|
||||
/bin/rm ERNIE_stable-1.0.1.tar.gz
|
|
@ -0,0 +1,25 @@
|
|||
#!/bin/bash
|
||||
# Copyright 2021 Huawei Technologies Co., Ltd
|
||||
#
|
||||
# Licensed under the Apache License, Version 2.0 (the "License");
|
||||
# you may not use this file except in compliance with the License.
|
||||
# You may obtain a copy of the License at
|
||||
#
|
||||
# http://www.apache.org/licenses/LICENSE-2.0
|
||||
#
|
||||
# Unless required by applicable law or agreed to in writing, software
|
||||
# distributed under the License is distributed on an "AS IS" BASIS,
|
||||
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||
# See the License for the specific language governing permissions and
|
||||
# limitations under the License.
|
||||
# ============================================================================
|
||||
CUR_DIR=`pwd`
|
||||
SAVE_PATH=${CUR_DIR}/save_models
|
||||
EXPORT_PATH=${SAVE_PATH}
|
||||
python ${CUR_DIR}/export.py --device_id=0 \
|
||||
--batch_size=32 \
|
||||
--number_labels=3 \
|
||||
--ckpt_file="${SAVE_PATH}/classifier-3_302.ckpt" \
|
||||
--file_name="${EXPORT_PATH}/emotect.mindir" \
|
||||
--file_format="MINDIR" \
|
||||
--device_target="Ascend"
|
|
@ -0,0 +1,21 @@
|
|||
#!/bin/bash
|
||||
# Copyright 2021 Huawei Technologies Co., Ltd
|
||||
#
|
||||
# Licensed under the Apache License, Version 2.0 (the "License");
|
||||
# you may not use this file except in compliance with the License.
|
||||
# You may obtain a copy of the License at
|
||||
#
|
||||
# http://www.apache.org/licenses/LICENSE-2.0
|
||||
#
|
||||
# Unless required by applicable law or agreed to in writing, software
|
||||
# distributed under the License is distributed on an "AS IS" BASIS,
|
||||
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||
# See the License for the specific language governing permissions and
|
||||
# limitations under the License.
|
||||
# ============================================================================
|
||||
PROJECT_DIR=`pwd`
|
||||
MODEL_PATH=${PROJECT_DIR}/pretrain_models/ernie
|
||||
CONVERT_PATH=${PROJECT_DIR}/pretrain_models/converted
|
||||
python ${PROJECT_DIR}/src/convert.py \
|
||||
--input_dir="${MODEL_PATH}" \
|
||||
--output_dir="${CONVERT_PATH}"
|
|
@ -0,0 +1,33 @@
|
|||
#!/bin/bash
|
||||
# Copyright 2021 Huawei Technologies Co., Ltd
|
||||
#
|
||||
# Licensed under the Apache License, Version 2.0 (the "License");
|
||||
# you may not use this file except in compliance with the License.
|
||||
# You may obtain a copy of the License at
|
||||
#
|
||||
# http://www.apache.org/licenses/LICENSE-2.0
|
||||
#
|
||||
# Unless required by applicable law or agreed to in writing, software
|
||||
# distributed under the License is distributed on an "AS IS" BASIS,
|
||||
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||
# See the License for the specific language governing permissions and
|
||||
# limitations under the License.
|
||||
# ============================================================================
|
||||
mkdir -p ms_log
|
||||
CUR_DIR=`pwd`
|
||||
DATA_PATH=${CUR_DIR}/data
|
||||
SAVE_PATH=${CUR_DIR}/save_models
|
||||
export GLOG_log_dir=${CUR_DIR}/ms_log
|
||||
export GLOG_logtostderr=0
|
||||
python ${CUR_DIR}/run_ernie_classifier.py \
|
||||
--device_target="Ascend" \
|
||||
--do_train="false" \
|
||||
--do_eval="true" \
|
||||
--device_id=0 \
|
||||
--num_class=3 \
|
||||
--train_data_shuffle="true" \
|
||||
--eval_data_shuffle="false" \
|
||||
--eval_batch_size=32 \
|
||||
--load_finetune_checkpoint_path="${SAVE_PATH}/classifier-3_302.ckpt" \
|
||||
--eval_data_file_path="${DATA_PATH}/test.mindrecord" \
|
||||
--schema_file_path="" > ${GLOG_log_dir}/eval_classifier_log.txt 2>&1 &
|
|
@ -0,0 +1,33 @@
|
|||
#!/bin/bash
|
||||
# Copyright 2021 Huawei Technologies Co., Ltd
|
||||
#
|
||||
# Licensed under the Apache License, Version 2.0 (the "License");
|
||||
# you may not use this file except in compliance with the License.
|
||||
# You may obtain a copy of the License at
|
||||
#
|
||||
# http://www.apache.org/licenses/LICENSE-2.0
|
||||
#
|
||||
# Unless required by applicable law or agreed to in writing, software
|
||||
# distributed under the License is distributed on an "AS IS" BASIS,
|
||||
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||
# See the License for the specific language governing permissions and
|
||||
# limitations under the License.
|
||||
# ============================================================================
|
||||
mkdir -p ms_log
|
||||
CUR_DIR=`pwd`
|
||||
DATA_PATH=${CUR_DIR}/data
|
||||
SAVE_PATH=${CUR_DIR}/save_models
|
||||
export GLOG_log_dir=${CUR_DIR}/ms_log
|
||||
export GLOG_logtostderr=0
|
||||
python ${CUR_DIR}/run_ernie_classifier.py \
|
||||
--device_target="GPU" \
|
||||
--do_train="false" \
|
||||
--do_eval="true" \
|
||||
--device_id=0 \
|
||||
--num_class=3 \
|
||||
--train_data_shuffle="true" \
|
||||
--eval_data_shuffle="false" \
|
||||
--eval_batch_size=32 \
|
||||
--load_finetune_checkpoint_path="${SAVE_PATH}/classifier-3_301.ckpt" \
|
||||
--eval_data_file_path="${DATA_PATH}/test.mindrecord" \
|
||||
--schema_file_path="" > ${GLOG_log_dir}/eval_classifier_log.txt 2>&1 &
|
|
@ -0,0 +1,39 @@
|
|||
#!/bin/bash
|
||||
# Copyright 2021 Huawei Technologies Co., Ltd
|
||||
#
|
||||
# Licensed under the Apache License, Version 2.0 (the "License");
|
||||
# you may not use this file except in compliance with the License.
|
||||
# You may obtain a copy of the License at
|
||||
#
|
||||
# http://www.apache.org/licenses/LICENSE-2.0
|
||||
#
|
||||
# Unless required by applicable law or agreed to in writing, software
|
||||
# distributed under the License is distributed on an "AS IS" BASIS,
|
||||
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||
# See the License for the specific language governing permissions and
|
||||
# limitations under the License.
|
||||
# ============================================================================
|
||||
mkdir -p ms_log
|
||||
mkdir -p save_models
|
||||
CUR_DIR=`pwd`
|
||||
MODEL_PATH=${CUR_DIR}/pretrain_models
|
||||
DATA_PATH=${CUR_DIR}/data
|
||||
SAVE_PATH=${CUR_DIR}/save_models
|
||||
export GLOG_log_dir=${CUR_DIR}/ms_log
|
||||
export GLOG_logtostderr=0
|
||||
python ${CUR_DIR}/run_ernie_classifier.py \
|
||||
--device_target="Ascend" \
|
||||
--do_train="true" \
|
||||
--do_eval="true" \
|
||||
--device_id=0 \
|
||||
--epoch_num=3 \
|
||||
--num_class=3 \
|
||||
--train_data_shuffle="true" \
|
||||
--eval_data_shuffle="false" \
|
||||
--train_batch_size=32 \
|
||||
--eval_batch_size=32 \
|
||||
--save_finetune_checkpoint_path="${SAVE_PATH}" \
|
||||
--load_pretrain_checkpoint_path="${MODEL_PATH}/ernie.ckpt" \
|
||||
--train_data_file_path="${DATA_PATH}/train.mindrecord" \
|
||||
--eval_data_file_path="${DATA_PATH}/dev.mindrecord" \
|
||||
--schema_file_path="" > ${GLOG_log_dir}/train_classifier_log.txt 2>&1 &
|
|
@ -0,0 +1,39 @@
|
|||
#!/bin/bash
|
||||
# Copyright 2021 Huawei Technologies Co., Ltd
|
||||
#
|
||||
# Licensed under the Apache License, Version 2.0 (the "License");
|
||||
# you may not use this file except in compliance with the License.
|
||||
# You may obtain a copy of the License at
|
||||
#
|
||||
# http://www.apache.org/licenses/LICENSE-2.0
|
||||
#
|
||||
# Unless required by applicable law or agreed to in writing, software
|
||||
# distributed under the License is distributed on an "AS IS" BASIS,
|
||||
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||
# See the License for the specific language governing permissions and
|
||||
# limitations under the License.
|
||||
# ============================================================================
|
||||
mkdir -p ms_log
|
||||
mkdir -p save_models
|
||||
CUR_DIR=`pwd`
|
||||
MODEL_PATH=${CUR_DIR}/pretrain_models
|
||||
DATA_PATH=${CUR_DIR}/data
|
||||
SAVE_PATH=${CUR_DIR}/save_models
|
||||
export GLOG_log_dir=${CUR_DIR}/ms_log
|
||||
export GLOG_logtostderr=0
|
||||
python ${CUR_DIR}/run_ernie_classifier.py \
|
||||
--device_target="GPU" \
|
||||
--do_train="true" \
|
||||
--do_eval="true" \
|
||||
--device_id=0 \
|
||||
--epoch_num=3 \
|
||||
--num_class=3 \
|
||||
--train_data_shuffle="true" \
|
||||
--eval_data_shuffle="false" \
|
||||
--train_batch_size=32 \
|
||||
--eval_batch_size=32 \
|
||||
--save_finetune_checkpoint_path="${SAVE_PATH}" \
|
||||
--load_pretrain_checkpoint_path="${MODEL_PATH}/ernie.ckpt" \
|
||||
--train_data_file_path="${DATA_PATH}/train.mindrecord" \
|
||||
--eval_data_file_path="${DATA_PATH}/dev.mindrecord" \
|
||||
--schema_file_path="" > ${GLOG_log_dir}/train_classifier_log.txt 2>&1 &
|
|
@ -0,0 +1,34 @@
|
|||
# Copyright 2021 Huawei Technologies Co., Ltd
|
||||
#
|
||||
# Licensed under the Apache License, Version 2.0 (the "License");
|
||||
# you may not use this file except in compliance with the License.
|
||||
# You may obtain a copy of the License at
|
||||
#
|
||||
# http://www.apache.org/licenses/LICENSE-2.0
|
||||
#
|
||||
# Unless required by applicable law or agreed to in writing, software
|
||||
# distributed under the License is distributed on an "AS IS" BASIS,
|
||||
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||
# See the License for the specific language governing permissions and
|
||||
# limitations under the License.
|
||||
# ============================================================================
|
||||
|
||||
'''
|
||||
Ernie evaluation assessment method script.
|
||||
'''
|
||||
import numpy as np
|
||||
|
||||
class Accuracy():
|
||||
'''
|
||||
calculate accuracy
|
||||
'''
|
||||
def __init__(self):
|
||||
self.acc_num = 0
|
||||
self.total_num = 0
|
||||
def update(self, logits, labels):
|
||||
labels = labels.asnumpy()
|
||||
labels = np.reshape(labels, -1)
|
||||
logits = logits.asnumpy()
|
||||
logit_id = np.argmax(logits, axis=-1)
|
||||
self.acc_num += np.sum(labels == logit_id)
|
||||
self.total_num += len(labels)
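
# Usage sketch (not part of the original module): how the Accuracy metric is driven
# during evaluation; the dummy logits and labels below are illustrative only.
if __name__ == "__main__":
    from mindspore import Tensor

    metric = Accuracy()
    dummy_logits = Tensor(np.array([[0.1, 0.7, 0.2], [0.8, 0.1, 0.1]], dtype=np.float32))
    dummy_labels = Tensor(np.array([[1], [2]], dtype=np.int32))
    metric.update(dummy_logits, dummy_labels)
    print("accuracy: {:.6f}".format(metric.acc_num / metric.total_num))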
|
|
@ -0,0 +1,113 @@
|
|||
# Copyright 2021 Huawei Technologies Co., Ltd
|
||||
#
|
||||
# Licensed under the Apache License, Version 2.0 (the "License");
|
||||
# you may not use this file except in compliance with the License.
|
||||
# You may obtain a copy of the License at
|
||||
#
|
||||
# http://www.apache.org/licenses/LICENSE-2.0
|
||||
#
|
||||
# Unless required by applicable law or agreed to in writing, software
|
||||
# distributed under the License is distributed on an "AS IS" BASIS,
|
||||
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||
# See the License for the specific language governing permissions and
|
||||
# limitations under the License.
|
||||
# ============================================================================
|
||||
'''
|
||||
Convert ERNIE from paddle to mindspore.
|
||||
'''
|
||||
import collections
|
||||
import os
|
||||
import json
|
||||
import shutil
|
||||
import argparse
|
||||
import paddle.fluid.dygraph as D
|
||||
from paddle import fluid
|
||||
from mindspore import Tensor
|
||||
from mindspore.train.serialization import save_checkpoint
|
||||
|
||||
def build_params_map(attention_num=12):
|
||||
"""
|
||||
build a parameter-name map from PaddlePaddle's ERNIE to the MindSpore ERNIE in this repository
|
||||
"""
|
||||
weight_map = collections.OrderedDict({
|
||||
'word_embedding': "ernie.ernie.ernie_embedding_lookup.embedding_table",
|
||||
'pos_embedding': "ernie.ernie.ernie_embedding_postprocessor.full_position_embedding.embedding_table",
|
||||
'sent_embedding': "ernie.ernie.ernie_embedding_postprocessor.token_type_embedding.embedding_table",
|
||||
'pre_encoder_layer_norm_scale': 'ernie.ernie.ernie_embedding_postprocessor.layernorm.gamma',
|
||||
'pre_encoder_layer_norm_bias': 'ernie.ernie.ernie_embedding_postprocessor.layernorm.beta',
|
||||
})
|
||||
# add attention layers
|
||||
for i in range(attention_num):
|
||||
weight_map[f'encoder_layer_{i}_multi_head_att_query_fc.w_0'] = \
|
||||
f'ernie.ernie.ernie_encoder.layers.{i}.attention.attention.query_layer.weight'
|
||||
weight_map[f'encoder_layer_{i}_multi_head_att_query_fc.b_0'] = \
|
||||
f'ernie.ernie.ernie_encoder.layers.{i}.attention.attention.query_layer.bias'
|
||||
weight_map[f'encoder_layer_{i}_multi_head_att_key_fc.w_0'] = \
|
||||
f'ernie.ernie.ernie_encoder.layers.{i}.attention.attention.key_layer.weight'
|
||||
weight_map[f'encoder_layer_{i}_multi_head_att_key_fc.b_0'] = \
|
||||
f'ernie.ernie.ernie_encoder.layers.{i}.attention.attention.key_layer.bias'
|
||||
weight_map[f'encoder_layer_{i}_multi_head_att_value_fc.w_0'] = \
|
||||
f'ernie.ernie.ernie_encoder.layers.{i}.attention.attention.value_layer.weight'
|
||||
weight_map[f'encoder_layer_{i}_multi_head_att_value_fc.b_0'] = \
|
||||
f'ernie.ernie.ernie_encoder.layers.{i}.attention.attention.value_layer.bias'
|
||||
weight_map[f'encoder_layer_{i}_multi_head_att_output_fc.w_0'] = \
|
||||
f'ernie.ernie.ernie_encoder.layers.{i}.attention.output.dense.weight'
|
||||
weight_map[f'encoder_layer_{i}_multi_head_att_output_fc.b_0'] = \
|
||||
f'ernie.ernie.ernie_encoder.layers.{i}.attention.output.dense.bias'
|
||||
weight_map[f'encoder_layer_{i}_post_att_layer_norm_scale'] = \
|
||||
f'ernie.ernie.ernie_encoder.layers.{i}.attention.output.layernorm.gamma'
|
||||
weight_map[f'encoder_layer_{i}_post_att_layer_norm_bias'] = \
|
||||
f'ernie.ernie.ernie_encoder.layers.{i}.attention.output.layernorm.beta'
|
||||
weight_map[f'encoder_layer_{i}_ffn_fc_0.w_0'] = f'ernie.ernie.ernie_encoder.layers.{i}.intermediate.weight'
|
||||
weight_map[f'encoder_layer_{i}_ffn_fc_0.b_0'] = f'ernie.ernie.ernie_encoder.layers.{i}.intermediate.bias'
|
||||
weight_map[f'encoder_layer_{i}_ffn_fc_1.w_0'] = f'ernie.ernie.ernie_encoder.layers.{i}.output.dense.weight'
|
||||
weight_map[f'encoder_layer_{i}_ffn_fc_1.b_0'] = f'ernie.ernie.ernie_encoder.layers.{i}.output.dense.bias'
|
||||
weight_map[f'encoder_layer_{i}_post_ffn_layer_norm_scale'] = \
|
||||
f'ernie.ernie.ernie_encoder.layers.{i}.output.layernorm.gamma'
|
||||
weight_map[f'encoder_layer_{i}_post_ffn_layer_norm_bias'] = \
|
||||
f'ernie.ernie.ernie_encoder.layers.{i}.output.layernorm.beta'
|
||||
|
||||
weight_map.update(
|
||||
{
|
||||
'pooled_fc.w_0': 'ernie.ernie.dense.weight',
|
||||
'pooled_fc.b_0': 'ernie.ernie.dense.bias',
|
||||
'cls_out_w': 'ernie.dense_1.weight',
|
||||
'cls_out_b': 'ernie.dense_1.bias'
|
||||
}
|
||||
)
|
||||
return weight_map
|
||||
|
||||
def extract_and_convert(input_dir, output_dir):
|
||||
"""extract ckpt and convert"""
|
||||
if not os.path.exists(output_dir):
|
||||
os.makedirs(output_dir)
|
||||
config = json.load(open(os.path.join(input_dir, 'ernie_config.json'), 'rt', encoding='utf-8'))
|
||||
print('=' * 20 + 'save vocab file' + '=' * 20)
|
||||
shutil.copyfile(os.path.join(input_dir, 'vocab.txt'), os.path.join(output_dir, 'vocab.txt'))
|
||||
print('=' * 20 + 'extract weights' + '=' * 20)
|
||||
state_dict = []
|
||||
weight_map = build_params_map(attention_num=config['num_hidden_layers'])
|
||||
with fluid.dygraph.guard():
|
||||
paddle_paddle_params, _ = D.load_dygraph(os.path.join(input_dir, 'params'))
|
||||
for weight_name, weight_value in paddle_paddle_params.items():
|
||||
if weight_name not in weight_map.keys():
|
||||
continue
|
||||
if 'w_0' in weight_name \
|
||||
or 'post_att_layer_norm_scale' in weight_name \
|
||||
or 'post_ffn_layer_norm_scale' in weight_name \
|
||||
or 'cls_out_w' in weight_name:
|
||||
weight_value = weight_value.transpose()
|
||||
state_dict.append({'name': weight_map[weight_name], 'data': Tensor(weight_value)})
|
||||
print(weight_name, '->', weight_map[weight_name], weight_value.shape)
|
||||
save_checkpoint(state_dict, os.path.join(output_dir, "ernie.ckpt"))
|
||||
|
||||
def run_convert():
|
||||
"""run convert"""
|
||||
parser = argparse.ArgumentParser(description="run convert")
|
||||
parser.add_argument("--input_dir", type=str, default="", help="Pretrained model dir")
|
||||
parser.add_argument("--output_dir", type=str, default="", help="Converted model dir")
|
||||
args_opt = parser.parse_args()
|
||||
extract_and_convert(args_opt.input_dir, args_opt.output_dir)
|
||||
|
||||
if __name__ == '__main__':
|
||||
run_convert()
|
|
@ -0,0 +1,40 @@
|
|||
# Copyright 2021 Huawei Technologies Co., Ltd
|
||||
#
|
||||
# Licensed under the Apache License, Version 2.0 (the "License");
|
||||
# you may not use this file except in compliance with the License.
|
||||
# You may obtain a copy of the License at
|
||||
#
|
||||
# http://www.apache.org/licenses/LICENSE-2.0
|
||||
#
|
||||
# Unless required by applicable law or agreed to in writing, software
|
||||
# distributed under the License is distributed on an "AS IS" BASIS,
|
||||
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||
# See the License for the specific language governing permissions and
|
||||
# limitations under the License.
|
||||
# ============================================================================
|
||||
"""
|
||||
Data operations, will be used in run_ernie_classifier.py
|
||||
"""
|
||||
import mindspore.common.dtype as mstype
|
||||
import mindspore.dataset as ds
|
||||
import mindspore.dataset.transforms.c_transforms as C
|
||||
|
||||
def create_classification_dataset(batch_size=1,
|
||||
repeat_count=1,
|
||||
data_file_path=None,
|
||||
schema_file_path=None,
|
||||
do_shuffle=True,
|
||||
drop_remainder=True):
|
||||
"""create finetune or evaluation dataset"""
|
||||
type_cast_op = C.TypeCast(mstype.int32)
|
||||
data_set = ds.MindDataset([data_file_path],
|
||||
columns_list=["input_ids", "input_mask", "segment_ids", "label_ids"],
|
||||
shuffle=do_shuffle)
|
||||
data_set = data_set.map(operations=type_cast_op, input_columns="label_ids")
|
||||
data_set = data_set.map(operations=type_cast_op, input_columns="segment_ids")
|
||||
data_set = data_set.map(operations=type_cast_op, input_columns="input_mask")
|
||||
data_set = data_set.map(operations=type_cast_op, input_columns="input_ids")
|
||||
data_set = data_set.repeat(repeat_count)
|
||||
# apply batch operations
|
||||
data_set = data_set.batch(batch_size, drop_remainder=drop_remainder)
|
||||
return data_set
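
# Usage sketch (not part of the original module): build an evaluation dataset from a
# converted MindRecord file; the path assumes the default ./data output of convert_dataset.sh.
if __name__ == "__main__":
    eval_ds = create_classification_dataset(batch_size=32,
                                            data_file_path="data/dev.mindrecord",
                                            do_shuffle=False,
                                            drop_remainder=False)
    print("number of batches:", eval_ds.get_dataset_size())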
|
|
@ -0,0 +1,197 @@
|
|||
# Copyright 2021 Huawei Technologies Co., Ltd
|
||||
#
|
||||
# Licensed under the Apache License, Version 2.0 (the "License");
|
||||
# you may not use this file except in compliance with the License.
|
||||
# You may obtain a copy of the License at
|
||||
#
|
||||
# http://www.apache.org/licenses/LICENSE-2.0
|
||||
#
|
||||
# Unless required by applicable law or agreed to in writing, software
|
||||
# distributed under the License is distributed on an "AS IS" BASIS,
|
||||
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||
# See the License for the specific language governing permissions and
|
||||
# limitations under the License.
|
||||
# ============================================================================
|
||||
|
||||
'''
|
||||
Ernie for finetune script.
|
||||
'''
|
||||
|
||||
import mindspore.nn as nn
|
||||
from mindspore.ops import operations as P
|
||||
from mindspore.ops import functional as F
|
||||
from mindspore.ops import composite as C
|
||||
from mindspore.common.tensor import Tensor
|
||||
from mindspore.common.parameter import Parameter
|
||||
from mindspore.common import dtype as mstype
|
||||
from mindspore.nn.wrap.grad_reducer import DistributedGradReducer
|
||||
from mindspore.context import ParallelMode
|
||||
from mindspore.communication.management import get_group_size
|
||||
from mindspore import context
|
||||
from .finetune_eval_model import ErnieCLSModel
|
||||
from .utils import CrossEntropyCalculation
|
||||
|
||||
GRADIENT_CLIP_TYPE = 1
|
||||
GRADIENT_CLIP_VALUE = 1.0
|
||||
grad_scale = C.MultitypeFuncGraph("grad_scale")
|
||||
reciprocal = P.Reciprocal()
|
||||
|
||||
clip_grad = C.MultitypeFuncGraph("clip_grad")
|
||||
|
||||
|
||||
@clip_grad.register("Number", "Number", "Tensor")
|
||||
def _clip_grad(clip_type, clip_value, grad):
|
||||
"""
|
||||
Clip gradients.
|
||||
Inputs:
|
||||
clip_type (int): The way to clip, 0 for 'value', 1 for 'norm'.
|
||||
clip_value (float): Specifies how much to clip.
|
||||
grad (tuple[Tensor]): Gradients.
|
||||
Outputs:
|
||||
tuple[Tensor], clipped gradients.
|
||||
"""
|
||||
if clip_type not in (0, 1):
|
||||
return grad
|
||||
dt = F.dtype(grad)
|
||||
if clip_type == 0:
|
||||
new_grad = C.clip_by_value(grad, F.cast(F.tuple_to_array((-clip_value,)), dt),
|
||||
F.cast(F.tuple_to_array((clip_value,)), dt))
|
||||
else:
|
||||
new_grad = nn.ClipByNorm()(grad, F.cast(F.tuple_to_array((clip_value,)), dt))
|
||||
return new_grad
|
||||
|
||||
@grad_scale.register("Tensor", "Tensor")
|
||||
def tensor_grad_scale(scale, grad):
|
||||
return grad * reciprocal(scale)
|
||||
|
||||
_grad_overflow = C.MultitypeFuncGraph("_grad_overflow")
|
||||
grad_overflow = P.FloatStatus()
|
||||
@_grad_overflow.register("Tensor")
|
||||
def _tensor_grad_overflow(grad):
|
||||
return grad_overflow(grad)
|
||||
|
||||
class ErnieFinetuneCell(nn.Cell):
|
||||
"""
|
||||
Especially defined for finetuning where only four input tensors are needed.
|
||||
Append an optimizer to the training network after that the construct
|
||||
function can be called to create the backward graph.
|
||||
Different from the builtin loss_scale wrapper cell, we apply grad_clip before the optimization.
|
||||
Args:
|
||||
network (Cell): The training network. Note that loss function should have been added.
|
||||
optimizer (Optimizer): Optimizer for updating the weights.
|
||||
scale_update_cell (Cell): Cell to do the loss scale. Default: None.
|
||||
"""
|
||||
def __init__(self, network, optimizer, scale_update_cell=None):
|
||||
|
||||
super(ErnieFinetuneCell, self).__init__(auto_prefix=False)
|
||||
self.network = network
|
||||
self.network.set_grad()
|
||||
self.weights = optimizer.parameters
|
||||
self.optimizer = optimizer
|
||||
self.grad = C.GradOperation(get_by_list=True,
|
||||
sens_param=True)
|
||||
self.reducer_flag = False
|
||||
self.allreduce = P.AllReduce()
|
||||
self.parallel_mode = context.get_auto_parallel_context("parallel_mode")
|
||||
if self.parallel_mode in [ParallelMode.DATA_PARALLEL, ParallelMode.HYBRID_PARALLEL]:
|
||||
self.reducer_flag = True
|
||||
self.grad_reducer = None
|
||||
if self.reducer_flag:
|
||||
mean = context.get_auto_parallel_context("gradients_mean")
|
||||
degree = get_group_size()
|
||||
self.grad_reducer = DistributedGradReducer(optimizer.parameters, mean, degree)
|
||||
self.is_distributed = (self.parallel_mode != ParallelMode.STAND_ALONE)
|
||||
self.cast = P.Cast()
|
||||
self.gpu_target = False
|
||||
if context.get_context("device_target") == "GPU":
|
||||
self.gpu_target = True
|
||||
self.float_status = P.FloatStatus()
|
||||
self.addn = P.AddN()
|
||||
self.reshape = P.Reshape()
|
||||
else:
|
||||
self.alloc_status = P.NPUAllocFloatStatus()
|
||||
self.get_status = P.NPUGetFloatStatus()
|
||||
self.clear_status = P.NPUClearFloatStatus()
|
||||
self.reduce_sum = P.ReduceSum(keep_dims=False)
|
||||
self.base = Tensor(1, mstype.float32)
|
||||
self.less_equal = P.LessEqual()
|
||||
self.hyper_map = C.HyperMap()
|
||||
self.loss_scale = None
|
||||
self.loss_scaling_manager = scale_update_cell
|
||||
if scale_update_cell:
|
||||
self.loss_scale = Parameter(Tensor(scale_update_cell.get_loss_scale(), dtype=mstype.float32))
|
||||
|
||||
def construct(self,
|
||||
input_ids,
|
||||
input_mask,
|
||||
token_type_id,
|
||||
label_ids,
|
||||
sens=None):
|
||||
"""Ernie Finetune"""
|
||||
|
||||
weights = self.weights
|
||||
init = False
|
||||
loss = self.network(input_ids,
|
||||
input_mask,
|
||||
token_type_id,
|
||||
label_ids)
|
||||
if sens is None:
|
||||
scaling_sens = self.loss_scale
|
||||
else:
|
||||
scaling_sens = sens
|
||||
|
||||
if not self.gpu_target:
|
||||
init = self.alloc_status()
|
||||
init = F.depend(init, loss)
|
||||
clear_status = self.clear_status(init)
|
||||
scaling_sens = F.depend(scaling_sens, clear_status)
|
||||
grads = self.grad(self.network, weights)(input_ids,
|
||||
input_mask,
|
||||
token_type_id,
|
||||
label_ids,
|
||||
self.cast(scaling_sens,
|
||||
mstype.float32))
|
||||
grads = self.hyper_map(F.partial(grad_scale, scaling_sens), grads)
|
||||
grads = self.hyper_map(F.partial(clip_grad, GRADIENT_CLIP_TYPE, GRADIENT_CLIP_VALUE), grads)
|
||||
if self.reducer_flag:
|
||||
grads = self.grad_reducer(grads)
|
||||
if not self.gpu_target:
|
||||
init = F.depend(init, grads)
|
||||
get_status = self.get_status(init)
|
||||
init = F.depend(init, get_status)
|
||||
flag_sum = self.reduce_sum(init, (0,))
|
||||
else:
|
||||
flag_sum = self.hyper_map(F.partial(_grad_overflow), grads)
|
||||
flag_sum = self.addn(flag_sum)
|
||||
flag_sum = self.reshape(flag_sum, (()))
|
||||
if self.is_distributed:
|
||||
flag_reduce = self.allreduce(flag_sum)
|
||||
cond = self.less_equal(self.base, flag_reduce)
|
||||
else:
|
||||
cond = self.less_equal(self.base, flag_sum)
|
||||
overflow = cond
|
||||
if sens is None:
|
||||
overflow = self.loss_scaling_manager(self.loss_scale, cond)
|
||||
if overflow:
|
||||
succ = False
|
||||
else:
|
||||
succ = self.optimizer(grads)
|
||||
ret = (loss, cond)
|
||||
return F.depend(ret, succ)
|
||||
|
||||
class ErnieCLS(nn.Cell):
|
||||
"""
|
||||
Train interface for classification finetuning task.
|
||||
"""
|
||||
def __init__(self, config, is_training, num_labels=2, dropout_prob=0.0, use_one_hot_embeddings=False,
|
||||
assessment_method=""):
|
||||
super(ErnieCLS, self).__init__()
|
||||
self.ernie = ErnieCLSModel(config, is_training, num_labels, dropout_prob, use_one_hot_embeddings)
|
||||
self.loss = CrossEntropyCalculation(is_training)
|
||||
self.num_labels = num_labels
|
||||
self.is_training = is_training
|
||||
|
||||
def construct(self, input_ids, input_mask, token_type_id, label_ids):
|
||||
logits = self.ernie(input_ids, input_mask, token_type_id)
|
||||
loss = self.loss(logits, label_ids, self.num_labels)
|
||||
return loss
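
# Usage sketch (mirrors run_ernie_classifier.py; shown here as comments for reference only):
#
#   netwithloss = ErnieCLS(ernie_net_cfg, True, num_labels=3, dropout_prob=0.1)
#   optimizer = Adam(netwithloss.trainable_params(), learning_rate=optimizer_cfg.Adam.learning_rate)
#   update_cell = DynamicLossScaleUpdateCell(loss_scale_value=2**32, scale_factor=2, scale_window=1000)
#   netwithgrads = ErnieFinetuneCell(netwithloss, optimizer=optimizer, scale_update_cell=update_cell)
#   Model(netwithgrads).train(epoch_num, dataset, callbacks=callbacks)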
|
|
@ -0,0 +1,858 @@
|
|||
# Copyright 2021 Huawei Technologies Co., Ltd
|
||||
#
|
||||
# Licensed under the Apache License, Version 2.0 (the "License");
|
||||
# you may not use this file except in compliance with the License.
|
||||
# You may obtain a copy of the License at
|
||||
#
|
||||
# http://www.apache.org/licenses/LICENSE-2.0
|
||||
#
|
||||
# Unless required by applicable law or agreed to in writing, software
|
||||
# distributed under the License is distributed on an "AS IS" BASIS,
|
||||
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||
# See the License for the specific language governing permissions and
|
||||
# limitations under the License.
|
||||
# ============================================================================
|
||||
"""Ernie model."""
|
||||
|
||||
import math
|
||||
import copy
|
||||
import numpy as np
|
||||
import mindspore.common.dtype as mstype
|
||||
import mindspore.nn as nn
|
||||
import mindspore.ops.functional as F
|
||||
from mindspore.common.initializer import TruncatedNormal, initializer
|
||||
from mindspore.ops import operations as P
|
||||
from mindspore.ops import composite as C
|
||||
from mindspore.common.tensor import Tensor
|
||||
from mindspore.common.parameter import Parameter
|
||||
|
||||
class ErnieConfig:
|
||||
"""
|
||||
Configuration for `ErnieModel`.
|
||||
Args:
|
||||
seq_length (int): Length of input sequence. Default: 128.
|
||||
vocab_size (int): The shape of each embedding vector. Default: 32000.
|
||||
hidden_size (int): Size of the Ernie encoder layers. Default: 768.
|
||||
num_hidden_layers (int): Number of hidden layers in the ErnieTransformer encoder
|
||||
cell. Default: 12.
|
||||
num_attention_heads (int): Number of attention heads in the ErnieTransformer
|
||||
encoder cell. Default: 12.
|
||||
intermediate_size (int): Size of intermediate layer in the ErnieTransformer
|
||||
encoder cell. Default: 3072.
|
||||
hidden_act (str): Activation function used in the ErnieTransformer encoder
|
||||
cell. Default: "gelu".
|
||||
hidden_dropout_prob (float): The dropout probability for ErnieOutput. Default: 0.1.
|
||||
attention_probs_dropout_prob (float): The dropout probability for
|
||||
ErnieAttention. Default: 0.1.
|
||||
max_position_embeddings (int): Maximum length of sequences used in this
|
||||
model. Default: 512.
|
||||
type_vocab_size (int): Size of token type vocab. Default: 16.
|
||||
initializer_range (float): Initialization value of TruncatedNormal. Default: 0.02.
|
||||
use_relative_positions (bool): Specifies whether to use relative positions. Default: False.
|
||||
dtype (:class:`mindspore.dtype`): Data type of the input. Default: mstype.float32.
|
||||
compute_type (:class:`mindspore.dtype`): Compute type in ErnieTransformer. Default: mstype.float32.
|
||||
"""
|
||||
def __init__(self,
|
||||
seq_length=128,
|
||||
vocab_size=32000,
|
||||
hidden_size=768,
|
||||
num_hidden_layers=12,
|
||||
num_attention_heads=12,
|
||||
intermediate_size=3072,
|
||||
hidden_act="gelu",
|
||||
hidden_dropout_prob=0.1,
|
||||
attention_probs_dropout_prob=0.1,
|
||||
max_position_embeddings=512,
|
||||
type_vocab_size=16,
|
||||
initializer_range=0.02,
|
||||
use_relative_positions=False,
|
||||
dtype=mstype.float32,
|
||||
compute_type=mstype.float32):
|
||||
self.seq_length = seq_length
|
||||
self.vocab_size = vocab_size
|
||||
self.hidden_size = hidden_size
|
||||
self.num_hidden_layers = num_hidden_layers
|
||||
self.num_attention_heads = num_attention_heads
|
||||
self.hidden_act = hidden_act
|
||||
self.intermediate_size = intermediate_size
|
||||
self.hidden_dropout_prob = hidden_dropout_prob
|
||||
self.attention_probs_dropout_prob = attention_probs_dropout_prob
|
||||
self.max_position_embeddings = max_position_embeddings
|
||||
self.type_vocab_size = type_vocab_size
|
||||
self.initializer_range = initializer_range
|
||||
self.use_relative_positions = use_relative_positions
|
||||
self.dtype = dtype
|
||||
self.compute_type = compute_type
|
||||
|
||||
|
||||
class EmbeddingLookup(nn.Cell):
|
||||
"""
|
||||
An embedding lookup table with a fixed dictionary and size.
|
||||
Args:
|
||||
vocab_size (int): Size of the dictionary of embeddings.
|
||||
embedding_size (int): The size of each embedding vector.
|
||||
embedding_shape (list): [batch_size, seq_length, embedding_size], the shape of
|
||||
each embedding vector.
|
||||
use_one_hot_embeddings (bool): Specifies whether to use one hot encoding form. Default: False.
|
||||
initializer_range (float): Initialization value of TruncatedNormal. Default: 0.02.
|
||||
"""
|
||||
def __init__(self,
|
||||
vocab_size,
|
||||
embedding_size,
|
||||
embedding_shape,
|
||||
use_one_hot_embeddings=False,
|
||||
initializer_range=0.02):
|
||||
super(EmbeddingLookup, self).__init__()
|
||||
self.vocab_size = vocab_size
|
||||
self.use_one_hot_embeddings = use_one_hot_embeddings
|
||||
self.embedding_table = Parameter(initializer
|
||||
(TruncatedNormal(initializer_range),
|
||||
[vocab_size, embedding_size]))
|
||||
self.expand = P.ExpandDims()
|
||||
self.shape_flat = (-1,)
|
||||
self.gather = P.Gather()
|
||||
self.one_hot = P.OneHot()
|
||||
self.on_value = Tensor(1.0, mstype.float32)
|
||||
self.off_value = Tensor(0.0, mstype.float32)
|
||||
self.array_mul = P.MatMul()
|
||||
self.reshape = P.Reshape()
|
||||
self.shape = tuple(embedding_shape)
|
||||
|
||||
def construct(self, input_ids):
|
||||
"""Get output and embeddings lookup table"""
|
||||
extended_ids = self.expand(input_ids, -1)
|
||||
flat_ids = self.reshape(extended_ids, self.shape_flat)
|
||||
if self.use_one_hot_embeddings:
|
||||
one_hot_ids = self.one_hot(flat_ids, self.vocab_size, self.on_value, self.off_value)
|
||||
output_for_reshape = self.array_mul(
|
||||
one_hot_ids, self.embedding_table)
|
||||
else:
|
||||
output_for_reshape = self.gather(self.embedding_table, flat_ids, 0)
|
||||
output = self.reshape(output_for_reshape, self.shape)
|
||||
return output, self.embedding_table
|
||||
|
||||
|
||||
class EmbeddingPostprocessor(nn.Cell):
|
||||
"""
|
||||
Postprocessors apply positional and token type embeddings to word embeddings.
|
||||
Args:
|
||||
embedding_size (int): The size of each embedding vector.
|
||||
embedding_shape (list): [batch_size, seq_length, embedding_size], the shape of
|
||||
each embedding vector.
|
||||
use_token_type (bool): Specifies whether to use token type embeddings. Default: False.
|
||||
token_type_vocab_size (int): Size of token type vocab. Default: 16.
|
||||
use_one_hot_embeddings (bool): Specifies whether to use one hot encoding form. Default: False.
|
||||
initializer_range (float): Initialization value of TruncatedNormal. Default: 0.02.
|
||||
max_position_embeddings (int): Maximum length of sequences used in this
|
||||
model. Default: 512.
|
||||
dropout_prob (float): The dropout probability. Default: 0.1.
|
||||
"""
|
||||
def __init__(self,
|
||||
embedding_size,
|
||||
embedding_shape,
|
||||
use_relative_positions=False,
|
||||
use_token_type=False,
|
||||
token_type_vocab_size=16,
|
||||
use_one_hot_embeddings=False,
|
||||
initializer_range=0.02,
|
||||
max_position_embeddings=512,
|
||||
dropout_prob=0.1):
|
||||
super(EmbeddingPostprocessor, self).__init__()
|
||||
self.use_token_type = use_token_type
|
||||
self.token_type_vocab_size = token_type_vocab_size
|
||||
self.use_one_hot_embeddings = use_one_hot_embeddings
|
||||
self.max_position_embeddings = max_position_embeddings
|
||||
self.token_type_embedding = nn.Embedding(
|
||||
vocab_size=token_type_vocab_size,
|
||||
embedding_size=embedding_size,
|
||||
use_one_hot=use_one_hot_embeddings)
|
||||
self.shape_flat = (-1,)
|
||||
self.one_hot = P.OneHot()
|
||||
self.on_value = Tensor(1.0, mstype.float32)
|
||||
self.off_value = Tensor(0.0, mstype.float32)
|
||||
self.array_mul = P.MatMul()
|
||||
self.reshape = P.Reshape()
|
||||
self.shape = tuple(embedding_shape)
|
||||
self.dropout = nn.Dropout(1 - dropout_prob)
|
||||
self.gather = P.Gather()
|
||||
self.use_relative_positions = use_relative_positions
|
||||
self.slice = P.StridedSlice()
|
||||
_, seq, _ = self.shape
|
||||
self.full_position_embedding = nn.Embedding(
|
||||
vocab_size=max_position_embeddings,
|
||||
embedding_size=embedding_size,
|
||||
use_one_hot=False)
|
||||
self.layernorm = nn.LayerNorm((embedding_size,))
|
||||
self.position_ids = Tensor(np.arange(seq).reshape(-1, seq).astype(np.int32))
|
||||
self.add = P.Add()
|
||||
|
||||
def construct(self, token_type_ids, word_embeddings):
|
||||
"""Postprocessors apply positional and token type embeddings to word embeddings."""
|
||||
output = word_embeddings
|
||||
if self.use_token_type:
|
||||
token_type_embeddings = self.token_type_embedding(token_type_ids)
|
||||
output = self.add(output, token_type_embeddings)
|
||||
if not self.use_relative_positions:
|
||||
position_embeddings = self.full_position_embedding(self.position_ids)
|
||||
output = self.add(output, position_embeddings)
|
||||
output = self.layernorm(output)
|
||||
output = self.dropout(output)
|
||||
return output
|
||||
|
||||
|
||||
class ErnieOutput(nn.Cell):
|
||||
"""
|
||||
Apply a linear computation to hidden status and a residual computation to input.
|
||||
Args:
|
||||
in_channels (int): Input channels.
|
||||
out_channels (int): Output channels.
|
||||
initializer_range (float): Initialization value of TruncatedNormal. Default: 0.02.
|
||||
dropout_prob (float): The dropout probability. Default: 0.1.
|
||||
compute_type (:class:`mindspore.dtype`): Compute type in ErnieTransformer. Default: mstype.float32.
|
||||
"""
|
||||
def __init__(self,
|
||||
in_channels,
|
||||
out_channels,
|
||||
initializer_range=0.02,
|
||||
dropout_prob=0.1,
|
||||
compute_type=mstype.float32):
|
||||
super(ErnieOutput, self).__init__()
|
||||
self.dense = nn.Dense(in_channels, out_channels,
|
||||
weight_init=TruncatedNormal(initializer_range)).to_float(compute_type)
|
||||
self.dropout = nn.Dropout(1 - dropout_prob)
|
||||
self.dropout_prob = dropout_prob
|
||||
self.add = P.Add()
|
||||
self.layernorm = nn.LayerNorm((out_channels,)).to_float(compute_type)
|
||||
self.cast = P.Cast()
|
||||
|
||||
def construct(self, hidden_status, input_tensor):
|
||||
output = self.dense(hidden_status)
|
||||
output = self.dropout(output)
|
||||
output = self.add(input_tensor, output)
|
||||
output = self.layernorm(output)
|
||||
return output
|
||||
|
||||
|
||||
class RelaPosMatrixGenerator(nn.Cell):
|
||||
"""
|
||||
Generates matrix of relative positions between inputs.
|
||||
Args:
|
||||
length (int): Length of one dim for the matrix to be generated.
|
||||
max_relative_position (int): Max value of relative position.
|
||||
"""
|
||||
def __init__(self, length, max_relative_position):
|
||||
super(RelaPosMatrixGenerator, self).__init__()
|
||||
self._length = length
|
||||
self._max_relative_position = max_relative_position
|
||||
self._min_relative_position = -max_relative_position
|
||||
self.range_length = -length + 1
|
||||
|
||||
self.tile = P.Tile()
|
||||
self.range_mat = P.Reshape()
|
||||
self.sub = P.Sub()
|
||||
self.expanddims = P.ExpandDims()
|
||||
self.cast = P.Cast()
|
||||
|
||||
def construct(self):
|
||||
"""Generates matrix of relative positions between inputs."""
|
||||
range_vec_row_out = self.cast(F.tuple_to_array(F.make_range(self._length)), mstype.int32)
|
||||
range_vec_col_out = self.range_mat(range_vec_row_out, (self._length, -1))
|
||||
tile_row_out = self.tile(range_vec_row_out, (self._length,))
|
||||
tile_col_out = self.tile(range_vec_col_out, (1, self._length))
|
||||
range_mat_out = self.range_mat(tile_row_out, (self._length, self._length))
|
||||
transpose_out = self.range_mat(tile_col_out, (self._length, self._length))
|
||||
distance_mat = self.sub(range_mat_out, transpose_out)
|
||||
|
||||
distance_mat_clipped = C.clip_by_value(distance_mat,
|
||||
self._min_relative_position,
|
||||
self._max_relative_position)
|
||||
|
||||
# Shift values to be >=0. Each integer still uniquely identifies a
|
||||
# relative position difference.
|
||||
final_mat = distance_mat_clipped + self._max_relative_position
|
||||
return final_mat
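# Illustrative example (not part of the original source): with length=3 and
# max_relative_position=2, distance_mat[i][j] = j - i, i.e.
#   [[ 0,  1,  2],
#    [-1,  0,  1],
#    [-2, -1,  0]]
# and after clipping and shifting by +2 the returned matrix is
#   [[2, 3, 4],
#    [1, 2, 3],
#    [0, 1, 2]]
# so every relative offset in [-2, 2] maps to a non-negative embedding index.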
|
||||
|
||||
|
||||
class RelaPosEmbeddingsGenerator(nn.Cell):
|
||||
"""
|
||||
Generates tensor of size [length, length, depth].
|
||||
Args:
|
||||
length (int): Length of one dim for the matrix to be generated.
|
||||
depth (int): Size of each attention head.
|
||||
max_relative_position (int): Maximum value of relative position.
|
||||
initializer_range (float): Initialization value of TruncatedNormal.
|
||||
use_one_hot_embeddings (bool): Specifies whether to use one hot encoding form. Default: False.
|
||||
"""
|
||||
def __init__(self,
|
||||
length,
|
||||
depth,
|
||||
max_relative_position,
|
||||
initializer_range,
|
||||
use_one_hot_embeddings=False):
|
||||
super(RelaPosEmbeddingsGenerator, self).__init__()
|
||||
self.depth = depth
|
||||
self.vocab_size = max_relative_position * 2 + 1
|
||||
self.use_one_hot_embeddings = use_one_hot_embeddings
|
||||
|
||||
self.embeddings_table = Parameter(
|
||||
initializer(TruncatedNormal(initializer_range),
|
||||
[self.vocab_size, self.depth]))
|
||||
|
||||
self.relative_positions_matrix = RelaPosMatrixGenerator(length=length,
|
||||
max_relative_position=max_relative_position)
|
||||
self.reshape = P.Reshape()
|
||||
self.one_hot = nn.OneHot(depth=self.vocab_size)
|
||||
self.shape = P.Shape()
|
||||
self.gather = P.Gather() # index_select
|
||||
self.matmul = P.BatchMatMul()
|
||||
|
||||
def construct(self):
|
||||
"""Generate embedding for each relative position of dimension depth."""
|
||||
relative_positions_matrix_out = self.relative_positions_matrix()
|
||||
|
||||
if self.use_one_hot_embeddings:
|
||||
flat_relative_positions_matrix = self.reshape(relative_positions_matrix_out, (-1,))
|
||||
one_hot_relative_positions_matrix = self.one_hot(
|
||||
flat_relative_positions_matrix)
|
||||
embeddings = self.matmul(one_hot_relative_positions_matrix, self.embeddings_table)
|
||||
my_shape = self.shape(relative_positions_matrix_out) + (self.depth,)
|
||||
embeddings = self.reshape(embeddings, my_shape)
|
||||
else:
|
||||
embeddings = self.gather(self.embeddings_table,
|
||||
relative_positions_matrix_out, 0)
|
||||
return embeddings
|
||||
|
||||
|
||||
class SaturateCast(nn.Cell):
|
||||
"""
|
||||
Performs a safe saturating cast. This operation applies proper clamping before casting to prevent
|
||||
the danger that the value will overflow or underflow.
|
||||
Args:
|
||||
src_type (:class:`mindspore.dtype`): The type of the elements of the input tensor. Default: mstype.float32.
|
||||
dst_type (:class:`mindspore.dtype`): The type of the elements of the output tensor. Default: mstype.float32.
|
||||
"""
|
||||
def __init__(self, src_type=mstype.float32, dst_type=mstype.float32):
|
||||
super(SaturateCast, self).__init__()
|
||||
np_type = mstype.dtype_to_nptype(dst_type)
|
||||
|
||||
self.tensor_min_type = float(np.finfo(np_type).min)
|
||||
self.tensor_max_type = float(np.finfo(np_type).max)
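# For example, with dst_type=mstype.float16 the values are clamped to roughly
# [-65504, 65504] (the finite float16 range) before the cast, so the result
# never overflows to inf.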
|
||||
|
||||
self.min_op = P.Minimum()
|
||||
self.max_op = P.Maximum()
|
||||
self.cast = P.Cast()
|
||||
self.dst_type = dst_type
|
||||
|
||||
def construct(self, x):
|
||||
out = self.max_op(x, self.tensor_min_type)
|
||||
out = self.min_op(out, self.tensor_max_type)
|
||||
return self.cast(out, self.dst_type)
|
||||
|
||||
|
||||
class ErnieAttention(nn.Cell):
|
||||
"""
|
||||
Apply multi-headed attention from "from_tensor" to "to_tensor".
|
||||
Args:
|
||||
from_tensor_width (int): Size of last dim of from_tensor.
|
||||
to_tensor_width (int): Size of last dim of to_tensor.
|
||||
from_seq_length (int): Length of from_tensor sequence.
|
||||
to_seq_length (int): Length of to_tensor sequence.
|
||||
num_attention_heads (int): Number of attention heads. Default: 1.
|
||||
size_per_head (int): Size of each attention head. Default: 512.
|
||||
query_act (str): Activation function for the query transform. Default: None.
|
||||
key_act (str): Activation function for the key transform. Default: None.
|
||||
value_act (str): Activation function for the value transform. Default: None.
|
||||
has_attention_mask (bool): Specifies whether to use attention mask. Default: False.
|
||||
attention_probs_dropout_prob (float): The dropout probability for
|
||||
ErnieAttention. Default: 0.0.
|
||||
use_one_hot_embeddings (bool): Specifies whether to use one hot encoding form. Default: False.
|
||||
initializer_range (float): Initialization value of TruncatedNormal. Default: 0.02.
|
||||
do_return_2d_tensor (bool): True to return a 2d tensor, False to return a 3d
|
||||
tensor. Default: False.
|
||||
use_relative_positions (bool): Specifies whether to use relative positions. Default: False.
|
||||
compute_type (:class:`mindspore.dtype`): Compute type in ErnieAttention. Default: mstype.float32.
|
||||
"""
|
||||
def __init__(self,
|
||||
from_tensor_width,
|
||||
to_tensor_width,
|
||||
from_seq_length,
|
||||
to_seq_length,
|
||||
num_attention_heads=1,
|
||||
size_per_head=512,
|
||||
query_act=None,
|
||||
key_act=None,
|
||||
value_act=None,
|
||||
has_attention_mask=False,
|
||||
attention_probs_dropout_prob=0.0,
|
||||
use_one_hot_embeddings=False,
|
||||
initializer_range=0.02,
|
||||
do_return_2d_tensor=False,
|
||||
use_relative_positions=False,
|
||||
compute_type=mstype.float32):
|
||||
|
||||
super(ErnieAttention, self).__init__()
|
||||
self.from_seq_length = from_seq_length
|
||||
self.to_seq_length = to_seq_length
|
||||
self.num_attention_heads = num_attention_heads
|
||||
self.size_per_head = size_per_head
|
||||
self.has_attention_mask = has_attention_mask
|
||||
self.use_relative_positions = use_relative_positions
|
||||
|
||||
self.scores_mul = 1.0 / math.sqrt(float(self.size_per_head))
|
||||
self.reshape = P.Reshape()
|
||||
self.shape_from_2d = (-1, from_tensor_width)
|
||||
self.shape_to_2d = (-1, to_tensor_width)
|
||||
weight = TruncatedNormal(initializer_range)
|
||||
units = num_attention_heads * size_per_head
|
||||
self.query_layer = nn.Dense(from_tensor_width,
|
||||
units,
|
||||
activation=query_act,
|
||||
weight_init=weight).to_float(compute_type)
|
||||
self.key_layer = nn.Dense(to_tensor_width,
|
||||
units,
|
||||
activation=key_act,
|
||||
weight_init=weight).to_float(compute_type)
|
||||
self.value_layer = nn.Dense(to_tensor_width,
|
||||
units,
|
||||
activation=value_act,
|
||||
weight_init=weight).to_float(compute_type)
|
||||
|
||||
self.shape_from = (-1, from_seq_length, num_attention_heads, size_per_head)
|
||||
self.shape_to = (-1, to_seq_length, num_attention_heads, size_per_head)
|
||||
|
||||
self.matmul_trans_b = P.BatchMatMul(transpose_b=True)
|
||||
self.multiply = P.Mul()
|
||||
self.transpose = P.Transpose()
|
||||
self.trans_shape = (0, 2, 1, 3)
|
||||
self.trans_shape_relative = (2, 0, 1, 3)
|
||||
self.trans_shape_position = (1, 2, 0, 3)
|
||||
self.multiply_data = -10000.0
|
||||
self.matmul = P.BatchMatMul()
|
||||
|
||||
self.softmax = nn.Softmax()
|
||||
self.dropout = nn.Dropout(1 - attention_probs_dropout_prob)
|
||||
|
||||
if self.has_attention_mask:
|
||||
self.expand_dims = P.ExpandDims()
|
||||
self.sub = P.Sub()
|
||||
self.add = P.Add()
|
||||
self.cast = P.Cast()
|
||||
self.get_dtype = P.DType()
|
||||
if do_return_2d_tensor:
|
||||
self.shape_return = (-1, num_attention_heads * size_per_head)
|
||||
else:
|
||||
self.shape_return = (-1, from_seq_length, num_attention_heads * size_per_head)
|
||||
|
||||
self.cast_compute_type = SaturateCast(dst_type=compute_type)
|
||||
if self.use_relative_positions:
|
||||
self._generate_relative_positions_embeddings = \
|
||||
RelaPosEmbeddingsGenerator(length=to_seq_length,
|
||||
depth=size_per_head,
|
||||
max_relative_position=16,
|
||||
initializer_range=initializer_range,
|
||||
use_one_hot_embeddings=use_one_hot_embeddings)
|
||||
|
||||
def construct(self, from_tensor, to_tensor, attention_mask):
|
||||
"""reshape 2d/3d input tensors to 2d"""
|
||||
from_tensor_2d = self.reshape(from_tensor, self.shape_from_2d)
|
||||
to_tensor_2d = self.reshape(to_tensor, self.shape_to_2d)
|
||||
query_out = self.query_layer(from_tensor_2d)
|
||||
key_out = self.key_layer(to_tensor_2d)
|
||||
value_out = self.value_layer(to_tensor_2d)
|
||||
|
||||
query_layer = self.reshape(query_out, self.shape_from)
|
||||
query_layer = self.transpose(query_layer, self.trans_shape)
|
||||
key_layer = self.reshape(key_out, self.shape_to)
|
||||
key_layer = self.transpose(key_layer, self.trans_shape)
|
||||
|
||||
attention_scores = self.matmul_trans_b(query_layer, key_layer)
|
||||
|
||||
# use_relative_position, supplementary logic
|
||||
if self.use_relative_positions:
|
||||
# relations_keys is [F|T, F|T, H]
|
||||
relations_keys = self._generate_relative_positions_embeddings()
|
||||
relations_keys = self.cast_compute_type(relations_keys)
|
||||
# query_layer_t is [F, B, N, H]
|
||||
query_layer_t = self.transpose(query_layer, self.trans_shape_relative)
|
||||
# query_layer_r is [F, B * N, H]
|
||||
query_layer_r = self.reshape(query_layer_t,
|
||||
(self.from_seq_length,
|
||||
-1,
|
||||
self.size_per_head))
|
||||
# key_position_scores is [F, B * N, F|T]
|
||||
key_position_scores = self.matmul_trans_b(query_layer_r,
|
||||
relations_keys)
|
||||
# key_position_scores_r is [F, B, N, F|T]
|
||||
key_position_scores_r = self.reshape(key_position_scores,
|
||||
(self.from_seq_length,
|
||||
-1,
|
||||
self.num_attention_heads,
|
||||
self.from_seq_length))
|
||||
# key_position_scores_r_t is [B, N, F, F|T]
|
||||
key_position_scores_r_t = self.transpose(key_position_scores_r,
|
||||
self.trans_shape_position)
|
||||
attention_scores = attention_scores + key_position_scores_r_t
|
||||
|
||||
attention_scores = self.multiply(self.scores_mul, attention_scores)
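# Convert the 0/1 input mask into an additive bias: masked (padding) positions
# receive -10000.0 so that their softmax weight is effectively zero.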
|
||||
|
||||
if self.has_attention_mask:
|
||||
attention_mask = self.expand_dims(attention_mask, 1)
|
||||
multiply_out = self.sub(self.cast(F.tuple_to_array((1.0,)), self.get_dtype(attention_scores)),
|
||||
self.cast(attention_mask, self.get_dtype(attention_scores)))
|
||||
|
||||
adder = self.multiply(multiply_out, self.multiply_data)
|
||||
attention_scores = self.add(adder, attention_scores)
|
||||
|
||||
attention_probs = self.softmax(attention_scores)
|
||||
attention_probs = self.dropout(attention_probs)
|
||||
|
||||
value_layer = self.reshape(value_out, self.shape_to)
|
||||
value_layer = self.transpose(value_layer, self.trans_shape)
|
||||
context_layer = self.matmul(attention_probs, value_layer)
|
||||
|
||||
# use_relative_position, supplementary logic
|
||||
if self.use_relative_positions:
|
||||
# relations_values is [F|T, F|T, H]
|
||||
relations_values = self._generate_relative_positions_embeddings()
|
||||
relations_values = self.cast_compute_type(relations_values)
|
||||
# attention_probs_t is [F, B, N, T]
|
||||
attention_probs_t = self.transpose(attention_probs, self.trans_shape_relative)
|
||||
# attention_probs_r is [F, B * N, T]
|
||||
attention_probs_r = self.reshape(
|
||||
attention_probs_t,
|
||||
(self.from_seq_length,
|
||||
-1,
|
||||
self.to_seq_length))
|
||||
# value_position_scores is [F, B * N, H]
|
||||
value_position_scores = self.matmul(attention_probs_r,
|
||||
relations_values)
|
||||
# value_position_scores_r is [F, B, N, H]
|
||||
value_position_scores_r = self.reshape(value_position_scores,
|
||||
(self.from_seq_length,
|
||||
-1,
|
||||
self.num_attention_heads,
|
||||
self.size_per_head))
|
||||
# value_position_scores_r_t is [B, N, F, H]
|
||||
value_position_scores_r_t = self.transpose(value_position_scores_r,
|
||||
self.trans_shape_position)
|
||||
context_layer = context_layer + value_position_scores_r_t
|
||||
|
||||
context_layer = self.transpose(context_layer, self.trans_shape)
|
||||
context_layer = self.reshape(context_layer, self.shape_return)
|
||||
|
||||
return context_layer
|
||||
|
||||
|
||||
class ErnieSelfAttention(nn.Cell):
|
||||
"""
|
||||
Apply self-attention.
|
||||
Args:
|
||||
seq_length (int): Length of input sequence.
|
||||
hidden_size (int): Size of the Ernie encoder layers.
|
||||
num_attention_heads (int): Number of attention heads. Default: 12.
|
||||
attention_probs_dropout_prob (float): The dropout probability for
|
||||
ErnieAttention. Default: 0.1.
|
||||
use_one_hot_embeddings (bool): Specifies whether to use one_hot encoding form. Default: False.
|
||||
initializer_range (float): Initialization value of TruncatedNormal. Default: 0.02.
|
||||
hidden_dropout_prob (float): The dropout probability for ErnieOutput. Default: 0.1.
|
||||
use_relative_positions (bool): Specifies whether to use relative positions. Default: False.
|
||||
compute_type (:class:`mindspore.dtype`): Compute type in ErnieSelfAttention. Default: mstype.float32.
|
||||
"""
|
||||
def __init__(self,
|
||||
seq_length,
|
||||
hidden_size,
|
||||
num_attention_heads=12,
|
||||
attention_probs_dropout_prob=0.1,
|
||||
use_one_hot_embeddings=False,
|
||||
initializer_range=0.02,
|
||||
hidden_dropout_prob=0.1,
|
||||
use_relative_positions=False,
|
||||
compute_type=mstype.float32):
|
||||
super(ErnieSelfAttention, self).__init__()
|
||||
if hidden_size % num_attention_heads != 0:
|
||||
raise ValueError("The hidden size (%d) is not a multiple of the number "
|
||||
"of attention heads (%d)" % (hidden_size, num_attention_heads))
|
||||
|
||||
self.size_per_head = int(hidden_size / num_attention_heads)
|
||||
|
||||
self.attention = ErnieAttention(
|
||||
from_tensor_width=hidden_size,
|
||||
to_tensor_width=hidden_size,
|
||||
from_seq_length=seq_length,
|
||||
to_seq_length=seq_length,
|
||||
num_attention_heads=num_attention_heads,
|
||||
size_per_head=self.size_per_head,
|
||||
attention_probs_dropout_prob=attention_probs_dropout_prob,
|
||||
use_one_hot_embeddings=use_one_hot_embeddings,
|
||||
initializer_range=initializer_range,
|
||||
use_relative_positions=use_relative_positions,
|
||||
has_attention_mask=True,
|
||||
do_return_2d_tensor=True,
|
||||
compute_type=compute_type)
|
||||
|
||||
self.output = ErnieOutput(in_channels=hidden_size,
|
||||
out_channels=hidden_size,
|
||||
initializer_range=initializer_range,
|
||||
dropout_prob=hidden_dropout_prob,
|
||||
compute_type=compute_type)
|
||||
self.reshape = P.Reshape()
|
||||
self.shape = (-1, hidden_size)
|
||||
|
||||
def construct(self, input_tensor, attention_mask):
|
||||
input_tensor = self.reshape(input_tensor, self.shape)
|
||||
attention_output = self.attention(input_tensor, input_tensor, attention_mask)
|
||||
output = self.output(attention_output, input_tensor)
|
||||
return output
|
||||
|
||||
|
||||
class ErnieEncoderCell(nn.Cell):
|
||||
"""
|
||||
Encoder cells used in ErnieTransformer.
|
||||
Args:
|
||||
hidden_size (int): Size of the Ernie encoder layers. Default: 768.
|
||||
seq_length (int): Length of input sequence. Default: 512.
|
||||
num_attention_heads (int): Number of attention heads. Default: 12.
|
||||
intermediate_size (int): Size of intermediate layer. Default: 3072.
|
||||
attention_probs_dropout_prob (float): The dropout probability for
|
||||
ErnieAttention. Default: 0.02.
|
||||
use_one_hot_embeddings (bool): Specifies whether to use one hot encoding form. Default: False.
|
||||
initializer_range (float): Initialization value of TruncatedNormal. Default: 0.02.
|
||||
hidden_dropout_prob (float): The dropout probability for ErnieOutput. Default: 0.1.
|
||||
use_relative_positions (bool): Specifies whether to use relative positions. Default: False.
|
||||
hidden_act (str): Activation function. Default: "gelu".
|
||||
compute_type (:class:`mindspore.dtype`): Compute type in attention. Default: mstype.float32.
|
||||
"""
|
||||
def __init__(self,
|
||||
hidden_size=768,
|
||||
seq_length=512,
|
||||
num_attention_heads=12,
|
||||
intermediate_size=3072,
|
||||
attention_probs_dropout_prob=0.02,
|
||||
use_one_hot_embeddings=False,
|
||||
initializer_range=0.02,
|
||||
hidden_dropout_prob=0.1,
|
||||
use_relative_positions=False,
|
||||
hidden_act="gelu",
|
||||
compute_type=mstype.float32):
|
||||
super(ErnieEncoderCell, self).__init__()
|
||||
self.attention = ErnieSelfAttention(
|
||||
hidden_size=hidden_size,
|
||||
seq_length=seq_length,
|
||||
num_attention_heads=num_attention_heads,
|
||||
attention_probs_dropout_prob=attention_probs_dropout_prob,
|
||||
use_one_hot_embeddings=use_one_hot_embeddings,
|
||||
initializer_range=initializer_range,
|
||||
hidden_dropout_prob=hidden_dropout_prob,
|
||||
use_relative_positions=use_relative_positions,
|
||||
compute_type=compute_type)
|
||||
self.intermediate = nn.Dense(in_channels=hidden_size,
|
||||
out_channels=intermediate_size,
|
||||
activation=hidden_act,
|
||||
weight_init=TruncatedNormal(initializer_range)).to_float(compute_type)
|
||||
self.output = ErnieOutput(in_channels=intermediate_size,
|
||||
out_channels=hidden_size,
|
||||
initializer_range=initializer_range,
|
||||
dropout_prob=hidden_dropout_prob,
|
||||
compute_type=compute_type)
|
||||
|
||||
def construct(self, hidden_states, attention_mask):
|
||||
# self-attention
|
||||
attention_output = self.attention(hidden_states, attention_mask)
|
||||
# feed forward
|
||||
intermediate_output = self.intermediate(attention_output)
|
||||
# add and normalize
|
||||
output = self.output(intermediate_output, attention_output)
|
||||
return output
|
||||
|
||||
|
||||
class ErnieTransformer(nn.Cell):
|
||||
"""
|
||||
Multi-layer Ernie transformer.
|
||||
Args:
|
||||
hidden_size (int): Size of the encoder layers.
|
||||
seq_length (int): Length of input sequence.
|
||||
num_hidden_layers (int): Number of hidden layers in encoder cells.
|
||||
num_attention_heads (int): Number of attention heads in encoder cells. Default: 12.
|
||||
intermediate_size (int): Size of intermediate layer in encoder cells. Default: 3072.
|
||||
attention_probs_dropout_prob (float): The dropout probability for
|
||||
ErnieAttention. Default: 0.1.
|
||||
use_one_hot_embeddings (bool): Specifies whether to use one hot encoding form. Default: False.
|
||||
initializer_range (float): Initialization value of TruncatedNormal. Default: 0.02.
|
||||
hidden_dropout_prob (float): The dropout probability for ErnieOutput. Default: 0.1.
|
||||
use_relative_positions (bool): Specifies whether to use relative positions. Default: False.
|
||||
hidden_act (str): Activation function used in the encoder cells. Default: "gelu".
|
||||
compute_type (:class:`mindspore.dtype`): Compute type in ErnieTransformer. Default: mstype.float32.
|
||||
return_all_encoders (bool): Specifies whether to return all encoders. Default: False.
|
||||
"""
|
||||
def __init__(self,
|
||||
hidden_size,
|
||||
seq_length,
|
||||
num_hidden_layers,
|
||||
num_attention_heads=12,
|
||||
intermediate_size=3072,
|
||||
attention_probs_dropout_prob=0.1,
|
||||
use_one_hot_embeddings=False,
|
||||
initializer_range=0.02,
|
||||
hidden_dropout_prob=0.1,
|
||||
use_relative_positions=False,
|
||||
hidden_act="gelu",
|
||||
compute_type=mstype.float32,
|
||||
return_all_encoders=False):
|
||||
super(ErnieTransformer, self).__init__()
|
||||
self.return_all_encoders = return_all_encoders
|
||||
|
||||
layers = []
|
||||
for _ in range(num_hidden_layers):
|
||||
layer = ErnieEncoderCell(hidden_size=hidden_size,
|
||||
seq_length=seq_length,
|
||||
num_attention_heads=num_attention_heads,
|
||||
intermediate_size=intermediate_size,
|
||||
attention_probs_dropout_prob=attention_probs_dropout_prob,
|
||||
use_one_hot_embeddings=use_one_hot_embeddings,
|
||||
initializer_range=initializer_range,
|
||||
hidden_dropout_prob=hidden_dropout_prob,
|
||||
use_relative_positions=use_relative_positions,
|
||||
hidden_act=hidden_act,
|
||||
compute_type=compute_type)
|
||||
layers.append(layer)
|
||||
|
||||
self.layers = nn.CellList(layers)
|
||||
|
||||
self.reshape = P.Reshape()
|
||||
self.shape = (-1, hidden_size)
|
||||
self.out_shape = (-1, seq_length, hidden_size)
|
||||
|
||||
def construct(self, input_tensor, attention_mask):
|
||||
"""Multi-layer Ernie transformer."""
|
||||
prev_output = self.reshape(input_tensor, self.shape)
|
||||
|
||||
all_encoder_layers = ()
|
||||
for layer_module in self.layers:
|
||||
layer_output = layer_module(prev_output, attention_mask)
|
||||
prev_output = layer_output
|
||||
|
||||
if self.return_all_encoders:
|
||||
layer_output = self.reshape(layer_output, self.out_shape)
|
||||
all_encoder_layers = all_encoder_layers + (layer_output,)
|
||||
|
||||
if not self.return_all_encoders:
|
||||
prev_output = self.reshape(prev_output, self.out_shape)
|
||||
all_encoder_layers = all_encoder_layers + (prev_output,)
|
||||
return all_encoder_layers
|
||||
|
||||
|
||||
class CreateAttentionMaskFromInputMask(nn.Cell):
|
||||
"""
|
||||
Create attention mask according to input mask.
|
||||
Args:
|
||||
config (Class): Configuration for ErnieModel.
|
||||
"""
|
||||
def __init__(self, config):
|
||||
super(CreateAttentionMaskFromInputMask, self).__init__()
|
||||
self.input_mask = None
|
||||
|
||||
self.cast = P.Cast()
|
||||
self.reshape = P.Reshape()
|
||||
self.shape = (-1, 1, config.seq_length)
|
||||
|
||||
def construct(self, input_mask):
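# Reshape the [batch_size, seq_length] 0/1 input mask to
# [batch_size, 1, seq_length] so it can be broadcast against the attention scores.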
|
||||
attention_mask = self.cast(self.reshape(input_mask, self.shape), mstype.float32)
|
||||
return attention_mask
|
||||
|
||||
|
||||
class ErnieModel(nn.Cell):
|
||||
"""
|
||||
Bidirectional Encoder Representations from Transformers.
|
||||
Args:
|
||||
config (Class): Configuration for ErnieModel.
|
||||
is_training (bool): True for training mode. False for eval mode.
|
||||
use_one_hot_embeddings (bool): Specifies whether to use one hot encoding form. Default: False.
|
||||
"""
|
||||
def __init__(self,
|
||||
config,
|
||||
is_training,
|
||||
use_one_hot_embeddings=False):
|
||||
super(ErnieModel, self).__init__()
|
||||
config = copy.deepcopy(config)
|
||||
if not is_training:
|
||||
config.hidden_dropout_prob = 0.0
|
||||
config.attention_probs_dropout_prob = 0.0
|
||||
|
||||
self.seq_length = config.seq_length
|
||||
self.hidden_size = config.hidden_size
|
||||
self.num_hidden_layers = config.num_hidden_layers
|
||||
self.embedding_size = config.hidden_size
|
||||
self.token_type_ids = None
|
||||
|
||||
self.last_idx = self.num_hidden_layers - 1
|
||||
output_embedding_shape = [-1, self.seq_length, self.embedding_size]
|
||||
|
||||
self.ernie_embedding_lookup = nn.Embedding(
|
||||
vocab_size=config.vocab_size,
|
||||
embedding_size=self.embedding_size,
|
||||
use_one_hot=use_one_hot_embeddings)
|
||||
|
||||
self.ernie_embedding_postprocessor = EmbeddingPostprocessor(
|
||||
embedding_size=self.embedding_size,
|
||||
embedding_shape=output_embedding_shape,
|
||||
use_relative_positions=config.use_relative_positions,
|
||||
use_token_type=True,
|
||||
token_type_vocab_size=config.type_vocab_size,
|
||||
use_one_hot_embeddings=use_one_hot_embeddings,
|
||||
initializer_range=0.02,
|
||||
max_position_embeddings=config.max_position_embeddings,
|
||||
dropout_prob=config.hidden_dropout_prob)
|
||||
|
||||
self.ernie_encoder = ErnieTransformer(
|
||||
hidden_size=self.hidden_size,
|
||||
seq_length=self.seq_length,
|
||||
num_attention_heads=config.num_attention_heads,
|
||||
num_hidden_layers=self.num_hidden_layers,
|
||||
intermediate_size=config.intermediate_size,
|
||||
attention_probs_dropout_prob=config.attention_probs_dropout_prob,
|
||||
use_one_hot_embeddings=use_one_hot_embeddings,
|
||||
initializer_range=config.initializer_range,
|
||||
hidden_dropout_prob=config.hidden_dropout_prob,
|
||||
use_relative_positions=config.use_relative_positions,
|
||||
hidden_act=config.hidden_act,
|
||||
compute_type=config.compute_type,
|
||||
return_all_encoders=True)
|
||||
|
||||
self.cast = P.Cast()
|
||||
self.dtype = config.dtype
|
||||
self.cast_compute_type = SaturateCast(dst_type=config.compute_type)
|
||||
self.slice = P.StridedSlice()
|
||||
|
||||
self.squeeze_1 = P.Squeeze(axis=1)
|
||||
self.dense = nn.Dense(self.hidden_size, self.hidden_size,
|
||||
activation="tanh",
|
||||
weight_init=TruncatedNormal(config.initializer_range)).to_float(config.compute_type)
|
||||
self._create_attention_mask_from_input_mask = CreateAttentionMaskFromInputMask(config)
|
||||
|
||||
def construct(self, input_ids, token_type_ids, input_mask):
|
||||
"""Bidirectional Encoder Representations from Transformers."""
|
||||
# embedding
|
||||
word_embeddings = self.ernie_embedding_lookup(input_ids)
|
||||
embedding_output = self.ernie_embedding_postprocessor(token_type_ids,
|
||||
word_embeddings)
|
||||
|
||||
# attention mask [batch_size, seq_length, seq_length]
|
||||
attention_mask = self._create_attention_mask_from_input_mask(input_mask)
|
||||
|
||||
# ernie encoder
|
||||
encoder_output = self.ernie_encoder(self.cast_compute_type(embedding_output),
|
||||
attention_mask)
|
||||
|
||||
sequence_output = self.cast(encoder_output[self.last_idx], self.dtype)
|
||||
|
||||
# pooler
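# Slice out the hidden state of the first ([CLS]) token and pass it through a
# dense + tanh layer; the result is the pooled sentence representation consumed
# by ErnieCLSModel.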
|
||||
batch_size = P.Shape()(input_ids)[0]
|
||||
sequence_slice = self.slice(sequence_output,
|
||||
(0, 0, 0),
|
||||
(batch_size, 1, self.hidden_size),
|
||||
(1, 1, 1))
|
||||
first_token = self.squeeze_1(sequence_slice)
|
||||
pooled_output = self.dense(first_token)
|
||||
pooled_output = self.cast(pooled_output, self.dtype)
|
||||
|
||||
return sequence_output, pooled_output
|
|
@ -0,0 +1,58 @@
|
|||
# Copyright 2021 Huawei Technologies Co., Ltd
|
||||
#
|
||||
# Licensed under the Apache License, Version 2.0 (the "License");
|
||||
# you may not use this file except in compliance with the License.
|
||||
# You may obtain a copy of the License at
|
||||
#
|
||||
# http://www.apache.org/licenses/LICENSE-2.0
|
||||
#
|
||||
# Unless required by applicable law or agreed to in writing, software
|
||||
# distributed under the License is distributed on an "AS IS" BASIS,
|
||||
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||
# See the License for the specific language governing permissions and
|
||||
# limitations under the License.
|
||||
# ============================================================================
|
||||
|
||||
"""
|
||||
config settings, will be used in finetune.py
|
||||
"""
|
||||
|
||||
from easydict import EasyDict as edict
|
||||
import mindspore.common.dtype as mstype
|
||||
from .ernie_model import ErnieConfig
|
||||
|
||||
optimizer_cfg = edict({
|
||||
'optimizer': 'AdamWeightDecay',
|
||||
'AdamWeightDecay': edict({
|
||||
'learning_rate': 2e-5,
|
||||
'end_learning_rate': 1e-7,
|
||||
'power': 1.0,
|
||||
'weight_decay': 1e-5,
|
||||
'decay_filter': lambda x: 'layernorm' not in x.name.lower() and 'bias' not in x.name.lower(),
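# Parameters whose names contain 'layernorm' or 'bias' are excluded from weight decay.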
|
||||
'eps': 1e-6,
|
||||
}),
|
||||
'Adam': edict({
|
||||
'learning_rate': 2e-5
|
||||
}),
|
||||
'Adagrad': edict({
|
||||
'learning_rate': 2e-5
|
||||
})
|
||||
})
|
||||
|
||||
ernie_net_cfg = ErnieConfig(
|
||||
seq_length=64,
|
||||
vocab_size=18000,
|
||||
hidden_size=768,
|
||||
num_hidden_layers=12,
|
||||
num_attention_heads=12,
|
||||
intermediate_size=3072,
|
||||
hidden_act="relu",
|
||||
hidden_dropout_prob=0.1,
|
||||
attention_probs_dropout_prob=0.1,
|
||||
max_position_embeddings=513,
|
||||
type_vocab_size=2,
|
||||
initializer_range=0.02,
|
||||
use_relative_positions=False,
|
||||
dtype=mstype.float32,
|
||||
compute_type=mstype.float16,
|
||||
)
|
|
@ -0,0 +1,57 @@
|
|||
# Copyright 2021 Huawei Technologies Co., Ltd
|
||||
#
|
||||
# Licensed under the Apache License, Version 2.0 (the "License");
|
||||
# you may not use this file except in compliance with the License.
|
||||
# You may obtain a copy of the License at
|
||||
#
|
||||
# http://www.apache.org/licenses/LICENSE-2.0
|
||||
#
|
||||
# Unless required by applicable law or agreed to in writing, software
|
||||
# distributed under the License is distributed on an "AS IS" BASIS,
|
||||
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||
# See the License for the specific language governing permissions and
|
||||
# limitations under the License.
|
||||
# ============================================================================
|
||||
|
||||
'''
|
||||
Ernie finetune and evaluation model script.
|
||||
'''
|
||||
|
||||
import mindspore.nn as nn
|
||||
from mindspore.common.initializer import TruncatedNormal
|
||||
from mindspore.ops import operations as P
|
||||
from .ernie_model import ErnieModel
|
||||
|
||||
class ErnieCLSModel(nn.Cell):
|
||||
"""
|
||||
This class is responsible for classification task evaluation, i.e. XNLI(num_labels=3),
|
||||
LCQMC(num_labels=2), Chnsenti(num_labels=2). The returned output represents the final
logits; the results of log_softmax are proportional to those of softmax.
|
||||
"""
|
||||
def __init__(self, config, is_training, num_labels=2, dropout_prob=0.0, use_one_hot_embeddings=False,
|
||||
assessment_method=""):
|
||||
super(ErnieCLSModel, self).__init__()
|
||||
if not is_training:
|
||||
config.hidden_dropout_prob = 0.0
|
||||
config.attention_probs_dropout_prob = 0.0
|
||||
self.ernie = ErnieModel(config, is_training, use_one_hot_embeddings)
|
||||
self.cast = P.Cast()
|
||||
self.weight_init = TruncatedNormal(config.initializer_range)
|
||||
self.log_softmax = P.LogSoftmax(axis=-1)
|
||||
self.dtype = config.dtype
|
||||
self.num_labels = num_labels
|
||||
self.dense_1 = nn.Dense(config.hidden_size, self.num_labels, weight_init=self.weight_init,
|
||||
has_bias=True).to_float(config.compute_type)
|
||||
self.dropout = nn.Dropout(1 - dropout_prob)
|
||||
self.assessment_method = assessment_method
|
||||
|
||||
def construct(self, input_ids, input_mask, token_type_id):
|
||||
_, pooled_output = \
|
||||
self.ernie(input_ids, token_type_id, input_mask)
|
||||
cls = self.cast(pooled_output, self.dtype)
|
||||
cls = self.dropout(cls)
|
||||
logits = self.dense_1(cls)
|
||||
logits = self.cast(logits, self.dtype)
|
||||
if self.assessment_method != "spearman_correlation":
|
||||
logits = self.log_softmax(logits)
|
||||
return logits
|
|
@ -0,0 +1,285 @@
|
|||
# Copyright 2021 Huawei Technologies Co., Ltd
|
||||
#
|
||||
# Licensed under the Apache License, Version 2.0 (the "License");
|
||||
# you may not use this file except in compliance with the License.
|
||||
# You may obtain a copy of the License at
|
||||
#
|
||||
# http://www.apache.org/licenses/LICENSE-2.0
|
||||
#
|
||||
# Unless required by applicable law or agreed to in writing, software
|
||||
# distributed under the License is distributed on an "AS IS" BASIS,
|
||||
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||
# See the License for the specific language governing permissions and
|
||||
# limitations under the License.
|
||||
# ============================================================================
|
||||
'''
|
||||
Dataset reader for preprocessing and converting dataset into MindRecord.
|
||||
'''
|
||||
|
||||
import io
|
||||
import argparse
|
||||
import collections
import json
|
||||
import six
|
||||
import numpy as np
|
||||
from mindspore.mindrecord import FileWriter
|
||||
from mindspore.log import logging
|
||||
from tokenizer import FullTokenizer
|
||||
|
||||
|
||||
def csv_reader(fd, delimiter='\t'):
|
||||
"""
|
||||
Reads a csv file (tab-separated by default).
|
||||
"""
|
||||
def gen():
|
||||
for i in fd:
|
||||
slots = i.rstrip('\n').split(delimiter)
|
||||
if len(slots) == 1:
|
||||
yield (slots,)
|
||||
else:
|
||||
yield slots
|
||||
return gen()
|
||||
|
||||
def convert_to_unicode(text):
|
||||
"""Converts `text` to Unicode (if it's not already), assuming utf-8 input."""
|
||||
if six.PY3:
|
||||
if isinstance(text, str):
|
||||
text = text
|
||||
elif isinstance(text, bytes):
|
||||
text = text.decode("utf-8", "ignore")
|
||||
else:
|
||||
raise ValueError("Unsupported string type: %s" % (type(text)))
|
||||
elif six.PY2:
|
||||
if isinstance(text, str):
|
||||
text = text.decode("utf-8", "ignore")
|
||||
elif isinstance(text, unicode):
|
||||
text = text
|
||||
else:
|
||||
raise ValueError("Unsupported string type: %s" % (type(text)))
|
||||
else:
|
||||
raise ValueError("Not running on Python2 or Python 3?")
|
||||
return text
|
||||
|
||||
class BaseReader:
|
||||
"""BaseReader for classify and sequence labeling task"""
|
||||
|
||||
def __init__(self,
|
||||
vocab_path,
|
||||
label_map_config=None,
|
||||
max_seq_len=512,
|
||||
do_lower_case=True,
|
||||
in_tokens=False,
|
||||
random_seed=None):
|
||||
self.max_seq_len = max_seq_len
|
||||
self.tokenizer = FullTokenizer(
|
||||
vocab_file=vocab_path, do_lower_case=do_lower_case)
|
||||
self.vocab = self.tokenizer.vocab
|
||||
self.pad_id = self.vocab["[PAD]"]
|
||||
self.cls_id = self.vocab["[CLS]"]
|
||||
self.sep_id = self.vocab["[SEP]"]
|
||||
self.in_tokens = in_tokens
|
||||
|
||||
np.random.seed(random_seed)
|
||||
|
||||
self.current_example = 0
|
||||
self.current_epoch = 0
|
||||
self.num_examples = 0
|
||||
|
||||
if label_map_config:
|
||||
with open(label_map_config) as f:
|
||||
self.label_map = json.load(f)
|
||||
else:
|
||||
self.label_map = None
|
||||
|
||||
def _read_tsv(self, input_file, quotechar=None):
|
||||
"""Reads a tab separated value file."""
|
||||
with io.open(input_file, "r", encoding="utf8") as f:
|
||||
reader = csv_reader(f, delimiter="\t")
|
||||
headers = next(reader)
|
||||
Example = collections.namedtuple('Example', headers)
|
||||
|
||||
examples = []
|
||||
for line in reader:
|
||||
example = Example(*line)
|
||||
examples.append(example)
|
||||
return examples
|
||||
|
||||
def _truncate_seq_pair(self, tokens_a, tokens_b, max_length):
|
||||
"""Truncates a sequence pair in place to the maximum length."""
|
||||
|
||||
# This is a simple heuristic which will always truncate the longer sequence
|
||||
# one token at a time. This makes more sense than truncating an equal percent
|
||||
# of tokens from each, since if one sequence is very short then each token
|
||||
# that's truncated likely contains more information than a longer sequence.
|
||||
while True:
|
||||
total_length = len(tokens_a) + len(tokens_b)
|
||||
if total_length <= max_length:
|
||||
break
|
||||
if len(tokens_a) > len(tokens_b):
|
||||
tokens_a.pop()
|
||||
else:
|
||||
tokens_b.pop()
|
||||
|
||||
def _convert_example_to_record(self, example, max_seq_length, tokenizer):
|
||||
"""Converts a single `Example` into a single `Record`."""
|
||||
|
||||
text_a = convert_to_unicode(example.text_a)
|
||||
tokens_a = tokenizer.tokenize(text_a)
|
||||
tokens_b = None
|
||||
if "text_b" in example._fields:
|
||||
text_b = convert_to_unicode(example.text_b)
|
||||
tokens_b = tokenizer.tokenize(text_b)
|
||||
|
||||
if tokens_b:
|
||||
# Modifies `tokens_a` and `tokens_b` in place so that the total
|
||||
# length is less than the specified length.
|
||||
# Account for [CLS], [SEP], [SEP] with "- 3"
|
||||
self._truncate_seq_pair(tokens_a, tokens_b, max_seq_length - 3)
|
||||
else:
|
||||
# Account for [CLS] and [SEP] with "- 2"
|
||||
if len(tokens_a) > max_seq_length - 2:
|
||||
tokens_a = tokens_a[0:(max_seq_length - 2)]
|
||||
|
||||
# The convention in BERT/ERNIE is:
|
||||
# (a) For sequence pairs:
|
||||
# tokens: [CLS] is this jack ##son ##ville ? [SEP] no it is not . [SEP]
|
||||
# type_ids: 0 0 0 0 0 0 0 0 1 1 1 1 1 1
|
||||
# (b) For single sequences:
|
||||
# tokens: [CLS] the dog is hairy . [SEP]
|
||||
# type_ids: 0 0 0 0 0 0 0
|
||||
#
|
||||
# Where "type_ids" are used to indicate whether this is the first
|
||||
# sequence or the second sequence. The embedding vectors for `type=0` and
|
||||
# `type=1` were learned during pre-training and are added to the wordpiece
|
||||
# embedding vector (and position vector). This is not *strictly* necessary
|
||||
# since the [SEP] token unambiguously separates the sequences, but it makes
|
||||
# it easier for the model to learn the concept of sequences.
|
||||
#
|
||||
# For classification tasks, the first vector (corresponding to [CLS]) is
|
||||
# used as the "sentence vector". Note that this only makes sense because
|
||||
# the entire model is fine-tuned.
|
||||
tokens = []
|
||||
segment_ids = []
|
||||
tokens.append("[CLS]")
|
||||
segment_ids.append(0)
|
||||
for token in tokens_a:
|
||||
tokens.append(token)
|
||||
segment_ids.append(0)
|
||||
tokens.append("[SEP]")
|
||||
segment_ids.append(0)
|
||||
|
||||
if tokens_b:
|
||||
for token in tokens_b:
|
||||
tokens.append(token)
|
||||
segment_ids.append(1)
|
||||
tokens.append("[SEP]")
|
||||
segment_ids.append(1)
|
||||
|
||||
input_ids = tokenizer.convert_tokens_to_ids(tokens)
|
||||
|
||||
input_mask = [1] * len(input_ids)
|
||||
|
||||
while len(input_ids) < max_seq_length:
|
||||
input_ids.append(0)
|
||||
input_mask.append(0)
|
||||
segment_ids.append(0)
|
||||
|
||||
if self.label_map:
|
||||
label_id = self.label_map[example.label]
|
||||
else:
|
||||
label_id = example.label
|
||||
|
||||
Record = collections.namedtuple(
|
||||
'Record',
|
||||
['input_ids', 'input_mask', 'segment_ids', 'label_id'])
|
||||
|
||||
record = Record(
|
||||
input_ids=input_ids,
|
||||
input_mask=input_mask,
|
||||
segment_ids=segment_ids,
|
||||
label_id=label_id)
|
||||
return record
|
||||
|
||||
def get_num_examples(self, input_file):
|
||||
"""return total number of examples"""
|
||||
examples = self._read_tsv(input_file)
|
||||
return len(examples)
|
||||
|
||||
def get_examples(self, input_file):
|
||||
examples = self._read_tsv(input_file)
|
||||
return examples
|
||||
|
||||
def file_based_convert_examples_to_features(self, input_file, output_file):
|
||||
""""Convert a set of `InputExample`s to a MindDataset file."""
|
||||
examples = self._read_tsv(input_file)
|
||||
|
||||
writer = FileWriter(file_name=output_file, shard_num=1)
|
||||
nlp_schema = {
|
||||
"input_ids": {"type": "int64", "shape": [-1]},
|
||||
"input_mask": {"type": "int64", "shape": [-1]},
|
||||
"segment_ids": {"type": "int64", "shape": [-1]},
|
||||
"label_ids": {"type": "int64", "shape": [-1]},
|
||||
}
|
||||
writer.add_schema(nlp_schema, "preprocessed classification dataset")
|
||||
data = []
|
||||
for index, example in enumerate(examples):
|
||||
if index % 10000 == 0:
|
||||
logging.info("Writing example %d of %d" % (index, len(examples)))
|
||||
record = self._convert_example_to_record(example, self.max_seq_len, self.tokenizer)
|
||||
sample = {
|
||||
"input_ids": np.array(record.input_ids, dtype=np.int64),
|
||||
"input_mask": np.array(record.input_mask, dtype=np.int64),
|
||||
"segment_ids": np.array(record.segment_ids, dtype=np.int64),
|
||||
"label_ids": np.array([record.label_id], dtype=np.int64),
|
||||
}
|
||||
data.append(sample)
|
||||
writer.write_raw_data(data)
|
||||
writer.commit()
|
||||
|
||||
class ClassifyReader(BaseReader):
|
||||
"""ClassifyReader"""
|
||||
|
||||
def _read_tsv(self, input_file, quotechar=None):
|
||||
"""Reads a tab separated value file."""
|
||||
with io.open(input_file, "r", encoding="utf8") as f:
|
||||
reader = csv_reader(f, delimiter="\t")
|
||||
headers = next(reader)
|
||||
text_indices = [
|
||||
index for index, h in enumerate(headers) if h != "label"
|
||||
]
|
||||
Example = collections.namedtuple('Example', headers)
|
||||
|
||||
examples = []
|
||||
for line in reader:
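# Remove the word-segmentation spaces from the text columns (the label column
# is left untouched) so the tokenizer sees the raw Chinese text.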
|
||||
for index, text in enumerate(line):
|
||||
if index in text_indices:
|
||||
line[index] = text.replace(' ', '')
|
||||
example = Example(*line)
|
||||
examples.append(example)
|
||||
return examples
|
||||
|
||||
def main():
|
||||
parser = argparse.ArgumentParser(description="read dataset and save it to minddata")
|
||||
parser.add_argument("--vocab_path", type=str, default="", help="vocab file")
|
||||
parser.add_argument("--label_map_config", type=str, default=None, help="label mapping config file")
|
||||
parser.add_argument("--max_seq_len", type=int, default=128,
|
||||
help="The maximum total input sequence length after WordPiece tokenization. "
|
||||
"Sequences longer than this will be truncated, and sequences shorter "
|
||||
"than this will be padded.")
|
||||
parser.add_argument("--do_lower_case", type=bool, default=True,
|
||||
help="Whether to lower case the input text. "
|
||||
"Should be True for uncased models and False for cased models.")
|
||||
parser.add_argument("--random_seed", type=int, default=0, help="random seed number")
|
||||
parser.add_argument("--input_file", type=str, default="", help="raw data file")
|
||||
parser.add_argument("--output_file", type=str, default="", help="minddata file")
|
||||
args_opt = parser.parse_args()
|
||||
reader = ClassifyReader(
|
||||
vocab_path=args_opt.vocab_path,
|
||||
label_map_config=args_opt.label_map_config,
|
||||
max_seq_len=args_opt.max_seq_len,
|
||||
do_lower_case=args_opt.do_lower_case,
|
||||
random_seed=args_opt.random_seed
|
||||
)
|
||||
reader.file_based_convert_examples_to_features(input_file=args_opt.input_file, output_file=args_opt.output_file)
|
||||
|
||||
if __name__ == "__main__":
|
||||
main()
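# Illustrative invocation (paths are placeholders, not part of the original source):
#   python reader.py --vocab_path=./data/vocab.txt --input_file=./data/train.tsv \
#       --output_file=./data/train.mindrecord --max_seq_len=64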
|
|
@ -0,0 +1,305 @@
|
|||
# Copyright 2021 Huawei Technologies Co., Ltd
|
||||
#
|
||||
# Licensed under the Apache License, Version 2.0 (the "License");
|
||||
# you may not use this file except in compliance with the License.
|
||||
# You may obtain a copy of the License at
|
||||
#
|
||||
# http://www.apache.org/licenses/LICENSE-2.0
|
||||
#
|
||||
# Unless required by applicable law or agreed to in writing, software
|
||||
# distributed under the License is distributed on an "AS IS" BASIS,
|
||||
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||
# See the License for the specific language governing permissions and
|
||||
# limitations under the License.
|
||||
# ============================================================================
|
||||
'''
|
||||
Tokenizer for ERNIE
|
||||
'''
|
||||
|
||||
import io
|
||||
import collections
|
||||
import unicodedata
|
||||
import six
|
||||
|
||||
class FullTokenizer:
|
||||
"""Runs end-to-end tokenziation."""
|
||||
|
||||
def __init__(self, vocab_file, do_lower_case=True):
|
||||
self.vocab = load_vocab(vocab_file)
|
||||
self.inv_vocab = {v: k for k, v in self.vocab.items()}
|
||||
self.basic_tokenizer = BasicTokenizer(do_lower_case=do_lower_case)
|
||||
self.wordpiece_tokenizer = WordpieceTokenizer(vocab=self.vocab)
|
||||
|
||||
def tokenize(self, text):
|
||||
split_tokens = []
|
||||
for token in self.basic_tokenizer.tokenize(text):
|
||||
for sub_token in self.wordpiece_tokenizer.tokenize(token):
|
||||
split_tokens.append(sub_token)
|
||||
|
||||
return split_tokens
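# Illustrative usage (not part of the original source; the exact output depends
# on the vocabulary file that is supplied):
#   tokenizer = FullTokenizer(vocab_file="vocab.txt", do_lower_case=True)
#   tokens = tokenizer.tokenize("我 很 高兴")        # word pieces for the input text
#   ids = tokenizer.convert_tokens_to_ids(tokens)   # vocabulary indices for the pieces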
|
||||
|
||||
def convert_tokens_to_ids(self, tokens):
|
||||
return convert_by_vocab(self.vocab, tokens)
|
||||
|
||||
class BasicTokenizer:
|
||||
"""Runs basic tokenization (punctuation splitting, lower casing, etc.)."""
|
||||
|
||||
def __init__(self, do_lower_case=True):
|
||||
"""Constructs a BasicTokenizer.
|
||||
Args:
|
||||
do_lower_case: Whether to lower case the input.
|
||||
"""
|
||||
self.do_lower_case = do_lower_case
|
||||
|
||||
def tokenize(self, text):
|
||||
"""Tokenizes a piece of text."""
|
||||
text = convert_to_unicode(text)
|
||||
text = self._clean_text(text)
|
||||
|
||||
# This was added on November 1st, 2018 for the multilingual and Chinese
|
||||
# models. This is also applied to the English models now, but it doesn't
|
||||
# matter since the English models were not trained on any Chinese data
|
||||
# and generally don't have any Chinese data in them (there are Chinese
|
||||
# characters in the vocabulary because Wikipedia does have some Chinese
|
||||
# words in the English Wikipedia.).
|
||||
text = self._tokenize_chinese_chars(text)
|
||||
|
||||
orig_tokens = whitespace_tokenize(text)
|
||||
split_tokens = []
|
||||
for token in orig_tokens:
|
||||
if self.do_lower_case:
|
||||
token = token.lower()
|
||||
token = self._run_strip_accents(token)
|
||||
split_tokens.extend(self._run_split_on_punc(token))
|
||||
|
||||
output_tokens = whitespace_tokenize(" ".join(split_tokens))
|
||||
return output_tokens
|
||||
|
||||
def _run_strip_accents(self, text):
|
||||
"""Strips accents from a piece of text."""
|
||||
text = unicodedata.normalize("NFD", text)
|
||||
output = []
|
||||
for char in text:
|
||||
cat = unicodedata.category(char)
|
||||
if cat == "Mn":
|
||||
continue
|
||||
output.append(char)
|
||||
return "".join(output)
|
||||
|
||||
def _run_split_on_punc(self, text):
|
||||
"""Splits punctuation on a piece of text."""
|
||||
chars = list(text)
|
||||
i = 0
|
||||
start_new_word = True
|
||||
output = []
|
||||
while i < len(chars):
|
||||
char = chars[i]
|
||||
if _is_punctuation(char):
|
||||
output.append([char])
|
||||
start_new_word = True
|
||||
else:
|
||||
if start_new_word:
|
||||
output.append([])
|
||||
start_new_word = False
|
||||
output[-1].append(char)
|
||||
i += 1
|
||||
|
||||
return ["".join(x) for x in output]
|
||||
|
||||
def _tokenize_chinese_chars(self, text):
|
||||
"""Adds whitespace around any CJK character."""
|
||||
output = []
|
||||
for char in text:
|
||||
cp = ord(char)
|
||||
if self._is_chinese_char(cp):
|
||||
output.append(" ")
|
||||
output.append(char)
|
||||
output.append(" ")
|
||||
else:
|
||||
output.append(char)
|
||||
return "".join(output)
|
||||
|
||||
def _is_chinese_char(self, cp):
|
||||
"""Checks whether CP is the codepoint of a CJK character."""
|
||||
# This defines a "chinese character" as anything in the CJK Unicode block:
|
||||
# https://en.wikipedia.org/wiki/CJK_Unified_Ideographs_(Unicode_block)
|
||||
#
|
||||
# Note that the CJK Unicode block is NOT all Japanese and Korean characters,
|
||||
# despite its name. The modern Korean Hangul alphabet is a different block,
|
||||
# as is Japanese Hiragana and Katakana. Those alphabets are used to write
|
||||
# space-separated words, so they are not treated specially and handled
|
||||
# like all of the other languages.
|
||||
if ((0x4E00 <= cp <= 0x9FFF) or
|
||||
(0x3400 <= cp <= 0x4DBF) or
|
||||
(0x20000 <= cp <= 0x2A6DF) or
|
||||
(0x2A700 <= cp <= 0x2B73F) or
|
||||
(0x2B740 <= cp <= 0x2B81F) or
|
||||
(0x2B820 <= cp <= 0x2CEAF) or
|
||||
(0xF900 <= cp <= 0xFAFF) or
|
||||
(0x2F800 <= cp <= 0x2FA1F)):
|
||||
return True
|
||||
|
||||
return False
|
||||
|
||||
def _clean_text(self, text):
|
||||
"""Performs invalid character removal and whitespace cleanup on text."""
|
||||
output = []
|
||||
for char in text:
|
||||
cp = ord(char)
|
||||
if cp == 0 or cp == 0xfffd or _is_control(char):
|
||||
continue
|
||||
if _is_whitespace(char):
|
||||
output.append(" ")
|
||||
else:
|
||||
output.append(char)
|
||||
return "".join(output)
|
||||
|
||||
class WordpieceTokenizer:
|
||||
"""Runs WordPiece tokenziation."""
|
||||
|
||||
def __init__(self, vocab, unk_token="[UNK]", max_input_chars_per_word=100):
|
||||
self.vocab = vocab
|
||||
self.unk_token = unk_token
|
||||
self.max_input_chars_per_word = max_input_chars_per_word
|
||||
|
||||
def tokenize(self, text):
|
||||
"""Tokenizes a piece of text into its word pieces.
|
||||
This uses a greedy longest-match-first algorithm to perform tokenization
|
||||
using the given vocabulary.
|
||||
For example:
|
||||
input = "unaffable"
|
||||
output = ["un", "##aff", "##able"]
|
||||
Args:
|
||||
text: A single token or whitespace separated tokens. This should have
|
||||
already been passed through `BasicTokenizer`.
|
||||
Returns:
|
||||
A list of wordpiece tokens.
|
||||
"""
|
||||
|
||||
text = convert_to_unicode(text)
|
||||
|
||||
output_tokens = []
|
||||
for token in whitespace_tokenize(text):
|
||||
chars = list(token)
|
||||
if len(chars) > self.max_input_chars_per_word:
|
||||
output_tokens.append(self.unk_token)
|
||||
continue
|
||||
|
||||
is_bad = False
|
||||
start = 0
|
||||
sub_tokens = []
|
||||
while start < len(chars):
|
||||
end = len(chars)
|
||||
cur_substr = None
|
||||
while start < end:
|
||||
substr = "".join(chars[start:end])
|
||||
if start > 0:
|
||||
substr = "##" + substr
|
||||
if substr in self.vocab:
|
||||
cur_substr = substr
|
||||
break
|
||||
end -= 1
|
||||
if cur_substr is None:
|
||||
is_bad = True
|
||||
break
|
||||
sub_tokens.append(cur_substr)
|
||||
start = end
|
||||
|
||||
if is_bad:
|
||||
output_tokens.append(self.unk_token)
|
||||
else:
|
||||
output_tokens.extend(sub_tokens)
|
||||
return output_tokens
|
||||
|
||||
def convert_to_unicode(text):
|
||||
"""Converts `text` to Unicode (if it's not already), assuming utf-8 input."""
|
||||
if six.PY3:
|
||||
if isinstance(text, str):
|
||||
text = text
|
||||
elif isinstance(text, bytes):
|
||||
text = text.decode("utf-8", "ignore")
|
||||
else:
|
||||
raise ValueError("Unsupported string type: %s" % (type(text)))
|
||||
elif six.PY2:
|
||||
if isinstance(text, str):
|
||||
text = text.decode("utf-8", "ignore")
|
||||
elif isinstance(text, unicode):
|
||||
text = text
|
||||
else:
|
||||
raise ValueError("Unsupported string type: %s" % (type(text)))
|
||||
else:
|
||||
raise ValueError("Not running on Python2 or Python 3?")
|
||||
return text
|
||||
|
||||
def load_vocab(vocab_file):
|
||||
"""Loads a vocabulary file into a dictionary."""
|
||||
vocab = collections.OrderedDict()
|
||||
fin = io.open(vocab_file, encoding="utf8")
|
||||
for num, line in enumerate(fin):
|
||||
items = convert_to_unicode(line.strip()).split("\t")
|
||||
if len(items) > 2:
|
||||
break
|
||||
token = items[0]
|
||||
index = items[1] if len(items) == 2 else num
|
||||
token = token.strip()
|
||||
vocab[token] = int(index)
|
||||
return vocab
|
||||
|
||||
|
||||
def convert_by_vocab(vocab, items):
|
||||
"""Converts a sequence of [tokens|ids] using the vocab."""
|
||||
output = []
|
||||
for item in items:
|
||||
output.append(vocab[item])
|
||||
return output
|
||||
|
||||
|
||||
def whitespace_tokenize(text):
|
||||
"""Runs basic whitespace cleaning and splitting on a piece of text."""
|
||||
text = text.strip()
|
||||
if not text:
|
||||
return []
|
||||
tokens = text.split()
|
||||
return tokens
|
||||
|
||||
|
||||
def _is_whitespace(char):
|
||||
"""Checks whether `chars` is a whitespace character."""
|
||||
# \t, \n, and \r are technically control characters but we treat them
|
||||
# as whitespace since they are generally considered as such.
|
||||
if char in (" ", "\t", "\n", "\r"):
|
||||
return True
|
||||
cat = unicodedata.category(char)
|
||||
if cat == "Zs":
|
||||
return True
|
||||
return False
|
||||
|
||||
|
||||
def _is_control(char):
|
||||
"""Checks whether `chars` is a control character."""
|
||||
# These are technically control characters but we count them as whitespace
|
||||
# characters.
|
||||
if char in ("\t", "\n", "\r"):
|
||||
return False
|
||||
cat = unicodedata.category(char)
|
||||
if cat.startswith("C"):
|
||||
return True
|
||||
return False
|
||||
|
||||
|
||||
def _is_punctuation(char):
    """Checks whether `char` is a punctuation character."""
    cp = ord(char)
    # We treat all non-letter/number ASCII as punctuation.
    # Characters such as "^", "$", and "`" are not in the Unicode
    # Punctuation class but we treat them as punctuation anyways, for
    # consistency.
    if ((33 <= cp <= 47) or
            (58 <= cp <= 64) or
            (91 <= cp <= 96) or
            (123 <= cp <= 126)):
        return True
    cat = unicodedata.category(char)
    if cat.startswith("P"):
        return True
    return False
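The greedy longest-match-first loop above is easiest to see on a tiny vocabulary. The sketch below is illustrative only: the `WordpieceTokenizer` class name and constructor arguments are assumed to follow the usual BERT-style signature and may differ slightly in `src/tokenizer.py`.

```python
# Minimal sketch; the WordpieceTokenizer signature is an assumption, not taken from this file.
from src.tokenizer import WordpieceTokenizer, convert_by_vocab

vocab = {"[UNK]": 0, "un": 1, "##aff": 2, "##able": 3}
tokenizer = WordpieceTokenizer(vocab=vocab, unk_token="[UNK]")

print(tokenizer.tokenize("unaffable"))    # ['un', '##aff', '##able']
print(tokenizer.tokenize("unknownword"))  # ['[UNK]'] -- no suffix piece matches
print(convert_by_vocab(vocab, ["un", "##aff", "##able"]))  # [1, 2, 3]
```
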
@@ -0,0 +1,230 @@
# Copyright 2021 Huawei Technologies Co., Ltd
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ============================================================================

"""
Functional Cells used in Ernie finetune and evaluation.
"""

import os
import math
import collections
import numpy as np
import mindspore.nn as nn
from mindspore import log as logger
from mindspore.ops import operations as P
from mindspore.common.tensor import Tensor
from mindspore.common import dtype as mstype
from mindspore.train.callback import Callback
from mindspore.nn.learning_rate_schedule import LearningRateSchedule, PolynomialDecayLR, WarmUpLR

class CrossEntropyCalculation(nn.Cell):
    """
    Cross Entropy loss
    """
    def __init__(self, is_training=True):
        super(CrossEntropyCalculation, self).__init__()
        self.onehot = P.OneHot()
        self.on_value = Tensor(1.0, mstype.float32)
        self.off_value = Tensor(0.0, mstype.float32)
        self.reduce_sum = P.ReduceSum()
        self.reduce_mean = P.ReduceMean()
        self.reshape = P.Reshape()
        self.last_idx = (-1,)
        self.neg = P.Neg()
        self.cast = P.Cast()
        self.is_training = is_training

    def construct(self, logits, label_ids, num_labels):
        if self.is_training:
            # Loss is the negative sum of one-hot labels times the (log-probability)
            # logits, averaged over the batch.
            label_ids = self.reshape(label_ids, self.last_idx)
            one_hot_labels = self.onehot(label_ids, num_labels, self.on_value, self.off_value)
            per_example_loss = self.neg(self.reduce_sum(one_hot_labels * logits, self.last_idx))
            loss = self.reduce_mean(per_example_loss, self.last_idx)
            return_value = self.cast(loss, mstype.float32)
        else:
            # In evaluation mode the logits are passed through unchanged.
            return_value = logits * 1.0
        return return_value

def make_directory(path: str):
    """Make directory."""
    if path is None or not isinstance(path, str) or path.strip() == "":
        logger.error("The path(%r) is invalid type.", path)
        raise TypeError("Input path is invalid type")

    # convert relative paths to absolute paths
    path = os.path.realpath(path)
    logger.debug("The abs path is %r", path)

    # check whether the path exists and is writable
    if os.path.exists(path):
        real_path = path
    else:
        # All exceptions need to be caught because creating the directory may fail
        # for reasons such as missing permissions.
        logger.debug("The directory(%s) doesn't exist, will create it", path)
        try:
            os.makedirs(path, exist_ok=True)
            real_path = path
        except PermissionError as e:
            logger.error("No write permission on the directory(%r), error = %r", path, e)
            raise TypeError("No write permission on the directory.")
    return real_path

class LossCallBack(Callback):
    """
    Monitor the loss in training.

    Args:
        dataset_size (int): Number of steps in one epoch; used to report the
            progress within the current epoch. Default: -1 (progress not reported).
    """
    def __init__(self, dataset_size=-1):
        super(LossCallBack, self).__init__()
        self._dataset_size = dataset_size

    def step_end(self, run_context):
        """
        Print loss after each step
        """
        cb_params = run_context.original_args()
        if self._dataset_size > 0:
            percent, epoch_num = math.modf(cb_params.cur_step_num / self._dataset_size)
            if percent == 0:
                percent = 1
                epoch_num -= 1
            print("epoch: {}, current epoch percent: {}, step: {}, outputs are {}"
                  .format(int(epoch_num), "%.3f" % percent, cb_params.cur_step_num, str(cb_params.net_outputs)),
                  flush=True)
        else:
            print("epoch: {}, step: {}, outputs are {}".format(cb_params.cur_epoch_num, cb_params.cur_step_num,
                                                               str(cb_params.net_outputs)), flush=True)

def LoadNewestCkpt(load_finetune_checkpoint_dir, steps_per_epoch, epoch_num, prefix):
    """
    Find the checkpoint generated by finetuning and load it into the evaluation network.
    """
    files = os.listdir(load_finetune_checkpoint_dir)
    pre_len = len(prefix)
    max_num = 0
    for filename in files:
        name_ext = os.path.splitext(filename)
        if name_ext[-1] != ".ckpt":
            continue
        if filename.find(prefix) == 0 and not filename[pre_len].isalpha():
            index = filename[pre_len:].find("-")
            if index == 0 and max_num == 0:
                load_finetune_checkpoint_path = os.path.join(load_finetune_checkpoint_dir, filename)
            elif index not in (0, -1):
                name_split = name_ext[-2].split('_')
                if (steps_per_epoch != int(name_split[len(name_split)-1])) \
                        or (epoch_num != int(filename[pre_len + index + 1:pre_len + index + 2])):
                    continue
                num = filename[pre_len + 1:pre_len + index]
                if int(num) > max_num:
                    max_num = int(num)
                    load_finetune_checkpoint_path = os.path.join(load_finetune_checkpoint_dir, filename)
    return load_finetune_checkpoint_path

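# Note (assumption, not taken from the original sources): the checkpoint files are
# expected to follow MindSpore's ModelCheckpoint naming scheme
# "<prefix>-<epoch>_<steps>.ckpt", e.g. "classifier-3_302.ckpt"; only files whose
# epoch and steps-per-epoch match the arguments are considered, and the path of a
# matching checkpoint is returned.
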
class ErnieLearningRate(LearningRateSchedule):
    """
    Warmup-decay learning rate for Ernie network.
    """
    def __init__(self, learning_rate, end_learning_rate, warmup_steps, decay_steps, power):
        super(ErnieLearningRate, self).__init__()
        self.warmup_flag = False
        if warmup_steps > 0:
            self.warmup_flag = True
            self.warmup_lr = WarmUpLR(learning_rate, warmup_steps)
        self.decay_lr = PolynomialDecayLR(learning_rate, end_learning_rate, decay_steps, power)
        self.warmup_steps = Tensor(np.array([warmup_steps]).astype(np.float32))

        self.greater = P.Greater()
        self.one = Tensor(np.array([1.0]).astype(np.float32))
        self.cast = P.Cast()

    def construct(self, global_step):
        decay_lr = self.decay_lr(global_step)
        if self.warmup_flag:
            # Use the warmup rate before `warmup_steps` and the decayed rate afterwards.
            is_warmup = self.cast(self.greater(self.warmup_steps, global_step), mstype.float32)
            warmup_lr = self.warmup_lr(global_step)
            lr = (self.one - is_warmup) * decay_lr + is_warmup * warmup_lr
        else:
            lr = decay_lr
        return lr

def convert_labels_to_index(label_list):
    """
    Convert label_list to indices for NER task.
    """
    label2id = collections.OrderedDict()
    label2id["O"] = 0
    prefix = ["S_", "B_", "M_", "E_"]
    index = 0
    for label in label_list:
        for pre in prefix:
            index += 1
            sub_label = pre + label
            label2id[sub_label] = index
    return label2id

def _get_poly_lr(global_step, lr_init, lr_end, lr_max, warmup_steps, total_steps, poly_power):
    """
    generate learning rate array

    Args:
        global_step(int): current step
        lr_init(float): initial learning rate
        lr_end(float): end learning rate
        lr_max(float): max learning rate
        warmup_steps(int): number of warmup steps
        total_steps(int): total number of training steps
        poly_power(float): power of the polynomial decay

    Returns:
        np.array, learning rate array
    """
    lr_each_step = []
    if warmup_steps != 0:
        inc_each_step = (float(lr_max) - float(lr_init)) / float(warmup_steps)
    else:
        inc_each_step = 0
    for i in range(total_steps):
        if i < warmup_steps:
            # Linear warmup from lr_init to lr_max.
            lr = float(lr_init) + inc_each_step * float(i)
        else:
            # Polynomial decay from lr_max down to lr_end.
            base = (1.0 - (float(i) - float(warmup_steps)) / (float(total_steps) - float(warmup_steps)))
            lr = float(lr_max - lr_end) * (base ** poly_power)
            lr = lr + lr_end
            if lr < 0.0:
                lr = 0.0
        lr_each_step.append(lr)

    learning_rate = np.array(lr_each_step).astype(np.float32)
    current_step = global_step
    learning_rate = learning_rate[current_step:]
    return learning_rate

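# Usage sketch (illustrative values only): build a 1000-step schedule with 100
# warmup steps, then resume it from step 200.
#
#   lr = _get_poly_lr(global_step=200, lr_init=0.0, lr_end=1e-6, lr_max=3e-5,
#                     warmup_steps=100, total_steps=1000, poly_power=1.0)
#   print(lr.shape)   # (800,) -- steps before `global_step` are dropped
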
def get_ernie_thor_lr(lr_max=0.0034, lr_min=3.244e-05, lr_power=1.0, lr_total_steps=30000):
    learning_rate = _get_poly_lr(global_step=0, lr_init=0.0, lr_end=lr_min, lr_max=lr_max, warmup_steps=0,
                                 total_steps=lr_total_steps, poly_power=lr_power)
    return Tensor(learning_rate)

def get_ernie_thor_damping(damping_max=5e-2, damping_min=1e-6, damping_power=1.0, damping_total_steps=30000):
    damping = _get_poly_lr(global_step=0, lr_init=0.0, lr_end=damping_min, lr_max=damping_max, warmup_steps=0,
                           total_steps=damping_total_steps, poly_power=damping_power)
    return Tensor(damping)
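A minimal sketch of how these cells are typically wired together during finetuning is shown below. The `network`, `ds_train`, and hyper-parameter values are placeholders, not taken from run_ernie_classifier.py; only `ErnieLearningRate` and `LossCallBack` come from the file above.

```python
# Minimal sketch, assuming `network` is an ERNIE classification cell that already
# returns the loss and `ds_train` is a prepared MindSpore dataset.
import mindspore.nn as nn
from mindspore.train.model import Model
from src.utils import ErnieLearningRate, LossCallBack

steps_per_epoch = ds_train.get_dataset_size()
epoch_num = 3

# Warmup for the first 10% of steps, then polynomial decay to a small floor.
lr_schedule = ErnieLearningRate(learning_rate=3e-5, end_learning_rate=1e-7,
                                warmup_steps=int(steps_per_epoch * epoch_num * 0.1),
                                decay_steps=steps_per_epoch * epoch_num, power=1.0)
optimizer = nn.AdamWeightDecay(network.trainable_params(), learning_rate=lr_schedule)

model = Model(network, optimizer=optimizer)
model.train(epoch_num, ds_train, callbacks=[LossCallBack(steps_per_epoch)])
```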