forked from mindspore-Ecosystem/mindspore
add schema file for BERT and TinyBERT
parent 7d38a1fb7e
commit 6674a88de4

@@ -73,6 +73,60 @@ For distributed training, a hccl configuration file with JSON format needs to be
Please follow the instructions in the link below:
https://gitee.com/mindspore/mindspore/tree/master/model_zoo/utils/hccl_tools.

To set the dataset format and parameters, create a schema configuration file in JSON format. For details, refer to the [tfrecord](https://www.mindspore.cn/tutorial/zh-CN/master/use/data_preparation/loading_the_datasets.html#tfrecord) format.

For pretraining, the schema file contains ["input_ids", "input_mask", "segment_ids", "next_sentence_labels", "masked_lm_positions", "masked_lm_ids", "masked_lm_weights"].

For the NER or classification task, the schema file contains ["input_ids", "input_mask", "segment_ids", "label_ids"].

For the SQuAD task, the training schema file contains ["start_positions", "end_positions", "input_ids", "input_mask", "segment_ids"], and the evaluation schema file contains ["input_ids", "input_mask", "segment_ids"].

`numRows` is the only option that the user can set freely; all other values must be set according to the dataset.

For example, if the dataset is cn-wiki-128, the schema file for pretraining is as follows:
```
{
    "datasetType": "TF",
    "numRows": 7680,
    "columns": {
        "input_ids": {
            "type": "int64",
            "rank": 1,
            "shape": [256]
        },
        "input_mask": {
            "type": "int64",
            "rank": 1,
            "shape": [256]
        },
        "segment_ids": {
            "type": "int64",
            "rank": 1,
            "shape": [256]
        },
        "next_sentence_labels": {
            "type": "int64",
            "rank": 1,
            "shape": [1]
        },
        "masked_lm_positions": {
            "type": "int64",
            "rank": 1,
            "shape": [32]
        },
        "masked_lm_ids": {
            "type": "int64",
            "rank": 1,
            "shape": [32]
        },
        "masked_lm_weights": {
            "type": "float32",
            "rank": 1,
            "shape": [32]
        }
    }
}
```
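
A schema like the pretraining example above can also be generated instead of written by hand. The following is a minimal sketch, assuming a hypothetical helper `build_pretrain_schema` (it is not part of the repository). The column names, types, and shapes are copied from the example; the parameter names `seq_length` and `max_predictions` are illustrative only.

```python
import json


def build_pretrain_schema(num_rows, seq_length=256, max_predictions=32):
    """Build a pretraining schema dict shaped like the example above.

    Hypothetical helper for illustration: the column names, types, and
    shapes mirror the example schema and are not an official API.
    """
    def column(dtype, length):
        # Every column in the example is a rank-1 tensor of fixed length.
        return {"type": dtype, "rank": 1, "shape": [length]}

    return {
        "datasetType": "TF",
        "numRows": num_rows,
        "columns": {
            "input_ids": column("int64", seq_length),
            "input_mask": column("int64", seq_length),
            "segment_ids": column("int64", seq_length),
            "next_sentence_labels": column("int64", 1),
            "masked_lm_positions": column("int64", max_predictions),
            "masked_lm_ids": column("int64", max_predictions),
            "masked_lm_weights": column("float32", max_predictions),
        },
    }


schema = build_pretrain_schema(num_rows=7680)
schema_json = json.dumps(schema, indent=4)
print(schema_json)
```

Writing `schema_json` to a file gives a schema with the same fields as the example. Only `num_rows` is meant to vary freely here; the lengths must still match the dataset.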

# [Script Description](#contents)

## [Script and Sample Code](#contents)

@@ -87,11 +141,12 @@ https://gitee.com/mindspore/mindspore/tree/master/model_zoo/utils/hccl_tools.
├─hyper_parameter_config.ini # hyper parameter for distributed pretraining
├─run_distribute_pretrain.py # script for distributed pretraining
├─README.md
├─run_classifier.sh # shell script for standalone classifier task
├─run_ner.sh # shell script for standalone NER task
├─run_squad.sh # shell script for standalone SQUAD task
├─run_classifier.sh # shell script for standalone classifier task on ascend or gpu
├─run_ner.sh # shell script for standalone NER task on ascend or gpu
├─run_squad.sh # shell script for standalone SQUAD task on ascend or gpu
├─run_standalone_pretrain_ascend.sh # shell script for standalone pretrain on ascend
├─run_distributed_pretrain_ascend.sh # shell script for distributed pretrain on ascend
├─run_distributed_pretrain_gpu.sh # shell script for distributed pretrain on gpu
└─run_standaloned_pretrain_gpu.sh # shell script for standalone pretrain on gpu
├─src
├─__init__.py

@@ -122,7 +177,7 @@ https://gitee.com/mindspore/mindspore/tree/master/model_zoo/utils/hccl_tools.
usage: run_pretrain.py [--distribute DISTRIBUTE] [--epoch_size N] [--device_num N] [--device_id N]
[--enable_save_ckpt ENABLE_SAVE_CKPT] [--device_target DEVICE_TARGET]
[--enable_lossscale ENABLE_LOSSSCALE] [--do_shuffle DO_SHUFFLE]
[--enable_data_sink ENABLE_DATA_SINK] [--data_sink_steps N]
[--enable_data_sink ENABLE_DATA_SINK] [--data_sink_steps N]
[--save_checkpoint_path SAVE_CHECKPOINT_PATH]
[--load_checkpoint_path LOAD_CHECKPOINT_PATH]
[--save_checkpoint_steps N] [--save_checkpoint_num N]

@@ -361,55 +416,59 @@ The result will be as follows:
## [Model Description](#contents)
## [Performance](#contents)
### Pretraining Performance
| Parameters | BERT | BERT |
| Parameters | Ascend | GPU |
| -------------------------- | ---------------------------------------------------------- | ------------------------- |
| Model Version | base | base |
| Model Version | BERT_base | BERT_base |
| Resource | Ascend 910, cpu:2.60GHz 56cores, memory:314G | NV SMX2 V100-32G |
| uploaded Date | 08/22/2020 | 05/06/2020 |
| MindSpore Version | 0.6.0 | 0.3.0 |
| Dataset | cn-wiki-128 | ImageNet |
| Dataset | cn-wiki-128(4000w) | ImageNet |
| Training Parameters | src/config.py | src/config.py |
| Optimizer | Lamb | Momentum |
| Loss Function | SoftmaxCrossEntropy | SoftmaxCrossEntropy |
| outputs | probability | |
| Loss | | 1.913 |
| Speed | 116.5 ms/step | 1.913 |
| Total time | | |
| Epoch | 40 | |
| Batch_size | 256*8 | 130(8P) |
| Loss | 1.7 | 1.913 |
| Speed | 340ms/step | 1.913 |
| Total time | 73h | |
| Params (M) | 110M | |
| Checkpoint for Fine tuning | 1.2G(.ckpt file) | |

| Parameters | BERT | BERT |
| Parameters | Ascend | GPU |
| -------------------------- | ---------------------------------------------------------- | ------------------------- |
| Model Version | NEZHA | NEZHA |
| Model Version | BERT_NEZHA | BERT_NEZHA |
| Resource | Ascend 910, cpu:2.60GHz 56cores, memory:314G | NV SMX2 V100-32G |
| uploaded Date | 08/20/2020 | 05/06/2020 |
| MindSpore Version | 0.6.0 | 0.3.0 |
| Dataset | cn-wiki-128 | ImageNet |
| Dataset | cn-wiki-128(4000w) | ImageNet |
| Training Parameters | src/config.py | src/config.py |
| Optimizer | Lamb | Momentum |
| Loss Function | SoftmaxCrossEntropy | SoftmaxCrossEntropy |
| outputs | probability | |
| Loss | | 1.913 |
| Speed | | 1.913 |
| Total time | | |
| Epoch | 40 | |
| Batch_size | 96*8 | 130(8P) |
| Loss | 1.7 | 1.913 |
| Speed | 360ms/step | 1.913 |
| Total time | 200h | |
| Params (M) | 340M | |
| Checkpoint for Fine tuning | 3.2G(.ckpt file) | |

#### Inference Performance

| Parameters | | | |
| -------------------------- | ----------------------------- | ------------------------- | -------------------- |
| Model Version | V1 | | |
| Resource | Ascend 910 | NV SMX2 V100-32G | Ascend 310 |
| uploaded Date | 08/22/2020 | 05/22/2020 | |
| MindSpore Version | 0.6.0 | 0.2.0 | 0.2.0 |
| Dataset | cola, 1.2W | ImageNet, 1.2W | ImageNet, 1.2W |
| batch_size | 32(1P) | 130(8P) | |
| Accuracy | 0.588986 | ACC1[72.07%] ACC5[90.90%] | |
| Speed | 59.25ms/step | | |
| Total time | | | |
| Model for inference | 1.2G(.ckpt file) | | |
| Parameters | Ascend | GPU |
| -------------------------- | ----------------------------- | ------------------------- |
| Model Version | | |
| Resource | Ascend 910 | NV SMX2 V100-32G |
| uploaded Date | 08/22/2020 | 05/22/2020 |
| MindSpore Version | 0.6.0 | 0.2.0 |
| Dataset | cola, 1.2W | ImageNet, 1.2W |
| batch_size | 32(1P) | 130(8P) |
| Accuracy | 0.588986 | ACC1[72.07%] ACC5[90.90%] |
| Speed | 59.25ms/step | |
| Total time | 15min | |
| Model for inference | 1.2G(.ckpt file) | |

# [Description of Random Situation](#contents)

@@ -122,7 +122,7 @@ def distribute_pretrain():
print("core_nums:", cmdopt)
print("epoch_size:", str(cfg['epoch_size']))
print("data_dir:", data_dir)
print("log_file_dir: " + cur_dir + "/LOG" + str(device_id) + "/log.txt")
print("log_file_dir: " + cur_dir + "/LOG" + str(device_id) + "/pretraining_log.txt")

os.chdir(cur_dir + "/LOG" + str(device_id))
cmd = 'taskset -c ' + cmdopt + ' nohup python ' + run_script + " "

@@ -112,9 +112,6 @@ def create_squad_dataset(batch_size=1, repeat_count=1, data_file_path=None, sche
else:
    ds = de.TFRecordDataset([data_file_path], schema_file_path if schema_file_path != "" else None,
                            columns_list=["input_ids", "input_mask", "segment_ids", "unique_ids"])
    ds = ds.map(input_columns="input_ids", operations=type_cast_op)
    ds = ds.map(input_columns="input_mask", operations=type_cast_op)
    ds = ds.map(input_columns="segment_ids", operations=type_cast_op)
ds = ds.map(input_columns="segment_ids", operations=type_cast_op)
ds = ds.map(input_columns="input_mask", operations=type_cast_op)
ds = ds.map(input_columns="input_ids", operations=type_cast_op)

@@ -65,6 +65,38 @@ For distributed training on Ascend, a hccl configuration file with JSON format n
Please follow the instructions in the link below:
https://gitee.com/mindspore/mindspore/tree/master/model_zoo/utils/hccl_tools.

To set the dataset format and parameters, create a schema configuration file in JSON format. For details, refer to the [tfrecord](https://www.mindspore.cn/tutorial/zh-CN/master/use/data_preparation/loading_the_datasets.html#tfrecord) format.

For the general task, the schema file contains ["input_ids", "input_mask", "segment_ids"].

For the task distill and eval phases, the schema file contains ["input_ids", "input_mask", "segment_ids", "label_ids"].

`numRows` is the only option that the user can set freely; all other values must be set according to the dataset.

For example, if the dataset is cn-wiki-128, the schema file for the general distill phase is as follows:
```
{
    "datasetType": "TF",
    "numRows": 7680,
    "columns": {
        "input_ids": {
            "type": "int64",
            "rank": 1,
            "shape": [256]
        },
        "input_mask": {
            "type": "int64",
            "rank": 1,
            "shape": [256]
        },
        "segment_ids": {
            "type": "int64",
            "rank": 1,
            "shape": [256]
        }
    }
}
```
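
Because the required columns differ between phases, it can help to check a schema before starting a run. The following is a minimal sketch, assuming a hypothetical `check_schema_columns` helper and a `REQUIRED_COLUMNS` table transcribed from the column lists above; neither exists in the repository, and the phase names used as keys are illustrative.

```python
import json

# Required columns per phase, transcribed from the lists above.
# "task_distill" also covers the eval phase, which uses the same columns.
REQUIRED_COLUMNS = {
    "general_distill": ["input_ids", "input_mask", "segment_ids"],
    "task_distill": ["input_ids", "input_mask", "segment_ids", "label_ids"],
}


def check_schema_columns(schema, phase):
    """Return the required columns that are missing from the schema."""
    present = schema.get("columns", {})
    return [name for name in REQUIRED_COLUMNS[phase] if name not in present]


# Compact form of the general distill example schema.
schema_text = """
{
    "datasetType": "TF",
    "numRows": 7680,
    "columns": {
        "input_ids": {"type": "int64", "rank": 1, "shape": [256]},
        "input_mask": {"type": "int64", "rank": 1, "shape": [256]},
        "segment_ids": {"type": "int64", "rank": 1, "shape": [256]}
    }
}
"""
schema = json.loads(schema_text)

missing_general = check_schema_columns(schema, "general_distill")
missing_task = check_schema_columns(schema, "task_distill")
print(missing_general)  # []
print(missing_task)     # ['label_ids']
```

The example schema passes for the general distill phase but would need a `label_ids` column before it could be used for task distill or eval.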

# [Script Description](#contents)
## [Script and Sample Code](#contents)

@@ -117,7 +149,7 @@ options:
--save_checkpoint_step steps for saving checkpoint files: N, default is 1000
--load_teacher_ckpt_path path to load teacher checkpoint files: PATH, default is ""
--data_dir path to dataset directory: PATH, default is ""
--schema_dir path to schema.json file, PATH, default is ""
--schema_dir path to schema.json file, PATH, default is ""
```
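
Since `--schema_dir` defaults to an empty string, a script has to decide what an empty value means. The following is a minimal sketch, assuming the empty default is treated as "no schema file", as the `schema_file_path if schema_file_path != "" else None` expression elsewhere in this commit suggests. The parser below is illustrative and covers only two of the listed options; it is not the actual argument parser of the repository's scripts, and `schema_path_or_none` is a hypothetical helper.

```python
import argparse


def parse_args(argv=None):
    """Parse a reduced, illustrative subset of the options listed above."""
    parser = argparse.ArgumentParser(description="schema option sketch")
    parser.add_argument("--data_dir", type=str, default="",
                        help="path to dataset directory")
    parser.add_argument("--schema_dir", type=str, default="",
                        help="path to schema.json file")
    return parser.parse_args(argv)


def schema_path_or_none(schema_dir):
    """Map the empty-string default to None, meaning no schema file."""
    return schema_dir if schema_dir != "" else None


default_args = parse_args([])
explicit_args = parse_args(["--schema_dir", "/path/to/schema.json"])
print(schema_path_or_none(default_args.schema_dir))   # None
print(schema_path_or_none(explicit_args.schema_dir))  # /path/to/schema.json
```

Normalizing the value in one place keeps the dataset-loading code free of repeated empty-string checks.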

### Task Distill

@@ -132,7 +164,7 @@ usage: run_general_task.py [--device_target DEVICE_TARGET] [--do_train DO_TRAIN
[--load_td1_ckpt_path LOAD_TD1_CKPT_PATH]
[--train_data_dir TRAIN_DATA_DIR]
[--eval_data_dir EVAL_DATA_DIR]
[--task_name TASK_NAME] [--schema_dir SCHEMA_DIR]
[--task_name TASK_NAME] [--schema_dir SCHEMA_DIR]

options:
--device_target device where the code will be implemented: "Ascend" | "GPU", default is "Ascend"

@@ -302,9 +334,9 @@ The best acc is 0.891176
## [Model Description](#contents)
## [Performance](#contents)
### Training Performance
| Parameters | TinyBERT | TinyBERT |
| Parameters | Ascend | GPU |
| -------------------------- | ---------------------------------------------------------- | ------------------------- |
| Model Version | | |
| Model Version | TinyBERT | TinyBERT |
| Resource | Ascend 910, cpu:2.60GHz 56cores, memory:314G | NV SMX2 V100-32G, cpu:2.10GHz 64cores, memory:251G |
| uploaded Date | 08/20/2020 | 08/24/2020 |
| MindSpore Version | 0.6.0 | 0.7.0 |

@@ -321,7 +353,7 @@ The best acc is 0.891176

#### Inference Performance

| Parameters | | |
| Parameters | Ascend | GPU |
| -------------------------- | ----------------------------- | ------------------------- |
| Model Version | | |
| Resource | Ascend 910 | NV SMX2 V100-32G |

@@ -344,4 +376,4 @@ In run_general_distill.py, we set the random seed to make sure distribute traini

# [ModelZoo Homepage](#contents)

Please check the official [homepage](https://gitee.com/mindspore/mindspore/tree/master/model_zoo).
Please check the official [homepage](https://gitee.com/mindspore/mindspore/tree/master/model_zoo).