forked from mindspore-Ecosystem/mindspore
!19558 I3ZCSB, add the input parameter 'config_path'
Merge pull request !19558 from 郑彬/tinybert0707
commit ad0b4fb5ff
@ -14,7 +14,7 @@ enable_profiling: False
|
|||
# ==============================================================================
|
||||
description: 'run_pretrain'
|
||||
distribute: 'false'
|
||||
epoch_size: 1
|
||||
epoch_size: 40
|
||||
device_id: 0
|
||||
device_num: 1
|
||||
enable_save_ckpt: 'true'
|
||||
|
|
|
@ -71,65 +71,122 @@ The backbone structure of TinyBERT is transformer, the transformer contains four
|
|||
|
||||
# [Quick Start](#contents)
|
||||
|
||||
After installing MindSpore via the official website, you can start general distill, task distill and evaluation as follows:
|
||||
- running on local
|
||||
|
||||
```text
|
||||
# run standalone general distill example
|
||||
bash scripts/run_standalone_gd.sh
|
||||
After installing MindSpore via the official website, you can start general distill, task distill and evaluation as follows:
|
||||
|
||||
Before running the shell script, please set the `load_teacher_ckpt_path`, `data_dir`, `schema_dir` and `dataset_type` in the run_standalone_gd.sh file first. If running on GPU, please set the `device_target=GPU`.
|
||||
```text
|
||||
# run standalone general distill example
|
||||
bash scripts/run_standalone_gd.sh
|
||||
|
||||
# For Ascend device, run distributed general distill example
|
||||
bash scripts/run_distributed_gd_ascend.sh 8 1 /path/hccl.json
|
||||
Before running the shell script, please set the `load_teacher_ckpt_path`, `data_dir`, `schema_dir` and `dataset_type` in the run_standalone_gd.sh file first. If running on GPU, please set the `device_target=GPU`.
|
||||
|
||||
Before running the shell script, please set the `load_teacher_ckpt_path`, `data_dir`, `schema_dir` and `dataset_type` in the run_distributed_gd_ascend.sh file first.
|
||||
# For Ascend device, run distributed general distill example
|
||||
bash scripts/run_distributed_gd_ascend.sh 8 1 /path/hccl.json
|
||||
|
||||
# For GPU device, run distributed general distill example
|
||||
bash scripts/run_distributed_gd_gpu.sh 8 1 /path/data/ /path/schema.json /path/teacher.ckpt
|
||||
Before running the shell script, please set the `load_teacher_ckpt_path`, `data_dir`, `schema_dir` and `dataset_type` in the run_distributed_gd_ascend.sh file first.
|
||||
|
||||
# run task distill and evaluation example
|
||||
bash scripts/run_standalone_td.sh
|
||||
# For GPU device, run distributed general distill example
|
||||
bash scripts/run_distributed_gd_gpu.sh 8 1 /path/data/ /path/schema.json /path/teacher.ckpt
|
||||
|
||||
Before running the shell script, please set the `task_name`, `load_teacher_ckpt_path`, `load_gd_ckpt_path`, `train_data_dir`, `eval_data_dir`, `schema_dir` and `dataset_type` in the run_standalone_td.sh file first.
|
||||
If running on GPU, please set the `device_target=GPU`.
|
||||
```
|
||||
# run task distill and evaluation example
|
||||
bash scripts/run_standalone_td.sh {path}/*.yaml
|
||||
|
||||
For distributed training on Ascend, an HCCL configuration file in JSON format must be created in advance.
Please follow the instructions at the link below:
https://gitee.com/mindspore/mindspore/tree/master/model_zoo/utils/hccl_tools.
|
||||
Before running the shell script, please set the `task_name`, `load_teacher_ckpt_path`, `load_gd_ckpt_path`, `train_data_dir`, `eval_data_dir`, `schema_dir` and `dataset_type` in the run_standalone_td.sh file first.
|
||||
If running on GPU, please set the `device_target=GPU`.
|
||||
```
|
||||
|
||||
To set the dataset format and parameters, create a schema configuration file in JSON format; see the [tfrecord](https://www.mindspore.cn/doc/programming_guide/en/master/dataset_loading.html#tfrecord) format for details.
|
||||
For distributed training on Ascend, an HCCL configuration file in JSON format must be created in advance.
Please follow the instructions at the link below:
https://gitee.com/mindspore/mindspore/tree/master/model_zoo/utils/hccl_tools.
|
||||
|
||||
```text
|
||||
For the general distill phase, the schema file contains ["input_ids", "input_mask", "segment_ids"].
|
||||
To set the dataset format and parameters, create a schema configuration file in JSON format; see the [tfrecord](https://www.mindspore.cn/doc/programming_guide/en/master/dataset_loading.html#tfrecord) format for details.
|
||||
|
||||
For task distill and eval phase, schema file contains ["input_ids", "input_mask", "segment_ids", "label_ids"].
|
||||
```text
|
||||
For the general distill phase, the schema file contains ["input_ids", "input_mask", "segment_ids"].
|
||||
|
||||
`numRows` is the only option that the user can set; the other values must match the dataset.
|
||||
For task distill and eval phase, schema file contains ["input_ids", "input_mask", "segment_ids", "label_ids"].
|
||||
|
||||
For example, for the cn-wiki-128 dataset, the schema file for the general distill phase is as follows:
|
||||
{
|
||||
"datasetType": "TF",
|
||||
"numRows": 7680,
|
||||
"columns": {
|
||||
"input_ids": {
|
||||
"type": "int64",
|
||||
"rank": 1,
|
||||
"shape": [256]
|
||||
},
|
||||
"input_mask": {
|
||||
"type": "int64",
|
||||
"rank": 1,
|
||||
"shape": [256]
|
||||
},
|
||||
"segment_ids": {
|
||||
"type": "int64",
|
||||
"rank": 1,
|
||||
"shape": [256]
|
||||
`numRows` is the only option that the user can set; the other values must match the dataset.
|
||||
|
||||
For example, for the cn-wiki-128 dataset, the schema file for the general distill phase is as follows:
|
||||
{
|
||||
"datasetType": "TF",
|
||||
"numRows": 7680,
|
||||
"columns": {
|
||||
"input_ids": {
|
||||
"type": "int64",
|
||||
"rank": 1,
|
||||
"shape": [256]
|
||||
},
|
||||
"input_mask": {
|
||||
"type": "int64",
|
||||
"rank": 1,
|
||||
"shape": [256]
|
||||
},
|
||||
"segment_ids": {
|
||||
"type": "int64",
|
||||
"rank": 1,
|
||||
"shape": [256]
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
```
|
||||
```
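The schema layout shown above can also be produced programmatically. The following is a minimal sketch, not part of the repository: `build_schema` is a hypothetical helper, and it assumes every column is an `int64` tensor whose rank equals the length of its shape list, as in the cn-wiki-128 example.

```python
import json


def build_schema(column_shapes, num_rows):
    """Return a schema dict in the layout shown above (hypothetical helper)."""
    return {
        "datasetType": "TF",
        "numRows": num_rows,
        "columns": {
            name: {"type": "int64", "rank": len(shape), "shape": list(shape)}
            for name, shape in column_shapes.items()
        },
    }


# General distill phase: three rank-1 int64 columns of length 256.
gd_columns = {name: [256] for name in ("input_ids", "input_mask", "segment_ids")}
gd_schema = build_schema(gd_columns, num_rows=7680)

# Serialize with the same indentation style as the README example.
schema_text = json.dumps(gd_schema, indent=4)
```

For the task distill and eval phases a `label_ids` entry would be added to the mapping; its shape is not given in this README, so it is left out of the sketch.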
|
||||
|
||||
- running on ModelArts
|
||||
|
||||
If you want to run on ModelArts, please check the official [modelarts](https://support.huaweicloud.com/modelarts/) documentation; you can then start training as follows:
|
||||
|
||||
- general_distill with 8 cards on ModelArts
|
||||
|
||||
```python
|
||||
# (1) Upload the code folder to the S3 bucket.
# (2) Click "create training task" on the website UI.
# (3) Set the code directory to "/{path}/tinybert" on the website UI.
# (4) Set the startup file to "/{path}/tinybert/run_general_distill.py" on the website UI.
# (5) Perform a or b.
#     a. Set parameters in /{path}/tinybert/gd_config.yaml.
#         1. Set "enable_modelarts=True".
#         2. Set the other parameters ('config_path' cannot be set here); for their values, refer to `./scripts/run_distributed_gd_ascend.sh`.
#     b. Add parameters on the website UI.
#         1. Add "enable_modelarts=True".
#         2. Add the other parameters; for their values, refer to `./scripts/run_distributed_gd_ascend.sh`.
#     Note that 'data_dir' and 'schema_dir' must be given relative to the path selected in step 7.
#     Add "config_path=../../gd_config.yaml" on the web page ('config_path' is the path of the '*.yaml' file relative to {path}/tinybert/src/model_utils/config.py, and the '*.yaml' file must be in {path}/bert/).
# (6) Upload the dataset to the S3 bucket.
# (7) Check the "data storage location" on the website UI and set the "Dataset path" (this path must contain only the data or a zip package).
# (8) Set the "Output file path" and "Job log path" to your paths on the website UI.
# (9) Under "resource pool selection", select the 8-card specification.
# (10) Create your job.
# After training, the '*.ckpt' file will be saved under the "training output file path".
|
||||
```
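The steps above state that `config_path` is interpreted relative to `src/model_utils/config.py`. The sketch below illustrates what that implies for a value such as `../../gd_config.yaml`. It is an assumption about the behaviour, not the repository's code: `resolve_config_path` and the `/code/tinybert` location are hypothetical.

```python
import os


def resolve_config_path(config_py_file, config_path):
    """Resolve config_path against the directory that holds config.py (assumed behaviour)."""
    base_dir = os.path.dirname(os.path.abspath(config_py_file))
    return os.path.normpath(os.path.join(base_dir, config_path))


# Hypothetical code location, for illustration only.
config_py = "/code/tinybert/src/model_utils/config.py"

gd_yaml = resolve_config_path(config_py, "../../gd_config.yaml")
td_yaml = resolve_config_path(config_py, "../../td_config/td_config_sst2.yaml")
```

Under this reading, two `..` components climb from `src/model_utils/` back to the code directory, which is why both example values start with `../../`.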
|
||||
|
||||
- Running task_distill with single card on ModelArts
|
||||
|
||||
```python
|
||||
# (1) Upload the code folder to the S3 bucket.
# (2) Click "create training task" on the website UI.
# (3) Set the code directory to "/{path}/tinybert" on the website UI.
# (4) Set the startup file to "/{path}/tinybert/run_task_distill.py" on the website UI.
# (5) Add "config_path=../../td_config/td_config_sst2.yaml" on the web page (select the *.yaml configuration file according to the distill task), then perform a or b.
#     a. Set parameters in the selected '*.yaml' file.
#         1. Set "enable_modelarts=True".
#         2. Set "task_name=SST-2" (depending on the task, select from ["SST-2", "QNLI", "MNLI", "TNEWS", "CLUENER"]).
#         3. Set the other parameters; for their values, refer to './scripts/run_standalone_td.sh'.
#     b. Add parameters on the website UI.
#         1. Add "enable_modelarts=True".
#         2. Add "task_name=SST-2" (depending on the task, select from ["SST-2", "QNLI", "MNLI", "TNEWS", "CLUENER"]).
#         3. Add the other parameters; for their values, refer to './scripts/run_standalone_td.sh'.
#     Note that 'load_teacher_ckpt_path', 'train_data_dir', 'eval_data_dir' and 'schema_dir' must be given relative to the path selected in step 7.
#     Note that 'load_gd_ckpt_path' must be given relative to the path selected in step 3.
# (6) Upload the dataset to the S3 bucket.
# (7) Check the "data storage location" on the website UI and set the "Dataset path".
# (8) Set the "Output file path" and "Job log path" to your paths on the website UI.
# (9) Under "resource pool selection", select the single-card specification.
# (10) Create your job.
# After training, the '*.ckpt' file will be saved under the "training output file path".
|
||||
```
|
||||
|
||||
# [Script Description](#contents)
|
||||
|
||||
|
@ -139,23 +196,39 @@ For example, the dataset is cn-wiki-128, the schema file for general distill pha
|
|||
.
|
||||
└─bert
|
||||
├─README.md
|
||||
├─README_CN.md
|
||||
├─scripts
|
||||
├─run_distributed_gd_ascend.sh # shell script for distributed general distill phase on Ascend
|
||||
├─run_distributed_gd_gpu.sh # shell script for distributed general distill phase on GPU
|
||||
├─run_infer_310.sh # shell script for 310 infer
|
||||
├─run_standalone_gd.sh # shell script for standalone general distill phase
|
||||
├─run_standalone_td.sh # shell script for standalone task distill phase
|
||||
├─src
|
||||
├─model_utils
|
||||
├── config.py # parse *.yaml parameter configuration file
|
||||
├── device_adapter.py # distinguish local/ModelArts training
|
||||
├── local_adapter.py # get related environment variables in local training
|
||||
└── moxing_adapter.py # get related environment variables in ModelArts training
|
||||
├─__init__.py
|
||||
├─assessment_method.py # assessment method for evaluation
|
||||
├─dataset.py # data processing
|
||||
├─gd_config.py # parameter configuration for general distill phase
|
||||
├─td_config.py # parameter configuration for task distill phase
|
||||
├─tinybert_for_gd_td.py # backbone code of network
|
||||
├─tinybert_model.py # backbone code of network
|
||||
├─utils.py # util function
|
||||
├─td_config # folder where *.yaml files of different distillation tasks are located
|
||||
├── td_config_15cls.yaml
├── td_config_mnli.yaml
├── td_config_ner.yaml
├── td_config_qnli.yaml
└── td_config_sst2.yaml
|
||||
├─__init__.py
|
||||
├─export.py # export scripts
|
||||
├─gd_config.yaml # parameter configuration for general_distill
|
||||
├─mindspore_hub_conf.py # Mindspore Hub interface
|
||||
├─postprocess.py # scripts for 310 postprocess
|
||||
├─preprocess.py # scripts for 310 preprocess
|
||||
├─run_general_distill.py # train net for general distillation
|
||||
├─run_task_distill.py # train and eval net for task distillation
|
||||
└─run_task_distill.py # train and eval net for task distillation
|
||||
```
|
||||
|
||||
## [Script Parameters](#contents)
|
||||
|
@ -231,7 +304,7 @@ options:
|
|||
|
||||
## Options and Parameters
|
||||
|
||||
`gd_config.py` and `td_config.py` contain the parameters of the BERT model and the options for the optimizer and loss scale.
|
||||
`gd_config.yaml` and `td_config/*.yaml` contain the parameters of the BERT model and the options for the optimizer and loss scale.
|
||||
|
||||
### Options
|
||||
|
||||
|
@ -358,7 +431,7 @@ If you want to after running and continue to eval, please set `do_train=true` an
|
|||
#### evaluation on SST-2 dataset
|
||||
|
||||
```bash
|
||||
bash scripts/run_standalone_td.sh
|
||||
bash scripts/run_standalone_td.sh {path}/*.yaml
|
||||
```
|
||||
|
||||
The command above runs in the background; you can view the results in the file log.txt. The accuracy on the test dataset will be as follows:
|
||||
|
@ -378,7 +451,7 @@ The best acc is 0.902777
|
|||
Before running the command below, please check that the pretrained checkpoint path to load has been set. Set the checkpoint path to an absolute path, e.g. "/username/pretrain/checkpoint_100_300.ckpt".
|
||||
|
||||
```bash
|
||||
bash scripts/run_standalone_td.sh
|
||||
bash scripts/run_standalone_td.sh {path}/*.yaml
|
||||
```
|
||||
|
||||
The command above runs in the background; you can view the results in the file log.txt. The accuracy on the test dataset will be as follows:
|
||||
|
@ -398,7 +471,7 @@ The best acc is 0.813929
|
|||
Before running the command below, please check that the pretrained checkpoint path to load has been set. Set the checkpoint path to an absolute path, e.g. "/username/pretrain/checkpoint_100_300.ckpt".
|
||||
|
||||
```bash
|
||||
bash scripts/run_standalone_td.sh
|
||||
bash scripts/run_standalone_td.sh {path}/*.yaml
|
||||
```
|
||||
|
||||
The command above runs in the background; you can view the results in the file log.txt. The accuracy on the test dataset will be as follows:
|
||||
|
@ -417,6 +490,8 @@ The best acc is 0.891176
|
|||
|
||||
### [Export MindIR](#contents)
|
||||
|
||||
- Export on local
|
||||
|
||||
```shell
|
||||
python export.py --ckpt_file [CKPT_PATH] --file_name [FILE_NAME] --file_format [FILE_FORMAT]
|
||||
```
|
||||
|
@ -424,6 +499,32 @@ python export.py --ckpt_file [CKPT_PATH] --file_name [FILE_NAME] --file_format [
|
|||
The `ckpt_file` parameter is required.
`FILE_FORMAT` should be one of ["AIR", "MINDIR"].
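The two constraints on the export command (a required checkpoint path and a format limited to "AIR" or "MINDIR") can be expressed directly in an argument parser. The following is a stand-alone sketch, not the code of `export.py`: `parse_export_args` is a hypothetical name, and the defaults for `--file_name` and `--file_format` are assumptions.

```python
import argparse


def parse_export_args(argv):
    """Parse export options; the flag names follow the command shown above."""
    parser = argparse.ArgumentParser(description="export sketch")
    # The checkpoint path has no sensible default, so it is mandatory.
    parser.add_argument("--ckpt_file", type=str, required=True)
    # Default values below are assumptions made for this sketch.
    parser.add_argument("--file_name", type=str, default="tinybert")
    parser.add_argument("--file_format", type=str, default="MINDIR",
                        choices=["AIR", "MINDIR"])
    return parser.parse_args(argv)


args = parse_export_args(["--ckpt_file", "/path/model.ckpt", "--file_format", "AIR"])
```

With `choices` set, an unsupported format is rejected at parse time rather than failing later during export.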
|
||||
|
||||
- Export on ModelArts (If you want to run in modelarts, please check the official documentation of [modelarts](https://support.huaweicloud.com/modelarts/), and you can start as follows)
|
||||
|
||||
```python
|
||||
# (1) Upload the code folder to the S3 bucket.
# (2) Click "create training task" on the website UI.
# (3) Set the code directory to "/{path}/tinybert" on the website UI.
# (4) Set the startup file to "/{path}/tinybert/export.py" on the website UI.
# (5) Perform a or b.
#     a. Set parameters in a *.yaml file under /path/tinybert/td_config/.
#         1. Set "enable_modelarts: True".
#         2. Set "ckpt_file: ./{path}/*.ckpt" ('ckpt_file' is the path of the weight file to be exported, relative to the file `export.py`; the weight file must be included in the code directory).
#         3. Set "file_name: tinybert_sst2".
#         4. Set "file_format: MINDIR".
#     b. Add parameters on the website UI.
#         1. Add "enable_modelarts=True".
#         2. Add "ckpt_file=./{path}/*.ckpt" ('ckpt_file' is the path of the weight file to be exported, relative to the file `export.py`; the weight file must be included in the code directory).
#         3. Add "file_name=tinybert_sst2".
#         4. Add "file_format=MINDIR".
#     Finally, "config_path=../../td_config/*.yaml" must be added on the web page (select the *.yaml configuration file according to the downstream task).
# (7) Check the "data storage location" on the website UI and set the "Dataset path" (this path is not used, but it must still be set).
# (8) Set the "Output file path" and "Job log path" to your paths on the website UI.
# (9) Under "resource pool selection", select the single-card specification.
# (10) Create your job.
# You will see tinybert_sst2.mindir under {Output file path}.
|
||||
```
|
||||
|
||||
### Infer on Ascend310
|
||||
|
||||
Before performing inference, the MINDIR file must be exported with the `export.py` script. Only an example of inference with a MINDIR model is provided.
|
||||
|
@ -459,7 +560,7 @@ Inference result is saved in current path, you can find result like this in acc.
|
|||
| uploaded Date | 08/20/2020 | 08/24/2020 |
|
||||
| MindSpore Version | 1.0.0 | 1.0.0 |
|
||||
| Dataset | en-wiki-128 | en-wiki-128 |
|
||||
| Training Parameters | src/gd_config.py | src/gd_config.py |
|
||||
| Training Parameters | src/gd_config.yaml | src/gd_config.yaml |
|
||||
| Optimizer | AdamWeightDecay | AdamWeightDecay |
|
||||
| Loss Function | SoftmaxCrossEntropy | SoftmaxCrossEntropy |
|
||||
| outputs | probability | probability |
|
||||
|
@ -489,7 +590,7 @@ Inference result is saved in current path, you can find result like this in acc.
|
|||
|
||||
In run_standalone_td.sh, we set do_shuffle to shuffle the dataset.
|
||||
|
||||
In gd_config.py and td_config.py, we set hidden_dropout_prob and attention_probs_dropout_prob to randomly drop some network nodes.
|
||||
In gd_config.yaml and td_config/*.yaml, we set hidden_dropout_prob and attention_probs_dropout_prob to randomly drop some network nodes.
|
||||
|
||||
In run_general_distill.py, we set the random seed so that distributed training starts from the same initial weights on every device.
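The effect of a fixed seed on initial weights can be illustrated without MindSpore. The sketch below uses NumPy as a stand-in: `init_weights` is a hypothetical function, and the shape and standard deviation are arbitrary values chosen for the example.

```python
import numpy as np


def init_weights(seed, shape=(4, 3)):
    """Draw initial weights from a generator seeded with `seed` (illustrative only)."""
    rng = np.random.default_rng(seed)
    return rng.normal(0.0, 0.02, size=shape)


# Two simulated workers that share a seed start from identical weights.
worker_a = init_weights(seed=0)
worker_b = init_weights(seed=0)

# A worker with a different seed starts from different weights.
worker_c = init_weights(seed=1)
```

Identical starting weights matter in data-parallel training because each device then applies the same averaged gradients to the same parameters.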
|
||||
|
||||
|
|
|
@ -78,63 +78,118 @@ TinyBERT模型的主干结构是转换器,转换器包含四个编码器模块
|
|||
|
||||
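After downloading and installing MindSpore from the official website, you can start general distill, task distill and evaluation as follows: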
|
||||
|
||||
```bash
|
||||
# run standalone general distill example
|
||||
bash scripts/run_standalone_gd.sh
|
||||
- Running locally
|
||||
|
||||
Before running the shell script, please set the `load_teacher_ckpt_path`, `data_dir`, `schema_dir` and `dataset_type` in the run_standalone_gd.sh file first. If running on GPU, please set the `device_target=GPU`.
|
||||
```bash
|
||||
# run standalone general distill example
|
||||
bash scripts/run_standalone_gd.sh
|
||||
|
||||
# For Ascend device, run distributed general distill example
|
||||
bash scripts/run_distributed_gd_ascend.sh 8 1 /path/hccl.json
|
||||
Before running the shell script, please set the `load_teacher_ckpt_path`, `data_dir`, `schema_dir` and `dataset_type` in the run_standalone_gd.sh file first. If running on GPU, please set the `device_target=GPU`.
|
||||
|
||||
Before running the shell script, please set the `load_teacher_ckpt_path`, `data_dir`, `schema_dir` and `dataset_type` in the run_distributed_gd_ascend.sh file first.
|
||||
# For Ascend device, run distributed general distill example
|
||||
bash scripts/run_distributed_gd_ascend.sh 8 1 /path/hccl.json
|
||||
|
||||
# For GPU device, run distributed general distill example
|
||||
bash scripts/run_distributed_gd_gpu.sh 8 1 /path/data/ /path/schema.json /path/teacher.ckpt
|
||||
Before running the shell script, please set the `load_teacher_ckpt_path`, `data_dir`, `schema_dir` and `dataset_type` in the run_distributed_gd_ascend.sh file first.
|
||||
|
||||
# run task distill and evaluation example
|
||||
bash scripts/run_standalone_td.sh
|
||||
# For GPU device, run distributed general distill example
|
||||
bash scripts/run_distributed_gd_gpu.sh 8 1 /path/data/ /path/schema.json /path/teacher.ckpt
|
||||
|
||||
Before running the shell script, please set the `task_name`, `load_teacher_ckpt_path`, `load_gd_ckpt_path`, `train_data_dir`, `eval_data_dir`, `schema_dir` and `dataset_type` in the run_standalone_td.sh file first.
|
||||
If running on GPU, please set the `device_target=GPU`.
|
||||
```
|
||||
# run task distill and evaluation example
|
||||
bash scripts/run_standalone_td.sh {path}/*.yaml
|
||||
|
||||
For distributed training on Ascend, an HCCL configuration file in JSON format must be created in advance.
For details, see the link below:
https://gitee.com/mindspore/mindspore/tree/master/model_zoo/utils/hccl_tools.
|
||||
Before running the shell script, please set the `task_name`, `load_teacher_ckpt_path`, `load_gd_ckpt_path`, `train_data_dir`, `eval_data_dir`, `schema_dir` and `dataset_type` in the run_standalone_td.sh file first.
|
||||
If running on GPU, please set the `device_target=GPU`.
|
||||
```
|
||||
|
||||
To set the dataset format and parameters, create a schema configuration file in JSON format; see the [TFRecord](https://www.mindspore.cn/doc/programming_guide/zh-CN/master/dataset_loading.html#tfrecord) format for details.
|
||||
For distributed training on Ascend, an HCCL configuration file in JSON format must be created in advance.
For details, see the link below:
https://gitee.com/mindspore/mindspore/tree/master/model_zoo/utils/hccl_tools.
|
||||
|
||||
```text
|
||||
For general task, schema file contains ["input_ids", "input_mask", "segment_ids"].
|
||||
To set the dataset format and parameters, create a schema configuration file in JSON format; see the [TFRecord](https://www.mindspore.cn/doc/programming_guide/zh-CN/master/dataset_loading.html#tfrecord) format for details.
|
||||
|
||||
For task distill and eval phase, schema file contains ["input_ids", "input_mask", "segment_ids", "label_ids"].
|
||||
```text
|
||||
For general task, schema file contains ["input_ids", "input_mask", "segment_ids"].
|
||||
|
||||
`numRows` is the only option which could be set by user, the others value must be set according to the dataset.
|
||||
For task distill and eval phase, schema file contains ["input_ids", "input_mask", "segment_ids", "label_ids"].
|
||||
|
||||
For example, the dataset is cn-wiki-128, the schema file for general distill phase as following:
|
||||
{
|
||||
"datasetType": "TF",
|
||||
"numRows": 7680,
|
||||
"columns": {
|
||||
"input_ids": {
|
||||
"type": "int64",
|
||||
"rank": 1,
|
||||
"shape": [256]
|
||||
},
|
||||
"input_mask": {
|
||||
"type": "int64",
|
||||
"rank": 1,
|
||||
"shape": [256]
|
||||
},
|
||||
"segment_ids": {
|
||||
"type": "int64",
|
||||
"rank": 1,
|
||||
"shape": [256]
|
||||
}
|
||||
}
|
||||
}
|
||||
```
|
||||
`numRows` is the only option which could be set by user, the others value must be set according to the dataset.
|
||||
|
||||
For example, the dataset is cn-wiki-128, the schema file for general distill phase as following:
|
||||
{
|
||||
"datasetType": "TF",
|
||||
"numRows": 7680,
|
||||
"columns": {
|
||||
"input_ids": {
|
||||
"type": "int64",
|
||||
"rank": 1,
|
||||
"shape": [256]
|
||||
},
|
||||
"input_mask": {
|
||||
"type": "int64",
|
||||
"rank": 1,
|
||||
"shape": [256]
|
||||
},
|
||||
"segment_ids": {
|
||||
"type": "int64",
|
||||
"rank": 1,
|
||||
"shape": [256]
|
||||
}
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
- Running on ModelArts (if you want to run on ModelArts, you can refer to the [modelarts](https://support.huaweicloud.com/modelarts/) documentation)

- General distill with 8 cards on ModelArts
|
||||
|
||||
```python
|
||||
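# (1) Upload your code to the S3 bucket.
# (2) Create a training task on ModelArts.
# (3) Select the code directory /{path}/tinybert.
# (4) Select the startup file /{path}/tinybert/run_general_distill.py.
# (5) Perform a or b.
#     a. Set parameters in /{path}/tinybert/default_config.yaml.
#         1. Set "enable_modelarts=True".
#         2. Set the other parameters ('config_path' cannot be set here); for their values, refer to `./scripts/run_distributed_gd_ascend.sh`.
#     b. Set parameters on the web page.
#         1. Add "enable_modelarts=True".
#         2. Add the other parameters; for their values, refer to `./scripts/run_distributed_gd_ascend.sh`.
#     Note that 'data_dir' and 'schema_dir' must be given relative to the path selected in step 7.
#     Add "config_path=../../gd_config.yaml" on the web page ('config_path' is the path of the '*.yaml' file relative to {path}/tinybert/src/model_utils/config.py, and the '*.yaml' file must be in {path}/bert/).
# (6) Upload your data to the S3 bucket.
# (7) Check the data storage location on the web page and set the "training dataset" path.
# (8) Set the "training output file path" and "job log path" on the web page.
# (9) Under "resource pool selection" on the web page, select the 8-card specification.
# (10) Create the training job.
# After training, the trained weights are saved under the "training output file path".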
|
||||
```
|
||||
|
||||
- Running task distill with a single card on ModelArts
|
||||
|
||||
```python
|
||||
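# (1) Upload your code to the S3 bucket.
# (2) Create a training task on ModelArts.
# (3) Select the code directory /{path}/tinybert.
# (4) Select the startup file /{path}/tinybert/run_task_distill.py.
# (5) Add "config_path=../../td_config/td_config_sst2.yaml" on the web page (select the *.yaml configuration file according to the distill task).
#     Perform a or b.
#     a. Set parameters in the selected '*.yaml' file.
#         1. Set "enable_modelarts=True".
#         2. Set "task_name=SST-2" (depending on the task, select from ["SST-2", "QNLI", "MNLI", "TNEWS", "CLUENER"]).
#         3. Set the other parameters; for their values, refer to `run_standalone_td.sh` under './scripts/'.
#     b. Set parameters on the web page.
#         1. Add "enable_modelarts=True".
#         2. Add "task_name=SST-2" (depending on the task, select from ["SST-2", "QNLI", "MNLI", "TNEWS", "CLUENER"]).
#         3. Add the other parameters; for their values, refer to `run_standalone_td.sh` under './scripts/'.
#     Note that 'load_teacher_ckpt_path', 'train_data_dir', 'eval_data_dir' and 'schema_dir' must be given relative to the path selected in step 7.
#     Note that 'load_gd_ckpt_path' must be given relative to the path selected in step 3.
# (6) Upload your data to the S3 bucket.
# (7) Check the data storage location on the web page and set the "training dataset" path.
# (8) Set the "training output file path" and "job log path" on the web page.
# (9) Under "resource pool selection" on the web page, select the single-card specification.
# (10) Create the training job.
# After training, the trained weights are saved under the "training output file path".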
|
||||
```
|
||||
|
||||
# Script Description
|
||||
|
||||
|
@ -142,25 +197,41 @@ For example, the dataset is cn-wiki-128, the schema file for general distill pha
|
|||
|
||||
```shell
|
||||
.
|
||||
└─bert
|
||||
└─tinybert
|
||||
├─README.md
|
||||
├─README_CN.md
|
||||
├─scripts
|
||||
├─run_distributed_gd_ascend.sh # shell script for distributed general distill phase on Ascend
├─run_distributed_gd_gpu.sh # shell script for distributed general distill phase on GPU
├─run_infer_310.sh # shell script for 310 infer
├─run_standalone_gd.sh # shell script for standalone general distill phase
├─run_standalone_td.sh # shell script for standalone task distill phase
|
||||
├─src
|
||||
├─model_utils
|
||||
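├── config.py # parse *.yaml parameter configuration file
├── device_adapter.py # distinguish local/ModelArts training
├── local_adapter.py # get related environment variables in local training
└── moxing_adapter.py # get related environment variables and exchange data in ModelArts training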
|
||||
├─__init__.py
|
||||
├─assessment_method.py # assessment method for evaluation
├─dataset.py # data processing
├─gd_config.py # parameter configuration for general distill phase
├─td_config.py # parameter configuration for task distill phase
├─tinybert_for_gd_td.py # backbone code of network
├─tinybert_model.py # backbone code of network
├─utils.py # util function
├─td_config # folder where *.yaml files of different distillation tasks are located
├── td_config_15cls.yaml
├── td_config_mnli.yaml
├── td_config_ner.yaml
├── td_config_qnli.yaml
└── td_config_sst2.yaml
|
||||
├─__init__.py
|
||||
├─export.py # export scripts
|
||||
├─gd_config.yaml # parameter configuration for general distill
├─mindspore_hub_conf.py # MindSpore Hub interface
├─postprocess.py # script for 310 postprocess
├─preprocess.py # script for 310 preprocess
├─run_general_distill.py # train net for general distillation
├─run_task_distill.py # train and eval net for task distillation
└─run_task_distill.py # train and eval net for task distillation
|
||||
```
|
||||
|
||||
## Script Parameters
|
||||
|
@ -233,7 +304,7 @@ options:
|
|||
|
||||
## Options and Parameters
|
||||
|
||||
`gd_config.py` and `td_config.py` contain the parameters of the BERT model and the options for the optimizer and loss scale.
|
||||
`gd_config.yaml` and `td_config/*.yaml` contain the parameters of the BERT model and the options for the optimizer and loss scale.
|
||||
|
||||
### Options
|
||||
|
||||
|
@ -321,7 +392,7 @@ epoch: 1, step: 100, outputs are 28.2093
|
|||
Before running the command below, make sure that load_teacher_ckpt_path, data_dir and schema_dir have been set. Set each path to an absolute path, e.g. /username/checkpoint_100_300.ckpt.
|
||||
|
||||
```bash
|
||||
bash scripts/run_distributed_gd_ascend.sh 8 1 /path/hccl.json
|
||||
bash scripts/run_distributed_gd_ascend.sh 8 1 /path/hccl.json /path/gd_config.json
|
||||
```
|
||||
|
||||
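The command above runs in the background; you can view the results in the file log.txt. After training, the checkpoint files can be found under the default log* folder. The loss values will be as follows: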
|
||||
|
@ -339,7 +410,7 @@ epoch: 1, step: 100, outputs are (Tensor(shape=[1], dtype=Float32, 30.5901), Ten
|
|||
Enter an absolute path, e.g. "/username/checkpoint_100_300.ckpt".
|
||||
|
||||
```bash
|
||||
bash scripts/run_distributed_gd_gpu.sh 8 1 /path/data/ /path/schema.json /path/teacher.ckpt
|
||||
bash scripts/run_distributed_gd_gpu.sh 8 1 /path/data/ /path/schema.json /path/teacher.ckpt /path/gd_config.json
|
||||
```
|
||||
|
||||
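The command above runs in the background; you can view the results in the file log.txt. After training, the checkpoint files can be found under the default LOG* folder. The loss values will be as follows: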
|
||||
|
@ -359,7 +430,7 @@ epoch: 1, step: 1, outputs are 63.4098
|
|||
#### evaluation on SST-2 dataset
|
||||
|
||||
```bash
|
||||
bash scripts/run_standalone_td.sh
|
||||
bash scripts/run_standalone_td.sh {path}/*.yaml
|
||||
```
|
||||
|
||||
The command above runs in the background; you can view the results in the file log.txt. The accuracy on the test dataset will be as follows:
|
||||
|
@ -379,7 +450,7 @@ The best acc is 0.902777
|
|||
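Before running the command below, please check that the pretrained checkpoint path to load has been set. Set the checkpoint path to an absolute path, e.g. /username/pretrain/checkpoint_100_300.ckpt.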
|
||||
|
||||
```bash
|
||||
bash scripts/run_standalone_td.sh
|
||||
bash scripts/run_standalone_td.sh {path}/*.yaml
|
||||
```
|
||||
|
||||
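The command above runs in the background; you can view the results in the file log.txt. The accuracy on the test dataset will be as follows: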
|
||||
|
@ -399,7 +470,7 @@ The best acc is 0.813929
|
|||
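Before running the command below, please check that the pretrained checkpoint path to load has been set. Set the checkpoint path to an absolute path, e.g. /username/pretrain/checkpoint_100_300.ckpt.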
|
||||
|
||||
```bash
|
||||
bash scripts/run_standalone_td.sh
|
||||
bash scripts/run_standalone_td.sh {path}/*.yaml
|
||||
```
|
||||
|
||||
The command above runs in the background; you can view the results in the file log.txt. The accuracy on the test dataset will be as follows:
|
||||
|
@ -418,8 +489,36 @@ The best acc is 0.891176
|
|||
|
||||
### [Export MindIR](#contents)
|
||||
|
||||
```shell
|
||||
python export.py --ckpt_file [CKPT_PATH] --file_name [FILE_NAME] --file_format [FILE_FORMAT]
|
||||
- Export on local
|
||||
|
||||
```shell
|
||||
python export.py --ckpt_file [CKPT_PATH] --file_name [FILE_NAME] --file_format [FILE_FORMAT]
|
||||
```
|
||||
|
||||
- Export on ModelArts
|
||||
|
||||
```python
|
||||
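# (1) Upload your code to the S3 bucket.
# (2) Create a training task on ModelArts.
# (3) Select the code directory /{path}/tinybert.
# (4) Select the startup file /{path}/tinybert/export.py.
# (5) Perform a or b.
#     a. Set parameters in a *.yaml file under /path/tinybert/td_config/.
#         1. Set "enable_modelarts: True".
#         2. Set "ckpt_file: ./{path}/*.ckpt" ('ckpt_file' is the path of the '*.ckpt' weight file to be exported, relative to `export.py`; the weight file must be included in the code directory).
#         3. Set "file_name: tinybert_sst2".
#         4. Set "file_format: MINDIR".
#     b. Set parameters on the web page.
#         1. Add "enable_modelarts=True".
#         2. Add "ckpt_file=./{path}/*.ckpt" ('ckpt_file' is the path of the '*.ckpt' weight file to be exported, relative to `export.py`; the weight file must be included in the code directory).
#         3. Add "file_name=tinybert_sst2".
#         4. Add "file_format=MINDIR".
#     Finally, "config_path=../../td_config/*.yaml" must be added on the web page (select the *.yaml configuration file according to the downstream task).
# (7) Check the data storage location on the web page and set the "training dataset" path (this path is not used, but it must still be set).
# (8) Set the "training output file path" and "job log path" on the web page.
# (9) Under "resource pool selection" on the web page, select the single-card specification.
# (10) Create the training job.
# You will see the 'tinybert_sst2.mindir' file under {Output file path}.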
|
||||
```
|
||||
|
||||
The ckpt_file parameter is required,
|
||||
|
@ -460,7 +559,7 @@ bash run_infer_310.sh [MINDIR_PATH] [DATASET_PATH] [SCHEMA_DIR] [DATASET_TYPE] [
|
|||
| Uploaded Date | 2020-08-20 | 2020-08-24 |
| MindSpore Version | 0.6.0 | 0.7.0 |
| Dataset | en-wiki-128 | en-wiki-128 |
| Training Parameters | src/gd_config.py | src/gd_config.py |
| Training Parameters | src/gd_config.yaml | src/gd_config.yaml |
| Optimizer | AdamWeightDecay | AdamWeightDecay |
| Loss Function | SoftmaxCrossEntropy | SoftmaxCrossEntropy |
| Outputs | probability | probability |
|
||||
|
@ -489,7 +588,7 @@ bash run_infer_310.sh [MINDIR_PATH] [DATASET_PATH] [SCHEMA_DIR] [DATASET_TYPE] [
|
|||
|
||||
In run_standalone_td.sh, we set do_shuffle to shuffle the dataset.
|
||||
|
||||
In gd_config.py and td_config.py, we set hidden_dropout_prob and attention_probs_dropout_prob to randomly drop some network nodes.
|
||||
In gd_config.yaml and td_config/*.yaml, we set hidden_dropout_prob and attention_probs_dropout_prob to randomly drop some network nodes.
|
||||
|
||||
In run_general_distill.py, we set the random seed so that distributed training starts from the same initial weights on every device.
|
||||
|
||||
|
|
|
@ -16,12 +16,22 @@
 """postprocess"""

 import os
+import argparse
 import numpy as np
 from mindspore import Tensor
 from src.assessment_method import Accuracy, F1
-from src.model_utils.config import eval_cfg, config as args_opt


+parser = argparse.ArgumentParser(description='postprocess')
+parser.add_argument("--task_name", type=str, default="", choices=["SST-2", "QNLI", "MNLI", "TNEWS", "CLUENER"],
+                    help="The name of the task to train.")
+parser.add_argument("--assessment_method", type=str, default="accuracy", choices=["accuracy", "bf1", "mf1"],
+                    help="assessment_method include: [accuracy, bf1, mf1], default is accuracy")
+parser.add_argument("--result_path", type=str, default="./result_Files", help="result path")
+parser.add_argument("--label_path", type=str, default="./preprocess_Result/label_ids.npy", help="label path")
+args_opt = parser.parse_args()
+
+BATCH_SIZE = 32
 DEFAULT_NUM_LABELS = 2
 DEFAULT_SEQ_LENGTH = 128
 task_params = {"SST-2": {"num_labels": 2, "seq_length": 64},
@ -49,8 +59,11 @@ class Task:
         if self.task_name in task_params and "seq_length" in task_params[self.task_name]:
             return task_params[self.task_name]["seq_length"]
         return DEFAULT_SEQ_LENGTH


+task = Task(args_opt.task_name)
+
+
 def eval_result_print(assessment_method="accuracy", callback=None):
     """print eval result"""
     if assessment_method == "accuracy":
@ -79,9 +92,9 @@ def get_acc():
     labels = np.load(args_opt.label_path)
     file_num = len(os.listdir(args_opt.result_path))
     for i in range(file_num):
-        f_name = "tinybert_bs" + str(eval_cfg.batch_size) + "_" + str(i) + "_0.bin"
+        f_name = "tinybert_bs" + str(BATCH_SIZE) + "_" + str(i) + "_0.bin"
         logits = np.fromfile(os.path.join(args_opt.result_path, f_name), np.float32)
-        logits = logits.reshape(eval_cfg.batch_size, task.num_labels)
+        logits = logits.reshape(BATCH_SIZE, task.num_labels)
         label_ids = labels[i]
         callback.update(Tensor(logits), Tensor(label_ids))
     print("==============================================================")

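Reviewer note: the loop in `get_acc()` reads one raw float32 file per batch and reshapes it to `(BATCH_SIZE, num_labels)`. The sketch below mirrors that flow with NumPy only, as an illustration rather than the repository's code. `accuracy_from_bins`, `demo`, `NUM_LABELS` and the synthetic one-hot logits are hypothetical names and data introduced here; the real script feeds the logits into `Accuracy`/`F1` callbacks and sizes the loop from the directory listing rather than from the label array.

```python
import os
import tempfile

import numpy as np

BATCH_SIZE = 32  # the constant the diff hard-codes in place of eval_cfg.batch_size
NUM_LABELS = 2   # assumption: a two-class task such as SST-2


def accuracy_from_bins(result_path, labels, batch_size=BATCH_SIZE, num_labels=NUM_LABELS):
    """Score files named tinybert_bs<batch>_<i>_0.bin against labels[i]."""
    correct = 0
    total = 0
    for i in range(len(labels)):
        f_name = "tinybert_bs" + str(batch_size) + "_" + str(i) + "_0.bin"
        # Raw .bin files carry no shape or dtype, so both must be supplied here.
        logits = np.fromfile(os.path.join(result_path, f_name), np.float32)
        logits = logits.reshape(batch_size, num_labels)
        predictions = np.argmax(logits, axis=-1)
        correct += int(np.sum(predictions == labels[i].reshape(-1)))
        total += batch_size
    return correct / total


def demo():
    """Write two synthetic batches to a temporary directory and score them."""
    rng = np.random.default_rng(0)
    labels = rng.integers(0, NUM_LABELS, size=(2, BATCH_SIZE)).astype(np.int32)
    with tempfile.TemporaryDirectory() as result_path:
        for i in range(labels.shape[0]):
            # One-hot logits that always agree with the label.
            logits = np.eye(NUM_LABELS, dtype=np.float32)[labels[i]]
            f_name = "tinybert_bs%d_%d_0.bin" % (BATCH_SIZE, i)
            logits.tofile(os.path.join(result_path, f_name))
        return accuracy_from_bins(result_path, labels)
```

Because the reshape depends on the batch size, the value used here has to match the one used when the inference inputs were generated; a mismatch makes `reshape` raise a `ValueError` instead of silently mis-scoring.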
@ -16,11 +16,21 @@
 """preprocess"""

 import os
+import argparse
 import numpy as np
-from src.model_utils.config import eval_cfg, config as args_opt
 from src.dataset import create_tinybert_dataset, DataType


+parser = argparse.ArgumentParser(description='preprocess')
+parser.add_argument("--eval_data_dir", type=str, default="", help="Data path, it is better to use absolute path")
+parser.add_argument("--schema_dir", type=str, default="", help="Schema path, it is better to use absolute path")
+parser.add_argument("--dataset_type", type=str, default="tfrecord",
+                    help="dataset type tfrecord/mindrecord, default is tfrecord")
+parser.add_argument("--result_path", type=str, default="./preprocess_Result/", help="result path")
+args_opt = parser.parse_args()
+
+BATCH_SIZE = 32
+
 if args_opt.dataset_type == "tfrecord":
     dataset_type = DataType.TFRECORD
 elif args_opt.dataset_type == "mindrecord":
@ -28,6 +38,7 @@ elif args_opt.dataset_type == "mindrecord":
 else:
     raise Exception("dataset format is not supported yet")

+
 def get_bin():
     """
     generate bin files.
@ -41,7 +52,7 @@ def get_bin():
     os.makedirs(token_type_id_path)
     os.makedirs(input_mask_path)

-    eval_dataset = create_tinybert_dataset('td', batch_size=eval_cfg.batch_size,
+    eval_dataset = create_tinybert_dataset('td', batch_size=BATCH_SIZE,
                                            device_num=1, rank=0, do_shuffle="false",
                                            data_dir=args_opt.eval_data_dir,
                                            schema_dir=args_opt.schema_dir,
@ -49,7 +60,7 @@ def get_bin():
     columns_list = ["input_ids", "input_mask", "segment_ids", "label_ids"]
     label_list = []
     for j, data in enumerate(eval_dataset.create_dict_iterator(output_numpy=True, num_epochs=1)):
-        file_name = "tinybert_bs" + str(eval_cfg.batch_size) + "_" + str(j) + ".bin"
+        file_name = "tinybert_bs" + str(BATCH_SIZE) + "_" + str(j) + ".bin"
         input_data = []
         for i in columns_list:
             input_data.append(data[i])

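Reviewer note: the hunk above names one file per batch as `tinybert_bs<BATCH_SIZE>_<j>.bin`. The part of `get_bin()` that writes the arrays is not shown in this diff, so the following is only a sketch of the usual NumPy round trip under that naming scheme. `dump_batch`, `load_batch`, `round_trip` and `SEQ_LENGTH` are hypothetical names introduced for illustration.

```python
import os
import tempfile

import numpy as np

BATCH_SIZE = 32
SEQ_LENGTH = 64  # assumption: the SST-2 sequence length listed in task_params


def dump_batch(out_dir, j, array, batch_size=BATCH_SIZE):
    """Write one batch as raw bytes using the naming scheme from the diff."""
    file_name = "tinybert_bs" + str(batch_size) + "_" + str(j) + ".bin"
    path = os.path.join(out_dir, file_name)
    array.tofile(path)  # raw dump: no header, no dtype, no shape
    return path


def load_batch(path, dtype, shape):
    """Read a raw dump back; the caller has to know dtype and shape."""
    return np.fromfile(path, dtype).reshape(shape)


def round_trip():
    """Dump one synthetic batch and confirm it reads back unchanged."""
    batch = np.arange(BATCH_SIZE * SEQ_LENGTH, dtype=np.int32).reshape(BATCH_SIZE, SEQ_LENGTH)
    with tempfile.TemporaryDirectory() as out_dir:
        path = dump_batch(out_dir, 0, batch)
        restored = load_batch(path, np.int32, (BATCH_SIZE, SEQ_LENGTH))
        return os.path.basename(path), bool(np.array_equal(batch, restored))
```

Since the file carries no metadata, the batch size encoded in the file name is the only hint a consumer gets, which is presumably why both scripts now share the same `BATCH_SIZE = 32` constant.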
@ -16,17 +16,21 @@

 echo "=============================================================================================================="
 echo "Please run the script as: "
-echo "bash scripts/run_standalone_td.sh"
-echo "for example: bash scripts/run_standalone_td.sh"
+echo "bash scripts/run_standalone_td.sh [config_path]"
+echo "for example: bash scripts/run_standalone_td.sh /home/data1/td_config_sst2.yaml"
 echo "=============================================================================================================="

+if [ $# != 1 ]; then
+    echo "bash scripts/run_standalone_td.sh [config_path]"
+    exit 1
+fi
 mkdir -p ms_log
 PROJECT_DIR=$(cd "$(dirname "$0")" || exit; pwd)
 CUR_DIR=`pwd`
 export GLOG_log_dir=${CUR_DIR}/ms_log
 export GLOG_logtostderr=0
 python ${PROJECT_DIR}/../run_task_distill.py \
-    --config_path="../../td_config/td_config_sst2.yaml" \
+    --config_path=$1 \
     --device_target="Ascend" \
     --device_id=0 \
     --do_train="true" \

@ -152,9 +152,11 @@ def get_config():
     """
     Get Config according to the yaml file and cli arguments.
     """
-    def get_abs_path(path_relative):
+    def get_abs_path(path_input):
+        if os.path.isabs(path_input):
+            return path_input
         current_dir = os.path.dirname(os.path.abspath(__file__))
-        return os.path.join(current_dir, path_relative)
+        return os.path.join(current_dir, path_input)
     parser = argparse.ArgumentParser(description="default name", add_help=False)
     parser.add_argument("--config_path", type=get_abs_path, default="../../gd_config.yaml",
                         help="Config file path")

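Reviewer note: `get_abs_path` is passed to argparse as the `type` callable, so it runs on whatever string follows `--config_path` and also on the string default. The sketch below isolates that behaviour so it can be checked without the repository. The explicit `base_dir` parameter and the `build_parser` helper are assumptions made for testability; the code in the diff derives the base directory from `__file__` instead.

```python
import argparse
import os


def get_abs_path(path_input, base_dir):
    """Return absolute paths unchanged; resolve relative ones against base_dir."""
    if os.path.isabs(path_input):
        return path_input
    return os.path.join(base_dir, path_input)


def build_parser(base_dir):
    """Build a parser whose --config_path is normalised by get_abs_path."""
    parser = argparse.ArgumentParser(description="config path demo", add_help=False)
    parser.add_argument("--config_path",
                        # argparse calls `type` on CLI values and on string defaults.
                        type=lambda p: get_abs_path(p, base_dir),
                        default="../../gd_config.yaml",
                        help="Config file path")
    return parser
```

With this shape, a caller such as `run_standalone_td.sh` can pass an absolute YAML path like the `/home/data1/td_config_sst2.yaml` example from its usage message, while a relative path or the built-in default is still resolved against the directory of the config module.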
@ -40,9 +40,6 @@ dataset_type: "tfrecord"
 ckpt_file: ''
 file_name: "tinybert"
 file_format: "AIR"
-# postprocess related
-result_path: "./result_Files"
-label_path: "./preprocess_Result/label_ids.npy"
 phase1_cfg:
     batch_size: 32
     loss_scale_value: 256

@ -40,9 +40,6 @@ dataset_type: "tfrecord"
 ckpt_file: ''
 file_name: "tinybert"
 file_format: "AIR"
-# postprocess related
-result_path: "./result_Files"
-label_path: "./preprocess_Result/label_ids.npy"
 phase1_cfg:
     batch_size: 32
     loss_scale_value: 256

@ -40,9 +40,6 @@ dataset_type: "tfrecord"
 ckpt_file: ''
 file_name: "tinybert"
 file_format: "AIR"
-# postprocess related
-result_path: "./result_Files"
-label_path: "./preprocess_Result/label_ids.npy"
 phase1_cfg:
     batch_size: 32
     loss_scale_value: 256

@ -40,9 +40,6 @@ dataset_type: "tfrecord"
 ckpt_file: ''
 file_name: "tinybert"
 file_format: "AIR"
-# postprocess related
-result_path: "./result_Files"
-label_path: "./preprocess_Result/label_ids.npy"
 phase1_cfg:
     batch_size: 32
     loss_scale_value: 256

@ -40,9 +40,6 @@ dataset_type: "tfrecord"
 ckpt_file: ''
 file_name: "tinybert"
 file_format: "AIR"
-# postprocess related
-result_path: "./result_Files"
-label_path: "./preprocess_Result/label_ids.npy"
 phase1_cfg:
     batch_size: 32
     loss_scale_value: 256