diff --git a/model_zoo/official/nlp/bert/pretrain_config.yaml b/model_zoo/official/nlp/bert/pretrain_config.yaml index 7df98a24669..329133d403f 100644 --- a/model_zoo/official/nlp/bert/pretrain_config.yaml +++ b/model_zoo/official/nlp/bert/pretrain_config.yaml @@ -14,7 +14,7 @@ enable_profiling: False # ============================================================================== description: 'run_pretrain' distribute: 'false' -epoch_size: 1 +epoch_size: 40 device_id: 0 device_num: 1 enable_save_ckpt: 'true' diff --git a/model_zoo/official/nlp/tinybert/README.md b/model_zoo/official/nlp/tinybert/README.md index 2e38dac7918..da62e04fd1d 100644 --- a/model_zoo/official/nlp/tinybert/README.md +++ b/model_zoo/official/nlp/tinybert/README.md @@ -71,65 +71,122 @@ The backbone structure of TinyBERT is transformer, the transformer contains four # [Quick Start](#contents) -After installing MindSpore via the official website, you can start general distill, task distill and evaluation as follows: +- running on local -```text -# run standalone general distill example -bash scripts/run_standalone_gd.sh + After installing MindSpore via the official website, you can start general distill, task distill and evaluation as follows: -Before running the shell script, please set the `load_teacher_ckpt_path`, `data_dir`, `schema_dir` and `dataset_type` in the run_standalone_gd.sh file first. If running on GPU, please set the `device_target=GPU`. + ```text + # run standalone general distill example + bash scripts/run_standalone_gd.sh -# For Ascend device, run distributed general distill example -bash scripts/run_distributed_gd_ascend.sh 8 1 /path/hccl.json + Before running the shell script, please set the `load_teacher_ckpt_path`, `data_dir`, `schema_dir` and `dataset_type` in the run_standalone_gd.sh file first. If running on GPU, please set the `device_target=GPU`. -Before running the shell script, please set the `load_teacher_ckpt_path`, `data_dir`, `schema_dir` and `dataset_type` in the run_distributed_gd_ascend.sh file first. + # For Ascend device, run distributed general distill example + bash scripts/run_distributed_gd_ascend.sh 8 1 /path/hccl.json -# For GPU device, run distributed general distill example -bash scripts/run_distributed_gd_gpu.sh 8 1 /path/data/ /path/schema.json /path/teacher.ckpt + Before running the shell script, please set the `load_teacher_ckpt_path`, `data_dir`, `schema_dir` and `dataset_type` in the run_distributed_gd_ascend.sh file first. -# run task distill and evaluation example -bash scripts/run_standalone_td.sh + # For GPU device, run distributed general distill example + bash scripts/run_distributed_gd_gpu.sh 8 1 /path/data/ /path/schema.json /path/teacher.ckpt -Before running the shell script, please set the `task_name`, `load_teacher_ckpt_path`, `load_gd_ckpt_path`, `train_data_dir`, `eval_data_dir`, `schema_dir` and `dataset_type` in the run_standalone_td.sh file first. -If running on GPU, please set the `device_target=GPU`. -``` + # run task distill and evaluation example + bash scripts/run_standalone_td.sh {path}/*.yaml -For distributed training on Ascend, a hccl configuration file with JSON format needs to be created in advance. -Please follow the instructions in the link below: -https:gitee.com/mindspore/mindspore/tree/master/model_zoo/utils/hccl_tools. + Before running the shell script, please set the `task_name`, `load_teacher_ckpt_path`, `load_gd_ckpt_path`, `train_data_dir`, `eval_data_dir`, `schema_dir` and `dataset_type` in the run_standalone_td.sh file first. 
+ If running on GPU, please set the `device_target=GPU`. + ``` -For dataset, if you want to set the format and parameters, a schema configuration file with JSON format needs to be created, please refer to [tfrecord](https://www.mindspore.cn/doc/programming_guide/en/master/dataset_loading.html#tfrecord) format. + For distributed training on Ascend, a hccl configuration file with JSON format needs to be created in advance. + Please follow the instructions in the link below: + https:gitee.com/mindspore/mindspore/tree/master/model_zoo/utils/hccl_tools. -```text -For general task, schema file contains ["input_ids", "input_mask", "segment_ids"]. + For dataset, if you want to set the format and parameters, a schema configuration file with JSON format needs to be created, please refer to [tfrecord](https://www.mindspore.cn/doc/programming_guide/en/master/dataset_loading.html#tfrecord) format. -For task distill and eval phase, schema file contains ["input_ids", "input_mask", "segment_ids", "label_ids"]. + ```text + For general task, schema file contains ["input_ids", "input_mask", "segment_ids"]. -`numRows` is the only option which could be set by user, the others value must be set according to the dataset. + For task distill and eval phase, schema file contains ["input_ids", "input_mask", "segment_ids", "label_ids"]. -For example, the dataset is cn-wiki-128, the schema file for general distill phase as following: -{ - "datasetType": "TF", - "numRows": 7680, - "columns": { - "input_ids": { - "type": "int64", - "rank": 1, - "shape": [256] - }, - "input_mask": { - "type": "int64", - "rank": 1, - "shape": [256] - }, - "segment_ids": { - "type": "int64", - "rank": 1, - "shape": [256] + `numRows` is the only option which could be set by user, the others value must be set according to the dataset. + + For example, the dataset is cn-wiki-128, the schema file for general distill phase as following: + { + "datasetType": "TF", + "numRows": 7680, + "columns": { + "input_ids": { + "type": "int64", + "rank": 1, + "shape": [256] + }, + "input_mask": { + "type": "int64", + "rank": 1, + "shape": [256] + }, + "segment_ids": { + "type": "int64", + "rank": 1, + "shape": [256] + } } } -} -``` + ``` + +- running on ModelArts + + If you want to run in modelarts, please check the official documentation of [modelarts](https://support.huaweicloud.com/modelarts/), and you can start training as follows + + - general_distill with 8 cards on ModelArts + + ```python + # (1) Upload the code folder to S3 bucket. + # (2) Click to "create training task" on the website UI interface. + # (3) Set the code directory to "/{path}/tinybert" on the website UI interface. + # (4) Set the startup file to /{path}/tinybert/run_general_distill.py" on the website UI interface. + # (5) Perform a or b. + # a. setting parameters in /{path}/tinybert/gd_config.yaml. + # 1. Set ”enable_modelarts=True“ + # 2. Set other parameters('config_path' cannot be set here), other parameter configuration can refer to `./scripts/run_distributed_gd_ascend.sh` + # b. adding on the website UI interface. + # 1. Add ”enable_modelarts=True“ + # 3. Add other parameters, other parameter configuration can refer to `./scripts/run_distributed_gd_ascend.sh` + # Note that 'data_dir' and 'schema_dir' fill in the relative path relative to the path selected in step 7. 
+ # Add "config_path=../../gd_config.yaml" on the webpage ('config_path' is the path of the'*.yaml' file relative to {path}/tinybert/src/model_utils/config.py, and'* .yaml' file must be in {path}/bert/) + # (6) Upload the dataset to S3 bucket. + # (7) Check the "data storage location" on the website UI interface and set the "Dataset path" path (there is only data or zip package under this path). + # (8) Set the "Output file path" and "Job log path" to your path on the website UI interface. + # (9) Under the item "resource pool selection", select the specification of 8 cards. + # (10) Create your job. + # After training, the '*.ckpt' file will be saved under the'training output file path' + ``` + + - Running task_distill with single card on ModelArts + + ```python + # (1) Upload the code folder to S3 bucket. + # (2) Click to "create training task" on the website UI interface. + # (3) Set the code directory to "/{path}/tinybert" on the website UI interface. + # (4) Set the startup file to /{path}/tinybert/run_ner.py"(or run_pretrain.py or run_squad.py) on the website UI interface. + # (5) Perform a or b. + # Add "config_path=../../td_config/td_config_sst2.yaml" on the web page (select the *.yaml configuration file according to the distill task) + # a. setting parameters in task_ner_config.yaml(or task_squad_config.yaml or task_classifier_config.yaml under the folder `/{path}/bert/` + # 1. Set ”enable_modelarts=True“ + # 2. Set "task_name=SST-2" (depending on the task, select from ["SST-2", "QNLI", "MNLI", "TNEWS", "CLUENER"]) + # 3. Set other parameters, other parameter configuration can refer to './scripts/run_standalone_td.sh'. + # b. adding on the website UI interface. + # 1. Add ”enable_modelarts=True“ + # 2. Add "task_name=SST-2" (depending on the task, select from ["SST-2", "QNLI", "MNLI", "TNEWS", "CLUENER"]) + # 3. Add other parameters, other parameter configuration can refer to './scripts/run_standalone_td.sh'. + # Note that 'load_teacher_ckpt_path', 'train_data_dir', 'eval_data_dir' and 'schema_dir' fill in the relative path relative to the path selected in step 7. + # Note that 'load_gd_ckpt_path' fills in the relative path relative to the path selected in step 3. + # (6) Upload the dataset to S3 bucket. + # (7) Check the "data storage location" on the website UI interface and set the "Dataset path" path. + # (8) Set the "Output file path" and "Job log path" to your path on the website UI interface. + # (9) Under the item "resource pool selection", select the specification of a single card. + # (10) Create your job. + # After training, the '*.ckpt' file will be saved under the'training output file path'. + ``` # [Script Description](#contents) @@ -139,23 +196,39 @@ For example, the dataset is cn-wiki-128, the schema file for general distill pha . 
└─bert
  ├─README.md
+ ├─README_CN.md
  ├─scripts
    ├─run_distributed_gd_ascend.sh     # shell script for distributed general distill phase on Ascend
    ├─run_distributed_gd_gpu.sh        # shell script for distributed general distill phase on GPU
+   ├─run_infer_310.sh                 # shell script for 310 infer
    ├─run_standalone_gd.sh             # shell script for standalone general distill phase
    ├─run_standalone_td.sh             # shell script for standalone task distill phase
  ├─src
+   ├─model_utils
+     ├── config.py                    # parse *.yaml parameter configuration file
+     ├── device_adapter.py            # distinguish local/ModelArts training
+     ├── local_adapter.py             # get related environment variables in local training
+     └── moxing_adapter.py            # get related environment variables in ModelArts training
    ├─__init__.py
    ├─assessment_method.py             # assessment method for evaluation
    ├─dataset.py                       # data processing
-   ├─gd_config.py                     # parameter configuration for general distill phase
-   ├─td_config.py                     # parameter configuration for task distill phase
    ├─tinybert_for_gd_td.py            # backbone code of network
    ├─tinybert_model.py                # backbone code of network
    ├─utils.py                         # util function
+ ├─td_config                          # folder where *.yaml files of different distillation tasks are located
+   ├── td_config_15cls.yaml
+   ├── td_config_mnli.yaml
+   ├── td_config_ner.yaml
+   ├── td_config_qnli.yaml
+   └── td_config_sst2.yaml
  ├─__init__.py
+ ├─export.py                          # export scripts
+ ├─gd_config.yaml                     # parameter configuration for general_distill
+ ├─mindspore_hub_conf.py              # MindSpore Hub interface
+ ├─postprocess.py                     # scripts for 310 postprocess
+ ├─preprocess.py                      # scripts for 310 preprocess
  ├─run_general_distill.py             # train net for general distillation
- ├─run_task_distill.py                # train and eval net for task distillation
+ └─run_task_distill.py                # train and eval net for task distillation
```

## [Script Parameters](#contents)

@@ -231,7 +304,7 @@ options:

## Options and Parameters

-`gd_config.py` and `td_config.py` contain parameters of BERT model and options for optimizer and lossscale.
+`gd_config.yaml` and `td_config/*.yaml` contain parameters of BERT model and options for optimizer and lossscale.

### Options

@@ -358,7 +431,7 @@ If you want to after running and continue to eval, please set `do_train=true` an

#### evaluation on SST-2 dataset

```bash
-bash scripts/run_standalone_td.sh
+bash scripts/run_standalone_td.sh {path}/*.yaml
```

The command above will run in the background, you can view the results the file log.txt.
The accuracy of the test dataset will be as follows:

@@ -378,7 +451,7 @@ The best acc is 0.902777

Before running the command below, please check the load pretrain checkpoint path has been set. Please set the checkpoint path to be the absolute full path, e.g:"/username/pretrain/checkpoint_100_300.ckpt".

```bash
-bash scripts/run_standalone_td.sh
+bash scripts/run_standalone_td.sh {path}/*.yaml
```

The command above will run in the background, you can view the results the file log.txt.
The accuracy of the test dataset will be as follows:

@@ -398,7 +471,7 @@ The best acc is 0.813929

Before running the command below, please check the load pretrain checkpoint path has been set. Please set the checkpoint path to be the absolute full path, e.g:"/username/pretrain/checkpoint_100_300.ckpt".

```bash
-bash scripts/run_standalone_td.sh
+bash scripts/run_standalone_td.sh {path}/*.yaml
```

The command above will run in the background, you can view the results the file log.txt.
The accuracy of the test dataset will be as follows:

@@ -417,6 +490,8 @@ The best acc is 0.891176

### [Export MindIR](#contents)

+- Export on local
+
```shell
python export.py --ckpt_file [CKPT_PATH] --file_name [FILE_NAME] --file_format [FILE_FORMAT]
```

@@ -424,6 +499,32 @@ python export.py --ckpt_file [CKPT_PATH] --file_name [FILE_NAME] --file_format [
The ckpt_file parameter is required,
`EXPORT_FORMAT` should be in ["AIR", "MINDIR"]

+- Export on ModelArts (If you want to run in modelarts, please check the official documentation of [modelarts](https://support.huaweicloud.com/modelarts/), and you can start as follows)
+
+```python
+# (1) Upload the code folder to S3 bucket.
+# (2) Click to "create training task" on the website UI interface.
+# (3) Set the code directory to "/{path}/tinybert" on the website UI interface.
+# (4) Set the startup file to "/{path}/tinybert/export.py" on the website UI interface.
+# (5) Perform a or b.
+# a. Set parameters in a *.yaml file under /{path}/tinybert/td_config/
+# 1. Set ”enable_modelarts: True“
+# 2. Set “ckpt_file: ./{path}/*.ckpt” ('ckpt_file' indicates the path of the weight file to be exported relative to the file `export.py`, and the weight file must be included in the code directory.)
+# 3. Set ”file_name: tinybert_sst2“
+# 4. Set ”file_format: MINDIR“
+# b. Adding on the website UI interface.
+# 1. Add ”enable_modelarts=True“
+# 2. Add “ckpt_file=./{path}/*.ckpt” ('ckpt_file' indicates the path of the weight file to be exported relative to the file `export.py`, and the weight file must be included in the code directory.)
+# 3. Add ”file_name=tinybert_sst2“
+# 4. Add ”file_format=MINDIR“
+# Finally, "config_path=../../td_config/*.yaml" must be added on the web page (select the *.yaml configuration file according to the downstream task)
+# (7) Check the "data storage location" on the website UI interface and set the "Dataset path" path. (This field is not actually used for export, but the UI requires it.)
+# (8) Set the "Output file path" and "Job log path" to your path on the website UI interface.
+# (9) Under the item "resource pool selection", select the specification of a single card.
+# (10) Create your job.
+# You will see tinybert_sst2.mindir under {Output file path}.
+```
+
### Infer on Ascend310

Before performing inference, the mindir file must be exported by `export.py` script. We only provide an example of inference using MINDIR model.

@@ -459,7 +560,7 @@ Inference result is saved in current path, you can find result like this in acc.
| uploaded Date | 08/20/2020 | 08/24/2020 |
| MindSpore Version | 1.0.0 | 1.0.0 |
| Dataset | en-wiki-128 | en-wiki-128 |
-| Training Parameters | src/gd_config.py | src/gd_config.py |
+| Training Parameters | src/gd_config.yaml | src/gd_config.yaml |
| Optimizer | AdamWeightDecay | AdamWeightDecay |
| Loss Function | SoftmaxCrossEntropy | SoftmaxCrossEntropy |
| outputs | probability | probability |

@@ -489,7 +590,7 @@ Inference result is saved in current path, you can find result like this in acc.
In run_standaloned_td.sh, we set do_shuffle to shuffle the dataset.

-In gd_config.py and td_config.py, we set the hidden_dropout_prob and attention_pros_dropout_prob to dropout some network node.
+In gd_config.yaml and td_config/*.yaml, we set the hidden_dropout_prob and attention_pros_dropout_prob to drop out some network nodes.

In run_general_distill.py, we set the random seed to make sure distribute training has the same init weight.
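For a quick sanity check of the Ascend 310 flow added above, the `.bin` result files can be scored without the repo's `Accuracy` class. A minimal NumPy-only sketch for SST-2 follows; it assumes the fixed batch size of 32 and the `tinybert_bs{batch}_{i}_0.bin` naming used by the new `preprocess.py`/`postprocess.py`, and the paths are placeholders, not part of this change:

```python
import os
import numpy as np

BATCH_SIZE = 32   # fixed batch size used by preprocess.py / postprocess.py
NUM_LABELS = 2    # SST-2; other tasks use the sizes listed in task_params

def sst2_accuracy_from_bins(result_path, label_path):
    """Score 310 inference outputs (.bin logits) against the labels dumped by preprocess.py."""
    labels = np.load(label_path)                     # one array of label_ids per batch
    correct, total = 0, 0
    for i in range(len(os.listdir(result_path))):
        f_name = "tinybert_bs" + str(BATCH_SIZE) + "_" + str(i) + "_0.bin"
        logits = np.fromfile(os.path.join(result_path, f_name), np.float32)
        logits = logits.reshape(BATCH_SIZE, NUM_LABELS)
        preds = np.argmax(logits, axis=-1)
        correct += int((preds == labels[i].reshape(-1)).sum())
        total += BATCH_SIZE
    return correct / total

if __name__ == "__main__":
    print(sst2_accuracy_from_bins("./result_Files", "./preprocess_Result/label_ids.npy"))
```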
diff --git a/model_zoo/official/nlp/tinybert/README_CN.md b/model_zoo/official/nlp/tinybert/README_CN.md index 23569c0b79b..afbe7197ea0 100644 --- a/model_zoo/official/nlp/tinybert/README_CN.md +++ b/model_zoo/official/nlp/tinybert/README_CN.md @@ -78,63 +78,118 @@ TinyBERT模型的主干结构是转换器,转换器包含四个编码器模块 从官网下载安装MindSpore之后,可以开始一般蒸馏。任务蒸馏和评估方法如下: -```bash -# 单机运行一般蒸馏示例 -bash scripts/run_standalone_gd.sh +- 在本地运行 -Before running the shell script, please set the `load_teacher_ckpt_path`, `data_dir`, `schema_dir` and `dataset_type` in the run_standalone_gd.sh file first. If running on GPU, please set the `device_target=GPU`. + ```bash + # 单机运行一般蒸馏示例 + bash scripts/run_standalone_gd.sh -# Ascend设备上分布式运行一般蒸馏示例 -bash scripts/run_distributed_gd_ascend.sh 8 1 /path/hccl.json + Before running the shell script, please set the `load_teacher_ckpt_path`, `data_dir`, `schema_dir` and `dataset_type` in the run_standalone_gd.sh file first. If running on GPU, please set the `device_target=GPU`. -Before running the shell script, please set the `load_teacher_ckpt_path`, `data_dir`, `schema_dir` and `dataset_type` in the run_distributed_gd_ascend.sh file first. + # Ascend设备上分布式运行一般蒸馏示例 + bash scripts/run_distributed_gd_ascend.sh 8 1 /path/hccl.json -# GPU设备上分布式运行一般蒸馏示例 -bash scripts/run_distributed_gd_gpu.sh 8 1 /path/data/ /path/schema.json /path/teacher.ckpt + Before running the shell script, please set the `load_teacher_ckpt_path`, `data_dir`, `schema_dir` and `dataset_type` in the run_distributed_gd_ascend.sh file first. -# 运行任务蒸馏和评估示例 -bash scripts/run_standalone_td.sh + # GPU设备上分布式运行一般蒸馏示例 + bash scripts/run_distributed_gd_gpu.sh 8 1 /path/data/ /path/schema.json /path/teacher.ckpt -Before running the shell script, please set the `task_name`, `load_teacher_ckpt_path`, `load_gd_ckpt_path`, `train_data_dir`, `eval_data_dir`, `schema_dir` and `dataset_type` in the run_standalone_td.sh file first. -If running on GPU, please set the `device_target=GPU`. -``` + # 运行任务蒸馏和评估示例 + bash scripts/run_standalone_td.sh {path}/*.yaml -若在Ascend设备上运行分布式训练,请提前创建JSON格式的HCCL配置文件。 -详情参见如下链接: -https:gitee.com/mindspore/mindspore/tree/master/model_zoo/utils/hccl_tools. + Before running the shell script, please set the `task_name`, `load_teacher_ckpt_path`, `load_gd_ckpt_path`, `train_data_dir`, `eval_data_dir`, `schema_dir` and `dataset_type` in the run_standalone_td.sh file first. + If running on GPU, please set the `device_target=GPU`. + ``` -如需设置数据集格式和参数,请创建JSON格式的视图配置文件,详见[TFRecord](https://www.mindspore.cn/doc/programming_guide/zh-CN/master/dataset_loading.html#tfrecord) 格式。 + 若在Ascend设备上运行分布式训练,请提前创建JSON格式的HCCL配置文件。 + 详情参见如下链接: + https:gitee.com/mindspore/mindspore/tree/master/model_zoo/utils/hccl_tools. -```text -For general task, schema file contains ["input_ids", "input_mask", "segment_ids"]. + 如需设置数据集格式和参数,请创建JSON格式的视图配置文件,详见[TFRecord](https://www.mindspore.cn/doc/programming_guide/zh-CN/master/dataset_loading.html#tfrecord) 格式。 -For task distill and eval phase, schema file contains ["input_ids", "input_mask", "segment_ids", "label_ids"]. + ```text + For general task, schema file contains ["input_ids", "input_mask", "segment_ids"]. -`numRows` is the only option which could be set by user, the others value must be set according to the dataset. + For task distill and eval phase, schema file contains ["input_ids", "input_mask", "segment_ids", "label_ids"]. 
-For example, the dataset is cn-wiki-128, the schema file for general distill phase as following: -{ - "datasetType": "TF", - "numRows": 7680, - "columns": { - "input_ids": { - "type": "int64", - "rank": 1, - "shape": [256] - }, - "input_mask": { - "type": "int64", - "rank": 1, - "shape": [256] - }, - "segment_ids": { - "type": "int64", - "rank": 1, - "shape": [256] - } - } -} -``` + `numRows` is the only option which could be set by user, the others value must be set according to the dataset. + + For example, the dataset is cn-wiki-128, the schema file for general distill phase as following: + { + "datasetType": "TF", + "numRows": 7680, + "columns": { + "input_ids": { + "type": "int64", + "rank": 1, + "shape": [256] + }, + "input_mask": { + "type": "int64", + "rank": 1, + "shape": [256] + }, + "segment_ids": { + "type": "int64", + "rank": 1, + "shape": [256] + } + } + } + ``` + +- 在ModelArts上运行(如果你想在modelarts上运行,可以参考以下文档 [modelarts](https://support.huaweicloud.com/modelarts/)) + + - 在ModelArt上使用8卡一般蒸馏 + + ```python + # (1) 上传你的代码到 s3 桶上 + # (2) 在ModelArts上创建训练任务 + # (3) 选择代码目录 /{path}/tinybert + # (4) 选择启动文件 /{path}/tinybert/run_general_distill.py + # (5) 执行a或b + # a. 在 /{path}/tinybert/default_config.yaml 文件中设置参数 + # 1. 设置 ”enable_modelarts=True“ + # 2. 设置其它参数(config_path无法在这里设置),其它参数配置可以参考 `./scripts/run_distributed_gd_ascend.sh` + # b. 在 网页上设置 + # 1. 添加 ”enable_modelarts=True“ + # 2. 添加其它参数,其它参数配置可以参考 `./scripts/run_distributed_gd_ascend.sh` + # 注意'data_dir'、'schema_dir'填写相对于第7步所选路径的相对路径。 + # 在网页上添加 “config_path=../../gd_config.yaml”('config_path' 是'*.yaml'文件相对于 {path}/tinybert/src/model_utils/config.py的路径, 且'*.yaml'文件必须在{path}/bert/内) + # (6) 上传你的 数据 到 s3 桶上 + # (7) 在网页上勾选数据存储位置,设置“训练数据集”路径 + # (8) 在网页上设置“训练输出文件路径”、“作业日志路径” + # (9) 在网页上的’资源池选择‘项目下, 选择8卡规格的资源 + # (10) 创建训练作业 + # 训练结束后会在'训练输出文件路径'下保存训练的权重 + ``` + + - 在ModelArts上使用单卡运行任务蒸馏 + + ```python + # (1) 上传你的代码到 s3 桶上 + # (2) 在ModelArts上创建训练任务 + # (3) 选择代码目录 /{path}/tinybert + # (4) 选择启动文件 /{path}/tinybert/run_task_distill.py + # (5) 在网页上添加 “config_path=../../td_config/td_config_sst2.yaml”(根据蒸馏任务选择 *.yaml 配置文件) + # 执行a或b + # a. 在选定的'*.yaml'文件中设置参数 + # 1. 设置 ”enable_modelarts=True“ + # 2. 设置 ”task_name=SST-2“(根据任务不同,在["SST-2", "QNLI", "MNLI", "TNEWS", "CLUENER"]中选择) + # 3. 设置其它参数,其它参数配置可以参考 './scripts/'下的 `run_standalone_td.sh` + # b. 在 网页上设置 + # 1. 添加 ”enable_modelarts=True“ + # 2. 添加 ”task_name=SST-2“(根据任务不同,在["SST-2", "QNLI", "MNLI", "TNEWS", "CLUENER"]中选择) + # 3. 添加其它参数,其它参数配置可以参考 './scripts/'下的 `run_standalone_td.sh` + # 注意load_teacher_ckpt_path,train_data_dir,eval_data_dir,schema_dir填写相对于第7步所选路径的相对路径。 + # 注意load_gd_ckpt_path填写相对于第3步所选路径的相对路径 + # (6) 上传你的 数据 到 s3 桶上 + # (7) 在网页上勾选数据存储位置,设置“训练数据集”路径 + # (8) 在网页上设置“训练输出文件路径”、“作业日志路径” + # (9) 在网页上的’资源池选择‘项目下, 选择单卡规格的资源 + # (10) 创建训练作业 + # 训练结束后会在'训练输出文件路径'下保存训练的权重 + ``` # 脚本说明 @@ -142,25 +197,41 @@ For example, the dataset is cn-wiki-128, the schema file for general distill pha ```shell . 
-└─bert +└─tinybert ├─README.md + ├─README_CN.md ├─scripts ├─run_distributed_gd_ascend.sh # Ascend设备上分布式运行一般蒸馏的shell脚本 ├─run_distributed_gd_gpu.sh # GPU设备上分布式运行一般蒸馏的shell脚本 + ├─run_infer_310.sh # 310推理的shell脚本 ├─run_standalone_gd.sh # 单机运行一般蒸馏的shell脚本 ├─run_standalone_td.sh # 单机运行任务蒸馏的shell脚本 ├─src + ├─model_utils + ├── config.py # 解析 *.yaml参数配置文件 + ├── devcie_adapter.py # 区分本地/ModelArts训练 + ├── local_adapter.py # 本地训练获取相关环境变量 + └── moxing_adapter.py # ModelArts训练获取相关环境变量、交换数据 ├─__init__.py ├─assessment_method.py # 评估过程的测评方法 ├─dataset.py # 数据处理 - ├─gd_config.py # 一般蒸馏阶段的参数配置 - ├─td_config.py # 任务蒸馏阶段的参数配置 ├─tinybert_for_gd_td.py # 网络骨干编码 ├─tinybert_model.py # 网络骨干编码 ├─utils.py # util函数 + ├─td_config # 不同蒸馏任务的*.yaml文件所在文件夹 + ├── td_config_15cls.yaml + ├── td_config_mnli.py + ├── td_config_ner.py + ├── td_config_qnli.py + └── td_config_stt2.py ├─__init__.py + ├─export.py # export scripts + ├─gd_config.yaml # 一般蒸馏参数配置文件 + ├─mindspore_hub_conf.py # Mindspore Hub接口 + ├─postprocess.py # 310推理前处理脚本 + ├─preprocess.py # 310推理后处理脚本 ├─run_general_distill.py # 一般蒸馏训练网络 - ├─run_task_distill.py # 任务蒸馏训练评估网络 + └─run_task_distill.py # 任务蒸馏训练评估网络 ``` ## 脚本参数 @@ -233,7 +304,7 @@ options: ## 选项及参数 -`gd_config.py` and `td_config.py` 包含BERT模型参数与优化器和损失缩放选项。 +`gd_config.yaml` and `td_config/*.yaml` 包含BERT模型参数与优化器和损失缩放选项。 ### 选项 @@ -321,7 +392,7 @@ epoch: 1, step: 100, outputs are 28.2093 运行以下命令前,确保已设置load_teacher_ckpt_path、data_dir和schma_dir。请将路径设置为绝对全路径,例如/username/checkpoint_100_300.ckpt。 ```bash -bash scripts/run_distributed_gd_ascend.sh 8 1 /path/hccl.json +bash scripts/run_distributed_gd_ascend.sh 8 1 /path/hccl.json /path/gd_config.json ``` 以上命令后台运行,您可以在log.txt文件中查看运行结果。训练后,可以得到默认log*文件夹路径下的检查点文件。 得到如下损失值: @@ -339,7 +410,7 @@ epoch: 1, step: 100, outputs are (Tensor(shape=[1], dtype=Float32, 30.5901), Ten 输入绝对全路径,例如:"/username/checkpoint_100_300.ckpt"。 ```bash -bash scripts/run_distributed_gd_gpu.sh 8 1 /path/data/ /path/schema.json /path/teacher.ckpt +bash scripts/run_distributed_gd_gpu.sh 8 1 /path/data/ /path/schema.json /path/teacher.ckpt /path/gd_config.json ``` 以上命令后台运行,您可以在log.txt文件中查看运行结果。训练结束后,您可以在默认LOG*文件夹下找到检查点文件。得到如下损失值: @@ -359,7 +430,7 @@ epoch: 1, step: 1, outputs are 63.4098 #### 基于SST-2数据集进行评估 ```bash -bash scripts/run_standalone_td.sh +bash scripts/run_standalone_td.sh {path}/*.yaml ``` 以上命令后台运行,您可以在log.txt文件中查看运行结果。得出如下测试数据集准确率: @@ -379,7 +450,7 @@ The best acc is 0.902777 运行如下命令前,请确保已设置加载与训练检查点路径。请将检查点路径设置为绝对全路径,例如,/username/pretrain/checkpoint_100_300.ckpt。 ```bash -bash scripts/run_standalone_td.sh +bash scripts/run_standalone_td.sh {path}/*.yaml ``` 以上命令将在后台运行,请在log.txt文件中查看结果。测试数据集的准确率如下: @@ -399,7 +470,7 @@ The best acc is 0.813929 运行如下命令前,请确保已设置加载与训练检查点路径。请将检查点路径设置为绝对全路径,例如/username/pretrain/checkpoint_100_300.ckpt。 ```bash -bash scripts/run_standalone_td.sh +bash scripts/run_standalone_td.sh {path}/*.yaml ``` 以上命令后台运行,您可以在log.txt文件中查看运行结果。测试数据集的准确率如下: @@ -418,8 +489,36 @@ The best acc is 0.891176 ### [导出MindIR](#contents) -```shell -python export.py --ckpt_file [CKPT_PATH] --file_name [FILE_NAME] --file_format [FILE_FORMAT] +- 在本地导出 + + ```shell + python export.py --ckpt_file [CKPT_PATH] --file_name [FILE_NAME] --file_format [FILE_FORMAT] + ``` + +- 在ModelArts上导出 + +```python +# (1) 上传你的代码到 s3 桶上 +# (2) 在ModelArts上创建训练任务 +# (3) 选择代码目录 /{path}/tinybert +# (4) 选择启动文件 /{path}/tinybert/export.py +# (5) 执行a或b +# a. 在 /path/tinybert/td_config/ 下的某个*.yaml文件中设置参数 +# 1. 设置 ”enable_modelarts: True“ +# 2. 
设置 “ckpt_file: ./{path}/*.ckpt”('ckpt_file' 指待导出的'*.ckpt'权重文件相对于`export.py`的路径, 且权重文件必须包含在代码目录下) +# 3. 设置 ”file_name: tinybert_sst2“ +# 4. 设置 ”file_format:MINDIR“ +# b. 在 网页上设置 +# 1. 添加 ”enable_modelarts=True“ +# 2. 添加 “ckpt_file=./{path}/*.ckpt”(('ckpt_file' 指待导出的'*.ckpt'权重文件相对于`export.py`的路径, 且权重文件必须包含在代码目录下) +# 3. 添加 ”file_name=tinybert_sst2“ +# 4. 添加 ”file_format=MINDIR“ +# 最后必须在网页上添加 “config_path=../../td_config/*.yaml”(根据下游任务选择 *.yaml 配置文件) +# (7) 在网页上勾选数据存储位置,设置“训练数据集”路径(虽然没用,但要做) +# (8) 在网页上设置“训练输出文件路径”、“作业日志路径” +# (9) 在网页上的’资源池选择‘项目下, 选择单卡规格的资源 +# (10) 创建训练作业 +# 你将在{Output file path}下看到 'tinybert_sst2.mindir'文件 ``` 参数ckpt_file为必填项, @@ -460,7 +559,7 @@ bash run_infer_310.sh [MINDIR_PATH] [DATASET_PATH] [SCHEMA_DIR] [DATASET_TYPE] [ | 上传日期 | 2020-08-20 | 2020-08-24 | | MindSpore版本 | 0.6.0 | 0.7.0 | | 数据集 | en-wiki-128 | en-wiki-128 | -| 训练参数 | src/gd_config.py | src/gd_config.py | +| 训练参数 | src/gd_config.yaml | src/gd_config.yaml | | 优化器| AdamWeightDecay | AdamWeightDecay | | 损耗函数 | SoftmaxCrossEntropy | SoftmaxCrossEntropy | | 输出 | 概率 | 概率 | @@ -489,7 +588,7 @@ bash run_infer_310.sh [MINDIR_PATH] [DATASET_PATH] [SCHEMA_DIR] [DATASET_TYPE] [ run_standaloned_td.sh脚本中设置了do_shuffle来轮换数据集。 -gd_config.py和td_config.py文件中设置了hidden_dropout_prob和attention_pros_dropout_prob,使网点随机失活。 +gd_config.yaml和td_config/*.yaml文件中设置了hidden_dropout_prob和attention_pros_dropout_prob,使网点随机失活。 run_general_distill.py文件中设置了随机种子,确保分布式训练初始权重相同。 diff --git a/model_zoo/official/nlp/tinybert/postprocess.py b/model_zoo/official/nlp/tinybert/postprocess.py index 276e82cbe10..defa4614c17 100644 --- a/model_zoo/official/nlp/tinybert/postprocess.py +++ b/model_zoo/official/nlp/tinybert/postprocess.py @@ -16,12 +16,22 @@ """postprocess""" import os +import argparse import numpy as np from mindspore import Tensor from src.assessment_method import Accuracy, F1 -from src.model_utils.config import eval_cfg, config as args_opt +parser = argparse.ArgumentParser(description='postprocess') +parser.add_argument("--task_name", type=str, default="", choices=["SST-2", "QNLI", "MNLI", "TNEWS", "CLUENER"], + help="The name of the task to train.") +parser.add_argument("--assessment_method", type=str, default="accuracy", choices=["accuracy", "bf1", "mf1"], + help="assessment_method include: [accuracy, bf1, mf1], default is accuracy") +parser.add_argument("--result_path", type=str, default="./result_Files", help="result path") +parser.add_argument("--label_path", type=str, default="./preprocess_Result/label_ids.npy", help="label path") +args_opt = parser.parse_args() + +BATCH_SIZE = 32 DEFAULT_NUM_LABELS = 2 DEFAULT_SEQ_LENGTH = 128 task_params = {"SST-2": {"num_labels": 2, "seq_length": 64}, @@ -49,8 +59,11 @@ class Task: if self.task_name in task_params and "seq_length" in task_params[self.task_name]: return task_params[self.task_name]["seq_length"] return DEFAULT_SEQ_LENGTH + + task = Task(args_opt.task_name) + def eval_result_print(assessment_method="accuracy", callback=None): """print eval result""" if assessment_method == "accuracy": @@ -79,9 +92,9 @@ def get_acc(): labels = np.load(args_opt.label_path) file_num = len(os.listdir(args_opt.result_path)) for i in range(file_num): - f_name = "tinybert_bs" + str(eval_cfg.batch_size) + "_" + str(i) + "_0.bin" + f_name = "tinybert_bs" + str(BATCH_SIZE) + "_" + str(i) + "_0.bin" logits = np.fromfile(os.path.join(args_opt.result_path, f_name), np.float32) - logits = logits.reshape(eval_cfg.batch_size, task.num_labels) + logits = logits.reshape(BATCH_SIZE, task.num_labels) label_ids = 
labels[i] callback.update(Tensor(logits), Tensor(label_ids)) print("==============================================================") diff --git a/model_zoo/official/nlp/tinybert/preprocess.py b/model_zoo/official/nlp/tinybert/preprocess.py index 0c97dea76de..e5a1612dde9 100644 --- a/model_zoo/official/nlp/tinybert/preprocess.py +++ b/model_zoo/official/nlp/tinybert/preprocess.py @@ -16,11 +16,21 @@ """preprocess""" import os +import argparse import numpy as np -from src.model_utils.config import eval_cfg, config as args_opt from src.dataset import create_tinybert_dataset, DataType +parser = argparse.ArgumentParser(description='preprocess') +parser.add_argument("--eval_data_dir", type=str, default="", help="Data path, it is better to use absolute path") +parser.add_argument("--schema_dir", type=str, default="", help="Schema path, it is better to use absolute path") +parser.add_argument("--dataset_type", type=str, default="tfrecord", + help="dataset type tfrecord/mindrecord, default is tfrecord") +parser.add_argument("--result_path", type=str, default="./preprocess_Result/", help="result path") +args_opt = parser.parse_args() + +BATCH_SIZE = 32 + if args_opt.dataset_type == "tfrecord": dataset_type = DataType.TFRECORD elif args_opt.dataset_type == "mindrecord": @@ -28,6 +38,7 @@ elif args_opt.dataset_type == "mindrecord": else: raise Exception("dataset format is not supported yet") + def get_bin(): """ generate bin files. @@ -41,7 +52,7 @@ def get_bin(): os.makedirs(token_type_id_path) os.makedirs(input_mask_path) - eval_dataset = create_tinybert_dataset('td', batch_size=eval_cfg.batch_size, + eval_dataset = create_tinybert_dataset('td', batch_size=BATCH_SIZE, device_num=1, rank=0, do_shuffle="false", data_dir=args_opt.eval_data_dir, schema_dir=args_opt.schema_dir, @@ -49,7 +60,7 @@ def get_bin(): columns_list = ["input_ids", "input_mask", "segment_ids", "label_ids"] label_list = [] for j, data in enumerate(eval_dataset.create_dict_iterator(output_numpy=True, num_epochs=1)): - file_name = "tinybert_bs" + str(eval_cfg.batch_size) + "_" + str(j) + ".bin" + file_name = "tinybert_bs" + str(BATCH_SIZE) + "_" + str(j) + ".bin" input_data = [] for i in columns_list: input_data.append(data[i]) diff --git a/model_zoo/official/nlp/tinybert/scripts/run_standalone_td.sh b/model_zoo/official/nlp/tinybert/scripts/run_standalone_td.sh index 974a1eace1f..077ec63dd58 100644 --- a/model_zoo/official/nlp/tinybert/scripts/run_standalone_td.sh +++ b/model_zoo/official/nlp/tinybert/scripts/run_standalone_td.sh @@ -16,17 +16,21 @@ echo "==============================================================================================================" echo "Please run the script as: " -echo "bash scripts/run_standalone_td.sh" -echo "for example: bash scripts/run_standalone_td.sh" +echo "bash scripts/run_standalone_td.sh [config_path]" +echo "for example: bash scripts/run_standalone_td.sh /home/data1/td_config_sst2.yaml" echo "==============================================================================================================" +if [ $# != 1 ]; then + echo "bash scripts/run_standalone_td.sh [config_path]" + exit 1 +fi mkdir -p ms_log PROJECT_DIR=$(cd "$(dirname "$0")" || exit; pwd) CUR_DIR=`pwd` export GLOG_log_dir=${CUR_DIR}/ms_log export GLOG_logtostderr=0 python ${PROJECT_DIR}/../run_task_distill.py \ - --config_path="../../td_config/td_config_sst2.yaml" \ + --config_path=$1 \ --device_target="Ascend" \ --device_id=0 \ --do_train="true" \ diff --git a/model_zoo/official/nlp/tinybert/src/model_utils/config.py 
b/model_zoo/official/nlp/tinybert/src/model_utils/config.py index 0637e41ce2b..78e062c0f22 100644 --- a/model_zoo/official/nlp/tinybert/src/model_utils/config.py +++ b/model_zoo/official/nlp/tinybert/src/model_utils/config.py @@ -152,9 +152,11 @@ def get_config(): """ Get Config according to the yaml file and cli arguments. """ - def get_abs_path(path_relative): + def get_abs_path(path_input): + if os.path.isabs(path_input): + return path_input current_dir = os.path.dirname(os.path.abspath(__file__)) - return os.path.join(current_dir, path_relative) + return os.path.join(current_dir, path_input) parser = argparse.ArgumentParser(description="default name", add_help=False) parser.add_argument("--config_path", type=get_abs_path, default="../../gd_config.yaml", help="Config file path") diff --git a/model_zoo/official/nlp/tinybert/td_config/td_config_15cls.yaml b/model_zoo/official/nlp/tinybert/td_config/td_config_15cls.yaml index 8d5e055aadd..eb4170fd30d 100644 --- a/model_zoo/official/nlp/tinybert/td_config/td_config_15cls.yaml +++ b/model_zoo/official/nlp/tinybert/td_config/td_config_15cls.yaml @@ -40,9 +40,6 @@ dataset_type: "tfrecord" ckpt_file: '' file_name: "tinybert" file_format: "AIR" -# postprocess related -result_path: "./result_Files" -label_path: "./preprocess_Result/label_ids.npy" phase1_cfg: batch_size: 32 loss_scale_value: 256 diff --git a/model_zoo/official/nlp/tinybert/td_config/td_config_mnli.yaml b/model_zoo/official/nlp/tinybert/td_config/td_config_mnli.yaml index 7bc413ff043..b902dd18435 100644 --- a/model_zoo/official/nlp/tinybert/td_config/td_config_mnli.yaml +++ b/model_zoo/official/nlp/tinybert/td_config/td_config_mnli.yaml @@ -40,9 +40,6 @@ dataset_type: "tfrecord" ckpt_file: '' file_name: "tinybert" file_format: "AIR" -# postprocess related -result_path: "./result_Files" -label_path: "./preprocess_Result/label_ids.npy" phase1_cfg: batch_size: 32 loss_scale_value: 256 diff --git a/model_zoo/official/nlp/tinybert/td_config/td_config_ner.yaml b/model_zoo/official/nlp/tinybert/td_config/td_config_ner.yaml index 8d5e055aadd..eb4170fd30d 100644 --- a/model_zoo/official/nlp/tinybert/td_config/td_config_ner.yaml +++ b/model_zoo/official/nlp/tinybert/td_config/td_config_ner.yaml @@ -40,9 +40,6 @@ dataset_type: "tfrecord" ckpt_file: '' file_name: "tinybert" file_format: "AIR" -# postprocess related -result_path: "./result_Files" -label_path: "./preprocess_Result/label_ids.npy" phase1_cfg: batch_size: 32 loss_scale_value: 256 diff --git a/model_zoo/official/nlp/tinybert/td_config/td_config_qnli.yaml b/model_zoo/official/nlp/tinybert/td_config/td_config_qnli.yaml index 7bc413ff043..b902dd18435 100644 --- a/model_zoo/official/nlp/tinybert/td_config/td_config_qnli.yaml +++ b/model_zoo/official/nlp/tinybert/td_config/td_config_qnli.yaml @@ -40,9 +40,6 @@ dataset_type: "tfrecord" ckpt_file: '' file_name: "tinybert" file_format: "AIR" -# postprocess related -result_path: "./result_Files" -label_path: "./preprocess_Result/label_ids.npy" phase1_cfg: batch_size: 32 loss_scale_value: 256 diff --git a/model_zoo/official/nlp/tinybert/td_config/td_config_sst2.yaml b/model_zoo/official/nlp/tinybert/td_config/td_config_sst2.yaml index 05b35383957..1e185ef4d58 100644 --- a/model_zoo/official/nlp/tinybert/td_config/td_config_sst2.yaml +++ b/model_zoo/official/nlp/tinybert/td_config/td_config_sst2.yaml @@ -40,9 +40,6 @@ dataset_type: "tfrecord" ckpt_file: '' file_name: "tinybert" file_format: "AIR" -# postprocess related -result_path: "./result_Files" -label_path: 
"./preprocess_Result/label_ids.npy" phase1_cfg: batch_size: 32 loss_scale_value: 256