forked from mindspore-Ecosystem/mindspore
edit readme
parent 8fa83cca87
commit 263a591473
@@ -411,20 +411,22 @@ epoch: 0.0, current epoch percent: 0.002, step: 200, outputs are (Tensor(shape=[1

Before running the command below, please check that the pretrained checkpoint path has been set. Set the checkpoint path to an absolute path, e.g. "/username/pretrain/checkpoint_100_300.ckpt".

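To verify what the script is currently configured with, one quick check is to search the evaluation script for its checkpoint argument (a sketch; the exact flag or variable name inside scripts/run_classifier.sh is an assumption here):

```
# list every line in the eval script that mentions a checkpoint,
# so you can confirm the path is absolute before launching
grep -n "checkpoint" scripts/run_classifier.sh
```
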
```
bash scripts/run_classifier.sh
```

The command above runs in the background; you can view the training log in classfier_log.txt.

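Since the job is detached from the terminal, a simple way to follow its progress (assuming the log is written to the current working directory) is:

```
# stream the classifier evaluation log as it is written
tail -f classfier_log.txt
```
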
If you choose accuracy as the assessment method, the result will be as follows:

```
acc_num XXX, total_num XXX, accuracy 0.588986
```

#### evaluation on cluener dataset when running on Ascend

```
bash scripts/ner.sh
```

The command above runs in the background; you can view the training log in ner_log.txt.

If you choose F1 as the assessment method, the result will be as follows:

```
Precision 0.920507
Recall 0.948683
F1 0.920507
```

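For reference, F1 is the harmonic mean of precision and recall, F1 = 2PR/(P+R); a minimal one-liner to recompute it from a run's own precision and recall (the values below are placeholders, not results from this README):

```
# recompute F1 from precision p and recall r; substitute your own values
awk -v p=0.92 -v r=0.95 'BEGIN { printf "F1 %.6f\n", 2 * p * r / (p + r) }'
```
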
@@ -433,9 +435,10 @@ F1 0.920507

#### evaluation on squad v1.1 dataset when running on Ascend

```
bash scripts/squad.sh
```

The command above runs in the background; you can view the training log in squad_log.txt.

The result will be as follows:

```
{"exact_match": 80.3878923040233284, "f1": 87.6902384023850329}
```

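Because the result is printed as a JSON dict, it can be pulled out of the log programmatically; a sketch, assuming the dict is the last line of squad_log.txt:

```
# extract the f1 field from the final JSON line of the log
tail -n 1 squad_log.txt | python3 -c "import json, sys; print(json.load(sys.stdin)['f1'])"
```
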
@@ -1,11 +1,11 @@

# Run distribute pretrain

## description

The number of Ascend accelerators is allocated automatically based on the device_num set in the hccl config file; you do not need to specify it. For example, with the config file hccl_2p_56_x.x.x.x.json used below, device_num is 2 (devices 5 and 6, per the naming scheme given later), so two accelerators are used.

## how to use

For example, to generate the launch command for distributed training of the Bert model on Ascend accelerators, run the following command in the `/bert/` directory:

```
python ./scripts/ascend_distributed_launcher/get_distribute_pretrain_cmd.py --run_script_dir ./run_pretrain.py --hyper_parameter_config_dir ./scripts/ascend_distributed_launcher/hyper_parameter_config.ini --data_dir /path/dataset/ --hccl_config_dir model_zoo/utils/hccl_tools/hccl_2p_56_x.x.x.x.json
```

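The launcher is a plain Python script, so its full set of options can be inspected before committing to a run (assuming it exposes a standard argparse interface; only the flags shown above are confirmed by this README):

```
# print the launcher's accepted arguments and their descriptions
python ./scripts/ascend_distributed_launcher/get_distribute_pretrain_cmd.py --help
```
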
@@ -59,7 +59,7 @@ def append_cmd_env(cmd, key, value):

def distribute_pretrain():
    """
    Distribute pretrain script. The number of Ascend accelerators is allocated
    automatically based on the device_num set in the hccl config file; you do
    not need to specify it.
    """
    cmd = ""

@@ -1,6 +1,6 @@

# description

MindSpore distributed training launch helper utility that will generate the hccl config file.

# use

@@ -14,4 +14,4 @@ hccl_[device_num]p_[which device]_[server_ip].json

# Note

Please note that the Ascend accelerators used must be contiguous: [0,4) means using the four accelerators 0, 1, 2, 3, and [0,1) means using accelerator 0. The first four accelerators form one group and the last four form another; apart from [0,8), cross-group ranges such as [3,6) are prohibited.

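As a concrete illustration of the allowed ranges (assuming the generator script is named hccl_tools.py; the model_zoo/utils/hccl_tools/ path referenced earlier suggests this, but the name is an assumption):

```
# valid: first group, devices 0,1,2,3
python hccl_tools.py --device_num "[0,4)"
# valid: second group, devices 4,5,6,7
python hccl_tools.py --device_num "[4,8)"
# valid: all eight devices
python hccl_tools.py --device_num "[0,8)"
# invalid: "[3,6)" crosses the group boundary and is rejected
```
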
@@ -37,7 +37,7 @@ def parse_args():

"helper utilty that will generate hccl"
|
"helper utilty that will generate hccl"
|
||||||
" config file")
|
" config file")
|
||||||
parser.add_argument("--device_num", type=str, default="[0,8)",
|
parser.add_argument("--device_num", type=str, default="[0,8)",
|
||||||
help="The number of the D chip used. please note that the D chips"
|
help="The number of the Ascend accelerators used. please note that the Ascend accelerators"
|
||||||
"used must be continuous, such [0,4) means to use four chips "
|
"used must be continuous, such [0,4) means to use four chips "
|
||||||
"0,1,2,3; [0,1) means to use chip 0; The first four chips are"
|
"0,1,2,3; [0,1) means to use chip 0; The first four chips are"
|
||||||
"a group, and the last four chips are a group. In addition to"
|
"a group, and the last four chips are a group. In addition to"
|
||||||