forked from mindspore-Ecosystem/mindspore
!11592 add chn/eng switch for transformer readme
From: @yuchaojie
Reviewed-by: @c_34, @liangchenghui
Signed-off-by: @c_34
Commit 4fef53a4a2

README.md:

````diff
@@ -1,6 +1,8 @@
 # Contents
 
-- [Transfomer Description](#transformer-description)
+[查看中文](./README_CN.md)
+
+- [Transformer Description](#transformer-description)
 - [Model Architecture](#model-architecture)
 - [Dataset](#dataset)
 - [Environment Requirements](#environment-requirements)
````
````diff
@@ -18,7 +20,7 @@
 - [Description of Random Situation](#description-of-random-situation)
 - [ModelZoo Homepage](#modelzoo-homepage)
 
-## [Transfomer Description](#contents)
+## [Transformer Description](#contents)
 
 Transformer was proposed in 2017 and designed to process sequential data. It is adopted mainly in the field of natural language processing(NLP), for tasks like machine translation or text summarization. Unlike traditional recurrent neural network(RNN) which processes data in order, Transformer adopts attention mechanism and improve the parallelism, therefore reduced training times and made training on larger datasets possible. Since Transformer model was introduced, it has been used to tackle many problems in NLP and derives many network models, such as BERT(Bidirectional Encoder Representations from Transformers) and GPT(Generative Pre-trained Transformer).
 
````
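As background for the attention mechanism that the description above credits for the Transformer's parallelism, here is a minimal NumPy sketch of scaled dot-product attention. It is illustrative only and not taken from this repository's MindSpore implementation; the function name, shapes, and toy data are all hypothetical.

```python
import numpy as np

def scaled_dot_product_attention(q, k, v):
    """Weight every value by how well its key matches each query.

    A single matrix product covers all positions of the sequence at once,
    so nothing has to be processed step by step as in an RNN.
    """
    d_k = q.shape[-1]
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(d_k)         # (batch, len_q, len_k)
    scores = scores - scores.max(axis=-1, keepdims=True)     # numerical stability
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)  # softmax over the keys
    return weights @ v                                       # (batch, len_q, d_v)

# Toy tensors: batch of 1, sequence length 4, hidden size 8 (all hypothetical).
rng = np.random.default_rng(0)
q = k = v = rng.standard_normal((1, 4, 8)).astype(np.float32)
print(scaled_dot_product_attention(q, k, v).shape)           # (1, 4, 8)
```

Because the softmax-weighted matrix products span every position at once, there is no per-token recurrence to serialize, which is the parallelism advantage over RNNs mentioned in the paragraph.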
````diff
@@ -103,7 +105,7 @@ usage: train.py [--distribute DISTRIBUTE] [--epoch_size N] [----device_num N] [
                 [--data_path DATA_PATH] [--bucket_boundaries BUCKET_LENGTH]
 
 options:
-    --distribute    pre_training by serveral devices: "true"(training by more than 1 device) | "false", default is "false"
+    --distribute    pre_training by several devices: "true"(training by more than 1 device) | "false", default is "false"
     --epoch_size    epoch size: N, default is 52
     --device_num    number of used devices: N, default is 1
     --device_id     device id: N, default is 0
````
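For orientation, a plausible single-device invocation assembled from the options listed in the hunk above might look like the sketch below; the dataset path is a placeholder and the flag values simply repeat the documented defaults.

```bash
# Hypothetical example only: /path/to/data is a placeholder for the prepared dataset.
python train.py --distribute="false" --epoch_size=52 --device_num=1 --device_id=0 \
                --data_path=/path/to/data
```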
````diff
@@ -204,7 +206,7 @@ Parameters for learning rate:
 sh scripts/run_distribute_train_ascend.sh DEVICE_NUM EPOCH_SIZE DATA_PATH RANK_TABLE_FILE
 ```
 
-**Attention**: data sink mode can not be used in transformer since the input datas have different sequence lengths.
+**Attention**: data sink mode can not be used in transformer since the input data have different sequence lengths.
 
 ## [Evaluation Process](#contents)
 
````
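Filling in the positional arguments of the distributed launch shown above might look like the following sketch; the device count of 8 and both paths are assumptions, while 52 matches the default epoch size listed earlier.

```bash
# Hypothetical 8-device launch: DEVICE_NUM EPOCH_SIZE DATA_PATH RANK_TABLE_FILE.
sh scripts/run_distribute_train_ascend.sh 8 52 /path/to/data /path/to/rank_table.json
```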
README_CN.md:

````diff
@@ -1,9 +1,11 @@
 # 目录
 
+[view English](./README.md)
+
 <!-- TOC -->
 
 - [目录](#目录)
-- [Transfomer 概述](#transfomer-概述)
+- [Transformer 概述](#transfomer-概述)
 - [模型架构](#模型架构)
 - [数据集](#数据集)
 - [环境要求](#环境要求)
````
````diff
@@ -26,7 +28,7 @@
 
 <!-- /TOC -->
 
-## Transfomer 概述
+## Transformer 概述
 
 Transformer于2017年提出,用于处理序列数据。Transformer主要应用于自然语言处理(NLP)领域,如机器翻译或文本摘要等任务。不同于传统的循环神经网络按次序处理数据,Transformer采用注意力机制,提高并行,减少训练次数,从而实现在较大数据集上训练。自Transformer模型引入以来,许多NLP中出现的问题得以解决,衍生出众多网络模型,比如BERT(多层双向transformer编码器)和GPT(生成式预训练transformers) 。
 
````
````diff
@@ -109,7 +111,7 @@ usage: train.py [--distribute DISTRIBUTE] [--epoch_size N] [----device_num N] [
                 [--data_path DATA_PATH] [--bucket_boundaries BUCKET_LENGTH]
 
 options:
-    --distribute    pre_training by serveral devices: "true"(training by more than 1 device) | "false", default is "false"
+    --distribute    pre_training by several devices: "true"(training by more than 1 device) | "false", default is "false"
     --epoch_size    epoch size: N, default is 52
     --device_num    number of used devices: N, default is 1
     --device_id     device id: N, default is 0
````