!34671 Fix Ge Interface for AllReduce fusion

Merge pull request !34671 from huangxinjing/fix_ge_error
This commit is contained in:
i-robot 2022-05-26 11:34:42 +00:00 committed by Gitee
commit 9cdb283781
4 changed files with 8 additions and 5 deletions


@@ -3,7 +3,7 @@
 A multi-layer perceptron with two linear layers, applying Dropout to the final output. The first linear layer projects the input dimension from hidden_size to ffn_hidden_size and applies the activation in between. The second linear layer projects that dimension from ffn_hidden_size back to hidden_size. Once parallel_config is configured,
 the weight of the first linear layer is sharded along the input dimension, and the second linear layer is sharded along the output dimension. The overall process is as follows:
-.. math:
+.. math::
 Dropout((xW_1+b_1)W_2 + b_2)
 where :math:`W_1, W_2, b_1` and :math:`b_2` are trainable parameters.
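The formula above can be sketched in plain NumPy. This is only an illustration of the two-projection-plus-Dropout shape flow, not the MindSpore FeedForward implementation; the ReLU activation and the dropout rate are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

hidden_size, ffn_hidden_size = 4, 16
x = rng.standard_normal((2, hidden_size))                 # [batch, hidden_size]
W1 = rng.standard_normal((hidden_size, ffn_hidden_size))  # sharded on input dim under parallel_config
b1 = np.zeros(ffn_hidden_size)
W2 = rng.standard_normal((ffn_hidden_size, hidden_size))  # sharded on output dim under parallel_config
b2 = np.zeros(hidden_size)

def dropout(t, p=0.1, training=False):
    # Identity at inference time; mask-and-scale during training.
    if not training:
        return t
    mask = rng.random(t.shape) >= p
    return t * mask / (1.0 - p)

h = np.maximum(x @ W1 + b1, 0)   # first projection + activation (ReLU assumed)
out = dropout(h @ W2 + b2)       # second projection, Dropout on the final output
assert out.shape == (2, hidden_size)
```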


@@ -5,7 +5,7 @@
 .. math::
 MultiHeadAttention(query, key, vector) = Dropout(Concat(head_1, \dots, head_h)W^O)
-where `head_i = Attention(QW_i^Q, KW_i^K, VW_i^V)`. Note: the projection computation of the output layer carries a bias parameter.
+where :math:`head_i = Attention(QW_i^Q, KW_i^K, VW_i^V)`. Note: the projection computation of the output layer carries a bias parameter.
 If the query tensor, key tensor and value tensor are the same, the above is the computation of self-attention.
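The multi-head formula can be sketched as follows. This is a minimal NumPy sketch of the math, not the MindSpore MultiHeadAttention module; the output-layer bias is omitted here and the head split is the conventional hidden/heads reshape.

```python
import numpy as np

rng = np.random.default_rng(0)
seq, hidden, heads = 3, 8, 2
d = hidden // heads                      # size_per_head

x = rng.standard_normal((seq, hidden))   # self-attention: query = key = value
Wq, Wk, Wv, Wo = (rng.standard_normal((hidden, hidden)) for _ in range(4))

def softmax(a):
    e = np.exp(a - a.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

# Project and split into heads: [heads, seq, size_per_head]
q = (x @ Wq).reshape(seq, heads, d).transpose(1, 0, 2)
k = (x @ Wk).reshape(seq, heads, d).transpose(1, 0, 2)
v = (x @ Wv).reshape(seq, heads, d).transpose(1, 0, 2)

scores = softmax(q @ k.transpose(0, 2, 1) / np.sqrt(d))    # [heads, seq, seq]
head_out = scores @ v                                      # head_i = Attention(...)
concat = head_out.transpose(1, 0, 2).reshape(seq, hidden)  # Concat(head_1, ..., head_h)
out = concat @ Wo                                          # projection W^O (bias omitted in this sketch)
assert out.shape == (seq, hidden)
```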


@@ -44,5 +44,5 @@
 - **output** (Tensor) - If there is only an encoder, this is the output logit of the encoder layer, with shape [batch, src_seq_length, hidden_size] or [batch * src_seq_length, hidden_size]. If there are both an encoder and a decoder, the output comes from the decoder layer, with shape [batch, tgt_seq_length, hidden_size] or [batch * tgt_seq_length, hidden_size].
 - **encoder_layer_present** (Tuple) - A tuple of size num_layers, where each element is a tuple of the projected key and value tensors in self-attention, with shape ((batch_size, num_heads, size_per_head, src_seq_length) or (batch_size, num_heads, src_seq_length, size_per_head)).
-- **decoder_layer_present** (Tuple) - A tuple of size num_layers, where each element is a tuple of the projected key and value tensors in self attention, with shape ((batch_size, num_heads, size_per_head, tgt_seq_length) or (batch_size, num_heads, tgt_seq_length, size_per_head)), or a tuple of the projected key and value tensors in cross-attention, with shape ((batch_size, num_heads, size_per_head, src_seq_length) or (batch_size, num_heads, src_seq_length, size_per_head)). If no decoder is set, the return value will be None.
+- **decoder_layer_present** (Tuple) - A tuple of size num_layers, where each element is a tuple of the projected key and value tensors in self-attention, with shape ((batch_size, num_heads, size_per_head, tgt_seq_length) or (batch_size, num_heads, tgt_seq_length, size_per_head)), or a tuple of the projected key and value tensors in cross-attention, with shape ((batch_size, num_heads, size_per_head, src_seq_length) or (batch_size, num_heads, src_seq_length, size_per_head)). If no decoder is set, the return value will be None.
 - **accum_loss** (Tensor) - An auxiliary loss that minimizes the mean square of the fraction of data routed to each expert; returned only when the number of experts is greater than 1.
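The nested tuple layout of the per-layer key/value caches described above can be sketched as follows. This uses placeholder zero arrays and hypothetical sizes purely to make the shape convention concrete; it is not MindSpore code.

```python
import numpy as np

batch, heads, d, src_len, num_layers = 2, 4, 8, 5, 3   # hypothetical sizes

# One (key, value) pair per encoder layer, as projected inside self-attention.
encoder_layer_present = tuple(
    (np.zeros((batch, heads, d, src_len)),   # projected key:   (batch_size, num_heads, size_per_head, src_seq_length)
     np.zeros((batch, heads, src_len, d)))   # projected value: (batch_size, num_heads, src_seq_length, size_per_head)
    for _ in range(num_layers)
)

assert len(encoder_layer_present) == num_layers
key, value = encoder_layer_present[0]
assert key.shape == (batch, heads, d, src_len)
assert value.shape == (batch, heads, src_len, d)
```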


@@ -34,7 +34,6 @@ def _init_allreduce_operators(length, split_indices, group=GlobalComm.WORLD_COMM
         if indices >= length:
             logger.warning(f"AllReduce's split index {indices} is greater than or equal to "
                            f"the total gradient's number of {length}")
-    fusion_type = 2 ** 10
     split = 0
     fusion = ()
@@ -50,7 +49,11 @@ def _init_allreduce_operators(length, split_indices, group=GlobalComm.WORLD_COMM
     op_list = ()
     for i in range(length):
         op = AllReduce('sum', group)
-        op.add_prim_attr('fusion', fusion[i])
+        op_fusion_id = fusion[i]
+        # When running on GE with all_reduce_fusion_config enabled, HCCL requires the AllReduce's fusion id to be -1.
+        if context.get_context("enable_ge") and context.get_auto_parallel_context("all_reduce_fusion_config"):
+            op_fusion_id = -1
+        op.add_prim_attr('fusion', op_fusion_id)
         op.add_prim_attr('index', index[i])
         op_list = op_list + (op,)
     return op_list
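The fix in the hunk above boils down to one decision: keep the computed fusion id unless the GE backend is active and all_reduce_fusion_config is set, in which case force it to -1. A standalone sketch of that decision, with `resolve_fusion_id` as a hypothetical helper and `ge_enabled` / `fusion_config` standing in for the `context.get_context("enable_ge")` and `context.get_auto_parallel_context("all_reduce_fusion_config")` lookups:

```python
def resolve_fusion_id(fusion_id, ge_enabled, fusion_config):
    # On the GE backend with all_reduce_fusion_config set, HCCL expects fusion id -1.
    if ge_enabled and fusion_config:
        return -1
    # Otherwise keep the id derived from the split indices.
    return fusion_id

assert resolve_fusion_id(3, ge_enabled=False, fusion_config=[20, 40]) == 3
assert resolve_fusion_id(3, ge_enabled=True, fusion_config=[20, 40]) == -1
assert resolve_fusion_id(3, ge_enabled=True, fusion_config=[]) == 3
```

An empty fusion_config is treated the same as an unset one, which mirrors the truthiness check in the patched code.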