add accum_loss for api
parent 96681e20b7
commit dbc8ba0abc
@@ -40,8 +40,9 @@
 **Outputs:**

-Tuple, a tuple containing (`output`, `encoder_layer_present`, `decoder_layer_present`).
+Tuple, a tuple containing (`output`, `encoder_layer_present`, `decoder_layer_present`, `accum_loss`).

 - **output** (Tensor) - If there is only an encoder, the output logit of the encoder layer. The shape is [batch, src_seq_length, hidden_size] or [batch * src_seq_length, hidden_size]. If there are both an encoder and a decoder, the output comes from the decoder layer. The shape is [batch, tgt_seq_length, hidden_size] or [batch * tgt_seq_length, hidden_size].
 - **encoder_layer_present** (Tuple) - A tuple with the size of num_layers, where each element is a tuple of the tensors of the projected key and value vectors in self attention, with shape (batch_size, num_heads, size_per_head, src_seq_length) or (batch_size, num_heads, src_seq_length, size_per_head).
 - **decoder_layer_present** (Tuple) - A tuple with the size of num_layers, where each element is a tuple of the tensors of the projected key and value vectors in self attention, with shape (batch_size, num_heads, size_per_head, tgt_seq_length) or (batch_size, num_heads, tgt_seq_length, size_per_head), or a tuple of the tensors of the projected key and value vectors in cross attention, with shape (batch_size, num_heads, size_per_head, src_seq_length) or (batch_size, num_heads, src_seq_length, size_per_head). If the decoder is not set, the returned value will be None.
+- **accum_loss** (Tensor) - An auxiliary loss to minimize the mean square of the portion of data routed to each expert; it is only returned when the number of experts is greater than 1.

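The hunk above changes the documented return contract. As a concrete illustration, here is a minimal usage sketch. It assumes the `Transformer` and `MoEConfig` classes and the `moe_config` constructor argument exposed by `mindspore.nn.transformer` at this commit; `expert_num=4` is an arbitrary illustrative value, and the tensor shapes follow the docstring above.

```python
# A hedged sketch of consuming the new four-element output. The MoEConfig /
# Transformer argument names below are assumptions based on the docstring.
import numpy as np
from mindspore import Tensor
from mindspore import dtype as mstype
from mindspore.nn.transformer import Transformer, MoEConfig

# With more than one expert, the forward call also returns `accum_loss`.
moe_config = MoEConfig(expert_num=4)
model = Transformer(batch_size=2, encoder_layers=1, decoder_layers=2,
                    hidden_size=64, ffn_hidden_size=64,
                    src_seq_length=20, tgt_seq_length=10,
                    moe_config=moe_config)

encoder_input_value = Tensor(np.ones((2, 20, 64)), mstype.float32)
encoder_input_mask = Tensor(np.ones((2, 20, 20)), mstype.float16)
decoder_input_value = Tensor(np.ones((2, 10, 64)), mstype.float32)
decoder_input_mask = Tensor(np.ones((2, 10, 10)), mstype.float16)
memory_mask = Tensor(np.ones((2, 10, 20)), mstype.float16)

output, encoder_layer_present, decoder_layer_present, accum_loss = model(
    encoder_input_value, encoder_input_mask,
    decoder_input_value, decoder_input_mask, memory_mask)
# `accum_loss` is a scalar Tensor, to be added (usually scaled) to the task loss.
```

Note that `accum_loss` is documented as present only when the number of experts is greater than 1; with a single expert, callers still unpack the original three-element tuple.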
@@ -2335,7 +2335,7 @@ class Transformer(Cell):
             Used for incremental prediction when the use_past is True. Default None.

     Outputs:
-        Tuple, a tuple containing (`output`, `encoder_layer_present`, `decoder_layer_present`)
+        Tuple, a tuple containing (`output`, `encoder_layer_present`, `decoder_layer_present`, `accum_loss`)

         - **output** (Tensor) - If there is only encoder, the output logit of the encoder layer. The shape is
           [batch, src_seq_length, hidden_size] or [batch * src_seq_length, hidden_size], if there are encoder and
@@ -2351,6 +2351,8 @@ class Transformer(Cell):
           (batch_size, num_heads, size_per_head, src_seq_length),
           (batch_size, num_heads, src_seq_length, size_per_head)). If the decoder is not set, the
           returned value will be None.
+        - **accum_loss** (Tensor) - An auxiliary loss to minimize the mean square of the portion of the
+          data routed to each expert; it is only returned when the number of experts is greater than 1.

     Supported Platforms:
         ``Ascend`` ``GPU``
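The docstring describes `accum_loss` in a single sentence. The toy sketch below is illustrative only, not the MindSpore router implementation; `load_balance_loss` and the token counts are hypothetical. It shows why minimizing the mean square of the portion of data routed to each expert encourages a balanced load:

```python
# Illustrative only: a toy version of the balance objective the docstring
# describes. The mean square of the routed fractions is smallest when every
# expert receives the same share of the data.
import numpy as np

def load_balance_loss(expert_counts):
    """Mean square of the fraction of tokens routed to each expert."""
    fractions = expert_counts / expert_counts.sum()
    return np.mean(fractions ** 2)

balanced = np.array([25.0, 25.0, 25.0, 25.0])  # 4 experts, even load
skewed = np.array([70.0, 20.0, 5.0, 5.0])      # one overloaded expert

print(load_balance_loss(balanced))  # 0.0625, the minimum for 4 experts
print(load_balance_loss(skewed))    # 0.13375, imbalance is penalized
```

In training, the returned `accum_loss` would typically be scaled by a small coefficient and added to the task loss, so gradient descent nudges the router toward the uniform assignment that minimizes this term.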