add accum_loss for api

b00518648 2022-03-31 01:30:04 +08:00
parent 96681e20b7
commit dbc8ba0abc
2 changed files with 5 additions and 2 deletions


@@ -40,8 +40,9 @@
 **Outputs:**
-Tuple, a tuple containing (`output`, `encoder_layer_present`, `encoder_layer_present`).
+Tuple, a tuple containing (`output`, `encoder_layer_present`, `decoder_layer_present`, `accum_loss`).
 - **output** (Tensor) - If there is only an encoder, the output logit of the encoder layer. The shape is [batch, src_seq_length, hidden_size] or [batch * src_seq_length, hidden_size]. If there are both an encoder and a decoder, the output comes from the decoder layer. The shape is [batch, tgt_seq_length, hidden_size] or [batch * tgt_seq_length, hidden_size].
 - **encoder_layer_present** (Tuple) - A tuple of size num_layers, where each element is a tuple of the projected key and value tensors in self-attention, with shape ((batch_size, num_heads, size_per_head, src_seq_length), (batch_size, num_heads, src_seq_length, size_per_head)).
 - **decoder_layer_present** (Tuple) - A tuple of size num_layers, where each element is a tuple of the projected key and value tensors in self-attention, with shape ((batch_size, num_heads, size_per_head, tgt_seq_length), (batch_size, num_heads, tgt_seq_length, size_per_head)), or of the projected key and value tensors in cross-attention, with shape ((batch_size, num_heads, size_per_head, src_seq_length), (batch_size, num_heads, src_seq_length, size_per_head)). If the decoder is not set, the returned value will be None.
+- **accum_loss** (Tensor) - An auxiliary loss that minimizes the mean square of the fraction of data routed to each expert. It is returned only when the number of experts is greater than 1.
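
To make the `accum_loss` description concrete, the following is a rough sketch of a "mean square of the routed fraction" penalty in plain Python. It is not taken from the commit or from MindSpore: the function names, the hard per-token expert assignment, and the exact normalization are all assumptions chosen for illustration. A real mixture-of-experts layer would typically compute a differentiable variant from router probabilities.

```python
from collections import Counter


def routing_fractions(assignments, num_experts):
    """Fraction of tokens sent to each expert (hypothetical helper)."""
    if not assignments:
        raise ValueError("assignments must not be empty")
    counts = Counter(assignments)
    total = len(assignments)
    return [counts.get(expert, 0) / total for expert in range(num_experts)]


def mean_square_aux_loss(assignments, num_experts):
    """Mean over experts of the squared routed fraction (assumed form).

    The value is smallest when tokens are spread evenly across experts
    and largest when every token goes to a single expert.
    """
    fractions = routing_fractions(assignments, num_experts)
    return sum(f * f for f in fractions) / num_experts


# Eight tokens over four experts: evenly spread vs. fully collapsed routing.
balanced = mean_square_aux_loss([0, 1, 2, 3, 0, 1, 2, 3], num_experts=4)
collapsed = mean_square_aux_loss([0] * 8, num_experts=4)
```

Under these assumptions the evenly spread routing yields a lower value than the collapsed one, which is the behaviour an auxiliary load-balancing term is meant to reward.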


@@ -2335,7 +2335,7 @@ class Transformer(Cell):
 Used for incremental prediction when use_past is True. Default None.
 Outputs:
-Tuple, a tuple contains(`output`, `encoder_layer_present`, `decoder_layer_present`)
+Tuple, a tuple containing (`output`, `encoder_layer_present`, `decoder_layer_present`, `accum_loss`)
 - **output** (Tensor) - If there is only encoder, the output logit of the encoder layer. The shape is
 [batch, src_seq_length, hidden_size] or [batch * src_seq_length, hidden_size], if there are encoder and
@@ -2351,6 +2351,8 @@ class Transformer(Cell):
 (batch_size, num_heads, size_per_head, src_seq_length),
 (batch_size, num_heads, src_seq_length, size_per_head)). If the decoder is not set, the
 returned value will be None.
+- **accum_loss** (Tensor) - An auxiliary loss that minimizes the mean square of the fraction of data
+routed to each expert. It is returned only if the number of experts is greater than 1.
 Supported Platforms:
 ``Ascend`` ``GPU``
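
Because `accum_loss` is documented as being returned only when the number of experts is greater than 1, calling code may want to handle both tuple lengths. The sketch below is one way to do that. It assumes the output is a 3-tuple without experts and a 4-tuple with them, which is an inference from the docstring rather than something the diff confirms. `TransformerOutputs` and `unpack_outputs` are hypothetical names, not part of the MindSpore API.

```python
from typing import Any, NamedTuple, Optional, Sequence


class TransformerOutputs(NamedTuple):
    """Named view over the documented output tuple (hypothetical type)."""
    output: Any
    encoder_layer_present: Any
    decoder_layer_present: Any
    accum_loss: Optional[Any] = None  # absent when there is a single expert


def unpack_outputs(outputs: Sequence[Any]) -> TransformerOutputs:
    """Accept either the 3-element or the 4-element form of the outputs."""
    if len(outputs) not in (3, 4):
        raise ValueError(f"expected 3 or 4 outputs, got {len(outputs)}")
    return TransformerOutputs(*outputs)


# Placeholder strings stand in for tensors so the sketch runs anywhere.
dense = unpack_outputs(("logits", "enc_kv", None))
moe = unpack_outputs(("logits", "enc_kv", "dec_kv", "aux"))
```

With this wrapper, downstream code can check `result.accum_loss is not None` before adding the auxiliary term to the training loss, instead of branching on tuple length at every call site.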