diff --git a/docs/api/api_python/nn/mindspore.nn.MicroBatchInterleaved.rst b/docs/api/api_python/nn/mindspore.nn.MicroBatchInterleaved.rst
new file mode 100644
index 00000000000..a979045b7a4
--- /dev/null
+++ b/docs/api/api_python/nn/mindspore.nn.MicroBatchInterleaved.rst
@@ -0,0 +1,20 @@
+mindspore.nn.MicroBatchInterleaved
+==================================
+
+.. py:class:: mindspore.nn.MicroBatchInterleaved(network, interleave_num=2)
+
+    Splits the input along the 0th dimension into `interleave_num` pieces and then performs the computation of the wrapped cell.
+    Application scenario: when model parallelism is applied in semi-auto parallel mode, the forward computation of the first slice can run at the same time as the model-parallel communication of the second slice, so that communication and computation overlap for better performance.
+
+    .. note::
+        The output of the wrapped `network` can only be a single Tensor.
+
+    Args:
+        - **network** (Cell) - The target network to wrap.
+        - **interleave_num** (int) - Split number of the batch size. Default: 2.
+
+    Inputs:
+        tuple[Tensor], the same as the inputs of the wrapped `network` .
+
+    Outputs:
+        The output of the wrapped `network` .
diff --git a/docs/api/api_python/ops/mindspore.ops.ReduceOp.rst b/docs/api/api_python/ops/mindspore.ops.ReduceOp.rst
new file mode 100644
index 00000000000..fad9a4fc0fe
--- /dev/null
+++ b/docs/api/api_python/ops/mindspore.ops.ReduceOp.rst
@@ -0,0 +1,25 @@
+mindspore.ops.ReduceOp
+======================
+
+.. py:class:: mindspore.ops.ReduceOp
+
+    Operation options for reducing tensors. This is an enumerated type, not an operator.
+
+    The main calling methods are as follows:
+
+    - SUM: ReduceOp.SUM.
+    - MAX: ReduceOp.MAX.
+    - MIN: ReduceOp.MIN.
+    - PROD: ReduceOp.PROD.
+
+    .. note::
+        For more information, please refer to the example. This needs to run in an environment with multiple accelerator cards.
+        Before running the following example, the user needs to preset the communication environment variables. Please refer to the official website of `MindSpore \
+        `_ .
+
+    There are four operation options, "SUM", "MAX", "MIN", and "PROD".
+
+    - SUM: reduce by summation.
+    - MAX: reduce by the maximum value.
+    - MIN: reduce by the minimum value.
+    - PROD: reduce by the product.
diff --git a/docs/api/api_python/ops/mindspore.ops.ReduceScatter.rst b/docs/api/api_python/ops/mindspore.ops.ReduceScatter.rst
new file mode 100644
index 00000000000..3e21aaf1923
--- /dev/null
+++ b/docs/api/api_python/ops/mindspore.ops.ReduceScatter.rst
@@ -0,0 +1,19 @@
+mindspore.ops.ReduceScatter
+===========================
+
+.. py:class:: mindspore.ops.ReduceScatter(op=ReduceOp.SUM, group=GlobalComm.WORLD_COMM_GROUP)
+
+    Reduces and scatters tensors from the specified communication group.
+
+    .. note::
+        The tensors must have the same shape and format in all processes of the collection.
+        Before running the following example, the user needs to preset the communication environment variables. Please refer to the official website of
+        `MindSpore `_.
+
+    Args:
+        - **op** (str) - Specifies an operation used for element-wise reductions, like SUM and MAX. Default: ReduceOp.SUM.
+        - **group** (str) - The communication group to work on. Default: "GlobalComm.WORLD_COMM_GROUP".
+
+    Raises:
+        - **TypeError** - If either of `op` and `group` is not a string.
+        - **ValueError** - If the first dimension of the input cannot be divided by the rank size. Rank size refers to the number of cards in the communication group.
diff --git a/mindspore/python/mindspore/nn/wrap/cell_wrapper.py b/mindspore/python/mindspore/nn/wrap/cell_wrapper.py
index 4626f2387a2..97b96f784fb 100644
--- a/mindspore/python/mindspore/nn/wrap/cell_wrapper.py
+++ b/mindspore/python/mindspore/nn/wrap/cell_wrapper.py
@@ -503,18 +503,31 @@ class _MicroBatch(Cell):
 
 class MicroBatchInterleaved(Cell):
     """
-    Wrap the network with Batch Size.
+    Splits the input along the 0th dimension into interleave_num pieces and then performs
+    the computation of the wrapped cell. Application scenario: when model parallelism is applied in
+    semi-auto parallel mode, the forward computation of the first slice can run at the same time as
+    the model-parallel communication of the second slice, so that communication and computation
+    overlap for better performance.
+
+    Note:
+        The output of the wrapped network must be a single tensor.
 
     Args:
         network (Cell): The target network to wrap.
        interleave_num (int): split num of batch size. Default: 2.
 
+    Inputs:
+        tuple[Tensor]. The same as the inputs of the wrapped `network` .
+
+    Outputs:
+        Tensor. The output of the wrapped `network` .
+
     Supported Platforms:
         ``Ascend`` ``GPU``
 
     Examples:
         >>> net = Net()
-        >>> net = MicroBatchInterleaved(net, 4)
+        >>> net = MicroBatchInterleaved(net, 2)
     """
     def __init__(self, network, interleave_num=2):
         super(MicroBatchInterleaved, self).__init__(auto_prefix=False)
diff --git a/mindspore/python/mindspore/ops/operations/comm_ops.py b/mindspore/python/mindspore/ops/operations/comm_ops.py
index 4cd33bd2501..58bbe960bf9 100644
--- a/mindspore/python/mindspore/ops/operations/comm_ops.py
+++ b/mindspore/python/mindspore/ops/operations/comm_ops.py
@@ -31,7 +31,6 @@ from mindspore.common.api import context
 class ReduceOp:
     """
     Operation options for reducing tensors. This is an enumerated type, not an operator.
-    Mainly used in data parallel mode.
 
     The main calling methods are as follows:
 
@@ -395,7 +394,6 @@ class ReduceScatter(PrimitiveWithInfer):
     Reduces and scatters tensors from the specified communication group.
 
     Note:
-        The back propagation of the op is not supported yet. Stay tuned for more.
         The tensors must have the same shape and format in all processes of the collection.
         The user needs to preset communication environment variables before running the following example,
         please check the details on the official website of `MindSpore \
@@ -403,12 +401,13 @@ class ReduceScatter(PrimitiveWithInfer):
 
     Args:
         op (str): Specifies an operation used for element-wise reductions,
-            like SUM, MAX, AVG. Default: ReduceOp.SUM.
+            like SUM and MAX. Default: ReduceOp.SUM.
         group (str): The communication group to work on. Default: "GlobalComm.WORLD_COMM_GROUP".
 
     Raises:
         TypeError: If any of operation and group is not a string.
-        ValueError: If the first dimension of the input cannot be divided by the rank size.
+        ValueError: If the first dimension of the input cannot be divided by the rank size. Rank size refers to the
+            number of cards in the communication group.
 
     Supported Platforms:
         ``Ascend`` ``GPU``
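As an illustration of the API documented above, the sketch below shows how the ReduceScatter primitive and the ReduceOp enum are typically wired together. It is a minimal example, not part of this change set: it assumes a multi-device environment whose communication variables are already preset and initialized through mindspore.communication.init(), and the tensor shape is an arbitrary placeholder whose 0th dimension must be divisible by the rank size.

>>> import numpy as np
>>> import mindspore.nn as nn
>>> import mindspore.ops as ops
>>> from mindspore import Tensor
>>> from mindspore.communication import init
>>> from mindspore.ops import ReduceOp
>>> init()  # builds the world communication group from the preset environment variables
>>> class Net(nn.Cell):
...     def __init__(self):
...         super(Net, self).__init__()
...         # sum-reduce across all ranks, then scatter slices of dim 0 back to each rank
...         self.reduce_scatter = ops.ReduceScatter(ReduceOp.SUM)
...     def construct(self, x):
...         return self.reduce_scatter(x)
...
>>> input_x = Tensor(np.ones([8, 8]).astype(np.float32))  # dim 0 (8) must be divisible by the rank size
>>> output = Net()(input_x)  # per-rank output shape: [8 // rank_size, 8]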